DEV Community

Krinskumar Vaghasia
Krinskumar Vaghasia

Posted on

GCC - A new pass

This is part two of my project where I will be implementing a custom pass in GCC compiler. In my last blog, I built my own version of GCC and made the version my default version. This can be checked using which gcc which should display the path address of the build. The pass that I will be making a pass that will be responsible for finding duplicate functions and pruning them.

Get the demo pass to work

When I had my personal build running, I started by implementing the demo code that professor gave us to get things started and get familiar with how all of this works, I just plugged the code which included
a new line in makefile.in file -> tree-prunecheck.o/,
a new line in passes.def file -> NEXT_PASS (pass_prunecheck);,
a new line in tree-pass.h file -> extern gimple_opt_pass *make_pass_prunecheck (gcc::context *ctxt);
and finally the actual implementation in the file called tree-prunecheck.cc. All of this in the gcc/gcc folder in the gcc public repository.

I plugged these in my gcc local version in aarch64 version of my class servers. I thought that I can run make and make install in my previous build and everything would work. I ended up doing that and running a simple hello_world program which ended up not working. I was just not able to find the pass dump as one of the dumps and when I specifically passed my pass name as one of the dumps to be produced, it error out. I thought the problem was my architecture, so I started building in x86, but the problem continued.

I made a build on my own local mac and faced a different issue altogether. I think it was because of some compatibility issues with my mac version or something. I mad a decision to not waste a lot of time in this, because we had a class servers for a reason and I can ask for help if something is not working in my class servers with my friends and professor as opposed to some random issue on my mac.

I eventually consulted with one of my friends doing the same course, and he told me that I am supposed to make in an entirely new directory and start everything from scratch, this will make sure everything is clean. I did this and this time my passes started showing up as one of the dump files when I compiled my programme. I dont know if making a new directory from scratch fixed my problem or reseting my changes in my local gcc did. Whichever it was, I could finally start working on my own pass now.

Making changes to the pass

I was also told not to run make and make install on the same build directory after I make my changes, so every time I make a change I was supposed to do the whole build process. Regardless I was sure from the start the I wanted to implement a code such that I am hashing each function and comparing the hashes. Here is what the pseudocode would look like:

  1. Check if the function is versioned:
    • No: Return NOPRUNE
  2. Yes: Search for the origin of the function
    • Not found: Return NOPRUNE
  3. Found: Calculate the hash of both of the functions and compare
    • Hash same: Return PRUNE
    • Hash not same: Return NOPRUNE

This is how my code would look like or so I thought, but I was not aware as to how I can share context of hashes across different executions. Untimely I ended u just printing the hashes and looking at the dump file to compare the files.
This is what the code looked like:

#define INCLUDE_MEMORY
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"
#include "hash-table.h" // GCC hash utilities
#include <string>        // For std::string

// Pass metadata
const pass_data pass_data_prune_check = {
  GIMPLE_PASS, /* type */
  "prune", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_cfg, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_prunecheck : public gimple_opt_pass {
public:
  pass_prunecheck(gcc::context *ctxt) : gimple_opt_pass(pass_data_prune_check, ctxt) {}

  /* opt_pass methods */
  bool gate(function *) final override {
    return 1; // Always execute pass
  }

  unsigned int execute(function *) final override;
};

unsigned int
pass_prunecheck::execute(function *fun)
{
  // Check if the function is a clone
  int isClone = DECL_FUNCTION_VERSIONED(fun->decl);

  if (dump_file) {
    fprintf(dump_file, "Function: %s\n", function_name(fun));
    if (isClone) {
      fprintf(dump_file, "Function is a clone/version: True\n");
    } else {
      fprintf(dump_file, "Function is a clone/version: False\n");
    }
  }

  // Initialize a hash value
  unsigned long hash = 5381;

  // Process all Gimple statements to compute the hash
  basic_block bb;
  FOR_EACH_BB_FN(bb, fun) {
    for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
      gimple *g = gsi_stmt(gsi);
      int code = gimple_code(g);

      // Update hash using a simple algorithm (DJB2-inspired)
      hash = ((hash << 5) + hash) + code; // hash * 33 + code
    }
  }

  // Print the hash
  if (dump_file) {
    fprintf(dump_file, "Hash for function %s: %lu\n", function_name(fun), hash);
  }

  if (dump_file) {
    fprintf(dump_file, "\n\n##### End function prune check, start regular dump of current gimple #####\n\n\n");
  }

  return 0;
}

// Entry point for GCC to register the pass
gimple_opt_pass *make_pass_prunecheck(gcc::context *ctxt) {
  return new pass_prunecheck(ctxt);
}

Enter fullscreen mode Exit fullscreen mode

Testing the pass

We were given a utility folder with two different runs with one having duplicate functions. This way we can test both cases. This is what the dump file looked like when duplicate function was in the code. See how we have two hashes with the same values indicating that the functions are same. NOTE: There is more in this file, I cropped the screenshot to only show the main part with the hashes displayed.
Image description

And this is the part where there was no duplicate code. Here you can notice that the hashes displayed are not the same hence indicating that the function are not same therefore no pruning needed.
Image description

Pain point

One of the most challenging and annoying part of this project was to build the gcc compiler. Primarily because of how long it takes to execute. And I have to do this every time I make a change in my the build to complete.
There is very less documentation for gcc in general and there is a very big learning curve if someone wanted to contribute to gcc. So doing spikes on some of my ideas was an absolute nightmare and spent a lot of hours trying to research for potential solutions.

Conclusion

I think this was one of the most challenging project I worked on, not just in this course but in my student journey at Seneca. Partly because of so many inconsistencies in my knowledge on gcc and the passes. Even after so many extensions I was not able to completely complete this project. I think this was a good starting point for me and now I am much more confident with how passes work. I just need to learn more of the utilities provided by gcc that I can use in my passes.

Top comments (0)