Testing & Fixes - Project Stage 02 | Programming and Development

Introduction

Welcome back (again) to Project Stage 02! 😄

In the previous part, we implemented the logic for our tree-kzaw.cc pass, which has been set up in gcc since Project Stage 01. In this continuation of Stage 02, I will test that logic and add fixes as needed.

As always, I will be using aarch64 and x86 servers to test my logic.

❇️ Please find the source code here: GitHub ❇️

📌 Recompiling & Rebuilding

Now that our pass logic has been implemented in the previous blog, we will need to rebuild our gcc to update it. Fortunately, if we did all that stuff with Makefile.in from Project Stage 01, we won't have to wait to do a full gcc rebuild.

Run these commands:

cd ~/gcc-build-001
time make -j 24 |& tee build-stage2-01.log
make install

If nothing goes wrong, we should be good for testing!

📌 Testing

On each of our servers, there is a file at the path /public/spo600-test-clone.tgz. Once we extract it and run make, two binaries will be built: one for the PRUNE test and one for the NOPRUNE test.

We will run this code to extract it and go into the directory as follows:

tar xvf /public/spo600-test-clone.tgz
cd ~/spo600/examples/test-clone

I see a Makefile, README.txt and some other .c and .h files. The README.txt shows me that running make will output two binaries:

test-clone-ARCH-prune
test-clone-ARCH-noprune

where ARCH is either 'aarch64' or 'x86' depending on the server.

👉 Setting Up Makefile

Let's check the Makefile as there are some changes we must do.

Now there are a lot of things happening in there, but what we need to focus on is the gcc command. Within the file, you may see something like:

clone-test-x86-prune: clone-test-core.c $(LIBRARIES)
        gcc -D 'CLONE_ATTRIBUTE=......

Here, gcc means the server's system-wide gcc, not our locally built gcc, so we will change that.

Add this at the top. Make sure to replace the path with your own install directory (not the build directory):

# Use the locally built GCC
CC = $(HOME)/gcc-test-001/bin/gcc

Now replace all gcc with $(CC).

clone-test-x86-prune: clone-test-core.c $(LIBRARIES)
        $(CC) -D 'CLONE_ATTRIBUTE=......

This ensures that the gcc being used is our local one! While we're at it, uncomment the DUMP_ALL = 1 since we would want to check dump files to know whether our code worked or not.

# Set DUMP_ALL to a non-empty value to enable all GCC dumps
 DUMP_ALL = 1

I think we are all done with Makefile.

👉 Checking Dump Files

Now what we should do next is to run a make command and check out our dump files.

make all

When you have run this, ls to see the files in the directory. You will see a BUNCH of dump files.

Like this:

clone-test-core.c
clone-test-x86-noprune
clone-test-x86-noprune-clone-test-core.c.000i.cgraph
clone-test-x86-noprune-clone-test-core.c.000i.ipa-clones
clone-test-x86-noprune-clone-test-core.c.006t.original
.
.
.
clone-test-x86-prune
clone-test-x86-prune-clone-test-core.c.000i.cgraph
clone-test-x86-prune-clone-test-core.c.000i.ipa-clones
clone-test-x86-prune-clone-test-core.c.000i.type-inheritance

On a closer look, you may notice that it is in 2 groups: one for prune and one for noprune.

Now, I don't really know which one to check, but I'm assuming the ones with our pass's names could be a great place to start.

clone-test-x86-noprune-clone-test-core.c.265t.kzaw
clone-test-x86-prune-clone-test-core.c.265t.kzaw

That sounds about right, I can see some of our printed messages. Checking the prune dump file:

;; Function scale_samples (scale_samples.default, funcdef_no=23, decl_uid=3954, cgraph_uid=24, symbol_order=23)

__attribute__((target ("default"), target_clones ("default", "popcnt")))
void scale_samples (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}

;; Function scale_samples.popcnt (scale_samples.popcnt, funcdef_no=25, decl_uid=3985, cgraph_uid=30, symbol_order=28)

>>>> Found potential clone: scale_samples.popcnt (base: scale_samples, variant: .popcnt)
>>>> Collected 165 statements from function scale_samples.popcnt
__attribute__((target ("popcnt"), target_clones ("default", "popcnt")))
void scale_samples.popcnt (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}

We know that the code is being compiled so that is good. However the logic might not be correct. What we can see from this:

❌ It's not analyzing scale_samples. I assume it's because of the variant format check (function.variant), so that check needs to be fixed.
❌ It's checking scale_samples.popcnt, but it's stopped after collecting statements. Why?
✅ It's skipping scale_samples.resolver, which it should.
✅ It's skipping sum_sample, because there's no variant of it.

Let's debug and fix it.

📌 Debugging: PRUNE Case

I want to make sure that the prune case works first. So let's fix those issues.

👉 Issue 1: Not Analyzing the Default Function

The pass isn't recognizing scale_samples (the default function) because it's only looking for functions with a period (.) in their name.

We need to modify the is_clone_function method to also recognize the base function.

Instead of returning false as soon as the dot is not found, we do another check to see if it's a default function with the target_clones attribute.
If it is, we mark it as a .default variant.

...
  // Check if it's a default function with target_clones attribute
  if (lookup_attribute("target_clones", DECL_ATTRIBUTES(decl))) {
    base_name = func_name;
    variant = ".default";  // Use .default to mark it as the default variant

    if (dump_file) {
      fprintf(dump_file, "Found default function with clones: %s\n", full_name);
    }
    return true;
  }

  return false;
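
To make the name check easier to follow, here is a minimal sketch of the same idea outside of GCC. It is not the actual pass code: CloneName and classify_function are hypothetical names, the has_target_clones flag stands in for the lookup_attribute call, and filtering resolvers by their .resolver suffix is my assumption about how they get skipped.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type; the real pass fills base_name and variant instead.
struct CloneName {
  bool is_candidate;    // true when the function should be analyzed
  std::string base;     // e.g. "scale_samples"
  std::string variant;  // e.g. ".popcnt", or ".default" for the base function
};

// has_target_clones is a stand-in for the
// lookup_attribute("target_clones", DECL_ATTRIBUTES(decl)) check in the pass.
CloneName classify_function(const std::string &name, bool has_target_clones) {
  assert(!name.empty());  // an empty name is treated as a caller bug here

  std::string::size_type dot = name.find('.');
  if (dot != std::string::npos) {
    std::string variant = name.substr(dot);
    // Assumption: resolvers are filtered out by their suffix.
    if (variant == ".resolver")
      return {false, "", ""};
    return {true, name.substr(0, dot), variant};
  }

  // No dot: only a base function that carries target_clones is of interest.
  if (has_target_clones)
    return {true, name, ".default"};

  return {false, "", ""};
}
```

With this sketch, scale_samples (with the attribute) is classified as the .default variant, scale_samples.popcnt as a clone of scale_samples, and sum_sample (no dot, no attribute) is skipped.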

👉 Issue 2: Fix Clone Comparison in `execute()`

Now that default functions pass the is_clone_function check and have their statements collected, we have to fix our comparison logic.

In our existing code, it just compares the first two variants it finds, which might not include the default function. It also makes a single pruning decision for the base name, not distinguishing between variants.

Let's change the execute() logic, specifically the part where we select the functions to compare from the clone_groups.

Here we add code to find the default function in the group, and then set things up to compare all variants against the default function (not just the first two).

      // Find the default variant to use as reference
      size_t default_idx = 0;
      for (size_t i = 0; i < clone_groups[base_name].size(); i++) {
        if (clone_groups[base_name][i].variant == ".default") {
          default_idx = i;
          break;
        }
      }

      // Compare each non-default variant with the default
      const function_info &default_info = clone_groups[base_name][default_idx];
      bool all_same = true;

Below is the new comparison logic.

A loop now compares each variant against the default.
Each variant gets its own pruning decision (PRUNE if it matches the default, NOPRUNE if it does not).
The default function gets an overall decision based on whether all variants are the same.

      for (size_t i = 0; i < clone_groups[base_name].size(); i++) {
        if (i == default_idx) continue; // Skip comparing default to itself

        const function_info &variant_info = clone_groups[base_name][i];

        if (dump_file) {
          fprintf(dump_file, "Comparing %s%s with %s%s\n", 
                  base_name.c_str(), default_info.variant.c_str(),
                  base_name.c_str(), variant_info.variant.c_str());
        }

        bool are_same = compare_functions(default_info.stmts, variant_info.stmts);

        // If any variant differs from default, mark the group as different
        if (!are_same) {
          all_same = false;
          // Print the pruning decision for this specific variant
          if (dump_file) {
            fprintf(dump_file, "NOPRUNE: %s%s\n", base_name.c_str(), variant_info.variant.c_str());
          }
        } else {
          // Print the pruning decision for this specific variant
          if (dump_file) {
            fprintf(dump_file, "PRUNE: %s%s\n", base_name.c_str(), variant_info.variant.c_str());
          }
        }
      }

      // Print the overall pruning decision for the default function
      print_prune_decision(base_name, all_same);
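
The flow above can also be tried outside of GCC. Below is a rough, self-contained sketch, not the pass itself: FunctionInfo, GroupResult and compare_group are hypothetical names, statements are plain strings instead of GIMPLE statements, and a simple equality test stands in for compare_functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the pass's function_info.
struct FunctionInfo {
  std::string variant;             // e.g. ".default", ".popcnt"
  std::vector<std::string> stmts;  // stand-in for the collected statements
};

// Hypothetical result: one decision per variant, plus the group-wide flag
// that the pass hands to print_prune_decision.
struct GroupResult {
  std::map<std::string, std::string> per_variant;
  bool all_same;
};

GroupResult compare_group(const std::string &base_name,
                          const std::vector<FunctionInfo> &group) {
  assert(!group.empty());

  // Find the default variant; fall back to index 0 if none is marked.
  std::size_t default_idx = 0;
  for (std::size_t i = 0; i < group.size(); i++) {
    if (group[i].variant == ".default") {
      default_idx = i;
      break;
    }
  }

  GroupResult result;
  result.all_same = true;
  for (std::size_t i = 0; i < group.size(); i++) {
    if (i == default_idx) continue;  // skip comparing default to itself

    // Stand-in for compare_functions(default_info.stmts, variant_info.stmts).
    bool are_same = group[i].stmts == group[default_idx].stmts;
    if (!are_same) result.all_same = false;
    result.per_variant[base_name + group[i].variant] =
        are_same ? "PRUNE" : "NOPRUNE";
  }
  return result;
}
```

The point of returning both pieces is that a single differing variant flips all_same for the whole group, while every variant still keeps its own PRUNE or NOPRUNE label.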

I believe this should be good. Let's test it again.

👉 Test Changes

After rebuilding gcc with these changes and running make all again, let's check the dump files.

;; Function scale_samples (scale_samples.default, funcdef_no=23, decl_uid=3954, cgraph_uid=24, symbol_order=23)

Found default function with clones: scale_samples
Collected 165 statements from function scale_samples
Collected 165 statements from function scale_samples.default
__attribute__((target ("default"), target_clones ("default", "popcnt")))
void scale_samples (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}

;; Function scale_samples.popcnt (scale_samples.popcnt, funcdef_no=25, decl_uid=3985, cgraph_uid=30, symbol_order=28)

Found potential clone: scale_samples.popcnt (base: scale_samples, variant: .popcnt)
Collected 165 statements from function scale_samples.popcnt
Collected 165 statements from function scale_samples.popcnt
Analyzing clones of function: scale_samples
Comparing scale_samples.default with scale_samples.popcnt
Functions are substantially the same
PRUNE: scale_samples.popcnt
NOPRUNE: scale_samples
__attribute__((target ("popcnt"), target_clones ("default", "popcnt")))
void scale_samples.popcnt (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}

As you can see, it is working. We can see:

✅ Checking default: its statements are collected, and no comparison runs while it's the only one in its group.
✅ Checking .popcnt variant, determining the variant should be PRUNE.
✅ Decision given for the function itself. Since its only variant matches the default, I'd expect PRUNE here, but note that the dump above prints NOPRUNE: scale_samples.

[Screenshot: dump file for the PRUNE case]

Great! Now we know our logic is working. How about for the NOPRUNE case?

📌 Debugging: NOPRUNE Case

Let's check the dump file, clone-test-x86-noprune-clone-test-core.c.265t.kzaw:

;; Function scale_samples (scale_samples.default, funcdef_no=23, decl_uid=3954, cgraph_uid=24, symbol_order=23)

Found default function with clones: scale_samples
Collected 165 statements from function scale_samples
Collected 165 statements from function scale_samples.default
__attribute__((target ("default"), target_clones ("default", "arch=x86-64-v3")))
void scale_samples (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}

;; Function scale_samples.arch_x86_64_v3 (scale_samples.arch_x86_64_v3, funcdef_no=25, decl_uid=3985, cgraph_uid=30, sy>
Found potential clone: scale_samples.arch_x86_64_v3 (base: scale_samples, variant: .arch_x86_64_v3)
Collected 193 statements from function scale_samples.arch_x86_64_v3
Collected 193 statements from function scale_samples.arch_x86_64_v3
Analyzing clones of function: scale_samples
Comparing scale_samples.default with scale_samples.arch_x86_64_v3
Functions have different statement counts: 165 vs 193
NOPRUNE: scale_samples.arch_x86_64_v3
NOPRUNE: scale_samples
__attribute__((target ("arch=x86-64-v3"), target_clones ("default", "arch=x86-64-v3")))
void scale_samples.arch_x86_64_v3 (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}

It is also working. We can see:

✅ Checking default: its statements are collected, and no comparison runs while it's the only one in its group.
✅ Checking .arch_x86_64_v3 variant, determining the variant should be NOPRUNE.
✅ Decision given for the function itself: since the variant differs from the default, it should be NOPRUNE.
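
The two messages seen in the dumps ("Functions are substantially the same" and "Functions have different statement counts") suggest a comparison that checks the counts first and only then walks the statements. Here is a small sketch of that shape. It is only an illustration under assumptions: compare_statement_lists is a hypothetical name, statements are plain strings rather than GIMPLE, and the "Statement %zu differs" message is made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for compare_functions, using strings as statements.
// dump_file may be null, in which case nothing is printed.
bool compare_statement_lists(const std::vector<std::string> &a,
                             const std::vector<std::string> &b,
                             std::FILE *dump_file) {
  // Cheap early exit: different lengths can never match.
  if (a.size() != b.size()) {
    if (dump_file)
      std::fprintf(dump_file,
                   "Functions have different statement counts: %zu vs %zu\n",
                   a.size(), b.size());
    return false;
  }

  // Same length: compare statement by statement.
  for (std::size_t i = 0; i < a.size(); i++) {
    if (a[i] != b[i]) {
      if (dump_file)
        std::fprintf(dump_file, "Statement %zu differs\n", i);  // made-up message
      return false;
    }
  }

  if (dump_file)
    std::fprintf(dump_file, "Functions are substantially the same\n");
  return true;
}
```

In the NOPRUNE dump above, the 165 vs 193 mismatch would already be caught by the first check, so the statement-by-statement loop never runs for that pair.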

[Screenshot: dump file for the NOPRUNE case]

📌 How about `AArch64`?

OKAY! So we know that our pass is working fine in the x86 server since so far, everything we have done is within that server.

Let's check it in the aarch64 server now. Hopefully, we don't have to change anything.

After running make in our spo600/examples/test-clone directory, I got this:

[kzaw@aarch64-002 test-clone]$ make all
/home/kzaw/gcc-test-001/bin/gcc -c  vol_createsample.c -o vol_createsample.o
/home/kzaw/gcc-test-001/bin/gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","rng") ))'\
        -march=armv8-a -g -O3 -fno-lto  -ftree-vectorize  -fdump-tree-all -fdump-ipa-all -fdump-rtl-all \
        clone-test-core.c vol_createsample.o -o clone-test-aarch64-prune
clone-test-core.c:28:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]
   28 | void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
      |      ^~~~~~~~~~~~~
during GIMPLE pass: kzaw
dump file: clone-test-aarch64-prune-clone-test-core.c.265t.kzaw
clone-test-core.c: In function ‘scale_samples.rng’:
clone-test-core.c:28:6: internal compiler error: Segmentation fault
0x22cf17b internal_error(char const*, ...)
        /home/kzaw/git/gcc/gcc/diagnostic-global-context.cc:517
0xfe97fb crash_signal
        /home/kzaw/git/gcc/gcc/toplev.cc:322
0x22ef0ff pp_string(pretty_printer*, char const*)
        /home/kzaw/git/gcc/gcc/pretty-print.cc:2656
0x22f013f format_phase_2
        /home/kzaw/git/gcc/gcc/pretty-print.cc:2035
0x22f013f pretty_printer::format(text_info&)
        /home/kzaw/git/gcc/gcc/pretty-print.cc:1711
0x22f1cb3 pp_format(pretty_printer*, text_info*)
        /home/kzaw/git/gcc/gcc/pretty-print.h:594
0x22f1cb3 pp_printf(pretty_printer*, char const*, ...)
        /home/kzaw/git/gcc/gcc/pretty-print.cc:2578
0xc2f0c3 print_gimple_stmt(_IO_FILE*, gimple*, int, dump_flag)
        /home/kzaw/git/gcc/gcc/gimple-pretty-print.cc:159
0x1055597 compare_functions
        /home/kzaw/git/gcc/gcc/tree-kzaw.cc:163
0x1055597 execute
        /home/kzaw/git/gcc/gcc/tree-kzaw.cc:372
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
make: *** [Makefile:38: clone-test-aarch64-prune] Error 1

That's a whole lot of words, but what we can get from this is:

The crash definitely happened during our GIMPLE pass, kzaw, while it was processing scale_samples.rng.
The backtrace points to tree-kzaw.cc Line 163 (inside compare_functions, at a print_gimple_stmt call) and Line 372 (inside execute).

Conclusion

Overall, Project Stage 02 was a different level of difficulty compared to Stage 01. Where Stage 01 had more issues with compiling, in Stage 02 a lot of effort went into creating the logic for the pass.

As you can see from my blog journey, I went through a lot of debugging to get the pass to work as intended. Even then, it behaved differently on each server, so more care needs to be put into the pass logic.

All in all, this stage refreshed my memory of C/C++ and made me research a lot of GCC macros and functions that could help with what I am trying to achieve. I also received help by working together with other classmates. This project was a challenge all the way from implementation to testing!

Thank you. 😄
