Introduction
Welcome back (again) to Project Stage 02! 😄
In the previous part, we implemented the logic for our tree-kzaw.cc pass, which has been set up in gcc since Project Stage 01. In this continuation of Stage 02, I will be testing our logic and adding fixes as needed.
As always, I will be using the aarch64 and x86 servers to test my logic.
❇️ Please find the source code here: GitHub ❇️
📌 Recompiling & Rebuilding
Now that our pass logic has been implemented in the previous blog, we need to rebuild our gcc to pick it up. Fortunately, if we did all that stuff with Makefile.in in Project Stage 01, we won't have to wait for a full gcc rebuild.
Run these commands:
cd ~/gcc-build-001
time make -j 24 |& tee build-stage2-01.log
make install
If nothing goes wrong, we should be good for testing!
📌 Testing
On each of our servers, there is a file at /public/spo600-test-clone.tgz. Once we extract it and run make, two binaries will be built: one for the PRUNE test and one for the NOPRUNE test.
Run the following from your home directory to extract it and move into the directory:
tar xvf /public/spo600-test-clone.tgz
cd ~/spo600/examples/test-clone
I see a Makefile, a README.txt and some other .c and .h files. The README.txt shows that running make will output two binaries:
- clone-test-ARCH-prune
- clone-test-ARCH-noprune
where ARCH is either 'aarch64' or 'x86' depending on the server.
👉 Setting Up Makefile
Let's check the Makefile, as there are some changes we must make.
There is a lot happening in there, but what we need to focus on is the gcc command. Within the file, you may see something like:
clone-test-x86-prune: clone-test-core.c $(LIBRARIES)
	gcc -D 'CLONE_ATTRIBUTE=......
Here, gcc refers to the server's system gcc (whichever one is found on the PATH), not our locally built gcc, so we will change that.
Add this at the top, replacing the path with your own install directory (not the build directory):
# Use the locally built GCC
CC = $(HOME)/gcc-test-001/bin/gcc
Now replace every gcc with $(CC):
clone-test-x86-prune: clone-test-core.c $(LIBRARIES)
	$(CC) -D 'CLONE_ATTRIBUTE=......
This ensures that the gcc being used is our local one! While we're at it, uncomment DUMP_ALL = 1, since we want to check the dump files to see whether our code worked.
# Set DUMP_ALL to a non-empty value to enable all GCC dumps
DUMP_ALL = 1
I think we are all done with the Makefile.
👉 Checking Dump Files
Next, let's run make and check out our dump files:
make all
Once it finishes, run ls to see the files in the directory. You will see a BUNCH of dump files, like this:
clone-test-core.c
clone-test-x86-noprune
clone-test-x86-noprune-clone-test-core.c.000i.cgraph
clone-test-x86-noprune-clone-test-core.c.000i.ipa-clones
clone-test-x86-noprune-clone-test-core.c.006t.original
.
.
.
clone-test-x86-prune
clone-test-x86-prune-clone-test-core.c.000i.cgraph
clone-test-x86-prune-clone-test-core.c.000i.ipa-clones
clone-test-x86-prune-clone-test-core.c.000i.type-inheritance
On closer inspection, you may notice that the files fall into two groups: one for prune and one for noprune.
I wasn't sure which ones to check, but the files named after our pass seemed like a great place to start:
clone-test-x86-noprune-clone-test-core.c.265t.kzaw
clone-test-x86-prune-clone-test-core.c.265t.kzaw
That looks about right: I can see some of our printed messages. Here is the prune dump file:
;; Function scale_samples (scale_samples.default, funcdef_no=23, decl_uid=3954, cgraph_uid=24, symbol_order=23)
__attribute__((target ("default"), target_clones ("default", "popcnt")))
void scale_samples (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}
;; Function scale_samples.popcnt (scale_samples.popcnt, funcdef_no=25, decl_uid=3985, cgraph_uid=30, symbol_order=28)
>>>> Found potential clone: scale_samples.popcnt (base: scale_samples, variant: .popcnt)
>>>> Collected 165 statements from function scale_samples.popcnt
__attribute__((target ("popcnt"), target_clones ("default", "popcnt")))
void scale_samples.popcnt (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}
We know that the pass is being compiled and run, which is good. However, the logic might not be correct. Here is what we can see:
- ❌ It's not analyzing scale_samples. I assume it's because of the name format (function.variant); the function that checks names should be fixed.
- ❌ It's checking scale_samples.popcnt, but it stops after collecting statements. Why?
- ✅ It's skipping scale_samples.resolver, which it should.
- ✅ It's skipping sum_sample, because there's no variant of it.
Let's debug and fix it.
📌 Debugging: PRUNE Case
I want to make sure that the prune case works first, so let's fix those issues.
👉 Issue 1: Not Analyzing the Default Function
The pass isn't recognizing scale_samples (the default function) because it's only looking for functions with a period (.) in their name.
We need to modify the is_clone_function method to also recognize the base function.
- Instead of returning false when no dot is found, we do another check to see whether it's a default function with the target_clones attribute.
- If it is, mark it as a .default variant.
...
// Check if it's a default function with target_clones attribute
if (lookup_attribute("target_clones", DECL_ATTRIBUTES(decl))) {
    base_name = func_name;
    variant = ".default"; // Use .default to mark it as the default variant
    if (dump_file) {
        fprintf(dump_file, "Found default function with clones: %s\n", full_name);
    }
    return true;
}
return false;
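To try the naming rule outside of GCC, here is a minimal standalone sketch. It is not the pass's actual is_clone_function: the name split_clone_name is hypothetical, and the lookup_attribute() check on the decl is replaced by a plain has_target_clones flag. It also assumes, as the dump output suggests, that .resolver functions should be skipped.

```cpp
#include <cassert>
#include <string>

// Hypothetical, GCC-free stand-in for is_clone_function(). The real pass
// reads the name from a decl and checks the attribute with
// lookup_attribute("target_clones", DECL_ATTRIBUTES(decl)); here that
// check is reduced to a bool so the naming logic can be tried alone.
bool split_clone_name(const std::string &full_name, bool has_target_clones,
                      std::string &base_name, std::string &variant) {
    const std::string::size_type dot = full_name.find('.');
    if (dot != std::string::npos) {
        // Assumption: the resolver is a dispatcher, not a clone, so skip it.
        if (full_name.substr(dot) == ".resolver")
            return false;
        base_name = full_name.substr(0, dot);
        variant = full_name.substr(dot); // keeps the leading '.'
        return true;
    }
    // No dot: only count it if it is the default function carrying
    // the target_clones attribute, and label it ".default".
    if (has_target_clones) {
        base_name = full_name;
        variant = ".default";
        return true;
    }
    return false;
}
```

For example, scale_samples.popcnt splits into scale_samples and .popcnt, scale_samples with the attribute becomes scale_samples and .default, and sum_sample (no dot, no attribute) is rejected.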
👉 Issue 2: Fix Clone Comparison in execute()
Now that default functions pass the is_clone_function check and have their statements collected, we have to fix our comparison logic.
Our existing code just compares the first two variants it finds, which might not include the default function. It also makes a single pruning decision for the base name without distinguishing between variants.
Let's change the execute() logic, specifically the part where we select the functions to compare from clone_groups.
Here, we add code to find the default function in the group, then set it up to compare every variant against the default function (not just the first two).
// Find the default variant to use as reference
size_t default_idx = 0;
for (size_t i = 0; i < clone_groups[base_name].size(); i++) {
    if (clone_groups[base_name][i].variant == ".default") {
        default_idx = i;
        break;
    }
}
// Compare each non-default variant with the default
const function_info &default_info = clone_groups[base_name][default_idx];
bool all_same = true;
Below is the new comparison logic.
- There is now a loop that compares each variant against the default.
- Each variant gets its own pruning decision (whether or not it matches the default).
- There is an overall decision for the default function (whether or not all variants are the same).
for (size_t i = 0; i < clone_groups[base_name].size(); i++) {
    if (i == default_idx) continue; // Skip comparing default to itself
    const function_info &variant_info = clone_groups[base_name][i];
    if (dump_file) {
        fprintf(dump_file, "Comparing %s%s with %s%s\n",
                base_name.c_str(), default_info.variant.c_str(),
                base_name.c_str(), variant_info.variant.c_str());
    }
    bool are_same = compare_functions(default_info.stmts, variant_info.stmts);
    // If any variant differs from default, mark the group as different
    if (!are_same) {
        all_same = false;
        // Print the pruning decision for this specific variant
        if (dump_file) {
            fprintf(dump_file, "NOPRUNE: %s%s\n", base_name.c_str(), variant_info.variant.c_str());
        }
    } else {
        // Print the pruning decision for this specific variant
        if (dump_file) {
            fprintf(dump_file, "PRUNE: %s%s\n", base_name.c_str(), variant_info.variant.c_str());
        }
    }
}
// Print the overall pruning decision for the default function
print_prune_decision(base_name, all_same);
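The same selection and comparison flow can be sketched without any GCC types. This is a simplified model under stated assumptions: function_info here holds ints standing in for GIMPLE statement codes, analyze_group and group_result are hypothetical names, and compare_functions() is replaced by a plain sequence equality check. Instead of printing, it returns the per-variant decision lines together with the all_same flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical model of one clone group entry. In the pass,
// stmts holds collected GIMPLE statements; here each one is just an int.
struct function_info {
    std::string variant;    // ".default", ".popcnt", ...
    std::vector<int> stmts; // stand-in for the collected statements
};

struct group_result {
    std::vector<std::string> decisions; // one PRUNE/NOPRUNE line per variant
    bool all_same = true;               // every variant matched the default
};

// Compare every non-default variant in a group against the default one.
group_result analyze_group(const std::string &base_name,
                           const std::vector<function_info> &group) {
    group_result result;
    if (group.empty()) return result; // nothing to compare
    // Find the ".default" entry; fall back to index 0, like the pass does.
    std::size_t default_idx = 0;
    for (std::size_t i = 0; i < group.size(); i++) {
        if (group[i].variant == ".default") {
            default_idx = i;
            break;
        }
    }
    const function_info &default_info = group[default_idx];
    for (std::size_t i = 0; i < group.size(); i++) {
        if (i == default_idx) continue; // skip comparing default to itself
        const function_info &variant_info = group[i];
        // Stand-in for compare_functions(): identical statement sequences.
        const bool are_same = (default_info.stmts == variant_info.stmts);
        if (!are_same) result.all_same = false;
        result.decisions.push_back(std::string(are_same ? "PRUNE: " : "NOPRUNE: ") +
                                   base_name + variant_info.variant);
    }
    return result;
}
```

Returning the decisions rather than writing them to a dump file keeps the selection logic easy to check in isolation; the default entry does not have to come first in the group for it to be found.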
I believe this should be good. Let's test it again.
👉 Test Changes
After running make all again with these changes, let's check the dump files.
;; Function scale_samples (scale_samples.default, funcdef_no=23, decl_uid=3954, cgraph_uid=24, symbol_order=23)
Found default function with clones: scale_samples
Collected 165 statements from function scale_samples
Collected 165 statements from function scale_samples.default
__attribute__((target ("default"), target_clones ("default", "popcnt")))
void scale_samples (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}
;; Function scale_samples.popcnt (scale_samples.popcnt, funcdef_no=25, decl_uid=3985, cgraph_uid=30, symbol_order=28)
Found potential clone: scale_samples.popcnt (base: scale_samples, variant: .popcnt)
Collected 165 statements from function scale_samples.popcnt
Collected 165 statements from function scale_samples.popcnt
Analyzing clones of function: scale_samples
Comparing scale_samples.default with scale_samples.popcnt
Functions are substantially the same
PRUNE: scale_samples.popcnt
NOPRUNE: scale_samples
__attribute__((target ("popcnt"), target_clones ("default", "popcnt")))
void scale_samples.popcnt (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}
As you can see, it is working. We can see:
- ✅ Checking the default, skipping the comparison logic while it's the only one.
- ✅ Checking the .popcnt variant and determining that it should be PRUNE.
- ✅ Giving a decision for the function itself; since it has a variant, it should be PRUNE.
Here is the screenshot:
Great! Now we know our logic is working. How about for the NOPRUNE case?
📌 Debugging: NOPRUNE Case
Let's check the dump file, clone-test-x86-noprune-clone-test-core.c.265t.kzaw:
;; Function scale_samples (scale_samples.default, funcdef_no=23, decl_uid=3954, cgraph_uid=24, symbol_order=23)
Found default function with clones: scale_samples
Collected 165 statements from function scale_samples
Collected 165 statements from function scale_samples.default
__attribute__((target ("default"), target_clones ("default", "arch=x86-64-v3")))
void scale_samples (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}
;; Function scale_samples.arch_x86_64_v3 (scale_samples.arch_x86_64_v3, funcdef_no=25, decl_uid=3985, cgraph_uid=30, sy>
Found potential clone: scale_samples.arch_x86_64_v3 (base: scale_samples, variant: .arch_x86_64_v3)
Collected 193 statements from function scale_samples.arch_x86_64_v3
Collected 193 statements from function scale_samples.arch_x86_64_v3
Analyzing clones of function: scale_samples
Comparing scale_samples.default with scale_samples.arch_x86_64_v3
Functions have different statement counts: 165 vs 193
NOPRUNE: scale_samples.arch_x86_64_v3
NOPRUNE: scale_samples
__attribute__((target ("arch=x86-64-v3"), target_clones ("default", "arch=x86-64-v3")))
void scale_samples.arch_x86_64_v3 (int16_t * in, int16_t * out, int cnt, int volume)
{
.
.
.
}
It is also working. We can see:
- ✅ Checking the default, skipping the comparison logic while it's the only one.
- ✅ Checking the .arch_x86_64_v3 variant and determining that it should be NOPRUNE.
- ✅ Giving a decision for the function itself; since its variant differs from the default, it should be NOPRUNE.
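Both dumps print one of two comparison messages: a statement-count mismatch or "Functions are substantially the same". A two-stage check that produces those messages might look like the sketch below. It is an assumption about how compare_functions() is structured, not its actual code: statements are modelled as ints (standing in for values such as gimple_code()), compare_stmt_codes is a hypothetical name, messages go to a std::ostream instead of dump_file, and the "Statements differ at index" message is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for compare_functions(): statements are ints and
// the log is any std::ostream (e.g. std::ostringstream) instead of dump_file.
bool compare_stmt_codes(const std::vector<int> &a, const std::vector<int> &b,
                        std::ostream &log) {
    // Cheapest check first: sequences of different length can never match.
    if (a.size() != b.size()) {
        log << "Functions have different statement counts: " << a.size()
            << " vs " << b.size() << "\n";
        return false;
    }
    // Same length: walk both sequences and stop at the first mismatch.
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] != b[i]) {
            log << "Statements differ at index " << i << "\n";
            return false;
        }
    }
    log << "Functions are substantially the same\n";
    return true;
}
```

Checking the counts first is cheap and already rejects the NOPRUNE case (165 vs 193) before any per-statement work; passing a std::ostringstream as the log makes the messages easy to inspect.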
Screenshot:
📌 How about AArch64?
OKAY! So we know that our pass works fine on the x86 server, since everything we have done so far has been on that server.
Let's check it on the aarch64 server now. Hopefully, we don't have to change anything.
After running make in our spo600/examples/test-clone directory, I got this:
[kzaw@aarch64-002 test-clone]$ make all
/home/kzaw/gcc-test-001/bin/gcc -c vol_createsample.c -o vol_createsample.o
/home/kzaw/gcc-test-001/bin/gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","rng") ))'\
-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-all -fdump-ipa-all -fdump-rtl-all \
clone-test-core.c vol_createsample.o -o clone-test-aarch64-prune
clone-test-core.c:28:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]
28 | void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
| ^~~~~~~~~~~~~
during GIMPLE pass: kzaw
dump file: clone-test-aarch64-prune-clone-test-core.c.265t.kzaw
clone-test-core.c: In function ‘scale_samples.rng’:
clone-test-core.c:28:6: internal compiler error: Segmentation fault
0x22cf17b internal_error(char const*, ...)
/home/kzaw/git/gcc/gcc/diagnostic-global-context.cc:517
0xfe97fb crash_signal
/home/kzaw/git/gcc/gcc/toplev.cc:322
0x22ef0ff pp_string(pretty_printer*, char const*)
/home/kzaw/git/gcc/gcc/pretty-print.cc:2656
0x22f013f format_phase_2
/home/kzaw/git/gcc/gcc/pretty-print.cc:2035
0x22f013f pretty_printer::format(text_info&)
/home/kzaw/git/gcc/gcc/pretty-print.cc:1711
0x22f1cb3 pp_format(pretty_printer*, text_info*)
/home/kzaw/git/gcc/gcc/pretty-print.h:594
0x22f1cb3 pp_printf(pretty_printer*, char const*, ...)
/home/kzaw/git/gcc/gcc/pretty-print.cc:2578
0xc2f0c3 print_gimple_stmt(_IO_FILE*, gimple*, int, dump_flag)
/home/kzaw/git/gcc/gcc/gimple-pretty-print.cc:159
0x1055597 compare_functions
/home/kzaw/git/gcc/gcc/tree-kzaw.cc:163
0x1055597 execute
/home/kzaw/git/gcc/gcc/tree-kzaw.cc:372
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See for instructions.
make: *** [Makefile:38: clone-test-aarch64-prune] Error 1
That's a whole lot of words, but here is what we can get from it:
- The crash happened during our GIMPLE pass: kzaw.
- It happened in a print_gimple_stmt call at line 163 of tree-kzaw.cc (inside compare_functions), reached from line 372 (inside execute).
Conclusion
Overall, Project Stage 02 was a different level of difficulty compared to Stage 01. Where Stage 01 had more issues with compiling, in Stage 02 a lot of effort went into creating the logic for the pass.
As you can see from my blog journey, it took a lot of debugging to get the pass working as intended. On top of that, the behaviour differed between servers, so more care had to be put into the pass logic.
All in all, this stage refreshed my memory of C/C++ and made me research a lot of gcc macros and functions that could help with what I was trying to achieve. I also received help by working together with other classmates. This project was a challenge all the way from implementation to testing!
Thank you. 😄