Friday, February 4, 2022

Demystifying the Profraw Format

Three months ago, I was looking into building coverage for eBPF programs.

That’s how bpfcov was born.

I did not want to reinvent the wheel, so I was looking into the existing coverage instrumentation features of LLVM.

Among the different options LLVM provides, source-based code coverage suddenly appeared very appealing to me.

It’s the most precise type of code coverage. In my opinion, the only one that gives its user a damn clue on what code region of the program is executing. It also helps understand why a particular code branch gets executed.

Using it is pretty straightforward. All resolve to the following steps:

  1. compile your C/C++ code with the -fprofile-instr-generate -fcoverage-mapping Clang options
  2. run the program
  3. the execution of the program automatically creates a default.profraw fileYou can set it to something else via the LLVM_PROFILE_FILE environment variable.
  4. feed the default.profraw file to LLVM tools like llvm-profdata and llvm-cov

You now have a beautifully fine-grained coverage report for your C/C++ code.

But what caught my eye was feeling that this kind of coverage was not using debug info (like gcov does) at all, but it was using profiling data, coming directly from the darkest parts of LLVM.

My curiosity took me to some extraordinary places in the LLVM codebase…

I wanted to understand what the -fprofile-instr-generate was causing.

Also, I wanted to know how I got a binary file (default.profraw) without my code writing anything to the disk.

And what it contained.

Spoiler. It contains the globals that the -fprofile-instr-generate instruments into your code.

But we probably need to move one step at a time to learn something new.

-fprofile-instr-generate§

So, this is the dummy C code - hello.c - that I used to walk myself through this study.

#include <stdio.h>
#include <stdint.h>

void ciao()
{
    printf("ciao\n");
}

void foo()
{
    printf("foo\n");
}

int main(int argc, char **argv)
{
    if (argc > 1)
    {
        foo();
        for (int i = 0; i < 22; i++) {
            ciao();
        }
    }
    printf("main\n");
}

I compiled it to textual LLVM IR:

clang -g -fprofile-instr-generate -emit-llvm -S $< -o hello.ll

In a matter of milliseconds, I was able to look at its intermediate representation:

@__profc_ciao = private global [1 x i64] zeroinitializer, section "__llvm_prf_cnts", align 8
@__profd_ciao = private global { i64, i64, i64*, i8*, i8*, i32, [2 x i16] } { i64 -1479480177954886802, i64 0, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__profc_ciao, i32 0, i32 0), i8* bitcast (void ()* @ciao to i8*), i8* null, i32 1, [2 x i16] zeroinitializer }, section "__llvm_prf_data", align 8
@__profc_foo = private global [1 x i64] zeroinitializer, section "__llvm_prf_cnts", align 8
@__profd_foo = private global { i64, i64, i64*, i8*, i8*, i32, [2 x i16] } { i64 6699318081062747564, i64 0, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__profc_foo, i32 0, i32 0), i8* bitcast (void ()* @foo to i8*), i8* null, i32 1, [2 x i16] zeroinitializer }, section "__llvm_prf_data", align 8
@__profc_main = private global [3 x i64] zeroinitializer, section "__llvm_prf_cnts", align 8
@__profd_main = private global { i64, i64, i64*, i8*, i8*, i32, [2 x i16] } { i64 -2624081020897602054, i64 717562688593, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i32 0, i32 0), i8* bitcast (i32 (i32, i8**)* @main to i8*), i8* null, i32 3, [2 x i16] zeroinitializer }, section "__llvm_prf_data", align 8
@__llvm_prf_nm = private constant [23 x i8] c"\0D\15x\DAK\CEL\CCgL\CB\CFg\CCM\CC\CC\03\00\1Fy\04\88", section "__llvm_prf_names", align 1

It stands out from what we see that it contains:

Also, all these variables have ELF sections.

The __profc_<function_name> variables end up into the __llvm_prf_cnts section. The __profd_<function_name> variables end up into the __llvm_prf_data section. The __llvm_prf_nm variable ends up into the __llvm_prf_names section.

But what __profc_<function_name> variables are?

The section names helped me here: they are the counters used by LLVM for instrumenting our code for profiling!

Scrolling down into the LLVM IR, we see they’re getting incremented (load, add, store instructions sequence) at the correct spots:

define dso_local i32 @main(i32 %0, i8** %1) #0 !dbg !19 {
  ...
  %7 = load i64, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i64 0, i64 0), align 8, !dbg !30
  %8 = add i64 %7, 1, !dbg !30
  store i64 %8, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i64 0, i64 0), align 8, !dbg !30
  ...
  br i1 %10, label %11, label %24, !dbg !34

11:                                               ; preds = %2
  %12 = load i64, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i64 0, i64 1), align 8, !dbg !34
  %13 = add i64 %12, 1, !dbg !34
  store i64 %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i64 0, i64 1), align 8, !dbg !34
  call void @foo(), !dbg !35
  ...
  br label %14, !dbg !40

14:                                               ; preds = %20, %11
  %15 = load i32, i32* %6, align 4, !dbg !41
  %16 = icmp slt i32 %15, 22, !dbg !43
  br i1 %16, label %17, label %23, !dbg !44

17:                                               ; preds = %14
  %18 = load i64, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i64 0, i64 2), align 8, !dbg !44
  %19 = add i64 %18, 1, !dbg !44
  store i64 %19, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @__profc_main, i64 0, i64 2), align 8, !dbg !44
  call void @ciao(), !dbg !45
  br label %20, !dbg !47

20:                                               ; preds = %17
  %21 = load i32, i32* %6, align 4, !dbg !48
  %22 = add nsw i32 %21, 1, !dbg !48
  store i32 %22, i32* %6, align 4, !dbg !48
  br label %14, !dbg !49, !llvm.loop !50

23:                                               ; preds = %14
  br label %24, !dbg !53

24:                                               ; preds = %23, %2
  %25 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str.2, i64 0, i64 0)), !dbg !54
  %26 = load i32, i32* %3, align 4, !dbg !55
  ret i32 %26, !dbg !55
}

We now know why the __profc_main array has size 3 (see [3 x i64]): LLVM instrumented one counter for the main() function’s entry, one for the if block, while the last one regards the for block.

We also know that we have a total of 5 counters, given the __profc_ciao and __profc_foo have both size 1.

Verifying this is just a matter of obtaining our program binary…

clang -g -fprofile-instr-generate hello.c -o hello

And playing with llvm-objdump:

llvm-objdump --section-headers hello | grep __llvm
# 17 __llvm_prf_names   00000017 0000000000008b89 DATA
# 26 __llvm_prf_cnts    00000028 000000000000c110 DATA
# 27 __llvm_prf_data    00000090 000000000000c138 DATA
llvm-objdump --syms hello | grep __llvm_prf_cnts
# 000000000000c110 g       __llvm_prf_cnts        0000000000000000 .hidden __start___llvm_prf_cnts
# 000000000000c138 g       __llvm_prf_cnts        0000000000000000 .hidden __stop___llvm_prf_cnts

The output here helps us confirm that we have 5 total counters of 8 bytes each, starting at address c110 and ending at c138.

In fact, $c138 - c110 = 28$, which means 40 bytes in decimal base (ie., $8 * 5$).

What about the __profd_<function_name> variables instead?

llvm-objdump --syms hello | grep __llvm_prf_data
# 000000000000c1c8 g       __llvm_prf_data        0000000000000000 .hidden __stop___llvm_prf_data
# 000000000000c138 g       __llvm_prf_data        0000000000000000 .hidden __start___llvm_prf_data

At the moment, all you need to know is that those are structs containing data about the function they refer to and the instrumented counters for it.

I hope the following image clarifies the relationship. Anyways, we’ll dig deeper into the __profd_<function_name> variables later.

LLVM IR with profiling data

Generating the profraw§

At this point, I still didn’t know how the default.profraw file got created every time I executed my binary.

I came back to the LLVM source code. It’s incredible the superpowers that reading the source gives you…

I found InstrProfilingFile.cSource..

Long story short: the code in this LLVM source file implements the profiling runtime initialization, which means it is responsible for gathering all the data from the variables (counters, data, etc.) we’ve seen and flushing them out in the default.profraw file.

While looking at it, I suddenly noticed the __llvm_profile_initializeSource. function.

It gets invoked by the profiling runtime initialization hook to get the filename of the profraw to create. Either from an environment variable (LLVM_PROFILE_FILE) or using the default one (default.profraw). Finally, it calls the C library function atexit(__llvm_profile_write_file). Which will cause the __llvm_profile_write_file function to be invoked when the program terminates.

Thus, I also noticed the __llvm_profile_write_fileSource. function.

The main thing this function does is to invoke writeFile(Filename)Source.. Which, in turn, calls lprofWriteData Source..

It’s again time to verify with llvm-objdump.

llvm-objdump --syms hello | grep __llvm_profile
# 0000000000004180 l     F .text  00000000000001a1 __llvm_profile_write_file
# 0000000000004610 g     F .text  0000000000000037 .hidden __llvm_profile_register_write_file_atexit
# 00000000000121e4 g     O .bss   0000000000000004 .hidden __llvm_profile_runtime
# 000000000000c108  w    O .data  0000000000000008 __llvm_profile_raw_version
# 00000000000122e8  w    O .bss   0000000000000001 __llvm_profile_filename
# 00000000000040a0 g     F .text  00000000000000ae .hidden __llvm_profile_initialize
# ...

Our ELF got flooded by __llvm_profile_* functions.

Seeing all those symbols in the ELF made me 100% sure this is how LLVM makes our binary open the default.profraw file as soon it executes. In this way, it also makes our program automatically dump the profile data when it terminates.

Coming back to the lprofWriteData function implementation (in the InstrProfilingWriter.c source file), we instead discover what it dumps into default.profraw. It’s while doing so that we understand what a profraw file contains.

Such a functionSource.:

  1. calculates the size of the data, counters, and names
  2. calculates the paddings needed before and after the counters section (__llvm_prf_cnts) and after the names section (__llvm_prf_names)
  3. initializes the header structure (more on it below)
  4. writes the header
  5. writes the binary IDs
    1. if NT_GNU_BUILD_ID is defined, otherwise skips them
  6. writes the data
    1. the __profd_<function_name> globals
  7. writes the padding bytes before the counters for page alignment
  8. writes the counters
    1. the content of the __profc_<function_name> globals at program termination
  9. writes the padding bytes after the counters for page alignment
  10. writes the names
    1. the __llvm_prf_nm variable
  11. writes the padding bytes after the names part
    1. the profraw format want its size to be a multiple of 8
  12. writes the value profiling data
    1. only in case it was enabled

All these moving parts are written using the ProfDataIOVec data structureSource..

So, we can finally assert what a profraw binary file is composed by:

The INSTR_PROF_RAW_HEADER macro at InstrProfData.inc defines the parts composing the header of the profraw file. Which are:

  1. the magic
  2. the version
  3. the size of binary IDs
  4. the size of the data section:
    1. it is equal to the number of __profd_<function_name> global variables
    2. the data part starts where __start___llvm_prf_data is
    3. the data part ends where __stop___llvm_prf_data is
    4. assuming 3 __profd_<function_name> global struct exists{i64, i64, i64*, i8*, i8*, i32, [2 x i16]} (48 bytes), the data section size is the difference between the __stop_ (c1c8) and __start_ addresses (c138), thus 90 in hexadecimal0x90 is 144 decimal, so $144 / 48 = 3$.
  5. the padding bytes before the counters sectionIt can also be 0 bytes.
  6. the size of the counters section
    1. it is equal to the number of __profc_<function_name> global variables
    2. the counters part starts where the (__start_) __llvm_prf_cnts is
    3. the counters part starts where the (__stop_) __llvm_prf_cnts is
    4. assuming 3 __profc_* global variables existsFor a total of 5 counters, 8 bytes each: [1 x i64], [1 x i64], [3 x i64], the counters section size is the difference between the __stop_ (c138) and the __start_ addresses (c110), thus 28 in hexadecimal0x28 is 40 decimal, so $40 / 8 = 5$ which is the total number of counters.
  7. the padding bytes after the counters sectionIt can also be 0 bytes.
  8. the size of the names section
    1. it is equal to the size of __llvm_prf_nm global variable
    2. the names part starts where the (__start_) __llvm_prf_names is
    3. the names part starts where the (__stop_) __llvm_prf_names is
  9. the counters delta
  10. it is the difference between the address at which the counter section begins and the address at which the data section begins
  11. it is equal to the address of the first __profc_<function_name> counter
  12. the names delta
    1. the address at which the names section begins
  13. IPVK_LastIt seems to be always 1.

With the following table I tried to summarize the header’s content:

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
magic version
data size padding before counters
counters size padding after counters
names size counters delta
names begin value kind last

But probably an image is worth a thousand words.

This is the default.profraw file after executing ./hello yay:

Header

We can see that its header is composed of:

  • the magic (in black)
  • the version (in blue)
  • the number of functions (in red)
    • it equals to the number of __profd_<function_name> variables too: 3
  • the padding bytes before the counters (in light green)
  • the total number of the counters (in fluo green)
    • meaning, the total size of __profc_<function_name> variables: 5
  • the padding bytes after the counters (in dark green)
  • the length of the __llvm_prf_nm variable (in magenta): 0F in hex, 15 decimal
  • the address of the first counter (in brown)
  • the address of __llvm_prf_nm (in pink)
  • IPVK_Last (1) in yellow

Exactly the way we were now expecting it.

Data§

Focusing on the data part, I’ve highlighted with (orange) rectangles the 3 __profd_<function_name> global variables.

Data

Each of the __profd_<function_name> contains 6 parts:

  • 1 is the ID (hex, reverse order) of the function to which it refers
  • 2 is the function control flow hash
  • 3 points to the counter it relates to
  • 4 points to the function it refers to
  • 5 points to the value expressions, if any
  • 6 contains the number of the counters (2 bytes) and the initialization array (number of value sites, 2 elements, 2 bytes each, often zero)

Counters§

After the data parte comes the counters part.

Counters

It just is a serialization of what the variables in the __llvm_prf_cnts ELF section contains.

In fact, in this screenshot you can see underlined (with waves, in shades of cyan/blue):

  • __profc_ciao: [1 x i64]
    • 1 counter, with value 16
    • it means the ciao() function was executed 22 (decimal) times
  • __profc_foo: [1 x i64]
    • 1 counter, with value 1
    • it means the foo() function only was executed 1 time
  • __profc_main: [3 x i64]
    • 3 counters, first two with value 1, last one with value 16
    • it means the function main() was executed 1 time, its if conditional evaluated to true, and the for inside it got executed 22 (decimal) times

Names§

Finally, the __llvm_prf_nm variable populates the last part of the profraw file.

Names

Here (in grey) you can see its contentFunction names are uncompressed because I compiled the binary with -mllvm -enable-name-compression=false flags., padded with a \0 byte (the one circled) at the end to align the profraw file size to be multiple of 8.

Usage§

It’s time to use the default.profraw file now:

llvm-profdata show --all-functions --counts default.profraw
# Counters:
#   ciao:
#     Hash: 0x0000000000000000
#     Counters: 1
#     Function count: 22
#     Block counts: []
#   foo:
#     Hash: 0x0000000000000000
#     Counters: 1
#     Function count: 1
#     Block counts: []
#   main:
#     Hash: 0x000000a71211b451
#     Counters: 3
#     Function count: 1
#     Block counts: [1, 22]
# Instrumentation level: Front-end
# Functions shown: 3
# Total functions: 3
# Maximum function count: 22
# Maximum internal block count: 22

Isn’t it cool?

There’s more you can do with profraw files and the LLVM toolchain, indeed. But showing that was not the goal of this post.

I hope you now have a better understanding of what profraw files are.



from Hacker News https://ift.tt/heCIRcu

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.