GO Renaming

Introduction

Here is a simple example:

a.c:

extern int printf (char const *,...);
void f(void) {
    printf ("Lorem\n");
}

b.c:

extern int printf (char const *,...);
void g(void) {
    printf ("Ipsum\n");
}

c.c:

extern int printf (char const *,...);
void h(void) {
    printf ("Lorem\n");
}
void j(void) {
    printf ("Ipsum\n");
}

Given these three simple source files, it’s fairly obvious that we should be generating exactly two distinct function fragments for these compilations: f() and h() are identical, as are g() and j(). However, what actually happens is a little different:

$ pstore-dump --all-compilations clang.db
---
- compilations:
    - digest: 1ee605135f246e24191f534f0335b4c4
      compilation: 
          - { digest: 3884195e5cf9700bd5c00bd24e08ca92, name: f }
          - { digest: e88ce1bd9c14b6c379492240050ae7a8, name: str }
    - digest: f1cf114d4ddba292adb08ae52b104b55
      compilation: 
          - { digest: cf90f785b56d6b8449146d22f201afc4, name: g }
          - { digest: 3e3784847c18273a1a9c1e1bffa57e68, name: str }
    - digest: 1ac0b95e1a341cfd42cb22cb1a80e38e
      compilation: 
          - { digest: 3884195e5cf9700bd5c00bd24e08ca92, name: h }
          - { digest: bf848182a05c1c93b97d6feafd49b136, name: j }
          - { digest: e88ce1bd9c14b6c379492240050ae7a8, name: str }
          - { digest: 3e3784847c18273a1a9c1e1bffa57e68, name: str.2 }

(I’ve tried to simplify pstore-dump’s output for clarity.)

What’s Happening Here?

The reason that the fragments don’t match up is known: in each case an optimization deletes the string passed to printf(), replaces it with a new version without the trailing '\n', and calls puts() instead. The new string object must have internal linkage and be named in such a way that it doesn’t clash with any existing names. This gives us:

Compilation	Fragment	Linked definitions
a.c	`f` → hash(f)	{ str }
b.c	`g` → hash(g)	{ str }
c.c	`h` → hash(f) `j` → hash(j)	{ str } { str.2 }
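To make the transformation concrete, here is roughly what c.c looks like at source level once the optimization has run. This is only a sketch: the real rewrite happens on the compiler’s intermediate representation rather than on C source, and str_2 stands in for "str.2" because a C identifier can’t contain a '.'.

```c
extern int puts (char const *);

/* Internal-linkage strings created by the optimizer. The trailing '\n' has
   been dropped because puts() appends one itself. */
static char const str[] = "Lorem";
static char const str_2[] = "Ipsum"; /* named "str.2" in the dump above */

void h(void) {
    puts (str);
}
void j(void) {
    puts (str_2);
}
```

The body of h() is the same as the optimized f() from a.c, string name included. j() differs from the optimized g() from b.c only in the name of the string it refers to.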

Our back-end code understands and can handle the relationship between the fragment and its linked definition(s). That’s great, but this behaviour still creates a problem for us. The names of new GOs (global objects) created by the optimizer ("str" and "str.2" above) are specific to their compilation: they’re chosen to avoid clashes there. This obviously doesn’t guarantee that the same names will be clash-free in other compilations, and we’d like to be able to “materialize” the definitions of these functions in other contexts. It also means that the exact collection of fragments that we produce will change if the order of the functions changes: not ideal.
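The order dependence can be seen with a toy model of per-compilation name uniquing. This is a simplified stand-in, not the optimizer’s actual algorithm: name_table and unique_name() are hypothetical, and the suffixes it picks ("str.1", "str.2", ...) won’t necessarily match the real ones (the dump above shows "str.2").

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN 64

/* The names already taken within a single compilation (hypothetical). */
struct name_table {
    char names[MAX_NAMES][NAME_LEN];
    unsigned count;
};

static int is_taken(struct name_table const *t, char const *name) {
    for (unsigned i = 0; i < t->count; ++i) {
        if (strcmp(t->names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Returns a name that is unique within this table only: base, then base.1,
   base.2 and so on. Returns NULL if the table is full or base is too long. */
char const *unique_name(struct name_table *t, char const *base) {
    char candidate[NAME_LEN];
    /* Leave room for '.', up to 10 digits and the terminating NUL. */
    if (t->count >= MAX_NAMES || strlen(base) + 12 > NAME_LEN) {
        return NULL;
    }
    snprintf(candidate, sizeof candidate, "%s", base);
    for (unsigned suffix = 1; is_taken(t, candidate); ++suffix) {
        snprintf(candidate, sizeof candidate, "%s.%u", base, suffix);
    }
    strcpy(t->names[t->count], candidate);
    return t->names[t->count++];
}
```

With one table standing in for b.c and another for c.c, the string created for "Ipsum" is the first request in the former and the second in the latter, so it ends up with a different name in each. The name depends on what else happened to be in the compilation, not on the object itself.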

A Possible Solution

In the repo code, we have a significant advantage over the conventional code when it comes to generating unique names for objects: the hash-generation pass has computed a guaranteed unique name for each GO seen by the front-end. When the optimizer creates GOs, it does so in response to the contents of other GOs. In our example above, the string object with hash e88ce1bd9c14b6c379492240050ae7a8 is created in response to f(), the string with hash 3e3784847c18273a1a9c1e1bffa57e68 is caused by g().

This should allow us to create names that are guaranteed unique by stitching the optimizer’s base-name (“str” in this case) together with the hash of the responsible GO and, where more than one object is created for an individual fragment, a further distinguishing value. Reworking the table above:

Compilation	Fragment	Linked definitions
a.c	`f` → hash(f)	{ str.hash(f) }
b.c	`g` → hash(g)	{ str.hash(g) }
c.c	`h` → hash(f) `j` → hash(g)	{ str.hash(f) } { str.hash(g) }

Here’s the same output in (pseudo) pstore-dump format:

$ pstore-dump --all-compilations clang.db
---
- compilations:
    - digest: 1ee605135f246e24191f534f0335b4c4
      compilation: 
          - { digest: 3884195e5cf9700bd5c00bd24e08ca92, name: f }
          - { digest: e88ce1bd9c14b6c379492240050ae7a8, name: str.3884195e5cf9700bd5c00bd24e08ca92 }
    - digest: f1cf114d4ddba292adb08ae52b104b55
      compilation: 
          - { digest: cf90f785b56d6b8449146d22f201afc4, name: g }
          - { digest: 3e3784847c18273a1a9c1e1bffa57e68, name: str.cf90f785b56d6b8449146d22f201afc4 }
    - digest: 1ac0b95e1a341cfd42cb22cb1a80e38e
      compilation: 
          - { digest: 3884195e5cf9700bd5c00bd24e08ca92, name: h }
          - { digest: cf90f785b56d6b8449146d22f201afc4, name: j }
          - { digest: e88ce1bd9c14b6c379492240050ae7a8, name: str.3884195e5cf9700bd5c00bd24e08ca92 }
          - { digest: 3e3784847c18273a1a9c1e1bffa57e68, name: str.cf90f785b56d6b8449146d22f201afc4 }

This gives us the output that we’d like. There are just four fragments: the two functions and the two strings.
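The naming scheme itself is little more than string concatenation. Here is a minimal sketch, assuming the digest is already available as a hex string. make_go_name() and its parameters are hypothetical, and the ".<index>" suffix is just one possible choice for the “further value”; none of this is taken from the repo code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper. Writes "<base>.<digest>" for the first object created
   on behalf of a fragment (index 0) and "<base>.<digest>.<index>" for any
   later ones. Returns 0 on success or -1 if the buffer is too small. */
int make_go_name(char *out, size_t out_size, char const *base,
                 char const *fragment_digest, unsigned index) {
    int n;
    if (index == 0) {
        n = snprintf(out, out_size, "%s.%s", base, fragment_digest);
    } else {
        n = snprintf(out, out_size, "%s.%s.%u", base, fragment_digest, index);
    }
    /* snprintf reports the length it wanted to write: check for truncation. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling it with "str" and the digest of f() yields str.3884195e5cf9700bd5c00bd24e08ca92 whether the call is made while compiling a.c or c.c, because nothing in the result depends on the other names in the compilation.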

Notes

I have discussed only the optimization that creates additional strings when it switches a printf() call to puts(), but I think (at the moment) that the general approach is sound in all cases where an optimizer creates new internal-linkage objects.
The output above is an improvement over our first example, but it’s still not optimal for the repo. Each instance of a str.hash(x) GO requires an entry in the compilation with internal linkage and each reference requires an external fixup. Better output would make the str GO a separate section within the responsible fragment which would eliminate the compilation entry and enable use of an internal fixup (which avoids a name lookup in the linker).
