Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read ELF build ids directly from the target process instead of mmap()ing libraries #71

Closed
gabrielesvelto opened this issue Feb 28, 2023 · 6 comments · Fixed by #112
Closed

Comments

@gabrielesvelto
Copy link
Contributor

When populating the module list on Linux we extract the GNU build ID from each executable. To do so we first mmap() the entire file in the process writing the minidump and then extract the data from there. This has a couple of important drawbacks:

  • If the file on disk has been deleted we'll get an empty ID, something that we see often in Firefox crashes.
  • If the file on disk has been altered we'll get the wrong ID.
  • If the file is extremely large mmap() may fail on 32-bit hosts, we've also seen this happen in 32-bit builds of Firefox.

We could avoid all these issues by reading the ELF headers directly from the process we're dumping. goblin supports parsing an ELF file lazily, though one has to do it manually (see this example).

@jld
Copy link

jld commented Feb 28, 2023

goblin's API is a little suboptimal here; Elf::iter_note_headers seems to expect a &[u8] that starts at the beginning of the file and covers all of the PT_NOTE segments. But what we really want to do is:

  1. Read the ELF header
  2. Read the program headers
  3. Read the note segments

Although, if we expect that those will all be close to the start of the file — and in practice they seem to be within the first ~1k — we could just copy in some convenient small amount like 4k (in one syscall, if we have #72), and fall back to larger sizes if we get goblin::error::Error::Scroll(scroll::error::Error::BadOffset(…)).

We'd still want to use that lazy_parse feature for that, I think, because there are going to be headers that point to things outside the area we're reading and that we don't care about.

@gabrielesvelto
Copy link
Contributor Author

We'd need to fetch some strings from the STRTAB to find the appropriate note, but we could do that lazily and it would require only a handful of extra system calls. The whole STRTAB for libxul.so is ~5MiB in my local build so we could also load it in its entirety if lazy-parsing doesn't work.

@gabrielesvelto
Copy link
Contributor Author

Note: I've experimented with lazy parsing via goblin and it's working just fine.

@lissyx
Copy link
Contributor

lissyx commented Mar 7, 2024

there's m4b/goblin#391 to fix some of it?

@lissyx
Copy link
Contributor

lissyx commented Mar 7, 2024

And in fact whole feature is in https://phabricator.services.mozilla.com/D199710

@gabrielesvelto gabrielesvelto changed the title Read ELF build ids directly from the target process instead of mmap()ing it Read ELF build ids directly from the target process instead of mmap()ing libraries Mar 19, 2024
@afranchuk
Copy link
Contributor

I'm working on this now (can't assign myself though), based off of https://phabricator.services.mozilla.com/D199710. Though depending on how expensive/slow ptrace PEEKDATA is, it'd still be a lot less memory reading (fewer ptrace calls) to lazy parse ourselves rather than using goblin (for portions of it, at least).

afranchuk added a commit to afranchuk/minidump-writer that referenced this issue Mar 29, 2024
Closes rust-minidump#71.

A few things to consider:
* Since we read from the process memory, the process must be in
  ptrace-stop (see `test_file_id`). This changes when the build ids can
  be read. Previously they could be read without the process being
  stopped if the mapped files still existed (and were hopefully the same
  that the process was using).
* The previous implementation made some mutations to deleted mapping
  names (removing the ` (deleted)` suffix). We need to decide whether we
  still want/need this behavior. In the meantime I commented out a
  failing test assertion.
afranchuk added a commit to afranchuk/minidump-writer that referenced this issue Mar 29, 2024
Closes rust-minidump#71.

A few things to consider:
* Since we read from the process memory, the process must be in
  ptrace-stop (see `test_file_id`). This changes when the build ids can
  be read. Previously they could be read without the process being
  stopped if the mapped files still existed (and were hopefully the same
  that the process was using).
* The previous implementation made some mutations to deleted mapping
  names (removing the ` (deleted)` suffix). We need to decide whether we
  still want/need this behavior. In the meantime I commented out a
  failing test assertion.
afranchuk added a commit to afranchuk/minidump-writer that referenced this issue Mar 29, 2024
Closes rust-minidump#71.

A few things to consider:
* Since we read from the process memory, the process must be in
  ptrace-stop (see `test_file_id`). This changes when the build ids can
  be read. Previously they could be read without the process being
  stopped if the mapped files still existed (and were hopefully the same
  that the process was using).
* The previous implementation made some mutations to deleted mapping
  names (removing the ` (deleted)` suffix). We need to decide whether we
  still want/need this behavior. In the meantime I commented out a
  failing test assertion.
Jake-Shadle pushed a commit that referenced this issue Apr 12, 2024
* Read ELF build ids directly from the target process.

Closes #71.

A few things to consider:
* Since we read from the process memory, the process must be in
  ptrace-stop (see `test_file_id`). This changes when the build ids can
  be read. Previously they could be read without the process being
  stopped if the mapped files still existed (and were hopefully the same
  that the process was using).
* The previous implementation made some mutations to deleted mapping
  names (removing the ` (deleted)` suffix). We need to decide whether we
  still want/need this behavior. In the meantime I commented out a
  failing test assertion.

* Address review comments.

* Always remove ` (deleted)` from module names at parse time.

* Fix failing CI tests.

This test needed to be disabled due to permissions issues.

* Improve error handling of strtab and impl ModuleMemory for &[u8].

* Add tests to build id reader.
Jake-Shadle pushed a commit that referenced this issue Apr 25, 2024
Closes #71.

A few things to consider:
* Since we read from the process memory, the process must be in
  ptrace-stop (see `test_file_id`). This changes when the build ids can
  be read. Previously they could be read without the process being
  stopped if the mapped files still existed (and were hopefully the same
  that the process was using).
* The previous implementation made some mutations to deleted mapping
  names (removing the ` (deleted)` suffix). We need to decide whether we
  still want/need this behavior. In the meantime I commented out a
  failing test assertion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants