-
Notifications
You must be signed in to change notification settings - Fork 957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement telfhash for ELF import table #936
Conversation
Lets run TC tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linux-doxygen-build
failed.win-build
failed - looks like it has trouble compiling TLSH. Solving errors on WIndows is a major pita. If you don't have a windows machine, and cannot get you hands on some virtualbox, let me know. We could probably give you access to a windows server, but it is kinda last resort - having it locally in virutalbox is better.- TLSH sources must not be between other RetDec sources. Move it to
/deps/tlsh
. Create CMake library target for it - look for example attinyxml2
which should be similar. I can help with CMake if needed. Also, add at least license file to sources, and some version identification, e.g. version file if it exists - preferably use some tagged TLSH version, not random commit atmaster
. Also, add TLSH entry to our/LICENSE-THIRD-PARTY
file. - I also expect PR in retdec-regression-tests that on some real-world input sample tests both plain and json values. Place it in
/tools/fileinfo/features
. - Can you send me a file, where it produces some hash? I tried it on
retdec-regression-tests/integration/ack/2018-09-13/ack.x64.gcc.O0.g.elf
and I got newly added entries forSHA256
,MD5
,CRC32
, but no TLSH. Is it working for you? M'I doing something wrong? This is exactly the thing that regression tests would solve automatically. - After you fix these issues, and test the implementation on those 450k samples I sent, change the PR from draft to a full PR, and let me know. I will go over it again and merge it.
include/retdec/fileformat/types/import_table/elf_import_table.h
Outdated
Show resolved
Hide resolved
imported_symbols.push_back(funcName); | ||
} | ||
|
||
std::sort(imported_symbols.begin(), imported_symbols.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are sorting, why not use std::set
? Are duplicates possible/desirable? If yes, and we want to keep them, std::vector
is ok. If we don't want them, std::set
would solve it for us.
Does telfhash allow duplicates in this container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a thing, is the sorting algorithm behaving the same as in the telfhash implementation? I think it is possible to somehow control the string sorting in C++? There might be differences. And even it there are not, a comment on this would be helpful so that no one screws it up later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original implementation seems to use a python list and not check for duplicates, so I think it allows duplicates. Python sort and C++ one on strings should sort lexicographically. But I'll be more sure after further testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be confusing C++ with Linux, but there are definitely more ways to sort strings - Linux sort
, or maybe some other app, is controlled by LOCALE
(I think). Just check iPython <-> C++, and add some note. In general, because we try to exactly reimplement some other code here, comments on these things are highly desirable.
This one for example https://www.virustotal.com/gui/file/fb5b5659801beccc790444f5fecb6acc74c10469acf7554f8f1eeea033bbbfe8
|
Lets run TC tests. |
I've moved TLSH to deps and wrote a CMakeList for it inspired by existing ones, but it's the first time I used CMake so I am not sure if I did it correctly, especially with the cond_add which I've set if RETDEC_ENABLE_FILEFORMAT (in option.cmake) is set. I've added their license file content into the LICENSE-THIRD-PARTY file. But I am not sure what should I put into the license file and version file that I should put among the source files. I've used the latest tagged release /~https://github.com/trendmicro/tlsh/releases/tag/4.2.1 . The version values are in the CMakeList |
After testing out the implementation on 464527 ELF binaries I've found few things about telfhash that I didn't really account for:
The first one results in a situation where we calculated telfhash, but because there are no imported symbols, there is no import table in the output, so we don't output telfhash. Maybe use something else strictly for the import table and use telfhash separately? That could also give the ability to use symbols from the dynamic segments. |
…phashes, create default imphash for ELF
I moved the telfhash to the I've tested it and seems to work well compared to original implementation, I think it's ready for re-review. |
I've added tlsh /~https://github.com/trendmicro/tlsh code into the project and used it to calculate telfhash for ELF import tables.