-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/cached disasm #556
base: master
Are you sure you want to change the base?
Feature/cached disasm #556
Conversation
tests are passing
…sing generated json files
) | ||
|
||
|
||
class CachedProgramUnpacker(Unpacker[None]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This component is untested due to code regions being unpacked by the ELF unpacker, could be useful for raw binaries, maybe should be removed?
…to feature/cached_disasm
class CachedCodeRegionModifier(CachedCodeRegionModifier): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cached code region unpacker runs this modifier directly, this is a workaround to have it be discoverable with only pyghidra components injected. I tested that it does not cause conflicts if both the cached_disassembly and pyghidra are injected at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't really get to review the meat of the code (most of the actual analyzers and components don't have many comments from me). But I included some comments about general stuff in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this file already exist in the OFRAK core test assets directory? If so, do we need to duplicate it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would have to stay aware that if we ever change or recompile the test files, we need to regenerate the corresponding json files, but yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ever change or recompile the test files, then these tests will fail. When that happens, we'll be able to fix the tests by either: copying the modified file to this directory like it is now, or by regenerating the files. Probably makes sense to remove the duplicated file and defer that decision to then, in my opinion.
# Download & install java and supervisor | ||
RUN wget https://download.oracle.com/java/21/latest/jdk-21_linux-x64_bin.deb && \ | ||
dpkg -i jdk-21_linux-x64_bin.deb && \ | ||
rm -f jdk-21_linux-x64_bin.debd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this conflict with JDK versions installed from apt in other Ghidra packages/core OFRAK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, regarding ghidra, I think a separate MR to bump ghidra and unify the containers is the solution. Regarding the APK components, when we run tests on specific modules in ofrak (i.e disassemblers) does it run the ofrak_core tests from that test container as well, or only from the base image? I will try to check on this, probably @whyitfor knows.
...ssemblers/ofrak_cached_disassembly/ofrak_cached_disassembly/components/cached_disassembly.py
Outdated
Show resolved
Hide resolved
# RUN git clone https://github.com/NationalSecurityAgency/ghidra.git && \ | ||
# cd ghidra && \ | ||
# git checkout 5b7b12e7d016700eb0834d7421d266dd527a7355 && \ | ||
# ./gradlew -I gradle/support/fetchDependencies.gradle && \ | ||
# ./gradlew buildGhidra --info && \ | ||
# cd build/dist/ && \ | ||
# unzip ghidra*.zip > /dev/null && \ | ||
# mv ghidra_11.3_DEV /opt/rbs/ghidra_11.3_DEV && \ | ||
# cd ../../../ && \ | ||
# rm -rf ghidra && \ | ||
# cd /opt/rbs/ghidra_11.3_DEV/Ghidra/Features/PyGhidra/pypkg/ && \ | ||
# python3 -m pip install -e . | ||
|
||
# ENV GHIDRA_INSTALL_DIR=/opt/rbs/ghidra_11.3_DEV/ | ||
# RUN export GHIDRA_INSTALL_DIR=/opt/rbs/ghidra_11.3_DEV/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is dead, took a minute to figure out how to do it properly tho so i didn't wanna throw it out immediately, I can move it to personal notes, but i could see it being useful for someone else in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about taking it out vs. leaving it in. But if we leave it in, I think it's important that we include an explanatory comment giving context on what it is.
ofrak-pyghidra.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use some clarification on why it makes sense to have a separate PyGhidra image that is different from the Ghidra image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 reasons, 1 that is not longer valid is that it took a long time to compile ghidra from source, so having a seperate image that pulls a release seemed beneficial. The second and probably better reason is i didn't want to include bumping the ghidra version in this MR, especially since I don't recall the last time we've done so. I would suggest doing this and merging the images in a subsequent MR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we should be tracking "bump Ghidra version" as a separate issue on the OFRAK GitHub issue tracker? To my knowledge, we don't currently have an issue for that.
disassemblers/ofrak_pyghidra/ofrak_pyghidra_test/test_pyghidra_components.py
Show resolved
Hide resolved
) | ||
|
||
|
||
async def test_cached_decompilation(ofrak_context: OFRAKContext): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how we can be sure that this test case tests cached decompilation since I see no mention of the cache components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect name, my bad.
Note: technically this does use cached components bc the pyghidra analysis front loads all analysis and caches automatically to avoid having to reopen the ghidra project between calls. Name not intentional though.
One sentence summary of this PR (This should go in the CHANGELOG!)
Creates two new disassemblers for OFRAK, ofrak_pyghidra using the PyGhidra feature in the ghidra repository, and ofrak_cached for cacheing binary analysis.
Link to Related Issue(s)
None
Please describe the changes in your request.
ofrak_pyghidra.components
: One single unpacker that unpacks the entire binary, down to instruction level, with an option switch to not unpack basic blocks into instructions.ofrak_pyghidra.standalone
(name pending): A utility completely seperate from OFRAK that uses pyghidra to generate the cache file used inofrak_cached
ofrak_cached.components
: OFRAK components that ingest a cached analysis file, associates it with a resource id, and uses it to unpack the binary as requested.Anyone you think should look at this, specifically?
@rbs-jacob