Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/cached disasm #556

Open
wants to merge 39 commits into
base: master
Choose a base branch
from

Conversation

dannyp303
Copy link
Collaborator

One sentence summary of this PR (This should go in the CHANGELOG!)
Creates two new disassemblers for OFRAK, ofrak_pyghidra using the PyGhidra feature in the ghidra repository, and ofrak_cached for cacheing binary analysis.

Link to Related Issue(s)
None

Please describe the changes in your request.
ofrak_pyghidra.components: One single unpacker that unpacks the entire binary, down to instruction level, with an option switch to not unpack basic blocks into instructions.

ofrak_pyghidra.standalone(name pending): A utility completely seperate from OFRAK that uses pyghidra to generate the cache file used in ofrak_cached

ofrak_cached.components: OFRAK components that ingest a cached analysis file, associates it with a resource id, and uses it to unpack the binary as requested.

Anyone you think should look at this, specifically?
@rbs-jacob

@dannyp303 dannyp303 requested a review from rbs-jacob January 3, 2025 20:16
@rbs-jacob rbs-jacob added this to the 3.3.0 Release milestone Jan 6, 2025
@rbs-jacob rbs-jacob linked an issue Jan 6, 2025 that may be closed by this pull request
)


class CachedProgramUnpacker(Unpacker[None]):
Copy link
Collaborator Author

@dannyp303 dannyp303 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This component is untested due to code regions being unpacked by the ELF unpacker, could be useful for raw binaries, maybe should be removed?

Comment on lines +54 to +55
class CachedCodeRegionModifier(CachedCodeRegionModifier):
pass
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cached code region unpacker runs this modifier directly, this is a workaround to have it be discoverable with only pyghidra components injected. I tested that it does not cause conflicts if both the cached_disassembly and pyghidra are injected at the same time.

Copy link
Member

@rbs-jacob rbs-jacob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't really get to review the meat of the code (most of the actual analyzers and components don't have many comments from me). But I included some comments about general stuff in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this file already exist in the OFRAK core test assets directory? If so, do we need to duplicate it here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to stay aware that if we ever change or recompile the test files, we need to regenerate the corresponding json files, but yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever change or recompile the test files, then these tests will fail. When that happens, we'll be able to fix the tests by either: copying the modified file to this directory like it is now, or by regenerating the files. Probably makes sense to remove the duplicated file and defer that decision to then, in my opinion.

Comment on lines +1 to +4
# Download & install java and supervisor
RUN wget https://download.oracle.com/java/21/latest/jdk-21_linux-x64_bin.deb && \
dpkg -i jdk-21_linux-x64_bin.deb && \
rm -f jdk-21_linux-x64_bin.debd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this conflict with JDK versions installed from apt in other Ghidra packages/core OFRAK?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, regarding ghidra, I think a separate MR to bump ghidra and unify the containers is the solution. Regarding the APK components, when we run tests on specific modules in ofrak (i.e disassemblers) does it run the ofrak_core tests from that test container as well, or only from the base image? I will try to check on this, probably @whyitfor knows.

Comment on lines +8 to +22
# RUN git clone https://github.com/NationalSecurityAgency/ghidra.git && \
# cd ghidra && \
# git checkout 5b7b12e7d016700eb0834d7421d266dd527a7355 && \
# ./gradlew -I gradle/support/fetchDependencies.gradle && \
# ./gradlew buildGhidra --info && \
# cd build/dist/ && \
# unzip ghidra*.zip > /dev/null && \
# mv ghidra_11.3_DEV /opt/rbs/ghidra_11.3_DEV && \
# cd ../../../ && \
# rm -rf ghidra && \
# cd /opt/rbs/ghidra_11.3_DEV/Ghidra/Features/PyGhidra/pypkg/ && \
# python3 -m pip install -e .

# ENV GHIDRA_INSTALL_DIR=/opt/rbs/ghidra_11.3_DEV/
# RUN export GHIDRA_INSTALL_DIR=/opt/rbs/ghidra_11.3_DEV/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is dead, took a minute to figure out how to do it properly tho so i didn't wanna throw it out immediately, I can move it to personal notes, but i could see it being useful for someone else in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about taking it out vs. leaving it in. But if we leave it in, I think it's important that we include an explanatory comment giving context on what it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use some clarification on why it makes sense to have a separate PyGhidra image that is different from the Ghidra image.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 reasons, 1 that is not longer valid is that it took a long time to compile ghidra from source, so having a seperate image that pulls a release seemed beneficial. The second and probably better reason is i didn't want to include bumping the ghidra version in this MR, especially since I don't recall the last time we've done so. I would suggest doing this and merging the images in a subsequent MR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we should be tracking "bump Ghidra version" as a separate issue on the OFRAK GitHub issue tracker? To my knowledge, we don't currently have an issue for that.

)


async def test_cached_decompilation(ofrak_context: OFRAKContext):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how we can be sure that this test case tests cached decompilation since I see no mention of the cache components

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect name, my bad.
Note: technically this does use cached components bc the pyghidra analysis front loads all analysis and caches automatically to avoid having to reopen the ghidra project between calls. Name not intentional though.

@rbs-jacob rbs-jacob requested a review from whyitfor February 13, 2025 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Ghidra Support
2 participants