The scalability of PolyTracker #6576

Open
llooFlashooll opened this issue Nov 27, 2024 · 5 comments

Comments

@llooFlashooll

llooFlashooll commented Nov 27, 2024

Hi folks, I really appreciate your work.

I have a question: can PolyTracker be extended to other languages that compile through LLVM, such as Rust?

For example, I would like to test it on a simple program.

@hbrodin
Collaborator

hbrodin commented Nov 27, 2024

Hi,
Thanks for your interest in PolyTracker! We have not tried to instrument Rust code, but I don't see any reason why it wouldn't work. Roughly, the steps would be:

  1. Modify the Rust build to emit LLVM IR.
  2. Introduce new taint sources (a number are implemented already, e.g. for read, fread, etc.).
  3. If desired, introduce new taint sinks (e.g. write).
  4. Apply the PolyTracker LLVM passes to the IR from step 1.
  5. Link the PolyTracker runtime library.

There may be additional things needed, but these should be the high-level steps at least.
If you do end up going this route, it would be great to hear from you. Also, feel free to open a PR.

@llooFlashooll
Author

llooFlashooll commented Nov 28, 2024

Hi, thank you very much for your detailed reply!! I will follow your guidance and test it out.

@llooFlashooll
Author

llooFlashooll commented Dec 2, 2024

Hi @hbrodin, I have made some attempts recently. There are two steps I haven't finished:

  • Linking the rustc-generated LLVM bitcode with PolyTracker and the Rust static, dynamic, and rlib libraries. I run into errors whenever I miss a dependency or don't link everything correctly. I am working on fixing this.
  • Modifying PolyTracker to support more sources and sinks.

Could you give me some further guidance, especially on the second point, such as pointing me to the relevant code locations? Thank you very much.

@kaoudis
Collaborator

kaoudis commented Dec 3, 2024

Hey @llooFlashooll, I can speak some to the second point, though naturally I defer to Henrik if I'm wrong. Tracking tainted bytes can start anywhere input is read or taken into the instrumented program, but you will need to define what those start points would be for a Rust program if they differ from what is already implemented in PolyTracker.

The part that is already implemented, and that should still work whether the program is Rust, C++, or something else, is the code that actually writes source labels out to the tdag and starts tainting "from there" with respect to the data flow of the instrumented program. What you might need to define for Rust is when those source labels should be created.

To do this, I believe you'd need to add whatever you are interested in initially tainting to the taint sources. Take taint_source_buffer as an example:

static void taint_source_buffer(int fd, void *buff, Offset offset,
                                Length length, dfsan_label &ret_label) {
  if (length.valid()) {
    get_polytracker_tdag().source_taint(fd, buff, offset, *length.value());
  }
  ret_label = 0;
}

In the above function we're setting taint sources in the tdag. Each source label that is set corresponds to a particular input byte. Please keep in mind that taint and provenance are tracked at the byte level in PolyTracker. What each type of taint-source function does varies a bit, so you might want to read all of them before deciding how to implement your own.
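To make the "one source label per input byte" idea concrete, here is a minimal toy sketch. It is not PolyTracker code: ToyTaintMap, SourceInfo, Label, label_of and source_of are all hypothetical names invented for illustration, and the shadow map is just a std::map rather than real shadow memory. It only shows the bookkeeping a source function has to trigger: every byte read into the buffer gets its own fresh label that remembers which input and which offset it came from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only, not PolyTracker types.
using Label = std::uint32_t;  // 0 means "untainted"

struct SourceInfo {
  int fd;              // which input the byte came from
  std::size_t offset;  // byte offset within that input
};

class ToyTaintMap {
public:
  // Give every byte in [buf, buf + length) its own fresh source label.
  void source_taint(int fd, const void *buf, std::size_t offset,
                    std::size_t length) {
    assert(buf != nullptr);
    const auto *p = static_cast<const unsigned char *>(buf);
    for (std::size_t i = 0; i < length; ++i) {
      sources_.push_back({fd, offset + i});
      shadow_[p + i] = static_cast<Label>(sources_.size());  // labels start at 1
    }
  }

  // Label currently attached to one byte of memory (0 if untainted).
  Label label_of(const void *addr) const {
    auto it = shadow_.find(static_cast<const unsigned char *>(addr));
    return it == shadow_.end() ? 0 : it->second;
  }

  // Map a source label back to the input byte it stands for.
  const SourceInfo &source_of(Label l) const {
    assert(l >= 1 && l <= sources_.size());
    return sources_[l - 1];
  }

private:
  std::map<const unsigned char *, Label> shadow_;  // toy "shadow memory"
  std::vector<SourceInfo> sources_;                // label -> origin
};
```

For a Rust target, the open question is which Rust-level read paths should end up calling something like source_taint; the labeling itself would presumably stay the same.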

With respect to sinks, there is some naming overlap in the code: we also refer to PolyTracker writing output to the tdag file as writing to a sink. We do still need to write the program's sinks to the tdag so that the full "trees" of taint can be reconstructed post hoc. Taint sinks are the functions that processed the tainted bytes and thereby identify where taint tracking stopped.
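As a rough illustration of what recording a program sink means, here is a small toy sketch. Again, none of this is PolyTracker's API: ToySinkLog, SinkRecord and sink_write are hypothetical names, and skipping untainted bytes is an assumption made for the example. The point is only that a sink notes, for each output byte, which label it carried at the moment it left the program, so the path back to the sources can be reconstructed later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only, not PolyTracker types.
using Label = std::uint32_t;  // 0 means "untainted"

struct SinkRecord {
  int fd;              // output the byte was written to
  std::size_t offset;  // byte offset within that output
  Label label;         // label the byte carried when it was written
};

class ToySinkLog {
public:
  // Called where the program writes bytes to `fd` starting at `offset`;
  // labels[i] is the label of the i-th byte being written.
  void sink_write(int fd, std::size_t offset,
                  const std::vector<Label> &labels) {
    assert(fd >= 0);
    for (std::size_t i = 0; i < labels.size(); ++i) {
      if (labels[i] != 0) {  // assumption: only tainted bytes are recorded
        records_.push_back({fd, offset + i, labels[i]});
      }
    }
  }

  const std::vector<SinkRecord> &records() const { return records_; }

private:
  std::vector<SinkRecord> records_;
};
```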

If you have not done so yet, you may also want to modify the ABI lists to make sure the Rust functionality you are interested in tracking taint through gets instrumented, and that any functionality you are not interested in instrumenting is ignorelisted. My understanding is that any code that is neither part of the ABI lists nor defined as a source or sink will be instrumented as if it were codebase business logic, meaning taint can be tracked through it but can't originate from it.
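The decision described above can be written down as a tiny classifier. This is only a sketch of my understanding, not how PolyTracker or the ABI lists are actually implemented: ToyPolicy, Treatment, classify and can_originate_taint are hypothetical names, and the precedence order (source, then sink, then ignorelist) is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model for illustration only, not PolyTracker code.
enum class Treatment { Source, Sink, Ignored, Instrumented };

struct ToyPolicy {
  std::set<std::string> sources;  // functions where taint may originate
  std::set<std::string> sinks;    // functions where tainted bytes leave
  std::set<std::string> ignored;  // functions that are not instrumented
};

// Anything not listed anywhere falls through to "Instrumented", i.e. it is
// treated as ordinary business logic.
inline Treatment classify(const ToyPolicy &p, const std::string &fn) {
  assert(!fn.empty());
  if (p.sources.count(fn)) return Treatment::Source;
  if (p.sinks.count(fn)) return Treatment::Sink;
  if (p.ignored.count(fn)) return Treatment::Ignored;
  return Treatment::Instrumented;
}

// Taint can flow through instrumented code, but only sources create it.
inline bool can_originate_taint(Treatment t) {
  return t == Treatment::Source;
}
```

Under this model, a Rust function that reads input but is not registered as a source would be classified as Instrumented, so its output would never carry a fresh label.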

@llooFlashooll
Author

Thank you very much again! I have solved the first point and am working on your instructions.
