Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust: Sensitive data library #18414

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Rust: Sensitive data library #18414

wants to merge 5 commits into from

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Jan 6, 2025

Add a library for recognising sensitive data in Rust. The core logic in SensitiveDataHeuristics.qll is shared between languages, the integration in SensitiveData.qll is Rust specific (so are the tests and summary queries).

The implementation here is a bit minimal, there's a lot I want to improve both in the integration and in the shared library. However I'm keen to get something in place and working so that I can start building queries on top of this alongside making improvements to it.

@geoffw0 geoffw0 added the Rust Pull requests that update Rust code label Jan 6, 2025
@Copilot Copilot bot review requested due to automatic review settings January 6, 2025 13:36
Copilot

This comment was marked as spam.

Comment on lines +34 to +35
sink(pass_phrase); // $ MISSING: sensitive=password
sink(auth_key); // $ MISSING: sensitive=password
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add those?

we should also consider that snake-case is the convention in rust for variables (by default you even have that flagged and many IDEs propose to fix the case automatically), so I think we should add snakecase counterparts to the camelcase heuristics (auth_key and authentication_key).

Also, if we add pass_phrase/passPhrase, we should probably also add passphrase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add those?

Yes we should, but I'd like to do it in the shared library and that requires agreement from other language teams (and wider testing). So it'll be a follow-up PR, and it is planned.

if we add pass_phrase/passPhrase, we should probably also add passphrase.

The regexs are all case insensitive, so passPhrase and passphrase should both be matched. I can add some test cases around this if you like?

Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me :) Just a few tiny things and a question.

private import codeql.rust.dataflow.DataFlow

/**
* A data flow node that might contain sensitive data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A data flow node that might contain sensitive data.
* A data-flow node that might contain sensitive data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use "data flow" more commonly than "data-flow" at present I think. I don't really mind, but I'd like to move towards greater consistency.

}

/**
* A function call data flow node that might produce sensitive data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A function call data flow node that might produce sensitive data.
* A function call data-flow node that might produce sensitive data.

}

/**
* A variable access data flow node that might produce sensitive data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A variable access data flow node that might produce sensitive data.
* A variable access data-flow node that might produce sensitive data.

sink(authenticationkey); // $ sensitive=password
sink(authKey); // $ sensitive=password

sink(ms); // $ MISSING: sensitive=password
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this one be sensitive=password?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because ms is a MyStruct which contains a sensitive field password. So if you transmitted the whole of ms plaintext you would be transmitting a password (and some other stuff) unencrypted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants