Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deduplication mechanism; improve reporting #239

Merged
merged 5 commits into from
Dec 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ This is the changelog for [Nosey Parker](https://github.com/praetorian-inc/nosey
All notable changes to the project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project aspires to use [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Note that the use of semantic versioning applies to the command-line interface and output formats; the Rust crate APIs are considered an implementation detail at this point.


## Unreleased
Expand All @@ -22,14 +23,30 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

This fixes a bug in v0.20.0 where provenance entries from an extensible enumerator could _only_ be JSON objects, instead of arbitrary JSON values as claimed by the documentation.

- The datastore schema has changed in order to support a new finding deduplication mechanism ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
Datastores from previous versions of Nosey Parker are not supported.

- The `report` command now reports at most 3 provenenance entries per match by default ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
This can be overridden with the new `--max-provenance=N` option.

- The `report` command now includes finding and match IDs in its default "human" format ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).

- The `scan` command now prints a simplified summary at the end, without the unpopulated status columns ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).

### Fixes
- The `Blynk Organization Client Credentials` rule now has a non-varying number of capture groups

- Fixed a typo in the `report` command that could cause a diagnostic message about suppressed matches to be incorrect ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).

### Changes
- The `Slack Bot Token` rule has been modified to match additional cases.
- The `rules check` command now more thoroughly checks the number of capture groups of each rule

### Additions
- A new finding deduplication mechanism is enabled by default when reporting ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
This mechanism suppresses matches and findings that overlap with others if they are less specific.
For example, a single blob might contain text that matches _both_ the `HTTP Bearer Token` and `Slack User Token` rules; the less-specific `HTTP Bearer Token` match will be suppressed.

- New rules have been added:

- `Connection String in .NET Configuration` ([#238](https://github.com/praetorian-inc/noseyparker/pull/238))
Expand All @@ -43,6 +60,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
Only a few rules have descriptions so far.
Use `rules list -f json` to see.

- The `report` command has a new `--max-provenance=N` option that limits the number of provenance entries displayed for any single match ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
A negative number means "no limit".
The default value is 3.


## [v0.21.0](https://github.com/praetorian-inc/noseyparker/releases/v0.21.0) (2024-11-20)

Expand Down
24 changes: 21 additions & 3 deletions crates/noseyparker-cli/src/args.rs
Original file line number Diff line number Diff line change
Expand Up @@ -987,7 +987,7 @@ pub struct ContentFilteringArgs {
/// Do not scan files larger than the specified size
///
/// The value is parsed as a floating point literal, and hence fractional values can be supplied.
/// A negative value means "no limit".
/// A non-positive value means "no limit".
/// Note that scanning requires reading the entire contents of each file into memory, so using an excessively large limit may be problematic.
#[arg(
long("max-file-size"),
Expand Down Expand Up @@ -1069,7 +1069,7 @@ pub struct ReportArgs {
pub struct ReportFilterArgs {
/// Limit the number of matches per finding to at most N
///
/// A negative value means "no limit".
/// A non-positive value means "no limit".
#[arg(
long,
default_value_t = 3,
Expand All @@ -1078,7 +1078,18 @@ pub struct ReportFilterArgs {
)]
pub max_matches: i64,

/// Only report findings that have a mean score of at least N.
/// Limit the number of provenance entries per match to at most N
///
/// A non-positive value means "no limit".
#[arg(
long,
default_value_t = 3,
value_name = "N",
allow_negative_numbers = true
)]
pub max_provenance: i64,

/// Only report findings that have a mean score of at least N
///
/// Scores are floating point numbers in the range [0, 1].
/// Use the value `0` to disable this filtering.
Expand All @@ -1090,6 +1101,13 @@ pub struct ReportFilterArgs {
/// Include only findings with the assigned status
#[arg(long, value_name = "STATUS")]
pub finding_status: Option<FindingStatus>,

/// Suppress redundant matches and findings
///
/// A match is considered redundant to another if they overlap significantly within the same
/// blob and satisfy a handful of heuristics.
#[arg(long, default_value_t=true, action=ArgAction::Set, value_name="BOOL")]
pub suppress_redundant: bool,
}

#[derive(ValueEnum, Debug, Display, Clone, Copy)]
Expand Down
86 changes: 43 additions & 43 deletions crates/noseyparker-cli/src/cmd_report.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,12 @@ pub fn run(global_args: &GlobalArgs, args: &ReportArgs) -> Result<()> {
Some(args.filter_args.max_matches.try_into().unwrap())
};

let max_provenance = if args.filter_args.max_provenance <= 0 {
None
} else {
Some(args.filter_args.max_provenance.try_into().unwrap())
};

let min_score = if args.filter_args.min_score <= 0.0 {
None
} else {
Expand All @@ -57,6 +63,8 @@ pub fn run(global_args: &GlobalArgs, args: &ReportArgs) -> Result<()> {
let reporter = DetailsReporter {
datastore,
max_matches,
max_provenance,
suppress_redundant: args.filter_args.suppress_redundant,
min_score,
finding_status: args.filter_args.finding_status,
styles,
Expand All @@ -67,7 +75,9 @@ pub fn run(global_args: &GlobalArgs, args: &ReportArgs) -> Result<()> {
struct DetailsReporter {
datastore: Datastore,
max_matches: Option<usize>,
max_provenance: Option<usize>,
min_score: Option<f64>,
suppress_redundant: bool,
finding_status: Option<FindingStatus>,
styles: Styles,
}
Expand All @@ -89,66 +99,43 @@ impl DetailsReporter {
fn get_finding_metadata(&self) -> Result<Vec<FindingMetadata>> {
let datastore = &self.datastore;
let mut group_metadata = datastore
.get_finding_metadata()
.get_finding_metadata(self.suppress_redundant)
.context("Failed to get match group metadata from datastore")?;

// How many findings were suppressed due to their status not matching?
let mut num_suppressed_for_status: usize = 0;

// How many findings were suppressed due to their status not matching?
let mut num_suppressed_for_score: usize = 0;

// Suppress findings with non-matching status
if let Some(status) = self.finding_status {
group_metadata.retain(|md| {
if statuses_match(status, md.statuses.0.as_slice()) {
true
} else {
num_suppressed_for_status += 1;
false
}
})
}
let old_len = group_metadata.len();
group_metadata.retain(|md| statuses_match(status, md.statuses.0.as_slice()));
let num_suppressed = old_len - group_metadata.len();

// Suppress findings with non-matching score
if let Some(min_score) = self.min_score {
group_metadata.retain(|md| match md.mean_score {
Some(mean_score) if mean_score < min_score => {
num_suppressed_for_score += 1;
false
}
_ => true,
})
}

if num_suppressed_for_status > 0 {
let finding_status = self.finding_status.unwrap();

if num_suppressed_for_status == 1 {
if num_suppressed == 1 {
info!(
"Note: 1 finding with status not matching {finding_status} was suppressed; \
rerun without `--finding-status={finding_status}` to show it"
"Note: 1 finding with status not matching {status} was suppressed; \
rerun without `--finding-status={status}` to show it"
);
} else {
} else if num_suppressed > 1 {
info!(
"Note: {num_suppressed_for_status} findings with status not matching \
`{finding_status}` were suppressed; \
rerun without `--finding-status={finding_status}` to show them"
"Note: {num_suppressed} findings with status not matching \
`{status}` were suppressed; \
rerun without `--finding-status={status}` to show them"
);
}
}

if num_suppressed_for_score > 0 {
let min_score = self.min_score.unwrap();
// Suppress findings with non-matching score
if let Some(min_score) = self.min_score {
let old_len = group_metadata.len();
group_metadata.retain(|md| md.mean_score.map(|s| s >= min_score).unwrap_or(true));
let num_suppressed = old_len - group_metadata.len();

if num_suppressed_for_status == 1 {
if num_suppressed == 1 {
info!(
"Note: 1 finding with meanscore less than {min_score} was suppressed; \
rerun with `--min-score=0` to show it"
);
} else {
} else if num_suppressed > 1 {
info!(
"Note: {num_suppressed_for_score} findings with mean score less than \
"Note: {num_suppressed} findings with mean score less than \
{min_score} were suppressed; \
rerun with `--min-score=0` to show them"
);
Expand All @@ -162,7 +149,12 @@ impl DetailsReporter {
fn get_matches(&self, metadata: &FindingMetadata) -> Result<Vec<ReportMatch>> {
Ok(self
.datastore
.get_finding_data(metadata, self.max_matches)
.get_finding_data(
metadata,
self.max_matches,
self.max_provenance,
self.suppress_redundant,
)
.with_context(|| format!("Failed to get matches for finding {metadata:?}"))
.expect("should be able to find get matches for finding")
.into_iter()
Expand All @@ -182,6 +174,10 @@ impl DetailsReporter {
self.styles.style_heading.apply_to(val)
}

fn style_id<D>(&self, val: D) -> StyledObject<D> {
self.styles.style_id.apply_to(val)
}

fn style_match<D>(&self, val: D) -> StyledObject<D> {
self.styles.style_match.apply_to(val)
}
Expand Down Expand Up @@ -286,6 +282,9 @@ struct ReportMatch {

/// An optional status assigned to the match
status: Option<Status>,

/// The match structural IDs that this match is considered redundant to
redundant_to: Vec<String>,
}

impl From<FindingDataEntry> for ReportMatch {
Expand All @@ -297,6 +296,7 @@ impl From<FindingDataEntry> for ReportMatch {
score: e.match_score,
comment: e.match_comment,
status: e.match_status,
redundant_to: e.redundant_to,
}
}
}
Expand Down
23 changes: 17 additions & 6 deletions crates/noseyparker-cli/src/cmd_report/human_format.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,9 @@ impl DetailsReporter {
let finding = Finding { metadata, matches };
writeln!(
&mut writer,
"{}",
self.style_finding_heading(format!("Finding {finding_num}/{num_findings}"))
"{} (id {})",
self.style_finding_heading(format!("Finding {finding_num}/{num_findings}")),
self.style_id(&finding.metadata.finding_id),
)?;
writeln!(&mut writer, "{}", PrettyFinding(self, &finding))?;
}
Expand Down Expand Up @@ -90,7 +91,7 @@ impl<'a> Display for PrettyFinding<'a> {
f,
"{}",
reporter.style_heading(format!(
"Showing {}/{} occurrences:",
"Showing {}/{} matches:",
finding.num_matches_available(),
finding.total_matches()
))
Expand All @@ -109,14 +110,25 @@ impl<'a> Display for PrettyFinding<'a> {
score,
comment,
status,
redundant_to,
} = rm;

writeln!(
f,
"{}",
reporter.style_heading(format!("Occurrence {i}/{}", finding.total_matches())),
"{} (id {})",
reporter.style_heading(format!("Match {i}/{}", finding.total_matches())),
reporter.style_id(&m.structural_id),
)?;

if !redundant_to.is_empty() {
writeln!(
f,
"{} {}",
reporter.style_heading("Redundant to:"),
redundant_to.join(", "),
)?;
}

// write out match status if set
if let Some(status) = status {
let status = match status {
Expand Down Expand Up @@ -145,7 +157,6 @@ impl<'a> Display for PrettyFinding<'a> {
)
};

// FIXME: limit the total number of provenance entries displayed
for p in provenance.iter() {
match p {
Provenance::File(e) => {
Expand Down
3 changes: 3 additions & 0 deletions crates/noseyparker-cli/src/cmd_report/styles.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ pub struct Styles {
pub style_heading: Style,
pub style_match: Style,
pub style_metadata: Style,
pub style_id: Style,
}

impl Styles {
Expand All @@ -23,13 +24,15 @@ impl Styles {
let style_heading = Style::new().bold().force_styling(styles_enabled);
let style_match = Style::new().yellow().force_styling(styles_enabled);
let style_metadata = Style::new().bright().blue().force_styling(styles_enabled);
let style_id = Style::new().bright().green().force_styling(styles_enabled);

Self {
style_finding_heading,
style_rule,
style_heading,
style_match,
style_metadata,
style_id,
}
}
}
6 changes: 4 additions & 2 deletions crates/noseyparker-cli/src/cmd_scan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -549,7 +549,7 @@ pub fn run(global_args: &args::GlobalArgs, args: &args::ScanArgs) -> Result<()>
.unwrap()
.context("Failed to enumerate inputs")?;

let (datastore, num_matches, num_new_matches) = datastore_thread
let (mut datastore, num_matches, num_new_matches) = datastore_thread
.join()
.unwrap()
.context("Failed to save results to the datastore")?;
Expand All @@ -561,6 +561,8 @@ pub fn run(global_args: &args::GlobalArgs, args: &args::ScanArgs) -> Result<()>

progress.finish();

datastore.check_match_redundancies()?;

// ---------------------------------------------------------------------------------------------
// Finalize and report
// ---------------------------------------------------------------------------------------------
Expand Down Expand Up @@ -614,7 +616,7 @@ pub fn run(global_args: &args::GlobalArgs, args: &args::ScanArgs) -> Result<()>
.get_summary()
.context("Failed to get finding summary")
.unwrap();
let table = crate::cmd_summarize::summary_table(&summary);
let table = crate::cmd_summarize::summary_table(&summary, /* simple= */ true);
println!();
table.print_tty(global_args.use_color(std::io::stdout()))?;
}
Expand Down
Loading
Loading