Skip to content

Commit

Permalink
Add deduplication mechanism; improve reporting (#239)
Browse files Browse the repository at this point in the history
- Changelog: Clarify use of semantic versioning
- `report`: Add new deduplication mechanism that suppresses overlapping redundant matches
- Datastore: schema has been expanded to support new deduplication mechanism
- `report`: Limit provenance entries per match to 3 by default; override with --max-provenance=N
- `report`: Fix a bug that would print incorrect diagnostics
- `report`: Include finding and match IDs in default "human" format output
  • Loading branch information
bradlarsen authored Dec 17, 2024
1 parent 3e2a837 commit 882f6b4
Show file tree
Hide file tree
Showing 33 changed files with 670 additions and 563 deletions.
21 changes: 21 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ This is the changelog for [Nosey Parker](https://github.com/praetorian-inc/nosey
All notable changes to the project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project aspires to use [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Note that the use of semantic versioning applies to the command-line interface and output formats; the Rust crate APIs are considered an implementation detail at this point.


## Unreleased
Expand All @@ -22,14 +23,30 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

This fixes a bug in v0.20.0 where provenance entries from an extensible enumerator could _only_ be JSON objects, instead of arbitrary JSON values as claimed by the documentation.

- The datastore schema has changed in order to support a new finding deduplication mechanism ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
Datastores from previous versions of Nosey Parker are not supported.

- The `report` command now reports at most 3 provenenance entries per match by default ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
This can be overridden with the new `--max-provenance=N` option.

- The `report` command now includes finding and match IDs in its default "human" format ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).

- The `scan` command now prints a simplified summary at the end, without the unpopulated status columns ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).

### Fixes
- The `Blynk Organization Client Credentials` rule now has a non-varying number of capture groups

- Fixed a typo in the `report` command that could cause a diagnostic message about suppressed matches to be incorrect ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).

### Changes
- The `Slack Bot Token` rule has been modified to match additional cases.
- The `rules check` command now more thoroughly checks the number of capture groups of each rule

### Additions
- A new finding deduplication mechanism is enabled by default when reporting ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
This mechanism suppresses matches and findings that overlap with others if they are less specific.
For example, a single blob might contain text that matches _both_ the `HTTP Bearer Token` and `Slack User Token` rules; the less-specific `HTTP Bearer Token` match will be suppressed.

- New rules have been added:

- `Connection String in .NET Configuration` ([#238](https://github.com/praetorian-inc/noseyparker/pull/238))
Expand All @@ -43,6 +60,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
Only a few rules have descriptions so far.
Use `rules list -f json` to see.

- The `report` command has a new `--max-provenance=N` option that limits the number of provenance entries displayed for any single match ([#239](https://github.com/praetorian-inc/noseyparker/pull/239)).
A negative number means "no limit".
The default value is 3.


## [v0.21.0](https://github.com/praetorian-inc/noseyparker/releases/v0.21.0) (2024-11-20)

Expand Down
24 changes: 21 additions & 3 deletions crates/noseyparker-cli/src/args.rs
Original file line number Diff line number Diff line change
Expand Up @@ -987,7 +987,7 @@ pub struct ContentFilteringArgs {
/// Do not scan files larger than the specified size
///
/// The value is parsed as a floating point literal, and hence fractional values can be supplied.
/// A negative value means "no limit".
/// A non-positive value means "no limit".
/// Note that scanning requires reading the entire contents of each file into memory, so using an excessively large limit may be problematic.
#[arg(
long("max-file-size"),
Expand Down Expand Up @@ -1069,7 +1069,7 @@ pub struct ReportArgs {
pub struct ReportFilterArgs {
/// Limit the number of matches per finding to at most N
///
/// A negative value means "no limit".
/// A non-positive value means "no limit".
#[arg(
long,
default_value_t = 3,
Expand All @@ -1078,7 +1078,18 @@ pub struct ReportFilterArgs {
)]
pub max_matches: i64,

/// Only report findings that have a mean score of at least N.
/// Limit the number of provenance entries per match to at most N
///
/// A non-positive value means "no limit".
#[arg(
long,
default_value_t = 3,
value_name = "N",
allow_negative_numbers = true
)]
pub max_provenance: i64,

/// Only report findings that have a mean score of at least N
///
/// Scores are floating point numbers in the range [0, 1].
/// Use the value `0` to disable this filtering.
Expand All @@ -1090,6 +1101,13 @@ pub struct ReportFilterArgs {
/// Include only findings with the assigned status
#[arg(long, value_name = "STATUS")]
pub finding_status: Option<FindingStatus>,

/// Suppress redundant matches and findings
///
/// A match is considered redundant to another if they overlap significantly within the same
/// blob and satisfy a handful of heuristics.
#[arg(long, default_value_t=true, action=ArgAction::Set, value_name="BOOL")]
pub suppress_redundant: bool,
}

#[derive(ValueEnum, Debug, Display, Clone, Copy)]
Expand Down
86 changes: 43 additions & 43 deletions crates/noseyparker-cli/src/cmd_report.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,12 @@ pub fn run(global_args: &GlobalArgs, args: &ReportArgs) -> Result<()> {
Some(args.filter_args.max_matches.try_into().unwrap())
};

let max_provenance = if args.filter_args.max_provenance <= 0 {
None
} else {
Some(args.filter_args.max_provenance.try_into().unwrap())
};

let min_score = if args.filter_args.min_score <= 0.0 {
None
} else {
Expand All @@ -57,6 +63,8 @@ pub fn run(global_args: &GlobalArgs, args: &ReportArgs) -> Result<()> {
let reporter = DetailsReporter {
datastore,
max_matches,
max_provenance,
suppress_redundant: args.filter_args.suppress_redundant,
min_score,
finding_status: args.filter_args.finding_status,
styles,
Expand All @@ -67,7 +75,9 @@ pub fn run(global_args: &GlobalArgs, args: &ReportArgs) -> Result<()> {
struct DetailsReporter {
datastore: Datastore,
max_matches: Option<usize>,
max_provenance: Option<usize>,
min_score: Option<f64>,
suppress_redundant: bool,
finding_status: Option<FindingStatus>,
styles: Styles,
}
Expand All @@ -89,66 +99,43 @@ impl DetailsReporter {
fn get_finding_metadata(&self) -> Result<Vec<FindingMetadata>> {
let datastore = &self.datastore;
let mut group_metadata = datastore
.get_finding_metadata()
.get_finding_metadata(self.suppress_redundant)
.context("Failed to get match group metadata from datastore")?;

// How many findings were suppressed due to their status not matching?
let mut num_suppressed_for_status: usize = 0;

// How many findings were suppressed due to their status not matching?
let mut num_suppressed_for_score: usize = 0;

// Suppress findings with non-matching status
if let Some(status) = self.finding_status {
group_metadata.retain(|md| {
if statuses_match(status, md.statuses.0.as_slice()) {
true
} else {
num_suppressed_for_status += 1;
false
}
})
}
let old_len = group_metadata.len();
group_metadata.retain(|md| statuses_match(status, md.statuses.0.as_slice()));
let num_suppressed = old_len - group_metadata.len();

// Suppress findings with non-matching score
if let Some(min_score) = self.min_score {
group_metadata.retain(|md| match md.mean_score {
Some(mean_score) if mean_score < min_score => {
num_suppressed_for_score += 1;
false
}
_ => true,
})
}

if num_suppressed_for_status > 0 {
let finding_status = self.finding_status.unwrap();

if num_suppressed_for_status == 1 {
if num_suppressed == 1 {
info!(
"Note: 1 finding with status not matching {finding_status} was suppressed; \
rerun without `--finding-status={finding_status}` to show it"
"Note: 1 finding with status not matching {status} was suppressed; \
rerun without `--finding-status={status}` to show it"
);
} else {
} else if num_suppressed > 1 {
info!(
"Note: {num_suppressed_for_status} findings with status not matching \
`{finding_status}` were suppressed; \
rerun without `--finding-status={finding_status}` to show them"
"Note: {num_suppressed} findings with status not matching \
`{status}` were suppressed; \
rerun without `--finding-status={status}` to show them"
);
}
}

if num_suppressed_for_score > 0 {
let min_score = self.min_score.unwrap();
// Suppress findings with non-matching score
if let Some(min_score) = self.min_score {
let old_len = group_metadata.len();
group_metadata.retain(|md| md.mean_score.map(|s| s >= min_score).unwrap_or(true));
let num_suppressed = old_len - group_metadata.len();

if num_suppressed_for_status == 1 {
if num_suppressed == 1 {
info!(
"Note: 1 finding with meanscore less than {min_score} was suppressed; \
rerun with `--min-score=0` to show it"
);
} else {
} else if num_suppressed > 1 {
info!(
"Note: {num_suppressed_for_score} findings with mean score less than \
"Note: {num_suppressed} findings with mean score less than \
{min_score} were suppressed; \
rerun with `--min-score=0` to show them"
);
Expand All @@ -162,7 +149,12 @@ impl DetailsReporter {
fn get_matches(&self, metadata: &FindingMetadata) -> Result<Vec<ReportMatch>> {
Ok(self
.datastore
.get_finding_data(metadata, self.max_matches)
.get_finding_data(
metadata,
self.max_matches,
self.max_provenance,
self.suppress_redundant,
)
.with_context(|| format!("Failed to get matches for finding {metadata:?}"))
.expect("should be able to find get matches for finding")
.into_iter()
Expand All @@ -182,6 +174,10 @@ impl DetailsReporter {
self.styles.style_heading.apply_to(val)
}

fn style_id<D>(&self, val: D) -> StyledObject<D> {
self.styles.style_id.apply_to(val)
}

fn style_match<D>(&self, val: D) -> StyledObject<D> {
self.styles.style_match.apply_to(val)
}
Expand Down Expand Up @@ -286,6 +282,9 @@ struct ReportMatch {

/// An optional status assigned to the match
status: Option<Status>,

/// The match structural IDs that this match is considered redundant to
redundant_to: Vec<String>,
}

impl From<FindingDataEntry> for ReportMatch {
Expand All @@ -297,6 +296,7 @@ impl From<FindingDataEntry> for ReportMatch {
score: e.match_score,
comment: e.match_comment,
status: e.match_status,
redundant_to: e.redundant_to,
}
}
}
Expand Down
23 changes: 17 additions & 6 deletions crates/noseyparker-cli/src/cmd_report/human_format.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,9 @@ impl DetailsReporter {
let finding = Finding { metadata, matches };
writeln!(
&mut writer,
"{}",
self.style_finding_heading(format!("Finding {finding_num}/{num_findings}"))
"{} (id {})",
self.style_finding_heading(format!("Finding {finding_num}/{num_findings}")),
self.style_id(&finding.metadata.finding_id),
)?;
writeln!(&mut writer, "{}", PrettyFinding(self, &finding))?;
}
Expand Down Expand Up @@ -90,7 +91,7 @@ impl<'a> Display for PrettyFinding<'a> {
f,
"{}",
reporter.style_heading(format!(
"Showing {}/{} occurrences:",
"Showing {}/{} matches:",
finding.num_matches_available(),
finding.total_matches()
))
Expand All @@ -109,14 +110,25 @@ impl<'a> Display for PrettyFinding<'a> {
score,
comment,
status,
redundant_to,
} = rm;

writeln!(
f,
"{}",
reporter.style_heading(format!("Occurrence {i}/{}", finding.total_matches())),
"{} (id {})",
reporter.style_heading(format!("Match {i}/{}", finding.total_matches())),
reporter.style_id(&m.structural_id),
)?;

if !redundant_to.is_empty() {
writeln!(
f,
"{} {}",
reporter.style_heading("Redundant to:"),
redundant_to.join(", "),
)?;
}

// write out match status if set
if let Some(status) = status {
let status = match status {
Expand Down Expand Up @@ -145,7 +157,6 @@ impl<'a> Display for PrettyFinding<'a> {
)
};

// FIXME: limit the total number of provenance entries displayed
for p in provenance.iter() {
match p {
Provenance::File(e) => {
Expand Down
3 changes: 3 additions & 0 deletions crates/noseyparker-cli/src/cmd_report/styles.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ pub struct Styles {
pub style_heading: Style,
pub style_match: Style,
pub style_metadata: Style,
pub style_id: Style,
}

impl Styles {
Expand All @@ -23,13 +24,15 @@ impl Styles {
let style_heading = Style::new().bold().force_styling(styles_enabled);
let style_match = Style::new().yellow().force_styling(styles_enabled);
let style_metadata = Style::new().bright().blue().force_styling(styles_enabled);
let style_id = Style::new().bright().green().force_styling(styles_enabled);

Self {
style_finding_heading,
style_rule,
style_heading,
style_match,
style_metadata,
style_id,
}
}
}
6 changes: 4 additions & 2 deletions crates/noseyparker-cli/src/cmd_scan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -549,7 +549,7 @@ pub fn run(global_args: &args::GlobalArgs, args: &args::ScanArgs) -> Result<()>
.unwrap()
.context("Failed to enumerate inputs")?;

let (datastore, num_matches, num_new_matches) = datastore_thread
let (mut datastore, num_matches, num_new_matches) = datastore_thread
.join()
.unwrap()
.context("Failed to save results to the datastore")?;
Expand All @@ -561,6 +561,8 @@ pub fn run(global_args: &args::GlobalArgs, args: &args::ScanArgs) -> Result<()>

progress.finish();

datastore.check_match_redundancies()?;

// ---------------------------------------------------------------------------------------------
// Finalize and report
// ---------------------------------------------------------------------------------------------
Expand Down Expand Up @@ -614,7 +616,7 @@ pub fn run(global_args: &args::GlobalArgs, args: &args::ScanArgs) -> Result<()>
.get_summary()
.context("Failed to get finding summary")
.unwrap();
let table = crate::cmd_summarize::summary_table(&summary);
let table = crate::cmd_summarize::summary_table(&summary, /* simple= */ true);
println!();
table.print_tty(global_args.use_color(std::io::stdout()))?;
}
Expand Down
Loading

0 comments on commit 882f6b4

Please sign in to comment.