MRG: add Manifest::intersect_manifest to Rust core #3305

Merged: 40 commits, Sep 21, 2024
6b1d1bb
make error catchable
ctb Aug 21, 2024
7e73e8f
flag another problematic unwrap
ctb Aug 21, 2024
2896c49
more
ctb Aug 21, 2024
45b1a8f
more
ctb Aug 21, 2024
e8d7999
upd log message
ctb Aug 21, 2024
53bcf02
provide correct error
ctb Aug 21, 2024
d467d9f
switch Manifest from paths to TryFrom
ctb Aug 22, 2024
5f1eef6
Revert "provide correct error"
ctb Aug 22, 2024
eb46ecd
Revert "switch Manifest from paths to TryFrom"
ctb Aug 22, 2024
923af44
poor person's picklist?
ctb Aug 23, 2024
315dfff
add picklist select
ctb Aug 23, 2024
1fb658a
ok
ctb Aug 23, 2024
a122982
picklist by ref
ctb Aug 23, 2024
39816ab
add manifest.is_empty()
ctb Aug 23, 2024
61416fb
update revindex indexing message
ctb Aug 24, 2024
3a7abe9
propagate error on bad directory when opening RocksDB
ctb Aug 24, 2024
07e3a09
Merge branch 'update_revindex' into remove_unwrap
ctb Aug 24, 2024
fe75c6c
do we not need len?
ctb Aug 24, 2024
5fb20fc
nope, don't need em
ctb Aug 24, 2024
2c59010
revert for now
ctb Aug 24, 2024
c8477df
Merge branch 'latest' of github.com:sourmash-bio/sourmash into remove…
ctb Aug 27, 2024
6fee403
adjust select_picklist per luiz
ctb Aug 27, 2024
0675402
simplify and encapsulate
ctb Aug 27, 2024
17f50ef
cargo fmt
ctb Aug 27, 2024
39c140b
add a test of intersect_manifest
ctb Sep 15, 2024
16f72c5
impl PartialEq/Eq for Record, ignoring internal_location
ctb Sep 15, 2024
2146f30
switch to using full Record for intersect_manifest
ctb Sep 16, 2024
32839ae
round out comparison & hashing for Record
ctb Sep 16, 2024
43ee757
cargo fmt
ctb Sep 16, 2024
31aa378
Merge branch 'latest' into remove_unwrap
ctb Sep 16, 2024
1594dc4
remove identity closure
ctb Sep 16, 2024
2465c02
add in a print for debugging
ctb Sep 16, 2024
5af5f45
more print
ctb Sep 16, 2024
b46565d
remove prints
ctb Sep 16, 2024
b7a9850
Merge branch 'latest' of github.com:sourmash-bio/sourmash into remove…
ctb Sep 17, 2024
9c26752
add a test for Collection::intersect_manifest
ctb Sep 17, 2024
f2fde5d
cargo fmt
ctb Sep 17, 2024
f4e89f4
modify intersect_manifest signature
ctb Sep 18, 2024
9b9e17e
mut Collection for intersect_manifest
ctb Sep 18, 2024
aa109f8
omit unit return type
ctb Sep 18, 2024
31 changes: 31 additions & 0 deletions src/core/src/collection.rs
@@ -215,6 +215,10 @@
        assert_eq!(sig.signatures.len(), 1);
        Ok(sig)
    }

    pub fn intersect_manifest(&mut self, mf: &Manifest) {
        self.manifest = self.manifest.intersect_manifest(mf);
    }
}

Check warning on line 219 in src/core/src/collection.rs (Codecov / codecov/patch): Added line #L219 was not covered by tests
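
The `&mut self` signature above narrows a collection in place by swapping in the intersected manifest. A minimal sketch of that delegation pattern follows. `Mf` and `Coll` are hypothetical stand-ins that use `u32` ids instead of `Record`s; they are not sourmash's real types.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a manifest: a list of record ids.
#[derive(Debug, Clone, PartialEq)]
pub struct Mf {
    records: Vec<u32>,
}

impl Mf {
    pub fn new(records: Vec<u32>) -> Self {
        Self { records }
    }

    // Return a new manifest holding only the ids present in both manifests.
    pub fn intersect(&self, other: &Mf) -> Mf {
        let wanted: HashSet<&u32> = other.records.iter().collect();
        let records = self
            .records
            .iter()
            .filter(|id| wanted.contains(id))
            .cloned()
            .collect();
        Mf { records }
    }
}

// Hypothetical stand-in for a collection that owns a manifest.
pub struct Coll {
    manifest: Mf,
}

impl Coll {
    pub fn new(manifest: Mf) -> Self {
        Self { manifest }
    }

    pub fn len(&self) -> usize {
        self.manifest.records.len()
    }

    // Replace the owned manifest with its intersection; nothing is returned.
    pub fn intersect_manifest(&mut self, mf: &Mf) {
        self.manifest = self.manifest.intersect(mf);
    }
}
```

Mutating in place means the caller keeps one binding for the collection and needs it to be `mut`, which matches the `let mut cl` in the test added by this PR.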

impl Select for Collection {
@@ -233,10 +237,11 @@
    use super::Collection;

    use crate::encodings::HashFunctions;
    use crate::manifest::Manifest;
    use crate::prelude::Select;
    use crate::selection::Selection;
    use crate::signature::Signature;
    use crate::Result;

Check warning on line 244 in src/core/src/collection.rs (GitHub Actions / test: beta, stable, macos, windows): unused import: `crate::Result`

    #[test]
    fn sigstore_selection_with_downsample() {
@@ -358,6 +363,32 @@
        assert_eq!(cl.len(), 0);
    }

    #[test]
    fn collection_intersect_manifest() {
        // load test sigs
        let mut filename = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
        // four num=500 sigs
        filename.push("../../tests/test-data/genome-s11.fa.gz.sig");
        let file = File::open(filename).unwrap();
        let reader = BufReader::new(file);
        let sigs: Vec<Signature> = serde_json::from_reader(reader).expect("Loading error");
        assert_eq!(sigs.len(), 4);
        // load sigs into collection + select compatible signatures
        let mut cl = Collection::from_sigs(sigs).unwrap();
        // all sigs should remain
        assert_eq!(cl.len(), 4);

        // grab first record
        let manifest = cl.manifest();
        let record = manifest.iter().next().unwrap().clone();
        let vr = vec![record];

        // now intersect:
        let manifest2 = Manifest::from(vr);
        cl.intersect_manifest(&manifest2);
        assert_eq!(cl.len(), 1);
    }

    #[test]
    fn sigstore_sig_from_record() {
        // load test sigs
105 changes: 104 additions & 1 deletion src/core/src/manifest.rs
@@ -1,4 +1,6 @@
use std::collections::HashSet;
use std::fs::File;
use std::hash::{Hash, Hasher};
use std::io::{BufRead, BufReader, Read, Write};
use std::ops::Deref;

@@ -17,7 +19,7 @@

/// Individual manifest record, containing information about sketches.

-#[derive(Debug, Serialize, Deserialize, Clone, CopyGetters, Getters, Setters, PartialEq, Eq)]
+#[derive(Debug, Serialize, Deserialize, Clone, CopyGetters, Getters, Setters)]
pub struct Record {
    #[getset(get = "pub", set = "pub")]
    internal_location: PathBuf,
@@ -176,6 +178,37 @@
    }
}

impl PartialEq for Record {
    // match everything but internal_location
    fn eq(&self, other: &Self) -> bool {
        self.md5 == other.md5
            && self.ksize == other.ksize
            && self.moltype == other.moltype
            && self.scaled == other.scaled
            && self.num == other.num
            && self.n_hashes == other.n_hashes
            && self.with_abundance == other.with_abundance
            && self.name == other.name
            && self.filename == other.filename
    }
}

impl Eq for Record {}

impl Hash for Record {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.md5.hash(state);
        self.ksize.hash(state);
        self.moltype.hash(state);
        self.scaled.hash(state);
        self.num.hash(state);
        self.n_hashes.hash(state);
        self.with_abundance.hash(state);
        self.name.hash(state);
        self.filename.hash(state);
    }
}

Check warnings in src/core/src/manifest.rs (Codecov / codecov/patch): Added lines #L199, #L201, and #L204 - L206 were not covered by tests
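
The hand-written `PartialEq` and `Hash` impls above skip `internal_location` and cover the same set of fields, which is what keeps them consistent for use in a `HashSet`. A minimal sketch of the effect follows, assuming a hypothetical, cut-down `Rec` type with only two identity fields (it is not the real `Record`):

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

// Hypothetical, simplified stand-in for `Record`.
#[derive(Debug, Clone)]
pub struct Rec {
    internal_location: PathBuf, // deliberately ignored by eq() and hash()
    md5: String,
    ksize: u32,
}

impl PartialEq for Rec {
    // Compare everything except internal_location.
    fn eq(&self, other: &Self) -> bool {
        self.md5 == other.md5 && self.ksize == other.ksize
    }
}

impl Eq for Rec {}

impl Hash for Rec {
    // Hash exactly the fields that eq() compares, so equal values hash equally.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.md5.hash(state);
        self.ksize.hash(state);
    }
}

// Small constructor helper (hypothetical, for the example only).
pub fn rec(location: &str, md5: &str, ksize: u32) -> Rec {
    Rec {
        internal_location: PathBuf::from(location),
        md5: md5.to_string(),
        ksize,
    }
}

// Count distinct records under the custom equality.
pub fn distinct_count(recs: &[Rec]) -> usize {
    recs.iter().collect::<HashSet<_>>().len()
}
```

With these impls, two records that differ only in where they are stored count as the same record, while a difference in any compared field (here `ksize`) keeps them distinct.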

impl Manifest {
    pub fn from_reader<R: Read>(rdr: R) -> Result<Self> {
        let mut records = vec![];
@@ -209,6 +242,20 @@
    pub fn iter(&self) -> impl Iterator<Item = &Record> {
        self.records.iter()
    }

    pub fn intersect_manifest(&self, other: &Manifest) -> Self {
        // collect references to the records in the other manifest:
        let pairs: HashSet<_> = other.iter().collect();

        let records = self
            .records
            .iter()
            .filter(|row| pairs.contains(row))
            .cloned()
            .collect();

        Self { records }
    }
}

Check warning on line 253 in src/core/src/manifest.rs (Codecov / codecov/patch): Added line #L253 was not covered by tests
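
The method above builds a `HashSet` of references from one side and filters the other side against it. The same technique can be sketched generically; `intersect` below is a hypothetical free function written for illustration, not part of the sourmash API.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep the items of `ours` that also occur in `theirs`.
/// The order of `ours` is preserved, and duplicates in `ours` are kept.
pub fn intersect<T: Eq + Hash + Clone>(ours: &[T], theirs: &[T]) -> Vec<T> {
    // Borrow the other side into a set so each lookup is O(1) on average.
    let wanted: HashSet<&T> = theirs.iter().collect();

    ours.iter()
        .filter(|item| wanted.contains(item))
        .cloned()
        .collect()
}
```

Only the surviving items are cloned, and the result follows the order of the left-hand side, so `intersect(&a, &b)` and `intersect(&b, &a)` can differ in order even when they contain the same items.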

impl Select for Manifest {
@@ -521,4 +568,60 @@
        let scaled100 = manifest.select(&selection).unwrap();
        assert_eq!(scaled100.len(), 6);
    }

    #[test]
    fn manifest_intersect() {
        let temp_dir = TempDir::new().unwrap();
        let utf8_output = PathBuf::from_path_buf(temp_dir.path().to_path_buf())
            .expect("Path should be valid UTF-8");
        let filename = utf8_output.join("sig-pathlist.txt");
        // build sig filenames
        let base_path = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
        let test_sigs = vec![
            "../../tests/test-data/47.fa.sig",
            "../../tests/test-data/63.fa.sig",
        ];

        let full_paths: Vec<_> = test_sigs
            .into_iter()
            .map(|sig| base_path.join(sig))
            .collect();

        // write a file in test directory with a filename on each line
        let mut pathfile = File::create(&filename).unwrap();
        for sigfile in &full_paths {
            writeln!(pathfile, "{}", sigfile).unwrap();
        }

        // load into manifest
        let manifest = Manifest::from(&filename);
        assert_eq!(manifest.len(), 2);

        // now do just one sketch -
        let test_sigs2 = vec!["../../tests/test-data/63.fa.sig"];

        let filename2 = utf8_output.join("sig-pathlist-single.txt");

        let full_paths: Vec<_> = test_sigs2
            .into_iter()
            .map(|sig| base_path.join(sig))
            .collect();

        let mut pathfile2 = File::create(&filename2).unwrap();
        for sigfile in &full_paths {
            writeln!(pathfile2, "{}", sigfile).unwrap();
        }

        // load into another manifest
        let manifest2 = Manifest::from(&filename2);
        assert_eq!(manifest2.len(), 1);
        // intersect the single-sketch manifest with the two-sketch manifest => single.
        let new_mf = manifest2.intersect_manifest(&manifest);
        assert_eq!(new_mf.len(), 1);

        // intersect the two-sketch manifest with the single-sketch manifest => single.
        let new_mf = manifest.intersect_manifest(&manifest2);
        assert_eq!(new_mf.len(), 1);
    }
}