Skip to content

Commit

Permalink
MRG: panic when FSStorage::load_sig encounters more than one `Signa…
Browse files Browse the repository at this point in the history
…ture` in a JSON record (#3333)

This PR was originally about debugging
sourmash-bio/sourmash_plugin_branchwater#445,
but that's going to require more work to fix properly. For now, I would
like to nominate it for merge because sourmash fails silently in this
situation, and that's Bad.

In brief, the main thing this PR does is panic with an `unimplemented!`
when `FSStorage::load_sig` encounters more than one `Signature` in a
JSON record.

This PR also adds a bit of documentation to `InnerStorage`, per the
bottom of [this
comment](sourmash-bio/sourmash_plugin_branchwater#445 (comment)).

---

The problem at hand: when loading a `SigStore`/`Signature` from a
`Storage`, sourmash only loads the first one and ignores any others.


https://github.com/sourmash-bio/sourmash/blob/26b50f3e3566006fd6356a4f8b4d47c5e381aeec/src/core/src/storage/mod.rs#L34-L38

This results from the concept of a `Signature` as containing one or more
sketches; the history of this is described
[here](#616 (comment)),
and it leads to some interesting silliness [in the Python
layer](https://github.com/sourmash-bio/sourmash/blob/d63c464e825529fa54bb7e8b81faa53b858b09de/src/sourmash/save_load.py#L297).

The contrapositive is that, in Rust, a single `Signature` can include
multiple sketches, e.g. with different ksizes. So this works fine for
the wort case where we have a single `.sig` file with k=21, k=31, k51.

Note that the Python layer (and hence the entire sourmash CLI) fully
supports multiple `Signature`s in JSON: this is well tested and well
covered behavior. The branchwater plugin runs into it because it is
using the Rust layer and the API is not fully fleshed out there.

---
  • Loading branch information
ctb authored Nov 4, 2024
1 parent f707db4 commit c9fb078
Show file tree
Hide file tree
Showing 3 changed files with 43 additions and 9 deletions.
17 changes: 17 additions & 0 deletions src/core/src/collection.rs
Original file line number Diff line number Diff line change
Expand Up @@ -416,6 +416,23 @@ mod test {
}
}

#[test]
#[should_panic] // for now...
fn sigstore_sig_from_record_2() {
let mut filename = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
filename.push("../../tests/test-data/short.sig.gz");
let v = [filename];
let collection = Collection::from_paths(&v).expect("no sigs!?");

// pull off first record
let v: Vec<_> = collection.iter().collect();
let (_idx, rec) = v.first().expect("no records in collection?!");

// this will panic with "unimplemented" because there are two
// sketches and that is not supported.
let _first_sig = collection.sig_from_record(rec).expect("no sig!?");
}

#[test]
fn sigstore_selection_moltype_zip() {
// load test sigs
Expand Down
35 changes: 26 additions & 9 deletions src/core/src/storage/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -34,9 +34,11 @@ pub trait Storage {
/// Load signature from internal path
fn load_sig(&self, path: &str) -> Result<SigStore> {
let raw = self.load(path)?;
let sig = Signature::from_reader(&mut &raw[..])?
// TODO: select the right sig?
.swap_remove(0);
let mut vs = Signature::from_reader(&mut &raw[..])?;
if vs.len() > 1 {
unimplemented!("only one Signature currently allowed");
}
let sig = vs.swap_remove(0);

Ok(sig.into())
}
Expand Down Expand Up @@ -70,6 +72,16 @@ pub enum StorageError {
MissingFeature(String, String),
}

/// InnerStorage: a catch-all type that allows using any Storage in
/// parallel contexts.
///
/// Arc allows ref counting to share it between threads;
/// RwLock makes sure there is only one writer possible (and a lot of readers);
/// dyn Storage so we can init with anything that implements the Storage trait.

// Send + Sync + 'static is kind of a cheat to avoid lifetimes issues: we
// should get rid of that 'static if possible... -- Luiz.

#[derive(Clone)]
pub struct InnerStorage(Arc<RwLock<dyn Storage + Send + Sync + 'static>>);

Expand Down Expand Up @@ -299,9 +311,12 @@ impl Storage for FSStorage {

fn load_sig(&self, path: &str) -> Result<SigStore> {
let raw = self.load(path)?;
let sig = Signature::from_reader(&mut &raw[..])?
// TODO: select the right sig?
.swap_remove(0);

let mut vs = Signature::from_reader(&mut &raw[..])?;
if vs.len() > 1 {
unimplemented!("only one Signature currently allowed when using 'load_sig'");
}
let sig = vs.swap_remove(0);

Ok(sig.into())
}
Expand Down Expand Up @@ -369,9 +384,11 @@ impl Storage for ZipStorage {

fn load_sig(&self, path: &str) -> Result<SigStore> {
let raw = self.load(path)?;
let sig = Signature::from_reader(&mut &raw[..])?
// TODO: select the right sig?
.swap_remove(0);
let mut vs = Signature::from_reader(&mut &raw[..])?;
if vs.len() > 1 {
unimplemented!("only one Signature currently allowed");
}
let sig = vs.swap_remove(0);

Ok(sig.into())
}
Expand Down
Binary file added tests/test-data/short.sig.gz
Binary file not shown.

0 comments on commit c9fb078

Please sign in to comment.