Add probability of overlap and weighted containment for Multisearch matches #458

Merged on Nov 12, 2024 (94 commits)


olgabot (Contributor) commented Sep 28, 2024

As originally explored in this notebook, some high-containment hits result from highly frequent k-mers, and I want to downweight the containment by the probability of overlap. As I imagine it now, the frequencies would be computed across all queries and all database signatures.

Here is a worked example; please let me know if I am missing something:

query: ACGTTTTT

3-mers (6 total):

  1. ACG
  2. CGT
  3. GTT
  4. TTT x 3

target: TTTTTTTTTAC

3-mers (9 total):

  1. TTT x 7
  2. TTA
  3. TAC

Containment:

intersecting k-mers in query / intersecting k-mers in target

= 3/7

Probability of overlap:

frequency of intersecting k-mers in query * frequency of intersecting k-mers in target

= 3/6 * 7/9

= 1/2 * 7/9

= 7/18

Weighted containment:

Containment / Probability of overlap = Containment * (1/Probability of overlap)

= 3/7 * 18/7

= 54/49

Update: fix k-mers in example
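
The worked example can be checked by enumerating the 3-mers directly from the two sequences. This is only a sketch under the definitions stated in the example (containment as shared k-mer occurrences in the query over shared occurrences in the target, and weighted containment as containment divided by the probability of overlap); the function names are hypothetical and are not part of the plugin.

```python
from collections import Counter
from fractions import Fraction


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every k-mer occurrence in seq, using overlapping windows."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def weighted_containment(query: str, target: str, k: int = 3):
    """Return (containment, prob_overlap, weighted) as exact fractions."""
    q, t = kmer_counts(query, k), kmer_counts(target, k)
    shared = q.keys() & t.keys()
    if not shared:
        raise ValueError("query and target share no k-mers")
    q_shared = sum(q[km] for km in shared)  # shared k-mer occurrences in query
    t_shared = sum(t[km] for km in shared)  # shared k-mer occurrences in target
    containment = Fraction(q_shared, t_shared)
    # frequency of shared k-mers in the query times their frequency in the target
    prob_overlap = Fraction(q_shared, sum(q.values())) * Fraction(t_shared, sum(t.values()))
    return containment, prob_overlap, containment / prob_overlap


QUERY = "ACGTTTTT"
TARGET = "T" * 9 + "AC"  # TTTTTTTTTAC
containment, prob_overlap, weighted = weighted_containment(QUERY, TARGET)
print(dict(kmer_counts(QUERY)), dict(kmer_counts(TARGET)))
print(containment, prob_overlap, weighted)
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against the hand calculation.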

olgabot (Contributor, Author) commented Sep 30, 2024

This function uses logs of probabilities to prevent underflow, but apparently Rust log calculations, e.g. ln() for natural log, have unspecified precision, with non-deterministic outputs? This seems Really Bad(TM) for consistent results across different multisearch runs...

Source: https://doc.rust-lang.org/std/primitive.f64.html#method.ln

pub fn ln(self) -> f64

Returns the natural logarithm of the number.

Unspecified precision

The precision of this function is non-deterministic. This means it varies by platform, Rust version, and can even differ within the same execution from one invocation to the next.

Examples

let one = 1.0_f64;
// e^1
let e = one.exp();

// ln(e) - 1 == 0
let abs_difference = (e.ln() - 1.0).abs();

assert!(abs_difference < 1e-10);

EDIT: update code formatting
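
For context on why log-probabilities are used at all, here is a small Python illustration of the underflow problem. It does not address the Rust `ln()` precision question; the probabilities are made-up values chosen only to trigger underflow.

```python
import math

# Hypothetical per-k-mer probabilities: 100 values of 1e-5 (illustrative only).
probs = [1e-5] * 100

# Multiplying directly underflows: the true product, 1e-500, is far below
# the smallest positive double, so the result collapses to zero.
product = math.prod(probs)

# Summing logs keeps the same information in a representable range.
log_sum = math.fsum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Once the product has collapsed to zero, two very different probabilities can no longer be ranked, whereas their log-sums still compare correctly.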

olgabot (Contributor, Author) commented Oct 1, 2024

The example dataset for test_multisearch.py:test_simple_no_ani shows some interesting behavior for self-similarity using "adjusted" containment, where containment_adjusted = containment / prob_overlap_adjusted. The probability was adjusted with a simple, strict Bonferroni correction: prob_overlap_adjusted = prob_overlap * n_comparisons

While the containment varies from ~0.48 to 1, the containment_adjusted varies from ~2378 to ~4611.

| query_name | query_md5 | match_name | match_md5 | containment | max_containment | jaccard | intersect_hashes | prob_overlap | prob_overlap_adjusted | containment_adjusted | containment_adjusted_log10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome | f3a90d4e5528864a5bcc8434b0d0c3b1 | CP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome | f3a90d4e5528864a5bcc8434b0d0c3b1 | 1 | 1 | 1 | 2701 | 2.41E-05 | 0.0002168808804 | 4610.82599 | 3.663778732 |
| NC_009661.1 Shewanella baltica OS185 plasmid pS18501, complete genome | 09a08691ce52952152f0e866a59f6261 | NC_011665.1 Shewanella baltica OS223 plasmid pS22303, complete sequence | 38729c6374925585db28916b82a6f513 | 0.4885068573 | 0.4885068573 | 0.3206949024 | 2529 | 2.26E-05 | 0.0002030698802 | 2405.609619 | 3.381225152 |
| NC_011665.1 Shewanella baltica OS223 plasmid pS22303, complete sequence | 38729c6374925585db28916b82a6f513 | NC_011665.1 Shewanella baltica OS223 plasmid pS22303, complete sequence | 38729c6374925585db28916b82a6f513 | 1 | 1 | 1 | 5238 | 4.67E-05 | 0.0004205931327 | 2377.594693 | 3.376137823 |
| NC_009661.1 Shewanella baltica OS185 plasmid pS18501, complete sequence | 09a08691ce52952152f0e866a59f6261 | NC_009661.1 Shewanella baltica OS185 plasmid pS18501, complete sequence | 09a08691ce52952152f0e866a59f6261 | 1 | 1 | 1 | 5177 | 4.62E-05 | 0.0004156950454 | 2405.609619 | 3.381225152 |
| NC_011665.1 Shewanella baltica OS223 plasmid pS22303, complete sequence | 38729c6374925585db28916b82a6f513 | NC_009661.1 Shewanella baltica OS185 plasmid pS18501, complete sequence | 09a08691ce52952152f0e866a59f6261 | 0.4828178694 | 0.4885068573 | 0.3206949024 | 2529 | 2.26E-05 | 0.0002030698802 | 2377.594693 | 3.376137823 |
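
The adjustment can be reproduced approximately from the printed columns. In this sketch the function name is hypothetical, and `N_COMPARISONS = 9` is an assumption inferred from the ratio of prob_overlap_adjusted to prob_overlap in the output. Because prob_overlap is shown rounded to three significant figures, the results match the table only to about three digits.

```python
import math


def adjust(containment: float, prob_overlap: float, n_comparisons: int):
    """Bonferroni-adjust the overlap probability, then rescale containment."""
    prob_overlap_adjusted = prob_overlap * n_comparisons
    containment_adjusted = containment / prob_overlap_adjusted
    return prob_overlap_adjusted, containment_adjusted, math.log10(containment_adjusted)


# Assumption: inferred from prob_overlap_adjusted / prob_overlap in the output.
N_COMPARISONS = 9

# (containment, prob_overlap as printed, i.e. rounded)
rows = {
    "Akkermansia self-match": (1.0, 2.41e-05),
    "OS185 vs OS223": (0.4885068573, 2.26e-05),
}
results = {name: adjust(c, p, N_COMPARISONS) for name, (c, p) in rows.items()}
for name, (p_adj, c_adj, c_log10) in results.items():
    print(f"{name}: prob_adj={p_adj:.4e} containment_adj={c_adj:.1f} log10={c_log10:.3f}")
```

Dividing by such small probabilities is what pushes the adjusted values into the thousands, so the log10 column is the easier one to compare across matches.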

olgabot (Contributor, Author) commented Oct 1, 2024

As a follow-up, from this notebook, I compared all human GENCODE proteins vs Botryllus schlosseri proteins. I was experimenting with how to avoid very common k-mers and, in particular, hits to Titin, the largest known protein with 25,000 - 35,000 amino acids per protein (!!!). The method takes the frequency of k-mers across all queries and againsts, subsets to only the k-mers that overlap between a single query and a single against, multiplies each pair of frequencies, and takes the sum. It was successful in getting rid of the spurious matches to Titin.

However, this method doesn't take length of the query or against into account. It only uses the frequencies of the k-mers across all queries/againsts.
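
A minimal sketch of that calculation on toy sequences follows. It assumes one frequency table built over all queries and a separate one over all againsts, which is my reading of "each pair"; the helper names and the toy data are hypothetical, and the actual PR works on sketch hashes rather than raw k-mers.

```python
from collections import Counter
from fractions import Fraction


def kmers(seq: str, k: int = 3) -> list:
    """All overlapping k-mers of seq, duplicates included."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequencies(seqs, k: int = 3) -> dict:
    """Frequency of each k-mer across a whole collection of sequences."""
    counts = Counter(km for s in seqs for km in kmers(s, k))
    total = sum(counts.values())
    return {km: Fraction(c, total) for km, c in counts.items()}


def prob_overlap(query: str, against: str, q_freq: dict, a_freq: dict, k: int = 3) -> Fraction:
    """Sum over shared k-mers of (frequency in queries * frequency in againsts)."""
    shared = set(kmers(query, k)) & set(kmers(against, k))
    return sum((q_freq[km] * a_freq[km] for km in shared), Fraction(0))


queries = ["ACGTTTTT", "ACGACG"]     # toy data
againsts = ["T" * 9 + "AC", "CGACGA"]  # toy data
q_freq = kmer_frequencies(queries)
a_freq = kmer_frequencies(againsts)

p_repeat = prob_overlap(queries[0], againsts[0], q_freq, a_freq)  # shares only TTT
p_mixed = prob_overlap(queries[1], againsts[1], q_freq, a_freq)   # shares ACG, CGA, GAC
print(p_repeat, p_mixed)
```

The pair that overlaps only through the repetitive TTT k-mer gets the larger overlap probability, so dividing containment by it downweights that match the most. As noted above, neither sequence length enters the calculation.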

Here are some plots to show the distribution of p-values, containment, and adjusted containment:

[Figure: Adjusted p-value distribution]

[Figure: Containment (original)]

[Figure: Containment adjusted, log10]

I think the bump to the left is all false positives, caused by spurious matches from very common k-mers.

olgabot (Contributor, Author) commented Nov 12, 2024

Hi @ctb, I have addressed your requests, but the tests are still failing and I have some questions.

Why does Cargo.lock change the source for sourmash from the sourmash git repo to the rust-lang registry?

My local Cargo.lock file has the differences below. I haven't committed them because they don't make sense to me. Do you know what may be going on?

diff --git a/Cargo.lock b/Cargo.lock
index 72ed890..dfa863d 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1796,7 +1796,8 @@ checksum = "bceb57dc07c92cdae60f5b27b3fa92ecaaa42fe36c55e22dbfb0b44893e0b1f7"
 [[package]]
 name = "sourmash"
 version = "0.17.0"
-source = "git+https://github.com/sourmash-bio/sourmash.git?branch=latest#c7363154b546058eb417b78bb77aca6523591cb1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8ce05fed73303390f6f208d6640f390cd999db0af0b6c007d60db2794ad5fcc0"
 dependencies = [
  "az",
  "byteorder",

Help with test_fastgather.py:test_against_multisigfile failing due to not implemented: only one Signature currently allowed when using 'load_sig'

All tests are passing except test_fastgather.py:test_against_multisigfile (which is explored here: #445 -- so I'm not the only one who is confused!). It fails on GitHub Actions (but not on my local machine) with pyo3_runtime.PanicException: not implemented: only one Signature currently allowed when using 'load_sig' (full error message below). But I can't see any differences between my branch and main for src/fastgather.rs. The error seems to originate with the fix from PR sourmash-bio/sourmash#3333 -- how should we proceed?

$ git diff olgabot/multisearch-evalue..main -- src/fastgather.rs | wc -l
0
Error output
============================= test session starts ==============================
platform linux -- Python 3.12.7, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater
configfile: pyproject.toml
collected 398 items

src/python/tests/test_cluster.py ................                        [  4%]
src/python/tests/test_fastgather.py .................................... [ 13%]
.F....................                                                   [ 18%]
src/python/tests/test_fastmultigather.py ............................... [ 26%]
.........................................                                [ 36%]
src/python/tests/test_index.py ...................................       [ 45%]
src/python/tests/test_manysearch.py .................................... [ 54%]
................................                                         [ 62%]
src/python/tests/test_multisearch.py ................................... [ 71%]
..................................................                       [ 83%]
src/python/tests/test_pairwise.py ................................       [ 91%]
src/python/tests/test_sketch.py ................................         [100%]

=================================== FAILURES ===================================
_______________________ test_against_multisigfile[False] _______________________

runtmp = <tests.sourmash_tst_utils.RunnerContext object at 0x7f2c1d89ec30>
zip_against = False

    def test_against_multisigfile(runtmp, zip_against):
        # test against a sigfile that contains multiple sketches
        query = get_test_data("SRR606249.sig.gz")
        against_list = runtmp.output("against.txt")
    
        sig2 = get_test_data("2.fa.sig.gz")
        sig47 = get_test_data("47.fa.sig.gz")
        sig63 = get_test_data("63.fa.sig.gz")
    
        combined = runtmp.output("combined.sig.gz")
        runtmp.sourmash("sig", "cat", sig2, sig47, sig63, "-o", combined)
        make_file_list(against_list, [combined])
    
        if zip_against:
            against_list = zip_siglist(runtmp, against_list, runtmp.output("against.zip"))
    
        g_output = runtmp.output("gather.csv")
        p_output = runtmp.output("prefetch.csv")
    
>       runtmp.sourmash(
            "scripts",
            "fastgather",
            query,
            against_list,
            "-o",
            g_output,
            "--output-prefetch",
            p_output,
            "-s",
            "100000",
        )

src/python/tests/test_fastgather.py:483: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tests.sourmash_tst_utils.RunnerContext object at 0x7f2c1d89ec30>
args = ('scripts', 'fastgather', '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/SRR606249.sig.gz', '/tmp/sourmashtest_hoei6qdi/against.txt', '-o', '/tmp/sourmashtest_hoei6qdi/gather.csv', ...)
kwargs = {'fail_ok': True, 'in_directory': '/tmp/sourmashtest_hoei6qdi'}
cmdlist = ['sourmash', 'scripts', 'fastgather', '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/SRR606249.sig.gz', '/tmp/sourmashtest_hoei6qdi/against.txt', '-o', ...]

    def run_sourmash(self, *args, **kwargs):
        "Run the sourmash script with the given arguments."
        kwargs["fail_ok"] = True
        if "in_directory" not in kwargs:
            kwargs["in_directory"] = self.location
    
        cmdlist = ["sourmash"]
        cmdlist.extend((str(x) for x in args))
        self.last_command = " ".join(cmdlist)
        self.last_result = runscript("sourmash", args, **kwargs)
    
        if self.last_result.status:
>           raise SourmashCommandFailed(self.last_result.err)
E           tests.sourmash_tst_utils.SourmashCommandFailed: 
=> sourmash_plugin_branchwater 0.9.10; cite Irber et al., doi: 10.1101/2022.11.02.514947
E           
E           
ksize: 31 / scaled: 100000 / moltype: DNA / threshold bp: 50000
E           
gathering all sketches in '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/SRR606249.sig.gz' against '/tmp/sourmashtest_hoei6qdi/against.txt' using 4 threads
E           Traceback (most recent call last):
E             File "/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/sourmash_tst_utils.py", line 143, in runscript
E               status = _runscript(scriptname)
E                        ^^^^^^^^^^^^^^^^^^^^^^
E             File "/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/sourmash_tst_utils.py", line 85, in _runscript
E               pkg_resources.load_entry_point("sourmash", "console_scripts", scriptname)()
E             File "/home/runner/miniconda3/envs/sourmash_dev/lib/python3.12/site-packages/sourmash/__main__.py", line 20, in main
E               retval = mainmethod(args)
E                        ^^^^^^^^^^^^^^^^
E             File "/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/sourmash_plugin_branchwater/__init__.py", line 206, in main
E               status = sourmash_plugin_branchwater.do_fastgather(
E                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E           pyo3_runtime.PanicException: not implemented: only one Signature currently allowed when using 'load_sig'

src/python/tests/sourmash_tst_utils.py:220: SourmashCommandFailed
----------------------------- Captured stdout call -----------------------------
running: sourmash in: /tmp/sourmashtest_hoei6qdi
arguments ['sourmash', 'sig', 'cat', '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/2.fa.sig.gz', '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/47.fa.sig.gz', '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/63.fa.sig.gz', '-o', '/tmp/sourmashtest_hoei6qdi/combined.sig.gz']
running: sourmash in: /tmp/sourmashtest_hoei6qdi
arguments ['sourmash', 'scripts', 'fastgather', '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/SRR606249.sig.gz', '/tmp/sourmashtest_hoei6qdi/against.txt', '-o', '/tmp/sourmashtest_hoei6qdi/gather.csv', '--output-prefetch', '/tmp/sourmashtest_hoei6qdi/prefetch.csv', '-s', '100000']
----------------------------- Captured stderr call -----------------------------
Reading query(s) from: '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/SRR606249.sig.gz'
Loaded 1 query signature(s)
Reading search(s) from: '/tmp/sourmashtest_hoei6qdi/against.txt'
SUCCEEDED in loading as JSON files, woot woot
Loaded 3 search signature(s)
using threshold overlap: 1 50000
thread '<unnamed>' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.17.0/src/storage/mod.rs:317:13:
not implemented: only one Signature currently allowed when using 'load_sig'
thread '<unnamed>' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.17.0/src/storage/mod.rs:317:13:
not implemented: only one Signature currently allowed when using 'load_sig'
thread '<unnamed>' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sourmash-0.17.0/src/storage/mod.rs:317:13:
not implemented: only one Signature currently allowed when using 'load_sig'
=============================== warnings summary ===============================
src/python/tests/sourmash_tst_utils.py:10
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/sourmash_tst_utils.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

src/python/tests/test_fastgather.py::test_equal_matches
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_fastgather.py:1346: DeprecatedWarning: save_signatures is deprecated as of 4.8.9 and will be removed in 5.0. use sourmash_args.SaveSignaturesToLocation instead.
    sourmash.save_signatures([ss], open(runtmp.output("a.sig"), "wb"))

src/python/tests/test_fastgather.py::test_equal_matches
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_fastgather.py:1348: DeprecatedWarning: save_signatures is deprecated as of 4.8.9 and will be removed in 5.0. use sourmash_args.SaveSignaturesToLocation instead.
    sourmash.save_signatures([ss], open(runtmp.output("b.sig"), "wb"))

src/python/tests/test_fastgather.py::test_equal_matches
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_fastgather.py:1350: DeprecatedWarning: save_signatures is deprecated as of 4.8.9 and will be removed in 5.0. use sourmash_args.SaveSignaturesToLocation instead.
    sourmash.save_signatures([ss], open(runtmp.output("mg.sig"), "wb"))

src/python/tests/test_fastmultigather.py::test_equal_matches[True]
src/python/tests/test_fastmultigather.py::test_equal_matches[False]
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_fastmultigather.py:2128: DeprecatedWarning: save_signatures is deprecated as of 4.8.9 and will be removed in 5.0. use sourmash_args.SaveSignaturesToLocation instead.
    sourmash.save_signatures([ss], open(runtmp.output("a.sig"), "wb"))

src/python/tests/test_fastmultigather.py::test_equal_matches[True]
src/python/tests/test_fastmultigather.py::test_equal_matches[False]
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_fastmultigather.py:2130: DeprecatedWarning: save_signatures is deprecated as of 4.8.9 and will be removed in 5.0. use sourmash_args.SaveSignaturesToLocation instead.
    sourmash.save_signatures([ss], open(runtmp.output("b.sig"), "wb"))

src/python/tests/test_fastmultigather.py::test_equal_matches[True]
src/python/tests/test_fastmultigather.py::test_equal_matches[False]
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_fastmultigather.py:2132: DeprecatedWarning: save_signatures is deprecated as of 4.8.9 and will be removed in 5.0. use sourmash_args.SaveSignaturesToLocation instead.
    sourmash.save_signatures([ss], open(runtmp.output("mg.sig"), "wb"))

src/python/tests/test_sketch.py::test_manysketch_singleton
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:705: DeprecatedWarning: load_signatures is deprecated as of 3.5.1 and will be removed in 5.0. Use load_file_as_signatures instead.
    ss_sketch = sourmash.load_signatures(singleton_sketch)

src/python/tests/test_sketch.py::test_manysketch_reads
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:768: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig1 = sourmash.load_one_signature(s1)

src/python/tests/test_sketch.py::test_manysketch_reads
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:781: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig2 = sourmash.load_one_signature(s3)

src/python/tests/test_sketch.py::test_manysketch_reads_singleton
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:845: DeprecatedWarning: load_signatures is deprecated as of 3.5.1 and will be removed in 5.0. Use load_file_as_signatures instead.
    ss = sourmash.load_signatures(s1)

src/python/tests/test_sketch.py::test_manysketch_prefix
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:930: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig1 = sourmash.load_one_signature(s1)

src/python/tests/test_sketch.py::test_manysketch_prefix
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:943: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig2 = sourmash.load_one_signature(s2)

src/python/tests/test_sketch.py::test_manysketch_prefix2
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:1026: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig1 = sourmash.load_one_signature(s1)

src/python/tests/test_sketch.py::test_manysketch_prefix2
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:1039: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig2 = sourmash.load_one_signature(s2)

src/python/tests/test_sketch.py::test_singlesketch_simple
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:1190: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig = sourmash.load_one_signature(output)

src/python/tests/test_sketch.py::test_singlesketch_with_name
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:1208: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig = sourmash.load_one_signature(output)

src/python/tests/test_sketch.py::test_singlesketch_mult_k
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:1236: DeprecatedWarning: load_signatures is deprecated as of 3.5.1 and will be removed in 5.0. Use load_file_as_signatures instead.
    sigs = list(sourmash.load_signatures(output))

src/python/tests/test_sketch.py::test_singlesketch_mult_moltype
  /home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test_sketch.py:1256: DeprecatedWarning: load_one_signature is deprecated as of 4.8.9 and will be removed in 5.0. Use load_file_as_signatures instead.
    sig = sourmash.load_one_signature(output)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED src/python/tests/test_fastgather.py::test_against_multisigfile[False] - tests.sourmash_tst_utils.SourmashCommandFailed: 
=> sourmash_plugin_branchwater 0.9.10; cite Irber et al., doi: 10.1101/2022.11.02.514947


ksize: 31 / scaled: 100000 / moltype: DNA / threshold bp: 50000

gathering all sketches in '/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/test-data/SRR606249.sig.gz' against '/tmp/sourmashtest_hoei6qdi/against.txt' using 4 threads
Traceback (most recent call last):
  File "/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/sourmash_tst_utils.py", line 143, in runscript
    status = _runscript(scriptname)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/tests/sourmash_tst_utils.py", line 85, in _runscript
    pkg_resources.load_entry_point("sourmash", "console_scripts", scriptname)()
  File "/home/runner/miniconda3/envs/sourmash_dev/lib/python3.12/site-packages/sourmash/__main__.py", line 20, in main
    retval = mainmethod(args)
             ^^^^^^^^^^^^^^^^
  File "/home/runner/work/sourmash_plugin_branchwater/sourmash_plugin_branchwater/src/python/sourmash_plugin_branchwater/__init__.py", line 206, in main
    status = sourmash_plugin_branchwater.do_fastgather(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: not implemented: only one Signature currently allowed when using 'load_sig'
============ 1 failed, 397 passed, 22 warnings in 71.18s (0:01:11) =============
Error: Process completed with exit code 1.

Thank you so much for your help!

ctb (Collaborator) commented Nov 12, 2024

> My local Cargo.lock file has the differences below. I haven't committed them because they don't make sense to me. Do you know what may be going on?

it looks like maybe some cruft left over from using a development branch of sourmash, vs the official crates.io release. I would suggest this: try merging and then doing cargo update -p sourmash and see what happens!

ctb (Collaborator) commented Nov 12, 2024

please let me know when ready for rereview!

olgabot (Contributor, Author) commented Nov 12, 2024

> > My local Cargo.lock file has the differences below. I haven't committed them because they don't make sense to me. Do you know what may be going on?
>
> it looks like maybe some cruft left over from using a development branch of sourmash, vs the official crates.io release. I would suggest this: try merging and then doing cargo update -p sourmash and see what happens!

I did that, but still get the same diff 😕 What I'm confused about is that Cargo.lock suggests using the official crate on rust-lang, but the current main of sourmash_plugin_branchwater matches my "old" entry, the one Cargo.lock is trying to change:

name = "sourmash"
version = "0.17.0"
source = "git+https://github.com/sourmash-bio/sourmash.git?branch=latest#c7363154b546058eb417b78bb77aca6523591cb1"

I can change the sourmash dependency to the below, but it will cause merge conflicts, so I haven't yet:

[[package]]
name = "sourmash"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ce05fed73303390f6f208d6640f390cd999db0af0b6c007d60db2794ad5fcc0"

ctb (Collaborator) commented Nov 12, 2024

> I did that, but still get the same diff 😕 What I'm confused about is that Cargo.lock suggests using the official crate on rust-lang, but the current main of sourmash_plugin_branchwater matches my "old" one, what Cargo.lock is trying to change:

if you merge in main, and then change it to the rust-lang crate, you shouldn't get merge conflicts out of it?

In any case, I can deal with it in review, as long as your tests pass with whatever you have on the branch!

@olgabot olgabot requested a review from ctb November 12, 2024 20:02
olgabot (Contributor, Author) commented Nov 12, 2024

Ready for re-review @ctb!

@@ -45,10 +168,10 @@ pub fn multisearch(

let ksize = selection.ksize().unwrap() as f64;

-let mut new_selection = selection;
+let mut new_selection = selection.clone();

Collaborator commented:

CTB: to check. This clone should maybe not be needed?

Collaborator commented:

I was correct that the clone is not required; I like not having it because it consumes selection so you can't reuse selection accidentally below.

Contributor (Author) commented:

Yep, fixed! This avoids a problem, because I was accidentally using selection below .. so yay memory safety!

@@ -494,7 +494,11 @@ def test_against_multisigfile(runtmp, zip_against):
"100000",
)
df = pandas.read_csv(g_output)
assert len(df) == 3
if zip_against:

Collaborator commented:

I'm pretty sure this set of changes is unintentional. Can you revert to what's in main?

ctb (Collaborator) left a comment:

Overall looks great! Only one or two minor changes left and then I can approve.

Note that I bumped sourmash to sourmash v0.17.1 in Cargo.toml.

ctb (Collaborator) commented Nov 12, 2024

p.s. please let me know whether or not you'd like to merge it once I approve it!

@olgabot olgabot requested a review from ctb November 12, 2024 21:29
@ctb ctb merged commit 0dd65d6 into main Nov 12, 2024
3 checks passed
@ctb ctb deleted the olgabot/multisearch-evalue branch November 12, 2024 21:29