zebrafish example bug #2

AltynaiA · 2019-01-24T13:17:56Z

I tried to run the zebrafish notebook and received this error message

>>>var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-8-3cb7686bdda8> in <module>
----> 1 var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'./data/gene_names.txt' does not exist

falexwolf · 2019-01-24T20:14:44Z

Hi Altynai! The zebrafish notebook is the only one where there is no public data at this stage - all other notebooks can be executed. The data has been sent out by email from the lab. @calebweinreb, are you or dan (he is probably not on github, right?) planning to upload the files that you shared at the time somewhere? Would you mind if I upload them somewhere so that the notebook becomes executable?

calebweinreb · 2019-01-24T20:23:52Z

Hi Alex, As far as I know, you are free to upload the files and host them however you want, since the paper is published. Let me know if you need us to send the files again. I am pretty sure we have also uploaded them somewhere; maybe Dan can comment on that.

…

-- Caleb

calebweinreb · 2019-01-24T20:52:33Z

Hi Caleb & Alex,

Yes, Caleb is correct. The files are publicly available in multiple formats from the locations linked below. I am not sure if a more specific format/version would be needed for seamless integration with Alex's PAGA notebook. Please let me know if I could help with that. I am also happy to upload any additional such files to our Kleintools web portal, which could then be easily linked to from your Github.

Best,
Dan

GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112294
Kleintools web portal: http://www.tinyurl.com/scZfish2018
Other Matlab and h5ad versions of the dataset:
https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.mat
https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad

…

flying-sheep · 2019-01-25T08:06:03Z

Ah, nice! h5ad even. This should make it trivial to include a line like

adata = sc.read('WagnerScience2018', backup_url='https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad')

right? No cache=True necessary, since it’s h5ad anyway, and using an extensionless key means it saves it into the sc.settings.writedir.

falexwolf · 2019-01-25T11:08:43Z

Awesome, @calebweinreb & Dan, thanks for the quick answer! And so cool that you uploaded an .h5ad version of the dataset! 😄

As the uploaded h5ad doesn't contain the single-cell graph, however, the material is not enough to run the PAGA notebook here. What Caleb sent me around May consisted of these files:

The uploaded h5ad only contains the following

What is missing is essentially the graph that underlies the figures in your paper and the SPRING-based exploration, which saved me (and saves the people trying to reproduce the notebook) from doing all the preprocessing and graph computation. It would be really cool if the graph was publicly available. For this, one could either upload a compressed version of the files (as shared via email at the time) or upload an AnnData object that contains the graph.

The only problem I see is that the current https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad seems to contain unfiltered data with 63530 cells, whereas the graph that Caleb sent out contains 53181 cells. I don't know whether you'd want me to send you this AnnData, which contains the graph. You could upload it as WagnerScience2018_processed.h5ad or something similar. And then I could polish the notebook and make something similar to this for the zebrafish. 😉

falexwolf · 2019-01-25T11:11:06Z

If you don't want me to send you the AnnData containing the graph, I'm also happy to upload the data somewhere else. But the canonical location would be your web page, I guess. 🙂

psahai10 · 2019-03-15T21:17:26Z

Hello, I've been following the thread and was also interested in reproducing the PAGA notebook with the zebrafish dataset but still couldn't find any of the additional files needed to run the notebook. Any information on how to obtain them would be greatly appreciated. Thanks in advance.

ChengTao2017 · 2019-03-18T06:44:05Z

How can I get the additional files so that I can run the zebrafish notebook successfully? Could anyone help with this? Thanks in advance!

falexwolf · 2019-03-19T10:28:42Z

Sorry for the late response on this.

I uploaded the files that @calebweinreb sent out via email here: https://drive.google.com/file/d/1V2xA9P1nTaO9qWPj8LiAR-uQpqg0VMBv/view?usp=sharing

I'll also make a note on the notebook.

falexwolf · 2019-03-19T10:32:19Z

Here is the note on the notebook: b5dfcff

kangbw702 · 2020-04-27T01:11:00Z

Hi, I am trying to replicate zebrafish.ipynb, but I encounter some errors.

In chunk 8, I cannot write the AnnData to file.

If I ignore the error in chunk 8 and run UMAP in chunk 9, there is still an error.

In chunks 13 and 17, there is no sc.utils.merge_groups in scanpy.api.

I also tried to directly use the AnnData object WagnerScience2018.h5ad from https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/mainpage.html. But this data has a different number of cells from the object in the Jupyter notebook.

Could you please help me to figure this out?

Thanks

davidfstein · 2020-06-03T19:21:19Z

I'm having the same issue as @kangbw702

yotamcons · 2023-01-03T15:49:41Z

Joining in late:
1 - I would very much appreciate it if someone could re-upload the cell-cell connectivity file

2 - The difference in cell numbers is probably due to some filtering. The original dataset contains the wildtype experiment (36749 cells) and the TraceSeq experiment (another ~30k cells), all from 24hpf.
To remove them we used the library ID:
adata = adata[adata.obs.library_id.str.startswith('DEW0'), :]

3 - Regarding the missing sc.utils.merge_groups function, after you get the mapping d, you can use:
adata.obs['cluster_coarse'] = adata.obs['clusters'].map(d)
(Note that on the WagnerScience2018.h5ad the clusters column is actually ClusterName, so you'll need to replace that in the tutorial)
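
The filter and the mapping described in points 2 and 3 can be tried out on a toy table before applying them to the real object. This is a minimal sketch, assuming `adata.obs` behaves like the pandas DataFrame below; the library IDs, cluster labels, and the contents of `d` are made up for illustration, and only the column names `library_id` and `ClusterName` and the prefix `DEW0` come from the comment above.

```python
import pandas as pd

# Toy stand-in for adata.obs; the values are invented for illustration.
obs = pd.DataFrame(
    {
        "library_id": ["DEW001", "DEW002", "TS01", "DEW003"],
        "ClusterName": ["a1", "a2", "a1", "b1"],
    },
    index=["cell0", "cell1", "cell2", "cell3"],
)

# Keep only the rows whose library ID starts with 'DEW0'.
keep = obs["library_id"].str.startswith("DEW0")
obs = obs[keep].copy()

# Hypothetical fine-to-coarse mapping, playing the role of `d`.
d = {"a1": "A", "a2": "A", "b1": "B"}
obs["cluster_coarse"] = obs["ClusterName"].map(d).astype("category")

# Labels absent from `d` would become NaN, so check before continuing.
unmapped = obs["cluster_coarse"].isna().sum()
print(obs)
print("unmapped cells:", unmapped)
```

Checking for unmapped labels matters because `Series.map` with a dict silently returns NaN for keys that are not in the mapping.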

ZoeyYDKY · 2024-02-06T09:00:44Z

@falexwolf

> Sorry for the late response on this.
>
> I uploaded the files that @calebweinreb sent out via email here: https://drive.google.com/file/d/1V2xA9P1nTaO9qWPj8LiAR-uQpqg0VMBv/view?usp=sharing
>
> I'll also make a note on the notebook.

Sorry, I'm late to this thread, and the link you shared is no longer available. Could you share it again? Thanks!
