zebrafish example bug #2

Open
AltynaiA opened this issue Jan 24, 2019 · 14 comments

Comments

@AltynaiA

I tried to run the zebrafish notebook and received this error message

>>>var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-8-3cb7686bdda8> in <module>
----> 1 var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'./data/gene_names.txt' does not exist
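
The traceback above only says that `./data/gene_names.txt` is not on disk. As a minimal sketch, a notebook could check for its input files up front and fail with one clearer message. The helper name `require_files` is hypothetical and not part of pandas or scanpy; only `gene_names.txt` is known from the error above.

```python
from pathlib import Path


def require_files(data_dir, filenames):
    """Return the paths of the given files, or raise one clear error listing those that are missing."""
    data_dir = Path(data_dir)
    missing = [name for name in filenames if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing in {data_dir}: {', '.join(missing)}. "
            "These files are not shipped with the notebook and must be obtained separately."
        )
    return [data_dir / name for name in filenames]
```

Calling `require_files('./data', ['gene_names.txt'])` before `pd.read_csv` would then report the missing file without the long pandas traceback.
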
@falexwolf
Contributor

Hi Altynai! The zebrafish notebook is the only one for which there is no public data at this stage; all other notebooks can be executed. The data was sent out by email from the lab. @calebweinreb, are you or Dan (he is probably not on GitHub, right?) planning to upload the files that you shared at the time somewhere? Would you mind if I uploaded them somewhere so that the notebook becomes executable?

@calebweinreb

calebweinreb commented Jan 24, 2019 via email

@calebweinreb

calebweinreb commented Jan 24, 2019 via email

@flying-sheep

flying-sheep commented Jan 25, 2019

Ah, nice! h5ad even. This should make it trivial to include a line like

adata = sc.read('WagnerScience2018', backup_url='https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad')

right? No cache=True is necessary, since it’s h5ad anyway, and using an extensionless key means the file is saved into sc.settings.writedir.
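
For readers unfamiliar with `backup_url`: the idea is to fetch the file only when it is not already on disk. The following is a rough, stdlib-only sketch of that download-if-missing pattern, not scanpy's actual implementation. The function name `ensure_local_copy` and its `fetch` parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_local_copy(path, backup_url, fetch=urlretrieve):
    """Download backup_url to path only if path does not exist yet, then return the path."""
    path = Path(path)
    if not path.is_file():
        # Create the target directory first so the download has somewhere to go.
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(backup_url, str(path))
    return path
```

Because the download only happens when the file is absent, re-running the notebook cell reuses the local copy. The `fetch` argument is injectable so that the behavior can be checked without network access.
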

@falexwolf
Contributor

Awesome, @calebweinreb & Dan, thanks for the quick answer! And so cool that you uploaded an .h5ad version of the dataset! 😄

As the uploaded h5ad doesn't contain the single-cell graph, however, the material is not enough to run the PAGA notebook here. What Caleb sent me around May consisted of these files:
[image]
The uploaded h5ad only contains the following:
[image]

What is missing is essentially the graph that underlies the figures in your paper and the SPRING-based exploration, which saved me (and saves people trying to reproduce the notebook) from doing all the preprocessing and graph computation. It would be really cool if the graph were publicly available. For this, one could either upload a compressed version of the files (as shared via email at the time) or upload an AnnData object that contains the graph. The only problem I see is that the current https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad seems to contain unfiltered data with 63530 cells, whereas the graph that Caleb sent out contains 53181 cells. I don't know whether you'd want me to send you this AnnData, which contains the graph. You could upload it as WagnerScience2018_processed.h5ad or something similar. And then I could polish the notebook and make something similar to this for the zebrafish. 😉

@falexwolf
Contributor

If you don't want me to send you the AnnData containing the graph, I'm also happy to upload the data somewhere else. But the canonical location would be your web page, I guess. 🙂

@psahai10

Hello, I've been following the thread and was also interested in reproducing the PAGA notebook with the zebrafish dataset but still couldn't find any of the additional files needed to run the notebook. Any information on how to obtain them would be greatly appreciated. Thanks in advance.

@ChengTao2017

How can I get the additional files needed to run the zebrafish notebook successfully? Could anyone help with this? Thanks in advance!

@falexwolf
Contributor

Sorry for the late response on this.

I uploaded the files that @calebweinreb sent out via email here: https://drive.google.com/file/d/1V2xA9P1nTaO9qWPj8LiAR-uQpqg0VMBv/view?usp=sharing

I'll also make a note on the notebook.

@falexwolf
Contributor

Here is the note on the notebook: b5dfcff

@kangbw702

Hi, I am trying to replicate zebrafish.ipynb but am running into some errors.

  1. In chunk 8, I cannot write the AnnData object to file.

[Screenshot: Screen Shot 2020-04-26 at 9 01 17 PM]

  2. If I ignore the error in chunk 8 and run UMAP in chunk 9, there is still an error.

[Screenshot: Screen Shot 2020-04-26 at 9 02 36 PM]

  3. In chunks 13 and 17, there is no sc.utils.merge_groups in scanpy.api.

I also tried to directly use the AnnData object WagnerScience2018.h5ad from https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/mainpage.html, but this data has a different number of cells than the object in the Jupyter notebook.

Could you please help me figure this out?

Thanks

@davidfstein

davidfstein commented Jun 3, 2020

I'm having the same issue as @kangbw702.

@yotamcons

Joining in late:

1 - I would very much appreciate it if someone could re-upload the cell-cell connectivity file.

2 - The difference in cell numbers is probably due to some filtering. The original dataset contains the wildtype experiment (36749 cells) and the TraceSeq experiment (another ~30k cells), all from 24hpf.
To remove them, we used the library ID:
adata = adata[adata.obs.library_id.str.startswith('DEW0'), :]

3 - Regarding the missing sc.utils.merge_groups function: once you have the mapping d, you can use:
adata.obs['cluster_coarse'] = adata.obs['clusters'].map(d)
(Note that in WagnerScience2018.h5ad the clusters column is actually called ClusterName, so you'll need to replace that in the tutorial.)
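
To make points 2 and 3 concrete, here is a small sketch on a toy pandas table standing in for `adata.obs`. The library IDs, cluster labels, and the mapping `d` are made up for illustration; only the `DEW0` prefix and the column names `library_id`, `ClusterName`, and `cluster_coarse` come from this thread.

```python
import pandas as pd

# Toy stand-in for adata.obs; all values below are invented for illustration.
obs = pd.DataFrame(
    {
        "library_id": ["DEW001", "DEW002", "TS01", "DEW003"],
        "ClusterName": ["a1", "a2", "a1", "b1"],
    },
    index=["cell0", "cell1", "cell2", "cell3"],
)

# Point 2: keep only cells whose library ID starts with 'DEW0'.
keep = obs["library_id"].str.startswith("DEW0")
obs = obs[keep].copy()

# Point 3: replace the merge step with a plain dictionary lookup.
# `d` is a hypothetical fine-to-coarse mapping standing in for the notebook's.
d = {"a1": "A", "a2": "A", "b1": "B"}
obs["cluster_coarse"] = obs["ClusterName"].map(d)
```

With an actual AnnData object, the same boolean mask would be applied as `adata[keep.values, :]`. Any cluster label that is absent from `d` ends up as NaN after `.map`, so it is worth checking that the mapping covers every label.
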

@ZoeyYDKY

ZoeyYDKY commented Feb 6, 2024

@falexwolf

> Sorry for the late response on this.
>
> I uploaded the files that @calebweinreb sent out via email here: https://drive.google.com/file/d/1V2xA9P1nTaO9qWPj8LiAR-uQpqg0VMBv/view?usp=sharing
>
> I'll also make a note on the notebook.

Sorry, I'm late to this thread and the link you shared is no longer available. Could you share it again? Thanks!
