Order of scripts & dataset #7

Open
rando2 opened this issue Apr 20, 2020 · 1 comment
Comments

rando2 commented Apr 20, 2020

Hi @danich1, thank you so much for telling me about the approach you developed. I'm really amazed looking through this code!
I was wondering if I could ask what the starting dataset is (or the starting script that generates it)? Everything I've looked at seems to make sense, but I can't figure out whether there's something I'm supposed to have downloaded initially to get it to run, or whether I'm just missing something.
Thank you again!

Contributor

danich1 commented Apr 21, 2020

Ah, the dataset for this code is the bioRxiv XML dump. This repository is intentionally missing the dump because the bioRxiv group asked me not to share it with anybody until they were ready to go public. Plus, the dump is 2 terabytes, so definitely not a size GitHub would be happy with.
