International Journal of Systematic and Evolutionary Microbiology (IJSEM) #33
base: master
Initial work on ScienceDirect/Elsevier scraper.
Create science_direct.json
Because of a typo ("fulltest" instead of "fulltext"), the fulltext XML was not being extracted.
Fixed typo in MDPI scraper
Looks really good. The large image problem can be solved using 'follow-ons', a feature of ScraperJSON that I have not yet documented. I'll add that to this scraper. Also, the supplementary material element only captures the link to a page that lists the files for download, rather than downloading the files themselves. This situation also requires a follow-on.
Full figure images and supplementary data files both require clicking a link to navigate to a new page before direct links to the files are exposed. The 'followable' and 'follow' features of ScraperJSON are used to accomplish this.
Follow-ons would be really valuable. Could I suggest Ross and me as …
Peter Murray-Rust
Use following to get links one step removed
Good idea @petermr: if you look at the commit I made above (rossmounce@1eae300), you can see them in action. Basically, any element can 'follow' any other element in the elements array just by adding the key-value pair … . The followed array must capture a URL.
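A minimal Python sketch of the follow-on idea described above: one element captures a URL, and a second element that 'follows' it applies its own selector to the page at that URL instead of the article page. The element names (`figure_page`, `figure_image`), the `tag`/`class`/`attr`/`follow` keys, and the in-memory `PAGES` dict standing in for HTTP fetches are all hypothetical; this is not ScraperJSON's actual schema or the scraper engine's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML), standing in for real HTTP fetches.
PAGES = {
    "https://example.org/article": '<a class="fig-expand" href="/article/F1">View larger</a>',
    "https://example.org/article/F1": '<img class="fig-full" src="/article/F1.large.jpg">',
}

# Hypothetical element definitions; "follow" names the element whose captured URLs to visit.
ELEMENTS = {
    "figure_page": {"tag": "a", "class": "fig-expand", "attr": "href"},
    "figure_image": {"tag": "img", "class": "fig-full", "attr": "src", "follow": "figure_page"},
}


class AttrCollector(HTMLParser):
    """Collects one attribute from every start tag matching a tag name and CSS class."""

    def __init__(self, tag, cls, attr):
        super().__init__()
        self.tag, self.cls, self.attr = tag, cls, attr
        self.found = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == self.tag and self.cls in classes and self.attr in a:
            self.found.append(a[self.attr])


def select(html, spec):
    parser = AttrCollector(spec["tag"], spec["class"], spec["attr"])
    parser.feed(html)
    return parser.found


def scrape(url, elements, fetch):
    results = {}

    def resolve(name):
        if name in results:  # each element is resolved (and its pages fetched) only once
            return results[name]
        spec = elements[name]
        # A following element runs on the pages its target captured; others run on the article.
        pages = resolve(spec["follow"]) if "follow" in spec else [url]
        values = []
        for page in pages:
            for value in select(fetch(page), spec):
                values.append(urljoin(page, value))  # make captured URLs absolute
        results[name] = values
        return values

    for name in elements:
        resolve(name)
    return results


print(scrape("https://example.org/article", ELEMENTS, PAGES.__getitem__))
```

Resolving followed elements recursively and caching the results means a page is visited once even if several elements follow the same target; a real implementation would also need to guard against elements that follow each other in a cycle.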
I hope I've done this correctly...
The figure image scrapable isn't ideal. At the moment it only gets the tiny .gif.
Ideally I'd want the largest version of the figure images, but this isn't available one click away from the full-text HTML page: it requires *large.jpg to be appended to the URL. I might raise that as an issue.
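One way to get the large figure image without an extra page visit is to rewrite the thumbnail URL directly. The sketch below assumes, without having checked it against IJSEM, that the thumbnail filename is the figure identifier followed by extra dotted parts (e.g. `F1.small.gif`) and that the large image sits in the same directory as `F1.large.jpg`. The function name and the example URL are hypothetical.

```python
def large_figure_url(thumb_url: str) -> str:
    """Hypothetical rewrite: '.../F1.small.gif' -> '.../F1.large.jpg'.

    Assumes the figure id is the filename up to its first dot and that the
    large image lives alongside the thumbnail; both are unverified assumptions.
    """
    base, _, filename = thumb_url.rpartition("/")
    figure_id = filename.split(".", 1)[0]
    if not base or not figure_id:
        raise ValueError(f"unexpected thumbnail URL: {thumb_url}")
    return f"{base}/{figure_id}.large.jpg"


print(large_figure_url("https://example.org/content/65/1/F1.small.gif"))
```

If the site's naming differs from this assumption, the follow-on approach (visiting the figure page and capturing the image link there) is the safer choice, since it does not depend on guessing the URL pattern.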