International Journal of Systematic and Evolutionary Microbiology (IJSEM) #33

Open
wants to merge 128 commits into
base: master

Conversation

rossmounce
Member

I hope I've done this correctly...

The figure image scraping isn't ideal. At the moment it only gets the tiny .gif
Ideally I'd want the largest version of each figure image, but this isn't available in one click from the full-text HTML page; it requires *large.jpg to be appended. I might raise that as an issue.

@blahah
Member

blahah commented May 12, 2015

Looks really good. The large image problem can be solved using 'follow-ons', a feature of ScraperJSON that I have not yet documented. I'll add that to this scraper.

Also, the supplementary material element only captures the link to a page that lists the files for download rather than downloading the files themselves. This situation also requires a follow-on.

Full figure images and supplementary data files both
require clicking a link to navigate to a new page before
direct links to the files are exposed. The 'followable'
and 'follow' features of ScraperJSON are used to accomplish
this.
@petermr
Member

petermr commented May 13, 2015

Follow-ons would be really valuable. Could I suggest Ross and me as
alpha-explorers for the existing undocumented code?


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Use following to get links one-step removed
@blahah
Member

blahah commented May 13, 2015

Good idea @petermr - if you look at the commit I made above (rossmounce@1eae300) you can see them in action.

Basically, any element can 'follow' any other element in the elements array, just by adding the key-value pair "follow": "element_name" to the element that does the following. If you want to follow an element, but don't want the followed element to be included in the results, you add it to a followables array instead of the elements array, as shown in the example I linked.

The followed element must capture a URL.
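To make the follow behaviour concrete, here is a rough Python sketch of how a definition with a followables entry might be resolved. It is not the actual scraper code: the selectors, the example.org URLs, the `FAKE_SITE` lookup table, and the `capture` and `scrape` helpers are all made up for illustration, and modelling `elements` and `followables` as dictionaries keyed by element name is an assumption.

```python
# Hypothetical definition in the shape described above; selectors are placeholders.
definition = {
    "url": "example\\.org",
    "followables": {
        # Followed by "figure" below, but never reported in the results.
        "figure_page": {"selector": "//a[@class='fig-link']", "attribute": "href"},
    },
    "elements": {
        "title": {"selector": "//h1"},
        "figure": {
            "follow": "figure_page",  # evaluate this selector on the followed pages
            "selector": "//img[@class='large']",
            "attribute": "src",
        },
    },
}

# Stand-in for real HTTP requests and XPath evaluation:
# url -> selector -> captured values.
FAKE_SITE = {
    "http://example.org/article": {
        "//h1": ["An article"],
        "//a[@class='fig-link']": [
            "http://example.org/fig1",
            "http://example.org/fig2",
        ],
    },
    "http://example.org/fig1": {
        "//img[@class='large']": ["http://example.org/F1.large.jpg"],
    },
    "http://example.org/fig2": {
        "//img[@class='large']": ["http://example.org/F2.large.jpg"],
    },
}


def capture(url, element):
    """Return the values an element's selector captures on the page at url."""
    return FAKE_SITE.get(url, {}).get(element["selector"], [])


def scrape(definition, start_url):
    """Resolve every element, following one step where "follow" is set."""
    elements = definition["elements"]
    # An element may follow anything in elements or in followables.
    lookup = {**definition.get("followables", {}), **elements}
    results = {}
    for name, element in elements.items():
        if "follow" in element:
            # The followed element must capture URLs; visit each of them.
            urls = capture(start_url, lookup[element["follow"]])
        else:
            urls = [start_url]
        results[name] = [value for url in urls for value in capture(url, element)]
    return results


results = scrape(definition, "http://example.org/article")
```

In this sketch `results` has entries for `title` and `figure` only; `figure_page` is used to find the figure pages but is left out because it sits in `followables` rather than `elements`.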

8 participants