International Journal of Systematic and Evolutionary Microbiology (IJSEM) #33

rossmounce · 2015-05-12T22:15:51Z

I hope I've done this correctly...

The figure image scraping isn't ideal. At the moment it only gets the tiny .gif.
Ideally I'd want the largest version of each figure image, but this isn't available in one click from the full-text HTML page; it requires *large.jpg to be appended. I might raise that as an issue.


blahah · 2015-05-12T22:58:07Z

Looks really good. The large image problem can be solved using 'follow-ons', a feature of ScraperJSON that I have not yet documented. I'll add that to this scraper.

Also, the supplementary material element only captures the link to a page that lists the files for download rather than downloading the files themselves. This situation also requires a follow-on.

Full figure images and supplementary data files both require clicking a link to navigate to a new page before direct links to the files are exposed. The 'followable' and 'follow' features of ScraperJSON are used to accomplish this.

petermr · 2015-05-13T06:43:57Z

Follow-ons would be really valuable. Could I suggest Ross and me as
alpha-explorers for the existing undocumented code?


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


blahah · 2015-05-13T08:59:12Z

Good idea @petermr - if you look at the commit I made above (rossmounce@1eae300) you can see them in action.

Basically, any element can 'follow' any other element in the elements array, just by adding the key-value pair "follow": "element_name" to the element that does the following. If you want to follow an element, but don't want the followed element to be included in the results, you add it to a followables array instead of the elements array, as shown in the example I linked.

The followed element must capture a URL.
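As a rough illustration of the description above, a scraper definition using a follow might look something like the sketch below. This is not the IJSEM scraper from the linked commit: the URL pattern, element names, and XPath selectors are all hypothetical placeholders, and the sketch assumes `elements` and `followables` are JSON objects keyed by element name, with `selector`, `attribute`, and `download` keys.

```json
{
  "url": "example\\.org",
  "elements": {
    "figure_large": {
      "follow": "figure_page",
      "selector": "//img[contains(@src, 'large')]",
      "attribute": "src",
      "download": true
    },
    "supplementary_file": {
      "follow": "supplementary_page",
      "selector": "//a[contains(@href, 'supplement')]",
      "attribute": "href",
      "download": true
    }
  },
  "followables": {
    "figure_page": {
      "selector": "//a[contains(@class, 'figure-link')]",
      "attribute": "href"
    },
    "supplementary_page": {
      "selector": "//a[contains(@class, 'supplementary-link')]",
      "attribute": "href"
    }
  }
}
```

In this sketch, `figure_page` and `supplementary_page` each capture an `href` (a URL) on the article page but sit under `followables`, so they would not appear in the results themselves. The two entries under `elements` name them in `follow`, so their selectors would be applied to the followed page rather than to the article page.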


blahah and others added 30 commits May 26, 2014 23:54

Update README.md

1ecac9e

update example

e05e2a3

fix README formatting and typo

a35fecd

add html and text special attributes to README

acb9e7c

Update README.md

a2a6d26

Create science_direct.json

ebea30a

Initial work on science direct/elsevier scraper.

Merge pull request ContentMine#5 from ianthe/master

f6380f9

Create science_direct.json

travis setup

40b80a4

auto test generation script

13ef95b

move scrapers to subdir

073a4da

test generator script fixes

4962bab

self-populating tests and peerj example

cdb542b

Merge branch 'master' of github.com:ContentMine/journal-scrapers

3118404

move sciencedirect to scrapers

0ee30be

fix tmpdir use

c1cafdd

debug test generator tmpdir error

1cdbd3b

test set for peerj scraper

9c82fa0

fix test generator - now working

4e94869

fix test runner - now working

ad734f3

attempted fix for travis dependency install

86c3735

remove unneeded prints from tests

1ce66d6

tests for plos scraper

bf2c926

another attempted travis install fix

9b1f20d

delete wayward results file

aa0cfc6

add .gitignore

998b0de

add travis badge and explanation to README

fb463a8

tidy formatting in README

ab88faf

add science direct tests

8e9d287

Merge branch 'master' of https://github.com/ContentMine/journal-scrapers

04902e3

add CC0 license

da2a0f2

Richard Smith-Unna and others added 22 commits October 6, 2014 22:26

bump quickscrape dependency

f0f8c10

don't include file hash results in coverage

28c7415

Merge branch 'master' of github.com:ContentMine/journal-scrapers

f5c22d0

migrate to nvm

83d7ee0

restart bash after nvm install

a59d3f3

travis comes with nvm!

d0bf3f1

need latest 0.10.32

5a0ede7

use latest npm

e32730b

peerj scraper is complete - showcase of latest scraperJSON

15e7f35

Merge branch 'master' of github.com:ContentMine/journal-scrapers

b5992bb

bump quickscrape dependency version

6dba241

fix license path, improve copyright

120cd2a

tests for license and copyright improvements

8bb4071

added bmc and trialsjournal scrapers

23cbdd4

updated plos scraper

bd636a5

Merge branch 'master' of github.com:ContentMine/journal-scrapers

2b62bb4

Update PLoS scraper to fix fulltext xml capture

2081cb6

Fixed typo in MDPI scraper

f2d4d6f

The typo meant that the fulltext XML was not extracted, as it was spelled "fulltest".

Merge pull request ContentMine#32 from robintw/robintw-mdpi-typo

59ed0f2

Fixed typo in MDPI scraper

International Journal of Systematic and Evolutionary Microbiology

bac25e3

International Journal of Systematic and Evolutionary Microbiology

3fd2398

Delete ijsem-urls.txt~

8b9fc41

Use following to get links one-step removed

1eae300

Full figure images and supplementary data files both require clicking a link to navigate to a new page before direct links to the files are exposed. The 'followable' and 'follow' features of ScraperJSON are used to accomplish this.

Merge pull request #1 from ContentMine/fix-ijsem-pr

556ec95

Use following to get links one-step removed

Ross Mounce and others added 2 commits July 21, 2015 01:07

updated for new ingenta website July 2015

f454fce

HTML not working

pensoft no figures or captions yet

d3f2964

petermr force-pushed the master branch from 1a1ab44 to fbedb6e Compare June 6, 2016 14:31
