Journal article data collected for ContentMine
Richard Smith-Unna edited this page Jan 11, 2015 · 1 revision
Every journal scraper in the collection targets the same data. A scraper should collect as many of the elements in the list below as possible. Words styled as code, `like this`, are the keywords that should be used as element names in the scraper definition.
- `publisher` - the name of the publisher
- journal:
  - `journal_name`
  - `journal_issn`
  - `volume`
  - `issue`
  - `firstpage`
- `title`
- `keywords` - either a single string containing all the keywords, or each keyword can be captured separately
- authors:
  - `author_name`
  - `author_institution`
  - `author_givenName`
  - `author_familyName`
  - `author_orcid`
- date:
  - `date_published`
  - `date_accepted`
  - `date_submitted`
- identifiers:
  - `doi`
  - `pmid` - PubMed ID
- `license`
- `copyright`
- links:
  - `fulltext_html`
  - `fulltext_pdf`
  - `fulltext_xml`
  - `supplementary_file`
- sections - generally captured as HTML, text, or both. HTML versions should use the `html` attribute, while text versions should use the `text` attribute.
  - abstract:
    - `abstract_html`
    - `abstract_text`
  - introduction:
    - `introduction_html`
    - `introduction_text`
  - methods:
    - `methods_html`
    - `methods_text`
  - results:
    - `results_html`
    - `results_text`
  - discussion:
    - `discussion_html`
    - `discussion_text`
  - conclusions:
    - `conclusion_html`
    - `conclusion_text`
  - author contributions:
    - `author_contrib_html`
    - `author_contrib_text`
  - competing interests:
    - `competing_interests_html`
    - `competing_interests_text`
- figures - currently only captured as HTML and image file download
  - `figures_html`
  - `figures_image` - a download of the image file, with no renaming
- tables - currently only captured as HTML
  - `tables_html`
- references - currently only captured as HTML
  - `references_html`
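As a rough illustration of how the element names above might be used, the Python sketch below checks how many of the target elements a scraper definition covers. The element names come from the list; everything else is an assumption made for the example: the shape of the definition (a `url` pattern plus an `elements` map whose entries have `selector` and `attribute` fields), the XPath selectors, the `example.org` pattern, and the helper names `target_elements` and `coverage`. Only the `html` and `text` attribute values are mentioned on this page.

```python
import json

# Element names copied from the list above (sections are handled separately).
SIMPLE_ELEMENTS = [
    "publisher", "journal_name", "journal_issn", "volume", "issue",
    "firstpage", "title", "keywords",
    "author_name", "author_institution", "author_givenName",
    "author_familyName", "author_orcid",
    "date_published", "date_accepted", "date_submitted",
    "doi", "pmid", "license", "copyright",
    "fulltext_html", "fulltext_pdf", "fulltext_xml", "supplementary_file",
    "figures_html", "figures_image", "tables_html", "references_html",
]

# Section prefixes from the list; each has an _html and a _text variant.
SECTION_PREFIXES = [
    "abstract", "introduction", "methods", "results", "discussion",
    "conclusion", "author_contrib", "competing_interests",
]


def target_elements():
    """Return every element name a scraper should try to collect."""
    names = list(SIMPLE_ELEMENTS)
    for prefix in SECTION_PREFIXES:
        names.append(prefix + "_html")
        names.append(prefix + "_text")
    return names


def coverage(definition):
    """Split the target names into (covered, missing) for one definition."""
    defined = set(definition.get("elements", {}))
    covered = [name for name in target_elements() if name in defined]
    missing = [name for name in target_elements() if name not in defined]
    return covered, missing


# Hypothetical definition: the structure and selectors are assumptions,
# not taken from any real scraper in the collection.
example_definition = {
    "url": "example\\.org",
    "elements": {
        "title": {
            "selector": "//meta[@name='citation_title']",
            "attribute": "content",
        },
        "doi": {
            "selector": "//meta[@name='citation_doi']",
            "attribute": "content",
        },
        # The same node captured twice: once as HTML, once as plain text.
        "abstract_html": {
            "selector": "//div[@class='abstract']",
            "attribute": "html",
        },
        "abstract_text": {
            "selector": "//div[@class='abstract']",
            "attribute": "text",
        },
    },
}

covered, missing = coverage(example_definition)
print(json.dumps(example_definition, indent=2))
print(f"{len(covered)} of {len(target_elements())} target elements covered")
print("missing:", ", ".join(missing))
```

A check like this could be run over each definition in a collection to see which of the listed elements a given scraper does not yet capture.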