Journal article data collected for ContentMine

Every journal scraper in the collection targets the same data. A scraper should collect as many of the elements listed below as possible. Keywords such as publisher, journal_name and abstract_html are the element names to use in the scraper definition.
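As an illustration of how the keywords become element names, here is a minimal sketch of a scraper definition built in Python and serialized to JSON. It assumes a scraperJSON-style layout: a `url` pattern plus an `elements` object whose entries each have a `selector` (XPath) and an `attribute`. The URL pattern and selectors are hypothetical placeholders and are not taken from any real scraper.

```python
import json

# Hypothetical scraper definition. The keys under "elements" are keywords
# from the list below. The URL pattern and XPath selectors are placeholders,
# and the overall field layout is an assumed scraperJSON-style structure.
scraper = {
    "url": r"example-journal\.org",
    "elements": {
        # Metadata often lives in <meta> tags, so capture their "content".
        "title": {
            "selector": "//meta[@name='citation_title']",
            "attribute": "content",
        },
        "doi": {
            "selector": "//meta[@name='citation_doi']",
            "attribute": "content",
        },
        # Section content captured as HTML uses the "html" attribute.
        "abstract_html": {
            "selector": "//div[@class='abstract']",
            "attribute": "html",
        },
    },
}

print(json.dumps(scraper, indent=2))
```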

Metadata

publisher - the name of the publisher
journal:
- journal_name
- journal_issn
- volume
- issue
- firstpage
title
keywords - either a single string containing all the keywords, or each keyword captured separately
authors:
- author_name
- author_institution
- author_givenName
- author_familyName
- author_orcid
date:
- date_published
- date_accepted
- date_submitted
identifiers:
- doi
- pmid - PubMed ID
license
copyright
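Because a scraper should collect as many of these elements as possible, it can help to check a definition's element names against the list. The following is a minimal Python sketch, assuming the scraper's element names are already available as a list of strings; `METADATA_KEYWORDS` and `metadata_coverage` are hypothetical names introduced here for illustration.

```python
# Metadata element names from the list above, flattened. "journal",
# "authors", "date" and "identifiers" are treated as groupings rather
# than element names.
METADATA_KEYWORDS = [
    "publisher",
    "journal_name", "journal_issn", "volume", "issue", "firstpage",
    "title",
    "keywords",
    "author_name", "author_institution", "author_givenName",
    "author_familyName", "author_orcid",
    "date_published", "date_accepted", "date_submitted",
    "doi", "pmid",
    "license", "copyright",
]


def metadata_coverage(element_names):
    """Split the metadata keywords into those a scraper defines and those it lacks."""
    defined = set(element_names)
    present = [k for k in METADATA_KEYWORDS if k in defined]
    missing = [k for k in METADATA_KEYWORDS if k not in defined]
    return present, missing


# Hypothetical element names taken from some scraper definition.
present, missing = metadata_coverage(["title", "doi", "author_name", "volume"])
print(f"{len(present)}/{len(METADATA_KEYWORDS)} metadata keywords covered")
print("missing:", ", ".join(missing))
```

The result lists keep the order of the list above, so the missing elements come out grouped the same way as the reference list.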

Content

links:
- fulltext_html
- fulltext_pdf
- fulltext_xml
- supplementary_file
sections - generally captured as HTML, plain text, or both. HTML versions should use the html attribute and text versions the text attribute.
- abstract:
  - abstract_html
  - abstract_text
- introduction:
  - introduction_html
  - introduction_text
- methods:
  - methods_html
  - methods_text
- results:
  - results_html
  - results_text
- discussion:
  - discussion_html
  - discussion_text
- conclusions:
  - conclusion_html
  - conclusion_text
- author contributions:
  - author_contrib_html
  - author_contrib_text
- competing interests:
  - competing_interests_html
  - competing_interests_text
- figures - currently captured only as HTML plus a download of the image file
  - figures_html
  - figures_image - a download of the image file, with no renaming
- tables - currently only captured as HTML
  - tables_html
- references - currently only captured as HTML
  - references_html
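Since each section usually appears as an `_html`/`_text` pair that differs only in the captured attribute, the pairs can be generated rather than written out by hand. This is a minimal Python sketch under the same assumed scraperJSON-style layout as above; `section_elements` and the `//section[@id=...]` selectors are hypothetical, and the `download` flag on `figures_image` is an assumed field for "download the image file, with no renaming".

```python
# Element-name prefixes for the sections listed above. The names do not
# always match the section heading (e.g. "conclusion", "author_contrib").
SECTION_PREFIXES = [
    "abstract", "introduction", "methods", "results", "discussion",
    "conclusion", "author_contrib", "competing_interests",
]


def section_elements(prefix, selector):
    """Return the HTML and text variants of one section element.

    Both variants share one selector; only the captured attribute differs.
    """
    return {
        f"{prefix}_html": {"selector": selector, "attribute": "html"},
        f"{prefix}_text": {"selector": selector, "attribute": "text"},
    }


# Hypothetical selectors: each real journal needs its own XPath.
elements = {}
for prefix in SECTION_PREFIXES:
    elements.update(section_elements(prefix, f"//section[@id='{prefix}']"))

# Figure images are downloaded as-is (no renaming). The "download" key
# is an assumed scraperJSON-style flag, not confirmed by this page.
elements["figures_image"] = {
    "selector": "//figure//img",
    "attribute": "src",
    "download": True,
}

print(len(elements), "elements defined")
```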
