Query the Flora of North America Semantic MediaWiki

These scripts query the "ask" API module of http://beta.semanticfna.org/ from R or Python and write the results to a CSV file.
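For orientation, the sketch below shows roughly what a request to the "ask" module looks like as a URL. It is not taken from the scripts in this repository. The `action=ask`, `query`, and `format` parameters follow the Semantic MediaWiki API, but the helper name `ask_url` and the `https://example.org/w/api.php` endpoint are placeholders: the real `api.php` path on the FNA wiki is an assumption you should verify.

```python
from urllib.parse import urlencode


def ask_url(api_url, query):
    """Build a GET URL for the Semantic MediaWiki "ask" API module."""
    # urlencode percent-escapes the brackets, colons, and pipes in the query.
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_url}?{urlencode(params)}"


# Placeholder endpoint (assumption): substitute the wiki's actual api.php URL.
print(ask_url("https://example.org/w/api.php", "[[Authority::Linnaeus]]|?Distribution"))
```

The R and Python scripts handle this step for you, so you only need the query string itself.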

Getting started

Prepare your query

The Flora of North America Semantic MediaWiki can be queried using the Semantic MediaWiki semantic search syntax.

In brief, you must have a condition:

[[Authority::Linnaeus]]

You can optionally return properties of the taxa matching your condition:

?Distribution

Putting this all together using pipes, we have a query like this:

[[Authority::Linnaeus]]|?Distribution

Or with additional properties requested, like this:

[[Authority::Linnaeus]]|?Distribution|?Taxon family
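If you assemble many queries, a small helper can do the piping for you. This is a minimal sketch, not part of the repository; `build_ask_query` is a hypothetical name, and it only handles the simple condition and property forms shown above.

```python
def build_ask_query(conditions, properties=()):
    """Join conditions and requested properties into one ask query string."""
    # Each condition is a (property, value) pair rendered as [[Property::Value]].
    condition_part = "".join(f"[[{prop}::{value}]]" for prop, value in conditions)
    # Each requested property is rendered as ?Property, separated by pipes.
    printout_part = "".join(f"|?{name}" for name in properties)
    return condition_part + printout_part


print(build_ask_query([("Authority", "Linnaeus")], ["Distribution", "Taxon family"]))
# -> [[Authority::Linnaeus]]|?Distribution|?Taxon family
```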

Sample queries can be found here:

Read more about Semantic MediaWiki query syntax:

Query size limitations

Semantic MediaWiki limits API queries to 5,000 results. If you expect your query to return more than 5,000 results, you should run your query in batches. (N.B.: There are ~20,000 treatments in the FNA Online.)

We recommend running your queries by 'published volume' by adding a volume condition to your query (e.g., "[[Volume::Volume 17]]"). Please see this page for a list of volumes that can be queried.
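Batching by volume amounts to prepending a volume condition to the same base query, once per volume. The sketch below generates those query strings; `volume_batches` is a hypothetical helper, and the volume numbers in the example are illustrative only, so take the real values from the list of queryable volumes.

```python
def volume_batches(base_query, volume_numbers):
    """Yield (volume_label, query) pairs, one per published volume."""
    for number in volume_numbers:
        label = f"Volume {number}"
        # Prepend the volume condition so each batch returns fewer results.
        yield label, f"[[Volume::{label}]]{base_query}"


# Illustrative volume numbers (assumption): check which volumes can be queried.
for label, query in volume_batches("[[Authority::Linnaeus]]|?Distribution", [17, 19]):
    print(label, "->", query)
```

Each generated query can then be run on its own and written to its own CSV file.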

Use R

This section assumes you are familiar with the R programming language.

Prerequisites

Open a terminal.

Type git clone https://github.com/jocelynpender/fna-query.git

Open an R console and type:

install.packages("WikipediR")
install.packages("tidyverse")

Run your query

  1. Open an R console
  2. Open the run_query.R script
  3. Run your query:

Option A: Return taxa names only (i.e., query does not include ? parameter)

E.g., [[Distribution::Nunavut]]

Use ask_query_titles. It returns only a list of Taxon names that match your query.

In the fna-query directory, run

source("R/src/query.R")
page_titles_vector <- ask_query_titles("[[Distribution::Nunavut]]", "output_file_name.csv")

Option B: Return taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Distribution::Nunavut]]|?Taxon family

Use ask_query_titles_properties. It returns a list of taxon names and the associated properties requested by your query.

In the fna-query directory, run

source("R/src/query.R")
properties_texts_data_frame <- ask_query_titles_properties("[[Distribution::Nunavut]]|?Taxon family", "output_file_name.csv")

Expected output

Option A: Return taxa names only (i.e., query does not include ? parameter)

E.g., [[Distribution::Nunavut]]

> page_titles_vector

[1] "Abietinella abietina"                     
[2] "Achillea millefolium"                     
[3] "Agrostis"                                 
[4] "Agrostis anadyrensis"        
 ...

See https://github.com/jocelynpender/fna-query/blob/master/R/demo_queries/distribution/nunavut_taxa.csv for a sample output file.

Option B: Return taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Distribution::Nunavut]]|?Taxon family

> properties_texts_data_frame
                                            Taxon family
Abietinella abietina                         Thuidiaceae
Achillea millefolium                          Asteraceae
Agrostis                                         Poaceae
Agrostis anadyrensis                             Poaceae   
 ...

See https://github.com/jocelynpender/fna-query/blob/master/R/demo_queries/distribution/nunavut_taxa_family_name.csv for a sample output file.

Run a demo query

Don't know what to query? See the demo queries here: https://github.com/jocelynpender/fna-query/tree/master/R/demo_queries

Use Python

This section assumes you are familiar with Python programming.

Prerequisites

Create an account

You'll need an account to use the API with Python.

  1. Create your account at http://beta.floranorthamerica.org/Special:CreateAccount

  2. Find the file called local.py.example in the python/src folder. Rename it to local.py and add your credentials.

Dependencies

Option A. Use pip

requirements.txt has been generated with pip freeze > requirements.txt

Open a terminal.

cd fna-query
pip install -r requirements.txt

Option B. Use conda

The project was built within a conda environment. A conda YAML file has been generated with conda env export > fna-query.yml.

Open a terminal.

cd fna-query
conda env create -f fna-query.yml

Run your query

  1. Open a terminal.
  2. Prepare your query. E.g., [[Special status::Introduced]]
  3. Run your query (if using conda, first run conda activate environment-name):
cd fna-query
cd python
python -m src.run_query --output_file_name "output_file_name.csv" --query_string "[[Query::here]]"

The -m flag tells Python to run run_query as a module inside the src package, so that imports within src resolve correctly. Run the command from the python directory.
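If you want to launch the script from another Python program (for example, once per batch), the command can be built as an argument list. This is a sketch under assumptions: `run_query_command` is a hypothetical helper, and only the `--output_file_name` and `--query_string` flags shown above are used.

```python
import sys


def run_query_command(query_string, output_file_name):
    """Return the argument list for one run_query invocation."""
    return [
        sys.executable, "-m", "src.run_query",
        "--output_file_name", output_file_name,
        "--query_string", query_string,
    ]


command = run_query_command("[[Special status::Introduced]]", "introduced_taxa.csv")
print(command[1:])
# To execute it: subprocess.run(command, cwd="fna-query/python", check=True)
```

Passing the arguments as a list avoids shell quoting problems with the brackets and pipes in query strings.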

Expected output

If your query results are extensive, the query will take some time to process. Please be patient.

Option A: Taxa names only (i.e., query does not include ? parameter)

E.g., [[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]

python -m src.run_query --output_file_name "illustrated_taxa.csv" --query_string "[[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]"

See https://github.com/jocelynpender/fna-query/blob/master/python/demo_queries/distribution/nunavut_taxa.csv for a sample output file.

Option B: Taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]|?Taxon rank

python -m src.run_query --output_file_name "illustrated_taxa_taxon_family.csv" --query_string "[[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]|?Taxon rank"

See https://github.com/jocelynpender/fna-query/blob/master/python/demo_queries/distribution/nunavut_taxa_family_name.csv for a sample output file.

Run a demo query

Don't know what to query? See the demo queries here: https://github.com/jocelynpender/fna-query/tree/master/python/demo_queries

Getting help

Contact [email protected] or [email protected] for support.

Bug reports

Please leave your bug reports here: https://github.com/jocelynpender/fna-query/issues

Resources

Dependency documentation

Merging multiple CSV files

Sometimes you'll need to batch the API return results. Here is an R script for merging multiple CSV files.
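For Python users, the same merge can be sketched with the standard library. This is not the R script referred to above. It assumes every input CSV has a header row and that each data row has no more fields than its header; `merge_csv_files` is a hypothetical helper name.

```python
import csv


def merge_csv_files(input_paths, output_path):
    """Concatenate CSV files, keeping the union of their columns."""
    rows, fieldnames = [], []
    for path in input_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Collect column names in first-seen order across all files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills columns that a given batch file did not contain.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `merge_csv_files(["volume_17.csv", "volume_19.csv"], "merged.csv")` would write one combined file and return the number of data rows; the file names here are illustrative.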