Query the Flora of North America Semantic MediaWiki

These scripts query the "ask" API module of http://beta.semanticfna.org/ from R or Python and save the results to a CSV file.

Getting started
- Prepare your query
- Query size limitations
Use R
Use Python
Getting help
- Bug reports
Resources
- Dependency documentation
- Merging multiple CSV files

Getting started

Prepare your query

The Flora of North America Semantic MediaWiki can be queried using the Semantic MediaWiki semantic search syntax.

In brief, you must have a condition:

[[Authority::Linnaeus]]

You can optionally return properties of the taxa matching your condition:

?Distribution

Putting this all together using pipes, we have a query like this:

[[Authority::Linnaeus]]|?Distribution

Or with additional properties requested, like this:

[[Authority::Linnaeus]]|?Distribution|?Taxon family
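
If you build queries in code rather than by hand, the pattern above can be sketched as a small helper. This is only an illustration: `build_ask_query` is a hypothetical name and is not part of the fna-query scripts.

```python
def build_ask_query(conditions, printouts=()):
    """Join conditions and optional requested properties into one query string."""
    # Each condition is a (property, value) pair rendered as [[Property::value]].
    condition_part = "".join(f"[[{prop}::{value}]]" for prop, value in conditions)
    # Each requested property is rendered as ?Property and attached with a pipe.
    printout_part = "".join(f"|?{name}" for name in printouts)
    return condition_part + printout_part


query = build_ask_query([("Authority", "Linnaeus")], ["Distribution", "Taxon family"])
print(query)  # [[Authority::Linnaeus]]|?Distribution|?Taxon family
```

The resulting string can be passed to either the R or the Python script described later in this README.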

Sample queries can be found here:

http://beta.floranorthamerica.org/Sample_Queries

Read more about Semantic MediaWiki query syntax:

https://www.semantic-mediawiki.org/wiki/Help:Semantic_search
https://www.semantic-mediawiki.org/wiki/Help:Search_operators
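
For orientation, the sketch below shows roughly what a request to the "ask" API module looks like. The scripts in this repository go through WikipediR or mwclient rather than building URLs by hand, and the `api.php` path used here is an assumption about where the wiki exposes its API; `ask_api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def ask_api_url(api_endpoint, query):
    """Build a URL that sends a query string to the Semantic MediaWiki ask module."""
    params = {"action": "ask", "query": query, "format": "json"}
    # urlencode percent-encodes the brackets, colons, pipes and question marks.
    return f"{api_endpoint}?{urlencode(params)}"


# Assumed endpoint path; check the wiki for the real location of api.php.
url = ask_api_url("http://beta.semanticfna.org/api.php",
                  "[[Authority::Linnaeus]]|?Distribution")
print(url)
```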

Query size limitations

Semantic MediaWiki limits API queries to 5,000 results. If you expect your query to return more than 5,000 results, you should run your query in batches. (N.B.: There are ~20,000 treatments in the FNA Online.)

We recommend running your queries by 'published volume' by adding a volume condition to your query (e.g., "[[Volume::Volume 17]]"). Please see this page for a list of volumes that can be queried.
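
As a rough sketch of the batching idea, the helper below produces one query per volume by adding a volume condition to a base query. `queries_by_volume` is a hypothetical name, and the volume numbers in the example are placeholders; use the volumes that can actually be queried.

```python
def queries_by_volume(base_query, volume_numbers):
    """Return one query per volume, each with a [[Volume::Volume N]] condition added."""
    return [f"[[Volume::Volume {number}]]{base_query}" for number in volume_numbers]


# Placeholder volume numbers, for illustration only.
batched_queries = queries_by_volume("[[Authority::Linnaeus]]|?Distribution", [17, 19])
for batched_query in batched_queries:
    print(batched_query)
```

Each query can then be run separately with its own output file name, and the resulting CSV files merged afterwards.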

Use R

This section assumes you are familiar with the R programming language.

Prerequisites

R 3.x
WikipediR
tidyverse

Open a terminal and clone the repository:

git clone https://github.com/jocelynpender/fna-query.git

Open an R console and install the required packages:

install.packages("WikipediR")
install.packages("tidyverse")

Run your query

Open an R console
Open the run_query.R script
Run your query:

Option A: Return taxa names only (i.e., query does not include ? parameter)

E.g., [[Distribution::Nunavut]]

Use ask_query_titles. It returns only a list of Taxon names that match your query.

In the fna-query directory, run

source("R/src/query.R")
page_titles_vector <- ask_query_titles("[[Distribution::Nunavut]]", "output_file_name.csv")

Option B: Return taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Distribution::Nunavut]]|?Taxon family

Use ask_query_titles_properties. It returns a list of Taxon names and the associated properties requested by your query.

In the fna-query directory, run

source("R/src/query.R")
properties_texts_data_frame <- ask_query_titles_properties("[[Distribution::Nunavut]]|?Taxon family", "output_file_name.csv")

Expected output

Option A: Return taxa names only (i.e., query does not include ? parameter)

E.g., [[Distribution::Nunavut]]

> page_titles_vector

[1] "Abietinella abietina"                     
[2] "Achillea millefolium"                     
[3] "Agrostis"                                 
[4] "Agrostis anadyrensis"        
 ...

See https://github.com/jocelynpender/fna-query/blob/master/R/demo_queries/distribution/nunavut_taxa.csv for a sample output file.

Option B: Return taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Distribution::Nunavut]]|?Taxon family

> properties_texts_data_frame
                                            Taxon family
Abietinella abietina                         Thuidiaceae
Achillea millefolium                          Asteraceae
Agrostis                                         Poaceae
Agrostis anadyrensis                             Poaceae   
 ...

See https://github.com/jocelynpender/fna-query/blob/master/R/demo_queries/distribution/nunavut_taxa_family_name.csv for a sample output file.

Run a demo query

Don't know what to query? See the demo queries here: https://github.com/jocelynpender/fna-query/tree/master/R/demo_queries

Use Python

This section assumes you are familiar with Python programming.

Prerequisites

Create an account

You'll need to create an account to use the API with Python.

Create your account at http://beta.floranorthamerica.org/Special:CreateAccount
Find the file called local.py.example in the python/src folder. Rename it to local.py and add your credentials.

Dependencies

Python 3.7
mwclient
pandas

Option A. Use pip

requirements.txt has been generated with pip freeze > requirements.txt

Open a terminal.

cd fna-query
pip install -r requirements.txt

Option B. Use conda

The project was built within a conda environment. A conda YAML file has been generated with conda env export > fna-query.yml.

Open a terminal.

cd fna-query
conda env create -f fna-query.yml

Run your query

Open a terminal.
Prepare your query. E.g., [[Special status::Introduced]]
Run your query with the commands below. If you are using conda, first activate the environment with: conda activate environment-name

cd fna-query
cd python
python -m src.run_query --output_file_name "output_file_name.csv" --query_string "[[Query::here]]"

The -m flag tells Python to run run_query as a module of the src package, so that imports from src resolve. For this to work, the command must be run from the python directory.
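
To illustrate how the two command-line flags map onto values inside a script, here is a minimal sketch using argparse. It is not the repository's run_query.py; only the flag names --output_file_name and --query_string come from the command above, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the command above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run an ask query and save the results as CSV.")
    parser.add_argument("--output_file_name", required=True, help="CSV file to write")
    parser.add_argument("--query_string", required=True, help="Semantic MediaWiki query")
    return parser.parse_args(argv)


# Passing a list stands in for what would normally arrive from the terminal.
args = parse_args(["--output_file_name", "introduced_taxa.csv",
                   "--query_string", "[[Special status::Introduced]]"])
print(args.output_file_name, args.query_string)
```

Quoting the query string in the terminal matters, because characters such as brackets, pipes and question marks have special meaning in most shells.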

Expected output

If your query results are extensive, the query will take some time to process. Please be patient.

Option A: Taxa names only (i.e., query does not include ? parameter)

E.g., [[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]

python -m src.run_query --output_file_name "illustrated_taxa.csv" --query_string "[[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]"

See https://github.com/jocelynpender/fna-query/blob/master/python/demo_queries/distribution/nunavut_taxa.csv for a sample output file.

Option B: Taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]|?Taxon rank

python -m src.run_query --output_file_name "illustrated_taxa_taxon_family.csv" --query_string "[[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]|?Taxon rank"

See https://github.com/jocelynpender/fna-query/blob/master/python/demo_queries/distribution/nunavut_taxa_family_name.csv for a sample output file.
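
Whether you get Option A or Option B output depends only on whether the query string contains a ? parameter. The sketch below shows one way to check this in code. `requested_properties` is a hypothetical helper, not part of the fna-query scripts, and it assumes properties are requested in the |?Property form used throughout this README.

```python
import re


def requested_properties(query):
    """Return the property names requested with |?Property in a query string."""
    # Match a pipe, optional whitespace, a question mark, then text up to the next pipe.
    return [name.strip() for name in re.findall(r"\|\s*\?([^|]+)", query)]


names_only = requested_properties("[[Illustrator::+]][[Illustration::Present]]")
with_properties = requested_properties("[[Illustrator::+]][[Illustration::Present]]|?Taxon rank")
print(names_only)       # []
print(with_properties)  # ['Taxon rank']
```

An empty list corresponds to Option A (taxa names only); a non-empty list corresponds to Option B.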

Run a demo query

Don't know what to query? See the demo queries here: https://github.com/jocelynpender/fna-query/tree/master/python/demo_queries

Getting help

Contact pender.jocelyn@gmail.com or joel.sachs@canada.ca for support.

Bug reports

Please leave your bug reports here: https://github.com/jocelynpender/fna-query/issues

Resources

Dependency documentation

Read more about the WikipediR package for R.
Read more about the mwclient package for Python.

Merging multiple CSV files

Sometimes you'll need to batch the API return results. Here is an R script for merging multiple CSV files.
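
If you prefer to stay in Python, a similar merge can be sketched with the standard library alone. This is an illustrative alternative to the R script, not a port of it. It assumes every batch file starts with the same header row, and the file names and column names in the demo are made up.

```python
import csv
import os
import tempfile


def merge_csv_files(input_paths, output_path):
    """Append the rows of several CSV files that share one header row."""
    header = None
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                rows = list(csv.reader(in_file))
            if not rows:
                continue  # skip empty batch files
            if header is None:
                header = rows[0]
                writer.writerow(header)  # write the shared header only once
            elif rows[0] != header:
                raise ValueError(f"{path} has a different header")
            writer.writerows(rows[1:])


# Demo with two tiny, made-up batch files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    batch_a = os.path.join(tmp, "volume_17.csv")
    batch_b = os.path.join(tmp, "volume_19.csv")
    merged_path = os.path.join(tmp, "merged.csv")
    for path, name in [(batch_a, "Taxon A"), (batch_b, "Taxon B")]:
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([["Taxon name", "Taxon family"], [name, "Family X"]])
    merge_csv_files([batch_a, batch_b], merged_path)
    with open(merged_path, newline="", encoding="utf-8") as f:
        merged_rows = list(csv.reader(f))
print(merged_rows)
```

Raising an error on mismatched headers is a deliberate choice here: batches produced by different queries may request different properties, and silently stacking them would misalign columns.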
