Tools for working with Elsevier's Pure API.
If you set the PURE_API_DOMAIN
and PURE_API_KEY
environment variables,
and want to use the latest Pure API version, using this package's client
can be as easy as:
from pureapi import client
response = client.get('research-outputs')
For more control, pass a client.Config
object. For example, to make requests
to two different Pure API servers:
from pureapi.client import Config
response = client.get(
'persons',
config=Config(domain='example.com', key='123-abc')
)
test_repsonse = client.get(
'persons',
config=Config(domain='test.example.com', key='456-def', version='523')
)
All functions that make requests of a Pure API server accept a config
parameter. See the documentation for client.Config
for more details.
When using more than one function that requires configuration, or
calling such functions multiple times, preconfig()
helps avoid
repetition, allowing multiple functions to be configured only once,
then re-used multiple times:
# Pre-configure a single function:
[preconfigured_get] = client.preconfig(Config(version='523'), client.get)
# Another way to configure a single function:
preconfigured_get_all = client.preconfig(Config(version='523'), client.get_all)[0]
# Preconfigure multiple functions:
get, get_all, get_all_transformed = client.preconfig(
Config(version='523'),
client.get,
client.get_all,
client.get_all_transformed
)
Many collections may contain too many records to download in a single request.
For downloading large numbers of records in multiple requests, use the
client.*_all*()
functions. For example, to download all research outputs:
for response in client.get_all('research-outputs'):
json = response.json()
For even more convenience, each client.*_all()
function has an associated
*_all_transformed()
function. Instead of yielding HTTP responses, which
typicall contain many records each,these functions yield individual records,
transformed from raw JSON into addict.Dict
objects. These transformed records
allow easier access to deeply nested fields, and may also ensure that some
critical fields exist in each record, even if they may be None
or empty by
default.
An example using one of these functions to get the titles of all research outputs:
for ro in client.get_all_transformed('research-outputs'):
title = ro.title.value
For more details, see the documentation for each module. For more examples, see
tests/test_*.py
.
Successfully tested against Pure API versions 5.18.x - 5.21.x.
pureapi requires Python >= 3.
The only configuration values the user absolutely must provide are a Pure API domain and key. There are a couple of ways to do this:
client.Config
objectPURE_API_DOMAIN
andPURE_API_KEY
environment variables
Running pytest --integration tests/test_client.py
requires the environment
variables. One to set them is with a .env
file. See env.dist
for an example.
To manage dependencies we use poetry, and to manage Python versions we use pyenv. Some of us also use anyenv to manage pyenv as well as other language version managers. While alternative tools will work, we document here only the tools we use. We will document the use of other tools if demand arises.
One way to set up all these tools to work together, for a new project, is to follow the workflow below.
Note that we prefer to put virtual environments inside the project directory. Note also that we use the
built-in venv
module to create virtual environments, and we name their directories .venv
, because
that is what poetry
does and expects.
- Install pyenv.
pyenv install $python_version
mkdir $project_dir; cd $project_dir
- Create a
.python-version
file, containing$python_version
. pip install poetry
poetry config settings.virtualenvs.in-project true
python -m venv ./.venv/
source ./.venv/bin/activate
Now running commands like poetry install
or poetry update
should install packages into the virtual
environment in ./.venv
. Don't forget to deactivate
the virtual environment when finished using it.
If the project virtual environment is not activated, poetry run
and poetry shell
will activate it.
When using poetry shell
, exit the shell to deactivate the virtual environment.
Poetry seems to no longer reliably support git-repository-based dependencies. While this method of specifying a dependency may still work for versions <= 4.3.1, we recommended using the preferred method, with versions >= 4.3.2, below.
Add to pyproject.toml
:
pureapi = {git = "https://github.com/UMNLibraries/pureapi.git"}
To specify a version, include the tag
parameter:
pureapi = {git = "https://github.com/UMNLibraries/pureapi.git", tag = "4.3.2"}
Add to pyproject.toml
a URL-based dependency, with a path to a tar.gz
file
in this repository's `dist/' directory:
pureapi = {url = "https://github.com/UMNLibraries/pureapi/raw/main/dist/pureapi-4.3.3.tar.gz"}
To install, run poetry install
.
Run the following, either as arguments
to poetry run
, or after running poetry shell
:
pytest tests/test_client.py
pytest tests/test_common.py
pytest tests/test_response.py
Or to run all tests: pytest
Note that tests/test_client.py
includes integration tests that make requests
against a Pure API server. To run those tests, pass the --integration
option
to pytest, and set the environment variables described in
Requirements and Recommendations.
Elsevier releases a new version of Pure, including a new version of the Pure API, every four months.
Testing existing clients against new Pure API schema versions presents significant challenges. Elsevier releases new versions of the Pure API every four months, and supports only three versions older than the current version. Also, the JSON schema for each API version is quite large, such that attempting to load them in some schema browsers will cause them to crash. Much of this size is due to documentation, which tends to change significantly between versions, making it difficult to find the signal of actual schema changes when diffing schema versions against each other.
Our strategy for solving these problems is to test against only the fraction of an API schema that we actually use, ignoring changes to parts of that schema we do not use. The tactics we use here are to find sample records that contain all of the fields we use for a given record type, and to run automated tests against downloads of the same record from each supported API version endpoint. The sample records are in tests/data/, and the tests are in tests/test_response.py.
We use Google-style docstrings. The Napoleon extension for Sphinx has some good examples.
Poetry seems to no longer reliably support git-repository-based dependencies. Below is a method of building packages we used previously for such dependencies.
Python package managers, including poetry, will be unable to install a VCS-based package without a
setup.py
file in the project root. To generate setup.py
:
poetry build
tar -zxf dist/pureapi-1.0.0.tar.gz pureapi-1.0.0/setup.py --strip-components 1
When specifying a dependency on this package, use a URL to a tar.gz
file produced by
the following, as described in Installing.
poetry build
To allow for flexibility in dependency versions, do not commit poetry.lock
.
If multiple developers encounter problems with conflicting dependency versions, we may
consider committing pyproject.lock
at that point.