ADSimportpipeline

Overview

Coordinates ingest of a full ADS record.

1. Parses the "classic" bibcodes files defined in settings.py.
2. Operates on any bibcode whose "timestamp" differs from the corresponding "JSON_fingerprint" field in mongodb.
3. Uses ads.ADSExports.ADSRecords to consolidate data from classic for the bibcodes selected in step 2.
4. Parses the resulting xmlobject into a Python dict via xmltodict.py.
5. Enforces type=list on any potentially repeated entries.
6. Merges any repeated blocks that have the same @type attribute.
7. Inserts the data into mongodb (upsert=True).
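
The selection, list-enforcement, and merge steps above can be illustrated with a minimal, standard-library-only sketch. It is not the pipeline's actual code: the function names, the record layout, and the "first value wins" merge rule are all assumptions made for illustration.

```python
def select_changed(records, stored_fingerprints):
    """Keep (bibcode, timestamp) pairs whose timestamp differs from the
    fingerprint already stored for that bibcode (hypothetical helper)."""
    return [
        (bibcode, timestamp)
        for bibcode, timestamp in records
        if stored_fingerprints.get(bibcode) != timestamp
    ]


def enforce_list(record, keys):
    """Wrap the values of potentially repeated keys in a list so that
    downstream code can always iterate over them (hypothetical helper)."""
    out = dict(record)
    for key in keys:
        if key in out and not isinstance(out[key], list):
            out[key] = [out[key]]
    return out


def merge_by_type(blocks):
    """Merge blocks that share the same @type attribute.

    Assumption: on a key conflict the first value seen is kept. The real
    pipeline may apply different merge rules.
    """
    merged = {}
    for block in blocks:
        target = merged.setdefault(block.get("@type"), {})
        for key, value in block.items():
            if key not in target:
                target[key] = value
    return list(merged.values())
```

For example, `merge_by_type` collapses two blocks with `@type` equal to `"general"` into a single dict, while a block with a different `@type` stays separate.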

Step 1 is initiated by invoking run.py.

Invoking run.py --async publishes the [(bibcode, fingerprint),...] records to rabbitmq.
Workers that consume these messages are defined in pipeline/psettings.py and pipeline/workers.py.
Workers are controlled via a master process in pipeline/ADSimportpipeline.py.
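
As a rough sketch of the --async path, the function below serializes a batch of (bibcode, fingerprint) pairs to JSON and hands it to a channel object. The exchange name, routing key, and JSON message layout are hypothetical and are not taken from pipeline/psettings.py. The channel is duck-typed: anything with a pika-style `basic_publish(exchange, routing_key, body)` method will do.

```python
import json


def publish_records(channel, records,
                    exchange="ads-import",             # hypothetical exchange name
                    routing_key="find-new-records"):   # hypothetical routing key
    """Publish (bibcode, fingerprint) pairs as one JSON-encoded message.

    Returns the message body so the caller can log or inspect it.
    """
    body = json.dumps([[bibcode, fingerprint] for bibcode, fingerprint in records])
    channel.basic_publish(exchange=exchange, routing_key=routing_key, body=body)
    return body
```

With a running broker, the channel would typically come from `pika.BlockingConnection(pika.ConnectionParameters(host=...)).channel()`. A worker consuming the message would decode the body with `json.loads` to recover the list of pairs.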

Dependencies

pika
rabbitmq
ADSExports
pymongo + mongo

Note: The rabbitmq server should be configured with frame_max=512000.
Note: pika should also be configured with frame_max=512000. This seemingly has to be changed in pika's spec.py in addition to the normal connection definition.
