seed-mcm

File import module for the SEED Platform; provides the core "Map, Clean, Merge" (MCM) functionality.

Overview

MCM has two main pieces: a reader and a mapper.

Reader
- Reads CSV files and returns a generator of DictCSVReader-parsed rows.
- Optionally chunks the rows into groupings of specified sizes.
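
As a rough illustration of the reader behaviour described above, here is a minimal sketch that uses only the Python standard library. It is not the mcm implementation: `read_rows` and `chunked` are hypothetical helper names, and the in-memory sample stands in for a real CSV file.

```python
import csv
import io
from itertools import islice


def read_rows(file_handle):
    """Yield each CSV row as a dict keyed by the header row."""
    for row in csv.DictReader(file_handle):
        yield row


def chunked(rows, size):
    """Group an iterable of rows into lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# In-memory stand-in for an open CSV file (hypothetical sample data).
sample = io.StringIO("Thing,Other thing\na,1\nb,2\nc,3\n")
chunks = list(chunked(read_rows(sample), 2))
```

Because both helpers are generators, rows are read lazily and only one chunk needs to be held in memory at a time.
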
Mapper
- Can build a probabilistic column mapping given a schema and some raw data.
  - Will substitute saved values for the suggested mapping (e.g. pulling a previous mapping from the DB).
  - Totally flexible: you pass a callable which takes the raw data and returns a mapping.
- Will clean data based on a Cleaner object for a given type. Type is inferred from the mapping schema.
- Ability to set "initial_data"
  - If you always need to set some information in the object that you're mapping data into, this is useful.
- Concatenate rows together with a specified delimiter character.
- Data which doesn't match a given schema's mapping is still saved. It's put in a dictionary called extra_data.
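
The mapper ideas above can be sketched in plain Python. This is an illustration under stated assumptions, not mcm's API: `suggest_mapping` and `map_row` are hypothetical names, string similarity from `difflib` stands in for whatever scoring mcm uses, and cleaners are modelled as a dict of per-field callables rather than a Cleaner object. Row concatenation is left out.

```python
import difflib


def suggest_mapping(raw_columns, schema_fields, threshold=0.6):
    """Return {raw column: (best schema field or None, similarity score)}."""
    suggestions = {}
    for raw in raw_columns:
        scored = [
            (difflib.SequenceMatcher(None, raw.lower(), field.lower()).ratio(), field)
            for field in schema_fields
        ]
        score, field = max(scored)
        # Below the (assumed) threshold we make no suggestion for this column.
        suggestions[raw] = (field if score >= threshold else None, score)
    return suggestions


def map_row(row, mapping, cleaners=None, initial_data=None):
    """Apply `mapping` to one raw row, cleaning values and keeping unmapped columns."""
    cleaners = cleaners or {}
    mapped = dict(initial_data or {})  # values that should always be set
    mapped["extra_data"] = {}
    for raw_column, value in row.items():
        target = mapping.get(raw_column)
        if target is None:
            # Columns without a mapping are kept rather than dropped.
            mapped["extra_data"][raw_column] = value
        else:
            clean = cleaners.get(target, lambda v: v)
            mapped[target] = clean(value)
    return mapped


suggested = suggest_mapping(["Thing", "Other Thing", "Notes"], ["thing", "other_thing"])
mapping = {raw: field for raw, (field, _score) in suggested.items() if field}

row = {"Thing": " 42 ", "Other Thing": "abc", "Notes": "unmapped"}
result = map_row(
    row,
    mapping,
    cleaners={"thing": lambda v: float(v.strip())},
    initial_data={"source": "import-1"},
)
```

Any callable that takes the raw column names and returns a mapping could replace `suggest_mapping` here, for example one that looks up a previously saved mapping first.
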

Integration

from mcm import cleaners, mapper, reader

# Our mapping is just a dictionary: the keys are column names as they appear
# in the raw data, and the values are the normalized attributes we map them to.
mapping = {'Thing': 'thing_1', 'Other thing': 'thing_2'}

# model_class can be any type of object.
model_class = object

# Reading and mapping from a CSV file, simple case.
# csv_file_handle is assumed to be an already-open file object for the CSV.
parser = reader.MCMParser(csv_file_handle)
mapped_objs = list(parser.map_rows(mapping, model_class))

Developing

1. Clone.
2. Create a virtualenv; if you use virtualenv wrapper you'll need to
3. Run python setup.py develop to link your files into your env.

Testing

Unfortunately, there are some directory path issues still baked in. To run tests you have to be in the tests directory:

$ flake8 mcm --exclude=data
$ cd mcm/tests && nosetests
