Code to transform the plastics data prior to the web app

SchmidtDSE/plastics-pipeline

Plastics Pipeline

Luigi-based pipeline to sweep and select machine learning models for the plastics outcomes projection. This is used by https://global-plastics-tool.org/.


Purpose

This pipeline executes pre-processing and machine learning tasks on the raw "input" data for the plastics business-as-usual projection model. It makes those projections in multiple ways:

  • Naive: Simple polynomial curve fit that extrapolates past trends for trade, waste, and consumption.
  • Curve: Simple polynomial model that predicts trade, waste, and consumption by fitting a curve to those response variables with population and GDP as inputs.
  • ML: A more sophisticated machine learning sweep which considers SVR, CART / trees, AdaBoost, and Random Forest.

In practice, the machine learning branch is used by the tool.


Usage

Most users can simply reference the output from the latest execution. That output is written to https://global-plastics-tool.org/datapipeline.zip and is publicly available under the CC-BY-NC License. That said, users may also leverage a local environment if desired.

Container Environment

A containerized Docker environment is available for execution. This will conduct the model sweeps and prepare the outputs required for the front-end tool. See COOKBOOK.md for more details.

Manual Environment

In addition to the Docker container, a manual environment can be established by running pip install -r requirements.txt. This assumes that sqlite3 is installed. Afterwards, run bash build.sh.

Configuration

The configuration for the Luigi pipeline can be modified by providing a custom JSON file. See task/job.json for an example. Note that, although a full sweep is conducted, the pipeline uses random forest by default because that approach tends to be less prone to overfitting. Parallelization can be enabled by changing the value of workers.
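
As a rough sketch of preparing a custom configuration, the snippet below copies a configuration dictionary and changes its workers value. It assumes that `workers` is a top-level key. The actual layout of task/job.json may differ, so the helper `with_workers` and the example contents are hypothetical.

```python
import copy
import json


def with_workers(config, workers):
    """Return a copy of a job configuration with the workers value replaced.

    Assumes `workers` is a top-level key, which may not match task/job.json.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    updated = copy.deepcopy(config)  # Leave the caller's dictionary untouched.
    updated["workers"] = workers
    return updated


# Hypothetical configuration contents for illustration only.
example = {"workers": 1}
print(json.dumps(with_workers(example, 4), indent=2))
```

The result could then be written to a new JSON file with `json.dump` and supplied to the pipeline in place of the default.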

Extension

For examples of adding new regions or updating existing data, see COOKBOOK.md.

Snapshot database

An inputs snapshot for reproducibility is located in data/snapshot.db. Use of this preformatted dataset is controlled through const.USE_PREFORMATTED, which defaults to True, meaning that the included SQLite snapshot is used. For more details, see the data directory.
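
For those who want to look inside the snapshot before running the pipeline, the following minimal sketch lists the tables in a SQLite file using Python's standard sqlite3 module. The helper name `list_tables` is hypothetical, and no assumption is made about which tables the snapshot actually contains.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of the tables in a SQLite database file, sorted by name."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [row[0] for row in rows]


# Path taken from the README; only inspected if present (run from the repository root).
snapshot_path = os.path.join("data", "snapshot.db")
if os.path.exists(snapshot_path):
    print(list_tables(snapshot_path))
```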


Tool

Note that an interactive tool for this model is also available at https://github.com/SchmidtDSE/plastics-prototype.


Local Environment

Set up the local environment with pip install -r requirements.txt.


Testing

Some unit tests and other automated checks are available. The following is recommended:

$ pip install pycodestyle pyflakes nose2
$ pyflakes *.py
$ pycodestyle *.py
$ nose2

Note that unit tests and code quality checks are run in CI / CD.


Deployment

This pipeline can be deployed by merging to the deploy branch of the repository, which triggers GitHub Actions. This will cause the pipeline output files to be written to https://global-plastics-tool.org/datapipeline.zip.


Development Standards

CI / CD should be passing before merges to main, which is used to stage pipeline deployments, and before merges to deploy. Where possible, please follow the Google Python Style Guide. Please note that tests run as part of the pipeline itself; separate test files are encouraged but not required. That said, developers should document which tasks are tests and expand these tests like typical unit tests as needed in the future. We allow lines to go to 100 characters.


Data and Citation

Citations for data in this repository:

Our thanks to those authors and resources. Manuscript in progress; data available upon request to authors.


Related Repositories

See also source code for the web-based tool running at global-plastics-tool.org and source code for the GHG pipeline.


Open Source

This project is released as open source (BSD and CC-BY-NC). See LICENSE.md for further details. In addition to this, please note that this project uses the following open source:

The following are also potentially used as executables (for example, from the command line) but are not statically linked to code:

Additional license information:
