A SQL-like interface for python and the web to query probabilistic models and the data they were trained on.
Version: 0.95
modelbase
can be used to model tabular data with generic probabilistic modelling as well to analyse, use and explore both, the fitted model as well as the data.
To this end the fitted models offer different types of operations such as prediction, conditionalization or marginalization.
Semantically equivalent operations, namely aggregation, row select and column selection are also available for data.
An overview over the capabilities of modelbase
and a short introductory example of its Python API usage can be found in the jupyter-notebook files Intro_example.ipynb and predict_API.ipynb
modelbase
provides an API-level access to model and data.
It provides a generic modelling and querying backend, similar to what data base management systems are for tabular data alone.
We have also developed lumen for visual-interactive access to probabiliistic models that requires no coding at all.
lumen
provides a web-application for exploration, comparison and validation of probabilistic models and its data. It uses the webservice interface of modelbase
.
The modelbase
repository contains a number directories as follows:
bin
: This folder contains scripts to run an instance of the webservice-backend locally. In particular you may executewebservice.py
to run the webservice as a local Flask app (see below)dev
: This folder contains stuff only relevant for development.docs
: This folder contains documentation and introductions as jupyter notebooks. These runnable examples that serve as starting points for different use cases and tasks.cgmodsel
: Contains a required external python package (resolved as a git submodule).mb-*-pkg
: each is a namespace package under the common namespacemb
.tests
: contains tests.
- Python3
- jupyterlab or jupyternotebook is required to run the intro examples and many other tutorials. See here for instructions.
- R may be required for some optional components.
- Clone this repository into a folder of your choice. Let's call it
<root>
. - Install other dependencies that are only available as git repositories (so called submodules) as follows:
- Install the
cgmodsel
package withgit submodule init && git submodule update
- Install submodule
pip3 install cgmodsel
from<root>
.
- Install the
- Install the base package
mb.modelbase
of the backend locally, i.e, docd <root>/mb-modelbase-pkg && pip3 install .
- Install the data package
mb.data
, i.e, docd <root>/mb-data-pkg && pip3 install .
- Run
bin/initialize.py
: this will create some initial probabilistic models inbin/fitted_models
. This is also a sanity check that things are all right with your installation.
This project uses the namespace mb
.
In that namespace a number of packages exist.
Following the setup instructions above you just installed the core package mb.modelbase
and the data package mb.data
.
If you want to install additional optional components you simply install the corresponding namespace packages, analogous to above.
Note that these subpackages may have conflicting dependencies, due to particular dependencies on third-party packages. Hence, you may not be able to install all components at once.
The following additional optional components and corresponding namespace packages exist: Each of them provide an additional type of model to work with.
mb.pymc3
: Use probabilistic programs / statistical models written in the PyMC3 probabilistic programming language.mb.stan
: Use probabilistic programs / statistical models written in the STAN probabilistic programming language.mb.gaussspn
: A gauss-SPN adapter. SPN stands for Sum-Product-Network.mb.libspn
: A libspn adapter.mb.mspn
: A mixed-SPN model adapter, based on some R package. Note that you need to install R for this to work.mb.spflow
A SPN-adapter based on the Python3 package spflow.
You have installed modelbase
as a number of (namespace) packages in your local python installation.
To update modelbase
you have to update each of these packages to the latest repository version by indivudually uninstalling them, fetching the latest version from git and installing them.
Here it is explained with the mb.modelbase
core package
- uninstall the current version: For instance for the core package do:
pip uninstall mb.modelbase
- change into the local repository /mb-modelbase-pkg
- pull the latest version from the repo:
git pull origin master
- install the latest version:
pip3 install .
Alternatively, you can use the --editable
when installing the packages above. Then, you simply need to pull the latest version from the repo.
modelbase
provides three layers of APIs:
-
a webservice that accepts JSON-formatted http/https requests with a SQL inspired syntax. See 'Running the modelbase webservice' and 'configuring the modelbase webservice' below.
-
a python class
ModelBase
. An instance of that class is like a instance of a data base management system - just (also) for probabilistic models. Use its member methods to add, remove models and run queries against it. See the class documentation for more information. -
a python class
Model
, which is the base class of all concrete models implemented in `modelbase'. A instance of such a class hence represents one particular model. See the class documentation for more information.
This repository contains a number of namespace pacakges which you should have installed by now.
It also contains the bin
directory, which contains executable scripts that you can use to run the backend as a webservice.
-
Execute
webservice.py
. This will start a Flask web server locally. It is the default and recommened way of usingmodelbase
if you just use it locally in combination with the sister projectlumen
. -
Run it as an WSGI application with (for example) apache2. The
modelbase.wsgi
file is provided for your convenience.
When you start the webservice it will load all models from the directory you provided (see configuration options below).
There is three ways to configure the webservice. In order of precedence (highest to lowest):
- use command line arguments to
webservice.py
. This cannot be used if you run modelbase as a WSGI application. Seewebservice.py --help
for available options. - set options in
run_conf.cfg
. This is respected by both ways of runningmodelbase
as a webservice (see above).run_conf.cfg
may have any subset of the options indefault_run_conf.cfg
and has the same format. To find out what options are available, please seerun_conf.cfg
. - set options in
default_run_conf.cfg
. Changing settings here is not recommended. Userun_conf.cfg
instead.
An instance of the webservice watches a directory and loads all models in that directory so that you can execute queries on these models and their data.
Models are stored as .mdl
files and are serialized (pickled) instances of md.Model
, that is, instances of some probabilistic model class.
By default models are loaded from the bin/fitted_models
directory, and this directory some models that are created/trained during the setup process of the modelbase
.
Check out the jupyter notebooks in doc/
to get started, and in particular these two
For any questions, feedback, bug reports, feature requests, rants, etc please contact: [email protected].
© 2016-2021 Philipp Lucas ([email protected])
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/.