
Using MadLib Packman 2

jhellerstein edited this page Dec 19, 2010 · 6 revisions

Using MadLib PackMan

These instructions and the attached rpm assume that Greenplum is installed in /usr/local/greenplum-db and that you have a gpadmin user account with sudo privileges.

  1. Make sure Python setuptools is installed. If you're using Greenplum, you'll need to install it manually. Log into your gpadmin account, make sure you've sourced the Greenplum paths, and run the following:
     cd /tmp
     wget http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg#md5=bfa92100bd772d5a213eedd356d64086
     sh setuptools-0.6c11-py2.6.egg
  2. Install the Python libraries used by the madlib installer.
     export CFLAGS="-L/usr/local/greenplum-db/ext/python/lib/ -L /usr/local/greenplum-db/lib/"
     /usr/local/greenplum-db/ext/python/bin/easy_install argparse hashlib pyyaml sqlparse psycopg2
  3. Change directory into the madlib-contrib root. You now have two choices: build an rpm for distribution and install it, or install the Python code directly from your checkout. The former is closer to how madlib will eventually be distributed; the latter is easier for madlib developers.

     a. Option 1: Build an rpm and install it.
        python setup.py bdist_rpm
        cd dist
        sudo rpm -Uvh madlib-0.01-1.noarch.rpm
        sudo chown -R gpadmin /usr/local/greenplum-db/ext/python/lib/python2.6/site-packages/mad*
     b. Option 2: Install directly from the repo.
        python setup.py install
  4. In your newly installed madpy package, use vi (or your favorite editor) to edit madpy/Config.yml so that it matches your database setup. You'll most likely only need to change connect_args, but you may want to review the other fields as well.
     vi /usr/local/greenplum-db/ext/python/lib/python2.6/site-packages/madpy/Config.yml
  5. With the Python libraries installed in the filesystem, build the database extensions and install them. Before you do, make sure the madlib schema named in madpy/Config.yml (from the previous step) already exists in the database specified there.
     madpack install
  6. To undo the installation, uninstall the extensions from the database, and remove the rpm if you installed it that way:
     madpack uninstall
     sudo rpm -e madlib
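Before running madpack, it can help to confirm that steps 1 and 2 actually took effect. The script below is a minimal sketch, not part of madlib: it checks the downloaded setuptools egg against the MD5 published in the step 1 URL, and tries to import each library from step 2 (the pyyaml distribution is imported as yaml). The helper names md5_of_file, missing_modules and preflight are hypothetical. It avoids newer syntax so that it is intended to run under both Python 3 and Greenplum's bundled Python 2.6.

```python
import hashlib

# MD5 published in the "#md5=" fragment of the setuptools URL in step 1.
SETUPTOOLS_EGG_MD5 = "bfa92100bd772d5a213eedd356d64086"

# Import names for the libraries installed in step 2.
# Note: the pyyaml distribution is imported as "yaml".
REQUIRED_MODULES = ["argparse", "hashlib", "yaml", "sqlparse", "psycopg2"]


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_modules(names):
    """Return, in order, the module names that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def preflight(egg_path=None):
    """Hypothetical pre-install check; returns True if every check passed."""
    ok = True
    if egg_path is not None:
        actual = md5_of_file(egg_path)
        if actual != SETUPTOOLS_EGG_MD5:
            print("MD5 mismatch for %s: got %s" % (egg_path, actual))
            ok = False
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: %s" % ", ".join(missing))
        ok = False
    return ok
```

For example, run with Greenplum's Python, preflight("/tmp/setuptools-0.6c11-py2.6.egg") should return True once steps 1 and 2 have succeeded; any failure is printed before it returns False.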

Adding methods to the Package Manager config

Information about packages is stored in two places.

  1. Installation configuration lives in madpy/Config.yml. The format is straightforward: you specify a unique name for your method (which should match the directory name under methods in the repo) and the port to install (which should match the directory name under <yourmethod>/src). If you prefer, you can instead place a Config.yml file in a directory of your choosing (//Config.yml) and run madpack -c /<path-to-dir> install.
  2. Each port directory should contain an Install.yml file that lists the SQL scripts to roll "forward" (fw) and "backward" (bw). A module key is also required (it is currently unused, so this may change). See sketch/src/extended_sql/pg_gp/Install.yml for an example.
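Taken together, these two files describe a simple lookup: a method/port pair from Config.yml names a directory, and that directory's Install.yml says which scripts to run. The sketch below models that lookup under stated assumptions; it is not madpack's code. It assumes the repo layout methods/<method>/src/<port>, that fw and bw each hold a list of script filenames, and that installing rolls forward while uninstalling rolls backward. It works on an already-parsed dict rather than reading YAML, and port_dir and scripts_for are hypothetical helpers.

```python
import os

# Keys this sketch requires in a parsed Install.yml (module is currently unused).
REQUIRED_INSTALL_KEYS = ("module", "fw", "bw")


def port_dir(repo_root, method, port):
    """Assumed layout: <repo_root>/methods/<method>/src/<port>."""
    return os.path.join(repo_root, "methods", method, "src", port)


def scripts_for(install_spec, action):
    """Return the SQL scripts to run, in listed order, for a given action.

    Assumption: "install" maps to the fw list and "uninstall" to the bw list.
    """
    missing = [k for k in REQUIRED_INSTALL_KEYS if k not in install_spec]
    if missing:
        raise ValueError("Install.yml is missing key(s): %s" % ", ".join(missing))
    if action == "install":
        return list(install_spec["fw"])
    if action == "uninstall":
        return list(install_spec["bw"])
    raise ValueError("unknown action: %r" % (action,))
```

With PyYAML from step 2 available, the dict passed to scripts_for could come from yaml.safe_load on the port directory's Install.yml; the script filenames in any such dict are whatever that file lists.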

Note: the madpack script attempts to run make install in the port directory, which you can use to generate the appropriate SQL install directory references via PGXS. SQL scripts should prefix every function and table name with the string MADLIB_SCHEMA as the schema; this string is replaced with the value of target_schema in Config.yml. See sketch/src/extended_sql/pg_gp/sketches.sql.in for an example.
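The MADLIB_SCHEMA placeholder described in the note can be illustrated with a small substitution function. This is a minimal sketch, not madpack's actual implementation; substitute_schema is a hypothetical name, and restricting target_schema to a simple unquoted identifier is an assumption made here for safety.

```python
import re

# A simple unquoted SQL identifier; \Z (not $) so a trailing newline is rejected.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
# Match MADLIB_SCHEMA only as a whole word, not inside longer identifiers.
_PLACEHOLDER = re.compile(r"\bMADLIB_SCHEMA\b")


def substitute_schema(sql_text, target_schema):
    """Replace each MADLIB_SCHEMA token in sql_text with target_schema."""
    if not _IDENTIFIER.match(target_schema):
        raise ValueError("not a simple SQL identifier: %r" % (target_schema,))
    return _PLACEHOLDER.sub(target_schema, sql_text)
```

The word boundaries keep names such as MADLIB_SCHEMA_X untouched, and validating the schema name first means it cannot smuggle SQL into the scripts or contain backslashes that re.sub would interpret in the replacement string.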