Skip to content

Pl Python

Manas Sivakumar edited this page Sep 11, 2022 · 15 revisions

Overview

  • PL/Python procedural language allows PostgreSQL functions and procedures to be written in the Python language.
  • PL/Python is only available as an “untrusted” language, meaning it does not offer any way of restricting what users can do in it.
  • Until a trusted variant plpython becomes available in the future, the writer of a function must take caution that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator.

How to use

To install PL/Python in your database, Use CREATE EXTENSION plpython3u or CREATE EXTENSION plpython2u depending on the python version you want to use.

Installing 3rd party libraries in PL/Python

PL/Python by default comes with vanilla python installed. In order to use 3rd party modules:

  • You must install the library in any of the following default PYTHONPATH locations:
    • /usr/lib/python38.zip
    • /usr/lib/python3.8
    • /usr/lib/python3.8/lib-dynload
    • /usr/local/lib/python3.8/dist-packages
    • /usr/lib/python3/dist-packages
  • Or edit the PYTHONPATH variable of the postgresql server, i.e add the library installation path to etc/postgresql/<postgres_version>/main/environment.

For more details on PL/Python Environment Variables check Environment variables

Writing Functions in PL/Python

Functions in PL/Python are declared via the standard CREATE FUNCTION syntax

CREATE FUNCTION funcname (argument-list)
  RETURNS return-type
AS $$
  # PL/Python function body
$$ LANGUAGE plpython3u;

Example

A function that returns max of two numbers

CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpython3u;

The Python code that is given as the body of the function definition is transformed into a Python function by the server like this

def __plpython_procedure_pymax_123456():
  if a > b:
    return a
  return b

Here 123456 is the OID assigned by POSTGRESQL to this function

Note: It is advised to treat the input variables to the function as READ-ONLY. If its necessary to reassign a value to the input, one must precede with the global keyword to obey the scope rules set in python.

More information on PL/Python Basics

Database Access

The PL/Python plpy module provides two Python functions to execute an SQL query and prepare an execution plan for a query, plpy.execute and plpy.prepare. Preparing the execution plan for a query is useful if you run the query from multiple Python functions.

PL/Python also supports the plpy.subtransaction() function to help manage plpy.execute calls in an explicit subtransaction. See Explicit Subtransactions in the PostgreSQL documentation for additional information about plpy.subtransaction().

Execute a query inside python

plpy.execute(query [, max-rows])

The result object emulates a list or dictionary object. The result object can be accessed by row number and column name. For example:

rv = plpy.execute("SELECT * FROM my_table", 5)
foo = rv[i]["my_column_name"]

Input arguments

  • Query: SQL statement to execute
  • max_rows: optional integer limiting the number of rows fetched

Output arguments

Apart from the rows, the result provides these additional methods

  • nrows(): number of rows processed by the command.
  • status(): The SPI_execute() return value.
  • colnames(): list of column names.
  • coltypes(): list of column type OIDs.
  • coltypmods(): list of type-specific type modifiers for the columns.

Note that calling plpy.execute will cause the entire result set to be read into memory. Only use that function when you are sure that the result set will be relatively small. If you don't want to risk excessive memory usage when fetching large results, use plpy.cursor rather than plpy.execute.

Prepare and Cursor functions

plpy.prepare

plpy.prepare prepares the execution plan for a query. It is called with a query string and a list of parameter types, if you have parameter references in the query

plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1", ["text"])
rv = plpy.execute(plan, ["name"], 5)
or 
rv = plan.execute(["name"], 5)

plpy.cursor

  • The plpy.cursor function accepts the same arguments as plpy.execute (except for the row limit) and returns a cursor object, which allows you to process large result sets in smaller chunks.
  • The cursor object provides a fetch method that accepts an integer parameter and returns a result object. Each time you call fetch, the returned object will contain the next batch of rows, never larger than the parameter value. Once all rows are exhausted, fetch starts returning an empty result object

For more details on database access and trapping errors, refer to Database Access

GD, SD

  • When you prepare an execution plan using the PL/Python module the plan is automatically saved. See the Postgres Server Programming Interface (SPI) documentation for information about the execution plans https://www.postgresql.org/docs/9.4/spi.html.

  • To make effective use of saved plans across function calls you use one of the Python persistent storage dictionaries SD or GD.

  • The global dictionary SD is available to store data between function calls. This variable is private static data. The global dictionary GD is public data, available to all Python functions within a session. Use GD with care.

Example use of the SD dictionary:

CREATE FUNCTION usesavedplan() RETURNS trigger AS $$
  if SD.has_key("plan"):
    plan = SD["plan"]
  else:
    plan = plpy.prepare("SELECT 1")
    SD["plan"] = plan

  # rest of function
$$ LANGUAGE plpythonu;

Example use of the GD dictionary:

In terms of performance, importing a Python module is an expensive operation and can affect performance. If you are importing the same module frequently, you can use Python global variables to load the module on the first invocation and not require importing the module on subsequent calls. The following PL/Python function uses the GD persistent storage dictionary to avoid importing a module if it has already been imported and is in the GD.

psql=#
   CREATE FUNCTION pytest() returns text as $$ 
      if 'mymodule' not in GD:
        import mymodule
        GD['mymodule'] = mymodule
    return GD['mymodule'].sumd([1,2,3])
$$;

Utility Functions

The command print does not work in PL/Python. The Python module plpy implements these functions to manage errors and messages.

  • plpy.debug
  • plpy.log
  • plpy.info
  • plpy.notice
  • plpy.warning
  • plpy.error
  • plpy.fatal
  • plpy.debug

The message functions plpy.error and plpy.fatal raise a Python exception which, if uncaught, propagates out to the calling query, causing the current transaction or subtransaction to be aborted. The functions raise plpy.ERROR(msg) and raise plpy.FATAL(msg) are equivalent to calling plpy.error and plpy.fatal, respectively. The other message functions only generate messages of different priority levels.

For More Information refer to Utility Functions

References