This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Install pgvector on CI (#225)
Previously we did not have tests for embedding indexing. This patch
installs [pgvector](https://github.com/pgvector/pgvector) from
PostgreSQL's apt repository so that we can test the embedding-related
features on CI.

To keep it simple, instead of modifying a container at runtime, this
patch adds a new `tox` env, called `test-container`, which builds the
Docker image of the database server with all dependencies before
running the tests. The `test` env is still available for running tests
against an existing server.

To make tests more effective, this patch also adds a `pytest` option
called `server_use_pickler`, indicating whether to use the pickler
`dill` to deserialize UDFs on the server. As a result, we can have
test cases for the pickler, which will fail when the pickler is not
applicable. Such test cases are currently missing, and we have
encountered several issues because of that.
xuebinsu authored Nov 15, 2023
1 parent 48270ad commit 417dcc8
Showing 21 changed files with 227 additions and 149 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_docs.yml
@@ -37,7 +37,7 @@ jobs:
- name: Build the docs
run: |
export TAG_REF=${{ github.ref }}
tox -e docs
tox -e doc
- name: Save ref
run: echo "${{ github.ref }}" >> build/doc/ref.txt

89 changes: 25 additions & 64 deletions .github/workflows/test.yml
@@ -1,4 +1,4 @@
name: Test Postgres
name: Test

on:
push:
@@ -8,81 +8,42 @@ on:
paths-ignore:
- "*.md"

env:
PGPASSWORD: postgres
TESTDB: greenplum_python_test
PGUSER: postgres

jobs:
build:
name: Test Python${{ matrix.python-version }}, PSQL${{matrix.postgres-version}}
name: python${{ matrix.python-version }}; ${{ matrix.server }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
fail-fast: true
matrix:
python-version: ["3.9", "3.10"]
postgres-version: ['12', '13']
python-version: ["3.9", "3.11"]
server: ["postgres12-python39", "postgres12-python311"]
include:
- tox_test: "test_py39"
python-version: "3.9"
- tox_test: "test_py310"
python-version: "3.10"
- dill_version: python3.9
postgres-version: '12'
- dill_version: python3.9
postgres-version: '13'

services:
postgres:
# we need postgres with plpython extension postgres base docker hub image did not have that
# I use this one
# image: yihong0618/postgres-plpython:${{ matrix.postgres-version }}
image: thorinschiffer/postgres-plpython:${{ matrix.postgres-version }}
env:
POSTGRES_USER: ${{ env.PGUSER }}
POSTGRES_PASSWORD: ${{ env.PGPASSWORD }}
POSTGRES_DB: ${{ env.TESTDB }}
ports:
- 5432:5432
options: --name postgres
- server: "postgres12-python39"
server-python-version: "3.9"
- server: "postgres12-python311"
server-python-version: "3.11"

steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
- name: Setup python on client
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: pip
- name: Create plpython3u language and environment
run: |
psql postgres://${{ env.PGUSER }}:${{ env.PGPASSWORD }}@localhost/${{ env.TESTDB }} -c "CREATE LANGUAGE plpython3u;"
- name: Install Dependencies
run: |
pip install tox==3.25.0
- name: List all containers
run: docker ps -a

- name: Install Python in PostgreSQL server container
run: |
docker exec --user 0 postgres sh -c 'apt-get update && apt-get install -y python3-pip && mkdir -p $(su --login postgres --session-command "python3 -m site --user-site") && chown -R postgres /var/lib/postgresql'
- name: Run Tests
run: |
export POSTGRES_PASSWORD=${{ env.PGPASSWORD }}
export POSTGRES_USER=${{ env.PGUSER }}
export POSTGRES_DB=${{ env.TESTDB }}
tox -e ${{ matrix.tox_test }}
- name: Install Dependencies for greenplumpython dill
# plpytyhon for postgres install to pg services
- name: Run tests without pickler
run: |
pip install pip install --target=. dill
docker cp dill $(docker ps -q):/usr/lib/${{ matrix.dill_version }}/
rm -rf dill pip
- name: Run Tests with dill
python3 -m pip install tox~=4.11 tox-docker~=4.1 && \
tox \
--override=docker:server.dockerfile=server/${{ matrix.server }}.Dockerfile \
-e test-container \
-- \
--override-ini=server_use_pickler=false \
--ignore=tests/test_use_pickler.py \
- name: Run tests with pickler if python versions match
if: ${{ matrix.python-version == matrix.server-python-version }}
run: |
export POSTGRES_PASSWORD=${{ env.PGPASSWORD }}
export POSTGRES_USER=${{ env.PGUSER }}
export POSTGRES_DB=${{ env.TESTDB }}
tox -e ${{ matrix.tox_test }}
tox \
--override=docker:server.dockerfile=server/${{ matrix.server }}.Dockerfile \
-e test-container \
-- \
--override-ini=server_use_pickler=true \
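The new matrix crosses two client Python versions with two server images, and the pickler step is gated on the client and server Python versions being equal, presumably because `dill` serializes Python code objects, which are not portable across interpreter versions. The Python sketch that follows enumerates the jobs and evaluates that `if:` condition. It models `include` only in the narrow way this workflow uses it (matching on `server` and adding `server-python-version`); the helper names `expand_matrix` and `runs_pickler_step` are hypothetical and not part of GitHub Actions.

```python
from itertools import product

# Values copied from the workflow's `matrix` section.
PYTHON_VERSIONS = ["3.9", "3.11"]
SERVERS = ["postgres12-python39", "postgres12-python311"]

# Each `include` entry matches on `server` and adds `server-python-version`.
INCLUDES = [
    {"server": "postgres12-python39", "server-python-version": "3.9"},
    {"server": "postgres12-python311", "server-python-version": "3.11"},
]


def expand_matrix() -> list[dict[str, str]]:
    """Build every job combination and attach the matching `include` keys.

    Simplification: only handles include entries that match an existing
    `server` value and add new keys, which is all this workflow needs.
    """
    jobs = []
    for python_version, server in product(PYTHON_VERSIONS, SERVERS):
        job = {"python-version": python_version, "server": server}
        for extra in INCLUDES:
            if extra["server"] == server:
                job.update({k: v for k, v in extra.items() if k != "server"})
        jobs.append(job)
    return jobs


def runs_pickler_step(job: dict[str, str]) -> bool:
    # Mirrors `if: ${{ matrix.python-version == matrix.server-python-version }}`.
    return job["python-version"] == job.get("server-python-version")


if __name__ == "__main__":
    for job in expand_matrix():
        print(job["python-version"], job["server"], runs_pickler_step(job))
```

Only the two matched pairs, 3.9 with `postgres12-python39` and 3.11 with `postgres12-python311`, run the pickler step; the other two jobs cover only the `server_use_pickler=false` path.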
10 changes: 0 additions & 10 deletions BUILD.md

This file was deleted.

3 changes: 1 addition & 2 deletions MANIFEST.in
@@ -1,2 +1 @@
include versioneer.py
include greenplumpython/_version.py
include greenplumpython/VERSION
12 changes: 6 additions & 6 deletions README.dev.md
@@ -4,10 +4,10 @@

### [tox](https://tox.wiki)

Install with `pip`:
We use tox as the task runner. Tox can be installed with

```
pip install tox
python3 -m pip install tox
```

Install with `brew` on macOS:
@@ -36,16 +36,16 @@ The tests will create connection to the Greenplum cluster. So the `PGPORT` needs
export PGPORT=6000
```

Test with the default python version:
Test with the default python version and a local database server:

```
tox -e test
```

Test with specified officially supported version:

To run tests against a database server in a container:
```
tox -e test_py39
python3 -m pip install tox-docker
tox -e test-container
```

Run a specified test case(s):
3 changes: 1 addition & 2 deletions concourse/test.sh
@@ -47,8 +47,7 @@ _main() {
unset PYTHONPATH
unset PYTHONHOME
python3.9 -m pip install tox
python3.9 -m pip install .
tox -e test_py39
tox -e test
popd
}

14 changes: 7 additions & 7 deletions greenplumpython/dataframe.py
@@ -798,9 +798,9 @@ def refresh(self) -> "DataFrame":
.. highlight:: python
.. code-block:: Python
>>> cursor.execute("DROP TABLE IF EXISTS const_table;")
>>> cursor.execute("DROP TABLE IF EXISTS t_refresh;")
>>> nums = db.create_dataframe(rows=[(i,) for i in range(5)], column_names=["num"])
>>> df = nums.save_as("const_table", column_names=["num"], temp=False).order_by("num")[:]
>>> df = nums.save_as("t_refresh", column_names=["num"], temp=False).order_by("num")[:]
>>> df
-----
num
@@ -812,7 +812,7 @@ def refresh(self) -> "DataFrame":
4
-----
(5 rows)
>>> cursor.execute("INSERT INTO const_table(num) VALUES (5);")
>>> cursor.execute("INSERT INTO t_refresh(num) VALUES (5);")
>>> df
-----
num
@@ -836,7 +836,7 @@ def refresh(self) -> "DataFrame":
5
-----
(6 rows)
>>> cursor.execute("DROP TABLE const_table;")
>>> cursor.execute("DROP TABLE t_refresh;")
Note:
`cursor` is a predefined `Psycopg Cursor <https://www.psycopg.org/docs/cursor.html>`_
@@ -913,7 +913,7 @@ def save_as(
.. code-block:: Python
>>> nums = db.create_dataframe(rows=[(i,) for i in range(5)], column_names=["num"])
>>> df = nums.save_as("const_table", column_names=["num"], temp=True)
>>> df = nums.save_as("t_saved", column_names=["num"], temp=True)
>>> df.order_by("num")[:]
-----
num
@@ -925,8 +925,8 @@ def save_as(
4
-----
(5 rows)
>>> const_table = db.create_dataframe(table_name="const_table")
>>> const_table.order_by("num")[:]
>>> t_saved = db.create_dataframe(table_name="t_saved")
>>> t_saved.order_by("num")[:]
-----
num
-----
4 changes: 1 addition & 3 deletions greenplumpython/experimental/file.py
@@ -79,11 +79,9 @@ def _archive_and_upload(tmp_archive_name: str, files: list[str], db: gp.Database
def _from_files(_, files: list[str], parser: NormalFunction, db: gp.Database) -> gp.DataFrame:
tmp_archive_name = f"tar_{uuid.uuid4().hex}"
_archive_and_upload(tmp_archive_name, files, db)
func_sig = inspect.signature(parser.unwrap())
result_members = get_type_hints(func_sig.return_annotation)
return db.apply(
lambda: parser(_extract_files(tmp_archive_name, "files")),
expand=len(result_members) == 0,
expand=True,
)


1 change: 0 additions & 1 deletion requirements-dev.txt
@@ -1,6 +1,5 @@
black==22.3.0
isort==5.10.1
pytest==7.0.1
pglast==3.10
pyright==1.1.250
pandas
1 change: 0 additions & 1 deletion requirements-doc.txt
@@ -1,4 +1,3 @@
-r requirements.txt
sphinx==5.1.1
sphinx_rtd_theme==1.0.0
nbsphinx
21 changes: 21 additions & 0 deletions server/build.sh
@@ -0,0 +1,21 @@
#!/bin/bash

set -o nounset -o xtrace -o errexit -o pipefail

PG_MAJOR_VERSION=$(pg_config --version | grep --only-matching --extended-regexp '[0-9]+' | head -n 1)
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install --no-install-recommends -y \
postgresql-plpython3-"$PG_MAJOR_VERSION" \
postgresql-"$PG_MAJOR_VERSION"-pgvector \
python3-pip \
python3-venv
apt-get autoclean

POSTGRES_USER_SITE=$(su --login postgres --session-command "python3 -m site --user-site")
POSTGRES_USER_BASE=$(su --login postgres --session-command "python3 -m site --user-base")
mkdir --parents "$POSTGRES_USER_SITE"
chown --recursive postgres "$POSTGRES_USER_BASE"

cp /tmp/initdb.sh /docker-entrypoint-initdb.d
chown postgres /docker-entrypoint-initdb.d/*
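`server/build.sh` extracts the PostgreSQL major version from `pg_config --version` and uses it to select the version-specific `plpython3` and `pgvector` packages. A rough Python equivalent of that step, assuming a Debian-style version string; the sample string and the function names `pg_major_version` and `server_packages` are illustrative, not taken from the image or the repository:

```python
import re


def pg_major_version(pg_config_version: str) -> str:
    """Return the first run of digits, like `grep -oE '[0-9]+' | head -n 1`."""
    match = re.search(r"[0-9]+", pg_config_version)
    if match is None:
        raise ValueError(f"no version number in {pg_config_version!r}")
    return match.group(0)


def server_packages(major: str) -> list[str]:
    # Same package list that server/build.sh passes to `apt-get install`.
    return [
        f"postgresql-plpython3-{major}",
        f"postgresql-{major}-pgvector",
        "python3-pip",
        "python3-venv",
    ]


if __name__ == "__main__":
    # Hypothetical sample of `pg_config --version` output on a Debian build.
    sample = "PostgreSQL 12.17 (Debian 12.17-1.pgdg120+1)"
    major = pg_major_version(sample)
    print(major)
    print(" ".join(server_packages(major)))
```

Taking only the first number matters because the Debian suffix also contains digits; the shell pipeline relies on `head -n 1` for the same reason.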
12 changes: 12 additions & 0 deletions server/initdb.sh
@@ -0,0 +1,12 @@
#!/bin/bash

set -o nounset -o xtrace -o errexit -o pipefail

{
echo "logging_collector = ON"
echo "log_statement = 'all'"
echo "log_destination = 'csvlog'"
} >>"$PGDATA"/postgresql.conf

python3 -m venv "$HOME"/venv
source "$HOME"/venv/bin/activate
11 changes: 11 additions & 0 deletions server/postgres12-python311.Dockerfile
@@ -0,0 +1,11 @@
FROM postgres:12-bookworm

COPY build.sh initdb.sh /tmp/
RUN bash /tmp/build.sh

HEALTHCHECK --interval=1s --timeout=1s --start-period=1s --retries=30 CMD psql \
--single-transaction \
--user=$POSTGRES_USER \
--dbname=$POSTGRES_DB \
--host=localhost \
--command="SELECT version();"
11 changes: 11 additions & 0 deletions server/postgres12-python39.Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
FROM postgres:12-bullseye

COPY build.sh initdb.sh /tmp/
RUN bash /tmp/build.sh

HEALTHCHECK --interval=1s --timeout=1s --start-period=1s --retries=30 CMD psql \
--single-transaction \
--user=$POSTGRES_USER \
--dbname=$POSTGRES_DB \
--host=localhost \
--command="SELECT version();"
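The two Dockerfiles differ only in their Debian base (`bullseye` ships Python 3.9, `bookworm` ships Python 3.11), which is where the `python39` and `python311` suffixes come from. Both define a `HEALTHCHECK` that runs a trivial `psql` query every second, so the container is reported healthy once the server accepts connections; with `--interval=1s` and `--retries=30`, the server gets on the order of 30 seconds after the start period before Docker marks it unhealthy. A simplified model of that retry policy, ignoring `--timeout` and `--start-period`; `wait_until_healthy` is a hypothetical helper, not a Docker or tox-docker API:

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval: float = 1.0,
    retries: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `check` succeeds, or False after `retries` failures.

    Simplified model of the HEALTHCHECK policy: `--timeout` and
    `--start-period` are ignored, and `sleep` is injectable for testing.
    """
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval)  # wait one interval before the next probe
    return False


if __name__ == "__main__":
    probes = []

    def fake_psql_probe() -> bool:
        # Hypothetical stand-in for `psql ... --command="SELECT version();"`;
        # pretends the server starts accepting connections on the 5th probe.
        probes.append(1)
        return len(probes) >= 5

    print(wait_until_healthy(fake_psql_probe, sleep=lambda _: None), len(probes))
```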
46 changes: 39 additions & 7 deletions tests/__init__.py
@@ -5,26 +5,58 @@
import greenplumpython as gp


@pytest.fixture()
def db():
# NOTE: This UDF must **not** depend on picklers, such as dill.
@gp.create_function
def pip_install(requirements: str) -> str:
import subprocess as sp
import sys

assert sys.executable, "Python executable is required to install packages."
cmd = [
sys.executable,
"-m",
"pip",
"install",
"--requirement",
"/dev/stdin",
]
try:
output = sp.check_output(cmd, text=True, stderr=sp.STDOUT, input=requirements)
return output
except sp.CalledProcessError as e:
raise Exception(e.stdout)


@pytest.fixture(scope="session")
def db(server_use_pickler: bool):
# for the connection both work for GitHub Actions and concourse
db = gp.database(
params={
"host": environ.get("POSTGRES_HOST", "localhost"),
"dbname": environ.get("TESTDB", "gpadmin"),
"user": environ.get("PGUSER"),
"host": environ.get("PGHOST", "localhost"),
"dbname": environ.get("TESTDB", environ.get("USER")),
"user": environ.get("PGUSER", environ.get("USER")),
"password": environ.get("PGPASSWORD"),
}
)
db._execute("DROP SCHEMA IF EXISTS test CASCADE; CREATE SCHEMA test;", has_results=False)
db._execute(
"""
CREATE EXTENSION IF NOT EXISTS plpython3u;
CREATE EXTENSION IF NOT EXISTS vector;
DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test;
""",
has_results=False,
)
if server_use_pickler:
print(db.apply(lambda: pip_install("dill==0.3.6")))
yield db
db.close()


@pytest.fixture()
def con():
host = "localhost"
dbname = environ.get("TESTDB", "gpadmin")
dbname = environ.get("TESTDB", environ.get("USER"))
con = f"postgresql://{host}/{dbname}"
yield con

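The `pip_install` UDF deliberately avoids `dill`: it shells out to `python -m pip install --requirement /dev/stdin`, passes the requirements on stdin, and re-raises failures with pip's merged output. The same subprocess pattern as a plain client-side function follows, with a hypothetical name `run_with_stdin` and a `RuntimeError` in place of the bare `Exception`; the demo child process only echoes its input, so running it needs no network access.

```python
import subprocess as sp
import sys


def run_with_stdin(cmd: list[str], stdin_text: str) -> str:
    """Run `cmd`, feed `stdin_text` on stdin, and return merged stdout/stderr.

    Same pattern as the `pip_install` UDF, where pip reads its requirements
    from /dev/stdin and failures surface pip's own output in the exception.
    """
    try:
        return sp.check_output(cmd, text=True, stderr=sp.STDOUT, input=stdin_text)
    except sp.CalledProcessError as e:
        # e.stdout holds the merged output because stderr was redirected.
        raise RuntimeError(e.stdout) from e


if __name__ == "__main__":
    # Hypothetical stand-in for `python -m pip install --requirement /dev/stdin`:
    # a child process that just echoes the requirements it was given.
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
    print(run_with_stdin(echo, "dill==0.3.6\n"))
```

Redirecting stderr into stdout keeps pip's error messages in the single string that ends up in the exception, which is what makes a failed install diagnosable from the test log.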
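The reworked `db` fixture reads `PGHOST` instead of `POSTGRES_HOST` and, when `TESTDB` or `PGUSER` is unset, falls back to the `USER` environment variable for both the database name and the role. Extracted as a pure function (the name `connection_params` is hypothetical), the fallback logic looks like this:

```python
from typing import Mapping, Optional


def connection_params(env: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Resolve connection parameters with the same fallbacks as the `db` fixture."""
    return {
        "host": env.get("PGHOST", "localhost"),
        "dbname": env.get("TESTDB", env.get("USER")),
        "user": env.get("PGUSER", env.get("USER")),
        "password": env.get("PGPASSWORD"),
    }


if __name__ == "__main__":
    import os

    print(connection_params(os.environ))
```

Taking the environment as a mapping argument makes the fallbacks easy to check without a live server.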
16 changes: 16 additions & 0 deletions tests/conftest.py
@@ -0,0 +1,16 @@
import pytest


def pytest_addoption(parser: pytest.Parser):
parser.addini(
"server_use_pickler",
type="bool",
default=True,
help="Use pickler to deserialize UDFs on server.",
)


@pytest.fixture(scope="session")
def server_use_pickler(pytestconfig: pytest.Config) -> bool:
val: bool = pytestconfig.getini("server_use_pickler")
return val