Orm yaml #169

Open
wants to merge 85 commits into base: main

85 commits
e89ed26
Initial proof of concept of orm.yaml
Nov 27, 2024
5543f44
primary, nullable and unique added to orm.yaml
Nov 28, 2024
5978809
Foreign keys in orm.yaml
Nov 29, 2024
4c7d2e3
ssg.py code generation fixed
Nov 29, 2024
a137cae
Creating data now works for Pagila!
Dec 3, 2024
709eb07
Primary key increment, unique constraint support
Dec 5, 2024
ec6c749
New unique constraint handling.
Dec 6, 2024
9421fba
Vocab table generation moved into new command make-vocab
Dec 9, 2024
9ea0c1e
Using Typer more conventionally now
Dec 11, 2024
2b1a9ab
Vocabulary tables can now be created even with circular dependencies.
Dec 18, 2024
4e878cc
Fixed some issues
Dec 18, 2024
a47c1ee
Cope with config.yaml tables properties being empty.
Dec 18, 2024
a457c62
Sort tables ourselves so we don't warn about vocab table loops
Dec 19, 2024
92828c5
added --compress to make-vocab
Dec 19, 2024
2f59627
make-vocab progress reporting
Dec 19, 2024
ff7e3e0
remove-vocab and remove-data doesn't use ssg.py. Progress for create-…
Dec 20, 2024
405409c
Initial attempt for generic column generator analysis
Jan 8, 2025
dc180eb
fit fix and tidy up
Jan 8, 2025
2654db8
small fixes
Jan 9, 2025
ef3e690
"obvious" generators
Jan 14, 2025
128befc
Incomplete summary data fails all distributions that depend on it.
Jan 15, 2025
5f17371
distribution buckets should not include NULL
Jan 15, 2025
08ef715
protect against standard deviation being NaN
Jan 15, 2025
cdf0480
Fixed dynamic module loading
Jan 15, 2025
ab571bf
Fixed make-generators without src-stats.yaml
Jan 15, 2025
4e038b4
Initial missingness implementation
Feb 13, 2025
cbd6e05
and template changes
Feb 13, 2025
20b59fb
primary keys now start after all existing keys
Feb 14, 2025
0883fa8
Fixed ssg template choice stuff
Feb 14, 2025
1e3ca32
Generating config.yaml and making it mandatory as input
Feb 24, 2025
e9898cc
Removed column uniqueness, added postgres CIDR and BIT
Feb 26, 2025
4eb7465
Documentation and minor fixes
Feb 27, 2025
d28cb46
create-vocab can load .yaml.gz
Mar 3, 2025
d699710
Make DB tests use temporary database
Mar 6, 2025
32f66aa
factored out tests loading .dump files and fixed RstTests.test_dir
Mar 7, 2025
9c1a1aa
Interactive tables configuration with initial test
Mar 7, 2025
f929a10
Remaining tests for configure-tables
Mar 11, 2025
5e26bda
list-tables command
Mar 21, 2025
52779cc
configure-tables gains tab-completion for next and data commands
Mar 26, 2025
1c154a3
configure-tables list command
Mar 27, 2025
b5543b1
Initial generator interactivity
Apr 3, 2025
0276ef3
new generators implementation!
Apr 4, 2025
54e061e
configure-generators gains "compare" command
Apr 7, 2025
cebe37f
Fresh connection for each query
Apr 8, 2025
15bcb33
Mimesis string generators gain fit based on length.
Apr 9, 2025
c19e120
refactored do_compare
Apr 9, 2025
8d0a544
Initial outputting of config.yaml
Apr 10, 2025
428adec
Fix configure-generators output
Apr 11, 2025
8738abd
Mimesis DateTime generators
Apr 11, 2025
5b375ae
Automatic generators removed
Apr 14, 2025
b202495
Fix verbosity to apply to root config
Apr 15, 2025
26e076d
#9 speed up tests using Postgres
Apr 15, 2025
88f9dec
#11 configure-generators keeps column order stable
Apr 15, 2025
a315078
Initial tests for configure-generators
Apr 16, 2025
b5f8130
configure-generators keeps old generators
Apr 16, 2025
39a7a7f
configure-generators writes out existing queries to config.yaml
Apr 16, 2025
2f6ef58
select aggregate clauses merge
Apr 21, 2025
b5a051f
#12 tab completion for configure_generators' next
Apr 23, 2025
8d5eded
#2 configure-generators' compare command reports the privacy of the t…
Apr 23, 2025
fbdea22
testing that configure-generators does not damage other configuration
Apr 24, 2025
ab02542
Updated dependencies to work with Python 3.13
Apr 26, 2025
6ed417d
Cope with nulls in config.yaml
Apr 26, 2025
88fb03d
More robustness against null table configuration
Apr 26, 2025
54652c5
Dockerfile
Apr 28, 2025
695f17e
make-tables should not take config.yaml by default
Apr 28, 2025
2ab8074
initial configure-missing implementation
May 13, 2025
c2c308f
initial missingness_generators implementation
May 13, 2025
da0e165
First test for configure-missing
May 14, 2025
0b8b3de
removed some cruft
May 14, 2025
8ce26a0
end-to-end missingness test
May 15, 2025
89b2c18
interactive commands ask to save even when no changes have been made
May 15, 2025
7f3ed81
Much better information on columns in configure-tables and configure-…
May 16, 2025
97d3e34
configure-generators propose on all-null column no longer fails
May 16, 2025
2ec6192
EMPTY (num_passes: 0) table type in interactive commands
May 16, 2025
24eeb1b
datetime generator creation fixed
May 16, 2025
8e43d43
NULL generator
May 16, 2025
7c08c55
#20 configure-generators unset command
May 16, 2025
5aa3faa
#21 configure-generators: better text for propose if no source data
May 16, 2025
4533b4c
Some fixes
May 17, 2025
241e141
#25 interactive peek, select and count
May 18, 2025
ea3df37
make-vocab gets --only
May 18, 2025
49a1eb2
removed generate-configuration
May 20, 2025
4274b53
change make-generators to create-generators
May 20, 2025
8f46dea
Various fixes
May 20, 2025
08b4896
Updated quick start guide
May 20, 2025
7 changes: 7 additions & 0 deletions .dockerignore
@@ -0,0 +1,7 @@
.*
*.yaml.gz
orm.yaml
config.yaml
src-stats.yaml
ssg.py
dist
5 changes: 4 additions & 1 deletion .gitignore
@@ -143,6 +143,9 @@ docs/temp/*
# vim swap files
*.swp

# tool outputs
ssg.py
orm.py
orm.yaml
src-stats.yaml
config.yaml
*.yaml.gz
12 changes: 12 additions & 0 deletions Dockerfile
@@ -0,0 +1,12 @@
FROM python:3.13.3-alpine3.21
RUN apk add bash poetry
WORKDIR /app
ADD . /app
RUN mkdir /pypoetry
ENV POETRY_VIRTUALENVS_PATH=/pypoetry/cache/virtualenv
ENV SHELL=/bin/bash
ENV HOME=/
RUN poetry install
RUN poetry run sqlsynthgen --install-completion bash
WORKDIR /data
CMD ["poetry", "--directory=/app", "shell"]
5 changes: 5 additions & 0 deletions docs/source/configuration.rst
@@ -4,6 +4,11 @@ Configuration Reference
SqlSynthGen is configured using a YAML file, which is passed to several commands with the ``--config-file`` option.
Throughout the docs, we will refer to this file as ``config.yaml`` but it can be called anything (the exception being that there will be a naming conflict if you have a vocabulary table called ``config``).

You can generate an example configuration file based on your source database. It is filled with default values only, so you can safely delete any parts of it you don't need:

.. code-block:: shell

   sqlsynthgen generate-config

Below, we see the schema for the configuration file.
Note that our config file format includes a section of SmartNoise SQL metadata, which is explained more fully `here <https://docs.smartnoise.org/sql/metadata.html#yaml-format>`_.

37 changes: 37 additions & 0 deletions docs/source/docker.rst
@@ -0,0 +1,37 @@
Using Docker
============

SqlSynthGen can be run in a Docker container. You can build the image locally or run it directly from Docker Hub.

Building the Docker image locally
---------------------------------

This will build a Docker image locally called ``ssg``:

.. code-block:: shell

   docker build -t ssg .

Running sqlsynthgen in Docker
-----------------------------

Let us run the image built above in a way that can access a source
database on the local machine (with DSN
``postgresql://tim:tim@localhost:5432/pagila`` and schema ``public``),
and stores the files produced in a directory called ``output``:

.. code-block:: shell

   mkdir output
   docker run --rm --user $(id -u):$(id -g) --network host -e SRC_SCHEMA=public -e SRC_DSN=postgresql://tim:tim@localhost:5432/pagila -itv "$(pwd)/output":/data ssg

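You must create the ``output`` directory before running the container. Otherwise Docker creates it owned by root, and the container, which runs as your own user, cannot write to it.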

You don't need ``--network host`` if the source database is not on the local
computer.

Running the image in this way gives you a command prompt from which
sqlsynthgen can be called. Tab completion is available: for example,
typing ``sq<TAB> ma<TAB>t<TAB>`` expands to
``sqlsynthgen make-tables``, although you might have to wait a second
or two after some of the ``<TAB>`` key presses for the completed text
to appear. Tab completion also works for command options such
as ``--force``. Press ``<TAB>`` twice to see a list of possible completions.
8 changes: 4 additions & 4 deletions docs/source/health_data.rst
@@ -16,7 +16,7 @@ Before getting into the config itself, we need to discuss a few peculiarities of
1. Some versions of OMOP contain a circular foreign key, for instance between the ``vocabulary``, ``concept``, and ``domain`` tables.
2. There are several standardized vocabulary tables (``concept``, ``concept_relationship``, etc).
These should be marked as such in the sqlsynthgen config file.
The tables will be exported to ``.yaml`` files during the ``make-generators`` step.
The tables will be exported to ``.yaml`` files during the ``create-generators`` step.
However, some of these vocabulary tables may be too large to practically be writable to ``.yaml`` files, and will need to be dealt with manually.
You should also check the license agreement of each standardized vocabulary before sharing any of the ``.yaml`` files.

@@ -106,15 +106,15 @@ The usual way is to run

.. code-block:: shell

sqlsynthgen make-generators --config-file=config.yaml
sqlsynthgen create-generators --config-file=config.yaml
sqlsynthgen create-vocab --config-file=config.yaml

``make-generators`` downloads all the vocabulary tables to your local machine as YAML files and ``create-vocab`` uploads them to the target database.
``create-generators`` downloads all the vocabulary tables to your local machine as YAML files and ``create-vocab`` uploads them to the target database.
In the CCHIC dataset we were looking at, some of the vocabulary tables were several gigabytes in size, and downloading those as YAML files was a bad idea.
We therefore set SSG to ignore those tables and copied them manually from the source schema to the destination schema, which was easier (in our case the source and the destination were just different schemas within the same database).

The ``ignore: true`` option can also be used to make SSG ignore tables that we are not interested in at all.
Note though that if one of the ignored tables is foreign key referenced by one of the tables we are `not` ignoring, the ignored table is still included in the ``orm.py`` and created by ``create-tables``, although ignored by ``make-generators`` and ``create-data``.
Note though that if one of the ignored tables is foreign key referenced by one of the tables we are `not` ignoring, the ignored table is still included in the ``orm.py`` and created by ``create-tables``, although ignored by ``create-generators`` and ``create-data``.
This is necessary to not break the network of foreign key relations.
It is also good, because it means that after we copy the big vocabulary tables over manually, all foreign key references and things like automatically generating default values for referencing columns work as usual.
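
As a minimal sketch of the setup described above, a table could be marked as ignored in ``config.yaml`` roughly as follows.
The top-level ``tables`` key and the table name ``concept_relationship`` are assumptions made for illustration; check the configuration schema for the exact layout before relying on it.

.. code-block:: yaml

   # Hypothetical fragment: concept_relationship stands in for any
   # vocabulary table that is too large to export as a YAML file.
   tables:
     concept_relationship:
       ignore: true  # skipped by create-generators and create-data

A table ignored in this way would then have to be copied into the destination schema by hand, as we did for the large CCHIC vocabulary tables.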

7 changes: 0 additions & 7 deletions docs/source/index.rst
@@ -14,13 +14,6 @@ The latter also goes through some more advanced features of SSG and how to use t

This project will be under active development from Jan - Oct 2023


.. note::

We do not currently support tables without primary keys.
If you have tables without primary keys, some sqlsynthgen functionality
may work but vocabulary tables will not.

Contents:
---------

5 changes: 4 additions & 1 deletion docs/source/installation.rst
@@ -7,10 +7,13 @@ To use SqlSynthGen, first install it:

.. code-block:: console

$ pip install sqlsynthgen
$ pipx install git+https://github.com/tim-band/sqlsynthgen

Check that you can view the help message with:

.. code-block:: console

$ sqlsynthgen --help

It can also be used directly within a Docker container by downloading the image ``timband/ssg``.
See the :ref:`quickstart guide <page-quickstart>` for more information.