Replies: 4 comments
- A relatively easy way of doing this now would be to load a data package into SQLite and publish it to a free hosting service using …
- Another approach would be to put the data into a SQLite db in a GitHub repo, then use a GitHub action to publish it along with …
- @psychemedia super cool suggestions Tony. I think this could be easy to do - have you done this with data packages already?
- Not with datapackages specifically.
Overview
Proposal for runnable data packages.
From @psychemedia
I was wondering if anyone has looked at toolchains that support the creation of ad hoc working environments around a particular dataset.
For example, I have a recipe for creating an analysis environment that links and launches RStudio connected to MySQL seeded with a particular dataset, using Docker Compose and a Dockerfile that creates the seeded database:
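A minimal sketch of such a Dockerfile, assuming the official `mysql` image and the ergast dump used below (not necessarily the original file):

```dockerfile
# Sketch: build a MySQL image seeded with the ergast F1 dataset.
# The official mysql image executes any *.sql or *.sql.gz files found in
# /docker-entrypoint-initdb.d/ the first time the container starts,
# which handles the seeding for us.
FROM mysql:5.7

ENV MYSQL_ROOT_PASSWORD=f1
ENV MYSQL_DATABASE=ergastdb

# Fetch the SQL dump; the entrypoint gunzips and loads it on first run.
ADD http://ergast.com/downloads/f1db.sql.gz /docker-entrypoint-initdb.d/
```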
The seeded database is linked to RStudio:
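A plausible `docker-compose.yml` along those lines (service names and the RStudio login password are assumptions):

```yaml
# Sketch: docker-compose.yml linking RStudio to the seeded database.
version: "2"
services:
  ergastdb:
    build: .               # the seeded-MySQL Dockerfile sketched above
  rstudio:
    image: rocker/tidyverse
    ports:
      - "8787:8787"        # RStudio server in the browser
    environment:
      PASSWORD: rstudio    # login password for the rstudio user (assumed)
    links:
      - ergastdb           # reachable from R as host "ergastdb"
```

`docker-compose up` then serves RStudio at localhost:8787, from which the database is reachable as host `ergastdb`, e.g. `DBI::dbConnect(RMySQL::MySQL(), user = "root", password = "f1", host = "ergastdb", dbname = "ergastdb")`.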
So in the extreme case, I imagine running a command something like the following as an equivalent to the above:
reallyfrictionless -app rocker/tidyverse -app jupyter/datascience-notebook -dbms mysql -dbalias ergastdb -db ergastdb -dbpwd f1 -datasrc http://ergast.com/downloads/f1db.sql.gz -datatyp sql
and consequently being presented with an environment (Jupyter notebooks and RStudio) opened in my browser that lets me link to the populated DB and access the data from it directly.
(Actually it'd have to be a tad more complicated to support e.g. mounted directories/volumes? Hmm... Maybe easier if we just allow a single `-app`, then `-guestdir` and `-hostdir` switches, or a single `-appvolume` switch?)

- `-app`: name of application container(s) to install
- `-dbms`: DBMS container
- `-dbalias`: alias by which the DBMS is accessed from app containers
- `-dbpwd`: DBMS root password
- `-db`: database created and used inside the DBMS
- `-datasrc`: data file
- `-datatyp`: data file type

For a datapackage, use `-datatyp datapackage` and the URL to the data package, and configure the database from that?

(To simplify things on the DBMS side, it may be sensible to define images `reallyfrictionless/postgres`, `reallyfrictionless/mysql` etc. that have standardised config variables?)

In the interim, how about some simple Dockerfile recipes that allow a datapackage to be identified and a runnable DBMS docker image created from it?
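As a starting point, something along these lines might work (a sketch only, with an assumed helper script; not an established frictionlessdata recipe):

```dockerfile
# Sketch: bake a datapackage into a runnable Postgres image.
# The official postgres image runs *.sh scripts in /docker-entrypoint-initdb.d/
# the first time the container starts, after the database has been created.
FROM postgres:11

ENV POSTGRES_DB=datapackage
ENV POSTGRES_PASSWORD=postgres   # demo credential (assumption)

# Tooling for reading datapackages into SQL (Python-based frictionlessdata libs).
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3-pip \
    && pip3 install datapackage tableschema-sql sqlalchemy psycopg2-binary

# Assumes datapackage.json and its data/ directory sit next to this Dockerfile;
# load-datapackage.sh is a hypothetical helper that invokes a loader like the
# Python sketch at the end of this thread.
COPY . /datapackage/
COPY load-datapackage.sh /docker-entrypoint-initdb.d/
```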
From @pwalsh
Hi @psychemedia
have you looked at https://github.com/frictionlessdata/datapackage-pipelines ?
If I understand your use case correctly, DPP could be used to do exactly what you want, with the following additions on top of the base library:
We'd love to see such a use case and even contributions of processors to do these things!
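For concreteness, a pipeline along these lines could pull a datapackage and dump it into SQL (a sketch using DPP's documented `load_resource` and `dump.to_sql` processors; the package URL, resource name, and connection string are placeholder assumptions):

```yaml
# pipeline-spec.yaml -- sketch: pull one resource from a remote datapackage and
# dump it into a SQL database with datapackage-pipelines' standard processors.
ergast-to-sql:
  pipeline:
    - run: load_resource
      parameters:
        url: http://example.com/ergast/datapackage.json  # hypothetical package
        resource: races
    - run: dump.to_sql
      parameters:
        engine: postgresql://postgres:postgres@localhost:5432/ergastdb
        tables:
          races:
            resource-name: races
```

Running `dpp run ./ergast-to-sql` against a database container would then cover the "populate a DBMS from a datapackage" step.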
From @psychemedia
@pwalsh Not played with that, no... will take a look...
@pwalsh Actually, the datapackage isn't opened into the Jupyter or RStudio environment directly. It's loaded into the database container, which is then linked to, and accessible from, the application environment(s).
That's why I posted to this repo...;-)
So the question more succinctly is: is there a simple, automated recipe for populating a MySQL or Postgres container with a datapackage so that someone who doesn't know how to set up a database can create a running containerised instance of a database containing the datapackage contents from a simple command?
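One possible answer that doesn't need DPP at all (a sketch, assuming the `datapackage` Python library with its SQL storage plugin, `tableschema-sql`; the credentials and container command are illustrative):

```python
# load_datapackage.py -- sketch: push a datapackage's tables into a DB container.
# Setup (assumed): pip install datapackage tableschema-sql sqlalchemy pymysql
# and a database started with e.g.:
#   docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=f1 \
#       -e MYSQL_DATABASE=ergastdb mysql:5.7
import sys

from datapackage import Package
from sqlalchemy import create_engine

DB_URL = "mysql+pymysql://root:f1@localhost:3306/ergastdb"  # illustrative creds


def main(descriptor: str) -> None:
    engine = create_engine(DB_URL)
    package = Package(descriptor)
    # save() with the SQL storage backend writes one table per tabular
    # resource, deriving column types from the package's Table Schema.
    package.save(storage="sql", engine=engine)


if __name__ == "__main__":
    main(sys.argv[1])  # path or URL of datapackage.json
```

That is roughly the one-command recipe being asked for: start a stock DBMS container, point the script at a datapackage.json, done.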