Merge pull request #27 from NREL/feat/ingest-multiple-tables
Allow user to ingest multiple tables at once
daniel-thom authored Dec 3, 2024
2 parents f2f95da + a89e32d commit 9ffa946
Showing 29 changed files with 1,771 additions and 145 deletions.
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
56 changes: 56 additions & 0 deletions docs/conf.py
@@ -0,0 +1,56 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "Chronify"
copyright = "2024, NREL"
author = "NREL"
release = "0.1.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "myst_parser",
    "sphinx.ext.githubpages",
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.todo",
    "sphinx_copybutton",
    "sphinxcontrib.autodoc_pydantic",
    "sphinx_tabs.tabs",
]

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

source_suffix = {
    ".txt": "markdown",
    ".md": "markdown",
}

html_theme = "furo"
html_title = "Chronify Documentation"
html_theme_options = {
    "navigation_with_keys": True,
}
html_static_path = ["_static"]

todo_include_todos = True
autoclass_content = "both"
autodoc_member_order = "bysource"
copybutton_only_copy_prompt_lines = True
copybutton_exclude = ".linenos, .gp, .go"
copybutton_line_continuation_character = "\\"
copybutton_here_doc_delimiter = "EOT"
copybutton_prompt_text = "$"
copybutton_copy_empty_lines = False
11 changes: 11 additions & 0 deletions docs/explanation/index.md
@@ -0,0 +1,11 @@
```{eval-rst}
.. _explanation-page:
```
# Explanation

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:
```
9 changes: 9 additions & 0 deletions docs/how_tos/getting_started/index.md
@@ -0,0 +1,9 @@
# Getting Started

```{eval-rst}
.. toctree::
   :maxdepth: 2

   installation
   quick_start
```
37 changes: 37 additions & 0 deletions docs/how_tos/getting_started/installation.md
@@ -0,0 +1,37 @@

```{eval-rst}
.. _installation:
```

# Installation

1. Install Python 3.11 or later.

2. Create a Python 3.11+ virtual environment. This example uses the ``venv`` module in the standard
library to create a virtual environment in the current directory. You may prefer to keep a single
`python-envs` directory in your home directory instead. You may also prefer ``conda`` or ``mamba``.

```{eval-rst}
.. code-block:: console

   $ python -m venv env
```

3. Activate the virtual environment.

```{eval-rst}
.. code-block:: console

   $ source env/bin/activate
```

Whenever you are done using chronify, you can deactivate the environment by running ``deactivate``.

4. Install the Python package `chronify`.

```{eval-rst}
.. code-block:: console

   $ pip install chronify
```
43 changes: 43 additions & 0 deletions docs/how_tos/getting_started/quick_start.md
@@ -0,0 +1,43 @@
# Quick Start

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from chronify import DatetimeRange, Store, TableSchema

store = Store.create_file_db(file_path="time_series.db")
resolution = timedelta(hours=1)
time_range = pd.date_range("2020-01-01", "2020-12-31 23:00:00", freq=resolution)
store.ingest_tables(
    (
        pd.DataFrame({"timestamp": time_range, "value": np.random.random(8784), "id": 1}),
        pd.DataFrame({"timestamp": time_range, "value": np.random.random(8784), "id": 2}),
    ),
    TableSchema(
        name="devices",
        value_column="value",
        time_config=DatetimeRange(
            time_column="timestamp",
            start=datetime(2020, 1, 1, 0),
            length=8784,
            resolution=timedelta(hours=1),
        ),
        time_array_id_columns=["id"],
    ),
)
query = "SELECT * FROM devices WHERE id = ?"
df = store.read_query("devices", query, params=(2,))
df.head()
```

```
timestamp value id
0 2020-01-01 00:00:00 0.594748 2
1 2020-01-01 01:00:00 0.608295 2
2 2020-01-01 02:00:00 0.297535 2
3 2020-01-01 03:00:00 0.870238 2
4 2020-01-01 04:00:00 0.376144 2
```
14 changes: 14 additions & 0 deletions docs/how_tos/index.md
@@ -0,0 +1,14 @@
```{eval-rst}
.. _how-tos-page:
```
# How Tos

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   getting_started/index
   ingest_multiple_tables
   map_time_config
```
72 changes: 72 additions & 0 deletions docs/how_tos/ingest_multiple_tables.md
@@ -0,0 +1,72 @@
# How to Ingest Multiple Tables Efficiently

There are a few important considerations when ingesting many tables:
- Use one database connection.
- Avoid loading all tables into memory at once, if possible.
- Ensure additions are atomic. If anything fails, the final state should be the same as the initial
state.
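The atomicity requirement can be illustrated with the standard-library `sqlite3` module. This is a sketch of the pattern, not chronify's implementation: run every insert inside one transaction so that a mid-ingest failure leaves the database exactly as it started.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (timestamp TEXT, value REAL)")

files = [
    [("2020-01-01 00:00:00", 1.0)],
    [("2020-01-01 01:00:00", 2.0)],
    [("bad-row",)],  # wrong number of columns -> raises mid-ingest
]

try:
    # The connection context manager wraps the block in one transaction:
    # commit on success, rollback on any exception.
    with conn:
        for rows in files:
            conn.executemany("INSERT INTO devices VALUES (?, ?)", rows)
except sqlite3.ProgrammingError:
    pass

# The failed third file rolled back the first two as well.
count = conn.execute("SELECT COUNT(*) FROM devices").fetchone()[0]
print(count)  # 0
```

The chronify `Store` applies the same pattern through SQLAlchemy, as the sections below show.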

**Setup**

The input data are in CSV files. Each file contains a timestamp column and one value column per
device.

```python
from datetime import datetime, timedelta

from sqlalchemy import DateTime, Double

from chronify import ColumnDType, CsvTableSchema, DatetimeRange, Store, TableSchema

store = Store.create_in_memory_db()
resolution = timedelta(hours=1)
time_config = DatetimeRange(
    time_column="timestamp",
    start=datetime(2020, 1, 1, 0),
    length=8784,
    resolution=timedelta(hours=1),
)
src_schema = CsvTableSchema(
    time_config=time_config,
    column_dtypes=[
        ColumnDType(name="timestamp", dtype=DateTime(timezone=False)),
        ColumnDType(name="device1", dtype=Double()),
        ColumnDType(name="device2", dtype=Double()),
        ColumnDType(name="device3", dtype=Double()),
    ],
    value_columns=["device1", "device2", "device3"],
    pivoted_dimension_name="device",
)
dst_schema = TableSchema(
    name="devices",
    value_column="value",
    time_config=time_config,
    time_array_id_columns=["id"],
)
```

## Automated through chronify
Chronify manages the database connection and handles errors.
```python
store.ingest_from_csvs(
    src_schema,
    dst_schema,
    (
        "/path/to/file1.csv",
        "/path/to/file2.csv",
        "/path/to/file3.csv",
    ),
)
```

## Self-Managed
Open one connection to the database for the duration of your additions, and handle errors yourself.
```python
with store.engine.connect() as conn:
    try:
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file1.csv", connection=conn)
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file2.csv", connection=conn)
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file3.csv", connection=conn)
        conn.commit()
    except Exception:
        conn.rollback()
```
90 changes: 90 additions & 0 deletions docs/how_tos/map_time_config.md
@@ -0,0 +1,90 @@
# How to Map Time
This recipe demonstrates how to map a table's time configuration from one type to another.

**Source table**: data is stored in a representative-time format: one week of hourly data per
month, covering one year.

**Destination table**: data is stored with `datetime` timestamps for each hour of the year.

**Workflow**:
- Add the source table to the database.
- Call `Store.map_table_time_config()`.
- Chronify adds the destination table to the database.

This example creates a representative time table used in chronify's tests.
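Conceptually, the mapping joins each calendar timestamp to the representative value that shares its (month, day-of-week, hour) key. The sketch below illustrates that join with plain pandas; it is not chronify's implementation, and it assumes `day_of_week` follows pandas' Monday=0 convention.

```python
import pandas as pd

# One representative week per month, by hour: 12 * 7 * 24 = 2016 keys.
rep = pd.DataFrame(
    [(m, d, h) for m in range(1, 13) for d in range(7) for h in range(24)],
    columns=["month", "day_of_week", "hour"],
)
rep["value"] = range(len(rep))  # stand-in for the measured values

# Every hour of 2020, a leap year: 8784 timestamps.
ts = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=8784, freq="1h")})
ts["month"] = ts["timestamp"].dt.month
ts["day_of_week"] = ts["timestamp"].dt.dayofweek  # Monday=0
ts["hour"] = ts["timestamp"].dt.hour

# Each timestamp picks up the representative value sharing its key.
mapped = ts.merge(rep, on=["month", "day_of_week", "hour"], how="left")
print(len(mapped), int(mapped["value"].isna().sum()))
```

Every timestamp finds exactly one key, so the result has 8784 rows per time array and no missing values.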

1. Ingest the source data.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from chronify import (
    DatetimeRange,
    RepresentativePeriodFormat,
    RepresentativePeriodTime,
    Store,
    CsvTableSchema,
    TableSchema,
)

src_table_name = "ev_charging"
dst_table_name = "ev_charging_datetime"
hours_per_year = 12 * 7 * 24
num_time_arrays = 3
df = pd.DataFrame({
    "id": np.concatenate([np.repeat(i, hours_per_year) for i in range(1, 1 + num_time_arrays)]),
    "month": np.tile(np.repeat(range(1, 13), 7 * 24), num_time_arrays),
    "day_of_week": np.tile(np.tile(np.repeat(range(7), 24), 12), num_time_arrays),
    "hour": np.tile(np.tile(range(24), 12 * 7), num_time_arrays),
    "value": np.random.random(hours_per_year * num_time_arrays),
})
schema = TableSchema(
    name=src_table_name,
    value_column="value",
    time_config=RepresentativePeriodTime(
        time_format=RepresentativePeriodFormat.ONE_WEEK_PER_MONTH_BY_HOUR,
    ),
    time_array_id_columns=["id"],
)
store = Store.create_in_memory_db()
store.ingest_table(df, schema)
store.read_query(src_table_name, f"SELECT * FROM {src_table_name} LIMIT 5").head()
```

```
id month day_of_week hour value
0 1 1 0 0 0.578496
1 1 1 0 1 0.092271
2 1 1 0 2 0.111521
3 1 1 0 3 0.671668
4 1 1 0 4 0.782365
```

2. Map the table's time to datetime.
```python
dst_schema = TableSchema(
    name=dst_table_name,
    value_column="value",
    time_array_id_columns=["id"],
    time_config=DatetimeRange(
        time_column="timestamp",
        start=datetime(2020, 1, 1, 0),
        length=8784,
        resolution=timedelta(hours=1),
    ),
)
store.map_table_time_config(src_table_name, dst_schema)
store.read_query(dst_table_name, f"SELECT * FROM {dst_table_name} LIMIT 5").head()
```

```
id value timestamp
0 3 0.006213 2020-01-01 00:00:00
1 3 0.865765 2020-01-01 01:00:00
2 3 0.187256 2020-01-01 02:00:00
3 3 0.336157 2020-01-01 03:00:00
4 3 0.582281 2020-01-01 04:00:00
```
36 changes: 36 additions & 0 deletions docs/index.md
@@ -0,0 +1,36 @@
# Chronify

This package implements validation, mapping, and storage of time series data in support of
Python-based modeling packages.

## Features
- Stores time series data in any database supported by SQLAlchemy.
- Supports data ingestion in a variety of file formats and configurations.
- Supports efficient retrieval of time series through SQL queries.
- Validates consistency of timestamps and resolution.
- Provides mappings between different time configurations.

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :hidden:

   how_tos/index
   tutorials/index
   reference/index
   explanation/index
```

## How to use this guide
- Refer to [How Tos](#how-tos-page) for step-by-step instructions on creating a store and ingesting data.
- Refer to [Tutorials](#tutorials-page) for examples of ingesting different types of data and mapping
between time configurations.
- Refer to [Reference](#reference-page) for API reference material.
- Refer to [Explanation](#explanation-page) for descriptions and behaviors of the time series store.

## Indices and tables

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`