Merge pull request #27 from NREL/feat/ingest-multiple-tables
Allow user to ingest multiple tables at once
daniel-thom authored Dec 3, 2024
2 parents f2f95da + a89e32d commit 9ffa946
Showing 29 changed files with 1,771 additions and 145 deletions.
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
56 changes: 56 additions & 0 deletions docs/conf.py
@@ -0,0 +1,56 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "Chronify"
copyright = "2024, NREL"
author = "NREL"
release = "0.1.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "myst_parser",
    "sphinx.ext.githubpages",
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.todo",
    "sphinx_copybutton",
    "sphinxcontrib.autodoc_pydantic",
    "sphinx_tabs.tabs",
]

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

source_suffix = {
    ".txt": "markdown",
    ".md": "markdown",
}

html_theme = "furo"
html_title = "Chronify Documentation"
html_theme_options = {
    "navigation_with_keys": True,
}
html_static_path = ["_static"]

todo_include_todos = True
autoclass_content = "both"
autodoc_member_order = "bysource"
copybutton_only_copy_prompt_lines = True
copybutton_exclude = ".linenos, .gp, .go"
copybutton_line_continuation_character = "\\"
copybutton_here_doc_delimiter = "EOT"
copybutton_prompt_text = "$"
copybutton_copy_empty_lines = False
11 changes: 11 additions & 0 deletions docs/explanation/index.md
@@ -0,0 +1,11 @@
```{eval-rst}
.. _explanation-page:
```
# Explanation

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:
```
9 changes: 9 additions & 0 deletions docs/how_tos/getting_started/index.md
@@ -0,0 +1,9 @@
# Getting Started

```{eval-rst}
.. toctree::
   :maxdepth: 2

   installation
   quick_start
```
37 changes: 37 additions & 0 deletions docs/how_tos/getting_started/installation.md
@@ -0,0 +1,37 @@

```{eval-rst}
.. _installation:
```

# Installation

1. Install Python 3.11 or later.

2. Create a Python 3.11+ virtual environment. This example uses the ``venv`` module in the standard
library to create a virtual environment in the current directory. You may prefer to keep a single
`python-envs` directory in your home directory instead. You may also prefer ``conda`` or ``mamba``.

```{eval-rst}
.. code-block:: console

   $ python -m venv env
```

3. Activate the virtual environment.

```{eval-rst}
.. code-block:: console

   $ source env/bin/activate
```

Whenever you are done using chronify, you can deactivate the environment by running ``deactivate``.

4. Install the Python package `chronify`.

```{eval-rst}
.. code-block:: console

   $ pip install chronify
```
43 changes: 43 additions & 0 deletions docs/how_tos/getting_started/quick_start.md
@@ -0,0 +1,43 @@
# Quick Start

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from chronify import DatetimeRange, Store, TableSchema

store = Store.create_file_db(file_path="time_series.db")
resolution = timedelta(hours=1)
time_range = pd.date_range("2020-01-01", "2020-12-31 23:00:00", freq=resolution)
store.ingest_tables(
    (
        pd.DataFrame({"timestamp": time_range, "value": np.random.random(8784), "id": 1}),
        pd.DataFrame({"timestamp": time_range, "value": np.random.random(8784), "id": 2}),
    ),
    TableSchema(
        name="devices",
        value_column="value",
        time_config=DatetimeRange(
            time_column="timestamp",
            start=datetime(2020, 1, 1, 0),
            length=8784,
            resolution=timedelta(hours=1),
        ),
        time_array_id_columns=["id"],
    ),
)
query = "SELECT * FROM devices WHERE id = ?"
df = store.read_query("devices", query, params=(2,))
df.head()
```

```
timestamp value id
0 2020-01-01 00:00:00 0.594748 2
1 2020-01-01 01:00:00 0.608295 2
2 2020-01-01 02:00:00 0.297535 2
3 2020-01-01 03:00:00 0.870238 2
4 2020-01-01 04:00:00 0.376144 2
```
14 changes: 14 additions & 0 deletions docs/how_tos/index.md
@@ -0,0 +1,14 @@
```{eval-rst}
.. _how-tos-page:
```
# How Tos

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   getting_started/index
   ingest_multiple_tables
   map_time_config
```
72 changes: 72 additions & 0 deletions docs/how_tos/ingest_multiple_tables.md
@@ -0,0 +1,72 @@
# How to Ingest Multiple Tables Efficiently

There are a few important considerations when ingesting many tables:
- Use one database connection.
- Avoid loading all tables into memory at once, if possible.
- Ensure additions are atomic. If anything fails, the final state should be the same as the initial
state.
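The atomicity requirement can be illustrated with the standard-library `sqlite3` module. This is a sketch of the pattern, not chronify's implementation: run every insert inside one transaction so that a mid-ingest failure leaves the database exactly as it started.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (timestamp TEXT, value REAL)")

files = [
    [("2020-01-01 00:00:00", 1.0)],
    [("2020-01-01 01:00:00", 2.0)],
    [("bad-row",)],  # wrong number of columns -> raises mid-ingest
]

try:
    # The connection context manager wraps the block in one transaction:
    # commit on success, rollback on any exception.
    with conn:
        for rows in files:
            conn.executemany("INSERT INTO devices VALUES (?, ?)", rows)
except sqlite3.ProgrammingError:
    pass

# The failed third file rolled back the first two as well.
count = conn.execute("SELECT COUNT(*) FROM devices").fetchone()[0]
print(count)  # 0
```

The chronify `Store` applies the same pattern through SQLAlchemy, as the sections below show.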

**Setup**

The input data are in CSV files. Each file contains a timestamp column and one value column per
device.

```python
from datetime import datetime, timedelta

from sqlalchemy import DateTime, Double

from chronify import ColumnDType, CsvTableSchema, DatetimeRange, Store, TableSchema

store = Store.create_in_memory_db()
resolution = timedelta(hours=1)
time_config = DatetimeRange(
    time_column="timestamp",
    start=datetime(2020, 1, 1, 0),
    length=8784,
    resolution=timedelta(hours=1),
)
src_schema = CsvTableSchema(
    time_config=time_config,
    column_dtypes=[
        ColumnDType(name="timestamp", dtype=DateTime(timezone=False)),
        ColumnDType(name="device1", dtype=Double()),
        ColumnDType(name="device2", dtype=Double()),
        ColumnDType(name="device3", dtype=Double()),
    ],
    value_columns=["device1", "device2", "device3"],
    pivoted_dimension_name="device",
)
dst_schema = TableSchema(
    name="devices",
    value_column="value",
    time_config=time_config,
    time_array_id_columns=["id"],
)
```

## Automated through chronify
Chronify manages the database connection and handles errors.
```python
store.ingest_from_csvs(
    src_schema,
    dst_schema,
    (
        "/path/to/file1.csv",
        "/path/to/file2.csv",
        "/path/to/file3.csv",
    ),
)
```

## Self-Managed
Open one connection to the database for the duration of your additions, and handle errors yourself.
```python
with store.engine.connect() as conn:
    try:
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file1.csv", connection=conn)
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file2.csv", connection=conn)
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file3.csv", connection=conn)
        conn.commit()
    except Exception:
        conn.rollback()
```
90 changes: 90 additions & 0 deletions docs/how_tos/map_time_config.md
@@ -0,0 +1,90 @@
# How to Map Time
This recipe demonstrates how to map a table's time configuration from one type to another.

**Source table**: data is stored in a representative-time format: one week of hourly data per
month, covering one year.

**Destination table**: data is stored with `datetime` timestamps for each hour of the year.

**Workflow**:
- Add the source table to the database.
- Call `Store.map_table_time_config()`.
- Chronify adds the destination table to the database.

This example creates a representative time table used in chronify's tests.
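Conceptually, the mapping joins each calendar timestamp to the representative value that shares its (month, day-of-week, hour) key. The sketch below illustrates that join with plain pandas; it is not chronify's implementation, and it assumes `day_of_week` follows pandas' Monday=0 convention.

```python
import pandas as pd

# One representative week per month, by hour: 12 * 7 * 24 = 2016 keys.
rep = pd.DataFrame(
    [(m, d, h) for m in range(1, 13) for d in range(7) for h in range(24)],
    columns=["month", "day_of_week", "hour"],
)
rep["value"] = range(len(rep))  # stand-in for the measured values

# Every hour of 2020, a leap year: 8784 timestamps.
ts = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=8784, freq="1h")})
ts["month"] = ts["timestamp"].dt.month
ts["day_of_week"] = ts["timestamp"].dt.dayofweek  # Monday=0
ts["hour"] = ts["timestamp"].dt.hour

# Each timestamp picks up the representative value sharing its key.
mapped = ts.merge(rep, on=["month", "day_of_week", "hour"], how="left")
print(len(mapped), int(mapped["value"].isna().sum()))
```

Every timestamp finds exactly one key, so the result has 8784 rows per time array and no missing values.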

1. Ingest the source data.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from chronify import (
    DatetimeRange,
    RepresentativePeriodFormat,
    RepresentativePeriodTime,
    Store,
    CsvTableSchema,
    TableSchema,
)

src_table_name = "ev_charging"
dst_table_name = "ev_charging_datetime"
hours_per_year = 12 * 7 * 24
num_time_arrays = 3
df = pd.DataFrame({
    "id": np.concatenate([np.repeat(i, hours_per_year) for i in range(1, 1 + num_time_arrays)]),
    "month": np.tile(np.repeat(range(1, 13), 7 * 24), num_time_arrays),
    "day_of_week": np.tile(np.tile(np.repeat(range(7), 24), 12), num_time_arrays),
    "hour": np.tile(np.tile(range(24), 12 * 7), num_time_arrays),
    "value": np.random.random(hours_per_year * num_time_arrays),
})
schema = TableSchema(
    name=src_table_name,
    value_column="value",
    time_config=RepresentativePeriodTime(
        time_format=RepresentativePeriodFormat.ONE_WEEK_PER_MONTH_BY_HOUR,
    ),
    time_array_id_columns=["id"],
)
store = Store.create_in_memory_db()
store.ingest_table(df, schema)
store.read_query(src_table_name, f"SELECT * FROM {src_table_name} LIMIT 5").head()
```

```
id month day_of_week hour value
0 1 1 0 0 0.578496
1 1 1 0 1 0.092271
2 1 1 0 2 0.111521
3 1 1 0 3 0.671668
4 1 1 0 4 0.782365
```

2. Map the table's time to datetime.
```python
dst_schema = TableSchema(
    name=dst_table_name,
    value_column="value",
    time_array_id_columns=["id"],
    time_config=DatetimeRange(
        time_column="timestamp",
        start=datetime(2020, 1, 1, 0),
        length=8784,
        resolution=timedelta(hours=1),
    ),
)
store.map_table_time_config(src_table_name, dst_schema)
store.read_query(dst_table_name, f"SELECT * FROM {dst_table_name} LIMIT 5").head()
```

```
id value timestamp
0 3 0.006213 2020-01-01 00:00:00
1 3 0.865765 2020-01-01 01:00:00
2 3 0.187256 2020-01-01 02:00:00
3 3 0.336157 2020-01-01 03:00:00
4 3 0.582281 2020-01-01 04:00:00
```
36 changes: 36 additions & 0 deletions docs/index.md
@@ -0,0 +1,36 @@
# Chronify

This package implements validation, mapping, and storage of time series data in support of
Python-based modeling packages.

## Features
- Stores time series data in any database supported by SQLAlchemy.
- Supports data ingestion in a variety of file formats and configurations.
- Supports efficient retrieval of time series through SQL queries.
- Validates consistency of timestamps and resolution.
- Provides mappings between different time configurations.

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :hidden:

   how_tos/index
   tutorials/index
   reference/index
   explanation/index
```

## How to use this guide
- Refer to [How Tos](#how-tos-page) for step-by-step instructions on creating a store and ingesting data.
- Refer to [Tutorials](#tutorials-page) for examples of ingesting different types of data and mapping
between time configurations.
- Refer to [Reference](#reference-page) for API reference material.
- Refer to [Explanation](#explanation-page) for descriptions and behaviors of the time series store.

## Indices and tables

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`