fix: Excel implementation from polars
aalexmmaldonado committed Nov 8, 2024
1 parent 602792f commit 1aa47b0
Showing 5 changed files with 70 additions and 55 deletions.
34 changes: 8 additions & 26 deletions README.md
@@ -51,45 +51,27 @@ Clone the [repository](https://github.com/oasci/vaxstats):
git clone https://github.com/oasci/vaxstats.git
```

### Conda environment

Move into `vaxstats` directory (`cd vaxstats`) and install the development conda environment using [GNU Make](https://www.gnu.org/software/make/) (which could be installed by default on your system).

```bash
make environment
```

Now you can activate the new conda environment `vaxstats-dev` and use `vaxstats` commands.

```sh
conda activate vaxstats-dev
```

### Manual install

Alternatively, you can manually install `vaxstats` using `pip` after moving into the directory.
Install `vaxstats` using `pip` after moving into the directory.

```sh
pip install .
```

This will install all dependencies and `vaxstats` into your current Python environment.

## Deploying
## Development

We use [bump-my-version](https://github.com/callowayproject/bump-my-version) to release a new version.
This will create a git tag used by [poetry-dynamic-version](https://github.com/mtkennerly/poetry-dynamic-versioning) to generate version strings and update `CHANGELOG.md`.

For example, you would run the following command to bump the `minor` version.
We use [pixi](https://pixi.sh/latest/) to manage Python environments and simplify the developer workflow.
Once you have [pixi](https://pixi.sh/latest/) installed, move into `vaxstats` directory (e.g., `cd vaxstats`) and install the environment using the command

```bash
poetry run bump-my-version bump minor
pixi install
```

After releasing a new version, you must push and include all tags.
Now you can activate the new virtual environment using

```bash
git push --follow-tags
```sh
pixi shell
```

## License
36 changes: 34 additions & 2 deletions pixi.lock

Some generated files are not rendered by default.

4 changes: 3 additions & 1 deletion pyproject.toml
@@ -34,13 +34,15 @@ black = { cmd = ["black", "--config", "pyproject.toml", "./content"] }
format = { depends-on = ["mdlint", "isort", "black"] }

[tool.pixi.dependencies]
python = ">=3.11.0,<3.13"
python = ">=3.12.0,<3.13"
polars = ">=1.12.0,<2"
loguru = ">=0.7.2,<0.8"
statsforecast = ">=1.7.8,<2"
xlsx2csv = ">=0.8.3,<0.9"
pandas = ">=2.2.3,<3"
matplotlib = ">=3.9.2,<4"
pyyaml = ">=6.0.2,<7"
fastexcel = ">=0.12.0,<0.13"

[tool.pixi.feature.dev.dependencies]
ruff = ">=0.7.2,<0.8"
2 changes: 1 addition & 1 deletion tests/test_df_prep.py
@@ -18,7 +18,7 @@ def test_clean_df(path_example_excel):
def test_prep_df(path_example_excel):
df = load_file(path_example_excel)
df = clean_df(df)
df = prep_forecast_df(df, 0, 1, 6)
df = prep_forecast_df(df, 0, 0, 6)
assert df.columns == ["unique_id", "ds", "y"]
assert df.shape == (2_721, 3)

49 changes: 24 additions & 25 deletions vaxstats/io.py
@@ -74,6 +74,10 @@ def prep_forecast_df(
DataFrame's columns.
ValueError: If the date and time strings do not match the specified formats.
Notes:
If `date_idx` and `time_idx` are the same, we combine `input_date_fmt` and
`input_time_fmt` and load from the specified column.
Examples:
>>> import polars as pl
>>> data = {'date': ["01-01-23", "01-02-23"], 'time': ["01:00:00 PM", "02:00:00 PM"], 'y': [10, 20]}
@@ -96,35 +100,30 @@ def prep_forecast_df(
raise IndexError("One or more column indices are out of range")

# Select only the required columns using indices
df = df.select(df.columns[date_idx], df.columns[time_idx], df.columns[y_idx])

logger.debug("Combining date and time columns")
df = df.with_columns(
[pl.concat_str([df.columns[0], df.columns[1]], separator=" ").alias("ds")]
)
logger.debug(f"Example row: {df[0]}")

logger.debug(
if date_idx == time_idx:
df = df.select(df.columns[date_idx], df.columns[y_idx])
df = df.rename({df.columns[0]: "ds"})
else:
df = df.select(df.columns[date_idx], df.columns[time_idx], df.columns[y_idx])
logger.debug("Combining date and time columns")
df = df.with_columns(
[pl.concat_str([df.columns[0], df.columns[1]], separator=" ").alias("ds")]
)
logger.debug(
f"Parsing datetimes with date format '{input_date_fmt}' and time format '{input_time_fmt}'"
)
df = df.with_columns(
[
pl.col("ds")
.str.strptime(pl.Datetime, format=f"{input_date_fmt} {input_time_fmt}")
.alias("parsed_datetime")
]
)
logger.debug(f"Example row: {df[0]}")
)
df = df.with_columns(
[
pl.col("ds")
.str.strptime(pl.Datetime, format=f"{input_date_fmt} {input_time_fmt}", strict=False)
.alias("parsed_datetime")
]
)

logger.debug(f"Writing datetimes in '{output_fmt}'")
df = df.with_columns(
[pl.col("parsed_datetime").dt.strftime(output_fmt).alias("ds")]
)

df = df.drop("parsed_datetime")
logger.debug(f"Example row: {df[0]}")

# Rename the y column
df = df.rename({df.columns[2]: "y"})
df = df.rename({df.columns[1]: "y"})

logger.debug("Adding unique_id column")
df = df.with_columns(pl.lit(0).alias("unique_id"))
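
The behavior described in the new `Notes` entry (a shared `date_idx` and `time_idx` means the timestamp is read from one column using the combined format) can be illustrated with a small standard-library sketch. This is not the polars code from this commit: `parse_timestamp` and the sample rows are hypothetical, and the format strings are assumed to match the docstring example.

```python
from datetime import datetime


def parse_timestamp(row, date_idx, time_idx, date_fmt, time_fmt):
    """Parse a timestamp from one combined column or from two separate ones.

    Hypothetical helper for illustration only; it mirrors the branch on
    `date_idx == time_idx` but works on a plain list instead of a DataFrame.
    """
    if date_idx == time_idx:
        # Date and time already share a column, so read it as-is.
        raw = row[date_idx]
    else:
        # Separate columns: join them with a space before parsing.
        raw = f"{row[date_idx]} {row[time_idx]}"
    # In both cases the combined format "<date_fmt> <time_fmt>" is used.
    return datetime.strptime(raw, f"{date_fmt} {time_fmt}")


DATE_FMT = "%m-%d-%y"
TIME_FMT = "%I:%M:%S %p"

# One column holding the full timestamp (date_idx == time_idx == 0).
single = parse_timestamp(["01-01-23 01:00:00 PM", 10], 0, 0, DATE_FMT, TIME_FMT)

# Date and time in separate columns (date_idx == 0, time_idx == 1).
split = parse_timestamp(["01-01-23", "01:00:00 PM", 10], 0, 1, DATE_FMT, TIME_FMT)
```

Both calls yield the same `datetime`, which is the point of the updated test calling `prep_forecast_df(df, 0, 0, 6)`: a single Excel column that already holds the full timestamp is handled like the two-column case.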