Commit

add data details
PhilipMay committed Jul 3, 2024
1 parent 61c42a4 commit 2797864
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions source/blog/2024/pandas-data-format-and-compression.md
@@ -14,6 +14,19 @@ Factors such as RAM usage are not considered.
For better transparency, the data and the Jupyter notebooks are stored in a
[GitHub repository](https://github.com/PhilipMay/pandas_compression).

## Test Data

Our test data has a size of 785.45 MB in RAM.
The table has 363,491 rows and 42 columns.
The content of the columns is as follows:

- a UUID (string)
- an English text (string)
- 20 columns with random integer values
- 20 columns with random float values

For details, see the notebook [01_create_dataset.ipynb](https://github.com/PhilipMay/pandas_compression/blob/main/01_create_dataset.ipynb).
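
The column layout described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the function name `make_test_frame`, the column names, the value ranges, the placeholder text and the small row count are all assumptions made for this example. The size is reported in MiB (bytes divided by 1024²), which may differ from how the figure above was measured.

```python
import uuid

import numpy as np
import pandas as pd


def make_test_frame(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Build a frame with 1 UUID, 1 text, 20 integer and 20 float columns.

    Hypothetical helper for illustration; names and ranges are assumptions.
    """
    rng = np.random.default_rng(seed)
    data = {
        "uuid": [str(uuid.uuid4()) for _ in range(n_rows)],
        # Placeholder string; the real data set contains English text.
        "text": ["some English text"] * n_rows,
    }
    for i in range(20):
        data[f"int_{i}"] = rng.integers(0, 1_000_000, size=n_rows)
    for i in range(20):
        data[f"float_{i}"] = rng.random(n_rows)
    return pd.DataFrame(data)


df = make_test_frame(1_000)

# deep=True also counts the memory used by the Python string objects.
size_mb = df.memory_usage(deep=True).sum() / 1024**2
print(df.shape, f"{size_mb:.2f} MB in RAM")
```

Passing `deep=True` matters here because the two string columns are stored as Python objects, whose memory is otherwise not included in the total.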

## Compression Methods

First, we compare the compression ratio of the different combinations of data format and compression method.
