With opengrid-data I want to create a user-friendly solution for getting (large sets of) building monitoring data. This repo should never be a dependency of opengrid, but it will probably be a dependency of opengrid-demo. I also propose to keep the limited set of demo data in opengrid for elementary testing.
We should discuss the best way to serve these large datasets. Here's a first set of desired features:
we have a frozen and an evolving dataset; the first never changes (useful for tests, demos, etc.), the second can be extended as more data becomes available over time
it should be easy to get an overview of available data
missing/updated data is fetched automatically
the data is cached on the local hard drive
I currently see two solutions; there may be others:
we simply put all data in a Python package and host it on PyPI. Similar to the module datasets.py in opengrid, we add some code to list and load dataframes (see the sketch after this list).
we host the data files on a public file hosting service and write code to sync these files to the local hard drive and check for updates.
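Whichever option we pick, the user-facing API could look the same. Below is a minimal sketch of what I have in mind, assuming a small JSON index published next to the data files; the names (INDEX_URL, CACHE_DIR, list_available, get) are hypothetical, nothing like this exists yet:

```python
import json
import os
import urllib.request

import pandas as pd

# Hypothetical locations; where the index and files actually live is
# exactly what we need to decide in the dev meeting.
INDEX_URL = "https://example.org/opengrid-data/index.json"
CACHE_DIR = os.path.expanduser("~/.opengrid-data")


def _fetch_index():
    """Download the remote index mapping dataset names to file URLs."""
    with urllib.request.urlopen(INDEX_URL) as response:
        return json.loads(response.read().decode("utf-8"))


def list_available():
    """Give an overview of all published datasets."""
    return sorted(_fetch_index())


def get(name):
    """Load a dataset as a DataFrame, downloading and caching it on first use."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_path = os.path.join(CACHE_DIR, name + ".csv")
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(_fetch_index()[name], local_path)
    return pd.read_csv(local_path, index_col=0, parse_dates=True)
```

Adding a checksum or timestamp per file to the index would cover the "missing/updated data is fetched automatically" requirement: compare it against the cached copy and re-download on mismatch.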
There's also git-lfs (https://git-lfs.github.com/), but that would require git and the additional git-lfs package to be installed, which is not currently required for opengrid users who just want to do a pip install.
Feel free to add requirements and solutions, and we'll put this topic on the agenda for our next dev meeting.
I checked the GitHub storage limits: https://help.github.com/articles/what-is-my-disk-quota/
I don't think we will hit files exceeding 100MB, or 1GB in total anytime soon (or will we?), so maybe we're good with storing them on GitHub for now.
We should think about an interface that lets us move the data somewhere else in the future without it making any difference for the user.
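One way to get that decoupling is a thin backend abstraction that only exposes "list" and "load". This is just a sketch; the class names and the raw-file URL layout below are made up for illustration, not a decided design:

```python
from abc import ABC, abstractmethod

import pandas as pd


class DataSource(ABC):
    """Hypothetical storage backend: users only ever call these two methods."""

    @abstractmethod
    def list_datasets(self):
        """Return the names of all available datasets."""

    @abstractmethod
    def load(self, name):
        """Return one dataset as a pandas DataFrame."""


class GitHubSource(DataSource):
    """Serves files straight from a GitHub repo via raw URLs (illustrative)."""

    BASE_URL = "https://raw.githubusercontent.com/opengridcc/opengrid-data/master/"

    def list_datasets(self):
        # Could read a small index file committed alongside the data.
        raise NotImplementedError

    def load(self, name):
        # pandas reads directly from URLs, so no extra download code is needed.
        return pd.read_csv(self.BASE_URL + name + ".csv",
                           index_col=0, parse_dates=True)
```

If we later outgrow GitHub, we add an S3Source or similar behind the same interface and the calling code never changes.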
About file formats: CSV is too slow and loses some metadata, pickle files are not portable across Python versions, … so maybe we should look at HDF5? I have no experience with it myself but I'm hearing good things...
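For what it's worth, pandas has built-in HDF5 support via PyTables, and it preserves dtypes and the datetime index, which CSV does not. A minimal round-trip sketch (the file and key names are arbitrary):

```python
import numpy as np
import pandas as pd

# A small monitoring-style series: 15-minute readings over one day.
index = pd.date_range("2016-01-01", periods=96, freq="15min")
df = pd.DataFrame({"consumption_kWh": np.random.rand(96)}, index=index)

# Round-trip through HDF5 (needs the `tables` package installed).
df.to_hdf("demo.h5", key="consumption", mode="w")
restored = pd.read_hdf("demo.h5", key="consumption")
assert restored.equals(df)  # dtypes and the DatetimeIndex survive
```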