Discussion on data structure #1

marcocaggioni · 2021-11-04T19:55:41Z

Opening this issue to discuss data structure:

Scope:
proposing a standard way to store and retrieve data

Proposing a dictionary of pandas dataframes as a good way to handle data from our primary sources:

Rheology measurements - sequence of measurements steps
Published data digitized from figures - collection of datasets identified by a key in the legend

When the order in the dictionary is meaningful, as in the rheology measurement case where the order of the steps matters, I propose prepending an index to the step name:

data_dict.keys()

dict_keys(['1_Flow_curve_down', '2_Strain_sweep_down', '3_Strain_sweep_up', '4_Freq_sweep', '5_Flow_curve_up'])

type(data_dict['1_Flow_curve_down'])

pandas.core.frame.DataFrame
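As a minimal sketch of how such a dictionary could be built and walked in step order: the step names, column names and values below are made up for illustration, and `ordered_steps` is a hypothetical helper, not something that exists in this repo.

```python
import pandas as pd

# Hypothetical measurement steps; column names and values are illustrative only.
steps = [
    ("Flow_curve_down", pd.DataFrame({"shear_rate": [100.0, 10.0, 1.0],
                                      "stress": [50.0, 12.0, 5.0]})),
    ("Strain_sweep_up", pd.DataFrame({"strain": [0.01, 0.1, 1.0],
                                      "G_prime": [200.0, 190.0, 80.0]})),
]

# Prepend a 1-based index so the step order is encoded in the key itself.
data_dict = {f"{i}_{name}": df for i, (name, df) in enumerate(steps, start=1)}


def ordered_steps(data_dict):
    """Return (key, dataframe) pairs sorted by the numeric prefix of the key."""
    return sorted(data_dict.items(), key=lambda kv: int(kv[0].split("_", 1)[0]))


for key, df in ordered_steps(data_dict):
    print(key, df.shape)
```

Sorting on the integer prefix rather than on the raw string keeps `10_...` after `2_...` once a measurement has more than nine steps.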

This structure is very generic and can easily be stored as a SQLite file:

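import sqlite3

def data_dict_to_sqllite(data_dict, sqllite_file, add_step_index=False):
    # Write each dataframe to its own table, named after the dictionary key.
    cnx = sqlite3.connect(sqllite_file)
    try:
        for index, (key, dataframe) in enumerate(data_dict.items()):
            if add_step_index:
                # Prepend the position in the dictionary to preserve the step order.
                key = str(index) + '_' + key
            dataframe.to_sql(name=key, con=cnx)
    finally:
        cnx.close()

data_dict_to_sqllite(data_dict, 'data_dict.db', add_step_index=True)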

Once you have the SQLite file, you can explore it with, for example:

https://sqlitebrowser.org/ a desktop application for Windows and Mac that lets you open and explore the SQLite file
https://github.com/pbugnion/jupyterlab-sql directly in jupyterlab

The data can be read back into a dictionary of tables:

import sqlite3
import pandas as pd

def read_sqllite_to_data_dict(sqllitefile):
    cnx = None  # defined up front so the finally block is safe if connect() fails
    try:
        cnx = sqlite3.connect(sqllitefile)

        data_dict = {}
        # sqlite_master lists every table stored in the file
        tables = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table'", cnx)
        for table_name in tables['name']:
            data_dict[table_name] = pd.read_sql_query(f'SELECT * FROM "{table_name}";', cnx)

        return data_dict

    except sqlite3.Error as error:
        print("Failed to execute the above query", error)

    finally:
        if cnx:
            cnx.close()
            print("the sqlite connection is closed")

data_dict = read_sqllite_to_data_dict('data_dict.db')
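A quick way to check the round trip, as a sketch only: write a small dictionary to a temporary SQLite file, read it back, and compare. This variant passes `index=False` to `to_sql` so the dataframe index is not stored as an extra `index` column; whether that is the desired behaviour here is an assumption. The names `write_tables` and `read_tables` and the sample data are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd


def write_tables(data_dict, path):
    """Write each dataframe to its own table, named after the dictionary key."""
    with closing(sqlite3.connect(path)) as cnx:
        for key, df in data_dict.items():
            df.to_sql(name=key, con=cnx, index=False, if_exists="replace")


def read_tables(path):
    """Read every table of the SQLite file back into a dictionary of dataframes."""
    with closing(sqlite3.connect(path)) as cnx:
        names = pd.read_sql_query(
            "SELECT name FROM sqlite_master WHERE type='table'", cnx
        )["name"]
        return {n: pd.read_sql_query(f'SELECT * FROM "{n}"', cnx) for n in names}


# Illustrative data only.
original = {
    "1_Flow_curve_down": pd.DataFrame({"shear_rate": [10.0, 1.0], "stress": [12.0, 5.0]}),
    "2_Freq_sweep": pd.DataFrame({"frequency": [1.0, 10.0], "G_prime": [100.0, 150.0]}),
}

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "roundtrip.db")
    write_tables(original, db_path)
    restored = read_tables(db_path)

# Raises an AssertionError if any table changed during the round trip.
for key, df in original.items():
    pd.testing.assert_frame_equal(df, restored[key])
```

Without `index=False`, the tables read back would carry an additional `index` column that the original dataframes did not have.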

ManonMarchand · 2021-11-08T16:30:12Z

We could also have a look at this repo https://github.com/JuliaRheology/RHEOS.jl

It'd be nice if we make the flow from our database to their fitting library easy.

ManonMarchand · 2021-12-08T17:23:38Z

Other cool software to check: https://reptate.readthedocs.io/

ManonMarchand · 2021-12-15T15:27:14Z

Example of how data shared with everyone can be organized on GitHub: https://github.com/fivethirtyeight/data. They stick to the CSV side of the force.
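For comparison, a CSV-only layout for the same dictionary of dataframes might look like the sketch below: one CSV file per dictionary key inside a folder. This is an illustration, not how the fivethirtyeight repo is actually organized; `data_dict_to_csv_dir` and `csv_dir_to_data_dict` are hypothetical names and the data is made up.

```python
import tempfile
from pathlib import Path

import pandas as pd


def data_dict_to_csv_dir(data_dict, directory):
    """Write one CSV file per dictionary key into the given directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for key, df in data_dict.items():
        df.to_csv(directory / f"{key}.csv", index=False)


def csv_dir_to_data_dict(directory):
    """Read every CSV file in the directory; the file stem becomes the key."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(directory).glob("*.csv"))}


# Illustrative data only.
example = {
    "1_Flow_curve_down": pd.DataFrame({"shear_rate": [10.0, 1.0], "stress": [12.0, 5.0]}),
}

with tempfile.TemporaryDirectory() as tmp:
    data_dict_to_csv_dir(example, tmp)
    restored = csv_dir_to_data_dict(tmp)

print(restored["1_Flow_curve_down"])
```

The trade-off is that CSV files diff nicely in git and open anywhere, while a single SQLite file keeps all steps of one measurement together.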

ManonMarchand · 2022-01-31T11:03:13Z

Nomenclature of the Society of Rheology

https://sor.scitation.org/doi/pdf/10.1122/1.4811184

ManonMarchand · 2022-09-29T14:53:10Z

Unique identifiers for samples, worth investigating: https://github.com/IGSN

ManonMarchand · 2023-03-30T12:57:22Z

Mixing SQL and JSON files: https://gitlab.obspm.fr/exoplanet/py-linq-sql/-/tree/main

--> SQLite has a JSON extension
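A small sketch of what that could look like: metadata stored as JSON text in a SQLite table next to the measurement tables, then queried with `json_extract`. This assumes the SQLite build behind Python's `sqlite3` module includes the JSON functions (they are built in by default in recent SQLite versions; older builds may lack them). The `metadata` table and its fields are hypothetical.

```python
import json
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as cnx:
    # One row per measurement table, with free-form metadata stored as JSON text.
    cnx.execute("CREATE TABLE metadata (table_name TEXT PRIMARY KEY, info TEXT)")
    cnx.execute(
        "INSERT INTO metadata VALUES (?, ?)",
        ("1_Flow_curve_down",
         json.dumps({"temperature_C": 25, "geometry": "cone-plate"})),
    )
    # json_extract pulls individual fields out of the JSON text inside SQL.
    row = cnx.execute(
        "SELECT table_name, json_extract(info, '$.geometry') FROM metadata "
        "WHERE json_extract(info, '$.temperature_C') = 25"
    ).fetchone()

print(row)
```

This keeps the tabular data in regular tables while letting each sample carry metadata whose fields do not have to be fixed in advance.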
