Discussion on data structure #1

Open
marcocaggioni opened this issue Nov 4, 2021 · 6 comments
Comments

@marcocaggioni
Contributor

Opening this issue to discuss data structure:

Scope:
proposing a standard way to store and retrieve data

Proposing a dictionary of pandas dataframes as a good way to handle data from our primary sources:

  • Rheology measurements - sequence of measurement steps
  • Published data digitized from figures - collection of datasets identified by a key in the legend

When the order in the dictionary is meaningful, as in the rheology measurement case where the order of the steps matters, I propose prepending an index to the step name:

data_dict.keys()

dict_keys(['1_Flow_curve_down', '2_Strain_sweep_down', '3_Strain_sweep_up', '4_Freq_sweep', '5_Flow_curve_up'])

type(data_dict['1_Flow_curve_down'])

pandas.core.frame.DataFrame
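
As a minimal sketch of this structure, the snippet below builds a small dictionary of DataFrames with index-prefixed keys and recovers the step order from the prefix. The column names and values are made up for illustration, and `step_order` is a hypothetical helper, not something that exists in the project.

```python
import pandas as pd

# Illustrative data only: column names and values are assumptions.
data_dict = {
    "2_Strain_sweep_down": pd.DataFrame({"strain": [1.0, 0.1], "G_prime": [10.0, 12.0]}),
    "1_Flow_curve_down": pd.DataFrame({"shear_rate": [100.0, 10.0], "stress": [50.0, 20.0]}),
}


def step_order(key):
    """Return the integer index prepended to a step name (hypothetical helper)."""
    return int(key.split("_", 1)[0])


# Sort on the numeric prefix rather than on the raw string.
ordered_keys = sorted(data_dict, key=step_order)
print(ordered_keys)
```

Sorting on the parsed integer rather than the raw key is probably the safer choice once a measurement has ten or more steps, since a plain string sort would place "10_..." before "2_...".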

This structure is very generic and can easily be stored as a SQLite file:

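import sqlite3

def data_dict_to_sqllite(data_dict, sqllite_file, add_step_index=False):
    """Write each DataFrame in data_dict to its own table in a SQLite file."""
    cnx = sqlite3.connect(sqllite_file)
    try:
        for index, (key, dataframe) in enumerate(data_dict.items()):
            if add_step_index:
                # Prepend the position in the dictionary to preserve step order
                key = str(index) + '_' + key

            dataframe.to_sql(name=key, con=cnx)
    finally:
        # Always release the file, even if a write fails
        cnx.close()

data_dict_to_sqllite(data_dict, 'data_dict.db', add_step_index=True)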

Once you have the SQLite file, you can explore it with, for example:

The data can be read back into a dictionary of tables:

import sqlite3
import pandas as pd

def read_sqllite_to_data_dict(sqllitefile):
    """Read every table of a SQLite file into a dictionary of DataFrames."""
    cnx = None  # defined up front so the finally block is safe if connect fails
    try:
        cnx = sqlite3.connect(sqllitefile)

        data_dict = {}
        # sqlite_master lists every table stored in the file
        tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", cnx)
        for table_name in tables['name']:
            data_dict[table_name] = pd.read_sql_query(f'SELECT * FROM "{table_name}";', cnx)

        return data_dict

    except sqlite3.Error as error:
        print("Failed to execute the above query", error)

    finally:
        if cnx:
            cnx.close()
            print("the sqlite connection is closed")

data_dict = read_sqllite_to_data_dict('data_dict.db')
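
One detail of the round trip may be worth checking: by default `DataFrame.to_sql` also writes the DataFrame index as a column, so a table read back from SQLite gains an extra `index` column. The sketch below illustrates this with an in-memory database and made-up data; the table names are only examples, and passing `index=False` is one possible way to get an identical frame back, not necessarily the convention the project should adopt.

```python
import sqlite3

import pandas as pd

# Illustrative data only: column names and values are assumptions.
original = pd.DataFrame({"shear_rate": [100.0, 10.0, 1.0], "stress": [50.0, 20.0, 8.0]})

cnx = sqlite3.connect(":memory:")
try:
    # Default behaviour: the index is stored as an extra column named "index".
    original.to_sql(name="1_Flow_curve_down", con=cnx)
    # With index=False only the data columns are stored.
    original.to_sql(name="1_Flow_curve_down_noindex", con=cnx, index=False)

    with_index = pd.read_sql_query('SELECT * FROM "1_Flow_curve_down";', cnx)
    without_index = pd.read_sql_query('SELECT * FROM "1_Flow_curve_down_noindex";', cnx)
finally:
    cnx.close()

print(list(with_index.columns))
print(list(without_index.columns))

# Raises AssertionError if the round trip changed the data.
pd.testing.assert_frame_equal(original, without_index)
```

If the index carries meaning (for example a time stamp), keeping it and reading it back with the `index_col` argument of `pd.read_sql_query` would be an alternative.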
@ManonMarchand
Owner

We could also have a look at this repo https://github.com/JuliaRheology/RHEOS.jl

It'd be nice if we make the flow from our database to their fitting library easy.

@ManonMarchand
Owner

Other cool software to check: https://reptate.readthedocs.io/

@ManonMarchand
Owner

An example of how to organize data shared publicly on GitHub: https://github.com/fivethirtyeight/data. They stick to the CSV side of the force.

@ManonMarchand
Owner

Nomenclature of the Society of Rheology

https://sor.scitation.org/doi/pdf/10.1122/1.4811184

@ManonMarchand
Owner

Unique identifiers for samples, worth investigating: https://github.com/IGSN

@ManonMarchand
Owner

ManonMarchand commented Mar 30, 2023

Mixing SQL and JSON files: https://gitlab.obspm.fr/exoplanet/py-linq-sql/-/tree/main

--> SQLite has a JSON extension
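
A rough sketch of how the JSON functions could be used next to the data tables: per-step metadata is stored as a JSON string in a text column and queried with `json_extract`. The `step_metadata` table, its columns, and the metadata fields are all hypothetical, and this assumes the SQLite library bundled with your Python was built with the JSON functions enabled, which is usually but not always the case.

```python
import json
import sqlite3

cnx = sqlite3.connect(":memory:")
try:
    # Hypothetical table: one row of JSON metadata per measurement step.
    cnx.execute("CREATE TABLE step_metadata (step TEXT PRIMARY KEY, meta TEXT)")
    cnx.execute(
        "INSERT INTO step_metadata VALUES (?, ?)",
        ("1_Flow_curve_down", json.dumps({"temperature_C": 25, "geometry": "cone-plate"})),
    )
    # json_extract pulls a single field out of the stored JSON document.
    row = cnx.execute(
        "SELECT step, json_extract(meta, '$.temperature_C') FROM step_metadata"
    ).fetchone()
finally:
    cnx.close()

print(row)
```

This would keep free-form metadata in the same file as the measurement tables without fixing a rigid schema for it up front.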
