Please note that writing .db files is memory intensive. You can reduce memory usage by lowering df_size in CreateDatabasev2.py (which is set to 100,000), but this will also increase run time. For reference: writing 4.4 million events to a .db file takes around 2 hours with n_workers = 4.
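The trade-off behind df_size can be illustrated with a small sketch. This is not the code in CreateDatabasev2.py; it only assumes that rows are collected in memory until roughly df_size of them are buffered and are then flushed to SQLite in one write. The function name write_buffered, the toy columns and the table contents are hypothetical.

```python
import sqlite3

import numpy as np
import pandas as pd


def write_buffered(batches, con, table, df_size=100_000):
    """Collect batches until at least `df_size` rows are buffered, then flush."""
    buffer, buffered_rows, n_writes = [], 0, 0
    for batch in batches:
        buffer.append(batch)
        buffered_rows += len(batch)
        if buffered_rows >= df_size:
            pd.concat(buffer).to_sql(table, con, if_exists="append", index=False)
            buffer, buffered_rows = [], 0
            n_writes += 1
    if buffer:  # flush whatever is left at the end
        pd.concat(buffer).to_sql(table, con, if_exists="append", index=False)
        n_writes += 1
    return n_writes


# Toy batches standing in for real pulse data (hypothetical columns).
rng = np.random.default_rng(0)
batches = [
    pd.DataFrame({"event_no": np.full(500, i), "charge": rng.random(500)})
    for i in range(10)
]

con = sqlite3.connect(":memory:")
n_writes = write_buffered(batches, con, "features", df_size=1500)
n_rows = con.execute("SELECT COUNT(*) FROM features").fetchone()[0]
con.close()
```

A smaller df_size keeps fewer rows in memory at once but triggers more write calls, which is why lowering it costs run time.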
CreateDatabasev2.py takes the following arguments:
--array_path: The path to the numpy arrays produced by the I3-to-Numpy pipeline I3Cols. E.g.: /home/my_awesome_arrays
--key: The field/key of pulse information you want to add to the database. Multiple keys are not supported. E.g.: 'SplitInIcePulses'
--db_name: The name of your database. E.g.: 'myfirstdatabase'
--gcd_path: The path to the gcd.pkl file containing spatial information. If you don't have this file, it can be produced via I3ToNumpy/create_geo_array.py.
--outdir: The location in which you wish to save the database and the transformers. The script will save the database in yourpath/data and the pickled transformers in yourpath/meta. The transformers can be read using pandas.read_pickle().
--n_workers: The number of workers.
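For orientation, the flags listed above could be declared with argparse roughly as follows. This is a sketch, not the parser in CreateDatabasev2.py: which flags are required and the default for --n_workers are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented flags (required/default choices are assumed)."""
    parser = argparse.ArgumentParser(
        description="Write I3Cols numpy arrays to an SQLite database."
    )
    parser.add_argument("--array_path", required=True,
                        help="path to the numpy arrays from I3Cols")
    parser.add_argument("--key", required=True,
                        help="single pulse key, e.g. SplitInIcePulses")
    parser.add_argument("--db_name", required=True,
                        help="name of the database")
    parser.add_argument("--gcd_path", required=True,
                        help="path to the gcd.pkl file")
    parser.add_argument("--outdir", required=True,
                        help="directory that receives data/ and meta/")
    parser.add_argument("--n_workers", type=int, default=1,
                        help="number of workers (default is an assumption)")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args([
    "--array_path", "numpy_arrays",
    "--key", "SplitInIcePulses",
    "--db_name", "ADataBase",
    "--gcd_path", "gcd",
    "--outdir", "MyDatabases",
    "--n_workers", "4",
])
```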
Example:
python CreateDatabasev2.py --array_path ~/numpy_arrays --key 'SplitInIcePulses' --db_name 'ADataBase' --gcd_path ~/gcd --outdir ~/MyDatabases --n_workers 4
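With the example command above, the database ends up under ~/MyDatabases/data and the pickled transformers under ~/MyDatabases/meta. A minimal sketch for loading the transformers is shown below. The file names inside meta are not documented here, so the sketch assumes a .pkl extension and reads whatever it finds; load_transformers and the stand-in file are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_transformers(outdir):
    """Read every *.pkl file in <outdir>/meta into a dict keyed by file stem."""
    meta_dir = Path(outdir).expanduser() / "meta"
    return {p.stem: pd.read_pickle(p) for p in sorted(meta_dir.glob("*.pkl"))}


# Stand-in directory layout so the sketch runs on its own.
# In practice, pass the directory you gave to --outdir instead.
with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "meta"
    meta.mkdir()
    pd.to_pickle({"charge": "placeholder"}, meta / "transformers.pkl")
    transformers = load_transformers(tmp)
```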
Suppose you now want to extract events (0,1,2,3,4) from the database. You can do so with:
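import os
import sqlite3
import pandas as pd
# sqlite3 does not expand "~", so resolve the home directory explicitly
db_file = os.path.expanduser("~/data/mydbfile.db")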
with sqlite3.connect(db_file) as con:
    truth_query = 'select * from truth where event_no IN (0,1,2,3,4)'
    truth = pd.read_sql(truth_query, con)
    feature_query = 'select * from features where event_no IN (0,1,2,3,4)'
    features = pd.read_sql(feature_query, con)
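When the list of events is not known in advance, the query can be built with bound parameters instead of a hard-coded list. The sketch below assumes only the truth and features tables and the event_no column used above. The helper read_events, the extra columns and the in-memory stand-in database are hypothetical.

```python
import sqlite3

import pandas as pd


def read_events(con, table, event_nos):
    """Select all rows of `table` whose event_no is in `event_nos`."""
    event_nos = list(event_nos)
    placeholders = ",".join("?" for _ in event_nos)
    # The table name is interpolated directly, so only pass trusted names.
    query = f"SELECT * FROM {table} WHERE event_no IN ({placeholders})"
    return pd.read_sql(query, con, params=event_nos)


# In-memory stand-in for a real database file.
con = sqlite3.connect(":memory:")
pd.DataFrame({"event_no": range(10), "energy": range(10)}).to_sql(
    "truth", con, index=False
)
pd.DataFrame(
    {"event_no": [0, 0, 1, 7], "charge": [0.5, 1.0, 0.2, 0.9]}
).to_sql("features", con, index=False)

truth = read_events(con, "truth", [0, 1, 2, 3, 4])
features = read_events(con, "features", [0, 1, 2, 3, 4])
con.close()
```

SQLite limits the number of bound variables per statement, so very long event lists may need to be split into several queries.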
Notes:
This is effectively a lite version of https://github.com/ehrhorn/cubedb, a more feature-rich pipeline.
./load_cvmfs.sh
Among other things, this loads IceTray, the IceCube software required to read I3-files. Now you can write your I3-files to numpy arrays using I3Cols:
./makearray.sh
In I3ToNumpy/makearray.sh you can change the path and keys you wish to extract from the I3-files. To create the gcd.pkl file, you can then run:
./create_geo_array.py
Notes:
I3ToNumpy/create_geo_array.py was NOT written by me (source: https://github.com/IceCubeOpenSource/retro/blob/master/retro/i3info/extract_gcd.py).
If your cvmfs environment doesn't contain I3Cols or other external packages, you can install them at user level using:
pip install --user yourpackage