@eliotgenton thanks. The selection argument in the ParquetDataset refers to the chunked, pre-shuffled batches of data used for training. This is different from the SQLite dataset, where the argument specifies individual events, because that data format provides fast random access to individual rows, making it possible to shuffle on the fly. As a result, the _get_all_indices methods are different: as you point out, the function in ParquetDataset returns the total number of batches (files) available in the directory specified by the user.
I think this is indeed the intended usage of the method, but we could add statements to make this distinction clearer.
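To make the distinction concrete, here is a minimal sketch of the two indexing schemes. It is not the graphnet implementation: the helper names `batch_level_indices` and `event_level_indices`, the flat directory of `.parquet` files, and the table and column names are all assumptions for illustration.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List


def batch_level_indices(batch_dir: str) -> List[int]:
    """Hypothetical helper: one index per pre-shuffled batch file.

    Assumes every batch is a single ``.parquet`` file directly inside
    ``batch_dir``. A selection built from these indices picks whole
    batches, not individual events.
    """
    n_batches = len(list(Path(batch_dir).glob("*.parquet")))
    return list(range(n_batches))


def event_level_indices(db_path: str, table: str, index_column: str) -> List[int]:
    """Hypothetical helper: one index per event row in an SQLite table.

    Row-level random access is cheap here, so a selection built from
    these indices picks individual events and can be shuffled on the fly.
    """
    query = f'SELECT "{index_column}" FROM "{table}" ORDER BY "{index_column}"'
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(query).fetchall()
    return [row[0] for row in rows]
```

With three batch files in a directory, `batch_level_indices` returns `[0, 1, 2]` regardless of how many events each file holds, whereas `event_level_indices` returns one entry per event stored in the table.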
graphnet/src/graphnet/data/dataset/parquet/parquet_dataset.py
Line 192 in 652f194
I believe this function is not intended to do this, as it just returns the number of parquet files in a folder.