feature/maccor conversion #7

erikamundson · 2024-10-21T18:09:05Z

Conversion code for maccor csv. Test file is the first 25 cycles of one test from the UMich formation experiment.

For the conversion function, I assumed that the incoming dataframe would have a device_id and a test_id column. For the job, we can maybe accept a regex pattern in the job args to get this from the file name?

Other notes

Made some of the differential columns nullable in the telemetry schema since they'll be null on the first record.
There's a missing row in the CSV, do we want to accept the given record_number or re-index with a row_number() operation?
For the metadata column, I didn't cast any of the values but that should be fine.
The test is pretty simple (schema, row count, cycle number), let me know if there's anything else you think we should test for.
Added a cycler_type metadata entry with the value Maccor. Open to changing the name or value of this.

dyllamt · 2024-10-21T20:53:14Z

is it possible to write out a schema, like we have for telemetry, statistics_step, etc.? that would make it more explicit what is expected, and could specify the partition scheme for the table, sort order, a comment for the table, and ensure compatibility of data files with the expected schema. hard to tell if the [maccor] columns are the ones that usually are included in maccor files, and the other columns are added in from the people running the experiment. do you think that's why they are named that way in the test file?

I was imaging something like this for the application:

file bucket: file -> maccor
incremental processing: maccor -> telemetry

For tests, i've been doing:

schema validation
value correctness for one or two rows
null handling (easy
empty frame testing

I think we can treat each of these test suppliers exactly the same as the harmonized schemas (telemetry etc.). Each file has an expected schema and a transformation function

erikamundson · 2024-10-21T23:40:03Z

I added empty data handling, otherwise the tests have enough coverage IMO.

I was imaging something like this for the application:

file bucket: file -> maccor

incremental processing: maccor -> telemetry

I was thinking for the files, it makes more sense to go straight from file -> telemetry. If we want to include another step that's fine, but I don't see the benefit. I think the incremental processing should still work for steps/cycles aggregation.

I could come up with an expected schema for the CSV, although I think the VARIABLE columns and the auxiliary columns won't be expected to be the same for each test file.

Conversion code for maccor csv

53d601f

erikamundson requested a review from dyllamt October 21, 2024 18:09

add empty data test and handling

6fa617d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature/maccor conversion #7

feature/maccor conversion #7

erikamundson commented Oct 21, 2024 •

edited

Loading

dyllamt commented Oct 21, 2024

erikamundson commented Oct 21, 2024

feature/maccor conversion #7

Are you sure you want to change the base?

feature/maccor conversion #7

Conversation

erikamundson commented Oct 21, 2024 • edited Loading

dyllamt commented Oct 21, 2024

erikamundson commented Oct 21, 2024

erikamundson commented Oct 21, 2024 •

edited

Loading