What's the right behavior for phaser in this situation?
The pipeline defaults to CSV format. It's desirable that the checkpoints and the output be in CSV format.
The original source file is in JSON records format. The data is fine; some records have more fields than others, which is pretty normal.
Phaser wants to save a copy of the source file immediately to the working directory, with line numbers, in order to be able to do diffs later if asked and detect deleted/changed rows.
Without special logic, this fails, because the library that saves to CSV stumbles over the extra fields in some JSON records and throws a ValueError.
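To make the failure concrete, here is a minimal reproduction. It assumes the CSV save goes through something like the standard library's `csv.DictWriter`, with the column list taken from the first record. Which writer phaser actually uses is not stated here, and the sample records and the `write_csv` helper are made up for illustration.

```python
import csv
import io

# Hypothetical JSON-records data: the second record has an extra field.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "notes": "extra"},
]


def write_csv(rows, fieldnames):
    """Write rows as CSV text using the given column list."""
    buffer = io.StringIO()
    # extrasaction defaults to "raise", so unknown keys cause a ValueError.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


try:
    # Columns are taken from the first record only (an assumption).
    write_csv(records, fieldnames=list(records[0].keys()))
    failed = False
    error_message = ""
except ValueError as exc:
    failed = True
    error_message = str(exc)

print(failed, error_message)
```

With `csv.DictWriter`, the error message names the unexpected field, which may be useful if we end up reporting the problem to the user.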
Some possibilities:
If saving the source copy fails, proceed without fixing. This will make diffs not work later, but the pipeline could still work.
Go through the data and make sure each dict has every field that appears in any dict, either before or as we pass the records to the CSV writer, so that saving as CSV with row numbers works.
Raise the failure and suggest that the user do what? Switch to the default JSON save behavior, which they might not want? Fix the data before bringing it into phaser, when fixing the data IN phaser is the whole point?
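The second possibility could look roughly like the sketch below. It is only an illustration under assumptions: it uses the standard library's `csv.DictWriter`, and the names `collect_fieldnames`, `save_source_copy`, and the `__row_num__` column are hypothetical, not phaser's actual API. It also assumes the records are already loaded as a list, since they are iterated twice.

```python
import csv
import io


def collect_fieldnames(rows):
    """Return the union of keys across all rows, in first-seen order."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


def save_source_copy(rows, stream, row_num_field="__row_num__"):
    """Write rows as CSV with a row-number column, padding missing fields."""
    data_fields = [k for k in collect_fieldnames(rows) if k != row_num_field]
    fieldnames = [row_num_field] + data_fields
    # restval fills in any field a given record does not have.
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for number, row in enumerate(rows, start=1):
        writer.writerow({**row, row_num_field: number})
    return fieldnames


buf = io.StringIO()
names = save_source_copy([{"id": 1}, {"id": 2, "notes": "x"}], buf)
print(buf.getvalue())
```

The cost is an extra pass over the data to find the full column set before writing. The same helper could be reused when saving checkpoints between phases.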
We'll have this question again for saving data between phases, won't we?