353 new member field breaks create joined timeseries on existing datasets #355

mgdenno · 2024-12-18T03:18:49Z

@samlamont This is kind of a bug fix, as the remote datasets on S3 do not have the member field. This is mostly OK because the joined_timeseries table has already been generated and is also stored in S3. But if the user wants to clone the Evaluation from S3 and then recreate the joined_timeseries table, it fails because the secondary_timeseries does not pass validation. I added an `add_missing_columns` arg and code to the BaseTable, which can be used to fix this by forcing a read, validate and write. We need to do this on all the datasets in S3 (that we also have in teehr-hub) and re-sync. With that, I'd like to get this merged ASAP.

samlamont · 2024-12-18T10:37:47Z

src/teehr/evaluation/tables/base_table.py

+        if add_missing_columns:
+            for col_name in schema_cols:
+                if col_name not in df.columns:
+                    df = df.withColumn(col_name, lit(None))


Just to make sure I understand: here we're adding empty column(s) (column names that exist in the schema but not in the dataframe), and then line 186 coerces the empty column(s) to the correct data type as defined by the schema?
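The two-step flow asked about here (fill in schema columns that are missing, then coerce everything to the schema's types) can be sketched in plain Python. This is only an analogue under stated assumptions, not the TEEHR or PySpark code from the diff: rows are modeled as dicts, and `SCHEMA`, `coerce_to_schema`, `location_id`, and `value` are hypothetical names chosen for illustration. Only the `member` field comes from the discussion above.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Schema = Dict[str, Callable[[Any], Any]]

# Hypothetical schema: column name -> callable that coerces a non-null value.
SCHEMA: Schema = {
    "location_id": str,
    "value": float,
    "member": str,  # the newer field that older datasets lack
}


def add_missing_columns(rows: List[Row], schema: Schema) -> List[Row]:
    """Step 1: add every schema column absent from a row, filled with None."""
    return [{**{col: None for col in schema}, **row} for row in rows]


def coerce_to_schema(rows: List[Row], schema: Schema) -> List[Row]:
    """Step 2: cast each non-null value to its schema type; None stays None."""
    return [
        {
            col: (None if row[col] is None else cast(row[col]))
            for col, cast in schema.items()
        }
        for row in rows
    ]


# A row written before the member field existed.
old_rows = [{"location_id": "gage-1", "value": "3.5"}]
fixed = coerce_to_schema(add_missing_columns(old_rows, SCHEMA), SCHEMA)
print(fixed)
# [{'location_id': 'gage-1', 'value': 3.5, 'member': None}]
```

In this sketch the missing column first appears as an untyped null and only takes on the schema's type in the second step, which mirrors the reading of the diff given in the question.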

mgdenno added 2 commits December 17, 2024 22:09

adds add_missing_columns arg to _validate()

ac6f8f5

updates change log and version to 0.4.6

1bb0e14

mgdenno requested a review from samlamont December 18, 2024 03:18

mgdenno linked an issue Dec 18, 2024 that may be closed by this pull request

new member field breaks create joined timeseries on existing datasets #353

Open

samlamont approved these changes Dec 18, 2024
