353 new member field breaks create joined timeseries on existing datasets #355

Open
wants to merge 2 commits into
base: main
from

Conversation

mgdenno
Contributor

@mgdenno mgdenno commented Dec 18, 2024

@samlamont This is more or less a bug fix: the remote datasets on S3 do not have the member field. That is mostly OK because the joined_timeseries table has already been generated and is also stored in S3. But if a user clones the Evaluation from S3 and then recreates the joined_timeseries table, it fails because the secondary_timeseries does not pass validation. I added an `add_missing_fields` arg and code to the BaseTable, which can be used to fix this by forcing a read, validate, and write. We need to do this on all the datasets in S3 (that we also have in teehr-hub) and re-sync. With that, I'd like to get this merged ASAP.

@mgdenno mgdenno requested a review from samlamont December 18, 2024 03:18
if add_missing_columns:
    for col_name in schema_cols:
        if col_name not in df.columns:
            df = df.withColumn(col_name, lit(None))
Collaborator

Just to make sure I understand: here we're adding empty column(s) (column names that exist in the schema but not in the dataframe), and then line 186 coerces the empty column(s) to the correct data type as defined by the schema?
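
For reference, the two-step behaviour described in this question (fill in missing schema columns with nulls, then coerce to the schema's types) can be sketched in plain Python. This is only an illustrative analogue without Spark, not the code in this PR: the `SCHEMA` mapping, the helper names, and the `location_id` and `value` columns are hypothetical, and only `member` comes from the discussion above.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Schema = Dict[str, Callable[[Any], Any]]

# Hypothetical schema: column name -> callable used to coerce non-null values.
SCHEMA: Schema = {
    "location_id": str,
    "value": float,
    "member": str,  # the new field that older datasets do not have
}


def add_missing_columns(rows: List[Row], schema: Schema) -> List[Row]:
    """Step 1: add every schema column absent from a row, filled with None."""
    return [{**{col: None for col in schema}, **row} for row in rows]


def coerce_types(rows: List[Row], schema: Schema) -> List[Row]:
    """Step 2: cast each non-null value to its schema type; None stays None."""
    return [
        {
            col: (None if row[col] is None else cast(row[col]))
            for col, cast in schema.items()
        }
        for row in rows
    ]


# An "old" row written before the member field existed.
old_rows = [{"location_id": "gage-1", "value": "3.5"}]
fixed = coerce_types(add_missing_columns(old_rows, SCHEMA), SCHEMA)
print(fixed)
```

In this sketch an existing `member` value is left untouched by step 1, so running the fix over data that already has the field should be harmless.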

Successfully merging this pull request may close these issues.

new member field breaks create joined timeseries on existing datasets
2 participants