
Commit

init
bubriks committed Dec 9, 2024
1 parent ad441ee commit 840824e
Showing 1 changed file with 3 additions and 0 deletions.
utils/python/hsfs_utils.py: 3 changes (3 additions, 0 deletions)
@@ -303,6 +303,9 @@ def offline_fg_materialization(spark: SparkSession, job_conf: Dict[Any, Any], in
.load()
)

# Cache the DataFrame (persisted lazily, on the first action)
df.cache()

# filter only the necessary entries
filtered_df = df.filter(expr("CAST(filter(headers, header -> header.key = 'featureGroupId')[0].value AS STRING)") == str(entity._id))
filtered_df = filtered_df.filter(expr("CAST(filter(headers, header -> header.key = 'subjectId')[0].value AS STRING)") == str(entity.subject["id"]))
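The two filter lines keep only records whose `featureGroupId` and `subjectId` headers match the entity being materialized. The `headers` column and `.load()` call suggest `df` comes from Spark's Kafka source with headers enabled, where `headers` is an array of structs with a string `key` and a binary `value`; that is an assumption here. A minimal pure-Python sketch of the same per-record predicate follows, modeling each record as a dict with a `headers` list of `(key, value_bytes)` tuples. `first_header_value` and `filter_records` are hypothetical names, not part of hsfs.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for one element of the Kafka `headers` column:
# a (string key, binary value) pair.
Header = Tuple[str, bytes]


def first_header_value(headers: Sequence[Header], key: str) -> Optional[str]:
    """Mimic CAST(filter(headers, h -> h.key = key)[0].value AS STRING)."""
    # Keep only headers whose key matches, preserving their order.
    matches = [value for k, value in headers if k == key]
    if not matches:
        # Assumption: a missing header is treated as null (None).
        return None
    # Casting binary to string interprets the bytes as UTF-8.
    return matches[0].decode("utf-8")


def filter_records(
    records: List[Dict[str, Any]], feature_group_id: int, subject_id: int
) -> List[Dict[str, Any]]:
    """Keep records whose headers match both ids, compared as strings."""
    # None == "13" is False, so records missing a header are dropped,
    # just as Spark's filter discards rows whose predicate is null.
    return [
        r
        for r in records
        if first_header_value(r["headers"], "featureGroupId") == str(feature_group_id)
        and first_header_value(r["headers"], "subjectId") == str(subject_id)
    ]


records = [
    {"headers": [("featureGroupId", b"13"), ("subjectId", b"7")], "value": b"a"},
    {"headers": [("featureGroupId", b"14"), ("subjectId", b"7")], "value": b"b"},
    {"headers": [("subjectId", b"7")], "value": b"c"},
]
kept = filter_records(records, feature_group_id=13, subject_id=7)
print([r["value"] for r in kept])  # [b'a']
```

In the actual job both predicates run as Spark SQL expressions on the executors; the sketch only illustrates the per-record semantics (first matching header wins, value compared as a string).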
