[BUG] Misaligned DataFrame Schemas #87
Labels
bug
Something isn't working
dataframes
Issues or Features with Pandas Dataframes
Entity Types
Entity Resources
Intents
Intent Resources
Current Behavior
Currently, there are several areas in SCRAPI where we export and import DataFrames, and their schemas are misaligned.
This makes it hard to chain these methods into a pipeline, because column renaming or other ETL steps are needed between an export and the matching import.
Examples:
Intents.intent_proto_to_dataframe exports columns = `display_name`, `training_phrase` in `basic` mode. In advanced mode for the same method, the utterance is now called `text`. Mismatch of schema and semantics in the same method.
In DataframeFunctions.bulk_update_intents_from_dataframe, the `basic` mode expects input columns of `display_name` and `text`. This is misaligned with the above schemas of the generated DataFrames in the Intents class.
So if your workflow is this:
Step 3 will break due to misaligned schema.
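The mismatch can be illustrated with plain pandas, without calling SCRAPI itself. In this minimal sketch, the DataFrame is a hand-built stand-in for the basic-mode export and uses the column names listed above. The sample rows and the `EXPECTED_IMPORT_COLUMNS` name are assumptions made for illustration only.

```python
import pandas as pd

# Stand-in for the basic-mode output of Intents.intent_proto_to_dataframe.
# Column names come from this issue; the rows are made-up sample data.
exported = pd.DataFrame(
    {
        "display_name": ["greeting", "greeting"],
        "training_phrase": ["hello", "hi there"],
    }
)

# Columns the basic-mode bulk update expects, per this issue.
# EXPECTED_IMPORT_COLUMNS is a hypothetical name, not a SCRAPI constant.
EXPECTED_IMPORT_COLUMNS = ["display_name", "text"]

# Find which expected columns the exported DataFrame does not provide.
missing = [col for col in EXPECTED_IMPORT_COLUMNS if col not in exported.columns]
print(f"Missing columns for import: {missing}")

# The manual rename that users currently have to add between export and import.
aligned = exported.rename(columns={"training_phrase": "text"})
```

With aligned schemas, the `rename` step would not be needed at all.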
We should always be in alignment with "like for like" export/import (i.e. basic and basic should match 100%).
We should also be in alignment semantically across modes (i.e. basic and advanced have different schemas, but the columns that are shared are 100% named identically)
Expected Behavior
All DataFrame schemas within the same Resource type (e.g. Intents, Entity Types, etc.) should be in alignment.
Possible Solution
Centralize the creation and validation of all schema types to a file outside of the class that is using them.
Introduce `core/schemas.py` or similar to maintain a central schema repository. Then each respective class can pull its schema and schema validation rules from the central module, ensuring that we have continuity in DataFrame resources.
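As a rough sketch of what such a central module could look like, the snippet below keeps one schema per resource type and mode, plus a shared validation helper. Everything here is hypothetical: the `SCHEMAS` layout, the `validate_schema` helper, and the `advanced_only_example` placeholder column are assumptions, and the choice of `training_phrase` over `text` as the shared name is arbitrary, since this issue does not settle which name should win.

```python
import pandas as pd

# Hypothetical contents of core/schemas.py.
# Shared columns are defined once so basic and advanced cannot drift apart.
_INTENT_BASIC = ["display_name", "training_phrase"]

SCHEMAS = {
    "intents": {
        "basic": _INTENT_BASIC,
        # "advanced_only_example" is a placeholder, not a real SCRAPI column.
        "advanced": _INTENT_BASIC + ["advanced_only_example"],
    },
}


def validate_schema(df: pd.DataFrame, resource: str, mode: str) -> None:
    """Raise ValueError if df lacks any column required for resource/mode."""
    expected = SCHEMAS[resource][mode]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise ValueError(
            f"{resource}/{mode} DataFrame is missing columns: {missing}"
        )
```

Both the exporting and the importing method could call `validate_schema(df, "intents", "basic")`, so a "like for like" round trip would fail fast with a clear message instead of breaking partway through an update.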
Steps to Reproduce
Try the following