-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/File Type Conversion/XML to JSON #47
base: main
Are you sure you want to change the base?
Conversation
output_path = os.path.dirname(xml_path) | ||
output_directory = f'{output_path}/json' | ||
else: | ||
output_directory = f'{output_path}/json' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an output path is explicitly provided, will we still need to add the JSON subfolder?
os.makedirs(output_directory) | ||
|
||
# Write the contents of the JSON file into the output folder path. | ||
file_path = os.path.join(output_directory, f'{file_name}.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here. Should a non-empty output_path
be a file or a directory?
file_path = os.path.join(output_directory, f'{file_name}.json') | ||
try: | ||
with open(file_path, "w") as json_file: | ||
json_file.write(json_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with XML. Will this write out rows that are bounded by brackets ([]
)? If so, we'll want to write out JSON lines by iterating over the dictionary rows and writing them out with a newline at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question about the naming conventions. Because we already have a translate_csv_file_to_jsonl()
method in the jsonl
callable submodule, I'd argue we should put it there.
@AngelicaLastra I left a few comments a while ago. Apologies for not tagging you to elevate them back to attention! If we can resolve them, we can get this merged into 0.3.2. |
Description & motivation
While developing an Earthmover Template for Stadium Texas’ HRIS Staff Data, we encountered challenges with Earthmover’s ability to parse variables within non-flattened XML files. To address this, we created a Python script that converts nested XML files into JSON format. After some consideration, we felt that this script general enough that it can be used by other projects, and thus thought it would be a good opportunity to include this in our
EA_airflow_util
repo.PR Merge Priority:
New files created:
file_type.py
: Not sure if the name of this model is the one we want to go with, but something longer like "file_type_conversion.py
" felt out of place given the short and straightforward names in the other files.Tests and QC done: