-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support CSV inputs for uploader, link to multiple objects #19
Support CSV inputs for uploader, link to multiple objects #19
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot. Some small comments are inline.
omero2pandas/upload.py
Outdated
annotation_id = link_obj.child.id.val | ||
LOGGER.info(f"Uploaded as FileAnnotation {annotation_id}") | ||
extra_link_objs = [] | ||
unloaded_ann = omero.model.FileAnnotationI(annotation_id, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we are not importing FileAnnotationI
into the package scope?
else: | ||
raise NotImplementedError(f"Column type " | ||
f"{column_type} not supported") | ||
if col_class == omero.grid.StringColumn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As below, import this? There are also features in omero.columns
(which has the concrete implementations of the omero.grid
column types) that you may find useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a lot of classes from omero.grid and omero.model, so I'd just left it as a full import.
I wasn't aware that omero.columns
existed, it's not mentioned in the main Python docs at all. Probably worth discussing whether there's any benefit to be had from using those instead of grid, and if so it could be best as a dedicated PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few additional comments & suggestions from a strict code review perspective, some of them are mostly to clarify how the new feature is expected to be invoked.
I'll schedule some functional testing over the next few days
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested through a workflow creating an OMERO.table from a CSV and linking it both to an ROI and the parent Image using the new API.
Everything worked as expected using a CSV with 100k rows and 338 columns as the input. The table was successfully created and linked and could be used as previously using the API. The chunk size is correctly computed internally and the upload logic worked without issue creating the table.
I have not tested all the validation conditions for incorrect target objects but there are still discussions on #19 (comment). Assuming these need additional scoping time (and might be better captured as a separate issue), the only outstanding question from my side is #19 (comment) and whether the README should suggest to let the code auto-compute the chunk size by default (and mention chunk_size
) for advanced usage.
Made revisions per @sbesson's suggestions
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DavidStirling, the latest API changes make sense to me and simplify the workflow when linking to multiple parent object at once. Successfully rested in the context described previously of a table linked to both a ROI and an Image.
I will run a final round of testing on Monday in particular to check some of the error conditions but overall, I think we should be in a good position to get this included and released as 0.3.0
and address bugs as we start consuming it.
omero2pandas.upload_table
now supports either a pandas dataframe or CSV path as an input. With a CSV we need to establish the max length of any string columns, so we need to scan the file before uploading. Strategy is as follows: