[ENH] IO - Change origin attribute when not find on system #6555
Codecov Report
Additional details and impacted files

@@           Coverage Diff           @@
##           master    #6555   +/-   ##
=======================================
  Coverage   87.78%   87.78%
=======================================
  Files         321      321
  Lines       69420    69445   +25
=======================================
+ Hits        60938    60962   +24
- Misses       8482     8483    +1
1764fde to e81adb2
b189e71 to ad5dc9a
/rebase
ad5dc9a to e9775cc
In general, I'm not very happy about this guessing, but something has to be done and I have no better idea. It will probably do the job.
Orange/data/io_util.py
Outdated
    # all column paths in lookup dirs
    for ld in lookup_dirs:
        if all(os.path.exists(os.path.join(ld, v)) for v in table.get_column(attr)):
Maybe we should skip unknown values here, e.g. by adding `if v`?
Added. I also added the test case for it.
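A minimal sketch of the suggestion above, not the code merged in this PR: the existence check is restricted to truthy values with `if v`. The helper name `all_known_paths_exist` is hypothetical, and the sketch assumes unknown values show up as empty strings or `None`.

```python
import os
import tempfile


def all_known_paths_exist(directory, values):
    """Return True if every known value names an existing path in `directory`.

    `if v` drops falsy entries, assumed here to be "" or None for unknowns.
    """
    return all(os.path.exists(os.path.join(directory, v)) for v in values if v)


with tempfile.TemporaryDirectory() as tmp:
    # Create a single file so that only "a.png" can be found.
    open(os.path.join(tmp, "a.png"), "w").close()
    with_gaps = all_known_paths_exist(tmp, ["a.png", "", None])
    with_missing = all_known_paths_exist(tmp, ["a.png", "b.png"])
```

Note that a NaN is truthy in Python, so a column that encodes unknowns as NaN would need an additional check besides `if v`.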
Orange/data/io_util.py
Outdated
    file_dir = os.path.dirname(file_path)
    parent_dir = os.path.dirname(file_dir)
    # if file_dir already root file_dir == parent_dir
    lookup_dirs = tuple({file_dir, parent_dir})
We probably want to look into `file_dir` first, and only then into `parent_dir`? Sets are unordered.
If you want to keep it short, use `tuple({file_dir: 0, parent_dir: 0})`. :)
You are right. Fixed.
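The dict trick works because dicts keep insertion order (guaranteed since Python 3.7) while still collapsing duplicate keys. A small sketch, with the hypothetical helper `lookup_dirs_for` and `posixpath` used only so the example behaves the same on every platform:

```python
import posixpath


def lookup_dirs_for(file_path):
    """Return (file_dir, parent_dir) in that order, without duplicates."""
    file_dir = posixpath.dirname(file_path)
    parent_dir = posixpath.dirname(file_dir)
    # Dict keys are unique and iterate in insertion order, so file_dir
    # always comes first and a repeated directory appears only once.
    return tuple({file_dir: 0, parent_dir: 0})


nested = lookup_dirs_for("/data/project/table.csv")
at_root = lookup_dirs_for("/table.csv")
```

Here `nested` is `("/data/project", "/data")`, and `at_root` is `("/",)` because the parent of the root directory is the root itself.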
Orange/data/io_util.py
Outdated
    parent_dir = os.path.dirname(file_dir)
    # if file_dir already root file_dir == parent_dir
    lookup_dirs = tuple({file_dir, parent_dir})
    for attr in table.domain:
Why not just `table.domain.metas`?
Not only because of efficiency; I'm never sure whether a loop over domain includes metas or not. :)
Good idea. Also, Image Analytics only considers metas: https://github.com/biolab/orange3-imageanalytics/blob/02356b8c14b2ec2d63f3a2f697ce69e47f05c4fa/orangecontrib/imageanalytics/utils/image_utils.py#L15-L33
Orange/data/io_util.py
Outdated
    # if file_dir already root file_dir == parent_dir
    lookup_dirs = tuple({file_dir, parent_dir})
    for attr in table.domain:
        if "origin" in attr.attributes:
You could add `if attr.is_string` as a precaution. If someone uses `origin` for something else, this could prevent some false positives.
I extended this condition to discrete variables as well, since Image Analytics also searches in those: https://github.com/biolab/orange3-imageanalytics/blob/02356b8c14b2ec2d63f3a2f697ce69e47f05c4fa/orangecontrib/imageanalytics/utils/image_utils.py#L15-L33
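The combined condition from the two comments above might look roughly like this. It is a sketch, not the PR's code: `FakeVariable` is a hypothetical stand-in that only mimics the `attributes`, `is_string` and `is_discrete` members of an Orange variable, so the example runs without Orange installed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVariable:
    """Hypothetical stand-in for an Orange variable (illustration only)."""
    name: str
    is_string: bool = False
    is_discrete: bool = False
    attributes: dict = field(default_factory=dict)


def origin_candidates(metas):
    """Keep meta variables that carry an origin and can hold path-like values."""
    return [
        var for var in metas
        if "origin" in var.attributes and (var.is_string or var.is_discrete)
    ]


metas = [
    FakeVariable("image", is_string=True, attributes={"origin": "/old/images"}),
    FakeVariable("category", is_discrete=True, attributes={"origin": "/old/images"}),
    # Neither string nor discrete: "origin" is probably used for something else.
    FakeVariable("score", attributes={"origin": "something else"}),
    # String variable without an origin: nothing to fix.
    FakeVariable("note", is_string=True),
]
candidate_names = [var.name for var in origin_candidates(metas)]
```

Only `image` and `category` pass the filter; the type check is what keeps `score` from being treated as a path column.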
e9775cc to 2e8de49
Tests fail because of xgboost release. Fixed in #6570
/rebase
2e8de49 to 72ccefe
Issue
Files that Orange loads (with the File widget) may contain paths in one or more columns, for example, paths to images for image analysis. One way to store paths is to keep the common prefix as the `origin` entry in the variable's `attributes` and the last part of the path as the column values. The resulting paths (origin + column value) are absolute, so when the table is transferred to another computer, they may no longer be valid.
Description of changes
If the dataset's author provides the referenced files alongside the CSV file, they can be discovered and the origin fixed.
Additionally, I replaced StringIO reader inputs with actual files in the tests. The code should not be adapted to cases that are only possible in tests.
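The discovery described above can be sketched end to end as follows. This is an illustration under stated assumptions, not the merged implementation: `guess_origin` is a hypothetical name, keeping an origin that still exists is an assumption, and so is requiring at least one known value before accepting a directory.

```python
import os
import tempfile


def guess_origin(origin, values, file_path):
    """Return a directory under which all known values resolve, or None."""
    if os.path.isdir(origin):  # assumption: a still-valid origin is kept
        return origin
    file_dir = os.path.dirname(file_path)
    parent_dir = os.path.dirname(file_dir)
    known = [v for v in values if v]  # skip unknown (falsy) values
    # dict.fromkeys preserves order and drops a duplicate parent directory.
    for candidate in dict.fromkeys([file_dir, parent_dir]):
        if known and all(os.path.exists(os.path.join(candidate, v)) for v in known):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Layout: tmp/a.png, tmp/b.png, tmp/sub/data.csv
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a.png", "b.png"):
        open(os.path.join(tmp, name), "w").close()
    csv_path = os.path.join(tmp, "sub", "data.csv")
    stale_origin = os.path.join(tmp, "no_such_dir")

    expected_dir = tmp
    found = guess_origin(stale_origin, ["a.png", "", "b.png"], csv_path)
    not_found = guess_origin(stale_origin, ["a.png", "missing.png"], csv_path)
```

In this layout the images sit in the parent of the CSV file's directory, so `found` is that parent directory, while `not_found` is `None` because one of the values cannot be resolved in either lookup directory.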
Includes