

"object_type" column in ExternalResources may not be sufficient #71

Open
rly opened this issue May 23, 2023 · 5 comments
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)

Comments

@rly
Contributor

rly commented May 23, 2023

We added "object_type" to the objects table in ExternalResources to make queries easier.

But for DynamicTables, the "object_type" of a column would be "VectorData", which is very generic. Filtering on it would pick up a lot of false positives, so it does not make queries for annotations of table columns any easier.
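To make the false-positive problem concrete, here is a minimal sketch that models the objects table as a list of simplified rows and filters it by object_type. The `ObjectRow` class, its `path` field, and the example rows are hypothetical stand-ins for illustration, not the actual HERD/hdmf API.

```python
from dataclasses import dataclass


@dataclass
class ObjectRow:
    # Hypothetical, simplified row of the objects table (illustrative fields only).
    object_id: str
    object_type: str
    path: str


objects = [
    ObjectRow("a1", "Subject", "general/subject"),
    ObjectRow("b2", "VectorData", "units/location"),
    ObjectRow("c3", "VectorData", "trials/stimulus_type"),
    ObjectRow("d4", "VectorData", "electrodes/group_name"),
]


def find_by_type(rows, object_type):
    """Return all rows whose object_type matches exactly."""
    return [r for r in rows if r.object_type == object_type]


# Querying for a specific type such as Subject is precise: exactly one hit.
subjects = find_by_type(objects, "Subject")
# Querying for a table column by type returns every annotated column,
# so the user still has to inspect each hit to find the one they wanted.
columns = find_by_type(objects, "VectorData")
print(len(subjects), len(columns))  # 1 3
```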

@oruebel
Contributor

oruebel commented May 23, 2023

CC @bendichter @mavaylon1

@mavaylon1
Contributor

I believe the initial idea was to search for more specific structures, such as Subject, and that users would want a variety of options to narrow down a search for whatever they are looking for. If we want to support more specific queries, I think that should come from community feedback. I am not sure what kinds of queries users would want. Query by name? (That assumes they know the name of what they are looking for.)

@oruebel
Contributor

oruebel commented May 26, 2023

I believe the initial idea was to search for more specific structures such as Subject

Correct. The issue with tables is that columns are typically just generic VectorData, which is too generic to be useful in a query.

@mavaylon1
Contributor

@oruebel @rly This has been quiet for a bit, and that's my fault. Let's restart the conversation with a question: how would a user search for something they want? In the case above, the "object_type" column is generic. If they want to look for a specific column, the only thing they could get would be all objects that are VectorData. Even then, what are they trying to find? Are they searching by column name? Are they searching for all columns that contain a certain value?

I think it might be best to treat the "object_type" column as high level, i.e., for finding Subjects, specific TimeSeries subclasses, etc. Instead of changing the structure (it's already a lot for the user to look at), let's maybe shift to adding more query capabilities. Thoughts?
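As a rough illustration of "more query capabilities" on top of an unchanged structure, here is a minimal sketch of a composable filter helper. It assumes rows are plain dicts and that a "name" value is available for each object; the `query` function and the row contents are hypothetical, not part of HERD.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def query(rows: Iterable[Row], **criteria: Any) -> list[Row]:
    """Return rows matching every criterion.

    A criterion value may be a plain value (exact match) or a callable
    predicate applied to the row's value for that key.
    """
    def matches(row: Row) -> bool:
        for key, expected in criteria.items():
            value = row.get(key)
            if callable(expected):
                if not expected(value):
                    return False
            elif value != expected:
                return False
        return True

    return [row for row in rows if matches(row)]


objects = [
    {"object_id": "a1", "object_type": "Subject", "name": "subject"},
    {"object_id": "b2", "object_type": "VectorData", "name": "location"},
    {"object_id": "c3", "object_type": "VectorData", "name": "stimulus_type"},
]

# High-level search by type alone stays simple...
print([r["object_id"] for r in query(objects, object_type="Subject")])  # ['a1']
# ...while an extra predicate on name narrows the generic VectorData hits.
hits = query(objects, object_type="VectorData",
             name=lambda n: n is not None and "stim" in n)
print([r["object_id"] for r in hits])  # ['c3']
```

The design keeps object_type as the coarse, high-level filter and layers finer criteria on top, which matches the idea of adding query abilities rather than new columns.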

@oruebel
Contributor

oruebel commented Sep 13, 2023

I think it may be worth separating the issue of search from the main HERD data structure. Common fields for search would probably be object_type and name, but could also include other properties (e.g., the description). Maybe instead of adding object_type to the main ObjectTable, we could come up with a strategy that lets the user specify which properties of objects should be "cached" with HERD to speed up search. Ultimately, the main reason to have this in HERD is to avoid having to open a large number of files to do the search. However, it's not clear to me that this is necessarily something we should do in HERD.
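One way to picture the "user chooses what to cache" strategy is a sketch like the following, which pulls a user-specified list of properties from each object into a lookup keyed by object_id, so a search can run without reopening the source files. The `cache_properties` function and the `SimpleNamespace` stand-ins for objects read from files are hypothetical illustrations, not an existing HERD feature.

```python
from types import SimpleNamespace


def cache_properties(objects, properties):
    """Build a search cache: object_id -> {property: value} for the chosen properties.

    Properties an object lacks are stored as None, so every cached entry
    has the same keys.
    """
    return {
        obj.object_id: {prop: getattr(obj, prop, None) for prop in properties}
        for obj in objects
    }


# Hypothetical stand-ins for objects read from data files.
col = SimpleNamespace(object_id="b2", name="location", description="recording site")
subj = SimpleNamespace(object_id="a1", name="subject", species="Mus musculus")

cache = cache_properties([col, subj], properties=["name", "description"])
print(cache["a1"])  # {'name': 'subject', 'description': None}
```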

Option 1: Since HERD is serialized to TSV, one solution may be to have a separate, optional table, object_properties, that stores additional information about objects for search (e.g., object_type and name). If users could add custom columns to that table, I think that would help address this issue. Placing this information in a separate table would have the advantage of separating the desire for search from the core HERD data structure while making search on specific properties easy.
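A minimal sketch of Option 1, assuming the object_properties table is written as TSV with one row per object, keyed by a hypothetical `objects_idx` column that points back to the row in the objects table. The user picks the extra columns; the function names and column names here are illustrative only.

```python
import csv
import io


def write_object_properties(rows, extra_columns):
    """Serialize a hypothetical object_properties table to TSV text.

    Each row is keyed by objects_idx (the row index in the objects table);
    extra_columns are the user-chosen search properties. Missing values
    are written as empty strings.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["objects_idx", *extra_columns],
                            delimiter="\t", lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def search(tsv_text, **criteria):
    """Return objects_idx values whose properties match all criteria exactly."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [int(row["objects_idx"]) for row in reader
            if all(row.get(key) == value for key, value in criteria.items())]


tsv = write_object_properties(
    [
        {"objects_idx": 0, "object_type": "Subject", "name": "subject"},
        {"objects_idx": 1, "object_type": "VectorData", "name": "location"},
        {"objects_idx": 2, "object_type": "VectorData", "name": "stimulus_type"},
    ],
    extra_columns=["object_type", "name"],
)
print(search(tsv, object_type="VectorData", name="location"))  # [1]
```

Because TSV stores everything as text, values come back as strings; numeric properties would need explicit conversion before comparison.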

Option 2: An alternative approach would be to store a separate JSON file in HERD that has the same length as the ObjectTable and stores, for each object, a flat dict of key/value pairs. This way, each object can have its own set of key/value pairs that it needs to expose for search.
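A minimal sketch of Option 2, assuming the JSON file holds a list whose i-th element is the flat key/value dict for row i of the ObjectTable. The `dump_search_properties` and `find` helpers are hypothetical; note that each entry can carry a different set of keys.

```python
import json


def dump_search_properties(entries, n_objects):
    """Serialize per-object search properties as a JSON list.

    entries[i] holds the flat key/value pairs for row i of the objects table,
    so the list must have exactly n_objects elements and no nested values.
    """
    if len(entries) != n_objects:
        raise ValueError("need one entry per row of the objects table")
    for i, entry in enumerate(entries):
        for key, value in entry.items():
            if isinstance(value, (dict, list)):
                raise ValueError(f"entry {i}: value for {key!r} must be a scalar")
    return json.dumps(entries)


def find(json_text, key, value):
    """Return indices of objects whose property `key` equals `value`."""
    return [i for i, entry in enumerate(json.loads(json_text))
            if entry.get(key) == value]


text = dump_search_properties(
    [
        {"object_type": "Subject", "species": "Mus musculus"},
        {"object_type": "VectorData", "name": "location"},
        {"object_type": "VectorData", "name": "stimulus_type", "unit": "s"},
    ],
    n_objects=3,
)
print(find(text, "name", "location"))  # [1]
print(find(text, "species", "Mus musculus"))  # [0]
```

Each lookup here is a linear scan over all entries, which illustrates the flexibility versus search-performance trade-off of this option.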

These are just some ideas, and we should discuss this issue a bit more. At first glance, I think Option 2 would be the most flexible and possibly the easiest to implement, but it may not be optimal in terms of search performance (though probably still reasonable given the expected size).

@rly rly added category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) labels Apr 11, 2024
@mavaylon1 mavaylon1 self-assigned this Apr 16, 2024
@mavaylon1 mavaylon1 added this to the 1.9.0 milestone Apr 16, 2024

3 participants