[Feature request] dedicated tutorial on updating database values #3638
Comments
Hi @nick-youngblut, Thanks for the feature request! You can update We're currently working on releasing a new TileDB Academy doc focused exclusively on updating and adding new arrays to existing
Your note on querying for null-type values is great feedback. I'll discuss with the team about adding this to the docs as well. Spencer |
I missed In regards to null-type, I found that some of the values are encoded as literal "NaN" strings, while others are actually NaN values. I'm not sure why this happened, since I've appended data in the same manner for all datasets added to the database. |
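For anyone hitting the same mix of literal "NaN" strings and real NaN values, here is a minimal sketch of how the two cases could be detected and replaced with 0 once the column has been read into memory. It uses plain Python lists only; the names `is_missing`, `fill_missing`, and the sample `column` are hypothetical and not part of the tiledbsoma API, and writing the cleaned values back to the database is not shown.

```python
import math


def is_missing(value):
    """Return True for None, a float NaN, or a literal 'nan' string (any case)."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, str):
        return value.strip().lower() == "nan"
    return False


def fill_missing(values, fill=0):
    """Return a new list with every missing-like entry replaced by `fill`."""
    return [fill if is_missing(v) else v for v in values]


# Hypothetical column contents mixing both encodings of "missing".
column = [1.5, float("nan"), "NaN", "nan", None, "T cell", 0]
print(fill_missing(column))  # [1.5, 0, 0, 0, 0, 'T cell', 0]
```

Treating the string and float encodings in one predicate keeps the replacement step independent of how each dataset happened to be appended.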
@nick-youngblut this is timely. We have a couple things in flight at present:
If it's not too much trouble -- can you please share what your |
My
My
Converting some of the database obs to a pandas dataframe results in the following dtypes:
I hope that info is helpful! |
Thanks @nick-youngblut ! And in which column(s) are you seeing the |
A bit of background might be helpful. My workflow for building the tiledb-soma database is:
Currently, I'm batching tens of mtx files into 1 h5ad file (less memory than hundreds). Do you think tiledb-soma database appends are faster with:
Appending datasets to the tiledb-soma database takes a lot of time, so any suggestions on optimizing this process would be greatly appreciated! |
The NaN values are seen in |
Thanks @nick-youngblut for the question! In fact I do not have numbers on this performance question. My gut feeling is it doesn't matter too much as the bulk of the time is (in my experience) in |
From limited testing, I've found that separating the creation of small batched h5ad files (e.g., 10-20 datasets) and then appending them to a database increases the throughput by ~10x. However, I don't know if that really scales with batching larger numbers of datasets. I'll check it empirically. Thanks! |
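The batching step described above can be sketched roughly as follows. This only shows how a list of input files might be grouped into batches of a fixed size; the file names are made up, `batched` is a hypothetical helper, and the conversion to h5ad and the append to the database are left out.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical input: 45 mtx files grouped into batches of 20.
mtx_paths = [f"dataset_{i:03d}.mtx" for i in range(45)]
for batch_number, batch in enumerate(batched(mtx_paths, 20), start=1):
    # Each batch would become one h5ad file that is appended separately.
    print(batch_number, len(batch))
```

Keeping the batch size as a parameter makes it easy to time appends at several sizes and check empirically whether the throughput gain holds for larger batches.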
@nick-youngblut our commercial product takes care of scaling, logging, and cataloging TileDB-SOMAs along with your other data assets. We have solved the problems you're likely encountering; if you'd like to discuss that more, you can reach out to me or our team. |
Is your feature request related to a problem? Please describe.
It would be helpful to have a dedicated tutorial on updating values in a database (e.g., convert all NaN values in an obs metadata column to 0). I believe this would involve:
NaN values, I've tried tiledbsoma.AxisQuery(value_filter='is_null(my_column)'), but that doesn't work.
(NaN => 0) in the dataframe
Describe alternatives you've considered
The existing tutorials, API docs, and getting help from LLMs