Not just that. WebAssembly uses 32-bit pointers, which imposes a 4 GB limit on the structures that can be created in the C++ code. This restriction should be lifted whenever the
This has been considered, albeit not with Cloudflare. The Workers have runtime limits that will probably be exceeded if your dataset is truly large. They also have a 1 MB upload limit for the total size of the app, and our Wasm file alone is already getting close to that, never mind all the other JS bits and pieces around it. If we were to do this, we would likely need a full-fledged EC2 instance.
We thought about this but have yet to decide on an approach. The problem is whether one is really running an analysis with kana, given that the data and compute are hosted remotely (with the associated issues of data privacy and backend deployment that kana was designed to avoid). If we were to implement this, it would probably be a different application that takes a subset of kana's features and provides some kind of backend specification for state and feature queries.
Provided it's not using Wasm, that is possible. For example, scran.chan uses the same C++ code to provide the same analysis functions as kana but in an R context. One could imagine writing the same bindings in your language of choice, e.g., Python, Julia, Golang... Of course, whether I want to tangle with libraries like Qt to create the UI is another matter altogether.
-
Some datasets are too large to fit in the RAM available on a user's machine.
Even if a user's machine has more than enough RAM, my current understanding is that the Chrome web browser limits users to 4GB per tab. (More details here).
So, if Kana only accepts files on the user's local disk, then the user cannot run Kana on their files with millions of cells.
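To make the scale concrete, here is a rough back-of-envelope sketch. The matrix layout (compressed sparse column with 4-byte values and 4-byte row indices), the gene count and the 5% density are all my own assumptions for illustration; I don't know what Kana actually stores internally.

```python
# 4 GB ceiling discussed in this thread: 2**32 addressable bytes.
LIMIT_BYTES = 2 ** 32


def sparse_matrix_bytes(n_genes, n_cells, density,
                        value_bytes=4, index_bytes=4, pointer_bytes=8):
    """Approximate size of a genes-by-cells matrix in compressed sparse
    column form. All default byte widths are assumptions, not Kana's."""
    nnz = round(n_genes * n_cells * density)  # number of non-zero entries
    # Each non-zero needs a value and a row index; each column needs a pointer.
    return nnz * (value_bytes + index_bytes) + (n_cells + 1) * pointer_bytes


def fits_under_limit(n_bytes, limit=LIMIT_BYTES):
    """True if the structure fits below the 4 GB ceiling."""
    return n_bytes < limit


if __name__ == "__main__":
    # Hypothetical dataset: 20,000 genes, 1 million cells, 5% non-zero.
    size = sparse_matrix_bytes(20_000, 1_000_000, 0.05)
    print(f"{size / 2**30:.1f} GiB, fits: {fits_under_limit(size)}")
```

Under these assumed numbers the count matrix alone is roughly twice the limit, before any intermediate results are allocated.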
I'm interested to learn more about how to work around such limits and discuss two ideas below.
Cloudflare
Robert Aboukhalil has a great blog post that helped me to understand how Cloudflare might be useful for bioinformatics web apps.
Cloudflare can fetch a large file with millions of cells from a cloud provider like S3. Next, Cloudflare can run the WASM Kana code remotely and store the results remotely. Finally, the Kana user interface can fetch the subset of results that the user wants to see right now.
This means the user will never need to download the full dataset or the full results. Instead, the user will download a few megabytes of data on-the-fly to get a few million UMAP coordinates along with a few genes' expression values. So, it is certainly feasible to run Kana on millions of cells — but the analysis would be run remotely.
However, the current `.kana` file format for analysis results does not support random access, so it does not support fetching subsets of results. In contrast, the zarr file format does support random access and is easily extensible for any data. It is already in use by some groups who analyze single-cell data.
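To illustrate what I mean by random access, here is a minimal sketch of the general idea behind chunked storage: values are split into independently stored chunks, so a reader only touches the chunks that overlap the requested range. This is not the zarr API or the `.kana` layout; the function names, the in-memory `dict` standing in for remote storage, and the chunk length are all hypothetical.

```python
from array import array

CHUNK_LEN = 4  # hypothetical number of values per chunk


def write_chunks(values, chunk_len=CHUNK_LEN):
    """Split values into independently stored chunks keyed by chunk index.
    The dict stands in for remote object storage (one object per chunk)."""
    store = {}
    for start in range(0, len(values), chunk_len):
        chunk = array("f", values[start:start + chunk_len])
        store[start // chunk_len] = chunk.tobytes()
    return store


def read_range(store, start, stop, chunk_len=CHUNK_LEN):
    """Return (values[start:stop], fetched chunk indices), decoding only the
    chunks that overlap the range. Assumes 0 <= start <= stop <= total length."""
    if start >= stop:
        return [], []
    first, last = start // chunk_len, (stop - 1) // chunk_len
    out, fetched = [], []
    for idx in range(first, last + 1):
        chunk = array("f")
        chunk.frombytes(store[idx])  # the only "download" for this chunk
        fetched.append(idx)
        lo = max(start - idx * chunk_len, 0)
        hi = min(stop - idx * chunk_len, len(chunk))
        out.extend(chunk[lo:hi])
    return out, fetched


if __name__ == "__main__":
    store = write_chunks([float(i) for i in range(10)])
    values, fetched = read_range(store, 5, 9)
    print(values, "from chunks", fetched)
```

A format without this kind of chunk index forces the client to download and parse the whole file even when it only needs a small slice.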
Desktop app
A desktop app built on top of the underlying C++ code will not have an artificial 4GB limit on memory.
I do not know whether a desktop app built with Node, Deno, or Tauri is affected by the 4GB per-tab limit that exists inside Google Chrome.
One obvious benefit of a desktop app is that the user keeps their data on their machine without uploading it to Cloudflare.