
Request for help with cellbender graph #222

Closed
beetlejuice007 opened this issue May 24, 2023 · 2 comments
Comments

@beetlejuice007

Hi,

Can anyone tell me what's wrong with this sample and how I can fix it?
Thanks

[image: training learning curve showing spikes]

cellbender remove-background --input xxx/outs/raw_feature_bc_matrix.h5 --output xxx/outs/cellbender/cb_feature_bc_matrix.h5 --low-count-threshold 5 --expected-cells 900 --total-droplets-included 50000 --fpr 0.01 --epochs 100

@sjfleming
Member

Hi @hemantgujar , I've been doing a lot of digging into these kinds of issues as I'm getting everything finalized for v0.3.0, and I now believe that these kinds of spikes in the learning curve are caused by the model struggling to identify which droplets contain cells. (Really, what's going on is that the initialization of the neural network that predicts posterior cell probabilities is fighting against what the model wants it to do, so at some point the model has to climb out of a local minimum to achieve better performance; those climbs are the spikes.) Sometimes it turns out that these spikes are not a big deal and don't really affect the outcome. That's especially true when the learning curve "picks back up where it left off", which yours does. If the learning curve after the spikes looks strange, or converges to a different ELBO value than before, then I'd be worried that learning really went off track.

But long story short:

  • I think that particular training run might be fine
  • You can try to fix this by reducing the learning rate (but I admit that doesn't always work)
  • I am trying very hard to fix this for v0.3.0, and I think I'm almost there
  • You might also be able to reduce the value of --total-droplets-included, depending on your dataset, if you're sure you've reached the empty-droplet plateau before droplet 50k
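To illustrate the second suggestion above, a re-run with a lowered learning rate might look like the sketch below. This mirrors the command from the question; the --learning-rate flag and the 5e-5 value are assumptions to adapt to your CellBender version and dataset, not a prescribed fix.

```shell
# Hedged sketch: same run as before, but with an explicitly reduced
# learning rate (flag name and value are assumptions; check
# `cellbender remove-background --help` for your installed version).
cellbender remove-background \
    --input xxx/outs/raw_feature_bc_matrix.h5 \
    --output xxx/outs/cellbender/cb_feature_bc_matrix.h5 \
    --low-count-threshold 5 \
    --expected-cells 900 \
    --total-droplets-included 50000 \
    --fpr 0.01 \
    --epochs 100 \
    --learning-rate 5e-5
```

If the spikes persist, halving the learning rate again is a reasonable next step, at the cost of slower convergence.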

@sjfleming sjfleming added the user question User question about a specific dataset label Jun 2, 2023
@sjfleming
Member

This kind of thing is hopefully much better in v0.3.0

Closed by #238

@sjfleming sjfleming self-assigned this Aug 8, 2023