Understanding [expected cells] parameter and [whitelist] parameter #155

Open
jhjlee opened this issue May 25, 2021 · 2 comments

@jhjlee

jhjlee commented May 25, 2021

Hello,

Thanks for making this tool. I have a quick question about the parameters involving cells.
Users can input the number of cells expected in the run, and/or the list of cell barcodes CITE-seq-Count should look for.
(1) For the number of cells, should this be set to the number of filtered cells (from cellranger) obtained in the partner scRNA-seq data? If I have 5,000 cells in the scRNA-seq data, should I use 5,000, or would it be better to use a larger number, say 10,000? Are cells (and their associated hashes) potentially lost if I use a smaller number?
(2) For the whitelist, I am using the filtered cell barcodes from the partner scRNA-seq data. I understand that CITE-seq-Count will correct other barcodes based on this list, but am I potentially incurring false positives if I force the analysis to output results only for this list?
(3) Putting these together, would the optimal approach be to set a larger number of expected cells than the filtered cells in the partner scRNA-seq data AND supply a whitelist made up of cell barcodes from the scRNA-seq data? Thank you!
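To make question (2) concrete, here is a minimal sketch of how whitelist-based barcode correction can work. It is a hypothetical illustration, not CITE-seq-Count's actual implementation: it assumes a simple Hamming-distance rule allowing at most one mismatch (`max_dist=1`), and the helper names `hamming`, `correct_barcode` and `count_by_cell` are made up for this example. Under this rule, a barcode that is not on the list is assigned to a cell only if it lies within the allowed distance of exactly one listed barcode; ambiguous or distant barcodes are dropped.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(bc: str, whitelist: Set[str], max_dist: int = 1) -> Optional[str]:
    """Map a read's cell barcode onto the whitelist (hypothetical rule).

    Returns the barcode itself if it is whitelisted, the single whitelist
    entry within max_dist mismatches if exactly one exists, and None
    otherwise (no close entry, or an ambiguous match to several entries).
    """
    if bc in whitelist:
        return bc
    hits = [w for w in whitelist if len(w) == len(bc) and hamming(bc, w) <= max_dist]
    return hits[0] if len(hits) == 1 else None


def count_by_cell(read_barcodes: Iterable[str], whitelist: Set[str], max_dist: int = 1) -> Counter:
    """Count reads per whitelisted cell, discarding reads that cannot be assigned."""
    counts: Counter = Counter()
    for bc in read_barcodes:
        cell = correct_barcode(bc, whitelist, max_dist)
        if cell is not None:
            counts[cell] += 1
    return counts


if __name__ == "__main__":
    wl = {"AAACCTGA", "AAACCTGC", "TTTGGGCC"}
    print(correct_barcode("TTTGGGCA", wl))  # one mismatch from one entry -> TTTGGGCC
    print(correct_barcode("AAACCTGG", wl))  # one mismatch from two entries -> None
    print(correct_barcode("GGGGGGGG", wl))  # far from every entry -> None
```

In this sketch, reads can only be attributed to cells that are on the list, so a "false positive" would be a read whose barcode happens to sit one mismatch away from exactly one listed barcode. Whether the real tool uses the same distance and tie-breaking rule should be checked against its documentation.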

@sunta3iouxos

Hi there,
Have you found answers to these? I am wondering about the same thing.
Regards

@Hoohm
Owner

Hoohm commented Nov 21, 2021

Hello @jhjlee, here are a few answers to your questions:

  1. I would recommend using about 20% more cells than expected, so for 5k cells I would use about 6 or 7k. This will allow you to catch a few more reads. I don't expect this to change your results much, though. You should always try running with both values and compare the results.
  2. Using the whitelist is the safest way to ensure that you only grab the cells you expect. It will only capture and report cells whose barcodes are in your list, either as an exact match or after being corrected to a barcode in your list.
  3. For data analysis and downstream processing, I would recommend using the whitelist derived from your scRNA-seq data. For troubleshooting, I would recommend using expected cells to get everything out of your data and then comparing the result to the whitelist.
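The arithmetic in point 1 and the comparison suggested in point 3 can be sketched in Python. This assumes you have already loaded the cell barcodes reported by each run (one run using an expected-cells value, one using the whitelist) into Python sets; reading them from CITE-seq-Count's output files is not shown, and the function names `padded_expected_cells` and `compare_runs` are hypothetical.

```python
from typing import Dict, Set


def padded_expected_cells(n_filtered: int, pct_margin: int = 20) -> int:
    """Expected-cell count to request: filtered cells plus a percentage margin.

    Uses integer ceiling division so the result is rounded up without
    floating-point surprises.
    """
    return (n_filtered * (100 + pct_margin) + 99) // 100


def compare_runs(expected_cells_run: Set[str], whitelist_run: Set[str]) -> Dict[str, int]:
    """Summarise how the barcodes reported by the two runs overlap."""
    return {
        "shared": len(expected_cells_run & whitelist_run),
        "only_expected_cells": len(expected_cells_run - whitelist_run),
        "only_whitelist": len(whitelist_run - expected_cells_run),
    }


if __name__ == "__main__":
    print(padded_expected_cells(5000))      # 20% margin -> 6000
    print(padded_expected_cells(5000, 40))  # 40% margin -> 7000
    print(compare_runs({"A", "B", "C"}, {"B", "C", "D"}))
```

A large `only_expected_cells` count would point to barcodes the expected-cells run picked up that are absent from the scRNA-seq whitelist, which is the kind of difference the troubleshooting comparison is meant to surface.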

I hope this helps. Let me know if you have other questions.
