Pull requests: NVIDIA/NeMo-Curator

Add miscellaneous unit tests (label gpuci: Run GPU CI/CD on PR)
#606 opened Mar 23, 2025 by ryantwolf
3 tasks done
Add SDG tests (label gpuci: Run GPU CI/CD on PR)
#605 opened Mar 23, 2025 by ryantwolf
3 tasks done
Add metrics tests (label gpuci: Run GPU CI/CD on PR)
#604 opened Mar 23, 2025 by ryantwolf
3 tasks done
Add image unit tests (label gpuci: Run GPU CI/CD on PR)
#603 opened Mar 22, 2025 by ryantwolf
3 tasks done
Automatically run gpuCI on PR updates (label gpuci: Run GPU CI/CD on PR)
#602 opened Mar 21, 2025 by sarahyurick
Sem Dedup Improvements / Tests
#598 opened Mar 19, 2025 by praateekmahajan Draft
3 tasks
Add more tests to test_dataset (label gpuci: Run GPU CI/CD on PR)
#594 opened Mar 17, 2025 by sarahyurick
LLM-based PII redaction
#585 opened Mar 12, 2025 by sarahyurick Draft
4 of 5 tasks
Nvingest curator tutorial
#584 opened Mar 11, 2025 by ruchaa-apte
Hard Negative Mining
#580 opened Mar 5, 2025 by vinay-raman
3 tasks done
Re-add Common Crawl tests with skips
#579 opened Mar 3, 2025 by sarahyurick
Add Regex Modifier
#568 opened Feb 24, 2025 by shuoyangd
3 tasks done
Add option to skip data by adding a flag instead of removing them
#566 opened Feb 22, 2025 by shuoyangd
1 of 3 tasks
Add a way to pass expected language to FastTextLangId filter
#565 opened Feb 21, 2025 by shuoyangd
2 of 3 tasks
Remove minhash conditional for 25.02
#558 opened Feb 18, 2025 by praateekmahajan
3 tasks
Create FastText classifier module
#546 opened Feb 13, 2025 by sarahyurick Draft
Hard negative mining for Retriever fine-tuning
#523 opened Feb 5, 2025 by vinay-raman
3 tasks done
Added LookUp error handling during encoding detection.
#502 opened Jan 30, 2025 by ggcr
Clean up Pandas, cuDF, Dask, and Dask-cuDF DocumentDataset type logic (label gpuci: Run GPU CI/CD on PR)
#494 opened Jan 23, 2025 by sarahyurick
Standardize text_field and id_field terminology (label gpuci: Run GPU CI/CD on PR)
#485 opened Jan 17, 2025 by sarahyurick
Add nemo-toolkit dependency to gpuCI (label gpuci: Run GPU CI/CD on PR)
#480 opened Jan 10, 2025 by sarahyurick
[pre-commit.ci] pre-commit suggestions
#470 opened Jan 7, 2025 by pre-commit-ci (bot)
[WIP] Add RAPIDS Nightly to GPU CI (label gpuci: Run GPU CI/CD on PR)
#436 opened Dec 17, 2024 by praateekmahajan Draft
3 tasks
Updating the Quick Example
#432 opened Dec 16, 2024 by stsfaroz