Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
Jahanshahi, M. & Mockus, A.
Accepted at the Second International Workshop on Large Language Models for Code (LLM4Code 2025)
Preprint: https://arxiv.org/abs/2501.02628