Dataset Release Timeline #1

Open
iNeil77 opened this issue Jan 29, 2025 · 9 comments

Comments

@iNeil77

iNeil77 commented Jan 29, 2025

Hello to the authors!

I was reading the ProSec paper and was excited by the direct use of CWE descriptions to induce vulnerabilities in the LLM generations. I would like to ask about the authors' timelines for releasing the secure-vulnerable code pairs data and the synthesized instructions as mentioned in the paper.

Many Thanks

@XZ-X
Member

XZ-X commented Jan 29, 2025

Thank you for your interest! We are currently working on an updated version of the dataset.

We plan to release the updated version by next week.

Please feel free to let us know if you have further questions!

@iNeil77
Author

iNeil77 commented Jan 31, 2025

Thanks for getting back. I eagerly await the data release!

@XZ-X
Member

XZ-X commented Feb 7, 2025

Thank you again for your interest!

We have released the first version of our vulnerability-inducing instruction dataset on Hugging Face 🤗.

We plan to release the model-specific code pairs and the aligned models within the next one or two weeks.

Please let us know if you have any questions!

@iNeil77
Author

iNeil77 commented Feb 11, 2025

Thanks for getting back! I eagerly await the code pairs dataset.

@Robin-Pwner

Hello to the authors!
Is there any update about the timelines for releasing the model-specific code pairs?
And could you provide the synthesized instructions first so that we could generate the model-specific code pairs ourselves?
Many Thanks

@XZ-X
Member

XZ-X commented Mar 5, 2025

> Hello to the authors! Is there any update about the timelines for releasing the model-specific code pairs? And could you provide the synthesized instructions first so that we could generate the model-specific code pairs ourselves? Many Thanks

Hello Robin, thank you for your interest!

Sorry for the late response. We will release the model-specific code pairs today.

We have already released the synthesized instructions on Hugging Face 🤗. Please feel free to give them a try!

Let us know if you have further questions!

@XZ-X
Member

XZ-X commented Mar 7, 2025

We have updated the model-specific code pairs for phi3-mini-4k-inst and codellama-7b-inst.

Feel free to ping us if you have further questions!

@Robin-Pwner

Thanks a lot!

@Robin-Pwner

Hi authors,
I am working on reproducing ProSec with the released code pairs, and I have a question about the test dataset.
In the paper, you select 38 ⟨language, CWE⟩ pairs from PurpleLlama that overlap with SafeCoder as the test dataset. Could you provide a detailed list of the 38 selected ⟨language, CWE⟩ pairs?
Many thanks.
