This project requires an NLP model trained on a poetry dataset, encompassing different languages with a current focus on English and Hindi. The dataset should meet the following constraints:
English short poems.
Hindi short poems.
The datasets must be of high quality, sufficiently large, and diverse.
Ensure that these short poems contain figures of speech.
For longer poems, consider the following contributions:
Provide scripts to preprocess the data and divide lengthy poems into shorter segments.
Provide the processed dataset.
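A preprocessing script for long poems could look roughly like the sketch below. It assumes stanzas are separated by blank lines and that a "short" segment means at most four lines; the function name `split_poem` and the `max_lines` threshold are illustrative choices, not something the project has defined. The splitting is purely line-based, so it should work for both English and Hindi text.

```python
import re


def split_poem(text: str, max_lines: int = 4) -> list[str]:
    """Split a poem into segments of at most `max_lines` lines.

    Assumption: stanzas are separated by one or more blank lines.
    Segments never cross a stanza boundary.
    """
    # Break the poem into stanzas on blank lines.
    stanzas = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]

    segments = []
    for stanza in stanzas:
        # Drop empty lines and surrounding whitespace inside the stanza.
        lines = [ln.strip() for ln in stanza.splitlines() if ln.strip()]
        # Cut long stanzas into consecutive windows of `max_lines` lines.
        for start in range(0, len(lines), max_lines):
            segments.append("\n".join(lines[start:start + max_lines]))
    return segments
```

For example, `split_poem(poem_text, max_lines=2)` would turn a five-line stanza into three segments of two, two, and one lines. The right threshold depends on how much text can be rendered on a generated image, which contributors would need to confirm.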
Note:
It's crucial to collect short poems as the ultimate goal of dyPixa is to render the input text on the generated image.
Utilize publicly available datasets whenever possible.
If you provide poems that are not openly available, credit the authors by adding a field for author names.
Poems may be the intellectual property of their authors; therefore, obtain authors' consent before uploading data to this repository.
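One possible way to carry the author credit with each poem is sketched below. The field names (`text`, `language`, `author`, `source`) and the JSON Lines output are assumptions made for illustration; the issue only asks for an author field and does not prescribe a schema or file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PoemRecord:
    """Hypothetical record layout for one short poem."""
    text: str       # the poem itself
    language: str   # e.g. "en" or "hi"
    author: str     # credit for the poem's author
    source: str     # where the poem was obtained


def to_jsonl_line(record: PoemRecord) -> str:
    """Serialize a record as one JSON Lines row."""
    # ensure_ascii=False keeps Devanagari text human-readable in the file.
    return json.dumps(asdict(record), ensure_ascii=False)
```

Writing one such line per poem keeps the credit attached to the text through any later preprocessing.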
I plan to curate a diverse dataset of poems from online repositories and public domain collections. With a focus on balanced sentiment representation and accurate annotations, I will ensure the dataset's quality and integrity.
That would be nice, @Nabanita29. Please try to collect multilingual data and, as mentioned, treat short poems as the priority.
You may join the community's Discord server for further discussion, queries, and suggestions!
Note: As the milestone "Dataset Collection" is nearing its deadline, pull requests associated with this issue will be considered a priority. 📅