[Dataset] Poetry dataset is required #26

ravi-prakash1907 · 2023-10-02T13:12:04Z

Description: 📝

This project requires an NLP model trained on a multilingual poetry dataset, with a current focus on English and Hindi. The dataset should meet the following constraints:

English short poems.
Hindi short poems.
The datasets must be of high quality, sufficiently large, and diverse.
Ensure that these short poems contain figures of speech.

For longer poems, consider the following contributions:

Provide scripts to preprocess the data and divide lengthy poems into shorter segments.
Provide the processed dataset.
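
As a starting point for the preprocessing script, here is a minimal sketch of one way to divide a long poem into shorter segments. It assumes stanzas are separated by blank lines and that a "short" segment is at most `max_lines` lines; the function name `split_poem` and the default of 4 lines are illustrative assumptions, not project requirements.

```python
import re


def split_poem(text: str, max_lines: int = 4) -> list[str]:
    """Split a poem into segments of at most `max_lines` lines each.

    Assumption: stanzas are separated by one or more blank lines.
    Segments never cross a stanza boundary.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")

    segments = []
    for stanza in re.split(r"\n\s*\n", text.strip()):
        # Drop empty lines and surrounding whitespace within the stanza.
        lines = [ln.strip() for ln in stanza.splitlines() if ln.strip()]
        # Chunk the stanza into consecutive groups of max_lines lines.
        for start in range(0, len(lines), max_lines):
            segments.append("\n".join(lines[start:start + max_lines]))
    return segments
```

Because the function works on Unicode strings, the same code applies to both English and Hindi (Devanagari) text. Splitting by line count is only one option; a character or word limit may suit the image rendering better.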

Note:

It's crucial to collect short poems as the ultimate goal of dyPixa is to render the input text on the generated image.
Utilize publicly available datasets whenever possible.
If providing poems that are not openly available for analysis, ensure proper credit is given to the authors by adding a field for author names.
Poems may be the intellectual property of their authors; therefore, obtain authors' consent before uploading data to this repository.
For further dataset requirements, refer to Dataset: Collection and Description #6.
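
One possible shape for a dataset record that carries the author credit mentioned above is sketched here. The field names (`text`, `language`, `author`, `source`) and the JSON Lines output are assumptions for illustration; the actual schema should follow whatever #6 specifies.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass
class PoemRecord:
    """Hypothetical record layout; field names are illustrative only."""
    text: str
    language: str                 # e.g. "en" or "hi"
    author: Optional[str] = None  # credit for poems that are not openly available
    source: Optional[str] = None  # where the poem was obtained


def to_jsonl(records: Iterable[PoemRecord]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    # ensure_ascii=False keeps Devanagari text readable in the output file.
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)
```

A record could then be created as `PoemRecord(text="...", language="hi", author="...")` and written out with `to_jsonl`.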

Nabanita29 · 2023-10-19T05:57:12Z

I plan to curate a diverse dataset of poems from online repositories and public domain collections. With a focus on balanced sentiment representation and accurate annotations, I will ensure the dataset's quality and integrity.

ravi-prakash1907 · 2023-10-19T07:05:40Z

That will be nice, @Nabanita29. Please try to collect multilingual data and, as mentioned, treat short poems as the priority.
You may join the community's Discord server for further discussion, queries, and suggestions!

Note: As the milestone "Dataset Collection" is nearing its deadline, pull requests associated with this issue will be considered a priority. 📅

ravi-prakash1907 assigned ravi-prakash1907 and Team-thedatatribune and unassigned ravi-prakash1907 and Team-thedatatribune Oct 2, 2023

ravi-prakash1907 mentioned this issue Oct 2, 2023

[NLP] Create an NLP Model #27

Open

ravi-prakash1907 added the good first issue, help wanted, and hacktoberfest labels Oct 2, 2023

ravi-prakash1907 mentioned this issue Oct 2, 2023

ML Model #4

Open

ravi-prakash1907 added the hacktoberfest-accepted label Oct 4, 2023

ravi-prakash1907 added this to the Dataset Collection milestone Oct 16, 2023

ravi-prakash1907 pinned this issue Oct 16, 2023

ravi-prakash1907 assigned Nabanita29 Oct 19, 2023

ravi-prakash1907 removed hacktoberfest labels Nov 3, 2023
