Dataset: Collection and Description #6

Team-thedatatribune · 2021-10-11T19:25:51Z

Dataset Requirements 📦📋

TL;DR 🥱

This issue is a great starting point for beginners in the open-source community. Here you can:

share authentic data
contribute new APIs for collecting it
provide original scripts (in any language you prefer) for data collection and/or preparation, such as cleaning

Issue Description:

For the dyPixa project, this task is about gathering and thoroughly documenting the datasets used to train and test the machine learning models. This issue covers the following key aspects:

Data Collection Scripts: Define a systematic approach, with code (Python preferred), for sourcing diverse datasets. This may include text data from social media, product reviews, and news articles, as well as sentiment-labeled images from public image repositories.
Dataset Documentation: Document each dataset's specifics (source, size, language distribution, and preprocessing) in Documentation #7 once it is uploaded, or raise any concerns about existing or required datasets. Refer to Issue Documentation #7 for detailed documentation guidelines.
Data Quality Assurance: Ensure dataset integrity and consistency, and confirm that the data is taken from authentic sources.
Multilingual Considerations: Explore strategies for multilingual datasets.
Collaboration with Contributors: Engage contributors in dataset sourcing and curation.
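
The documentation and quality-assurance points above could start from a small summary script. The following is only a minimal sketch: the record layout (`text`, `source`, `lang`, `label`) and the function name `summarize_dataset` are assumptions made for illustration, not a format the project has agreed on.

```python
from collections import Counter


def summarize_dataset(records):
    """Return basic documentation stats for a list of record dicts.

    Each record is assumed (hypothetically) to look like:
    {"text": str, "source": str, "lang": str, "label": str}
    """
    texts = [r["text"].strip() for r in records]
    return {
        # Total number of records (dataset size).
        "size": len(records),
        # How many records came from each source.
        "sources": dict(Counter(r["source"] for r in records)),
        # Language distribution, keyed by language code.
        "languages": dict(Counter(r["lang"] for r in records)),
        # Exact duplicate texts, a simple consistency signal.
        "duplicates": len(texts) - len(set(texts)),
        # Records whose text is empty after trimming whitespace.
        "empty_texts": sum(1 for t in texts if not t),
    }


if __name__ == "__main__":
    sample = [
        {"text": "Great phone", "source": "reviews", "lang": "en", "label": "joy"},
        {"text": "Great phone", "source": "reviews", "lang": "en", "label": "joy"},
        {"text": "बहुत अच्छा", "source": "social", "lang": "hi", "label": "joy"},
    ]
    print(summarize_dataset(sample))
```

The numbers it reports (size, sources, language distribution) map onto the fields the documentation item asks for. It does not check whether a source is authentic, which still needs manual review.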

Types of Data Needed:

For the NLP and color suggestion models to be highly usable and effective, the following types of data should be considered:

Text Data:
- Social media posts
- Product reviews
- News articles
- Sentiment-labeled text in English and Hindi
- Multilingual text data to enhance language support
Image Data:
- Images with associated sentiment labels
- Diverse images representing a wide range of emotions
- Abstract images showcasing various color combinations

By addressing these components and collecting the appropriate types of data, this issue will lay the foundation for robust machine learning model development and further enhancements in the dyPixa project. Your contributions here will greatly advance the project's capabilities. 🚀🌈

dharmraj617 · 2023-10-12T13:16:46Z

Hey, I am currently working on ML applications. I have some experience in Data Collection. Please Assign this issue to me.

Addy0000 · 2023-10-15T08:37:52Z

heya, i'd like to work on writing python scripts for collecting data.

Team-thedatatribune · 2023-10-16T02:43:25Z

> Hey, I am currently working on ML applications. I have some experience in Data Collection. Please Assign this issue to me.

@dharmraj617, we require a diverse dataset of poetic content gathered from various platforms, including:

Social media platforms such as Twitter.
News editorial sections.
Haiku poetry, and more.

Your assistance in creating this dataset would be greatly appreciated, with the following key considerations in mind:

Each data point (in this case, a poem) should be concise, consisting of no more than 3-4 lines.
We are primarily focused on English poems.
Ensure proper data cleaning, such as removing emojis and extraneous characters.
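
The length and cleaning considerations above can be sketched with the standard library alone. This is a rough starting point under stated assumptions, not the project's cleaning pipeline: the names `clean_line`, `clean_poem`, and `MAX_LINES` are hypothetical, "extraneous characters" is interpreted here as symbol and control characters, and no check for English is attempted.

```python
import re
import unicodedata

MAX_LINES = 4  # upper bound taken from the "3-4 lines" guideline (assumption)


def clean_line(line):
    """Drop emoji, other symbols, and control characters; collapse whitespace."""
    kept = []
    for ch in line:
        if ch.isspace():
            # Normalise tabs and other whitespace to a plain space.
            kept.append(" ")
            continue
        if 0xFE00 <= ord(ch) <= 0xFE0F:
            # Variation selectors that often trail emoji.
            continue
        category = unicodedata.category(ch)
        # "So" covers most emoji and pictographs, "Sk" covers emoji
        # skin-tone modifiers, and "C*" covers control/format characters
        # such as the zero-width joiner.
        if category in {"So", "Sk"} or category.startswith("C"):
            continue
        kept.append(ch)
    return re.sub(r" +", " ", "".join(kept)).strip()


def clean_poem(text, max_lines=MAX_LINES):
    """Return the cleaned poem, or None if it is empty or too long."""
    lines = [clean_line(raw) for raw in text.splitlines()]
    lines = [line for line in lines if line]  # drop blank lines
    if not lines or len(lines) > max_lines:
        return None
    return "\n".join(lines)


if __name__ == "__main__":
    raw = "An old silent pond 🌊\nA frog jumps in ✨\n\nSplash!  Silence again 🙌"
    print(clean_poem(raw))
```

Poems longer than `max_lines` are rejected rather than truncated, so a long poem is never silently cut into a different one.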

For further discussion and information, please join the dyPixa Discord server. We look forward to your valuable contributions! 🙌

ravi-prakash1907 · 2023-10-16T05:17:45Z

> heya, i'd like to work on writing python scripts for collecting data.

@Addy0000, we currently have a program (here) that's been trained on go_emotions, capable of classifying any given (English) text into one of 28 different emotions.

Now, we're on an exciting new mission. We need a dataset to generate and recommend colors for each of these sentiments. It would be fantastic if you could contribute by providing:

Images/Thumbnails corresponding to each of the 28 emotions.
The finest color sets corresponding to each emotion (at least 5 for each emotion).
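
A contribution of color sets could be checked before submission with something like the sketch below. The data layout (a dict mapping each emotion to a list of palettes, each palette a list of `#RRGGBB` strings) and the name `validate_color_sets` are assumptions for illustration only; the actual format should follow whatever the maintainers specify.

```python
import re

# Six-digit hex colors such as "#FFD700" (assumed format).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")
MIN_SETS_PER_EMOTION = 5  # "at least 5 for each emotion"


def validate_color_sets(color_sets, emotions, min_sets=MIN_SETS_PER_EMOTION):
    """Return a list of human-readable problems; an empty list means valid.

    color_sets: dict mapping emotion -> list of palettes,
                where each palette is a list of hex color strings.
    emotions:   the emotion labels the dataset is expected to cover.
    """
    problems = []
    for emotion in emotions:
        palettes = color_sets.get(emotion, [])
        if len(palettes) < min_sets:
            problems.append(
                f"{emotion}: {len(palettes)} color sets, need at least {min_sets}"
            )
        for index, palette in enumerate(palettes):
            bad = [c for c in palette if not HEX_COLOR.fullmatch(c)]
            if bad:
                problems.append(f"{emotion}: set {index} has invalid colors {bad}")
    # Flag keys that are not among the expected emotion labels.
    for emotion in sorted(set(color_sets) - set(emotions)):
        problems.append(f"{emotion}: not in the expected emotion list")
    return problems


if __name__ == "__main__":
    palette = ["#FFD700", "#FFA500", "#FF69B4"]
    sample = {"joy": [palette] * 5, "anger": [["#8B0000", "red"]]}
    for problem in validate_color_sets(sample, ["joy", "anger"]):
        print(problem)
```

The `emotions` argument is passed in rather than hard-coded so the check can be run against the full list of 28 labels from the model card.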

For a detailed description, I recommend visiting issue #58.

You can find the complete list of all 28 emotions at https://huggingface.co/SamLowe/roberta-base-go_emotions. 🎨

I'll assign you the issue if you're interested.

Addy0000 · 2023-10-16T12:40:13Z

@ravi-prakash1907 i went through it, would like to work on it

Team-thedatatribune added the enhancement and good first issue labels Oct 11, 2021

ravi-prakash1907 added the hacktoberfest label Sep 26, 2023

ravi-prakash1907 changed the title from "Define the Dataset" to "Dataset: Collection and Description" Sep 29, 2023

ravi-prakash1907 mentioned this issue Sep 29, 2023

ML Model #4

Open

ravi-prakash1907 pinned this issue Sep 29, 2023

This was referenced Oct 2, 2023

[Dataset] Poetry dataset is required #26

Open

[NLP] Create an NLP Model #27

Open

[Dataset] Color Extraction using Python #40

Closed

ravi-prakash1907 unpinned this issue Oct 4, 2023

ravi-prakash1907 added the hacktoberfest-accepted label Oct 4, 2023

ravi-prakash1907 assigned dharmraj617 Oct 16, 2023

ravi-prakash1907 removed hacktoberfest labels Nov 3, 2023
