Create or find a smaller dataset of interview question and answer snippets #48

audreyfeldroy · 2023-10-04T14:56:29Z

The current accompanying Kaggle dataset is a bit large, both to download on a low-bandwidth connection and to annotate.

There are some existing speech recognition datasets, but I haven't seen one of questions and answers. Is there one? Or should we create our own?

- Search online for a free speech recognition dataset containing annotated, transcribed audio samples of questions and answers.
- If one exists, comment here with what you found.
- If one doesn't exist, create a tiny starter dataset consisting of 5 brief audio samples (1-3 minutes?) with accompanying transcription and timestamps. To help with this task, you can use slicer.py and transcript.py.
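For anyone taking the "create our own" route, here is a rough sketch of how a starter clip and its timestamped transcript could be described and sanity-checked before uploading. Everything in it is an assumption for illustration: the `Segment` and `Clip` structures, the `question`/`answer` speaker labels, the file name, and the 60-180 second bounds (taken from the tentative "1-3 minutes?" above). It does not use or reflect the actual interfaces of slicer.py or transcript.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One transcribed span of a clip (hypothetical layout)."""
    start: float   # seconds from the start of the clip
    end: float     # seconds from the start of the clip
    speaker: str   # e.g. "question" or "answer" (assumed labels)
    text: str


@dataclass
class Clip:
    """One audio sample plus its timestamped transcript (hypothetical layout)."""
    audio_file: str
    duration: float  # clip length in seconds
    segments: List[Segment] = field(default_factory=list)


def validate_clip(clip, min_len=60.0, max_len=180.0):
    """Return a list of human-readable problems; an empty list means the clip looks fine."""
    problems = []
    if not (min_len <= clip.duration <= max_len):
        problems.append(
            f"{clip.audio_file}: duration {clip.duration}s is outside {min_len}-{max_len}s"
        )
    previous_end = 0.0
    for i, seg in enumerate(clip.segments):
        if seg.start < previous_end:
            problems.append(f"{clip.audio_file}: segment {i} overlaps the previous one")
        if seg.end <= seg.start:
            problems.append(f"{clip.audio_file}: segment {i} ends before it starts")
        if seg.end > clip.duration:
            problems.append(f"{clip.audio_file}: segment {i} runs past the end of the clip")
        if not seg.text.strip():
            problems.append(f"{clip.audio_file}: segment {i} has an empty transcript")
        previous_end = max(previous_end, seg.end)
    return problems


# Example: a made-up 90-second clip with one question and one answer.
example = Clip(
    audio_file="sample_01.wav",  # hypothetical file name
    duration=90.0,
    segments=[
        Segment(0.0, 12.5, "question", "Placeholder question text."),
        Segment(13.0, 88.0, "answer", "Placeholder answer text."),
    ],
)
print(validate_clip(example))
```

Running a check like this over all 5 samples before publishing would catch overlapping or out-of-range timestamps early. The exact fields can change once we settle on a format.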

Kaggle is one possible place to search for or host our own dataset. Open to other options too.

audreyfeldroy added the help wanted, hacktoberfest-accepted, and high priority labels on Oct 4, 2023
