Transcripts Generator & Search Engine (TGSE)

An English-language audio-to-text converter and search engine that aims to get grammar, casing and punctuation right. The search engine lets users enter a text query and play an audio file from the exact location where the query occurs. Among other uses, it is a unique tool for podcasting, as it makes searching through audio files feel like searching through text. Currently applied to the Skeptic's Guide to the Universe podcast.

Functionality

Submit for transcription

Get transcripts

Search

Like this project?

Consider becoming a patron at https://www.patreon.com/maciejgierada

Contributions

Contributions are highly welcome! There is still a lot of work to be done!

How to run locally

The TGSE backend is Django-based. To run it locally:

# navigate to path where you will keep the project
cd path_to_install
# clone the repo (if you are planning to contribute, fork the repo and clone it)
git clone https://github.com/mgierada/TGSE.git
# enter the repo's root directory
cd TGSE
# create a virtual environment
python3 -m venv sgu-tse_venv
# activate the environment
source sgu-tse_venv/bin/activate
# upgrade pip
python3 -m pip install --upgrade pip
# install the project dependencies
python3 -m pip install -r requirements.txt
# run local server
python3 manage.py runserver
# open browser at http://127.0.0.1:8000/

REST API

A polished REST API is not the main goal at the moment, but there are a couple of endpoints you can access. More will come later:

| Endpoint | Feature | Method |
| --- | --- | --- |
| `episodes/` | get details of all episodes | `GET` |
| `episodes/<int:episode_number>/` | get details of a given episode | `GET` |
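
As a rough illustration of how the two endpoints above might be called from Python against the local development server, here is a minimal sketch using only the standard library. It assumes the routes are mounted at the site root and that responses are JSON; neither is confirmed here, and the helper names `episodes_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.request import urlopen

# Local development server address from the "run locally" steps.
BASE_URL = "http://127.0.0.1:8000/"


def episodes_url(episode_number=None, base_url=BASE_URL):
    """Build the URL for the episode list, or for a single episode.

    Assumes the endpoints are mounted at the site root.
    """
    root = base_url.rstrip("/") + "/episodes/"
    if episode_number is None:
        return root
    return f"{root}{int(episode_number)}/"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the body as JSON (assumed format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local server running, `fetch_json(episodes_url())` would request all episodes and `fetch_json(episodes_url(1))` a single one. The episode number here is only an example.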

Wish List

better design
set up an event listener to check for new episodes, get details, submit for transcription, get the transcript and populate the DB in an automated fashion
use timestamps to navigate to the exact moment in the audio file matching the query
better transcript quality
improve the search engine by implementing a method to search for an almost exact match
refactoring
documentation
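
Two of the wish-list items above, timestamp navigation and almost-exact matching, could be combined along the following lines. This is only a sketch, not how the project does or will do it: it uses `difflib` from the Python standard library rather than the Haystack-based search, and it assumes the transcript is available as a list of words, each with a `text` and a `start` time in milliseconds. The function names, the sample data and the `threshold` default are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical word-level transcript: text plus start time in milliseconds.
SAMPLE_WORDS = [
    {"text": "Welcome", "start": 0},
    {"text": "to", "start": 400},
    {"text": "the", "start": 550},
    {"text": "Skeptic's", "start": 700},
    {"text": "Guide", "start": 1200},
    {"text": "to", "start": 1500},
    {"text": "the", "start": 1650},
    {"text": "Universe.", "start": 1800},
]


def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(".,!?;:\"'()")


def find_best_match(words, query, threshold=0.8):
    """Return (start_ms, score) for the word window closest to the query.

    Slides a window of len(query words) over the transcript and scores
    each window with difflib's similarity ratio. Returns None when no
    window scores at least `threshold`.
    """
    query_tokens = [normalize(t) for t in query.split()]
    if not query_tokens:
        return None
    target = " ".join(query_tokens)
    size = len(query_tokens)
    best = None
    for i in range(len(words) - size + 1):
        window = " ".join(normalize(w["text"]) for w in words[i:i + size])
        score = SequenceMatcher(None, target, window).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (words[i]["start"], score)
    return best
```

Calling `find_best_match(SAMPLE_WORDS, "skeptics guide")` finds the window starting at "Skeptic's" even though the apostrophe is missing from the query. The returned start time could then be used to seek the audio player. A linear scan like this is fine for a single transcript but would likely need an index to search across many episodes.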

Tech Stack

Python
HTML/CSS
JavaScript
Django
PostgreSQL
Selenium
AssemblyAI
Haystack
Heroku
CI/CD pipelines
