An English-language audio-to-text converter and search engine that gets grammar, casing and punctuation spot on. The search engine lets users define a text-based query and play an audio file from the exact location where the query occurs. Among other uses, it is a unique tool for podcasting, as it makes searching through audio files feel like searching through text. Currently applied to the Skeptic's Guide to the Universe podcast.
Consider becoming a patron on Patreon: https://www.patreon.com/maciejgierada
Contributions are highly welcome! There is still a lot of work to be done!
The TGSE backend is Django-based, so to run it locally:
# navigate to path where you will keep the project
cd path_to_install
# clone the repo (if you are planning to contribute, fork the repo and clone it)
git clone https://github.com/mgierada/TGSE.git
# enter the repo's root directory
cd TGSE
# create a virtual environment
python3 -m venv sgu-tse_venv
# activate the environment
source sgu-tse_venv/bin/activate
# upgrade pip
python3 -m pip install --upgrade pip
# install the sgu-tse dependencies
python3 -m pip install -r requirements.txt
# run local server
python3 manage.py runserver
# open browser at http://127.0.0.1:8000/
A polished REST API is not my main goal at the moment; however, there are a couple of endpoints you can access. More will come later:
endpoint | feature | method |
---|---|---|
episodes/ | get details of all episodes | GET |
episodes/<int:episode_number>/ | get details of a given episode | GET |
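
As a rough illustration of how the two endpoints above could be called from Python, here is a minimal sketch using only the standard library. It assumes the local development server from the setup steps is running at http://127.0.0.1:8000/ and that the endpoints respond with JSON, which this README does not confirm. The helper names `episode_url` and `fetch_json` are hypothetical and not part of the project.

```python
import json
from urllib.request import urlopen

# Assumption: the local dev server started with `manage.py runserver`.
BASE_URL = "http://127.0.0.1:8000/"


def episode_url(episode_number=None, base_url=BASE_URL):
    """Build the URL for all episodes, or for one episode if a number is given."""
    base = base_url.rstrip("/")
    if episode_number is None:
        return f"{base}/episodes/"
    return f"{base}/episodes/{int(episode_number)}/"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns JSON; adjust if it returns HTML.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `fetch_json(episode_url())` would request the list of all episodes and `fetch_json(episode_url(42))` the details of a single one.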
- better design
- set up an event listener to check for new episodes, get details, submit for transcription, get the transcript and populate the DB in an automated fashion
- use timestamps to navigate to the exact moment in the audio file matching the query
- better transcript quality
- improve the search engine by implementing a method to search for an almost exact match
- refactoring
- documentation
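
Two of the items above, timestamp navigation and almost-exact matching, could be combined along the following lines. This is only a sketch under stated assumptions, not the project's implementation: it assumes the transcript is available as a list of words, each with a `text` and a `start` time in milliseconds (a hypothetical data shape), and it uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever fuzzy-matching method is eventually chosen. The function name `find_query_start` and the `min_ratio` threshold are illustrative.

```python
from difflib import SequenceMatcher


def find_query_start(words, query, min_ratio=0.8):
    """Return (start_ms, ratio) for the window of words that best matches
    the query, or None if no window reaches min_ratio.

    `words` is assumed to be a list of dicts with "text" and "start" keys.
    """
    query_tokens = query.lower().split()
    n = len(query_tokens)
    if n == 0 or len(words) < n:
        return None
    target = " ".join(query_tokens)

    best = None
    # Slide a window of n words over the transcript and score each one.
    for i in range(len(words) - n + 1):
        window = " ".join(
            w["text"].lower().strip(".,!?") for w in words[i:i + n]
        )
        ratio = SequenceMatcher(None, target, window).ratio()
        if best is None or ratio > best[1]:
            best = (words[i]["start"], ratio)

    if best is not None and best[1] >= min_ratio:
        return best
    return None
```

The returned start time could then be used to seek the audio player, for example by setting the `currentTime` property of an HTML audio element (in seconds). Scanning every window is linear in the transcript length per query, which is fine for a sketch but would likely need an index for a large archive.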
- Python
- HTML/CSS
- JavaScript
- Django
- PostgreSQL
- Selenium
- AssemblyAI
- Haystack
- Heroku
- CI/CD pipelines