Deep Learning for Audio (DLA)

Lecture and seminar materials for each week are in the ./week* folders; see the README.md there for materials and instructions.
For technical issues, bugs in the course materials, or contribution ideas, open an issue.
The current version of the course is taught in autumn 2024 at the CS Faculty of HSE.

For previous years' versions, see the Past Versions section.

Syllabus

week01 Introduction to Course
- Lecture: Introduction to Course
- Seminar: Experiment tracking, Hydra, Git, VS Code
- Self-Study: Introduction to PyTorch
week02 Introduction to Digital Signal Processing
- Lecture: Signals, Fourier Transform, spectrograms, MelScale, MFCC
- Seminar: DSP in practice, spectrogram creation, IRF, frequency filtering
week03 Speech Recognition I
- Lecture: Metrics, Datasets, Connectionist Temporal Classification (CTC), Classic Models, Beam Search, Language models
- Seminar: Audio Augmentations, Beam Search
- Q&A Session: Homework discussion, R&D coding tips
week04 Speech Recognition II
- Lecture: LAS, RNN-T, Language models for RNN-T and LAS
- Seminar: Hybrid RNN-T and CTC model training and inference
week05 Guest Lecture. Speech Recognition III and Audio SSL
- Lecture: Self-Supervised Models for Audio, Audio LLMs
week06 Source Separation I
- Lecture: A review of general Source Separation and Denoising, Encoder-Decoder-Separator architectures, Demucs family, DCCRN, FullSubNet+, BandSplitRNN
- Seminar: Metrics
week07 Source Separation II
- Lecture: Speech separation, Blind and Target Separation, Recurrent (TasNet, DPRNN, VoiceFilter) and CNN (ConvTasNet, SpEx+) models
- Seminar: WienerFilter, SincFilter and DEMUCS; streaming processing and performance metrics
week08 Audio-Visual Deep Learning
- Lecture: Audio-Visual Fusion, Source Separation, Speech Recognition, and Self-Supervised Models. Wav2Lip and SadTalker (talking face)
- Q&A: Project and Slurm discussion
- Extra Seminar: Create Your Own Intelligent Voice Assistant
week09 Text to Speech (TTS)
- Lecture: Tacotron, DeepVoice, GST, FastSpeech, AdaSpeech, Attention Tricks
- Seminar: postponed
week10 Neural Vocoders
- Lecture: WaveNet, Parallel WaveGAN, WaveGlow, MelGAN, HiFiGAN
- Seminar: FastSpeech I, TTS pipeline: from text to audio
week11 Diffusion-based TTS
- Lecture: Diffusion concept. Diffusion Vocoders and Diffusion acoustic models.
week12 Voice Biometry I
- Lecture: Introduction. Reverberation. Countermeasures (CMs) for recorded and synthesized speech detection (LCNN, RawNet2, AASIST). GNNs
- Seminar: ASVspoof, Sinc-layer, GNN
week13 Voice Biometry II
- Guest Lecture: Kolmogorov-Arnold Networks (KANs), AASIST3, ASVspoof5
- Lecture: ASV systems. SASV systems. Streaming
week14 AI for Music
- Lecture: Tasks overview, Music Information Retrieval, Music Generation

Homeworks and Projects

HW_ASR Training a speech recognition model
Project_AVSS Training an audio-visual speech separation model
HW_NV Implementation of a TTS model (Neural Vocoder)

See our project template.

Resources

Lecture recordings on YouTube (in Russian)

Some of the weeks have English recordings. See the corresponding sub-directories.

Contributors & course staff

Course materials and teaching (in different years) were delivered by:
