markovka17/dla

Deep Learning for Audio (DLA)

  • Lecture and seminar materials for each week are in the ./week* folders; see the README.md in each folder for materials and instructions.
  • For technical issues, bugs in course materials, or contribution ideas, open an issue.
  • The current version of the course is taught in autumn 2024 at the CS Faculty of HSE.

For versions from previous years, see the Past Versions section.

Syllabus

  • week01 Introduction to Course

    • Lecture: Introduction to Course
    • Seminar: Experiment tracking, Hydra, Git, VS Code
    • Self-Study: Introduction to PyTorch
  • week02 Introduction to Digital Signal Processing

    • Lecture: Signals, Fourier Transform, spectrograms, MelScale, MFCC
    • Seminar: DSP in practice, spectrogram creation, IRF, frequency filtering
  • week03 Speech Recognition I

    • Lecture: Metrics, Datasets, Connectionist Temporal Classification (CTC), Classic Models, Beam Search, Language models
    • Seminar: Audio Augmentations, Beam Search
    • Q&A Session: Homework discussion, R&D coding tips
  • week04 Speech Recognition II

    • Lecture: LAS, RNN-T, Language models for RNN-T and LAS
    • Seminar: Hybrid RNN-T and CTC model training and inference
  • week05 Guest Lecture. Speech Recognition III and Audio SSL

    • Lecture: Self-Supervised Models for Audio, Audio LLMs
  • week06 Source Separation I

    • Lecture: A review of general Source Separation and Denoising, Encoder-Decoder-Separator architectures, Demucs family, DCCRN, FullSubNet+, BandSplitRNN
    • Seminar: Metrics
  • week07 Source Separation II

    • Lecture: Speech separation, Blind and Target Separation, Recurrent (TasNet, DPRNN, VoiceFilter) and CNN (ConvTasNet, SpEx+)
    • Seminar: WienerFilter, SincFilter and Demucs; streaming processing and performance metrics
  • week08 Audio-Visual Deep Learning

    • Lecture: Audio-Visual Fusion, Source Separation, Speech Recognition, and Self-Supervised Models. Wav2Lip and SadTalker (talking face)
    • Q&A: Project and Slurm discussion
    • Extra Seminar: Create Your Own Intelligent Voice Assistant
  • week09 Text to Speech (TTS)

    • Lecture: Tacotron, DeepVoice, GST, FastSpeech, AdaSpeech, Attention Tricks
    • Seminar: postponed
  • week10 Neural Vocoders

    • Lecture: WaveNet, Parallel WaveGAN, WaveGlow, MelGAN, HiFiGAN
    • Seminar: FastSpeech I, TTS pipeline: from text to audio
  • week11 Diffusion-based TTS

    • Lecture: Diffusion concept. Diffusion Vocoders and Diffusion acoustic models.
  • week12 Voice Biometry I

    • Lecture: Introduction. Reverberation. CMs for recorded and synthesized speech detection (LCNN, RawNet2, AASIST). GNNs
    • Seminar: ASVspoof, Sinc-layer, GNN
  • week13 Voice Biometry II

    • Guest Lecture: Kolmogorov-Arnold Networks (KANs), AASIST3, ASVspoof5
    • Lecture: ASV systems. SASV systems. Streaming
  • week14 AI for Music

    • Lecture: Tasks overview, Music Information Retrieval, Music Generation

Homeworks and Projects

  • HW_ASR Training a speech recognition model
  • Project_AVSS Training an audio-visual speech separation model
  • HW_NV Implementation of a TTS model (Neural Vocoder)

See our project template.

Resources

Some of the weeks have English recordings. See the corresponding sub-directories.

Contributors & course staff

Course materials and teaching (in different years) were delivered by:

Past Versions