
NLP Course - Hugging Face 🤗

🕸 LinkedIn • 📙 Kaggle • 💻 Medium Blog • 🤗 Hugging Face


This repository contains a shorter version of the NLP Course on Hugging Face. The aim is to keep a few notes from the course, along with the code snippets that seem most important, handy in each section. Think of it as a "cheat sheet" for quickly reviewing the most important concepts in Transformers and the Hugging Face ecosystem. If you've already taken the original course (or a similar one), or have some basic knowledge of Transformers, you'll undoubtedly find it useful as a quick refresher on the various concepts.

The course covers Natural Language Processing (NLP) using libraries from the Hugging Face 🤗 ecosystem (a quick sketch follows the list below):

  • Transformers,
  • Datasets,
  • Tokenizers, and
  • Accelerate — as well as the Hugging Face Hub.
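As a first taste of the ecosystem, here is a minimal sketch of pulling a dataset from the Hub with 🤗 Datasets (the glue/sst2 dataset is just an example choice):

```python
from datasets import load_dataset

# 🤗 Datasets: download a dataset straight from the Hugging Face Hub
# ("glue"/"sst2" is an example id; any Hub dataset works the same way)
dataset = load_dataset("glue", "sst2", split="train")

print(dataset[0])  # a dict like {'sentence': ..., 'label': ..., 'idx': ...}
```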

What is NLP?

NLP is a field of linguistics and Machine Learning focused on understanding everything related to human language.

Common NLP tasks (a couple of which are sketched in code below):
  • Classifying sentences and words: sentiment analysis, email spam detection, identifying grammatical components and named entities...
  • Generating text content: auto-generating text, filling in masked words...
  • Extracting an answer from a text: question answering.
  • Generating a new sentence from an input text: translation, summarization...
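Two of these tasks, sketched with the pipeline() function from 🤗 Transformers (the default checkpoints, and therefore the exact outputs, may vary):

```python
from transformers import pipeline

# Classifying sentences: sentiment analysis with the default checkpoint
classifier = pipeline("sentiment-analysis")
print(classifier("I love this course!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]

# Filling masked words: the mask token depends on the underlying checkpoint
unmasker = pipeline("fill-mask")
print(unmasker("Paris is the <mask> of France.", top_k=1))
```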

NLP also tackles complex challenges in speech recognition and computer vision (audio transcription, image description).

Why is it challenging?

Computers don’t process information the same way humans do. A human can easily understand the meaning of a sentence or judge how similar two sentences are. For machine learning (ML) models, such tasks are harder: the text must be processed in a way that enables the model to learn from it. And because language is complex, we need to think carefully about how this processing should be done. There has been a lot of research on how to represent text, and we will look at some methods in the next chapter.
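A standard first step in that processing is tokenization: turning raw text into numeric IDs a model can consume. A minimal sketch with 🤗 Transformers (bert-base-uncased is just an example checkpoint):

```python
from transformers import AutoTokenizer

# Load the pretrained tokenizer that matches an example checkpoint
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Raw text -> subword tokens -> numeric ids the model can learn from
print(tokenizer.tokenize("NLP is challenging!"))
print(tokenizer("NLP is challenging!")["input_ids"])
```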

About the 🤗 Transformers library

The 🤗 Transformers library was created to provide a single API through which any Transformer model can be loaded, trained, and saved. The library’s main features are:

  • Ease of use: Downloading, loading, and using a state-of-the-art NLP model for inference can be done in just two lines of code (see the sketch after this list).
  • Flexibility: At their core, all models are simple PyTorch nn.Module or TensorFlow tf.keras.Model classes and can be handled like any other model in their respective machine learning (ML) frameworks.
  • Simplicity: Hardly any abstractions are made across the library. "All in one file" is a core concept: a model’s forward pass is entirely defined in a single file, so the code itself is understandable and hackable.
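A hedged sketch of the first two points (the default text-generation checkpoint is an assumption; outputs will vary):

```python
import torch
from transformers import AutoModel, pipeline

# Ease of use: a state-of-the-art model, loaded and running in two lines
generator = pipeline("text-generation")
print(generator("In this course, we will")[0]["generated_text"])

# Flexibility: under the hood, every PyTorch model is a plain nn.Module
model = AutoModel.from_pretrained("bert-base-uncased")
print(isinstance(model, torch.nn.Module))  # True
```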

Repository structure

The repository directories are organized as follows:

1. Transformer models

Title | Description | Article | Notebook
1. Transformers, what can they do? | Look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the pipeline() function. | Article | Open in Colab
2. How do Transformers work? | High-level look at the architecture of Transformer models. | Article | Open in Colab
3. Encoders, Decoders, and Encoder-Decoder models | Learn more about encoder, decoder, and encoder-decoder models. | Article | Open in Colab

🛑 Disclaimer ❌:

This is by no means intended to replace the original course. If you're new to Transformers and Hugging Face, it is best to work through the original first, or at least to come with some basic knowledge of Deep Learning and NLP. As explained above, these notes and snippets are a cheat sheet for refreshing your memory on the most important concepts, not a standalone course.

Original course:

  • License: The original course is released under the permissive Apache 2 license.
  • Citation:

@misc{huggingfacecourse,
  author = {Hugging Face},
  title = {The Hugging Face Course, 2022},
  howpublished = "\url{https://huggingface.co/course}",
  year = {2022},
  note = "[Online; accessed <today>]"
}
