NLP with Python for Machine Learning Essential Training

Course details

With the increased amount of data publicly available and the increased focus on unstructured text data, understanding how to clean, process, and analyze that text data is tremendously valuable. If you have some experience with Python and an interest in natural language processing (NLP), this course can provide you with the knowledge you need to tackle complex problems using machine learning. Instructor Derek Jedamski provides a quick summary of basic NLP concepts, covers advanced data cleaning and vectorization techniques, and then takes a deep dive into building machine learning classifiers. During this last step, Derek shows how to build two different types of machine learning models, as well as how to evaluate and test variations of those models.

Learning objectives

Explain the definition of NLP.
Describe the process of tokenizing.
Identify the purpose of vectorizing.
Recognize the outcomes of lemmatizing.
Summarize the characteristics of TF-IDF.
Define accuracy in terms of evaluation metrics.
Recall three benefits of using ensemble methods.
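
To make the tokenizing, vectorizing, and TF-IDF objectives concrete, here is a minimal sketch using only the Python standard library. It is not taken from the course notebooks: the toy corpus, the whitespace tokenizer, and the helper names `count_vectors` and `tfidf_vectors` are assumptions for illustration, and the weighting is the unsmoothed textbook form `count * log(N / df)`. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the course itself works with an SMS dataset.
docs = [
    "free prize call now",
    "call me later",
    "see you later",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def count_vectors(documents):
    """Return the sorted vocabulary and one term-count row per document."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def tfidf_vectors(documents):
    """Weight each count by log(N / df), the unsmoothed textbook IDF."""
    vocab, rows = count_vectors(documents)
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[row[j] * idf[j] for j in range(len(vocab))] for row in rows]
    return vocab, weighted

vocab, counts = count_vectors(docs)
_, weights = tfidf_vectors(docs)
```

In this sketch, "free" appears in one document and "call" in two, so "free" receives the larger weight in the first row: rarer terms are treated as more informative.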

Chapters of the course

Introduction
- Welcome
- What you should know
- What tools do you need?
- Using the exercise files
NLP Basics
- What are NLP and NLTK?
- NLTK setup and overview
- Reading in text data
- Exploring the dataset
- What are regular expressions?
- Learning how to use regular expressions
- Regular expression replacements
- Machine learning pipeline
- Implementation: Removing punctuation
- Implementation: Tokenization
- Implementation: Removing stop words
- Chapter quiz
Supplemental Data Cleaning
- Introducing stemming
- Using stemming
- Introducing lemmatizing
- Using lemmatizing
- Chapter quiz
Vectorizing Raw Data
- Introducing vectorizing
- Count vectorization
- N-gram vectorizing
- Inverse document frequency weighting
- Chapter quiz
Feature Engineering
- Introducing feature engineering
- Feature creation
- Feature evaluation
- Identifying features for transformation
- Box-Cox power transformation
- Chapter quiz
Building Machine Learning Classifiers
- What is machine learning?
- Cross-validation and evaluation metrics
- Introducing random forest
- Building a random forest model
- Random forest with holdout test set
- Random forest model with grid search
- Evaluate random forest model performance
- Introducing gradient boosting
- Gradient-boosting grid search
- Evaluate gradient-boosting model performance
- Model selection: Data prep
- Model selection: Results
- Chapter quiz
Conclusion
- Next steps
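
The three implementation steps listed under NLP Basics (removing punctuation, tokenization, removing stop words) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the exercise files: it uses only the standard library, and `STOP_WORDS` is a hypothetical, very short list standing in for a full stop-word list such as the one NLTK provides.

```python
import re
import string

# Hypothetical, tiny stop-word list for illustration only; a real list
# (such as the one shipped with NLTK) is much longer.
STOP_WORDS = {"i", "a", "the", "to", "is", "you", "have", "and"}

def remove_punctuation(text):
    """Drop every character listed in string.punctuation."""
    return "".join(ch for ch in text if ch not in string.punctuation)

def tokenize(text):
    """Lowercase, split on runs of non-word characters, drop empty tokens."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop-word list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]

def clean_text(text):
    """Apply the three cleaning steps in order."""
    return remove_stop_words(tokenize(remove_punctuation(text)))

example = "Congrats!!! You have WON a free prize, call now."
cleaned = clean_text(example)
print(cleaned)
# → ['congrats', 'won', 'free', 'prize', 'call', 'now']
```

A function shaped like `clean_text` is convenient because it can be applied to each message before vectorizing, so the same cleaning is used for training and test data.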

Notes about the exercise files:

In some of the notebooks, the code has been changed slightly from the original exercise files. This is due to a mismatch between the scikit-learn version actually used and the version used in the original exercise files (the course was released on Mar. 23, 2018, so the scikit-learn version used back then is now outdated).
The SMS dataset is the same throughout the course, but each chapter folder has its own copy to make the dataset easier to use.

ajgquional/LiL_NLP-with-Python-for-ML-Essential-Training
