From Zero to Natural Language Processing Hero

Workshop materials for my "From Zero to Natural Language Processing Hero" workshop at J on the Beach 2022.

Ever wondered how your email client identifies spam messages? How do news applications classify articles into topics? Or how does Google Translate detect the language of a page? In this session we will explore the theoretical foundations of natural language processing. Then, step by step, we will build a machine learning solution that automatically classifies text messages.

Preparing your machine

Install Miniforge

Pick the right installer for your machine from this link

Alternatively, you can install Anaconda or Miniconda if you prefer either of them to Miniforge.

Initialize Conda

Run the following command on your terminal (shell, command line):

conda init

Download this repo

Download the code used in this workshop from this repo using the following command:

git clone git@github.com:gr33ndata/jonthebeach.git

If you get an error (for example, because no SSH key is set up for your GitHub account), clone over HTTPS instead:

git clone https://github.com/gr33ndata/jonthebeach.git

Create a new environment

First, go to the jonthebeach directory:

cd jonthebeach

Let's now create a new environment. This is good practice, because it keeps the workshop's packages separate from the rest of your system:

conda create --name jotb python=3.9

Let's now activate the environment we have just created:

conda activate jotb
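If you want to confirm that the activation worked, the following is a minimal sketch of a check you could run with python. It assumes the environment name (jotb) and Python version (3.9) from the conda create command above, and it relies on conda setting the CONDA_DEFAULT_ENV variable for the active environment. The function name check_environment is hypothetical and not part of this repo.

```python
import os
import sys

# Assumption: both values are taken from the `conda create` command above.
EXPECTED_ENV = "jotb"
EXPECTED_PYTHON = (3, 9)


def check_environment(environ=None, version_info=None):
    """Return a list of human-readable problems; an empty list means all is well."""
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []

    # conda exports CONDA_DEFAULT_ENV with the name of the active environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(
            f"active conda environment is {active!r}, expected {EXPECTED_ENV!r}"
        )

    # Compare only major and minor version, ignoring the patch level.
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python is {found}, expected 3.9")

    return problems
```

Calling check_environment() with no arguments inspects the current shell session. The two parameters exist only so the check can be tried against made-up values.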

Install the needed requirements:

pip install --upgrade -r requirements.txt
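To see whether the installation succeeded, you can ask Python which packages it can find. The sketch below is illustrative only: the names in ASSUMED_PACKAGES are guesses based on the commands in this README (spaCy and Jupyter Notebook), not a copy of requirements.txt, and missing_packages is a hypothetical helper.

```python
import importlib.util

# Assumption: illustrative import names only. spaCy and Jupyter Notebook are
# used by commands in this README; see requirements.txt for the real list.
ASSUMED_PACKAGES = ["spacy", "notebook"]


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerunning the pip command inside the activated environment is the first thing to try.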

Optional download

The next command downloads a large spaCy language model. Skip it if your connection is too slow:

python -m spacy download en_core_web_lg
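spaCy installs a downloaded model as a regular Python package, so you can check whether the download went through without loading the model. The sketch below picks the first model that is installed. Treating en_core_web_sm as a fallback is an assumption made for illustration; the workshop notebooks may rely on the large model, and first_installed is a hypothetical helper.

```python
import importlib.util


def first_installed(candidates):
    """Return the first candidate that is importable as a package, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Assumption: "en_core_web_sm" is a hypothetical fallback, not something
# this README asks you to install.
model_name = first_installed(["en_core_web_lg", "en_core_web_sm"])
print(model_name or "no spaCy English model installed")
```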

Start notebook

jupyter notebook

Resources

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Join Adyen

Adyen payment platform

Adyen is always looking for clever people like you.

Check our open positions here

About me

Tarek Amr is a Senior Machine Learning Scientist at Adyen. Before Adyen, Tarek worked for marketplaces such as Catawiki and TicketSwap, where he built end-to-end data solutions and predictive models. Tarek has published a couple of books on machine learning and data visualization. His most recent book is “Hands-On Machine Learning with Scikit-Learn and Scientific Python Toolkits”. Tarek holds an MSc in Computer Science from the University of East Anglia, with a focus on algorithms, information retrieval and natural language processing.

Tarek Amr @gr33ndata