This is the homepage for the AI4ALL 2019 NLP research project. Here you can find links to all class materials used for the research project.
2018 Instructors: Rob Voigt, Bingbin Liu
2019 Instructors: Lucy Li, Christina Yuan
- Week 1 - Thurs / Lesson 0: Introduction to NLP
- Week 1 - Fri / Lesson 1: Rule-based classifiers (Python cheat sheet)
- Week 2 - Tues / Lesson 2: Evaluation metrics (Exercise sheet here)
- Week 2 - Wed / Lesson 3: Probability theory and Bayes rule (Exercise sheet here)
- Week 2 - Thurs / Lesson 4: Naive Bayes classifier
- Week 2 - Fri / Lesson 5: More NLP
- Week 3 - Mon / Lesson 6: Naive Bayes classifier for Twitter project
- Week 3 - Tues / Lesson 7: Neural Networks
- Lesson 0: Data exploration spreadsheet
- Our NLP playground has interactive material to peruse for fun
- Lecture on text processing (e.g. regular expressions, tokenization, lemmatization/stemming) from Stanford CS 124 by Professor Dan Jurafsky
- Unix for Poets has more details on text processing
- Python cheat sheet: feel free to put comments / things you'd like to know about in the slides!
- Naive Bayes cheat sheet
- LaTeX to make our slides / poster pretty
- Next Steps: Resources for after AI4ALL
We will go through this together on June 27, but feel free to start on your own! :)
- Check if Anaconda is installed, or install Anaconda.
Open a Terminal window and type
conda
If you get
-bash: conda: command not found
you don't have Anaconda yet! Anaconda is a Python distribution that makes it easy to install additional Python packages and manage different Python versions. You can download Anaconda from https://www.anaconda.com/download/. Make sure to download the Python 3.6 version! This should also automatically install Jupyter Notebook, which you'll need to run the notebooks.
- Install numpy, nltk, and pandas:
Open a Terminal window and type
conda install nltk numpy pandas
- Copy ("clone") the GitHub repository to your computer:
Open a Terminal window and type
git clone https://github.com/lucy3/AI4ALL2019
This will copy all the notebooks to your computer.
- Change into the directory:
In the same Terminal window, type
cd AI4ALL2019
- Download the tokenizer models:
Start a Python console by typing
python
in the Terminal window. Then run the following commands:
import nltk
nltk.download("punkt")
exit()
- Run the Jupyter notebook:
jupyter notebook
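If a notebook fails to start or an import fails, the setup steps above can be sanity-checked from Python. The following is a minimal sketch, not part of the course repository: the function name check_setup is hypothetical, and it only checks that the conda and jupyter commands are on your PATH and that the three packages installed above can be found. It does not check the tokenizer download.

```python
import importlib.util
import shutil

# Packages installed in the "conda install" step above.
REQUIRED_PACKAGES = ["nltk", "numpy", "pandas"]


def check_setup(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each requirement name to True (found) or False.

    check_setup is a hypothetical helper written for this page.
    """
    status = {}
    # shutil.which returns None when a command is not on the PATH.
    status["conda"] = shutil.which("conda") is not None
    status["jupyter"] = shutil.which("jupyter") is not None
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Save this as a .py file and run it with python; anything reported as MISSING points back to the matching step above.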
The "filled" directory contains versions of the IPython notebooks with the solutions filled in, which will be released at the end of each day. If you would like to run these, you need to copy them to the main directory (i.e. AI4ALL2019), overwriting the blank versions of the notebooks that are currently there. Then run jupyter notebook and you should be able to access the completed versions of the notebooks.
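Copying the completed notebooks can also be done from Python instead of by hand. This is a rough sketch under a few assumptions: the solutions sit directly inside the "filled" directory, they end in .ipynb, and they have the same file names as the blank versions. The function name copy_filled_notebooks is hypothetical. Note that it overwrites the blank notebooks, including any work you have saved in them, so make a backup first if you want to keep your own answers.

```python
import shutil
from pathlib import Path


def copy_filled_notebooks(repo_dir="."):
    """Copy every .ipynb file from <repo_dir>/filled into <repo_dir>.

    Hypothetical helper; returns the names of the notebooks that were copied.
    """
    repo = Path(repo_dir)
    filled = repo / "filled"
    if not filled.is_dir():
        raise FileNotFoundError(f"No 'filled' directory inside {repo}")

    copied = []
    for notebook in sorted(filled.glob("*.ipynb")):
        target = repo / notebook.name
        # copy2 replaces an existing file with the same name.
        shutil.copy2(notebook, target)
        copied.append(target.name)
    return copied
```

From inside the AI4ALL2019 directory, calling copy_filled_notebooks() with no arguments uses the current directory as the main directory.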