In classical Machine Learning, the program typically works with features prepared by hand from the available dataset and predicts from them; it does not change how the data is represented.
Deep Learning learns its own internal representation of the dataset and keeps adjusting it to learn and predict better.
It does this by repeatedly changing the weights of the model according to its inputs and the error in its outcome, self-adjusting until the model is good enough (a perfect model is rarely reached).
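The weight adjustment described above can be sketched with a single weight and plain gradient descent. This is a toy illustration only: the data, the learning rate `lr`, and the number of `epochs` are assumptions chosen for the example, and real deep learning models adjust many weights across several layers.

```python
# Toy example: learn the weight w in y = w * x by gradient descent.
# The data, learning rate, and epoch count are illustrative assumptions.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with a "true" weight of 2


def train(xs, ys, lr=0.01, epochs=500):
    w = 0.0  # start from an arbitrary weight
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # adjust the weight to reduce the error
    return w


w = train(xs, ys)
print(round(w, 3))  # ends up very close to 2
```

Each pass nudges the weight in the direction that lowers the prediction error, which is the "self adjustment" the notes refer to.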
Branches of DL
NLP -> Natural Language Processing
Lexical
Semantic
NN -> Neural Network
ANN
CNN
RNN
Stanford University does major work on text processing, summarization, text sentiment (opinion mining), and finding reviews/recommendations
Popular NLP modules are:
NLTK -> Natural Language Toolkit
TextBlob
spaCy
NLTK
To install
pip3 install nltk textblob spacy
Download all the data from Stanford (3.5 GB)
Web Scraping
Requirements
* the website is hosted somewhere
* a web crawler (also called a web spider)
* the most powerful crawler is Googlebot; it revisits websites regularly to refresh its index (the interval varies by site)
Practice sites for web scraping
* Wikipedia
* Britannica
* php.net
Parser
* Use BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/). To install:
```
pip3 install bs4
```
* To install a parser:
```
apt-get install python3-html5lib
# or
pip3 install html5lib
```
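To show what a parser does with a page, here is a minimal sketch that pulls every link out of some HTML. To keep it dependency-free it uses Python's built-in `html.parser` rather than BeautifulSoup; `SAMPLE_HTML`, `LinkCollector`, and `extract_links` are hypothetical names made up for this example. With BeautifulSoup installed, roughly the same result should be available through `BeautifulSoup(html, "html5lib").find_all("a")`.

```python
from html.parser import HTMLParser

# Hypothetical sample page; in practice the HTML would come from a website.
SAMPLE_HTML = """
<html><body>
  <h1>Practice page</h1>
  <a href="https://www.php.net/">PHP</a>
  <a href="/wiki/Web_scraping">Web scraping</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
print(links)
```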
Tokenization
Data can be divided at 3 levels:
sentences
words
characters
Tokenization means making a list by separating the text into sentences, words, or characters
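The three levels can be sketched with the standard library alone. The regular expressions below are simplifying assumptions (for instance, they will mis-split abbreviations such as "Dr."), and the function names are made up for this example. NLTK offers `sent_tokenize` and `word_tokenize` in `nltk.tokenize` for real use, which need its tokenizer data downloaded first.

```python
import re

text = "NLTK is a toolkit. It splits text into tokens!"


def sentence_tokens(text):
    # Split after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokens(text):
    # Keep runs of letters, digits, and apostrophes; drop punctuation.
    return re.findall(r"[A-Za-z0-9']+", text)


def char_tokens(text):
    # Every non-whitespace character becomes one token.
    return [c for c in text if not c.isspace()]


print(sentence_tokens(text))
print(word_tokens(text))
print(char_tokens("Hi there"))
```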
Stemming
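Stemming strips word endings so that variants of a word (e.g. "offers", "offered", "offering") reduce to one stem ("offer").
It is useful when there is a huge amount of data and you need to find specific keywords in it, for example to predict whether a text is a normal mail or spam.
The decision can then be made from a simple search for certain keywords.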
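A rough sketch of keyword matching on stems follows. The suffix list and the `SPAM_STEMS` keyword set are illustrative assumptions, and `naive_stem` is a deliberately crude suffix stripper, not the Porter algorithm; NLTK provides a proper one as `PorterStemmer` in `nltk.stem`.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")  # illustrative, far from complete
SPAM_STEMS = {"offer", "discount"}   # hypothetical keyword list


def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping at least 3 characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def looks_like_spam(text):
    # Stem every word, then check for any overlap with the keyword stems.
    stems = {naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())}
    return bool(stems & SPAM_STEMS)


print(naive_stem("offering"))
print(looks_like_spam("Limited offers and discounted rates!"))
print(looks_like_spam("See you at lunch."))
```

Because the match is done on stems, one keyword covers "offer", "offers", "offered", and "offering" without listing each form.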
Lemmatization
Lemmatization reduces the different forms of a word to a single meaningful dictionary word (the lemma), e.g. "better" -> "good"
Used for sentiment analysis
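As a minimal sketch of the idea, a lemmatizer can be modelled as a dictionary lookup feeding a tiny sentiment score. The `LEMMAS` and `SENTIMENT` tables are hypothetical and hand-made for this example; a real lemmatizer such as NLTK's `WordNetLemmatizer` relies on a full lexical database instead.

```python
# Hypothetical mini-dictionary mapping inflected forms to lemmas.
LEMMAS = {
    "better": "good", "best": "good",
    "worse": "bad", "worst": "bad",
    "was": "be", "were": "be", "is": "be",
}

# Hypothetical sentiment values attached to lemmas only.
SENTIMENT = {"good": 1, "bad": -1}


def lemmatize(word):
    word = word.lower()
    return LEMMAS.get(word, word)  # unknown words are returned unchanged


def sentiment_score(text):
    # Sum the sentiment of each word after reducing it to its lemma.
    return sum(SENTIMENT.get(lemmatize(w), 0) for w in text.split())


print(lemmatize("Better"))
print(sentiment_score("better than the best"))
```

Scoring lemmas instead of raw words keeps the sentiment table small, since "good", "better", and "best" all share one entry.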
Task
Take a picture of a newspaper, extract the text (cv2 + ), apply NLTK, remove stop words, and plot a frequency graph of the top 10 words
Split a sentence into words
Apply Naive Bayes to mail read by a program
Find a speech of APJ Abdul Kalam, scrape it, tokenize it, remove stop words, apply stemming, then replace the words and save the result in a new file.
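As a possible starting point for the stop-word and frequency parts of these tasks, the sketch below counts the most frequent words after filtering. The `STOP_WORDS` set and the sample sentence are tiny illustrative assumptions; NLTK ships a fuller list in `nltk.corpus.stopwords`, and the resulting pairs could be passed to a plotting library for the graph.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}  # tiny illustrative set


def top_words(text, n=10):
    # Lowercase, tokenize into words, drop stop words, count the rest.
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return Counter(kept).most_common(n)


sample = "the cat and the dog and the cat in the garden"
print(top_words(sample, 2))
```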