
This project aims to extract articles from Moroccan news websites and process them using some text mining algorithms and techniques


Hamid-abdellaoui/BI-text-mining


BI-text-mining

An academic project that aims to extract and process text from a large number of articles scraped from several Moroccan news websites.

This project is divided into 5 parts, each in its own directory.


Further Details

Scraping articles (title, publication date, image, link, full text...) from Moroccan news websites (BeautifulSoup and requests).

Resources :

Data was retrieved from the following websites:

  • Le Matin.ma
  • La Vie éco
  • Challenge.ma

Data structure :

We scraped the Economy subcategory pages of each news website. For each article, we collected its:

  • title
  • publication date
  • image
  • link
  • full text.
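A minimal sketch of how one article page could be turned into such a record. The project itself uses requests and BeautifulSoup; this sketch swaps in Python's standard-library `html.parser` so it runs without dependencies. The markup it looks for (`<h1>` title, `<time datetime=...>` date, first `<img>`, `<p>` inside `<article>`) is a hypothetical layout, since each site's real HTML differs, and `Article`, `ArticleParser` and `parse_article` are illustrative names, not the repository's code.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Article:
    # One record per scraped article, mirroring the fields listed above.
    title: str = ""
    publication_date: str = ""
    image: str = ""
    link: str = ""
    full_text: str = ""


class ArticleParser(HTMLParser):
    """Collects article fields, assuming a HYPOTHETICAL page layout:
    <h1> title, <time datetime=...> date, first <img> as the image,
    and <p> tags inside <article> as the body."""

    def __init__(self, link):
        super().__init__()
        self.article = Article(link=link)
        self.paragraphs = []
        self._in_title = False
        self._in_article = False
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._in_title = True
        elif tag == "time":
            self.article.publication_date = attrs.get("datetime") or ""
        elif tag == "img" and not self.article.image:
            self.article.image = attrs.get("src") or ""
        elif tag == "article":
            self._in_article = True
        elif tag == "p" and self._in_article:
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_title = False
        elif tag == "article":
            self._in_article = False
        elif tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_title:
            self.article.title += data.strip()
        elif self._in_paragraph:
            self.paragraphs[-1] += data


def parse_article(html, link):
    """Parse one article page into an Article record."""
    parser = ArticleParser(link)
    parser.feed(html)
    parser.close()
    parser.article.full_text = "\n".join(p.strip() for p in parser.paragraphs)
    return parser.article
```

In a real run the HTML string would come from something like `requests.get(url, timeout=10).text` for each link found on an Economy listing page.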

Applying text mining methods and algorithms (TF-IDF, NMF, topic modeling).

  • Texts are preprocessed and cleaned using basic text processing techniques such as:

    • removing stop words
    • lemmatization
    • stemming
    • tokenization
    • removing punctuation
    • removing numbers
    • removing special characters.
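The cleaning steps above can be sketched with the standard library alone, assuming French-language articles (as on the sites listed) and a deliberately tiny, hypothetical stop-word list. Lemmatization and stemming are left out because they need a language resource; NLTK's `SnowballStemmer("french")` is one option for the stemming step.

```python
import re

# HYPOTHETICAL, deliberately tiny French stop-word list; a real run would
# use a complete list (e.g. from NLTK or spaCy).
STOP_WORDS = {"le", "la", "les", "de", "du", "des", "un", "une", "et",
              "en", "a", "à", "au", "aux", "pour", "par", "sur"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip numbers, punctuation and special characters,
    tokenize on whitespace, then drop stop words and 1-letter fragments."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)        # remove numbers
    text = re.sub(r"[^\w\s]|_", " ", text)  # remove punctuation / special chars
    tokens = text.split()                   # whitespace tokenization
    return [t for t in tokens if t not in stop_words and len(t) > 1]
```

For example, `preprocess("Le PIB du Maroc a progressé de 3,5% en 2023 !")` returns `["pib", "maroc", "progressé"]`. The length filter drops elision fragments such as the "l" left over from "l'économie".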
  • Then we applied text mining algorithms such as:

    • TF-IDF
    • NMF
    • Topic Modeling.
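A dependency-free sketch of how TF-IDF and NMF fit together for topic modeling: documents (token lists) become a TF-IDF matrix, NMF factors it into document-topic weights `W` and topic-term weights `H`, and the heaviest terms in each row of `H` describe a topic. It assumes relative term frequency with scikit-learn-style smoothed IDF and uses Lee and Seung's multiplicative updates; on real data, scikit-learn's `TfidfVectorizer` and `NMF` are the usual practical choice.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """docs: list of token lists. Returns (vocab, rows) where
    rows[d][t] = relative term frequency * smoothed IDF."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, same formula as scikit-learn's default: ln((1+n)/(1+df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)
    with multiplicative updates minimizing ||V - WH||^2."""
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(k)] for _ in V]
    H = [[rng.random() for _ in V[0]] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*r)] for r in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(*r)] for r in zip(W, num, den)]
    return W, H


def top_terms(H, vocab, n=3):
    """Highest-weighted terms of each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=row.__getitem__,
                                      reverse=True)[:n]] for row in H]


def reconstruction_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))
```

Typical use: `vocab, X = tfidf_matrix(docs)`, then `W, H = nmf(X, k)` and `top_terms(H, vocab)` to label each of the `k` topics. The exact topics depend on `k` and on the random initialization (`seed`).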

Automating the scraping, text processing, data warehousing, and loading of data into a PostgreSQL database (Airflow, Docker...). The data pipeline architecture is as follows:
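As a rough illustration of the process-then-load steps such a pipeline automates, here is a plain-Python sketch. It swaps PostgreSQL for the standard-library `sqlite3` so it runs without a server, and the two-table schema (`article`, `article_term`) is hypothetical. In an Airflow deployment each step would typically become its own task (for example a `PythonOperator`) in a DAG, packaged with Docker.

```python
import sqlite3
from collections import Counter

# HYPOTHETICAL warehouse schema; the project's actual target is PostgreSQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    link TEXT PRIMARY KEY,
    source TEXT NOT NULL,
    title TEXT,
    publication_date TEXT,
    full_text TEXT
);
CREATE TABLE IF NOT EXISTS article_term (
    link TEXT REFERENCES article(link),
    term TEXT,
    occurrences INTEGER,
    PRIMARY KEY (link, term)
);
"""


def transform(record):
    """Processing step: count alphabetic terms in the article's full text."""
    tokens = [t for t in record["full_text"].lower().split() if t.isalpha()]
    return record, Counter(tokens)


def load(conn, record, term_counts):
    """Load step; INSERT OR IGNORE skips rows that are already present."""
    conn.execute(
        "INSERT OR IGNORE INTO article VALUES "
        "(:link, :source, :title, :publication_date, :full_text)",
        record,
    )
    conn.executemany(
        "INSERT OR IGNORE INTO article_term VALUES (?, ?, ?)",
        [(record["link"], term, n) for term, n in term_counts.items()],
    )


def run_pipeline(conn, records):
    """Process and load already-scraped records, in the order an Airflow DAG
    would chain the tasks (scrape >> process >> load)."""
    conn.executescript(SCHEMA)
    for record in records:
        load(conn, *transform(record))
    conn.commit()
```

Keying the load on the article link keeps scheduled re-runs idempotent, which matters when the same listing pages are scraped every day; PostgreSQL's equivalent of `INSERT OR IGNORE` is `INSERT ... ON CONFLICT DO NOTHING`.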

Presenting results and key measures in a dashboard (a web app built with Flask).

Reporting results via a simple dashboard as follows:
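A sketch of the server side of such a dashboard: a function computing a few key measures (article count, articles per source, most frequent terms) and a minimal WSGI app exposing them as JSON at a hypothetical `/api/measures` route. Flask applications are WSGI applications, so the same function could sit behind a Flask route; plain WSGI is used here only to stay dependency-free. The input format (dicts with `source` and `terms` keys) is an assumption.

```python
import json
from collections import Counter


def key_measures(articles, top_n=3):
    """Aggregate measures a dashboard might show."""
    per_source = Counter(a["source"] for a in articles)
    terms = Counter(t for a in articles for t in a["terms"])
    return {
        "total_articles": len(articles),
        "articles_per_source": dict(per_source),
        "top_terms": [t for t, _ in terms.most_common(top_n)],
    }


def make_app(articles):
    """Minimal WSGI app serving the measures as JSON (route is hypothetical)."""
    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/api/measures":
            body = json.dumps(key_measures(articles)).encode("utf-8")
            start_response("200 OK", [("Content-Type", "application/json"),
                                      ("Content-Length", str(len(body)))])
            return [body]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    return app
```

It can be served locally with `wsgiref.simple_server.make_server("", 8000, make_app(articles)).serve_forever()`, with a chart library on the front end consuming the JSON.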

5. Mining :

Extracting association rules (R and Python).
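A minimal Apriori-style sketch in Python, assuming each article is reduced to the set of its keywords (one transaction per article). It finds frequent itemsets level by level, then keeps rules `X -> Y` whose confidence, `support(X ∪ Y) / support(X)`, clears a threshold. The thresholds are illustrative; in R, the `arules` package's `apriori()` does the same job on real data.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Apriori-style mining of rules X -> Y over keyword sets.
    Returns (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    frequent = {}
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    while level:
        level = [s for s in level if support(s) >= min_support]
        frequent.update({s: support(s) for s in level})
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates if all(c - {x} in frequent for x in c)]

    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), sup, confidence))
    return rules
```

With the keyword sets `{taux, banque, inflation}`, `{taux, banque}`, `{taux, inflation}` and `{football, match}` and the default thresholds, it returns the two rules `banque -> taux` and `inflation -> taux`, each with support 0.5 and confidence 1.0; the reverse rules fall below the confidence threshold.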



Contributors :
