Text summarization can be defined as the process of extracting the important sentences from a large document and combining them into a shorter version of that document. Text summarization can be split into two types of summaries: extractive and abstractive:
- Abstractive => the system tries to write a summary of a given text the way a human would: first read the input text, then compose a summary in its own words.
- Extractive => the system learns which sentences in the input text are important and extracts them verbatim, aiming to select the same sentences a human summarizer would (a minimal sketch follows).
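As a minimal sketch of the extractive idea (not the system built in this project), the snippet below scores each sentence by the frequency of its words and extracts the top-scoring ones; the frequency-based scoring and the regex sentence splitter are illustrative assumptions:

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Pick the sentences whose words are most frequent in the document."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        # Length-normalized so long sentences are not automatically favored.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    best = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in best)  # keep original order
```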
Summarizing a text can be both time-efficient and cost-efficient.
- Time-efficient => the reader can spend less time on a document by reading its summarized version.
- Cost-efficient => summarization compresses the amount of textual data transferred from one device to another.
The goal of the classification task is to train classifiers on the original dataset, then test them on the summarized texts and see whether accuracy increases or decreases. The same applies to clustering: cluster the documents before and after summarization and compare the resulting clusters. A sketch of the classification comparison follows.
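The sketch below uses scikit-learn and assumes the original articles, their summaries, and their category labels are already loaded into the hypothetical parallel lists `originals`, `summaries`, and `labels`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical inputs: originals[i] / summaries[i] hold the full and
# summarized text of article i, labels[i] its category.
orig_tr, orig_te, sum_tr, sum_te, y_tr, y_te = train_test_split(
    originals, summaries, labels, test_size=0.2, random_state=42)

vec = TfidfVectorizer(stop_words="english")
X_tr = vec.fit_transform(orig_tr)

# Train each model on the original articles, then evaluate it on both
# the held-out originals and their summarized counterparts.
for clf in (LinearSVC(), MultinomialNB(), LogisticRegression(max_iter=1000)):
    clf.fit(X_tr, y_tr)
    acc_full = accuracy_score(y_te, clf.predict(vec.transform(orig_te)))
    acc_summ = accuracy_score(y_te, clf.predict(vec.transform(sum_te)))
    print(f"{type(clf).__name__}: originals={acc_full:.3f} summaries={acc_summ:.3f}")
```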
Dataset
- Integration
  - BBC news (business, entertainment, politics, sport, tech)
  - COVID-19 research papers
- Visualization
- Pre-processing (sketch after this list)
  - Lemmatization and lowercasing
  - Remove stop words
  - Remove unwanted characters (e.g. "\n\n")
  - Remove punctuation, HTML tags, and URLs
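A sketch of these pre-processing steps using NLTK and the standard library; it assumes the NLTK `stopwords` and `wordnet` corpora have already been downloaded:

```python
import re
import string
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Assumes nltk.download("stopwords") and nltk.download("wordnet") were run once.
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text):
    """Clean one document: strip HTML/URLs, lowercase, drop punctuation
    and stop words, and lemmatize the remaining tokens."""
    text = re.sub(r"<[^>]+>", " ", text)        # HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = text.replace("\n", " ").lower()      # unwanted chars + lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(LEMMATIZER.lemmatize(tok)
                    for tok in text.split() if tok not in STOP_WORDS)
```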
Modeling
- LSA (extractive; sketch below)
- T5 (abstractive; sketch below)
- PEGASUS (abstractive; sketch below)
- Clustering (sketch below)
- Classification (Linear SVM, MultinomialNB, Logistic Regression; compared in the classification sketch above)
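One common way to realize LSA-based extractive summarization (not necessarily the exact variant used here) is to build a sentence-term TF-IDF matrix, decompose it with truncated SVD, and keep the sentences that load most strongly on the latent topics:

```python
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_summary(text, n_sentences=3, n_topics=3):
    """Extract the sentences that load most strongly on the top LSA topics."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    # Keep the topic count valid for small inputs.
    k = max(1, min(n_topics, min(tfidf.shape) - 1))
    weights = TruncatedSVD(n_components=k, random_state=0).fit_transform(tfidf)
    # Score each sentence by its total (absolute) weight across topics.
    scores = np.abs(weights).sum(axis=1)
    keep = sorted(np.argsort(scores)[-n_sentences:])  # restore original order
    return " ".join(sentences[i] for i in keep)
```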
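T5 and PEGASUS can be driven through the Hugging Face `transformers` summarization pipeline; `t5-small` and `google/pegasus-xsum` below are public checkpoints chosen for illustration and may differ from the checkpoints used in this project:

```python
from transformers import pipeline

# Load two abstractive summarizers; the checkpoints download on first use.
t5 = pipeline("summarization", model="t5-small")
pegasus = pipeline("summarization", model="google/pegasus-xsum")

article = open("article.txt").read()  # hypothetical input article

print(t5(article, max_length=80, min_length=20)[0]["summary_text"])
print(pegasus(article, max_length=80, min_length=20)[0]["summary_text"])
```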
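For the clustering comparison, a sketch using k-means over TF-IDF vectors, run once on the originals and once on the summaries; k=5 is an assumption matching the five BBC categories, and `originals`/`summaries` are the same hypothetical lists as in the classification sketch:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster(texts, k=5):
    """Cluster documents with k-means over TF-IDF features and report
    the silhouette score as a rough quality measure."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    return km.labels_, silhouette_score(X, km.labels_)

# Compare clustering quality before and after summarizing.
labels_full, score_full = cluster(originals)
labels_summ, score_summ = cluster(summaries)
print(f"silhouette on originals: {score_full:.3f}, on summaries: {score_summ:.3f}")
```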
Evaluation
- Proposed a summarization system that uses different techniques to summarize news and medical articles.
- Used the summarized articles for a classification task with several classification algorithms and compared their performance against the original articles.
- The results show that the summarized articles perform nearly as well as the original ones, with only a small difference, which indicates that the summaries can be relied on as a source of information.