This repository contains code for the assignments of an introductory Data Science course taught at RWTH Aachen University in Winter 22. The datasets used are a mix of synthetic and real-world data. Technologies used include pandas, sklearn, matplotlib, mlxtend, nltk, gensim, pm4py, Docker, Hadoop MapReduce, and shell scripts.
To obtain the datasets or to reproduce the notebooks, please contact one of the contributors.
Contributors:
- Minh-Nghia Phan
- Quang Truong
- Khue Hoang
- Van Dao