This project applies supervised machine learning models and algorithms to cancer data. The goal is to predict whether a patient has cancer. The dataset has 30 attributes and 1 class label: B for benign or M for malignant. The label is treated as a boolean (0 or 1), where 1 denotes a cancerous (malignant) cell. The dataset contains 570 instances (cells) and is available in the file Cancer_Data.csv.
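The B/M to 0/1 encoding described above can be sketched with pandas as follows. This is only an illustration: the column name `diagnosis`, the helper `encode_diagnosis`, and the sample values are assumptions, and the actual column names in Cancer_Data.csv may differ.

```python
import pandas as pd


def encode_diagnosis(df, label_column="diagnosis"):
    """Split a frame into features X and a 0/1 target y (B -> 0, M -> 1).

    `label_column` is an assumed name; adjust it to the real CSV header.
    """
    y = df[label_column].map({"B": 0, "M": 1})
    if y.isna().any():
        # Any value other than 'B' or 'M' maps to NaN, so fail loudly.
        raise ValueError("unexpected class label; expected only 'B' or 'M'")
    X = df.drop(columns=[label_column])
    return X, y.astype(int)


# Tiny made-up frame standing in for pd.read_csv("Cancer_Data.csv").
sample = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "B"],
        "feature_1": [17.9, 13.5, 12.4],
        "feature_2": [10.4, 14.4, 15.7],
    }
)

X, y = encode_diagnosis(sample)
print(y.tolist())  # [1, 0, 0]
```

With the real file, the same helper would be called on the frame returned by `pd.read_csv("Cancer_Data.csv")`, after dropping any identifier column that should not be used as a feature.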
Supervised learning includes the following steps: dataset analysis to check whether data pre-processing is needed, identification of the target concept, definition of the training and test sets, selection and parameterization of the learning algorithms to employ, and evaluation of the learning process (in particular on the test set). At least 3 supervised learning (classification) algorithms should be employed (Decision Trees, Neural Networks, K-NN, SVM, ...), though more may be employed and compared, using the Scikit-Learn Python library and taking the characteristics of the dataset into account. Results should be compared using tables or plots (e.g., using the Seaborn or Matplotlib libraries).
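The steps above can be sketched with Scikit-Learn as follows. This is a minimal illustration and not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification` (matching only the 570 x 30 shape described earlier), and the 80/20 split, the choice of three classifiers, and their parameters are all assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: 570 instances, 30 attributes, 2 classes.
X, y = make_classification(
    n_samples=570, n_features=30, n_informative=10, random_state=0
)

# Training and test sets (assumed 80/20 split, stratified on the class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Three classifiers; K-NN and SVM are distance-based, so they get feature scaling.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fit each model, time the training, and evaluate on the held-out test set.
results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    results[name] = {"accuracy": accuracy, "fit_seconds": fit_seconds}

for name, row in results.items():
    print(f"{name:<14} accuracy={row['accuracy']:.3f} fit={row['fit_seconds']:.4f}s")
```

The `results` dictionary can be turned into a table with `pandas.DataFrame(results).T` or plotted with Seaborn or Matplotlib to compare the algorithms side by side.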
To run the program, execute the cells of the Jupyter Notebook cancer_notebook.ipynb, which is located in the root directory of the project. Our results can also be viewed directly in the outputs saved in the notebook. The following libraries are required: pandas, seaborn, numpy, copy, sklearn, matplotlib, time, tensorflow (copy and time are part of the Python standard library).
- André Tomás da Cunha Soares
- Diogo Alexandre da Costa Melo Moreira da Fonte
- Jorge Carlos Baptista Duarte