Breast cancer is one of the most common forms of cancer among women worldwide, and early detection and accurate diagnosis are crucial for effective treatment and recovery. In this project, our aim is to train and compare several machine learning algorithms that predict the diagnosis of breast cancer. The project was developed in Google Colab using libraries such as numpy==1.26.4, pandas==2.2.2, matplotlib==3.8.4, and seaborn==0.13.2.

The write-up covers the following sections:
- Introduction
- Dataset Summary
- Exploring the Data
- Relationship between Features and Diagnosis
- Splitting the Dataset
- Machine Learning Models
- Comparison of Algorithm Accuracies
- Conclusion
- Data Attribution and Credit
The dataset contains 569 samples of breast cancer tumors and 30 features. The dataset is divided into two classes: malignant (cancerous) and benign (non-cancerous) tumors. The dataset is loaded into the program using the pandas library.
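As a minimal sketch of the loading step: the project reads the data with pandas, but the file name and path are not given here, so this example assumes scikit-learn's bundled copy of the same WDBC data as a stand-in. The `diagnosis` column name is an illustrative choice, not taken from the project.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled WDBC copy stands in for the project's own file.
# With as_frame=True the data comes back as a pandas DataFrame.
bunch = load_breast_cancer(as_frame=True)
df = bunch.frame.copy()  # 30 feature columns plus an integer "target" column

# scikit-learn encodes malignant as 0 and benign as 1; add a readable label.
# "diagnosis" is a hypothetical column name chosen for this sketch.
df["diagnosis"] = df["target"].map({0: "malignant", 1: "benign"})

features = df.drop(columns=["target", "diagnosis"])
print(features.shape)                  # (number of samples, number of features)
print(df["diagnosis"].value_counts())  # class balance
```

If the data is read from a CSV instead, the column names and label encoding may differ from the ones used in this sketch.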
To explore the data further, we use the seaborn library to create a heatmap that shows the correlation between the different features. The heatmap shows that some features, such as the radius of the tumor, are highly correlated with the diagnosis of the tumor.
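A correlation heatmap of this kind could be sketched as follows. The data source (scikit-learn's bundled WDBC copy), the `malignant` column name, the colour map, and the figure size are all assumptions made for illustration; the project's own plotting code may look different.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled WDBC copy stands in for the project's file.
df = load_breast_cancer(as_frame=True).frame

# Re-encode the label so that 1 means malignant (scikit-learn ships 0 = malignant).
df["malignant"] = 1 - df.pop("target")

corr = df.corr()  # pairwise Pearson correlations, label included

fig, ax = plt.subplots(figsize=(14, 12))
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Feature correlation (WDBC)")

# Features ranked by the strength of their correlation with the label.
ranking = corr["malignant"].drop("malignant").abs().sort_values(ascending=False)
print(ranking.head(10))
```

Printing the ranked column next to the heatmap makes it easier to read off which features track the diagnosis most closely.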
Alongside the 30 features, each sample carries a diagnosis label: malignant or benign. To understand the relationship between the features and the diagnosis, we use a bar plot to compare feature values for malignant and benign tumors. The bar plot shows that some features, such as the radius of the tumor, tend to be larger for malignant tumors and are therefore useful for separating the two classes.
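One way to produce such a comparison is to average a few features per diagnosis and plot the result as grouped bars. The three features chosen below are an illustrative selection, not the project's, and the data again comes from scikit-learn's bundled WDBC copy as an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled WDBC copy stands in for the project's file.
df = load_breast_cancer(as_frame=True).frame
df["diagnosis"] = df.pop("target").map({0: "malignant", 1: "benign"})

# Hypothetical selection of features on a comparable scale.
cols = ["mean radius", "mean texture", "worst radius"]
group_means = df.groupby("diagnosis")[cols].mean()

# One group of bars per feature, one bar per diagnosis.
fig, ax = plt.subplots(figsize=(8, 4))
group_means.T.plot(kind="bar", ax=ax)
ax.set_ylabel("mean value")
ax.set_title("Feature means by diagnosis")
fig.tight_layout()

print(group_means)
```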
To measure how well each model performs on data it has not seen, we split the dataset into training and test sets. The training set is used to fit the model, while the test set is held back and used only to evaluate the model's performance.
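The split can be sketched with scikit-learn's `train_test_split`. The split ratio is not stated here, so `test_size=0.3`, the stratification, and the `random_state` value are assumptions. A 30% split yields a 171-sample test set, which would fit the reported accuracies (0.959064 equals 164/171), but that is an inference rather than something the text confirms.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled WDBC copy stands in for the project's file.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Assumed settings: 30% test split, stratified so both sets keep the
# malignant/benign ratio, and a fixed seed so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```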
In this project, we use four machine learning algorithms to predict the diagnosis of breast cancer:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
- Support Vector Classifier
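The four classifiers listed above can be set up and trained in a few lines with scikit-learn. This is a sketch under assumptions: the hyperparameters, the feature scaling for Logistic Regression and the Support Vector Classifier, and the split settings are illustrative choices, not the project's confirmed configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled WDBC copy and a 30% stratified test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Assumed settings: scaling is added for the two models that are sensitive
# to feature magnitudes; tree-based models are left on the raw features.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"trained {name}")
```

Keeping the models in a dictionary makes it straightforward to train and later evaluate all of them in the same loop.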
The results of this project show that three of the four machine learning algorithms, namely Logistic Regression, Decision Tree Classifier, and Support Vector Classifier, reach the same accuracy of 0.982456. The Random Forest Classifier is slightly lower, with an accuracy of 0.959064.
To compare the algorithms, we evaluate each trained model on the test set. Logistic Regression, the Decision Tree Classifier, and the Support Vector Classifier share the highest accuracy, and the Random Forest Classifier ranks last.
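A comparison of this kind can be sketched by scoring every model on the same test set and sorting the results. The split, the hyperparameters, and the scaling steps below are assumptions, so the printed numbers will not necessarily match the accuracies reported above.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled WDBC copy and a 30% stratified test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Assumed model settings, chosen for illustration only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
}

# Accuracy = correctly classified test samples / total test samples.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

ranking = pd.Series(scores, name="test accuracy").sort_values(ascending=False)
print(ranking.round(6))
```

Because accuracy on a small test set moves in steps of one sample, differences of a few thousandths between models may come down to a handful of predictions and can change with a different split.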
In this project, we trained and compared machine learning models that predict the diagnosis of breast cancer using different algorithms. The models were trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and evaluated on a held-out test set. The project demonstrates the potential of machine learning in the early detection and accurate diagnosis of breast cancer. However, the results are based on a single dataset and may not generalize to other populations. Overall, this project highlights the value of machine learning techniques in medical diagnosis and their potential to improve the quality of care for patients with breast cancer.
This project uses the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which was obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison. We would like to extend our gratitude to Dr. Wolberg for providing access to this valuable dataset, which has made this research possible.
Name | Email | ID |
---|---|---|
Rahul Chowdary Maddineni | [email protected] | R11976773 |
Sathwik Tatiparthi | [email protected] | R11976772 |
Teja Sri Dharma Reddy Vanukuri | [email protected] | R11974522 |
Shreyas Prabhakar | [email protected] | R11894057 |
Muhammad Talha Jabbar | [email protected] | R11914715 |