Breast Cancer Prediction with Machine Learning Algorithms

Introduction

Breast cancer is one of the most common forms of cancer among women worldwide. Early detection and accurate diagnosis are crucial for effective treatment and recovery. In this project, we aim to develop machine learning models that predict the diagnosis of breast cancer and to compare several algorithms on that task. The project was developed in Google Colab with the following libraries:

numpy==1.26.4
pandas==2.2.2
matplotlib==3.8.4
seaborn==0.13.2

Python version 3.12.4

Dataset Summary

The dataset contains 569 samples of breast cancer tumors and 30 features. The dataset is divided into two classes: malignant (cancerous) and benign (non-cancerous) tumors. The dataset is loaded into the program using the pandas library.
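
As a minimal sketch of this step, the snippet below loads the WDBC data and checks its shape and class balance. It uses scikit-learn's bundled copy (`load_breast_cancer`) as a stand-in for the project's own CSV file, whose column names are not given here. scikit-learn and the `diagnosis` column name are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled WDBC copy stands in for the project's CSV file.
data = load_breast_cancer(as_frame=True)
df = data.data.copy()  # 30 numeric feature columns

# In scikit-learn's encoding, target 0 = malignant and 1 = benign.
df["diagnosis"] = data.target.map({0: "malignant", 1: "benign"})

print(df.shape)                        # rows x (30 features + diagnosis)
print(df["diagnosis"].value_counts())  # number of samples per class
```

With a local CSV file, `pd.read_csv(...)` would replace the `load_breast_cancer` call, and the label column would need to be mapped in the same way.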

Exploring the Data

To explore the data further, we use the seaborn library to create a heatmap that shows the correlation between the different features. The heatmap shows that some features, such as the radius of the tumor, are highly correlated with the diagnosis of the tumor.
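
The heatmap could be produced along the following lines. This is a sketch rather than the notebook's actual code: it assumes scikit-learn's bundled WDBC copy and its column names (such as `mean radius`), and it encodes the diagnosis as 1 for malignant so that it can enter the correlation matrix.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Colab to see the figure inline
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.data.copy()
# Numeric label (1 = malignant, 0 = benign) so the diagnosis appears in the matrix.
df["diagnosis"] = (data.target == 0).astype(int)

corr = df.corr()  # Pearson correlation between every pair of columns

fig, ax = plt.subplots(figsize=(14, 12))
sns.heatmap(corr, cmap="coolwarm", center=0, ax=ax)
ax.set_title("Feature correlation (WDBC)")
fig.tight_layout()

# The five features most positively correlated with a malignant diagnosis.
top = corr["diagnosis"].drop("diagnosis").sort_values(ascending=False).head(5)
print(top)
```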

Relationship between Features and Diagnosis

The dataset contains 30 features plus the diagnosis label (malignant or benign). To understand the relationship between the features and the diagnosis, we use bar plots to compare feature values for malignant and benign tumors. The plots show that some features, such as the radius of the tumor, tend to be larger for malignant tumors than for benign ones.
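
One way to draw such a comparison is sketched below. The choice of features and the column names are illustrative and come from scikit-learn's bundled WDBC copy, not from the project notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Colab to see the figure inline
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.data.copy()
df["diagnosis"] = data.target.map({0: "malignant", 1: "benign"})

# Per-class mean of a few features (illustrative selection).
features = ["mean radius", "mean texture", "mean smoothness"]
class_means = df.groupby("diagnosis")[features].mean()
print(class_means)

# seaborn's barplot shows the class mean as bar height, with an error bar.
fig, ax = plt.subplots()
sns.barplot(data=df, x="diagnosis", y="mean radius", ax=ax)
ax.set_ylabel("mean radius")
```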

Splitting the Dataset

To estimate how well a model generalizes to unseen data, we split the dataset into training and test sets. The training set is used to fit the model, while the held-out test set is used only to evaluate its performance.
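
A split of this kind might look as follows with scikit-learn's `train_test_split`. The 30% test fraction, the stratification, and the random seed are assumptions; the text does not state which values the project used.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Assumptions: 30% held out, class proportions preserved, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying on `y` keeps the malignant-to-benign ratio roughly equal in both subsets, which matters when the classes are not balanced.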

Machine Learning Models

In this project, we use four machine learning algorithms to predict the diagnosis of breast cancer:

Logistic Regression
Decision Tree Classifier
Random Forest Classifier
Support Vector Classifier
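
A minimal sketch of how these four classifiers could be trained and scored with scikit-learn is shown below. The hyperparameters, the feature scaling applied to Logistic Regression and the Support Vector Classifier, and the 30% split are all assumptions, so the printed accuracies need not match the figures reported for this project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42  # assumed split settings
)

# Scale-sensitive models are wrapped in a pipeline with standardization (an assumption).
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model on the training set and score it on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.6f}")
```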

The results show that three of the four algorithms (Logistic Regression, Decision Tree Classifier, and Support Vector Classifier) reach the same test accuracy of 0.982456. The Random Forest Classifier is slightly lower, at 0.959064.
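
Accuracy is the number of correct predictions divided by the number of test samples, so the two reported values can be read back as fractions. The short check below recovers the simplest fractions that match them. The resulting test-set size is an inference from the numbers, not something stated by the project.

```python
from fractions import Fraction
from math import lcm

N_SAMPLES = 569                      # dataset size stated above
reported = ["0.982456", "0.959064"]  # accuracies quoted in the text

# Closest fraction to each reported value whose denominator fits within the dataset.
ratios = [Fraction(v).limit_denominator(N_SAMPLES) for v in reported]

# Smallest test-set size for which both accuracies are whole numbers of correct predictions.
test_size = lcm(*(r.denominator for r in ratios))

for value, ratio in zip(reported, ratios):
    correct = int(ratio * test_size)  # exact integer by construction
    print(f"{value} ~ {ratio} = {correct}/{test_size}")
```

Under this reading the values correspond to 168 and 164 correct predictions out of 171, which would be consistent with holding out about 30% of the 569 samples. The gap between the two accuracies then amounts to four test samples.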

Comparison of Algorithm Accuracies

To compare the algorithms, we evaluate each trained model on the same test set. The results show that Logistic Regression, the Decision Tree Classifier, and the Support Vector Classifier lead with closely matched accuracy, followed by the Random Forest Classifier.

Conclusion

In this project, we developed machine learning models that predict the diagnosis of breast cancer using different algorithms. The models were trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and evaluated on a held-out test set. The project demonstrates the potential of machine learning in the early detection and accurate diagnosis of breast cancer. However, the results are based on a single dataset and may not generalize to other populations. Overall, this project highlights the value of machine learning techniques in medical diagnosis and their potential to improve the quality of care for patients with breast cancer.

Data Attribution and Credit

This project uses the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison. We would like to extend our gratitude to Dr. Wolberg for providing access to this valuable dataset, which made this research possible.

Team Members

Name	Email	ID
Rahul Chowdary Maddineni	[email protected]	R11976773
Sathwik Tatiparthi	[email protected]	R11976772
Teja Sri Dharma Reddy Vanukuri	[email protected]	R11974522
Shreyas Prabhakar	[email protected]	R11894057
Muhammad Talha Jabbar	[email protected]	R11914715
