shrprabh/Breast-Cancer-Prediction-System-Utilizing-Machine-Learning-Algorithms-Group-15

Breast Cancer Prediction with Machine Learning Algorithms

Introduction

Breast cancer is one of the most common forms of cancer among women worldwide. Early detection and accurate diagnosis are crucial for effective treatment and recovery. This project aims to build machine learning models that predict a breast cancer diagnosis and to compare several algorithms on that task. The work was done in Google Colab with the following libraries:

  • numpy==1.26.4
  • pandas==2.2.2
  • matplotlib==3.8.4
  • seaborn==0.13.2

Python version 3.12.4

Visualizations

Dataset Summary

The dataset contains 569 samples of breast cancer tumors and 30 features. The dataset is divided into two classes: malignant (cancerous) and benign (non-cancerous) tumors. The dataset is loaded into the program using the pandas library.
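As a minimal sketch of the loading step: the README does not say which file the project reads, so this example assumes the copy of the WDBC data that ships with scikit-learn (`load_breast_cancer`), returned as pandas objects. The helper name `load_wdbc` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer


def load_wdbc():
    """Return the 30 feature columns and a readable diagnosis label."""
    bunch = load_breast_cancer(as_frame=True)
    features = bunch.data  # pandas DataFrame, one row per tumor sample
    # scikit-learn encodes the target as 0 = malignant, 1 = benign.
    diagnosis = bunch.target.map({0: "malignant", 1: "benign"}).rename("diagnosis")
    return features, diagnosis


features, diagnosis = load_wdbc()
print(features.shape)            # (samples, features)
print(diagnosis.value_counts())  # class balance
```

If the project reads a CSV export instead, `pandas.read_csv` would replace the `load_breast_cancer` call, and the column names may differ from the ones scikit-learn uses.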

Exploring the Data

To explore the data further, we use the seaborn library to create a heatmap that shows the correlation between the different features. The heatmap shows that some features, such as the radius of the tumor, are highly correlated with the diagnosis of the tumor.
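The heatmap described above could be produced roughly as follows. This is a sketch, not the project's notebook code: it assumes the scikit-learn copy of the dataset, where the diagnosis is a numeric `target` column (0 = malignant, 1 = benign), and the color map and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in Colab the figure renders inline
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

# 30 feature columns plus the numeric "target" column.
frame = load_breast_cancer(as_frame=True).frame
corr = frame.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0, ax=ax)
ax.set_title("Feature correlation heatmap (WDBC)")
fig.tight_layout()

# Features most strongly correlated (in absolute value) with the diagnosis.
top = corr["target"].drop("target").abs().sort_values(ascending=False).head(5)
print(top)
```

Because benign is encoded as 1 here, size-related features such as `mean radius` correlate negatively with `target`; the absolute value is what indicates how informative a feature is.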

Relationship between Features and Diagnosis

Alongside its 30 features, each sample carries a diagnosis label: malignant or benign. To understand how the features relate to the diagnosis, we use a bar plot to compare feature values for malignant and benign tumors. The plot shows that some features, such as the radius of the tumor, tend to be larger for malignant tumors and are therefore useful for telling the two classes apart.
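One way to draw such a comparison is to plot the per-class mean of a few features. The sketch below assumes the scikit-learn copy of the dataset; the three columns are an illustrative selection on similar scales, not necessarily the ones the project plotted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in Colab the figure renders inline
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer(as_frame=True)
df = bunch.data.copy()
# scikit-learn encodes the target as 0 = malignant, 1 = benign.
df["diagnosis"] = bunch.target.map({0: "malignant", 1: "benign"})

# Illustrative selection of features (an assumption, not the project's list).
columns = ["mean radius", "mean texture", "worst radius"]
group_means = df.groupby("diagnosis")[columns].mean()

fig, ax = plt.subplots(figsize=(8, 5))
group_means.T.plot.bar(ax=ax)  # one bar group per feature, one bar per class
ax.set_ylabel("Mean value")
ax.set_title("Feature means by diagnosis")
fig.tight_layout()

print(group_means)
```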

Splitting the Dataset

To estimate how well each model performs on unseen data, we split the dataset into training and test sets. The training set is used to fit the model, while the test set is held back and used only to evaluate the model's performance.
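A split of this kind can be sketched with scikit-learn's `train_test_split`. The README does not state the split ratio or seed, so `test_size=0.2` and `random_state=42` are assumptions; `stratify=y` is also an addition here, used to keep the malignant/benign ratio the same in both sets.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Assumed settings: 80/20 split, fixed seed, class-stratified sampling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())  # share of benign samples in each set
```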

Machine Learning Models

In this project, we use four machine learning algorithms to predict the diagnosis of breast cancer:

  • Logistic Regression
  • Decision Tree Classifier
  • Random Forest Classifier
  • Support Vector Classifier
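The four classifiers listed above can be trained and scored in one loop, as in the sketch below. The hyperparameters, the feature scaling for Logistic Regression and the Support Vector Classifier, the split ratio, and the seed are all assumptions, so the accuracies it prints will not necessarily match the figures reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling is added for the two models that are sensitive to feature scale.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # fit on the training set only
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.6f}")
```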

Three of the four algorithms (Logistic Regression, Decision Tree Classifier, and Support Vector Classifier) reach the same test accuracy of 0.982456. The Random Forest Classifier scores slightly lower, at 0.959064.

Comparison of Algorithm Accuracies

To compare the algorithms, we evaluate each trained model on the test set. Logistic Regression, the Decision Tree Classifier, and the Support Vector Classifier share the highest accuracy (0.982456), and the Random Forest Classifier ranks last (0.959064).
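Accuracy is the fraction of test samples classified correctly, and models with equal accuracy share a rank. The sketch below ranks the accuracies reported in this README. The helper names `accuracy` and `rank` are hypothetical, and the counts in the last line are illustrative only: the README does not state the test-set size.

```python
def accuracy(correct, total):
    """Fraction of test samples classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def rank(scores):
    """Group model names by accuracy, best first; tied models share a rank."""
    groups = {}
    for name, score in scores.items():
        groups.setdefault(score, []).append(name)
    return [(score, sorted(names)) for score, names in sorted(groups.items(), reverse=True)]


# Accuracies as reported in this README.
reported = {
    "Logistic Regression": 0.982456,
    "Decision Tree Classifier": 0.982456,
    "Support Vector Classifier": 0.982456,
    "Random Forest Classifier": 0.959064,
}

for place, (score, names) in enumerate(rank(reported), start=1):
    print(place, f"{score:.6f}", ", ".join(names))

# Hypothetical counts, for illustration: 112 correct out of 114 test samples.
print(round(accuracy(112, 114), 6))
```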

Conclusion

In this project, we developed machine learning models that predict the diagnosis of breast cancer using four different algorithms. The models were trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and evaluated on a held-out test set. The project demonstrates the potential of machine learning in the early detection and accurate diagnosis of breast cancer. However, the results are based on a single dataset and may not generalize to other populations. Overall, the project highlights the value of machine learning techniques in medical diagnosis and their potential to improve the quality of care for patients with breast cancer.

Data Attribution and Credit

This project uses the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison. We would like to extend our gratitude to Dr. Wolberg for providing access to this valuable dataset, which made this research possible.

Team Members

Name                            Email              ID
Rahul Chowdary Maddineni        [email protected]  R11976773
Sathwik Tatiparthi              [email protected]  R11976772
Teja Sri Dharma Reddy Vanukuri  [email protected]  R11974522
Shreyas Prabhakar               [email protected]  R11894057
Muhammad Talha Jabbar           [email protected]  R11914715
