Implement Logistic Regression Model for Amazon Order Status Prediction #1266
Have you read the Contributing Guidelines?
Yes
Description
This pull request includes the implementation of a machine learning pipeline to predict the status of Amazon orders using logistic regression. The main changes and additions are as follows:
1. Data Loading:
Loaded the Amazon order dataset using pandas.
Displayed the first few rows to understand the dataset structure.
2. Data Preprocessing:
Identified categorical columns and converted them to the 'category' data type.
Implemented a pipeline to handle missing values and one-hot encode categorical features.
Standardized numerical features using a separate pipeline.
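The two preprocessing pipelines described above might look roughly like the sketch below. The column names (`Category`, `Amount`), the toy rows, and the `most_frequent` imputation strategy are assumptions for illustration; the PR description does not list the real columns or the imputation strategy used.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real column names are not listed in this PR.
df = pd.DataFrame({
    "Category": ["Shirt", "Kurta", None, "Shirt"],
    "Amount": [399.0, 650.0, 520.0, 431.0],
})

categorical_cols = ["Category"]
numerical_cols = ["Amount"]

# Convert the categorical columns to the 'category' dtype.
df[categorical_cols] = df[categorical_cols].astype("category")

# Categorical pipeline: fill missing values (assumed strategy), then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Numerical pipeline: standardize to zero mean and unit variance.
numerical_transformer = Pipeline(steps=[("scaler", StandardScaler())])

cat_encoded = categorical_transformer.fit_transform(df[categorical_cols]).toarray()
num_scaled = numerical_transformer.fit_transform(df[numerical_cols])

print(cat_encoded.shape)  # one column per observed category
print(num_scaled.ravel())
```

Setting `handle_unknown="ignore"` is one way to keep the encoder from failing on categories that appear only in the test split.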
3. Feature Engineering:
Defined the feature set (X) by excluding the target variable (Status).
Split the dataset into training and testing sets (80% train, 20% test).
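As a minimal sketch of this step, assuming the target column is named `Status` as stated above; the other column names, the toy rows, and `random_state=42` are placeholders rather than values taken from the PR.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with a 'Status' target column.
df = pd.DataFrame({
    "Category": ["Shirt", "Kurta", "Shirt", "Kurta", "Shirt",
                 "Kurta", "Shirt", "Kurta", "Shirt", "Kurta"],
    "Amount": [399.0, 650.0, 420.0, 700.0, 380.0,
               640.0, 410.0, 720.0, 395.0, 660.0],
    "Status": ["Shipped", "Cancelled", "Shipped", "Cancelled", "Shipped",
               "Cancelled", "Shipped", "Cancelled", "Shipped", "Cancelled"],
})

# Features are every column except the target.
X = df.drop(columns=["Status"])
y = df["Status"]

# 80% train, 20% test; random_state is an assumed value for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```

If the status classes are imbalanced, passing `stratify=y` would keep the class proportions similar in both splits; whether the PR does this is not stated.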
4. Model Building:
Created a preprocessing pipeline combining categorical and numerical transformers.
Implemented a logistic regression model with an increased iteration limit (max_iter=10000).
Combined the preprocessing pipeline and logistic regression model into a single pipeline.
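One way the combined pipeline could be assembled is sketched here. The `max_iter=10000` setting comes from the description; the step names, column names, imputation strategy, and toy data are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the PR does not list the real ones.
categorical_cols = ["Category"]
numerical_cols = ["Amount"]

# Route each group of columns through its own transformer.
preprocessor = ColumnTransformer(transformers=[
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
    ("num", Pipeline(steps=[("scaler", StandardScaler())]), numerical_cols),
])

# Preprocessing and classifier in a single estimator.
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(max_iter=10000)),
])

# Smoke test on a tiny, made-up frame.
X = pd.DataFrame({
    "Category": ["Shirt", "Kurta", "Shirt", "Kurta"],
    "Amount": [399.0, 650.0, 420.0, 700.0],
})
X["Category"] = X["Category"].astype("category")
y = pd.Series(["Shipped", "Cancelled", "Shipped", "Cancelled"])

model.fit(X, y)
preds = model.predict(X)
print(preds)
```

Keeping the preprocessing inside the pipeline means the imputer, encoder, and scaler are fitted only on the data passed to `fit`, which avoids leaking test-set statistics into training.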
5. Model Training and Evaluation:
Trained the logistic regression model on the training set.
Predicted the order statuses on the test set.
Evaluated the model using accuracy score, classification report, and confusion matrix.
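The three evaluation calls can be illustrated on hand-made labels. The status values and predictions below are invented for the example and are not results from this PR.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up true and predicted statuses, standing in for y_test and model.predict(X_test).
y_test = ["Shipped", "Shipped", "Cancelled", "Shipped", "Cancelled", "Shipped"]
y_pred = ["Shipped", "Shipped", "Cancelled", "Cancelled", "Cancelled", "Shipped"]
labels = ["Cancelled", "Shipped"]

# Fraction of predictions that match the true label (5 of 6 here).
accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_test, y_pred, labels=labels)

# Per-class precision, recall, and F1 as a formatted string.
report = classification_report(y_test, y_pred, labels=labels, zero_division=0)

print(f"Accuracy: {accuracy:.4f}")
print(cm)
print(report)
```

With many status classes, the per-class rows of the report are usually more informative than the single accuracy figure, especially when some statuses are rare.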
6. Cross-Validation:
Implemented K-Fold cross-validation (5 folds) to assess the model's robustness.
Calculated and reported cross-validation scores and mean accuracy.
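A rough sketch of 5-fold cross-validation follows. It uses synthetic data from `make_classification` and a simplified scaler-plus-classifier pipeline in place of the PR's full pipeline; `shuffle=True` and `random_state=42` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the order dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Simplified pipeline; the PR's version also includes categorical preprocessing.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))

# Five folds; shuffling and the seed are assumed settings.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")

print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())
```

Because the whole pipeline is passed to `cross_val_score`, the scaler is refitted on each training fold rather than on the full dataset.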
Results:
The model achieved an accuracy of 97.22% on the test set.
Detailed classification report and confusion matrix provided insights into the model's performance across different statuses.
Cross-validation confirmed the model's robustness with a mean accuracy of 97.21%.
Future Enhancements:
Explore other machine learning algorithms to further improve accuracy.
Incorporate additional features or data sources.
Implement advanced techniques to handle imbalanced classes.
Files Modified:
Added the script for loading, preprocessing, training, and evaluating the model.
Please review the changes and provide feedback or approval for merging into the main branch.
Fixes #1229
Checklist
README.md and link to my code.
Related Issues or Pull Requests
Fixes #1229