Submission: 7: Prediction on the Animal Species #7
Comments
Data analysis review checklist
Reviewer: @dliviya
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1
Review Comments: Overall I feel that the machine learning part of the analysis was done very well. The group showed all the figures, though I do think they need to describe the figures a bit more. I was not able to reproduce the analysis, so I am assuming there was a missing dependency in the image. I am very impressed with the state of the repository; it is very easy to navigate and to find the information you are looking for. On the downside, there are many spelling mistakes, and the formatting of the README makes it difficult to read. I provided comments under each of the bullet points where I felt it was needed; please refer to those for more information.
Attribution: This was derived from the JOSE review checklist and the ROpenSci review checklist.
Data analysis review checklist
Reviewer: @edile47
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1
Review Comments:
Attribution: This was derived from the JOSE review checklist and the ROpenSci review checklist.
Data analysis review checklist
Reviewer: @anamhira47
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Would recommend adding a larger breadth of tests to ensure all edge cases are covered (see the sketch after this review).
Reproducibility
Analysis report
Estimated hours spent reviewing: 1 hour
Review Comments: Really good job on the project. I found the topic pretty interesting, and I liked how you used a number of different machine learning models to see which one was best and then chose among them in the end using an ensemble-type strategy. One thing I would fix: when running the Makefile I get an error, and it does not let me reproduce the file. This is the error that I get, and I believe it can be fixed pretty easily. Another small issue that can be fixed easily is that the container opens into the root environment rather than the directory that contains all the files. This is very minor, but fixing it would enhance ease of use. Overall I found your project really interesting, and the formatting of the README is good, making the instructions clear and easy to follow.
Attribution: This was derived from the JOSE review checklist and the ROpenSci review checklist.
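To make the test-breadth suggestion above concrete, here is a minimal sketch of the kind of edge-case tests that could be added, assuming a hypothetical preprocessing helper `load_zoo_data(path)`; the module name, function name, and the behaviors checked are all assumptions, not the project's actual API.

```python
# Hypothetical edge-case tests; assumes a helper `load_zoo_data(path)`
# exists in a `preprocess` module within the project.
import pytest
from preprocess import load_zoo_data  # hypothetical module and function

def test_missing_file_raises():
    # A nonexistent path should fail loudly, not return an empty frame.
    with pytest.raises(FileNotFoundError):
        load_zoo_data("does_not_exist.csv")

def test_expected_columns_present():
    # The predictor columns and the target should survive loading.
    data = load_zoo_data("data/zoo.csv")  # hypothetical path
    for col in ["hair", "feathers", "eggs", "milk", "type"]:
        assert col in data.columns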
Data analysis review checklist
Reviewer: @izk20
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 2 hours
Review Comments: I found your topic to be extremely interesting and quite educational. The idea of comparing and contrasting four classification models of varying complexity makes for a strong report. You also do a wonderful job of providing background information, the motivation for the topic, and short descriptions of both the models you are using and the final results.

There are some minor issues that I would like to point out. I was able to produce the csv files, tables, and graphs for your models using the Makefile; however, I got this error:

Another issue is the redundancy in the scripts. In your KNN script (knn_script.py), you carry out the following preprocessing steps:

```python
zoo_data = pd.read_csv(data_loc)
feature = zoo_data[["hair", "feathers", "eggs", "milk", "airborne",
                    "aquatic", "predator", "toothed", "backbone", "breathes",
                    "venomous", "fins", "legs", "tail", "domestic", "catsize"]]
X = feature
y = zoo_data['type']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
```

However, the exact same code is used to read and process the data and then split it into training and testing sets prior to building the SVM and Decision Tree models. I would recommend creating an additional script that carries out the pre-processing first and then outputs the training/testing data (or even the data in its state right before the split), so that these files are read directly by your models, the same way it is done in your original analysis notebook, as opposed to reusing large chunks of code in multiple scripts.

Another recommendation is to state why each of those models was chosen, and to explain a little more what the chunks of code are doing. Three of the models used are quite complex, so it would be helpful for the reader if there were some text between the chunks of code (separate from the comments, of which plenty were provided) to narrate the code a little more or explain what the results/steps mean.

Overall, your repository is complete and the analysis itself is strong. I found your figures and graphs readable (printing some tables/figures to the terminal when running the Makefile was a nice touch). Great work!
Attribution: This was derived from the JOSE review checklist and the ROpenSci review checklist.
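As a sketch of the refactoring suggested above (not the group's actual code), a single pre-processing script could read the raw data once, split it, and write train/test files that each model script then loads; the function name, file names, and paths below are assumptions.

```python
# Hypothetical pre-processing script: reads the raw data once, splits it,
# and writes train/test CSVs for the model scripts to consume.
import pandas as pd
from sklearn.model_selection import train_test_split

def preprocess(data_loc, out_dir):
    # Read the raw data a single time.
    zoo_data = pd.read_csv(data_loc)
    # Keep the predictor columns; drop the label and the animal name.
    features = zoo_data.drop(columns=["animal_name", "type"], errors="ignore")
    X_train, X_test, y_train, y_test = train_test_split(
        features, zoo_data["type"], test_size=0.2, random_state=4
    )
    # Each model script then reads these files instead of re-splitting.
    X_train.assign(type=y_train).to_csv(f"{out_dir}/train.csv", index=False)
    X_test.assign(type=y_test).to_csv(f"{out_dir}/test.csv", index=False)

if __name__ == "__main__":
    preprocess("data/zoo.csv", "data/processed")  # hypothetical paths
```

With this in place, knn_script.py and the SVM/Decision Tree scripts would each start from `pd.read_csv("data/processed/train.csv")` rather than repeating the read/select/split block.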
Submitting authors: @jossiej00 @sasiburi @poddarswakhar @luckyberen
Repository: https://github.com/DSCI-310/DSCI-310-Group-7
Abstract/executive summary:
The data set we used is the Zoo data set (1990) provided by the UCI Machine Learning Repository. It stores data about 7 classes of animals and their related attributes, including animal name, hair, feathers, and so on. In this project, we chose classification as our method to assign a given animal to its most likely type. We used 16 variables as our predictors: hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic, and catsize. To best predict the class of a new observation, we implemented and compared a list of algorithms including K-Nearest Neighbors (KNN), Decision Tree, Support Vector Machine, and Logistic Regression. After comparing the accuracy of the different methods, we found that the KNN method produces the most accurate predictions of animal type.
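For illustration only (this is not the group's code), the comparison described above could be run as a cross-validated accuracy loop over the four models; the hyperparameters, the file path, and the `animal_name` column label are assumptions.

```python
# Illustrative sketch of comparing the four classifiers by cross-validated
# accuracy; settings and paths are assumptions, not the project's actual ones.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

zoo = pd.read_csv("data/zoo.csv")  # hypothetical path
X = zoo.drop(columns=["animal_name", "type"], errors="ignore")
y = zoo["type"]

models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=4),
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    # 5-fold cross-validation gives a more stable accuracy estimate
    # than a single train/test split.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```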
Editor: @ttimbers
Reviewers: @dliviya @edile47 @anamhira47 @izk20