
05-Optimzation: Enhancements and fixes #57

Open
2 of 6 tasks
manojneuro opened this issue Jun 16, 2020 · 0 comments

manojneuro commented Jun 16, 2020

The following items can be improved in this notebook:

  • The early sections "Recap" and "Dataset" are almost identical, so one of them is redundant

  • Exercise 1
    Presumably the expectation is to separate the train/test sets both for the classifier and for the voxel selection. It might be worth emphasizing that using all of the data for voxel selection is a common but subtle error; there are probably quite a few examples in the literature that got past less technical reviewers
    In this example, I consistently get performance slightly below chance. I believe this is driven by the cross-validation; see:
    "Classification based hypothesis testing in neuroscience: Below-chance level classification rates and overlooked statistical properties of linear parametric classifiers," Human Brain Mapping, 2016
    Another subtle example of bias is given by Watts et al. 😊 "Potholes and Molehills: Bias in the Diagnostic Performance of Diffusion-Tensor Imaging in Concussion," Radiology, 2014

  • In 3.1 Grid search
    Strictly speaking, the number of combinations does not depend exponentially on the granularity of the grid search: it is the product of the number of values tried per parameter, so it is linear in the granularity of each axis (though exponential in the number of parameters)

  • 3.2 Regularization Example: L2 vs L1
    L1 regularization now requires an explicit solver (e.g. solver='saga') in the LogisticRegression call. This reflects a change in scikit-learn's defaults: since version 0.22 the default solver is 'lbfgs', which does not support the L1 penalty

  • 4. Build a Pipeline
    As with 3.1, many parameter settings give perfect accuracy. Perhaps classifying by blocks is too easy, and with a relatively small number of blocks, accuracy can only change in large steps

  • c_steps = [10e-1, 10e0, 10e1, 10e2] is confusing notation for the exponents: 10e0 is 10.0, not 1.0, so the list is [1, 10, 100, 1000]. Writing [1e0, 1e1, 1e2, 1e3] (or np.logspace(0, 3, 4)) would be clearer
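The Exercise 1 point about voxel selection can be sketched in code: putting feature selection inside a scikit-learn Pipeline ensures it is re-fit on each training fold only, so the test fold never influences which voxels are kept. A minimal sketch with synthetic data and random labels (the array sizes and estimators are hypothetical, not the notebook's):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))   # e.g. 60 samples, 500 "voxels"
y = rng.integers(0, 2, size=60)      # random labels: true accuracy is chance

# Selection happens inside the Pipeline, so each CV training fold
# picks its own voxels; the held-out fold plays no part in selection.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", SVC(kernel="linear")),
])
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())

# By contrast, running SelectKBest on ALL the data first and then
# cross-validating the classifier leaks test information and can
# produce well-above-chance accuracy even on random labels.
```

With random labels and leak-free selection, the mean score should hover around chance, which is exactly the check a reader can use to convince themselves the split is correct.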
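The grid-search point can be made concrete: the number of candidate settings is the product of the grid sizes per parameter. A minimal sketch (the parameter names and grids here are made up for illustration):

```python
from itertools import product

# Hypothetical grids: the count below is just the product of their sizes.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],   # 4 values
    "penalty": ["l1", "l2"],       # 2 values
    "max_iter": [100, 1000],       # 2 values
}

combos = list(product(*param_grid.values()))

# 4 * 2 * 2 = 16 combinations. Doubling the granularity of one axis
# doubles the count (linear in granularity); adding another parameter
# multiplies the count (exponential in the number of parameters).
print(len(combos))  # 16
```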
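For the solver issue in 3.2, a minimal sketch of an L1-penalized fit that runs on recent scikit-learn versions (the synthetic data here stands in for the notebook's dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = (X[:, 0] + 0.1 * rng.standard_normal(100) > 0).astype(int)

# The default solver ('lbfgs' since scikit-learn 0.22) rejects the L1
# penalty, so a compatible solver must be named explicitly.
clf = LogisticRegression(penalty="l1", solver="saga", max_iter=5000)
clf.fit(X, y)

# L1 regularization tends to drive some coefficients exactly to zero.
print((clf.coef_ == 0).sum())
```

'liblinear' also supports the L1 penalty; 'saga' is the choice that scales to larger datasets and also supports elastic net.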
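On the last point, the confusion is that `10e0` means `10 * 10**0 = 10.0`, not `1.0`. A quick check, with the clearer `1eN` spelling alongside:

```python
c_steps = [10e-1, 10e0, 10e1, 10e2]
print(c_steps)  # [1.0, 10.0, 100.0, 1000.0]

# The same values written so the exponent is visible at a glance:
clearer = [1e0, 1e1, 1e2, 1e3]
assert c_steps == clearer

# Equivalently, with NumPy: np.logspace(0, 3, num=4)
```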
