Implement stratified k-fold cross-validation #151

ibnesayeed · 2017-02-15T17:53:12Z

The current k-fold cross-validation assumes that the supplied sample data is uniformly randomized and therefore simply slices the array into individual folds. We should partition the data so that the proportion of each class is maintained in every fold. Stratification could be the default, the only partitioning option, or enabled through an optional boolean parameter.

Ch4s3 · 2017-02-22T16:06:31Z

I'm open to this, but wouldn't know how to do it.

ibnesayeed · 2017-02-27T23:45:32Z

To enforce this, we first have to bucket the supplied sample set by class and then partition each bucket into k equal parts. Finally, we pick one chunk from each bucket to build the data for each of the k folds. It is not difficult to do, and I can take care of it when I get a chance to play with the code again. For now, however, we shuffle the sample data before splitting, which should in theory have a similar effect, though less precisely, since the class balance in each fold depends on the randomness of the shuffle.
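
The bucketing steps described above could be sketched roughly as follows. This is only an illustration in Python, not code from this project: the function name `stratified_k_folds` and its parameters are hypothetical, and it hands out each bucket's items round-robin, so chunks are only nearly equal when a class size is not divisible by k.

```python
from collections import defaultdict


def stratified_k_folds(labels, k):
    """Return k folds (lists of sample indices) that roughly preserve class proportions.

    Hypothetical helper for illustration; not part of any existing library.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Step 1: bucket the sample indices by class label.
    buckets = defaultdict(list)
    for index, label in enumerate(labels):
        buckets[label].append(index)

    # Steps 2 and 3: split each bucket into k nearly equal chunks
    # (every k-th item starting at offset i) and give chunk i to fold i.
    folds = [[] for _ in range(k)]
    for indices in buckets.values():
        for i in range(k):
            folds[i].extend(indices[i::k])
    return folds


# Six "spam" samples and three "ham" samples split into three folds.
labels = ["spam"] * 6 + ["ham"] * 3
folds = stratified_k_folds(labels, 3)
print(folds)  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

Each fold here ends up with two "spam" samples and one "ham" sample, matching the 2:1 ratio of the full set. Shuffling the indices inside each bucket before chunking would keep the stratification while still randomizing which samples land in which fold.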
