Create a set of benchmark datasets #127
Comments
I can help with this. It'd be best to run this with a testing framework, so the tests or CI can check whether changes to the models/code (e.g. defaults or improvements) break or reduce performance.
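As a rough sketch of what such a CI regression check could look like (the dataset, model, and accuracy floor below are illustrative placeholders, not the project's actual benchmark or TabNet itself):

```python
# Sketch of a pytest-style accuracy-regression test. A real check would
# train TabNetClassifier on a fixed benchmark dataset with a pinned seed;
# here a small sklearn dataset and model stand in as placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

ACCURACY_FLOOR = 0.90  # assumed baseline; would be tuned per dataset/model


def test_accuracy_does_not_regress():
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    assert acc >= ACCURACY_FLOOR, f"accuracy {acc:.3f} fell below {ACCURACY_FLOOR}"
```

CI would then run this on every PR, so a change to defaults that hurt benchmark accuracy would fail the build rather than go unnoticed.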
Anecdotally, I recently noticed a drop in accuracy (or maybe convergence speed) on Forest Cover Type when upgrading the version of PyTorch... I'd be interested to see whether others experience the same, and to understand whether there's some issue that needs addressing or it's just statistical variation. Stopping at 200 epochs, I observed test accuracies of:
@athewsey thanks for reporting that, it seems like quite a lot for just changing the torch version. Have you been experimenting on the latest release, changing only the pytorch version? I understand that random seeds could change from one version to another, but after 200 epochs there should not be such a gap. @Hartorn @eduardocarvp did you notice such strong changes when monitoring tabnet scores?
@Optimox those figures were, I believe, all using develop code as of my recent PR #164. I took a random 80/10/10 training/validation/test split of Forest Cover Type and just tried the different PyTorch framework versions via AWS's provided deep learning container images on SageMaker, so all with Python 3.6, Ubuntu 16.04, and (if I interpret the container versioning correctly) CUDA 10.1... But there's a chance there are some small, relevant library differences between them. All the training was run on an ... I appreciate the library versions aren't as controlled as they could be between tests, and I'll try to re-run in a fully controlled/local env with only PyTorch differing if possible, but it's tricky as my current workflow is mostly set up for those pre-built images. I just thought it was worth mentioning for this ticket's prioritization, and because I hadn't seen discussion of cross-version benchmarking/accuracy checks elsewhere on the project.
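For reference, an 80/10/10 split like the one described can be produced with two chained `train_test_split` calls (a sketch; the dummy data and seed are arbitrary):

```python
# Sketch: random 80/10/10 train/validation/test split via two chained
# splits. The dummy X/y arrays stand in for the Forest Cover Type data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000) % 2

# First carve off 20%, then split that remainder evenly into valid/test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42
)
print(len(X_train), len(X_valid), len(X_test))  # 800 100 100
```

Pinning `random_state` in both calls is what makes the split reproducible across runs, which matters when comparing framework versions.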
Feature request
I created some Research Issues that would be interesting to work on. But it's hard to tell whether an idea is a good one without a clear benchmark on different datasets.
So it would be great to have a few notebooks that could run on different datasets in order to monitor the performance uplift of a new implementation.
What is the expected behavior?
The idea would be to run this for each improvement proposal and see whether it helped or not.
How should this be implemented in your opinion?
This issue could be closed little by little by adding new notebooks, each performing a benchmark on one well-known dataset.
Or maybe it's a better idea to incorporate tabnet into existing benchmarks like the CatBoost benchmarks: https://github.com/catboost/benchmarks
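A minimal harness that the per-dataset notebooks could wrap might look like this (a sketch: the registered datasets and model factory are stand-ins, not the project's actual benchmark suite):

```python
# Sketch of a minimal benchmark harness: run each registered dataset
# through a model factory and collect test accuracy. The sklearn toy
# datasets and model here are placeholders for real benchmark datasets.
from sklearn.datasets import load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

DATASETS = {"iris": load_iris, "wine": load_wine}


def run_benchmark(model_factory, seed=0):
    results = {}
    for name, loader in DATASETS.items():
        X, y = loader(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed, stratify=y
        )
        model = model_factory().fit(X_tr, y_tr)
        results[name] = model.score(X_te, y_te)
    return results


scores = run_benchmark(lambda: LogisticRegression(max_iter=5000))
print(scores)
```

Running the same harness before and after an improvement proposal would give the side-by-side numbers needed to judge whether the change helped.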
Are you willing to work on this yourself?
Yes, of course, but any help would be appreciated!