Collecting feature requests around a developmental feature for RAMP #250
Comments
|
Adding one feature that would be useful, at least to me: it would be great to be able to import code from elsewhere in a submission, allowing multiple submissions to share some code. Currently this can be done by creating a library and importing it, which is a bit tedious. |
Well it is more like "this makes me think of |
1- I find that the data-reading step takes too much time: it is slower than reading the data without RAMP.
Here are some features that could help:
|
From my (little) experience with RAMP, what made people a bit reluctant to use it was that it was too high level, meaning that we don't see the classical sequential process we are used to seeing in an ML script (load data, instantiate model, train it, test it). As an example, Keras (not the same purpose as RAMP) embedded some parts of the script to minimize the main script but kept the overall spirit of the classical script, making it as understandable as the original one. Using ramp-test on the command line may make RAMP more obscure to new users. Maybe having a small script (like the one already in the documentation, for example), giving the user a more pythonic way to play with it without having to use ramp-test as a command line, could make machine learners more willing to use it. |
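[Editor's note: for illustration, a minimal sketch of the "classical sequential script" this comment refers to, written with plain scikit-learn; the dataset and model are placeholders, not part of any RAMP kit.]

# Sketch of the classical sequence: load data, instantiate a model, train it, test it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                    # load data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100)       # instantiate model
clf.fit(X_train, y_train)                            # train it
print(accuracy_score(y_test, clf.predict(X_test)))   # test it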
I have heard this many times too. Debugging is a pain, etc. To fix this, for now I stick to RAMP kits where you need to return a sklearn estimator that implements fit and predict, so you can replace ramp-test by sklearn cross_val_score and just use your favorite env to inspect / debug / run (vscode, notebook, google colab etc.).
… |
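[Editor's note: a minimal sketch of the approach described above, under the assumption that the submission is a single file (here hypothetically submissions/starting_kit/estimator.py) exposing a get_estimator() function that returns a scikit-learn estimator, and that X_train and y_train come from the kit's own data-loading code.]

# Sketch: bypass ramp-test and score a submission with scikit-learn directly.
# The file path and the get_estimator() convention are assumptions, not rampwf API.
import importlib.util
from sklearn.model_selection import cross_val_score

spec = importlib.util.spec_from_file_location(
    'estimator', 'submissions/starting_kit/estimator.py')
submission = importlib.util.module_from_spec(spec)
spec.loader.exec_module(submission)

clf = submission.get_estimator()          # a plain sklearn estimator
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores.mean(), scores.std())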
This page https://paris-saclay-cds.github.io/ramp-docs/ramp-workflow/advanced/scoring.html now contains two code snippets that you can use to call lower-level elements of the workflow and emulate a simple train/test and cross-validation loop. @LudoHackathon do you have a suggestion what else would be useful? E.g. an example notebook in the library? |
the doc says:
trained_workflow = problem.workflow.train_submission(
'submissions/starting_kit', X_train, y_train)
after all these years I did not know this :'(
this should be explained in the kits to save students some pain
|
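[Editor's note: the snippet above can be taken a step further to emulate a full train/test round outside ramp-test. This is only a sketch; it assumes rampwf.utils.testing.assert_read_problem and the get_train_data/get_test_data conventions are available, so check the scoring page linked above for the exact calls.]

# Sketch of emulating a train/test round in Python instead of the ramp-test CLI.
from rampwf.utils.testing import assert_read_problem

problem = assert_read_problem()            # reads the kit's problem.py
X_train, y_train = problem.get_train_data()
X_test, y_test = problem.get_test_data()

trained_workflow = problem.workflow.train_submission(
    'submissions/starting_kit', X_train, y_train)
y_pred = problem.workflow.test_submission(trained_workflow, X_test)
# y_pred can then be scored with one of problem.score_types, or inspected directly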
wasn't this the purpose of the "Working in the notebook" section of the old titanic notebook starting kit? |
Yes, @albertcthomas is right, but the snippet in the doc is cleaner now. I'm doing this decomposition in every kit now, see for example line 36 here https://github.com/ramp-kits/optical_network_modelling/blob/master/optical_network_modelling_starting_kit.ipynb. This snippet is even simpler than the one in the doc but less general: it only works when the Predictions class does nothing with the input numpy array, which is the case most of the time (regression and classification). Feel free to reuse. |
The page is doing a good job at showing how you can call the different elements (and thus play with them, do plots, ...). Instead of

from rampwf.utils import assert_submission
assert_submission(submission='starting_kit')

we could have something like

from rampwf import ramp_test
ramp_test(submission='starting_kit')
For debugging with the command line I have to say that I rely a lot on adding a |
this is an important point. 2 or 3 years ago I was rarely using the command-line and I always preferred staying in a python environment. Users should be able to use their favorite tool to play with their models and we should make sure that at the end it will work when calling |
|
is this for 4. and |
doing:

import imp
feature_extractor = imp.load_source(
    '', 'submissions/starting_kit/feature_extractor.py')
fe = feature_extractor.FeatureExtractor()
classifier = imp.load_source(
    '', 'submissions/starting_kit/classifier.py')
clf = classifier.Classifier()

is to me too complex and should be avoided. We have a way suggested by @kegl based on the rampwf function.
Now I agree with @albertcthomas: leaving the notebook to edit python files is a bit error prone. What I have shown to students is to use the %%file magic to write a cell to a file on disk.
Anyway, I think we should show in each notebook what the easy way is. The ramp-test command is an easy way for us to know that it works on their systems, but not the most agile way when they need to come up with their own solution.
|
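[Editor's note: a sketch of the %%file approach mentioned above; in recent IPython the magic is also spelled %%writefile. The file path and the get_estimator() contents are illustrative, not a specific kit's interface.]

%%file submissions/my_submission/estimator.py
# Everything below the magic line is written to the file when the cell is run.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


def get_estimator():
    return make_pipeline(StandardScaler(), LogisticRegression())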
I'm not sure what you mean here. We're using |
I copied these lines from the titanic starting kit which is used to get students started on RAMP.
… |
yes |
Another feature that would be nice to have: an option to separate what is saved from what is printed to the console. |
Partial fit for models where, e.g., the number of trees or the number of epochs is a hyperparameter. This would mainly be a feature used by hyperopt (killing trainings early) but maybe also useful as a CLI parameter. |
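[Editor's note: a sketch of what this could look like on the user side with scikit-learn's warm_start, which is one possible mechanism for growing the number of trees incrementally so an outer loop can stop a training early; this is an assumption, not an existing RAMP feature.]

# Sketch: grow a forest in chunks so an outer loop (e.g. hyperopt) can stop early.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(warm_start=True, random_state=0)
for n_trees in (10, 50, 100, 200):
    clf.set_params(n_estimators=n_trees)   # warm_start: only the new trees are fit
    clf.fit(X_train, y_train)
    if clf.score(X_val, y_val) > 0.95:     # placeholder early-stopping rule
        break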
Standardized LaTeX tables computed from saved scores. Probably two steps: first, collect all scores (of selected submissions and data labels) into a well-designed pandas table; then, a set of tools to create LaTeX tables, scores with CIs, and also paired tests. I especially like the plots and score presentation in https://link.springer.com/article/10.1007/s10994-018-5724-2. |
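[Editor's note: a sketch of the first step with pandas, assuming the per-fold scores have already been collected; the submissions, folds and score values below are invented for illustration. pandas' to_latex covers the basic export, while CIs and paired tests would still need dedicated tools.]

# Sketch: aggregate per-fold scores into a table and export it to LaTeX.
import pandas as pd

scores = pd.DataFrame({
    'submission': ['starting_kit', 'starting_kit', 'my_model', 'my_model'],
    'fold': [0, 1, 0, 1],
    'rmse': [0.82, 0.79, 0.71, 0.74],
})
summary = scores.groupby('submission')['rmse'].agg(['mean', 'std'])
print(summary.to_latex(float_format='%.3f'))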
It would be great to have a look at MLflow; @agramfort pointed it out to me. There are some parts that we could use, for instance the tracking one. |
|
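[Editor's note: for reference, a minimal sketch of what the MLflow tracking part looks like; the run name, parameter, metric and file path are placeholders, and this is not wired into RAMP in any way.]

# Sketch: logging a submission's parameters and scores with MLflow tracking.
import mlflow

with mlflow.start_run(run_name='starting_kit'):        # run name is illustrative
    mlflow.log_param('submission', 'starting_kit')
    mlflow.log_metric('valid_rmse', 0.79)               # made-up score value
    mlflow.log_artifact('submissions/starting_kit/estimator.py')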
When RAMP is used for developing models for a problem, we may want to tag certain versions of a submission, and even problem.py, together with the scores. One idea is to use git tags. For example, after running ramp-test ... --save-output, one could run another script that git adds problem.py, the submission files, and the scores in training_output/fold_<i>, commits, and tags with a user-defined tag (plus maybe a prefix indicating that it is a scoring tag, so later we may automatically search for all such tags).
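[Editor's note: a sketch of what such a post-scoring script could do, calling plain git commands via subprocess; the tag prefix, paths and the location of the saved scores are assumptions, not an existing RAMP convention.]

# Sketch: after `ramp-test ... --save-output`, record the submission and its
# scores in git and tag the commit. The 'score/' prefix and paths are made up;
# the saved scores are assumed to live under the submission's training_output/.
import subprocess

submission = 'submissions/starting_kit'
tag = 'score/my-experiment-v1'            # user-defined tag with a scoring prefix

subprocess.run(['git', 'add', 'problem.py', submission], check=True)
subprocess.run(['git', 'commit', '-m', f'Scores for {tag}'], check=True)
subprocess.run(['git', 'tag', tag], check=True)
# later: `git tag --list "score/*"` finds all scoring tags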