feat: Design of EstimatorReport #997

Merged
merged 146 commits into from
Jan 10, 2025

Conversation

glemaitre
Member

@glemaitre glemaitre commented Dec 20, 2024

closes #834

Investigate an API for an EstimatorReport.

TODO

  • Metrics
    • handle string metrics as specified in the accessor
    • handle callable metrics
    • handle scikit-learn scorers
    • use the cache as efficiently as possible
    • add testing for all of those features
    • allow passing a new validation set to functions instead of using the internal validation set
    • add a proper help and rich __repr__
  • Plots
    • add the roc curve display
    • add the precision recall curve display
    • add prediction error display for regressor
    • make proper testing for those displays
    • add a proper __repr__ for those displays
  • Documentation
    • (done for the checked part) add an example to showcase all the different features
    • find a way to show the accessors documentation in the page of EstimatorReport. It could be a bit tricky because they are only defined once the instance is created.
      • We need to have a look at the series.rst page from pandas to see how they document this sort of pattern.
    • check the autocompletion: when typing report.metrics. followed by tab, it should provide the autocompletion. Edit: having a stub file actually works. I prefer this to type hints directly in the file.
  • Open questions
    • we use hashing to retrieve the external set.
    • use the caching for the external validation set? To make it work we need to compute the hash of potentially big arrays. This might be more costly than making the model predict.
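
The metric and caching items above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `CachedReport`, `_METRICS`, `_hash` and `metric` are hypothetical names, and the hash is computed with `hashlib` over a pickled copy of the data, which is not necessarily what skore does.

```python
import hashlib
import pickle


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the target."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry used to resolve string metrics.
_METRICS = {"accuracy": accuracy}


class CachedReport:
    """Hypothetical stand-in for a report that caches predictions."""

    def __init__(self, predict, X_test, y_test):
        self._predict = predict
        self._X_test, self._y_test = X_test, y_test
        self._cache = {}
        self.n_predict_calls = 0

    @staticmethod
    def _hash(data):
        # The cost of hashing grows with the size of the data, which is
        # the trade-off raised in the open questions.
        return hashlib.sha256(pickle.dumps(data)).hexdigest()

    def _get_predictions(self, X):
        key = ("predict", self._hash(X))
        if key not in self._cache:
            self.n_predict_calls += 1
            self._cache[key] = self._predict(X)
        return self._cache[key]

    def metric(self, scorer, X=None, y=None):
        """Accept a metric name or a callable; default to the internal set."""
        if isinstance(scorer, str):
            scorer = _METRICS[scorer]
        if X is None:
            X, y = self._X_test, self._y_test
        return scorer(y, self._get_predictions(X))


report = CachedReport(
    predict=lambda X: [x > 0 for x in X],
    X_test=[-1, 2, 3],
    y_test=[False, True, False],
)
first = report.metric("accuracy")   # computes the predictions
second = report.metric(accuracy)    # reuses the cached predictions
external = report.metric(accuracy, X=[1, -1], y=[True, False])
```

Here the second call does not invoke the model again, while the external set triggers one more prediction and pays for one more hash.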

Notes

This PR builds upon:

@glemaitre glemaitre marked this pull request as draft December 20, 2024 21:10
@glemaitre glemaitre changed the title feat: design of ModelReport feat: Design of ModelReport Dec 20, 2024
skore/pyproject.toml Outdated
@@ -1,9 +1,11 @@
"""Enhance `sklearn` functions."""

from skore.sklearn._estimator import EstimatorReport
Collaborator

It's disturbing that you want to expose something from a private/protected module.
Shouldn't skore.sklearn.estimator be exposed too by removing _?

Member Author

Basically, I want the user to be able to do

skore.EstimatorReport

or

skore.sklearn.EstimatorReport

but I don't want to expose anything at a lower level. In scikit-learn (and other packages), whenever you don't want people to import from a module, you prefix its name with an _, even if it is a folder.

For instance, I would probably do the same for cross_validation.

However, it is something that we can discuss later.
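
To make the convention concrete, here is a minimal, self-contained sketch. It builds a throwaway package in a temporary directory; the package name `mypkg` and its layout are hypothetical and only mirror the idea of a private `_estimator` folder that is re-exported from a public namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"  # hypothetical package name
(pkg / "_estimator").mkdir(parents=True)

# Private subpackage: holds the implementation.
(pkg / "_estimator" / "__init__.py").write_text(
    "class EstimatorReport:\n    pass\n"
)
# Public package: re-exports the class and lists it in __all__.
(pkg / "__init__.py").write_text(
    "from mypkg._estimator import EstimatorReport\n"
    "__all__ = ['EstimatorReport']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
mypkg = importlib.import_module("mypkg")

# Users import from the public path; the class still records where it lives.
public_name = mypkg.EstimatorReport.__name__
defined_in = mypkg.EstimatorReport.__module__
```

The leading underscore does not prevent `import mypkg._estimator`; it only signals that the path is not part of the public API.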

skore/tests/conftest.py Outdated
skore/tests/conftest.py Outdated
"""Setup and teardown fixture for matplotlib.

This fixture checks if we can import matplotlib. If not, the tests will be
skipped. Otherwise, we close the figures before and after running the
Collaborator

FMI, why close the figures before, and not just after?

Member Author

I don't have a definitive answer since I did not write this fixture in scikit-learn. What I can infer is that a test might fail and never reach the teardown, so closing the figures beforehand gives the subsequent test a clean start. However, I'm unsure.
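
One way to picture the guess above is with a plain context manager. This is an analogy only, not the fixture from the diff: `OPEN_FIGURES` is a hypothetical stand-in for matplotlib's global figure state, and in a real pytest fixture the two cleanup steps would be calls such as `matplotlib.pyplot.close("all")`.

```python
from contextlib import contextmanager

# Hypothetical stand-in for matplotlib's global registry of open figures.
OPEN_FIGURES = []


@contextmanager
def clean_figures():
    OPEN_FIGURES.clear()      # setup: discard anything leaked earlier
    try:
        yield
    finally:
        OPEN_FIGURES.clear()  # teardown: leave nothing behind


# Some earlier code that did not go through the fixture leaks a figure.
OPEN_FIGURES.append("leaked figure")

with clean_figures():
    seen_at_start = list(OPEN_FIGURES)
    OPEN_FIGURES.append("figure from this test")

seen_at_end = list(OPEN_FIGURES)
```

With the setup step, the test starts from an empty state even though something leaked before it; with teardown alone it would have seen the leaked figure.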

skore/src/skore/utils/_accessor.py Outdated
skore/tests/unit/sklearn/plot/__init__.py Outdated
"estimator[/bold cyan]"
)

def _create_help_tree(self):
Collaborator

@thomass-dev thomass-dev Jan 9, 2025

Can you please add to the helper the representation of the attributes of the reporter.
For instance, it can help users to know that the reporter contains the fitted estimator.

Member Author

I added an ending branch listing all getter and init attributes.

image
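
As a rough idea of what such a help tree with a trailing attributes branch might look like, here is a plain-text sketch. It is not the implementation from this PR (the diff fragment above suggests rich-style markup is used there); `Report` and `help_tree` are hypothetical names.

```python
class Report:
    """Hypothetical report with one method and one public attribute."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._cache = {}  # private: not listed in the help

    def clean_cache(self):
        self._cache.clear()


def help_tree(obj):
    """Render the public methods and attributes of `obj` as indented text."""
    names = [n for n in dir(obj) if not n.startswith("_")]
    groups = {
        "methods": [f".{n}(...)" for n in names if callable(getattr(obj, n))],
        "attributes": [f".{n}" for n in names if not callable(getattr(obj, n))],
    }
    lines = [type(obj).__name__]
    for group, entries in groups.items():
        lines.append(f"  {group}")
        lines.extend(f"    {entry}" for entry in entries)
    return "\n".join(lines)


tree = help_tree(Report(estimator="fitted model"))
```

Note that this simple split classifies any callable attribute as a method, so a real helper would need an explicit list of attributes.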

)
)
# trigger the computation
list(
Contributor

Maybe we could have a list of indeterminate progress bars instead of one progress bar that "jumps".

Member Author

Happy to see what we can do to improve the current state.

@@ -0,0 +1,168 @@
from typing import Any, Callable, Literal, Optional, Union
Collaborator

@thomass-dev thomass-dev Jan 9, 2025

To-do: check if removing the stub files breaks the auto-completion or not, and check if a work-around exists (ping @augustebaum).

Contributor

github-actions bot commented Jan 9, 2025

Documentation preview @ 82f6332

Contributor

github-actions bot commented Jan 9, 2025

Coverage

Coverage Report for backend
File | Stmts | Miss | Cover | Missing
venv/lib/python3.12/site-packages/skore
   __init__.py | 12 | 0 | 100% |
   __main__.py | 8 | 1 | 80% | 19
   exceptions.py | 3 | 0 | 100% |
venv/lib/python3.12/site-packages/skore/cli
   __init__.py | 5 | 0 | 100% |
   cli.py | 33 | 3 | 85% | 104, 111, 117
   color_format.py | 43 | 3 | 90% | 35–>40, 41–43
   launch_dashboard.py | 26 | 15 | 39% | 36–57
   quickstart_command.py | 14 | 7 | 50% | 37–51
venv/lib/python3.12/site-packages/skore/item
   __init__.py | 21 | 0 | 100% |
   cross_validation_item.py | 137 | 10 | 93% | 27–42, 370
   item.py | 41 | 13 | 68% | 85, 88, 92–112
   item_repository.py | 42 | 2 | 93% | 12–13
   media_item.py | 70 | 4 | 94% | 15–18
   numpy_array_item.py | 25 | 1 | 93% | 15
   pandas_dataframe_item.py | 34 | 1 | 95% | 15
   pandas_series_item.py | 34 | 1 | 95% | 15
   polars_dataframe_item.py | 32 | 1 | 94% | 15
   polars_series_item.py | 27 | 1 | 94% | 15
   primitive_item.py | 27 | 2 | 92% | 13–15
   sklearn_base_estimator_item.py | 33 | 1 | 95% | 15
   skrub_table_report_item.py | 10 | 1 | 86% | 11
venv/lib/python3.12/site-packages/skore/persistence
   __init__.py | 0 | 0 | 100% |
   abstract_storage.py | 22 | 1 | 95% | 130
   disk_cache_storage.py | 33 | 1 | 95% | 44
   in_memory_storage.py | 20 | 0 | 100% |
venv/lib/python3.12/site-packages/skore/project
   __init__.py | 3 | 0 | 100% |
   create.py | 52 | 8 | 88% | 116–122, 132–133, 140–141
   load.py | 23 | 3 | 89% | 43–45
   open.py | 14 | 0 | 100% |
   project.py | 64 | 4 | 91% | 135, 149, 183, 187
venv/lib/python3.12/site-packages/skore/sklearn
   __init__.py | 4 | 0 | 100% |
   find_ml_task.py | 35 | 1 | 95% | 41–>49, 50
   types.py | 2 | 0 | 100% |
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
   __init__.py | 10 | 0 | 100% |
   base.py | 76 | 2 | 98% | 87–88
   metrics_accessor.py | 198 | 2 | 98% | 131, 266
   report.py | 165 | 1 | 97% | 145–>151, 147–>149, 150, 153–>155, 159–>163, 408–>413
   utils.py | 11 | 11 | 0% | 1–19
venv/lib/python3.12/site-packages/skore/sklearn/_plot
   __init__.py | 4 | 0 | 100% |
   precision_recall_curve.py | 126 | 2 | 97% | 200–>203, 313–314
   prediction_error.py | 75 | 0 | 99% | 289–>297
   roc_curve.py | 95 | 3 | 94% | 156, 167–>170, 223–224
   utils.py | 77 | 0 | 100% |
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation
   __init__.py | 2 | 0 | 100% |
   cross_validation_helpers.py | 47 | 4 | 90% | 104–>136, 123–126
   cross_validation_reporter.py | 35 | 1 | 95% | 177
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation/plots
   __init__.py | 0 | 0 | 100% |
   compare_scores_plot.py | 29 | 1 | 92% | 10, 45–>48
   timing_plot.py | 29 | 1 | 94% | 10
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
   __init__.py | 0 | 0 | 100% |
   train_test_split.py | 34 | 2 | 94% | 15–16
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
   __init__.py | 8 | 0 | 100% |
   high_class_imbalance_too_few_examples_warning.py | 17 | 3 | 78% | 16–18, 80
   high_class_imbalance_warning.py | 18 | 2 | 88% | 16–18
   random_state_unset_warning.py | 11 | 1 | 87% | 15
   shuffle_true_warning.py | 9 | 0 | 91% | 44–>exit
   stratify_is_set_warning.py | 11 | 1 | 87% | 15
   time_based_column_warning.py | 22 | 1 | 89% | 17, 69–>exit
   train_test_split_warning.py | 5 | 1 | 80% | 21
venv/lib/python3.12/site-packages/skore/ui
   __init__.py | 0 | 0 | 100% |
   app.py | 25 | 5 | 71% | 24, 53–58
   dependencies.py | 7 | 1 | 86% | 12
   project_routes.py | 50 | 0 | 100% |
venv/lib/python3.12/site-packages/skore/utils
   __init__.py | 0 | 0 | 100% |
   _accessor.py | 7 | 0 | 100% |
   _logger.py | 21 | 4 | 84% | 14–18
   _show_versions.py | 31 | 0 | 100% |
venv/lib/python3.12/site-packages/skore/view
   __init__.py | 0 | 0 | 100% |
   view.py | 5 | 0 | 100% |
   view_repository.py | 16 | 2 | 83% | 8–9
TOTAL | 2225 | 136 | 93% |

Tests | Skipped | Failures | Errors | Time
349 | 0 💤 | 0 ❌ | 0 🔥 | 44.190s ⏱️

@glemaitre
Member Author

OK. It should be good to go and we should be able to iterate.

Contributor

@sylvaincom sylvaincom left a comment

Many thanks for this very useful PR @glemaitre and the whole team for reviewing it! Let's iterate on sub-issues if needed

@thomass-dev thomass-dev merged commit 1a4151a into probabl-ai:main Jan 10, 2025
18 checks passed
Development

Successfully merging this pull request may close these issues.

feat(back): Estimator Report
7 participants