Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Enable multitarget problem types for OWTestAndScore and OWPredictions #5848

Merged
merged 4 commits into from
Mar 21, 2022

Conversation

JakaKokosar
Copy link
Member

@JakaKokosar JakaKokosar commented Feb 17, 2022

Issue

In survival analysis, it is expected to have two target variables. First is the duration of time until the event of interest, and the second is the indicator of censorship.

Preferably, the Survival Analysis add-on would then use the same infrastructure for testing and scoring survival models as it is currently in place for classification and regression related problems. To achieve this, we need to loosen the constraint of a single target variable for the input data.

Description of changes

With this pull request, the task was not to change the interface to accommodate for all future tasks one would like to support but to:

  • allow the use of multitarget data and make no breaking changes to the existing codebase (do not change the default behaviour),
  • for a start, make survival models/scorers available in Orange.
Since the registration of Scorers is already implemented, the next step was how to find the usable scorers given the input data. The Scorer base class now holds additional [information](https://github.com/biolab/orange3/compare/master...JakaKokosar:multi_target?expand=1#diff-ebac791194327a764153704a5e2567261585ad3971623c2138be81e8c02b8da5R69) to recognise Scorers that are built-in and those implemented through add-ons.

If Scorer is defined as built-in, nothing changes. For non-built-in scorers, we look into Table attributes to determine the 'problem type' of input data. For example, for the survival analysis, As Survival data widget will set class variables to the output table and set attributes of the table as follows:

{
	..., 
	'problem_type': 'time_to_event'
}

Usable scorers are those that match the same problem_type with input data. This is not necessarily the best solution and could use further debate. At this stage, no significant changes to the code-base were needed. In theory, everything else should be handled by Learners, Models and Scorers defined for related tasks biolab/orange3-survival-analysis#27.

Some examples:
Screenshot 2022-02-17 at 15 37 02

Screenshot 2022-02-17 at 15 37 15 Screenshot 2022-02-17 at 15 38 15 Screenshot 2022-02-17 at 15 57 55
Includes
  • Code changes
  • Tests
  • Documentation

@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #5848 (f820dea) into master (9af442d) will decrease coverage by 0.01%.
The diff coverage is 92.07%.

@@            Coverage Diff             @@
##           master    #5848      +/-   ##
==========================================
- Coverage   86.29%   86.28%   -0.02%     
==========================================
  Files         315      315              
  Lines       66830    66884      +54     
==========================================
+ Hits        57674    57712      +38     
- Misses       9156     9172      +16     

Copy link
Contributor

@janezd janezd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code to see the idea. My comments refer to what I spotted, and do not mean I like the idea. (I don't. :)

The problem is not the idea itself. I dislike is that it is rather a patch over bad overall design. We should consider some deeper changes, though I fear they lead towards discussing finally moving to pandas.

I think that in light of survival analysis and similar problems, we should consider multiple roles of variables. Currently, a variable can be an independent variable, a dependent variable or a meta, and they are stored in different matrices. We had another type (a weight), which was supposed to be unique (e.g. you cannot choose between multiple weight variables on the same data). You propose to add another role...

Going pandas would mean abandoning X, Y and metas as permanently materialized, and instead having column-based representation. At the same time, every column could be assigned a role. class_var would then be a property that would return the variable that is assigned a "target role".

We need to decide whether to continue patching or bite into pandas.

Orange/classification/base_classification.py Outdated Show resolved Hide resolved
Orange/evaluation/scoring.py Outdated Show resolved Hide resolved
Orange/widgets/evaluate/owpredictions.py Outdated Show resolved Hide resolved
Orange/widgets/evaluate/owpredictions.py Outdated Show resolved Hide resolved
Orange/widgets/evaluate/owtestandscore.py Outdated Show resolved Hide resolved
Orange/widgets/evaluate/utils.py Outdated Show resolved Hide resolved
@janezd janezd removed their assignment Feb 25, 2022
@JakaKokosar JakaKokosar force-pushed the multi_target branch 3 times, most recently from d9b8d98 to 63b7129 Compare March 9, 2022 07:36
@JakaKokosar JakaKokosar changed the title [RFC] Remove single target variable constraint [ENH] Remove single target variable constraint Mar 10, 2022
@JakaKokosar JakaKokosar changed the title [ENH] Remove single target variable constraint [ENH] Enable multitarget problem types for OWTestAndScore and OWPredictions Mar 11, 2022
@JakaKokosar JakaKokosar force-pushed the multi_target branch 2 times, most recently from ba37414 to adc46c7 Compare March 14, 2022 08:44
@JakaKokosar JakaKokosar force-pushed the multi_target branch 4 times, most recently from 83163ea to 04cf2e3 Compare March 15, 2022 12:20
@JakaKokosar JakaKokosar force-pushed the multi_target branch 7 times, most recently from 8b327bd to cc5b1be Compare March 16, 2022 16:21
@JakaKokosar JakaKokosar force-pushed the multi_target branch 7 times, most recently from b29e861 to b9ee08b Compare March 18, 2022 14:25
Orange/base.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants