Complex synthetic dataset #25
A quick first pass of feedback.
examples/plot_complex_data.py
Outdated
DEFAULT_SHAPE_RANGES = (
    (0.7, 0.9),
    (1.0, 1.0),
    (2.0, 3.0),
)

DEFAULT_SCALE_RANGES = (
    (1, 20),
    (1, 10),
    (1.5, 5),
)
n_events = 3
Let's use the default parameters of the synthetic solver in the example.
Looking good, some feedback.
hazardous/data/_competing_weibull.py
Outdated
frame = pd.concat([X, y], axis=1)
- return Bunch(data=frame[X.columns], target=X[y.columns], frame=frame)
+ return Bunch(data=frame[X.columns], target=frame[y.columns], frame=frame)
Should we return `y_censored` instead of `y` (uncensored) in the Bunch when `return_X_y=False` and `return_uncensored_data=False` (default)?
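To make the question concrete, here is a minimal sketch of a Bunch whose target is the censored version of `y`. The column names `event` and `duration`, the fixed `censoring_time`, and the convention that event 0 marks a censored sample are all assumptions for illustration, not taken from the PR.

```python
import numpy as np
import pandas as pd
from sklearn.utils import Bunch

rng = np.random.RandomState(0)
X = pd.DataFrame({"feature_0": rng.randn(5)})

# Hypothetical uncensored targets: event id and time to event.
y = pd.DataFrame(
    {"event": [1, 2, 1, 3, 2], "duration": [4.0, 9.0, 2.0, 7.0, 5.0]}
)

# Assumption: a single fixed censoring time, with event 0 meaning "censored".
censoring_time = 6.0
y_censored = y.copy()
is_censored = y["duration"] > censoring_time
y_censored.loc[is_censored, "event"] = 0
y_censored.loc[is_censored, "duration"] = censoring_time

# The Bunch exposes the censored target, which is what a user would observe.
frame = pd.concat([X, y_censored], axis=1)
bunch = Bunch(
    data=frame[X.columns],
    target=frame[y_censored.columns],
    frame=frame,
)
```

With this layout, `bunch.target` never leaks event times beyond the censoring time.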
    degree_interaction=2,
    random_state=0,
):
    rng = np.random.RandomState(random_state)
- rng = np.random.RandomState(random_state)
+ rng = check_random_state(random_state)
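A short sketch of why `check_random_state` helps: it accepts `None`, an integer seed, or an existing `RandomState` instance, and passes an instance through unchanged. The helper `draw_durations` and its Weibull shape of 1.5 are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.utils import check_random_state


def draw_durations(n_samples, random_state=None):
    """Hypothetical helper: draw Weibull durations from a validated RNG."""
    rng = check_random_state(random_state)
    return rng.weibull(1.5, size=n_samples)


# An integer seed gives reproducible draws.
a = draw_durations(3, random_state=0)
b = draw_durations(3, random_state=0)

# A shared RandomState instance is reused as-is, so its state advances
# between calls instead of being reset.
shared = np.random.RandomState(0)
c = draw_durations(3, random_state=shared)
d = draw_durations(3, random_state=shared)

# None falls back to NumPy's global RandomState singleton.
e = draw_durations(3, random_state=None)
```

Passing a shared instance lets a caller thread one generator through several helpers without each of them restarting from the same seed.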
hazardous/data/_competing_weibull.py
Outdated
X, event_durations, duration_argmin = make_complex_features_with_sparse_matrix(
    n_events,
    n_samples,
    base_scale,
    shape_ranges,
    scale_ranges,
    n_features,
    features_rate,
    degree_interaction,
    random_state,
)
- X, event_durations, duration_argmin = make_complex_features_with_sparse_matrix(
-     n_events,
-     n_samples,
-     base_scale,
-     shape_ranges,
-     scale_ranges,
-     n_features,
-     features_rate,
-     degree_interaction,
-     random_state,
- )
+ X, event_durations, duration_argmin = make_complex_features_with_sparse_matrix(
+     n_events=n_events,
+     n_samples=n_samples,
+     base_scale=base_scale,
+     shape_ranges=shape_ranges,
+     scale_ranges=scale_ranges,
+     n_features=n_features,
+     features_rate=features_rate,
+     degree_interaction=degree_interaction,
+     random_state=random_state,
+ )
hazardous/data/_competing_weibull.py
Outdated
X, event_durations, duration_argmin = make_simple_features(
    n_events, n_samples, base_scale, shape_ranges, scale_ranges, random_state
)
- X, event_durations, duration_argmin = make_simple_features(
-     n_events, n_samples, base_scale, shape_ranges, scale_ranges, random_state
- )
+ X, event_durations, duration_argmin = make_simple_features(
+     n_events=n_events,
+     n_samples=n_samples,
+     base_scale=base_scale,
+     shape_ranges=shape_ranges,
+     scale_ranges=scale_ranges,
+     random_state=random_state,
+ )
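The motivation for passing these arguments by keyword can be sketched with a toy function. `make_features` below is hypothetical and far simpler than the helpers in the PR; it only shows that a positional call with two same-typed arguments swapped runs without error, while a keyword call is immune to argument order.

```python
def make_features(n_events, n_samples, base_scale=1_000, random_state=None):
    """Hypothetical stand-in that just echoes what it received."""
    return {
        "n_events": n_events,
        "n_samples": n_samples,
        "base_scale": base_scale,
        "random_state": random_state,
    }


n_events, n_samples = 3, 100

# Positional call with the first two arguments swapped by mistake:
# both are integers, so nothing fails and the bug goes unnoticed.
swapped = make_features(n_samples, n_events)

# Keyword call: the order at the call site no longer matters.
by_keyword = make_features(n_samples=n_samples, n_events=n_events)
```

Keyword calls also keep working if the callee's signature is later reordered or gains a new parameter in the middle.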
hazardous/data/_competing_weibull.py
Outdated
df_features = pd.DataFrame(rng.randn(n_samples, n_features))
df_features.columns = [f"feature_{i}" for i in range(n_features)]
Nitpick
- df_features = pd.DataFrame(rng.randn(n_samples, n_features))
- df_features.columns = [f"feature_{i}" for i in range(n_features)]
+ columns = [f"feature_{i}" for i in range(n_features)]
+ df_features = pd.DataFrame(
+     rng.randn(n_samples, n_features),
+     columns=columns,
+ )
hazardous/data/_competing_weibull.py
Outdated
shape = df_shape_scale_star[f"shape_{event}"].copy()
scale = df_shape_scale_star[f"scale_{event}"].copy()
Why do we need to make copies?
hazardous/data/_competing_weibull.py
Outdated
df_shape_scale_star[f"shape_{event}"] = (
    shape_min
    + (shape_max - shape_min)
    * expit(scaler.fit_transform(shape.values.reshape(-1, 1)))
We don't need the NumPy conversion here; we can pass `df_shape_scale_star[[f"shape_{event}"]]` directly to the transformer.
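A small sketch of the point above, assuming `scaler` is a scikit-learn `StandardScaler` (the hunk does not show which transformer the PR uses). Selecting the column with double brackets yields a one-column DataFrame, which is already 2D, so no `.values.reshape(-1, 1)` is needed. The toy data and the `shape_min` / `shape_max` values are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.special import expit
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
event = 1
df_shape_scale_star = pd.DataFrame({f"shape_{event}": rng.randn(6)})
shape_min, shape_max = 0.7, 0.9
scaler = StandardScaler()  # assumption: the PR's scaler may differ

# Original style: convert to NumPy and reshape to a 2D column.
via_numpy = scaler.fit_transform(
    df_shape_scale_star[f"shape_{event}"].values.reshape(-1, 1)
)

# Suggested style: a one-column DataFrame is accepted as-is.
via_frame = scaler.fit_transform(df_shape_scale_star[[f"shape_{event}"]])

# Squash into (shape_min, shape_max) with the logistic function, then
# flatten the (n_samples, 1) array before assigning it to the column.
rescaled = shape_min + (shape_max - shape_min) * expit(via_frame)
df_shape_scale_star[f"shape_{event}"] = rescaled.ravel()
```

Both calls return the same `(n_samples, 1)` array, so the change is purely a simplification.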
hazardous/data/_competing_weibull.py
Outdated
    censoring_relative_scale=1.5,
    random_state=0,
):
    rng = np.random.RandomState(random_state)
- rng = np.random.RandomState(random_state)
+ rng = check_random_state(random_state)
hazardous/data/_competing_weibull.py
Outdated
    n_features=10,
    features_rate=0.3,
    degree_interaction=2,
    independent=True,
To be more specific?
- independent=True,
+ independent_censoring=True,
hazardous/data/_competing_weibull.py
Outdated
    features_impact_censoring,
    0,
)
df_censoring_params = censoring_relative_scale * X @ features_impact_censoring
The variable names are quite hard to understand at first glance. Maybe `w_` something instead of `features_impact_censoring`, and `X_params` instead of `df_censoring_params`?
adding 50% marginal features and 50% interaction features in the creation of w_star
…hanged, some tests were broken)
Creating a complex dataset.