Complex synthetic dataset #25

Open

juAlberge wants to merge 12 commits into main from complex-synthetic-dataset

Collaborator

juAlberge commented Dec 11, 2023

Creating a complex dataset.

juAlberge added 5 commits

December 5, 2023 17:10


          Creating a new synthetic dataset. For now, the censoring method does …

ce322df

…not depend on the covariates.


          adding censoring depending on the covariates

ab5d372


          Synthetic dataset with expit

b038090


           refactoring the code, adding histograms.

4388bd5


          events correctly numbered

dbf1baa

ogrisel reviewed

Contributor

ogrisel left a comment

A quick first pass of feedback.

examples/plot_complex_data.py Outdated

examples/plot_complex_data.py Outdated

examples/plot_complex_data.py

examples/plot_complex_data.py Outdated

hazardous/data/__init__.py Outdated

juAlberge added 3 commits

December 13, 2023 15:04


          changes made after Olivier's review

2de17bb


          fix some tests.

f1eb9fc


          adding seaborn to push doc.

373745d

ogrisel reviewed

examples/plot_complex_data.py Outdated

Comment on lines 29 to 40

+              DEFAULT_SHAPE_RANGES = (
+                  (0.7, 0.9),
+                  (1.0, 1.0),
+                  (2.0, 3.0),
+              )
+              DEFAULT_SCALE_RANGES = (
+                  (1, 20),
+                  (1, 10),
+                  (1.5, 5),
+              )
+              n_events = 3

Contributor

ogrisel Dec 14, 2023

Let's use the default parameters of the synthetic solver in the example.

Vincent-Maladiere reviewed

Collaborator

Vincent-Maladiere left a comment

Looking good, some feedback.

hazardous/data/_competing_weibull.py Outdated

                   frame = pd.concat([X, y], axis=1)
-                  return Bunch(data=frame[X.columns], target=X[y.columns], frame=frame)
+                  return Bunch(data=frame[X.columns], target=frame[y.columns], frame=frame)

Collaborator

Vincent-Maladiere Dec 14, 2023

Should we return y_censored instead of y (uncensored) in Bunch when return_X_y=False and return_uncensored_data=False (default)?
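For context on the one-line fix in the diff above, here is a minimal sketch of why `X[y.columns]` cannot work as the target. The toy column names (`feature_0`, `event`, `duration`) are assumptions for illustration, not the library's actual schema, and the sketch does not address the open question of whether `y_censored` should be returned.

```python
import pandas as pd
from sklearn.utils import Bunch

# Hypothetical toy data; the column names are illustrative only.
X = pd.DataFrame({"feature_0": [0.1, 0.2], "feature_1": [1.0, 2.0]})
y = pd.DataFrame({"event": [1, 0], "duration": [3.5, 7.2]})

frame = pd.concat([X, y], axis=1)

# Old code path: X has no "event"/"duration" columns, so the lookup fails.
try:
    X[y.columns]
    old_lookup_failed = False
except KeyError:
    old_lookup_failed = True

# Fixed code path: the target columns are selected from the combined frame.
bunch = Bunch(data=frame[X.columns], target=frame[y.columns], frame=frame)
```

Selecting both `data` and `target` from `frame` keeps the two views aligned on the same index.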

hazardous/data/_competing_weibull.py

+                  degree_interaction=2,
+                  random_state=0,
+              ):
+                  rng = np.random.RandomState(random_state)

Collaborator

Vincent-Maladiere Dec 14, 2023

Suggested change

-                  rng = np.random.RandomState(random_state)
+                  rng = check_random_state(random_state)
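A short sketch of what the suggestion buys: scikit-learn's `check_random_state` accepts `None`, an integer seed, or an existing `RandomState` instance, so callers can share one generator across several calls. The helper `draw_noise` is hypothetical and only stands in for the dataset function.

```python
import numpy as np
from sklearn.utils import check_random_state


def draw_noise(n_samples, random_state=None):
    """Hypothetical helper standing in for the dataset generator."""
    # Accepts None, an int seed, or an existing RandomState instance.
    rng = check_random_state(random_state)
    return rng.randn(n_samples)


# An integer seed gives reproducible draws.
a = draw_noise(3, random_state=0)
b = draw_noise(3, random_state=0)

# A shared RandomState instance is reused as-is, so successive calls
# advance the same stream instead of restarting it.
shared = np.random.RandomState(0)
c = draw_noise(3, random_state=shared)
d = draw_noise(3, random_state=shared)
```

Here `a`, `b`, and `c` hold the same values, while `d` continues the stream of `shared`.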

hazardous/data/_competing_weibull.py Outdated

Comment on lines 245 to 255

+                      X, event_durations, duration_argmin = make_complex_features_with_sparse_matrix(
+                          n_events,
+                          n_samples,
+                          base_scale,
+                          shape_ranges,
+                          scale_ranges,
+                          n_features,
+                          features_rate,
+                          degree_interaction,
+                          random_state,
+                      )

Collaborator

Vincent-Maladiere Dec 14, 2023

Suggested change

-                      X, event_durations, duration_argmin = make_complex_features_with_sparse_matrix(
-                          n_events,
-                          n_samples,
-                          base_scale,
-                          shape_ranges,
-                          scale_ranges,
-                          n_features,
-                          features_rate,
-                          degree_interaction,
-                          random_state,
-                      )
+                      X, event_durations, duration_argmin = make_complex_features_with_sparse_matrix(
+                          n_events=n_events,
+                          n_samples=n_samples,
+                          base_scale=base_scale,
+                          shape_ranges=shape_ranges,
+                          scale_ranges=scale_ranges,
+                          n_features=n_features,
+                          features_rate=features_rate,
+                          degree_interaction=degree_interaction,
+                          random_state=random_state,
+                      )

hazardous/data/_competing_weibull.py Outdated

Comment on lines 257 to 259

+                      X, event_durations, duration_argmin = make_simple_features(
+                          n_events, n_samples, base_scale, shape_ranges, scale_ranges, random_state
+                      )

Collaborator

Vincent-Maladiere Dec 14, 2023

Suggested change

-                      X, event_durations, duration_argmin = make_simple_features(
-                          n_events, n_samples, base_scale, shape_ranges, scale_ranges, random_state
-                      )
+                      X, event_durations, duration_argmin = make_simple_features(
+                          n_events=n_events,
+                          n_samples=n_samples,
+                          base_scale=base_scale,
+                          shape_ranges=shape_ranges,
+                          scale_ranges=scale_ranges,
+                          random_state=random_state,
+                      )

hazardous/data/_competing_weibull.py Outdated

Comment on lines 186 to 187

		df_features = pd.DataFrame(rng.randn(n_samples, n_features))
		df_features.columns = [f"feature_{i}" for i in range(n_features)]

Collaborator

Vincent-Maladiere Dec 14, 2023

Nitpick

Suggested change

-                  df_features = pd.DataFrame(rng.randn(n_samples, n_features))
-                  df_features.columns = [f"feature_{i}" for i in range(n_features)]
+                  columns = [f"feature_{i}" for i in range(n_features)]
+                  df_features = pd.DataFrame(
+                      rng.randn(n_samples, n_features),
+                      columns=columns,
+                  )

hazardous/data/_competing_weibull.py Outdated

Comment on lines 114 to 115

		shape = df_shape_scale_star[f"shape_{event}"].copy()
		scale = df_shape_scale_star[f"scale_{event}"].copy()

Collaborator

Vincent-Maladiere Dec 14, 2023

Why do we need to make copies?

hazardous/data/_competing_weibull.py Outdated

+                  df_shape_scale_star[f"shape_{event}"] = (
+                      shape_min
+                      + (shape_max - shape_min)
+                      * expit(scaler.fit_transform(shape.values.reshape(-1, 1)))

Collaborator

Vincent-Maladiere Dec 14, 2023

We don't need the NumPy conversion here; we can pass df_shape_scale_star[[f"shape_{event}"]] directly to the transformer.
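A minimal sketch of the equivalence behind this comment. The PR does not show which transformer `scaler` is, so `StandardScaler` is an assumption here, and the toy column and the `(0.7, 0.9)` range are only illustrative. Double-bracket selection yields a one-column DataFrame, which is already 2D, so no `.values.reshape(-1, 1)` is needed.

```python
import numpy as np
import pandas as pd
from scipy.special import expit
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
event = 0
# Hypothetical stand-in for the frame built in _competing_weibull.py.
df_shape_scale_star = pd.DataFrame({f"shape_{event}": rng.randn(5)})
shape_min, shape_max = 0.7, 0.9

scaler = StandardScaler()  # assumption: the actual scaler is not shown in the PR

# Original form: Series -> NumPy array -> explicit 2D reshape.
shape = df_shape_scale_star[f"shape_{event}"]
via_numpy = shape_min + (shape_max - shape_min) * expit(
    scaler.fit_transform(shape.values.reshape(-1, 1))
)

# Suggested form: a one-column DataFrame is passed to the transformer as-is.
via_frame = shape_min + (shape_max - shape_min) * expit(
    scaler.fit_transform(df_shape_scale_star[[f"shape_{event}"]])
)
```

Both expressions produce an array of shape `(n_samples, 1)` whose values lie between `shape_min` and `shape_max`.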

hazardous/data/_competing_weibull.py Outdated

+                  censoring_relative_scale=1.5,
+                  random_state=0,
+              ):
+                  rng = np.random.RandomState(random_state)

Collaborator

Vincent-Maladiere Dec 14, 2023

Suggested change

-                  rng = np.random.RandomState(random_state)
+                  rng = check_random_state(random_state)

hazardous/data/_competing_weibull.py Outdated

+                  n_features=10,
+                  features_rate=0.3,
+                  degree_interaction=2,
+                  independent=True,

Collaborator

Vincent-Maladiere Dec 14, 2023

To be more specific?

Suggested change

-                  independent=True,
+                  independent_censoring=True,

hazardous/data/_competing_weibull.py Outdated

+                          features_impact_censoring,
+,
+                      )
+                      df_censoring_params = censoring_relative_scale * X @ features_impact_censoring

Collaborator

Vincent-Maladiere Dec 14, 2023

The variable names are quite hard to understand at first glance. Maybe w_ something instead of features_impact_censoring and X_params instead of df_censoring_params?

juAlberge added 4 commits

December 14, 2023 19:01


          fixing AJ estimator,

304dec7

adding 50% marginal features and 50% interaction features in the creation of w_star


           fixing tests (because shape and scale default parameters have been c…

59008ad

…hanged, some tests were broken)


          adjusting code per Vincent's suggestions

cdb3ad5


          refactoring code with Vincent, tuning parameters.

c5f3193

jovan-stojanovic mentioned this pull request

make_synthetic_competing_weibull function returns error with default parameters #64

Closed
