diff --git a/.doctrees/_notebooks/Basic_usage.doctree b/.doctrees/_notebooks/Basic_usage.doctree
index 6a6158d..3c5e753 100644
Binary files a/.doctrees/_notebooks/Basic_usage.doctree and b/.doctrees/_notebooks/Basic_usage.doctree differ
diff --git a/.doctrees/_notebooks/Tutorial.doctree b/.doctrees/_notebooks/Tutorial.doctree
index 39e58f7..547e02d 100644
Binary files a/.doctrees/_notebooks/Tutorial.doctree and b/.doctrees/_notebooks/Tutorial.doctree differ
diff --git a/.doctrees/_notebooks/Using_redflag_with_Pandas.doctree b/.doctrees/_notebooks/Using_redflag_with_Pandas.doctree
index c134810..cb80080 100644
Binary files a/.doctrees/_notebooks/Using_redflag_with_Pandas.doctree and b/.doctrees/_notebooks/Using_redflag_with_Pandas.doctree differ
diff --git a/.doctrees/_notebooks/Using_redflag_with_sklearn.doctree b/.doctrees/_notebooks/Using_redflag_with_sklearn.doctree
index 347d3ed..8db6715 100644
Binary files a/.doctrees/_notebooks/Using_redflag_with_sklearn.doctree and b/.doctrees/_notebooks/Using_redflag_with_sklearn.doctree differ
diff --git a/.doctrees/changelog.doctree b/.doctrees/changelog.doctree
index 1a2edd0..673bc76 100644
Binary files a/.doctrees/changelog.doctree and b/.doctrees/changelog.doctree differ
diff --git a/.doctrees/environment.pickle b/.doctrees/environment.pickle
index 3e4bdc2..7ae4339 100644
Binary files a/.doctrees/environment.pickle and b/.doctrees/environment.pickle differ
diff --git a/.doctrees/redflag.doctree b/.doctrees/redflag.doctree
index 63248d3..74d119a 100644
Binary files a/.doctrees/redflag.doctree and b/.doctrees/redflag.doctree differ
diff --git a/_images/c0c1962d29e7e150fc708abea8ff668162e08bc30868c49f83ff65489ab29050.png b/_images/c0c1962d29e7e150fc708abea8ff668162e08bc30868c49f83ff65489ab29050.png
new file mode 100644
index 0000000..6a31b82
Binary files /dev/null and b/_images/c0c1962d29e7e150fc708abea8ff668162e08bc30868c49f83ff65489ab29050.png differ
diff --git a/_images/d7c8d901883c9d1b74e19dacf493c347184637b9408d881e681ee912d6ba0c17.png b/_images/d7c8d901883c9d1b74e19dacf493c347184637b9408d881e681ee912d6ba0c17.png
deleted file mode 100644
index de5f7cc..0000000
Binary files a/_images/d7c8d901883c9d1b74e19dacf493c347184637b9408d881e681ee912d6ba0c17.png and /dev/null differ
diff --git a/_notebooks/Basic_usage.html b/_notebooks/Basic_usage.html
index 3e0580c..459b03d 100644
--- a/_notebooks/Basic_usage.html
+++ b/_notebooks/Basic_usage.html
@@ -241,7 +241,7 @@

🚩 Basic usage
-'0.1.dev1+g96ac82d'
+'0.4.0rc1'
 
@@ -547,7 +547,7 @@

Imbalance metrics
-<matplotlib.lines.Line2D at 0x7f17ca3f72d0>
+<matplotlib.lines.Line2D at 0x7f8ac0583310>
 
../_images/2bb44d03d42247969124582eea41e70bf462d350c4bfaceb853eecf4e89f5d6d.png
@@ -630,7 +630,7 @@

Outliers

This truncated normal distribution has no excess outliers: only about 60 points are flagged, compared with the 100 we would expect by chance at a 99% confidence level on a dataset of 10,000 records.
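The expected count is simple arithmetic: at a 99% confidence level, about 1% of points fall beyond the cutoff by chance even when there are no genuine outliers, so 10,000 × 0.01 = 100. The sketch below makes that comparison explicit. It flags points with a univariate two-sided z-score cutoff from `scipy.stats.norm`, which is an assumption for illustration and not necessarily the method redflag uses; the truncation bounds of ±2.5 are also assumed.

```python
import numpy as np
from scipy import stats


def expected_outliers(n: int, confidence: float = 0.99) -> float:
    """Number of points expected beyond the cutoff purely by chance."""
    return n * (1 - confidence)


def zscore_outliers(x: np.ndarray, confidence: float = 0.99) -> np.ndarray:
    """Indices of points outside a two-sided normal cutoff (illustrative method only)."""
    z = (x - x.mean()) / x.std()
    cutoff = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 2.576 at 99%
    return np.flatnonzero(np.abs(z) > cutoff)


rng = np.random.default_rng(42)
# Truncated normal on [-2.5, 2.5]; the bounds are an assumption for this sketch.
x = stats.truncnorm.rvs(-2.5, 2.5, size=10_000, random_state=rng)

n_found = len(zscore_outliers(x))
n_expected = expected_outliers(len(x))
print(f"{n_found} flagged vs {n_expected:.0f} expected")
if n_found <= n_expected:
    print("No more outliers than chance alone would produce.")
```

Because truncation removes the tails, a truncated normal should produce fewer flagged points than the chance baseline, which is the situation the text describes.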

@@ -737,7 +737,7 @@

Clipping
-<seaborn.axisgrid.FacetGrid at 0x7f17c81c3290>
+<seaborn.axisgrid.FacetGrid at 0x7f8a8d0df310>
 
../_images/cd5838a765b85cc46ad4d3822253aaa1b0e9802d751724777bec08aee732895f.png
@@ -782,7 +782,7 @@

Distribution shape
-Distribution(name='gumbel_r', shape=[], loc=10.040572536302586, scale=4.93432972751726)
+Distribution(name='gumbel_r', shape=[], loc=10.04057253630259, scale=4.93432972751726)
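The `Distribution(name='gumbel_r', ...)` output reports a fitted location and scale for a right-skewed Gumbel distribution. As a rough illustration of the same idea, the sketch below fits `scipy.stats.gumbel_r` by maximum likelihood to synthetic data and compares it with a normal fit using the Kolmogorov-Smirnov statistic. The true parameters (loc=10, scale=5) are assumptions chosen to resemble the output above; this is not a description of how redflag selects or fits distributions internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic right-skewed data; loc=10 and scale=5 are illustrative assumptions.
data = stats.gumbel_r.rvs(loc=10, scale=5, size=10_000, random_state=rng)

# Maximum-likelihood estimates of location and scale (gumbel_r has no shape parameter).
loc, scale = stats.gumbel_r.fit(data)
print(f"loc={loc:.2f}, scale={scale:.2f}")

# A smaller KS statistic means a closer match to the empirical distribution.
ks_gumbel = stats.kstest(data, "gumbel_r", args=(loc, scale)).statistic
ks_norm = stats.kstest(data, "norm", args=stats.norm.fit(data)).statistic
print(f"KS gumbel_r={ks_gumbel:.3f}, KS norm={ks_norm:.3f}")
```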
 
@@ -798,7 +798,7 @@

Distribution shape
-<seaborn.axisgrid.FacetGrid at 0x7f17c8141590>
+<seaborn.axisgrid.FacetGrid at 0x7f8a8cef7310>
 
../_images/9a24bafc24d9b00917a0e1f2bf75c12f68152f5e5322a8a58a83974f1943cf77.png
@@ -947,7 +947,7 @@

Feature importance
-array([0.25782719, 0.40422669, 0.272979  , 0.04737446, 0.        ])
+array([0.33755287, 0.24960534, 0.36065005, 0.05219174, 0.        ])
 
@@ -974,7 +974,7 @@

Feature importance
-array([1, 2, 0])
+array([2, 0, 1])
 
@@ -992,7 +992,7 @@

Feature importance
-
-array(['ss', 'ss'], dtype='<U2')
+array(['ms', 'ss'], dtype='<U2')
 
@@ -318,7 +318,7 @@

A quick look at

-'0.1.dev1+g96ac82d'
+'0.4.0rc1'
 
@@ -570,7 +570,7 @@

Clipping
-<seaborn.axisgrid.FacetGrid at 0x7f3a5e8bfa10>
+<seaborn.axisgrid.FacetGrid at 0x7f8051160c10>
 
../_images/cd5838a765b85cc46ad4d3822253aaa1b0e9802d751724777bec08aee732895f.png
@@ -623,7 +623,7 @@

Importance
-array([0.42306126, 0.19836246, 0.31959992, 0.05832766])
+array([0.42250322, 0.16792185, 0.31437961, 0.09519533])
 
@@ -660,16 +660,18 @@

Pipelines
Pipeline(steps=[('rf.imbalance', ImbalanceDetector()),
                 ('rf.clip', ClipDetector()),
                 ('rf.correlation', CorrelationDetector()),
+                ('rf.multimodality', MultimodalityDetector()),
                 ('rf.outlier', OutlierDetector()),
                 ('rf.distributions', DistributionComparator()),
                 ('rf.importance', ImportanceDetector()),
                 ('rf.dummy', DummyPredictor())])

+ ('rf.dummy', DummyPredictor())])

We can include this in other pipelines:
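The general pattern is an sklearn-compatible transformer that inspects the data during `fit`, emits a warning, and passes `X` through unchanged, so it can sit in front of any estimator. The sketch below uses a hypothetical `NegativeValueWarner` class written only for illustration; it is not redflag's `Detector`, whose constructor arguments should be checked in the redflag documentation. The synthetic data and the SVC model are also assumptions.

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


class NegativeValueWarner(BaseEstimator, TransformerMixin):
    """Hypothetical detector: warns about columns containing negative values, then passes X through."""

    def fit(self, X, y=None):
        X = np.asarray(X)
        bad = np.flatnonzero((X < 0).any(axis=0))
        if bad.size:
            warnings.warn(f"Features {bad.tolist()} have samples that are negative.")
        return self

    def transform(self, X):
        return X  # the data are not modified


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
X[:, 2] = rng.normal(size=200)  # a noise feature with negative values
y = (X[:, 0] + X[:, 1] > 1).astype(int)

pipe = make_pipeline(NegativeValueWarner(), SVC())
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)
print([str(w.message) for w in caught])
```

Because the check runs in `fit`, the warning appears when the pipeline is trained, and prediction passes through the detector without any extra cost beyond an identity transform.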

@@ -687,26 +689,29 @@

Pipelines

🚩 There are more outliers than expected in the training data (316 vs 31).
 
🚩 Feature 3 has low importance; check for relevance.
-ℹ️ Dummy classifier scores: {'f1': 0.25488459423559595, 'roc_auc': 0.5} (most_frequent strategy).
+ℹ️ Dummy classifier scores: {'f1': 0.26324448737561057, 'roc_auc': 0.5009389627941475} (stratified strategy).
 
Pipeline(steps=[('standardscaler', StandardScaler()),
@@ -732,28 +739,31 @@ 

Pipelines

-Pipeline(steps=[('detector',
-                 Detector(func=<function BaseRedflagDetector.__init__.<locals>.<lambda> at 0x7f3a5c794f40>,
-                          warning='are negative')),
-                ('svc', SVC())])

+                 Detector(func=<function BaseRedflagDetector.__init__.<locals>.<lambda> at 0x7f8050fff920>,
+                          message='are negative')),
+                ('svc', SVC())])

The noise feature we added has negative values; the others are all positive, which is what we expect for these data.

(Careful! Any non-constant feature will have negative values after standardization, because standardization centers each feature on zero.)
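A quick check of that caveat: scikit-learn's `StandardScaler` subtracts each column's mean, so even strictly positive data end up with negative values in every non-constant column. The uniform synthetic data below are an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(1, 100, size=(500, 4))  # every value is positive

Z = StandardScaler().fit_transform(X)
print("columns with negatives before:", (X < 0).any(axis=0))
print("columns with negatives after: ", (Z < 0).any(axis=0))
print("column means after:", Z.mean(axis=0).round(12))
```

So a "negative values" check is only meaningful on raw features, before any standardizing step in the pipeline.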

diff --git a/_notebooks/Using_redflag_with_Pandas.html b/_notebooks/Using_redflag_with_Pandas.html
index a96b665..1056253 100644
--- a/_notebooks/Using_redflag_with_Pandas.html
+++ b/_notebooks/Using_redflag_with_Pandas.html
@@ -234,7 +234,7 @@

🚩 Using redf
-'0.1.dev1+g96ac82d'
+'0.4.0rc1'
 
@@ -428,8 +428,8 @@

🚩 Using redf

-{'f1': 0.2346177681289338,
- 'roc_auc': 0.500927788909045,
+{'f1': 0.25070659735756834,
+ 'roc_auc': 0.5002038872861748,
  'strategy': 'stratified',
  'task': 'classification'}
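These baseline scores come from a dummy model that ignores the features. With the stratified strategy, predictions are drawn at random in proportion to the class frequencies, so ROC AUC stays close to 0.5. A minimal sketch with scikit-learn's `DummyClassifier`, assuming a synthetic binary target with about 20% positives and macro-averaged F1 (the averaging redflag uses is not stated here):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 3))  # features are ignored by the dummy model
y = (rng.uniform(size=n) < 0.2).astype(int)  # about 20% positives (assumed)

dummy = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)
scores = {
    "f1": f1_score(y, dummy.predict(X), average="macro"),
    "roc_auc": roc_auc_score(y, dummy.predict_proba(X)[:, 1]),
}
print(scores)
```

Any real model should beat these numbers comfortably; if it does not, the features may carry little signal.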
 
@@ -461,9 +461,9 @@

🚩 Using redf

Continuous data suitable for regression
-Outliers:    [  34   35  140  141  142  143  175  532  575  581  583  633  662  757
-  768  769  801 1316 1498 1547 1744 1754 1756 1778 1779 1780 1784 1785
- 1788 1808 1812 2884 2932 2973 2974 3004 3087 3094 3100 3109]
+Outliers:    [  34   35  140  141  142  143  175  532  575  581  583  633  662  768
+  769  773  801 1316 1498 1547 1744 1754 1756 1778 1779 1780 1784 1788
+ 1808 1812 2884 2932 2973 2974 3004 3087 3100 3109]
 Correlated:  True
 Dummy scores:{'mean': {'mean_squared_error': 47528.78263092096, 'r2': 0.0}}
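For regression, the mean strategy always predicts the training-set mean of the target. On the data it was fitted to, R² is therefore 0 and the mean squared error equals the (population) variance of the target, which is why `r2` reads 0.0 above. A minimal check with scikit-learn's `DummyRegressor` on synthetic data (an assumption):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 2))  # ignored by the dummy model
y = 300 + 200 * rng.standard_normal(1_000)

dummy = DummyRegressor(strategy="mean").fit(X, y)
pred = dummy.predict(X)
mse = mean_squared_error(y, pred)
r2 = r2_score(y, pred)
print({"mean_squared_error": mse, "r2": r2})
```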
 
diff --git a/_notebooks/Using_redflag_with_sklearn.html b/_notebooks/Using_redflag_with_sklearn.html
index 58f24bf..00a2cf2 100644
--- a/_notebooks/Using_redflag_with_sklearn.html
+++ b/_notebooks/Using_redflag_with_sklearn.html
@@ -496,16 +496,18 @@

Using the pre-built
Pipeline(steps=[('rf.imbalance', ImbalanceDetector()),
                 ('rf.clip', ClipDetector()),
                 ('rf.correlation', CorrelationDetector()),
+                ('rf.multimodality', MultimodalityDetector()),
                 ('rf.outlier', OutlierDetector()),
                 ('rf.distributions', DistributionComparator()),
                 ('rf.importance', ImportanceDetector()),
                 ('rf.dummy', DummyPredictor())])

+ ('rf.dummy', DummyPredictor())])
ImbalanceDetector()
ClipDetector()
CorrelationDetector()
MultimodalityDetector()
OutlierDetector()
DistributionComparator()
ImportanceDetector()
DummyPredictor()
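The pre-built pipeline now includes a `MultimodalityDetector`. One common way to check a feature for several modes, shown here as an assumption rather than redflag's actual method, is to smooth the data with a kernel density estimate and count its prominent peaks. The helper `count_modes` and its 5% prominence threshold are hypothetical choices for this sketch.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde


def count_modes(x: np.ndarray, grid_size: int = 512, min_prominence: float = 0.05) -> int:
    """Hypothetical helper: count KDE peaks with prominence above a fraction of the highest peak."""
    grid = np.linspace(x.min(), x.max(), grid_size)
    density = gaussian_kde(x)(grid)  # Gaussian KDE with Scott's-rule bandwidth
    peaks, _ = find_peaks(density, prominence=min_prominence * density.max())
    return len(peaks)


rng = np.random.default_rng(0)
unimodal = rng.normal(0, 1, size=2_000)
bimodal = np.concatenate([rng.normal(-5, 1, 1_000), rng.normal(5, 1, 1_000)])
print(count_modes(unimodal), count_modes(bimodal))
```

The prominence threshold keeps small wiggles in the density estimate from being counted as separate modes; a real detector might instead cluster the data or use a statistical test.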

We can use this in another pipeline:

@@ -525,26 +527,29 @@

Using the pre-built

ImbalanceDetector()
ClipDetector()
CorrelationDetector()
MultimodalityDetector()
OutlierDetector()
DistributionComparator()
ImportanceDetector()
DummyPredictor()
SVC()

During the fit phase, the redflag transformers do three things: