Changelog

All notable changes to this project will be documented in this file.

Release: PyCaret 2.3.10 | Release Date: April 10th, 2022 (BUG FIXES)

Fixed predict_model throwing an exception with loaded pipelines (pycaret#2349)
Fixed potential parameter leaking for ParallelBackend - thanks to @goodwanghan (pycaret#2339)
Refactored a piece of logic in arules - thanks to @daikikatsuragawa (pycaret#2316)
Added two tutorials in Chinese - thanks to @ryanxjhan (pycaret#2352)
Added the CLF101 tutorial in Chinese - thanks to @ryanxjhan (pycaret#2353)
Added new tutorials in Chinese - thanks to @ryanxjhan (pycaret#2375)

Release: PyCaret 2.3.9 | Release Date: March 27th, 2022 (BUG FIXES)

Made log_experiment more configurable (pycaret#2334, pycaret#2335)
Made return_train_score=False use the old output format (pycaret#2333)

Release: PyCaret 2.3.8 | Release Date: March 21st, 2022 (BUG FIXES)

Fixed dashboard_logger key error during setup (pycaret#2311)

Release: PyCaret 2.3.7 | Release Date: March 20th, 2022 (NEW FEATURES, BUG FIXES)

Added Fugue integration - thanks to @goodwanghan (pycaret#2035)
Added W&B experiment logger - thanks to @AyushExel (pycaret#2231)
Fixed a check_fairness exception when the index is not an ordinal number - thanks to @reza1615 (pycaret#2055)
Unsupported characters in dataframes are now replaced - thanks to @reza1615 (pycaret#2058)
Fixed drift report with categorical columns - thanks to @reza1615 (pycaret#2063)
Added a multivariate time series dataset from UCI - thanks to @reza1615 (pycaret#2094)
Fixed a UTF error during installation - thanks to @reza1615 (pycaret#2113)
MLFlow tracking API can now take in custom tags - thanks to @netoferraz (pycaret#1526)
Updated create_api function (pycaret#2146)
drift_report can now work with unseen data - thanks to @reza1615 (pycaret#2183)
Added Japanese tutorial - thanks to @hanaseleb (pycaret#2215)
Added Traffic and Drugs Related Violations dataset and example - thanks to @HaithemH (pycaret#2191)
Train score can now be returned from various supervised learning functions (return_train_score=True). Passing an unseen dataset with the label column to predict_model will now calculate the metrics for that dataset - thanks to @levelalphaone (pycaret#2237)
Fixed spelling mistakes in function docstrings - thanks to @aadarshsingh191198 (pycaret#2269)
Pinned numba<0.55 (pycaret#2056)

Release: PyCaret 2.3.6 | Release Date: January 12th, 2022 (NEW FEATURES, BUG FIXES)

Added new function create_app (pycaret#2044)
Refactored optimize_threshold function (pycaret#2041)
Added new function create_docker (pycaret#2005)
Added new function create_api (pycaret#2000)
Added new function check_fairness (pycaret#1997)
Added new function eda (pycaret#1983)
Added new function convert_model (pycaret#1959)
Added the ability to pass kwargs to plots in plot_model (https://github.com/pycaret/pycaret/pull/19400)
Added drift_report functionality to predict_model (pycaret#1935)
Added new function create_dashboard (pycaret#1925)
Added grid_interval parameter to optimize_threshold - thanks to @wolfryu (pycaret#1938)
Made logging level configurable by environment variable (pycaret#2026)
Made the optional path in AWS configurable (pycaret#2045)
Fixed TSNE plot with PCA (pycaret#2032)
Fixed rendering of streamlit plots (pycaret#2008)
Fixed class names in tree plot - thanks to @yamasakih (pycaret#1982)
Fixed NearZeroVariance preprocessor not being configurable - thanks to @Flyfoxs (pycaret#1952)
Removed duplicated code - thanks to @Flyfoxs (pycaret#1882)
Documentation improvements - thanks to @harsh204016, @khrapovs (https://github.com/pycaret/pycaret/pull/1931/files, pycaret#1956, pycaret#1946, pycaret#1949)
Pinned pyyaml<6.0.0 to fix issues with Google Colab
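
The optimize_threshold entries above (including the new grid_interval parameter) concern searching for a probability cut-off. The sketch below shows the general idea of a grid search over thresholds; it is not PyCaret's implementation. It assumes accuracy as the objective, uses a step size that plays the role of grid_interval, and the function names are hypothetical.

```python
def predict_labels(probs, threshold):
    """Turn positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search_threshold(y_true, probs, grid_interval=0.1):
    """Return the threshold in [0, 1] (as a float) with the best accuracy.

    Hypothetical helper: candidates are 0, grid_interval, 2 * grid_interval, ..., 1.
    """
    if not 0 < grid_interval <= 1:
        raise ValueError("grid_interval must be in (0, 1]")
    n_steps = int(round(1 / grid_interval))
    best_t, best_score = 0.0, -1.0
    for i in range(n_steps + 1):
        t = round(i * grid_interval, 10)  # round away float drift such as 0.30000000000000004
        score = accuracy(y_true, predict_labels(probs, t))
        if score > best_score:  # strict '>' keeps the lowest threshold among ties
            best_t, best_score = t, score
    return best_t
```

A coarser grid (larger step) evaluates fewer candidates but may miss a better cut-off between grid points.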

Release: PyCaret 2.3.5 | Release Date: November 19th, 2021 (NEW FEATURES, BUG FIXES)

Fixed an issue where Fix_multicollinearity would fail if the target was a float (pycaret#1640)
MLFlow runs are now nested - thanks to @jfagn (pycaret#1660)
Fixed a typo in REG102 tutorial - thanks to @bobo-jamson (pycaret#1684)
Fixed interpret_model not always respecting save_path (pycaret#1707)
Fixed certain plots not being logged by MLFlow (pycaret#1769)
Added dummy models to set a baseline in compare_models - thanks to @reza1615 (pycaret#1739)
Improved error message if a column specified in ignore_features doesn't exist in the dataset - thanks to @reza1615 (pycaret#1793)
Added the ability to set a custom probability threshold for binary classification through the probability_threshold argument in various methods (pycaret#1858)
Separated internal CV from validation CV for stack_models and calibrate_models (pycaret#1849, pycaret#1858)
A RuntimeError will now be raised if an incorrect version of scikit-learn is installed (pycaret#1870)
Improved README, documentation and repository structure
Unpinned numba (pycaret#1735)
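
The dummy-model entry above describes the models only as a baseline for compare_models. As a rough illustration of what such a baseline measures, the sketch below scores a most-frequent-class strategy (similar in spirit to scikit-learn's DummyClassifier with strategy='most_frequent'). It assumes accuracy as the metric, the function name is hypothetical, and it is not PyCaret's code.

```python
from collections import Counter


def dummy_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent class seen in training."""
    majority_class, _ = Counter(y_train).most_common(1)[0]
    correct = sum(1 for y in y_test if y == majority_class)
    return correct / len(y_test)
```

A real model that cannot beat this number has learned little beyond the class balance.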

Release: PyCaret 2.3.4 | Release Date: September 23rd, 2021 (NEW FEATURES, BUG FIXES)

Added get_leaderboard function for classification and regression modules
It is now possible to specify the plot save path with the save argument of plot_model and interpret_model - thanks to @bhanuteja2001 (pycaret#1537)
Fixed interpret_model affecting plot_model behavior - thanks to @naujgf (pycaret#1600)
Fixed issues with conda builds - thanks to @melonhead901 (pycaret#1479)
Documentation improvements - thanks to @caron14 and @harsh204016 (pycaret#1499, pycaret#1502)
Fixed blend_models and stack_models throwing an exception when using custom estimators (pycaret#1500)
Fixed a "Target Missing" issue with the "Remove Multicollinearity" option (pycaret#1508)
The errors="ignore" parameter of compare_models now correctly ignores errors during the full fit (pycaret#1510)
Fixed certain data types being incorrectly encoded as int64 during setup (pycaret#1515)
Pinned numba<0.54 (pycaret#1530)

Release: PyCaret 2.3.3 | Release Date: July 24th, 2021 (NEW FEATURES, BUG FIXES)

Fixed issues with [full] install by pinning interpret<=0.2.4
Added support for S3 folder path in deploy_model() with AWS
Enabled experimental Optuna TPESampler options to improve convergence (in tune_model())

Release: PyCaret 2.3.2 | Release Date: July 7th, 2021 (NEW FEATURES, BUG FIXES)

Implemented PDP, MSA and PFI plots in interpret_model - thanks to @IncubatorShokuhou (pycaret#1415)
Implemented Kolmogorov-Smirnov (KS) plot in plot_model under pycaret.classification module
Fixed a typo ("RVF" changed to "RBF") - thanks to @baturayo (pycaret#1220)
Readme & license updates and improvements
Fixed remove_multicollinearity considering categorical features
Fixed keyword issues with PyCaret's cuML wrappers
Improved performance of iterative imputation
Fixed gain and lift plots taking wrong arguments, creating misleading plots
interpret_model on LightGBM will now show a beeswarm plot
Multiple improvements to exception handling and documentation in pycaret.persistence (pycaret#1324)
The remove_perfect_collinearity option will now be shown in the setup() summary - thanks to @mjkanji (pycaret#1342)
Fixed IterativeImputer setting wrong float precision
Fixed custom grids in tune_model raising an exception when composed of lists
Improved documentation in pycaret.clustering - thanks to @susmitpy (pycaret#1372)
Added support for LightGBM CUDA version - thanks to @IncubatorShokuhou (pycaret#1396)
Exposed address in get_data for alternative data sources - thanks to @IncubatorShokuhou (pycaret#1416)

Release: PyCaret 2.3.1 | Release Date: April 28, 2021 (SEVERAL BUGS FIXED)

Fixed an exception with missing variables (display_container etc.) during load_config()
Fixed exceptions when using Ridge and RF estimators with cuML (GPU mode)
Fixed PyCaret's cuML wrappers not being pickleable
Added an extra check to get_all_object_vars_and_properties internal method, fixing exceptions with certain estimators
save_model() now supports kwargs, which will be passed to joblib.dump()
Fixed an issue with load_model() from AWS (duplicate .pkl extension) - thanks to markgrujic (pycaret#1128)
Fixed a typo in documentation - thanks to koorukuroo (pycaret#1149)
Optimized the Fix_multicollinearity transformer, drastically reducing the size of the saved pipeline
interpret_model() now supports data passed as an argument - thanks to jbechtel (pycaret#1184)
Removed infer_signature from MLflow logging when log_experiment=True.
Fixed a rare issue where binary_multiclass_score_func was not pickleable
Fixed edge case exceptions in feature selection
Fixed an exception with finalize_model when using GroupKFold CV
Pinned mlxtend>=0.17.0, imbalanced-learn==0.7.0, and gensim<4.0.0

Release: PyCaret 2.3.0 | Release Date: February 21, 2021

Modules Impacted: pycaret.classification pycaret.regression pycaret.clustering pycaret.anomaly pycaret.arules

Summary of Changes

Added new interactive residual plots in pycaret.regression module. You can now generate interactive residual plots by using residuals_interactive in the plot_model function.
Added plot rendering support for streamlit applications. A new parameter display_format has been added to the plot_model function. To render a plot in a streamlit app, set it to streamlit.
Revamped the Boruta feature selection algorithm (give it a try!).
tune_model in pycaret.classification and pycaret.regression is now compatible with custom models.
Added low_memory and max_len support to association rules module (pycaret#1008).
Increased robustness of DataFrame checks (pycaret#1005).
Improved loading of models from AWS (pycaret#1005).
Catboost and XGBoost are now optional dependencies. They are not automatically installed with the default slim installation. To install optional dependencies, use pip install pycaret[full].
Added raw_score argument in the predict_model function for pycaret.classification module. When set to True, scores for each class will be returned separately.
PyCaret now returns base scikit-learn objects, whenever possible.
When handle_unknown_categorical is set to False in the setup function, an exception will be raised during prediction if the data contains unknown levels in categorical features.
predict_model for multiclass classification now returns labels as integers.
Fixed an edge case where an IndexError would be raised in pycaret.clustering and pycaret.anomaly.
Fixed text formatting for certain plots in pycaret.classification and pycaret.regression.
If a logs.log file cannot be created when setup is initialized, no exception will be raised now (support for more configurable logging to come in future).
User-added metrics no longer raise exceptions and instead return 0.0.
Added compatibility with tune-sklearn>=0.2.0.
Fixed an edge case for dropping NaNs in target column.
Fixed stacked models not being tuned correctly.
Fixed an exception with KFold when fold_shuffle=False.
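
The entry about user-added metrics returning 0.0 describes a fail-soft behavior. A minimal sketch of that behavior as a Python decorator follows; safe_metric and mean_abs_pct_error are hypothetical names, and PyCaret's internal mechanism is not shown here.

```python
import functools


def safe_metric(metric_func):
    """Wrap a metric so that any exception it raises yields 0.0 instead."""
    @functools.wraps(metric_func)
    def wrapper(y_true, y_pred):
        try:
            return metric_func(y_true, y_pred)
        except Exception:
            # Swallow the error and fall back to 0.0 rather than aborting evaluation.
            return 0.0
    return wrapper


@safe_metric
def mean_abs_pct_error(y_true, y_pred):
    # Hypothetical user metric; raises ZeroDivisionError when a true value is 0.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

The trade-off is that a score of 0.0 can hide a genuinely broken metric, so it is worth checking any metric that reports exactly 0.0.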

Release: PyCaret 2.2.3 | Release Date: December 22, 2020 (SEVERAL BUG FIXES | CRITICAL COMPATIBILITY FIX)

Fixed exceptions with the predict_model function when data columns had non-string characters.
Fixed a rare exception with the remove_multicollinearity parameter in the setup function.
Improved performance and robustness of conversion of date features to categoricals.
Fixed an exception with the models function when the type parameter was passed.
The data frame displayed after setup can now be accessed with the pull function.
Fixed an exception with save_config
Fixed a rare case where the target column would be treated as an ID column and thus dropped.
SHAP plots can now be saved (set the save parameter to True).
| CRITICAL | Compatibility with catboost and pyod (other impacts unknown as of now) broke with sklearn=0.24 (released on Dec 22, 2020). As a temporary fix, requirements.txt now specifically requires 0.23.2.

Release: PyCaret 2.2.2 | Release Date: November 25, 2020 (SEVERAL BUG FIXES)

Fixed an issue with the optimize_threshold function in the pycaret.classification module. It now returns a float instead of an array.
Fixed an issue with the predict_model function. It now uses the original data frame to append the predictions. As such, any extra columns given at inference time are not removed from the returned predictions; instead, they are ignored internally when making predictions.
Fixed edge case exceptions for the create_model function in pycaret.clustering.
Fixed exceptions when column names are not strings.
Fixed exceptions in pycaret.regression when transform_target is True in the setup function.
Fixed an exception in the models function if the type parameter is specified.
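
The predict_model fix above means predictions are appended to the original rows while extra columns are ignored by the model. The sketch below illustrates that contract using plain dicts as a stand-in for DataFrame rows. predict_and_append and model_fn are hypothetical, and the Label column name is an assumption made for illustration.

```python
def predict_and_append(rows, feature_names, model_fn, label_key="Label"):
    """Predict from known feature columns only, then append the prediction to each original row.

    rows: list of dicts standing in for DataFrame rows. Keys not in feature_names
    (for example an ID column) are kept in the output but never passed to the model.
    """
    output = []
    for row in rows:
        features = {name: row[name] for name in feature_names}
        result = dict(row)  # copy so the caller's data is not mutated
        result[label_key] = model_fn(features)
        output.append(result)
    return output
```

Copying each row keeps the input unchanged, which mirrors returning a new frame rather than modifying the caller's data in place.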

Release: PyCaret 2.2.1 | Release Date: November 09, 2020 (SEVERAL BUG FIXES)

Post-release 2.2, the following issues have been fixed:

Fixed plot_model = 'tree' exceptions.
Fixed issue with predict_model causing errors with non-contiguous indices.
Fixed an issue with the remove_outliers parameter in the setup function introducing extra columns in the training data.
Fixed issue with plot_model in pycaret.clustering causing errors with non-contiguous indices.
Fixed an exception when the model was saved or logged when imputation_type is set to 'iterative' in the setup function.
compare_models now prints intermediate output when html=False.
Metrics in pycaret.classification for binary classification are now calculated with average='binary'. Previously they were a weighted average of the positive and negative classes; now they are calculated for the positive class only. For multiclass classification, average='weighted' is used.
optimize_threshold now returns the optimized probability threshold value as a numpy object.
Fixed issue with certain exceptions in compare_models.
Added profile_kwargs argument in the setup function to pass keyword arguments to Pandas Profiler.
plot_model, interpret_model, and evaluate_model now accept a new parameter use_train_data, which, when set to True, generates plots on the train data instead of the test data.
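
To make the averaging change for binary classification metrics concrete, the sketch below computes precision both ways, following the semantics of scikit-learn's average='binary' and average='weighted' options. The helper names are hypothetical, and precision stands in for the other metrics.

```python
def precision_for_class(y_true, y_pred, cls):
    """Fraction of predictions equal to cls that are correct (0.0 if cls is never predicted)."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == cls]
    if not predicted:
        return 0.0
    return sum(1 for t in predicted if t == cls) / len(predicted)


def precision_binary(y_true, y_pred, pos_label=1):
    # average='binary': score the positive class only.
    return precision_for_class(y_true, y_pred, pos_label)


def precision_weighted(y_true, y_pred):
    # average='weighted': per-class scores weighted by each class's share of y_true.
    n = len(y_true)
    return sum(
        precision_for_class(y_true, y_pred, c) * y_true.count(c) / n
        for c in sorted(set(y_true))
    )
```

With an imbalanced target, the weighted figure is pulled toward the majority (often negative) class, which is why the positive-class-only score can be noticeably lower.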

Release: PyCaret 2.2 | Release Date: October 28, 2020

Summary of Changes

Modules Impacted: pycaret.classification pycaret.regression pycaret.clustering pycaret.anomaly
Separate Train and Test Set: A new parameter test_data has been added to the setup function of pycaret.classification and pycaret.regression. When a DataFrame is passed to test_data, it is used as the holdout set and the train_size parameter is ignored. test_data must be labeled, and its shape must match that of data.
Disable Default Preprocessing: A new parameter preprocess has been added to the setup function. When preprocess is set to False, no transformations are applied except for train_test_split and the custom transformations passed in the custom_pipeline param. Data must be ready for modeling (no missing values, no dates, categorical data already encoded) when preprocess is set to False.
Custom Metrics: New functions get_metric, add_metric and remove_metric have been added to pycaret.classification, pycaret.regression, and pycaret.clustering. They can be used to add or remove metrics used in model evaluation.
Custom Transformations: A new parameter custom_pipeline has been added to the setup function. It takes a tuple of (str, transformer) or a list of such tuples. When passed, the custom transformers are added to the preprocessing pipeline and applied on each CV fold separately and on the final fit. All custom transformations are applied after train_test_split and before pycaret's internal transformations.
GPU-Enabled Training: To use a GPU for training, set the use_gpu parameter in the setup function to True or force. When set to True, the GPU is used for algorithms that support it, with a fallback to CPU for the rest. When set to force, only GPU-enabled algorithms are used, and an exception is raised if they are unavailable. The following algorithms are supported on GPU:
- Extreme Gradient Boosting pycaret.classification pycaret.regression
- LightGBM pycaret.classification pycaret.regression
- CatBoost pycaret.classification pycaret.regression
- Random Forest pycaret.classification pycaret.regression
- K-Nearest Neighbors pycaret.classification pycaret.regression
- Support Vector Machine pycaret.classification pycaret.regression
- Logistic Regression pycaret.classification
- Ridge Classifier pycaret.classification
- Linear Regression pycaret.regression
- Lasso Regression pycaret.regression
- Ridge Regression pycaret.regression
- Elastic Net (Regression) pycaret.regression
- K-Means pycaret.clustering
- Density-Based Spatial Clustering pycaret.clustering
Hyperparameter Tuning: New methods for hyperparameter tuning have been added to the tune_model function for pycaret.classification and pycaret.regression, through the new search_library and search_algorithm parameters. search_library can be scikit-learn, scikit-optimize, tune-sklearn, or optuna. The search_algorithm param can take the following values depending on its search_library:
- scikit-learn: random grid
- scikit-optimize: bayesian
- tune-sklearn: random grid bayesian hyperopt bohb
- optuna: random tpe
With the exception of scikit-learn, the search libraries are not hard dependencies of pycaret and must be installed separately.
Early Stopping: Early stopping is now supported for hyperparameter tuning. A new parameter early_stopping has been added to the tune_model function for pycaret.classification and pycaret.regression. It is ignored when search_library is scikit-learn, or if the estimator doesn't have a 'partial_fit' attribute. It can be either an object accepted by the search library or one of the following:
- asha for Asynchronous Successive Halving Algorithm
- hyperband for Hyperband
- median for median stopping rule
- When False or None, early stopping will not be used.
Iterative Imputation: Iterative imputation for numeric and categorical missing values is now implemented. New parameters imputation_type, iterative_imputation_iters, categorical_iterative_imputer, and numeric_iterative_imputer have been added to the setup function. Read the blog post for more details: https://www.linkedin.com/pulse/iterative-imputation-pycaret-22-antoni-baum/?trackingId=Shg1zF%2F%2FR5BE7XFpzfTHkA%3D%3D
New Plots: The following new plots have been added:
- lift pycaret.classification
- gain pycaret.classification
- tree pycaret.classification pycaret.regression
- feature_all pycaret.classification pycaret.regression
CatBoost Compatibility: CatBoostClassifier and CatBoostRegressor are now compatible with plot_model. This requires catboost>=0.23.2.
Log Plots in MLFlow Server: You can now log any plot available in the plot_model function to the MLFlow tracking server. To log specific plots, pass a list of plot IDs in the log_plots parameter. See the plot_model documentation for all available plots.
Data Split Stratification: A new parameter data_split_stratify has been added to the setup function of pycaret.classification and pycaret.regression. It controls stratification during train_test_split. When set to True, it stratifies by the target column. To stratify on other columns, pass a list of column names.
Fold Strategy: A new parameter fold_strategy has been added to the setup function for pycaret.classification and pycaret.regression. By default, it is 'stratifiedkfold' for pycaret.classification and 'kfold' for pycaret.regression. Possible values are:
- kfold for KFold CV;
- stratifiedkfold for Stratified KFold CV;
- groupkfold for Group KFold CV;
- timeseries for TimeSeriesSplit CV; or
- a custom CV generator object compatible with scikit-learn.
Global Fold Parameter: A new parameter fold has been added to the setup function for pycaret.classification and pycaret.regression. It controls the number of folds used in cross-validation. This is a global setting that can be overridden at the function level using the fold parameter of each function. It is ignored when fold_strategy is a custom object.
Fold Groups: Optional group labels when fold_strategy is groupkfold. It takes an array of shape (n_samples, ), where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the name of the column in the dataset containing the group labels.
Transformation Pipeline: All transformations are now applied after train_test_split.
Data Type Handling: Internal data type handling has been changed from int64 and float64 to int32 and float32 respectively, to improve memory usage and performance as well as compatibility with GPU-based algorithms.
AutoML Behavior Change: The automl function in pycaret.classification and pycaret.regression no longer refits the model on the entire dataset. If the model needs to be fitted on the entire dataset including the holdout set, finalize_model must be called explicitly.
Default Tuning Grid: The default hyperparameter tuning grid for RandomForest, XGBoost, CatBoost, and LightGBM has been amended to remove extreme values for max_depth and other training-intensive parameters, speeding up the tuning process.
Random Forest Default Values: Default value of n_estimators for RandomForestClassifier and RandomForestRegressor has been changed from 10 to 100 to make it consistent with the default behavior of scikit-learn.
AUC for Multiclass Classification: AUC for multiclass targets is now available in the metric evaluation.
Google Colab Display: All output printed on screen (information grid, score grids) is now formatted to be compatible with Google Colab, resulting in semantic improvements.
Sampling Parameter Removed: The sampling parameter has been removed from the setup function of pycaret.classification and pycaret.regression.
Type Hinting: In order to make both the usage and development easier, type hints have been added to all updated pycaret functions, in accordance with best practices. Users can leverage those by using an IDE with support for type hints.
Documentation: The All Modules documentation on the website has been retired. Updated documentation is available here: https://pycaret.readthedocs.io/en/latest/
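
The groupkfold fold strategy and the Fold Groups setting rely on one property: every row of a group lands in the same validation fold, so no group is split between training and validation. The sketch below shows that property in plain Python. group_kfold_indices is a hypothetical helper that assigns groups to folds round-robin. This is a simplification, since scikit-learn's GroupKFold instead tries to balance sample counts across folds.

```python
def group_kfold_indices(groups, n_splits=3):
    """Yield (train_idx, val_idx) pairs in which each group falls in exactly one validation fold.

    groups: one group label per training row (like the array of shape (n_samples, )).
    Groups are assigned to folds round-robin in order of first appearance.
    """
    unique_groups = list(dict.fromkeys(groups))  # de-duplicate, preserving order
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        val_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, val_idx
```

Keeping groups intact (for example, all rows from one customer) prevents the score from being inflated by near-duplicate rows appearing on both sides of a split.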

Function Level Changes