Recent Changes

H2O

3.46.0.5 - 8/28/2024

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.46.0/5/index.html
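The "Download at" links in these notes appear to follow one pattern: a release branch name (a version such as `3.46.0`, or a codename such as `zz_kurka` or `zygmund` for older releases) followed by a build number. The sketch below rebuilds such a link from those two parts. `release_url` is a hypothetical helper written for illustration, not part of H2O, and the pattern is inferred only from the links listed here.

```python
# Hypothetical helper (not part of H2O): rebuild the "Download at" URL
# pattern seen in these release notes from a branch name and build number.
BASE = "http://h2o-release.s3.amazonaws.com/h2o"


def release_url(branch: str, build: int) -> str:
    """Return the index page URL for a release branch and build number."""
    return f"{BASE}/rel-{branch}/{build}/index.html"


# Version-named branch (3.46.0.5) and codename branch (Kurka, 3.40.0.4).
print(release_url("3.46.0", 5))
print(release_url("zz_kurka", 4))
```

The build number is the last component of the four-part version, so 3.46.0.5 maps to branch `3.46.0`, build `5`.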

Bug

  • [#16328] - Updated how ModelSelection handles categorical predictors to preserve the best categorical predictor when the best categorical level performs well relative to other predictors.
  • [#16120] - Ensured that MOJO works for Isolation Forest and Extended Isolation Forest for implemented versions.

New Feature

  • [#16327] - Ensured H2O-3 can load data from Snowflake using JDBC connector.

Docs

  • [#16215] - Updated the following user guide pages to adhere to style guide updates: Algorithms, Supported data types, Quantiles, and Early stopping.
  • [#16207] - Updated the Starting H2O user guide page to adhere to style guide updates.
  • [#15989] - Updated Python documentation for Decision Tree algorithm.

Security

  • [#16349] - Addressed sonatype-2024-0171 by upgrading jackson-databind to 2.17.2.
  • [#16342] - Addressed SNYK-JAVA-DNSJAVA-7547403, SNYK-JAVA-DNSJAVA-7547404, SNYK-JAVA-DNSJAVA-7547405, and CVE-2024-25638 by upgrading dnsjava to 3.6.0.

3.46.0.4 - 7/9/2024

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.46.0/4/index.html

Docs

  • [#16212] - Updating user guide - H2O Clients.
  • [#16214] - Updating user guide - Data Manipulation.
  • [#16213] - Updating user guide - Getting data into your H2O cluster.

Security

  • [#15748] - Addressed PRISMA-2023-0067 by upgrading jackson-databind.

3.46.0.3 - 6/11/2024

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.46.0/3/index.html

Bug Fix

  • [#16274] - Fixed plotting for H2O Explainability by resolving issue in the matplotlib wrapper.
  • [#16192] - Fixed h2o.findSynonyms failing if the word parameter is unknown to the Word2Vec model.
  • [#15947] - Fixed skipped_columns error caused by mismatch during the call to parse_setup when constructing an H2OFrame.

Improvement

  • [#16278] - Added flag to enable use_multi_thread automatically when using as_data_frame.

New Feature

  • [#16284] - Added support for Websockets to steam.jar.

Docs

  • [#16189] - Updating user guide - Downloading & Installing H2O.
  • [#16288] - Fixed GBM Python example in user guide.
  • [#16188] - Updated API-related changes page to adhere to style guide requirements.
  • [#16016] - Added examples to Python documentation for Uplift DRF.
  • [#15988] - Added examples to Python documentation for Isotonic Regression.

3.46.0.2 - 5/13/2024

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.46.0/2/index.html

Bug Fix

  • [#16161] - Fixed parquet export throwing NPEs when column types are strings.
  • [#16149] - Fixed GAM models failing with datasets of certain size by rebalancing the dataset to avoid collision.
  • [#16130] - Removed distutils version check to stop deprecation warnings with Python 3.12.
  • [#16026] - Removed custom_metric_func from ModelSelection.
  • [#15697] - Fixed MOJO failing to recognize fold_column and therefore using wrong index calculated for the offset_column.

Improvement

  • [#16116] - Implemented a warning that the output will not be monotone if you use monotone splines for GAM without setting non_negative=True.
  • [#16056] - Added support to XGBoost for all gblinear parameters.
  • [#6722] - Implemented linear constraint support to GLM toolbox.

New Feature

  • [#16146] - Added ZSTD compression format support.

Docs

  • [#16193] - Added mapr7.0 to the download page for the Install on Hadoop tab.
  • [#16180] - Updated Index page to adhere to style guide requirements.
  • [#16131] - Added 3.46 release blog to the user guide.

Security

  • [#16170] - Addressed CVE-2024-21634 by upgrading aws-java-sdk-*.
  • [#16135] - Addressed CVE-2024-29131 by upgrading commons-configuration2.

3.46.0.1 - 3/13/2024

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.46.0/1/index.html

Bug Fix

  • [#16079] - Updated warning for multithreading in H2OFrame.as_data_frame.
  • [#16063] - Added error to explain method explaining incompatibility with UpliftDRF models.
  • [#16052] - Fixed finding best split point for UpliftDRF.
  • [#16043] - Fixed isin().
  • [#16036] - Fixed AstMatch failing with multinode.
  • [#15978] - Fixed Deep Learning Autoencoder MOJO PredictCSV failure.
  • [#15682] - Fixed log when web_ip is used.
  • [#15677] - Fixed match function only returning 1 and no match.

Improvement

  • [#16074] - Improved perRow metric calculation by implementing isGeneric() method.
  • [#16060] - Improved log message to show that Apple silicon is not supported.
  • [#16033] - Added optional GBLinear grid step to AutoML.
  • [#16015] - Suppressed the genmodel warnings when verbose=False.
  • [#15809] - Implemented ability to calculate full loglikelihood and AIC for an already-built GLM model.
  • [#15791] - Implemented early stopping for UpliftDRF and implemented gridable parameters for UpliftDRF.
  • [#15684] - Reconfigured all logs to standard error for level ERROR and FATAL.
  • [#7325] - Implemented prediction consistency check for constrained models.

New Feature

  • [#15993] - Added custom_metric as a hyperparameter for grid search.
  • [#15967] - Added custom metrics for XGBoost.
  • [#15858] - Implemented consistent mechanism that protects frames and their vecs from autodeletion.
  • [#15683] - Introduced a warning that the H2O REST API is listening on all interfaces when web_ip is not specified.
  • [#15654] - Introduced MLFlow flavors for working with H2O-3 MOJOs and POJOs instead of binary models.
  • [#6573] - Implemented machine learning interpretability support for UpliftDRF by allowing Uplift models to access partial dependence plots and variable importance.

Docs

  • [#16004] - Updated copyright year in user guide and Python guide.
  • [#16000] - Fixed Decision Tree Python example.
  • [#15930] - Fixed GLM Python example.
  • [#15915] - Added examples to Python documentation for Model Selection algorithm.
  • [#15798] - Added examples to Python documentation for GAM algorithm.
  • [#15709] - Added examples to Python documentation for ANOVA GLM algorithm.

Security

  • [#16102] - Addressed SNYK-JAVA-COMNIMBUSDS-6247633 by upgrading nimbus-jose-jwt to 9.37.2.
  • [#16093] - Addressed CVE-2024-26308 by upgrading org.apache.commons:commons-compress.
  • [#16067] - Addressed CVE-2023-35116 in the h2o-steam.jar.
  • [#15972] - Addressed CVE-2023-6038 by adding option to filter file system for reading and writing.
  • [#15971] - Addressed CVE-2023-6016 by introducing Java property that disables automatic import of POJOs during import_mojo or upload_mojo.

3.44.0.3 - 12/20/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.44.0/3/index.html

Bug Fix

  • [#15958] - Fixed maximum likelihood dispersion estimation for GLM tweedie family producing the wrong result for a specific dataset.
  • [#15936] - Added data frame transformations using polars since datatable cannot be installed on Python 3.10+.
  • [#15894] - Ensured that the functions that are supposed to be exported in the R package are exported.
  • [#15891] - Corrected sign in AIC calculation to fix problem with tweedie dispersion parameter estimation, AIC, and loglikelihood.
  • [#15887] - Allowed Python H2OFrame constructor to accept an existing H2OFrame.
  • [#6725] - Fixed LoggerFactory slf4j related regression.
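For context on the AIC sign correction in [#15891], the textbook definition is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and ln(L) is the log-likelihood. The sketch below uses that standard formula only. It is not H2O's internal code, and the function name and sample numbers are made up for illustration.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Textbook AIC: 2k - 2*ln(L). Lower values indicate a better trade-off."""
    return 2 * num_params - 2 * log_likelihood


# Two illustrative models with the same number of parameters.
better_fit = aic(log_likelihood=-100.0, num_params=3)
worse_fit = aic(log_likelihood=-120.5, num_params=3)

print(better_fit, worse_fit)
# With the correct sign, the model with the higher log-likelihood
# gets the lower (better) AIC.
print(better_fit < worse_fit)
```

If the sign on the log-likelihood term is flipped, the ordering of the two models reverses, which is why a sign error affects any model comparison based on AIC.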

Improvement

  • [#15937] - Exposed gainslift_bins parameter for Deep Learning, GAM, GLM, and Stacked Ensemble algorithms.
  • [#15916] - Sped up computation of Friedman-Popescu’s H statistic.

New Feature

  • [#15927] - Added anomaly score metric to be used as a sort_by metric when sorting grid model performances for Isolation Forest with grid search.
  • [#15780] - Added weak_learner_params parameter for AdaBoost.
  • [#15779] - Added weak_learner="deep_learning" option for AdaBoost.
  • [#7118] - Implemented scoring and scoring history for Extended Isolation Forest by adding score_each_iteration and score_tree_interval.

Docs

  • [#15817] - Improved default threshold API and documentation for binomial classification.

Security

  • [#15754] - Addressed CVE-2022-21230 by replacing nanohttpd.

3.44.0.2 - 11/8/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.44.0/2/index.html

Bug Fix

  • [#15906] - Fixed learning_curve_plot for CoxPH with specified metric = 'loglik'.
  • [#15889] - Fixed inability to call thresholds_and_metric_scores() with binomial models and metrics.
  • [#15861] - Fixed the warning message that caused as_data_frame to fail due to not having datatable installed.
  • [#15860] - Fixed force_col_type not working with skipped_columns when parsing parquet files.
  • [#15832] - Fixed UpliftDRF MOJO API and updated the documentation.
  • [#15761] - Fixed relevel_by_frequency resetting the values of the column.

Improvement

  • [#15893] - Renamed the data parameter of the partial_plot function to frame.

Docs

  • [#15881] - Added security note that Kubernetes images don’t apply security settings by default.
  • [#15851] - Added the 3.44 major release blog to the user guide.
  • [#15842] - Introduced Known Bug section to the release notes.
  • [#15840] - Fixed the release notes UI not loading by moving all release notes prior to 3.28.0.1 into a separate file to reduce the page size.
  • [#6570] - Added information on the Friedman and Popescu H Statistic to XGBoost and GBM.

Security

  • [#15865] - Addressed CWE-416 in com.github.jnr:jnr-posix by upgrading org.python.jython.

3.44.0.1 - 10/16/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.44.0/1/index.html

Bug Fix

  • [#15743] - Fixed shap_summary_plot for H2O Explainability Interface failing when one column was full of zeroes or NaN values.
  • [#15669] - Fixed R package to ensure it downloads the fixed version of H2O.
  • [#15651] - Upgraded the minimal supported version of ggplot2 to 3.3.0 to remove the deprecated dot-dot notation.

Improvement

  • [#15801] - Updated Friedman and Popescu’s H statistic calculation to include missing values support.
  • [#15741] - Implemented ability to force column types during parsing.
  • [#15713] - Improved the default threshold API for binomial classification.
  • [#15582] - Renamed prediction table header for UpliftDRF to be more user-friendly.
  • [#12678] - Added check to mojo_predict_df to look for a valid R dataframe.
  • [#7079] - Added verbosity to H2O initialization. h2oconn.clust.show_status() is now guarded and will only be shown when verbose=True during initialization.
  • [#6768] - Enabled categorical features for single decision tree.

New Feature

  • [#15773] - Implemented make_metrics with custom AUUC thresholds for UpliftDRF.
  • [#15565] - Implemented custom metric for AutoML.
  • [#15559] - Implemented custom metric for Stacked Ensemble.
  • [#15556] - Implemented MOJO support for UpliftDRF.
  • [#15535] - Implemented Python 3.10 and 3.11 support.
  • [#6784] - Implemented custom metric for Deep Learning.
  • [#6783] - Implemented custom metric functionalities and the ATE, ATT, and ATC metrics for UpliftDRF.
  • [#6779] - Implemented custom metric for leaderboard.
  • [#6723] - Implemented new AdaBoost algorithm for binary classification.
  • [#6698] - Implemented Shapley values support for ensemble models.
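The ATE, ATT, and ATC metrics mentioned in [#6783] are commonly defined as the average treatment effect over all rows, over treated rows only, and over control rows only. The sketch below computes them from a list of per-row uplift predictions and a 0/1 treatment flag. It assumes these common definitions and uses made-up data; the function name is hypothetical and the exact computation inside H2O is not shown in these notes.

```python
from statistics import mean


def uplift_averages(uplift, treatment):
    """Return (ATE, ATT, ATC) from per-row uplift and a 0/1 treatment flag."""
    treated = [u for u, t in zip(uplift, treatment) if t == 1]
    control = [u for u, t in zip(uplift, treatment) if t == 0]
    ate = mean(uplift)    # average over all rows
    att = mean(treated)   # average over treated rows only
    atc = mean(control)   # average over control rows only
    return ate, att, atc


# Illustrative values only.
uplift = [0.5, 0.25, 0.125, 0.375]
treatment = [1, 1, 0, 0]
print(uplift_averages(uplift, treatment))
```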

Security

  • [#15815] - Addressed CVE-2023-36478 by upgrading Jetty server.
  • [#15805] - Addressed CVE-2023-42503 by upgrading commons-compress to 1.24.0 in Standalone Jars.
  • [#15802] - Addressed CVE-2023-39410 by upgrading org.apache.avro:avro to 1.11.3.
  • [#15799] - Addressed CVE-2023-43642 by upgrading snappy-java in Standalone Jars to 1.1.10.5.
  • [#15759] - Addressed CVE-2020-13949, CVE-2019-0205, CVE-2018-1320, and CVE-2018-11798 by excluding org.apache.thrift:libthrift from dependencies of Main Standalone Jar.
  • [#15757] - Addressed CVE-2020-29582 and CVE-2022-24329 by upgrading org.jetbrains.kotlin:kotlin-stdlib to 1.6.21 in Main and Steam Standalone Jars.
  • [#15755] - Addressed CVE-2023-3635 by upgrading com.squareup.okio:okio to 3.5.0 in Main and Steam Standalone Jars.
  • [#15752] - Addressed CVE-2023-34455, CVE-2023-34454, and CVE-2023-34453 by upgrading snappy-java to 1.1.10.3 in Main and Steam Standalone Jars.
  • [#15750] - Addressed CVE-2023-1370 by upgrading json-smart to 2.4.10 in Main standalone Jar.
  • [#15746] - Addressed CVE-2023-1436, CVE-2022-40149, CVE-2022-40150, CVE-2022-45685, and CVE-2022-45693 by upgrading org.codehaus.jettison:jettison to 1.5.4 in Main Standalone Jar.
  • [#15744] - Addressed CVE-2017-12197 by upgrading libpam4j to 1.11.
  • [#15706] - Addressed CVE-2023-40167 and CVE-2023-36479 by upgrading the Jetty server.
  • [#15470] - Upgraded Hadoop Libraries in Main Standalone Jar to address high and critical vulnerabilities.

Known Bug

(The list of bugs introduced by the changes in this release)

  • [#15832] - Broken Python and R API for UpliftDRF MOJO models. Resolved in 3.44.0.2.

3.42.0.4 - 10/3/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.42.0/4/index.html

Bug Fix

  • [#15729] - Implemented multi-thread as_data_frame by using Datatable to speed up the conversion process.
  • [#15643] - Fixed validation of include_explanation and exclude_explanation parameters.

Improvement

  • [#15719] - Implemented warnings in Python and R for accessing model.negative_log_likelihood().
  • [#13859] - Improved K-Means testing.

New Feature

  • [#15727] - Implemented new write_checksum parameter that allows you to disable default Hadoop Parquet writer systematically writing a .crc checksum file for each written data file.

Security

  • [#15766] - Addressed CVE-2023-40167 and CVE-2023-36479 in Steam Jar.

3.42.0.3 - 8/22/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.42.0/3/index.html

Bug Fix

  • [#15679] - Fixed GBM invalid tree index feature interaction.
  • [#15666] - Updated test to showcase GBM checkpointing.
  • [#6605] - Fixed h2o.feature_interaction failing on cross-validation models with early stopping.

Improvement

  • [#6707] - Added extended message to h2o.init() to help users get around version mismatch error.

Docs

  • [#15694] - Added custom_metric_func and upload_custom_metric to GLM.
  • [#15680] - Added security installation disclaimer in documentation and on the download page.
  • [#15598] - Updated import_file description and added Google Storage support note.

Security

  • [#15687] - Replaced dependencies on no.priv.garshol.duke:duke:1.2 by extracting string comparators from Duke library.

3.42.0.2 - 7/25/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.42.0/2/index.html

Bug Fix

  • [#15637] - Fixed AUCPR plot assigning incorrect values to the variable recalls and precisions.
  • [#6545] - Fixed out of memory error on multi-node sorting stage or sorted frame generation process.

New Feature

  • [#15614] - Enabled multi-threaded conversion of an H2OFrame to a pandas DataFrame by using datatable to speed up the conversion process.
  • [#15597] - Added support for EMR 6.10.

Engineering Task

  • [#15626] - Updated Jira links in H2O Flow UI with GH issue links.

Docs

  • [#15629] - Fixed typo on Hadoop introduction page.
  • [#15606] - Updated major release blog for user guide.
  • [#15580] - Added information on UniformRobust method for histogram_type and created an accompanying blog post.
  • [#15563] - Updated out-of-date copyright year in user guide and Python guide.
  • [#6574] - Added a warning to Infogram user guide that it should not be used to remove correlated columns.
  • [#6554] - Updated nfolds parameter description for AutoML in Python guide.

Security

  • [#15634] - Addressed CVE-2019-10086 by upgrading MOJO2 lib.

3.42.0.1 - 6/21/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-3.42.0/1/index.html

Bug Fix

  • [#15423] - Fixed Infogram cross-validation with weights.
  • [#15482] - Updated R package maintainer.
  • [#15461] - Fixed leaks in GLM’s Negative Binomial estimation.

Improvement

  • [#6843] - Changed warning tag to info tag when weights are not provided during validation/test dataset scoring when weights are present in training.
  • [#6828] - Removed support for Python 2.7 and 3.5.
  • [#6813] - Upgraded the default parquet library to 1.12.3 for standalone jar.
  • [#7630] - Upgraded XGBoost to version 1.6.1.

New Feature

  • [#6548] - Implemented AIC metric for all GLM model families.
  • [#6880] - Implemented Tweedie variance power maximum likelihood estimation for GLM.
  • [#6943] - Added ability to convert H2OAssembly to a MOJO2 artifact.
  • [#7008] - Implemented new Decision Tree algorithm.

Docs

  • [#15474] - Added link to AutoML Wave app from AutoML user guide.
  • [#15550] - Added documentation on H2OAssembly to MOJO 2 export functionality.
  • [#15602] - Added algorithm page in user guide for new Decision Tree algorithm.
  • [#15529] - Added AIC metric support for all GLM families to GLM user guide page and GLM booklet.
  • [#15466] - Updated authors and editors for GLM booklet.
  • [#6884] - Added documentation on Tweedie variance power maximum likelihood estimation to GLM booklet and user guide.
  • [#7200] - Improved user guide documentation for Generalized Additive Models algorithm.

Security

  • [#15594] - Addressed CVE-2023-2976 in h2o-steam.jar.
  • [#15548] - Addressed CVE-2020-29582 in h2o-steam.jar.
  • [#15546] - Addressed CVE-2023-26048 and CVE-2023-26049 by upgrading Jetty for minimal and steam jar.
  • [#15540] - Addressed PRISMA-2023-0067 in h2o-steam.jar.
  • [#6827] - Addressed CVE-2023-1436, CVE-2022-45693, CVE-2022-45685, and CVE-2022-40150 by upgrading org.codehaus.jettison:jettison in h2o-steam.jar.

Kurka (3.40.0.4) - 4/28/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zz_kurka/4/index.html

Bug Fix

  • [#6758] - Fixed the deprecation warning thrown for Python 2.7 and 3.5.

Improvement

  • [#6756] - Added official support for Python 3.9.

Docs

  • [#6759] - Removed mention of support for Python 2.7 and Python 3.5 from documentation.
  • [#7600] - Reorganized supervised and unsupervised algorithm parameters by algorithm-specific, common, and shared-tree (for tree-based algorithms). Updated parameter descriptions for all supervised and unsupervised algorithms. Shifted all shared GLM family parameters to the GLM algorithm page.

Security

  • [#6732] - Addressed CVE-2023-1370 by removing the vulnerability from h2o-steam.jar.

Kurka (3.40.0.3) - 4/4/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zz_kurka/3/index.html

Improvement

  • [#6763] - Added GAM Knot Locations to Model Output.
  • [#6764] - Addressed CVE-2014-125087 in h2o-steam.jar.

Engineering Story

  • [#6767] - Disabled execution of tests in client mode.
  • [#6772] - Deprecated support for Python 2.7 and 3.5.

Docs

  • [#6773] - Introduced a page describing MOJO capabilities.
  • [#6790] - Updated the DRF documentation page to reflect what dataset is used to calculate the model metric.
  • [#6793] - Updated and rearranged the hyper-parameter list in the Grid Search documentation page.

Kurka (3.40.0.2) - 3/9/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zz_kurka/2/index.html

Bug Fix

  • [#6818] - Fixed dependency on numpy in Fairness-related code.
  • [#6819] - Added ability to debug GBM reproducibility by looking at tree structure with equal_gbm_model_tree_structure.

Improvement

  • [#6995] - Fixed the deviance computation for GBM Poisson distribution.

New Feature

  • [#6777] - Added save_plot_path parameter for Fairness plotting allowing you to save plots.

Task

  • [#6538] - Implemented incremental MaxRSweep without using sweep vectors.
  • [#6799] - Removed duplicate predictors for ModelSelection’s MaxRSweep.

Engineering Story

  • [#6776] - Pointed MLOps integration to internal.dedicated environment.

Docs

  • [#6501] - Added warning that max_runtime_secs cannot always produce reproducible models.
  • [#6503] - Added example for how to save a file as a Parquet file.
  • [#6811] - Added example for how to connect to an H2O cluster by name.
  • [#6887] - Added information on the implementation of the eval_metric for XGBoost.

Kurka (3.40.0.1) - 2/8/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zz_kurka/1/index.html

Bug Fix

  • [#6845] - Improved GLM negative binomial calculation time.
  • [#6882] - Cleaned up COLLATE field in the description of the R package by allowing Roxygen2 to generate the COLLATE field.
  • [#6891] - Changed the exceptions in Stacked Ensembles checks to ModelBuilder warnings.
  • [#7090] - Fixed GLM ignoring time budget when trained using cross-validation in AutoML.
  • [#7132] - Fixed incorrect actual ntrees value reported in tree-based models.

Improvement

  • [#6805] - Increased speed of XGBoost scoring on wide datasets.
  • [#6864] - Updated error message for when a user specifies the wrong cluster when connecting to a running H2O instance.
  • [#6886] - Improved memory usage in creation of parse-response for wide datasets.
  • [#6893] - Increased testing speed by adding ability to train XGBoost cross-validation models concurrently on the same GPU.
  • [#6900] - Added ability to score eval_metric on validation datasets for XGBoost.
  • [#6901] - Added notebook demonstrating eval_metric for XGBoost.
  • [#6902] - Increased XGBoost model training speed by disabling H2O scoring to rely solely on eval_metric.
  • [#6910] - Updated to Java 17 from Java 11/openjdk in H2O docker images.
  • [#7294] - Updated warning message for when H2O version is outdated.
  • [#7598] - Introduced a better format for storing default, input, and actual parameters in H2O model objects for R by using @params slots.
  • [#7835] - Added model_summary to Stacked Ensembles.
  • [#7980] - Moved StackedEnsembleModel::checkAndInheritModelProperties to StackedEnsemble class.

New Feature

  • [#6858] - Added ability to publish models to MLOps via Python API.
  • [#7009] - Added ability to grid over Infogram.
  • [#7044] - Implemented Regression Influence Diagnostics for GLM.
  • [#7045] - Enhanced GBM procedures to output which records are used for each tree.
  • [#7537] - Added learning curve plot to H2O’s Explainability.

Task

  • [#6802] - Added negative_log_likelihood and average_objective accessor functions in R and Python for GLM.
  • [#7088] - Limited the number of iterations when training the final GLM model after cross-validation.

Technical Task

  • [#6898] - Added support for scoring eval_metric on a validation set for external XGBoost cluster.
  • [#6899] - Added support for scoring eval_metric on a validation set for internal XGBoost cluster.
  • [#7012] - Implemented GLM dispersion estimation parameter using maximum likelihood method for the negative binomial family.

Docs

  • [#6820] - Highlighted information about how rebalancing makes reproducibility impossible.
  • [#6815] - Added documentation on the negative_log_likelihood and average_objective accessor functions.
  • [#6816] - Added information on GLM dispersion estimation using maximum likelihood method for the negative binomial family.
  • [#6821] - Added documentation on Regression Influence Diagnostics for GLM.
  • [#6803] - Fixed non-functional data paths in code examples throughout the user guide.
  • [#6804] - Added information on the row_to_tree_assignment function.
  • [#6807] - Added documentation on using H2O with Apple M1 chip.
  • [#6808] - Added information on init parameter being skipped due to estimate_k=True for K-Means.

Zygmund (3.38.0.4) - 1/5/2023

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zygmund/4/index.html

Bug Fix

  • [#6851] - Fixed error in SHAP values report for DRF.
  • [#6865] - Fixed a ModelSelection replacement error stopping too early and implemented incremental forward step and incremental replacement step for numerical predictors.

Task

  • [#6852] - Resolved hyperparameters amongst the algorithms.
  • [#6857] - Removed redundant predictors found in mode=“backward” for ModelSelection.

Engineering Story

  • [#6846] - Renamed the docker image h2o-steam-k8s to h2o-open-source-k8s-minimal.

Docs

  • [#6800] - Updated download page by adding options for steam jar and python client without h2o backend.
  • [#6849] - Fixed log likelihood of negative binomial for GLM.
  • [#6855] - Added how users can force an unsupported Java version.
  • [#6856] - Fixed broken links on the H2O Release page.
  • [#6860] - Added information on how Isolation Forest and Extended Isolation Forest handle missing values.
  • [#6862] - Fixed typos and made examples work on performance-and-prediction.html.
  • [#6863] - Removed outdated roadmap from Readme file.

Security

  • [#6794] - Addressed CVE-2022-3509 by upgrading google-cloud-storage.

Zygmund (3.38.0.3) - 11/23/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zygmund/3/index.html

Bug Fix

  • [#6877] - Enforced DkvClassLoader while accessing Python resources through JythonCFuncLoader.
  • [#6878] - Closed open file descriptors from H2OConnection.
  • [#6871] - Fixed incorrect value indicator for a partial dependence plot for its current row.
  • [#6873] - Fixed GBM model with interaction_constraints only building single-depth trees.
  • [#6897] - Fixed slow estimator validation when training model with wide datasets.
  • [#6907] - Fixed GAM failure when numknots=2 for I-spline.

Task

  • [#6896] - Ensured non-negative will not overwrite splines_non_negative for GAM I-spline.
  • [#6921] - Implemented p-value calculation for GLM with regularization.
  • [#6925] - Verified the minimum number of knots each spline type can support for GAM.
  • [#6926] - Implemented normal (non-monotonic) splines that can support any degrees.

Docs

  • [#6874] - Updated compute_p_value documentation for GLM and GAM to reflect that p-values and z-values can now be computed with regularization.
  • [#6875] - Documented GAM M-splines.
  • [#6876] - Updated site logo, favicon, and color scheme to reflect H2O’s brand kit.
  • [#6870] - Updated booklet links for GBM, GLM, and Deep Learning on their respective algorithm pages.
  • [#6881] - Fixed typo in Model Selection for build_glm_model parameter.
  • [#6885] - Updated links in the provided bibliography in the FAQ.
  • [#6894] - Removed Sparkling Water booklet link from the download page.
  • [#6904] - Added optional Python plotting requirement matplotlib to the download page.

Zygmund (3.38.0.2) - 10/27/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zygmund/2/index.html

Bug Fix

  • [#6895] - Fixed H2ODeepLearningEstimator autoencoder not working without y value.
  • [#6911] - Added libgomp into docker images thus enabling XGBoost multithreading.
  • [#6919] - Stopped throwing warning about jobs not having proper model types when models weren’t even trained.
  • [#6928] - Fixed cross validation failure for concurrent sorting.
  • [#6930] - Enabled parallelism in cross validation for Isotonic Regression.

Task

  • [#6917] - Enabled GAM I-spline to support increasing and decreasing functions.
  • [#6925] - Updated the number of knots required for GAM I-splines to be >=2.
  • [#6949] - Improved ModelSelection’s mode=“maxrsweep” runtime.

Docs

  • [#6890] - Added information on ModelSelection’s new build_glm_model parameter for mode=“maxrsweep”.
  • [#6903] - Fixed incorrect header case on ModelSelection and Cox Proportional Hazards algorithm pages in the user guide.
  • [#6913] - Added an example to Variable Inflation Factors in the user guide.
  • [#6917] - Fixed broken links on the “Welcome to H2O-3” page of the user guide.
  • [#6948] - Added model explainability for plotting SHAP to the “Performance and Prediction” page of the user guide.
  • [#7442] - Added examples for varsplits() and feature_frequencies() to Python documentation.

Security

  • [#6889] - Addressed CVE-2022-42003 and CVE-2022-42889 security issues through Library upgrades.

Zygmund (3.38.0.1) - 9/19/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zygmund/1/index.html

Bug Fix

  • [#6937] - Fixed the sorting of h2o.make_leaderboard.
  • [#6940] - Fixed H2O dependencies overriding Jetty implementation.
  • [#6951] - Fixed Flow’s export Frame throwing an NPE because it doesn’t provide a file type.
  • [#6959] - Fixed GLM ordinal generic metrics to provide missing information in the payload.
  • [#6960] - Fixed “maxrsweep” NPE in ModelSelection thrown when the replacement step stopped too early.
  • [#6961] - Fixed “maxrsweep” replacement bug in ModelSelection by updating the implementation method.
  • [#6973] - Fixed unnecessary transformations in the scikit-learn wrapper by using model performance API.
  • [#6979] - Fixed upload of big files in Sparkling Water deployment.
  • [#6983] - Changed the error message that GLM does not support contributions.
  • [#6985] - Fixed QuantilesGlobal histogram type failing in GBM when all columns were categorical.
  • [#7002] - Added support for MapR 6.2 to fix the error caused by updating the cluster.
  • [#7006] - Fixed large file upload in Python.
  • [#7023] - Fixed inability to stop print out of model information in Python.
  • [#7056] - Removed -seed variable hiding in GAM.
  • [#7104] - Updated h2o.upload_mojo to also work for POJO.
  • [#7432] - Added unsupported operation exception when trying to use SHAP summary plot when building DRF model with binomial_double_trees.
  • [#8542] - Refactored the rendering logic in the Python client.
  • [#10436] - Added xval argument to h2o.confusionMatrix in R.

Improvement

  • [#6933] - Added support for calibrating an already trained model manually.
  • [#6941] - Added support for using Isotonic Regression for model calibration.
  • [#6942] - Enabled S3A to share the built-in AWS credential providers.
  • [#6947] - Improved configure_s3_using_s3a allowing it to be usable in any deployment.
  • [#6982] - Updated train_segments function in R to be independent of camel casing in the algorithm name.
  • [#6986] - Improved runtime for QuantilesGlobal histogram by using exact split-points for low-cardinality columns.
  • [#6992] - Exposed the Sequential Walker for R/Python and added option to disable early stopping.
  • [#7007] - Cleaned up Key API by removing replicas.
  • [#7340] - Cleaned up the default output after training a model.
  • [#7510] - Exposed calibrated probabilities in mojo_predict_pandas.
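Isotonic regression, which [#6941] adds as a calibration option, fits a non-decreasing function to the data. A common way to compute the least-squares fit is the Pool Adjacent Violators algorithm, sketched below in unweighted form. This is a generic illustration of the technique with a hypothetical function name, not H2O's implementation. For calibration, the inputs would be the 0/1 labels ordered by the model's raw score.

```python
def isotonic_fit(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, count of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value (the block mean) per input.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Labels ordered by increasing model score (illustrative data).
labels_by_score = [0, 1, 0, 1, 1]
print(isotonic_fit(labels_by_score))
```

The fitted values can be read as calibrated probabilities: rows whose labels violate the ordering are pooled and share their average.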

New Feature

  • [#6950] - Simplified the configuration of S3 for Frame exportation.
  • [#6984] - Added train_segments test for Isolation Forest.
  • [#6991] - Enabled h2o.no_progress in R to accept expressions.
  • [#7011] - Implemented dispersion parameter estimation for GLM.
  • [#7016] - Added ability to export H2O Frame to Parquet.
  • [#7076] - Added Pareto front plots to AutoML Explain.
  • [#7091] - Added “deviance” method to dispersion for calculating p-values.
  • [#7093] - Implemented variable inflation factors for GLM.
  • [#7192] - Implemented in-training checkpoints for GBM.
  • [#8005] - Implemented support for interactions in MOJO for CoxPH.
  • [#12152] - Added h2o.make_leaderboard function which scores and compares a set of models to AutoML.
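The variable inflation factor from [#7093] is conventionally defined as VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below applies that standard formula. With exactly two predictors, R² equals the squared Pearson correlation between them, which keeps the example free of external libraries. The function names and data are made up for illustration, and this is not H2O's code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def vif(r_squared: float) -> float:
    """Standard VIF from the R^2 of one predictor regressed on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Two illustrative predictors; with only two, R^2 is the squared correlation.
x1 = [1, 2, 3, 4]
x2 = [1, 3, 2, 4]
r = pearson(x1, x2)
print(r, vif(r ** 2))
```

A VIF of 1 means the predictor is uncorrelated with the others; larger values indicate stronger collinearity.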

Task

  • [#6927] - Secured XGBoost connections in multinode environments.
  • [#6948] - Added missing added predictor and deleted predictor to the result frame and model summary of ModelSelection.
  • [#6952] - Added support allowing you to force GLM to build a null model where the model only returns the coefficients for the intercept.
  • [#6953] - Added support allowing GLM gamma to fix the dispersion parameter to calculate p-values.
  • [#6998] - Implemented “maxr” speedup for ModelSelection by introducing “maxrsweep”.
  • [#7013] - Implemented dispersion factor estimation using maximum likelihood for GLM gamma family.

Docs

  • [#6920] - Added documentation on Isotonic Regression.
  • [#6931] - Added variable inflation factors to GLM section of the user guide.
  • [#6932] - Added Tweedie dispersion parameter estimation to the GLM section of the user guide.
  • [#6934] - Added confusion matrix calculation explanation to performance and prediction.
  • [#6939] - Added get_predictors_removed_per_step() and get_predictors_added_per_step() examples to ModelSelection.
  • [#6945] - Added use case section to the welcome page of the user guide.
  • [#6987] - Added MOJO import/export information to each algorithm page.
  • [#7048] - Added major release blogs to user guide and moved change log to top of the sidebar.

Zumbo (3.36.1.5) - 9/15/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/5/index.html

Security

  • Addressed security vulnerability CVE-2021-22569 in the h2o.jar.

Zumbo (3.36.1.4) - 8/3/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/4/index.html

Bug Fix

  • [#6954] - Disabled partial_plot for Uplift DRF temporarily.
  • [#6958] - Added support for predicting with Autoencoder when using eigen encoding.
  • [#6962] - Fixed XGBoost failure with enabled cross validation in external cluster mode by explicitly starting external XGBoost before cross validation.

Security

  • [#6946] - Addressed security vulnerabilities CVE-2021-22573 and CVE-2019-10172 in Steam assembly.

Zumbo (3.36.1.3) - 7/8/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/3/index.html

Bug Fix

  • [#6975] - Fixed CoxPH MOJO ignoring offset column.
  • [#6976] - Fixed the incorrect predictions from the CoxPH MOJO on categorical columns.
  • [#6977] - Fixed the View button not working after completing an AutoML job.
  • [#6989] - Fixed num_of_features not being used in call for varimp_heatmap().
  • [#7015] - Fixed GAM’s fold_column being treated as a normal column to score for.
  • [#7053] - Updated GBM cross validation model summary tables to reflect that some trees are removed due to a better score occurring with fewer trees.
  • [#7140] - Fixed fit_params passthrough for scikit-learn compatibility.
  • [#7718] - Fixed validateWithCheckpoint to work with default parameter settings.
  • [#7719] - Fixed validateWithCheckpoint to work with parameters that are arrays.

Improvement

  • [#6990] - Added expert option to force-enable MOJO for CoxPH even when interactions are enabled.
  • [#7060] - Made language rules generation on demand and introduced “EnumLimited” option for categorical encoding.

New Feature

  • [#6968] - Added transform_frame for GLRM allowing users to obtain the new X for a new data set.
  • [#6974] - Added support for numerical interactions in CoxPH MOJO.

Docs

  • [#6965] - Fixed the uplift_metric documentation for Uplift DRF.
  • [#6967] - Added transform_frame to GLRM documentation.
  • [#6970] - Added mode = “maxrsweep” to ModelSelection documentation.
  • [#6971] - Corrected the R documentation on R^2.
  • [#6988] - Updated supported MOJO list to include GAM MOJO import.

Security

  • [#6978] - Fixed security issue in genmodel (CVE-2022-25647).

Zumbo (3.36.1.2) - 5/26/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/2/index.html

Bug Fix

  • [#6999] - Refactored Uplift DRF methods.
  • [#7001] - Removed duplicate runs in MaxR.
  • [#7014] - Fixed the ambiguity check in Explain’s consolidate varimp.
  • [#7020] - Improved efficiency of pd_plot and ice_plot and made rug optional.
  • [#7021] - Fixed H2O failing with null pointer exception when providing an improper -network to h2o.jar.
  • [#7022] - Fixed external XGBoost on K8s.
  • [#7042] - Fixed failing concurrent-running GLM training processes.
  • [#7283] - Fixed missing time values SHAP summary plot error.
  • [#7732] - Fixed Partial Dependence Plot’s date/time handling from explainability modules.

New Feature

  • [#7004] - Updated Uplift DRF API.

Docs

  • [#7000] - Added model_summary examples for GLM.
  • [#7010] - Updated incorrect formula in GLM booklet.
  • [#7219] - Updated Python Module documentation readability.

Zumbo (3.36.1.1) - 4/13/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/1/index.html

Bug Fix

  • [#7035] - Fixed Residual Analysis plot flipping the residual calculation.
  • [#7040] - Added more detailed exception when disconnected due to error caused by RCurl.
  • [#7047] - Made R client attempt to connect to curl package instead of RCurl package first.
  • [#7055] - Ensured GLM models fail instead of throwing warnings when beta_constraints and non_negative are used with multinomial or ordinal families.
  • [#7057] - Fixed how cv_computeAndSetOptimalParameters deals with multiple alpha and lambda values across different folds.
  • [#7058] - Increased MaxR running speed.
  • [#7068] - Fixed getGLMRegularizationPath erroring out when standardize = False.
  • [#7194] - Fixed Keystore not generating on Java 16+.
  • [#7634] - Added a num_of_features argument to h2o.varimp_heatmap to limit the number of displayed variables.
  • [#12130] - Fixed cross_validation_metrics_summary not being accessible for Stacked Ensemble.

Improvement

  • [#7029] - Improved AUUC result information in Uplift DRF by adding information on number of bins.
  • [#7038] - Replaced class() with inherits() in R package.
  • [#7039] - Fixed invalid URLs in R Package.
  • [#7051] - Added normalized AUUC to Uplift DRF.
  • [#7059] - Sped up AutoML by avoiding sleep-waiting.
  • [#7061] - Removed Stacked Ensembles with XGB metalearner to increase speed.
  • [#7062] - Ensures AutoML reproducibility when max_models is used.
  • [#7135] - Updated AutoML default leaderboard regression sorting to RMSE.

New Feature

  • [#7036] - Bundled several basic datasets with H2O for use in examples.
  • [#7050] - Added h2o.jar assembly for secure Steam deployments and excluded PAM authentication from minimal/Steam builds.
  • [#7054] - Added ability to ingest data from secured Hive using h2odriver.jar in standalone.
  • [#7073] - Bundled KrbStandalone extension in h2odriver.jar.
  • [#7085] - Implemented new method for defining histogram split-points in GBM/DRF designed to address outlier issues with default UniformAdaptive method.
  • [#7092] - Added ability to reorder frame levels based on their frequencies and to relevel only topN levels for GLM.
  • [#7094] - Added a function to calculate predicted versus actual response in GLM.
  • [#7258] - Implemented MOJO for Extended Isolation Forest.
  • [#7261] - Added monotone splines to GAM.
  • [#7271] - Added a plot function for gains/lift to R and Python.
  • [#7285] - Added ability to acquire metric builder updates for Sparkling Water calculation without H2O runtime.
  • [#7664] - Added support for interaction_constraints to GBM.
  • [#7785] - Exposed distribution parameter in AutoML.

Task

  • [#7078] - Decoupled Infogram and XGBoost, removing Infogram’s reliance on XGBoost to work.
  • [#7080] - Verified GLM binomial IRLSM implementation and p-value calculation.
  • [#7089] - Added private ModelBuilder parameter to AutoML to enforce the time budget on the final model after cross-validation.

Sub-Task

  • [#7069] - Made Ice plot functionalities also available on pd_plot.
  • [#7160] - Added option to normalize y-axis values.
  • [#7163] - Added option to display logodds for binary models for Ice plots.
  • [#7164] - Added ability to save final graphing data to a frame for Ice plots.
  • [#7165] - Added option to specify a grouping variable for Ice plots.
  • [#7166] - Shows original observation values as points on the line for Ice plots.
  • [#7167] - Added option to toggle PDP vs Ice lines on or off.

Docs

  • [#7034] - Added documentation on the monotone spline for GAM.
  • [#7027] - Added links to the Additional Resources page to the sites where users can ask questions.
  • [#7031] - Updated the examples for the Residual Analysis Plot.
  • [#7037] - Updated the K8s deployment tutorial.
  • [#7043] - Improved Uplift DRF User Guide documentation.
  • [#7049] - Shifted the links from the H2O-3 docs page to the User Guide “Additional Resources” page.
  • [#7066] - Fixed MOJO importable/exportable table in User Guide.
  • [#7067] - Added a note that MOJOs won’t build if interactions are specified.
  • [#7070] - Added information on how H2O handles date columns.
  • [#7075] - Fixed code typos on Admissible ML page in User Guide.
  • [#7074] - Added information on the -hdfs_config tag.

Zorn (3.36.0.4) - 3/30/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zorn/4/index.html

Bug Fix

  • [#7046] - Fixed logic operations error in R package.
  • [#7052] - Clarified that enum and eigen categorical_encoding values do not work for XGBoost.

Improvement

  • [#7177] - Added the Qini value metric to Uplift DRF.

Docs

  • [#7157] - Added information on the make_metrics command to the Performance and Prediction section of the User Guide.

Zorn (3.36.0.3) - 2/16/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zorn/3/index.html

Bug Fix

  • [#7098] - Fixed S3 file downloads not working by adding aws_java_sdk_sts as a dependency of H2O persist S3.
  • [#7102] - Added note to GBM, DRF, IF, and EIF that build_tree_one_node=True does not work with current release.
  • [#7108] - Extended AWS default credential chain instead of replacing it.
  • [#7112] - Fixed import failures for URLs longer than 152 characters.
  • [#7113] - Fixed AutoML ignoring verbosity setting.
  • [#7138] - Fixed Huber distribution bug for deviance.

Improvement

  • [#7187] - Removed “H2O API Extensions” from h2o.init() output.

Docs

  • [#7096] - Corrected typos and inconsistencies in Admissible ML documentation.
  • [#7119] - Updated copyright year in documentation.
  • [#7128] - Clarified feasible intervals for Tweedie power.
  • [#7196] - Clarified Java requirements when running H2O on Hadoop.

Zorn (3.36.0.2) - 1/25/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zorn/2/index.html

Bug Fix

  • [#7125] - Updated XGBoostMojoModel to only consider the number of built trees, not the value of ntrees.
  • [#7126] - Fixed issue in AutoEncoder’s early stopping automatic selection by setting AUTO = MSE instead of deviance.
  • [#7133] - Fixed MOJO imports to retain information on weights column.
  • [#7143] - Fixed XGBoost errors on Infogram by improving support for XGBoost.
  • [#7145] - Fixed MOJO import automatically re-using original Model ID for current release cycle.
  • [#7149] - Fixed import of Parquet files from S3.
  • [#7155] - Fixed h2o.group_by warning present in documentation example caused by function only reading the first column when several are provided.
  • [#7174] - Added check to ensure that a model supports MOJOs to prevent production of bad MOJOs.
  • [#7179] - Fixed Python warnings before model training when training with offset, weights, and fold columns.
  • [#7181] - Fixed MOJO upload in Python.
  • [#7201] - Fixed error in uploading pandas DataFrame to H2O by enforcing utf-8 encoding.
  • [#7273] - Customized FormAuthenticator to use relative redirects.

Improvement

  • [#7141] - Removed numpy dependency for Infogram.

New Feature

  • [#7232] - Added backward selection method for ModelSelection.

Task

  • [#7123] - Added support to PredictCsv for testing concurrent predictions.

Docs

  • [#7136] - Added backward mode documentation to ModelSelection.
  • [#7194] - Updated Kubernetes Headless Service and StatefulSet documentation.

Zorn (3.36.0.1) - 12/29/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zorn/1/index.html

Bug Fix

  • [#7214] - Fixed differences in H2O’s random behavior across Java versions by disabling Stream API in this task.
  • [#7247] - Fixed CoxPH summary method in Python to return H2OTwoDimTable.
  • [#7273] - Fixed form authentication not working by enforcing relative redirects in Jetty.
  • [#7888] - Fixed exception raised in K-Means when a model is built using nfolds by disabling centroid stats for Cross-Validation.

Improvement

  • [#7188] - Removed ymu and rank visibility from FlowUI.
  • [#7209] - Exposed lambda in Rulefit to have better control over regularization strength.
  • [#7217] - Implemented sequential replacement method with ModelSelection.
  • [#7222] - Improved rule extraction from trees in RuleFit.
  • [#7240] - Improved exception handling in AutoML and Grids to prevent model failure.
  • [#7395] - Ensured Infogram uses validation frame and cross-validation when enabled.
  • [#8096] - Added dynamic stacking metalearning strategy for Stacked Ensemble in AutoML.

New Feature

  • [#7246] - Added support and rule coverage to RuleFit.
  • [#7268] - Added support for importing GAM MOJO.
  • [#7279] - Added a convenience tool that converts MOJO to POJO from the command line.
  • [#7280] - Added support allowing users to modify floating point representation in POJO.
  • [#7287] - Added experimental support for importing POJO for in-H2O scoring.
  • [#7316] - Added official support for Java 16 and 17.
  • [#7323] - Added Java 16 and 17 to the cluster.
  • [#7333] - Added a compatibility K8s module that allows older versions of H2O to run on K8s.
  • [#7447] - Added ability to convert MOJO to POJO for tree models.
  • [#7515] - Added support enabling users to configure S3 with S3A configuration.
  • [#7574] - Implemented the Infogram model.
  • [#11818] - Implemented the Uplift DRF algorithm.

Task

  • [#7322] - Upgraded to Gradle 7 to support Java 16+.
  • [#7430] - Added R API for Infogram.

Docs

  • [#7212] - Added documentation on Infogram to the User Guide.
  • [#7279] - Added documentation on ModelSelection to the User Guide.
  • [#7275] - Added notebook on floating point issue for POJO and FAQ documentation on POJO split points.
  • [#7329] - Fixed bullet list formatting issues.
  • [#7742] - Updated R Reference Guide list.

Zizler (3.34.0.8) - 1/13/2022

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/8/index.html

Bug Fix

  • [#7148] - Fixed MOJO import automatically re-using original Model ID.

Security

  • [#7147] - Upgraded to log4j 2.17.1.

Zizler (3.34.0.7) - 12/21/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/7/index.html

Security

  • Fixed CVE-2021-45105 log4j vulnerability.

Zizler (3.34.0.6) - 12/15/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/6/index.html

Security

  • Fixed CVE-2021-45046 log4j vulnerability.

Zizler (3.34.0.5) - 12/13/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/5/index.html

Bug Fix

  • [#7213] - Fixed permutation variable importance to correctly work with weights.
  • [#7234] - Fixed data removal issue in GAM caused by fitting two different models on the same DataFrame.

Improvement

  • [#7233] - Added coef() and coef_norm() functions to MaxRGLM.
  • [#7251] - Added the ability to label observations that match rules in RuleFit.
  • [#7262] - Updated parquet parser to handle dates allowing H2O import_file() to import date columns from Spark DataFrame.
  • [#7276] - Consolidated Rulefit rules to remove unnecessary splits.
  • [#7439] - Improved the efficiency of job polling in AutoML.
  • [#7474] - Deduplicated Rulefit rules in post-processing step.

New Feature

  • [#7235] - Added option to mimic the “ActiveProcessorCount” for older JVMs.

Task

  • [#7227] - Added warning in GLRM for when users set model_id and representation_name to the same string to help avoid a collision of model and frame using the same key.
  • [#7239] - Added rank and ymu model outputs to GLM.

Docs

  • [#7210] - Added link to the Change Log in the User Guide index.
  • [#7218] - Updated parameter list for MaxRGLM and outlined that MaxRGLM only supports regression.
  • [#7228] - Updated MaxRGLM examples to use new functions coef(), coef_norm(), and result().
  • [#7236] - Added examples in R/Python on how to get reproducibility information.
  • [#7296] - Fixed local build warnings for Python Module documentation.

Security

  • Upgraded to log4j 2.15.0 to address vulnerability CVE-2021-44228.

Zizler (3.34.0.4) - 11/17/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/4/index.html

Bug Fix

  • [#7252] - Fixed broken weights_column in GAM.
  • [#7256] - Fixed printing a DRF model when there are no out-of-bag samples.
  • [#7259] - Fixed the failing pyunit_PUBDEV_5008_5386_glm_ordinal_large.py test.
  • [#7263] - Fixed AutoML XGBoost learn_rate search step.
  • [#7265] - Ensured that jobs are rendered correctly in Flow and that AutoML internal jobs can be monitored without crashing on the backend.
  • [#7269] - Fixed gam_columns failure in the pyunit_PUBDEV_7185_GAM_mojo_ordinal.py test.
  • [#7290] - Outlined that tree_method=“approx” is not supported with col_sample_rate or col_sample_by_level in XGBoost.
  • [#7517] - Fixed multinomial classification in Rulefit.
  • [#7681] - Fixed inconsistencies in GLM beta_constraints.
  • [#7738] - Enabled ability to provide metalearner parameters for NaiveBayes and XGBoost.

Improvement

  • [#7358] - Added a custom model ID parameter to MOJO importing/uploading methods through the R/Python API. If a custom model ID is not specified, the default model ID is propagated as the model’s name from the MOJO path.
  • [#7361] - Added warning for users who accidentally build a regression model when attempting to build a binary classification model because they forgot to convert their target to categorical.
  • [#7372] - Tuned scale_pos_weight parameter for XGBoost in AutoML for imbalanced data.

New Feature

  • [#7274] - Added saving parameters to plot functions.

Task

  • [#7250] - Added GAM training/validation metrics.
  • [#7264] - Ensured H2O-3 builds with pip version >= 21.3.
  • [#7311] - Added result frame to MAXRGLM.

Docs

  • [#7267] - Localized MOJO support list for all the H2O-3 algorithms.
  • [#7278] - Added Gains/Lift documentation to the Performance and Prediction section of the User Guide.
  • [#7288] - Corrected metric in the Performance and Prediction “Sensitive to Outliers” section of the User Guide.
  • [#7377] - Clarified that asnumeric() converts ‘enum’ columns to underlying factor values and highlighted correct transformation approach.

Zizler (3.34.0.3) - 10/7/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/3/index.html

Bug Fix

  • [#7291] - Fixed user login from keytab in standalone on Kerberos.
  • [#7300] - Improved error messages in Explain module by making the errors clearer.
  • [#7307] - Fixed H2OTable colTypes in Grid’s summary table.
  • [#7308] - Fixed infinite loop in hex.grid.HyperSpaceWalker.RandomDiscreteValueWalker.
  • [#7317] - Fixed AutoML ignoring optional Stacked Ensembles.
  • [#7319] - Fixed NPE thrown in AutoML when XGBoost is disabled/not available.
  • [#7320] - Fixed CRAN install.
  • [#8458] - Improved XGBoost API to ensure both col_sample_rate and colsample_bylevel (and other XGBoost parameter aliases) are set correctly.
  • [#7375] - Fixed NPE thrown for ModelJsonReader.findINJson for cases when path does not exist.

Improvement

  • [#7314] - Exposed AutoML get_leaderboard as a method in Python.
  • [#7315] - Improved Python client by printing the stacktrace in case of ServerError allowing users to report informative issues for investigation.
  • [#7350] - Enhanced tests by testing the case through all encodings.

Task

  • [#7312] - Updated ANOVA GLM to save model summary as a frame.
  • [#7326] - Added GLM offset column support to GLM MOJO.

Docs

  • [#7302] - Updated the R/Python AutoML documentation parameters to match the descriptions in the User Guide.
  • [#7304] - Removed GLM from balance_classes parameter appendix page in the User Guide.
  • [#7309] - Updated the asfactor procedure documentation to show multiple column usage.

Zizler (3.34.0.1) - 9/14/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zizler/1/index.html

Bug Fix

  • [#7330] - Fixed matplotlib 3.4 compatibility issues with partial_plot.
  • [#7339] - Deprecated is_supervised parameter for h2o.grid method in R.
  • [#7341] - Fixed AutoML NPE by ensuring that models without metrics are not added to the leaderboard.
  • [#7360] - Redistributed the time budget for AutoML.
  • [#7365] - Fixed and reorganized the H2O Explain leaderboard and fixed the confusion matrix.
  • [#7366] - Decreased the number of displayed features in the heatmap for AutoML inside H2O Explain.
  • [#7378] - Fixed NPE raised from weight_column not being in the training model.
  • [#7380] - Fixed the weight=0 documentation change error.
  • [#7383] - Fixed failing rotterdam tests.
  • [#7387] - Fixed GAM NPE from multiple runs with knots specified in a frame.
  • [#8458] - Fixed col_sample_rate not sampling for XGBoost when set to a value lower than 1.0.
  • [#7396] - Fixed wrong column type on MOJO models for Cross-Validation Metrics Summary.
  • [#7408] - Prevented R connect from starting H2O locally.
  • [#7420] - Added StackedEnsembles to AutoML’s time budget to prevent unexpected training times.
  • [#7441] - Fixed the failing pyunit_scale_pca_rf.py test.
  • [#7475] - Improved AutoML behavior when multiple instances are created in parallel.
  • [#7787] - Solved corner cases involving mapping between encoded varimps and predictor columns for H2O Explain by making the varimp feature consolidation more robust.

Improvement

  • [#7381] - Ensured that AutoML uses the entire time budget for max_runtime.
  • [#7455] - Implemented custom progress widgets for Wave apps using H2O-3.
  • [#7461] - Allowed users to convert floats to doubles with PrintMojo to prevent possible parsing issues.
  • [#7465] - Updated GBM cross validation with early_stopping to use ntrees that produce the best score.
  • [#7466] - Enabled print_mojo to produce .png outputs.
  • [#7470] - Updated Python API for all algorithms and AutoML to retrieve the trained model or leader.
  • [#7476] - Removed algorithm-specific logic from base classes.
  • [#7478] - Added support for scoreContributions for imported MOJOs in Java.
  • [#7480] - Exposed AutoML args as writeable properties until first called to train.
  • [#7482] - Updated XGBoost print_mojo to now output weights.
  • [#7498] - Removed the Python client dependency on colorama.
  • [#7504] - Added the parameters and their default values to the __init__ function of the Py code generator.
  • [#7535] - Reduced the workspace of the validation frame in GBM by sharing it with the training frame in cross validation.
  • [#7564] - Slightly reduced precision of predictions stored in holdout frames to significantly save on memory.
  • [#7633] - Removed warning in the Stacked Ensemble prediction function about missing fold_column frame.
  • [#7690] - Enabled returning data from Explain’s varimp_heatmap and model_correlation_matrix.
  • [#7708] - Exposed the top n and bottom n reason codes in Python/R and MOJO.
  • [#12171] - Fixed nightly build version mismatch that prevented the H2OCluster timezone being set to America/Denver.

New Feature

  • [#7336] - Implemented a java-self-check to allow users to run on latest Java.
  • [#7343] - Sped up GBM by optimizing the building of histograms.
  • [#7368] - Added a warning to the TreeSHAP reweighting feature if there are 0 weights and updated the API.
  • [#7418] - Added Maximum R Square Improvement (MAXR) algorithm to GLM.
  • [#7424] - Added warning for when H2O doesn’t have enough memory to run XGBoost.
  • [#7431] - Added the ability to specify a custom file name when saving a MOJO.
  • [#7448] - Added output version number of genmodel.jar when printing usage for PrintMojo.
  • [#7536] - Added MOJO to Rulefit.
  • [#7550] - Implemented ability to calculate Shapley values on a re-weighted tree.
  • [#7561] - Implemented H2O ANOVA GLM algorithm for GLM.
  • [#8283] - Improved and consolidated the handling of version mismatch between Python and Backend.
  • [#8500] - Implemented permutation feature importance for black-box models.
  • [#8501] - Implemented Extended Isolation Forest algorithm.
  • [#9260] - Added support for saving a model directly to S3.

Task

  • [#7363] - Fixed the time limits for the Merge/Sort benchmark.
  • [#7454] - Replaced the removed pandas as_matrix method with .values and exposed the interim pandas.DataFrame object.
  • [#7533] - Fixed S3 credential for pyunit_s3_model_save.py test.
  • [#7565] - Connected XGBoost aggregation functionality with sorting functionality.

Technical task

  • [#7449] - Replaced subsampling in Extended Isolation Forest.

Docs

  • [#7348] - Updated the AutoML FAQ.
  • [#7351] - Corrected the ignored_columns example.
  • [#7356] - Added RMarkdown, Jupyter Notebook, and HTML output example files to H2O Explain documentation.
  • [#7373] - Added Maximum R Square Improvement (MAXR) GLM documentation.
  • [#7392] - Added the loss function equations for each distribution and link type.
  • [#7405] - Updated the documentation about StackedEnsembles time constraints in AutoML.
  • [#7446] - Clarified that the Explain function only works for supervised models.
  • [#7471] - Added Examine Models section to AutoML documentation.
  • [#7484] - Added documentation for H2O ANOVA GLM algorithm.
  • [#7526] - Fixed the H2O Explain example in the documentation.
  • [#7596] - Updated and gathered Java links to a singular place in the User Guide.

Zipf (3.32.1.7) - 8/31/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/7/index.html

Bug Fix

  • [#7419] - Fixed predicting issues with imported MOJOs trained with an offset-column.
  • [#7406] - Fixed slow tree building by implementing a switch to turn off the generation of plain language rules.
  • [#7357] - Fixed potential NPE thrown by setting _orig_projection_array=[].
  • [#7346] - Fixed generic model deserialization.
  • [#7345] - Fixed predictions for splits NA vs REST with monotone constraints.

New Feature

  • [#7362] - H2O Standalone now uses log4j2 as the logger implementation.

Zipf (3.32.1.6) - 8/19/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/6/index.html

Bug Fix

  • [#7390] - Fixed the POJO mismatch from MOJO and in-H2O scoring for an unseen categorical value.
  • [#7393] - Simplified duplicated XGBoost parameters in Flow.
  • [#7414] - Fixed broken data frame conversion behavior.

Improvement

  • Added security updates.

New Feature

  • [#7371] - Exposed the scale_pos_weight parameter in XGBoost.

Task

  • [#7412] - Clarified the anomaly score formula used for score calculation within Isolation Forest and Extended Isolation Forest.

Docs

  • [#7553] - Added a note on memory usage when using XGBoost to User Guide.

Zipf (3.32.1.5) - 8/4/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/5/index.html

Bug Fix

  • [#7399] - Modified legacy Dockerfile to add a non-root user.
  • [#7400] - Fixed an issue where running java -jar h2o.jar -version failed.
  • [#7403] - Fixed an issue where monotone constraints in GBM caused issues when reproducing the model.
  • [#7407] - Fixed an issue that caused DRF to create incorrect leaf nodes due to rounding errors.
  • [#7409] - Fixed an issue that caused CoxPH MOJO import to fail.
  • [#7411] - Fixed an issue where categorical splits NAvsREST were not represented correctly.
  • [#7413] - Fixed GBM reproducibility for correlated columns with NAs.
  • [#7416] - Fixed h2odriver so that it no longer uses invalid GC options.
  • [#7423] - Fixed GenericModel predictions for non-AUTO categorical encodings.
  • [#7434] - Fixed H2O interaction outcomes.
  • [#7460] - When remove_collinear_columns=True, fixed an issue where the dimension of gradient and coefficients changed when predictors were removed.

Docs

  • [#7415] - Updated changelog format.

Zipf (3.32.1.4) - 7/8/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/4/index.html

Bug Fix

  • [#7427] - Fixed h2odriver invalid argument error on Java 11.
  • [#7429] - Fixed GLM GRADIENT_DESCENT_SQERR Solver validation.
  • [#7433] - Upgraded to latest version of Javassist (3.28).
  • [#7444] - Fixed H statistic GPU assertion error.
  • [#7456] - Fixed predict contributions failure in multi-MOJO environments.
  • [#7457] - Fixed bug in ordinal GLM class predictions.
  • [#7462] - Fixed Partial Dependent Plot not working with Flow.
  • [#7469] - Updated to current Python syntax.
  • [#7483] - Fixed bug in ordinal GLM class predictions.

Improvement

  • [#7509] - Added support for refreshing HDFS delegation tokens for standalone H2O.

New Feature

  • [#7540] - Obtained Friedman’s H statistic for XGBoost and GBM.

Task

  • [#7500] - Added a warning message when using alpha as a hyperparameter for GLM.

Docs

  • [#7492] - Added section on how to delete objects in Flow.
  • [#7499] - Added a note to the productionizing docs that C++ is only available with additional support.

Zipf (3.32.1.3) - 5/19/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/3/index.html

Bug Fix

  • [#7514] - Fixed the printing for auc_pr and pr_auc in cross-validation summaries.

New Feature

  • [#7519] - Added parameter auc_type to performance method to compute multiclass AUC.

Task

  • [#7503] - Upgraded XGBoost predictor to 0.3.18.
  • [#7505] - Increased the timeout duration on the R package jar download.

Docs

  • [#7530] - Fixed formatting errors for local builds.
  • [#7558] - Updated docs examples for baseline hazard, baseline survival, and concordance.

Zipf (3.32.1.2) - 4/29/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/2/index.html

Bug Fix

  • [#7800] - Stacked Ensemble will no longer ignore a column if any base model uses it.
  • [#7791] - Added a user-friendly reminder that the new explainability functions require newer versions of `ggplot2` in R.
  • [#7698] - NullPointerException error no longer thrown when using a saved and reloaded RuleFit model.
  • [#7693] - Can now extract metrics from the validation dataset with a Rulefit Model.
  • [#7573] - Fixed failures from Stacked Ensemble with Multinomial GLM within tests.
  • [#7572] - Fixed AutoML error when an alpha array is used for GLM.
  • [#7570] - Fixed “Rollup not possible” stats failure in GLM.
  • [#7552] - H2O will now still start despite system properties that begin with ‘ai.h2o.’.
  • [#7551] - H2O exits without logging any buffered messages instead of throwing a NullPointerException when starting H2O with an invalid argument.
  • [#7549] - ModelDescriptor field in MOJO is now Serializable.
  • [#7547] - AutoML no longer crashes if model builder produces H2OIllegalArgumentException in the parameter validation phase.
  • [#7543] - Weights in GLM grid search are no longer used as features.
  • [#7529] - Fixed Stacked Ensemble MOJO for cases when sub-model doesn’t have the same columns as the metalearner.
  • [#7524] - Efron-method now fully deterministic in CoxPH.

Improvement

  • [#7562] - User now allowed to specify the escape character for parsing CSVs.
  • [#7557] - Added H2O reconnection script for intermittent 401 errors to R.
  • [#7548] - Documented the ‘ice_root’ error in the FAQ.
  • [#7531] - Added further regularization to the GLM metalearner.

New Feature

  • [#9372] - Warning now issued against irreproducible model when early stopping is enabled but neither `score_tree_interval` nor `score_each_iteration` is defined.
  • [#7625] - Encrypted files that contain CSVs can now be imported.
  • [#7577] - Added guidelines for correct use of `remove_collinear_columns` for GLM.
  • [#7538] - Support added for CDP 7.2.

Docs

  • [#7582] - Added information about the `path` argument for exporting .xlsx files.

Zipf (3.32.1.1) - 3/25/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zipf/1/index.html

Bug Fix

  • [#9268] - GBM histograms now ignore rows with NA responses.
  • [#8032] - Variable Importances added to GLM Generic model.
  • [#7859] - Fixed the ArrayIndexOutOfBoundsException issue with GLM CV.
  • [#7816] - CoxPH performance no longer fails when a factor is used for the `event_column`.
  • [#7801] - Existing frame no longer overwritten when data with the same query is loaded.
  • [#7736] - Fixed how `gain` is calculated in XGBFI for GBM.
  • [#7711] - Improved the error messages for `save_to_hive_table`.
  • [#7685] - Added missing argument ‘test’ for `h2o.explain_row()`.
  • [#7665] - All trees now supported for XGBoost Print MOJO in Java.
  • [#7657] - CoxPH `prediction` no longer fails when `offset_column` is specified.
  • [#7678] - Added keys for Individual Conditional Expectation (ICE) plot in H2OExplanation class.
  • [#7635] - `model@model$parameters$x` now reports actual feature names instead of `names`.
  • [#7632] - `h2o.explain` no longer errors when AutoML object is trained with a `fold_column`.
  • [#7603] - Fixed issues with Python’s explanation plots not displaying fully.

New Feature

  • [#7933] - Ignored columns that are actually used for model training are unignored and no longer prevent model training from starting in Flow.
  • [#7904] - Added baseline hazard function estimate to CoxPH model.
  • [#7891] - Target Encoding now supports feature interactions.
  • [#7837] - Added CoxPH concordance to both Flow and R/Python CoxPH summaries.
  • [#7821] - Added a `topbasemodel` attribute to AutoML.
  • [#7811] - Added new learning curve plotting function to R/Python.
  • [#7788] - Added script for estimating the memory usage of a dataset.
  • [#7784] - Added fault protections to grid search allowing saving of data and parameters, model checkpointing, and auto-recovery.
  • [#7761] - Added support for Java 15.
  • [#7673] - Added CDP7.1 support.
  • [#7666] - Added support for XGBoost to Print MOJO as JSON.
  • [#7627] - Added support for refreshing HDFS delegation tokens.
  • [#7613] - Reverted XGBoost categorical encodings for contributions.

Task

  • [#8002] - `max_hit_ratio_k` deprecated and removed.
  • [#7751] - Added upper bound cap to supported Java version in H2O CRAN package requirements.

Improvement

  • [#8165] - Users now allowed to include categorical column name in beta constraints.
  • [#8059] - Multinomial PDP can now be plotted for more than one target class in Flow.
  • [#7903] - Sped up CoxPH concordance score by using tree instead of the direct approach.
  • [#7822] - XGBoost no longer fails when specifying custom `fold_column`.
  • [#7799] - XGBoost CV models now built on multiple GPUs in parallel.
  • [#8459] - Missing metrics added to GLM scoring history.
  • [#7631] - Added validation checks for sampling rates for XGBoost for the R/Python clients.
  • [#7624] - No longer errors when trying to use a fold column where not all folds are represented.
  • [#7616] - Added the `metalearner_transform` option to Stacked Ensemble.
  • [#7592] - GBM main model now built in parallel to the CV models.
  • [#7589] - Removed redundant extraction weights from GBM/DRF histogram.
  • [#7588] - GBM now avoids scoring the last iteration twice when early stopping is enabled.
  • [#7586] - POJO predictions for XGBoost now even closer to in-H2O predictions.
  • [#7585] - Double-scoring of CV models in AutoML now avoided thus speeding up AutoML.
  • [#7579] - AutoML now uses fewer neurons in DL grids and has improved the metalearner for Stacked Ensemble.

Technical task

  • [#7783] - Thin plate regression splines added to GAM.

Docs

  • [#7728] - Added checkpoint description to GLM.
  • [#7680] - Added thin plate regression spline documentation to GAM algorithm page.
  • [#7656] - Added missing parameters to XGBoost algorithm page.
  • [#7652] - Added more information about log files to User Guide.

Zermelo (3.32.0.5) - 3/16/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zermelo/5/index.html

Bug Fix

  • [#7844] - GAM no longer creates multiple knots at the same coordinates when the cardinality of the `gam_columns` is less than the number of `knots` specified by the user.

Improvement

  • [#7694] - Feature interactions can now be saved as .xlsx files.
  • [#7614] - Job polling will retry connecting to h2o nodes if connection fails.

Zermelo (3.32.0.4) - 2/1/2021

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zermelo/4/index.html

Bug Fix

  • [#7697] - Partial Dependence Plot no longer fails for high-cardinality columns even when `user_splits` is defined.
  • [#7695] - Fixed failing Delta Lake import for Python API.
  • [#7686] - Fixed Stacked Ensemble’s incorrect handling of fold column.

Improvement

  • [#7902] - Added MOJO support for CoxPH.
  • [#7675] - Escape all quotes by default when writing CSV.

Docs

  • [#7701] - Added to docs that AUCPR can be plotted.
  • [#7684] - Updated the Customer Algorithm graphic for the Architecture section of the User Guide.
  • [#7661] - Updated the copyright year to 2021.

Zermelo (3.32.0.3) - 12/24/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zermelo/3/index.html

Bug Fix

  • [#7868] - The `pca_impl` parameter is no longer passed to PCA MOJO.
  • [#7749] - Objects to be retained no longer removed during the `h2o.removeAll()` command.
  • [#7743] - Starting GridSearch in a fresh cluster with new hyperparameters that overlap old ones will no longer cause the old models to be trained again.
  • [#7731] - GridSearch no longer hangs indefinitely when not using the default value for parallelism.
  • [#7724] - Fixed the parent dir lookup for HDFS grid imports.
  • [#7717] - Fixed the CustomDistribution test error.

New Feature

  • [#12775] - Cross-Validation predictions can now be saved alongside the model.
  • [#8366] - Added multinomial and grid search support for AUC/PR AUC metrics.
  • [#7782] - Now offers a standalone R client that doesn’t include the h2o jar.
  • [#7773] - Created a Red Hat certification for H2O Docker Image.
  • [#7764] - Fixed randomized split points for `histogram_type="Random"` when `nbins=2`.
  • [#7729] - Single quote regime for CSV parser exposed for importing & uploading files.

Improvement

  • [#7887] - REST API disabled on non-leader Kubernetes nodes.
  • [#7769] - GLM now uses proper logging instead of printlines.

Docs

  • [#7819] - Added non-tree-based models to the variable importance page in the user guide.
  • [#7775] - Updated the AutoML citation in the User Guide to point to the H2O AutoML ICML AutoML workshop paper.
  • [#7762] - Updated Python docstring examples about cross-validation.
  • [#7740] - Corrected `k` parameter description for PCA.
  • [#7723] - Corrected the RuleFit Python example.

Zermelo (3.32.0.2) - 11/17/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zermelo/2/index.html

Bug Fix

  • [#7849] - Implemented deserialization of monotone constraints.
  • [#7798] - Updated required version of ggplot2 in R package to 3.3.0.
  • [#7778] - Fixed the parsing of GLM’s `rand_family` params in MOJO JSON.
  • [#7768] - Fixed NPE that resulted when starting a grid with SequentialWalker in AutoML exploitation phase.
  • [#7765] - Fixed MOJO version check message.
  • [#7759] - When grid search has parallelism enabled, it now includes CV models.

New Feature

  • [#7900] - Added feature interactions and importance for XGBoost and GBM.
  • [#7867] - Added new `interaction_constraints` parameter to XGBoost.
  • [#7804] - Added an option to not have quotes in the header during exportFile.
  • [#7758] - Added ability to retrieve a list of all the models in an H2O cluster.
  • [#7745] - Added custom pod labels for HELM charts.

Task

  • [#7807] - Added `lambda_min` & `lambda_max` parameters to GLMModelOutputs.

Improvement

  • [#7894] - Added default values to all algorithm parameters in the User Guide.
  • [#7890] - Fixed the discrepancies between the Target Encoding User Guide page and Client.
  • [#7808] - Added ONNX support to the documentation.

Engineering Story

  • [#7796] - Added a new method which properly locks H2O Frames during conversion from Spark Data Frames to H2O Frames in Sparkling Water.

Docs

  • [#7806] - On the Grid Search User Guide page, fixed the missing syntax highlight in the Python example of the Random Grid Search section.
  • [#7805] - Added `rule_generation_ntrees` parameter to the RuleFit page.
  • [#7767] - Added documentation for GBM and XGBoost on feature interactions and importance.
  • [#7757] - Added a Python example to the `stratify_by` parameter.
  • [#7747] - Added a Feature Engineering section to the Data Manipulation page in the User Guide.

Zermelo (3.32.0.1) - 10/8/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zermelo/1/index.html

Bug Fix

  • [#7972] - Fixed StackedEnsemble’s retrieval of the seed parameter value.
  • [#7893] - Deserialization values of MOJO ModelParameter now work when the Value Type is int[].
  • [#7881] - H2O no longer uses lazy-loading for sequential zip parse.
  • [#7879] - Updated model_type argument names for RuleFit in R.

New Feature

  • [#8393] - Quantile distributions added to monotone constraints.
  • [#8318] - TargetEncoder integrated into ModelBuilder.
  • [#7885] - Python client no longer instructs the user to declare a root handler in library mode.
  • [#7851] - Hostname used as certificate alias to lookup machine-specific certificate allowing Hadoop users to connect to Flow over HTTPS.
  • [#7846] - Added the model explainability interface for H2O models and AutoML objects in both R & Python.
  • [#7919] - Added the RuleFit algorithm for interpretability.
  • [#7834] - Implemented a basic HELM chart.

Task

  • [#7878] - RuleFit model added to the algorithm section of the User Guide.
  • [#7855] - Added an Explainability page to the User Guide outlining the new `h2o.explain()` and `h2o.explain_row()` functions.
  • [#7838] - Updated the AutoML User Guide page to include the new Explainability and Preprocessing sections.

Improvement

  • [#12783] - Added support for Python 3.7+.
  • [#7922] - Exposes names of score0 output values in MOJO.
  • [#7909] - Added function to plot a Precision Recall Curve.
  • [#7899] - RuleFit model represented by the set of rules obtained from trees during training.
  • [#7876] - Performance improved for exporting a Frame to CSV.
  • [#7872] - GPU backend allowed in XGBoost when running multinode even when `build_tree_one_node` is enabled.
  • [#7863] - Updated all URLs in R package to use HTTPS.
  • [#7852] - Upgraded to XGBoost 1.2.0.

Technical task

  • [#8271] - Added cross-validation to GAM allowing users to find the best alpha/lambda values when building a GAM model.
  • [#7967] - Added TargetEncoder support for multiclass problems.
  • [#7896] - Added new TargetEncoder parameter that allows users to remove original features automatically.
  • [#7862] - Implemented minimal support for TargetEncoding in AutoML.

Docs

  • [#8097] - Updated the descriptions of AutoML in R & Python packages.
  • [#7860] - Made the default for `categorical_encoding` in XGBoost explicit in the documentation.
  • [#7831] - Updated the import datatype section of the Python FAQ in the User Guide.
  • [#7826] - Updated the default values for `min_rule_length` and `max_rule_length` on the RuleFit page of the User Guide.
  • [#7825] - Updated the `validation_frame` definition for unsupervised algorithms in the User Guide.

Zeno (3.30.1.3) - 9/28/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zeno/3/index.html

Bug Fix

  • [#7861] - CRAN - Use HTTPS for all downloads within the R package.

Zeno (3.30.1.2) - 9/3/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zeno/2/index.html

Bug Fix

  • [#11601] - The ‘h2o.unique()’ command will now only return the unique values within a column.
  • [#8012] - k-LIME easy predict wrapper now uses Regression or KLime as a model category instead of just KLime.
  • [#7982] - Fixed the CRAN check warnings on r-devel for cross-references in the R documentation.
  • [#7935] - Documentation added detailing the supported encodings for CSV files.
  • [#7931] - GLM parameters integrated into GAM parameters.
  • [#7901] - Fixed broken URLs in R documentation that caused CRAN failures.

New Feature

  • [#8017] - Added the concordance statistic for CoxPH models.
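
For readers unfamiliar with the concordance statistic mentioned in #8017, the sketch below computes a textbook pairwise version (Harrell's C) in plain Python. It is an illustration under simplifying assumptions, not H2O's implementation; the function name `concordance_index` and the sample values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs whose predicted risks are ordered correctly.

    times:  observed time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk score (higher means the event is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: the third subject is censored.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, risks=[4, 3, 2, 1]))  # 1.0, perfectly ordered
```

This direct approach compares every pair, so it scales quadratically with the number of rows.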

Task

  • [#8157] - When using multiple alpha/lambda values for calling GLM from GAM, GLM now returns the best results across all alpha/lambda values. Also added the ‘cold_start’ parameter to GLM.
  • [#7918] - Added documentation for new GAM hyperparameter ‘subspaces’.
  • [#7913] - GLM new parameter ‘cold_start’ added to User Guide and GLM booklet.

Improvement

  • [#7985] - Reduced the memory cost of the `drop_duplicate` operation by cleaning up data early.
  • [#7916] - When calculating unique() values on a column that is the result of an AstRowSlice operation, the domain is now collected in-place and no longer results in an error.
  • [#7908] - Categorical encoding documentation updated by adding ‘EnumLimited’ & ‘SortByResponse’ to KMeans and removing ‘Eigen’ from XGBoost.

Technical task

  • [#8270] - Added tests to verify grid search functionality for GAM. Users can now create more complex hyper spaces for grid search through the new ‘subspaces’ key and its supporting functionality in the grid search backend.

Docs

  • [#7906] - Added documentation on how to retrieve reproducibility information.

Zeno (3.30.1.1) - 8/10/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zeno/1/index.html

Bug Fix

  • [#8521] - H2OFrames with fields containing double quotes/line breaks can now be converted to Pandas dataframe.
  • [#8149] - Fixed an issue that made it impossible to set max_depth to unlimited on a DRF classifier.
  • [#8004] - Model generation for MOJO/POJO is disabled when interaction columns are used in GLM.
  • [#7993] - Reproducibility Information Table now hidden in H2O-Flow.

New Feature

  • [#11793] - Added support for `offset_column` in the Stacked Ensemble metalearner.
  • [#11794] - Added support for `weights_column` in the Stacked Ensemble metalearner.
  • [#8826] - Added continued support to Generalized Additive Models for H2O.
  • [#8397] - The value of model parameters can be retrieved at the end of training, allowing users to retrieve an automatically chosen value when a parameter is set to AUTO.
  • [#8353] - H2O Frame is now able to be saved into a Hive table.
  • [#8171] - XGBoost can now be executed on an external Hadoop cluster.
  • [#7999] - Added the `contamination` parameter to Isolation Forest which is used to mark anomalous observations.
  • [#7998] - Introduced the `validation_response_column` parameter for Isolation Forest which allows users to name the response column in the validation frame.
  • [#7992] - Added official support for Java 14 in H2O.
  • [#7942] - Added external cluster startup timeout for XGBoost.

Task

  • [#7990] - Hadoop Docker image now runs independently of S3.
  • [#7966] - Upgraded the build/test environment to support R 4.0 and Roxygen2 7.1.1.

Improvement

  • [#8698] - Implemented TF-IDF algorithm to reflect how important a word is to a document or collection of documents.
  • [#8691] - GridSearch R API test added for Isolation Forest.
  • [#8193] - ‘AUTO’ option added for GLM & GAM family parameter.
  • [#8142] - XGBoost Variable Importances now computed using a Java predictor.
  • [#8091] - StackedEnsemble can now be created using only monotone models if user specifies `monotone_constraints` in AutoML.
  • [#8071] - Enabled using imported models as base models in Stacked Ensembles.
  • [#7988] - Removed deprecated H2O-Scala module.
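
To make the TF-IDF entry (#8698) concrete, here is a small pure-Python sketch of one common TF-IDF variant: raw term counts multiplied by a smoothed inverse document frequency. The exact formula H2O uses is not stated in these notes, so the smoothing below is an assumption, and `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per tokenized document.

    Assumed variant: score = count_in_doc * log((N + 1) / (doc_freq + 1)),
    where N is the number of documents.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word at most once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append(
            {
                word: count * math.log((n_docs + 1) / (doc_freq[word] + 1))
                for word, count in counts.items()
            }
        )
    return scores


# Hypothetical corpus: "gbm" appears in every document, so it carries no weight.
docs = [["gbm", "tree", "tree"], ["gbm", "glm"]]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

Words that occur in every document score zero under this variant, while words concentrated in a single document score highest there.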

Technical Task

  • [#8447] - Added Java backend to support MOJO in GAM.
  • [#8028] - Added support for `early_stopping` parameter in GAM and GLM.

Engineering Story

  • [#7938] - Sparkling Water Booklet removed from the H2O-3 repository.

Docs

  • [#8082] - Added H2O Client chapter to the User Guide which includes section on Sklearn integration.
  • [#8000] - Added documentation in the Isolation Forest section for the `contamination` parameter.
  • [#7991] - Added documentation in GLM & GAM, and the `family` & `link` algorithm parameters to include how `family` can now be set equal to AUTO.
  • [#7984] - Added `gainslift_bins` to the parameter appendix and added an example to the parameter in the Python documentation. Added an example for the Kolmogorov-Smirnov metric to the Python documentation.
  • [#7983] - Updated GAM and GLM documentation to include support for `early_stopping`.
  • [#7978] - Added the Kolmogorov-Smirnov metric formula to the Performance and Prediction chapter.
  • [#7960] - Added the `negativebinomial` value to the `family` parameter page.
  • [#7959] - Added the `ordinal` and `modified_huber` values to the `distribution` parameter page.
  • [#7957] - Updated deprecated parameter `loading_name` to `representation_name` and fixed the broken init link in the GLRM section of the User Guide.
  • [#7955] - Added a note in the User Guide Stacked Ensemble section about building a monotonic Stacked Ensemble.
  • [#7940] - Added documentation for how `balance_classes` is triggered.

Zahradnik (3.30.0.7) - 7/21/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/7/index.html

New Feature

  • [#8207] - Added support for partitionBy column in partitioned parquet or CSV files.

Task

  • [#7994] - Warning added for user if both a lambda value and lambda search are provided in GLM.

Improvement

  • [#12662] - Added `max_runtime_secs` parameter to Stacked Ensemble.
  • [private-#28] - Upgraded Jetty 9 and switched default webserver to Jetty 9.

Zahradnik (3.30.0.6) - 6/30/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/6/index.html

Bug Fix

  • [#8009] - GLM Plug values are now propagated to MOJOs/POJOs.
  • [#8008] - In the Python documentation, the HGLM example now references `random_columns` by indices rather than by column name.
  • [#7997] - Fixed a link to H2O blogs in the R documentation.

New Feature

  • [#8233] - Added support for the Kolmogorov-Smirnov metric for binary classification models.
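
As background for the Kolmogorov-Smirnov metric added in #8233: in its usual definition for binary classifiers, it is the largest gap between the cumulative score distributions of the positive and negative classes. The plain-Python sketch below follows that textbook definition; H2O's own computation may differ in detail (for example, in how thresholds are binned), and `ks_statistic` is a hypothetical name.

```python
def ks_statistic(labels, scores):
    """Maximum separation between true-positive and false-positive rates."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Walk through the scores from highest to lowest, as if lowering a threshold.
    pairs = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all rows sharing the same score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# Hypothetical scores that separate the two classes perfectly.
print(ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
```

A value near 1 means the model separates the classes well; a value near 0 means the two score distributions are nearly indistinguishable.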

Docs

  • [#8014] - Added documentation in the Performance and Prediction chapter for the Kolmogorov-Smirnov metric.

Zahradnik (3.30.0.5) - 6/18/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/5/index.html

Bug Fix

  • [#8308] - Fixed an issue that denied all requests to display H2O Flow in an iframe.
  • [#8075] - Importing with `use_temp_table=False` now works correctly on Teradata.
  • [#8050] - Building a GLM model with `interactions` and `lambda = 0` no longer produces a "Categorical value out of bounds" error.
  • [#8048] - Fixed an inconsistency that occurred when using `predict_leaf_node_assignment` with a path and with a terminal node. For trees with a max_depth of up to 63, the results now match. For max_depth of 64 or higher (for path and nodes that are "too deep"), H2O will no longer produce incorrect results. Instead it will return "NA" for tree paths and "-1" for node IDs.
  • [#8042] - Leaf node assignment now works correctly for trees with a depth >= 31. Note that for trees with a max_depth of 64 or higher, H2O will return "NA" for tree paths and "-1" for node IDs.
  • [#8039] - `allow_insecure_xgboost` now works correctly on Hadoop.

New Feature

  • [#8206] - HTML documentation is now available as a downloadable zip file.
  • [#8037] - Users can now retrieve the prediction contributions when running `mojo_predict_pandas` in Python.
  • [#8025] - H2O documentation is now available in an h2odriver distribution zip file.
  • [#8024] - Quantiles models during the training of other models are now recognized as a regular model.
  • [#8018] - The H2O-SCALA module is deprecated and will be removed in a future release.

Improvement

  • [#9202] - Added support for models built with any `family` when running makeGLMModel.
  • [#8052] - K8S Docker images for h2o-3 are now available.
  • [#8023] - Warnings are now produced during model building when using the Python client.

Docs

  • [#8495] - Added examples for saving and loading grids in the User Guide.
  • [#8051] - Improved the examples in the Performance and Prediction chapter.
  • [#8049] - In the AutoML Random Grid Search Parameters topic, removed the no-longer-supported `min_sum_hessian_in_leaf` parameter from the XGBoost table. Also added clarification on how GLM models are handled in an AutoML random grid search run.
  • [#8035] - In the Python documentation, added examples for Grid Metrics.
  • [#8013] - The value of T as described in the description for `categorical_encoding="enum_limited"` is 10, not 1024.

Zahradnik (3.30.0.4) - 6/1/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/4/index.html

Bug Fix

  • [#8275] - h2o.merge() now works correctly when you join an H2O frame to another frame on a column.
  • [#8184] - Fixed an issue that caused h2o.get_leaderboard to fail after creating an AutoML object, disconnecting the client, starting a new session, and then reconnecting to the running H2O cluster for the re-attached H2OAutoML object.
  • [#8147] - Stacked Ensemble now inherits distributions/families supported by the metalearner.
  • [#8137] - Fixed an issue that caused AutoML to fail when the target included special characters.
  • [#8073] - CAcert is now supported with the Python API.
  • [#8069] - Water Meter and Form Login now work correctly.
  • [#8066] - In Aggregator, added support for retrieving the Mappings Frame.
  • [#8056] - Added support for using monotone constraints with Tweedie distribution in GBM.

New Feature

  • [#10207] - Added a new drop_duplicates function to drop duplicate observations from an H2O frame.
  • [#9371] - Partial dependence plots are now available for multiclass problems.
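
The behaviour of a duplicate-dropping operation such as the one added in #10207 can be sketched on plain Python rows. This is not the H2O function itself: the helper below, its `columns` and `keep` arguments, and the sample rows are all hypothetical and only illustrate the idea of keeping one row per key.

```python
def drop_duplicates(rows, columns, keep="first"):
    """Keep one row per unique combination of values in `columns`."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # To keep the last occurrence, scan the rows in reverse order.
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Restore the original row order when scanning in reverse.
    return kept if keep == "first" else list(reversed(kept))


# Hypothetical rows: two observations share the same id.
rows = [
    {"id": 1, "value": "a"},
    {"id": 1, "value": "b"},
    {"id": 2, "value": "c"},
]
print(drop_duplicates(rows, columns=["id"]))
```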

Improvement

  • [#8134] - Users now receive a warning if they try to get variable importances in Stacked Ensemble.
  • [#8111] - In XGBoost, removed the min_sum_hessian_in_leaf and min_data_in_leaf options, which are no longer supported by XGBoost. Also added the `colsample_bynode` option.
  • [#8089] - data.table warning messages are now suppressed inside h2o.automl() in R.

Docs

  • [#8120] - Added a "Training Models" section to the User Guide, which describes train() and train_segments().
  • [#8113] - Updated XGBoost to indicate that this version requires CUDA 9, and included information showing users how to check their CUDA version.
  • [#8112] - Added information about GAM support to the missing_values_handling parameter appendix entry.
  • [#8107] - Updated the Minio Instance topic.
  • [#8064] - `monotone_constraints` can now be used with `distribution=tweedie`.
  • [#8062] - Updated the PDP topic to include support for multinomial problems and updated the examples.
  • [#8053] - In the API-related Changes topic, noted that `min_sum_hessian_in_leaf` and `min_data_in_leaf` are no longer supported in XGBoost.

Zahradnik (3.30.0.3) - 5/12/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/3/index.html

Bug Fix

  • [#8146] - Improved validation and error messages for CoxPH.
  • [#8140] - In XGBoost, the `predict_leaf_node_assignment` parameter now works correctly with multiclass.
  • [#8121] - Fixed an issue that caused GBM to fail when it encountered a bin that included a single value and the rest NAs.

Improvement

  • [#8537] - Updated the AutoML example in the R package.
  • [#8199] - PDPs now allow y-axis scaling options.
  • [#8175] - Improved speed for training and prediction of Stacked Ensembles.

Docs

  • [#12851] - Added tables showing parameter values and random grid space ranges to the AutoML chapter.
  • [#8294] - Improved the Hive import documentation.
  • [#8138] - Improved documentation for Quantiles in the User Guide.
  • [#8133] - Fixed the documented default value for `min_split_improvement` parameter in XGBoost.

Zahradnik (3.30.0.2) - 4/28/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/2/index.html

Bug Fix

  • [#8237] - Fixed an issue that caused H2O to crash while debugging Python code using IntelliJ/PyCharm.
  • [#8211] - Fixed an issue that caused an assertion error while running Grid Search.
  • [#8203] - Training of a model based on a data frame that includes Target Encodings no longer fails due to a locked frame.
  • [#8198] - Added train_segments() to the R html documentation.
  • [#8196] - Target Encoder now unlocks the output frame.
  • [#8182] - Fixed the BiasTerm in XGBoost Contributions after upgrading to XGBoost 1.0.
  • [#8152] - GBM and XGBoost no longer ignore a column that includes a constant and NAs.

New Feature

  • [#8284] - Added the following options for customizing and retrieving threshold values.
    • `threshold` allows you to specify the threshold value used for calculating the confusion matrix.
    • `default_threshold` allows you to change the threshold that is used to binarise the predicted class probabilities.
    • `reset_model_threshold` allows you to reset the model threshold.
  • [#8261] - Introduced Kubernetes integration. Docker image tests are now available on K8S and published to Docker Hub.
  • [#8229] - A progress bar is now available during Shap Contributions calculations.
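
The threshold options listed under #8284 all concern the cut-off used to turn predicted class probabilities into class labels. The sketch below shows, in plain Python, how the choice of threshold changes a binary confusion matrix. It is illustrative only: the helper name is hypothetical, and treating a probability equal to the threshold as positive is an assumption.

```python
def confusion_matrix(labels, probabilities, threshold=0.5):
    """Binarise probabilities at `threshold` and count the four outcomes."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, probability in zip(labels, probabilities):
        # Assumption: a probability equal to the threshold counts as positive.
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Hypothetical predictions: lowering the threshold recovers a missed positive.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.6, 0.4, 0.1]
print(confusion_matrix(labels, probabilities, threshold=0.5))
print(confusion_matrix(labels, probabilities, threshold=0.3))
```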

Improvement

  • [#9209] - An H2O Frame containing weights can now be specified when running `make_metrics`.
  • [#8361] - Added POJO and MOJO support for all encodings in GBM.
  • [#8192] - Users will now receive an error if they attempt to run https in h2o.init() when starting a local cluster.
  • [#8173] - Added an `-allow_insecure_xgboost` option to h2o and h2odriver that allows XGBoost multinode to run in a secured cluster.
  • [#8169] - Only the leader node is exposed on K8S.

Docs

  • [#8620] - Updated the Target Encoding topic and examples based on the improved API.
  • [#8293] - Added a new "Supported Data Types" topic to the Algorithms chapter.
  • [#8195] - Added a new "Kubernetes Integration" topic to the Welcome chapter.
  • [#8194] - Fixed the links for the constrained k-means Python demos.
  • [#8191] - Fixed the R example in the GAM chapter.
  • [#8187] - Added clarification for when `min_mem_size` and `max_mem_size` are set to NULL/None in h2o.init().
  • [#8179] - The link to the slideshare in the DRF chapter now points to https instead of http.
  • [#8177] - Added information about the h2o.get_leaderboard() function to the AutoML chapter of the User Guide.
  • [#8176] - Updated the MOJO Quickstart showing how to use PrintMojo to visualize MOJOs without requiring Graphviz.
  • [#8168] - The import_mojo() function now uses "path" instead of "dir" when downloading, importing, uploading, and saving models. Updated the examples in the documentation.

Zahradnik (3.30.0.1) - 4/3/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/1/index.html

Bug Fix

  • [#8638] - Fixed an issue that caused performing multiple h2o.init() to fail with R on Windows.
  • [#8545] - Increased the default clouding time to avoid timeouts that resulted in a Cloud 1 under 4 error.
  • [#8296] - Removed obsolete exactLambdas parameter from GLM.

New Feature

  • [#12884] - Added support for a fractional response in GLM.
  • [#8826] - Added support for Generalized Additive Models (GAMs) in H2O. The documentation for this newly added algorithm can be found here.
  • [#8730] - Added support for parallel training (e.g. spark_apply in rsparkling or Python/R).
  • [#8404] - Added support for Continuous Bag of Words (CBOW) models in Word2Vec.
  • [#8369] - H2O can now predict OOME during parsing and stop the job if OOME is imminent.
  • [#8333] - Added GBM POJO support for SortByResponse and EnumLimited.
  • [#8290] - Added support for Leaf Node Assignments in XGBoost and Isolation Forest MOJOs.
  • [#8285] - Added support for importing Stacked Ensemble MOJO models for scoring. (Note that this only applies to Stacked Ensembles that include algos with MOJO support.)
  • [#8232] - Added support for the `single_node_mode` parameter in CoxPH.
  • [#8228] - H2O now provides the original algorithm name for MOJO import.
  • [#8215] - Created a segmented model training interface in R.
  • [#8214] - Added a print method for the H2OSegmentModel object type in R.

Task

  • [#8401] - Removed the previously deprecated DeepWater Estimator function.
  • [#8252] - Now using Java-based scoring for XGBoostModels.

Improvement

  • [#11521] - In the H2O R package, `data.table` is now enabled by default (if installed).
  • [#9327] - In AutoML, users can try tuning the learning rate for the best model found during exploration in XGBoost and GBM. Note that the new `exploitation_ratio` parameter is still experimental.
  • [#8781] - Added out-of-the-box support for starting an h2o cluster on Kubernetes. Refer to this README for more information.
  • [#8553] - Improved the way AUC-PR is calculated.
  • [#8430] - Added an option to upload binary models from Python and R.

Docs

  • [#8564] - Added examples for Grid Search in the Python Module documentation.
  • [#8481] - Added examples to the R Reference Guide.
  • [#8287] - Added documentation for the fractional binomial family in the GLM section.
  • [#8286] - Added documentation for the new GAM algorithm.
  • [#8249] - Updated tab formatting for the `cluster_size_constraints` parameter appendix entry.
  • [#8231] - Updated the Target Encoding R example.
  • [#8230] - Included confusion matrix threshold details for binary and multiclass classification in the Performance and Prediction chapter.
  • [#8227] - Added documentation for new `upload_model` function.
  • [#8221] - Improved documentation around citing H2O in publications.
  • [#8209] - Added documentation for `single_node_mode` in CoxPH.

Yule (3.28.1.3) - 4/2/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yule/3/index.html

Bug Fix

  • [#8300] - Fixed an issue that occurred during Hive SQL import with `fetch_mode=SINGLE`; improved Hive SQL import speed; added an option to specify the number of chunks to parse.
  • [#8251] - Hive delegation token refresh now recognizes `-runAsUser`.
  • [#8243] - Fixed `base_model` selection for Stacked Ensembles in Flow.
  • [#8241] - The Parquet parser now supports arbitrary precision decimal types.

Story

  • [#8246] - The H2O Hive parser now recognizes varchar column types.

Task

  • [#8223] - Hive tokens are now refreshed without distributing the Steam keytab.

Improvement

  • [#8469] - Users can now specify the `max_log_file_size` when starting H2O. The log file size currently defaults to 3MB.
  • [#8279] - Fixed the of parameters for TargetEncoder in Flow.
  • [#8247] - HostnameGuesser.isInetAddressOnNetwork is now public.
  • [#8235] - Improved mapper-side Hive delegation token acquisition. Now when H2O is started from Steam, the Hive delegation token will already be acquired when the cluster is up.

Docs

  • [#8257] - Added to docs that `transform` only works on numerical columns.
  • [#8218] - Added documentation for the new num_chunks_hint option that can be specified with `import_sql_table`.
  • [#8217] - Added documentation for the new `max_log_file_size` H2O starting parameter.

Yule (3.28.1.2) - 3/17/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yule/2/index.html

Bug Fix

  • [#8847] - The `base_models` attribute in Stacked Ensembles is now populated in both Python and R.
    Note that in Python, if there are no `base_models` in `_parms`, then `actual_params` is used to retrieve base_models, and it contains the names of the models. In R, `ensemble@model$base_models` is populated with a vector of base model names.
  • [#8344] - Fixed an issue that caused the leader node to be overloaded when parsing 30k+ Parquet files.
  • [#8332] - Fixed an issue that caused `model end_time` and `run_time` properties to return a value of 0 in client mode.
  • [#8280] - TargetEncoderModel's summary no longer prints the fold column as a column that is going to be encoded by this model.
  • [#8273] - When h2omapper fails before discovering SELF (ip & port), the log messages are no longer lost.

New Feature

  • [#8506] - Added DeepLearning MOJO support in Generic Models.

Improvement

  • [#9031] - Changed the output format of `get_automl` in Python from a dictionary to an object.
  • [#8266] - Users can now specify `-hdfs_config` multiple times to specify multiple Hadoop config files.
  • [#8264] - Fixed an issue that caused the clouding process to time out for the Target Encoding module and resulted in a `Cloud 1 under 4` error.

Docs

  • [#8445] - Improved FAQ describing how to use the H2O-3 REST API from Java.

Yule (3.28.1.1) - 3/5/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yule/1/index.html

Bug Fix

  • [#8314] - Added missing AutoML global functions to the Python and R documentation.
  • [#8312] - In the Python client, improved the H2OFrame documentation and properly labeled deprecated functions.
  • [#8303] - Fixed an issue that caused imported MOJOs to produce different predictions than the original model.

Engineering Story

  • [#8310] - Removed Sparkling Water external backend code from H2O.

Docs

  • [#8309] - In the R client docs for h2o.head() and h2o.tail(), added an example showing how to control the number of columns to display in a dataframe when using a Jupyter notebook with the R kernel.

Yu (3.28.0.4) - 2/23/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/4/index.html

Bug Fix

  • [#9016] - DeepLearning MOJOs are now thread-safe.
  • [#8406] - Fixed an issue that caused h2oframe.apply to fail when run in Python 3.7. Note that Python 3.7 is still not officially supported, but support is a WIP.
  • [#8375] - XGBoost now correctly respects monotonicity constraints for all tree_methods.
  • [#8373] - Decision Tree descriptions no longer include more descriptions than `max_depth` splits.
  • [#8365] - Fixed an issue that caused `import_hive_table` to fail with a JDBC source and a partitioned table.
  • [#8364] - Improved the DKVManager sequential removal mechanism.
  • [#8356] - In XGBoost, added a message indicating that the `exact` tree method is not supported in multinode.
  • [#8329] - XGBoost ContributionsPredictor is now serializable.
  • [#8328] - Fixed a CRAN warning related to ellipsis within arguments in the R package.
  • [#8325] - Added support for specifying AWS session tokens.

New Feature

  • [#9179] - Added support for Constrained K-Means clustering.
  • [#8674] - In Stacked Ensembles, added support for "xgboost" and "naivebayes" in the `metalearner_algorithm` parameter.
  • [#8334] - Added support for `build_tree_one_node` in XGBoost.

Improvement

  • [#8507] - In the R client, users can now optionally specify the number of columns to display in `h2o.frame`, `h2o.head`, and `h2o.tail`.
  • [#8443] - Fixed an issue that caused AutoML to fail to run if XGBoost was disabled.
  • [#8382] - Stacktraces are no longer returned in `h2o.getGrid` when failed models are present.
  • [#8327] - Added `createNewChunks` with a "sparse" parameter in ChunkUtils.

Docs

  • [#8675] - Added an FAQ to the MOJO and POJO quick starts noting that MOJOs and POJOs are thread safe for all supported algorithms.
  • [#8420] - Added the new `cluster_size_constraints` parameter to the KMeans chapter.
  • [#8350] - Updated docs to specify that `mtries=-2` gives all features.
  • [#8323] - Updated EC2 and S3 Storage topic to include the new, optional AWS session token.

Yu (3.28.0.3) - 2/5/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/3/index.html

Bug Fix

  • [#8888] - In the R client, fixed a parsing bug that occurred when using quotes with .csv files in as.data.frame().
  • [#8815] - Fixed an Unsupported Operation Exception in UDP-TCP-SEND.
  • [#8522] - GLM now supports coefficients on variable importance when model standardization is disabled.
  • [#8446] - In the Python client, rbind() can now be used on all numerical types.
  • [#8440] - In XGBoost, fixed an error that occurred during model prediction when OneHotExplicit was specified during model training.
  • [#8429] - Performing grid search over Target Encoding parameters now works correctly.
  • [#8391] - Fixed an issue that caused import_hive_table to not classload the JDBC driver.
  • [#8389] - MOJOs can now be built from XGBoost models built with an offset column.
  • [#8363] - Fixed an issue that caused the R and Python clients to return the wrong sensitivity metric value.
  • [#8362] - Fixed an incorrect sender port calculation in TimestampSnapshot.
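
For readers who want to cross-check the sensitivity value mentioned in #8363, the metric is the true positive rate, TP / (TP + FN). The following is a minimal pure-Python sketch of that definition, not the H2O client code; the helper name `sensitivity` and its arguments are hypothetical and used only for illustration.

```python
def sensitivity(actual, predicted, positive=1):
    """True positive rate: TP / (TP + FN).

    `actual` and `predicted` are equal-length sequences of class labels.
    This helper is a hypothetical illustration, not part of the H2O API.
    """
    pairs = list(zip(actual, predicted))
    # True positives: positive cases that were predicted positive.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False negatives: positive cases that were predicted as anything else.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return tp / (tp + fn)


actual = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 0, 1]
# Two of the three positive cases are recovered.
print(sensitivity(actual, predicted))
```

A value computed this way from a confusion matrix can be compared against the value reported by the client.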

New Feature

  • [#9128] - In AutoML, multinode XGBoost is now enabled by default.
  • [#8410] - Users can now specify a custom JDBC URL to retrieve the Hive Delegation token using hiveJdbcUrlPattern.

Task

  • [#8385] - In XGBoost, fixed a deprecation warning for reg:linear.

Improvement

  • [#8442] - import_folder() can now be used when running H2O in GCS.
  • [#8407] - Added support for registering custom servlets.
  • [#8377] - In XGBoost, when a parameter with a synonym is updated, the synonymous parameter is now also updated.

Engineering Story

  • [#8388] - AutoBuffer.getInt() is now public.

Docs

  • [#8412] - Python examples for plot method on binomial models now use the correct method signature.
  • [#8411] - Updated custom_metric_func description to indicate that it is not supported in GLM.
  • [#8395] - Updated the AutoML documentation to indicate that multinode XGBoost is now turned on by default.
  • [#8379] - Fixed the description for the Hadoop -nthreads parameter.

Yu (3.28.0.2) - 1/20/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/2/index.html

Bug Fix

  • [#8571] - Fixed an issue that resulted in a "DistributedException java.lang.ClassNotFoundException: BAD" message.
  • [#8499] - Users can now specify either a model or a model key when checkpointing.
  • [#8490] - Fixed an issue that resulted in an endless loop when the CsvParser encountered a $ sign enclosed in quotes.
  • [#8480] - In GBM and DRF, fixed an AIOOBE error that occurred when the dataset included negative zeros (-0.0).
  • [#8467] - Fixed a race condition in the addWarningP method on Model class.
  • [#8455] - h2odriver now gets correct version of Hadoop dependencies.
  • [#8439] - Fixed a race condition in addVec.
  • [#8435] - Parallel Grid Search threads now call the Hyperspace iterator one at a time.
  • [#8431] - sklearn wrappers now expose wrapped estimator as a public property.
  • [#8428] - Fixed an issue in reading user_splits in Java.
  • [#8421] - Fixed an issue that caused rank vectors of Spearman correlation to have different chunk layouts.

Task

  • [#8583] - Added a JSON option to PrintMojo.
  • [#8520] - Improved the error message that displays when a user attempts to import data from an HDFS directory that is empty.
  • [#8456] - H2O can now read Hive table metadata two ways: either via direct Metastore access or via JDBC.

Improvement

  • [#9167] - Improved heuristics used for finding IP addresses on Hadoop in order to select the right subnet automatically.
  • [#8605] - Added support for `offset_column` in XGBoost.
  • [#8551] - Users can now create tree visualizations without installing additional packages.
  • [#8503] - Added a new `download_model` function for downloading binary models in the R and Python clients.
  • [#8475] - Improved XGBoost performance.
  • [#8474] - When computing the correlation matrix of one or two H2OFrames (using `cor()`), users can now specify a method of either Pearson (default) or Spearman.
  • [#8438] - Users are now warned when they attempt to run AutoML with a validation frame and with nfolds > 0.
  • [#8436] - AutoML no longer trains a "Best of Family Stacked Ensemble" when only one family is specified.
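
The Spearman method added in #8474 can be understood as Pearson correlation applied to the ranks of the values. The following is a minimal pure-Python sketch of that relationship, not H2O's implementation; the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and assigning tied values their average rank is an assumption made here for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone in x, but not linear
print(pearson(x, y), spearman(x, y))
```

Because `y` increases whenever `x` does, the rank-based coefficient is at its maximum even though the relationship is not linear, which is the usual reason to choose Spearman over the Pearson default.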

Docs

  • [#12973] - Removed `ignored_columns` from the list of available parameters in AutoML.
  • [#8647] - Fixed a broken link in the JAVA FAQ.
  • [#8552] - Improved the documentation for Tree Class in the Python Client docs.
  • [#8484] - Clarified the difference between h2o.performance() and h2o.predict() in the Performance and Prediction chapter of the User Guide.
  • [#8478] - Incorporated HGLM documentation updates into the GLM booklet.
  • [#8441] - Added an FAQ for GC allocation failure in the FAQ > Clusters section.
  • [#8434] - In the Stacked Ensembles chapter, improved the metalearner support FAQ.
  • [#8419] - Added `offset_column` to the list of supported parameters in XGBoost.
  • [#8418] - Added information about recent API changes in AutoML to the API-Related Changes section in the User Guide.

Yu (3.28.0.1) - 12/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/1/index.html

Bug Fix

  • [#12823] - AutoML reruns using, for example, the same project name, no project name, etc., now produce consistent results.
  • [#8924] - Fixed an issue that occurred when running an AutoML instance twice using the same `project_name`. AutoML no longer appends new models to the existing leaderboard, which caused the models from the first run to be rescored against the new `leaderboard_frame`.
  • [#8696] - Updated the list of stopping metric options for AutoML in Flow. Also added support for the aucpr stopping metric in AutoML.
  • [#8673] - When training a K-Means model, the framename is no longer missing in the training metrics.
  • [#8642] - In AutoML, the `project_name` is now restricted to the same constraints as h2o frames.
  • [#8576] - In GBM, fixed an NPE that occurred when sample rate < 1.
  • [#8575] - The AutoML backend no longer accepts `ignored_columns` that contain the response column, fold column, or weights column.
  • [#8505] - XGBoost MOJO now works correctly in Spark.
  • [#8504] - The REST API ping thread now starts after the cluster is up.
  • [#8502] - Fixed an NPE at hex.tree.TreeHandler.fillNodeCategoricalSplitDescription(TreeHandler.java:272).

New Feature

  • [#12219] - Extended MOJO support for PCA.
  • [#9121] - We are very excited to add HGLM (Hierarchical GLM) to our open source offering. As this is the first release, we only implemented the Gaussian family. However, stay tuned or better yet, tell us what distributions you want to see next. Try it out and send us your feedback!
  • [#9117] - MOJO Import is now available for XGBoost.
  • [#8917] - Improved integration of the H2O Python client with Sklearn.
  • [#8896] - Users can now specify monotonicity constraints in AutoML.
  • [#8884] - Users can now save and load grids to continue a Grid Search after a cluster restart.
  • [#8860] - Users can now specify a `parallelism` parameter when running grid search. A value of 1 indicates sequential building (default); a value of 0 is used for adaptive parallelism; and any value greater than 1 sets the exact number of models built in parallel.
  • [#8837] - Added a function to calculate Spearman Correlation.
  • [#8793] - Users can now specify the order in which training steps will be executed during an AutoML run. This is done using the new `modeling_plan` option.
  • [#8745] - The `calibration_frame` and `calibrate_model` options can now be specified in XGBoost.
  • [#8707] - Added support for OneHotExplicit categorical encoding in EasyPredictModelWrapper.
  • [#8568] - Added aucpr to the AutoML leaderboard, stopping_metric, and sort_metric.
  • [#8566] - An AutoML leaderboard extension is now available that includes model training time and model scoring time.
  • [#8558] - Exposed the location of Chunks in the REST API.
  • [#8544] - Added a `rest_api_ping_timeout` option, which can stop a cluster if nothing has touched the REST API for the specified timeout.
  • [#8535] - Added support for Java 13.
  • [#8513] - H2O no longer performs an internal self-check when converting trees in H2O.
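
The three cases of the grid search `parallelism` parameter described in #8860 can be summarized with a small sketch. This is a pure-Python illustration of the documented semantics only; the helper `describe_parallelism` is hypothetical and not part of H2O, and the way H2O picks the level of parallelism in adaptive mode is not modeled here.

```python
def describe_parallelism(parallelism):
    """Map a `parallelism` value to (mode, number of models built at once).

    Hypothetical helper that mirrors the changelog wording:
      1  -> sequential building (the default)
      0  -> adaptive parallelism (the count is chosen by H2O, so None here)
      >1 -> exactly that many models built in parallel
    """
    if parallelism < 0:
        raise ValueError("parallelism must be 0 or a positive integer")
    if parallelism == 0:
        return ("adaptive", None)
    if parallelism == 1:
        return ("sequential", 1)
    return ("parallel", parallelism)


for value in (1, 0, 4):
    print(value, describe_parallelism(value))
```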

Task

  • [#8840] - Fixed an XGBoost error on multinode with AutoML.
  • [#8818] - Added checkpointing to XGBoost.
  • [#8664] - Users can now perform random grid search over target encoding hyperparameters.
  • [#8582] - Improved Grid Search testing in Flow.

Improvement

  • [#11862] - When specifying a `stopping_metric`, H2O now supports lowercase and uppercase characters.
  • [#9424] - Added a warning message to AutoML if the leaderboard is empty due to too little time for training.
  • [#9019] - In AutoML, blending frame details were added to event_log.
  • [#8879] - If early stopping is enabled, GBM can reset the ntree value. In these cases, added an `ntrees_actual` (Python)/`get_ntrees_actual` (R) method to provide the actual ntree value (whether CV is enabled or not) rather than the original ntree value set by the user before building a model.
  • [#8808] - Refactored AutoML to improve integration with Target Encoding.
  • [#8708] - Exposed `get_automl` from `h2o.automl` in the Python client.
  • [#8701] - For GBM POJOs that use one-hot explicit encoding, EasyPredictModelWrapper now takes care of the encoding, so the user does not need to apply it explicitly.
  • [#8670] - Added support for numeric arrays to IcedHashMap.
  • [#8581] - Improved the AutoML Flow UI.
  • [#8574] - The `mae`, `rmsle`, and `aucpr` stopping metrics are now available in Grid Search.
  • [#8567] - When creating a hex.genmodel.easy.EasyPredictModelWrapper with contributions enabled, H2O now uses slf4j in the library, giving more control to users about when/where warnings will be printed.
  • [#8491] - Moved the order of AUCPR in the list of values for `stopping_metric` to right after AUC.

Engineering Story

  • [#8541] - Removed unused code in UDPClientEvent.

Docs

  • [#8957] - Added examples to the Python Module documentation DRF chapter.
  • [#8920] - Added examples to the Binomial Models section in the Python Module documentation.
  • [#8905] - Added examples to the Multinomial Models section in the Python Module documentation.
  • [#8903] - Added examples to the Clustering Methods section in the Python Module documentation.
  • [#8902] - Added examples to the Regression section in the Python documentation.
  • [#8892] - Added examples to the Autoencoder section in the Python documentation.
  • [#8891] - Added examples to the Tree Class section in the Python documentation.
  • [#8872] - Added examples to the Assembly section in the Python documentation.
  • [#8867] - Added examples to the Node, Leaf Node, and Split Leaf Node sections in the Python documentation.
  • [#8864] - Added examples to the H2O Module section in the Python documentation.
  • [#8821] - Added examples to the H2OFrame section in the Python documentation.
  • [#8804] - Documented support for `checkpointing` in XGBoost.
  • [#8802] - Added examples to the GroupBy section in the Python documentation.
  • [#8792] - Updated the supported platform table in the XGBoost chapter.
  • [#8784] - Added R/Python examples to the metrics in Performance and Prediction section of the User Guide.
  • [#8782] - Added Parameter Appendix entries for CoxPH parameters.
  • [#8744] - Added examples to the GBM section in the Python documentation.
  • [#8728] - Added a new Reference entry to the Target Encoding documentation.
  • [#8721] - Added examples to the KMeans section in the Python documentation.
  • [#8712] - Added examples to the CoxPH section in the Python documentation.
  • [#8697] - Added examples to the Deep Learning section in the Python documentation.
  • [#8667] - Added examples to the Stacked Ensembles section in the Python documentation.
  • [#8661] - Added new `use_spnego` option to the Starting H2O in R topic.
  • [#8654] - Added examples to the Target Encoding section in the Python documentation.
  • [#8652] - Added examples to the Aggregator section in the Python documentation.
  • [#8651] - Updated the XGBoost extramempercent FAQ.
  • [#8636] - Added examples to the PCA section in the Python documentation.
  • [#8621] - Added a new section for Installing and Starting H2O in the Python Client documentation.
  • [#8615] - Added examples to the SVD section in the Python documentation.
  • [#8604] - Improved the R and Python documentation for `search_criteria` in Grid Search.
  • [#8546] - Added an example using `predict_contributions` to the MOJO quick start.
  • [#8524] - Added examples to the PSVM section in the Python documentation.
  • [#8512] - Added documentation for HGLM in the GLM chapter.
  • [#8498] - Improved AutoML documentation:
    • aucpr is now an available stopping metric and sort metric for AutoML.
    • monotone_constraints can now be specified in AutoML.
    • Added modeling_plan option to list of AutoML parameters.
  • [#8497] - MOJOs are now available for PCA.
  • [#8496] - MOJO models are now available for XGBoost.
  • [#8494] - calibration_frame and calibrate_model are now available in XGBoost.
  • [#8493] - Added Java 13 to list of supported Java versions.