Roadmap for Upcoming Features

Chi Wang edited this page Sep 5, 2022 · 24 revisions

Feature Requests

Multi-modal model

Suggested in https://github.com/microsoft/FLAML/discussions/279. Suggestions about concrete use cases, estimators and hyperparameter search spaces are welcome.

Multivariate time series forecasting

Requested in https://github.com/microsoft/FLAML/issues/204. We currently support two types of multivariate time series forecasting.

This is an important area under active development and contributions are highly valued.

Improve efficiency for multi-tasking

https://github.com/microsoft/FLAML/issues/277. The current solution is to fit multiple single-output models, which becomes slow when the number of tasks is large. The same issue applies to time series forecasting when the data contains multiple time series (differentiated by categorical columns). This is an important practical problem for many organizations, including large corporations such as Meta and Microsoft. Please reach out if you are interested in research or development to solve it.
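To make the cost concrete, here is a minimal sketch of the "one single-output model per task" pattern described above. It is not FLAML code: `fit_mean_model` is a hypothetical stand-in for an expensive single-output AutoML run, and `fit_per_output` and `predict_per_output` are illustrative helper names. The point is only that the fitting loop runs once per output column, so total cost grows linearly with the number of tasks.

```python
from statistics import fmean


def fit_mean_model(X, y):
    """Hypothetical stand-in for one single-output AutoML run.

    It "trains" by memorizing the mean of y and returns a predict function.
    """
    mean = fmean(y)
    return lambda rows: [mean for _ in rows]


def fit_per_output(fit_one, X, Y):
    """Fit one independent model per output column of Y.

    Y is a list of rows, each row holding one target value per task.
    fit_one is called once per task, so cost scales with the task count.
    """
    n_outputs = len(Y[0])
    models = []
    for j in range(n_outputs):
        y_j = [row[j] for row in Y]  # targets for task j only
        models.append(fit_one(X, y_j))
    return models


def predict_per_output(models, X):
    """Predict with every per-task model and regroup the results by row."""
    per_output = [model(X) for model in models]
    return [list(row) for row in zip(*per_output)]


X = [[0], [1], [2]]
Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
models = fit_per_output(fit_mean_model, X, Y)
predictions = predict_per_output(models, [[5]])
```

With two output columns the stand-in is fitted twice; with thousands of tasks or time series, the same loop would launch thousands of separate searches, which is the inefficiency this roadmap item targets.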

How to decide the value of time_budget

A recurring question is how to decide the value of time_budget. For example,

Our current recommendation is at https://github.com/microsoft/FLAML/wiki/Time-budget. Any improvement to it would benefit many users, and it is a good research problem too. Please reach out if you'd like to contribute. A concrete idea to implement is https://github.com/microsoft/FLAML/issues/710.

Decide how many labeled training examples are needed

Asked in https://github.com/microsoft/FLAML/discussions/289#discussioncomment-1690433. Related research:

Integrating these or other techniques into FLAML would be a unique feature. Help is wanted from both researchers and developers.

Prediction quality

Requested in https://github.com/microsoft/FLAML/issues/214 and https://github.com/microsoft/FLAML/issues/355. It is a useful feature for customers such as Meta. Contributions from the community are appreciated.

A related discussion is at https://github.com/microsoft/FLAML/discussions/351.

Imbalance

The current guidance for handling imbalanced data is at https://microsoft.github.io/FLAML/docs/FAQ#how-does-flaml-handle-imbalanced-data-unequal-distribution-of-target-classes-in-classification-task. Contributions that improve performance under label imbalance are welcome. One example idea is from https://github.com/microsoft/FLAML/discussions/27: throw a warning to let the user know about class imbalance before training, and if imbalance is detected, wrap the classifiers with BalancedBaggingClassifier etc. to overcome it.
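The first half of that idea, warning about class imbalance before training, could look roughly like the sketch below. It is not part of FLAML: the function name `warn_if_imbalanced` and the default threshold of 10 are assumptions chosen for illustration. The second half (wrapping the classifier, for example with BalancedBaggingClassifier) is left out.

```python
import warnings
from collections import Counter


def warn_if_imbalanced(y, max_ratio=10.0):
    """Warn when the largest class outnumbers the smallest by more than max_ratio.

    Returns the majority/minority count ratio so that a caller could decide
    whether to apply a rebalancing strategy. max_ratio is an assumed default.
    """
    counts = Counter(y)
    if len(counts) < 2:
        # Only one class present: there is no ratio between classes to report.
        return 1.0
    ratio = max(counts.values()) / min(counts.values())
    if ratio > max_ratio:
        warnings.warn(
            f"Class imbalance detected: majority/minority ratio is {ratio:.1f} "
            f"(class counts: {dict(counts)}).",
            UserWarning,
            stacklevel=2,
        )
    return ratio


labels = [0] * 95 + [1] * 5
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    imbalance_ratio = warn_if_imbalanced(labels)
```

Returning the ratio rather than a boolean keeps the threshold decision with the caller, which matters because a sensible cutoff depends on the dataset size and the metric being optimized.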

Feature Selection by FLAML

https://github.com/microsoft/FLAML/issues/258. Help wanted.

Support "groups" for catboost

https://github.com/microsoft/FLAML/issues/304. Help wanted.

Visualize feature importance, SHAP/LIME explanation, optimization history

Though we have some partial solutions, there is room for improvement. Contributions from the community are appreciated.

Use early_stop_rounds

https://github.com/microsoft/FLAML/issues/172. We investigated the effectiveness of using early_stop_rounds for lightgbm and xgboost, but the results are inconclusive. Suggestions are welcome.
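For readers unfamiliar with the mechanism, the following is a generic sketch of early stopping on a sequence of validation losses. It is not the lightgbm or xgboost implementation, and the function name and the loss values are made up for illustration. Training stops once the loss has failed to improve for `early_stop_rounds` consecutive iterations.

```python
def run_with_early_stopping(val_losses, early_stop_rounds):
    """Scan validation losses and stop after early_stop_rounds iterations
    without improvement.

    Returns (best_iteration, best_loss, iterations_evaluated).
    """
    best_iter, best_loss = -1, float("inf")
    evaluated = 0
    for i, loss in enumerate(val_losses):
        evaluated = i + 1
        if loss < best_loss:
            best_iter, best_loss = i, loss
        elif i - best_iter >= early_stop_rounds:
            break  # no improvement for early_stop_rounds iterations
    return best_iter, best_loss, evaluated


# Made-up loss curve with a late improvement at the last iteration.
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.63, 0.5]
impatient = run_with_early_stopping(losses, early_stop_rounds=3)
patient = run_with_early_stopping(losses, early_stop_rounds=10)
```

On this toy curve, the impatient run stops after six iterations with a best loss of 0.6, while the patient run evaluates all seven and reaches 0.5. The trade-off between saved iterations and missed late improvements is the kind of effect any evaluation of this feature has to weigh.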

Search space for CatBoost

https://github.com/microsoft/FLAML/issues/144. Help wanted.

ONNX/ONNXML export

Help wanted.

Fair AutoML

Integrate Fair AutoML.
