Roadmap for Upcoming Features
Integration with large language models is the biggest area of focus in current development. Support for GPT-4 and GPT-3.5 models is in preview in v1.2.0. The goal is to build a higher-level framework that automates AI functions for solving complex tasks.
`flaml.automl` currently requires the data to be loaded as a pandas DataFrame or a numpy array. Supporting Spark DataFrames would let `flaml.automl` handle larger-than-memory datasets. `flaml.tune` has no limitation on dataset size or format, because it works with any user-defined training function (see the sketch below).
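To illustrate, here is a minimal `flaml.tune` sketch; the training function, search space, and budget below are made-up placeholders for illustration, not from the docs:

```python
from flaml import tune

# User-defined training function: data can be loaded from anywhere
# (Spark, files on disk, generators); flaml.tune only consumes the
# returned metric, so dataset size and format are unconstrained.
def evaluate_config(config):
    # ... train a model with config["learning_rate"], config["num_leaves"] ...
    score = 1.0 / (1.0 + config["learning_rate"])  # placeholder metric
    return {"score": score}

analysis = tune.run(
    evaluate_config,
    config={
        "learning_rate": tune.loguniform(1e-4, 1.0),
        "num_leaves": tune.randint(4, 128),
    },
    metric="score",
    mode="max",
    num_samples=-1,    # no cap on trials; stop on the time budget
    time_budget_s=60,  # seconds
)
print(analysis.best_config)
```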
Suggested in https://github.com/microsoft/FLAML/discussions/279. Concrete suggestions about use cases, estimators, and hyperparameter search spaces are welcome.
Requested in https://github.com/microsoft/FLAML/issues/204. We currently support two types of multivariate time series forecasting (a usage sketch follows the list):
- Forecasting a single time series with exogenous variables, using statistical models or regressors.
- Forecasting multiple time series with panel datasets, using deep neural networks.
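A minimal sketch of the first type with the current `flaml.automl` API, assuming the first column of the dataframe is the timestamp; the column names, horizon, and budget below are hypothetical:

```python
import pandas as pd
from flaml import AutoML

# Hypothetical monthly series with one exogenous variable.
df = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=120, freq="MS"),
    "temperature": range(120),               # exogenous variable
    "sales": [i * 1.5 for i in range(120)],  # target to forecast
})

automl = AutoML()
automl.fit(
    dataframe=df,
    label="sales",
    task="ts_forecast",  # single time series forecasting
    period=12,           # forecast horizon: 12 future periods
    time_budget=60,      # seconds
)
```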
Multi-output regression, requested in https://github.com/microsoft/FLAML/issues/277. The current solution is to fit multiple single-output models (sketched below), which becomes slow when the number of tasks is large. The same issue applies to time series forecasting when the data contains multiple time series (differentiated by categorical columns). It is an important real-world problem for many organizations, including large corporations like Meta and Microsoft. Please reach out if you are interested in research or development toward solving it.
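One way to express the current workaround is scikit-learn's `MultiOutputRegressor` wrapping FLAML's scikit-learn-compatible `AutoML` estimator; the data below is synthetic and the settings are arbitrary:

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from flaml import AutoML

rng = np.random.default_rng(0)
X = rng.random((200, 5))
Y = rng.random((200, 3))  # three output tasks

# Fits one independent AutoML search per target column, which is
# exactly why this approach gets slow as the number of tasks grows.
model = MultiOutputRegressor(AutoML(task="regression", time_budget=30, metric="r2"))
model.fit(X, Y)
pred = model.predict(X)  # shape (200, 3)
```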
A recurring question is how to choose the value of `time_budget`. Our current recommendation is at https://github.com/microsoft/FLAML/wiki/Time-budget. Any improvement on it would benefit many users, and it is a good research problem too. Please reach out if you'd like to contribute. A concrete idea to implement is https://github.com/microsoft/FLAML/issues/710.
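One concrete pattern consistent with that recommendation: start with a small budget, inspect the result, and continue with a larger budget warm-started from the best configurations found so far. The budgets and dataset here are arbitrary:

```python
from flaml import AutoML
from sklearn.datasets import load_iris

X_train, y_train = load_iris(return_X_y=True)

# First pass with a small budget.
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

# If the loss is still improving, continue with a larger budget,
# warm-starting from the best config found for each estimator.
automl2 = AutoML()
automl2.fit(
    X_train, y_train,
    task="classification",
    time_budget=600,
    starting_points=automl.best_config_per_estimator,
)
```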
Asked in https://github.com/microsoft/FLAML/discussions/289#discussioncomment-1690433. Related research:
- ABC: Efficient Selection of Machine Learning Configuration on Large Dataset
- Efficiently Approximating Selectivity Functions using Low Overhead Regression Models
Integrating these or other such techniques would be a unique feature of FLAML. Help wanted from both researchers and developers.
Requested in https://github.com/microsoft/FLAML/issues/214 and https://github.com/microsoft/FLAML/issues/355. It is a useful feature for customers such as Meta. Contributions from the community are appreciated.
A related discussion is at https://github.com/microsoft/FLAML/discussions/351.
The current guidance for handling imbalanced data is at https://microsoft.github.io/FLAML/docs/FAQ#how-does-flaml-handle-imbalanced-data-unequal-distribution-of-target-classes-in-classification-task. Contributions that improve performance under label imbalance are welcome. One example idea, from https://github.com/microsoft/FLAML/discussions/27: throw a warning to let the user know about class imbalance before training, and if imbalance is detected, wrap the classifiers with BalancedBaggingClassifier etc. to overcome it (a sketch is below).
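A minimal sketch of that idea outside FLAML, using imbalanced-learn's `BalancedBaggingClassifier`; the 0.2 threshold is an arbitrary choice and `fit_with_imbalance_check` is a hypothetical helper:

```python
import warnings
import numpy as np
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def fit_with_imbalance_check(X, y, threshold=0.2):
    """Warn on imbalance and switch to a balanced-bagging wrapper."""
    _, counts = np.unique(y, return_counts=True)
    ratio = counts.min() / counts.max()
    if ratio < threshold:  # arbitrary imbalance threshold
        warnings.warn(
            f"Class imbalance detected (minority/majority = {ratio:.2f}); "
            "using BalancedBaggingClassifier."
        )
        # Resamples each bootstrap to a balanced class distribution;
        # the default base estimator is a decision tree.
        clf = BalancedBaggingClassifier(random_state=0)
    else:
        clf = DecisionTreeClassifier(random_state=0)
    return clf.fit(X, y)
```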
https://github.com/microsoft/FLAML/issues/258. Help wanted.
https://github.com/microsoft/FLAML/issues/304. Help wanted.
- https://github.com/microsoft/FLAML/issues/246.
- https://github.com/microsoft/FLAML/issues/193.
- https://github.com/microsoft/FLAML/discussions/220.
Though we have some partial solutions, there is room for improvement. Contributions from the community are appreciated.
https://github.com/microsoft/FLAML/issues/172. We investigated the effectiveness of using `early_stopping_rounds` for LightGBM and XGBoost, but the results were inconclusive. Suggestions are welcome (a sketch for rerunning the comparison is below).
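For anyone rerunning that comparison, here is a minimal LightGBM sketch using its callback-based early stopping; the dataset, split, and `stopping_rounds` value are arbitrary:

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

booster = lgb.train(
    {"objective": "binary", "metric": "auc"},
    lgb.Dataset(X_tr, label=y_tr),
    num_boost_round=1000,
    valid_sets=[lgb.Dataset(X_val, label=y_val)],
    # Stop if validation AUC has not improved for 50 rounds; compare
    # wall time and final score against a run without this callback.
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)
print("best iteration:", booster.best_iteration)
```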
https://github.com/microsoft/FLAML/issues/144. Help wanted.
Integrate Fair AutoML. Help wanted.