Latest commit: f4cc66b by Tiberiu Popa, Sep 23, 2024

Airlearner

What is it?

A practical machine learning library designed for production work. Current components:

Binary regression refers to a regression problem in which each label has both a boolean part and a float part. Pricing is a typical binary regression problem: a true label is associated with a sold price, and a false label with an unsold price. Traditional regression cannot handle this type of problem without bias, because both options it offers are biased: dropping all false-label samples, or keeping them with their unsold prices as if they were targets. Binary regression solves this type of problem.
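To make the label shape concrete, here is a minimal Python sketch of one possible way to use both sold and unsold samples. It is an illustrative assumption, not necessarily the method this library implements: a sold price is treated as a lower bound on the value being estimated, an unsold price as an upper bound, and a prediction is penalized only when it violates a bound. The names `PricedSample`, `bound_loss`, and `total_loss` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricedSample:
    """One training label: a boolean part and a float part (hypothetical type)."""
    sold: bool    # True if the item sold at `price`
    price: float  # the sold price, or the price at which it failed to sell


def bound_loss(prediction: float, sample: PricedSample) -> float:
    """Penalize a prediction only when it violates the bound implied by the label.

    Assumption for this sketch: a sale at `price` means the value is at least
    `price`; a failed sale means the value is below `price`.
    """
    if sample.sold:
        return max(0.0, sample.price - prediction)  # predicted below a sold price
    return max(0.0, prediction - sample.price)      # predicted above an unsold price


def total_loss(prediction: float, samples: list[PricedSample]) -> float:
    """Sum the bound violations of one constant prediction over all samples."""
    return sum(bound_loss(prediction, s) for s in samples)


SAMPLES = [
    PricedSample(sold=True, price=80.0),
    PricedSample(sold=True, price=100.0),
    PricedSample(sold=False, price=120.0),
    PricedSample(sold=False, price=150.0),
]

for candidate in (90.0, 110.0, 130.0):
    print(candidate, total_loss(candidate, SAMPLES))
```

In this toy data, any prediction between the highest sold price and the lowest unsold price violates no bound. A model fit only to the sold samples would never see the information carried by the two unsold prices.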

XGBoost Pipeline productionizes XGBoost in a Spark + HDFS/Hive environment. It supports:

  • Transforming Hive data into XGBoost training data
  • Training, evaluation, and scoring pipelines
  • Monte Carlo parameter search, with the search results saved to a Hive table
  • Saving the model and model output to HDFS and Hive
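The Monte Carlo parameter search listed above can be pictured as random sampling over hyperparameter ranges, with one result row recorded per trial. The sketch below is a plain-Python illustration under stated assumptions, not the pipeline's actual code: `eta`, `max_depth`, and `subsample` are real XGBoost parameter names, but the ranges, the `toy_validation_loss` stand-in (used here instead of training a model), and the row layout are all hypothetical.

```python
import random

# Hypothetical search ranges; integer bounds are sampled as integers.
SEARCH_SPACE = {
    "eta": (0.01, 0.3),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
}


def sample_params(rng: random.Random) -> dict:
    """Draw one random parameter set from SEARCH_SPACE."""
    params = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            params[name] = rng.uniform(low, high)
    return params


def toy_validation_loss(params: dict) -> float:
    """Stand-in for training and evaluating a model (an assumption of this sketch)."""
    return (
        (params["eta"] - 0.1) ** 2
        + 0.01 * abs(params["max_depth"] - 6)
        + (1.0 - params["subsample"]) ** 2
    )


def monte_carlo_search(n_trials: int, seed: int = 0) -> list[dict]:
    """Run n_trials random trials and return one row per trial, best loss first."""
    rng = random.Random(seed)
    rows = []
    for trial in range(n_trials):
        params = sample_params(rng)
        rows.append({"trial": trial, **params, "loss": toy_validation_loss(params)})
    return sorted(rows, key=lambda row: row["loss"])


RESULTS = monte_carlo_search(n_trials=20, seed=42)
BEST = RESULTS[0]
print(BEST)
```

Each row is a flat record of trial number, parameters, and loss, which is the kind of shape that could be appended to a table for later comparison. Fixing the seed makes a search reproducible.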