ceys/fpmc

FPMC

# Produce: Hive -> DataFrame -> RDD[Features] -> HBase

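Produces and processes features and stores them in HBase. Parameters:

- `--date`: the feature date, in the format `2015-09-22`
- `--sql`: the SQL query that extracts the data
- `--ftype`: the feature type (`cross`, `user`, or `item`)
- `--mapping`: the mapping from an SQL row to a feature, where:
  - `attrCate` is the cross-feature type ID and must be an integer
  - `user` is the index of `user_id` in the SQL row
  - `attrid` is the index of the id in the SQL row
  - `fIndex` is a dictionary whose keys are vw namespace names and whose values are the corresponding indexes in the SQL row

Indexes are zero-based column positions in the SQL result, as the example shows.

Example: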

spark-submit \
   --master local \
   --num-executors 1 \
   --driver-memory 1g \
   --executor-memory 1g \
   --executor-cores 1 \
   --name test-zhengchen \
   --queue bdp_jmart_jdmp \
   --class com.jd.bdp.fpmc.tools.SparkSql2HbaseFeature \
   tools-0.1-SNAPSHOT.jar \
   --date $YESTERDAY \
   --sql "select user_id,cast(sku_id as bigint),cast(sum(three_days) as string),cast(sum(seven_days) as string) from adm.o2o_user_sku_features_tmp where dt='$YESTERDAY' and action='buy' group by user_id,sku_id" \
   --ftype cross \
   --mapping '{"attrCate":1,"user":0,"attrid":1,"fIndex":{"ub3b":2,"ub7b":3}}'
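
To make the `--mapping` semantics concrete, here is a minimal Python sketch of how one SQL row could be projected onto a cross feature. It is an illustration only, not the code in `SparkSql2HbaseFeature`: the helper name `row_to_cross_feature`, the shape of the returned record, and the sample row are all assumptions.

```python
import json


def row_to_cross_feature(row, mapping):
    """Project one SQL result row onto a feature record.

    Hypothetical helper: the output layout is assumed for illustration
    and is not taken from the fpmc source.
    """
    return {
        # attrCate is a literal cross-feature type ID, not a column index.
        "attrCate": int(mapping["attrCate"]),
        # user and attrid are zero-based column indexes into the row.
        "user": row[mapping["user"]],
        "attrid": row[mapping["attrid"]],
        # fIndex maps a namespace name to the column holding its value.
        "features": {ns: row[idx] for ns, idx in mapping["fIndex"].items()},
    }


mapping = json.loads(
    '{"attrCate":1,"user":0,"attrid":1,"fIndex":{"ub3b":2,"ub7b":3}}'
)
# Made-up row in the column order of the example query:
# user_id, sku_id, sum(three_days), sum(seven_days)
row = ("alice", 1001, "2", "5")
record = row_to_cross_feature(row, mapping)
print(record)
```

With the mapping from the example command, column 0 supplies the user, column 1 supplies the attribute id, and columns 2 and 3 fill the `ub3b` and `ub7b` namespaces.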

# Consume: Hive table -> DataFrame -> RDD[Action] -> RDD[Example] <- HBase

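Generates examples, fetching the features from HBase. Parameters:

- `--sql`: the SQL query that extracts the label data
- `--output`: the HDFS path where the examples are written
- `--mapping`: the mapping from an SQL row to a label, where:
  - `user` is the index of the user id in the SQL row
  - `item` is the index of the sku id in the SQL row (must be of type bigint)
  - `timestamp` is the index of the timestamp in the SQL row (in seconds, of type int)
  - `label` is the index of the label in the SQL row
  - `attrmap` maps cross-feature types to values: each key is a cross-feature type and each value is the index in the SQL row for that cross feature

Example: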

spark-submit \
   --master local \
   --num-executors 1 \
   --driver-memory 1g \
   --executor-memory 1g \
   --executor-cores 1 \
   --name test-zhengchen \
   --queue bdp_jmart_jdmp \
   --class com.jd.bdp.fpmc.tools.MakeExamples \
   tools-0.1-SNAPSHOT.jar \
   --sql "select user_log_acct, sku_id, request_time_sec, 1 from gdm.gdm_m14_online_o2o where dt='$YESTERDAY' and ct_page in ('detail' ,'GoodsInfo') and sku_id is not null and user_log_acct is not null and user_log_acct != '' and refer_page in ('home','Home')" \
   --mapping '{"user":0,"item":1,"timestamp":2,"label":3,"attrmap":{"1":1}}' \
   --output /tmp/fpmc/example
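
The consume-side `--mapping` can be read the same way. The sketch below shows, under stated assumptions, how a row of label data might become an action record before features are looked up in HBase. The `Action` type, the helper `row_to_action`, and the sample row are hypothetical and are not taken from `MakeExamples`.

```python
import json
from typing import Any, Dict, NamedTuple


class Action(NamedTuple):
    """Hypothetical record for one labelled user action."""
    user: Any
    item: int
    timestamp: int          # seconds
    label: Any
    attrs: Dict[int, Any]   # cross-feature type -> attribute id


def row_to_action(row, mapping):
    """Build an Action from one SQL row using zero-based column indexes."""
    return Action(
        user=row[mapping["user"]],
        item=int(row[mapping["item"]]),
        timestamp=int(row[mapping["timestamp"]]),
        label=row[mapping["label"]],
        # attrmap keys are cross-feature types (JSON strings, so cast to int);
        # values are the column indexes holding the matching attribute id.
        attrs={int(cate): row[idx] for cate, idx in mapping["attrmap"].items()},
    )


mapping = json.loads(
    '{"user":0,"item":1,"timestamp":2,"label":3,"attrmap":{"1":1}}'
)
# Made-up row in the column order of the example query:
# user_log_acct, sku_id, request_time_sec, 1
row = ("alice", 1001, 1442880000, 1)
action = row_to_action(row, mapping)
print(action)
```

Here `attrmap` of `{"1":1}` says that cross-feature type 1 is keyed by the value in column 1, the sku id, which matches the `attrCate` of 1 used when the features were produced.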

About

Feature produce & manage & consume
