HugeCTR

HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and to estimate Click-Through Rates (CTRs). HugeCTR supports model-parallel embedding tables and data-parallel neural networks and their variants, such as Deep Interest Network (DIN), NCF, Wide and Deep Learning (WDL), Deep Cross Network (DCN), DeepFM, and Deep Learning Recommendation Model (DLRM). HugeCTR is a component of NVIDIA Merlin Open Beta, which is used to build large-scale deep learning recommender systems. For more information, refer to the HugeCTR User Guide.
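
To make the split between model-parallel embedding tables and data-parallel dense networks more concrete, here is a toy sketch in plain Python. It is not HugeCTR code: the modulo placement rule and the names `NUM_GPUS`, `shard_for_key`, and `split_batch` are assumptions made for illustration, and HugeCTR's actual placement logic may differ.

```python
# Toy illustration only: not HugeCTR code.
NUM_GPUS = 4  # assumed number of GPUs


def shard_for_key(key: int, num_gpus: int = NUM_GPUS) -> int:
    """Model parallelism: each embedding key lives on exactly one GPU.

    The modulo rule is an assumption made for this sketch.
    """
    return key % num_gpus


def split_batch(batch: list, num_gpus: int = NUM_GPUS) -> list:
    """Data parallelism: every GPU receives a contiguous slice of the batch."""
    per_gpu = (len(batch) + num_gpus - 1) // num_gpus  # ceiling division
    return [batch[i * per_gpu:(i + 1) * per_gpu] for i in range(num_gpus)]


keys = [3, 8, 15, 16, 23, 42]
placement = {key: shard_for_key(key) for key in keys}
slices = split_batch(list(range(8)))
print(placement)
print(slices)
```

In this picture the embedding table is partitioned (no GPU holds all of it), while the dense layers are replicated and each replica processes its own slice of the batch.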

Design Goals:

Fast: HugeCTR is a speed-of-light CTR model framework that can outperform popular frameworks such as TensorFlow (TF).
Efficient: HugeCTR provides the essentials so that you can efficiently train your CTR model.
Easy: Regardless of whether you are a data scientist or machine learning practitioner, we've made it easy for anybody to use HugeCTR.

Core Features

HugeCTR supports a variety of features, including the following:

High-Level abstracted Python interface
Model parallel training
Optimized GPU workflow
Multi-node training
Mixed precision training
Embedding training cache
GPU embedding cache
GPU / CPU memory sharing mechanism across various inference instances
HugeCTR to ONNX Converter
Hierarchical Parameter Server
Sparse Operation Kit

To learn about our latest enhancements, refer to our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, do the following:

Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:
```
docker run --gpus=all --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-hugectr:22.07
```
NOTE: The /your/host/dir directory on the host is visible inside the container as /your/container/dir, which is also your starting directory.

NOTE: HugeCTR uses NCCL to share data between ranks, and NCCL may require shared memory for IPC and pinned (page-locked) system memory resources. It is recommended that you increase these resources by adding the following options to the docker run command.
```
--shm-size=1g --ulimit memlock=-1
```
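
If you launch the container from a script, the pieces of the command can be assembled programmatically. The following is a minimal sketch that only builds and prints the argument list. The helper name `build_docker_command` and the UID/GID value 1000 are placeholders for illustration; Docker's long options are spelled `--shm-size` and `--ulimit`.

```python
import shlex

IMAGE = "nvcr.io/nvidia/merlin/merlin-hugectr:22.07"


def build_docker_command(host_dir: str, container_dir: str, uid: int, gid: int) -> list:
    """Return the docker run argument list, including the NCCL-related options."""
    return [
        "docker", "run", "--gpus=all", "--rm", "-it",
        "--cap-add", "SYS_NICE",
        "--shm-size=1g",            # shared memory for NCCL IPC
        "--ulimit", "memlock=-1",   # no limit on pinned (page-locked) memory
        "-v", f"{host_dir}:{container_dir}",
        "-w", container_dir,
        "-u", f"{uid}:{gid}",
        IMAGE,
    ]


# 1000:1000 is a placeholder; use the output of `id -u` and `id -g` in practice.
command = build_docker_command("/your/host/dir", "/your/container/dir", 1000, 1000)
print(shlex.join(command))
```

Printing the command before running it makes it easy to check that the mount and resource options are in place.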

Write a simple Python script to generate a synthetic dataset:

```
# dcn_norm_generate.py
import hugectr
from hugectr.tools import DataGeneratorParams, DataGenerator
data_generator_params = DataGeneratorParams(
  format = hugectr.DataReaderType_t.Norm,
  label_dim = 1,
  dense_dim = 13,
  num_slot = 26,
  i64_input_key = False,
  source = "./dcn_norm/file_list.txt",
  eval_source = "./dcn_norm/file_list_test.txt",
  slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                     39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                     63, 63,
                     39884, 39043, 17289, 7420, 20263, 3, 7120, 1543],
  check_type = hugectr.Check_t.Sum,
  dist_type = hugectr.Distribution_t.PowerLaw,
  power_law_type = hugectr.PowerLaw_t.Short)
data_generator = DataGenerator(data_generator_params)
data_generator.generate()
```
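
Before generating data, it can help to sanity-check the slot configuration. The sketch below uses plain Python and does not need HugeCTR; it assumes that each entry of `slot_size_array` is the number of distinct keys in one slot, so the list should have exactly `num_slot` entries.

```python
# Values copied from dcn_norm_generate.py.
num_slot = 26
slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   63, 63,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543]

# One entry per slot is expected.
assert len(slot_size_array) == num_slot

# Under the assumption above, the sum bounds the number of distinct keys.
total_keys = sum(slot_size_array)
largest_slot = max(slot_size_array)
print(f"slots: {len(slot_size_array)}, total keys: {total_keys}, largest slot: {largest_slot}")
```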

Generate the Norm dataset for your DCN model by running the following command:
```
python dcn_norm_generate.py
```
NOTE: The generated dataset will reside in the folder ./dcn_norm, which contains training and evaluation data.
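
To confirm that generation succeeded, you can inspect the file lists. The sketch below assumes the file list layout in which the first line holds the number of data files and each following line holds one path. The helper `read_file_list` and the demo file names `gen_0.data` and `gen_1.data` are hypothetical and used only so that the example runs without a generated dataset.

```python
import os
import tempfile


def read_file_list(path: str) -> list:
    """Parse a file list: first line = file count, remaining lines = paths."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    expected = int(lines[0])
    data_files = lines[1:]
    if len(data_files) != expected:
        raise ValueError(f"{path}: header says {expected} files, found {len(data_files)}")
    return data_files


# Demo on a throwaway file list (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    demo_list = os.path.join(tmp, "file_list.txt")
    with open(demo_list, "w") as handle:
        handle.write("2\n./dcn_norm/gen_0.data\n./dcn_norm/gen_1.data\n")
    demo_files = read_file_list(demo_list)

print(demo_files)
```

For the generated dataset, you would call `read_file_list("./dcn_norm/file_list.txt")` and check that every returned path exists.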

Write a simple Python script for training:

```
# dcn_norm_train.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 1280,
                              batchsize_eval = 1024,
                              batchsize = 1024,
                              lr = 0.001,
                              vvgpu = [[0]],
                              repeat_dataset = True)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                                 source = ["./dcn_norm/file_list.txt"],
                                 eval_source = "./dcn_norm/file_list_test.txt",
                                 check_type = hugectr.Check_t.Sum)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                    update_type = hugectr.Update_t.Global)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array =
                        [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
                           workspace_size_per_gpu_in_mb = 75,
                           embedding_vec_size = 16,
                           combiner = "sum",
                           sparse_embedding_name = "sparse_embedding1",
                           bottom_name = "data1",
                           optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                           bottom_names = ["sparse_embedding1"],
                           top_names = ["reshape1"],
                           leading_dim=416))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                           bottom_names = ["reshape1", "dense"], top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCross,
                           bottom_names = ["concat1"],
                           top_names = ["multicross1"],
                           num_layers=6))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                           bottom_names = ["concat1"],
                           top_names = ["fc1"],
                           num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                           bottom_names = ["fc1"],
                           top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                           bottom_names = ["relu1"],
                           top_names = ["dropout1"],
                           dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                           bottom_names = ["dropout1", "multicross1"],
                           top_names = ["concat2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                           bottom_names = ["concat2"],
                           top_names = ["fc2"],
                           num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                           bottom_names = ["fc2", "label"],
                           top_names = ["loss"]))
model.compile()
model.summary()
model.graph_to_json(graph_config_file = "dcn.json")
model.fit(max_iter = 5120, display = 200, eval_interval = 1000, snapshot = 5000, snapshot_prefix = "dcn")
```

NOTE: Ensure that the paths to the synthetic datasets are correct with respect to this Python script. data_reader_type, check_type, label_dim, dense_dim, and data_reader_sparse_param_array should be consistent with the generated dataset.
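
The consistency requirement also applies to the layer dimensions. As a rough check in plain Python, the Reshape layer's `leading_dim` of 416 matches one 16-wide embedding vector per slot across 26 slots. The concat width and sample count below are derived values, computed on the assumption that Concat joins its inputs along the feature dimension and that every iteration consumes one full batch.

```python
# Values copied from dcn_norm_generate.py and dcn_norm_train.py.
num_slot = 26            # slots in the generated dataset
embedding_vec_size = 16  # SparseEmbedding vector size
dense_dim = 13           # dense features per sample
batchsize = 1024
max_iter = 5120

# Reshape flattens one 16-wide vector per slot into a single row.
leading_dim = num_slot * embedding_vec_size
# Concat appends the dense features to the flattened embeddings.
concat_width = leading_dim + dense_dim
# Upper bound on the training samples consumed by model.fit.
samples_seen = batchsize * max_iter

print(leading_dim, concat_width, samples_seen)
```

If you change `num_slot` or `embedding_vec_size`, `leading_dim` in the Reshape layer has to change with them.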

Train the model by running the following command:
```
python dcn_norm_train.py
```
NOTE: Because the dataset is randomly generated, the evaluation AUC is not expected to be meaningful. When training finishes, files containing the dumped graph JSON, the saved model weights, and the optimizer states are generated.
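
To see why the AUC carries little information on random data, consider the following self-contained sketch. It does not use HugeCTR: it draws random labels and unrelated random scores, then computes the AUC with the rank-sum formula. The function `auc` and the sample size are choices made for this illustration.

```python
import random


def auc(labels: list, scores: list) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Sum of 1-based ranks of the positive samples. Ties are not averaged,
    # which is acceptable for continuous random scores.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(20000)]
scores = [rng.random() for _ in range(20000)]  # predictions unrelated to labels
random_auc = auc(labels, scores)
print(f"AUC on random labels: {random_auc:.3f}")
```

A value close to 0.5 means the scores rank positives no better than chance, which is the most a model can achieve when the labels are random.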

For more information, refer to the HugeCTR User Guide.

HugeCTR SDK

We support external developers who can't use HugeCTR directly by exporting important HugeCTR components:

Sparse Operation Kit (directory | documentation): a Python package that wraps GPU-accelerated operations dedicated to sparse training and inference.
GPU Embedding Cache: an embedding cache that resides in GPU memory and is designed for CTR inference workloads.
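
As a conceptual illustration of what an embedding cache does during inference, here is a toy sketch in plain Python. It is not the GPU Embedding Cache API: the class `ToyEmbeddingCache`, the least-recently-used eviction policy, and the `slow_lookup` stand-in are all assumptions made for this example, and the real component may use a different design.

```python
from collections import OrderedDict


class ToyEmbeddingCache:
    """LRU cache of embedding vectors in front of a slower lookup function."""

    def __init__(self, capacity, fetch_from_backing_store):
        self.capacity = capacity
        self.fetch = fetch_from_backing_store
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        self.misses += 1
        vector = self.fetch(key)
        self.entries[key] = vector
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return vector


def slow_lookup(key):
    # Stand-in for a slower store such as host memory; returns a fake vector.
    return [float(key)] * 4


cache = ToyEmbeddingCache(capacity=2, fetch_from_backing_store=slow_lookup)
for key in [1, 2, 1, 3, 2]:
    cache.lookup(key)
print(cache.hits, cache.misses)
```

The point of such a cache is that frequently requested keys are served from fast memory, so only misses pay the cost of the slower lookup.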

Support and Feedback

If you encounter any issues or have questions, go to https://github.com/NVIDIA/HugeCTR/issues and submit an issue so that we can provide you with the necessary resolutions and answers. To further advance the HugeCTR Roadmap, we encourage you to share all the details regarding your recommender system pipeline using this survey.

Contributing to HugeCTR

With HugeCTR being an open source project, we welcome contributions from the general public. With your contributions, we can continue to improve HugeCTR's quality and performance. To learn how to contribute, refer to our HugeCTR Contributor Guide.

Additional Resources

Webpages
NVIDIA Merlin
NVIDIA HugeCTR

Talks

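| Conference / Website | Title | Date | Speaker | Language |
| --- | --- | --- | --- | --- |
| APSARA 2021 | GPU 推荐系统 Merlin (GPU Recommender System Merlin) | Oct 2021 | Joey Wang | Chinese |
| GTC Spring 2021 | Learn how Tencent Deployed an Advertising System on the Merlin GPU Recommender Framework | April 2021 | Xiangting Kong, Joey Wang | English |
| GTC Spring 2021 | Merlin HugeCTR: Deep Dive Into Performance Optimization | April 2021 | Minseok Lee | English |
| GTC Spring 2021 | Integrate HugeCTR Embedding with TensorFlow | April 2021 | Jianbing Dong | English |
| GTC China 2020 | MERLIN HUGECTR：深入研究性能优化 (Merlin HugeCTR: Deep Dive into Performance Optimization) | Oct 2020 | Minseok Lee | English |
| GTC China 2020 | 性能提升 7 倍 + 的高性能 GPU 广告推荐加速系统的落地实现 (Deploying a High-Performance GPU-Accelerated Advertising Recommender System with a 7x+ Performance Gain) | Oct 2020 | Xiangting Kong | Chinese |
| GTC China 2020 | 使用 GPU EMBEDDING CACHE 加速 CTR 推理过程 (Accelerating CTR Inference with the GPU Embedding Cache) | Oct 2020 | Fan Yu | Chinese |
| GTC China 2020 | 将 HUGECTR EMBEDDING 集成于 TENSORFLOW (Integrating HugeCTR Embedding with TensorFlow) | Oct 2020 | Jianbing Dong | Chinese |
| GTC Spring 2020 | HugeCTR: High-Performance Click-Through Rate Estimation Training | March 2020 | Minseok Lee, Joey Wang | English |
| GTC China 2019 | HUGECTR: GPU 加速的推荐系统训练 (HugeCTR: GPU-Accelerated Recommender System Training) | Oct 2019 | Joey Wang | Chinese |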

Blogs

| Conference / Website | Title | Date | Authors | Language |
| --- | --- | --- | --- | --- |
| NVIDIA Devblog | Accelerating Embedding with the HugeCTR TensorFlow Embedding Plugin | Sept 2021 | Vinh Nguyen, Ann Spencer, Joey Wang and Jianbing Dong | English |
| medium.com | Optimizing Meituan’s Machine Learning Platform: An Interview with Jun Huang | Sept 2021 | Sheng Luo and Benedikt Schifferer | English |
| medium.com | Leading Design and Development of the Advertising Recommender System at Tencent: An Interview with Xiangting Kong | Sept 2021 | Xiangting Kong, Ann Spencer | English |
| NVIDIA Devblog | 扩展和加速大型深度学习推荐系统 – HugeCTR 系列第 1 部分 (Scaling and Accelerating Large Deep Learning Recommender Systems – HugeCTR Series Part 1) | June 2021 | Minseok Lee | Chinese |
| NVIDIA Devblog | 使用 Merlin HugeCTR 的 Python API 训练大型深度学习推荐模型 – HugeCTR 系列第 2 部分 (Training Large Deep Learning Recommender Models with Merlin HugeCTR's Python API – HugeCTR Series Part 2) | June 2021 | Vinh Nguyen | Chinese |
| medium.com | Training large Deep Learning Recommender Models with Merlin HugeCTR’s Python APIs – HugeCTR Series Part 2 | May 2021 | Minseok Lee, Joey Wang, Vinh Nguyen and Ashish Sardana | English |
| medium.com | Scaling and Accelerating large Deep Learning Recommender Systems – HugeCTR Series Part 1 | May 2021 | Minseok Lee | English |
| IRS 2020 | Merlin: A GPU Accelerated Recommendation Framework | Aug 2020 | Even Oldridge et al. | English |
| NVIDIA Devblog | Introducing NVIDIA Merlin HugeCTR: A Training Framework Dedicated to Recommender Systems | July 2020 | Minseok Lee and Joey Wang | English |
