NVIDIA Merlin is a framework for accelerating the entire recommender systems pipeline on the graphics processing unit (GPU). Merlin allows data scientists, ML engineers, and researchers to build high-performing recommenders at scale using a variety of data sources and techniques. In addition to tackling common ETL, training, and inference challenges, Merlin contains tools that democratise the process of developing deep learning recommenders. Each stage of the Merlin pipeline is optimised to handle data volumes of hundreds of gigabytes or more, all accessible through simple APIs. With Merlin, it is possible to make better predictions than with traditional approaches and to boost click-through rates.
The Merlin ecosystem has four primary components: Merlin ETL, Merlin Dataloaders, Merlin Training, and Merlin Inference.
Benefits
NVIDIA Merlin is a scalable, GPU-accelerated solution that makes it simple to build recommender systems from end to end. With NVIDIA Merlin, you can do the following:
- Transform data (ETL) for preprocessing and feature engineering.
- Accelerate existing training pipelines in TensorFlow, PyTorch, or fast.ai using efficient, custom-built data loaders.
- Scale huge deep learning recommender models by distributing embedding tables that exceed available GPU and CPU memory.
- Deploy data transformations and trained models to production with only a few lines of code.
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, developed by NVIDIA. It is the component of the Merlin ecosystem responsible for ETL and is used to manipulate the terabyte-scale datasets required to train deep-learning-based recommender systems quickly and easily. NVTabular provides a high-level abstraction API that can be used to build complex data transformation workflows, and depending on the workload it can deliver transformation speedups of 100 to 1,000 times over transformations performed on optimised CPU clusters. You can do the following using NVTabular (a code sketch follows the list):
- Prepare datasets quickly and easily in order to experiment with and train more models.
- Process datasets that are larger than the available GPU and CPU memory without having to worry about scaling issues.
- By employing abstraction at the operation level, you can concentrate on what to do with the data rather than how to do it.
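As a minimal sketch of this operation-level abstraction, the snippet below chains NVTabular operators into a workflow; the column names (user_id, item_id, price, click) and file paths are hypothetical placeholders, not part of any real dataset:

import nvtabular as nvt
from nvtabular import ops

# Chain operators with >> to declare what should happen to each column group
cat_features = ["user_id", "item_id"] >> ops.Categorify()
cont_features = ["price"] >> ops.Normalize()

# A Workflow computes the required statistics on the GPU and applies the transformations
workflow = nvt.Workflow(cat_features + cont_features + ["click"])

# Larger-than-memory data is processed in chunks automatically
train_dataset = nvt.Dataset("train.parquet")
workflow.fit(train_dataset)
workflow.transform(train_dataset).to_parquet("processed/")

Note that the workflow only describes what to do with each column; NVTabular decides how to execute it, which is exactly the abstraction the list above refers to.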
Training with TensorFlow
The merlin-tensorflow-training container lets users perform ETL with NVTabular and then train a deep-learning-based recommender system model with TensorFlow using the Merlin training framework.
NVTabular, the feature engineering and preprocessing toolkit for tabular data in the Merlin ecosystem, is used to quickly and easily manipulate the terabyte-scale datasets required to train deep-learning-based recommender systems. The API documentation and the GitHub repository both describe the API's most important features in detail.
Training deep learning recommender systems requires loading large amounts of data into memory. NVIDIA Merlin speeds up this training in two ways: through HugeCTR, a dedicated framework written in CUDA C++, and through custom data loaders that accelerate existing TensorFlow training workflows. In this container, the NVTabular data loaders for TensorFlow can be used to speed up deep learning training in existing TensorFlow pipelines, as shown in the sketch below.
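The following sketch shows how the NVTabular data loader can plug into a Keras training loop. It continues the hypothetical columns and paths from the ETL sketch above, and the embedding cardinalities (10000) are placeholders that in practice would come from the statistics computed by the NVTabular workflow:

import tensorflow as tf
from nvtabular.loader.tensorflow import KerasSequenceLoader

# Feed the parquet output of the NVTabular workflow straight to Keras;
# the loader reads and batches data on the GPU in large chunks
train_loader = KerasSequenceLoader(
    "processed/*.parquet",            # hypothetical path from the ETL sketch above
    batch_size=65536,
    label_names=["click"],
    cat_names=["user_id", "item_id"],
    cont_names=["price"],
    shuffle=True,
)

# A toy binary-classification model over the same hypothetical columns
inp = {
    "user_id": tf.keras.Input(name="user_id", shape=(1,), dtype=tf.int64),
    "item_id": tf.keras.Input(name="item_id", shape=(1,), dtype=tf.int64),
    "price": tf.keras.Input(name="price", shape=(1,), dtype=tf.float32),
}
user_emb = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(10000, 16)(inp["user_id"]))
item_emb = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(10000, 16)(inp["item_id"]))
x = tf.keras.layers.Concatenate()([user_emb, item_emb, inp["price"]])
x = tf.keras.layers.Dense(64, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=list(inp.values()), outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# The loader is passed to model.fit like any other Keras data source
model.fit(train_loader, epochs=1)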
The Merlin-TF-Training Container
The following command can be used to run the training container:
docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host nvcr.io/nvidia/merlin/merlin-tensorflow-training:0.6 /bin/bash
Make sure you are running Docker version 19 or above. Enter the run command above and press Enter to start a shell inside the container, where you can launch JupyterLab.
The jupyter-lab server can then be started with:
cd / ; jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token=''
You can now reach the jupyter-lab server from any web browser at localhost:8888 (port 8888 is mapped by the docker run command above). Try out some examples or browse the code base under the /nvtabular/ directory. All of the dependencies, including RAPIDS cuDF and Dask-cuDF, are contained within the container. To get started, run the container shown above and explore the examples it contains.
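Once JupyterLab is up, a quick sanity check in a notebook cell confirms that the GPU stack inside the container is available (a minimal sketch; the exact versions printed depend on the container tag you pulled):

import cudf
import nvtabular as nvt
import tensorflow as tf

print(nvt.__version__)                          # NVTabular version bundled in the container
print(cudf.__version__)                         # RAPIDS cuDF version
print(tf.config.list_physical_devices("GPU"))   # should list at least one GPU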
Conclusion
Merlin enables engineers and researchers to create high-performance recommenders at scale. Its libraries and tools cover typical preprocessing and feature engineering work as well as the training and inference challenges of building deep learning recommenders. The Merlin pipeline's components are all designed to handle terabytes of data, and they are all made available through simple APIs. With Merlin, it is possible to obtain more accurate predictions and higher click-through rates than with previous approaches.