In deep learning, embeddings represent categorical entities such as movies, words, and apps. The embedding layer maps every entity to its own vector, so the layer's memory requirement scales with the total number of entities.
In the recommendation domain, a single category can contain an enormous number of entities, and the corresponding embedding layer can consume thousands of gigabytes. Because of their sheer size, such networks cannot be deployed in limited-resource environments (for example, smartphones, smartwatches, and tablets).
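To get a feel for the scale involved, here is a back-of-the-envelope estimate in Python. The entity count and embedding dimension are illustrative assumptions, not figures from any particular production system:

```python
# Rough memory estimate for one uncompressed embedding table.
# Both sizes below are assumed purely for illustration.
num_entities = 50_000_000   # entities in a single categorical feature
embedding_dim = 64          # vector length per entity
bytes_per_float = 4         # float32

table_bytes = num_entities * embedding_dim * bytes_per_float
print(f"Uncompressed table: {table_bytes / 1e9:.1f} GB")  # 12.8 GB for one feature
```

A model with dozens of such categorical features multiplies this cost accordingly.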
How compressed embeddings for on-device inference work
Compressed embeddings for on-device inference shrink the embedding table while still mapping every entity to its own embedding. Instead of a single large embedding table, the approach uses two separate tables.
In the first table, multiple entities share an embedding, while the second table holds a trainable weight for each individual entity. This lets the model differentiate between categorical entities even though they share an embedding.
Since the two tables are trained jointly, the network can learn a unique embedding per entity, maintaining a discriminative capability similar to that of a model with an uncompressed embedding table.
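To make the two-table idea concrete, below is a minimal PyTorch sketch. It is only an illustration of the scheme described above, not the authors' exact implementation: the modulo hash used to assign entities to shared embeddings, the scalar form of the per-entity weight, and all of the sizes are assumptions chosen for readability.

```python
import torch
import torch.nn as nn

class CompressedEmbedding(nn.Module):
    """Two-table compressed embedding (illustrative sketch)."""

    def __init__(self, num_entities: int, num_buckets: int, embedding_dim: int):
        super().__init__()
        self.num_buckets = num_buckets
        # Table 1: a small shared table; many entities map to each row.
        self.shared = nn.Embedding(num_buckets, embedding_dim)
        # Table 2: one trainable weight per entity (a single scalar here).
        self.entity_weight = nn.Embedding(num_entities, 1)
        nn.init.ones_(self.entity_weight.weight)

    def forward(self, entity_ids: torch.Tensor) -> torch.Tensor:
        # Entities that hash to the same bucket share a base vector...
        base = self.shared(entity_ids % self.num_buckets)  # (batch, embedding_dim)
        # ...but each entity rescales it with its own trained weight,
        # so the model can still tell them apart.
        return base * self.entity_weight(entity_ids)       # (batch, embedding_dim)

# Usage: 10 million entities compressed into 100,000 shared 64-d vectors.
emb = CompressedEmbedding(num_entities=10_000_000, num_buckets=100_000, embedding_dim=64)
out = emb(torch.tensor([3, 42, 9_000_003]))  # IDs 3 and 9_000_003 share a bucket
print(out.shape)                             # torch.Size([3, 64])
```

With these illustrative sizes, the two tables hold roughly 66 MB of float32 parameters (100,000 × 64 shared values plus 10 million per-entity weights) instead of about 2.6 GB for an uncompressed 10-million-row table.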
Compressed embeddings in modern deep learning
In modern deep learning, some of the most important predictive features are categorical features with very large vocabularies. Natural language processing (NLP) systems depend heavily on inputs such as individual words, characters, and sub-word tokens, which are fed to models directly as categorical features.
Additionally, search and recommender systems have shifted their focus in the following way:
- away from representing inputs with traditional feature-engineering methods
- toward feeding model inputs directly as categorical features
Embeddings have made it practical to work with these categorical features. However, they also bring challenges: as the vocabulary size of each categorical feature and the number of categorical features in a single model grow, the memory footprint grows with them.
This problem is especially common in recommender and search systems. While natural-language vocabularies are already large, the vocabularies of metadata, queries, and documents can easily reach the millions. The resulting embedding matrices become so large that the models struggle to run at all, which severely limits performance on low-resource devices such as tablets and smartphones.
On-Device Inference
Compared with other computational paradigms, on-device inference offers multiple benefits, including improved data privacy, lower latency, and reduced communication bandwidth. On-device inference can be challenging for deep neural networks, however.
Nevertheless, hardware-based, framework-based, and neural-network-architecture-based optimization techniques can help in tackling those challenges.
To Conclude
With compressed embeddings for on-device inference, your deep learning model can work efficiently on device, with faster inference, lower energy usage, and practical mobile deployment.
Reference links:
https://proceedings.mlsys.org/paper/2022/file/812b4ba287f5ee0bc9d43bbf5bbe87fb-Paper.pdf
https://arxiv.org/abs/2203.10135
https://deepai.org/publication/learning-compressed-embeddings-for-on-device-inference