DKM casts K-means clustering as an attention problem and thereby enables joint optimisation of the DNN parameters and the clustering centroids. Unlike prior approaches, which relied on additional regularisers and parameters, DKM-based compression keeps the original loss function and model architecture fixed.
DNNs, or deep neural networks, have demonstrated extraordinary, often superhuman, performance on many cognitive tasks. An uncompressed, fully trained DNN is ordinarily used for inference on the server side, but on-device inference can greatly enhance the user experience.
On-device inference reduces latency and keeps the user's data on the device. However, many such on-device platforms are battery-powered and hence power-constrained, so a DNN running on them needs to be power-efficient. A stringent power budget also limits the available compute and storage.
What are the solutions for making a DNN power-efficient?
There are two main solutions:
- One solution is to design an extremely efficient DNN architecture that uses power judiciously. One such example is MobileNet (Howard et al.).
- The other solution is to compress the model so that it becomes extremely light without accuracy regression. A compressed model consumes less storage and reduces the bandwidth utilisation of System-on-Chip (SoC) memory, which in turn lowers power consumption and latency. Various DNN compression techniques have been developed for this purpose.
Weight-clustering, based on the popular K-means algorithm, is one such method that has been shown to deliver a high DNN compression ratio. The weights are clustered into a small number of groups, each sharing a single weight value. After clustering, the model stores each weight as an index into a lookup table of centroid values rather than as a floating-point number; depending on the number of clusters, an index needs only 2 bits, 4 bits, and so on.
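As a concrete illustration, here is a minimal sketch of post-training weight-clustering using scikit-learn's KMeans. It is not the DKM algorithm itself; the function names and the 16-cluster setting are illustrative assumptions. It simply shows how a weight matrix can be reduced to small integer indices plus a lookup table of shared values.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(weights, n_clusters=16):
    """Cluster a weight tensor into n_clusters shared values.

    Returns per-weight indices (4 bits suffice for 16 clusters,
    stored here in uint8 for simplicity) and the lookup table of centroids.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    indices = km.labels_.astype(np.uint8).reshape(weights.shape)
    lookup_table = km.cluster_centers_.flatten()  # 16 shared float values
    return indices, lookup_table

def reconstruct(indices, lookup_table):
    """Rebuild an approximate weight tensor from indices + lookup table."""
    return lookup_table[indices]

# Example: compress a random 256x256 weight matrix down to 16 shared values.
w = np.random.randn(256, 256).astype(np.float32)
idx, lut = cluster_weights(w, n_clusters=16)
w_hat = reconstruct(idx, lut)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

In this illustrative setup the 256x256 float32 matrix (about 256 KB) shrinks to 4-bit indices plus a 16-entry table, at the cost of some reconstruction error.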
How can K-means clustering be used to compress a compact DNN architecture?
Combining a compact DNN architecture with weight-clustering can offer an excellent solution for efficient on-device inference. However, existing model compression approaches struggle to compress such already-compact models fully.
When the DNN is already small, like MobileNet, one might presume that the limitation is unavoidable because the model itself has no significant redundancy left to exploit. A more likely explanation is that the limitation comes from how weight-clustering through the K-means algorithm is performed.

In conventional pipelines, neither the weight-cluster assignment nor the centroid update is fully optimised with respect to the target task. The key complexity in using K-means clustering for weight-sharing is that the weights are the observations being clustered, yet the K-means centroids they map to are themselves free to move during training. Ordinary K-means clustering assumes fixed observations, which makes it hard to apply directly inside neural network training.
Differentiable K-means clustering (DKM) addresses this by enabling train-time weight-clustering for model compression in deep learning. Because the clustering is differentiable, it can be optimised jointly with the rest of the network, allowing K-means clustering to serve as a generic neural network layer. This approach achieves state-of-the-art results on both computer vision and natural language processing (NLP) tasks. That is how K-means clustering is used in neural network compression.
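The core idea can be sketched as an attention-style soft assignment between weights and centroids. The snippet below is a minimal, illustrative PyTorch sketch under assumed names and settings (the function name, temperature value, and toy loss are not taken from the paper); it only demonstrates that gradients flow to both the weights and the centroids, which is what makes the clustering step differentiable.

```python
import torch

def dkm_soft_cluster(weights, centroids, temperature=0.1):
    """One differentiable clustering step, in the spirit of DKM.

    weights:   (N, 1) flattened model weights (the "queries")
    centroids: (K, 1) cluster centres (the "keys"/"values")
    The soft assignment is a softmax over negative distances, so
    gradients flow to both the weights and the centroids.
    """
    dist = torch.cdist(weights, centroids)            # (N, K) pairwise distances
    attn = torch.softmax(-dist / temperature, dim=1)  # attention-style soft assignment
    w_soft = attn @ centroids                         # soft-clustered weights for the forward pass
    # Centroid update as an attention-weighted average of the weights
    new_centroids = (attn.t() @ weights) / attn.sum(dim=0, keepdim=True).t()
    return w_soft, new_centroids

# Toy usage: both the weights and the centroids receive gradients from the loss.
w = torch.randn(1024, 1, requires_grad=True)
c = torch.linspace(-1, 1, 16).view(16, 1).requires_grad_()
w_soft, c_new = dkm_soft_cluster(w, c)
loss = (w_soft ** 2).mean()   # placeholder for the real task loss
loss.backward()
print(w.grad.shape, c.grad.shape)
```

Because the soft-clustered weights are used in the forward pass, the task loss shapes both the cluster assignments and the centroid positions during training, rather than clustering being a separate post-training step.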
In this regard, E2E Networks has some exciting solutions for you. In particular, E2E Auto Scale and E2E Linux Smart Dedicated 3rd Generation solutions can help you apply K-means clustering to compress your DNN workloads and optimise their performance.
Reference Links
https://arxiv.org/abs/2108.12659
https://machinelearning.apple.com/research/differentiable-k-means
https://openreview.net/pdf?id=J_F_qqCE3Z5