The common story of any lazy network: overfitting
Things become challenging when you want to train a deep neural network that generalizes well to new data. If a model has too little capacity, it won't be able to learn the problem; on the other hand, if it has too much capacity, it can effectively memorize the training data-set, overfitting it. In both cases, the model fails to generalize properly.
The number of weights in a standard convolution layer is input_channels * output_channels * width * height, where width and height are the dimensions of the filter. Suppose you have 15 output channels and 10 input channels with an 8*8 filter; that layer alone has 10 * 15 * 8 * 8 = 9600 parameters. So many parameters can drastically increase the chances of overfitting.
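As a quick sanity check of the count above, here is a minimal sketch of that parameter formula in plain Python (bias terms ignored; the function name is illustrative, not a TensorFlow API):

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w):
    """Weights in a standard convolution layer: one kernel_h x kernel_w
    slice per (input channel, output channel) pair."""
    return in_channels * out_channels * kernel_h * kernel_w

# The example from the text: 10 input channels, 15 output channels, 8*8 filter.
print(conv2d_params(10, 15, 8, 8))  # 9600
```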
Using Depthwise Separable convolutions in TensorFlow, you can substantially decrease the number of parameters in a convolutional network, which in turn helps mitigate overfitting. In this article we will discuss:
- How ordinary convolution works
- What is Depthwise convolution in TensorFlow?
- How Depthwise Separable convolutions work in TensorFlow
How does ordinary convolution work?
Ordinary (standard) convolution is an extremely simple operation. You start with a kernel, which is a small tensor (matrix) of weights. The kernel slides over the input data, performing an element-wise multiplication with the patch it currently covers, and then sums the results into a single output pixel.
The kernel repeats this process at every location it visits, transforming a 3D block of input features into a 2D map of output features per filter. In an ordinary convolution, each output feature is essentially a weighted sum of the input features located in roughly the same position as the output pixel.
Also remember that the region of the input on which the kernel is currently operating is called the local receptive field. In ordinary convolution, each neuron in a layer is connected only to a region of the preceding layer; that region is the local receptive field of that neuron.
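The sliding-window operation described above can be sketched in plain Python (no TensorFlow required), using "valid" padding and stride 1:

```python
def conv2d(image, kernel):
    """Slide kernel over image; multiply element-wise, sum into one pixel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the local receptive field by the kernel.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # adds each pixel to its lower-right neighbour
print(conv2d(image, kernel))  # [[6, 8], [12, 14]]
```

Note how a 3*3 input and a 2*2 kernel yield a 2*2 output (3 - 2 + 1 = 2 in each spatial dimension).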
What is Depthwise convolution in TensorFlow?
Depthwise convolution in TensorFlow is a special type of convolution in which a separate convolution filter is applied to every input channel, so each channel is processed independently. The steps of a depthwise convolution are:
- Split the input and the filter into individual channels
- Convolve each input channel with its corresponding filter
- Stack the convolved outputs back together
In a depthwise convolution over a 3-channel input, you need only one single-channel filter per input channel (three filters in total), which yields three output channels. By contrast, an ordinary convolution producing three output channels would require three full 3-channel filters, i.e. three times as many filter weights.
For a concrete example, let us assume that the size of the input layer is 9*9*3 (height * width * channels) and the filter size is 3*3*3 (the filter depth must match the number of input channels). The size of the output after a 2D convolution with a single filter is 7*7*1 (a single channel, since 9 - 3 + 1 = 7). Normally a layer has multiple filters; let us assume we have 132 of them. After applying these 132 filters, we get 132 output maps, each of size 7*7*1.
Now we can stack these maps into a single output layer of size 7*7*132. In this way, the input layer of 9*9*3 is transformed into an output layer of 7*7*132. As the example shows, this type of convolution extends the depth while reducing the spatial dimensions (width and height).
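The shape arithmetic of the example above is easy to verify in a few lines of plain Python ("valid" padding, stride 1; bias terms ignored):

```python
def conv_output_hw(in_h, in_w, k_h, k_w):
    """Output height/width of a 'valid' convolution with stride 1."""
    return in_h - k_h + 1, in_w - k_w + 1

in_h, in_w, in_c = 9, 9, 3   # input layer: 9*9*3
k = 3                        # 3*3*3 filters
n_filters = 132

out_h, out_w = conv_output_hw(in_h, in_w, k, k)
print((out_h, out_w, n_filters))   # output shape: (7, 7, 132)
print(k * k * in_c * n_filters)    # weights in this layer: 3564
```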
How do Depthwise Separable convolutions work in TensorFlow?
Depthwise Separable convolutions deal not only with the spatial dimensions but also with the depth dimension (the number of channels). The idea behind this convolution is that the spatial and depth dimensions of a filter can be separated: a depthwise convolution handles the spatial dimensions channel by channel, and a 1*1 pointwise convolution then mixes the channels.
The intuition is similar to factoring a matrix: a 4*4 matrix holds 16 values, but if it can be written as the outer product of two 4*1 vectors, you need only 8 values to represent it. The same idea applies when you separate the depth dimension from width * height, which gives us the Depthwise Separable convolution.
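The factorization analogy can be sketched directly: a rank-1 4*4 matrix (16 values) is fully described by two 4-element vectors (8 values):

```python
def outer_product(u, v):
    """Rebuild a rank-1 matrix from two vectors: matrix[i][j] = u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]

u = [1, 2, 3, 4]
v = [1, 0, 2, 1]
matrix = outer_product(u, v)  # 4*4, yet carries no more information than u and v
print(matrix)
# [[1, 0, 2, 1], [2, 0, 4, 2], [3, 0, 6, 3], [4, 0, 8, 4]]
print(len(u) + len(v), "values instead of", len(matrix) * len(matrix[0]))  # 8 instead of 16
```

Of course, not every matrix factors this way; the point is that when a filter approximately separates, you can store and apply the two small factors instead of the full tensor.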
With this, you can see why Depthwise Separable convolutions in TensorFlow can be preferable to ordinary convolutions: they achieve a similar effect with far fewer parameters. Most importantly, that reduced parameter count helps tackle the overfitting issue we started with.
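To make the savings concrete, here is a sketch comparing parameter counts for the running example (9*9*3 input, 3*3 spatial kernel, 132 output channels), assuming the usual depthwise-then-pointwise formulation and ignoring bias terms:

```python
def standard_conv_params(k, in_c, out_c):
    """Standard convolution: one k*k*in_c filter per output channel."""
    return k * k * in_c * out_c

def separable_conv_params(k, in_c, out_c):
    """Depthwise separable: per-channel k*k filters, then a 1*1 pointwise mix."""
    depthwise = k * k * in_c   # one k*k filter per input channel
    pointwise = in_c * out_c   # 1*1 convolution mixing channels
    return depthwise + pointwise

print(standard_conv_params(3, 3, 132))   # 3564
print(separable_conv_params(3, 3, 132))  # 27 + 396 = 423
```

Roughly an 8x reduction here, and the gap widens as the number of channels grows.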