In deep learning, most of our attention typically goes to data preprocessing and model training. In a production machine learning workflow, however, model deployment and inference are just as important, and the business impact of AI depends heavily on the inference infrastructure. Consider a shopping website whose recommendation engine receives thousands of requests per second. The company delivers the customer experience it wants only when its ML systems are robust enough to accept and answer every request quickly and efficiently, which is why such companies maintain state-of-the-art infrastructure for deploying and running their AI systems. The same applies to any business looking to accelerate with AI.
Inference is the step that turns a trained model into actionable insights from raw data. Depending on the application, it can be real-time or offline. Recommendation and fraud detection systems need immediate responses, so they use real-time inference. Systems like predictive maintenance, on the other hand, do not need immediate output; they need to process large amounts of data and maximize throughput, which is where offline (batch) inference fits. Inference environments also vary from model to model and depend on the framework used and the deployment device, so no single, generalized inference platform can be assumed for all models, frameworks, and devices.
NVIDIA Triton Inference Server
The problems discussed above are largely solved by a scalable AI inference platform that handles all frameworks and model types. NVIDIA Triton Inference Server is one such solution: a fast, scalable, open-source inference server that runs in virtually any CPU or GPU environment. It supports a wide range of frameworks, including TensorRT, PyTorch, TensorFlow, ONNX, and Python backends, and it integrates with Kubernetes and MLOps platforms. It maximizes hardware utilization through ensemble models and concurrent model execution.
Additionally, it supports dynamic batching, which lets the server group incoming client requests into larger batches on the fly, improving throughput while keeping latency within configured limits. Triton thus makes standardized inference possible in the cloud, on edge devices, or on any other platform, with any framework.
In this tutorial, we will explore NVIDIA Triton Inference Server on the E2E Cloud platform. E2E Cloud has native integration with Triton for creating and deploying models. We will deploy a YOLOv5 model on the Triton Inference Server and run inference against it.
Prerequisites
For this tutorial, you need an account on E2E Cloud with access to the TIR AI platform, and Git installed on your system. We also need a YOLOv5 model to deploy; you can find it in the official repository. Clone the repository.
NVIDIA Triton supports formats such as ONNX, so we must first convert the model to one of these formats. Let's export a pre-trained YOLOv5 model to ONNX.
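A typical way to do this is to clone the official Ultralytics YOLOv5 repository and install its dependencies (a sketch; adjust paths to your environment):

```bash
# Clone the official YOLOv5 repository and install its requirements
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
```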
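One way to do this is with the export script bundled in the YOLOv5 repository (a sketch, assuming the small yolov5s.pt weights and a 640x640 input; change these to match your model):

```bash
# Export pre-trained YOLOv5s weights to ONNX (run from inside the yolov5/ directory)
python export.py --weights yolov5s.pt --include onnx --imgsz 640
```

Note that Triton generally expects a model repository layout such as a versioned folder containing model.onnx (for example 1/model.onnx) plus an optional config.pbtxt; check the TIR documentation for the exact layout it expects before uploading.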
Creating the Model
The model has to be created, along with its storage, before deployment. Models in TIR are containers for sharing and using model weights; at the backend, the weights and other files are stored in E2E Object Storage (EOS) buckets. Navigate to the model storage tab in the inference section of the TIR platform and click the Create Model button. Under model types, select Triton. After creating the model, you will be able to see its credentials.
We have now created a model with a Triton backend and its object storage. The next step is to upload the weights. You can access E2E Object Storage (EOS) using any S3-compatible CLI or SDK; we recommend the MinIO CLI.
We will be using the MinIO CLI (mc) here. To set it up and upload the model, run the following commands; you can also refer to the documentation for detailed steps.
Here, FOLDER_NAME is the path to the folder containing the ONNX file, and yolo-v5/yolo-v5-36cea6 is the name of the model and bucket storage I just created. Feel free to change the names as required. The model will be uploaded to the bucket storage.
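A minimal sketch of the setup, assuming the mc binary is already installed; the alias name, access key, secret key, and EOS endpoint are placeholders you should replace with the values shown on your model's credentials page:

```bash
# Register the E2E Object Storage endpoint with the MinIO client
# (older mc releases use "mc config host add" instead of "mc alias set")
mc alias set yolo-v5 <EOS_ENDPOINT_URL> <ACCESS_KEY> <SECRET_KEY>

# Copy the exported ONNX model folder into the model's bucket
mc cp --recursive FOLDER_NAME yolo-v5/yolo-v5-36cea6/
```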
Deploying the Endpoint
Before moving to endpoints, create an authorization token in the API Tokens section. To create a model endpoint for our object detection model, go to Model Endpoints and click the Create Endpoint button. Select the GPU configuration your model requires and create the inference endpoint.
You have successfully deployed the YOLOv5 model. Now, let's see how we can get inference results.
Model Inference
We need to install the Triton client libraries for Python. Make sure you install the same version as the Triton release used for the model backend in the cloud; here I am using version 2.31.0.
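For example (the version is pinned to match the server; numpy and opencv-python are added here because the preprocessing step below assumes them):

```bash
pip install "tritonclient[http]==2.31.0" numpy opencv-python
```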
Create a Triton client. This client can be used to send requests to the server and receive responses. The tritonclient.http module is part of the Triton Inference Server client library, which provides a Python API for interacting with Triton servers.
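A minimal sketch, assuming the endpoint URL and the API token from the previous step; the authorization header format shown is an assumption, so check the sample code on your endpoint page for the exact headers TIR expects:

```python
import tritonclient.http as httpclient

# Host portion of your endpoint URL, without the https:// scheme (placeholder)
url = "<your-endpoint-url>"

# Auth headers carrying the API token created earlier (assumed format; verify against TIR docs)
headers = {"Authorization": "Bearer <your-api-token>"}

# Create an HTTP client for the Triton server; ssl=True since the endpoint is served over HTTPS
triton_client = httpclient.InferenceServerClient(url=url, ssl=True)
```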
If you face any SSL issues, you can try creating the client with the following snippet instead.
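One common workaround (a sketch; it disables certificate verification, so use it only for testing) is to pass an unverified SSL context to the client:

```python
import gevent.ssl
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(
    url=url,
    ssl=True,
    # Skip certificate verification; acceptable for quick local tests only
    ssl_context_factory=gevent.ssl._create_unverified_context,
)
```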
Replace the url with the endpoint URL from your model endpoints page.
Let’s test the endpoint using a sample image. Preprocess the image and convert it to a format expected by the YOLO model. For YOLOv5, the input size is typically 640x640. You might need to adjust this based on your model configuration.
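A sketch of the preprocessing with OpenCV, assuming a local file named sample.jpg and a 640x640 model input:

```python
import cv2
import numpy as np

# Load the test image and convert BGR (OpenCV default) to RGB
image = cv2.imread("sample.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Resize to the resolution the model expects
image = cv2.resize(image, (640, 640))

# Scale pixel values to [0, 1], reorder to CHW, and add a batch dimension: (1, 3, 640, 640)
input_data = image.astype(np.float32) / 255.0
input_data = np.transpose(input_data, (2, 0, 1))
input_data = np.expand_dims(input_data, axis=0)
```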
Now set the model name and create an InferInput object. 'input__0' is the name of the input for this YOLOv5 model.
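For example (the model name "yolo-v5" is an assumption based on the model created earlier, and the input name and datatype must match your model's configuration):

```python
model_name = "yolo-v5"  # assumed; use the name of the model you created in TIR

# Describe the input tensor: name, shape, and datatype must match the model configuration
infer_input = httpclient.InferInput("input__0", list(input_data.shape), "FP32")
```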
Set the input data for the InferInput object.
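Continuing the sketch above:

```python
# Attach the preprocessed image to the input object (sent as binary data)
infer_input.set_data_from_numpy(input_data, binary_data=True)
```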
Now, create an InferRequestedOutput object for each output of the model. 'output__0' is the name of the output for the YOLOv5 model.
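For example, again assuming the output name from the text:

```python
# Request the model's detection output by name
infer_output = httpclient.InferRequestedOutput("output__0")
```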
Finally, send the image as an HTTP request to the server and get the response from the model.
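A sketch that ties the pieces together; the headers argument reuses the assumed auth headers created with the client:

```python
# Send the inference request over HTTP and read back the detections as a NumPy array
response = triton_client.infer(
    model_name,
    inputs=[infer_input],
    outputs=[infer_output],
    headers=headers,  # assumed auth header format; check your endpoint's sample code
)

detections = response.as_numpy("output__0")
print(detections.shape)  # raw YOLOv5 detections, ready for post-processing (NMS, scaling, etc.)
```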
Wrapping Up
You have successfully deployed an object detection model on the NVIDIA Triton server using E2E Cloud. We encourage you to experiment with other models too. Ideally, you should use Docker, a popular platform that packages applications into containers, which makes it easier to distribute and deploy applications, including AI models. We hope you enjoyed this tutorial and found it useful for your projects. Thank you for reading, and happy coding!