nvOCDR is a C++ library for optical character detection and recognition, optimized for NVIDIA devices running the NVIDIA software stack. It consumes OCDNet and OCRNet models trained with TAO Toolkit and can be integrated into any application that needs OCR, such as surveillance, traffic monitoring, or other video analytics solutions.
- CUDA 11.4 or above
- TensorRT 8.5 or above (ViT-based models require TensorRT 8.6 or above)
- OpenCV 4.0 or above
- JetPack 5.1 or above on Jetson devices
- Pretrained OCDNet and OCRNet models
We suggest starting from a TensorRT container:
- On X86 platform:
docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/tensorrt:23.11-py3 bash
# install opencv
apt update && apt install -y libopencv-dev
- On Jetson platform:
docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/l4t-tensorrt:r8.5.2.2-devel bash
# install opencv
apt update && apt install -y libopencv-dev
You can then download the pretrained OCDNet and OCRNet models with the following instructions, or train your own models (refer to the TAO Toolkit documentation for how to train OCDNet and OCRNet). When you download the pretrained OCRNet model from NGC, it comes with a vocabulary list named character_list.txt.
- Download the ONNX models of OCDNet and OCRNet:
mkdir onnx_models
cd onnx_models
# Download OCDnet onnx
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocdnet/deployable_v1.0/files?redirect=true&path=dcn_resnet18.onnx' -O dcn_resnet18.onnx
mv dcn_resnet18.onnx ocdnet.onnx
# Download OCRnet onnx
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v1.0/files?redirect=true&path=ocrnet_resnet50.onnx' -O ocrnet_resnet50.onnx
mv ocrnet_resnet50.onnx ocrnet.onnx
# Download OCRnet character_list
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v1.0/files?redirect=true&path=character_list' -O character_list
mv character_list character_list.txt
# # Download command for ViT-based models:
# # Download OCDNet-ViT onnx
# wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocdnet/deployable_v2.0/files?redirect=true&path=ocdnet_fan_tiny_2x_icdar.onnx' -O ocdnet_fan_tiny_2x_icdar.onnx
# # Download OCRNet-ViT onnx
# wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v2.0/files?redirect=true&path=ocrnet-vit.onnx' -O ocrnet-vit.onnx
# # Download OCRnet character_list
# wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v2.0/files?redirect=true&path=character_list' -O character_list
Notes: If you are using TensorRT 8.6 or above, you can skip this step.
OCDNet requires the modulatedDeformConvPlugin to run with TensorRT.
- Get the TensorRT OSS repository:
git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule update --init --recursive
- Compile the TensorRT libnvinfer_plugin.so:
mkdir build && cd build
# On X86 platform
cmake ..
# On Jetson platform
# cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
make nvinfer_plugin -j4
Notes: You can use the helper script to compile TensorRT OSS.
- Copy libnvinfer_plugin.so to the system library path:
# On X86 platform:
cp libnvinfer_plugin.so.8.6.0 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.1
# On Jetson platform:
# cp libnvinfer_plugin.so.8.6.0 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
Finally, generate the TensorRT engines from the trained OCDNet and OCRNet models:
# Generate the OCDNet engine with dynamic batch size (max batch size 4):
/usr/src/tensorrt/bin/trtexec --onnx=./ocdnet.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 --fp16 --saveEngine=./ocdnet.fp16.engine
# Generate the OCRNet engine with dynamic batch size (max batch size 32):
/usr/src/tensorrt/bin/trtexec --onnx=./ocrnet.onnx --minShapes=input:1x1x32x100 --optShapes=input:32x1x32x100 --maxShapes=input:32x1x32x100 --fp16 --saveEngine=./ocrnet.fp16.engine
# #Generate engines for ViT-based models
# /usr/src/tensorrt/bin/trtexec --onnx=./ocdnet_fan_tiny_2x_icdar.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:1x3x736x1280 --fp16 --saveEngine=./ocdnet.fp16.engine
# /usr/src/tensorrt/bin/trtexec --onnx=./ocrnet-vit.onnx --minShapes=input:1x1x64x200 --optShapes=input:32x1x64x200 --maxShapes=input:32x1x64x200 --fp16 --saveEngine=./ocrnet.fp16.engine
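The shapes passed to trtexec above are written as batch x channels x height x width, and the min and max shapes bound the batch size an engine will accept at runtime. The following sketch shows one way an application could parse such a string and check a requested batch size against the range. `parseShape` and `batchFitsProfile` are hypothetical helpers written for this illustration; they are not part of nvOCDR or TensorRT.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses a shape such as "4x3x736x1280"
// into {4, 3, 736, 1280}. Throws std::invalid_argument on bad input.
std::vector<int> parseShape(const std::string& text) {
    std::vector<int> dims;
    std::stringstream ss(text);
    std::string token;
    while (std::getline(ss, token, 'x')) {
        std::size_t used = 0;
        int value = std::stoi(token, &used);  // throws if token is not a number
        if (used != token.size() || value <= 0) {
            throw std::invalid_argument("bad dimension: " + token);
        }
        dims.push_back(value);
    }
    return dims;
}

// Hypothetical helper: true when `batch` lies within the batch range
// spanned by the min and max shapes (the first dimension of each).
bool batchFitsProfile(int batch, const std::string& minShape,
                      const std::string& maxShape) {
    const std::vector<int> lo = parseShape(minShape);
    const std::vector<int> hi = parseShape(maxShape);
    return !lo.empty() && !hi.empty() && batch >= lo[0] && batch <= hi[0];
}
```

For the OCDNet command above, `batchFitsProfile(4, "1x3x736x1280", "4x3x736x1280")` is true and a batch of 5 is rejected. The last three dimensions (3, 736, 1280) are the values the C++ example later assigns to `ocdnet_infer_input_shape`.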
- Clone the repository:
git clone https://github.com/NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution.git
- Compile libnvocdr.so:
cd NVIDIA-Optical-Character-Detection-and-Recognition-Solution
make
export LD_LIBRARY_PATH=$(pwd)
To use nvOCDR in your C++ project, include the nvocdr.h header file and link against the nvOCDR library. Here is an example:
// test.cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <opencv2/opencv.hpp>
#include <cuda.h>
#include <cuda_runtime.h>
#include "nvocdr.h"

int main()
{
    // Init the nvOCDR lib
    // Please pay attention to the following parameters. You may need to change them according to different models.
    nvOCDRParam param;
    param.input_data_format = NHWC;
    param.ocdnet_trt_engine_path = (char *)"./ocdnet.fp16.engine";
    // OCDNet input shape as (channels, height, width)
    param.ocdnet_infer_input_shape[0] = 3;
    param.ocdnet_infer_input_shape[1] = 736;
    param.ocdnet_infer_input_shape[2] = 1280;
    param.ocdnet_binarize_threshold = 0.1;
    param.ocdnet_polygon_threshold = 0.3;
    param.ocdnet_max_candidate = 200;
    param.ocdnet_unclip_ratio = 1.5;
    param.ocrnet_trt_engine_path = (char *)"./ocrnet.fp16.engine";
    param.ocrnet_dict_file = (char *)"./character_list.txt";
    // OCRNet input shape as (channels, height, width)
    param.ocrnet_infer_input_shape[0] = 1;
    param.ocrnet_infer_input_shape[1] = 32;
    param.ocrnet_infer_input_shape[2] = 100;
    // uncomment if you're using attention-based models:
    // param.ocrnet_decode = Attention;
    nvOCDRp nvocdr_ptr = nvOCDR_init(param);

    // Load the input
    const char* img_path = "./test.jpg";
    cv::Mat img = cv::imread(img_path);
    if (img.empty())
    {
        // cv::imread returns an empty Mat when the file cannot be read
        fprintf(stderr, "Failed to read %s\n", img_path);
        nvOCDR_deinit(nvocdr_ptr);
        return 1;
    }
    nvOCDRInput input;
    input.device_type = GPU;
    // Input shape as (batch, height, width, channels), matching NHWC
    input.shape[0] = 1;
    input.shape[1] = img.size().height;
    input.shape[2] = img.size().width;
    input.shape[3] = 3;
    size_t item_size = input.shape[1] * input.shape[2] * input.shape[3] * sizeof(uchar);
    // Copy the image to device memory
    cudaMalloc(&input.mem_ptr, item_size);
    cudaMemcpy(input.mem_ptr, reinterpret_cast<void*>(img.data), item_size, cudaMemcpyHostToDevice);

    // Do inference
    nvOCDROutputMeta output;
    nvOCDR_inference(input, &output, nvocdr_ptr);

    // Print the output: text_ptr is a flat array holding text_cnt[i] entries for image i
    int offset = 0;
    for (int i = 0; i < output.batch_size; i++)
    {
        for (int j = 0; j < output.text_cnt[i]; j++)
        {
            printf("%d : %s, %zu\n", i, output.text_ptr[offset].ch, strlen(output.text_ptr[offset].ch));
            offset += 1;
        }
    }

    // Destroy the resources
    free(output.text_ptr);
    cudaFree(input.mem_ptr);
    nvOCDR_deinit(nvocdr_ptr);
    return 0;
}
You can compile the code with the command:
g++ ./test.cpp -I./include -L./ -I/usr/include/opencv4/ -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -lopencv_core -lopencv_imgcodecs -lnvocdr -o test
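The print loop in test.cpp walks output.text_ptr as a single flat array, with text_cnt[i] entries belonging to image i. When results for one particular image are needed, it can help to compute where each image's entries start. The sketch below does this with a running sum; `textOffsets` is a hypothetical helper, and it takes the counts as a plain `std::vector<int>` so that it does not depend on the nvOCDR headers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given the number of text results per image,
// returns the index in the flat result array where each image's results begin.
std::vector<std::size_t> textOffsets(const std::vector<int>& textCnt) {
    std::vector<std::size_t> offsets;
    offsets.reserve(textCnt.size());
    std::size_t running = 0;
    for (int cnt : textCnt) {
        offsets.push_back(running);               // start index for this image
        running += static_cast<std::size_t>(cnt); // advance past its results
    }
    return offsets;
}
```

With counts copied from output.text_cnt, the results for image i would then occupy indices offsets[i] up to, but not including, offsets[i] + text_cnt[i] of output.text_ptr.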
For more information on how to use nvOCDR in DeepStream, see the documentation.
For more information on how to use nvOCDR in Triton, see the documentation.
The ViT-based OCRNet models released on NGC (deployable 2.0 and deployable 2.1) include an attention module and therefore require the attention decoding method. Enable attention decoding as follows:
- In a C++ application:
nvOCDRParam param;
param.ocrnet_decode = Attention;
- In DeepStream:
customlib-props="ocrnet-decode:Attention"
- In Triton (in models/nvOCDR/spec.json):
"ocrnet_decode": "Attention"
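To see why the decoding method has to match the model, the sketch below contrasts two generic greedy decoders over per-step class indices. It is a textbook illustration only, not nvOCDR's implementation. It assumes that non-attention models are decoded CTC-style, and the blank and end-of-sequence indices passed in are placeholders, not values taken from the released models.

```cpp
#include <vector>

// Generic CTC greedy decoding: merge consecutive repeats, then drop blanks.
// `blank` is the index reserved for the CTC blank symbol (an assumption here).
std::vector<int> ctcGreedyDecode(const std::vector<int>& steps, int blank) {
    std::vector<int> out;
    int prev = blank;
    for (int idx : steps) {
        if (idx != blank && idx != prev) out.push_back(idx);
        prev = idx;
    }
    return out;
}

// Generic attention-style greedy decoding: keep every step, including
// repeats, until the end-of-sequence index `eos` appears.
std::vector<int> attentionGreedyDecode(const std::vector<int>& steps, int eos) {
    std::vector<int> out;
    for (int idx : steps) {
        if (idx == eos) break;
        out.push_back(idx);
    }
    return out;
}
```

The same index sequence produces different text under the two schemes: for the steps {3, 4, 4, 0, 5}, CTC decoding with blank 0 gives {3, 4, 5}, while attention decoding with end-of-sequence 0 gives {3, 4, 4}. This is why a mismatched decode setting yields garbled recognition results.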
For more information about the nvOCDR API, see the API reference.
By cloning or downloading nvOCDR, you agree to the terms of the nvOCDR EULA.