[MOT] mot keypoint unite deploy (PaddlePaddle#3530)
* add mot keypoint unite python infer

* fix mot pose fps
nemonameless authored Jun 29, 2021
1 parent 1cae814 commit 1264fde
Showing 12 changed files with 353 additions and 33 deletions.
12 changes: 12 additions & 0 deletions configs/keypoint/README.md
@@ -86,3 +86,15 @@ python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_38
#joint deployment inference with a keypoint top-down model + detector (joint inference only supports the top-down method)
python deploy/python/keypoint_det_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file=../video/xxx.mp4 --device=gpu
```

**Joint deployment inference with the multi-object tracking model FairMOT:**

```shell
#Export the FairMOT tracking model
python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams

#Joint Python inference with the exported tracking and keypoint models
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
For the tracking model export tutorial, please refer to `configs/mot/README.md`.
11 changes: 9 additions & 2 deletions configs/mot/README.md
@@ -255,11 +255,18 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairm
### 5. Using exported model for python inference

```bash
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --use_gpu=True --save_results
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_results` to save the txt result file, or `--save_images` to save the visualization images.
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.
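For example, a minimal sketch (the video name is a placeholder) that saves both the txt results and the visualization images in a single run:

```bash
# a sketch combining both save flags from the note above (video name is a placeholder)
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file=test_video.mp4 --device=GPU --save_mot_txts --save_images
```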

### 6. Using exported MOT and keypoint models for joint python inference

```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
Keypoint model export tutorial: `configs/keypoint/README.md`.
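As a sketch of that tutorial's export step (the config path and weights URL mirror the FairMOT export command above and are assumptions, not verified against the repository), exporting the HigherHRNet model used here might look like:

```bash
# assumed config/weights paths for higherhrnet_hrnet_w32_512; see configs/keypoint/README.md for the authoritative command
python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams
```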

## Citations
```
12 changes: 10 additions & 2 deletions configs/mot/README_cn.md
@@ -253,10 +253,18 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairm
### 5. Using exported model for python inference

```bash
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --use_gpu=True --save_results
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_results` to save the txt result file, or `--save_images` to save the visualization images.
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.

### 6. Using exported MOT and keypoint models for joint python inference

```bash
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
For the keypoint model export tutorial, please refer to `configs/keypoint/README.md`.

## 引用
```
4 changes: 2 additions & 2 deletions configs/mot/fairmot/README.md
@@ -86,10 +86,10 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairm
### 5. Using exported model for python inference

```bash
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --use_gpu=True --save_results
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_results` to save the txt result file, or `--save_images` to save the visualization images.
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.


## Citations
4 changes: 2 additions & 2 deletions configs/mot/fairmot/README_cn.md
@@ -84,10 +84,10 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairm
### 5. Using exported model for python inference

```bash
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --use_gpu=True --save_results
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_results` to save the txt result file, or `--save_images` to save the visualization images.
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.

## 引用
```
4 changes: 2 additions & 2 deletions configs/mot/jde/README.md
@@ -92,10 +92,10 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darkn
### 5. Using exported model for python inference

```bash
python deploy/python/mot_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --use_gpu=True --save_results
python deploy/python/mot_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_results` to save the txt result file, or `--save_images` to save the visualization images.
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.


## Citations
4 changes: 2 additions & 2 deletions configs/mot/jde/README_cn.md
@@ -93,10 +93,10 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darkn
### 5. Using exported model for python inference

```bash
python deploy/python/mot_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --use_gpu=True --save_results
python deploy/python/mot_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_results` to save the txt result file, or `--save_images` to save the visualization images.
The tracking model predicts on videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.

## 引用
```
39 changes: 21 additions & 18 deletions deploy/python/mot_infer.py
@@ -43,9 +43,8 @@ class MOT_Detector(object):
Args:
pred_config (object): config of model, defined by `Config(model_dir)`
model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
use_gpu (bool): whether use gpu
device (str): device to run on, one of CPU/GPU/XPU; default is CPU
run_mode (str): mode of running(fluid/trt_fp32/trt_fp16)
batch_size (int): size of pre batch in inference
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
@@ -58,9 +57,8 @@ class MOT_Detector(object):
def __init__(self,
pred_config,
model_dir,
use_gpu=False,
device='CPU',
run_mode='fluid',
batch_size=1,
trt_min_shape=1,
trt_max_shape=1088,
trt_opt_shape=608,
@@ -71,9 +69,8 @@ def __init__(self,
self.predictor, self.config = load_predictor(
model_dir,
run_mode=run_mode,
batch_size=batch_size,
min_subgraph_size=self.pred_config.min_subgraph_size,
use_gpu=use_gpu,
device=device,
use_dynamic_shape=self.pred_config.use_dynamic_shape,
trt_min_shape=trt_min_shape,
trt_max_shape=trt_max_shape,
@@ -83,6 +80,7 @@ def __init__(self,
enable_mkldnn=enable_mkldnn)
self.det_times = Timer()
self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0

self.tracker = JDETracker()

def preprocess(self, im):
@@ -208,7 +206,7 @@ def print_config(self):
def load_predictor(model_dir,
run_mode='fluid',
batch_size=1,
use_gpu=False,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
@@ -222,7 +220,7 @@ def load_predictor(model_dir,
model_dir (str): root path of __model__ and __params__
run_mode (str): mode of running(fluid/trt_fp32/trt_fp16/trt_int8)
batch_size (int): size of per batch in inference
use_gpu (bool): whether use gpu
device (str): device to run on, one of CPU/GPU/XPU; default is CPU
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
@@ -236,23 +234,20 @@ def load_predictor(model_dir,
Raises:
ValueError: predict by TensorRT needs device == 'GPU'.
"""
if not use_gpu and not run_mode == 'fluid':
if device != 'GPU' and run_mode != 'fluid':
raise ValueError(
"Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
.format(run_mode, use_gpu))
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if use_gpu:
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
@@ -267,6 +262,11 @@ def load_predictor(model_dir,
)
pass

precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=1 << 10,
@@ -391,7 +391,7 @@ def main():
detector = MOT_Detector(
pred_config,
FLAGS.model_dir,
use_gpu=FLAGS.use_gpu,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
@@ -412,5 +412,8 @@ def main():
parser = argsparser()
FLAGS = parser.parse_args()
print_arguments(FLAGS)
FLAGS.device = FLAGS.device.upper()
assert FLAGS.device in ['CPU', 'GPU', 'XPU'
], "device should be CPU, GPU or XPU"

main()
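Given the `device`/`run_mode` handling above, two usage sketches (the flags come from this script's argparser, but the value syntax shown is an assumption, not verified):

```bash
# CPU inference with MKLDNN and multiple threads (the script upper-cases --device, so cpu/CPU both work)
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file=test_video.mp4 --device=CPU --enable_mkldnn=True --cpu_threads=4

# TensorRT FP16 inference; any run_mode other than 'fluid' requires device=GPU, per the check in load_predictor
python deploy/python/mot_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file=test_video.mp4 --device=GPU --run_mode=trt_fp16
```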
164 changes: 164 additions & 0 deletions deploy/python/mot_keypoint_unite_infer.py
@@ -0,0 +1,164 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import cv2
import math
import numpy as np
import paddle

from mot_keypoint_unite_utils import argsparser
from keypoint_infer import KeyPoint_Detector, PredictConfig_KeyPoint
from keypoint_visualize import draw_pose
from benchmark_utils import PaddleInferBenchmark
from utils import Timer

from tracker import JDETracker
from mot_preprocess import LetterBoxResize
from mot_infer import MOT_Detector, PredictConfig_MOT, write_mot_results
from infer import print_arguments
from ppdet.modeling.mot import visualization as mot_vis
from ppdet.modeling.mot.utils import Timer as FPSTimer


def mot_keypoint_unite_predict_video(mot_model, keypoint_model, camera_id):
if camera_id != -1:
capture = cv2.VideoCapture(camera_id)
video_name = 'output.mp4'
else:
capture = cv2.VideoCapture(FLAGS.video_file)
video_name = os.path.split(FLAGS.video_file)[-1]
fps = 30
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
print('frame_count', frame_count)
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
# yapf: disable
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# yapf: enable
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
out_path = os.path.join(FLAGS.output_dir, video_name)
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
frame_id = 0
timer_mot = FPSTimer()
timer_kp = FPSTimer()
timer_mot_kp = FPSTimer()
mot_results = []
while (1):
ret, frame = capture.read()
if not ret:
break
timer_mot_kp.tic()
timer_mot.tic()
online_tlwhs, online_scores, online_ids = mot_model.predict(
frame, FLAGS.mot_threshold)
timer_mot.toc()

mot_results.append(
(frame_id + 1, online_tlwhs, online_scores, online_ids))
mot_fps = 1. / timer_mot.average_time

timer_kp.tic()
keypoint_results = keypoint_model.predict([frame],
FLAGS.keypoint_threshold)
timer_kp.toc()
timer_mot_kp.toc()
kp_fps = 1. / timer_kp.average_time
mot_kp_fps = 1. / timer_mot_kp.average_time

im = draw_pose(
frame,
keypoint_results,
visual_thread=FLAGS.keypoint_threshold,
returnimg=True)

online_im = mot_vis.plot_tracking(
im,
online_tlwhs,
online_ids,
online_scores,
frame_id=frame_id,
fps=mot_kp_fps)

im = np.array(online_im)

frame_id += 1
print('detect frame:%d' % (frame_id))

if FLAGS.save_images:
save_dir = os.path.join(FLAGS.output_dir, video_name.split('.')[-2])
if not os.path.exists(save_dir):
os.makedirs(save_dir)
cv2.imwrite(
os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)

writer.write(im)
if camera_id != -1:
cv2.imshow('Tracking and keypoint results', im)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
if FLAGS.save_mot_txts:
result_filename = os.path.join(FLAGS.output_dir,
video_name.split('.')[-2] + '.txt')
write_mot_results(result_filename, mot_results)
writer.release()


def main():
pred_config = PredictConfig_MOT(FLAGS.mot_model_dir)
mot_model = MOT_Detector(
pred_config,
FLAGS.mot_model_dir,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
trt_opt_shape=FLAGS.trt_opt_shape,
trt_calib_mode=FLAGS.trt_calib_mode,
cpu_threads=FLAGS.cpu_threads,
enable_mkldnn=FLAGS.enable_mkldnn)

pred_config = PredictConfig_KeyPoint(FLAGS.keypoint_model_dir)
keypoint_model = KeyPoint_Detector(
pred_config,
FLAGS.keypoint_model_dir,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
trt_opt_shape=FLAGS.trt_opt_shape,
trt_calib_mode=FLAGS.trt_calib_mode,
cpu_threads=FLAGS.cpu_threads,
enable_mkldnn=FLAGS.enable_mkldnn,
use_dark=FLAGS.use_dark)

# predict from video file or camera video stream
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
mot_keypoint_unite_predict_video(mot_model, keypoint_model,
FLAGS.camera_id)
else:
print('Do not support unite predict single image.')


if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
print_arguments(FLAGS)
FLAGS.device = FLAGS.device.upper()
assert FLAGS.device in ['CPU', 'GPU', 'XPU'
], "device should be CPU, GPU or XPU"

main()
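Besides `--video_file`, the `camera_id` branch in `mot_keypoint_unite_predict_video` supports a live stream. A webcam sketch (camera index 0 is an assumption; detection thresholds fall back to the script's `--mot_threshold`/`--keypoint_threshold` defaults):

```bash
# run the joint MOT + keypoint pipeline on a webcam; press 'q' in the display window to stop
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --camera_id=0 --device=GPU
```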