欢迎访问我的网站,希望内容对您有用,感兴趣的可以加入我们的社群。

YOLOv5 的 Android 部署,基于 tflite

YOLO 迷途小书童 2年前 (2022-09-06) 7014次浏览 0个评论

环境

  • window 10 64bit
  • yolov5 v6.2
  • torch1.7.1+cuda101
  • tensorflow-gpu 2.9.1

前言

前文 借助NCNN,在Android上运行YOLOv5目标检测在Android上进行yolov5目标检测,使用torchscript方式,我们分别使用了 ncnntorchscript 这2种方式将 YOLOv5 部署到了 android 手机上。本篇我们将使用另一种方式 tflite 来进行部署,所以,喜欢哪个就用哪个吧。

具体步骤

这里使用最新的 v6.2 版本代码,使用官方的 yolov5s.pt 模型。要使用自己的数据集,需要自己先行训练,训练步骤可以参考文末的链接。

模型转换过程还需要 tensorflow 的环境,安装一下

pip instal tensorflow-gpu 

然后就可以使用 export.py 脚本进行转换了,命令如下

python export.py --weights yolov5s.pt --include tflite --img 416

参数 weights 指定 .pt 模型,--include 参数指定要转换的目标模型,--img 参数指定图片大小

(pytorch1.7) PS D:\Downloads\yolov5-6.2> python export.py --weights yolov5s.pt --include tflite --img 416
export: data=D:\Downloads\yolov5-6.2\data\coco128.yaml, weights=['yolov5s.pt'], imgsz=[416], batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['tflite']
YOLOv5  2022-8-17 Python-3.8.8 torch-1.7.1+cu110 CPU

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

PyTorch: starting from yolov5s.pt with output shape (1, 10647, 85) (14.1 MB)

TensorFlow SavedModel: starting export with tensorflow 2.9.1...

                 from  n    params  module                                  arguments
2022-08-23 15:19:30.085029: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-08-23 15:19:30.091351: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: USERMIC-Q2OSJCT
2022-08-23 15:19:30.092531: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: USERMIC-Q2OSJCT
2022-08-23 15:19:30.094067: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  1    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  1    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512], [416, 416]]
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
==================================================================================================
 input_1 (InputLayer)           [(1, 416, 416, 3)]   0           []

 tf_conv (TFConv)               (1, 208, 208, 32)    3488        ['input_1[0][0]']

 tf_conv_1 (TFConv)             (1, 104, 104, 64)    18496       ['tf_conv[0][0]']

 tfc3 (TFC3)                    (1, 104, 104, 64)    18624       ['tf_conv_1[0][0]']

 tf_conv_7 (TFConv)             (1, 52, 52, 128)     73856       ['tfc3[0][0]']

 tfc3_1 (TFC3)                  (1, 52, 52, 128)     115200      ['tf_conv_7[0][0]']

 tf_conv_15 (TFConv)            (1, 26, 26, 256)     295168      ['tfc3_1[0][0]']

 tfc3_2 (TFC3)                  (1, 26, 26, 256)     623872      ['tf_conv_15[0][0]']

 tf_conv_25 (TFConv)            (1, 13, 13, 512)     1180160     ['tfc3_2[0][0]']

 tfc3_3 (TFC3)                  (1, 13, 13, 512)     1181184     ['tf_conv_25[0][0]']

 tfsppf (TFSPPF)                (1, 13, 13, 512)     656128      ['tfc3_3[0][0]']

 tf_conv_33 (TFConv)            (1, 13, 13, 256)     131328      ['tfsppf[0][0]']

 tf_upsample (TFUpsample)       (1, 26, 26, 256)     0           ['tf_conv_33[0][0]']

 tf_concat (TFConcat)           (1, 26, 26, 512)     0           ['tf_upsample[0][0]',
                                                                  'tfc3_2[0][0]']

 tfc3_4 (TFC3)                  (1, 26, 26, 256)     361216      ['tf_concat[0][0]']

 tf_conv_39 (TFConv)            (1, 26, 26, 128)     32896       ['tfc3_4[0][0]']

 tf_upsample_1 (TFUpsample)     (1, 52, 52, 128)     0           ['tf_conv_39[0][0]']

 tf_concat_1 (TFConcat)         (1, 52, 52, 256)     0           ['tf_upsample_1[0][0]',
                                                                  'tfc3_1[0][0]']

 tfc3_5 (TFC3)                  (1, 52, 52, 128)     90496       ['tf_concat_1[0][0]']

 tf_conv_45 (TFConv)            (1, 26, 26, 128)     147584      ['tfc3_5[0][0]']

 tf_concat_2 (TFConcat)         (1, 26, 26, 256)     0           ['tf_conv_45[0][0]',
                                                                  'tf_conv_39[0][0]']

 tfc3_6 (TFC3)                  (1, 26, 26, 256)     295680      ['tf_concat_2[0][0]']

 tf_conv_51 (TFConv)            (1, 13, 13, 256)     590080      ['tfc3_6[0][0]']

 tf_concat_3 (TFConcat)         (1, 13, 13, 512)     0           ['tf_conv_51[0][0]',
                                                                  'tf_conv_33[0][0]']

 tfc3_7 (TFC3)                  (1, 13, 13, 512)     1181184     ['tf_concat_3[0][0]']

 tf_detect (TFDetect)           ((1, 10647, 85),     229245      ['tfc3_5[0][0]',
                                 [(1, 2704, 3, 85),               'tfc3_6[0][0]',
                                 (1, 676, 3, 85),                 'tfc3_7[0][0]']
                                 (1, 169, 3, 85)])

==================================================================================================
Total params: 7,225,885
Trainable params: 0
Non-trainable params: 7,225,885
__________________________________________________________________________________________________
2022-08-23 15:19:33.627375: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2022-08-23 15:19:33.627658: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
Assets written to: yolov5s_saved_model\assets
TensorFlow SavedModel: export success, saved as yolov5s_saved_model (27.8 MB)

TensorFlow Lite: starting export with tensorflow 2.9.1...
Found untraced functions such as tf_conv_2_layer_call_fn, tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_4_layer_call_fn while saving (showing 5 of 268). These functions will not be directly callable after loading.
Assets written to: C:\Users\ADMINI~1\AppData\Local\Temp\tmp2dfxq2oq\assets
2022-08-23 15:21:33.820763: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-08-23 15:21:33.821578: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-08-23 15:21:33.827268: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: C:\Users\ADMINI~1\AppData\Local\Temp\tmp2dfxq2oq
2022-08-23 15:21:33.933152: I tensorflow/cc/saved_model/reader.cc:81] Reading meta graph with tags { serve }
2022-08-23 15:21:33.934302: I tensorflow/cc/saved_model/reader.cc:122] Reading SavedModel debug info (if present) from: C:\Users\ADMINI~1\AppData\Local\Temp\tmp2dfxq2oq
2022-08-23 15:21:34.244556: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-08-23 15:21:34.301240: I tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
2022-08-23 15:21:35.141922: I tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: C:\Users\ADMINI~1\AppData\Local\Temp\tmp2dfxq2oq
2022-08-23 15:21:35.504643: I tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 1677383 microseconds.
2022-08-23 15:21:36.609861: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:263] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-08-23 15:21:38.044500: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1972] Estimated count of arithmetic ops: 7.666 G  ops, equivalently 3.833 G  MACs

Estimated count of arithmetic ops: 7.666 G  ops, equivalently 3.833 G  MACs
TensorFlow Lite: export success, saved as yolov5s-fp16.tflite (13.9 MB)

Export complete (136.66s)
Results saved to D:\Downloads\yolov5-6.2
Detect:          python detect.py --weights yolov5s-fp16.tflite
Validate:        python val.py --weights yolov5s-fp16.tflite
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s-fp16.tflite')
Visualize:       https://netron.app

可以看到,转换后的模型是 yolov5s-fp16.tflite,同时,在同层目录下,还有个文件夹 yolov5s_saved_model,里面包含了 .pb 文件,就是 protobuf 文件,这里有个细节,就是 .pt 文件是先被转换成 .pb,然后再转换成 .tflite 的。

接着,使用脚本 detect.py 来验证一下,用上面生成的 .tflite 模型

python detect.py --weights yolov5s-fp16.tflite --source data/images/bus.jpg --img 416

检测没有问题

android yolov5 tflite

有了模型文件,接下来就可以来到 android 端了,我把示例代码上传到了 github 上,可以直接 clone

git clone https://github.com/xugaoxiang/yolov5_android_tflite.git

需要替换的是 yolov5_android_tflite/app/src/main/assets 文件夹下的2个文件,class.txtyolov5s-fp16.tflite

编译后安装到手机,就可以开始进行基于摄像头的目标检测了

参考资料

喜欢 (2)

您必须 登录 才能发表评论!