YOLOv8 training loss is 0 #164

Closed
3 tasks done
kaixin-bai opened this issue Jul 7, 2023 · 4 comments
kaixin-bai commented Jul 7, 2023

Search before asking

  • I have searched the issues and found no similar bug report.

Bug Component

Training

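Describe the Bug

When training YOLOv8 on a custom dataset in PaddleYOLO, the loss is reported as all zeros, and running inference with the saved checkpoint produces no detections. The same dataset trains and runs inference without problems using YOLOv3 in PaddleDetection.

Training command: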

python3 tools/train.py -c ./configs/yolov8/yolov8_l_500e_domainR_RGB_coco.yml --use_vdl=true --vdl_log_dir=vdl_dir/yolov8_l_500e_domainR_RGB_coco/scalar

Inference command:

python3 tools/infer.py -c ./configs/yolov8/yolov8_l_500e_domainR_RGB_coco.yml --output_dir=./ --draw_threshold=0.0 --infer_img=./ceshi/input/rgb_0900.png -o weights=./output/yolov8_l_500e_domainR_RGB_coco/9.pdparams

Config file:

_BASE_: [
  '../datasets/coco_detection_domainR_RGB.yml',
  '../runtime.yml',
  '_base_/optimizer_500e_high.yml',
  '_base_/yolov8_cspdarknet.yml',
  '_base_/yolov8_reader_high_aug.yml',
]
depth_mult: 1.0
width_mult: 1.0

log_iter: 50
snapshot_epoch: 10
weights: output/yolov8_l_500e_domainR_RGB_coco/model_final


YOLOv8CSPDarkNet:
  last_stage_ch: 512 # The actual channel is int(512 * width_mult), not int(1024 * width_mult) as in YOLOv5


TrainReader:
  batch_size: 8  # 16 # default 8 gpus, total bs = 128

Training log:

(paddle) kb@ar-gpu01:/data-r10/kb/Projects/SynDataGen/PaddleYOLO$ python3 tools/train.py -c ./configs/yolov8/yolov8_l_500e_domainR_RGB_coco.yml --use_vdl=true --vdl_log_dir=vdl_dir/yolov8_l_500e_domainR_RGB_coco/scalar
/data-r10/kb/anaconda3/envs/paddle/lib/python3.8/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/data-r10/kb/anaconda3/envs/paddle/lib/python3.8/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/data-r10/kb/anaconda3/envs/paddle/lib/python3.8/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
loading annotations into memory...
Done (t=3.93s)
creating index...
index created!
[07/07 16:50:19] ppdet.data.source.coco INFO: Load [900 samples valid, 0 samples invalid] in file /data-r10/kb/Projects/SynDataGen/datasets/domain_randomization_rgb_mps1/train.json.
W0707 16:50:20.097818 25033 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0707 16:50:20.101763 25033 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[07/07 16:50:26] ppdet.engine INFO: Epoch: [0] [  0/113] eta: 2 days, 16:35:47 lr: 0.000000 loss: 478.704224 loss_cls: 478.704224 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 4.1159 data_cost: 1.6733 ips: 1.9437 images/s
[07/07 16:51:08] ppdet.engine INFO: Epoch: [0] [ 50/113] eta: 13:29:23 lr: 0.000218 loss: 247.558243 loss_cls: 247.558243 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.7952 data_cost: 0.2914 ips: 10.0607 images/s
[07/07 16:51:53] ppdet.engine INFO: Epoch: [0] [100/113] eta: 13:30:54 lr: 0.000870 loss: 1.999204 loss_cls: 1.999204 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8651 data_cost: 0.3560 ips: 9.2475 images/s
[07/07 16:52:07] ppdet.engine INFO: Epoch: [1] [  0/113] eta: 13:46:36 lr: 0.001111 loss: 1.268529 loss_cls: 1.268529 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.9100 data_cost: 0.3909 ips: 8.7912 images/s
[07/07 16:52:51] ppdet.engine INFO: Epoch: [1] [ 50/113] eta: 13:34:34 lr: 0.002312 loss: 0.301968 loss_cls: 0.301968 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8401 data_cost: 0.3263 ips: 9.5229 images/s
[07/07 16:53:35] ppdet.engine INFO: Epoch: [1] [100/113] eta: 13:31:01 lr: 0.003948 loss: 0.102459 loss_cls: 0.102459 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8547 data_cost: 0.3380 ips: 9.3601 images/s
[07/07 16:53:49] ppdet.engine INFO: Epoch: [2] [  0/113] eta: 13:39:27 lr: 0.004444 loss: 0.079679 loss_cls: 0.079679 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.9041 data_cost: 0.3965 ips: 8.8488 images/s
[07/07 16:54:32] ppdet.engine INFO: Epoch: [2] [ 50/113] eta: 13:30:48 lr: 0.006629 loss: 0.042159 loss_cls: 0.042159 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8269 data_cost: 0.3053 ips: 9.6741 images/s
[07/07 16:55:17] ppdet.engine INFO: Epoch: [2] [100/113] eta: 13:29:32 lr: 0.009248 loss: 0.012407 loss_cls: 0.012407 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8614 data_cost: 0.3499 ips: 9.2871 images/s
[07/07 16:55:31] ppdet.engine INFO: Epoch: [3] [  0/113] eta: 13:34:04 lr: 0.010000 loss: 0.011372 loss_cls: 0.011372 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.9075 data_cost: 0.3917 ips: 8.8153 images/s
[07/07 16:56:14] ppdet.engine INFO: Epoch: [3] [ 50/113] eta: 13:26:51 lr: 0.009946 loss: 0.004152 loss_cls: 0.004152 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8156 data_cost: 0.3029 ips: 9.8085 images/s
[07/07 16:56:58] ppdet.engine INFO: Epoch: [3] [100/113] eta: 13:25:00 lr: 0.009946 loss: 0.001564 loss_cls: 0.001564 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8520 data_cost: 0.3256 ips: 9.3893 images/s
[07/07 16:57:12] ppdet.engine INFO: Epoch: [4] [  0/113] eta: 13:27:49 lr: 0.009946 loss: 0.001540 loss_cls: 0.001540 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8985 data_cost: 0.3776 ips: 8.9034 images/s
[07/07 16:57:55] ppdet.engine INFO: Epoch: [4] [ 50/113] eta: 13:24:32 lr: 0.009946 loss: 0.001526 loss_cls: 0.001526 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8372 data_cost: 0.3240 ips: 9.5555 images/s
[07/07 16:58:40] ppdet.engine INFO: Epoch: [4] [100/113] eta: 13:22:30 lr: 0.009946 loss: 0.001526 loss_cls: 0.001526 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8464 data_cost: 0.3216 ips: 9.4520 images/s
[07/07 16:58:55] ppdet.engine INFO: Epoch: [5] [  0/113] eta: 13:27:25 lr: 0.009928 loss: 0.001525 loss_cls: 0.001525 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.9087 data_cost: 0.3855 ips: 8.8035 images/s
[07/07 16:59:38] ppdet.engine INFO: Epoch: [5] [ 50/113] eta: 13:24:35 lr: 0.009928 loss: 0.001523 loss_cls: 0.001523 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8382 data_cost: 0.3152 ips: 9.5445 images/s
[07/07 17:00:23] ppdet.engine INFO: Epoch: [5] [100/113] eta: 13:23:55 lr: 0.009928 loss: 0.001500 loss_cls: 0.001500 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8646 data_cost: 0.3505 ips: 9.2525 images/s
[07/07 17:00:36] ppdet.engine INFO: Epoch: [6] [  0/113] eta: 13:25:56 lr: 0.009910 loss: 0.001483 loss_cls: 0.001483 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.9161 data_cost: 0.4078 ips: 8.7323 images/s
[07/07 17:01:20] ppdet.engine INFO: Epoch: [6] [ 50/113] eta: 13:23:23 lr: 0.009910 loss: 0.000978 loss_cls: 0.000978 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8377 data_cost: 0.3267 ips: 9.5498 images/s
[07/07 17:02:03] ppdet.engine INFO: Epoch: [6] [100/113] eta: 13:20:28 lr: 0.009910 loss: 0.000144 loss_cls: 0.000144 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8274 data_cost: 0.3139 ips: 9.6688 images/s
[07/07 17:02:17] ppdet.engine INFO: Epoch: [7] [  0/113] eta: 13:22:47 lr: 0.009892 loss: 0.000080 loss_cls: 0.000080 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8882 data_cost: 0.3785 ips: 9.0067 images/s
[07/07 17:03:01] ppdet.engine INFO: Epoch: [7] [ 50/113] eta: 13:20:49 lr: 0.009892 loss: 0.000017 loss_cls: 0.000017 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8421 data_cost: 0.3204 ips: 9.5006 images/s
[07/07 17:03:46] ppdet.engine INFO: Epoch: [7] [100/113] eta: 13:20:00 lr: 0.009892 loss: 0.000007 loss_cls: 0.000007 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8613 data_cost: 0.3310 ips: 9.2879 images/s
[07/07 17:04:00] ppdet.engine INFO: Epoch: [8] [  0/113] eta: 13:21:21 lr: 0.009874 loss: 0.000005 loss_cls: 0.000005 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.9011 data_cost: 0.3771 ips: 8.8778 images/s
[07/07 17:04:42] ppdet.engine INFO: Epoch: [8] [ 50/113] eta: 13:18:15 lr: 0.009874 loss: 0.000000 loss_cls: 0.000000 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8158 data_cost: 0.3034 ips: 9.8058 images/s
[07/07 17:05:27] ppdet.engine INFO: Epoch: [8] [100/113] eta: 13:16:58 lr: 0.009874 loss: 0.000000 loss_cls: 0.000000 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8500 data_cost: 0.3314 ips: 9.4116 images/s
[07/07 17:05:41] ppdet.engine INFO: Epoch: [9] [  0/113] eta: 13:18:17 lr: 0.009856 loss: 0.000000 loss_cls: 0.000000 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8904 data_cost: 0.3790 ips: 8.9844 images/s
[07/07 17:06:24] ppdet.engine INFO: Epoch: [9] [ 50/113] eta: 13:16:09 lr: 0.009856 loss: 0.000000 loss_cls: 0.000000 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8305 data_cost: 0.3138 ips: 9.6330 images/s
[07/07 17:07:09] ppdet.engine INFO: Epoch: [9] [100/113] eta: 13:14:56 lr: 0.009856 loss: 0.000000 loss_cls: 0.000000 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000 batch_cost: 0.8496 data_cost: 0.3301 ips: 9.4162 images/s
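
The pattern in the log above (loss_cls decaying to 0 while the other terms are 0 from the first iteration) can be picked out automatically. The following is a minimal sketch, not part of PaddleYOLO: the helper name `zero_loss_terms` and the regular expression are assumptions based only on the log format shown here, and the sample lines are abbreviated copies of two lines from the log.

```python
import re

# Matches "name: value" pairs such as "loss_iou: 0.000000" in a ppdet.engine log line.
LOSS_FIELD = re.compile(r"\b(loss\w*): ([0-9.]+)")


def zero_loss_terms(log_lines):
    """Return the loss terms that are exactly 0 on every logged iteration."""
    always_zero = None
    for line in log_lines:
        fields = {name: float(value) for name, value in LOSS_FIELD.findall(line)}
        if not fields:
            continue  # not a training-iteration line (warnings, dataset loading, ...)
        zeros = {name for name, value in fields.items() if value == 0.0}
        # Keep only the terms that have been zero on all lines seen so far.
        always_zero = zeros if always_zero is None else always_zero & zeros
    return sorted(always_zero or [])


# Abbreviated copies of two iteration lines from the log above.
sample = [
    "ppdet.engine INFO: Epoch: [0] [  0/113] lr: 0.000000 loss: 478.704224 "
    "loss_cls: 478.704224 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000",
    "ppdet.engine INFO: Epoch: [0] [100/113] lr: 0.000870 loss: 1.999204 "
    "loss_cls: 1.999204 loss_iou: 0.000000 loss_dfl: 0.000000 loss_l1: 0.000000",
]

print(zero_loss_terms(sample))  # ['loss_dfl', 'loss_iou', 'loss_l1']
```

Box-regression terms that are zero from iteration 0 only show that no box loss was computed; the log by itself does not say why.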

Environment

cudatoolkit 11.2.2 hbe64b41_10 conda-forge
cudnn 8.2.1.32 h86fa8c9_0 conda-forge
paddlepaddle-gpu 2.4.2.post112 pypi_0 pypi

Bug description confirmation

  • I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

  • I'd like to help by submitting a PR!

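@kaixin-bai (Author)

YOLOv7 trains without problems and its loss looks normal. What do I need to change in the config file? For YOLOv8 training I set share_memory to False in the reader file, because it raises an error otherwise. I made no other changes.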

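@nemonameless (Collaborator)

I suggest checking the dataset again: in theory, loss_dfl and loss_iou should not be 0 at iteration 0.
In addition, the lr and batch size in the training config are unreasonable, and when training on a custom dataset it is best to start from COCO-pretrained weights.
Add this line to the config file: pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov8_l_500e_coco.pdparams
or append this to the command: -o pretrain_weights=https://paddledet.bj.bcebos.com/models/yolov8_l_500e_coco.pdparams
Also reduce the lr by a factor of 10.
#43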
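
One way to act on the "check the dataset" suggestion is to scan the COCO-format annotation file for boxes that cannot produce a regression target. The sketch below is an illustration only: the function names and the set of checks are assumptions, it uses only the standard `images` / `annotations` / `categories` keys of a COCO detection file with `bbox` as `[x, y, width, height]`, and nothing in this thread establishes which problem, if any, is behind the zero loss.

```python
import json


def check_coco_boxes(coco):
    """Count common annotation problems in a COCO-style detection dict."""
    image_sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    problems = {
        "unknown_image": 0,     # annotation points at an image id that does not exist
        "unknown_category": 0,  # category id missing from the "categories" list
        "degenerate_box": 0,    # width or height <= 0
        "out_of_bounds": 0,     # box extends past the image border
    }
    annotated_images = set()
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_sizes:
            problems["unknown_image"] += 1
            continue
        annotated_images.add(ann["image_id"])
        if ann["category_id"] not in category_ids:
            problems["unknown_category"] += 1
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems["degenerate_box"] += 1
            continue
        img_w, img_h = image_sizes[ann["image_id"]]
        if x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            problems["out_of_bounds"] += 1
    problems["images_without_boxes"] = len(image_sizes) - len(annotated_images)
    return problems


def check_coco_file(path):
    """Load an annotation file such as train.json and run the checks on it."""
    with open(path, "r", encoding="utf-8") as f:
        return check_coco_boxes(json.load(f))


# Tiny hand-made example, not taken from the reporter's dataset.
toy = {
    "images": [{"id": 1, "width": 640, "height": 480},
               {"id": 2, "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "part"}],
    "annotations": [
        {"image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},    # fine
        {"image_id": 1, "category_id": 1, "bbox": [10, 20, 0, 50]},      # zero width
        {"image_id": 1, "category_id": 7, "bbox": [600, 400, 100, 100]}, # bad id, overflows
        {"image_id": 3, "category_id": 1, "bbox": [0, 0, 10, 10]},       # no such image
    ],
}
print(check_coco_boxes(toy))
```

On a real dataset, `check_coco_file` would be called with the path of the training annotation file; non-zero counts are a hint about where to look rather than a diagnosis.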

54wb commented Sep 7, 2023

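Hi, did you manage to solve this? I am running into exactly the same problem. I set share_memory to False in the reader file, I loaded the pretrained model, and my learning rate setting is normal.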

54wb commented Sep 7, 2023

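I suggest checking the dataset again: in theory, loss_dfl and loss_iou should not be 0 at iteration 0. In addition, the lr and batch size in the training config are unreasonable, and when training on a custom dataset it is best to start from COCO-pretrained weights. Add this line to the config file: pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov8_l_500e_coco.pdparams or append this to the command: -o pretrain_weights=https://paddledet.bj.bcebos.com/models/yolov8_l_500e_coco.pdparams Also reduce the lr by a factor of 10. #43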

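My situation is the same as above, except that the loss dropped to 0 partway through training. I loaded the pretrained model, with bs 16 and lr set to 0.00125.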
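
For reference, the bs 16 / lr 0.00125 pair is what the commonly used linear scaling rule gives when starting from the original config. The sketch below works through that arithmetic under stated assumptions: the base lr of 0.01 is read from the peak lr in the training log of the original report, the base total batch size of 128 comes from the comment in its config file, and a single GPU is assumed. The function name `scaled_lr` is hypothetical, and the thread does not confirm that this rule is what the collaborator meant by an unreasonable lr and bs.

```python
def scaled_lr(base_lr, base_total_batch, gpus, batch_per_gpu):
    """Linear scaling rule: lr proportional to the total batch size actually used."""
    total_batch = gpus * batch_per_gpu
    return base_lr * total_batch / base_total_batch


# Assumed base schedule: lr 0.01 for 8 GPUs x 16 images = 128 images per step.
BASE_LR = 0.01
BASE_TOTAL_BATCH = 128

# Original report: 1 GPU (assumed), batch_size 8, lr left at 0.01.
print(f"{scaled_lr(BASE_LR, BASE_TOTAL_BATCH, gpus=1, batch_per_gpu=8):.6f}")   # 0.000625

# Follow-up comment: 1 GPU (assumed), batch_size 16.
print(f"{scaled_lr(BASE_LR, BASE_TOTAL_BATCH, gpus=1, batch_per_gpu=16):.6f}")  # 0.001250
```

Under these assumptions the original run used an lr 16 times larger than the rule suggests, while the follow-up run matches the rule and still saw the loss reach 0, so the learning rate alone may not explain the behaviour.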
