[Object detection] YOLOX: Exceeding YOLO Series in 2021 Review

4 분 소요

YOLOX: Exceeding YOLO Series in 2021 review

title

0. Abstract

YOLO Detector에 anchor-free manner를 적용하고, 다른 detection techniques (decoupled head, leading label assignment strategy SimOTA)을 적용함.
YOLO Nano에 적용하여, 0.91M parameters, 1.08G FLOPs, 25.3AP on COCO.
YOLOV3에 적용하여, 47.3AP 달성
YOLOX-L은 YOLO V4-CSP, VOLOV5-L과 비슷한 Parameter 수를 가지지만, 50.0 AP 달성
Won 1st prize on Autonomous driving workshop at CVPR 2021

1. Introduction

YOLO Series는 해당 시대에 가장 뛰어난 detection technique를 적용해서 좋은 결과를 냈고, 그 중에서 현재까지는 YOLO V5의 성능이 가장 뛰어나다.
그러나 아직 Anchor-free / advanced label assignment strategies, NMS-Free 등은 적용한적이 없다.
Anchor-free를 적용하기에는 YOLO-V3가 가장 적합해서 해당 모델을 Baseline으로 활용
YOLO V5 + CSPNet backbone + PAN head (알기) = 50.0% AP in COCO. 다른 YOLO 보다 뛰어남.
Tiny, nano version에서도 다른 YOLO보다 뛰어남.

2. YOLOX

2.1. YOLOX-DarkNet53
- Implementation details
  - 기존 YOLO V3 훈련 방식과 같음
  - 300 Epochs 중 5 Epoch 마다 warm up in COCO 2017
  - SGD 사용 (Momentum 0.9, weight decay = 0.0005)
  - Learning Rate는 0.01로 설정 / Cosine lr schedule 사용 / lr = lr*BatchSize/64로 설정
  - Batch size 128 / 8-GPU Device에서
  - Input size는 448부터 832까지 32 strides로
- YOLO V3 baseline
  - Backbone = DarkNet53 + SPPNet (알기)
  - EMA weights updating 적용
  - Cosine lr schedule (알기)
  - IoU Loss (For regression), IoU-aware branch. (알기)
  - BCE Loss (For classification score and objectness score)
  - Augmentation = ColorJitter + RandomHorizontalFlip
- Decoupled head
  - Object detection에서 regression과 classification 사이의 conflict는 유명한 문제이다.
  - 따라서 해결책으로, Decoupled head를 사용한다.
  - YOLO Series의 backbone과 feature pyramids는 지속적으로 발전해왔고, 여전히 coupled 된 head를 사용했다.
    - 실험을 통해 Coupled detection head가 performance를 낮추는 것을 확인했다.
    - 1. Decoupled head 사용하니까 속도가 빨라졌다.
    - 1. AP도 좋아졌다.
    - 따라서 figure 2와 같이 decoupled head를 사용했고, 1x1 conv를 통해서 채널 수를 감소시키고, 두개의 3x3 conv를 통해서 branch를 나누었다.
    - Batch sisze 1로 설정하였을 때 V100에서 1.1ms의 속도 향상을 가져왔다.
- Strong data augmentation
  - Mosaic (YOLO V4, V5 적용) and MixUp 적용
  - 마지막 15 Epoch에는 적용하지 않음.
  - Strong data augmentation을 적용하면 ImageNet pre-training이 더 이상 효과적이지 않다는 것을 발견.
  - 따라서 모든 모델을 scratch로 부터 훈련시킴.
  찾아봐야 할 것. IoU Loss / IoU aware branch / SPPNet / Cosine lr / label assignment strategy SimOTA / PAN Head
- Anchor-free
  - YOLO V4, YOLO V5는 Anchor-based network.
  - Anchor-based network의 단점
    1. Training 전에 cluster analysis를 통해서 domain specific한 anchor-set을 결정하는 과정이 필요하다.
    2. Detection head의 complexity가 증가한다. (Processor 간의 전송이 bottle neck)
  - Anchor-free network는 huristic한 tuning / tricks가 필요한 parameter의 수를 낮췄다. (Anchor-clustering, Grid Sensitive)
  - 각 Grid의 좌표 및 width, height만 구하는 식으로 anchor-free를 구현. 즉, object의 center coordinate + Pre-define scale (FPN Level에 따라 정함)로 학습하겠다.
  - 결과적으로 FLOPs를 낮추었고, 42.9 AP를 달성.
- Multi-positives
  - Anchor-free network는 center coordinate만 예측한다.
  - 하지만 기존의 Anchor-based prediction은 class-imbalanced problem을 방지하는 효과가 있다.
  - 따라서 우리는 center의, 3x3 Area만 positive로 할당했다.
- SimOTA
  - Advanced label assignment는 object detection에서 최근 몇 년간 중요하게 다뤄져 왔다.
  - Advanced label assignment의 4개의 Key
    1. Loss / Quality aware
    2. Center prior
    3. Dynamic number of positive anchors for each GT (Top 몇 개만 사용)
    4. Global view
  - 특히, OTA는 label assignment를 global하게 보고, assigning procedure를 Optimal Transport problem으로 보았다. (확인 필요)
  - 그렇지만 전체 Training set에 적용하기엔 Training time이 25%나 증가해서, Dynamic top-k strategy를 취했다. (확인 필요)
  - SimOTA는 각 prediction-gt pair에서 cost와 quality의 matching degree를 계산한다.
  - $c_{ij} = L_{ij}^{cls} + \lambda L_{ij}^{reg}$
  - $\lambda$는 lalancing coefficient, $L_{ij}^{cls}, L_{ij}^{reg}$는 $gt_i$와 $prediction_j$의 regression / classification loss
  - $GT_i$의 TOP K VALUE를 선택해서 Positive로 labeling, 나머지는 negative. (이렇게되면 class imbalanced problem이 발생하지 않나?)
  - SimOTA는 Training time을 줄일 뿐 아니라, Sinkhorn-Knopp algorithm (확인 필요)의 additional solver parameter 문제도 피하게 했다.
  - Figure 2에 SimOTA를 사용한 결과가 제시되어 있다.
- End-to-end YOLO
  - ‘Object detection made simpler by eliminating heuristic nms’를 참조해서 두개의 conv layer를 추가해서, 각각 label을 할당하고, gradient를 stop 했다.(각각 gradient가 전파된다.. 그런말인가?)
  - 결과적으로 end-to-end가 되었지만, performance나 inference speed가 느려졌다. (Post-processing 보다 느리다니)
2.2 Other Backbones
- YOLOX를 DarkNet53 말고 다른 backbone을 통해서도 실험 해 보았다.
- Modified CSPNet in VOLO V5
  - YOLO V5에도 CSPNet, SiLU Activation, PAN head를 적용하였고, 결과는 AP 향상 및 약간의 시간 지연(From the decoupling head).
- Tiny and Nano detectors
  - YOLOX Tiny와 nano(Use depth-wise convolution)는 YOLO V4-Tiny와 NanoDet과 비교하였다.
- Model size and data augmentation
  - 2.1에서는 모든 모델은 같은 learning rate schedule을 따르고, 같은 optimizing parameters를 사용한다고 했지만, model의 size에 따라 data agumentation 또한 변화해야하는 것을 확인했다.
  - YOLOX Nano와 같은 small model에서는 mixup을 빼고, mosaic을 약하게 적용했다.
  - Larger model image를 jitter(?) 한 후에, mixup을 Original paper보다 더 강하게 적용했다.
  - Mixup (with scale jittering) 은 Copy-paste처럼 instance mask가 필요없는 만큼 Copy-paste의 효과적인 대체체가 되었다.

3. Comparison with the SOTA

Scaled-YOLO v4나 Transformer 계열의 논문들은 시간과 resource가 부족해서 탐색은 못했지만, 이미 scope 안에 들어가 있다.

4. 1st Place on Streaming Perception Challenge (WAD at CVPR 2021)

Streaming Perception Challenge는 최근에 제시된 metric인 Streaming Accuracy (Accuracy + latency)로 평가되었다.
Accuracy와 latency의 trade-off를 고려할 때, inference time이 33ms 이하인게 가장 효율이 좋았고, 그걸로 1등을 했다.

5. Conclusion

경험적인 방법들로 YOLO를 업그레이드 한 YOLOX를 제시
Decoupled head, anchor-free, advanced label assigning strategy is applied.
Achieves better trade-off than other models.
YOLO V3를 발전시켜서 47.3 AP를 COCO에서 달성했다.

Twitter Facebook LinkedIn

[Object detection] YOLOX: Exceeding YOLO Series in 2021 Review

0. Abstract

1. Introduction

2. YOLOX

2.1. YOLOX-DarkNet53

Implementation details

YOLO V3 baseline

Decoupled head

Strong data augmentation

Anchor-free

Multi-positives

SimOTA

End-to-end YOLO

2.2 Other Backbones

Modified CSPNet in VOLO V5

Tiny and Nano detectors

Model size and data augmentation

3. Comparison with the SOTA

4. 1st Place on Streaming Perception Challenge (WAD at CVPR 2021)

5. Conclusion

공유하기

댓글남기기

참고

[Concept summary] Contrastive learning

[Concept summary] Maximum likelihood estimation

[Linear algebra] 개념 정리

[Tip] Windows에서 Docker 가지고 놀기