[Object detection] You Only Learn One Representation: Unified Network for Multiple Tasks Review [ENG]

7 min read



0. Abstract

  • People understand the world through vision, hearing, touch, and past experience. This accumulated knowledge lets humans form intuitions even for previously unseen tasks.

  • The proposed unified network, composed of implicit and explicit knowledge, can perform multiple tasks while capturing the physical meaning of its inputs.

1. Introduction

  • Unlike a CNN, a human can draw on implicit knowledge to carry out a wide variety of tasks.

  • In this paper, knowledge that directly corresponds to observation is called “explicit knowledge,” and knowledge that is independent of observation is called “implicit knowledge.”

  • The proposed network unifies implicit and explicit knowledge, enabling the learned model to contain a general representation.

  • The proposed network is constructed by combining compressive sensing and deep learning techniques.

1.1. Contribution

  • The authors propose a unified network that can accomplish various tasks; it effectively improves model performance at very small additional cost.

  • The authors introduce several mechanisms (feature alignment, prediction refinement, and multi-task learning) into the implicit knowledge learning process and verify their effectiveness.

  • The authors discuss several tools for modeling implicit knowledge, including vectors, neural networks, and matrix factorization.

  • The authors confirm that the learned implicit representation can accurately correspond to specific physical characteristics.

  • Combined with state-of-the-art methods, the proposed model achieves accuracy comparable to Scaled-YOLOv4-P7 on object detection while increasing inference speed by 88%.


2. Related work

  • The related work covers explicit deep learning, implicit deep learning, and knowledge modeling.

  • 2.1. Explicit deep learning

    • Transformers and non-local networks are commonly used forms of explicit deep learning.
  • 2.2. Implicit deep learning

    • The main categories of implicit deep learning are neural representations and deep equilibrium models.

    • Neural representations: obtain a parameterized continuous mapping representation of discrete inputs in order to perform different tasks.

    • Deep equilibrium models: transform implicit learning into a residual-form neural network.

  • 2.3. Knowledge modeling

    • The main categories of knowledge modeling are sparse representations and memory networks.

    • Sparse representations: describe a specific thing in a lower-dimensional space than the original.

    • Memory networks: To combine various forms of embedding to form memory.

3. How does implicit knowledge work?

  • This section introduces how implicit knowledge, modeled as a constant tensor, can be applied to various tasks.

  • 3.1. Manifold space reduction

    • A good representation should make it possible to find an appropriate projection in the manifold space.

    • So, as in Figure 3, if that projection separates the target categories well, the inner product between the implicit representation and a projection vector can effectively serve several tasks at once (a toy sketch follows below).

    [Figure 3: manifold space reduction]
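
A minimal sketch of this idea, under my own toy assumptions (the feature size and task names are made up, not from the paper): one shared feature vector, one learned implicit vector per task, and each task score obtained by an inner product.

```python
import torch

# Shared representation f_theta(x) and per-task implicit vectors
# (toy sizes; in the paper these would be learned).
feature = torch.randn(256)
implicit = {task: torch.randn(256) for task in ["pose", "detection", "caption"]}

# Each task is a 1-D projection of the same manifold: an inner product.
scores = {task: torch.dot(feature, z) for task, z in implicit.items()}
print(scores)
# If the categories stay separable after these projections, a single
# representation can serve every task above.
```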

  • 3.2. Kernel space alignment

    • Figure 4-(a) illustrates an example of kernel space misalignment in a multi-task setting, and Figure 4-(b) shows the aligned case.

    • The misalignment can be dealt with by translating, rotating, and scaling the output feature together with the implicit representation (a sketch follows below).

    [Figure 4: kernel space alignment]
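
A toy sketch of alignment by translation and scaling, under my own assumptions (per-channel operations, made-up shapes; the paper's learned transforms may be richer, e.g. including rotation):

```python
import torch

# Output features of one head (batch of 8 vectors, 256 channels).
feature = torch.randn(8, 256)

# Learned implicit representations: a per-channel shift and scale.
z_shift = torch.zeros(256)  # translation of the kernel space
z_scale = torch.ones(256)   # scaling of the kernel space

# Align this head's output feature with the other heads' kernel space.
aligned = feature * z_scale + z_shift
```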

  • 3.3. More functions

    • Implicit knowledge can be extended to many more functions.

    • One example: with addition, a neural network can predict an offset of the center coordinate.

    • With multiplication, it can rescale the anchors, effectively searching the anchor hyperparameters automatically (see the sketch after Figure 5).

    • Figure 5 illustrates examples of the functions that implicit knowledge enables.

    [Figure 5: functions enabled by implicit knowledge]
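
A hypothetical YOLO-style box decode showing where these two functions could act (all names and values here are mine, for illustration only):

```python
import torch

# Raw network outputs for one anchor at one grid cell.
tx, ty, tw, th = torch.randn(4)
cx, cy = 3.0, 5.0                # grid cell position
anchor_w, anchor_h = 32.0, 64.0  # anchor prior

z_xy = torch.zeros(2)  # learned implicit offset (additive)
z_wh = torch.ones(2)   # learned implicit scale (multiplicative)

# Addition shifts the predicted center; multiplication rescales the
# anchor, giving it a larger optimization space (cf. Sec. 5.5).
bx = torch.sigmoid(tx) + cx + z_xy[0]
by = torch.sigmoid(ty) + cy + z_xy[1]
bw = anchor_w * z_wh[0] * torch.exp(tw)
bh = anchor_h * z_wh[1] * torch.exp(th)
```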

4. Implicit knowledge in our unified networks

  • This section compares conventional networks with the proposed unified network and formulates implicit knowledge.

  • 4.1. Formulation of implicit knowledge

    • 4.1.1. Conventional networks

      $$y = f_{\theta}(x) + \epsilon \quad (1)$$

      • $x$ is the observation, $\theta$ is the set of parameters of the neural network, $f_{\theta}$ represents the operation of the neural network, $\epsilon$ is the error term, and $y$ is the target of the given task.

      • In a conventional network, different observations with the same target are projected to the same point, so the obtained representation serves only that single task. Figure 6-(a) illustrates the problem.

      • For a general-purpose neural network, the obtained representation must be able to serve all purposes at once. But that is impossible with a trivial mathematical method (one-hot vectors, a threshold on Euclidean distance, and so on). Figure 6-(b) depicts this impossibility.

      • Figure 6-(c) depicts the solution: by relaxing and modeling the error term $\epsilon$, it becomes possible to find solutions for different tasks.

      [Figure 6: (a) single-task projection, (b) failure of trivial representations for multiple tasks, (c) modeling the error term]

    • 4.1.2. Unified networks

      • The following equation (2) is the unified objective function with implicit and explicit knowledge.

      $$y = f_{\theta}(x) + \epsilon + g_{\phi}\big(\epsilon_{ex}(x), \epsilon_{im}(z)\big), \quad \text{minimize } \epsilon + g_{\phi}\big(\epsilon_{ex}(x), \epsilon_{im}(z)\big) \quad (2)$$

      • $\epsilon_{ex}$ and $\epsilon_{im}$ are the operations that model the explicit error from observation $x$ and the implicit error from latent code $z$ (the feature from the encoder).

      • $g_{\phi}$ is a task-specific operation that serves to combine or select information from explicit knowledge and implicit knowledge.

      $$y = f_{\theta}(x) \star g_{\phi}(z) \quad (3)$$

      • Equation (2) can be rewritten as equation (3). The star $\star$ denotes an operator that can combine $f_{\theta}$ and $g_{\phi}$; in this work, the operators are addition, multiplication, and concatenation (sketched below).
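
A minimal sketch of the three combining operators in equation (3), with made-up tensor shapes (a 256-channel feature map and a per-channel implicit representation):

```python
import torch

f_x = torch.randn(1, 256, 52, 52)  # explicit feature f_theta(x)
g_z = torch.randn(1, 256, 1, 1)    # implicit representation g_phi(z)

y_add = f_x + g_z                  # addition (broadcast per channel)
y_mul = f_x * g_z                  # multiplication (per-channel scale)
y_cat = torch.cat(                 # concatenation (channels 256 -> 512)
    [f_x, g_z.expand(-1, -1, 52, 52)], dim=1
)
```
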
  • 4.2. Modeling implicit knowledge

    • Implicit knowledge can be modeled directly as a vector, matrix, or tensor $z$; a combined sketch of the modeling options follows at the end of this subsection.

    • 4.2.1. Neural Network.

      • Alternatively, with $z$ as the prior of implicit knowledge, a weight matrix $W$ can perform a linear combination or nonlinearization, giving $Wz$ as the implicit representation.

      • The weight matrix can also be replaced by a more complex neural network, or by a Markov chain.

    • 4.2.2. Matrix Factorization

      $$Z^{\mathsf{T}}c$$

      • $Z$ is the implicit prior basis and $c$ is the coefficient vector; together they form the implicit representation $Z^{\mathsf{T}}c$.

      • $c$ can be given a sparsity constraint (sparse representation) or a non-negativity constraint (non-negative matrix factorization, NMF).

      [Figure 7: ways to model implicit knowledge]
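
A combined sketch of the three modeling options, under my own assumptions (layer sizes and the basis size $k$ are made up):

```python
import torch
import torch.nn as nn

C = 256  # channels of the implicit representation (assumed)

# (a) Direct vector/matrix/tensor: z itself is the representation.
z_direct = nn.Parameter(torch.randn(C))

# (b) Neural network: prior z passed through a weight matrix, W z.
z_prior = nn.Parameter(torch.randn(16))
W = nn.Linear(16, C, bias=False)
z_nn = W(z_prior)

# (c) Matrix factorization: basis Z (k vectors) and coefficients c, Z^T c.
k = 16
Z_basis = nn.Parameter(torch.randn(k, C))
c = nn.Parameter(torch.randn(k))
z_mf = Z_basis.t() @ c  # shape (C,)
```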

  • 4.3. Training

    • The model is assumed to have no prior implicit knowledge at the beginning; that is, the implicit part should initially have no effect on the explicit representation $f_{\theta}(x)$.

    • When the combining operator star ∈ {addition, concatenation}, the initial implicit prior z ∼ N(0, σ).

    • When the combining operator star is multiplication, the initial implicit prior z ∼ N(1, σ).

    • Here, σ is a very small value close to zero, so x + z ≈ x and x · z ≈ x at the start of training. Both z and φ are trained with the backpropagation algorithm during the training process (an initialization sketch follows below).
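
An initialization sketch following this description; the exact σ is not specified here, so the value below is an assumption:

```python
import torch

sigma = 0.02  # "very small, close to zero" (assumed value)
C = 256

# Addition / concatenation: start near 0 so f_theta(x) + z ≈ f_theta(x).
z_add_or_cat = torch.normal(0.0, sigma, size=(C,))

# Multiplication: start near 1 so f_theta(x) * z ≈ f_theta(x).
z_mul = torch.normal(1.0, sigma, size=(C,))
```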

  • 4.4. Inference

    • Since implicit knowledge is independent of the observation $x$, $g_{\phi}(z)$ can be computed once and stored as a constant tensor before the inference phase. That constant can then be folded into the weights or bias of an adjacent layer, as in equations (9) and (10), so implicit information adds essentially no computational cost at inference.

    • If the operator is multiplication and the subsequent layer is a convolutional layer, the constant can be folded into the convolution weights as in equation (9).

      $$x_{(l+1)} = \sigma\big(W_{l}\,(g_{\phi}(z)\,x_{l}) + b_{l}\big) = \sigma\big(W_{l}'\,x_{l} + b_{l}\big), \quad W_{l}' = W_{l}\,g_{\phi}(z) \quad (9)$$

    • If the operator is addition and the preceding layer is a convolutional layer with no activation function, the constant can be folded into that layer's bias as in equation (10) (a folding sketch follows below).

      $$x_{(l+1)} = W_{l}\,x_{l} + b_{l} + g_{\phi}(z) = W_{l}\,x_{l} + b_{l}', \quad b_{l}' = b_{l} + g_{\phi}(z) \quad (10)$$
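
A minimal folding sketch for both cases, shown on a single 1×1 convolution for brevity (shapes and values are assumed; in practice each fold applies to its own layer):

```python
import torch
import torch.nn as nn

C = 256
conv = nn.Conv2d(C, C, kernel_size=1, bias=True)

g_z_mul = torch.rand(C)   # constant g_phi(z) combined by multiplication
g_z_add = torch.randn(C)  # constant g_phi(z) combined by addition

with torch.no_grad():
    # Eq. (9): scaling input channel i by g_z_mul[i] equals scaling
    # column i of the conv weight, so W' = W * g_phi(z).
    conv.weight.mul_(g_z_mul.view(1, C, 1, 1))
    # Eq. (10): adding g_phi(z) after a conv with no activation
    # is just a new bias, b' = b + g_phi(z).
    conv.bias.add_(g_z_add)

# After folding, inference runs the plain conv: no extra cost remains.
```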

5. Experiments

  • The experiments are conducted on the MSCOCO dataset, which covers object detection, instance segmentation, and more.

  • 5.1. Experimental setup

    • Implicit knowledge is applied in three ways in these experiments:

    • 1. feature alignment for FPN, 2. prediction refinement, and 3. multi-task learning in a single model.

    • The baseline model is YOLOv4-CSP, and implicit knowledge is introduced at the positions indicated by the arrows in Figure 8. Hyperparameters are kept at the baseline defaults.

      [Figure 8: YOLOv4-CSP with implicit knowledge insertion points]

  • 5.2. Feature alignment for FPN

    • Adding an implicit representation for feature alignment in each FPN layer improves every AP metric. The results are shown in Table 1.

      [Table 1: feature alignment for FPN]

  • 5.3. Prediction refinement for object detection

    • Table 2 shows the results of adding implicit knowledge for prediction refinement.

      [Table 2: prediction refinement for object detection]

    • Since the overall objective function of an object detection model already combines position, objectness, and class information, something like implicit knowledge is arguably already reflected in most object detection models.

  • 5.4. Canonical representation for multi-task

    • Normally, the cost function for multi-task learning is designed as a joint optimization across tasks, and this process can degrade overall accuracy.

    • In this experiment, implicit knowledge alone captures the per-task aspects, instead of a hand-designed joint optimization process.

      [Table 3: canonical representation for multi-task]

  • 5.5. Implicit modeling with different operators

    • In the feature alignment experiment, addition and concatenation improve performance, while multiplication degrades accuracy.

    • In the prediction refinement experiment, concatenation cannot be evaluated because it would change the channel dimension of the prediction.

    • In prediction refinement, applying multiplication is therefore better than applying addition: multiplication gives the anchors a larger optimization space, while addition shifts center coordinates that are bounded by the grid.

    • Tables 4 and 5 show the results of these experiments.

      [Table 4: feature alignment with different operators]

      [Table 5: prediction refinement with different operators]

  • 5.6. Modeling implicit knowledge in different ways

    • The results of modeling implicit knowledge with a neural network and with matrix factorization are shown in Table 6.

      [Table 6: modeling implicit knowledge in different ways]

  • 5.7. Analysis of implicit models

    • This section compares the parameter count, FLOPs, and training curves of the model with and without implicit knowledge, shown in Table 7 and Figure 11.

      [Table 7: parameters and FLOPs with/without implicit knowledge]

      [Figure 11: learning process with/without implicit knowledge]

  • 5.8. Implicit knowledge for object detection

    • Table 8 shows the effect of introducing implicit knowledge into object detection.

      [Table 8: implicit knowledge for object detection]

    • Table 9 shows the comparison with the state of the art.

      [Table 9: comparison with state-of-the-art detectors]

6. Conclusion

  • The authors show how to construct a unified network that integrates implicit and explicit knowledge, and that this network is very effective for multi-task learning.

7. Impression

  • I understand what implicit knowledge is and how to inject it into an existing model. However, I still don't understand in detail how the implicit knowledge can be reduced to a constant tensor.

  • The advantage of this model is that it enables multi-task learning, but there are no results on tasks other than object detection.

  • Implicit learning was difficult to understand because the mathematical equations came with too few empirical explanations.
