[Object detection] You Only Learn One Representation: Unified Network for Multiple Tasks Review [ENG]
0. Abstract
- People understand the world through vision, hearing, and so on, so humans have a huge database of experience that provides intuition even for unseen tasks.
- A unified network composed of implicit and explicit knowledge can perform multiple tasks while capturing physical meaning.
1. Introduction
- Unlike a CNN, a human's implicit knowledge can help to perform various tasks.
- In this paper, knowledge that directly corresponds to observation is called "explicit knowledge", and knowledge that has nothing to do with observation is called "implicit knowledge".
- The proposed network unifies implicit and explicit knowledge, enabling the learned model to contain a general representation.
- The proposed network is constructed from compressive sensing and deep learning.
1.1. Contribution
- The authors propose a unified network that can accomplish various tasks and effectively improves model performance at very small additional cost.
- The authors introduce various mechanisms into the implicit knowledge learning process and verify their effectiveness.
- The authors discuss using vectors, neural networks, and matrix factorization as tools to model implicit knowledge.
- The authors confirm that the learned implicit representation can accurately correspond to specific physical characteristics.
- Combined with state-of-the-art methods, the proposed model achieves accuracy comparable to Scaled-YOLOv4-P7 on object detection while increasing inference speed by 88%.
2. Related work
- The related work covers explicit deep learning, implicit deep learning, and knowledge modeling.
2.1. Explicit deep learning
- Transformers and non-local networks are commonly used explicit deep learning methods.
2.2. Implicit deep learning
- The main categories of implicit deep learning are implicit neural representations and deep equilibrium models.
- Implicit neural representations: obtain a parameterized continuous mapping representation of discrete inputs to perform different tasks.
- Deep equilibrium models: transform implicit learning into a residual-form neural network.
2.3. Knowledge modeling
- The main categories of knowledge modeling are sparse representations and memory networks.
- Sparse representations: describe a specific thing in a lower dimension than the original.
- Memory networks: combine various forms of embedding to form a memory.
3. How does implicit knowledge work?
- This section mainly introduces how implicit knowledge, applied as a constant tensor, can serve various tasks.
3.1. Manifold space reduction
- A good representation should be able to find an appropriate projection in the manifold space to which it belongs.
- As in Figure 3, if the projection separates the target categories well, the inner product between the implicit representation and a projection vector can effectively achieve various tasks.
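One way to write this down (my own notation, not the paper's): with implicit representation $z$ and a per-task projection vector $p_k$, the output for task $k$ is the inner product

$$s_k = \langle z,\ p_k \rangle,$$

so a single representation can serve several tasks as long as each projection separates its target categories.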
3.2. Kernel space alignment
- Figure 4-(a) and (b) illustrate an example of kernel space misalignment in multi-task learning and its aligned counterpart.
- The misalignment can be dealt with by translating, rotating, and scaling the output feature together with the implicit representation.
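In my reading, this correction amounts to an affine transform of the output feature driven by the implicit representation, roughly

$$f'(x) = \alpha \odot f_{\theta}(x) + \beta,$$

where the scaling $\alpha$ and translation $\beta$ (and, in general, a rotation) come from the implicit representation.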
3.3. More functions
- Implicit knowledge can be extended to many more functions.
- One example is that the neural network can predict an offset of the center coordinate by adding an implicit representation.
- Another is anchor refinement: multiplying the anchors by an implicit representation lets the network search the anchor hyperparameter set automatically (see the sketch after this list).
- Figure 5 illustrates examples of the functions that implicit knowledge can implement.
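A minimal PyTorch-style sketch of the anchor refinement idea; the class name, shapes, and initial anchors below are my own illustrations, not the paper's code:

```python
import torch
import torch.nn as nn

class ImplicitAnchorRefine(nn.Module):
    """Refine anchor shapes by multiplication and center coordinates by
    addition, using learned implicit parameters (illustrative sketch)."""
    def __init__(self, num_anchors: int):
        super().__init__()
        # learned (w, h) scale per anchor, initialized to 1 (identity)
        self.scale = nn.Parameter(torch.ones(num_anchors, 2))
        # learned (x, y) offset per anchor, initialized to 0 (identity)
        self.shift = nn.Parameter(torch.zeros(num_anchors, 2))

    def forward(self, anchors, centers):
        # anchors: (num_anchors, 2) base (w, h); centers: (num_anchors, 2) (x, y)
        return anchors * self.scale, centers + self.shift

refine = ImplicitAnchorRefine(num_anchors=3)
wh, xy = refine(torch.tensor([[10., 13.], [16., 30.], [33., 23.]]),
                torch.zeros(3, 2))
```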
4. Implicit knowledge in our unified networks
- This section compares conventional networks with the proposed unified networks.
4.1. Formulation of implicit knowledge
4.1.1. Conventional networks
- $x$ is the observation, $\theta$ is the set of parameters of a neural network, $f_{\theta}$ represents the operation of the neural network, $\epsilon$ is the error term, and $y$ is the target of the given task.
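For reference, the baseline objective the review refers to (my reconstruction of equation (1) from the definitions above) is

$$y = f_{\theta}(x) + \epsilon, \qquad \underset{\theta}{\text{minimize}}\ \epsilon.$$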
- In conventional networks, different observations are projected to the same point if their targets are the same. Figure 6-(a) illustrates this problem.
- For a general-purpose neural network, the obtained representation must be able to serve all purposes, but that is impossible with a trivial mathematical method (a one-hot vector, a threshold on Euclidean distance, and so on). Figure 6-(b) depicts this impossibility.
- Figure 6-(c) depicts the solution: modeling the error term $\epsilon$ to find solutions for different tasks.
4.1.2. Unified networks
- Equation (2) of the paper gives the unified objective function with implicit and explicit knowledge.
- $\epsilon_{ex}$ and $\epsilon_{im}$ are the operations that model the explicit error and the implicit error from the observation $x$ and the latent code $z$ (the feature from the encoder), respectively.
- $g_{\pi}$ is a task-specific operation that combines or selects information from explicit knowledge and implicit knowledge.
- Equation (2) can be rewritten as equation (3), shown below. The star denotes an operator that can combine $f_{\theta}$ and $g_{\phi}$; in this work the operators are addition, multiplication, and concatenation.
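Reconstructing equation (3) from the description above (typesetting may differ slightly from the paper):

$$y = f_{\theta}(x) \star g_{\phi}(z), \qquad \star \in \{\,+,\ \times,\ \text{concat}\,\}.$$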
4.2. Modeling implicit knowledge
- Implicit knowledge can be modeled as a vector / matrix / tensor.
4.2.1. Neural Network.
- With $z$ as the prior of implicit knowledge, a weight matrix can be used to form a linear combination or non-linearization of it.
- The weight matrix can also be substituted with a more complex neural network or a Markov chain (see the sketch below).
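A minimal PyTorch sketch of the $Wz$ idea; the dimensions and names here are my assumptions:

```python
import torch
import torch.nn as nn

class ImplicitNN(nn.Module):
    """Model implicit knowledge as a learned latent z passed through a
    weight matrix W (here a bias-free linear layer) -- a sketch of Sec. 4.2.1."""
    def __init__(self, z_dim: int = 16, out_dim: int = 256):
        super().__init__()
        self.z = nn.Parameter(torch.randn(z_dim) * 0.02)  # implicit prior
        self.W = nn.Linear(z_dim, out_dim, bias=False)    # computes W z

    def forward(self):
        # independent of the observation x: the output is a constant tensor
        return self.W(self.z)

g = ImplicitNN()
feat = torch.randn(1, 256, 40, 40)    # an explicit feature map
out = feat + g().view(1, -1, 1, 1)    # combine by addition
```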
4.2.2. Matrix Factorization
- $Z$ is the implicit prior basis and $c$ is the coefficient vector for forming the implicit representation $Zc$.
- $c$ can be given a sparse constraint, or a non-negative constraint to obtain the non-negative matrix factorization (NMF) form (see the sketch below).
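A sketch of the $Zc$ form; the shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ImplicitMF(nn.Module):
    """Implicit representation as Z c: a learned basis Z weighted by a
    learned coefficient vector c (sketch of Sec. 4.2.2)."""
    def __init__(self, out_dim: int = 256, num_bases: int = 16):
        super().__init__()
        self.Z = nn.Parameter(torch.randn(out_dim, num_bases) * 0.02)
        self.c = nn.Parameter(torch.randn(num_bases) * 0.02)

    def forward(self):
        # a sparsity or non-negativity constraint on c would give the
        # sparse / NMF variants mentioned above
        return self.Z @ self.c  # shape: (out_dim,)
```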
4.3. Training
- It is assumed that the model does not have any prior implicit knowledge at the beginning; that is, implicit knowledge will not have any effect on the explicit representation $f_{\theta}(x)$ at the start of training.
- When the combining operator is addition or concatenation, the initial implicit prior is $z \sim N(0, \sigma)$.
- When the combining operator is multiplication, the initial implicit prior is $z \sim N(1, \sigma)$.
- Here, $\sigma$ is a very small value close to zero. Both $z$ and $\phi$ are trained with the backpropagation algorithm during the training process (see the initialization sketch below).
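A sketch of that initialization rule; `sigma` here is just an illustrative small value:

```python
import torch
import torch.nn as nn

def init_implicit(shape, operator: str, sigma: float = 0.02) -> nn.Parameter:
    """Initialize the implicit prior z so it starts as a no-op:
    near 0 for addition/concatenation, near 1 for multiplication."""
    if operator in ("addition", "concatenation"):
        return nn.Parameter(torch.randn(shape) * sigma)      # z ~ N(0, sigma)
    if operator == "multiplication":
        return nn.Parameter(1 + torch.randn(shape) * sigma)  # z ~ N(1, sigma)
    raise ValueError(f"unknown operator: {operator}")

z_add = init_implicit((256,), "addition")
z_mul = init_implicit((256,), "multiplication")
```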
4.4. Inference
- Since implicit knowledge is independent of the observation $x$, $g_{\phi}(z)$ can be computed once before the inference phase and kept as a constant tensor, so implicit information adds no computational cost at run time; the next two bullets (and the sketch below) show how the constant is folded away.
- If the operator is multiplication and the subsequent layer is a conv layer, the constant can be integrated into the conv weights as in equation (9).
- If the operator is addition, the subsequent layer is a conv layer, and there is no activation function in between, the constant can be integrated into the conv bias as in equation (10).
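To convince myself this costs nothing at inference, here is a hedged sketch of the merge (my own helper, not the paper's code; `g_z` stands for the precomputed per-channel constant $g_{\phi}(z)$). The addition fold is exact away from zero-padded borders:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fold_implicit_into_conv(conv: nn.Conv2d, g_z: torch.Tensor, operator: str):
    """Absorb a constant per-input-channel tensor g_z into the next conv,
    so conv(x * g_z) or conv(x + g_z) becomes a plain conv(x)."""
    if operator == "multiplication":
        # conv(x * g_z) == conv'(x) with W' = W scaled along input channels
        conv.weight.mul_(g_z.view(1, -1, 1, 1))
    elif operator == "addition":
        # conv(x + g_z) == conv(x) + (sum of W over kernel) @ g_z -> bias
        extra = conv.weight.sum(dim=(2, 3)) @ g_z
        if conv.bias is None:
            conv.bias = nn.Parameter(extra)
        else:
            conv.bias.add_(extra)

conv = nn.Conv2d(8, 16, kernel_size=3, padding=1)
fold_implicit_into_conv(conv, torch.full((8,), 1.1), "multiplication")
```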
5. Experiments
- The experiments are conducted on the MS COCO dataset, which includes object detection, instance segmentation, and more.
5.1. Experimental setup
- Implicit knowledge is introduced in three aspects in these experiments: (1) feature alignment for FPN, (2) prediction refinement, and (3) multi-task learning in a single model.
- The baseline model is YOLOv4-CSP, and implicit knowledge is introduced into the model at the positions pointed to by the arrows in Figure 8. The hyperparameters are set to the defaults of the baseline model.
5.2. Feature alignment for FPN
- The implicit representation for feature alignment in each FPN layer improves AP across the board; the results are shown in Table 1.
5.3. Prediction refinement for object detection
- Table 2 shows the results of adding implicit knowledge to prediction refinement.
- The overall objective function of an object detection model contains position, objectness, and class information, so implicit knowledge of this kind is already reflected in almost every object detection model.
5.4. Canonical representation for multi-task
- Normally, for multi-task learning, the cost function requires a joint optimization process, and this process can worsen the total accuracy.
- In this experiment, implicit knowledge alone reflects the multi-task aspect, replacing the joint optimization process.
5.5. Implicit modeling with different operators
- In the feature-alignment experiment, addition and concatenation improve performance, while multiplication worsens accuracy.
- In the prediction-refinement experiment, concatenation cannot be applied because it would change the number of output channels.
- In the prediction-refinement experiment, applying multiplication is better than applying addition, because multiplication gives each anchor a larger optimization space, whereas addition only shifts coordinates that are bounded by the grid.
- Table 4 shows the result of this experiment.
5.6. Modeling implicit knowledge in different ways
- The results of modeling with neural networks and with matrix factorization are shown in Table 6.
5.7. Analysis of implicit models
- This section describes the parameters, FLOPs, and learning process of the model with and without implicit knowledge, summarized in Table 7 and Figure 11.
5.8. Implicit knowledge for object detection
- Table 8 shows the effect of introducing implicit knowledge.
- Table 9 shows the comparison with the state of the art.
6. Conclusion
- The authors show how to construct a unified network that integrates implicit knowledge and explicit knowledge, which is very effective for multi-task learning.
7. Impression
- I understand what implicit knowledge is and how to reflect it in the original model. However, I still don't know in detail how the implicit knowledge can be a constant tensor.
- The advantage of this model is that it makes multi-task learning possible, but there are no results for anything other than object detection.
- It was difficult to understand implicit learning because there were too few empirical explanations of its mathematical equations.