| title |
|---|
| texture analysis |
Use a pretrained VGG network as a feature embedding. Given a target texture image, extract feature maps at several layers and compute the Gram matrix (the inner products between feature map channels) at each layer.
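A minimal sketch of the Gram-matrix descriptor, assuming PyTorch and an older torchvision API (`pretrained=True`); the specific layer indices below are an illustrative choice, not necessarily the set used in the paper:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def gram_matrix(feat):
    """feat: (1, C, H, W) feature map -> (C, C) Gram matrix of channel inner products."""
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (h * w)           # normalize by the number of spatial positions

# Pretrained VGG-19 conv trunk; indices are relu1_1, relu2_1, relu3_1, relu4_1
# in torchvision's VGG-19 layer numbering (illustrative choice of layers).
vgg = models.vgg19(pretrained=True).features.eval()
layers_of_interest = {1, 6, 11, 20}

preprocess = T.Compose([
    T.Resize(256),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def gram_features(image_path):
    """Run the image through VGG and collect one Gram matrix per chosen layer."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    grams = {}
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in layers_of_interest:
                grams[i] = gram_matrix(x)
    return grams
```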
The experiments show that the model generates better images when feature maps up to the pool4 layer of the VGG network are used.
The algorithm may fail when the target texture is man-made, for example brick walls.
The Gram matrix, as a new set of features, can also be used for object recognition, a supervised learning task. Using the Gram matrices from higher layers, we can achieve good classification of the objects in the image. This is a bit surprising, since the Gram matrix, as a texture feature, does not contain any spatial information. But such good classification performance is also consistent with the fact that convnets are largely spatially agnostic. (LW: this is probably why some garbage input images can fool a convnet even though they carry no meaningful spatial structure; the convnet does not care about spatial information.)
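A rough sketch of treating the Gram matrices as input features for a supervised classifier; `gram_features` is the hypothetical helper from the sketch above, `image_paths`/`labels` stand for some labeled dataset, and the logistic-regression classifier is just an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gram_to_vector(grams):
    """Flatten the upper triangle of each layer's Gram matrix and concatenate."""
    parts = []
    for g in grams.values():
        g = g.cpu().numpy()
        iu = np.triu_indices(g.shape[0])
        parts.append(g[iu])
    return np.concatenate(parts)

# image_paths and labels are assumed to come from some labeled dataset (hypothetical).
X = np.stack([gram_to_vector(gram_features(p)) for p in image_paths])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```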
Interestingly, the authors claim that both "Spatial pyramid pooling in deep convolutional networks for visual recognition" and "Deep convolutional filter banks for texture recognition and segmentation" use a similar concept: compute pooled statistics/features in a stationary feature space.
An early paper that proposes a function
That reminds me of a question: what does a convolutional net actually learn? Does it just learn texture information, which is often sufficient for a classification task? I suspect it mainly learns spatially agnostic local statistics, because the layers are fully convolutional.
Besides the cross-correlations between feature maps within a single layer/scale, this paper also proposes cross-correlations between feature maps across layers. Since different scales have different feature map sizes, the finer scale's feature maps need to be down-sampled before the cross-correlation is computed. However, such a cross-scale texture representation is not used in [gat15].
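A minimal sketch of such a cross-scale correlation, assuming the finer feature map is simply average-pooled down to the coarser map's resolution; the paper's exact down-sampling scheme may differ:

```python
import torch
import torch.nn.functional as F

def cross_layer_gram(feat_fine, feat_coarse):
    """Channel cross-correlations between feature maps at two scales.

    feat_fine:   (1, C1, H1, W1) from an earlier, finer layer
    feat_coarse: (1, C2, H2, W2) from a later, coarser layer (H2 <= H1)
    Returns a (C1, C2) cross-correlation matrix.
    """
    _, c2, h2, w2 = feat_coarse.shape
    c1 = feat_fine.shape[1]
    # down-sample the finer feature map to the coarser spatial resolution
    fine_ds = F.adaptive_avg_pool2d(feat_fine, (h2, w2))
    a = fine_ds.reshape(c1, h2 * w2)
    b = feat_coarse.reshape(c2, h2 * w2)
    return a @ b.t() / (h2 * w2)
```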
Uses two CNN feature embeddings for texture recognition and segmentation. One uses the second-to-last (fully connected) layer of a pretrained CNN, so the features retain spatial information about the object. The other uses the last convolutional layer without the fully connected layers, so it carries no spatial layout and better represents texture. In practice, the second embedding extracts features at multiple scales/layers, just like SIFT (a simplified sketch appears after the next note).
First generate regions, then do texture classification within each region. Both CNN embeddings are used in the texture classification stage, not in the first region-generation stage, because stage one only needs coarse region proposals.
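A much-simplified sketch of the second (orderless) embedding mentioned above: extract last-conv-layer descriptors at several image scales and pool them with no spatial order. The paper encodes the descriptors with Fisher vectors; the mean/std pooling here is only a stand-in, and the choice of VGG-16 is arbitrary:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

cnn = models.vgg16(pretrained=True).features.eval()   # any pretrained conv trunk works

def orderless_texture_descriptor(img, scales=(1.0, 0.75, 0.5)):
    """img: (1, 3, H, W) normalized image tensor.

    Collects last-conv-layer activations at several scales and pools them
    across all spatial positions (orderless). A Fisher vector encoding would
    replace the simple mean/std pooling used here.
    """
    descs = []
    with torch.no_grad():
        for s in scales:
            x = F.interpolate(img, scale_factor=s, mode="bilinear", align_corners=False)
            f = cnn(x)                                    # (1, C, h, w) conv feature map
            descs.append(f.flatten(2).squeeze(0).t())     # (h*w, C) local descriptors
    d = torch.cat(descs, dim=0)              # pool over all positions and scales
    return torch.cat([d.mean(0), d.std(0)])  # orderless statistics, shape (2C,)
```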
This paper is similar to "Deep Filter Bank..." but uses SIFT + Fisher vectors instead of CNN + FV.
The same author has several works on texture representation, and the bilinear CNN is similar to the cross-correlation (Gram matrix) methods used for texture synthesis.
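To make the analogy concrete, a minimal sketch of bilinear pooling: the outer product of the two streams' feature vectors at each location, summed over locations, which reduces to a Gram matrix when the two streams are the same network. The signed square-root and L2 normalization steps are the common post-processing for bilinear features:

```python
import torch

def bilinear_pool(feat_a, feat_b):
    """feat_a: (1, Ca, H, W), feat_b: (1, Cb, H, W) from two CNN streams.

    Returns the (Ca*Cb,) bilinear descriptor: outer products of the two
    feature vectors at each location, summed (averaged) over all locations.
    """
    _, ca, h, w = feat_a.shape
    cb = feat_b.shape[1]
    a = feat_a.reshape(ca, h * w)
    b = feat_b.reshape(cb, h * w)
    pooled = a @ b.t() / (h * w)             # (Ca, Cb); equals the Gram matrix if a is b
    # signed square-root and L2 normalization
    pooled = pooled.flatten()
    pooled = torch.sign(pooled) * torch.sqrt(pooled.abs())
    return pooled / (pooled.norm() + 1e-8)
```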
Referenced by the bilinear CNN paper above. The visual pathway has two streams: one for 'where' and another for 'what'. This is yet another paper showing the connection to biological vision.
Seems like an early work on bilinear models; also referenced by the bilinear CNN paper.
The same authors of the bilinear CNN use the model to invert the network and generate images.
Looks like the original paper used by Neural-Doodle.
Latest from Twitter.
A paper I found earlier this year, from Bengio's group.
Found via Google while searching for texture and content separation. Donoho's paper. Note the citations; I need to dig into the papers that cite it.