ICCV 2017: Cool Computer Vision Ideas Everywhere

Nikolaos Sarafianos, November 9, 2017

After spending 8 days in the beautiful city of Venice, attending ICCV and presenting our work on curriculum learning for multi-task classification, it's time for a writeup on the papers I liked, the presentations I found interesting, and some trends for the near future. You can find the opening slides, which contain nice stats, here.


GANs and Instance Level Recognition

Among the many workshops and tutorials on offer, I decided to go mainstream and attend the tutorial on GANs from Ian Goodfellow and the workshop on instance-level recognition organized by Georgia Gkioxari. Both websites have the slides of the talks available, which is wonderful. I really liked the talk from Mihaela Rosca on autoencoder GANs, which is based on this paper, for its simplicity and the way it was presented. StackGAN has a follow-up work with even more realistic results: it builds on the previous idea but works at multiple scales, with a conditional and an unconditional loss at each of them. The horse2zebra demo from the CycleGAN authors consisted of a horse mask that you get to wear, which was hilarious.
From the workshop on instance recognition, I loved how Kaiming He stepped away from his Mask R-CNN work to explain his perspective on invariance versus equivariance, and how this concept shaped his work by ditching RoIPool and using RoIAlign on the generated region proposals. While we're here, check out the paper on Deformable Convolutional Networks from MSR Asia, which discusses deformable RoI pooling.
I also attended parts of the Beyond Supervised Learning workshop, which covered many unsupervised and reinforcement learning approaches with computer vision applications. The presenters focused on generative, self-supervised, imitation learning, and (a lot of) RL techniques, and argued that just throwing data into smartly designed architectures ignores significant amounts of available information. Here's what I found the most interesting:
  • Attend, Infer, Repeat: proposes an elegant way of performing inference in structured images (e.g., 2 digits in an image) using a generative model that adapts to the given input.
  • Co-Segmentation by Composition: a smart way of performing unsupervised segmentation by leveraging the occurrence of similar image patterns across images. For example, they can segment all the bikes in an image by inferring that 2 detected things (which do not have a label) look very similar and thus must have a similar structure.
  • Alexei Efros talked about their recent ICML work on self-supervised learning, in which they learn to play Mario without introducing any rewards as in RL, just by making him curious about the world. He seems to be a super cool person and his talks are very nice; find one on YouTube.
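
To make the curiosity idea from the last item a bit more concrete, here is a minimal sketch. It assumes the curiosity signal is the prediction error of a learned forward model in some feature space: transitions the agent already predicts well are boring, surprising ones get a bonus. The function name, the scale `eta`, and the toy feature vectors are all hypothetical and not taken from the talk.

```python
def intrinsic_reward(predicted_next, actual_next, eta=1.0):
    """Curiosity bonus: scaled squared error of a forward-model prediction.

    predicted_next: features the (hypothetical) forward model expected
    actual_next:    features actually observed after taking the action
    eta:            assumed scaling factor for the bonus
    """
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


# A transition the model already predicts well yields almost no reward.
familiar = intrinsic_reward([1.0, 0.0], [1.0, 0.1])

# A surprising transition yields a much larger reward.
novel = intrinsic_reward([1.0, 0.0], [0.0, 1.0])

print(familiar, novel)
```

In a real agent the forward model would be trained alongside the policy, so a state stops being rewarding once it has been visited often enough to become predictable.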

Cool Papers

The great thing about conferences with 700+ papers and no parallel sessions (at least in the main conference) is that, whatever your research interests are, you will always find posters with a cute and simple idea to learn more about. Without counting keywords in titles, I have a feeling that action recognition, human re-identification, pose estimation, and face recognition were among the most popular applications. I noticed a lot of papers with pyramid-like networks that extract information at multiple scales, a lot of autoencoder-like applications, and obviously a plethora of GANs. Video is still not huge, although there was a decent number of papers working on video understanding. I was glad to see that some papers followed simple, straightforward approaches to solve problems. The best example I can recall is this one: A simple yet effective baseline for 3d human pose estimation.
Learning Embeddings
Domain Adaptation
  • Both Kate Saenko and Trevor Darrell discussed recent work on domain adaptation, including two adversarial approaches. One is ADDA, in which a discriminator with an adversarial loss is introduced between the two domains during training. They have a nice follow-up work (I guess for the upcoming CVPR) named CyCADA, which enforces semantic consistency in the generator through an additional source cycle loss. They generate the source back from the target, following the CycleGAN paper, and compare the initial and the generated images.
  • Another interesting piece of work is the paper on Associative Domain Adaptation, which leverages unlabeled data and tries to send a "walker" from the labeled source domain to the target and back to the source (again a cycle-related idea), and checks whether it returns to the same class it started from (e.g., the same digit on MNIST).
  • Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach
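
The round-trip "walker" idea from the Associative Domain Adaptation item above can be sketched in a few lines. This is a toy illustration under my own assumptions, not the authors' code: similarities are plain dot products between embeddings, transition probabilities are softmaxes over those similarities, and the loss asks the source → target → source walk to end on a source sample of the same class. All function names and the toy embeddings are hypothetical, and the sketch covers only the round-trip part.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def round_trip_probs(source, target):
    """Probability of walking source i -> any target -> source j."""
    sim = [[dot(s, t) for t in target] for s in source]
    p_st = [softmax(row) for row in sim]                    # source -> target
    p_ts = [softmax(list(col)) for col in zip(*sim)]        # target -> source
    n_s, n_t = len(source), len(target)
    return [
        [sum(p_st[i][k] * p_ts[k][j] for k in range(n_t)) for j in range(n_s)]
        for i in range(n_s)
    ]


def walker_loss(p_sts, labels):
    """Cross-entropy against a uniform target over same-class source samples."""
    loss = 0.0
    for i, row in enumerate(p_sts):
        same = [j for j, y in enumerate(labels) if y == labels[i]]
        loss -= sum(math.log(row[j] + 1e-8) for j in same) / len(same)
    return loss / len(p_sts)


# Toy labeled source embeddings: two samples per class.
source = [[4.0, 0.0], [3.5, 0.5], [0.0, 4.0], [0.5, 3.5]]
labels = [0, 0, 1, 1]

# Unlabeled target embeddings that line up with the source classes...
aligned_target = [[3.8, 0.2], [0.1, 4.1]]
# ...and ones that are equally similar to every source sample.
confused_target = [[2.0, 2.0], [2.0, 2.0]]

aligned_loss = walker_loss(round_trip_probs(source, aligned_target), labels)
confused_loss = walker_loss(round_trip_probs(source, confused_target), labels)
print(aligned_loss, confused_loss)
```

When the target embeddings cluster with the right source class, the walker almost always comes home to the class it started from and the loss is lower; when the target is uninformative, the walk is uniform and the loss is higher.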

Methods on Learning
I find papers that aspire to deviate from the classical supervised learning paradigm very interesting, especially those that look into ways to combat problems such as imbalanced data or noisy labels, or to add tasks to existing networks. Alex Kendall's talk on geometry and uncertainty in computer vision also touched on this subject, since he discussed his recent paper on how to assign weights to tasks based on the homoscedastic uncertainty of the individual tasks. Some papers whose authors I enjoyed talking to are the following.
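
As a side note on the uncertainty-based task weighting mentioned above, here is a minimal sketch of how such a weighting can look. It assumes the regression-style form in which each task loss L_i is scaled by 1/(2σ_i²) and a log σ_i term is added as a penalty, written in terms of s_i = log σ_i². The function name and the numbers are hypothetical; in practice the s_i would be learned parameters rather than fixed values.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using per-task log-variances s_i = log(sigma_i^2).

    Each task contributes 0.5 * exp(-s_i) * L_i + 0.5 * s_i:
    a large s_i down-weights the task, and the + 0.5 * s_i term
    stops the model from simply declaring every task infinitely noisy.
    """
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


# With s_i = 0 (sigma_i = 1) every task gets the same weight of 0.5.
equal = uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0])

# Raising s for the second (larger-loss) task reduces its influence.
reweighted = uncertainty_weighted_loss([2.0, 4.0], [0.0, math.log(4.0)])

print(equal, reweighted)
```

For a fixed loss value L_i, the contribution is smallest at s_i = log L_i, which is why tasks with consistently larger losses end up with smaller weights.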


Ablation studies and failure cases were present in most of the papers, since state-of-the-art results alone are not important per se if there's no explanation of the components of the method that achieved them. I noticed a lot of people who understood in depth the problem they were trying to solve, and who had structured their proposed solution into sub-parts, each of which made a step in the right direction. I feel that as a community we have solved the easy end-to-end supervised learning problems that were there to be solved, and people have started looking into what we can do better. This can be in terms of better embeddings, other learning paradigms, or leveraging additional information that is out there in our data/world and that we haven't yet used properly.
