semantic segmentation

pixel-level classification

Figure 1. Example of semantic segmentation from an RGB image in a driving scenario. Image from the Cityscapes dataset.

The fine-grained understanding of visual content that semantic segmentation provides finds application in many fields. We are particularly interested in its application to driving scenes and aerial imagery, which are highly relevant in today's industry (e.g., for autonomous driving and crop monitoring). Within these contexts, we are investigating solutions that improve segmentation results even when the models are applied to target domains whose data distribution differs from the one available at training time (the domain shift problem). Although this problem also affects other tasks, in semantic segmentation it is exacerbated by the fact that labeling images with pixel-wise annotations is very expensive and time-consuming. Thus, expanding the training set with images from many different domains is infeasible, not only because the images are difficult to collect but also because they are costly to label.
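
To make "pixel-level classification" concrete, the following is a minimal sketch in plain Python of how per-pixel class scores become a label mask and how a prediction can be compared with its annotation. The mean intersection-over-union (mIoU) score used here is a common segmentation metric and is our choice for illustration; the text above does not name a metric. The toy scores, the three classes and the function names are all hypothetical.

```python
from typing import List

Mask = List[List[int]]  # H x W grid of class indices


def predict_mask(scores: List[List[List[float]]]) -> Mask:
    """Turn per-pixel class scores (H x W x C) into a label mask via argmax."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union > 0:  # skip classes that appear nowhere
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2 x 2 image with three hypothetical classes: 0 = road, 1 = car, 2 = sky.
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
]
target = [[2, 2], [0, 1]]  # pixel-wise annotation

pred = predict_mask(scores)
miou = mean_iou(pred, target, num_classes=3)
```

Evaluating the same model on annotated images from the training domain and from a new domain, and comparing the two scores, is one simple way to quantify the effect of domain shift.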

Semantic Segmentation for Autonomous Driving

Although semantic segmentation has the potential to be an extremely important component of the perception stack of a self-driving car, its effectiveness depends on whether it can adapt or generalize to unseen domains. This problem is crucial if we want to deliver on the promise of autonomous vehicles capable of operating anywhere in the world. To support this research, at Vandal we have created a synthetic dataset, called IDDA, which contains 105 different scenarios that differ in weather conditions, environment, and camera point of view.

Figure 2. IDDA offers 105 scenarios of driving scenes, with different weather conditions, environments, and camera points of view (on a car, jeep, minivan, ...). Each RGB image is accompanied by a semantic mask and a depth image.

We have openly released IDDA as a tool to study and develop domain adaptation and generalization solutions for semantic segmentation in driving scenes. More info can be found in our RA-L paper IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving.

For example, one of the solutions that we have developed, also leveraging this dataset, is PixDA, a few-shot method for cross-domain alignment in semantic segmentation.
More info on PixDA can be found in our WACV 2022 paper Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation. The idea behind PixDA is that a few-shot solution, where the training dataset includes only a few labeled images from the target domain, may be more practical than an unsupervised domain adaptation algorithm that requires a very large number of unlabeled target images. Yet the few-shot setting is very challenging because it can significantly increase the class imbalance between source and target data. In fact, although an imbalance in the available pixels per class is typical in semantic segmentation (because some classes cover large areas and others only small ones), in the few-shot setting the target images may contain very few pixels from some classes, or even none at all. Moreover, an image-wise domain alignment of the features can produce negative transfer on semantic categories that are already well aligned across the two domains. PixDA uses a new loss, called PixAdv, that aligns source and target domains locally while reducing negative transfer and avoiding overfitting on the underrepresented classes.
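
The class imbalance described above can be illustrated by counting labeled pixels per class. The sketch below uses made-up toy masks and hypothetical class indices (0 = road, 1 = sky, 2 = pedestrian); it is not taken from IDDA or from the PixDA code. It shows how a class that is merely rare in the source set can vanish entirely from a one-image target set.

```python
from collections import Counter
from typing import Dict, Iterable, List

Mask = List[List[int]]  # H x W grid of class indices


def class_frequencies(masks: Iterable[Mask], num_classes: int) -> Dict[int, float]:
    """Fraction of labeled pixels that belong to each class across the masks."""
    counts: Counter = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in range(num_classes)}


# Toy source set: two annotated images, pedestrians are rare but present.
source_masks = [
    [[1, 1, 1, 1], [0, 0, 2, 0], [0, 0, 0, 0]],
    [[1, 1, 1, 1], [0, 0, 0, 0], [0, 2, 0, 0]],
]
# Toy few-shot target set: a single annotated image without pedestrians.
target_masks = [
    [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
]

source_freq = class_frequencies(source_masks, num_classes=3)
target_freq = class_frequencies(target_masks, num_classes=3)

# Classes with no labeled pixel at all in the few-shot target set.
missing_in_target = [c for c, f in target_freq.items() if f == 0.0]
```

Here the pedestrian class covers 2 of 24 source pixels and none of the target pixels, which is the situation the few-shot setting has to cope with.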

Figure 3. Illustration of the pixel-by-pixel adversarial learning of PixDA. A new pixel-wise discriminator computes the adversarial loss, whose contribution at each pixel is weighted by two terms: S, which accounts for the ability of the model to correctly represent the pixel, and B, which weights each pixel based on the frequency of its semantic class. Yellow/blue lines refer to the source/target domain, respectively.
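
The weighting scheme in Figure 3 can be sketched as a per-pixel adversarial term multiplied by S and B and then averaged. This is only a schematic under our own assumptions, not the PixAdv loss as defined in the paper: the exact formulas for S and B are not given here, so the two helpers below (`s_from_confidence`, `b_from_frequency`) are hypothetical stand-ins, and the direction in which B depends on class frequency is an assumption.

```python
import math
from typing import Dict, List

Grid = List[List[float]]


def pixel_adv_terms(disc_prob: Grid) -> Grid:
    """Per-pixel adversarial term -log D, where D is the discriminator's
    probability that a target pixel comes from the source domain."""
    eps = 1e-12  # guards against log(0)
    return [[-math.log(max(d, eps)) for d in row] for row in disc_prob]


def weighted_pixel_loss(adv: Grid, s: Grid, b: Grid) -> float:
    """Average of the per-pixel adversarial terms, each scaled by S * B."""
    total, n = 0.0, 0
    for adv_row, s_row, b_row in zip(adv, s, b):
        for a, s_i, b_i in zip(adv_row, s_row, b_row):
            total += s_i * b_i * a
            n += 1
    return total / n if n else 0.0


def s_from_confidence(true_class_prob: Grid) -> Grid:
    """ASSUMPTION: down-weight pixels the segmenter already gets right."""
    return [[1.0 - p for p in row] for row in true_class_prob]


def b_from_frequency(labels: List[List[int]], class_freq: Dict[int, float]) -> Grid:
    """ASSUMPTION: weight each pixel by one minus its class frequency."""
    return [[1.0 - class_freq[c] for c in row] for row in labels]


# Toy 1 x 2 target image: the first pixel is classified perfectly,
# the second one (a rare class) only with probability 0.5.
disc_prob = [[0.5, 0.5]]
true_class_prob = [[1.0, 0.5]]
labels = [[0, 1]]
class_freq = {0: 0.9, 1: 0.1}

adv = pixel_adv_terms(disc_prob)
loss = weighted_pixel_loss(
    adv,
    s_from_confidence(true_class_prob),
    b_from_frequency(labels, class_freq),
)
```

With these stand-in weights the perfectly classified pixel contributes nothing to the alignment loss, which mirrors the stated goal of not disturbing pixels that are already well represented.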

Semantic Segmentation for Aerial Monitoring and Remote Sensing

The large availability of data for driving scenes has led to new and more effective semantic segmentation architectures. However, these same solutions are generally less effective when used to process aerial images. In our research we have found that the peculiarities of aerial images may be the cause of this phenomenon and may call for more specific solutions rather than the direct application of models developed for driving scenes. In particular, the model cannot rely on a fixed semantic structure of the scene, unlike in driving scenes, where the street is always at the bottom, the sky at the top, and so on. Moreover, there is an even more pronounced imbalance among classes, with some covering far more area than others (for example, land cover versus cars). We are developing solutions that address the domain shift problem in aerial segmentation and are tailored to the specific characteristics of these scenes.

Figure 4. One of the solutions that we have developed specifically for aerial segmentation. This method, described in the paper Augmentation Invariance and Adaptive Sampling in Semantic Segmentation of Agricultural Aerial Images, uses an augmentation invariance strategy and an adaptive sampling scheme that are tailored to the aerial scenario.
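
As a rough illustration of what augmentation invariance can mean for aerial images, which have no canonical "up", the sketch below measures how far a model is from commuting with a geometric transform: it compares the prediction on a transformed image with the transformed prediction on the original image. This is a generic consistency check written under our own assumptions, not the formulation used in the paper, and it does not cover the adaptive sampling part. The two toy models are hypothetical.

```python
from typing import Callable, List

Grid = List[List[float]]  # single-channel image or single-class score map


def hflip(g: Grid) -> Grid:
    """Mirror a 2-D grid left to right."""
    return [list(reversed(row)) for row in g]


def rot90(g: Grid) -> Grid:
    """Rotate a 2-D grid by 90 degrees clockwise."""
    return [list(col) for col in zip(*g[::-1])]


def invariance_gap(
    model: Callable[[Grid], Grid],
    image: Grid,
    transform: Callable[[Grid], Grid],
) -> float:
    """Mean squared difference between model(T(x)) and T(model(x))."""
    a = model(transform(image))
    b = transform(model(image))
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def pointwise_model(img: Grid) -> Grid:
    """Hypothetical model that scores each pixel independently of position."""
    return [[2.0 * v for v in row] for row in img]


def column_biased_model(img: Grid) -> Grid:
    """Hypothetical model whose score also depends on the column index."""
    return [[v + 0.1 * j for j, v in enumerate(row)] for row in img]


image = [[0.1, 0.2], [0.3, 0.4]]

gap_pointwise = invariance_gap(pointwise_model, image, rot90)
gap_biased = invariance_gap(column_biased_model, image, hflip)
```

A position-independent model has zero gap, while a model that has learned a positional prior (useful in driving scenes, harmful in aerial ones) does not; a term of this kind could be added to a training loss to penalize the gap.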

Related Publications

  1. Workshop Proc.
    Augmentation Invariance and Adaptive Sampling in Semantic Segmentation of Agricultural Aerial Images
    Tavera, A., Arnaudo, E., Masone, C., and Caputo, B.
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2022
  2. Conference Proc.
    Learning Semantics for Visual Place Recognition through Multi-Scale Attention
    Paolicelli, V., Tavera, A., Berton, G., Masone, C., and Caputo, B.
    In Proceedings of the 21st International Conference on Image Analysis and Processing (ICIAP) 2022
  3. Conference Proc.
    Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation
    Tavera, A., Cermelli, F., Masone, C., and Caputo, B.
    In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV) 2022
  4. Conference Proc.
    Reimagine BiSeNet for Real-Time Domain Adaptation in Semantic Segmentation
    Tavera, A., Masone, C., and Caputo, B.
    In Proceedings of the I-RIM 2021 Conference 2021
  5. Journal
    IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving
    Alberti, E., Tavera, A., Masone, C., and Caputo, B.
    IEEE Robotics and Automation Letters 2020