The fine-grained understanding of visual content that semantic segmentation provides finds application in many fields. We are particularly interested in its application to driving scenes and aerial imagery, which are extremely relevant in today's industry (e.g., for autonomous driving and crop monitoring). Within these contexts, we are investigating solutions that improve segmentation results even when a model is applied to a target domain whose data distribution differs from the one available at training time (the domain shift problem). Although this problem also affects other tasks, in semantic segmentation it is exacerbated by the fact that labeling images with pixel-wise annotations is very expensive and time-consuming. Thus, expanding the training set with images from many different domains is infeasible, not only because the images are difficult to collect, but also because they are costly to label.
Semantic Segmentation for Autonomous Driving
Although semantic segmentation has the potential to be an extremely important functionality in the perception stack of a self-driving car, its effectiveness depends on whether or not it can adapt or generalize to unseen domains. This problem is crucial if we want to fulfill the promise of autonomous vehicles that are capable of operating anywhere in the world. To support this research, at Vandal we have created a synthetic dataset, called IDDA, which contains 105 different scenarios that differ in weather condition, environment, and camera point of view.
We have openly released IDDA as a tool to study and develop domain adaptation and generalization solutions for semantic segmentation in driving scenes. More info can be found in our RA-L paper IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving.
For example, one of the solutions that we have developed with the help of this dataset is PixDA, a few-shot method for cross-domain alignment in semantic segmentation.
More info on PixDA can be found in our WACV 2022 paper Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation. The idea behind PixDA is that a few-shot solution, where the training dataset includes only a few labeled images from the target domain, may be more practical than an unsupervised domain adaptation algorithm that requires a very large number of unlabeled target images. Yet the few-shot setting is very challenging because it can significantly increase the class imbalance between source and target data. In fact, although an imbalance in the available pixels per class is typical in semantic segmentation (because some classes cover large areas and others small ones), in the few-shot setting the target images may contain very few pixels from some classes, or even none at all. Moreover, an image-wise domain alignment of the features can produce negative transfer on semantic categories that are already well aligned across the two domains. PixDA uses a new loss, called PixAdv, that aligns source and target domains locally while reducing negative transfer and avoiding overfitting to the underrepresented classes.
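The class-imbalance issue described above can be made concrete with a small sketch. This is not part of PixDA or PixAdv; it only counts labeled pixels per class in toy label maps to show how a few-shot target set can miss a class entirely. The class ids, the toy label maps, and the helper names `class_pixel_counts` and `missing_classes` are assumptions made for illustration, and a real pipeline would more likely operate on arrays than on nested lists.

```python
from collections import Counter


def class_pixel_counts(label_maps, num_classes):
    """Count labeled pixels per class over a list of 2D label maps.

    Each label map is a list of rows, and each row is a list of integer class ids.
    """
    counts = Counter()
    for label_map in label_maps:
        for row in label_map:
            counts.update(row)
    # Return a dense list so that classes with no pixels show up as 0.
    return [counts.get(c, 0) for c in range(num_classes)]


def missing_classes(counts):
    """Return the class ids that have no labeled pixels at all."""
    return [c for c, n in enumerate(counts) if n == 0]


# Hypothetical toy data with 3 classes: 0 = road, 1 = building, 2 = pedestrian.
NUM_CLASSES = 3

# "Source" domain: two labeled images, every class appears.
source_maps = [
    [[0, 0, 0, 1], [0, 0, 1, 1], [0, 2, 1, 1]],
    [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 2, 2]],
]

# Few-shot "target" domain: a single labeled image with no pedestrian pixels.
target_maps = [
    [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
]

source_counts = class_pixel_counts(source_maps, NUM_CLASSES)
target_counts = class_pixel_counts(target_maps, NUM_CLASSES)

print("source pixels per class:", source_counts)
print("target pixels per class:", target_counts)
print("classes absent from target:", missing_classes(target_counts))
```

In this toy example the source set is already skewed toward class 0, and the single target image contains no pixels of class 2 at all, which is the situation that makes few-shot alignment difficult.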
Semantic Segmentation for Aerial Monitoring and Remote Sensing
The large availability of data for driving scenes has led to new and more effective semantic segmentation architectures. However, these same solutions are generally less effective when used to process aerial images. In our research we have found that the peculiarities of aerial images may be the cause of this phenomenon and may call for more specific solutions rather than the direct application of models developed for driving scenes. In particular, the model cannot rely on a fixed semantic structure of the scene, unlike in driving scenes, where the street is always at the bottom, the sky at the top, and so on. Moreover, the imbalance among classes is even more pronounced, with some classes covering much larger areas than others (for example, land cover versus cars). We are developing solutions that address the domain shift problem in aerial segmentation and are tailored to the specificities of these scenes.
- Workshop Proc. Augmentation Invariance and Adaptive Sampling in Semantic Segmentation of Agricultural Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022
- Conference Proc. Learning Semantics for Visual Place Recognition through Multi-Scale Attention. In Proceedings of the 21st International Conference on Image Analysis and Processing (ICIAP), 2022
- Conference Proc. Pixel-by-Pixel Cross-Domain Alignment for Few-Shot Semantic Segmentation. In 2022 IEEE Winter Conference on Applications of Computer Vision (WACV), 2022
- Conference Proc. Reimagine BiSeNet for Real-Time Domain Adaptation in Semantic Segmentation. In Proceedings of the I-RIM 2021 Conference, 2021
- Journal. IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving. IEEE Robotics and Automation Letters, 2020