Shape Distribution Matters: Shape-specific Mixture-of-Experts for Amodal Segmentation under Diverse Occlusions

Zhixuan Li1, Yujia Liu2,3,4, Chen Hui5, Jeonghaeng Lee6, Sanghoon Lee6, Weisi Lin1*
1Nanyang Technological University, Singapore
2School of Computer Science, Peking University, Beijing, China
3National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, China
4National Engineering Research Center of Visual Technology, Peking University, Beijing, China
5Nanjing University of Information Science and Technology, China
6Department of Electrical and Electronic Engineering, Yonsei University, Korea

*Corresponding author.

Preprint

Abstract

Amodal segmentation aims to predict complete object masks, covering both visible and occluded regions. The task poses significant challenges due to complex occlusions and extreme shape variation, from rigid furniture to highly deformable clothing. Existing one-size-fits-all approaches rely on a single model to handle all shape types, and struggle to capture and reason about diverse amodal shapes due to limited representation capacity. A natural solution is to adopt a Mixture-of-Experts (MoE) framework, assigning experts to different shape patterns. However, naively applying MoE without considering an object's underlying shape distribution can lead to mismatched expert routing and insufficient expert specialization, resulting in redundant or underutilized experts. To address these issues, we introduce ShapeMoE, a shape-specific sparse Mixture-of-Experts framework for amodal segmentation. The key idea is to learn a latent shape distribution space and dynamically route each object to a lightweight expert tailored to its shape characteristics. Specifically, ShapeMoE encodes each object into a compact Gaussian embedding that captures its key shape characteristics. A Shape-Aware Sparse Router then maps the object to the most suitable expert, enabling precise and efficient shape-aware routing. Each expert is lightweight and specializes in predicting occluded regions for specific shape patterns. ShapeMoE offers strong interpretability via its clear shape-to-expert correspondence while maintaining high capacity and efficiency. Experiments on COCOA-cls, KINS, and D2SA show that ShapeMoE consistently outperforms state-of-the-art methods, especially on occluded-region segmentation.
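
To make the Gaussian shape embedding concrete, below is a minimal PyTorch sketch of how an object's mask embedding could be encoded as a Gaussian in a latent shape space and sampled with the reparameterization trick. The module name, dimensions, and the two linear heads are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class ShapeDistributionEncoder(nn.Module):
    """Encode a mask embedding as a Gaussian (mu, log_var) over a latent
    shape space. Names and dimensions are illustrative assumptions."""
    def __init__(self, in_dim: int = 256, latent_dim: int = 64):
        super().__init__()
        self.mu_head = nn.Linear(in_dim, latent_dim)
        self.logvar_head = nn.Linear(in_dim, latent_dim)

    def forward(self, mask_embed: torch.Tensor):
        mu = self.mu_head(mask_embed)           # mean of the shape Gaussian
        log_var = self.logvar_head(mask_embed)  # log-variance, for numerical stability
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return z, mu, log_var

Predicting (mu, log_var) rather than a point embedding lets the model express uncertainty about an object's shape, which is what enables distribution-aware routing downstream.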

Motivation

Motivation and Comparison of Routing Strategies. (a) One-size-fits-all models treat all shape types equally, often producing incomplete predictions under occlusion. (b) Naive MoE approaches rely on softmax-based routing without modeling shape distributions, leading to a mismatch between samples and experts. (c) Our ShapeMoE framework encodes each shape as a Gaussian distribution in a latent space, enabling shape-aware sparse routing to specialized experts and improving segmentation of diverse amodal shapes. Best viewed in color.

The Proposed ShapeMoE Method

Given an input image and a visible mask, ShapeMoE performs amodal segmentation through the following stages. (1) The image is encoded by the image feature encoder, while the visible mask is embedded into a shape-aware mask embedding. (2) The Shape Distribution Encoder predicts the Gaussian parameters that characterize the object’s shape distribution in a learned latent space. (3) A latent shape representation is sampled, and the Shape-aware Sparse Router computes expert selection scores to route each instance to the most appropriate expert. (4) The selected expert, specialized in specific shape patterns, predicts the final high-quality amodal segmentation mask. Best viewed in color.
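
Stages (3) and (4) above can be sketched as a top-1 sparse router plus per-expert mask heads. This is a hypothetical implementation under assumed shapes (latent_dim, num_experts, 1x1-conv experts); the paper's actual gating and expert architecture may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeAwareSparseRouter(nn.Module):
    """Score experts from the sampled latent shape code z and pick one
    (top-1 sparse routing). A sketch; the paper's gate may differ."""
    def __init__(self, latent_dim: int = 64, num_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(latent_dim, num_experts)

    def forward(self, z: torch.Tensor):
        probs = F.softmax(self.gate(z), dim=-1)  # (B, num_experts) selection scores
        weight, idx = probs.max(dim=-1)          # top-1 expert per instance
        return idx, weight

class ShapeMoEHead(nn.Module):
    """Dispatch each instance's image features to its selected lightweight
    expert, which predicts amodal mask logits. Expert design is assumed."""
    def __init__(self, feat_dim: int = 256, latent_dim: int = 64, num_experts: int = 8):
        super().__init__()
        self.router = ShapeAwareSparseRouter(latent_dim, num_experts)
        # Hypothetical experts: one 1x1 conv per expert producing a mask channel.
        self.experts = nn.ModuleList(
            [nn.Conv2d(feat_dim, 1, kernel_size=1) for _ in range(num_experts)]
        )

    def forward(self, feats: torch.Tensor, z: torch.Tensor):
        # feats: (B, C, H, W) per-instance features; z: (B, latent_dim) shape codes.
        idx, weight = self.router(z)
        logits = torch.stack([
            self.experts[int(i)](f.unsqueeze(0)).squeeze(0)
            for i, f in zip(idx, feats)
        ])
        return weight.view(-1, 1, 1, 1) * logits  # gate-weighted amodal mask logits

With top-1 routing, per-instance compute stays constant as the expert pool grows, which is consistent with the lightweight, specialized experts described above.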

Qualitative Results

Qualitative results of the proposed ShapeMoE. Four representative cases are shown across various object categories, including bench, human, and horse, demonstrating ShapeMoE’s ability to handle complex occlusions and varied amodal shapes. Best viewed in color and zoomed in for details.

BibTeX


@article{li2025shapemoe,
      title={Shape Distribution Matters: Shape-specific Mixture-of-Experts for Amodal Segmentation under Diverse Occlusions},
      author={Li, Zhixuan and Liu, Yujia and Hui, Chen and Lee, Jeonghaeng and Lee, Sanghoon and Lin, Weisi},
      journal={arXiv preprint arXiv:2508.01664},
      year={2025}
}