Unified Spherical Frontend: Towards Universal Distortion-Free Lens-Agnostic Rotation-Equivariant Perception

April 2026

Authors:

Mukai Yu

Abstract:

Standard convolutional neural networks operate on planar grids and are mismatched to wide field-of-view imagery from fisheye, catadioptric, and panoramic cameras. By Gauss's Theorema Egregium, Gaussian curvature is intrinsic, so no map from the sphere S^2 to the plane can preserve distances, and every 2D projection introduces spatially varying distortion. Models trained on one projection therefore overfit to that specific lens geometry and degrade under camera changes or in-plane rotations.
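
The curvature argument above can be written out in one line. The notation is an assumption made for this illustration and does not come from the thesis: S^2_r is a sphere of radius r, K is Gaussian curvature, g is the metric, and φ is any candidate projection defined on an open patch U of the sphere. Since K is invariant under local isometries and differs between the two surfaces, no such φ can preserve the metric.

```latex
K_{S^2_r} = \frac{1}{r^2} \;\neq\; 0 = K_{\mathbb{R}^2}
\quad\Longrightarrow\quad
\nexists\, \varphi : U \subset S^2_r \to \mathbb{R}^2
\ \text{ such that }\ \varphi^{*} g_{\mathbb{R}^2} = g_{S^2_r}.
```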

We present the Unified Spherical Frontend (USF), a modular framework that eliminates these limitations. Given a calibrated camera with an arbitrary projection model, USF lifts each pixel to S^2 and resamples the signal onto a near-uniform spherical grid. The resampled signal is then processed with spatial-domain spherical convolution and pooling layers that replace their planar counterparts. Convolution kernels depend only on geodesic distance, which makes the operation SO(3)-equivariant by construction without spectral transforms or special augmentation. All geometric quantities are camera-specific constants computed once and cached, so the runtime overhead is minimal.
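
The pipeline described above (lift, resample, convolve with a distance-only kernel) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: the camera is assumed to be equirectangular, the near-uniform grid is assumed to be a Fibonacci lattice, resampling is nearest-neighbor, and the kernel is a piecewise-linear radial profile. All function and variable names (`lift_equirectangular`, `fibonacci_sphere`, `resample_nearest`, `radial_weights`, `knots_r`, `knots_w`) are hypothetical.

```python
import numpy as np

def lift_equirectangular(height, width):
    """Unit ray for each pixel centre of an equirectangular image, shape (H*W, 3).
    Assumption: any other calibrated camera model would replace this function."""
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both of shape (H, W)
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1)
    return xyz.reshape(-1, 3)

def fibonacci_sphere(n):
    """n near-uniformly spaced unit vectors (Fibonacci lattice), shape (n, 3)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i  # golden-angle increments
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=-1)

def resample_nearest(pixel_dirs, pixel_values, grid):
    """Give each grid point the value of the angularly closest pixel."""
    idx = np.argmax(grid @ pixel_dirs.T, axis=1)  # largest dot = smallest angle
    return pixel_values[idx]

def radial_weights(grid, knots_r, knots_w):
    """Dense weight matrix whose entries depend only on geodesic distance.
    The radial profile is linear between (knots_r, knots_w) and zero beyond."""
    d = np.arccos(np.clip(grid @ grid.T, -1.0, 1.0))  # great-circle distances
    w = np.interp(d, knots_r, knots_w, right=0.0)
    # Row normalization is a choice of this sketch; it compensates for the
    # slightly uneven neighbor counts on the lattice.
    return w / w.sum(axis=1, keepdims=True)

# Geometry depends only on the camera and the grid, so it is computed once.
dirs = lift_equirectangular(32, 64)
grid = fibonacci_sphere(500)
knots_r = np.array([0.0, 0.15, 0.3])   # radians
knots_w = np.array([1.0, 0.5, 0.0])    # stand-in for learned profile values
W = radial_weights(grid, knots_r, knots_w)

# Per-frame work: resample the image, then apply the cached weights.
image = np.random.default_rng(0).random(32 * 64)
signal = resample_nearest(dirs, image, grid)
out = W @ signal
```

Because `W` is built from pairwise geodesic distances alone, rotating every grid point by the same rotation leaves `W` unchanged, which is the property behind the equivariance claim. A practical implementation would store only the nonzero neighbors of each point instead of a dense matrix.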

We validate USF on three tasks spanning classification, object detection, and semantic segmentation. On Spherical MNIST, USF achieves less than 1% accuracy loss under arbitrary SO(3) rotations without rotation augmentation, whereas planar baselines suffer over 50% degradation. On PANDORA panoramic detection with a YOLOv11 backbone and on Stanford 2D-3D-S panoramic segmentation with DeepLab v3 and UNet backbones, USF matches or exceeds planar performance while generalizing zero-shot across lens types unseen during training. We further extend the spherical convolution primitive to SE(3)-equivariant volumetric convolution for 3D point cloud processing; the same distance-only kernel design transfers directly from S^2 to R^3.
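
The transfer of the distance-only kernel from S^2 to R^3 can be illustrated on a toy point cloud. The sketch below is an assumption-laden example, not the thesis's volumetric layer: it uses a dense all-pairs Euclidean distance matrix, scalar per-point features, and a piecewise-linear radial profile, and the names `radial_point_conv`, `knots_r`, and `knots_w` are hypothetical. It checks that a rigid motion of the points leaves the per-point outputs unchanged, which is what equivariance means for scalar features attached to points.

```python
import numpy as np

def radial_point_conv(points, features, knots_r, knots_w):
    """out_i = sum_j k(||x_i - x_j||) * f_j, with k a function of distance only."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                     # shape (N, N)
    weights = np.interp(dist, knots_r, knots_w, right=0.0)   # zero beyond last knot
    return weights @ features

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))
features = rng.normal(size=(200, 4))
knots_r = np.array([0.0, 0.5, 1.0])
knots_w = np.array([1.0, 0.4, 0.0])   # stand-in for learned profile values
out = radial_point_conv(points, features, knots_r, knots_w)

# Apply a random rigid motion: a proper rotation followed by a translation.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] = -q[:, 0]                # flip one axis so that det(q) = +1
moved = points @ q.T + np.array([1.0, -2.0, 0.5])
out_moved = radial_point_conv(moved, features, knots_r, knots_w)

print(np.allclose(out, out_moved))
```

Rotations and translations preserve pairwise Euclidean distances, so the weights, and therefore the outputs, agree up to floating-point error.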

Notes:

@mastersthesis{Yu-2026-88264,
author = {Mukai Yu},
title = {Unified Spherical Frontend: Towards Universal Distortion-Free Lens-Agnostic Rotation-Equivariant Perception},
year = {2026},
month = {April},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-33},
keywords = {Geometric Deep Learning, Computer Vision, Spherical Convolution, Convolution, Sampling, Classification, Detection, Segmentation},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.