Presented By: Michigan Robotics
Leveraging 3D Geometry for Place Recognition and Image Restoration in Visual and Thermal SLAM
Robotics PhD Defense, Spencer Carmichael
Committee Chair: Katherine Skinner
Abstract:
Simultaneous Localization and Mapping (SLAM) is the task of progressively mapping a sensor platform's surroundings while also tracking its pose (position and orientation) within that map. SLAM enables robot navigation in unseen, unstructured, and GPS-denied environments and supports virtual and augmented reality applications without requiring external sensors or fiducials. Due to the low cost, weight, and power consumption, small size, and high spatial resolution of visible spectrum cameras, visual SLAM is especially well-studied. However, visual SLAM is challenged by image degradation arising from poor illumination and visual obscurants such as rain, fog, snow, dust, and smoke. Thermal cameras offer an alternative that can operate in complete darkness and see through these obscurants, but only relatively low-quality uncooled microbolometer thermal cameras are practical in most applications. Due to the physics of microbolometer image formation, SLAM with these cameras must often contend with significant motion blur, rolling shutter distortions, and noise. Additionally, all forms of SLAM are fundamentally susceptible to pose drift, which is error accumulated by concatenating relative pose estimates. The problems of pose drift and image degradation can be mitigated with place recognition and image restoration, respectively. In recent years, both tasks have commonly been performed with end-to-end deep learning approaches that process only one image or a small set of images at a time. However, these methods fail to exploit the full extent of information available in the context of SLAM and are limited in their ability to generalize to scenarios unrepresented in their training data.
This thesis introduces techniques for place recognition and image restoration that leverage 3D geometry that is either freely available as a byproduct of SLAM or can be jointly estimated. By utilizing estimated 3D geometry, these methods implicitly exploit wide-baseline multi-view information and, if available, data from multiple sensors. First, an autonomous vehicle dataset is introduced featuring challenging conditions, including adverse illumination and opposing viewpoints, captured by stereo RGB, monochrome, event, and thermal cameras. Next, a place recognition algorithm for similar and opposing viewpoints is introduced that forms descriptors from point clouds estimated through stereo visual odometry. The approach is evaluated on the new dataset and achieves high performance in many conditions, except poorly lit areas at night where the surrounding structure cannot be perceived. This highlights a potential use case for thermal SLAM. Then, towards robust thermal SLAM, an approach for thermal image restoration is proposed that combines the unique image formation model of microbolometer thermal cameras with Neural Radiance Fields (NeRFs). Finally, the rendering pipeline introduced for thermal image restoration is adapted to 3D Gaussian Splatting and integrated into a thermal-inertial SLAM system uniquely capable of operating under severe thermal image degradation. Overall, the methods proposed in this thesis demonstrate the potential of leveraging 3D geometry in all components of a SLAM system and contribute to making SLAM reliable in challenging conditions and large-scale environments.
Zoom passcode: slam