2National Taiwan University
4University of Maryland, College Park
Comparisons. We show a smooth trajectory rendered by each method.
We progressively add frames and estimate their poses while optimizing the radiance fields. Local radiance fields are dynamically allocated throughout the process. We show the current frames in red and the locations of the radiance fields as blue boxes. On the right, we show the last input frame used (top) and the last rendered test view (bottom). Our method first estimates a coarse radiance field together with the poses, then runs a refinement stage that sharpens the renders. It then creates a new local radiance field and processes the next segment of the video.
We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene. To handle unknown poses, we jointly estimate the camera poses and the radiance field in a progressive manner. We show that progressive optimization significantly improves the robustness of the reconstruction. To handle large unbounded scenes, we dynamically allocate new local radiance fields, each trained with frames within a temporal window. This further improves robustness (e.g., the method performs well even under moderate pose drift) and allows us to scale to large scenes. Our extensive evaluation on the Tanks and Temples dataset and on our collected outdoor dataset, Static Hikes, shows that our approach compares favorably with the state-of-the-art.
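The allocation of local radiance fields over a temporal window can be illustrated with a small scheduling sketch. It is not the authors' implementation: it takes camera positions as given (whereas the method estimates them jointly with the radiance field), and the distance-based trigger, the `radius` and `overlap` parameters, and the names `LocalField` and `allocate_local_fields` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LocalField:
    """Hypothetical stand-in for one local radiance field."""
    center: Vec3
    frame_ids: List[int] = field(default_factory=list)


def _dist(a: Vec3, b: Vec3) -> float:
    """Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def allocate_local_fields(
    positions: Sequence[Vec3], radius: float = 1.0, overlap: int = 2
) -> List[LocalField]:
    """Assign frames to local fields as they arrive, one frame at a time.

    Assumption: a new field is started when the camera moves farther than
    `radius` from the current field's center. The last `overlap` frames of
    the previous field are shared with the new one so that neighbouring
    fields cover a common temporal window. `positions` must be non-empty.
    """
    fields = [LocalField(center=positions[0])]
    for i, p in enumerate(positions):
        current = fields[-1]
        if _dist(p, current.center) > radius:
            seed = current.frame_ids[-overlap:] if overlap > 0 else []
            current = LocalField(center=p, frame_ids=list(seed))
            fields.append(current)
        current.frame_ids.append(i)
    return fields


if __name__ == "__main__":
    # A camera moving along the x axis in steps of 0.5 units.
    path = [(0.5 * i, 0.0, 0.0) for i in range(7)]
    for f in allocate_local_fields(path, radius=1.0, overlap=1):
        print(f.center, f.frame_ids)
```

In a full pipeline, each iteration would also optimize the newly added frame's pose and the current field before moving on. The sketch only shows which frames end up in which field.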
By smoothing the camera path, we achieve much smoother stabilization than 2D methods such as FuSta.
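As a rough illustration of what smoothing a camera path can mean, the sketch below applies a centered moving average to camera positions. The choice of filter, the `window` parameter, and the function name `smooth_path` are assumptions, not the method used here, and camera rotations (which would also need smoothing) are left out.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def smooth_path(positions: Sequence[Vec3], window: int = 5) -> List[Vec3]:
    """Smooth camera positions with a centered moving average.

    The window is truncated at both ends of the sequence, so the output
    has the same number of positions as the input.
    """
    half = window // 2
    n = len(positions)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = positions[lo:hi]
        smoothed.append(
            tuple(sum(p[k] for p in segment) / len(segment) for k in range(3))
        )
    return smoothed


if __name__ == "__main__":
    # A shaky path that zigzags in y while advancing in x.
    shaky = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0),
             (3.0, 1.0, 0.0), (4.0, 0.0, 0.0)]
    for p in smooth_path(shaky, window=3):
        print(p)
```

Novel views would then be rendered from the smoothed poses rather than the original ones.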
Ablation. Progressive optimization is crucial to pose estimation, and local radiance fields grant more robustness and improve sharpness in later parts of the sequences.