May 2024 and Gauzilla Pro
May was a well-rounded month for Radiance Fields with platform announcements, updates, new research papers, and additional industry jobs.
Platform Updates
GPU-accelerated NeRF training on Apple Silicon has arrived with Lifecast’s Volurama, and Scaniverse has launched on Android. Google CloudNeRF launched with a tutorial from yours truly! Use this link to get $350 in GCP credits.
New ecommerce platform Doly launches!
Gracia VR launched V1.0 on the Steam and SideQuest stores! They’ve also shipped standalone mode for the Quest 3, and visionOS support is coming soon.
Jonathan Stephens has been exploring 2DGS and publishing his results to LinkedIn, with a tutorial coming in the near future. PlayCanvas has implemented Gaussian Splatting in their Editor!
Research:
On the research side, there have been some really awesome papers!
CAT3D from Google drops the latest SOTA for generative 3D, and while it’s a bit out of reach for consumers for now, it gives a glimpse of what’s coming. Google was busy this month, also releasing NeRF-Casting with significantly improved reflections. SuperGaussian greatly steps up scaling capabilities for 3DGS. RaDe-GS brings even better surface reconstruction than 2DGS.
Who’s Hiring:
The Job Board on Radiancefields.com is officially live. If you have an opening, get in touch!
Arcturus Industries is looking for a Senior Computer Vision / DL / SLAM Engineer in AR/VR. Apply now!
Tesla is hiring a Machine Learning Engineer, 3D Computer Vision, Self-Driving
Microsoft is looking for a Senior Researcher!
Graswald is hiring a Senior Computer Graphics Engineer (Frontend)
Interview with Yoshi Saito of Gauzilla Pro
I spoke with the creator of Gauzilla Pro, Yoshiharu (Josh) Saito about his platform!
Can you tell us a little about your background and what inspired you to begin building Gauzilla? What is Gauzilla?
Yoshi: Back when I was in high school (almost 20 years ago - time flies!) I developed a 3D graphics engine in C++ from scratch using the OpenGL API. I went so far as to implement a per-pixel voxel raycaster on NV40 (a pre-CUDA GeForce architecture). But a few years later my main interest shifted to financial markets (quantitative/algorithmic trading in particular), where in subsequent years I worked primarily as a “quant” applying AI/ML at hedge funds and trading firms. Around 2021-2022 a series of advancements in NeRF (eg. AutoInt, Instant-NGP, KiloNeRF) revived my interest in 3D graphics, and 3D Gaussian Splatting or 3DGS, released in 2023, blew me away when it came out, which made me decide to start developing Gauzilla.
Gauzilla Pro is a fully web-based, no-code, and AI-powered 3DGS editing platform that’s currently in active development. I decided to develop a 3DGS editor because, although there were many 3DGS training apps/solutions, 3DGS editing apps were almost non-existent. There is also Gauzilla Basic, which is an open-source and free-to-use 3DGS web viewer.
Gauzilla is written in the Rust programming language, which assures memory and thread safety while achieving high execution performance comparable to C/C++ (nowadays even the White House recommends using memory-safe languages). It runs as a secure, multi-threaded WebAssembly (or WASM, a portable executable binary format developed by Mozilla, Google, Microsoft, etc) module in a web browser, utilizing WebGL2 for rendering and WASM for serverless edge-AI.
How does Gaussian Splatting allow for new uses and offerings that traditional 3D methods like photogrammetry do not?
Yoshi: Conventional 3D sensing technologies (eg. photogrammetry) let you reconstruct real-world scenes as textured meshes or colored point clouds. Gaussian Splatting, on the other hand, uses 3D Gaussian ellipsoids (or 2D Gaussian ellipses in the case of 2DGS) as a rendering primitive, each of which has a unique color and opacity, thereby allowing for rendering transparent materials (eg. tree leaves, glass) and soft/fluffy objects (eg. animals, clothes) in high fidelity. Gaussian Splatting also has an advantage in model generation - photogrammetry usually requires a large number of high-quality photos (except, perhaps, for object-centric small-scale scenes), but 3DGS can be used for reconstructing both small/detailed scenes as well as large-scale scenes from relatively short videos. Moreover, it can be trained using blurry and/or low-res videos taken with cheap consumer devices like smartphones.
3DGS is kind of a sweet spot between NeRF and conventional triangle meshes. The former lets you do high-fidelity 3D reconstruction for novel view synthesis (NVS). Its underlying neural radiance field is great for implicit/differentiable volume rendering but inefficient for rendering empty spaces or off-surface volumes, hence not suited for real-time rendering by default. Triangle meshes (and the rasterization thereof) are highly efficient for real-time rendering in general (and for editing as well because they are explicit) but not ideal for differentiable rendering because rasterization is not straightforwardly differentiable. Gaussian Splatting is ground-breaking because it is 1) explicit (i.e., it has Gaussian primitives), 2) efficient (it is rasterizable on the GPU), and 3) differentiable for ML training.
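Those three properties show up in miniature in the blending step itself. Below is a rough sketch in plain Rust (illustrative only, not Gauzilla’s actual code) of how depth-sorted splat samples composite into a single pixel via front-to-back alpha blending; every operation is a multiply or add, so it is cheap to rasterize and straightforward to differentiate with respect to each splat’s color and opacity. The `SplatSample` struct and all names are assumptions for illustration.

```rust
/// A splat's contribution at one pixel: its color and the Gaussian-falloff
/// opacity (opacity * exp(-0.5 * d^T Sigma^-1 d)), assumed already evaluated.
struct SplatSample {
    color: [f32; 3],
    alpha: f32,
}

/// Front-to-back "over" compositing: samples must arrive sorted by depth,
/// nearest first. Transmittance starts at 1 and shrinks as splats occlude.
fn composite(samples: &[SplatSample]) -> [f32; 3] {
    let mut pixel = [0.0f32; 3];
    let mut transmittance = 1.0f32;
    for s in samples {
        let w = transmittance * s.alpha;
        for c in 0..3 {
            pixel[c] += w * s.color[c];
        }
        transmittance *= 1.0 - s.alpha;
        if transmittance < 1e-4 {
            break; // early termination: pixel is effectively opaque
        }
    }
    pixel
}

fn main() {
    // A half-transparent red splat in front of an opaque green splat.
    let samples = [
        SplatSample { color: [1.0, 0.0, 0.0], alpha: 0.5 },
        SplatSample { color: [0.0, 1.0, 0.0], alpha: 1.0 },
    ];
    println!("{:?}", composite(&samples)); // [0.5, 0.5, 0.0]
}
```

Because the whole chain is smooth in each splat’s parameters, gradients from a photometric loss can flow back to every Gaussian during training, which is exactly what makes 3DGS trainable yet still rasterizable.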
I should mention that photogrammetry can be used in 3DGS workflows. For instance, you can use RealityCapture to accurately align and export camera poses in the COLMAP format, then train 3DGS based off of it for large-scale 3D reconstruction.
What were the main challenges when you were building and how did you solve them?
Yoshi: Since I have an engineering background in 3D (eg. GPU volume rendering) and in AI/ML (eg. transformer neural networks, deep reinforcement learning), understanding the theoretical/mathematical aspects of 3DGS (how it’s trained from imagery and rendered in real time on the GPU) was not too difficult. The main challenge so far was to achieve high-performance & platform-agnostic 3DGS rendering on the web.
Yoshi: 3DGS rendering requires dynamically sorting the 3D Gaussians (usually 1–3 million Gaussians per scene) by depth from the camera in each frame, since it utilizes alpha blending for efficient surface-to-pixel forward rendering on the GPU. The original 3DGS implementation uses fast GPU-based radix sort for this, but implementing that in web apps requires WebGPU, which has still not been widely adopted by web browsers. So I decided to sort Gaussians on the CPU in WebAssembly using multithreading, since all CPUs nowadays are multi-core. It turned out multithreading was not straightforward in WebAssembly - I had to use what’s called a lock-free mechanism to bypass some limitations. Fortunately, it has been a common ultra-low-latency (ULL) programming technique in the field of high-frequency trading (HFT) in finance, a field where I had worked previously (in fact, HFT was the research topic of my Master’s degree dissertation).
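To make the sorting step concrete, here is a minimal single-threaded sketch in Rust (the language Gauzilla is written in). It orders splat centers nearest-first by their depth along the camera’s view direction, the ordering that front-to-back alpha blending needs. Gauzilla’s actual implementation is multithreaded and lock-free, which this deliberately omits; all names are illustrative assumptions.

```rust
/// Depth of a point along the camera's view direction:
/// dot(p - cam_pos, view_dir), with view_dir assumed unit length.
fn view_depth(p: [f32; 3], cam_pos: [f32; 3], view_dir: [f32; 3]) -> f32 {
    (0..3).map(|i| (p[i] - cam_pos[i]) * view_dir[i]).sum()
}

/// Returns splat indices ordered nearest-first (ascending depth),
/// ready for front-to-back alpha blending.
fn sort_front_to_back(
    centers: &[[f32; 3]],
    cam_pos: [f32; 3],
    view_dir: [f32; 3],
) -> Vec<usize> {
    let mut order: Vec<usize> = (0..centers.len()).collect();
    // Sort indices rather than the splat data itself, so the bulky
    // per-Gaussian attributes never have to move in memory.
    order.sort_unstable_by(|&a, &b| {
        let da = view_depth(centers[a], cam_pos, view_dir);
        let db = view_depth(centers[b], cam_pos, view_dir);
        da.partial_cmp(&db).unwrap()
    });
    order
}

fn main() {
    let centers = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0], [0.0, 0.0, 2.0]];
    let order = sort_front_to_back(&centers, [0.0; 3], [0.0, 0.0, 1.0]);
    println!("{:?}", order); // [0, 2, 1]: nearest splat first
}
```

A comparison sort like this is O(n log n); the GPU radix sort the original 3DGS implementation uses is O(n), which is one reason exploiting every CPU core matters when sorting millions of Gaussians per frame in WASM.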
Amazingly, Gauzilla is fast and easy to render in the browser. What hardware would a customer need to get up and running with your platform? How easy is it for a customer like Skender to integrate it into their existing workflow?
Yoshi: Gauzilla is still in active development and there’s a lot of room for performance optimization. At this point in time the recommended platform to run Gauzilla is a desktop/laptop with a decent GPU (eg. RTX 20-series or newer; AMD GPUs are also supported). Mobile platforms like smartphones are not the main target since Gauzilla is primarily an editing tool for developers, but this could change in the future.
Gauzilla users can load their pre-trained 3DGS models (.ply files) on a web browser without having to upload the data first to an external server (in other words, users have 100% data privacy). You can train 3DGS models beforehand either locally or online using various existing solutions out there from various data sources (smartphones, drones, 360 cameras, LiDAR devices). On Gauzilla, you can segment individual objects in the scene using the state-of-the-art vision transformer (ViT) AI by simply clicking the object from a few different viewing angles. You can then add an annotation to the object (name, scale/size, price, manufacturer, URL link, etc). You can also move, clone, erase, or export the object. In addition, you can create a “4D time-lapse” of the scene by combining multiple .ply files without re-training each model.
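As a small illustration of the import side of that workflow: pre-trained 3DGS models are distributed as .ply files whose ASCII header declares one `vertex` element per Gaussian. The sketch below is a hypothetical helper (not Gauzilla’s API) that pulls the Gaussian count out of such a header before the binary payload is parsed.

```rust
/// Hypothetical helper (not Gauzilla's API): read the number of Gaussians
/// from a 3DGS .ply header. Binary 3DGS exports begin with an ASCII header
/// containing a line like "element vertex N", one vertex per Gaussian.
fn gaussian_count(header: &str) -> Option<usize> {
    header
        .lines()
        .find_map(|line| line.strip_prefix("element vertex "))
        .and_then(|n| n.trim().parse().ok())
}

fn main() {
    let header = "ply\n\
                  format binary_little_endian 1.0\n\
                  element vertex 1234567\n\
                  property float x\n\
                  end_header";
    println!("{:?}", gaussian_count(header)); // Some(1234567)
}
```

Reading the header in the browser like this is also what makes the no-upload design possible: the file never has to leave the user’s machine to be understood.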
There's a wide range of potential industries that can utilize radiance field based methods from Ecommerce, to AEC, GIS, and more. What do you think about the world we're entering and the influx of stronger service offerings that are becoming available?
Yoshi: Radiance field based methods like 3DGS are a cost-effective way to create high-fidelity 3D digital twins, whether they be retail products in e-commerce or real-estate properties in AEC. 3D digital twins allow businesses to enhance their customer experiences. By creating virtual representations of products or services, companies are able to provide customers with interactive and engaging experiences, allowing them to visualize and customize products before purchasing. This not only improves the overall customer experience but also increases sales and customer loyalty. 3D digital twins also allow companies to gain deep insights into how their physical assets are performing in near real-time (eg. visualizing the progression of construction sites every day/week). This helps businesses optimize their operations and maintenance processes, leading to improved efficiency and cost savings (i.e., “kaizen”).
As I mentioned earlier, there are already a number of 3DGS training solutions (both offline and online) but 3DGS editing apps are still extremely scarce. To the best of my knowledge Gauzilla Pro is currently the only install-free and retraining-free web solution that allows users to dynamically segment and manipulate objects (including importing/exporting) across different 3DGS scenes. It is also currently the only solution where users can easily create a 4D time-lapse from multiple 3DGS models captured at different points in time at the same scene.
What's next on the roadmap for Gauzilla Pro?
Yoshi: There are many new features that are in the development pipeline of Gauzilla Pro. One of my main goals is to support WebGPU for 1) faster Gaussian sorting and for 2) faster AI inferencing. However, for maximum compatibility this would be implemented only after all the major web browsers start supporting WebGPU by default.
As you know there has been a Cambrian explosion of 3DGS applications and extensions since its advent in the last year. A new research paper extending 3DGS gets released on almost a daily basis. As such, there is a huge gap between the state-of-the-art research and widely available 3DGS solutions. I hope to fill this gap with Gauzilla Pro.
How can people get involved with or gain access to Gauzilla Pro?
Yoshi: Gauzilla Basic is open-source on GitHub, so if you are a software engineer I’d highly recommend checking out the source code. PRs (pull requests) fixing bugs or adding features would be appreciated. For Gauzilla Pro, you can visit www.gauzilla.xyz and apply for early access by registering on the waiting list.
Community Captures
What are some of the best captures you’ve created or seen lately? Send them to me to be featured!
https://x.com/vibrantnebula/status/1793311343811956794
Code releases:
StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field
Toon3D: Seeing Cartoons from a New Perspective
SMERF: Streamable Memory Efficient Radiance Fields
How Far Can We Compress Instant-NGP Based NeRF?
LP-3DGS: Learning to Prune 3D Gaussian Splatting
As always, if there is something you hate, please tell me! My only goal is to create the best and most helpful content for you, so if something is not working, never hesitate to give tough feedback!
If you enjoyed today’s newsletter, please consider sharing it with someone who might find it useful, or subscribing!