We're heading into the full swing of the year, with new radiance field platforms coming out. Prism AI recently launched, built on Volinga's newly announced Desktop Suite.
Niantic-owned Scaniverse announced Gaussian Splatting support! Even better, it trains entirely on device, meaning you can do full Radiance Field processing from your iPhone. It's quite fast too, taking about two minutes from capture, and I didn't find it to drain my battery much either.
Radiance Field companies Luma AI and CSM were both named to CB Insights' AI 100 list for 2024, alongside industry giants like OpenAI, Perplexity, and ElevenLabs.
Research:
On the research side, there have been some really awesome papers! RadSplat merges NeRFs and Gaussians to get 900 fps with the fidelity of NeRFs. Implicit Neural Point Clouds (INPC) pushes the reconstruction state of the art a little higher, with stronger fine-detail reconstruction.
SuGaR, which enables pulling high-quality meshes from 3DGS, received its follow-up work, Gaussian Frosting, which extends its capabilities to editing and animation.
GTC
Now for this newsletter's deep dive: we're wrapping up NVIDIA's GTC. It was routinely said that this year's GTC was AI's Woodstock, though with decidedly fewer drugs. That, of course, is only true if you don't count caffeine, which I am most definitely addicted to.
Perhaps it's because I was on the lookout for them, but I kept running into Radiance Fields at GTC.
I myself gave two talks, one about the commercial use cases of radiance fields and a second about the creative ones. Quite a few more people came than I was expecting, and if you were in the audience, thank you so much for attending! The breadth of industries that asked me questions about their use cases was both fascinating and encouraging for the role Radiance Fields will play in the near future.
We also had VR demos, courtesy of Gracia VR. The scenes that were shown are now available online; my favorite is the Hidden Garden! There were also text-to-3D demos and an interactive demo of Google's SMERF.
Elsewhere on the exhibition floor, there were several other companies displaying Radiance Fields.
Both Volinga and CSM had booths and were running demos. Shutterstock also had a presence, showcasing their officially licensed Generative AI text-to-3D method, built in collaboration with NVIDIA's Edify.
Leia showed off Gaussian Splatting on their glasses-free 3D displays, and Looking Glass also showcased a demo of NVIDIA's Real Time Radiance Fields!
It seemed like Radiance Fields were everywhere. Beginning April 8, GTC sessions will be available to the general public through NVIDIA On-Demand, and there were plenty to choose from:
Transforming 2D Imagery into 3D Geospatial Tiles With Neural Radiance Fields [S62490]
Discovering New Opportunities Through Digital Twins [SE62780]
Using Headset-Free Displays to Render Next-Generation Gaussian Splats with Leia [XRS63117]
AI for Learning Photorealistic 3D Digital Humans from In-the-Wild Data [S62511]
Dive Deep into Real-Time Neural Rendering on NVIDIA GPUs [S62046]
Bringing the Metaverse to the Next Billion Users via Codec Avatars [S63211]
Interview with Martin Sawtell, XR Director at Dell
I was also able to speak with Martin Sawtell, XR Director at Dell, about his GTC experience.
How was your GTC?
It was great – someone said it was like "Woodstock for nerds" and I think that sums it up. There was definitely a buzz in the air, I would even say that they'll need a bigger conference center next year. There was a strong sense of big things to come, and I spent most of it connecting with people on the floor in person.
What was your session about and what did it cover?
I presented with Max Andrews about novel workflows with GenAI, enabled by Precision workstations. We showed that you could quickly flesh out a scene by bringing a Splat in and adding basic geometry. The key part is that you can then feed that to a GenAI system like Stable Diffusion to reinterpret and present it as a final upscaled image.
One of the other things I showed was my experiments with representing myself in different ways, as a MetaHuman or a real-time "StreamDiffusion" version, where I could add a LoRA model to further customize my appearance. I trained it on webcam footage of myself, but also on the MetaHuman with a face swap added.
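For readers who want to try the kind of GenAI reinterpretation pass Martin describes, here is a minimal sketch, assuming the Hugging Face diffusers library and Stable Diffusion's img2img mode. The file names and prompt are placeholders of mine, not Dell's actual pipeline.

```python
# Hypothetical sketch of the workflow described above: take a rough viewport
# render (a Splat plus blockout geometry) and let Stable Diffusion
# reinterpret it via img2img. Assumes the Hugging Face `diffusers` library;
# file names and the prompt are placeholders, not Dell's actual pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A screenshot of the rough 3D scene (hypothetical file name).
init_image = Image.open("rough_scene_render.png").convert("RGB").resize((768, 512))

# `strength` controls how far the model may drift from the input render:
# lower values preserve the layout, higher values reinterpret more freely.
result = pipe(
    prompt="a cozy reading nook, cinematic lighting, photorealistic",
    image=init_image,
    strength=0.55,
    guidance_scale=7.5,
).images[0]

result.save("reinterpreted_scene.png")
```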
How have you been exploring the use of Gaussian Splatting?
I see it as a new way to capture a space, storing and recalling that data and then rendering it. What's amazing is that it's so flexible and efficient, and it maintains a natural, photo-like quality. It has an amazing sense of presence.
At Dell's CSG CTO office, we're interested in how these new opportunities shape the future of compute and what users will expect from it. Where does the data get processed? How? What's needed for that? How can we build the tools that people will need to do this? These are the questions on our minds.
Where do you see the technology being used?
The most obvious one is the capture and recreation of anything from spaces and objects to people. I'll give you an example: before I moved furniture into a room, I scanned it with my phone, and whilst shopping for furniture it was easy to measure things up. I even scanned a couch, dropped it into Unity, and could show my partner in VR that no, we can't fit two of them here, so it solves marital disputes! That was a few years ago, and now we have Splats as a new way of quickly bringing objects into scenes.
As for other areas, I think deformable Splats and related innovations have the potential to change the way we think about self-representation. Over the next few years, a subset of people will start to adopt this as a common way to show up on calls. You can choose how you're dressed, relight yourself and be in any environment you want. This might seem frivolous but having that kind of control over your appearance can make a late night work call a lot easier to deal with.
Another point to remember is that having a natively 3D dataset affords us the opportunity to use more immersive display devices. That could be a 2D screen with control over your viewing location, or it could be a 3D screen, a VR headset, or AR glasses. It's great that we have the USD data format for polygonal geometry, shaders, etc., but we will need to continue pushing for more open standards to facilitate the uptake of these devices and the interoperability needed to not just display the data, but share it too.
Where do you see splatting, and AI generally, intersecting with XR?
AI intersects quite well with XR. In fact, many XR features are rooted in AI, and as new algorithms and services surface, they're being introduced as well, which is incredibly exciting. What I've already described are several ways to rethink content creation, representation, and generation. Let me highlight a few additional areas that come to mind…
We've been over the hype curve for the Metaverse, but if you take a closer look, there are communities that are alive and well, and their appeal comes from user-created content. People go there to connect and share, and Splats are an amazing tool to facilitate this. Imagine if you could bring your global friends to a place that's special to you, visit theirs, or see a historical place where a Splat was created from file footage.
Instead of downloading a mesh, textures, and shaders, we pull down a packaged Splat. So far the rendering has been performant, and I look forward to seeing how open standards, NPUs, and the like contribute to the overall experience, along with accessibility.
There’s also the potential interaction with language models and the future of immersive technology. I can foresee situations where you'd want to have a moment to focus on work in a mixed reality headset, and a copilot assistant could bring things to you spatially, driven by the context of who you are, what you're doing, in the physical place you are in. Or, perhaps you'd wave it away with your hand and only ask for important alerts. This productivity scenario is fairly obvious. But what if you wanted to go somewhere completely different? You could verbally say "I want to sit on a couch in a forest and watch Netflix at night, and invite my friends." It could pull in a Splat that matches that, or more likely use the heuristic input to collect examples of objects that would fit the setting and generate a new scene based on that. Just for you, right then and there.
Of course, AI-generated content might be an unlimited resource. But there's an aspect to that story where the human need to attach meaning to an instance of space, to have one's own space or to visit another's, means that there will always be a place for real-world things and human-built content. The door to that future is open and cannot be closed; how we use Splats and other tools to build it in a responsible way is the human question to be asked today.
Martin will also be speaking at HarvardXR this Saturday!
I also ran into Antimatter15, whose Gaussian Splatting viewers have quickly been adopted as the standard within the industry.
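If you've ever wondered what's inside the compact .splat files his viewer popularized, here is a minimal reader sketch. The 32-byte-per-splat layout is my reading of the antimatter15/splat repo, so treat it as an assumption and verify against the repo before relying on it.

```python
# Sketch of a reader for the compact .splat format popularized by
# antimatter15's viewer. The layout is assumed from the antimatter15/splat
# repo: 32 bytes per splat -- position (3x float32), scale (3x float32),
# color (4x uint8 RGBA), rotation (4x uint8 quaternion, mapped to [-1, 1]).
import numpy as np

def read_splat(path: str):
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, 32)
    floats = raw[:, :24].copy().view(np.float32)     # (N, 6) after the view
    positions = floats[:, 0:3]                       # xyz centers
    scales = floats[:, 3:6]                          # per-axis Gaussian scales
    colors = raw[:, 24:28].astype(np.float32) / 255  # RGBA in [0, 1]
    rotations = (raw[:, 28:32].astype(np.float32) - 128) / 128  # quaternion
    return positions, scales, colors, rotations

if __name__ == "__main__":
    pos, scale, rgba, quat = read_splat("scene.splat")  # hypothetical file
    print(f"{len(pos)} splats, extent {pos.min(0)} to {pos.max(0)}")
```

One nice property of a fixed-stride binary layout like this is that the whole file can be memory-mapped and sorted or streamed without any parsing pass, which is part of why these viewers load so quickly.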
At the outset of GTC, XGRIDS announced that their LiDAR scanners are now capable of creating Gaussian Splats. One of their employees, Mindy, was kind enough to drive all the way to San Jose to give me a demo. We captured the exterior of the convention center in roughly five minutes.
I finished up my time at GTC recording an episode about Radiance Fields for the NVIDIA AI Podcast. No word yet on when it will be released, but I will keep you updated. All in all, it was a hectic week, filled with a lot of excitement for the radiance field community.
Unfortunately, I ran out of time in the Bay Area and couldn’t make it to GDC. What’d I miss?
The Beginning of NeRF
While GTC was underway, across the world Ben Mildenhall, the first author of NeRF, gave the keynote address at 3DV. The entire talk was posted on YouTube, and Ben speaks for roughly the first 70 minutes.
He offers the strongest glimpse to date into the initial moments of the founding of NeRF, showcasing the Slack messages between him, Pratul Srinivasan, Matt Tancik, and Jon Barron. Pretty crazy!
Ben also touches on some of his more recent work in the generative space, including ReconFusion. The question remains: what in the world has Ben been up to since leaving Google?
Code releases:
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
Gaussian SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
That's it for now. Stay tuned for some extraordinarily large changes to the website. Since you are a subscriber, you're the very first to know that I have been redesigning the website from the ground up over the last few months, and we are so close to finishing! The changes should go live over the next two days, and I hope you like them!
As always, if there is something you hate, please tell me! My only goal is to create the best and most helpful content for you, so if something is not working, never hesitate to give tough feedback!
If you enjoyed today's newsletter, please consider sharing it with someone who might find it useful!