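Source video: Jun Gao (NVIDIA): High-Quality AI 3D Content Generation (Content Generation Series, Part 1), BAAI Conference 2023 (Beijing), Vision and Multimodal Large Models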
The representation of 3D objects
- Implicit fields suit neural networks, because they can be optimized by gradient descent.
- Meshes render in real time, are convenient for downstream content creation, and can have good topology.
- Marching Cubes is not fully differentiable.
DMTet: a differentiable iso-surfacing method, so the shape is both an implicit field and a mesh.
- Is it a field where only locations on the surface have a value?
- Does a field correspond to only one mesh?
- Differentiable rendering (diff-render)
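To make the idea of differentiable iso-surfacing concrete, here is a minimal sketch of the step that places a mesh vertex on an edge whose endpoints have signed distances of opposite sign. The vertex is a linear interpolation of the two endpoints, so it is differentiable with respect to both the endpoint positions and the field values. The function names (`sphere_sdf`, `edge_zero_crossing`) and the sphere example are illustrative assumptions, not code from the talk.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin
    (negative inside, positive outside). Stand-in for a learned field."""
    return math.sqrt(sum(c * c for c in p)) - radius


def edge_zero_crossing(pa, pb, sa, sb):
    """Surface point on the edge (pa, pb), given signed distances sa, sb.

    Linear interpolation of the zero crossing: a smooth function of
    pa, pb, sa and sb, which is what lets gradients flow to the field.
    """
    if sa * sb >= 0:
        raise ValueError("edge does not cross the surface")
    return tuple((a * sb - b * sa) / (sb - sa) for a, b in zip(pa, pb))


# One edge that straddles the unit sphere along the x axis.
pa, pb = (0.5, 0.0, 0.0), (1.5, 0.0, 0.0)
sa, sb = sphere_sdf(pa), sphere_sdf(pb)  # -0.5 and 0.5
vertex = edge_zero_crossing(pa, pb, sa, sb)
print(vertex)  # (1.0, 0.0, 0.0)
```

In a full pipeline this would run on every sign-changing edge of a tetrahedral grid, and the resulting vertices would be connected into triangles; that part is omitted here.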
2D images supervise 3D generation
2D GAN advantages:
- a wide variety of discriminator architectures
- powerful generators
GAN3D
- The latent codes for geometry and texture are sampled from a 3D Gaussian prior.
- 3D generator: tri-planes constitute the implicit field.
- Extract a mesh with DMTet from the generated geometry and texture, then render it to a 2D image.
- Use a GAN discriminator to judge whether the render is real, and backpropagate the gradient of the loss.
- Limitation: class-label conditioned. One model can generate only one category of objects.
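As a rough illustration of how a tri-plane can act as an implicit field, the sketch below queries a 3D point by projecting it onto three axis-aligned feature planes, sampling each bilinearly, and summing the results. Several things here are simplifying assumptions rather than details from the talk: features are scalars instead of vectors, coordinates lie in [0, 1], the three samples are combined by summation, and the MLP that would normally decode the feature is left out. All names are hypothetical.

```python
def bilinear(plane, u, v):
    """Bilinearly sample a 2D grid (list of rows) at u, v in [0, 1].

    plane[row][col] is indexed as plane[v][u]; the grid must be at least 2x2.
    """
    h, w = len(plane), len(plane[0])
    x, y = u * (w - 1), v * (h - 1)
    # Clamp the cell index so that u == 1 or v == 1 stays inside the grid.
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x0 + 1] * fx
    bottom = plane[y0 + 1][x0] * (1 - fx) + plane[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def triplane_feature(planes, p):
    """Feature of 3D point p: sum of samples from the xy, xz and yz planes."""
    x, y, z = p
    return (
        bilinear(planes["xy"], x, y)
        + bilinear(planes["xz"], x, z)
        + bilinear(planes["yz"], y, z)
    )


# Tiny 2x2 planes; in a generator these would be network outputs.
planes = {
    "xy": [[0.0, 1.0], [2.0, 3.0]],
    "xz": [[0.0, 0.0], [0.0, 0.0]],
    "yz": [[1.0, 1.0], [1.0, 1.0]],
}
feature = triplane_feature(planes, (0.5, 0.5, 0.5))
print(feature)  # 2.5
```

The appeal of this layout is that the generator only has to produce 2D feature maps, which 2D generator architectures already do well, while any 3D location can still be queried.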
Text prompts generate 3D objects
- 2D diffusion uses a score function to encourage high-fidelity images.
- The score function needs a full image, but NeRFs are trained on batches of rays, not full images.
- DreamFusion can only render 64x64 images, so its geometry is low quality.
- Coarse to fine: use Instant-NGP to generate a rough geometry guided by a low-resolution diffusion model, then use DMTet to convert the geometry to a mesh. A higher-resolution image can then be rendered, which provides a strong gradient for fine geometry.
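For reference, the way a 2D diffusion model's score guides a 3D representation is usually written as the score distillation gradient from the DreamFusion paper. The notation below follows that paper rather than the talk: $\theta$ are the 3D parameters, $x = g(\theta)$ is the rendered image, $y$ is the text prompt, $\hat{\epsilon}_\phi$ is the pretrained noise predictor, and $w(t)$ is a timestep weighting.

```latex
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

The factor $\partial x / \partial \theta$ is the renderer's Jacobian, which is why the whole image $x$ has to be rendered and why a higher rendering resolution gives the geometry a more detailed gradient signal.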
Future work
- A universal model that can generate any category of objects.
- Composing objects to form a scene.
- Dynamic objects.