watch: Jun Gao | ML for 3D content generation

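Source video: Jun Gao (NVIDIA): High-Quality 3D Content Generation with AI (Content Generation Series, Part 1), BAAI Conference 2023, Vision and Multimodal Large Models (original title: 英伟达高俊: AI高质量三维内容生成(内容生成系列【一】) 北京智源大会2023 视觉与多模态大模型)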

The representation of 3D objects

  • Implicit fields suit neural networks because they can be optimized with gradients.
  • Meshes render in real time, are convenient for downstream content creation, and can have good topology.
  • Marching Cubes is not fully differentiable.
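
To make "optimized with gradients" concrete, here is a minimal sketch of my own (not from the talk): the implicit field is the signed distance function (SDF) of a sphere, and its single parameter, the radius, is fitted by gradient descent to a few points assumed to lie on the surface. The names `sphere_sdf` and `fit_radius` are hypothetical, and a real system would use a neural network in place of the one-parameter sphere.

```python
import math

def sphere_sdf(p, r):
    # Signed distance from point p to a sphere of radius r centred at the origin:
    # negative inside, zero on the surface, positive outside.
    return math.sqrt(sum(c * c for c in p)) - r

def fit_radius(surface_points, r=0.2, lr=0.1, steps=100):
    # Loss: mean squared SDF value at points that should lie on the surface.
    for _ in range(steps):
        # d(loss)/dr = mean(2 * sdf * d(sdf)/dr), and d(sdf)/dr = -1.
        grad = sum(-2.0 * sphere_sdf(p, r) for p in surface_points) / len(surface_points)
        r -= lr * grad
    return r

# Toy observations: four points on the unit sphere.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0), (0.6, 0.8, 0.0)]
fitted = fit_radius(points)
```

The loss is a smooth function of the field parameter, so plain gradient descent recovers the radius; nothing comparable is available when the shape is stored only as a fixed triangle list.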

DMTet: a differentiable iso-surfacing method, so the shape is an implicit field and also a mesh.

  • Is it a field where only locations at the surface have a value?
  • Does one field correspond to exactly one mesh?
  • Differentiable rendering (diff-render)
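
A sketch of why the iso-surfacing step can be differentiable, assuming the marching-tetrahedra style rule in which a surface vertex is placed where the linearly interpolated SDF crosses zero along a tetrahedron edge. The function names are hypothetical, and this covers only the per-edge step, not the full DMTet pipeline.

```python
def edge_zero_crossing(va, vb, sa, sb):
    # va, vb: endpoints of a tetrahedron edge; sa, sb: SDF values at them.
    # A surface vertex exists only if the SDF changes sign along the edge.
    if sa * sb >= 0:
        raise ValueError("SDF does not change sign on this edge")
    # Zero of the linear interpolant between (va, sa) and (vb, sb).
    return tuple((a * sb - b * sa) / (sb - sa) for a, b in zip(va, vb))

def d_vertex_d_sa(va, vb, sa, sb):
    # Analytic derivative of the vertex position with respect to sa.
    # It is finite whenever sa != sb, so gradients from a rendered image
    # can flow through the mesh vertex back into the SDF values.
    return tuple(sb * (a - b) / (sb - sa) ** 2 for a, b in zip(va, vb))

vertex = edge_zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -1.0, 3.0)
grad = d_vertex_d_sa((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -1.0, 3.0)
```

This also suggests an answer to the first question above: the field has a value at every grid vertex, and the surface is only where those values cross zero.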

2D images supervise 3D generation

2D GAN advantages:

  1. various discriminator architectures
  2. powerful generator

GAN3D

  • The latent codes of geometry and texture are sampled from a 3D Gaussian prior.
  • 3D generator: tri-planes constitute the implicit field.
  • Extract a mesh with DMTet from the generated geometry and texture, then render it to a 2D image.
  • Use a GAN discriminator to judge whether the render is real, and backpropagate the gradient of the loss.
  • Limitation: class-label conditioned. One model can generate only one category of objects.
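
A rough sketch of how a tri-plane can act as an implicit field, under my own assumptions: each of the three axis-aligned planes stores a small grid of feature vectors, a 3D point is projected onto each plane, the features are bilinearly interpolated, and the three results are summed. Summation as the aggregation rule and the names `bilinear` and `triplane_features` are assumptions of this sketch; in the real generator the planes come from the network and a small MLP decodes the feature.

```python
def bilinear(plane, u, v):
    # plane[i][j] is a feature vector (list of floats); u, v are in [0, 1].
    res = len(plane)
    x = min(max(u, 0.0), 1.0) * (res - 1)
    y = min(max(v, 0.0), 1.0) * (res - 1)
    i = min(int(x), res - 2)
    j = min(int(y), res - 2)
    fx, fy = x - i, y - j
    corners = zip(plane[i][j], plane[i + 1][j], plane[i][j + 1], plane[i + 1][j + 1])
    return [
        (1 - fx) * (1 - fy) * c00 + fx * (1 - fy) * c10
        + (1 - fx) * fy * c01 + fx * fy * c11
        for c00, c10, c01, c11 in corners
    ]

def triplane_features(planes, p):
    # p = (x, y, z) with each coordinate in [-1, 1]; map to [0, 1] for lookup.
    x, y, z = ((c + 1.0) / 2.0 for c in p)
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    # Aggregate the three projections by summation (an assumption here).
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

# Toy 3x3 planes with 1-channel features.
toy_planes = {
    "xy": [[[float(i)] for _ in range(3)] for i in range(3)],
    "xz": [[[10.0] for _ in range(3)] for _ in range(3)],
    "yz": [[[100.0] for _ in range(3)] for _ in range(3)],
}
feature = triplane_features(toy_planes, (0.5, 0.0, 0.0))
```

Every point in the volume gets a feature from only three 2D grids, which is what lets a 2D-style generator produce a 3D field.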

Text prompts generate 3D objects

  • 2D diffusion uses a score function to encourage high-fidelity images.
  • The score function needs a full image, but NeRF is trained on batches of rays, not on full images.
  • DreamFusion can only render 64x64 images, so its geometry is low quality.
  • Coarse to fine: use Instant-NGP to generate a rough geometry guided by a low-resolution diffusion model, then use DMTet to convert the geometry to a mesh. A higher-resolution image can then be rendered, which provides a strong gradient for fine geometry.
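
A toy sketch of the score-distillation idea behind these bullets, under heavy assumptions of my own: the "renderer" is the identity (the parameters are the pixels), and the pretrained diffusion model is replaced by the exact noise predictor for a dataset containing a single target image. The update follows the form w(t) * (predicted_noise - noise) * d(image)/d(params); the noise schedule, the weighting w(t) = sigma^2, and all function names are assumptions of the sketch.

```python
import math
import random

def toy_noise_predictor(x_t, alpha, sigma, target):
    # Stand-in for a pretrained 2D diffusion model (assumption): the exact
    # noise prediction when the data distribution is the single image `target`.
    return [(xt - alpha * tg) / sigma for xt, tg in zip(x_t, target)]

def sds_gradient(image, target, rng):
    # Sample a noise level and noise the FULL rendered image.
    t = rng.uniform(0.02, 0.98)
    alpha, sigma = math.sqrt(1.0 - t), math.sqrt(t)  # toy schedule (assumption)
    eps = [rng.gauss(0.0, 1.0) for _ in image]
    x_t = [alpha * x + sigma * e for x, e in zip(image, eps)]
    eps_hat = toy_noise_predictor(x_t, alpha, sigma, target)
    w = sigma ** 2  # weighting w(t) (assumption)
    # Identity renderer, so d(image)/d(params) = 1 for every pixel.
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]

def optimise(target, steps=200, lr=1.0, seed=0):
    rng = random.Random(seed)
    image = [0.0] * len(target)
    for _ in range(steps):
        grad = sds_gradient(image, target, rng)
        image = [x - lr * g for x, g in zip(image, grad)]
    return image

# A flattened 16x16 toy "image" with values in [0, 1].
target = [((i * 37) % 64) / 63.0 for i in range(16 * 16)]
result = optimise(target)
```

The noise predictor consumes every pixel of the image at once, which is why the render resolution, and not just the ray batch size, limits how much detail the gradient can carry.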

Future work

  1. a universal model that can generate any category of objects
  2. composing objects to form a scene
  3. dynamic objects