Sunday, November 20, 2022

Magic3D: High-Resolution Text-to-3D Content Creation

Magic3D creates high-quality textured 3D mesh models from input text prompts. It uses a coarse-to-fine strategy that leverages both low- and high-resolution diffusion priors to learn the 3D representation of the target content. Magic3D synthesizes 3D content with 8× higher-resolution supervision than DreamFusion while also being 2× faster.

[...] indicates helper captions added to improve quality, e.g. "A DSLR photo of".

A beautiful dress made out of garbage bags, on a mannequin. Studio lighting, high quality, high resolution.

A blue poison-dart frog sitting on a water lily.

[...] a car made out of sushi.

[...] a bagel filled with cream cheese and lox.

[...] an ice cream sundae.

[...] a peacock on a surfboard.

[...] a plate piled high with chocolate chip cookies.

[...] Neuschwanstein Castle, aerial view.

[...] the Imperial State Crown of England.

[...] the leaning tower of Pisa, aerial view.

A ripe strawberry.

A silver platter piled high with fruits.

[...] a silver candelabra sitting on a red velvet tablecloth, only one candle is lit.

[...] Sydney opera house, aerial view.

Michelangelo style statue of an astronaut.

Given a coarse model generated with a base text prompt, we can modify parts of the text in the prompt, and then fine-tune the NeRF and 3D mesh models to obtain an edited high-resolution 3D mesh.

A squirrel wearing a leather jacket riding a motorcycle.

A bunny riding a scooter.

A fairy riding a bike.

A steampunk squirrel riding a horse.

A baby bunny sitting on top of a stack of pancakes.

A lego bunny sitting on top of a stack of books.

A metal bunny sitting on top of a stack of broccoli.

A metal bunny sitting on top of a stack of chocolate cookies.

We use a two-stage coarse-to-fine optimization framework for fast, high-quality text-to-3D content creation. In the first stage, we obtain a coarse model using a low-resolution diffusion prior, and we accelerate this optimization with a hash grid and a sparse acceleration structure. In the second stage, we use a textured mesh model initialized from the coarse neural representation, which allows optimization with an efficient differentiable renderer interacting with a high-resolution latent diffusion model.
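
The coarse-to-fine idea can be illustrated with a deliberately tiny toy, which is not the Magic3D implementation. It involves no NeRF, mesh, renderer, or diffusion model. A 1D grid of parameters stands in for the 3D representation, and a fixed function `target` (an assumption made purely for illustration) stands in for the supervision signal. The sketch fits a low-resolution grid first, upsamples it to initialize a high-resolution grid, and then refines that grid against higher-resolution supervision. All names and values (`coarse_to_fine`, `upsample`, the resolutions 8 and 64, the learning rate) are hypothetical.

```python
import math


def target(x):
    # Illustrative stand-in for the supervision signal (NOT a diffusion prior).
    return math.sin(2 * math.pi * x) + 0.3 * math.sin(12 * math.pi * x)


def sample_points(res):
    # Cell centers of a grid with `res` cells on [0, 1].
    return [(i + 0.5) / res for i in range(res)]


def interpolate(params, x):
    # Piecewise-linear lookup into a 1D parameter grid defined on cell centers.
    n = len(params)
    pos = min(max(x * n - 0.5, 0.0), n - 1.0)
    i0 = int(math.floor(pos))
    i1 = min(i0 + 1, n - 1)
    t = pos - i0
    return (1 - t) * params[i0] + t * params[i1], i0, i1, t


def loss(params, res):
    # Sum of squared errors when supervising at resolution `res`.
    return sum((interpolate(params, x)[0] - target(x)) ** 2
               for x in sample_points(res))


def optimize(params, res, steps=200, lr=0.1):
    # Plain gradient descent on the sum of squared errors.
    params = list(params)
    xs = sample_points(res)
    for _ in range(steps):
        grad = [0.0] * len(params)
        for x in xs:
            pred, i0, i1, t = interpolate(params, x)
            err = 2.0 * (pred - target(x))
            grad[i0] += err * (1 - t)
            grad[i1] += err * t
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def upsample(params, fine_res):
    # Initialize the fine grid from the coarse solution.
    return [interpolate(params, x)[0] for x in sample_points(fine_res)]


def coarse_to_fine(coarse_res=8, fine_res=64):
    # Stage 1: cheap fit with low-resolution supervision.
    coarse = optimize([0.0] * coarse_res, coarse_res)
    # Stage 2: start from the coarse result, refine at high resolution.
    fine_init = upsample(coarse, fine_res)
    fine = optimize(fine_init, fine_res)
    return coarse, fine_init, fine


if __name__ == "__main__":
    coarse, fine_init, fine = coarse_to_fine()
    print("loss at fine resolution before refinement:", loss(fine_init, 64))
    print("loss at fine resolution after refinement: ", loss(fine, 64))
```

Running the script prints the high-resolution loss before and after the second stage. The only point of the toy is the structure: the second stage does not start from scratch but from the upsampled coarse solution, and only then sees the finer supervision.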

@article{lin2022magic3d,
  title={Magic3D: High-Resolution Text-to-3D Content Creation},
  author={Chen-Hsuan Lin and Jun Gao and Luming Tang and Towaki Takikawa and Xiaohui Zeng and Xun Huang and Karsten Kreis and Sanja Fidler and Ming-Yu Liu and Tsung-Yi Lin},
  journal={arXiv preprint arXiv:2211.10440},
  year={2022}
}



from Hacker News https://ift.tt/rpD7MUE
