Text-to-3D generation is a valuable technology in virtual reality and digital content creation. While recent works have pushed the boundaries of text-to-3D generation, producing high-fidelity 3D objects from ineffective prompts and accurately simulating their physics-grounded motion remain unsolved challenges. To address these challenges, we present a framework that uses Large Language Model (LLM)-refined prompts and diffusion-prior-guided Gaussian Splatting (GS) to generate 3D models with accurate appearance and geometric structure. We also incorporate a continuum-mechanics-based deformation map and color regularization to synthesize vivid physics-grounded motion for the generated 3D Gaussians, adhering to the conservation of mass and momentum. By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects that exhibit physics-aware motion, accurately reflecting how objects of different materials behave under various forces and constraints. Extensive experiments demonstrate that our approach achieves high-quality 3D generation with realistic physics-grounded motion.