Text-to-3D Models

Explore the best AI models for text-to-3D

9 models found

TripoSR

Stability AI & Tripo|N/A

TripoSR is a fast feed-forward 3D reconstruction model jointly developed by Stability AI and Tripo AI that generates detailed 3D meshes from single input images in about half a second on an NVIDIA A100 GPU. Unlike optimization-based methods that require minutes of processing per object, TripoSR uses a transformer-based architecture built on the Large Reconstruction Model (LRM) framework to predict 3D geometry directly from a single 2D photograph in one forward pass. The model accepts any standard image as input and produces a textured 3D mesh suitable for use in game engines, 3D modeling software, and augmented reality applications. TripoSR excels at reconstructing everyday objects, furniture, vehicles, characters, and organic shapes with strong geometric accuracy and surface detail. Released under the MIT license in March 2024, the model is fully open source and runs on consumer-grade GPUs without specialized hardware. It supports batch processing for efficient conversion of multiple images and integrates with popular 3D pipelines including Blender, Unity, and Unreal Engine. The model is particularly valuable for game developers, product designers, and e-commerce teams who need rapid 3D asset creation from product photographs. Output meshes can be exported in OBJ and GLB formats with configurable resolution settings. TripoSR is a significant step toward democratizing 3D content creation, making high-quality reconstruction accessible without expensive scanning equipment or manual modeling expertise.
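
To give a sense of how lightweight the workflow is, here is a minimal sketch of single-image reconstruction using the tsr package from the official GitHub repository; exact signatures (for example, extract_mesh's arguments) have shifted between releases, so treat it as illustrative rather than canonical.

```python
import torch
from PIL import Image
from tsr.system import TSR  # from the official TripoSR repository

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load pretrained weights from the Hugging Face Hub
model = TSR.from_pretrained(
    "stabilityai/TripoSR",
    config_name="config.yaml",
    weight_name="model.ckpt",
)
model.to(device)

# A single RGB(A) photo; the repo's run.py also strips the background
# with rembg before reconstruction, which noticeably improves results.
image = Image.open("chair.png")

# One forward pass predicts the 3D representation ("scene codes")
scene_codes = model([image], device=device)

# Marching-cubes extraction at a configurable grid resolution
# (newer repo versions add a has_vertex_color argument here)
meshes = model.extract_mesh(scene_codes, resolution=256)
meshes[0].export("chair.obj")  # trimesh object; .glb export also works
```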

Open Source
4.5

TRELLIS

Microsoft Research|N/A

TRELLIS is an AI model developed by Microsoft Research that generates high-quality 3D assets from text descriptions or single 2D images using a structured latent (SLAT) representation paired with rectified-flow transformers. Released in December 2024, TRELLIS advances 3D content generation by operating in a structured latent space that encodes geometry, texture, and material properties simultaneously rather than treating them as separate stages. The model produces complete 3D meshes with detailed PBR (Physically Based Rendering) textures, enabling direct use in game engines, 3D rendering pipelines, and AR/VR applications without extensive manual post-processing. TRELLIS supports both text-to-3D generation, where users describe desired objects in natural language, and image-to-3D reconstruction, where a single photograph is converted into a full 3D model with geometry inferred for occluded viewpoints. The structured latent representation promotes geometric consistency and reduces artifacts common in other 3D generation approaches, such as floating geometry, texture seams, and unrealistic proportions. TRELLIS outputs standard 3D formats including GLB and OBJ with UV-mapped textures, making integration with professional tools like Blender, Unity, and Unreal Engine straightforward. Released under the MIT license, the model is fully open source and available on GitHub. Key applications include rapid 3D asset prototyping for game development, architectural visualization, product design mockups, virtual staging for real estate, educational 3D content creation, and metaverse asset generation. The model particularly benefits indie developers and small studios who lack resources for traditional 3D modeling workflows.
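
The GitHub README demonstrates an image-to-3D pipeline along these lines; the sketch below condenses it, and the repository's exact setup (environment variables, model variants) may differ, so consult the README before relying on it.

```python
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import postprocessing_utils

# Load the image-to-3D pipeline from the Hugging Face Hub
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# One run yields several output representations decoded from the same
# structured latent: Gaussians, a radiance field, and a mesh
outputs = pipeline.run(Image.open("toy_robot.png"), seed=1)

# Bake the Gaussian appearance onto the mesh and export a textured GLB
glb = postprocessing_utils.to_glb(
    outputs["gaussian"][0],
    outputs["mesh"][0],
    simplify=0.95,       # fraction of triangles to remove
    texture_size=1024,   # resolution of the baked texture
)
glb.export("toy_robot.glb")
```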

Open Source
4.5

Meshy

Meshy AI|N/A

Meshy is a proprietary AI-powered 3D generation platform developed by Meshy AI that creates detailed, production-ready 3D models from text descriptions and images. The platform combines text-to-3D and image-to-3D capabilities with AI texturing features, positioning itself as a comprehensive solution for rapid 3D content creation. Meshy generates textured 3D meshes with PBR-compatible materials, making outputs directly usable in game engines like Unity and Unreal Engine without additional processing. The platform offers multiple generation modes: text-to-3D for creating objects from written descriptions, image-to-3D for converting photographs into 3D models, and AI texturing for applying realistic materials to existing untextured meshes. Generated models include proper UV mapping, normal maps, and physically based rendering materials suitable for professional workflows. Meshy provides both a web-based interface and an API for programmatic access, making it accessible to individual artists and scalable for enterprise pipelines. The platform is particularly popular among game developers, animation studios, and AR/VR content creators who need to produce large volumes of 3D assets efficiently. Launched in 2023 as a proprietary commercial service, Meshy operates on a subscription model with a free tier for a limited number of generations. The platform continuously updates its models to improve output quality, topology optimization, and texture fidelity, competing directly with other AI 3D generation services in a rapidly evolving market.
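
Programmatic access follows the submit-then-poll pattern common to hosted generation APIs. The sketch below is hypothetical: the endpoint paths, payload fields, and status values are illustrative stand-ins, so check Meshy's official API documentation for the real contract.

```python
import time
import requests

API_KEY = "msy_..."  # issued in the Meshy dashboard
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = "https://api.meshy.ai"  # illustrative; see the official docs

# Submit a text-to-3D job (endpoint and payload fields are hypothetical)
resp = requests.post(
    f"{BASE}/v2/text-to-3d",
    headers=HEADERS,
    json={"mode": "preview", "prompt": "a weathered bronze statue of a fox"},
)
task_id = resp.json()["result"]

# Poll until the job finishes, then download the generated mesh
while True:
    task = requests.get(f"{BASE}/v2/text-to-3d/{task_id}", headers=HEADERS).json()
    if task["status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

if task["status"] == "SUCCEEDED":
    glb = requests.get(task["model_urls"]["glb"]).content
    with open("fox.glb", "wb") as f:
        f.write(glb)
```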

Proprietary
4.4

InstantMesh

Tencent|N/A

InstantMesh is a feed-forward 3D mesh generation model developed by Tencent that creates high-quality textured 3D meshes from single input images through a multi-view generation and sparse-view reconstruction pipeline. Released in April 2024 under the Apache 2.0 license, InstantMesh combines a multi-view diffusion model with a large reconstruction model to achieve both speed and quality in single-image 3D reconstruction. The pipeline first generates multiple consistent views of the input object using a fine-tuned multi-view diffusion model (Zero123++), then feeds these views into a transformer-based reconstruction network that predicts a triplane neural representation, which is finally converted to a textured mesh. This two-stage approach produces significantly higher quality results than single-stage methods while keeping generation times to just a few seconds. InstantMesh supports both text-to-3D workflows (when combined with an image generation model) and direct image-to-3D conversion from photographs or artwork. The output meshes include detailed geometry and texture maps compatible with standard 3D software and game engines. The model handles a wide variety of object types including characters, vehicles, furniture, and organic shapes with good geometric fidelity. As an open-source project with code and weights available on GitHub and Hugging Face, InstantMesh has become a popular choice for developers building 3D asset generation pipelines. It is particularly useful for game development, e-commerce product visualization, and rapid prototyping scenarios where fast turnaround and reasonable quality both matter.
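
The two-stage idea can be sketched as follows. The first stage uses the publicly available Zero123++ pipeline for diffusers that InstantMesh builds on; the reconstruct_mesh call in the second stage is a hypothetical placeholder for the repo's sparse-view reconstruction model, which in practice is invoked through the project's run.py script.

```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline

# Stage 1: generate consistent novel views with Zero123++
pipe = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
).to("cuda")

cond = Image.open("input.png")
grid = pipe(cond, num_inference_steps=75).images[0]  # 3x2 grid of six views
grid.save("views.png")

# Stage 2 (hypothetical placeholder): InstantMesh's transformer-based
# reconstructor turns the views into a triplane and extracts a textured
# mesh; in the actual repository this is driven by run.py with a config.
# mesh = reconstruct_mesh(grid)   # stand-in, not a real API
# mesh.export("output.glb")
```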

Open Source
4.3

Shap-E

OpenAI|N/A

Shap-E is a 3D generation model developed by OpenAI that creates 3D objects directly from text descriptions or input images by generating the parameters of implicit neural representations. Unlike its predecessor Point-E, which produces point clouds, Shap-E generates implicit functions that can be rendered as Neural Radiance Fields (NeRFs) or textured meshes and used directly in 3D applications. The model employs a two-stage training approach: an encoder first learns to map 3D assets to implicit function parameters, then a conditional diffusion model learns to generate those parameters from text or image inputs. This architecture enables generation in just a few seconds on a modern GPU. Shap-E supports both text-to-3D and image-to-3D workflows, making it versatile for different creative pipelines. The generated 3D objects include color and texture information, producing more complete results than geometry-only approaches. Released under the MIT license in May 2023, the model is fully open source with pre-trained weights available on GitHub. While the output quality may not match optimization-heavy methods like DreamFusion that can take hours per object, Shap-E offers a practical balance between speed and quality for rapid prototyping and concept exploration. The model is particularly useful for game developers, 3D artists, and researchers who need quick 3D visualizations from text prompts. As one of OpenAI's contributions to open-source 3D AI research, Shap-E has influenced subsequent work on fast feed-forward 3D generation.
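
The repository's example notebook boils down to the following text-to-mesh sketch; it is condensed from the official sample code, and defaults may change between versions.

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

xm = load_model("transmitter", device=device)       # latent -> 3D decoder
model = load_model("text300M", device=device)       # text-conditioned prior
diffusion = diffusion_from_config(load_config("diffusion"))

# Sample implicit-function parameters conditioned on the prompt
latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a red office chair"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode the latent into a textured triangle mesh and save as OBJ
mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
with open("chair.obj", "w") as f:
    mesh.write_obj(f)
```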

Open Source
4.0

LGM

Peking University|N/A

LGM (Large Multi-View Gaussian Model) is a 3D generation model developed by researchers at Peking University that produces high-quality 3D objects from single images or text prompts in approximately five seconds using a 3D Gaussian Splatting representation. Released in 2024 under the MIT license, LGM combines multi-view image generation with Gaussian-based 3D reconstruction in an end-to-end framework. The model first generates multiple consistent views of the target object using a multi-view diffusion backbone, then a U-Net-based Gaussian decoder predicts 3D Gaussian parameters from these views to construct the full 3D representation. Unlike mesh-based approaches, the Gaussian Splatting output enables real-time rendering with high visual quality, including accurate lighting, transparency, and reflective surface effects. LGM supports resolutions up to 512 pixels for the generated views and produces detailed 3D content with clean geometry and vivid textures. The model can be used both for image-to-3D conversion from photographs and for text-to-3D generation when paired with a text-to-image model as a front end. As an open-source project with code and pre-trained weights available on GitHub, LGM is accessible to researchers and developers for academic study and practical applications alike. The model is particularly suited to interactive 3D visualization, virtual reality content, game asset prototyping, and any scenario where real-time rendering of generated 3D content is required. LGM demonstrates that Gaussian Splatting is a compelling alternative to traditional mesh representations for AI-generated 3D content.
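
Since LGM's distinguishing feature is its output representation rather than a particular API, the sketch below illustrates what a Gaussian Splatting asset contains; the dictionary layout is an illustrative simplification, not the repository's actual file format.

```python
import numpy as np

# Illustrative layout of a Gaussian Splatting asset like the ones LGM
# produces: each splat has a center, anisotropic scale, rotation
# quaternion, opacity, and color (LGM regresses these per view pixel).
N = 4  # a real asset holds tens of thousands of splats
gaussians = {
    "xyz":      np.random.randn(N, 3).astype(np.float32),   # centers
    "scale":    np.full((N, 3), 0.01, dtype=np.float32),    # extents
    "rotation": np.tile([1.0, 0.0, 0.0, 0.0], (N, 1)).astype(np.float32),
    "opacity":  np.ones((N, 1), dtype=np.float32),
    "rgb":      np.random.rand(N, 3).astype(np.float32),
}

# A splatting renderer rasterizes each Gaussian as an oriented ellipsoid
# and alpha-blends them in depth order, which is what makes real-time
# viewing of the generated object possible.
for key, arr in gaussians.items():
    print(f"{key}: {arr.shape}")
```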

Open Source
4.2

Rodin Gen-1

Microsoft|N/A

Rodin Gen-1 is a 3D generation model developed by Microsoft Research that creates detailed, high-quality 3D models and digital avatars from text descriptions and images. The model marks Microsoft's entry into AI-powered 3D content creation, building on the company's research in computer vision and generative AI. Rodin Gen-1 uses a diffusion-based architecture that generates 3D representations through a denoising process operating in a learned latent space, producing results with fine geometric details and realistic surface textures. The model specializes in generating 3D digital avatars with accurate facial features, hair, clothing, and accessories from textual descriptions, making it highly relevant for gaming, virtual reality, and metaverse applications. Beyond avatars, Rodin Gen-1 can generate general 3D objects and scenes with consistent quality across categories. The generation process produces textured meshes with topology suitable for animation and rigging workflows. Microsoft has positioned Rodin Gen-1 as a research contribution, releasing it under a research-only license that permits academic use but restricts commercial deployment. The model builds on Microsoft's broader 3D AI research portfolio and demonstrates how large-scale generative models can be applied to 3D content creation. Rodin Gen-1 is particularly noteworthy for its avatar quality, approaching the fidelity of manually crafted 3D characters while requiring only a text prompt as input, significantly reducing the time and expertise traditionally needed for professional 3D character creation.
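
Rodin Gen-1 itself is not publicly runnable, but the latent-diffusion idea it relies on can be illustrated with a generic toy denoising loop; everything here, from the linear stand-in denoiser to the schedule, is illustrative and not Microsoft's implementation.

```python
import torch

torch.manual_seed(0)

# Generic DDPM-style reverse process in a latent space (toy example).
T = 50                                  # denoising steps
betas = torch.linspace(1e-4, 0.02, T)   # noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

denoiser = torch.nn.Linear(64, 64)      # stand-in for the real network

z = torch.randn(1, 64)                  # start from pure noise
for t in reversed(range(T)):
    eps_hat = denoiser(z)               # predicted noise at step t
    # Remove the predicted noise component (DDPM mean update)
    z = (z - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps_hat) \
        / torch.sqrt(alphas[t])
    if t > 0:
        z = z + torch.sqrt(betas[t]) * torch.randn_like(z)

# A decoder would then map the clean latent z to geometry and textures.
print(z.shape)
```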

Proprietary
4.2

OpenLRM

Zexin He & Tengfei Wang|N/A

OpenLRM is an open-source implementation of the Large Reconstruction Model (LRM) architecture for single-image 3D reconstruction, developed by Zexin He and Tengfei Wang. The project provides a fully open and reproducible implementation of the LRM approach, originally proposed by researchers at Adobe and the Australian National University, which uses a transformer-based architecture to predict 3D representations from single input images in a feed-forward manner. OpenLRM processes an input image through a pre-trained vision encoder such as DINOv2, then feeds the resulting features into a transformer decoder that generates a triplane-based neural radiance field, which can be rendered from novel viewpoints or converted to a textured 3D mesh. The entire reconstruction takes only a few seconds on a modern GPU, making it practical for interactive applications and batch processing workflows. Released under the Apache 2.0 license in December 2023, OpenLRM fills a gap in the 3D AI research community by providing an accessible reference implementation that researchers can study, modify, and build upon. The model supports various output formats and can be integrated into existing 3D pipelines for applications ranging from game development to e-commerce product visualization. OpenLRM handles diverse object categories including furniture, vehicles, characters, and everyday items with reasonable geometric fidelity. Pre-trained model weights are available on Hugging Face for immediate use. As one of the foundational open-source projects in feed-forward 3D reconstruction, OpenLRM has directly influenced and enabled numerous downstream projects in the rapidly evolving single-image 3D generation space.
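
The encoder-to-triplane flow can be sketched as follows; the DINOv2 feature extraction is real and runnable, while the decoder calls are hypothetical placeholders for OpenLRM's own inference entry point, which is documented in its README.

```python
import torch
from PIL import Image
from torchvision import transforms

# Stage 1 (real): DINOv2 patch features, the conditioning that
# LRM-style models cross-attend to
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("mug.png").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    feats = dino.forward_features(img)["x_norm_patchtokens"]  # (1, 256, 768)

# Stage 2 (hypothetical placeholders): a transformer decoder attends to
# the image tokens and emits a triplane NeRF; OpenLRM's real inference
# entry point is documented in its README.
# triplane = lrm_decoder(feats)         # stand-in, not a real API
# mesh = render_and_extract(triplane)   # stand-in, not a real API
print(feats.shape)
```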

Open Source
4.1

Point-E

OpenAI|N/A

Point-E is a 3D generation system developed by OpenAI that produces colored 3D point clouds from text descriptions through a two-stage cascading approach. Released in December 2022, it was one of the first publicly available text-to-3D models from a major AI lab. The system works in two stages: first, a GLIDE-based text-conditioned image generation model creates a synthetic view of the described object, then a second diffusion model generates a 3D point cloud conditioned on that image. This cascading design produces results in just one to two minutes on a single GPU, dramatically faster than optimization-based methods like DreamFusion, which require hours of processing. The generated point clouds start at 1,024 points, which an upsampler model expands to 4,096 colored points representing the 3D shape and appearance of the object. While point clouds are less immediately usable than meshes for production 3D applications, they can be converted to meshes through standard reconstruction algorithms such as Poisson surface reconstruction. Point-E supports generation of a wide variety of objects including animals, vehicles, furniture, and everyday items. The model is fully open source under the MIT license, with code and pre-trained weights available on GitHub. As a pioneering early contribution to fast text-to-3D generation, Point-E demonstrated that trading some quality for dramatically improved speed was viable, directly influencing subsequent models like Shap-E. The system remains valuable for researchers exploring 3D generation pipelines and for rapid concept visualization where speed matters more than production-ready quality.
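
Condensed from the repository's text-to-point-cloud example notebook, the following sketch shows the two-stage cascade in code; details may shift between versions, so treat it as a guide to the structure rather than a pinned recipe.

```python
import torch
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Base model: text -> 1,024 points; upsampler: 1,024 -> 4,096 points
base = model_from_config(MODEL_CONFIGS["base40M-textvec"], device).eval()
base.load_state_dict(load_checkpoint("base40M-textvec", device))
upsampler = model_from_config(MODEL_CONFIGS["upsample"], device).eval()
upsampler.load_state_dict(load_checkpoint("upsample", device))

sampler = PointCloudSampler(
    device=device,
    models=[base, upsampler],
    diffusions=[
        diffusion_from_config(DIFFUSION_CONFIGS["base40M-textvec"]),
        diffusion_from_config(DIFFUSION_CONFIGS["upsample"]),
    ],
    num_points=[1024, 4096 - 1024],
    aux_channels=["R", "G", "B"],           # colored points
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=("texts", ""),  # upsampler ignores the prompt
)

# The sampler yields intermediate states; keep the final one
samples = None
for x in sampler.sample_batch_progressive(
    batch_size=1, model_kwargs=dict(texts=["a red motorcycle"])
):
    samples = x

pc = sampler.output_to_point_clouds(samples)[0]  # colored point cloud
```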

Open Source
3.7