Image to 3D Models
Explore the best AI models for image-to-3D generation
TripoSR
TripoSR is a fast feed-forward 3D reconstruction model jointly developed by Stability AI and Tripo AI that generates detailed 3D meshes from single input images in under one second. Unlike optimization-based methods that require minutes of processing per object, TripoSR uses a transformer-based architecture built on the Large Reconstruction Model framework to predict 3D geometry directly from a single 2D photograph in a single forward pass. The model accepts any standard image as input and produces a textured 3D mesh suitable for use in game engines, 3D modeling software, and augmented reality applications. TripoSR excels at reconstructing everyday objects, furniture, vehicles, characters, and organic shapes with impressive geometric accuracy and surface detail. Released under the MIT license in March 2024, the model is fully open source and can run on consumer-grade GPUs without specialized hardware. It supports batch processing for efficient conversion of multiple images and integrates seamlessly with popular 3D pipelines including Blender, Unity, and Unreal Engine. The model is particularly valuable for game developers, product designers, and e-commerce teams who need rapid 3D asset creation from product photographs. Output meshes can be exported in OBJ and GLB formats with configurable resolution settings. TripoSR represents a significant step toward democratizing 3D content creation by making high-quality reconstruction accessible without expensive scanning equipment or manual modeling expertise.
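A minimal batch-conversion sketch, assuming the TripoSR repository is installed and exposes the `TSR.from_pretrained` / `extract_mesh` interface shown in its README; treat the exact import path, arguments, and folder names as illustrative rather than authoritative:

```python
# Hedged sketch: batch image-to-mesh conversion with TripoSR.
# Assumes the TripoSR repo is installed and that the interface below matches
# its README at the time of writing; names may differ between versions.
from pathlib import Path

import torch
from PIL import Image
from tsr.system import TSR  # import path assumed from the TripoSR repository

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TSR.from_pretrained(
    "stabilityai/TripoSR",       # MIT-licensed weights on Hugging Face
    config_name="config.yaml",
    weight_name="model.ckpt",
).to(device)

Path("meshes").mkdir(exist_ok=True)
for image_path in sorted(Path("product_photos").glob("*.png")):  # hypothetical folder
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        scene_codes = model([image], device=device)   # single feed-forward pass
    mesh = model.extract_mesh(scene_codes)[0]          # trimesh-compatible mesh
    mesh.export(f"meshes/{image_path.stem}.glb")       # .obj also supported
```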
TRELLIS
TRELLIS is a revolutionary AI model developed by Microsoft Research that generates high-quality 3D assets from text descriptions or single 2D images using a novel Structured Latent Diffusion architecture. Released in December 2024, TRELLIS represents a fundamental advancement in 3D content generation by operating in a structured latent space that encodes geometry, texture, and material properties simultaneously rather than treating them as separate stages. The model produces complete 3D meshes with detailed PBR (Physically Based Rendering) textures, enabling direct use in game engines, 3D rendering pipelines, and AR/VR applications without extensive manual post-processing. TRELLIS supports both text-to-3D generation where users describe desired objects in natural language and image-to-3D reconstruction where a single photograph is converted into a full 3D model with inferred geometry from occluded viewpoints. The structured latent representation ensures geometric consistency and prevents the common artifacts seen in other 3D generation approaches such as floating geometry, texture seams, and unrealistic proportions. TRELLIS outputs standard 3D formats including GLB and OBJ with UV-mapped textures, making integration with professional tools like Blender, Unity, and Unreal Engine straightforward. Released under the MIT license, the model is fully open source and available on GitHub. Key applications include rapid 3D asset prototyping for game development, architectural visualization, product design mockups, virtual staging for real estate, educational 3D content creation, and metaverse asset generation. The model particularly benefits indie developers and small studios who lack resources for traditional 3D modeling workflows.
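A minimal image-to-3D sketch patterned on the example in the TRELLIS repository; the pipeline class, checkpoint id, output keys, and GLB helper are taken from that README at the time of writing and should be treated as assumptions that may change:

```python
# Hedged sketch of image-to-3D with TRELLIS, patterned on the repo's example.
# Class names, checkpoint id, and output keys are assumptions from the README.
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline  # assumed import path
from trellis.utils import postprocessing_utils          # assumed helper module

pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

image = Image.open("desk_lamp.png")          # hypothetical input photo
outputs = pipeline.run(image, seed=1)        # structured latent -> Gaussians, field, mesh

# Bake the result into a GLB with textures for Blender / Unity / Unreal.
glb = postprocessing_utils.to_glb(outputs["gaussian"][0], outputs["mesh"][0])
glb.export("desk_lamp.glb")
```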
Stable Point Aware 3D (SPA3D)
Stable Point Aware 3D (SPA3D) is an advanced feed-forward 3D reconstruction model developed by Stability AI that generates high-quality textured 3D meshes from a single input image in seconds. Unlike iterative optimization-based approaches that require minutes of processing, SPA3D uses a direct feed-forward architecture that predicts 3D geometry and texture in a single pass, making it practical for interactive workflows and production pipelines. The model first predicts a sparse point cloud that captures the object's overall structure, including regions hidden from the camera, and conditions the mesh reconstruction on that point cloud, which significantly improves geometric consistency over other single-view reconstruction methods and helps generated models maintain accurate proportions and structural integrity from multiple viewpoints. SPA3D produces industry-standard mesh outputs with clean topology and UV-mapped textures, enabling direct import into 3D software including Blender, Unity, Unreal Engine, and professional CAD tools. The model handles diverse object categories from organic shapes like characters and animals to hard-surface objects like furniture and vehicles, adapting its reconstruction approach to the structural characteristics of each input. Released under the Stability AI Community License, the model is open source for personal and commercial use with revenue-based restrictions. Key applications include rapid 3D asset creation for game development, augmented reality content production, 3D printing preparation, virtual product photography, architectural visualization, and e-commerce 3D product displays. SPA3D is particularly valuable for creative professionals who need quick 3D mockups from concept sketches or photographs without investing hours in manual modeling. The model runs on consumer GPUs and is available through cloud APIs for scalable deployment.
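For the cloud route, a request sketch against Stability's hosted REST API; the endpoint path and field names below mirror Stability's other v2beta 3D endpoints and are assumptions to verify against the current API documentation:

```python
# Hedged sketch: calling SPA3D through Stability's hosted API with `requests`.
# The endpoint path and form fields are assumptions modeled on Stability's
# other v2beta 3D endpoints; check the official API reference before use.
import os
import requests

response = requests.post(
    "https://api.stability.ai/v2beta/3d/stable-point-aware-3d",  # assumed path
    headers={"authorization": f"Bearer {os.environ['STABILITY_API_KEY']}"},
    files={"image": open("sneaker_photo.png", "rb")},             # hypothetical input
)
response.raise_for_status()

# A successful response is expected to carry the binary GLB mesh directly.
with open("sneaker.glb", "wb") as f:
    f.write(response.content)
```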
Zero123++
Zero123++ is a multi-view image generation model developed by researchers at UC San Diego and the SUDO AI team behind One-2-3-45 that generates six consistent canonical views of an object from a single input image. Released in 2023 under the Apache 2.0 license, the model extends the original Zero123 approach with significantly improved view consistency and serves as a critical component in modern 3D reconstruction pipelines. Zero123++ takes a single photograph or rendered image of an object and produces six evenly spaced views covering the full 360-degree range around the object, all maintaining consistent geometry, lighting, and appearance. The model is built on a fine-tuned Stable Diffusion backbone with specialized conditioning mechanisms that ensure multi-view coherence. Unlike the original Zero123, which generates each view independently and often produces inconsistent results, Zero123++ generates all six views simultaneously in a single diffusion process, dramatically improving 3D consistency. The generated multi-view images serve as input for downstream 3D reconstruction methods like NeRF, Gaussian Splatting, or direct mesh reconstruction, enabling high-quality 3D model creation from a single photograph. Zero123++ is fully open source with pre-trained weights available on Hugging Face, making it accessible to researchers and developers building 3D generation systems. The model has become a foundational component in many state-of-the-art 3D generation pipelines and is widely used in academic research. It is particularly valuable for applications in game development, product visualization, and virtual reality where converting 2D images to 3D assets is a frequent workflow requirement.
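A short generation sketch using the Hugging Face diffusers custom pipeline that the Zero123++ project points to; the model id, custom pipeline name, and the tiled 3x2 output layout are taken from that documentation and should be treated as assumptions:

```python
# Hedged sketch: six-view generation with Zero123++ via a diffusers custom pipeline.
# Model id, custom_pipeline name, and the tiled output layout are assumptions
# based on the project's documentation; verify against the current README.
import torch
from PIL import Image
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
).to("cuda")

cond = Image.open("figurine.png")                         # hypothetical input image
grid = pipeline(cond, num_inference_steps=75).images[0]   # six views in one 3x2 grid
grid.save("figurine_views.png")                           # hand off to a 3D reconstructor next
```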
InstantMesh
InstantMesh is a feed-forward 3D mesh generation model developed by Tencent that creates high-quality textured 3D meshes from single input images through a multi-view generation and sparse-view reconstruction pipeline. Released in April 2024 under the Apache 2.0 license, InstantMesh combines a multi-view diffusion model with a large reconstruction model to achieve both speed and quality in single-image 3D reconstruction. The pipeline first generates multiple consistent views of the input object using a fine-tuned multi-view diffusion model, then feeds these views into a transformer-based reconstruction network that predicts a triplane neural representation, which is finally converted to a textured mesh. This two-stage approach produces significantly higher quality results than single-stage methods while maintaining generation times of just a few seconds. InstantMesh supports both text-to-3D workflows when combined with an image generation model and direct image-to-3D conversion from photographs or artwork. The output meshes include detailed geometry and texture maps compatible with standard 3D software and game engines. The model handles a wide variety of object types including characters, vehicles, furniture, and organic shapes with good geometric fidelity. As an open-source project with code and weights available on GitHub and Hugging Face, InstantMesh has become a popular choice for developers building 3D asset generation pipelines. It is particularly useful for game development, e-commerce product visualization, and rapid prototyping scenarios where fast turnaround and reasonable quality are both important requirements.
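A small driver sketch that shells out to the repository's run.py entry point from Python; the config filename and flags reflect the README at the time of writing and are assumptions worth re-checking:

```python
# Hedged sketch: driving InstantMesh from Python by invoking its run.py CLI.
# Config filename and flags are assumptions taken from the repo's README.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "configs/instant-mesh-large.yaml",  # large reconstruction-model variant
        "inputs/robot_toy.png",             # hypothetical single input image
        "--export_texmap",                  # write the mesh with a UV texture map
    ],
    check=True,
    cwd="InstantMesh",                      # path to the cloned repository
)
# Both stages (multi-view diffusion, then triplane reconstruction) run inside
# run.py; the textured mesh is written under the repo's outputs directory.
```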
Unique3D
Unique3D is a high-quality single-image 3D reconstruction model developed by researchers at Tsinghua University that produces detailed, well-textured 3D meshes from single input images through a multi-stage pipeline combining multi-view generation, geometry reconstruction, and texture refinement. The model is designed to produce production-quality 3D assets with sharp textures and clean geometry that can be directly used in professional 3D applications. Unique3D employs a multi-level upscale refinement strategy where the initial 3D reconstruction is progressively enhanced at multiple resolution levels, resulting in significantly finer surface details and texture quality compared to single-pass methods. The pipeline first generates consistent multi-view images using a diffusion model, then reconstructs an initial 3D mesh, and finally applies iterative upscaling and refinement to both geometry and texture. This approach produces meshes with crisp texture details and well-defined geometric features even for complex objects with intricate patterns or fine structures. Released under the Apache 2.0 license in May 2024, Unique3D is fully open source with code and pre-trained weights available on GitHub. The model handles a variety of object types including characters, animals, manufactured products, and artistic objects. Output meshes include high-resolution texture maps and proper UV coordinates compatible with standard 3D software. Unique3D is particularly suited for professional workflows in game development, animation, product visualization, and digital content creation where the quality of 3D assets directly impacts the final output. The multi-level refinement approach represents an important contribution to achieving production-grade quality in AI-generated 3D content.
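Before handing a generated asset to a game engine, it is worth confirming that the export really carries UV coordinates and a high-resolution texture; a small check with trimesh, where the file name and material layout are illustrative and assume a GLB export with a PBR material:

```python
# Hedged sketch: sanity-check a Unique3D GLB export with trimesh before import
# into Blender/Unity/Unreal. File name and material layout are illustrative.
import trimesh

mesh = trimesh.load("unique3d_output.glb", force="mesh")
print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")

# For GLB exports with PBR materials, trimesh exposes UVs and the base color map.
uv = getattr(mesh.visual, "uv", None)
material = getattr(mesh.visual, "material", None)
texture = getattr(material, "baseColorTexture", None) if material else None

print("has UVs:", uv is not None)
print("texture resolution:", texture.size if texture is not None else "none")
```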
Wonder3D
Wonder3D is a single-image 3D reconstruction model developed by researchers at the University of Hong Kong, Tsinghua University, and collaborating institutions that generates both multi-view color images and corresponding normal maps from a single input image for high-quality 3D mesh reconstruction. Accepted at CVPR 2024, Wonder3D introduces a cross-domain diffusion approach that simultaneously produces RGB color views and geometric normal maps, ensuring that the generated views are both visually consistent and geometrically accurate. This dual-output strategy provides significantly richer information for downstream 3D reconstruction compared to methods that generate only color images. The model uses a multi-view cross-domain attention mechanism that enforces consistency between the color and normal map domains during the diffusion process, resulting in coherent multi-view outputs that faithfully represent the 3D structure of the input object. Wonder3D can reconstruct a complete textured 3D mesh from a single photograph in approximately two to three minutes. The output meshes feature clean geometry with well-defined surface details, making them suitable for use in professional 3D workflows. Released under the Apache 2.0 license, the model is fully open source with code and pre-trained weights available on GitHub. Wonder3D handles diverse object categories including characters, animals, furniture, and manufactured objects with consistent quality. The model is particularly valuable for applications in game development, animation, product visualization, and virtual reality where high-quality 3D assets need to be created from limited reference imagery. Its cross-domain approach has influenced subsequent research in multi-view generation for 3D reconstruction.
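The normal maps Wonder3D emits are ordinary RGB images; downstream reconstruction code typically decodes them back into unit vectors before fitting a surface. A generic decoding sketch, where the file name is illustrative and the [0,255] to [-1,1] mapping is the common convention rather than a guarantee about Wonder3D's exact output format:

```python
# Generic sketch: decode an RGB-encoded normal map into unit normal vectors,
# the usual preprocessing before normals are fed to surface reconstruction.
# File name is illustrative; the [0,255] -> [-1,1] mapping is the common convention.
import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("front_view_normal.png").convert("RGB"), dtype=np.float32)
normals = rgb / 255.0 * 2.0 - 1.0                                   # map to [-1, 1]
normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8   # re-normalize

print(normals.shape)                          # (H, W, 3) unit vectors
print(normals.reshape(-1, 3).mean(axis=0))    # rough check of the dominant orientation
```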
One-2-3-45
One-2-3-45 is a single-image 3D reconstruction system developed by researchers at UC San Diego that generates textured 3D meshes from a single input image through a two-stage pipeline combining multi-view generation with sparse-view 3D reconstruction. The name plays on the Zero-1-to-3 (Zero123) model it builds on and on the roughly 45 seconds the pipeline needs to turn one image into a complete 3D object. In the first stage, a fine-tuned Zero123 model generates multiple novel views of the object from different angles based on the single input photograph. In the second stage, these generated multi-view images are fed into a cost-volume-based sparse-view reconstruction network that produces a textured 3D mesh with consistent geometry. Released in June 2023 under the MIT license, One-2-3-45 was among the first systems to demonstrate that combining 2D diffusion models with 3D reconstruction could produce reasonable 3D assets in under a minute. The model handles a variety of object types including everyday items, animals, vehicles, and artistic objects. Unlike optimization-based approaches like DreamFusion that require per-object optimization taking tens of minutes, One-2-3-45 runs in a feed-forward manner making it significantly faster. The output meshes include color and texture information and can be exported for use in standard 3D applications. As a fully open-source project with code available on GitHub, it has served as an influential reference for subsequent research in single-image 3D generation. The system is particularly useful for researchers and developers exploring rapid 3D content creation from limited input data.
OpenLRM
OpenLRM is an open-source implementation of the Large Reconstruction Model architecture for single-image 3D reconstruction, developed by Zexin He and Tengfei Wang. The project provides a fully open and reproducible implementation of the LRM approach, which uses a transformer-based architecture to predict 3D representations from single input images in a feed-forward manner. OpenLRM processes an input image through a pre-trained vision encoder like DINOv2, then feeds the resulting features into a transformer decoder that generates a triplane-based neural radiance field representation, which can be rendered from novel viewpoints or converted to a textured 3D mesh. The entire reconstruction takes only a few seconds on a modern GPU, making it practical for interactive applications and batch processing workflows. Released under the Apache 2.0 license in December 2023, OpenLRM fills a critical gap in the 3D AI research community by providing an accessible reference implementation that researchers can study, modify, and build upon. The model supports various output formats and can be integrated into existing 3D pipelines for applications ranging from game development to e-commerce product visualization. OpenLRM handles diverse object categories including furniture, vehicles, characters, and everyday items with reasonable geometric fidelity. Pre-trained model weights are available on Hugging Face for immediate use. As one of the foundational open-source projects in feed-forward 3D reconstruction, OpenLRM has directly influenced and enabled numerous downstream projects and research efforts in the rapidly evolving single-image 3D generation space.
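The triplane idea at the heart of LRM-style models is easy to see in isolation: a 3D query point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the aggregated features go to a small MLP that predicts density and color. A toy PyTorch sketch with illustrative shapes, not OpenLRM's actual configuration:

```python
# Toy sketch of triplane sampling, the representation OpenLRM's transformer predicts.
# Channel count, resolution, and aggregation-by-sum are illustrative choices.
import torch
import torch.nn.functional as F

C, R = 32, 64                          # feature channels, plane resolution
planes = torch.randn(3, C, R, R)       # stand-ins for predicted XY / XZ / YZ planes
points = torch.rand(1000, 3) * 2 - 1   # query points in [-1, 1]^3

def sample_plane(plane: torch.Tensor, coords_2d: torch.Tensor) -> torch.Tensor:
    # grid_sample expects (N, H_out, W_out, 2); we sample one feature per point.
    grid = coords_2d.view(1, -1, 1, 2)
    feats = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
    return feats.view(C, -1).t()       # (num_points, C)

xy, xz, yz = points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]
features = sample_plane(planes[0], xy) + sample_plane(planes[1], xz) + sample_plane(planes[2], yz)
print(features.shape)                  # (1000, C), then fed to an MLP for density/color
```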
Era3D
Era3D is a multi-view generation model developed by researchers at the Hong Kong University of Science and Technology and collaborating institutions that produces high-resolution, camera-aware multi-view images and normal maps from single input images for 3D reconstruction. The model introduces two key innovations that address common limitations in multi-view generation: a focal length estimation module that adapts to the camera perspective of the input image, and an efficient row-wise attention mechanism that enables generation at higher resolutions than competing methods while using less GPU memory. Era3D generates six consistent views along with corresponding normal maps at 512x512 resolution, providing rich geometric information for downstream 3D mesh reconstruction. The camera-aware design means the model can handle input images taken from different perspectives and focal lengths without degradation in output quality, a significant improvement over methods that assume a fixed camera model. The row-wise attention mechanism replaces the computationally expensive full cross-view attention with a more efficient alternative that processes attention along horizontal rows, reducing memory requirements while maintaining view consistency. Released in May 2024 under the Apache 2.0 license, Era3D is fully open source with code and pre-trained weights available on GitHub. The model demonstrates strong performance across diverse object categories and produces clean multi-view outputs suitable for high-quality 3D reconstruction. Era3D is particularly valuable for professional 3D content creation workflows where input images come from varied sources with different camera characteristics, and where high-resolution multi-view generation is essential for capturing fine details in the final 3D models.
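The row-wise attention trick is simple to sketch in isolation: in Era3D's canonical camera setting, corresponding pixels across views fall on the same image row, so cross-view attention only needs to mix tokens within each row rather than across whole images. A toy PyTorch illustration in that spirit, where the dimensions and the single attention layer are illustrative and not the model's actual architecture:

```python
# Toy sketch of row-wise cross-view attention in the spirit of Era3D: each image
# row attends only to the same row across all views, instead of full attention
# over every token in every view. Sizes and the single layer are illustrative.
import torch
import torch.nn as nn

V, H, W, C = 6, 32, 32, 64                 # views, height, width, channels
feats = torch.randn(V, H, W, C)            # per-view feature maps

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)

# Group tokens by row: each of the H rows becomes one sequence of V * W tokens.
rows = feats.permute(1, 0, 2, 3).reshape(H, V * W, C)
out, _ = attn(rows, rows, rows)            # attention restricted to within-row tokens
out = out.reshape(H, V, W, C).permute(1, 0, 2, 3)
print(out.shape)                           # (V, H, W, C), same layout as the input
```

Restricting attention to rows shrinks the quadratic token interactions from roughly (V·H·W)² for full cross-view attention to H·(V·W)², which is where the memory savings come from.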
SyncDreamer
SyncDreamer is a multi-view generation and 3D reconstruction model developed by researchers at the University of Hong Kong that generates synchronized, 3D-consistent views of objects from single input images. Released in 2023 under the Apache 2.0 license, SyncDreamer introduces a synchronized multi-view diffusion approach that generates multiple views simultaneously while enforcing 3D consistency through a novel attention mechanism. Unlike sequential view generation methods that often produce inconsistent results between views, SyncDreamer's synchronized generation process ensures that all output views share coherent geometry, lighting, and appearance. The model uses a modified diffusion architecture with a 3D-aware feature attention module that allows information to flow between different viewpoint predictions during the denoising process. This cross-view communication enables the model to maintain spatial consistency across all generated views. The output multi-view images can be used with standard multi-view reconstruction methods like NeuS or NeRF to produce high-quality textured 3D meshes. SyncDreamer generates 16 evenly spaced views around the object, providing comprehensive coverage for accurate 3D reconstruction. The model handles a variety of object categories including animals, vehicles, furniture, and artistic objects with good consistency. As a fully open-source project with code and weights available on GitHub, SyncDreamer has become an important reference in the multi-view generation literature. The model is particularly relevant for researchers working on 3D generation pipelines and for applications in game development, product visualization, and virtual reality content creation where converting single images to 3D assets is a common requirement.
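Downstream reconstructors such as NeuS need camera poses that match the generated views; a small helper sketch that lays out 16 evenly spaced azimuths on a ring at fixed elevation and radius, where the elevation, radius, and axis conventions are illustrative rather than SyncDreamer's exact camera setup:

```python
# Hedged sketch: camera centers for 16 evenly spaced azimuths at fixed elevation,
# the kind of ring layout SyncDreamer's views assume when passed to NeuS/NeRF.
# Elevation, radius, and axis conventions here are illustrative choices.
import numpy as np

n_views, elevation_deg, radius = 16, 30.0, 1.5
azimuths = np.deg2rad(np.linspace(0.0, 360.0, n_views, endpoint=False))
elev = np.deg2rad(elevation_deg)

camera_centers = np.stack(
    [
        radius * np.cos(elev) * np.cos(azimuths),   # x
        radius * np.cos(elev) * np.sin(azimuths),   # y
        np.full(n_views, radius * np.sin(elev)),    # z (height above the object)
    ],
    axis=-1,
)
print(camera_centers.shape)   # (16, 3); each camera looks back at the origin
```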