
One-2-3-45

Open Source
4.0
UC San Diego

One-2-3-45 is a single-image 3D reconstruction system developed by researchers at UC San Diego that generates textured 3D meshes from a single input image through a two-stage pipeline combining multi-view generation with sparse-view 3D reconstruction. The name encodes the process: from one image, generate 2D multi-view images, then reconstruct 3D geometry, all in roughly 45 seconds. In the first stage, a fine-tuned Zero123 model generates multiple novel views of the object from different angles based on the single input photograph. In the second stage, these generated multi-view images are fed into a cost-volume-based sparse-view reconstruction network that produces a textured 3D mesh with consistent geometry. Released in June 2023 under the MIT license, One-2-3-45 was among the first systems to demonstrate that combining 2D diffusion models with 3D reconstruction could produce reasonable 3D assets in under a minute. The model handles a variety of object types, including everyday items, animals, vehicles, and artistic objects. Unlike optimization-based approaches such as DreamFusion, which require per-object optimization taking tens of minutes to hours, One-2-3-45 runs in a feed-forward manner and is therefore significantly faster. The output meshes include color and texture information and can be exported for use in standard 3D applications. As a fully open-source project with code available on GitHub, it has served as an influential reference for subsequent research in single-image 3D generation, and it is particularly useful for researchers and developers exploring rapid 3D content creation from limited input data.

Image to 3D

Key Highlights

Pioneering Multi-View-Then-Reconstruct Paradigm

One of the first systems to demonstrate the two-stage approach of generating multi-view images then performing 3D reconstruction, establishing a pattern now standard in the field

45-Second Complete Pipeline

Completes the entire single-image to textured 3D mesh pipeline in approximately 45 seconds, dramatically faster than optimization-based methods requiring hours

Cost-Volume 3D Reconstruction

Uses cost-volume-based reconstruction that aggregates information from multiple generated views, providing robustness against view inconsistencies

MIT Licensed Academic Research

Fully open-source research from UC San Diego under MIT license with reproducible code, serving as an important reference for the 3D generation research community

About

One-2-3-45 converts a single input image into a textured 3D mesh through a two-stage pipeline that combines multi-view generation with sparse-view 3D reconstruction. The name reflects the process: from one image, generate 2D multi-view images, then reconstruct 3D geometry, achieving this in approximately 45 seconds. As one of the pioneering works demonstrating the viability of the multi-view-then-reconstruct paradigm, it has had a lasting influence on the field.

The system operates in two key stages. First, a view-conditioned 2D diffusion model (based on Zero123) generates multiple novel views of the object from different camera poses. Second, a cost-volume-based sparse-view reconstruction module processes these generated multi-view images to produce a 3D mesh with texture maps. This design splits the difficult problem of single-image 3D generation into two manageable sub-problems, view synthesis and 3D reconstruction, allowing each stage to be developed and improved independently.
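The two-stage flow can be sketched as a pair of function calls. The functions below are trivial placeholders standing in for the Zero123-based view generator and the cost-volume reconstruction network, and the camera poses are illustrative; this is not the project's actual API.

```python
# Illustrative sketch of the two-stage design, NOT One-2-3-45's real code:
# stage 1 turns one image into several posed views, stage 2 fuses them.

def generate_views(image, poses):
    """Stage 1 (placeholder): one generated view per requested camera pose.
    In the real system, a view-conditioned diffusion model does this."""
    return [{"pose": p, "pixels": image} for p in poses]

def reconstruct_mesh(views):
    """Stage 2 (placeholder): fuse posed views into a single textured mesh.
    In the real system, a cost-volume-based network does this."""
    return {"n_source_views": len(views), "textured": True}

def image_to_3d(image):
    # Hypothetical (azimuth, elevation) pairs sampled around the object.
    poses = [(az, el) for az in (0, 90, 180, 270) for el in (-10, 30)]
    views = generate_views(image, poses)   # one image -> many views
    return reconstruct_mesh(views)         # many views -> one mesh

mesh = image_to_3d("photo.png")
print(mesh["n_source_views"])  # 8
```

The point of the structure is visible even in the stub: `reconstruct_mesh` never sees the original photograph, only the posed views, which is what lets the two stages be swapped or improved independently.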

One-2-3-45's contribution lies in demonstrating that combining pre-trained 2D diffusion models with multi-view 3D reconstruction is a viable and efficient approach to single-image 3D generation. The system achieves reasonable 3D reconstruction quality in under a minute, which was a significant improvement over optimization-based methods at the time of publication that could take hours per object. This speed advantage made iterative design cycles practical in 3D content creation workflows and provided researchers with the ability to perform rapid experimental iteration.

The reconstruction module uses a cost-volume approach that aggregates information from multiple generated views to estimate 3D geometry. This approach is more robust to inconsistencies between generated views than methods that rely on single-view depth estimation. The cost volume evaluates different depth hypotheses to determine the most likely 3D geometry, coherently combining information from multiple views throughout this process. The resulting meshes include both geometry and texture information, providing usable 3D assets for visualization and prototyping purposes.
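The depth-hypothesis idea behind a cost volume can be illustrated with a toy plane sweep. This sketch assumes the source views have already been warped onto the reference camera for every depth hypothesis, and it uses raw cross-view variance as the matching cost; the actual system learns this matching with a neural network rather than a hand-written rule.

```python
import numpy as np

# Toy cost-volume depth selection (an illustration, not the paper's network):
# for each pixel and each depth hypothesis, measure how well the warped source
# views agree; low variance across views means the hypothesis is likely right.

def select_depth(warped, depths):
    """warped: (D, V, H, W) array, V source views warped at each of D depth
    hypotheses. Returns the per-pixel depth minimizing cross-view variance."""
    cost = warped.var(axis=1)      # (D, H, W): disagreement across views
    best = cost.argmin(axis=0)     # (H, W): index of the cheapest hypothesis
    return depths[best]            # (H, W): selected depth per pixel

# Tiny synthetic check: all views agree exactly at hypothesis index 1.
np.random.seed(0)
D, V, H, W = 3, 4, 2, 2
warped = np.random.rand(D, V, H, W)
warped[1] = 0.5                    # identical pixel values across views
depths = np.array([0.5, 1.0, 1.5])
print(select_depth(warped, depths))  # every entry is 1.0
```

Because the cost aggregates over all views at once, a single inconsistent view raises the variance only slightly, which is the intuition behind the robustness claimed above.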

Among the system's limitations, inconsistencies between the generated views can degrade the final 3D reconstruction, and they tend to be more pronounced for objects with complex geometry or fine detail. The cost-volume aggregation partially mitigates these inconsistencies, so the system maintains reasonable quality across a variety of input types and object categories.

Released under the MIT license, One-2-3-45 is fully open-source with code and pre-trained weights available on GitHub. While newer models like InstantMesh and TripoSR have since achieved higher quality and faster generation, One-2-3-45 remains historically important as one of the first systems to demonstrate the multi-view-then-reconstruct paradigm that has become standard in the field. The architectural design principles established by the model continue to form the foundation of subsequent research and applications in single-image 3D reconstruction.

Use Cases

1

3D Generation Research Baseline

Serves as a standard comparison baseline for evaluating new single-image 3D reconstruction methods in academic research publications

2

Quick 3D Prototyping

Generate rough 3D models from reference images in under a minute for design prototyping and concept visualization purposes

3

Educational Tool for 3D AI

Learn about multi-view 3D reconstruction concepts through a well-documented, accessible implementation with clear two-stage pipeline design

4

Pipeline Architecture Reference

Use as an architectural reference for building custom 3D generation pipelines that combine 2D diffusion with 3D reconstruction modules

Pros & Cons

Pros

  • Creates 3D model from a single 2D image in 45 seconds
  • Zero-shot approach — no retraining needed for each object
  • Consistent angle synthesis with multi-view diffusion
  • Open-source research project

Cons

  • Mesh quality behind commercial tools
  • Quality loss in fine details and edge areas
  • Difficulty with asymmetric objects
  • Limited texture quality

Technical Details

Parameters

N/A

License

MIT

Features

  • Single Image to 3D
  • Multi-View Generation Stage
  • Sparse-View Reconstruction
  • Zero123 Based Pipeline
  • Open-Source MIT License
  • UC San Diego Research
  • Mesh Output with Textures
  • Academic Reference Implementation

Benchmark Results

Metric           | Value          | Compared To          | Source
Novel View PSNR  | 18.8 dB (GSO)  | Unique3D: 20.1 dB    | arXiv 2306.16928
Generation Time  | ~45 seconds    | Wonder3D: ~3 minutes | GitHub One-2-3-45
SSIM (GSO)       | 0.842          | Unique3D: 0.922      | arXiv 2306.16928
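For reference, the PSNR figures above use the standard peak signal-to-noise ratio definition. A minimal sketch of the metric (the evaluation images themselves come from the GSO benchmark and are not reproduced here):

```python
import numpy as np

# Standard PSNR in decibels between two images with values in [0, max_val].

def psnr(pred, gt, max_val=1.0):
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10 * np.log10(max_val ** 2 / mse)

# Synthetic check: a uniform error of 0.1 gives MSE = 0.01, hence 20 dB.
gt = np.zeros((4, 4))
print(round(psnr(gt + 0.1, gt), 1))    # 20.0
```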

Available Platforms

Hugging Face
Replicate

Related Models


TripoSR

Stability AI & Tripo|N/A

TripoSR is a fast feed-forward 3D reconstruction model jointly developed by Stability AI and Tripo AI that generates detailed 3D meshes from single input images in under one second. Unlike optimization-based methods that require minutes of processing per object, TripoSR uses a transformer-based architecture built on the Large Reconstruction Model framework to predict 3D geometry directly from a single 2D photograph in a single forward pass. The model accepts any standard image as input and produces a textured 3D mesh suitable for use in game engines, 3D modeling software, and augmented reality applications. TripoSR excels at reconstructing everyday objects, furniture, vehicles, characters, and organic shapes with impressive geometric accuracy and surface detail. Released under the MIT license in March 2024, the model is fully open source and can run on consumer-grade GPUs without specialized hardware. It supports batch processing for efficient conversion of multiple images and integrates seamlessly with popular 3D pipelines including Blender, Unity, and Unreal Engine. The model is particularly valuable for game developers, product designers, and e-commerce teams who need rapid 3D asset creation from product photographs. Output meshes can be exported in OBJ and GLB formats with configurable resolution settings. TripoSR represents a significant step toward democratizing 3D content creation by making high-quality reconstruction accessible without expensive scanning equipment or manual modeling expertise.

Open Source
4.5

TRELLIS

Microsoft Research|Unknown

TRELLIS is a revolutionary AI model developed by Microsoft Research that generates high-quality 3D assets from text descriptions or single 2D images using a novel Structured Latent Diffusion architecture. Released in December 2024, TRELLIS represents a fundamental advancement in 3D content generation by operating in a structured latent space that encodes geometry, texture, and material properties simultaneously rather than treating them as separate stages. The model produces complete 3D meshes with detailed PBR (Physically Based Rendering) textures, enabling direct use in game engines, 3D rendering pipelines, and AR/VR applications without extensive manual post-processing. TRELLIS supports both text-to-3D generation where users describe desired objects in natural language and image-to-3D reconstruction where a single photograph is converted into a full 3D model with inferred geometry from occluded viewpoints. The structured latent representation ensures geometric consistency and prevents the common artifacts seen in other 3D generation approaches such as floating geometry, texture seams, and unrealistic proportions. TRELLIS outputs standard 3D formats including GLB and OBJ with UV-mapped textures, making integration with professional tools like Blender, Unity, and Unreal Engine straightforward. Released under the MIT license, the model is fully open source and available on GitHub. Key applications include rapid 3D asset prototyping for game development, architectural visualization, product design mockups, virtual staging for real estate, educational 3D content creation, and metaverse asset generation. The model particularly benefits indie developers and small studios who lack resources for traditional 3D modeling workflows.

Open Source
4.5

Stable Point Aware 3D (SPA3D)

Stability AI|Unknown

Stable Point Aware 3D (SPA3D) is an advanced feed-forward 3D reconstruction model developed by Stability AI that generates high-quality textured 3D meshes from a single input image in seconds. Unlike iterative optimization-based approaches that require minutes of processing, SPA3D uses a direct feed-forward architecture that predicts 3D geometry and texture in a single pass, making it practical for interactive workflows and production pipelines. The model employs point cloud alignment techniques that significantly improve geometric consistency compared to other single-view reconstruction methods, ensuring that generated 3D models maintain accurate proportions and structural integrity from multiple viewpoints. SPA3D produces industry-standard mesh outputs with clean topology and UV-mapped textures, enabling direct import into 3D software including Blender, Unity, Unreal Engine, and professional CAD tools. The model handles diverse object categories from organic shapes like characters and animals to hard-surface objects like furniture and vehicles, adapting its reconstruction approach to the structural characteristics of each input. Released under the Stability AI Community License, the model is open source for personal and commercial use with revenue-based restrictions. Key applications include rapid 3D asset creation for game development, augmented reality content production, 3D printing preparation, virtual product photography, architectural visualization, and e-commerce 3D product displays. SPA3D is particularly valuable for creative professionals who need quick 3D mockups from concept sketches or photographs without investing hours in manual modeling. The model runs on consumer GPUs and is available through cloud APIs for scalable deployment.

Open Source
4.3

Zero123++

Stability AI|N/A

Zero123++ is a multi-view image generation model developed by Stability AI that generates six consistent canonical views of an object from a single input image. Released in 2023 under the Apache 2.0 license, the model extends the original Zero123 approach with significantly improved view consistency and serves as a critical component in modern 3D reconstruction pipelines. Zero123++ takes a single photograph or rendered image of an object and produces six evenly spaced views covering the full 360-degree range around the object, all maintaining consistent geometry, lighting, and appearance. The model is built on a fine-tuned Stable Diffusion backbone with specialized conditioning mechanisms that ensure multi-view coherence. Unlike the original Zero123 which generates views independently and often produces inconsistent results, Zero123++ generates all six views simultaneously in a single diffusion process, dramatically improving 3D consistency. The generated multi-view images serve as input for downstream 3D reconstruction methods like NeRF, Gaussian Splatting, or direct mesh reconstruction, enabling high-quality 3D model creation from a single photograph. Zero123++ is fully open source with pre-trained weights available on Hugging Face, making it accessible to researchers and developers building 3D generation systems. The model has become a foundational component in many state-of-the-art 3D generation pipelines and is widely used in academic research. It is particularly valuable for applications in game development, product visualization, and virtual reality where converting 2D images to 3D assets is a frequent workflow requirement.

Open Source
4.3

Quick Info

ParametersN/A
Typediffusion
LicenseMIT
Released2023-06
Rating4.0 / 5
CreatorUC San Diego

Tags

one-2-3-45
3d
reconstruction
image-to-3d