Stable Point Aware 3D (SPA3D)
Stable Point Aware 3D (SPA3D) is an advanced feed-forward 3D reconstruction model developed by Stability AI that generates high-quality textured 3D meshes from a single input image in seconds. Unlike iterative optimization-based approaches that require minutes of processing, SPA3D uses a direct feed-forward architecture that predicts 3D geometry and texture in a single pass, making it practical for interactive workflows and production pipelines. The model employs point cloud alignment techniques that significantly improve geometric consistency compared to other single-view reconstruction methods, ensuring that generated 3D models maintain accurate proportions and structural integrity from multiple viewpoints. SPA3D produces industry-standard mesh outputs with clean topology and UV-mapped textures, enabling direct import into 3D software including Blender, Unity, Unreal Engine, and professional CAD tools. The model handles diverse object categories from organic shapes like characters and animals to hard-surface objects like furniture and vehicles, adapting its reconstruction approach to the structural characteristics of each input. Released under the Stability AI Community License, the model is open source for personal and commercial use with revenue-based restrictions. Key applications include rapid 3D asset creation for game development, augmented reality content production, 3D printing preparation, virtual product photography, architectural visualization, and e-commerce 3D product displays. SPA3D is particularly valuable for creative professionals who need quick 3D mockups from concept sketches or photographs without investing hours in manual modeling. The model runs on consumer GPUs and is available through cloud APIs for scalable deployment.
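As a quick illustration of how such output plugs into a standard pipeline, the sketch below loads a generated GLB with the trimesh library, inspects it, and re-exports it to OBJ. The file name is a placeholder, not an actual SPA3D artifact.

```python
# Minimal sketch: inspect and convert a generated mesh with trimesh.
# "spa3d_output.glb" is a placeholder file name for whatever SPA3D produced.
import trimesh

# force="mesh" flattens the glTF scene graph into a single Trimesh object.
mesh = trimesh.load("spa3d_output.glb", force="mesh")
print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")

# Re-export to OBJ for tools that prefer it over GLB.
mesh.export("spa3d_output.obj")
```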
Key Highlights
3D Model from Single Image
Creates detailed and textured 3D mesh models from a single 2D image using point-aware processing.
Texture Quality with UV Mapping
Provides realistic results by applying high-quality textures to 3D model surfaces with automatic UV mapping.
Fast 3D Generation
Enables rapid prototyping by generating 3D models in seconds through an optimized feed-forward pipeline.
Point-Aware Processing
Captures geometric detail more accurately by explicitly modeling how points are distributed in 3D space.
About
SPA3D (Stable Point Aware 3D) is an advanced AI model developed by Stability AI that performs high-quality 3D reconstruction from a single image. Compared with other single-view 3D reconstruction methods, it significantly improves geometric consistency by using point cloud alignment techniques. Integrated into the Stable Diffusion ecosystem, SPA3D is part of Stability AI's strategic investment in the 3D generation domain.
SPA3D's technical architecture is built on three core components. First is a diffusion-based view synthesis module that generates multi-view predictions from the input image. Second is the point cloud alignment system that extracts 3D point clouds from the generated views and uses these point clouds as geometric references. Third is a reconstruction module that produces high-quality mesh and texture from the point cloud reference. This feed-forward architecture produces results much faster than optimization-based methods. The model can generate 3D output from a single image in approximately 10 seconds on an A100 GPU.
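The flow can be summarized in a few lines of Python. The module names below are illustrative stand-ins for the three components described above, not SPA3D's actual API.

```python
# Illustrative sketch of the three-stage feed-forward flow described above.
# All module names are hypothetical stand-ins, not SPA3D's actual API.
import torch

@torch.no_grad()
def reconstruct(image, view_synthesizer, point_aligner, mesh_reconstructor):
    """Single forward pass: image -> multi-view images -> point cloud -> mesh."""
    # Stage 1: diffusion-based synthesis of consistent views of the object.
    views = view_synthesizer(image)            # e.g. (N, 3, H, W) tensor
    # Stage 2: lift the views to an aligned 3D point cloud reference.
    points = point_aligner(views)              # e.g. (P, 3) xyz coordinates
    # Stage 3: reconstruct a textured mesh guided by the point cloud.
    mesh, texture = mesh_reconstructor(views, points)
    return mesh, texture
```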
SPA3D's performance metrics are strong. The approximately 10-second generation time on an A100 GPU is a significant speed advantage over Zero123++'s roughly 45 seconds. An F-Score of 0.452 on the GSO dataset is a notable improvement over One-2-3-45's 0.311. The point cloud alignment technique is particularly effective at preserving fine geometric details and reconstructing complex structures accurately.
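For context, the F-Score cited here is the standard point-cloud metric: the harmonic mean of precision and recall at a fixed distance threshold. A minimal reference implementation (not SPA3D's evaluation code; the threshold value below is illustrative) looks like this:

```python
# Standard point-cloud F-score at a distance threshold tau, as used in
# GSO-style benchmarks. This is the metric definition, not SPA3D code.
import numpy as np
from scipy.spatial import cKDTree

def f_score(pred: np.ndarray, gt: np.ndarray, tau: float = 0.01) -> float:
    """pred, gt: (N, 3) and (M, 3) point sets sampled from the two meshes."""
    # Precision: fraction of predicted points within tau of the ground truth.
    d_pred = cKDTree(gt).query(pred)[0]
    precision = (d_pred < tau).mean()
    # Recall: fraction of ground-truth points within tau of the prediction.
    d_gt = cKDTree(pred).query(gt)[0]
    recall = (d_gt < tau).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```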
SPA3D finds applications in rapid 3D prototyping, e-commerce product visualization, game asset generation, architectural pre-visualization, and augmented reality applications. It serves as a valuable tool particularly in e-commerce scenarios requiring quick 3D model creation from a single product photograph. Designers and 3D artists can use SPA3D to create reference models during prototyping stages.
SPA3D is made accessible as part of Stability AI's open-source strategy. Model weights and inference code are available through Hugging Face. Designed for compatibility with the Stable Diffusion ecosystem, it provides easy integration with existing Stability AI tools. Its PyTorch-based infrastructure is optimized for NVIDIA GPUs.
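Fetching the weights follows the usual Hugging Face pattern. The repository id below is a placeholder; the exact repo name should be taken from Stability AI's release notes.

```python
# Download model files from Hugging Face; the repo id is a placeholder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="stabilityai/spa3d",  # hypothetical repo id, check the release
    allow_patterns=["*.safetensors", "*.json", "*.yaml"],
)
print(f"Model files downloaded to: {local_dir}")
```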
SPA3D is a significant work demonstrating the effectiveness of the point cloud alignment approach in single-image 3D reconstruction. Compared to Wonder3D's cross-domain attention approach and TRELLIS's structured latent representations, SPA3D offers an explicit and interpretable geometric reference system. Its integration with Stability AI's Stable Diffusion ecosystem makes SPA3D accessible to a broad developer community. The balance between speed and quality makes it a preferred solution particularly in production environments.
A closer look at SPA3D's technical approach shows why point cloud alignment improves geometric accuracy. Traditional single-view 3D reconstruction methods can introduce geometric inconsistencies when converting multi-view predictions directly to a mesh. SPA3D addresses this by extracting a point cloud as an intermediate step and using it as a reference. The point cloud serves as an explicit representation of the object's structure in 3D space and guides mesh generation; the advantage is that errors can be detected and corrected early. The model's feed-forward architecture avoids optimization loops, enabling fast inference and making it suitable for production environments. Integration with Stability AI's Stable Diffusion ecosystem makes it easy to use SPA3D alongside existing image generation tools: an image generated with Stable Diffusion can be fed directly to SPA3D and converted to a 3D model within seconds. This integrated workflow creates a powerful pipeline for creative processes and rapid prototyping scenarios.
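A sketch of that workflow follows, using the standard diffusers API for the image-generation half. The SPA3D step is indicated only as a placeholder comment, since its exact entry point depends on the released inference code.

```python
# Stage 1 (real diffusers API): generate a concept image with Stable Diffusion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = pipe("a wooden armchair, studio lighting, plain background").images[0]
image.save("concept.png")

# Stage 2 (placeholder): feed the image to SPA3D's inference code.
# The entry point below is hypothetical; consult the released repository.
#   python run_spa3d.py concept.png --output-dir assets/
```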
Use Cases
Rapid 3D Prototyping
Quickly creating 3D prototypes from 2D drawings or photographs during the design process.
Game and AR/VR Asset Creation
Rapid 3D asset creation for game, augmented reality, and virtual reality projects.
E-Commerce 3D Visualization
Creating 3D models from product photos to offer interactive product experience.
Digital Twin Creation
Rapidly creating digital 3D copies of real-world objects for simulation and analysis.
Pros & Cons
Pros
- Stability AI's image-to-3D model — part of Stable 3D series
- Enhanced geometry with point cloud-based 3D awareness
- Practical use with fast inference time
- Published as open source
Cons
- Early-stage model — behind mature competitors
- Limited texture and material quality
- Uncertain future due to Stability AI's financial issues
- Inconsistencies in complex geometries
Technical Details
Parameters
Unknown
Architecture
Feed-forward 3D reconstruction
Training Data
Objaverse + proprietary
License
Stability AI Community License
Features
- Single image input
- Textured mesh
- UV mapping
- Fast generation
- GLB export
- Point-aware processing
- Multi-view generation
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| Generation Time (Single Image) | ~10 seconds (A100) | Zero123++: ~45 seconds | Stability AI Blog |
| F-Score (GSO Dataset) | 0.452 | TripoSR: 0.421 | SPA3D Technical Report |
| Novel View PSNR | 21.5 dB | One-2-3-45: 19.2 dB | Papers With Code |
Related Models
TripoSR
TripoSR is a fast feed-forward 3D reconstruction model jointly developed by Stability AI and Tripo AI that generates detailed 3D meshes from single input images in under one second. Unlike optimization-based methods that require minutes of processing per object, TripoSR uses a transformer-based architecture built on the Large Reconstruction Model framework to predict 3D geometry directly from a single 2D photograph in a single forward pass. The model accepts any standard image as input and produces a textured 3D mesh suitable for use in game engines, 3D modeling software, and augmented reality applications. TripoSR excels at reconstructing everyday objects, furniture, vehicles, characters, and organic shapes with impressive geometric accuracy and surface detail. Released under the MIT license in March 2024, the model is fully open source and can run on consumer-grade GPUs without specialized hardware. It supports batch processing for efficient conversion of multiple images and integrates seamlessly with popular 3D pipelines including Blender, Unity, and Unreal Engine. The model is particularly valuable for game developers, product designers, and e-commerce teams who need rapid 3D asset creation from product photographs. Output meshes can be exported in OBJ and GLB formats with configurable resolution settings. TripoSR represents a significant step toward democratizing 3D content creation by making high-quality reconstruction accessible without expensive scanning equipment or manual modeling expertise.
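For comparison, TripoSR's published inference code can be driven in a few lines. The calls below follow the VAST-AI/TripoSR repository as of its initial release; argument names may have changed, so treat this as an approximation and check the current README.

```python
# Approximate sketch of TripoSR inference per the VAST-AI/TripoSR repo;
# verify the current API in the repository before relying on it.
import torch
from PIL import Image
from tsr.system import TSR

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TSR.from_pretrained(
    "stabilityai/TripoSR", config_name="config.yaml", weight_name="model.ckpt"
)
model.to(device)

image = Image.open("product_photo.png")        # placeholder input image
scene_codes = model([image], device=device)    # single forward pass
meshes = model.extract_mesh(scene_codes)       # marching cubes to a mesh
meshes[0].export("product.obj")
```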
TRELLIS
TRELLIS is a revolutionary AI model developed by Microsoft Research that generates high-quality 3D assets from text descriptions or single 2D images using a novel Structured Latent Diffusion architecture. Released in December 2024, TRELLIS represents a fundamental advancement in 3D content generation by operating in a structured latent space that encodes geometry, texture, and material properties simultaneously rather than treating them as separate stages. The model produces complete 3D meshes with detailed PBR (Physically Based Rendering) textures, enabling direct use in game engines, 3D rendering pipelines, and AR/VR applications without extensive manual post-processing. TRELLIS supports both text-to-3D generation where users describe desired objects in natural language and image-to-3D reconstruction where a single photograph is converted into a full 3D model with inferred geometry from occluded viewpoints. The structured latent representation ensures geometric consistency and prevents the common artifacts seen in other 3D generation approaches such as floating geometry, texture seams, and unrealistic proportions. TRELLIS outputs standard 3D formats including GLB and OBJ with UV-mapped textures, making integration with professional tools like Blender, Unity, and Unreal Engine straightforward. Released under the MIT license, the model is fully open source and available on GitHub. Key applications include rapid 3D asset prototyping for game development, architectural visualization, product design mockups, virtual staging for real estate, educational 3D content creation, and metaverse asset generation. The model particularly benefits indie developers and small studios who lack resources for traditional 3D modeling workflows.
Zero123++
Zero123++ is a multi-view image generation model developed by Sudo AI that generates six consistent canonical views of an object from a single input image. Released in 2023 under the Apache 2.0 license, the model extends the original Zero123 approach with significantly improved view consistency and serves as a critical component in modern 3D reconstruction pipelines. Zero123++ takes a single photograph or rendered image of an object and produces six evenly spaced views covering the full 360-degree range around the object, all maintaining consistent geometry, lighting, and appearance. The model is built on a fine-tuned Stable Diffusion backbone with specialized conditioning mechanisms that ensure multi-view coherence. Unlike the original Zero123 which generates views independently and often produces inconsistent results, Zero123++ generates all six views simultaneously in a single diffusion process, dramatically improving 3D consistency. The generated multi-view images serve as input for downstream 3D reconstruction methods like NeRF, Gaussian Splatting, or direct mesh reconstruction, enabling high-quality 3D model creation from a single photograph. Zero123++ is fully open source with pre-trained weights available on Hugging Face, making it accessible to researchers and developers building 3D generation systems. The model has become a foundational component in many state-of-the-art 3D generation pipelines and is widely used in academic research. It is particularly valuable for applications in game development, product visualization, and virtual reality where converting 2D images to 3D assets is a frequent workflow requirement.
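The released checkpoints ship with a custom diffusers pipeline; the snippet below follows the sudo-ai/zero123plus model card, though checkpoint versions and defaults should be verified against it.

```python
# Generate six consistent views with Zero123++ via its custom diffusers
# pipeline, following the sudo-ai model card (verify version/defaults there).
import torch
from PIL import Image
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
).to("cuda")

cond = Image.open("input.png")                 # single object-centric image
grid = pipeline(cond, num_inference_steps=75).images[0]
grid.save("six_views.png")                     # 3x2 grid of the six views
```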
InstantMesh
InstantMesh is a feed-forward 3D mesh generation model developed by Tencent that creates high-quality textured 3D meshes from single input images through a multi-view generation and sparse-view reconstruction pipeline. Released in April 2024 under the Apache 2.0 license, InstantMesh combines a multi-view diffusion model with a large reconstruction model to achieve both speed and quality in single-image 3D reconstruction. The pipeline first generates multiple consistent views of the input object using a fine-tuned multi-view diffusion model, then feeds these views into a transformer-based reconstruction network that predicts a triplane neural representation, which is finally converted to a textured mesh. This two-stage approach produces significantly higher quality results than single-stage methods while maintaining generation times of just a few seconds. InstantMesh supports both text-to-3D workflows when combined with an image generation model and direct image-to-3D conversion from photographs or artwork. The output meshes include detailed geometry and texture maps compatible with standard 3D software and game engines. The model handles a wide variety of object types including characters, vehicles, furniture, and organic shapes with good geometric fidelity. As an open-source project with code and weights available on GitHub and Hugging Face, InstantMesh has become a popular choice for developers building 3D asset generation pipelines. It is particularly useful for game development, e-commerce product visualization, and rapid prototyping scenarios where fast turnaround and reasonable quality are both important requirements.