What is the relationship between OpenLRM, TripoSR, and InstantMesh?

OpenLRM provides the foundational open-source implementation of the Large Reconstruction Model (LRM) architecture for single-image 3D reconstruction. TripoSR, developed by Stability AI and Tripo AI, builds upon the LRM architecture with optimizations for speed, achieving sub-second generation. InstantMesh, from Tencent, extends the concept with multi-view generation and FlexiCubes extraction for higher quality output. Both TripoSR and InstantMesh credit OpenLRM and the LRM architecture as key influences on their design.

How fast does OpenLRM generate 3D models?

OpenLRM generates 3D models through feed-forward inference in approximately 5-15 seconds, depending on model size and GPU hardware. This is dramatically faster than optimization-based methods like DreamFusion, which can take hours per object. It is slower than TripoSR's sub-second generation, because OpenLRM follows the original LRM design without the speed optimizations applied in later models. In exchange, OpenLRM provides a clean codebase suited to research.

Can OpenLRM be used commercially?

Partly. The OpenLRM code is released under the Apache 2.0 license, which permits commercial use, modification, and distribution without licensing fees. The pre-trained weights are a separate matter: the project's model cards release them under the Creative Commons Attribution-NonCommercial 4.0 license, for research use only. For commercial products or services, the practical route is to use the open training code to train a custom model on data you have the rights to, and to check the current license terms in the repository and on Hugging Face before shipping.

What hardware does OpenLRM require?

For inference, OpenLRM needs a GPU with roughly 8-12GB of VRAM for the smaller model variants and 16-24GB for the larger ones. An NVIDIA RTX 3080 or equivalent GPU performs well for standard use. Training from scratch requires significantly more resources, typically a multi-GPU setup with high-memory GPUs. The pre-trained checkpoints on Hugging Face allow inference right away, with no training required.

What output formats does OpenLRM support?

OpenLRM generates triplane-NeRF representations that can be used in two ways. First, the triplane can be queried for volumetric rendering to produce novel view images of the reconstructed object. Second, mesh extraction through marching cubes converts the triplane representation to a polygonal mesh that can be exported in standard 3D formats. The mesh output is suitable for use in 3D software, game engines, and other standard 3D workflows.
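As a rough illustration of the first path, volumetric rendering blends the density and color values sampled along each camera ray into a single pixel. The sketch below is a minimal, pure-Python version of the standard front-to-back compositing rule used by NeRF-style renderers. The function name `composite_ray`, its arguments, and the fixed step size are assumptions made for illustration and are not taken from the OpenLRM codebase.

```python
import math


def composite_ray(densities, colors, step):
    """Alpha-composite samples along one ray, ordered front to back.

    densities: non-negative density value per sample
    colors:    (r, g, b) tuple per sample, each channel in [0, 1]
    step:      distance between consecutive samples (assumed constant here)
    Returns the blended (r, g, b) list and the accumulated opacity.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    for sigma, color in zip(densities, colors):
        # Opacity contributed by this sample.
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        for i in range(3):
            rgb[i] += weight * color[i]
        # Light remaining for the samples behind this one.
        transmittance *= 1.0 - alpha
    return rgb, 1.0 - transmittance


# Two half-opaque samples: red in front, green behind.
half = math.log(2.0)  # density that gives alpha = 0.5 when step = 1
pixel, opacity = composite_ray([half, half], [(1, 0, 0), (0, 1, 0)], step=1.0)
print(pixel, opacity)
```

The front sample contributes half of the pixel, and the sample behind it contributes half of what remains, which is why surfaces closer to the camera dominate the rendered view.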

How does OpenLRM handle different types of input images?

OpenLRM works best with single-object images where the subject is clearly visible and well-lit. The Vision Transformer encoder provides robust feature extraction across diverse image types including photographs, renders, and artwork. Images with clean backgrounds produce better results than complex scenes. The model generalizes reasonably well across different object categories, though performance may vary based on how well the object category was represented in the training data.
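One common way to give the model a clean background is to cut the subject out first and place it on a plain color. The sketch below assumes you already have RGBA pixels from a separate background-removal step, and flattens them onto white. The helper `composite_on_white` is a hypothetical name for illustration, and the choice of white is an assumption rather than a documented OpenLRM requirement.

```python
def composite_on_white(rgba_pixels):
    """Flatten RGBA pixels onto a white background.

    rgba_pixels: iterable of (r, g, b, a) tuples with 8-bit values (0-255),
                 where a = 0 is fully transparent background.
    Returns a list of (r, g, b) tuples with 8-bit values.
    """
    flattened = []
    for r, g, b, a in rgba_pixels:
        alpha = a / 255.0
        # Blend each channel with white (255) according to the alpha value.
        flattened.append(
            tuple(round(c * alpha + 255 * (1.0 - alpha)) for c in (r, g, b))
        )
    return flattened


# An opaque subject pixel stays as-is; a transparent pixel becomes white.
print(composite_on_white([(10, 20, 30, 255), (10, 20, 30, 0)]))
```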

OpenLRM

Open Source

4.1

Zexin He and Tengfei Wang (3DTopia)

OpenLRM is an open-source implementation of the Large Reconstruction Model architecture for single-image 3D reconstruction, developed by Zexin He and Tengfei Wang (3DTopia). The project provides a fully open and reproducible implementation of the LRM approach, which uses a transformer-based architecture to predict 3D representations from single input images in a feed-forward manner. OpenLRM processes an input image through a pre-trained vision encoder such as DINOv2, then feeds the resulting features into a transformer decoder that generates a triplane-based neural radiance field representation, which can be rendered from novel viewpoints or converted to a textured 3D mesh. The entire reconstruction takes only a few seconds on a modern GPU, making it practical for interactive applications and batch processing workflows. Released under the Apache 2.0 license in December 2023, OpenLRM fills a critical gap in the 3D AI research community by providing an accessible reference implementation that researchers can study, modify, and build upon. The triplane output can be rendered as images or exported as a mesh, so it can be integrated into existing 3D pipelines for applications ranging from game development to e-commerce product visualization. OpenLRM handles diverse object categories including furniture, vehicles, characters, and everyday items with reasonable geometric fidelity. Pre-trained model weights are available on Hugging Face for immediate use. As one of the foundational open-source projects in feed-forward 3D reconstruction, OpenLRM has directly influenced and enabled numerous downstream projects and research efforts in the rapidly evolving single-image 3D generation space.

Text to 3D

Image to 3D

Key Highlights

Foundation LRM Architecture

Open-source reference implementation of the Large Reconstruction Model architecture that influenced subsequent models including TripoSR and InstantMesh

Triplane-NeRF 3D Representation

Uses three axis-aligned feature planes to encode 3D geometry and appearance, enabling both volumetric rendering and mesh extraction from a compact representation

Vision Transformer Encoding

Leverages pre-trained Vision Transformer (ViT) encoders for robust visual feature extraction, providing strong generalization across diverse input image types

Reproducible Open Research

Provides fully reproducible training and inference code with pre-trained checkpoints, enabling the research community to build upon and extend the LRM paradigm

About

OpenLRM is an open-source implementation of the Large Reconstruction Model (LRM) architecture for single-image 3D reconstruction, developed by Zexin He and Tengfei Wang (3DTopia). The project provides a fully open and reproducible implementation of the LRM approach, which uses a transformer-based architecture to reconstruct 3D objects from single images through a triplane neural radiance field representation. By making the LRM paradigm widely accessible, OpenLRM has served as the foundation for numerous subsequent 3D reconstruction models.

The LRM architecture processes an input image through a pre-trained vision transformer (ViT) encoder to extract visual features, then uses a transformer decoder to predict a triplane representation of the 3D object. This triplane representation consists of three axis-aligned feature planes that encode the object's geometry and appearance. The triplane can be queried at any 3D point to obtain density and color values, enabling both volumetric rendering and mesh extraction through marching cubes. DINO and DINOv2-based vision encoders extract strong semantic features from the input image, enhancing reconstruction accuracy and enabling the model to produce consistent results across different object types.
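To make the triplane lookup concrete, the sketch below queries three small feature planes at a 3D point: the point is projected onto each axis-aligned plane, the plane is sampled with bilinear interpolation, and the three feature vectors are joined. This is a simplified, pure-Python illustration under stated assumptions. The names `bilinear` and `query_triplane`, the `[-1, 1]` coordinate range, and the choice to concatenate features rather than sum them are all assumptions, and the small decoder network that would turn the features into density and color is omitted.

```python
import math


def bilinear(plane, u, v):
    """Sample a feature plane at continuous coordinates (u, v) in [-1, 1].

    plane: list of rows; each cell is a list of feature channels.
    """
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] to pixel coordinates and clamp to the plane bounds.
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    features = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - tx) + plane[y0][x1][c] * tx
        bottom = plane[y1][x0][c] * (1 - tx) + plane[y1][x1][c] * tx
        features.append(top * (1 - ty) + bottom * ty)
    return features


def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and join the features."""
    x, y, z = point
    return (
        bilinear(planes["xy"], x, y)
        + bilinear(planes["xz"], x, z)
        + bilinear(planes["yz"], y, z)
    )


# Toy 2x2 planes with one feature channel each.
toy = [[[0.0], [1.0]], [[2.0], [3.0]]]
planes = {"xy": toy, "xz": toy, "yz": toy}
print(query_triplane(planes, (0.0, 0.0, 0.0)))  # [1.5, 1.5, 1.5]
```

Because every 3D point reads from only three 2D grids, the representation stays compact compared with a dense voxel grid of the same resolution.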

OpenLRM's significance lies in its open implementation of a powerful 3D reconstruction paradigm. While the original LRM paper described the architecture, OpenLRM made the approach accessible to the broader research and development community by providing pre-trained weights, training code, and inference scripts. This has enabled numerous downstream projects and research efforts to build upon the LRM foundation. The project offers multiple model variants at different scales, allowing users to adjust the speed-quality trade-off according to their specific needs, and this flexibility provides value in both research and production environments.

OpenLRM itself is an image-to-3D model. Text-to-3D workflows are possible only indirectly, by first generating an image with a separate text-to-image model and passing that image to OpenLRM. Feed-forward inference enables generation in seconds rather than the minutes or hours required by optimization-based methods. The output triplane representation can be rendered as novel views or extracted as a textured 3D mesh for use in standard 3D applications. The transformer-based architecture captures global context from the input image, which lets the model produce a consistent reconstruction from a single partial view and fill in plausible geometry for the unseen portions of the object.

In terms of training data, OpenLRM was trained on Objaverse, with some released variants additionally trained on MVImgNet, and demonstrates generalization capacity across diverse object categories. The project open-sources the entire training process, enabling researchers to retrain or fine-tune the model on their own datasets. This transparency has raised reproducibility standards in academic research and facilitated rapid iteration across the community of researchers and developers working in this space.

The OpenLRM code is released under the Apache 2.0 license and is freely available for research and commercial applications, while the pre-trained checkpoints carry a separate non-commercial (CC BY-NC 4.0) license. The checkpoints are hosted on Hugging Face at multiple scales. Several subsequent 3D reconstruction models, including TripoSR and InstantMesh, built upon and improved the LRM architectural pattern, and OpenLRM continues to serve as a reference implementation for the field.

Use Cases

3D Reconstruction Research

Serves as a baseline and starting point for academic research in feed-forward 3D reconstruction, providing reproducible results for comparison studies

Custom Model Development

Use OpenLRM's architecture and training code as a foundation for developing specialized 3D reconstruction models fine-tuned on domain-specific datasets

Rapid 3D Asset Generation

Generate 3D models from reference images quickly for prototyping, visualization, and content creation workflows requiring fast turnaround

Pipeline Component Integration

Integrate as a 3D reconstruction component within larger content creation or processing pipelines alongside image generation and post-processing tools

Pros & Cons

Pros

Open-source model for 3D reconstruction from single images
Open implementation of the Large Reconstruction Model concept
Efficient transformer-based architecture
Free to use for research and prototyping

Cons

Production quality lower than commercial solutions
Limited resolution and detail level
Errors in geometry estimation from unseen angles
Documentation and community support limited

Technical Details

Parameters

N/A

License

Apache 2.0

Features

Single Image to 3D Reconstruction
Large Reconstruction Model Architecture
Triplane-NeRF Representation
Fast Feed-Forward Inference
Open-Source Apache 2.0
Multiple Resolution Support
Mesh Export Capability
Hugging Face Integration

Benchmark Results

Metric	Value	Compared To	Source
Novel View PSNR	21.0 dB (GSO)	InstantMesh: 22.2 dB	GitHub 3DTopia/OpenLRM
SSIM (GSO)	0.856	InstantMesh: 0.880	GitHub 3DTopia/OpenLRM
Generation Time	~5 seconds	—	GitHub 3DTopia/OpenLRM
Parameter Count	~300M	—	Hugging Face Model Card
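
For readers unfamiliar with the PSNR figures in the table, the metric is a logarithmic measure of the mean squared error between a rendered view and the ground-truth image. The sketch below shows the standard definition in pure Python. The function name `psnr` and the flat-list image layout are assumptions for illustration, and this is not the evaluation script behind the numbers above.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    reference, rendered: flat sequences of pixel intensities
    max_value:           largest possible intensity (1.0 for normalized images)
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives a PSNR of about 20 dB.
print(psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1]))
```

On this scale, a PSNR of 21.0 dB corresponds to a mean squared error of about 0.0079 on normalized intensities, and 22.2 dB to about 0.0060, so the gap in the table reflects a modest but measurable difference in per-pixel error.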

Available Platforms

Hugging Face

Replicate

Frequently Asked Questions

Related Models

TripoSR

Stability AI & Tripo|N/A

TripoSR is a fast feed-forward 3D reconstruction model jointly developed by Stability AI and Tripo AI that generates detailed 3D meshes from single input images in under one second. Unlike optimization-based methods that require minutes of processing per object, TripoSR uses a transformer-based architecture built on the Large Reconstruction Model framework to predict 3D geometry directly from a single 2D photograph in a single forward pass. The model accepts any standard image as input and produces a textured 3D mesh suitable for use in game engines, 3D modeling software, and augmented reality applications. TripoSR excels at reconstructing everyday objects, furniture, vehicles, characters, and organic shapes with impressive geometric accuracy and surface detail. Released under the MIT license in March 2024, the model is fully open source and can run on consumer-grade GPUs without specialized hardware. It supports batch processing for efficient conversion of multiple images and integrates seamlessly with popular 3D pipelines including Blender, Unity, and Unreal Engine. The model is particularly valuable for game developers, product designers, and e-commerce teams who need rapid 3D asset creation from product photographs. Output meshes can be exported in OBJ and GLB formats with configurable resolution settings. TripoSR represents a significant step toward democratizing 3D content creation by making high-quality reconstruction accessible without expensive scanning equipment or manual modeling expertise.

Open Source

4.5

TRELLIS

Microsoft Research|Unknown

TRELLIS is a revolutionary AI model developed by Microsoft Research that generates high-quality 3D assets from text descriptions or single 2D images using a novel Structured Latent Diffusion architecture. Released in December 2024, TRELLIS represents a fundamental advancement in 3D content generation by operating in a structured latent space that encodes geometry, texture, and material properties simultaneously rather than treating them as separate stages. The model produces complete 3D meshes with detailed PBR (Physically Based Rendering) textures, enabling direct use in game engines, 3D rendering pipelines, and AR/VR applications without extensive manual post-processing. TRELLIS supports both text-to-3D generation where users describe desired objects in natural language and image-to-3D reconstruction where a single photograph is converted into a full 3D model with inferred geometry from occluded viewpoints. The structured latent representation ensures geometric consistency and prevents the common artifacts seen in other 3D generation approaches such as floating geometry, texture seams, and unrealistic proportions. TRELLIS outputs standard 3D formats including GLB and OBJ with UV-mapped textures, making integration with professional tools like Blender, Unity, and Unreal Engine straightforward. Released under the MIT license, the model is fully open source and available on GitHub. Key applications include rapid 3D asset prototyping for game development, architectural visualization, product design mockups, virtual staging for real estate, educational 3D content creation, and metaverse asset generation. The model particularly benefits indie developers and small studios who lack resources for traditional 3D modeling workflows.

Open Source

4.5

Meshy

Meshy AI|N/A

Meshy is a proprietary AI-powered 3D generation platform developed by Meshy AI that creates detailed, production-ready 3D models from text descriptions and images. The platform combines text-to-3D and image-to-3D capabilities with advanced AI texturing features, positioning itself as a comprehensive solution for rapid 3D content creation. Meshy uses a transformer-based architecture that generates textured 3D meshes with PBR-compatible materials, making outputs directly usable in game engines like Unity and Unreal Engine without additional processing. The platform offers multiple generation modes including text-to-3D for creating objects from written descriptions, image-to-3D for converting photographs into 3D models, and AI texturing for applying realistic materials to existing untextured meshes. Generated models include proper UV mapping, normal maps, and physically based rendering materials suitable for professional workflows. Meshy provides both a web-based interface and an API for programmatic access, making it accessible to individual artists and scalable for enterprise pipelines. The platform is particularly popular among game developers, animation studios, and AR/VR content creators who need to produce large volumes of 3D assets efficiently. As a proprietary commercial service launched in 2023, Meshy operates on a subscription model with free tier access for limited generations. The platform continuously updates its models to improve output quality, topology optimization, and texture fidelity, competing directly with other AI 3D generation services in the rapidly evolving market.

Proprietary

4.4

Meshy v4

Meshy AI|undisclosed

Meshy v4 is the fourth generation of Meshy AI's 3D model generation platform, capable of creating detailed, textured 3D models from text descriptions and images in minutes. Released in late 2024, Meshy v4 represents a major upgrade in mesh quality, texture fidelity, and topology optimization over previous versions. The model generates production-ready 3D assets with clean topology suitable for game engines, animation pipelines, and 3D printing. Meshy v4 supports both text-to-3D and image-to-3D generation workflows, with the image-to-3D mode producing particularly impressive results by accurately capturing shape, proportions, and surface details from reference photographs. The platform generates textured meshes with PBR (Physically Based Rendering) materials including diffuse, normal, roughness, and metallic maps, making outputs immediately compatible with Unity, Unreal Engine, and Blender. Generated models can be exported in multiple formats including GLB, OBJ, FBX, and STL. Meshy v4 features improved detail preservation, better handling of thin structures and complex geometries, and more accurate color and texture mapping. The platform serves game developers, 3D artists, architects, product designers, and content creators who need rapid 3D asset creation without manual modeling expertise. A freemium model offers limited free generations with paid plans providing higher quality, more generations, and commercial licensing.

Proprietary

4.5

Quick Info

Parameters: N/A

Type: transformer

License: Apache 2.0

Released: 2023-12

Rating: 4.1 / 5

Creator: Zexin He and Tengfei Wang (3DTopia)
