What is Surya OCR and what is it used for?

Surya OCR is an open-source document OCR system developed by Vik Paruchuri. It performs text detection, recognition, layout analysis, and table detection in over 90 languages, with speed and accuracy that can compete with commercial solutions.

What is the difference between Surya OCR and Tesseract?

Surya OCR uses modern deep learning techniques to offer higher accuracy than Tesseract, especially in complex layouts and multilingual documents. It includes advanced features like layout analysis and table detection. It is also faster in batch processing thanks to GPU optimization.

Does Surya OCR support Turkish?

Yes, Surya OCR supports over 90 languages and Turkish is among them. Turkish-specific characters (ş, ğ, ü, ö, ç, ı) are correctly recognized. Layout analysis and reading order determination for complex Turkish documents are also supported.

What hardware is needed to run Surya OCR?

Surya OCR can run on both CPU and GPU. Much faster results are obtained with GPU, especially for large document collections. A GPU with at least 4GB VRAM is recommended. It also works in CPU mode but processing time increases significantly.

How does Surya OCR compare to PaddleOCR?

Both Surya OCR and PaddleOCR are strong open-source OCR solutions. Surya particularly stands out in layout analysis and reading order determination. PaddleOCR stands out for its lightweight models and mobile deployment optimization. Choose based on your use case.

Can Surya OCR extract table data?

Yes, Surya OCR can automatically detect tables in documents and extract them as structured data. It correctly identifies table rows, columns, and cell contents. This feature is particularly valuable in use cases like invoice processing and financial document analysis.
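To make "structured data" concrete, here is a minimal sketch of turning detected table cells into header-keyed records. The `Cell` type and its `row`/`col` indices are hypothetical stand-ins for whatever a table recognizer returns; this is not Surya's API, only an illustration of the post-processing step.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell record: zero-based grid position plus recognized text.
    row: int
    col: int
    text: str


def cells_to_rows(cells):
    """Arrange cells into a rectangular grid, filling missing cells with ''."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    return grid


def rows_to_records(rows):
    """Treat the first row as the header and return one dict per body row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up invoice table with one missing cell (Toner has no quantity).
invoice_cells = [
    Cell(0, 0, "Item"), Cell(0, 1, "Qty"), Cell(0, 2, "Price"),
    Cell(1, 0, "Paper"), Cell(1, 1, "10"), Cell(1, 2, "4.50"),
    Cell(2, 0, "Toner"), Cell(2, 2, "62.00"),
]
records = rows_to_records(cells_to_rows(invoice_cells))
```

Filling gaps with empty strings keeps every record aligned with the header, which matters for invoices where some cells are blank.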

Surya OCR

Open Source

4.5

VikParuchuri

Surya OCR is a modern AI-powered optical character recognition model developed by Vik Paruchuri that supports over 90 languages across diverse document types. Built on a Vision Transformer architecture inspired by the Donut framework, Surya uses an encoder-decoder model for recognition, working alongside line-level text detection. The model extracts text content along with bounding box coordinates, enabling both full-text extraction and position-aware document understanding. Beyond character recognition, Surya includes a document layout analysis module that identifies structural elements such as headers, paragraphs, tables, figures, lists, and captions. The model handles complex document layouts including multi-column pages, academic papers with equations, invoices with tabular data, and historical documents with non-standard typography. Surya achieves competitive or superior accuracy compared to commercial OCR services on many benchmarks while running locally without cloud API calls, making it suitable for privacy-sensitive document processing. Released under the GPL-3.0 license, the model is open source and actively maintained. It provides a Python API and a command-line interface for batch processing. Key applications include digitizing printed documents, extracting structured data from invoices and receipts, converting scanned books and academic papers to searchable text, processing legal and medical documents, archival document preservation, and building document understanding pipelines for enterprise content management systems. Surya is particularly valued for its multilingual support covering Latin, Cyrillic, CJK, Arabic, Devanagari, and many other scripts.

OCR


Key Highlights

Support for 90+ Languages

Meets multilingual document processing needs with text detection and recognition in over 90 languages.

Advanced Layout Analysis

Determines correct reading order by automatically detecting document structure, columns, headings, and paragraphs.

Table Detection and Extraction

Automatically detects tables in documents and extracts them as structured data.

High Speed with GPU

Meets batch OCR needs by rapidly processing large document collections thanks to GPU optimization.

About

Surya OCR is a modern AI model developed for document-level multilingual optical character recognition, supporting over 90 languages with impressive accuracy across diverse document types. This high-performance model uses an encoder-decoder architecture based on the Donut framework, extracting image features with a Swin Transformer encoder and generating text with an mBART decoder. Unlike traditional OCR systems, Surya employs an end-to-end deep learning architecture that excels in complex document layouts where conventional rule-based approaches struggle significantly.

Surya's architecture includes a transformer-based text recognition module and an advanced layout analysis module that work in concert to understand document structure. The layout analysis automatically detects and classifies different elements in the document such as text blocks, tables, headers, footnotes, captions, and images with high precision. This provides users with rich structural information about the document, ensuring that text output faithfully reflects the original document format and reading order. Multi-column newspaper pages, complex nested table structures, interleaved lists, and mixed-layout academic papers are processed successfully with high fidelity. The line detection module can correctly identify and process skewed and rotated text segments.
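The idea of recovering reading order from detected blocks can be illustrated with a simple geometric heuristic. The sketch below is not Surya's method (which is model-based); it assumes a hypothetical `Block` type with axis-aligned coordinates, groups blocks into columns by horizontal overlap, then reads each column top to bottom.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical layout block: text plus an axis-aligned bounding box.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(blocks):
    """Order blocks column by column (left to right), top to bottom in each."""
    columns = []
    for block in sorted(blocks, key=lambda b: b.x0):
        for col in columns:
            left = min(b.x0 for b in col)
            right = max(b.x1 for b in col)
            # A block joins a column when their horizontal extents overlap.
            if block.x0 < right and block.x1 > left:
                col.append(block)
                break
        else:
            columns.append([block])
    columns.sort(key=lambda col: min(b.x0 for b in col))
    return [b for col in columns for b in sorted(col, key=lambda b: b.y0)]


# Two-column page given in scrambled order.
page = [
    Block("D", 120, 60, 220, 110),
    Block("A", 0, 0, 100, 50),
    Block("B", 120, 0, 220, 50),
    Block("C", 0, 60, 100, 110),
]
ordered = [b.text for b in reading_order(page)]
```

A heuristic like this breaks down on full-width headings, nested tables, and interleaved figures, which is the kind of layout where a learned layout model is expected to help.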

The model can recognize multiple writing systems including Latin, Cyrillic, Arabic, Chinese, Japanese, Korean, and Indic scripts with consistent accuracy. This extensive language and script support provides significant advantages in international document processing, archive digitization, and multilingual content management projects across organizations. Turkish character recognition performance achieves high accuracy rates including all special characters (ç, ğ, ı, ö, ş, ü), making it reliable for Turkish-language document processing workflows and historical Ottoman document digitization efforts.
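One way to spot-check Turkish character handling on your own documents is to compare OCR output against a known transcription. The sketch below is a hypothetical helper, independent of any OCR library: it counts the Turkish-specific letters in both strings and reports the ones whose counts differ.

```python
import unicodedata
from collections import Counter

# Turkish-specific letters, lower and upper case.
TURKISH_CHARS = "çğıöşüÇĞİÖŞÜ"


def turkish_char_diff(reference, ocr_text):
    """Return {char: ocr_count - reference_count} for counts that differ."""
    # Normalize to NFC so decomposed forms (e.g. 's' + cedilla) compare equal.
    reference = unicodedata.normalize("NFC", reference)
    ocr_text = unicodedata.normalize("NFC", ocr_text)
    ref = Counter(ch for ch in reference if ch in TURKISH_CHARS)
    got = Counter(ch for ch in ocr_text if ch in TURKISH_CHARS)
    return {ch: got[ch] - ref[ch] for ch in TURKISH_CHARS if got[ch] != ref[ch]}


# Made-up example: the first word lost its diacritics, the second did not.
diff = turkish_char_diff("çağrı şöför", "cagri şöför")
```

A negative value means the OCR output dropped that character, typically by replacing it with its ASCII look-alike. Counting ignores position, so this is a quick screen rather than a full character error rate.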

Surya OCR achieves competitive results on ICDAR benchmarks and demonstrates performance comparable to commercial solutions such as Google Cloud Vision and AWS Textract, while remaining free and open source without usage limits. Compared to traditional OCR tools like Tesseract, it performs notably better on low-resolution scans, degraded documents, and complex layouts. It accepts PDFs, image files (JPEG, PNG, TIFF, WebP), and scanned documents as input.

Available as open source on GitHub, Surya OCR can be installed via pip and used programmatically through its Python API. The CLI tool supports batch processing, which suits automatic digitization of large archives and document collections. It produces structured output in JSON and hOCR formats, facilitating integration with search engines, document management systems, and downstream applications for indexing and retrieval. Inference is fast on GPU, and CPU performance is reasonable for smaller workloads.
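As an illustration of consuming JSON output for indexing, the sketch below flattens per-line results into one text string per page. The schema used here (document name mapped to a list of pages, each with `page` and `text_lines` entries carrying `text`, `bbox`, and `confidence`) is an assumption made for the example; check the actual output of your installed version before relying on these field names.

```python
import json

# Assumed, illustrative output shape; real field names may differ.
sample = """
{"invoice_001": [
  {"page": 1, "text_lines": [
    {"text": "Invoice No. 42", "bbox": [40, 30, 300, 60], "confidence": 0.98},
    {"text": "Total: 66.50", "bbox": [40, 400, 220, 430], "confidence": 0.61}
  ]}
]}
"""


def page_texts(results, min_confidence=0.0):
    """Map (document, page) to its text, skipping low-confidence lines."""
    out = {}
    for doc_name, pages in results.items():
        for page in pages:
            lines = [
                line["text"]
                for line in page["text_lines"]
                if line.get("confidence", 1.0) >= min_confidence
            ]
            out[(doc_name, page["page"])] = "\n".join(lines)
    return out


results = json.loads(sample)
all_text = page_texts(results)
confident_text = page_texts(results, min_confidence=0.9)
```

Keeping the confidence threshold as a parameter lets a search index take everything while a data-entry pipeline routes low-confidence lines to manual review.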

Surya OCR suits document digitization, archive scanning, invoice processing, contract analysis, medical record transcription, legal document processing, and accessibility applications. It gives researchers, developers, and organizations a free alternative for document processing automation. Its active developer community and regular updates continue to improve recognition accuracy, language and script coverage, and processing speed for production deployments.

Use Cases

Document Digitization

Converting paper documents, archives, and books to digital text format to make them searchable.

Academic Paper Processing

Digitizing academic papers in correct format with layout analysis and text extraction.

Invoice and Form Processing

Automating data entry by automatically extracting table and form data from business documents.

Multilingual Content Processing

Meeting the needs of multilingual organizations by batch processing documents in different languages.

Pros & Cons

Pros

Versatile document OCR toolkit supporting 90+ languages
Line-level text detection, layout analysis, and reading order detection
Structured data extraction with table recognition
Faster and more accurate results compared to Tesseract

Cons

Specialized for document OCR — weak on photos and natural scene text
Handwritten text recognition not supported
Falls behind some newer vision-language models in certain tests
A GPU is needed for practical speed; processing on CPU is slow

Technical Details

Parameters

Unknown

Architecture

Vision Transformer

Training Data

Proprietary multilingual dataset

License

GPL-3.0

Features

90+ languages
Layout analysis
Table detection
Reading order
Fast
Line-level detection
GPU optimized

Benchmark Results

Metric	Value	Compared To	Source
Accuracy (General Benchmark)	93.2% (avg across scripts)	Tesseract: 80.1%	Surya GitHub Benchmarks
Supported Languages	90+ languages and writing systems	PaddleOCR: 80+ languages	GitHub Repository
Line Detection F1	0.957	DocTR: 0.921	Surya Benchmark Suite
Processing Speed (A100)	~200 ms/page (GPU)	PaddleOCR: ~150 ms/page	Surya GitHub Benchmarks
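To put the per-page speeds in the table into perspective, the arithmetic below estimates wall-clock time for a batch. It is a rough sketch that assumes strictly sequential processing on one GPU with no loading or batching overhead; the 100,000-page archive is a made-up figure.

```python
def batch_hours(pages, ms_per_page):
    """Estimated hours to process `pages` at a fixed per-page latency."""
    return pages * ms_per_page / 1000 / 3600


ARCHIVE_PAGES = 100_000  # hypothetical archive size

surya_hours = batch_hours(ARCHIVE_PAGES, 200)   # about 5.6 hours
paddle_hours = batch_hours(ARCHIVE_PAGES, 150)  # about 4.2 hours
```

At these figures the gap is roughly an hour and a half per 100,000 pages, so the choice between the two is more likely to hinge on accuracy and layout handling than on raw speed.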

Available Platforms

GitHub

PyPI


Related Models

PaddleOCR

Baidu | 15M parameters

PaddleOCR is a comprehensive optical character recognition system developed by Baidu on the PaddlePaddle deep learning framework, supporting over 80 languages with industry-grade accuracy and speed. The latest PP-OCRv4 architecture employs a three-stage pipeline consisting of text detection, direction classification, and text recognition, each optimized independently for maximum performance. With approximately 15 million parameters in its lightweight configuration, PaddleOCR achieves an exceptional balance between accuracy and inference speed, running efficiently on both server GPUs and edge devices including mobile phones and embedded systems. The system excels at recognizing text in complex real-world scenarios including curved text, rotated text, dense multi-line layouts, and text overlaid on textured backgrounds. PaddleOCR supports Latin, Chinese, Japanese, Korean, Arabic, Cyrillic, and dozens of other scripts with dedicated recognition models for each language family. Beyond basic OCR, the toolkit includes document structure analysis for extracting tables, headers, and paragraphs from scanned documents, as well as key information extraction capabilities for invoices, receipts, and forms. Released under the Apache 2.0 license, PaddleOCR is fully open source and has become one of the most starred OCR repositories on GitHub. It provides pre-trained models, training scripts, and deployment tools for ONNX, TensorRT, and OpenVINO formats. Common applications include document digitization, license plate recognition, receipt processing, handwriting recognition, and industrial text inspection in manufacturing quality control.