What is Document Digitization?

Document Digitization is a 3B-parameter state-space Vision Language Model (VLM) purpose-built for high-accuracy Document Intelligence. It extracts text, tables, and structural information from documents in 23 languages (the 22 scheduled Indian languages plus English) with world-class accuracy.

What languages does Document Digitization support?

Document Digitization supports all 22 scheduled Indian languages: Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Assamese, Urdu, Sanskrit, Nepali, Dogri, Bodo, Punjabi, Odia, Konkani, Maithili, Sindhi, Kashmiri, Manipuri, and Santali, plus English.

What input formats are supported?

Document Digitization accepts PDF, PNG, JPG, and ZIP files (flat archives containing JPG/PNG document pages). Output is delivered as a ZIP file containing the processed document in either HTML or Markdown format.
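If you assemble a ZIP upload yourself, here is one way to build a flat archive of page images and to read the HTML or Markdown files back out of a result ZIP, using only Python's standard library. This is a minimal sketch under assumptions: the helper names are hypothetical, accepting `.jpeg` alongside `.jpg` is an assumption, UTF-8 output is assumed, and because the file names inside the result ZIP are not specified here, it simply collects any `.md`/`.html` members.

```python
import io
import zipfile
from pathlib import Path

# JPG/PNG pages per the FAQ; accepting ".jpeg" too is an assumption.
PAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
OUTPUT_SUFFIXES = (".md", ".html", ".htm")


def build_flat_page_zip(page_paths: list[Path]) -> bytes:
    """Pack page images into a ZIP with no subfolders and return its bytes."""
    buf = io.BytesIO()
    seen: set[str] = set()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in page_paths:
            if path.suffix.lower() not in PAGE_SUFFIXES:
                raise ValueError(f"not a JPG/PNG page: {path}")
            if path.name in seen:  # flattening would make two pages collide
                raise ValueError(f"duplicate page name: {path.name}")
            seen.add(path.name)
            zf.write(path, arcname=path.name)  # bare name keeps the archive flat
    return buf.getvalue()


def read_output_documents(zip_bytes: bytes) -> dict[str, str]:
    """Return {member name: text} for each HTML/Markdown file in a result ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            name: zf.read(name).decode("utf-8")  # UTF-8 output is assumed
            for name in zf.namelist()
            if name.lower().endswith(OUTPUT_SUFFIXES)
        }
```

Rejecting duplicate file names matters because dropping the folder path could otherwise silently overwrite one page with another.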

How does the API work?

The Document Intelligence API uses an asynchronous job-based workflow: create a job with your desired language and output format, upload your document, start processing, and then download the results. This design handles large documents and batch workflows efficiently.
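On the client side, those steps reduce to a short loop: create, upload, start, poll until the job reaches a terminal state, then download. The Python sketch models that loop under loud assumptions: `DocJob`, its `upload`/`start`/`status`/`download` methods, the `create` factory, and the `"Completed"`/`"Failed"` state names are hypothetical placeholders, not Sarvam's SDK, and `FakeJob` is an in-memory stand-in so the control flow runs offline. Swap in the real calls from the API reference.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Protocol

# Assumed state names; check the API reference for the real ones.
TERMINAL_STATES = {"Completed", "Failed"}


class DocJob(Protocol):
    """Hypothetical job handle. These method names are illustrative only."""

    def upload(self, path: str) -> None: ...
    def start(self) -> None: ...
    def status(self) -> str: ...
    def download(self, dest: str) -> None: ...


def run_job(
    create: Callable[[str, str], DocJob],
    language: str,
    output_format: str,
    path: str,
    dest: str,
    *,
    poll_s: float = 2.0,
    max_poll_s: float = 30.0,
    timeout_s: float = 3600.0,
) -> str:
    """Create, upload, start, poll with capped exponential backoff, download."""
    job = create(language, output_format)  # 1. create the job
    job.upload(path)                       # 2. upload the document
    job.start()                            # 3. start processing
    deadline = time.monotonic() + timeout_s
    delay = poll_s
    while (state := job.status()) not in TERMINAL_STATES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s}s")
        time.sleep(delay)
        delay = min(delay * 2, max_poll_s)  # back off so long jobs poll less often
    if state == "Completed":
        job.download(dest)                 # 4. download the result ZIP
    return state


@dataclass
class FakeJob:
    """In-memory stand-in that finishes after a fixed number of status checks."""

    language: str
    output_format: str
    polls_until_done: int = 3
    fail: bool = False
    calls: list[str] = field(default_factory=list)

    def upload(self, path: str) -> None:
        self.calls.append(f"upload:{path}")

    def start(self) -> None:
        self.calls.append("start")

    def status(self) -> str:
        self.polls_until_done -= 1
        if self.polls_until_done > 0:
            return "Running"
        return "Failed" if self.fail else "Completed"

    def download(self, dest: str) -> None:
        self.calls.append(f"download:{dest}")
```

Against the stand-in, `run_job(lambda lang, fmt: FakeJob(lang, fmt), "hi-IN", "md", "scan.pdf", "out.zip", poll_s=0.01)` returns `"Completed"` (the `"md"` format string is also an assumption). Capped exponential backoff keeps short jobs responsive without polling long batch jobs in a tight loop; the default intervals are arbitrary starting points.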

How accurate is table extraction?

Document Digitization excels at extracting complex tables, including those with merged cells, multi-level headers, and invisible borders. It preserves row/column structure and outputs clean HTML or Markdown tables ready for downstream processing.
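In HTML output, merged cells are normally expressed with `rowspan`/`colspan` attributes. Downstream code that wants a plain rectangular grid can copy each merged cell's text into every position it covers. The sketch uses the standard library's `html.parser`; it assumes standard HTML table markup and a single, non-nested table, and it is illustrative post-processing, not the service's own logic. The helper names are hypothetical.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Record (text, rowspan, colspan) for each <td>/<th>, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.rows: list[list[tuple[str, int, int]]] = []
        self._cell: list[str] | None = None
        self._spans = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._spans = (
                max(1, int(a.get("rowspan") or 1)),
                max(1, int(a.get("colspan") or 1)),
            )
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join(" ".join(self._cell).split())  # collapse whitespace
            self.rows[-1].append((text, *self._spans))
            self._cell = None


def html_table_to_grid(html: str) -> list[list[str]]:
    """Expand rowspan/colspan so each row has one string per logical column."""
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(collector.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Duplicating merged text into every covered cell trades some redundancy for a grid that loads directly into CSV writers or dataframes; keep the raw HTML if you need to know which cells were merged.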

What is the pricing for Document Digitization?

Document Digitization is priced at ₹1.50 per page. Visit the pricing page or API dashboard for the latest details and free tier availability.
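At a flat per-page rate, a base-plan estimate is simple multiplication. In this hypothetical helper, `free_pages` stands in for whatever free-tier allowance applies (its size is not stated here), and the estimate ignores volume discounts, enterprise pricing, and taxes. `Decimal` avoids floating-point rounding on currency.

```python
from decimal import Decimal

# Base-plan rate quoted on this page.
PRICE_PER_PAGE_INR = Decimal("1.50")


def estimate_cost_inr(pages: int, free_pages: int = 0) -> Decimal:
    """Base-plan estimate; ignores volume/enterprise discounts and taxes.

    free_pages is a hypothetical allowance; the free-tier size is not stated here.
    """
    if pages < 0 or free_pages < 0:
        raise ValueError("page counts must be non-negative")
    return max(pages - free_pages, 0) * PRICE_PER_PAGE_INR
```

For example, 10,000 pages a day for 30 days is 300,000 pages, or ₹4,50,000 at the base rate before any volume discount.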

Document Intelligence, built for Bharat

Extract text, tables, and structure from documents in 23 languages with world-class accuracy. Powered by our 3B-parameter vision-language model.

Try it in the browser: upload a PNG, JPG, or PDF file of up to 10 MB.

What can Sarvam Vision do?

Visual reasoning

Understand charts, diagrams, and infographics natively in 23 languages. Sarvam Vision interprets visual elements in context, not just the text around them.

Knowledge extraction

Go beyond OCR. Extract data from trend lines, preserve nested tables, and interpret complex layouts. Every pixel is treated as information.

In-the-wild OCR

Read signboards, street scenes, and real-world documents across Indian scripts. The same general image perception also powers document intelligence.

Unlock the full Vision stack in your dashboard.

Powering real-world document workflows

Document digitization

Convert scanned documents, PDFs, and legacy archives into structured, searchable digital formats across all Indian languages.

Government records & archives

Academic papers & textbooks

Legal documents & contracts

Historical & cultural manuscripts

Built for Indian documents

Production-grade document intelligence with structured outputs, async processing, and enterprise-ready APIs.

23 languages with native Indic script support

All 22 scheduled Indian languages plus English, with accurate script recognition across every script family.

PDF, PNG, JPG & ZIP input

Process single pages or bulk archives.

Accurate table extraction

Handles merged cells, multi-level headers, and invisible borders.

HTML & Markdown output

Clean, structured output ready for downstream processing.

Async job-based API

Upload, process, and download. Designed for large documents and batch workflows.

State-of-the-art document intelligence

Document Digitization leads global benchmarks for OCR accuracy across Indian languages.

Figure: olmOCR overall performance, score (%), higher is better.

Developers love building with Sarvam

Don't take our word for it.

"We digitized 50,000+ government records in Hindi and Marathi. Document Digitization handled handwritten notes and degraded scans that every other OCR tool choked on."

Ankit P.

CTO, GovTech Startup

"Table extraction is unreal. Complex financial reports with merged cells and multi-level headers, it gets them right every time. Saved us months of manual work."

Meera S.

Data Engineering Lead

"The async job API is well-designed. We process 10,000+ pages daily in batch. Upload, wait, download. Simple and reliable at scale."

Rahul K.

Backend Engineer

"Finally an OCR that doesn't force everything through English. Our Tamil and Bengali documents come out in their original script with perfect accuracy."

Divya N.

ML Engineer

"We replaced a 3-vendor pipeline with a single Document Digitization API call. Text extraction, table parsing, structure preservation, all in one. Integration took a day."

Sanjay M.

Engineering Manager

"Processing medical prescriptions across 8 Indian languages with consistent accuracy. This is the kind of Document Intelligence India needed."

Dr. Kavitha R.

HealthTech Founder

23 languages, every script natively understood

हिन्दी (Hindi) · hi-IN
বাংলা (Bengali) · bn-IN
தமிழ் (Tamil) · ta-IN
తెలుగు (Telugu) · te-IN
मराठी (Marathi) · mr-IN
ગુજરાતી (Gujarati) · gu-IN
ಕನ್ನಡ (Kannada) · kn-IN
മലയാളം (Malayalam) · ml-IN
অসমীয়া (Assamese) · as-IN
اردو (Urdu) · ur-IN
संस्कृतम् (Sanskrit) · sa-IN
नेपाली (Nepali) · ne-IN
डोगरी (Dogri) · doi-IN
बड़ो (Bodo) · brx-IN
ਪੰਜਾਬੀ (Punjabi) · pa-IN
ଓଡ଼ିଆ (Odia) · od-IN
कोंकणी (Konkani) · kok-IN
मैथिली (Maithili) · mai-IN
سنڌي (Sindhi) · sd-IN
कॉशुर (Kashmiri) · ks-IN
মৈতৈলোন্ (Manipuri) · mni-IN
ᱥᱟᱱᱛᱟᱲᱤ (Santali) · sat-IN
English · en-IN
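If your code takes a human-readable language name, a small lookup table built from the codes listed above can validate it before a job is created. This is a sketch: the helper name is hypothetical, and it assumes the API expects exactly these code strings. Note that Odia is listed here as `od-IN`, not the ISO 639-1 code `or`.

```python
# Codes exactly as listed on this page; Odia is "od-IN", not ISO 639-1 "or".
LANGUAGE_CODES: dict[str, str] = {
    "Hindi": "hi-IN", "Bengali": "bn-IN", "Tamil": "ta-IN", "Telugu": "te-IN",
    "Marathi": "mr-IN", "Gujarati": "gu-IN", "Kannada": "kn-IN",
    "Malayalam": "ml-IN", "Assamese": "as-IN", "Urdu": "ur-IN",
    "Sanskrit": "sa-IN", "Nepali": "ne-IN", "Dogri": "doi-IN", "Bodo": "brx-IN",
    "Punjabi": "pa-IN", "Odia": "od-IN", "Konkani": "kok-IN",
    "Maithili": "mai-IN", "Sindhi": "sd-IN", "Kashmiri": "ks-IN",
    "Manipuri": "mni-IN", "Santali": "sat-IN", "English": "en-IN",
}
_BY_LOWER_NAME = {name.lower(): code for name, code in LANGUAGE_CODES.items()}


def language_code(name: str) -> str:
    """Map an English language name (any case, surrounding spaces ignored) to its code."""
    try:
        return _BY_LOWER_NAME[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None
```

Failing fast on an unknown name is cheaper than discovering the problem after a document has been uploaded.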

Developer-first platform

OpenAI-compatible APIs. Drop-in SDKs for Python and Node.js. Go from zero to first extraction in under 5 minutes.

REST & WebSocket APIs

Standard REST for batch processing, WebSocket for real-time streaming with low-latency responses.

SDKs & libraries

Official Python and Node.js SDKs with TypeScript support. pip install sarvamai.

Complete documentation

Interactive API reference, code samples, and integration guides for every endpoint.

Free tier included

Start building immediately. No credit card, no sales call, no minimum commitment.

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    speaker="shubh"
)

save(audio, "output1.wav")
```

Enterprise-ready. Responsible AI.

Built with safety, compliance, and data sovereignty at the core.

SOC 2 Type II & ISO 27001

Enterprise-grade security certifications. Annual audits, documented controls, continuous monitoring.

Data sovereignty

All data processed and stored in India. No cross-border transfers. Full compliance with Indian data regulations.

No training on your data

Your API inputs are never used for model training. Zero data retention after processing unless explicitly requested.

Document-level encryption

End-to-end encryption for all uploaded documents. Data encrypted at rest and in transit with enterprise key management.

Content safety filters

Automated detection and filtering of sensitive PII data with configurable redaction policies.

Audit-ready logging

Comprehensive API usage logs, access controls, and RBAC for enterprise governance and compliance reporting.

Simple, transparent pricing

Start free. Scale as you grow. No hidden costs.

Base plan

₹1.5 per page

Free trial included

No credit card required. Get API keys instantly.

PDF, PNG, JPG & ZIP support

HTML & Markdown output

Volume discounts available

Enterprise pricing available

23 languages included

Async job-based processing

Your questions, answered

Start extracting in minutes
