Sarvam AI

Sarvam Akshar

Intelligent document digitization for India and beyond.

Akshar extracts text from complex layouts, tables, and Indic scripts. Get structured output in HTML, JSON, or Markdown, ready to use.

Where digitization falls short

Most OCR tools weren't built for Indian documents. They stumble on complex layouts, dense tables, and scripts where a single matra changes the word.

Complex layouts break

Multi-column pages, dense tables, and mixed-format documents come out garbled.

Indic scripts are misread

Conjuncts, matras, and diacritics get confused. Low-resource scripts are worse.

Tables lose structure

Rows merge, columns shift, and cell boundaries disappear.

No way to fix errors

One-shot output. If the OCR is wrong, you redo it manually.

Raw text dumps

You get a wall of text. No layout, no reading order, no structure.

The Akshar Pipeline

Four stages, powered by vision-language models. A proprietary harness keeps output reliable on messy, real-world documents.

Layout understanding

Identifies paragraphs, headers, tables, footnotes, and figures in the document.

Reading order

Figures out how the document is meant to be read, across columns and sections.

Text extraction

OCR across 22 Indic languages and English. Handles multilingual pages.

Structured output

Returns HTML, JSON, or Markdown. Layout and reading order stay intact.
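
To make the four stages concrete, here is a minimal Python sketch of what the last step could look like: blocks that already carry a type and a reading-order index are rendered to Markdown or JSON. The `Block` class, its fields, and the function names are hypothetical and chosen for illustration only. They are not Akshar's actual schema or API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Block:
    kind: str   # "header", "paragraph", "table", "footnote", or "figure"
    text: str   # extracted text for this block
    order: int  # position in reading order (hypothetical field)


def in_reading_order(blocks):
    """Sort blocks by their reading-order index."""
    return sorted(blocks, key=lambda b: b.order)


def to_markdown(blocks):
    """Render blocks as Markdown, one block per paragraph."""
    parts = []
    for block in in_reading_order(blocks):
        if block.kind == "header":
            parts.append(f"## {block.text}")
        else:
            parts.append(block.text)
    return "\n\n".join(parts)


def to_json(blocks):
    """Serialize blocks as a JSON array, keeping Indic text unescaped."""
    ordered = [asdict(b) for b in in_reading_order(blocks)]
    return json.dumps(ordered, ensure_ascii=False)


# A two-column page whose blocks arrive out of order.
blocks = [
    Block("paragraph", "Right-column text.", 2),
    Block("header", "आवेदन पत्र", 0),
    Block("paragraph", "Left-column text.", 1),
]
markdown = to_markdown(blocks)
print(markdown)
```

Keeping the reading-order index on each block, rather than relying on list position, means every output format can be generated from the same data without re-running layout analysis.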

Proofreading with visual grounding

You see every change before it exports. Nothing leaves without your approval.

Review and correct before export

Every block, paragraph, and table cell is linked to its location in the source document. You always see where things came from.

Visual grounding

Click any extracted element to see its position in the original scan.

Manual editing

Fix text, relabel blocks, and restructure layout with the source document alongside.

Agent-driven corrections

Describe what you want changed. The agent applies it across the entire document.
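
Visual grounding can be pictured as every extracted element carrying a page number and a bounding box. The Python sketch below assumes that representation. `GroundedBlock`, `locate`, and `block_at` are hypothetical names for illustration, not part of the Akshar Platform or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


@dataclass(frozen=True)
class GroundedBlock:
    block_id: str
    text: str
    page: int
    bbox: Box  # where this element sits in the source scan (assumed format)


def locate(blocks: List[GroundedBlock], block_id: str) -> Optional[Tuple[int, Box]]:
    """Extracted element -> its page and box in the original scan."""
    for block in blocks:
        if block.block_id == block_id:
            return block.page, block.bbox
    return None


def block_at(blocks: List[GroundedBlock], page: int, x: int, y: int) -> Optional[GroundedBlock]:
    """Point on the scan -> the smallest element whose box contains it."""
    def area(block: GroundedBlock) -> int:
        left, top, right, bottom = block.bbox
        return (right - left) * (bottom - top)

    hits = [
        b for b in blocks
        if b.page == page
        and b.bbox[0] <= x <= b.bbox[2]
        and b.bbox[1] <= y <= b.bbox[3]
    ]
    return min(hits, key=area) if hits else None


# A table on page 1 with one of its cells nested inside it.
blocks = [
    GroundedBlock("table-1", "", 1, (100, 300, 500, 400)),
    GroundedBlock("cell-1", "₹ 1,200", 1, (120, 310, 240, 340)),
]
print(locate(blocks, "cell-1"))
print(block_at(blocks, 1, 130, 320))
```

Picking the smallest containing box means a click inside a table resolves to the individual cell rather than to the table as a whole.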

22 Indic languages + English

From Hindi to Santali, high-resource and low-resource scripts alike. Conjuncts, matras, and diacritics come through correctly. Multilingual pages work out of the box.

Assamese Bengali Bodo Dogri English Gujarati Hindi Kannada Kashmiri Konkani Maithili Malayalam Manipuri Marathi Nepali Odia Punjabi Sanskrit Santali Sindhi Tamil Telugu Urdu

For every kind of document team

Government & public records

Convert administrative files, forms, and historical records into searchable, structured formats.

Publishing & archives

Turn scanned books, backlists, and out-of-print titles into accessible e-books.

Finance & legal

Process contracts, statements, court records, and compliance docs.

Research & education

Extract text from manuscripts, newspapers, textbooks, and primary sources.

Developers

Add digitization to your product with the Akshar API.

Questions? Answers.

Akshar is a document digitization product. It reads complex layouts, tables, and Indic scripts, and returns structured output in HTML, JSON, or Markdown with layout and reading order intact.
The API is for batch processing. Send documents, get structured output, no manual step. The Platform adds a visual interface where you can review, edit, and correct output before exporting.
Twenty-two Indic languages plus English: Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.
HTML, JSON, and Markdown. All three preserve the original layout and reading order.
Contact us. We will walk through your use case and set up access.

Try Akshar. See the structured output.