GroupDocs.Parser · Product Family

Extract text & data

Extract text, images, and metadata from PDF, Word, Excel, email, and fixed-layout formats — or pull structured data with templates.

50+ formats · .NET, Java, and Python platforms · MIT-licensed examples

Install in seconds

Pick your platform, copy the package command, and ship your first integration.

.NET v26.4.0

dotnet add package GroupDocs.Parser

2M downloads

Java v26.5.0

implementation 'com.groupdocs:groupdocs-parser:26.5.0'

Python v25.12.0

pip install groupdocs-parser-net

Quick start — .NET

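using System;
using System.IO;
using GroupDocs.Parser;

// Open the document and read all of its text.
// GetText() returns null when text extraction is not supported for the format.
using var parser = new Parser("document.pdf");
using TextReader reader = parser.GetText();
Console.WriteLine(reader?.ReadToEnd() ?? "Text extraction is not supported for this format.");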

What you can build

GroupDocs.Parser in production — fast, flexible, and source-agnostic.

Text & images

Extract raw or formatted text plus embedded images.

Template parsing

Pull structured fields and tables with reusable templates.

Container support

Parse archives, emails, and PDF portfolios.

Encoding detection

Detect and handle text encoding automatically.
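
To make the idea of automatic encoding detection concrete, here is a minimal Python sketch that guesses an encoding from a leading byte-order mark (BOM) and falls back to UTF-8. It is an illustration of the general technique only, not GroupDocs.Parser's implementation; the names `sniff_encoding` and `decode_text` and the UTF-8 fallback are assumptions made for this example.

```python
import codecs

# Byte-order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
# (Hypothetical table for illustration; real detectors use more signals than a BOM.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, or `default` if none is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


def decode_text(data: bytes) -> str:
    """Decode bytes with the sniffed codec; these codecs consume the BOM themselves."""
    return data.decode(sniff_encoding(data))
```

BOM sniffing only covers files that carry a mark. Text without one falls through to the default here, which is where a production detector would apply statistical or format-specific checks.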

Supported formats

A representative slice of the formats GroupDocs.Parser works with.

Documents

PDF DOCX DOC RTF ODT TXT

Spreadsheets

XLSX XLS CSV ODS

Presentations

PPTX PPT ODP

Images

PNG JPG TIFF BMP
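
As a small illustration of how the categories above might be used, the following Python sketch maps a file name to one of the four listed groups before the file is handed to a parser. It is a hedged example: `FORMAT_CATEGORIES` and `category_for` are hypothetical names, the table contains only the extensions shown on this page, and it checks the extension rather than the file's contents.

```python
from pathlib import Path
from typing import Optional

# Categories and extensions copied from the list above.
# This is a representative slice, not the full set of supported formats.
FORMAT_CATEGORIES = {
    "Documents": {"pdf", "docx", "doc", "rtf", "odt", "txt"},
    "Spreadsheets": {"xlsx", "xls", "csv", "ods"},
    "Presentations": {"pptx", "ppt", "odp"},
    "Images": {"png", "jpg", "tiff", "bmp"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the listed category for a file name, or None if it is not in the table."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

A lookup like this is useful for routing or early rejection in an upload pipeline; a `None` result means only that the extension is absent from this short table.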

Free · ads-free · no install

Try it live in your browser

Run GroupDocs.Parser on your own files in the free, ads-free Parser web app — no install required. Files are deleted after 24 hours.

Open-source examples

GroupDocs.Parser-for-.NET
GroupDocs.Parser-for-Java
groupdocs-parser.github.io (HTML)
GroupDocs.Parser-Docs (HTML)
Groupdocs.Parser-References
GroupDocs.Parser-Products (Python)