GroupDocs.Parser · Product Family

Extract text & data

Extract text, images, and metadata from PDF, Word, Excel, email, and fixed-layout formats — or pull structured data with templates.

50+ formats · 3 platforms · MIT examples

Install in seconds

Pick your platform, copy the package command, and ship your first integration.

.NET v26.4.0 · 2M downloads
dotnet add package GroupDocs.Parser
Java v26.5.0
implementation 'com.groupdocs:groupdocs-parser:26.5.0'
Python v25.12.0
pip install groupdocs-parser-net
Quick start — .NET
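using System;
using GroupDocs.Parser;

using var parser = new Parser("document.pdf");
// GetText() returns a TextReader, or null if the format has no text extraction.
using var reader = parser.GetText();
Console.WriteLine(reader?.ReadToEnd() ?? "Text extraction is not supported.");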

What you can build

GroupDocs.Parser in production — fast, flexible, and source-agnostic.

Text & images

Extract raw or formatted text plus embedded images.
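
As a rough illustration of the three extraction modes named above, the sketch below reads raw text, Markdown-formatted text, and a list of embedded images from one file. It assumes the .NET API's `GetText`, `GetFormattedText`, and `GetImages` methods behave as in the published examples (each returning null when the format does not support that feature). The `TextAndImages` class and its `ReadOrFallback` helper are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public static class TextAndImages
{
    // Hypothetical helper: returns the reader's full content, or the
    // fallback when the reader is null (feature not supported).
    public static string ReadOrFallback(TextReader reader, string fallback) =>
        reader == null ? fallback : reader.ReadToEnd();

    public static void Run(string path)
    {
        using var parser = new Parser(path);

        // Raw text.
        using (TextReader raw = parser.GetText())
        {
            Console.WriteLine(ReadOrFallback(raw, "Text extraction is not supported."));
        }

        // Formatted text, here rendered as Markdown.
        var options = new FormattedTextOptions(FormattedTextMode.Markdown);
        using (TextReader formatted = parser.GetFormattedText(options))
        {
            Console.WriteLine(ReadOrFallback(formatted, "Formatted text is not supported."));
        }

        // Embedded images: report the page and file type of each one.
        IEnumerable<PageImageArea> images = parser.GetImages();
        if (images == null)
        {
            Console.WriteLine("Image extraction is not supported.");
            return;
        }
        foreach (PageImageArea image in images)
        {
            Console.WriteLine($"Page {image.Page.Index}: {image.FileType}");
        }
    }
}
```

Calling `TextAndImages.Run("document.pdf")` prints whatever the file supports. The null checks matter because not every format supports every extraction mode.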

Template parsing

Pull structured fields and tables with reusable templates.
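
A minimal sketch of template parsing, assuming the .NET API's `Template`, `TemplateField`, `TemplateRegexPosition`, and `Parser.ParseByTemplate` types work as in the published examples. The `TemplateExample` class, the `InvoiceNumber` field name, and the `INV-\d+` pattern are illustrative assumptions, not part of the product.

```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Templates;

public static class TemplateExample
{
    // Illustrative pattern: matches values such as "INV-1042".
    public const string InvoicePattern = @"INV-\d+";

    public static void Run(string path)
    {
        // One field, located wherever the regular expression matches.
        var invoiceNumber = new TemplateField(
            new TemplateRegexPosition(InvoicePattern), "InvoiceNumber");

        // A template is a reusable collection of such items.
        var template = new Template(new TemplateItem[] { invoiceNumber });

        using var parser = new Parser(path);
        DocumentData data = parser.ParseByTemplate(template);
        if (data == null)
        {
            Console.WriteLine("Parsing by template is not supported.");
            return;
        }

        // Print every extracted field that carries text.
        for (int i = 0; i < data.Count; i++)
        {
            var area = data[i].PageArea as PageTextArea;
            Console.WriteLine($"{data[i].Name}: {area?.Text ?? "(not a text field)"}");
        }
    }
}
```

Because the template is built independently of any `Parser` instance, the same object can be applied to many documents that share a layout.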

Container support

Parse archives, emails, and PDF portfolios.
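
The sketch below shows one way to walk a container's items and read text from each, assuming `Parser.GetContainer` and `ContainerItem.OpenParser` behave as in the published .NET examples. The `ContainerExample` class and its `Preview` helper are hypothetical, as is the `archive.zip` file name in the usage note.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

public static class ContainerExample
{
    // Hypothetical helper: shortens long text for console output.
    public static string Preview(string text, int maxLength) =>
        text.Length <= maxLength ? text : text.Substring(0, maxLength) + "...";

    public static void Run(string path)
    {
        using var parser = new Parser(path);

        // Null means the file is not a supported container.
        IEnumerable<ContainerItem> items = parser.GetContainer();
        if (items == null)
        {
            Console.WriteLine("Container extraction is not supported.");
            return;
        }

        foreach (ContainerItem item in items)
        {
            Console.WriteLine(item.FilePath);
            try
            {
                // Each item can be opened with its own parser.
                using Parser itemParser = item.OpenParser();
                using TextReader reader = itemParser.GetText();
                if (reader != null)
                {
                    Console.WriteLine(Preview(reader.ReadToEnd(), 200));
                }
            }
            catch (UnsupportedDocumentFormatException)
            {
                Console.WriteLine("  (item format is not supported)");
            }
        }
    }
}
```

`ContainerExample.Run("archive.zip")` would list each entry and print up to 200 characters of its text. Items in unsupported formats are reported and skipped rather than aborting the loop.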

Encoding detection

Detect and handle text encoding automatically.
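
To inspect the encoding the parser detected for a plain-text file, something like the following may work. It assumes `Parser.GetDocumentInfo` returns a `TextDocumentInfo` with an `Encoding` property for text documents, as in the published .NET examples. The `EncodingExample` class and its `Describe` helper are hypothetical.

```csharp
using System;
using System.Text;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

public static class EncodingExample
{
    // Hypothetical helper: a printable name for an encoding, or "unknown".
    public static string Describe(Encoding encoding) =>
        encoding == null ? "unknown" : encoding.WebName;

    public static void Run(string path)
    {
        using var parser = new Parser(path);

        // Only plain-text documents are expected to expose encoding details.
        var info = parser.GetDocumentInfo() as TextDocumentInfo;
        if (info == null)
        {
            Console.WriteLine("Not a plain-text document.");
            return;
        }

        Console.WriteLine("Detected encoding: " + Describe(info.Encoding));
    }
}
```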

Supported formats

A representative slice of the formats GroupDocs.Parser works with.

Documents: PDF, DOCX, DOC, RTF, ODT, TXT
Spreadsheets: XLSX, XLS, CSV, ODS
Presentations: PPTX, PPT, ODP
Images: PNG, JPG, TIFF, BMP
Free · ads-free · no install

Try it live in your browser

Run GroupDocs.Parser on your own files in the free, ads-free Parser web app — no install required. Files are deleted after 24 hours.


Open-source examples

GroupDocs.Parser-for-.NET · 17
GroupDocs.Parser-for-Java · 10
groupdocs-parser.github.io · HTML · 5
GroupDocs.Parser-Docs · HTML · 5
Groupdocs.Parser-References · 5
GroupDocs.Parser-Products · Python · 4