olmocr

Toolkit for linearizing PDFs for LLM datasets/training

18k stars, 1.5k forks
License: Apache-2.0
Last commit: 2026-03-25

Contributors: 15

Platforms
Hosting: Self-hosted
Install: Docker, pip, build from source