trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

6.1k stars, 380 forks
Apache-2.0
last commit 2026-06-10
Contributors: 30
