extractr
CLI Tool • January 27, 2026
Template-based data extraction from web pages using YAML configs. Define extraction rules once and reuse them on every run. Supports pagination, transforms, nested fields, and multiple output formats.
TypeScript
Bun
Playwright
Web Scraping
CLI
YAML
Key Features
- YAML-based configuration for extraction rules
- Built-in templates for common sites (HN, Amazon, Reddit)
- Field type detection (text, number, currency, date, boolean, list, nested)
- Transform pipeline (trim, regex, replace, split, parse)
- Pagination support with configurable delays
- Multiple output formats (JSON, JSONL, CSV)
- Interactive TUI for debugging
- Custom template validation
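Field type handling can be pictured as a coercion step applied to each scraped string. The sketch below is an illustration, not extractr's implementation: `FieldType`, `coerceField` and `NUMBER_TOKEN` are hypothetical names, `nested` is left out, and `number`/`currency` simply take the first numeric token and assume `.` is the decimal separator.

```typescript
// Hypothetical field types; "nested" is omitted to keep the sketch short.
type FieldType = "text" | "number" | "currency" | "date" | "boolean" | "list";
type FieldValue = string | number | boolean | Date | string[] | null;

// First signed numeric token, allowing thousands separators: "142 points" -> "142".
const NUMBER_TOKEN = /-?\d[\d,]*(?:\.\d+)?/;

// Coerce a raw scraped string into the declared type; null means "could not interpret".
function coerceField(raw: string, type: FieldType): FieldValue {
  const s = raw.trim();
  switch (type) {
    case "text":
      return s;
    case "number":
    case "currency": {
      // Currency symbols are ignored; assumes "." is the decimal separator.
      const m = s.match(NUMBER_TOKEN);
      return m ? Number(m[0].replace(/,/g, "")) : null;
    }
    case "date": {
      // Date parsing is only dependable for ISO 8601 strings such as "2026-01-27".
      const d = new Date(s);
      return Number.isNaN(d.getTime()) ? null : d;
    }
    case "boolean": {
      const lower = s.toLowerCase();
      if (["true", "yes", "1"].includes(lower)) return true;
      if (["false", "no", "0"].includes(lower)) return false;
      return null;
    }
    case "list":
      // Comma-separated values, trimmed, with empty entries dropped.
      return s.split(",").map((x) => x.trim()).filter((x) => x.length > 0);
  }
}
```

Returning `null` rather than throwing lets a single malformed cell be reported without aborting the rest of the page.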
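A transform pipeline of this kind amounts to folding an ordered list of steps over the extracted value. This is a hedged sketch: the `Transform` step shapes, their keys (`pattern`, `group`, `find`, `with`, `separator`, `index`, `as`) and `applyTransforms` are assumptions for illustration, not extractr's actual YAML schema.

```typescript
// Hypothetical step shapes; the real config keys may differ.
type Transform =
  | { op: "trim" }
  | { op: "regex"; pattern: string; group?: number }
  | { op: "replace"; find: string; with: string }
  | { op: "split"; separator: string; index: number }
  | { op: "parse"; as: "int" | "float" | "json" };

// Apply each step in order; every step except the last is expected to yield a string.
function applyTransforms(input: string, steps: Transform[]): unknown {
  let value: unknown = input;
  for (const step of steps) {
    if (typeof value !== "string") {
      throw new Error(`transform "${step.op}" expects a string input`);
    }
    switch (step.op) {
      case "trim":
        value = value.trim();
        break;
      case "regex": {
        // Keep the requested capture group (default 1), or "" when nothing matches.
        const m = value.match(new RegExp(step.pattern));
        value = m?.[step.group ?? 1] ?? "";
        break;
      }
      case "replace":
        // Literal (non-regex) replace of every occurrence.
        value = value.split(step.find).join(step.with);
        break;
      case "split":
        // Keep one piece of the split, or "" if the index is out of range.
        value = value.split(step.separator)[step.index] ?? "";
        break;
      case "parse":
        value =
          step.as === "int" ? parseInt(value, 10)
          : step.as === "float" ? parseFloat(value)
          : JSON.parse(value);
        break;
    }
  }
  return value;
}
```

For example, the steps trim, regex `\$([\d,]+)`, replace `,` with an empty string, then parse as int turn `"  Price: $1,299 "` into the number `1299`.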
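Pagination with a configurable delay can be sketched as a loop that follows a next-page link, sleeps between requests, and stops at a page cap. `paginate`, `fetchPage`, `PageResult`, `maxPages` and `delayMs` are hypothetical names; in the real tool page loading presumably goes through Playwright, which this sketch hides behind the `fetchPage` callback.

```typescript
// Hypothetical result of loading and extracting one page.
type PageResult<T> = { items: T[]; nextUrl: string | null };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Follow nextUrl links until there are none or maxPages is reached,
// waiting delayMs between consecutive requests.
async function paginate<T>(
  startUrl: string,
  fetchPage: (url: string) => Promise<PageResult<T>>,
  opts: { maxPages: number; delayMs: number },
): Promise<T[]> {
  const all: T[] = [];
  let url: string | null = startUrl;
  for (let page = 0; url !== null && page < opts.maxPages; page++) {
    // No delay before the first request; throttle every request after it.
    if (page > 0) await sleep(opts.delayMs);
    const result: PageResult<T> = await fetchPage(url);
    all.push(...result.items);
    url = result.nextUrl;
  }
  return all;
}
```

Keeping the fetcher as a parameter makes the loop testable with an in-memory fake and keeps the delay policy separate from browser automation.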
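The three output formats differ mainly in how rows are serialized. A minimal sketch, assuming each extracted record is a flat object of scalar values (`Row` and `serialize` are hypothetical names): JSON emits one array, JSONL emits one object per line, and CSV builds its header from the union of keys and quotes any cell containing a comma, quote or newline, RFC 4180 style.

```typescript
// Hypothetical flat record; nested fields would need flattening before CSV export.
type Row = Record<string, string | number | boolean | null>;

// Quote a cell only when needed, doubling embedded quotes.
function csvCell(v: Row[string]): string {
  const s = v === null ? "" : String(v);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function serialize(rows: Row[], format: "json" | "jsonl" | "csv"): string {
  switch (format) {
    case "json":
      return JSON.stringify(rows, null, 2);
    case "jsonl":
      // One compact JSON object per line.
      return rows.map((r) => JSON.stringify(r)).join("\n");
    case "csv": {
      // Header = union of keys in first-seen order, so rows missing a field still line up.
      const headers: string[] = [];
      for (const r of rows) {
        for (const k of Object.keys(r)) {
          if (!headers.includes(k)) headers.push(k);
        }
      }
      const lines = [headers.map(csvCell).join(",")];
      for (const r of rows) {
        lines.push(headers.map((h) => csvCell(r[h] ?? null)).join(","));
      }
      return lines.join("\n");
    }
  }
}
```

JSONL suits streaming and appending because each line parses on its own; CSV needs flat rows, so nested fields would have to be flattened or JSON-encoded first (this sketch does neither).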
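Custom template validation could take the form of a structural check on the parsed YAML before any page is loaded. This is a sketch under assumed conventions: the template shape (`name`, `url`, `fields.<key>.selector`, `fields.<key>.type`) and `validateTemplate` are hypothetical, and the YAML is assumed to have already been parsed into a plain object. The allowed types mirror the field types listed above.

```typescript
// Hypothetical per-field rule; extractr's real schema may use different keys.
interface FieldRule {
  selector: string;
  type: string;
}

const FIELD_TYPES = new Set(["text", "number", "currency", "date", "boolean", "list", "nested"]);

// Returns human-readable problems; an empty array means the template passed.
function validateTemplate(t: unknown): string[] {
  if (typeof t !== "object" || t === null) return ["template must be an object"];
  const obj = t as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof obj.name !== "string" || obj.name === "") errors.push("name: required string");

  if (typeof obj.url !== "string") {
    errors.push("url: required string");
  } else {
    try {
      new URL(obj.url); // throws on malformed URLs
    } catch {
      errors.push(`url: not a valid URL: ${obj.url}`);
    }
  }

  const fields = obj.fields;
  if (typeof fields !== "object" || fields === null || Array.isArray(fields)) {
    errors.push("fields: required mapping of field name to rule");
    return errors;
  }
  for (const [key, rule] of Object.entries(fields as Record<string, unknown>)) {
    const r = rule as Partial<FieldRule> | null;
    const selector = r?.selector;
    const type = r?.type;
    if (typeof selector !== "string" || selector === "") {
      errors.push(`fields.${key}.selector: required string`);
    }
    if (typeof type !== "string" || !FIELD_TYPES.has(type)) {
      errors.push(`fields.${key}.type: unknown type "${String(type)}"`);
    }
  }
  return errors;
}
```

Collecting every problem instead of stopping at the first one lets a user fix a broken template in a single pass.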