extractr

CLI Tool · January 27, 2026

Template-based data extraction from web pages using YAML configs. Define extraction rules once and rerun them whenever you need fresh data. Supports pagination, value transforms, nested fields, and multiple output formats.

TypeScript
Bun
Playwright
Web Scraping
CLI
YAML

Key Features

  • YAML-based configuration for extraction rules
  • Built-in templates for common sites (Hacker News, Amazon, Reddit)
  • Field type detection (text, number, currency, date, boolean, list, nested)
  • Transform pipeline (trim, regex, replace, split, parse)
  • Pagination support with configurable delays
  • Multiple output formats (JSON, JSONL, CSV)
  • Interactive TUI for debugging
  • Custom template validation
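The feature list mentions YAML templates and custom validation but does not show the schema. The sketch below assumes a parsed template looks roughly like this. Every field name, including `selector`, `fields`, `pagination.next`, and `pagination.delayMs`, is hypothetical and not extractr's documented format. It shows one way to validate such a template by recursing into nested fields and collecting all errors in a single pass.

```typescript
// Hypothetical shape of an extractr template after the YAML has been parsed.
// None of these names come from extractr's docs; they are illustrative only.
type FieldType = "text" | "number" | "currency" | "date" | "boolean" | "list" | "nested";

interface FieldRule {
  selector: string;                   // CSS selector for the element holding the value
  type?: FieldType;                   // treated as "text" when omitted (assumption)
  fields?: Record<string, FieldRule>; // child rules, required when type is "nested"
}

interface Pagination {
  next: string;      // selector for the "next page" link
  maxPages?: number; // stop after this many pages
  delayMs?: number;  // pause between page loads
}

interface Template {
  name: string;
  url: string;
  fields: Record<string, FieldRule>;
  pagination?: Pagination;
}

const FIELD_TYPES: readonly FieldType[] = [
  "text", "number", "currency", "date", "boolean", "list", "nested",
];

// Walk the field tree, recording errors with a dotted path such as "item.price".
function validateFields(fields: Record<string, FieldRule>, path: string, errors: string[]): void {
  for (const [name, rule] of Object.entries(fields)) {
    const at = path ? `${path}.${name}` : name;
    if (typeof rule.selector !== "string" || rule.selector.trim() === "") {
      errors.push(`${at}: selector must be a non-empty string`);
    }
    const type = rule.type ?? "text";
    if (!FIELD_TYPES.includes(type)) {
      errors.push(`${at}: unknown type "${type}"`);
    }
    if (type === "nested") {
      if (!rule.fields || Object.keys(rule.fields).length === 0) {
        errors.push(`${at}: nested fields need at least one child rule`);
      } else {
        validateFields(rule.fields, at, errors);
      }
    }
  }
}

// Returns every problem found; an empty array means the template passed.
function validateTemplate(t: Template): string[] {
  const errors: string[] = [];
  if (!/^https?:\/\//.test(t.url)) errors.push(`url: expected an http(s) URL, got "${t.url}"`);
  if (Object.keys(t.fields).length === 0) errors.push("fields: template defines no fields");
  validateFields(t.fields, "", errors);
  const p = t.pagination;
  if (p) {
    if (typeof p.next !== "string" || p.next.trim() === "") {
      errors.push("pagination.next: selector must be a non-empty string");
    }
    if (p.maxPages !== undefined && (!Number.isInteger(p.maxPages) || p.maxPages < 1)) {
      errors.push("pagination.maxPages: must be a positive integer");
    }
    if (p.delayMs !== undefined && p.delayMs < 0) {
      errors.push("pagination.delayMs: must be >= 0");
    }
  }
  return errors;
}
```

Collecting all errors instead of throwing on the first one lets a user fix a broken template in one edit cycle, which suits a "define once" workflow.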
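A transform pipeline like the one listed (trim, regex, replace, split, parse) can be modeled as an ordered list of steps folded over the raw string scraped from the page. This is a minimal sketch, not extractr's implementation. The `Transform` union, its `op` names, and the parsing rules (US-style `.` decimals for currency, `true`/`yes`/`1` for booleans) are assumptions, and date and nested handling are left out.

```typescript
// Hypothetical transform steps; op names and options are illustrative only.
type Transform =
  | { op: "trim" }
  | { op: "regex"; pattern: string; group?: number }         // keep one capture group
  | { op: "replace"; pattern: string; replacement: string }  // global regex replace
  | { op: "split"; separator: string }
  | { op: "parse"; as: "number" | "currency" | "boolean" };

type Value = string | number | boolean | null | Value[];

// Empty or non-numeric input becomes null rather than 0 or NaN.
function toNumber(raw: string): number | null {
  if (raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isNaN(n) ? null : n;
}

function parseAs(s: string, as: "number" | "currency" | "boolean"): Value {
  switch (as) {
    case "number":
      return toNumber(s.replace(/,/g, ""));          // "1,234" -> 1234
    case "currency":
      return toNumber(s.replace(/[^0-9.-]/g, ""));   // "$1,299.99" -> 1299.99 (assumes "." decimals)
    case "boolean":
      return /^(true|yes|1)$/i.test(s.trim());
  }
}

function applyOne(value: Value, t: Transform): Value {
  // Lists produced by an earlier "split" are transformed element by element.
  if (Array.isArray(value)) return value.map((v) => applyOne(v, t));
  if (value === null) return null; // a failed regex match stays null downstream
  const s = String(value);
  switch (t.op) {
    case "trim":
      return s.trim();
    case "regex": {
      const m = s.match(new RegExp(t.pattern));
      return m ? (m[t.group ?? 0] ?? null) : null;
    }
    case "replace":
      return s.replace(new RegExp(t.pattern, "g"), t.replacement);
    case "split":
      return s.split(t.separator).map((part) => part.trim()).filter((part) => part !== "");
    case "parse":
      return parseAs(s, t.as);
  }
}

// Run the steps left to right, starting from the raw scraped text.
function applyTransforms(raw: string, steps: Transform[]): Value {
  return steps.reduce<Value>((v, t) => applyOne(v, t), raw);
}
```

For example, `applyTransforms("142 points", [{ op: "regex", pattern: "(\\d+) points", group: 1 }, { op: "parse", as: "number" }])` returns `142`. Mapping steps over arrays means a `split` followed by `parse` converts every list item without a separate "map" step.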
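Of the three output formats, JSON is a single `JSON.stringify` call, while JSONL and CSV need a little more care. The helpers below are a hypothetical sketch with illustrative names, not extractr's code. CSV quoting follows RFC 4180, and list or nested values are written as JSON text inside one cell. That is one design choice among several; flattening nested objects into dotted column names is another.

```typescript
// Hypothetical serializers for extracted records; names are illustrative.
type Row = Record<string, unknown>;

// JSONL: one compact JSON object per line, easy to stream or append to.
function toJsonl(rows: Row[]): string {
  return rows.map((r) => JSON.stringify(r)).join("\n") + (rows.length > 0 ? "\n" : "");
}

// RFC 4180 quoting: wrap cells containing a comma, quote, CR, or LF,
// and double any embedded quotes. Arrays and objects become JSON text.
function csvCell(value: unknown): string {
  if (value === null || value === undefined) return "";
  const s = typeof value === "object" ? JSON.stringify(value) : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: Row[]): string {
  if (rows.length === 0) return "";
  // Records may have different fields, so the header is the union of keys
  // in first-seen order; missing values become empty cells.
  const columns: string[] = [];
  for (const r of rows) {
    for (const k of Object.keys(r)) if (!columns.includes(k)) columns.push(k);
  }
  const lines = [columns.map(csvCell).join(",")];
  for (const r of rows) lines.push(columns.map((c) => csvCell(r[c])).join(","));
  return lines.join("\r\n") + "\r\n"; // RFC 4180 uses CRLF line breaks
}
```

JSONL suits long paginated runs because each record can be written as soon as it is extracted, whereas CSV needs the full set of columns before the header can be emitted.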