the commonplace notebook for eliana

types of scraping

May 08, 2026, 2 min read

Core product types / features

API – one API to turn the web into structured, LLM‑ready data
Search (web search + page content)
Scrape (URL → clean markdown / JSON / text / screenshots)
Interact (cloud sandboxes / agents acting on pages)
Crawl (crawl entire websites)
Map (map all URLs / site structure)
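
To make the difference between Crawl and Map concrete, here is a minimal sketch using only the Python standard library. It is not the API of any product listed above: `extract_links`, `crawl`, and the in-memory `FAKE_SITE` are hypothetical names made up for illustration, and a plain dictionary stands in for real HTTP fetching. Crawl returns page content for every reachable page; Map returns only the list of URLs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free, de-duplicated links in page order."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that stays under start_url.

    `fetch` is any callable mapping a URL to HTML (or None if missing).
    Returns {url: html} for every page fetched.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site held in memory (assumption: no network access).
FAKE_SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.test/a": '<a href="/">home</a> <a href="https://other.test/">ext</a>',
    "https://example.test/b": '<a href="/a">A</a>',
}

pages = crawl("https://example.test/", FAKE_SITE.get)  # "crawl": URL -> content
site_map = sorted(pages)                               # "map": URLs only
```

A real crawler would also need politeness delays, robots.txt handling, and error handling, all of which this sketch leaves out.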

Data / format types

Structured data (LLM‑ready)
Clean markdown
Structured JSON
Screenshots
Semantic text
Page content (full page, not just links)
Real‑time context / fresh knowledge
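
As a rough illustration of what "clean text" and "structured JSON" output might look like, the sketch below strips scripts and styles from an HTML string and emits a small JSON record. The field names (`title`, `headings`, `text`) and the `extract_page` helper are assumptions chosen for this example, not a schema taken from any of the tools in these notes.

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Pulls the title, headings, and visible text out of an HTML page."""

    SKIP = {"script", "style", "noscript"}          # content never shown to readers
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.text_parts = []
        self._stack = []  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or any(t in self.SKIP for t in self._stack):
            return
        current = self._stack[-1] if self._stack else ""
        if current == "title":
            self.title = text
            return
        if current in {"h1", "h2", "h3"}:
            self.headings.append(text)
        self.text_parts.append(text)


def extract_page(html):
    """Return a JSON-serialisable dict describing the page."""
    parser = PageExtractor()
    parser.feed(html)
    return {
        "title": parser.title,
        "headings": parser.headings,
        "text": " ".join(parser.text_parts),
    }


SAMPLE_HTML = """
<html><head><title>Demo page</title><style>p{color:red}</style></head>
<body><h1>Pricing</h1><p>Plans start at <b>$10</b> per month.</p>
<script>track()</script></body></html>
"""

record = extract_page(SAMPLE_HTML)
print(json.dumps(record, indent=2))
```

The record keeps the heading text both in `headings` and in `text`, so a downstream consumer can use either the outline or the full running text.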

System / infra‑style types

Agent (AI agent, “intelligence as core”)
RAG pipeline (retrieval‑augmented generation)
AI‑native software (foundation for)
MCP (Agent tools with MCP / “live web access with MCP”)
Cloud sandbox (for agents)
Endpoint
Onboarding (how teams get started)

Use‑case / application types

Deep research
Multi‑step web research (with live data)
Smarter AI chats
AI assistants
AI agent tools
Lead enrichment
Sales pipeline building
Web data (in general, “wherever it lives”)

Stages / modes

Research Preview

Software mentioned, grouped by category:

Headless Browser APIs

Splash, Zombie.js, SimpleBrowser, DotNetBrowser

Browser Automation (Unified Interface)

Selenium WebDriver, Playwright, Puppeteer

Test Automation

Capybara, Jasmine, Cypress, QF-Test

Browser API Alternatives

Deno (built-in web-standard APIs such as fetch), jsdom (DOM implementation for Node.js), HtmlUnit (Java-based, uses a Rhino-derived engine for JS/Ajax)
