spider

Agent

Web crawler and scraper for Rust

Copy the install, test the workflow, then decide if it earns a permanent slot.

2,443
Why now: Moving now

Fresh repo activity plus visible builder pull. This is the kind of tool people test before it turns obvious.

Decision: High-conviction move

Copy the install, test the workflow, then decide if it earns a permanent slot.

Trial cost: Medium lift

Reasonable to try, but it will take more than a quick skim to get real signal.

Risk: 14/100

GitHub health 87/100. No security policy, but fresh repo health and a manageable issue load keep the risk controlled.

What You Are Adopting

AI Agent: Universal
Model: Multiple
Build Time: Minutes

Test This In Your Stack

One command in · Clean rollback · Low commitment
Sandboxed: installs to ~/.claude, isolated from your projects. One command to remove.

Fastest way to find out if spider belongs in your setup.

Copy the install command, run a real test, and back it out cleanly if it slows you down.

Try now
git clone https://github.com/spider-rs/spider ~/.claude/agents/spider

Run this first. You will know quickly if the workflow earns a permanent slot.

Back out
rm -rf ~/.claude/agents/spider

No messy cleanup loop. If it misses, remove it and keep moving.

Install Location

~/
└─ .claude/
    ├─ commands/
    ├─ agents/
    │   └─ spider/ ← installs here
    └─ settings.json

About

Web crawler and scraper for Rust

README

Spider


Website | Guides | API Docs | Examples | Discord

A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.

  • Crawl 100k+ pages in minutes on a single machine. See benchmarks.
  • HTTP, Chrome CDP, WebDriver, and AI automation in one dependency.
  • Production-ready with caching, proxy rotation, anti-bot bypass, and distributed crawling. Feature-gated so you only compile what you use.

Quick Start

Command Line

cargo install spider_cli
spider --url https://example.com

Rust

[dependencies]
spider = "2"
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}
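
If you need the page bodies and not just the discovered links, spider also has a scrape mode. A minimal sketch, assuming the scrape(), get_pages(), and get_html() names behave as their API docs describe (verify the exact return types there):

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    // Assumed API: scrape() crawls and keeps each page body in memory.
    website.scrape().await;

    // Assumed API: get_pages() returns the stored pages, get_html() the raw body.
    if let Some(pages) = website.get_pages() {
        for page in pages.iter() {
            println!("{} ({} bytes)", page.get_url(), page.get_html().len());
        }
    }
}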

Streaming

Process each page the moment it's crawled, not after:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
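
The subscriber can do real work per page as it streams in, for example persisting each body to disk. A usage sketch, again assuming get_html() returns the raw body as a String (check the docs):

use std::fs;

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    let writer = tokio::spawn(async move {
        let mut count: u64 = 0;
        while let Ok(page) = rx.recv().await {
            // Assumption: get_html() returns the page body as a String.
            // A blocking write is fine for a sketch.
            let path = format!("/tmp/page-{count}.html");
            if let Err(e) = fs::write(&path, page.get_html()) {
                eprintln!("write failed for {}: {e}", page.get_url());
            }
            count += 1;
        }
    });

    website.crawl().await;
    website.unsubscribe();
    let _ = writer.await;
}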

Headless Chrome

Add one feature flag to render JavaScript-heavy pages:

[dependencies]
spider = { version = "2", features = ["chrome"] }
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

Also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation. See examples for more.

Benchmarks

Crawling 185 pages (source, 10 samples averaged):

Apple M1 Max (10-core, 64 GB RAM):

Crawler         Language      Time     vs Spider
spider          Rust          73 ms    baseline
node-crawler    JavaScript    15 s     205x slower
colly           Go            32 s     438x slower
wget            C             70 s     959x slower

Linux (2-core, 7 GB RAM):

Crawler         Language      Time     vs Spider
spider          Rust          50 ms    baseline
node-crawler    JavaScript    3.4 s    68x slower
colly           Go            30 s     600x slower
wget            C             60 s     1200x slower

The gap grows with site size. Spider handles 100k+ pages in minutes where other crawlers take hours. This comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring on Linux. See the benchmarks for full details.

Why Spider?

Most crawlers force a choice between fast HTTP-only or slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.

Supports HTTP, Chrome, and WebDriver. Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.
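
One way to mix modes is the smart crawl mentioned in the feature list below: it runs plain HTTP requests and upgrades to Chrome only for pages that look JavaScript-rendered. A minimal sketch, assuming a "smart" feature flag and a crawl_smart() entry point; both names should be checked against the current docs:

// Cargo.toml (assumed): spider = { version = "2", features = ["smart", "chrome"] }
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    // Assumed API: crawl_smart() starts with HTTP and falls back to Chrome
    // when a page appears to need JavaScript rendering.
    website.crawl_smart().await;
    println!("Pages found: {}", website.get_links().len());
}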

Only compile what you use. Every optional capability (Chrome, caching, proxies, AI) lives behind a Cargo feature flag. A minimal spider = "2" stays lean.

Built for production. Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.
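
Many of these options are configured through the same builder pattern used in the Chrome example above. A hedged sketch; the builder method names here (with_respect_robots_txt, with_user_agent, with_depth, with_delay) are assumptions to verify against the API docs:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Builder names below are assumed; check the spider API docs for exact signatures.
    let mut website = Website::new("https://example.com")
        .with_respect_robots_txt(true)            // honor robots.txt
        .with_user_agent(Some("my-crawler/0.1"))  // custom User-Agent string
        .with_depth(3)                            // stop after three link hops
        .with_delay(250)                          // millisecond delay between requests
        .build()
        .unwrap();

    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}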

AI automation included. spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.

Features

Crawling
  • Concurrent and streaming crawls with backpressure
  • Decentralized crawling for horizontal scaling
  • Caching: memory, disk (SQLite), or hybrid Chrome cache
  • Proxy support with rotation
  • Cron job scheduling
  • Depth budgeting, blacklisting, whitelisting
  • Smart mode that auto-detects JS-rendered content and upgrades to Chrome
Browser Automation
  • Chrome DevTools Protocol: headless or headed, stealth mode, screenshots, request interception
  • WebDriver: Selenium Grid, remote browsers, cross-browser testing
  • AI-powered challenge solving (deterministic + Chrome built-in AI)
  • Anti-bot fingerprinting, ad blocking, firewall
Data Processing
  • HTML transformations (Markdown, text, structured extraction)
  • CSS/XPath scraping with spider_utils
  • OpenAI and Gemini integration for content analysis
AI Agent
  • spider_agent: concurrent-safe multimodal web automation agent
  • Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
  • Web research with search providers (Serper, Brave, Bing, Tavily)
  • 110 built-in automation skills for web challenges

Spider Cloud

For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:

let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key")  // enable with features = ["spider_cloud"]
    .build()
    .unwrap();

Mode                  Strategy                                   Best For
Proxy (default)       All traffic through Spider Cloud proxy     General crawling with IP rotation
Smart (recommended)   Proxy + auto-fallback on bot detection     Production (speed + reliability)
Fallback              Direct first, API on failure               Cost-efficient, most sites work without help
Unblocker             All requests through unblocker             Aggressive bot protection

Free credits on signup. Get started at spider.cloud.

Get Spider

Package         Language    Install
spider          Rust        cargo add spider
spider_cli      CLI         cargo install spider_cli
spider-nodejs   Node.js     npm i @spider-rs/spider-rs
spider-py       Python      pip install spider_rs
spider_agent    Rust        cargo add spider --features agent

Cloud and Remote

Package          Description
Spider Cloud     Managed crawling infrastructure, no setup needed
spider-clients   SDKs for Spider Cloud in multiple languages
spider-browser   Remote access to Spider's Rust browser

Resources

  • 64 examples covering crawling, Chrome, WebDriver, AI, caching, and more
  • API documentation
  • Benchmarks
  • Changelog

Contributing

Contributions welcome. See CONTRIBUTING.md for setup and guidelines.

Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.

License

MIT

Tech Stack

Rust · Go · JavaScript · Python · SQLite · OpenAI · LLM

Installation

cargo add spider

cargo install spider_cli

npm i @spider-rs/spider-rs


Reviews: 0


Active · Last commit today
Submitted April 29, 2026
