spider

Agent

Web crawler and scraper for Rust

Copy the install, test the workflow, then decide if it earns a permanent slot.

2,443
Why now: Moving now

Fresh repo activity plus visible builder pull. This is the kind of tool people test before it turns obvious.

Decision: High-conviction move

Copy the install, test the workflow, then decide if it earns a permanent slot.

Trial cost: Medium lift

Reasonable to try, but it will take more than a quick skim to get real signal.

Risk: 14/100

GitHub health 87/100. No security policy, but fresh repo health and a manageable issue load keep the risk controlled.

What You Are Adopting

AI Agent: Universal
Model: Multiple
Build Time: Minutes

Test This In Your Stack

One command in · Clean rollback · Low commitment
Sandboxed: installs to ~/.claude, isolated from your projects. One command to remove.

Fastest way to find out if spider belongs in your setup.

Copy the install command, run a real test, and back it out cleanly if it slows you down.

Try now
git clone https://github.com/spider-rs/spider ~/.claude/agents/spider

Run this first. You will know quickly if the workflow earns a permanent slot.

Back out
rm -rf ~/.claude/agents/spider

No messy cleanup loop. If it misses, remove it and keep moving.

Install Location

~/
└─ .claude/
    ├─ commands/
    ├─ agents/
    │   └─ spider/ ← installs here
    └─ settings.json

About

Web crawler and scraper for Rust

README

Spider


Website | Guides | API Docs | Examples | Discord

A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.

  • Crawl 100k+ pages in minutes on a single machine. See benchmarks.
  • HTTP, Chrome CDP, WebDriver, and AI automation in one dependency.
  • Production-ready with caching, proxy rotation, anti-bot bypass, and distributed crawling. Feature-gated so you only compile what you use.

Quick Start

Command Line

cargo install spider_cli
spider --url https://example.com

Rust

[dependencies]
spider = "2"
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}
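
If you need the page bodies and not just the discovered links, spider also has a scrape mode. A minimal sketch, assuming the scrape(), get_pages(), and get_html() names behave as their API docs describe (verify the exact return types there):

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    // Assumed API: scrape() crawls and keeps each page body in memory.
    website.scrape().await;

    // Assumed API: get_pages() returns the stored pages, get_html() the raw body.
    if let Some(pages) = website.get_pages() {
        for page in pages.iter() {
            println!("{} ({} bytes)", page.get_url(), page.get_html().len());
        }
    }
}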

Streaming

Process each page the moment it's crawled, not after:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
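
The subscriber can do real work per page as it streams in, for example persisting each body to disk. A usage sketch, again assuming get_html() returns the raw body as a String (check the docs):

use std::fs;

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    let writer = tokio::spawn(async move {
        let mut count: u64 = 0;
        while let Ok(page) = rx.recv().await {
            // Assumption: get_html() returns the page body as a String.
            // A blocking write is fine for a sketch.
            let path = format!("/tmp/page-{count}.html");
            if let Err(e) = fs::write(&path, page.get_html()) {
                eprintln!("write failed for {}: {e}", page.get_url());
            }
            count += 1;
        }
    });

    website.crawl().await;
    website.unsubscribe();
    let _ = writer.await;
}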

Headless Chrome

Add one feature flag to render JavaScript-heavy pages:

[dependencies]
spider = { version = "2", features = ["chrome"] }
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

Also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation. See examples for more.

Benchmarks

Crawling 185 pages (source, 10 samples averaged):

Apple M1 Max (10-core, 64 GB RAM):

Crawler         Language      Time     vs Spider
spider          Rust          73 ms    baseline
node-crawler    JavaScript    15 s     205x slower
colly           Go            32 s     438x slower
wget            C             70 s     959x slower

Linux (2-core, 7 GB RAM):

Crawler         Language      Time     vs Spider
spider          Rust          50 ms    baseline
node-crawler    JavaScript    3.4 s    68x slower
colly           Go            30 s     600x slower
wget            C             60 s     1200x slower

The gap grows with site size. Spider handles 100k+ pages in minutes where other crawlers take hours. This comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring on Linux. See the benchmarks for full details.

Why Spider?

Most crawlers force a choice between fast HTTP-only or slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.

Supports HTTP, Chrome, and WebDriver. Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.
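
One way to mix modes is the smart crawl mentioned in the feature list below: it runs plain HTTP requests and upgrades to Chrome only for pages that look JavaScript-rendered. A minimal sketch, assuming a "smart" feature flag and a crawl_smart() entry point; both names should be checked against the current docs:

// Cargo.toml (assumed): spider = { version = "2", features = ["smart", "chrome"] }
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    // Assumed API: crawl_smart() starts with HTTP and falls back to Chrome
    // when a page appears to need JavaScript rendering.
    website.crawl_smart().await;
    println!("Pages found: {}", website.get_links().len());
}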

Only compile what you use. Every optional capability (Chrome, caching, proxies, AI) lives behind a Cargo feature flag. A minimal spider = "2" stays lean.

Built for production. Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.
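
Many of these options are configured through the same builder pattern used in the Chrome example above. A hedged sketch; the builder method names here (with_respect_robots_txt, with_user_agent, with_depth, with_delay) are assumptions to verify against the API docs:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Builder names below are assumed; check the spider API docs for exact signatures.
    let mut website = Website::new("https://example.com")
        .with_respect_robots_txt(true)            // honor robots.txt
        .with_user_agent(Some("my-crawler/0.1"))  // custom User-Agent string
        .with_depth(3)                            // stop after three link hops
        .with_delay(250)                          // millisecond delay between requests
        .build()
        .unwrap();

    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}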

AI automation included. spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.

Features

Crawling
  • Concurrent and streaming crawls with backpressure
  • Decentralized crawling for horizontal scaling
  • Caching: memory, disk (SQLite), or hybrid Chrome cache
  • Proxy support with rotation
  • Cron job scheduling
  • Depth budgeting, blacklisting, whitelisting
  • Smart mode that auto-detects JS-rendered content and upgrades to Chrome
Browser Automation
  • Chrome DevTools Protocol: headless or headed, stealth mode, screenshots, request interception
  • WebDriver: Selenium Grid, remote browsers, cross-browser testing
  • AI-powered challenge solving (deterministic + Chrome built-in AI)
  • Anti-bot fingerprinting, ad blocking, firewall
Data Processing
  • HTML transformations (Markdown, text, structured extraction)
  • CSS/XPath scraping with spider_utils
  • OpenAI and Gemini integration for content analysis
AI Agent
  • spider_agent: concurrent-safe multimodal web automation agent
  • Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
  • Web research with search providers (Serper, Brave, Bing, Tavily)
  • 110 built-in automation skills for web challenges

Spider Cloud

For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:

let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key")  // enable with features = ["spider_cloud"]
    .build()
    .unwrap();

Mode                  Strategy                                   Best For
Proxy (default)       All traffic through Spider Cloud proxy     General crawling with IP rotation
Smart (recommended)   Proxy + auto-fallback on bot detection     Production (speed + reliability)
Fallback              Direct first, API on failure               Cost-efficient, most sites work without help
Unblocker             All requests through unblocker             Aggressive bot protection

Free credits on signup. Get started at spider.cloud.

Get Spider

Package         Language    Install
spider          Rust        cargo add spider
spider_cli      CLI         cargo install spider_cli
spider-nodejs   Node.js     npm i @spider-rs/spider-rs
spider-py       Python      pip install spider_rs
spider_agent    Rust        cargo add spider --features agent

Cloud and Remote

Package          Description
Spider Cloud     Managed crawling infrastructure, no setup needed
spider-clients   SDKs for Spider Cloud in multiple languages
spider-browser   Remote access to Spider's Rust browser

Resources

  • 64 examples covering crawling, Chrome, WebDriver, AI, caching, and more
  • API documentation
  • Benchmarks
  • Changelog

Contributing

Contributions welcome. See CONTRIBUTING.md for setup and guidelines.

Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.

License

MIT

Tech Stack

Rust · Go · JavaScript · Python · SQLite · OpenAI · LLM

Installation

cargo add spider

cargo install spider_cli

npm i @spider-rs/spider-rs


Reviews: 0


Active · Last commit today
Submitted April 29, 2026
