Crawl

Base Path: /v1/crawl

Note on Responsible Crawling: This API is designed for generating context in LLM applications, not bulk data collection. Please:

  • Respect the target website's robots.txt and crawl limits

  • Use reasonable delays between requests (we enforce minimum delays)

  • Only crawl publicly accessible pages

  • Consider using official APIs when available

  • Cache results when possible to minimize repeat crawls
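As one way to act on the first guideline, a client can check a target URL against the site's robots.txt before submitting it. The sketch below uses Python's standard urllib.robotparser. The helper name allowed_by_robots and the sample rules are illustrative assumptions, and this page does not state whether the API performs the same check on its side.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching `url`.

    Hypothetical client-side helper; not part of the Crawl API.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only; a real check would download the site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /settings/
"""
```

A URL under a disallowed path can then be left out of the crawl request, or listed in exclude_paths.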

Endpoints

Start Crawl

Method: POST

Endpoint: https://api.tokensource.com/v1/crawl

Description: Initiates a multi-page crawl from a starting URL. Supports depth control (max_depth), path filtering (include_paths, exclude_paths), and webhook notifications (callback_url).

Request Body:

{
  "url": "https://coinmarketcap.com",
  "include_paths": ["currencies/", "exchanges/", "nft/"],
  "exclude_paths": ["login/", "settings/", "api/"],
  "max_depth": 2,
  "ignore_sitemap": true,
  "limit": 10,
  "allow_backward_links": true,
  "allow_external_links": true,
  "scrape_options": {
    "formats": ["markdown", "html", "raw_html", "links", "screenshot", "extract"],
    "headers": { "Authorization": "Bearer XYZ" },
    "include_selectors": [".protocol-details", ".market-data", ".exchange-info"],
    "exclude_selectors": [".advertisement", ".user-menu"],
    "main_only_content": true,
    "wait_for": 2000,
    "timeout": 30000,
    "extract": {
      "schema": {
        "coin_name": "string",
        "price_usd": "number",
        "market_cap": "number",
        "volume_24h": "number",
        "change_24h": "number"
      },
      "prompt": "Extract cryptocurrency market data including price, market cap, and 24h volume."
    }
  },
  "callback_url": "https://api.yourservice.com/crypto-webhook"
}
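A minimal sketch of building this request with Python's standard library is shown below. The endpoint and body fields come from this page. The function name build_start_crawl_request and the bearer-token Authorization header are assumptions, since the authentication scheme is not specified here.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://api.tokensource.com/v1/crawl"


def build_start_crawl_request(api_key: str, url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a Start Crawl request.

    `options` are passed through as top-level body fields, for example
    max_depth, limit, include_paths, scrape_options, callback_url.
    """
    body = {"url": url, **options}
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; a successful call is expected to return the 202 envelope.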

Response (202 Accepted):

{
  "status": 202,
  "data": {
    "crawl_id": "crypto_crawl_123",
    "status": "started",
    "estimated_pages": 5
  },
  "message": "Crypto market data crawl started successfully"
}

Get Crawl Status

Method: GET

Endpoint: https://api.tokensource.com/v1/crawl/{crawl_id}

Description: Retrieves the current status of a crawl operation.

Parameters:

  • crawl_id (string, required): the crawl_id returned by Start Crawl

Response (200 OK):

{
  "status": 200,
  "data": {
    "crawl_id": "crypto_crawl_123",
    "status": "in_progress",
    "pages_crawled": 25,
    "total_pages": 50,
    "start_time": "2024-03-14T10:00:00Z",
    "last_page": "https://coinmarketcap.com/currencies/bitcoin"
  },
  "message": "Crypto market data crawl in progress"
}
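When no callback_url is set, a client might poll this endpoint until the crawl finishes. The sketch below keeps the HTTP call behind a caller-supplied fetch_status function so the loop itself stays simple. The names wait_for_crawl and progress_fraction are hypothetical, and the terminal status values "completed" and "failed" are assumptions inferred from the webhook event names; this page only shows "started" and "in_progress".

```python
import time
from typing import Callable

# Assumption: these status strings mark a finished crawl.
TERMINAL_STATUSES = {"completed", "failed"}


def progress_fraction(data: dict) -> float:
    """Fraction of pages crawled so far, or 0.0 if total_pages is unknown."""
    total = data.get("total_pages") or 0
    return data.get("pages_crawled", 0) / total if total else 0.0


def wait_for_crawl(
    fetch_status: Callable[[str], dict],
    crawl_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the crawl reaches a terminal status and return its data."""
    for _ in range(max_polls):
        envelope = fetch_status(crawl_id)  # parsed JSON of GET /v1/crawl/{crawl_id}
        data = envelope["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        sleep(poll_seconds)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")
```

For the response above, progress_fraction would report 0.5 (25 of 50 pages).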

Webhook Events

If the Start Crawl request includes a callback_url, the API sends updates to that URL with the following event types:

crawl.started:

{
  "status": 200,
  "event_type": "crawl.started",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "start_time": "2024-03-14T10:00:00Z",
    "estimated_pages": 50,
    "market": "cryptocurrency"
  }
}

crawl.page:

{
  "status": 200,
  "event_type": "crawl.page",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "url": "https://coinmarketcap.com/currencies/ethereum",
    "markdown": "# Ethereum (ETH)\n\nCurrent Price: $3,245.67\nMarket Cap: $389.5B\n24h Volume: $15.2B",
    "html": "<div class='coin-details'>...</div>",
    "links": [
      "https://coinmarketcap.com/currencies/ethereum/markets",
      "https://coinmarketcap.com/currencies/ethereum/news"
    ],
    "extractions": {
      "coin_name": "Ethereum",
      "price_usd": 3245.67,
      "market_cap": 389500000000,
      "volume_24h": 15200000000,
      "change_24h": 2.5
    },
    "page_metadata": {
      "title": "Ethereum Price, ETH Price Index, Chart, and Info | CoinMarketCap",
      "description": "Get Ethereum price, charts, and other cryptocurrency info"
    }
  }
}

crawl.completed:

{
  "status": 200,
  "event_type": "crawl.completed",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "total_pages": 50,
    "duration_seconds": 120,
    "end_time": "2024-03-14T10:02:00Z",
    "summary": {
      "total_coins": 25,
      "total_exchanges": 15,
      "total_nft_collections": 10
    }
  }
}

crawl.failed:

{
  "status": 500,
  "event_type": "crawl.failed",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "error": "Rate limit exceeded on CoinMarketCap API",
    "pages_completed": 25,
    "last_successful_url": "https://coinmarketcap.com/currencies/cardano"
  }
}
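A receiver for these events can be sketched as a dispatch on event_type. The function below takes the already-parsed JSON payload, so it is independent of any web framework. The name handle_crawl_event, the returned log strings, and the choice to keep only url and extractions for each page are illustrative assumptions.

```python
from typing import Any


def handle_crawl_event(payload: dict[str, Any], pages: list[dict[str, Any]]) -> str:
    """Process one webhook payload and return a short log message.

    Page results are appended to `pages`, a caller-owned list.
    """
    event_type = payload.get("event_type")
    data = payload.get("data") or {}

    if event_type == "crawl.started":
        return f"crawl {data['crawl_id']} started"
    if event_type == "crawl.page":
        # Keep only the fields this sketch cares about.
        pages.append({"url": data["url"], "extractions": data.get("extractions", {})})
        return f"stored page {data['url']}"
    if event_type == "crawl.completed":
        return f"crawl {data['crawl_id']} completed with {data['total_pages']} pages"
    if event_type == "crawl.failed":
        return f"crawl {data['crawl_id']} failed: {data['error']}"
    # Unknown event types are ignored rather than treated as errors.
    return f"ignored event {event_type!r}"
```

In a real service this function would be called from the HTTP handler registered at callback_url, after decoding the request body as JSON.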

Error Responses

Response (400 Bad Request):

{
  "status": 400,
  "data": null,
  "message": "Invalid cryptocurrency market selectors"
}

Response (401 Unauthorized):

{
  "status": 401,
  "data": null,
  "message": "Invalid or missing API key for CoinMarketCap access"
}

Response (429 Too Many Requests):

{
  "status": 429,
  "data": null,
  "message": "Market data rate limit exceeded. Try again in 60 seconds"
}
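Since every error shares the same envelope (status, data, message), a client can handle them in one place. The sketch below retries only on 429 and raises on any other error status. The names CrawlApiError and call_with_retry are hypothetical, and the 60-second base delay is taken from the example message above rather than from a documented retry policy.

```python
import time
from typing import Callable


class CrawlApiError(Exception):
    """Raised when the API returns an error envelope that is not retried."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def call_with_retry(
    send: Callable[[], dict],
    max_attempts: int = 3,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` (which returns the parsed response envelope) with 429 retries."""
    for attempt in range(1, max_attempts + 1):
        envelope = send()
        status = envelope.get("status")
        if status is not None and status < 400:
            return envelope
        if status == 429 and attempt < max_attempts:
            # Wait longer after each rate-limited attempt: 60 s, 120 s, ...
            sleep(base_delay * attempt)
            continue
        raise CrawlApiError(status, envelope.get("message"))
    raise ValueError("max_attempts must be at least 1")
```

Errors such as 400 and 401 are raised immediately, because repeating the same request would not change the outcome.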
