# Crawl

**Base Path:** `/v1/crawl`

> **Note on Responsible Crawling**: This API is designed for generating context in LLM applications, not bulk data collection. Please:
>
> * Respect the target website's robots.txt and crawl limits
> * Use reasonable delays between requests (we enforce minimum delays)
> * Only crawl publicly accessible pages
> * Consider using official APIs when available
> * Cache results when possible to minimize repeat crawls
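
The caching guideline can be followed on the client side. The sketch below is a minimal in-memory cache. It assumes that two crawls with an identical request body can share results for a fixed time-to-live. `CrawlCache` and its methods are hypothetical names and are not part of this API.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional


class CrawlCache:
    """Hypothetical client-side cache of crawl results, keyed by request body."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def key(request_body: dict) -> str:
        # Canonical JSON (sorted keys, compact separators) so logically
        # identical requests produce the same key regardless of key order.
        canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request_body: dict) -> Optional[Any]:
        k = self.key(request_body)
        entry = self._entries.get(k)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_seconds:
            # Expired: drop the entry so the caller starts a fresh crawl.
            del self._entries[k]
            return None
        return value

    def put(self, request_body: dict, value: Any) -> None:
        self._entries[self.key(request_body)] = (self.clock(), value)
```

Injecting `clock` keeps the expiry logic testable without waiting in real time.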

## Endpoints

### Start Crawl

**Method:** `POST`\
**Endpoint:** `https://api.tokensource.com/v1/crawl`\
**Description:** Initiates a multi-page crawl starting from a URL. Supports depth control, path filtering, and webhook notifications via `callback_url`.

**Request Body:**

```json
{
  "url": "https://coinmarketcap.com",
  "include_paths": ["currencies/", "exchanges/", "nft/"],
  "exclude_paths": ["login/", "settings/", "api/"],
  "max_depth": 2,
  "ignore_sitemap": true,
  "limit": 50,
  "allow_backward_links": true,
  "allow_external_links": true,
  "scrape_options": {
    "formats": ["markdown", "html", "raw_html", "links", "screenshot", "extract"],
    "headers": { "Authorization": "Bearer XYZ" },
    "include_selectors": [".protocol-details", ".market-data", ".exchange-info"],
    "exclude_selectors": [".advertisement", ".user-menu"],
    "main_only_content": true,
    "wait_for": 2000,
    "timeout": 30000,
    "extract": {
      "schema": {
        "coin_name": "string",
        "price_usd": "number",
        "market_cap": "number",
        "volume_24h": "number",
        "change_24h": "number"
      },
      "prompt": "Extract cryptocurrency market data including price, market cap, and 24h volume."
    }
  },
  "callback_url": "https://api.yourservice.com/crypto-webhook"
}
```
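
Here is a minimal Python sketch of sending this request with the standard library. It assumes the API key is sent as a `Bearer` token in the `Authorization` header, which this page does not document. `build_start_crawl_request` and `start_crawl` are illustrative names. The `scrape_options.headers` in the body presumably apply to requests made to the target site, not to this API's own authentication.

```python
import json
import urllib.request

START_CRAWL_URL = "https://api.tokensource.com/v1/crawl"


def build_start_crawl_request(api_key: str, body: dict) -> urllib.request.Request:
    # ASSUMPTION: Bearer-token authentication; not documented on this page.
    return urllib.request.Request(
        START_CRAWL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def start_crawl(api_key: str, body: dict) -> str:
    """Send the request and return crawl_id from the 202 response body."""
    req = build_start_crawl_request(api_key, body)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return payload["data"]["crawl_id"]


body = {
    "url": "https://coinmarketcap.com",
    "include_paths": ["currencies/"],
    "max_depth": 2,
    "limit": 10,
    "scrape_options": {"formats": ["markdown"]},
    "callback_url": "https://api.yourservice.com/crypto-webhook",
}
```

Separating request construction from sending makes it possible to inspect the request before any network call is made.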

**Response (202 Accepted):**

```json
{
  "status": 202,
  "data": {
    "crawl_id": "crypto_crawl_123",
    "status": "started",
    "estimated_pages": 5
  },
  "message": "Crypto market data crawl started successfully"
}
```
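
Every response on this page uses the same envelope of `status`, `data`, and `message`. The following helper unwraps `data` and converts error envelopes into exceptions. It is a sketch that assumes the body's `status` mirrors the HTTP status code. `unwrap` and `CrawlApiError` are hypothetical names.

```python
import json
from typing import Any, Optional


class CrawlApiError(Exception):
    def __init__(self, status: Optional[int], message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def unwrap(raw: str) -> Any:
    """Return the envelope's `data`, raising CrawlApiError on a non-2xx status."""
    envelope = json.loads(raw)
    status = envelope.get("status")
    if not isinstance(status, int) or not 200 <= status < 300:
        raise CrawlApiError(status, envelope.get("message", ""))
    return envelope["data"]
```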

### Get Crawl Status

**Method:** `GET`\
**Endpoint:** `https://api.tokensource.com/v1/crawl/{crawl_id}`\
**Description:** Retrieves the current status of a crawl operation.

**Path Parameters:**

* `crawl_id` (string, required): the crawl ID returned by Start Crawl.

**Response (200 OK):**

```json
{
  "status": 200,
  "data": {
    "crawl_id": "crypto_crawl_123",
    "status": "in_progress",
    "pages_crawled": 25,
    "total_pages": 50,
    "start_time": "2024-03-14T10:00:00Z",
    "last_page": "https://coinmarketcap.com/currencies/bitcoin"
  },
  "message": "Crypto market data crawl in progress"
}
```
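
Without a `callback_url`, a client can poll this endpoint. The sketch below polls until a terminal status is returned. This page only shows `started` and `in_progress`, so the terminal values `completed` and `failed` are an assumption that mirrors the webhook event names. `fetch_status` is a hypothetical caller-supplied function that performs the GET and returns the response's `data` object.

```python
import time
import urllib.parse
from typing import Callable

STATUS_URL = "https://api.tokensource.com/v1/crawl/{crawl_id}"

# ASSUMPTION: terminal status values are not listed on this page.
TERMINAL_STATUSES = {"completed", "failed"}


def status_url(crawl_id: str) -> str:
    # crawl_id is a path parameter, so percent-encode it (including "/").
    return STATUS_URL.format(crawl_id=urllib.parse.quote(crawl_id, safe=""))


def progress(data: dict) -> float:
    """Fraction of pages done, from pages_crawled / total_pages."""
    total = data.get("total_pages") or 0
    return data.get("pages_crawled", 0) / total if total else 0.0


def poll_until_done(
    crawl_id: str,
    fetch_status: Callable[[str], dict],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll Get Crawl Status until a terminal status or max_polls is reached."""
    url = status_url(crawl_id)
    for _ in range(max_polls):
        data = fetch_status(url)
        if data.get("status") in TERMINAL_STATUSES:
            return data
        sleep(interval_seconds)
    raise TimeoutError(f"crawl {crawl_id} not finished after {max_polls} polls")
```

A fixed interval keeps the sketch simple. Given the responsible-crawling guidance, a longer interval is usually preferable to frequent polling.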

### Webhook Events

If the Start Crawl request includes a `callback_url`, the crawl sends updates to that URL with the following event types:

**crawl.started:**

```json
{
  "status": 200,
  "event_type": "crawl.started",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "start_time": "2024-03-14T10:00:00Z",
    "estimated_pages": 50,
    "market": "cryptocurrency"
  }
}
```

**crawl.page:**

```json
{
  "status": 200,
  "event_type": "crawl.page",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "url": "https://coinmarketcap.com/currencies/ethereum",
    "markdown": "# Ethereum (ETH)\n\nCurrent Price: $3,245.67\nMarket Cap: $389.5B\n24h Volume: $15.2B",
    "html": "<div class='coin-details'>...</div>",
    "links": [
      "https://coinmarketcap.com/currencies/ethereum/markets",
      "https://coinmarketcap.com/currencies/ethereum/news"
    ],
    "extractions": {
      "coin_name": "Ethereum",
      "price_usd": 3245.67,
      "market_cap": 389500000000,
      "volume_24h": 15200000000,
      "change_24h": 2.5
    },
    "page_metadata": {
      "title": "Ethereum Price, ETH Price Index, Chart, and Info | CoinMarketCap",
      "description": "Get Ethereum price, charts, and other cryptocurrency info"
    }
  }
}
```
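
The `extractions` object follows the `extract.schema` from the request. A receiver might check each page against that schema before storing it. Only the type names `string` and `number` appear on this page. Other type spellings are not documented here, and neither is whether the API validates its own output. `check_extractions` is therefore a hypothetical client-side check.

```python
# ASSUMPTION: mapping of schema type names to Python checks; only
# "string" and "number" are shown in this documentation.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def check_extractions(schema: dict, extractions: dict) -> list[str]:
    """Return a list of problems; an empty list means the page matched the schema."""
    problems = []
    for field, type_name in schema.items():
        if field not in extractions:
            problems.append(f"missing field: {field}")
            continue
        check = _TYPE_CHECKS.get(type_name)
        if check is None:
            problems.append(f"unknown schema type for {field}: {type_name}")
        elif not check(extractions[field]):
            got = type(extractions[field]).__name__
            problems.append(f"{field}: expected {type_name}, got {got}")
    return problems
```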

**crawl.completed:**

```json
{
  "status": 200,
  "event_type": "crawl.completed",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "total_pages": 50,
    "duration_seconds": 120,
    "end_time": "2024-03-14T10:02:00Z",
    "summary": {
      "total_coins": 25,
      "total_exchanges": 15,
      "total_nft_collections": 10
    }
  }
}
```

**crawl.failed:**

```json
{
  "status": 500,
  "event_type": "crawl.failed",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "error": "Rate limit exceeded on coinmarketcap.com",
    "pages_completed": 25,
    "last_successful_url": "https://coinmarketcap.com/currencies/cardano"
  }
}
```
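
A webhook receiver typically routes each event by `event_type`. The sketch below is framework-neutral: `dispatch` takes the raw request body, so it can be called from any web framework's handler. The HTTP method used for callbacks and any signature verification are not documented on this page. `make_dispatcher`, `on_page`, and `on_failed` are hypothetical names.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


def make_dispatcher(handlers: dict[str, Handler]) -> Callable[[bytes], str]:
    """Route a raw webhook body to a handler by event_type.

    Unknown event types return "ignored" instead of raising, so new
    event types do not break the receiver.
    """
    def dispatch(raw_body: bytes) -> str:
        event = json.loads(raw_body)
        event_type = event.get("event_type")
        handler = handlers.get(event_type)
        if handler is None:
            return "ignored"
        handler(event["data"])
        return event_type
    return dispatch


pages: dict[str, dict] = {}
failures: list[tuple[str, str]] = []


def on_page(data: dict) -> None:
    # Keep the structured fields; markdown or html could be stored the same way.
    pages[data["url"]] = data.get("extractions", {})


def on_failed(data: dict) -> None:
    # last_successful_url might serve as a starting point for a follow-up crawl.
    failures.append((data["crawl_id"], data["last_successful_url"]))


dispatch = make_dispatcher({"crawl.page": on_page, "crawl.failed": on_failed})
```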

### Error Responses

**Response (400 Bad Request):**

```json
{
  "status": 400,
  "data": null,
  "message": "Invalid cryptocurrency market selectors"
}
```

**Response (401 Unauthorized):**

```json
{
  "status": 401,
  "data": null,
  "message": "Invalid or missing API key"
}
```

**Response (429 Too Many Requests):**

```json
{
  "status": 429,
  "data": null,
  "message": "Rate limit exceeded. Try again in 60 seconds."
}
```
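
Of these errors, only 429 is worth retrying automatically. A 400 means the request itself must change, and a 401 means the key must be fixed. The sketch below retries 429 responses with exponential backoff. `send` is a hypothetical caller-supplied function that returns the HTTP status and the parsed body. This page does not document a `Retry-After` header. Because the example message suggests waiting 60 seconds, `base_delay` should be tuned to match.

```python
import time
from typing import Callable

RETRYABLE = {429}


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send`, retrying 429 responses; other statuses return immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            # Success, a non-retryable error, or retries exhausted:
            # the caller inspects body["status"].
            return body
        # Wait base_delay, then 2x, 4x, ... between attempts.
        sleep(base_delay * 2 ** (attempt - 1))
```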


