Base Path: /v1/crawl
Note on Responsible Crawling: This API is designed for generating context in LLM applications, not bulk data collection. Please:
Respect the target website's robots.txt and crawl limits
Use reasonable delays between requests (we enforce minimum delays)
Only crawl publicly accessible pages
Consider using official APIs when available
Cache results when possible to minimize repeat crawls
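The caching advice above can be sketched as a small in-memory cache with a time-to-live. This is only an illustration: the `CrawlCache` class, its method names, and the choice of keying by start URL are assumptions, not part of the API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CrawlCache:
    """Hypothetical helper: keep crawl results for a fixed time to avoid repeat crawls."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, url: str) -> Optional[Any]:
        """Return the cached result for url, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, result = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[url]  # expired, so the caller should crawl again
            return None
        return result

    def put(self, url: str, result: Any) -> None:
        """Store a result together with the time it was stored."""
        self._entries[url] = (self.clock(), result)
```

A caller would check `cache.get(url)` before starting a crawl and call `cache.put(url, result)` once the crawl finishes. A suitable time-to-live depends on how quickly the target pages change.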
Endpoints
Start Crawl
Method: POST
Endpoint: https://api.tokensource.com/v1/crawl
Description: Initiates a multi-page crawl starting from a URL. Supports depth control, path filtering, and notifications.
Request Body:
{
  "url": "https://coinmarketcap.com",
  "include_paths": ["currencies/", "exchanges/", "nft/"],
  "exclude_paths": ["login/", "settings/", "api/"],
  "max_depth": 2,
  "ignore_sitemap": true,
  "limit": 10,
  "allow_backward_links": true,
  "allow_external_links": true,
  "scrape_options": {
    "formats": ["markdown", "html", "raw_html", "links", "screenshot", "extract"],
    "headers": { "Authorization": "Bearer XYZ" },
    "include_selectors": [".protocol-details", ".market-data", ".exchange-info"],
    "exclude_selectors": [".advertisement", ".user-menu"],
    "main_only_content": true,
    "wait_for": 2000,
    "timeout": 30000,
    "extract": {
      "schema": {
        "coin_name": "string",
        "price_usd": "number",
        "market_cap": "number",
        "volume_24h": "number",
        "change_24h": "number"
      },
      "prompt": "Extract cryptocurrency market data including price, market cap, and 24h volume."
    }
  },
  "callback_url": "https://api.yourservice.com/crypto-webhook"
}
Response (202 Accepted):
{
  "status": 202,
  "data": {
    "crawl_id": "crypto_crawl_123",
    "status": "started",
    "estimated_pages": 5
  },
  "message": "Crypto market data crawl started successfully"
}
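The request and response above could be handled from Python roughly as follows. This is a minimal sketch using only the standard library. The helper names are hypothetical, and sending the API key as an `Authorization: Bearer` header is an assumption, since this section does not state the authentication scheme. It also assumes `url` is the one field a caller must always supply.

```python
import json
import urllib.request
from typing import Any, Dict

CRAWL_URL = "https://api.tokensource.com/v1/crawl"


def build_start_crawl_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a crawl."""
    if "url" not in payload:
        raise ValueError("payload needs a 'url' to start the crawl from")
    return urllib.request.Request(
        CRAWL_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the docs above do not specify the scheme.
            "Authorization": f"Bearer {api_key}",
        },
    )


def parse_crawl_id(response_body: bytes) -> str:
    """Extract data.crawl_id from a 202 Accepted response body."""
    return json.loads(response_body)["data"]["crawl_id"]
```

Passing the built request to `urllib.request.urlopen` sends it. The returned body can then be given to `parse_crawl_id` to obtain the ID used by Get Crawl Status.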
Get Crawl Status
Method: GET
Endpoint: https://api.tokensource.com/v1/crawl/{crawl_id}
Description: Retrieves the current status of a crawl operation.
Parameters:
crawl_id (string, required): the ID returned in data.crawl_id by Start Crawl.
Response (200 OK):
{
  "status": 200,
  "data": {
    "crawl_id": "crypto_crawl_123",
    "status": "in_progress",
    "pages_crawled": 25,
    "total_pages": 50,
    "start_time": "2024-03-14T10:00:00Z",
    "last_page": "https://coinmarketcap.com/currencies/bitcoin"
  },
  "message": "Crypto market data crawl in progress"
}
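When no webhook is configured, the status endpoint can be polled. The sketch below takes the HTTP call as an injected `fetch_status` function so the loop stays independent of any HTTP client. The terminal status names `completed` and `failed` are an assumption that mirrors the webhook event names. Only `started` and `in_progress` appear in the examples in this section.

```python
import time
from typing import Any, Callable, Dict

# Assumption: terminal statuses mirror the crawl.completed / crawl.failed event names.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_crawl(
    fetch_status: Callable[[str], Dict[str, Any]],
    crawl_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the crawl reports a terminal status, or give up after max_polls."""
    for _ in range(max_polls):
        data = fetch_status(crawl_id)["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        sleep(poll_seconds)
    raise TimeoutError(f"crawl {crawl_id} still running after {max_polls} polls")


def progress_fraction(data: Dict[str, Any]) -> float:
    """Fraction of pages crawled so far, based on pages_crawled and total_pages."""
    total = data.get("total_pages") or 0
    return data.get("pages_crawled", 0) / total if total else 0.0
```

Here `fetch_status` would wrap a GET to `https://api.tokensource.com/v1/crawl/{crawl_id}` and return the decoded JSON body. A poll interval of a few seconds keeps the load on the API low.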
Webhook Events
The crawl endpoint sends updates to the callback_url supplied in the Start Crawl request. The event types are:
crawl.started:
{
  "status": 200,
  "event_type": "crawl.started",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "start_time": "2024-03-14T10:00:00Z",
    "estimated_pages": 50,
    "market": "cryptocurrency"
  }
}
crawl.page:
{
  "status": 200,
  "event_type": "crawl.page",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "url": "https://coinmarketcap.com/currencies/ethereum",
    "markdown": "# Ethereum (ETH)\n\nCurrent Price: $3,245.67\nMarket Cap: $389.5B\n24h Volume: $15.2B",
    "html": "<div class='coin-details'>...</div>",
    "links": [
      "https://coinmarketcap.com/currencies/ethereum/markets",
      "https://coinmarketcap.com/currencies/ethereum/news"
    ],
    "extractions": {
      "coin_name": "Ethereum",
      "price_usd": 3245.67,
      "market_cap": 389500000000,
      "volume_24h": 15200000000,
      "change_24h": 2.5
    },
    "page_metadata": {
      "title": "Ethereum Price, ETH Price Index, Chart, and Info | CoinMarketCap",
      "description": "Get Ethereum price, charts, and other cryptocurrency info"
    }
  }
}
crawl.completed:
{
  "status": 200,
  "event_type": "crawl.completed",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "total_pages": 50,
    "duration_seconds": 120,
    "end_time": "2024-03-14T10:02:00Z",
    "summary": {
      "total_coins": 25,
      "total_exchanges": 15,
      "total_nft_collections": 10
    }
  }
}
crawl.failed:
{
  "status": 500,
  "event_type": "crawl.failed",
  "data": {
    "crawl_id": "crypto_crawl_123",
    "error": "Rate limit exceeded on CoinMarketCap API",
    "pages_completed": 25,
    "last_successful_url": "https://coinmarketcap.com/currencies/cardano"
  }
}
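A receiver for the four event types above might dispatch on `event_type` along these lines. The `CrawlEventCollector` class is a hypothetical illustration that works on the raw request body, so it can sit behind any web framework. It assumes every event carries `data.crawl_id`, as in the examples, and it does not cover verification of the sender, which this section does not describe.

```python
import json
from typing import Any, Dict, List


class CrawlEventCollector:
    """Hypothetical receiver that accumulates webhook events per crawl."""

    def __init__(self) -> None:
        self.pages: Dict[str, List[Dict[str, Any]]] = {}
        self.finished: Dict[str, str] = {}

    def handle(self, raw_body: bytes) -> str:
        """Process one webhook body; return the event type, or 'ignored' if unknown."""
        event = json.loads(raw_body)
        event_type = event["event_type"]
        data = event["data"]
        crawl_id = data["crawl_id"]
        if event_type == "crawl.started":
            self.pages.setdefault(crawl_id, [])
        elif event_type == "crawl.page":
            # Keep only the fields this sketch cares about.
            self.pages.setdefault(crawl_id, []).append(
                {"url": data["url"], "extractions": data.get("extractions", {})}
            )
        elif event_type == "crawl.completed":
            self.finished[crawl_id] = "completed"
        elif event_type == "crawl.failed":
            self.finished[crawl_id] = "failed: " + data.get("error", "unknown error")
        else:
            return "ignored"
        return event_type
```

Unknown event types are ignored rather than rejected, so the receiver would keep working if new events were added later.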
Error Responses
Response (400 Bad Request):
{
  "status": 400,
  "data": null,
  "message": "Invalid cryptocurrency market selectors"
}
Response (401 Unauthorized):
{
  "status": 401,
  "data": null,
  "message": "Invalid or missing API key for CoinMarketCap access"
}
Response (429 Too Many Requests):
{
  "status": 429,
  "data": null,
  "message": "Market data rate limit exceeded. Try again in 60 seconds"
}
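Because every response shares the `status`, `data`, and `message` envelope, client code can turn the error bodies above into exceptions in one place. The following is a sketch with hypothetical names. Reading the wait time out of the 429 `message` text is a best-effort assumption based on the example wording, and it returns `None` if the wording differs.

```python
import re
from typing import Any, Dict, Optional


class CrawlApiError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, status: int, message: str, retry_after: Optional[int] = None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retry_after = retry_after  # seconds to wait, when it could be parsed


def retry_delay_from_message(message: str) -> Optional[int]:
    """Best-effort parse of 'Try again in N seconds'; None if the text is absent."""
    match = re.search(r"[Tt]ry again in (\d+) seconds?", message)
    return int(match.group(1)) if match else None


def raise_for_error(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the body unchanged for 2xx statuses, otherwise raise CrawlApiError."""
    status = body.get("status", 0)
    if 200 <= status < 300:
        return body
    message = body.get("message") or ""
    retry_after = retry_delay_from_message(message) if status == 429 else None
    raise CrawlApiError(status, message, retry_after)
```

A caller could catch `CrawlApiError`, sleep for `retry_after` seconds when it is set, and retry. A 400 or 401 would need the request or API key corrected instead.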