Crawl
Base Path: /v1/crawl
Note on Responsible Crawling: This API is designed for generating context in LLM applications, not bulk data collection. Please:
Respect the target website's robots.txt and crawl limits
Use reasonable delays between requests (we enforce minimum delays)
Only crawl publicly accessible pages
Consider using official APIs when available
Cache results when possible to minimize repeat crawls
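One way to honor the robots.txt guideline above is to check a URL against the site's rules before submitting it for crawling. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` and the sample rules are illustrative assumptions, not part of this API.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt content (hypothetical, not taken from a real site).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /settings/
Disallow: /login/
"""

allowed = is_allowed(SAMPLE_ROBOTS, "https://example.com/currencies/bitcoin")
blocked = is_allowed(SAMPLE_ROBOTS, "https://example.com/settings/profile")
```

In practice the robots.txt text would be fetched once per site and reused, which also fits the advice to cache results.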
Endpoints
Start Crawl
Method: POST
Endpoint: https://api.tokensource.com/v1/crawl
Description: Initiates a multi-page crawl starting from a URL. Supports depth control, path filtering, and notifications.
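As a rough sketch, a client could build this request in Python with only the standard library. The helper `build_crawl_request` is hypothetical, and the bearer-token `Authorization` header is an assumption, since this page does not state the authentication scheme. The payload uses a small subset of the request body fields.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://api.tokensource.com/v1/crawl"


def build_crawl_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a crawl."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. Confirm the real scheme before use.
            "Authorization": f"Bearer {api_key}",
        },
    )


payload = {
    "url": "https://coinmarketcap.com",
    "include_paths": ["currencies/"],
    "max_depth": 2,
    "limit": 10,
}
request = build_crawl_request("YOUR_API_KEY", payload)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the payload easy to inspect and test without touching the network.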
Request Body:
{
  "url": "https://coinmarketcap.com",
  "include_paths": ["currencies/", "exchanges/", "nft/"],
  "exclude_paths": ["login/", "settings/", "api/"],
  "max_depth": 2,
  "ignore_sitemap": true,
  "limit": 10,
  "allow_backward_links": true,
  "allow_external_links": true,
  "scrape_options": {
    "formats": ["markdown", "html", "raw_html", "links", "screenshot", "extract"],
    "headers": { "Authorization": "Bearer XYZ" },
    "include_selectors": [".protocol-details", ".market-data", ".exchange-info"],
    "exclude_selectors": [".advertisement", ".user-menu"],
    "main_only_content": true,
    "wait_for": 2000,
    "timeout": 30000,
    "extract": {
      "schema": {
        "coin_name": "string",
        "price_usd": "number",
        "market_cap": "number",
        "volume_24h": "number",
        "change_24h": "number"
      },
      "prompt": "Extract cryptocurrency market data including price, market cap, and 24h volume."
    }
  },
  "callback_url": "https://api.yourservice.com/crypto-webhook"
}
Response (202 Accepted):
Get Crawl Status
Method: GET
Endpoint: https://api.tokensource.com/v1/crawl/{crawl_id}
Description: Retrieves the current status of a crawl operation.
Parameters:
crawl_id (string, required)
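Because a crawl runs asynchronously, a client that does not use webhooks would typically poll this endpoint until the crawl finishes. The sketch below shows one possible polling loop. `status_url` and `poll_crawl` are hypothetical helpers, and since the response schema is not shown on this page, the caller supplies the completion check. The `"status"` field in the usage example is a placeholder.

```python
import time
from typing import Callable
from urllib.parse import quote


def status_url(crawl_id: str) -> str:
    """Build the status URL for a crawl, escaping the ID for use in a path."""
    return f"https://api.tokensource.com/v1/crawl/{quote(crawl_id, safe='')}"


def poll_crawl(
    fetch_status: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until is_finished returns True or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_finished(status):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait between polls to avoid hammering the API
    raise TimeoutError(f"crawl not finished after {max_attempts} polls")


# Usage with canned responses; "status" and "completed" are assumed names.
_responses = iter([{"status": "running"}, {"status": "completed"}])
final = poll_crawl(
    fetch_status=lambda: next(_responses),
    is_finished=lambda body: body.get("status") == "completed",
    interval_s=0,
    sleep=lambda seconds: None,
)
```

In a real client, `fetch_status` would issue the GET request to `status_url(crawl_id)` and return the parsed JSON body.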
Response (200 OK):
Webhook Events
The crawl endpoint sends updates to the webhook URL supplied as callback_url in the Start Crawl request, using the following event types:
crawl.started:
crawl.page:
crawl.completed:
crawl.failed:
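A webhook receiver usually routes each incoming event to a handler based on its type. The sketch below illustrates that dispatch step only. The payload shape is not documented here, so the `"type"` field that carries the event name is an assumption, and `make_dispatcher` is a hypothetical helper.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[bytes], bool]:
    """Return a function that routes a raw webhook body to the matching handler."""

    def dispatch(raw_body: bytes) -> bool:
        event = json.loads(raw_body)
        # Assumption: the event name (e.g. "crawl.page") is in a "type" field.
        handler = handlers.get(event.get("type"))
        if handler is None:
            return False  # unknown event type; the caller may log and ignore it
        handler(event)
        return True

    return dispatch


pages = []
failures = []
dispatch = make_dispatcher({
    "crawl.page": pages.append,
    "crawl.failed": failures.append,
})
handled = dispatch(b'{"type": "crawl.page", "url": "https://example.com/a"}')
ignored = dispatch(b'{"type": "crawl.unknown"}')
```

Returning a boolean rather than raising on unknown types lets the receiver acknowledge the delivery even if new event types are introduced later.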
Error Responses
Response (400 Bad Request):
Response (401 Unauthorized):
Response (429 Too Many Requests):
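Of the errors above, only 429 is worth retrying automatically; a 400 or 401 will fail again until the request or credentials are fixed. The sketch below shows one common approach, exponential backoff that prefers a `Retry-After` header when present. Whether this API sends `Retry-After` is an assumption, and `retry_delay` and `call_with_retry` are hypothetical helpers.

```python
import time
from typing import Callable, Optional, Tuple


def retry_delay(attempt: int, retry_after: Optional[str],
                base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's hint when it is a plain number of seconds.
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap_s)
    # Otherwise back off exponentially: base, 2*base, 4*base, ... up to cap.
    return min(base_s * (2 ** attempt), cap_s)


def call_with_retry(
    send: Callable[[], Tuple[int, Optional[str]]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-429 status or retries are exhausted.

    send() must return (status_code, retry_after_header_or_None).
    """
    attempt = 0
    while True:
        status, retry_after = send()
        if status != 429 or attempt >= max_retries:
            return status
        sleep(retry_delay(attempt, retry_after))
        attempt += 1


# Usage with canned responses standing in for real HTTP calls.
_replies = iter([(429, "1"), (429, None), (202, None)])
waits = []
final_status = call_with_retry(lambda: next(_replies), sleep=waits.append)
```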