scrape_

Documentation

REST API and MCP server reference

Authentication

Generate an API key from the Settings page. Pass it as a bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

REST API

GET /v1/health

Returns service status. No authentication required.

curl https://small-giraffe-45.convex.site/v1/health

POST /v1/scrape

Scrape one or more URLs. Sync mode (the default) takes exactly one URL and returns the result inline. Async mode accepts a batch of up to 5 URLs and returns a jobId to poll.

Request body

| Field | Type | Description |
| --- | --- | --- |
| urls | string[] | URLs to scrape (1–5). Sync mode requires exactly 1. |
| mode | string | sync (default) or async |
| format | string | markdown (default), html, or both |
| config | object | Optional config overrides (see below) |
| prompt | string | Optional LLM extraction prompt (max 10,000 chars). Response returned in the extract field. |
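
The field rules above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the service: build_scrape_payload is a hypothetical helper name, and it encodes only the limits stated in the table.

```python
MAX_URLS = 5               # "URLs to scrape (1-5)"
MAX_PROMPT_CHARS = 10_000  # "max 10,000 chars"


def build_scrape_payload(urls, mode="sync", fmt="markdown", prompt=None, config=None):
    """Return a JSON-serializable body for POST /v1/scrape, or raise ValueError."""
    urls = list(urls)
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"urls must contain between 1 and {MAX_URLS} entries")
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    if mode == "sync" and len(urls) != 1:
        raise ValueError("sync mode requires exactly 1 URL")
    if fmt not in ("markdown", "html", "both"):
        raise ValueError("format must be 'markdown', 'html', or 'both'")
    if prompt is not None and len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be at most {MAX_PROMPT_CHARS} characters")

    payload = {"urls": urls, "mode": mode, "format": fmt}
    # Assumption: optional fields are omitted rather than sent as null.
    if prompt is not None:
        payload["prompt"] = prompt
    if config is not None:
        payload["config"] = config
    return payload
```

Failing early on the client avoids spending a request on a body the table already rules out, such as two URLs in sync mode.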

Config options

| Option | Default | Description |
| --- | --- | --- |
| timeoutMs | 30000 | Request timeout (1–120,000 ms) |
| maxConcurrency | 3 | Parallel fetches per batch (1–5) |
| cacheTtlSeconds | 3600 | Cache TTL (60–3,600 seconds) |
| userAgentProfile | default | UA profile: default, mobile, or tablet |
| jitterMs | { min: 500, max: 2000 } | Random delay range between requests (0–60,000 ms) |
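
The documented ranges can be checked locally before a config object is sent. This Python sketch is illustrative only: config_problems is a hypothetical helper, and two of its rules are assumptions not stated above (values must be integers, and jitterMs.min must not exceed jitterMs.max).

```python
# Ranges copied from the config table; integer-only values are an assumption.
INT_RANGES = {
    "timeoutMs": (1, 120_000),
    "maxConcurrency": (1, 5),
    "cacheTtlSeconds": (60, 3_600),
}
UA_PROFILES = {"default", "mobile", "tablet"}
JITTER_RANGE = (0, 60_000)


def _is_int_in(value, low, high):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def config_problems(config):
    """Return a list of problems; an empty list means the config looks valid."""
    problems = []
    for key, value in config.items():
        if key in INT_RANGES:
            low, high = INT_RANGES[key]
            if not _is_int_in(value, low, high):
                problems.append(f"{key} must be an integer from {low} to {high}")
        elif key == "userAgentProfile":
            if not isinstance(value, str) or value not in UA_PROFILES:
                problems.append("userAgentProfile must be default, mobile, or tablet")
        elif key == "jitterMs":
            low, high = JITTER_RANGE
            jmin = value.get("min") if isinstance(value, dict) else None
            jmax = value.get("max") if isinstance(value, dict) else None
            if not (_is_int_in(jmin, low, high) and _is_int_in(jmax, low, high)):
                problems.append(f"jitterMs.min and jitterMs.max must be integers from {low} to {high}")
            elif jmin > jmax:  # assumption: min must not exceed max
                problems.append("jitterMs.min must not exceed jitterMs.max")
        else:
            problems.append(f"unknown config option: {key}")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every out-of-range option at once.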

Sync example

curl -X POST https://small-giraffe-45.convex.site/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "format": "markdown"
  }'
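
For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a sketch under stated assumptions: make_scrape_request and send are hypothetical helper names, and the base URL is the host used in the examples on this page.

```python
import json
import urllib.request

BASE_URL = "https://small-giraffe-45.convex.site"  # host used in the curl examples


def make_scrape_request(api_key, payload, base_url=BASE_URL):
    """Build (but do not send) the POST /v1/scrape request."""
    return urllib.request.Request(
        f"{base_url}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout_s=60):
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Calling send(make_scrape_request("YOUR_API_KEY", {"urls": ["https://example.com"], "format": "markdown"})) would issue the same request as the curl command above. Building and sending are kept separate so the request can be inspected without touching the network.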

Sync response

{
  "requestId": "abc123",
  "mode": "sync",
  "result": {
    "url": "https://example.com",
    "status": "success",
    "cached": false,
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "requestedUrl": "https://example.com",
      "finalUrl": "https://example.com/",
      "title": "Example Domain",
      "httpStatus": 200,
      "fetchDurationMs": 342,
      "fetchedAt": 1710800000000,
      "expiresAt": 1710803600000
    }
  }
}
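
A short Python sketch of reading this response follows. It assumes that fetchedAt and expiresAt are Unix epoch timestamps in milliseconds, which the magnitudes suggest but the reference does not state. summarize_sync_response is a hypothetical helper, and SAMPLE is an abbreviated copy of the response above.

```python
from datetime import datetime, timezone

# Abbreviated copy of the sync response shown above.
SAMPLE = {
    "requestId": "abc123",
    "mode": "sync",
    "result": {
        "url": "https://example.com",
        "status": "success",
        "cached": False,
        "metadata": {
            "requestedUrl": "https://example.com",
            "finalUrl": "https://example.com/",
            "httpStatus": 200,
            "fetchedAt": 1710800000000,
            "expiresAt": 1710803600000,
        },
    },
}


def summarize_sync_response(body):
    """Pull commonly needed fields out of a parsed sync response (a dict)."""
    result = body["result"]
    meta = result["metadata"]
    # Assumption: timestamps are Unix epoch milliseconds.
    fetched_at = datetime.fromtimestamp(meta["fetchedAt"] / 1000, tz=timezone.utc)
    ttl_seconds = (meta["expiresAt"] - meta["fetchedAt"]) // 1000
    return {
        "ok": result["status"] == "success",
        "url_changed": meta["finalUrl"] != meta["requestedUrl"],
        "fetched_at": fetched_at.isoformat(),
        "ttl_seconds": ttl_seconds,
    }
```

For the sample, the difference between expiresAt and fetchedAt is 3,600,000 ms, or 3,600 seconds, which matches the default cacheTtlSeconds.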

Async example (batch)

curl -X POST https://small-giraffe-45.convex.site/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://example.org"],
    "mode": "async"
  }'

# Response:
# { "requestId": "abc123", "mode": "async", "jobId": "job_456" }

LLM extraction example

curl -X POST https://small-giraffe-45.convex.site/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract all pricing tiers as JSON with name, price, and features[]"
  }'

GET /v1/jobs/:jobId

Poll the status of an async job.

curl https://small-giraffe-45.convex.site/v1/jobs/job_456 \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response:
# {
#   "jobId": "job_456",
#   "status": "completed",
#   "totalCount": 2,
#   "completedCount": 2,
#   "successCount": 2,
#   "failureCount": 0,
#   "expiresAt": 1710803600000
# }

Job statuses: queued, running, completed, failed, partial.
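
Polling can be wrapped in a small loop. The Python sketch below is one possible approach, not a prescribed client: it assumes completed, failed, and partial are terminal while queued and running are not, and the interval and poll limit are arbitrary illustrative defaults. fetch_job and wait_for_job are hypothetical names.

```python
import json
import time
import urllib.request

BASE_URL = "https://small-giraffe-45.convex.site"  # host used in the curl examples
# Assumption: these three statuses are final; queued and running are not.
TERMINAL_STATUSES = {"completed", "failed", "partial"}


def fetch_job(job_id, api_key, base_url=BASE_URL):
    """GET /v1/jobs/:jobId and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(job_id, fetch, interval_s=2.0, max_polls=30, sleep=time.sleep):
    """Call fetch(job_id) until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

A call such as wait_for_job("job_456", lambda job_id: fetch_job(job_id, "YOUR_API_KEY")) would poll the endpoint above. The fetch and sleep parameters are injectable so the loop can be exercised without network access or real delays.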

GET /v1/jobs/:jobId/results

Fetch results for a completed (or partial) async job.

curl https://small-giraffe-45.convex.site/v1/jobs/job_456/results \
  -H "Authorization: Bearer YOUR_API_KEY"

Error codes

When a scrape fails, the error field contains a code and message:

| Code | Description |
| --- | --- |
| INVALID_URL | URL is malformed |
| UNSUPPORTED_SCHEME | Only http/https URLs are supported |
| PRIVATE_NETWORK_TARGET | URL resolves to a private/internal IP |
| TIMEOUT | Request exceeded timeout |
| FETCH_FAILED | Network error or non-2xx response |
| NON_HTML_RESPONSE | Response content-type is not HTML |
| TOO_MANY_URLS | Exceeded max 5 URLs per request |
| UNAUTHORIZED | Missing or invalid API key |
| USER_DISABLED | Account is disabled |
| RATE_LIMITED | Too many requests |
| NOT_FOUND | Job or resource not found |
| INTERNAL_ERROR | Unexpected server error |
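
A client may want to separate failures worth retrying from those that will fail again unchanged. The grouping in this Python sketch is a client-side judgment call, not something this reference specifies. It also assumes the error object has the keys code and message. should_retry and split_failures are hypothetical names.

```python
# Assumption: these codes may succeed on a later attempt. FETCH_FAILED also
# covers non-2xx responses, so retrying it will not always help.
POSSIBLY_TRANSIENT = {"TIMEOUT", "FETCH_FAILED", "RATE_LIMITED", "INTERNAL_ERROR"}


def should_retry(error):
    """Return True if the error code is in the possibly-transient group.

    error is assumed to look like {"code": "...", "message": "..."}.
    Unknown or missing codes are treated as not retryable.
    """
    return error.get("code") in POSSIBLY_TRANSIENT


def split_failures(errors):
    """Partition a list of error objects into (retryable, permanent)."""
    retryable = [e for e in errors if should_retry(e)]
    permanent = [e for e in errors if not should_retry(e)]
    return retryable, permanent
```

Codes such as INVALID_URL or UNAUTHORIZED describe the request itself, so repeating the same request is unlikely to change the outcome.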

MCP Server

scrape_ exposes an MCP server so AI agents can scrape pages directly. Unlike the REST API, which uses an API key, the MCP server authenticates with OAuth 2.0; MCP clients handle the flow automatically.

scrape_url tool

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | URL to scrape |
| format | No | markdown (default), html, or both |
| timeout_ms | No | Timeout in ms (default 30000, max 120000) |
| user_agent_profile | No | default, mobile, or tablet |
| prompt | No | LLM extraction prompt. Response returned as JSON in the extract field. |

Setup

Add the following to your MCP client configuration. Replace the URL with your Convex site URL.

Claude Desktop / Claude Code

{
  "mcpServers": {
    "scrape": {
      "type": "http",
      "url": "https://small-giraffe-45.convex.site/mcp"
    }
  }
}

Cursor

{
  "mcpServers": {
    "scrape": {
      "url": "https://small-giraffe-45.convex.site/mcp"
    }
  }
}

On first use, your MCP client will open a browser window to authenticate via OAuth. After granting access, the token is cached and subsequent requests are automatic.
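
If the configuration is generated rather than edited by hand, both snippets above can be produced from a site URL. This Python sketch simply mirrors the two examples; mcp_client_config is a hypothetical helper, and where each client stores its configuration file is outside its scope.

```python
import json


def mcp_client_config(site_url, client="claude"):
    """Return the mcpServers snippet for the given client as a dict."""
    server = {"url": f"{site_url.rstrip('/')}/mcp"}
    if client == "claude":
        # The Claude Desktop / Claude Code example includes a "type" field.
        server = {"type": "http", **server}
    elif client != "cursor":
        raise ValueError("client must be 'claude' or 'cursor'")
    return {"mcpServers": {"scrape": server}}


def render(config):
    """Serialize the config with the two-space indentation used on this page."""
    return json.dumps(config, indent=2)
```

For example, render(mcp_client_config("https://small-giraffe-45.convex.site", "cursor")) yields the Cursor snippet shown above.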

Questions?

Reach out at support@zaks.io.