GET /api/robots-parse
Robots Parse
Fetch and parse robots.txt for any site. Accepts a full URL or domain, resolves robots.txt from the site's origin, and returns crawler rules, sitemap references, and user-agent groups as clean JSON. Results are cached for 60 seconds.
Query parameters
url (required): A site URL to inspect. If a path is provided, TinyUtils will fetch robots.txt from the site's origin.
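For example, passing a page URL with a path still resolves robots.txt from the origin. A minimal sketch using fetch (the page URL below is illustrative, not from the examples); note that the query value is URL-encoded:

// Hypothetical page URL; robots.txt is resolved from https://example.com/robots.txt
const pageUrl = "https://example.com/blog/some-post";
const res = await fetch(
  "https://tinyutils.dev/api/robots-parse?url=" + encodeURIComponent(pageUrl)
);
const data = await res.json();
console.log(data.robots_url); // "https://example.com/robots.txt"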
Example request
curl
curl "https://tinyutils.dev/api/robots-parse?url=https://example.com"
JavaScript (fetch)
const res = await fetch(
  "https://tinyutils.dev/api/robots-parse?url=https://example.com"
);
const data = await res.json();
Example response
{
"ok": true,
"input_url": "https://example.com",
"robots_url": "https://example.com/robots.txt",
"found": true,
"status": 200,
"content_type": "text/plain",
"sitemaps": [
"https://example.com/sitemap.xml"
],
"groups": [
{
"user_agents": ["*"],
"allow": ["/public/"],
"disallow": ["/admin/"],
"crawl_delay": null,
"host": null
}
],
"meta": {
"responseTimeMs": 74,
"cached": false,
"rateLimitedScope": "global"
},
"error": null
}

Not found response
{
"ok": true,
"input_url": "https://example.com",
"robots_url": "https://example.com/robots.txt",
"found": false,
"status": 404,
"content_type": null,
"sitemaps": [],
"groups": [],
"meta": {
"responseTimeMs": 61,
"cached": false,
"rateLimitedScope": "global"
},
"error": null
}

Error response
{
"ok": false,
"input_url": "not-a-url",
"robots_url": "",
"found": false,
"status": null,
"content_type": null,
"sitemaps": [],
"groups": [],
"error": "INVALID_URL",
"meta": {
"responseTimeMs": 12,
"cached": false,
"rateLimitedScope": "global"
}
}

Error codes
INVALID_URL: The URL is missing or malformed.
BLOCKED_HOST: The hostname is private, reserved, or internal.
TIMEOUT: The upstream request timed out.
NETWORK_ERROR: The upstream request failed.
RATE_LIMITED: The shared request limit was exceeded.
INTERNAL_ERROR: An unexpected server error occurred.
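In practice, a client can branch on ok, found, and error to cover the three response shapes shown above. A minimal sketch using the field names as documented; the prefix-matching check at the end is a simplification, since full robots.txt matching also handles wildcards and longest-match precedence:

// Sketch: handle success, not-found, and error responses.
async function parseRobots(siteUrl) {
  const res = await fetch(
    "https://tinyutils.dev/api/robots-parse?url=" + encodeURIComponent(siteUrl)
  );
  const data = await res.json();
  if (!data.ok) {
    // Error response: `error` holds one of the codes listed above.
    throw new Error("robots-parse failed: " + data.error);
  }
  if (!data.found) {
    // No robots.txt at the origin; by convention nothing is disallowed.
    return { sitemaps: [], groups: [] };
  }
  return { sitemaps: data.sitemaps, groups: data.groups };
}

// Example: check whether "/admin/" is disallowed for the default ("*") group.
const { groups } = await parseRobots("https://example.com");
const star = groups.find((g) => g.user_agents.includes("*"));
const blocked = star
  ? star.disallow.some((rule) => "/admin/".startsWith(rule))
  : false;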
Rate limiting
Requests are rate-limited via a global pool shared with url-check, dns-lookup, url-resolve, http-headers, and ssl-check. Results are cached for 60 seconds to reduce duplicate requests. When the limit is exceeded, the endpoint returns HTTP 429 with RATE_LIMITED.
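If you share the pool across several of these endpoints, it can be worth retrying after a short delay when a 429 comes back. A minimal sketch; the delay values are an assumption, not documented behavior:

// Sketch: retry with a simple linear backoff when the shared limit is hit.
async function fetchJsonWithRetry(url, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res.json();
    // Waiting 1s, 2s, ... between attempts is an assumed choice.
    await new Promise((resolve) => setTimeout(resolve, 1000 * (attempt + 1)));
  }
  throw new Error("RATE_LIMITED: retries exhausted");
}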
See also
Looking for use-case guidance? The Robots.txt Parser API guide walks through common inspection scenarios and integration patterns.