GET /api/robots-parse
Robots Parse
Fetch and parse robots.txt for any site. Accepts a full URL or domain, resolves robots.txt from the site's origin, and returns crawler rules, sitemap references, and user-agent groups as clean JSON. Results are cached for 60 seconds.
Query parameters
url (required): A site URL to inspect. If a path is provided, TinyUtils will fetch robots.txt from the site's origin.
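For example, passing a page URL with a path still resolves robots.txt from the origin. A minimal sketch using fetch (the page URL below is illustrative, not from the examples); note that the query value is URL-encoded:

// Hypothetical page URL; robots.txt is resolved from https://example.com/robots.txt
const pageUrl = "https://example.com/blog/some-post";
const res = await fetch(
  "https://tinyutils.dev/api/robots-parse?url=" + encodeURIComponent(pageUrl)
);
const data = await res.json();
console.log(data.robots_url); // "https://example.com/robots.txt"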
Example request
curl
curl "https://tinyutils.dev/api/robots-parse?url=https://example.com"
JavaScript (fetch)
const res = await fetch(
  "https://tinyutils.dev/api/robots-parse?url=https://example.com"
);
const data = await res.json();
Example response
{
"ok": true,
"input_url": "https://example.com",
"robots_url": "https://example.com/robots.txt",
"found": true,
"status": 200,
"content_type": "text/plain",
"sitemaps": [
"https://example.com/sitemap.xml"
],
"groups": [
{
"user_agents": ["*"],
"allow": ["/public/"],
"disallow": ["/admin/"],
"crawl_delay": null,
"host": null
}
],
"meta": {
"responseTimeMs": 74,
"cached": false,
"rateLimitedScope": "global"
},
"error": null
}

Not found response
{
"ok": true,
"input_url": "https://example.com",
"robots_url": "https://example.com/robots.txt",
"found": false,
"status": 404,
"content_type": null,
"sitemaps": [],
"groups": [],
"meta": {
"responseTimeMs": 61,
"cached": false,
"rateLimitedScope": "global"
},
"error": null
}

Error response
{
"ok": false,
"input_url": "not-a-url",
"robots_url": "",
"found": false,
"status": null,
"content_type": null,
"sitemaps": [],
"groups": [],
"error": "INVALID_URL",
"meta": {
"responseTimeMs": 12,
"cached": false,
"rateLimitedScope": "global"
}
}

Error codes
INVALID_URL: The URL is missing or malformed.
BLOCKED_HOST: The hostname is private, reserved, or internal.
TIMEOUT: The upstream request timed out.
NETWORK_ERROR: The upstream request failed.
RATE_LIMITED: The shared request limit was exceeded.
INTERNAL_ERROR: An unexpected server error occurred.
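In practice, a client can branch on ok, found, and error to cover the three response shapes shown above. A minimal sketch using the field names as documented; the prefix-matching check at the end is a simplification, since full robots.txt matching also handles wildcards and longest-match precedence:

// Sketch: handle success, not-found, and error responses.
async function parseRobots(siteUrl) {
  const res = await fetch(
    "https://tinyutils.dev/api/robots-parse?url=" + encodeURIComponent(siteUrl)
  );
  const data = await res.json();
  if (!data.ok) {
    // Error response: `error` holds one of the codes listed above.
    throw new Error("robots-parse failed: " + data.error);
  }
  if (!data.found) {
    // No robots.txt at the origin; by convention nothing is disallowed.
    return { sitemaps: [], groups: [] };
  }
  return { sitemaps: data.sitemaps, groups: data.groups };
}

// Example: check whether "/admin/" is disallowed for the default ("*") group.
const { groups } = await parseRobots("https://example.com");
const star = groups.find((g) => g.user_agents.includes("*"));
const blocked = star
  ? star.disallow.some((rule) => "/admin/".startsWith(rule))
  : false;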
Rate limiting
Requests are rate-limited via a global pool shared with url-check, dns-lookup, url-resolve, http-headers, and ssl-check. Results are cached for 60 seconds to reduce duplicate requests. When the limit is exceeded, the endpoint returns HTTP 429 with RATE_LIMITED.
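If you share the pool across several of these endpoints, it can be worth retrying after a short delay when a 429 comes back. A minimal sketch; the delay values are an assumption, not documented behavior:

// Sketch: retry with a simple linear backoff when the shared limit is hit.
async function fetchJsonWithRetry(url, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res.json();
    // Waiting 1s, 2s, ... between attempts is an assumed choice.
    await new Promise((resolve) => setTimeout(resolve, 1000 * (attempt + 1)));
  }
  throw new Error("RATE_LIMITED: retries exhausted");
}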
See also
Looking for use-case guidance? The Robots.txt Parser API guide walks through common inspection scenarios and integration patterns.