Robots Parse
Fetch and parse robots.txt for any site. Returns crawler rules, sitemap references, and user-agent groups as clean JSON.
GET /api/robots-parse
{
  "ok": true,
  "input_url": "https://example.com",
  "robots_url": "https://example.com/robots.txt",
  "found": true,
  "status": 200,
  "content_type": "text/plain",
  "sitemaps": [
    "https://example.com/sitemap.xml"
  ],
  "groups": [
    {
      "user_agents": ["*"],
      "allow": ["/public/"],
      "disallow": ["/admin/"],
      "crawl_delay": null,
      "host": null
    }
  ],
  "meta": {
    "responseTimeMs": 74,
    "cached": false,
    "rateLimitedScope": "global"
  },
  "error": null
}
What it returns
- ok — whether the request succeeded (see the handling sketch after this list)
- input_url — the original URL submitted
- robots_url — the robots.txt URL that was fetched
- found — whether robots.txt was found
- status — the HTTP status code of the robots.txt response
- content_type — the returned content type, if present
- sitemaps — sitemap URLs declared in robots.txt
- groups — parsed user-agent groups and their rules
- meta.responseTimeMs — total request time in milliseconds
- meta.cached — whether the result came from cache
- error — error code if the request failed
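The fields above suggest a simple branching order: check ok, then found, then read the parsed data. Here is a minimal handling sketch in JavaScript; the endpoint, query parameter, and field names come from this page, while the logging and branching choices are illustrative.

const res = await fetch("https://tinyutils.dev/api/robots-parse?url=https://example.com");
const data = await res.json();

if (!data.ok) {
  // error carries a code when the request itself failed
  console.error("Request failed:", data.error);
} else if (!data.found) {
  // the site responded, but no robots.txt was found
  console.log(`No robots.txt at ${data.robots_url} (status ${data.status})`);
} else {
  console.log(`Fetched ${data.robots_url} in ${data.meta.responseTimeMs} ms`);
  console.log("Sitemaps:", data.sitemaps);
}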
Use cases
- Check whether a site exposes robots.txt
- Inspect crawler allow/disallow rules
- Find sitemap declarations quickly
- Debug crawl issues during SEO work
- Validate robots.txt in scripts or CI (see the sketch after this list)
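A minimal sketch of the scripts/CI use case: a small Node check that fails the build when robots.txt is missing or blocks the whole site. It assumes Node 18+ (global fetch) run as an ES module; the specific rule it enforces (no blanket Disallow: / for the * group) is only an example policy, not something the API prescribes.

// check-robots.js: fail a CI step when robots.txt is missing or blocks the whole site.
// Assumes Node 18+ (global fetch) and ES modules; the blanket-Disallow rule is illustrative.
const target = process.argv[2] ?? "https://example.com";

const res = await fetch(
  `https://tinyutils.dev/api/robots-parse?url=${encodeURIComponent(target)}`
);
const data = await res.json();

if (!data.ok || !data.found) {
  console.error(`robots.txt missing or unreachable for ${target} (error: ${data.error})`);
  process.exit(1);
}

// Flag a wildcard group that disallows the entire site.
const wildcard = data.groups.find((g) => g.user_agents.includes("*"));
if (wildcard && wildcard.disallow.includes("/")) {
  console.error("robots.txt disallows the whole site for all crawlers");
  process.exit(1);
}

console.log(`robots.txt OK for ${target}`);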
Quick API examples
curl
curl "https://tinyutils.dev/api/robots-parse?url=https://example.com"
JavaScript (fetch)
const res = await fetch(
  "https://tinyutils.dev/api/robots-parse?url=https://example.com"
);
const data = await res.json();
console.log(data.sitemaps);
console.log(data.groups);
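As a follow-up, the parsed groups array can be walked directly. This sketch prints each group's rules and assumes data is the parsed response from the fetch call above; the output format is just one way to present it.

// Print each user-agent group's rules from the parsed response.
for (const group of data.groups) {
  console.log(`User agents: ${group.user_agents.join(", ")}`);
  console.log(`  allow:    ${group.allow.join(", ") || "(none)"}`);
  console.log(`  disallow: ${group.disallow.join(", ") || "(none)"}`);
  if (group.crawl_delay != null) {
    console.log(`  crawl-delay: ${group.crawl_delay}`);
  }
}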