Robots Parse
Fetch and parse robots.txt for any site. Returns crawler rules, sitemap references, and user-agent groups as clean JSON.
GET /api/robots-parse
{
  "ok": true,
  "input_url": "https://example.com",
  "robots_url": "https://example.com/robots.txt",
  "found": true,
  "status": 200,
  "content_type": "text/plain",
  "sitemaps": [
    "https://example.com/sitemap.xml"
  ],
  "groups": [
    {
      "user_agents": ["*"],
      "allow": ["/public/"],
      "disallow": ["/admin/"],
      "crawl_delay": null,
      "host": null
    }
  ],
  "meta": {
    "responseTimeMs": 74,
    "cached": false,
    "rateLimitedScope": "global"
  },
  "error": null
}
What it returns
- ok — whether the request succeeded (see the handling sketch after this list)
- input_url — the original URL submitted
- robots_url — the robots.txt URL that was fetched
- found — whether robots.txt was found
- status — the HTTP status code of the robots.txt response
- content_type — the returned content type, if present
- sitemaps — sitemap URLs declared in robots.txt
- groups — parsed user-agent groups and their rules
- meta.responseTimeMs — total request time in milliseconds
- meta.cached — whether the result came from cache
- error — error code if the request failed
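The fields above suggest a simple branching order: check ok, then found, then read the parsed data. Here is a minimal handling sketch in JavaScript; the endpoint, query parameter, and field names come from this page, while the logging and branching choices are illustrative.

const res = await fetch("https://tinyutils.dev/api/robots-parse?url=https://example.com");
const data = await res.json();

if (!data.ok) {
  // error carries a code when the request itself failed
  console.error("Request failed:", data.error);
} else if (!data.found) {
  // the site responded, but no robots.txt was found
  console.log(`No robots.txt at ${data.robots_url} (status ${data.status})`);
} else {
  console.log(`Fetched ${data.robots_url} in ${data.meta.responseTimeMs} ms`);
  console.log("Sitemaps:", data.sitemaps);
}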
Use cases
- Check whether a site exposes robots.txt
- Inspect crawler allow/disallow rules
- Find sitemap declarations quickly
- Debug crawl issues during SEO work
- Validate robots.txt in scripts or CI (see the sketch after this list)
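A minimal sketch of the scripts/CI use case: a small Node check that fails the build when robots.txt is missing or blocks the whole site. It assumes Node 18+ (global fetch) run as an ES module; the specific rule it enforces (no blanket Disallow: / for the * group) is only an example policy, not something the API prescribes.

// check-robots.js: fail a CI step when robots.txt is missing or blocks the whole site.
// Assumes Node 18+ (global fetch) and ES modules; the blanket-Disallow rule is illustrative.
const target = process.argv[2] ?? "https://example.com";

const res = await fetch(
  `https://tinyutils.dev/api/robots-parse?url=${encodeURIComponent(target)}`
);
const data = await res.json();

if (!data.ok || !data.found) {
  console.error(`robots.txt missing or unreachable for ${target} (error: ${data.error})`);
  process.exit(1);
}

// Flag a wildcard group that disallows the entire site.
const wildcard = data.groups.find((g) => g.user_agents.includes("*"));
if (wildcard && wildcard.disallow.includes("/")) {
  console.error("robots.txt disallows the whole site for all crawlers");
  process.exit(1);
}

console.log(`robots.txt OK for ${target}`);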
Quick API examples
curl
curl "https://tinyutils.dev/api/robots-parse?url=https://example.com"
JavaScript (fetch)
const res = await fetch(
  "https://tinyutils.dev/api/robots-parse?url=https://example.com"
);
const data = await res.json();
console.log(data.sitemaps);
console.log(data.groups);
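As a follow-up, the parsed groups array can be walked directly. This sketch prints each group's rules and assumes data is the parsed response from the fetch call above; the output format is just one way to present it.

// Print each user-agent group's rules from the parsed response.
for (const group of data.groups) {
  console.log(`User agents: ${group.user_agents.join(", ")}`);
  console.log(`  allow:    ${group.allow.join(", ") || "(none)"}`);
  console.log(`  disallow: ${group.disallow.join(", ") || "(none)"}`);
  if (group.crawl_delay != null) {
    console.log(`  crawl-delay: ${group.crawl_delay}`);
  }
}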