About Cmdsbot
Cmdsbot is the official crawler for Cmds Search. It helps us discover and index public webpages so they can appear in our search results. Our crawlers follow standard protocols to ensure site owners control what does and doesn’t appear in search results.
Crawlers
- cmdsearchbot/1.0 — Primary crawler for fetching and indexing page content.
- cmdsearchbot-image/1.0 — Fetches images to collect metadata only.
- cmdsearchbot-verify/1.0 — Fetches verification tokens for site ownership checks in Search Console.
Each crawler identifies itself clearly in the User-Agent string and links back here.
These are the only Cmds Search crawlers currently in use. While Cmds does operate non-Search bots, any unlisted, unverified, or modified user agent claiming to be Cmdsbot should be treated with caution.
Robots.txt
Cmdsbot obeys all robots.txt directives for its specific user agent.
You can control access and discovery with standard directives such as User-agent, Allow, Disallow, Sitemap, and Crawl-delay.
Cmdsbot caches the robots.txt file for 7 days and refreshes it automatically once the cache expires. Because of this caching, changes to robots.txt may take up to 7 days to take effect.
Example:
User-agent: cmdsearchbot
Disallow: /private/
Crawl-delay: 2
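Rules like the example above can be checked programmatically. Here is a minimal sketch using Python's standard urllib.robotparser; the rules are the example above, and the page URLs are illustrative:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt rules from above, supplied as individual lines.
rules = [
    "User-agent: cmdsearchbot",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

rp = RobotFileParser()
rp.parse(rules)

# /private/ is disallowed for cmdsearchbot; other paths are allowed.
print(rp.can_fetch("cmdsearchbot", "https://example.com/private/page"))  # False
print(rp.can_fetch("cmdsearchbot", "https://example.com/blog/post"))     # True
print(rp.crawl_delay("cmdsearchbot"))                                    # 2
```

In practice you would point `RobotFileParser` at your live robots.txt with `set_url()` and `read()` instead of parsing inline lines.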
A robots refresh can be triggered anytime from Search Console.
Page Meta
We honor page-level robots rules, including:
- nocrawl - Tells our crawler not to fetch the page at all.
- noindex - Allows crawling, but the page won't appear in search results. We still fetch it to see links, but it won't be listed.
- none - Alias for both of the above directives.
Example:
<meta name="robots" content="noindex">
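A crawler reads these page-level rules straight from the HTML. The sketch below shows one way to extract robots meta directives using Python's standard html.parser; the parser class and page snippet are illustrative, not Cmdsbot's actual implementation:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives.update(d.strip().lower() for d in content.split(","))

page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
parser = RobotsMetaParser()
parser.feed(page)

# Per the directive list above, "none" is an alias for nocrawl + noindex.
if "none" in parser.directives:
    parser.directives.update({"nocrawl", "noindex"})

print("noindex" in parser.directives)  # True
```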
Crawl Behavior
- Cmdsbot waits ~5 seconds between requests (unless manually triggered).
- Retries up to 3 times on transient errors (500, 502, 504).
- Does not attempt unsupported protocols (only HTTP/1.1 and HTTP/2).
- Accepts encodings: gzip, deflate, and br.
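The retry rule above amounts to re-attempting a fetch a few times when the server returns a transient 5xx status. A minimal sketch, where `fetch_page`, its return shape, and the delay handling are hypothetical and not Cmdsbot's actual implementation:

```python
import time

TRANSIENT = {500, 502, 504}  # transient statuses per the list above
MAX_RETRIES = 3

def fetch_with_retries(fetch_page, url, delay=0):
    """Call fetch_page(url) -> (status, body), retrying on transient errors.

    Makes one initial attempt plus up to MAX_RETRIES retries.
    """
    status, body = fetch_page(url)
    for _ in range(MAX_RETRIES):
        if status not in TRANSIENT:
            break
        time.sleep(delay)  # back off before retrying
        status, body = fetch_page(url)
    return status, body
```

For example, a page that returns 502 twice and then 200 would be fetched three times in total and end with a successful response.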
If our bot is seeing your domain for the first time, or its cached resources are due for a refresh, Cmdsbot may probe for well-known resources that may not exist, such as robots.txt, sitemap.xml, and favicon.ico.
Please note, Cmdsbot may still fetch a page to check in-page robots directives, even if those directives later prevent the page from being indexed.
Identify Cmdsbot
Our crawlers use a clear User-Agent string including cmdsearchbot and a link back to this page, such as:
Mozilla/5.0 (...) Safari/537.36 (compatible; cmdsearchbot/1.0; +https://search.cmds.media/bot)
You can also verify whether a request originates from an official Cmds crawler using our verification API:
Endpoint:
GET /verify-bot?ip=
If no IP is provided, the requester’s IP is checked.
Rate limit: 15 requests per 30 seconds.
Example response:
{
"bot": null,
"ip": "8.8.8.8",
"note": "Unverified or non-Cmds crawler",
"verified": false
}
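To query the verification API, append the IP you want checked to the endpoint and parse the JSON response. The sketch below uses only Python's standard library; the base URL is an assumption inferred from this page's address, and the live request is commented out so the example also works offline (mind the 15 requests per 30 seconds rate limit):

```python
import json
from urllib.parse import urlencode

# Assumed base URL, inferred from the bot page address above.
BASE = "https://search.cmds.media"

def verify_url(ip=None):
    """Build the /verify-bot URL; with no IP, the API checks the requester's IP."""
    query = "?" + urlencode({"ip": ip}) if ip else ""
    return f"{BASE}/verify-bot{query}"

url = verify_url("8.8.8.8")
# A real check would fetch it, e.g.:
#   import urllib.request
#   payload = json.loads(urllib.request.urlopen(url).read())

# Parsing the example response shown above:
payload = json.loads('{"bot": null, "ip": "8.8.8.8", '
                     '"note": "Unverified or non-Cmds crawler", "verified": false}')
if not payload["verified"]:
    print(f"{payload['ip']}: not an official Cmds crawler")
```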
Contact
If you have any questions or want to report crawler behavior, please reach out at support@cmds.media.