httpsexamplecomrobotstxt
httpsexamplecomrobotstxt is the name used to refer to the robots.txt file at the root of the domain example.com, accessed over HTTPS. A robots.txt file is a plain text file that gives instructions to web crawlers and is part of the Robots Exclusion Protocol. Its purpose is to tell compliant bots which parts of a site they may crawl. It governs crawling rather than indexing, so a URL blocked in robots.txt can still appear in search results if other pages link to it.
The file must be located at the site root, for example at https://example.com/robots.txt, and is read by
Importantly, robots.txt is a voluntary convention and does not prevent access by users or by crawlers that
Example content typically found in a robots.txt file:
Sitemap: https://example.com/sitemap.xml
This sample illustrates a common pattern, where general directives apply to all crawlers, with specific blocks
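As a rough illustration of how a compliant crawler might interpret such a file, the sketch below uses Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT`, the crawler names `SomeCrawler` and `ExampleBot`, and the `/private/` path are all hypothetical and invented for this example; only the Sitemap line comes from the sample above. It is a minimal sketch, not a description of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# One general group applies to all crawlers; one group targets a
# specific (made-up) crawler named ExampleBot.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A crawler with no group of its own falls back to the "*" group.
generic_public = parser.can_fetch("SomeCrawler", "https://example.com/index.html")
generic_private = parser.can_fetch("SomeCrawler", "https://example.com/private/page.html")

# ExampleBot has its own group, which disallows the whole site.
example_bot_public = parser.can_fetch("ExampleBot", "https://example.com/index.html")

# Sitemap lines are independent of any user-agent group (Python 3.8+).
sitemaps = parser.site_maps()

print(generic_public, generic_private, example_bot_public, sitemaps)
```

Here the parser only reports what the rules permit; nothing in the file enforces them, so a crawler has to choose to call something like `can_fetch` and respect the answer. To check a live site rather than an in-memory string, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the file over the network.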
---