robots.txt Settings
The `robots.txt` file is a simple text file used by website administrators to instruct web crawlers, such as search engine bots, how to access and index pages on their website. It is placed in the root directory of a domain (e.g., `https://example.com/robots.txt`) and follows a standardized syntax defined by the [Robots Exclusion Protocol](https://www.robotstxt.org/). While it does not enforce restrictions—bots may ignore or override these rules—it serves as a guideline for proper crawling behavior.
A basic `robots.txt` file consists of directives that target specific user agents (e.g., `Googlebot`, `Bingbot`) and the paths those agents may or may not crawl; a combined example follows the list:
- `User-agent: *` – Applies rules to all bots.
- `Disallow: /` – Asks bots not to crawl any page on the site.
- `Allow: /public/` – Permits access to a specific directory.
- `Sitemap: https://example.com/sitemap.xml` – Specifies a sitemap location for bots to discover.
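Taken together, these directives might be combined in a file such as the following sketch; the paths and sitemap URL are placeholders, and the `Allow` rule is listed before the broad `Disallow` so that `/public/` remains crawlable under either first-match or longest-match interpretation:

```text
User-agent: *
Allow: /public/
Disallow: /

Sitemap: https://example.com/sitemap.xml
```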
Comments in the file begin with a `#` and are ignored by crawlers. For example:
`# Block search engines from indexing admin pages`
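In practice, such a comment would sit directly above the rule it documents; the `/admin/` path below is purely illustrative:

```text
# Block search engines from indexing admin pages
User-agent: *
Disallow: /admin/
```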
While `robots.txt` is useful for directing bots, it should not be relied upon to protect sensitive content: the file is publicly readable, and compliance with its rules is entirely voluntary. Pages that must stay private should be protected with authentication or server-side access controls instead.
Testing and validating a `robots.txt` file can be done with tools such as Google's [Robots Testing Tool](https://www.google.com/webmasters/tools/robots-testing-tool), or programmatically, as sketched below.
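As a minimal sketch of programmatic validation, Python's standard-library `urllib.robotparser` module can parse a set of rules and report whether a given user agent may fetch a URL. The rules, user agent, and URLs below are illustrative only:

```python
from urllib import robotparser

# Illustrative rules mirroring the directives described above
rules = """
User-agent: *
Allow: /public/
Disallow: /admin/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) reports whether the parsed rules permit crawling the URL
print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings.php")) # False
```

The same parser can also fetch a live file with `set_url()` followed by `read()`, which is convenient when checking rules on a deployed site rather than a local draft.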