Robots.txt
Robots.txt is a plain text file at the root of your website that tells search engine crawlers which pages they should and shouldn't access. It controls crawler access — not whether a page appears in the index.
Robots.txt uses a simple allow/disallow syntax to specify which crawlers can access which URL paths. A disallow rule for /admin/ tells Googlebot not to crawl any URLs under that path. The file is publicly accessible — any crawler can read it, and humans can too, which is why it shouldn't be used to protect sensitive information.
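You can check how a given robots.txt file resolves for a specific crawler with Python's standard-library `urllib.robotparser`. A minimal sketch — the domain, paths, and rules below are illustrative, not from any real site:

```python
from urllib import robotparser

# Illustrative robots.txt: block the admin area and internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow rules match by URL-path prefix, so everything under /admin/ is blocked.
print(parser.can_fetch("Googlebot", "https://example.com/admin/users"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

The same check works against a live file by calling `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()`.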
The critical distinction: robots.txt controls crawling, not indexing. If a page is disallowed in robots.txt, Googlebot won't crawl it — but if other sites link to that page, Google can still index it as an uncrawled URL, displaying a result with no description. To actually remove a page from the index, you need a noindex tag (which requires the page to be crawlable) or a URL removal request in Search Console. Disallowing a page in robots.txt while expecting it to disappear from search results is a common and consequential mistake.
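For reference, the noindex directive lives in the page itself, not in robots.txt. It can be placed in the HTML head:

```html
<!-- Tells crawlers to drop this page from the index once it has been crawled.
     Only works if robots.txt permits crawling the page in the first place. -->
<meta name="robots" content="noindex">
```

The equivalent for non-HTML resources is the HTTP response header `X-Robots-Tag: noindex`. In both cases the directive takes effect only when the crawler can fetch the page, which is exactly why combining noindex with a robots.txt disallow on the same URL defeats the purpose.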
The most damaging robots.txt error is accidentally disallowing your entire site — often caused by a misplaced slash in the Disallow field. Another common mistake: blocking JS or CSS files, which prevents Google from rendering your pages correctly. If Googlebot can't load your CSS and JavaScript, it can't understand your page's visual layout, which affects ranking quality assessments.
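The difference between blocking one directory and blocking everything is a single character. An illustrative contrast:

```
# Blocks only URLs under /admin/ — the usual intent:
User-agent: *
Disallow: /admin/

# Blocks the ENTIRE site — one stray "/" in the Disallow field:
User-agent: *
Disallow: /

# Blocks nothing — an empty Disallow value allows all crawling:
User-agent: *
Disallow:
```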
For most content sites, robots.txt changes are infrequent — you set it once to block admin areas, staging paths, and internal search results, then rarely touch it. Treat it as infrastructure, not a tool for regular SEO work.
Why it matters:

- Prevents Googlebot from crawling admin panels, staging environments, and internal tools — keeping the crawl focused on content that belongs in search results
- Blocking JS and CSS files in robots.txt breaks Google's page rendering — an audit of these rules is worth running on any site experiencing unexplained traffic drops that technical issues might explain
- Does not prevent indexing of disallowed pages — understanding this limitation prevents a common mistake where teams disallow pages expecting them to disappear from search, then discover they're still indexed
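The JS/CSS audit mentioned above can be scripted with the same standard-library parser. A sketch under illustrative assumptions — the robots.txt content and asset URLs are hypothetical stand-ins for your own:

```python
from urllib import robotparser

# Hypothetical robots.txt that mistakenly blocks asset directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Representative asset URLs to audit — swap in real paths from your pages.
assets = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/main.css",
    "https://example.com/images/logo.png",
]

# Flag anything Googlebot is barred from fetching; blocked CSS/JS means
# Google cannot fully render the pages that depend on these files.
blocked = [url for url in assets if not parser.can_fetch("Googlebot", url)]
for url in blocked:
    print(f"BLOCKED for Googlebot: {url}")
```

Running this against a list of assets pulled from your templates turns the "are we blocking our own rendering?" question into a one-minute check.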
