Online Documentation and SEO. Part 7 - Robots.txt

Posted in SEO on 9/11/2013. 3 min read.

Google - Robots.txt specifications

Typically, each web site has directories and pages that should not be indexed by search engines: for example, printer-friendly versions of pages, or pages of the security system (registration, authentication). There may also be directories such as an administrator resources folder or various technical folders.

In addition, webmasters may want to give search engines extra information about indexing, such as the location of the sitemap.xml file.

Google Webmaster Tools - Blocked URLs

All these tasks are handled by the robots.txt file. It is a plain text file in a specific format that you place in the root directory of your web site, so that web crawlers know how to index the site contents properly. The full specification of the file format can be found on the Google Developers portal. Google Webmaster Tools also provides a robots.txt analysis tool, in the Crawl - Blocked URLs section, so you can make sure you created the file correctly. So, if you see strange pages of your web site in search results, or marked as indexed in the webmaster tools of the search engine you use, you can usually fix this by creating a robots.txt file. Many online tools can help you generate the file contents.
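To make the file format more concrete, here is a minimal sketch in Python. It embeds a small robots.txt as a string and checks it with the standard library's urllib.robotparser module. The paths, the www.example.com domain, and the sitemap URL are hypothetical and only illustrate the idea; your own site will need different rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and the sitemap URL are examples only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login
Disallow: /print/
Sitemap: https://www.example.com/sitemap.xml
"""

BASE = "https://www.example.com"  # assumed site address


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask whether a crawler that honors the file may fetch each page.
for path in ("/docs/getting-started", "/admin/settings", "/login", "/print/page1"):
    allowed = rules.can_fetch("*", BASE + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")

# Sitemap locations declared in the file (site_maps() requires Python 3.8+).
print(rules.site_maps())
```

The `User-agent: *` line addresses all crawlers, each `Disallow` line names a path prefix they are asked to skip, and the `Sitemap` line points to the sitemap.xml file mentioned earlier.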

Please note: the robots.txt file is not a way to protect sensitive pages. It is just a set of directions that web crawlers may follow so that they index useful content rather than service pages. The pages you restrict in the file remain accessible by direct URLs, but crawlers that honor the file will not crawl them.

Robots.txt and Online Documentation

When it comes to online documentation tools, or tools that simply publish documentation on the web, there are also pages you would rather not have indexed, so that readers are not confused by landing on a page that was never meant to be reached directly from search results. These may be login pages, error pages, printer-friendly versions of documents, document property pages, user profiles, etc.

Most documentation tools that generate online versions of documentation also generate a number of service pages. So, when publishing documentation online, technical writers benefit from understanding the contents of the generated files, so that they can create a good robots.txt file and keep certain pages out of the index. Technical writers have not traditionally been webmasters and did not need to think about this aspect, but now they may have to. The robots.txt syntax is not complex for basic tasks, so it is a good idea to look at a few examples and learn how to close a specific page or directory from indexing. Plenty of examples are available on the Internet.
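As a rough illustration of closing service pages of a documentation portal, the sketch below builds a robots.txt from a list of path prefixes and then checks the result with Python's urllib.robotparser. The helper names `build_robots_txt` and `is_crawlable`, the service paths, and the docs.example.com address are all assumptions made for this example. The actual service pages depend on the documentation tool you use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical service paths generated by a documentation tool.
SERVICE_PATHS = ["/login", "/error/", "/print/", "/profile/"]


def build_robots_txt(disallowed, sitemap_url=None):
    """Return robots.txt text asking all crawlers to skip the given path prefixes."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def is_crawlable(robots_txt, url, user_agent="*"):
    """Check whether a crawler that honors robots_txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(SERVICE_PATHS, "https://docs.example.com/sitemap.xml")
print(robots_txt)

# A regular topic should stay crawlable; its printer-friendly copy should not.
for url in (
    "https://docs.example.com/guide/installation",
    "https://docs.example.com/print/installation",
    "https://docs.example.com/profile/jane",
):
    print(url, "->", "crawlable" if is_crawlable(robots_txt, url) else "blocked")
```

Keep in mind that a Disallow value is matched as a path prefix, so a rule such as `/login` also covers `/login/reset`. It is worth testing a few real URLs from your portal before publishing the file.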

In ClickHelp, we took care of this part of SEO as well: the pages that should not be indexed are closed in robots.txt, so you don’t have to think about this. You can try all the benefits of our technical writing software with a free ClickHelp trial.

This is a continuation of a blog post series on the SEO aspect of technical writing. You can find the first post here: Part 1 – Human-Readable URLs.

Happy Technical Writing!
ClickHelp Team

