Helping Search Engines Find Your XML Sitemap
In June of 2005, Google announced support for a type of file that it named a Google Sitemap. Like a traditional sitemap for website visitors, this file is a complete (or partial) listing of the pages within a website. Unlike a traditional sitemap, however, the Google Sitemap is not intended for humans to read, but rather for Google to scan. XML (Extensible Markup Language) was chosen as the format for this new sitemap because it is easily parsed. An XML Sitemap is therefore essentially a file that lists the URLs of a given website in XML. The other major search engines later agreed to use this standard, and the file is now referred to as an XML Sitemap rather than a Google Sitemap. An XML Sitemap is one of the best ways to make sure that a search engine is aware of all of your URLs.
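To make the format concrete, here is a minimal Python sketch of how such a file might be generated. It assumes the urlset/url/loc element layout and the namespace published at sitemaps.org; the function name build_sitemap and the page list are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace published by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an XML Sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand; save the file as UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page list; replace with the URLs of your own site.
pages = ["http://YOURDOMAIN/", "http://YOURDOMAIN/products.html"]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting string can be saved under whatever file name you choose for your sitemap. Optional per-URL elements described at sitemaps.org, such as the last modification date, are left out of this sketch.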
Until a couple of months ago, the only way for webmasters to make the location of their Sitemap known to the search engines was to submit it manually to each search engine. Google used its Webmaster Central tools, and Yahoo its Site Explorer application. There was simply no way to make the XML Sitemap known to Microsoft. Fortunately, things have progressed: there is now a new and efficient standard that lets all search engines find an XML Sitemap when its location is placed in a robots.txt file. This new convention is often referred to as "Sitemaps Auto-Discovery."
What is a robots.txt file? A robots.txt file is a file that webmasters have traditionally used to tell search engine spiders which files and directories they do not want indexed. If used, it resides in the root directory of a website (http://YOURDOMAIN/robots.txt). Most spiders respect the requests in this file. Simply adding the following line to a robots.txt file enables a spider to find the XML Sitemap:
Sitemap: http://YOURDOMAIN/YOURSITEMAPFILENAME
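As a rough illustration of what a spider does with this line, the Python sketch below scans the text of a robots.txt file and collects any Sitemap entries. The function name find_sitemaps and the sample file contents are hypothetical; matching the field name case-insensitively and ignoring "#" comments are assumptions of this sketch, not behavior guaranteed for any particular search engine.

```python
def find_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split "Field: value" at the first colon only, so the URL stays intact.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt contents for illustration.
example = """\
User-agent: *
Disallow: /private/
Sitemap: http://YOURDOMAIN/YOURSITEMAPFILENAME
"""
found = find_sitemaps(example)
print(found)
```

Running the same function against the text of your own robots.txt is a quick way to confirm that the Sitemap line is present and spelled correctly.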
It is important to note that this method should only be used if you are using an XML Sitemap. Merchants interested in learning more about creating XML Sitemaps and robots.txt files should take a look at these resources:
XML Sitemap: http://www.sitemaps.org/
Robots.txt: http://www.robotstxt.org/
Is it worth the effort to create a sitemap and point to it from a robots.txt file? If the search engines already seem to index every page on your website, you may see little to no gain from taking on this project. On the other hand, if you have pages that aren't being picked up by some of the search engines, you will see benefits from this exercise. To quickly test whether a site's pages are indexed in a specific search engine, simply type this command into the search box:
site:YOURDOMAIN
The resulting page should list the number of pages indexed and contain only results from the site you typed. The more pages that are in a search engine's index (excluding duplicates), the more opportunities you have to increase free traffic to your site. Using robots.txt to identify a Sitemap makes the submission process a breeze. Just be sure to update your XML Sitemap as pages are added to or modified on your site.