This is just a quick follow-up on my previous blog post about creating a dynamic robots.txt handler.

A robots.txt tells search engines which content you don't want them to crawl, but it is no guarantee that a search engine won't index your content.

The problem is that if someone links to your content, or you link to it yourself, the linked page can still be indexed. In that case it doesn't matter that the content is disallowed in robots.txt; the search engine found a link to the page and will index it. The only way to be sure that an engine doesn't index your content is to put proper meta tags on the page itself.
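
For example, a page that should stay out of the index would end up rendering a robots meta tag along these lines (just an illustration of the output, not markup taken from our solution):

<meta name="robots" content="noindex,follow" />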

As we already had the functionality for the editors to mark which content shouldn't be indexed (via the robots.txt), we reused those settings to create proper meta tags on our content.

To implement this, we needed the meta part of the page to be dynamic. This was done in a sublayout that is included on all pages, containing the following code-in-front content:

<title><%=MetaTitle %></title>
<meta name="title" content="<%=MetaTitle %>" />
<meta name="description" content="<%=MetaDescription %>" />
<meta name="keywords" content="<%=MetaKeywords %>" />
<meta name="robots" content="<%=Robots %>"/>

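The MetaTitle, MetaDescription and MetaKeywords properties aren't the interesting part here; they are plain code-behind properties reading fields from the context item. A minimal sketch could look like the following, assuming field names such as "Meta title" (the actual field names in our solution may differ):

        protected string MetaTitle
        {
            // "Meta title" is a hypothetical field name on the context item
            get { return Sitecore.Context.Item["Meta title"]; }
        }

        protected string MetaDescription
        {
            get { return Sitecore.Context.Item["Meta description"]; }
        }

        protected string MetaKeywords
        {
            get { return Sitecore.Context.Item["Meta keywords"]; }
        }
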
The part we'll concentrate on is the robots tag. The sublayout ensures that a robots tag is written on every page of the site. For normal pages that should be indexed, we simply write "index,follow", which tells a search engine to index the page and follow the links on it (the noodp value also tells the engines not to use the Open Directory/DMOZ description for the page).

        protected string Robots
        {
            get
            {
                var robotsValue = "index,follow,noodp";
                // get the item that defines the robots settings (the site root)
                var siteRoot = WebUtil.GetSiteRoot(Sitecore.Context.Item);
                if (siteRoot != null)
                {
                    // get the field containing the items that shouldn't be indexed
                    var noIndexItems = (MultilistField) siteRoot.Fields["Noindex items"];
                    if (noIndexItems != null)
                    {
                        // gets the current url
                        var currentUrl = HttpContext.Current.Request.Url.ToString().ToLower();
                        foreach (var item in noIndexItems.GetItems())
                        {
                            var itemLink = LinkManager.GetItemUrl(item).ToLower();
                            // check whether the item's url is part of the current url
                            if (currentUrl.Contains(itemLink))
                            {
                                // if we have a match, tell the search engine not to index the page
                                robotsValue = "noindex,follow,noodp";
                                break;
                            }
                        }
                    }
                }
                return robotsValue;
            }
        }
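
One thing the snippet above glosses over is WebUtil.GetSiteRoot, which isn't shown in this post; it just resolves the site root item that holds the "Noindex items" field, starting from the current item. A minimal sketch of what such a helper might look like, assuming the site root is identified by a template called "Site Root" (the real helper may resolve it differently, e.g. via Sitecore.Context.Site):

        using Sitecore.Data.Items;

        public static class WebUtil
        {
            // Hypothetical sketch: walks up the ancestor chain until it finds an item
            // based on a "Site Root" template (hypothetical template name)
            public static Item GetSiteRoot(Item item)
            {
                var current = item;
                while (current != null)
                {
                    if (current.TemplateName == "Site Root")
                    {
                        return current;
                    }
                    current = current.Parent;
                }
                return null;
            }
        }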

The Robots property itself is quite simple: it checks whether the URL of one of the items in the no-index field is part of the current URL, and if it is, we tell the search engine not to index the page. Because the check is done with Contains, noindex is applied to the items in the field and to all of their children. We implemented it this way because our customer had an FAQ section that they didn't want indexed yet.
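
To make the matching behaviour concrete, here is what the Contains check does for a few hypothetical URLs, assuming the FAQ item's link resolves to /en/faq:

        // hypothetical URLs, assuming the FAQ item's link resolves to /en/faq
        var itemLink = "/en/faq";
        var a = "http://example.com/en/faq".Contains(itemLink);          // true  -> noindex
        var b = "http://example.com/en/faq/shipping".Contains(itemLink); // true  -> noindex (child page)
        var c = "http://example.com/en/products".Contains(itemLink);     // false -> stays index,follow

Note that a plain Contains check will also match items whose URL merely starts with the same text (e.g. /en/faq-archive would match /en/faq), so keep that in mind if your content tree has names like that.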