Friday 14 March 2014

Prevent Googlebot from crawling some links, but still have them useable by the user

WHAT:

Prevent Googlebot from crawling some links, so that other, more
important links on our page are given priority and indexed faster.


HOW:

1. Remove the real URL from the link and replace it with "#" so it's
still a valid HREF.
2. Put the real URL in a data attribute, i.e. data-goto.
3. Reverse the value, because if we had data-goto="/productlist.html",
Googlebot would still crawl it.
Instead we have data-goto="lmth.tsiltcudorp/".
4. On loading the page, run some JavaScript that goes through all the
links that have this attribute, reverses the string and places it in
the "href" attribute. This relies on Googlebot not running JavaScript,
so that it never sees the real HREF values.

RETURNED BY SERVER:

<a class="unseen-link" href="#" data-goto="lmth.tsiltcudorp/">Product List</a>
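
Markup like the line above could be generated on the server by a small helper. This is only a sketch, assuming the server side is also JavaScript; the names reverseString, escapeHtml and obfuscatedLink are made up for illustration and are not part of the original implementation.

```javascript
// Hypothetical helpers, not from the original post.

// Reverse a string. Array.from splits by code point, so characters
// outside the Basic Multilingual Plane are not broken in half.
function reverseString(s) {
  return Array.from(s).reverse().join('');
}

// Minimal escaping for text placed inside HTML attributes or element bodies.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an anchor whose real URL is only present, reversed, in data-goto.
function obfuscatedLink(href, text) {
  const reversed = reverseString(href);
  return '<a class="unseen-link" href="#" data-goto="' +
    escapeHtml(reversed) + '">' + escapeHtml(text) + '</a>';
}

console.log(obfuscatedLink('/productlist.html', 'Product List'));
// prints: <a class="unseen-link" href="#" data-goto="lmth.tsiltcudorp/">Product List</a>
```

The value is escaped after it is reversed, so the browser hands the script the plain reversed string and reversing it again gives back the original URL.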


SEEN BY USER:

<a class="unseen-link" href="/productlist.html"
data-goto="lmth.tsiltcudorp/">Product List</a>



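// Restore the real href on every obfuscated link.
function prepareMissingLinks() {
    var stringReverse = function(string) {
        return string.split('').reverse().join('');
    };
    // Only pick up links that actually carry the reversed URL.
    var $hiddenLink = $('.unseen-link[data-goto]');

    $hiddenLink.each(function() {
        // Read the raw attribute: .data() may convert values that look
        // like numbers or JSON, which would break the reversal.
        $(this).attr("href", stringReverse($(this).attr('data-goto')));
    });
}

// Run once the DOM is ready.
$(prepareMissingLinks);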
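
The same idea can be sketched without jQuery. This is a hypothetical variant, not what the site uses: the name restoreLinks and its root parameter are assumptions. root is anything that offers querySelectorAll, normally document, which also makes the function easy to try outside a browser.

```javascript
// Hypothetical plain-DOM variant of prepareMissingLinks.
// `root` is any object with querySelectorAll, normally `document`.
function restoreLinks(root) {
  const links = root.querySelectorAll('a[data-goto]');
  links.forEach(function (link) {
    // getAttribute returns the raw string stored in the markup.
    const reversed = link.getAttribute('data-goto');
    link.setAttribute('href', Array.from(reversed).reverse().join(''));
  });
  // Number of links that were rewritten.
  return links.length;
}

// In a browser, run once the DOM has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    restoreLinks(document);
  });
}
```

Selecting on the data-goto attribute rather than on the class means a link without a reversed URL is simply left alone.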



I had an earlier implementation of this that used the hover event, but
that doesn't work on touchscreen devices. This version works on desktop
and mobile, and is also simpler.