Fake Links: Nofollow is Just the Beginning

We all know how important links are: they're the lifeblood of just about any SEO campaign. The more links that point to a particular URL, the more important that page is assumed to be. Like any rule of thumb, that's an oversimplification, but it's true enough that if you only know a few things about search engine optimization, this should be one of them.

Further, the anchor text of these links is assumed to describe what a page is generally about; if you want to rank for a particular phrase in a competitive landscape, you need help from off-page optimization. Out of about 150 million results, Adobe's Acrobat Reader page ranks #1 for "click here" even though that text appears nowhere on the page.

From Humble Beginnings

The relationship attribute of a hyperlink ( the a element ) was given a new, and by now widely recognized, value, nofollow, in January 2005. Adding rel="nofollow" to a link tells the major search engines not to count that link in their ranking algorithms. Google introduced the value to help bloggers deal with comment spam: by removing any SEO benefit from links on unmoderated sites, it makes them less attractive targets.

Wikipedia upped the ante, applying nofollow to all outgoing links from their site, except to a handful of Wikimedia sister projects and internal pages. A high-traffic, free encyclopedia that ranks well had become a great spam target; blog spam had expanded into wiki spam.

Earlier this year, Google announced that it would like help identifying sites that sell links without using nofollow or some other means of hiding the links from GoogleBot. Recent events show they're serious. Naturally, it didn't take long for webmasters to start using nofollow for any number of other reasons; there are a few nofollowed links sprinkled into this post to illustrate the point. Can you tell which ones?

Rise of I-Follow or DoFollow

All of the major blogging platforms adopted nofollow in their comment links quickly. How could a change like this not see resistance? A movement calling itself "I-Follow" has sprung up, urging bloggers to remove the attribute from their comment links, and reward visitors who contribute something of value.

Not all that many blogs have come on board, partially because it takes some work. ( In WordPress, you can remove the one occurrence of the word nofollow from wp-includes/comment-template.php; the relevant function is shown below. ) These islands of what some in the SEO world are calling "DoFollow" links naturally attract more traffic and more comments. Whether it's worth it for the blog owner is an open question, but it's become public knowledge that certain blogs offer these types of links.
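For reference, here's roughly what that function looks like in WordPress 2.x ( the exact code varies from version to version ); deleting rel='external nofollow' from the line of markup is the whole job:

function get_comment_author_link() {
    global $comment;
    $url = get_comment_author_url();
    $author = get_comment_author();
    if ( empty($url) || 'http://' == $url )
        $return = $author;
    else
        $return = "<a href='$url' rel='external nofollow' class='url'>$author</a>";
    return apply_filters('get_comment_author_link', $return);
}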

Beyond NoFollow

Predictably, neophytes to SEO who ask about link building are usually told to submit to directories, use forum signatures, and post comments on dofollow blogs. This is so common that the D-List, which lists the 250 or so known dofollow blogs, has an Alexa rank of 23,478. Statistics are usually misleading, and Alexa is particularly so; still, there's a great deal of interest in knowing whether the sites people want to use for 'link building' offer search engine friendly links.

Several browser plugins have sprung up to answer the question. These add-ons, like SeoQuake, draw attention to nofollow links by rendering them in strikethrough, or red-on-yellow text. Unfortunately, this can lure people into a false sense of security that a link is kosher, because these tools only highlight nofollow links. Note: It's bad form to comment on a blog purely for the link you might get out of it. It's also not very effective, because most of the blogs on the D-List moderate their comments actively.

Robots Meta Tag

This should come as no surprise: sites can use a robots meta tag to prevent search engines from respecting any of the links on a page. This is often done in forums to prevent spam attacks. Of course, a person shouldn't participate in a forum just to drop links.
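The tag itself is one line in the page's head; with nofollow in the content attribute, every link on the page is treated as though it carried rel="nofollow":

<meta name="robots" content="nofollow">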

Stale Links

Because there's a thriving market for signature links on many forums, let's debunk a common pitch. If a person on a forum has 20,000 posts going back several years and offers to sell you a link in his or her signature, that sounds like 20,000 new backlinks pointed at your site. The trick is that search engines are paying more and more attention to "freshness" while crawling an ever-expanding internet. GoogleBot isn't going to go back and crawl a four-year-old forum post, so it will never see many of these links. This isn't usually a way people intentionally cheat link seekers, but it pays to check the search engine's cache when in doubt.
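When in doubt, Google's cache: operator does the job; paste something like this into the search box ( the URL here is made up ):

cache:www.example.com/forum/viewtopic.php?t=12345

If nothing comes back, or the stored copy is years old, a signature link in that thread isn't being counted.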

JavaScript

We expect a search engine friendly, or SEF, link to look like this: <a href="http://www.example.com">Here is the anchor text</a>, and an unfriendly link to look more like <a href="http://www.example.com" rel="nofollow">Here is the anchor text</a>. Let's look at a simple way to get past software trained to look for this pattern:

<a href="#" onmouseover="this.href='http://www.example.com';">Anchor Text Here</a>

Try it. That's a basic implementation, but a pretty easy one to spot if you look at the page's code. Sites that use this approach, "faking the funk," tend to be a little more clever about it. A sneakier example might look more like this:
<a href="#" class="newTab" target="_blank">Anchor Text Goes Here</a>
...
<script type="text/javascript">
// Find every link marked with the "newTab" class inside the right
// container and quietly point it at the real destination.
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++)
    if (links[i].className == "newTab" && links[i].parentNode.id.indexOf("something") != -1)
        links[i].href = "http://www.example.com";
</script>

To make this harder to spot, the JavaScript that fills in the link would go in an external file, and newTab would be defined in the stylesheet so that links which open a new window or tab look different from the others.
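A minimal sketch of the idea, assuming the loop above is saved in a hypothetical newtab.js and the class is styled from the site's stylesheet:

<link rel="stylesheet" type="text/css" href="style.css" />
<script type="text/javascript" src="newtab.js"></script>

/* in style.css: give newTab links a visible cue, so the class reads as presentation */
a.newTab { padding-right: 12px; background: url('new-window.gif') no-repeat right center; }

To a person skimming the page source, newTab now looks like styling for links that open a new window, not link manipulation.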

What about people who don't browse with JavaScript enabled? ( All 4% of them? )

Enter Cloaking

People have gone further with this type of deception. Cloaking means sending different content to a search engine spider than a user would get for the same request. This is usually done to get a better ranking somehow; think of it as an advanced form of hidden text. It's also been used to misrepresent links to webmasters. And because this runs at the server ( to accommodate spiders, which don't execute JavaScript ), it works on users who have turned JavaScript off, too.

In most situations, this is a type of browser sniffing. Based on the user agent string, it's easy in PHP or ASP to determine whether a visitor is GoogleBot, Slurp, or Internet Explorer. This alone can get a site banned from most search engines, but then locks only keep honest people out; thieves break a window. Firefox users can pretend to be a search engine spider with User Agent Switcher, to determine whether Google sees a different type of link than they do.
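A minimal sketch of the trick in PHP ( for illustration only; the URL and anchor text are made up, and doing this on a real site is a quick way to get banned ):

<?php
// Branch on the visitor's user agent string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($agent, 'googlebot') !== false || stripos($agent, 'slurp') !== false) {
    // Spiders get a plain, crawlable link...
    echo '<a href="http://www.example.com">Anchor Text Here</a>';
} else {
    // ...everyone else gets the dead link for JavaScript to fix up later.
    echo '<a href="#" onmouseover="this.href=\'http://www.example.com\';">Anchor Text Here</a>';
}
?>

Because the branch happens at the server, there's nothing suspicious left in the markup a browser, or a curious webmaster, ever receives.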

Fake DoFollows in WordPress

Unlike Blogspot, WordPress can be installed on your own server, where its code is open to modification. Let's take a look at the ways a fake dofollow can be built there, since, unfortunately, blogs are too often used for link development. Faking SEF links leaves a noticeable footprint, so let's see how to recognize it.

The following code, again applied to comment-template.php, would be a rough start:
function get_comment_author_link() {
    global $comment;
    $url = get_comment_author_url();
    $author = get_comment_author();
    // Print a dead '#' link; mousing over the outer span swaps in the real URL.
    $return = "<span onmouseover=\"document.getElementById('comment_$comment->comment_ID').innerHTML='<span><a href=$url>$author</a></span>';\"><span id='comment_$comment->comment_ID'><a href='#'>$author</a></span></span>";
    return apply_filters('get_comment_author_link', $return);
}

What's important is that the comment link, or a close parent container, has a unique ID. If the span containing the author link is numbered, that ID can be used with the document.getElementById function and the almost universal innerHTML property.

But Why?

This is a good question. Why would someone try to cheat webmasters this way? I've seen a handful of answers that run from legitimate to dirty and underhanded.

  1. AdSense serves up advertising links this way. Similarly, some bloggers use nofollow or JavaScript to show something to their users without passing link juice or incurring Google's wrath. The new guidelines explicitly demand that the webmaster prevent such a link from passing any PR ... nofollow is one way to comply, client script is another.
  2. There are situations where you might want to link to your competition, without giving them a backlink. Smaller search engines ( which, combined, have a single digit percentage of the search market ) don't recognize nofollow.
  3. Fear of a bad neighborhood. This is what nofollow is for, but a few people will cheat any system. Using JavaScript instead of rel="nofollow" lets a person have their cake and eat it too: attract readers, get a slot on the high-traffic D-List, and not give out real links in SEO terms.
  4. PR hoarding. There's a myth that ranking power, and PageRank in particular, can "leak" out of a page through external links. This isn't true. Still, the link juice of a page is spread out equally among all of its links, so more external links means a little bit less ranking power for your internal ones, like site navigation: a page splitting its vote among ten links gives each a tenth; add two external links and each gets a twelfth. This isn't going to make much difference, but some people seem overly concerned with it, especially those in competitive niches.
  5. I don't know, honestly, but I've seen things like this, and I've seen threads in SEO forums where people complain about being cheated out of a link.

Looking at the Why? question from another angle, I wrote this post to encourage people not to rely too much on tools like SeoQuake to spot search engine unfriendly links. If you're doing something for link juice, you need to check the page's markup.

What About Organic Links...?

This article might seem to promote mercenary link building; that's not my intention. Still, there are some decisions webmasters will make based on the value of a link, so they should consider more than just whether that link uses nofollow. ( Traffic is a far better thing to look at, but that's another story. )

The truth is the best links are usually the ones you don't create yourself or buy outright. The D-List is popular because a lot of link builders find it useful. It shows up [too] often when people ask about nofollow. Wikipedia does well in the search results because they provide free information that's usually pretty good. Provide useful, attractive content, and you'll find a lot of the links build themselves.



20 Comments

Very interesting post, although there is an inaccuracy about the D-List. There's no such thing as an Alexa ranking for a single page.

My site, CourtneyTuttle.com has an Alexa ranking in that range, but it isn't the Alexa ranking for the D-List.

You're right; Alexa rank seems to be applied at the entire site level, a domain and all its subdomains ... which can naturally throw a person off. And it's a mistake to think everything you've done to attract a very respectable amount of traffic can all be attributed to that one page.

Also, while we're talking about it, Alexa uses statistical sampling: x% of people use their toolbar, so take the number who've visited a site and extrapolate to the whole. Webmasters are far more likely than ordinary people to use the Alexa plug-in, so sites catering to web design and development issues are more likely to see a high Alexa rank. Still, there really aren't many public traffic metrics ( like a Nielsen rating? ), so we can only go with what's available...

Thanks for sharing with us.

I didn't know anchor texts were so important for ranking high. Thanks for pointing this out.

Nice post and thanks for the description of cloaking. I've heard of it but never got into it too much. How well does it work (not that I'd want to do it)? Looks like the blog is about SEO and related topics!! Right on

Aside from what Courtney pointed out, it's also very easy to strip nofollows out of WordPress comments; just get a plugin.

However, this post is excellent. A very accurate explanation of the rel="nofollow" attribute.

One thing you may want to add is the effectiveness of using rel="nofollow" to actually improve one's site structure.

btw, I followed your sig from Sitepoint to get here ;)

Jayson;

Cloaking can be an effective SEO technique, but only in the short term for most sites. Ironically, cloaking is normally used to prop up pages carrying a lot of AdSense, so the site's owner gets paid for ad clicks, and that's exactly the situation it won't work very long in.

The idea is to send Google something it will rank highly ( keyword rich title, headings, etc ) and send the users something you'll get paid for. In response, Google is rumored to send out robots that don't announce themselves as GoogleBots, to spot check pages and compare the response they get against the cache. They also take reports from users about cloaking, and it's natural to expect they take action when they see a threat to their search or ad business.

On the other hand, The New York Times is notorious for cloaking, and for them it's allowed. If one of their older stories comes up in a search, clicking into it takes you to a page asking you to sign up for a membership. If you click the little "cached" link in the SERP, you'll get the whole article. The same is true for Experts-Exchange.com, an IT forum.

Why these two sites are 'allowed' to cloak is anybody's guess, but if you or I tried it, we would find ourselves banned within days.

I understand the whole idea behind cloaking, that seems pretty clear, but what's actually involved in making that happen? I mean is it the kind of thing just anyone can do or is there something technical that lets the NYT do it?

Cloaking is also allowed by Google if you send visitors different pages because they are from different regions, or their computer is set to a language other than English.
By the way, Google uses cloaking itself: if you go to a google.com page from Germany, you will be sent to google.de. That is easy to do because they know your IP address.
So as long as you use cloaking not for SEO reasons but because it makes sense to send a user a page in his language, that is OK.

Thanks for the tip on how to strip the "nofollow" tag from WordPress comments. Very useful.

Interesting article!
So, do I understand you correctly that rel="nofollow" WILL limit the leakage of link juice from a site's internal links?

I stripped all the "nofollow" attributes from my personal WordPress blog; thank you again very much for the technical tip ... love linklove.

As to comment 11, I believe the internal links pass juice to the root domain, so there would be a benefit to having nofollow on most, if not all, external links and not on internal links.

The official claim is that links with the rel=nofollow attribute do not influence the search engine rankings of the target page. In addition to Google, Yahoo and MSN also support the rel=nofollow attribute.

I think it helps indexing...

Nice post. I'm not sure that I would have thought about nofollow in this sense, but thanks for the good work.

One of my friends told me about a program that can tell you where your links are and what state they're in: whether they're active or have brought traffic, and also whether there are broken links that you built some time ago and don't even remember. I don't remember the exact name, but it's something like Spyglass or similar. If you can confirm it, that would be great.

Very useful post; the cloaking definition in particular is very good. I've heard a lot about it but never had an ample opportunity to apply it on my site. I've made up my mind to try it once :)