SEO 101 Series – Canonicalisation Issues

This will be the first blog post in our new SEO 101 Series, in which the SEO-Chicks will go through some basic, and some more advanced, search engine optimisation principles. I have decided to start with an area within technical SEO: “canonicalisation”.

One of the reasons I chose this subject is that it always seems to be one of those areas of SEO that gets forgotten, even when a company is doing SEO. It amazes me! In the 9 months since I set up Verve Search, I have done more SEO evaluations than you can shake a stick at, and the problem I come across most often is canonicalisation issues, which can have a massive impact on your SEO efforts. Let’s start from scratch.

First of all, I find it very amusing that the term “canonicalisation” isn’t actually the right name for the problem. In fact, “canonicalisation” is the term for the solution rather than for the problem itself, but in SEO we have taken to using “canonicalisation” for both. How the heck that happened I don’t know. I reckon we should take it upon ourselves to come up with a term that describes the problem. Clients in particular have an epileptic fit when you mention a word like “canonicalisation”; it took me years just to pronounce it, for Pete’s sake.

The word “canonicalisation” derives from mathematics and, simply put, means the process of choosing one where there are several choices. Canonical = preferred! So in SEO terms it refers to choosing one preferred URL where there are several URLs loading the same page (content).


So how do you end up with several URL versions of the same page? Short answer: a dozen ways. The most common, and in my opinion the most unnecessary and unforgiving, examples of canonicalisation issues are these:

http://www.samplesite.com

http://samplesite.com

http://www.samplesite.com/index.php (or .htm, .asp or whichever)

http://samplesite.com/index.php

In my experience, 99% of the time your website will have a www version and a non-www version of each page. The site is also likely to have an /index version, or sometimes even a /home version.
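To make “choosing one preferred URL” a bit more concrete, here is a minimal Python sketch that maps the four homepage variants above onto a single URL. It assumes, purely for illustration, that the www version is the preferred one and that index.php, index.htm, index.html, index.asp and /home are the only “index” style pages; PREFERRED_HOST, INDEX_FILES and canonical_url are hypothetical names, not part of any SEO tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the www host is the preferred one,
# and these filenames all load the same page as the folder they sit in.
PREFERRED_HOST = "www.samplesite.com"
INDEX_FILES = {"index.php", "index.htm", "index.html", "index.asp", "home"}


def canonical_url(url: str) -> str:
    """Map a www/non-www/index variant of a URL onto one preferred form."""
    # Add a scheme if it is missing so urlsplit puts the host in netloc.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www domain as the same site.
    if host in (PREFERRED_HOST, PREFERRED_HOST[len("www."):]):
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Strip a trailing index filename but keep the folder's slash.
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Run over the four examples above, every variant comes back as http://www.samplesite.com/, which is the one-URL-per-page situation a 301 redirect or canonical tag is meant to create.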

Now, this is what happens when a search engine spider arrives on your site:

[Illustration: a confused search engine spider faced with several URLs that all load the same homepage]

The spider will only choose ONE of these versions for its index. If you give it no canonical “signals”, it will choose as it pleases (although, yes, most of the time the spider will choose the www.vervesearch.com version). So what’s the problem with this? Let’s say, for argument’s sake, that the spider chose the non-www version, but as the illustration below shows, this version of the homepage actually has the LEAST links!

[Illustration: the number of links pointing to each version of the homepage]

Because all of these URLs load the homepage, people can link to any of them. Most people will link to the www version, but quite often someone who has gone further into your site and then hit the home button will have landed on an /index version, and some people leave out the www when they type the domain straight into the browser. So what is happening here? You are DILUTING YOUR LINKS! If the spider indexed the non-www version in the example above, your chances of ranking highly (for whatever keyword you have optimised for) would be significantly lower than if it had indexed the www version. Thus the biggest problem with “canonicalisation issues” is the dilution of your “link juice”/link equity!

There are also a dozen other, even worse, instances of “canonicalisation issues”. If you have an ecommerce site, for example, there’s probably a 99% chance you have canonicalisation issues. A website with lots of products will usually have several routes to the same product, i.e. one product sits in several categories, creating duplicate URLs with the same content!
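To put rough numbers on the link dilution described above, here is a small Python sketch that counts links per URL before and after the duplicates are consolidated. The link list and the DUPLICATES mapping are made up for illustration, and the sketch treats every link as worth the same, which search engines certainly don’t; the point is only how the counts split and then recombine.

```python
from collections import Counter

# Made-up inbound links: each entry is the URL another site linked to.
# All four URLs load the same homepage.
inbound_links = [
    "http://www.samplesite.com/",
    "http://www.samplesite.com/",
    "http://www.samplesite.com/index.php",
    "http://samplesite.com/",
    "http://samplesite.com/index.php",
]

# Hypothetical decision: every duplicate should count towards the www URL.
PREFERRED = "http://www.samplesite.com/"
DUPLICATES = {
    "http://www.samplesite.com/index.php": PREFERRED,
    "http://samplesite.com/": PREFERRED,
    "http://samplesite.com/index.php": PREFERRED,
}


def links_per_url(links):
    """Count links exactly as they were made: split across the variants."""
    return Counter(links)


def links_after_consolidation(links, duplicates):
    """Count links once each duplicate URL passes its links to the preferred URL."""
    return Counter(duplicates.get(url, url) for url in links)
```

With these made-up numbers the best-linked version of the homepage has just 2 of the 5 links, while after consolidation the preferred URL is credited with all 5.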

To fix this, we need to make sure the search engines know which URL is the preferred one, and also take precautions to carry the link juice over from the “duplicate URLs”, so to speak, to the preferred/canonical URL.

[Illustration: duplicate URLs redirected to the preferred URL]

You essentially have two options for carrying over the link juice: a ‘301 redirect’ or the (relatively new) ‘canonical link tag’. Or even both. In fact, for the Verve Search example above I used a 301 redirect for the non-www version and a canonical tag on the /index version.

Using a 301 redirect
In my opinion, a 301 redirect is still the “safest” way to make sure the link equity gets carried over. In addition, 301 redirects can be used across domains, whilst the canonical link tag can only be used within a domain. The only downside I would point to is that 301 redirects need to be set up at server level, so if you are not a programmer, or at least very familiar with how this is done, it can easily be done wrong. I would recommend leaving this to a programmer or technical SEO (I have seen wrongly implemented 301 redirects create infinite loops and all sorts!).
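Because badly implemented redirects can end up in loops, or in chains of several hops, it is worth tracing where each duplicate URL actually ends up. The Python sketch below does this against a hypothetical redirect table (a dictionary of URL to (status code, target)) rather than a live server, so it only shows the logic; on a real site you would check the server’s response headers instead. check_redirects and its verdict strings are made-up names.

```python
def check_redirects(rules, start, max_hops=10):
    """Follow a hypothetical redirect table from `start`.

    Returns (path, verdict), where verdict is one of:
    "no redirect", "ok" (a single 301 hop), "chain" (several hops),
    "loop", "temporary" (a non-301 redirect on the way) or "too many hops".
    """
    path = [start]
    url = start
    for _ in range(max_hops):
        if url not in rules:
            # No further redirect: assume this final URL answers "200 OK".
            if len(path) == 1:
                return path, "no redirect"
            return path, "ok" if len(path) == 2 else "chain"
        status, target = rules[url]
        path.append(target)
        if status != 301:
            return path, "temporary"
        # Revisiting any earlier URL means the redirects never settle.
        if target in path[:-1]:
            return path, "loop"
        url = target
    return path, "too many hops"
```

For example, a table in which the non-www homepage 301s to the www homepage and the www homepage 301s back again comes out as "loop", while a single non-www to www hop comes out as "ok".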

How you set up 301 redirects very much depends on your server. If your site is on a UNIX/Apache server, it’s relatively simple.

Here’s how to set up a 301 redirect using the .htaccess file:
1. Locate the .htaccess file on your server. If you don’t have one, just create one in a text editor (e.g. Notepad) and name it .htaccess. (If you already have a .htaccess file, scroll down to the end of the existing instructions and add the new redirect instructions there.)

2. Put in your redirect information, which would look something like this:
Redirect 301 /directory/file.html http://www.domainname.com/directory/file.html

NB! The first part, “/directory/file.html”, is the location of the file being moved, and the second part, “http://www.domainname.com/directory/file.html”, is where the file is being moved to.

3. Upload the file to your server.
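If you have more than a handful of files to move, typing each line by hand invites typos. This hypothetical Python helper, redirect_lines, builds the same Redirect 301 lines from a mapping of old paths to new URLs and rejects entries that don’t follow the pattern in step 2 (an old path starting with “/” and a full target URL). It is only a sketch for generating the text; you would still paste the output into .htaccess and test the redirects afterwards.

```python
def redirect_lines(moves: dict[str, str]) -> str:
    """Build one `Redirect 301 <old-path> <new-url>` line per moved file."""
    lines = []
    for old_path, new_url in moves.items():
        # The old location is a path on this site, starting with a slash.
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path}")
        # Mirror the example above: the target is a full URL.
        if not new_url.startswith(("http://", "https://")):
            raise ValueError(f"target must be a full URL: {new_url}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return "".join(line + "\n" for line in lines)
```

Newer Apache versions also accept a target that is just a path, so the full-URL check is a deliberate simplification that keeps the output in the same style as the example above.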

On the other hand, if your site is on an IIS server (eeek), the procedure for implementing 301 redirects is totally different.

For 301 redirects on IIS do the following:

1. In the Internet Services Manager, click on the file or folder you want to redirect.
2. Select “A redirection to a URL”.
3. Enter the URL that you want to redirect to.
4. Make sure you check “The exact URL entered above” and “A permanent redirection for this resource”.
5. Click Apply.

I would also like to point out this very useful article by Steven Hargrave on doing 301 redirects in pretty much any scripting language, including Perl, Ruby and Java, as well as 301 redirects with mod_rewrite.

Using a Canonical Tag
In theory the canonical link tag should be just as effective as a 301 redirect, but as it’s still relatively new, I’m finding it hard to trust it yet (I would love to get your feedback on your experiences with this tag, though). In essence, here’s how the tag works: the “canonical tag” lets you specify in the HTML head that the URL in question should be treated as a “copy”, and names the canonical URL to which all link authority and content metrics should flow back.

The canonical link tag is very easy to implement, and you don’t need access to the server. Here’s how:

Within the HTML head (where the meta information sits) of each page you want to “redirect” the link juice from, add the following tag:

<link rel="canonical" href="http://www.samplesite.com/" />

The canonical tag should be implemented on every URL that loads the same page. For more information on the canonical link tag, check out this article on the subject by Matt Cutts.
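To check that every duplicate URL really carries the tag, and that they all point at the same preferred URL, you can pull the canonical link out of each page’s HTML. The sketch below uses Python’s standard html.parser module; the pages are passed in as strings (fetching them is left out), and CanonicalFinder, canonical_hrefs, check_canonicals and the example URLs are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <link ... /> tags are routed here too by the default handler.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.hrefs.append(href)


def canonical_hrefs(html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


def check_canonicals(pages: dict[str, str], preferred: str) -> dict[str, str]:
    """Return a short problem description for each page whose tag is wrong."""
    problems = {}
    for url, html in pages.items():
        hrefs = canonical_hrefs(html)
        if not hrefs:
            problems[url] = "no canonical tag"
        elif len(hrefs) > 1:
            problems[url] = "more than one canonical tag"
        elif hrefs[0] != preferred:
            problems[url] = "points to " + hrefs[0]
    return problems
```

The sketch treats more than one canonical tag on a page as a problem simply because it is then ambiguous which URL is meant; an empty result means every page checked points at the preferred URL.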

The canonical link tag is ideal for sites with large product ranges and ecommerce sites. I also usually use the canonical link tag on all /index versions of pages, as I’m a bit wary of 301 redirecting /index versions, since these are essentially the physical version of the actual page.
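For an ecommerce site where one product sits in several categories, the job is to pick one URL per product and put that URL in the canonical tag on every route to it. The Python sketch below assumes a hypothetical URL scheme in which the last path segment is a unique product slug and /products/<slug> has been chosen as the canonical URL; your own scheme will almost certainly differ, and SITE, canonical_product_url and canonical_link_tag are made-up names.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical URL scheme: the last path segment is a unique product slug,
# and http://www.samplesite.com/products/<slug> is the chosen canonical URL.
SITE = "http://www.samplesite.com"


def canonical_product_url(url: str) -> str:
    """Map any category route to a product onto its single canonical URL."""
    path = urlsplit(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]
    return f"{SITE}/products/{slug}"


def canonical_link_tag(url: str) -> str:
    """Build the tag to place in the <head> of the page served at `url`."""
    # Escape quotes and ampersands so an odd slug cannot break the attribute.
    href = escape(canonical_product_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'
```

Under this scheme, both /mens/shoes/superstar-xyz and /sale/superstar-xyz would carry the same tag pointing at /products/superstar-xyz.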

In a lot of cases, sorting out your canonicalisation issues will have a BIG impact on your SEO efforts. In fact, I would say that when starting the SEO process on a website, fixing the canonicalisation issues should be the first port of call. Doing SEO on a site that is “bleeding” link authority due to canonicalisation issues is a bit like building a sandcastle using a sieve!!!


23 Responses to “SEO 101 Series – Canonicalisation Issues”

  1. Chewie says:

    Just something to keep in mind, Lisa: if you’re moving a site to a new domain, or mass-moving any sort of pages, you could use dynamic 301s and do something like…

    RewriteRule ^([^/]+)/([^/]+)\.html?$ http://www.domain.com/$1/$2.html [L,R=301]

    instead of manually typing…

    Redirect 301 /directory/file.html http://www.domainname.com/directory/file.html

    for each page :o )

    Of course, you can also use rewrite rules to make pretty URLs from shabby query-string-type URLs, e.g…

    RewriteRule ^addidas/runningshoes/superstarsXYZ/1/?$ addidas/products.php?category=mens&inner-category=shoes&product_name=superstarsXYZ&id=1 [L]

    Or, let’s not make more work for ourselves, huh?

    RewriteRule ^addidas/([^/]+)/([^/]+)/([^/]+)/?$ addidas/products.php?category=mens&inner-category=$1&product_name=$2&id=$3 [L]

    Regular expressions FTW; someone *may* find this useful, I dunno?

    Chu

  2. Neil says:

    Oh that’s how it works! :-) Thanks Lisa. Very useful.
    (However do you really mean epileptic fit? You might want to review that bit to avoid offence :-) – just a thought)

  3. Lisa Myers says:

    @Chewie: Thanks mate. I didn’t include the more complicated code as it’s an SEO 101 blog post :) but I much appreciate the input.

    @Neil: It’s an SEO 101 blog post, so a lot of you will already know this :) With regards to the “epileptic fit” reference, it is not meant in an offensive way at all. I have a close family member with epilepsy, but I doubt they would take offence at my blog post. But thanks for pointing it out.

  4. Lisa, THANK YOU for this post. I have been longing for something to reference in a two-month battle to get the technical integration partner of ‘client x’ to sort out their .htaccess file. You’ve managed to make a technical post read so well.

  5. Nice one, Lisa – this technical ‘issue’ is probably number one when it comes to SEO problems on existing and new websites.

    @Chewie – correct – bear in mind that it only works when moving an existing site with its current URL structure.

  6. Neil says:

    No problem. I know you didn’t mean it to be offensive Lisa :-)
    Look forward to reading more in this series.

  7. Brilliant blog post! I’m going to have to reference it when I send things out to clients as it makes it easier for them to understand ;-)

    Fantastic – really super work! Thank you so much for getting this out there – totally needed!

  8. Lisa Myers says:

    Thanks guys….

    @Chewie or anyone else, had any experience and/or done testing on the canonical tag?

  9. [...] SEO 101 Series – Canonicalisation Issues « SEO Chicks [...]

  10. g1smd says:

    There are consequences (all of them bad) if you mix code from mod_rewrite and mod_alias in the same .htaccess file.

    Stick with RewriteCond and RewriteRule, not Redirect or RedirectMatch, directives for a simple and sane SEO life.

  11. Craig Mullins says:

    What are the consequences (all of them bad) if you mix code from mod_rewrite and mod_alias in the same .htaccess file. (hehe)

    Why would you worry about redirecting an index file to the folder? Fix the links the dingbat web designers made and redirect.

  12. g1smd says:

    Directives are evaluated per module, not in the raw order they appear in your .htaccess file.

    Mixing directives from different modules may lead to unwanted multiple step redirection chains, and/or to the exposing of internal target filepaths from internal rewrites back out into external URLs.

    You might get away with using such code on one server and then an upgrade to the server software or a move to a different server might make that code silently ‘fail’ when used with the new configuration.

    Such errors might silently erode your search rankings with little or no obvious clues as to what is going on until it is too late.


    Why redirect index URLs back to slash-ended URLs?

    Simply, eliminating multiple URLs returning “200 OK” for the same content gives the crawler less work to do, removes the risk of Duplicate Content indexing, and consolidates external incoming linking benefits.

  13. Diane says:

    A useful article to bookmark and then pass on to everyone else I know. When the newbies start asking questions it’s nice to have concisely written articles to refer them to!

  14. Alphonse Ha says:

    Since you are talking about 301 redirects, I want to ask you and the people who posted comments: do any of you have experience with creating a 301 redirect using PHP?

    I am aware of the .htaccess file, but I am currently working on a site that uses a PHP command to do a permanent redirect, and I am not sure if it is really working.

    I am concerned that those pages might be “bleeding” juice.

    Thanks.

  15. g1smd says:

    I assume that you have used the PHP HEADER directive.

    You’ll actually need two HEADER directives. One to send the 301 status code, and the other to send the URL you’re redirecting to. If you omit the status code, you’ll get an unwanted 302 redirect instead.

    Check the code works by using the Live HTTP Headers extension for Mozilla-based browsers. You should see a single step 301 redirect from one URL to the other, and the final URL returning “200 OK”.

  16. Craig Mullins says:

    Check the headers of the server response. There are many tools and Firefox add-ons to do it…

    Here’s a website to make it easy. I like this one because it shows all redirects, not just the first one like many tools…

    http://www.seoconsultants.com/tools/headers/

    You can also pick your browser/user agent, http protocol and method…

  17. Andy Williams | Impact Media says:

    Excellent blog post and this will be a very good point of reference to use when recommending this action to clients.

  18. Alister on seo says:

    A redirect is the best way true, but some hosting companies won’t allow you to create your own .htaccess file and so you have to use more complicated ways around it, if even those are available to you.

  19. Luci says:

    Great post! It’s brilliant to see the issue broken down and explained piece by piece. I think I’ll bookmark it for those ‘blank’ days when I need a refresher, thanks!
    What’s next in the 101 Series?

  20. @Luci – it’s me next with Data Analysis 101!

  21. Tom telford says:

    Nice one – this is a great non-geeky reference point for explaining the issue (again) to ‘experts’ who know better! Thanks.

  22. Lisa Myers says:

    Thanks to everyone for their great comments and input :) !

  23. Dominic says:

    Thank you very much for this blog – helped me to understand some basics and parts of a quotation :-)
