SEO 101 Series – Canonicalisation Issues

This is the first blog post in our new SEO 101 Series, where the SEO-Chicks will be going through some basic, and some more advanced, search engine optimisation principles. I have decided to start with an area within technical SEO: “Canonicalisation”.

One of the reasons I chose this subject is that this always seems to be one of those areas of SEO that somehow gets forgotten, even when a company is actively doing SEO. It amazes me! In the 9 months since I set up Verve Search, I have been doing more SEO evaluations than you can shake a stick at, and the most common problem I find on a website is canonicalisation issues, which can have a massive impact on your SEO efforts. Let’s start from scratch.

First of all, I find it very amusing that “canonicalisation” isn’t actually the right term for the problem. It is the term for the solution to the problem rather than the name of the problem itself. But in SEO we have taken to using “canonicalisation” for both the problem and the solution. How the heck that happened I don’t know. In fact I reckon we should take it upon ourselves to come up with a term that explains the problem. Clients in particular have a fit when you mention a word like “canonicalisation”; it took me years just to pronounce it, for Pete’s sake.

The word canonicalisation derives from mathematics and, simply put, means the process of choosing one where there are several choices. Canonical = preferred! So in SEO terms it refers to choosing one preferred URL where there are several URLs loading the same page (content).

So how do you get several URL versions of the same page? Short answer: a dozen ways. The most common, and in my opinion the most unnecessary and unforgivable, examples of canonicalisation issues are these:

www.samplesite.com

http://samplesite.com

www.samplesite.com/index.php (or .htm .asp or whichever)

http://samplesite.com/index.php

99% of the time your website will have a www version and a non-www version of each page. The site is also likely to have an /index version, or sometimes even a /home version.
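To make the idea of "choosing one preferred URL" concrete, here is a minimal Python sketch that maps the variants listed above onto a single canonical form. The rules it applies (prefer the www host, strip index and home filenames) and the names `INDEX_NAMES` and `canonical_url` are assumptions made for this illustration, not a standard; your own preferred version may well differ.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed rules for this sketch: prefer the "www." host and drop
# index/home filenames. Adjust both to match your own preferred URL.
INDEX_NAMES = {"index.php", "index.htm", "index.html", "index.asp", "home"}


def canonical_url(url: str) -> str:
    """Map a URL variant onto one preferred (canonical) form."""
    if "://" not in url:
        url = "http://" + url  # accept bare input such as "www.samplesite.com"
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    segments = parts.path.split("/")
    if segments[-1].lower() in INDEX_NAMES:
        segments[-1] = ""  # "/index.php" becomes "/"
    path = "/".join(segments) or "/"

    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "www.samplesite.com",
    "http://samplesite.com",
    "www.samplesite.com/index.php",
    "http://samplesite.com/index.php",
]
for variant in variants:
    print(variant, "->", canonical_url(variant))
```

All four variants come out as the same URL, which is exactly the decision you want the search engine to make as well.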

Now, what happens when the search engine spider arrives on your site is this:

[Illustration: the confused spider]

The spider will only choose ONE of these versions for its index. If you have no canonical “signs” it will choose as it pleases (although, yes, most of the time the spider will choose the www.vervesearch.com version). So what’s the problem with this? Let’s say for argument’s sake that the search engine spider chose the non-www version, but as the illustration below shows, this version of the homepage actually has the FEWEST links!

[Illustration: links to each version of the homepage]

Because all of these URLs load the homepage, people can link to any of these versions. Most likely people will link to the www version, but quite often you will find that if someone has gone further into your site and then hit the home button, this will load the /index version. And sometimes people don’t use www before the domain name when they type it straight into the browser. So what is happening here? You are DILUTING YOUR LINKS! If the spider indexed the non-www version from the example above, the likelihood of ranking highly (for whatever keyword you have optimised for) would be significantly lower than if the spider had indexed the www version! Thus the biggest problem with “canonicalisation issues” is dilution of your “link juice”/link equity! There are also a dozen other, even worse, instances of “canonicalisation issues”. If you have an ecommerce site, for example, there’s probably a 99% chance you have canonicalisation issues. Usually a website with loads of products will have different routes to the same end product, i.e. one product will sit in several categories, thus creating duplicate URLs with the same content!
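The dilution is easy to see with a bit of arithmetic. The Python sketch below uses made-up link counts (they are not taken from any real site or from the illustration) and a hypothetical `consolidate` helper to compare what each variant holds on its own with what the preferred URL would hold if every variant passed its links on.

```python
from collections import Counter

# Hypothetical inbound-link counts per URL variant (illustrative numbers only).
links = {
    "http://www.samplesite.com/": 120,
    "http://samplesite.com/": 5,
    "http://www.samplesite.com/index.php": 40,
    "http://samplesite.com/index.php": 10,
}

# Assumption: every variant is treated as a duplicate of the www homepage.
canonical_of = {url: "http://www.samplesite.com/" for url in links}


def consolidate(link_counts, canonical_map):
    """Sum the links of every variant under its canonical URL."""
    totals = Counter()
    for url, count in link_counts.items():
        totals[canonical_map.get(url, url)] += count
    return dict(totals)


print(consolidate(links, canonical_of))
```

With these numbers, a spider that indexed the non-www homepage would credit it with 5 links, while the consolidated canonical URL would hold all 175.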

To fix it we need to make sure the search engines know which URL is the preferred URL and also take precautions to carry over the link juice from the “duplicate URLs”, so to speak, to the preferred/canonical URL.

[Illustration: redirecting duplicate URLs to the canonical URL]

You essentially have two options for carrying over the link juice: a ‘301 redirect’ or the (relatively new) ‘canonical link tag’. Or even both. In fact, for the above example of Verve Search I used a 301 redirect for the non-www version and a canonical tag on the /index version.

Using a 301 redirect
In my opinion using a 301 redirect is still the “safest” way to make sure the link equity gets carried over. In addition, 301 redirects can be used across domains, whilst the canonical link tag (at the time of writing) can only be used within a domain. I would say the only downside of 301 redirects is that they need to be set up at server level, so if you are not a programmer, or at least very familiar with how this is done, it can easily be done wrong. I would recommend leaving this to a programmer or technical SEO (I have seen instances where wrongly implemented 301 redirects have created infinite loops and all sorts!).
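One way to catch the infinite-loop mistake before it goes live is to walk through your planned redirects on paper, or in a few lines of code. The Python sketch below is only an illustration: the `resolve` function and both rule sets are hypothetical, and it checks a plain mapping of "from" URLs to "to" URLs rather than talking to a real server.

```python
def resolve(start, redirects):
    """Follow redirect rules from `start`; raise ValueError if they loop."""
    seen = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise ValueError("redirect loop: " + " -> ".join(seen + [current]))
        seen.append(current)
    return current


# Hypothetical rule set that funnels every variant to the www homepage.
rules = {
    "http://samplesite.com/": "http://www.samplesite.com/",
    "http://samplesite.com/index.php": "http://samplesite.com/",
}

# Hypothetical broken rule set: the two versions redirect to each other.
broken_rules = {
    "http://samplesite.com/": "http://www.samplesite.com/",
    "http://www.samplesite.com/": "http://samplesite.com/",
}

print(resolve("http://samplesite.com/index.php", rules))
try:
    resolve("http://samplesite.com/", broken_rules)
except ValueError as error:
    print(error)
```

The first rule set also shows a redirect chain (two hops to reach the final URL), which works but is usually worth flattening into a single hop.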

How you set up 301 redirects very much depends on your server. If your site is on an Apache server (typically running UNIX or Linux) it’s relatively simple.

Here’s how to set up a 301 redirect using the .htaccess file:
1. Locate the .htaccess file on your server. If you don’t have one, just create one in a plain text editor (Notepad) and name it .htaccess. If you already have a .htaccess file, scroll down and add the new redirect instructions after the existing ones.

2. Put in your redirect information, which would be something like this:
Redirect 301 /directory/file.html http://www.domainame.com/directory/file.html

NB! The first part, “/directory/file.html”, is the location of the file being moved (a path from the site root), and the second part, “http://www.domainame.com/directory/file.html”, is the full URL the file is being moved to.

3. Upload the file to your server.
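If you have more than a handful of pages to redirect, typing the lines out by hand invites mistakes. As a rough sketch, the Python helper below builds the same kind of line as in step 2 and refuses the two slips that are easiest to make: an old location that is not a path from the site root, and a target that is not a full URL. The function name `redirect_line` and the `moves` table are assumptions for this illustration.

```python
from urllib.parse import urlsplit


def redirect_line(old_path: str, new_url: str) -> str:
    """Build one 'Redirect 301' line for a .htaccess file."""
    if not old_path.startswith("/"):
        raise ValueError("old_path must be a path from the site root, e.g. /file.html")
    target = urlsplit(new_url)
    if target.scheme not in ("http", "https") or not target.netloc:
        raise ValueError("new_url must be a full URL including http:// or https://")
    return f"Redirect 301 {old_path} {new_url}"


# Illustrative table of moves; the first entry is the example from step 2.
moves = {
    "/directory/file.html": "http://www.domainame.com/directory/file.html",
    "/directory/other.html": "http://www.domainame.com/directory/other.html",
}

for old_path, new_url in moves.items():
    print(redirect_line(old_path, new_url))
```

The printed lines can be pasted at the end of the .htaccess file; it is still worth testing each old URL in a browser after uploading.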

On the other hand, if your site is on an IIS server (eeek), the procedure for implementing 301 redirects is totally different.

For 301 redirects on IIS do the following:

1. Click on the file or folder you want to redirect within the Internet Services Manager.
2. Select “a redirection to a URL”.
3. Enter the URL that you want to redirect to.
4. Make sure you check “The exact url entered above” and “A permanent redirection for this resource”.
5. Click Apply.

I would also like to point out this very useful article by Steven Hargrave, which covers the specifics of doing 301 redirects in pretty much any scripting language, including Perl, Ruby and Java, as well as 301 redirects with mod_rewrite.

Using a Canonical Tag
In theory the canonical link tag should be just as effective as a 301 redirect. But as it’s still relatively new I’m finding it hard to trust it yet (I would love to get your feedback on your experiences using this tag, though). Here’s, in essence, how the tag works: the “canonical tag” lets you specify in the HTML head that the URL in question should be treated as a “copy”, and names the canonical URL that all link authority and content metrics should flow back to.

The canonical link tag is very easy to implement, and you don’t need access to the server. Here’s how:

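Within the HTML head section (where the meta information lives) of the pages you want to “redirect” the link juice from, implement the following tag, with the href set to your preferred URL (the sample site’s www homepage is used here):

<link rel="canonical" href="http://www.samplesite.com/" />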

The canonical tag should be implemented on every URL you have that is loading the same page. For more information on the canonical link tag, check out this article on the subject by Matt Cutts.
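Since the tag has to appear on every duplicate URL, it helps to be able to check pages quickly. The Python sketch below uses only the standard library's `html.parser` to pull the canonical URL out of a page's HTML; the names `CanonicalFinder` and `find_canonical` and the sample page are assumptions for this illustration, and fetching the live pages is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source for the /index.php version of the sample homepage.
page = """<html><head>
<title>Sample Site</title>
<link rel="canonical" href="http://www.samplesite.com/" />
</head><body>Welcome</body></html>"""

print(find_canonical(page))
```

A page that returns None here has no canonical tag at all, and a page that returns anything other than your preferred URL is pointing its link juice at the wrong place.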

The canonical link tag is ideal for large product and ecommerce sites. I also usually use the canonical link tag on all /index versions of pages, as I’m a bit worried about 301 redirecting /index versions, since these are essentially the physical version of the actual page.

In a lot of cases sorting out your canonicalisation issues will have a BIG impact on your SEO efforts. In fact I would say that when starting out the SEO process on a website, fixing the canonicalisation issues should be the first port of call. Doing SEO on a site that is “bleeding” link authority due to canonicalisation is a bit like building a sandcastle using a sieve!!!