Optimising a Drupal site for SEO - part 1

* Originally posted on crawlscore.com, but moved here because the crawlscore.com blog no longer exists. *

Because crawlscore.com has been through a couple of redesigns, and because I made some mistakes when putting the site together, it needed some maintenance to get the most out of each crawler's visit.

This is part 1 of the project, where we'll focus on what the search engines see and what Crawl Score tells us about the site structure. Part 2 will cover the bones of the site: making sure it loads quickly and uses caching correctly, plus a few other tips.

Google Sitelinks and website structure

crawlscore.com uses Drupal with search-engine-friendly URLs enabled, which means you can specify whatever URL you like for each page.

Google decides for itself whether to show Sitelinks, but to give it the best chance we want a very simple site structure: all blog entries under www.crawlscore.com/blog, all FAQs under www.crawlscore.com/faq, and so on.

Redirects in Drupal

This meant renaming some pages. That is very simple in Drupal, but I also used the Drupal URL redirect module to send a 301 (Moved Permanently) redirect from each old URL to its new one. For example:

www.crawlscore.com/are_sitemaps_worth_it now redirects to
www.crawlscore.com/blog/are_sitemaps_worth_it

Search engines will see that the old page has moved permanently and update their indexes accordingly.
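Conceptually, the redirect just maps each old path to its new home and answers requests for the old path with a 301 status and a Location header. The Python sketch below illustrates that behaviour; it is not Drupal or redirect-module code, and the `REDIRECTS` table, `LIVE_PAGES` set and `respond` function are hypothetical, with only the sitemaps entry taken from this post.

```python
from http import HTTPStatus

# Hypothetical redirect table (old path -> new path). Only this entry comes
# from the post; a real site would have one entry per renamed page.
REDIRECTS: dict[str, str] = {
    "/are_sitemaps_worth_it": "/blog/are_sitemaps_worth_it",
}

# Assumption for the sketch: every redirect target is a live page and
# anything not listed anywhere is a 404.
LIVE_PAGES: set[str] = set(REDIRECTS.values())


def respond(path: str, host: str = "www.crawlscore.com") -> tuple[int, dict[str, str]]:
    """Return the status code and headers a server would send for `path`."""
    if path in REDIRECTS:
        # 301 (Moved Permanently) tells crawlers to replace the old URL
        # with the new one rather than keep requesting the old one.
        location = f"http://{host}{REDIRECTS[path]}"
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": location}
    if path in LIVE_PAGES:
        return HTTPStatus.OK, {}
    return HTTPStatus.NOT_FOUND, {}


if __name__ == "__main__":
    for p in ["/are_sitemaps_worth_it", "/blog/are_sitemaps_worth_it", "/missing"]:
        status, headers = respond(p)
        print(p, int(status), headers)
```

A 302 (Found) would signal a temporary move instead, which is why the permanent 301 suits renamed pages.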

Drupal blog paths

Because Drupal allows multiple blogs per site, it generates URLs such as:

http://www.crawlscore.com/blog/1?page=1
http://www.crawlscore.com/blog?page=1
http://www.crawlscore.com/blog/1

We installed the Pathauto module and changed the URLs to:

http://www.crawlscore.com/blogs/dan-frost?page=1
http://www.crawlscore.com/blog?page=1
http://www.crawlscore.com/blogs/dan-frost

We still get duplicate page titles, because on a single-blog site these URLs show the same page, and we can't change that without modifying the Drupal code. This site has only one blog, so it's no big deal and we left it in place.
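A crawler report flags this kind of problem by grouping crawled URLs that share the same title. Here is a minimal Python sketch of that check, assuming the crawl results are already available as a URL-to-title mapping; the `duplicate_titles` function and the sample titles are hypothetical and are not how Crawl Score itself works.

```python
from collections import defaultdict


def duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Return each title used by more than one URL, with those URLs sorted."""
    by_title: defaultdict[str, list[str]] = defaultdict(list)
    for url, title in pages.items():
        # Ignore surrounding whitespace so "Blog " and "Blog" count as equal.
        by_title[title.strip()].append(url)
    return {title: sorted(urls) for title, urls in by_title.items() if len(urls) > 1}


if __name__ == "__main__":
    # Hypothetical crawl results: URL -> page title.
    crawl = {
        "http://www.crawlscore.com/blog": "Blog | crawlscore.com",
        "http://www.crawlscore.com/blogs/dan-frost": "Blog | crawlscore.com",
        "http://www.crawlscore.com/faq": "FAQ | crawlscore.com",
    }
    for title, urls in duplicate_titles(crawl).items():
        print(f"{title!r} is shared by: {', '.join(urls)}")
```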

We had two 404 (page not found) errors. Because external sites link to those pages, we decided to redirect them with a 301. Some of the SEO value of those links may not survive the redirect, but it's clearly better for users who follow them. If the broken links were on your own site, you would normally just fix or remove them instead.
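The reasoning above boils down to a simple rule: links on your own pages get fixed at the source, while links from other sites, which you can't edit, get a 301. A Python sketch of that rule follows; the `triage_404s` function, its input format and the example paths are all hypothetical.

```python
from urllib.parse import urlsplit


def triage_404s(inbound: dict[str, list[str]], own_host: str) -> dict[str, list[str]]:
    """Suggest actions for each broken path.

    `inbound` maps a path that returns 404 to the URLs of pages linking to it.
    `own_host` should be lowercase, because urlsplit().hostname is lowercased.
    """
    plan: dict[str, list[str]] = {}
    for path, referrers in inbound.items():
        internal = sorted(r for r in referrers if urlsplit(r).hostname == own_host)
        external = [r for r in referrers if urlsplit(r).hostname != own_host]
        # We control our own pages, so correct those links directly.
        actions = [f"fix link on {r}" for r in internal]
        if external:
            # Other sites' links can't be edited, so catch them with a 301.
            actions.append("add 301 redirect")
        plan[path] = actions
    return plan


if __name__ == "__main__":
    # Hypothetical 404s and the pages that link to them.
    broken = {
        "/old-page": ["http://example.org/resources"],
        "/tyop": ["http://www.crawlscore.com/faq"],
    }
    for path, actions in triage_404s(broken, "www.crawlscore.com").items():
        print(path, actions)
```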

robots.txt

We also found a few pages being crawled that we didn't want crawled, so we edited robots.txt accordingly. In particular, Drupal generates a separate comment-reply URL for each post. The comments already appear on the post's own URL, so there's no need for crawlers to visit an extra page that just repeats them.

An example URL is:

www.crawlscore.com/comment/reply/3

Here's an extract from our modified robots.txt:

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /files/
# Paths (clean URLs)
Disallow: /comment/reply/
Disallow: /user/

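This tells well-behaved crawlers not to fetch URLs under those paths. Strictly speaking, Disallow blocks crawling rather than indexing: a blocked URL can still appear in search results if other pages link to it.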
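Before deploying robots.txt changes it's worth checking which URLs the rules actually block. Python's standard-library `urllib.robotparser` can do this offline. The sketch below compares the extract with a `/comments/` rule against a version using `/comment/reply/`; the helper names `make_parser` and `check` are hypothetical, and this parser does simple prefix matching, so it may not match every search engine's interpretation exactly.

```python
from urllib.robotparser import RobotFileParser

# Extract with a "/comments/" rule: not a prefix of /comment/reply/3.
ORIGINAL_RULES = """\
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /comments/
Disallow: /user/
Disallow: /files/
"""

# Version whose rule matches Drupal's comment-reply URLs.
CORRECTED_RULES = """\
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /files/
# Paths (clean URLs)
Disallow: /comment/reply/
Disallow: /user/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check(robots_txt: str, urls: list[str]) -> dict[str, bool]:
    """Map each URL to whether a generic crawler ("*") may fetch it."""
    parser = make_parser(robots_txt)
    return {url: parser.can_fetch("*", url) for url in urls}


if __name__ == "__main__":
    urls = [
        "http://www.crawlscore.com/comment/reply/3",
        "http://www.crawlscore.com/blog/are_sitemaps_worth_it",
        "http://www.crawlscore.com/user/1",  # illustrative URL only
    ]
    for label, rules in [("original", ORIGINAL_RULES), ("corrected", CORRECTED_RULES)]:
        print(label, check(rules, urls))
    print("crawl delay:", make_parser(CORRECTED_RULES).crawl_delay("*"))
```

Running it shows that a `/comments/` rule leaves /comment/reply/3 crawlable, because the rule is matched as a plain path prefix.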

The main reason for all of this is to make the most of whatever crawl resources the search engines allocate to the site.

Seeing the site through Crawl Score reports made all of these issues much easier to spot. I corrected them, re-ran a crawl immediately, fixed a few typos I'd introduced, and was satisfied that the site was fine.

Otherwise I could have waited days or weeks for Google Webmaster Tools to update, turning this 30-minute job into a much more disjointed task. Because I could run a crawl immediately, I knew I had made the changes correctly.

I'm reasonably competent (ahem), yet I made plenty of mistakes on this simple 50-page website, which shows how easily these issues creep into even very small sites.

Part 2 of this article will be about making crawlscore.com as crawler-friendly as possible with regard to caching, fast load times and related areas.

