Duplicate Content, WordPress, Google and Robots

Duplicate content is a bit of an issue for users of WordPress blogs, and it isn’t particularly clear what blog owners should do to avoid it. For example, the same content can appear on the home page, on the actual post page, and in the categories and archives. There are also problems with RSS feeds being crawled and indexed by Google, and with non-canonical URLs.

There is a lot of discussion about what to do about this and how much it affects a page’s ability to rank, so I thought I’d sort through some of the opinions and let you know what I do.

Google says: 

“During our crawling and when serving search results, we try hard to index and show pages with distinct information.  This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list.”

They go on to say that they will adjust a site’s rankings only if they deem that duplicate content has been used to manipulate them.

Google do seem to try to offer an answer, but my experience and that of others suggests that the desired page is often not the one being indexed, and pages are still ending up in the “supplementals”.

What is the answer to this duplicate content issue? 

Different people handle it in different ways and have different views on what you should stop the search engines from indexing. At SEOmoz, Rand Fishkin advises using a Meta Robots “NoIndex” tag on the pages you don’t want indexed. Good advice as always.
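As a rough illustration of the NoIndex idea, here is a minimal Python sketch that picks a meta robots tag by page type. The page type names and the choice of which ones to keep out of the index are my own assumptions for the example, not a recommendation from SEOmoz, and a real WordPress theme would do this in PHP.

```python
# Hypothetical page types treated here as duplicates of the post page.
NOINDEX_PAGE_TYPES = {"date_archive", "author_archive", "search"}


def robots_meta(page_type):
    """Return the meta robots tag to print in the page's <head>."""
    if page_type in NOINDEX_PAGE_TYPES:
        # Keep the page out of the index but let crawlers follow its links.
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'


for page_type in ("post", "date_archive"):
    print(page_type, "->", robots_meta(page_type))
```

Using “noindex,follow” rather than plain “noindex” is a design choice in this sketch: the page stays out of the results while its links to the individual posts can still be followed.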

There’s also some excellent advice at Graywolf’s SEO Blog on putting each blog post into only one category. Putting a post into multiple categories creates duplicate content, because the same post then appears in different parts of your site. This is a simple but effective way of reducing clutter on your site as well as helping with duplicate content problems.

As far as non-canonical URLs are concerned, the problems seem to have been resolved in WordPress 2.3. Instead of allowing multiple URLs to serve the same content, WordPress now makes sure there is only one URL for each piece of content. This is extremely important and a great move by WordPress to address duplicate content issues.
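To show what “one URL for each piece of content” means in practice, here is a minimal Python sketch of URL canonicalisation. It is not how WordPress 2.3 does it internally; the preferred host www.example.com and the rule of always ending paths with a slash are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # hypothetical preferred host


def canonical_url(url):
    """Map variants of a URL onto a single preferred form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"  # assumed rule: permalinks always end with a slash
    # Force the preferred host and drop any #fragment.
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))


def needs_redirect(url):
    """True when the requested URL is not already the canonical one."""
    return canonical_url(url) != url
```

A site following this approach would redirect any request where needs_redirect returns True to the canonical form, so that search engines only ever see one address per post.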

I use a robots.txt file to stop Google displaying my RSS URL in the search results instead of the URL of the page I want to rank. I also use robots.txt to block trackback URLs and the duplicate content issue that comes with them. For example, a URL ending in /trackback/ will usually have the same content as the post URL without it, so the same content ends up on different URLs.

I also use a robots.txt file to stop search engines crawling my archives. There is no real reason for them to go there. I do let them access my category pages, which I know some people don’t allow, but personally I don’t think this is a problem.
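The robots.txt rules described above could look something like the following. This is only a sketch: the exact paths are assumptions that depend on your permalink structure (I assume post URLs like /my-post/ and date archives under /2007/), and the “*” wildcard is an extension that Google understands rather than part of the original robots.txt convention. The small Python matcher is a simplified stand-in for how a crawler reads the rules, just to check which URLs they would block.

```python
import re

# Hypothetical rules: block feeds, trackbacks and date archives,
# but leave posts and category pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /2007/
"""


def disallow_patterns(robots_txt):
    """Collect the Disallow paths from a single-group robots.txt."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def is_blocked(path, patterns):
    """True if the path starts with any pattern; '*' matches any characters."""
    for pattern in patterns:
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.match(regex, path):
            return True
    return False


PATTERNS = disallow_patterns(ROBOTS_TXT)
for path in ("/my-post/", "/my-post/trackback/", "/feed/", "/2007/10/", "/category/seo/"):
    print(path, "blocked" if is_blocked(path, PATTERNS) else "allowed")
```

Note that if your permalinks include the date (for example /2007/10/17/my-post/), a rule like Disallow: /2007/ would block the posts themselves, so check your own URL structure before copying anything like this.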

As you can see from my homepage, I use the “more” feature when writing a post. This displays only an excerpt of my post on the homepage and reduces the duplicate content problem of having exactly the same content on the homepage and the post page.
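For anyone curious how the “more” feature cuts a post down, here is a minimal Python sketch of the idea. WordPress marks the cut-off point in the post with a `<!--more-->` comment; the function name and the “Read more” link text below are my own inventions for the example, and the real logic lives in WordPress’s PHP code.

```python
MORE_TAG = "<!--more-->"  # marker WordPress places at the cut-off point


def home_page_excerpt(post_html, permalink):
    """Return the text before the more tag plus a link to the full post."""
    teaser, found, _rest = post_html.partition(MORE_TAG)
    if not found:
        # No marker in the post: the whole post would appear on the homepage.
        return post_html
    return teaser.rstrip() + ' <a href="' + permalink + '">Read more</a>'


print(home_page_excerpt("Intro paragraph.<!--more-->The rest of the post.", "/my-post/"))
```

Only the teaser ends up on the homepage, so the full text lives at a single URL, the post page.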

With so many blogs out there, Google and the other search engines are most certainly aware of how these blogs throw up duplicate content. They can deal with the issue, and many blogs with no robots.txt file of any kind rank very well.

If many of your site’s pages are experiencing duplicate content issues, then try some of the methods above; you may see an improvement in traffic, as some people have reported. If you are ranking well and not experiencing any problems, then you shouldn’t need to worry: if it ain’t broke, don’t fix it.

This entry was posted on Wednesday, October 17th, 2007 at 4:14 am and is filed under On Site SEO.

4 Responses to “Duplicate Content, WordPress, Google and Robots”

  1. Search Engine Marketing » Duplicate Content, Wordpress, Google and Robots Says:

    […] SEO UK - SEO Expert in Internet Marketing - SEO Ambassador wrote an interesting post today on Duplicate Content, Wordpress, Google and Robots […]

  2. Eric Says:

    This is exactly what I expected to find out after reading the title Duplicate Content, Wordpress, Google and Robots - SEO Ambassador. Thanks for the informative article.

  3. Daniel Says:

    I couldn’t understand some parts of this article, Duplicate Content, Wordpress, Google and Robots - SEO Ambassador, but I guess I just need to check some more resources regarding this, because it sounds interesting.

  4. Internet Adviser delivers reliable marketing results. Says:

    Thanks for the very impressive post. I’ve bookmarked your site.
