Lee Dodd

     
 
Google Officially Addresses Duplicate Content for Forums

I am not sure about you, but Google sends me the vast majority of my search engine traffic. I spend my SEO time making sure that Google loves me, as I bet many of you do as well. For some time there has been chatter about how Google sees the different versions of the same content that forum software like vBulletin creates. These different versions are a value-added feature for vB users: they allow for printer-friendly versions, threaded view versions, text-based archive versions, and more. But are they affecting your rankings in Google?

The real question has been: Does Google see these different versions as duplicate content? And if they do, are they penalizing my rankings because of it? And if they don’t, wouldn’t I be crazy to tell Google not to index these pages, leaving me with fewer indexed pages (and thus fewer opportunities to be found in search results)?

Rest easy, Google to the rescue: (emphasis added)

What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.

Two points for Google here: First, they at least state that much of the duplicate content issue isn’t intended. And second, they address forums specifically in their definition.

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index.

Interesting. So what Google is saying is that the worst thing that will happen if you leave all your page versions open to indexing is that Google may choose the wrong version to show visitors who arrive through a search. This means that they won’t dock your rankings because of duplicate content, but they will pick whichever version they consider appropriate.

Hmmm… so what does Google suggest that you do? Should you leave all your pages to be indexed, and therefore have more opportunities to be found in a Google search?

How can Webmasters proactively address duplicate content issues?
* Block appropriately: Rather than letting our algorithms determine the “best” version of a document, you may wish to help guide us to your preferred version. For instance, if you don’t want us to index the printer versions of your site’s articles, disallow those directories or make use of regular expressions in your robots.txt file.

Okay, so even though they say they won’t penalize you, they suggest that you choose ONE version to index and block the others. I don’t know about you, but I tend to follow what Google advises me to do. This actually came up at the Elite Retreat on Tuesday. I asked the question of Aaron Wall because I was curious what the author of the best SEO book on earth would say. He advised me to choose one version and block the rest. Since I can’t recall exact wording, I won’t misquote him here, but when Aaron speaks, I listen. :)
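As a rough illustration of "choose one version and block the rest", here is a minimal sketch that checks a candidate robots.txt before you upload it. The rules and URLs are assumptions, not something from Google's post: they presume a vBulletin-style forum installed at the site root, with printer-friendly pages under printthread.php, the text archive under /archive/, and showthread.php as the version you want indexed. Adjust the paths to match your own install. The helper name is_crawlable is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a forum installed at the site root.
# These paths are assumptions; change them to match your own forum.
ROBOTS_TXT = """\
User-agent: *
Disallow: /printthread.php
Disallow: /archive/
"""


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules in ROBOTS_TXT allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Example URLs on a placeholder domain.
    for url in (
        "http://example.com/showthread.php?t=123",
        "http://example.com/printthread.php?t=123",
        "http://example.com/archive/index.php/t-123.html",
    ):
        status = "crawlable" if is_crawlable(url) else "blocked"
        print(f"{status}: {url}")
```

Python's built-in parser only does plain prefix matching, so the sketch sticks to simple path prefixes. A version that differs only by a query parameter would need a different approach, such as a noindex meta tag on that view.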

If you are interested in the full blog post by Google, you can read it here. There are a bunch of great tips on minimizing duplicate content issues that should be read and applied.

For me, I think that whenever possible, we should keep things simple. If an imperfect algorithm will be determining my rankings, it makes sense to be totally clear about what I want indexed. I would rather not leave any confusion about possible duplicate content. Beyond that, perhaps it’s not a good idea to bring in visitors through the archives when the view they get doesn’t match the actual site. (I have to say that the archives do make money, but I would rather have the long-term visitor, even if I am not converting them right away.)

What are your thoughts on this issue? What are you doing for your forum(s)? Have you made changes and gotten results from them? I would love to hear the thoughts of other admins on this issue, as I think it’s an important one.

by Chris Kenworthy @ Ackfoo.com



This entry was posted on Thursday, December 21st, 2006 at 1:09 pm and is filed under Forum SEO, Guest Contributors.


2 Comments »

Comment by Lee
2006-12-21 13:33:27

Awesome post and reassuring news for forum owners!

 
Comment by Brent Wilson
2006-12-22 00:41:27

The problem with vBulletin is more than just duplicate content. It also has duplicate URLs that lead to the same thread. Even on your forum, Laura, with rewritten URLs, you still have duplicate URLs.

Laura, I noticed on your homepage you have showthread.php?t=xxxxx links still. This will not help you with Search Engines as Google does not transfer Pagerank to redirect urls. I would see about getting those showthread.php urls rewritten on the mainpage. For more info on the 301 redirect with Google see here: http://www.vbulletinzone.com/t104/

Thanks for posting the article though.

 