Identifying and Combating Duplicate Content Issues

A recent post by Paddy Moogan from Distilled about when to use a 301 redirect and when to use rel=canonical got me thinking about all the possible ways we can fight duplicate content issues.

First, for those who are new to search marketing: a duplicate content penalty is a consequence that the search engines impose when they find large amounts of text that have been copied from other sources on the Web. Some would argue that the search engines are simply filtering you out of the SERPs (search engine results pages) in an effort to deliver more relevant, fresh content. Any way you look at it, you won’t benefit from it, and therefore it’s a penalty in my eyes.

What can cause duplicate content problems?

Duplicate homepages can be seen as individual pages, possibly discounting the merit that your true homepage has earned. If your site homepage can be viewed like the examples below, you may want to continue reading to correct the error.

http://www.example.com or http://example.com are both good, but it needs to be one or the other.
http://www.example.com/index or /home or /homepage needs to be corrected.
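If you have a list of URLs from a crawl or your server logs, a short script can flag homepage variants like the ones above. This is only a minimal sketch: the set of path names in HOMEPAGE_PATHS and the function names are assumptions for illustration, so adjust them to whatever your own site actually serves as the homepage.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: these paths (after stripping a trailing slash) all serve the homepage.
HOMEPAGE_PATHS = {"", "/index", "/index.html", "/index.php", "/home", "/homepage"}


def homepage_key(url):
    """Return the host without 'www.' if the URL looks like a homepage variant, else None."""
    parts = urlsplit(url)
    host = parts.hostname  # already lowercased; None when the URL has no scheme/host
    if host is None:
        return None
    path = parts.path.rstrip("/").lower()
    if path not in HOMEPAGE_PATHS:
        return None
    return host[4:] if host.startswith("www.") else host


def find_homepage_duplicates(urls):
    """Group homepage variants by site and keep only sites with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        key = homepage_key(url)
        if key is not None:
            groups[key].append(url)
    return {site: variants for site, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com",
        "http://example.com/",
        "http://www.example.com/index",
        "http://www.example.com/about",
    ]
    for site, variants in find_homepage_duplicates(crawled).items():
        print(site, "has", len(variants), "homepage URLs:", variants)
```

Any site that shows up with more than one variant is a candidate for the redirect fixes covered later in this post.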

There is also the possibility that someone has outright stolen your content. If the content you created has already been crawled and has established itself in Google’s index, odds are that the thief isn’t going to benefit in the search engines. Ideally they’ll just get filtered out.

Creating dozens of versions of the same article to distribute to article sites/networks is a rather popular link building technique. While I won’t take a stance on its effectiveness, if you use an article that is already on your site and create numerous versions of it, it can come back to bite you because the search engines can still see the correlation between the original and the copies spread all over the Web. It’s quite possible the search engines will discount the links in those copies even further.

Some shopping cart content management systems offer different paths to the same product or category page. Why is this an issue? Well, if two different URLs lead to the same product, then it’s fair to say that those are duplicate pages.

However, if you have a blog and you’re worried that posting the same article in several categories creates duplicate content, don’t be: the search engines are wise to this and understand how blogs work. Also, the more posts you add to those categories, the more the content on each category page gets mixed up, which prevents any sort of duplicate content problem. The same goes for post snippets.

How can I find any duplicate content that could be hurting my site?

One way is to browse your site to see if you have any of the examples above. Another is to type your URL into Copyscape. Keep in mind that when you do this, it is only showing you the result for that exact page that you entered, not sitewide. Also, it will not return results of duplicate content that you have on the same URL that you submitted your query for.
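For two pages you already suspect, you can also compare their text yourself. The sketch below scores overlap using word "shingles" (runs of consecutive words) and the Jaccard ratio. It is not how Copyscape or any search engine works; the function names and the shingle size of 5 are assumptions chosen for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of consecutive word runs of length `size` found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the two shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    copy = original + " extra words here"
    print(round(similarity(original, copy), 3))
```

A score close to 1.0 suggests the two pages share most of their wording and deserve a closer look; where to draw the line is a judgment call for your own site.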

How do I fix my duplicate content issues?

First, you are unlikely to be hurt by other people stealing your content. Look up SEOmoz.com in copyscape.com and you’ll see that there are pages of results, but because they were the originators of the content, it’s not likely that they’ll be filtered out or receive any sort of penalty.

If you have content that other people have copied or stolen, you can try e-mailing the webmaster and kindly asking them to take it down. They are unlikely to respond, so the best thing you can do is probably just forget about it. People steal content left and right on the Internet, and dwelling on it is just wasting your time when you’re probably not getting penalized for it anyway.

How to fix duplicate homepage issues

Luckily, if you are getting penalized because you have duplicate pages, the problem is on your end and it’s relatively easy to fix. If you have duplicate homepage problems, locate your .htaccess file.

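Add the following code to redirect all your www URLs to the non-www URLs. The rules use mod_rewrite (assumed to be enabled on your server) so that only requests for the www host are redirected; a rule that matches every request regardless of host would redirect in a loop:

  • RewriteEngine On
  • RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
  • RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]

  • You’ll need to replace “domain.com” with your own domain. If you would rather send everything to www, change the condition to ^domain\.com$ and the target to http://www.domain.com/$1.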

    If you need to get rid of your /index or /homepage page problems you’ll need to implement a simple 301 redirect. This will also need to be specified in the .htaccess file using the code below:

  • Redirect 301 /badurl.htm http://www.example.com/

  • Change the example URLs to fit your particular situation. For a duplicate /index page, the rule would be:

  • Redirect 301 /index http://www.example.com

  • To clarify, this tells the server to permanently redirect /index to http://www.example.com, leaving you with a clean URL structure. Now all your duplicate homepages should go to either http://example.com or http://www.example.com, whichever you prefer.
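Once the rules are in place, it is worth confirming that each old URL really answers with a 301 and the right Location header. The following is a rough sketch using only Python’s standard library; fetch_redirect and is_permanent_redirect_to are hypothetical helper names, and the URLs are the placeholder ones from this post, so swap in your own.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url, timeout=10):
    """Send a HEAD request without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, expected):
    """True when the response is a 301 pointing at the expected URL (trailing slash ignored)."""
    if status != 301 or location is None:
        return False
    return location.rstrip("/") == expected.rstrip("/")


if __name__ == "__main__":
    # Placeholder URLs: replace with the duplicate and the preferred homepage of your site.
    status, location = fetch_redirect("http://www.example.com/index")
    print(status, location)
    print(is_permanent_redirect_to(status, location, "http://www.example.com"))
```

A 302 or a missing Location header here usually means the rule was not picked up or a different rule matched first.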

    Fixing Other Duplicate Pages Using the Rel=Canonical Tag

    If you have a product site with more than one way of getting to a product, those duplicate URLs could be hurting each other. For example:

    http://www.site.com/ipods/skins/blue-ipod-covers vs. http://www.site.com/skins/ipods/blue-ipod-covers

    Same page, different URLs. In this instance, using a rel=canonical tag is in your best interest. It tells the major search engines that the duplicate page and your preferred page should be treated as one and the same. For example:

    If http://www.site.com/ipods/skins/blue-ipod-covers isn’t the correct page, and you would rather have http://www.site.com/skins/ipods/blue-ipod-covers be the main page, you’d want to put a rel=canonical tag on http://www.site.com/ipods/skins/blue-ipod-covers that points to http://www.site.com/skins/ipods/blue-ipod-covers. This way the search engines understand that it’s a duplicate page and that you want all the links and other metrics to be directed towards the right page. No longer will the search engines be confused about which page to display or give credit to.
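To make this concrete, here is a small sketch that holds the head of the duplicate page as a string, including the link element with rel="canonical", and then reads the canonical URL back out with Python’s built-in HTML parser. The class and function names are made up for illustration, and the sample markup is a guess at what such a page might contain, not a copy of a real one.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    # Hypothetical markup for http://www.site.com/ipods/skins/blue-ipod-covers
    duplicate_page = (
        "<html><head><title>Blue iPod Covers</title>"
        '<link rel="canonical" href="http://www.site.com/skins/ipods/blue-ipod-covers" />'
        "</head><body></body></html>"
    )
    print(find_canonical(duplicate_page))
```

Running a check like this over both URLs is a quick way to confirm the duplicate points at the preferred page and not the other way around.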

    Using the rel=canonical tag is an alternative to programming a 301 redirect. A 301 redirect is still the preferred way to guarantee the search engines understand your intent to move content from one URL to another.

    In addition to fixing potential duplicate content issues, treating the two separate pages as one can help reduce any keyword cannibalization that could be going on.


    5 Comments

    1. Gurgaon says

      I agree with Ray, this was very easy to understand and pretty helpful.

      I recently changed how my WordPress permalinks are handled, and it has me worried about duplicate content issues now that some of my posts are technically found under two different permalinks.

      Any more duplicate content posts in the near future?

    2. AJ Clarke says

      Duplicate content is one of the things that really screws with people’s rankings when using WordPress… In my opinion the best things you can do are to optimize your robots.txt and to prevent indexing of tag pages.

    3. says

      Very informative. Every question in my mind after reading each paragraph was answered by the next paragraph. That’s why this blog post leaves me with no more questions. All I need to do is follow the tips and find any duplicate content on our website.

    4. Melissa says

      Content stealing is the major concern in internet world today. Thanks for such an eye opener article with all the problems and solutions at one single place.
