One of the hot topics at PubCon last week was duplicate content. You might ask, “What’s the big deal? I can get more of my pages indexed in Google.” Well, it is a big deal, and it could be affecting your rankings.
When Google visits your site, it does not necessarily index every single page during that particular visit (especially if you have 10,000+ pages). As the search engine bot (or spider) works through the site, it indexes pages one by one and places the links it finds on each page into a queue for later indexing, so the queue keeps growing as the crawl continues. If you throw a bunch of duplicate content pages into that queue, you are essentially limiting the spider's ability to grab all of the “unique” content on your site. And if your duplicate content does get indexed, you could be splitting PageRank among the duplicate pages, lessening your chance of ranking well in the search engines.
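To make the queue idea concrete, here is a rough Python sketch. It is a toy model, not how Google's spider actually works: the `crawl` function, the `budget` limit, and the example URLs are all assumptions made up for illustration.

```python
from collections import deque

def crawl(start, links, budget):
    """Toy breadth-first crawl that indexes at most `budget` pages per visit."""
    queue = deque([start])   # pages waiting to be indexed
    seen = {start}           # pages already queued, so nothing is queued twice
    indexed = []
    while queue and len(indexed) < budget:
        url = queue.popleft()
        indexed.append(url)
        # Every link found on this page goes to the back of the queue
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed

# Hypothetical site whose home page links to print versions of two articles
links_with_duplicates = {"/": ["/a", "/a?print=1", "/b", "/b?print=1", "/c"]}
# The same site with the print versions kept out of the crawl
links_clean = {"/": ["/a", "/b", "/c"]}

print(crawl("/", links_with_duplicates, budget=4))
print(crawl("/", links_clean, budget=4))
```

With the same budget of four pages, the first crawl spends one slot on a print version and never reaches `/c`, while the second one reaches every unique page.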
Some places to look for duplicate content:
- Printer-friendly pages. Add a noindex meta tag to these pages, or use robots.txt to exclude them from being spidered.
- Search results pages that sort by product attribute (color, size, model). Sometimes these attributes are included as a new URL parameter in the search string. Try storing these search parameters in a cookie instead, so the URL stays the same.
- There is a helpful tool called Xenu. It is primarily used to find broken links on a site, but you can also use it to look for duplicate content. Export the results to Excel (you will probably need Excel 2007, since it can handle more than 100,000 rows of data), then sort them and start cruising for duplicate URLs.
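If the exported list is too long to eyeball, the sort-and-scan step can be roughly automated. The Python sketch below groups URLs that differ only by presentation parameters. The parameter names in `IGNORED_PARAMS` and the example URLs are assumptions for illustration, so swap in whatever your own site uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change how a page is presented, not what it contains
IGNORED_PARAMS = {"sort", "print"}

def canonical(url):
    """Drop the ignored parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))

def duplicate_groups(urls):
    """Return only the canonical URLs that more than one crawled URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Example URLs standing in for a column of an exported crawl report
crawled = [
    "http://example.com/shoes?sort=color",
    "http://example.com/shoes?sort=size",
    "http://example.com/shoes",
    "http://example.com/hats",
]

for key, members in duplicate_groups(crawled).items():
    print(key, "<-", members)
```

This only catches duplicates that show up in the URL itself. Pages with identical content under unrelated URLs would still need a comparison of titles or page text.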
There was some talk about whether there is a penalty associated with duplicate content. The short answer is no.
Contact Sales & Marketing Technologies to learn how we can improve your web site's success.
Tags: duplicate content, xenu, pubcon