Earlier this week I wrote about why duplicate content sucks in SEO. I'm going to start mixing in tutorials/explanations of common ways folks end up duplicating content on their sites, too.
Today's topic: Pagination. It's oh-so-easy to generate duplicates with those little 1 2 3 4 >> at the bottom of the page.
Say you've got a site called www.blah.com. You've written an article that's 12 pages long, and added pagination at the bottom, like this: 1 2 3 4 >>
It's purty, and it works. When Google or Bing land on the page www.blah.com/articleaboutx/, they see the pagination and the page URL, and they get it. This page is page 1 of your article.
Now, Googlebot crawls to page 2 of the article. That page is located at www.blah.com/articleaboutx/p2. Also no problem.
But when it attempts to crawl the '1' link, it sees a new URL: www.blah.com/articleaboutx/p1
That page has the same content as the first article page we saw at www.blah.com/articleaboutx/, because it is the first article page. But it's got a different URL.
Two URLs, same page? Uh-oh. That's a duplication problem of the canonicalization variety.
If you have a large publication with, oh, 2000 articles, and all of those articles are paginated the way I described above, you've created 2000 duplicate pages on your site. And they happen to be the first page of every article - the most important page you've got.
Bloggers will link to the '/' or the '/p1' version more or less at random, depending on which URL they're viewing when they copy and paste. That splits your inbound links between two URLs for the same page.
Your caching software will have to cache both URLs.
And search engine crawlers will waste their time crawling all of those duplicates.
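If you want to see how bad it is on your own site, here's a rough sketch in Python. It assumes you already have a list of crawled URLs from somewhere, and that your pagination uses the same /p1 pattern as my example. The function name find_p1_duplicates is made up for illustration; adjust the pattern to match your own URLs.

```python
def find_p1_duplicates(urls):
    """Return (canonical, duplicate) pairs where both '/' and '/p1' were crawled.

    Assumes the hypothetical URL scheme from this post: page 1 lives at
    '.../articleaboutx/' and the duplicate lives at '.../articleaboutx/p1'.
    """
    seen = set(urls)
    pairs = []
    for url in urls:
        if url.endswith("/p1"):
            # Drop the trailing "p1" but keep the slash to get the original URL.
            canonical = url[:-2]
            if canonical in seen:
                pairs.append((canonical, url))
    return pairs


crawled = [
    "www.blah.com/articleaboutx/",
    "www.blah.com/articleaboutx/p1",
    "www.blah.com/articleaboutx/p2",
]
duplicates = find_p1_duplicates(crawled)
```

Every pair it returns is one article whose first page lives at two URLs.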
Blech. Luckily, this is an easy one to avoid.
This one's magical... it's tricky... wait for it...
Link the '1' in your pagination to the original URL for the first page of your article.
So, if your article's first page was at www.blah.com/articleaboutx/, make the '1' link point there, too. Don't point it at /p1.
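Here's roughly what that looks like in code. This is a minimal sketch in Python, not anyone's actual CMS: the function names page_url and pagination_links are mine, and I'm assuming the /p2, /p3 URL pattern from the example above. The only interesting part is the special case for page 1.

```python
def page_url(base_url, page):
    """Build the URL for one page of a paginated article.

    Page 1 always gets the article's original URL, never '/p1'.
    Assumes later pages live at base_url + 'p<N>', as in this post's example.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    base = base_url if base_url.endswith("/") else base_url + "/"
    if page == 1:
        return base  # the '1' link points at the original URL
    return f"{base}p{page}"


def pagination_links(base_url, total_pages):
    """Return (label, url) pairs for the 1 2 3 4 links at the bottom of the page."""
    return [(str(n), page_url(base_url, n)) for n in range(1, total_pages + 1)]


links = pagination_links("www.blah.com/articleaboutx/", 3)
```

With that in place, the '1' link and the article's original URL are always the same string, so there's no second URL for anyone to crawl or link to.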
This sounds silly, I bet, but I have yet to see a publisher site, a designer blog, or any other site that paginates get it right the first time. If it's right, it's because a cranky SEO whined about it.
If you don't like the sound of me whining, go ahead and fix it now.
There you have it: One duplicate content problem fixed.