Minasan, Watashiwa Wawan Desu...

Monday, February 14, 2011

Duplicate content sin #2: Default page linking


Last week I wrote about duplicate content sin #1 - screwy pagination. Today I'm going to explain a much simpler, but bigger problem: The inconsistent default page link.


When I say 'default page', I mean whatever page you'd first see if you navigated to a folder on a web site.


So the default page for Conversation Marketing (the whole site) can be found at www.conversationmarketing.com/. That's the root folder - the main folder housing my whole site.


The default page for all of this month's posts can be found at http://www.conversationmarketing.com/2010/10/. That's the sub-sub-folder /10/, in the sub-folder /2010/, in the root folder for www.conversationmarketing.com:


[Image: cm-folder-structure.gif]


You can also find the default page for Conversation Marketing at http://www.conversationmarketing.com/index.htm. And you can find the default page for this month's posts at http://www.conversationmarketing.com/2010/10/index.htm.


Web servers automatically deliver these default pages when a visitor requests the folder - that's why you don't have to add 'index.htm' to these addresses.
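
To make that concrete, here's a rough sketch of the lookup in Python. It is not how any particular web server is implemented. The function name resolve_request and the single default document name 'index.htm' are assumptions for illustration; real servers let you configure a list of default documents.

```python
def resolve_request(url_path, default_document="index.htm"):
    """Return the file a server would deliver for url_path (illustrative only)."""
    # A request that ends in "/" names a folder, so the server
    # quietly appends the default document to it.
    if url_path.endswith("/"):
        return url_path + default_document
    # Anything else is treated as a request for a specific file.
    return url_path


print(resolve_request("/"))                   # /index.htm
print(resolve_request("/2010/10/"))           # /2010/10/index.htm
print(resolve_request("/2010/10/index.htm"))  # /2010/10/index.htm
```

Two different addresses, one file. Keep that in mind for what comes next.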


The problems arise when a developer or designer links to default pages using different link styles at different times. For example, if your site has a 'home' link that points at '/index.htm' or 'default.aspx' or whatever your default page is, you've created duplication:

Search engines and most people see your home page as www.yoursite.com. Most other sites link to you there, too.

But search engines crawling your site also see the link to www.yoursite.com/index.htm, and follow that link.

To a search engine, the '/index.htm' page and the www.yoursite.com page are two unique pages with the exact same content.

Voila. Duplication.
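
Here's a toy version of the home-page example above, seen from a crawler's side. Treat it as a sketch only: the crawled_pages data is made up, find_duplicates is a hypothetical helper, and comparing exact content hashes is a stand-in, not a claim about how any real search engine detects duplicates.

```python
import hashlib
from collections import defaultdict

# Hypothetical crawl results: URL -> page body. The bodies are invented.
crawled_pages = {
    "http://www.yoursite.com/": "<h1>Welcome</h1>",
    "http://www.yoursite.com/index.htm": "<h1>Welcome</h1>",
    "http://www.yoursite.com/about/": "<h1>About</h1>",
}


def find_duplicates(pages):
    """Group URLs whose page bodies are exactly identical."""
    by_fingerprint = defaultdict(list)
    for url, body in pages.items():
        fingerprint = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_fingerprint[fingerprint].append(url)
    # Only groups with more than one URL count as duplication.
    return [sorted(urls) for urls in by_fingerprint.values() if len(urls) > 1]


print(find_duplicates(crawled_pages))
# [['http://www.yoursite.com/', 'http://www.yoursite.com/index.htm']]
```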


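The same thing happens if you inconsistently link to subfolders in your site: link to /2010/10/ in one place and /2010/10/index.htm in another, and you've duplicated that page, too.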


I won't waste much time explaining what this does to your link profile: links that should all count toward one page get split between two addresses. It's bad.


The problem here is duplication. And, as we know, duplicate content sucks.


If you want to avoid this kind of problem, apply Ian's Rule of Simplicity: Always use the shortest version of any default page's address. That version should typically be:


www.yourdomain.com + folders


No filenames.
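
If you'd rather enforce the rule with code than with willpower, something like this Python sketch would do it. The function name shortest_default_url and the DEFAULT_FILENAMES set are assumptions for illustration; swap in whatever default documents your own server uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to match your server.
DEFAULT_FILENAMES = {"index.htm", "index.html", "default.aspx"}


def shortest_default_url(url):
    """Strip a trailing default filename so the address ends at the folder."""
    parts = urlsplit(url)
    folder, _, filename = parts.path.rpartition("/")
    if filename in DEFAULT_FILENAMES:
        path = folder + "/"
    else:
        # Leave real pages alone; an empty path means the root folder.
        path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(shortest_default_url("http://www.conversationmarketing.com/index.htm"))
# http://www.conversationmarketing.com/
print(shortest_default_url("http://www.conversationmarketing.com/2010/10/index.htm"))
# http://www.conversationmarketing.com/2010/10/
```

Note that the match is exact and case-sensitive. Whether 'Index.htm' also counts as a default page depends on your server, so I've left that out.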


Do that, and you'll eliminate one huge duplication problem. Best part is, most of your default page links will be in your navigation. If your site was built by a relatively sane person, you can make one change to your site template and fix a site-wide duplication issue. Woo hoo!
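
To find the offending links in a template, a small script can do the looking for you. This is a minimal sketch using Python's built-in html.parser. The template markup, the DefaultLinkFinder class, and the list of default filenames are all invented for illustration.

```python
from html.parser import HTMLParser

# Assumed default document names; adjust to match your server.
DEFAULT_FILENAMES = ("index.htm", "index.html", "default.aspx")


class DefaultLinkFinder(HTMLParser):
    """Collect every <a href> that points at a default page by filename."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any fragment or query string before checking the filename.
        path = href.split("#", 1)[0].split("?", 1)[0]
        if path.rsplit("/", 1)[-1] in DEFAULT_FILENAMES:
            self.flagged.append(href)


def find_default_page_links(html):
    finder = DefaultLinkFinder()
    finder.feed(html)
    return finder.flagged


# A made-up navigation template with two offending links.
template = """
<ul id="nav">
  <li><a href="/index.htm">Home</a></li>
  <li><a href="/2010/10/">This month</a></li>
  <li><a href="/about/default.aspx">About</a></li>
</ul>
"""

print(find_default_page_links(template))
# ['/index.htm', '/about/default.aspx']
```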


By the way, this is also considered a canonicalization problem. I'll never stop ranting about canonicalization - you know that, right?

I've been writing up a storm this week, so no fancy conclusions or funny animal pictures. Bye.



View the original article here
