Hi everyone, I'm Wawan...

Friday, February 18, 2011

How to Make a WordPress Blog Duplicate Content Safe

In one of my recent posts I wrote about the duplicate content issue. This topic is especially important to me because my blog runs on the WordPress content management system, which, in its default configuration, is not duplicate-content-proof. In fact, this CMS can make almost 100% of your content duplicate. As usual, the system's weakness is rooted in its strengths. WordPress has many features that make blogging and linking easier, such as RSS feeds for posts and comments, trackback URLs, monthly archives and so on. At the same time, this variety of URLs returning similar or identical pages is a clear case of duplicate content.

The first evidence of duplicate content produced by WordPress can be found in your sidebar: category pages and monthly/daily archives. A category page collects the articles you posted under the same topic, or category. Such pages have no unique content; they are just a collection of your previous posts. Monthly and daily archives likewise simply group your previous articles by posting date. When you have published only one post on a given day, the archive page for that date and the post itself are identical.

The next case of duplicate content is even more prominent: it can be your home page itself. If your home page shows the full text of your posts rather than excerpts, it duplicates your post pages. The same applies to the ‘next/previous entries’ pages, those reachable via /page/2, /page/3, /page/4 and so on.
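To get a feel for how much these pages overlap, here is a rough sketch. The snippets in this post are PHP and server configuration; for these stand-alone illustrations I'm using Python. Google does not publish how it detects duplicates, so this uses a common textbook measure instead, Jaccard similarity over 3-word "shingles". The helper names and sample texts are made up for illustration.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping runs of k words) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets: 1.0 means identical wording."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


post = "WordPress has many features facilitating blogging and linking."
other_post = "Search engine spiders crawl all the content they can reach."

daily_archive = post                  # a daily archive holding only this post
home_page = post + " " + other_post   # a full-text home page showing two posts

print(jaccard(post, daily_archive))  # 1.0
print(jaccard(post, home_page))      # 0.375
```

The single-post daily archive scores a perfect 1.0, and the full-text home page contains every shingle of each post it shows, so its overlap with each post grows as the page shows fewer posts.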

Feeds. Search engine spiders crawl all the content they can reach, and that includes RSS feeds. An additional problem with feeds is that Google may display your RSS URL in the search results instead of the link to the original post. A user who clicks such a result sees an XML-formatted page that is not ‘human-friendly’.

Trackback URLs. Many WordPress templates add trackback links after posts. These links let authors track who links to their posts. Typically, if your post URL looks like ‘www.yoursite.com/2006-11-30/yourpost/’, its trackback URL will be ‘www.yoursite.com/2006-11-30/yourpost/trackback/’.
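The sketch below lists the extra URLs a crawler may come across for a single post. It assumes pretty permalinks like the example above and a server that does not yet redirect non-www or slash-less URLs. `url_variants` is a hypothetical helper; the `/feed/` suffix is WordPress's per-post comments feed.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def url_variants(post_url: str) -> list[str]:
    """Hypothetical helper: other URLs a crawler may find for the same post."""
    scheme, host, path, _query, _fragment = urlsplit(post_url)
    if not path.endswith("/"):
        path += "/"
    bare_host = host[4:] if host.startswith("www.") else host
    return [
        urlunsplit((scheme, bare_host, path, "", "")),            # non-www copy
        urlunsplit((scheme, host, path.rstrip("/"), "", "")),     # no trailing slash
        urlunsplit((scheme, host, path + "trackback/", "", "")),  # trackback URL
        urlunsplit((scheme, host, path + "feed/", "", "")),       # post's comments feed
    ]


for url in url_variants("http://www.yoursite.com/2006-11-30/yourpost/"):
    print(url)
```

Each of these is a separate URL in a search engine's eyes, even though none of them adds new content.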

Identical meta descriptions. By default, WordPress doesn’t provide a way to add unique meta description tags to your posts, so they either have none or share a single site-wide description. Having no meta description at all is a disadvantage, as a well-written one can make your snippet stand out in a SERP. Having an identical description on all your pages is a threat, as Google might filter them out as too similar.
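If you can collect each page's meta description (for example, from a crawl of your own site), a few lines of Python show which descriptions are shared. The `shared_descriptions` helper and the sample data are hypothetical; pages with no description are passed as an empty string, so they are grouped together.

```python
from __future__ import annotations

from collections import defaultdict


def shared_descriptions(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each meta description used by more than one page to those pages' URLs.

    Pages without a description should be passed with an empty string, so they
    show up grouped under "".
    """
    by_description: dict[str, list[str]] = defaultdict(list)
    for url, description in pages.items():
        by_description[description.strip()].append(url)
    return {d: urls for d, urls in by_description.items() if len(urls) > 1}


# Hypothetical crawl results: URL -> meta description
pages = {
    "/2006-11-30/yourpost/": "My blog about SEO and WordPress",
    "/2006-12-01/another-post/": "My blog about SEO and WordPress",
    "/about/": "Who writes this blog and why",
    "/2006-12-02/third-post/": "",
    "/2006-12-03/fourth-post/": "",
}
for description, urls in shared_descriptions(pages).items():
    print(repr(description), "->", urls)
```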

Because of duplicate content, Google can return less desirable URLs (such as feeds or archives instead of the original posts), drop your pages from its index, or move them into the supplemental results, which are rarely displayed to users.

What can you do to avoid this problem? You can tell search engines which URLs to index by using the ‘noindex, follow’ robots meta tag, robots.txt exclusions, or 301 redirects. Let’s say you want Google to index your front page, posts, static pages and category pages, but not your archives or ‘next entries’ pages (page/2, /3, …). To do this, add the following code to your header.php:

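<?php
// Index the first home page, posts, static pages and categories; noindex,follow everything else
if ( ( is_home() && $paged < 2 ) || is_single() || is_page() || is_category() ) {
    echo '<meta name="robots" content="index,follow" />';
} else {
    echo '<meta name="robots" content="noindex,follow" />';
}
?>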

For those not familiar with editing templates in WordPress: in your dashboard, click the Presentation menu item (called Appearance in later versions) and, once the new page opens, click Theme Editor. In the Theme Editor, choose ‘header.php’ and paste the above code into the editor form. The code has to go somewhere between the <head> and </head> tags.

Here the ‘index,follow’ tag is added to the first home page but not to the ‘next entries’ pages (is_home() && $paged < 2), to your posts (is_single()), to standalone pages like ‘About me’, if you created any (is_page()), and to category pages (is_category()). If you don’t want your categories to be indexed, just delete || is_category(). All other pages get <meta name="robots" content="noindex,follow" />: they will not be indexed, but crawlers can still follow their outgoing links.
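The decision the header.php snippet makes can be modelled in Python to check which page types end up indexable. This is only a sketch: `page_type` is a hypothetical stand-in for WordPress's conditional tags, `index_categories=False` plays the role of deleting `|| is_category()`, and `paged` follows WordPress's convention of 0 on the first page and 2 on /page/2/.

```python
def robots_meta(page_type: str, paged: int = 0, index_categories: bool = True) -> str:
    """Return the robots meta content the header.php logic would choose.

    page_type is a hypothetical label standing in for WordPress's conditional
    tags: "home" (is_home), "single" (is_single), "page" (is_page) and
    "category" (is_category). Anything else, such as "date" or "search",
    falls through to noindex.
    """
    indexable = (
        (page_type == "home" and paged < 2)                # first home page only
        or page_type in ("single", "page")                 # posts and static pages
        or (index_categories and page_type == "category")  # optional categories
    )
    return "index,follow" if indexable else "noindex,follow"


for kind, paged in [("home", 0), ("home", 2), ("single", 0), ("category", 0), ("date", 0)]:
    print(kind, paged, robots_meta(kind, paged))
```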

To give each post a unique meta description, I use the Head Meta Description plugin. This plugin can be configured to use an excerpt of your post as the meta description, which is especially useful if you have to add this tag to hundreds of existing pages. Alternatively, you can add your own description manually as a custom field, which is my personal preference.
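For a sense of what an excerpt-based description involves, here is a hypothetical Python helper. It is not the Head Meta Description plugin's code: it strips tags, decodes entities, collapses whitespace and cuts the text at a word boundary, using an assumed limit of 155 characters.

```python
import html
import re


def excerpt_description(post_html: str, max_len: int = 155) -> str:
    """Hypothetical helper: build a meta description from the start of a post.

    Tags are replaced by spaces, entities decoded and whitespace collapsed; if the
    text is longer than max_len (an assumed limit), it is cut at the last space
    within the first max_len characters and "..." is appended.
    """
    text = html.unescape(re.sub(r"<[^>]+>", " ", post_html))
    text = " ".join(text.split())
    if len(text) <= max_len:
        return text
    cut = text[:max_len].rsplit(" ", 1)[0]
    return cut.rstrip(",.;:") + "..."


post = "<p>In one of my recent posts I wrote about the <em>duplicate content</em> issue.</p>"
description = excerpt_description(post, max_len=40)
# Escape the text before placing it inside the HTML attribute.
print(f'<meta name="description" content="{html.escape(description)}" />')
```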

Using the more tag (<!--more-->) tells WordPress to display only the part of a post above the tag on your home page. This greatly reduces the similarity between your home page and your articles. If you have too many existing posts to edit, you can use an ‘excerpt’ plugin, such as the one from Semiologic.
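The effect of the more tag can be sketched in Python. The `<!--more-->` marker, optionally with custom link text as in `<!--more Read on-->`, is real WordPress syntax; the `home_page_excerpt` function and its `(more...)` default link text are a simplified, hypothetical stand-in for what a theme outputs.

```python
import re

# WordPress's split marker: <!--more--> or <!--more custom link text-->
MORE_TAG = re.compile(r"<!--more(.*?)-->")


def home_page_excerpt(content: str, permalink: str, default_text: str = "(more...)") -> str:
    """Simplified sketch of a post's home-page view: everything before the more
    tag followed by a link to the full post, or the whole post if the tag is
    absent. default_text is a hypothetical stand-in for the theme's link text."""
    match = MORE_TAG.search(content)
    if match is None:
        return content
    link_text = match.group(1).strip() or default_text
    return f'{content[:match.start()]}<a href="{permalink}">{link_text}</a>'


post = "<p>First paragraph.</p><!--more--><p>The rest of the article.</p>"
print(home_page_excerpt(post, "/2006-11-30/yourpost/"))
```

Only the text above the marker appears on the home page, so the full article exists at a single URL.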

Edit your .htaccess file to set up 301 redirects. Non-www addresses like yoursite.com should be redirected to www.yoursite.com, and URLs without a trailing slash, like www.yoursite.com/category, should be redirected to the version with one: www.yoursite.com/category/. This can be done by inserting the following code into your .htaccess file (replace yoursite.com with your own domain):


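RewriteEngine On
RewriteBase /
# 301-redirect any other host name (e.g. yoursite.com) to www.yoursite.com
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
# 301-redirect /category to /category/ (assumes permalinks end in a slash; skips real files and paths ending in an extension like .xml)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(/|\.[a-z0-9]+)$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1/ [R=301,L]
# Standard WordPress rules: hand anything that is not a real file or directory to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]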

For more details, I advise you to read up on the process of rewriting the URL layout.
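To check the intent of the two redirects described above before touching the server, they can be modelled in Python. `canonical_redirect` and `CANONICAL_HOST` are hypothetical names. Leaving paths that end in a file extension (such as /sitemap.xml) without a slash is my own assumption, made so that file-like URLs are not touched. The function only computes the target URL; it does not issue the 301.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.yoursite.com"  # assumption: your preferred host name
FILE_LIKE = re.compile(r"\.[a-z0-9]+$", re.IGNORECASE)  # e.g. /sitemap.xml


def canonical_redirect(url: str) -> Optional[str]:
    """Return the URL a 301 should point to, or None if url is already canonical.

    Rule 1: any other host name (e.g. yoursite.com) becomes CANONICAL_HOST.
    Rule 2: a path without a trailing slash gets one, unless it ends in a
    file extension.
    """
    scheme, host, path, query, fragment = urlsplit(url)
    changed = False
    if host.lower() != CANONICAL_HOST:
        host, changed = CANONICAL_HOST, True
    if not path.endswith("/") and not FILE_LIKE.search(path):
        path, changed = path + "/", True
    return urlunsplit((scheme, host, path, query, fragment)) if changed else None


for u in ["http://yoursite.com/category", "http://www.yoursite.com/category/",
          "http://www.yoursite.com/sitemap.xml"]:
    print(u, "->", canonical_redirect(u))
```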

To keep search engine crawlers away from your feeds and trackbacks, edit your robots.txt file and insert the following code:

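User-agent: *
# WordPress system files and folders, e.g. /wp-admin/, /wp-includes/, /wp-content/ and /wp-login.php
Disallow: /wp-
# Search result pages under /search
Disallow: /search
# Site-wide post and comment feeds
Disallow: /feed
Disallow: /comments/feed
Disallow: /feed/$
# Per-post, per-category and per-archive feeds and trackback URLs.
# '*' matches any run of characters (including '/') and '$' anchors the end of the URL;
# both are pattern extensions supported by Google and other major crawlers.
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$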
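Because the `*` and `$` patterns are crawler extensions rather than plain prefixes, it helps to check which URLs these rules actually block. As far as I know, Python's standard `urllib.robotparser` does only plain prefix matching, so this hypothetical sketch translates each pattern into a regular expression itself. It evaluates only the Disallow lines for `User-agent: *`; Allow rules and their precedence are ignored.

```python
from __future__ import annotations

import re

# The Disallow patterns from the robots.txt above
DISALLOW = (
    "/wp-", "/search", "/feed", "/comments/feed", "/feed/$",
    "/*/feed/$", "/*/feed/rss/$", "/*/trackback/$",
    "/*/*/feed/$", "/*/*/feed/rss/$", "/*/*/trackback/$",
    "/*/*/*/feed/$", "/*/*/*/feed/rss/$", "/*/*/*/trackback/$",
)


def rule_to_regex(rule: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a regex matched from the start
    of the path: '*' matches any run of characters and a trailing '$' anchors
    the end; everything else is literal."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_blocked(path: str, rules: tuple[str, ...] = DISALLOW) -> bool:
    """True if any Disallow pattern matches the URL path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


for path in ["/2006-11-30/yourpost/", "/2006-11-30/yourpost/trackback/",
             "/2006-11-30/yourpost/feed/", "/category/seo/", "/feed/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

With these rules, post and category pages stay crawlable while feeds, trackbacks and everything under /wp- are blocked. That includes /wp-content/ files such as uploaded images.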

Some people find it useful to limit the number of posts shown on the home page to 4-5 (the ‘Blog pages show at most’ option in the Reading settings), so that fewer posts are duplicated there.

There is also a great article on customizing the more tag in WordPress.

To avoid the duplicate content issue in WordPress, you should:
- Add the ‘noindex, follow’ meta tag to your monthly/weekly/daily archives, ‘next entries’ pages and, if necessary, category pages
- Ensure that all your pages have unique meta description tags
- Set up 301 redirects for non-www URLs and URLs without trailing slashes
- Keep search engine crawlers away from your feeds and trackbacks
- Use the more tag to show excerpts on your home page instead of full posts
- Limit the number of posts displayed on your home page

View the original article at http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm
