
Duplicate Content: It's Not What You Think!

Duplicate content, in the sense the search engines originally targeted, is an exact copy of a site: page for page, file name for file name, image for image, code for code. An exact replica of a page or site. The filter was put in place to catch dynamic page spawners, duplicated websites, and doorway pages designed by blackhat SEOs and spammers trying to manipulate the natural search results.

Each article directory, press release site, and even the pages that host RSS feeds have different code, images, file names, JavaScript, and a multitude of other differences that stop them from getting hit for duplicate content.

There is more on a page than your article when a directory publishes it. Search engines read all of the code on the page, not just your article's text. You do not have to worry about duplicate content with articles or press releases.
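To make that concrete, here is a minimal sketch in Python. The two "directory templates" are invented for the example (real directory pages carry far more surrounding markup, navigation, and ads); it just shows that the article is only one slice of what a spider actually fetches:

ARTICLE = "Duplicate content is not what you think it is."

# Two made-up host templates wrapping the identical article text.
page_a = ("<html><head><title>ArticleHub</title>"
          "<script src='/a.js'></script></head>"
          f"<body><div id='hub'>{ARTICLE}</div></body></html>")
page_b = ("<html><head><title>PressWire</title>"
          "<link rel='stylesheet' href='/pw.css'></head>"
          f"<body><table><tr><td>{ARTICLE}</td></tr></table></body></html>")

share_a = len(ARTICLE) / len(page_a)
share_b = len(ARTICLE) / len(page_b)
print(f"article is {share_a:.0%} of page A and {share_b:.0%} of page B")

Everything around the shared article text differs between the two hosts, which is the "multitude of other differences" described above.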



Jim

Thank you for your explanation on duplicate content.

I respectfully differ on a number of points.

Right from the outset, may I state that a lot of faulty deductions have been made about "duplicate content" that were never stated by Google and were never Google's intention; hence the confusion.

Next: on what authority do I speak? I have done in-depth research on this particular topic, as shown by the post about it on my blog, which I have not linked here to avoid being labelled self-serving. On request, however, I am willing to post the link for details on this topic.

Now, any content that is significantly the same (it does not have to be 100% identical), whether on the same site or across sites, is duplicate content. The issue, however, is that duplicate content is not necessarily penalized by the search engines.

About the only reason Google has stated for penalizing is "duplicate content with malicious intent or deceptive in origin". Apart from this, the only other case, which is not actually a penalty, is where only one preferred version (as determined by Google) of a duplicated or replicated web page on the same site is indexed while the others are not (where the duplicate is neither deceptive in origin nor arose with malicious intent).

Thus the illustration made by the OP arrives at the right conclusion, that duplicate content arising from duplicate article submissions is not removed from Google's or other search engines' indexes, but in my view for the wrong reasons. It is not because the code, images, file names, JavaScript, etc. on the different sites' web pages differ from one another, but rather simply because it is not Google's policy to penalize such duplicate content.

One must, however, be careful here: presence in Google's index does not mean ranking highly in it. Again, apart from the primary index, there is also the supplemental index.

Even though it is not directly stated by Google, it is unlikely that content which is significantly the same as another will feature on the first page of Google. Distinguish this, however, from "spun" articles or the application of "article leverage", which may have made the articles significantly different, even if they bear the same title.

Regarding content on the same site, it is likewise not because the site's code, images, file names, JavaScript, etc. differ that duplicate content is not indexed, but purely because of Google's stated policy. Even if the code etc. on different pages of the same site differs, as it does across different websites, the duplicate content will still not be indexed as long as it is on the same domain.
 
Sir cashgen

While I do appreciate your response, I believe you have not allowed for several factors that are pretty obvious.

1. If you check the date the post was first published, it was over 3 years ago. Google has since made this information public, but had not prior to that.

2. "One must however be careful here in that presence in Google index does not mean ranking highly in the index. Again, apart from the primary index, there is also the supplementary index.

Even though not also directly stated by Google, it is unlikely that a content which is significantly the same with another will feature on the first page of Google."

Considering this information, we should not see search results like these, and I could show you millions of them: Electronic Cigarette Retailer Sells At Wholesale Prices To The Public - Google Search and New Quartz Infrared Heater Could Save Americans Over Half On Their Heating Bill - Google Search

I have not covered this in depth because I have not had the time lately, but it looks like now is the time. I have never believed in a penalty, and about 7 months after I wrote this post Matt Cutts published that there was no penalty, then made a video explaining it thoroughly enough that anyone with a working concept of SEO could understand it; but that does not help the newbie who is still trying to piece SEO together. This post was meant to educate newbies who were living in fear of "duplicate content", as most put it.

Now to the search results that I posted. They are 2 press releases, one from about a year ago and one from about 6 months ago. There are several occurrences of each on the first page just by searching the title alone, no quotes. This is the primary index cashgen speaks of when he stated, "Even though it is not directly stated by Google, it is unlikely that content which is significantly the same as another will feature on the first page of Google."

We now see this is false, and if you want to check others, go to a major newswire and paste titles into Google and take a look. I have to admit that article directories will give you less play, but that is down to the domain authority of the directories compared to the news sites that pick the releases up.

You can also make them stick by linking directly to them. Matt Cutts himself validated this, stating: "As a reminder, supplemental results aren't something to be afraid of; I've got pages from my site in the supplemental results, for example. A complete software rewrite of the infrastructure for supplemental results launched in Summer of 2005, and the supplemental results continue to get fresher. Having urls in the supplemental results doesn't mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they are in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past."

As you can see, the fun started in 2005, and before 2006 I had tested and found that you can have multiples in the search results, and with good links, as many as I wanted on the first page.

I have linked many not only out of the supplemental results but also onto the first page. I did not do that with the two I referenced here; since most news sites carry a lot of PR throughout their pages, it makes sense that these releases are getting serious PR bleed from a couple of high-PR pages within the news sites. So they do well basically on domain authority and internal link structure.

Now, press releases and articles alike do drop out altogether over time, because a lot of news sites and article directories cull and archive content after a certain period. Once the pages are removed from those sites, they disappear from the index, but they are removed from the index because the content itself has been removed.
 
jcorkern said:
I have never believed in a penalty, and about 7 months after I wrote this post Matt Cutts published that there was no penalty, then made a video explaining it thoroughly enough that anyone with a working concept of SEO could understand it; but that does not help the newbie who is still trying to piece SEO together. This post was meant to educate newbies who were living in fear of "duplicate content", as most put it.

See also:

http://affiliate-marketing-forums.5...22041-there-no-duplicate-content-penalty.html

http://affiliate-marketing-forums.5...oogle-says-dont-afraid-duplicate-content.html

http://affiliate-marketing-forums.5...cate-content-penalty-there-no-such-thing.html

http://affiliate-marketing-forums.5...roduct-pages.html?highlight=duplicate+content
 
Thank you jcorkern and minstrel for your posts in response to mine.

I indicated at the outset of my post that "right from the outset, may I state that a lot of faulty deductions have been made about "duplicate content" that were never stated by Google and were never Google's intention; hence the confusion."

I am seeing it again being repeated in your posts. For crying out loud, we are all agreed that there is no duplicate content penalty resulting from duplication of articles or news items across multiple sites in normal circumstances, and Google has never stated otherwise. They are allowed to feature in Google's index. So illustrating this with examples is not actually necessary. These are the faulty deductions, never actually stated by Google, that fuel the duplicate content controversy I was referring to. It is like arguing over nothing.

But now please note my words carefully. That does not translate to the fact that there is no duplicate content penalty across the board. It exists, but not in the direction most people think.

Google specifically indicated that there is a penalty for duplicate content, whether it occurs on the same site or across sites, if it is "duplicate content with malicious intent or deceptive in origin". In that case there is a contravention of the Google Webmaster Guidelines, and the penalty can include, but is not limited to, complete removal from Google's index.

I also clearly stated in my post that "even though it is not directly stated by Google, it is unlikely that content which is significantly the same as another will feature on the first page of Google. Distinguish this, however, from "spun" articles or the application of "article leverage", which may have made the articles significantly different, even if they bear the same title."

In the very few cases where you see the same article or news titles on page 1 of Google, it does not mean those articles or news items are significantly the same, as the title, though important, does not constitute the entire article, or even a significant part of it. With good "spinners" and "article leverage" tools, an article can be changed by as much as 50%, yet keep the same title.

My point also remains that our conclusions regarding duplicate content as it applies to syndicated articles and news items are the same, i.e. no penalty, but for divergent reasons. You rationalize it based on the different code, images, JavaScript, and file names, which I disagree with; I rationalize it based on Google's clearly stated policy.

I agree that, given the time you wrote the post, you probably lacked information that Google subsequently revealed, but that does not change the facts; it only helps us understand your position at the time.

Finally, there is no point buttressing areas where we are both agreed, thereby creating the impression that I disagree on those points. Also, the fact that duplicate content across sites is not ordinarily penalized under current Google policy does not make it any less duplicate content than that on the same site, as many would have us believe. It is simply not penalized. Please note here that "duplicate content" is an English phrase found in the dictionary, not a coinage of Google's.

I will leave you now with this quote from Wikipedia:

Duplicate content
From Wikipedia, the free encyclopedia

Duplicate content is a term used in the field of search engine optimization to describe content that appears on more than one web page, even on different web sites. When multiple pages contain essentially the same content, search engines such as Google prefer to only display one of those pages in their search results.

Non-malicious duplicate content may include multiple views of the same page, such as normal HTML and a version for mobile devices, printer-only versions of a page, or store items that can be shown via multiple distinct URLs.

Malicious duplicate content refers to intentionally generated search spam in an effort to manipulate search results and gain more traffic. Users do not like to see the same content listed multiple times.
 
"That does not translate to the fact that there is no duplicate content penalty across board."

Yes, it does. What these guys get in trouble for is a linking scheme; that is what you are calling malicious, and it is a violation of the Google T.O.S.

Spinning is malicious because it is designed to "trick" the Google algorithm into giving the linked-to site better SERPs. An example is a blog farm, like these: ★ Cigarette Pack - E-Cig | Compare Prices On E Cig and Acce and Organic Cigarettes, just a few that I know about, and this particular guy has over 100 of these to artificially build links from related sites and pages.

The spinning is an attempt to evade a duplicate content penalty that has never existed, thereby drawing attention to the recipient site for a manual review and possible expulsion from the Google index. Also, at this point in my testing I am almost certain that Google's LSI can in fact pick up spun articles and identify them.
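Google's actual detection methods are not public, but even a crude bag-of-words cosine similarity, sketched below in Python, is enough to show why a lightly spun article remains easy to flag (the two sentences are invented for the example):

import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (0.0 to 1.0)."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm_a = math.sqrt(sum(c * c for c in wa.values()))
    norm_b = math.sqrt(sum(c * c for c in wb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

original = ("New quartz infrared heater could save Americans "
            "over half on their heating bill")
spun = ("New quartz infrared heater might save Americans "
        "more than half on their heating costs")
print(round(cosine_similarity(original, spun), 2))  # ~0.74: still clearly related despite the synonym swaps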

I understand your point and how one could come to that conclusion, but it is not based on reality. I would also suggest you find a better source than Wikipedia, because it is openly editable content. That means anyone can edit it with no facts to back it up, and the writers and editors only have a username, making them unaccountable for the information they put out.

Matt Cutts, the head of Google's webspam team, is the source that we (both myself and minstrel) have cited. I would look at the credibility of wherever you get information for your research, and Wikipedia is not it. Look, here is one post concerning Wikipedia: Blog Insights: Wikipedia's great fraud | ITworld

I could go much deeper into Wikipedia, but this is not the place. You can rest assured that getting algorithm information from Wikipedia is not what I would call extensive research. What would give you a much better grip on this situation is to "test" extensively with controls, as I have done and do on a daily basis on a fairly large scale.

cashgen said:
"In the very few cases where you see the same article or news titles on page 1 of Google, it does not mean those articles or news items are significantly the same, as the title, though important, does not constitute the entire article, or even a significant part of it."

I already posted press releases that were not merely "significantly the same" but identical, yet they performed in Google in a manner you stated does not happen. So even in your second post, you are still contradicting what is obviously a reality in real-time searches as of now.

There is no way to syndicate a spun press release, so that is one control I could not use to test, but a "duplicate" article placed in multiple directories produces about the same number of copies indexed in the primary results whether the copies are exact duplicates or spun to 50%, 60%, 70%, and so on. This does not mean the same number rank on the first page; that comes down to the authority and quality of the inbound links to each occurrence.

They do not tend to get as many first-page appearances as press releases do, but that again comes back to the authority of the news sites compared to the article directories.

I can tell you that each additional occurrence of any given article, press release, or piece of spun content that goes into the primary or supplemental results passes significantly less link value than the one indexed before it, no matter which index it lands in. So let me give an example of what I am getting at.

The article or press release is syndicated, and the strongest site will in most cases give the most link value. Let's say, on a scale of 1 to 10, that the first occurrence gets a score of 10; the second may give you a link value of 7, the third a 4, and at this rate it will only take 4 or 5 occurrences before the ones after are worth next to nothing, a fraction of 1 on the scale.
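To put rough numbers on that decay, here is a short Python sketch. It is purely a toy model of the 10-7-4 illustration above; the 0.65 decay factor is my assumption chosen to reproduce that sequence, not a published Google figure:

def link_value(occurrence: int, first_value: float = 10.0, decay: float = 0.65) -> float:
    # Each further syndicated copy passes a fixed fraction of the
    # previous copy's link value; 0.65 roughly reproduces 10, 7, 4...
    return first_value * decay ** (occurrence - 1)

for n in range(1, 7):
    print(n, round(link_value(n), 1))
# 1 10.0 / 2 6.5 / 3 4.2 / 4 2.7 / 5 1.8 / 6 1.2 -- next to nothing by the 5th or 6th copy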

I am not trying to be critical of your thoughts, but rather to explain what is really going on from a "testing" perspective, and now that Google has come out and backed the testing up, it is pretty cut and dried.

I have avoided stating this until now, but you asked a question and answered it yourself in your first post, and I did not address it in my last post. One of the great moderators here may remove this information, and I will understand if they do, but you asked, "Next: on what authority do I speak? I have done in-depth research on this particular topic, as shown by the post about it on my blog."

Then you answered it with what you have done to validate your information, so I figured I would ask the same question of myself and answer it the same way.

I currently have a test budget and a dedicated 6-person team to perform this research, which runs well into six figures per year on testing alone.

You will never attain your true potential until you invest in testing. No matter what forum, site, or expert you go to, testing is what separates the wheat from the chaff. You can research until the moon turns purple, but it means little unless the info comes from test results and you see the evidence of those results live in Google in a real-world experiment.

Moderators, I have no problem with editing any info I have here for issues I may not understand.
 
jcorkern said:
Moderators, I have no problem with editing any info I have here for issues I may not understand.

Thanks for a well thought-out response, jcorkern. No editing necessary. :)

I wanted to return to a secondary point you made previously, though, where you referenced Google's supplemental index. I was under the impression that it no longer exists, although that may be incorrect. Google originally introduced it, as their databases grew ever larger, as a method for increasing the speed of database searches and retrievals. Since that time, however, they have vastly increased their computing power and the size of their computing network, as well as introducing refinements such as Caffeine, which now allow them to index and return pages almost in real time (i.e., almost instantly for certain types of queries). My assumption was that the need for the supplemental index disappeared and it became obsolete some time ago as a result of these changes.
 
You know what, minstrel, I had not thought of that. That is a very good point and something else that needs testing... unless you know of some credible sources with answers.

What led you to believe that, or to question it? I would really be interested in looking into this further, so do you know of a starting point, or was it simply deduction from what you learned through experience?

Yeah, I am fishing, because you have piqued my interest. :)
 
No, I have no sources. It's really more a question than a statement, I guess. Just something I have been thinking about, based on (a) the reasons for introducing the supplemental index in the first place, (b) the massive upgrades to Google's technology, and (c) the fact that I haven't heard anyone from Google mention it at all for quite some time.
 
Well, I believe that is enough reason to look into it and do some testing. Thanks for the idea.
 
Here's a fairly detailed contribution from Yoast (I haven't had time to read through it all yet); a small sketch of its "canonical URL" idea follows the outline below:

Duplicate content: causes and solutions
by Yoast
December 2010


Search engines like Google have a problem. They call it "duplicate content": your content is being shown at multiple locations on and off your site, and they don't know which location to show. Especially when people start linking to all the different versions of the content, the problem becomes bigger. This article is meant to help you understand the different causes of duplicate content, and to find the solution for each of them.
  1. Causes for duplicate content
    1. Misunderstanding the concept of a URL
    2. Session IDs
    3. URL parameters used for tracking and sorting
    4. Scrapers & content syndication
    5. Order of parameters
    6. Comment Pagination
    7. Printer friendly pages
    8. WWW vs. non-WWW
  2. Conceptual Solution: A "canonical" URL
  3. Identifying Duplicate Content issues
    1. Google Webmaster Tools
    2. Searching for titles or snippets
  4. Practical Solutions for Duplicate Content
    1. Avoiding Duplicate Content
    2. 301 Redirecting Duplicate Content
    3. Using rel="canonical" links
    4. Linking back to the original content
  5. Conclusion: Duplicate content is fixable, and should be fixed
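As promised above, here is a minimal sketch of the "canonical URL" idea in Python, using only the standard library. Everything here is illustrative: the preference for the www host, the https scheme, and the tracking/session parameter names are all assumptions made for the example, not rules from the Yoast article.

import urllib.parse

# Hypothetical tracking/session parameters this site wants stripped.
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "ref"}

def canonicalize(url: str) -> str:
    """Map the many URL variants of one page onto a single canonical form."""
    parts = urllib.parse.urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):        # WWW vs. non-WWW (cause 8)
        host = "www." + host
    # Drop session/tracking parameters (causes 2-3) and sort the rest,
    # so ?color=red&size=m and ?size=m&color=red collapse (cause 5).
    params = sorted((k, v) for k, v in urllib.parse.parse_qsl(parts.query)
                    if k not in TRACKING_PARAMS)
    query = urllib.parse.urlencode(params)
    return urllib.parse.urlunsplit(("https", host, parts.path, query, ""))

print(canonicalize("http://example.com/shoes?size=m&color=red&sessionid=abc"))
# -> https://www.example.com/shoes?color=red&size=m

The string this returns is what you would point the variants at, whether via a rel="canonical" link or a 301 redirect, per solutions 2-3 in the outline.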
 
Thanks for sharing the insights from your time and effort! I work for a site that sells jewelry, and in writing product descriptions I have been very worried about duplicate content. Does this mean that I can use exactly the same descriptions for similar products?
 
It is always better to write original content. The only time I would have duplicates is when syndicating an article or press release. This thread was more to let people know that there is no "penalty".

You need to write good descriptions that entice people to buy or click.
 
I agree. I keep my blog content unique, though, but only because it ranks better. If a duplicate content penalty existed, anyone could ruin your ranking by pulling your content and reprinting it or using trackbacks. That would not be fair. By the way, a blog post is usually different in structure from an article or press release, especially if you are writing for returning visitors who know you a bit better.
 
I believe that, although it is not proven... It might well be true... Does that mean that article directories are different too, because of the design? And PR sites?
 
Spiders do not read page designs. They only read text. If you have the same content on two pages with different designs, they are duplicates as far as search engine spiders are concerned.
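To see what that means in practice, here is a minimal sketch using Python's standard html.parser; the two page snippets are made up, one styled as a div-based design and one as a table-based design:

from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

page_a = "<html><body><div class='blue-theme'><p>Same article text.</p></div></body></html>"
page_b = "<html><body><table><tr><td>Same article text.</td></tr></table></body></html>"
print(visible_text(page_a) == visible_text(page_b))  # True: the designs differ, the text does not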
 
Original content is always best. The method of creating one unique article and then generating several others from it is called spinning your article. There are software programs that will do this for you. I would suggest looking into them to make your article marketing easier.
 