
Volunteer-based search project to take on Google?

temi

New Member
affiliate
The main idea behind the Majestic 12 project is not to compete with or take on the search giants. The idea is to use the power of distributed computing, using the spare PC capacity and Internet connections of volunteers to index web pages (similar to the way SETI@home uses the spare capacity or idle time of computers to search for extraterrestrials).
But given that only sixty volunteers have so far indexed over 7 billion documents, should the number of volunteers double, the Majestic 12 index will in no time be challenging the databases of the major search engines.

It's worth checking out the Majestic 12 website at http://www.majestic12.co.uk/
 
But being distributed is not in itself a problem, as long as there is a way to retrieve results when searchers search for information on the engine.
What I think the main problem could be is if many of the servers that hold the database are offline at the same time.
 
I couldn't see any way to retrieve results, which is why I said what I did.

I would imagine that the servers would be classed into normal servers and super hubs. The initial request will go through the superhub, which will load balance it.
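As a rough illustration of that superhub idea, here is a minimal sketch in Python. Everything in it is an assumption for the sake of the example: the `SuperHub` class, its method names, and the simple round-robin policy are hypothetical and are not taken from the Majestic 12 project. It only shows how a hub could spread requests over volunteer servers and skip the ones that are offline.

```python
class SuperHub:
    """Hypothetical superhub that hands each search request to the next online server."""

    def __init__(self, servers):
        self.servers = list(servers)      # fixed rotation order
        self.online = set(self.servers)   # servers currently reachable
        self._next = 0                    # position in the rotation

    def mark_offline(self, server):
        self.online.discard(server)

    def mark_online(self, server):
        if server in self.servers:
            self.online.add(server)

    def route(self):
        """Return the next online server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.online:
                return server
        raise RuntimeError("no index servers online")


hub = SuperHub(["pc-a", "pc-b", "pc-c"])
hub.mark_offline("pc-b")
print([hub.route() for _ in range(4)])
```

If every server is offline, `route()` raises an error, which is the failure case raised earlier in the thread. A real system would presumably also need copies of each part of the index on more than one machine.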
 
Nice idea. If I owned that site I would aim at offering the best content search, not the largest number of URLs crawled. How about designing a search engine without spam pages?
 
To do that you have to define spam, which is what Google are trying to do.

You see, many people complain about all the Viagra, Vicodin, casino and mortgage spam sites, BUT if I carry out a search for 'buy vicodin' I have as much right to get a page about Vicodin as the next man has to get one about Tom & Jerry.

The whole problem is in defining spam, which is why Google use a very complicated algorithm: a base 5 sliding log algorithm with 100 elements, allowing it in fact to produce 5 BILLION combinations of the same algo!
 
I know, but I would use the spare PCs and Internet connections of volunteers not only to index the URLs, but also to determine what's spam and what's not.
 
Defining spam is difficult. I think it's almost impossible to design a search engine without some spam.
 
hostingspeeds said:
to determine what's spam and what's not.

HOW? This is the problem: HOW do you define spam? If Nike spam the SEs for the word Nike, do you ban them? People searching for Nike deserve to find Nike.

By the way, Nike did this, and Google DID ban them (for a day) :D

It is not a processing problem; it is an algorithmic one.

Google 'hope' to reduce the effects of spamming with their new trustrank system.
 
OWG,
Could you shed more light on this "trustrank system"? How does it work, and how does it help reduce spam?
Lately I found that when I type the name of one of my sites into Google, the whole of the first page and the best part of the second page list my sites with that name. It was not like that about 6 months ago. Is this the "trustrank system" in action?
 
Ooh, Trustrank. Well, I actually have a really short article I am writing, based on a reply I gave to this very question on a forum :D Here is the rough bit:

OWG's Nutshell Trust rank

First of all you will need to understand hubs and authority sites in the eyes of Google. Sites and pages are clustered by phrase and topic as part of the Google web mapping. Within these clusters G will identify its current hubs and authorities. Hubs are sites that link out to many of the sites within that cluster, and authorities are sites that many of the sites within that cluster link to.
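To make the hub/authority distinction concrete, here is a minimal sketch in Python. It is only a link count under my own assumptions, not how Google actually scores sites: the function name `hubs_and_authorities` and the example site names are made up, and a site's hub score is simply the number of cluster sites it links out to, while its authority score is the number of cluster sites linking in to it.

```python
from collections import Counter


def hubs_and_authorities(links):
    """Count outgoing (hub) and incoming (authority) links within one cluster.

    `links` is an iterable of (source_site, target_site) pairs.
    """
    hub_score = Counter()
    authority_score = Counter()
    for source, target in set(links):   # ignore duplicate links
        if source == target:            # ignore self-links
            continue
        hub_score[source] += 1
        authority_score[target] += 1
    return hub_score, authority_score


# Hypothetical rugby cluster: one directory linking out, two sites being linked to.
links = [
    ("rugby-directory", "rugby-forum"),
    ("rugby-directory", "rugby-news"),
    ("rugby-blog", "rugby-news"),
]
hubs, authorities = hubs_and_authorities(links)
print(hubs.most_common(1))         # the site with the most outgoing links
print(authorities.most_common(1))  # the site with the most incoming links
```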

Google will then manually rate these sites with regard to trusted content and their trustworthiness.

A prime example would be the BBC. They only ever link out to on-topic, hand-selected sites (like my rugby forum). Sorry, I couldn't resist; sometimes it pays to have friends who also edit for the Beeb LOL

More seriously. These seed sites will be given a trust factor which can then be carried (like page rank) through their links.

Most niche trusted sites will carry links to similar content, or links to links on similar content. E.g. Reuters might run a news line that is picked up by the BBC, who rewrite it and cite the original Reuters report. CNN might cite the BBC version, which is more developed. ALL these sites are interlinking, and all are trusted sites. They might all also link to a website that originally carried the white paper / allegation / content.

So let's put it into practice.

I release a niche altruistic website that is well received. The news gets picked up by the BBC, who link to it. Various other news sites pick up on it and also point links to my niche site. In time other charity and educational sites link to me, as well as church sites. Many of these sites will be either seed trusted sites or linked to by seed sites. The closer the incoming link to my site is to the seed site, the more trustrank my altruism site will get. The further away, the less. By default, if my trustrank hits a certain level, I might well become a trusted hub, making links from my site very valuable (in terms of TR, not monetary value, although that will also be the case, but no one will know the TR to know the value).
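The "closer to the seed, the more trust" idea can be sketched like this in Python. This is only my reading of the description above, not Google's actual method: the function name `trust_by_distance`, the `decay` factor of 0.5, and the site names are all assumptions. Seed sites start with a trust of 1.0, and each link hop away from the nearest seed multiplies the trust by `decay`.

```python
from collections import deque


def trust_by_distance(links, seeds, decay=0.5):
    """Give each site reachable from a seed a trust of decay ** (hops from nearest seed).

    `links` is an iterable of (source_site, target_site) pairs.
    Sites that no seed can reach get no entry at all.
    """
    graph = {}
    for source, target in links:
        graph.setdefault(source, []).append(target)

    trust = {seed: 1.0 for seed in seeds}
    queue = deque(seeds)
    while queue:                          # breadth-first, so nearest seed wins
        site = queue.popleft()
        for target in graph.get(site, []):
            if target not in trust:
                trust[target] = trust[site] * decay
                queue.append(target)
    return trust


links = [
    ("bbc", "altruism-site"),           # linked directly by a seed
    ("altruism-site", "charity-blog"),  # one hop further away
    ("unknown-site", "spammy-page"),    # never reached from a seed
]
print(trust_by_distance(links, seeds=["bbc"]))
```

In this toy version the unknown site and anything it links to simply never receive a trust value.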

Trust rank will (IMO) compound, so that the sum of the whole will be greater than the parts. So if a site with a link from one seed site is worth x, a site with links from two seed sites could be worth x + y + a compounded trust value.

That is my take on trustrank. So the old adage of "build quality content that people will want to link to" means more than ever now.

One last thing: in life there is balance. For every shard of light there is dark; for every white stetson, there is a black 'un. If some sites ARE trusted, either by seed or by approval of a seed, then some will NOT be. It is therefore possible that untrusted (due to being unknown) sites might not get any link benefit to pass on. Just a theory.


Hope that explains it a little better. ;)
 
I don't care what they call it so long as I get more traffic to my sites, I'm happy.


<sorry, no edit made, I hit the edit button instead of quote :( >
 