Friday, September 24, 2010

History of Search Engines: From 1945 to Google Today

As We May Think (1945):

The concept of hypertext and extended memory came to life in July of 1945 when, fresh off the scientific camaraderie that was a side effect of WWII, Vannevar Bush published As We May Think in The Atlantic Monthly.
He urged scientists to work together to help build a body of knowledge for all mankind. Here are a few selected sentences and paragraphs that drive his point home.
Specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.
The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.
A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.
Not only was he a firm believer in storing data, he also believed that for a stored record to be useful to the human mind, it should represent, as best we can manage, how the mind actually works.
Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing. ... Having found one item, moreover, one has to emerge from the system and re-enter on a new path.
The human mind does not work this way. It operates by association. ... Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it. In minor ways he may even improve, for his records have relative permanency.
Presumably man's spirit should be elevated if he can better review his own shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory.
He then proposed the idea of a virtually limitless, fast, reliable, extensible, associative memory storage and retrieval system. He named this device a memex.

Gerard Salton (1960s - 1990s):

Gerard Salton, who died on August 28th of 1995, was the father of modern search technology. His teams at Harvard and Cornell developed the SMART information retrieval system. Salton’s Magic Automatic Retriever of Text included important concepts like the vector space model, Inverse Document Frequency (IDF), Term Frequency (TF), term discrimination values, and relevance feedback mechanisms.
He authored a 56-page book called A Theory of Indexing which does a great job explaining many of his tests, upon which search is still largely based. Tom Evslin posted a blog entry about what it was like to work with Mr. Salton.
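To make those concepts concrete, here is a minimal sketch (in Python, not Salton's actual SMART code) of how term frequency, inverse document frequency, and the vector space model combine to rank documents against a query. The three toy documents and the exact weighting formula are illustrative assumptions.

    # Toy illustration of Salton's vector space model with TF-IDF weighting.
    # The corpus, tokenizer, and exact weighting scheme are simplified assumptions.
    import math
    from collections import Counter

    docs = {
        "d1": "search engines index the web",
        "d2": "directories are compiled by human editors",
        "d3": "search engines rank pages for each query",
    }

    # Inverse document frequency: rare terms get more weight than common ones.
    n_docs = len(docs)
    df = Counter(term for text in docs.values() for term in set(text.split()))
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}

    def tf_idf_vector(text):
        counts = Counter(text.split())
        return {term: counts[term] * idf[term] for term in counts}

    vectors = {name: tf_idf_vector(text) for name, text in docs.items()}

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a if t in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    query = {t: idf.get(t, 0.0) for t in "search engines".split()}
    for name, vec in sorted(vectors.items(), key=lambda kv: -cosine(query, kv[1])):
        print(name, round(cosine(query, vec), 3))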

Ted Nelson:

Ted Nelson created Project Xanadu in 1960 and coined the term hypertext in 1963. His goal with Project Xanadu was to create a computer network with a simple user interface that solved many social problems like attribution.
While Ted was against complex markup code, broken links, and many other problems associated with traditional HTML on the WWW, much of the inspiration to create the WWW was drawn from Ted's work.
There is still conflict surrounding the exact reasons why Project Xanadu failed to take off.
Wikipedia offers background and many resource links about Mr. Nelson.

Advanced Research Projects Agency Network:

ARPANet is the network which eventually led to the internet. Wikipedia has a great background article on ARPANet, and Google Video has a free and interesting video about ARPANet from 1972.

Archie (1990): The First Search Engine

The first few hundred web sites appeared in 1993, most of them at colleges, but long before they existed came Archie, the first search engine, created in 1990 by Alan Emtage, a student at McGill University in Montreal. The original intent of the name was "archives," but it was shortened to Archie.
Archie helped solve the problem of data scattered across FTP servers (described below) by combining a script-based data gatherer with a regular expression matcher for retrieving file names matching a user query. Essentially Archie became a database of file names which it would match against users' queries.
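The underlying idea is simple enough to sketch in a few lines. The file listing and query below are hypothetical, but matching a user-supplied regular expression against a gathered list of file names is essentially what Archie did.

    # Minimal sketch of Archie's approach: a gathered list of file names
    # matched against a user-supplied regular expression. The file names
    # and the query are made-up examples.
    import re

    gathered_filenames = [
        "pub/gnu/emacs-18.59.tar.Z",
        "pub/games/nethack-3.0.tar.Z",
        "pub/docs/rfc1035.txt",
    ]

    def archie_search(pattern, filenames):
        """Return every file name that matches the user's regular expression."""
        query = re.compile(pattern)
        return [name for name in filenames if query.search(name)]

    print(archie_search(r"emacs.*\.tar\.Z$", gathered_filenames))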
Bill Slawski has more background on Archie here.

Veronica & Jughead:

As word of mouth about Archie spread, it started to become word of computer, and Archie grew so popular that the University of Nevada System Computing Services group developed Veronica. Veronica served the same purpose as Archie, but it worked on plain text files. Soon another user interface named Jughead appeared with the same purpose as Veronica. Both were used for files sent via Gopher, which was created as an Archie alternative by Mark McCahill at the University of Minnesota in 1991.

File Transfer Protocol:

Tim Berners-Lee was around at this point, but there was no World Wide Web. The main way people shared data back then was via File Transfer Protocol (FTP).
If you had a file you wanted to share you would set up an FTP server. If someone was interested in retrieving the data they could do so using an FTP client. This process worked effectively in small groups, but the data became as fragmented as it was abundant.

Tim Berners-Lee & the WWW (1991):

From Wikipedia:
While an independent contractor at CERN from June to December 1980, Berners-Lee proposed a project based on the concept of hypertext, to facilitate sharing and updating information among researchers. With help from Robert Cailliau he built a prototype system named Enquire.
After leaving CERN in 1980 to work at John Poole's Image Computer Systems Ltd., he returned in 1984 as a fellow. In 1989, CERN was the largest Internet node in Europe, and Berners-Lee saw an opportunity to join hypertext with the Internet. In his words, "I just had to take the hypertext idea and connect it to the TCP and DNS ideas and — ta-da! — the World Wide Web". He used similar ideas to those underlying the Enquire system to create the World Wide Web, for which he designed and built the first web browser and editor (called WorldWideWeb and developed on NeXTSTEP) and the first Web server called httpd (short for HyperText Transfer Protocol daemon).
The first Web site built was at http://info.cern.ch/ and was first put online on August 6, 1991. It provided an explanation about what the World Wide Web was, how one could own a browser and how to set up a Web server. It was also the world's first Web directory, since Berners-Lee maintained a list of other Web sites apart from his own.
In 1994, Berners-Lee founded the World Wide Web Consortium (W3C) at the Massachusetts Institute of Technology.
Tim also created the Virtual Library, which is the oldest catalogue of the web, and wrote a book about creating the web, titled Weaving the Web.

What is a Bot?

Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce. The term bot on the internet is usually used to describe anything that interfaces with the user or that collects data.
Search engines use "spiders," which spider (or crawl) the web for information. They are software programs which request pages much like regular browsers do. In addition to reading the contents of pages for indexing, spiders also record links (a minimal crawler sketch follows the list below).
  • Link citations can be used as a proxy for editorial trust.
  • Link anchor text may help describe what a page is about.
  • Link co-citation data may be used to help determine what topical communities a page or website exists in.
  • Additionally, links are stored to help search engines discover new documents to later crawl.
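Here is the minimal crawler sketch mentioned above, using only the Python standard library and a placeholder seed URL; real spiders add politeness delays, robots.txt checks, URL deduplication, and far more robust parsing.

    # Minimal sketch of a search engine spider: fetch a page, record its text
    # for indexing, and queue the links it finds for later crawling.
    # The seed URL is a placeholder; real crawlers also honor robots.txt,
    # throttle requests, and deduplicate URLs far more carefully.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(urljoin(self.base_url, value))

    def crawl(seed, max_pages=5):
        frontier, seen, index = [seed], set(), {}
        while frontier and len(seen) < max_pages:
            url = frontier.pop(0)
            if url in seen:
                continue
            seen.add(url)
            html = urlopen(url).read().decode("utf-8", errors="ignore")
            index[url] = html                 # stored for later indexing
            parser = LinkExtractor(url)
            parser.feed(html)
            frontier.extend(parser.links)     # discovered links feed future crawls
        return index

    # crawl("http://example.com/")           # hypothetical seed URL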
Another bot example is the chatterbot, which concentrates its resources on a single specific topic. These bots attempt to act like a human and communicate with humans on said topic.

Parts of a Search Engine:

Search engines consist of three main parts. Search engine spiders follow links on the web to request pages that are either not yet indexed or have been updated since they were last indexed. These pages are crawled and added to the search engine index (also known as the catalog). When you search using a major search engine you are not actually searching the web, but a slightly outdated index of content which roughly represents the content of the web. The third part of a search engine is the search interface and relevancy software. For each search query, search engines typically do most or all of the following (a minimal index-and-query sketch follows the list):
  • Accept the user-entered query, checking for advanced syntax and checking whether the query is misspelled so that more popular or correctly spelled variations can be recommended.
  • Check to see if the query is relevant to other vertical search databases (such as news search or product search) and place relevant links to a few items from that type of search query near the regular search results.
  • Gather a list of relevant pages for the organic search results. These results are ranked based on page content, usage data, and link citation data.
  • Request a list of relevant ads to place near the search results.
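And here is the index-and-query sketch referenced above: a toy inverted index built from made-up pages, showing how a query is answered from a pre-built catalog rather than from the live web.

    # Toy inverted index: the "catalog" maps each word to the documents
    # containing it, so a query is answered from the index, not the live web.
    # Documents and query are illustrative only.
    from collections import defaultdict

    crawled_pages = {
        "page1": "history of search engines",
        "page2": "search engine marketing basics",
        "page3": "history of web directories",
    }

    index = defaultdict(set)
    for url, text in crawled_pages.items():
        for word in text.split():
            index[word].add(url)

    def search(query):
        """Return pages containing every query word (a simple AND query)."""
        words = query.split()
        results = set.intersection(*(index.get(w, set()) for w in words)) if words else set()
        return sorted(results)

    print(search("history of search"))   # -> ['page1']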
Searchers generally tend to click mostly on the top few search results, as noted in this article by Jakob Nielsen, and backed up by this search result eye tracking study.

Want to learn more about how search engines work?

Types of Search Queries:

Andrei Broder authored A Taxonomy of Web Search [PDF], which notes that most searches fall into the following 3 categories:
  • Informational - seeking static information about a topic
  • Transactional - shopping at, downloading from, or otherwise interacting with the result
  • Navigational - send me to a specific URL

Improve Your Searching Skills:

Want to become a better searcher? Most large scale search engines offer:
  • Advanced search pages which help searchers refine their queries to request files which are newer or older, local in nature, from specific domains, or published in specific formats, among other refinements; for example, the ~ character means related in Google.
  • Vertical search databases which may help structure the information index or limit the search index to a more trusted or better structured collection of sources, documents, and information.
Nancy Blachman's Google Guide offers searchers free Google search tips, and Greg R. Notess's Search Engine Showdown offers a search engine features chart.
There are also many popular smaller vertical search services. For example, Del.icio.us allows you to search URLs that users have bookmarked, and Technorati allows you to search blogs.

World Wide Web Wanderer:

Soon the web's first robot came. In June 1993 Matthew Gray introduced the World Wide Web Wanderer. He initially wanted to measure the growth of the web and created this bot to count active web servers. He soon upgraded the bot to capture actual URLs. His database became known as the Wandex.
The Wanderer was as much of a problem as it was a solution because it caused system lag by accessing the same page hundreds of times a day. It did not take long for him to fix this software, but people started to question the value of bots.

ALIWEB:

In October of 1993 Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB, in response to the Wanderer. ALIWEB crawled meta information and allowed users to submit the pages they wanted indexed along with their own page descriptions. This meant it needed no bot to collect data and did not use excessive bandwidth. The downside of ALIWEB was that many people did not know how to submit their site.

Robots Exclusion Standard:

Martijn Koster also hosts the web robots pages, which set the standard for how search engines should index or not index content. This allows webmasters to block bots from their site on a whole-site level or on a page-by-page basis.
By default, if information is on a public web server and people link to it, search engines generally will index it.
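As a sketch of how a well-behaved bot applies those rules, Python's standard urllib.robotparser can evaluate a robots.txt policy; the sample rules and URLs below are hypothetical.

    # Checking a (hypothetical) robots.txt policy the way a polite crawler would.
    from urllib.robotparser import RobotFileParser

    sample_robots_txt = """
    User-agent: *
    Disallow: /private/
    Disallow: /tmp/
    """.splitlines()

    rules = RobotFileParser()
    rules.parse(sample_robots_txt)

    print(rules.can_fetch("MyBot", "http://example.com/public/page.html"))   # True
    print(rules.can_fetch("MyBot", "http://example.com/private/page.html"))  # False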
In 2005 Google led a crusade against blog comment spam, creating a nofollow attribute that can be applied at the individual link level. After this was pushed through, Google quickly broadened the stated purpose of nofollow, claiming it was for any link that was sold or not under editorial control.

Primitive Web Search:

By December of 1993, three full-fledged bot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider. JumpStation gathered info about the title and header from Web pages and retrieved these using a simple linear search. As the web grew, JumpStation slowed to a stop. The WWW Worm indexed titles and URLs. The problem with JumpStation and the World Wide Web Worm was that they listed results in the order they found them, with no discrimination. The RBSE spider did implement a ranking system.
Since early search algorithms did not do adequate link analysis or cache full page content, it was extremely hard to find something if you did not know the exact name of what you were looking for.

Excite:

Excite came from the project Architext, which was started in February 1993 by six Stanford undergrad students. They had the idea of using statistical analysis of word relationships to make searching more efficient. They were soon funded, and in mid-1993 they released copies of their search software for use on web sites.
Excite was bought by a broadband provider named @Home in January, 1999 for $6.5 billion, and was named Excite@Home. In October, 2001 Excite@Home filed for bankruptcy. InfoSpace bought Excite from bankruptcy court for $10 million.

Web Directories:

VLib:

When Tim Berners-Lee set up the web he created the Virtual Library, which became a loose confederation of topical experts maintaining relevant topical link lists.

EINet Galaxy

The EINet Galaxy web directory was born in January of 1994. It was organized similarly to how web directories are organized today. The biggest reason EINet Galaxy became a success was that it also contained Gopher and Telnet search features in addition to its web search feature. The size of the web in early 1994 did not really require a web directory; however, other directories soon followed.

Yahoo! Directory

In April 1994 David Filo and Jerry Yang created the Yahoo! Directory as a collection of their favorite web pages. As their number of links grew they had to reorganize and become a searchable directory. What set the directories apart from The Wanderer was that they provided a human-compiled description with each URL. As time passed and the Yahoo! Directory grew, Yahoo! began charging commercial sites for inclusion, and the inclusion rates for listing a commercial site kept increasing. The current cost is $299 per year. Many informational sites are still added to the Yahoo! Directory for free.

Open Directory Project

In 1998 Rich Skrenta and a small group of friends created the Open Directory Project, which is a directory anybody can download and use in whole or in part. The ODP (also known as DMOZ) is the largest internet directory, almost entirely run by a group of volunteer editors. The Open Directory Project grew out of the frustration webmasters faced waiting to be included in the Yahoo! Directory. Netscape bought the Open Directory Project in November, 1998. Later that same month AOL announced its intention to buy Netscape in a $4.5 billion all-stock deal.

LII

Google offers a librarian newsletter to help librarians and other web editors make information more accessible and categorize the web. The second Google librarian newsletter came from Karen G. Schneider, director of the Librarians' Internet Index. LII is a high quality directory aimed at librarians. Her article explains what she and her staff look for when evaluating quality, credible resources to add to the LII. Most other directories, especially those with a paid inclusion option, hold lower standards than the selective, limited catalogs created by librarians.
The Internet Public Library is another well kept directory of websites.

Business.com

Due to the time-intensive nature of running a directory, and the general lack of scalability of the business model, the quality and size of directories drop off sharply once you get past the first half dozen or so general directories. There are also numerous smaller directories oriented around an industry, vertical, or locality. Business.com, for example, is a directory of business websites.

Looksmart

Looksmart was founded in 1995. They competed with the Yahoo! Directory, the two frequently raising their inclusion rates back and forth. In 2002 Looksmart transitioned into a pay-per-click provider, which charged listed sites a flat fee per click. That caused the demise of any good faith or loyalty they had built up, although it allowed them to profit by syndicating those paid listings to some major portals like MSN. The problem was that Looksmart became too dependent on MSN, and in 2003, when Microsoft announced they were dumping Looksmart, it basically killed their business model.
In March of 2002, Looksmart bought a search engine by the name of WiseNut, but it never gained traction. Looksmart also owns a catalog of content articles organized into vertical sites, but due to limited relevancy Looksmart has lost most (if not all) of its momentum. Looksmart also tried to expand their directory by buying the non-commercial Zeal directory for $20 million, but on March 28, 2006 Looksmart shut down the Zeal directory, hoping to drive traffic using Furl, a social bookmarking program.

Search Engines vs Directories:

All major search engines have some limited editorial review process, but the bulk of relevancy at major search engines is driven by automated search algorithms which harness the power of the link graph on the web. In fact, some algorithms, such as TrustRank, bias the web graph toward trusted seed sites without requiring a search engine to take on much of an editorial review staff. Thus, some of the more elegant search engines allow those who link to other sites to in essence vote with their links as the editorial reviewers.
Unlike highly automated search engines, directories are manually compiled taxonomies of websites. Directories are far more cost and time intensive to maintain due to their lack of scalability and the necessary human input to create each listing and periodically check the quality of the listed websites.
General directories are largely giving way to expert vertical directories, temporal news sites (like blogs), and social bookmarking sites (like del.icio.us). In addition, each of those three publishing formats also aids in improving the relevancy of major search engines, which further cuts at the need for (and profitability of) general directories.

WebCrawler:

Brian Pinkerton of the University of Washington released WebCrawler on April 20, 1994. It was the first crawler which indexed entire pages. Soon it became so popular that during daytime hours it could not be used. AOL eventually purchased WebCrawler and ran it on their network. Then in 1997, Excite bought out WebCrawler, and AOL began using Excite to power its NetFind. WebCrawler opened the door for many other services to follow suit. Within a year of its debut came Lycos, Infoseek, and OpenText.

Lycos:

Lycos was the next major search development, having been designed at Carnegie Mellon University around July of 1994. Michael Mauldin was responsible for this search engine and went on to serve as chief scientist at Lycos Inc.
On July 20, 1994, Lycos went public with a catalog of 54,000 documents. In addition to providing ranked relevance retrieval, Lycos provided prefix matching and word proximity bonuses. But Lycos' main difference was the sheer size of its catalog: by August 1994, Lycos had identified 394,000 documents; by January 1995, the catalog had reached 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents -- more than any other Web search engine. In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word 'surf.'

Infoseek:

Infoseek also started out in 1994, claiming to have been founded in January. They really did not bring a whole lot of innovation to the table, but they offered a few add-ons, and in December 1995 they convinced Netscape to use them as its default search, which gave them major exposure. One popular feature of Infoseek was allowing webmasters to submit a page to the search index in real time, which was a search spammer's paradise.

AltaVista:

AltaVista's online debut came during that same month. AltaVista brought many important features to the web scene. They had nearly unlimited bandwidth (for that time), were the first to allow natural language queries and advanced searching techniques, and allowed users to add or delete their own URL within 24 hours. They even allowed inbound link checking. AltaVista also provided numerous search tips and advanced search features.
Due to mismanagement, a fear of result manipulation, and portal-related clutter, AltaVista was largely driven into irrelevancy around the time Inktomi and Google started becoming popular. On February 18, 2003, Overture signed a letter of intent to buy AltaVista for $80 million in stock and $60 million cash. After Yahoo! bought out Overture they rolled some of the AltaVista technology into Yahoo! Search, and occasionally use AltaVista as a testing platform.

Inktomi:

The Inktomi Corporation came about on May 20, 1996 with its search engine HotBot. Two Cal Berkeley cohorts created Inktomi from the improved technology gained from their research. HotWired listed this site and it quickly became hugely popular.
In October of 2001 Danny Sullivan wrote an article titled Inktomi Spam Database Left Open To Public, which highlights how Inktomi accidentally allowed the public to access their database of spam sites, which listed over 1 million URLs at that time.
Although Inktomi pioneered the paid inclusion model, it was nowhere near as efficient as the pay-per-click auction model developed by Overture. Licensing their search results also was not profitable enough to pay for their scaling costs. They failed to develop a profitable business model, and sold out to Yahoo! for approximately $235 million, or $1.65 a share, in December of 2002.

Ask.com (Formerly Ask Jeeves):

In April of 1997 Ask Jeeves was launched as a natural language search engine. Ask Jeeves used human editors to try to match search queries. Ask was powered by DirectHit for a while, which aimed to rank results based on their popularity, but that technology proved too easy to spam to serve as the core algorithmic component. In 2000 the Teoma search engine was released, which uses clustering to organize sites by Subject Specific Popularity, which is another way of saying they tried to find local web communities. In 2001 Ask Jeeves bought Teoma to replace the DirectHit search technology.
Jon Kleinberg's Authoritative sources in a hyperlinked environment [PDF] was a source of inspiration that led to the eventual creation of Teoma. Mike Grehan's Topic Distillation [PDF] also explains how subject-specific popularity works.
On March 4, 2004, Ask Jeeves agreed to acquire Interactive Search Holdings for 9.3 million shares of common stock and options, and to pay $150 million in cash. On March 21, 2005 Barry Diller's IAC agreed to acquire Ask Jeeves for $1.85 billion. IAC owns many popular websites like Match.com, Ticketmaster.com, and Citysearch.com, and is promoting Ask across their other properties. In 2006 Ask Jeeves was renamed Ask, and they killed the separate Teoma brand.

AllTheWeb

AllTheWeb was a search technology platform launched in May of 1999 to showcase Fast's search technologies. They had a sleek user interface with rich advanced search features, but on February 23, 2003, AllTheWeb was bought by Overture for $70 million. After Yahoo! bought out Overture they rolled some of the AllTheWeb technology into Yahoo! Search, and occasionally use AllTheWeb as a testing platform.

Meta Search Engines

Most meta search engines draw their search results from multiple other search engines, then combine and rerank those results. This was a useful feature back when search engines were less savvy at crawling the web and each engine had a significantly unique index. As search has improved the need for meta search engines has been reduced.
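A crude sketch of that combine-and-rerank step, using made-up engines and result lists and a simple reciprocal-rank scoring rule; real meta search engines use more involved fusion methods.

    # Toy meta search: merge ranked result lists from several engines and
    # rerank by summed reciprocal rank. Engine names and results are made up.
    from collections import defaultdict

    results_by_engine = {
        "engine_a": ["example.com/a", "example.com/b", "example.com/c"],
        "engine_b": ["example.com/b", "example.com/d", "example.com/a"],
    }

    scores = defaultdict(float)
    for ranking in results_by_engine.values():
        for position, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / position     # earlier positions earn more credit

    merged = sorted(scores, key=scores.get, reverse=True)
    print(merged)   # example.com/b and example.com/a float to the top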
HotBot was owned by Wired, and had funky colors, fast results, and a cool name that sounded geeky, but it died off not long after Lycos bought it and ignored it. It was later reborn as a meta search engine. Unlike most meta search engines, HotBot only pulls results from one search engine at a time, but it allows searchers to select among a few of the more popular search engines on the web. Currently Dogpile, owned by Infospace, is probably the most popular meta search engine on the market, but like all other meta search engines, it has limited market share.
One of the larger problems with meta search in general is that most meta search engines tend to mix pay per click ads in their organic search results, and for some commercial queries 70% or more of the search results may be paid results. I also created Myriad Search, which is a free open source meta search engine without ads.

Vertical Search

The major search engines are fighting for content and marketshare in verticals outside of the core algorithmic search product. For example, both Yahoo and MSN have question answering services where humans answer each other's questions for free. Google has a similar offering, but question answerers are paid for their work.
Google, Yahoo, and MSN are also fighting to become the default video platform on the web, which is a vertical where an upstart named YouTube also has a strong position.
Yahoo and Microsoft are aligned on book search in a group called the Open Content Alliance. Google, going it alone in that vertical, offers a proprietary Google Book search.
All three major search engines provide a news search service. Yahoo! has partnered with some premium providers to allow subscribers to include that content in their news search results. Google has partnered with the AP and a number of other news sources to extend their news database back over 200 years. And Topix.net is a popular news service which sold 75% of its ownership to 3 of the largest newspaper companies. Thousands of weblogs are updated daily reporting the news, some of which are competing with (and beating out) the mainstream media. If that were not enough options for news, social bookmarking sites like Del.icio.us frequently update recently popular lists, there are meme trackers like Techmeme that track the spread of stories through blogs, and sites like Digg allow their users to directly vote on how much exposure each item gets.
Google also has a Scholar search program which aims to make scholarly research easier to do.
In some verticals, like shopping search, other third party players may have significant marketshare, gained through offline distribution and branding (for example, yellow pages companies), or gained largely through arbitraging traffic streams from the major search engines.
On November 15, 2005 Google launched a product called Google Base, which is a database of just about anything imaginable. Users can upload items and title, describe, and tag them as they see fit. Based on usage statistics this tool can help Google understand which vertical search products they should create or place more emphasis on. They believe that owning other verticals will allow them to drive more traffic back to their core search service. They also believe that targeted measured advertising associated with search can be carried over to other mediums. For example, Google bought dMarc, a radio ad placement firm. Yahoo! has also tried to extend their reach by buying other high traffic properties, like the photo sharing site Flickr, and the social bookmarking site del.icio.us.
After a couple years of testing, on May 5th, 2010 Google unveiled a 3 column search result layout which highlights many vertical search options in the left rail.

Search Engine Marketing

Search engine marketing is marketing via search engines, done through organic search engine optimization, paid search engine advertising, and paid inclusion programs.

Paid Inclusion

As mentioned earlier, many general web directories charge a one time flat fee or annually recurring rate for listing commercial sites. Many shopping search engines charge a flat cost per click rate to be included in their databases.
As far as major search engines go, Inktomi popularized the paid inclusion model. They were bought out by Yahoo! in December of 2002. After Yahoo! dropped Google and rolled out their own search technology they continued to offer a paid inclusion program to list sites in their regular search results, but Yahoo! Search Submit was ended at the end of 2009.

Pay Per Click

Pay per click ads allow search engines to sell targeted traffic to advertisers on a cost per click basis. Typically pay per click ads are keyword targeted, but in some cases, some engines may also add in local targeting, behavioral targeting, or allow merchants to bid on traffic streams based on demographics as well.
Pay per click ads are typically sold in an auction where the highest bidder ranks #1 for that keyword. Some engines, like Google and Microsoft, also factor ad clickthrough rate into the click cost. Doing so ensures their ads get clicked on more frequently, and that their advertisements are more relevant. A merchant who writes compelling ad copy and gets a high CTR will be allowed to pay less per click to receive traffic.
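A small worked example of that ranking rule, assuming ad rank is simply bid multiplied by clickthrough rate; the real formulas also fold in quality factors, and the bids and CTRs below are invented.

    # Toy ad auction: rank ads by bid x CTR rather than bid alone.
    # Bids and clickthrough rates are invented for illustration.
    ads = [
        {"advertiser": "A", "bid": 2.00, "ctr": 0.010},
        {"advertiser": "B", "bid": 1.00, "ctr": 0.030},
        {"advertiser": "C", "bid": 1.50, "ctr": 0.015},
    ]

    for ad in ads:
        ad["ad_rank"] = ad["bid"] * ad["ctr"]

    # B wins despite the lowest bid, because its compelling ad earns more clicks.
    for ad in sorted(ads, key=lambda a: a["ad_rank"], reverse=True):
        print(ad["advertiser"], round(ad["ad_rank"], 4))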
In 1996 an 18-year-old college dropout named Scott Banister came up with the idea of charging search advertisers by the click with ads tied to the search keyword. He promoted it to the likes of Yahoo!, but their (lack of) vision was corrupted by easy money, so they couldn't see the potential of search. The person who finally ran with Mr. Banister's idea was IdeaLab's Bill Gross.

Overture (Formerly GoTo)

Overture, the pioneer in paid search, was originally launched by Bill Gross under the name GoTo in 1998. His idea was to arbitrage traffic streams and sell them with a level of accountability. John Battelle's The Search has an entertaining section about Bill Gross and the formation of Overture. John also published that section on his blog.
“The more I [thought about it], the more I realized that the true value of the Internet was in its accountability,” Gross tells me. “Performance guarantees had to be the model for paying for media.”
Gross knew offering virtually risk-free clicks in an overheated and ravenous market ensured GoTo would take off. And while it would be easy to claim that GoTo worked because of the Internet bubble’s ouroboros-like hunger for traffic, the company managed to outlast the bust for one simple reason: it worked.
While Overture was wildly successful, it had two major downfalls which prevented them from taking Google's market position:
  • Destination Branding: Google allowed itself to grow into a search destination. Bill Gross decided not to grow Overture into one because he feared that would cost him distribution partnerships. When AOL selected Google as an ad partner, in spite of Google also growing out its own brand, that pretty much put the nail in the coffin for Overture being the premiere search ad platform.
  • Ad Network Efficiency: Google AdWords factors ad clickthrough rate into their ad costs, which ensures higher relevancy and more ad network efficiency. As of September 2006 the Overture platform (then known as Yahoo! Search Marketing) still did not fix that problem.
Those two faults meant that Overture was heavily reliant on its two largest distribution partners - Yahoo! and Microsoft. Overture bought out AltaVista and AllTheWeb to try to win some leverage, but ultimately they sold out to Yahoo! on July 14, 2003 for $1.63 billion.

Google AdWords

Google AdWords launched in 2000. The initial version was a failure because it priced ads on a flat CPM model. Some keywords were overpriced and unaffordable, while others were sold inefficiently at too cheap of a price. In February of 2002, Google relaunched AdWords selling the ads in an auction similar to Overture's, but also adding ad clickthrough rate in as a factor in the ad rankings.
Affiliates and other web entrepreneurs quickly took to AdWords because the precise targeting and great reach made it easy to make great profits from the comfort of your own home, while sitting in your underwear :)
Over time, as AdWords became more popular and more mainstream marketers adopted it, Google began closing some holes in their AdWords product. For example, to fight off noise and keep their ads as relevant as possible, they disallowed double serving of ads to one website. Later they started looking at landing page quality and establishing quality based minimum pricing, which squeezed the margins of many small arbitrage and affiliate players.
Google intends to take the trackable ad targeting allowed by AdWords and extend it into other mediums. Google has already tested print and newspaper ads. Google allows advertisers to buy graphic or video ads on content websites. On January 17, 2006, Google announced they bought dMarc Broadcasting, which is a company they will use to help Google sell targeted radio ads.
On September 15, 2006, Google partnered with Intuit to allow small businesses using QuickBooks to buy AdWords from within QuickBooks. The goal is to help make local ads more relevant by getting more small businesses to use AdWords.
On March 20, 2007, Google announced they were beta testing creating a distributed pay per action affiliate ad network. On April 13, 2007 Google announced the purchase of DoubleClick for $3.1 billion.

Google AdSense

On March 4, 2003 Google announced their content targeted ad network. In April 2003, Google bought Applied Semantics, which had CIRCA technology that allowed them to drastically improve the targeting of those ads. Google adopted the name AdSense for the new ad program.
AdSense allows web publishers large and small to automate the placement of relevant ads on their content. Google initially started off by allowing textual ads in numerous formats, but eventually added image ads and video ads. Advertisers could choose which keywords they wanted to target and which ad formats they wanted to use.
To help grow the network and make the market more efficient, Google added a link which allows advertisers to sign up for an AdWords account from content websites, and Google allowed advertisers to buy ads targeted to specific websites, pages, or demographic categories. Ads targeted to websites are sold on a cost per thousand impression (CPM) basis in an ad auction against other keyword-targeted and site-targeted ads.
Google also allows some publishers to place AdSense ads in their feeds, and some select publishers can place ads in emails.
To prevent the erosion of the value of search ads Google allows advertisers to opt out of placing their ads on content sites, and Google also introduced what they called smart pricing. Smart pricing automatically adjusts the click cost of an ad based on what Google perceives a click from that page to be worth. A click from a digital camera review page would typically be worth more than a click from a page of pictures.
Google was secretive about its revenue share since the inception of AdSense, but due to a lawsuit in Italy Google feared they would be stuck disclosing their revenue share, so they decided to do so publicly for good public relations on May 24, 2010. Google keeps 32% while giving publishers 68% of contextual ad revenues. On search ads Google keeps 49% and gives publishers 51%. Some premium publishers are able to negotiate higher rates & custom integration options as well.

Yahoo! Search Marketing

Yahoo! Search Marketing is the rebranded name for Overture after Yahoo! bought them out. As of September 2006 their platform was generally the same as the old Overture platform, with the same flaws: ad CTR is not factored into click cost, it is hard to run local ads, and it is just generally clunky.

Microsoft AdCenter

In 2000 Microsoft launched a keyword driven ad program called keywords, but shut it down after 2 months because they feared it would cannibalize their banner ad revenues.
Microsoft AdCenter was launched on May 3, 2006. While Microsoft has limited marketshare, they intend to increase it by baking search into Internet Explorer 7. On the features front, Microsoft added demographic targeting and dayparting features to the pay-per-click mix. Microsoft's ad algorithm factors in both cost per click and ad clickthrough rate.
Microsoft also created the XBox game console, and on May 4, 2006 announced they bought a video game ad targeting firm named Massive Inc. Eventually video game ads will be sold from within Microsoft AdCenter.

Search Engine Optimization

What is SEO?

Search engine optimization is the art and science of publishing information in a format which will make search engines believe that your content satisfies the needs of their users for relevant search queries. SEO, like search, is a field much older than I am. In fact, it was not originally even named search engine optimization, and to this day most people are still uncertain where that phrase came from.

Early SEO

Early search engine optimization consisted mostly of using descriptive file names, page titles, and meta descriptions. As search advanced, on-the-page factors grew more important, and people started aiming for specific keyword densities.

Link Analysis

One of the big things that gave Google an advantage over their competitors was the introduction of PageRank, which graded the value of a page based on the number and quality of links pointing at it. Up until the end of 2003 search was exceptionally easy to manipulate. If you wanted to rank for something all you had to do was buy a few powerful links and place the words you wanted to rank for in the link anchor text.
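A minimal PageRank sketch over a made-up four page link graph; this is the textbook power-iteration form with an assumed damping factor, not Google's production implementation.

    # Simplified PageRank: repeatedly pass each page's score along its outlinks.
    # The link graph, damping factor, and iteration count are illustrative.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    damping = 0.85
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}

    for _ in range(50):
        new_rank = {}
        for page in pages:
            inbound = sum(rank[p] / len(links[p]) for p in pages if page in links[p])
            new_rank[page] = (1 - damping) / len(pages) + damping * inbound
        rank = new_rank

    for page in sorted(rank, key=rank.get, reverse=True):
        print(page, round(rank[page], 3))   # "c" ends up with the most authority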

Search Gets More Sophisticated

On November 15, 2003 Google began heavily introducing many more semantic elements into its search product. Researchers and SEOs alike noticed wild changes in search relevancy during that update and many times since, but most searchers remain oblivious to the changes.
Search engines would prefer to bias search results toward informational resources to make the commercial ads on the search results appear more appealing. You can see an example of how search can be biased toward commercial or informational resources by playing with Yahoo! Mindset.

Curbing Link Spam

On January 18, 2005, Google, MSN, and Yahoo! announced the release of a NoFollow tag which allows blog owners to block comment spam from passing link popularity. People continued to spam blogs and other resources, largely because search engines may still count some nofollow links, and largely because many of the pages they spammed still rank.
Since 2003 Google has come out with many advanced filters and crawling patterns to help make quality editorial links count more and to devalue many overtly obvious paid links and other forms of link manipulation.

Historical, Editorial, & Usage Data

Older websites may be given more trust in relevancy algorithms than newer websites (just existing for a period of time is a signal of quality). All major search engines use human editors to help review content quality and improve their relevancy algorithms. Search engines may factor in user acceptance and other usage data to help determine if a site needs to be reviewed for editorial quality and whether its linkage data is legitimate.
Google has also heavily pushed giving away useful software, tools, and services which allow them to personalize search results based on the searcher's historical preferences.

Self Reinforcing Market Positions

In many verticals search is self reinforcing, as in a winner take most battle. Jakob Nielsen's The Power of Defaults notes that the top search result is clicked on as often as 42% of the time. Not only is the distribution and traffic stream highly disproportionate, but many people tend to link to the results that were easy to find, which makes the system even more self reinforcing, as noted in Mike Grehan's Filthy Linking Rich.
A key thing to remember if you are trying to catch up with another website is that you have to do better than what was already done, and better by a wide enough margin that it is comment-worthy or citation-worthy. You have to make people want to switch their world view to seeing you as an authority on your topic. Search engines will follow what people think.

Hypocrisy in Search

Google engineer Matt Cutts frequently comments that any paid link should have the nofollow attribute applied to it, although Google hypocritically does not place the nofollow attribute on links they buy. They also have placed their ads on the leading Warez site and continued to serve ads on sites that they banned for spamming. Yahoo! Shopping has also been known to be a big link buyer.
Much of the current search research is based upon the view that any form of marketing / promotion / SEO is spam. If that was true, it wouldn't make sense that Google is teaching SEO courses, which they do.

Google


Early Years

Google's corporate history page has a pretty strong background on Google, starting from when Larry Page met Sergey Brin at Stanford in 1995 right up to the present day.
By January of 1996, Larry and Sergey had begun collaboration on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website. Larry, who had always enjoyed tinkering with machinery and had gained some notoriety for building a working printer out of Lego™ bricks, took on the task of creating a new kind of server environment that used low-end PCs instead of big expensive machines. Afflicted by the perennial shortage of cash common to graduate students everywhere, the pair took to haunting the department's loading docks in hopes of tracking down newly arrived computers that they could borrow for their network.
A year later, their unique approach to link analysis was earning BackRub a growing reputation among those who had seen it. Buzz about the new search technology began to build as word spread around campus.
BackRub ranked pages using citation analysis, a concept which is popular in academic circles. If someone cites a source they usually think it is important. On the web, links act as citations. In the PageRank algorithm links count as votes, but some votes count more than others. Your ability to rank and the strength of your ability to vote for others depend upon your authority: how many people link to you and how trustworthy those links are.
In 1998, Google was launched. Sergey tried to shop their PageRank technology, but nobody was interested in buying or licensing their search technology at that time.

Winning the Search War

Later that year Andy Bechtolsheim gave them $100,000 in seed funding, and Google received $25 million from Sequoia Capital and Kleiner Perkins Caufield & Byers the following year. In 1999 AOL selected Google as a search partner, and Yahoo! followed suit a year later. In 2000 Google also launched their popular Google Toolbar. Google has gained search market share year over year ever since.
In 2000 Google relaunched their AdWords program to sell ads on a CPM basis. In 2002 they retooled the service, selling ads in an auction which would factor in bid price and ad clickthrough rate. On May 1, 2002, AOL announced they would use Google to deliver their search related ads, which was a strong turning point in Google's battle against Overture.
In 2003 Google also launched their AdSense program, which allowed them to expand their ad network by selling targeted ads on other websites.

Going Public

Google used a two-class stock structure, decided not to give earnings guidance, and offered shares of their stock in a Dutch auction. They received virtually limitless negative press for the perceived hubris they expressed in their "AN OWNER'S MANUAL" FOR GOOGLE'S SHAREHOLDERS. After some controversy surrounding an interview in Playboy, Google dropped their IPO offer range to $85-$95 per share from $108-$135. Google went public at $85 a share on August 19, 2004, and its first trade was at 11:56 am ET at $100.01.

Verticals Galore!

In addition to running the world's most popular search service, Google also runs a large number of vertical search services, including news, video, book, scholarly research, local, and shopping search.

Just Search, We Promise!

Google's corporate mission statement is:
Google's mission is to organize the world's information and make it universally accessible and useful.
However that statement covers many things outside of the traditional mindset of search, and Google maintains that ads are themselves a type of information.

Paying for Distribution

In addition to having strong technology and a strong brand Google also pays for a significant portion of their search market share.
On December 20, 2005 Google invested $1 billion in AOL to continue their partnership and buy a 5% stake in AOL. In February 2006 Google agreed to pay Dell up to $1 billion for 3 years of toolbar distribution. On August 7, 2006, Google signed a 3 year deal to provide search on MySpace for $900 million. On October 9, 2006 Google bought YouTube, a leading video site, for $1.65 billion in stock.
Google also pays Mozilla and Opera hundreds of millions of dollars to be the default search provider in their browsers, bundles their Google Toolbar with software from Adobe and Sun Microsystems, and pays AdSense ad publishers $1 for Firefox + Google Toolbar installs, or up to $2 for Google Pack installs.
Google also builds brand exposure by placing Ads by Google on their AdSense ads and providing Google Checkout to commercial websites.
Google Pack is a package of useful software including a Google Toolbar and software from many other companies. At the same time Google helps ensure its toolbar is considered good and its competitors don't use sleazy distribution techniques by sponsoring StopBadware.org.
Google's distribution, vertical search products, and other portal elements give it a key advantage in best understanding our needs and wants by giving them the largest Database of Intentions.

Editorial Partnerships

Google has moved away from a pure algorithmic approach toward a hybrid editorial approach. In April of 2007, Google started mixing recent news results into their organic search results. After Google bought YouTube they started mixing videos directly into Google search results.

Webmaster Communication

Since the Florida update in 2003 Google has looked much deeper into linguistics and link filtering. Google's search results are generally the hardest search results for the average webmaster to manipulate.
Matt Cutts, Google's lead engineer in charge of search quality, regularly blogs about SEO and search. Google also has an official blog and has blogs specific to many of their vertical search products.
On November 10, 2004, Google opened up their Google Advertising Professional program.
Google also helps webmasters understand how Google is indexing their site via Google Webmaster Central. Google continues to add features and data to their webmaster console for registered webmasters while obfuscating publicly available data.
For an informal look at what working at Google looked like from the inside from 1999 to 2005 you might want to try Xooglers, a blog by former Google brand manager Doug Edwards.

Information Retrieval as a Game of Mind Control

In October of 2007 Google attempted to manipulate the public perception of people buying and selling links by announcing that they were going to penalize known link sellers, and then manually editing the toolbar PageRank scores of some well known blogs and other large sites. These PageRank edits did not change search engine rankings or traffic flows, as the PageRank update was entirely aesthetic.

Yahoo!


Getting Into Search

Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites. For many years they outsourced their search service to other providers, considering it secondary to their directory and other content features, but by the end of 2002 they realized the importance and value of search and started aggressively acquiring search companies.
Overture purchased AllTheWeb and AltaVista in 2003. Yahoo! purchased Inktomi in December, 2002, and then consumed Overture in July, 2003, and combined the technologies from the various search companies they bought to make a new search engine. Yahoo! dumped Google in favor of their own in house technology on February 17, 2004.

Getting Social

In addition to building out their core algorithmic search product, Yahoo! has largely favored the concept of social search.
On March 20, 2005 Yahoo! purchased Flickr, a popular photo sharing site. On December 9, 2005, Yahoo! purchased Del.icio.us, a social bookmarking site. Yahoo! has also made a strong push to promote Yahoo! Answers, a popular free community driven question answering service.
Yahoo! has a cool Netrospective of their first 10 years, a brief overview of their corporate history here, and Bill Slawski posted a list of many of the companies Yahoo! consumed since Overture.
On July 2, 2007, Yahoo! launched their behaviorally targeted SmartAds product.
On July 29, 2009, Yahoo! decided to give up on search and signed a 10 year deal to syndicate Bing ads and algorithmic results on their website.

Microsoft

In 1998 MSN Search was launched, but Microsoft did not get serious about search until after Google proved the business model. Until Microsoft saw the light they primarily relied on partners like Overture, Looksmart, and Inktomi to power their search service.
They launched their technology preview of their search engine around July 1st of 2004. They formally switched from Yahoo! organic search results to their own in house technology on January 31st, 2005. MSN announced they dumped Yahoo!'s search ad program on May 4th, 2006.
On September 11, 2006, Microsoft announced they were launching their Live Search product.
On June 1, 2009, Microsoft launched Bing, a new search service which changed the search landscape by placing inline search suggestions for related searches directly in the result set. For instance, when you search for credit cards they will suggest related phrases like
  • credit card types
  • apply for credit cards
  • credit cards for bad credit
  • advice on credit cards
Microsoft released a Bing SEO guide for Webmasters [PDF] which claimed that the additional keyword suggestions help spread search demand down to lower listed results, compared against the old results 6 through 10 in a single linear result set. Conversely, the Google format tends to concentrate attention on the top few search listings. After extensive eye tracking research, Gord Hotchkiss named this pattern Google's Golden Triangle.

Other Engines

One would be foolish to think that there is not a better way to index the web, and a new creative idea is probably just under our noses. The fact that Microsoft is making a large investment into developing a new search technology should be some cause for concern for other major search engines.
Through this course of history many smaller search engines have come and gone, as the search industry has struggled to find a balance between profitability and relevancy. Some of the newer search engine concepts are web site clustering, semantics, and industry-specific smaller search engines / portals, but search may get attacked from entirely different angles.
On October 5, 2004 Bill Gross (the founder of Overture and pioneer of paid search) relaunched Snap as a search engine with a completely transparent business model (showing search volumes, revenues, and advertisers). Snap has many advanced sorting features, but it may be a bit more than what most searchers are looking for. People tend to like search for its perceived simplicity, even if the behind-the-scenes process is quite complex.
Outside of technology, there are four other fronts from which search is being attacked or commoditized:
  • Browser & Software Distribution: Search companies are paying computer manufacturers or software companies an aggregated value of hundreds of millions or billions of dollars each year to bundle their search toolbar with their products.
  • Social Search: Large social networks have significant reach and a ton of page views. Yahoo! is rumored to be entertaining buying the social network Facebook for nearly a billion dollars. Yahoo! has already bought the social picture site Flickr and the social bookmarking site Del.icio.us. In August of 2006 Google signed a 3 year $900 million contract to provide search and advertising on MySpace.

    In addition some companies, like Eurekster, are trying to create products which allow groups of webmasters to make topic or community specific search services.
  • Content Providers: Some content providers are trying to publish content on their own domains and build off their brand. Some are refusing to be included in search indexes. Some are requiring a kickback to be indexed. Some are unsure of what they want and are choosing to sue search engines, either for further brand exposure, or to gain further negotiation leverage.
  • Content Aggregators: Search is just one way of finding information. Via RSS feeds and various other technologies many sites are offering what some people consider persistent search, or a way to access any information about a specific topic as it becomes available. Google also bought YouTube for $1.65 billion in stock. YouTube consists largely of pirated content which Google can organize and publish ads against based on usage data and other forms of ad targeting.

Search & Legal Issues

Privacy

In 2005 the DoJ obtained search data from AOL, MSN, and Yahoo!. Google denied the request, and was sued for search data in January of 2006. Google beat the lawsuit and was only required to hand over a small sample of data.
In August of 2006 AOL Research released over 3 months' worth of personal search data from 650,000 AOL users. A NYT article identified one of the searchers by name. In 2007 the European Union aggressively probed search companies, aiming to limit data retention and maintain searcher privacy rights.

Publishing & Copyright Lawsuits

As more people create content attention is becoming more scarce. Due to The Tragedy of the Commons many publishing businesses and business models will die. Many traditional publishing companies enjoyed the profits enabled by running what was essentially regionally based monopolies. Search, and other forms of online media, allow for better targeting and less wasteful / more efficient business models. Due to growing irrelevancy, a fear of change, and a fear of disintermediation, many traditional publishing companies have fought search.
In an interview with Danny Sullivan, Eric Schmidt stated he thought many of the lawsuits Google faces are business deals done in a courtroom.

Newspapers

In September of 2006 some Belgian newspaper companies won a copyright lawsuit against Google News, one which makes Belgian judges look like they do not understand how search or the internet work. Some publisher groups are trying to create an arbitrary information access protocol, Agence France Presse (AFP) sued Google to get them to drop their news coverage, and Google paid the AP a licensing fee.

Books

In September of 2005 the Authors Guild sued Google. In October of 2005 major book publishing houses also sued Google over Google Print.

Photos

Perfect 10, a pornography company, sued Google for including cached copies of stolen content in their image index, and for allowing publishers to collect income on stolen copyright content via Google AdSense.

Access to Hate Information

In May of 2000 a French judge required Yahoo! to stop providing access to auctions selling Nazi memorabilia.
Many requests for information removal are covered on Chilling Effects and by the EFF. Eric Goldman tracks these cases as well.

Pay Per Click & Ad Targeting Lawsuits

In 1999 Playboy sued Excite and Netscape for selling banner ad impressions against searches for Playboy.
Overture sued Google for patent infringement. Just prior to Google's IPO they settled with Yahoo! (who by then bought out Overture) by giving them 2.7 million shares of class A Google stock.
Geico took Google to court in the US for trademark violation because Google allowed Geico's trademark to be used as a keyword trigger for competing ads. Geico lost this case on December 15, 2004. Around the same time Google lost a similar French trademark case filed by Louis Vuitton.
Lane's Gifts sued Google for click fraud, but did not have a strong well put together case. Google's lawyers pushed them into a class wide out of court settlement of up to $90 million in AdWords credits. The March 2006 settlement aimed to absolve Google of any clickfraud related liabilities back through 2002, when Google launched their pay per click model.

Search User Information

The US government requested that major search companies turn over a significant amount of search related data. Yahoo!, MSN, and AOL gave up search data. The Google blog announced that Google fought the subpoena:
In August, Google was served with a subpoena from the U. S. Department of Justice demanding disclosure of two full months’ worth of search queries that Google received from its users, as well as all the URLs in Google’s index.
A judge stated that Google did not have to turn over search usage data.
AOL not only shared information with the government, but AOL research also accidentally made search records public record.

Search as a Commoditizer

Each search company has its own business objectives and technologies to define relevancy. The three biggest issues search engines are fighting over are:
  • Publishing Rights: All search engines are fighting to gain the rights to index quality content. Some of the highest quality content is so expensive to create and market that there is not a business model for openly sharing it.
