Sunday, September 30, 2007

Findory to shut down November 1

Update: The shut down of Findory has been rescheduled to occur on November 22, 2007.

Update: The shut down of Findory has been postponed. The website will remain active past November 1, 2007. More information when I can share it.

Findory will be shutting down on November 1, 2007. I posted the following on the site:
Thank you, Findory readers, for the last four years. I hope you not only found interesting news on Findory you otherwise would have missed, but also, just by using Findory, took pleasure in helping others find news they might enjoy.

Sadly, all good things must come to an end. On November 1, 2007, Findory will be shutting down.

It was a wonderful experiment building Findory. Information personalization is in our future.

Some day, online newspapers will focus on your interests, building you your own unique, customized front page of news. Some day, search engines will learn from what you do to help you find what you need. Some day, your computer will adapt to your needs to help with the task at hand. Some day, information overload will be tamed.

But not today. Findory will be shutting down on November 1. The website will no longer carry news, blogs, videos, podcasts, or favorites. The daily e-mails will cease. To ease the transition for users of Findory Inline and the Findory API, empty feeds will be served for a couple weeks into November.

I am sorry to see Findory go. Though Findory will not be the one to deliver it, I continue to believe that the future will be personalized.

- Greg Linden, Founder, Findory.com
Please also see my earlier post, "Findory rides into the sunset".

KFS, a Google FS clone

Rich Skrenta discusses and endorses Kosmix's KFS, an alpha-stage Google File System clone.

There appear to be a few of these early stage clones, the biggest being the Yahoo-supported Hadoop.

The Yahoo train wreck?

Analyst Henry Blodget calls Yahoo's declining Comscore numbers "an absolute disaster", says the company is a "train wreck", and predicts that "if the company can't reverse this trend in short order, its only hope will be to sell itself."

Yowsers. It's not all that bad, is it?

Like Microsoft, Yahoo may have been a feeble competitor lately, often appearing distracted, slow to react, and unable to do more than follow in most areas.

Yet, like Microsoft, Yahoo remains a giant in the field. In fact, not only does Yahoo have second-place market share in many Web products, but also, according to at least one study, Yahoo has the strongest search brand around.

Yahoo's biggest problem at the moment seems to be that they try to do too much and end up doing nothing well. Even in their core business of advertising, they are playing second fiddle to Google. There's no reason for people to use the second best when switching costs are so low, but Yahoo seems bizarrely content to stay behind the leader even as they see their audience trickling away.

There are as many opinions on what Yahoo should focus on as Yahoo has products, but, I have to say, I am amazed Yahoo has not done more with personalization.

Yahoo has hundreds of millions of signed in users with long histories of what they wanted and enjoyed. Rather than chasing Google's tail, they could lead in personalized advertising, search, e-mail, and news. Yahoo could use their knowledge of what their users need to focus attention, surface relevant information, and be as helpful as possible.

It is something Yahoo could do better than anyone else and would make Yahoo different. As Jeff Bezos used to say about Amazon, when shoppers try other stores, the experience should seem "hollow and pathetic". "Why doesn't this store know me?", shoppers should ask. "Why doesn't it know what I want?"

Yahoo should be the same way. Yahoo should know you. Elsewhere, the experience should feel vaguely unpleasant, like jumping from talking with your friends to being alone in a group of strangers. "Why is this site showing me that?", you should ask. "Doesn't it know I don't like that?"

Please also see my earlier post, "Yahoo post-Semel and the long road ahead".

Please also see my May 2006 post, "Yahoo home page cries out for personalization".

Friday, September 28, 2007

Starting Findory: Infrastructure and scaling

[After a long break, I am returning to my "Starting Findory" series, a group of posts about my experience starting and building Findory.]

From the beginning of Findory, I was obsessed with performance and scaling.

The problem with personalization is that it breaks the most common strategy for scaling: caching. When every visitor sees a different page, there are far fewer good opportunities to cache.

No longer can you just grab the page you served the last guy and serve it up again. With personalization, every time someone asks for a page, you have to serve it up fresh.

But you can't just serve up any old content fresh. With personalization, when a visitor asks for a web page, first you need to ask, who is this person and what do they like? Then, you need to ask, what do I have that they might like?

So, when someone comes to your personalized site, you need to load everything you need to know about them, find all the content that that person might like, rank and layout that content, and serve up a pipin' hot page. All while the customer is waiting.

Findory works hard to do all that quickly, almost always in well under 100ms. Time is money, after all, both in terms of customer satisfaction and the number of servers Findory has to pay for.

The way Findory does this is that it pre-computes as much of the expensive personalization as it can. Much of the task of matching interests to content is moved to an offline batch process. The online task of personalization, the part while the user is waiting, is reduced to a few thousand data lookups.

Even a few thousand database accesses could be prohibitive given the time constraints. However, much of the content and pre-computed data is effectively read-only data. Findory replicates the read-only data out to its webservers, making these thousands of lookups lightning fast local accesses.
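
To make that concrete, here is a rough sketch of what such an online path might look like. It is illustrative only, not Findory's actual code; the data stores and method names are hypothetical.

    from collections import defaultdict

    def build_personalized_page(user_id, history_db, related_store, max_items=30):
        # Read-write data: the reader's history (hypothetical MySQL-backed store).
        history = history_db.get_recent_article_ids(user_id)

        # Read-only data: precomputed "related articles" lists, replicated to
        # each webserver so these thousands of lookups are fast local accesses.
        scores = defaultdict(float)
        for article_id in history:
            for related_id, weight in related_store.get_related(article_id):
                if related_id not in history:
                    scores[related_id] += weight

        # Rank the candidates and lay out the page, all while the reader waits.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return [article_id for article_id, _ in ranked[:max_items]]

All the expensive work of deciding which articles are related to which happens in the offline batch process; the online code is just lookups, a merge, and a sort.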

Read-write data, such as each reader's history on Findory, is in MySQL. MyISAM works well for this task since the data is not critical and speed is more important than transaction support.

The read-write user data in MySQL can be partitioned by user id, making the database trivially scalable. The online personalization task scales independently of the number of Findory users. Only the offline batch process faced any issue of scaling as Findory grew, but that batch process can be done in parallel.
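
To sketch the partitioning idea, picking a shard by user id can be as simple as a hash or modulo; the shard count and naming below are made up.

    NUM_SHARDS = 8  # hypothetical number of user-data databases

    def shard_for_user(user_id: int) -> str:
        # Every request for a user's read-write data touches exactly one shard,
        # so capacity grows by adding shards.
        return "userdata%02d" % (user_id % NUM_SHARDS)

    # e.g. shard_for_user(123457) -> "userdata01"

Plain modulo sharding makes adding shards later painful, so a real system might prefer a lookup directory or consistent hashing, but the principle is the same.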

In the end, it is blazingly fast. Readers receive fully personalized pages in under 100ms. As they read new articles, the page changes immediately, no delay. It all just works.

Even so, I wonder if I have been too focused on scaling and performance. For example, some features in the crawl, search engine, history, API, and Findory Favorites were not implemented because of concerns about how they might scale. That may have been foolish.

The architecture, the software, the hardware cluster, these are just tools. They serve a purpose, to help users, and have little value on their own. A company should focus on users first and infrastructure second. Despite the success in the design of the core personalization engine, perhaps I was too focused on keeping performance high and avoiding scaling traps when I should have been giving readers new features they wanted.

Please see also a post on O'Reilly Radar, "Database War Stories #8: Findory and Amazon".

If you have not seen them already, please see also the other posts in the "Starting Findory" series.

Wednesday, September 26, 2007

Reputation from edits and edit reversal

An enjoyable WWW 2007 paper out of UC Santa Cruz, "A Content-Driven Reputation System for the Wikipedia" (PDF), builds on a great but simple idea: High quality authors usually do not have their Wikipedia edits reversed.

From the paper:
In our system, authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors, and they lose reputation when their edits are rolled back or undone in short order.

Most reputation systems are user-driven: they are based on users rating each other's contributions or behavior ... In contrast, ... [our] system ... requires no user input ... authors are evaluated on the basis of how their contributions fare.

A content-driven reputation system has an intrinsic objectivity advantage over user-driven reputation systems. In order to badmouth (damage the reputation of) author B, an author A cannot simply give a negative rating to a contribution by B. Rather, to discredit B, A needs to undo some contribution of B, thus running the risk that if subsequent authors restore B's contribution, it will be A's reputation, rather than B's, to suffer, as A's edit is reversed. Likewise, authors cannot simply praise each other's contributions to enhance their reputations: their contributions must actually withstand the test of time.
A fun demo of the technique is available that colors the text of some Wikipedia articles based on the reputation of the authors, providing some measure of how trustworthy particular passages of text might be.
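
To make the idea concrete, here is a toy sketch of content-driven reputation. It is not the paper's algorithm, which carefully measures how much of each edit's text survives in later revisions; this version just rewards edits that are still present a few revisions later and penalizes edits that have been undone.

    def score_authors(revisions, lookahead=3, gain=1.0, penalty=2.0):
        """revisions: list of (author, set_of_sentence_hashes) in time order."""
        reputation = {}
        for i, (author, content) in enumerate(revisions):
            added = content if i == 0 else content - revisions[i - 1][1]
            j = min(i + lookahead, len(revisions) - 1)
            if not added or j == i:
                continue
            # Fraction of this author's additions still present a few revisions later.
            survived = len(added & revisions[j][1]) / len(added)
            # Preserved edits raise reputation; reverted edits lower it more.
            delta = gain * survived - penalty * (1.0 - survived)
            reputation[author] = reputation.get(author, 0.0) + delta
        return reputation

The paper's version is per-edit and text-based rather than per-revision, but the incentive is the same: contributions have to withstand the test of time.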

It is curious how this simple but clever technique seems less susceptible to gaming. I was trying to think of ways the system could be manipulated -- Would people retaliate for having their edits reversed? Would they make lots of non-controversial but useless edits to increase their reputation? -- but these and other obvious attacks seem like they might have a fairly high risk of damaging your own reputation as people caught on and reversed the changes.

I also was trying to think how this might be applied elsewhere. For example, on eBay, rather than have sellers and buyers rate each other with inane things like "A++++!", perhaps eBay seller reputation could be determined instead by how often the transaction is reversed? What if eBay implemented a 30-day unconditional return policy on all transactions, then reported buyer reputation based on payment rate and seller reputation based on return rate?

Netflix and the KDD Cup 2007

There is an interesting group of papers in the KDD Cup 2007 on the Netflix Prize recommender contest. The full papers are particularly good and detail techniques that are doing well in the contest, but also don't miss the much lighter introductory paper, "The Netflix Prize" (PDF), and its discussion and charts of the progress on the prize.

On a related note, you might also be interested in Simon Funk's enjoyable write-ups ([1] [2] [3]) on his explorations into using SVD for the Netflix contest. I particularly liked his focus on the speed of and twiddles to SVD.
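
For readers who have not seen Funk's write-ups, here is a rough sketch in the spirit of his incremental SVD: learn one latent factor at a time by stochastic gradient descent on the known ratings. The learning rate, regularization, and factor count below are illustrative, not his actual settings.

    import numpy as np

    def funk_svd(ratings, num_users, num_items, k=10, lrate=0.001, reg=0.02, epochs=30):
        """ratings: list of (user, item, rating) triples."""
        U = np.full((num_users, k), 0.1)
        V = np.full((num_items, k), 0.1)
        for f in range(k):                     # train one factor at a time
            for _ in range(epochs):
                for u, i, r in ratings:
                    err = r - U[u].dot(V[i])   # residual under the current model
                    uf, vf = U[u, f], V[i, f]
                    U[u, f] += lrate * (err * vf - reg * uf)
                    V[i, f] += lrate * (err * uf - reg * vf)
        return U, V  # predict a rating with U[u].dot(V[i])

Much of what Funk describes is making this fast and robust -- caching residuals, clamping predictions to the rating scale, and similar twiddles -- which is where the interesting engineering lives.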

By the way, I am no longer playing with the contest, but I should admit that I never got anywhere near the performance of these contenders. For more on that, see my post, "Latest on the Netflix prize". It is interesting that my prediction in that post -- that additional data on the movies from another source would be necessary -- has so far turned out to be wrong.

Microsoft Research Beyond Search grants

Microsoft Research is offering $1M in research grants for "research in the area of Semantic Computing, Internet Economics and Online Advertising".

A juicy dataset is provided for that research, a "Microsoft adCenter Search query log excerpt with 100 million search queries along with ad click logs sampled over a few months, and a Live Search query log excerpt with 15 million search queries with per-query search result clickthroughs."

Sadly, the grants and access to the data are open only to academic research groups. I suppose that is to be expected after the ruckus that followed the now-defunct AOL Research group's naive attempt to offer their search logs more widely. I guess non-academics will just have to buy search logs from ISPs.

By the way, I thought it was amusing, when looking over the terms of Microsoft's request for proposals, to see that participants are discouraged from "relying exclusively on non-Microsoft technologies." Ah, yes, what good is research if it doesn't use Microsoft products?

[Found via ResourceShelf]

Friday, September 21, 2007

Hotmap and map attention data

Hotmap is a fun demo that shows a heat map of what areas people are viewing frequently on Microsoft Live Search Maps.

From the About page:
Hotmap shows where people have looked at when using Virtual Earth, the engine that powers Live Search Maps: the darker a point, the more times it has been downloaded.
It is a pretty cool idea. The heat maps clearly focus on high population areas, roads, coastlines, rivers, country borders, and other items of interest.
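
The underlying computation seems simple enough. Here is a toy sketch of the idea: count how often each tile is requested in the logs, then map the counts to brightness on a log scale. The log format and field names here are made up.

    import math
    from collections import Counter

    def tile_heat(log_lines):
        counts = Counter()
        for line in log_lines:
            # e.g. "zoom=12 x=654 y=1583" -- hypothetical tile request log line
            fields = dict(kv.split("=") for kv in line.split())
            counts[(int(fields["zoom"]), int(fields["x"]), int(fields["y"]))] += 1
        max_count = max(counts.values())
        # Normalize on a log scale: the darker the point, the more downloads.
        return {tile: math.log1p(c) / math.log1p(max_count)
                for tile, c in counts.items()}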

It seems like there could be a bunch of unusual and useful applications here, especially if you take time series into account (which map tiles people look at after looking at other tiles).

Danyel Fisher at Microsoft Research has two articles on the project: "Hotmap: Looking at Geographic Attention" and "How We Watch the City: Popularity and Online Maps". The articles are light reads with plenty of pretty pictures of heat maps.

As the second of the two papers mentions, there are some amusing examples of where people direct their attention. Here in Seattle, a "small, very bright point on the shore of Lake Washington points out Bill Gates' house."

For related work that uses GPS log data rather than map search log data, also make sure to check out "The Microsoft Multiperson Location Survey" and "Predestination: Inferring Destinations from Partial Trajectories".

Update: More on Hotmap from Matthew Hurst, Todd Bishop, and directly from Danyel Fisher.

Update: One month later, the NYT reviews a commercial device called the Dash Express that apparently "broadcasts information about its travels back to the Dash network", allowing users to "warn each other through the network [anonymously] the second they hit a traffic slowdown."

E-mail versus social networks

Insightful thoughts from Om Malik:
E-mail has most ... of our attention .... [and] has all the elements needed for a social ecosystem, namely the address book.

Yahoo might have taken the wrong approach to ... social networking ... It should have started from within Yahoo's email service, which has some 250 million subscribers.

[E-mail should be] something better, something that doesn't make us all groan every time we open our inbox.
It probably is not fair to pick on Yahoo here. Microsoft and Google also seem to have had only limited success with their social network apps while letting their e-mail apps languish.

But, I think Om has an excellent point. Rather than trying to replace e-mail apps with social apps, these companies might serve most people better by bringing more social features to their e-mail apps.

On this topic, you might be interested in checking out "Inner circle: people centered email client", a fun CHI 2005 paper out of Microsoft Research.

Update: Two months later, Saul Hansell at the NYT posts about the Yahoo Inbox 2.0 project, an extension to Yahoo Mail "that can automatically determine the strength of your relationship to someone by how often you exchange e-mail and instant messages with him or her" and displays "other information about your friends ... much like the news feed on Facebook."
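
Hansell's description suggests something like scoring each contact by how often and how recently you exchange messages with them. Here is a toy sketch of that kind of tie-strength ranking; the weights and decay are my guesses, not Yahoo's.

    import time

    def tie_strength(messages, me, now=None, half_life_days=90.0):
        """messages: list of (sender, recipient, timestamp) tuples."""
        now = now or time.time()
        scores = {}
        for sender, recipient, ts in messages:
            if me not in (sender, recipient):
                continue
            other = recipient if sender == me else sender
            age_days = (now - ts) / 86400.0
            decay = 0.5 ** (age_days / half_life_days)   # recent mail counts more
            weight = 1.0 if sender == me else 0.5        # mail I send counts more
            scores[other] = scores.get(other, 0.0) + weight * decay
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)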

Tuesday, September 18, 2007

Google's PowerPoint launches

Google launches Google Presentations, lightweight PowerPoint-like functionality integrated into Google Docs.

Philipp Lenssen posts a lengthy review. More details are available from the Google Help pages.

As you might expect in an online app, the focus appears to be on collaboration, sharing, and virtual conferencing (using chat and synchronized online viewing of the presentation).

Stepping back and looking at the bigger picture here, I find myself getting to the point where my entire day is spent in the browser. Even on machines where I have Microsoft Office installed, I often find it faster to quickly view documents using the GMail integration with Google Docs than to open other applications.

I was skeptical that Google would get us to that point, but they have. Google appears to be making remarkable progress chipping away at the utility of a desktop PC environment.

Monday, September 17, 2007

The Google phone company

Bob Cringely's latest column proposes that Google spend several billion to buy the 700 MHz band, sell "Google Cubes" that act as small fileservers, WiFi points, and the mesh of a 700 MHz network, and then "overnight" become the "biggest and lowest-cost ISP" and the "biggest and lowest-cost mobile phone company" while "dominating local- and location-based search".

Bob is thinking big today, it appears.

Friday, September 14, 2007

Tech talk: Searching for Evil

Security research guru Ross Anderson has a talk up on Google Video, "Searching for Evil", that, among other things, surveys some of the more unusual Web-based financial schemes.

If you only have a few minutes, jump to 20:23 to check out Ross' frightening examples of some phishing-like schemes that are popping up on the web. The first example shows how people recruit mules on the Web to sit in the middle of a fraudulent financial transaction, with the person who accepted a too-good-to-be-true job offer getting badly screwed in the end.

If you have more time to dive in deeper and watch the whole thing, I enjoyed Ross' discussion at the beginning of the talk about using evolutionary game theory in simulations of network attacks. He refers to a WEIS 2006 paper, "The topology of covert conflict" (PDF), for more details. That paper starts to "build a bridge between network science and evolutionary game theory" and to "explore ... sophisticated [network] defensive strategies" including "cliques ... the cell structure often used in revolutionary warfare" which turn out to be "remarkably effective" for defending a network against adaptive attackers.

Similarly, though not mentioned in his talk, Ross has an ESAS 2007 paper, "New Strategies for Revocation in Ad-Hoc Networks" (PDF), which looks at how to "remove nodes that are observed to be behaving badly" from ad-hoc networks. The authors reach the remarkable conclusion that "the most effective way of doing revocation in general ad-hoc networks is the suicide attack ... [where] a node observing another node behaving badly simply broadcasts a signed message declaring both of them to be dead."

Wednesday, September 12, 2007

Netflix prize and the value of experimenting

The Netflix Prize leaderboard continues to be a fascinating proof of the value of experimentation when working with big data.

The top entries include teams of graduate students from around the world, with Eastern Europe particularly well represented. The second-best entry at the moment is from Princeton undergraduates (kudos, Dinosaur Planet).

Some of the teams disclose information about their solutions, enough to make it clear that the teams are playing with a wide variety of techniques.

I love the "King of the Hill" approach to these kinds of problems. There should be no sacred cows, no egos preventing people from trying and testing new techniques. From the seasoned researcher to the summer intern, anyone should be able to try their hand at the problem and build on what works.

Please also see my July 2007 post, "Netflix Prize enabling recommender research", and my June 2007 post, "Latest on the Netflix Prize".

See also my April 2006 post, "Early Amazon: Shopping cart recommendations", for an example from the early days of Amazon of the value of A/B testing and experimentation.

Update: The KDD Cup 2007 papers are available. They give a nice flavor for the approaches (mostly based on twiddles to SVD) that are currently near the lead.

Tuesday, September 11, 2007

Leaked information on Google Reader

According to Philipp Lenssen, an internal Google talk with confidential information on Google Reader was briefly available on Google Video.

Philipp posts a summary from someone referred to as "Fanboy". Ionut Alex Chitu also posts two summaries ([1] [2]) of the content of the talk. Worth a look.

There is mention of planned social sharing features, details on the internal operations of Google Reader, and various statistics on feeds and feed reading. It also sounds like they plan on launching feed recommendations soon.

Impressive that the team working on Google Reader is so small, just seven people.

Update: Three months later, the feed recommendations have launched.

Actively learning to rank

Filip Radlinski and Thorsten Joachims had a paper at KDD 2007, "Active Exploration for Learning Rankings from Clickthrough Data" (PDF), with a good discussion of strategies for experimenting with changes to the search results to maximize a search engine's ability to learn from clickstream data.

Some excerpts:
[When] learning rankings of documents from search engine logs .... all previous work has only used logs collected passively, simply using the recorded interactions that take place anyway. We instead propose techniques to guide users so as to provide more useful training data for a learning search engine.

[With] passively collected data ... users very rarely evaluate results beyond the first page, so the data obtained is strongly biased toward documents already ranked highly. Highly relevant results that are not initially ranked highly may never be observed and evaluated.

One possibility would be to intentionally present unevaluated results in the top few positions, aiming to collect more feedback on them. However, such an ad-hoc approach is unlikely to be useful in the long run and would hurt user satisfaction.

We instead introduce ... changes ... [designed to] not substantially reduce the quality of the ranking shown to users, produce much more informative training data and quickly lead to higher quality rankings being shown to users.
The strategy they propose is to come up with some rough estimate of the cost of ranking incorrectly, then twiddle with the search results in such a way that the data produced will help us minimize that cost.
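
To give a flavor of what exploration can look like, here is a toy sketch, much cruder than the paper's method: occasionally promote the most uncertain lower-ranked document into the top results so that clicks can confirm or refute the current ranking, while leaving most queries untouched.

    import random

    def explore_ranking(ranked_docs, uncertainty, epsilon=0.1, top_k=3):
        """ranked_docs: documents in the current ranked order.
        uncertainty: dict mapping doc -> how unsure we are about its relevance."""
        docs = list(ranked_docs)
        if random.random() < epsilon and len(docs) > top_k:
            # Promote the document whose relevance we are least sure about.
            candidate = max(docs[top_k:], key=lambda d: uncertainty.get(d, 0.0))
            docs.remove(candidate)
            docs.insert(random.randrange(top_k), candidate)
        return docs

The paper is more principled, choosing perturbations that minimize an expected loss rather than exploring blindly, but the spirit is the same: change what people see in order to learn from what they do.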

There are a bunch of questions raised by the paper that could use further discussion: Is the loss function proposed a good one (in particular, with how it deals with lack of data)? How do other loss functions perform on real data? How much computation does the proposed method require to determine which experiment to run? Are there simpler strategies that require less online computation (while the searcher is waiting) that perform nearly as well on real data?

But, such quibbles are beside the point. The interesting thing about this paper is the suggestion of learning from clickstream data, not just passively from what people do, but also actively by changing what people see depending on what we need to learn. The system should explore the data, constantly looking for whether what it believes to be true actually is true, constantly looking for improvements.

On a broader point, this paper appears to be part of an ongoing trend in search relevance ranking away from link and text analysis and toward analysis of searcher behavior. Rather than trying to get computers to understand the content and whether it is useful, we watch people who read the content and look at whether they found it useful.

People are great at reading web pages and figuring out which ones are useful to them. Computers are bad at that. But, people do not have time to compile all the pages they found useful and share that information with billions of others. Computers are great at that. Let computers be computers and people be people. Crowds find the wisdom on the web. Computers surface that wisdom.

See also my June 2007 post, "The perils of tweaking Google by hand", where I discussed treating every search query as an experiment where results are frequently twiddled, predictions made on the impact of those changes, and unexpected outcomes result in new optimizations.

Improving Amazon?

I want to surface a new comment thread on improving online shopping from one of my old posts. I would enjoy hearing other thoughts on it, so please comment if you have anything to add.

Chris Zaharias asked:
Where do you think Amazon still has opportunities to improve their game?
I said:
Looking at the bigger picture, I think it is hard to say that Amazon is anywhere close to done. The experience of shopping at Amazon is hardly effortless, full of discovery, or even all that pleasant.

Going to Amazon should be like walking into your favorite store, the nearest shelves piled high with things you like, everything you don't need fading into the background.

When you walk up to an item, everything you need to quickly evaluate it and decide whether to buy it should float to your attention.

Buying should be effortless, a couple clicks at most, with no unpleasant surprises (such as hidden shipping charges, delays, or belated out of stock e-mails).

Amazon has taken some steps toward that vision, but is a long way from there.
What do you think? How do you think online shopping could be improved? Are there things Amazon should be doing that they are not?

Monday, September 10, 2007

Learning forgiving hashes

Googlers Shumeet Baluja and Michele Covell had a paper at IJCAI 2007, "Learning 'Forgiving' Hash Functions: Algorithms and Large Scale Tests" (PDF).

The paper describes finding similar songs using very short audio snippets from the songs, a potential step toward building a music recommendation system. The similarity algorithm used is described as a variant of locality-sensitive hashing.

First, an excerpt from the paper describing LSH:
The general idea is to partition the feature vectors into subvectors and to hash each point into separate hash tables... Neighbors can be [found by] ... each hash casting votes for the entries of its indexed bin, and retaining the candidates that receive some minimum number of votes.
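
To make the quoted scheme concrete, here is a bare-bones sketch of subvector hashing with voting. The quantization used as the hash is illustrative, not the one in the paper.

    from collections import defaultdict

    def split(vec, parts):
        n = len(vec) // parts
        return [tuple(vec[i * n:(i + 1) * n]) for i in range(parts)]

    def bucket(subvec, width=0.5):
        # Quantize each coordinate so nearby subvectors fall in the same bin.
        return tuple(int(x // width) for x in subvec)

    def build_tables(vectors, num_tables=4):
        tables = [defaultdict(list) for _ in range(num_tables)]
        for vid, vec in vectors.items():
            for t, sub in enumerate(split(vec, num_tables)):
                tables[t][bucket(sub)].append(vid)
        return tables

    def neighbors(query, tables, min_votes=2, num_tables=4):
        votes = defaultdict(int)
        for t, sub in enumerate(split(query, num_tables)):
            for vid in tables[t].get(bucket(sub), []):
                votes[vid] += 1   # each hash table casts a vote for its bin's entries
        return [vid for vid, v in votes.items() if v >= min_votes]
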
What is interesting (and a bit odd) about this work is that they used neural networks to learn the hash functions for LSH:
Our goal is to create a hash function that also groups "similar" points in the same bin, where similar is defined by the task. We call this a forgiving hash function in that it forgives differences that are small.

We train a neural network to take as input an audio spectrogram and to output a bin location where similar audio spectrograms will be hashed.
A curious detail is that initializing the training data by picking output bins randomly worked poorly, so instead they gradually change the output bins over time, allowing them to drift together.
The primary difficulty in training arises in finding suitable target outputs for each network.

Every snippet of a song is labeled with the same target output. The target output for each song is assigned randomly.

The drawback of this randomized target assignment is that different songs that sound similar may have entirely different output representations (large Hamming distance between their target outputs). If we force the network to learn these artificial distinctions, we may hinder or entirely prevent the network from being able to correctly perform the mapping.

Instead of statically assigning the outputs, the target outputs shift throughout training ... We ... dynamically reassign the target outputs for each song to the [bin] ... that is closest to the network's response, aggregated over that song's snippets.

By letting the network adapt its outputs in this manner, the outputs across training examples can be effectively reordered to avoid forcing artificial distinctions .... Without reordering ... performance was barely above random.
I have not quite been able to make up my mind about whether this is clever or a hack. On the one hand, reinitializing the target outputs to eliminate biases introduced by random initialization seems like a clever idea, perhaps even one that might have broad applicability. On the other hand, it seems like their learning model has a problem in that it does not automatically learn to shift the outputs from their initial settings, and the reordering step seems like a hack to force it to do so.

In the end, I leave this paper confused. Is this a good approach? Or are there ways to solve this problem more directly?

Perhaps part of my confusion is my lack of understanding of what the authors are using for their similarity metric. It never appears to be explicitly stated. Is it that songs should be considered similar if the difference between their snippets is small? If so, is it clear that is what the NNet is learning?

Moreover, as much as I want to like the authors' idea, the evaluation only compares their approach to LSH, not to other classifiers. If the goal is to minimize the differences between snippets in the same bin while also minimizing the number of bins used for snippets of any given song, are there better tools to use for that task?

Universal action and the future of the desktop

In a Google engEdu tech talk, "Quicksilver: Universal Access and Action", Nicholas Jitkoff provides some thought-provoking ideas on the future of the desktop.

Starting at 03:52 in the talk, Nicholas begins describing how he thinks the desktop should work, listing four categories of user goals: Search, Summon, Browse, and Act.

Search is a fast, comprehensive, and easy-to-use desktop search tool, such as Google Desktop. This may not sound new, but, amazingly, it has only recently become a common part of the desktop experience.

Summon is using desktop search for navigation. You know something exists, you just want to get back to it immediately.

Browse may sound close to what the desktop does now, but Nicholas seems to mean browse not as navigating a folder hierarchy but as finding objects related to or near other objects. For example, you might not remember the name of the specific song you want, but you might be able to remember the artist who wrote it; getting to the first allows you to recall the second.

Act is when you want to immediately do a task (e.g. play a music track) without any intervening steps such as opening an application.

Note the deemphasis of the traditional file hierarchy, the focus on objects, and the shift away from applications and toward actions on objects.

The desktop should seek to satisfy our goals immediately. We should not have to start adjusting the lighting in a photo album by navigating a hierarchical menu, locating an application that can edit photos, waiting for the application to load, and then opening the files from that application's open menu. We should just ask to adjust the lighting in a photo album.

The next few minutes of the talk further break down some of the constraints on the desktop metaphor. Nicholas advocates fast, universal access that ignores the boundaries of the machine, reaching out to the network to whatever data and code is needed to act. The focus should be on the task -- getting work done -- using whatever resources are necessary, requiring as little effort as possible.

The vision is fantastic and inspiring. However, while Quicksilver is an interesting example, from what I saw, it appears to be only a baby step toward these lofty goals. The learning and automation appear primitive, and the effort required to customize is substantial, which may make Quicksilver closer to a geek tool than a realization of the broader ambition.

Even so, Nicholas is offering intriguing thoughts on where the desktop should go. It is well worth listening to.

Friday, September 07, 2007

More interviews on search in 2010

Gord Hotchkiss at Search Engine Land continues his interviews on the future of search with his post, "Search In The Year 2010: Part Two".

Again, I would recommend reading the whole thing, but here I will focus on the parts about personalization.

In particular, there are a few tidbits on personalized advertising in this round of interviews. Some excerpts:
Chris Sherman: ... As they get to know you and your preferences, you know... "I never click on that video ad," they’ll gradually stop showing you [those] ads ... and maybe increase the ads ... that you do click on.

Larry Cornett: ... The more they understand about what a specific user is looking for in their context, the more intelligent they can be about what they're actually offering ... By being more targeted it will add more value for the users and hopefully, be a better experience for them as well .... Do [users] really want to spend time in the context where they're seeing a lot of stuff that’s not targeted and not appropriate and might even be annoying or would they rather ... [see ads that] could be beneficial for them.

[Gord Hotchkiss:] Personalization of advertising will happen incrementally and the ability to target accurately will improve over time. For many users, it will be a mixed environment, with some very well targeted, relevant ads in some locations that don’t even look like advertising and the more typical forms of untargeted advertising we're more familiar with.
The impression I got from this is that personalized advertising is now seen as inevitable. Privacy concerns may mean it arrives incrementally, but most seem to agree that it will happen.

On a different topic, usability guru Jakob Nielsen used his time to promote NLP over personalization and pick on Amazon.com's recommendations yet again. Gord asked me to respond.

On the one hand, I agree with Jakob about the long-term promise of natural language techniques (though I think he may be underestimating the challenges and overestimating the likelihood of rapid progress there) and his criticism of inaccuracies in personalization and recommendations (and they are inaccurate, no doubt).

On the other hand, I think Jakob is using an absolute measure of the effectiveness of personalization where a relative measure is more appropriate. Specifically, the metric should not be how often personalized content accurately reflects your interests; it should be how much better personalized content predicts your interests than whatever unpersonalized content you otherwise would have to put in the space.

That is a much lower bar. Bestsellers and other unpersonalized content tend to be very poor predictors of individual interest. By knowing even a little bit about you, it is easy to do better.

Is personalization ever going to be perfect? No, but it does not have to be. It just has to be more useful than the alternative. Personalized content only has to be marginally more interesting than unpersonalized content to be helpful.
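
A toy calculation makes the point. Suppose readers click an unpersonalized bestseller one time in a hundred and a personalized pick three times in a hundred; the personalized content is still "wrong" 97% of the time, yet it is three times as useful as the alternative.

    personalized_ctr = 0.03   # hypothetical clickthrough rate on personalized picks
    baseline_ctr = 0.01       # hypothetical clickthrough rate on bestsellers

    print("Relative improvement: %.0fx" % (personalized_ctr / baseline_ctr))  # -> 3x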

See also Gord's first post in this series, "Search In The Year 2010", which has some more on personalized search, and my comments on that post.

For more on personalized advertising, please see also my posts "What to advertise when there is no commercial intent?" and "Is personalized advertising evil?"

Wednesday, September 05, 2007

The power of branding in web search

Some research out of Penn State, "The Effect of Brand Awareness on the Evaluation of Search Engine Results" (PDF), puts some hard numbers on the hurdles facing Google's web search competitors.

The study showed participants Google search results on all queries, but switched around branding elements at the top and bottom of the page to label the results as from Yahoo, Microsoft Live Search, a startup called AI2RS, and Google.

From the paper:
Based on average relevance ratings, there was a 25% difference between the most highly rated search engine and the lowest, even though search engine results were identical in both content and presentation.
The 25% difference was between the results branded with the AI2RS startup and the results branded as Yahoo.

Curiously, Yahoo was rated substantially higher than Google, despite the fact that these were Google's search results. Yahoo has failed to gain web search market share, but, if you believe this study, brand weakness is not the reason why.

It is true that this study is small, just 32 participants across 4 different queries. It would be nice to see a broader study that confirms these results.

Even so, it probably is safe to say that the strength of the Google and Yahoo brands (and Microsoft's ownership of the defaults in Internet Explorer) make it very difficult for any web search startup. As Rich Skrenta once said, "A conventional attack against Google's search product will fail ... A copy of their product with your brand has no pull."

See also a lighter Penn State Live article on the study, "Branding matters -- even when searching".

[Found via Barry Schwartz]

Missing an opportunity in shopping metasearch

There is an interesting tidbit at the beginning of a Search Engine Land post, "Kicking The Tires On Shopping Search, Part Two: The Independents".

Compiling data from Hitwise, the post shows that Google, Microsoft, and Yahoo now have a meager 15% combined market share in shopping search, down from nearly 50% three years ago.

Considering that a substantial percentage of web search queries are shopping-related (see "A taxonomy of web search" (PDF)), and how easy it is to extract advertising revenue and revenue sharing where there is such strong purchase intent, I would think that Google, Yahoo, and Microsoft would be pursuing shopping metasearch more aggressively.

See also part one of the Search Engine Land series, "Analyzing the Major Shopping Search Services", which focuses mostly on design and usability of the shopping sites offered by Google, Yahoo, and Microsoft.

See also my earlier posts, "R.I.P. Froogle?" and "What should Google do next?".

Saturday, September 01, 2007

HITS, PageRank, and keeping it simple

A SIGIR 2007 paper out of Microsoft Research, "HITS on the Web: How does it Compare?" by Marc Najork, Hugo Zaragoza, and Michael Taylor, is a large-scale study of several ranking algorithms using a substantial web crawl and data from the MSN query logs.

The authors appear to have expected the HITS algorithm to outperform the others in their tests, but found instead that a combination of BM25F and simple in-degree link analysis outperformed everything else. From the paper:
We were quite surprised to find that HITS, a query-dependent feature, is about as effective as web page in-degree, the most simpleminded query-independent link-based feature.

As expected, BM25F outperforms all link-based features by a large margin. The link-based features are divided into two groups, with a noticeable performance drop between the groups. The better-performing group consists of the features that are based on the number and/or quality of incoming links (in-degree, PageRank, and HITS authority scores); and the worse-performing group consists of the features that are based on the number and/or quality of outgoing links (outdegree and HITS hub scores).

The combination of BM25F with ... in-degree consistently outperforms the combination of BM25F with PageRank or HITS authority scores, and can be computed much easier and faster.
PageRank performed poorly in their tests. However, their explanation of why it did struck me as unconvincing. From the paper:
The fact that in-degree features outperform PageRank under all measures is quite surprising. A possible explanation is that link-spammers have been targeting the published PageRank algorithm for many years, and that this has led to anomalies in the web graph that affect PageRank.
This raises the question of whether they picked the right PageRank algorithm. In particular, there are variants of PageRank they could have used that appear less sensitive to spam and might have performed much better. Unfortunately, without results for those variants, it is hard to know whether this paper's criticisms of naive PageRank apply to the algorithms evolved from PageRank that search engines use today.

Even so, the results of the study are interesting, both for the overview of several relevance ranking algorithms and the conclusions about their effectiveness. Particularly intriguing is the evidence that computationally expensive algorithms such as the query-dependent HITS algorithm seem to hold no advantage over much simpler techniques.
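
To show what such a simple technique might look like, here is an illustrative sketch that blends a BM25F text score with the log of a page's in-degree. The blend weight and normalization are made up, and the paper's actual feature combination differs; the point is only that in-degree is cheap to compute and cheap to use.

    import math

    def in_degrees(link_graph):
        """link_graph: dict mapping source_url -> list of target_urls."""
        counts = {}
        for source, targets in link_graph.items():
            for target in set(targets):
                counts[target] = counts.get(target, 0) + 1
        return counts

    def combined_score(bm25f_score, in_degree, link_weight=0.3):
        # Dampen the link feature so a few very popular pages do not dominate.
        return (1.0 - link_weight) * bm25f_score + link_weight * math.log1p(in_degree)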

Update: Marc Najork, one of the authors of the paper, expands on the PageRank algorithm issue and the performance of HITS in the comments for this post.