Peter Norvig: An Insider’s Look at Google Research

Google is such a business powerhouse that people sometimes forget that every single penny depends on the research of thousands of engineers toiling in the bowels of the Googleplex. That’s why I always like to hear what Google’s rocket scientists have to say. This morning, at the Search Marketing Expo, Peter Norvig, Google’s director of research, is holding forth. I’ll liveblog the highlights here. (And there’s more coverage by Cade Metz at The Register and Tom Krazit at CNET News’ Relevant Results.)

His opening PowerPoint slide promises a review of 21 projects in 15 minutes:

1) Person Finder, following the Chile earthquake.

2) PowerMeter, which you can plug into your house power system to monitor how much you use.

3) Earth Engine, which can show areas of deforestation using Google Earth.

4) Trike and Snowmobile that can contribute to Google Earth views.

5) User photos in Google Street View.

6) Image Swirl, image recognition software.

7) Web-scale image annotation, matching words for images.

8) Image rotation captchas, so you don’t have to divine those increasingly ridiculous captchas.

9) Google Goggles, take a photo of a product and Google will tell you what it is.

10) Discontinuous video scene-carving.

11) Sharing cluster data.

12) App Inventor for Android, introductory programming development environment for phones, in alpha test.

13) Speech recognition, and how much better it’s doing over time.

14) Punctuation/capitalization in transcribed speech. (I want this!)

15) Translating phones, or at least two pieces of it–translation and voice recognition.

16) Low-resource MT: Yiddish–better translation even for languages without much written material.

17) Sound understanding: Show me all the car crashes in YouTube. Not quite there yet, but it’s coming.

18) Google Squared

19) Clustering of words within a context: “Whistler,” for example, yields clusters for the painter, the Olympics, British Columbia, etc.

20) Attribute extraction, to improve search results.

21) Browser Size–tool that puts an overlay on pages showing what percentage of users can see each part of the page in their browser.

Whew. Barely kept up there.

Norvig: We’re trying to observe the world of the Web… try to understand all of that that’s going on by observing the data and creating models.

Chris Sherman at Search Engine Land, which puts on the SMX show, asks about 20% time. Norvig says Google’s ability to scale Webwide using Google’s infrastructure is key, because it allows much faster testing and deployment.

How do you decide the balance between short-term and long-term research? Norvig: We’re pushing very hard toward doing something useful. Always in service of something we eventually want to get out there.

Danny Sullivan, Search Engine Land’s editor in chief, asks about what has come out of 20% time. Norvig: Gmail is one example. Though right away that became that engineer’s 100% time. Speech recognition is another.

How much are cofounders Larry Page and Sergey Brin involved now? Norvig: They’re very involved. They’re setting the long-range direction. And they’re really trying to evaluate as many projects as they can. Their life hasn’t changed very much, because they’re still at their deep level. But for the rest of us, a lot has changed–it takes longer for them to get to any particular new project.

What are you researching now? Norvig: Education. Ways to lead people to information over an entire semester, not just this moment.

How are projects segmented in various regions? Norvig: Some are local because we need local translations or products. Remote product development sometimes because that’s where the right people are.

What technologies do you see out there that would change how search is done? Norvig: Lots of emphasis on mobile.

How has Google come up with new signals to do real-time search? Norvig: One thing that I still think is overhyped is PageRank. It’s just one of many things. We never felt that it was such a big factor. It’s got the catchy name but we’ve always looked at all the available data. How do users interact with them, etc. You’re combining every available signal. It’s just a slightly different combination.

What enables that is the infrastructure that we’ve built. That’s allowed us to do real-time. I remember when we went to hourly (updates of the index), and Larry pushed back and said that’s not good enough. The engineers said, well, we can’t do better yet. In the end, Larry gave in, but said they needed to call it the 3600-second index, otherwise the hour would remain an hour.

Is it time for new marketing beyond PageRank? Norvig: I think that’s right. We need some better branding.

On the Caffeine infrastructure update, what’s your group’s role and where’s it at? Norvig: Gives only vague timing, despite coverage lately that it might be late.

Do you have some signals Google uses that people don’t realize? Norvig: Bibliographies in Google’s book scanning.

How separate is the search and the ad side? Norvig: Just the way a newspaper has editorial content and advertising content, and those don’t mix. Of course, we use Google File System and Bigtable and things like that in both.

Is there more work put into core search vs. ads? Norvig: Doesn’t quite say (though I suspect most of it is core).

Now that the Web is an index of objects as much as pages, will there be a different notion of how to treat those objects, like companies or people’s names? Norvig: We are moving in that direction. We want to support types of queries like “show me these types of companies and rank them by revenue.” You’re on your own now unless some page has done that.

What are the really hard problems today? Norvig: Vision is the big problem today. There haven’t been really big breakthroughs from 20 years ago. Still images and especially video images. There’s just so much more data involved in video vs. a text file. And parsing video into understandable objects. I’m excited about that.

Do you have any solutions for email overload? Norvig: Actually I had an intern last summer working on that project. Some experimental things will roll out before long. Another thing is saying, Is email the right tool? Maybe just slashing all that down and starting all over again is the way to go. Google Wave? Google Buzz? Not sure, but maybe. But people are still trying to figure out where Wave works. Do I make a Google doc, do I make a Wave, do I make a site? I think we’re going to have to see some consolidation… based on the content.

Sergey has talked about embedded chip in your head to do searches–anybody doing that? Norvig: Uh, not yet.

How do you ensure that people get training and knowledge to work at Google? Norvig: When people doing information retrieval in college come to Google, they realize, “All I knew was wrong.” That’s changing a little bit. Also we have an internal course on all the Google tools. Then we give them a starter project. Then they get ready to do something else on their own.

Do you move people around a lot to different projects? Norvig: We encourage that. We like to keep our projects short–three or six months rather than a year. Often they find a couple things out of those projects to do, so they stay a little in that same area for a while. But we do make it easier to move from one place to another.

What’s next in search–any dramatic changes in the metaphor beyond the list? Norvig: You see the page becoming more interesting and varied–pictures, video, etc., rather than 10 links. Mobile is also driving things hard because the screen is so small. There we’re really forced to do a better job. That will require more of a partnership, more interactive. Won’t be as stateless–will have more of a dialogue that both sides are contributing to. Right now, we force the user to do most of the work.

And that’s a wrap.

  1. [...] Rob Hof’s coverage (I don’t know Rob but he’s sitting next to me and it looks like he’s caught stuff I missed.) [...]

  2. Rob,

    Thanks for this. Excellent write up.

