Google IO Conference

Thursday, May 29, 2008

"Introduction to Google DocType: An Encyclopedia to the Open Web" by Mark Pilgrim

Google DocType is basically a wiki hosted by Google within their Google Code site. The goal of the wiki is to document the open web. So what is the open web? There wasn't really a straightforward answer. Essentially, Google only wants to capture information related to raw browsers, for instance, anything related to a vanilla install of Firefox. This means that they won't really address Flash or Silverlight, or anything else not included with Firefox. This is where the definition got a little gray: Google does want security to be a prevalent topic, so security issues related to Flash are okay for DocType. According to the presenter, it's technically a wiki, but ultimately Google (and he) reserve the right to control and revise the content that ends up there. This is not meant to discourage users from contributing; I think it is more a way to avoid some of the challenges encountered by Wikipedia. For instance, he mocked the lengthy process Wikipedia has defined for deleting a page (the policy page itself is about 10 printed pages long):
http://en.wikipedia.org/wiki/Wikipedia:Deletion_policy
Dictatorship aside, the content that is actually in the wiki now is extremely useful. There are thorough articles with detailed code examples. The only problem is that the content in the wiki is far from comprehensive. There are plenty of pages that are merely stubs of information yet to come. Google recognizes this and readily admits that DocType is a work in progress.
A few other noteworthy pieces of information came out of the presentation. Apparently, when the presenter was pitching the idea to fellow Googlers, their primary concern was how to prevent spam from being posted. It is interesting that this was the primary concern, and quality of information wasn't. To address this, an automated process is in place to find and remove spam, backed by a manual process. Bottom line, though: spam really hasn't been a problem so far. Another nice feature is that since Google is hosting this, the data is replicated across all their datacenters. The entire project (code examples included) is also accessible through a Subversion checkout, a very nice feature if you'd like to copy the data locally for review during a period of no Internet connectivity. Finally, Google has written thousands of JsUnit test cases to verify the information in DocType. For instance, if some information is posted about how Firefox 2.x handles colors, there are accompanying JsUnit test cases to verify that the information is correct. Mark mentioned that while creating their baseline of test cases they actually found some very subtle bugs in browsers that have existed for years.
Mark was very anti-Microsoft, and made many jokes at their expense. His operating system of choice is Debian.
You can find Google DocType at:
http://code.google.com/doctype/
In my opinion, this is a nice step by Google. Since their business is almost entirely web-based, they have great expertise in this area. I have to wonder, though, how the project will keep up the momentum needed for constant updates.

"Open Source is Magic" Tech Talk with Chris DiBona

I arrived a few minutes late to this presentation because I was eating lunch. It's funny, there was no time allotted in the schedule for lunch. We basically had to pick some time between sessions to eat, so naturally, obtaining the food and consuming it took a little longer than 15 minutes. In general, the point of this presentation was to explain why open source software is great, why Google thinks it's great, and some ways that they're helping to advance the field. Something that has really been stressed throughout the conference is that Google has built their entire foundation on free open source software. Even going back to the days of Sergey and Larry in dorm rooms, pretty much all of the software used to build Google has been open source. Chris also made the point that there is no one in the world who has not interacted with something driven by open source software; they just may not be aware of it. Aside from web searches using Google (which everyone has done in today's world, right?), almost all embedded devices run C code that was compiled with gcc (the GNU C compiler). Google itself has also done much to help further the field of open source, primarily in two ways. The first is releasing their custom patches to open source software back to the community. For instance, about a year ago Google announced that they would be providing the community with custom patches they wrote for MySQL 4.x. Of course these patches were written by Googlers for Google optimizations, but once they were in the wild, they gained a lot of popularity, and MySQL now has plans to incorporate the code into their 5.x code base.
http://google-code-updates.blogspot.com/2007/04/google-releases-patches-that-enhance.html
The second way in which Google is advancing the field of open source is with their Summer of Code efforts.
http://code.google.com/soc/2008/
Summer of Code gives college students a channel to stay involved with computer science throughout the summer, primarily through contributions to open source software projects. Google coordinates the efforts, and even pays students for their work. Everyone benefits.
Ultimately, Chris' point was that open source software is good for the Internet, and what is good for the Internet is good for Google.
I think that open source software really has potential in the enterprise. Companies are finally beginning to trust the open source licenses (there really aren't that many of them). Once legal departments become familiar with the licensing, I don't see why they wouldn't look forward to the opportunity to avoid the headaches that come with processing a new license agreement for every new piece of proprietary software. Plus, it's free!

Wednesday, May 28, 2008

Day 2 Keynote

Keynote speech by Marissa Mayer, VP of Search Products and User Experience.

A glimpse under the hood @ Google

This keynote focused on how Google evolved, what search products it has built, and how Google handles the user experience.

Marissa started by explaining how and why Google invited hundreds of artists and designers to design the iGoogle interface. Google believes that a fraction of its users want more than the minimalistic look of the Google home page.

Sergey created the Google home page. When Marissa asked him why he chose the minimalistic look, Sergey's answer was: first, we don't have a webmaster, and second, I don't do HTML. It's amazing how it started that way and has remained the same all these years, supporting the founders' vision. When Google first tested the home page with students from Stanford, the students sat staring at the screen for more than 40 seconds. Marissa asked them, "Are you waiting for something?" The testers replied that they were "waiting for the rest of the page to load".
Marissa revealed the complexity behind the simple home page. She believes there is no need for the end user to know what goes on behind the search box.

Basically, every search query is greeted by a load balancer, which sends the query to one of the many data centers. A mixer gets involved before the query hits 300-400 backend search servers. The mixer then gets involved again, mixing in results from the ad servers and other components. Then the process goes through the Google Web Servers for the HTML content. The entire process touches 700-1000 servers, bringing back millions of hits in a fraction of a second. This is the complexity Google wants to hide from end users.
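
To make the fan-out step a little more concrete, here is a minimal scatter-gather sketch in Python. It is only an illustration of the general idea as I understood it from the talk, not Google's actual design: the shard contents, the scores, and the names `SHARDS`, `search_shard`, and `mix` are all made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical index shards: each maps a term to (score, document) hits.
SHARDS = [
    {"gears": [(0.9, "doc-a"), (0.4, "doc-b")]},
    {"gears": [(0.7, "doc-c")], "android": [(0.8, "doc-d")]},
    {"android": [(0.6, "doc-e")]},
]


def search_shard(shard, term):
    """Stand-in for one backend search server: return its hits for the term."""
    return shard.get(term, [])


def mix(term, limit=3):
    """Stand-in for the mixer: query every shard in parallel, merge by score."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial_results = list(pool.map(lambda s: search_shard(s, term), SHARDS))
    all_hits = [hit for hits in partial_results for hit in hits]
    # Keep only the highest-scoring documents, best first.
    return [doc for _, doc in heapq.nlargest(limit, all_hits)]


if __name__ == "__main__":
    print(mix("gears"))
```

The point of the sketch is just that the caller sees one function call and one merged list, while the parallel queries stay hidden behind it.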

Split A/B testing: Google conducts A/B testing to understand how users react to different designs. This testing is done in production, and metrics are gathered in real time. Metrics could be RPM, number of hits, etc. Based on the trends, Google decides which design makes sense. Marissa gave a couple of examples of the types of tests they conducted. She showed 3 versions of the Google search results; all three look almost the same, except for the white space between the Google icon and the beginning of the search results. Amazingly, a couple of pixels make quite a difference in how users react to the search results. Google found that the version with less white space is liked by many users. This is backed by the search results users actually clicked on. Similarly, Google tested the background color of Google Ads, blue vs. yellow, and found that yellow makes sense for the ads.
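
As a rough sketch of how a split test like this can be wired up, the Python below assigns each user to a variant deterministically and then compares click-through rates. This is my own assumption about one common approach, not something shown in the keynote; the variant names, the experiment name, and the helper functions are hypothetical.

```python
import hashlib

VARIANTS = ("yellow_ads", "blue_ads")  # hypothetical variant names


def assign_variant(user_id, experiment="ad_background"):
    """Place a user in one variant by hashing their id, so it never changes."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


def click_through_rates(events):
    """Turn (variant, clicked) pairs into a {variant: click-through rate} map."""
    shown = {v: 0 for v in VARIANTS}
    clicked = {v: 0 for v in VARIANTS}
    for variant, was_clicked in events:
        shown[variant] += 1
        clicked[variant] += int(was_clicked)
    # Avoid dividing by zero for a variant that was never shown.
    return {v: clicked[v] / shown[v] if shown[v] else 0.0 for v in VARIANTS}
```

Hashing the user id (rather than picking randomly on every request) keeps a given user in the same variant for the whole experiment, which is what makes the per-variant metrics comparable.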

Research was also conducted on how many results should be displayed on the first page: 20? 30? 40? Google found that more search results on the page actually generated fewer search queries. (The more you search, the more money Google makes, so that makes sense.)

Learning curve on searching: Google found that over a period of time, users got educated on how to search, what to search for, and how to talk to a search engine to get better results.
Think 10 years out: Google believes every company should think 10 years down the line about how its applications and business models will evolve, and be prepared for it.

Google 411: This application may seem to have no obvious reason behind it and no relevance to search, but at its core are speech recognition and text-to-speech. Google believes this is not a 100% best solution (like any of its applications), but the application demonstrates that the technology could be used for video search and other media searches, and could bring more information to the user. Car search was mentioned a couple of times.

Google Language Informational Search: This is a great story. Google spent lots of energy building multiple-language search. The work was outsourced to translators, but that turned out to be a lengthy and very unproductive process. Meanwhile, one of Marissa's friends was able to produce his web page in 50 different languages. The secret was that the friend had his fans translate it for him. Google followed the same approach and asked all its users to contribute to the translation. As a result, more than a quarter million people contributed to translating search results into 110 languages. Google now has 140 domains.

Easter Eggs on Google

During her grad school days, Marissa used an app on Linux that echoes anything typed in Chef language. She used this program to convert 400+ strings, and the result is one of the language choices in Google Preferences. It's called the Bork, bork, bork language.

Healthy disrespect for the impossible.
Marissa believes that solving the search problem is impossible, but they will keep working towards it. Getting to something close to 90-95% could make quite a difference.

Google Health: This application was created to help users store their medical records in one place. Again, this is an impossible task, which Google clearly acknowledged.

20% of time: This is part of the culture Google has built in the company. All Google employees spend 20% of their time doing what they like to do. Amazing work has been produced under this 20% time rule: 50% of Google's features are produced in this time. Gmail is one example.

Overall, the keynote was fantastic, with a lot of interesting insights into how Google operates and keeps innovation going in the organization, still marching towards the founders' vision and goals.

Software Development Methodology at Google

We had a chance to talk to a Tech Lead on Google Gears about how they develop and elevate applications efficiently. His answer was simple: they do not follow any software development methodology or approach. They have coding best practices. On top of that, the hiring bar is so high that the engineers generate good quality code and constantly find ways to improve it through peer reviews. Some parts of the organization follow Scrum, but they are not crazy about it. What they believe is that less process leaves more room for innovation.

One important thing he mentioned is that engineers are at the top of the food chain, not project managers or the marketing or product departments. This allows engineers to decide what gets elevated and how.

Overall impression on the first day

Though the registration process was a little disappointing, it hardly mattered after the keynote speech by Vic. Google made up nicely for its inefficiency in managing the queuing.

Google is encouraging developers to build more and more apps without any constraints, which pushes the web to new levels. I think we need this. Thanks to Netscape for starting this.

I liked the Android and OpenSocial tracks, where many new technologies and techniques were presented, again to encourage developers to build more and more apps.

The day ended nicely with the "After Hours" party, which showed Google's company culture... music, games, drinks, food, band, lots of information sharing...

MySpace and Gears

MySpace is a heavy adopter of Google Gears. It looks like they used Gears effectively and took advantage of the features available in it. A couple of examples were mentioned in the conference:

  1. Search feature for power users. In this scenario, someone can search for a friend in their list (trust me, 0.1% of MySpace users have > 10,000 friends, so I guess they need this feature), and Gears brings up matches as the user types the criteria. This feels like Google Suggest, but more powerful. The data is cached locally and brought up as required, and the UI changes based on the requirement.
  2. Message enhancements
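
The friend search in the first example can be pictured with a small sketch. MySpace's real implementation used the Gears local database and JavaScript, which I have not seen; the Python below is just my assumption of how a search-as-you-type lookup over a locally cached friend list might work, and the `FriendCache` class and its methods are hypothetical.

```python
import bisect


class FriendCache:
    """Hypothetical local cache of friend names supporting prefix lookup."""

    def __init__(self, names):
        # Keep (lowercased key, original name) pairs sorted for binary search.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` names starting with `prefix`, ignoring case."""
        prefix = prefix.lower()
        # Jump straight to the first key that could match the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key, name in self._entries[start:start + limit]:
            if not key.startswith(prefix):
                break
            matches.append(name)
        return matches
```

Because the lookup runs entirely against local data, each keystroke can call `suggest` without a round trip to the server, which is what would make this practical for someone with 10,000 friends.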

When is Google Gears a good choice

  • Our user base should have the plugin installed. Chances are users will download it, much like they download the Flash plugin. Thanks to Adobe.
  • If we have some control over the users and their browsers (typically on corporate intranets), then we can push this plugin out to them.
  • Is the added functionality compelling enough to download the plugin? What will users do if the app we build requires them to download it? I think users and their behavior are changing (again, thanks to Adobe), so chances are this could become ubiquitous and we could stop treating it as a requirement.

We need to keep this in mind when building Gears apps

  • Users may have multiple machines, and we need to be cognizant of that.
  • Worker pool threads are great for processing on the client side.
  • We need to account for the shared OS login scenario.

Questions? owyn@myspace.com

URLs are people too

What? Yes, people can use URLs to represent themselves. Plaxo uses this model to identify a user and pull that user's information using the Social Graph API (plus some manual crawling). These URLs use OpenID and the Social Graph API.