Thursday, May 29, 2008

"Introduction to Google DocType: An Encyclopedia to the Open Web" by Mark Pilgrim

Google DocType is a wiki hosted by Google within their Google Code site. The goal of the wiki is to document the open web. So what is the open web? The answer wasn't entirely straightforward. Essentially, Google only wants to capture information related to raw browsers, for instance, anything that works in a vanilla install of Firefox. This means that they won't address Flash, Silverlight, or anything else not included with the browser. This is where the definition got a little gray: Google does want security to be a prominent topic, so security issues related to Flash are okay for DocType. According to the presenter, it's technically a wiki, but ultimately Google (and he) reserves the right to control and revise the content that ends up there. This is not meant to discourage users from contributing; I think it is more a way to avoid some of the challenges encountered by Wikipedia. For instance, he mocked the lengthy process Wikipedia has defined for deleting a page (the policy page itself is about 10 printed pages long):
http://en.wikipedia.org/wiki/Wikipedia:Deletion_policy
Dictatorship aside, the content that is actually in the wiki now is extremely useful. The articles are thorough and include detailed code examples. The only problem is that the content is far from comprehensive: plenty of pages are merely stubs for information yet to come. Google recognizes this and readily admits that DocType is a work in progress.
Some other noteworthy pieces of information came out of the presentation. Apparently, when the presenter was pitching the idea to fellow Googlers, their primary concern was how to prevent spam from being posted. It's interesting that this was the primary concern and quality of information wasn't. To address it, an automated process finds and removes spam, followed by a manual review process. Bottom line, though: spam really hasn't been a problem up to this point. Another nice feature is that since Google is hosting this, the data is replicated across all their datacenters. The entire project (code examples included) is also accessible through a Subversion checkout, a very nice feature if you'd like to copy the data locally for review during a period of no Internet connectivity. Finally, Google has written thousands of JsUnit test cases to verify the information in DocType. For instance, if some information is posted about how Firefox 2.x handles colors, there are accompanying JsUnit test cases to verify that the information is correct. Mark mentioned that while creating their baseline of test cases, they actually found some very subtle browser bugs that have existed for years.
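To make the "documentation backed by tests" idea a little more concrete, here is a minimal sketch of how a documented claim could be paired with an executable check. This is not DocType's actual code and it does not use the JsUnit API; the names DocumentedClaim, ClaimResult, verifyClaims, and sampleClaims are hypothetical, and the sample claims use plain JavaScript built-ins rather than real browser color handling so the sketch runs anywhere.

```typescript
// Hypothetical shape: one statement from an article plus a way to check it.
interface DocumentedClaim {
  description: string;   // the statement as it might appear in an article
  probe: () => string;   // runs in the environment under test
  expected: string;      // what the article says should come back
}

interface ClaimResult {
  description: string;
  passed: boolean;
  actual: string;
}

// Run every probe and record whether the observed value matches the article.
function verifyClaims(claims: DocumentedClaim[]): ClaimResult[] {
  return claims.map((claim) => {
    let actual: string;
    try {
      actual = claim.probe();
    } catch (err) {
      // A probe that throws is reported as a failed claim, not a crash.
      actual = "threw: " + String(err);
    }
    return {
      description: claim.description,
      passed: actual === claim.expected,
      actual,
    };
  });
}

// Illustrative claims only (assumptions, not taken from DocType).
const sampleClaims: DocumentedClaim[] = [
  {
    description: "parseInt reads 'ff' as 255 in base 16",
    probe: () => String(parseInt("ff", 16)),
    expected: "255",
  },
  {
    // Deliberately wrong, to show how an inaccurate article would be flagged.
    description: "Number.prototype.toString(16) returns uppercase digits",
    probe: () => (255).toString(16),
    expected: "FF",
  },
];

for (const result of verifyClaims(sampleClaims)) {
  const status = result.passed ? "OK  " : "FAIL";
  console.log(`${status} ${result.description} (got ${result.actual})`);
}
```

The point of structuring it this way is that a failing check is ambiguous in a useful way: either the article is wrong or the browser is, which is presumably how a test baseline like Google's ends up surfacing long-standing browser bugs.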
Mark was very anti-Microsoft, and made many jokes at their expense. His operating system of choice is Debian.
You can find Google DocType at:
http://code.google.com/doctype/
In my opinion, this is a nice step by Google. Since their business is almost entirely web-based, they have great expertise in this area. I have to wonder, though: how will the project keep up the momentum needed for constant updates?
