The Geeks Shall Inherit the Earth

Sunday, September 23, 2007

Rebooting the Web

For the last few years, the thing all the cool software developers have been talking about is Web Services. Unfortunately, what they’re usually talking about is SOAP and the rest of the W3C web services stack. My experience with SOAP suggests to me that it is not particularly well designed, an opinion others seem to share. But the real issue I have with it is calling the whole business “web services”. I’m all for an open protocol for remote method invocation based on HTTP and XML, but don’t assume that just because you’re using HTTP, what you’re doing is “web-like”.

What has made the web so successful is two things: openness and architecture. Openness is really important if you want your technology to gain widespread acceptance, but hyperlinking is what makes the web special. It’s, you know, the “web” part. Hyperlinking is what lets the users of the system generate not only all the content, but the distributed structure as well. And then it lets you do all sorts of other cool things, like have two guys from Stanford write an algorithm to analyze all that user-generated structure and build a great new search engine. When the history of the web is written, I predict that SOAP will be a footnote but that REST will play a key role. SOAP (or some improved version thereof) will have proven to be a useful technology. But if true web services are to be successful at all, REST, with its approach to exposing all of the objects of a system so that third parties can exploit that access in flexible and unpredictable ways, is how it will happen.

Another thing that people seem to overlook is that “web applications” are rarely very web-like. They’re deployed via the web, but they’re not really part of the web at all. I love the travel site SideStep. It lets me build and manipulate a view of a huge number of airline fares in a way that lets me find exactly what I need. But I can’t link to an individual view of available fares that meet certain criteria and then send that link to a friend.

Wikipedia, on the other hand, lets you link to any topic, any version of a topic, any conversation about that topic, and the list of all edits by any user who has edited that topic. That kind of access to all the information in Wikipedia is what lets clever people create things like the WikiScanner that’s caused such a stir lately. Google Maps is another great example of a web-like web application. Its “Link to this page” offers a link to the current view that you can paste into an email or embed in your own web page. Which is not even to mention its support for an endless variety of mashups that display everything from US Census Bureau statistics to stores where Nintendo Wiis are in stock. Any time you give users a way to exploit your technology and remove the limits on how they can use it, good and surprising things will happen.
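
To make the “link to this view” idea a bit more concrete, here is a minimal sketch of how an application’s view state could round-trip through a URL. The site, the base URL, and the parameter names are all made up for illustration; this is not how SideStep or Google Maps actually does it, just one plausible shape for the idea.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical base URL, for illustration only.
BASE_URL = "https://fares.example.com/search"

def view_to_link(view):
    """Serialize a view (a dict of search criteria) into a shareable URL."""
    # Sort the criteria so the same view always yields the same link.
    return BASE_URL + "?" + urlencode(sorted(view.items()))

def link_to_view(link):
    """Recover the search criteria from a link made by view_to_link."""
    return dict(parse_qsl(urlsplit(link).query))

# A made-up fare search; every value is a string so it survives the round trip.
view = {"from": "SFO", "to": "JFK", "depart": "2007-10-01", "max_price": "400"}
link = view_to_link(view)
print(link)
print(link_to_view(link) == view)
```

Once the whole view lives in the URL, anyone can paste it into an email, bookmark it, or generate similar links from their own code, which is exactly the kind of unplanned use the web rewards.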

Which brings me to my point. I was highly amused to hear that Microsoft had rebooted the web a few months ago. Now if anybody knows something about rebooting, it’s Microsoft, but this was still pretty big news. If you haven’t heard about Silverlight, the new technology that spurred this surprising announcement, perhaps it’s because the rest of the web has continued doing its thing. You’d be forgiven for not noticing the reboot.

Silverlight looks like a really cool technology. But it didn’t, nor will it ever, reboot the web, because it isn’t the web. It lets you deploy applications via the web and it probably “plays well” with the web. But unless Microsoft builds in all sorts of clever ways for people to assemble and exploit this technology in unplanned ways, it will never be a part of the web. And for a company known for trying to strictly control how people use its software, that seems unlikely.

Update (July 4, 2008): Apparently Google agrees about REST. The true mark of REST’s success is that it’s starting to become the default. HTTP APIs are just made that way, without anyone necessarily even calling them “REST”.

Update (November 3, 2008): ProgrammableWeb adds its 1000th entry into its web API directory. Guess which type of API accounts for 63% of the entries?

Sunday, September 9, 2007

Favorite Web Apps

I got involved in a conversation the other day about favorite web apps. Here’s my Top Five:

IMDB - Wikipedia is starting to contend with IMDB as the first place I go to look up information on movies, but IMDB holds a special place in my heart. I’ll never forget discovering it about ten years ago. That was the first time I was certain that the web was going to totally change how we organize and access information. The complete freedom to follow the highly interconnected relationships between movies, actors, and directors was thrilling. Before then, the web was a faster and more convenient way to get at information that could be had via other channels. Tremendously useful, for sure, but not quite revolutionary. There may be examples that pre-date it, but IMDB was the site that really brought it home for me.
Wikipedia - Opinions seem to range from Wikipedia being more accurate than Encyclopaedia Britannica to it being so inaccurate as to be a joke. But for my money, it is the best thing created since the web itself. Paul Graham put it better than I could: “Experts have given Wikipedia middling reviews, but they miss the critical point: it’s good enough. And it’s free, which means people actually read it. On the web, articles you have to pay for might as well not exist. Even if you were willing to pay to read them yourself, you can’t link to them. They’re not part of the conversation.”
Gmail - Not so much because of its AJAX implementation, although that’s very cool, but because of how Google re-thought the user interface of a mail reader. For example, moving entire threads back into the Inbox when someone replies to an archived message. And, of course, the ability to rapidly search all messages with a single keystroke is invaluable. Not only is it highly effective in discovering what I’m looking for, but it has freed me from my compulsion to create large hierarchies of mail folders. Filing messages in these folders consumes a lot of time and energy and often makes finding messages more difficult rather than easier.
Google Maps - I love maps, and I’m not aware of a better site for easily navigating maps and satellite imagery. Google Earth/Maps* is one of my favorite ways to waste time.
RSS - Sort of an application framework as opposed to a specific application, of course, but transformative however you categorize it. As Netvibes puts it, RSS lets you “remix” the web.

The key to most of these examples (and the web itself) is having a simple but effective model (e.g. information feeds, encyclopedia topics, email messages), defining an accessible set of operators on that model (e.g. publication/subscription/aggregation, create/edit/diff/rollback, labeling/archiving), and then removing as many restrictions from users’ ability to exploit the operators as possible.

* Update (2007-09-13): Make that Google Earth/Maps/Sky/Moon! **

** Update (2007-09-19): I missed Google Mars. I think Google’s producing map applications faster than I can keep track of them.

Saturday, September 1, 2007

Realistic Space Combat Game?

Here’s a random thought I had today. Why has there not been a single space combat game with realistic physics since Asteroids? A game where you pilot a space fighter that doesn’t behave as if it’s in an atmosphere?

Well, I don’t think there’s anything that doesn’t exist somewhere on the internet (consider Rule 34). It turns out that someone asked themselves the same question and then actually created such a game, Void War. It’s an independent game, though, and it received pretty weak reviews even from the backwater game sites I’ve never heard of that reviewed it. The game may not be much, but the designer’s story behind the creation of the game is pretty interesting.

I think there could be a market for a mainstream game of this nature. It would be difficult to learn to control a space fighter with realistic physics, so maybe the lack of arcadiness would doom such a game to failure. But I think positioning it as more of a simulation would help. Consider all the hyper-realistic flight simulators out there. They couldn’t be less arcadey, but they have a small but loyal market. Better yet, why not a “Battlestar Galactica” game? One of that show’s hallmarks is the (mostly) realistic physics of its space combat. Duplicate that in a video game, offer the option for true realism or physics-lite combat, and I think you’d have a solid hit.

Wednesday, August 29, 2007

Spelling Corrector in 21 Lines of Python

A friend recently pointed out a very interesting article from Peter Norvig, the Director of Research at Google: How to Write a Spelling Corrector. The point of the article was to illustrate some spell-correction theory, but of course what I took away from it was that he had written a working spelling corrector in just 21 lines of Python code. Ammo for the language wars!

My friend also mentioned that he often mistypes things because his hands are offset one position on the keyboard. So he’ll type “jr;;p” instead of “hello”. Since spelling correction is so easy, he wondered, why don’t any spelling correctors offer a way to correct offset typing? I told him he was probably just in a very small minority. I doubt that many web users touch type at all, let alone have the offset typing problem. But more importantly, here was an opportunity to ram home the point that so much is possible in so very few lines of Python.

So I took a swing at a solution, and five lines of code is what it took me to add offset-typing correction to Norvig’s spelling corrector:

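# Typed key -> intended key when the hands sit one key too far right or left.
offsets_right = {'j' : 'h', 'i' : 'u', ';' : 'l', 'p' : 'o'}
offsets_left  = {'g' : 'h', 'y' : 'u', 'k' : 'l', 'i' : 'o'}
def offsets(word):
  # One candidate per offset direction; unmapped characters become 'a'.
  return set([''.join(map(lambda x: offsets_right.get(x, 'a'), word)),
              ''.join(map(lambda x: offsets_left.get(x, 'a'), word))])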

And then you just have to tweak the final function:

def correct(word):
  return max(known([word]) or known(edits1(word)) or known_edits2(word)
             or known(offsets(word)) or [word],
             key=lambda w: NWORDS[w])

This solution checks for words typed with a left or right offset. For example:

>>> correct('hillo')
'hullo'
>>> correct('ji;;p')
'hullo'
>>> correct('gykki')
'hullo'

Notes:
1. I use ‘hullo’ instead of ‘hello’ since the latter never appears in the Sherlock Holmes stories used as a training corpus.
2. The offset dictionaries only have 4 entries, to cover the four letters in the example. The complete solution would need to have entries for every non-meta key on the keyboard (except for the ones on the left or right edge).
3. My solution is a bit more complicated than I’d like. What I wanted to write was:
map(offsets_right.get, word)
Which is nice and clear, but map gives back a sequence of characters rather than a string, and get returns None for any key missing from the dictionary, which join won’t accept. So I had to use:
''.join(map(lambda x: offsets_right.get(x, 'a'), word))
There’s probably a nice Python solution that would be as terse but more readable.
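
For what it’s worth, here is one rough sketch of how notes 2 and 3 might be tackled together. It assumes a US QWERTY layout and covers only the three letter rows plus ';'. ROWS and build_offsets are names I’m introducing for illustration; they are not part of Norvig’s code. Unlike the version above, characters with no mapping are left unchanged rather than replaced with 'a', which is a design choice on my part and not the only reasonable one.

```python
# QWERTY letter rows (an assumption; other punctuation and digits are left out).
ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm"]

def build_offsets(rows, shift):
    """Map each typed key to the intended key `shift` positions away in its row.

    shift=-1: hands sit one key too far right (typed 'j' was meant as 'h').
    shift=+1: hands sit one key too far left (typed 'g' was meant as 'h').
    """
    table = {}
    for row in rows:
        for i, typed in enumerate(row):
            j = i + shift
            if 0 <= j < len(row):  # keys at the edge of a row have no neighbor
                table[typed] = row[j]
    return table

offsets_right = build_offsets(ROWS, -1)
offsets_left = build_offsets(ROWS, +1)

def offsets(word):
    """Return one candidate word per offset direction."""
    # A generator expression replaces the map/lambda pair; unmapped
    # characters pass through unchanged.
    return {''.join(table.get(c, c) for c in word)
            for table in (offsets_right, offsets_left)}

print('hullo' in offsets('ji;;p'))  # typed with the hands one key to the right
print('hullo' in offsets('gykki'))  # typed with the hands one key to the left
```

This offsets function has the same shape as the one above, so it should drop into correct() without further changes.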

Here’s the complete solution:

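import re, string, collections

# Split text into lowercase alphabetic words.
def words(text): return re.findall('[a-z]+', text.lower())

# Count word frequencies; every word starts at 1 so unseen words are never 0.
def train(features):
  model = collections.defaultdict(lambda: 1)
  for f in features:
    model[f] += 1
  return model

# open() instead of file(), which no longer exists in Python 3.
NWORDS = train(words(open('holmes.txt').read()))

# All strings that are one edit away from word.
# string.ascii_lowercase replaces string.lowercase, which Python 3 dropped.
def edits1(word):
  n = len(word)
  return set([word[0:i]+word[i+1:] for i in range(n)] + ## deletion
             [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + ## transposition
             [word[0:i]+c+word[i+1:] for i in range(n) for c in string.ascii_lowercase] + ## alteration
             [word[0:i]+c+word[i:] for i in range(n+1) for c in string.ascii_lowercase]) ## insertion

# Known words that are two edits away from word.
def known_edits2(word):
  return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

# The subset of words that appear in the training corpus.
def known(words): return set(w for w in words if w in NWORDS)

# Typed key -> intended key when the hands sit one key too far right or left.
offsets_right = {'j' : 'h', 'i' : 'u', ';' : 'l', 'p' : 'o'}
offsets_left  = {'g' : 'h', 'y' : 'u', 'k' : 'l', 'i' : 'o'}

# One candidate per offset direction; unmapped characters become 'a'.
def offsets(word):
  return set([''.join(map(lambda x: offsets_right.get(x, 'a'), word)),
              ''.join(map(lambda x: offsets_left.get(x, 'a'), word))])

# Take the first non-empty candidate set, then pick its most frequent word.
def correct(word):
  return max(known([word]) or known(edits1(word)) or known_edits2(word)
             or known(offsets(word)) or [word],
             key=lambda w: NWORDS[w])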

Saturday, January 13, 2007

Python

If you love Python, this post is not for you. If you hate Python, this post is not for you. If you were thinking about trying Python, or you’ve tried it and you’re just not sure, read on.

Here are my favorite entries in the world of Why Python is Great:

Quotes about Python
Why Python?
Why I Promote Python
Can Your Programming Language Do This? (More of a dig on Java, and doesn’t even mention Python, but a very entertaining read.)

If you want to do web-based stuff, then try TurboGears. Seriously, try it. You’ll thank me.

Now, if you’re really looking to spend some time thinking about languages, the following discussion is very interesting if a bit esoteric and paints Python in a more nuanced light:

Enthusiasts of other languages could put together similar lists for their own languages. So if you don’t like Python for whatever reason, but you’ve always wondered about Ruby or Scheme, go seek out more information. The most important thing is that we as programmers continually seek to expand our language horizons. But whatever you do, please, I’m begging you, don’t use Perl.

Saturday, August 12, 2006

Web 2.0

I just read a great essay by Paul Graham on the definition of “Web 2.0”.

“Experts have given Wikipedia middling reviews, but they miss the critical point: it’s good enough. And it’s free, which means people actually read it. On the web, articles you have to pay for might as well not exist. Even if you were willing to pay to read them yourself, you can’t link to them. They’re not part of the conversation.”

With this and other comments, Graham really sums up nicely the fundamental reason why many organizations don’t “get” the web. They think of it as simply a new distribution channel for their existing products (magazines, TV shows, encyclopedias, etc.). They think they get it, because they recognize the increased power of this new channel, but they don’t understand that entirely new forms of existing products and new products that never existed before are now possible, and that their non-web products can’t compete.