One Lucky Winner

I was discussing my latest project with a friend the other day, and he mentioned an offer he had heard about: you could win $10,000 in scholarship money just by tweeting.

It turns out he was talking about a CollegeScholarships.org offer to award students a little over $1,000 in a Twitter-based raffle. (The $10,000 figure given in their press material is inconsistent with what the site says.)

When I was very young, the idea of winning the lottery fascinated me. With a chance to be lucky enough to become an instant millionaire, who could resist the allure of buying lottery tickets? My opinions luckily changed as I learned more about probability and behavioral psychology, but I still frequently see entrepreneurs pitching ideas that involve some element of a sweepstakes, or a contest, or a raffle.

I’m definitely interested in game mechanics, but not these types of games. If being a winner of your game involves a special rare stroke of luck, it’s probably a game that’s best left unplayed.


Playing with Fwix and Twilio (and an App Engine CacheCompress utility)

This weekend, I had a chance to play with two platforms that have been on my “to-play-with” list: Twilio and Fwix.

The result is a simple app I’ve made called PhoneFwix. The workflow is simple. Call (415) 483-1286. Type in your zipcode. Listen to news. The Python code is only a couple hundred lines, including all of the geocoding, fetching, processing, and little utility methods. Pretty good, Fwix and Twilio. Pretty, pretty good.

The setup on the Twilio website was very straightforward, thanks to some awesome documentation. I did have to pay a minimum of $20 to open a full account, but you can barely buy lunch these days for $20, so that’s certainly a fair deal.

There are a couple of improvements I’d like to see from Twilio, such as an interactive debugging mode where I can call from a developer phone number and get some on-the-fly introspection. And, of course, Google Voice integration. (Actually, I’ll just go out and predict that Twilio will be purchased by Google by the end of 2010. You heard it here first.) On the whole, I’m very pleased with Twilio and have already thought of a few more exciting things I could do with it.

A lot of the Twilio apps I’ve seen so far involve lead generation and other sales-oriented type things. Which makes sense, considering the huge savings you’d get from Twilio compared to the expensive enterprise phone systems that charge upwards of $5 per call received (whereas Twilio charges a few cents for an average call).

Fwix’s API only has a few methods, and has a ‘geo_id’ schema that’s slightly more awkward than it might be, requiring at least two calls to get anything done. And it formats integers as strings and sometimes sends strange fragments as titles or summaries. And while these can easily be automatically flagged and discarded, they could also be flagged and discarded on Fwix’s end.

But enough with the complaining, already! Fwix does a good job in the most important area: the content. All of the geographic areas are well-seeded with (mostly) relevant information, and this is quite a feat for a Chop Suey application.

The hardest part about making this application was taking a zipcode and figuring out which Fwix location was closest to it. Luckily, geopy made this task a piece of cake:



@memoize()
def get_coordinates_for_zipcode(zipcode):
    from geopy import geocoders  
    g = geocoders.Google(GOOGLE_API_KEY)
    place, (lat, lng) = g.geocode(zipcode)
    return place, (lat, lng)

 
 
@memoize()
def get_closest_geo_id(lat, lng):
    from geopy import distance as geopy_distance
    from models import GeoID
    # Load the known Fwix locations (at most 1000) from the datastore.
    geo_ids = GeoID.all().fetch(1000)
    distances = []
    for geo_id in geo_ids:
        # Whole miles between this Fwix location and the caller's coordinates.
        miles = int(geopy_distance.distance(
            geo_id.coords(), (lat, lng)).miles)
        distances.append({
            'distance': miles,
            'geo_id': geo_id.geo_id,
            'place': geo_id.key().name(),
        })
    # Ascending sort, so the nearest location comes first.
    distances = sort_by_key(distances, 'distance', reverse=False)
    return distances[0]['geo_id']


get_coordinates_for_zipcode uses the Google Maps API to (drum roll) get coordinates for the given zipcode.

get_closest_geo_id is the one that really saved me from having to brush up on my spherical trigonometry. For each known Fwix location, it hands geopy two coordinate tuples (geopy expects latitude first, then longitude) and automagically gets the distance back. The resulting list is sorted by the ‘distance’ dictionary key, so that the first sorted geo_id is the one closest to the given zipcode.
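For anyone curious what that lookup involves without geopy, here is a rough plain-Python sketch. It is only an illustration: the haversine formula stands in for whatever geopy computes internally, and the LOCATIONS table, its ids, and the function names are made up for the example rather than taken from Fwix or from my app.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lng) tuples."""
    lat1, lng1 = [math.radians(v) for v in a]
    lat2, lng2 = [math.radians(v) for v in b]
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


# Hypothetical stand-in for the GeoID entities stored in the datastore.
LOCATIONS = {
    'sf': (37.7749, -122.4194),
    'chicago': (41.8781, -87.6298),
    'nyc': (40.7128, -74.0060),
}


def closest_geo_id(lat, lng, locations=LOCATIONS):
    """Return the id of the location nearest to (lat, lng)."""
    return min(locations,
               key=lambda gid: haversine_miles(locations[gid], (lat, lng)))
```

Taking the minimum directly avoids building and sorting the whole list, which is all the lookup needs when only the nearest location matters.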

One annoying thing about geopy is that it’s full of print statements that I had to comment out. Honestly, who does that? You can’t just print things willy-nilly, geopy. Then there would just be anarchy. That’s why we have logging: to avoid anarchy…and print statements.

On a slightly unrelated note, I’ve also made a simple module for compressing GAE datastore entities into their binary representations, and decompressing them. It’s inspired by Nick Johnson’s recent post about the db.entity_pb module. (By the way, if you do any development on GAE and you don’t read Nick Johnson’s blog, you, sir or ma’am, are a crazy person.)

The problem with Nick’s sample code is that, well, it’s sample code. In real life, I’m memcaching pretty much everything I can, and in lots of cases, it’s just arbitrary data.

So I made two helper methods to conditionally compress entities into their binary equivalent. If the data is a list, it figures out if the list contains entities by looking at the first few entries for a db.Model object. If so, it individually checks each item and converts it if it’s an entity.


 
from google.appengine.ext import db

def to_binary(data):
  """Compresses entities or lists of entities for caching.

  Args:
    data: arbitrary data input, on its way to memcache.
  """
  if isinstance(data, db.Model):
    # Just one instance
    return makeProtoBufObj(data)
  elif isinstance(data, list) and any(
      isinstance(i, db.Model) for i in data[:5]):
    # List of entities. (If none of the first 5 items are models,
    # we don't look for entities at all.)
    entities = []
    for obj in data:
      # If item is an entity, convert it; otherwise keep it as is.
      if isinstance(obj, db.Model):
        entities.append(makeProtoBufObj(obj))
      else:
        entities.append(obj)
    return ProtoBufList(entities)
  else:
    # Return data as is
    return data


I use custom classes ProtoBufObj and ProtoBufList so that it’s very easy to identify the types of data that need to be decompressed:



class ProtoBufObj(object):
  """Special type used to identify protobuf objects."""
  def __init__(self, val, model_class):
    self.val = val
    # Storing the model class makes it unnecessary to import model classes
    self.model_class = model_class

class ProtoBufList(object):
  """Special type used to identify a list containing protobuf objects."""
  def __init__(self, vals):
    self.vals = vals

def makeProtoBufObj(obj):
  # Encode the entity into its binary protocol buffer representation.
  val = db.model_to_protobuf(obj).Encode()
  model_class = db.class_for_kind(obj.kind())
  return ProtoBufObj(val, model_class)

I initially didn’t have the model_class attribute on ProtoBufObj objects, but then I started getting KindError exceptions. They showed up when an entity was being decompressed while the import of its model definition sat behind the memoize() decorator, where it was never executed.

It’s easy to rectify this issue by making sure that model imports happen outside of the memoized methods, and then you should be fine to remove the model_class attribute.
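The decompression side is not shown above, so here is a self-contained sketch of how marker classes make the round trip easy to dispatch. Everything in it is a stand-in: Entity plays the role of a db.Model instance, JSON plays the role of the protocol buffer encoding, and the Packed, PackedList, pack, and unpack names are hypothetical. On App Engine the decode step would presumably go through the db module's protobuf helpers (such as db.model_from_protobuf) instead.

```python
import json


class Entity(object):
    """Hypothetical stand-in for a db.Model instance."""
    def __init__(self, name):
        self.name = name


class Packed(object):
    """Marker type: wraps one encoded entity."""
    def __init__(self, val):
        self.val = val


class PackedList(object):
    """Marker type: wraps a list that may contain Packed items."""
    def __init__(self, vals):
        self.vals = vals


def pack(data):
    """Encode entities (or lists containing them); pass other data through."""
    if isinstance(data, Entity):
        return Packed(json.dumps({'name': data.name}).encode('utf-8'))
    if isinstance(data, list) and any(isinstance(i, Entity) for i in data[:5]):
        return PackedList([pack(i) if isinstance(i, Entity) else i
                           for i in data])
    return data


def unpack(data):
    """Reverse pack(): the marker types say exactly what needs decoding."""
    if isinstance(data, Packed):
        return Entity(**json.loads(data.val.decode('utf-8')))
    if isinstance(data, PackedList):
        return [unpack(i) for i in data.vals]
    return data
```

Because anything that is not a marker type passes straight through, both functions can be called on every value going into or coming out of the cache without first checking what it is.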

** Download the CacheCompress module here. **

This utility makes for a great example of the well-known time/space tradeoff: it takes a little more time to compress and decompress the entities, but saves a good amount of space in the memcache. The savings might even offset the additional time required, since considerably less data has to be retrieved from the memcache than otherwise, so the memcache calls likely complete faster. I’ll probably have to run a test with a large number of entities to get a definitive answer on how it affects performance.
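A test like that could start from something as small as the harness below. It is a sketch under loud assumptions: pickle and zlib stand in for the entity encoding, the payload is invented, and it measures only local encode/decode time, not the memcache round trip, so it says nothing definitive about App Engine itself.

```python
import pickle
import time
import zlib


def measure(encode, decode, payload, repeat=20):
    """Return (encoded size in bytes, seconds per encode/decode round trip)."""
    blob = encode(payload)
    start = time.time()
    for _ in range(repeat):
        decode(encode(payload))
    return len(blob), (time.time() - start) / repeat


# Invented payload: a few hundred story-like records.
payload = [{'id': i, 'title': 'story %d' % i, 'body': 'lorem ipsum ' * 20}
           for i in range(500)]

raw_size, raw_secs = measure(pickle.dumps, pickle.loads, payload)
packed_size, packed_secs = measure(
    lambda data: zlib.compress(pickle.dumps(data)),
    lambda blob: pickle.loads(zlib.decompress(blob)),
    payload)

print('raw:    %d bytes, %.6f s per round trip' % (raw_size, raw_secs))
print('packed: %d bytes, %.6f s per round trip' % (packed_size, packed_secs))
```

Swapping in the real encode and decode functions, and timing actual memcache set and get calls around them, would be the next step toward the definitive answer.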


A Chance to Be Good

In his essay on being good, Paul Graham first introduces two Y Combinator principles, “Make Something People Want” and “don’t worry about the business model”, and then considers at length why these principles, seemingly descriptive of a charity, also describe a successful startup.

The passage below is one that I find to be particularly insightful:

The most important advantage of being good is that it acts as a compass. One of the hardest parts of doing a startup is that you have so many choices. There are just two or three of you, and a thousand things you could do. How do you decide?
Here’s the answer: Do whatever’s best for your users. You can hold onto this like a rope in a hurricane, and it will save you if anything can. Follow it and it will take you through everything you need to do.
It’s even the answer to questions that seem unrelated, like how to convince investors to give you money. If you’re a good salesman, you could try to just talk them into it. But the more reliable route is to convince them through your users: if you make something users love enough to tell their friends, you grow exponentially, and that will convince any investor.

The advantage of being good that pg missed is that your goodness will be your ace in the hole when it comes to challenging an establishment.

Good for Newspaper Readers

The story of the fall of newspapers is a perfect example. The internet was just so darn good for newspaper readers that newspapers had nowhere near the bargaining power they imagined they would have.

For one of my last classes at Medill, I read op-eds and essays about internet journalism from the mid-nineties. Their authors not only underplayed the new medium’s true strengths, but also greatly overestimated newspapers’ ability to bargain their way to the top, mostly because they denied the possibility that the web would be so good that people would simply be willing to see newspapers fail.

After all, what would it be like to live in a country where most small cities didn’t have their own newspaper?

Good for Students

I think the next example we’ll see of something extraordinarily good happening will be in classrooms.

Teachers’ unions claim that it’s their job to help students by helping teachers when they block new classroom technologies that vaguely appear to threaten teachers’ livelihoods.

So the only way to get such a thing into a classroom would be to have something so good for students that it would just make the teachers’ unions look bad to be anything but enthusiastic about it.

This is another chance to be good that has finally arrived.


There's No Shame in Chopping Suey

Last week, I wrote about what I loved about Fwix:

Fwix doesn’t even bother with the pretense of asking its users for original content. As far as I can tell, there aren’t any places within the Fwix.com site where you can post stories.
There’s a good chance that Fwix actually will introduce tools to post original content, but why should they bother? Right now, you’re expected to post them on your Facebook or Twitter feed, but that’s where everyone would rather be posting news links anyways. In fact, that’s where people are already posting news links!
The mental cost of switching can have huge ramifications about user adoption, and leads to rich-get-richer, poor-get-poorer effects. Friendfeed suffered, for example, because it had too high of a mental cost of switching for most people, even if it was very easy to use Friendfeed together with other services.
It’s refreshing to see a site that doesn’t even pretend that you’re going to want to use yet another tool on the web.

Yesterday, I read about Artwiculate:

Every day a new word is chosen, and simply sending a tweet containing that word will enable your update to appear on the Artwiculate site. Once there, other users can vote whether your usage was “liked” or whether it was inaccurate.

Chop Suey

Artwiculate is another member of what I now call the Chop Suey class of application that doesn’t require people to do anything besides what they’re already doing. This class of application isn’t really new. Google’s flagship search product is literally a textbook example of adding value to crowdsourced data, and Tim O’Reilly, Kevin Kelly and many others have been evangelizing collective intelligence for well over a decade.

But it’s now getting to the point where you’re at an obvious disadvantage when you do anything but use things that are already there. Artwiculate works even if no one uses it, and that’s a crucial distinction that may help it survive the valley of despair.

Unfortunately, the Chop Suey approach has largely not yet been absorbed in academic settings, and is lagging behind the vanguard even in the best case scenarios.


Why Your 'App Engine Sucks' Post Sucks

After reading another misinformed criticism of Google’s App Engine platform, I feel compelled to make a suggestion.

Try your performance tests again, but with a simple change.

Add a @conditionalCache decorator that will always serve a cached response unless the request is from a background task or cron job, in which case it forces a refresh.

You could use the memoize decorator I posted earlier in the summer, which accepts some extra params to configure the per-request cache handling, or something simpler.

Then add these two lines where appropriate:


# The task name arrives as a request header, not as a query parameter.
if self.request.headers.get('X-AppEngine-TaskName'):
  force_run = True # force cache refresh

Cron jobs have an ‘X-AppEngine-Cron’ header, but I haven’t included it in this if-statement because your cron jobs should not directly hit the URLs in question, but rather add a task to hit that URL. Then, if the request fails, the task will be retried until it succeeds and can cache the results, whereas the cron job would simply fail.
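To make the idea concrete, here is a minimal, self-contained sketch of such a decorator. It is an assumption-laden illustration, not the memoize decorator from my earlier post: a plain dict stands in for memcache, FakeRequest stands in for the webapp request object, and conditional_cache, key_func, and render_page are names invented for the example.

```python
import functools

CACHE = {}  # stand-in for memcache


def conditional_cache(key_func):
    """Serve cached results unless the request comes from a background task."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request):
            key = key_func(request)
            is_task = bool(request.headers.get('X-AppEngine-TaskName'))
            if not is_task and key in CACHE:
                return CACHE[key]  # normal visitors get the cached copy
            result = handler(request)  # cache miss, or forced refresh by a task
            CACHE[key] = result
            return result
        return wrapper
    return decorator


class FakeRequest(object):
    """Hypothetical stand-in for a webapp request."""
    def __init__(self, path, headers=None):
        self.path = path
        self.headers = headers or {}


calls = []  # records every real (uncached) render


@conditional_cache(lambda request: request.path)
def render_page(request):
    calls.append(request.path)
    return 'page %s v%d' % (request.path, len(calls))
```

With this in place, only the first visitor and the background tasks ever pay for the expensive render; everyone else is served from the cache.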

You can cache your templates this way, or any arbitrary data objects, especially anything that requires a GQL query or URLfetch. Of course, you may have specific elements you don’t want to cache, and you can usually render those at runtime without too much trouble if they don’t involve a db hit.

Cron jobs are very easy to configure, but background tasks are much more versatile and useful. For instance, to refresh popular pages more often, you could use another decorator that launches a background task to force a refresh of the page’s cache after every N requests to a given URL.
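A rough sketch of that second decorator might look like this. Again, everything here is a stand-in: HITS plays the role of a memcache counter, TASK_QUEUE plays the role of the task queue, and refresh_every and serve are hypothetical names. A real app would enqueue a task that re-requests the URL rather than append to a list.

```python
import functools

HITS = {}        # stand-in for a per-URL memcache counter
TASK_QUEUE = []  # stand-in for the App Engine task queue


def refresh_every(n):
    """Queue a cache refresh for a URL after every n requests to it."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(url):
            HITS[url] = HITS.get(url, 0) + 1
            if HITS[url] >= n:
                HITS[url] = 0
                # A real app would add a background task that hits this URL.
                TASK_QUEUE.append(url)
            return handler(url)
        return wrapper
    return decorator


@refresh_every(3)
def serve(url):
    return 'cached copy of %s' % url
```

Popular pages reach the threshold sooner, so they get refreshed more often without any per-page configuration.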
