Monthly Archives: March 2010

Changelog for build 5275

Filed under changelog
New Core tables

  • yql.env: automatically apply environments per app, or per user per app

Bug fixes, including:

  • “content” can now be used to filter in a WHERE clause
  • fixed charset parameter handling in the HTML table (broken in the previous build)
  • another custom redirect fix (for the SPARQL table)
  • update statements now support “in” in the WHERE clause

Avoiding rate limits and getting banned in YQL and Pipes: Caching is your friend

Filed under tutorial

Web caches are great pieces of software: they lower the load on servers and serve content to clients faster. YQL and Pipes love caches for this very reason, and we reward clients that make good use of our reverse proxy caches by not counting responses served from cache against their rate limits. That’s right: if we can give you your content from cache, you can call it as often as you like, so there’s no need to cache locally just to save on calls.

Unfortunately, we’ve seen a lot of requests that could easily take advantage of our caches but don’t. Here’s a list of DOs and DON’Ts for calling YQL and Pipes, using the example of fetching the weather for a zipcode: select * from weather.forecast where location=90210

DON’T cachebust
“Cachebusting” means changing your request slightly so that the cache can’t give you a copy of a response it has seen before. It is often done by appending a random value or timestamp to the query parameters, for example: select * from weather.forecast where location=90210&rnd=_12312
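To see why this matters, here’s a toy model in JavaScript of a cache keyed on the request string. It is a simplification made up for illustration: the `ToyCache` class and `countOriginFetches` helper are hypothetical, and real proxies also honour expiry and cache headers. It still shows the effect. A stable query reaches the origin once, while a query with a changing `rnd` value reaches it on every call.

```javascript
// Toy model of a reverse-proxy cache keyed on the full request string.
// Hypothetical illustration only: real proxies also handle expiry and cache headers.
class ToyCache {
  constructor() {
    this.store = new Map();
    this.originFetches = 0; // requests that had to go all the way to the origin
  }

  get(url) {
    if (!this.store.has(url)) {
      this.originFetches += 1; // cache miss: fetch from the origin and store it
      this.store.set(url, `response for ${url}`);
    }
    return this.store.get(url);
  }
}

const QUERY = "select * from weather.forecast where location=90210";

// Make `n` identical requests, optionally cachebusting each one.
function countOriginFetches(n, cachebust) {
  const cache = new ToyCache();
  for (let i = 0; i < n; i++) {
    // `i` stands in for a random number or timestamp that differs on every call
    const url = cachebust ? `${QUERY}&rnd=_${i}` : QUERY;
    cache.get(url);
  }
  return cache.originFetches;
}

console.log(countOriginFetches(100, false)); // 1
console.log(countOriginFetches(100, true)); // 100
```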

Beware of client-side JS libraries “helping” you
Often developers aren’t even aware they are doing this, but various web client libraries, particularly client-side JavaScript ones, seem to think cachebusting by default is a sensible thing to do. It’s not. It makes our servers work harder, makes the downstream data sources work harder, and makes responses come back to the client more slowly. It also stops your own browser cache from making your app faster.

For example, jQuery’s automatic JSONP support creates a randomly named global callback function for each request. Because that name is sent as part of the request URL, every call cachebusts. By taking the time to add a few extra lines of code, you can benefit from our caches:

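   $.ajax({
      url: '',                  // your YQL request URL (left blank here)
      dataType: 'jsonp',
      jsonp: 'callback',        // name of the query parameter that carries the callback
      jsonpCallback: 'cbfunc',  // fixed name, so the URL is the same on every call
      cache: true               // stop jQuery appending its own "_=<timestamp>" parameter
   });
   function cbfunc(data){
      $.each(data.query.results.table, function(i,item){
         // use each result item here
      });
   }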

By defining your own global function in your script, you can be sure its name won’t change from request to request, and you can leverage our caches.
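The same effect can be shown without jQuery. This sketch assumes the public YQL endpoint of the time, `http://query.yahooapis.com/v1/public/yql`, with its `q`, `format` and `callback` parameters. `buildJsonpUrl` and `autoCallbackName` are hypothetical helpers, and the auto-generated naming scheme is invented here, not jQuery’s actual one. A fixed callback name produces identical URLs, and identical URLs are what let the cache answer.

```javascript
// Assumed endpoint, for illustration only; substitute the URL you actually call.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: build a JSONP request URL for a YQL statement.
function buildJsonpUrl(query, callbackName) {
  const params = new URLSearchParams({
    q: query,
    format: "json",
    callback: callbackName, // same parameter name as jsonp: 'callback' in the jQuery code
  });
  return `${YQL_ENDPOINT}?${params.toString()}`;
}

// Mimics an auto-generated callback name that is new on every call.
// The naming scheme is made up; only the "always different" part matters.
let counter = 0;
function autoCallbackName() {
  counter += 1;
  return `jsonp${Date.now()}_${counter}`;
}

const query = "select * from weather.forecast where location=90210";

// Fixed name: identical URLs, so the cache can answer repeat calls.
console.log(buildJsonpUrl(query, "cbfunc") === buildJsonpUrl(query, "cbfunc")); // true

// Auto-generated name: every URL is new, so every call misses the cache.
console.log(buildJsonpUrl(query, autoCallbackName()) === buildJsonpUrl(query, autoCallbackName())); // false
```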

If you must cachebust, use a “window” of time
Sometimes content gets cached for longer than you want, and sometimes your clients are IE6 web browsers, which don’t respect cache headers correctly [shudder]. The best solution is to append a parameter that changes gradually, at the same rate as the content you are requesting. Back to our example of fetching the weather forecast for a zipcode: the forecast will probably change throughout the day, but not every second, so you’re probably OK fetching it once an hour. You can therefore use a cachebusting parameter built from a timestamp that only changes once per hour, for example: select * from weather.forecast where location=90210&rnd=_2010031310

This uses a YYYYMMDDHH (year/month/day/hour) format for each request to fetch the weather. All requests arriving within the same hour share one URL, so they can be served from cache and you’ll use 1 unit of rate limit (per zipcode).
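Here’s one way to compute that hourly value in JavaScript, as a sketch. The helper names `hourWindow` and `windowedQuery` are hypothetical. The choice of UTC is our own assumption: if every client uses the same clock, all of them land in the same one-hour window and share the cached copy.

```javascript
// Build a cachebusting value that only changes once per hour: YYYYMMDDHH.
// Uses UTC (an assumption) so clients in different time zones share a window.
function hourWindow(date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    String(date.getUTCFullYear()) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is 0-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours())
  );
}

// Hypothetical helper: append the hourly value as the rnd parameter.
function windowedQuery(query, date) {
  return `${query}&rnd=_${hourWindow(date)}`;
}

const q = "select * from weather.forecast where location=90210";
console.log(windowedQuery(q, new Date(Date.UTC(2010, 2, 13, 10, 5))));
// → select * from weather.forecast where location=90210&rnd=_2010031310
```

Any time between 10:00 and 10:59 UTC on that day yields the same string, so those requests can all be answered from one cached response.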

DO put content into cache
On the flip side of cachebusting, sometimes content isn’t cached for as long as it should be, or as long as you want it to be. Perhaps the content provider set the cache lifetime wrongly, or your use of the content doesn’t need it updated so frequently. You can take control of this in YQL in a couple of ways.

First, you can explicitly set the cache “maxage” header in an open data table to whatever you want. Let’s say you want the table data to be cached for 5 minutes: in the <execute> block you’d write response.maxAge=300; (it’s specified in seconds).

Second, you can simply ask YQL to cache the response to a statement for longer. Append the _maxage query parameter to your call, and the result will be stored in cache for that length of time (but never for less time than it would have been originally), for example: select * from weather.forecast where location=90210&_maxage=3600
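As a rough sketch of that rule, the hypothetical helpers below append `_maxage` to a statement and compute the resulting cache lifetime. That lifetime is the longer of the original one and the requested one. The positive-integer check is our own assumption, not documented YQL behaviour.

```javascript
// Sketch of the rule: _maxage can lengthen how long a response is cached,
// but never shorten it. Hypothetical helper, not YQL's implementation.
function effectiveMaxAge(originalSeconds, requestedSeconds) {
  if (requestedSeconds === undefined) return originalSeconds;
  return Math.max(originalSeconds, requestedSeconds);
}

// Append _maxage (in seconds) to a YQL statement.
// The validation below is an assumption made for this sketch.
function withMaxAge(query, seconds) {
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new RangeError("_maxage must be a positive whole number of seconds");
  }
  return `${query}&_maxage=${seconds}`;
}

const forecast = "select * from weather.forecast where location=90210";
console.log(withMaxAge(forecast, 3600));
// → select * from weather.forecast where location=90210&_maxage=3600
console.log(effectiveMaxAge(300, 3600)); // 3600: lengthened to an hour
console.log(effectiveMaxAge(7200, 3600)); // 7200: not shortened
```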

This is really useful when you’re using output from a table that doesn’t cache for long enough, or from an XML source, and you don’t want to do any open table work.

By making a few small changes to the way your client calls YQL and Pipes, you can in many cases get an almost unlimited rate limit and give your users better performance.

Jonathan Trevor