Author Archives: yqlteam

Changelog for build 8743

Filed under changelog

New Feature Highlights

  • Store recent queries using HTML5 container
  • y.ahoo.it URL shortener integrated into the console when permalinking queries – now it’s easier to grab and share long console queries

New core tables

  • Yahoo Mail API
  • Social Relationship API

Other features

  • y.cache.incr() should return the new value
  • geo.placetypes adds language support

Changes

  • Preserve newlines in console output

Bug fixes, including

  • Aliased input keys are not available as variables in <execute>
  • In console set publiclyCallable to false if a user is not logged in and runs a query with an auth required table

Changelog for build 8367

Filed under changelog
New Feature Highlights:
  • Allow an <execute> outside the execute blocks to add libraries, functions
  • y.rest().ping() add a way to beacon data to statistics gathering services

New core tables

  • Fantasy Sports API

Other features

  • Improve ability to throw exceptions from execute tables
  • Update tidy library to support HTML5
  • Add y.tidy(String html) to tidy the html and return a document

<execute> changes

  • Core HTTP changes to enable better connection management and reuse

Bug fixes, including:

  • Fix Diagnostics service-time included y.query time
  • Fix Can’t “desc yql.env”
  • Fix Json array returned by a rest call results in an exception
  • Fix Debug output does not handle special characters
  • Fix paging with mode=’offset’ is ignoring the start default attribute

Changelog for build 6122

Filed under changelog
New Feature Highlights:

  • new <meta> element in YQL response envelope returns information “about” the result list
  • social.updates.search table – access the social updates firehose
  • changes to environment feature and capabilities
    • new “env” statement. You can now load envs using the “env” statement as part of the regular YQL syntax.
    • “env”s may now be nested – you can include one env from another
    • “sets” on one environment do NOT apply to any other, unless the environment is nested

New core tables

  • social.updates.search
  • yql.table.desc, yql.table.list and yql.tables tables (reflect on the available YQL tables)

Other features

  • show tables now reports the security required to use the table
  • desc <table> now works for all tables irrespective of security or connection restrictions (https) on tables

<execute> changes

  • added response.meta to enable <meta> element to be set on the response object
  • added forceCharset(String charset) to request/y.rest(..) as a way of overriding the return contentType charset.

Bug fixes, including:

  • fixed debug=true always reported the method as GET in network dump
  • including a store:// url in execute y.include intermittently caused no results to be returned – fix
  • yahoo:uri is no longer in the response envelope
  • yahoo.identity fixed

Changelog for build 5275

Filed under changelog
New Core tables

  • yql.env, automatically apply environments per app or per user per app

Bug fixes, including:

  • “content” can be used to filter in WHERE clause
  • fixed charset parameter handling in HTML table (was broken in previous build)
  • another custom redirect fix (for SPARQL table)
  • update now supports “in” as where clause

Avoiding rate limits and getting banned in YQL and Pipes: Caching is your friend

Filed under tutorial

Web caches are great pieces of software: they lower the load on servers and serve content faster to clients. YQL and Pipes love caches for this very reason, and we reward clients that make good use of our reverse proxy caches: requests answered from cache are not subject to rate limits. That’s right – if we can give you your content from cache you can call it as often as you like, no need to cache locally just to save on calls.

Unfortunately we’ve seen a lot of requests to us that could easily take advantage of our caches but don’t. Here’s a list of some DOs and DON’Ts for calling YQL and Pipes, using the example of fetching the weather for a zipcode:

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210

DON’T cachebust
“Cachebusting” means changing your request just a little so that the cache can’t give you a copy of the response it’s seen before, often using a random value or timestamp on the end of the query parameters, for example:

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210&rnd=_12312

Beware of client-side JS libraries “helping” you
Often developers aren’t even aware they are doing this, but various web client libraries, particularly client-side JavaScript ones, seem to think cachebusting by default is a sensible thing to do. It’s not. It just makes our servers work harder, the downstream sources of data work harder, and the response come back slower to the client. It also stops your own browser cache from helping your app behave faster.

For example, jQuery provides automatic JSONP callback handling that creates a randomly named global function for each callback. This causes it to cachebust on every call, as the function name changes all the time. By taking the time to add a few extra lines of code you can benefit from our caches:

$.ajax({
   url: 'http://query.yahooapis.com/v1/public/yql?q=show%20tables&format=json',
   dataType: 'jsonp',
   jsonp: 'callback',
   jsonpCallback: 'cbfunc'
});
function cbfunc(data){
   $.each(data.query.results.table, function(i,item){
      $('#tables').append('<p>'+item+'</p>');
   });
}

By defining your own global function in your script you can be sure that it won’t change from request to request, and you can leverage our caches.

If you must cachebust, use a “window” of time
Sometimes the content gets cached for longer than you want, and sometimes your clients are IE6 web browsers which don’t respect cache headers correctly [shudder]. The best solution to this is to append a parameter that changes at the same rate as the content you are requesting. Take our example of fetching the weather forecast for a zipcode. The forecast will probably change throughout the day, but not every single second, so you’re probably ok fetching it every hour. You can therefore create a cachebusting parameter from a timestamp that only changes once per hour, for example:

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210&rnd=_2010031310

This uses a YYYYMMDDHH (year/month/day/hour) format for each request to fetch the weather. All requests arriving over the course of an hour will get a cache hit and you’ll use 1 unit of rate limit (per zipcode).
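The hourly window can be sketched in JavaScript roughly as follows. The helper names hourlyCacheKey and weatherUrl are made up for illustration, the key is computed in UTC (an assumption; any fixed timezone would do), and the query is URL-encoded, which the example URLs in this post leave out for readability.

```javascript
// Build a value that only changes once per hour, in YYYYMMDDHH form (UTC).
function hourlyCacheKey(date) {
  function pad(n) { return (n < 10 ? '0' : '') + n; }
  return '' + date.getUTCFullYear()
    + pad(date.getUTCMonth() + 1)   // getUTCMonth() is zero-based
    + pad(date.getUTCDate())
    + pad(date.getUTCHours());
}

// Hypothetical helper: weather query URL with the hourly cachebusting parameter.
function weatherUrl(zip, date) {
  var q = 'select * from weather.forecast where location=' + zip;
  return 'http://query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent(q)
    + '&rnd=_' + hourlyCacheKey(date);
}
```

Every call to weatherUrl within the same hour yields an identical URL, so only the first one should miss the cache.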

DO put content into cache
On the flip side of cachebusting, sometimes content isn’t cached as long as it should be, or as long as you want it to be. Perhaps the content provider set it wrongly, or your usage of the content doesn’t need it updated so frequently. You can take control of this in YQL in a couple of ways.

First, you can choose to explicitly set the cache “maxage” header in an open data table to whatever you want. Let’s say you want the table data to be cached for 5 minutes; then in the <execute> statement you’d say response.maxAge=300; (it’s specified in seconds).

Secondly you can just ask YQL to cache the response to a statement for longer – just append the _maxage query parameter to your call and the result will be stored in cache for that length of time (but not shorter than it would have been originally):

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210&_maxage=3600

This is really useful when you’re using output from a table that’s not caching enough or an XML source without having to do any open table work.
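As a rough client-side sketch, a small helper can append _maxage when building the request URL. The function name yqlUrl is hypothetical, and the query is URL-encoded here even though the example above shows it unencoded.

```javascript
// Hypothetical helper: build a public YQL URL, optionally asking for a longer cache lifetime.
function yqlUrl(query, maxAgeSeconds) {
  var url = 'http://query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent(query);
  // _maxage is given in seconds; skip it when no positive number is supplied
  if (typeof maxAgeSeconds === 'number' && maxAgeSeconds > 0) {
    url += '&_maxage=' + Math.floor(maxAgeSeconds);
  }
  return url;
}
```

For example, yqlUrl('select * from weather.forecast where location=90210', 3600) asks for the result to be kept in cache for an hour.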

By making a few small changes to the way your client calls YQL and Pipes you can gain almost infinite rate limit in many cases and provide better performance to your users.

Jonathan Trevor

Changelog for build 4264

Filed under changelog
New feature highlights:

  • Customizable caching. Execute can now set maxage header in response (response.maxAge=300), and clients can also request a greater maxage header for increased performance (&_maxage=300).
  • Query aliases. Name your YQL queries using meaningful short names.

Core Table changes

  • New global execute element outside of bindings is prepended to all executes (to enable common js to be run over all bindings)

New Core tables

Execute changes

  • max-age header is now auto-calculated based upon queries and rest calls made in execute

Bug fixes, including:

  • Redirect handling improved
  • Upgraded memcache library
  • Batchable attribute now works correctly with paramType=”query” and “matrix”

Changelog for build 3396

Filed under changelog
New feature highlights:

  • y.rest and y.query now support timeouts
    • y.rest(..).timeout(30).get() will fail after 30ms
    • y.query(…,30) will fail after 30ms
    • An exception gets thrown if the timeout is hit

    Open Data Table schema changes

    • url/urls is now optional in the schema

    Bug fixes, including:

    • @ substitution works for paging parts of the query
    • url based paging works in more cases

    Changelog for build 3013

    Filed under changelog

    New feature highlights:

    • set verb for configuring static variables
    • yql.storage tables for storing tables, environments and more in YQL itself
    • debug mode for table development (debug=true)
    • multiple environment support

    Core table changes:

    • update for geo.placemaker table
    • social.connections.updates results are sorted by date similar to social.updates
    • csv table now has a charset key (if the source doesn’t provide one this can be used instead of the utf-8 default)

    New core tables:

    • meme.*

    Open Data Table schema changes

    • input key “as” attribute for renaming parameters

    Execute changes

    • y.env function so you can load up environments inside a YQL execute element.
    • y.crypto, for cryptographic signing
    • y.context (single value, table, contains the name used by the executor of this table)

    Bug fixes, including:

    • xpath and multiple IN url selects on HTML page no longer fails
    • table name is now present in execute
    • sanitize() can now take no params
    • workaround to ruby/github client-ip bug
    • update query without where clause returns an error message instead of returning null
    • const key values are no longer mutable by the keys in the YQL query
    • add client-ip to outgoing header based on incoming authenticated IP address
    • @variables other than urls now work on data tables
    • trim whitespace around json responses to parse better (fixes itunes issue)
    • post method reverse(field=”id”) displays correct method name in error message
    • User-Agent sent via HTML fetches through YQL should indicate Yahoo Pipes 2.0 (now uses: Mozilla/5.0 (compatible; Yahoo Pipes 2.0; +http://developer.yahoo.com/yql/provider) Gecko/20090729 Firefox/3.5.2)

    Changelog for build 2174

    Filed under changelog

    New feature highlights:

    • INSERT/UPDATE/DELETE bindings and language features
    • JSONP-X feature (XML string as JSON result if format=xml and JSONP callback is specified)

    Core table changes:

    • social.updates defaults to sorting updates by date (most recent first)

    New core tables:

    • social.profile.status
    • social.connections.updates (efficiently gets updates for all connections)
    • flickr.photoset
    • geo.placemaker

    Open Data Table schema changes

    • new “url” paging model
    • new “insert”,”update” and “delete” bindings
    • new “map” and “value” input key types

    Execute changes

    • New methods on y.rest():
      • post(content), post the value of content to the url.
      • put(content), put the value of the content to the url
      • del(), send delete http verb to the url (delete is a reserved word in JavaScript, so that’s why this is del)
      • contentType(string), set the content type of the payload on content (e.g. application/json)
      • accept(string), set the accept http header to a mimetype, and tell YQL what we expect the response to contain (and how to parse it)

    Bug fixes, including:

    • No microformats in a page handled better
    • Words like “Select” and “Desc” now acceptable in projection, where and function clauses
    • Map input type now works in “Select”
    • Query parameters on the console are now passed through to the YQL service
    • y.log response.object and native object fixes
    • Multiple open data table authors now shown in “desc”
    • Json table now accepts top-level arrays.
    • response.object can be appended
    • CSV parser handles commas inside quoted strings
    • Javascript array.toString() works better
    • response.headers returns headers correctly now
    • y.jsonToXml now accepts javascript objects and empty values encoded as NULL
    • multi-key joins key ordering fixed

    Adding value to a data feed using YQL Execute

    Filed under feature, tutorial

    I want USGS earthquake data.  More specifically, I’m interested in recent, substantial quakes.  Fortunately, data.gov makes this data easy to find.  After searching through the USGS raw data catalog for the text “earthquake”, I choose the Worldwide M2.5+ Earthquakes, Past 7 Days feed, and pull it into YQL for parsing.  It’s almost perfect, but I want easy access to each quake’s magnitude, and the magnitude is buried in the “title” element.  No worries.  I’ll use YQL Execute to split it out and give it its own element in the feed’s structure.  I can then visualize this data using something like Jon LeBlanc’s js-yql-display project on github.

    Here are a few reasons why YQL is perfect for this task:
    1) I can take advantage of Yahoo!’s web-serving infrastructure to fetch, process, and cache the feed, reducing my server’s exposure and bandwidth costs.  My table is also cached, further reducing bandwidth usage.

    2) Because YQL Execute employs standard E4X, I am using and adding to my JavaScript skill set, instead of spending time learning a new language.

    3) E4X was built specifically for XML manipulation so it has a convenient syntax for this job

    4) By using YQL to do the heavy lifting, I can minimize the code I send to the browser and keep it focused on the display logic.

    Ok. Ok. Here’s the code:

    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
            <description>Extracts magnitude from item title in atom feed and adds it as an element to the item.  We can then filter by magnitude using yql's built-in operators</description>
    	<sampleQuery>select entry from usgs.earthquakes</sampleQuery>
    	<sampleQuery>select entry from usgs.earthquakes where entry.magnitude >= 6.0</sampleQuery>
    
      </meta>
      <bindings>
        <select itemPath="" produces="XML">
    		<urls>
    
    			<url>http://earthquake.usgs.gov/eqcenter/catalogs/7day-M2.5.xml?11d</url>
    		</urls>
    		<execute><![CDATA[
    
    			default xml namespace = "http://www.w3.org/2005/Atom";
    			var xml = request.get().response,//call the url defined above
    
    			 	entries = <entries></entries>,//prep the output object
    				entry = null,//individual entry in xml obj. used in loop below.
    				magnitude = null;//magnitude of quake.  used in loop below
    
    			for each(entry in xml.entry){
    				magnitude =
    					entry.title//eg M 3.0, Puerto Rico region
    
    					.split(' ')[1]//eg --> 3.0,
    					.replace(',', '');//eg --> 3.0
    
    				entry.appendChild( <magnitude>{magnitude}</magnitude> );
    				entries.appendChild(entry);
    
    			}
    			response.object = entries;
    		]]></execute>
    
        </select>
      </bindings>
    </table>
    

    Now, we can put this table on a server, load it up in YQL, and easily access the magnitude using YQL’s parser.
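The title parsing from the loop above can be tried in isolation on a plain string. This is only a sketch: magnitudeFromTitle mirrors the split/replace chain used in the table, while parseMagnitude is a hypothetical, slightly more defensive variant that returns null when the title does not start with a magnitude.

```javascript
// Same steps as in the table: 'M 3.0, Puerto Rico region' --> '3.0,' --> '3.0'
function magnitudeFromTitle(title) {
  return title.split(' ')[1].replace(',', '');
}

// Hypothetical variant: returns a number, or null if no leading "M <number>" is found.
function parseMagnitude(title) {
  var match = /^M\s+(\d+(?:\.\d+)?)/.exec(title);
  return match ? parseFloat(match[1]) : null;
}
```

The first version assumes every title follows the “M 3.0, Puerto Rico region” shape; the second trades a few extra characters for a clear failure value.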

    For those unfamiliar with E4X, it’s worth noting the namespace declaration (default xml namespace = "http://www.w3.org/2005/Atom";).  It tells YQL’s JavaScript engine which namespace unqualified element names belong to.  We wouldn’t be able to access the feed’s elements without it.  Find the namespaces associated with your data by looking in the xml wrapper.  The Atom namespace governs my feed’s structure as a whole, which is why it was convenient to declare it as a default.  For access to specific elements using another namespace, e.g. georss data, it’s easier to define the namespace locally like this:
    var ns = Namespace("http://www.georss.org/georss");
    and then use it like this:
    var latitude = xml.ns::Result.ns::Latitude;

    Since we’ve gone to the trouble of defining a YQL table, we may as well add parsing for the summary element, which also contains some useful information in an inconvenient format.  Because this content is a bit more extensive, while still being somewhat predictable, a regular expression works well.  Here’s the code:

    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
    
    	<description>Extracts magnitude from item title in atom feed and adds it as an element to the item.  We can then filter by magnitude using yql's built-in operators.  Additionally, it extracts summary cdata, parses it, wraps the parsed data in its own element, and adds this element to the xml output.  </description>
    	<sampleQuery>select entry from usgs.earthquakes</sampleQuery>
    
    	<sampleQuery>select entry.title, entry.updated, entry.link from usgs.earthquakes</sampleQuery>
    
    	<sampleQuery>select entry.summary from usgs.earthquakes where entry.summary.type = "xml" and entry.summary.depth.km > 99</sampleQuery>
    
      </meta>
      <bindings>
        <select itemPath="" produces="XML">
    		<urls>
    
    			<url>http://earthquake.usgs.gov/eqcenter/catalogs/7day-M2.5.xml?11d</url>
    		</urls>
    		<execute><![CDATA[
    
    			default xml namespace = "http://www.w3.org/2005/Atom";
    
    			var xml = request.get().response,//call the url defined above
    
    			 	entries = <entries></entries>,//prep the output object
    				entry = null,//individual entry in xml obj. used in loop below.
    				magnitude = null,//magnitude of quake.  used in loop below
    
    				re = '<img '//img tag opening bracket (note: trailing spaces here and below)
    					+ 'src="(http://earthquake\\.usgs\\.gov/images/globes/[\\d_-]+\\.jpg)" '//img src - capture
    					+ 'alt="([\\d\\.]+&#176;(?:N|S) [\\d\\.]+&#176;(?:W|E))" '//img alt - ignore (we already have coords from georss)
    
    					+ 'align="(left|right)" '//img align - ignore
    					+ 'hspace="(\\d+)" '//img hspace - ignore
    					+ '/>'//img tag closing bracket
    					+ '<p>'//opening p tag
    
    					+ '(\\w+, \\w+\\s+\\d+, \\d+ [\\d:]+) UTC'//utc date - capture (note: variable amt of whitespace btwn month and day)
    					+ '<br>'//br tag
    					+ '(\\w+, \\w+\\s+\\d+, \\d+ [\\d:]+ (?:AM|PM)) at epicenter'//local date at epicenter - capture
    
    					+ '</p>'//closing p tag
    					+ '<p>'//opening p tag
    					+ '<strong>Depth</strong>: '//descriptive text w/ strong tags
    
    					+ '([\\d\\.]+) km '//depth in kilometers - capture
    					+ '\\(([\\d\\.]+) mi\\)'//depth in miles (enclosed in parenthesis) - capture
    					+ '</p>',//closing p tag
    
    				cdata = null,
    				summary = null;
    
    			for each(entry in xml.entry){
    
    				magnitude =
    					entry.title//eg M 3.0, Puerto Rico region
    					.split(' ')[1]//eg --> 3.0,
    
    					.replace(',', '');//eg --> 3.0
    				entry.appendChild( <magnitude>{magnitude}</magnitude> );
    
    				cdata = new RegExp(re).exec(entry.summary);
    
    				summary = <summary type="xml"><!-- differentiate this summary obj from native summary obj w/ type 'html' -->
    
    					<img alt={cdata[2]} align={cdata[3]} hspace={cdata[4]} src={cdata[1]} />
    
    					<date>
    						<utc>{cdata[5]}</utc>
    						<local>{cdata[6]}</local>
    
    					</date>
    					<depth>
    						<km>{cdata[7]}</km>
    
    						<mi>{cdata[8]}</mi>
    					</depth>
    				</summary>;
    
    				entry.appendChild(summary);
    
    				entries.appendChild(entry);
    
    			}
    			response.object = entries;
    		]]></execute>
    
        </select>
      </bindings>
    </table>
    

    Now we’re talking!  Check it out in the console.

    Here are a couple implementation-level notes:
    1) this code will generate an additional summary object, i.e., it doesn’t replace the pre-existing one.  If the latter behavior is preferred, replace
    entry.appendChild(summary);
    with
    entry.summary = summary;

    2) the regular expression syntax used above is just the standard syntax for JavaScript, but be aware that the html is rendered using html entities, so the content I’m parsing using the regular expression looks different in the YQL console.  For example, add this as the first line inside the for loop:
    y.log(entry.summary); 
    This will print the cdata-wrapped html to the diagnostics section of the YQL output.  Instead of “<img src="http://earthquake…-65.jpg" alt="19.192&#176;N”, as we see in the raw xml feed, it looks like “&lt;img src="http://earthquake…-65.jpg" alt="19.192&amp;#176;N …”.  On the server, it actually is the raw html, so the regular expression must be constructed accordingly.
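A small sketch of the difference, using only the alt-attribute part of the expression above. Both sample strings are made up for illustration; they are not taken from the feed.

```javascript
// Pattern fragment copied from the alt attribute part of the table's expression.
var altPattern = new RegExp('alt="([\\d\\.]+&#176;(?:N|S) [\\d\\.]+&#176;(?:W|E))"');

// Made-up sample of what the server-side code sees (raw html) ...
var raw = '<img alt="19.192&#176;N 64.977&#176;W" />';
// ... and of how the same text appears once entity-escaped for display.
var displayed = '&lt;img alt="19.192&amp;#176;N 64.977&amp;#176;W" /&gt;';

function extractCoords(html) {
  var match = altPattern.exec(html);
  return match ? match[1] : null; // exec() returns null when nothing matches
}
```

Here extractCoords(raw) finds the coordinates while extractCoords(displayed) returns null, which is why the expression has to be written against the raw html. The table code indexes the match result directly, so a similar null check there would avoid a TypeError if the summary format ever changes.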

    To conclude, this post presents a couple ways to restructure a USGS data feed using YQL Execute so it’s more convenient to consume.  I’ve also given a couple tips for working with E4X and YQL.  Because YQL does the fetching, processing, and caching for me, my data delivery is speedy and my client-side code is light.