Category Archives: tutorial

The powerful YQL feednormalizer table

Filed under feature, tutorial

YQL’s feednormalizer table is used to convert an input feed in one format into an output feed of another format.

Optionally, prexsl transformation can be applied to the input feed before format conversion and postxsl transformation can be applied to the output feed after format conversion. Prexsl transformation can be used to correct badly formed feeds; while as, postxsl transformation can be used to rearrange, filter, or format final output.

Input feeds can be of any character encoding; however, the output is always transcoded into UTF-8. Illegal characters found in the feed during transcoding are removed.

Syntax
The input feed can either be specified as a url or as an xml string. When the desired output format of the feed is specified, the input feed will be converted into that format. Optionally, one can supply xsl transforms to pre-process the input feed or post-process the output feed.

SELECT * FROM feednormalizer
 WHERE (url= | xml=)
  [AND output=('rss_0.91N'|'rss_0.93'|'rss_0.92'|'rss_1.0'|'rss_0.94'|'rss_2.0'|'rss_0.91U\rss_0.9'|'atom_1.0'|'atom_0.3')]
  [AND prexslurl=]
  [AND postxslurl=]
  [AND timeout=]

or

SELECT * FROM feednormalizer
 WHERE (url= | xml=)
  [AND output=('rss_0.91N'|'rss_0.93'|'rss_0.92'|'rss_1.0'|'rss_0.94'|'rss_2.0'|'rss_0.91U\rss_0.9'|'atom_1.0'|'atom_0.3')]
  [AND prexsl=]
  [AND postxsl=]
  [AND timeout=]

If a timeout value (msec) is specified, feednormalizer table will expect the url to respond within that timeout; otherwise, an error message will be returned.

Example 1: Querying a valid input feed

SELECT * FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'

Try above example in YQL Console. Try above example as REST request.


Example 2: Simple conversion of valid input feed

SELECT * FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'
   AND output='atom_1.0'

Try above example in YQL Console. Try above example as REST request.


Example 3: Converting invalid input feed produces error

Invalid feeds (such as http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml) produces errors when being transformed. For example, the statement:

SELECT * FROM feednormalizer
 WHERE url='http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml'
   AND output='rss_2.0'

produces the following error during execution:

Could not parse feed data. Invalid rss_2.0 feed, missing image title

Try above example in YQL Console. Try above example as REST request.


Example 4: Successfully converting invalid feeds using XSL transform.

In Example 3, we saw that the invalid feed could not be converted because it has missing image title. We can get around this by transforming the invalid input feed into a valid feed by removing the <image> tag. This can be done through the use of an XSL transform as shown below:

SELECT * FROM feednormalizer
 WHERE url='http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml'
   AND output='rss_2.0'
   AND prexsl='<?xml version="1.0" encoding="ISO-8859-1"?>
               <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                <xsl:output omit-xml-declaration="yes"/>
                   <xsl:template match="node()|@*">
                     <xsl:copy>
                        <xsl:apply-templates select="node()|@*"/>
                     </xsl:copy>
                   </xsl:template>
                   <xsl:template match="image"/>
               </xsl:stylesheet>'

Try above example in YQL Console. Try above example as REST request.


Example 5: HTML generation

A postxsl transformation can be applied to Example 4 to convert the corrected feed into HTML:

SELECT * FROM feednormalizer 
 WHERE url='http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml'
   AND output='rss_2.0'
   AND prexsl='<?xml version="1.0" encoding="ISO-8859-1"?>
               <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                <xsl:output omit-xml-declaration="yes"/>
                   <xsl:template match="node()|@*">
                     <xsl:copy>
                        <xsl:apply-templates select="node()|@*"/>
                     </xsl:copy>
                   </xsl:template>
                   <xsl:template match="image"/>
               </xsl:stylesheet>'
   AND postxsl='<?xml version="1.0" encoding="ISO-8859-1"?>
                <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                  <xsl:template match="/">
                    <html>
                      <body>
                        <h2>My News</h2>
                        <table border="1">
                          <tr bgcolor="#9acd32">
                            <th>Title</th>
                            <th>Description</th>
                          </tr>
                          <xsl:for-each select="rss/channel/item">
                            <tr>
                              <td><a><xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute><xsl:value-of select="title"/></a></td>
                              <td><xsl:value-of select="description"/></td>
                            </tr>
                          </xsl:for-each>
                        </table>
                      </body>
                    </html>
                  </xsl:template>
                </xsl:stylesheet>'

Try above example in YQL Console. Try above example as REST request.


Example 6: HTML generation, 2nd example

For valid feeds that are already in the desired format, postxsl can be applied directly for HTML conversion:

SELECT * FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'
   AND postxsl='<?xml version="1.0" encoding="ISO-8859-1"?>
                <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                  <xsl:template match="/">
                    <html>
                      <body>
                        <h2>My News</h2>
                        <table border="1">
                          <tr bgcolor="#9acd32">
                            <th>Title</th>
                            <th>Description</th>
                          </tr>
                          <xsl:for-each select="rss/channel/item">
                            <tr>
                              <td><a><xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute><xsl:value-of select="title"/></a></td>
                              <td><xsl:value-of select="description"/></td>
                            </tr>
                          </xsl:for-each>
                        </table>
                      </body>
                    </html>
                  </xsl:template>
                </xsl:stylesheet>'

Try above example in YQL Console. Try above example as REST request.


Example 7: Transcoding input feeds into UTF-8

Input feed documents may be in any encoding. The output is always in UTF-8. During transcoding illegal characters encountered in the input feed are removed. The following diagnostics message will appear in the output, when illegal characters are removed:

removed 14 badly encoded characters from feed.

SELECT * FROM feednormalizer
 WHERE url='http://cn.wsj.com/big5/rssbch.xml'
   AND output='atom_1.0'

Try above example in YQL Console. Try above example as REST request.


Example 8: Selecting and filtering input feeds using YQL

SELECT entry.title FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'
   AND output='atom_1.0' | sort('entry.title')

Try above example in YQL Console. Try above example as REST request.

YQL + YUI: Building End-to-End Applications

Filed under tutorial

The third team talk was presented by Paul Donnelly and Nagesh Susarla. They go over how to start your query out in the YQL console, access YQL data via the various endpoints, and go through YQL’s various authentication layers.

Code examples here. And slides here.

Here is the link to the original yuiblog post and below an embedded video.

Building YQL Open Data Tables with YQL Execute

Filed under tutorial

The second YQL talk at YUIConf was presented by Nagesh Susarla. Nagesh goes over how to use YQL execute in the open data tables. Here is the link to the original yuiblog post and below an embedded video.

YQL: An Introduction

Filed under tutorial

The YQL team was honored to be asked to participate at YUIConf 2010. Several team members had the chance to present various topics on YQL.

The first talk was presented by Mirek Grymuza and Josh Gordineer. Here is the link to the original yuiblog post and below an embedded video.

We’ll post the remaining YQL videos from YUIConf throughout the week.

Avoiding rate limits and getting banned in YQL and Pipes: Caching is your friend

Filed under tutorial

Web caches are great pieces of software: they lower the load on servers; and serve content faster to clients. YQL and Pipes love caches for this very reason, and we reward clients making good use of our reverse proxy caches by not subjecting those who get cached content to rate limits. That’s right – if we can give you your content from cache you can call it as often as you like, no need to cache locally just to save on calls.

Unfortunately we’ve seen a lot of requests to us that could easily take advantage of our caches but don’t. Here’s a list of some DOs and DONTs for calling YQL and Pipes, and let’s use the example of fetching the weather for a zipcode:

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210

DON’T cachebust
“Cachebusting” means changing your request just a little so that the cache can’t give you a copy of the response it’s seen before, often using a random value or timestamp on the end of the query parameters, for example:

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210&rnd=_12312

Beware of client-side JS libraries “helping” you
Often developers aren’t even aware they are doing this, but various web client libraries, particular client-side Javascript ones, seem to think cachebusting by default is a sensible thing to do. Its not. It just makes our servers work harder, the downstream sources of data work harder, and the response come back slower to the client. It also stops your own browser cache from helping your app behave faster.

For example, jQuery provides an automatic JSONP callback library that creates a randomly named global function name for each callback. This causes it to cachebust on every call as the function name changes all the time. By taking the time to add a few extra lines of code you can benefit from our caches:

$.ajax({
   url: 'http://query.yahooapis.com/v1/public/yql?q=show%20tables&format=json',
   dataType: 'jsonp',
   jsonp: 'callback',
   jsonpCallback: 'cbfunc'
});
function cbfunc(data){
   $.each(data.query.results.table, function(i,item){
   $('#tables').append('<p>'+item+'</p>');
});
}

By defining your own global function in your script you can be sure that it won’t change from request to request, and you can leverage our caches.

If you must cachebust, use a “window” of time
Sometimes the content gets cached for longer than you want, and sometimes your clients are IE6 web browsers which don’t respect cache headers correctly [shudder]. The best solution to this is to append a parameter that changes gradually at the same rate as the content you are requesting. For example, back to our example of fetching the weather forecast for a zipcode. The forecast will probably change throughout the day, but not every single second, so you’re probably ok fetching that every hour and therefore can create a cachebusting header that uses a timestamp that only changes once per hour, for example:

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210&rnd=_2010031310

This uses a YYYYMMDDHH (year/month/day/hour) format for each request to fetch the weather. All requests arriving over the course of an hour will get a cache hit and you’ll use 1 unit of rate limit (per zipcode).

DO put content into cache
On the flip side of caching busting, sometimes content isn’t cached as long as it should be, or you want it do be. Perhaps the content provider set it wrongly or your usage of the content doesn’t need it updating so frequently. You can take control of this in YQL in a couple of ways.

First, you can choose to explicitly set the cache “maxage” header in an open data table to whatever you want. Lets say you want the table data to be cached for 5 minutes, then in the <execute> statement you’d say response.maxAge=300; (its specified in seconds).

Secondly you can just ask YQL to cache the response to a statement for longer – just append the _maxage query parameter to your call and the result will be stored in cache for that length of time (but not shorter than it would have been originally):

http://query.yahooapis.com/v1/public/yql?q=select * from weather.forecast where location=90210&_maxage=3600

This is really useful when you’re using output from a table that’s not caching enough or an XML source without having to do any open table work.

By making a few small changes to the way your client calls YQL and Pipes you can gain almost infinite rate limit in many cases and provide better performance to your users.

Jonathan Trevor

Adding value to a data feed using YQL Execute

Filed under feature, tutorial

I want USGS earthquake data.  More specifically, I’m interested in recent, substantial quakes.  Fortunately, data.gov makes this data easy to find.  After searching through the USGS raw data catalog for the text “earthquake”, I choose the Worldwide M2.5+ Earthquakes, Past 7 Days feed, and pull it into YQL for parsing.  It’s almost perfect, but I want easy access to each quake’s magnitude, and the magnitude is buried in the “title” element.  No worries.  I’ll use YQL Execute to split it out and give it its own element in the feed’s structure.  I can then visualize this data using something like Jon LeBlanc’s js-yql-display project on github.

Here are a few reasons why YQL is perfect for this task:
1) I can take advantage of Yahoo!’s web-serving infrastructure to fetch, process, and cache the feed, reducing my server’s exposure and bandwidth costs.  My table is also cached, further reducing bandwidth usage.

2) Because YQL Execute employs standard E4X, I am using and adding to my JavaScript skill set, instead of spending time learning a new language

3) E4X was built specifically for XML manipulation so it has a convenient syntax for this job

4) By using YQL to do the heavy lifting, I can minimize the code I send to the browser and keep it focused on the display logic.

Ok. Ok. Here’s the code:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
        <description>Extracts magnitude from item title in atom feed and adds it as an element to the item.  We can then filter by magnitude using yql's built-in operators</description>
	<sampleQuery>select entry from usgs.earthquakes</sampleQuery>
	<sampleQuery>select entry from usgs.earthquakes where entry.magnitude >= 6.0</sampleQuery>

  </meta>
  <bindings>
    <select itemPath="" produces="XML">
		<urls>

			<url>http://earthquake.usgs.gov/eqcenter/catalogs/7day-M2.5.xml?11d</url>
		</urls>
		<execute><![CDATA[

			default xml namespace = "http://www.w3.org/2005/Atom";
			var xml = request.get().response,//call the url defined above

			 	entries = <entries></entries>,//prep the output object 
				entry = null,//individual entry in xml obj. used in loop below.
				magnitude = null;//magnitude of quake.  used in loop below

			for each(entry in xml.entry){
				magnitude = 
					entry.title//eg M 3.0, Puerto Rico region

					.split(' ')[1]//eg --> 3.0,
					.replace(',', '');//eg --> 3.0

				entry.appendChild( <magnitude>{magnitude}</magnitude> );
				entries.appendChild(entry);

			}
			response.object = entries;
		]]></execute>

    </select>
  </bindings>
</table>

Now, we can put this table on a server, load it up in YQL, and easily access the magnitude using YQL’s parser.

For those unfamiliar with E4X, it’s worth noting the namespace declaration (default xml namespace = "http://www.w3.org/2005/Atom";).  It tells YQL’s JavaScript engine what kind of structure to expect.  We wouldn’t be able to access the feed’s elements without it.  Find the namespaces associated with your data by looking in the xml wrapper:   The Atom namespace governs my feed’s structure as a whole, which is why it was convenient to declare it as a default.  For access to specific elements using another namespace, e.g. georss data, it’s be easier to define the namespace locally like this:
var ns = Namespace("http://www.georss.org/georss");
and then use it like this:
var latitude = xml.ns::Result.ns::Latitude;

Since we’ve gone to the trouble of defining a YQL table, we may as well add parsing for the summary element, which also contains some useful information in an inconvenient format.  Because this content is a bit more extensive, while still being somewhat predictable, a regular expression works well.  Here’s the code:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>

	<description>Extracts magnitude from item title in atom feed and adds it as an element to the item.  We can then filter by magnitude using yql's built-in operators.  Additionally, it extracts summary cdata, parses it, wraps the parsed data in its own element, and adds this element to the xml output.  </description>
	<sampleQuery>select entry from usgs.earthquakes</sampleQuery>

	<sampleQuery>select entry.title, entry.updated, entry.link from usgs.earthquakes</sampleQuery>

	<sampleQuery>select entry.summary from usgs.earthquakes where entry.summary.type = "xml" and entry.summary.depth.km > 99</sampleQuery>

  </meta>
  <bindings>
    <select itemPath="" produces="XML">
		<urls>

			<url>http://earthquake.usgs.gov/eqcenter/catalogs/7day-M2.5.xml?11d</url>
		</urls>
		<execute><![CDATA[

			default xml namespace = "http://www.w3.org/2005/Atom";
			
			var xml = request.get().response,//call the url defined above

			 	entries = <entries></entries>,//prep the output object 
				entry = null,//individual entry in xml obj. used in loop below.
				magnitude = null,//magnitude of quake.  used in loop below

				re = '<img '//img tag opening bracket (note: trailing spaces here and below)
					+ 'src="(http://earthquake\\.usgs\\.gov/images/globes/[\\d_-]+\\.jpg)" '//img src - capture
					+ 'alt="([\\d\\.]+&#176;(?:N|S) [\\d\\.]+&#176;(?:W|E))" '//img alt - ignore (we already have coords from georss)

					+ 'align="(left|right)" '//img align - ignore
					+ 'hspace="(\\d+)" '//img hspace - ignore
					+ '/>'//img tag closing bracket
					+ '<p>'//opening p tag

					+ '(\\w+, \\w+\\s+\\d+, \\d+ [\\d:]+) UTC'//utc date - capture (note: variable amt of whitespace btwn month and day)
					+ '<br>'//br tag
					+ '(\\w+, \\w+\\s+\\d+, \\d+ [\\d:]+ (?:AM|PM)) at epicenter'//local date at epicenter - capture

					+ '</p>'//closing p tag
					+ '<p>'//opening p tag
					+ '<strong>Depth</strong>: '//descriptive text w/ strong tags

					+ '([\\d\\.]+) km '//depth in kilometers - capture
					+ '\\(([\\d\\.]+) mi\\)'//depth in miles (enclosed in parenthesis) - capture
					+ '</p>',//closing p tag

				cdata = null,
				summary = null;
			
			for each(entry in xml.entry){

				magnitude = 
					entry.title//eg M 3.0, Puerto Rico region
					.split(' ')[1]//eg --> 3.0,

					.replace(',', '');//eg --> 3.0
				entry.appendChild( <magnitude>{magnitude}</magnitude> );

				
				cdata = new RegExp(re).exec(entry.summary);

				summary = <summary type="xml"><!-- differentiate this summary obj from native summary obj w/ type 'html' -->

					<img alt={cdata[2]} align={cdata[3]} hspace={cdata[4]} src={cdata[1]} />

					<date>
						<utc>{cdata[5]}</utc>
						<local>{cdata[6]}</local>

					</date>
					<depth>
						<km>{cdata[7]}</km>

						<mi>{cdata[8]}</mi>
					</depth>
				</summary>;

				entry.appendChild(summary);
				
				entries.appendChild(entry);

				
			}
			response.object = entries;
		]]></execute>

    </select>
  </bindings>
</table>

Now we’re talking!  Check it out in the console.

Here are a couple implementation-level notes:
1) this code will generate an additional summary object, i.e., it doesn’t replace the pre-existing one.  If the later behavior is preferred, replace
entry.appendChild(summary);
with
entry.summary = summary;

2) the regular expression syntax used above is just the standard syntax for JavaScript, but be aware that the html is rendered using html entities, so the content I’m parsing using the regular expression looks different in the YQL console.  For example, add this as the first line inside the for loop:
y.log(entry.summary); 
This will print the cdata-wrapped html to the diagnostics section of the YQL output.  Instead of “<img src=”http://earthquake…-65.jpg” alt=”19.192&#176;N “, as we see in the raw xml feed, it looks like “&lt;img src=”http://earthquake…-65.jpg” alt=”19.192&amp;#176;N …”, On the server, it actually is the raw html, so the regular expression must be constructed accordingly.

To conclude, this post presents a couple ways to restructure a USGS data feed using YQL Execute so it’s more convenient to consume.  I’ve also given a couple tips for working with E4X and YQL.  Because YQL does the fetching, processing, and caching for me, my data delivery is speedy and my client-side code is light.

Getting stock information with YQL and open data tables

Filed under feature, tutorial

One question that the YQL and Pipes teams get asked is “how can I get stock quotes? There isn’t an API for it on developer.yahoo.com”. Interestingly, while there isn’t a more traditional web service API, Yahoo finance does provide a very nice way to get a lot of well structured information on a given company. For example, here’s the Yahoo finance page on YHOO:

You’ll notice that there’s a little “download data” link to the right. If you click the link, it generates a CSV file dynamically with almost all the pricing information on the page. The problem is how to understand what fields the “f” parameter actually produce in the CSV file. Luckily someone has already done that hard work. So now we have a link with a bunch of configurable parameters to get lots of lovely stock information for multiple stock symbols. It is an API of sorts, but that CSV file is still a hard to work with, somewhat cryptic to use, and the data in it is a bit messy.

Enter YQL open data tables. If you don’t want to know “how” this works, and just want a really cool open data table and API to give you stock quotes, then give this a go in the YQL console. Here’s a second example of pulling out only a few fields and sorting the quotes by who has the biggest gain.

You’ll see the query:

select * from yahoo.finance.quotes where symbol in ("YHOO","AAPL","GOOG","MSFT")

And the results (trimmed for this post – there’s a lot of data in the results):

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="4" yahoo:created="2009-06-01T10:40:52Z" yahoo:lang="en-US" yahoo:updated="2009-06-01T10:40:52Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+*+from+yahoo.finance.quotes+where+symbol+in+%28%22YHOO%22%2C%22AAPL%22%2C%22GOOG%22%2C%22MSFT%22%29">
  <diagnostics>
    <publiclyCallable>true</publiclyCallable>
    <url execution-time="2"><![CDATA[http://datatables.org/alltables.env]]></url>
    <url execution-time="55"><![CDATA[http://www.datatables.org/yahoo/finance/yahoo.finance.quotes.xml]]></url>
    <url execution-time="5"><![CDATA[http://download.finance.yahoo.com/d/quotes.csv?s=YHOO,AAPL,GOOG,MSFT&f=aa2bb2b3b4cc1c3c6c8dd1d2ee1e7e8e9ghjkg1g3g4g5g6ii5j1j3j4j5j6k1k2k4k5ll1l2l3mm2m3m4m5m6m7m8nn4opp1p2p5p6qrr1r2r5r6r7ss1s7t1t7t8vv1v7ww1w4xy]]></url>
    <url execution-time="13"><![CDATA[select * from csv where url=@url and columns='Ask,AverageDailyVolume,Bid,AskRealtime,BidRealtime,BookValue,Change&PercentChange,Change,Commission,ChangeRealtime,AfterHoursChangeRealtime,DividendShare,LastTradeDate,TradeDate,EarningsShare,ErrorIndicationreturnedforsymbolchangedinvalid,EPSEstimateCurrentYear,EPSEstimateNextYear,EPSEstimateNextQuarter,DaysLow,DaysHigh,YearLow,YearHigh,HoldingsGainPercent,AnnualizedGain,HoldingsGain,HoldingsGainPercentRealtime,HoldingsGainRealtime,MoreInfo,OrderBookRealtime,MarketCapitalization,MarketCapRealtime,EBITDA,ChangeFromYearLow,PercentChangeFromYearLow,LastTradeRealtimeWithTime,ChangePercentRealtime,ChangeFromYearHigh,PercebtChangeFromYearHigh,LastTradeWithTime,LastTradePriceOnly,HighLimit,LowLimit,DaysRange,DaysRangeRealtime,FiftydayMovingAverage,TwoHundreddayMovingAverage,ChangeFromTwoHundreddayMovingAverage,PercentChangeFromTwoHundreddayMovingAverage,ChangeFromFiftydayMovingAverage,PercentChangeFromFiftydayMovingAverage,Name,Notes,Open,PreviousClose,PricePaid,ChangeinPercent,PriceSales,PriceBook,ExDividendDate,PERatio,DividendPayDate,PERatioRealtime,PEGRatio,PriceEPSEstimateCurrentYear,PriceEPSEstimateNextYear,Symbol,SharesOwned,ShortRatio,LastTradeTime,TickerTrend,OneyrTargetPrice,Volume,HoldingsValue,HoldingsValueRealtime,YearRange,DaysValueChange,DaysValueChangeRealtime,StockExchange,DividendYield']]></url>
    <javascript instructions-used="279387"/>
    <user-time>313</user-time>
    <service-time>75</service-time>
    <build-version>1678</build-version>
  </diagnostics>
  <results>
    <quote symbol="YHOO">
      <Ask>16.60</Ask>
      <AverageDailyVolume>22083900</AverageDailyVolume>
      <Bid>16.55</Bid>
      <AskRealtime>16.60</AskRealtime>
      <BidRealtime>16.55</BidRealtime>
      <BookValue>8.30</BookValue>
      <Change_PercentChange>+0.74 - +4.67%</Change_PercentChange>
      <Change>+0.74</Change>
      <Commission/>
      <ChangeRealtime>+0.74</ChangeRealtime>
      <AfterHoursChangeRealtime>N/A - N/A</AfterHoursChangeRealtime>
      <DividendShare>0.00</DividendShare>
      <LastTradeDate>6/1/2009</LastTradeDate>
      <TradeDate/>
      <EarningsShare>0.011</EarningsShare>
      <ErrorIndicationreturnedforsymbolchangedinvalid>N/A</ErrorIndicationreturnedforsymbolchangedinvalid>
      <EPSEstimateCurrentYear>0.36</EPSEstimateCurrentYear>
      <EPSEstimateNextYear>0.42</EPSEstimateNextYear>
      <EPSEstimateNextQuarter>0.08</EPSEstimateNextQuarter>
      <DaysLow>16.13</DaysLow>
      <DaysHigh>16.65</DaysHigh>
      <YearLow>8.94</YearLow>
      <YearHigh>27.10</YearHigh>
...
      <MarketCapitalization>23.140B</MarketCapitalization>
      <MarketCapRealtime/>
      <EBITDA>1.278B</EBITDA>
      <ChangeFromYearLow>+7.64</ChangeFromYearLow>
      <PercentChangeFromYearLow>+85.46%</PercentChangeFromYearLow>
      <LastTradeRealtimeWithTime>N/A - &lt;b&gt;16.58&lt;/b&gt;</LastTradeRealtimeWithTime>
      <ChangePercentRealtime>N/A - +4.67%</ChangePercentRealtime>
      <ChangeFromYearHigh>-10.52</ChangeFromYearHigh>
      <PercebtChangeFromYearHigh>-38.82%</PercebtChangeFromYearHigh>
      <LastTradeWithTime>4:00pm - &lt;b&gt;16.58&lt;/b&gt;</LastTradeWithTime>
      <LastTradePriceOnly>16.58</LastTradePriceOnly>
      <HighLimit/>
      <LowLimit/>
      <DaysRange>16.13 - 16.65</DaysRange>
      <DaysRangeRealtime>N/A - N/A</DaysRangeRealtime>
      <FiftydayMovingAverage>14.6126</FiftydayMovingAverage>
      <TwoHundreddayMovingAverage>12.9096</TwoHundreddayMovingAverage>
      <ChangeFromTwoHundreddayMovingAverage>+3.6704</ChangeFromTwoHundreddayMovingAverage>
      <PercentChangeFromTwoHundreddayMovingAverage>+28.43%</PercentChangeFromTwoHundreddayMovingAverage>
      <ChangeFromFiftydayMovingAverage>+1.9674</ChangeFromFiftydayMovingAverage>
      <PercentChangeFromFiftydayMovingAverage>+13.46%</PercentChangeFromFiftydayMovingAverage>
      <Name>Yahoo! Inc.</Name>
      <Notes>-</Notes>
      <Open>16.18</Open>
      <PreviousClose>15.84</PreviousClose>
...
      <Symbol>YHOO</Symbol>
      <SharesOwned/>
      <ShortRatio>2.10</ShortRatio>
      <LastTradeTime>4:00pm</LastTradeTime>
      <TickerTrend>&amp;nbsp;======&amp;nbsp;</TickerTrend>
      <OneyrTargetPrice>15.27</OneyrTargetPrice>
      <Volume>27926064</Volume>
      <HoldingsValue/>
      <HoldingsValueRealtime/>
      <YearRange>8.94 - 27.10</YearRange>
      <DaysValueChange>- - +4.67%</DaysValueChange>
      <DaysValueChangeRealtime>N/A - N/A</DaysValueChangeRealtime>
      <StockExchange>NasdaqNM</StockExchange>
...
    </quote>
    <quote symbol="AAPL">
...
  </results>
</query>

So how did we go from that ugly looking CSV file to the lovely XML? The answer is the yahoo.finance.quotes open data table:

<?xml version="1.0" encoding="UTF-8" ?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <sampleQuery>
      select * from {table} where symbol in ("YHOO","AAPL","GOOG","MSFT")
    </sampleQuery>
  </meta>
  <bindings>
    <select itemPath="quotes.quote" produces="XML">
      <urls><url>http://download.finance.yahoo.com/d/quotes.csv?s={-listjoin|,|symbol}</url></urls>
      <inputs>
        <key id='f' type='xs:string' const='true' default='aa2bb2b3b4cc1c3c6c8dd1d2ee1e7e8e9ghjkg1g3g4g5g6ii5j1j3j4j5j6k1k2k4k5ll1l2l3mm2m3m4m5m6m7m8nn4opp1p2p5p6qrr1r2r5r6r7ss1s7t1t7t8vv1v7ww1w4xy' paramType='query' />
        <key id='symbol' type='xs:string' batchable='true' maxBatchItems='20' paramType='path' required='true'/>
      </inputs>
      <execute><![CDATA[
        var results = y.query("select * from csv where url=@url and columns='Ask,AverageDailyVolume,Bid,AskRealtime,BidRealtime,BookValue,Change&PercentChange,Change,Commission,ChangeRealtime,AfterHoursChangeRealtime,DividendShare,LastTradeDate,TradeDate,EarningsShare,ErrorIndicationreturnedforsymbolchangedinvalid,EPSEstimateCurrentYear,EPSEstimateNextYear,EPSEstimateNextQuarter,DaysLow,DaysHigh,YearLow,YearHigh,HoldingsGainPercent,AnnualizedGain,HoldingsGain,HoldingsGainPercentRealtime,HoldingsGainRealtime,MoreInfo,OrderBookRealtime,MarketCapitalization,MarketCapRealtime,EBITDA,ChangeFromYearLow,PercentChangeFromYearLow,LastTradeRealtimeWithTime,ChangePercentRealtime,ChangeFromYearHigh,PercebtChangeFromYearHigh,LastTradeWithTime,LastTradePriceOnly,HighLimit,LowLimit,DaysRange,DaysRangeRealtime,FiftydayMovingAverage,TwoHundreddayMovingAverage,ChangeFromTwoHundreddayMovingAverage,PercentChangeFromTwoHundreddayMovingAverage,ChangeFromFiftydayMovingAverage,PercentChangeFromFiftydayMovingAverage,Name,Notes,Open,PreviousClose,PricePaid,ChangeinPercent,PriceSales,PriceBook,ExDividendDate,PERatio,DividendPayDate,PERatioRealtime,PEGRatio,PriceEPSEstimateCurrentYear,PriceEPSEstimateNextYear,Symbol,SharesOwned,ShortRatio,LastTradeTime,TickerTrend,OneyrTargetPrice,Volume,HoldingsValue,HoldingsValueRealtime,YearRange,DaysValueChange,DaysValueChangeRealtime,StockExchange,DividendYield'",{url:request.url});
        var quotes = <quotes/>;
        var rows=results.results.row;
        for each (var row in rows) {
          for each (var item in row.*) {
            var elname = item.localName();
            var txt = item.text().toString();
            if (txt=="N/A") txt=""; else if (txt=="-") txt=""; else {
              txt = txt.replace(/"/g, '');
            }
            row[elname]=txt;
          }
            quotes.quote += <quote symbol={row.Symbol.text().toString()}>{row.*}</quote>;
        }
        response.object = quotes;
           ]]></execute>
    </select>
  </bindings>
</table>

Let’s step through the main parts of the open data table definition. First the URL that YQL builds to get the data looks like this:

<url>http://download.finance.yahoo.com/d/quotes.csv?s={-listjoin|,|symbol}</url>

The listjoin is a URI template instruction that creates a “comma” separated list of values for each item in the “symbol” value. Symbol itself looks like this in the inputs section:

<key id='f' type='xs:string' const='true' default='aa2bb2b3b4cc1c3c6c8dd1d2ee1e7e8e9ghjkg1g3g4g5g6ii5j1j3j4j5j6k1k2k4k5ll1l2l3mm2m3m4m5m6m7m8nn4opp1p2p5p6qrr1r2r5r6r7ss1s7t1t7t8vv1v7ww1w4xy' paramType='query' />
<key id='symbol' type='xs:string' batchable='true' maxBatchItems='20' paramType='path' required='true'/>

Note the batchable attribute of symbol. This tells YQL that this parameter can accept a set of values that can be sent to the remote data provider in a single request. In this case, the finance CSV API can take a comma separated list of stock symbols and return all that information for each entry.

That cryptic looking f input key is a constant – we’re going to get all the fields we can every time, and this value holds all the short field names that the API understands.

The YQL execute section actually dispatches the request and processes the return data.

var results = y.query("select * from csv where url=@url and columns='Ask,AverageDailyVolume,Bid,AskRealtime,BidRealtime,BookValue,Change&PercentChange,Change,Commission,ChangeRealtime,AfterHoursChangeRealtime,DividendShare,LastTradeDate,TradeDate,EarningsShare,ErrorIndicationreturnedforsymbolchangedinvalid,EPSEstimateCurrentYear,EPSEstimateNextYear,EPSEstimateNextQuarter,DaysLow,DaysHigh,YearLow,YearHigh,HoldingsGainPercent,AnnualizedGain,HoldingsGain,HoldingsGainPercentRealtime,HoldingsGainRealtime,MoreInfo,OrderBookRealtime,MarketCapitalization,MarketCapRealtime,EBITDA,ChangeFromYearLow,PercentChangeFromYearLow,LastTradeRealtimeWithTime,ChangePercentRealtime,ChangeFromYearHigh,PercebtChangeFromYearHigh,LastTradeWithTime,LastTradePriceOnly,HighLimit,LowLimit,DaysRange,DaysRangeRealtime,FiftydayMovingAverage,TwoHundreddayMovingAverage,ChangeFromTwoHundreddayMovingAverage,PercentChangeFromTwoHundreddayMovingAverage,ChangeFromFiftydayMovingAverage,PercentChangeFromFiftydayMovingAverage,Name,Notes,Open,PreviousClose,PricePaid,ChangeinPercent,PriceSales,PriceBook,ExDividendDate,PERatio,DividendPayDate,PERatioRealtime,PEGRatio,PriceEPSEstimateCurrentYear,PriceEPSEstimateNextYear,Symbol,SharesOwned,ShortRatio,LastTradeTime,TickerTrend,OneyrTargetPrice,Volume,HoldingsValue,HoldingsValueRealtime,YearRange,DaysValueChange,DaysValueChangeRealtime,StockExchange,DividendYield'",{url:request.url});

This runs another YQL select statement when the table gets invoked. It’s fetching a CSV data source from a URL and setting up the column names for each “row” that comes back. The URL that YQL would originally have fetched for this data is already created in the request object, containing the list of symbols expanded into the s parameter, so we just used the @ substitution syntax to add that into the YQL statement.

var quotes = <quotes/>;
var rows=results.results.row;

The next few lines create a new XML object called quotes which will hold our final XML document and gets an XML list to each of the rows that came back from the CSV query.

for each (var row in rows) {

Now we’ll loop over each of those rows (one row per stock symbol).

   for each (var item in row.*) {
    var elname = item.localName();
    var txt = item.text().toString();
    if (txt=="N/A") txt=""; else if (txt=="-") txt=""; else {
      txt = txt.replace(/"/g, '');
    }
    row[elname]=txt;
  }

For each element in that row (note the E4X syntax to get all the elements row.*) we’re going to clean up the XML somewhat. We’ll get rid of “N/A” and “-” text elements and use empty elements instead, as well as remove any quotes in the text.

   quotes.quote += <quote symbol={row.Symbol.text().toString()}>{row.*}</quote>;
}

Finally in the main loop we’ll append a new quote element to our root XML quotes element that contains the reformatted XML elements, and an attribute called symbol

response.object = quotes;

Last of all we set the response object to the document we’ve just created in the loop.

We’ve already added the table to the github open data table repository, so you can try this table out in the YQL console just by including the community tables. And now its in YQL, you can sort, filter, project and join on any of this data that comes back! For example, you can pull out only a few fields and sort the quotes by who has the biggest gain.