Category Archives: feature

Recent Enhancement to the HTML table

Filed under feature, news

The HTML table has recently been enhanced to support HTML version 5. The HTML table in the backend uses a parser which autocorrects malformed tags. To support HTML5 we are using a different parser then the one used previously. Because of this change, the output might be slightly different than before. To ensure backward compatibility, both the parsers are supported, with the older one being the default. The new parser can be used by just appending compat=”html5” to the query.

For Example: select * from html where url=”http://finance.yahoo.com/q?s=yhoo” and compat=”html5″

Please start using this feature and give us your feedback! Eventually the new parser which supports ‘html5’ will be made default, but that will follow an announcement. Even after the new parser is made default, the old one can still be used by having compat equals to ‘html4’.

YQL Editor

Filed under feature

(updated)

The YQL Editor is a simple and easy way to create your table in Yahoo’s cloud. The editor makes use of the yql.storage table to store your table with Yahoo’s cloud instead of hosting it on your own server.

Simply access it from the YQL console on the upper right hand column under “My Tables”.

mytablesconsole

Some quick notes: You must be logged in to view, create and edit your tables. You can not view other people’s tables. If you previously created and stored tables using yql.storage, they will not show up in “My Tables”. Storing tables directly using the yql.storage table will also not be shown in “My Tables”. “My Tables” makes it easy to track tables you created via the YQL Editor while logged in.

Click the “new” link to launch the YQL Editor. This will open up the editor in a separate page. By default new tables are named “untitled_table”. You can rename the table by simply clicking on the name.

The Tables dropdown provides sample templates to construct your table. It also will show your tables if you have any.

Screen shot 2011-10-20 at Oct 20, 4.08.36 PM

When in the “My Tables” section in the console, clicking on the table name will put the store execute key into the yql statement area and desc the table.

Screen shot 2011-10-20 at Oct 20, 4.17.00 PM

To query your table store, put the yql query statment after the “use” declaration. For example: use “store://Tdr13p0ubxczYZ78ia0Sph” as zillow; select * from zillow where address = “1835 73rd Ave NE” and citystatezip = “98039″ and zwsid = “X1-ZWz1cse68iatcb_13bwv”

Quick note: You can share your table execute store, it can be run by the public. Your table store execute key is only known to you – unless you choose to share it.

You can also make your endpoint (which will be really long) into a query alias. Click on the “Create Query Alias” link on the top right hand side of the YQL statement box to customize your endpoint.

Currently, it can take up to 30 seconds to see changes made to your table after editing. This will be fixed in a future YQL release. By adding debug=true to your query (or console), you can see real time edits after saving.

We plan to add new features to the YQL Editor as time goes on. Future releases will include the ability to manage and create your own YQL environments and hosted Javascript files. Please let us know of any features you’d like to see at yql-questions (at) yahoo-inc.com.

YQL Table Health and YQL Lint

Filed under feature, news

YQL has attracted a large number of OpenData tables thanks to the efforts of the community. But some of these tables don’t end up working properly due to many factors, like recent changes made to the underlying API. Therefore we’ve created two new tools, YQL Table Health and YQL Lint, to help developers see and understand which tables actually work and which ones don’t.

YQL Table Health is intended to provide a quick general overview of how “healthy” the community OpenData tables are:

img1

When you first arrive to the page, you will see a list of all the tables that can be used by YQL. Clicking on one of the entries in the list will cause it to expand and show additional information regarding where the source of the XML file is, what kind of table it is, sample query information, and lastly any errors that were encountered. You can use the controls on the left-hand side to further filter, sort, and search through all this data. If you see a table that doesn’t work, you can contact the author of the table to fix it via github. You may also fork the yql-tables from github and fix or enhance the table yourself.

YQL Table Health uses a sever-side script to iterate through all the tables. Each table’s XML file is loaded into YQL cloud storage before a series of checks are run against that XML file. The test results are then cached in a database as well as memory to serve this data as fast as possible. Updates to caches are triggered by users visiting the page and only fire if the data is older than thirty minutes; requests for an update are also synchronized to prevent a race condition occurring, where two or more requests might be made simultaneously. Last but not least, the data is served to the user through a user-interface built using HTML5 and the YUI Library.

The next tool we’re going to introduce to you is YQL Lint.

img2

YQL Lint is essentially an XML debugger for individual YQL tables. You can enter either a URL to a XML file, or the contents of an XML file, and it will validate this against our schema for syntax flaws. Once the schema check has been passed, we will use YQL to get a description of your YQL table and check to see if it contains a sample query that returns a valid result. YQL Lint essentially relies on the same core backend as YQL Table Health.

Please experiment with these tools and send us any questions or suggestions you might have.

Daniel Park – YQL/Pipes Intern

YQL and Comet-based Streaming

Filed under feature

Summary

The latest YQL release adds support for Comet-based streaming with Downstream Polling (”CDP”), which allows YQL clients to receive updates to their queries in real time.

Motivation

In traditional YQL, a client must poll the YQL server for updates, by sending the same YQL query over and over again. Each time, the YQL server parses the query into a Pipe object, executes it, sends the results to the client, and closes the response.

Traditional way of invoking Pipe

Traditional way of invoking a Pipe

This approach is inefficient and does not scale well for updates: First of all, the YQL server has to parse the same YQL statements into corresponding Pipe objects over and over again, where each Pipe object is used to produce only a single response, after which it will be garbage-collected.  Secondly, there is no guarantee that the response data will have changed, resulting in unnecessary network traffic and wasted Pipe constructions and executions.

Long polling is not a solution either: It is impossible for the YQL server to know for how long to keep open a response, because it has no way of telling when new data has become available. As with busy polling, Pipe objects are not reused.

CDP attempts to address these deficiencies: In this mode, the client opens a single persistent connection to the server and sends the YQL query in the initial request. The YQL engine on the YQL server parses the query into a Pipe object, but instead of discarding the Pipe after a single execution and closing the response, it holds on to both the Pipe (turning it into a Standing Pipe), which allows it to execute the same query repeatedly over a period of time, and the Comet-enabled response, which allows it to send updated results to the client asynchronously and in real time.

Periodic invocation of Standing Pipe

Periodic invocation of Standing Pipe

Polling Frequency

In order to enable a table for CDP, its developer must specify the frequency (in seconds) that is appropriate for polling the table’s downstream web service for updates, using the new pollingFrequencySeconds table attribute.

If the YQL query is mapped to a single table, then the frequency with which the Standing Pipe will be executed is equal to the table’s pollingFrequencySeconds. If the YQL query is mapped to multiple tables, then the execution frequency of the Standing Pipe is set to the largest polling frequency of the tables involved, to increase the likelihood that each Standing Pipe execution will yield updated results.

Check out the YQL documentation for an example of how to enable a table for CDP.

Future Enhancements

A future enhancement will have the YQL engine participate in a truly event-driven, publish-subscribe (Bayeux) style notification system, where a table’s downstream service will be a named source of events, to which the YQL engine will subscribe through the appropriate event channel.

Implementation Status and Limitations

The current implementation of CDP is considered experimental and is made available on separate YQL web service endpoints, which are named after the traditional YQL web service endpoints, with streaming inserted into their URI paths. Therefore, YQL’s streaming-enabled endpoint for public tables is accessible through this URL:

http://query.yahooapis.com/v1/public/streaming/yql?[query_params]

whereas the streaming-enabled endpoint for OAuth-protected tables can be accessed at this URL:

http://query.yahooapis.com/v1/streaming/yql?[query_params]

The number of concurrent Comet connections has been throttled at the YQL engine: When the maximum number of concurrent Comet connections has been reached,  any requests that would normally have been put into CDP mode are served in the traditional way.

The version of the Comet implementation that CDP builds upon does not support a configurable timeout for Comet connections, with the effect that a Comet connection will remain open for only 20 seconds. This limitation will be lifted in a future YQL release.

Support for round trip lossless JSON processing

Filed under feature

There have been many reports (and complaints) in the past about the YQL engine changing the structure of a JSON response from a downstream webservice as part of its processing before returning JSON output to the client.

This corruption of JSON response content would manifest itself in a number of ways: JSON numbers in the downstream response would be delivered as JSON strings to the client, and single-element JSON arrays converted to JSON objects, among others.

For example, "myint":[5] in the downstream response would be returned as "results":{"json":{"myint":"5"}} to the client.

Recent YQL releases have fixed these issues and now provide support for round trip lossless JSON processing. As we want to give developers sufficient time to take advantage of and adapt to this change, we have not yet enabled this feature by default. To enable it, simply append jsonCompat=new to your YQL query.

Please start experimenting with this new query parameter and give us feedback. Eventually, this feature will be enabled by default, but there will an announcement before we make the switch.

Adding value to a data feed using YQL Execute

Filed under feature, tutorial

I want USGS earthquake data.  More specifically, I’m interested in recent, substantial quakes.  Fortunately, data.gov makes this data easy to find.  After searching through the USGS raw data catalog for the text “earthquake”, I choose the Worldwide M2.5+ Earthquakes, Past 7 Days feed, and pull it into YQL for parsing.  It’s almost perfect, but I want easy access to each quake’s magnitude, and the magnitude is buried in the “title” element.  No worries.  I’ll use YQL Execute to split it out and give it its own element in the feed’s structure.  I can then visualize this data using something like Jon LeBlanc’s js-yql-display project on github.

Here are a few reasons why YQL is perfect for this task:
1) I can take advantage of Yahoo!’s web-serving infrastructure to fetch, process, and cache the feed, reducing my server’s exposure and bandwidth costs.  My table is also cached, further reducing bandwidth usage.

2) Because YQL Execute employs standard E4X, I am using and adding to my JavaScript skill set, instead of spending time learning a new language

3) E4X was built specifically for XML manipulation so it has a convenient syntax for this job

4) By using YQL to do the heavy lifting, I can minimize the code I send to the browser and keep it focused on the display logic.

Ok. Ok. Here’s the code:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
        <description>Extracts magnitude from item title in atom feed and adds it as an element to the item.  We can then filter by magnitude using yql's built-in operators</description>
	<sampleQuery>select entry from usgs.earthquakes</sampleQuery>
	<sampleQuery>select entry from usgs.earthquakes where entry.magnitude >= 6.0</sampleQuery>

  </meta>
  <bindings>
    <select itemPath="" produces="XML">
		<urls>

			<url>http://earthquake.usgs.gov/eqcenter/catalogs/7day-M2.5.xml?11d</url>
		</urls>
		<execute><![CDATA[

			default xml namespace = "http://www.w3.org/2005/Atom";
			var xml = request.get().response,//call the url defined above

			 	entries = <entries></entries>,//prep the output object
				entry = null,//individual entry in xml obj. used in loop below.
				magnitude = null;//magnitude of quake.  used in loop below

			for each(entry in xml.entry){
				magnitude =
					entry.title//eg M 3.0, Puerto Rico region

					.split(' ')[1]//eg --> 3.0,
					.replace(',', '');//eg --> 3.0

				entry.appendChild( <magnitude>{magnitude}</magnitude> );
				entries.appendChild(entry);

			}
			response.object = entries;
		]]></execute>

    </select>
  </bindings>
</table>

Now, we can put this table on a server, load it up in YQL, and easily access the magnitude using YQL’s parser.

For those unfamiliar with E4X, it’s worth noting the namespace declaration (default xml namespace = "http://www.w3.org/2005/Atom";).  It tells YQL’s JavaScript engine what kind of structure to expect.  We wouldn’t be able to access the feed’s elements without it.  Find the namespaces associated with your data by looking in the xml wrapper:   The Atom namespace governs my feed’s structure as a whole, which is why it was convenient to declare it as a default.  For access to specific elements using another namespace, e.g. georss data, it’s be easier to define the namespace locally like this:
var ns = Namespace("http://www.georss.org/georss");
and then use it like this:
var latitude = xml.ns::Result.ns::Latitude;

Since we’ve gone to the trouble of defining a YQL table, we may as well add parsing for the summary element, which also contains some useful information in an inconvenient format.  Because this content is a bit more extensive, while still being somewhat predictable, a regular expression works well.  Here’s the code:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>

	<description>Extracts magnitude from item title in atom feed and adds it as an element to the item.  We can then filter by magnitude using yql's built-in operators.  Additionally, it extracts summary cdata, parses it, wraps the parsed data in its own element, and adds this element to the xml output.  </description>
	<sampleQuery>select entry from usgs.earthquakes</sampleQuery>

	<sampleQuery>select entry.title, entry.updated, entry.link from usgs.earthquakes</sampleQuery>

	<sampleQuery>select entry.summary from usgs.earthquakes where entry.summary.type = "xml" and entry.summary.depth.km > 99</sampleQuery>

  </meta>
  <bindings>
    <select itemPath="" produces="XML">
		<urls>

			<url>http://earthquake.usgs.gov/eqcenter/catalogs/7day-M2.5.xml?11d</url>
		</urls>
		<execute><![CDATA[

			default xml namespace = "http://www.w3.org/2005/Atom";

			var xml = request.get().response,//call the url defined above

			 	entries = <entries></entries>,//prep the output object
				entry = null,//individual entry in xml obj. used in loop below.
				magnitude = null,//magnitude of quake.  used in loop below

				re = '<img '//img tag opening bracket (note: trailing spaces here and below)
					+ 'src="(http://earthquake\\.usgs\\.gov/images/globes/[\\d_-]+\\.jpg)" '//img src - capture
					+ 'alt="([\\d\\.]+&#176;(?:N|S) [\\d\\.]+&#176;(?:W|E))" '//img alt - ignore (we already have coords from georss)

					+ 'align="(left|right)" '//img align - ignore
					+ 'hspace="(\\d+)" '//img hspace - ignore
					+ '/>'//img tag closing bracket
					+ '<p>'//opening p tag

					+ '(\\w+, \\w+\\s+\\d+, \\d+ [\\d:]+) UTC'//utc date - capture (note: variable amt of whitespace btwn month and day)
					+ '<br>'//br tag
					+ '(\\w+, \\w+\\s+\\d+, \\d+ [\\d:]+ (?:AM|PM)) at epicenter'//local date at epicenter - capture

					+ '</p>'//closing p tag
					+ '<p>'//opening p tag
					+ '<strong>Depth</strong>: '//descriptive text w/ strong tags

					+ '([\\d\\.]+) km '//depth in kilometers - capture
					+ '\\(([\\d\\.]+) mi\\)'//depth in miles (enclosed in parenthesis) - capture
					+ '</p>',//closing p tag

				cdata = null,
				summary = null;

			for each(entry in xml.entry){

				magnitude =
					entry.title//eg M 3.0, Puerto Rico region
					.split(' ')[1]//eg --> 3.0,

					.replace(',', '');//eg --> 3.0
				entry.appendChild( <magnitude>{magnitude}</magnitude> );

				cdata = new RegExp(re).exec(entry.summary);

				summary = <summary type="xml"><!-- differentiate this summary obj from native summary obj w/ type 'html' -->

					<img alt={cdata[2]} align={cdata[3]} hspace={cdata[4]} src={cdata[1]} />

					<date>
						<utc>{cdata[5]}</utc>
						<local>{cdata[6]}</local>

					</date>
					<depth>
						<km>{cdata[7]}</km>

						<mi>{cdata[8]}</mi>
					</depth>
				</summary>;

				entry.appendChild(summary);

				entries.appendChild(entry);

			}
			response.object = entries;
		]]></execute>

    </select>
  </bindings>
</table>

Now we’re talking!  Check it out in the console.

Here are a couple implementation-level notes:
1) this code will generate an additional summary object, i.e., it doesn’t replace the pre-existing one.  If the later behavior is preferred, replace
entry.appendChild(summary);
with
entry.summary = summary;

2) the regular expression syntax used above is just the standard syntax for JavaScript, but be aware that the html is rendered using html entities, so the content I’m parsing using the regular expression looks different in the YQL console.  For example, add this as the first line inside the for loop:
y.log(entry.summary); 
This will print the cdata-wrapped html to the diagnostics section of the YQL output.  Instead of “<img src=”http://earthquake…-65.jpg” alt=”19.192&#176;N “, as we see in the raw xml feed, it looks like “&lt;img src=”http://earthquake…-65.jpg” alt=”19.192&amp;#176;N …”, On the server, it actually is the raw html, so the regular expression must be constructed accordingly.

To conclude, this post presents a couple ways to restructure a USGS data feed using YQL Execute so it’s more convenient to consume.  I’ve also given a couple tips for working with E4X and YQL.  Because YQL does the fetching, processing, and caching for me, my data delivery is speedy and my client-side code is light.

Getting stock information with YQL and open data tables

Filed under feature, tutorial

One question that the YQL and Pipes teams get asked is “how can I get stock quotes? There isn’t an API for it on developer.yahoo.com”. Interestingly, while there isn’t a more traditional web service API, Yahoo finance does provide a very nice way to get a lot of well structured information on a given company. For example, here’s the Yahoo finance page on YHOO:

You’ll notice that there’s a little “download data” link to the right. If you click the link, it generates a CSV file dynamically with almost all the pricing information on the page. The problem is how to understand what fields the “f” parameter actually produce in the CSV file. Luckily someone has already done that hard work. So now we have a link with a bunch of configurable parameters to get lots of lovely stock information for multiple stock symbols. It is an API of sorts, but that CSV file is still a hard to work with, somewhat cryptic to use, and the data in it is a bit messy.

Enter YQL open data tables. If you don’t want to know “how” this works, and just want a really cool open data table and API to give you stock quotes, then give this a go in the YQL console. Here’s a second example of pulling out only a few fields and sorting the quotes by who has the biggest gain.

You’ll see the query:

select * from yahoo.finance.quotes where symbol in ("YHOO","AAPL","GOOG","MSFT")

And the results (trimmed for this post – there’s a lot of data in the results):

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="4" yahoo:created="2009-06-01T10:40:52Z" yahoo:lang="en-US" yahoo:updated="2009-06-01T10:40:52Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+*+from+yahoo.finance.quotes+where+symbol+in+%28%22YHOO%22%2C%22AAPL%22%2C%22GOOG%22%2C%22MSFT%22%29">
  <diagnostics>
    <publiclyCallable>true</publiclyCallable>
    <url execution-time="2"><![CDATA[http://datatables.org/alltables.env]]></url>
    <url execution-time="55"><![CDATA[http://www.datatables.org/yahoo/finance/yahoo.finance.quotes.xml]]></url>
    <url execution-time="5"><![CDATA[http://download.finance.yahoo.com/d/quotes.csv?s=YHOO,AAPL,GOOG,MSFT&f=aa2bb2b3b4cc1c3c6c8dd1d2ee1e7e8e9ghjkg1g3g4g5g6ii5j1j3j4j5j6k1k2k4k5ll1l2l3mm2m3m4m5m6m7m8nn4opp1p2p5p6qrr1r2r5r6r7ss1s7t1t7t8vv1v7ww1w4xy]]></url>
    <url execution-time="13"><![CDATA[select * from csv where url=@url and columns='Ask,AverageDailyVolume,Bid,AskRealtime,BidRealtime,BookValue,Change&PercentChange,Change,Commission,ChangeRealtime,AfterHoursChangeRealtime,DividendShare,LastTradeDate,TradeDate,EarningsShare,ErrorIndicationreturnedforsymbolchangedinvalid,EPSEstimateCurrentYear,EPSEstimateNextYear,EPSEstimateNextQuarter,DaysLow,DaysHigh,YearLow,YearHigh,HoldingsGainPercent,AnnualizedGain,HoldingsGain,HoldingsGainPercentRealtime,HoldingsGainRealtime,MoreInfo,OrderBookRealtime,MarketCapitalization,MarketCapRealtime,EBITDA,ChangeFromYearLow,PercentChangeFromYearLow,LastTradeRealtimeWithTime,ChangePercentRealtime,ChangeFromYearHigh,PercebtChangeFromYearHigh,LastTradeWithTime,LastTradePriceOnly,HighLimit,LowLimit,DaysRange,DaysRangeRealtime,FiftydayMovingAverage,TwoHundreddayMovingAverage,ChangeFromTwoHundreddayMovingAverage,PercentChangeFromTwoHundreddayMovingAverage,ChangeFromFiftydayMovingAverage,PercentChangeFromFiftydayMovingAverage,Name,Notes,Open,PreviousClose,PricePaid,ChangeinPercent,PriceSales,PriceBook,ExDividendDate,PERatio,DividendPayDate,PERatioRealtime,PEGRatio,PriceEPSEstimateCurrentYear,PriceEPSEstimateNextYear,Symbol,SharesOwned,ShortRatio,LastTradeTime,TickerTrend,OneyrTargetPrice,Volume,HoldingsValue,HoldingsValueRealtime,YearRange,DaysValueChange,DaysValueChangeRealtime,StockExchange,DividendYield']]></url>
    <javascript instructions-used="279387"/>
    <user-time>313</user-time>
    <service-time>75</service-time>
    <build-version>1678</build-version>
  </diagnostics>
  <results>
    <quote symbol="YHOO">
      <Ask>16.60</Ask>
      <AverageDailyVolume>22083900</AverageDailyVolume>
      <Bid>16.55</Bid>
      <AskRealtime>16.60</AskRealtime>
      <BidRealtime>16.55</BidRealtime>
      <BookValue>8.30</BookValue>
      <Change_PercentChange>+0.74 - +4.67%</Change_PercentChange>
      <Change>+0.74</Change>
      <Commission/>
      <ChangeRealtime>+0.74</ChangeRealtime>
      <AfterHoursChangeRealtime>N/A - N/A</AfterHoursChangeRealtime>
      <DividendShare>0.00</DividendShare>
      <LastTradeDate>6/1/2009</LastTradeDate>
      <TradeDate/>
      <EarningsShare>0.011</EarningsShare>
      <ErrorIndicationreturnedforsymbolchangedinvalid>N/A</ErrorIndicationreturnedforsymbolchangedinvalid>
      <EPSEstimateCurrentYear>0.36</EPSEstimateCurrentYear>
      <EPSEstimateNextYear>0.42</EPSEstimateNextYear>
      <EPSEstimateNextQuarter>0.08</EPSEstimateNextQuarter>
      <DaysLow>16.13</DaysLow>
      <DaysHigh>16.65</DaysHigh>
      <YearLow>8.94</YearLow>
      <YearHigh>27.10</YearHigh>
...
      <MarketCapitalization>23.140B</MarketCapitalization>
      <MarketCapRealtime/>
      <EBITDA>1.278B</EBITDA>
      <ChangeFromYearLow>+7.64</ChangeFromYearLow>
      <PercentChangeFromYearLow>+85.46%</PercentChangeFromYearLow>
      <LastTradeRealtimeWithTime>N/A - &lt;b&gt;16.58&lt;/b&gt;</LastTradeRealtimeWithTime>
      <ChangePercentRealtime>N/A - +4.67%</ChangePercentRealtime>
      <ChangeFromYearHigh>-10.52</ChangeFromYearHigh>
      <PercebtChangeFromYearHigh>-38.82%</PercebtChangeFromYearHigh>
      <LastTradeWithTime>4:00pm - &lt;b&gt;16.58&lt;/b&gt;</LastTradeWithTime>
      <LastTradePriceOnly>16.58</LastTradePriceOnly>
      <HighLimit/>
      <LowLimit/>
      <DaysRange>16.13 - 16.65</DaysRange>
      <DaysRangeRealtime>N/A - N/A</DaysRangeRealtime>
      <FiftydayMovingAverage>14.6126</FiftydayMovingAverage>
      <TwoHundreddayMovingAverage>12.9096</TwoHundreddayMovingAverage>
      <ChangeFromTwoHundreddayMovingAverage>+3.6704</ChangeFromTwoHundreddayMovingAverage>
      <PercentChangeFromTwoHundreddayMovingAverage>+28.43%</PercentChangeFromTwoHundreddayMovingAverage>
      <ChangeFromFiftydayMovingAverage>+1.9674</ChangeFromFiftydayMovingAverage>
      <PercentChangeFromFiftydayMovingAverage>+13.46%</PercentChangeFromFiftydayMovingAverage>
      <Name>Yahoo! Inc.</Name>
      <Notes>-</Notes>
      <Open>16.18</Open>
      <PreviousClose>15.84</PreviousClose>
...
      <Symbol>YHOO</Symbol>
      <SharesOwned/>
      <ShortRatio>2.10</ShortRatio>
      <LastTradeTime>4:00pm</LastTradeTime>
      <TickerTrend>&amp;nbsp;======&amp;nbsp;</TickerTrend>
      <OneyrTargetPrice>15.27</OneyrTargetPrice>
      <Volume>27926064</Volume>
      <HoldingsValue/>
      <HoldingsValueRealtime/>
      <YearRange>8.94 - 27.10</YearRange>
      <DaysValueChange>- - +4.67%</DaysValueChange>
      <DaysValueChangeRealtime>N/A - N/A</DaysValueChangeRealtime>
      <StockExchange>NasdaqNM</StockExchange>
...
    </quote>
    <quote symbol="AAPL">
...
  </results>
</query>

So how did we go from that ugly looking CSV file to the lovely XML? The answer is the yahoo.finance.quotes open data table:

<?xml version="1.0" encoding="UTF-8" ?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <sampleQuery>
      select * from {table} where symbol in ("YHOO","AAPL","GOOG","MSFT")
    </sampleQuery>
  </meta>
  <bindings>
    <select itemPath="quotes.quote" produces="XML">
      <urls><url>http://download.finance.yahoo.com/d/quotes.csv?s={-listjoin|,|symbol}</url></urls>
      <inputs>
        <key id='f' type='xs:string' const='true' default='aa2bb2b3b4cc1c3c6c8dd1d2ee1e7e8e9ghjkg1g3g4g5g6ii5j1j3j4j5j6k1k2k4k5ll1l2l3mm2m3m4m5m6m7m8nn4opp1p2p5p6qrr1r2r5r6r7ss1s7t1t7t8vv1v7ww1w4xy' paramType='query' />
        <key id='symbol' type='xs:string' batchable='true' maxBatchItems='20' paramType='path' required='true'/>
      </inputs>
      <execute><![CDATA[
        var results = y.query("select * from csv where url=@url and columns='Ask,AverageDailyVolume,Bid,AskRealtime,BidRealtime,BookValue,Change&PercentChange,Change,Commission,ChangeRealtime,AfterHoursChangeRealtime,DividendShare,LastTradeDate,TradeDate,EarningsShare,ErrorIndicationreturnedforsymbolchangedinvalid,EPSEstimateCurrentYear,EPSEstimateNextYear,EPSEstimateNextQuarter,DaysLow,DaysHigh,YearLow,YearHigh,HoldingsGainPercent,AnnualizedGain,HoldingsGain,HoldingsGainPercentRealtime,HoldingsGainRealtime,MoreInfo,OrderBookRealtime,MarketCapitalization,MarketCapRealtime,EBITDA,ChangeFromYearLow,PercentChangeFromYearLow,LastTradeRealtimeWithTime,ChangePercentRealtime,ChangeFromYearHigh,PercebtChangeFromYearHigh,LastTradeWithTime,LastTradePriceOnly,HighLimit,LowLimit,DaysRange,DaysRangeRealtime,FiftydayMovingAverage,TwoHundreddayMovingAverage,ChangeFromTwoHundreddayMovingAverage,PercentChangeFromTwoHundreddayMovingAverage,ChangeFromFiftydayMovingAverage,PercentChangeFromFiftydayMovingAverage,Name,Notes,Open,PreviousClose,PricePaid,ChangeinPercent,PriceSales,PriceBook,ExDividendDate,PERatio,DividendPayDate,PERatioRealtime,PEGRatio,PriceEPSEstimateCurrentYear,PriceEPSEstimateNextYear,Symbol,SharesOwned,ShortRatio,LastTradeTime,TickerTrend,OneyrTargetPrice,Volume,HoldingsValue,HoldingsValueRealtime,YearRange,DaysValueChange,DaysValueChangeRealtime,StockExchange,DividendYield'",{url:request.url});
        var quotes = <quotes/>;
        var rows=results.results.row;
        for each (var row in rows) {
          for each (var item in row.*) {
            var elname = item.localName();
            var txt = item.text().toString();
            if (txt=="N/A") txt=""; else if (txt=="-") txt=""; else {
              txt = txt.replace(/"/g, '');
            }
            row[elname]=txt;
          }
            quotes.quote += <quote symbol={row.Symbol.text().toString()}>{row.*}</quote>;
        }
        response.object = quotes;
           ]]></execute>
    </select>
  </bindings>
</table>

Let’s step through the main parts of the open data table definition. First the URL that YQL builds to get the data looks like this:

<url>http://download.finance.yahoo.com/d/quotes.csv?s={-listjoin|,|symbol}</url>

The listjoin is a URI template instruction that creates a “comma” separated list of values for each item in the “symbol” value. Symbol itself looks like this in the inputs section:

<key id='f' type='xs:string' const='true' default='aa2bb2b3b4cc1c3c6c8dd1d2ee1e7e8e9ghjkg1g3g4g5g6ii5j1j3j4j5j6k1k2k4k5ll1l2l3mm2m3m4m5m6m7m8nn4opp1p2p5p6qrr1r2r5r6r7ss1s7t1t7t8vv1v7ww1w4xy' paramType='query' />
<key id='symbol' type='xs:string' batchable='true' maxBatchItems='20' paramType='path' required='true'/>

Note the batchable attribute of symbol. This tells YQL that this parameter can accept a set of values that can be sent to the remote data provider in a single request. In this case, the finance CSV API can take a comma separated list of stock symbols and return all that information for each entry.

That cryptic looking f input key is a constant – we’re going to get all the fields we can every time, and this value holds all the short field names that the API understands.

The YQL execute section actually dispatches the request and processes the return data.

var results = y.query("select * from csv where url=@url and columns='Ask,AverageDailyVolume,Bid,AskRealtime,BidRealtime,BookValue,Change&PercentChange,Change,Commission,ChangeRealtime,AfterHoursChangeRealtime,DividendShare,LastTradeDate,TradeDate,EarningsShare,ErrorIndicationreturnedforsymbolchangedinvalid,EPSEstimateCurrentYear,EPSEstimateNextYear,EPSEstimateNextQuarter,DaysLow,DaysHigh,YearLow,YearHigh,HoldingsGainPercent,AnnualizedGain,HoldingsGain,HoldingsGainPercentRealtime,HoldingsGainRealtime,MoreInfo,OrderBookRealtime,MarketCapitalization,MarketCapRealtime,EBITDA,ChangeFromYearLow,PercentChangeFromYearLow,LastTradeRealtimeWithTime,ChangePercentRealtime,ChangeFromYearHigh,PercebtChangeFromYearHigh,LastTradeWithTime,LastTradePriceOnly,HighLimit,LowLimit,DaysRange,DaysRangeRealtime,FiftydayMovingAverage,TwoHundreddayMovingAverage,ChangeFromTwoHundreddayMovingAverage,PercentChangeFromTwoHundreddayMovingAverage,ChangeFromFiftydayMovingAverage,PercentChangeFromFiftydayMovingAverage,Name,Notes,Open,PreviousClose,PricePaid,ChangeinPercent,PriceSales,PriceBook,ExDividendDate,PERatio,DividendPayDate,PERatioRealtime,PEGRatio,PriceEPSEstimateCurrentYear,PriceEPSEstimateNextYear,Symbol,SharesOwned,ShortRatio,LastTradeTime,TickerTrend,OneyrTargetPrice,Volume,HoldingsValue,HoldingsValueRealtime,YearRange,DaysValueChange,DaysValueChangeRealtime,StockExchange,DividendYield'",{url:request.url});

This runs another YQL select statement when the table gets invoked. It’s fetching a CSV data source from a URL and setting up the column names for each “row” that comes back. The URL that YQL would originally have fetched for this data is already created in the request object, containing the list of symbols expanded into the s parameter, so we just used the @ substitution syntax to add that into the YQL statement.

var quotes = <quotes/>;
var rows=results.results.row;

The next few lines create a new XML object called quotes which will hold our final XML document and gets an XML list to each of the rows that came back from the CSV query.

for each (var row in rows) {

Now we’ll loop over each of those rows (one row per stock symbol).

   for each (var item in row.*) {
    var elname = item.localName();
    var txt = item.text().toString();
    if (txt=="N/A") txt=""; else if (txt=="-") txt=""; else {
      txt = txt.replace(/"/g, '');
    }
    row[elname]=txt;
  }

For each element in that row (note the E4X syntax to get all the elements row.*) we’re going to clean up the XML somewhat. We’ll get rid of “N/A” and “-” text elements and use empty elements instead, as well as remove any quotes in the text.

   quotes.quote += <quote symbol={row.Symbol.text().toString()}>{row.*}</quote>;
}

Finally in the main loop we’ll append a new quote element to our root XML quotes element that contains the reformatted XML elements, and an attribute called symbol

response.object = quotes;

Last of all we set the response object to the document we’ve just created in the loop.

We’ve already added the table to the github open data table repository, so you can try this table out in the YQL console just by including the community tables. And now its in YQL, you can sort, filter, project and join on any of this data that comes back! For example, you can pull out only a few fields and sort the quotes by who has the biggest gain.

Do it your way with execute and open data tables

Filed under feature

Today we’ve released a significant new feature for YQL: <execute>.

Previously our open data tables enabled any developer to bring the power of YQL and the table model available to new APIs or data sources that we hadn’t covered. Many of these are hosted on github, and you can read (and use) about them on datatables.org. These give you access to friendfeed, google reader, guardian, lastfm, nyt, twitter, weather.com, whitepages, yelp, zillow to name just a few.

YQL <execute> extends the open data table capabilities through server-side transformation and processing using javascript. This dramatically increases the reach and power of open data tables, for example:

  • join data across services
    e.g. grab New York Times article tags and incorporate associated flickr photos
  • combine multiple searches into a single result
    twitter, web, news and image…
  • augment data
    e.g. Add city/state information to APIs that just return zip code
  • create an API from a web page
    e.g. celebrity birthdays scraped from imdb
  • access APIs that require authentication
    e.g. Netflix OAuth, FlickrAuth, Google AuthSub …
  • data transformation
    e.g. convert the result from xml to Google’s kml format
  • move business logic for your application’s data to the cloud
  • leverage the work of others to make access easier by sharing tables

Read more about this on our YDN blog post, including links to several examples to wet your appetite.

YQL environment files

Filed under feature, news

We’ve introduced a new env query parameter to both the YQL console and the web service. The env parameter expects a URL value that points to a plain text file anywhere on the web (including subversion repositories, github etc). The file should contain a list of use statements that specify all the open data tables that you’d like to be available to your YQL queries for that request. For example, if I want all of the New York Times open data tables hosted at javarants I would create a file like this:

use 'http://www.javarants.com/nyt/nyt.article.search.xml';
use 'http://www.javarants.com/nyt/nyt.bestsellers.history.xml';
use 'http://www.javarants.com/nyt/nyt.bestsellers.search.xml';
use 'http://www.javarants.com/nyt/nyt.bestsellers.xml';
use 'http://www.javarants.com/nyt/nyt.movies.critics.xml';
use 'http://www.javarants.com/nyt/nyt.movies.picks.xml';
use 'http://www.javarants.com/nyt/nyt.movies.reviews.xml';

…and serve it from somewhere on the web. If you want to try this new feature on some useful open data tables in the console, you can use a few of the ones currently in the github open data tables project like this:

http://developer.yahoo.com/yql/console/?env=http://github.com/spullara/yql-tables/raw/ef685688d649a7514ebd27722366b2918d966573/alltables.env

Try it in the console!. If you look to the “table” list on the right of the console, you’ll see an extra 9 APIs and 30 tables to explore!