YQL and Comet-based Streaming

Summary

The latest YQL release adds support for Comet-based streaming with Downstream Polling (“CDP”), which allows YQL clients to receive updates to their queries in real time.

Motivation

In traditional YQL, a client that wants updates must poll the YQL server by sending the same YQL query over and over again. Each time, the YQL server parses the query into a Pipe object, executes it, sends the results to the client, and closes the response.

Traditional way of invoking a Pipe

This approach is inefficient and does not scale well for updates. First, the YQL server has to parse the same YQL statement into a corresponding Pipe object on every request, and each Pipe object produces only a single response before it is garbage-collected. Second, there is no guarantee that the response data has changed since the previous request, so much of the network traffic, Pipe construction, and Pipe execution is wasted.

Long polling is not a solution either: the YQL server cannot know how long to keep a response open, because it has no way of telling when new data has become available. And as with busy polling, Pipe objects are not reused.

CDP attempts to address these deficiencies. In this mode, the client opens a single persistent connection to the server and sends the YQL query in the initial request. The YQL engine on the YQL server parses the query into a Pipe object, but instead of discarding the Pipe after a single execution and closing the response, it holds on to both. Keeping the Pipe (which turns it into a Standing Pipe) allows the engine to execute the same query repeatedly over a period of time. Keeping the Comet-enabled response allows it to send updated results to the client asynchronously and in real time.

Periodic invocation of Standing Pipe
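
To make the difference from traditional polling concrete, here is a minimal sketch of the idea, not the YQL engine's actual code. The names StandingPipe, fetch, send, and stream are hypothetical: fetch stands in for the call to the downstream web service, and send stands in for writing to the open Comet response. The sketch pushes the result of every execution; whether the real engine suppresses unchanged results is not covered here.

```python
import time
from typing import Callable, List


class StandingPipe:
    """Hypothetical stand-in for a Pipe that is parsed once and kept alive."""

    def __init__(self, query: str, fetch: Callable[[str], List[str]]):
        self.query = query      # the query is parsed once, when the Pipe is built
        self.fetch = fetch      # hypothetical call to the downstream web service
        self.executions = 0     # how many times this same Pipe has been reused

    def execute(self) -> List[str]:
        self.executions += 1
        return self.fetch(self.query)


def stream(pipe: StandingPipe,
           send: Callable[[List[str]], None],
           rounds: int,
           interval_seconds: float,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Re-execute one Standing Pipe and push each result over the open response."""
    for _ in range(rounds):
        send(pipe.execute())
        sleep(interval_seconds)
```

The point of the sketch is that the same StandingPipe instance is used for every round, so nothing is re-parsed, and the client never sends a second request.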

Polling Frequency

In order to enable a table for CDP, its developer must specify the frequency (in seconds) that is appropriate for polling the table’s downstream web service for updates, using the new pollingFrequencySeconds table attribute.

If the YQL query is mapped to a single table, then the Standing Pipe is executed once every pollingFrequencySeconds seconds, as specified by that table. If the YQL query is mapped to multiple tables, then the Standing Pipe uses the largest pollingFrequencySeconds value among the tables involved (that is, the longest interval), to increase the likelihood that each Standing Pipe execution will yield updated results.
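
The rule above amounts to taking a maximum. A minimal sketch, assuming each table's pollingFrequencySeconds value is already available as a number (the function name is hypothetical):

```python
from typing import Iterable


def standing_pipe_interval(polling_frequency_seconds: Iterable[int]) -> int:
    """Return the number of seconds between Standing Pipe executions.

    Takes the pollingFrequencySeconds value of every table the query maps to
    and returns the largest one, as described in the text above.
    """
    values = list(polling_frequency_seconds)
    if not values:
        raise ValueError("the query must map to at least one table")
    return max(values)
```

For example, a query that joins tables with values of 10, 60, and 30 would be executed every 60 seconds, while a query over a single table with a value of 30 would be executed every 30 seconds.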

Check out the YQL documentation for an example of how to enable a table for CDP.

Future Enhancements

A future enhancement will have the YQL engine participate in a truly event-driven, publish-subscribe (Bayeux) style notification system, where a table’s downstream service will be a named source of events, to which the YQL engine will subscribe through the appropriate event channel.

Implementation Status and Limitations

The current implementation of CDP is considered experimental and is made available on separate YQL web service endpoints. These are named after the traditional YQL web service endpoints, with "streaming" inserted into their URI paths. YQL's streaming-enabled endpoint for public tables is therefore accessible through this URL:

http://query.yahooapis.com/v1/public/streaming/yql?[query_params]

whereas the streaming-enabled endpoint for OAuth-protected tables can be accessed at this URL:

http://query.yahooapis.com/v1/streaming/yql?[query_params]
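
As a rough illustration, a client could build its request URL as follows. This is a sketch under one assumption: that the streaming endpoints accept the same q and format query parameters as the traditional endpoints. The function name and the table name in the usage note are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint URLs as given above.
PUBLIC_STREAMING_ENDPOINT = "http://query.yahooapis.com/v1/public/streaming/yql"
OAUTH_STREAMING_ENDPOINT = "http://query.yahooapis.com/v1/streaming/yql"


def streaming_url(query: str, public: bool = True, fmt: str = "json") -> str:
    """Build a request URL for one of the streaming-enabled endpoints.

    Assumption: the parameters 'q' and 'format' carry over unchanged
    from the traditional endpoints.
    """
    base = PUBLIC_STREAMING_ENDPOINT if public else OAUTH_STREAMING_ENDPOINT
    return base + "?" + urlencode({"q": query, "format": fmt})
```

Calling streaming_url("select * from mytable") yields a URL on the public streaming endpoint with the query URL-encoded. Requests to the OAuth-protected endpoint would additionally need to be signed, which this sketch does not cover.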

The YQL engine limits the number of concurrent Comet connections. Once that maximum has been reached, any request that would normally have been put into CDP mode is served in the traditional way instead.
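
One way to picture this fallback is a counter of free Comet slots that is checked without blocking. The following is only a sketch of that general technique, not the engine's implementation; CometThrottle and choose_mode are hypothetical names, and the limit is whatever the engine is configured with.

```python
import threading


class CometThrottle:
    """Hypothetical cap on the number of concurrent Comet connections."""

    def __init__(self, max_connections: int):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when no slot is free.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        # Called when a Comet connection closes, freeing its slot.
        self._slots.release()


def choose_mode(throttle: CometThrottle) -> str:
    """Serve in CDP mode if a slot is free, otherwise fall back."""
    return "streaming" if throttle.try_acquire() else "traditional"
```

With a limit of two, the first two requests are served in streaming mode and the third is served traditionally. Once one of the streaming connections is released, the next request can stream again.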

The version of the Comet implementation that CDP builds upon does not support a configurable timeout for Comet connections. As a result, a Comet connection remains open for only 20 seconds. This limitation will be lifted in a future YQL release.