The powerful YQL feednormalizer table

Filed under feature, tutorial

YQL’s feednormalizer table is used to convert an input feed in one format into an output feed of another format.

Optionally, prexsl transformation can be applied to the input feed before format conversion and postxsl transformation can be applied to the output feed after format conversion. Prexsl transformation can be used to correct badly formed feeds; while as, postxsl transformation can be used to rearrange, filter, or format final output.

Input feeds can be of any character encoding; however, the output is always transcoded into UTF-8. Illegal characters found in the feed during transcoding are removed.

Syntax
The input feed can either be specified as a url or as an xml string. When the desired output format of the feed is specified, the input feed will be converted into that format. Optionally, one can supply xsl transforms to pre-process the input feed or post-process the output feed.

SELECT * FROM feednormalizer
 WHERE (url= | xml=)
  [AND output=('rss_0.91N'|'rss_0.93'|'rss_0.92'|'rss_1.0'|'rss_0.94'|'rss_2.0'|'rss_0.91U\rss_0.9'|'atom_1.0'|'atom_0.3')]
  [AND prexslurl=]
  [AND postxslurl=]
  [AND timeout=]

or

SELECT * FROM feednormalizer
 WHERE (url= | xml=)
  [AND output=('rss_0.91N'|'rss_0.93'|'rss_0.92'|'rss_1.0'|'rss_0.94'|'rss_2.0'|'rss_0.91U\rss_0.9'|'atom_1.0'|'atom_0.3')]
  [AND prexsl=]
  [AND postxsl=]
  [AND timeout=]

If a timeout value (msec) is specified, feednormalizer table will expect the url to respond within that timeout; otherwise, an error message will be returned.

Example 1: Querying a valid input feed

SELECT * FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'

Try above example in YQL Console. Try above example as REST request.


Example 2: Simple conversion of valid input feed

SELECT * FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'
   AND output='atom_1.0'

Try above example in YQL Console. Try above example as REST request.


Example 3: Converting invalid input feed produces error

Invalid feeds (such as http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml) produces errors when being transformed. For example, the statement:

SELECT * FROM feednormalizer
 WHERE url='http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml'
   AND output='rss_2.0'

produces the following error during execution:

Could not parse feed data. Invalid rss_2.0 feed, missing image title

Try above example in YQL Console. Try above example as REST request.


Example 4: Successfully converting invalid feeds using XSL transform.

In Example 3, we saw that the invalid feed could not be converted because it has missing image title. We can get around this by transforming the invalid input feed into a valid feed by removing the <image> tag. This can be done through the use of an XSL transform as shown below:

SELECT * FROM feednormalizer
 WHERE url='http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml'
   AND output='rss_2.0'
   AND prexsl='<?xml version="1.0" encoding="ISO-8859-1"?>
               <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                <xsl:output omit-xml-declaration="yes"/>
                   <xsl:template match="node()|@*">
                     <xsl:copy>
                        <xsl:apply-templates select="node()|@*"/>
                     </xsl:copy>
                   </xsl:template>
                   <xsl:template match="image"/>
               </xsl:stylesheet>'

Try above example in YQL Console. Try above example as REST request.


Example 5: HTML generation

A postxsl transformation can be applied to Example 4 to convert the corrected feed into HTML:

SELECT * FROM feednormalizer 
 WHERE url='http://www.yqlblog.net/blog/wp-content/uploads/tmp/example_feed.xml'
   AND output='rss_2.0'
   AND prexsl='<?xml version="1.0" encoding="ISO-8859-1"?>
               <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                <xsl:output omit-xml-declaration="yes"/>
                   <xsl:template match="node()|@*">
                     <xsl:copy>
                        <xsl:apply-templates select="node()|@*"/>
                     </xsl:copy>
                   </xsl:template>
                   <xsl:template match="image"/>
               </xsl:stylesheet>'
   AND postxsl='<?xml version="1.0" encoding="ISO-8859-1"?>
                <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                  <xsl:template match="/">
                    <html>
                      <body>
                        <h2>My News</h2>
                        <table border="1">
                          <tr bgcolor="#9acd32">
                            <th>Title</th>
                            <th>Description</th>
                          </tr>
                          <xsl:for-each select="rss/channel/item">
                            <tr>
                              <td><a><xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute><xsl:value-of select="title"/></a></td>
                              <td><xsl:value-of select="description"/></td>
                            </tr>
                          </xsl:for-each>
                        </table>
                      </body>
                    </html>
                  </xsl:template>
                </xsl:stylesheet>'

Try above example in YQL Console. Try above example as REST request.


Example 6: HTML generation, 2nd example

For valid feeds that are already in the desired format, postxsl can be applied directly for HTML conversion:

SELECT * FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'
   AND postxsl='<?xml version="1.0" encoding="ISO-8859-1"?>
                <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                  <xsl:template match="/">
                    <html>
                      <body>
                        <h2>My News</h2>
                        <table border="1">
                          <tr bgcolor="#9acd32">
                            <th>Title</th>
                            <th>Description</th>
                          </tr>
                          <xsl:for-each select="rss/channel/item">
                            <tr>
                              <td><a><xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute><xsl:value-of select="title"/></a></td>
                              <td><xsl:value-of select="description"/></td>
                            </tr>
                          </xsl:for-each>
                        </table>
                      </body>
                    </html>
                  </xsl:template>
                </xsl:stylesheet>'

Try above example in YQL Console. Try above example as REST request.


Example 7: Transcoding input feeds into UTF-8

Input feed documents may be in any encoding. The output is always in UTF-8. During transcoding illegal characters encountered in the input feed are removed. The following diagnostics message will appear in the output, when illegal characters are removed:

removed 14 badly encoded characters from feed.

SELECT * FROM feednormalizer
 WHERE url='http://cn.wsj.com/big5/rssbch.xml'
   AND output='atom_1.0'

Try above example in YQL Console. Try above example as REST request.


Example 8: Selecting and filtering input feeds using YQL

SELECT entry.title FROM feednormalizer
 WHERE url='http://rss.news.yahoo.com/rss/topstories'
   AND output='atom_1.0' | sort('entry.title')

Try above example in YQL Console. Try above example as REST request.