Content Sync Extensions Spec
The scope of Content Sync Extensions (CSE) is to define a small set of extensions necessary to enable unidirectional data sync for rich, occasionally connected applications.
CSE extends the following specifications:
RSS 2.0 http://blogs.law.harvard.edu/tech/rss
One goal of CSE is to reinvent as little as possible—hence the use of RSS in this spec as the underlying XML container format for syncing data.
Overview
Namespaces and Version
The XML namespace URI for the XML data format described in this specification is:
http://schemas.microsoft.com/rss/2007/contentsyncextensions
In this spec, the prefix "csx:" is used for the namespace URI identified above.
RSS Example:
Basic Principles
The CSE spec was created with these principles in mind:
- Unidirectional sync – CSE is optimized for unidirectional sync. If you are interested in bi-directional sync, consult the SSE spec.
- Feed of feeds – CSE provides a set of extensions that enables synchronization of an arbitrarily nested set of feeds. RSS items at any level of the feed tree can point to additional feeds. This concept of a "feed of feeds" allows you to quickly aggregate content and data as well as optimize the sync granularity for a particular scenario.
- References to additional data – rather than use RSS enclosures, CSE uses a special element (see <csx:link> below) to reference data that is associated with an RSS item. There can be multiple references per item, and these references can be arbitrarily nested within the item. This data can be synchronized automatically, or flagged to be synchronized "on demand".
- Agnostic to sync scenarios – although RSS has been widely used for news content syndication, the CSE spec is explicitly agnostic to the types of sync scenarios that CSE can be used for.
Motivating Scenario
The CSE extensions to RSS were created to facilitate the development of rich, occasionally connected applications. By using well understood technologies such as RSS and XML, application authors can easily specify a set of data (and metadata) that should always be available to the application, as well as data that can be made available on demand. The notion of nested feeds allows the data to be partitioned into sets which optimize sync granularity and performance for a particular scenario. The ability to include <csx:link>’s anywhere within an <item>’s descendants allows app authors to express metadata in a fashion that is most convenient for their application.
The New York Times Times Reader is one kind of application that motivated the creation of CSE. Bear in mind, however, that CSE is not limited to newspaper publishing scenarios. The WPF Syndicated Client Experiences Starter Kit can help application developers quickly build applications that exploit CSE.
Extensions
<csx:link> element(s) within RSS <item>
The <csx:link> element is OPTIONAL. If it exists, it contains a string representing a URL to some resource on the web that is associated with the <item>. Relative URL’s are treated as relative to the URL of the enclosing feed. Each RSS item may have more than one <csx:link>, and <csx:link>’s can be nested within other elements that are children of the RSS <item>. The resources referenced by <csx:link>’s are downloaded according to the algorithm given later in this spec.
The <csx:link> is different from the RSS <enclosure> in several ways:
- This spec explicitly allows <item>’s to have more than one <csx:link>.
- <csx:link>’s may be optionally downloaded depending on their attributes (see more below).
- The <csx:link> can be nested within other children of the <item>, enabling more advanced data extensibility scenarios.
- Forking the concept of the <csx:link> from <enclosure> allows standard RSS readers to continue to display and process CSE-extended RSS feeds while giving CSE-enabled readers greater flexibility.
Attributes – nestedFeed and onDemand (both OPTIONAL), described in the sections that follow.
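As an illustration of the rules above, the following Python sketch collects every <csx:link> nested anywhere under an <item> and resolves relative URLs against the URL of the enclosing feed. The function name `resolve_links` and the feed URL `http://boguslink.com/feeds/top.xml` are assumptions made for this example; they are not defined by the spec.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CSX_NS = "http://schemas.microsoft.com/rss/2007/contentsyncextensions"
CSX_LINK = f"{{{CSX_NS}}}link"


def resolve_links(item, feed_url):
    """Return absolute URLs for every <csx:link> found at any depth under <item>.

    Hypothetical helper: relative URLs are resolved against feed_url,
    the URL of the enclosing feed.
    """
    urls = []
    for link in item.iter(CSX_LINK):  # iter() walks all descendants, not just children
        text = (link.text or "").strip()
        if text:
            urls.append(urljoin(feed_url, text))
    return urls


ITEM_XML = f"""
<item xmlns:csx="{CSX_NS}">
  <title>Featured Photo Collection</title>
  <Photos>
    <Photographer Name="John Doe">
      <csx:link>profiles/1324123/thumbnail.jpg</csx:link>
    </Photographer>
    <csx:link>profiles/1324123/a.jpg</csx:link>
  </Photos>
</item>
""".strip()

item = ET.fromstring(ITEM_XML)
links = resolve_links(item, "http://boguslink.com/feeds/top.xml")
print(links)
```

Both links are found even though one of them sits two levels below the <item>, which is the behavior that distinguishes <csx:link> from a single top-level <enclosure>.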
nestedFeed attribute for <csx:link>
This attribute is OPTIONAL. If not present, it’s assumed to be false. Only one <csx:link> per <item> can be flagged as a nested feed. If a <csx:link> has been flagged as a nested feed, all other <csx:link>’s in the item will be ignored.
Possible values
true – indicates that the <csx:link> element points to an additional feed conforming to this spec. The feed referenced by the <csx:link> will be downloaded and processed according to this spec.
false – default value. Indicates that any <csx:link> element is not a nested feed.
onDemand attribute for <csx:link>
This attribute is OPTIONAL. It is used to denote links (including nested feeds) that should not be automatically downloaded. If not present, it’s assumed to be false. The table below specifies the behavior taken for a <csx:link> element with respect to this attribute and the nestedFeed attribute.
| | onDemand=true | onDemand=false |
| --- | --- | --- |
| nestedFeed=false | Do not download | Download only |
| nestedFeed=true | Do not download | Download and process (search for <csx:link>’s) |
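The table can be read as a small decision function. The sketch below is one possible encoding in Python; the function name `link_action` and the returned action labels are assumptions chosen for this example, and attribute values are compared against the literal string "true".

```python
def link_action(link_attrs):
    """Map a <csx:link>'s attributes to the action given in the table above.

    link_attrs is a plain dict of attribute names to string values.
    Missing attributes default to "false", as the spec states.
    The returned labels are illustrative, not defined by the spec.
    """
    on_demand = link_attrs.get("onDemand", "false") == "true"
    nested_feed = link_attrs.get("nestedFeed", "false") == "true"

    if on_demand:
        return "do_not_download"  # both rows of the onDemand=true column
    if nested_feed:
        return "download_and_process"  # fetch the feed, then search it for more links
    return "download_only"


print(link_action({"nestedFeed": "true", "onDemand": "true"}))
print(link_action({"nestedFeed": "true"}))
print(link_action({}))
```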
<csx:lastBuildDate> element as a child of RSS <item>
The <csx:lastBuildDate> element is RECOMMENDED. The RSS spec defines a <lastBuildDate> element as a valid child of <channel>, but not of <item>. The <csx:lastBuildDate> element is introduced in the CSE spec to allow this kind of information to be attached to an RSS <item>. This allows feeds to separate the concept of <pubDate> (used for UI purposes) from <csx:lastBuildDate> (used to track when the item was last updated, even if the <pubDate> remains the same).
This element contains a Date/Time. Like RSS, these values must conform to RFC 822. The timestamp contained in this element signifies when an <item>, or any of the content referenced in its <csx:link> elements, was last updated. These timestamps play an integral role in the CSE sync algorithm. Thus, a feed generator MUST ensure that the <csx:lastBuildDate> of an <item> containing a <csx:link> to a nested feed is equal to the most recent <csx:lastBuildDate> of any <item> in the nested feed. If not, sync consistency cannot be guaranteed.
If not present, <csx:lastBuildDate> is assumed to be the item’s <pubDate>. If the <pubDate> is not present, <csx:lastBuildDate> is assumed to be January 1st, 1601.
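The fallback chain for an item's timestamp can be sketched as follows. This is a minimal Python illustration; the name `effective_last_build_date` is an assumption, and treating a date that carries no time zone as UTC is also an assumption made so that values can be compared.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

CSX_NS = "http://schemas.microsoft.com/rss/2007/contentsyncextensions"
CSX_LBD = f"{{{CSX_NS}}}lastBuildDate"
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)


def effective_last_build_date(item):
    """Return <csx:lastBuildDate>, else <pubDate>, else January 1st, 1601."""
    for tag in (CSX_LBD, "pubDate"):
        text = (item.findtext(tag) or "").strip()
        if text:
            parsed = parsedate_to_datetime(text)  # parses RFC 822 style dates
            if parsed.tzinfo is None:
                # Assumption: dates without a zone are treated as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    return EPOCH_1601


item = ET.fromstring(
    f'<item xmlns:csx="{CSX_NS}">'
    "<pubDate>Sun, 01 Oct 2006 06:00:00 GMT</pubDate>"
    "<csx:lastBuildDate>Mon, 02 Oct 2006 12:35:12 GMT</csx:lastBuildDate>"
    "</item>"
)
print(effective_last_build_date(item).isoformat())
```

A malformed date string makes `parsedate_to_datetime` raise an exception; how a reader should recover from that is not covered by this sketch.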
<guid> element as a child of RSS <item>
This element is optional under the RSS spec, but REQUIRED when using content sync extensions. Under the RSS spec, the guid is simply a string, and content publishers must ensure its uniqueness. In practice, CSE only requires that this string be unique within the scope of the tree of nested feeds.
csx:hiddenItem attribute for RSS <item>
This attribute is OPTIONAL. If not present, it is assumed to be false. This attribute is defined as a convenience for CSE-enabled RSS readers, since <item>’s may represent entities that shouldn’t be surfaced in an RSS viewer (a nested feed, for example). When set to true, a CSE-enabled RSS viewer can hide this item from view. Values:
false – default value. Indicates that this item should be surfaced in an RSS viewer.
true – indicates that this item should not be surfaced in an RSS viewer.
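A viewer might apply this attribute with a filter like the one below. This is a sketch only; `visible_items` is a hypothetical helper name, and the sample channel is made up for the example.

```python
import xml.etree.ElementTree as ET

CSX_NS = "http://schemas.microsoft.com/rss/2007/contentsyncextensions"
HIDDEN_ATTR = f"{{{CSX_NS}}}hiddenItem"  # namespaced attribute on <item>


def visible_items(channel):
    """Return the <item> elements a viewer should surface.

    Items are kept unless csx:hiddenItem is exactly "true";
    a missing attribute counts as "false".
    """
    return [
        item
        for item in channel.findall("item")
        if item.get(HIDDEN_ATTR, "false") != "true"
    ]


CHANNEL_XML = f"""<channel xmlns:csx="{CSX_NS}">
  <item csx:hiddenItem="true"><title>Nested feed</title></item>
  <item><title>Featured Photo Collection</title></item>
</channel>"""

channel = ET.fromstring(CHANNEL_XML)
titles = [item.findtext("title") for item in visible_items(channel)]
print(titles)
```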
Example Feed
Below is an example feed. Note the following items:
<item> 1 – A nested, onDemand feed. It will not get downloaded.
<item> 2 – A nested feed that will get downloaded and further processed.
<item> 3 – A photo collection that has <csx:link>’s at multiple levels. Each of these links will be downloaded.
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:csx="http://schemas.microsoft.com/rss/2007/contentsyncextensions">
  <channel>
    <title>Example CSX Channel</title>
    <link>http://boguslink.com</link>
    <description>This is a CSX feed sample</description>
    <lastBuildDate>Mon, 02 Oct 2006 12:35:12 GMT</lastBuildDate>
    <item>
      <link>http://boguslink.com/frontpage.htm</link>
      <guid>toplevel10012006.xml</guid>
      <title>Bogus Paper</title>
      <pubDate>Sun, 01 Oct 2006 06:00:00 GMT</pubDate>
      <csx:lastBuildDate>Sun, 01 Oct 2006 06:00:00 GMT</csx:lastBuildDate>
      <description>News from Sunday October 1, 2006</description>
      <!--this feed will not get downloaded-->
      <csx:link nestedFeed="true" onDemand="true">toplevel10012006.xml</csx:link>
    </item>
    <item csx:hiddenItem="true">
      <link>http://boguslink.com/slideshow.htm</link>
      <guid>slideshow.xml</guid>
      <title>Bogus Paper</title>
      <pubDate>Mon, 02 Oct 2006 06:00:00 GMT</pubDate>
      <csx:lastBuildDate>Mon, 02 Oct 2006 12:35:12 GMT</csx:lastBuildDate>
      <description>The most current news</description>
      <!--this feed will get downloaded AND processed-->
      <csx:link nestedFeed="true">slideshow.xml</csx:link>
    </item>
    <item>
      <title>Featured Photo Collection</title>
      <link>http://boguslink.com/photos/abcdef.htm</link>
      <description>Some really important thing happened</description>
      <csx:lastBuildDate>Mon, 02 Oct 2006 01:35:12 GMT</csx:lastBuildDate>
      <!--all of the csx:link elements in the Photos element will get downloaded-->
      <Photos xmlns="http://contosophotos.com/2007/photos">
        <Photographer Name="John Doe">
          <csx:link>profiles/1324123/thumbnail.jpg</csx:link>
        </Photographer>
        <csx:link>profiles/1324123/a.jpg</csx:link>
        <csx:link>profiles/1324123/b.jpg</csx:link>
      </Photos>
    </item>
    <item>
      <title>Another Really Important Headline</title>
      <link>http://boguslink.com/stories/lmnop.htm</link>
      <description>Another really important thing happened</description>
    </item>
  </channel>
</rss>
Sync Algorithm
The goal of CSE is to enable unidirectional sync of content from a content publisher to a content consumer. The <csx:link> element allows items to reference related content or nested feeds. When used to reference other feeds, <csx:link> elements create a tree of feeds representing a bundle of related data.
To properly sync this data, a sync implementation must be functionally equivalent to the following algorithm, given in pseudocode (definitions and assumptions follow):
Add the root level <csx:link> to the download queue
While the download queue isn’t empty
    Download the next <csx:link> in the download queue
    If <csx:link> is nestedFeed
        If the feed does not exist in the cache
            For each <item> element in feed
                Add all descendent <csx:link>’s with onDemand==false to the download queue
            End for
        Else if feed is newer than cached feed
            For each <item> element in feed
                If cached <item> is older OR <item> does not exist in cached feed
                    Add all descendent <csx:link>’s with onDemand==false to the download queue
                End if
            End for
        End if
    End if
    Save file represented by <csx:link> to cache
End While
Algorithm assumptions:
- Data is not deleted from the cache by another process
- Once the sync process starts, it runs to completion.
- The root level <csx:link> has the attributes onDemand==false and nestedFeed==true
In reality, some of these assumptions may not hold true. Thus, a robust implementation must take care to handle these conditions.
Algorithm Definitions
- Newer (for item) – A is newer than B iff the <csx:lastBuildDate> of A is more recent than the <csx:lastBuildDate> of B. If the <csx:lastBuildDate> is not present, it is assumed to be the item’s RSS <pubDate>. If the <pubDate> is not present, the <csx:lastBuildDate> is assumed to be January 1st, 1601.
- Newer (for feed) – Feed A is newer than feed B iff the RSS <lastBuildDate> of A’s first channel is more recent than the RSS <lastBuildDate> of B’s first channel. If the <lastBuildDate> is not present, it is assumed to be the channel’s RSS <pubDate>. If the <pubDate> is not present, the <lastBuildDate> is assumed to be January 1st, 1601.
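The pseudocode and definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: `fetch` (a callable returning bytes for a URL) and `cache` (a dict from URL to bytes) are hypothetical stand-ins for a real downloader and a real cache, items are matched by <guid>, and dates without a time zone are treated as UTC. Like the pseudocode, the sketch does not apply the rule that a nested-feed link causes other links in the same item to be ignored.

```python
import xml.etree.ElementTree as ET
from collections import deque
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urljoin

CSX_NS = "http://schemas.microsoft.com/rss/2007/contentsyncextensions"
CSX_LINK = f"{{{CSX_NS}}}link"
CSX_LBD = f"{{{CSX_NS}}}lastBuildDate"
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)


def _date(parent, *tags):
    """First date found among tags under parent, else January 1st, 1601."""
    for tag in tags:
        text = (parent.findtext(tag) or "").strip()
        if text:
            parsed = parsedate_to_datetime(text)
            if parsed.tzinfo is None:  # assumption: no zone means UTC
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    return EPOCH_1601


def _channel(feed_bytes):
    return ET.fromstring(feed_bytes).find("channel")  # first channel only


def _items_by_guid(feed_bytes):
    return {i.findtext("guid"): i for i in _channel(feed_bytes).findall("item")}


def _feed_date(feed_bytes):
    return _date(_channel(feed_bytes), "lastBuildDate", "pubDate")


def sync(root_url, fetch, cache):
    """Run the sync loop; fetch and cache are hypothetical stand-ins."""
    queue = deque([(root_url, True)])  # (url, is_nested_feed); root is a feed
    while queue:
        url, is_feed = queue.popleft()
        data = fetch(url)
        if is_feed:
            cached = cache.get(url)
            new_items = _items_by_guid(data)
            if cached is None:
                changed = list(new_items.values())
            elif _feed_date(data) > _feed_date(cached):
                old_items = _items_by_guid(cached)
                changed = [
                    item for guid, item in new_items.items()
                    if guid not in old_items
                    or _date(old_items[guid], CSX_LBD, "pubDate")
                    < _date(item, CSX_LBD, "pubDate")
                ]
            else:
                changed = []
            for item in changed:
                for link in item.iter(CSX_LINK):
                    target = (link.text or "").strip()
                    if target and link.get("onDemand", "false") != "true":
                        nested = link.get("nestedFeed", "false") == "true"
                        queue.append((urljoin(url, target), nested))
        cache[url] = data  # save the downloaded file to the cache
```

A caller would pass a function that performs the actual HTTP download as `fetch` and a persistent store as `cache`; an in-memory dict is enough to experiment with the control flow.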
Behaviors
Item creation/deletion/update
Add/delete/update the <item> element in the RSS <channel>. Set the item’s <csx:lastBuildDate> to the current date/time. Propagate <csx:lastBuildDate> information to parent elements and feeds (see below).
Propagating the lastBuildDate to parent feeds
In order to maintain proper sync consistency, the <csx:lastBuildDate> of an <item> must be propagated up the feed hierarchy whenever an <item> is created, deleted, or updated. As an example, consider the scenario wherein a parent feed points to a child feed. If an <item> in the child feed is updated, the item’s <csx:lastBuildDate> must be propagated to the following locations:
- The <item> element in the child feed
- The parent <channel> of the <item> in the child feed (use RSS’s <lastBuildDate> element)
- The parent <item> of the appropriate <csx:link> in the parent feed
- The parent <channel> of the <item> in the parent feed (use RSS’s <lastBuildDate> element)
When generating feeds, bear in mind that the <lastBuildDate> of a <channel> must be the most recent <csx:lastBuildDate> of any of the child <item>’s in that channel. In addition, the <csx:lastBuildDate> of an <item> containing a <csx:link> pointing to a nested feed must be the most recent <csx:lastBuildDate> of any <item> in the nested feed.
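A feed generator could apply these propagation rules along the lines of the Python sketch below. The helper names (`propagate`, `newest_item_date`) and the idea of locating the parent item by its <guid> are assumptions made for the example. The sketch assumes every item carries a <csx:lastBuildDate> and that the updated item's own date has already been set.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

CSX_NS = "http://schemas.microsoft.com/rss/2007/contentsyncextensions"
CSX_LBD = f"{{{CSX_NS}}}lastBuildDate"


def _set_text(parent, tag, text):
    """Set the text of parent's child element, creating the child if missing."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    node.text = text


def newest_item_date(channel):
    """Return the most recent <csx:lastBuildDate> string among the channel's items.

    Raises ValueError if no item carries a date (not handled in this sketch).
    """
    texts = [(i.findtext(CSX_LBD) or "").strip() for i in channel.findall("item")]
    return max((t for t in texts if t), key=parsedate_to_datetime)


def propagate(child_feed, parent_feed, parent_item_guid):
    """Push the newest child item date up to the child channel, the parent
    item that links to the child feed, and finally the parent channel."""
    child_channel = child_feed.find("channel")
    newest = newest_item_date(child_channel)
    _set_text(child_channel, "lastBuildDate", newest)

    parent_channel = parent_feed.find("channel")
    for item in parent_channel.findall("item"):
        if item.findtext("guid") == parent_item_guid:
            _set_text(item, CSX_LBD, newest)
    _set_text(parent_channel, "lastBuildDate", newest_item_date(parent_channel))
```

For deeper trees the same step would be repeated at each level, walking from the changed feed up to the root.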
Item deletion and cache clean up
When <item>’s are deleted from a feed, the associated files referenced by <csx:link>’s MAY be removed from the cache. Care should be taken to ensure that files referenced by multiple <item>’s are not deleted until the last reference is no longer in the feed.
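One way to honor the "last reference" rule is to compare the links of the deleted item against the links of every item still present. The Python sketch below illustrates that check; the function name `files_safe_to_remove` and the guid-to-URLs mapping are assumptions for this example.

```python
def files_safe_to_remove(deleted_links, remaining_links_by_item):
    """Return the cached URLs that no remaining <item> still references.

    deleted_links: URLs referenced by the item that was deleted.
    remaining_links_by_item: hypothetical mapping of item guid -> URLs
    referenced by each item still present in the feed tree.
    """
    still_referenced = set()
    for links in remaining_links_by_item.values():
        still_referenced.update(links)
    return set(deleted_links) - still_referenced


remaining = {
    "story-2": ["photos/shared.jpg", "photos/b.jpg"],
    "story-3": ["photos/c.jpg"],
}
removable = files_safe_to_remove(["photos/a.jpg", "photos/shared.jpg"], remaining)
print(sorted(removable))
```

Here `photos/shared.jpg` stays in the cache because another item still points at it.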
Frequently Asked Questions
What is the difference between SSE and CSE?
The extensions described in the SSE spec "enable loosely-cooperating applications to use XML-based… for item sharing – that is, the bi-directional, asynchronous synchronization of new and changed items amongst two or more cross-subscribed feeds." As described in this blog post, SSE is most interesting when:
- You want to sync data bi-directionally
- There are multiple, non-authoritative writers to the same data source
CSE is ideally used when these conditions don’t hold true (the data sync is unidirectional and there is a single authoritative source of data). There are many scenarios where both sets of extensions can be employed (for example, market data could be synced using CSE, and user comments on market data could be synced using SSE).
Why should I use nested feeds?
Nesting feeds gives you control over sync granularity and performance. Consider the task of syncing 7000 news stories spanning 7 days of news. Although you could represent this in a single feed, adding one story to the feed would require client applications to download a feed containing 7001 stories.
Alternatively, you could group these stories in nested feeds such that every day is represented by a different feed, and all of these days are aggregated into 7 items within a single parent feed. In this case, the parent feed is very lightweight, and allows the sync engine to incrementally update the nested feeds when updates occur.
Why would I want multiple <csx:link>’s per item, and how do I determine sync granularity?
The RSS <item> is the atomic unit of sync granularity. Consider representing a set of photo slide shows via CSE. If a slide show were to be synced as a single package of data, you could craft an <item> with multiple <csx:link>’s representing the photos. When a slide show changed, the sync engine would download the entire slide show. The advantage of this approach is in data representation. Since these links can be arbitrarily nested, this <item> could also be used to straightforwardly represent metadata associated with the slide show. Also, a single feed could compactly represent many slide shows.
If you expected each photo of the slide show to be updated and changed independently, you should represent each photo as a single <item>. Each item could still have multiple <csx:link>’s (the photo and its thumbnail, for example). If a photo changed, or was added, the sync engine would only download the new or updated photos. In this case, it may also make sense to represent each slide show as a single nested feed.