Posted: Mon, October 13, 2003
Website syndication with RSS
By Dafydd Rees
What is RSS and how does it help?
Consuming RSS using AmphetaDesk
Registering new feeds with AmphetaDesk
Tips for publishers creating RSS feeds
It can be difficult staying current when you have to read many different websites and mailing lists. Most community-based websites use mailing lists, but thanks to the ever-rising tide of spam, people are
becoming increasingly reluctant to disclose their e-mail addresses.
And though on the one hand, the information that you can glean from websites and mailing lists can be quite valuable, you probably don't have the time to check many websites regularly and manage
large numbers of mailing list subscriptions. If this sounds familiar, using content aggregating software based on RSS technology may be just what you're looking for.
In simple terms, an RSS feed is just a file on a website that people can download to get a summary of all changes to published content and the corresponding links. (RSS stands for 'Rich Site Summary'
or 'Really Simple Syndication'.) Rather than browse lots of sites every day, people use RSS aggregator software that regularly downloads these files in the background. The aggregator software then
weaves all this content together - usually into a single, locally held web page. Readers can then get daily summaries of all the changes that occur on multiple sites from this page. This means that you
can keep up with lots of websites without subscribing to vast numbers of mailing lists, and without divulging your e-mail address to many, potentially dubious mailing lists.
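To make this concrete, here is a minimal Python sketch of what an aggregator does with a single feed once it has been downloaded. The sample feed, its example.com URLs and the function name `summarise` are invented for illustration; they are not taken from any real site or from AmphetaDesk itself.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 0.91-style feed; a real aggregator would download this file.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/stories/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/stories/2</link>
    </item>
  </channel>
</rss>
"""


def summarise(feed_xml):
    """Return the channel title and a list of (title, link) pairs."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


if __name__ == "__main__":
    channel_title, items = summarise(SAMPLE_FEED)
    print(channel_title)
    for title, link in items:
        print(" -", title, link)
```

An aggregator repeats this for every registered feed and merges the results into one page of headlines and links.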
Some readers prefer RSS feeds to mailing lists. Subscription and unsubscription are merely a matter of telling your aggregator which feeds you wish to read. This is also good for the web
publisher because there's no need to maintain a web-enabled subscription list, nor provide facilities for recovering forgotten subscription passwords. This could also mean less Data Protection Act
compliance work.
With RSS, readers who aren't prepared to divulge their e-mail addresses can receive content that would otherwise require mailing list subscription. The originating website plays a purely passive role by
publishing the RSS. When a reader decides to unsubscribe, there's nothing the website publishers can do to stop it. To be successful, RSS publishers have to concentrate on providing compelling,
relevant content rather than trying to use technology to force readers to accept content through 'push technology warfare'.
There are many RSS aggregator programs around. In this article, we'll discuss AmphetaDesk - an open source aggregator available for Linux, Macintosh and Windows. You can download
AmphetaDesk from
http://www.disobey.com/amphetadesk.
Windows users need to download a .zip file. At present, AmphetaDesk doesn't have a standard, Windows-based installation program, so you have to copy out the folder present in the .zip file and
move it to a convenient location, say inside your 'My Documents' folder. If you're using Microsoft Windows XP, you can do this by double clicking on the .zip file which will display the contents in a new
window. Copy the folder shown in this window into 'My Documents'. Users of other versions of Windows will have to use an archiver that can unpack .zip files, such as WinZip.
When you run AmphetaDesk, it's going to try to update all the pre-set feeds registered. For this reason, you should run the program after connecting to the Internet. You run AmphetaDesk by
double-clicking on the file with the 'pill' icon in the folder you've copied into My Documents. When you do this, the aggregator starts up. The following window should appear:
Figure 1: The AmphetaDesk Application Window
Don't close this window, because that would shut down the program; instead, you should minimise this window. You should see the pill icon in the system tray. After a minute or so, your web browser
should pop up with a web page that looks like this:
Figure 2: The main web page
Figure 2 shows the main AmphetaDesk web page generated and served locally from a running copy of AmphetaDesk. Individual news channels are shown as tables, in which each row is a separate
news item within the feed.
Clicking on the title of each news item follows a link to the complete article on the originating web site.
Feeds are shown in descending order of freshness, so that new material is read first. This means that it's easy to skim-read large numbers of feeds, focusing only on what's changed and jumping directly
to articles of interest.
By default, AmphetaDesk polls each RSS feed website every three hours for updates (although you can change this by using the page named 'My Settings'). This is fine if you're prepared to leave
AmphetaDesk running on your 'always-on' broadband or corporate LAN-connected computer. However, if you're using a dial-up link, you're probably going to rely on the fact that AmphetaDesk checks
for updates every time it is launched, so you simply start the program after dialling and wait for the main AmphetaDesk web page to appear.
AmphetaDesk can also be forced to check for new feeds, to open a new browser window on the main page, and shut down using the menu displayed by right-clicking on the pill icon in the system tray,
as shown in Figure 3.
Figure 3: Right-clicking on the pill icon in the system tray presents the three main options.
By clicking on 'Add a Channel' in the main window (Figure 2), you can manually enter the address of an RSS feed. Some websites provide an easier way of registering feeds with AmphetaDesk. For
example, if you go to the IT Wales RSS feed page at
http://www.itwales.com/rss, you will see rows of buttons. Each row corresponds with an RSS feed. Each feed has three buttons as shown in Figure 4,
below. The button depicting the AmphetaDesk pill allows 'one-click' registration with AmphetaDesk. The button marked 'XML' is a direct link to the RSS file itself. The other button, depicting a mug, is for
'one-click' registration of an RSS feed with the Radio Userland blogging system.
Figure 4: RSS feed registration buttons as they would appear on a website. The IT Wales
subscription buttons are at
http://www.itwales.com/rss.
Which RSS format?
There are now many different RSS file formats. Every format is a separate XML dialect, although the complexity of formats varies considerably, from simple formats like RSS 0.91 to formats based on
RDF which provide advanced metadata. Personally, I'd prefer to get started with one of the older and simpler formats like RSS 0.91 because there is more software support for it and because there's
probably more business value in providing basic syndication data quickly to a large audience than investing time and effort to provide very sophisticated features that only a few people will
appreciate.
RSS is about summaries
Some feed publishers include multiple paragraphs, complex formatting and even entire articles within an RSS feed. Big, fat RSS feeds don't help the reader and can even be seen as arrogant or
anti-social. Although your feed might be important to you, it is only one of many from the viewpoint of the reader. It's unlikely that most people need to publish more than a title, publication date and a link
for each article. Publishing a minimal RSS feed means that it downloads quickly and works even with the most basic aggregation software.
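As a rough illustration of how small such a feed can be, the following Python sketch writes out just a title, link and publication date per article. The function name `build_feed` and the example.com data are hypothetical. One assumption to note: a per-item `pubDate` element belongs to RSS 2.0, so this sketch declares version 2.0; RSS 0.91 items carry only a title, link and optional description.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(site_title, site_link, site_description, articles):
    """Build a minimal feed: one title, link and date per article.

    articles is a list of (title, link, published) tuples, where
    published is a timezone-aware datetime.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = site_description
    for title, link, published in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # RSS dates use the e-mail (RFC 822) date style.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "Example Site",
        "http://www.example.com/",
        "Hypothetical site used for illustration",
        [("A new article", "http://www.example.com/articles/1",
          datetime(2003, 10, 13, 9, 0, tzinfo=timezone.utc))],
    ))
```

Nothing else is needed for a basic aggregator to list the article and link back to it.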
Which URL?
Give some consideration to the URL where you choose to publish the files. Changing the URL at which an RSS feed has been published can be a disaster because users will need to register the new
URL in their aggregator. It's better to publish your feed at a short, simple and memorable URL, and make a long-term commitment to support it at that address. Of course, many web servers, especially
Java servlet containers such as Jakarta Tomcat, support URL mapping so that even if the programs and the files behind the website change, the published URLs can remain
unchanged.
Managing the bandwidth
Publishing RSS feeds could mean encouraging large numbers of people to download a file from your website. The HTTP protocol provides a means of asking a web server whether a file has changed
since a particular date. Web servers supporting this only send the file if it has changed since the date supplied, allowing great savings on bandwidth. If you are concerned about bandwidth issues you
might want to read about some other people's experiences at
http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers.
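The mechanism described here is the HTTP 'conditional GET'. As a sketch of the client side, the Python function below sends the `If-Modified-Since` header and treats a `304 Not Modified` reply as 'nothing new'. The function name `fetch_if_changed` is invented for this example, and it assumes the server supplies a `Last-Modified` header that the client saves between polls.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, last_modified=None):
    """Download url unless the server reports that it is unchanged.

    last_modified is the Last-Modified header value saved from the
    previous download, or None on the first request.
    Returns (body, last_modified); body is None when the server
    answers 304 Not Modified.
    """
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Unchanged: no body was sent, so keep the saved date.
            return None, last_modified
        raise
```

On a 304 reply the server sends headers only, so a feed that rarely changes costs very little bandwidth however often it is polled.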
Free publicity
There are websites where you can register your RSS feed. This allows third-party websites to provide search and syndication services on your content. For example, you can register your RSS feed at
http://www.oreillynet.com/meerkat/ and
http://www.syndic8.com.
Hit tracking
Links back to the BBC News website from its RSS feeds don't point directly to the articles. Links in its RSS are special URLs that include the type of RSS feed as well as the identity of the article to be
retrieved. It's easy to imagine using HTTP redirects or internal server forwarding to return the same article for a different URL. Web publishers can use this trick to differentiate between ordinary hits and
those originating in RSS feeds. The BBC news feeds can be found at
http://www.bbc.co.uk/syndication.
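The Python sketch below shows one way a publisher might implement this kind of tracking. It is not the BBC's actual scheme: the example.com site, the `/go` path, the query parameter names and both function names are assumptions made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical site layout, used only for illustration.
SITE = "http://www.example.com"


def feed_link(article_id, feed_name):
    """URL to publish in the RSS feed instead of the article's direct URL."""
    query = urlencode({"article": article_id, "feed": feed_name})
    return SITE + "/go?" + query


def resolve(feed_url, hit_counts):
    """Count the hit against its feed and return the URL to redirect to."""
    query = parse_qs(urlparse(feed_url).query)
    article_id = query["article"][0]
    feed_name = query["feed"][0]
    hit_counts[feed_name] = hit_counts.get(feed_name, 0) + 1
    return SITE + "/articles/" + article_id


if __name__ == "__main__":
    counts = {}
    link = feed_link("1234", "front-page")
    print(link)
    print(resolve(link, counts), counts)
```

In a real deployment, `resolve` would sit behind the web server and answer with an HTTP redirect or an internal forward to the article.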
What else can RSS do?
RSS provides a way of monitoring anything non-confidential that can be reached from an internet or intranet-connected machine and that moves through a series of discrete changes over a period of
hours, days or months.
It's easy to write programs that index data either in the file system or in a database and format it as RSS. RSS feeds don't have to be about website changes. I have a patch for CruiseControl, a piece
of software that continually builds and tests software. By patching the web-based reporting, I was able to create a feed that highlights changes in a software repository. This feed can be used in an
aggregator just like any other.
Many 'on-the-fly' conversion programs exist that simply re-arrange content fetched live from third-party websites. Perhaps the best known of these is the 'Bill Gates Wealth Clock' (http://philip.greenspun.com/WealthClock). As an experiment I've written an on-the-fly converter that maps a discussion
forum 'recent changes' page into RSS. Of course, this isn't as reliable as patching the discussion forum software to create the feed directly, but it does provide a workable stopgap measure until the
original site can be upgraded and it does demonstrate that you can manufacture RSS feeds even for websites that don't support them directly.
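A converter of this kind can be quite short. The Python sketch below is not the converter described above; it is a simplified illustration that turns every link on a page into an RSS 0.91-style item. The names `LinkCollector` and `page_to_rss` are hypothetical, and a real converter would need to pick out only the links that represent forum postings.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def page_to_rss(html, page_url, feed_title):
    """Convert the links found in an HTML page into a simple RSS document."""
    collector = LinkCollector()
    collector.feed(html)
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Generated from " + page_url
    for text, href in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        # Relative links are resolved against the page's own address.
        ET.SubElement(item, "link").text = urljoin(page_url, href)
    return ET.tostring(rss, encoding="unicode")
```

Because the output depends entirely on the layout of the source page, a converter like this breaks whenever that layout changes, which is why it is only a stopgap.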
It isn't difficult to find new, useful applications for RSS.
Conclusion
The RSS feed concept was first introduced in 1997 as a feature of some of the early blogging systems. Since then, it's gone from an important part of the blogging culture, to a way of increasing the value
of large portals, and now it's becoming a simple, spam-free alternative to mailing lists. Whilst the fact that large technology companies like Oracle, Sun and Microsoft publish RSS news feeds lends
credibility, technical people will always tend to take an interest in technology for its own sake. Perhaps the real question is whether this technology will gain a large following in non-computer-related
subjects. Adoption by institutions like the BBC, and by sites like Blogs at Harvard Law (http://blogs.law.harvard.edu/) can only be seen as an
encouraging sign.
About the Author
Dafydd Rees is a software developer specializing in object technology. He welcomes feedback on this article at
http://www.dafydd.net/feedback.html.