Posted on Saturday, April 05, 2003 @ 07:40:13 CST
UPDATED: The original Perl script had formatting errors. The corrected file is now in place and ready for download.
There have been a lot of requests for an RSS feed from CNN.com. Prior to September 11, 2001, CNN maintained such a feed that people could subscribe to. For whatever reason, they withdrew it after that date. Now, thanks to Dave Jacoby (here is his site), the feed is back.
The file is a Perl script that pulls the headlines from each of the major CNN.com sections (CNN's home page, sports, business, entertainment, etc.) and writes them out as XML, one XML file per section.
So, what do you need to run this script? Here's a quick rundown:
- Perl (I have Perl 5.8.0 installed, but it will probably work with older versions)
- Perl's HTML Parser
- Perl's XML Parser
- Perl's RSS Tool, for XML
- libwww-perl
All of these modules can be found at CPAN.org.
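If you are not sure which of these are already installed, a quick check along the following lines may help. This is only a sketch: the module names (HTML::Parser, XML::Parser, XML::RSS, and LWP::UserAgent from libwww-perl) are my assumption about what the list above maps to on CPAN, and the missing_modules helper is a hypothetical name, not part of Dave's script.

```perl
use strict;
use warnings;

# Assumed mapping of the requirements list to CPAN module names.
my @modules = qw(HTML::Parser XML::Parser XML::RSS LWP::UserAgent);

# Hypothetical helper: returns the names of modules that fail to load.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Some::Module into Some/Module.pm so require can find it.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(@modules);
print @missing ? "Missing: @missing\n" : "All modules found.\n";
```

Anything reported as missing is what you would then fetch from CPAN.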
Once you've got the modules installed, download the Perl script here. You will have to make one change to the file, on line 69. This line reads:
open FILE , ">/path/to/your/web/root/CNN_subdir/$file" ;
Change it to reflect the absolute path to your web root (and, if you want, a CNN directory within that directory).
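If the script runs but no XML files show up, the open is the first place to look, since the line as written does not report a failure. Here is a minimal sketch of the same step with an error check, assuming you are comfortable editing the script a little further. The write_feed helper and the $out_dir variable are hypothetical names of mine, not something in the original script, and the path is still the placeholder from above.

```perl
use strict;
use warnings;
use File::Spec;

# Placeholder values: replace with your own web root and section file.
my $out_dir = '/path/to/your/web/root/CNN_subdir';
my $file    = 'CNN_TOP_STORIES.xml';

# Hypothetical helper: writes $content to $dir/$name and dies with the
# system error message if the file cannot be opened or closed.
sub write_feed {
    my ($dir, $name, $content) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Only attempt the write when the target directory actually exists.
if (-d $out_dir) {
    write_feed($out_dir, $file, "<rss/>\n");
} else {
    warn "Output directory $out_dir does not exist\n";
}
```

With a check like this, a wrong path or a permissions problem on the web root produces a message instead of failing silently.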
Now, go to your admin section and add a new block. Title it whatever you want, and point the RSS/RDF file URL at the XML file that you want to show up in the block (I use http://localhost/CNN/CNN_TOP_STORIES.xml). I set my update time to 1/2 hour; my cron job runs every 15 minutes, so the block always picks up recent headlines.
That's it. Just run "perl cnn_headlines.pl" from the command line and you're set. You can add this as a cron job to update the headlines however often you wish.