For the last few months I've been reading all my RSS feeds through RSS Bandit. Whenever I come across an interesting blog, podcast or news site, I usually subscribe to the feed. This cuts down on my surfing time at the office (a good thing), but I also see a lot of repetitive information. Every couple of weeks or so I look through my feeds and discard the ones I'm no longer reading regularly.
I currently have 29 feeds, which is probably about average for me. Sites like Lifehacker, Digg, Engadget and Ars Technica are ones that I usually read thoroughly. I'm amazed at how much information is repeated between all these news sources. I find that unique news can fall through the cracks because I'm quickly scrolling past two-thirds of the new feed updates, since they're repeats.
I think sites that use tagging are useful for sorting news by type, but not as good at lumping together similar stories. A more intelligent "auto-tagging" would be useful, but difficult to implement. A simple, but less accurate, way would be to create a tagset for a blog post or other news item based on the words in its title line. If two posts share a certain percentage (let's say 80%) of the words in each other's titles, then they can be considered the same post. I know that sounds like a good way to miss even more unique content, but I'm not sure how to implement something more sophisticated that isn't going to take forever. Starting with a spam-filtering algorithm might help match up posts better by searching the entire blog post rather than just the title.
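Here's a rough sketch of that title-matching idea in Python. This is only an illustration, not something RSS Bandit does: the function names are made up, and I'm assuming that "share 80% of the words" means the overlap is measured against the longer of the two titles, which is the stricter reading and should merge fewer unrelated posts.

```python
import re


def title_tagset(title):
    """Lowercase the title and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


def shared_fraction(title_a, title_b):
    """Shared words divided by the size of the larger tagset (an assumption)."""
    a, b = title_tagset(title_a), title_tagset(title_b)
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def group_similar(titles, threshold=0.8):
    """Greedy grouping: a title joins the first group whose first title it matches."""
    groups = []
    for title in titles:
        for group in groups:
            if shared_fraction(title, group[0]) >= threshold:
                group.append(title)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([title])
    return groups


if __name__ == "__main__":
    # Hypothetical headlines, just to exercise the functions.
    sample = [
        "Apple announces new iPod nano",
        "Apple announces new iPod nano today",
        "Google buys YouTube",
    ]
    for group in group_similar(sample):
        print(group)
```

Comparing only against the first title in each group keeps this quick, but it also means the result depends on the order the posts arrive in. Common words like "the" and "new" count as much as any other word here, so a stop-word list would probably be the first thing to add.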
After the jump, I'll post the contents of my current OPML file (copying and pasting it into a text editor will probably help with the word wrapping).
<code><?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
<head />
<body>
<outline title="Technology">
<outline title="Ajaxian" xmlUrl="http://ajaxian.com/index.xml" htmlUrl="http://ajaxian.com/" description="Cleaning up the web with Ajax" />
<outline title="Journals.ars" xmlUrl="http://arstechnica.com/journals.rssx" htmlUrl="http://arstechnica.com/journals.ars" description="" />
<outline title="Engadget" xmlUrl="http://engadget.com/rss.xml" htmlUrl="http://www.engadget.com/" description="Engadget" />
<outline title="Ars Technica" xmlUrl="http://feeds.feedburner.com/arstechnica/BAaf" htmlUrl="http://arstechnica.com/" description="The PC Enthusiast's Resource" />
<outline title="ClaimID Blog" xmlUrl="http://feeds.feedburner.com/claimID" htmlUrl="http://blog.claimid.com/" description="Manage your online identity." />
<outline title="Official Google Blog" xmlUrl="http://googleblog.blogspot.com/rss.xml" htmlUrl="http://googleblog.blogspot.com/" description="" />
<outline title="Lifehacker" xmlUrl="http://lifehacker.com/index.xml" htmlUrl="http://www.lifehacker.com/" description="RSS Feed for Lifehacker.." />
<outline title="CNET News.com" xmlUrl="http://news.com.com/2547-1_3-0-20.xml?" htmlUrl="http://news.com.com/" description="Tech news and business reports by CNET News.com. Focused on information technology, core topics include computers, hardware, software, networking, and Internet media." />
<outline title="Techbargains.com" xmlUrl="http://techbargains.com//rss.xml" htmlUrl="http://www.techbargains.com/" description="Technology products buying guide - Find the best bargains on the latest products" />
<outline title="The Writely Blog" xmlUrl="http://writely.blogspot.com/atom.xml" htmlUrl="http://writely.blogspot.com/" description="What's up with us." />
<outline title="MacRumors : Mac News and Rumors" xmlUrl="http://www.macrumors.com/macrumors.xml" htmlUrl="http://www.macrumors.com/" description="the mac news you care about" />
<outline title="OSNews" xmlUrl="http://www.osnews.com/files/recent.xml" htmlUrl="http://www.osnews.com/" description="Exploring the Future of Computing" />
<outline title="Woot - One Day, One Deal" xmlUrl="http://www.woot.com/blog/rss.aspx" htmlUrl="http://www.woot.com/Blog/" description="Woot! - One Day, One Deal" />
</outline>
<outline title="CAD">
<outline title="Between the Lines" xmlUrl="http://autodesk.blogs.com/between_the_lines/index.rdf" htmlUrl="http://autodesk.blogs.com/between_the_lines/" description="AutoCAD and Autodesk related blog." />
<outline title="CADDManager.com Blog" xmlUrl="http://caddmanager.com/blog/sitefeeds/CMB-atom.xml" htmlUrl="http://caddmanager.com/blog" description="A CAD Manager's blog. A little less formal than my web site. Sharing thoughts, ideas, ponderings, and other items of note. But always with the focus on getting the job done better, spending less time and less money." />
<outline title="Autodesk | AutoCAD Software News" xmlUrl="http://taylor-tech.com/press/press.rdf" htmlUrl="http://taylor-tech.com/press" description="Autodesk, AutoCAD, CAD software product and industry news - http://taylor-tech.com/" />
<outline title="WorldCAD Access" xmlUrl="http://worldcadaccess.typepad.com/blog/index.rdf" htmlUrl="http://worldcadaccess.typepad.com/blog/" description="Talking about CAD from upFront.eZine" />
<outline title="Cadalyst" xmlUrl="http://www.cadalyst.com/cadalyst/cadalyst.rss" htmlUrl="http://www.cadalyst.com/cadalyst" description="Cadalyst is the most complete source for essential information about integrating CAD and related CAM/CAE/PLM technologies in the key market segments of AEC, MCAD and GIS. The magazine's objective business and technical reporting and product reviews lead high-level corporate managers, CAD managers and CAD users through management and purchasing decisions to realize higher productivity and profits." />
<outline title="Eat Your CAD" xmlUrl="http://www.eatyourcad.com/rss.php/feed.xml" htmlUrl="http://www.eatyourcad.com/" description="Information for CAD managers" />
</outline>
<outline title="News">
<outline title="digg" xmlUrl="http://digg.com/rss/index.xml" htmlUrl="http://digg.com/" description="digg" />
<outline title="CNN.com" xmlUrl="http://rss.cnn.com/rss/cnn_topstories.rss" htmlUrl="http://www.cnn.com/rssclick/?section=cnn_topstories" description="CNN.com delivers up-to-the-minute news and information on the latest top stories, weather, entertainment, politics and more." />
<outline title="ESPN.com" xmlUrl="http://sports.espn.go.com/espn/rss/news" htmlUrl="http://espn.go.com/" description="Latest news from ESPN.com" />
<outline title="NBA.com: Pistons News" xmlUrl="http://www.nba.com/pistons/rss.xml" htmlUrl="http://www.nba.com/" description="" />
</outline>
<outline title="Podcasts">
<outline title="FLOSS Weekly" xmlUrl="http://leoville.tv/podcasts/floss.xml" htmlUrl="http://twit.tv/" description="Every Friday we talk about Free Libre and Open Source Software with the people who are writing it. Part of the TWiT.tv podcast network. " />
<outline title="this WEEK in TECH" xmlUrl="http://leoville.tv/podcasts/twit.xml" htmlUrl="http://thisweekintech.com/" description="Your first podcast of the week is the last word in tech. Join Leo Laporte, Patrick Norton, Kevin Rose, John C. Dvorak, and other tech luminaries in a roundtable discussion of the latest trends in digital technology. Winner of the 2005 People's Choice Podcast Award for best overall podcast and Best Technology Podcast. Released every Sunday at midnight Pacific." />
<outline title="Adam Curry: Daily Source Code" xmlUrl="http://radio.weblogs.com/0001014/categories/dailySourceCode/rss.xml" htmlUrl="http://radio.weblogs.com/0001014/categories/dailySourceCode/" description="Adam Curry's Daily Source Code Podcast dailysourcecode.com" />
<outline title="Revision3 - Diggnation w/Kevin Rose &amp; Alex Albrecht" xmlUrl="http://revision3.com/diggnation/feed/high.mp3.torrent.xml" htmlUrl="http://revision3.com/diggnation" description="Diggnation is a weekly tech/web culture show based on the top digg.com social bookmarking news stories." />
<outline title="this WEEK in MEDIA" xmlUrl="http://thisweekinmedia.libsyn.com/rss" htmlUrl="http://thisweekinmedia.libsyn.com/" description="A weekly discussion of media-related news" />
<outline title="Linux Reality" xmlUrl="http://www.linuxreality.com/feed" htmlUrl="http://www.linuxreality.com/" description="A podcast for the new Linux user" />
</outline>
</body>
</opml></code>