BAT - Blog Analysis Toolkit

Texifter personnel created the Blog Analysis Toolkit (BAT), a Web-based system for capturing, archiving and sharing blog posts. Posts are acquired via RSS feeds and stored in a database, where other researchers can access and share them. More than 480 users have set up BAT accounts since April 2008, and collectively they have archived over 350,000 posts from more than 573 blogs. Users can add individual blogs to the repository or do research using samples from existing collections created by other users.
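
As a rough illustration of the acquisition step described above (posts read from an RSS feed and stored in a database), here is a minimal Python sketch. It is not BAT's actual code: the table layout, the function names open_archive and archive_feed, and the sample feed are all assumptions made for this example, and it uses only the Python standard library with an in-memory SQLite database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
      <guid>http://example.org/first</guid>
      <pubDate>Tue, 01 Apr 2008 12:00:00 GMT</pubDate>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second</link>
      <guid>http://example.org/second</guid>
      <pubDate>Wed, 02 Apr 2008 12:00:00 GMT</pubDate>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def open_archive(path=":memory:"):
    """Open (or create) the archive database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "guid TEXT PRIMARY KEY, blog TEXT, title TEXT, "
        "link TEXT, published TEXT, body TEXT)"
    )
    return conn


def archive_feed(conn, feed_xml):
    """Store every item in the feed; return how many posts were new."""
    channel = ET.fromstring(feed_xml).find("channel")
    blog = channel.findtext("title", default="")
    added = 0
    for item in channel.iter("item"):
        # Fall back to the link when the feed omits a guid.
        guid = item.findtext("guid") or item.findtext("link")
        if not guid:
            continue
        cur = conn.execute(
            # The primary key on guid makes repeated polling idempotent.
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?, ?)",
            (
                guid,
                blog,
                item.findtext("title", default=""),
                item.findtext("link", default=""),
                item.findtext("pubDate", default=""),
                item.findtext("description", default=""),
            ),
        )
        added += cur.rowcount  # 1 if inserted, 0 if already archived
    conn.commit()
    return added
```

Calling archive_feed repeatedly on the same feed adds each post only once, so a scheduler could poll a feed at intervals and accumulate new posts as they appear. Note that this sketch stores only what the feed itself carries in each item's description element, which may be a summary rather than the full post.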

Although BAT is a deployed system, it is only a rudimentary prototype with very limited capabilities. It does not retrieve blog posts from the past, only new posts moving forward in time. Comments left on the blogs, which are potentially very useful for democratic and network theorists, are not currently captured, so BAT misses the community discussion that makes blogs intrinsically and democratically interesting.

For a variety of layout and formatting reasons, BAT cannot currently harvest entire posts from a significant subset of the blogosphere, and it cannot connect important metadata to the text collections. We hope to address these limitations as part of continuing research.