[Update (March 19): Google turned down the Berkman Center application, which included our projects :( ]

We’ve put in for a couple of Summer of Code projects, in conjunction with the Berkman Center:

1. Syllabus parser. Design, structure and populate an open repository of the information in college syllabi.

  • Assuming we get permission, figure out how to retrieve syllabi from Google. (If we don’t get permission, we have a starter set of 500,000+ syllabi.)
  • Figure out how to parse the multiple and free-form formats syllabi are found in.
  • Design an appropriate and open data model for the information in syllabi.
  • Build a Web site that provides useful end-user and API access to the syllabus data.

2. Scholarly semantic web builder. The aim is to crawl the Google Books corpus looking for useful relationships among scholarly works. Such relationships only begin with citations/footnotes. What other semantic cues can be unearthed to see how scholarly books relate?

  • Research the sorts of relations between books that would be of high value to scholars and researchers, in addition to footnotes.
  • Crawl the Google Books corpus to discover these relations [if Google grants permission].
  • Make these relations accessible in an open way, especially in conjunction with the ShelfLife app that provides community-based wayfaring through Harvard Library’s holdings for scholars and researchers.
  • Create interesting and understandable analytics based on the discovered relationships.
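
To make the data-model step of the first project a little more concrete, here is a minimal sketch of what an open syllabus record might look like. Everything in it is an assumption for illustration: the class names, the field names, and the sample values are hypothetical, and a real model would be shaped by what the parsing work turns up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Reading:
    """One assigned work on a syllabus (hypothetical fields)."""
    title: str
    author: str = ""
    week: Optional[int] = None  # week of the course, if the syllabus says


@dataclass
class Syllabus:
    """One parsed syllabus (hypothetical fields)."""
    course_title: str
    institution: str = ""
    term: str = ""
    instructors: List[str] = field(default_factory=list)
    readings: List[Reading] = field(default_factory=list)
    source_url: str = ""  # where the syllabus was retrieved from

    def to_json(self) -> str:
        # Plain JSON keeps the record easy to expose through an open API.
        return json.dumps(asdict(self), indent=2)


# Made-up sample record, not real data.
example = Syllabus(
    course_title="Introduction to Example Studies",
    institution="Example University",
    term="Spring 2011",
    instructors=["A. Instructor"],
    readings=[Reading(title="An Example Book", author="B. Author", week=1)],
)

print(example.to_json())
```

Keeping the record flat and serializable to JSON is one way to serve both the Web site and the API from the same data, though the free-form nature of syllabi may well call for looser, optional fields like the ones above.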

We’re now waiting to see if the proposals get accepted.

The rest of the Berkman proposals — many fun ones — are here.


Felix Online, the online news of Imperial College in the UK, reports (in an article by Kadhim Shubber) that Deborah Shorley, Director of the Imperial College London Library, is threatening to end the library’s subscriptions to journals published by Elsevier and Wiley Blackwell, two of the major publishers in the UK. Upset with 6% increases in annual subscription fees (well above inflation, and in the face of a growth in profits at Elsevier from £1B to £1.6B from 2005 to 2009), she is demanding a 15% reduction in fees, as well as other concessions.

Says the article: “…if an agreement or an alternative delivery plan is not in place by January 2nd next year, researchers at Imperial and elsewhere will lose access to thousands of journals. But Deborah Shorley is determined to take it to the edge if necessary: ‘I will not blink.’”

As the article mentions, in 2010, after a 300-400% fee increase, the University of California threatened to boycott the Nature Publishing Group, including not engaging in peer review for NPG’s journals. (NPG claims that the rise in fees was due to the reduction of a discount from 88% to 50%. UC disputes this.) In August of 2010, NPG and UC announced “an agreement to work together to address the current licensing challenges as well as the larger issues of sustainability in the scholarly communication process.” [more and more]
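
As a quick check on how a discount change could produce an increase of that size, here is a small sketch of the arithmetic. It assumes that both discounts are taken off the same, unchanged list price, which the reports do not state; the function name is just for illustration.

```python
def percent_increase(old_discount: float, new_discount: float) -> float:
    """Percent rise in the price paid when a discount off a fixed
    list price shrinks from old_discount to new_discount."""
    old_price = 1.0 - old_discount  # fraction of list price paid before
    new_price = 1.0 - new_discount  # fraction of list price paid after
    return (new_price - old_price) / old_price * 100.0


# Going from an 88% discount to a 50% discount means paying 50% of
# list price instead of 12%.
increase = percent_increase(0.88, 0.50)
print(f"{increase:.0f}% increase")
```

Under that assumption the price paid rises from 12% to 50% of list, a bit more than a fourfold jump, which falls inside the 300-400% range reported.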


[Note: As always with posts on this blog, authors speak for themselves. – dw]

HarperCollins has changed its agreement with the main distributor of e-books to libraries: e-books will now become inaccessible after 26 checkouts.

I understand publishers’ desire to limit ebook access so that selling one copy doesn’t serve the needs of the entire world. But think about what this particular DRM bomb does to libraries, one of the longest continuous institutions of civilization. Libraries exist not just to lend books but to guarantee their continuous availability throughout changes in culture and fashion. This new licensing scheme prevents libraries from accomplishing this essential mission.

It’s beyond ironic. Until now, libraries have in fact had to scale back on that mission because there isn’t enough space for all the physical books they’ve acquired over the years. So, they get rid of books that have fallen out of fashion or no longer seem important enough. Now that the digital revolution has so lowered the cost of storage that libraries can at last do far better at this culture-building mission, a major publisher has instituted the nightmare culture-killing license.

So, why do I say that HarperCollins has lost its soul instead of just criticizing it for this action? Because if you cared about books as vehicles of ideas and not just vehicles of commerce, you would have dismissed with contempt an idea that treats them as being as evanescent as chatter on a call-in show.


Wikipedia is looking for volunteers to answer some questions as they try to understand why researchers and experts do and do not contribute to Wikipedia. From the email they’ve sent around:

Wikipedia is increasingly used by university students for “pre-research”, to gain context and explore ideas for course assignments and research projects [1]. Yet many among scientists, academics and other experts are reluctant to contribute to Wikipedia, despite a growing number of calls from the scientific community to join the project [2-3].

The Wikimedia Research Committee [4] has just launched a survey to understand why scientists, academics and other experts do (or do not) contribute to Wikipedia and other collaborative projects, and whether individual motivation aligns with shared perceptions of Wikipedia within expert communities. We hope this may help us identify ways around barriers to expert participation. The survey is anonymous and takes about 20 min to complete. Please help us circulate the link among your colleagues and collaborators:

http://bit.ly/ExpertBarriers

[1] http://chronicle.com/article/article-content/125899/
[2] http://www.jmir.org/2011/1/e14/
[3] http://www.psychologicalscience.org/index.php/members/aps-wikipedia-initiative/
[4] http://meta.wikimedia.org/wiki/Research_Committee


Harvard’s Library Lab has announced the first projects it will be funding. It’s an exciting group, and we’re proud that three of our projects made the list:

  • Library Analytics Toolkit: Tools to enable libraries to understand, analyze, and visualize the patterns of activities, including checkouts, returns, and recent acquisitions, and to do so across multiple libraries.

  • LibraryCloud Server: Build and maintain a web server that makes available to all Harvard library innovators data and metadata gathered from the Harvard libraries.

  • Library Innovation Podcasts: A series of biweekly podcast interviews with library innovators about their projects and ideas. The initial series would consist of 15 podcasts of about 20 minutes each.

We’re very excited about these, and have already begun work on them. In fact, if you have ideas for people to interview for our podcasts, let us know.