DPLA Beta Sprint Finalists

For those interested in comparing the demos of the 6 DPLA Beta Sprint finalists (to be presented Oct. 21 in Washington, DC):

DPLA announcement of 6 finalists

Links to the demos:

  1. Digital Collaboration for America’s National Collections
  2. DLF/DCC: DPLA Beta Sprint
  3. extraMUROS
  4. Government Publications: Enhanced Access and Discovery through
    Open Linked Data and Crowdsourcing
  5. Metadata Interoperability Services
  6. ShelfLife and LibraryCloud

Weekly Roundup: recent LIL happenings

Snippets of recent happenings in the Lab:

Annie Cain:

While in the middle of working on some CSS3 transition effects, I just happened to see prefixfree mentioned on Hacker News.

Instead of writing this in my stylesheet

-webkit-transition: margin .15s;
-moz-transition: margin .15s;
-o-transition: margin .15s;
transition: margin .15s;

I just wrote this

transition: margin .15s;

Paul Deschner:

How do you find the leading legal cases cited in law review journals throughout their publishing history? This is the goal of an exploratory project now being set up by visiting scholar Richard Leiter in collaboration with the Innovation Lab. The hope is to compile a list of the most frequently cited cases and, depending on what is discovered, possibly facet these results by subject, law-review clusters, etc.

Our initial approach: set up a scripted parser to inspect plain-text OCR from sample law journal volumes (generously made available to us by Hein Online). From these initial results, using the most basic pattern-matching, we identify case-citation passages, which in turn allow us to further refine the parser. Checking against the associated PDFs lets us determine the degree to which we're successfully capturing citations and spot new patterns to fold into the parser refinement work. Additional parsing will be necessary to handle initial vs. subsequent case-citation formats, in-text vs. footnoted references, article tagging, and textually non-standard citation locations (such as page-spanning citations). We hope the lessons learned here will scale to examining general corpora of OCR texts for citation data.
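The "most basic pattern-matching" step might look something like this minimal Python sketch. The regex, the sample text, and the function names are illustrative assumptions, not the project's actual parser; real citation formats vary far more, which is exactly why the refinement loop described above is needed:

```python
import re
from collections import Counter

# One illustrative citation shape: "Party v. Party, <vol> <reporter> <page>".
# Real Bluebook citations are much more varied; a real parser would grow
# many more patterns through the PDF-checking loop described above.
CITATION_RE = re.compile(
    r"([A-Z][A-Za-z.'\- ]+ v\. [A-Z][A-Za-z.'\- ]+),?\s+"   # case name
    r"(\d{1,4})\s+(U\.S\.|F\.2d|F\.3d|S\. Ct\.)\s+(\d{1,5})" # vol, reporter, page
)

def extract_citations(ocr_text):
    """Return (case name, volume, reporter, first page) tuples found in text."""
    return [m.groups() for m in CITATION_RE.finditer(ocr_text)]

def most_cited(texts):
    """Tally case names across a corpus of plain-text OCR volumes."""
    return Counter(name for t in texts for name, *_ in extract_citations(t))

sample = ("For school desegregation, see Brown v. Board of Education, "
          "347 U.S. 483 (1954), and later cases.")
```

Tallying `extract_citations` output across many volumes is one way to approach the "most frequently cited cases" list, though OCR noise and subsequent-citation short forms would defeat a pattern this simple.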

Matt Phillips:

A couple of weeks ago I mashed up a, er, mashup: Find books in LibraryCloud that are related to news items coming off the New York Times Newswire.

Give it a try.

The app is about as crude as it can get. Book search works by keyword-matching NYT topics (each NYT piece gets a topic) against LCSH, and this crude matching is done in the crudest way. I think we can get much, much better matching with some more work: if we create links from DBpedia topics to LCSH, we can get really good semantic matching.
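Crude keyword matching of that sort might be sketched like this. The heading list and function names here are made-up illustrations, not the app's actual code or LibraryCloud data:

```python
# A tiny made-up sample of LCSH strings, standing in for real catalog data.
LCSH_SAMPLE = [
    "Space flight",
    "Libraries -- United States",
    "Climate change -- Government policy",
]

def tokens(s):
    """Lowercased word set, ignoring LCSH's ' -- ' subdivision separator."""
    return {w for w in s.lower().split() if w != "--"}

def match_headings(topic, headings=LCSH_SAMPLE):
    """Rank subject headings by how many keywords they share with a topic."""
    t = tokens(topic)
    scored = sorted(((len(t & tokens(h)), h) for h in headings), reverse=True)
    return [h for score, h in scored if score > 0]
```

Linking NYT topics to LCSH through DBpedia, as suggested above, would replace this bag-of-words overlap with actual semantic relations between concepts.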

David Weinberger:

I was very pleased that Dan Brickley this week blogged about the work he's been doing with the Lab on trying to figure out how to slot Web content into established library categories: how can a system automatically figure out that, say, a TED Talk about space travel ought to be clustered with the right Library of Congress Subject Headings? This is a phenomenally difficult problem because Web content can have very little metadata. Dan has been exploring linked open data spaces, as well as some open source semantic extraction tools, to see if it can be done. We've been working with him all summer on this (which often means watching in amazement as he does his wizardry), and he reports that he's actually making some progress on this deep problem.

Kim Dulin:

(Kim’s away at the Mobility Shifts conference in NYC, showing off ShelfLife and talking about libraries, education, and other little topics.)

Jeff Goldenson:

No need for silly text; check out this video, part of a pitch to the Harvard Library Lab fund:

Living Library from Harvard Library Innovation Lab on Vimeo.

Dan Brickley’s Taxonomy of Everything

We’ve been working with the brilliant Dan Brickley all summer (he’s very modest, so now I’ve embarrassed him) trying to figure out how to use all available metadata to slot Web content into library categorization schemes automagically. For example, if we include in our collection — or, more to the point, if the DPLA includes in its collection — library-worthy material such as TED talks, is there a way in which we could automatically categorize those talks within the general mix of library items? We’d like to be able to do this at scale, even if roughly.

No one knows linked open data better than Dan (there, I’ve embarrassed him again!), and he’s been experimenting with all sorts of metadata and connections. Today he posted about what he’s been up to. It’s pretty damn fascinating, and our team has learned a ton working with him on this all summer. We’re looking forward to more!

Library Future.0

This is a montage-y video of snippets from various library folk (including users) here at Harvard addressing aspects of the library’s present and future. We put it together as the opener at the first in a year of public conversations about the future of libraries.

Library Lab/The Podcast 008: The Molecule of Data

Listen: 20:46
(Also in ogg)

How can libraries use the power of metadata — those little molecules of information that help describe the greater work — to help users get more out of their search for resources?

Karen Coyle, herself a librarian, has spent decades helping to build an understanding of the incredible new powers unleashed by the digitization of libraries. She spoke with David Weinberger for this week's Library Lab/The Podcast.

===========
Subscribe to the RSS of the LibraryLab podcast here to stay updated on upcoming episodes!

Subscribe to us in iTunesU

Creative Commons music courtesy of Brad Sucks and photos courtesy of alapublishing and rykneethling.