From the Pinakes page:
Pinakes is a non-commercial tool the aim of which is to offer a renewed historiographic approach to the classification of the scientific heritage. Thanks to the integration of different types of objects, such as instruments, manuscripts, texts, iconography etc., Pinakes aims at transforming the traditional approach to the primary sources of the history of science into a sort of archeology of scientific knowledge. In order to achieve this ambitious project it was necessary to design a model of database, Pinakes, able to bring different classes of objects and items into one environment.
Pinakes has been conceived as a database capable of hosting different levels of data structuring. On the basis of the choice of the target, the user might be able to manage data from a very specific level to a more general description of the items.
Interesting. (h/t Amanda French)
Jean-Claude Bradley at Useful Chemistry has announced (well, a few weeks ago) that the international chemical company Alfa Aesar has agreed to open source its melting point data. This is important not just because Alfa Aesar is one of the most important sources of that information. It also provides a model that could work outside of chemistry and science.
The data will be useful to the Open Notebook Science solubility project, and because Alfa has agreed to Open Data access, it can be useful far beyond that. In return, the Open Notebook folks cleaned up Alfa’s data, putting it into a clean database format, providing unique IDs (ChemSpiderIDs), and linking back to the Alfa Aesar catalog page.
Open Notebook then merged the cleaned-up data set with several others. The result was a set of 13,436 Open Data melting point values.
They then created a Web tool for exploring the merged dataset.
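As a rough illustration of the merge step described above, here is a minimal Python sketch. It assumes each data set has already been cleaned into records with a shared compound ID, a melting point in degrees Celsius, and a source label. The field names (`id`, `mp_c`, `source`) and the function names are our own placeholders, not the Open Notebook Science project's actual schema or code.

```python
from collections import defaultdict
from statistics import mean


def merge_melting_points(*datasets):
    """Group melting point records from several data sets by compound ID.

    Each record is assumed to be a dict with the hypothetical keys
    "id", "mp_c" (degrees Celsius) and "source".
    """
    merged = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            merged[record["id"]].append((record["mp_c"], record["source"]))
    return dict(merged)


def summarize(merged):
    """Report the mean value, the contributing sources, and the value count per compound."""
    return {
        compound_id: {
            "mean_mp_c": mean(value for value, _ in values),
            "sources": sorted({source for _, source in values}),
            "n": len(values),
        }
        for compound_id, values in merged.items()
    }
```

Keeping every value alongside its source, rather than overwriting on conflict, is one possible design choice: it lets a reader see where two sources disagree about the same compound.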
Why stop with melting points? Why stop with chemistry? Open data for, say, books could lead readers to libraries, publishers, bookstores, courses, other readers…
[Update (March 19): Google turned down the Berkman Center application, which included our projects.]
We’ve put in for a couple of Summer of Code projects, in conjunction with the Berkman Center:
1. Syllabus parser. Design, structure and populate an open repository of the information in college syllabi.
- Assuming we get permission, figure out how to retrieve syllabi from Google. (If we don’t get permission, we have a starter set of 500,000+ syllabi.)
- Figure out how to parse the multiple and free-form formats syllabi are found in.
- Design an appropriate and open data model for the information in syllabi.
- Build a Web site that provides useful end-user and API access to the syllabus data.
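To make the parsing and data-model steps a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Syllabus` fields, the regular expressions, and the idea that a syllabus labels its course, instructor, and readings in plain text. Real syllabi are far messier, so treat this as a starting point rather than a design.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Syllabus:
    """A deliberately tiny, hypothetical data model for one syllabus."""
    course: Optional[str] = None
    instructor: Optional[str] = None
    readings: List[str] = field(default_factory=list)


# "Course: ..." or "Instructor - ..." style lines.
FIELD_RE = re.compile(r"^\s*(course|instructor)\s*[:\-]\s*(.+)$", re.IGNORECASE)
# A line that is only a readings heading, e.g. "Required Readings:".
SECTION_RE = re.compile(r"^\s*(required\s+)?(readings?|texts?)\s*:?\s*$", re.IGNORECASE)
# Bulleted or numbered list items.
BULLET_RE = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.+)$")


def parse_syllabus(text: str) -> Syllabus:
    """Extract labeled fields and a bulleted readings list from free-form text."""
    syllabus = Syllabus()
    in_readings = False
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines do not change state
        field_match = FIELD_RE.match(line)
        if field_match:
            setattr(syllabus, field_match.group(1).lower(), field_match.group(2).strip())
            in_readings = False
            continue
        if SECTION_RE.match(line):
            in_readings = True
            continue
        if in_readings:
            bullet = BULLET_RE.match(line)
            if bullet:
                syllabus.readings.append(bullet.group(1).strip())
            else:
                in_readings = False  # any other line ends the readings section
    return syllabus
```

A dataclass like this serializes easily to JSON, which would be one way to feed the API access mentioned in the last step.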
2. Scholarly semantic web builder. The aim is to crawl the Google Books corpus looking for useful relationships among scholarly works. Such relationships only begin with citations/footnotes. What other semantic cues can be unearthed to see how scholarly books relate?
- Research the sorts of relations between books that would be of high value to scholars and researchers, in addition to footnotes.
- Crawl the Google Books corpus to discover these relations [if Google grants permission].
- Make these relations accessible in an open way, especially in conjunction with the ShelfLife app that provides community-based wayfaring through Harvard Library’s holdings for scholars and researchers.
- Create interesting and understandable analytics based on the discovered relationships.
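One way to picture the output of such a crawl is as a list of typed links between works. The Python sketch below assumes that representation; the `Relation` type, the relation kinds (such as "cites" or "responds_to"), and the work identifiers are all hypothetical, and nothing here reflects how Google Books or ShelfLife actually store data.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """A typed link from one work to another (hypothetical model)."""
    source: str
    kind: str
    target: str


def index_by_target(relations):
    """Map each target work to the works that point at it, grouped by relation kind."""
    index = defaultdict(lambda: defaultdict(set))
    for rel in relations:
        index[rel.target][rel.kind].add(rel.source)
    return index


def most_referenced(relations, kind=None):
    """Count distinct incoming relations per work, optionally for one kind only."""
    unique = dict.fromkeys(relations)  # drop duplicate relations, keep order
    counts = Counter(
        rel.target for rel in unique if kind is None or rel.kind == kind
    )
    return counts.most_common()
```

Simple counts like these are only the most basic analytic; the same edge list could feed richer views, such as which works are frequently discussed together.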
We’re now waiting to see if the proposals get accepted.
The rest of the Berkman proposals — many fun ones — are here.
Felix Online, the online news of Imperial College in the UK, reports (in an article by Kadhim Shubber) that Deborah Shorley, Director of the Imperial College London Library, is threatening to end the library’s subscriptions to journals published by Elsevier and Wiley Blackwell, two of the major publishers in the UK. Upset with 6% increases in annual subscription fees (well above inflation, and in the face of a growth in profits at Elsevier from £1B to £1.6B from 2005 to 2009), she is demanding a 15% reduction in fees, as well as other concessions.
Says the article: “…if an agreement or an alternative delivery plan is not in place by January 2nd next year, researchers at Imperial and elsewhere will lose access to thousands of journals. But Deborah Shorley is determined to take it to the edge if necessary: ‘I will not blink.’”
As the article mentions, in 2010, after a 300-400% fee increase, the University of California threatened to boycott the Nature Publishing Group, including not engaging in peer review for NPG’s journals. (NPG claims that the rise in fees was due to the reduction of a discount from 88% to 50%. UC disputes this.) In August of 2010, NPG and UC announced “an agreement to work together to address the current licensing challenges as well as the larger issues of sustainability in the scholarly communication process.” [more and more]
The Dortmund University Library is releasing its 1.2M catalog records under a public domain Creative Commons 0 license. The data is available for download. Yay!
This is promising. OCLC and Cambridge are experimenting with ways to make bibliographic data openly available. Having a reliable, open set of bibliographic records would encourage the development of innovative applications. Or, put differently, not having a standard way to refer to books and other works has inhibited innovation. The main impediment has been the prohibitions in the licenses for this data. Perhaps this new project indicates a willingness to let an open, public catalog be created.
OCLC Research and Cambridge University collaborate on Open Metadata project – 28 Feb 2011
Library information provider OCLC Research, US, and Cambridge University have announced that both organisations will jointly conduct a six-month, JISC-funded investigation into the value of making collection metadata openly available in a sustainable manner.
The COMET (Cambridge OPen METadata) project will release a sub-set of bibliographic data from Cambridge University Library catalogues as linked data in multiple formats. This activity will test a number of technologies and methodologies for releasing open bibliographic data including XML, RDF, SPARQL, and JSON.
To enhance linking options, records will be enriched using two OCLC Research services to assign FAST (Faceted Application of Subject Terminology) and VIAF (Virtual International Authority File) headings. This will allow for effective information retrieval and semantic interoperability.
Starting in February 2011, COMET will document the availability of metadata for the library’s collections which can be released openly in machine-readable formats and the barriers which prevent other data from being exposed in this way.
It is expected that the project will bring value to the wider community by contributing substantially to the availability of open metadata. Linking to FAST and VIAF headings will demonstrate the potential usefulness of a structured semantic approach to data. The project will also look at the value data enrichment offers for resource discovery.
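To give a feel for what releasing one record "in multiple formats" might look like, here is a minimal Python sketch that writes the same record as N-Triples (a line-based RDF serialization) and as JSON. The record, the `example.org` URIs, and the function names are placeholders we made up; the property URIs use the Dublin Core Terms namespace, which is our assumption and not necessarily the vocabulary COMET will use. The creator URI simply stands in for an authority link of the kind VIAF provides.

```python
import json

# Dublin Core Terms namespace; assumed here for illustration only.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(record_uri, record):
    """Serialize a flat record as N-Triples, one triple per field value."""
    lines = []
    for field_name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item.startswith(("http://", "https://")):
                obj = f"<{item}>"  # treat URLs as links to other resources
            else:
                obj = f'"{escape_literal(item)}"'  # everything else is a plain literal
            lines.append(f"<{record_uri}> <{DCTERMS}{field_name}> {obj} .")
    return "\n".join(lines)


def to_json(record_uri, record):
    """Serialize the same record as JSON, with its URI under an "@id" key."""
    return json.dumps({"@id": record_uri, **record}, indent=2, sort_keys=True)
```

Because both serializations are generated from one in-memory record, adding another output format later would not require touching the source data.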
[Note: As always with posts on this blog, authors speak for themselves. – dw]
HarperCollins has changed its agreement with the main distributor of e-books to libraries: e-books will now become inaccessible after 26 checkouts.
I understand publishers’ desire to limit ebook access so that selling one copy doesn’t serve the needs of the entire world. But think about what this particular DRM bomb does to libraries, one of the longest continuous institutions of civilization. Libraries exist not just to lend books but to guarantee their continuous availability throughout changes in culture and fashion. This new licensing scheme prevents libraries from accomplishing this essential mission.
It’s beyond ironic. Until now, libraries have in fact had to scale back on that mission because there isn’t enough space for all the physical books they’ve acquired over the years. So, they get rid of books that have fallen out of fashion or no longer seem important enough. Now that the digital revolution has so lowered the cost of storage that libraries can at last do far better at this culture-building mission, a major publisher has instituted the nightmare culture-killing license.
So, why do I say that HarperCollins has lost its soul instead of just criticizing it for this action? Because if you cared about books as vehicles of ideas and not just vehicles of commerce, you would have dismissed with contempt an idea that treats them as being as evanescent as chatter on a call-in show.
Wikipedia is looking for volunteers to answer some questions as they try to understand why researchers and experts do and do not contribute to Wikipedia. From the email they’ve sent around:
Wikipedia is increasingly used by university students for “pre-research”, to gain context and explore ideas for course assignments and research projects. Yet many among scientists, academics and other experts are reluctant to contribute to Wikipedia, despite a growing number of calls from the scientific community to join the project [2-3].
The Wikimedia Research Committee has just launched a survey to understand why scientists, academics and other experts do (or do not) contribute to Wikipedia and other collaborative projects, and whether individual motivation aligns with shared perceptions of Wikipedia within expert communities. We hope this may help us identify ways around barriers to expert participation. The survey is anonymous and takes about 20 min to complete. Please help us circulate the link among your colleagues and collaborators:
Harvard’s Library Lab has announced the first projects it will be funding. It’s an exciting group, and we’re proud that three of our projects made the list:
Library Analytics Toolkit: Tools to enable libraries to understand, analyze, and visualize the patterns of activities, including checkouts, returns, and recent acquisitions, and to do so across multiple libraries.
LibraryCloud Server: Build and maintain a web server that makes available to all Harvard library innovators data and metadata gathered from the Harvard libraries.
Library Innovation Podcasts: A series of biweekly podcast interviews with library innovators about their projects and ideas. The initial series would consist of 15 podcasts of about 20 minutes each.
We’re very excited about these, and have already begun work on them. In fact, if you have ideas for people to interview for our podcasts, let us know.