Free the Law Wintersession Sprint

USB stick labeled 'all of the caselaw'

TL;DR: We are running a two-week data mining sprint from January 4-15, 2016, open to current Harvard students, based on early access to a brand new data set of American caselaw. To apply, send a resume and a brief statement of interest to jcushman at law dot harvard dot edu.



We recently announced Free the Law, our project to scan every legal decision ever published in the United States. We’re generating the first consistent, comprehensive, and open database of American law, from the colonial era right up to 2015. You can read the New York Times coverage of the project here.

By the end of this project we’ll have millions of cases in the dataset — no one knows exactly how many. We’re scanning and processing tens of thousands of pages a day, and will soon have entire states completed.

Now it’s time to start exploring what to do with all that data. What new questions can we ask with millions of cases?

The possible questions cross every discipline at Harvard:

  • Can a spam filter be retrained to guess which torts cases make the most interesting stories?
  • How much money are we willing to fight over — and does the answer offer an alternate inflation index?
  • How has the use of Latin in the law changed over time — are judges writing more or less like regular people?
  • How has defendants’ choice of murder weapon changed? The gender balance of litigants? The reliance on scientific evidence?
  • Can we trace a family’s history through the cases they were involved in?

Caselaw is the historical record of applied moral philosophy under the law. Unlocking its secrets will have an incredible impact on scholarship of all kinds.

The Challenge

Hence our challenge: pick a question you think caselaw might help you answer, perhaps drawn from one of your classes. Build a tool to help answer it – whether that means loading up your favorite ML library, configuring an off-the-shelf statistical tool, or writing code from scratch. In a two-week sprint, do your best to answer the question, and to generalize your tool to help other researchers answer similar questions. We’ll help share the discoveries you make and the tools you build.

The Data

The data set we will share with participants will include a single state’s complete published caselaw. The data includes: (1) TIFF and JPEG2000 images for each scanned page; (2) ALTO XML files for each scanned page; and (3) structured XML files for each case.
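For a sense of what working with the page-level files might look like, here is a minimal Python sketch that pulls plain text out of one ALTO XML page. It relies only on the public ALTO convention that each recognized word is a String element with its text in a CONTENT attribute, grouped under TextLine elements. The function names and the sample page are hypothetical, written for illustration and not taken from the data set, and the files participants receive may need extra handling (hyphenation, for example, is ignored here).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_page_text(alto_xml):
    """Return the OCR text of one ALTO page, one output line per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each recognized word is a String element; its text is in CONTENT.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


# Hand-written sample page (an assumption for illustration, not real data).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="The"/><SP/><String CONTENT="court"/></TextLine>
    <TextLine><String CONTENT="affirmed."/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_page_text(SAMPLE))
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice: ALTO namespaces differ between schema versions, and the announcement does not say which version the files use.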


December 2015: Application period.

January 4, 2016: Delivery of data set to participants.

January 4, 6, 8, 11, 13, 15: The group will check in three times a week, either in person or remotely, to share notes, progress updates, and requests for help.

Week of January 18: Demo day (date TBD).

To Apply

Send your resume and a brief statement of interest (such as a general idea of what sort of project you would like to work on) to jcushman at law dot harvard dot edu. If you would like to work with others, feel free to apply as a group.

Link roundup November 30, 2015

This is the good stuff.

The Irony of Writing About Digital Preservation

The Original Mobile App Was Made of Paper | Motherboard

The most Geo-tagged Place on Earth

The Illustrated Interview: Richard Branson

Why is so much of design school a waste of time?

Link roundup November 16, 2015

This is the good stuff.


Rebellious Group Splices Fruit-Bearing Branches Onto Urban Trees | Mental Floss

Idea Sex: How New Yorker Cartoonists Generate 500 Ideas a Week – 99u

Google Cardboard’s New York Times Experiment Just Hooked a Generation on VR

Link roundup November 2, 2015

This is the good stuff.


French Vending Machines Dispense Short Stories Instead Of Snacks | Mental Floss

Swiss Style Color Picker

Chicago Ideas Week

Link roundup October 21, 2015

A little late to the party, but happy to be using the taco emoji!

The Internet’s Dark Ages

An Error Leads to a New Way to Draw, and Erase, Computing Circuits

Searching the world for original Pizza Hut buildings

Will digital books ever replace print?

Hiring! Devops energy wanted.

The Harvard Library Innovation Lab is looking for a devops engineer to help us build tools to explore the open internet and see deep into the future of libraries.

Our projects range in scope from fast-moving prototypes to long-term innovations. The best way to get a feel for what we do is by looking at some of our current efforts.


Perma.cc, a web archiving service that is powered by libraries


H2O, a platform for creating, sharing and adapting open course materials


Awesome Box, an alternate returns box used by hundreds of libraries


What you’ll do

Own the production infrastructure that ensures Lab applications are responding quickly to people and bots on the internet

Write code that will monitor systems and develop logic that will automate common deployment and maintenance tasks

Act as a core member of our fun and dynamic team by helping us shape ideas and efforts in libraries, technology, and law. We’re freewheelin’. We fully encourage the pursuit of interests and opportunities.


We’re hiring a person and not a skillset, but our current stack of keywords might be helpful:

Heroku, AWS, S3, Python, Django, Fabric, git and GitHub, Ruby, Rails, MySQL, PostgreSQL, Apache, NGINX, Elasticsearch, Redis, UNIX, Bash, Rollbar, Splunk


Find details and apply using the Harvard Recruitment Management System. If you have questions, email us directly at .

Link roundup October 1, 2015

Homogeneously contributed

Why Preserving Old Computer Games is Surprisingly Difficult | Mental Floss

Get Peanutized | Turn Yourself into a Peanuts Character

This Camera Refuses to Take Pictures of Over-Photographed Locations | Mental Floss

Link roundup September 11, 2015


The New York Times wrestled with many dimensions of video to visualize the making of a hit » Nieman Journalism Lab

A Roving ‘Batmobile’ Is Helping Map Alaska’s Bats

This Tokyo Book Store Only Carries One Book at a Time | Mental Floss

Backpack Makers Rethink a Student Staple

Link roundup September 3, 2015

Goodbye summer

You can now buy Star Wars’ adorable BB-8 droid and let it patrol your home | The Verge

World Airports Voronoi

Stephen Colbert on Making The Late Show His Own | GQ

See What Happens When Competing Brands Swap Colors | Mental Floss

The Website MLB Couldn’t Buy

Link roundup August 30, 2015

This is the good stuff.

Rethinking Work

Putting Elon Musk and Steve Jobs on a Pedestal Misrepresents How Innovation Happens

Lamp Shows | HAIKU SALUT

Lawn Order | 99% Invisible