Beanstalk was created in response to a major challenge for digital libraries: full-text searching of digitized material is significantly hampered by poor output from Optical Character Recognition (OCR) software. When first scanned, the pages of digitized books and journals are merely image files, making the pages unsearchable and virtually unusable. While OCR converts page images to searchable, machine-encoded text, historic literature is difficult for OCR to render accurately because its fonts, typesetting, and layouts vary widely.
The game was developed as part of the Purposeful Gaming and BHL project, which sought to demonstrate whether digital games are a successful tool for analyzing and improving digital outputs from OCR. Players are presented with phrases from scanned pages in the BHL corpus. During the project, the words players typed were verified and then sent to BHL to help improve OCR quality.
While the Purposeful Gaming project has ended and BHL is no longer using gaming outputs, the games developed for this project demonstrated that human verification of texts can succeed where machines fail.
Though the project period is over, Beanstalk will remain live for your continued enjoyment.