Friday, February 20, 2009

Using reCaptcha for Digitization

The Christian Science Monitor has a feature today on the use of reCaptcha technology for book digitization. I've mentioned this before, but this is the best summary I've seen of the project. It takes words from scanned books that OCR programs have difficulty reading and places them in those little "type these two words" boxes on secure sites.

"Web users now provide about 3,000 man-hours a day of free labor in 10-second bursts of human computation, correcting more than 10 million words every day. ReCaptchas have solved 5 billion words in less than two years. Most people aren’t even aware that their brain power is being harnessed, although every reCaptcha includes a button that users can click to explain the program."

The verified words are used in the Internet Archive's text versions of scanned books.
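
For the curious, here is a rough sketch of how a word might get "verified" in a scheme like this. It assumes the commonly described setup where one of the two words is a control word whose answer is already known and the other is the word OCR could not read. The class name, the word IDs, and the agreement threshold of 3 are all my own inventions for illustration; the article does not give these details, and this is not the project's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical: how many matching human answers we want before trusting a word.
AGREEMENT_THRESHOLD = 3


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


class WordPool:
    """Collects human guesses for words that OCR could not read."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown word id -> guess counts
        self.verified = {}                 # unknown word id -> accepted text

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes the check (control word is correct).

        The guess for the unknown word only counts as a vote when the
        control word was typed correctly.
        """
        if normalize(control_answer) != normalize(control_truth):
            return False  # failed the known word, so discard the other guess
        if unknown_id not in self.verified:
            guess = normalize(unknown_answer)
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= AGREEMENT_THRESHOLD:
                self.verified[unknown_id] = guess
        return True


pool = WordPool()
for _ in range(AGREEMENT_THRESHOLD):
    pool.submit("morning", "morning", "scan-17", "upon")
print(pool.verified)
```

The point of the control word is that a visitor who gets it right is probably a person reading carefully, so their answer for the second word is worth counting.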