Distributed Proofreaders

Distributed Proofreaders (commonly abbreviated as DP or PGDP) is a project that supports the development of e-texts for Project Gutenberg. Public domain works, typically books whose copyright has expired, are scanned by volunteers or drawn from digitization projects, and the page images are run through optical character recognition (OCR) software. Because OCR is far from perfect, the resulting text often contains many errors. To correct them, individual pages are made available to volunteers through a web-based interface that displays the original page image and the recognized text side by side. This distributes the time-consuming work of error correction among many people, much as distributed computing divides a workload among many machines.

Each page goes through up to three rounds of user proofreading and two rounds of user formatting, after which a "post-processor" combines the pages and prepares the text for uploading to Project Gutenberg. The editing process is similar to that of the Christian Classics Ethereal Library, which predates DP by several years but focuses on the narrower field of Christian texts.
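The round structure described above can be sketched in a few lines of Python. This is only an illustrative model, not DP's actual software: the round labels (`P1` to `P3`, `F1`, `F2`), the `Page` class, and the functions `run_rounds` and `post_process` are all assumptions introduced for the example, and each volunteer's work is stood in for by a plain text-to-text function.

```python
from dataclasses import dataclass, field

# Hypothetical round labels. The article only states that there are up to
# three proofreading rounds followed by two formatting rounds.
PROOFREADING_ROUNDS = ["P1", "P2", "P3"]
FORMATTING_ROUNDS = ["F1", "F2"]


@dataclass
class Page:
    number: int                                   # position of the page in the book
    text: str                                     # starts out as raw OCR output
    history: list = field(default_factory=list)   # rounds the page has been through


def run_rounds(page, edits, proofreading_rounds=3):
    """Pass one page through its proofreading and formatting rounds.

    `edits` maps a round label to a function(text) -> text that stands in
    for a volunteer's work; rounds without an entry leave the text as-is.
    """
    if not 1 <= proofreading_rounds <= len(PROOFREADING_ROUNDS):
        raise ValueError("proofreading_rounds must be between 1 and 3")
    rounds = PROOFREADING_ROUNDS[:proofreading_rounds] + FORMATTING_ROUNDS
    for label in rounds:
        edit = edits.get(label, lambda text: text)
        page.text = edit(page.text)
        page.history.append(label)
    return page


def post_process(pages):
    """Combine finished pages, in page order, into a single text."""
    ordered = sorted(pages, key=lambda p: p.number)
    return "\n".join(p.text for p in ordered)


if __name__ == "__main__":
    # Two toy pages, supplied out of order, with one OCR error ("tbe").
    edits = {
        "P1": lambda t: t.replace("tbe", "the"),  # a proofreader fixes a misread word
        "F1": lambda t: t.strip(),                # a formatter trims stray whitespace
    }
    pages = [Page(2, " on tbe mat. "), Page(1, "the cat sat ")]
    finished = [run_rounds(p, edits, proofreading_rounds=2) for p in pages]
    print(post_process(finished))
```

Because each page carries its own history, pages can be worked on independently and in any order; only the final post-processing step needs all of them at once.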

Distributed Proofreaders was founded by Charles Franks in 2000 as an independent site to assist Project Gutenberg, and became an official Project Gutenberg site in 2002. It posted its 5,000th text to Project Gutenberg in October 2004 and its 10,000th in March 2007. As of March 2007, the more than 10,000 DP-contributed texts made up almost half of the nearly 21,000 works in Project Gutenberg.

Among other projects, Distributed Proofreaders is producing a complete electronic edition of the 1911 Encyclopedia Britannica, whose volumes will be made available on Project Gutenberg as they are finished.

On 31 July 2006, the Distributed Proofreaders Foundation was formed to provide Distributed Proofreaders with its own legal entity and not-for-profit status. IRS approval of section 501(c)(3) status was granted retroactive to 7 April 2006.

DP Europe, hosted by Project Rastko, started in January 2004. The site can process text in Unicode UTF-8 encoding. The books proofread there focus mainly on European culture and include a large proportion of non-English texts, in languages including Hebrew, Arabic, Urdu and many others. As of January 2007, DP Europe had produced 410 books.

On 21 August 2004, DP released its 5,000th e-book, A Short Biographical Dictionary of English Literature.

Besides its custom software supporting the proofreading project, DP also runs a forum and a separate wiki, built on MediaWiki, for project coordination and community building.

DP 10K
On 9 March 2007, Distributed Proofreaders announced that it had completed more than 10,000 titles. To celebrate, a block of 15 titles was published:


 * Slave Narratives, Oklahoma (A Folk History of Slavery in the United States From Interviews with Former Slaves) by the U.S. Work Projects Administration (English)
 * Eighth annual report of the Bureau of ethnology. (1891 N 08 / 1886–1887) edited by John Wesley Powell (English)
 * R. Caldecott's First Collection of Pictures and Songs by Randolph Caldecott [Illustrator] (English)
 * Como atravessei Àfrica (Volume II) by Serpa Pinto (Portuguese)
 * Triplanetary by E. E. "Doc" Smith (English)
 * Heidi by Johanna Spyri (English)
 * Heimatlos by Johanna Spyri (German)
 * October 27, 1920 issue of Punch (English)
 * Sylva, or, A Discourse of Forest-Trees by John Evelyn (English)
 * Encyclopedia of Needlework by Therese de Dillmont (English)
 * The annals of the Cakchiquels by Francisco Ernantez Arana (fl. 1582), translated and edited by Daniel G. Brinton (1837–1899) (English with Central American Indian)
 * The Shanty Book, Part I, Sailor Shanties (1921) by Richard Runciman Terry (1864–1938) (English)
 * Le marchand de Venise by William Shakespeare, translated by François Guizot (French)
 * Agriculture for beginners, Rev. ed. by Charles William Burkett (English)
 * Species Plantarum (Part 1) by Carolus Linnaeus (Carl von Linné) (Latin)