Wednesday, March 11, 12:00-1:15 p.m.
Strozier Library R&D Commons (Ground Level)
Deep Learning, Dirty OCR, and the Humanist’s Ever-Changing Toolkit
Few, if any, humanities projects involving data acquisition or digital imaging can be done without some knowledge of Optical Character Recognition (OCR). And yet OCR is itself a dynamic and changing application. Whether you are interested in data capture, data markup, corpus representativeness, or imaging capability, or are vaguely curious about the actual, social, or political implications of OCR for your teaching and research and for the fate of scholarly and public collections, Digital Scholars’ next meeting will be of interest to you. We are pleased to welcome Dr. Allen Romano, Coordinator of FSU’s M.A. in the Digital Humanities, who will lead us in a hands-on exploration of “Dirty” OCR, a term often used to describe electronic forms or documents whose information has been inaccurately rendered.
This event is open — all disciplinary leanings and technical abilities are welcomed! Participants are invited to read the following in advance:
- Mark J. Hill and Simon Hengchen (2019). “Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study.” Digital Scholarship in the Humanities, Volume 34, Issue 4, Pages 825–843. https://academic.oup.com/dsh/article/34/4/825/5476122
- Ryan Cordell (2017). “‘Q i-jtb the Raven’: Taking Dirty OCR Seriously.” Book History, Volume 20, Pages 188–225, via http://ryancordell.org/research/qijtb-the-raven/
- Ryan Cordell. “Why OCR?” https://ryancordell.org/research/why-ocr/
- Brandon Hawk, Antonia Karaisl, and Nick White (2019). “Modelling Medieval Hands: Practical OCR for Caroline Minuscule.” Digital Humanities Quarterly, Volume 13, Issue 1. http://www.digitalhumanities.org/dhq/vol/13/1/000412/000412.html
Participants are encouraged to bring laptops or tablets.
We hope you can join us.
Post-meeting Resources: Dr. Romano has shared with us the GitHub repository he designed for today. Browse to: https://github.com/allenjromano/dirtyocr/blob/master/dirtyocr.md