OCR and Data Cleaning

Wednesday, March 11, 12:00-1:15 p.m.
Strozier Library R&D Commons (Ground Level)

Deep Learning, Dirty OCR, and the Humanist’s Ever-Changing Toolkit

Few, if any, humanities projects involving data acquisition or digital imaging can be done without some knowledge of Optical Character Recognition (OCR). And yet OCR is itself a dynamic and changing application. Whether you are interested in data capture, data markup, corpus representativeness, or imaging capability, or are vaguely curious about the actual, social, or political implications of OCR for your teaching and research and for the fate of scholarly and public collections, Digital Scholars’ next meeting will be of interest to you. We are pleased to welcome Dr. Allen Romano, Coordinator of FSU’s M.A. in the Digital Humanities, who will lead us in a hands-on exploration of “Dirty” OCR, a term often used to describe electronic forms or documents whose information has been inaccurately rendered.

This event is open — all disciplinary leanings and technical abilities are welcomed! Participants are invited to read the following in advance:

Participants are encouraged to bring laptops or tablets.

We hope you can join us.


Post-meeting Resources: Dr. Romano has shared with us the GitHub repository he designed for today. Browse to: https://github.com/allenjromano/dirtyocr/blob/master/dirtyocr.md

Oceanic Exchanges: Newspaper Corpora and Networks

Wednesday, February 12, 12:00-1:15 p.m.
WMS 415 (from 4th floor elevator, turn L then R)

Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories, 1840-1914

For data and digital humanists, observing transnational and transcontinental news circulation offers a keen reminder that “news flow” is as much a function of intimate rhizomatic accidents and technological imagination as it is of telegram networks and modal distribution. This is particularly true when the flow occurred without the explicit use of digital tools, though the affordances of now-digital historical methods help to illuminate these accidents and networks in detail. Digital Scholars is pleased to welcome two scholars, Jana Keck and Paul Fyfe, to share Oceanic Exchanges, a series of projects that work toward uncovering the hidden strategies responsible for promoting the transcontinental flow of information about people, places, and global events between 1840 and 1914. During their virtual visit, Keck and Fyfe will share stories of the project’s exigence and development and offer glimpses into how it is designed to aggregate, in new ways, the vast but disparate linked open data that occurs in extant sources, such as Chronicling America and The Times Digital Archive. Among the many remarkable features of Oceanic Exchanges is its transcontinental construction. Led by Ryan Cordell and Lara Rose, and established to be an accomplished research collective, Oceanic Exchanges boasts a research team of scholars from seven countries in Europe and the Americas, and represents funded support from six national agencies.

Participants are encouraged to bring electronic tablets or laptops, and to read and browse the following resources in advance:

We hope you can join us.



Collecting Irregular Data on Medieval Manuscripts: “The Tremulator” Four Years Later

Friday, January 31, 12:00-1:15 p.m.
Strozier Library R&D Commons (Ground Level)

“The Tremulator,” Four Years Later

Four years ago this month, Dr. David Johnson presented Digital Scholars with a paleographic tool still under development: “The Tremulator.” Nicknamed after the intricate “layering” of glossed manuscripts in the Middle Ages (such as those produced by the “Tremulous Hand of Worcester” in 13th-century England), this tool was remarkable in two ways: (1) It enabled paleographers to perform scrutinous analysis of medieval inscriptions on something as accessible as a touch-screen device; and (2) it enabled a kind of crowd-sourced cataloguing and visualizing of translative data, especially capturing their various signs of use. As the first speaker in our series on “Using the Humanist’s Tools,” Dr. Johnson will discuss and demonstrate the Tremulator in its current iteration, offering insight into what developers call the “server-side” or “back-end” functions of the tool. Participants are encouraged to bring electronic tablets or laptops, and to browse the following resources in advance:

  • Johnson, David F. (2019). The Micro-Texts of the Tremulous Hand of Worcester: Genesis of a Vernacular liber exemplorum. In Ursula Lenker, Lucia Kornexl (Eds.), Anglo-Saxon Micro-Texts (pp. 225-266). Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110630961-012 [stable copy in Canvas org site]
  • Thorpe, Deborah E., and Jane E. Alty (2015). What type of tremor did the medieval ‘Tremulous Hand of Worcester’ have? Brain: A Journal of Neurology, vol. 138, no. 10, pp. 3123-27. (open-access at Oxford Journals http://brain.oxfordjournals.org/content/138/10/3123)

We hope you can join us.



Organizational Meeting: Using the Humanist’s Tools

Friday, January 17, 12:00-1:15 p.m.
Williams 415 [immediate L off elevators, then R down hall to seminar room]

An Introduction to “Using the Humanist’s Tools”

For our first meeting of Spring 2020, we will identify lingering and observable tensions between institutional outcomes and institutional value where the humanities’ involvement in digital scholarship is concerned. We will do so by discussing three different proposals for achieving humanistic inquiry through appropriations of data: Christina Boyles’s 2018 argument for social-justice data curation as an intersectional approach to the digital humanities; Stephen Ramsay and Geoffrey Rockwell’s 2012 argument for a materialist ideology that demonstrates “building things” as legitimate theoretical work; and Lev Manovich’s 1998 argument for the database as an appropriately postmodern logic that harnesses the aesthetic capacities and technical motivations of Web 2.0.

These proposals are, by now, familiar and widely circulated among scholars and teachers of the digital humanities and related fields, yet publishing trends in the humanities show them to be largely unrealized at the institutional level. When we meet, we’ll question these as-yet unrealized goals. Do the proposals languish only within institutions that value external stakes more highly than internal outcomes (i.e., privileging big-data representations, tool development, and high-tech market applications over small-scale data representations or exploratory critical work)? Do they languish as a result of new (or recurring) systemic disagreements about the efficacy of materialist work? Or do they reflect more deeply embedded and conflicting assumptions about what is real in DH research?

While the January 17 meeting is primarily for graduate students enrolled in or regularly attending the group, all Digital Scholars participants are welcome to read and join us for conversation on any of the following:

Participants are encouraged to bring laptops or tablets. We hope you can join us.

Using the Humanist’s Tools: Spring 2020 Digital Scholars

Dear Friends of Digital Scholars,

I’m pleased to announce our schedule of topics and speakers for the culminating semester of Digital Scholars, on “using the humanist’s tools,” with all sessions inviting hands-on participation or offering a look into the architecture of particular projects. Please mark your calendars for the following dates:

Friday, Jan. 17, 2020
Organizational Meeting
12:00-1:15 p.m. (WMS 415)

Friday, Jan. 31, 2020
Collecting Irregular Data in Medieval Manuscripts, “The Tremulator,” with David Johnson
12:00-1:15 p.m. (tentatively Strozier Library R&D Commons, ground level)

Wednesday, Feb. 12, 2020
Digitized newspaper corpora and networks, “Oceanic Exchanges,” with Jana Keck and Paul Fyfe [via Zoom]
12:00-1:15 p.m. (WMS 415)

Wednesday, Mar. 11, 2020
Data cleaning for the humanities, “Dirty” OCR Analysis, with Allen Romano
12:00-1:15 p.m. (tentatively Strozier Library R&D Commons, ground level)

Friday, Apr. 3, 2020
Crowd-sourcing cultural citings/sightings, “Dante Today,” with Beth Coggeshall
12:00-1:15 p.m. (WMS 415)

More announcements will follow. We hope you can join us for one or more of these discussions in the spring.

Webinar: Data Surveillance

With an upsurge in attention toward veillance and transparency practices since Edward Snowden’s 2013 interviews published by The Guardian, public conversations about data surveillance have lately centered on racist and cultural critique. Please join us for our final webinar in the continuing series on “People in Data II,” open to any members of the FSU, FAMU, and TCC communities, as well as greater Tallahassee, the state of Florida, and beyond. This discussion will focus on several aspects of surveillance, from sousveillance alternatives (Steve Mann, 2005) to technological supremacy.


WEBINAR: Friday, November 22 – 12:00-1:30 p.m. EST
“Data Surveillance” featuring

  • Yuwei Lin, University of Roehampton [website; blog]
  • Anaïs Nony, University of Fort Hare [website]

Advance Reading or Browsing
Participants are invited to read the following:

and to browse the following in advance:

All participants are requested to register at https://app.livestorm.co/florida-state-university-2.

Attending and Connecting
Webinar participants in Tallahassee are welcome to join us in person in the R&D Commons, basement level of Strozier Library, or to connect remotely via Livestorm. Through the interactive features of our Livestorm platform, all participants will have the opportunity to submit questions and participate in group chat.

Connection Requirements
Remote participants should ensure or secure the following:

  • Web browser (Edge, Chrome, Firefox, Safari version 10 or greater)
  • Adobe Flash Player version 10.1 or greater
  • Internal or external speaker
  • (recommended: headsets or earbuds for optimum sound)

Connection Troubleshooting
If your email host runs Proofpoint, you may experience some difficulty with the email-based link/button that Livestorm sends you to access the webinar. Should this happen, you can still access the webinar by copying and pasting the webinar URL into your web browser, rather than clicking the link/button.

This webinar is made possible through the generous support of FSU’s Office of Research.

We hope you can join us,
— Tarez Graban

The Participatory Turn

Friday, November 1, 12:00-1:30 p.m.
PIH Digital Humanities Lab (Diffenbaugh 421)

On “The Participatory Turn”

In the opening pages of The Participatory Condition, Barney et al. invoke Louis Althusser’s concept of “interpellation” to describe the various acts of “hailing and hearing” in which we, in the contemporary West, willingly participate through our interaction with media systems, both on- and offline. They further invoke Bernard Stiegler’s pharmakon to align this participation with “both [the] poison and [the] remedy, … [the] promise of emancipation as well as a form of subjection” that they understand as consequent to all mediated activity (x). At the next Digital Scholars meeting, we hope to consider the strength of these metaphors, weighing the viability of their arguments for a liberal democratic society and looking more closely at what they understand to be the historical preconditions for such large-scale media liberalism. When did the era of technical media necessarily become an era of passive consent, dividuation, or domination? What are the opportunities for mediated participation beyond propagandized involvement? Where might we make room for alternative views? And how do answers to these questions invoke, in turn, salient discussions of people in data? Participants are welcome to read and join us for conversation on any of the following:

  • Boyle, Casey, and Nathaniel A. Rivers. “Augmented Publics.” In Writing, Rhetoric, Circulation, edited by Laurie E. Gries and Collin Gifford Brooke. Utah State UP, 2018, pp. 83-101. [stable copy in Canvas]
  • The Participatory Condition, edited by Darin Barney, Gabriella Coleman, Christine Ross, Jonathan Sterne, and Tamar Tembeck. Editors’ “Introduction” (pp. vii-xxxix), and Cohen’s chapter on “The Surveillance-Innovation Complex” (pp. 207-226). [stable copy in Canvas]

and to browse any of the following projects or tools in advance:

Participants are encouraged to bring laptops or tablets. We hope you can join us.