Issues and Debates at the Intersection(s) of Art, Archive, Human, Material, and Network

Friday, January 13, 2:30-3:45 pm
WMS 415 (Williams Building, 4th floor, L off elevators)

The organizational meeting for Spring 2017 Digital Scholars will be dedicated to an introductory discussion (and potential complication) of issues and debates at various intersections of five central topics: art, archive, human, material, and network. While this meeting is primarily for graduate students enrolled in or regularly attending the group, all participants are welcome to read with us and join us for conversation:

Generally speaking, our readings for each session will reach across disciplinary boundaries; thus, if any of these pieces is difficult on first read, you might keep track of the following to help you get through it:

  1. The writers’ overarching project or stance, i.e., to what are they responding, or, what are they arguing for, writing about, promoting, or helping to re/define?
  2. How the project outlined in each text either raises or settles questions you might have about digital humanities, digital scholarship, or digital work in/across the disciplines
  3. Anything else that resonates.

See you in January!


Reflections on “Capacity and Care”

The final meeting of Digital Scholars for the 2015/2016 year allowed us to reflect on how to balance fulfilling the potential of projects in the field with exploring alternatives and working methods that can lead to sustainable collaborative dynamics. Two Florida State University professors who have been developing Digital Humanities projects spoke about their respective experiences in collaborative endeavors. Dr. Will Hanley described in detail the course that he is currently preparing for Fall 2016, in which students will work with The Egyptian Gazette, a newspaper from Alexandria, focusing on issues published during 1905. For her part, Dr. Silvia Valisa described her experience leading the Il secolo project, an initiative to digitize the 19th-century Italian newspaper of the same name.

Dr. Hanley emphasized the need to motivate his students by offering them the opportunity to do creative work of discovery and reporting. His class’s workflow, as he explained, will consist of four stages: microfilming of the newspaper, conversion of the image into text (OCR), structural tagging, and content tagging. Regarding the last stage, he added that students will have to tag three kinds of content: names of people, locations or places, and events of some sort that appear in the newspaper. The last third of the class will be dedicated to formulating what he calls a serial question; that is, a question that has to be asked in exactly the same way of each of the approximately 300 yearly issues of The Egyptian Gazette, and that gains relevance and interest from the different answers it receives.

Students will have to report curiosities, technical difficulties, and results of their serial questions in blog posts. Grading is based on completion, which means that finishing the task takes priority over an evaluation of performance. At this point, Dr. Hanley mentioned the article “Student Labour and Training in Digital Humanities”, which questions the discipline of Digital Humanities from the perspective of student labor, paying attention to students’ access, training, funding, and collaboration in projects of this kind. Hanley considers that some of the good practices suggested by the article are already present in his course: namely, students can, in his own words, “carry their portion of the project through from the beginning to its end”. In addition, students not only collect data for others to analyze but also carry out that analysis themselves, which is less exploitative as long as the work is published under their name.

This also implies reconsidering the usefulness and specific weight of more traditional assignments such as papers, which could be less beneficial or interesting for students. Hanley then addressed TAPAS, a service offering shared infrastructure, publication, and storage for TEI projects, oriented towards creators of TEI data who lack the resources their research projects require. Although its positive aspects and good intentions were acknowledged, an objection was raised against the (often) unnecessary mediation that TAPAS entails. Finally, he insisted on the “empowering transaction of training” as the most valuable exchange within the collaborative sphere of Digital Humanities.

Dr. Valisa explained her work with Il secolo, the main Italian newspaper of the late 19th century; her primary goal is to make this publication more accessible to researchers and students. [See full description of Valisa’s work here.] This endeavor is framed within the Sonzogno Digital Library Project, dedicated to building a digital catalogue of the Milanese publisher; for this task, Valisa and her team use Omeka software, having harvested and managed metadata from eight different American and Italian libraries and the CLIO catalogue. Valisa also addressed the issue of sustainable collaborative work and the functioning of the small team in charge of the Sonzogno digital library. The team is based at Florida State University and is composed of Drs. Sanghee Oh and Wonchan Choi, of the School of Library and Information Studies, with the collaboration of Seungwon Yang (Ph.D. student at the University of Virginia) and occasional support from FSU graduate students Danila Coppola and Abhiram Chitangal.

To conclude, and regarding the problems and difficulties that these projects can pose for students (arguably, the weakest link in the collaborative chain), Valisa points to a lack of training in the main challenges of the discipline. In a similar vein, Hanley expresses the need to develop an ethic of crediting work that gives students full participant status in the different tasks and projects that they undertake. Despite preconceptions of Digital Humanities as a structureless, collaborative field where traditional hierarchies tend to blur, professionals within the field must be fully aware that these boundaries persist.

As the authors of the article “Student Labour and Training in Digital Humanities” point out, “the rhetoric of DH . . . can obscure those structures that are already in place . . . A perceived opposition between DH methodologies and traditional humanities practices seems to deny the existence of a hierarchical power dynamic within DH projects”, and, in the end, “structurelessness ‘becomes a way of masking power’.”

Crowdsourcing, Small-, and Large-Scale Collaboration

Friday, April 15, 12:00-1:15 pm
Strozier Library (Scholars Commons Instructional Classroom) [Map]

“Capacity and Care”: DH Crowdsourcing, Small-, and Large-Scale Collaboration

The simultaneous bane and boon of most digital projects can be summed up in what Dr. Bethany Nowviskie, Director of the Digital Library Federation, called “capacity and care” in her September 2015 keynote lecture at the NEH Office of the Digital Humanities. In this address, capacity refers to a project’s desired and expected growth potential, while care reflects the neglected but necessary process of ensuring that projects grow humanely (i.e., without promoting the spread of value-laden metaphors). By pairing these concepts, Nowviskie effectively argues that “making a case” for DH projects requires more than just making sense of the interpretive, cultural information-processing, and sharing capacity of small and large corpora; it requires giving attention to the kinds of (sustainable) working relationships that DH projects increasingly need.

For the final Digital Scholars meeting of 2015-2016, Dr. Silvia Valisa and Dr. Will Hanley will demonstrate some results of their large-scale digital projects and share their thoughts and experiences on conducting (and sustaining) collaborative projects in many sites — with digital librarians at home and abroad, across disciplines, and in the graduate and undergraduate classrooms.

Valisa’s development of the Il secolo project has sparked a series of collaborations related to platform development and metadata harvesting with faculty, staff, and graduate students in the FSU Libraries and the School of Information. As a bibliographic project alone, Il secolo is already a coup: it is a rare primary research tool for scholars of 19th-century Italian and European history that, until this digital collaboration, was accessible via microfilm or CD only in situ, in Italian national libraries. (FSU is the only institution in the world outside of Italy to own and digitize this historical resource.) However, as a DH project, Il secolo is just as significant: building and maintaining the database has provided the libraries an opportunity to test and implement new semantic searching tools for their Islandora platform.

Hanley’s development of Prosop — a linked, flexible pool of historical names and demographic information — has led to a series of workshops, formal and informal collaborations and, most significantly, the development of DH curricula at FSU. In Fall 2016, Hanley will lead an undergraduate class in digitizing, encoding, and publishing the full text of a daily newspaper from Alexandria, the Egyptian Gazette, for the year 1905. Inspired by another Middle Eastern newspaper project, Till Grallert’s work on the Arabic-language Cairo/Damascus newspaper al-Muqtabas, Hanley’s class is an experiment in microhistory, pedagogy, collaboration, and academic labor, as well as an attempt at modeling a small-scale application of TEI.

Participants are invited to read the following in advance of our meeting:

and to browse the following resources for background:

All are welcome, and participants are encouraged to bring tablets or laptops.

We hope you can join us,


Sustainable collaboration and a democratic structure in DH projects

For the last meeting of this course we have read four articles: two related to large-scale DH projects (Dr. Silvia Valisa’s and Dr. Will Hanley’s collaborative projects), and two others, one an analysis of the field of DH in general and one a study of student labor in such projects.

For this post, I will concentrate on what I personally find most interesting, which is also a good topic for a final discussion of this course. I am referring to Dr. Nowviskie’s emphasis on care, mentioned in Dr. Graban’s introductory post, as it relates not only to the articles but also to what we have been reading and discussing during the seminar. Dr. Nowviskie proposes to include an “ethics of care” in relation to capacity and the potential for growth. She explains that an ethics of care fosters an appreciation of context, interdependence, and vulnerability, and that it is oriented towards personal, worldly action and response. For a collaborative team to be successful, she states, we need to promote a happy yet critical engrossment.

The article written by Anderson et al. about student labor and training questions the Digital Humanities rhetoric of collaboration and freedom from traditional hierarchy. It argues that the hierarchy of power in academia is a structural problem that the DH field has not eliminated. Students are usually “unseen collaborators”, and they tend to see their work in DH projects as the intellectual property of their supervisors. Their lack of recognition and of involvement in the last stages of a project (such as scholarly interpretation), due to the kind of work they usually perform, is an issue that this article proposes to tackle. The authors’ suggestions aim to improve teamwork and truly achieve the ideal of collaboration fostered by DH.

The best practices proposed by the authors in relation to students should be extended to all projects, as they would contribute to the community of care that Dr. Nowviskie refers to:

  • Explicit and negotiated power structure, and transparent communication and assignment of responsibilities
  • Training should be deliberate, formally budgeted and accounted for, with an equal compensation for different tasks
  • Creating/promoting training institutes and supplementary online learning
  • Integrating Digital Humanities training within traditional education curriculum
  • Making the students feel their contributions are impactful
  • Fostering connections with other researchers (affective labor)

I am sure there are many projects and programs that have taken these issues into account; however, this article’s findings show that more attention should be given to students’ involvement, and it makes us all aware of some of the challenges the field faces in reaching the growth it promises.

3 things I’ve come to understand about the ESTC

As someone from outside the discipline of English Literature, here are a few things I understand about the ESTC, or the English Short Title Catalogue. The first thing is that it’s huge. As Dr. Graban notes in the post below, the archive comprises close to half a million titles of handpress texts spanning more than three centuries. The second thing I understand is that it’s taken vast numbers of people to bring together this resource: librarians, administrators, scholars of literature, antiquarians, grant writers, and the National Endowment for the Humanities, among many others. In Daniel Slive’s interview with Henry Snyder, the director of the ESTC since 1978, Snyder credits his background in business, sales, and retail as being influential in making it possible for the ESTC to exist. My point is that an undertaking as big as the ESTC required non-scholarly expertise. In other words, it takes a village to create a large-scale bibliographic resource, and it takes a lot of funds, continual funds. Snyder in his interview also notes that the ESTC is “a project that never seems to end” (84, “Exit Interview: Henry Snyder”), and he also states that “ESTC is administrative and political” (75, “Exit Interview: Henry Snyder”). Business minds, administrative minds, scholarly minds, information science minds: many different individuals brought the ESTC to life. We often talk about the necessity of collaboration in digital humanities, and the ESTC is no exception.

The ESTC requires many, many hands to keep running, and it has lasted through multiple waves of change in data and information technology: from entries created by hand, to microfiche, to initial databases stored on servers that took up entire rooms, to current modes of digitization that can be stored on a single computer. Henry Snyder describes the ESTC as a “living, constantly growing and changing organism” and he also states that “sooner or later (and probably sooner), whatever mechanism and system we rely upon today will become passé, and will be supplanted by some ‘rough beast’ not yet imagined” (“The Future of the ESTC: A Vision”). Snyder points to the potential for the ESTC to transform yet again; that transformation will be negotiated by our information technologies, and again, many, many people will be required to make it happen.

Finally, the third thing I’ve noticed about the ESTC, as Snyder points out above, is that what to do with it is a political issue. How to number the titles, what date to cut off the catalog at, how to expand the catalog, and how to organize the catalog: all of these considerations are up for extensive debate. In 1981, William Todd describes two schools of thought regarding the organization of the titles in the catalog, “the professional librarians” or “hardheaded realists” and “a motley contingent of scholars” whom Todd describes as “softheaded visionaries” (390, “The ESTC as Viewed by Administrators and Scholars”). Not only is organization an issue, but so is how the ESTC can be improved. Steven Tabor points to accuracy issues and the absence of information related to the physicality of the texts as areas of potential growth.

Regardless of how much the future of the ESTC can be debated, it seems clear to me that the ESTC is a touchstone for challenges that exist in the digital humanities. Issues of entry, issues of manpower, of funding sources, of management, of information science, of digitization, of primary record issues, of collaboration … the ESTC provides an archival model that we can look to for insight across the disciplines.

Evolution of the English Short Title Catalogue

Monday, March 21, 12:00-1:15 pm
Williams 454 (4th floor, turn R off the elevators)

“More than a science”: Evolution of the English Short Title Catalogue

With almost 500,000 items, the English Short Title Catalogue (ESTC) straddles the line between bibliography and database. Under (the late) former director Henry L. Snyder, its function expanded from recording 18th century English imprints to organizing records for letterpress items in any language from the middle 15th century through 1800, published mostly in the British Isles and North America. Records such as Samuel Edwards’ “Abstract of English Grammar, Including Rhetoric,” John Kersey’s treatise on elementary algebra, and Hannah Glasse’s “The Art of Cookery” indicate the vast range of topics currently reflected within the metadata of the ESTC; yet topical range is not the ESTC’s most notable trait. Unbound by a single criterion such as genre, chronology, or geography, the ESTC embodies several characteristics of a 21st century research tool, including crowd-sourced contributions and linked open data.

Digital Scholars is pleased to welcome Dr. David Gants to provide an insider’s look at the ESTC’s evolution, from its original three volumes compiled without the aid of a computer to its present imaginative logics based on “silent testimonies of fact” (Alston, “Computers and Bibliography,” 1981). The recipient of a 2011 Planning Grant from the Mellon Foundation, the ESTC makes an optimal case study for bibliographic and non-bibliographic tools that have outgrown their first trajectories. Dr. Gants will focus on six years of the project’s history (2011–2016), inviting us to reflect on its sustainable traits and to consider the problems and questions that arose as the ESTC attempted to move from its 30-year-old legacy platform to a more modern platform: What are the administrative steps involved in moving from bibliographic inquiry to digital preparation? How do we define a digital “record”? Who are (or who become) the new users of such a tool? What becomes its public impact on non-specialist communities and audiences? How can such a tool fulfill research agendas that are simultaneously nationalist and transnational in scope?

Participants are invited to read some of the following in advance of our meeting:

and to browse the following resources for background:

This session will be interactive; participants are encouraged to bring tablets or laptops. Dr. Gants has offered to share some of the ESTC Board’s internal documents in advance of the meeting. Please contact Tarez Graban with an RSVP so that she can provide you with a link for accessing them.

We hope you can join us,


On “Preparing ‘Messy Data’ with OpenRefine: A Workshop”

The Digital Scholars group’s fourth meeting this semester consisted of a workshop led by Dr. Richard J. Urban (School of Information, Florida State University) on the possibilities of OpenRefine as a data management tool. The workshop focused on OpenRefine’s capacity for polishing and improving the presentation, structuring, and grouping of data on several fronts: namely, to “Remove duplicate records, separate multiple values contained in the same field, analyze the distribution of values throughout a data set and group together different representations of the same reality”. Dr. Urban based the session on two tutorials: one by Michigan State University’s Digital Scholarship librarian, Thomas Padilla, and another by Drs. Van Hooland, Verborgh, and De Wilde, from different Belgian universities and institutes.

The workshop’s main concern was refining data into a tidy form and achieving a structure that allows a variety of categories to be brought together and presented in an accessible way. As Hadley Wickham pointed out in his paper “Tidy data”, data preparation is not only a first step in data analysis but a recurrent activity throughout the process as a whole; new data constantly appears and has to be incorporated, and this work can take up to 80% of the time dedicated to the analysis. Under these circumstances, a good method and adequate tools seem indispensable both for efficiency and for the investigation’s success.

OpenRefine, one of the so-called IDTs, or Interactive Data Transformation tools (a denomination that includes others like Potter’s Wheel ABC and Wrangler), helps correct frequent errors in data sets, such as blank cells, duplicates, or spelling inconsistencies, and it does so through four fundamental operations: faceting, filtering, clustering, and transforming. OpenRefine also allows users to combine data sets with open data, thus reconciling them with existing knowledge bases; however, both this operation of linking with external concepts and authorities and the process of named-entity recognition (known as NER) rely on first achieving a well-articulated, coherent data set.
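To make the kinds of errors named above more concrete, here is a minimal Python sketch that spots blank cells and exact duplicate rows in a small table. It is not OpenRefine itself, only an illustration of the underlying checks; the column names and sample rows are invented for the example.

```python
import csv
import io

# Invented sample standing in for a messy export; column names are hypothetical.
raw = """title,publisher
Judge Dredd,Titan
Judge Dredd,Titan
Tank Girl,
Halo Jones,Titan.
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Blank cells: rows whose publisher is empty or whitespace only.
blank = [r["title"] for r in rows if not r["publisher"].strip()]

# Exact duplicates: rows whose full content has already been seen.
seen, duplicates = set(), []
for r in rows:
    key = tuple(r.values())
    if key in seen:
        duplicates.append(r["title"])
    seen.add(key)
```

Note that the last row ("Titan." with a trailing period) is not caught by either check; inconsistencies of that kind are what faceting and clustering are meant to surface.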

Using a sample data set on comics developed and gathered by the British Library, Dr. Urban showed the audience how to perform the aforementioned data polishing tasks with OpenRefine. Firstly, we observed how the facet function in each column made it possible to identify inconsistencies and repetitions; for instance, in the publisher column, we could see that the publishing house Titan appeared in the data set under a number of variants (Titan], Titan., Titans, etc.). Through the facet function the user can rewrite them, so that these variants are no longer treated as different publishers. The filter function incorporates a text filter that makes it possible to locate variants of pieces of data. After using the facet function, certain cases of variant spelling may persist; these would be easily identifiable thanks to the filter, which would display other defective, repeated records.
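As a rough illustration of what a text facet and a text filter do, the following Python sketch counts the distinct values in a column and then lists those containing a search string. This is not OpenRefine's own code; the sample values are made up and merely modeled on the Titan variants mentioned above.

```python
from collections import Counter

# Hypothetical publisher column; the values mimic the variants described above.
publishers = ["Titan", "Titan]", "Titan.", "Titans", "Titan", "Marvel", "Titan."]

def text_facet(values):
    """Count how often each distinct value occurs, most frequent first."""
    return Counter(values).most_common()

def text_filter(values, needle):
    """Return the distinct values containing `needle`, ignoring case."""
    needle = needle.lower()
    return sorted({v for v in values if needle in v.lower()})

facet = text_facet(publishers)
variants = text_filter(publishers, "titan")
```

Reading the facet counts side by side is what makes the near-duplicate spellings visible; the filter then narrows the view to the variants of one name.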

The cluster function helps locate patterns of variation, so that the user does not have to eliminate inconsistencies one by one through facets or filters. OpenRefine displays the size of each cluster and the different values it contains, and allows the user to merge them under a single name. For instance, the nearly 4000 records that appear in the data set with the variations of the publishing house Titan mentioned above can be rewritten as “Titan” in one sitting. As one of the tutorials points out, the scope of these changes should not be a concern, given that OpenRefine keeps a record of changes and allows the user to go back to previously saved versions of the project. In the same vein, the tool offers a transformation function; with it, the user can modify data, either by eliminating whitespace that could cause a duplication of any of the information categories or by means of the General Refine Expression Language (GREL). Thus, the tutorial’s example focused on the suppression of periods that could compromise an adequate register of the categories.
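The clustering and transforming steps described above can be sketched in Python as follows. This is a simplified illustration loosely modeled on the key-collision "fingerprint" idea, not OpenRefine's actual implementation; the function names and sample values are assumptions made for the example.

```python
import string
from collections import defaultdict

def transform(value):
    """Trim surrounding whitespace and suppress periods, as in the tutorial's example."""
    return value.strip().replace(".", "")

def fingerprint(value):
    """Build a simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Group distinct raw values that share the same fingerprint key."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    # Keep only keys that collect more than one spelling.
    return {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}

def merge(values, clusters, canonical):
    """Rewrite every clustered value to the canonical spelling chosen for its key."""
    return [canonical.get(fingerprint(v), v) if fingerprint(v) in clusters else v
            for v in values]

# Hypothetical publisher column with the kinds of variants described above.
publishers = ["Titan", "Titan]", " Titan.", "Titans", "Marvel"]
clusters = cluster(publishers)
cleaned = merge(publishers, clusters, {"titan": "Titan"})
```

In this sketch "Titans" is left untouched, because its key differs from "titan"; catching variants like that requires a fuzzier comparison or a manual decision by the user.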

Lastly, Dr. Urban left room for questions and comments, during which the audience spoke about their projects and the ways in which OpenRefine could benefit them; for instance, one of the questions concerned how to deal with spaces (a critical aspect of the tool at different levels) in languages whose writing systems do not use them, such as Japanese. In addition, some issues and flaws of the tool were mentioned; perhaps the most significant is that an excessively large data set prevents OpenRefine from functioning properly, something that calls for an urgent upgrade.

On “Bitstreams: Locating the Literary in the Media Archive”

The Digital Scholars group’s most recent session, built around Matthew Kirschenbaum’s talk “Bitstreams: Locating the Literary in the Media Archive”, covered a wide variety of subjects along a clear guiding thread: on the one hand, the materiality and specificity of the new (with this notion of newness properly problematized) information storage formats and codes; on the other hand, the need to better understand the potential of such media to complement and enrich our approach to the ways in which we generate, consume, and store information.

The notion of the archive, with its edges, fevers, and anxieties, dominated a large part of the session. We live in a moment when data management is omnipresent in public discourse, through sensitive and burning issues such as the right to be forgotten and mass surveillance. Thus, there is an underlying anxiety about dealing with the enormous repository of information that is being generated; hence the importance of making data available through adequate metadata. Reflecting on the topic of media simulation, the talk moved towards access and interfaces and the ways in which the relationship between users and archive materializes: what users are invited to do, and what they are not.

Regarding Wendy Chun’s article, “The Enduring Ephemeral, or the Future Is a Memory”, and its interesting distinction between memory and storage, the internet very often puts the focus on the preservation of memory. Kirschenbaum brought up the example of the digital library Internet Archive as a repository of cultural artifacts, dedicated to preventing the so-called “digital dark age”. The Internet Archive, however, provides the user with surrogates such as videogame emulators for previously existing formats and devices, which fail to replicate an otherwise irreproducible gaming experience. On a separate issue, and regarding the key question of storage, processing, and access to born-digital texts, it was noted that an interdisciplinary approach is much needed, one involving collaborative work between digital archivists and humanists.

One of the ideas that Kirschenbaum tried to contest was precisely the notion of electronic records as surrogates of physical, paper records, which are frequently considered primary. It is clear that at this moment electronic records are most of the time born digital: there are innumerable cultural artifacts that lack a continuous, physical version preceding the discrete, electronic one. At an instrumental, operative level, Wolfgang Ernst’s chapter, “Media Archaeography: Method and Machine versus the History and Narrative of Media”, is relevant to these arguments because it proposes a media archaeology; to paraphrase the author, media archaeology would be a kind of reverse engineering that does not seek to articulate a prehistory of the mass media (at least not in the historical sense), but to unravel its epistemological configuration. The idea is to explore mass media as the non-discursive entities that they are, understanding that they belong to a temporal regime other than the historical-narrative, and to overcome “screen essentialism” by going beyond the mere interface in order to find out how the hardware works. To Kirschenbaum, the point of practical intersection of media archaeology with digital humanities is precisely what he calls the practice of digital forensics: securing and maintaining the digital cultural legacy through preservation, extraction, documentation, and interpretation.

In this same line of intersections, Kirschenbaum referred to the book Notebook, by Annesas Appel, “a project based on mapping the inside of a notebook [computer]”. The project proposes a sort of deconstruction of the device, together with a transition from a three-dimensional to a two-dimensional perception. In it, different components and pieces are presented separately and in series. One of the book’s keys is that these components are progressively less recognizable as computer components; there is a detachment from their original function and a transformation towards a script, an isolation and atomization that Kirschenbaum described as media archaeological splendor and that makes evident an archive fever. Through this inventory-like atomization and arrangement in series of computer components, and due to their immediacy and simultaneity, we seem to enter an order that is alternative to the historical-narrative: a kind of lost code that reveals itself.

To conclude, and moving back to the title of the session, one of the categories necessarily shaken at this juncture is that of the literary: what were once physical manuscripts, traces of the writing process, are now born-digital files, given the generalized use of the computer as the preferred writing tool. Again, the convergence of media as binary, ones and zeros, makes us wonder if it still makes sense to distinguish what is and is not literary.

On “the archive” and “the ephemeral”– a follow up to Matthew Kirschenbaum’s “Bitstreams”

In his talk titled “Bitstreams: Locating the Literary in the Media Archive,” Matthew Kirschenbaum interrogated the term “archive”. His talk was partially in response to two pieces of scholarship in the digital humanities: Wendy Hui Kyong Chun’s “The Enduring Ephemeral, or the Future Is a Memory”, which discusses how that which was once ephemeral is no longer so because of media technologies, and Wolfgang Ernst’s text titled “Media Archaeography: Method and Machine Versus the History and Narrative of Media”, which discusses how scholars can take an archaeological approach to technology, or how we can consider what cultural factors allow media to be transcribed and the significance of those transcriptions.

To return to Kirschenbaum, he began his talk by noting that Derrida used the term “archive” to point to origins and memory. On the pages of The American Archivist, it was also noted that archival practice is increasingly institutionalized, that it is growing exponentially in volume, that many records are simply missing, and that many archivists struggle with the sheer number of authors and potential technological complications that come with contemporary archival work. What will be done with President Obama’s BlackBerry or our own family photos on Instagram? There is anxiety that surrounds this sort of abundance. Whereas an archive used to be a noun, now it is a verb: “to archive” means to back up data, to move something from more accessible to less accessible at another time.

In response to this abundance and these concerns, Kirschenbaum pointed toward the emerging discourse of “media archaeology.” For example, the Internet Archive is an archive of the internet on the internet, where bygone websites, games, and images can be explored. The IA also includes executable software that may not be available to experience elsewhere. The only problem with viewing these .exe files on a computer is that the bitstreams are not compatible and the browser may flatten the effect of these files.

Is any media processed through digital technology truly ephemeral? What is the changing nature of the archive in the face of the kinds of widespread digitization and in the face of the digital anxiety about data? The digital and the print aspects of the literary are combined in a contemporary context. A book is created using digital tools. When we archive the works of literary figures, we can consider how digital artifacts will be combined with that archival process.

The questioning of the ephemeral in this context brought up questions of trauma. Are there things that one has the right to forget? Should Facebook have the right to showcase our memories? This questionable ephemerality also pointed to the “screenshot economy”, where questionable media events are impossible to erase because users take screenshots. An example of this “economy” would be when celebrities post inflammatory tweets and then attempt to remove them, but traces of the tweet persist because of users who screenshot the offense, as in the recent case of Donald Trump tweeting about Ted Cruz.

Additional discussion centered on the archive itself. What is an archive able to do? What is it expected to do? In the age of increased digitization, there is a desire for the archive to simulate an experience of the past as well as preserve the data from that past event. It may be the experience of a medium in its full authenticity that is ephemeral. We can use bygone software, but can we recreate the experience of using that software on its original platform?

Kirschenbaum pointed us towards possible conversations that could be taking place between archival communities, archaeological communities and literary communities. With increasing awareness of the significance of materiality and the increasing number of digital collections, more discourse will definitely take place.

Preparing “Messy Data” with OpenRefine

Thursday, February 18, 12:30-1:45 pm
Strozier 107A (Main Floor Instructional Lab) [Map]

Preparing “Messy Data” with OpenRefine: A Workshop

The fourth meeting of Digital Scholars for Spring 2016 will be conducted as a workshop, led by Dr. Richard J. Urban of FSU’s School of Information, who will walk us through two tutorials on how to use this tool for digital humanities scholarship, both for gathering and for interpreting unread data sets. Formerly a Google tool for data management, OpenRefine has recently been optimized for understanding, manipulating, and transforming data of any kind, combining extant data sets (such as those that researchers have compiled in Excel spreadsheets) with open data, attained through web services and other external links. From large-scale repositories and networks to small-scale archives and visualizations, most projects constructed or used by digital scholars have benefited from data management with OpenRefine, or similar tools.

Participants are encouraged to browse the following resources in advance:

and to read the following for background:

Access to OpenRefine will be provided in the Strozier Library Learning Lab; thus, registration is helpful (though not required) so that we can gauge attendance. Participants are welcome to bring their own devices and install OpenRefine during the session. While Dr. Urban will focus mostly on these tutorials, participants are also welcome to bring datasets that they would like to discuss or explore.

We hope you can join us,