Blog Post #4: The Final Blog

Does the use of tools like Voyant lead to better history? To answer that, it should be established quickly that having a tool available is unlikely to make history any worse. The only way an optional tool might decrease quality is if it is ineffective or misunderstood, yet widely used anyway. With this in mind, tools such as Voyant are at worst useless, and most complaints that they have made things worse are not particularly strong arguments.

So is Voyant actually beneficial? My initial reaction is yes, but only in specific contexts. Perhaps I just haven’t used it enough, but the information derived from this service seems too basic to support any particularly noteworthy conclusions on its own. While finding word frequencies, and the words that often appear nearby, can offer some insight into an author’s priorities, it still generally falls short of a deep reading of the work. Essentially, this service cannot replace deep reading in almost any situation. It can, however, provide some clues before starting the deep dive. Scan in the work, look at the data, find some correlations, come up with some theories, then start reading for yourself. This way, Voyant’s relatively light breakdown becomes a prep exercise that helps you get more out of the reading. As mentioned in lecture, these digital tools should be a beginning, not an end, to research.
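For the curious, the frequency counting at the core of a tool like Voyant can be approximated in a few lines of Python. This is only an illustrative sketch, not Voyant’s actual implementation, and the stopword list here is an arbitrary placeholder:

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=None):
    """Count how often each word appears, ignoring case and punctuation."""
    stopwords = stopwords or {"the", "a", "an", "and", "of", "to", "in"}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

sample = "The whale surfaced. The whale dove. Sailors watched the whale."
print(word_frequencies(sample).most_common(2))
# → [('whale', 3), ('surfaced', 1)]
```

The counts fall out trivially; the hard part, as argued above, is deciding what (if anything) they mean.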


Good luck sorting through this in less than a month

However, let’s say deep reading isn’t really an option. What if you want to look at a huge collection of works and try to discern trends, or pick out a work that had previously been ignored? Older historical methods rely on a relatively small number of key works, leaving a great deal of room for discovery, if only you could sort through a massive amount of mostly useless information. This is what computers do well: taking a massive data set and sorting it into something hopefully useful. Computers promise to greatly expand the availability of previously obscure sources, reducing the time spent looking for resources and increasing the time spent actually using them. In this ideal situation, research is not only faster, but better founded on the available evidence.

Unfortunately, I don’t feel Voyant will be the tool to completely fulfill this promise. The biggest issue is the sources that Voyant can actually use. Anything you want analyzed needs to be converted to a plain text file, which means somebody has to manually transcribe a work to get usable input for this service. Transcribing anything long enough to need a computer to analyze is an enormous amount of work, defeating the purpose of making research easier, and if a work has already been transcribed, it was likely popular enough to begin with that it’s no “hidden gem”. This isn’t an issue if you want to analyze the writings of Shakespeare, but using digital tools to dissect a well-known corpus is unlikely to be truly groundbreaking.


When an AI can read this old Irish script accurately, we might have something truly special on our hands

The solution is probably already in development, and is seen fairly frequently in online archives. Getting computers to transcribe the text for you is the obvious way to make obscure works readily available to analyze. The technology isn’t quite there yet; errors and difficulty with older sources mean it can’t yet give the results we need. But within perhaps a decade, we could see AI capable of transcribing huge collections of works and then breaking them down into something a human can make good use of. Tools like Voyant seem highly situational today, but their successors could radically expand the sources historians have available to them, hopefully leading to better history.

Blog #3: Collective Punishment

While looking through the collection of historical mapping projects, I noticed that a number of them no longer seemed to be active. While it would be unreasonable to expect academics to update a project continuously, it was a little dispiriting to see so many that had already been taken offline, presumably without any online backups. In some sense, this is the equivalent of burning all the remaining copies of an academic article: a contribution to the body of academic work in the field, suddenly lost.

Perhaps those projects were unsuccessful, poorly executed, or unoriginal. However, as more and more major digital projects are completed, and as existing projects grow older, more and more will be abandoned. These projects need to be treated like articles and books, with long-lasting copies made available.

Anyway, moving on to reviewing an active GIS project! I chose Collective Punishment: Mob Violence, Riots and Pogroms against African American Communities (1824-1974). This project aims to aggregate all the major racial riots and lynchings in America over a very broad time period. The creators are clear that it does not try to include smaller-scale lynchings and the like, but instead focuses on large-scale events. Considering the way the data is organized here, or to be blunt, the lack of organization, this limiting of scale is a good idea.


The main issue with the map is its rather basic construction. Because it was created by essentially throwing pins onto Google Maps, there is no way to search for information, show only events from a certain time period, or sort by number of victims. Another useful feature would have been a basemap showing population density in different periods, especially the racial makeup of areas, but again, this would require sorting by time period and a more advanced tool than Google Maps. Had they decided to include smaller-scale riots and lynchings, the map would have completely collapsed, going from tedious to parse to practically impossible.

The raw data is made available on the same page as the map; however, like the map itself, organization is a serious issue. The data is presented as plain text, when something like an Excel spreadsheet with individually addressable fields would be much more useful (and one seems to exist, judging by the images on the website).
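The difference is easy to illustrate. Below is a minimal Python sketch, using an entirely hypothetical record layout (the project’s actual plain-text format differs), of how delimited text can be turned into rows with individually addressable fields and written out as a spreadsheet-friendly CSV:

```python
import csv
import io

# Hypothetical record layout for illustration only; not the project's real format.
raw = """Event: Tulsa Race Massacre | Date: 1921 | Location: Tulsa, OK
Event: Red Summer riots | Date: 1919 | Location: Chicago, IL"""

def to_rows(text):
    """Split pipe-delimited 'Field: value' lines into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = dict(part.split(": ", 1) for part in line.split(" | "))
        rows.append({k.strip(): v.strip() for k, v in fields.items()})
    return rows

rows = to_rows(raw)

# With addressable fields, filtering and sorting become one-liners.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Event", "Date", "Location"])
writer.writeheader()
writer.writerows(rows)
```

Once the data is in this shape, the filtering the map lacks (by date range, by location, by victim count) is trivial to add.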

Fortunately, the information itself is better executed, with clear fields, concise but informative descriptions, and links to sources with more details. At first glance, the information seems solid and appropriate for this kind of project. The project also seems to be incomplete, however, as practically no events west of Dallas, Texas are registered.


Historical Texas almost seems to be racially tolerant here

Overall, the research here seems extremely broad and still a work in progress, but promising. The big issue is that the designers of the project seem to be out of their depth technically, and should seek outside aid to shore up the map interface. Google Maps should probably be abandoned altogether in order to accommodate such a wide spatial and temporal scope. Given a better interface, a map like this could be a valuable teaching tool, as well as a good way to start off researching a project. Until then, however, it is simply too unwieldy for any large-scale analysis.


Blog Post #2: Online Archives

My first impression of the assigned digital archives was that they were designed for three different audiences. I’d imagine these archives were chosen for this very reason, and it seems as good a place as any to start looking into the uses of each.

Starting off with Ancestry.ca, the most obvious user of this service is the amateur historian looking to fill out their family’s genealogy. Without any significant curation or area of focus, this site is designed to store as much data as it can find and then offer it to users in the hope that it will be useful to them. The way they find this data is also quite interesting, as it relies on a combination of user-submitted records and in-house mass digitization of census records and the like. The user-submitted records should make any historian wary, as they can be just about anything. While much of the material is likely well sourced and insightful, other users may have submitted that Einstein was the third cousin of Queen Victoria.

The in-house records promise to be much more reliable, and could serve as an invaluable starting point for research into the lives of people over the last few hundred years. It would be foolish for professional historians to brush off a company that is investing millions in digitizing as many obscure records as it can find, just because it isn’t an “academic-focused” database. It’s probably best to view Ancestry.ca as the genealogical equivalent of Wikipedia: you should double-check anything you find there, but it’s a great place to get a general overview of a subject before going deep into your research. While its role in most historical research will be fairly limited, this archive is a useful place to begin.


The stuff you don’t have to go through, because Ancestry.ca did it for you.

I’ll admit, of the three, the Darwin Correspondence Project was the one I found least intriguing, due mainly to its scope. If you want an in-depth look at the life of one of the most important scientists in all of history, it’s excellent, and it should be a primary source of knowledge whether you’re an academic or an enthusiast. However, that’s essentially all it does. It has curated lessons for various age groups and extensive records on the subject, but that subject is always Darwin, or one of his close associates. It’s an archive that is incredibly useful in one or two fields, but practically useless beyond them. Its roots as a small project for Cambridge are apparent.

One potential issue with having many small, specialized archives is a lack of centralization. Each archive’s managers must constantly maintain it on their own, when a centralized resource would be more likely to keep running and keep everything in an up-to-date format (servers stay online, websites don’t look like they were designed 30 years ago, and so on).

I like the format of the Medici Archive Project better, as it strikes a balance between specialization and size. It focuses on one subject, but a much wider one that incorporates many areas of research. The focus seems to be squarely on academics, with conference listings, new books in the area, and new projects from various universities. By serving as a centralized resource for many different academic projects, this archive can contain curated and reliable information on a wide variety of subjects, making it useful to more scholars who study the area. Also, given the way exhibits on the site are set up, it could host many “sub-sites” that would function much like the Darwin Correspondence Project, while remaining far more centralized. Right now, however, it’s a little hard to judge the full experience, as their archive proper still seems to be under development and is not readily available.

Just as a technical note though, I know it’s a flashy, modern, media-heavy website, but are the constant loading screens really necessary?


Every. Single. Page.

First Post! (For me at least)

Hi, my name is Francis Samson, and I always struggle with whether I’m a 2nd- or 3rd-year history major. This is my third year of university; however, I changed majors from something completely different (engineering), and now I’m missing about a semester’s worth of credits. The important part is that I’m a history major who really likes science and technology. I have reasonable coding skills and like screwing around with computer hardware, so there was some inherent appeal in a course called “Digital History”.


Things I like doing in my spare time.

At the moment, my digital research skills basically amount to typing things into a search bar that sound related to what I’m researching. While this is vastly more efficient than scouring a physical library, as the introduction of the textbook makes very clear, it is surface-level stuff. What I’m hoping to get out of this course is the ability to use computers to analyze large sets of data and create something more meaningful.

While it is wonderful that a database can give you 1,200 primary and secondary sources for a single subject, this turns out to be an unmanageable amount of data. Human brains are pretty slow; I typically read a page in one to three minutes. So when presented with the entire collective writings on a subject, I’m typically forced to quickly judge which sources seem useful and which don’t, without having read or understood most of the results in their entirety. It strikes me that computers, which specialize in processing large and ungainly data sets, are handing off the task too early to my lethargic data-processing capabilities. While this quick selection of interesting works leads to much more varied sourcing (I have found genuinely helpful articles in Turkish agricultural journals), I imagine I’m still missing out on a huge amount of interesting information. My second worry is that my internal biases will lead me to never even find information that conflicts with my thesis. If I want to prove a certain point, there are enough scholarly articles and primary sources available that I could likely find enough to support it, even if I were in fact wrong.

So all we need is a program that looks at everything in the field, finds what’s relevant to our topic, and breaks it down in a way we can understand and manipulate for our own needs. Unfortunately, this is where things fall apart. Despite everything we hear about “Big Data” and how smart computers are, there is one dirty little secret: computers suck at context. The textbook nicely explained how computers see everything in terms of yes or no, when historians deal in maybes. Humans are highly subjective, and history is a highly subjective field. We don’t have time machines; we can’t know exactly what happened. We try to reconstruct history, and then argue why we’re probably right. Something you learn from coding is that computers will do exactly what you tell them to do. You are effectively trying to parse hundreds of documents with a machine that can only say yes or no. Computers are incredibly finicky, and it’s a modern miracle that Google can give you relevant results even when you misspell your search terms. Computers aren’t magic. They can retrieve breathtaking amounts of data, but they have to be used judiciously. Tools such as word clouds can give useful insights into large bodies of work, but at the end of the process, you still have to determine what all that data means.


For the uninitiated, Perl and regular expressions are forms of computer code.
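A tiny example shows just how literal this kind of matching is. Regular expressions work much the same in Python as in Perl; the search pattern below is a made-up illustration, not drawn from any real project:

```python
import re

text = "The colonists protested; colonial policy hardened."

# A regex matches exact character patterns, nothing more.
# 'coloni' followed by any run of letters catches both words here:
print(re.findall(r"coloni[a-z]*", text))
# → ['colonists', 'colonial']

# But it has no sense of context: it happily matches inside
# 'semicolonial', while a typo like 'colnial' silently matches nothing.
print(re.findall(r"coloni[a-z]*", "semicolonial colnial"))
# → ['colonial']
```

The machine says yes or no to each pattern with perfect obedience; deciding whether those matches actually mean anything is still the historian’s job.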

To me, this is the key issue the introduction was focusing on: how to use digital tools intelligently. While digital tools greatly expand our capabilities, they bring a whole host of issues with them. A primary concern of the chapter was the effect that opening up historical authorship would have on historians. If Google doesn’t think you need a PhD to be the average user’s primary source of historical information, then why does history as a profession exist right now? The answer proposed is that, in order to compete, historians will have to fully learn and embrace these new mediums. You can’t take down shoddy research, but you can be better sourced and better written, while still being prominently placed and accessible. Adopting this strategy would require historians to act more like public figures, taking advantage of various media platforms to better communicate with the general public. Historians of tomorrow cannot afford as isolated a role as they have traditionally taken. This process will require a fundamental reformulation of academic history, but it will be necessary, and it could bring about a dramatic increase in historical literacy.