Meaning and Analysis: The Use of Text Analysis Tools in Historical Examination

The use of Voyant Tools to analyze bodies of text is an amazing development in digital history, but it may actually raise more questions than it answers. These tools quantify and analyze the incidence of words and the relationships between phrases, and give some idea of how the documents being examined are constructed. They can produce interesting visual representations of word use, such as the Cirrus pattern shown above, and allow comparison of different documents. This pattern is derived from this article.

Such tools do augment the traditional reading skills that historians bring to bear, in that they point out patterns of word use that would not otherwise be easily apparent. So far, however, no software tool can grasp the meaning, the underlying sub-text, or the psychological effect of a document. For instance, the tool can identify that a word has occurred more than a dozen times in a document, but it cannot explain why, or what that means.

The free, publicly available Voyant Tools are somewhat frustrating to use, because although one can load multiple documents into them, there doesn’t seem to be a way to save those documents. This means one must re-load the corpus every time one leaves the tool for any reason. There also does not seem to be a way of automatically comparing documents, although you can manually compare the ratio of words to types (distinct words), the average number of words per sentence, and the frequency of various words.
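For readers who want to see what those manual comparisons involve, here is a minimal Python sketch of the three measures mentioned above. It is not how Voyant itself works; the function names (`summarize`, `per_thousand`) and the sample sentences are my own assumptions for illustration, and the word and sentence splitting is deliberately naive (it only recognizes unaccented letters and the straight apostrophe).

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words; "can't" stays one word."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def summarize(text, top_n=5):
    """Return simple statistics for one document."""
    words = tokenize(text)
    # Naive sentence split: break on runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    return {
        "words": len(words),    # total words (tokens)
        "types": len(counts),   # distinct words (types)
        "type_token_ratio": len(counts) / len(words) if words else 0.0,
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "top_words": counts.most_common(top_n),
    }


def per_thousand(text, word):
    """How often `word` occurs per 1,000 words of `text`."""
    words = tokenize(text)
    if not words:
        return 0.0
    return 1000 * words.count(word.lower()) / len(words)


if __name__ == "__main__":
    # Two made-up sample documents, only for demonstration.
    doc_a = "The cat sat. The cat ran! Did the dog run?"
    doc_b = "History is written by readers. Readers ask why."
    for name, doc in (("A", doc_a), ("B", doc_b)):
        print(name, summarize(doc, top_n=3))
    print("'the' per 1,000 words in A:", per_thousand(doc_a, "the"))
```

Scaling counts to a rate per 1,000 words is what makes documents of different lengths comparable. Even so, the numbers only describe the text; deciding what they mean is still the historian's job.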

But what does this mean? The tools will be much more useful to those who have training in the psychology of word use, or historical knowledge of how certain words resonate with certain nations and how the juxtaposition of phrases can indicate a hidden agenda or sub-text. This kind of training will have to include how words were used in the past, and of course documents that have been translated from languages other than English will have to be analyzed differently too.

These tools also can’t analyze the intent of the writer. If a document leans heavily on one word or phrase, the tool cannot tell us why the writer chose to use it so often. Of course, a historian who meets the same word or phrase repeatedly in a document can readily understand that it is important to the subject. But was the repetition the writer’s choice, or was it an accident? It would be helpful to be able to analyze words that are used less often, too, and to compare usage across many related documents. Will we gain information from the use of Voyant? Absolutely we will. Will the information be relevant? That is uncertain.

Blog #3: Geospatial History and Harlem

The site that attracted my interest in Geospatial History is Digital Harlem. Harlem was the legendary African-American area of New York, and was the site of nightlife, gambling, entertainment and crime. The authors have researched crime statistics drawn from New York legal files for Harlem in 1920, 1925 and 1930. For example, arrests made at specific addresses in Harlem in 1920 can be plotted on a map.

The detail is enlivened by photographs, and the website offers five different groups of maps: Numbers gambling, the various nightlife venues (nightclubs, buffet flats, and speakeasies), churches, sports, and events that occurred in January 1925. There is a great deal of information on this website. It can be searched by individuals’ names, by events or by locations. The text panels that are available fill in detail that can’t be shown on a map.

The difficulty, and maybe the advantage, of maps of historical events is that information is hard to present in a linear way. The data is somewhat scattered, and linking one fact with another is left up to the reader. Although this provides some freedom to the reader, it also relieves the historian of having to make those connections. Historical theses and conclusions are left out. Every history map I have read (and there are lots here that I want to go back and review later) leaves me wanting to know where I can get a book on the subject. Maybe that’s good, and that is the purpose of history maps. Or maybe it is better to leave the conclusions up to the visitor?

HIST 2P26: Assignment #2 Historical Websites

Historical websites offer a new and interactive way to experience and research history. Three such sites are interesting examples. https://www.medici.org is a website which links readers to a vast database (over 4 million items) of documents kept by the Medici family, covering everyday life in Florence from the 16th to the 18th century. https://ancestry.ca is a genealogical website which provides access to personal and private data, immigration records, war records, census records and many other items that can assist individuals trying to establish their ancestry and genealogy. https://darwinproject.ac.uk is a website linking to a database containing more than 8,000 letters written by or to Charles Darwin.

The Darwin Project is a British website managed by the University of Cambridge. Charles Darwin wrote and received letters throughout his lifetime in order to study and obtain information for his scientific observations, so access to these letters provides a view into Darwin’s scientific study, and also his personal world. The website offers users the ability to search the database, and also provides scholarly blogs based on academic research. There are materials directed at high school students, post-graduate students and professional historians. The authors of this website have recognized that many users will not have the knowledge or ability to search the database, and have made it easy to navigate, while offering some pre-packaged information about Darwin’s life and body of work. However, this website does not yet encompass all of Darwin’s correspondence, and meaningful research does require historical background and knowledge.

Ancestry.ca can be a useful website, containing much basic North American information. It is directed at individual amateur genealogical researchers, and encourages users to post their findings in a structured family tree. It offers a free trial period, but then requires a monthly fee to continue a subscription. This is frustrating because much of the information that the site contains can be found, free of charge, elsewhere. An example of this is Canadian Expeditionary Forces service records, which are available on government websites. Another drawback to this site is that genealogical information posted by individuals can be based on hearsay or recollection, without any substantiation. This can result in inaccuracies being passed on, and there is no way to dispute them. Marketing of the site, and of site products such as DNA testing, can be overbearing even once one has paid the monthly fee.

The Medici Project is the most ambitious website of the three, for several reasons. The Medici kept documents on just about everything connected with everyday life, so the volume of data is immense. The website describes its mission as serving as an online research institute, so it targets the scholarly community. The material it houses needs to be translated from Italian to English, and while some of that is done for the user, appreciating this site does require background historical knowledge. It is not as clearly laid out as the Darwin Project website, but offers more scholarly articles, descriptions of academic projects underway, and online courses for those interested. The fact that it provides free access to an important online research institute makes my one criticism seem almost minuscule, but given the subject matter the website could look prettier.

Moreover, compared with the Darwin Project website, it is more difficult to navigate. It requires a lot of mouse clicks to get where you want to go, as a colleague of mine pointed out, and these days scrolling is more popular and easier. The Darwin Project allows scrolling more freely. (Ancestry.ca is left out of this discussion, because it charges to navigate further than the first page.) The Darwin Project makes one feel that all of the information on the site is freely available, along with scholarly analysis and packaged information. The Medici website makes one feel that the scholarly institute owns all the data and is only going to allow glimpses at its own pleasure. This raises the question: who should own historical data, and what responsibility do they have for making it accessible to the public?

Intro Blog: Living History

My name is Cindy Allingham. At the age of 63, I have a slightly different viewpoint on digital history than many Brock students might. I attended the University of Toronto in the 1970s, studying European Medieval and Renaissance history, and continued to take part-time courses through the 1990s and 2000s towards my goal of a BA in History.

It is likely hard to imagine, but in the late 1970s a university education was not needed in order to easily secure a good job. I dropped out of school reluctantly to work in the Information Technology field, where employment was instant and lucrative. Initially I was trained as a COBOL programmer, but moved into middle management, where I built a career on anticipating technology trends, obtaining and implementing the right tools and equipment, and helping business people apply technologies to what had been manual work. I helped facilitate the use of mainframes, PCs, database tools, LANs, email and eventually the Internet in large organizations such as Canadian banks. Information management for corporations remained a lifelong specialization. Eventually, however, my knowledge became stale, and the market for my skills changed, and I wanted to learn something new.

My husband and I retired to Welland, and I am pursuing a degree in History at Brock. We are both interested in playing music in groups, and enjoy the peace and quiet of a smaller town. I have loved and listened to jazz and blues my whole life, and love to read. These interests often intersect, as I am also interested in American racial history and how music dominated its culture.

Last spring I began to research my great-uncle Sidney Edward Dudley, who died in WWI in France in 1917. Before he was shipped overseas he married Margaret Maud Haldenby; very little is known about her, but she did have two illegitimate children before she met my great-uncle. Their descendants have made contact with me and we have been pooling resources to try to find out more about her. This has sparked my interest in historical research on the turn of the 20th century, and shown me that digitized information can revolutionize the study of history, personal and public. For instance, I located an analogue photograph of the couple and was able to share it with Margaret’s descendants too. They never had a picture of her.

The Digital History Introduction reviews the question of whether technology applied to history is a good thing or a bad thing, and it examines the “promises” and “perils” of digital history in order to argue that, overall, the former outweigh the latter. Among the advantages mentioned are quantitative and expressive “promises” such as capacity, accessibility, flexibility, diversity, manipulability, interactivity and hypertextuality. The “perils” seem to be fewer: quality, readability, passivity and inaccessibility. These seem to be possible negative outcomes of the “promises”.

This kind of argument has taken place in many areas of technology over centuries of human development. I am old enough to remember a world in which digital technology did not exist, so it is easier to observe and understand its effect on the way we live and the way we think. For example, studying History prior to the 1980s required the ability to manually search thousands of card files at the university library for relevant books and journal articles. Compiling a bibliography and creating footnotes for essays were done completely by hand. Essays had to be written by hand or, for some privileged students, typed on a typewriter.

Doing these things required skills that most students no longer possess. Searching a card file has been replaced with sophisticated search engine software on a database; it can now be done anywhere provided electronic access is available, so it no longer requires students to travel to the library. Gone are the gruelling nights spent scouring the card drawers, retrieving books from the stacks, nursing paper cuts, and taking supper breaks. It can all be done easily from the comfort of home.

Bibliographies are created using software tools, and even basic word processing software automates the process of inserting footnotes and applying formal style standards. On the upside, students can spend more time reading and thinking about the content of their essays, rather than the form. On the downside, dependency on electronics to provide form leaves the student with little understanding of how it is done or why form is important. Some of my former colleagues would argue that this dependency causes a lack of investment and, ultimately, a lack of knowledge among users.

Perhaps a simpler illustration of this conundrum is the switch from pen to keyboard. Using a pen mainly requires penmanship, a skill which was taught in grade school in the 1960s. Many have argued that penmanship has become obsolete in favour of keyboard skills. Some students don’t write or print by hand at all, because everything they do is on digital keyboards. Is it lamentable that the skill of penmanship (being able to draw letters correctly) seems to be disappearing? Or is it just that the skill, like that of trimming a quill pen, is being discarded because it is no longer needed? How will education change when keyboard skills are no longer necessary because brain implants will allow us to transmit thoughts?

If we accept that digital history is inevitable, and that it is making profound changes in the way we think about, research and share history, we must also accept our own role in shaping the way information is managed, stored, arranged and used. We must also realize that we can help meet the challenges of information management only if we recognize what those challenges are, and this course will help us explore them.