Assignment 3 – Page 2 – Data Visualization in the Humanities

Palladio Gallery view (unfortunately, image URLs were not displaying properly)

For this assignment, I continued my investigation of journalistic coverage of Joe Paterno’s firing at Penn State University in 2011. In addition to the university newspaper sources I analyzed for our previous assignment, I added several non-collegiate journalistic sources to my corpus: articles from The New York Times, The Chicago Tribune, The Seattle Times, The Los Angeles Times, The Denver Post, and The Dallas Morning News. I classified my metadata into the following categories: institution, institution location, institution geolocation (latitude, longitude), institution connection to Penn State, article author, author gender, article title, article word count, date of publication, article classification upon original publication (e.g., ‘Opinions’), and a quantitative sentiment value for each article. The sentiment value was taken from Jigsaw and shifted by adding one to each Jigsaw score so that all scores fall between one and two. While my metadata covered many constituent aspects of my articles, I believe each category could be visualized in an epistemologically useful way. Sentiment and word count (my two numerical categories) could each be visualized to support arguments about the quantity and quality of discourse surrounding Paterno’s firing. These values could also be paired with author gender or geolocation to determine which regions (in general, since my own article selection still introduces subjective interference) or genders are writing more or less, and more or less favorably, about the event. Although publication date did not turn out to be very helpful for my visualizations, a larger corpus might support conclusions tying publication date (a measure of temporal proximity to the firing) to geographic location, institution classification, or institution connection to Penn State.
I also believe it is important to keep article title and author name visible in certain visualizations, as both can become pertinent to discursive analysis: a familiar author name can signal a particular bias, and a title gives a sense of an article’s tone.
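
The sentiment adjustment described above is a constant shift. A minimal Python sketch follows, assuming (as the one-to-two range implies) that raw Jigsaw scores fall between 0 and 1; the function names and article scores are hypothetical placeholders, not values from my corpus.

```python
from __future__ import annotations

# Minimal sketch of the constant sentiment shift.
# Assumption: raw Jigsaw scores fall between 0 and 1, so adding one
# maps them onto the range 1 to 2. All names and scores are hypothetical.


def shift_score(raw: float, offset: float = 1.0) -> float:
    """Return a single sentiment score moved up by a constant offset."""
    return raw + offset


def shift_scores(scores: dict[str, float], offset: float = 1.0) -> dict[str, float]:
    """Apply the same offset to every article's score."""
    return {article: shift_score(raw, offset) for article, raw in scores.items()}


if __name__ == "__main__":
    # Hypothetical raw scores standing in for Jigsaw output.
    hypothetical_scores = {"article_a": 0.25, "article_b": 0.5, "article_c": 0.75}
    for article, value in shift_scores(hypothetical_scores).items():
        print(article, value)
```

Because every score moves by the same offset, no article’s ranking relative to another changes; the shift only relabels the scale.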

Palladio Table view showing multiple layers of metadata

In both the Table and Gallery views in Palladio (Left: Table View, Above: Gallery View), I found the platform’s most important function to be keeping researchers from becoming too far divorced from the metadata and corpus in question. One of the most poignant comments in Johanna Drucker’s chapter is that “almost all information visualizations are reifications of mis-information” (Drucker 245). In my opinion, these two windows in Palladio guard against this on some level. While they do not necessarily alleviate concerns about data as capta (constructed and gathered by biased subjects), they do allow anyone who engages with a corpus and its metadata in the platform to see exactly where visualizations are coming from. The Table view accomplishes this by presenting metadata exactly as it was uploaded. The Gallery view (when linked to URLs) does the same, and can also let researchers return to the individual articles or documents of a corpus with a single click, as it did in my case. Essentially, each of these views helps researchers become familiar with a data set or corpus so that they can better understand the visualizations in Palladio’s other views.

Palladio Graph view visualizing article titles authored by women (and sentiment value in node size)

Palladio Graph view visualizing article titles authored by men (and sentiment value in node size)

In the Graph view (pictured twice above), I chose to visualize connections between author gender (M, F, and N/A in the case of collective publication) and article title, with node size governed by Jigsaw sentiment score. The node sizes were not especially illustrative in an argumentative, epistemological, or hermeneutic sense, which may be because the narrow range of possible sentiment scores keeps nodes at similar sizes. In addition, because the metadata comes from a journalistic corpus, it is understandable that the visualization does not show wild swings in sentiment from one article to the next (most articles conform to a conventional journalistic style). However, when node sizes are read alongside the tone conveyed by article titles, these visualizations begin to suggest a connection between author gender and article “feel.” Based on article titles in particular, female authors appear to have been more willing to tie Paterno’s firing explicitly to Sandusky’s sexual abuse, whereas male authors were more likely to use their platform to honor Paterno or to frame the event through a less criminal lens.

Palladio Map view showing institution geolocation and sentiment value

I also found the Map view (above) to be quite interesting in terms of article sentiment and the geolocation of publishing institutions. When I began to construct this corpus, I theorized that article sentiment would become more negative as distance from Penn State increased. However, the map suggests that some of the highest Jigsaw sentiment scores (denoting more “positive” vocabulary in the source document) came from institutions farther from State College (except in the southern United States). One possible explanation is that nearer, Big Ten-affiliated schools were more saddened by the symbolic loss of one of their conference’s most recognizable icons. It is also important to remember when looking at this visualization that the documents in question are journalistic and, for that reason, may not be ideal texts for sentiment analysis (although some articles are opinion pieces).

All in all, I believe the visualizations presented above are flawed in some respects but intellectually constructive in others. Importantly, Palladio’s Table and Gallery views help keep researchers aware of the “translation” process that occurs in the Graph and Map visualizations. That said, my visualizations, at least in terms of sentiment analysis, are hindered by the concealment and reduction that come with Jigsaw’s imperfect (and hidden) sentiment computation; in some sense, they can be seen as simple representations, or mis-representations, of sentiment. Where I believe my visualizations did succeed is in displaying some of the ambiguity and more nuanced aspects of my corpus. For this reason, I believe Drucker would consider some of them successful knowledge creators because of their “denaturalizing” effect on viewers. They are certainly novel ways of interrogating the journalistic discourse surrounding Joe Paterno’s firing, and can thus serve as stepping stones toward new understanding.

Regarding Drucker’s discussion of spatialization as a way of creating meaning, I believe the Map view can be used as a tool for knowledge creation. Because the map ties article sentiment to geographic proximity to the focal event, one can begin to form arguments about how location shaped attitudes toward the Paterno firing in local newspapers. Article titles (shown by hovering over a point) could also be added to these visualizations to aid this pattern recognition.

The dataset I used for this assignment was the same one I used for Voyant and Jigsaw: the metadata from the Death Row inmates’ last words corpus. The website provided a great deal of metadata that I thought would be interesting to investigate further. Although the last words themselves were extremely interesting to analyze, I was also intrigued by other aspects of the data, such as the age, race, and county of the Death Row inmates. These three aspects caught my attention because I thought that examining them alongside the inmates’ last words would let me paint a fuller picture of the inmates, and they allowed me to see how the inmate population broke down. I also chose these three aspects because, if there were any underlying biases in which prisoners were sentenced to death, I expected them to be evident in the prisoners’ race, age, or the kind of area they came from (such as a county with low socioeconomic status).

The screenshot above shows the racial breakdown of each Texas county that at least one of the Death Row inmates came from. I found this organization interesting because the counties in the middle, which connect to more than one race node, are more diverse in that they have inmates of more than one background. It seems reasonable to assume that the counties on the outside of each of the three race nodes, which are not connected to more than one race, are more segregated, at least as reflected in this inmate data. I found it interesting that noticeably more counties sat on the outside of the “White” node, with no connection to any other race, than on the outside of the “Black” or “Hispanic” nodes.
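
Palladio draws this layout visually, but the same grouping can be sketched in a few lines of Python. The sketch assumes the metadata can be exported as (county, race) pairs; the county names and rows are hypothetical placeholders, and the function names are illustrative rather than part of Palladio.

```python
from collections import defaultdict

# Hypothetical (county, race) rows standing in for the inmate metadata;
# these are placeholders, not records from the real dataset.
rows = [
    ("County A", "White"),
    ("County A", "Black"),
    ("County B", "White"),
    ("County C", "Hispanic"),
    ("County C", "Hispanic"),
]


def races_by_county(records):
    """Map each county to the set of race nodes it links to in the graph."""
    links = defaultdict(set)
    for county, race in records:
        links[county].add(race)
    return dict(links)


def split_counties(records):
    """Separate counties linked to several race nodes (the middle of the graph)
    from counties linked to only one (the outer ring around a single node)."""
    links = races_by_county(records)
    shared = sorted(c for c, races in links.items() if len(races) > 1)
    single = {}
    for county, races in links.items():
        if len(races) == 1:
            (race,) = races  # unpack the only race this county links to
            single.setdefault(race, []).append(county)
    return shared, single


shared, single = split_counties(rows)
print("Linked to more than one race:", shared)
print("Linked to a single race:", single)
```

Counting the counties listed under each race in `single` gives the comparison made above between the outer rings of the “White,” “Black,” and “Hispanic” nodes.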

This screenshot shows the breakdown of each race by age. The most interesting part of this visualization is that the white prisoners tended to be much older than the black and Hispanic prisoners. The white prisoners also had the widest age spread of any race represented.

In this screenshot, I paired the timeline function with the race of the prisoners, which shows how the racial breakdown of the prisoners on Death Row changed over time. From 1998 to 2006, the majority of prisoners executed appear to have been black, and there were many more executions during those years. Closer to the present day, however, the number of prisoners executed drops significantly and the prisoners seem somewhat more diverse. I suspect the number of executions dropped because of the legal issues and heated debates that arose around capital punishment.

I also created another timeline visualization, this time paired with the county each prisoner came from. This gave me insight into what some of these counties might be like and how they might have changed over time. I suspected that the counties most prisoners originated from may be more dangerous, or may have lower socioeconomic status and higher crime rates. Harris County had the most prisoners and, until recently, consistently accounted for a large share of them; today it seems somewhat less represented.

This screenshot is from Google Fusion Tables and shows age and race. I wanted to see how the Google Fusion Tables visualization would compare with Palladio’s. It looks similar to the Palladio visualization and tells me the same thing about the data: the white prisoners seemed to be much older, whereas the prisoners of other races seemed to be much younger, and the white prisoners had a wider age range than any other race represented.