Categories
Assignment 5

Assignment #5

Gephi is a powerful data visualization tool with comprehensive features, supporting calculation, formatting, filtering, and more. It also gives users an abundant choice of layouts and coloring, with multiple ways to classify nodes and edges. So in this assignment, I tried many combinations of layouts, node rankings, and partitions to see how different dimensions combine to show more information about the baptism of Native Americans.

After going through the information provided in each column of the csv file, I wanted to see how family members were related in baptism. However, the family relationships are not recorded in a usable format, and the file is sorted chronologically, which makes it hard to inspect the relationships among family members. Therefore, I attempted to encode all family relationships as edges, classifying them into four kinds: parent-child, couple, sibling, and relatives.
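The typed edge list described above can be sketched as a small script that writes an edges table for Gephi's spreadsheet import. This is a minimal sketch, not the author's actual file. It assumes that Gephi picks up `Source`, `Target`, and `Type` columns and keeps any extra column (here a hypothetical `Relationship`) as an edge attribute that can later be used for partitioning. The IDs in `FAMILY_LINKS` are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows: (source ID, target ID, relationship kind).
FAMILY_LINKS = [
    (12, 15, "parent-child"),
    (12, 13, "couple"),
    (15, 16, "sibling"),
    (13, 40, "relatives"),
]

# The four edge classes used in this post.
ALLOWED_KINDS = {"parent-child", "couple", "sibling", "relatives"}

def edges_csv(links):
    """Return CSV text with Source, Target, Type and a custom Relationship column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Relationship"])
    for source, target, kind in links:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown relationship kind: {kind!r}")
        # Family ties have no direction, so every edge is written as undirected.
        writer.writerow([source, target, "Undirected", kind])
    return buffer.getvalue()

print(edges_csv(FAMILY_LINKS))
```

Rejecting unknown kinds early keeps typos such as "sibbling" from turning into a fifth, accidental edge category in the partition.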

The first image shows the default layout after I imported the csv files of nodes and edges. Data import in Gephi is very flexible: it lets users edit the data tables and export the data. My original Gephi project broke and all the data was lost, but thankfully I had exported my edges table right after completing it. Each node represents a person, and each edge represents a family relationship between two people. The data consists of over 300 nodes and over 500 edges, so the graph looks very messy at first, and we can hardly get any useful information from it yet.

Then I partitioned the nodes by who baptized each person and applied the Force Atlas and Yifan Hu layouts, sizing nodes by their degree. From the graphs we can see that both Force Atlas and Yifan Hu place closely related nodes together, and each group of nodes is usually the same color. This makes sense, because relatives are more likely to be baptized by the same person. Then I used the Fruchterman Reingold layout to get a better view of the relationships among all the people.
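Sizing nodes by degree, as described above, means counting each person's edges and then mapping those counts onto a size range. This sketch assumes a simple linear mapping between a minimum and a maximum size. The values 10 and 50 are illustrative and are not necessarily Gephi's defaults, and Gephi's Ranking panel may offer other interpolations. The family in the example is hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Degree of every node that appears in at least one undirected edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def rank_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each node's value onto [min_size, max_size] (assumed ranking rule)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # every node has the same value: give them all the minimum size
        return {node: min_size for node in values}
    span = max_size - min_size
    return {node: min_size + (val - lo) / (hi - lo) * span for node, val in values.items()}

# Hypothetical example: one parent linked to three children.
family = [("parent", "child1"), ("parent", "child2"), ("parent", "child3")]
sizes = rank_sizes(degrees(family))
print(sizes)  # parent gets the maximum size, each child the minimum
```

Nodes without any edge never appear in `degrees`. In Gephi such nodes would simply receive the smallest size.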

When I partitioned the nodes by nation in the Yifan Hu layout, Gephi produced an interesting graph. Most people are from three nations: Delaware, Wampanoag, and Mahican. The top right corner consists mainly of green, the top left and bottom right mainly of purple, and the bottom left of blue. The clear separation between the three nations is reasonable, since family members usually belong to the same nation. But if we look closely at the boundaries between these sub-parts, the density of edges there is significantly lower than within each nation's partition. So, compared with the Yifan Hu graph above, we can assume that nation contributes more to the grouping of nodes than who baptized each person.
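The observation that edges are sparser across nation boundaries than within nations can be checked by counting edges whose endpoints share a nation. This is a minimal sketch on hypothetical toy data, not the assignment's spreadsheet. Running the same two functions on the real edge list with a `nation` lookup would make the visual impression quantitative. The same counts could also be computed with "baptized by" as the attribute, which would let the two groupings be compared.

```python
from collections import Counter

def within_nation_share(edges, nation):
    """Fraction of edges whose two endpoints belong to the same nation."""
    within = sum(1 for u, v in edges if nation[u] == nation[v])
    return within / len(edges)

def nation_pairs(edges, nation):
    """Count edges by the unordered pair of nations at their endpoints."""
    return Counter(tuple(sorted((nation[u], nation[v]))) for u, v in edges)

# Hypothetical toy data, not taken from the assignment's spreadsheet.
nation = {1: "Delaware", 2: "Delaware", 3: "Delaware", 4: "Mahican", 5: "Mahican"}
edges = [(1, 2), (2, 3), (4, 5), (3, 4)]
print(within_nation_share(edges, nation))  # 0.75
print(nation_pairs(edges, nation))
```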

Although the graph looks cleaner, it still provides limited information about the data. One advantage of Gephi is that users can compare different partitions and layouts. Therefore, I tried several different partitions and looked closely at one small group of nodes. Nodes are partitioned by baptizer, Eigenvector centrality, modularity, and nation. We can see that family members have a lot in common in baptism. In the graphs listed below, this group of nodes consists mainly of two sub-networks, which are connected by “Esther” in the middle. Those two sub-networks receive different colors in both the Eigenvector and nation partitions, so Gephi works well at separating nodes, and family members are usually closely related in baptism.

I also ranked the nodes by degree and looked closely at the four nodes with the highest degree. Interestingly, those four nodes have different features. Augustus has two wives, Ana Benigna and Esther, connected to him by two thick green edges. Salome has many siblings, Nicodemus has many children, and Abraham has a larger family tree. Although the four have differently structured family networks, all of them show that family had a large influence on the spread of Christianity.

Because I manually typed in all the edges for the graph, I noticed many interesting features in the data that Gephi does not reveal, because it does not take time into account. Since people are sorted by baptism date, those baptized earlier are usually parents. However, there are special cases in which parents were baptized later. Moreover, in many couples one spouse was baptized after the marriage (obvious for men with second wives), and many children were baptized at a young age. Therefore, we can assume that family relationships contributed a lot to the spread of Christianity.

I also ran into some odd problems when using Gephi. Apart from the failure to load my previous project, the percentages calculated when partitioning edges also seem wrong. The proportion of each kind of edge is right, but together they sum to only 1%.
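One way to sanity-check the edge-partition percentages is to recompute them from the raw categories. Real percentages should total 100. If the displayed numbers total about 1, that is consistent with fractions being shown with a % sign. This is only a guess at the cause of the odd Gephi output, not a confirmed diagnosis. The edge categories and the displayed values below are hypothetical.

```python
from collections import Counter

def partition_percentages(edge_kinds):
    """Percentage of edges in each category; the values should total 100."""
    counts = Counter(edge_kinds)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}

def looks_like_fractions(displayed_values, tolerance=0.01):
    """True if displayed 'percentages' add up to about 1 instead of 100."""
    return abs(sum(displayed_values) - 1.0) <= tolerance

# Hypothetical edge categories, not the real counts from the assignment.
kinds = ["couple", "couple", "sibling", "parent-child"]
shares = partition_percentages(kinds)
print(shares, sum(shares.values()))
print(looks_like_fractions([0.5, 0.25, 0.25]))  # True: fractions shown as "%"
```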

Above all, of all the data visualization tools we have learned this semester, Gephi is the most powerful. It provides comprehensive tools, flexible data manipulation, and more aesthetically pleasing features. Through this assignment I have learned many useful Gephi skills, but there are still many features I haven’t used, so I hope to learn and use more of them in future data visualization projects.


Assignment #5 (Omar G. & Bryan M.)

For this assignment, Bryan and I analyzed the data from the Baptized Indians Database using Gephi visualizations. Gephi helped reveal patterns and also allowed us to explore and manipulate the data to point out trends in the baptism relationships among Native Americans of the time. We started by making edges for the people assigned to us, IDs 175-225. Our visualization ended up with 97 nodes and 86 edges. Each node represents a Native American in our range, while the edges show the many interconnections among them. When we first saw the visualization below, we were confused, to say the least, about what Gephi was trying to show us. A couple of connections weren’t found on the spreadsheet, which made it hard to follow at first. Some connections are visible, but they are hard to follow because the lines cross one another. We then experimented with Gephi to get a better understanding of the data.

[Figure: http://humn2702018.blogs.bucknell.edu/files/2018/04/raw-version-1.pdf]

We first added color (modularity) to our visualization to break the data down further. By coloring the graph, Gephi lets the viewer see which group each person is linked with. This gives the viewer an easier graph to read than the one we started with, in which every node was the same color. You can now see the different communities within this visualization.

[Figure: http://humn2702018.blogs.bucknell.edu/files/2018/04/modularity.pdf]

We decided to break it down even further and look at the connections between the communities. To highlight the interaction between each person (node) in this visualization, we sized the nodes according to their class and rank. Furthermore, by looking at this visualization one can infer how Christianity spread back then: usually through marriage, which explains the connections between nodes of different colors.

[Figure: http://humn2702018.blogs.bucknell.edu/files/2018/04/indiansmarinekf.pdf]

The last feature we used was a visualization showing names and the various relationships between baptized people. It shows how each person was either related to or came in contact with the others. The color of each node depends on that person’s activeness within their community and in the spread of Christianity. I find this visualization more meaningful than the others. It shows the overlap of the communities and how they networked back then, and it also highlights the more important people in the groups.

[Figure: http://humn2702018.blogs.bucknell.edu/files/2018/04/noverlap.pdf]

In conclusion, Bryan and I believe Gephi is a helpful visualization tool, but it can be very difficult to operate at times. We both struggled with learning to use Gephi, but we are grateful to have learned another visualization tool. As time passed, Bryan became more comfortable with it than I did. Compared to other tools we’ve used, Gephi isn’t the most user-friendly, and because of this it can be tough to pull and analyze data with it. We quickly realized, though, that the more features we added to our data, the more knowledge we gained. We liked being able to play around with some of its tools. However, we both wish it were more interactive, like Voyant or Jigsaw, which both offered plenty of resources to play with and extract data from.


Assignment 5

Gephi visualization representing relations between Moravian missionaries in the mid Atlantic states in the 18th century

Gephi is a data visualization tool that lets the user input data and manipulate it in numerous ways. It can also visualize relationships within the data across multiple dimensions. Steve and I looked at 49 Native Americans who had been baptized by Moravian missionaries in the 18th century. Using the records with ID numbers 126 to 175, we broke their data down into ID number, name, relations to others within the data set, and the nation they belonged to. After putting the information into Gephi, we began testing the options Gephi offers for visualizing data. We focused on statistical calculations such as modularity, degree, betweenness, and Eigenvector centrality. These were essential to the project, since we were looking at a very tight-knit community. Paranyushkin writes about how all of this information can be allowed to “speak in its multiplicity,” telling its story through the information given (Paranyushkin 2011).

It was difficult to figure out the different aspects of Gephi without first experimenting with the options. To begin, we had to build our data. The first step was connecting each person in the table to their relatives. After building undirected links among all 49 people, we were able to begin visualizing the data. What we initially got was a very “bare bones” visualization of our information. After that, we started experimenting with the different options.

Gephi “bare bones” visualization

Steve and I next decided to add another dimension of information to the visualization: the Native American nation each person belonged to.

Gephi visualization with a focus on nation within a family

With this visualization, I was able to start answering the question of how important families were to the spread of Christianity among Native Americans. By making nation visible with the partition option, we could emphasize how a family may have been baptized together, potentially by the same baptizer, rather than individually. The edges between family members also show the top three nations in the data: the Wampanoag (41.6%), Delaware (31.5%), and Mahican (21.2%), which together make up the majority of the nations represented. This indicates that family played a large part in the spread of Christianity among Native Americans.

Gephi representation of group mingling

Keeping the nation color scheme, we saw that some families intermingled with other families. This showed that regardless of which nation someone came from, Christianity could still spread from family to family and from nation to nation. This would then lead to more baptized natives and more influential families that spread Christianity through their relationships with other families.

After exploring the different connections, we decided to look at the statistical concepts of modularity, degree, and betweenness. To isolate each concept, we changed the visualization to focus on each one in turn.

Salome appears as the biggest name, which means that Salome had the most connections to other family members. We used the average degree report to measure how often the natives were related to one another and got a value of 1.57. This means that our families were closely related but not very large. With this and the information gathered from the previous visualizations, we can infer that most Native Americans in the entire data set more than likely belonged to small families.
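For an undirected network, the average degree is the sum of all degrees divided by the number of nodes. Because every edge adds one to the degree of each of its two endpoints, this reduces to 2E/N. Plugging in the 89 nodes and 70 edges given for this network reproduces the reported value. The sketch covers only the undirected case. Isolated nodes still count in N, which pulls the average down.

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: every edge adds 1 to two nodes' degrees."""
    return 2 * num_edges / num_nodes

# Node and edge counts reported for this network.
print(round(average_degree(89, 70), 2))  # 1.57
```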

Visualization of Degree
Visualization of Modularity

Modularity represents the community structure within a network, based on how its nodes and edges cluster. Since we have 89 nodes and 70 edges, we used the Force Atlas layout. After setting the “Nodes” tab to rank by degree, we had to set a much larger size range so that the nodes appeared bigger than usual, since many of the families were so small. Paranyushkin writes that a modularity greater than 0.4 indicates that the partition produced by the algorithm reveals a distinct community structure within the data. To test this, we ran the modularity report in Gephi and measured 0.9. This supports Paranyushkin’s point that links within the communities are much stronger than the average links across the entire data set (Paranyushkin 2011).
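The modularity score reported above can be computed for any given partition with Newman's formula. The formula sums, over communities c, the term L_c/m minus (d_c/2m) squared. Here m is the number of edges, L_c is the number of edges inside c, and d_c is the total degree of c. Gephi's report also searches for the partition itself (we believe with the Louvain method). This sketch only scores a partition that has already been chosen. The example network of 35 separate couples is hypothetical. It suggests why many small, disconnected families push modularity toward 1, which may help explain the high value we measured.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph for a given partition.

    Q = sum over communities c of  L_c / m - (d_c / 2m)^2,
    where m is the edge count, L_c the edges inside c, d_c the total degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_total = defaultdict(int)
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_total[cu] += 1
        degree_total[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_total[c] / (2 * m)) ** 2 for c in degree_total)

# Hypothetical network: 35 separate couples, each couple its own community.
couples = [(2 * i, 2 * i + 1) for i in range(35)]
by_couple = {node: node // 2 for edge in couples for node in edge}
print(round(modularity(couples, by_couple), 3))  # 0.971
```

Putting every node into a single community gives Q = 0, the baseline against which the 0.4 threshold is read.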

Visualization of Betweenness

Finally, betweenness measures how often a node appears on the shortest paths between other pairs of nodes in a network. The higher the betweenness value, the more important the node is as a bridge within the network. We computed it with the “Network Diameter” option in Gephi, which showed that our network had a diameter of 8. With Gephi’s resizing tool, we could size each node by its betweenness value. Salome had the largest betweenness value, at 127.
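Both numbers mentioned above, betweenness and diameter, come from shortest-path searches. This sketch uses Brandes' algorithm for unweighted, undirected graphs. It counts each unordered pair of endpoints once and does not normalize. Whether this scaling matches Gephi's displayed values exactly is an assumption. The diameter here is the longest shortest path between any two connected nodes, which is how we read Gephi's report for a graph made of separate families. The toy graphs are hypothetical.

```python
from collections import defaultdict, deque

def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def diameter(adj):
    """Longest shortest-path length between any two connected nodes."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def betweenness(adj):
    """Brandes' algorithm; each unordered pair of endpoints counted once."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # undirected: each pair seen twice

path = adjacency([("a", "b"), ("b", "c"), ("c", "d")])
print(diameter(path), betweenness(path))  # 3, b and c each 2.0
```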

There was definitely a learning curve with Gephi. Steve and I each had different problems at different times when visualizing our data. More often than not, the program would lose some of its functions, like my sudden inability to use the drag tool, or Steve losing the Layout panel entirely. Despite its random glitches and losing our work before we had saved, Gephi is a very powerful tool for visualizing information in multiple dimensions. Once you build the first visualization, it becomes almost a waterfall effect: each visualization reveals and builds on another dimension that can be investigated.

 

 

Works Cited

Paranyushkin, Dmitry. “Identifying the Pathways for Meaning Circulation using Text Network Analysis.” 2011.

 


Assignment 5

For this assignment, to portray the baptism records kept by the Moravian missionaries in the 18th century, I created a Gephi visualization. It focuses on the gender of those who were baptized and on how many connections they had to others in the dataset. My visualization is featured below:

My visualization has 376 nodes and 70 edges. I used all of the nodes in the dataset, but I created edges only for the 50 people I was assigned. I think the visualization presents the data I included in a useful way. The edgeless nodes surrounding the center also show how big the dataset was, and that the edges I included cover only a portion of the people who were baptized. However, creating edges for only these 50 people allowed me to visualize specific individuals more clearly.

I represented the data primarily through the color of the nodes and edges and through node size. Nodes are either pink or blue: pink for females and blue for males. An edge is pink if it connects two females, blue if it connects two males, and purple if it connects a male and a female. I found the gender aspect of this data intriguing. I wanted to see whether well-connected individuals were typically male or female and what that could mean for the data as a whole. Node size represents degree, that is, how many connections a person has: the more connections, the bigger the node. The nodes on the outside are all equally small because I did not enter their edges into the Data Laboratory. All the edges I added are visible in the center of the visualization.
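The edge-coloring rule described above can be stated as a tiny function. The post does not say how the colors were actually applied inside Gephi, so this only restates the rule. The color names come from the post, and the exact shades used in the visualization are unknown.

```python
# Color names from this post; the exact shades used in Gephi are not stated.
NODE_COLOR = {"female": "pink", "male": "blue"}

def edge_color(sex_a, sex_b):
    """Same-sex edges take the shared node color; mixed edges are purple."""
    if sex_a == sex_b:
        return NODE_COLOR[sex_a]
    return "purple"

print(edge_color("female", "male"))  # purple
```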

So, what does this data show? Elizabeth, Beata, Zipora, Christiana, and Esther are the females with the most connections, and Petrus, Benjamin, Nathanael, and Thomas are the males with the most connections. That makes five very well-connected females and only four well-connected males in this visualization. Of the 50 people I created edges for, 31 were female and 19 were male, which is evident from the larger number of pink nodes in the center of the graphic. This tells me that the females overall had more connections than the males. It made me curious about the roles of women in these communities and how women were viewed by both the missionaries and the males in the dataset. The women seem to play a very important role, given how many of them are so well connected to others. Also, most of the largest nodes, representing the people with the most connections, tended to be related to others who also had many connections. For example, Zipora’s father is Petrus, her son is Nathanael, and her husband is Benjamin. All four had some of the highest numbers of connections among the 50 people I included. This made me wonder about the roles of specific families. Some families may have been more active in the community, or may have played roles that others considered very important.

Figuring out how to create this visualization in Gephi was very challenging at times. Once all the data was in, though, it was fun to play around with the many layouts, statistics, and partitions that changed the look and meaning of the nodes and links, just as we read about in Meirelles. However, I wish Gephi allowed more interaction and more features on the viewer’s end, because this dataset would be very interesting to explore further. I would also like to see more dynamic visualizations. With such intriguing stories, more freedom and creativity in the design would make it easier to captivate the viewer. As we explored in the Segel and Heer reading, this would contribute more to telling the story of the people who were baptized, rather than just showing them as colorful nodes. For example, a time feature in the visualization would offer another way into their stories. That being said, I think Gephi is the most interesting platform we’ve worked with so far, and I really enjoyed the challenge. I was able to explore this data thoroughly and felt I had many possible directions to take.


Assignment 5

In this assignment, I created and analyzed network graphs by using the records of the Moravian missionaries in the mid-Atlantic states in the 18th century. After briefly looking at the data, there were a few questions that I wanted to explore:

  1. Were proximity and social connections important to the spread of Christianity?
  2. Who were the important people in the network?
  3. Were there any patterns in how the baptizers chose people to baptize?

To input the data into Gephi, I first ran a script on the original spreadsheet to generate an edge table, resulting in a database with 377 nodes and 490 undirected edges. This is faster than extracting the data manually. However, because of my limited knowledge of text analysis, marriage and other relationships are all represented by the same kind of edge in my data (I feel like Drucker would not like this at all). This might hinder further study of the data, but I do not think it greatly influenced my interpretations for the questions I proposed.
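The post does not show the script, so here is only a sketch of the general idea, under loudly stated assumptions. It assumes a hypothetical spreadsheet with an `ID` column and a free-text `Relations` column that mentions other people by their numeric IDs. The real column names and text format are unknown. Every ID mentioned in a row becomes one untyped, undirected edge, which reproduces the limitation described above: a marriage and a parent-child tie end up as the same kind of edge.

```python
import csv
import io
import re

# Hypothetical spreadsheet excerpt; the real columns and wording are unknown.
SAMPLE = """ID,Name,Relations
1,Person A,wife of 2; mother of 3
2,Person B,husband of 1
3,Person C,daughter of 1 and 2
4,Person D,
"""

ID_PATTERN = re.compile(r"\b\d+\b")

def undirected_edges(rows):
    """One untyped edge per pair of IDs that mention each other."""
    known = {row["ID"] for row in rows}
    edges = set()
    for row in rows:
        for other in ID_PATTERN.findall(row["Relations"] or ""):
            if other in known and other != row["ID"]:
                # frozenset makes (1, 2) and (2, 1) the same undirected edge
                edges.add(frozenset((row["ID"], other)))
    return edges

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
edges = undirected_edges(rows)
print(len(rows), "nodes,", len(edges), "edges")
```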

 

Colors: nation. Sizes: betweenness centrality (left) & eigenvector centrality (right).

 

Above are force-directed (Force Atlas layout) representations of the network. In both graphs the nodes are colored by nation. They are sized by betweenness centrality in the left graph and by eigenvector centrality in the right graph. In both graphs there are clusters of color, which indicates that proximity did play a role in the spread of Christianity. There are also connections between clusters of different colors. This could imply either that Christianity expanded through marriage, or that people married because they shared the same faith, which in turn solidified the position of Christianity in the community.

I made two slightly different graphs because a person can lie on the shortest paths of many connections and still be somewhat insignificant, which shows in the change in size of some nodes between the two graphs. Conversely, a person can be influential in a large community that sits only at the periphery of an even larger one, which might result in a low betweenness score. Hence, to find the most important people in the network, I used the graph on the left and filtered out nodes with eigenvector centrality < 0.5. The result is shown in the graph below.

 

Important people in the network.
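The eigenvector filter described above can be sketched with power iteration. This version scales scores so that the most central node gets 1.0, which is what makes a fixed 0.5 cutoff meaningful. We assume Gephi's scaling is similar, but we have not confirmed it. Iterating with A + I instead of the adjacency matrix A gives the same eigenvector and avoids oscillating on bipartite shapes such as parent-child chains. The toy graph, a triangle with one pendant node, is hypothetical.

```python
def eigenvector_centrality(adj, max_iter=500, tol=1e-10):
    """Power iteration, scaled so the most central node scores 1.0.

    Iterating with (A + I) instead of A has the same eigenvectors but avoids
    oscillation on bipartite graphs such as parent-child chains.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(nxt.values())
        nxt = {v: s / top for v, s in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x

def keep_important(scores, threshold=0.5):
    """Drop nodes whose eigenvector centrality is below the threshold."""
    return {v for v, s in scores.items() if s >= threshold}

# Hypothetical toy graph: triangle a-b-c with d attached only to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
scores = eigenvector_centrality(adj)
print(sorted(keep_important(scores)))  # ['a', 'b', 'c']
```

On a graph made of many separate families, this scaling pushes scores in smaller components toward zero, so the filter mostly keeps members of the best-connected families.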

 

I think the baptizers were somehow also aware of the importance of these people. I created two more graphs using the Circle Pack layout, with nation as the first hierarchy level and baptizer as the second. In both graphs, the nodes are sized by eigenvector centrality. In the left graph they are colored by nation, and in the right graph by baptizer.

Circle Pack Layout. Nations (left) & Baptizers (right).

 

Most of the baptizers seem to have focused on maximizing the number of regions they went to. In addition, they baptized the people with high eigenvector centrality in each nation. Although this is not rigorous, I spot-checked the dates for some of the big nodes and the nodes surrounding them and found that the bigger ones tended to be baptized earlier. Thus, these baptizers seem to have followed a pattern of baptizing the most important people in as many regions as possible.

Overall, I think that Gephi is an amazingly versatile visualization tool with a quite usable interface. However, I found some aspects of Gephi limiting. For example, I was not able to embed a timescale in my visualizations, which is one of the four principles of network visualization mentioned in the 3rd chapter of Lima’s book. Nor could I easily place the nodes on a map, as I could in Palladio.

While making these network graphs, I was also aware that they are not a representation of the data but rather a story that I wanted to tell about the data (again, Drucker would disagree with my method). I posed questions about the data that I sought to answer. In other words, I chose to omit aspects of the data that I did not care about. I do not think that data visualizations can ever be objective; even the act of collecting and organizing data contains biases in itself. However, as computer scientist Bret Victor said, “[a]n active reader doesn’t passively sponge up information, but uses the author’s argument as a springboard for critical thought and deep understanding.”


Assignment 5

Final graphic representation of small community

In this visualization, we are observing a small subgroup of Indians that appeared in my data of 75 people. To arrive at this data, the first step was tackling Gephi. Gephi turned out to be far easier to use than expected: with a little tinkering, all of the basics are readily available and fairly self-explanatory. After creating roughly 80 edges for my nodes, I used the Yifan Hu layout. I wanted it not only to make the data easier to inspect, but also to help find anything interesting in the rather sparse set of edges. The visualization also encodes multiple dimensions of the data, namely Indian nation, sex, and community connections (spousal, generational, and sibling). With all of these inputs the graph still looked uninteresting, so I chose a small group of Delaware Indians to focus on. For analysis, the size of the names/nodes is proportional to degree. The color of the names represents sex: orange for male and purple for female. The color of the nodes represents the nation each person is affiliated with; most important for us are the purple Delaware Indians and the blue Mahican Indians. Lastly, the color of the edges represents the relationship: green for spousal, purple for parent to child, and blue for siblings. I removed some edges for the sake of clarity. I found our three calculations (modularity, degree, and eigenvector centrality) mostly inconclusive. Degree is well represented, and so is modularity, through the communities displayed. However, eigenvector centrality failed to make the visualization any more descriptive or interesting, so I left it out. Our story focuses on the family trifecta of Petrus, Nicodemus, and Gideon. These three brothers all appeared within my range of 75 in some way. I think it’s just fun to observe that each went and did his own thing in life. Although it is difficult to visualize, each went his own way, as shown by each of them dying in a different place.
Nicodemus ended up in Nain, where he had 5 children. He was married to Lucinda, shown next to him in the visualization. Nicodemus was a randy fellow, as he had the most offspring of the lot. He also appears to have been fairly successful in what he made of himself. One of his children, Zacharias (pictured in the top right), married the daughter of a Mahican. Petrus has a similar story. He married a Theodore, which may be an error in the data; otherwise, these Indians were very forward-thinking. Either way, Petrus had two children: one is explicitly described as adopted, and the other may be as well. If this isn’t an error in the data, perhaps they were actually homosexual. Even more impressive is that Petrus clearly made a good name for his family as well.

One of his adopted daughters, Abigail, married a man who had previously been Mahican and already married; after his divorce, he married Abigail. Of course we’ll never know what happened. We could even postulate that whatever Petrus had going for himself was enough for a man to divorce his first wife, switch to Petrus’s camp, and then marry one of his daughters, all while Petrus may have been gay in that era. None of that may be true, but exploring these what-ifs is what I think visualization is about. And then there’s Gideon, who simply had a good life with a wife and some kids. I think the story I found is interesting precisely because it doesn’t show anything incredibly interesting. It simply shows a small merging of people who likely had no huge impact, yet we can watch this little story play out with just a few data points.

Size here is represented by eigenvectors and color by modularity

 


Assignment 5

Using Gephi, I analyze the baptismal relationships among Native Americans. From the initial dataset of 375 nodes, I closely analyzed the first 25 nodes, each of which represented a unique Native American.

The main goal of these visualizations is to better understand the complex relationships among the various elements. As Isabelle Meirelles writes in Design for Information, “Node-link representations use symbolic elements to stand for nodes and lines to represent the connections between them” (55). Using Gephi, I created a structure of nodes and link connections.

The given dataset was a multidimensional database of Native American baptisms. In Gephi, each element, or person, is represented by a node, and I connected the nodes with edges based on their relationships to one another. These edges specifically represented husband, wife, brother, son, stepdaughter, widow, or null.

With these edges held constant across the visualizations, I examined three dimensions closely: modularity, degree, and Eigenvector centrality.

 

Modularity –

The first step in creating a modularity-based visualization was to color the nodes by modularity class. To further emphasize differences between classes, I also sized the nodes by rank.


Color distinguishes the modularity classes from one another. With the colors, the reader can see the main clusters and the different groups the Native Americans belonged to. Since edges were not created for the full data set, many nodes are left gray in the visualization, which shows that they are disconnected from the rest of the data set.

Because few groupings stand out in the red-through-gray modularity color scale for the nodes, the most striking part of the visualization is the edges. They show the various relationships between the baptized Indians, or adjacent nodes.

 

Degree –

For the degree visualization, the nodes are colored by the nation of the baptized Indian and sized by degree. This graphic tells the story of how nation and the relationships between nodes are connected.

Each node is colored based on nation.

The size of each node is based on the degree of the element.

With the data partitioned and ranked this way, the resulting visualization is shown below.

From this, the reader can interpret how baptized Indians are related to one another through node size (degree), node color (nation), and edge color (relationship between nodes).

 

Eigenvector –

For the Eigenvector visualization, the nodes are colored by their Eigenvector centrality. This graphic shows how Eigenvector values relate to each node’s connections with adjacent nodes.

Each node is colored based on Eigenvector calculation.

Each node is sized based on the Eigenvector statistical analysis as well.

In this visualization, the Eigenvector calculation takes into account not only how many adjacent nodes each node has, but also how central those adjacent nodes are themselves.
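Stated as an equation, using the standard definition of eigenvector centrality (not taken from Gephi's documentation): a node's score is proportional to the sum of its neighbors' scores. Here $A$ is the adjacency matrix, $N(v)$ is the set of nodes adjacent to $v$, and $\lambda$ is the largest eigenvalue of $A$. A node linked to a few highly central people can therefore outscore a node with more, but peripheral, links.

```latex
x_v = \frac{1}{\lambda} \sum_{u \in N(v)} x_u
\qquad\Longleftrightarrow\qquad
A\,\mathbf{x} = \lambda\,\mathbf{x}
```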

Overall, this visualization is difficult to read, and it would be stronger if there were more connections among the numerous data points. The number of nodes is disproportionate to the number of edges, and overall the set doesn’t create a very meaningful visual.

This is my main disagreement with visualizations on the Gephi platform. The platform is not user-friendly and is extremely difficult to work with as an author. The statistical logic and setup are complicated, which results in extremely complex visualizations that may be too sophisticated for readers to fully understand.

In addition, unless the visualizations are exported with the Sigma.js extension, they are static rather than interactive. This makes it even harder to analyze the data further in visual form.

As the reader can see from these visualizations, it is very difficult to interpret much from such a small dataset. As Meirelles discusses, “most problems faced by node-link representations are caused by the occlusion of nodes and link crossings, which obliterates the structure it is supposed to reveal” (56). Because this dataset has many gaps, the full narrative could not be visualized. Overall, the visuals are revealing about the connections between the various baptized Indians, and the colored edges tell a story about those connections. However, without the full extent of the connections between all nodes, the visualizations cannot tell the full story that the Moravian missionaries were trying to capture in their records.

Assignment 5 – Steve

As Paranyushkin observes, the ability to visualize relationships between multiple dimensions of data “unlocks the potentialities present” by seeing data in “a non-linear fashion, opening it up for interpretations that are not so readily available” (Paranyushkin 2011). Gephi, a network analysis and visualization tool, is invaluable in helping researchers generate detailed network graphs and metrics that reveal relationships which may otherwise remain hidden in data or text. By manipulating different options within Gephi, researchers can create a variety of relationship maps, each emphasizing different aspects of the relationships within the data. This flexibility opens the door to new meanings and interpretations of the data.

The process my partner (Katie) and I took to learn Gephi included viewing video tutorials on the Gephi website (https://gephi.org/) and reading the training material that walks users through ‘starter’ projects from beginning to end (e.g., http://www.martingrandjean.ch/gephi-introduction/). These were helpful for novice users like ourselves, though we feel Gephi is an advanced tool that expects users to already be acquainted with the concepts and metrics behind network graphs. It is important to note, however, that even with the help of these tutorials, I sometimes could not get Gephi to work properly on my computer. Luckily, Katie’s laptop was able to produce the visualizations we wanted (so most of these screenshots come from her laptop).

Once we were comfortable enough to begin using Gephi, we built the nodes and edges data, which is the cornerstone for driving graphs and statistical metrics within the application. The data used for this project consisted of a subset of the records kept by 18th-century Moravian missionaries on Native Americans baptized in the Mid-Atlantic states (as part of spreading Christianity, each missionary wrote down key data on each person baptized, including name, date, location, and family relations). For our edge data, we focused solely on the family relationships among the baptized Native Americans. Building the edge data was time-consuming because it involved reviewing records and creating links manually. Once this was completed and loaded into Gephi’s Data Laboratory, a network graph was immediately produced, below.

As the Gephi training modules explain, this initial graph is meant to show only a basic network model, and at this point it is up to researchers to explore relationships in more detail and variety using the power of Gephi’s visualization options. As a first step, we chose to produce a view that overlays the node labels onto the graph so that the context about what the graph represents is shown, and colored the edges to present a different aesthetic, below.

For the next view, we expose another dimension of the data by including the ‘nation’ property.  This view helps visualize if any one nation was more likely than others to have family members baptized together (rather than individually).  This would help explore the importance of close family in the spread of Christianity across the Native American nations.  Katie and I made the nation visible by selecting the ‘partition’ option under the ‘appearance’ widget, and selected ‘nation’ as the attribute to color, with the result below.

The edges in the graph are now colored by nation. The top three nations represented by the family relationships in the graph are Wampanoag (41.6%, purple), Delaware (31.5%, green) and Mahican (20.2%, blue), which together represent 93.3% of the total population in my dataset. This may indicate that family was an important aspect of the spread of Christianity at the time.
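Gephi's Partition panel lists the share of nodes in each category next to its color; the three shares quoted above do sum to 93.3%. As a rough illustration of where such percentages come from, this Python sketch counts category values in a node table with `collections.Counter`. The records are hypothetical stand-ins, not the real dataset.

```python
from collections import Counter


def partition_shares(nodes, attribute):
    """Percentage of nodes in each category of `attribute`, largest first."""
    counts = Counter(node[attribute] for node in nodes)
    total = sum(counts.values())
    return [(value, round(100 * n / total, 1)) for value, n in counts.most_common()]


# hypothetical records standing in for the node table
nodes = [
    {"label": "A", "nation": "Wampanoag"},
    {"label": "B", "nation": "Wampanoag"},
    {"label": "C", "nation": "Delaware"},
    {"label": "D", "nation": "Mahican"},
]
print(partition_shares(nodes, "nation"))
```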

To spatialize the network, Gephi offers several layouts. The ForceAtlas 2 layout makes communities within the network visible by pulling connected nodes closer together and pushing unconnected nodes apart. The result is below.

This graph is interesting because it clearly shows small pockets of members grouped together in family units, with little connection between them. At first glance this might not look like a rich network with many relationships. However, I believe this result is what one would expect, since the data focuses on family relationships between the people baptized in my dataset. I believe the Force Atlas layout, shown below, gives a nicer view of the groups of families who were baptized and of how these families have little to no family relationship to each other.

We kept the nodes colored by ‘nation’ in this visualization, which shows that most families are made up of members from the same nation, though interestingly a few families include members from different nations. Perhaps baptism helped bring people across nations together, or families that already included members from different nations were more inclined to be baptized.

Next, we explored the statistical metrics computed from my data: degree, modularity, and betweenness. In our dataset, there are 89 nodes and 70 edges. Degree is the number of edges attached to a node, which measures its connectedness to other nodes in the dataset. We wanted to scale the nodes by their degree, so that the nodes with the most relations would be the largest. To do this, we kept the Force Atlas layout, selected the ‘nodes’ tab under the ‘appearance’ widget, and set ‘ranking’ to ‘Degree’. We entered a large range for scaling the nodes to emphasize the results (since we expected most families to be roughly the same size): the minimum size was 5 and the maximum size was 200. The result is below.
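The Degree ranking can be approximated in a few lines of Python. This sketch counts undirected degree from an edge list and then assumes the sizes are interpolated linearly between the chosen minimum (5) and maximum (200); Gephi's exact interpolation is not shown here. The family labels and edges are hypothetical, not the real data.

```python
from collections import Counter


def degrees(edges):
    """Undirected degree: every edge adds one to each of its two endpoints."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def scale_sizes(values, min_size=5.0, max_size=200.0):
    """Map each value linearly onto [min_size, max_size] (assumed interpolation)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # all values equal: give every node the minimum size
        return {k: min_size for k in values}
    return {k: min_size + (v - lo) / (hi - lo) * (max_size - min_size)
            for k, v in values.items()}


# hypothetical family edges
edges = [("mother", "son"), ("mother", "daughter"), ("mother", "father"), ("son", "daughter")]
print(scale_sizes(degrees(edges)))
```

Here the mother (degree 3) gets size 200, the father (degree 1) gets size 5, and the two children (degree 2) land halfway at 102.5, which is why a wide min/max range makes the best-connected family members stand out.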

Salome has the highest degree, which means she has the most family members connected to her, followed by Petrus, Caritas, Ruth, Augustus and Gideon. We also ran the Average Degree report in the ‘Statistics’ tab, which produced a value of 1.57. This indicates that my dataset contains either mainly small family groups, or larger families balanced by a fair number of individuals with no family connection. Graph 4, which shows clear communities of small families, suggests that the network consists mainly of small family groups.
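The 1.57 figure can be checked by hand. In an undirected graph every edge adds 1 to the degree of each of its two endpoints, so the average degree is 2E/N. With the 89 nodes (N) and 70 edges (E) reported above:

```latex
\bar{k} = \frac{2E}{N} = \frac{2 \times 70}{89} \approx 1.57
\qquad\text{versus}\qquad
\frac{E}{N} = \frac{70}{89} \approx 0.79
```

The second value is what averaging only in-degree (or only out-degree) would give if the edges were treated as directed, so the reported 1.57 is consistent with the family edges being treated as undirected.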

The next metric is modularity, which shows the community structure within the network. Nodes that are connected to one another more than to the rest of the network are viewed as belonging to the same community. Paranyushkin states that a modularity measure greater than 0.4 indicates that the partition produced by the modularity algorithm can be used to detect distinct communities within the network: some nodes are more densely connected to each other than to the rest of the network, and their density is noticeably higher than the graph’s average (Paranyushkin 2011). For this calculation, we ran the ‘Modularity’ report, which showed a modularity value of 0.9 and 21 communities. Consistent with Paranyushkin’s threshold, this network’s modularity of 0.9 (well above 0.4) indicates distinct communities (21 according to Gephi’s count), as I noted earlier based on my fourth and fifth graphs. The graph below visualizes my network’s modularity and was produced by ranking on ‘modularity class’ in the ‘appearance’ widget, using the ForceAtlas2 layout.
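Gephi's Modularity report first searches for a good partition (it uses the Louvain method of Blondel et al.) and then reports the modularity score Q of that partition. This Python sketch only does the second step: it scores a given partition of a hypothetical unweighted, undirected graph with Newman's formula, Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of an unweighted, undirected graph for a given partition.

    `community` maps each node to a community label.
    """
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# two hypothetical families (triangles) with no ties between them
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
families = {"a": 1, "b": 1, "c": 1, "x": 2, "y": 2, "z": 2}
print(modularity(edges, families))  # 0.5
```

For k completely separate groups with the same number of edges each, the formula gives Q = 1 - 1/k, which is why a graph made of many small, unconnected families, like the 21 communities above, can score close to 1.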

To compare different ways of viewing modularity, below is another community visualization, this one produced with the Fruchterman Reingold layout, which seems ‘prettier’ than graph 7.
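For readers curious about what a force-directed layout such as Fruchterman Reingold actually computes, here is a heavily simplified Python sketch of the algorithm's core idea: every pair of nodes repels with force k²/d, connected nodes attract with force d²/k, and a cooling "temperature" caps how far nodes move each round. It is not Gephi's implementation, which has its own parameters; the frame size, iteration count, and cooling rate here are arbitrary assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Simplified Fruchterman-Reingold layout; `nodes` is a list of labels."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10  # caps how far a node may move per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # every pair of nodes repels with force k^2 / d
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # connected nodes attract with force d^2 / k
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # move each node along its net force, limited by the temperature,
        # and keep it inside the drawing frame
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / d * step))
        temperature *= 0.95  # "cool" the system so the layout settles

    return pos


layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
for name, (x, y) in sorted(layout.items()):
    print(name, round(x, 2), round(y, 2))
```

The balance between the two forces is what turns disconnected family groups into separate clusters: within a family attraction wins, while between families only repulsion acts.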

 

The last metric we calculated in Gephi is betweenness. Betweenness indicates how often a node lies on the shortest paths between pairs of other nodes in the network. The higher the betweenness value, the more important the node is as a connector for the entire network. Gephi calculates betweenness centrality under the ‘Network Diameter’ option in the Statistics tab. Gephi reported that my graph has a network diameter of 8, meaning the longest shortest path between any two connected nodes is 8 edges. To visualize betweenness centrality, Gephi can resize nodes according to their betweenness value: the more central the node, the bigger its size. For this view we used the Force Atlas layout and set a minimum size of 10 and a maximum size of 100 for scaling the Betweenness Centrality metric. The result is below.
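Betweenness as described above is usually computed with Brandes' algorithm. The Python sketch below handles unweighted, undirected graphs: one breadth-first search per node counts shortest paths, then dependencies are accumulated backwards from the farthest nodes. It returns raw (unnormalized) scores, halved because each pair of nodes is counted once from each end, and the largest BFS distance it sees is the network diameter. Whether Gephi's reported values use exactly this scaling is an assumption; the graphs are hypothetical.

```python
from collections import deque


def betweenness_and_diameter(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to the set of its neighbours.
    Returns (raw betweenness per node, diameter in edges).
    """
    bc = {v: 0.0 for v in adj}
    diameter = 0
    for s in adj:
        # breadth-first search from s, counting shortest paths (sigma)
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        diameter = max(diameter, max(dist.values()))
        # walk back from the farthest nodes, accumulating dependencies
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each unordered pair was counted once from each end
    return {v: b / 2 for v, b in bc.items()}, diameter


# hypothetical graph: a chain of four relatives, a-b-c-d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores, diam = betweenness_and_diameter(adj)
print(scores, diam)
```

In the chain, b and c each sit on two shortest paths (b on a-c and a-d, c on a-d and b-d), while the endpoints, like Theodora above, score zero.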

We also chose to add the betweenness values as label attributes for the nodes so we could read the actual measures Gephi calculated – Salome was the largest node with a betweenness value of 127, and Theodora (and others) had the smallest betweenness value of zero.

As for what did not work well in Gephi, I would say the tool still has some maturity problems. For example, there were times when the application lost my work and I had to restart. In another case, I mistakenly removed the ‘Layout’ widget, and it took Katie and me quite some research to figure out how to reinstate it. On the whole, however, these glitches can be overlooked because Gephi is quite powerful. What I found most compelling was the relative ease with which graphs, along with their key metrics, could be produced. With a few clicks, we were able to visualize our data in ways we could not have before. This kind of tool lets researchers quickly identify connections or ideas hidden in data that may open the door to new research paths. As Paranyushkin describes, unlocking new meanings within data becomes possible with a network visualization tool like Gephi because “it allows the text to speak in its multiplicity”.

Works Cited

Paranyushkin, Dmitry. Identifying the Pathways for Meaning Circulation Using Text Network Analysis. Nodus Labs, Berlin, October 2011.

 

Assignment 5

The goal of this assignment is to understand the data in the Baptized Indians database using Gephi visualizations. I created edges for the people with IDs 276-325, so although all my visualizations contain 375 nodes, they have only 75 edges. Below is the default layout after I finished entering the 75 edges into the database.

Default Layout

In this visualization, I can only see a few edges between nodes, most of which are thin while two are noticeably stronger. The graph shows almost nothing else. From this point, I begin to add features to this visualization.

Color-Modularity

First, I add modularity as a color attribute to this graph. This attribute groups people into communities based on the edges I added earlier, so I can now see the different small communities inside this large group of people in different colors. Moreover, the gray nodes are people who are not connected to anyone else. They may well have relationships with others, but since I only added edges for nodes with IDs 276-325, those relationships are not shown in this visualization.

Color-Modularity, Layout-ForceAtlas

To view the inner relationships more clearly, I use Force Atlas as the layout. In this visualization, I can easily distinguish the different small communities, in each of which the nodes form a connected tree. The purple group at the top left contains the most nodes, which suggests that Magdalene, the center node of that group, played an important role in that community.
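The observation that each small community forms a connected tree can be checked rather than eyeballed. In an undirected graph without duplicate edges, a connected group of n nodes is a tree exactly when it contains n - 1 edges. This Python sketch, with hypothetical IDs and edges, finds connected components by breadth-first search and applies that test; a single unconnected (gray) node counts as a trivial tree.

```python
from collections import defaultdict, deque


def components(nodes, edges):
    """Split an undirected graph into connected components (lists of nodes)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(comp)
    return result


def is_tree(component, edges):
    """A connected component is a tree iff it has exactly len(component) - 1 edges."""
    members = set(component)
    inner = sum(1 for u, v in edges if u in members and v in members)
    return inner == len(component) - 1


# hypothetical IDs: a family of three, a family with a cycle, one unconnected person
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (4, 5), (5, 6), (4, 6)]
for comp in components(nodes, edges):
    print(sorted(comp), "tree" if is_tree(comp, edges) else "has a cycle")
```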

Color-Ranking-Degree, Layout-ForceAtlas

Furthermore, I use the ranking feature with degree as the ranking attribute to generate this visualization, in which nodes with deeper colors have higher degree. Here I can clearly see that the center nodes of the left group and the top left group have the greatest influence on the relationships, since their degrees are the highest. I can also see that the nodes in the middle of the visualization are the ones with no connections.

Color-Nation, Layout-ForceAtlas

Next, I try the nation attribute. This attribute seems really meaningful, since in most communities the people belong to the same nation. This suggests that nation was an important element in the spread of Christianity.

Color-Degree, Size-Betweenness, Layout-ForceAtlas

Not satisfied with a two-dimensional visualization, I add size as a new dimension. In the visualization above, I show degree with color ranking and betweenness with size. The result is striking. Nodes with higher degree tend to have higher betweenness, and most communities contain a high-degree class and a low-degree class. This suggests that active individuals were significant in the spread of Christianity, since they could lead to a wider spread. I speculate that if more edges were added, the spread would look hierarchical, with a few key people having the highest degree and the highest betweenness.

Color-Degree, Size-Eigenvector, Layout-ForceAtlas

In the graph above, I replace betweenness with eigenvector centrality. There are only a few differences between this visualization and the previous one. First, nodes in the left group become larger. Second, nodes in the top right corner become larger. I think this is because eigenvector centrality takes into account how well connected a node's neighbors are, while betweenness measures how often a node lies on the shortest paths between other nodes. Otherwise, the conclusions I can draw from this visualization are the same as from the previous one.

Color-Nation, Size-Betweenness, Layout-ForceAtlas

Finally, since betweenness roughly tracks degree in the previous graphs, I replace degree with nation as the attribute shown by color. This graph alone conveys a lot of information. First, it suggests that nation is an important factor in the spread of Christianity: within the same nation, the spread may be easier. However, in some cases the spread crosses nations. This is likely because the edges here represent family relations, and people were more likely to marry within their own nation. Therefore, I would argue that family relations were also a key element in the spread of Christianity. Second, the purple nation has the most people in this database, with green and blue coming next, so I speculate that Christianity was more popular in these nations than in the black or orange nations. Third, this visualization suggests that the purple nation tends to have more tightly connected communities, since some of its groups contain nodes with much larger betweenness than other groups. Furthermore, the largest node, the center node of the group in the top left corner, appears to be the most influential person in this visualization, since it has the greatest betweenness.

 

I’m surprised that simple actions in Gephi can reveal so much information in the database, and the resulting graphs are fascinating. As Edward Segel and Jeffrey Heer put it, “Crafting successful ‘data stories’ requires a diverse set of skills.” I think using Gephi is a valuable skill to learn.

Adams Assignment Five

In quoting Ben Shneiderman, Isabel Meirelles opens her chapter on network design structures by articulating the positive attributes of these types of visualizations: “‘Social network analysis complements methods that focus more narrowly on individuals, adding a critical dimension that captures the connective tissue of societies and other complex interdependencies’” (Meirelles 47). Throughout my experience learning Gephi with the Native American baptismal database, I have found the program incredibly helpful in painting a picture of “interdependencies” and relationships that are difficult to see by looking at metadata alone. Unfortunately, truly understanding the relational characteristics Gephi can visualize takes a certain discipline: one must avoid treating visualization and analysis as mutually exclusive. Gephi does not necessarily “hide” anything in terms of calculations, since users are offered an intimate look at what is being done when statistics are calculated or force-directed layouts are applied. Yet this opportunity to understand how data is being translated is easily ignored by users caught up in the “click-aha!” trap so common with digital visualization platforms (for example, when using the partition tool to color and size nodes or manipulate edges). In this way, my experience with Gephi harkened back to Elijah Meeks’ work: “…I spent my time teaching folks how to use Gephi, and I tried to spend some time telling them that the network they create is the result of an interpretive act. I don’t think they cared, I think they just wanted to know how to make node sizes change dynamically in tandem with partition filters” (Meeks 2).
Meeks’ experience, which I perceive as all too common among those working in Gephi, also opens the door to some of Johanna Drucker’s skepticism: “So the first act of creating data, especially out of humanistic documents, in which ambiguity, complexity, and contradiction abound, is an act of interpretative reduction, even violence. Then, remediating these ‘data’ into a graphical form imposes a second round of interpretative activity, another translation” (Drucker 249). Simply put, tying my short time with Gephi to the writing of Meeks and Drucker led me to my most unavoidable critique of Gephi: the platform, hard as it may try to avoid this, allows users to ignore the fact that their data has humanistic, nuanced, and narrative elements behind it (although the Data Laboratory tab helps keep users from becoming too far removed from their database). Unless users take the time to slow down and understand what happens when different statistics are calculated or relationships are generated, the true power of Gephi is rendered almost useless.

 

For reference below: Edge color key and proportionality
For reference below: Node color key and proportionality

   

 

 

 

 

For my visualization, I chose to generate a relational network consisting of Native Americans and baptizers (nodes). These nodes were connected by edges representing both baptismal and kinship relationships (for individuals labelled in the database with Unique IDs 26-75). In total, my multimodal visualization, which uses a force-directed layout based on the Fruchterman-Reingold algorithm, has 404 nodes and 438 edges (most nodes represent Native Americans and most edges are baptismal). Once I created these connections in Gephi, I colored both nodes and edges: nodes according to an individual actor’s nation (or “baptizer”), and edges by the type of relationship represented (for example, baptismal, marital, parental, etc.). The statistics I ran to analyze the baptismal database were Degree, Modularity, and Eigenvector Centrality.

Visualization with nodes sized by “Degree”

 

Degree:

By using the “ranking” tool in Gephi to size nodes according to degree (not in-degree or out-degree, since some edges are undirected), I was able to draw several important conclusions from the network. I immediately noticed that the nodes representing baptizers were affected far more by sizing according to degree than the nodes representing Native Americans. Since baptismal relationships make up a significant portion of the connective tissue of this relational network, degree shows how influential individual baptizers were in spreading Christianity (baptizers with high degrees, like Cammerhof, Christian Rauch, and Martin Mack, were more influential in the spread of Christianity than those with lower degrees, like Grube or Utley). The degree calculation, which counts the number of connections a node has, also introduced me to Johannes, a fascinating character in the story of the spread of Christianity. The node representing Johannes was noticeably larger than those of other Native Americans when size depended on degree. This is because Johannes was not only a Native American and Christian convert, but also a baptizer himself. He therefore helped spread Christianity through baptismal relationships, not just kinship ties, which distinguished him from the other Native Americans in the database.
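As a small illustration of why baptizers grow so much when nodes are sized by total degree, this Python sketch counts in-degree, out-degree, and total degree on a mixed edge list in which baptism edges are directed (baptizer to baptized) and kinship edges are undirected. It is a simplified model under that assumption, not a description of how Gephi itself counts edges in a mixed graph; all names are hypothetical.

```python
from collections import Counter


def mixed_degrees(edges):
    """edges: (source, target, directed) triples.

    Returns (in_degree, out_degree, total_degree). In this simplified model an
    undirected edge adds to the total degree of both endpoints but to neither
    the in-degree nor the out-degree.
    """
    indeg, outdeg, total = Counter(), Counter(), Counter()
    for u, v, directed in edges:
        total[u] += 1
        total[v] += 1
        if directed:
            outdeg[u] += 1
            indeg[v] += 1
    return indeg, outdeg, total


# hypothetical: one baptizer, three converts, one kinship tie between converts
edges = [
    ("Baptizer", "Convert1", True),
    ("Baptizer", "Convert2", True),
    ("Baptizer", "Convert3", True),
    ("Convert1", "Convert2", False),
]
indeg, outdeg, total = mixed_degrees(edges)
print(total.most_common(1))  # [('Baptizer', 3)]
```

Every baptism adds to the baptizer's total, while a convert gains at most one baptism edge plus any kinship ties, so baptizers dominate a degree-sized view unless kinship data is dense.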

 

Visualization with nodes sized by “Eigenvector Centrality”

 

Eigenvector Centrality:

Unfortunately, I did not find this statistic terribly enlightening while I worked in Gephi. I believe this is because a majority of the edges in my database are directed baptismal connections. For this reason, the only members of the database with a real chance of a high Eigenvector Centrality are baptizers connected indirectly through the limited kinship ties I was able to generate (as these connect well-connected baptizers to one another). In my visualization, this resulted in Martin Mack and Cammerhof (along with the Native Americans they baptized) having the highest Eigenvector Centrality, as their “baptismal worlds” were the only two brought into contact with one another through the edges representing kinship ties.

 

Visualization with nodes sized by “Modularity”

 

Modularity:

After sizing nodes according to modularity class, I was faced with another interesting iteration of my network diagram. As can be seen above, the modularity calculation presented several “small worlds” hovering around the outside of my force-directed graph. Although I was initially confused by this image, I soon concluded that these small worlds likely would not exist had I manually entered edges representing kinship relationships for all Native Americans in the database (beyond just 26-75). In my visualization, some people may fall into artificially isolated communities for this very reason (baptismal connections are present but kinship ties are absent). Essentially, I believe I have created a visualization that contains satellite baptismal communities lacking the familial ties that would likely deflate the modularity statistic.

Following from this, the relatively low modularity among certain baptizers connected to Native Americans with kinship ties also shows the tendency of different baptizers to work with members of a single family (as well as across national boundaries). Multiple baptizers working with members of the same family (for example, spouses baptized by different people) generated a highly interconnected Christian network of Native Americans and baptizers and eliminated “small worlds” in certain areas of the network (baptizers connect different families and limit isolation in the network).

Classifying edges proved extremely helpful in arriving at this inference, as it showed that individual baptizers likely did not share intimate connections with specific Native American families. The colored edges themselves show this, as kinship relationships visually connect separate “small baptismal worlds” (for example, spouses, brothers, sisters, parents, and children appear to have rarely been welcomed into the Church by the same baptizer).