Blog 1: Things to Think About When Creating Datasets

This week’s readings and class discussion dealt with the issues historians must contend with when creating datasets for a database. Some of the problems that we mentioned in class are the ability of categories to dehumanize people, geographical creep, multiple names for a single person, and sorting sources with spelling discrepancies. As a historian whose main focus is Indigenous history, I find many of these issues relevant to my own research. Many of the Native Americans that I study and write about have multiple names. For example, Red Jacket, the Seneca chief of the Wolf Clan from the 1790s through the 1830s, had at least three names during his lifetime. He was known as Red Jacket by Americans, was called Otetiani during his childhood, and was referred to as Sagoyewatha during adulthood by his fellow Seneca. In class, we learned that one way to address this problem is to use normalization when creating datasets. This allows us to set up the database so that any time one of these names appears in a primary source, the dataset will recognize that they all refer to the same person. While I think normalization will be a helpful tool for me when creating databases, I don’t think that it addresses all the problems that can arise when studying Indigenous history. In order to normalize the data, you still have to choose one primary name that all the other names will refer back to. This forces historians to decide which name counts as the main one, and it risks privileging Euro-American names over Indigenous ones.
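To make the idea concrete, here is a minimal sketch in Python of how this kind of name normalization might look. It is only an illustration under stated assumptions, not the method we covered in class: the record ID "P001", the field names, and the helper functions are all hypothetical. Each name from the example above points to the same record ID, so a lookup by any of the three names lands on the same person.

```python
# Hypothetical person records keyed by an arbitrary ID rather than by a name.
# The ID "P001" and the field names are illustrative assumptions.
people = {
    "P001": {
        "names": [
            {"name": "Otetiani", "context": "childhood name"},
            {"name": "Sagoyewatha", "context": "adult name among the Seneca"},
            {"name": "Red Jacket", "context": "name used by Americans"},
        ],
    },
}


def build_alias_index(records):
    """Map every recorded name (case-insensitively) to its person ID."""
    index = {}
    for person_id, record in records.items():
        for entry in record["names"]:
            index[entry["name"].casefold()] = person_id
    return index


alias_index = build_alias_index(people)


def resolve(name):
    """Return the person ID for a name found in a source, or None if unknown."""
    return alias_index.get(name.strip().casefold())


# All three names resolve to the same record.
for source_name in ["Red Jacket", "Otetiani", "Sagoyewatha"]:
    print(source_name, "->", resolve(source_name))
```

Keying the records on an ID means no single name has to serve as the lookup key, although whoever displays the record still has to decide which name to show first, so the question of which name to privilege does not go away.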

The problem of multiple names extends to other aspects of studying Native American history. Battles, places, and events can also have multiple names, and historians must still choose which name to privilege when creating a dataset, even if they use normalization. Also, some sources created by non-Natives, such as newspapers, personal letters, and government records, might refer to specific groups of Native peoples as simply “the Indians”. This can make it hard to determine exactly who “the Indians” are and what band, tribe, nation, or clan they belong to. One way to try to overcome this dilemma is, once again, to normalize the data and to look at other primary sources related to the same historical event to see if they are more specific.

In class we also discussed the potential for databases to dehumanize people. For example, when looking at statistical data for wars or massacres, historians can be talking about a group of people without ever knowing their names or their personal stories. I think that when using statistical data about actual people, it is important to also try to give a face or voice to these figures by adding primary sources that offer insight into their lives. For example, if I were to include a list of Arapaho casualties from the Sand Creek massacre, I would also try to include letters or newspaper articles about the massacre and its victims to add context.

We also talked about how binary categories such as race or gender can exclude or misrepresent people. For example, for years Virginia’s census records had no category for Native Americans and lumped all Natives, regardless of their self-identification, into the category “colored”. This can make it hard to figure out how many Native peoples lived in a location at any given time. While you can sometimes look at people’s names to try to determine ethnicity or heritage, this is not reliable and can lead to misinformation.

Lumping people into binary gender categories while creating datasets can also prove problematic. When doing research on Native American women war participants, I discovered that some Native Americans self-identified as a third gender or as “two-spirit”, even though primary sources such as newspaper articles referred to them as either male or female. If I came across this issue while creating a database, I think that I would add a non-binary gender category.

Overall, this week’s class discussion has led me to think critically about the complexities of creating databases. It made me realize that even well-intentioned historians can add to the silences in the historical narrative if they are not careful.