Analyzing the Ethical Consequences of Popular Clustering Algorithms

Mentor: Dr. Shion Guha

Researchers: Griffin Berlstein, Justin Miller


Data-driven algorithms are nearly ubiquitous in modern social computing, where they make inferences about user behavior in order to better predict users' needs and probable future actions. The common narrative is that these algorithms are neutral, i.e., that they operate on and analyze data without regard for what the data contains. In some situations this may be the case; however, it is dangerous to assume neutrality, because handling border cases forces an algorithm to rely on assumptions. When data points evade simple classification, an algorithm can either leave them as outliers or place them in a category that they may not properly fit. Depending on an algorithm's initial conditions, data points can therefore end up in false isolation or false aggregation. While this can seem like a natural hazard of data classification, it becomes problematic when the data points represent individuals rather than numbers. Anyone who accepts the narrative of impartiality may then make decisions based on an analysis that does not properly represent the people being analyzed.
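The two failure modes described above can be illustrated with a minimal sketch. It is not the project's method: the one-dimensional points are made-up toy values (not crime or census data), `kmeans_1d` is a bare-bones Lloyd's-style k-means written for this illustration, and `isolated_points` is a simplified density rule loosely in the spirit of density-based clustering, with hypothetical `eps` and `min_neighbors` parameters.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Bare-bones k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[k]
            )
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels


def isolated_points(points, eps, min_neighbors):
    """Return points with fewer than min_neighbors other points within eps."""
    flagged = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if j != i and abs(p - q) <= eps
        )
        if neighbors < min_neighbors:
            flagged.append(p)
    return flagged


# Toy data (assumed for illustration): two tight groups and one border point.
points = [0.0, 0.5, 1.0, 5.0, 9.0, 9.5, 10.0]
BORDER, LEFT, RIGHT = 3, 0, 6  # indices of the border point and one member per group

labels_a = kmeans_1d(points, centroids=[0.0, 5.0])
labels_b = kmeans_1d(points, centroids=[5.0, 10.0])

# False aggregation: the same point lands in different groups
# depending only on where the centroids started.
print(labels_a[BORDER] == labels_a[RIGHT])  # True: grouped with the right cluster
print(labels_b[BORDER] == labels_b[LEFT])   # True: grouped with the left cluster

# False isolation: a density rule sets the same point aside as an outlier.
print(isolated_points(points, eps=1.0, min_neighbors=2))  # [5.0]
```

In this sketch nothing about the border point changes between runs; only the starting centroids or the choice of rule does, which is the sense in which the outcome reflects the algorithm's assumptions rather than the data alone.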


In this exploratory project, we will examine how popular clustering algorithms can introduce bias into a data set by running them on crime and census data and identifying where the bias arises. We will then test how the algorithms' initial configurations can be modified to reduce that bias.