Hierarchical clustering is a method in data analysis and statistics for grouping similar data points or objects into clusters. It’s called “hierarchical” because it organizes the data into a tree-like hierarchy in which clusters can contain sub-clusters.
Here’s how the common bottom-up (agglomerative) version works:
- Data Points: We start with our individual data points or objects. In our project, these could be individual records of police shootings.
- Pairwise Distances: Next, we calculate the distance between every pair of data points. In our case, this might be the geographical distance between the locations of two police shootings.
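- Agglomeration: Initially, each data point is treated as its own cluster. The two closest clusters are then merged into a new cluster, and this step repeats until every point belongs to a single cluster, gradually forming a hierarchy. How “closest” is measured between clusters with several points depends on the chosen linkage rule (for example single, complete, or average linkage).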
- Dendrogram: The sequence of merges can be drawn as a dendrogram, a tree diagram that shows how clusters are formed and nested within each other. The height at which two branches join reflects the distance at which those clusters were merged.
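The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not our project's actual pipeline: the coordinates are made-up placeholders, the haversine (great-circle) distance and single linkage (cluster distance = distance between the two closest members) are choices made for simplicity, and the names `haversine_km` and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    # Pairwise distances between all data points.
    dist = {frozenset((i, j)): haversine_km(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}
    # Initially, every data point is its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters (single linkage: closest members).
        best = None
        for a, b in combinations(clusters, 2):
            d = min(dist[frozenset((i, j))] for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Merge them into a new cluster and record the step.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

# Made-up (lat, lon) coordinates, for illustration only.
points = [
    (42.36, -71.06),   # 0
    (42.37, -71.11),   # 1
    (40.71, -74.01),   # 2
    (40.73, -73.99),   # 3
    (34.05, -118.24),  # 4
]
merges = agglomerate(points)
for a, b, d in merges:
    print(f"merge {a} + {b} at {d:.1f} km")
```

Nearby points are merged first and the most distant point joins last. This brute-force loop is only meant to make the steps visible; for real datasets a library implementation would be the more practical choice.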
Hierarchical clustering helps us group data in a way that shows relationships and hierarchies. It’s like organizing objects based on their similarities, where smaller groups are nested within larger ones. This technique can be useful for exploring patterns and structures within our dataset, such as identifying different categories or groupings of police shootings.
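To turn the hierarchy into concrete groupings, one common approach is to cut it at a chosen distance. The sketch below assumes single linkage, where cutting at a threshold is equivalent to linking every pair of points no farther apart than the threshold and collecting the connected groups. The toy 2-D points, the thresholds, and the helper name `cut_single_linkage` are all made up for illustration.

```python
import math
from itertools import combinations

def cut_single_linkage(points, threshold, distance=math.dist):
    """Flat clusters from cutting a single-linkage hierarchy at `threshold`."""
    parent = list(range(len(points)))  # union-find: each point starts alone

    def find(i):
        # Follow parent links to the group's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of points that lies within the threshold.
    for i, j in combinations(range(len(points)), 2):
        if distance(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Collect point indices by representative.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy points (Euclidean distance), for illustration only.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (20.0, 0.0)]
print(cut_single_linkage(points, threshold=1.0))   # [[0, 1], [2, 3], [4]]
print(cut_single_linkage(points, threshold=10.0))  # [[0, 1, 2, 3], [4]]
```

A small threshold yields many tight groups, while a larger one yields fewer, broader groups, which mirrors cutting the dendrogram lower or higher.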