so they've defined ambivalent typologies based on their framework in table 1, and used that to impose 4 clusters onto the data:
Based on these considerations, the study sets the number of clusters at four, although all three heuristics, i.e. the elbow method, the silhouette value and the gap statistic, suggest that the respondents form three clusters
so all three heuristics point to 3 clusters but they've decided to set k=4 anyway. k-means will return exactly as many clusters as you ask for: it just minimizes the within-cluster sum of squared distances to each cluster's mean. each observation gets assigned to whichever mean is closest (in squared euclidean distance), but that doesn't mean the resulting groups are real or that it's the best choice.
even after they "merge" the ambivalent classes and set k=3, the same caveat applies: assigning each observation to the nearest mean only tells you which centroid it sits closest to, not that those centroids are the best way to define each class.
the wikipedia article has a good illustration:
https://en.wikipedia.org/wiki/K-means_clustering#/media/File:K-means_convergence.gif
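to make the point concrete, here's a rough sketch (not the paper's data or code, just toy numbers i made up): three well-separated synthetic blobs, clustered with a bare-bones lloyd's-algorithm k-means at k=3 and k=4. the blob centers, the sample size and the farthest-point initialization are all my own assumptions for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic init (an assumption for this sketch): start from the
    first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest mean, recompute means."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    # within-cluster sum of squared distances, the quantity k-means minimizes
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Toy data: three Gaussian blobs, 50 points each (made-up numbers).
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in true_centers
    for _ in range(50)
]

results = {k: kmeans(points, k) for k in (3, 4)}
for k, (labels, centers, sse) in results.items():
    sizes = [labels.count(j) for j in range(k)]
    print(f"k={k}: cluster sizes={sizes}, within-cluster SSE={sse:.1f}")
```

asking for k=4 still gives you four clusters, and the within-cluster SSE even goes down, because splitting any group lowers it. so a lower SSE or a tidy-looking assignment at k=4 isn't evidence that a fourth group exists; the algorithm was always going to hand back four.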