I'm doing some research into clustering algorithms and every source I seem to find discusses 2D (or higher-dimensional) clustering of continuous data. The nearest thing I've found to what I'm looking for is this article which discusses discrete-continuous clustering (where the x and y axes are quantized into cells, but the z axis is allowed to vary continuously).
Has anyone come across any algorithms which perform cluster analysis of purely discrete data? Specifically 2D?
Top comments (2)
How about single-linkage or complete-linkage clustering? Both are forms of hierarchical clustering; you just have to choose a distance metric that works on the grid, such as Manhattan distance.
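As a rough sketch of that idea, here is single-linkage clustering with the Manhattan metric using SciPy's hierarchical clustering routines. The sample points and the cut threshold of 2 grid steps are made up for illustration; swapping `method="single"` for `method="complete"` gives complete linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical integer grid coordinates: two well-separated blobs.
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [8, 8], [8, 9], [9, 8], [9, 9]])

# Single linkage with the Manhattan ("cityblock") metric.
Z = linkage(points, method="single", metric="cityblock")

# Cut the dendrogram so that points end up in the same cluster only if
# they are connected by merges at a distance of at most 2 grid steps.
labels = fcluster(Z, t=2, criterion="distance")
print(labels)
```

The threshold is the main thing to tune: on a grid it reads naturally as "how many cells apart can two neighbours be and still belong together".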
Actually, shouldn't it be possible to adapt any clustering algorithm? Take k-means as an example: you need to choose an appropriate distance metric as above, and then adjust the prototype calculation so that each prototype is a point on the grid.
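One possible way to sketch that adaptation: assign points by Manhattan distance, then update each prototype to the coordinate-wise median of its members, rounded to the nearest grid cell. Strictly speaking this is closer to k-medians than to k-means, since the median (not the mean) minimizes summed Manhattan distance. The function name `grid_kmeans`, the sample data and the starting prototypes are all hypothetical.

```python
import numpy as np

def grid_kmeans(points, init_centers, n_iter=20):
    """k-means-style loop with Manhattan distance and grid-valued prototypes."""
    points = np.asarray(points)
    centers = np.asarray(init_centers).copy()
    for _ in range(n_iter):
        # Manhattan distance from every point to every prototype: shape (n, k).
        dists = np.abs(points[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its old prototype
                # Coordinate-wise median, rounded so it lands on a grid cell.
                new_centers[j] = np.rint(np.median(members, axis=0))
        if np.array_equal(new_centers, centers):
            break  # prototypes stopped moving
        centers = new_centers
    return labels, centers

# Made-up grid points and starting prototypes.
pts = [[0, 0], [1, 0], [1, 1], [8, 8], [9, 8], [9, 9]]
labels, centers = grid_kmeans(pts, init_centers=[[0, 0], [9, 9]])
print(labels, centers)
```

If the prototype has to be an actual data point rather than just any grid cell, picking the member with the smallest total distance to the others (k-medoids style) would be the natural replacement for the median step.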
Can you be a bit more specific about what your data looks like? Are x and y categorical features and z continuous? I had, at some point, a SO thread about combining data-specific distance functions in a nearest neighbor search. I can't find it anymore, but it would be sort of like def custom_distance(a, b): return dice(a_categorical, b_categorical) + euclidean(a_continuous, b_continuous), with dice and euclidean taken from scipy.spatial.distance.
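Spelled out as something runnable, a combined distance along those lines might look like the sketch below. It assumes the categorical features are encoded as binary flags in the first `N_BINARY` columns (a made-up layout) and simply adds the two terms without any weighting, which is a choice you would probably want to revisit since the terms live on different scales.

```python
import numpy as np
from scipy.spatial.distance import dice, euclidean, pdist, squareform

N_BINARY = 3  # assumption: first 3 columns are binary flags, the rest continuous

def custom_distance(u, v):
    """Dice dissimilarity on the binary columns plus Euclidean on the rest."""
    d_binary = dice(u[:N_BINARY].astype(bool), v[:N_BINARY].astype(bool))
    d_continuous = euclidean(u[N_BINARY:], v[N_BINARY:])
    return d_binary + d_continuous

# Made-up rows: three binary flags followed by two continuous values.
X = np.array([[1, 0, 1, 0.0, 0.0],
              [1, 0, 1, 3.0, 4.0],
              [0, 1, 1, 0.0, 0.0]])

# pdist accepts a callable metric; squareform gives the full distance matrix.
D = squareform(pdist(X, metric=custom_distance))
print(D)
```

A precomputed matrix like `D` can then be handed to anything that accepts pairwise distances. Note that Dice is undefined when both rows have no flags set, so that case would need special handling.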
It looks sort of like: members.cbio.mines-paristech.fr/~j...
Found it! Hopefully something in this thread is helpful.
datascience.stackexchange.com/ques...