Suppose you’re given a metric space (X, d) and a parameter k, and your goal is to find k “clusters” such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster “center” (a designated point).

This is the well-known k-median problem (which differs from the also popular k-means problem by the use of distances, rather than squares of distances). In a general metric space, the k-median problem is known to be NP-hard, as well as being hard to approximate to within arbitrary constant factor. The current best known approximation ratio for the k-median is ~~4, due to Charikar and Guha~~ 3 + eps, via a local search heuristic due to Arya, Garg, Khandekar, Meyerson, Munagala and Pandit (thanks, Chandra).

If the underlying metric is the Euclidean metric, then the problem complexity changes: in fact a PTAS for the Euclidean k-median can be obtained, due to the results of Arora, Raghavan and Rao, and then Kolliopoulos and Rao (who provide an almost-linear time algorithm). But as far as I am aware, there is still no NP-hardness proof for the Euclidean k-median problem, and I’d be interested in knowing if I am wrong here.

Note that the related problem of Euclidean k-means is known to be NP-hard from an observation by Drineas, Frieze, Kannan, Vempala and Vinay.

### Like this:

Like Loading...

*Related*

## Chandra

The best known approximation for

k-median in general metric spaces

is 3+\eps due to a local search

algorithm of Arya etal.