NP hardness of Euclidean k-median clustering

· clustering, research

Suppose you’re given a metric space (X, d) and a parameter k, and your goal is to find k “clusters” such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster “center” (a designated point).

This is the well-known k-median problem (which differs from the also popular k-means problem by the use of distances, rather than squares of distances). In a general metric space, the k-median problem is known to be NP-hard, as well as being hard to approximate to within arbitrary constant factor. The current best known approximation ratio for the k-median is 4, due to Charikar and Guha 3 + eps, via a local search heuristic due to Arya, Garg, Khandekar, Meyerson, Munagala and Pandit (thanks, Chandra).

If the underlying metric is the Euclidean metric, then the problem complexity changes: in fact a PTAS for the Euclidean k-median can be obtained, due to the results of Arora, Raghavan and Rao, and then Kolliopoulos and Rao (who provide an almost-linear time algorithm). But as far as I am aware, there is still no NP-hardness proof for the Euclidean k-median problem, and I’d be interested in knowing if I am wrong here.

Note that the related problem of Euclidean k-means is known to be NP-hard from an observation by Drineas, Frieze, Kannan, Vempala and Vinay.


Comments RSS
  1. Chandra

    The best known approximation for
    k-median in general metric spaces
    is 3+\eps due to a local search
    algorithm of Arya etal.

  2. Shiva Kintali

    WoW !! what a coincidence ? I was trying to find an answer to this exact question over this long-weekend. I could not get any NP-hardness proof after three days of crazy googling and acm-search. Thanks for asking this question.

    I have a related question. What exactly is the k-variance problem ? I came across couple of different definitions. Is it’s complexity open ? Are there any known results for k-variance on euclidean plane.

  3. Anonymous

    A quick comment: the Arora-Raghavan-Rao PTAS holds only for *low-dimensional* Euclidean metrics. Specifically, the running time of the algorithm is doubly exponential in d.

    As far as the hardness goes, it is known that the doubly exponential dependence is (likely) necessary for a PTAS, *if* your medians must come from a given set (the “discrete median” case). This follows from a combination of papers, incl. Trevisan’97, see comments in Guruswami-Indyk, SODA’03.
    So in particular, it is known that there is no PTAS for the discrete k-median in general Euclidean spaces (unless something strange happens)


  4. Venkat Chakaravarthy

    If I am not mistaken, the following paper shows that NP-Hardness of the k-median problem under Eucledean metric.

    N. Megiddo and K. Supowit.
    On the complexity of some common geometric location problems.
    SIAM J Computing, 13(1), 1984.

  5. Suresh

    ah ! and I’ve even read this paper before.

    Yes, that indeed is a proof. Specifically they show that k-median in the plane is NP-hard, by a very complicated reduction from 3SAT.

    thanks !

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: