Wednesday, January 6, 2010

Kruskal's Algorithm in C++


Minimum Spanning Tree, Kruskal's Algorithm


Kruskal's algorithm is an algorithm in graph theory that finds a minimum spanning tree for a connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). This algorithm is based on greedy approach.

Performance


This algorithm needs to sort the edges and uses a disjoint set data structure to keep track which vertex is in which component. We know the best comparison sorting is O(e*lg(e)), i.e. the merge sort or quick sort, where e is the number of edges, and the set operations can be implemented such a way that they are almost constant. The algorithm itself is linear to the number of edges e. So the total complexity can be achieved is O(e*lg(e)). Also note that, e can be max v*v (when it is a complete graph).

Algorithm


Do some study and paper-work before you proceed:
1. the algorithm and analysis
2. another good pseudo-code
3. read in wikipedia
4. text: Introduction to Algorithms (CLRS, MIT press), Chapter 23.

Implementation


Here is a short and in general implementation in C++ using the STL library. This solves for one graph, if you need it for multiple graphs inputs, don't forget to reset the vectors and arrays appropriately.

Input


N, E // number of nodes and edges.
E edges containing u, v, w; where, the edge is (u, v) and edge weight is w.

C++ code



#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;

#define edge pair< int, int >
#define MAX 1001

// ( w (u, v) ) format
vector< pair< int, edge > > GRAPH, MST;
int parent[MAX], total, N, E;

int findset(int x, int *parent)
{
if(x != parent[x])
parent[x] = findset(parent[x], parent);
return parent[x];
}

void kruskal()
{
int i, pu, pv;
sort(GRAPH.begin(), GRAPH.end()); // increasing weight
for(i=total=0; i<E; i++)
{
pu = findset(GRAPH[i].second.first, parent);
pv = findset(GRAPH[i].second.second, parent);
if(pu != pv)
{
MST.push_back(GRAPH[i]); // add to tree
total += GRAPH[i].first; // add edge cost
parent[pu] = parent[pv]; // link
}
}
}

void reset()
{
// reset appropriate variables here
// like MST.clear(), GRAPH.clear(); etc etc.
for(int i=1; i<=N; i++) parent[i] = i;
}

void print()
{
int i, sz;
// this is just style...
sz = MST.size();
for(i=0; i<sz; i++)
{
printf("( %d", MST[i].second.first);
printf(", %d )", MST[i].second.second);
printf(": %d\n", MST[i].first);
}
printf("Minimum cost: %d\n", total);
}

int main()
{
int i, u, v, w;

scanf("%d %d", &N, &E);
reset();
for(i=0; i<E; i++)
{
scanf("%d %d %d", &u, &v, &w);
GRAPH.push_back(pair< int, edge >(w, edge(u, v)));
}
kruskal(); // runs kruskal and construct MST vector
print(); // prints MST edges and weights

return 0;
}

Have fun and please notify for any bug...

36 comments:

  1. Could u plz explain why setting parent[pu] = parent[pv] successfully detects cycle ?? done a pretty decent paper work...and understands why..but still need a solid theorem.

    ReplyDelete
  2. Impressive code. This is the cleanest implementation I have seen of Kruskal's to date. It sticks very close to the pseudo-code -- I like it!

    ReplyDelete
  3. Yeah, I also don't find any reason why people can't help writing some oop mumbojumbo's whenever they come to write something over the net. The do nothing but make algorithms harder to follow...

    ReplyDelete
  4. Can you tell me how to do an input? im really noob, please help me.

    ReplyDelete
  5. I can't understand what are you trying to indicate by "doing an input", are you asking "how to take input"? Well, it's a language dependent issue and you can always use google!

    ReplyDelete
  6. Thanks!, a nice implementation of Kruskal.
    I would remove the recursive findset() to get it even better

    int findset(int x, int* parent)
    {
    while(x!= parent[x])
    x=parent[x];
    return x;
    }

    ReplyDelete
  7. @Anonymous, are you sure you are doing the right thing? your findset does not do anything the recursive one does. In findset, you are supposed to update parent[] that's why it is called path compression. But clearly, you have increased the complexity to O(n) and your algorithm will not even work properly. Let me know if it does.

    ReplyDelete
  8. Removing recursion does not always get things better, at least not the way you have mentioned. Check this, and see if you can get 100:
    http://www.spoj.pl/problems/MST/

    ReplyDelete
  9. It will work the same. In findset() we are just finding out the disjoint set it belonging to, we are not updating it.Updating the parent is in Kruskal().
    Here is result of that problem:
    4<----->2:2
    1<----->3:5
    1<----->2:10
    Total distance=17

    I think complexity and the number of terms computed in both cases remains the same.

    ReplyDelete
  10. "we are not updating it." you are wrong, and I think you need to have some study on path compression. the findset definitely updates parent. and recursive one presented here takes constant time max O(4) while yours one is linear O(n). so, before running with kruskal, first finish this http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=disjointDataStructure
    Thanks

    ReplyDelete
  11. One more thing, I was not aware of the spoj website, thanks for it. I happened to see your solution and I felt it is neat so just commented it. I need to go through the forum and website and see how to submit my own code.By the way in which page your details so that I can seen the memory and time?

    ReplyDelete
  12. No problem. SPOJ is a nice site for problem solving. But remember, this code is not a solution of any specific problem, so submitting this will not result in a 100
    http://www.spoj.pl/status/MST,zobayer/

    ReplyDelete
  13. thanks, as I said I'm here just by accident. I'm not an youngster like you, I've spent last 20 years in software development, probably I tell my children to look in to it. All the best!

    ReplyDelete
  14. Oh, you were writing as Anonymous, so I had no way to know who are you really. So I considered to be as usual contest coder (I mean those who usually read this blog)... never mind :)
    I was wondering, whether you have forgotten the fundamentals of algorithms over the last 20 years or software development doesn't even need these things? Sorry for my ignorance, I don't know what do people do in software development and many thanks for your visit :)

    ReplyDelete
  15. int findset(int x, int *parent)
    14
    {
    15
    if(x != parent[x])
    16
    parent[x] = findset(parent[x], parent);
    17
    return parent[x];
    18
    }

    I can't see anything being updated there.... you are just trying to find an element in an array

    ReplyDelete
  16. What is line 16 doing then? Is this what you call "just trying to find an element in an array "? lol

    ReplyDelete
  17. If we were "just trying to find an element" the we could have just returned findset(parent[x], parent), why we are "storing" this in parent[x]? and if it is not an update, what you call to be an update?

    ReplyDelete
  18. Beautiful code.Congrats from brazil[ufmg].

    ReplyDelete
  19. just one thing - the comparison function for sort was not searched recursively for me.It was fine for pair,pair for ex, but not for >.Would you know why?

    ReplyDelete
  20. @Anonymous, I can not understand you....

    ReplyDelete
  21. Brilliant Explanation!

    ReplyDelete
  22. hey ur sort function is not working !!

    ReplyDelete
  23. @geek, you are the first one saying that. I think you are doing something wrong, sort is not my function, it is STL Algorithm's sort, working fine.

    ReplyDelete
  24. what does the findset() function do?

    ReplyDelete
  25. In a disjoint set structure, nodes are added to some disjoint sets. Each set is identified by a special node which is called the root of that set. Simply put, findes(x) finds the root of x. It has a constant time complexity.

    ReplyDelete
  26. why not vector > GRAPH; ?
    how do you know that the graph doesnt have more than 2 edges ?.

    ReplyDelete
  27. Where does it tell that the graph has two edges?

    ReplyDelete
  28. Have you run the program? If not, obviously you can try running it. Actually this program will work for any number of edges. Just set the MAX macro properly.

    ReplyDelete
    Replies
    1. Hallo,
      sorry for this stupid question, but I dont know how to run your application in MS Visual Studio 2010. Please could you write me what kind of new project should I create to run your application there?

      Thank you very much for your help.

      Delete
    2. Hmm, I guess you have C++ installed with you MSVS. Now, just open a C++ win32 console application. an application wizard will come. now select "empty project", and then add a c++ file to it, copy paste the code, run, etc etc..
      but all of my post assumes that reader knows c/c++ very well (having some sort of problem solving background), so sorry if you can't make it :(

      Delete
  29. hello sir can u please explain how to analyze the kruskal algorithm to caculate the runtime, ex elogv.

    ReplyDelete
    Replies
    1. line #23 sorts all the edges, it should take O(e lg e), then the loop runs for O(e * complexity of find() and link())
      Now, after using path compression, the cost of find() is almost constant, probably 4 on the worst case, and link() is always constant. So, the overall in the loop is O(e). So, the overall goes to O(e lg e + e) = O(e lg e). Now, in programming books, probably they ignore the complexity of sorting, not sure though, in that case the calculation is O(e lg v) because they do not consider union-find structure when optimized, so normal priority queue should take lg(v) in each operation, thus O(e lg v), in that sense, it takes O(e), that's why this is this works much faster.

      Delete
  30. could u just please tell me what pu , pv variables stand for ??

    ReplyDelete
    Replies
    1. parent of u and v respectively in disjoint set structure.

      Delete

Catagories

academic study (17) access modifiers (1) algorithm (28) bash (1) beginner (17) bfs (1) bigint (1) binary tree (1) bitwise (4) blogger (5) bpm (2) brainfuck (1) bst (1) c (1) c++ (36) changes (1) character device driver (1) combinatorics (2) command prompt (1) comparator (1) compression (1) computational geometry (2) confusion (1) contest (7) crc (1) cse (3) css (1) customize (1) data structure (10) database (1) decoding (1) design (1) device driver (1) divide and conquer (2) dp (3) driver (1) dynamic programming (9) encoding (1) encryption (1) error (2) esoteric language (2) euler circuit (1) euler path (1) executable (1) expression evaluation (1) extended euclid (1) facebook (1) factorization (1) funny (14) gcd (2) geometry (4) git (1) github (1) graph (7) hashing (1) hiding window (1) hints (5) hopcroft karp (1) huffman (1) jar (1) java (5) javascript (1) jdbc (1) kernel programming (2) lab (1) like (1) linear algebra (3) linux (5) ls (1) makefile (1) math (16) matrix (2) matrix algebra (1) matrix exponentiation (1) matrix multiplication (1) maxflow (1) maximum bipartite matching (2) maximum flow (1) merge sort (3) mistake (1) modular arithmatic (1) module compiling (2) mysql (1) number system (1) number theory (8) online judge (3) operating system (1) os (1) other (8) parallel programming (1) pollard rho (1) primes (3) problem (3) problem classifier (2) problem solving (34) programming (51) pthreaded (1) puzzle (1) python (3) recursion (5) repository (1) shell (1) shell script (1) sieve (1) simulation (1) sort (3) spacing (1) sphere online judge (12) spoj (12) syntax highlighting (1) system programming (4) table tag (1) tc (1) template (4) thread (1) topcoder (1) training (3) tree (1) tutorial (2) ubuntu (1) usaco (1) uva (5) uva online judge (5) vector (1) windows (1)