I, ME AND MYSELF !!!: Kruskal's Algorithm in C++

Wednesday, January 6, 2010

Kruskal's Algorithm in C++

Minimum Spanning Tree, Kruskal's Algorithm

Kruskal's algorithm is a graph-theory algorithm that finds a minimum spanning tree of a connected, weighted, undirected graph. That is, it finds a subset of the edges that forms a tree containing every vertex, such that the total weight of the edges in the tree is minimized. If the graph is not connected, it finds a minimum spanning forest (a minimum spanning tree for each connected component). The algorithm is greedy: it repeatedly takes the cheapest remaining edge that does not close a cycle.

Performance

This algorithm needs to sort the edges and uses a disjoint-set data structure to keep track of which vertex is in which component. The best comparison sorts, such as merge sort, run in O(e*lg(e)) time, where e is the number of edges (quick sort achieves this on average), and the set operations can be implemented in such a way that they take almost constant amortized time. Apart from the sorting, the algorithm is a single linear pass over the e edges, so the total complexity is O(e*lg(e)). Also note that e can be at most v*(v-1)/2, i.e. O(v*v), when the graph is complete (v being the number of vertices), so O(e*lg(e)) is the same as O(e*lg(v)).
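The disjoint-set structure mentioned above can be sketched as follows. This is only an illustrative version that combines path compression with union by rank; the names DisjointSet, find and unite are assumptions made for this sketch and do not appear in the post's own code, which uses path compression alone.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set (union-find) with path compression and union by rank.
// All names here are illustrative, not taken from the post.
struct DisjointSet {
    std::vector<int> parent, rank_;

    explicit DisjointSet(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0); // every node starts as its own root
    }

    // Returns the root of x's set and repoints the visited nodes at it.
    int find(int x) {
        assert(0 <= x && x < static_cast<int>(parent.size()));
        if (parent[x] != x)
            parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    // Merges the sets of a and b; returns false if they were already joined.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;                        // hang the shallower tree under the deeper one
        if (rank_[a] == rank_[b]) ++rank_[a]; // equal ranks: the merged tree grows by one
        return true;
    }
};
```

With both heuristics in place, a sequence of find and unite calls runs in near-constant amortized time per call, which is what the "almost constant" claim above relies on.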

Algorithm

Do some reading and paperwork before you proceed:
1. the algorithm and its analysis
2. another good pseudo-code
3. the Wikipedia article on Kruskal's algorithm
4. text: Introduction to Algorithms (CLRS, MIT Press), Chapter 23.

Implementation

Here is a short, general-purpose implementation in C++ using the STL. It solves a single graph; if you need to handle multiple input graphs, don't forget to reset the vectors and arrays appropriately between them.

Input

N, E // number of nodes and edges.
E lines follow, each containing u, v, w, where the edge is (u, v) and its weight is w. Nodes are numbered from 1 to N, and N must be less than MAX in the code below.

C++ code


#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;

#define edge pair< int, int >
#define MAX 1001

// ( w (u, v) ) format
vector< pair< int, edge > > GRAPH, MST;
int parent[MAX], total, N, E;

int findset(int x, int *parent)
{
    if(x != parent[x])
        parent[x] = findset(parent[x], parent);
    return parent[x];
}

void kruskal()
{
    int i, pu, pv;
    sort(GRAPH.begin(), GRAPH.end()); // increasing weight
    for(i=total=0; i<E; i++)
    {
        pu = findset(GRAPH[i].second.first, parent);
        pv = findset(GRAPH[i].second.second, parent);
        if(pu != pv)
        {
            MST.push_back(GRAPH[i]); // add to tree
            total += GRAPH[i].first; // add edge cost
            parent[pu] = parent[pv]; // link the two roots (parent[pv] == pv here)
        }
    }
}

void reset()
{
    // call after reading N and before reading the edges of each graph
    GRAPH.clear();
    MST.clear();
    total = 0;
    for(int i=1; i<=N; i++) parent[i] = i; // every node is its own set
}

void print()
{
    int i, sz;
    // this is just style...
    sz = MST.size();
    for(i=0; i<sz; i++)
    {
        printf("( %d", MST[i].second.first);
        printf(", %d )", MST[i].second.second);
        printf(": %d\n", MST[i].first);
    }
    printf("Minimum cost: %d\n", total);
}

int main()
{
    int i, u, v, w;

    scanf("%d %d", &N, &E);
    reset();
    for(i=0; i<E; i++)
    {
        scanf("%d %d %d", &u, &v, &w);
        GRAPH.push_back(pair< int, edge >(w, edge(u, v)));
    }
    kruskal(); // runs Kruskal and constructs the MST vector
    print(); // prints MST edges and weights

    return 0;
}
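
A self-contained variant of the program above, offered as a sketch rather than a replacement. It avoids global state (so nothing needs resetting between graphs), stops once N-1 edges are chosen, and reports whether a spanning tree exists at all. The names Edge, MstResult and kruskalMst are illustrative, and the find step uses iterative path halving instead of the recursive findset() shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };

struct MstResult {
    std::vector<Edge> edges; // chosen edges, in increasing weight order
    long long total = 0;     // sum of the chosen weights
    bool spanning = false;   // true if the chosen edges connect all n nodes
};

// Nodes are numbered 1..n, matching the input format above.
MstResult kruskalMst(int n, std::vector<Edge> graph) {
    assert(n >= 0);
    std::vector<int> parent(n + 1);
    std::iota(parent.begin(), parent.end(), 0); // each node is its own root

    // Iterative find with path halving: each visited node skips to its grandparent.
    auto find = [&parent](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    };

    std::sort(graph.begin(), graph.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });

    MstResult r;
    for (const Edge& e : graph) {
        int pu = find(e.u), pv = find(e.v);
        if (pu == pv) continue;          // same component: edge would close a cycle
        parent[pu] = pv;                 // link the two roots
        r.edges.push_back(e);
        r.total += e.w;
        if (static_cast<int>(r.edges.size()) == n - 1) break; // tree is complete
    }
    r.spanning = (n == 0) || static_cast<int>(r.edges.size()) == n - 1;
    return r;
}
```

For a 4-node graph with edges 1-2:10, 2-3:15, 1-3:5, 4-2:2 and 4-3:40, kruskalMst picks three edges with a total of 17. If spanning comes back false, the graph is disconnected and the result is a minimum spanning forest instead.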

Have fun, and please let me know if you find any bugs...

80 comments:

munna1505, March 21, 2010 at 9:19 PM
Could u plz explain why setting parent[pu] = parent[pv] successfully detects cycle ?? done a pretty decent paper work...and understands why..but still need a solid theorem.

Zobayer Hasan, March 21, 2010 at 10:39 PM
@Munna, here is the theory...

Anonymous, November 24, 2010 at 11:58 AM
Impressive code. This is the cleanest implementation I have seen of Kruskal's to date. It sticks very close to the pseudo-code -- I like it!

Zobayer Hasan, November 30, 2010 at 4:07 AM
Yeah, I also don't find any reason why people can't help writing some oop mumbojumbo's whenever they come to write something over the net. They do nothing but make algorithms harder to follow...

Anonymous, December 2, 2010 at 11:02 AM
Can you tell me how to do an input? im really noob, please help me.

Zobayer Hasan, January 4, 2011 at 4:41 PM
I can't understand what are you trying to indicate by "doing an input", are you asking "how to take input"? Well, it's a language dependent issue and you can always use google!

Anonymous, February 16, 2011 at 10:39 PM
Thanks!, a nice implementation of Kruskal.
I would remove the recursive findset() to get it even better

int findset(int x, int* parent)
{
    while(x != parent[x])
        x = parent[x];
    return x;
}

Zobayer Hasan, February 16, 2011 at 11:56 PM
@Anonymous, are you sure you are doing the right thing? your findset does not do anything the recursive one does. In findset, you are supposed to update parent[] that's why it is called path compression. But clearly, you have increased the complexity to O(n) and your algorithm will not even work properly. Let me know if it does.
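
For reference, a find without recursion can still update parent[]. The sketch below is one possible two-pass version; the name findsetIterative and the use of std::vector instead of a raw array are assumptions made for this example, not code from the post.

```cpp
#include <cassert>
#include <vector>

// Non-recursive find that still performs path compression.
int findsetIterative(int x, std::vector<int>& parent) {
    assert(0 <= x && x < static_cast<int>(parent.size()));
    int root = x;
    while (root != parent[root])   // pass 1: walk up to the root
        root = parent[root];
    while (x != root) {            // pass 2: repoint every node on the path at the root
        int next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}
```

Starting from the chain 4 -> 3 -> 2 -> 1 -> 0, a single call findsetIterative(4, parent) returns 0 and leaves every node on the path pointing directly at 0, so later lookups on those nodes take one step.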

Zobayer Hasan, February 16, 2011 at 11:59 PM
Removing recursion does not always get things better, at least not the way you have mentioned. Check this, and see if you can get 100:
http://www.spoj.pl/problems/MST/

Anonymous, February 17, 2011 at 12:51 AM
It will work the same. In findset() we are just finding out the disjoint set it belonging to, we are not updating it.Updating the parent is in Kruskal().
Here is result of that problem:
4<----->2:2
1<----->3:5
1<----->2:10
Total distance=17

I think complexity and the number of terms computed in both cases remains the same.

Zobayer Hasan, February 17, 2011 at 1:14 AM
"we are not updating it." you are wrong, and I think you need to have some study on path compression. the findset definitely updates parent. and recursive one presented here takes constant time max O(4) while yours one is linear O(n). so, before running with kruskal, first finish this http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=disjointDataStructure
Thanks

Anonymous, February 17, 2011 at 1:16 AM
One more thing, I was not aware of the spoj website, thanks for it. I happened to see your solution and I felt it is neat so just commented it. I need to go through the forum and website and see how to submit my own code.By the way in which page your details so that I can seen the memory and time?

Zobayer Hasan, February 17, 2011 at 1:26 AM
No problem. SPOJ is a nice site for problem solving. But remember, this code is not a solution of any specific problem, so submitting this will not result in a 100
http://www.spoj.pl/status/MST,zobayer/

Zobayer Hasan, February 17, 2011 at 1:31 AM
and my spoj profile:
pizza boy

Anonymous, February 17, 2011 at 1:37 AM
thanks, as I said I'm here just by accident. I'm not an youngster like you, I've spent last 20 years in software development, probably I tell my children to look in to it. All the best!

Zobayer Hasan, February 17, 2011 at 1:47 AM
Oh, you were writing as Anonymous, so I had no way to know who are you really. So I considered to be as usual contest coder (I mean those who usually read this blog)... never mind :)
I was wondering, whether you have forgotten the fundamentals of algorithms over the last 20 years or software development doesn't even need these things? Sorry for my ignorance, I don't know what do people do in software development and many thanks for your visit :)

Anonymous, March 2, 2011 at 10:33 PM
13  int findset(int x, int *parent)
14  {
15      if(x != parent[x])
16          parent[x] = findset(parent[x], parent);
17      return parent[x];
18  }

I can't see anything being updated there.... you are just trying to find an element in an array

Zobayer Hasan, March 3, 2011 at 2:28 AM
What is line 16 doing then? Is this what you call "just trying to find an element in an array "? lol

Zobayer Hasan, March 3, 2011 at 3:11 AM
If we were "just trying to find an element" then we could have just returned findset(parent[x], parent). Why are we "storing" this in parent[x]? And if it is not an update, what do you call an update?

Anonymous, May 2, 2011 at 5:31 AM
Beautiful code.Congrats from brazil[ufmg].

Anonymous, May 2, 2011 at 5:41 AM
just one thing - the comparison function for sort was not searched recursively for me.It was fine for pair,pair for ex, but not for >.Would you know why?

Zobayer Hasan, May 2, 2011 at 6:52 PM
@Anonymous, I can not understand you....

Anonymous, June 29, 2011 at 8:42 PM
Brilliant Explanation!

geek, August 10, 2011 at 7:49 PM
hey ur sort function is not working !!

Zobayer Hasan, August 11, 2011 at 5:03 PM
@geek, you are the first one saying that. I think you are doing something wrong, sort is not my function, it is STL Algorithm's sort, working fine.

Anonymous, December 16, 2011 at 12:41 AM
what does the findset() function do?

Zobayer Hasan, December 16, 2011 at 11:19 PM
In a disjoint set structure, nodes are added to some disjoint sets. Each set is identified by a special node which is called the root of that set. Simply put, findset(x) finds the root of x. With path compression its amortized cost is small, though not strictly constant.

George Christoglou, January 11, 2012 at 3:49 AM
why not vector > GRAPH; ?
how do you know that the graph doesnt have more than 2 edges ?.

Zobayer Hasan, January 12, 2012 at 8:37 PM
Where does it tell that the graph has two edges?

Zobayer Hasan, January 12, 2012 at 8:42 PM
Have you run the program? If not, obviously you can try running it. Actually this program will work for any number of edges. Just set the MAX macro properly.

Anonymous, March 7, 2012 at 9:56 PM
hello sir can u please explain how to analyze the kruskal algorithm to caculate the runtime, ex elogv.

Nouran, May 11, 2012 at 2:05 AM
could u just please tell me what pu , pv variables stand for ??

pulkit, July 19, 2012 at 11:16 AM
shouldn't you have initialised parent[MAX] as "parent[x]=x"....??

pulkit, July 21, 2012 at 3:43 PM
when i used "GRAPH.clear()" in the reset function it gives the following error

error: request for member ‘clear’ in ‘G’, which is of non-class type ‘std::vector, std::allocator > > [100001]’

how to clear the vector?

Anonymous, August 10, 2012 at 4:10 PM
can u plz provide the codes for bfs and dfs too..??

Anonymous, September 25, 2012 at 9:19 AM
Nice implementation man, thanks and keep helping people like me.

Unknown, December 29, 2012 at 11:09 PM
can get bfs and dfs code from.

www.educationandcareeer.blogspot.in

DaRk_KeNt, March 19, 2013 at 4:45 PM
Hi, I just read this spectacular code for the MST. It works fine.
Now I have a little problem. What can I do when I have a graph whitch the mst does not exist? I want to print instead of the cost, the impossibility to calculate it.
Thanks for the answer :D

Anonymous, May 29, 2013 at 11:11 PM
can u explain me the whole code in laymans language.??

Anonymous, June 2, 2013 at 12:31 PM
What change excatly,will i have to do if i were to add path compression by no of nodes in this algo..!!
How will maintain and update ranks.!!??

Anonymous, June 6, 2013 at 11:51 PM
How to properly make a reset()? Clearing MST and GRAPH isn't enough...

Unknown, July 13, 2013 at 11:20 PM
Zobayer Hasan : I came across your blog month ago and I am reading everything from it and thanks for everything first,Your blog is amazing and I also implemented the Kruskal's algorithm and detecting cycles in a graph ,CLRS helped me in unions and find (Disjointset data structures) and I am having one doubt I implemented both Kruskal's and prim's What I want to know is that will they usually both result in the SAME MINIMUM SPANNING TREE in case all edges in the graph are not distinct ? PLEASE PROVIDE THE ANSWER WITH PROOF because I am used to learn things in that way and one more thing may I know your TopCoder Member profile it would be more helpful for me and others.Thanks in advance!

Unknown, October 8, 2013 at 4:01 PM
So many replies on Kluskra Algorithm. Great work Mr. Hassan. I also have worked on Kluskera's Algorithm and here is result I reached at.. please have a look at following code and tell me if there is anything i need to improve. thanks
http://in.docsity.com/en-docs/Kruskals_Algorithm_-_C_plus_plus_Code

Anonymous, November 27, 2013 at 10:44 AM
Awesome! So simple yet so good! Do you have "Christofides algorithm" in your blog or anywhere? I couldn't find it here!

Anonymous, February 22, 2014 at 6:11 AM
Hello Zobayer,

Thank you for this wonderful example. I am trying to learn more about graphs and spanning trees and your example is great. I do have one question. I tried to run your code to get the maximum spanning tree and I know I would have to sort the Graph in descending order but I am having a problem doing so. Can you please advice?

Thank you!
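
For the maximum spanning tree asked about here, one possible approach is to sort the same (w, (u, v)) pairs in decreasing order and leave the rest of kruskal() unchanged. A minimal sketch; the alias WeightedEdge and the helper sortByDecreasingWeight are hypothetical names, not part of the post:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Same (w, (u, v)) layout as the GRAPH vector in the post.
using WeightedEdge = std::pair<int, std::pair<int, int>>;

// Sorts edges so that the heaviest edge comes first.
void sortByDecreasingWeight(std::vector<WeightedEdge>& graph) {
    std::sort(graph.begin(), graph.end(), std::greater<WeightedEdge>());
    // Sanity check: the result is ordered from largest to smallest.
    assert(std::is_sorted(graph.begin(), graph.end(), std::greater<WeightedEdge>()));
}
```

Calling this helper in place of the plain sort() makes the greedy loop pick the heaviest non-cycle edge each time, which yields a maximum spanning tree.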

Anonymous, February 22, 2014 at 9:38 AM
Hi,

what does this mean

for(i=total=0; i<E; i++) (why are i and total together)

mehrab, April 22, 2014 at 5:17 PM
Such a beautiful code (Y)

Anonymous, May 1, 2014 at 4:56 AM
How would i add the total weight of the input graph before it is made into an MST?

Anonymous, October 22, 2014 at 9:03 AM
correct me if I'm wrong but from the theory you suggested I think that you are not using 'union by rank' in your implementation
in your code you do:
parent[pu] = parent[pv]; // link
which is O(1) but for next iterations the following line:
pu = findset(GRAPH[i].second.first, parent);
I think will have a worse time that if you used union by rank, am I correct?

Tharun, October 24, 2014 at 12:39 AM
This is not a efficient way. Right? I mean, you have used path compression technique but not union by rank. Correct me if I am wrong.Thanks in advance.

Unknown, November 16, 2015 at 9:13 AM
Hey Zobayer, can you please look at this piece of code implementing Prim's Algorithm for me? It sure is a looker though!
https://site4algo.wordpress.com/2015/11/16/prims-algorithm-implemented-in-c/#

Anonymous, February 3, 2016 at 10:05 PM
A very clean code. (Y)

Baqir, June 19, 2016 at 6:05 PM
Everything is perfect, elegant. Just that you have not implemented union by rank. Have you?
If you haven't, then do you mean that it will not affect the running time much?

Arkaprava, October 8, 2017 at 8:58 PM
Why did you use a for loop , instead of a while ? increment the counter only when we find an edge which does not create a cycle , please could you explain me this ,this is the doubt i have .