Minimum Spanning Tree, Kruskal's Algorithm
Kruskal's algorithm is an algorithm in graph theory that finds a minimum spanning tree for a connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). This algorithm is based on greedy approach.
Performance
This algorithm needs to sort the edges and uses a disjoint set data structure to keep track which vertex is in which component. We know the best comparison sorting is O(e*lg(e)), i.e. the merge sort or quick sort, where e is the number of edges, and the set operations can be implemented such a way that they are almost constant. The algorithm itself is linear to the number of edges e. So the total complexity can be achieved is O(e*lg(e)). Also note that, e can be max v*v (when it is a complete graph).
Algorithm
Do some study and paper-work before you proceed:
1. the algorithm and analysis
2. another good pseudo-code
3. read in wikipedia
4. text: Introduction to Algorithms (CLRS, MIT press), Chapter 23.
Implementation
Here is a short and in general implementation in C++ using the STL library. This solves for one graph, if you need it for multiple graphs inputs, don't forget to reset the vectors and arrays appropriately.
Input
N, E // number of nodes and edges.
E edges containing u, v, w; where, the edge is (u, v) and edge weight is w.
C++ code
#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;
#define edge pair< int, int >
#define MAX 1001
// ( w (u, v) ) format
vector< pair< int, edge > > GRAPH, MST;
int parent[MAX], total, N, E;
int findset(int x, int *parent)
{
if(x != parent[x])
parent[x] = findset(parent[x], parent);
return parent[x];
}
void kruskal()
{
int i, pu, pv;
sort(GRAPH.begin(), GRAPH.end()); // increasing weight
for(i=total=0; i<E; i++)
{
pu = findset(GRAPH[i].second.first, parent);
pv = findset(GRAPH[i].second.second, parent);
if(pu != pv)
{
MST.push_back(GRAPH[i]); // add to tree
total += GRAPH[i].first; // add edge cost
parent[pu] = parent[pv]; // link
}
}
}
void reset()
{
// reset appropriate variables here
// like MST.clear(), GRAPH.clear(); etc etc.
for(int i=1; i<=N; i++) parent[i] = i;
}
void print()
{
int i, sz;
// this is just style...
sz = MST.size();
for(i=0; i<sz; i++)
{
printf("( %d", MST[i].second.first);
printf(", %d )", MST[i].second.second);
printf(": %d\n", MST[i].first);
}
printf("Minimum cost: %d\n", total);
}
int main()
{
int i, u, v, w;
scanf("%d %d", &N, &E);
reset();
for(i=0; i<E; i++)
{
scanf("%d %d %d", &u, &v, &w);
GRAPH.push_back(pair< int, edge >(w, edge(u, v)));
}
kruskal(); // runs kruskal and construct MST vector
print(); // prints MST edges and weights
return 0;
}
Have fun and please notify for any bug...
Could u plz explain why setting parent[pu] = parent[pv] successfully detects cycle ?? done a pretty decent paper work...and understands why..but still need a solid theorem.
ReplyDelete@Munna, here is the theory...
ReplyDeleteImpressive code. This is the cleanest implementation I have seen of Kruskal's to date. It sticks very close to the pseudo-code -- I like it!
ReplyDeleteYeah, I also don't find any reason why people can't help writing some oop mumbojumbo's whenever they come to write something over the net. The do nothing but make algorithms harder to follow...
ReplyDeleteCan you tell me how to do an input? im really noob, please help me.
ReplyDeleteI can't understand what are you trying to indicate by "doing an input", are you asking "how to take input"? Well, it's a language dependent issue and you can always use google!
ReplyDeleteThanks!, a nice implementation of Kruskal.
ReplyDeleteI would remove the recursive findset() to get it even better
int findset(int x, int* parent)
{
while(x!= parent[x])
x=parent[x];
return x;
}
@Anonymous, are you sure you are doing the right thing? your findset does not do anything the recursive one does. In findset, you are supposed to update parent[] that's why it is called path compression. But clearly, you have increased the complexity to O(n) and your algorithm will not even work properly. Let me know if it does.
ReplyDeleteRemoving recursion does not always get things better, at least not the way you have mentioned. Check this, and see if you can get 100:
ReplyDeletehttp://www.spoj.pl/problems/MST/
It will work the same. In findset() we are just finding out the disjoint set it belonging to, we are not updating it.Updating the parent is in Kruskal().
ReplyDeleteHere is result of that problem:
4<----->2:2
1<----->3:5
1<----->2:10
Total distance=17
I think complexity and the number of terms computed in both cases remains the same.
"we are not updating it." you are wrong, and I think you need to have some study on path compression. the findset definitely updates parent. and recursive one presented here takes constant time max O(4) while yours one is linear O(n). so, before running with kruskal, first finish this http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=disjointDataStructure
ReplyDeleteThanks
One more thing, I was not aware of the spoj website, thanks for it. I happened to see your solution and I felt it is neat so just commented it. I need to go through the forum and website and see how to submit my own code.By the way in which page your details so that I can seen the memory and time?
ReplyDeleteNo problem. SPOJ is a nice site for problem solving. But remember, this code is not a solution of any specific problem, so submitting this will not result in a 100
ReplyDeletehttp://www.spoj.pl/status/MST,zobayer/
and my spoj profile:
ReplyDeletepizza boy
thanks, as I said I'm here just by accident. I'm not an youngster like you, I've spent last 20 years in software development, probably I tell my children to look in to it. All the best!
ReplyDeleteOh, you were writing as Anonymous, so I had no way to know who are you really. So I considered to be as usual contest coder (I mean those who usually read this blog)... never mind :)
ReplyDeleteI was wondering, whether you have forgotten the fundamentals of algorithms over the last 20 years or software development doesn't even need these things? Sorry for my ignorance, I don't know what do people do in software development and many thanks for your visit :)
int findset(int x, int *parent)
ReplyDelete14
{
15
if(x != parent[x])
16
parent[x] = findset(parent[x], parent);
17
return parent[x];
18
}
I can't see anything being updated there.... you are just trying to find an element in an array
What is line 16 doing then? Is this what you call "just trying to find an element in an array "? lol
ReplyDeleteIf we were "just trying to find an element" the we could have just returned findset(parent[x], parent), why we are "storing" this in parent[x]? and if it is not an update, what you call to be an update?
ReplyDeleteBeautiful code.Congrats from brazil[ufmg].
ReplyDeletejust one thing - the comparison function for sort was not searched recursively for me.It was fine for pair,pair for ex, but not for >.Would you know why?
ReplyDelete@Anonymous, I can not understand you....
ReplyDeleteBrilliant Explanation!
ReplyDeletehey ur sort function is not working !!
ReplyDelete@geek, you are the first one saying that. I think you are doing something wrong, sort is not my function, it is STL Algorithm's sort, working fine.
ReplyDeletewhat does the findset() function do?
ReplyDeleteIn a disjoint set structure, nodes are added to some disjoint sets. Each set is identified by a special node which is called the root of that set. Simply put, findes(x) finds the root of x. It has a constant time complexity.
ReplyDeletewhy not vector > GRAPH; ?
ReplyDeletehow do you know that the graph doesnt have more than 2 edges ?.
Where does it tell that the graph has two edges?
ReplyDeleteHave you run the program? If not, obviously you can try running it. Actually this program will work for any number of edges. Just set the MAX macro properly.
ReplyDeleteHallo,
Deletesorry for this stupid question, but I dont know how to run your application in MS Visual Studio 2010. Please could you write me what kind of new project should I create to run your application there?
Thank you very much for your help.
Hmm, I guess you have C++ installed with you MSVS. Now, just open a C++ win32 console application. an application wizard will come. now select "empty project", and then add a c++ file to it, copy paste the code, run, etc etc..
Deletebut all of my post assumes that reader knows c/c++ very well (having some sort of problem solving background), so sorry if you can't make it :(
hello sir can u please explain how to analyze the kruskal algorithm to caculate the runtime, ex elogv.
ReplyDeleteline #23 sorts all the edges, it should take O(e lg e), then the loop runs for O(e * complexity of find() and link())
DeleteNow, after using path compression, the cost of find() is almost constant, probably 4 on the worst case, and link() is always constant. So, the overall in the loop is O(e). So, the overall goes to O(e lg e + e) = O(e lg e). Now, in programming books, probably they ignore the complexity of sorting, not sure though, in that case the calculation is O(e lg v) because they do not consider union-find structure when optimized, so normal priority queue should take lg(v) in each operation, thus O(e lg v), in that sense, it takes O(e), that's why this is this works much faster.
could u just please tell me what pu , pv variables stand for ??
ReplyDeleteparent of u and v respectively in disjoint set structure.
Delete