I, ME AND MYSELF !!!: tutorial

Showing posts with label tutorial. Show all posts

Thursday, October 1, 2015

Ajax Queue Example Code

Although this is not what ajax is meant for, but sometimes application requirements force us to implement various hacks. For example, ajax is usually done for light-weight request. But in my case, I am using it for an alternative to form submission. Each of these submissions causes heavy load at server end. So I cannot allow more than one concurrent ajax call to reach the server. This simple solution worked as a charm.

Javascript code:

var ajaxQueue = $({});
var currentRequest = null;
$.ajaxQueue = function( ajaxOpts ) {
    // Hold the original complete function.
    var oldComplete = ajaxOpts.complete;
    // Queue our ajax request.
    ajaxQueue.queue(function( next ) {
        // Create a complete callback to fire the next event in the queue.
        ajaxOpts.complete = function() {
            // Fire the original complete if it was there.
            if ( oldComplete ) {
                oldComplete.apply( this, arguments );
            }
            // Run the next query in the queue.
            next();
        };
        // Run the query.
        currentRequest = $.ajax( ajaxOpts );
    });
};

// Ajax calls
function ajaxCall() {
    $.ajaxQueue({
        type: "POST",
        url: '/echo/json/',
        async: true,
        cache: false,
        success: function( result ) {
            // process response
        }
    });
}

// Abort method
function abortAjaxQueue() {
    ajaxQueue.clearQueue();
    if (currentRequest) {
        currentRequest.abort();
    }
}

Disclaimer

This is not my own code, this is community developed, thanks to this stackoverflow question and its accepted answer by sroes. Tested using JQuery versions 1.9.1 to 2.3.1

Play with this code in JSFiddle

Wednesday, November 27, 2013

Various Usage of B.I.T.

I’ve always imagined what BIT (I will omit the dots, but don't confuse it with binary bits) can do and can’t. I have seen numerous times, experienced solvers say “Oh it’s easily solvable using BIT” leaving me wondering, “BIT on this problem! How on earth!” Still I haven’t found quite a good answer for this question. Looks like BIT can do as much as you can imagine it to do (and obviously design it to do so).

However, I am not going to explain the structure of BIT or how it works etc. As there are lots of well documented articles available on the net. I can’t help putting one specific link here, which is a TopCoder algorithm tutorial for BIT. So, if you are not familiar with BIT yet, first go ahead, and read that post.

In this post, we will discuss some of the most common application of BIT for solving data structure related problems, more specifically, where range sum updates and queries are involved.

Some methods used in this post:
Update(x, v): Adds V at the position x in BIT.
Query(x): Returns the cumulative value on position x in BIT.
Read the above tutorial to know how these methods are implemented.
We will be using 1-based indexing throughout the post.

(I) Point Update, Range Query:

This is the most common form of BIT, and just about 99% of online tutorials you will find on the net will describe this application. So here we don't have much to say. If your updates are like “Add v in the x position of the array”, and queries are like “What is the sum of elements in index range [a, b] in the array?”, then all you do for update is call Update(x, v) and answer queries by Query(b) - Query(a-1). Simple cumulative sum formula, nothing really astonishing. This problem is can be solved using BIT: SPOJ - MSE06H.

(II) Range Update, Point Query:

Although not something you see every now and then, but it's readily doable. Now you are required to update v over the range [a, b] and the query is performed on a single index x. So now you call update twice, first add v at index a, then remove v for indices larger than b, which is: Update(a, v); Update(b+1, -v); And for query, you simply call Query(x). Now to see why this works, the following reasoning can be made:

Lets consider, initially you have everything 0. So Query(x) will return 0, no matter what index you call it with. Suppose you have called Update(a, v) and Update(b+1, -v). Now what will happen if you call Query(x)? Three cases can arise:
1. if x < a, the query is not affected by the updates. So you get correct result.
2. if x > b, x will be affected by the update(a, v) since x ≥ a, and update(b+1, -v) since x ≥ b+1, therefore, v-v=0 so everything cancels out. Again, you get the correct result.
3. a ≤ x ≤ b. x is only affected by update(a, v), but not update(b+1, -v), therefore, Query(x)'s value is increased by v, and it will return the correct result.

As we can see, in this fashion, we can increment or decrement for a range [a, b], and get a single index effect for some index x. This problem is an example: SPOJ - UPDATEIT

(III) Range Update, Range Query:

I didn't even know it exists until I read some post in TopCoder forums (the post was made by: AnilKishore).

I will just re-state his explanation as it was quite clear itself. All we want here is to support range updates, and do range queries. As from the previous type, if we try to store range updates, the BIT structure effectively captures updates for single positions, instead of range/cumulative sums. However, we can do some tweaking to work around this problem.

Let's just consider only one update: Add v to [a, b] while the rest elements of the array is 0.
Now, consider sum(0, x) for all possible x, again three situation can arise:
1. 0 ≤ x < a : which results in 0
2. a ≤ x ≤ b : we get v * (x - (a-1))
3. b < x < n : we get v * (b - (a-1))

This suggests that, if we can find v*x for any index x, then we can get the sum(0, x) by subtracting T from it, where:
1. 0 ≤ x < : Sum should be 0, thus, T = 0
2. a ≤ x ≤ : Sum should be v*x-v*(a-1), thus, T = v*(a-1)
3. b < x < n : Sum should be 0, thus, T = -v*b + v*(a-1)

As, we can see, knowing T solves our problem, we can use another BIT to store this additive amount from which we can get:
0 for x < a, v*(a-1) for x in [a..b], -v*b+v(a-1) for x > b.

Now we have two BITs.
To add v in range [a, b]: Update(a, v), Update(b+1, -v) in the first BIT and Update(a, v*(a-1)) and Update(b+1, -v*b) on the second BIT.
To get sum in range [0, x]: you simply do Query_BIT1(x)*x - Query_BIT2(x);
Now you know how to find range sum for [a, b]. Just find sum(b) - sum(a-1) using the formula stated above.

Pretty impressive this one! SPOJ - HORRIBLE can be solved in this approach. And thanks to Iqram Mahmud for this nice problem.

(IV) 2D BIT

How to write Update and Query methods for 2D BIT is well described in the TopCoder tutorial I've mentioned above. The only thing to notice here is how the queries and updates work. 2D BIT is basically a BIT where each element is another BIT. Adding v on (x, y) means it's effect will be found throughout the rectangle [(x, y), (W, H)], and query for (x, y) gives you the result of the rectangle [(0, 0), (x, y)], assuming the total rectangle is [(0, 0), (W, H)]. SO when you query and update on this BIT, you have to be careful about how many times you are subtracting a rectangle and adding it. Simple set union formula works here. So if you want to get the result of a specific rectangle [(x1, y1), (x2, y2)], the following steps are necessary to do so:
V = Query(x2, y2)
V -= Query(x2, y1-1)
V -= Query(x1-1, y2)
V += Query(x1-1, y1-1)

This problem is an example: SPOJ - MATSUM

This page is likely to be updated if I find and manage to solve new types of BIT problems. Please let me know if there are mistakes. This post is inspired by some random posts lying around in TopCoder forums, I just wanted to put them in one place for future reference.

Tuesday, April 2, 2013

SPOJ FINDPRM

SPOJ Problem Set (Classical) 5970. Finding Primes

Fun problem! First it wants you to run a sieve from 2 to N, and then do the similar operations starting from N to 2, except, this time, you have to mark the factors of N and then find the next largest N1 not yet marked, then mark factors of N1 and so on... Finally, you have to tell how many numbers will be unmarked by both algorithms up to N.

First observation is, you only need to consider those numbers which are primes, so, we can run a sieve up to 10⁷, which will allow us to test whether a number is prime or not in O(1) time.

Now, for each N, we can pre-calculate the result. Just note that, say if you know ans[n-1] and want to determine ans[n] from it, this inequality will hold |ans[n] - ans[n-1]| <= 1. Because the current N will be added as an unmarked, or there may be a number which you added in the past, will be removed by one of the factors of this N. Or, if N is not a prime, then it will be already removed by the first algorithm i.e. sieve.

If N is a prime number, then no N1 has removed this N already, so in this case the ans[n] = ans[n-1] + 1.

If N is an even number, and if N/2 was a prime, you have added that with your result, but this should be removed at this stage. So in such a case, ans[n] = ans[n-1].

Why don't you need to consider other factors like N/3, N/5 ... ? 2 is the only even prime factor and it can produce even numbers by multiplying other primes with it. For other primes, p = 3, 5 and so on, if N/p is prime, then N cannot be even. So the basic idea is as follows:

for( i = 2; i <= N; i++ ) {
    ans[i] = ans[i - 1];
    if( i is even and i/2 is prime ) ans[i] = ans[i] - 1;
    if( i is prime ) ans[i] = ans[i] + 1;
}

I had absolutely no clue at the first glance. Really nice problem.

Monday, March 25, 2013

SPOJ PRHYME

SPOJ Problem Set (classical) 2737. Perfect Rhyme

A perfect rhyme is not a crime,
it is something that exceeds time,
a bit of science, a piece of art,
soft as a pillow, sharp as a dart.

I really love this little rhyme.

Basically the problem is, you are given a dictionary of words, and some query words. For each query word q, you have to find a dictionary word u such that, u != q and the common suffix of q and u are of maximum length possible. In case there is a tie, the problem requires the lexicographically smallest such word.

This problem can be solved using STL maps and storing the suffixes along with their sorted id, but there is a more elegant solution using Trie data structure, i.e. prefix tree. As we are interested in maximizing common suffix length, we can store the strings in the Trie in reversed form, so now the suffixes will become prefixes in this tree. For each dictionary word, we also need to store its index number, when sorted, as a termination marker for that word, so that, we can find the id of an word easily during the query. On each node, we also need to keep two additional information, the ids of lexicographically smallest and second smallest strings passing through that node, which we call min1 and min2. Initially both should contain an infinite value.

Now for each query word, we also search the word in reversed form. If it is not found in the tree, i.e. a path may exist, but the end marker is not present, then the task is simple, we just return the lexicographically smallest id, which is min1, from the current node, i.e. the node which we reached while trying to match the query string.

But extra cares should be taken when the query string is found on the tree, because, then we have to look for another candidate, for which, we have kept min2, i.e. second lexicographically smallest index. If we can deduce that, going to node x from current node cur, if it evidently means we will end up finding the exact same word, then we can't follow that path, instead, we decide which index to return from current node cur, if its min1 index refers to the word itself, then we return min2, otherwise we return min1. And if we have no other choice but end up at the exact matching point, then we are also sure that there is at least another string which follows the same path, but does not end at our current node, i.e. at least two different words. Then depending on our query word, we select min1 or min2 from our current node.

So, if you know how to code a Trie, it is not really a hard one, but indeed a tricky one.

Tuesday, February 26, 2013

Divisor Function

Prime Factorization and Divisor Function σ_x(n)

σ_x(n) is defined as the sum of x^th powers of the distinct positive divisors of n. The function can be expressed as:

Here, r = ω(n), which is the number of distinct positive prime factors of n. p_i for i = 1 to r, are the prime factors and a_i is the maximum power of p_iby which n is divisible.

Clearly, this function can be used for various problems, for example, when x = 0, it simply means the number of distinct positive divisors of n, if x = 1, it is the sum of distinct positive divisors of n, for x = 2, its the sum of squares of positive divisors of n and so on.

For programming contest practice, there are a few problems that requires sum of divisors, or number of divisors, which can be calculated by simply calculating the prime factors with their count, and then evaluating the function shown above with appropriate value on x. Also, the form of function definition can be changed when you set a value for x. After putting x = 0, we get this:

So this is easy, you just need to find frequency of each prime, and then multiply each (a_i + 1) for all primes, you get the number of divisors of n.

For sum of divisor, the idea is similar, here x = 1. The following code shows how to find sum of distinct positive divisors of n:

#include <cstdio>
#include <cmath>
using namespace std;

#define sq(x) ((x)*(x))
#define i64 unsigned long long
#define MAX 784
#define LMT 28

unsigned flag[MAX/64];
unsigned primes[5761460], total;

#define chkC(n) (flag[n>>6]&(1<<((n>>1)&31)))
#define setC(n) (flag[n>>6]|=(1<<((n>>1)&31)))

/*
Regular sieve of eratosthenes, bitwise implementation
*/
void sieve()
{
    unsigned i, j, k;
    flag[0]|=0;
    for(i=3;i<LMT;i+=2)
        if(!chkC(i))
            for(j=i*i,k=i<<1;j<MAX;j+=k)
                setC(j);
    primes[(j=0)++] = 2;
    for(i=3;i<MAX;i+=2)
        if(!chkC(i))
            primes[j++] = i;
    total = j;
}

/*
finds n^p in log(p) time
*/
i64 power(unsigned n, unsigned p)
{
    i64 x=1, y=n;
    while(p > 0)
    {
        if(p&1) x *= y;
        y *= y;
        p >>= 1;
    }
    return x;
}

/*
calculates the sigma(1) function, we don't need to find prime frequencies.
*/
inline void update(i64 &sigma1, i64 n, unsigned p)
{
    if(p==1) sigma1 *= (n+1);
    else sigma1 *= ((power(n,p+1)-1)/(n-1));
}

/*
Factorization function, we do not need to store the primes here,
instead, whenever a prime is found, you update corresponding prime
and frequency with sigma 1
*/
void factor(i64 n, i64 &sigma1)
{
    unsigned i, v;
    i64 t;
    for(i=0, t=primes[i]; i<total && t*t <= n; t = primes[++i])
    {
        if(n % t == 0)
        {
            v = 0;
            while(n % t == 0)
            {
                v++;
                n /= t;
            }
            update(sigma1, primes[i], v);
        }
    }
    if(n>1) update(sigma1, n, 1);
}

/*
Our beloved main function
*/
int main()
{
    int t, x;
    i64 n, sigma1;
    sieve();
    scanf("%d", &t);
    for(x=1; x<=t; x++)
    {
        scanf("%llu", &n);
        factor(n, sigma1);
        printf("%llu\n",sigma1);
    }
    return 0;
}

We just need the values of σ, so here we will not store the prime factors. Some similar problem would be to find the number of odd divisors of n, or testing if a number is square free, i.e. no perfect square divides n, going to leave these as an exercise for readers.

A closer look to σ function from wikipedia.

Friday, December 14, 2012

Apriori in Java (Part 2)

(Part 1)

Apriori Algorithm:

The algorithm works as follows: first it generates all the frequent itemsets of length 1 w.r.t. the given threshold (minimum support). Then it continue to generate itemsets of lengths 2,3, ... ,n if possible. There are lots of improvements and pruning possible in the implementation.

Do we need to sort frequently? First observation is, if the items are taken in sorted order on step k = 1, all the frequent patterns generated in future will also be in sorted order if maintained properly. So this eliminates the necessity of sorting or storing itemsets in logarithmic data structures like map or set. Rather, we can store them in arraylist or vector like data structure.

How do we generate itemsets of length k+1? We generate itemsets of length k+1 by merging two itemsets of length k. If two itemsets I1 and I2 have a common prefix of length k-1, and I1[k] < I2[k], we can take I1[1 ... k-1] I1[k] I2[k] which is an itemset of length k+1. As our itemsets are sorted, and frequent itemsets generated are also sorted, this can be done in O(N*K^2). Well, if we follow the naive approach, it will take O(N*K^3), however, as we can pre-calculate the length of common prefix of consecutive items in O(N*K), later, we can use this to perform joining operation stated above and also do early termination rather than looking for the entire N items. The approach is demonstrated in source code.

What are the prunings? The most important observation on apriori algorithm is, if an itemset is not frequent, none of its superset can be frequent. Which also tells us, if an itemset has to be frequent, all of its subset must be frequent. Well, the first one is automatically checked in the algorithm. As it takes the frequent itemsets of previous step to calculate itemsets of current step. Now, how the second rule is checked? Well, we don't need to check all the subsets really, all we need is to check the subsets of length k-1 for step k, which can be easily done by performing manual hashing, or using java hashmap. This saves a lots of hassles.

/*
Author: Zobayer Hasan
*/
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Apriori extends Thread {
    public static boolean debugger = false;
    
    private final Database db;
    private final List< Integer > itemset;
    private final List< List< Integer > > frequent;
    private double minsup;

    public Apriori(String thrName, Database db, double minsup) {
        super(thrName);
        this.db = db;
        itemset = db.getItemset();
        frequent = new ArrayList< List< Integer > >();
        this.minsup = minsup;
    }
    
    @Override
    public void run() {
        double startTime = System.currentTimeMillis();
        
        int k = 1, n = db.dbSize();
        List< List< Integer > > Ck = new ArrayList< List< Integer > >();
        List< List< Integer > > Lk = new ArrayList< List< Integer > >();
        HashMap< List< Integer>, Integer > seenK = new HashMap< List< Integer >, Integer >();
        
        for(Integer item : itemset) {
            List< Integer > temp = new ArrayList< Integer >();
            temp.add(item);
            Lk.add(temp);
        }
        
        while(k <= n && !Lk.isEmpty()) {
            if(debugger) {
                System.out.println("Step " + k);
                System.out.println("Lk: " + Lk);
            }
            
            seenK.clear();
            Ck.clear();
            for(List< Integer > kth : Lk) {
                int count = db.scanDatabase(kth);
                if((double)count < Math.ceil(minsup * (double)n / 100.0)) continue;
                Ck.add(kth);
            }
            
            if(debugger) {
                System.out.println("Ck: " + Ck);
            }
            
            if(Ck.isEmpty()) break;
            
            for(List< Integer > freq : Ck) {
                frequent.add(freq);
                seenK.put(freq, k);
            }
            
            int[] prefixlen = new int[Ck.size()];
            prefixlen[0] = 0;
            for(int i = 1; i < Ck.size(); i++) {
                prefixlen[i] = prefixLen(Ck.get(i-1), Ck.get(i));
            }
            
            List< List< Integer > > temp = new ArrayList< List< Integer > >();
            for(int i = 0; i < Ck.size(); i++) {
                for(int j = i + 1; j < Ck.size(); j++) {
                    if(prefixlen[j] == k-1) {
                        if(debugger) {
                            System.out.println("Joining: " + i + ":" + Ck.get(i) + " + " + j + ":" + Ck.get(j) + " Prefix Length " + prefixlen[j]);
                        }
                        temp.add(prefixJoin(Ck.get(i), Ck.get(j)));
                    }
                    else break;
                }
            }
            
            if(debugger) {
                System.out.println("Temporary: " + temp);
            }

            Lk.clear();
            for(List< Integer > list : temp) {
                boolean candid = true;
                if(k > 1) {
                    for(int i = 0; i < list.size(); i++) {
                        List< Integer > prev = new ArrayList< Integer >();
                        for(int j = 0; j < list.size(); j++) {
                            if(i != j) prev.add(list.get(j));
                        }
                        if(!seenK.containsKey(prev)) {
                            candid = false;
                            break;
                        }
                    }
                }
                if(candid) {
                    Lk.add(list);
                }
            }
            
            if(debugger) {
                System.out.println("Pruned: " + Lk);
            }
            
            k++;
        }
        
        double endTime = System.currentTimeMillis();
        System.out.println("Apriori completed in " + (endTime - startTime)/1000.0 + " seconds");
    }
    
    public void printPatterns() {
        System.out.println("Frequent Itemsets");
        for(List< Integer > pattern : frequent) {
            System.out.println(pattern);
        }
        System.out.println("Total " + frequent.size() + " itemsets");
    }
    
    private int prefixLen(List< Integer > left, List< Integer > right) {
        int len = 0;
        for(len = 0; len < left.size() && len < right.size(); len++) {
            if(left.get(len).compareTo(right.get(len)) != 0) return len;
        }
        return len;
    }
    
    private List< Integer > prefixJoin(List< Integer > left, List< Integer > right) {
        List< Integer > ret = new ArrayList< Integer >();
        for(Integer i : left) {
            ret.add(i);
        }
        ret.add(right.get(right.size() - 1));
        return ret;
    }
}

This class is threaded, so it is possible to take advantages of multicore processors.

A sample test class is shown here.

/*
Author: Zobayer Hasan
*/
public class FIMtest {

    public static void main(String[] args) {
        Database db = null;
        try {
            db = new Database("mushroom.dat");
            
        } catch(Exception e) {
            e.printStackTrace();
        }
        
        System.out.println("\nStarting Apriori");
        
        Apriori test1 = new Apriori("test1", db, 40.0);
        Apriori.debugger = true;
        test1.start();
        try {
            test1.join();
            test1.printPatterns();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}

So, this is it, an efficient implementation of Apriori algorithm in java. It is possible that there are some lacking or mistakes in either source code or analysis. If you find any, please let me know. Here is a sample run on mushroom for 40% minimum support.

Have fun...

Thursday, December 13, 2012

Apriori in Java (Part 1)

Introduction:

Apriori is a very basic and straight forward algorithm for frequent pattern mining, I will not be discussing much about the approach, as those can already be studied from different lectures/books available on net. I will basically present an implementation of mine which is an efficient implementation of the standard apriori algorithm in Java. The target input files are taken from Frequent Itemset Mining Dataset Repository, for example, the mushroom dataset. Due to the space and runtime complexity, apriori is not suitable for larger files having 50,000 records or so. It also may take huge time for very dense files.

The Database:

First we need to read the files and store them in some data structure which will be easy to access and efficient at the same time. Here. I am presenting my java source code, and then I will present an explaining analysis why this works better.

/*
Author: Zobayer Hasan
*/

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class Entry {
    public Integer first;
    public Integer second;
    Entry() {}
    Entry(Integer first, Integer second) {
        this.first = first;
        this.second = second;
    }
}

public class Database {
    public static boolean debugger = false;
    
    private final List< List< Integer > > transactions;
    private final List< Integer > items;

    public Database(String dataFileName) throws Exception {
        if(debugger) {
            System.out.println("Processing " + dataFileName);
        }
        
        transactions = new ArrayList< List< Integer > >();
        items = new ArrayList< Integer >();
        
        FileInputStream fin = new FileInputStream(dataFileName);
        InputStreamReader istream = new InputStreamReader(fin);
        BufferedReader stdin = new BufferedReader(istream);
        
        String line;
        
        double startTime = System.currentTimeMillis();
        
        while((line = stdin.readLine()) != null) {
            List< Integer > transaction = new ArrayList< Integer >();
            String[] temp = line.split("\\s+");
            
            for(String num : temp) {
                transaction.add(Integer.parseInt(num));
            }
            
            if(transaction.isEmpty()) continue;
            
            Collections.sort(transaction);
            transactions.add(transaction);
        }
        
        fin.close();
        istream.close();
        stdin.close();
        
        int n = transactions.size();
        int[] header = new int[n];
        PriorityQueue< Entry > pQ = new PriorityQueue< Entry >(n, new Comparator< Entry >() {
            public int compare(Entry item1, Entry item2) {
                if(item1.first.equals(item2.first)) {
                    return item1.second.compareTo(item2.second);
                } else {
                    return item1.first.compareTo(item2.first);
                }
            }
        });
        
        for(int i = 0; i < n; i++) {
            header[i] = 0;
            pQ.add(new Entry(transactions.get(i).get(header[i]), i));
        }
        
        while(!pQ.isEmpty()) {
            Entry peek = pQ.remove();
            int val = peek.first;
            int idx = peek.second;
            if(items.isEmpty() || items.get(items.size()-1) < val) {
                items.add(val);
            }
            while(header[idx] < transactions.get(idx).size() && transactions.get(idx).get(header[idx]) <= val) {
                header[idx]++;
            }
            if(header[idx] < transactions.get(idx).size()) {
                pQ.add(new Entry(transactions.get(idx).get(header[idx]), idx));
            }
        }
        
        double endTime = System.currentTimeMillis();
        System.out.println("Database created in " + (endTime - startTime)/1000.0 + " seconds");
    }
    
    public int scanDatabase(List< Integer > transaction) {
        int count = 0;
        for(List< Integer > row : transactions) {
            boolean found = true;
            for(Integer item : transaction) {
                int idx, stp, st = 0, en = row.size(), cnt = en - st;
                while(cnt > 0) {
                    stp = cnt >> 1; idx = st + stp;
                    if(row.get(idx).compareTo(item) < 0) {
                        st = ++idx;
                        cnt -= stp+1;
                    }
                    else {
                        cnt = stp;
                    }
                }
                if(st == row.size() || row.get(st).compareTo(item) != 0) {
                    found = false;
                    break;
                }
            }
            if(found) count++;
        }
        return count;
    }
    
    public List< Integer > getItemset() {
        return items;
    }
    
    public int dbSize() {
        return transactions.size();
    }
    
    public List< Integer > getRow(int row) {
        try {
            return transactions.get(row);
        } catch(Exception e) {
            throw e;
        }
    }
}

Clearly it provides some interface to access the database where the constructor must be called with a file name for example 'mushroom.dat'. The sorting on line 56 is not necessary if you know the transactions in the file will be sorted in ascending order. All the data files in this repository are already sorted, so sorting can be disabled.

Now, if we look at the constructor, what is this huge code doing actually? First it treads each transaction as a string, then parses it and inserts the transaction in a sorted list. Now, we need a list of unique elements. Well, this could be done by sorting all the records together, and then eliminating duplicates. However, this can be performed better with the help of a priority queue in O(NK log N) time, where we have n transactions, and each transaction has K records on an average. Actually this works much better than it would be in naive approach which would take O(NK log NK) because of extra sorting overhead.

Then comes the last important part, scanDatabase() method. Which basically searches each record in the database for a set of elements. It takes O(NK log K) time at most, where, we have N records, each having an average length of K. This is much better than looking in O(N*K^2) following a naive approach. Here is some improvement possible, if we know the length of transactions, we could use a bit vector in addition to each transaction. Then the query can be performed in O(NK). However, as the datasets are huge, I didn't go for it due to the necessity of using huge memory.

Part 2 contains the source code and explanation for the apriori algorithm.

Monday, June 25, 2012

CSEDU Training 2012 Week 5, 6, 7, 8

Week 5

Link to practice contest 5

The main focus was mathematics and a few more 2-SAT problems. We've tried several problems regarding Gaussian Elimination. Gaussian Elimination problems are bit difficult because there are lots of things to consider during the process and sometimes there are scopes of optimization. In the practice problemset, there are a few gaussian elimination problems to test you gaussian elimination skills.

Problems from APIO 2009 was also discussed. Next week, we hope to discuss APIO 2010.

Week 6

Link to practice contest 6

The main focus was dynamic programming and a bit of computational geometry. We also discussed problems on heavy light decomposition problems (For example SPOJ GSS7 and problem A from Buet contest). However, there is a way where particular instance of heavy light decomposition can be solved using LCA-to-RMQ approach (tutorial from topcoder). APIO 2010 problems were postponed to next week because no one had their homework in time :P

Week 7

Link to practice contest 7

Problemset consists of APIO 2010 and a number of computational geometry. Vector geometry was discussed. The three types of transformation i.e. Rotation, Translation, Scaling was discussed and how to get them done using matrices. For rotation, there are a simpler way to do where you do not need to remember the matrix.

Then, 3D convex hull gave us a lot of trouble. We could easily derive and idea of O(n^4) but this is too costly. Then we've found a O(n^2) idea which was hard to code. Next schedule was APIO 2011 which is to be discussed on the next class after IUT Contest. Our instructor gave us a few idea on some of the problems though, but the solutions were quite astonishing and none was able to go near the correct solution unfortunately.

Week 8

No practice contest was hosted due to the close coming IUT Contest. Also not much was discussed, a few chit chat and several random problems.

In the mean time, we had a few team contests for IUT practice. Here are the links: (in no particular order)

http://acm.hust.edu.cn:8080/judge/contest/view.action?cid=8710#overview

http://acm.hust.edu.cn:8080/judge/contest/view.action?cid=8716#overview

http://acm.hust.edu.cn:8080/judge/contest/view.action?cid=8754#overview

http://acm.hust.edu.cn:8080/judge/contest/view.action?cid=8801#overview

However, we performed really bad in IUT contest, and not that we were not able to solve the problems, but, we failed to manage time and team co-ordination. Better luck next time :)

Friday, May 11, 2012

CSEDU Training 2012 Week 2

Problems from APIO 2007 were discussed and the solutions seem to be not that hard (well that happens whenever someone tells you the solutions). A few more data structure based solutions were discussed along with problems from last week's virtual contest.

Next day we will be discussing on APIO 2008 problems, and the problem from tonight's virtual contest. A famous problem from SPOJ is discussed (problem code: CHAIN). I have already seen this problem before but could not manage to solve it and all I got was numerous infamous WA (wrong answer). Shanto vai gave as a different type of simpler solution based on weighted union-find structure (really I heard this for the first time in my life today).

From next week, the classes will be held from 2:30PM instead of 4PM as now, and they will be extended up to 5:30PM hopefully.

Edit: A virtual contest has been arranged. Visit this link.

Thursday, May 3, 2012

CSEDU Training 2012 Week 1

Today, we discussed several data structure dependent problems from this page. Most of them are classical type, i.e. common problems and we found the solution ideas quite interesting.

Here is the summery:

#0: Lazy propagation and slicing (compressing search space) ideas in segment tree.

#1: Finding number of intersecting points among axis parallel line segments.

#2: Perimeter and area of union of rectangles.

#3: Finding points in number of rectangles and rectangles covering points.

#4: Sliding window ideas, finding maximal number of points covered by a rectangle placed arbitrarily.

-- and a few others.

After the discussion, we solved this problem to test if we were able to learn anything at all.

Hopefully, the next class, we will have a discussion on the problems on APIO:2007

Problems to be solved before the next class:

SUBSEQ
GSS1
GSS3
GSS5
INCSEQ
INCDSEQ
LIS2
LPERMUT
FREQUENT

All are from http://www.spoj.pl/

Edit: A virtual contest has been arranged. Visit this link.