Friday 22 June 2012

Activity Selection Problem

The activity selection problem falls under the class of greedy algorithms, which are known for building a solution by repeatedly making the choice that seems best at the moment.

In a greedy algorithm we choose the best-looking candidate for the moment from a set of probable solutions and then run a feasibility check on it, i.e., we verify that the selected candidate satisfies all the constraints a valid solution is supposed to satisfy. If the candidate passes this check, we remove it from the set of probable solutions and add it to the growing optimal solution.



General characteristics of Greedy Algorithm:

     1. FEASIBLE:
      Each choice must satisfy all the problem constraints, so that we obtain
      at least one valid solution to our problem.
      2. OPTIMAL CHOICE:
      At every step, the choice made is the optimum among the currently
      available feasible choices.
      3. UNALTERABLE:
      Once a choice is made at any step, it is never altered at a subsequent step.

One such problem is the Activity Selection Problem in which we try to schedule one shared resource over a set of competing activities. 

Aim: To select a maximum-size set of mutually compatible activities.

"Mutually compatible" means that the activities do not overlap each other in time and are hence compatible with each other.

The activities we are talking about here have two main attributes:
1. Start time (si)
2. Finish time (fi)

where si < fi.
The activities are mutually compatible when their start and finish times do not overlap. That means none of the activities in our final solution set overlap each other in terms of these attributes.
Suppose that activities i and j are compatible; then they must satisfy at least one of the following:

  • si>=fj  or
  • sj>=fi
So we can rightly say that the activity selection problem is all about selecting a maximum-size set of mutually compatible activities.


In the greedy implementation of this problem we first sort the activities in increasing order of finish time and then construct the solution set:

f1<=f2<=f3<=....<=fn

Sorting the activities and arranging them in an array takes O(n log n) time.
After sorting, the greedy pass of the activity selection problem takes O(n) time in the worst case to schedule n activities.
This implementation always selects the activity with the earliest finish time that can be legally scheduled. Thus we can say that it makes a greedy choice, leaving as much opportunity as possible for the rest of the activities to be scheduled, thereby maximizing the unscheduled time remaining.

Example:
Scheduling 7 examination activities to be performed in a single hall.
#include<iostream>
using namespace std;
int main()
{
 int act[3][7],i,j,temp,start,a[7],k=1,min;

 for(i=0;i<7;i++)
  a[i]=0;
 for(i=0;i<7;i++)
  act[0][i]=i+1;                      // activity numbers 1..7
 for(j=0;j<7;j++)
  {cout<<"\nEnter the starting time of "<<j+1<<" activity:";
  cin>>act[1][j];
  cout<<"Enter the finishing time of "<<j+1<<" activity:";
  cin>>act[2][j];
  }
 // selection sort on finishing times; the activity number and start time
 // must move together with the finish time so the printed labels stay correct
 for(i=0;i<7;i++)
  {min=act[2][i];
  start=act[1][i];
  temp=i;
  for(j=i+1;j<7;j++)
   {if(act[2][j]<min)
    {min=act[2][j];
    start=act[1][j];
    temp=j;
    }
   }
  int num=act[0][temp];
  act[0][temp]=act[0][i];
  act[0][i]=num;
  act[2][temp]=act[2][i];
  act[1][temp]=act[1][i];
  act[2][i]=min;
  act[1][i]=start;
  }
 // greedy pass: always pick the next activity whose start time is at least
 // the finish time of the last selected activity
 a[0]=act[0][0];
 i=0;
 for(int m=1;m<7;m++)
  {if(act[1][m]>=act[2][i])
   {a[k]=act[0][m];
   k++;
   i=m;
   }
  }
 cout<<"\nThe activities that can be performed are:";
 for(i=0;a[i]!=0;i++)
  cout<<a[i]<<" ";
 return 0;
}





Thursday 21 June 2012

Red Black Tree

This article is dedicated to red-black trees, which we will discuss under the following headings:

  • Properties
  • Rotations
  • Operations on Red-black Tree
  • Applications
  • Analysis 
  • Example
The prerequisite for understanding red-black trees is knowledge of binary search trees (BSTs).

A red-black tree is a "self-balancing" binary search tree. It stores data in its nodes the same way a BST does, but adds one extra bit of storage per node, which is used to record the color of the node.

We call this tree "self-balancing" because by constraining the node color property we keep the tree balanced.

Just like a binary search tree node, each node has left, right, parent and key fields, but in addition it has one more field: the color field.

Why do we need red-black trees when we already have binary search trees?

The answer to this question is: the additional color field of red-black trees provides us with:
  • Better efficiency than a binary search tree: a BST can degenerate to O(n) per operation, whereas a red-black tree guarantees O(log n).
  • In-place insertion: while inserting we just have to see that the resulting tree does not violate any of the tree's properties.
  • The self-balancing nature of the red-black tree.
  • A tree of minimized height, which is exactly what we need to perform the search operation efficiently.
Properties:

  1. Every node has one of only 2 colors, either red or black.
  2. Every leaf node is black.
  3. A red node can have only black children.
  4. Black height: every path from a node down to a leaf must contain the same number of black nodes. This count is called the black height of the node.
  5. The root of the tree is always black.
  6. A newly inserted node is always introduced as a red node; afterwards we rebalance the tree.
Rotations:

Rotations are mainly done to preserve properties 3 and 4 of the red-black tree. There are two types of rotation operations that can be performed on red-black trees:
  1. Left rotation.
  2. Right rotation.
Rotations preserve the inorder key ordering; we just change the pointers of the nodes involved.

Both the rotation operations run in O(1) time.
Operations on Red-black Tree:

In this part of this article we will discuss the 2 major operations on the Red Black trees:
  1. Insertion
  2. Deletion
  • Insertion:
Steps:
  1. Check that all the leaf nodes are black.
  2. Insert the new node as a red node.
  3. If property 3 (a red node can have only black children) is violated, we repaint nodes and perform rotations accordingly.
  4. Check that the black-height property is maintained.
Cases:

  1. If both the parent and the uncle of the current node x are red but the grandparent is black, then the parent and uncle are repainted black and the grandparent becomes red.
  2. If the current node x and its parent are red but the uncle is black, and x is the right child of its parent, we perform a left rotation.
  3. If the current node x and its parent are red but the uncle is black, and x is the left child of its parent, we perform a right rotation.

Deletion:

Cases:
  1. If the node to be deleted is a leaf node then just delete it.
  2. If the node to be deleted has just one child, replace it by that child.
  3. If the node to be deleted has 2 children , replace the value of the node with its inorder predecessor, then recursively delete the inorder predecessor. 
Applications:

  1. The Completely Fair Scheduler used in current Linux kernels uses red-black trees.
  2. Functional programming (e.g., as the backbone of persistent ordered maps and sets).
Analysis:
  1. Insertion: because nodes are inserted in place, we just have to check for violations of the tree's properties and then fix them with a single loop that walks up the tree of height O(log n); hence the insertion time is just O(log n).
  2. Deletion also has complexity O(log n).




Tuesday 19 June 2012

String and Pattern Matching Algorithms

Hey everyone, in this article we will discuss 4 main techniques of string and pattern matching. They are as follows:
  1. Naive algorithm
  2. Rabin Karp Algorithm 
  3. Finite State automata
  4. Knuth-Morris-Pratt algorithm
Along with that we will also learn about the various applications, advantages and disadvantages of these algorithms.


First of all, let's understand what string matching is.


In string matching we deal with three main things:
  1. A text in the form of an array of length n (say); we name it T.
  2. A pattern of length m, where m is smaller than n (it must be, since the pattern is contained in the text we search); we name it P.
  3. A shift s in the text, which we advance until we find our pattern.
All we have to do is search for the pattern contained in the text, shifting along the text by a particular length until we find it.


so let's get started with it :)




  1. Naive algorithm:
Applications:

Used for text editing and locating files.

By text editing we mean searching for a particular string in a text. Repeated calls to the matching procedure will do this job, but this can become a problem when we are dealing with huge data.

Brute-force (Naive) algorithm:

Shift the pattern along the text T by just one step whenever it does not match.

n <- length of text
m <- length of pattern
for s = 0 to n - m
    if T[s+1, s+2, ..., s+m] = P[1..m]
        then print "the pattern occurs at shift s"
end loop

Analysis:

The loop executes in O((n-m+1)m) time. This is O(n^2) in the worst case, for example when m = n/2.

A good example of this would be a situation when we have the following:

T: abcabaabcabac
P:abaa

Drawbacks:
  • Information gained about the text for one value of s is totally ignored for the next value of s, so it is very inefficient.
  • It cannot be used for huge data, as in the case of databases etc.
Advantage :

Its comparison step is reused by the next algorithm we are going to discuss, the Rabin-Karp algorithm.

2. Rabin Karp Algorithm:

This algorithm generalizes to other algorithms for related problems, such as 2-D pattern matching.
It was an improvement over the naive algorithm.

The basic idea: before spending a lot of time comparing characters for a match, we do some pre-processing of the text and the pattern in order to eliminate some locations in the text. Then we can quickly run our naive comparison on what is left.

We use a hash function to compare the value of the pattern and of the text window at each value of the shift s.

We fall back on the naive character-by-character comparison only after the hash value of a window matches the pattern's.
The reason for applying the naive check after a hash match is that every valid shift must be verified explicitly: in some cases two different strings may give the same hash value, depending on the hash function we use.


Analysis :
  • The running time of this algorithm is Θ((n-m+1)m) in the worst case, but its average running time is much better.
  • Its aim is to speed up the search process by using a hash function.
  • The algorithm treats two strings as candidates for equality whenever their hash values are equal.
Drawbacks:
  • Recalculating the hash value from scratch for every value of the shift s is very tedious, especially when we are dealing with huge data.
  • In some cases two different strings may give the same hash value, although this depends upon the hash function we use.
The remedy for the first drawback is to use a mathematical rule called Horner's rule.

By this rule we compute the hash value p of the pattern in O(m) time.


Advantage:

  • By the use of Horner's rule we derive the hash of the next window from the hash of the previous one. That means we no longer ignore the information gained at one value of the shift s when moving to the next.

3.Finite state automata

In this method of string matching we build an automaton that scans the text and reports all the occurrences of the pattern.


An automaton M is a 5-tuple (Q, Σ, δ, q0, F), where:
  • Q is a finite set of states.
  • Σ is a finite set of symbols, called the alphabet of the automaton.
  • δ is the transition function, that is, δ: Q × Σ → Q.
  • q0 is the start state, that is, the state of the automaton before any input has been processed, where q0∈ Q.
  • F is a set of states of Q (i.e. F⊆Q) called accept states.

The automaton begins in the start state, reads the characters of its input string one at a time, and makes transitions from one state to another.
It accepts the string if the state reached after reading the input is a member of F; otherwise the string is said to be rejected.

Advantage:
It is very efficient, as it examines each text character exactly once and takes constant time per text character.

Efficiency:
The matching time, once the automaton is built, is just Θ(n).

4. Knuth-Morris-Pratt Algorithm :


With this algorithm we do string matching in linear time.


Analysis :


The running time of this algorithm is O(n+m). It avoids the computation of the transition function used by the finite-automaton method and instead does the pattern matching using an auxiliary prefix function π[1..m], which is precomputed
from the pattern in O(m) time.


It uses 2 functions:


  • Prefix function / failure function: it stores knowledge about how the pattern matches against shifts of itself, i.e., which portion of the pattern has matched so far and where to resume after a mismatch.
  • Transition function: can be calculated from the auxiliary prefix function.

Saturday 16 June 2012

Branch and Bound Algorithms



The idea of backtracking is to cut off a branch of the problem's state-space tree as soon as we can deduce that it cannot lead to a solution. This idea can be strengthened further if we deal with an optimization problem, one that seeks to minimize or maximize an objective function, usually subject to some constraints. Note that in the standard terminology of optimization problems, a feasible solution is a point in the problem's search space that satisfies all the problem's constraints (e.g., a Hamiltonian circuit in the traveling salesman problem, a subset of items whose total weight does not exceed the knapsack's capacity), while an optimal solution is a feasible solution with the best value of the objective function (e.g., the shortest Hamiltonian circuit, the most valuable subset of items that fit the knapsack).


Compared to backtracking, branch-and-bound requires two additional items: a way to provide, for every node of a state-space tree, a bound on the best value of the objective function on any solution that can be obtained by adding further components to the partial solution represented by the node, and the value of the best solution seen so far.


GENERAL METHOD


If this information is available, we can compare a node's bound value with the value of the best solution seen so far:
if the bound value is not better than the best solution seen so far (i.e., not smaller for a minimization problem and not larger for a maximization problem), the node is nonpromising and can be terminated (some people say the branch is pruned), because no solution obtained from it can yield a better solution than the one already available.


E.g. termination of a search path:
In general, we terminate a search path at the current node in a state-space tree of a branch-and-bound algorithm for any one of the following three reasons:
1. The value of the node's bound is not better than the value of the best solution seen so far.
2. The node represents no feasible solutions because the constraints of the problem are already violated.
3. The subset of feasible solutions represented by the node consists of a single point (and hence no further choices can be made); in this case we compare the value of the objective function for this feasible solution with that of the best solution seen so far and update the latter with the former if the new solution is better.



Backtracking algorithms



Backtracking is a general algorithm for finding all solutions to some computational problem, that incrementally builds candidates to the solutions, and abandons each partial candidate c ("backtracks") as soon as it determines that c cannot possibly be completed to a valid solution.
Backtracking can be applied only for problems which admit the concept of a "partial candidate solution" and a relatively quick test of whether it can possibly be completed to a valid solution. It is useless, for example, for locating a given value in an unordered table. When it is applicable, however, backtracking is often much faster than brute force enumeration of all complete candidates, since it can eliminate a large number of candidates with a single test.
  • It is one of the most general algorithm design techniques.
  • Many problems which deal with searching for a set of solutions or for an optimal solution satisfying some constraints can be solved using the backtracking formulation. 
  • To apply the backtracking method, the desired solution must be expressible as an n-tuple (x1…xn) where xi is chosen from some finite set Si. 
  • The problem is to find a vector, which maximizes or minimizes a criterion function P(x1….xn). 
  • The major advantage of this method is that once we know a partial vector (x1,…,xi) will not lead to an optimal solution, the m(i+1) × … × m(n) possible remaining test vectors may be ignored entirely. 
  • Many problems solved using backtracking require that all the solutions satisfy a complex set of constraints. 
  • These constraints are classified as: 
  • Explicit constraints.                              
  • Implicit constraints.



1)      Explicit constraints:
Explicit constraints are rules that restrict each xi to take values only from a given set.

    • All tuples that satisfy the explicit constraints define a possible solution space I. 
2)      Implicit constraints:
The implicit constraints determine which of the tuples in the solution space I actually satisfy the criterion function.

Algorithm:

Algorithm IBacktracking (n)
// This schema describes the backtracking procedure. All solutions are generated in X[1:n]
// and printed as soon as they are determined.
{
    k = 1;
    while (k > 0) do
    {
        if (there remains an untried
            X[k] in T(X[1], X[2], ..., X[k-1]) and Bk(X[1], ..., X[k]) is true) then
        {
            if ((X[1], ..., X[k]) is the path to an answer node)
                then write (X[1:k]);
            k = k+1;                  // consider the next step.
        }
        else k = k-1;                 // consider backtracking to the previous set.
    }
}

  • All solutions are generated in X[1:n] and printed as soon as they are determined. 
  • T(X[1], ..., X[k-1]) gives all possible values of X[k], given that X[1], ..., X[k-1] have already been chosen. 
  • Bk(X[1], ..., X[k]) is a bounding function which determines the elements of X[k] that satisfy the implicit constraints. 
  • Certain problems which are solved using backtracking method are,
                                       
                     1. Sum of subsets.
                     2. Graph coloring.
                     3. Hamiltonian cycle.
                     4. N-Queens problem. 
  
STATE-SPACE TREE

It is convenient to implement this kind of processing by constructing a tree of choices being
made, called the state-space tree. Its root represents an initial state before the search for a
solution begins. The nodes of the first level in the tree represent the choices made for the
first component of a solution; the nodes of the second level represent the choices for the
second component, and so on.

A node in a state-space tree is said to be promising if it corresponds to a partially
constructed solution that may still lead to a complete solution; otherwise, it is called
nonpromising.

Leaves represent either nonpromising dead ends or complete solutions found by the
algorithm.

GENERATION OF STATE SPACE TREE:

  • Maintain an array X to represent all elements in the set.

  • The value of X(i) indicates whether the weight W(i) is included or not.

  • The sum is initialized to 0, i.e., S = 0.

  • We have to check starting from the first node.

  • Assign X(k) <- 1.

  • If S + W(k) = M then we print the subset, because the sum is the required output.

  • If the above condition is not satisfied then we have to check S + W(k) + W(k+1) <= M. If so, we generate the left subtree. It means W(k) can be included, so the sum is incremented and we check for the next k.

  • After generating the left subtree we generate the right subtree; for this we have to check S + W(k+1) <= M, because W(k) is omitted and W(k+1) has to be selected.

  • Repeat the process and find all the possible combinations of the subset.

 Algorithm:

Algorithm SumOfSub (s, k, r)
{
// generate the left child; note s + W[k] <= M since B(k-1) is true.
X[k] = 1;
if (s + W[k] = M) then write (X[1:k]); // no recursive call here, as W[j] > 0 for 1 <= j <= n.
else if (s + W[k] + W[k+1] <= M) then SumOfSub (s + W[k], k+1, r - W[k]);
// generate the right child and evaluate B(k).
if ((s + r - W[k] >= M) and (s + W[k+1] <= M)) then
{
    X[k] = 0;
    SumOfSub (s, k+1, r - W[k]);
}
}



SUBSET-SUM PROBLEM

The subset-sum problem is to find a subset of a given set S = {s1, s2, …, sn} of n positive
integers whose sum is equal to a given positive integer d.
For example, for S = {1, 2, 5, 6, 8} and d = 9, there are two solutions: {1, 2, 6} and {1, 8}.
Of course, some instances of this problem may have no solutions.
It is convenient to sort the set's elements in increasing order, so we will assume that
s1 ≤ s2 ≤ … ≤ sn.
  • The state-space tree can be constructed as a binary tree, as that in the following figure, for the instance S = {3, 5, 6, 7} and d = 15.
  • The root of the tree represents the starting point, with no decisions about the given elements made as yet.
  • Its left and right children represent, respectively, inclusion and exclusion of s1 in a set being sought.
  • Similarly, going to the left from a node of the first level corresponds to inclusion of s2, while going to the right corresponds to its exclusion, and so on.
  • Thus, a path from the root to a node on the ith level of the tree indicates which of the first i numbers have been included in the subsets represented by that node.
  • We record the value of s', the sum of these numbers, in the node. If s' is equal to d, we have a solution to the problem.
  • We can either report this result and stop or, if all the solutions need to be found, continue by backtracking to the node's parent.
  • If s' is not equal to d, we can terminate the node as nonpromising if either of the two inequalities holds:
    s' + s(i+1) > d (even the smallest remaining element overshoots d), or
    s' + s(i+1) + … + s(n) < d (even adding all remaining elements falls short of d).

HAMILTONIAN CIRCUIT PROBLEM

Procedure:

1. Define a solution vector X(x1, …, xn) where xi represents the ith visited vertex of the proposed cycle.

2. Create a cost adjacency matrix for the given graph.

3. The solution array is initialized to all zeros except X(1) = 1, because the cycle should start at vertex 1.

4. Now we have to find the second vertex to be visited in the cycle.
5. The vertices from 1 to n are included in the cycle one by one, checking 2 conditions:
            1. There should be a path from the previously visited vertex to the current vertex.
            2. The current vertex must be distinct and should not have been visited earlier.

6. When these two conditions are satisfied the current vertex is included in the cycle, else the next vertex is tried.

7. When the nth vertex is visited we have to check whether there is any path from the nth vertex to the first vertex. If there is no path, we go back one step and alter the position of the previously visited vertex.

8. Repeat the above steps to generate all possible Hamiltonian cycles.

Algorithm:(Finding all Hamiltonian cycle)

Algorithm Hamiltonian (k)
{
    repeat
    {
        NextValue (k);
        if (X[k] = 0) then return;
        if (k = n) then
            print (X);
        else
            Hamiltonian (k+1);
    } until (false);
}

Algorithm NextValue (k)
{
    repeat
    {
        X[k] = (X[k] + 1) mod (n+1);   // next vertex
        if (X[k] = 0) then return;
        if (G[X[k-1], X[k]] ≠ 0) then
        {
            for j = 1 to k-1 do if (X[j] = X[k]) then break;
            // check for distinctness.
            if (j = k) then            // if true then the vertex is distinct.
                if ((k < n) or ((k = n) and G[X[n], X[1]] ≠ 0)) then return;
        }
    } until (false);
}






8-QUEENS PROBLEM:

The 8-queens problem is to place n queens on an N*N board in such a way that no two queens attack each other, i.e., no two queens are in the same row, column, or diagonal.

Solution:

  • The solution vector X(x1, …, xn) represents a solution in which xi is the column of the ith row where the ith queen is placed.
  • First, we have to check that no two queens are in the same row.
  • Second, we have to check that no two queens are in the same column.
  • The function used to check these two conditions is (i, X(j)), which gives the position of the ith queen, where i represents the row and X(j) represents the column position.
  • Third, we have to check that no two queens are on the same diagonal.
  • Consider a two-dimensional array A[1:n, 1:n], in which we observe that every element on the same diagonal that runs from upper left to lower right has the same value.
  • Also, every element on the same diagonal that runs from lower right to upper left has the same value.
  • Suppose two queens are at positions (i, j) and (k, l); then the two queens lie on the same diagonal if and only if |j - l| = |i - k|.

STEPS TO GENERATE THE SOLUTION:

  • Initialize the X array to zero and start by placing the first queen, k = 1, in the first row.
  • To find the column position, try values from 1 to n, where n is the number of columns (equal to the number of queens).
  • If k = 1 then X(k) = 1, so (k, X(k)) gives the position of the kth queen. Here we have to check whether there is any queen in the same column or diagonal.
  • For this, consider the previous positions that have already been found out. Check whether
    X(i) = X(k) for the same column, or |X(i) - X(k)| = |i - k| for the same diagonal.

  • If either of the conditions is true, return false, indicating that the kth queen cannot be placed at position X(k).
  • If the position is not possible, increment X(k) by one and proceed until a valid position is found.
  • If the position X(k) <= n and k = n, then the solution is generated completely.
  • If k < n, then increment k and find the position of the next queen.
  • If the position X(k) > n, then the kth queen cannot be placed, as the size of the board is N*N.
  • So decrement k by one, i.e., we have to backtrack and alter the position of the previous queen.

Algorithm:
Algorithm Place (k, i)
// Returns true if a queen can be placed in the kth row and ith column;
// otherwise it returns false. X[] is a global array whose first k-1 values
// have been set. Abs(r) returns the absolute value of r.
{
    for j = 1 to k-1 do
        if ((X[j] = i)                         // two in the same column
            or (Abs(X[j] - i) = Abs(j - k)))   // or on the same diagonal
        then return false;
    return true;
}

Algorithm NQueens (k, n)
// Using backtracking, this prints all possible placements of n queens on an
// n*n chessboard so that they are non-attacking.
{
    for i = 1 to n do
    {
        if Place (k, i) then
        {
            X[k] = i;
            if (k = n) then write (X[1:n]);
            else NQueens (k+1, n);
        }
    }
}