Data Structure and Algorithm Analysis (H) Final Review Note


1 Sorting Algorithms


1.1 Sorting Based on Comparison

1.1.1 Insertion Sort

Insertion sort maintains a sorted subarray of elements. At each iteration, it inserts a new element into the subarray at the correct position.

Runtime: Ω(n), O(n²)

Correctness: Proof by loop invariant: at the start of each iteration of the for loop, the subarray A[1..j-1] consists of the elements originally in A[1..j-1], but in sorted order.

Inplace: Yes

Stable: Yes

Insertion-Sort(A)
1. for j = 2 to A.length
2.     key = A[j]
3.     i = j - 1
4.     while i > 0 and A[i] > key
5.         A[i+1] = A[i]
6.         i = i - 1
7.     A[i+1] = key
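The pseudocode above translates directly into runnable code; here is a minimal Python sketch (0-indexed, so the outer loop starts at index 1; the function name is my own):

```python
def insertion_sort(a):
    """Sort list a in place; mirrors the pseudocode above (0-indexed)."""
    for j in range(1, len(a)):          # pseudocode's j = 2 .. A.length
        key = a[j]
        i = j - 1
        # shift elements larger than key one slot to the right
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```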

1.1.2 Selection Sort

Selection sort maintains a sorted subarray. At each iteration, it selects the smallest element in the unsorted subarray and swaps it with the first element of the unsorted subarray.

Runtime: Θ(n²)

Correctness: Proof by loop invariant: at the start of each iteration of the for loop, the subarray A[1..i-1] consists of the i-1 smallest elements, in sorted order.

Inplace: Yes

Stable: No (example: [2a,2b,1])

Selection-Sort(A)
1. for i = 1 to A.length - 1
2.     min = i
3.     for j = i + 1 to A.length
4.         if A[j] < A[min]
5.             min = j
6.     exchange A[i] with A[min]
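A runnable Python sketch, which also demonstrates the instability example [2a, 2b, 1] from above (names and the `key` parameter are my own additions):

```python
def selection_sort(a, key=lambda x: x):
    """In-place selection sort; mirrors the pseudocode above (0-indexed)."""
    for i in range(len(a) - 1):
        m = i
        for j in range(i + 1, len(a)):
            if key(a[j]) < key(a[m]):
                m = j
        a[i], a[m] = a[m], a[i]
    return a

# Instability demo: sort [2a, 2b, 1] by the numeric part.
# The long-distance swap moves 2a past 2b, reversing their relative order.
items = [(2, 'a'), (2, 'b'), (1, '')]
selection_sort(items, key=lambda t: t[0])
```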

1.1.3 Merge Sort

Merge sort divides the array into two halves, sorts each half recursively, and then merges the two sorted halves.

Runtime: Θ(n log n)

Correctness:

  • Merge: By loop invariant: At the start of each iteration of the for loop, the subarray A[p..k-1] contains the k-p smallest elements of L[1..n1+1] and R[1..n2+1], in sorted order; moreover, L[i] and R[j] are the smallest elements of their arrays that have not been copied back into A.

  • Merge sort: By induction: The two smaller subarrays are sorted correctly with induction hypothesis, and the merge step merges the two subarrays correctly.

Inplace: No

Stable: Yes

Merge(A,p,q,r)
1.  n1 = q - p + 1
2.  n2 = r - q
3.  let L[1..n1+1] and R[1..n2+1] be new arrays
4.  for i = 1 to n1
5.      L[i] = A[p+i-1]
6.  for j = 1 to n2
7.      R[j] = A[q+j]
8.  L[n1+1] = ∞
9.  R[n2+1] = ∞
10. i = 1
11. j = 1
12. for k = p to r
13.     if L[i] ≤ R[j]
14.         A[k] = L[i]
15.         i = i + 1
16.     else
17.         A[k] = R[j]
18.         j = j + 1

Merge-Sort(A,p,r)
1. if p < r
2.     q = ⌊(p+r)/2⌋
3.     Merge-Sort(A,p,q)
4.     Merge-Sort(A,q+1,r)
5.     Merge(A,p,q,r)
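A minimal Python sketch of the two procedures above, keeping the ∞ sentinels from the pseudocode (0-indexed; function names are my own):

```python
import math

def merge(a, p, q, r):
    # copy the two halves, with infinity sentinels as in the pseudocode
    left = a[p:q + 1] + [math.inf]
    right = a[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:        # <= keeps the sort stable
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1

def merge_sort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)
    return a
```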

1.1.4 Heap Sort

A heap is a binary tree stored in an array with the following properties (assume the array starts from index 1):

  1. Parent(i) = ⌊i/2⌋.

  2. Left(i) = 2i.

  3. Right(i) = 2i+1.

  4. Max-heap property: A[Parent(i)] ≥ A[i].

  5. Min-heap property: A[Parent(i)] ≤ A[i].

Max-heapify:

Assume the binary trees rooted at Left(i) and Right(i) are max-heaps, but max-heap property may be violated at node i. We let the value at node i float down, so that the subtree rooted at node i becomes a max-heap.

Runtime: O(log n) (the height of the tree)

Correctness: By induction on the height of node i. After we swap the value at node i with the value at node largest, the max-heap property holds at node i. Now the subtree rooted at largest (one of Left(i) or Right(i)) may violate the max-heap property, but it has smaller height, so by the induction hypothesis the recursive call to Max-Heapify makes it a max-heap.

Max-Heapify(A,i)
1.  l = Left(i)
2.  r = Right(i)
3.  if l ≤ A.heap-size and A[l] > A[i]
4.      largest = l
5.  else largest = i
6.  if r ≤ A.heap-size and A[r] > A[largest]
7.      largest = r
8.  if largest ≠ i
9.      exchange A[i] with A[largest]
10.     Max-Heapify(A,largest)

Build-max-heap:

Build-max-heap builds a max-heap from an unordered array. We must proceed bottom-up, since Max-Heapify requires that the subtrees rooted at the children of i already be max-heaps.

Runtime: O(n). The maximum number of nodes of height h is ⌈n/2^(h+1)⌉. Hence, the total runtime is ∑_{h=0}^{⌊log n⌋} ⌈n/2^(h+1)⌉ · O(h) = O(n).

Correctness: By loop invariant: At the start of each iteration of the for loop, each of the nodes i+1, i+2, …, n is the root of a max-heap.

Build-Max-Heap(A)
1. A.heap-size = A.length
2. for i = ⌊A.length/2⌋ downto 1
3.     Max-Heapify(A,i)

Heap-sort:

We first build a max-heap from the array. The maximum element is then at the root, so we exchange the first element with the last element, decrease the heap size by 1, and call Max-Heapify on the root to restore the max-heap property. Repeating this fills the back of the array with the elements in increasing order.

Runtime: Θ(n log n)

Correctness: By loop invariant: At the start of each iteration of the for loop, the subarray A[1..i] is a max-heap containing the i smallest elements, and the subarray A[i+1..n] contains the n-i largest elements, in sorted order.

Inplace: Yes

Stable: No (example: [1a,1b])

Heap-Sort(A)
1. Build-Max-Heap(A)
2. for i = A.length downto 2
3.     exchange A[1] with A[i]
4.     A.heap-size = A.heap-size - 1
5.     Max-Heapify(A,1)
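A runnable Python sketch of Max-Heapify, Build-Max-Heap, and Heap-Sort combined (0-indexed, so the children of i are 2i+1 and 2i+2; names are my own):

```python
def max_heapify(a, i, heap_size):
    """Let the value at index i float down until the subtree is a max-heap."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # Build-Max-Heap, bottom-up
        max_heapify(a, i, n)
    for end in range(n - 1, 0, -1):       # repeatedly move the max to the end
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)
    return a
```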

1.1.5 Priority Queue

A priority queue is usually implemented on top of a heap, and supports the following operations:

  1. Insert(Q,x): insert x into Q.

  2. Maximum(Q): return the element with the largest key.

  3. Extract-Max(Q): remove and return the element with the largest key.

  4. Increase-Key(Q,x,k): increase the key of x to k.

The only new operation is Increase-Key, which can be implemented by “bubbling up” the element. Its runtime is O(logn).

1.1.6 Quick Sort

Partition:

Partition rearranges a subarray into two parts around a pivot, such that all elements in the first part are at most the pivot, and all elements in the second part are greater than the pivot.

Runtime: Θ(n) for a subarray of n elements.

Correctness: By loop invariant: At the start of each iteration of the for loop, the subarray A[p..i] consists of elements at most the pivot, and the subarray A[i+1..j-1] consists of elements greater than the pivot.

Partition(A,p,r)
1. x = A[r]
2. i = p - 1
3. for j = p to r-1
4.     if A[j] ≤ x
5.         i = i + 1
6.         exchange A[i] with A[j]
7. exchange A[i+1] with A[r]
8. return i + 1

Randomized-Partition(A,p,r)
1. i = Random(p,r)
2. exchange A[r] with A[i]
3. return Partition(A,p,r)

However, there are some drawbacks of this implementation: this partition algorithm behaves poorly if the input array contains many equal elements.

Quick sort:

Quick sort partitions the array into two parts and sorts each part recursively. It is similar to merge sort, except that the work happens in the partition step, so no combine step is needed.

Runtime: Worst case: O(n²), Best case: O(n log n), Average case: O(n log n)

Correctness: Similar to merge sort. By induction, the two smaller subarrays are sorted correctly with induction hypothesis, and the partition step partitions the array correctly.

Inplace: Yes

Stable: No

Quick-Sort(A,p,r)
1. if p < r
2.     q = Partition(A,p,r)
3.     Quick-Sort(A,p,q-1)
4.     Quick-Sort(A,q+1,r)

The randomized version is omitted here.
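As a concrete sketch, here is the randomized version in Python (0-indexed; names are my own):

```python
import random

def partition(a, p, r):
    x = a[r]                        # pivot
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quick_sort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        k = random.randint(p, r)    # randomized pivot choice
        a[k], a[r] = a[r], a[k]
        q = partition(a, p, r)
        quick_sort(a, p, q - 1)
        quick_sort(a, q + 1, r)
    return a
```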

Randomization:

We can estimate the runtime of quick sort by the expected number of comparisons.

For a partition with a fixed ratio, we can prove that the expected number of comparisons is still O(n log n). Suppose the ratio is 1:9; we can draw the recursion tree of the partition. At each level, if all nodes on both sides exist, the total number of comparisons is at most n, and the number of levels is at most log_{10/9} n. In total, the number of comparisons is O(n log n).

The expected number of comparisons for randomized quick sort can be calculated as follows:

Rename the elements as Z1, Z2, …, Zn, where Z1 < Z2 < … < Zn. Then the probability that Zi is compared with Zj is 2/(j-i+1) (suppose i < j), since Zi and Zj are compared exactly when the first pivot chosen from {Zi, …, Zj} is either Zi or Zj.

Hence, the expected number of comparisons is:

∑_{i=1}^{n-1} ∑_{j=i+1}^{n} 2/(j-i+1) = ∑_{i=1}^{n-1} ∑_{k=1}^{n-i} 2/(k+1) ≤ 2n ∑_{k=2}^{n} 1/k ≈ 2n ln n

1.1.7 Lower Bound for Sorting Based on Comparison

We can use a decision tree to prove the lower bound for sorting.

Let the input size be n, and suppose the input is a permutation of 1, 2, …, n; there are n! possible orders of the input. Since each input order must correspond to a distinct leaf in the decision tree, the number of leaves is at least n!. Hence, the height of the decision tree is Ω(log(n!)).

The lower bound on the worst-case runtime is the length of the longest path in the decision tree, which is the height of the decision tree. By Stirling's approximation, log(n!) = Ω(n log n).

1.2 Sorting Based on Counting

1.2.1 Counting Sort

Counting Sort assumes that each of the n input elements is an integer in the range 0 to k, for some integer k. It works by counting the number of elements of each value.

Runtime: Θ(n+k)

Correctness: By loop invariant: At the start of each iteration of the for loop, the last element in A with value i that has not yet been copied to B belongs to B[C[i]].

Inplace: No

Stable: Yes

Counting-Sort(A,B,k)
1.  let C[0..k] be a new array
2.  for i = 0 to k
3.      C[i] = 0
4.  for j = 1 to A.length
5.      C[A[j]] = C[A[j]] + 1
6.  for i = 1 to k
7.      C[i] = C[i] + C[i-1]
8.  for j = A.length downto 1
9.      B[C[A[j]]] = A[j]
10.     C[A[j]] = C[A[j]] - 1
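A Python sketch of the pseudocode above; the backwards scan in the last loop is what makes the sort stable (0-indexed; names are my own):

```python
def counting_sort(a, k):
    """Stable sort of integers in 0..k; returns a new array."""
    c = [0] * (k + 1)
    for x in a:                    # count occurrences of each value
        c[x] += 1
    for i in range(1, k + 1):      # prefix sums: c[i] = # of elements <= i
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):          # backwards scan preserves stability
        c[x] -= 1
        b[c[x]] = x
    return b
```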

1.2.2 Radix Sort

Radix sort sorts the elements digit by digit.

Runtime: Θ(d(n+k)), where d is the number of digits.

Correctness: At the start of iteration i of the for loop, the elements are sorted on their last i-1 digits.

Inplace: No

Stable: Yes

Radix-Sort(A,d)
1. for i = 1 to d
2.     use a stable sort to sort array A on digit i

Attention: radix sort begins from the least significant digit.

When d is a constant and k=O(n), radix sort runs in Θ(n). In more general cases, we can decide how to choose d and k to make the radix sort faster.

Without loss of generality, we assume the numbers are in base 2. Then for a b-bit number, we can divide it into b/d groups, each of d bits, and use counting sort on each group (digit values range from 0 to 2^d - 1). Hence, the runtime is Θ((b/d)(n + 2^d)).

When b < log n, we choose d = b. Then the runtime is Θ((b/b)(n + 2^b)) = Θ(n).

When b ≥ log n, we choose d = log n. Then the runtime is Θ((b/log n)(n + 2^(log n))) = Θ(bn/log n).
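A minimal Python sketch of LSD radix sort; for readability it buckets by decimal digit rather than d-bit groups, which is the same idea with base 10 (names are my own):

```python
def radix_sort(a, d, base=10):
    """LSD radix sort: a stable pass per digit, least significant first."""
    for exp in range(d):           # digit i of x is (x // base**exp) % base
        buckets = [[] for _ in range(base)]
        for x in a:                # appending in input order keeps each pass stable
            buckets[(x // base ** exp) % base].append(x)
        a = [x for b in buckets for x in b]
    return a
```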

2 Elementary Data Structures

2.1 Stack

In a stack, only the top element can be accessed. It follows the Last In First Out (LIFO) principle.

Its operations are:

  1. Push(S,x): insert x into S.

  2. Pop(S): remove and return the top element of S.

Attention: S.top points at the top element of the stack, not an empty element. We initialize S.top=0.

2.2 Queue

In a queue, elements are inserted at the tail, and removed from the head. A queue follows the First In First Out (FIFO) principle.

Its operations are:

  1. Enqueue(Q,x): insert x into Q.

  2. Dequeue(Q): remove and return the head element of Q.

Attention: Q.head points at the head element of the queue, but Q.tail points at the empty slot after the tail element. We initialize Q.head = Q.tail = 1. It should be noted that a queue implemented this way with an array of size n can only store n-1 elements: if it stored n elements, then Q.head = Q.tail, and we could not distinguish an empty queue from a full one.
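The n-1 capacity limit is easy to see in code. A minimal Python sketch of a circular array-backed queue (0-indexed; names are my own):

```python
class Queue:
    """Array-backed circular queue; an array of size n stores at most n-1 elements."""
    def __init__(self, n):
        self.a = [None] * n
        self.head = self.tail = 0

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):             # one slot is sacrificed to distinguish full from empty
        return (self.tail + 1) % len(self.a) == self.head

    def enqueue(self, x):
        assert not self.is_full()
        self.a[self.tail] = x
        self.tail = (self.tail + 1) % len(self.a)

    def dequeue(self):
        assert not self.is_empty()
        x = self.a[self.head]
        self.head = (self.head + 1) % len(self.a)
        return x
```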

2.3 Linked List

In a doubly linked list, every element contains two pointers: one points to the previous element, and the other points to the next element.

A linked list makes it easy to insert or delete an element. Here is the pseudocode for insertion and deletion.

List-Insert(L,x)
1. x.next = L.head
2. if L.head ≠ NIL
3.     L.head.prev = x
4. L.head = x
5. x.prev = NIL

List-Delete(L,x)
1. if x.prev ≠ NIL
2.     x.prev.next = x.next
3. else L.head = x.next
4. if x.next ≠ NIL
5.     x.next.prev = x.prev

3 Binary Search Trees

3.1 General Binary Search Tree

A binary search tree is a binary tree in which for each node x, the values of all the keys in the left subtree of x are smaller than the key of x, and the values of all the keys in the right subtree of x are larger than the key of x.

Here are some properties of a binary search tree:

  • The depth of a node x is the number of edges on the simple path from the root to x. (Top Down)

  • A level of a tree is all nodes at the same depth.

  • The height of a node x is the number of edges on the longest simple downward path from x to a leaf. (Bottom Up)

  • The height of a tree is the height of the root.

  • A binary search tree is full if every node has either zero or two children.

  • A binary search tree is complete if it is full and all leaves are at the same level.

3.1.1 Successor

The successor of a node x is the node with the smallest key greater than x.key. If a node has a right subtree, then its successor is the leftmost node in its right subtree. If it does not, then its successor is the lowest ancestor of x whose left child is also an ancestor of x; in other words, the first "right parent" of x. Notice how this algorithm produces NIL if x is the largest node.

Tree-Successor(x)
1. if x.right ≠ NIL
2.     return Tree-Minimum(x.right)
3. y = x.p
4. while y ≠ NIL and x == y.right
5.     x = y
6.     y = y.p
7. return y

The predecessor can be found similarly.

3.1.2 Insert

Insertion is simple: we do a binary search to find a position, and then append the new node there. Notice that we keep a record of the parent of the current node, so that we can set the new node's parent.

Tree-Insert(T,z)
1.  x = T.root
2.  y = NIL
3.  while x ≠ NIL
4.      y = x
5.      if z.key < x.key
6.          x = x.left
7.      else x = x.right
8.  z.p = y
9.  if y == NIL
10.     T.root = z
11. else if z.key < y.key
12.     y.left = z
13. else y.right = z
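A Python sketch of Tree-Insert, with an inorder walk to check the BST property (class and function names are my own):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def tree_insert(root, z):
    """Insert node z below root; returns the (possibly new) root."""
    x, y = root, None
    while x is not None:            # binary search for the insertion point
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        return z                    # tree was empty
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root

def inorder(x):
    """Inorder walk yields the keys in sorted order."""
    return inorder(x.left) + [x.key] + inorder(x.right) if x else []
```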

3.1.3 Delete

Deletion is more complicated. We need to consider multiple cases:

  1. If z is a leaf, we can simply remove it.

  2. If z has only one child, we replace z with its child.

  3. If z has two children, we replace z with its successor. Why does this work? Since the right subtree is not empty, the successor must be in the right subtree. The successor cannot have a left child, otherwise that left child would be the successor. So we can replace z with its successor, and then delete the successor (having no left child makes it easy to delete). Since the successor has the smallest key larger than z.key, it is larger than all keys in the left subtree of z and smaller than all keys in the right subtree of z, so the binary search tree property is maintained.

Tree-Delete(T,z)
1.  if z.left == NIL
2.      Transplant(T,z,z.right)
3.  else if z.right == NIL
4.      Transplant(T,z,z.left)
5.  else
6.      y = Tree-Minimum(z.right)
7.      if y.p ≠ z
8.          Transplant(T,y,y.right)
9.          y.right = z.right
10.         y.right.p = y
11.     Transplant(T,z,y)
12.     y.left = z.left
13.     y.left.p = y

3.2 Tree Walk

There are three ways to walk a tree:

  1. Preorder: root, left, right.

  2. Inorder: left, root, right.

  3. Postorder: left, right, root.

Since it takes Θ(1) time to process each node, the runtime is Θ(n).

Inorder-Tree-Walk(x)
1. if x ≠ NIL
2.     Inorder-Tree-Walk(x.left)
3.     print x.key
4.     Inorder-Tree-Walk(x.right)

Preorder-Tree-Walk(x)
1. if x ≠ NIL
2.     print x.key
3.     Preorder-Tree-Walk(x.left)
4.     Preorder-Tree-Walk(x.right)

Postorder-Tree-Walk(x)
1. if x ≠ NIL
2.     Postorder-Tree-Walk(x.left)
3.     Postorder-Tree-Walk(x.right)
4.     print x.key

3.3 AVL Trees

A binary search tree can be unbalanced and degenerate into a linked list, which makes it inefficient. An AVL tree is a self-balancing binary search tree. It requires that the heights of the two child subtrees of any node differ by at most 1. This can be implemented by adding a balance factor to each node.

An AVL tree with n nodes has height at most 1.44 log n, which is pretty close to the optimal height ⌈log n⌉.

3.3.1 Minimum Number of Nodes in AVL Tree

Consider the minimal number of nodes in an AVL tree of height h. We have:

N(h)=N(h-1)+N(h-2)+1

This can be interpreted as: to find the minimal tree of height h, we split the AVL tree into three parts: the root, the left subtree of height h-1, and the right subtree of height h-2. For the root, it satisfies the AVL property, since the height of the left subtree and the right subtree differ by 1. For the two subtrees, to make the number of nodes minimal, they must be the minimal AVL trees of height h-1 and h-2.

This number is related to the Fibonacci sequence, and we have N(h) = F(h+2) - 1 (assuming the Fibonacci sequence starts with F(0) = F(1) = 1).
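The recurrence and the Fibonacci relation are easy to check numerically; a short Python sketch (base cases N(0) = 1 and N(1) = 2, and F(0) = F(1) = 1 as in the note):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def n_min(h):
    """Minimum number of nodes in an AVL tree of height h."""
    if h == 0:
        return 1                   # a single node
    if h == 1:
        return 2                   # root plus one child
    return n_min(h - 1) + n_min(h - 2) + 1

def fib(n):
    """Fibonacci with F(0) = F(1) = 1, matching the note's convention."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```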

3.3.2 Insertion

Runtime: O(logn).

Firstly, we need to define the balance factor of a node x:

bal(x)=height(x.left)-height(x.right)

Without loss of generality, we assume the insertion inserts a node into the right subtree. We use v to denote a node on the path from the inserted node to the root, and x to denote the right child of v. Here are the cases:

Case 1: bal(v)=1. After insertion, bal(v)=0. The height of v does not change, so we can stop.

Case 2: bal(v)=0. After insertion, bal(v)=-1. The height of v increases by 1, so we need to check the parent of v, so we run the algorithm recursively.

Case 3: bal(v)=-1. After insertion, bal(v)=-2. We need to do a rotation to make v balanced.

Subcase 3.1: The inserted node is in the right subtree of x. We do a left rotation on v. After the rotation, v becomes the left child of x. We then update the balance factor of v and x: bal(v)=0, bal(x)=0. Since the height of v does not change, we can stop.

Subcase 3.2: The inserted node is in the left subtree of x. Let w denote the left child of x. We can find v, x, and w form a zig-zag pattern. We do a right rotation on x, and then a left rotation on v. After the rotation, v becomes the left child of w, and x becomes the right child of w. We then update the balance factor of v, x, and w:

  • If bal(w) was 0, then bal(v)=0, bal(x)=0, bal(w)=0.

  • If bal(w) was 1, then bal(v)=0, bal(x)=-1, bal(w)=0.

  • If bal(w) was -1, then bal(v)=1, bal(x)=0, bal(w)=0.

3.3.3 Deletion

Runtime: O(logn).

Deletion is similar to insertion.

Without loss of generality, we assume the deletion deletes a node from the left subtree. We also need to consider three cases:

Case 1: bal(v)=1. After deletion, bal(v)=0. The height of v decreases by 1, so we need to check the parent of v, so we run the algorithm recursively.

Case 2: bal(v)=0. After deletion, bal(v)=1. The height of v does not change, so we can stop.

Case 3: bal(v)=-1. After deletion, bal(v)=-2. We need to do a rotation to make v balanced.

Subcase 3.1: bal(x) ∈ {0,-1}. The deleted node is in the left subtree of v. We do a left rotation on v. After the rotation, v becomes the left child of x. Whether we can stop depends on the balance factor of x: if bal(x) was 0, the height of the subtree is unchanged and we can stop; if bal(x) was -1, the height decreases by 1 and we continue recursively.

Subcase 3.2: bal(x)=1. The deleted node is in the left subtree of v. Let w denote the left child of x. We do a right rotation on x, and then a left rotation on v. After the rotation, v becomes the left child of w, and x becomes the right child of w. The height of v is decreased by 1, so we run the algorithm recursively.

It should be noted that if the delete operation involves finding the successor, then we need to update the balance factors starting from the successor's original parent.

4 Dynamic Programming

When we want to solve a problem recursively, we may find that some subproblems are solved repeatedly. For example in the Fibonacci sequence, many numbers are calculated repeatedly. So instead of solving the subproblems repeatedly, we can store the solutions in a table, and then use the table to solve the problem.

To apply dynamic programming, the problem should have optimal substructure and overlapping subproblems.

  • Optimal substructure: an optimal solution contains within it optimal solutions to its subproblems.

  • Overlapping subproblems: the problem can be broken down into subproblems which are reused several times.

4.1 Rod Cutting

Given a rod of length n and a table of prices pi for the rod of length i, we want to find the maximum total revenue rn for the rod. Let ri denote the maximum total revenue for the rod of length i. Then we have the Bellman equation:

r_n = max_{1 ≤ i ≤ n} (p_i + r_{n-i})

It should be noted that there is another Bellman equation, but we will not discuss it here:

r_n = max(p_n, r_1 + r_{n-1}, r_2 + r_{n-2}, …, r_{n-1} + r_1)

4.1.1 Different Implementations

Recursive Implementation

Cut-Rod(p,n)
1. if n == 0
2.     return 0
3. q = -∞
4. for i = 1 to n
5.     q = max(q, p[i] + Cut-Rod(p,n-i))
6. return q

Runtime: Let T(n) denote the number of times Cut-Rod is called. Then we have T(n) = 1 + ∑_{i=1}^{n} T(n-i). Solving this recurrence gives T(n) = 2^n.

Bottom Up Implementation

Bottom-Up-Cut-Rod(p,n)
1. r[0] = 0
2. for j = 1 to n
3.     q = -∞
4.     for i = 1 to j
5.         q = max(q, p[i] + r[j-i])
6.     r[j] = q
7. return r[n]
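A direct Python sketch of the bottom-up version, using the standard price table from CLRS as a test case (names are my own):

```python
import math

def bottom_up_cut_rod(p, n):
    """p[i] is the price of a rod of length i (p[0] unused); returns max revenue."""
    r = [0] * (n + 1)               # r[0] = 0: a rod of length 0 earns nothing
    for j in range(1, n + 1):
        q = -math.inf
        for i in range(1, j + 1):   # try every length i for the first piece
            q = max(q, p[i] + r[j - i])
        r[j] = q
    return r[n]
```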

In dynamic programming, correctly initializing the table is very important. In this case, we initialize r[0] = 0.

Runtime: Θ(n²).

Top Down Implementation

This is also called memoization: we first check whether the subproblem has already been solved, and if so, we return the stored solution.

Memoized-Cut-Rod(p,n)
1. if r[n] ≥ 0          // r[0..n] is assumed initialized to -∞, except r[0] = 0
2.     return r[n]
3. q = -∞
4. for i = 1 to n
5.     q = max(q, p[i] + Memoized-Cut-Rod(p,n-i))
6. r[n] = q
7. return q

Runtime: Θ(n²).

It is worth noting that although in this particular case the bottom-up implementation is as fast as the top-down one, in general the memoized method can be faster when not every subproblem has to be solved, since it only solves the subproblems that are actually needed, while the bottom-up method solves all of them.

4.2 Subproblem Graph

We can draw a graph to show the subproblems and their dependencies. For example, in the rod cutting problem, the subproblem graph has one vertex for each rod length 0, 1, …, n, and an edge from j to j-i whenever solving the subproblem of size j directly uses the solution of the subproblem of size j-i.

When we estimate the runtime of a dynamic programming algorithm, it is often linear in the number of vertices and edges in the subproblem graph.

4.3 Reconstruct The Solution

In the rod cutting problem, we want to find the optimal solution, not just the optimal revenue. We can use an auxiliary array s to store the optimal size of the first piece to cut off. Then we can reconstruct the solution by tracing back the array s.

For the bottom up implementation, we can implement a Extended-Bottom-Up-Cut-Rod to return both the optimal revenue and the auxiliary array s.

Extended-Bottom-Up-Cut-Rod(p,n)
1. r[0] = 0
2. for j = 1 to n
3.     q = -∞
4.     for i = 1 to j
5.         if q < p[i] + r[j-i]
6.             q = p[i] + r[j-i]
7.             s[j] = i
8.     r[j] = q
9. return r and s

Then we can print the solution by:

Print-Cut-Rod-Solution(p,n)
1. (r,s) = Extended-Bottom-Up-Cut-Rod(p,n)
2. while n > 0
3.     print s[n]
4.     n = n - s[n]

5 Greedy Algorithms

A greedy algorithm always makes the choice that looks best at the moment. Every problem solvable by a greedy algorithm also has a corresponding dynamic programming solution, but the converse is not true (otherwise we would not need dynamic programming).

5.1 Activity Selection

We have a set of activities S = {a1, a2, …, an}, where each activity ai has a start time si and a finish time fi. Two activities ai and aj are compatible if [si,fi) and [sj,fj) do not overlap. The goal is to find the maximum-size subset of mutually compatible activities (not the set of activities with the maximum total time).

5.1.1 Dynamic Programming

We can use dynamic programming to solve this problem. Let Sij denote the set of activities that start after ai finishes and finish before aj starts, and let Aij denote the maximum-size subset of mutually compatible activities in Sij.

The activity selection problem has optimal substructure: Suppose ak is in Aij; then Aij = Aik ∪ {ak} ∪ Akj, so |Aij| = |Aik| + 1 + |Akj|. Then we have the Bellman equation:

c[i,j] = max_{a_k ∈ S_ij} (c[i,k] + c[k,j] + 1)

5.1.2 Greedy Choice

We can also use a greedy algorithm to solve this problem. We first sort the activities by finish time and choose the first activity. Then we repeatedly choose the earliest-finishing activity that is compatible with the last chosen one.

In other words, we always choose the activity that finishes first.

Loop version:

Greedy-Activity-Selector(s,f)
1. n = s.length
2. A = {a1}
3. k = 1
4. for m = 2 to n
5.     if s[m] ≥ f[k]
6.         A = A ∪ {am}
7.         k = m
8. return A

Recursive version:

Recursive-Activity-Selector(s,f,k,n)
1. m = k + 1
2. while m ≤ n and s[m] < f[k]
3.     m = m + 1
4. if m ≤ n
5.     return {am} ∪ Recursive-Activity-Selector(s,f,m,n)
6. else
7.     return ∅

Runtime: Θ(n), assuming the activities are already sorted by finish time.
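The loop version in Python, tested on the standard 11-activity example from CLRS (0-indexed; names are my own):

```python
def greedy_activity_selector(s, f):
    """s, f: start/finish times, already sorted by finish time.
    Returns the indices of the chosen activities."""
    chosen = [0]                   # always take the activity that finishes first
    k = 0
    for m in range(1, len(s)):
        if s[m] >= f[k]:           # compatible with the last chosen activity
            chosen.append(m)
            k = m
    return chosen
```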

5.1.3 Proof of Correctness

We assume that the activities are sorted by finish time. Let Sk denote the set of activities that start after ak finishes, and let Ak denote a maximum-size subset of mutually compatible activities in Sk. We want to prove that if am is the activity that finishes first in Sk, then am is in some maximum-size subset of Sk.

Let aj denote the activity that finishes first in Ak. If aj = am, then we are done. If aj ≠ am, we can exchange aj for am without causing any conflict, since am finishes no later than aj. Since the number of activities in Ak does not change, the new set is still a maximum-size subset of mutually compatible activities in Sk.

5.2 General Scheme of Greedy Algorithms

  1. Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve.

  2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe.

5.3 Coin Changing

Given a set of coins S = {v1, v2, …, vn}, where v1 < v2 < … < vn, and a value V, we want to find the minimum number of coins that sum to V. This problem does not always have a greedy solution.

The usual greedy choice is to choose the largest coin that does not exceed the remaining value. This sometimes fails, for example: S={1,3,4}, V=6. The greedy algorithm will choose 4,1,1, but the optimal solution is 3,3.
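The counterexample is easy to verify by comparing the greedy strategy with a dynamic programming solution; a Python sketch (names are my own):

```python
import math

def greedy_change(coins, v):
    """Largest-coin-first greedy; returns a coin count, or None if v is not reached."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += v // c
        v %= c
    return count if v == 0 else None

def dp_change(coins, v):
    """Minimum number of coins summing to v (coins may be reused)."""
    best = [0] + [math.inf] * v
    for x in range(1, v + 1):
        for c in coins:
            if c <= x:
                best[x] = min(best[x], best[x - c] + 1)
    return best[v]
```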

5.4 When Greedy Algorithms Fail

There are many cases when greedy algorithms fail. For example: maze problem, traveling salesman problem, 0-1 knapsack problem, etc.

Take the 0-1 knapsack problem as an example. Suppose we have a knapsack with capacity 50 and three items:

  • Item 1: v1=60, w1=10

  • Item 2: v2=100, w2=20

  • Item 3: v3=120, w3=30

If the greedy choice is to choose the item with the largest value per unit weight, then we will choose item 1 and 2, and the total value is 160. However, the optimal solution is to choose item 2 and 3, and the total value is 220.

It should be noted that the fractional knapsack problem does have a greedy solution.

6 Graph Algorithms

We usually use two ways to represent a graph: adjacency list and adjacency matrix. To choose which representation to use, we need to consider the density of the graph. A graph is sparse if |E|=o(|V|2), and dense if |E|=Θ(|V|2). For sparse graphs, we use adjacency list, and for dense graphs, we use adjacency matrix.

6.1 Breadth First Search

In breadth first search, we use a queue to store the vertices. We extract a vertex from the queue, and then add all its adjacent vertices to the queue.

We assign each vertex a color: white, gray, or black.

  • White: undiscovered.

  • Gray: discovered but not finished.

  • Black: finished.

It is worth noting that distinguishing gray from black is not necessary for correctness; the two colors could be combined into one. The distinction is purely for the purpose of analysis.

Each vertex has three attributes: distance, parent, and color.

BFS(G,s)
1.  for each vertex u ∈ G.V - {s}
2.      u.color = WHITE
3.      u.d = ∞
4.      u.π = NIL
5.  s.color = GRAY
6.  s.d = 0
7.  s.π = NIL
8.  Q = ∅
9.  Enqueue(Q,s)
10. while Q ≠ ∅
11.     u = Dequeue(Q)
12.     for each vertex v ∈ G.Adj[u]
13.         if v.color == WHITE
14.             v.color = GRAY
15.             v.d = u.d + 1
16.             v.π = u
17.             Enqueue(Q,v)
18.     u.color = BLACK

Runtime: O(|V|+|E|). This is because each vertex is enqueued and dequeued at most once, which takes O(|V|) time. Each edge is examined at most twice, which takes O(|E|) time (assume we use adjacency list).

Correctness: The loop invariant is: at the start of each iteration of the while loop, the queue Q consists of exactly the gray vertices. However, the loop invariant alone does not prove the correctness of the algorithm.

We can prove the correctness by using the following facts:

  • Let the length of the shortest path from s to u be d(s,u). Then for a graph G=(V,E), if an edge (u,v) ∈ E, then d(s,v) ≤ d(s,u)+1. This is true because the shortest path from s to v cannot be longer than the shortest path from s to u plus the edge (u,v).

  • When the breadth first search ends, for each vertex v ∈ V, v.d ≥ d(s,v). This can be proved by induction. The base case is trivial. For the inductive step, we can see on line 15 that v.d = u.d+1. Together with the fact that u.d ≥ d(s,u), we have v.d = u.d+1 ≥ d(s,u)+1 ≥ d(s,v).

  • Suppose the queue contains the vertices v1, v2, …, vk, where v1 is the head and vk is the tail. Then for each i = 2, 3, …, k, vi.d ≥ vi-1.d, and vk.d ≤ v1.d+1. This can be proved by induction. The base case is trivial. When we dequeue v1, the new head is v2; by the inductive hypothesis v2.d ≥ v1.d, so vk.d ≤ v1.d+1 ≤ v2.d+1. When we enqueue vk+1, let the vertex just dequeued (the one being processed) be u. Then vk+1.d = u.d+1 ≤ v1.d+1, since u.d ≤ v1.d. Also, vk.d ≤ u.d+1 = vk+1.d.

  • Suppose vi and vj are two vertices in the queue, and vi is enqueued before vj. Then vi.d ≤ vj.d. This follows from the previous fact.

  • If v and u are neighbors, then v.d ≤ u.d+1. Consider the moment u is dequeued:

    • If v is white, then v is enqueued with v.d = u.d+1.

    • If v is gray, then v is in the queue, and by fact 3, v.d ≤ u.d+1.

    • If v is black, then v was enqueued before u, and by fact 4, v.d ≤ u.d.

  • After the breadth first search ends, for each vertex v ∈ V, v.d is the length of the shortest path from s to v. This can be proved by contradiction. Suppose some vertex has v.d > d(s,v); among all such vertices, pick a v with the smallest d(s,v). Let u be the vertex preceding v on a shortest path from s to v, so d(s,v) = d(s,u)+1. By the choice of v, u.d = d(s,u), hence u.d+1 = d(s,u)+1 = d(s,v) < v.d. But by fact 5, v.d ≤ u.d+1, which contradicts the previous inequality.

The shortest paths can be found by tracing back the parent pointers. This means the breadth first search tree is a shortest-path tree, and BFS is a single-source shortest-path algorithm for unweighted graphs.
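A compact Python sketch of BFS; instead of explicit colors, a vertex with dist = ∞ plays the role of a white vertex (graph representation and names are my own):

```python
from collections import deque
import math

def bfs(adj, s):
    """adj: dict mapping each vertex to a list of neighbors.
    Returns (dist, parent) for the BFS from source s."""
    dist = {u: math.inf for u in adj}
    parent = {u: None for u in adj}
    dist[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == math.inf:    # white vertex: first discovery
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent
```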

6.2 Depth First Search

In depth first search, we use a stack (here, the recursion stack) to store the vertices. Each vertex has four attributes: discovery time, finish time, parent, and color.

DFS(G)
1. for each vertex u ∈ G.V
2.     u.color = WHITE
3.     u.π = NIL
4. time = 0
5. for each vertex u ∈ G.V
6.     if u.color == WHITE
7.         DFS-Visit(G,u)

DFS-Visit(G,u)
1.  time = time + 1
2.  u.d = time
3.  u.color = GRAY
4.  for each vertex v ∈ G.Adj[u]
5.      if v.color == WHITE
6.          v.π = u
7.          DFS-Visit(G,v)
8.  u.color = BLACK
9.  time = time + 1
10. u.f = time

Runtime: O(|V|+|E|). Each vertex is discovered and finished exactly once, and each edge is examined exactly twice.

6.2.1 Properties of Depth First Search

  • Parenthesis property: For any two vertices u and v, exactly one of the following three conditions holds:

    • The intervals [u.d,u.f] and [v.d,v.f] are entirely disjoint, and neither u nor v is a descendant of the other in the depth first forest.

    • The interval [u.d,u.f] is contained entirely within the interval [v.d,v.f], and u is a descendant of v in a depth first tree.

    • The interval [v.d,v.f] is contained entirely within the interval [u.d,u.f], and v is a descendant of u in a depth first tree.

  • White-path theorem: In a depth first forest of a graph G=(V,E), vertex v is a descendant of vertex u if and only if at the time u.d that the search discovers u, there is a path from u to v consisting entirely of white vertices.

Proof:

⇒: If v is a descendant of u, then v is discovered after u is discovered and before u finishes, so v is white at time u.d. The same holds for every vertex on the tree path from u to v, since they are all descendants of u.

⇐: Proof by contradiction. Let v be the first vertex on the white path that is not a descendant of u, and let w be the predecessor of v on the white path. By the choice of v, we know that w is a descendant of u, hence u.d ≤ w.f ≤ u.f (w and u may be the same vertex). Since there is an edge from w to v and v is white at time u.d, v will be discovered after u but before w finishes, hence u.d < v.d < w.f ≤ u.f. By the parenthesis property, v must then be a descendant of u, a contradiction.

6.2.2 Classification of Edges

There are four types of edges in a depth first forest:

  • Tree edges: Edges in the depth first forest. An edge (u,v) is a tree edge if v was first discovered by exploring edge (u,v); that is, v's color is white when edge (u,v) is examined.

  • Back edges: Edges that connect a vertex to an ancestor in a depth first tree. An edge (u,v) is a back edge if at the time (u,v) is examined, v is colored gray.

  • Forward edges: Edges that connect a vertex to a descendant in a depth first tree. An edge (u,v) is a forward edge if at the time (u,v) is examined, v is colored black and was discovered later than u.

  • Cross edges: Edges that connect two vertices in different depth first trees. An edge (u,v) is a cross edge if at the time (u,v) is examined, v is colored black and was discovered earlier than u.

In a depth first search of an undirected graph, there are only tree edges and back edges; forward and cross edges cannot occur.

Let (u,v) be an arbitrary edge of an undirected graph G=(V,E). Without loss of generality, assume that u.d < v.d. Then the search must discover v before it finishes exploring u, since v is white and reachable from u when u is discovered. If (u,v) is first explored in the direction from u to v, then v is white at that moment and (u,v) is a tree edge. Otherwise (u,v) is first explored from v, and since u is still gray at that time, (u,v) is a back edge.

6.2.3 DFS and Cycle

A directed graph is acyclic if and only if a depth first search of the graph yields no back edges.

Proof:

⇒ (contrapositive): If there is a back edge (u,v), then the tree path from v down to u together with the edge (u,v) forms a cycle.

⇐ (contrapositive): If there is a cycle, then there is a back edge. Assume the cycle is v1,v2,…,vk,v1, and without loss of generality let v1 be the first of these vertices to be discovered. At time v1.d, all other vertices on the cycle are white, so by the white-path theorem, vk is a descendant of v1. Therefore, (vk,v1) is a back edge.
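A minimal cycle check based on this theorem (a back edge is exactly an edge into a gray vertex); the adjacency-dict representation is an assumption for illustration:

```python
def has_cycle(adj):
    """Return True iff the directed graph (dict of adjacency lists) has a cycle."""
    # color: 0 = white, 1 = gray (on the recursion stack), 2 = black
    color = {u: 0 for u in adj}

    def visit(u):
        color[u] = 1
        for v in adj[u]:
            if color[v] == 1:            # back edge found -> cycle
                return True
            if color[v] == 0 and visit(v):
                return True
        color[u] = 2
        return False

    return any(color[u] == 0 and visit(u) for u in adj)
```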

6.2.4 Topological Sort

A topological sort of a directed acyclic graph G=(V,E) is a linear ordering of all its vertices such that if G contains an edge (u,v), then u appears before v in the ordering. A topological sort can be implemented by depth first search.

Topological-Sort(G)
1. call DFS(G) to compute finishing times v.f for each vertex v
2.as each vertex is finished, insert it onto the front of a linked list
3. return the linked list of vertices

Runtime: O(|V|+|E|).

Correctness: For any edge (u,v), we need to prove that u.f>v.f. Without loss of generality, assume u is discovered before v. Since in an acyclic graph there is no back edge, v must be white or black. If v is white, then v is a descendant of u, and u.f>v.f. If v is black, then v.f is already set and u.f>v.f.
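A sketch of the same idea in Python, using recursive DFS and recording each vertex as it finishes (the names and adjacency-dict representation are illustrative, not from the text):

```python
def topological_sort(adj):
    """Return a topological order of a DAG given as a dict of adjacency lists."""
    visited = set()
    order = []  # vertices in order of increasing finish time

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.append(u)  # u finishes here

    for u in adj:
        if u not in visited:
            visit(u)
    order.reverse()  # reversing finish order == inserting at the list front
    return order
```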

6.2.5 Strongly Connected Components

A directed graph is strongly connected if there is a path from each vertex to every other vertex. Let G=(V,E) be a directed graph, and let GT=(V,ET) be the transpose of G, which is the graph formed by reversing all edges in G. Note that G and GT have the same strongly connected components.

Strongly-Connected-Components(G)
1. call DFS(G) to compute finishing times u.f for each vertex u
2. compute GT
3. call DFS(GT), but in the main loop of DFS, consider the vertices in order of decreasing u.f
4. output the vertices of each tree in the depth first forest formed in line 3 as a separate strongly connected component

Runtime: O(|V|+|E|)

Correctness: Proof sketch by induction on the number of SCCs. Each SCC lies entirely within one tree of the depth first forest, and the SCC containing the root of a depth first tree is output correctly. If there is only one SCC, the algorithm is therefore correct. For a graph with k SCCs, the first tree of the second DFS outputs the SCC containing its root; excluding that SCC leaves a graph with k-1 SCCs, which is handled correctly by the induction hypothesis.
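The two-pass algorithm can be sketched as follows (an illustrative Python version; the adjacency-dict representation and helper names are my own):

```python
def strongly_connected_components(adj):
    """Return the SCCs of a directed graph (dict of adjacency lists) as sets."""
    # First DFS: record vertices in order of increasing finish time.
    visited, finish_order = set(), []

    def visit(u, graph, out):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v, graph, out)
        out.append(u)

    for u in adj:
        if u not in visited:
            visit(u, adj, finish_order)

    # Build the transpose graph G^T (all edges reversed).
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # Second DFS on G^T in decreasing finish time; each tree is one SCC.
    visited.clear()
    sccs = []
    for u in reversed(finish_order):
        if u not in visited:
            comp = []
            visit(u, adj_t, comp)
            sccs.append(set(comp))
    return sccs
```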

6.3 Minimum Spanning Tree

A spanning tree of a graph G=(V,E) is a tree that is a subgraph of G and contains all vertices of G. A minimum spanning tree of a weighted, connected, undirected graph G=(V,E) is a spanning tree of G that has minimum weight.

6.3.1 Generic MST Algorithm

Generic-MST(G,w)
1.A=∅
2. while A does not form a spanning tree
3. find an edge (u,v) that is safe for A
4.A=A∪{(u,v)}
5. return A

Correctness: By loop invariant: A is a subset of some MST. An edge (u,v) is safe for A exactly when A∪{(u,v)} is still a subset of some MST, so adding a safe edge maintains the invariant.

6.3.2 Cuts

A cut (S,V-S) of an undirected graph G=(V,E) is a partition of V. An edge (u,v) crosses the cut (S,V-S) if one of its endpoints is in S and the other is in V-S. A cut respects a set A of edges if no edge in A crosses the cut. An edge is a light edge crossing a cut if its weight is the minimum of any edge crossing the cut.

We can prove that if a cut respects a set A of edges, then a light edge crossing the cut is safe for A. Suppose (S,V-S) is a cut of G that respects A, and let (u,v) be a light edge crossing the cut. Let T be a minimum spanning tree that contains A. If (u,v)∈T, then obviously (u,v) is safe for A. Otherwise, T contains a path from u to v but does not contain (u,v); this path together with (u,v) forms a cycle. Since u and v lie on opposite sides of the cut, the path must cross the cut at some edge (x,y)≠(u,v). Since (u,v) is a light edge, w(u,v)≤w(x,y), so replacing (x,y) with (u,v) in T yields another spanning tree T′ with w(T′)≤w(T); hence T′ is also an MST. Note that (x,y)∉A, since the cut respects A, so T′ contains A∪{(u,v)}. Therefore, (u,v) is safe for A.

6.3.3 Kruskal’s Algorithm

Kruskal’s algorithm sorts all the edges in nondecreasing order by weight, and then adds the edges that connect two different components in the current forest.

Kruskal(G,w)
1.A=∅
2. for each vertex v∈G.V
3.Make-Set(v)
4.sort the edges of G.E into nondecreasing order by weight w
5. for each edge (u,v)∈G.E, taken in nondecreasing order by weight
6.if Find-Set(u) ≠ Find-Set(v)
7.  A=A∪{(u,v)}
8.  Union(u,v)
9. return A

Runtime: O(|E|log|V|). The major cost is sorting the edges, which takes O(|E|log|E|)=O(|E|log|V|) time (since |E|≤|V|^2, we have log|E|=O(log|V|)).
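A compact Python sketch of Kruskal with a simple union-find (path compression only; union by rank is omitted for brevity, and the edge-list format is my own assumption):

```python
def kruskal(num_vertices, edges):
    """MST of an undirected graph; edges are (weight, u, v) with vertices 0..n-1."""
    parent = list(range(num_vertices))

    def find(x):                       # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):      # nondecreasing order by weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # endpoints in different components: safe edge
            parent[ru] = rv            # Union
            mst.append((w, u, v))
    return mst
```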

6.3.4 Prim’s Algorithm

Prim’s algorithm grows a single tree by adding edges to the tree one by one. We use a min-priority queue Q to store the vertices.

Prim(G,w,r)
1. for each u∈G.V
2.u.key=∞
3.u.π=NIL
4.r.key=0
5.Q=G.V
6. while Q≠∅
7.u=Extract-Min(Q)
8.for each v∈G.Adj[u]
9.  if v∈Q and w(u,v)<v.key
10.   v.π=u
11.   Decrease-Key(Q,v,w(u,v))

Runtime: O(|E|log|V|). Dominated by the Decrease-Key operations: O(|E|) calls, each taking O(log|V|) time.
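A Python sketch of Prim using heapq; since heapq has no Decrease-Key, stale heap entries are pushed and later skipped instead (the asymptotic bound is unchanged). The adjacency-dict representation is an assumption for illustration:

```python
import heapq

def prim(adj, root):
    """MST parent map for adj = {u: [(v, weight), ...]}, rooted at root."""
    in_tree = set()
    parent = {}
    heap = [(0, root, None)]           # (key, vertex, predecessor)
    while heap:
        key, u, pi = heapq.heappop(heap)
        if u in in_tree:
            continue                   # stale entry: u already extracted
        in_tree.add(u)
        parent[u] = pi
        for v, w in adj[u]:
            if v not in in_tree:       # push instead of Decrease-Key
                heapq.heappush(heap, (w, v, u))
    return parent
```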

6.4 Single Source Shortest Path

6.4.1 Dijkstra’s Algorithm

We can use Dijkstra’s algorithm to find the shortest path from a single source to all other vertices. The idea is very similar to Prim’s algorithm.

Dijkstra(G,w,s)
1. for each v∈G.V
2.v.d=∞
3.v.π=NIL
4.s.d=0
5.S=∅
6.Q=G.V
7.Build-Min-Heap(Q)
8. while Q≠∅
9.u=Extract-Min(Q)
10.S=S∪{u}
11.for each v∈G.Adj[u]
12.  if v.d>u.d+w(u,v)
13.   v.π=u
14.   Decrease-Key(Q,v,u.d+w(u,v))

Runtime: O(|E|log|V|). The Build-Min-Heap takes O(|V|) time. All calls for Extract-Min takes O(|V|log|V|) time. All calls for Decrease-Key takes O(|E|log|V|) time.

Correctness: Prove by loop invariant: before each iteration of the while loop, for each vertex v∈S, v.d is the weight of a shortest path from s to v. We can prove maintenance by contradiction: suppose u is the first vertex added to S with u.d≠d(s,u), and consider a shortest path from s to u. Let x be the last vertex of S on this path and y the first vertex of V-S on it (x might be s, and y might be u). By the choice of u, x.d=d(s,x), and after relaxing edge (x,y), y.d=d(s,y). Since edge weights are nonnegative, y.d=d(s,y)≤d(s,u)≤u.d. However, since the queue chose u before y, u.d≤y.d. Hence all the inequalities are equalities, and u.d=d(s,u), a contradiction.
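A Python sketch of Dijkstra, again substituting lazy deletion of stale heap entries for Decrease-Key (the adjacency-dict representation and names are illustrative assumptions):

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s in adj = {u: [(v, weight), ...]}."""
    dist = {u: float("inf") for u in adj}
    dist[s] = 0
    heap = [(0, s)]
    done = set()                       # the set S of finished vertices
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                   # stale entry
        done.add(u)
        for v, w in adj[u]:
            if dist[v] > d + w:        # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```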

7 Related Concepts

7.1 Asymptotic Notation

7.1.1 Definition of Θ

A function f(n) belongs to Θ(g(n)) if there exist constants 0 < c1 ≤ c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.

An easy way to prove such bounds is to divide all sides by g(n). For example, to prove 0 ≤ c1·n^2 ≤ n^2+2n+3 ≤ c2·n^2 for all n ≥ 1:

0 ≤ c1·n^2 ≤ n^2+2n+3 ≤ c2·n^2 ⟺ 0 ≤ c1 ≤ 1 + 2/n + 3/n^2 ≤ c2

Since 1 ≤ 1 + 2/n + 3/n^2 ≤ 6 for all n ≥ 1, the constants c1=1 and c2=6 work.

The definitions of O and Ω are similar. A function f(n) belongs to O(g(n)) if there exist constants c > 0 and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0. A function f(n) belongs to Ω(g(n)) if there exist constants c > 0 and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0.

In general, big Θ is used to describe the tight bound of a function, big O is used to describe the upper bound of a function, and big Ω is used to describe the lower bound of a function.

7.1.2 Little o and Little ω

A function f(n) belongs to o(g(n)) if lim_{n→∞} f(n)/g(n) = 0. A function f(n) belongs to ω(g(n)) if lim_{n→∞} f(n)/g(n) = ∞, or equivalently, if g(n) belongs to o(f(n)).

7.1.3 Important Inequalities

Every polynomial of log n grows slower than every polynomial of n. For example: (log n)^100 = o(n^0.01).

Every polynomial of n grows slower than every exponential 2^(n^c) with c > 0. For example: n^100 = o(2^(n^0.01)).

7.1.4 Rule of Sum and Rule of Product

Rule of Sum: f(n)+g(n)=Θ(max(f(n),g(n))).

Rule of Product: f(n)g(n)=Θ(f(n)g(n)).

7.2 RAM Model

The RAM model assumes that each elementary operation takes the same amount of time.

7.3 Common Cost Model

Assume the runtime of an algorithm is the number of elementary operations it performs.

7.4 In place

A sorting algorithm is said to be in place if it requires O(1) additional space. Hence, merge sort is not in place, since it requires Ω(n) additional space.

7.5 Stable

A sorting algorithm is said to be stable if it preserves the relative order of equal elements.

7.6 Master Theorem

Let a ≥ 1 and b > 1 be constants, let f(n) be a function, and let T(n) be a non-negative function for large enough n. Then the recurrence T(n) = aT(n/b) + f(n) has the following asymptotic bounds:

  1. If f(n) = O(n^(log_b a − ϵ)) for some constant ϵ > 0, then T(n) = Θ(n^(log_b a)).

  2. If f(n) = Θ(n^(log_b a)·(log n)^k) for some constant k ≥ 0, then T(n) = Θ(n^(log_b a)·(log n)^(k+1)).

  3. If f(n) = Ω(n^(log_b a + ϵ)) for some constant ϵ > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
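As a worked application (a sketch using the merge sort recurrence from Section 1, plus one standard case-1 example of my own choosing):

```latex
\text{Merge sort: } T(n) = 2T\!\left(\tfrac{n}{2}\right) + \Theta(n),\quad
a = 2,\ b = 2,\ n^{\log_b a} = n. \\
f(n) = \Theta\!\left(n \log^0 n\right)
  \;\Rightarrow\; \text{case 2 with } k = 0
  \;\Rightarrow\; T(n) = \Theta(n \log n). \\[4pt]
T(n) = 9T\!\left(\tfrac{n}{3}\right) + n:\quad
n^{\log_3 9} = n^2,\ f(n) = n = O\!\left(n^{2-\epsilon}\right)
  \;\Rightarrow\; \text{case 1}
  \;\Rightarrow\; T(n) = \Theta(n^2).
```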

7.7 Acyclic Graph

A directed graph is acyclic if it has no directed cycles.

8 Problems from Homework

Problem 1

Explain why the statement “The running time of Algorithm A is at least O(n^2)” is meaningless.

Problem 2

Prove by induction that every nonempty binary tree satisfies |V|=|E|+1.