Some adaptations were made to write mathematical notation on the keyboard:
1. "a^b" means "a to the power b" or "a with superscript b".
2. "a_b" means "a with the subscript b".
3. "<=" means "less than or equal to".
4. "Theta" is the Greek capital theta.
Purposes of the Course
- Learn to estimate and use information on running time:
This is covered in Chapters 1-4 and in Appendix B. There
are a couple of points that could be emphasized more. First,
worst-case run time can be more important than average-case run time
if the algorithm is used in a time-critical application such
as controlling a robot's response to potentially dangerous situations. Second,
although our study of average-case run times assumes all initial
situations are equally likely, this assumption may fail.
In that case, we can be misled. For example, the average
running time for Quicksort is better than for Mergesort, but if
the input is frequently already sorted, then Quicksort may be slower.
- Learn to recognize different types of algorithms and their
strengths and weaknesses: Some information on this
is given at the start of the notes for Chapters 2-4.
- Be introduced to the subject of obtaining lower bounds for
problems: See Chapters 7 and 8.
- Learn some things about designing algorithms:
Mostly this is learning by studying algorithms. In general,
designing an efficient algorithm for a new problem is difficult. There
are some general principles to keep in mind.
- Try to solve the problem by thinking in terms of an algorithm
type.
- Be open to the idea of keeping track of additional data in
designing an algorithm. Examples where this is needed:
Exercises 2.42 and 3.33, the D^(k)[i][j]
arrays in Floyd's algorithm (without them, there's no smaller
problem to build on for dynamic programming!), the touch[ ] array in Dijkstra's algorithm
(although this could be considered just a matter of efficiency).
- If numbers are involved, sorting may improve efficiency. (Kruskal's
algorithm sorts edges, sorting is done in the 0-1 Knapsack problem
to obtain an order for making decisions, and there are many other
examples.)
- Keeping track of additional data may improve efficiency.
- Appropriate data structures may improve efficiency.
This may be as simple as a sorted list (noted earlier), or may
be more complex as in Kruskal's algorithm.
Brief Notes on Neapolitan and Naimipour Text
You can step through some of the algorithms
in the text by going to http://students.ceid.upatras.gr/~papagel/, clicking on "Algorithms Tutoring Web Page",
and choosing an algorithm. (Thanks to Long Huynh, 188 student,
Spring 2000.)
- page vii: Read to learn about the nature
of the text.
- page viii: Read to get a quick overview of the
text.
Chapter 1
- page 26: You can do arithmetic with O(...)
notation beyond what the text discusses. I'll
discuss it very briefly in class since it appears in the literature.
- page 37: Also useful: If g(n) is in o(f(n)), then f(n) + rg(n)
is in Theta(f(n)) for all real r. (#7 in the text can only be used
when r > 0.) For example, n^2 - 5n is in Theta(n^2); this follows
with r = -5, a case #7 does not cover.
- page 37: Also useful: n! is in
Theta((n/e)^(n+1/2)).
- For analyzing algorithms, it is useful to know that
1^k + 2^k + ...+ n^k is a polynomial
of degree k+1 whose leading term is
n^(k+1)/(k+1). This is true even if
we omit some terms at the start of the sum and add or omit some
at the end. For example, the sum could start at 3^k
and end at (n+1)^k. It follows that the sum
is n^(k+1)/(k+1) plus something
in o(n^(k+1)). To illustrate, if we
must sum k(n-k) for k from 1 to n, we can think of this as
"n times the sum of k" minus "the sum of k^2". This
gives us n(n^2/2) - n^3/3 = n^3/6 plus something in o(n^3). Hence
the sum is in Theta(n^3).
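A quick numeric check of this estimate (a sketch in Python; the function name is ours):

    # Compare the exact sum of k(n-k) for k = 1..n with the n^3/6 estimate.
    def exact_sum(n):
        return sum(k * (n - k) for k in range(1, n + 1))

    for n in (10, 100, 1000):
        print(n, exact_sum(n), n**3 / 6)
    # The ratio of the two columns tends to 1, so the sum is in Theta(n^3).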
Chapter 2
- Overview of Divide and Conquer: Works well when the
problem can be divided into a few significantly smaller problems
and the solutions combined into a solution for the original problem
without extensive effort. Determining the run-time complexity class
leads to a recursion. Obtaining a recursion is often easy, but
not always. The divide and conquer approach is referred to as
top down.
- page 51: Be careful in reading the analysis of
the binary search algorithm. When n is even
(e.g., a power of 2), the list is split into two unequal pieces,
with the right-hand piece being the larger and containing exactly
n/2 elements.
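A minimal sketch of the split (assuming 1-based indexing with mid = floor((low+high)/2), as in the text's pseudocode):

    # For even n, the piece above mid has exactly n/2 items and the
    # piece below mid has n/2 - 1, so the right-hand piece is larger.
    def split_sizes(n):
        low, high = 1, n
        mid = (low + high) // 2
        return mid - low, high - mid   # (left size, right size)

    for n in (2, 4, 8, 16):
        print(n, split_sizes(n))       # e.g. 8 -> (3, 4)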
- Sec. 2.7: One problem with mostly theoretical
calculations as done in this section is that you may leave something
out (for example, overhead in calling a routine) or you may miscalculate
the time for the basic operation. An alternative approach
is to get actual times if the operating system allows you to
do that. Fortunately, finding the exact value
of n at which the threshold occurs is seldom important
because the times for the two algorithms are usually close together
for a range of values of n near the threshold. You
can see this in Table 2.5 on page 82.
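If you do want actual times, something like the following rough sketch can work (a hypothetical comparison of insertion sort with merge sort; the crossover point depends on the machine and the code):

    import random, time

    def insertion_sort(a):
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    reps = 1000
    for n in (8, 16, 32, 64, 128):
        data = [random.random() for _ in range(n)]
        t0 = time.perf_counter()
        for _ in range(reps):
            insertion_sort(data[:])    # copy each time; both loops pay for it
        t1 = time.perf_counter()
        for _ in range(reps):
            merge_sort(data[:])
        t2 = time.perf_counter()
        print(n, (t1 - t0) / reps, (t2 - t1) / reps)

On typical machines the two times are close over a range of n, illustrating the point above.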
Chapter 3
- Overview of Dynamic Programming: Works well
when the solution can be built up step by step from solutions to smaller
problems, frequently using the principle of optimality (p. 103).
The approach is referred to as bottom up. In contrast to divide
and conquer, dynamic programming code contains loops instead
of recursive calls. Analysis is usually fairly easy.
- Section 3.1: Algorithms 3.1 and 3.2 can be improved
for large k by using the first equality in the section
to note that the binomial coefficient is unchanged when k
is replaced by n-k, whence we may assume that k <= n/2.
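A sketch of the improved computation (our own rendering in Python, not the book's pseudocode):

    # Compute C(n, k) bottom up with a single row, first replacing k by
    # n - k so that k <= n/2, which cuts the work when k is large.
    def binomial(n, k):
        k = min(k, n - k)              # C(n, k) = C(n, n-k)
        if k < 0:
            return 0
        row = [1] + [0] * k            # row[j] holds C(i, j) for current i
        for i in range(1, n + 1):
            for j in range(min(i, k), 0, -1):  # right to left reuses the row
                row[j] += row[j - 1]
        return row[k]

    print(binomial(10, 8))             # 45, computed as C(10, 2)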
- page 91: How bad can this algorithm be? The
binomial coefficient is largest when k is the floor of
n/2. It can be shown that the algorithm computes
Theta(2^n/sqrt(n))
binomial coefficients. Compare with the worst case of Algorithm 3.2.
- Section 3.2: It may have been nicer if the superscripts
on D[i,j] were one larger because:
- The superscript would be the maximum number of edges allowed
in looking for shortest paths from i to j.
- It would agree with the notation in graph theory.
- It would agree with the notation in matrix theory because
then D^(k) = W^k
where W is thought of as a matrix but matrix multiplication
is done differently: "times" becomes "plus"
and "plus" becomes "minimum". For
example, in A x C,
a[i,1]c[1,j] + ... + a[i,n]c[n,j]
becomes
min(a[i,1]+c[1,j], ..., a[i,n]+c[n,j]).
- page 97: In the first line of Example 3.2, "exemplary"
has one of its secondary meanings "serving as a model or
example", not its primary meaning "commendable".
- page 99: Before considering Floyd's algorithm,
we should look at the naive one: Compute the powers of W
by the modified matrix product discussed above. If
we do that the naive way, there are n-1 matrix products
and each product takes Theta(n^3), giving a Theta(n^4)
every-case running time. Less naive computation of powers
(repeated squaring) requires only the ceiling of lg(n-1)
multiplications, and so the every-case running
time is in Theta(n^3 lg n).
Floyd's algorithm is faster.
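A sketch of the less naive computation (our own rendering; INF marks a missing edge and W[i][i] = 0):

    INF = float('inf')

    def min_plus_product(A, C):
        # The modified matrix product: "times" is +, "plus" is min.
        n = len(A)
        return [[min(A[i][k] + C[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    def all_pairs_by_squaring(W):
        n = len(W)
        D, length = W, 1
        while length < n - 1:          # W^m is stable once m >= n-1
            D = min_plus_product(D, D) # each squaring doubles the bound
            length *= 2
        return D

    W = [[0, 3, INF],
         [INF, 0, 1],
         [2, INF, 0]]
    print(all_pairs_by_squaring(W))    # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]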
- page 100: The following may help you see why Floyd's
algorithm is correct: The "obvious" way to find
the shortest path is to compute longer and longer paths; that
is, start with D^(0) = W and compute D^(1), D^(2),
... as indicated on pages 97-99. Here the superscript
indicates the maximum possible number of internal vertices. Instead,
we can compute another array (also called D),
but now the superscript will indicate the maximum possible value
for a vertex subscript. For example, D^(4)[i, j] is the shortest
path from v_i to v_j where the only
vertices allowed between v_i and v_j
are v_1, v_2, v_3,
and v_4. This is what
Floyd's algorithm is doing, where k = 4
in the example I just gave. The algorithm is able to store
the D^(k) arrays
on top of one another because
D^(k)[i,j] = min(D^(k-1)[i,j], D^(k-1)[i,k] + D^(k-1)[k,j])
and also
D^(k-1)[i,k] = D^(k)[i,k] and D^(k-1)[k,j] = D^(k)[k,j].
Hence it does not matter whether we use D^(k-1) or D^(k)
in the minimum computation in Floyd's algorithm. As with
the "obvious" approach, D^(0) = W because there can
be no intermediate vertices if no subscript can exceed 0.
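A minimal sketch of the algorithm (our own rendering), showing the single array overwritten in place, which the identities above justify:

    INF = float('inf')

    def floyd(W):
        n = len(W)
        D = [row[:] for row in W]      # D starts as D^(0) = W
        for k in range(n):             # now allow v_1, ..., v_{k+1} inside paths
            for i in range(n):
                for j in range(n):
                    if D[i][k] + D[k][j] < D[i][j]:
                        D[i][j] = D[i][k] + D[k][j]
        return D

    W = [[0, 3, INF],
         [INF, 0, 1],
         [2, INF, 0]]
    print(floyd(W))                    # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]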
Chapter 4
- Overview of Greedy: Greedy algorithms are based
on making the best choice at the current stage without regard
for the future. They are usually simpler to visualize
and implement than divide and conquer or dynamic programming
algorithms; however, it is often harder to prove that they are
correct.
- page 144: Prim's algorithm uses the extra data structures
to reduce running time. We could use a simpler approach,
but its running time is of order nm, where m is the number
of edges and so n-1 <= m <= n(n-1)/2,
which can be worse than Prim's algorithm. Here is
the algorithm (a code sketch follows the steps):
- Start with Y = {v_1}
and F the empty set.
- Examine all edges and select an edge e with one end
in Y and one end not in Y such that the weight
of e is as small as possible. Add e to F
and its ends to Y (one end is already in Y).
- While Y is not all vertices, repeat the previous step.
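A sketch of this simpler approach (our own rendering; edges are given as (weight, u, v) triples, vertices are numbered from 1, and the graph is assumed connected):

    def simple_prim(n, edges):
        Y = {1}                        # start with Y = {v_1}
        F = []                         # F starts as the empty set
        while len(Y) < n:              # until Y holds all vertices
            best = None
            for w, u, v in edges:      # scan every edge: Theta(m) per pass
                if (u in Y) != (v in Y):           # exactly one end in Y
                    if best is None or w < best[0]:
                        best = (w, u, v)
            w, u, v = best
            F.append((u, v))
            Y.add(u); Y.add(v)         # one end was already in Y
        return F

    edges = [(1, 1, 2), (3, 1, 3), (2, 2, 3), (5, 3, 4), (4, 2, 4)]
    print(simple_prim(4, edges))       # [(1, 2), (2, 3), (2, 4)]

There are n-1 passes and each pass scans all m edges, giving the order nm bound mentioned above.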
- page 167: The example becomes worse for the greedy-algorithm
thief if the 10 pound item is worth $75: The value
of what he steals goes down even though one of
the items he could steal goes up in value!
Chapter 5
- Overview of Backtracking: This is a general
purpose approach. Like most general purpose tools, it is usually
better to use a more specialized tool (Chapters 2-4) if any is
applicable. Running time can be very sensitive to
the order in which choices are made and to the nature of the promising() function.
Proving correctness is usually fairly simple since one is simply
examining all possibilities except those that have been ruled
out as being impossible. The backtracking may be a source of
programming errors if one is not careful.
- Section 5.3: Monte Carlo methods are used to
estimate the running time of a backtracking algorithm on a specific
problem. This method does not let you determine
the complexity class of the algorithm. Determining
the complexity class of a backtracking algorithm is often quite
difficult and is beyond the scope of this course. On the
other hand, complexity class information is usually not of interest
since the running time of most backtracking programs increases very
rapidly with problem size. (The growth is frequently
exponential.)
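For concreteness, here is a Knuth-style estimator sketched in Python (our own illustration using n-queens; it may differ in detail from the algorithm in Section 5.3): follow one random path of promising nodes, multiplying the branch counts as you go, and average many runs.

    import random

    def promising(cols, col):
        # cols[r] is the column of the queen in row r; test row len(cols).
        row = len(cols)
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def estimate_nodes(n):
        count, product, cols = 1, 1, []     # count the root
        while len(cols) < n:
            children = [c for c in range(n) if promising(cols, c)]
            if not children:
                break
            product *= len(children)        # estimated nodes one level down
            count += product
            cols.append(random.choice(children))
        return count

    n = 8
    trials = [estimate_nodes(n) for _ in range(1000)]
    print(sum(trials) / len(trials))        # the Monte Carlo estimate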
Appendix A
- page 435: induction base is also
called base case, basis, and base
step.
- page 443: There is a simple device for remembering
how to deal with products and quotients of logarithms:
Think of log_a(b) as b/a. You're
allowed to cancel equal factors from numerator and denominator
and to replace ?/(x/y) by ?(y/x),
but do not do any actual division. (Here ? stands
for any expression.) Convert the result back to logarithms. (Factors
of 1 can be dropped and an answer 1/1 means the expression
equals 1.)
Example: Think of log_b(x)/log_b(a)
as (x/b)/(a/b) = x/a,
which corresponds to log_a(x). See
p.443 #7.
Example: Think of ln n as
n/e = (n/2)(2/e),
which corresponds to (ln 2)(lg n). See
p.66.
Example: Think of log_a(b) as
b/a = 1/(a/b), which corresponds
to 1/log_b(a). Thus log_a(b)
= 1/log_b(a).
Example: Think of ln(10)lg(e)log_10(2)
as (10/e)(e/2)(2/10) = 1/1, so
the product of logarithms is 1.
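A quick numeric check of these identities (a sketch; math.log(x, b) is log_b(x)):

    import math

    # log_b(x) / log_b(a) = log_a(x)
    print(math.log(100, 2) / math.log(7, 2), math.log(100, 7))
    # log_a(b) = 1 / log_b(a)
    print(math.log(7, 3), 1 / math.log(3, 7))
    # ln(10) lg(e) log_10(2) = 1
    print(math.log(10) * math.log2(math.e) * math.log10(2))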
- page 463: The square brackets [ ] in the exercises
for section A.7 should be parentheses -- these are binomial coefficients.
Appendix B
- page 465: The title of Section B.1 may be somewhat
misleading. What it discusses is proving that the solution
to a recurrence is correct once you have somehow found the solution.
- page 469: The order of a recursion
is the difference between the largest and smallest subscripts
on t. Thus the order of the recursion at the bottom
of the page is k. Such a recursion requires
exactly k initial conditions; that is, it requires
the first k values of t_i. ("Initial
condition" is defined on page 466.)
- page 472: The order of the characteristic equation
equals the order of the recursion. (See previous note.)
- page 478: Examples B.14 and B.15 may be misleading. If
the recursion has order l (see note for p.469) and the
number of initial conditions exceeds l, only the last
l values should be used. To illustrate, suppose
t_0=1 in Example B.14. The recursion would
give the same values of t_k for k>0, but the
method of solution would not work. The correct approach
is described in Example B.16 on page 481: Use l
initial values of t_k that are given with the problem
and then use the recursion to generate however many more t_k
values are needed. For Theorem B.3 on page 479, the
number of additional values that are needed is d+1. To
illustrate: Example B.14 should have only the one initial
condition t_0=0. Since p(n)=1, we
have d=0. Thus the recursion would then be used
to compute one additional value of t, namely t_1.
- page 486: The method in Section B.3 involves repeated
substitution to convert the recurrence into a summation which
you can evaluate (Ex. B.21) or approximate (Ex. B.22).
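A generic illustration of the method (our own example, not the book's): t(n) = t(n-1) + n with t(1) = 1 unwinds by repeated substitution to t(n) = n + (n-1) + ... + 1 = n(n+1)/2. A quick check:

    def t(n):
        # The recurrence, evaluated directly.
        return 1 if n == 1 else t(n - 1) + n

    for n in (1, 5, 10, 50):
        print(n, t(n), n * (n + 1) // 2)   # the last two columns agree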
- page 490: In the definition of smooth,
2n can be replaced by any kn for any integer
k > 1.
- page 490: The term smooth is unfortunate
since it is misleading and conflicts with the usage in mathematical
analysis. We think of smooth as describing the graph of
the function and so the exponential function 2^n would
be smooth -- but it is not (Ex. B.24). On the
other hand, 2^[lg n] is smooth, but its graph is not.
([x] is the floor of x.) "Smooth" in this
context really means that the function f(n) is
not growing too fast and is not bouncing around. Better
terminology might be moderately growing or almost
polynomial since a smooth function is O(n^k)
for some large enough k and since polynomials are eventually
nondecreasing.
- page 490: Theorem B.4 requires that you solve
the recursion for n a power of b. You can
usually avoid doing this by using Theorem B.6 or the improvement
of it noted below.
- page 493: Theorem B.6 can be improved in two ways:
(a) The initial condition T(s) = d need not be specified.
(b) cn^k can be replaced by g(n) in Theta(f(n)) where g(n)
is eventually positive and f(n) is smooth.
Then:
(1) If f(n) = n^k, the conclusions of the theorem hold.
(2) Otherwise, in (B.5), the first case becomes Theta(f(n)),
the second case is too complicated to state here, and the third
case is unchanged, where we define k to be
the limit of log f(n)/log n as n goes to infinity.