Sipser's Text: In General
The focus of the quarter will be on various aspects of languages.
Each of the following classes of languages is strictly more general
than the one before. There are also languages which are not recursively
enumerable. The class of Turing-decidable languages
lies between the context-free and Turing-recognizable languages.
- Regular Languages (Chapter 1)
  Generated by: Regular expressions
  Recognized by: Finite automata
  Determinism: Nondeterminism does NOT add power
  Decidable: Yes (Chapter 4), and in polynomial time (Chapter 7)
  Relevance: Used in UNIX commands (e.g., grep)
- Context-Free Languages (Chapter 2)
  Generated by: Context-free grammars
  Recognized by: Pushdown automata
  Determinism: Nondeterminism DOES add power
  Decidable: Yes (Chapter 4), and in polynomial time (Chapter 7)
  Relevance: Used in computer language specification (compilation can be done in reasonable time).
- Recursively Enumerable (= Turing-recognizable) Languages (Chapter 3)
  Generated by: Enumerators (a type of Turing machine)
  Recognized by: Turing machines
  Determinism: Nondeterminism does NOT add power
  Decidable: No (Chapter 4)
  Relevance: Describes what it is possible to compute.
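Since the overview notes that regular expressions show up in UNIX tools like grep, here is a minimal Python sketch. (Python's `re` module supports more than the regular languages, but the pattern below uses only union, concatenation, and star, so it denotes a regular language.)

```python
import re

# Binary strings with an even number of 1s: the 1s come in pairs,
# with 0s allowed anywhere. Built from concatenation and star only.
even_ones = re.compile(r"(0*10*1)*0*")

assert even_ones.fullmatch("0110") is not None   # two 1s: accepted
assert even_ones.fullmatch("1") is None          # one 1: rejected
assert even_ones.fullmatch("") is not None       # zero 1s: accepted
```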
Specific Comments on Sipser's Text
- ACRONYMS:
This subject has a variety of acronyms. You need
to know what they refer to in order to understand the material.
I'll list the major ones at the start of each chapter.
- page xi: Read the section directed to you.
Chapter 0
- page 6: X is used for both Cartesian product
and multiplication, which are similar operations, but don't confuse
them! Cartesian product is used only for sets.
- page 12: A directed graph is also called a "digraph"
by some people.
- page 16: If you are not familiar with this terminology,
you may want to xerox the page and use it as a bookmark.
- pages 17-25: For more information on proving,
click on the link.
- page 23: The basis for induction is often
called the base case or base step
or induction base by other authors.
Chapter 1
- ACRONYMS:
DFA, NFA, GNFA
- page 38: Since M3 accepts precisely in the states
M2 does not, it accepts the complement of L(M2),
which is what the text gets.
- page 39: Recall from the top of page 4 that n
modulo k is the remainder when n is divided by
k.
- page 40: "Regular" is used at the top of
the page, but is not defined until the bottom of the page.
- page 40: Definition of "accepts": Note that,
if M and w are given, then 1 and 2 imply that r0=q0
and the remaining r's are determined by w and delta.
Hence there is no need to hunt for a sequence of r's
since only one sequence can possibly work. Thus, given w,
we determine r and check to see if rn is in F.
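The remark above says the run r0, ..., rn of a DFA is forced once M and w are given. A small sketch of that check (the state names and transition table below are hypothetical, not from the text):

```python
def dfa_accepts(delta, q0, F, w):
    """Run a DFA: the unique state sequence r0, r1, ..., rn
    is determined by w and delta (note for Sipser p. 40)."""
    r = q0                      # r0 = q0
    for a in w:
        r = delta[(r, a)]       # the next r is forced by delta
    return r in F               # accept iff rn is in F

# Hypothetical DFA over {0,1} accepting strings that end in 1.
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
assert dfa_accepts(delta, "q0", {"q1"}, "0101")
assert not dfa_accepts(delta, "q0", {"q1"}, "10")
```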
- page 48: With a DFA you can always proceed, but with
an NFA you can get stuck: there may be no edge out from a vertex
with the label you need. For example, if the input string
is 11..., we use Figure 1.14, and we go from q1 to q2
with the first 1, then we are stuck. (In #3 of the definition
on page 53, delta(q2,1) is the empty set.)
- page 49: It is convenient to say that q' can
be "reached" from q with input a if q'
is in delta(q,a).
- page 55: In #2, delta'(R,a) is
the set of states in Q that can be reached from some
r in R on some edge labeled a. (That's
what your fingers did.)
- page 56: The last paragraph of the proof, just above
the middle of the page, says "The construction of M
obviously works correctly." It is "obvious"
only after you clearly understand what the definition
does! Some of you might prefer a proof. The
next sentence in the paragraph is really the proof: If R
is the set of states the NFA can be in after some input string
w, then you should be able to convince yourself that delta'(R,a)
is the set of states the NFA can be in after input string wa.
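The subset construction in the two notes above can be sketched as follows; the NFA below (no epsilon edges, hypothetical states) is only for illustration:

```python
def delta_prime(delta, R, a):
    """delta'(R, a): all states reachable from some r in R on an
    edge labeled a (note for p. 55). A missing entry means no edge."""
    return frozenset(q for r in R for q in delta.get((r, a), ()))

def nfa_states_after(delta, q0, w):
    """The set of states the NFA can be in after input w (note for p. 56)."""
    R = frozenset([q0])
    for a in w:
        R = delta_prime(delta, R, a)
    return R

# Hypothetical NFA: on a 1, q1 may stay at q1 or move to q2.
delta = {("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"}}
assert nfa_states_after(delta, "q1", "1") == {"q1", "q2"}
assert nfa_states_after(delta, "q1", "11") == {"q1", "q2"}
```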
- page 63: Keep straight the two uses of "regular":
regular expressions are built by certain rules and lead
to a language;
regular languages are languages that are recognized by
DFAs.
In Theorem 1.28 it is proved that the confusion doesn't matter:
A language comes from a regular expression if and only if it
is a regular language.
- page 64: You can think of a regular expression as a
set of strings that is constructed by using the constructions
allowed in Definition 1.26.
- page 65: An inductive definition is also called a "recursive"
definition.
- page 66: Here's a comparison between languages and
arithmetic:
languages:  union | o (concatenation) | epsilon | empty set
arithmetic: +     | X (multiplication) | 1      | 0
- page 66: Because of Theorem 1.28 and the closure properties
of regular languages (pp. 58-63), we could define a "generalized
regular expression" that allowed more operations than the
definition on page 64 but still gave the same set of languages. For
example, we could allow intersection by the footnote on page
46, complementation by Exercise 1.10(a), and reversal by
Problem 1.24. For example, if R and S are regular
expressions, their intersection R^S is not a regular expression
but it describes a regular language. (Consequently, there
is a regular expression that defines the same language; however,
that regular expression is not R^S.)
- page 73: In an FA, delta tells us what edges
and edge labels are possible. It was convenient to specify
that in one way for DFAs and NFAs, but a different form is simpler
for GNFAs.
- page 78: The pigeonhole principle is only indirectly
related to pigeons. Old desks had a set of compartments
for storing papers and other items. They were referred
to as pigeonholes. The image of putting things into
such compartments led to the terminology "pigeonhole principle."
The desk compartments were probably named because they
reminded people of nesting boxes for pigeons.
- page 80: The proof at the top of the page involves
a couple of twists. First, it is a proof by contradiction.
Second, to show that the pumping lemma fails, we must negate
a statement with an "if...then" and quantifiers. The
lemma has the form
"If regular, then there exists p > 0 such that for all s
with |s| >= p there exists x, y, z such that for all i >= 0, ..."
It's easy to agree with the book about what the negation of this
is, but can you negate the lemma by yourself? If
you're not sure, you may want to see the discussion at the end
of the information on proofs.
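Written with explicit quantifiers (including the side conditions |y| > 0 and |xy| <= p from the full statement), the lemma's conclusion and its negation, which is what a non-regularity proof must establish, look like this:

```latex
% Conclusion of the pumping lemma for a language L:
\exists p \;\forall s \in L,\ |s| \ge p \;\;\exists x,y,z
  \;(s = xyz,\ |y| > 0,\ |xy| \le p)\;\;\forall i \ge 0:\; xy^i z \in L

% Its negation (swap each quantifier, negate the innermost statement):
\forall p \;\exists s \in L,\ |s| \ge p \;\;\forall x,y,z
  \;(s = xyz,\ |y| > 0,\ |xy| \le p)\;\;\exists i \ge 0:\; xy^i z \notin L
```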
- page 82: Example 1.42 can also be done by using reversal
(Problem 1.24).
- A variety of properties preserve regularity: Suppose
R and S are regular languages. Then so are their union,
intersection, and concatenation. The star and reversal
of R are also regular languages. You should look in
the text for proofs of these facts. Regularity is also
preserved under taking "substrings". If x, y,
and z are (possibly empty) strings, then y is a substring of
xyz. If x is empty, y is an initial substring. If
z is empty, y is an end substring. The substrings of R
form a regular language, the initial substrings of R form a regular
language, and the end substrings of R form a regular language.
Can you prove this?
Chapter 2
- ACRONYMS:
CFL, CFG, PDA
- NOTATION:
Unfortunately, the index is incomplete. For --->
(right arrow) and ==> (double right arrow), it only refers
to implication and for =*=> (double right arrow with an asterisk
over it), there is no entry.
- ---> is used to indicate a rule in a
grammar (p. 92, middle)
- ==> is used to indicate a step in a
derivation (p. 92, bottom)
- =*=> is used to indicate zero or more steps in a
derivation. Its first appearance seems to be on page 113,
without a definition.
- page 93: You will probably find it helpful to generate
other "sentences" for this grammar. Also, can
you find some reasonable English sentences that can be made with
the vocabulary that are not in the language?
- page 103: In 2 at the bottom of the page, the top of
the stack is on the left. Thus, in "at", the top of the
stack is a.
- page 106: The idea is to use a leftmost derivation
(p. 98). When a rule is used, any terminal symbols to the
left of the leftmost remaining variable are popped from the
stack and checked against the input.
- page 111: There's a point that is slid over in the
middle of the page. If we are attempting to write rules
for A_sub_pq based on the strings that are generated,
we might expect an infinite number of rules since most
CFLs contain an infinite number of strings. This would
be illegal since a CFG has only a finite number of rules
(Defn. 2.1.3 p.94). In the proof at the bottom of the page,
this problem disappears since we focus on the choices for a,
b, r, and s, of which there are a finite
number. At this point, you may find it helpful to construct
the CFG for the PDA in Figure 2.6 (p.104).
- page 117: One can show that A = {a^n b^n c^k | n and k nonnegative integers}
and B = {a^k b^n c^n | n and k nonnegative integers}
are CFLs by using the idea in Example 2.9 on page 104.
The intersection of these two languages is the language in Example 2.20.
Thus the intersection of CFLs may not be a CFL. In contrast,
the intersection of regular languages is regular (footnote page
46). Of course, some operations on CFLs always give CFLs.
(See Problem 2.15.)
Chapter 3
- ACRONYMS:
TM
- Although this chapter begins a new part of the text, it can
be viewed as a continuation of Part I since it introduces more powerful
automata. We've gone from DFA to NFA (no more powerful)
to PDA, and now to Turing machines. ("More powerful"
means able to recognize more languages.)
- page 128: In the definition of delta, the
symbol is written before moving the head. This can
be seen by studying the discussion in the first two paragraphs
on page 129.
- page 131: "4. Return the head to the left-hand
end of the tape." There is no way to tell when
the left-hand end of the tape has been reached. (In contrast,
one can detect the right-hand end of the written material because
of the blank character.) There are at least three ways
around this in general and one in particular:
- (general) Insist that all input strings must begin with a
special character, say $, to mark the start of the string. This
limitation on input is neither desirable nor elegant.
- (general) Figure out how to detect the beginning. See
the discussion on page 134 for Stage 2.
- (general) Figure out how to mark the beginning. This
can be done by copying each character one character to the right
and replacing the left-most character with $. This would
seem to require that a Turing machine have memory so that it
can remember a symbol to write it after moving.
(See the note concerning page 131.) You can give a
machine a finite memory---think of it as labeling the states
the way we did for DFAs on pages 42-44. With this idea,
one can build a machine that starts by remembering $ and then
repeats "remember the current symbol, write the previously
remembered symbol, and move right" until a blank is remembered.
Having done this, move left until $ is seen, then move right one
step. The machine is now at the start of the input and can begin
executing the "program" in Example 3.4. Thus, we
can always assume that the tape has a special symbol at its left-hand
end.
An alternative method for marking the beginning is to double
the size of the input alphabet. The added characters
are the same as the originals except that each has an added mark,
say x' instead of x. The first thing the machine
does is replace the initial symbol a on the tape with
a'. The machine transitions must then allow for
the possibility of primed symbols when it may be at the
beginning of the tape.
- (particular) If the leftmost symbol on the tape is not needed,
it can be overwritten with a special character. In
this example, the blank character is used.
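The "remember the current symbol, write the previously remembered symbol, move right" routine described above can be sketched in a few lines. This is a Python model of the tape contents, not a Turing machine; the blank is modeled as a space character:

```python
def mark_left_end(tape, blank=" "):
    """Shift every tape symbol one cell right and write $ in cell 0,
    following the copy-right routine in the note for Sipser p. 131."""
    out = []
    remembered = "$"
    i = 0
    while remembered != blank:
        current = tape[i] if i < len(tape) else blank
        out.append(remembered)   # write the previously remembered symbol
        remembered = current     # remember the current symbol
        i += 1                   # move right
    return "".join(out)

assert mark_left_end("abc") == "$abc"
assert mark_left_end("") == "$"
```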
- page 133: The Turing machine is missing q_reject. It is reached from q_14 if the symbol on the tape is #,
0, or 1. Other missing edges also go to the reject
state.
- finite memory: In various places, e.g. the discussion
of Stage 2 on page 134, we want a Turing machine to have a (finite)
memory. Here is a way to do this. Let M be
the possible memory states, including empty (same as don't care). Replace
each state q of the machine except the start, accept, and
reject states with the set of states {q} x M. The new
machine has a transition from (q, m) to (q', m')
when we want to move from q to q' and change the
memory contents from m to m'. (Some
of the new states may be impossible to reach. They can
simply be dropped from the machine.)
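A sketch of the state-set part of that product construction (the state and memory names below are hypothetical):

```python
def add_memory(states, memory, special=("start", "accept", "reject")):
    """Replace each ordinary state q with the pairs (q, m) for m in M,
    per the finite-memory note above; special states are kept as-is.
    Unreachable pairs can be pruned afterwards."""
    new_states = []
    for q in states:
        if q in special:
            new_states.append(q)
        else:
            new_states.extend((q, m) for m in memory)
    return new_states

# Hypothetical machine with two ordinary states and a 4-value memory.
states = ["start", "q1", "q2", "accept", "reject"]
memory = ["empty", "$", "0", "1"]
assert len(add_memory(states, memory)) == 3 + 2 * 4
```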
- Section 3.2: This section shows the extraordinary
robustness of the Turing machine concept; that is, many modifications
fail to increase the languages Turing machines recognize. Compare
this to DFAs: Adding nondeterminism does not give
DFAs more power, but adding a stack does. PDAs are
rather nonrobust: Removing nondeterminism reduces their
power (not proved in text---just a remark near the bottom of
page 102). Adding a second stack increases their power;
for example, a two stack machine can recognize the language
{0^n 1^n 2^n | n > 0},
but a one stack machine cannot. Turing machine robustness
lends strength to the Church-Turing thesis (page 143).
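A sketch of how two stacks handle {0^n 1^n 2^n | n > 0}: the finite control tracks which block of input we are in, stack one checks the 0s against the 1s, and stack two checks the 0s against the 2s. This is a Python model, not a formal two-stack PDA:

```python
def two_stack_accepts(w):
    """Two-stack machine for {0^n 1^n 2^n | n > 0}: push each 0 on both
    stacks, pop stack one for each 1, pop stack two for each 2, and
    accept iff the blocks come in order and both stacks end empty."""
    s1, s2 = [], []
    phase = 0                      # 0 -> reading 0s, 1 -> 1s, 2 -> 2s
    for c in w:
        d = int(c)
        if d < phase:              # blocks out of order
            return False
        phase = d
        if d == 0:
            s1.append("0"); s2.append("0")
        elif d == 1:
            if not s1: return False
            s1.pop()
        else:
            if not s2: return False
            s2.pop()
    return phase == 2 and not s1 and not s2

assert two_stack_accepts("001122")
assert not two_stack_accepts("0012")
assert not two_stack_accepts("")
```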
- page 136: The "simulation" idea mentioned
in the middle of the page is not new:
- To prove Theorem 1.19 (p. 55) we showed that a DFA could
simulate an NFA.
- To prove Lemma 2.13 (p. 107) we showed that a PDA could simulate
the derivation process of a CFG.
- page 137: Step 3 requires some recopying, which
can be done in the manner described in the note for page 131.
- page 138: Think about the following question: Why can't
Theorem 3.8 be proved by some simple variation of the proof for
Theorem 1.19 (p. 55)?
- page 141: The proof of Theorem 3.13 is rather brief.
For example, how can M run E? Essentially,
this involves embedding most of the machine for E in M.
We did something like this on pages 59-62, but the details were
given much more explicitly.
- Problem 3.9: A TM can simulate a multistack PDA.
Part b of this problem shows that a 2-stack PDA can simulate
a TM, so they have equivalent power. Part a shows
that 2-stack PDAs are more powerful than 1-stack PDAs. Hence
we have the following:
DFA = NFA < 1-stack PDA < n-stack PDA = 1-tape TM = nondeterministic, n-tape, etc., TM
Here n > 1, and the equalities and inequalities refer to power,
so A < B means B is more powerful than A.
- Running Forever: Some of our machines could
run forever. Here are the ones we've looked at:
DFA: Must stop because an input is read at each step.
NFA: Could run forever by following a cycle of epsilon edges.
PDA: Like an NFA, could run forever.
Turing machine: Could run forever reading and writing on the tape;
e.g., a Turing machine with input n can replace it on the tape
with n+1, so let this machine start again after it writes n+1.
It will just keep "counting".
Although they can run forever, we can replace NFAs and PDAs with
equivalent devices which do not:
- We showed that a NFA can be replaced by a DFA.
- A PDA can be replaced by a CFG which can be rewritten in
Chomsky normal form (page 99). In Chomsky normal
form, a derivation of a string of length n uses exactly
n-1 rules of the form A --> BC and n rules of the
form A --> a. Hence for input of length n, exactly
2n-1 rules are used unless n=0, in which case one rule is used.
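The step count for a Chomsky-normal-form derivation works out as follows: n-1 applications of binary rules A -> BC grow the sentential form to n variables, and then n applications of terminal rules A -> a finish the string.

```python
def cnf_steps(n):
    """Derivation length for a string of length n >= 1 in a
    Chomsky-normal-form grammar: (n-1) binary-rule steps
    plus n terminal-rule steps."""
    return (n - 1) + n

# Hypothetical CNF grammar S -> AS | a, A -> a:
# S => AS => AAS => aAS => aaS => aaa is 2 binary + 3 terminal steps.
assert cnf_steps(3) == 5
assert cnf_steps(1) == 1    # a single rule A -> a
```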
- The Turing machine situation is more difficult and will be
discussed in Section 4.2.
- Impossible Problems: In the past couple of centuries,
mathematics has developed tools for showing that problems are
impossible.
- Perhaps the earliest such problems go back to the Greeks:
Using the traditional tools of geometric construction (straight-edge
and compass), double the cube, trisect the angle, and square
the circle. Doubling the cube requires constructing the
cube root of 2 and trisecting the angle involves solving a cubic
equation. Galois theory (Math 100) can be used to
show that these cannot be done because all geometric construction
allows you to do is add, subtract, multiply, divide, and take
square roots. Squaring the circle requires constructing
pi. About a century ago Lindemann showed that pi is
"transcendental," which makes this problem impossible,
too.
- A problem dating to the Renaissance is the question of finding
a method for obtaining the roots of a general 5th degree polynomial. A
"method" means an algorithm where the only arithmetic
operations allowed are addition, subtraction, multiplication,
division, and taking roots. A formula for quadratics (requiring
square roots) had been known for a long time. Italian
mathematicians dealt with 3rd and 4th degree polynomials (which
required square, cube, and fourth roots). Galois theory
can be used to show that there cannot be any such method.
By the way, solving 3rd degree (cubic) polynomials motivated
the introduction of imaginary numbers.
- Around the turn of the century, mathematical logicians were
concerned with the problem of providing a secure basis for mathematics.
(Various paradoxes had arisen. For example, in set
theory, if S is the set of all sets that do not contain themselves,
does S contain itself? This is akin to the problem
of the village's adult male barber who shaves every adult male
villager who does not shave himself.) Gödel proved
that it was impossible to find a provably secure basis.
- Turing machines provide the tool for proving the nonexistence
of algorithms. The problem of finding an algorithm for
Diophantine problems (integer solutions of polynomial equations)
is discussed in Section 3.3. Section 4.2 and Chapter 5
discuss other problems for which algorithms do not exist.
(They are called "undecidable problems.")
- The most famous unsolved question about impossibility is
P=NP (page 247). It is impossible to find polynomial time
algorithms for a variety of important problems if and only if
P is strictly smaller than NP.
Chapter 4
- ACRONYMS:
A-sub-X, E-sub-X, EQ-sub-X where X is
some language, automaton, or grammar.
- ENCODING:
In the chapter, we often have one machine whose input may involve
a string in an arbitrary alphabet. Thus it seems that the machine's
tape alphabet must be infinite to allow for all possibilities.
Not so; what we call something is arbitrary. A
finite set can be ordered in an arbitrary manner and its members
encoded as s1, s2, s3, and so on. This only requires an
eleven-symbol alphabet: {s,0,1,2,3,4,5,6,7,8,9}.
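The eleven-symbol encoding described above, as a small sketch (the set members below are hypothetical):

```python
def encode(members):
    """Number the members of an arbitrary finite set and encode each
    as "s" followed by its index in decimal, so every member is a
    string over the 11-symbol alphabet {s,0,1,2,3,4,5,6,7,8,9}."""
    return {m: "s" + str(i + 1) for i, m in enumerate(members)}

# Hypothetical alphabet of a simulated machine:
assert encode(["apple", "pear"]) == {"apple": "s1", "pear": "s2"}
```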
- page 152: For A_DFA,
there is the problem of encoding arbitrary finite sets, discussed
in the preceding remark, and the problem of encoding a machine
description. The latter is handled much like the graph
on page 146. For example, the DFA in Figure 1.4 on
page 34 could be encoded as
(q1,q2,q3) (0,1)
((q1,0,q1),(q1,1,q2),(q2,0,q3),(q2,1,q2),(q3,0,q2),(q3,1,q2))
q1 (q2)
where the five parts of the DFA definition (page 35) are
separated by spaces for clarity.
- page 153: The reduction of an NFA to a DFA requires
an algorithm for computing E(R) on page 56. The
idea is simple: start with E(R) = R and all states
unmarked. As long as E(R) contains an unmarked
q, mark it and add to E(R) every q' for which
there is an epsilon edge (q, q').
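The marking algorithm just described can be sketched directly (the epsilon edges below are hypothetical):

```python
def epsilon_closure(eps_edges, R):
    """E(R): all states reachable from R along epsilon edges,
    by the marking algorithm in the note above (Sipser p. 56)."""
    closure = set(R)
    unmarked = list(R)
    while unmarked:
        q = unmarked.pop()                 # mark q
        for q2 in eps_edges.get(q, ()):    # follow each epsilon edge
            if q2 not in closure:
                closure.add(q2)
                unmarked.append(q2)
    return closure

# Hypothetical epsilon edges: q1 --eps--> q2 --eps--> q3
eps = {"q1": ["q2"], "q2": ["q3"]}
assert epsilon_closure(eps, {"q1"}) == {"q1", "q2", "q3"}
assert epsilon_closure(eps, {"q3"}) == {"q3"}
```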
- page 158: You may wonder what the difference is between
Theorem 4.6 and Theorem 4.8. The former requires
a machine that decides membership for any CFL while the
latter only requires a machine that does so for a particular
(but arbitrary) CFL. Hence Theorem 4.6 is stronger.
- page 159: All containments in the figure are proper:
- We already know not all context-free languages are regular (Examples
1.38 and 2.9).
- Not all decidable languages are context-free. (Example 2.20 is decidable.)
- Section 4.2 shows that not all Turing-recognizable languages are decidable.
- page 161: An alternative to Definition 4.10 is:
1. If there is a one-to-one
map f : A --> B, we say that B
is at least as big as A.
2. If A is at
least as big as B and B is at least as big as A,
we say that A and B are the same size.
It can be shown that this definition of "same size"
is equivalent to the one in the text, but I will not do so.
- page 163: A similar diagonal argument shows that, for
any set S, the set of subsets of S is bigger than S.
- Halting problem: If you think that this doesn't show
that undecidability may be a common occurrence, how about a simple
solitaire game: a set of "dominoes" is generated
randomly and you are supposed to do something with it.
See Section 5.2 for an explanation of the game. You might
think that the halting problem could be solved if we simply had
a great store of algorithms for answering the question and were
clever enough to choose the right algorithm for the particular
TM that is being given as input. It would require
an infinite supply of algorithms since choosing among a finite
set can be done by a nondeterministic TM. (You should
be able to see why this is so.)
Chapter 7
- Notation: If you have trouble understanding big-oh
and little-oh notation, you might look at another book; for example,
Concrete Mathematics by Graham, Knuth, and Patashnik.
- page 227: We could define f(n) = O(g(n)) by "f(n)/g(n)
is bounded". Some people define O(g(n))
to be the set of all functions f(n)
such that f(n)/g(n) is bounded.
In that case, equality is replaced by set membership.
- page 228: Little-oh can be treated the same way big-oh
was treated in the previous remark. Thus o(g(n))
would be all functions f(n) such that f(n)/g(n)
goes to 0 as n goes to infinity.
- page 229: Note that, if s(n) = O(t(n)),
then TIME(s(n))
is a subset of TIME(t(n)).
- page 239: The proof has a gap: We need to know how
long each step in algorithm E takes. Step 3
clearly takes an amount of time that is roughly proportional
to the number of digits in x and y and so is O(input size). How
long does division take? Recall the standard algorithm
for division. It obtains digits in the quotient one by
one. The number of digits in the quotient is O(input size). The
amount of work to find a single digit is also O(input size),
which can be seen by trying all possible values of the digit
and noting that multiplying a number by a digit requires O(input size)
work.
- page 240: Be careful with Theorem 7.14. It
states that given a CFL, say L, the language L is in P.
In other words, there is a Turing machine, say TM_L such that
TM_L decides L in polynomial time. This is not
the same as saying that A_CFG is in P because this requires one
Turing machine for all CFGs. Suppose we are given a CFG.
We could build a Turing machine to convert it to Chomsky normal
form and then use the algorithm in the proof of Theorem 7.14
to see if the CFG can produce the string w. How can
the TM convert the grammar to Chomsky normal form? If it
uses the method in the proof of Theorem 2.6 (page 99), the
new Chomsky normal form grammar could be exponentially larger than the
original grammar. This can happen because a rule containing
k copies of A in its right side becomes 2^k rules
when we eliminate the rule A-->(epsilon). Prof. Sam
Buss has indicated how to avoid this. First change
all rules so that they have at most two symbols on their right
side, then proceed as in the proof of Theorem 2.6. To illustrate
the conversion, the rule A --> BCDE would be replaced
by the rules A --> BX, X --> CY, and
Y --> DE, where X and Y are new variable names. This
expands the grammar by an amount that is at most linear in the
number of symbols needed to express the original grammar.
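The rule-splitting step attributed to Prof. Buss above (A --> BCDE becomes A --> BX, X --> CY, Y --> DE) can be sketched as follows; the `fresh` iterator supplying new variable names is a hypothetical detail:

```python
def binarize(head, body, fresh):
    """Replace a rule head -> B1 B2 ... Bk (k > 2) with chained binary
    rules: head -> B1 X, X -> B2 Y, ..., as described in the note above.
    `fresh` yields unused variable names."""
    rules = []
    left = head
    while len(body) > 2:
        x = next(fresh)
        rules.append((left, (body[0], x)))
        left, body = x, body[1:]
    rules.append((left, tuple(body)))
    return rules

fresh = iter(["X", "Y", "Z"])
assert binarize("A", ["B", "C", "D", "E"], fresh) == [
    ("A", ("B", "X")), ("X", ("C", "Y")), ("Y", ("D", "E")),
]
```

Each original rule of length k becomes k-1 binary rules, which is the "at most linear" expansion mentioned above.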
- page 251: cnf-form plays an important role in the computer
language Prolog.
- page 252: Two nodes in the graph are connected by an
edge if and only if (i) they are in different clauses and (ii) they
correspond to literals that can both be true at the same time.
- page 253: If there is an NP-complete problem, it must
be, in some sense, the hardest problem in NP. There is
no reason to believe such a problem exists. (Why can't
there be harder and harder problems in NP without end?)
What is remarkable is that many important problems
are NP-complete.
- page 255: There is a slight problem: It turns out that
the proof needs the Turing machine to be able to continue after
reaching q_accept.
It also requires that, once q_reject
is reached, the machine cannot continue on to q_accept. The first occurs
because the proof requires a filled tableau. The second
occurs because the proof only checks for the presence of q_accept in the tableau. To
avoid these problems, we can modify the definition of a Turing
machine so that it loops at both q_accept
and q_reject.
- page 255: Before reading the details of the various
subscripted phis, it may help to understand what each
one is designed to accomplish:
- phi_cell ensures that for each cell (i, j) exactly one element of
C is present.
- phi_start ensures that the first row of the tableau is the start
configuration of the Turing machine.
- phi_move ensures that each row of the tableau follows from the
preceding row by a legal move. This can be done
by creating a 2 by 3 window which is moved across a pair of rows,
checking that each window's contents are legal.
- phi_accept ensures that the accept state appears somewhere in
the tableau.
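For reference, phi_cell written out as a formula: for every tableau cell (i, j), at least one symbol s in C is on, and no two are on at once (indices follow Sipser's n^k-by-n^k tableau):

```latex
\phi_{\text{cell}} \;=\; \bigwedge_{1 \le i,\, j \le n^k}
  \Bigl[\, \Bigl(\bigvee_{s \in C} x_{i,j,s}\Bigr)
  \;\wedge\; \Bigl(\bigwedge_{\substack{s,\,t \in C \\ s \ne t}}
      \bigl(\overline{x_{i,j,s}} \vee \overline{x_{i,j,t}}\bigr)\Bigr) \Bigr]
```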
- End of Section 7.4: There are two natural questions
to ask at this point:
- Are there other NP-complete problems? Yes, hundreds
of them. Garey and Johnson (see Sipser's references) list
a lot, but many more have been discovered since their book was
published.
- We don't know if P=NP, so are there any interesting
problems that are Turing decidable that are known to not be in
P? Yes, an example is given on pages 313-314: Consider
regular expressions. Allow the shorthand R^k to
stand for R concatenated with itself k times. Determining
if two regular expressions written in this shorthand are equivalent
is not decidable in polynomial time.