Sipser's Text: In General
The focus of the quarter will be on various aspects of languages.
Each of the following classes of languages is strictly more general
than the one before. There are also languages which are not recursively
enumerable. The class of Turing-decidable languages
lies between the context-free and Turing-recognizable languages.
- Regular Languages (Chapter 1)
  Generated by: Regular expressions
  Recognized by: Finite automata
  Determinism: Nondeterminism does NOT add power
  Decidable: Yes (Chapter 4), and in polynomial time (Chapter 7)
  Relevance: Used in UNIX commands (e.g., grep)
- Context-Free Languages (Chapter 2)
  Generated by: Context-free grammars
  Recognized by: Pushdown automata
  Determinism: Nondeterminism DOES add power
  Decidable: Yes (Chapter 4), and in polynomial time (Chapter 7)
  Relevance: Used in computer language specification (compilation can be done in reasonable time).
- Recursively Enumerable (= Turing-recognizable) Languages (Chapter 3)
  Generated by: Enumerators (a type of Turing machine)
  Recognized by: Turing machines
  Determinism: Nondeterminism does NOT add power
  Decidable: No (Chapter 4)
  Relevance: Describes what it is possible to compute.
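Since the overview notes that regular expressions show up in UNIX tools like grep, here is a minimal Python sketch. (Python's `re` module supports more than the regular languages, but the pattern below uses only union, concatenation, and star, so it denotes a regular language.)

```python
import re

# Binary strings with an even number of 1s: the 1s come in pairs,
# with 0s allowed anywhere. Built from concatenation and star only.
even_ones = re.compile(r"(0*10*1)*0*")

assert even_ones.fullmatch("0110") is not None   # two 1s: accepted
assert even_ones.fullmatch("1") is None          # one 1: rejected
assert even_ones.fullmatch("") is not None       # zero 1s: accepted
```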
Specific Comments on Sipser's Text
- ACRONYMS:
This subject has a variety of acronyms. You need
to know what they refer to in order to understand the material.
I'll list the major ones at the start of each chapter.
- page xi: Read the section directed to you.
Chapter 0
- page 6: X is used for both Cartesian product
and multiplication, which are similar operations, but don't confuse
them! Cartesian product is used only for sets.
- page 12: A directed graph is also called a "digraph"
by some people.
- page 16: If you are not familiar with this terminology,
you may want to xerox the page and use it as a bookmark.
- pages 17-25: For more information on proving,
click on the link.
- page 23: The basis for induction is often
called the base case or base step
or induction base by other authors.
Chapter 1
- ACRONYMS:
DFA, NFA, GNFA
- page 38: Since M3 accepts precisely in the states
M2 does not, it accepts the complement of L(M2),
which is what the text gets.
- page 39: Recall from the top of page 4 that n
modulo k is the remainder when n is divided by
k.
- page 40: "Regular" is used at the top of
the page, but is not defined until the bottom of the page.
- page 40: Definition of "accepts": Note that,
if M and w are given, then 1 and 2 imply that r0=q0
and the remaining r's are determined by w and delta.
Hence there is no need to hunt for a sequence of r's
since only one sequence can possibly work. Thus, given w,
we determine r and check to see if rn is in F.
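The remark above says the run r0, ..., rn of a DFA is forced once M and w are given. A small sketch of that check (the state names and transition table below are hypothetical, not from the text):

```python
def dfa_accepts(delta, q0, F, w):
    """Run a DFA: the unique state sequence r0, r1, ..., rn
    is determined by w and delta (note for Sipser p. 40)."""
    r = q0                      # r0 = q0
    for a in w:
        r = delta[(r, a)]       # the next r is forced by delta
    return r in F               # accept iff rn is in F

# Hypothetical DFA over {0,1} accepting strings that end in 1.
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
assert dfa_accepts(delta, "q0", {"q1"}, "0101")
assert not dfa_accepts(delta, "q0", {"q1"}, "10")
```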
- page 48: With a DFA you can always proceed, but with
an NFA you can get stuck: there may be no edge out from a vertex
with the label you need. For example, if the input string
is 11..., we use Figure 1.14, and we go from q1 to q2
with the first 1, then we are stuck. (In #3 of the definition
on page 53, delta(q2,1) is the empty set.)
- page 49: It is convenient to say that q' can
be "reached" from q with input a if q'
is in delta(q,a).
- page 55: In #2, delta'(R,a) is
the set of states in Q that can be reached from some
r in R on some edge labeled a. (That's
what your fingers did.)
- page 56: The last paragraph of the proof, just above
the middle of the page, says "The construction of M
obviously works correctly." It is "obvious"
only after you clearly understand what the definition
does! Some of you might prefer a proof. The
next sentence in the paragraph is really the proof: If R
is the set of states the NFA can be in after some input string
w, then you should be able to convince yourself that delta'(R,a)
is the set of states the NFA can be in after input string wa.
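The subset construction in the two notes above can be sketched as follows; the NFA below (no epsilon edges, hypothetical states) is only for illustration:

```python
def delta_prime(delta, R, a):
    """delta'(R, a): all states reachable from some r in R on an
    edge labeled a (note for p. 55). A missing entry means no edge."""
    return frozenset(q for r in R for q in delta.get((r, a), ()))

def nfa_states_after(delta, q0, w):
    """The set of states the NFA can be in after input w (note for p. 56)."""
    R = frozenset([q0])
    for a in w:
        R = delta_prime(delta, R, a)
    return R

# Hypothetical NFA: on a 1, q1 may stay at q1 or move to q2.
delta = {("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"}}
assert nfa_states_after(delta, "q1", "1") == {"q1", "q2"}
assert nfa_states_after(delta, "q1", "11") == {"q1", "q2"}
```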
- page 63: Keep straight the two uses of "regular":
regular expressions are built by certain rules and lead
to a language;
regular languages are languages that are recognized by
DFAs.
In Theorem 1.28 it is proved that the confusion doesn't matter:
A language comes from a regular expression if and only if it
is a regular language.
- page 64: You can think of a regular expression as a
set of strings that is constructed by using the constructions
allowed in Definition 1.26.
- page 65: An inductive definition is also called a "recursive"
definition.
- page 66: Here's a comparison between languages and
arithmetic:
languages:  union | o (concatenation) | epsilon | empty set
arithmetic: +     | X (multiplication) | 1      | 0
- page 66: Because of Theorem 1.28 and the closure properties
of regular languages (pp. 58-63), we could define a "generalized
regular expression" that allowed more operations than the
definition on page 64 but still gave the same set of languages. For
example, we could allow intersection by the footnote on page
46, complementation by Exercise 1.10(a), and reversal by
Problem 1.24. For example, if R and S are regular
expressions, their intersection R^S is not a regular expression
but it describes a regular language. (Consequently, there
is a regular expression that defines the same language; however,
that regular expression is not R^S.)
- page 73: In an FA, delta tells us what edges
and edge labels are possible. It was convenient to specify
that in one way for DFAs and NFAs, but a different form is simpler
for GNFAs.
- page 78: The pigeonhole principle is only indirectly
related to pigeons. Old desks had a set of compartments
for storing papers and other items. They were referred
to as pigeonholes. The image of putting things into
such compartments led to the terminology "pigeonhole principle."
The desk compartments were probably named because they
reminded people of nesting boxes for pigeons.
- page 80: The proof at the top of the page involves
a couple of twists. First, it is a proof by contradiction.
Second, to show that the pumping lemma fails, we must negate
a statement with an "if...then" and quantifiers. The
lemma has the form
"If regular, then there exists p > 0 such that for all s
with |s| >= p there exists x, y, z such that for all i >= 0, ..."
It's easy to agree with the book about what the negation of this
is, but can you negate the lemma by yourself? If
you're not sure, you may want to see the discussion at the end
of the information on proofs.
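Written with explicit quantifiers (including the side conditions |y| > 0 and |xy| <= p from the full statement), the lemma's conclusion and its negation, which is what a non-regularity proof must establish, look like this:

```latex
% Conclusion of the pumping lemma for a language L:
\exists p \;\forall s \in L,\ |s| \ge p \;\;\exists x,y,z
  \;(s = xyz,\ |y| > 0,\ |xy| \le p)\;\;\forall i \ge 0:\; xy^i z \in L

% Its negation (swap each quantifier, negate the innermost statement):
\forall p \;\exists s \in L,\ |s| \ge p \;\;\forall x,y,z
  \;(s = xyz,\ |y| > 0,\ |xy| \le p)\;\;\exists i \ge 0:\; xy^i z \notin L
```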
- page 82: Example 1.42 can also be done by using reversal
(Problem 1.24).
- A variety of properties preserve regularity: Suppose
R and S are regular languages. Then so are their union,
intersection, and concatenation. The star and reversal
of R are also regular languages. You should look in
the text for proofs of these facts. Regularity is also
preserved under taking "substrings". If x, y,
and z are (possibly empty) strings, then y is a substring of
xyz. If x is empty, y is an initial substring. If
z is empty, y is an end substring. The substrings of R
form a regular language, the initial substrings of R form a regular
language, and the end substrings of R form a regular language.
Can you prove this?
Chapter 2
- ACRONYMS:
CFL, CFG, PDA
- NOTATION:
Unfortunately, the index is incomplete. For --->
(right arrow) and ==> (double right arrow), it only refers
to implication and for =*=> (double right arrow with an asterisk
over it), there is no entry.
- ---> is used to indicate a rule in a
grammar (p. 92, middle)
- ==> is used to indicate a step in a
derivation (p. 92, bottom)
- =*=> is used to indicate zero or more steps in a
derivation. Its first appearance seems to be on page 113,
without a definition.
- page 93: You will probably find it helpful to generate
other "sentences" for this grammar. Also, can
you find some reasonable English sentences that can be made with
the vocabulary that are not in the language?
- page 103: In 2 at the bottom of the page, the top of
the stack is on the left. Thus, in "at", the top of the
stack is a.
- page 106: The idea is to use a leftmost derivation
(p. 98). When a rule is used, any terminal symbols to the
left of the leftmost remaining variable are popped from the
stack and checked against the input.
- page 111: There's a point that is slid over in the
middle of the page. If we are attempting to write rules
for A_sub_pq based on the strings that are generated,
we might expect an infinite number of rules since most
CFLs contain an infinite number of strings. This would
be illegal since a CFG has only a finite number of rules
(Defn. 2.1.3 p.94). In the proof at the bottom of the page,
this problem disappears since we focus on the choices for a,
b, r, and s, of which there are a finite
number. At this point, you may find it helpful to construct
the CFG for the PDA in Figure 2.6 (p.104).
- page 117: One can show that A = {a^n b^n c^k | n and k nonnegative integers}
and B = {a^k b^n c^n | n and k nonnegative integers}
are CFLs by using the idea in Example 2.9 on page 104.
The intersection of these two languages is the language in Example 2.20.
Thus the intersection of CFLs may not be a CFL. In contrast,
the intersection of regular languages is regular (footnote page
46). Of course, some operations on CFLs always give CFLs.
(See Problem 2.15.)
Chapter 3
- ACRONYMS:
TM
- Although this chapter begins a new part of the text, it can
be viewed as a continuation of Part I since it introduces more powerful
automata. We've gone from DFA to NFA (no more powerful)
to PDA, and now to Turing machines. ("More powerful"
means able to recognize more languages.)
- page 128: In the definition of delta, the
symbol is written before moving the head. This can
be seen by studying the discussion in the first two paragraphs
on page 129.
- page 131: "4. Return the head to the left-hand
end of the tape." There is no way to tell when
the left-hand end of the tape has been reached. (In contrast,
one can detect the right-hand end of the written material because
of the blank character.) There are at least three ways
around this in general and one in particular:
- (general) Insist that all input strings must begin with a
special character, say $, to mark the start of the string. This
limitation on input is neither desirable nor elegant.
- (general) Figure out how to detect the beginning. See
the discussion on page 134 for Stage 2.
- (general) Figure out how to mark the beginning. This
can be done by copying each character one character to the right
and replacing the left-most character with $. This would
seem to require that a Turing machine have memory so that it
can remember a symbol to write it after moving.
(See the note concerning page 131.) You can give a
machine a finite memory---think of it as labeling the states
the way we did for DFAs on pages 42-44. With this idea,
one can build a machine that starts by remembering $ and then
repeats "remember the current symbol, write the previously
remembered symbol, and move right" until a blank is remembered.
Having done this, move left until $ is seen, then move right one
step. The machine is now at the start of the input and can begin
executing the "program" in Example 3.4. Thus, we
can always assume that the tape has a special symbol at its left-hand
end.
An alternative method for marking the beginning is to double
the size of the input alphabet. The added characters
are the same as the originals except that each has an added mark,
say x' instead of x. The first thing the machine
does is replace the initial symbol a on the tape with
a'. The machine transitions must then allow for
the possibility of primed symbols when it may be at the
beginning of the tape.
- (particular) If the leftmost symbol on the tape is not needed,
it can be overwritten with a special character. In
this example, the blank character is used.
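The "remember the current symbol, write the previously remembered symbol, move right" routine described above can be sketched in a few lines. This is a Python model of the tape contents, not a Turing machine; the blank is modeled as a space character:

```python
def mark_left_end(tape, blank=" "):
    """Shift every tape symbol one cell right and write $ in cell 0,
    following the copy-right routine in the note for Sipser p. 131."""
    out = []
    remembered = "$"
    i = 0
    while remembered != blank:
        current = tape[i] if i < len(tape) else blank
        out.append(remembered)   # write the previously remembered symbol
        remembered = current     # remember the current symbol
        i += 1                   # move right
    return "".join(out)

assert mark_left_end("abc") == "$abc"
assert mark_left_end("") == "$"
```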
- page 133: The Turing machine is missing q_reject. It is reached from q_14 if the symbol on the tape is #,
0, or 1. Other missing edges also go to the reject
state.
- finite memory: In various places, e.g. the discussion
of Stage 2 on page 134, we want a Turing machine to have a (finite)
memory. Here is a way to do this. Let M be
the possible memory states, including empty (same as don't care). Replace
each state q of the machine except the start, accept, and
reject states with the set of states {q} x M. The new
machine has a transition from (q, m) to (q', m')
when we want to move from q to q' and change the
memory contents from m to m'. (Some
of the new states may be impossible to reach. They can
simply be dropped from the machine.)
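A sketch of the state-set part of that product construction (the state and memory names below are hypothetical):

```python
def add_memory(states, memory, special=("start", "accept", "reject")):
    """Replace each ordinary state q with the pairs (q, m) for m in M,
    per the finite-memory note above; special states are kept as-is.
    Unreachable pairs can be pruned afterwards."""
    new_states = []
    for q in states:
        if q in special:
            new_states.append(q)
        else:
            new_states.extend((q, m) for m in memory)
    return new_states

# Hypothetical machine with two ordinary states and a 4-value memory.
states = ["start", "q1", "q2", "accept", "reject"]
memory = ["empty", "$", "0", "1"]
assert len(add_memory(states, memory)) == 3 + 2 * 4
```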
- Section 3.2: This section shows the extraordinary
robustness of the Turing machine concept; that is, many modifications
fail to increase the languages Turing machines recognize. Compare
this to DFAs: Adding nondeterminism does not give
DFAs more power, but adding a stack does. PDAs are
rather nonrobust: Removing nondeterminism reduces their
power (not proved in text---just a remark near the bottom of
page 102). Adding a second stack increases their power;
for example, a two stack machine can recognize the language
{0^n 1^n 2^n | n > 0},
but a one stack machine cannot. Turing machine robustness
lends strength to the Church-Turing thesis (page 143).
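A sketch of how two stacks handle {0^n 1^n 2^n | n > 0}: the finite control tracks which block of input we are in, stack one checks the 0s against the 1s, and stack two checks the 0s against the 2s. This is a Python model, not a formal two-stack PDA:

```python
def two_stack_accepts(w):
    """Two-stack machine for {0^n 1^n 2^n | n > 0}: push each 0 on both
    stacks, pop stack one for each 1, pop stack two for each 2, and
    accept iff the blocks come in order and both stacks end empty."""
    s1, s2 = [], []
    phase = 0                      # 0 -> reading 0s, 1 -> 1s, 2 -> 2s
    for c in w:
        d = int(c)
        if d < phase:              # blocks out of order
            return False
        phase = d
        if d == 0:
            s1.append("0"); s2.append("0")
        elif d == 1:
            if not s1: return False
            s1.pop()
        else:
            if not s2: return False
            s2.pop()
    return phase == 2 and not s1 and not s2

assert two_stack_accepts("001122")
assert not two_stack_accepts("0012")
assert not two_stack_accepts("")
```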
- page 136: The "simulation" idea mentioned
in the middle of the page is not new:
- To prove Theorem 1.19 (p. 55) we showed that a DFA could
simulate an NFA.
- To prove Lemma 2.13 (p. 107) we showed that a PDA could simulate
the derivation process of a CFG.
- page 137: Step 3 requires some recopying, which
can be done in the manner described in the note for page 131.
- page 138: Think about the following question: Why can't
Theorem 3.8 be proved by some simple variation of the proof for
Theorem 1.19 (p. 55)?
- page 141: The proof of Theorem 3.13 is rather brief.
For example, how can M run E? Essentially,
this involves embedding most of the machine for E in M.
We did something like this on pages 59-62, but the details were
given much more explicitly.
- Problem 3.9: A TM can simulate a multistack PDA.
Part b of this problem shows that a 2-stack PDA can simulate
a TM, so they have equivalent power. Part a shows
that 2-stack PDAs are more powerful than 1-stack PDAs. Hence
we have the following:
DFA = NFA < 1-stack PDA < n-stack PDA = 1-tape TM = nondeterministic, n-tape, etc., TM
Here n > 1, and the equalities and inequalities refer to power,
so A < B means B is more powerful than A.
- Running Forever: Some of our machines could
run forever. Here are the ones we've looked at:
DFA: Must stop because an input is read at each step.
NFA: Could run forever by following a cycle of epsilon edges.
PDA: Like an NFA, could run forever.
Turing machine: Could run forever reading and writing on the tape;
e.g., a Turing machine with input n can replace it on the tape
with n+1, so let this machine start again after it writes n+1.
It will just keep "counting".
Although they can run forever, we can replace NFAs and PDAs with
equivalent devices which do not:
- We showed that a NFA can be replaced by a DFA.
- A PDA can be replaced by a CFG which can be rewritten in
Chomsky normal form (page 99). In Chomsky normal
form, a derivation of a string of length n uses exactly
n-1 rules of the form A --> BC and n rules of the
form A --> a. Hence for input of length n, exactly
2n-1 rules are used unless n=0, in which case one rule is used.
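The step count for a Chomsky-normal-form derivation works out as follows: n-1 applications of binary rules A -> BC grow the sentential form to n variables, and then n applications of terminal rules A -> a finish the string.

```python
def cnf_steps(n):
    """Derivation length for a string of length n >= 1 in a
    Chomsky-normal-form grammar: (n-1) binary-rule steps
    plus n terminal-rule steps."""
    return (n - 1) + n

# Hypothetical CNF grammar S -> AS | a, A -> a:
# S => AS => AAS => aAS => aaS => aaa is 2 binary + 3 terminal steps.
assert cnf_steps(3) == 5
assert cnf_steps(1) == 1    # a single rule A -> a
```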
- The Turing machine situation is more difficult and will be
discussed in Section 4.2.
- Impossible Problems: In the past couple of centuries,
mathematics has developed tools for showing that problems are
impossible.
- Perhaps the earliest such problems go back to the Greeks:
Using the traditional tools of geometric construction (straight-edge
and compass), double the cube, trisect the angle, and square
the circle. Doubling the cube requires constructing the
cube root of 2 and trisecting the angle involves solving a cubic
equation. Galois theory (Math 100) can be used to
show that these cannot be done because all geometric construction
allows you to do is add, subtract, multiply, divide, and take
square roots. Squaring the circle requires constructing
pi. About a century ago Lindemann showed that pi is
"transcendental," which makes this problem impossible,
too.
- A problem dating to the Renaissance is the question of finding
a method for obtaining the roots of a general 5th degree polynomial. A
"method" means an algorithm where the only arithmetic
operations allowed are addition, subtraction, multiplication,
division, and taking roots. A formula for quadratics (requiring
square roots) had been known for a long time. Italian
mathematicians dealt with 3rd and 4th degree polynomials (which
required square, cube, and fourth roots). Galois theory
can be used to show that there cannot be any such method.
By the way, solving 3rd degree (cubic) polynomials motivated
the introduction of imaginary numbers.
- Around the turn of the century, mathematical logicians were
concerned with the problem of providing a secure basis for mathematics.
(Various paradoxes had arisen. For example, in set
theory, if S is the set of all sets that do not contain themselves,
does S contain itself? This is akin to the problem
of the village's adult male barber who shaves every adult male
villager who does not shave himself.) Gödel proved
that it was impossible to find a provably secure basis.
- Turing machines provide the tool for proving the nonexistence
of algorithms. The problem of finding an algorithm for
Diophantine problems (integer solutions of polynomial equations)
is discussed in Section 3.3. Section 4.2 and Chapter 5
discuss other problems for which algorithms do not exist.
(They are called "undecidable problems.")
- The most famous unsolved question about impossibility is
P=NP (page 247). It is impossible to find polynomial time
algorithms for a variety of important problems if and only if
P is strictly smaller than NP.
Chapter 4
- ACRONYMS:
A-sub-X, E-sub-X, EQ-sub-X where X is
some language, automaton, or grammar.
- ENCODING:
In the chapter, we often have one machine whose input may involve
a string in an arbitrary alphabet. Thus it seems that the machine's
tape alphabet must be infinite to allow for all possibilities.
Not so; what we call something is arbitrary. A
finite set can be ordered in an arbitrary manner and its members
encoded as s1, s2, s3, and so on. This only requires an
eleven-symbol alphabet: {s,0,1,2,3,4,5,6,7,8,9}.
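The eleven-symbol encoding described above, as a small sketch (the set members below are hypothetical):

```python
def encode(members):
    """Number the members of an arbitrary finite set and encode each
    as "s" followed by its index in decimal, so every member is a
    string over the 11-symbol alphabet {s,0,1,2,3,4,5,6,7,8,9}."""
    return {m: "s" + str(i + 1) for i, m in enumerate(members)}

# Hypothetical alphabet of a simulated machine:
assert encode(["apple", "pear"]) == {"apple": "s1", "pear": "s2"}
```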
- page 152: For A_DFA,
there is the problem of encoding arbitrary finite sets, discussed
in the preceding remark, and the problem of encoding a machine
description. The latter is handled much like the graph
on page 146. For example, the DFA in Figure 1.4 on
page 34 could be encoded as
(q1,q2,q3) (0,1)
((q1,0,q1),(q1,1,q2),(q2,0,q3),(q2,1,q2),(q3,0,q2),(q3,1,q2))
q1 (q2)
where the five parts of the DFA definition (page 35) are
separated by spaces for clarity.
- page 153: The reduction of an NFA to a DFA requires
an algorithm for computing E(R) on page 56. The
idea is simple: start with E(R) = R and all states
unmarked. As long as E(R) contains an unmarked
q, mark it and add to E(R) every q' for which
there is an epsilon edge (q, q').
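The marking algorithm just described can be sketched directly (the epsilon edges below are hypothetical):

```python
def epsilon_closure(eps_edges, R):
    """E(R): all states reachable from R along epsilon edges,
    by the marking algorithm in the note above (Sipser p. 56)."""
    closure = set(R)
    unmarked = list(R)
    while unmarked:
        q = unmarked.pop()                 # mark q
        for q2 in eps_edges.get(q, ()):    # follow each epsilon edge
            if q2 not in closure:
                closure.add(q2)
                unmarked.append(q2)
    return closure

# Hypothetical epsilon edges: q1 --eps--> q2 --eps--> q3
eps = {"q1": ["q2"], "q2": ["q3"]}
assert epsilon_closure(eps, {"q1"}) == {"q1", "q2", "q3"}
assert epsilon_closure(eps, {"q3"}) == {"q3"}
```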
- page 158: You may wonder what the difference is between
Theorem 4.6 and Theorem 4.8. The former requires
a machine that decides membership for any CFL while the
latter only requires a machine that does so for a particular
(but arbitrary) CFL. Hence Theorem 4.6 is stronger.
- page 159: All containments in the figure are proper:
- We already know not all context-free languages are regular (Examples
1.38 and 2.9).
- Not all decidable languages are context-free. (Example 2.20 is decidable.)
- Section 4.2 shows that not all Turing-recognizable languages are decidable.
- page 161: An alternative to Definition 4.10 is:
1. If there is a one-to-one
map f : A --> B, we say that B
is at least as big as A.
2. If A is at
least as big as B and B is at least as big as A,
we say that A and B are the same size.
It can be shown that this definition of "same size"
is equivalent to the one in the text, but I will not do so.
- page 163: A similar diagonal argument shows that, for
any set S, the set of subsets of S is bigger than S.
- Halting problem: If you think that this doesn't show
that undecidability may be a common occurrence, how about a simple
solitaire game: a set of "dominoes" is generated
randomly and you are supposed to do something with it.
See Section 5.2 for an explanation of the game. You might
think that the halting problem could be solved if we simply had
a great store of algorithms for answering the question and were
clever enough to choose the right algorithm for the particular
TM that is being given as input. It would require
an infinite supply of algorithms since choosing among a finite
set can be done by a nondeterministic TM. (You should
be able to see why this is so.)
Chapter 7
- Notation: If you have trouble understanding big-oh
and little-oh notation, you might look at another book; for example,
Concrete Mathematics by Graham, Knuth, and Patashnik.
- page 227: We could define f(n) = O(g(n)) by "f(n)/g(n)
is bounded". Some people define O(g(n))
to be the set of all functions f(n)
such that f(n)/g(n) is bounded.
In that case, equality is replaced by set membership.
- page 228: Little-oh can be treated the same way big-oh
was treated in the previous remark. Thus o(g(n))
would be all functions f(n) such that f(n)/g(n)
goes to 0 as n goes to infinity.
- page 229: Note that, if s(n) = O(t(n)),
then TIME(s(n))
is a subset of TIME(t(n)).
- page 239: The proof has a gap: We need to know how
long each step in algorithm E takes. Step 3
clearly takes an amount of time that is roughly proportional
to the number of digits in x and y and so is O(input size). How
long does division take? Recall the standard algorithm
for division. It obtains digits in the quotient one by
one. The number of digits in the quotient is O(input size). The
amount of work to find a single digit is also O(input size),
which can be seen by trying all possible values of the digit
and noting that multiplying a number by a digit requires O(input size)
work.
- page 240: Be careful with Theorem 7.14. It
states that given a CFL, say L, the language L is in P.
In other words, there is a Turing machine, say TM_L such that
TM_L decides L in polynomial time. This is not
the same as saying that A_CFG is in P because this requires one
Turing machine for all CFGs. Suppose we are given a CFG.
We could build a Turing machine to convert it to Chomsky normal
form and then use the algorithm in the proof of Theorem 7.14
to see if the CFG can produce the string w. How can
the TM convert the grammar to Chomsky normal form? If it
uses the method in the proof of Theorem 2.6 (page 99), the
new Chomsky normal form grammar could be exponentially larger than the
original grammar. This can happen because a rule containing
k copies of A in its right side becomes 2^k rules
when we eliminate the rule A-->(epsilon). Prof. Sam
Buss has indicated how to avoid this. First change
all rules so that they have at most two symbols on their right
side, then proceed as in the proof of Theorem 2.6. To illustrate
the conversion, the rule A --> BCDE would be replaced
by the rules A --> BX, X --> CY, and
Y --> DE, where X and Y are new variable names. This
expands the grammar by an amount that is at most linear in the
number of symbols needed to express the original grammar.
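The rule-splitting step attributed to Prof. Buss above (A --> BCDE becomes A --> BX, X --> CY, Y --> DE) can be sketched as follows; the `fresh` iterator supplying new variable names is a hypothetical detail:

```python
def binarize(head, body, fresh):
    """Replace a rule head -> B1 B2 ... Bk (k > 2) with chained binary
    rules: head -> B1 X, X -> B2 Y, ..., as described in the note above.
    `fresh` yields unused variable names."""
    rules = []
    left = head
    while len(body) > 2:
        x = next(fresh)
        rules.append((left, (body[0], x)))
        left, body = x, body[1:]
    rules.append((left, tuple(body)))
    return rules

fresh = iter(["X", "Y", "Z"])
assert binarize("A", ["B", "C", "D", "E"], fresh) == [
    ("A", ("B", "X")), ("X", ("C", "Y")), ("Y", ("D", "E")),
]
```

Each original rule of length k becomes k-1 binary rules, which is the "at most linear" expansion mentioned above.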
- page 251: cnf-form plays an important role in the computer
language Prolog.
- page 252: Two nodes in the graph are connected by an
edge if and only if (i) they are in different clauses and (ii) they
correspond to literals that can both be true at the same time.
- page 253: If there is an NP-complete problem, it must
be, in some sense, the hardest problem in NP. There is
no reason to believe such a problem exists. (Why can't
there be harder and harder problems in NP without end?)
What is remarkable is that many important problems
are NP-complete.
- page 255: There is a slight problem: It turns out that
the proof needs the Turing machine to be able to continue after
reaching q_accept.
It also requires that, once q_reject
is reached, the machine cannot continue on to q_accept. The first occurs
because the proof requires a filled tableau. The second
occurs because the proof only checks for the presence of q_accept in the tableau. To
avoid these problems, we can modify the definition of a Turing
machine so that it loops at both q_accept
and q_reject.
- page 255: Before reading the details of the various
subscripted phis, it may help to understand what each
one is designed to accomplish:
- phi_cell ensures that for each cell (i, j) exactly one element of
C is present.
- phi_start ensures that the first row of the tableau is the start
configuration of the Turing machine.
- phi_move ensures that each row of the tableau follows from the
preceding row by a legal move. This can be done
by creating a 2 by 3 window which is moved across a pair of rows,
checking that each window's contents are legal.
- phi_accept ensures that the accept state appears somewhere in
the tableau.
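For reference, phi_cell written out as a formula: for every tableau cell (i, j), at least one symbol s in C is on, and no two are on at once (indices follow Sipser's n^k-by-n^k tableau):

```latex
\phi_{\text{cell}} \;=\; \bigwedge_{1 \le i,\, j \le n^k}
  \Bigl[\, \Bigl(\bigvee_{s \in C} x_{i,j,s}\Bigr)
  \;\wedge\; \Bigl(\bigwedge_{\substack{s,\,t \in C \\ s \ne t}}
      \bigl(\overline{x_{i,j,s}} \vee \overline{x_{i,j,t}}\bigr)\Bigr) \Bigr]
```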
- End of Section 7.4: There are two natural questions
to ask at this point:
- Are there other NP-complete problems? Yes, hundreds
of them. Garey and Johnson (see Sipser's references) list
a lot, but many more have been discovered since their book was
published.
- We don't know if P=NP, so are there any interesting
problems that are Turing decidable that are known to not be in
P? Yes, an example is given on pages 313-314: Consider
regular expressions. Allow the shorthand R^k to
stand for R concatenated with itself k times. Determining
if two regular expressions written in this shorthand are equivalent
is not decidable in polynomial time.