Discrete Mathematics - MGNet



Discrete Mathematics

University of Kentucky CS 275

Spring, 2007

Professor Craig C. Douglas



Material Covered (Spring 2007)

|Tuesday |Pages |Thursday |Pages |

| | |1/11 |1-9 |

|1/16 |9-24 |1/18 |24-33 |

|1/23 |34-45 |1/25 |46-52 |

|1/30 |53-65 |2/1 |Exam 1 |

|2/6 |66-73 |2/8 |74-83 |

|2/13 |84-92 |2/15 |92-94 |

|2/20 |95-106 |2/22 |106-115 |

|2/27 |116-124 |3/1 |Exam 2 |

|3/6 |125-132 |3/8 |No class |

|3/13 |Spring |3/15 |Break |

|3/20 |132-142 |3/22 |No class |

|3/26 |142-156 |3/28 |Exam 3 |

|4/3 |157-169 |4/5 |170-177 |

|4/10 |178-185 |4/12 |186-197 |

|4/17 |198-210 |4/19 |Exam 4 |

|4/24 |211-217 |4/26 |Rama: review |

|5/1 |No class |5/3 |Final: 8-10 AM |

The final exam will cover Chapters 1-10.

Course Outline

1. Logic Principles

2. Sets, Functions, Sequences, and Sums

3. Algorithms, Integers, and Matrices

4. Induction and Recursion

5. Simple Counting Principles

6. Discrete Probability

7. Advanced Counting Principles

8. Relations

9. Graphs

10. Trees

11. Boolean Algebra

12. Modeling Computation

Logic Principles

Basic values: T or F representing true or false, respectively. In a computer T an F may be represented by 1 or 0 bits.

Basic items:

• Propositions

o Logic and Equivalences

• Truth tables

• Predicates

• Quantifiers

• Rules of Inference

• Proofs

o Concrete, outlines, hand waving, and false

Definition: A proposition is a statement of a true or false fact (but not both).

Examples:

• 2+2 = 4 is a proposition because this is a fact.

• x+1 = 2 is not a proposition unless a specific value of x is stated.

Definition: The negation of a proposition p, denoted by ¬p and pronounced not p, means that, “it is not the case that p.” The truth values for ¬p are the opposite for p.

Examples:

• p: Today is Thursay, ¬p: Today is not Thursday.

• p: At least a foot of snow falls in Boulder on Fridays. ¬p: Less than a foot of snow falls in Boulder on Fridays.

Definition: The conjunction of propositions p and q, denoted p(q, is true if both p and q are true, otherwise false.

Definition: The disjunction of propositions p and q, denoted p(q, is true if either p or q is true, otherwise false.

Definition: The exclusive or of propositions p and q, denoted p(q, is true if only one of p and q is true, otherwise false.

Truth tables:

|p |¬p |q |p(q |p(q |p(q |

|T |F |T |T |T |F |

|T * |F * |F |F |T |T |

|F * |T * |T |F |T |T |

|F |T |F |F |F |F |

* The truth table for p and ¬p is really a 2(2 table.

Concepts so far can be extended to Boolean variables and Bit strings.

Definition: A bit is a binary digit. Hence, it has two possible values: 0 and 1.

Definition: A bit string is a sequence of zero or more bits. The length of a bit string is the number of bits.

Definition: The bitwise operators OR, AND, and XOR are defined based on (, (, and (, bit by bit in a bit string.

Examples:

• 010111 is a bit string of length 6

• 010111 OR 110000 = 110111

• 010111 AND 110000 = 010000

• 010111 XOR 110000 = 100111

Definition: The conditional statement is an implication, denoted p(q, and is false when p is true and q is false, otherwise it is true. In this case p is known as a hypothesis (or antecedent or premise) and q is known as the conclusion (or consequence).

Definition: The biconditional statement is a bi-implication, denoted p(q, and is true if and only if p and q have the same truth table values.

Truth tables:

|p |q |p(q |p(q |

|T |T |T |T |

|T |F |F |F |

|F |T |T |F |

|F |F |T |T |

We can compound logical operators to make complicated propositions. In general, using parentheses makes the expressions clearer, even though more symbols are used. However, there is a well defined operator precedence accepted in the field. Lower numbered operators take precedence over higher numbered operators.

|Operator |Precedence |

|¬ |1 |

|( |2 |

|( |3 |

|( |4 |

|( |5 |

Examples:

• ¬p(q = (¬p) (q

• p(q(r = (p(q) (r

Definition: A compound proposition that is always true is a tautology. One that is always false is a contradiction. One that is neither is a contingency.

Example:

|p |¬p |p(¬p |p(¬p |

|T |F |F |T |

|F |T |F |T |

|contigencies |contradiction |tautology |

Definition: Compound propositions p and q are logically equivalent if p(q is a tautology and is denoted p(q (sometimes written as p(q instead).

Theorem: ¬(p(q) ( ¬p ( ¬q.

Proof: Construct a truth table.

|p |q |¬(p(q) |¬p |¬q |¬p(¬q |

|T |T |F |F |F |F |

|T |F |F |F |T |F |

|F |T |F |T |F |F |

|F |F |T |T |T |T |

qed

Theorem: ¬(p(q) ( ¬p ( ¬q.

Proof: Construct a truth table similar to the previous theorem.

These two theorems are known as DeMorgan’s laws and can be extended to any number of propositions:

¬(p1(p2(…(pk) ( ¬ p1 ( ¬ p2 ( … ( ¬ pk

¬(p1(p2(…(pk) ( ¬ p1 ( ¬ p2 ( … ( ¬ pk

Theorem: p(q ( ¬p(q.

Proof: Construct a truth table.

|p |q |p(q |¬p |¬p(q |

|T |T |T |F |T |

|T |F |F |F |F |

|F |T |T |T |T |

|F |F |T |T |T |

qed

These proofs are examples are concrete ones that are proven using an exhaustive search of all possibilities. As the number of propositions grows, the number of possibilities grows like 2k for k propositions.

The distributive laws are an example when k=3.

Theorem: p( (q(r) ( (p(q)((p(r).

Proof: Construct a truth table.

|p |q |r |p( (q(r) |p(q |p(r |(p(q)((p(r) |

|T |T |T |T |T |T |T |

|T |T |F |T |T |T |T |

|T |F |T |T |T |T |T |

|T |F |F |T |T |T |T |

|F |T |T |T |T |T |T |

|F |T |F |F |T |F |F |

|F |F |T |F |F |T |F |

|F |F |F |F |F |F |F |

qed

Theorem: p( (q(r) ( (p(q) ( (p(r).

Proof: Construct a truth table similar to the previous theorem.

Some well known logical equivalences includes the following laws:

|( |Law |

|p(T(p |Identity |

|p(F(p | |

|p(T(T |Domination |

|p(F(F | |

|p(p(p |Idempotent |

|p(p(p | |

|¬(¬p) (p |Double negation |

|p(¬p (T |Negation |

|p(¬p (F | |

|p(q(q(p |Commutative |

|p(q(q(p | |

|(p(q)(r( p((q(r) |Associative |

|(p(q) (r( p((q(r) | |

|p((q(r) ((p(q)((q(r) |Distributive |

|p((q(r) ((p(q)((q(r) | |

|¬(p(q) ( ¬p(¬q |DeMorgan |

|¬(p(q) ( ¬p(¬q | |

|p((p(q)(p |Absorption |

|p((p(q)(p | |

All of these laws can be proven concretely using truth tables. It is a good exercise to see if you can prove some.

Well known logical equivalences involving conditional statements:

p(q ( ¬p(q

p(q ( ¬q(¬p

p(q ( ¬p(q

p(q ( ¬(p(¬q)

¬(p(q) ( p(¬q

(p(q)((p(r) ( p((q(r)

(p(r)((q(r) ( (p(q)(r

(p(q)((p(r) ( p((q(r)

(p(r)((q(r) ( (p(q)(r

Well known logical equivalences involving biconditional statements:

p(q ( (p(q)((q(p)

p(q ( ¬p(¬q

p(q ( (p(q) ( (¬p(¬q)

¬(p(q) ( p(¬q

Propositional logic is pretty limited. Almost anything you really are interested in requires a more sophisticated form of logic: predicate logic with quantifiers (or predicate calculus).

Definition: P(x) is a propositional function when a specific value x is substituted for the expression in P(x) gives us a proposition. The part of the expression referring to x is known as the predicate.

Examples:

• P(x): x > 24. P(2) = F, P(102) = T.

• P(x): x = y + 1. P(x) = T for one value only (y is an unbounded variable).

• P(x,y): x = y + 1. P(2,1) = T, P(102,-14) = F.

Definition: A statement of the form P(x1,x2,…,xn) is the value of the propositional function P at the n-tuple (x1,x2,…,xn). P is also known as a n-place (or n-ary) predicate.

Definition: The universal quantification of P(x) is the statement P(x) is true for all values of x in some domain, denoted by (x P(x).

Definition: The existential quantification of P(x) is the statement P(x) is true for at least one value of x in some domain, denoted by (x P(x).

Definition: The uniqueness quantification of P(x) is the statement P(x) is true for exactly one value of x in some domain, denoted by (!x P(x).

There is an infinite number of quantifiers that can be constructed, but the three above are among the most important and common.

Examples: Assume x belongs to the real numbers.

• (x 0). The negative real numbers form the domain.

• (!x (x1223 = 0).

( and ( have higher precedence than the logical operators.

Example: (x P(x)(Q(x) means ((x P(x))(Q(x).

Definition: When a variable is used in a quantification, it is said to be bound. Otherwise the variable is free.

Example: (x (x = y + 1).

Definition: Statements involving predicates and quantifiers are logically equivalent if and only if they have the same truth value independent of which predicates are substituted and in which domains are used. Notation: S ( T.

DeMorgan’s Laws for Negation:

• ¬(x P(x) ( (x ¬P(x).

• ¬(x P(x) ( (x ¬P(x).

Nested quantifiers just means that more than one is in a statement. The order of quantifiers is important.

Examples: Assume x and y belong to the real numbers.

• (x(y (x + y = 0).

• (x(y (x < 0) ( (y > 0) ( xy < 0.

Quantification of two variables:

|Statement |When True? |When False? |

|(x(y P(x,y) |For all x and y, P(x,y)=T. |There is a pair of x and y such that P(x,y)=F. |

|(x(y P(x,y) |For all x there is a y such that P(x,y)=T |There is an x such that for all y, P(x,y)=F. |

|(x(y P(x,y) |There is an x such that for all y, P(x,y)=T. |For all x there is a y such that P(x,y)=F. |

|(x(y P(x,y) |There is a pair x and y such that P(x,y)=T. |For all x and y, P(x,y)=F. |

Rules of Inference are used instead of truth tables in many instances. For n variables, there are 2n rows in a truth table, which gets out of hand quickly.

Definition: A propositional logic argument is a sequence of propositions. The last proposition is the conclusion. The earlier ones are the premises. An argument is valid if the truth of the premises implies the truth of the conclusion.

Definition: A propositional logic argument form is a sequence of compound propositions involving propositional variables. An argument form is valid if no matter what particular propositions are substituted for the proposition variables in its premises, the conclusion remains true if the premises are all true.

Translation: An argument form with premises p1, p2, …, pn and conclusion q is valid when (p1(p2(…(pn) ( q is a tautology.

There are eight basic rules of inference.

|Rule |Tautology |Name |

|p |[p(( p(q)] ( q |Modus ponens |

|p(q | | |

|(q | | |

|¬q |[¬q((p(q)] ( ¬p |Modus tollens |

|p(q | | |

|(¬p | | |

|p(q |[(p(q)((q(r)] ( (p(r) |Hypothetical syllogism |

|q(r | | |

|( p(r | | |

|p(q |[(p(q)(¬p] ( q |Disjunctive syllogism |

|¬p | | |

|(q | | |

|p |p ( (p(q) |Addition |

|(p(q | | |

|Rule |Tautology |Name |

|p(q |(p(q) ( p |Simplification |

|(p | | |

|p |[(p)((q)] ( (p(q) |Conjunction |

|q | | |

|(p(q | | |

|p(q |[(p(q)((¬p(r)]( (q(r) |Resolution |

|¬p(r | | |

|(q(r | | |

Rules of Inference for Quantified Statements:

|Rule of Inference |Name |

|(x P(x) |Universal instantiation |

|(P(c) | |

|P(c) for an arbitrary c |Universal generalization |

|((x P(x) | |

|(x (P(x) ( Q(x)) |Universal modus ponens |

|P(a), where a is a particular element in the domain | |

|(Q(a) | |

|(x (P(x) ( Q(x)) |Universal modus tollens |

|¬Q(a), where a is a particular element in the domain | |

|(¬P(a) | |

|(x P(x) |Existential instantiation |

|(P(c) for some c | |

|P(c) for some c |Existential generalization |

|((x P(x) | |

Sets, Functions, Sequences, and Sums

Definition: A set is a collection of unordered elements.

Examples:

• Z = {…, -3, -2, -1, 0, 1, 2, 3, …}

• N = {1, 2, 3, …} and ( = N0 = {0, 1, 2, 3, …} (Slightly different than text)

• Q = {p/q | p,q(Z, q(0}

• R = {reals}

Definition: The cardinality of a set S is denoted |S|. If |S| = n, where n(Z, then the set S is a finite set. Otherwise it is an infinite set (|S| = ().

Example: The cardinality of of Z, N, N0, Q, and R is infinite.

Definition: If |S| = |N|, then S is a countable set. Otherwise it is an uncountable set.

Examples:

• Q is countable.

• R is uncountable.

Definition: Two sets S and T are equal, denoted S = T, if and only if (x(x(S ( x(T).

Examples:

• Let S = {0, 1, 2} and T = {2, 0, 1}. Then S = T. Order does not count.

• Let S = {0, 1, 2} and T = {0, 1, 3}. Then S ( T. Only the elements count.

Definition: The empty set is denoted by (. Note that (S(((S).

Definition: A set S is a subset of a set T if (x(S(x(T) and is denoted S(T. S is a proper subset of T if S(T, but S(T and is denoted S(T.

Example: S = {1, 0} and T = {0, 1, 2}. Then S(T.

Theorem: (S(S(S).

Proof: By definition, (x(S(x(S).

[pic]

Definition: The Power Set of a set S, denoted P(S), is the set of all possible subsets of S.

Theorem: If |S| = n, then |P(S)| = 2n.

Example: S = {0, 1}. Then P(S) = {(, {0}, {1}, {0,1}}

Definition: The Cartesian product of n sets Ai is defined by ordered elements from the Ai and is denoted A1(A2(…(An = {(a1,a2,…an) | ai(Ai}.

Example: Let S = {0, 1} and T = {a, b}. Then S(T = {(0,a), (0,b), (1,a), (1,b)}.

Definition: The union of n sets Ai is defined by

[pic]= A1(A2(…(An = {x | (i x(Ai}.

Definition: The intersection of n sets Ai is defined by

[pic] = A1(A2(…(An = {x | (i x(Ai

[pic] [pic]

Definition: n sets Ai are disjoint if A1(A2(…(An = (.

Definition: The complement of set S with respect to T, denoted T(S, is defined by T(S = {x(T | x(S}. T(S is also called the difference of S and T.

Definitions: The universal set is denoted U. The universal complement of S is

[pic] = U(S.

Examples:

• Let S = {1, 0} and T = {0, 1, 2}. Then

o S(T.

o S(T = S.

o S(T = T.

o T(S = {2}.

o Let U = N0. [pic]= {2, 3, …}

• Let S = {0, 1} and T = {2, 3}. Then

o S(T.

o S(T = (.

o S(T = {0, 1, 2, 3}.

o T(S = {2, 3}.

o Let U=R. Then [pic] is the set of all reals except the integers 0 and 1, i.e., [pic] = {x(R | x(0 ( x(1}.

The textbook has a large number of set identities in a table.

|Identity |Law(s) |

|A(( = A, A(U = A |Identity |

|A(U = U, A(( = ( |Domination |

|A(A = A, A(A = A |Idempotent |

|[pic] = A |Complementation |

|A(B = B(A, A(B = B(A |Commutative |

|A((B(C) = (A(B)(C, A( (B(C) = (A(B) (C |Associative |

|A( (B(C) = (A(B) ( (A(C) |Distributive |

|A((B(C) = (A(B) ( (A(C) | |

|[pic], [pic] |DeMorgan |

|A( (A(B) = A, A( (A(B) = A |Absorption |

|[pic] |Complement |

Many of these are simple to prove from very basic laws.

Definition: A function f:A(B maps a set A to a set B, denoted f(a) = b for a(A and b(B, where the mapping (or transformation) is unique.

Definition: If f:A(B, then

• If (b(B (a(A (f(a) = b), then f is a surjective function or onto.

• If A=B and f(a) = f(b) implies a = b, then f is one-to-one (1-1) or injective.

• A function f is a bijection or a one-to-one correspondence if it is 1-1 and onto.

Definition: Let f:A(B. A is the domain of f. The minimal set B such that f:A(B is onto is the image of f.

Definitions: Some compound functions include

• [pic]. We can substitute + if we expand the summation.

• [pic]. We can substitute * if we expand the product.

Definition: The composition of n functions fi: Ai(Ai+1 is defined by

(f1(f2(…(fn)(a) = f1(f2(…(fn(a)…)),

where a(A1.

Definition: If f: A(B, then the inverse of f, denoted f-1: B(A exists if and only if (b(B (a(A (f(a) = b ( f-1(b) = a).

Examples:

• Let A = [0,1] ( R, B = [0,2] ( R.

o f(a) = a2 and g(a) = a+1. Then f+g: A(B and f*g: A(B.

o f(a) = 2*a and g(a) = a-1. Then neither f+g: A(B nor f*g: A(B.

• Let B = A = [0,1] ( R.

o f(a) = a2 and g(a) = 1-a. Then f+g: A(A and f*g: A(A. Both compound functions are bijections.

o f(a) = a3 and g(a) = a1/3. Then g(f(a): A(A is a bijection.

• Let A = [-1, 1] and B=[0, 1]. Then

o f(a) = a3 and g(a) = {x>0 | x= a1/3}. Then g(f(a): A(B is onto.

Definition: The graph of a function f is {(a,f(a)) | a(A}.

Example: A = {0, 1, 2, 3, 4, 5} and f(a) = a2. Then

[pic][pic]

(a) graph(f,A) (b) an approximation to graph(f,[0,5])

Definitions: The floor and ceiling functions are defined by

• (x( = largest integer smaller or equal to x.

• (x( = smallest integer larger or equal to x.

Examples:

• (2.99( = 2, (2.99( = 3

• (-2.99( = -3, (-2.99( = -2

Definition: A sequence is a function from either N or a subset of N to a set A whose elements ai are the terms of the sequence.

Definitions: A geometric progression is a sequence of the form {ari, i=0, 1, …}. An arithmetic progression is a sequence of the form {a+id, i=0, 1,…}.

Translation: f(a,r,i) = ari and f(a,d,i) = a + id are the corresponding functions.

There are a number of interesting summations that have closed form solutions.

Theorem: If a,r(R, then

[pic]

Proof: If r = 1, then we are left summing a n+1 times. Hence, the r = 1 case is trivial. Suppose r ( 1. Let [pic] Then

|rS = |[pic] |Substitution S formula. |

| |[pic] |Simplifying. |

| |[pic] |Removing n+1 term and adding 0 term. |

| |S+(arn+1-a) |Substituting S for formula |

Solve for S in rS = S+(arn+1-a) to get the desired formula. qed

Some other common summations with closed form solutions are

|Sum |Closed Form Solution |

|[pic] |[pic] |

|[pic] |[pic] |

|[pic] |[pic] |

|[pic] |(1(x)-1 |

|[pic] |(1(x)-2 |

Proving some of these requires knowledge about limits. There are close ties to integral and differential calculus, which is no surprise since integration is summation taken to a limit.

Example: [pic]xi = 0 when |x|k.

Pronunciation: f(x) is Big Oh of g(x).

Examples:

• f(x) = x2+2x is O(xn)

o When 0(x(1, x2(x, so 0 ( x2+2x ( x+2x ( 3x

o When x(1, x(x2, so 0 ( x2+2x ( x2+2x2 = 3x2

• In general, f(x) = [pic] with an(0 is O(xn) when x(1.

• n! is O(nn) when n(1.

o n! = 1(2(…(n ( n(n(…(n = nn.

• log(n!) is O(nlogn) when n(1.

o log(n!) ( log(nn) = nlog(n)

• log(n) is O(n) when n(1.

Theorem: If fi(x) is O(gi(x)), for 1(i(n, then

[pic] is O(max{|g1(x)|, |g2(x)|, …, |gn(x)|}).

Proof: Let g(x) = max{|g1(x)|, |g2(x)|, …, |gn(x)|} and Ci the constants associated with O(gi(x)). Then

[pic] ( [pic] ( [pic] = |g(x)| [pic] = C|g(x)|.

Theorem: If fi(x) is O(gi(x)), for 1(i(n, then [pic] is O([pic]).

Proof: Let g(x) = |g1(x)|(|g2(x)|(…|gn(x)| and Ci the constants associated with O(gi(x)). Then

[pic] ( [pic] ( C[pic].

Definition: Let f and g be functions from either Z or R to R. Then f(x) is ((g(x)) if there are constants C and k such that |f(x)| ( C|g(x)| whenever x>k.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is ((g(x)) if f(x) = O(g(x)) and f(x) = ((g(x)). In this case, we say that f(x) is of order g(x).

Comment: f(x) = O(g(x)) notation is great in the limit, but does not always provide the right bounds for all values of x. (, denoted Big Omega, is used to provide lower bounds. (, denoted Big Theta, is used to provide both lower and upper bounds.

Example: f(x) = [pic] with an(0 is of order xn.

Notation: Timing, as a function of the number of elements falls into the field of Complexity.

|Complexity |Terminology |

|((1) |Constant |

|((log(n)) |Logarithmic |

|((n) |Linear |

|((nlog(n)) |nlog(n) |

|((nk) |Polynomial |

|((nklog(n)) |Polylog |

|((kn), where k>1 |Exponential |

|((n!) |Factorial |

Notation: Problems are tractable if they can be solved in polynomial time and are intractable otherwise.

Algorithms, Integers, and Matrices

Definition: An algorithm is a finite set of precise instructions for solving a problem.

Computational algorithms should have these properties:

• Input: Values from a specified set.

• Output: Results using the input from a specified set.

• Definiteness: The steps in the algorithm are precise.

• Correctness: The output produced from the input is the right solution.

• Finiteness: The results are produced using a finite number of steps.

• Effectiveness: Each step must be performable and in a finite amount of time.

• Generality: The procedure should accept all input from the input set, not just special cases.

Algorithm: Find the maximum value of [pic], where n is finite.

procedure max([pic]: integers)

max := a1

for i := 2 to n

if max < ai then max := ai

{max is the largest element}

Proof of correctness: We use induction.

1. Suppose n = 1, then max := a1, which is the correct result.

2. Suppose the result is true for k = 1, 2, …, i-1. Then at step i, we know that max is the largest element in a1, a2, …, ai-1. In the if statement, either max is already larger than ai or it is set to ai. Hence, max is the largest element in a1, a2, …, ai. Since i was arbitrary, we are done. qed

This algorithm’s input and output are well defined and the overall algorithm can be performed in O(n) time since n is finite. There are no restrictions on the input set other than the elements are integers.

Algorithm: Find a value in a sorted, distinct valued [pic], where n is finite.

There are many, many search algorithms.

procedure linear_search(x, [pic]: integers)

i := 1

while (i(n and x(ai)

i := i + 1

if i(n then location := i else location := 0

{location is the subscript of [pic] equal to x or 0 if x is not in [pic]}

We can prove that this algorithm is correct using an induction argument. This algorithm does not rely on either distinctiveness nor sorted elements.

Linear search works, but it is very slow in comparison to many other searching algorithms. It takes 2n+2 comparisons in the worst case, i.e., O(n) time.

procedure binary_search(x, [pic]: integers)

i := 1

j := n

while ( i < j )

m := ((i+j)/2(

if x > am then i := m+1 else j := m

if x = ai then location := i else location := 0

{location is the subscript of [pic] equal to x or 0 if x is not in [pic]}

We can prove that this algorithm is correct using an induction argument.

This algorithm is much, much faster than linear_search on average. It is O(logn) in time. The average time to find a member of [pic] can be proven to be of order n.

Algorithm: Sort the distinct valued [pic] into increasing order, where n is finite.

There are many, many sorting algorithms.

procedure bubble_sort([pic]: reals, n(1)

for i := 1 to n-1

for j := 1 to n-i

if aj > aj+1 then swap aj and aj+1

{[pic] is in increasing order}

This is one of the simplest sorting algorithms. It is expensive, however, but quite easy to understand and implement. Only one temporary is needed for the swapping and two loop variables as extra storage. The worst case time is O(n2).

procedure insertion_sort([pic]: reals, n(1)

for j := 2 to n

i := 1

while aj > ai

i := i + 1

t := aj

for k := 0 to j-i-1

aj-k := aj-k-1

ai := t

{[pic] is in increasing order}

This is not a very efficient sorting algorithm either. However, it is easy to see that at the jth step that the jth element is put into the correct spot. The worst case time is O(n2). In fact, insertion_sort is trivially slower than bubble_sort.

Number theory is a rich field of mathematics. We will study four aspects briefly:

1. Integers and division

2. Primes and greatest common denominators

3. Integers and algorithms

4. Applications of number theory

Most of the theorems quoted in this part of the textbook require knowledge of mathematical induction to rigorously prove, a topic covered in detail in the next chapter. (

Definition: If a,b(Z and a(0, we say that a divides b if (c(Z(b=ac), denoted by a | b. When a divides b, we denote a as a factor of b and b as a multiple of a. When a does not divide b, we denote this as a[pic]b.

Theorem: Let a,b,c(Z. Then

1. If a | b and a | c, then a | (b+c).

2. If a | b, then a | (bc).

3. If a | b and b | c, then a | c.

Proof: Since a | b, (s(Z(b=as).

1. Since a | c it follows that ( t(Z(c=at). Hence, b+c = as + at = a(s+t). Therefore, a | (b+c).

2. bc = (as)c = a(sc). Therefore, a | (bc).

3. Since b | c it follows that ( t(Z(c=bt). c = bt = (as)t = a(st), Therefore, a | c.

Corollary: Let a,b,c(Z. If a | b and b | c, then a | (mb+nc) for all m,n(Z.

Theorem (Division Algorithm): Let a,d(Z(d > 0). Then (!q,r(Z(a = dq+r).

Definition: In the division algorithm, a is the dividend, d is the divisor, q is the quotient, and r is the remainder. We write q = a div d and r = a mod d.

Examples:

• Consider 101 divided by 9: 101 = 11(9 + 2.

• Consider -11 divided by 3: -11 = 3(-4) + 1.

Definition: Let a,b,m(Z(m > 0). Then a is congruent to b modulo m if m | (a-b), denoted a ( b (mod m). The set of integers congruent to an integer a modulo m is called the congruence class of a modulo m.

Theorem: Let a,b,m(Z(m > 0). Then a ( b (mod m) if and only if a mod m = b mod m.

Examples:

• Does 17 ( 5 mod 6? Yes, since 17 – 5 = 12 and 6 | 12.

• Does 24 ( 14 mod 6? No, since 24 – 14 = 10, which is not divisible by 6.

Theorem: Let a,b,m(Z(m > 0). Then a ( b (mod m) if and only if (k(Z(a=b+km).

Proof: If a ( b (mod m), then m | (a-b). So, there is a k such that a-b = km, or a = b+km. Conversely, if there is a k such that a = b + km, then km = a-b. Hence, m | (a-b), or a ( b (mod m).

Theorem: Let a,b,c,d,m(Z(m > 0). If a ( b (mod m) and c ( d (mod m), then

a+c ( b+d (mod m) and ac ( bd (mod m).

Corollary: Let a,b,m(Z(m > 0). Then (a+b) mod m = ((a mod m)+(b mod m)) mod m and (ab) mod m = ((a mod m)(b mod m)) mod m.

Some applications involving congruence include

• Hashing functions h(k) = k mod m.

• Pseudorandom numbers: xn+1 = (axn+c) mod m.

o c = 0 is known as a pure multiplicative generator.

o c ( 0 is known as a linear congruential generator.

• Cryptography

Definition: A positive integer a is a prime if it is divisible only by 1 and a. It is a composite otherwise.

Fundamental Theorem of Arithmetic: Every positive integer greater than 1 can be written uniquely as a prime or the product of two or more primes where the prime factors are written in nondecreasing order.

Theorem: If a is a composite number, then a has a prime divisor less than or equal to a1/2.

Theorem: There are infinitely many primes.

Prime Number Theorem: The ratio of primes not exceeding a and x/ln(a) approaches 1 as a((.

Example: The odds of a randomly chosen positive integer n being prime is given by (n/ln(n))/n = 1/ln(n) asymptotically.

There are still a number of open questions regarding the distribution of primes.

Definition: Let a,b(Z(a and b not both 0). The largest integer d such that d | a and d | b is the greatest common devisor of a and b, denoted by gcd(a,b).

Example: gcd(24,36) = 12.

Definition: The integers a and b are relatively prime if gcd(a,b) = 1.

Definition: The integers [pic] are pairwise relatively prime if gcd(ai,aj) = 1 whenever 1(i1). Then if n(N, then there is a unique expression such that n = akbk+ ak-1bk-1+…+a1b+a0, where {ai},k(N0, ak(0, and 0(ai1, is really easy. Just group k bits together and convert to the base 2k symbol.

• Base 10 to any base 2k is a pain.

• Base 2k to base 10 is also a pain.

Algorithm: Addition of integers

procedure add(a, b: integers)

(an-1an-2…a1a0)2 := base_2_expansion(a)

(bn-1bn-2…b1b0)2 := base_2_expansion(b)

c := 0

for j := 0 to n-1

d := ((aj+bj+c)/2(

sj := aj+bj+c – 2d

c := d

sn := c

{the binary expansion of the sum is (sk-1sk-2…s1s0)2}

Questions:

• What is the complexity of this algorithm?

• Is this the fastest way to compute the sum?

Algorithm: Mutiplication of integers

procedure multiply(a, b: integers)

(an-1an-2…a1a0)2 := base_2_expansion(a)

(bn-1bn-2…b1b0)2 := base_2_expansion(b)

for j := 0 to n-1

if bj = 1 then cj := a shifted j places else cj := 0

{c0,c1,…,cn-1 are the partial products}

p := 0

for j := 0 to n-1

p := p + cj

{p is the value of ab}

Examples:

• (10)2((11)2 = (110)2. Note that there are more bits than the original integers.

• (11)2((11)2 = (1001)2. Twice as many binary digits!

Algorithm: Compute div and mod

procedure division(a: integer, d: positive integer)

q := 0

r := |a|

while r ( d

r := r – d

q := q + 1

if a < 0 and r > 0 then

r := d – r

q := -(q + 1)

{q = a div d is the quotient and r = a mod d is the remainder}

Notes:

• The complexity of the multiplication algorithm is O(n2). Much more efficient algorithms exist, including one that is O(n1.585) using a divide and conquer technique we will see later in the course.

• There are O(log(a)log(d)) complexity algorithms for division.

Modular exponentiation, bk mod m, where b, k, and m are large integers is important to compute efficiently to the field of cryptology.

Algorithm: Modular exponentiation

procedure modular_exponentiation(b: integer, k,m: positive integers)

(an-1an-2…a1a0)2 := base_2_expansion(k)

y := 1

power := b mod m

for i := 0 to n-1

if ai = 1 then y := (y ( power) mod m

power := (power ( power) mod m

{y = bk mod m}

Note: The complexity is O((log(m))2log(k)) bit operations, which is fast.

Euclidean Algorithm: Compute gcd(a,b)

procedure gcd(a,b: positive integers)

x := a

y := b

while y(0

r := x mod y

x := y

y := r

{gcd(a,b) is x}

Correctness of this algorithm is based on

Lemma: Let a=bq+r, where a,b,q,r(Z. then gcd(a,b) = gcd(b,r).

The complexity will be studied after we master mathematical induction.

Number theory useful results

Theorem: If a,b(N then (s,t(Z(gcd(a,b) = sa+tb).

Lemma: If a,b,c(N (gcd(a,b) = 1 and a | bc, then a | c).

Note: This lemma makes proving the prime factorization theorem doable.

Lemma: If p is a prime and p | a1a2…an where each ai(Z, then p | ai for some i.

Theorem: Let m(N and let a,b,c(Z. If ac ( bc (mod m) and gcd(c,m) = 1, then a ( b (mod m).

Definition: A linear congruence is a congruence of the form ax ( b (mod m), where m(N, a,b(Z, and x is a variable.

Definition: An inverse of a modulo m is an a such that aa ( 1 (mod m).

Theorem: If a and m are relatively prime integers and m>1, then an inverse of a modulo m exists and is unique modulo m.

Proof: Since gcd(a,m) = 1, (s,t(Z(1 = sa+tb). Hence, sa=tb ( 1 (mod m). Since tm ( 0 (mod m), it follows that sa ( 1 (mod m). Thus, s is the inverse of a modulo m. The uniqueness argument is made by assuming there are two inverses and proving this is a contradiction.

Systems of linear congruences are used in large integer arithmetic. The basis for the arithmetic goes back to China 1700 years ago.

Puzzle Sun Tzu (or Sun Zi): There are certain things whose number is unknown.

• When divided by 3, the remainder is 2.

• When divided by 5, the remainder is 3, and

• When divided by 7, the remainder is 2.

What will be the number of things? (Answer: 23… stay tuned why).

Chinese Remander Theorem: Let m1, m2,…,mn(N be pairwise relatively prime. Then the system x ( ai (mod mi) has a unique solution modulo [pic].

Existence Proof: The proof is by construction. Let Mk = m / mk, 1(k(n. Then gcd(Mk, mk) = 1 (from pairwise relatively prime condition). By the previous theorem we know that there is a yk which is an inverse of Mk modulo mk, i.e., Mkyk ( 1 (mod mk). To construct the solution, form the sum

x = a1M1y1 + a2M2y2 + … + anMnyn.

Note that Mj ( 0 (mod mk) whenever j(k. Hence,

x ( akMkyk ( ak (mod mk), 1(k(n.

We have shown that x is simultaneous solution to the n congruences. qed

Sun Tzu’s Puzzle: The ak({2, 1, 2} from 2 pages earlier. Next

mk({3, 5, 7}, m=3(5(7=105, and Mk=m/mk({35, 21, 15}.

The inverses yk are

1. y1 = 2 (M1 = 35 modulo 3).

2. y2 = 1 (M2 = 21 modulo 5).

3. y3 = 1 (M3 = 15 modulo 7).

The solutions to this system are those x such that

x ( a1M1y1 + a2M2y2 + a2M2y2 = 2(35(2 + 3(21(1 + 2(15(1 = 233

Finally, 233 ( 23 (mod 105).

Definition: A m(n matrix is a rectangular array of numbers with m rows and n columns. The elements of a matrix A are noted by Aij or aij. A matrix with m=n is a square matrix. If two matrices A and B have the same number of rows and columns and all of the elements Aij = Bij, then A = B.

Definition: The transpose of a m(n matrix A = [Aij], denoted AT, is AT = [Aji]. A matrix is symmetric if A = AT and skew symmetric if A = -AT.

Definition: The ith row of an m(n matrix A is [Ai1, Ai2, …, Ain]. The jth column is [A1j, A2j, …, Amj]T.

Definition: Matrix arithmetic is not exactly the same as scalar arithmetic:

• C = A + B: cij = aij + bij, where A and B are m(n.

• C = A – B: cij = aij - bij, where A and B are m(n

• C = AB: [pic], where A is m(k, B is k(n, and C is m(n.

Theorem: A(B = B(A, but AB(BA in general.

Definition: The identity matrix In is n(n with Iii = 1 and Iij = 0 if i(j.

Theorem: If A is n(n, then AIn = InA = A.

Definition: Ar = AA(A (r times).

Definition: Zero-One matrices are matrices A = [aij] such that all aij({0, 1}. Boolean operations are defined on m(n zero-one matrices A = [aij] and B = [bij] by

• Meet of A and B: A(B = aij(bij, 1(i(m and 1(j(n.

• Join of A and B: A(B = aij(bij, 1(i(m and 1(j(n.

• The Boolean product of A and B is C = A[pic]B, where A is m(k, B is k(n, and C is m(n, is defined by cij = (ai1(b1j)((ai2(b2j)(…((aik(bkj).

Definition: The Boolean power of a n(n matrix A is defined by A[r] = A[pic]A[pic]…[pic]A (r times), where A[0] = In.

Induction and Recursion

Principle of Mathematical Induction: Given a propositional function P(n), n(N, we prove that P(n) is true for all n(N by verifying

1. (Basis) P(1) is true

2. (Induction) P(k)(P(k+1), (k(N.

Notes:

• Equivalent to [P(1) ( (k(N (P(k)(P(k+1))] ( (n(N P(n).

• We do not actually assume P(k) is true. It is shown that if it is assumed that P(k) is true, then P(k+1) is also true. This is a subtle grammatical point with mathematical implications.

• Mathematical induction is a form of deductive reasoning, not inductive reasoning. The latter tries to make conclusions based on observations and rules that may lead to false conclusions.

• Sometimes P(1) is not the basis, but some other P(k), k(Z.

• Sometimes P(k) is for a (possibly infinite) subset of N or Z.

• Sometimes P(k-1)(P(k) is easier to prove than P(k)(P(k+1).

• Being flexible, but staying within the guiding principle usually works.

• There are many ways of proving false results using subtly wrong induction arguments. Usually there is a disconnect between the basis and induction parts of the proof.

• Examples 10, 11, and 12 in your textbook are worth studying until you really understand each.

Lemma: [pic] (sum of odd numbers).

Proof: (Basis) Take k = 1, so 1 = 1.

(Induction) Assume 1+3+5+…+(2k-1) = k2 for an arbitrary k > 1. Add 2k+1 to both sides. Then (1+3+5+…+(2k-1))+(2k+1) = k2+(2k+1) = (k+1)2.

Lemma: [pic]

Proof: (Basis) Take k=0, so 20 = 1 = 21 – 1.

(Induction) Assume [pic] for an arbitrary k > 0. Add 2k+1 to both sides. Then

[pic],

which simplifies to [pic]

Principle of Strong Induction: Given a propositional function P(n), n(N, we prove that P(n) is true for all n(N by verifying

1. (Basis) P(1) is true

2. (Induction) [P(1)(P(2)(…(P(k)](P(k+1) is true (k(N.

Example: Infinite ladder with reachable rungs. For mathematical or strong induction, we need to verify the following:

|Step |Mathematical |Strong |

|Basis |We can reach the first rung. |

|Induction |If we can reach an arbitrary rung k, then we can reach rung k+1. |(k(N, if we can reach all k rungs, then we can reach rung k+1. |

We cannot prove that you can climb an infinite ladder using mathematical induction. Using strong induction, however, you can prove this result using a trick: since you can prove that you can climb to rungs 1, 2, …, k, it follows that you can climb 2 rungs arbitrarily, which gets you from rung k-1 to rung k+1.

Rule of thumb: Always use mathematical induction if P(k)(P(k+1) ( (k(N. Only resort to strong induction when that fails.

Fundamental Theorem of Arithmetic: Every n(N (n>1) is the product of primes.

Proof: Let P(n) be the proposition that n can be written as the product of primes.

(Basis) P(2) is true: 2 = 2, the product of 1 prime.

(Induction) Assume P(j) is true (j(k. We must verify that P(k+1) is true.

Case 1: k+1 is a prime. Hence, P(k+1) is true.

Case 2: k+1 is a composite. Hence k+1 = a•b, where 2(a(b0, in terms of {f(j) | {j} such that 0(j0.

• g(0) = 12, g(1) = 1, g(n) = 2g(n-1) – g(n-2), n>2.

• h(0) = 1, h(n) = nh(n-1) = n!

• Fibonacci numbers: f0 = 0, f1 = 1, fn = fn-1 + fn-2, n>1.

|n |0 |1 |2 |3 |4 |

|f(n) |1 |6 |16 |36 |76 |

|g(n) |12 |1 |-10 |-21 |-32 |

|h(n) |1 |1 |2 |6 |24 |

|fn |0 |1 |1 |2 |3 |

Theorem: Whenever n(3, fn > (n-2, where [pic].

The proof is by modified strong induction.

Lamé’s Theorem: Let a,b(N (a(b). Then the number of divisions used by the Euclidean algorithm to find gcd(a,b) ( 5•decimal digits in b.

We can recursively define sets, too, not just functions. There is a basis step and a recursion step with the possibility of an exclusion step.

Definition: The set (* of strings over an alphabet ( is defined by

(Basis) (((*, where ( is the empty string.

(Recursion) If w,x((, then wx((*.

Example: ( = {0,1}. Then (* is the binary representation of N0.

Principle of Structured Induction:

1. (Basis) Show the result holds for all elements specified in the basis step of the recursive definition of the set.

2. (Induction) Show that if the statement is true for each element used to construct new elements in the recursive step of the definition, then the result holds for these new elements.

The validity of this approach comes from mathematical induction over N. First state that P(n) is true whenever n or fewer elements are used to generate an element. We must show that P(0) is true (i.e., the basis element). Now assume that P(k) is true for an arbitrary k. Hence, P(k+1) must be true, too, due to the recursion involving k or fewer elements.

Definition: A recursive algorithm solves a problem by reducing it to an instance of the same problem with smaller input(s).

Note: Recursive algorithms can be proven correct using mathematical induction or modified strong induction.

Examples:

• n! = n•(n-1)!

• an = a•(an-1)

• gcd(a,b) with a,b(N (a 1 then

m := (n/2(

L1 := [pic]

L2 := [pic]

L := merge(merge_sort(L1), merge_sort(L2))

{L is now the sorted [pic]}

procedure merge(L1, L2: sorted lists)

L := (

while L1 and L2 are both nonempty

remove the smaller of the first element of L1 and L2 and append it to end of L

if either L1 or L2 are empty, append the other list to the end of L

{L is the merged, sorted list}

Theorem: If ni = |Li|, i=1,2, then merge requires at most n1+n2-1 comparisons. If n = |L|, then merge_sort requires O(nlog2n) comparisons.

Quick sort is another sorting algorithm that breaks an initial list into many sublists, but using a different heuristic than merge sort. If L = [pic] with distinct elements, then quick sort recursively constructs two lists: L1 for all ai < a1 and L2 for all ai > a1 with a1 appended to the end of L1. This continues recursively until each sublist has only one element. Then the sublists are recombined in order to get a sorted list.

Note: On average, the number of comparisons is O(nlog2n) for n elements, but can be O(n2) in the worst case. Quick sort is one of the most popular sorting algorithms used in academia.

Exercise: Google “quick sort, C++” to see many implementations or look in many of the 200+ C++ primers. Defining quick sort is in Rosen’s exercises.

Counting, Permutations, and Combinations

Product Rule Principle: Suppose a procedure can be broken down into a sequence of k tasks. If there are ni, 1(i(k, ways to do the ith task, then there are [pic] ways to do the procedure.

Sum Rule Principle: Suppose a procedure can be broken down into a sequence of k tasks. If there are ni, 1(i(k, ways to do the ith task, with each way unique, then there are [pic] ways to do the procedure.

Exclusion (Inclusion) Principle: If the sum rule cannot be applied because the ways are not unique, we use the sum rule and subtract the number of duplicate ways.

Note: Mapping the individual ways onto a rooted tree and counting the leaves is another method for summing. The trees are not unique, however.

Examples:

• Consider 3 students in a classroom with 10 seats. There are 10(9(8 = 720 ways to assign the students to the seats.

• We want to appoint 1 person to fill out many, may forms that the administration wants filled in by today. There are 3 students and 2 faculty members who can fill out the forms. There are 3+2 = 5 ways to choose 1 person. (Duck fast.)

• How many variables are legal in the orginal Dartmouth BASIC computer language? Variables are 1 or 2 alphanumeric characters long, begin with A-Z, case independent, and are not one of the 5 two character reserved words in BASIC. We use a combination of the three counting principles:

o 1 character variables: V1 = 26

o 2 character variables: V2 = 26(36 - 5 = 931

o Total: V = V1 + V2 = 957

Pigeonhole Principle: If there are k(N boxes and at least k+1 objects placed in the boxes, then there is at least one box with more than one object in it.

Theorem: A function f: D(E such that |D| >k and |E| = k, then f is not 1-1.

The proof is by the pigeonhole principle.

Theorem (Generalized Pigeonhole Principle): If N objects are placed in k boxes, then at least one box contains at least (N/k( - 1 objects.

Proof: First recall that (N/k( < (N/k)+1. Now suppose that none of the boxes contains more than (N/k( - 1 objects. Hence, the total number of objects has to be

k((N/k( - 1) < k((N/k)+1)-1) = N. ((

Hence, the theorem must be true (proof by contradiction).

Theorem: Every sequence of n2+1 distinct real numbers contains a subsequence of length n+1 that is either strictly increasing or decreasing.

Examples: From a standard 52 card playing deck.

• How many cards must be dealt to guarantee that k = 4 cards from the same suit are dealt?

o GPP Theorem says (N/k( - 1 ( 4 or N = 17.

o Real minimum turns out to be (N/k( ( 4 or N = 16.

• How many cards must be dealt to guarantee that 4 clubs are dealt?

o GPP Theorem does not apply.

o The product rule and inclusion principles apply: 3(13+4 = 43 since all of the hearts, spaces, and diamonds could be dealt before any clubs.

Definition: A permutation of a set of distinct objects is an ordered arrangment of these objects. A r-permutation is an ordered arrangement of r of these objects.

Example: Given S = {0,1,2}, then {2,1,0} is a permutation and {0,2} is a 2-permutation of S.

Theorem: If n,r(N, then there are P(n,r) = n((n-1)((n-2)(…((n-r+1) = n!/(n-r)! r-permutations of a set of n distinct elements. Further, P(n,0) = 1.

The proof is by the product rule for r(1. For r=0, there is only way to order 0 objects.

Example: You want to visit 10 cities in China on a vacation. You will arrive in Hong Kong as your first city and you want to maximize the number of frequent flier miles you will accumulate by flying to 9 more cities. You have 9! Different paths to check. Good luck since 9! = 362,880.

Definition: A r-combination is an unordered subset with r elements from the original set.

Definition: The binomial coefficient is defined by [pic].

Theorem: The number of r-combinations of a set with n elements with n,r(N0 is C(n,r) = [pic].

Proof: The r-permutations can be formed using C(n,r) r-combinations and then ordering each r-combination, which can be done in P(r,r) ways. So,

P(n,r) = C(n,r)(P(r,r)

or

[pic].

Theorem: C(n,r) = C(n,n-r) for 0(r(n.

Definition: A combinatorial proof of an identity is a proof that uses counting arguments to prove that both sides f the identity count the same objects, but in different ways.

Binomial Theorem: Let x and y be variables. Then for n(N,

(x+y)n = [pic].

Proof: Expanding the terms in the product all are of the form xn-jyj for j=0,1,…,n. To count the number of terms for xn-jyj, note that we have to choose n-j x’s from the n sums so that the other j terms in the product are y’s. Hence, the coefficient for xn-jyj is [pic].

Example: What is the coefficient of x12y13 in (x+y)25? [pic] = 5,200,300.

Corollary: Let n(N0. Then [pic].

Proof: [pic].

Corollary: Let n(N0. Then [pic].

Proof: [pic].

Corollary: [pic]

Corollary: Let n(N0. Then [pic].

Theorem (Pascal’s Identity): Let n,k(N with n(k. Then [pic].

Note: Using [pic] as a basis, we can define [pic] recursively using Pascal’s Identity. It is normally written as a triangular table, denoted Pascal’s Triangle.

Theorem (Vandermonde’s Identity): Let m,n,r(N with r(m and r(n. Then

[pic].

Corollary: If n(N0, then [pic].

Proof: [pic].

Theorem: Let n,r(N0 such that r(n. Then [pic].

If we allow repetitions in the permutations, then all of the previous theorems and corollaries no longer apply. We have to start over (.

Theorem: The number of r-permutations of a set with n objects and repetition is nr.

Proof: There are n ways to select an element of the set of all r positions in the r-permutation. Using the product principle completes the proof.

Theorem: There are C(n+r-1,r) = C(n+r-1,n-1) r-combinations from a set with n elements when repetition is allowed.

Example: How many solutions are there to x1+x2+x3 = 9 for xi(N? C(3+9-1,9) = C(11,9) = C(11,2) = 55. Only when the constraints are placed on the xi can we possibly find a unique solution.

Definition: The multinomial coefficient is C(n; n1, n2, …, nk) = [pic].

Theorem: The number of different permutations of n objects, where there are ni, 1(i(k, indistinguishable objects of type i, is C(n; n1, n2, …, nk).

Theorem: The number of ways to distribute n distinguishable objects in k distinguishable boxes so that ni objects are placed into box i, 1(i(k, is C(n; n1, n2, …, nk).

Theorem: The number of ways to distribute n distinguishable objects in k indistinguishable boxes so that ni objects are placed into box i, 1(i(k, is

[pic].

Multinomial Theorem: If n(N, then

[pic].

Generating permutations and combinations is useful and sometimes important.

Note: We can place any n-set into a 1-1 correspondence with the first n natural numbers. All permutations can be listed using {1, 2, …, n} instead of the actual set elements. There are n! possible permutations.

Definition: In the lexicographic (or dictionary) ordering, the permutation of {1,2,…,n} a1a2…an precedes b1b2…bn if and only if ai ( bi, for all 1(i(n.

Examples:

• 5 elements. The permutation 21435 precedes 21543.

• Given 362541, then 364125 is the next permutation lexicographically.

Algorithm: Generate the next permutation in lexicographic order.

procedure next_perm(a1a2…an: ai({1,2,…,n} and distinct)

j := n – 1

while aj > aj+1

j := j – 1

{j is the largest subscript with aj < aj+1}

k := n

while aj > ak

k := k – 1

{ak is the smallest integer greater than aj to the right of aj}

Swap aj and ak

r := n, s := j+1

while r > s

Swap ar and as

r := r – 1, s:= s + 1

{This puts the tail end of the permutation after the jth position in increasing order}

Algorithm: Generating the next r-combination in lexicographic order.

procedure next_r_combination(a1a2…an: ai({1,2,…,n} and distinct)

i := r

while ai = n-r+1

i := i – 1

ai := ai + 1

for j := i+1 to r

aj := ai + j - 1

Example: Let S = {1, 2, …, 6}. Given a 4-permutation of {1, 2, 5, 6}, the next 4-permutation is {1, 3, 4, 5}.

Discrete Probability

Definition: An experiment is a procedure that yields one of a given set of possible outcomes.

Definition: The sample space of the experiment is the set of (all) possible outcomes.

Definition: An event is a subset of the sample space.

First Assumption: We begin by only considering finitely many possible outcomes.

Definition: If S is a finite sample space of equally likely outcomes and E(S is an event, then the probability of E is p(E) = |E| / |S|.

Examples:

• I randomly chose an exam1 to grade. What is the probability that it is one of the Davids? Thirty one students took exam1 of which five were Davids. So, p(David) = 5 / 31 ~ 0.16.

• Suppose you are allowed to choose 6 numbers from the first 50 natural numbers. The probability of picking the correct 6 numbers in a lottery drawing is 1/C(50,6) = (44!(6!) / 50! ~ 1.43(10-9. This lottery is just a regressive tax designed for suckers and starry eyed dreamers.

Definition: When sampling, there are two possible methods: with and without replacement. In the former, the full sample space is always available. In the latter, the sample space shrinks with each sampling.

Example: Let S = {1, 2, …, 50}. What is the probability of sampling {1, 14, 23, 32, 49}?

• Without replacement: p({1,14,23,32,49}) = 1 / (50(49(48(47(46) = 3.93(10-9.

• With replacement: p({1,14,23,32,49}) = 1 / (50(50(50(50(50) = 3.20(10-9.

Definition: If E is an even, then [pic] is the complementary event.

Theorem: p([pic]) = 1 – p(E) for a sample space S.

Proof: p([pic]) = (|S| – |E|) / |S| = 1 – |E| / |S| = 1 – p(E).

Example: Suppose we generate n random bits. What is the probability that one of the bits is 0? Let E be the event that a bit string has at least one 0 bit. Then [pic] is the event that all n bits are 1. p(E) = 1 – p([pic]) = 1 – 2-n = (2n – 1) / 2n.

Note: Proving the example directly for p(E) is extremely difficult.

Theorem: Let E and F be events in a sample space S. Then

p(E(F) = p(E) + p(F) – p(E(F).

Proof: Recall that |E(F| = |E| + |F| – |E(F|. Hence,

p(E(F) = |E(F| / |S| = (|E| + |F| – |E(F|) / |S| = p(E) + p(F) – p(E(F).

Example: What is the probability in the set {1, 2, …, 100} of an element being divisible by 2 or 3? Let E and F represent elements divisible by 2 and 3, respectively. Then |E| = 50, |F| = 33, and |E(F| = 16. Hence, p(E(F) = 0.67.

Second Assumption: Now suppose that the probability of an event is not 1 / |S|. In this case we must assign probabilities for each possible event, either by setting a specific value or defining a function.

Definition: For a sample space S with a finite or countable number of events, we assign probabilities p(s) to each event s(S such that

(1) 0 ( p(s) ( 1 (s(S, and

(2) [pic].

Notes:

1. When |S| = n, the formulas (1) and (2) can be rewritten using n.

2. When |S| = ( and is uncountable, integral calculus is required for (2).

3. When |S| = ( and is countable, the sum in (2) is true in the limit.

Example: Coin flipping with events H and T.

• S = {H, T} for a fair coin. Hence, p(H) = p(T) = 0.5.

• S = {H, H, T} for a weighted coin. Then p(H) = 0.67 and p(T) = 0.33.

Definition: Suppose that S is a set with n elements. The uniform distribution assigns the probability 1/n to each element in S.

Definition: The probability of the event E is the sum of the probabilities of the outcomes in E, i.e., [pic].

Note: When |E| = (, the sum [pic] must be convergent in the limit.

Definition: The experiment of selecting an element from a sample space S with a uniform distribution is known as selecting an element from S at random.

We can prove that (1) p(E) = 1 – p([pic]) and (2) p(E(F) = p(E) + p(F) – p(E(F) using the more general probability definitions.

Definition: Let E and F be events with p(F) > 0. The conditional probability of E given F is defined by p(E|F) = p(E(F) / p(F).

Example: A bit string of length 3 is generated at random. What is the probability that there are two 0 bits in a row given that the first bit is 0? Let F be the event that the first bit is 0. Let E be the event that there are two 0 bits in a row. Note that E(F = {000, 001} and p(F) = 0.5. Hence, p(E|F) = 0.25 / 0.5 = 0.5.

Definition: The events E and F are independent if p(E(F) = p(E)p(F).

Note: Independence is equivalent to having p(E|F) = p(E).

Example: Suppose E is the event that a bit string begins with a 1 and F is the event that there is are an even number of 1’s. Suppose the bit strings are of length 3. There are 4 bit strings beginning with 1: {100, 101, 110, 111}. There are 3 strings with an even number of 1’s: {101, 110, 011}. Hence, p(E) = 0.5 and p(F) = 0.375. E(F = {101, 110}, so p(E(F) = 0.25. Thus, p(E(F) ( p(E)p(F). Hence, E and F are not independent.

Note: For bit strings of length 4, 0.25 = p(E(F) = (0.5)((0.5) = p(E)p(F), so the events are independent. We can speculate on whether or not the even/odd length of the bit strings plays a part in the independence characteristic.

Definition: Each performance of an experiment with exactly two outcomes, denoted success (S) and failure (F), is a Bernoulli trial.

Definition: The Bernoulli distribution is denoted b(k; n,p) = C(n,k)pkqn-k.

Theorem: The probability of exactly k successes in n independent Bernoulli trials, with probability of success p and failure q = 1 – p is b(k; n,p).

Proof: When n Bernoulli trials are carried out, the outcome is an n-tuple

(t1, t2, …, tn), all n ti({S, F}. Due to the trials independence, the probability of each outcome having k successes and n-k failures is pkqn-k. There are C(n,k) possible tuples that contain exactly k successes and n-k failures.

Example: Suppose we generate bit strings of length 10 such that p(0) = 0.7 and p(1) = 0.3 and the bits are generated independently. Then

• b(8; 10,0.7) = C(10,8)(0.7)8(0.3)2 = 45(0 .0823543(0.09 = 0.3335

• b(7; 10,0.7) = C(10,7)(0.7)7(0.3)3 = 120(0 .05764801(0.027 = 0.1868

Theorem: [pic].

Proof: [pic].

Definition: A random variable is a function from the sample space of an experiment to the set of reals.

Notes:

• A random variable assigns a real number to each possible outcome.

• A random function is not a function nor random.

Example: Flip a fair coin twice. Let X(t) be the random variable that equals the number of tails that appear when t is the outcome. Then

X(HH) = 0, X(HT) = X(TH) = 1, and X(TT) = 2.

Definition: The distribution of a random variable X on a sample space is the set of pairs (r, p(X=r)) (r(X(S), where p(X=r) is the probability that X takes the value r.

Note: A distribution is usually described by specifying p(X=r) (r(X(S).

Example: For our coin flip example above, each outcome has probability 0.25. Hence,

p(X=0) = 0.25, p(X=1) = 0.5, and p(X=2) = 0.25.

Definition: The expected value (or expectation) of the random variable X(s) in the sample space S is [pic].

Note: If S = [pic], then [pic].

Example: Roll a die. Let the random variable X take the valuess 1, 2, …, 6 with probability 1/6 each. Then [pic]. This is not really what you would like to see since the die does not a 3.5 face.

Theorem: If X is a random variable and p(X=r) is the probability that X=r so that [pic], then [pic].

Proof: Suppose X is a random variable with range X(S). Let p(X=r) be the probability that X takes the value r. Hence, p(X=r) is the sum of probabilities of outcomes s such that X(s)=r Finally, [pic].

Theorem: If Xi, 1(i(n, are random variables on S and if a,b(R, then

1. E(X1+X2+…+Xn) = E(X1)+E(X2)+…+E(Xn)

2. E(aXi+b) = aE(Xi) + b

Proof: Use mathematical induction (base case is n=2) for 1 and using the definitions for 2.

Note: The linearity of E is extremely convenient and useful.

Theorem: The expected number of successes when n Bournoulli trials is performed when p is the probability of success on each trial is np.

Proof: Apply 1 from the previous theorem.

Notes:

• The average case complexity of an algorithm can be interpreted as the expected value of a random variable. Let S={ai}, where each possible input is an ai. Let X be the random variable such that X(ai) = bi, the number of operations for the algorithm with input ai. We assign a probability p(ai) based on bi. Then the average case complexity is [pic].

• Estimating the average complexity of an algorithm tends to be quite difficult to do directly. Even if the best and worst cases can be estimated easily, there is no guarantee that the average case can be estimated without a great deal of work. Frankly, the average case is sometimes too difficult to estimate. Using the expected value of a random variable sometimes simplifies the process enough to make it doable.

Example of linear search average complexity: See page 44 in the class notes for the algorithm and worst case complexity bound. We want to find x in a distinct set [pic]. If x = ai, then there are 2i+1 comparisons. If x([pic], then there are 2n+2 comparisons. There are n+1 input types: [pic](x. Clearly, p(ai) = p/n, where p is the probability that x([pic]. Let q = 1(p. So,

|E |= |(p/n)[pic] + (2n+2)q |

| |= |(p/n)((n+1)2 + (2n+2)q |

| |= |p(n+2) + (2n+2)q. |

There are three cases of interest, namely,

• p = 1, q = 0: E = n + 1

• p = q = 0.5: E = (3n + 4) / 2

• p = 0, q = 1: E = 2n + 2

Definition: A random variable X has a geometric distribution with parameter p if p(X=k) = (1(p)k-1p for k = 1, 2, …

Note: Geometric distributions occur in studies about the time required before an event happens (e.g., time to finding a particular item or a defective item, etc.).

Theorem: If the random variable X has a geometrix distribution with parameter p, then E(X) = 1/p.

Proof:

|E(X) |= |[pic] |

| |= |[pic] |

| |= |[pic] |

| |= |pp-2 |

| |= |1/p |

Definition: The random variables X and Y on a sample space are independent if p(X(s)=r1 and Y(S)=r2) = p(X(S)=r1)p(Y(S)=r2).

Theorem: If X and Y are independent random variables on a space S, then

E(XY) = E(X)E(Y).

Proof: From the definition of expected value and since X and Y are independent random variables,

|E(XY) |= |[pic] |

| |= |[pic] |

| |= |[pic] |

| |= |[pic] |

| |= |E(X)E(Y). |

Third Assumption: Not all problems can be solved using deterministic algorithms. We want to assess the probability of an event based on partial evidence.

Note: Some algorithms need to make random choices and produce an answer that might be wrong with a probability associated with its likelihood of correctness or an error estimate. Monte Carlo algorithms are examples of probabilistic algorithms.

Example: Consider a city with a lattice of streets. A drunk walks home from a bar. At each intersection, the drunk must choose between continuing or turning left or right. Hopefully, the drunk gets home eventually. However, there is no absolute guarantee.

Example: You receive n items. Sometimes all n items are guaranteed to be good. However, not all shipments have been checked. The probability that an item is bad in an unchecked batch is 0.1. We want to determine whether or not a shipment has been checked, but are not willing to check all items. So we test items at random until we find a bad item or the probability that a shipment seems to have been checked is 0.001. How items do we need to check? The probability that an item is good, but comes from an unchecked batch is 1(0.1 = 0.9. Hence, the kth check without finding a bad item, the probability that the items comes from an unchecked shipment is (0.9)k. Since (0.9)66~0.001, we must check only 66 items per shipment.

Theorem: If the probability that an element of a set S does have a particular property is in (0,1), then there exists an element in S with this property.

Bayes Theorem: Suppose that E and F are events from a sample space S such that p(E) ( 0 and p(F) ( 0. Then

p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|[pic])p([pic])).

Generalized Bayes Theorem: Suppose that E is an event from a sample space and that F1, F2, …, Fn are mutually exclusive events such that [pic]. Assume that p(E) ( 0 and p(Fi) ( 0, 1(i(n. Then

p(Fj|E) = p(E| Fj)p(Fj) / [pic].

Example: We have 2 boxes. The first box contains 2 green and 7 red balls. The second box contains 4 green and 3 red balls. We select a box at random, then a ball at random. If we picked a red ball, what is the probability that it came from the first box?

• Let E be the event that we chose a red ball. Thus, [pic] is the event that we chose a green ball. Let F be the event that we chose a ball from the first box. Thus, [pic] is the event that we chose a ball from the second box. p(F) = p([pic]) = 0.5 since we pick a box at random.

• We want to calculate p(F|E) = p(E(F) / p(E), which we will do in stages.

• p(E|F) = 7/9 since there are 7 red balls out of 9 total in box 1. p(E|[pic]) = 3/7 since there are 3 red balls out of a total of 7 in box 2.

• p(E(F) = p(E|F)p(F) = 7/18 = 0.389 and p(E([pic]) = p(E|[pic])p([pic]) = 3/14.

• We need to find p(E). We do this by observing that E = (E(F)((E([pic]), where E(F and E([pic] are disjoint sets. So, p(E) = p(E(F)+p(E([pic]) = 0.603.

• p(F|E) = p(E(F) / p(E) = 0.389 / 0.603 = 0.645, which is greater than the 0.5 from the second bullet above. We have improved our estimate!

Example: Suppose one person in 100,000 has a particular rare disease and that there is an accurate diagnostic test for this disease. The test is 99% accurate when given to someone with the disease and is 99.5% accurate when given to someone who does not have the disease. We can calculate

(a) the probability that someone who tests positive has the disease, and

(b) the probability that someone who tests negative does not have the disease.

Let F be the event that a person has the disease and let F be the event that this person tests positive. We will use Bayes theorem to calculate (a) and (b), so have to calculate p(F), p([pic]), p(E|F), and p(E|[pic]).

• p(F) = 1 / 100000 = 10(5 and p([pic]) = 1 ( p(F) = 0.99999.

• p(E|F) = 0.99 since someone who has the disease tests positive 99% of the time. Similarly, we know that a false negative is p([pic]|F) = 0.01. Further, p([pic]|[pic]) = 0.995 since the test is 99.5% accurate for someone who does not have the disease.

• p(E|[pic]) = 0.005, which is the probability of a false negative (100 ( 99.5%).

Now we calculate (a):

p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|[pic])p([pic])) =

(0.99(10(5) / (0.99(10(5 + 0.005(0.99999) = 0.002.

Roughly 0.2% of people who test positive actually have the disease. Getting a positive should not be an immediate cause for alarm (famous last words).

Now we calculate (b):

p([pic]|[pic]) = p([pic]|[pic])p([pic]) / (p([pic]|[pic])p([pic]) + p([pic]|F)p(F))

(0.995(0.99999) / (0.995(0.99999 + 0.01(10(5) = 0.9999999.

Thus, 99.99999% of people who test negative really do not have the disease.

Bayesian Spam Filters used to be the first line of defense for email programs. Like many good things, the spammers ran right over the process in about two years. However, it is an interesting example of useful discrete mathematics.

The filtering involves a training period. Email messages need to be marked as Good or Bad messages, which we will denote as being the G or B sets. Eventually the filter will mark messages for you, hopefully accurately.

The filter finds all of the words in both sets and keeps a running total of each word per set. We construct two functions nG(w) and nB(w) that return the number of messages containing the word w in the G and B sets, respectively.

We use a uniform distribution. The empirical probability that a spam message contains the word w is p(w) = nB(w) / |B|. The empirical probability that a non-spam message contains the word w is q(w) = nG(w) / |G|.

We can use p and q to estimate if an incoming message is or is not spam based on a set of words that we build dynamically over time.

Let E be the event that an incoming message contains the word w. Let S be the event that an incoming message is spam and contains the word w. Bayes theorem tells us that the probability that an incoming message containing the word w is spam is

p(S|E) = p(E|S)p(S) / (p(E|S)p(S) + p(E|[pic])p([pic])).

If we assume that p(S) = p([pic]) = 0.5, i.e., that any incoming message is equally likely to be spam or not, then we get the simplified formula

p(S|E) = p(E|S) / (p(E|S) + p(E|[pic])).

We estimate p(E|S) = p(w) and p(E|[pic]) = q(w). So, we estimate p(S|E) by

r(w) = p(w) / (p(w) + q(w)).

If r(w) is greater than some preset threshold, then we classify the incoming message as spam. We can consider a threshold of 0.9 to begin with.

Example: Let w = Rolex. Suppose it occurs in 250 / 2000 spam messages and in

5 / 1000 good messages. We will estimate the probability that an incoming message with Rolex in it is spam assuming that it is equally likely that the incoming message is spam or not. We know that p(Rolex) = 250 / 2000 = 0.125 and q(Rolex) = 5 / 1000 = 0.005. So,

r(Rolex) = 0.125 / (0.125 + 0.005) = 0.962 > 0.9.

Hence, we would reject the message as spam. (Note that some of us would reject all messages with the word Rolex in it as spam, but that is another case entirely.)

Using just one word to determine if a message is spam or not leads to excessive numbers of false positives and negatives. We actually have to use the generalized Bayes theorem with a large set of words.

[pic] ,

which we estimate assuming equal probability that an incoming message is spam or not by

[pic].

Example: The word w1 = stock appears in 400 / 2000 spam messages and in just 60 / 1000 good messages. The word w2 = undervalued appears in 200 / 2000 spam messages and in just 25 / 1000 good messages. Estimate the likelihood that an incoming message with both words in it is spam. We know p(stock) = 0.2 and q(stock) = 0.06. Similarly, p(undervalued) = 0.1 and q(undervalued) = .025. So,

|r(stock,undervalued) |= |[pic] |

| |= |[pic] |

| |= |0.930 > 0.9 |

Note: Looking for particular pairs or triplets of words and treating each as a single entity is another method for filtering. For example, enhance performance probably indicates spam to almost anyone, but high performance computing probably does not indicate spam to someone in computational sciences (but probably will for someone working in, say, Maytag repair).

Advanced Counting Principles

Definition: A recurrence relation for the sequence {an} is the equation that expresses an in terms of one or more of the previous terms in the sequence. A sequence is called a solution to a recurrence relation if its terms satisfy the recurrence relation. The initial conditions specify the values of the sequence before the first term where the recurrence relation takes effect.

Note: Recursion and recurrence relations have a connection. A recursive algorithm provides a solution to a problem of size n in terms of a problem size n in terms of one more instances of the same problem, but of smaller size. Complexity analysis of the recursive algorithm is a recurrence relation on the number of operations.

Example: Suppose we have {an} with an = 3n, n(N. Is this a solution for

an = 2an-1 ( an-2 for n(2? Yes, since for n(2,

2an-1 ( an-2 = 2(3(n(1)) – 3(n(2) = 3n = an.

Example: Suppose in 1977 you invested $100,000 into a tax free, 30 year municipal bond that paid 15% per year. What is it worth at maturity? Did it beat inflation and if so, by how much?

• P0 = 100000

• P1= 1.15(P0

• P2 = 1.15(P1 = (1.15)2(P0

• Pi = (1.15)i(P0, which can be rigorously proven using mathematical induction.

• P30 = (1.15)30(P0 = $6,621,180

This is a big number. What about inflation? We can find the consumer price increase (CPI) monthly and yearly on the Internet, e.g., . Consider just the yearly CPI to make the comparison fairer.

• {Ij} the CPI per year

• Bj = [pic] = $354,580.

Investing your money in a bank that just beat inflation would have been a huge investing error. 15% seems high, but that existed back then due to high inflation.

Fibonacci Example: A young pair of rabbits (1 male, 1 female) arrive on a deserted island. They can breed after they are two months old and produce another pair. Thereafter each pair at least two months old can breed once a month. How many pairs fn of rabbits are there after n months.

• n = 1: f1 = 1 Initial

• n = 2: f2 = 1 conditions

• n > 2: fn = fn-1 + fn-2 Recurrence relation

The n > 2 formula is true since each new pair comes from a pair at least 2 months old.

Example: For bit strings of length n ( 3, find the recurrence relation and initial conditions for the number of bit strings that do not have two consecutive 0’s.

• n = 1: a1 = 2 Initial {0,1}

• n = 2: a2 = 3 conditions {01,10,11}

• n > 2: an = an-1 + an-2 Recurrence relation

For n > 2, there are two cases: strings ending in 1 (thus, examine the n(1 case) and strings ending in 10 (thus, examine the n(2 case).

Definition: A linear homogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form

an = c1an(1 + c2an(2 + … + ckan(k,

where {ci}(R.

Motivation for study: This type of recurrence relation occurs often and can be systematically solved. Slightly more general ones can be, too. The solution methods are related to solving certain classes of ordinary differential equations.

Notes:

• Linear because the right hand side is a sum of previous terms.

• Homogeneous because no terms occur that are not multiples of aj’s.

• Constant because no coefficient is a function.

• Degree k because an is defined in terms of the previous k sequential terms.

Examples: Typical ones include

• Pn = 1.15(Pn-1 is degree 1.

• fn = fn-1 + fn-2 is degree 2.

• an = an-5 is degree 5.

Examples: Ones that fail the definition include

• an = an-1 + [pic] is nonlinear.

• Hn = 2Hn-1 + 1 is nonhomogeneous.

• Bn = nBn-1 is variable coefficient.

We will get to nonhomogeneous recurrence relations shortly.

Solving a recurrence relation usually assumes that the solution has the form

an = rn,

where r(C, if and only if

rn = c1rn-1 + c2rn-2 + … + cn-krn-k.

Dividing both sides by rn-k to simplify things, we get

Definition: The characteristic equation is

rk ( c1rk-1 ( c2rk-2 ( … ( cn-k = 0.

Then {an} with an = rn is a solution if and only if r is a solution to the characteristic equation. The proof is quite involved.

The n = 2 case is much easier to understand, yet still multiple cases.

Theorem: Assume c1,c2,(1,(2(R and r1,r2(C. Suppose that r2(c1r(c2 = 0 has two distinct roots r1 and r2. Then the sequence {an} is a solution to the recurrence relation an = c1an-1 + c2an-2 if and only if [pic] for n(N0.

Example: a0 = 2, a1 = 7, and an = an-1 + 2an-2 for n(2. Then

• Characteristic equation: r2 – r – 2 = 0 or (r(2)(r+1) = 0.

• Roots: r1 = 2 and r2 = (1.

• Constants: a0 = 2 = (1 + (2 and a1 = 7 = 2(1 ( (2.

• Solve [pic].

• Solution: an = 3(2n + ((1)n.

Matlab or Maple is essential to solving recurrence relations quickly and accurately.

Fibonacci Example: f0 = 0, f1 = 1, and fn = fn-1 + fn-2, n(2.

• Characteristic equation: r2 – r – 1 = 0.

• Roots: r1 = [pic] and r2 = [pic].

• Set up a 2(2 matrix problem to solve for (1 and (2, which are [pic] and [pic].

• Solution: [pic].

Now comes the second case for n = 2.

Theorem: Assume c1,c2,(1,(2(R and r0(C. Suppose that r2(c1r(c2 = 0 has one root r0 with multiplicity 2. Then the sequence {an} is a solution to the recurrence relation an = c1an-1 + c2an-2 if and only if [pic] for n(N0.

Example: a0 = 1, a1 = 6, and an = 6an-1 ( 9an-2 for n(2. Then

• Characteristic equation: r2 ( 6r + 9 = 0 or (r(3)2 = 0.

• Double root: r0 = 3.

• Constants: a0 = 1 = (1 and a1 = 6 = 3(1 + 3(2.

• Solve [pic].

• Solution: an = (n+1)3n.

Theorem: Let [pic], [pic](R and [pic](C. Suppose the characteristic equation rk – c1rk(1 (… ( ck = 0 has k distinct roots ri, 1(i(k. Then the sequence {an} is a solution of the recurrence relation an = c1an(1 + c2an(2 + … + ckan(k if and only if [pic]for n(N0.

Example: a0 = 2, a1 = 5, a2 = 15, and an = 6an(1 (11an(2 + 6an(3, n(3.

• Characteristic equation: r3 ( 6r2 +11r ( 6 = 0 or (r(1)(r(2)(r(3) = 0.

• Roots: r1 = 1, r2 = 2, and r3 = 3.

• Constants: a0 = 2 = (1 + (2 + (3, a2 = 5 = (1 + 2(2 + 3(3, and

a0 = 15 = (1 + 4(2 + 9(3.

• Solve [pic].

• Solution: an = 1 ( 2n + 2(3n.

Theorem: Let [pic], [pic](R and [pic](C. Suppose the characteristic equation rk – c1rk(1 (… ( ck = 0 has t distinct roots ri, 1(i(t, with multiplicities mi(N such that [pic]. Then the sequence {an} is a solution of the recurrence relation an = c1an(1 + c2an(2 + … + ckan(k if and only if

[pic]

for n(N0 and all (i,j, 1(i(t and 0(j(mi(1.

Example: Suppose the roots of the characteristic equation are 2, 2, 3, 3, 3, 5. Then the general solution form is

((1,0+(1,1n)2n + ((2,0+(2,1n+(2,2n2)3n + (3,05n.

With given initial conditions, we can even compute the (’s.

Definition: A linear nonhomogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form

an = c1an(1 + c2an(2 + … + ckan(k + F(n),

where {ci}(R.

Theorem: If [pic] is a particular solution of the recurrence relation with constant coefficients an = c1an(1 + c2an(2 + … + ckan(k + F(n), then every solution is of the form [pic], where [pic] is a solution of the associated homogeneous recurrence relation (i.e., F(n) = 0).

Note: Finding particular solutions for given F(n)’s is loads of fun unless F(n) is rather simple. Usually you solve the homogeneous form first, then try to find a particular solution from that.

Theorem: Assume {bi},{ci}(R. Suppose that {an} satisfies the nonhomogeneous recurrence relation

an = c1an(1 + c2an(2 + … + ckan(k + F(n)

and

f(n) = (btnt + bt-1nt-1 + … + b1n + b0)sn.

When s is not a root of the characteristic equation of the associated homogeneous recurrence relation, there is a particular solution of the form

(ptnt + pt-1nt-1 + … + p1n + p0)sn.

When s is a root of multiplicity m of the characteristic equation, there is a particular solution of the form

nm(ptnt + pt-1nt-1 + … + p1n + p0)sn.

Note: If s = 1, then things get even more complicated.

Example: Let an = 6an-1 – 9an-2 + F(n). When F(n) = 0, the characteristic equation is (r(3)2. Thus, r0 = 3 with multiplicity 2.

• F(n) = 3n: particular solution is n2p03n.

• F(n) = n3n: particular solution is n2(p1n + p0)3n.

• F(n) = n22n: particular solution is (p2n2 + p1n + p0)2n.

• F(n) = (n+1)3n: particular solution is n2(p2n2 + p1n + p0)3n.

Definition: Suppose a recursive algorithm divides a problem of size n into m subproblems of size n/m each. Also suppose that g(n) extra operations are required to combine the m subproblems into a solution of the problem of size n. If f(n) is the cost of solving a problem of size n, then the divide and conquer recurrence relation is f(n) = af(n/b) + g(n).

We can easily work out a general cost for the divide and conquer recurrence relation using Big-Oh notation.

Divide and Conquer Theorem: Let a,b,c(R and be nonnegative. The solution to the recurrence relation

[pic]

for n a power of b is

[pic]

Proof: If n is a power of b, then for r = a/b, [pic]. There are 3 cases:

• a < bd: Then [pic] converges, so f(n) = O(nd).

• a = bd: Then each term in the sum is 1, so f(n) = O(ndlogn).

• a > bd: Then [pic] which is O([pic]) or O([pic]).

Example: Recall binary search (see page 45 in the class notes). Searching for an element in a set requires 2 comparisons to determine which half of the set to search further. The search keeps halving the size of the set until at most 1 element is left. Hence, f(n) = f(n/2) + 2. Using the Divide and Conquer theorem, we see that the cost is O(logn) comparisons.

Example: Recall merge sort (see pages 81-83 in the class notes). This sorts halves of sets of elements and requires less than n comparisons to put the two sorted sublists into a sorted list of size n. Hence, f(n) = 2f(n/2) + n. Using the Divide and Conquer theorem, we see that the cost is O(nlogn) comparisons.

Multiplying integers can be done recursively based on a binary decomposition of the two numbers to get a fast algorithm. The patent on this technique, implemented in hardware, made a computer company several billion dollars back when a billion dollars was real money (cf. a trillion dollars today).

Why stop with integers? The technique extends to multiplying matrices, too, with real, complex, or integer entries.

Example (funny integer multiplication): Suppose a and b have 2n length binary representations a = (a2n(1a2n(2… a1a0)2 and a = (b2n(1b2n(2… b1b0)2. We will divide a and b into left and right halves:

a = 2nA1 + A0 and , where b = 2nB1 + B0 and

A1 = (a2n(1a2n(2…an+1an)2 and A0 = (an-1an(2…a1a0)2,

B1 = (b2n(1b2n(2…bn+1bn)2 and B0 = (bn-1bn(2…b1b0)2.

The trick is to notice that

ab = (22n+2n)A1B1 + 2n(A1(A0)(B0(B1) + (2n+1)A0B0.

Only 3 multiplies plus adds, subtracts, and shifts are required. So, f(2n) = 3f(n) + Cn, where C is the cost of the adds, subtracts, and shifts. The Divide and Conquer theorem tells us this O(nlog3), which is about O(n1.6). The standard algorithm is O(n2). It might not seem like much of an improvement, but it actually is when lots of integers are multiplied together. The trick can be applied recursively on the three multiplies in the ab line (halving 2n in the recursion).

Example (Strassen-Winograd Matrix-Matrix multiplication): We want to multiply A: m(k by B: k(n to get C: m(n. The matrix elements can be reals, complex numbers, or integers. When m = k = n, this takes O(n3) operations using the standard matrix-matrix multiplication algorithm. However, Strassen first proposed a divide and conquer algorithm that reduced the exponent. The belief is that someday, someone will devise an O(n2) algorithm. Some hope it will even be plausible to use such an algorithm. The variation of Strassen’s algorithm that is most commonly implemented by computer vendors in high performance math libraries is the Winograd variant. It computes the product as

[pic].

C is computed in 22 steps involving the submatrices of A, B, and intermediate temporary submatrices. An interesting question for many years was how little extra memory was needed to implement the Strassen-Winograd algorithm (see C. C. Douglas, M. Heroux, G. Slishman, and R. M. Smith, GEMMW: A portable Level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm, Journal of Computational Physics, 110 (1994), pp. 1-10 for an answer).

The 22 steps are the following:

|Step |Wmk |C11 |C12 |C21 |C22 |Wkn |Operation |

|1 | | | | | |S7 |B22(B12 |

|2 |S3 | | | | | |A11(A21 |

|3 | | | |M4 | | |S3S7 |

|4 |S1 | | | | | |A21+A22 |

|5 | | | | | |S5 |B12(B11 |

|6 | | | | |M5 | |S1S5 |

|7 | | | | | |S6 |B22(S5 |

|8 |S2 | | | | | |S1(A11 |

|9 | |M1 | | | | |S2S6 |

|10 |S4 | | | | | |A12(S2 |

|11 | | |M6 | | | |S4B22 |

|12 | | |T3 | | | |M5+M6 |

|13 |M2 | | | | | |A11B11 |

|14 | |T1 | | | | |M1+M2 |

|15 | | |C12 | | | |T1+T3 |

|16 | |T2 | | | | |T1+M4 |

|17 | | | | | |S8 |S6(B21 |

|18 | | | |M7 | | |A22S8 |

|19 | | | |C21 | | |T2(M7 |

|20 | | | | |C22 | |T2+M5 |

|21 | |M3 | | | | |A12B21 |

|22 | |C11 | | | | |M2+M3 |

There are four tricky steps in the table above, depending on whether or not k is even or odd. Each step makes certain that we do not use more memory than is allocated for a submatrix or temporary. For example,

• In step 4, we have to take care that with S1. (a) If k is odd, then copy the first column of A21 into Wmk. (b) Complete S1.

• In step 10, we have to take care that with S4. (a) If k is odd, then pretend the first column of A21 = 0 in Wmk. (b) Complete S4.

• In step 11, we have to take care that with M6. (a) If m is odd, then save the first row of M5. (b) Calculate most of M6. (c) Complete M6 using (a) based on whether or not m is odd.

• In step 21, we have to take care that with M3. (a) Caluclate M3 using an index shift.

This all sounds very complicated. However, the code GEMMW that is readily available on the Web effectively is implemented in 27 calls to subroutines that do the matrix operations and actually implements

C = ((op(A)op(B) + ((C,

where op(X) is either X, X transpose, X conjugate, or X conjugate transpose.

What is the total cost?

• There are 7 submatrix-submatrix multiplies and 15 submatrix-submatrix adds or subtracts. So the cost is f(n) = 7f(n/2) + 15n2/4 when m=k=n. This is actually an O(n2.807logn) algorithm, where log27 = 2.807.

• The work area Wmk needs (((m+1)max(k,n)+m+4)/4( space.

• The work area Wkn needs (((k+1)n+n+4)/4( space.

• If C overlaps A or B in memory, an additional mn space is needed to save C before calculating ((C when ((0.

• The maximum amount of extra memory is bounded by (m(max(k,n)+kn)/3+(m+max(k,n)+k+3n)/2+32+mn. Hence, the overall extra storage is cN2/3, where c({2,5}.

• Typical memory usage when m=k=n is

o ((0 or A or B overlap with C: 1.67N2.

o (=0 and A and B do not overlap with C: 0.67N2.

Definition: The (ordinary) generating function for a sequence a1, a2, …, ak, … of real numbers is the infinite series [pic]. For a finite sequence [pic], the generating function is [pic].

Examples:

1. ak = 3, [pic].

2. ak = k+1, [pic].

3. ak = 2k, [pic].

4. ak = 1, 0(k(2, [pic].

Notes:

• x is a placeholder, so that G(1) in example 4 above is undefined does not matter.

• We do not have to worry about convergence of the series, either.

• When solving a series using calculus, knowing the ball of convergence for the x’s is required.

Lemma: f(x) = (1(ax)(1 is the generating function for the sequence 1, (ax), (ax)2, …, (ax)k, … since for a(0 and |ax| 0 with a0 = 2. Let [pic] be the generating function for {ak}. Then [pic]. Using the recurrence relation directly, we have

|f(x) – 3xf(x) |= [pic] |

| |= [pic] |

| |= a0 |

| |= 2 |

Hence, f(x) ( 3xf(x) = (1(3x)f(x) = 2 or f(x) = 2 / (1(3x). Using the identity for (1(ax)(1, we see that

[pic] or [pic].

Example: an = 8an(1 + 10n(1 with a0 = 1, which gives us a1 = 9. Find an in closed form. First multiply the recurrence relation by xn to give us [pic]. If [pic], then

|f(x) ( 1 |= [pic] |

| |= [pic] |

| |= 8xf(x) + x/(1(10x) |

Hence,

|f(x) |= [pic] |

| |= [pic] |

| |= [pic] |

|or |an = .5(8k+10k). |

Note: It is possible to prove many identities using generating functions.

Exclusion-Inclusion Theorem: Given sets Ai, 1(i(n, the number of elements in the union is

|[pic] |[pic] |

| |[pic] |

| |… |

| |[pic] |

and there are 2n(1 terms in the formula.

Note: Venn diagrams motivate the above theorem.

Example: A factory produces vehicles that are car or truck based: 2000 could be cars, 4000 could be trucks, and 3200 are SUV’s, which can be car or truck based (depending on the frames). How many vehicles were produced? Let A1 be the number of cars and A2 be the number of trucks. There are

[pic].

Theorem: The number of onto functions from a set of m elements to a set of n elements with m,n(N is

nm ( C(n,1)(n(1)m(1 + C(n,2) )(n(1)m(1 ( … + ((1)n(1C(n,n(1).

Definition: A derangement is a permutation of objects such that no object is in its original position.

Theorem: The number of derangements of a set of n elements is

[pic]

Example: I hand back graded exams randomly. What is the probability that no student gets his or her own exam? It is Pn = Dn / n! since there are n! possible permutations. As n((, Pn(e(1.

Relations

Definition: A relation on a set A is a subset of A(A.

Definition: A binary relation between two sets A and B is a subset of A(B. It is a set R of ordered pairs, denoted aRb when (a,b)(R and [pic]when (a,b)(R.

Definition: A n-ary relation on n sets A1, …, An is a subset of A1(…(An. Each Ai is a domain of the relation and n is the degree of the relation.

Examples:

• Let f: A(B be a function. Then the ordered pairs (a,f(a)), (a(A, forms a binary relation.

• Let A = {Springfield} and B = {U.S. state | (Springfield in the state}. Then (Springfield,U.S. states) is a relation with about 44 elements (the so-called Simpsons relation).

Theorem: Let A be a set with n elements. There are [pic] unique relations on A.

Proof: We know there are n2 elements in A(A and that there are 2m possible subsets of a set with m elements. Hence, the result.

Definitions: Consider a relation R on a set A. Then

• R is reflexive if (a,a)(R, (a(A.

• R is symmetric if (a,b)(R and (b,a)(R, (a,b(A.

• R is antisymmetric if (a,b)(R and (b,a)(R, then a=b, (a,b(A.

• R is transitive if (a,b)(R and (b,c)(R, then (a,c)(R, (a,b,c(A.

Theorem: Let A be a set with n elements. There are 2n(n(1) unique transitive relations on A.

Proof: Each of the n pairs (a,a)(R. The remaining n(n(1) pairs may or may not be in R. The product rule and previous theorem give the result.

Examples: Let A = {1, 2, 3, 4}.

• R1 = {(1,1), (1,2), (2,1), (2,2), (3,4), (4,1), (4,4)} is

o just a relation

• R2 = {(1,1), (1,2), (2,1)} is

o symmetric

• R3 = {(1,1), (1,2), (1,4), (2,1), (2,2), (3,3), (4,1), (4,4)} is

o reflexive and symmetric

• R4 = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)} is

o antisymmetric and transitive

• R5 = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,3), (3,4), (4,1), (4,4)} is

o reflexive, antisymmetric, and transitive

• R6 = {(3,4)} is

o antisymmetric

Note: We will come back to these examples when we get around to representations of relations that work in a computer.

Note: We can combine two or more relations to get another relation. We use standard set operations (e.g., (, (, (, (, …).

Definition: Let R be a relation on a set A to B and S a relation on B to a set C. Then the composite of R and S is the relation [pic] such that if (a,b)(R and (b,c)(S, then (a,c)([pic], where a(A, b(B, and c(C.

Definition: Let R be a relation on a set A. Then Rn is defined recursively: R1 = R and [pic], n>1.

Theorem: The relation R is transitive if and only if R(Rn, n(1.

Representation: The relation R from a set A to a set B can be represented by a zero-one matrix MR = [mij], where

[pic]

Notes:

• This is particularly useful on computers, particularly ones with hardware bit operations for packed words.

• MR contains I for reflexive relations.

• [pic] for symmetric relations.

• mij = 0 or mji = 0 when i(j for antisymmetric relations.

Examples:

• [pic] is transitive and symmetric.

• [pic] is antisymmetric.

Representation: A relation can be represented as a directed graph (or digraph). For (a,b)(R, a and b are vertices (or nodes) in the graph and a directional edge runs from a to b.

Example: The following digraph represents {(a,b), (b,c), (c,a), (c,b)}.

a( ( b

( c

What about all of those examples on page 159 of the class notes? We can do all of them over in either representation.

Examples (from page 159):

• [pic]

• [pic] or a digraph a1 ( ( a2

• [pic]

• [pic]

• [pic]

• [pic] or the digraph a3 ( ( a4

Definition: A relation on a set A is an equivalence relation if it is reflexive, symmetric, and transitive. Two elements a and b that are related by an equivalence relation are called equivalent and denoted a~b.

Examples:

• Let A = Z. Define aRb if and only if either a = b or a = (b.

o symmetric: aRa since a = a.

o reflexive: aRb ( bRa since a = (b.

o transitive: aRb and bRc ( aRc since a = (b = (c.

• Let A = R. Define aRb if and only if a(b(Z.

o symmetric: aRa since a(a = 0(Z.

o reflexive: aRb ( bRa since a(b(Z ( ((a(b) = b(a(Z.

o transitive: aRb and bRc ( aRc since (a(b)+(b(c) (Z ( a(c(Z.

Definition: Let R be an equivalence relation on a set A. The set of all elements that are related to an element a(A is called the equivalence class of a and is denoted by [a]R. When R is obvious, it is just [a]. If b([a]R, b is called a representative of this equivalence class.

Example: Let A = Z. Define aRb if and only if either a = b or a = (b. There are two cases for the equivalence class:

• [0] = {0}

• [a] = {a, (a} if a(0.

Theorem: Let R be an equivalence relation on a set A. For a,b(A, the following are equivalent:

1. aRb

2. [a] = [b]

3. [a] ( [b] ( (.

Proof: 1 ( 2 ( 3 ( 1.

• 1 ( 2: Assume aRb. Suppose c([a]. Then aRc. Due to symmetry, we know that bRa. Knowing that bRa and aRc, by transitivity, bRc. Hence, c([b]. A similar argument shows that if c([b], then c([a]. Hence, [a] = [b].

• Assume that [a] = [b]. Since a(A and R is reflexive, [a] ( [b] ( (.

• Assume [a] ( [b] ( (. So there is a c([a] and c([b], too. So, aRc and bRc. By symmetry, cRb. By transitivity, aRc and cRb, so aRb.

Lemma: For any equivalence relation R on a set A, [pic].

Proof: For all a(A, a([a]R.

Definition: A partition of a set S is a collection of disjoint sets whose union is A.

Theorem: Let R be an equivalence relation on a set S. Then the equivalence classes of R form a partition of S. Conversely, given a partition {Ai | i(I} of the set S, there is an equivalence relation R that has the sets Ai, i(I, as its equivalence classes.

Graphs

Definition: A graph G = (V,E) consists of a nonempty set of vertices V and a set of edges E. Each edge has either one or two vertices as endpoints. An edge connects its endpoints.

Note: We will only study finite graphs (|V| < ().

Categorizations:

• A simple graph has edges that connects two different vertices and no two edges connect the same vertex.

• A multigraph has multiple edges connecting the same vertices.

• A loop is a set of edges from a vertex back to itself.

• A pseudograph is a graph in which the edges do not have a direction associated with them.

• An undirected graph is a graph in which the edges do not have direction.

• A mixed graph has both directed and undirected edges.

Definition: Two vertices u and v in an undirected graph G are adjacent (or neighbors) in G if u and v are endpoints of an edge e in G. Edge e is incident to {u,v} and e connects u and v.

Definition: The degree of a vertex v, denoted deg(v), in an undirected graph is the number of edges incident with it except that loops contribute twice to the degree of that vertex. If deg(v) = 0, then it is isolated. If deg(v) = 1, then it is a pendant.

Handshaking Theorem: If G = (V,E) is an undirected graph with e edges, then [pic].

Proof: Each edge contributes 2 to the sum since it is incident to 2 vertices.

Example: Let G = (V,E). Suppose |V| = 100,000 and deg(v) = 4 for all v(V. Then there are (4(100,000)/2 = 200,000 edges.

Theorem: An undirected graph has an even number of vertices and an odd degree.

Definition: Let (u,v)(E in a directed graph G(V,E). Then u and v are the initial and terminal vertices of (u,v), respectively. The initial and terminal vertices of a loop (u,u) are both u.

Definition: The in-degree of a vertex, denoted deg((v), is the number of edges with v as their terminal vertex. The out-degree of a vertex, denoted deg+(v), is the number of edges with v as their initial vertex.

Theorem: For a directed graph G(V,E), [pic].

Examples of Simple Graphs:

• A complete graph has an edge between any vertex.

• A cycle Cn is a graph with |V|(3 such that the n edges are from {v1,v2}, {v2,v3}, …, {vn,v1}.

• A wheel Wn is a cycle Cn with an extra vertex with an edge connecting to each vertex in Cn.

Definition: A simple graph G = (V,E) is bipartite if V = V1(V2 with V1(V2 = ( and every edge in the graph connects a vertex in V1 to a vertex in V2. The pair (V1,V2) is a bipartition of V in G.

Theorem: A simple graph is bipartite if and only if it is possible to assign one of two colors to each vertex of the graph so that no two adjacent vertices are assigned the same color.

Definition: The union of two simple graphs G = (V,E) and H = (W,F) is the simple graph G(H = (V(W,E(F).

Representation: For graphs without multiple edges we can use adjacency lists or matrices. For general graphs we can use incidence matrices.

Definition: Let G(V,E) have no multiple edges. The adjacency list LG = {av}v(V, where av = adj(v) = {w(V | w is adjacent to v}.

Definition: Let G(V,E) have no multiple edges. The adjacency matrix AG = [aij] is

[pic]

Example:

[pic] results in [pic] and [pic]

Note: For an undirected graph, [pic]. However, this is not necessarily true for a directed graph.

Definition: The incidence matrix M = [mij] for G(V,E) is

[pic]

Definition: The simple graphs G(V,E) and H = (W,F) are isomorphic if there is an isomorphism f: V(W, a one to one, onto function, such that a and b are adjacent in G if and only if f(a) and f(b) are adjacent in H for all a,b(V.

Examples:

• [pic] and [pic] are not isomorphic.

• [pic] and [pic] are isomorphic.

Note: Isomorphic simple graphs have the same number of vertices and edges.

Definition: A property preserved by graph isomorphism is called a graph invariant.

Note: Determining whether or not two graphs are isomorphic has exponential worst case complexity, but linear average case complexity using the bet algorithms known.

Definition: Let G = (V,E) be an undirected graph and n(N. A path of length n for u,v(V is a sequence of edges e1, e2, …, en(E with associated vertices in V of u = x0, x1, …, xn = v. A circuit is a path with u = v. A path or circuit is simple if all of the edges are distinct.

Notes:

• We already defined these terms for directed graphs.

• The terminal vertex of the first edge in a path is the initial vertex of the second edge. We can define a path using a recursive definition.

Definition: An undirected graph is connected if there is a path between every pair of distinct vertices in the graph.

Theorem: There is a simple path between every distinct pair of vertices of a connected undirected graph G = (V,E).

Proof: Let u,v(V such that u ( v. Since G is connected, there is a path from u to v that has minimum length n. Suppose this path is not simple. Then in this minimum length path, there is some pair of vertices xi=xj(V for some 0(i ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download