COMPSCI 225:
Discrete Structures in Mathematics and Computer Science

Deterministic Finite Automata

Alphabet and strings

Alphabet

Link copied

Let Σ\Sigma be a finite alphabet. When Σ\Sigma contains kk letters, we say Σ\Sigma is a kk -letter alphabet. A 1-letter alphabet is a unary alphabet, and a 2-letter alphabet is a binary alphabet. Our typical binary alphabet is {a,b}\{a,b\} . A finite sequence of symbols from Σ\Sigma is called a string or word.

Strings

Link copied

Each string vv is of the form σ1σ2σn,\sigma_1 \sigma_2 \ldots \sigma_n, where σiΣ\sigma_i\in\Sigma . The length of vv , denoted by v|v| , is the number of symbols it has. Thus, abb=3|abb|=3 , baba=4|baba|=4 , and a=1|a|=1 .

The string of length 00 is called the empty string. It is denoted λ\lambda .

There are knk^n strings of length nn over a kk -letter alphabet.

Concatenation operation

Link copied

The set of all strings over the alphabet Σ\Sigma is

Σ={σ1σ2σmσ1,σ2,,σmΣ, mN}.\Sigma^{\star}=\{ \sigma_{1} \sigma_{2} \ldots \sigma_{m} \mid \sigma_{1}, \sigma_{2}, \ldots, \sigma_{m} \in \Sigma, \ m \in \N \}.

We denote strings by the letters u,v,w,u,v,w, \ldots .

The concatenation of uu and vv is obtained by writing uu followed by vv . Concatenation is denoted by uvu\cdot v . We have:

u(vw)=(uv)w.u\cdot (v\cdot w)=(u\cdot v) \cdot w.

Note that for any string uu , because λ\lambda is the empty string, we have λu=uλ=u\lambda \cdot u =u\cdot \lambda = u .

We sometime write uvuv , instead of uvu\cdot v .

Substrings

Link copied

For a string uu we denote by unu^n the following string:

un=uuu.n  times u^n=\underbrace{u\cdot u\cdot \ldots \cdot u.}_{n \ \ times}

If a string ww occurs in a string uu , we say that ww is a substring of uu . That is, ww is a substring of uu if u=u1wu2u=u_1wu_2 for some strings u1u_1 and u2u_2 . One can see that every string uu is a substring of itself (take u1=u2=λu_1 = u_2 = \lambda ).

A string ww is a prefix of a string uu if uu can be written as wu1wu_1 .

For example, The prefixes of aabbbaaabbba are λ\lambda , aa , aaaa , aabaab , aabbaabb , aabbbaabbb and aabbbaaabbba .

Languages

Link copied

A language over an alphabet Σ\Sigma is a subset of Σ\Sigma^{\star} .

Here are some examples of languages:

  1. \emptyset , Σ\Sigma^\star , {annN} \{a^n \mid n\in \N \} , {aba,bab} \{aba, bab\} .
  2. {w{a,b}bab\{w \in \{a,b\}^\star \mid bab is a substring of w}w\} .
  3. {wΣw\{w \in \Sigma^\star \mid w has even length }\} .
  4. {wΣw\{w \in \Sigma^\star \mid w has a substring aba}aba\} .

We denote languages by UU , VV , WW , LL , \ldots .


Operations on languages

Boolean operations

Link copied

Let UU and VV be languages over an alphabet Σ\Sigma . The following operations on languages are called Boolean operations:

  1. The union of UU and VV is UVU\cup V ,
  2. The intersection of UU and VV is UVU\cap V ,
  3. The complement of UU is ΣU\Sigma^{\star}\setminus U .

Concatenation operation

Link copied

Let UU and WW be languages on some alphabet set Σ\Sigma . The concatenation of UU and WW , denoted by UWU\cdot W , is the language UW={uwuU,wW}U\cdot W =\{u\cdot w \mid u\in U, w\in W\} .

Example 7.1.Link copied Let U={aba,bab}U=\{aba, bab\} and W={aab,bba}W=\{aab, bba\} . Then UW={abaaab,ababba,babaab,babbba}U\cdot W=\{abaaab, ababba, babaab, babbba\} .

Deterministic finite automata

Let UU be a language over an alphabet Σ\Sigma . Suppose we are given a string vv , and we would like to check whether vv belongs to UU . A deterministic finite automaton (DFA) is an algorithm that determines whether vv belongs to UU or not.

We can represent a finite automata as a labeled directed graph. We call this graph the transition diagram.

Formal definition of a DFA

Link copied

Definition 7.1 _A deterministic finite automaton (DFA) is a 55 -tuple (S,q0,T,F,Σ)(S, q_0, T, F, \Sigma) , where


Example 7.2 Let us look at the language UU that consists of all strings uu such that uu contains the substring baabaa . We want to design an algorithm that, given a string vv , determines whether vUv\in U . Below is the FindFind -baa(v)baa(v) -algorithm that on input

v=σ1σnv=\sigma_1\ldots \sigma_n

determines if vv contains baabaa as a sub-string (and if it does, then vUv \in U ).


Find-baa(v)-algorithm

The algorithm makes its transitions from one state to another depending on the input symbol σ\sigma read. The transition function is T:{0,1,2,3}×{a,b}{0,1,2,3}T : \{ 0,1,2,3\} \times \{ a, b\} \to \{ 0,1,2,3 \} given by the following Table.

ab001121231333 \begin{array}{|c|c|c|} & a & b\\ 0 & 0 & 1\\ 1 & 2 & 1 \\ 2 & 3 & 1 \\ 3 & 3 & 3 \ \end{array}

We can represent a finite automata as a labeled directed graph. We call this graph transition diagram. Therefore we have the following transition diagram for the FindFind -baa(v)baa(v) -algorithm.


As a program, the algorithm Find-baa is the following:

  1. Initialize variables i=1i=1 and state=0state=0 .
  2. If state=0state=0 and σi=a\sigma_i=a then set state=0state=0 .
  3. If state=0state=0 and σi=b\sigma_i=b then set state=1state=1 .
  4. If state=1state=1 and σi=a\sigma_i=a then set state=2state=2 .
  5. If state=1state=1 and σi=b\sigma_i=b then set state=1state=1 .
  6. If state=2state=2 and σi=a\sigma_i=a then set state=3state=3 .
  7. If state=2state=2 and σi=b\sigma_i=b then set state=1state=1 .
  8. If state=3state=3 and σi{a,b}\sigma_i\in \{a,b\} then state=3state=3 .
  9. Increment ii by one.
  10. If i=n+1i=n+1 then go to Line 11. Otherwise go to Line 22 .
  11. If state=3state=3 then output accept. Otherwise output reject.

Example 7.3.Link copied What is the transition table for the following transition diagram?

Runs and acceptance

Link copied

Let M=(S,q0,T,F,Σ)\mathcal M=(S, q_0, T, F, \Sigma) be a DFA and u=σ1σnu=\sigma_1\ldots \sigma_n be a string. The run of the automaton on uu is the sequence of states s1,s2,,sn,sn+1s_1, s_2, \ldots,s_n, s_{n+1} such that s1s_1 is the initial state and T(si,σi)=si+1T(s_i, \sigma_i)=s_{i+1} for all i=1,,ni=1,\ldots, n .

The run of M\mathcal M on a string u=σ1σnu=\sigma_1\ldots \sigma_n can be viewed of as the execution the following algorithm Run(M,u)Run(\mathcal M, u) :

  1. Initialize s=q0s=q_0 , i=1i=1 , and print ss .
  2. While ini\leq n do

(a) Set σ=σi\sigma=\sigma_i .

(b) Set s=T(s,σ)s=T(s,\sigma) .

(c) Print ss .

(d) Increment ii

Let M=(S,q0,T,F,Σ)\mathcal M=(S, q_0, T, F, \Sigma) be a DFA and u=σ1σnu=\sigma_1\ldots \sigma_n be a string. We say that M\mathcal M accepts uu if the run s1,,sn,sn+1s_1,\ldots, s_n, s_{n+1} of M\mathcal M on uu is such that the last state sn+1Fs_{n+1}\in F . Such a run is called an accepting run.

DFA recognizable languages

Link copied

Let M=(S,q0,T,F,Σ)\mathcal M=(S,q_0, T,F,\Sigma) be a DFA. The language accepted by M\mathcal M , denoted by L(M)L(\mathcal M) , is the language L(M)={wL(\mathcal M) = \{w \mid the automaton M\mathcal M accepts m}m\} .

A language LΣL\subseteq \Sigma^\star is DFA recognizable if there exists a DFA M\mathcal M such that L=L(M)L=L(\mathcal M) .


Example 7.4 Consider a DFA with exactly one state. If the state is an accepting state, then the automaton accepts the language Σ\Sigma^\star . If the state is not an accepting state, then the automaton recognizes the empty language \emptyset .


Example 7.5.Link copied Consider the language L={u}L=\{u\} consisting of one word u=σ1σn.u=\sigma_1\ldots \sigma_n.

This language LL is recognized by the following DFA (S,0,T,F)(S, 0, T, F) :

  1. S={0,1,2,3,4,,n+1}S=\{0,1,2,3,4,\ldots, n+1\} with 00 being the initial state.

  2. For all in1i \leq n-1 , T(i,σi+1)=i+1T(i,\sigma_{i+1})=i+1 . In other cases T(s,σ)=n+1T(s,\sigma)=n+1 .

  3. The accepting state is nn .


Example 7.6.Link copied Describe languages accepted by the DFA below:

Designing finite automata

Given a language LL , we would like design a deterministic finite automata that recognizes the language LL . But can we always do so? And if so, how?

For example, let us consider the language L={abnanN}L=\{ ab^na \mid n \in \N \} . Is there a DFA recognising LL ?


Example 7.7.Link copied Consider the language L={uu{a,b}L=\{u \mid u \in \{a,b\}^\star such that uu contains an odd number of aa ‘s and an even number of bb ‘s }.\}.

Our problem is to design a DFA recognizing this language.

One way of doing so, is to count the number of aa ‘s and bb ‘s in each given word. But we don’t really need to count the number of aa ‘s and bb ‘s. We just need to keep track of four cases: whether the number of aa ‘s is odd or even, and whether the number of bb ‘s is odd or even. We can do this with four states:

State 0: Even number of aa ‘s and bb ‘s.

Stete 1: Even number of aa ‘s and odd number of bb ‘s .

State 2: Odd number of aa ‘s and bb ‘s.

State 3: Odd number of aa ‘s and even number of bb ‘s.

We get the following transition diagram:


Designing finite automata

Union automata

Link copied

Let M1=(S1,q0(1),T1,F1)\mathcal M_1=(S_1, q_0^{(1)}, T_1,F_1) and M2=(S2,q0(2),T2,F2)\mathcal M_2=(S_2, q_0^{(2)}, T_2, F_2) , and let two DFA recognizing L1=L(M1)L_1=L(\mathcal M_1) and L2=L(M2)L_2=L(\mathcal M_2) .

The Union Problem: Design a DFA M=(S,q0,T,F)\mathcal M=(S, q_0, T,F) that recognizes L1L2L_1\cup L_2 .


Construction of DFA M=(S,q0,T,F)\mathcal M=(S,q_0, T, F) for L1L2L_1\cup L_2

  1. The set SS of states is S1×S2S_1\times S_2 .

  2. The initial state is the pair (q0(1),q0(2))(q_0^{(1)}, q_0^{(2)}) .

  3. The transition function TT is the product of the transition functions T1T_1 and T2T_2 , that is:

T((p,q),σ)=(T1(p,σ),T2(q,σ)),T((p,q), \sigma)=(T_1(p,\sigma), T_2(q,\sigma)),

where pS1p\in S_1 , qS2q\in S_2 , and σΣ\sigma \in \Sigma .

  1. The set FF of final states consists of all pairs (p,q)(p,q) such that either pF1p\in F_1 or qF2q\in F_2 .

The notation for the new automaton is: M1M2\mathcal M_1\oplus \mathcal M_2 .

Why does the construction work?

If uL1L2u\in L_1\cup L_2 then either M1M_1 accepts uu or M2M_2 accepts uu . In either case, since MM simulates both M1M_1 and M2M_2 , the string uu must be accepted by MM .

If uu is accepted by MM then the run of MM on uu is split into two runs: one is the run of M1M_1 on uu and the other is the run of M2M_2 on uu . Since MM accepts uu , it must be the case that one of the runs is accepting.

Intersection automata

Link copied

Let M1=(S1,q0(1),T1,F1)\mathcal M_1=(S_1, q_0^{(1)}, T_1,F_1) and M2=(S2,q0(2),T2,F2)\mathcal M_2=(S_2, q_0^{(2)}, T_2, F_2) , and let two DFA recognizing L1=L(M1)L_1=L(\mathcal M_1) and L2=L(M2)L_2=L(\mathcal M_2) .

The intersection problem : Design a DFA M=(S,q0,T,F)\mathcal M=(S, q_0, T,F) that recognizes L1L2L_1\cap L_2 .


Construction of DFA M=(S,q0,T,F)\mathcal M=(S,q_0, T, F) for L1L2L_1\cap L_2

  1. The set SS of states is S1×S2S_1\times S_2 .

  2. The initial state is the pair (q0(1),q0(2))(q_0^{(1)}, q_0^{(2)}) .

  3. The transition function TT is the product of the transition functions T1T_1 and T2T_2 , that is:

T((p,q),σ)=(T1(p,σ),T2(q,σ)),T((p,q), \sigma)=(T_1(p,\sigma), T_2(q,\sigma)),

where pS1p\in S_1 , qS2q\in S_2 , and σΣ\sigma \in \Sigma .

  1. The set FF of final states consists of all pairs (p,q)(p,q) such that pF1p\in F_1 and qF2q\in F_2 .

The notation for the automaton MM is this: M1M2M_1\otimes M_2 .

Complementation automata

Link copied

The complementation problem :

Given a DFA M=(S,q0,T,F)M=(S,q_0, T, F) , design a DFA that recognizes the complement of L(M)L(M) .

This is a simple procedure. Keep the original states, the initial state, and the transition function TT . Swap: Declare non-accepting states as accepting, and accepting states non-accepting.