Lab 10 - RegExp

Lab Session #10 #

Agenda #

  1. RegExp to (N)FSA: Thompson’s Construction
  2. FSA to RegExp: Kleene’s Algorithm

From Regular Expression to (N)FSA. #

The Thompson’s Construction #

  • It is an algorithm for transforming a regexp into an equivalent (N)FSA.

  • I This (N)FSA can be used to match strings against the regular expression.


The Algorithm #

  • The algorithm works recursively by splitting an expression into its constituent subexpressions, from which the (N)FSA will be constructed using a set of rules (see below).

Rule: the Empty Expression #

The empty-expression ϵϵ is converted to:

1


Rule: a symbo #

A symbol a of the input alphabet is converted to

2


Rule: Concatenation Expression #

The concatenation expression st is converted to

3

N(s) and N(t) are the (N)FSA of the subexpression s and t, respectively.


Rule: Union Expression #

The union expression s|t is converted to

4

N(s) and N(t) are the (N)FSA of the subexpression s and t, respectively.


Rule: Kleene Star Expression #

The Kleene star expression ss ^∗ is converted to

5

N(s) is the (N)FSA of the subexpression s.


Example 1 #

Build a (N)FSA for (101)(1 | 01)^*

Solution:


Exercises #

Build a (N)FSA for:

  1. 0101^*
  2. (01)01(0 | 1)01
  3. 00(01)00(0 | 1)^*

Solution. #


FSA to RegExp #

Kleene’s Algorithm: from FSA to Regular Expression #

It transforms a given finite state automaton (FSA) into a regular expression. Description: Given an FSA M=(Q,A,δ,q0,F) M = (Q, A, δ, q_0, F) , with

Q=q0,...,qnQ = {q0, . . . , qn} its set of states, the algorithm computes

  • the sets RijkR^k_{ij} of all strings that take M from state qijq_{ij} to qjq_j without going through any state numbered higher than k.

  • each set RijkR^k_{ij} represented by a regular expression.

  • the algorithm computes them step by step for k=1,0,...,n.k = −1, 0, . . . , n.

  • since there is no state numbered higher than n, the regular expression R0jnR^n_{0j} represents the set of all strings that take M from its start state qiq_i to qjq_j .

  • If F=qi,...,qfF = {q_i , . . . , q_f } is the set of accept states, the regular expression R0in...R0fnR^ n _{0i} | . . . | R^ n _{0f} represents the language accepted by M.


Kleene’s Algorithm #

The initial regular expressions, for k = −1, are computed as

  • Rij1=a1...amR^ {−1} _{ij} = a_1 | . . . | a_m if i=ji = j ,where δ(qi,a1)=...=δ(qi,am)=qjδ (q_i , a_1) = . . . = δ (q_i , a_m) = q_j

  • Rij1=a1...amϵR^ {−1} _{ij} = a_1 | . . . | a_m|ϵ if i=ji = j ,where δ(qi,a1)=...=δ(qi,am)=qjδ (q_i , a_1) = . . . = δ (q_i , a_m) = q_j

After that, in each step the expressions RiklR^k_il are computed from the previous ones by

RijkR^k_{ij} = Rikk1R^{k-1}_{ik} (Rkkk1)(R^{k-1}_{kk})^* Rkjk1R^{k-1}_{kj} | Rijk1R^{k-1}_{ij}


Kleene’s Algorithm: Example #

Give a regular expression that describes the language accepted by:

11

Initial Regular Expression (Step -1)

R001R^{-1}_{00} = 0 | ϵϵ

R011R^{-1}_{01} = 1

R101R^{-1}_{10} = 0

R111R^{-1}_{11} = ϵϵ

Step 0

R000R^{0}_{00} = (0ϵ)(0 |ϵ) (0ϵ)(0 |ϵ)^* (0ϵ)(0 |ϵ) | (0ϵ)(0 |ϵ) = 00^*

R010R^{0}_{01} = (0ϵ)(0 |ϵ) (0ϵ)(0 |ϵ)^* 1 |1= 010^*1

R100R^{0}_{10} = 0 (0ϵ)(0 |ϵ)^* (0ϵ)(0 |ϵ) | 0 = 0000^*

R110R^{0}_{11} = 0 (0ϵ)(0 |ϵ)^* 1 | ϵϵ = 00100^*1 | ϵϵ

step 1

R001R^1_{00} = (01)(0^*1) (001ϵ)(00^*1| ϵ)^* (00)0(00^*)|0^* = (01)(0^*1) (001)(00^*1)^* (00)0(00^*)|0^*

Do we need to compute the rest? (i.e R011,R101,R111R^{1}_{01},R^{1}_{10},R^{1}_{11} )


Exercises #

Give a regular expression that describes the language accepted by:

16

solution #


Homework #

  1. Give a regular expression that describes the language accepted by:

18

  1. Build a (N)FSA for: (11)(01)(11)^*(0 | 1)