MUI_IIMAI_16_Markov_chains

hace 297 días por sevillad1617

Markov chains / Cadenas de Márkov

Markov chains are a common way of modelling random processes, and they make use of Linear Algebra. We will study the basics and learn to use Sage to do computations with them.

A random variable (in Spanish variable aleatoria) is simply a value that may change every time we observe it. For example, a dice roll.

Let us simulate it with Sage. We will produce random numbers 0,...,5 with probability 1/6 each. Run the next cell several times.

P = [1/6]*6 print P X = GeneralDiscreteDistribution(P) X.get_random_element() 
       
[1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
5
[1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
5
 
       

What is the relation between one result and the next?

¿Qué relación hay entre un resultado y el siguiente?

 

Unfortunately, in many real-life situations the result of a random process depends on the previous results. For example, the answer to the question "Will it rain tomorrow?" is related to whether it rained today. But for this purpose Markov chains are very well suited.

A box has 3 white balls and 3 black balls. If we take one out at random, and repeat (up to 6 times of course), does the probability of white/black change?

En una caja hay 3 bolas blancas y 3 bolas negras. Si sacamos una al azar y repetimos (hasta 6 veces claro), ¿cambia la probabilidad de blanco/negro?

 

Basic concepts

A Markov chain consists in:

  • an infinite sequence of random variables;
  • the possible values of the variables, called states. There could be infinitely many. Here we will work with a finite number of states, so we can do calculations.

In this way, a Markov chain can be understood as a random variable that takes as values infinite sequences of states (in Spanish sucesiones infinitas de estados). Each time we observe the chain we see a different infinite sequence.

It can happen that the next state depends only on the one before; or it can happen that it depends on previous states. For example, "Will it rain tomorrow?" depends on what happened during several days (it basically depends on the season, for example). Here we will only work with processes where each state depends only on the previous state.

If it rains one day, does it matter which is the season of the year in order to guess the next day's weather?

Si un día llueve, ¿importa la estación a la hora de adivinar si lloverá al día siguiente?

If I want to guess the next result of a regular dice, does it help to know the previous ones?

Para adivinar la siguiente tirada de un dado normal, ¿me ayuda conocer las anteriores?

 
       

Some examples of processes that we can model with Markov chains:

 

  • The box with balls of before. Each state is "X black and Y white balls in the box".

 

  • Sun/rain each day, if we assume that there are no seasons (sometimes this is done in practice).

 

  • Student A may come to class next Friday or not. He/she tells student B, who hears the answer, but with probability 0.01 gets it wrong. B tells C... At each step the answer to "is A coming on Friday?" is Yes/No.

 

  • We toss a coin many times, until we get four heads. Then we stop.

 

  • A drunk person is in a bar, some blocks away from home. Every time he/she walks a block there is a chance to go home, or maybe to go the opposite way; towards home or towards the bar. When at home or at the bar, the person will stay there forever.

 

  • A model of how economical situation changes through generations:
    • If a person is rich, its child will be rich with probability 4/5 and middle class with probability 1/5.
    • It a person is middle class, its child will be middle class with probability 6/10, rich with probability 1/10 and poor with probability 3/10.
    • If a person is poor, its child with be poor with probability 9/10 and middle class with probability 1/10.

 

  • Given a directed graph, we can "walk" around: at each vertex, choose randomly one of the edges that start in that vertex, and travel along to the end of the vertex. An example:

 

  • A dice can also be modelled with a Markov chain; the states are 1 to 6 (with Sage we would work with 0 to 5).

 

  • An example in Genetics: an individual has a pair of genes, each can be "G" or "g". So each individual can have GG, Gg or gg. The first two cases cannot be distinguished in appearance; "G" is called dominant and "g" recessive. When two individuals mate, each one gives one of its two genes at random to its offspring. For example, GG and Gg may produce GG or Gg, because the first one gives G for sure and the second one can give G or g.
    • Now we take an individual, mate it with a Gg, and do the same with its offspring repeatedly. The states of the succesive generations can be GG, Gg or gg.
    • Same if they always mate with a GG.
    • If we want both individuals to be random, the states will be pairs of GG, Gg and gg (how many are there?)

 

 
       

If we want to determine a Markov chain we need to say the states in a certain order, and the probability that one state changes to another one. These values can be written into a transition matrix: each row are the probabilites that a given state becomes any other one.

For example, for the economical situation model, we choose states "Rich", "Middle class" and "Poor" in that order, so the matrix is:

P = Matrix([ [4/5, 1/5, 0], [1/10, 6/10, 3/10], [0, 1/10, 9/10] ]) P 
       
[ 4/5  1/5    0]
[1/10  3/5 3/10]
[   0 1/10 9/10]
[ 4/5  1/5    0]
[1/10  3/5 3/10]
[   0 1/10 9/10]

Write the transition matrices of two other examples.

Escribe las matrices de transición de otros dos ejemplos.

 
       
 
       
 
       

Any matrix can be a transition matrix, if it satisfies two conditions:

  • Each element of the matrix is $\geq0$.
  • The sum of each row is 1.

 

In some of the examples there are one or more states with the property that, once we arrive, we will stay there. They are called absorbing states. When every initial state can reach an absorbing state, the chain itself is called an absorbing chain.

In each example of the list, find the absorbing states.

En cada ejemplo de la lista, encuentra los estados absorbentes.

The absorbing states can be detected in the transition matrix: if the $i$-th row has all zeros and a one, and the one is in position $i$, then the $i$-th state is absorbing.

Find the absorbing states of the drunk walk, whose matrix is below.

Mira los estados absorbentes del paseo del borracho, cuya matriz está debajo.

d = 5 # the distance between the bar and home p = 2/3 # the probability of going in the right direction # the status is 0..d indicating the distance from our position to home P = Matrix(QQ,d+1) # rational numbers for i in [1..d-1]: P[i,i-1] = p P[i,i+1] = 1-p P[0,0] = 1; P[d,d]=1 P 
       
[  1   0   0   0   0   0]
[2/3   0 1/3   0   0   0]
[  0 2/3   0 1/3   0   0]
[  0   0 2/3   0 1/3   0]
[  0   0   0 2/3   0 1/3]
[  0   0   0   0   0   1]
[  1   0   0   0   0   0]
[2/3   0 1/3   0   0   0]
[  0 2/3   0 1/3   0   0]
[  0   0 2/3   0 1/3   0]
[  0   0   0 2/3   0 1/3]
[  0   0   0   0   0   1]

The next image is a random graph (like the one described in the list of examples). Find the absorbing states in the image and also looking at its matrix (we will write it in class).

La imagen siguiente es un grafo aleatorio (como el de la lista de ejemplos). Encuentra los estados absorbentes en la imagen y luego mirando su matriz (en clase la escribiremos).

set_random_seed(137272368778074143154610542690247382868) # 0..n, n=3..9 (con mas no se ve bien) #set_random_seed() #print initial_seed() V = [0..8] G = DiGraph([V,[]],multiedges=True) for i in [0..round(2*len(V))]: G.add_edge(choice(V),choice(V)) G.am() G.show(figsize=6) 
       
 
       

Calculations with Markov chains

Now we will start using the power of Sage to compute interesting things about Markov chains.

The first thing we can do is the following: choose an initial state at random and simulate a value of the chain. We do this with a loop:

  • Before the loop create a list L with only one element, the initial value. It would be an element from 0 to $n-1$ if the chain has $n$ states.
  • Inside the loop:
    • Take the row corresponding to the current state (the last element of L).
    • Choose at random one state according to the probabilities of the row. You have an example at the beginning of the session.
    • Append that state to L.
  • When you have all the states you want, show L.

Calculate 10 steps of the drunk walk, supposing that we start in an intermediate position that you choose. Repeat it a few times.

Calcula 10 pasos de borracho, suponiendo que empecemos en una posición intermedia que tú elijas. Repítelo varias veces.

 
       
 
       

After your experiments, where could we be in 10 steps?

Después de tus experimentos, ¿dóne podríamos estar dentro de 10 pasos?

The last question can be answered in a precise way using the matrix $M$. Suppose that we write a vector with the probabilities of the initial state. For example if we choose a concrete place to start it could be $(0,0,0,1,0,0)$. But if we are not sure of the initial state we could have different probabilites. For example if we could be equally be in two places we would write something like the vector $(0,0,0,1/2,1/2,0)$.

Then: if $v$ is a vector of state probabilities, $v\cdot P$ is the vector of probabilities in the next state.

For example, if we start one block away from the bar:

d = 5 # the distance between the bar and home p = 2/3 # the probability of going in the right direction # the status is 0..d indicating the distance from our position to home P = Matrix(QQ,d+1) # rational numbers for i in [1..d-1]: P[i,i-1] = p P[i,i+1] = 1-p P[0,0] = 1; P[d,d]=1 v = vector([0]*(d+1)) v[d-1] += 1 v*P 
       
(0, 0, 0, 2/3, 0, 1/3)
(0, 0, 0, 2/3, 0, 1/3)

If we don't know at all where we are (same probability for each state):

v = vector([1/(d+1)]*(d+1)) v; v*P 
       
(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
(5/18, 1/9, 1/6, 1/6, 1/18, 2/9)
(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
(5/18, 1/9, 1/6, 1/6, 1/18, 2/9)

If we start next to the bar, where could we be after 2 steps? And 3 steps? Calculate the probabilities. And after 50 steps?

Si empezamos al lado del bar, ¿dónde podríamos estar después de 2 pasos? ¿Y 3 pasos? Calcula las probabilidades. ¿Y después de 50 pasos?

 
       
 
       
 
       

As we can expect, if $P$ gives the probabilities of transitions after one step, $P^n$ gives the probabilities of transitions after $n$ steps.

Calculate $P^{100}$ and interpret the result. It will be easier to work with approximations: apply .n(digits=4) to the result.

Calcula $P^{100}$ e interpreta el resultado. Es más fácil aproximando: aplica .n(digits=4) al resultado.

 
       
 
       

Let us consider now the example of the Yes/No message that is transmitted from one person to the next. Suppose that with $p=0.99$ it is transmitted correctly. Then, if the states are Yes and No:

p = 0.99 P = Matrix(RDF,[ [p,1-p], [1-p,p], ]) P 
       
[                0.99 0.010000000000000009]
[0.010000000000000009                 0.99]
[                0.99 0.010000000000000009]
[0.010000000000000009                 0.99]

The probability of each result after 100 steps is read in the rows of:

P^100 
       
[ 0.5663097779473767 0.43369022205262364]
[0.43369022205262364  0.5663097779473767]
[ 0.5663097779473767 0.43369022205262364]
[0.43369022205262364  0.5663097779473767]
 
       

You can see that independently of how you started there is a similar probability of receiving Yes or No after 100 steps.If you took more steps, Yes and No would have probabilities of about 50%. Conclusion: it doesn't matter what the initial message was, after many steps the result is nearly the same.

This is a common characteristic of many Markov chains:

Independently of the initial vector $v$, the limit $v\cdot P^n$ when $n\to\infty$ is always the same vector $w$.

(It is not true for every chain but we will not go into details.)

In one of the two examples that you did before, take an initial probability vector and apply a large number of steps, then take other initial vectors to see if you get a similar results.

En uno de los dos ejemplos que hiciste antes, toma un vector inicial de probabilidades y aplica un número grande de pasos, después toma otros vectores iniciales para ver si te salen resultados parecidos.

 

 
       
 
       
 
       

In the next graph, take several initial vectors and calculate many steps, to see if we get different results.

En el grafo siguiente, toma varios vectores iniciales y calcula muchos pasos para ver si obtenemos resultados distintos.

#set_random_seed(300883861636595325563778496093690875780) # ? y 4 set_random_seed(171438275675054752107986872719147705740) # 4 y 4 #set_random_seed() #print initial_seed() V = [0..3] G = DiGraph([V,[]],multiedges=True) for i in [0..round(2*len(V))]: G.add_edge(choice(V),choice(V)) V = [4..7] G.add_vertices(V) for i in [0..round(2*len(V))]: G.add_edge(choice(V),choice(V)) G.am(); G.show(figsize=6) 
       
[0 0 1 0 0 0 0 0]
[0 0 1 0 0 0 0 0]
[1 1 0 1 0 0 0 0]
[1 1 1 0 0 0 0 0]
[0 0 0 0 0 1 0 0]
[0 0 0 0 1 0 1 1]
[0 0 0 0 0 0 0 1]
[0 0 0 0 2 0 0 0]
[0 0 1 0 0 0 0 0]
[0 0 1 0 0 0 0 0]
[1 1 0 1 0 0 0 0]
[1 1 1 0 0 0 0 0]
[0 0 0 0 0 1 0 0]
[0 0 0 0 1 0 1 1]
[0 0 0 0 0 0 0 1]
[0 0 0 0 2 0 0 0]
 
       

An important property of the limit vector $w$ is that it satisfies $w=w\cdot P$. So:

The vector $w$ is an eigenvector (in Spanish, autovector o vector propio) of the matrix $P$ for the eigenvalue 1.

With Sage:

  • We calculate the eigenvectors and select the one corresponding to the eigenvalue 1.
  • There are infinitely many eigenvectors (they form a linear subspace) and we need to take the one whose sum is equal to one (otherwise it is not a vectors of probabilities). For this we just take the vector from Sage and divide it by its sum.
P; print; P.eigenvalues() 
       
[                0.99 0.010000000000000009]
[0.010000000000000009                 0.99]

[1.0, 0.98]
[                0.99 0.010000000000000009]
[0.010000000000000009                 0.99]

[1.0, 0.98]
e = P.eigenvectors_left() e 
       
[(1.0, [(0.7071067811865475, 0.7071067811865475)], 1),
 (0.98, [(-0.7071067811865475, 0.7071067811865475)], 1)]
[(1.0, [(0.7071067811865475, 0.7071067811865475)], 1),
 (0.98, [(-0.7071067811865475, 0.7071067811865475)], 1)]
w = [v[1][0] for v in e if v[0]==1][0] w 
       
(0.7071067811865475, 0.7071067811865475)
(0.7071067811865475, 0.7071067811865475)
w = w/sum(w) w 
       
(0.5, 0.5)
(0.5, 0.5)
 
       

In summary, we can, for any Markov chain:

  • Simulate a series of states.
  • Calculate the probabilities of going from one state to another one after $n$ steps.
  • Calculate the probability vector after $n$ steps if we started with a certain initial probability vector.
  • Calculate the vector of limit probabilities.

Choose another example of the list and do (1) a simulation, (2) calculate the probability from some state to another state in three steps, (3) the limit probabilities.

Elige otro ejemplo de la lista y haz (1) una simulación, (2) calcula la probabilidad entre dos estados concretos tras tres pasos, (3) las probabilidades límite.

 
       
 
       
 
       
 
       
 
       

Calculating with absorbing chains

Now we will concentrate on absorbing chains.

Intuitively, in the drunk walk example we already expected that we will always end at home or at the bar, if we walk long enough. This can be checked experimentally as before. It is also the content of one of the main theorems on absorbing chains:

Theorem: If a Markov chain is absorbing, the probability of eventually reaching an absorbing state is 1.

 So, it may take a long time, but we should not expect to escape forever.

 

We can ask some other questions:

  • How many times are we going to visit each state before we end in an absorbing state? It is a random variable, we would like the average.
  • How many steps does it take to reach an absorbing state? It is a random variable, we would like the average.
  • If there is more than one absorbing state, what is the probability of ending at each one?

We will work on these questions now. For this we need to do more things with the matrix $P$.

Since we can put the states in any order, suppose that we put all the absorbing states in the beginning. How would the matrix look like? It would have blocks like this:

\[ P=\left(\begin{array}{c|c} I & 0 \\\hline R & Q \end{array}\right) \]

The top-left is an identity matrix, because the first states are absorbing so they go to themselves with probability 1. The top-right is zero because the absorbing states never go to other states. The two bottom blocks are arbitrary. Note that if $R$ is zero, then the non-absorbing states cannot reach the absorbing ones, so we would not have an absorbing chain (the condition was that every state could reach an absorbing state after some steps).

 

In our example, we rearrange $P$:

d = 5 # the distance between the bar and home p = 2/3 # the probability of going in the right direction # the status is 0..d indicating the distance from our position to home P = Matrix(QQ,d+1) # rational numbers for i in [1..d-1]: P[i,i-1] = p P[i,i+1] = 1-p P[0,0] = 1; P[d,d]=1 P = P.matrix_from_rows_and_columns([0]+[d]+[1..d-1],[0]+[d]+[1..d-1]) P 
       
[  1   0   0   0   0   0]
[  0   1   0   0   0   0]
[2/3   0   0 1/3   0   0]
[  0   0 2/3   0 1/3   0]
[  0   0   0 2/3   0 1/3]
[  0 1/3   0   0 2/3   0]
[  1   0   0   0   0   0]
[  0   1   0   0   0   0]
[2/3   0   0 1/3   0   0]
[  0   0 2/3   0 1/3   0]
[  0   0   0 2/3   0 1/3]
[  0 1/3   0   0 2/3   0]

The submatrix $Q$ is important.

Q = P.matrix_from_rows_and_columns([2,3,4,5],[2,3,4,5]) Q 
       
[  0 1/3   0   0]
[2/3   0 1/3   0]
[  0 2/3   0 1/3]
[  0   0 2/3   0]
[  0 1/3   0   0]
[2/3   0 1/3   0]
[  0 2/3   0 1/3]
[  0   0 2/3   0]

$Q^n$ contains the probabilities between the non-absorbing states after $n$ steps. We can see that as $n$ becomes large the probabilities become small, because it is more and more probable to reach an absorbing state. In fact, $Q^n \to 0$.

(Q^100).n(digits=4); (Q^100).n(digits=4) 
       
[4.783e-13    0.0000 3.869e-13    0.0000]
[   0.0000 1.252e-12    0.0000 3.869e-13]
[1.548e-12    0.0000 1.252e-12    0.0000]
[   0.0000 1.548e-12    0.0000 4.783e-13]
[4.783e-13    0.0000 3.869e-13    0.0000]
[   0.0000 1.252e-12    0.0000 3.869e-13]
[1.548e-12    0.0000 1.252e-12    0.0000]
[   0.0000 1.548e-12    0.0000 4.783e-13]
[4.783e-13    0.0000 3.869e-13    0.0000]
[   0.0000 1.252e-12    0.0000 3.869e-13]
[1.548e-12    0.0000 1.252e-12    0.0000]
[   0.0000 1.548e-12    0.0000 4.783e-13]
[4.783e-13    0.0000 3.869e-13    0.0000]
[   0.0000 1.252e-12    0.0000 3.869e-13]
[1.548e-12    0.0000 1.252e-12    0.0000]
[   0.0000 1.548e-12    0.0000 4.783e-13]

Now we define the fundamental matrix:

\[ N=(I-Q)^{-1} \]

It answers the first question, "how many times will we visit each state before ending in an absorbing state?". The row indicates the initial state, the column indicates the state that we want to visit, and the value in that position is the average number of times that we will visit before we fall into an absorbing state.

Below is $N$ for our example. What is the meaning of $N[3,2]$?

Debajo está la $N$ de nuestro ejemplo. ¿Qué significa $N[3,2]$?

N = (1-Q)^(-1) N; print; N[3,2] 
       
[45/31 21/31  9/31  3/31]
[42/31 63/31 27/31  9/31]
[36/31 54/31 63/31 21/31]
[24/31 36/31 42/31 45/31]

42/31
[45/31 21/31  9/31  3/31]
[42/31 63/31 27/31  9/31]
[36/31 54/31 63/31 21/31]
[24/31 36/31 42/31 45/31]

42/31

Next question: how many steps until we reach an absorbing state?

Consider $N$ again. We just saw that if we choose an initial state (a row of $N$), the elements of the row are the number of visits to the other non-absorbing states. From this we have:

Theorem: the column vector $N\cdot\left(\begin{array}{c} 1 \\ \vdots \\ 1 \\ \end{array} \right)$ contins the average number of steps until the end, from the different initial states.

Why is it true? Think of what hapens when you multiply a row of $N$ by the column of ones.

¿Por qué es así? Piensa en lo que pasa cuando multiplicas una fila de $N$ por la columna de unos.

t = N * (vector([1]*4).column()) t; print; t.n(digits=4) 
       
[ 78/31]
[141/31]
[174/31]
[147/31]

[2.516]
[4.548]
[5.613]
[4.742]
[ 78/31]
[141/31]
[174/31]
[147/31]

[2.516]
[4.548]
[5.613]
[4.742]

The first row means that it takes about two and a half steps to finish the walk if you start next to home. The last one means that it takes almost five if you start next to the bar.

 

For the last question, "If there is more than one absorbing state, what is the probability of ending at each one?", the answer is in the matrix $NR$. Again, each row indicates an initial state (non-absorbing), and it contains the probabilities of ending at each absorbing state.

d = 5 # the distance between the bar and home p = 2/3 # the probability of going in the right direction # the status is 0..d indicating the distance from our position to home P = Matrix(QQ,d+1) # rational numbers for i in [1..d-1]: P[i,i-1] = p P[i,i+1] = 1-p P[0,0] = 1; P[d,d]=1 P = P.matrix_from_rows_and_columns([0,5,1,2,3,4],[0,5,1,2,3,4]) R = P.matrix_from_rows_and_columns([2,3,4,5],[0,1]) R 
       
[2/3   0]
[  0   0]
[  0   0]
[  0 1/3]
[2/3   0]
[  0   0]
[  0   0]
[  0 1/3]
N = (1-Q)^(-1) N*R 
       
[30/31  1/31]
[28/31  3/31]
[24/31  7/31]
[16/31 15/31]
[30/31  1/31]
[28/31  3/31]
[24/31  7/31]
[16/31 15/31]

For example, the first row means: if you start at distance 1 from home, then almost surely you will end at home. The last row means: if you start next to the bar, it is almost 50-50 that you go home or to the bar.

Generate a random graph that has at least one absorbing state. Answer the three questions that we learned about absorbing chains.

Genera un grafo aleatorio que tenga algún estado absorbente. Responde las tres preguntas que hemos aprendido sobre cadenas absorbentes.

 
       
 
       
 
       
 
       

Exercise: a board game / un juego de mesa

We will model a known board game in order to practice what we have learned and interpret the results.