To understand Markov chains you need to understand stochastic processes, and to understand stochastic processes you need to understand random variables. I'm going to explain random variables briefly here, in a very intuitive sense. For more details, refer to the earlier blog post at http://randomvars.blogspot.com/2010/10/fundamental-ideas-in-probability.html.
The most basic object in probability is the experiment, which is a set of trials. The experiment could be "two flips of a coin", consisting of two trials, each of which is a single flip of the coin. The possible outcomes of this experiment are { HH, HT, TH, TT }. This set is called the sample space.
A random variable is an interesting numerical property of the experiment. In our "two flips of a coin" experiment, an interesting numerical property could be the number of heads in any outcome. This would be the random variable:
X : number of heads
The random variable is described by two things: a range and a probability distribution. The range is the set of possible values that the random variable can take on, and the probability distribution gives the probability of each of those values.
In our case, the range of X = { 0, 1, 2 }:
0 when the outcome is TT
1 when the outcome is HT or TH and
2 when the outcome is HH
Each of these numbers is associated with a probability. The function that gives the probability of each value of the random variable is called the probability mass function. In our case the probability mass function is:
p(0) = 1/4
p(1) = 1/2 and
p(2) = 1/4
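To make this concrete, here is a short Python sketch that enumerates the sample space, applies X to each outcome, and recovers the same pmf (using exact fractions is just a presentational choice):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate the sample space of the "two flips of a coin" experiment.
sample_space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# The random variable X: the number of heads in an outcome.
def X(outcome):
    return outcome.count("H")

# All four outcomes are equally likely, so the probability of each value
# of X is (outcomes mapping to that value) / (size of the sample space).
counts = Counter(X(outcome) for outcome in sample_space)
pmf = {value: Fraction(count, len(sample_space)) for value, count in counts.items()}
print(pmf)  # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
```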
Sequence of Random Variables
Stochastic processes require the understanding of the concept of a sequence of random variables. For this, we need to consider a different experiment.
Consider a telephone switchboard that can handle up to 100 simultaneous calls. Define a random variable (an interesting numerical property of the experiment) X as the number of active calls at a given instant of time. The range of this random variable is { 0, 1, 2, ..., 100 }, the possible numbers of active calls at any given time.
X has a probability mass function that describes the probability of each of the following:
0 active calls,
1 active call,
2 active calls and so on up to
100 active calls.
Now consider a related random variable X1, the number of active calls at exactly 10:00 am; another random variable X2, the number of active calls at 10:05 am; and so on up to X100, the number of active calls a little over eight hours later.
X1 through X100 form a sequence of random variables. They all have the same probability mass function, and we could make the simplifying assumption that they are all independent (most calls don't last longer than 5 minutes). In that case, this sequence of random variables is said to be independent and identically distributed (abbreviated "iid").
Stochastic Processes
Average:
Let us consider the average of the sequence of random variables:
A = (X1 + X2 + ... + Xn) / n
This means that to find the average number of active calls at any given time, we would add up the values obtained from all the measurements and divide by the total number of measurements, which is the usual sense of average.
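As a quick sketch, the same average computed over a simulated sequence of readings looks like this; the uniform draw over { 0, ..., 100 } below is only a placeholder for the switchboard's real call-count pmf:

```python
import random

# Simulate X1 .. X100: iid readings of the number of active calls,
# one every 5 minutes. random.randint is a uniform placeholder for
# whatever the real call-count distribution is.
readings = [random.randint(0, 100) for _ in range(100)]

# The average in the usual sense: add up the readings, divide by n.
A = sum(readings) / len(readings)
print(A)
```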
Stochastic Process:
Suppose we consider the outcome of the whole sequence [X1, X2, X3, ..., Xn] as a list. For example, in one experiment we may get the result [32, 35, 40, ..., 22, 23] as the number of active calls at the different times. This is one possible outcome of the joint sequence.
Such a joint sequence is called a stochastic process. It is described by the joint probability mass function which, because the Xi's are independent, is the product of their individual probabilities.
For example, the probability of getting the vector [32, 35, 40, ..., 22, 23] will be given by the product of the probabilities of getting 32 and 35 and 40 and ... and 22 and 23:
p(32, 35, ..., 23) = p(32) x p(35) x ... x p(23).
The joint pmf thus describes the complete progression of the sequence of random variables, with a probability associated with each of its possible outcomes.
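Under the independence assumption this product is one line of code. In the sketch below, the marginal pmf p is a uniform placeholder for the real call-count distribution, and the outcome list is shortened to five readings:

```python
import math

def p(x):
    # Hypothetical marginal pmf of the number of active calls;
    # uniform over { 0, ..., 100 } purely as a placeholder.
    return 1 / 101

# One possible (shortened) outcome of the joint sequence.
outcome = [32, 35, 40, 22, 23]

# For an iid sequence, the joint pmf is the product of the marginals.
joint = math.prod(p(x) for x in outcome)
print(joint)  # (1/101) ** 5 with this placeholder pmf
```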
Markov Chains
Consider a stochastic process defined by the unending sequence X1, X2, ..., Xn, ... of random variables. Add to this stochastic process the property that the probability distribution of the next random variable in the sequence depends only on the value of the previous random variable. Such a stochastic process is called a Markov chain. The term "chain" comes from the property (described in detail below) that gives the probability of going from one Xi to the next Xi+1.
Adding this property implies that the probability distribution of Xn depends on the value of Xn-1. But Xn-1 could take any of 101 different values, from 0 active calls to 100 active calls, and Xn itself could take any of those 101 values. This means that there is a probability of Xn-1 being, say, 10 and Xn being 93, and likewise for every other possible combination.
This kind of dependence can only be specified by a 101 x 101 matrix, which we can't show here, but if we reduce the problem to a maximum of 3 active calls, we get the 4 x 4 matrix below:
Xn-1 \ Xn |  0  |  1  |  2  |  3
    0     | 1/2 | 1/4 | 1/8 | 1/8
    1     | 1/4 | 1/4 | 1/3 | 1/6
    2     | 1/8 | 1/4 | 1/2 | 1/8
    3     |  0  | 1/4 | 1/2 | 1/4
The first row of the matrix says that when Xn-1 is 0, the probability of Xn also being 0 is 1/2 or 50%, the probability of Xn being 1 is 1/4 or 25%, and so on. Note that each row sums to 1, since Xn must take some value in the range.
This matrix is called the transition probability matrix of the Markov chain.
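To see the chain in action, here is a minimal simulation sketch built on this matrix; each step draws the next value from the row of Q selected by the current value (the helper name simulate is just an illustrative choice):

```python
import random

# The transition probability matrix from the text: row i is the pmf
# of Xn given that Xn-1 = i (each row sums to 1).
Q = [
    [1/2, 1/4, 1/8, 1/8],  # from 0 active calls
    [1/4, 1/4, 1/3, 1/6],  # from 1
    [1/8, 1/4, 1/2, 1/8],  # from 2
    [0,   1/4, 1/2, 1/4],  # from 3
]

def simulate(steps, start=0):
    """Walk the chain: draw each Xn from the row of Q chosen by Xn-1."""
    state, path = start, [start]
    for _ in range(steps):
        state = random.choices(range(4), weights=Q[state])[0]
        path.append(state)
    return path

print(simulate(20))  # e.g. [0, 0, 1, 2, 2, 3, 1, ...]
```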
Steady State Probabilities
There is an interesting consequence of the Markov property. Since the probability of transitioning into the next element of the sequence depends on the previous element, the probability mass function of Xn can change every time we move one step along the sequence.
This is because the probability mass function of Xn defines the probability of Xn taking on each of the possible values in its range, and that is exactly what the row of the transition matrix corresponding to the value of Xn-1 gives. But it is not the whole story: we generally don't know the value of Xn-1 for certain, only its own probability mass function.
The probabilities of Xn taking on its different values form its probability mass function, which can be calculated as follows:
take the probability of Xn-1 having the first value in the range (viz. 0 active calls) times the transition probability of going from that value to the value of Xn in question
And this is then summed over all the values in the range of X:
q(s) = p(x1) Q(s|x1) + p(x2) Q(s|x2) + ... + p(xk) Q(s|xk)
where p is the pmf of Xn-1, Q(s|x) is the transition probability of going from x to s, and x1, ..., xk are the values in the range of X.
If q(s) stops changing from one step to the next, the Markov chain is said to be in its steady state.
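A quick way to see this numerically is to apply the update above over and over, starting from any initial pmf, and watch it settle. A minimal sketch using the 4 x 4 matrix from earlier (the starting pmf and the 1e-12 tolerance are arbitrary choices):

```python
# The transition probability matrix from the text.
Q = [
    [1/2, 1/4, 1/8, 1/8],
    [1/4, 1/4, 1/3, 1/6],
    [1/8, 1/4, 1/2, 1/8],
    [0,   1/4, 1/2, 1/4],
]

def step(p):
    # One pmf update: q(s) = sum over x of p(x) * Q(s|x).
    return [sum(p[x] * Q[x][s] for x in range(4)) for s in range(4)]

q = [1.0, 0.0, 0.0, 0.0]  # start with all probability on 0 active calls
for _ in range(1000):
    next_q = step(q)
    if max(abs(a - b) for a, b in zip(q, next_q)) < 1e-12:
        break                 # q no longer changes: steady state reached
    q = next_q
print(q)  # the steady-state pmf of the chain
```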