Thursday, November 22, 2007

Intuitively Understanding Random Variables

Random variables and probability distributions are not as intuitive as one would like them to be. Searching the internet typically turns up articles full of equations and math, which don't give an intuitive feel for the subject. The following discussion aims to give that intuitive feel.

Probability

The first thing to understand is probability. The simplest situation to work with is tossing a coin multiple times and working out the probabilities involved.

A closely parallel idea helps in understanding this: binary numbers. Binary numbers are numbers that consist of ones and zeros only. Counting in binary proceeds as:
0
1
10
11
100
101
110
111
...
where these numbers correspond to the decimal values 0, 1, 2, 3, 4, 5, 6, 7, ...

Now consider a binary number (e.g. 1100011001). It could be used to represent a sequence of coin tosses where the sequence of outcomes was (H H T T T H H T T H) where the H and T have been replaced by 1 and 0 respectively.
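The correspondence between bits and tosses can be sketched in a few lines of Python. The function name bits_to_tosses is made up for this illustration; it simply replaces each 1 with H and each 0 with T.

```python
def bits_to_tosses(bits):
    """Translate a string of binary digits into coin-toss outcomes (1 = H, 0 = T)."""
    return " ".join("H" if b == "1" else "T" for b in bits)


print(bits_to_tosses("1100011001"))  # H H T T T H H T T H
```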

Now we can use some properties of binary numbers which will help us in calculating probabilities.

Consider an experiment consisting of 10 tosses of a coin. This will lead to a ten-bit binary number (bit = binary digit), such as 1100011001.

All combinations of 10 bit binary numbers can be enumerated as
00000 00000
00000 00001
00000 00010
...
11111 11101
11111 11110
11111 11111

From the theory of binary numbers we know that there will be 1024 (2 raised to the power 10) such numbers. This means that there are 1024 possible outcomes for 10 tosses of a coin.
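As a rough check, the enumeration can be reproduced in Python by writing every integer from 0 to 1023 as a 10-bit string. The variable names here are only illustrative.

```python
NUM_TOSSES = 10

# Every integer from 0 to 2**10 - 1, written as a zero-padded 10-bit string.
outcomes = [format(n, f"0{NUM_TOSSES}b") for n in range(2 ** NUM_TOSSES)]

print(len(outcomes))   # 1024
print(outcomes[:3])    # ['0000000000', '0000000001', '0000000010']
print(outcomes[-1])    # 1111111111
```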

Now let us consider one possible result of the experiment - the result that there is exactly one head, and all the others are tails. To understand this, let us look at it in terms of the binary numbers. What we are looking for is those 10-bit binary numbers which have exactly one 1 in them. How many of these do we have? Let's enumerate:
00000 00001
00000 00010
00000 00100
00000 01000
...
00100 00000
01000 00000
10000 00000

There are ten of them - one with a 1 in each of the 10 digit (bit) positions.
So what can one say about the probability of getting one of these patterns? It is 10 possibilities out of 1024 outcomes (10/1024), which is a little less than one in a hundred (about 0.0098, or roughly 0.98%).

So, get it? The probability is the number of outcomes matching the criteria divided by the total number of possible outcomes. This works because, for a fair coin, each of the 1024 outcomes is equally likely.
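The same count-and-divide rule can be sketched in Python, assuming a fair coin so that all 1024 bit patterns are equally likely. The names outcomes and one_head are illustrative only.

```python
# All 1024 possible results of 10 tosses, as 10-bit strings (1 = head, 0 = tail).
outcomes = [format(n, "010b") for n in range(2 ** 10)]

# Keep only the patterns that contain exactly one 1 (exactly one head).
one_head = [o for o in outcomes if o.count("1") == 1]

# Matching outcomes divided by total outcomes.
probability = len(one_head) / len(outcomes)

print(len(one_head))   # 10
print(probability)     # 0.009765625
```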

Random Variables and Probability Distributions
Above we considered the probability of a single result - that of getting exactly one head in a series of 10 tosses of a coin - and got a value that is less than one in a hundred.

We could continue this analysis and find the probabilities for getting 2 heads in a series of 10 tosses. The analysis would be more complicated, but here is the general pattern that we would follow:

Let us fix the position of one 1, and see how many possibilities there are for the other 1.
XXXXX XXXX1

00000 00011
00000 00101
00000 01001
...
10000 00001
There are nine of these patterns, since the 1 in the last position is fixed and the other 1 can be in any of the remaining nine positions. So these are nine outcomes.
Now we change the position of the fixed 1 to the next spot:

00000 00011
00000 00110
00000 01010
...
10000 00010
There are nine more of these patterns. However, notice that the pattern 00000 00011 has repeated.

And we can continue to shift the fixed 1 through all the 10 bit positions.
So we get 10 x 9 = 90 possible variations. But each pattern has occurred twice, so we get 90 / 2 = 45 distinct outcomes. So the probability of getting 2 heads in a series of 10 tosses is 45 / 1024 (about 0.044), which is 4.5 times the probability of getting exactly one head.
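The double counting can be seen directly in a small Python sketch. It lists every ordered choice of two different bit positions (the "fixed" 1 and the other 1), then throws away the order to get the distinct patterns. The variable names are illustrative.

```python
from itertools import permutations

positions = range(10)

# Ordered choices: the position of the "fixed" 1, then the position of the other 1.
ordered = list(permutations(positions, 2))

# Ignoring which 1 was "fixed" removes the double counting.
distinct = {frozenset(pair) for pair in ordered}

print(len(ordered))              # 90
print(len(distinct))             # 45
print(len(distinct) / 2 ** 10)   # 0.0439453125
```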

We continue this kind of counting to get the number of possible outcomes for each condition - 3 heads, 4 heads, 5 heads ...
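Rather than counting by hand, one way to get all of these counts is to let a short Python script go through the 1024 bit patterns and tally how many 1s each contains. This is a brute-force sketch, not a clever method.

```python
from collections import Counter

# For each of the 1024 patterns, count its 1s (heads) and tally the results.
counts = Counter(format(n, "010b").count("1") for n in range(2 ** 10))

for heads in range(11):
    print(heads, counts[heads])
```

The tallies for 0 to 10 heads come out as 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1, which add up to 1024.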

OK - so we are now in a position to define the random variable:
Let "X" represent the number of heads obtained, like 4 heads or 7 heads etc. What this means is that X can take whole-number values from 0 to 10. So if X has the value 7, it means that we are talking about getting 7 heads in a series of 10 tosses.

Because X can take on different values (all the way from 0 to 10), we call it a variable, and it can be plotted on a graph. We plot X on the horizontal (or x-) axis.

Then what can we plot on the y-axis? The probability values.

For a series of 10 tosses, the values for the probabilities work out to:
X=0 P=0.00098
X=1 P=0.0098
X=2 P=0.044
X=3 P=0.117
X=4 P=0.205
X=5 P=0.246
X=6 P=0.205
X=7 P=0.117
X=8 P=0.044
X=9 P=0.0098
X=10 P=0.00098
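These values can be recomputed with a few lines of Python, assuming a fair coin. The sketch uses math.comb (available from Python 3.8), which gives the number of ways to choose which of the 10 tosses come up heads; for 2 heads it returns the same 45 found by counting above.

```python
from math import comb

n = 10  # number of tosses

# Probability of k heads = (number of patterns with k ones) / (total patterns).
probabilities = {k: comb(n, k) / 2 ** n for k in range(n + 1)}

for k, p in probabilities.items():
    print(f"X={k:<2} P={p:.5f}")
```

The probabilities should add up to 1, which is a handy check on any table like this.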

If you plot these values on a graph in your notebook and join the dots with a smooth curve, you will get a curve close to the famous "bell-shaped" curve.

This graph is called the probability distribution for the series of 10 tosses of a coin. The distribution is also referred to as the binomial distribution.
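If no graph paper is at hand, a crude text version of the plot gives the same impression. This sketch assumes a fair coin and draws one '#' per percentage point (rounded), so the bars for X=0 and X=10 are too short to show.

```python
from math import comb

n = 10  # number of tosses

# One bar per value of X; bar length is the probability in percent, rounded.
bars = ["#" * round(100 * comb(n, k) / 2 ** n) for k in range(n + 1)]

for k, bar in enumerate(bars):
    print(f"{k:>2} {bar}")
```

The bars rise to a peak at X=5 and fall away symmetrically on both sides, which is the bell shape turned on its side.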

A Technicality (that can be ignored)
Now, what we called X would properly be called X-10 (actually, X with subscript 10) since it relates to a series of 10 tosses. We could also consider X-11, which represents a series of 11 tosses. And of course there are X-1, X-2, ..., X-10, and so on without end. We can represent all these X's by a single X which represents the outcomes of any series of coin tosses. This is properly called the random variable. However, there is a probability distribution associated with each X-i, and so what we have considered is technically correct.
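To see that each X-i has its own distribution, the calculation can be wrapped in a function that takes the number of tosses as a parameter. The name toss_distribution is hypothetical, and the sketch again assumes a fair coin and Python 3.8 or later for math.comb.

```python
from math import comb


def toss_distribution(num_tosses):
    """Return [P(0 heads), P(1 head), ..., P(num_tosses heads)] for a fair coin."""
    total = 2 ** num_tosses
    return [comb(num_tosses, heads) / total for heads in range(num_tosses + 1)]


print(toss_distribution(2))         # [0.25, 0.5, 0.25]
print(len(toss_distribution(11)))   # 12
```

Calling it with 10 gives the distribution discussed above; calling it with 11 gives the distribution for X-11, and so on.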