Two coins were each tossed 200 times, with the following results:
(A) 11110000000000100000101100000100000101001111001100 01111110010110110101101001111001100011011101100000 10001001111110100100001011001011101101110001010010 01100111111100011100101000101001110011100010100111
(B) 01110010010010100010011110010100010011010111001110 01111011010111101101001000111001101011010101101001 00101001110110100100001110101101101001110101100110 01110011110110001110011010111001110011110010100111
Which one comes from the unfair coin (or is a faked record)? In the animation below, x1 denotes the record of the first coin and x2 the record of the second coin. The plot in the middle shows 1000 simulated draws from the Binomial distribution with p = 0.5 and size = 1. The hypothesis test is equivalent to asking which plot looks more like the simulation.
Of course, we need a visual definition of “similarity” before we can compare. Suppose you were going to perform the test numerically: which statistic would you choose? For me, at least three options are available:
- Number of heads (or tails): if too many/few heads (or tails) show up, the coin might be unfair
- Maximum run length, i.e. the maximum number of successive 0s or 1s (e.g. coin A has a run of ten 0s); don’t take it for granted that ten successive 0s is a rare event in 200 tosses: the probability is not 0.5^10, because such a run can start at almost any position in the sequence; if the longest run is too long or too short, we may consider the coin unfair
- Number of changes from 0 to 1 or 1 to 0: if the coin changes too frequently from one side to the other side, it can be regarded as unusual too
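The three statistics above can be computed directly from the two records. The following is a minimal Python sketch, not the code behind the original animation; the helper names (`n_heads`, `max_run`, `n_changes`, `simulate_max_runs`) are made up for illustration, and 1 is assumed to mean heads. It also simulates fair coins to give a rough reference distribution for the maximum run length.

```python
import random

# The two records from the post, with the spaces between blocks removed.
coin_a = (
    "11110000000000100000101100000100000101001111001100"
    "01111110010110110101101001111001100011011101100000"
    "10001001111110100100001011001011101101110001010010"
    "01100111111100011100101000101001110011100010100111"
)
coin_b = (
    "01110010010010100010011110010100010011010111001110"
    "01111011010111101101001000111001101011010101101001"
    "00101001110110100100001110101101101001110101100110"
    "01110011110110001110011010111001110011110010100111"
)


def n_heads(seq):
    """Number of 1s (assumed here to mean heads)."""
    return seq.count("1")


def max_run(seq):
    """Length of the longest run of identical symbols."""
    longest = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def n_changes(seq):
    """Number of switches from 0 to 1 or from 1 to 0."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def simulate_max_runs(n_tosses=200, n_sims=2000, seed=1):
    """Maximum run lengths of simulated fair-coin sequences."""
    rng = random.Random(seed)
    return [
        max_run("".join(rng.choice("01") for _ in range(n_tosses)))
        for _ in range(n_sims)
    ]


reference = simulate_max_runs()
for name, seq in (("A", coin_a), ("B", coin_b)):
    run = max_run(seq)
    # Share of simulated fair sequences whose longest run is at least /
    # at most as long as the observed one.
    at_least = sum(r >= run for r in reference) / len(reference)
    at_most = sum(r <= run for r in reference) / len(reference)
    print(name, n_heads(seq), run, n_changes(seq), at_least, at_most)
```

Comparing each observed statistic with its simulated distribution is the numerical counterpart of comparing the observed plot with the reference plot.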
Accordingly, we can present these statistics in a visual way. Plot the observed sequences and a simulated sequence as a reference, and compare observed graphs with the reference to see which one is unusual:
- How many points are at the top (equivalently, at the bottom)
- Length of the longest horizontal segment
- Density of vertical lines
Now watch the Flash animation below:
Is it clear enough to observe the interaction between z and x from these bubble plots?
If there is no interaction between x and z, the size of the bubbles increases at the same rate along one variable no matter where the other is fixed.
But when an interaction exists, the rate of increase differs from one fixed value to another.
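A small numerical sketch makes the point concrete. The response function below is purely hypothetical (the coefficients 1, 2 and 3 and the `interaction` parameter are assumptions for illustration, not taken from the plots): it measures how much the bubble size grows when x goes from 0 to 1 while z is held fixed.

```python
def bubble_size(x, z, interaction=0.0):
    """Hypothetical response: additive in x and z, plus an optional x*z term."""
    return 1.0 + 2.0 * x + 3.0 * z + interaction * x * z


def rate_in_x(z, interaction):
    """Change in bubble size when x moves from 0 to 1 with z held fixed."""
    return bubble_size(1.0, z, interaction) - bubble_size(0.0, z, interaction)


for z in (0.0, 1.0, 2.0):
    # Second column: no interaction; third column: with an x*z term.
    print(z, rate_in_x(z, 0.0), rate_in_x(z, 1.5))
```

Without the x*z term the rate is the same for every z; with it, the rate changes as z changes, which is what the bubble plot shows visually.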
The other day I sent a small assignment to a group of people so that they could “play” with statistics and become more interested in the subject. The data provided to them were:
Download the file here (104K)
The data-generating process was quite simple: first I generated 20000 random numbers (10000 rows, 2 columns) from N(0, 1), and then added 10000 rows of numbers which lie exactly on a circle; finally, I provided the data in a randomized order so that people could not easily discover the pattern just from the numbers.
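The process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that produced the file: the circle's radius and centre are not given in the post, so a radius of 0.5 centred at the origin is assumed, and the function name `make_sand` and the seed are made up.

```python
import math
import random


def make_sand(n_noise=10000, n_circle=10000, radius=0.5, seed=2024):
    """Mix N(0, 1) noise with points on a circle and shuffle the rows.

    The radius (0.5) and the centre (origin) are assumptions; the post
    does not state them.
    """
    rng = random.Random(seed)
    # 10000 rows, 2 columns of independent standard normal numbers.
    noise = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_noise)]
    # 10000 rows lying exactly on the circle, at uniformly random angles.
    circle = []
    for _ in range(n_circle):
        theta = rng.uniform(0, 2 * math.pi)
        circle.append((radius * math.cos(theta), radius * math.sin(theta)))
    rows = noise + circle
    rng.shuffle(rows)  # hide the pattern in the row order
    return rows


rows = make_sand()
# Count rows whose distance from the origin equals the radius
# (up to floating-point error).
on_circle = sum(abs(math.hypot(x, y) - 0.5) < 1e-9 for x, y in rows)
print(len(rows), on_circle)
```

In a plain scatter plot of these rows, the circle is likely to be buried under the overplotted noise if the radius is small relative to the spread of the normal points.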
The question is, how to reveal the particular pattern in this “pile of sand”? Let’s look at the original plot:

The original scatter plot
What can we observe from this scatter plot? Perhaps nothing but “a pile of sand”. However, if we choose alternative ways to create the plot again, things will be completely different. Here are my approaches:

