Sep 06, 2008

The other day I sent a small assignment to a group of people so that they could “play” with statistics and become more interested in the subject. The data provided to them:

Download the file here (104K)

The data-generating process was quite simple: first I generated 20000 random numbers (10000 rows, 2 columns) from N(0, 1), then added 10000 rows of points which lie exactly on a circle; finally, I provided the data in a randomized order so that people could not easily discover the pattern just from the numbers.
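For readers who want to reproduce a similar data set, here is a minimal sketch of that process in Python. It is not the code used to produce the file above: the circle radius, the uniformly drawn angles, the seed, and the function name `make_sand_data` are all assumptions made for illustration.

```python
import math
import random


def make_sand_data(n_noise=10000, n_circle=10000, radius=1.0, seed=0):
    """Return shuffled (x, y) rows: Gaussian noise plus points exactly on a circle.

    The radius, seed, and uniform angles are assumed values, not taken from the post.
    """
    rng = random.Random(seed)

    # Noise: n_noise rows, 2 columns, each entry drawn from N(0, 1).
    noise = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_noise)]

    # Circle: points centred at the origin with the assumed radius.
    circle = []
    for _ in range(n_circle):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        circle.append((radius * math.cos(theta), radius * math.sin(theta)))

    # Randomize the row order so the two groups are not separated in the file.
    rows = noise + circle
    rng.shuffle(rows)
    return rows


rows = make_sand_data()

# Count rows whose distance from the origin equals the assumed radius
# (up to floating-point error).
on_circle = sum(1 for x, y in rows if abs(math.hypot(x, y) - 1.0) < 1e-9)
print(len(rows), on_circle)
```

Because the rows are shuffled, nothing in the raw numbers separates the two groups; the final count only works here because the radius is known in advance.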

The question is, how to reveal the particular pattern in this “pile of sand”? Let’s look at the original plot:

The original scatter plot


What can we observe from this scatter plot? Perhaps nothing but “a pile of sand”. However, if we draw the plot again in other ways, the picture changes completely. Here are my approaches:

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie