Statistics
Numbers that can either reveal truth or tell lies
HTH and HTT respectively? (For example, for the sequence HHTH, the number for HTH to appear is 4, and in THTHTT, the number for HTT is 6.)
It seems that the two results are equivalent, as H and T occurs with equal probability 0.5, so we naturally believe the average numbers of steps to HTH and HTT are the same, but the fact is not as we imagined.
## smart guys use math formulae to solve the problem,
## but *lazy* guys like me use simulations with R
coin.seq = function(v) {
x = NULL
n = 0
while (!identical(x, v)) {
x = append(x[length(x) - 1:0], rbinom(1, 1, 0.5))
n = n + 1
}
n
}
set.seed(919)
mean(htt <- replicate(1e+05, coin.seq(c(1, 0, 0))))
# [1] 8.00304
mean(hth <- replicate(1e+05, coin.seq(c(1, 0, 1))))
# [1] 10.0062
png("coin-htt-hth.png", height = 150, width = 500)
par(mar = c(3, 2.5, 0.1, 0), mgp = c(2, 0.8, 0))
boxplot(list(HTT = htt, HTH = hth), horizontal = T,
xlab = "n", ylim = range(boxplot(list(HTT = htt, HTH = hth),
plot = FALSE)$stats))
points(c(mean(htt), mean(hth)), 1:2, pch = 19)
dev.off()
The answer is counterintuitive, isn’t it?
Well, mathematicians certainly do not like my solution (I guess they even hate such an imprecise approach). I hope some smart guys can give me some hints on working out the probability distribution and hence the expectation.
Is it clear enough to observe the interaction between z and x from these bubble plots?
If there is no interaction between x and z, the size of bubbles will increase at the same rate while either x or z is fixed.
But when interaction exists, the rate of increasing will be different.
There was something wrong with the wheel of my mouse these days: it could not control the scrolling of a page very well. Today I found a screwdriver and opened the mouse. Then I knew the reason: there was a lot of dirt beneath the cover!
However, obviously the dirt would not follow a B(n, p = 0.5) distribution. The two proportions of the dirt in front and back of the wheel were not equal — there was much more dirt in the back. Why?
The Glivenko-Cantelli Theorem tells us that the empirical distribution converges to the true distribution almost surely.
So what? The proportions of the dirt has revealed my habit of using the mouse, as I have used it for too many times. What I usually do is to scroll down a page, and rarely will I scroll up — that is the real distribution, and the dirt in the mouse has recorded the empirical distribution day by day. If I am serious enough, I can get the numbers of the weight and estimate the parameter p in the Binomial distribution. (Perhaps Michal will again suggest me do this.
)
I just found this page for the book "Handbook of Computational Statistics" by chance:
http://math.u-bourgogne.fr/monge/bibliotheque/ebooks/csa/start.html
$300 is too expensive for me…
Personally I don’t believe structural equation models (SEM) at all. I have been fighting against this modeling technique for quite a long time, but it seems that SEM is becoming more and more popular outside the discipline of statistics. It’s indeed strange. Yesterday I received an email asking me about the SEM, and I could not help being angry at this model to some extent. As a result, I listed some reasons for my objection to such a kind of modeling in my reply as follows:
Hi,
The most obvious disadvantage, I believe, just lies in the critical modeling technique it adopted at the beginning: constructing models to fit the sample COVARIANCE instead of the sample values themselves. I insist that such an idea has almost destroyed our valuable data. You know, the covariance is just one of the TOO MANY characteristics of the data, therefore if we only focus on this simple information (covariance), other information will be dropped (e.g. mean, kurtosis, skewness, …). And a very natural question is, what does the covariance stand for? Can it represent all of the original information? The answer is certainly NO!
The second disadvantage in my eyes is the complexity of this model. In statistics, rarely have I ever met models more complex than SEMs. Even the simplest SEMs will include tens of parameters, as there are several parameter matrices in a SEM. The direct consequence it brings is the computational complexity. You can easily calculate the minimum of f(x)=x^2+1, but do you know how to calculate the minimum of f(a,b,c,d,e,f,g,…)=a*b^4/(2*c+d^14*a)-f/g*c^d+…? Actually the target function for SEM to optimize is much more complex than this one! It involves with the multiplication, inverse and determinant of huge matrices… Just tell me, can you trust the software to correctly find the GLOBAL optimum for such a complex function? Personally I cannot, as I know there are too many “stories” behind this problem. It’s very likely that your software only tells you a LOCAL optimum WITHOUT warnings.
The third reason comes from a philosophy: you may regard SEM simply as a process of hypothesis testing. I think you surely know the null hypothesis. In the end, you can ONLY REJECT your model but you can NEVER accept it (or say, ah, my model is correct!). In other words, you can never find the truth, although there are many measures (Chi-square, GFI, …) to tell you how “good” you models are. The basic philosophy of hypothesis testing in statistics is that null hypothesis can only be rejected (because in most cases you can only know the risk of rejecting; you cannot know the risk of accepting it). If we are unable to accept a SEM, why bother to construct it? (If you declare this SEM is correct, other people can naturally doubt whether there are other alternative models.)
I spent about only half a year on SEM, so my words might be incorrect. My advice is never believe anyone (including me) unless you really understand it, especially the mathematical and statistical theories! I really hate those textbooks avoiding maths, because to some degree, they are just cheating the readers by hiding the most important part of a technique.
There are still a couple of reasons but I don’t have enough time to list all of them out here.
You may judge from my above words that I have been fighting against this model for quite a long time. Once a statistician said, “All models are wrong, but some models are useful”, and I’d like to say that SEM is surely not among these “some models”. I’d be very glad to hear from you about the “successful” applications of SEM if you have any cases in this aspect. I’ve inquired such cases of many people who wrote me emails asking about the SEM but I have not got any till today, which ensured me even more of my impression that SEM is a useless technique.
Regards,
Yihui
Of course it is NOT the obligation for statistics (or statisticians) to draw conclusions.
Statistics helps us to understand facts (approximately, in most cases), and the result might be "insignificant" or "unstable", etc. I believe most people hate such results, as they cannot draw conclusions as usual.
A stranger sent me an email asking about an unstable result from his/her K-Means cluster analysis, and this guy wanted to know a "good" method to choose the initial centers so that the cluster membership would be stable, besides, he/she also wondered which result was "correct", as there were many results according to different initial centers.
My answer to such a kind of questions is just the title for this blog. No matter how eagerly you want a "beautiful" result, please make sure you understand statistics first. "Insignificant" or "unstable" results are ALSO results, although they look different from what you read in most papers or textbooks. "Unstable" cluster membership just indicated that individuals in your sample were actually NOT clustered from the viewpoint of K-Means cluster algorithm. Who can deny this is NOT also a conclusion?
There are some scholars making research on “indices” this year, and a few of them are my teachers. The day before yesterday I saw there was a press conference about a development index of China, which triggered one of my old ideas on the function of statistics (analysis). I believe two conditions will make the work of statistics meaningless:
- If we do nothing but just rank our samples; i.e. which is the first, and which is the last…
- If our conclusions are “axioms“; i.e. anybody can know our conclusions without statistical analysis.
Unfortunately, what I read from the news seemed to satisfy these two conditions.
I used to try hard to remember the pronunciations of English words by looking up the phonetic symbols over and over again in a dictionary, which was indeed quite inefficient. This evening I read a post about learning English, and one of the viewpoints of the author was that one should remember the pronunciations via listening instead of phonetic symbols. I believed he was right after I corrected one of the confusing pronunciations of one word “specify” ([e] or for the “e”) from the VOA radio programs.
Today I read a Chinese paper about the “Function Data Analysis” and I was greatly amazed at what he described in the paper. Currently I’m not familiar with functional data at all, but he told us such a kind of “data” was just the result of applying a (some) smoothing function(s) to the original discrete observations, so the sample points became continuous (actually they became functions). These smoothing functions might either be Fourier transformations or B-splines.
I wonder whether there are some rules about the choice of smoothing functions, because if there aren’t any, the functional data will be rather free, and I cannot believe such a kind of data can really represent information behind these discrete data points: who knows what happens between two observations?! Only my naive ideas…
Yesterday I read a quotation from "Quotes of the Day" which by accident expressed a rule for statistical modelling:
… "I never guess. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."…
–Sir Arthur Conan Doyle
This is interesting. It just indicated a rule of ethics: be honest and don't alter your data to "fit" your "marvelous" models.



Recent Comments