Theories
Math, formulae, pages of Greek letters
We know the real distribution of the F statistic in linear models — it is a non-central F distribution. Under H0, we have a central F distribution. Given 1 – α, we can compute the probability of (correctly) rejecting H0. I created a simple demo to illustrate how the power changes as other parameters vary, e.g. the degrees of freedoms, the non-central parameter and alpha. Here is the video:
And for those who might be interested, here is the code (you need to install the gWidgets package first and I recommend the RGtk2 interface). Have fun:
animation package. I’ve finished writing the initial version of the function sample.ratio() for this package, which will appear in the version 1.1-2 a couple of days later.
As we know, the benefit of ratio estimation is that sampling skewness may be adjusted for, because the estimation of will make use of the information in the relationship of X and Y:
. Here is a demo (we can see the ratio estimate, denoted by the red line, generally performs better than
):
Here are some (trivial) R tips in the course Stat 511. I’ll update this post till the semester is over.
-
Formatting R Code
I’ve submitted an R package named
formatR to CRAN yesterday. This package should be easier than the code below, because there is a GUI to tidy your R code. Install with install.packages('formatR').
Reading code is pain, but the well-formatted code might alleviate the pain a little bit. The function tidy.source() in the animation package can help us format our R code automatically. By default it will read your code in the clipboard, parse it and return the well-formatted code. You have options to keep or remove the comments/blank lines and set the width of the code, etc. Spaces and indent will be added automatically. This can save us time typing spaces and paying attention to indent.
## install.packages('animation') if it is not installed yet
library(animation)
## copy some R code somewhere and type:
tidy.source()
## or specify the path of your code file
tidy.source(file.path(system.file(package = "graphics"), "demo", "image.R"))
## can also use a URL
tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R')
## remove blank lines
tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
keep.blank.line = FALSE)
## remove comments
tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
keep.comment = FALSE)
HTH and HTT respectively? (For example, for the sequence HHTH, the number for HTH to appear is 4, and in THTHTT, the number for HTT is 6.)
It seems that the two results are equivalent, as H and T occurs with equal probability 0.5, so we naturally believe the average numbers of steps to HTH and HTT are the same, but the fact is not as we imagined.
## smart guys use math formulae to solve the problem,
## but *lazy* guys like me use simulations with R
coin.seq = function(v) {
x = NULL
n = 0
while (!identical(x, v)) {
x = append(x[length(x) - 1:0], rbinom(1, 1, 0.5))
n = n + 1
}
n
}
set.seed(919)
mean(htt <- replicate(1e+05, coin.seq(c(1, 0, 0))))
# [1] 8.00304
mean(hth <- replicate(1e+05, coin.seq(c(1, 0, 1))))
# [1] 10.0062
png("coin-htt-hth.png", height = 150, width = 500)
par(mar = c(3, 2.5, 0.1, 0), mgp = c(2, 0.8, 0))
boxplot(list(HTT = htt, HTH = hth), horizontal = T,
xlab = "n", ylim = range(boxplot(list(HTT = htt, HTH = hth),
plot = FALSE)$stats))
points(c(mean(htt), mean(hth)), 1:2, pch = 19)
dev.off()
The answer is counterintuitive, isn’t it?
Well, mathematicians certainly do not like my solution (I guess they even hate such an imprecise approach). I hope some smart guys can give me some hints on working out the probability distribution and hence the expectation.
Is it clear enough to observe the interaction between z and x from these bubble plots?
If there is no interaction between x and z, the size of bubbles will increase at the same rate while either x or z is fixed.
But when interaction exists, the rate of increasing will be different.
There was something wrong with the wheel of my mouse these days: it could not control the scrolling of a page very well. Today I found a screwdriver and opened the mouse. Then I knew the reason: there was a lot of dirt beneath the cover!
However, obviously the dirt would not follow a B(n, p = 0.5) distribution. The two proportions of the dirt in front and back of the wheel were not equal — there was much more dirt in the back. Why?
The Glivenko-Cantelli Theorem tells us that the empirical distribution converges to the true distribution almost surely.
So what? The proportions of the dirt has revealed my habit of using the mouse, as I have used it for too many times. What I usually do is to scroll down a page, and rarely will I scroll up — that is the real distribution, and the dirt in the mouse has recorded the empirical distribution day by day. If I am serious enough, I can get the numbers of the weight and estimate the parameter p in the Binomial distribution. (Perhaps Michal will again suggest me do this.
)
I just found this page for the book "Handbook of Computational Statistics" by chance:
http://math.u-bourgogne.fr/monge/bibliotheque/ebooks/csa/start.html
$300 is too expensive for me…
Personally I don’t believe structural equation models (SEM) at all. I have been fighting against this modeling technique for quite a long time, but it seems that SEM is becoming more and more popular outside the discipline of statistics. It’s indeed strange. Yesterday I received an email asking me about the SEM, and I could not help being angry at this model to some extent. As a result, I listed some reasons for my objection to such a kind of modeling in my reply as follows:
Hi,
The most obvious disadvantage, I believe, just lies in the critical modeling technique it adopted at the beginning: constructing models to fit the sample COVARIANCE instead of the sample values themselves. I insist that such an idea has almost destroyed our valuable data. You know, the covariance is just one of the TOO MANY characteristics of the data, therefore if we only focus on this simple information (covariance), other information will be dropped (e.g. mean, kurtosis, skewness, …). And a very natural question is, what does the covariance stand for? Can it represent all of the original information? The answer is certainly NO!
The second disadvantage in my eyes is the complexity of this model. In statistics, rarely have I ever met models more complex than SEMs. Even the simplest SEMs will include tens of parameters, as there are several parameter matrices in a SEM. The direct consequence it brings is the computational complexity. You can easily calculate the minimum of f(x)=x^2+1, but do you know how to calculate the minimum of f(a,b,c,d,e,f,g,…)=a*b^4/(2*c+d^14*a)-f/g*c^d+…? Actually the target function for SEM to optimize is much more complex than this one! It involves with the multiplication, inverse and determinant of huge matrices… Just tell me, can you trust the software to correctly find the GLOBAL optimum for such a complex function? Personally I cannot, as I know there are too many “stories” behind this problem. It’s very likely that your software only tells you a LOCAL optimum WITHOUT warnings.
The third reason comes from a philosophy: you may regard SEM simply as a process of hypothesis testing. I think you surely know the null hypothesis. In the end, you can ONLY REJECT your model but you can NEVER accept it (or say, ah, my model is correct!). In other words, you can never find the truth, although there are many measures (Chi-square, GFI, …) to tell you how “good” you models are. The basic philosophy of hypothesis testing in statistics is that null hypothesis can only be rejected (because in most cases you can only know the risk of rejecting; you cannot know the risk of accepting it). If we are unable to accept a SEM, why bother to construct it? (If you declare this SEM is correct, other people can naturally doubt whether there are other alternative models.)
I spent about only half a year on SEM, so my words might be incorrect. My advice is never believe anyone (including me) unless you really understand it, especially the mathematical and statistical theories! I really hate those textbooks avoiding maths, because to some degree, they are just cheating the readers by hiding the most important part of a technique.
There are still a couple of reasons but I don’t have enough time to list all of them out here.
You may judge from my above words that I have been fighting against this model for quite a long time. Once a statistician said, “All models are wrong, but some models are useful”, and I’d like to say that SEM is surely not among these “some models”. I’d be very glad to hear from you about the “successful” applications of SEM if you have any cases in this aspect. I’ve inquired such cases of many people who wrote me emails asking about the SEM but I have not got any till today, which ensured me even more of my impression that SEM is a useless technique.
Regards,
Yihui
Today I read a Chinese paper about the “Function Data Analysis” and I was greatly amazed at what he described in the paper. Currently I’m not familiar with functional data at all, but he told us such a kind of “data” was just the result of applying a (some) smoothing function(s) to the original discrete observations, so the sample points became continuous (actually they became functions). These smoothing functions might either be Fourier transformations or B-splines.
I wonder whether there are some rules about the choice of smoothing functions, because if there aren’t any, the functional data will be rather free, and I cannot believe such a kind of data can really represent information behind these discrete data points: who knows what happens between two observations?! Only my naive ideas…
During the class of this evening, Dr MA mentioned his experiences about the Arcing algorithm, among which he said that a few years ago he realized a fact that what the computer does in machine learning is just simulating the process in the brain of the human being.
I strongly agree with him in this point, because I always make computer programs by myself to finish some computations or algorithms; I know the difference between the humankind and the computers. Sometimes we don’t need iterations in our brain to classify sample points by some straight lines or curves (or hyperplanes), but computers do need!
Computers seem to be “stupid”, but they “don’t care”; we just need to “teach” them. For machine learning, what we do is to teach machines to learn from their “mistakes” (say, misclassfication). Both Ada-Boost and Arcing are such kind of algorithms.




Recent Comments