Theories

Math, formulae, pages of Greek letters

Apr 032010

We know the real distribution of the F statistic in linear models — it is a non-central F distribution. Under H0, we have a central F distribution. Given 1 – α, we can compute the probability of (correctly) rejecting H0. I created a simple demo to illustrate how the power changes as other parameters vary, e.g. the degrees of freedoms, the non-central parameter and alpha. Here is the video:

The Power of F Test

And for those who might be interested, here is the code (you need to install the gWidgets package first and I recommend the RGtk2 interface). Have fun:

Mar 242010
Amber Watkins gave me a suggestion on the animation for the ratio estimation, and I think this is a good topic for my animation package. I’ve finished writing the initial version of the function sample.ratio() for this package, which will appear in the version 1.1-2 a couple of days later.

As we know, the benefit of ratio estimation is that sampling skewness may be adjusted for, because the estimation of \bar{Y} will make use of the information in the relationship of X and Y: \bar{X} \cdot (\bar{y}/\bar{x}). Here is a demo (we can see the ratio estimate, denoted by the red line, generally performs better than \bar{y}):

An animation demo for the ratio estimation

An animation demo for the ratio estimation

Mar 232010

Here are some (trivial) R tips in the course Stat 511. I’ll update this post till the semester is over.

  1. Formatting R Code

  2. I’ve submitted an R package named formatR to CRAN yesterday. This package should be easier than the code below, because there is a GUI to tidy your R code. Install with install.packages('formatR').

    Reading code is pain, but the well-formatted code might alleviate the pain a little bit. The function tidy.source() in the animation package can help us format our R code automatically. By default it will read your code in the clipboard, parse it and return the well-formatted code. You have options to keep or remove the comments/blank lines and set the width of the code, etc. Spaces and indent will be added automatically. This can save us time typing spaces and paying attention to indent.

    ## install.packages('animation') if it is not installed yet
    library(animation)
    ## copy some R code somewhere and type:
    tidy.source()
    ## or specify the path of your code file
    tidy.source(file.path(system.file(package = "graphics"), "demo", "image.R"))
    ## can also use a URL
    tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R')
    ## remove blank lines
    tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
               keep.blank.line = FALSE)
    ## remove comments
    tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
               keep.comment = FALSE)
    
Aug 292009
Coin-flipping is a rather old topic in probability theory, so most of us think we know very well about it, however, the other day I saw a question about this old topic (in David Smith’s REvolution?) which was beyond me expectation: how many times do we need to flip the coin until we get a sequence of HTH and HTT respectively? (For example, for the sequence HHTH, the number for HTH to appear is 4, and in THTHTT, the number for HTT is 6.)

It seems that the two results are equivalent, as H and T occurs with equal probability 0.5, so we naturally believe the average numbers of steps to HTH and HTT are the same, but the fact is not as we imagined.

## smart guys use math formulae to solve the problem,
## but *lazy* guys like me use simulations with R
coin.seq = function(v) {
    x = NULL
    n = 0
    while (!identical(x, v)) {
        x = append(x[length(x) - 1:0], rbinom(1, 1, 0.5))
        n = n + 1
    }
    n
}
set.seed(919)
mean(htt <- replicate(1e+05, coin.seq(c(1, 0, 0))))
# [1] 8.00304
mean(hth <- replicate(1e+05, coin.seq(c(1, 0, 1))))
# [1] 10.0062

png("coin-htt-hth.png", height = 150, width = 500)
par(mar = c(3, 2.5, 0.1, 0), mgp = c(2, 0.8, 0))
boxplot(list(HTT = htt, HTH = hth), horizontal = T,
    xlab = "n", ylim = range(boxplot(list(HTT = htt, HTH = hth),
        plot = FALSE)$stats))
points(c(mean(htt), mean(hth)), 1:2, pch = 19)
dev.off()

The answer is counterintuitive, isn’t it?

Number of times needed to get HTT and HTH

Number of times needed to get HTT and HTH (bold segments are for median; dots denote mean)

Well, mathematicians certainly do not like my solution (I guess they even hate such an imprecise approach). I hope some smart guys can give me some hints on working out the probability distribution and hence the expectation.

Sep 182008

Is it clear enough to observe the interaction between z and x from these bubble plots?

No interaction

If there is no interaction between x and z, the size of bubbles will increase at the same rate while either x or z is fixed.

Positive Interaction

But when interaction exists, the rate of increasing will be different.

Jun 152008

There was something wrong with the wheel of my mouse these days: it could not control the scrolling of a page very well. Today I found a screwdriver and opened the mouse. Then I knew the reason: there was a lot of dirt beneath the cover!

However, obviously the dirt would not follow a B(n, p = 0.5) distribution. The two proportions of the dirt in front and back of the wheel were not equal — there was much more dirt in the back. Why?

The Glivenko-Cantelli Theorem tells us that the empirical distribution converges to the true distribution almost surely.

So what? The proportions of the dirt has revealed my habit of using the mouse, as I have used it for too many times. What I usually do is to scroll down a page, and rarely will I scroll up — that is the real distribution, and the dirt in the mouse has recorded the empirical distribution day by day. If I am serious enough, I can get the numbers of the weight and estimate the parameter p in the Binomial distribution. (Perhaps Michal will again suggest me do this. :grin: )

May 132008

I just found this page for the book "Handbook of Computational Statistics" by chance:

http://math.u-bourgogne.fr/monge/bibliotheque/ebooks/csa/start.html

$300 is too expensive for me… 

Jan 112008

Personally I don’t believe structural equation models (SEM) at all. I have been fighting against this modeling technique for quite a long time, but it seems that SEM is becoming more and more popular outside the discipline of statistics. It’s indeed strange. Yesterday I received an email asking me about the SEM, and I could not help being angry at this model to some extent. As a result, I listed some reasons for my objection to such a kind of modeling in my reply as follows:

Hi,

The most obvious disadvantage, I believe, just lies in the critical modeling technique it adopted at the beginning: constructing models to fit the sample COVARIANCE instead of the sample values themselves. I insist that such an idea has almost destroyed our valuable data. You know, the covariance is just one of the TOO MANY characteristics of the data, therefore if we only focus on this simple information (covariance), other information will be dropped (e.g. mean, kurtosis, skewness, …). And a very natural question is, what does the covariance stand for? Can it represent all of the original information? The answer is certainly NO!

The second disadvantage in my eyes is the complexity of this model. In statistics, rarely have I ever met models more complex than SEMs. Even the simplest SEMs will include tens of parameters, as there are several parameter matrices in a SEM. The direct consequence it brings is the computational complexity. You can easily calculate the minimum of f(x)=x^2+1, but do you know how to calculate the minimum of f(a,b,c,d,e,f,g,…)=a*b^4/(2*c+d^14*a)-f/g*c^d+…? Actually the target function for SEM to optimize is much more complex than this one! It involves with the multiplication, inverse and determinant of huge matrices… Just tell me, can you trust the software to correctly find the GLOBAL optimum for such a complex function? Personally I cannot, as I know there are too many “stories” behind this problem. It’s very likely that your software only tells you a LOCAL optimum WITHOUT warnings.

The third reason comes from a philosophy: you may regard SEM simply as a process of hypothesis testing. I think you surely know the null hypothesis. In the end, you can ONLY REJECT your model but you can NEVER accept it (or say, ah, my model is correct!). In other words, you can never find the truth, although there are many measures (Chi-square, GFI, …) to tell you how “good” you models are. The basic philosophy of hypothesis testing in statistics is that null hypothesis can only be rejected (because in most cases you can only know the risk of rejecting; you cannot know the risk of accepting it). If we are unable to accept a SEM, why bother to construct it? (If you declare this SEM is correct, other people can naturally doubt whether there are other alternative models.)

I spent about only half a year on SEM, so my words might be incorrect. My advice is never believe anyone (including me) unless you really understand it, especially the mathematical and statistical theories! I really hate those textbooks avoiding maths, because to some degree, they are just cheating the readers by hiding the most important part of a technique.

There are still a couple of reasons but I don’t have enough time to list all of them out here.

You may judge from my above words that I have been fighting against this model for quite a long time. Once a statistician said, “All models are wrong, but some models are useful”, and I’d like to say that SEM is surely not among these “some models”. I’d be very glad to hear from you about the “successful” applications of SEM if you have any cases in this aspect. I’ve inquired such cases of many people who wrote me emails asking about the SEM but I have not got any till today, which ensured me even more of my impression that SEM is a useless technique.

Regards,
Yihui

Dec 202007

Today I read a Chinese paper about the “Function Data Analysis” and I was greatly amazed at what he described in the paper. Currently I’m not familiar with functional data at all, but he told us such a kind of “data” was just the result of applying a (some) smoothing function(s) to the original discrete observations, so the sample points became continuous (actually they became functions). These smoothing functions might either be Fourier transformations or B-splines.

I wonder whether there are some rules about the choice of smoothing functions, because if there aren’t any, the functional data will be rather free, and I cannot believe such a kind of data can really represent information behind these discrete data points: who knows what happens between two observations?! Only my naive ideas…

Oct 292007

During the class of this evening, Dr MA mentioned his experiences about the Arcing algorithm, among which he said that a few years ago he realized a fact that what the computer does in machine learning is just simulating the process in the brain of the human being.

I strongly agree with him in this point, because I always make computer programs by myself to finish some computations or algorithms; I know the difference between the humankind and the computers. Sometimes we don’t need iterations in our brain to classify sample points by some straight lines or curves (or hyperplanes), but computers do need!

Computers seem to be “stupid”, but they “don’t care”; we just need to “teach” them. For machine learning, what we do is to teach machines to learn from their “mistakes” (say, misclassfication). Both Ada-Boost and Arcing are such kind of algorithms.

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie