<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Statistics, R, Graphics and Fun &#187; Strip Chart</title>
	<atom:link href="http://yihui.name/en/tag/strip-chart/feed/" rel="self" type="application/rss+xml" />
	<link>http://yihui.name/en</link>
	<description>Yihui XIE</description>
	<lastBuildDate>Thu, 26 Aug 2010 03:32:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>R Tips in Stat 511</title>
		<link>http://yihui.name/en/2010/03/r-tips-in-stat-511/</link>
		<comments>http://yihui.name/en/2010/03/r-tips-in-stat-511/#comments</comments>
		<pubDate>Tue, 23 Mar 2010 05:03:57 +0000</pubDate>
		<dc:creator>Yihui Xie</dc:creator>
				<category><![CDATA[R Computation]]></category>
		<category><![CDATA[R Graphics]]></category>
		<category><![CDATA[R Programming]]></category>
		<category><![CDATA[Theories]]></category>
		<category><![CDATA[F distribution]]></category>
		<category><![CDATA[Format]]></category>
		<category><![CDATA[formatR]]></category>
		<category><![CDATA[fractions()]]></category>
		<category><![CDATA[Hypothesis Test]]></category>
		<category><![CDATA[jitter()]]></category>
		<category><![CDATA[Linear Models]]></category>
		<category><![CDATA[Power]]></category>
		<category><![CDATA[R code]]></category>
		<category><![CDATA[read.table()]]></category>
		<category><![CDATA[Stat 511]]></category>
		<category><![CDATA[Strip Chart]]></category>
		<category><![CDATA[stripchart()]]></category>
		<category><![CDATA[tidy.source()]]></category>
		<category><![CDATA[unname()]]></category>

		<guid isPermaLink="false">http://yihui.name/en/?p=458</guid>
		<description><![CDATA[Here are some (trivial) R tips in the course Stat 511. I&#8217;ll update this post till the semester is over. Formatting R Code Reading code is a pain, but well-formatted code might alleviate the pain a little. The function tidy.source() in the animation package can help us format our R code automatically. By default [...]]]></description>
			<content:encoded><![CDATA[<p>Here are some (trivial) R tips in the course Stat 511. I&#8217;ll update this post till the semester is over.</p>
<ol>
<li>
<h2>Formatting R Code</h2>
</li>
<span class="notice">I&#8217;ve submitted an R package named <code>formatR</code> to CRAN yesterday. This package should be easier than the code below, because there is a GUI to tidy your R code. Install with <code>install.packages('formatR')</code>.</span>
<p>Reading code is pain, but the well-formatted code might alleviate the pain a little bit. The function <code>tidy.source()</code> in the <code>animation</code> package can help us format our R code automatically. By default it will read your code in the clipboard, parse it and return the well-formatted code. You have options to keep or remove the comments/blank lines and set the width of the code, etc. Spaces and indent will be added automatically. This can save us time typing spaces and paying attention to indent.</p>
<pre>## install.packages('animation') if it is not installed yet
library(animation)
## copy some R code somewhere and type:
tidy.source()
## or specify the path of your code file
tidy.source(file.path(system.file(package = "graphics"), "demo", "image.R"))
## can also use a URL
tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R')
## remove blank lines
tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
           keep.blank.line = FALSE)
## remove comments
tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
           keep.comment = FALSE)
</pre>
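<p>With the <code>formatR</code> package mentioned in the notice above, the same tidying can be applied to a character string instead of the clipboard. This is only a minimal sketch: it assumes a recent <code>formatR</code> release, in which the function is spelled <code>tidy_source()</code> and accepts a <code>text</code> argument, and the <code>messy</code> string is made up for illustration.</p>
<pre>## assumes a recent formatR, where the function is called tidy_source()
library(formatR)
## a deliberately cramped one-liner (hypothetical example code)
messy = "f=function(x){if(x==1)'one' else 'other'}"
## output = FALSE: do not print, just return the result invisibly
res = tidy_source(text = messy, output = FALSE)
## the reformatted code is stored in the text.tidy element
cat(res$text.tidy, sep = "\n")
</pre>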
<p><span id="more-458"></span></p>
<li>
<h2>Approximating Rationals by Fractions</h2>
<p>We often deal with matrices like <img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Creverse%20C%28X%27X%29%5E%7B-1%7DX%27" title="C(X&#039;X)^{-1}X&#039;" alt="C(X&#039;X)^{-1}X&#039;" align="absmiddle" /> in 511 and may wonder what on earth they are. If we compute <code>solve(t(X)%*%X)%*%t(X)</code> directly (or use the generalized inverse <code>ginv()</code> in <code>MASS</code>), we often end up with a lot of decimals, which makes it difficult to see what these numbers really mean. The function <code>fractions()</code> in the <code>MASS</code> package can approximate such decimals by rational fractions. For example:</p>
<pre>## from the movie rating example
X = matrix(c(1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
    0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1,
    1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
    0, 1, 0, 1, 0), byrow = TRUE, nrow = 7)
XX = t(X) %*% X
library(MASS)
XXgi = ginv(XX)
C = matrix(c(1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
    0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0,
    1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
    1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
    0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1,
    0, 0, 0, 1, 0, 0, 1), byrow = TRUE, nrow = 12)
## what does C(X'X)^{-}X' mean?
#   hard to see
C %*% XXgi %*% t(X)
#               [,1]          [,2]          [,3]          [,4]
# [1,]  7.500000e-01  2.500000e-01  2.220446e-16  5.551115e-17
# [2,]  2.500000e-01  7.500000e-01  1.387779e-16  5.551115e-17
# [3,]  2.500000e-01  7.500000e-01 -1.000000e+00  1.000000e+00
# [4,]  5.000000e-01 -5.000000e-01  1.000000e+00  0.000000e+00
# [5,] -1.665335e-16 -5.551115e-17  1.000000e+00 -2.220446e-16
# [6,]  3.330669e-16  1.110223e-16  1.110223e-16  1.000000e+00
# [7,]  5.000000e-01 -5.000000e-01  1.000000e+00 -1.000000e+00
# [8,] -5.551115e-16 -3.330669e-16  1.000000e+00 -1.000000e+00
# [9,] -5.551115e-17 -2.220446e-16 -1.665335e-16  1.110223e-16
#[10,]  2.500000e-01 -2.500000e-01 -1.110223e-16  3.053113e-16
#[11,] -2.500000e-01  2.500000e-01 -2.220446e-16  2.775558e-16
#[12,] -2.500000e-01  2.500000e-01 -1.000000e+00  1.000000e+00
#               [,5]          [,6]          [,7]
# [1,]  2.775558e-17  2.500000e-01 -2.500000e-01
# [2,] -1.665335e-16 -2.500000e-01  2.500000e-01
# [3,] -4.440892e-16 -2.500000e-01  2.500000e-01
# [4,]  4.440892e-16  5.000000e-01 -5.000000e-01
# [5,]  2.220446e-16  0.000000e+00  1.110223e-16
# [6,]  1.110223e-16  0.000000e+00 -2.220446e-16
# [7,]  1.000000e+00  5.000000e-01 -5.000000e-01
# [8,]  1.000000e+00  2.220446e-16  4.440892e-16
# [9,]  1.000000e+00  2.775558e-16  1.110223e-16
#[10,] -5.551115e-17  7.500000e-01  2.500000e-01
#[11,] -2.220446e-16  2.500000e-01  7.500000e-01
#[12,] -6.661338e-16  2.500000e-01  7.500000e-01

#   much easier using fractions
fractions(C %*% XXgi %*% t(X))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]  3/4  1/4    0    0    0  1/4 -1/4
# [2,]  1/4  3/4    0    0    0 -1/4  1/4
# [3,]  1/4  3/4   -1    1    0 -1/4  1/4
# [4,]  1/2 -1/2    1    0    0  1/2 -1/2
# [5,]    0    0    1    0    0    0    0
# [6,]    0    0    0    1    0    0    0
# [7,]  1/2 -1/2    1   -1    1  1/2 -1/2
# [8,]    0    0    1   -1    1    0    0
# [9,]    0    0    0    0    1    0    0
#[10,]  1/4 -1/4    0    0    0  3/4  1/4
#[11,] -1/4  1/4    0    0    0  1/4  3/4
#[12,] -1/4  1/4   -1    1    0  1/4  3/4
</pre>
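<p>The same idea on a matrix small enough to check by hand. This is a toy example of my own, not from the course notes: the inverse of a 2 x 2 matrix with determinant 5, so every entry of the inverse is a multiple of 1/5. It also shows two base R helpers that are handy in this context, <code>zapsmall()</code> and <code>crossprod()</code>.</p>
<pre>## a toy example (not from the course): the inverse of a 2 x 2 matrix
library(MASS)
A = matrix(c(2, 1, 1, 3), nrow = 2)
Ainv = solve(A)
Ainv             # decimals: 0.6, -0.2, -0.2, 0.4
fractions(Ainv)  # the same entries shown as 3/5, -1/5, -1/5, 2/5
## zapsmall() in base R rounds floating-point noise such as 1e-16 to 0
zapsmall(Ainv %*% A)
## crossprod(A) is a common shorthand for t(A) %*% A
all.equal(crossprod(A), t(A) %*% A)
</pre>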
</li>
<li>
<h2>Jittered Strip Chart</h2>
</li>
<p>A strip chart is a common tool for comparing batches. When points overlap in the plot, we may &#8220;jitter&#8221; them by adding a little noise to the data. The R function <code>jitter()</code> can add this noise to the data directly, but <code>stripchart()</code> already supports jittered points.</p>
<pre>## some people do not realize that the 'colClasses' argument in
#      read.table() is quite useful -- can avoid explicit conversion
d = read.table("http://dnett.public.iastate.edu/S511/SeedlingDryWeight2.txt",
    header = TRUE, colClasses = c("factor", "factor", "factor",
        "numeric"))
## R base graphics: method = 'jitter' will do
stripchart(SeedlingWeight ~ Tray, data = d, method = "jitter",
    pch = 20, panel.first = grid())
## or the ggplot2 version: geom = 'jitter'
library(ggplot2)
qplot(Tray, SeedlingWeight, data = d, colour = Genotype, geom = "jitter")
</pre>
<div id="attachment_461" class="wp-caption aligncenter" style="width: 510px"><a href="http://yihui.name/en/wp-content/uploads/2010/03/jittered-stripchart1.png"><img class="size-full wp-image-461" title="Jittered Strip Chart by stripchart()" src="http://yihui.name/en/wp-content/uploads/2010/03/jittered-stripchart1.png" alt="Jittered Strip Chart by stripchart()" width="500" height="350" /></a><p class="wp-caption-text">Jittered Strip Chart by stripchart()</p></div>
<div id="attachment_462" class="wp-caption aligncenter" style="width: 510px"><a href="http://yihui.name/en/wp-content/uploads/2010/03/jittered-stripchart2.png"><img class="size-full wp-image-462" title="Jittered Strip Chart by ggplot2" src="http://yihui.name/en/wp-content/uploads/2010/03/jittered-stripchart2.png" alt="Jittered Strip Chart by ggplot2" width="500" height="350" /></a><p class="wp-caption-text">Jittered Strip Chart by ggplot2</p></div>
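<p>If the data file above is not reachable, the two approaches can be compared on simulated data. The sketch below uses made-up groups and responses (they are not the seedling data), rounds the values so that ties occur, and draws the chart once with <code>stripchart()</code> and once by calling <code>jitter()</code> by hand.</p>
<pre>## simulated data (hypothetical, not the seedling data)
set.seed(511)
g = factor(rep(c("A", "B", "C"), each = 20))
## rounding to one decimal creates ties, i.e. overlapping points
y = round(rnorm(60, mean = as.integer(g)), 1)
## built-in jittering; 'jitter' controls the amount of noise
stripchart(y ~ g, method = "jitter", jitter = 0.2, vertical = TRUE,
    pch = 20)
## the manual way: add noise to the group positions with jitter()
plot(jitter(as.integer(g), amount = 0.2), y, pch = 20, xaxt = "n",
    xlab = "g")
axis(1, at = 1:3, labels = levels(g))
</pre>
<p>Only the group positions are jittered here, so the responses themselves stay exact.</p>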
<li>
<h2>Testing <img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Creverse%20C%5Cbeta%3Dd" title="C\beta=d" alt="C\beta=d" align="absmiddle" /> in a Linear Model</h2>
</li>
<p>Base R does not provide a function for testing a general linear hypothesis on the coefficients of a linear model, but we can use the function <code>glh.test()</code> in the <code>gmodels</code> package to do it. If you take a look at its source code, you will find that, unsurprisingly, it is nothing but the code on page 7 of slide set 9 of Dr Nettleton&#8217;s lecture notes.</p>
<pre>library(gmodels)
time = factor(rep(c(3, 6), each = 5))
temp = factor(rep(c(20, 30, 20, 30), c(2, 3, 4, 1)))
y = c(2, 5, 9, 12, 15, 6, 6, 7, 7, 16)
d = data.frame(time, temp, y)
o = lm(y ~ time + temp + time:temp, data = d)

## compare with pages 7-11 in slide set 9

Ctime = matrix(c(0, 1, 0, 0.5), nrow = 1, byrow = TRUE)
glh.test(o, Ctime)

#           Test of General Linear Hypothesis
#  Call:
#  glh.test(reg = o, cm = Ctime)
#  F = 6.0051, df1 = 1, df2 = 6, p-value = 0.04975 

Ctemp = matrix(c(0, 0, 1, 0.5), nrow = 1, byrow = TRUE)
glh.test(o, Ctemp)

#           Test of General Linear Hypothesis
#  Call:
#  glh.test(reg = o, cm = Ctemp)
#  F = 39.7072, df1 = 1, df2 = 6, p-value = 0.0007447 

Ctimetempint = matrix(c(0, 0, 0, 1), nrow = 1, byrow = TRUE)
glh.test(o, Ctimetempint)

#           Test of General Linear Hypothesis
#  Call:
#  glh.test(reg = o, cm = Ctimetempint)
#  F = 0.1226, df1 = 1, df2 = 6, p-value = 0.7382 

Coverall = matrix(c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
    0, 1), nrow = 3, byrow = TRUE)
glh.test(o, Coverall) 

#           Test of General Linear Hypothesis
#  Call:
#  glh.test(reg = o, cm = Coverall)
#  F = 13.5319, df1 = 3, df2 = 6, p-value = 0.004439
</pre>
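<p>To see what such a test computes, here is a minimal sketch of the usual F statistic for H0: C beta = d, namely F = (Cb - d)' [C V C']^{-1} (Cb - d) / q, where b is the vector of estimated coefficients, V = <code>vcov(o)</code> is the estimated covariance matrix of b, and q is the number of rows of C. It is written from the textbook formula rather than copied from the <code>glh.test()</code> source, and the names <code>d0</code>, <code>Fstat</code> and <code>pval</code> are my own. For the interaction contrast it should agree with the F = 0.1226 reported above.</p>
<pre>## same data and model as above, using base R only
time = factor(rep(c(3, 6), each = 5))
temp = factor(rep(c(20, 30, 20, 30), c(2, 3, 4, 1)))
y = c(2, 5, 9, 12, 15, 6, 6, 7, 7, 16)
o = lm(y ~ time + temp + time:temp)
## H0: C beta = d0, here the interaction contrast with d0 = 0
C = matrix(c(0, 0, 0, 1), nrow = 1)
d0 = 0
b = coef(o)
V = vcov(o)  # estimated sigma^2 times (X'X)^{-1}
q = nrow(C)
est = C %*% b - d0
Fstat = drop(t(est) %*% solve(C %*% V %*% t(C)) %*% est)/q
## upper tail of the F distribution with (q, residual df) degrees of freedom
pval = pf(Fstat, q, df.residual(o), lower.tail = FALSE)
c(F = Fstat, df1 = q, df2 = df.residual(o), p.value = pval)
</pre>
<p>With a single-row C that picks out one coefficient, this F is simply the square of the t statistic printed by <code>summary(o)</code>.</p>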
<li>
<h2>Demo for the F Distribution</h2>
</li>
<p>I created a dynamic demo to illustrate the power of the F test here: <a href="http://yihui.name/en/2010/04/demonstrating-the-power-of-f-test-with-gwidgets/">Demonstrating the Power of F Test with gWidgets</a>. Play with it and have fun!</p>
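<p>As a static complement to the dynamic demo, the power of an F test can be computed from the noncentral F distribution: it is the probability that the statistic exceeds the critical value when the noncentrality parameter is positive. A minimal sketch follows. The degrees of freedom (3 and 6, as in the overall test above) and the grid of noncentrality values are arbitrary choices for illustration.</p>
<pre>## power of a level-0.05 F test as a function of the noncentrality parameter
df1 = 3
df2 = 6
alpha = 0.05
## critical value under H0 (central F distribution)
crit = qf(1 - alpha, df1, df2)
## hypothetical grid of noncentrality parameters; ncp = 0 corresponds to H0
ncp = c(0, 2, 5, 10, 20)
## power: probability of exceeding the critical value under each ncp
power = pf(crit, df1, df2, ncp = ncp, lower.tail = FALSE)
round(power, 3)
</pre>
<p>The first entry equals the significance level, and the power grows as the noncentrality parameter increases.</p>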
<li>
<h2>Tricks in <code>read.table()</code></h2>
</li>
<p>Many people do not realize that <code>read.table()</code> can convert the data types of columns while reading, and always convert each column <em>post hoc</em> like this:</p>
<pre>soup = read.table("http://www.public.iastate.edu/~dnett/S511/soup.txt",
    header = TRUE)
soup$taster = factor(soup$taster)
soup$batch = factor(soup$batch)
soup$recipe = factor(soup$recipe)
soup$tasteorder = factor(soup$tasteorder)</pre>
<p>But in fact, we can specify the types of columns while reading data:</p>
<pre>## we know the first 4 are factors and the last one is numeric:
soup = read.table("http://www.public.iastate.edu/~dnett/S511/soup.txt",
    header = TRUE, <strong>colClasses</strong> = c(rep("factor", 4), "numeric"))
## conversion already done!
&gt; str(soup)
'data.frame':   72 obs. of  5 variables:
 $ recipe    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 2 2 2 2 ...
 $ batch     : Factor w/ 12 levels "1","10","11",..: 1 1 1 1 1 1 5 5 5 5 ...
 $ taster    : Factor w/ 24 levels "1","10","11",..: 1 12 18 19 20 21 1 12 18 19 ...
 $ tasteorder: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3 2 3 1 3 ...
 $ y         : num  3 5 6 4 4 3 6 9 6 7 ...
</pre>
<p>There are other tricks in <code>read.table()</code>, but I find this one the most useful. Check the 22 arguments in <code>?read.table</code> if you want to know more magic (e.g. how to specify the first column in the data file as the row names).</p>
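<p>These arguments can be tried without any data file by passing the data inline through the <code>text</code> argument. The tiny tables below are made up for illustration. The sketch shows the default column classes, an unnamed and a named <code>colClasses</code>, and <code>row.names = 1</code> for using the first column as row names.</p>
<pre>## a small inline table (hypothetical values)
txt = "recipe batch y
1 1 3.5
1 2 4.0
2 3 6.1"
## default: the first two columns are read as integers
d1 = read.table(text = txt, header = TRUE)
sapply(d1, class)
## one class per column, in order
d2 = read.table(text = txt, header = TRUE,
    colClasses = c("factor", "factor", "numeric"))
sapply(d2, class)
## a named colClasses converts only the columns that are named
d3 = read.table(text = txt, header = TRUE, colClasses = c(recipe = "factor"))
sapply(d3, class)
## row.names = 1: use the first column as the row names
txt2 = "id recipe y
s1 1 3.5
s2 1 4.0
s3 2 6.1"
d4 = read.table(text = txt2, header = TRUE, row.names = 1)
rownames(d4)
</pre>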
<li>
<h2>Demo for Newton&#8217;s Method</h2>
</li>
<p>The function <code>newton.method()</code> in the <code>animation</code> package shows the detailed iterations of Newton&#8217;s method. Here is a demo:</p>
<pre>library(animation)
par(pch = 20)
ani.options(nmax = 50)
newton.method(function(x) 5 * x^3 - 7 * x^2 - 40 *
    x + 100, 7.15, c(-6.2, 7.1))
</pre>
<div id="attachment_510" class="wp-caption aligncenter" style="width: 490px"><a href="http://yihui.name/en/wp-content/uploads/2010/03/newtons-method-root-finding.gif"><img class="size-full wp-image-510" title="Newton-Raphson Method for Root-finding" src="http://yihui.name/en/wp-content/uploads/2010/03/newtons-method-root-finding.gif" alt="Newton-Raphson Method for Root-finding" width="480" height="480" /></a><p class="wp-caption-text">Newton-Raphson Method for Root-finding</p></div>
<p>I hope this is useful for understanding iterative algorithms.</p>
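<p>For readers who want the numbers without the animation, the update rule x[i + 1] = x[i] - f(x[i]) / f'(x[i]) can be sketched in a few lines of base R. This is my own bare-bones version, not the code inside <code>newton.method()</code>: the helper <code>newton_path()</code> is a hypothetical name, the derivative is supplied by hand, and the loop simply runs a fixed number of steps (50, like <code>nmax</code> above) without a convergence test.</p>
<pre>## the function from the demo and its derivative (worked out by hand)
f = function(x) 5 * x^3 - 7 * x^2 - 40 * x + 100
fprime = function(x) 15 * x^2 - 14 * x - 40
## hypothetical helper: run nmax Newton steps and return all iterates
newton_path = function(f, fprime, x0, nmax = 50) {
    x = numeric(nmax + 1)
    x[1] = x0
    for (i in seq_len(nmax)) {
        x[i + 1] = x[i] - f(x[i])/fprime(x[i])
    }
    x
}
path = newton_path(f, fprime, 7.15)
## the last few iterates; they stop changing once the method has converged
tail(path, 3)
## the function value at the last iterate is near 0 only after convergence
f(tail(path, 1))
</pre>
<p>Starting far from the root, as in the demo, the iterates can wander for a while before settling down, which is exactly what the animation makes visible.</p>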
<li>
<h2>Misc Tips</h2>
<p>Some little tips:</p>
<ol>
<li><code>unname()</code>: to remove the names of objects</li>
<pre>&gt; x = c(a = 1, b = 2)
&gt; x
a b
1 2
&gt; unname(x)  ## x = unname(x) if one wants to replace x
[1] 1 2
</pre>
</ol>
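<p><code>unname()</code> also drops the dimnames of a matrix, and it is handy when two results should be compared by value only. A small sketch with made-up objects:</p>
<pre>## a matrix with row and column names (hypothetical example)
m = matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
unname(m)  # same values, dimnames removed
## comparing a named vector with an unnamed one
a = c(x = 1, y = 2)
b = c(1, 2)
all.equal(a, b)          # complains about the names
all.equal(unname(a), b)  # TRUE: the values agree
identical(unname(a), b)  # TRUE
</pre>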
</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://yihui.name/en/2010/03/r-tips-in-stat-511/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
