<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Statistics, R, Graphics and Fun &#187; Perspective Plot</title>
	<atom:link href="http://yihui.name/en/tag/perspective-plot/feed/" rel="self" type="application/rss+xml" />
	<link>http://yihui.name/en</link>
	<description>Yihui XIE</description>
	<lastBuildDate>Thu, 26 Aug 2010 03:32:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>To See A Circle in A Pile of Sand</title>
		<link>http://yihui.name/en/2008/09/to-see-a-circle-in-a-pile-of-sand/</link>
		<comments>http://yihui.name/en/2008/09/to-see-a-circle-in-a-pile-of-sand/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 06:30:53 +0000</pubDate>
		<dc:creator>Yihui Xie</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[R Graphics]]></category>
		<category><![CDATA[2D Kernel Density]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Perspective Plot]]></category>
		<category><![CDATA[Point Characters]]></category>
		<category><![CDATA[R Language]]></category>
		<category><![CDATA[Scatter Plot]]></category>
		<category><![CDATA[Subset]]></category>
		<category><![CDATA[Transparent Colors]]></category>

		<guid isPermaLink="false">http://yihui.name/en/?p=51</guid>
		<description><![CDATA[The other day I sent a small assignment to a group of people so that they could &#8220;play&#8221; with statistics and become more interested in the subject. The data provided to them is: The data-generating process was quite simple: first I generated 20000 random numbers (10000 rows, 2 columns) from N(0, 1) and then [...]]]></description>
			<content:encoded><![CDATA[<p>The other day I sent a small assignment to a group of people so that they could &#8220;play&#8221; with statistics and become more interested in the subject. The data provided to them is:</p>
<span class="download"><a href="http://yihui.name/cn/wp-content/uploads/1220370054_0.zip">Download the file here</a> (104K)</span>
<p>The data-generating process was quite simple: first I generated 20000 random numbers (10000 rows, 2 columns) from <code>N(0, 1)</code>, then added 10000 rows of numbers which lie exactly on a circle; finally I provided the data in a randomized order so that people could not easily discover the pattern just from the numbers.</p>
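<p>A minimal sketch of such a data-generating process, for readers who want to reproduce the exercise without the file. The center and radius of the circle are not stated above, so the values used here (center at the origin, radius 0.5), the seed and the column names are assumptions for illustration only, not the settings behind the actual data:</p>
<pre># simulate a "pile of sand" hiding a circle (hypothetical parameters)
set.seed(1)  # arbitrary seed, only for reproducibility
n = 10000
# 10000 rows, 2 columns from N(0, 1)
noise = matrix(rnorm(2 * n), ncol = 2)
# 10000 points lying on a circle; the radius 0.5 is an assumption
theta = runif(n, 0, 2 * pi)
r = 0.5
circle = cbind(r * cos(theta), r * sin(theta))
# stack both parts and shuffle the rows to hide the pattern
x = rbind(noise, circle)
x = as.data.frame(x[sample(nrow(x)), ])
names(x) = c("V1", "V2")</pre>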
<p>The question is: how can we reveal the particular pattern in this &#8220;pile of sand&#8221;? Let&#8217;s look at the original plot:</p>
<div class="wp-caption aligncenter" style="width: 460px"><img style="border: 0pt none;" title="The original scatter plot" src="http://yihui.name/en/wp-content/uploads/1220683692_0.png" border="0" alt="The original scatter plot" width="450" height="450" /><p class="wp-caption-text">The original scatter plot</p></div>
<p>What can we observe from this scatter plot? Perhaps nothing but &#8220;a pile of sand&#8221;. However, if we choose alternative ways to create the plot again, things will be completely different. Here are my approaches:<span id="more-52"></span></p>
<h1>1. Use Semi-transparent Colors</h1>
<p>Actually there are 10000 points lying on the circle, so the critical problem is &#8220;overlapping&#8221;. In order to show the degree of overlapping, we can use semi-transparent colors, because the color will be more opaque if there are many points at the same place.</p>
<div class="wp-caption aligncenter" style="width: 460px"><img style="border: 0pt none;" title="Transparent Colors" src="http://yihui.name/en/wp-content/uploads/1220684290_0.png" border="0" alt="Transparent Colors" width="450" height="450" /><p class="wp-caption-text">Transparent Colors</p></div>
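<p>To see why overlap shows up as darkness: under the usual alpha compositing, each point drawn with opacity <code>a</code> lets a fraction <code>1 - a</code> of what lies beneath show through, so <code>n</code> points stacked at one spot give a combined opacity of <code>1 - (1 - a)^n</code>. A small sketch with <code>a = 0.1</code> follows; the helper name <code>opacity</code> and the stack sizes are arbitrary choices for illustration, and the exact rendering depends on the graphics device:</p>
<pre># combined opacity of n overlapping points, each drawn with alpha a
opacity = function(n, a = 0.1) 1 - (1 - a)^n
# compare an isolated point with stacks of 5, 10 and 50 points
round(opacity(c(1, 5, 10, 50)), 3)</pre>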
<h1>2. Set Axes Limits</h1>
<p>If we look &#8220;closer&#8221; into the plot, the scene will also be different. For example, we can plot only the data in the range [-1, 1] on both axes.</p>
<div class="wp-caption aligncenter" style="width: 460px"><img style="border: 0pt none;" title="Set Axes Limits" src="http://yihui.name/en/wp-content/uploads/1220684947_0.png" border="0" alt="Set Axes Limits" width="450" height="450" /><p class="wp-caption-text">Set Axes Limits</p></div>
<h1>3. Plot with Smaller Point Symbols</h1>
<p>Smaller symbols also reduce overlapping effectively in this case.</p>
<div class="wp-caption aligncenter" style="width: 460px"><img style="border: 0pt none;" title="Plot with small symbols" src="http://yihui.name/en/wp-content/uploads/1220685328_0.png" border="0" alt="Plot with small symbols" width="450" height="450" /><p class="wp-caption-text">Plot with small symbols</p></div>
<h1>4. Draw A Subset of the Data</h1>
<p>Since the problem is that there are too many data points, why not draw a subset and try a scatter plot first? For example, here we have sampled 1000 rows of the data, and the plot looks like this:</p>
<div class="wp-caption aligncenter" style="width: 460px"><img style="border: 0pt none;" title="A subset of the original data" src="http://yihui.name/en/wp-content/uploads/1220685596_0.png" border="0" alt="A subset of the original data" width="450" height="450" /><p class="wp-caption-text">A subset of the original data</p></div>
<h1>5. Estimate the 2D Density</h1>
<p>The R package <code>KernSmooth</code> provides functions to estimate 1D or 2D densities. We can further examine the shape of this 2D density using the package <code>rgl</code>. Here is an animation recorded to illustrate the 2D density.</p>
<div style="text-align:center"><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="500" height="465" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=4745847&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="500" height="465" src="http://vimeo.com/moogaloop.swf?clip_id=4745847&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object><br />
<a href="http://vimeo.com/4745847">Density surface of 2D variables</a> from <a href="http://vimeo.com/yihui">Yihui Xie</a> on Vimeo.</div>
<p>The R code for the above plots &amp; animation is as follows:</p>
<pre># read the data
x = read.csv("data.csv")
par(ask = TRUE)
# original plot
plot(x)
# transparent colors (alpha = 0.1)
plot(x, col = rgb(0, 0, 0, 0.1))
# set axes limits
plot(x, xlim = c(-1, 1), ylim = c(-1, 1))
# small symbols
plot(x, pch = ".")
# subset
plot(x[sample(nrow(x), 1000), ])
# 2D density estimation
library(KernSmooth)
fit = bkde2D(as.matrix(x), dpik(as.matrix(x)))
# perspective plot by persp()
persp(fit$x1, fit$x2, fit$fhat)
library(rgl)
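# perspective plot by OpenGL (recent rgl releases deprecate rgl.surface()
# in favor of surface3d(), which takes the height matrix as its z argument)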
rgl.surface(fit$x1, fit$x2, 0.01 * fit$fhat)
# animation
M = par3d("userMatrix")
movie3d(par3dinterp(userMatrix = list(M, rotate3d(M,
   pi/2, 1, 0, 0), rotate3d(M, pi/2, 0, 1, 0), rotate3d(M, pi,
   0, 0, 1))), duration = 20, fps = 10)</pre>
]]></content:encoded>
			<wfw:commentRss>http://yihui.name/en/2008/09/to-see-a-circle-in-a-pile-of-sand/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
