<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nathaniel Johnston &#187; Statistics</title>
	<atom:link href="http://www.njohnston.ca/tag/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.njohnston.ca</link>
	<description>A blog of recreational math and quantum information theory</description>
	<lastBuildDate>Thu, 15 Dec 2011 16:11:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Statistical Analysis of Password Strength via Gawker&#8217;s Leaked Database</title>
		<link>http://www.njohnston.ca/2010/12/statistical-analysis-of-password-strength-via-gawkers-leaked-database/</link>
		<comments>http://www.njohnston.ca/2010/12/statistical-analysis-of-password-strength-via-gawkers-leaked-database/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 03:49:37 +0000</pubDate>
		<dc:creator>Nathaniel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Popular Culture]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.nathanieljohnston.com/?p=1305</guid>
		<description><![CDATA[This past weekend, Gawker Media was hacked and its user account database was leaked online. The database contained about 1.3 million rows of information containing usernames, e-mail addresses, and passwords (encrypted via DES). This security breach is unfortunate for people whose information is contained within that database, but the silver lining is that it provides a [...]]]></description>
			<content:encoded><![CDATA[<p>This past weekend, Gawker Media was hacked and its <a href="http://lifehacker.com/5712785/">user account database was leaked online</a>. The database contained about 1.3 million rows of information containing usernames, e-mail addresses, and passwords (encrypted via <a href="http://en.wikipedia.org/wiki/Data_Encryption_Standard">DES</a>). This security breach is unfortunate for people whose information is contained within that database, but the silver lining is that it provides a rare opportunity for statistics nerds like me to analyze some otherwise completely unobtainable data.</p>
<p>Because the passwords were encrypted using such an out-of-date scheme (tsk, tsk, Gawker), about 200,000 of the passwords contained in the database have been decrypted. Naturally, the passwords that were cracked were relatively weak. For example, all 2641 accounts that used some trivial modification of &#8220;password&#8221; or &#8220;qwerty&#8221; as their password were decrypted. In this post I will look at trends in which users&#8217; passwords were cracked to gain insight into which users do and do not create strong passwords.</p>
<p>It should of course be made clear that, because this data comes from a single database, the results that follow may not be representative of the population as a whole, but rather may be skewed by the fact that people with Gawker accounts are generally more &#8220;techy&#8221; than the average internet user.</p>
<h3>Preliminaries: Cleaning Up the Database</h3>
<p>The database had to be cleaned significantly before it could be of much use statistically, so some of the numbers here may differ slightly from the raw numbers reported by news outlets or found in the raw database itself. The numbers here are the result of removing any incomplete rows from the database (i.e., rows missing a password, e-mail address, or both) and removing any accounts that were clearly created by SPAMbots (I&#8217;m only interested in the password strength of real users).</p>
<p>Also, I will only look at accounts that contain an e-mail address with a domain that was registered in the database at least 50 times. This restriction is in place partly because it is extremely difficult to compute any sort of meaningful statistics on something with a sample size that is much smaller than 50, and it is partly due to the fact that Gawker doesn&#8217;t require verified e-mail addresses (so 46993 of the 52593 domain names listed in the database were used by exactly one person, many of which are clearly fake and/or for SPAM).</p>
<p>After making the aforementioned &#8220;fixes&#8221; to the database, there are 412670 accounts, 157794 (38.2%) of which had their password decrypted.</p>
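<p>The cleaning steps above can be sketched roughly as follows. This is only an illustration: the row layout (username, e-mail address, password hash) and the function name are assumptions rather than the actual database schema, and the SPAMbot removal step is not modelled at all.</p>

```python
from collections import Counter

def clean_accounts(rows, min_domain_count=50):
    """Drop incomplete rows, then keep only e-mail domains seen often enough.

    `rows` is a hypothetical list of (username, email, password_hash) tuples;
    the real database layout may differ. SPAMbot removal is not modelled here.
    """
    # Keep rows that have both a password and an e-mail address with a domain.
    complete = [r for r in rows if r[1] and r[2] and "@" in r[1]]

    def domain(email):
        return email.rsplit("@", 1)[1].lower()

    # Count how many complete accounts use each domain.
    counts = Counter(domain(r[1]) for r in complete)

    kept = []
    for r in complete:
        n = counts[domain(r[1])]
        if n >= min_domain_count:
            kept.append(r)
    return kept

# Toy data, with the threshold lowered to 2 so that the effect is visible.
sample = [
    ("a", "a@gmail.com", "h1"),
    ("b", "b@gmail.com", "h2"),
    ("c", "c@one-off.example", "h3"),  # domain used only once
    ("d", "", "h4"),                   # missing e-mail address
    ("e", "e@gmail.com", None),        # missing password
]
kept = clean_accounts(sample, min_domain_count=2)
print([r[0] for r in kept])
```

<p>Incomplete rows are dropped before the domains are counted, so a domain only qualifies on the strength of its complete rows.</p>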
<h3><strong>Password Strength by Domain Name</strong></h3>
<p>The following table displays the 10 most frequently-occurring domain names used for e-mail addresses in the database along with how many users of the domain had their password cracked.</p>
<table style="margin-left: auto; margin-right: auto;">
<tbody>
<tr>
<th>Domain</th>
<th>Total Accounts</th>
<th>Decrypted Passwords</th>
<th>Decryption %</th>
</tr>
<tr>
<td>gmail.com</td>
<td>158031</td>
<td>50530</td>
<td>32.0%</td>
</tr>
<tr>
<td>yahoo.com</td>
<td>94147</td>
<td>40964</td>
<td>43.5%</td>
</tr>
<tr>
<td>hotmail.com</td>
<td>66752</td>
<td>27332</td>
<td>40.9%</td>
</tr>
<tr>
<td>aol.com</td>
<td>17534</td>
<td>8151</td>
<td>46.5%</td>
</tr>
<tr>
<td>comcast.net</td>
<td>7222</td>
<td>2801</td>
<td>38.8%</td>
</tr>
<tr>
<td>msn.com</td>
<td>5544</td>
<td>2250</td>
<td>40.6%</td>
</tr>
<tr>
<td>mac.com</td>
<td>4951</td>
<td>1750</td>
<td>35.3%</td>
</tr>
<tr>
<td>sbcglobal.net</td>
<td>3896</td>
<td>1667</td>
<td>42.8%</td>
</tr>
<tr>
<td>hotmail.co.uk</td>
<td>3204</td>
<td>1476</td>
<td>46.1%</td>
</tr>
<tr>
<td>verizon.net</td>
<td>2211</td>
<td>860</td>
<td>38.9%</td>
</tr>
</tbody>
</table>
<p>The following table shows the z-values associated with the statistical test that the two given domains have the same proportion of users with strong passwords. A positive z-value means that the domain heading the column had a higher decryption rate than the domain heading the row. Differences that are statistically significant at the α = 0.01 level are in <strong><span style="color: #000000;">bold</span></strong>. Click on a z-value to see a normal distribution showing the associated p-value. Notice in particular that gmail.com users have stronger passwords than users of any of the other top-10 domain names, while aol.com and hotmail.co.uk users have the weakest passwords.</p>
<table style="font-size: 80%; margin-left: auto; margin-right: auto;">
<tbody>
<tr>
<th></th>
<th>Yahoo</th>
<th>Hotmail</th>
<th>AOL</th>
<th>Comcast</th>
<th>MSN</th>
<th>Mac</th>
<th>SBC</th>
<th>HotmailUK</th>
<th>Verizon</th>
</tr>
<tr>
<th>GMail</th>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=58.28">58.28</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=40.84">40.84</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=38.65">38.65</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=12.10">12.10</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=13.48">13.48</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=5.00">5.00</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=14.27">14.27</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=16.89">16.89</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=6.92">6.92</a></span></strong></td>
</tr>
<tr>
<th>Yahoo</th>
<td>–</td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-10.26">-10.26</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=7.29">7.29</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-7.81">-7.81</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-4.27">-4.27</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-11.31">-11.31</a></strong></span></td>
<td><a href="http://www.statdistributions.com/normal?z=-0.89">-0.89</a></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=2.87">2.87</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-4.33">-4.33</a></strong></span></td>
</tr>
<tr>
<th>Hotmail</th>
<td>–</td>
<td>–</td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=13.23">13.23</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=-3.55">-3.55</a></span></strong></td>
<td><a href="http://www.statdistributions.com/normal?z=-0.53">-0.53</a></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-7.74">-7.74</a></strong></span></td>
<td><a href="http://www.statdistributions.com/normal?z=2.27">2.27</a></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=5.75">5.75</a></strong></span></td>
<td><a href="http://www.statdistributions.com/normal?z=-1.93">-1.93</a></td>
</tr>
<tr>
<th>AOL</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=-11.09">-11.09</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=-7.70">-7.70</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=-13.94">-13.94</a></span></strong></td>
<td><strong><span style="color: #800000;"><a href="http://www.statdistributions.com/normal?z=-4.19">-4.19</a></span></strong></td>
<td><a href="http://www.statdistributions.com/normal?z=-0.44">-0.44</a></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-6.75">-6.75</a></strong></span></td>
</tr>
<tr>
<th>Comcast</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><a href="http://www.statdistributions.com/normal?z=2.06">2.06</a></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-3.85">-3.85</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=4.11">4.11</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=6.98">6.98</a></strong></span></td>
<td><a href="http://www.statdistributions.com/normal?z=0.09">0.09</a></td>
</tr>
<tr>
<th>MSN</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-5.52">-5.52</a></strong></span></td>
<td><a href="http://www.statdistributions.com/normal?z=2.14">2.14</a></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=5.00">5.00</a></strong></span></td>
<td><a href="http://www.statdistributions.com/normal?z=-1.37">-1.37</a></td>
</tr>
<tr>
<th>Mac</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=7.14">7.14</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=9.67">9.67</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=2.88">2.88</a></strong></span></td>
</tr>
<tr>
<th>SBC</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=2.77">2.77</a></strong></span></td>
<td><span style="color: #800000;"><strong><a href="http://www.statdistributions.com/normal?z=-2.97">-2.97</a></strong></span></td>
</tr>
<tr>
<th>HotmailUK</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-5.24">-5.24</a></strong></td>
</tr>
</tbody>
</table>
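<p>The z-values in the table above are consistent with a pooled two-proportion z-test applied to the counts in the domain table. The following is a minimal sketch under that assumption (the function name is made up for illustration), using the GMail and Yahoo rows:</p>

```python
from math import sqrt

def two_proportion_z(x_row, n_row, x_col, n_col):
    """Pooled two-proportion z statistic, signed as (column rate - row rate)."""
    p_row = x_row / n_row
    p_col = x_col / n_col
    # Under the null hypothesis both groups share one decryption rate.
    pooled = (x_row + x_col) / (n_row + n_col)
    se = sqrt(pooled * (1 - pooled) * (1 / n_row + 1 / n_col))
    return (p_col - p_row) / se

# Counts taken from the domain table: (decrypted passwords, total accounts).
gmail = (50530, 158031)
yahoo = (40964, 94147)
z = two_proportion_z(*gmail, *yahoo)
print(round(z, 2))  # close to the 58.28 in the GMail/Yahoo cell
```

<p>Swapping the two arguments flips the sign, which is why every entry in the GMail row is positive: each of the other domains had a higher decryption rate than GMail.</p>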
<h3>Educational Institutions</h3>
<p>Not surprisingly, users who entered an e-mail address from an educational institution typically had stronger passwords than the general population. Of the 2092 users who provided a college or university-based e-mail address, only 697 (33.3%) had their password decrypted. This proportion is significantly lower than the corresponding proportion for the general population (z = 4.64, <a href="http://www.statdistributions.com/normal?z=4.64">p &lt; 0.001</a>).</p>
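<p>The post does not spell out exactly how a group is compared with the general population. One reading that comes close to the reported z = 4.64 is a pooled two-proportion test of the group against everyone else in the cleaned database (412670 accounts, 157794 of them decrypted). The sketch below assumes that reading, and the function name is hypothetical:</p>

```python
from math import sqrt

def group_vs_rest_z(x_group, n_group, x_all, n_all):
    """Pooled z statistic, signed as (rate outside the group - rate inside it)."""
    x_rest, n_rest = x_all - x_group, n_all - n_group
    p_group = x_group / n_group
    p_rest = x_rest / n_rest
    # Pooling the group with the rest gives back the overall rate.
    pooled = x_all / n_all
    se = sqrt(pooled * (1 - pooled) * (1 / n_group + 1 / n_rest))
    return (p_rest - p_group) / se

# Educational addresses: 697 of 2092 decrypted, against the cleaned totals.
z_edu = group_vs_rest_z(697, 2092, 157794, 412670)
print(round(z_edu, 2))  # about 4.64
```

<p>With this sign convention a positive value means that the group had a lower decryption rate than everyone else.</p>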
<p>However, two universities stood out as having particularly weak passwords: of the 56 users who used a University of Texas e-mail address, 27 (48.2%) had their password decrypted, and similarly 101 (45.1%) of 224 New York University passwords were decrypted.</p>
<h3>ISP-Provided E-Mail Users</h3>
<p>Users who used an e-mail address provided to them by their ISP (such as something@comcast.net) typically had weaker passwords than the general population, perhaps because tech-unsavvy folks are less likely to go out and get a new e-mail address for themselves at a place like GMail. Of the 31667 users who provided an ISP-based e-mail address, 13053 (41.2%) had their password decrypted. This proportion is significantly higher than the corresponding proportion for the general population (z = -11.36, <a href="http://www.statdistributions.com/normal?z=-11.36">p &lt; 0.001</a>).</p>
<h3>E-Mail Addresses with Typos</h3>
<p>Also unsurprisingly, users who entered an obvious typo in their e-mail address were much more likely to have a weak password than people who entered their e-mail address correctly (by &#8220;obvious typo&#8221; I basically mean an e-mail address containing a typo of a common domain name, such as &#8220;fred@yahoo,com&#8221; or &#8220;fred@hotmail&#8221;). Of the 530 users with a typo in their e-mail address, 280 (52.8%) had passwords that were decrypted. This proportion is significantly higher than the average (z = -6.87, <a href="http://www.statdistributions.com/normal?z=-6.87">p &lt; 0.001</a>).</p>
<h3>Password Strength by Country</h3>
<p>The following table shows the strength of user passwords based on the country associated with their e-mail address. Of course some e-mail addresses provide no information about the user&#8217;s country, so domains that serve a largely international market (such as gmail.com, mac.com and aim.com) are excluded from this analysis.</p>
<table style="margin-left: auto; margin-right: auto;">
<tbody>
<tr>
<th>Country</th>
<th>Total Accounts</th>
<th>Decrypted Passwords</th>
<th>Decryption %</th>
</tr>
<tr>
<td>India</td>
<td>3129</td>
<td>1448</td>
<td>46.3%</td>
</tr>
<tr>
<td>United Kingdom</td>
<td>6874</td>
<td>3057</td>
<td>44.5%</td>
</tr>
<tr>
<td>China</td>
<td>1411</td>
<td>600</td>
<td>42.5%</td>
</tr>
<tr>
<td>Canada</td>
<td>2825</td>
<td>1160</td>
<td>41.1%</td>
</tr>
<tr>
<td>United States</td>
<td>30891</td>
<td>12507</td>
<td>40.5%</td>
</tr>
<tr>
<td>Germany</td>
<td>1378</td>
<td>484</td>
<td>35.1%</td>
</tr>
<tr>
<td>Russia</td>
<td>2223</td>
<td>533</td>
<td>24.0%</td>
</tr>
</tbody>
</table>
<p>So Russia and Germany are the big winners when it comes to password strength, while India and the United Kingdom seem to have the weakest passwords. The following table shows the z-values associated with the statistical test that the two given countries have the same proportion of users with strong passwords. A negative z-value means that the country heading the column had a lower decryption rate than the country heading the row. Differences that are statistically significant at the α = 0.01 level are in <strong>bold</strong>. Click on a z-value to see a normal distribution showing the associated p-value.</p>
<table style="font-size: 80%; margin-left: auto; margin-right: auto;">
<tbody>
<tr>
<th></th>
<th>UK</th>
<th>China</th>
<th>Canada</th>
<th>US</th>
<th>Germany</th>
<th>Russia</th>
</tr>
<tr>
<th>India</th>
<td><a href="http://www.statdistributions.com/normal?z=-1.67">-1.67</a></td>
<td><a href="http://www.statdistributions.com/normal?z=-2.32">-2.32</a></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-4.03">-4.03</a></strong></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-6.26">-6.26</a></strong></td>
<td><strong> <a href="http://www.statdistributions.com/normal?z=-6.94">-6.94</a></strong></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-16.62">-16.62</a></strong></td>
</tr>
<tr>
<th>UK</th>
<td>–</td>
<td><a href="http://www.statdistributions.com/normal?z=-1.31">-1.31</a></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-3.06">-3.06</a></strong></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-6.05">-6.05</a></strong></td>
<td><strong> <a href="http://www.statdistributions.com/normal?z=-6.37">-6.37</a></strong></td>
<td><strong> <a href="http://www.statdistributions.com/normal?z=-17.16">-17.16</a></strong></td>
</tr>
<tr>
<th>China</th>
<td>–</td>
<td>–</td>
<td><a href="http://www.statdistributions.com/normal?z=-0.88">-0.88</a></td>
<td><a href="http://www.statdistributions.com/normal?z=-1.49">-1.49</a></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-3.97">-3.97</a></strong></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-11.72">-11.72</a></strong></td>
</tr>
<tr>
<th>Canada</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td><a href="http://www.statdistributions.com/normal?z=-0.57">-0.57</a></td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-3.67">-3.67</a></strong></td>
<td><strong> <a href="http://www.statdistributions.com/normal?z=-12.73">-12.73</a></strong></td>
</tr>
<tr>
<th>United States</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-3.95">-3.95</a></strong></td>
<td><strong> <a href="http://www.statdistributions.com/normal?z=-15.37">-15.37</a></strong></td>
</tr>
<tr>
<th>Germany</th>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td><strong><a href="http://www.statdistributions.com/normal?z=-7.18">-7.18</a></strong></td>
</tr>
</tbody>
</table>
<p>Attached below is an Excel spreadsheet containing significantly more detailed information than the snippets contained in this post (though of course all passwords, e-mail addresses and personally-identifiable information have been removed).</p>
<p><strong>Download:</strong> <a href="http://www.nathanieljohnston.com/wp-content/uploads/2010/12/GawkerDBStats.xls">Gawker Database Statistics</a> [Excel spreadsheet]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.njohnston.ca/2010/12/statistical-analysis-of-password-strength-via-gawkers-leaked-database/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>P-Value Calculators and Graphers in Javascript</title>
		<link>http://www.njohnston.ca/2010/09/p-value-calculators-and-graphers-in-javascript/</link>
		<comments>http://www.njohnston.ca/2010/09/p-value-calculators-and-graphers-in-javascript/#comments</comments>
		<pubDate>Sun, 05 Sep 2010 20:37:06 +0000</pubDate>
		<dc:creator>Nathaniel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Websites]]></category>

		<guid isPermaLink="false">http://www.nathanieljohnston.com/?p=1170</guid>
		<description><![CDATA[There are a lot of online tools out there for computing p-values and test statistics associated with common statistical distributions such as the normal or Student&#8217;s t-distributions. Unfortunately, most of them are either ad-ridden or powered by Java (and hence slow to initially load and finicky when it comes to which browsers they work with). [...]]]></description>
			<content:encoded><![CDATA[<p>There are a lot of online tools out there for computing p-values and test statistics associated with common statistical distributions such as the <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal</a> or <a href="http://en.wikipedia.org/wiki/Student's_t-distribution">Student&#8217;s t</a>-distributions. Unfortunately, most of them are either ad-ridden or powered by Java (and hence slow to initially load and finicky when it comes to which browsers they work with). So one of my summertime projects this year was to create a website that solves both of those problems:</p>
<p style="text-align: center;"><a href="http://www.statdistributions.com/"><img class="aligncenter size-full wp-image-1174" title="StatDistributions.com" src="http://www.nathanieljohnston.com/wp-content/uploads/2010/09/statdistributions.png" alt="" width="506" height="42" /></a></p>
<p>The website computes p-values and test statistics in real-time via JavaScript (and thus does not need Java or any other plug-in). The computations themselves are fairly straightforward and are performed via the <a href="http://en.wikipedia.org/wiki/Trapezoidal_rule">trapezoid rule</a>. The graphic on the right is composed of a static PNG that displays the appropriate distribution. The distribution&#8217;s image is transparent under the graph and opaque above the graph, which makes it easy to display the p-value graphically – the light blue area is actually just a blue rectangle that is drawn beneath the distribution&#8217;s image.</p>
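<p>To give a feel for the trapezoid-rule approach, here is a small Python sketch (not the code used by the site) that approximates a right-tailed p-value for the standard normal distribution. The function names and the step count are illustrative choices only.</p>

```python
from math import erfc, exp, pi, sqrt

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def trapezoid(f, a, b, steps=10000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def right_tail_p(z):
    """Area to the right of z, using the fact that half the area lies right of 0."""
    return 0.5 - trapezoid(normal_pdf, 0.0, z)

p = right_tail_p(1.96)
print(p)                           # roughly 0.025
print(0.5 * erfc(1.96 / sqrt(2)))  # closed-form check from the math module
```

<p>Integrating from 0 to z, rather than from z out to some large cutoff, keeps the interval short and avoids having to decide where to truncate the tail.</p>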
<p>Additionally, through the magic of PHP the tool automatically creates a URL that links to the current computation (and thus makes it much more citable). So, for example, if you want to know the t-value that corresponds to a right-tailed test with 12 degrees of freedom and a p-value of 0.1, you could simply <a href="http://www.statdistributions.com/t?p=0.1&amp;df=12&amp;tail=2">click here</a>.</p>
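<p>Links of this form are also easy to generate from a script. The sketch below only mirrors the query parameters visible in the example link above (p, df and tail) and in the z-value links used elsewhere on this blog (z). The helper name is hypothetical, and any other parameters or values the site may accept are not covered.</p>

```python
from urllib.parse import urlencode

BASE = "http://www.statdistributions.com"

def calc_url(distribution, **params):
    """Build a link to one computation; parameter names are copied from example links."""
    return f"{BASE}/{distribution}?{urlencode(params)}"

# The t-distribution example from the paragraph above.
print(calc_url("t", p=0.1, df=12, tail=2))
# A normal-distribution link for a given z-value.
print(calc_url("normal", z=58.28))
```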
<p>Anyway, if you&#8217;re a nerd like me then enjoy it and of course feel free to leave any feedback/suggestions that you might have.</p>
<p><a href="http://www.statdistributions.com/"><img class="aligncenter size-full wp-image-1175" title="Normal distribution with p = 0.05" src="http://www.nathanieljohnston.com/wp-content/uploads/2010/09/normal.png" alt="" width="612" height="349" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.njohnston.ca/2010/09/p-value-calculators-and-graphers-in-javascript/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Keep the &quot;Info&quot; Before the &quot;Graphic&quot;</title>
		<link>http://www.njohnston.ca/2009/11/keep-info-before-graphic/</link>
		<comments>http://www.njohnston.ca/2009/11/keep-info-before-graphic/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 13:00:57 +0000</pubDate>
		<dc:creator>Nathaniel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.nathanieljohnston.com/?p=427</guid>
		<description><![CDATA[The term &#8220;infographic&#8221; is a ridiculous little buzzword that really took off on the internet sometime last year. It used to refer to genuinely useful things like subway maps and blueprints. Recently, however, the term has come to mean &#8220;an obnoxiously oversized image that has numbers on it&#8221;. My problem isn&#8217;t with infographics like these [...]]]></description>
			<content:encoded><![CDATA[<p>The term &#8220;infographic&#8221; is a ridiculous little buzzword that really took off on the internet sometime last year. It used to refer to genuinely useful things like subway maps and blueprints. Recently, however, the term has come to mean &#8220;an obnoxiously oversized image that has numbers on it&#8221;. My problem isn&#8217;t with infographics like <a href="http://blog.dailyfill.com/timetravel.html">these</a> <a href="http://xkcd.com/657/large/">ones</a> that just display some fun, meaningless information in a visual way, or <a href="http://www.flickr.com/photos/michaelpaukner/4041257378/sizes/o/in/pool-16135094@N00/">this one</a> that displays a phenomenon that is inherently visual. My beef is with infographics that reduce a variety of related statistics to an oversized mess of overlapping graphs and charts that are (purposely or otherwise) misleading.</p>
<p>This post will present four rules that infographic designers, if they decide that they absolutely <em>must</em> make an infographic, should always follow (but often don&#8217;t). To get the ball rolling, let&#8217;s consider an example that made its way around the internet just a couple of weeks ago (<a href="http://www.throng.co.nz/ratings/american-2009-tv-season-ratings">source</a>):</p>
<div id="attachment_868" class="wp-caption aligncenter" style="width: 641px"><a href="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/american-2009-season-ratings.gif"><img class="size-full wp-image-868 " title="American 2009 Season Premieres and Averages to Date" src="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/american-2009-season-ratings.gif" alt="american-2009-season-ratings" width="631" height="71" /></a><p class="wp-caption-text">American 2009 Season Premieres and Averages to Date (click to enlarge)</p></div>
<p>We are told that the above infographic depicts the US viewership for a variety of shows during their premiere (<span style="color: #ce0000;">light red</span>) and on average since they began their 2009 season (<span style="color: #7e0000;">dark red</span>). However, I have two main problems with the image, and they&#8217;re both problems that are prevalent throughout many infographics and can easily be solved by just using a simple bar graph.</p>
<p><strong>1. Infographics should not require horizontal scrolling.</strong> The above infographic is 3133 pixels wide, which means there is no consumer-available monitor in the world capable of displaying the entire image on one screen without scrunching it down. This is apparently exactly what infographic makers want, since they all seem to subscribe to the school of thought that dictates their image deserves 45 inches of horizontal viewing space. This would be fine if infographics were readable when zoomed out, but by their very nature they almost never are.</p>
<p>Computer monitors were not meant to view posters. If you want to make the image high-resolution enough that it can be printed out as a poster then it should be created as a vector graphic, not a raster graphic. If you <em>still</em> insist that your infographic should be a monstrously large bitmap, make it readable from a zoom level that will fit on standard monitor resolutions.</p>
<p>Some other popular infographics that suffer from this problem are <a href="http://www.mint.com/blog/trends/the-new-auto-industry-breakdown/?display=wide">the new auto industry breakdown</a>, <a href="http://awesome.good.is//transparency/web/0911/flat.html">weight of the world</a>, and <a href="http://awesome.good.is/goodsheet/goodsheet009First100Days.html">the first 100 days</a>.</p>
<p><strong>2. Two-dimensional figures should <em>never</em> be used to compare linear data.</strong> The above infographic compares the number of people watching different shows, so why are circles being used to represent the data? What represents the number of viewers &#8212; the radius of the circle or the area of the circle? The source doesn&#8217;t tell us, so we have no way of appropriately assessing how many more people are viewing NCIS: Los Angeles than The Good Wife. If it&#8217;s the radius of the circle, NCIS appears to have about 5% more viewers. If it&#8217;s the area of the circle then it&#8217;s probably over 10% (and the discrepancy gets much larger if you compare shows that are farther apart).</p>
<p>Furthermore, even if we <em>were</em> told whether it&#8217;s the radii of the circles or their areas that we should be looking at, there&#8217;s still a problem. If the radii are what are being compared, then the visual is misleading because the differences in areas cause the relative differences to appear larger than they actually are. If the areas are what are being compared, then it should be noted that people just plain suck at visually comparing areas. By looking at the above image (and not getting out a ruler or anything) can you tell which circles have about half as much area as the NCIS: Los Angeles circle? Can you tell how much higher the viewership of The Good Wife is than that of Glee? I certainly can&#8217;t, at least not quickly.</p>
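<p>The arithmetic behind the radius-versus-area ambiguity is short enough to write down. Area grows with the square of the radius, so the same pair of circles implies quite different ratios depending on which quantity encodes the data. A minimal sketch, with function names chosen purely for illustration:</p>

```python
from math import sqrt

def area_ratio_from_radius_ratio(radius_ratio):
    """Ratio of two circle areas given the ratio of their radii."""
    return radius_ratio ** 2

def radius_ratio_from_area_ratio(area_ratio):
    """Ratio of two circle radii given the ratio of their areas."""
    return sqrt(area_ratio)

# A radius that is 5% larger means an area that is about 10% larger.
print(area_ratio_from_radius_ratio(1.05))
# A circle with half the area still has about 71% of the radius.
print(radius_ratio_from_area_ratio(0.5))
```

<p>The second line is why halving a value looks so unimpressive when it is drawn as a circle area: the circle only shrinks to about 71% of its original width.</p>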
<p>InformationIsBeautiful.net is a particularly notorious violator of this rule, as these three examples show: <a href="http://www.informationisbeautiful.net/2009/visualising-the-guardian-datablog/">deadliest drugs</a>, <a href="http://www.informationisbeautiful.net/2009/how-safe-is-the-hpv-vaccine/">how safe is the HPV vaccine?</a>, <a href="http://www.informationisbeautiful.net/visualizations/reduce-your-chances-of-dying-in-a-plane-crash/">reduce your chances of dying in a plane crash</a> (scroll down to the &#8220;bad month&#8221; and &#8220;the odds&#8221; sections). What&#8217;s worse is they aren&#8217;t even consistent with whether it&#8217;s the areas of the circles or the radii of the circles they&#8217;re comparing.</p>
<p>Problems #1 and #2 can both be rectified by simply turning the data into a bar graph. A plain old-fashioned bar graph. Voila:</p>
<div id="attachment_878" class="wp-caption aligncenter" style="width: 654px"><a href="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/graph11.png"><img class="size-full wp-image-878" title="American 2009 Season Premieres and Averages to Date (easier to read)" src="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/graph11.png" alt="American 2009 Season Premieres and Averages to Date (easier to read)" width="644" height="427" /></a><p class="wp-caption-text">American 2009 Season Premieres and Averages to Date (easier to read)</p></div>
<p>The above bar graph doesn&#8217;t need to be zoomed in to be read, it makes it easier to compare the relative viewership of each show, and it actually contains more data than the previous infographic thanks to the labels on the vertical axis.</p>
<p>The next example (<a href="http://www.flickr.com/photos/metrobest/3491197426/sizes/o/in/set-72157617478192160/">source</a>) supposedly explains how and why <a href="http://en.wikipedia.org/wiki/Low_cost_carrier">low-cost airlines</a> are able to offer flights that are so much cheaper than other airlines. It made its rounds this last spring during recession fever, when anything that had anything to do with something being cheap was instantly popular. While it does not suffer from problem #1 above (since it is readable when zoomed out), it suffers from two instances of problem #2 as well as multiple other problems.</p>
<div id="attachment_885" class="wp-caption aligncenter" style="width: 641px"><a href="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/3491197426_6ccebde82d_o.jpg"><img class="size-full wp-image-885" title="How come cheap airlines are so cheap?" src="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/3491197426_6ccebde82d_o.jpg" alt="How come airlines are so cheap?" width="631" height="1077" /></a><p class="wp-caption-text">How come cheap airlines are so cheap? (click to enlarge)</p></div>
<p><strong>3. Infographics (and everything else) should be about substance over style.</strong> While there&#8217;s no denying that the above infographic is pretty, does it actually tell us anything? Beyond the myriad of small problems such as the average fare of Southwest flights including cents when none of the other numbers do, the misspelling of &#8220;Aer Lingus&#8221; and &#8220;maintenance&#8221;, and the mysterious 43% &#8220;total advantage&#8221; at the bottom that seems to pop out of nowhere, the infographic at its core doesn&#8217;t even make sense.</p>
<p>As the infographic itself says, low-cost airlines generally don&#8217;t do long-haul flights; they focus on short point-to-point routes. So why are their average fares being compared to the average fares of the likes of British Airways, who regularly do intercontinental flights? Doesn&#8217;t it make sense that travel distance makes more of a contribution to the price of the flight than whether or not tickets are sold primarily online? Average fare per kilometer travelled would make more sense to compare, though it would still be misleading because take-off and landing are disproportionately expensive.</p>
<p>Another recent offending infographic that simply doesn&#8217;t say a thing is <a href="http://blog.dailyfill.com/400millionclub.html">the $400 million club</a>, which notes that Transformers: Revenge of the Fallen is only the ninth movie in history to gross more than $400 million at the box office in the US during its theatrical run. The infographic then compares it to the other eight movies, which of course are juggernauts like Star Wars and Titanic. The problem is that none of the figures are adjusted for inflation. If you scale the numbers properly, Transformers: Revenge of the Fallen actually comes in as <a href="http://www.boxofficemojo.com/alltime/adjusted.htm">about the 65th</a> highest-grossing movie. Impressive, sure, but to say that the infographic is misleading is an understatement.</p>
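<p>To make the adjustment concrete, here is a minimal Python sketch of one common way to rescale a gross: multiply it by the ratio of a price index today to the same index in the release year. The function name and every number below are made up for illustration; they are not real box office figures or real ticket prices.</p>

```python
def adjust_for_inflation(nominal_gross, index_then, index_now):
    """Restate a dollar amount from release-year prices in current prices."""
    return nominal_gross * (index_now / index_then)


# Hypothetical inputs: a film that grossed $100 million when the
# average ticket cost $2.50, restated at a $7.50 average ticket price.
adjusted = adjust_for_inflation(100_000_000, index_then=2.50, index_now=7.50)
print(f"${adjusted:,.0f}")
```

<p>Any consistent index works here (average ticket price, a consumer price index, and so on), as long as the same index is used for every movie being compared.</p>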
<p>I will finish by presenting a graphic that ran on <a href="http://www.newsweek.com/id/199914">NewsWeek.com</a> that shows obesity and &#8220;life evaluation&#8221; trends over the last year or two. It&#8217;s debatable whether or not it falls into the category of what most people would consider an &#8220;infographic&#8221;, but it perfectly illustrates a core problem with them.</p>
<p style="text-align: center;"><a href="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/obesity_graphic_final.jpg"><img class="aligncenter size-full wp-image-880" title="Obesity infographic" src="http://njohns01home.webfactional.com/wp-content/uploads/2009/11/obesity_graphic_final.jpg" alt="Obesity infographic" width="647" height="389" /></a></p>
<p><strong>4. Be careful with your data.</strong> Just making your graphic pretty doesn&#8217;t give you free rein to ignore basic statistical principles when presenting data. In the above graphic, the left graph shows two lines: one showing how many people have a BMI less than or equal to 30 in a given month and one showing how many people have a BMI over 30 in a given month. I have a news flash for you, NewsWeek: one of those lines is redundant. Not only that, but the redundant second line manipulates the reader by giving the false impression that the number of obese people is converging toward the number of non-obese people. Never mind the fact that the vertical scale is completely out of whack and it jumps a vertical distance of 46.4% in the same amount of space that is used to represent about a 2.5% jump elsewhere.</p>
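<p>The redundancy is easy to check: when two categories cover everyone, each series is just 100% minus the other. A small Python sketch, using invented monthly percentages rather than the actual data behind the graphic:</p>

```python
# Hypothetical monthly share of people with BMI over 30, in percent.
obese_pct = [25.0, 25.5, 26.1, 26.4]

# The "BMI of 30 or less" line is fully determined by the first one.
not_obese_pct = [100.0 - p for p in obese_pct]

# Month-to-month changes are mirror images, so the second line adds nothing.
obese_change = [b - a for a, b in zip(obese_pct, obese_pct[1:])]
not_obese_change = [b - a for a, b in zip(not_obese_pct, not_obese_pct[1:])]

for up, down in zip(obese_change, not_obese_change):
    print(f"{up:+.1f} {down:+.1f}")
```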
<p>I&#8217;m willing to bet that the vertical scale on the right graph is completely out of whack too, but it&#8217;s a little difficult to tell since they don&#8217;t tell you what percentages any of the intermediate y-values correspond to. On the blue &#8220;struggling&#8221; line, we are given a value of 48.4% on the left edge of the graph and a value of 49.6% at the right edge of the graph at a nearly identical height. Are we supposed to be able to tell how high and low the peaks in the middle of the graph are based on that? Does the blue line get as low as 40%? 35%? 30%? Would labels along the vertical axis (similar to the bar graph I showed above) really have detracted from the desired aesthetic too much?</p>
<p>So if you have a set of data that you wish to convey graphically, please first consider whether or not it can be presented by a simple bar graph or line graph. If it can, don&#8217;t try to make it more complicated than that. If it can&#8217;t, at least make sure that the information is the motivating factor in your decisions. If the layout ends up dictating how you present your data, you&#8217;ve got your priorities backward.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.njohnston.ca/2009/11/keep-info-before-graphic/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>IMDb Movie Ratings Over the Years</title>
		<link>http://www.njohnston.ca/2009/10/imdb-movie-ratings-over-the-years/</link>
		<comments>http://www.njohnston.ca/2009/10/imdb-movie-ratings-over-the-years/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 12:00:55 +0000</pubDate>
		<dc:creator>Nathaniel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Movies]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.nathanieljohnston.com/?p=774</guid>
		<description><![CDATA[It&#8217;s time for a random dose of statistics courtesy of The Internet Movie Database. Let&#8217;s consider all movies that have been released theatrically over the last 60 years and see whether there is a trend in their perceived quality over time. That is, do new movies generally receive higher or lower scores on IMDb than [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s time for a random dose of statistics courtesy of <a href="http://www.imdb.com/">The Internet Movie Database</a>. Let&#8217;s consider all movies that have been released theatrically over the last 60 years and see whether there is a trend in their perceived quality over time. That is, do new movies generally receive higher or lower scores on IMDb than old movies?</p>
<p>Before looking at the numbers though, we need some rules to clarify what types of movies we are considering:</p>
<ul>
<li>We only consider theatrically-released films &#8212; no straight-to-video movies or TV movies.</li>
<li>Short films that were released theatrically (such as Pixar&#8217;s <a href="http://www.imdb.com/title/tt1245104/">Presto</a>) <em>are</em> included.</li>
<li>We only consider movies that have received 1000 or more votes. This restriction is to prevent movies with only a handful of votes from skewing the results too much.</li>
<li>The theatrical release date of the movie must be 1950 or later.</li>
</ul>
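<p>The rules above amount to a simple filter. A minimal Python sketch, assuming each movie is a record with hypothetical fields <code>release</code>, <code>votes</code>, and <code>year</code> (the field names, the release labels, and the sample rows are illustrative and are not IMDb&#8217;s actual schema):</p>

```python
MIN_VOTES = 1000
MIN_YEAR = 1950


def is_eligible(movie):
    """Return True if a movie record satisfies all of the rules."""
    return (
        # Shorts qualify too, as long as they were released theatrically.
        movie["release"] == "theatrical"
        and movie["votes"] >= MIN_VOTES
        and movie["year"] >= MIN_YEAR
    )


# Made-up sample records, for illustration only.
movies = [
    {"title": "A", "release": "theatrical", "votes": 52000, "year": 1957},
    {"title": "B", "release": "tv", "votes": 8000, "year": 1999},
    {"title": "C", "release": "theatrical", "votes": 1200, "year": 2008},
    {"title": "D", "release": "theatrical", "votes": 640, "year": 1985},
    {"title": "E", "release": "theatrical", "votes": 30000, "year": 1942},
    {"title": "F", "release": "video", "votes": 4100, "year": 2003},
]

eligible = [m["title"] for m in movies if is_eligible(m)]
print(eligible)
```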
<p>IMDb contains 10034 movies that satisfy the above criteria. The average score (on a scale of 1 to 10) of those movies is 6.38 and the median score is 6.6. The average score per release year is given by the following graph:</p>
<p><img class="aligncenter size-full wp-image-775" title="IMDb Ratings" src="http://njohns01home.webfactional.com/wp-content/uploads/2009/10/imdbRatings.png" alt="IMDb Ratings" width="609" height="387" /></p>
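<p>For anyone who wants to reproduce a graph like this, the per-year averages are just a group-by on release year. A sketch in Python using only the standard library; the (year, score) pairs are invented and stand in for the real data set:</p>

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (release year, score) pairs, for illustration only.
ratings = [(1950, 7.8), (1950, 7.2), (1989, 6.1), (1989, 5.5), (1989, 6.4)]

# Collect the scores of all movies released in the same year.
scores_by_year = defaultdict(list)
for year, score in ratings:
    scores_by_year[year].append(score)

average_by_year = {y: mean(s) for y, s in sorted(scores_by_year.items())}
all_scores = [score for _, score in ratings]

print(average_by_year)
print(mean(all_scores), median(all_scores))
```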
<p>As you can see, older movies (1950 &#8211; 1975) have abnormally high scores, as do very recent movies (2000 &#8211; 2009). These differences are indeed statistically significant. For example, the p-value associated with the test that the mean score in 1950 is the same as the mean score in 1989 is less than 10<sup>-19</sup>. The p-value associated with the test that the mean score in 2008 is the same as the mean score in 1989 is about 0.0021. Other nearby years give similar p-values.</p>
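<p>I haven&#8217;t spelled out which test is behind these p-values, so treat the following as one plausible way to run such a comparison rather than the exact calculation. Welch&#8217;s unequal-variance t-test is a standard choice for comparing two means; the Python sketch below implements it with the standard library and approximates the t distribution by a normal distribution, which is reasonable only when both years contain many movies. The function name and the sample scores are hypothetical.</p>

```python
import math
from statistics import mean, variance


def welch_p_value(sample_a, sample_b):
    """Two-sided p-value for equal means (normal approximation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means, without pooling variances.
    std_err = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / std_err
    # Probability that a standard normal variable is at least as extreme.
    return math.erfc(abs(t_stat) / math.sqrt(2))


# Made-up scores for two release years, for illustration only.
scores_1950 = [7.9, 7.4, 8.1, 7.0, 7.6, 7.8, 7.3, 8.0]
scores_1989 = [6.2, 5.8, 6.9, 5.1, 6.4, 6.0, 5.5, 6.6]
p = welch_p_value(scores_1950, scores_1989)
print(f"p = {p:.3g}")
```

<p>With samples as small as the toy ones above, a proper t distribution (for example <code>scipy.stats.ttest_ind</code> with <code>equal_var=False</code>) would be the better tool.</p>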
<p>So this tells us that, in general, particularly old movies receive the highest scores, followed by newly-released movies, followed by &#8220;semi-old&#8221; movies from the 1980s and 1990s. So why the differences? Were movies from the 1980s really just that bad? Possibly, but the more likely explanation is that movies from the 1950s through 1970s have artificially higher scores because people don&#8217;t generally go back and watch the crummy movies of the last generation, so they get forgotten and do not have 1000 votes on IMDb. Will people be watching <a href="http://www.imdb.com/title/tt1213644/">Disaster Movie</a> in forty years? I sure hope not.</p>
<p>On the other hand, particularly recent movies tend to draw a fair amount of hype and fanboyism. Remember when <em>The Dark Knight</em> had a score of 9.8 and was at #1 on the <a href="http://www.imdb.com/chart/top">IMDb top 250</a>? Now, one year later, it has a score of 8.9 and sits at #9 on the top 250. It will likely slip a little further over the coming years as well.</p>
<h3>The Best and Worst of Each Year</h3>
<p>While we&#8217;re looking at ratings of movies over the years, I suppose I might as well provide a list of the best and worst movie of each year (based on the votes of IMDb users), since such a list is not available on the IMDb website itself to my knowledge. Keep in mind that, as before, only movies with 1000 or more votes are considered. Enjoy!</p>
<table style="margin-left:auto;margin-right:auto" border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<th>Year</th>
<th>Best</th>
<th>Worst</th>
</tr>
<tr>
<td>1950</td>
<td><a href="http://www.imdb.com/title/tt0043014/">Sunset Blvd.</a></td>
<td><a href="http://www.imdb.com/title/tt0042393/">Destination Moon</a></td>
</tr>
<tr>
<td>1951</td>
<td><a href="http://www.imdb.com/title/tt0044079/">Strangers on a Train</a></td>
<td><a href="http://www.imdb.com/title/tt0043548/">Flying Padre: An RKO-Pathe Screenliner</a></td>
</tr>
<tr>
<td>1952</td>
<td><a href="http://www.imdb.com/title/tt0045152/">Singin&#8217; in the Rain</a></td>
<td><a href="http://www.imdb.com/title/tt0044762/">Jack and the Beanstalk</a></td>
</tr>
<tr>
<td>1953</td>
<td><a href="http://www.imdb.com/title/tt0045708/">Duck Amuck</a></td>
<td><a href="http://www.imdb.com/title/tt0046248/">Robot Monster</a></td>
</tr>
<tr>
<td>1954</td>
<td><a href="http://www.imdb.com/title/tt0047396/">Rear Window</a></td>
<td><a href="http://www.imdb.com/title/tt0047127/">Jail Bait</a></td>
</tr>
<tr>
<td>1955</td>
<td><a href="http://www.imdb.com/title/tt0048434/">Nuit et brouillard</a></td>
<td><a href="http://www.imdb.com/title/tt0047898/">Bride of the Monster</a></td>
</tr>
<tr>
<td>1956</td>
<td><a href="http://www.imdb.com/title/tt0049406/">The Killing</a></td>
<td><a href="http://www.imdb.com/title/tt0049092/">The Conqueror</a></td>
</tr>
<tr>
<td>1957</td>
<td><a href="http://www.imdb.com/title/tt0050083/">12 Angry Men</a></td>
<td><a href="http://www.imdb.com/title/tt0050177/">Beginning of the End</a></td>
</tr>
<tr>
<td>1958</td>
<td><a href="http://www.imdb.com/title/tt0052357/">Vertigo</a></td>
<td><a href="http://www.imdb.com/title/tt0052169/">The Screaming Skull</a></td>
</tr>
<tr>
<td>1959</td>
<td><a href="http://www.imdb.com/title/tt0053125/">North by Northwest</a></td>
<td><a href="http://www.imdb.com/title/tt0053464/">Yusei oji</a></td>
</tr>
<tr>
<td>1960</td>
<td><a href="http://www.imdb.com/title/tt0054215/">Psycho</a></td>
<td><a href="http://www.imdb.com/title/tt0054333/">Ein Toter hing im Netz</a></td>
</tr>
<tr>
<td>1961</td>
<td><a href="http://www.imdb.com/title/tt0055913/">Divorzio all&#8217;italiana</a></td>
<td><a href="http://www.imdb.com/title/tt0054673/">The Beast of Yucca Flats</a></td>
</tr>
<tr>
<td>1962</td>
<td><a href="http://www.imdb.com/title/tt0056172/">Lawrence of Arabia</a></td>
<td><a href="http://www.imdb.com/title/tt0055946/">Eegah</a></td>
</tr>
<tr>
<td>1963</td>
<td><a href="http://www.imdb.com/title/tt0057115/">The Great Escape</a></td>
<td><a href="http://www.imdb.com/title/tt0057507/">The Skydivers</a></td>
</tr>
<tr>
<td>1964</td>
<td><a href="http://www.imdb.com/title/tt0057012/">Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb</a></td>
<td><a href="http://www.imdb.com/title/tt0058615/">The Starfighters</a></td>
</tr>
<tr>
<td>1965</td>
<td><a href="http://www.imdb.com/title/tt0059578/">Per qualche dollaro in più</a></td>
<td><a href="http://www.imdb.com/title/tt0059464/">Monster a-Go Go</a></td>
</tr>
<tr>
<td>1966</td>
<td><a href="http://www.imdb.com/title/tt0060196/">Il buono, il brutto, il cattivo.</a></td>
<td><a href="http://www.imdb.com/title/tt0060753/">Night Train to Mundo Fine</a></td>
</tr>
<tr>
<td>1967</td>
<td><a href="http://www.imdb.com/title/tt0061512/">Cool Hand Luke</a></td>
<td><a href="http://www.imdb.com/title/tt0061759/">The Hellcats</a></td>
</tr>
<tr>
<td>1968</td>
<td><a href="http://www.imdb.com/title/tt0064116/">C&#8217;era una volta il West</a></td>
<td><a href="http://www.imdb.com/title/tt0174685/">Girl in Gold Boots</a></td>
</tr>
<tr>
<td>1969</td>
<td><a href="http://www.imdb.com/title/tt0066904/">Le chagrin et la pitié</a></td>
<td><a href="http://www.imdb.com/title/tt0061671/">Five the Hard Way</a></td>
</tr>
<tr>
<td>1970</td>
<td><a href="http://www.imdb.com/title/tt0066078/">Mihai Viteazul</a></td>
<td><a href="http://www.imdb.com/title/tt0065832/">Hercules in New York</a></td>
</tr>
<tr>
<td>1971</td>
<td><a href="http://www.imdb.com/title/tt0065670/">12 stulyev</a></td>
<td><a href="http://www.imdb.com/title/tt0066476/">The Touch of Satan</a></td>
</tr>
<tr>
<td>1972</td>
<td><a href="http://www.imdb.com/title/tt0068646/">The Godfather</a></td>
<td><a href="http://www.imdb.com/title/tt0069005/">Night of the Lepus</a></td>
</tr>
<tr>
<td>1973</td>
<td><a href="http://www.imdb.com/title/tt0070735/">The Sting</a></td>
<td><a href="http://www.imdb.com/title/tt0070122/">Gojira tai Megaro</a></td>
</tr>
<tr>
<td>1974</td>
<td><a href="http://www.imdb.com/title/tt0071562/">The Godfather: Part II</a></td>
<td><a href="http://www.imdb.com/title/tt0071198/">The Bat People</a></td>
</tr>
<tr>
<td>1975</td>
<td><a href="http://www.imdb.com/title/tt0252487/">Hababam sinifi</a></td>
<td><a href="http://www.imdb.com/title/tt0072666/">Zaat</a></td>
</tr>
<tr>
<td>1976</td>
<td><a href="http://www.imdb.com/title/tt0253828/">Tosun Pasa</a></td>
<td><a href="http://www.imdb.com/title/tt0075343/">Track of the Moon Beast</a></td>
</tr>
<tr>
<td>1977</td>
<td><a href="http://www.imdb.com/title/tt0253614/">Saban Oglu Saban</a></td>
<td><a href="http://www.imdb.com/title/tt0076191/">The Incredible Melting Man</a></td>
</tr>
<tr>
<td>1978</td>
<td><a href="http://www.imdb.com/title/tt0252597/">Kibar Feyzo</a></td>
<td><a href="http://www.imdb.com/title/tt0077834/">Laserblast</a></td>
</tr>
<tr>
<td>1979</td>
<td><a href="http://www.imdb.com/title/tt0078788/">Apocalypse Now</a></td>
<td><a href="http://www.imdb.com/title/tt0078778/">Angels&#8217; Brigade</a></td>
</tr>
<tr>
<td>1980</td>
<td><a href="http://www.imdb.com/title/tt0080684/">Star Wars: Episode V &#8211; The Empire Strikes Back</a></td>
<td><a href="http://www.imdb.com/title/tt0081693/">L&#8217;uomo puma</a></td>
</tr>
<tr>
<td>1981</td>
<td><a href="http://www.imdb.com/title/tt0082971/">Raiders of the Lost Ark</a></td>
<td><a href="http://www.imdb.com/title/tt0081027/">Le lac des morts vivants</a></td>
</tr>
<tr>
<td>1982</td>
<td><a href="http://www.imdb.com/title/tt0084868/">Vincent</a></td>
<td><a href="http://www.imdb.com/title/tt0084316/">Megaforce</a></td>
</tr>
<tr>
<td>1983</td>
<td><a href="http://www.imdb.com/title/tt0085743/">Jaane Bhi Do Yaaro</a></td>
<td><a href="http://www.imdb.com/title/tt0086026/">Los nuevos extraterrestres</a></td>
</tr>
<tr>
<td>1984</td>
<td><a href="http://www.imdb.com/title/tt0086935/">Balkanski spijun</a></td>
<td><a href="http://www.imdb.com/title/tt0086972/">Ator l&#8217;invincibile 2</a></td>
</tr>
<tr>
<td>1985</td>
<td><a href="http://www.imdb.com/title/tt0089108/">Esperando la carroza</a></td>
<td><a href="http://www.imdb.com/title/tt0087258/">Final Justice</a></td>
</tr>
<tr>
<td>1986</td>
<td><a href="http://www.imdb.com/title/tt0090605/">Aliens</a></td>
<td><a href="http://www.imdb.com/title/tt0092297/">Zombie Nightmare</a></td>
</tr>
<tr>
<td>1987</td>
<td><a href="http://www.imdb.com/title/tt0093488/">L&#8217;homme qui plantait des arbres</a></td>
<td><a href="http://www.imdb.com/title/tt0093405/">Leonard Part 6</a></td>
</tr>
<tr>
<td>1988</td>
<td><a href="http://www.imdb.com/title/tt0095765/">Nuovo cinema Paradiso</a></td>
<td><a href="http://www.imdb.com/title/tt0089280/">Hobgoblins</a></td>
</tr>
<tr>
<td>1989</td>
<td><a href="http://www.imdb.com/title/tt0097564/">Ilha das Flores</a></td>
<td><a href="http://www.imdb.com/title/tt0098156/">R.O.T.O.R.</a></td>
</tr>
<tr>
<td>1990</td>
<td><a href="http://www.imdb.com/title/tt0099685/">Goodfellas</a></td>
<td><a href="http://www.imdb.com/title/tt0131550/">The Final Sacrifice</a></td>
</tr>
<tr>
<td>1991</td>
<td><a href="http://www.imdb.com/title/tt0102926/">The Silence of the Lambs</a></td>
<td><a href="http://www.imdb.com/title/tt0101615/">Cool as Ice</a></td>
</tr>
<tr>
<td>1992</td>
<td><a href="http://www.imdb.com/title/tt0105236/">Reservoir Dogs</a></td>
<td><a href="http://www.imdb.com/title/tt0104837/">Meatballs 4</a></td>
</tr>
<tr>
<td>1993</td>
<td><a href="http://www.imdb.com/title/tt0108052/">Schindler&#8217;s List</a></td>
<td><a href="http://www.imdb.com/title/tt0306519/">Barschel &#8211; Mord in Genf?</a></td>
</tr>
<tr>
<td>1994</td>
<td><a href="http://www.imdb.com/title/tt0111161/">The Shawshank Redemption</a></td>
<td><a href="http://www.imdb.com/title/tt0145529/">Tangents</a></td>
</tr>
<tr>
<td>1995</td>
<td><a href="http://www.imdb.com/title/tt0114814/">The Usual Suspects</a></td>
<td><a href="http://www.imdb.com/title/tt0112873/">Dis &#8211; en historie om kjærlighet</a></td>
</tr>
<tr>
<td>1996</td>
<td><a href="http://www.imdb.com/title/tt0117293/">Paradise Lost: The Child Murders at Robin Hood Hills</a></td>
<td><a href="http://www.imdb.com/title/tt0174917/">Merlin&#8217;s Shop of Mystical Wonders</a></td>
</tr>
<tr>
<td>1997</td>
<td><a href="http://www.imdb.com/title/tt0128332/">Masumiyet</a></td>
<td><a href="http://www.imdb.com/title/tt0107838/">Pocket Ninjas</a></td>
</tr>
<tr>
<td>1998</td>
<td><a href="http://www.imdb.com/title/tt0120586/">American History X</a></td>
<td><a href="http://www.imdb.com/title/tt0162930/">Die Hard Dracula</a></td>
</tr>
<tr>
<td>1999</td>
<td><a href="http://www.imdb.com/title/tt0137523/">Fight Club</a></td>
<td><a href="http://www.imdb.com/title/tt0201290/">The Underground Comedy Movie</a></td>
</tr>
<tr>
<td>2000</td>
<td><a href="http://www.imdb.com/title/tt0209144/">Memento</a></td>
<td><a href="http://www.imdb.com/title/tt0252060/">The Tony Blair Witch Project</a></td>
</tr>
<tr>
<td>2001</td>
<td><a href="http://www.imdb.com/title/tt0120737/">The Lord of the Rings: The Fellowship of the Ring</a></td>
<td><a href="http://www.imdb.com/title/tt0118589/">Glitter</a></td>
</tr>
<tr>
<td>2002</td>
<td><a href="http://www.imdb.com/title/tt0317248/">Cidade de Deus</a></td>
<td><a href="http://www.imdb.com/title/tt0364986/">Ben &amp; Arthur</a></td>
</tr>
<tr>
<td>2003</td>
<td><a href="http://www.imdb.com/title/tt0167260/">The Lord of the Rings: The Return of the King</a></td>
<td><a href="http://www.imdb.com/title/tt0339034/">From Justin to Kelly</a></td>
</tr>
<tr>
<td>2004</td>
<td><a href="http://www.imdb.com/title/tt0338013/">Eternal Sunshine of the Spotless Mind</a></td>
<td><a href="http://www.imdb.com/title/tt0270846/">Superbabies: Baby Geniuses 2</a></td>
</tr>
<tr>
<td>2005</td>
<td><a href="http://www.imdb.com/title/tt0476735/">Babam Ve Oglum</a></td>
<td><a href="http://www.imdb.com/title/tt0469849/">Troppo belli</a></td>
</tr>
<tr>
<td>2006</td>
<td><a href="http://www.imdb.com/title/tt1051713/">Kiwi!</a></td>
<td><a href="http://www.imdb.com/title/tt0417056/">Pledge This!</a></td>
</tr>
<tr>
<td>2007</td>
<td><a href="http://www.imdb.com/title/tt1094594/">Heima</a></td>
<td><a href="http://www.imdb.com/title/tt0473310/">Ram Gopal Varma Ki Aag</a></td>
</tr>
<tr>
<td>2008</td>
<td><a href="http://www.imdb.com/title/tt0468569/">The Dark Knight</a></td>
<td><a href="http://www.imdb.com/title/tt1213644/">Disaster Movie</a></td>
</tr>
<tr>
<td>2009 (so far)</td>
<td><a href="http://www.imdb.com/title/tt0361748/">Inglourious Basterds</a></td>
<td><a href="http://www.imdb.com/title/tt1229827/">Jonas Brothers: The 3D Concert Experience</a></td>
</tr>
</tbody>
</table>
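<p>A list like this one can be built by taking, for each release year, the highest- and lowest-scoring movie among those that passed the vote threshold. A minimal Python sketch with invented titles and scores (when scores tie, whichever record comes first wins):</p>

```python
from itertools import groupby
from operator import itemgetter

# Made-up (year, title, score) records, for illustration only.
records = [
    (2001, "Movie A", 8.7),
    (2001, "Movie B", 2.1),
    (2001, "Movie C", 6.5),
    (2002, "Movie D", 3.3),
    (2002, "Movie E", 8.8),
]

best_and_worst = {}
# groupby only merges adjacent items, so sort by year first.
by_year = sorted(records, key=itemgetter(0))
for year, group in groupby(by_year, key=itemgetter(0)):
    group = list(group)
    best = max(group, key=itemgetter(2))
    worst = min(group, key=itemgetter(2))
    best_and_worst[year] = (best[1], worst[1])

for year, (best_title, worst_title) in best_and_worst.items():
    print(year, best_title, worst_title)
```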
<p><strong>Downloads:</strong></p>
<ul>
<li><a href="http://njohns01home.webfactional.com/wp-content/uploads/2009/10/IMDb_ratings.zip">IMDb rating data</a> [.zip of an Excel spreadsheet -- 341KB]</li>
<li><a href="http://njohns01home.webfactional.com/wp-content/uploads/2009/10/IMDb_ratings.txt">IMDb rating data</a> [tab-delimited plaintext -- 0.97MB]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.njohnston.ca/2009/10/imdb-movie-ratings-over-the-years/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
	</channel>
</rss>

