The Numerical and Insightful Blog<br>Welcome to our blog focusing on interesting concepts and products built using mathematical ideas. We invite you to join the discussion and create material to entertain, inform and educate. We will be qualitative and descriptive rather than technical and precise.<br>
Visit <a href="http://www.numericinsight.com" rel="external">Numeric Insight, Inc</a> for company information.<br>
<h2>The FFT is Fifty</h2>
<i>August 24, 2015</i><br>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzAim0sapCCvUeypGSyX2zzQ7nn1hw63Xm2RQke6Xjkn8f19UdIyBXJPW4UdBIZ4Ow2gObqEDCfm5aMGgiQQATyw-TyYeC3hdMZhBDh0ZIrZOLmIZEso54HCcI2CuHt7jtXeTcQqQPdD8/s1600/FFT_Speedup.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzAim0sapCCvUeypGSyX2zzQ7nn1hw63Xm2RQke6Xjkn8f19UdIyBXJPW4UdBIZ4Ow2gObqEDCfm5aMGgiQQATyw-TyYeC3hdMZhBDh0ZIrZOLmIZEso54HCcI2CuHt7jtXeTcQqQPdD8/s1600/FFT_Speedup.jpg" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
Their 1965 paper starts with the unassuming words "<em style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; font-family: inherit; font-stretch: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">An efficient method for the calculation of the interactions of a 2^m factorial experiment was introduced by Yates and is widely known by his name.</em>" The paper does not use the word "fast" even once. But the algorithm described in that paper written by James Cooley and John Tukey[1] came to be known as the <em style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; font-family: inherit; font-stretch: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Fast Fourier Transform </em>(FFT)<em style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; font-family: inherit; font-stretch: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">,</em> and turned out to be just what the world was waiting for. Overnight, in universities and laboratories around the world, scientists and engineers began writing code and building hardware to implement the FFT. 
Fifty years later, versions of that algorithm are routinely used, often tacitly assumed to be available to programmers. Any computation that requires (either as an end, or as an intermediate step) the unraveling of the frequency content of information-carrying signals benefits from the enormous speedup offered by the FFT. Other than the speedup, the FFT does not offer any computational result that was not available before its discovery. However, were it not for this amazing algorithm, it would be practically impossible to filter cell-phone signals, compress movie files, do spectroscopy, take magnetic resonance imaging scans, perform quantum computing, and solve differential equations, among other things. The Cooley-Tukey paper set off a surge of products that suddenly became possible, many of them even practical to operate in real time.</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
Although the idea behind the FFT can be explained without using complex math (it involves complex numbers, so I guess that does make the math complex), no attempt to do so will be made here. Suffice it to say that the algorithm belongs to a class of mathematical calculations that can either be performed in a straightforward manner, or can be solved by dividing the calculations into smaller steps. Cooley and Tukey showed how the dividing could be done in such a manner that the total effort needed to compute the simpler steps is much less than the effort that would be needed to tackle the computation without such division. There is a wealth of deep mathematics underlying the FFT algorithm. But let us remember that John Tukey, ever the pragmatic engineer, is said to have declared, “I wouldn’t want to fly in a plane whose design depended on whether a function was Riemann or Lebesgue integrable.”</div>
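The divide-and-conquer idea can be made concrete in a few lines of code. The Python sketch below (mine, not from the paper) is the textbook radix-2 recursion; it is meant only to show the structure of the splitting, not to compete with optimized FFT libraries.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    # Divide: transform the even- and odd-indexed halves separately.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Combine: each output bin is an even term plus a "twiddled" odd term.
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each level of the recursion does only about N multiply-adds, and there are log₂ N levels, which is where the N log N operation count comes from.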
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
For signals with a million samples, the gain in speed obtained by virtue of the magic of the FFT can be upwards of 50,000. To appreciate this speedup ratio, consider some calculation that takes 1 second to complete on your computer. To most of us, this would seem sluggish. In today's world we expect our devices to be instantly responsive, but we might tolerate such slowness once in a while, say when we are applying some fancy effects to produce an irresistible cat video. Now, imagine that same computation being 50,000 times slower, i.e., taking almost 14 hours to execute.</div>
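For the curious, here is the back-of-the-envelope arithmetic behind that figure: roughly N² operations for the straightforward transform against roughly N log₂ N for the FFT, ignoring constant factors.

```python
import math

n = 2 ** 20                 # about a million samples
direct = n * n              # straightforward DFT: ~N^2 operations
fast = n * math.log2(n)     # FFT: ~N*log2(N) operations
speedup = direct / fast     # simplifies to N / log2(N)
print(round(speedup))       # upwards of 50,000
```

The ratio N / log₂ N grows with N, so the longer the signal, the more dramatic the advantage.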
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
For decades, the FFT algorithm and its direct variants have ruled the world of signal processing. It is only recently that researchers[2] at MIT came up with an algorithm that takes advantage of a special property of real-life signals (sparsity) to achieve significantly faster performance. Although the speedup achieved (relative to the FFT, <em style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; font-family: inherit; font-stretch: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">not</em> relative to the straightforward computation) is not as spectacular, it is still useful in many practical applications. For example, it has the potential to reduce the time patients spend inside an MRI machine from an agonizing one hour to a bearable 20 minutes. There are many other applications where increased efficiency of computing the spectrum leads to benefits such as longer battery life.</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
My introduction to the FFT came via a classic textbook, Athanasios Papoulis's <em style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; font-family: inherit; font-stretch: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Signal Analysis</em>. In graduate school, I remember scribbling page after page of formulas as I worked to absorb the concept. When it all fell into place, the feeling was as exhilarating as though I had come up with the algorithm myself. Wonderful applications suggested themselves, and it seemed that the world could be saved by my newfound understanding of the FFT.</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
I was soon humbled by an introduction to an application made possible by the FFT. Transmultiplexers are computational devices that allow a single wire (or radio channel) to simultaneously transmit thousands of speech signals in a way that allows them to be separated by individual receivers. Remarkably, the FFT figures into the calculations needed to pull off this feat. Unlike the common applications I had encountered at that time, where the input to the FFT is a sequence of samples of a single signal as it is generated in time, in the transmultiplexer the input to the FFT at each time step is the sequence formed by gathering samples of several independent speech signals at a given instant of time. It amazed me that you could, in a manner of speaking, meaningfully combine and take apart independent words uttered by a crowd at a given time instant. Now this is one application I could not have come up with.</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
I know that many engineers and scientists have their own personal story of their first encounter with the FFT. Well, we are nerds, are we not? Can we share?</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
------------</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
[1] Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Mathematics of Computation, 19: 297–301.</div>
<div style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 0px; box-sizing: border-box; color: #232629; font-family: Georgia, serif; font-size: 18px; font-stretch: inherit; line-height: 32px; margin-bottom: 32px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;">
[2] Haitham Hassanieh, Piotr Indyk, Dina Katabi, Eric Price. (2012). "Simple and Practical Algorithm for Sparse Fourier Transform". ACM-SIAM Symposium on Discrete Algorithms, Kyoto, Japan.</div>
<div style="text-align: left;">
<br /></div>
</div>
<h2>What is different about Big Data?</h2>
<i>June 2, 2015</i><br>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
</div>
What is different about Big Data? Other than that it is, er, big?<br />
<br />
This is a question that I am frequently asked, most often by people who are plainly skeptical about Big Data simply because it is hugely oversold these days. At the recent Annual Signal and Image Sciences Workshop (more about this later in this blog), one of those skeptics remarked to me that there is a certain sameness to all the articles on Big Data found on the Internet. The first, and often the only, takeaway you get out of those articles is that Big Data is big. Also, the authors of those articles readily give in to the temptation to create clever phrases using the word “big.” The Big Data & Analytics Special Interest Group was inaugurated with a great presentation titled <a href="https://californiaconsultants.org/event/your-professional-network-the-secrets-to-a-successful-consultancy-2/" target="_blank">Big Lies and Small Truths</a>. In many articles, the apparently big benefits are touted, mixed in with cautionary mention of some other things Big: Infrastructure, Expense, Dangers, Fatigue and Brother.<br />
<br />
So, what is really different about Big Data? To me, the most qualitatively distinguishing aspect of the data in Big Data is this: it has quietly come into existence, with no one in particular having intentionally created it. Whereas in the past data creation was a purposeful and protocol-driven activity, today vast amounts of data have come into existence simply by virtue of the fact that all of us choose to live and conduct business digitally. We do not stop to wipe off fingerprints, we let the soda dribble, we do not sweep away breadcrumbs. It is this nature of its creation, the largely unintentional, evolving, uncurated, uncontrolled and imperfect nature of Big Data, that distinguishes it from the more familiar data. The bigness is not the essential feature, even if bigness is inevitable. This type of data quietly accumulated for years till, quite suddenly, we woke up to the Big Possibilities.<br />
<br />
However, Big Problems arise because the data comprising Big Data does not originate from projects specifically designed to facilitate statistical inference and prediction. This brings us back to the event of May 13, the <a href="https://casis.llnl.gov/casis-2015/" target="_blank">Annual Signal and Image Sciences Workshop</a> hosted and sponsored by Lawrence Livermore National Laboratory. Unlike most forums in which we hear about problems and innovative solutions that arise due to the bigness of the data, this workshop featured several talks highlighting its more fundamental aspects.<br />
<br />
The difficulties that must be resolved before truly significant insights can be obtained from Big Data are many. These require an understanding of the fundamental limits to information extraction and the mathematical trade-offs involved in creating algorithms. The workshop offered a chance to hear about the relevance of kernel PCA, spectral clustering, Johnson-Lindenstrauss lemma, and the Chinese restaurant process. The last has nothing to do with General Tso’s Chicken, which delicacy GrubHub’s elementary Big Data analysis tells us is the most popular Chinese dish in the USA.<br />
<br />
Is there anything else that makes Big Data fundamentally different? The answer is yes. Compared to data sets studied traditionally, the data sets in most Big Data scenarios are of high dimension. Also high, of course, is the number of high-dimensional observations available in the data set. Dimension, as in 2D and 3D, is a familiar concept. But with a little mental stretching, we can understand the dimension of a data set as the number of variables that are measured for each observation.<br />
<br />
High dimensionality is troublesome even in scenarios with moderate-sized data sets. When there are too many directions (and directions are what dimensions are), we don’t know which way to look. For instance, take a look at the <a href="http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime" target="_blank">Communities and Crime Data Set</a> available in the Machine Learning Repository maintained by the University of California at Irvine. This is a 122-dimensional data set of size 1994. And this doesn’t even qualify as Big Data.<br />
<br />
If you need any convincing that data sets can look meaningless except when looked at in just the right way, take a look at the 3D data set I specially prepared for this blog. Each data point in the set is a 3D point fixed in a cubical volume of space. One possible way to present such a data set on a 2D screen is this: imagine placing the collection of points on a slowly spinning turntable and simply watch till the points align in a way that makes sense.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7udMNnnzOkGpL4gWfVD3TT3tLP94tswJ_aw2_-Qo9mqxv31hS2zm3KwZKA7k4v4X4BMZSV2hdwAQKu3y7KkjezXVltu7tfnvdSGApaR1B51FdirPpWwi6CvkLxkb-9mso42oOSJC5X98/s1600/What_is_different_about_big_data.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7udMNnnzOkGpL4gWfVD3TT3tLP94tswJ_aw2_-Qo9mqxv31hS2zm3KwZKA7k4v4X4BMZSV2hdwAQKu3y7KkjezXVltu7tfnvdSGApaR1B51FdirPpWwi6CvkLxkb-9mso42oOSJC5X98/s1600/What_is_different_about_big_data.gif" /></a></div>
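For readers who want to play with the turntable idea themselves, here is a minimal Python sketch of the underlying computation. The names and the 500-point cloud are my own invention, and the actual rendering of each frame is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((500, 3))      # 500 points scattered in a unit cube

def turntable_view(points, angle):
    """Project 3D points to 2D after rotating them about the vertical axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c,   0.0, -s],
                    [0.0, 1.0, 0.0],
                    [s,   0.0,  c]])
    rotated = points @ rot.T
    return rotated[:, :2]          # drop depth: an orthographic projection

# One 2D "frame" for every 5 degrees of rotation; plot these in sequence
# to reproduce the spinning-turntable effect.
frames = [turntable_view(points, np.deg2rad(a)) for a in range(0, 360, 5)]
```

Watching the frames in order, a structure hidden in the cloud would pop out only at the viewing angles where the points happen to align.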
<div>
The complexity in real-life high dimensional data hides the meaning really deep within the data. You (or more correctly, your algorithm) would not only need to figure out exactly from what viewpoint to look, but would also need to design funny mirrors to undo nonlinear distortions in the data.<br />
<br />
Funny things happen in high-dimensional spaces. For example, it is quite rare for two random lines drawn on a page (2D) to turn out to be perpendicular to each other. However, in the high-dimensional space of many Big Data sets it is practically a given that any two random vectors will be almost perpendicular to each other. In dealing with data sets of high dimensionality, the data scientist is forced (actually, the good data scientist is delighted) to think about issues like noise accumulation, spurious correlations, incidental homogeneity and algorithmic instability, which are rare considerations in working with traditional data. These are important issues because at the end of the day the data scientist is expected to help arrive at decisions from the data, or at the very least help decision makers better understand high-dimensional data and make informed decisions.<br />
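The near-perpendicularity claim is easy to verify numerically. The quick simulation below (function name is mine) draws pairs of random vectors and measures how widely the angle between them spreads around 90 degrees: in 2D the spread is huge, while in 1000 dimensions the angles concentrate tightly near perpendicular.

```python
import numpy as np

rng = np.random.default_rng(0)

def angle_spread_deg(dim, trials=2000):
    """Standard deviation of the angle between random vector pairs in `dim` dimensions."""
    a = rng.standard_normal((trials, dim))
    b = rng.standard_normal((trials, dim))
    cos = np.einsum('ij,ij->i', a, b) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles.std()

# 2D: angles spread over tens of degrees. 1000D: almost all pairs are
# within a couple of degrees of perpendicular.
print(angle_spread_deg(2), angle_spread_deg(1000))
```

Roughly speaking, the cosine of the angle between two random directions shrinks like 1/√dim, which is why the concentration gets tighter as the dimension grows.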
<br />
Nobel laureate Herbert Simon said it best: “Solving a problem simply means representing it so that the solution is obvious.” Large-scale data visualization is a growing research topic because high-dimensional data is growing while our computer screens are not. Ideally, effective visualization algorithms highlight patterns and structure, and remove distracting clutter. Since human perception is limited to three-dimensional space, the challenge of visualizing high-dimensional data is to optimally reduce the data and present it in a manner comprehensible to the user while preserving as much relevant information as possible.<br />
<br />
Unfortunately, having so many dimensions (and the proliferation of tools being created by big.data.dot.coms) to play with opens up almost limitless ways to look at data. Anyone who wants to find something will find something. On the other hand, it is rarely a lack of data that has resulted in bad consequences or policies. Domain expertise remains valuable despite all the data readily available. Astute practitioners place great importance on beginning by posing good questions rather than diving into the data headfirst. The questions shift the focus back to the purpose of utilizing Big Data, or what should be its purpose: gaining tangible awareness and understanding of real-world behaviors or conditions that lead to an enrichment of society.</div>
<div>
<br /></div>
<div>
[This post first appeared on the website of the <a href="https://californiaconsultants.org/" target="_blank">IEEE Consultants Network of Silicon Valley</a>.]</div>
</div>
<h2>A Gentle Introduction to Backpropagation</h2>
<i>July 22, 2014</i><br>
<div dir="ltr" style="text-align: left;" trbidi="on">
<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script>
<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="clear: both;">
<br /></div>
<h2 style="clear: both;">
<span style="font-size: x-large;">Why is this blog being written?</span></h2>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">[<i>Update: A more detailed version of this article with coding outline is available. Please read the note at the end of this article.</i>]</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">Neural networks have fascinated me ever since I became aware of them in the 1990s. They are often represented with a hypnotizing array of connections. In the last decade, deep neural networks have dominated pattern recognition, often replacing other algorithms for applications like computer vision and voice recognition. At least in specialized tasks, they indeed come close to mimicking the miraculous feats of cognition our brains are capable of.</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">While neural networks are capable of such feats, the very discovery of a method of programming such a computational device seems to me to be a miraculous feat of cognition worthy of celebration. My purpose in writing this blog is to share my perspective on an amazing algorithm made widely known by a 1986 publication in <i>Nature</i>.</span></div>
<div style="border-left-color: rgb(173, 189, 76); border-left-style: solid; border-left-width: 20px;">
<blockquote>
<span style="font-family: inherit;">I will assume that the reader has some understanding of the meaning and purpose of the various elements of a neural network such as the one shown in Figure 1. I have provided a little bit of background below. Other than that, this blog should be an easy read for those with some familiarity with basic calculus.</span></blockquote>
</div>
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguAGbiJ0fOXC03Lch48XMKJZ9W96Go1tafMxAVEaXqBT56KMRlLHztRyaLB4PoTe52gKMfP2hdQMEiS_JFpEl-u1UTeygPQRYQDE10Futgd8eYMPcPw02TxRj8YN1regaBhNYH7Y-GTbA/s1600/NN_Blog_Fig_1.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguAGbiJ0fOXC03Lch48XMKJZ9W96Go1tafMxAVEaXqBT56KMRlLHztRyaLB4PoTe52gKMfP2hdQMEiS_JFpEl-u1UTeygPQRYQDE10Futgd8eYMPcPw02TxRj8YN1regaBhNYH7Y-GTbA/s1600/NN_Blog_Fig_1.JPG" height="228" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: left;"><blockquote class="tr_bq">
Figure 1: An example of a multi-layered neural network that can be used to associate an input consisting of 10 numbers with one of 4 decisions or predictions.</blockquote>
</td></tr>
</tbody></table>
<h2 style="clear: both; text-align: left;">
<span style="font-family: inherit; font-size: x-large;">What is so difficult about designing a neural network?</span></h2>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;">To appreciate the difficulty involved in designing a neural network, consider this: The neural network shown in Figure 1 can be used to associate an input consisting of 10 numbers with one of 4 decisions or predictions. For example, the neural network shown may be used by a bank to determine if credit should be extended to a customer. In this case, the 10 input numbers represent various parameters relevant to an individual's financial responsibility such as balance in savings accounts, outstanding loan amounts, number of years of employment, and so on. </span><span style="font-family: inherit;">The neural network takes in these 10 numbers, performs calculations and produces an output consisting of 4 numbers. Depending on the position at which the maximum of these 4 numbers appears in the output, the prediction could be one of the following:</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<ol style="text-align: left;">
<li><span style="font-family: inherit;">Excellent creditworthiness with high spending limit</span></li>
<li><span style="font-family: inherit;">Average creditworthiness with moderate spending limit</span></li>
<li><span style="font-family: inherit;">Low creditworthiness with low spending limit</span></li>
<li><span style="font-family: inherit;">High default risk.</span></li>
</ol>
<span style="font-family: inherit;">Based on this prediction, the bank would take the appropriate decision.</span><br />
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;">A neural network such as the one shown in Figure 1 can perform this miraculous feat of cognition only if it is specifically trained to do so. For the network to work correctly, the weight at each of its </span>(10$\times $4)+(4$\times $6)+(6$\times $8)+(8$\times $4) =144<span style="font-family: inherit;"> connections has to be carefully chosen such that the network classifies every input drawn from a known training set with a high degree of accuracy. This is a classic application of the <i>supervised learning</i> paradigm in machine learning.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;">There are, of course, no formulas in </span>existence<span style="font-family: inherit;"> to directly set the values of the 144 weights. The only recourse is to start with some initial values for the 144 weights, check how good the resulting neural network is, and repeatedly refine the weights to progressively make the network more and more accurate. So, what is needed is a method of refining the weights.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;">If one were to think of the accuracy as some grand function of the weights, it makes sense to refine each weight by changing it by an amount proportional to the partial derivative of that grand function with respect to that weight. </span>Why bring in partial derivatives? Because they precisely predict how the accuracy responds to small changes in the weights. In fact, at every iteration, a refinement guided by the partial derivatives yields a more advantageous gain in accuracy than any other method of refinement. The method of steepest descent does exactly what is suggested here, with the minor but entirely equivalent goal of seeking to progressively decrease the error rather than increase the accuracy.</div>
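As an illustration only, here is what one such steepest-descent refinement looks like in Python on a toy error function whose partial derivatives we can write down by hand. In a real network the toy gradient would be replaced by the error gradient computed over the training set, which is exactly what backpropagation provides.

```python
import numpy as np

def refine(weights, grad_error, learning_rate=0.1):
    """One steepest-descent step: move each weight against the partial
    derivative of the error with respect to that weight."""
    return weights - learning_rate * grad_error(weights)

# Toy error: E(w) = (w0 - 3)^2 + (w1 + 1)^2, minimized at w = [3, -1].
grad = lambda w: 2.0 * (w - np.array([3.0, -1.0]))

w = np.zeros(2)
for _ in range(100):
    w = refine(w, grad)   # the derivatives change after each step,
                          # so the gradient is recomputed every time
print(w)                  # approaches the minimizer [3, -1]
```

Note how the gradient is recomputed at every step, mirroring point 2 in the list above.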
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;">To keep things straight, let me list the concepts I have described thus far:</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<ol style="text-align: left;">
<li><span style="font-family: inherit;">The weights of the neural network must be set such that the error on a known training set is minimized. Ideally, the network must yield zero error on a large and representative training data set.</span></li>
<li><span style="font-family: inherit;">For any given setting of weights, we can attempt to refine them by changing each weight by an amount proportional to the partial derivative of the error with respect to that weight. The partial derivatives themselves change after any such refinement</span>, and must be recomputed.</li>
<li><span style="font-family: inherit;">We can keep on refining the weights in this manner, and stop refining when the error is zero. In real life, we call it done when the error is low enough. Or refuses to fall anymore. Or we run out of time after a few million rounds of refinements.</span></li>
</ol>
<span style="font-family: inherit;">Seems simple, right? Yes, as long as there is a simple way of calculating the partial derivative of the error with respect to every weight. In principle, the partial derivatives can be calculated by systematically perturbing the weights and measuring the resulting changes in error. A word to the wise: don't try this at home. Or even on your neighborhood supercomputer.</span><br />
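To see why not, here is a sketch of that perturbation approach (the finite-difference method); the names are my own. Estimating the gradient just once for the 144-weight network of Figure 1 already costs 145 full evaluations of the error over the training set, and every round of refinement has to repeat this, while backpropagation obtains all the partial derivatives for roughly the cost of a couple of passes through the network.

```python
import numpy as np

def fd_gradient(error, weights, eps=1e-6):
    """Estimate every partial derivative by perturbing one weight at a time.
    Costs one extra error evaluation per weight -- hopeless for big networks."""
    base = error(weights)
    grad = np.empty_like(weights)
    for i in range(len(weights)):
        bumped = weights.copy()
        bumped[i] += eps
        grad[i] = (error(bumped) - base) / eps
    return grad

calls = 0
def error(w):
    """Stand-in for the network error on the training set; counts evaluations."""
    global calls
    calls += 1
    return float(np.sum(w ** 2))

g = fd_gradient(error, np.zeros(144))   # the 144 weights of Figure 1
print(calls)                            # 145 error evaluations for one gradient
```

Multiply that by millions of refinement rounds, and by 60 million weights for a modern network, and the need for something like backpropagation becomes obvious.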
<span style="font-family: inherit;"><br /></span>
<br />
<h2 style="clear: both; text-align: left;">
<span style="font-family: inherit; font-size: x-large;">
Backpropagation</span></h2>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: inherit;">In fact, this error minimization problem that must be solved to train a neural network eluded a practical solution for decades until D. E. Rumelhart, G. E. Hinton, and R. J. Williams (drawing inspiration from other researchers) demonstrated a technique, which they called </span><i>backpropagation, </i><span style="font-family: inherit;">and made it widely known (</span><i>Nature </i>323, 533-536, 9 October 1986<span style="font-family: inherit;">)</span><span style="font-family: inherit;">. It is essentially by building upon their method that today others have ventured to program neural networks with 60 </span><i style="font-family: inherit;">million</i><span style="font-family: inherit;"> weights, with astounding results.</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">According to Bernard Widrow, now Professor Emeritus at Stanford University and one of the pioneers of neural networks, "The basic concepts of backpropagation are easily grasped. Unfortunately, these simple ideas are often obscured by relatively intricate notation, so formal derivations of the backpropagation rule are often tedious." </span><span style="font-family: inherit;">This is indeed unfortunate because the backpropagation rule is one of the most elegant applications of calculus that I have known. </span></div>
<div class="separator" style="clear: both;">
<br /></div>
<span style="font-size: x-large; font-weight: bold;">Easy as 1-2-3</span><br />
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">Once you appreciate the fact that, in order to train a neural network, you need to somehow calculate the </span>partial derivatives of the error with respect to weights<span style="font-family: inherit;">, backpropagation can be easily and qualitatively derived by reducing it to three core concepts. It also helps immensely to keep the notation intuitive and easy to connect to the concept being symbolized.</span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5TrH1OnMSg-WSxbb86QrjlvowfnIa3ZunXZYMvjZY7xB26RVNn-TROA3nMVTbwGNsON_oUmk0ZgomOqlSJFdZskUHCWQ4ZtmJWmEE8QTehxvJIKPgyc2apivdkX1uuAROYAOZoYrdQ3Q/s1600/NN_Blog_Fig_2.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5TrH1OnMSg-WSxbb86QrjlvowfnIa3ZunXZYMvjZY7xB26RVNn-TROA3nMVTbwGNsON_oUmk0ZgomOqlSJFdZskUHCWQ4ZtmJWmEE8QTehxvJIKPgyc2apivdkX1uuAROYAOZoYrdQ3Q/s1600/NN_Blog_Fig_2.JPG" height="227" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: left;"><blockquote class="tr_bq">
Figure 2: The derivation of the backpropagation algorithm is simplified by adding an extra computational block to calculate the error and also by boxing parts of the network.</blockquote>
</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<h2 style="clear: both; text-align: left;">
</h2>
<h3 style="clear: both; text-align: left;">
<span style="font-family: inherit; font-size: large;">1. Boxing</span></h3>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">Since training the neural network is all about minimizing the training error, the first step in the derivation involves tacking on an extra computational block to calculate the error between the actual output $\left\{o_1,o_2,o_3,o_4\right\}$ and a known target $\{t_1,t_2,t_3,t_4\}$. This is shown as a triangular block in Figure 2. For now, let us think of the output and the target as known and fixed entities. Although we need not concern ourselves with the exact formula to compute the error, I offer the familiar sum-of-squares error as an example</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">\[e=(o_1-t_1)^2+(o_2-t_2)^2+(o_3-t_3)^2+(o_4-t_4)^2\]</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: inherit;"><br /></span></div>
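As a concrete illustration, with made-up output and target vectors (the numbers are mine, chosen only to show the calculation):

```python
import numpy as np

# Sum-of-squares error between a hypothetical output and a known target
o = np.array([0.9, 0.1, 0.2, 0.8])   # actual output {o1, o2, o3, o4}
t = np.array([1.0, 0.0, 0.0, 1.0])   # known target {t1, t2, t3, t4}
e = np.sum((o - t) ** 2)             # the error computed by the triangular block
print(e)                             # approximately 0.1 for these values
```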
<div class="separator" style="clear: both;">
<span style="font-family: inherit;">Next, we choose one of the layers (say Layer 3) and enclose that layer and all following layers (including the error calculating block) in a box, as shown in gray in Figure 2. Keep in mind that this is just one of several nested boxes we can construct in order to compartmentalize the network. Let us resolve not to worry about anything going on inside the box but simply think of the relationship between the input to the box and the output </span>(i.e., the error) <span style="font-family: inherit;">coming out of the box. We will call this box the <i>current box</i>, and call the input to this box the <i>current input</i>, </span>$\{c_1,c_2,c_3,c_4,c_5,c_6\}$. <span style="font-family: inherit;">It is important to recognize that </span>the functional relationship between the current input to the box and the output emanating from the box is completely defined and can be computed. Let us denote this function as $E(c_1,c_2,c_3,c_4,c_5,c_6)$.</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHjdHExdd8cKO-YIP_EByBLO9mSVXLxpdXB7ZU6gC8fKQTaEh7Dpw8lD3mBx_aIs5log-FDqcO6qSr_jvj1Ai_zVPWe4JR-lfgP-CweHVT_w5qeCCOXdtihLDq0A0Z_ylfZOINu94tRec/s1600/NN_Blog_Fig_3.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHjdHExdd8cKO-YIP_EByBLO9mSVXLxpdXB7ZU6gC8fKQTaEh7Dpw8lD3mBx_aIs5log-FDqcO6qSr_jvj1Ai_zVPWe4JR-lfgP-CweHVT_w5qeCCOXdtihLDq0A0Z_ylfZOINu94tRec/s1600/NN_Blog_Fig_3.JPG" height="400" width="342" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: left;"><blockquote class="tr_bq">
Figure 3: The box contains parts of the neural network hidden from view. It allows us to think about broad relationships among different parts of the network.</blockquote>
</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<h3 style="text-align: left;">
<span style="font-size: large;">2. Sensitivity</span></h3>
As our journey through backpropagation continues, I gently request you to assume that the vector of partial derivatives $\frac{\partial E}{\partial c_1},\frac{\partial E}{\partial c_2},\ldots,\frac{\partial E}{\partial c_6}$ of the function $E(c_1,c_2,c_3,c_4,c_5,c_6)$ is known. This might seem like a stretch. After all, we have set out to find a method to compute some other (equally unrealized) partial derivatives. But I assure you it will all work out in the end. To emphasize the crucial nature of this simple concept, it has been given a name: <i>Sensitivity</i>. Let us denote the sensitivity of our current box as $\{\delta c_1,\delta c_2,\delta c_3,\delta c_4,\delta c_5,\delta c_6\}$<span style="font-family: inherit;">. With the help of Figure 3 to hold these concepts in our mind,</span> we can concretely think about how the output of the current box responds to a small perturbation applied to any of its current inputs. For example, if the fourth component of the current input changes by the small amount $\Delta c_4,$ we can expect the error at the output to change by $\Delta c_4 \delta c_4.$ Further, in addition to the hypothetical change in component 4, if there is a simultaneous change of $\Delta c_6$ in component 6, we can expect the error at the output to change by an additional amount, making the total change $\Delta c_4 \delta c_4 + \Delta c_6 \delta c_6.$<span style="font-family: inherit;"> The effects of small simultaneous changes in the current input components simply add up at the output.</span><br />
<span style="font-family: inherit;"><br /></span>
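The additivity is easy to verify numerically. In this Python sketch, a stand-in box function (my own choice, purely for illustration) whose sensitivity is known exactly lets us compare the predicted change against the actual change:

```python
import numpy as np

# Stand-in "current box": E(c) = sum of squares of its six inputs,
# so its sensitivity is known exactly: dE/dc_i = 2*c_i
E = lambda c: np.sum(c ** 2)
c = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])
sensitivity = 2.0 * c

# Small simultaneous changes in components 4 and 6 (indices 3 and 5)
dc = np.zeros(6)
dc[3], dc[5] = 1e-4, -2e-4
predicted = dc[3] * sensitivity[3] + dc[5] * sensitivity[5]
actual = E(c + dc) - E(c)
print(predicted, actual)   # the two agree to first order
```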
Knowing the sensitivity of the current box, what can we say about the sensitivity of the <i>preceding box</i>? Keep in mind that the preceding box encloses Layer 2 and all following layers, including the error calculating block. For our specific example, let us call the input to this box the <i>preceding input</i>, $\{p_1,p_2,p_3,p_4\}.$ It follows quite logically that the sensitivity of the preceding box (which we will naturally denote as $\{\delta p_1,\delta p_2,\delta p_3,\delta p_4\}$) must be related to the sensitivity of the current box <u>and</u> the extra neural network elements making up the difference between the two nested boxes. The extra elements are the very vital nonlinear activation function units, summing junctions and weights.<br />
<br />
Figure 4(a) shows the current box and the extra elements that must be added to construct the preceding box. For clarity, all the elements not relevant to the calculation of the first component of sensitivity ($\delta p_1$) have been grayed out. Look closely at Figures 4(a) and 4(b) to understand how the sensitivity of the preceding box can easily be derived from first principles. Specifically, Figure 4(b) provides insight into how $\delta p_1(=\frac{\partial e}{\partial p_1})$ can be computed by allowing the input component $p_1$ to change by a small quantity $\Delta p_1$ and following the resulting changes in the network. Notes: (i) The notation $\mathcal{A}'\left(p_1\right)$ has been used for the derivative of the activation function evaluated at $p_1.$ (ii) For clarity, not all changes in signals have been explicitly labeled. Those that are not labeled can easily be determined since they all follow an obvious pattern.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWwq8Vxm_SpnNqYnOO6y9xa8WQSar79FGnqA-hzb8Y0DFsOHkZFSxS9s8dgOZjUl-5BxR-cO5D73_2VWyljtjNR2ccIgfOVU3EHeK0vobq9k3-yB174DnP0CO3CeS29GT6Cjo-3LziPko/s1600/NN_Blog_Fig_4ab.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWwq8Vxm_SpnNqYnOO6y9xa8WQSar79FGnqA-hzb8Y0DFsOHkZFSxS9s8dgOZjUl-5BxR-cO5D73_2VWyljtjNR2ccIgfOVU3EHeK0vobq9k3-yB174DnP0CO3CeS29GT6Cjo-3LziPko/s1600/NN_Blog_Fig_4ab.jpg" height="640" width="555" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: left;"><blockquote class="tr_bq">
Figure 4(a) shows the current box and the extra elements that must be added to construct the preceding box. Figure 4(b) provides insight into how $\delta p_1(=\frac{\partial e}{\partial p_1})$ can be computed by allowing the input component $p_1$ to change by a small quantity $\Delta p_1$ and following the resulting changes in the network.</blockquote>
</td></tr>
</tbody></table>
This is indeed a deep result. (And one which we have managed to arrive at without recourse to tedious calculus and numbered equations.) By repeated application, this result allows us to work backwards and calculate the sensitivity of the error to changes in the input at every layer.<br />
<br />
The algorithm gets the name <i>backpropagation </i>because the sensitivities are propagated backwards as they are calculated in sequence. The textbook formula to express the sensitivity of the preceding layer in terms of the sensitivity of the current layer is easily seen to be<br />
\[\delta p_i = \mathcal{A}'(p_i) \sum _{j} w_{i j}\delta c_j \]<br />
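In code, this recurrence is a single line of linear algebra. The sketch below assumes a tanh activation and a hypothetical weight layout in which $w_{i j}$ connects preceding component $i$ to current component $j$; the signal values are made up for illustration:

```python
import numpy as np

def backprop_sensitivity(p, W, delta_c, act_deriv):
    # delta_p[i] = A'(p[i]) * sum_j W[i, j] * delta_c[j]
    return act_deriv(p) * (W @ delta_c)

# tanh activation; its derivative is 1 - tanh(x)^2
act_deriv = lambda x: 1.0 - np.tanh(x) ** 2

p = np.array([0.5, -0.3, 0.8, 0.1])                    # preceding input (4 components)
delta_c = np.array([0.2, -0.1, 0.05, 0.3, -0.2, 0.1])  # current sensitivity (6 components)
W = np.random.default_rng(0).normal(size=(4, 6))       # weights between the two boxes
delta_p = backprop_sensitivity(p, W, delta_c, act_deriv)
print(delta_p.shape)   # (4,)
```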
<br />
A starting point is all we need to completely calculate all the sensitivity terms throughout the neural network. To do this, we consider the error computing block itself as the first box. For this box, the input is $\left\{o_1,o_2,o_3,o_4\right\},$ and the output is $e$ as given in the sum-of-squares error formula we have seen before. Simple calculus gives us the components of the sensitivity of the error computing block<br />
\[\left\{2 \left(o_1-t_1\right),\ 2 \left(o_2-t_2\right),\ 2 \left(o_3-t_3\right),\ 2 \left(o_4-t_4\right)\right\}\]<br />
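For instance, with the same made-up output and target vectors as before, the starting sensitivity is:

```python
import numpy as np

# Sensitivity of the sum-of-squares error block: component i is 2*(o_i - t_i)
o = np.array([0.9, 0.1, 0.2, 0.8])   # hypothetical output
t = np.array([1.0, 0.0, 0.0, 1.0])   # hypothetical target
delta_o = 2.0 * (o - t)
print(delta_o)   # approximately [-0.2, 0.2, 0.4, -0.4]
```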
<h3>
<span style="font-size: large;">3. Weight updates</span></h3>
At this point, the last section writes itself. Following the same strategy outlined in the previous figure, look at Figures 5(a) and 5(b) to intuitively understand how the error changes in response to a small change in one of the weights, say $w_{1 1}$. Once again in these figures, details of connections not immediately relevant to this calculation have been grayed out. The much-sought-after partial derivative of error with respect to the specific weight $w_{1 1}$ is easily seen to be $\mathcal{A}\left(p_1\right) \delta c_1$. The textbook formula for the partial derivative of the error with respect to any weight follows the same pattern<br />
\[\frac{\partial e}{\partial w_{i j}} =\mathcal{A}\left(p_i\right) \delta c_j \]<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheB7N2gFuxUcAhDxz0dE4RurkCKiK9CnXJ9ynCwX39KoTwHv6BPM5FMdo3G7xkjvyLG2_sIpTfi9LL7prFWFk4d8nKwBLOOx1bLQsdxEV1C3tMQ_6x4WeKM0Tc9QFT7fiBlrQLlqEUcwc/s1600/NN_Blog_Fig_5ab.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheB7N2gFuxUcAhDxz0dE4RurkCKiK9CnXJ9ynCwX39KoTwHv6BPM5FMdo3G7xkjvyLG2_sIpTfi9LL7prFWFk4d8nKwBLOOx1bLQsdxEV1C3tMQ_6x4WeKM0Tc9QFT7fiBlrQLlqEUcwc/s1600/NN_Blog_Fig_5ab.jpg" height="640" width="552" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: left;"><blockquote class="tr_bq">
Figure 5: Another set of figures designed to provide insight into the error changes in response to a small change in one of the weights, $w_{1 1}$. Using this intuition, the partial derivative $\frac{\partial e}{\partial w_{1 1}}$ can be computed as $\mathcal{A}\left(p_1\right) \delta c_1$.</blockquote>
</td></tr>
</tbody></table>
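Collected over all $i$ and $j$, this formula is just an outer product. A sketch, again assuming a tanh activation and made-up signal values:

```python
import numpy as np

# All of one layer's weight derivatives at once, as an outer product:
# de/dw[i, j] = A(p[i]) * delta_c[j]
A = np.tanh                                            # assumed activation
p = np.array([0.5, -0.3, 0.8, 0.1])                    # preceding input
delta_c = np.array([0.2, -0.1, 0.05, 0.3, -0.2, 0.1])  # current sensitivity
grad_W = np.outer(A(p), delta_c)                       # shape (4, 6): one entry per weight
print(grad_W.shape)   # (4, 6)
```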
Now that we have a formula to calculate the partial derivative of error with respect to every weight in the network, we can proceed to iteratively refine the weights and minimize the error using the method of steepest descent.<br />
<br />
In the most popular version of backpropagation, called <i>stochastic backpropagation</i>, the weights are initially set to small random values and the training set is randomly polled to pick out a single input-target pair. The input is passed through the network to compute internal signals (like $\mathcal{A}\left(p_1\right)$ and $\mathcal{A}'\left(p_1\right)$ shown in Figures 4 and 5) and the output vector. Once this is done, all the information needed to initiate backpropagation becomes available. The partial derivatives of error with respect to the weights can be computed, and the weights can be refined with the intent of reducing the error. The process is iterated using another randomly chosen input-target pair.<br />
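Putting the pieces together, here is a minimal sketch of stochastic backpropagation on a tiny two-layer network. The tanh activation, the sum-of-squares error, the XOR problem as a toy training set, and the omission of bias weights are all my own choices, made to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 4))   # small random initial weights
W2 = rng.normal(scale=0.5, size=(4, 1))
A = np.tanh
A_prime = lambda x: 1.0 - np.tanh(x) ** 2

inputs  = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
targets = np.array([[0.], [1.], [1.], [0.]])   # XOR, a classic toy problem

def forward(x):
    return A(A(x @ W1) @ W2)

def total_error():
    return float(sum(np.sum((forward(x) - t) ** 2) for x, t in zip(inputs, targets)))

e_before = total_error()
for step in range(20000):
    k = rng.integers(4)                     # randomly poll the training set
    x, t = inputs[k], targets[k]
    s1 = x @ W1; h = A(s1)                  # forward pass: internal signals...
    s2 = h @ W2; o = A(s2)                  # ...and the output vector
    delta_o = 2.0 * (o - t) * A_prime(s2)   # sensitivity at the output layer
    delta_h = A_prime(s1) * (W2 @ delta_o)  # propagated backwards one layer
    W2 -= 0.2 * np.outer(h, delta_o)        # refine each weight opposite its
    W1 -= 0.2 * np.outer(x, delta_h)        # partial derivative of the error
print(e_before, "->", total_error())        # the training error falls
```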
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<h2 style="text-align: left;">
<span style="font-size: x-large;">The miraculous feat of cognition</span></h2>
<div style="border-left-color: rgb(173, 189, 76); border-left-style: solid; border-left-width: 20px;">
<blockquote>
I am in awe of the miraculous feat of cognition that led early neural network researchers to arrive at the backpropagation algorithm. They clearly had the ability to see patterns and make elegant groupings which ultimately made it possible to train huge networks. Their work not only resulted in the neural network applications we use today, but has also inspired a host of other related algorithms which depend on error minimization.</blockquote>
</div>
Although this algorithm has been presented here as a single established method, it should be regarded as a framework. In my experience, appreciating how an algorithm is derived leads to insight which makes it possible to explore beneficial variations. The process of designing robust and efficient scientific algorithms frequently leads us to regard established frameworks as substrates upon which to build new and better algorithms.<br />
<br />
<hr />
Note: A PDF version of this article with high quality graphics and pseudocode ("Training Wheels for Training Neural Networks") for implementing the algorithm is available. Please visit <a href="http://www.numericinsight.com/">www.numericinsight.com</a> and find Downloads on the home page.<br />
<a href="http://4.bp.blogspot.com/-vjvsY3foRWE/U8XVl10L82I/AAAAAAAAGDU/fR5UoFzoUwM/s1600/NN_Blog_Fig_1.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-left: 1em;"><br /></a></div>
</div>
</div>
<h2 style="text-align: left;"><span style="font-family: inherit; font-size: x-large;">Land that Big Data job. Cash in on the movement.</span></h2>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
Riding on the wave of a new movement requires that you learn
new skills. I am often asked “What do I need to learn to profit from the Big
Data trend?” My answer is “Learn some Machine Learning.”<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Why do I say that? Three reasons:</div>
<div class="MsoNormal">
</div>
<ol style="text-align: left;">
<li>For business leaders, investing just a little bit of time to pick up the basics of machine learning can help them formulate solvable problems and navigate the sea of potential technological options that can be used to realize business value.</li>
<li>Those with
even a basic degree of comfort with numbers (high school math, really) can quickly come up
to speed on machine learning and go on to become leaders who contribute to
gains on hundreds of challenging real-world data analysis problems from a wide
span of industries and sciences.</li>
<li>They are hiring. Do you have the skills?
More and more companies are looking at data to increase productivity and
competitiveness. What are you waiting for?</li>
</ol>
<br />
<div class="MsoNormal">
<o:p></o:p></div>
<div style="text-align: left;">
<span style="font-size: large;"><b>
Machine Learning has arrived</b></span></div>
<div class="MsoNormal">
After all, no less an authority than Bill Gates recently
said, “When it comes to technology, there are four areas where I think a lot of
exciting things will happen in the coming decades: big data, machine learning,
genomics, and ubiquitous computing …” What is more, Bill Gates is also reported
to have declared that “a breakthrough in machine learning would be worth ten
Microsofts.”<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Even if you happen to think that the above stand is
debatable, there is no question that machine learning has arrived and is moving
center stage. Large computational capital is readily available, and simple
algorithms are waiting for someone to leverage them for handsome returns.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Numerous aspects of our life, both significant and trivial,
have become incredibly measurable. Several aspects are already finely measured
and stored. Privacy concerns aside, this might seem like a good thing till we
realize that most of this data lies in deep slumber. Unused, unsorted and
unedited, this data, rather than boosting the bottom line, kills productivity –
cluttering up storage space on devices and slowing down connections. Just as
most of us store more digital pictures than we can handle, companies frequently
find that – although a treasure trove of data is on hand – the very volume of
data makes it difficult to find what they need or glean actionable insight.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
But we mustn’t blame the data. We must celebrate it. Astute
business leaders have realized that tremendous insights exist in Big Data and
want to use it in order to leap ahead of their competitors. Businesses are
looking for expertise in machine learning to parse, reduce, simplify and
categorize data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Practitioners of this craft, sometimes called data
scientists, are not quite software engineers. Although they might be quite
competent at coding, their expertise lies in developing or adapting algorithms
that operate on data to reveal order that can lead to insight. Data scientists
can understand a data-centric problem, come up with a solvable formulation and
recommend a practical solution. They may provide working prototypes to
demonstrate the solution, and even sometimes provide segments of final code.
They work closely with others more grounded in software engineering whose job
is to translate their work, optimize it exquisitely and plug it into the larger
framework of efficient runtime code implemented in the final product.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="text-align: left;">
<span style="font-size: large;"><b>
You are already a Data Scientist</b></span></div>
<div class="MsoNormal">
Lest we forget, the ability to tame Big Data is the ultimate
hallmark of intelligent beings. Your brain runs the best machine learning
algorithms known. It absorbs massive data sets every moment of every day. The
processing and directing of thought and action that you so effortlessly
accomplish is amazing. Even more so once you consider that no neuron in your
brain fires faster than 1 kHz, the speed of a circa 1980 PC. Of course, your
brain’s biological computing system does not work the way a computer does.
Rather than laboriously finding and opening large files loaded with complex
information, your brain calls up millions of individually stored simple data
elements, processes them in parallel in multiple locations, sometimes breaks
rules leading to profoundly creative solutions and generally causes an
emergence of sensible outcomes. This happened even before you spoke your first
word, and now happens even when you sleep.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
So, it looks like we already know machine learning without
knowing that we know it. Many practitioners that I know report that the process
of learning machine learning is accompanied by fleeting feelings of
understanding how we think. This seems natural considering that machine
learning is all about the problem of finding essential and distinguishing
properties of a category, a quest that we are constantly engaged in.</div>
<div class="MsoNormal">
<br /></div>
<div style="text-align: left;">
<span style="font-size: large;"><b>
Just good for fun and games?</b></span></div>
<div class="MsoNormal">
Let me end this blog by taking another question that I am
sometimes asked, “Is Big Data only useful for superficial stuff like
recommending good movies to watch?”<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
My answer is that machine learning is used in several
efforts to improve the quality of security, public health and safety. Methods
for the automatic detection of events and other patterns in huge data sets find
natural applications in areas such as flagging shipping containers likely to
hold banned goods, timely alarms of disease outbreaks, warning of possible but
yet-to-be-committed crimes, predicting dangerous sinkholes using satellite
data, improved methods of assessing a patient’s risk of developing diabetes,
proactively abating traffic congestion, early detection and containment of
forest fires before they become too big, the list goes on ...<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
To be sure, Big Data and machine learning will be used to
earn big bucks for corporations. But it is being, and can be, put to more noble
uses in the wider world.</div>
</div>
<h2 style="text-align: left;"><span style="font-family: inherit; font-size: x-large;">Prudential Harnesses the Wisdom of Crowds</span></h2>
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: inherit;"><br /></span>
<br />
<div class="MsoNormal">
<span style="font-family: inherit; font-size: large; line-height: 115%;">“Statistics
is boring! I can prove it to you!”</span><span style="font-family: inherit;"><span style="line-height: 115%;"> I sometimes declare in </span></span><a href="http://www.numericinsight.com/Services.html#Training" style="font-family: inherit; line-height: 115%;" target="_blank">statistics classes that I teach</a><span style="font-family: inherit;"><span style="line-height: 115%;">. I do that in order to make the class lively. As proof, I ask the
class to name a famous physicist (Einstein's name comes up every time), a famous biologist
(Darwin's name comes up every time), and so on. I finally, ask them to name a famous
statistician. This is usually greeted with silence, laughter, and the
inevitable smart aleck response, “Shashi.” Statistics has no </span></span><span style="line-height: 18px;">celebrities</span><span style="font-family: inherit;"><span style="line-height: 115%;"> to boast of? It must be boring! Is it really? I proceed to challenge the class
with “<i>I bet you </i></span></span><i style="font-family: inherit; line-height: 115%;">definitely </i><span style="font-family: inherit;"><span style="line-height: 115%;"><i>know the name of a famous and pioneering
statistician.</i>” This is indeed true, but the class has to wait till the end of the
day to know who it is.<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">I often try
to make up simple experiments to help the class realize that statistics is
merely common sense quantified. Sadly, this is not the message most people take
away from statistics classes or training that they might have endured. So, I
was quite struck by the uniqueness of the </span><a href="https://youtu.be/jdVop4AQqIk" style="line-height: 115%;" target="_blank">Prudential commercial demonstrating statistics in an interactive way</a><span style="line-height: 115%;">. In this commercial, filmed at a park in
Austin, Texas, a group of 400 people were asked the question “<i>How old is the
oldest person </i></span><span style="line-height: 18px;"><i>you've</i></span><span style="line-height: 115%;"><i> ever known?</i>” Each person was given a blue dot and asked to
stick it on a 1,100-square-foot wall, lined up with the age of that person. The
results, and the way the results develop into a neat histogram, are striking.</span></span><br />
<span style="font-family: inherit;"><span style="line-height: 115%;"><br /></span></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibhU37_WC2mYDfxWek4zQld_IBa6KYu3KdHvIDTMz9770CihPCxq4qRxxgkc0WvER_O3brSzLwZP-PjAt0n8hPVTY_UammnnHY-Toyu650z04VyhI7oBM7BTqvzg4sQV-at7njR1l0yko/s1600/Prudential_Blue_Stickers_Experiment.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="color: black; font-family: inherit;"><img border="0" height="77" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibhU37_WC2mYDfxWek4zQld_IBa6KYu3KdHvIDTMz9770CihPCxq4qRxxgkc0WvER_O3brSzLwZP-PjAt0n8hPVTY_UammnnHY-Toyu650z04VyhI7oBM7BTqvzg4sQV-at7njR1l0yko/s400/Prudential_Blue_Stickers_Experiment.png" width="400" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: inherit; font-size: small;">A participant in Prudential's commercial adds her sticker.</span><br />
<span style="font-family: inherit; font-size: small;">The final array of stickers is on the right.</span></td></tr>
</tbody></table>
<div class="MsoNormal" style="text-align: left;">
</div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span>
<span style="font-family: inherit; line-height: 115%;">The visual effect
is spectacular because the blue dots organically pile up into a mountain well
to the right of a reference line showing the retirement age of 65. Prudential prudently
realized that the public cannot be expected to show sustained interest in
static graphs or percentages (even when rendered in arresting, moving colors
and fonts) attempting to communicate details of life expectancy and the cost of
retirement. In a refreshing move, they harnessed the wisdom of the crowds to
create a compelling visual for them.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;">Compelling,
it is. But the thought that is so compellingly communicated – “Oh my gosh! Just
look at this huge number of very very old people! If they retired at 65, what
in the world are they going to live on in their 90s and beyond? Now, I better
do something about this for myself!” – is quite far removed from the
statistical facts that happen to be actually relevant to retirement and ageing.</span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">Besides, there is a bit of a problem with the question </span><span style="line-height: 18px;">“<i>How old is the oldest person </i></span><span style="line-height: 18px;"><i>you've</i></span><span style="line-height: 18px;"><i> ever known?</i>” This was revealed when I attempted to duplicate the experiment using my friends as volunteers: Given more time to think and more opportunity for family members to jog our memory, we often find that we have known someone older than what we first thought. And, as it often happened in my experiment, we can recall knowing someone <i>really </i>old without knowing their exact age. A statistician's job is to prevent such problems by helping develop a solid protocol for the experiment.</span></span>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">But never
mind. I </span><span style="line-height: 18px;">don't</span><span style="line-height: 115%;"> want this blog to turn out to be a critique of Prudential’s approach
to selling retirement planning products. On the contrary, I very much
appreciate Prudential’s effort in bringing complex statistical concepts to the
masses. In fact, a number of interesting statistical and computational ideas
can be illustrated by taking Prudential’s experiment as a starting point. Let
us explore.</span><span style="line-height: 115%;"><o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<span style="font-family: inherit; line-height: 115%;">First, let us try to quantitatively describe the shape of the mountain of blue stickers. Since it is suggestive of the famous
bell curve, let us fit such a curve to approximate the tops of the stacks of
blue stickers. The blue sticker experiment’s design – and the resulting images
– actually makes it easy for us to fit a bell curve. The steps are easy: (1) Scan
each column of pixels in the image from top to bottom and place a red mark at the
first blue pixel encountered. (2) Next, allow a curve fitting algorithm (quite
routinely used in scientific programming) to find the bell curve that best
follows the red marks. The figure below shows the steps and the result of doing this
exercise involving simple image processing and curve fitting operations.</span><br />
<span style="font-family: inherit;"><span style="line-height: 115%;"><br /></span>
</span><br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNwAAfdJQhP-2KLxDbnRHC3A87RhpDk0RgwItErwM8CevKsDOw3XxTIaq740TFwbshpbq_UaVXR4iYutBMQvKwEPQs0cyPBVzLGun1iU4SmxMHSPl78I3zd3XHp_0ekDDkjcjLXerlOuE/s1600/Mathematical_model_of_histogram_of_blue_stickers.gif" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="color: black; font-family: inherit;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNwAAfdJQhP-2KLxDbnRHC3A87RhpDk0RgwItErwM8CevKsDOw3XxTIaq740TFwbshpbq_UaVXR4iYutBMQvKwEPQs0cyPBVzLGun1iU4SmxMHSPl78I3zd3XHp_0ekDDkjcjLXerlOuE/s1600/Mathematical_model_of_histogram_of_blue_stickers.gif" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: inherit; font-size: small;">Fitting a bell curve to the histogram of blue stickers</span></td></tr>
</tbody></table>
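The two steps above can be sketched in code. This is a minimal illustration, not the actual processing behind the figure: the image is assumed to arrive as a boolean array <code>is_blue</code> marking blue pixels, and SciPy's generic <code>curve_fit</code> stands in for whatever fitting routine was really used.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    """Bell curve: peak height a, center mu, width sigma."""
    return a * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fit_sticker_outline(is_blue):
    """is_blue: 2-D boolean array (rows x cols), True where a pixel is blue.
    Step 1: scan each column top to bottom for the first blue pixel.
    Step 2: fit a bell curve to the resulting outline of stack heights."""
    n_rows, n_cols = is_blue.shape
    cols, heights = [], []
    for c in range(n_cols):
        rows = np.flatnonzero(is_blue[:, c])
        if rows.size:                          # column has at least one sticker
            cols.append(c)
            heights.append(n_rows - rows[0])   # stack height, in pixels
    cols = np.array(cols)
    heights = np.array(heights, dtype=float)
    # Rough initial guess: tallest stack, its column, and an arbitrary width.
    p0 = [heights.max(), cols[heights.argmax()], 20.0]
    params, _ = curve_fit(gaussian, cols, heights, p0=p0)
    return params  # (a, mu, sigma)
```

The "red marks" of the figure are exactly the `(cols, heights)` pairs collected in step 1.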
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">Next we ask
questions. “<i>Why does the mountain of blue stickers have the appearance that
it does? Why does it look orderly even though the participants made no plan to create that order? Why does it have a peak around 90 years?” </i></span><span style="line-height: 115%;">We can answer these questions by doing our own blue sticker experiment, this time virtually (and at near-zero cost!).</span></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;">Let us start with the <a href="https://www.census.gov/population/age/data/2011comp.html" target="_blank">2011 US Census</a> age-distribution data, available on the Census Bureau's website. From this data, we can apply curve fitting again to build an approximate mathematical model of the age distribution, shown as the blue curve that follows the red data points.</span></div>
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6_aPRf_nb6_ADiOB_bRpnV6wYbRxbl81GV8JJ5bq_7bDDt8Xn2cu5ittL1mkSDp_GH5tskCiOSwn21Jcrf-j3KrLy-S_rno6gsRf6rfOSJHXouEtXuNHpleszGotEFH0fJ5hTa_n0jo4/s1600/2011_US_Census_Age_Distribution_and_Model.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="color: black; font-family: inherit;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6_aPRf_nb6_ADiOB_bRpnV6wYbRxbl81GV8JJ5bq_7bDDt8Xn2cu5ittL1mkSDp_GH5tskCiOSwn21Jcrf-j3KrLy-S_rno6gsRf6rfOSJHXouEtXuNHpleszGotEFH0fJ5hTa_n0jo4/s1600/2011_US_Census_Age_Distribution_and_Model.png" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: inherit; font-size: small;">Age distribution data and its mathematical model</span></td></tr>
</tbody></table>
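As a sketch of what "building a mathematical model" from tabulated data can look like, here is a toy curve fit. The age brackets and counts below are invented placeholders, not actual 2011 Census figures, and the choice of a degree-4 polynomial is arbitrary; the real model behind the figure above may well use a different functional form.

```python
import numpy as np

# Hypothetical census-style data: population (millions) per age bracket.
# Real figures would come from the 2011 US Census tables.
ages   = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
counts = np.array([4.0, 4.1, 4.3, 4.1, 4.2, 4.4, 3.6, 2.4, 1.2, 0.4, 0.05])

# Fit a smooth low-degree polynomial as the "mathematical model" of the
# distribution; degree 4 is an illustrative choice, not a recommendation.
coeffs = np.polyfit(ages, counts, deg=4)
model = np.poly1d(coeffs)

# The model can now be evaluated at any age, not just the tabulated ones.
print(round(float(model(65)), 2))
```

Once the model exists, the tabulated brackets are no longer needed: every later step works off the fitted curve alone.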
<span style="font-family: inherit;"><br /></span>
<br />
<div class="MsoNormal">
<span style="font-family: inherit; line-height: 115%;">Having a
mathematical model allows us to perform all manner of useful thought experiments,
or simulations<span style="line-height: 115%;"> – or indulge
our fancy.</span> When done carefully, these experiments give rise to insight,
help make predictions and often provide easy answers to real world questions. For
example, from the mathematical model of the age distribution, we can generate a random group of virtual people and be confident that the proportion of retirees (i.e., those 65 and older) is about what we would expect in an actual sample of the same size (roughly 12%). We
can then create an easily understandable picture of the age distribution of
that group.</span><br />
<span style="font-family: inherit; line-height: 115%;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikXI3-Rts99utDO5UT2LqprKn8HpEWnkJGCKi2MNUL25D7QlY8gjnBxbCzngnFIQE-puMYWmrL4dAERiwsPD0NK-Ohp3j4_UJh5_4eeE_c7jKbp1Q6jrTnOT0u_sVvFRgdXUE59VyRMAQ/s1600/Sample_Drawn_From_Age_Distribution_Model.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="color: black; font-family: inherit;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikXI3-Rts99utDO5UT2LqprKn8HpEWnkJGCKi2MNUL25D7QlY8gjnBxbCzngnFIQE-puMYWmrL4dAERiwsPD0NK-Ohp3j4_UJh5_4eeE_c7jKbp1Q6jrTnOT0u_sVvFRgdXUE59VyRMAQ/s1600/Sample_Drawn_From_Age_Distribution_Model.png" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: inherit; font-size: small;">A random group of virtual people. Those aged 65 and older are colored orange.</span></td></tr>
</tbody></table>
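Drawing a group like the one pictured is a one-liner once the model is expressed as a probability distribution over ages. The weights below are invented (flat through age 55, then an exponential taper), tuned only so that roughly 12% of draws land at 65 or older; they are a stand-in for the fitted census model, not derived from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented age-distribution weights for ages 0-100: flat through middle
# age, tapering exponentially afterwards. A real run would take these
# weights from the fitted census model instead.
ages = np.arange(101)
weights = np.where(ages <= 55, 1.0, np.exp(-(ages - 55) / 18.0))
probs = weights / weights.sum()

# Draw a random group of virtual people from the model distribution.
group = rng.choice(ages, size=1000, p=probs)

retiree_share = np.mean(group >= 65)  # fraction aged 65 and older
print(f"{retiree_share:.1%} of the virtual group are 65 or older")
```

Each element of `group` is one virtual person; coloring those with age 65 or more orange reproduces the picture above.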
<div class="MsoNormal">
<span style="line-height: 115%;"><span style="font-family: inherit;"><br /></span></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">If I were a
participant in Prudential’s experiment and if the people depicted in this
picture were the ones (and only ones) ever known to me, I would go up to the
wall and place my sticker above the 92 mark. </span><i><span style="line-height: 115%;">How old is the oldest person </span><span style="line-height: 18px;">you've</span><span style="line-height: 115%;"> ever known?</span></i><span style="line-height: 115%;"> 92.<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="line-height: 115%;"><span style="font-family: inherit;"><br /></span></span></div>
<span style="font-family: inherit;"><span style="line-height: 115%;">Since we have a mathematical model, we can
repeat this process of </span><span style="line-height: 18px;">identifying the oldest individual as often as we like, each time </span><span style="line-height: 115%;">using a different virtual group. Each repetition of the process
corresponds to one participant in Prudential’s experiment mentally going over everyone he or she has ever known and naming the oldest. The result of repeating the process a large number of
times – each time placing a virtual sticker on a virtual wall – is shown in the
figure below.</span></span></div>
<div class="MsoNormal">
<span style="line-height: 115%;"><span style="font-family: inherit;"><br /></span></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOeP_5a4F7MUoH4MDr94nQxk-d5ApX69o0v9S5jrkaFkRMbHCmJ-UkB5Xt0T__VvpdvrHfHF4KxlLcYJKFvt3iaZSVwX0TSYsbRzF-c_ohk7pVZBbucTVlEehjpAGfhE3QgVu4bcycV5w/s1600/Histogram_of_simulated_blue_stickers_and_model.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="color: black; font-family: inherit;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOeP_5a4F7MUoH4MDr94nQxk-d5ApX69o0v9S5jrkaFkRMbHCmJ-UkB5Xt0T__VvpdvrHfHF4KxlLcYJKFvt3iaZSVwX0TSYsbRzF-c_ohk7pVZBbucTVlEehjpAGfhE3QgVu4bcycV5w/s1600/Histogram_of_simulated_blue_stickers_and_model.png" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: inherit; font-size: small;">Virtual blue stickers generated by running a simulation of Prudential's experiment.<br />The histogram and the bell curve fit closely follow the result from the live experiment.</span></td></tr>
</tbody></table>
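The repeated max-taking can be simulated directly. Again, the age weights and the circle size of 250 acquaintances are invented assumptions, not figures from the live experiment; the steeper old-age taper here stands in for accelerating mortality, and with these choices the virtual stickers cluster in the high 80s, broadly echoing the peak near 90 in the figure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented model distribution: flat through age 55, then a steep
# exponential taper standing in for old-age mortality.
ages = np.arange(101)
weights = np.where(ages <= 55, 1.0, np.exp(-(ages - 55) / 9.0))
probs = weights / weights.sum()

def oldest_person_known(circle_size=250):
    """One virtual participant: draw everyone they have ever known from
    the model and report the oldest age in that circle (one sticker)."""
    return rng.choice(ages, size=circle_size, p=probs).max()

# 500 participants, each placing one virtual blue sticker on the wall.
stickers = np.array([oldest_person_known() for _ in range(500)])
print("median sticker position:", int(np.median(stickers)))
```

A histogram of `stickers` is the virtual wall: the distribution of the maximum of a sample, which is why its peak sits far to the right of the population's own peak.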
<div class="MsoNormal">
<div class="MsoNormal">
<span style="line-height: 115%;"><span style="font-family: inherit;"><br /></span></span></div>
<div class="MsoNormal">
<span style="line-height: 115%;"><span style="font-family: inherit;">Even though
the experiment involves generating random ages, we see that the pattern of stickers is
not quite random. It follows what seems to be the familiar bell curve again
(shown as a blue curve). What is more, it even peaks at around 90 years. The
agreement between the actual (shown as a black dashed curve) and virtual
versions is quite close. In aggregate, a predictable pattern emerges from randomness: the accumulation of many independent answers ensures that the stickers fall into an orderly and familiar histogram rather than being scattered all over the place.</span></span></div>
<div class="MsoNormal">
<span style="line-height: 115%;"><span style="font-family: inherit;"><br /></span></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">So, that’s
it for now. I hope we see more such creative attempts to infuse freshness into
commercials. Everyone I know enjoyed this one. I am sure there were statisticians behind the </span><span style="line-height: 18px;">scenes</span><span style="line-height: 115%;"> who worked hard (well before the experiment was </span><span style="line-height: 18px;">conducted</span><span style="line-height: 115%;"> live) to make sure it did not turn out to be a public flop. There is far more
material than can be covered in this post. </span><i style="line-height: 115%;">Does it really follow a bell curve? Does
it matter? What is the distribution that is actually relevant to funding
retirement plans?</i></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;"><br /></span></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><span style="line-height: 115%;">As always, I will enjoy talking to you about this or any other numerical and insightful topics.<o:p></o:p></span></span></div>
</div>
</div>
Shashi Sathyanarayana (http://www.blogger.com/profile/13278009180608463800)