Monday, August 24, 2015

The FFT is Fifty

Their 1965 paper starts with the unassuming words "An efficient method for the calculation of the interactions of a 2^m factorial experiment was introduced by Yates and is widely known by his name." The paper does not use the word "fast" even once. But the algorithm described in that paper written by James Cooley and John Tukey[1] came to be known as the Fast Fourier Transform (FFT), and turned out to be just what the world was waiting for. Overnight, in universities and laboratories around the world, scientists and engineers began writing code and building hardware to implement the FFT. Fifty years later, versions of that algorithm are routinely used, often tacitly assumed to be available to programmers. Any computation that requires (either as an end, or as an intermediate step) the unraveling of the frequency content of information carrying signals benefits from the enormous speedup offered by the FFT. Other than the speedup, the FFT does not offer any computational result that was not available before its discovery. However, were it not for this amazing algorithm, it would be practically impossible to filter cell-phone signals, to compress movie files, do spectroscopy, take magnetic resonance imaging scans, perform quantum computing, and solve differential equations, among other things. The Cooley-Tukey paper created a surge in products that suddenly became possible, and often even became practical to operate in real-time.
Although the idea behind FFT can be explained without using complex math (it involves complex numbers, so I guess that does make the math complex), no attempt to do so will be made here. Suffice it to say that the algorithm belongs to a class of mathematical calculations that can either be performed in a straightforward manner, or can be solved by dividing the calculations into smaller steps. Cooley and Tukey showed how the dividing could be done in a manner such that the total effort needed to compute the simpler steps is much less than the effort that would be needed to tackle the computation without such division. There is a wealth of deep mathematics underlying the FFT algorithm. But let us remember that John Tukey, ever the pragmatic engineer, is said to have declared “I wouldn’t want to fly in a plane whose design depended on whether a function was Riemann or Lebesgue integrable.”
For signals with a million samples, the gain in speed obtained by virtue of the magic of FFT can be upwards of 50,000. To appreciate this speedup ratio, consider some calculation that takes 1 second to complete on your computer. To most of us, this would seem sluggish. In today's world we expect our devices to be instantly responsive, but we might tolerate such slowness once in a while, say when we are applying some fancy effects to produce an irresistible cat video. Now, imagine that same computation being 50,000 times slower i.e. taking almost 14 hours to execute.
For decades, the FFT algorithm and its direct variants have ruled the world of signal processing. It is only recently that researchers[2]  at MIT came up with an algorithm which takes advantage of a special property of real-life signals (sparsity) to achieve significantly faster performance. Although the speedup achieved (relative to the FFT, not relative to the straightforward computation) is not as spectacular, it is still useful in many practical applications. For example, it has the potential to reduce the time patients spend inside a MRI machine from an agonizing one hour to a bearable 20 minutes. There are many other applications where increased efficiency of computing the spectrum leads to other benefits such as longer battery life.
My introduction to the FFT came via a classic textbook, Athanasios Papoulis's Signal Analysis. In graduate school, I remember scribbling pages after pages of formulas as I worked to absorb the concept. When it all fell into place, the feeling was as exhilarating as though I had personally come up with the algorithm myself. Wonderful applications suggested themselves, and it seemed that the world could be saved by my new found understanding of the FFT.
I was to be humbled because I was soon introduced to an application made possible by the FFT. Transmultiplexers are computational devices that allow a single wire (or radio channel) to simultaneously transmit thousands of speech signals in a way that allows them to be separated by individual receivers. Remarkably, FFT figures into the calculations needed to pull off this feat. Unlike common applications I had encountered at that time where the input to the FFT is a sequence of samples of a single signal as it is generated in time, in the transmultiplexer calculations, the input to the FFT at each time step is the sequence formed by gathering samples of several independent speech signals at a given instant of time. It amazed me that you could, in a manner of speaking, meaningfully combine and take apart independent words uttered by a crowd at a given time instant. Now this is one application I could not have come up with.
I know that many engineers and scientists have their own personal story of their first encounter with the FFT. Well, we are nerds are we not? Can we share?
[1] Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Mathematics of Computation, 19: 297–301.
[2] Haitham Hassanieh, Piotr Indyk, Dina Katabi, Eric Price. (2012). "Simple and Practical Algorithm for Sparse Fourier Transform". ACM-SIAM Symposium on Discrete Algorithms, Kyoto Japan

Tuesday, June 2, 2015

What is different about Big Data?

What is different about Big Data? Other than that it is, er, big?

This is a question that I am frequently asked, most often by people who are plainly skeptical about Big Data simply because it is hugely oversold these days. At the recent Annual Signal and Image Sciences Workshop (more about this later in this blog), one of those skeptics remarked to me that there is a certain sameness to all the articles on Big Data found on the Internet. The first, and often the only, takeaway you get out of those articles is that Big Data is big. Also, the authors of those articles readily give in to the temptation to create clever phrases using the word “big.” The Big Data & Analytics Special Interest Group was inaugurated with a great presentation titled Big Lies and Small Truths. In many articles, the apparently big benefits are touted, mixed in with cautionary mention of some other things Big: Infrastructure, Expense, Dangers, Fatigue and Brother.

So, what is really different about Big Data? To me, the most qualitatively distinguishing aspect of the data in Big Data is this: It has quietly come into existence, with no one in particular having intentionally created it. Where in the past, data creation was a purposeful and protocol driven activity, today vast amounts of data have just come into existence by virtue of the fact that all of us choose to live and conduct business digitally. We do not stop to wipe off fingerprints, we let the soda dribble, we do not sweep away breadcrumbs. It is this nature of its creation, the largely unintentional, evolving, uncurated, uncontrolled and imperfect nature of Big Data that distinguishes it from the more familiar data. The bigness is not the essential feature, even if bigness is inevitable. This type of data quietly accumulated for years till, quite suddenly, we woke up to the Big Possibilities.

However, Big Problems arise due to the fact that the data comprising Big Data does not originate from projects specifically designed to facilitate statistical inference and prediction. This brings us back to the event of May 13, the Annual Signal and Image Sciences Workshop hosted and sponsored by Lawrence Livermore National Laboratories. Unlike most forums in which we hear about problems and innovative solutions that arise due to the bigness of the data, this workshop featured several talks highlighting its more fundamental aspects.

The difficulties that must be resolved before truly significant insights can be obtained from Big Data are many. These require an understanding of the fundamental limits to information extraction and the mathematical trade-offs involved in creating algorithms. The workshop offered a chance to hear about the relevance of kernel PCA, spectral clustering, Johnson-Lindenstrauss lemma, and the Chinese restaurant process. The last has nothing to do with General Tso’s Chicken, which delicacy GrubHub’s elementary Big Data analysis tells us is the most popular Chinese dish in the USA.

Is there anything else that makes Big Data fundamentally different? The answer is yes. Compared to data sets studied traditionally, the data sets in most Big Data scenarios are of high dimension. And also high, of course, is the number of high dimensional observations available in the data set. Dimension, as in 2D and 3D, is a familiar concept. But with a little mental stretching, we can understand the dimension of a data set as the number of variables that are measured for each observation.

High dimensionality is troublesome even in scenarios with moderate sized data sets. When there are too many directions, which are what dimensions are, we don’t know which way to look. For instance, take a look at the Communities and Crime Data Set available in the Machine Learning Repository maintained by the University of California at Irvine. This is a 122-dimensional data set of size 1994. And this doesn’t even qualify as Big Data.

If you need any convincing that data sets can look meaningless except when looked at in just the right way, take a look at the 3D data set I specially prepared for this blog. Each data point in the set is a 3D point fixed in a cubical volume of space. One possible way to present such a data set on a 2D screen is this: imagine placing the collection of points on a slowly spinning turntable and simply watch till the points align in a way that makes sense.
The complexity in real-life high dimensional data hides the meaning really deep within the data. You (or more correctly, your algorithm) would not only need to figure out exactly from what viewpoint to look, but would also need to design funny mirrors to undo nonlinear distortions in the data.

Funny things happen in high dimensional spaces. For example, it is quite rare for two random lines drawn on a page (2D) to turn out to be perpendicular to each other. However, in the high dimensional space of many Big Data sets it is practically a given that any two random vectors will be almost perpendicular to each other. In dealing with data sets of high dimensionality, the data scientist is forced (actually the good data scientist is delighted) to think about issues like noise accumulation, spurious correlations, incidental homogeneity and algorithmic instability which are rare considerations in working with traditional data. These are important issues because at the end of the day the data scientist is expected to help arrive at decisions from the data, or at very least help decision makers better understand high dimensional data and make informed decisions.

“Solving a problem simply means representing it so that the solution is obvious,” Nobel laureate Herbert Simon said it best. Large-scale data visualization is a growing research topic because high dimensional data is growing while our computer screens are not. Ideally, effective visualization algorithms highlight patterns and structure, and remove distracting clutter. Since human perception is limited to three-dimensional space, the challenge of visualizing high dimensional data is to optimally reduce the data and present it in a manner comprehensible to the user while preserving as much relevant information as possible.

Unfortunately, having so many dimensions (and the proliferation of tools being created by to play with opens up almost limitless ways to look at data. Anyone who wants to find something will find something. On the other hand, it is rarely lack of data that has resulted in bad consequences or policies. Domain expertise remains valuable despite all the data readily available. Astute practitioners place great importance on beginning with posing good questions rather than diving into the data headfirst. The questions shift the focus back to the purpose of utilizing big data – or what should be the purpose – gaining tangible awareness and understanding of real-world behaviors or conditions that lead to an enrichment of society.

[This post first appeared on the website of the IEEE Consultants Network of Silicon Valley.]