## Why is this blog being written?

[*Update: A more detailed version of this article with a coding outline is available. Please read the note at the end of this article.*]

Neural networks have fascinated me ever since I became aware of them in the 1990s. They are often represented with a hypnotizing array of connections. In the last decade, deep neural networks have dominated pattern recognition, often replacing other algorithms for applications like computer vision and voice recognition. At least in specialized tasks, they indeed come close to mimicking the miraculous feats of cognition our brains are capable of.

While neural networks are capable of such feats, the very discovery of a method of programming such a computational device seems to me to be a miraculous feat of cognition worthy of celebration. My purpose in writing this blog is to share my perspective on an amazing algorithm made widely known by a 1986 publication in *Nature*.

I will assume that the reader has some understanding of the meaning and purpose of the various elements of a neural network such as the one shown in Figure 1. I have provided a little bit of background below. Other than that, this blog should be an easy read for those with some familiarity with basic calculus.

Figure 1: An example of a multi-layered neural network that can be used to associate an input consisting of 10 numbers with one of 4 decisions or predictions.

## What is so difficult about designing a neural network?

To appreciate the difficulty involved in designing a neural network, consider this: The neural network shown in Figure 1 can be used to associate an input consisting of 10 numbers with one of 4 decisions or predictions. For example, the neural network shown may be used by a bank to determine if credit should be extended to a customer. In this case, the 10 input numbers represent various parameters relevant to an individual's financial responsibility such as balance in savings accounts, outstanding loan amounts, number of years of employment, and so on. The neural network takes in these 10 numbers, performs calculations and produces an output consisting of 4 numbers. Depending on the position at which the maximum of these 4 numbers appears in the output, the prediction could be one of the following:

- Excellent creditworthiness with high spending limit
- Average creditworthiness with moderate spending limit
- Low creditworthiness with low spending limit
- High default risk.

A neural network such as the one shown in Figure 1 can perform this miraculous feat of cognition only if it is specifically trained to do so. For the network to work correctly, the weight at each of its $(10\times 4)+(4\times 6)+(6\times 8)+(8\times 4) = 144$ connections has to be carefully chosen such that the network classifies every input drawn from a known training set with a high degree of accuracy. This is a classic application of the *supervised learning* paradigm in machine learning.

There are, of course, no formulas in existence to directly set the values of the 144 weights. The only recourse is to start with some initial values for the 144 weights, check how good the resulting neural network is, and repeatedly refine the weights to progressively make the network more and more accurate. So, what is needed is a method of refining the weights.
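As a quick check of the arithmetic, the weight count follows directly from the layer widths. The sketch below assumes the widths 10, 4, 6, 8 and 4 implied by the count above; the variable names are only illustrative.

```python
# Layer widths assumed from the weight count above: 10 inputs, then 4, 6, 8 and 4 units.
layer_sizes = [10, 4, 6, 8, 4]

# Adjacent layers are fully connected, so each pair contributes fan_in * fan_out weights.
weights_per_layer = [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_weights = sum(weights_per_layer)

print(weights_per_layer, total_weights)
```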

If one were to think of the accuracy as some grand function of the weights, it makes sense to refine each weight by changing it by an amount proportional to the partial derivative of that grand function with respect to that weight. Why bring in partial derivatives? Because they predict, to first order, how the accuracy responds to small changes in the weights. In fact, for a sufficiently small step, a refinement guided by the partial derivatives yields a larger gain in accuracy than a step of the same size in any other direction. The method of steepest descent does exactly what is suggested here, with the minor, but entirely equivalent, goal of seeking to progressively decrease the error, rather than increase the accuracy.

To keep things straight, let me list the concepts I have described thus far:

- The weights of the neural network must be set such that the error on a known training set is minimized. Ideally, the network must yield zero error on a large and representative training data set.
- For any given setting of weights, we can attempt to refine them by changing each weight by an amount proportional to the partial derivative of the error with respect to that weight. The partial derivatives themselves change after any such refinement, and must be recomputed.
- We can keep on refining the weights in this manner, and stop refining when the error is zero. In real life, we call it done when the error is low enough. Or refuses to fall anymore. Or we run out of time after a few million rounds of refinements.
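The three points above can be sketched as a short loop. This is only an illustration under stated assumptions: the two-weight `error` function is a made-up stand-in for the real training error, the partial derivatives are estimated by finite differences rather than by backpropagation, and `rate`, `tolerance` and `max_rounds` are arbitrary choices.

```python
def error(w):
    # Hypothetical "grand function" of two weights; its minimum (zero error) is at w = [3, -1].
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def partial_derivatives(f, w, h=1e-6):
    # Central-difference estimate of each partial derivative of f at w.
    grads = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += h
        down[i] -= h
        grads.append((f(up) - f(down)) / (2.0 * h))
    return grads

def steepest_descent(f, w, rate=0.1, tolerance=1e-10, max_rounds=10000):
    for _ in range(max_rounds):
        if f(w) < tolerance:
            break  # the error is "low enough", so we call it done
        g = partial_derivatives(f, w)  # recomputed after every refinement
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w

w_final = steepest_descent(error, [0.0, 0.0])
print(w_final, error(w_final))
```

Estimating the derivatives this way costs two error evaluations per weight per round, which is exactly the expense that backpropagation avoids.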

## Backpropagation

In fact, this error minimization problem that must be solved to train a neural network eluded a practical solution for decades until D. E. Rumelhart, G. E. Hinton, and R. J. Williams (drawing inspiration from other researchers) demonstrated a technique, which they called *backpropagation*, and made it widely known (*Nature* 323, 533-536, 9 October 1986). It is essentially by building upon their method that others today have ventured to program neural networks with 60 *million* weights, with astounding results.

According to Bernard Widrow, now Professor Emeritus at Stanford University and one of the pioneers of neural networks, "The basic concepts of backpropagation are easily grasped. Unfortunately, these simple ideas are often obscured by relatively intricate notation, so formal derivations of the backpropagation rule are often tedious." This is indeed unfortunate because the backpropagation rule is one of the most elegant applications of calculus that I have known.

Once you appreciate the fact that, in order to train a neural network, you need to somehow calculate the partial derivatives of the error with respect to weights, backpropagation can be easily and qualitatively derived by reducing it to three core concepts. It also helps immensely to keep the notation intuitive and easy to connect to the concept being symbolized.

Figure 2: The derivation of the backpropagation algorithm is simplified by adding an extra computational block to calculate the error and also by boxing parts of the network.

### 1. Boxing

Since training the neural network is all about minimizing the training error, the first step in the derivation involves tacking on an extra computational block to calculate the error between the actual output $\left\{o_1,o_2,o_3,o_4\right\}$ and a known target $\{t_1,t_2,t_3,t_4\}$. This is shown as a triangular block in Figure 2. For now, let us think of the output and the target as known and fixed entities. Although we need not concern ourselves with the exact formula to compute the error, I offer the familiar sum-of-squares error as an example

\[e = (o_1-t_1)^2 + (o_2-t_2)^2 + (o_3-t_3)^2 + (o_4-t_4)^2\]

Next, we choose one of the layers (say Layer 3) and enclose that layer and all following layers (including the error calculating block) in a box, as shown in gray in Figure 2. Keep in mind that this is just one of several nested boxes we can construct in order to compartmentalize the network. Let us resolve not to worry about anything going on inside the box but simply think of the relationship between the input to the box and the output (i.e., the error) coming out of the box. We will call this box the *current box*, and call the input to this box the *current input*, $\{c_1,c_2,c_3,c_4,c_5,c_6\}$. It is important to recognize that the functional relationship between the current input to the box and the output emanating from the box is completely defined and can be computed. Let us denote this function as $E(c_1,c_2,c_3,c_4,c_5,c_6)$.

Figure 3: The box contains parts of the neural network hidden from view. It allows us to think about broad relationships among different parts of the network.

### 2. Sensitivity

As our journey through backpropagation continues, I gently request you to assume that the vector of partial derivatives $\frac{\partial E}{\partial c_1},\frac{\partial E}{\partial c_2}, \ldots$ of the function $E(c_1,c_2,c_3,c_4,c_5,c_6)$ is known. This might seem like a stretch. After all, we have set out to find a method to compute some other (equally unrealized) partial derivatives. But I assure you it will all work out in the end. To emphasize the crucial nature of this simple concept, it has been given a name: *Sensitivity*. Let us denote the sensitivity of our current box as $\{\delta c_1,\delta c_2,\delta c_3,\delta c_4,\delta c_5,\delta c_6\}$. With the help of Figure 3 to hold these concepts in our mind, we can concretely think about how the output of the current box responds to a small perturbation applied to any of its current inputs. For example, if the fourth component of the current input changes by the small amount $\Delta c_4$, we can expect the error at the output to change by $\Delta c_4 \delta c_4$. Further, in addition to the hypothetical change in component 4, if there is a simultaneous change of $\Delta c_6$ in component 6, we can expect the error at the output to change by an additional amount, making the total change $\Delta c_4 \delta c_4 + \Delta c_6 \delta c_6$. The effects of small simultaneous changes in the current input components simply add up at the output.
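The additive behaviour of small perturbations is easy to check numerically. In the sketch below, `E` is a hypothetical smooth function standing in for whatever the current box computes, and the sensitivity is estimated by finite differences; neither comes from the network in the figures.

```python
import math

def E(c):
    # Hypothetical stand-in for the function computed by the current box.
    return sum(math.tanh(ci) ** 2 for ci in c) + c[3] * c[5]

def sensitivity(f, c, h=1e-6):
    # Central-difference estimate of the partial derivatives of f at c.
    out = []
    for i in range(len(c)):
        up = list(c)
        down = list(c)
        up[i] += h
        down[i] -= h
        out.append((f(up) - f(down)) / (2.0 * h))
    return out

c = [0.2, -0.4, 0.1, 0.5, -0.3, 0.7]
delta_c = sensitivity(E, c)

# Perturb components 4 and 6 (list indices 3 and 5) at the same time.
dc4, dc6 = 1e-4, -2e-4
perturbed = list(c)
perturbed[3] += dc4
perturbed[5] += dc6

actual_change = E(perturbed) - E(c)
predicted_change = dc4 * delta_c[3] + dc6 * delta_c[5]
print(actual_change, predicted_change)
```

The two printed numbers should agree closely; the small gap between them comes from second-order terms that shrink as the perturbations get smaller.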

Knowing the sensitivity of the current box, what can we say about the sensitivity of the *preceding box*? Keep in mind that the preceding box encloses Layer 2 and all following layers, including the error calculating block. For our specific example, let us call the input to this box the *preceding input*, $\{p_1,p_2,p_3,p_4\}$. It follows quite logically that the sensitivity of the preceding box (which we will naturally denote as $\{\delta p_1,\delta p_2,\delta p_3,\delta p_4\}$) must be related to the sensitivity of the current box __and__ the extra neural network elements making up the difference between the two nested boxes. The extra elements are the very vital nonlinear activation function units, summing junctions and weights.

Figure 4(a) shows the current box and the extra elements that must be added to construct the preceding box. For clarity, all the elements not relevant to the calculation of the first component of sensitivity ($\delta p_1$) have been grayed out. Look closely at Figures 4(a) and 4(b) to understand how the sensitivity of the preceding box can easily be derived from first principles. Specifically, Figure 4(b) provides insight into how $\delta p_1(=\frac{\partial e}{\partial p_1})$ can be computed by allowing the input component $p_1$ to change by a small quantity $\Delta p_1$ and following the resulting changes in the network. Notes: (i) The notation $\mathcal{A}'\left(p_1\right)$ has been used for the derivative of the activation function evaluated at $p_1.$ (ii) For clarity, not all changes in signals have been explicitly labeled. Those that are not labeled can easily be determined since they all follow an obvious pattern.

The algorithm gets the name *backpropagation* because the sensitivities are propagated backwards as they are calculated in sequence. The textbook formula to express the sensitivity of the preceding layer in terms of the sensitivity of the current layer is easily seen to be

\[\delta p_i = \mathcal{A}'(p_i) \sum _{j} w_{i j}\delta c_j \]
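The formula can be checked against a direct numerical derivative. The sketch below makes several assumptions that are not in the article: the activation $\mathcal{A}$ is taken to be `tanh`, the weights and inputs are made-up numbers, and the current box is replaced by a simple stand-in, $E(c)=\sum_j c_j^2$, whose sensitivity is $2c_j$.

```python
import math

def activation(x):
    return math.tanh(x)  # assumed activation; the article leaves A unspecified

def activation_prime(x):
    return 1.0 - math.tanh(x) ** 2

def current_input(p, w):
    # c_j = sum_i w[i][j] * A(p_i): activation units, weights, then summing junctions.
    return [sum(w[i][j] * activation(p[i]) for i in range(len(p)))
            for j in range(len(w[0]))]

def preceding_sensitivity(p, w, delta_c):
    # delta_p_i = A'(p_i) * sum_j w[i][j] * delta_c_j
    return [activation_prime(p[i]) * sum(w[i][j] * delta_c[j] for j in range(len(delta_c)))
            for i in range(len(p))]

def box_error(c):
    # Stand-in for the current box: E(c) = sum of c_j squared.
    return sum(cj ** 2 for cj in c)

# Hypothetical numbers: 4 preceding inputs feeding 6 current inputs.
p = [0.3, -0.2, 0.8, 0.1]
w = [[0.1 * (i + 1) - 0.05 * j for j in range(6)] for i in range(4)]

c = current_input(p, w)
delta_c = [2.0 * cj for cj in c]
delta_p = preceding_sensitivity(p, w, delta_c)

# Finite-difference check on the first component, delta_p_1.
h = 1e-6
p_up = [p[0] + h] + p[1:]
p_down = [p[0] - h] + p[1:]
numeric = (box_error(current_input(p_up, w)) - box_error(current_input(p_down, w))) / (2 * h)
print(delta_p[0], numeric)
```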

A starting point is all we need to completely calculate all the sensitivity terms throughout the neural network. To do this, we consider the error computing block itself as the first box. For this box, the input is $\left\{o_1,o_2,o_3,o_4\right\},$ and the output is $e$ as given in the sum-of-squares error formula we have seen before. Simple calculus gives us the components of the sensitivity of the error computing block

\[\left\{ 2\left(o_1-t_1\right),\ 2\left(o_2-t_2\right),\ 2\left(o_3-t_3\right),\ 2\left(o_4-t_4\right) \right\}\]
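In code, the error block and its sensitivity are one line each. The output and target values below are made up purely for illustration.

```python
def sum_of_squares_error(o, t):
    # e = sum over k of (o_k - t_k)^2
    return sum((ok - tk) ** 2 for ok, tk in zip(o, t))

def error_block_sensitivity(o, t):
    # Partial derivative of e with respect to each output: 2 * (o_k - t_k)
    return [2.0 * (ok - tk) for ok, tk in zip(o, t)]

o = [0.7, 0.1, 0.3, -0.2]  # hypothetical network output
t = [1.0, 0.0, 0.0, 0.0]   # hypothetical target
print(sum_of_squares_error(o, t))
print(error_block_sensitivity(o, t))
```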

### 3. Weight updates

At this point, the last section writes itself. Following the same strategy outlined in the previous figure, look at Figure 5(a) and 5(b) to intuitively understand how the error changes in response to a small change in one of the weights, say $w_{1 1}$. Once again in these figures, details of connections not immediately relevant to this calculation have been grayed out. The much sought after partial derivative of error with respect to the specific weight $w_{1 1}$ is easily seen to be $\mathcal{A}\left(p_1\right) \delta c_1$. The textbook formula to compute the partial derivative of the error with respect to any weight is easily seen to be

\[\frac{\partial e}{\partial w_{i j}} = \mathcal{A}\left(p_i\right) \delta c_j \]
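The weight formula can be verified in the same way as the sensitivity formula. As before, this is a sketch under assumptions of my own choosing: `tanh` as the activation, made-up weights and inputs, and the stand-in box $E(c)=\sum_j c_j^2$ in place of the real Layer 3 box.

```python
import math

def current_input(p, w):
    # c_j = sum_i w[i][j] * A(p_i), with tanh assumed for A.
    return [sum(w[i][j] * math.tanh(p[i]) for i in range(len(p)))
            for j in range(len(w[0]))]

def box_error(c):
    # Stand-in for the current box: E(c) = sum of c_j squared.
    return sum(cj ** 2 for cj in c)

def weight_gradient(p, delta_c):
    # Partial derivative of the error with respect to w[i][j] is A(p_i) * delta_c_j.
    return [[math.tanh(pi) * dcj for dcj in delta_c] for pi in p]

p = [0.3, -0.2, 0.8, 0.1]
w = [[0.1 * (i + 1) - 0.05 * j for j in range(6)] for i in range(4)]
delta_c = [2.0 * cj for cj in current_input(p, w)]
grad = weight_gradient(p, delta_c)

# Finite-difference check for w_11 (list index [0][0]).
h = 1e-6
w_up = [row[:] for row in w]
w_down = [row[:] for row in w]
w_up[0][0] += h
w_down[0][0] -= h
numeric = (box_error(current_input(p, w_up)) - box_error(current_input(p, w_down))) / (2 * h)
print(grad[0][0], numeric)
```

Note that the whole table of partial derivatives for a layer is just an outer product of the activation outputs and the sensitivities, so it costs one multiplication per weight.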

In the most popular version of backpropagation, called *stochastic backpropagation*, the weights are initially set to small random values and the training set is randomly polled to pick out a single input-target pair. The input is passed through the network to compute internal signals (like $\mathcal{A}\left(p_1\right)$ and $\mathcal{A}'\left(p_1\right)$ shown in Figures 4 and 5) and the output vector. Once this is done, all the information needed to initiate backpropagation becomes available. The partial derivatives of error with respect to the weights can be computed, and the weights can be refined with intent to reduce the error. The process is iterated using another randomly chosen input-target pair.
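Putting the three concepts together, the whole procedure fits in a few dozen lines. The sketch below is not the pseudocode from the PDF mentioned at the end of this article; it is a minimal illustration that assumes `tanh` activations, no bias terms, raw inputs feeding the first layer's weights directly, and a made-up toy training set. The learning rate, the number of rounds and the random seed are arbitrary.

```python
import math
import random

def forward(x, weights):
    # Returns the input signal to every layer, followed by the final output.
    # Assumption: raw inputs feed Layer 1's weights directly; every later layer
    # applies tanh to its input before the weights and summing junctions.
    signals = [x]
    for k, w in enumerate(weights):
        s = signals[-1]
        a = s if k == 0 else [math.tanh(v) for v in s]
        signals.append([sum(w[i][j] * a[i] for i in range(len(a)))
                        for j in range(len(w[0]))])
    return signals

def backward(signals, weights, t):
    # Start at the error block, then propagate sensitivities backwards.
    delta = [2.0 * (o - ti) for o, ti in zip(signals[-1], t)]
    grads = [None] * len(weights)
    for k in reversed(range(len(weights))):
        s, w = signals[k], weights[k]
        a = s if k == 0 else [math.tanh(v) for v in s]
        # Weight update rule: d e / d w[i][j] = A(p_i) * delta_c_j
        grads[k] = [[a[i] * delta[j] for j in range(len(delta))] for i in range(len(a))]
        if k > 0:
            # Sensitivity rule: delta_p_i = A'(p_i) * sum_j w[i][j] * delta_c_j
            delta = [(1.0 - math.tanh(s[i]) ** 2) *
                     sum(w[i][j] * delta[j] for j in range(len(delta)))
                     for i in range(len(s))]
    return grads

def mean_error(data, weights):
    return sum(sum((o - ti) ** 2 for o, ti in zip(forward(x, weights)[-1], t))
               for x, t in data) / len(data)

def train(data, layer_sizes, rate=0.05, rounds=5000, seed=0):
    rng = random.Random(seed)
    # Small random initial weights, one table per pair of adjacent layers.
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
    before = mean_error(data, weights)
    for _ in range(rounds):
        x, t = rng.choice(data)  # randomly poll the training set
        grads = backward(forward(x, weights), weights, t)
        for w, g in zip(weights, grads):
            for i in range(len(w)):
                for j in range(len(w[0])):
                    w[i][j] -= rate * g[i][j]
    return weights, before, mean_error(data, weights)

# Hypothetical toy training set: 2 inputs, 1 target value per example.
data = [([1.0, 0.0], [0.5]), ([0.0, 1.0], [-0.3]),
        ([1.0, 1.0], [0.2]), ([-1.0, 1.0], [-0.8])]
weights, error_before, error_after = train(data, [2, 3, 1])
print(error_before, error_after)
```

The same functions accept any list of layer widths, so passing `[10, 4, 6, 8, 4]` with suitable data would correspond to the network in Figure 1.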

## The miraculous feat of cognition

I am in awe of the miraculous feat of cognition that led early neural network researchers to arrive at the backpropagation algorithm. They clearly had the ability to see patterns and make elegant groupings which ultimately made it possible to train huge networks. Their work not only resulted in the neural network applications we use today, but has also inspired a host of other related algorithms which depend on error minimization.

Note: A PDF version of this article with high quality graphics and pseudocode ("Training Wheels for Training Neural Networks") for implementing the algorithm is available. Please visit www.numericinsight.com and find Downloads on the home page.

