## Why is this blog being written?

Neural networks have always fascinated me ever since I became aware of them in the 1990s. They are often represented with a hypnotizing array of connections. In the last decade, deep neural networks have dominated pattern recognition, often replacing other algorithms for applications like computer vision and voice recognition. At least in specialized tasks, they indeed come close to mimicking the miraculous feats of cognition our brains are capable of.

While neural networks are capable of such feats, the very discovery of a method of programming such a computational device seems to me to be a miraculous feat of cognition worthy of celebration. My purpose in writing this blog is to share my perspective on an amazing algorithm made widely known by a 1986 publication in Nature.
I will assume that the reader has some understanding of the meaning and purpose of the various elements of a neural network such as the one shown in Figure 1. I have provided a little bit of background below. Other than that, this blog should be an easy read for those with some familiarity of basic calculus.

 Figure 1: An example of a multi-layered neural network that can be used to associate an input consisting of 10 numbers with one of 4 decisions or predictions.

## What is so difficult about designing a neural network?

To appreciate the difficulty involved in designing a neural network, consider this: The neural network shown in Figure 1 can be used to associate an input consisting of 10 numbers with one of 4 decisions or predictions. For example, the neural network shown may be used by a bank to determine if credit should be extended to a customer. In this case, the 10 input numbers represent various parameters relevant to an individual's financial responsibility such as balance in savings accounts, outstanding loan amounts, number of years of employment, and so on. The neural network takes in these 10 numbers, performs calculations and produces an output consisting of 4 numbers. Depending on the position at which the maximum of these 4 numbers appears in the output, the prediction could be one of the following:
1. Excellent creditworthiness with high spending limit
2. Average creditworthiness with moderate spending limit
3. Low creditworthiness with low spending limit
4. High default risk.
Based on this prediction, the bank would take the appropriate decision.

A neural network such as the one shown in Figure 1 can perform this miraculous feat of cognition only if it is specifically trained to do so. For the network to work correctly, the weight at each of its (10$\times$4)+(4$\times$6)+(6$\times$8)+(8$\times$4) =144 connections have to be carefully chosen such that the network classifies every input drawn from a known training set with a high degree of accuracy. This is a classic application of the supervised learning paradigm in machine learning.

There are, of course, no formulas in existence to directly set the values of the 144 weights. The only recourse is to start with some initial values for the 144 weights, check how good the resulting neural network is, and repeatedly refine the weights to progressively make the network more and more accurate. So, what is needed is a method of refining the weights.

If one were to think of the accuracy as some grand function of the weights, it makes sense to refine each weight by changing it by an amount proportional to the partial derivative of that grand function with respect to that weight. Why bring in partial derivatives? Because they precisely predict how the accuracy responds to small changes in the weights. In fact, at every iteration, performing a refinement guided by the partial derivatives results in a more advantageous gain in accuracy compared to any other method of refinement. The method of steepest descent does exactly what is suggested here – with the minor, but entirely equivalent, goal of seeking to progressively decrease the error, rather than increase the accuracy.

To keep things straight, let me list the concepts I have described thus far:
1. The weights of the neural network must be set such that the error on a known training set is minimized. Ideally, the network must yield zero error on a large and representative training data set.
2. For any given setting of weights, we can attempt to refine them by changing each weight by an amount proportional to the partial derivative of the error with respect to that weight. The partial derivatives themselves change after any such refinement, and must be recomputed.
3. We can keep on refining the weights in this manner, and stop refining when the error is zero. In real life, we call it done when the error is low enough. Or refuses to fall anymore. Or we run out of time after a few million rounds of refinements.
Seems simple, right? Yes. As long as there is a simple way of calculating the partial derivative of error with respect to every weight. In principle, the partial derivatives can be calculated by systematically perturbing the weights and measuring the resulting changes in error. A word to the wise, don't try this at home. Or even on your neighborhood supercomputer.

## Backpropagation

In fact, this error minimization problem that must be solved to train a neural network eluded a practical solution for decades till D. E. Rumelhart, C. E. Hinton, and R. J. Williams (drawing inspiration from other researchers) demonstrated a technique, which they called backpropagation, and made it widely known (Nature 323, 533-536, 9 October 1986). It is essentially by building upon their method that today others have ventured to program neural networks with 60 million weights, with astounding results.

According to Bernard Widrow, now Professor Emeritus at Stanford University and one of the pioneers of neural networks "The basic concepts of backpropagation are easily grasped. Unfortunately, these simple ideas are often obscured by relatively intricate notation, so formal derivations of the backpropagation rule are often tedious." This is indeed unfortunate because the backpropagation rule is one of the most elegant applications of calculus that I have known.

Easy as 1-2-3
Once you appreciate the fact that, in order to train a neural network, you need to somehow calculate the partial derivatives of the error with respect to weights, backpropagation can be easily and qualitatively derived by reducing it to three core concepts. It also helps immensely to keep the notation intuitive and easy to connect to the concept being symbolized.
 Figure 2: The derivation of the backpropagation algorithm is simplified by adding an extra computational block to calculate the error and also by boxing parts of the network.

### 1. Boxing

Since training the neural network is all about minimizing the training error, the first step in the derivation involves tacking on an extra computational block to calculate the error between the actual output $\left\{o_1,o_2,o_3,o_4\right\}$ and a known target $\{t_1,t_2,t_3,t_4\}$. This is shown as a triangular block in Figure 2. For now, let us think of the output and the target as known and fixed entities. Although we need not concern ourselves with the exact formula to compute the error, I offer the familiar sum-of-squares error as an example
$\text{e=(}o_1-t_1)^2\text{+(}o_2-t_2)^2\text{+(}o_3-t_3)^2\text{+(}o_4-t_4)^2$

Next, we choose one of the layers (say Layer 3) and enclose that layer and all following layers (including the error calculating block) in a box, as shown in gray in Figure 2. Keep in mind that this is just one of several nested boxes we can construct in order to compartmentalize the network. Let us resolve not to worry about anything going on inside the box but simply think of the relationship between the input to the box and the output (i.e the error) coming out of the box. We will call this box the current box, and call the input to this box the current input$\{c_1,c_2,c_3,c_4,c_5,c_6\}$. It is important to recognize that the functional relationship between the current input to the box and the output emanating from the box is completely defined and can be computed. Let us denote this function as $E(c_1,c_2,c_3,c_4,c_5,c_6)$.
 Figure 3: The box contains parts of the neural network hidden from view. It allows us to think about broad relationships among different parts of the network.

### 2. Sensitivity

As our journey through backpropagation continues, I gently request you to assume that the vector of partial derivatives $\frac{\partial E}{\partial c_1},\frac{\partial E}{\partial c_2}, \ldots$ of the function $E(c_1,c_2,c_3,c_4,c_5,c_6)$ is known. This might seem like a stretch. After all, we have set out to find a method to compute some other (equally unrealized) partial derivatives. But I assure you it will all work out in the end. To emphasize the crucial nature of this simple concept, it has been given a name: Sensitivity. Let us denote the sensitivity of our current box as $\{\text{$\delta $c}_1,\text{$\delta $c}_2,\text{$\delta $c}_3,\text{$\delta
$c}_4,\text{$\delta $c}_5,\text{$\delta $c}_6\}$. With the help of Figure 3 to hold these concepts in our mind, we can concretely think about how the output of the current box responds to a small perturbation applied to any of its current inputs. For example, if the fourth component of the current input changes by the small amount $\Delta c_4,$ we can expect the error at the output to change by $\Delta c_4 \delta c_4.$ Further, in addition to the hypothetical change in component 4, if there is a simultaneous change of $\Delta c_6$ in component 6, we can expect the error at the output to change by an additional amount, making the total change $\Delta c_4 \delta c_4 + \Delta c_6 \delta c_6.$ The effect of small simultaneous changes in the current input components simply add up at the output.

Knowing the sensitivity of the current box, what can we say about the sensitivity of the preceding box? Keep in mind that the preceding box encloses Layer 2 and all following layers, including the error calculating block. For our specific example, let us call the input to this box the preceding input, $\{p_1,p_2,p_3,p_4.\}$ It follows quite logically that the sensitivity of the preceding box (which we will naturally denote as $\{\text{$\delta $p}_1,\text{$\delta $p}_2,\text{$\delta $p}_3,\text{$\delta
$p}_4\}$) must be related to the sensitivity of the current box and the extra neural network elements making up the difference between the two nested boxes. The extra elements are the very vital nonlinear activation function units, summing junctions and weights.

Figure 4(a) shows the current box and the extra elements that must be added to construct the preceding box. For clarity, all the elements not relevant to the calculation of the first component of sensitivity ($\delta p_1$) have been grayed out. Look closely at Figures 4(a) and 4(b) to understand how the sensitivity of the preceding box can easily be derived from first principles. Specifically, Figure 4(b) provides insight into how $\delta p_1(=\frac{\partial e}{\partial p_1})$ can be computed  by allowing the input component $p_1$ to change by a small quantity $\Delta p_1$ and following the resulting changes in the network. Notes: (i) The notation $\mathcal{A}'\left(p_1\right)$ has been used for the derivative of the activation function evaluated at $p_1.$ (ii) For clarity, not all changes in signals have been explicitly labeled. Those that are not labeled can easily be determined since they all follow an obvious pattern.
 Figure 4(a) shows the current box and the extra elements that must be added to construct the preceding box. Figure 4(b) provides insight into how $\delta p_1(=\frac{\partial e}{\partial p_1})$ can be computed  by allowing the input component $p_1$ to change by a small quantity $\Delta p_1$ and following the resulting changes in the network.
This is indeed a deep result. (And one which we have managed to arrive at without recourse to tedious calculus and numbered equations.) By repeated application, this result allows us to work backwards and calculate the sensitivity of the error to changes in the input at every layer.

The algorithm gets the name backpropagation because the sensitivities are propagated backwards as they are calculated in sequence. The textbook formula to express the sensitivity of the preceding layer in terms of the sensitivity of the current layer is easily seen to be
$\delta p_i = \mathcal{A}'(p_i) \sum _{j} w_{i j}\delta c_j$

A starting point is all we need to completely calculate all the sensitivity terms throughout the neural network. To do this, we consider the error computing block itself as the first box. For this box, the input is $\left\{o_1,o_2,o_3,o_4\right\},$ and the output is $e$ as given in the sum-of-squares error formula we have seen before. Simple calculus gives us the components of the sensitivity of the error computing block
$\left\{ 2 \left(o_1-t_1\right), 2 \left(o_2-t_2\right), 2 \left(o_3-t_3\right), 2 \left(o_4-t_4\right) \right\}$

At this point, the last section writes itself. Following the same strategy outlined in the previous figure, look at Figure 5(a) and 5(b) to intuitively understand how the error changes in response to a small change in one of the weights, say $w_{1 1}$. Once again in these figures, details of connections not immediately relevant to this calculation have been grayed out. The much sought after partial derivative of error with respect to the specific weight $w_{1 1}$ is easily seen to be $\mathcal{A}\left(p_1\right) \delta c_1$. The textbook formula to compute the partial derivative of the error with respect to any weight is easily seen to be
$\frac{\partial e}{\partial w_{i j}} =\mathcal{A}\left(p_i\right) \delta c_j$
 Figure 5: Another set of figures designed to provide insight into the error changes in response to a small change in one of the weights, $w_{1 1}$. Using this intuition, the partial derivative $\frac{\partial e}{\partial w_{1 1}}$ can be computed as $\mathcal{A}\left(p_1\right) \delta c_1$.
Now that we have a formula to calculate the partial derivative of error with respect to every weight in the network, we can proceed to iteratively refine the weights and minimize the error using the method of steepest descent.

In the most popular version of backpropagation, called stochastic backpropagation, the weights are initially set to small random values and the training set is randomly polled to pick out a single input-target pair. The input is passed through the network to compute internal signals (like $\mathcal{A}\left(p_1\right)$ and $\mathcal{A}'\left(p_1\right)$ shown in Figures 4 and 5) and the output vector. Once this is done, all the information needed to initiate backpropagation becomes available. The partial derivatives of error with respect to the weights can be computed, and the weights can be refined with intent to reduce the error. The process is iterated using another randomly chosen input-target pair.

## The miraculous feat of cognition

I am in awe of the miraculous feat of cognition that lead early neural network researchers to arrive at the backpropagation algorithm. They clearly had the ability to see patterns and make elegant groupings which ultimately made it possible to train huge networks. Their work not only resulted in the neural network applications we use today, but have also inspired a host of other related algorithms which depend on error minimization.
Although this algorithm has been presented here as a single established method, it should be regarded as a framework. In my experience, appreciating how an algorithm is derived leads to insight which makes it possible to explore beneficial variations. The process of designing robust and efficient scientific algorithms frequently leads us to regard established frameworks as substrates upon which to build new and better algorithms.

1. Good and easy grasp explanation. Best :)

2. The best explanation I've seen on this topic yet!

3. Great explanation. However I would recommend not having animations but instead just images with the red/yellow boxes added. It is a bit frustrating to read them and then wait until they come back.

4. I worked on a classic problem of using a neural network to play the Japanese game GO. Training a
NN on a 19x19 board is to computationally expensive. Humans learn on a 9x9 board. So the question
is, can we train a NN on a 9x9 board and apply the learning to the full board? The solution is to think
about the 9x9 NN as a filter that can "slide around" the 19x19, looking for a match. A fast fourier
transform can be used to do this. The FFT can "look for" the 9x9 pattern over the 19x19 board.
So you can "teach in the small" and "learn in the large".

I never wrote up and published this result and I haven't seen it anywhere else to date.
It is useful for complex problems (e.g. finding enemy tanks in a forest) involving large data
sets (high density images).

5. Can I repost this blog and translate it into Chinese by my understanding ? It is very helpfully to understand backpropagation.

1. Thank you for your interest. Please email me at shashi@numericinsight.com to discuss this matter.

6. Sir how can the same algorithm be used for Texture analysis

7. wow...I am impressed!

8. thanks for sharing wonderful article description of what while written the blogs in back end process. your graphical information is too good. keep updating more.
Web Designing Training in chennai

9. Wonderful blog.. Thanks for sharing informative blog.. its very useful to me..

iOS Training in Chennai

10. Nice it seems to be good post... It will get readers engagement on the article since readers engagement plays an vital role in every blog.i am expecting more updated posts from your hands.
Android App Development Company

11. great and nice blog thanks sharing..I just want to say that all the information you have given here is awesome...
Freshers Jobs in Chennai

12. Nice it seems to be good post... It will get readers engagement on the article since readers engagement plays an vital role in every blog.i am expecting more updated posts from your hands.
iOS App Development Company
iOS App Development Company

13. Its fantatic explaintion lot of information gather it...nice article....
seo company in Chennai

Fitness SMS
Fitness Text
Salon SMS
Salon Text
Investor Relation SMS
Investor Relation Text
Mobile Marketing Services
mobile marketing companies
Sms API

15. Thank you for taking the time and sharing this information with us. It was indeed very helpful and insightful while being straight forward and to the point.
mcdonaldsgutscheine | startlr.com/ | saludlimpia

16. This is excellent information. It is amazing and wonderful to visit your site.Thanks for sharing this information,this is useful to me...
Mobile Marketing Service
Mobile Marketing Companies
Texting API
Sms API
sms marketing

17. Thank you for taking the time and sharing this information with us. It was indeed very helpful and insightful while being straight forward and to the point.
PHP Training Institute in Chennai

18. The best explanation I've seen on this topic yet!BEST PHP TRAINING IN CHENNAI

19. It's more informative blog and useful also PHP training in chennai

PSD to Wordpress
wordpress website development

21. Well explained. Keep updating Artificial intelligence Online Trining

22. I think things like this are really interesting. I absolutely love to find unique places like this. It really looks super creepy though!! Best Machine Learning Training courses | best machine learning institute in chennai | Machine Learning course in chennai

23. I love the blog. Great post. It is very true, people must learn how to learn before they can learn. lol i know it sounds funny but its very true. . .
angularjs online Training

angularjs Training in marathahalli

angularjs Training in bangalore

angularjs Training in bangalore

angularjs online Training

24. Hello I am so delighted I found your blog, I really found you by mistake, while I was looking on Yahoo for something else, anyways I am here now and would just like to say thanks for a tremendous post. Please do keep up the great work.
Java training in Bangalore | Java training in Jaya nagar

Java training in Bangalore | Java training in Electronic city

Java training in Chennai | Java training institute in Chennai | Java course in Chennai

Java training in USA

25. This is most informative and also this post most user friendly and super navigation to all posts... Thank you so much for giving this information to me.

rpa training in chennai
rpa training in bangalore
rpa course in bangalore
best rpa training in bangalore
rpa online training

26. It would have been the happiest moment for you,I mean if we have been waiting for something to happen and when it happens we forgot all hardwork and wait for getting that happened.

python tutorials
python course institute in electronic city

27. From your discussion I have understood that which will be better for me and which is easy to use. Really, I have liked your brilliant discussion. I will comThis is great helping material for every one visitor. You have done a great responsible person. i want to say thanks owner of this blog.

Data Science Training in Chennai | Best Data science Training in Chennai
Data Science training in anna nagar | Data science training in Chennai
Data Science training in chennai | Best Data Science training in chennai
Data science training in Bangalore | Data Science training institute in Bangalore
Data Science training in marathahalli | Data Science training in Bangalore
Data Science interview questions and answers

28. Interesting post which makes you think about a lot of things. This post helped to make the career.
datapower tutorial video

29. Great Article… I love to read your articles because your writing style is too good, its is very very helpful for all of us and I never get bored while reading your article because, they are becomes a more and more interesting from the starting lines until the end.
Devops Training courses
Devops Training in Bangalore
Best Devops Training in pune

30. All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.
python course in pune
python course in chennai
python course in Bangalore

31. Greetings. I know this is somewhat off-topic, but I was wondering if you knew where I could get a captcha plugin for my comment form? I’m using the same blog platform like yours, and I’m having difficulty finding one? Thanks a lot.

Advanced AWS Online Training | Advanced Online AWS Certification Course - Gangboard
Best AWS Training in Chennai | Amazon Web Services Training Institute in Chennai Velachery, Tambaram, OMR
Advanced AWS Training in Bangalore |Best AWS Training Institute in Bangalore BTMLA ,Marathahalli

32. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
rpa training in Chennai | rpa training in bangalore | best rpa training in bangalore | rpa course in bangalore | rpa training institute in bangalore | rpa training in bangalore | rpa online training

33. Very useful and informative content has been shared out here, Thanks for sharing it.

34. Your very own commitment to getting the message throughout came to be rather powerful and have consistently enabled employees just like me to arrive at their desired goals.
Best Devops training in sholinganallur
Devops training in velachery
Devops training in annanagar
Devops training in tambaram

35. When I initially commented, I clicked the “Notify me when new comments are added” checkbox and now each time a comment is added I get several emails with the same comment. Is there any way you can remove people from that service? Thanks.

Amazon Web Services Interview Questions And Answers | Best AWS interview questions 2018
Advanced Amazon Web Services Training in Chennai | Best AWS Training Institute in Chennai
Best AWS Training Institute in Pune | Best AWS Training Center in Pune
Best AWS Training Institute in Marathahalli | Advanced AWS Training in Marathahalli, Bangalore

36. Good job in presenting the correct content with the clear explanation. The content looks real with valid information. Good Work

DevOps is currently a popular model currently organizations all over the world moving towards to it. Your post gave a clear idea about knowing the DevOps model and its importance.

Good to learn about DevOps at this time.

devops training in chennai | devops training in chennai with placement | devops training in chennai omr | devops training in velachery | devops training in chennai tambaram | devops institutes in chennai | devops certification in chennai | trending technologies list 2018

37. Very nice post here thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
machine learning training in chennai
machine learning training in omr
top institutes for machine learning in chennai
Android training in chennai
PMP training in chennai

38. Thank you so much for your information,its very useful and helpful to me.Keep updating and sharing. Thank you.
RPA training in chennai | UiPath training in chennai | rpa course in chennai | Best UiPath Training in chennai

39. Thank you so much for your information,its very useful and helpful to me.Keep updating and sharing. Thank you.
RPA training in chennai | UiPath training in chennai | rpa course in chennai | Best UiPath Training in chennai

40. Really great post, Thank you for sharing This knowledge.Excellently written article, if only all bloggers offered the same level of content as you, the internet would be a much better place. Please keep it up!
angularjs Training in chennai

angularjs Training in chennai

angularjs-Training in tambaram

angularjs-Training in sholinganallur

angularjs-Training in velachery

angularjs-Training in pune

41. I like your article very much. It has many useful ideas and suggestions. I thinks it will be more helpful for my research in an efficient manner. Please try to post some more topics as well as possible.
DevOps Training in Chennai
DevOps certification Chennai
DevOps course in Chennai
Best devOps Training in Chennai
AWS Training in Chennai
R Programming Training in Chennai

42. All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.
rpa training in bangalore
best rpa training in bangalore
rpa training in pune

43. I really like the dear information you offer in your articles. I’m able to bookmark your site and show the kids check out up here generally. Im fairly positive theyre likely to be informed a great deal of new stuff here than anyone
Data Science Training in Chennai
Data Science training in kalyan nagar
Data science training in Bangalore
Data Science training in marathahalli
Data Science interview questions and answers
Data science training in jaya nagar
Data science training in bangalore

44. Such a wonderful blog on Machine learning . Your blog almost full information about Machine learning .Your content covered full topics of Machine learning that it cover from basic to higher level content of Machine learning . Requesting you to please keep updating the data about Machine learning in upcoming time if there is some addition.
Thanks and Regards,
Machine learning tuition in chennai
Machine learning workshops in chennai
Machine learning training with certification in chennai

45. Wow nice to read the blog
R programming training in chennai

46. This is really impressive post. I would like to appreciate you for making it very simple and easy.
Regards,
Best Devops Training in Chennai | Best Devops Training Institute in Chennai

47. All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.
microsoft azure training in bangalore
rpa training in bangalore
best rpa training in bangalore
rpa online training

48. Thanks for posting useful information.You have provided an nice article, Thank you very much for this one. And i hope this will be useful for many people.. and i am waiting for your next post keep on updating these kinds of knowledgeable things...Really it was an awesome article...very interesting to read..please sharing like this information......
mi service center in chennai
redmi service center in chennai
xiaomi service centre chennai
redmi service center
mi service center
redmi service center near me
redmi mobile service centre in chennai

I welcome your comments. Do let me know if you have a numerical and insightful story to tell.