An efficient way of applying the chain rule of partial differentiation to calculate the gradient of the loss function with respect to the parameters of the network.

The error of neuron j in layer l is:

$$
\delta^{(l)}_j = \frac{\partial J}{\partial z^{(l)}_j} = \frac{\partial J}{\partial a^{(l)}_j}\, \phi'\!\left(z^{(l)}_j\right)
$$
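For example, assuming a squared-error loss $J = \tfrac{1}{2}\sum_j \left(a^{(L)}_j - y_j\right)^2$ and a sigmoid activation $\phi = \sigma$ at the output layer $L$ (both choices are purely illustrative), this gives

$$
\delta^{(L)}_j = \left(a^{(L)}_j - y_j\right)\, \sigma\!\left(z^{(L)}_j\right)\left(1 - \sigma\!\left(z^{(L)}_j\right)\right)
$$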

In an MLP, the loss depends on the neurons of layer $l$ only through the pre-activations of the neurons in the subsequent layer $l+1$, so we can use the chain rule to write

$$
\begin{aligned}
\delta^{(l)}_j = \frac{\partial J}{\partial z^{(l)}_j} &= \sum_k \frac{\partial J}{\partial z^{(l+1)}_k} \frac{\partial z^{(l+1)}_k}{\partial z^{(l)}_j} \\
&= \sum_k \delta^{(l+1)}_k \frac{\partial z^{(l+1)}_k}{\partial z^{(l)}_j} \\
&= \left( \sum_k W^{(l+1)}_{jk}\, \delta^{(l+1)}_k \right) \phi'\!\left(z^{(l)}_j\right)
\end{aligned}
$$
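The last equality follows because, with the indexing used here, $z^{(l+1)}_k = \sum_j W^{(l+1)}_{jk}\, a^{(l)}_j + b^{(l+1)}_k$ and $a^{(l)}_j = \phi\!\left(z^{(l)}_j\right)$, so

$$
\frac{\partial z^{(l+1)}_k}{\partial z^{(l)}_j} = W^{(l+1)}_{jk}\, \phi'\!\left(z^{(l)}_j\right).
$$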

In vector form,

$$
\begin{aligned}
\delta^{(l)} &= \frac{\partial J}{\partial a^{(l)}} \odot \phi'\!\left(z^{(l)}\right), \\
\delta^{(l)} &= \frac{\partial J}{\partial b^{(l)}}, \\
\delta^{(l)} &= \left( W^{(l+1)}\, \delta^{(l+1)} \right) \odot \phi'\!\left(z^{(l)}\right).
\end{aligned}
$$
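As a concrete illustration, here is a minimal NumPy sketch of one step of the error recursion, assuming a sigmoid activation and the $W^{(l+1)}_{jk}$ indexing used above (so $W^{(l+1)}$ has shape $(n_l, n_{l+1})$); the function and variable names are illustrative, not a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backward_step(W_next, delta_next, z_l):
    """One step of the error recursion:
    delta^{(l)} = (W^{(l+1)} delta^{(l+1)}) ⊙ phi'(z^{(l)}).

    W_next follows the W_{jk} indexing above, so it has shape (n_l, n_{l+1});
    delta_next has shape (n_{l+1},) and z_l has shape (n_l,).
    """
    return (W_next @ delta_next) * sigmoid_prime(z_l)
```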

The last piece is the derivative of the loss with respect to the weights:

$$ \frac{\partial J}{\partial W^{(l)}_{kj}} = \frac{\partial J}{\partial z^{(l)}_j}\, \frac{\partial z^{(l)}_j}{\partial W^{(l)}_{kj}} = \delta^{(l)}_j\, a^{(l-1)}_k $$
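Continuing the same illustrative sketch, the per-example parameter gradients for layer $l$ follow directly from the equations above (`a_prev` stands for $a^{(l-1)}$ and `delta_l` for $\delta^{(l)}$; the names are assumptions of this sketch):

```python
import numpy as np

def layer_gradients(delta_l, a_prev):
    """Per-example gradients for layer l:
    dJ/dW^{(l)}_{kj} = a^{(l-1)}_k * delta^{(l)}_j  ->  outer(a_prev, delta_l)
    dJ/db^{(l)}_j    = delta^{(l)}_j
    """
    grad_W = np.outer(a_prev, delta_l)  # shape (n_{l-1}, n_l), matching W^{(l)}_{kj}
    grad_b = delta_l.copy()
    return grad_W, grad_b
```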

In practice, gradients are accumulated over mini-batches of data and the parameters are updated once per mini-batch (a minimal sketch of such an update follows below). A separate question is whether the mammalian cortex can do back-propagation; see the Stanford Seminar "Can the brain do back-propagation?" by Geoffrey Hinton on YouTube for more insights on this.
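As a rough sketch of the mini-batch update mentioned above, assuming a hypothetical `backprop(x, y, params)` that returns per-example `(grad_W, grad_b)` pairs in the same layout as `params` (all names here are assumptions of this sketch):

```python
import numpy as np

def sgd_minibatch_update(params, minibatch, backprop, lr=0.1):
    """Accumulate per-example gradients over a mini-batch, then take a
    single gradient-descent step on every (W, b) pair in `params`."""
    grads = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
    for x, y in minibatch:
        example_grads = backprop(x, y, params)  # hypothetical per-example backprop
        grads = [(gW + dW, gb + db)
                 for (gW, gb), (dW, db) in zip(grads, example_grads)]
    n = len(minibatch)
    return [(W - lr * gW / n, b - lr * gb / n)
            for (W, b), (gW, gb) in zip(params, grads)]
```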