// ReLU: clamp negative pre-activations to zero, pass positive values through unchanged
function relu(preactivation) {
  return preactivation < 0 ? 0 : preactivation;
}
2024-06-04
I was on a bit of a roll writing these little articles frequently, but it has been a while since the last one. I’m writing this one at least partly with a very small baby resting on me, so maybe it’s something to do with that.
In any case, I’ve recently got a job which is more data focussed, so either I’ll write a lot more of these as I learn lots of new things, or I’ll have no time to write any more. The job will involve quite a bit of deep learning. I think most of the actual interacting with neural nets will be at a high level, but I’m working through Simon Prince’s (so far) excellent Understanding Deep Learning to get a better grasp of how they work under the hood. It has excellent text and figures to help build understanding. Even better, it has notebooks to work through. The notebooks are great for getting you to write some code, but I thought I could add some interactivity to the examples. Some of the explanations here will be quite brief; I’m not trying to duplicate the information in the book.
Here we start with the very basics. The Rectified Linear Unit (ReLU) function is a common activation function used in deep learning. It’s simple to understand: if a value is negative, return zero. Otherwise, return the value.
An artificial neural net applies an activation function like ReLU to the output of a linear function.
Here, we apply ReLU to a neuron. Play around with the sliders to get a feel for how changing the slope and y-intercept of the linear function changes the output of the neuron after applying the activation function.
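The interactive cells further down lean on two small helpers that aren't shown on this page: linearfunc, which evaluates the linear part of the neuron, and activation, which applies an activation function to it. Here's a minimal sketch of how I'm assuming they work; the exact return shape is a guess, based on the .x and .act fields read later in net_output.

// Hypothetical helpers, matching how they are called in the cells below
function linearfunc(x, slope, intercept) {
  return slope * x + intercept;
}

// Pair each input with its activated value, so later cells can read d.x and d.act
function activation(x, linear, act_fn) {
  return {x: x, act: act_fn(linear(x))};
}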
So how does a neural network use these activation functions? The answer is that they can be combined to approximate any function. As an example, let’s see how three neurons can approximate a section of a sine function.
The way you can combine the neurons is to add up their outputs, each multiplied by a constant. In the three-neuron example:
\(y = \phi_0 + \phi_1 h_1 + \phi_2 h_2 + \phi_3 h_3\)
where \(h_d\) is the \(d^{th}\) neuron. Since a ReLU neuron is 0 over some region of its input and >0 over the rest, you can use them to build up an approximation out of pieces of straight lines. Adding the weights (\(\phi_d\)) to the equation, and allowing them to be positive or negative, means that those piecewise-linear segments can have a positive or negative gradient.
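To make that concrete, here's a quick check of a single neuron's contribution on either side of its "joint", using the hypothetical helpers sketched above. Below the joint the contribution is flat at zero; above it, the contribution is a straight line whose gradient is the weight times the neuron's slope, so a negative weight tilts it downwards.

// One ReLU neuron with slope 1 and intercept -10: its joint sits at x = 10
let h_1 = (x) => relu(linearfunc(x, 1, -10));
let phi_1 = -0.5;                        // a negative weight, for illustration
[5, 15, 25].map((x) => phi_1 * h_1(x));  // => [0, -2.5, -7.5]
// flat below the joint; above it the gradient is phi_1 * slope = -0.5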
See how fiddling with the parameters of the neurons below can get you close to the section of sine!1
I hope having the sliders to play with gives you a good feel for how this works. As I said, I’m not trying to cover the theory of this, so if you want that, I can recommend the book I’m working through.
This is a simple example, with just three hidden neurons in a single layer. I’ll work on a more complex example another time if anyone2 tells me they want one.
// 90 evenly spaced x values (0, 4, 8, ..., 356) for the sine section
sin_x = Array.from({length: 90}, (x,i) => i*4);
// Each hidden neuron: a linear function (slope and intercept set by its sliders)
// followed by the ReLU activation
neuron_1 = sin_x.map((x) => activation(x, (d) => linearfunc(d, neuron_1_form.slope, neuron_1_form.intercept), relu))
neuron_2 = sin_x.map((x) => activation(x, (d) => linearfunc(d, neuron_2_form.slope, neuron_2_form.intercept), relu))
neuron_3 = sin_x.map((x) => activation(x, (d) => linearfunc(d, neuron_3_form.slope, neuron_3_form.intercept), relu))
function net_output(neurons, phi) {
  // the x values are the same for every neuron, so take them from the first one
  let input_x = neurons[0].map((d) => d.x);
  // element-wise sum of two arrays, for use with reduce
  let sum = (r, a) => r.map((b, i) => a[i] + b);
  // weight each neuron's activations by its phi, sum across neurons,
  // then add the offset phi[0]
  let neuron_outputs = neurons.map(
    (neuron, i) => neuron.map((x) => x.act*phi[i+1])
  ).reduce(
    sum
  ).map((x) => x+phi[0]);
  return input_x.map((d, i) => ({x: d, net: neuron_outputs[i]}));
}
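For completeness, this is roughly how the cell driving the plot calls net_output; the phi_form slider names here are my guess, based on the pattern of the neuron forms above.

// Hypothetical usage: combine the three neurons with the slider-set weights
network = net_output(
  [neuron_1, neuron_2, neuron_3],
  [phi_form.phi_0, phi_form.phi_1, phi_form.phi_2, phi_form.phi_3]
)
// network is an array of {x, net} points, ready to plot against the sine section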