Convert a Simple Neural Network to a Mathematical Equation
This article was inspired by “Neural Networks are Function Approximation Algorithms”, where Jason Brownlee shows how neural networks can be used to search for an “unknown underlying function that is consistent in mapping inputs to outputs.”
Basic neural networks are easy to implement in the most popular frameworks, but what we get after training is a set of weights plus the model architecture, which we can reuse later. So what is the mathematical equation behind such a function approximation? In most cases it is, of course, impossible to convert a complex NN architecture into mathematical notation, but in some cases it can be useful.
Thus, in this article I will show:
- how a Keras Sequential model computes the output value from its trained weights via matrix multiplication;
- how to convert the Keras model architecture into an algebraic function;
- how this algebraic function can be transformed into LaTeX and NumPy expressions.
Keras Sequential model computation
When it comes to function approximation there is no need for a deep neural network: a perceptron architecture can be enough, so a Keras Sequential model is all we need (see the article mentioned above). After finding an architecture that gives a good approximation we are left with the weights, which are nothing but a set of matrices of numbers. To get the output from an input we usually use the same framework, but to derive an algebraic function we need to know exactly how that output is calculated. So, let's define the following multilayer perceptron (MLP):
where: x is the input value, W are the weights between layers, B are the biases, f are the activation functions and y is the output value. Thus, the MLP has 4 layers: 2 hidden (the first with 3 neurons, the second with 2), 1 input and 1 output. Now let's implement some code (for details about model construction, please refer to Jason Brownlee's article).
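A minimal sketch of such a model follows. The tanh hidden activations and the single linear output neuron match what is used later in the article; the optimizer, loss and training call are my assumptions (the training procedure itself follows Brownlee's article).

```python
from keras.models import Sequential
from keras.layers import Dense, Input

# 1 input -> 3 tanh neurons -> 2 tanh neurons -> 1 linear output
model = Sequential([
    Input(shape=(1,)),
    Dense(3, activation='tanh'),
    Dense(2, activation='tanh'),
    Dense(1),                     # linear output neuron
])
model.compile(optimizer='adam', loss='mse')
# model.fit(x_scaled, y_scaled, epochs=..., verbose=0)  # training as in Brownlee's article
```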
But instead of a simple quadratic function, I tried to approximate this one:
and after several attempts (the result depends on the initial weight initialization) I got a model that gives a reasonably good approximation of this function:
But how can we find out the “real” function at the heart of the model that produces this output? First of all, we need to understand what exact calculation is performed to derive the output value from the input value. Mathematically, this process can be represented in terms of matrix multiplication:
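For the 1-3-2-1 architecture above, and assuming a linear output neuron (which is consistent with the final formulas shown later in the article), the computation can be written as:

$$Y = W_3 \cdot f_2\big(W_2 \cdot f_1(W_1 X + B_1) + B_2\big) + B_3$$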
where W are the weight matrices, B the bias vectors, X the input value, Y the output value, and f the activation functions of the hidden layers. In my model I used the tanh function, which in Keras is implemented as follows:
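Keras's tanh activation is the standard hyperbolic tangent, applied elementwise:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$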
So, now I need to get the weights and biases of each layer and, in a loop, do the multiplication, add the bias and apply the activation function:
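A minimal sketch of that loop, assuming tanh hidden layers and a linear output layer (the helper name predict_manually is mine):

```python
import numpy as np

def predict_manually(model, x):
    """Reproduce model.predict() for this MLP using only its trained weights."""
    params = model.get_weights()          # [W1, B1, W2, B2, W3, B3]
    a = np.array([[x]], dtype=float)      # single scalar input as a 1x1 matrix
    n_layers = len(params) // 2
    for i in range(n_layers):
        W, b = params[2 * i], params[2 * i + 1]
        a = a @ W + b                     # matrix multiplication plus bias
        if i < n_layers - 1:              # hidden layers use tanh
            a = np.tanh(a)
    return a[0, 0]                        # output layer is linear
```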
As a result, this snippet returns the same output as the Keras model.predict() function. But this is still not a mathematical equation of the function.
Convert model architecture to algebraic function
The logic of the conversion is the same as in the loop above, but instead of multiplying matrices of numbers, I construct a mathematical formula from the extracted weights and biases:
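One way such a conversion could look is sketched below. This is not the author's exact code: build_formula is a hypothetical helper, the weights are rounded to two decimals, and the activation is kept as a symbolic f for now.

```python
def build_formula(model):
    """Build a nested algebraic expression (as a string) from the trained weights."""
    params = model.get_weights()
    n_layers = len(params) // 2
    exprs = ["x"]                            # expressions feeding the current layer
    for i in range(n_layers):
        W, b = params[2 * i], params[2 * i + 1]
        new_exprs = []
        for j in range(W.shape[1]):          # one expression per neuron
            terms = [f"{W[k, j]:.2f}*{exprs[k]}" for k in range(W.shape[0])]
            expr = "(" + " + ".join(terms) + f" + {b[j]:.2f})"
            if i < n_layers - 1:             # hidden neurons get the activation
                expr = "f" + expr
            new_exprs.append(expr)
        exprs = new_exprs
    return exprs[0]                          # the single output neuron
```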
The output of this code for the trained model above is the following:
A long string for such a small neural network… Can you imagine what the equation of a deep neural network would look like (especially since the activation formula has not been expanded here yet)? For clarity, this formula can be presented in the following form:
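Schematically, with symbolic weights instead of the trained values, the nesting for this 1-3-2-1 architecture is:

$$y = w^{(3)}_1 f\big(n^{(2)}_1\big) + w^{(3)}_2 f\big(n^{(2)}_2\big) + b^{(3)}, \qquad n^{(2)}_j = \sum_{i=1}^{3} w^{(2)}_{ij}\, f\big(w^{(1)}_i x + b^{(1)}_i\big) + b^{(2)}_j$$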
(3 neurons of the first hidden layer × 2 neurons of the second hidden layer + the output layer). But to get the exact formula for the approximation of the given function, I still need to substitute the formula of tanh (or whatever activation function is used in the model).
Transformation of the function into LaTeX and Python expressions
I wrote two functions (a sketch of the idea follows below):
- one returns text in LaTeX format to include formulas in documents (in any case, the notation of big models is unlikely to fit on an A4 sheet);
- another returns text that can be used with the Python NumPy library to calculate a single output from a single input.
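The helpers below are hypothetical illustrations of the idea rather than the author's exact code: they expand a tanh(...) argument into the two string forms shown next, and eval_numpy_formula shows how the generated NumPy string can be checked against the model's output.

```python
import numpy as np

def tanh_latex(arg):
    """Expand tanh(arg) into its exponential form for a LaTeX string."""
    return (f"(exp{{({arg})}} - exp{{-({arg})}})"
            f"/(exp{{({arg})}} + exp{{-({arg})}})")

def tanh_numpy(arg):
    """Expand tanh(arg) into a NumPy-evaluable string."""
    return (f"(np.exp(({arg})) - np.exp(-({arg})))"
            f"/(np.exp(({arg})) + np.exp(-({arg})))")

def eval_numpy_formula(formula, x):
    """Evaluate the generated NumPy expression for a single input x."""
    return eval(formula, {"np": np, "x": x})
```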
Finally, I get two strings (the weights here differ from those in the previous images because of a different training run):
- LaTeX: -0.71(exp{(-0.23(exp{(0.52x+0.15)} - exp{-(0.52x+0.15)})/ … /(exp{(3.47x-0.68)} + exp{-(3.47x-0.68)})+0.52)})+0.03;
- NumPy: -0.71*(np.exp((-0.23*(np.exp((0.52*x+0.15)) - np.exp(-(0.52*x+0.15)))/ … /(np.exp((3.47*x-0.68)) + np.exp(-(3.47*x-0.68)))+0.52)))+0.03
Using the Overleaf website one can convert this string into a mathematical equation. Here is what such a simple multilayer perceptron looks like (although don't forget that MinMax scaling was applied):
Despite such an ugly equation, using simplification techniques and Taylor series expansion one can derive much simpler formulas than these (after all, they are only approximations).
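For example, the Maclaurin series of tanh, which could be used for such a simplification when its arguments stay small, is:

$$\tanh(z) = z - \frac{z^{3}}{3} + \frac{2z^{5}}{15} - \dots, \qquad |z| < \frac{\pi}{2}$$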
In any case, even a 1-layer perceptron can produce approximations with “normal” equations:
Also, here is an almost exact approximation, produced by a 2-layer MLP with 10 and 2 neurons respectively:
Link to the Jupyter notebook with the full code.