The Impact of Noise on Recurrent Neural Networks I
In this post, we are going to study reservoir computing, a subset of recurrent neural networks used for complex temporal processing tasks. We will consider the setting where our computational reservoir is subject to noise and errors. A classic application of reservoir computing is predicting the dynamics of physical systems; with suitable input encodings, such as those used in transformer-based methods, reservoir computers can also be used to predict language patterns for human-like conversation based on previous inputs.
For this post, there are at least two motivations for considering reservoir computers over something like a transformer. The first is academic: many people simply have not heard of reservoir computers. The second is that reservoir computing is simple, which makes it an excellent pedagogical tool. Training, as we will see, involves only a simple linear layer, and the recurrent structure is given by a random matrix. This simplicity lends itself to interpretability.
The specific kind of reservoir computer we are going to consider is the echo state network (ESN). ESNs are very simple networks with a few tunable parameters: the network sparsity, the bleedthrough between time steps, an encoding map, a decoding map, an internal transition map, and the size of the reservoir. An interesting property of ESNs is that, with the weights of the encoding map and internal transition map initialized randomly, it is possible to learn to predict time series by training only the decoding map. That map can be taken to be linear, and is hence computable via a standard least squares estimator. For the purposes of this tutorial, we will fix the sparsity and bleedthrough.
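To make this concrete, here is a minimal sketch of an ESN in NumPy. The dimensions, leak rate (the "bleedthrough"), sparsity, and spectral-radius rescaling are illustrative choices, not values taken from this tutorial; only the decoding map `W_out` is ever trained, via least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 1-D input, reservoir of 100 neurons.
n_in, n_res = 1, 100
sparsity = 0.9   # fraction of internal weights set to zero (assumed value)
leak = 0.3       # bleedthrough between time steps (assumed value)

# Random encoding map W_in and internal transition map W; never trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W[rng.random((n_res, n_res)) < sparsity] = 0.0
# Rescale so the spectral radius is below 1 (a common way to get
# the "echo state" property).
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def step(x, u):
    """One leaky reservoir update: blend old state with new activation."""
    return (1 - leak) * x + leak * np.tanh(W @ x + W_in @ u)

# Drive the reservoir with a toy input and collect the state trajectory.
T = 200
u_seq = np.sin(0.1 * np.arange(T)).reshape(T, 1)
y_seq = np.sin(0.1 * (np.arange(T) + 1))  # target: one-step-ahead value
X = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = step(x, u_seq[t])
    X[t] = x

# Train only the linear decoding map, via ordinary least squares.
W_out, *_ = np.linalg.lstsq(X, y_seq, rcond=None)
pred = X @ W_out
```

Note that the expensive-looking recurrent part is frozen after random initialization; all learning is the single `lstsq` call at the end.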
The goal of this tutorial is to begin to understand how noise affects the performance of reservoir computers as a function of their size. Our analysis applies more generally to analog systems whose state is a continuous value, such as the neural activations of our reservoir. We aim to demonstrate that in this scenario, noise tends to substantially impair the kinds of computations a system is able to perform. We start with a simple notebook that introduces the model (echo state networks) and the computational task (NARMA10).
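As a preview of the task, NARMA10 is a tenth-order nonlinear autoregressive moving-average benchmark: the target at each step depends nonlinearly on the last ten outputs and inputs. A sketch of one common parameterization (conventions vary slightly across papers) is:

```python
import numpy as np

def narma10(T, seed=0):
    """Generate T steps of a NARMA10 input/output pair.

    Inputs u are i.i.d. uniform on [0, 0.5]; y follows a standard
    tenth-order recurrence (one common parameterization).
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9:t + 1].sum()  # 10-step memory
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y
```

The long memory and the multiplicative terms are what make NARMA10 a good stress test for a reservoir's temporal processing.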
Check out the next notebook here!
Acknowledgements
A special thanks to Alex Meiburg, André Melo and Eric Peterson for feedback on this post!