> we show that it is possible to learn classification tasks at near-competitive accuracy **without
> backpropagation**, by maximizing a surrogate of the mutual information between hidden representations and labels and
> simultaneously minimizing the mutual dependency between hidden representations and the inputs...
> the hidden units of a network trained in this way form useful representations. Specifically, fully competitive accuracy
> can be obtained by freezing the network trained without backpropagation and appending and training a one-layer
> network using conventional SGD to convert the representation to the desired format.

The training method uses an approximation of the [#information bottleneck](/tag/information_bottleneck_method).
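The note does not spell out which surrogate of mutual information is used; the sketch below is a minimal illustration assuming an HSIC-style (Hilbert-Schmidt Independence Criterion) kernel dependence measure as that surrogate, trained greedily layer by layer so that no gradient ever crosses a layer boundary. The layer sizes, kernel width `sigma`, and trade-off weight `beta` are illustrative placeholders, not values from the source.

```python
import torch
import torch.nn as nn

def gaussian_kernel(x, sigma=5.0):
    # Pairwise squared Euclidean distances -> RBF kernel matrix.
    x = x.flatten(1)
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=5.0):
    # Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with centering H = I - 1/n.
    n = x.shape[0]
    K, L = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    H = torch.eye(n, device=x.device) - 1.0 / n
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2

# Illustrative two-layer network; each layer owns its optimizer and is trained
# on a purely local objective (no cross-layer gradient flow).
layers = [nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
          nn.Sequential(nn.Linear(256, 256), nn.ReLU())]
opts = [torch.optim.Adam(l.parameters(), lr=1e-3) for l in layers]
beta = 100.0  # weight on label relevance vs. input compression (placeholder)

def local_step(x, y_onehot):
    """One step: per layer, minimize HSIC(Z, X) - beta * HSIC(Z, Y).

    x and y_onehot must be float tensors; y_onehot is a one-hot label encoding.
    """
    h = x
    for layer, opt in zip(layers, opts):
        h = layer(h.detach())  # detach blocks backpropagation into earlier layers
        loss = hsic(h, x) - beta * hsic(h, y_onehot)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return h.detach()
```

Because each local loss depends only on that layer's own output, the per-layer updates are independent once the forward activations are available, which is what enables the parallelism claimed below.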
> - The method facilitates parallel processing and requires significantly fewer operations.
> - It does not suffer from exploding or vanishing gradients.
> - It is biologically more plausible than backpropagation.
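For the readout step described in the quote, a sketch continuing the code above: the locally trained layers are frozen, and a single linear layer is trained on top with conventional SGD to map the learned representation to class logits.

```python
# Frozen feature extractor + one-layer readout trained with conventional SGD.
readout = nn.Linear(256, 10)                 # 10 classes, e.g. MNIST (illustrative)
opt = torch.optim.SGD(readout.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def readout_step(x, y):
    with torch.no_grad():                    # features are frozen: no gradients here
        h = x
        for layer in layers:
            h = layer(h)
    loss = criterion(readout(h), y)          # only the readout's weights are updated
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```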