This is what Friston so elegantly points out in the video. In machine learning, the machine is given a goal. Friston uses the example of a hungry owl ("find food, find a mouse"): the owl will then scan the visual field looking for a mouse. But machines don't work that way; a machine will find the optimal algorithm to minimize the hunger signal it's experiencing, never once engaging in active search.
Active search is a "capability", so whenever it's brought into play it has to be staged along the timeline. That's what the generative part of the frontal cortex does: it puts events into the timeline at T > 0. The basal ganglia are responsible for the subsequent tracking (again: caudate nucleus, whole-brain map; putamen, whole-brain map). The staging includes the expected sensory consequences.
So "minimizing error" involves updating either the goal or the environment, and sometimes updating the environment is impossible (politics being an example). We are sometimes left with "residual error", which may cause discomfort whenever it enters our consciousness.
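The two routes to minimizing error, and the residual that remains when the environment can't budge, can be caricatured in a few lines. This is a toy model of my own construction, not anyone's actual algorithm; the function name and numbers are hypothetical:

```python
def residual_error(goal, world, can_change_world):
    """Minimize |goal - world|: act on the world when possible;
    otherwise the mismatch persists as residual error (toy model)."""
    if can_change_world:
        world = goal              # action drives the environment to the goal
    return abs(goal - world)      # whatever mismatch remains

# Environment is changeable: error is driven to zero.
zero = residual_error(goal=5, world=2, can_change_world=True)
# Environment is fixed (politics, say): a residual remains.
stuck = residual_error(goal=5, world=2, can_change_world=False)
```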
"Residual error" is generative of emotions. When you can't do something, maybe you get frustrated; you say "God, this sucks" even while knowing full well it's your fault. Unresolved residual error can be deconstructed easily; it happens every day in places like AA. On the other hand, an inability to minimize residual error affects self-esteem. It can generate anger and fear.
According to this model, then, the capability to predict is necessary for perception to occur. There can be no perception without prediction. In the human visual system, the oculomotor tremor during fixation runs at about 100 Hz and covers about 0.004 degrees of arc in the visual field (about 0.24 arcminutes, which is considerably larger than the receptive field of a foveal cone). That means that no matter what the visual system predicts, it will always be wrong: by a very small amount, but still wrong. During a typical fixation of 1/3 second, at 100 Hz, the visual cortex will receive 33 slightly different images of the same scene. Gradient descent can thus occur 33 times, no more. In that window, the visual system has to model the image and generate the next prediction (for what it will see after the next eye movement).
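That iteration budget can be sketched in a few lines. This is a deliberately crude illustration, assuming one gradient step on the prediction error per retinal sample; all numbers and names here are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

scene = rng.normal(size=100)       # the scene (hypothetical stand-in for an image)
prediction = np.zeros(100)         # the system's current estimate
lr = 0.2                           # learning rate (arbitrary)

# One fixation of ~1/3 s at ~100 Hz retinal updates yields ~33 error signals,
# so the estimate can be refined only 33 times before the next saccade.
for step in range(33):
    sample = scene + rng.normal(scale=0.05, size=100)  # tremor-jittered input
    error = sample - prediction    # prediction error for this sample
    prediction += lr * error       # one gradient-descent step on squared error

residual = np.mean((prediction - scene) ** 2)
```

After 33 steps the estimate has nearly converged; the tremor noise guarantees the residual never reaches exactly zero, which is the point of the passage above.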
How the visual system does this is very clever. It breaks the 1/3-second window into three windows of 1/10 second each (at the alpha frequency) and optimizes each image three times; this way, any noise due to the microtremor cancels itself out. The last estimate goes into the hippocampus with a delay of about 200 msec, and there, if the free energy indicates surprise, you get a P300 about 100 msec later (just about enough time for a round trip through the frontal cortex: a context search to verify that the information really is surprising).
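A minimal sketch of the cancellation idea, assuming the three-window scheme amounts to averaging three independent estimates so the zero-mean tremor jitter washes out (the function name and the specific numbers are my assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

scene = 1.0                      # a single feature value (toy stand-in for the image)
tremor_sd = 0.3                  # jitter injected by the microtremor (arbitrary)

def window_estimate():
    # One ~100 ms alpha window: ~10 jittered samples at 100 Hz,
    # averaged into a single estimate of the scene.
    samples = scene + rng.normal(scale=tremor_sd, size=10)
    return samples.mean()

# Three successive windows within one 1/3 s fixation; their average is the
# final estimate that (per the text) is forwarded to the hippocampus.
estimates = [window_estimate() for _ in range(3)]
final = float(np.mean(estimates))
```

Averaging 30 jittered samples shrinks the noise standard deviation by a factor of sqrt(30), which is one plausible reading of "the noise cancels itself out".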
(btw I can give you references for all this stuff if you'd like)
Free energy is a unifying principle that ties physics and psychology together. It is accessible mathematically through statistical thermodynamics, as first shown by Shun-ichi Amari in 1978 (the godfather of information geometry; if life is fair he should be the next Nobel Prize winner). The brain is a physical device, and it obeys physical principles. It is agnostic to the data, except insofar as said data may create unresolved residual errors.
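For reference, the standard variational form of the free energy, in conventional notation (this particular decomposition is mine to spell out; the text only alludes to it): for observations $o$ and hidden states $s$,

```latex
F = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
  = \underbrace{D_{\mathrm{KL}}\!\left[\,q(s)\,\|\,p(s \mid o)\,\right]}_{\ge 0} \;-\; \ln p(o)
```

Since the KL term is non-negative, minimizing $F$ simultaneously fits the approximate posterior $q(s)$ to the true posterior and drives $F$ toward the surprise $-\ln p(o)$, which it upper-bounds.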
One of the specific predictions of the free energy model was the need for a closed control loop in the oculomotor system, first noticed by David Robinson in 1981 (although he was unable to prove it at the time, because the free energy principle didn't exist yet; the predictive coding model behind it was introduced in 1999 by Rajesh Rao and Dana Ballard at the Salk Institute, working with Francis Crick of DNA fame and Terry Sejnowski, who built NetTalk, the first artificial neural network that learned to read aloud all by itself). The closed loop in the oculomotor system has now been found: it starts in the palisade endings of the ocular muscles, and the reason it wasn't found before is that ocular proprioceptors enter through the trigeminal nerve, not the oculomotor nerves. No one yet knows exactly how this system works, but we can see its effects. And no one knows "why 100 Hz" either: whether this is an oscillation in the motor neurons themselves or a property of the network. But we know for sure the closed loop is there, and this year we'll find out how it develops.
Here's an easy way to modify your perception: go to Disneyland and put your hands on the Electric Genie on Main Street. Note carefully what you perceive while you're getting 60 Hz AC. (You'll perceive stuff, for sure.) Human beings have no electric sense; we don't have lateral line organs. But you're sure as hell going to perceive that electricity. You have my word on it.
