A new study reveals the precise moment the brain detects gaze direction, enhancing our understanding of social interactions and disorders like autism and Alzheimer's.
neurosciencenews.com
Why?
Because head position goes directly to the primary somatosensory cortex (S1), whereas eye position passes through multiple pre-cortical areas before it finally arrives in the temporal lobe.
This information dovetails perfectly with the hypothesis I presented in the other thread.
Our consciousness is "slightly ahead of" real time. That's why we're "aware".
"Vergence" eye movements are related to depth, they are "disjunctive" which means the eyes move in opposite directions. As distinct from saccades, where both eyes move in the same direction.
In the hippocampus there are "place cells" and "grid cells", which together assemble a 3-D map of the external world, relative to the organism.
Here's the important part: that map is taken apart again when we move our eyes. Here's how we know: vergence movements interrupt saccades. They're handled by a different part of the brain.
Here's a graph of what an eye movement looks like, when it involves both saccades and vergence:
The graphs on top clearly show the vergence movement that takes place "in the middle of" a saccade.
This is what the eye movement system looks like in the human brain. The sensory side is on the right, the motor side is on the left. The two sides converge on the square box in the middle labeled SC, which stands for "superior colliculus". The SC is part of the midbrain, near the cerebellum.
The SC is "retinotopic", it tells the eyes where to move by targeting areas in the visual field. Below the SC, there are areas in the brainstem that map target position into the activity of the oculomotor muscles. Here's what that looks like (the right side of the diagram):
Returning to the top picture, we know what most of these brain areas do. FEF stands for the frontal eye fields, area 8 in Brodmann's brain map. But notice the box right next to it that no one ever talks about (because no one knows what it does or how it works): the box labeled DLPFC/SEF, which stands for the dorsolateral prefrontal cortex and the supplementary eye fields. Here's a bit about that:
Look in the section that says "Function". Note especially the information about the "delayed response task".
"Patients with minor DLPFC damage display disinterest in their surroundings and are deprived of spontaneity in language as well as behavior."
The DLPFC is generative. It tells us what to pay attention to, and it makes us curious and clever. And it organizes the CONTEXT for the spatial maps that are presented to the hippocampus.
Here's an example. You just took your dog for a walk, and now you're putting him in the back seat of the car. The dog's spatial map changes, it goes from the in-the-park map to the in-the-car map. Instead of sniffing trees, the dog is now looking out the open car window. "Objects" have a different context in the new map. If you show the dog a stick in the park, it becomes excited because it thinks you're going to play catch. If you show the dog a stick in the car, it becomes confused and frightened because it thinks it's about to be punished. The dog knows it can't play catch in the car. This is what "comparing two objects" means. Stick + park = good, stick + car = bad.
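You can write the whole dog story in a few lines of Python. The table values are made up by me, of course; the point is just that "meaning" is a lookup on the (object, context) pair, not on the object alone:

```python
# Toy context-dependent appraisal: the same object means different
# things depending on which spatial map is currently active.
value = {
    ("stick", "park"): "good",  # time to play catch
    ("stick", "car"):  "bad",   # looks like punishment
}

def appraise(obj, context):
    return value.get((obj, context), "unknown")

print(appraise("stick", "park"))  # good
print(appraise("stick", "car"))   # bad
```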
So - the relevant part of this, for consciousness and for the egocentric reference frame, is something called "efference copy". It works like this:
Let's say you're looking at a visual scene, and you become interested in one of its features. So, your eyes move to the feature. When that happens, the sensory brain (visual cortex) gets a new view of the scene. It's the same scene, nothing has changed in the external world, but now the retinal image has slightly shifted from where it was.
It turns out that in this case, the visual cortex already knows what to expect before the new view of the scene arrives. The same frontal eye field neurons that command the eyes to move send an advance copy of the command to the visual cortex. This is the "efference copy", and it allows the visual system to selectively respond to the features of interest. (This is "selective attention" in action).
There's all kinds of very complicated evidence for how and why this works, and some thick models. But the part I'd like to draw your attention to is the "expectation template". The FEF is telling the visual cortex what to expect.
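In code terms, a minimal sketch looks like this. The 1-D "scene" and the function names are mine; the point is just that one copy of the motor command drives the eyes while the other copy primes the sensory side:

```python
import numpy as np

scene = np.array([0, 0, 7, 0, 3, 0])  # a toy 1-D visual world

def saccade(eye_pos, command):
    """Motor branch: move the eyes, then the new retinal sample arrives."""
    new_pos = eye_pos + command
    return scene[new_pos], new_pos

def efference_copy(eye_pos, command):
    """Sensory branch: the same command, sent ahead of the movement,
    predicts the sample from the (unchanged) stored model of the scene."""
    return scene[eye_pos + command]

eye, cmd = 2, 2                      # FEF says: move 2 positions right
expected = efference_copy(eye, cmd)  # the prediction arrives first
actual, eye = saccade(eye, cmd)      # then the real view arrives
assert expected == actual            # world unchanged, so they match
```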
So now, if you map this onto a timeline, you'll notice this is a continuous behavior. There are microsaccades a few times every second, so our frontal brain is constantly telling the sensory systems what to expect. And whereas the retina takes in about a gigabit per second, it only actually uses a few bits of that information to map the salient features of the visual scene.
You can consider this mechanism, for example, in reading. As our eyes move across the page, we foveate the current word of interest, but our peripheral retina is seeing the "next" word and telling the FEF where to move so it can be looked at. Then the FEF issues the command to move the eyes, and at the same time it tells the visual system "get ready, I'm moving the eyes to the word CAT", so when the eyes land there, the visual system already knows to expect the letters C, A, and T in order. This is why we can read so fast, word to word instead of letter to letter.
When the visual system expects CAT and it gets DOG instead, it generates a P300 brain wave, which is a whole-brain reset, because something has gone drastically wrong with the model. In this case, the microsaccades stop, and the visual cortex stops being selective, and the eyes start moving "around" the area of interest to establish a new reference frame. Once they do, the microsaccades begin again.
In terms of consciousness, reading comprehension is "uh huh uh huh", but the P300 is "whaaaat???". There is meta-activity that deals with the timeline in different ways. When a P300 occurs, it's the same as changing the spatial map, it's the same as moving from the park to the car. Out with the old context, in with the new.
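Here's the cartoon version of that loop (the exact-match test and the strings are mine, obviously; a real mismatch response is graded, and takes about 300 ms, hence the name P300):

```python
def read_word(expected, actual):
    """Toy expectation check: a match keeps the rhythm going ("uh huh"),
    a mismatch triggers a global reset ("whaaaat???")."""
    if expected == actual:
        return "uh huh"  # template confirmed, carry on to the next word
    # P300-style reset: stop microsaccades, drop selectivity, rescan,
    # and rebuild the reference frame before reading resumes.
    return "P300 reset: rebuild the reference frame"

print(read_word("CAT", "CAT"))  # uh huh
print(read_word("CAT", "DOG"))  # P300 reset
```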
People with damage to the DLPFC can't do this. It takes them a LONG time to establish a new context, or recover an existing one. Not only is the meta processing missing, but the expectation templates are missing as well. Reading is "different" in these people: it's slow and laborious, and they don't get interested in what they're reading. The eyes work perfectly fine, and the eye movements "seem" normal, until they engage in a delayed response task (which is what reading is: you don't get the meaning till you reach the end of the sentence, and meanwhile you have to hold the preceding words in memory).
To follow up - for those of you interested in AI - you probably already know about the TDRNN, which stands for "time-delay recurrent neural network".
These are most often used for stochastic control problems. They're very popular in economics; for example, they can tell us which index's volatility has the greatest impact on a stock.
But here's the new and clever piece - the time delays can be mapped with Information Geometry.
Here is what a TDRNN looks like:
Notice the tiny boxes on the right that say "delay box". With an appropriate network construction (like the one discussed in this thread), we can map the time delays into what basically amounts to a shift register, like this:
So you can envision the signal going from one neuron, to the next, to the next - with a small delay at each step.
However, the same thing can be accomplished with the "delay boxes", if each box has a slightly different delay.
Let's say we have a collection of increasing stochastic delays. It turns out the network structure is formally identical to an array of Gaussians with increasing mean: the cumulative delay after k stages is a sum of k independent random delays, and by the central limit theorem that sum becomes approximately Gaussian, with a mean (and variance) that grows at every stage.
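You can check that claim numerically in a few lines (the exponential delay distribution and the stage count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_stages = 100_000, 10

# Each delay box adds an independent random delay (mean 1.0 here).
step_delays = rng.exponential(scale=1.0, size=(n_trials, n_stages))
arrival = np.cumsum(step_delays, axis=1)  # cumulative delay per stage

# The arrival-time distribution at stage k has mean ~k and std ~sqrt(k):
# an array of (approximately) Gaussian bumps with increasing mean.
for k in (2, 5, 10):
    a = arrival[:, k - 1]
    print(f"stage {k:2d}: mean={a.mean():.2f}, std={a.std():.2f}")
```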
To understand how this looks, we invoke information geometry. A gentle introduction to information geometry is on the slides in this link:
Instead of mapping a Gaussian reference frame, we use time delays, so we are basically "shifting" each curve by the amount of the delay. We can do this in two dimensions, so we have the Gaussian on one axis and the delay on the other.
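Written out (my notation, not the slides'), that two-parameter family is just a grid of shifted bumps:

```latex
p_{i}(t) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma_i}\,
  \exp\!\left(-\frac{(t - d_i)^2}{2\sigma_i^2}\right),
\qquad d_1 < d_2 < \cdots
% one axis indexes the delay (the mean shift d_i),
% the other the Gaussian itself (the width sigma_i)
```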
When we compactify this structure, we get a three-dimensional Hawaiian earring, two dimensions of which are an actual earring, like this:
The third dimension is our shift, which is very hard to visualize, but a slice of it looks something like this, only instead of a cone we have a Gaussian.
So now, here's why this matters. Here is the world's simplest compactified neural network; it's called a "monosynaptic reflex arc".
The question is: where is the origin? What defines "now"? We can rotate this arc so "now" can be anywhere, but we DEFINE "now" to be the sensorimotor interface. In other words, it's in the muscle spindle, between the sensory fiber and the contractile element. Here's what that looks like:
This way, the "now" that lives in the motor neuron, "covers" the activity in the muscle. And every "now" that's above it in the more central reflex loops, also covers it - the only difference is that the more central "now" has a greater delay. So we end up with a map of delays, which when they're aligned "cover" real time just like a set of Lebesgue pancakes. Therefore any action generated centrally that rolls out through the network, can be mapped and tracked using information geometry. We end up with a time to space mapping, a snapshot of which is the rollout.
This is how the brain learns its own internal activity. For instance, the egocentric reference frame requires an alignment of all 5 senses in both space and time. Well, this is how it's done. The entire snapshot in 3 dimensions can be phase-encoded into the firing pattern of a single neuron, which can then be autoencoded (self-organized) back into the specific locations and behaviors of each sensory and motor modality.
This is how the frontal lobe encodes "context" for a cognitive map (in the hippocampus or anywhere else). The whole structure basically becomes a giant associative memory, which can be realized using LSTMs, TTFS (time-to-first-spike) coding, and whatever other learning methods are available.
The topology of the earring structure is non-trivial. Its fundamental group is famously wild (uncountable, and not free). In discrete form it has a fractal structure, while in continuous form it becomes a Hilbert space.
What is this good for? Consider a "delayed grasping task". The robot is presented with an object at a random location, it then has to wait 5 seconds and subsequently "grasp" the object, and rotate it to look at its back side (where it will find a reward). In this example we have two sensory modalities (vision and touch), and two motor modalities (eye movement and arm movement). The arm must move the hand to the location where the eye tells the brain the object is, and then the hand must grasp the object according to its perceived size and shape. This is a pretty complicated task, if you map the sequence onto a timeline there's a lot going on. However the entire task sequence can be mapped into a single "image" in earring space, and its rollout then becomes homologous to a moving visual stimulus.
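To make "a single image in earring space" concrete, here's the laziest possible version: a modality-by-delay grid. The bin size and event placements are invented; the rollout just replays the columns in order, which is what makes it look like a moving stimulus:

```python
import numpy as np

modalities = ["vision", "touch", "eye_move", "arm_move"]
n_bins = 10  # delay bins spanning the whole task

img = np.zeros((len(modalities), n_bins))
img[0, 0] = 1  # vision: object appears at a random location
img[2, 5] = 1  # eye movement: fixate the object after the 5 s wait
img[3, 6] = 1  # arm movement: reach to where the eye says the object is
img[1, 7] = 1  # touch: grasp according to perceived size and shape
img[3, 8] = 1  # arm movement: rotate the object
img[0, 9] = 1  # vision: inspect the back side (find the reward)

# The rollout replays the columns in order: time flow as a moving image.
for t in range(n_bins):
    print(t, [m for m, row in zip(modalities, img) if row[t]])
```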
The moral of the story? Time flow through the earring space is the same as optic flow through the visual system. Same exact thing. Therefore it makes sense that there should be columns in the earring cortex that map direction and velocity just like the visual system. Nature reuses the designs that work.
Previous quantum simulations of higher-order topological phases have utilized synthetic dimensions. Here the authors simulate high-order topological phases on lattices with spatial dimension up to four on a superconducting quantum computer using a mapping to a many-body Hamiltonian in a reduced...
Here is one of my favorite spaces: The earring space, i.e. the “shrinking wedge of circles.” This space is the first step into the world of “wild” topological spaces. This p…
wildtopology.com
It's a lot easier with information geometry.
Draw a line along the X axis, and map it into the space of Gaussians.
What you get is basically a set of Kalman filters with increasing time constants.
This maps directly to Granger causality. In economics, when looking at market volatility, you get a set of time series with increasing window size.
In our brains, this is called "adaptive planning". If you're a catcher in a baseball game, this is how your mitt moves to capture a curve ball.
If nothing changes, all your estimates agree. If something changes, the ones on the left diverge from the ones on the right.
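Here's the whole idea in a few lines of Python. I'm using simple exponential filters as stand-ins for the Kalman filters, and the time constants and step signal are made up:

```python
import numpy as np

def filter_bank(signal, taus):
    """Bank of first-order exponential filters with increasing time
    constants: a poor man's set of Kalman filters / Granger windows."""
    state = np.full(len(taus), float(signal[0]))
    out = np.empty((len(taus), len(signal)))
    for t, x in enumerate(signal):
        for i, tau in enumerate(taus):
            state[i] += (x - state[i]) / tau  # leaky update, rate 1/tau
            out[i, t] = state[i]
    return out

taus = [2, 4, 8, 16, 32]  # increasing time constants / window sizes
signal = np.concatenate([np.ones(100), 3 * np.ones(100)])  # step at t=100
est = filter_bank(signal, taus)

# While nothing changes, all the estimates agree (spread ~ 0). After the
# step, the fast filters diverge from the slow ones: that's the alarm.
spread = est.max(axis=0) - est.min(axis=0)
print("spread just before the change:", spread[95:100].round(3))
print("spread just after  the change:", spread[100:105].round(3))
```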