why the feedback?

scruffy

The machine learning types haven't figured this out yet.

They try to build brain-like machines, but they're too stupid to build them like the brain.

Today's example is feedback. The ML crowd's best take on this is "recurrent neural networks", which just amounts to sequence learning.

But the human brain is a lot more clever. It uses feedback connections for a higher purpose, and they don't all have to be in the same layer.

A quick look at the wiring diagram of the visual cortex provides a clue.

[Figure: wiring diagram of the primary visual cortex, showing thalamic input arriving in layer 4 and layer 6 output feeding back to the thalamus]

The outputs from layer 6 feed back to the thalamus, whose projections to the cortex arrive in layer 4. Why is that? It's not "recurrent", because there are three other layers between the input and the feedback.

Here's the answer. The feedback tracks which features make up an object.

Here's an example. Let's say you have a house in your visual scene on the retina. First you get pixel-level intensity, contrast, and color. Then you get lines of varying orientations and lengths, and "edges": corners and grids (like windows and doors). Eventually there will be a neuron in your convolutional network that says "aha! that's a house". All of this happens through feedforward connections only.
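
As a rough illustration (my own sketch, not a model of V1 or of any published network), the feedforward-only part of this story looks like an ordinary convolutional stack - pixels in, a single "house" score out, no feedback anywhere. Layer sizes and names here are illustrative assumptions:

```python
# A purely feedforward chain: pixels -> edges -> parts -> one "house" unit.
import torch
import torch.nn as nn

feedforward = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # pixel-level intensity/contrast/color
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # oriented lines and edges
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # corners and grids (windows, doors)
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 1),                             # the "aha! that's a house" unit
)

image = torch.randn(1, 3, 128, 128)               # stand-in for the retinal image
house_score = feedforward(image)                  # one bottom-up pass, no feedback
```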

But now, you have eye movements. Your gaze is still on the house, but your eyes focus on various details using micro-saccades, as they move from one window to another, then to the door, then maybe to the front lawn - you're "studying" the scene that's in front of you. But the house is still a house - it's just that its features have changed position. As your eye moves, the door is now where the window used to be.

So, as a good computer scientist, are you going to recalculate "house" every time your focus shifts by a few degrees? No! What you're going to do is leave "house" running as long as you're looking at the house. All you need to know is that "this" door and "these" windows constitute the house - and as long as they're in view, you're still looking at the "house".

But OTHER parts of your brain need the precise feature locations, for example for targeting. Your cognitive brain, on the other hand, doesn't need them; it just needs to know "house", so it can do logic. (Like "hm, I wonder if it has a pool", or "gee, that's a lovely house, I wonder who lives there".)

So what the feedback connections do is PERSIST "house" while letting the exact feature locations keep updating. This function cannot be performed with memory alone, because in that case the first feature locations would persist and the updated ones would never be processed.

The feedback connections in the human brain say "this" object consists of "that" set of features, and then track the features as they move around. The process only stops when the object disappears from view.
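
Here's a toy version of that latch-and-update behavior in code - purely my own illustration, with made-up feature names and coordinates, and no claim about how the cortex actually implements it:

```python
# Object identity persists across passes; the bound feature locations are
# overwritten every time. Everything here is invented for the example.

class ObjectLatch:
    def __init__(self):
        self.identity = None    # e.g. "house" - persists while the object stays in view
        self.features = {}      # e.g. {"door": (x, y), ...} - re-bound on every pass

    def update(self, identity, features):
        if identity is None:                # object left the visual field: release the latch
            self.identity, self.features = None, {}
        elif self.identity is None:         # first feedforward recognition: latch it
            self.identity, self.features = identity, dict(features)
        elif identity == self.identity:     # same object: track the moved features
            self.features.update(features)

latch = ObjectLatch()
latch.update("house", {"door": (10, 4), "window": (2, 4)})
latch.update("house", {"door": (12, 5), "window": (4, 5)})   # after a micro-saccade
print(latch.identity, latch.features)   # "house" persists; feature locations refreshed
```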

So in the above circuit diagram of V1, the feedback tracks which retinal receptors make up each line segment in the visual field. The line segments themselves may flutter around a little with micro-saccades, but by and large their relative positions and lengths and angles remain the same. You need TOP-DOWN processing to track all this. That's what the "centrifugal" feedback pathways are for.
 
This is exactly why there is a need for larger data centers, so that the GPUs can pass along updating information while the main process runs a fixed concept of a "house".

This is also why those who have some irrational fear of AI don't concern Me. AI is a very, very, very long way away from being Skynet.
 

the amazing world of compositional mapping

What you say is entirely true, Darkwind. Hopefully photonics will help with some of the computational and power consumption issues.

So WHY, in the visual system, is there a separation of "what" from "where"?

In previous threads I've shown where this happens in the brain, and alternately called these two systems the "what" pathway and the "where" pathway, or the ventral and dorsal streams respectively.

Why is there a separation? It seems counter-intuitive, because when navigating we want to know what's where - like if we're trying to find the candy (reward) in a house (maze), we usually end up with something like "in the bedroom, on the top shelf of the bookcase, behind some books".

The answer is: compositional mapping, and it has everything to do with the feedback mechanism we're talking about.

The answer comes in three parts:

1. primitives
2. composition of primitives
3. memorization and recall of compositions

Simply put, a composition is a combination of "what" and "where". I already laid out the "what" pathway extensively in another thread. The ventral stream begins in the retina and ends in the inferior temporal cortex (side of the head, behind the ear); its path is eye => LGN => V1 => V2 => V4 => IT, with perhaps a few other way stations depending on species.

I'd like to say a little about the "where" pathway, which splits off at the level of V2 and ends in the posterior parietal cortex (top of the head in back, at approximately the plane of the back of the ear). Interestingly enough, the "where" primitives originate in the eye movement system, where saccadic targets are encoded by location, relative to the current position/angle of the fovea. So for example an eye movement command might be "go 30 degrees to the left and 10 degrees up".

We already talked about primitives in the "what" system; they include things like the orientation of edges and the angles at corners. Now let's talk about primitives in the "where" system. The first thing to realize is that eye-movement coordinates are egocentric: they don't tell us the distance between objects in 3D space, rather they tell us the distance from US to the objects, and the angles relative to where WE are facing. But the primitives in the two coordinate systems are the same - they describe relative locations, for example "to the right of", "to the left of", "in front of", and "behind". These primitives can be used to translate from one coordinate system to the other (egocentric to allocentric, and vice versa).
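
For concreteness, here's a minimal sketch of that egocentric-to-allocentric translation, flat 2D only. The geometry is standard trigonometry; the function name and conventions are my own assumptions, not anything taken from the brain:

```python
# Turn an egocentric target ("30 degrees to the left, 5 meters out") into an
# allocentric map position, given where the observer is and which way they face.
import math

def egocentric_to_allocentric(observer_xy, heading_deg, azimuth_deg, distance):
    """azimuth_deg is measured relative to the observer's heading (positive = left)."""
    world_angle = math.radians(heading_deg + azimuth_deg)
    return (observer_xy[0] + distance * math.cos(world_angle),
            observer_xy[1] + distance * math.sin(world_angle))

# Observer at the origin facing east (0 degrees), target 30 degrees to the left:
print(egocentric_to_allocentric((0.0, 0.0), 0.0, 30.0, 5.0))   # ~(4.33, 2.5)
```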

But more importantly, they construct a map of "where" the objects are in visual space, without telling us "what" the objects are. For instance, there's a set of corners and edges 6" to the left of the door, about midway between top and bottom - without telling us it's a light switch. Let's cut to the chase and I'll tell you that the information from the what and where pathways is combined compositionally in the hippocampal formation, a highly conserved part of the brain's limbic system that hasn't changed much since goldfish.

Here is a paper that describes the compositional process in detail, and what it's good for.


The idea is that relationships between objects and locations can be re-used if they're encoded in compositional form, making memory capacity and speed much more efficient. Moreover, compositional relationships imply actions - "the reward is south of the wall, therefore move to the wall, then move south".
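
A throwaway example of what "relationships imply actions" could look like - the relation names, offsets, and the two-step planner are hypothetical, just to show that the relation, not the particular room, carries the policy:

```python
# A compositional relation like "reward is south_of wall_3" doubles as a plan.

OFFSETS = {"north_of": (0, 1), "south_of": (0, -1),
           "east_of": (1, 0), "west_of": (-1, 0)}

def plan(relation, landmark_xy):
    """Turn 'reward is <relation> <landmark>' into movement steps."""
    dx, dy = OFFSETS[relation]
    return [("move_to", landmark_xy), ("step", (dx, dy))]

print(plan("south_of", (4, 7)))   # [('move_to', (4, 7)), ('step', (0, -1))]
```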

So let's say the "where" pathway tracks the configurations of objects in space (using the feedback mechanism in the OP to latch and track small variations), and the "what" pathway does the exact same thing to each local object, and both use a library of geographic primitives defined relative to anatomy. Because the neural pathways are self organizing, the variations between individuals don't matter. At the input to the hippocampus, all humans will know there's an object 10 degrees to the left and 10 degrees down ("where"), and that object is a chair ("what").

What the hippocampus does is put together a map of where you are relative to walls, salient objects, and rewards/goals. And it stores this map (as episodic memory) in the form of vectors describing relationships. The relationships can then be re-used, because "any" room is a combination of walls, obstacles, rewards, and light switches. The hippocampus stores compositional maps in the lateral frontal cortex, from where they can be recalled in real time as they are needed.

So basically this very clever system combines the functions of spatial mapping, episodic memory, navigation, and experience-based reasoning into one small set of neurons. The memory function is implemented as a self-organizing map in a recurrent neural network in area CA3 of the hippocampus. The compositional relationships are defined in the entorhinal cortex immediately surrounding the hippocampus (which receives input from all sensory cortical areas) and passed into the hippocampus via the perforant path. The compositional maps are then passed from hippocampal area CA1 to the frontal lobe for storage and recall (re-use).
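
CA3's recurrent collaterals are often abstracted in computational models as an autoassociative (Hopfield-style) memory that completes a stored pattern from a partial cue. Here's a bare-bones version of that abstraction - a common modeling convention, not the specific circuit described above, and the "composition" vector is invented for the example:

```python
# Store a composition as a +/-1 vector; recall it from a degraded cue.
import numpy as np

def train(patterns):                      # patterns: rows of +/-1
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]              # Hebbian outer-product learning rule
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, cue, steps=10):             # complete a partial or noisy composition
    s = np.asarray(cue, dtype=float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

composition = np.array([1, -1, 1, 1, -1, -1, 1, -1])   # a stored (what, where) binding
W = train([composition])
cue = composition.astype(float); cue[:2] *= -1         # degraded cue (two bits flipped)
print(recall(W, cue))                                  # completes back to the stored pattern
```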

The hippocampal system looks approximately like this:

[Figures: simplified hippocampal circuit diagrams, labeled with input, dentate gyrus (DG), place cells, target decision cells, and output]

"Input" is entorhinal cortex, DG is dentate gyrus, "place cells" are CA3, "target decision cells" are CA1, and "output" is the lateral frontal lobe.

The overall map is a compactified projection mapping that hovers over T=0 in the brain's evoked potential timeline, on the opposite side of the Riemann sphere. (Its center is "apposed" to T=0). For this reason it forms an "analog" of real perceptual space. The analog space is processed mainly in the cingulate cortex, which is a subject for another thread. Here, we can mainly see how relational reasoning is introduced into the timeline. The hippocampus first generates "suggestions" for movement, then later receives the "results" of those movements. As such, it is able to map which movements generated which outcomes.

The only missing piece of this puzzle is the "attention" mechanism, which we briefly discussed in relation to the pulvinar, although it's a lot more comprehensive and thorough than that. But you can already see that this architecture is vastly more efficient than a deep learning transformer. It requires fewer convolutional layers (four or six, as distinct from 58 or 68 in the generative AI models derived from AlexNet), and fewer calculations, and it stores sequences and relationships far more efficiently.

In fact this architecture has not yet been attempted in a large learning model. When it is, we expect it will finally make one-shot learning a reality in the AI/ML world. The only downside is that our machines will have to "sleep" - that is, they'll have to kick off long-term memory consolidation as a background task, to be performed during idle moments.
 
I'm heading to bed so I'll read this later, but damn fine answer - what I've read of it so far, anyway.

So, My main question is, "When will you overthrow us meddling Humans?" :)
 
Another conceited post by the self appointed scientist in chief of US Message Board.

Machine Learning does not seek to emulate the brain; therefore your entire diatribe is misplaced, based on false premises, as so many of your tiresome posts are.

Furthermore, ML doesn't even claim to represent Artificial Intelligence in the classic sense.

Given that AI is booming just now, the fact that you are not involved and have nothing tangible to offer proves you are just a legend in your own lunchtime.
 