alignment of sensory fields

scruffy

Humans can do something robots can't (yet). We can look at a target, close our eyes, and reach for the target and grasp it without opening our eyes.

It sounds simple, doesn't it? But it's not! Let's look at an example.

First of all, vision and reaching occur in different coordinate systems. Vision is Cartesian (after a few calculations): every object we see has x, y, and z coordinates. But reaching (and every other form of somatic motor activity) is vectorized; the movement is independent of its starting coordinates.

It's just like computer graphics, where bitmaps and JPEGs are Cartesian, whereas SVGs and PostScript files are vectorized. So computer graphics is a good way to get a clue about how to translate from one representation to the other.
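To make the distinction concrete, here's a tiny Python sketch (all coordinates invented): a Cartesian representation names an absolute target, while a vectorized one encodes a displacement - so the reach vector has to change depending on where you start.

```python
# A Cartesian (raster-style) target is an absolute coordinate.
# A vectorized (SVG/PostScript-style) command is a displacement.

def vector_to(target, start):
    """The reach vector depends on the start; the target itself doesn't."""
    return (target[0] - start[0], target[1] - start[1])

target = (120, 80)
print(vector_to(target, (0, 0)))    # (120, 80)
print(vector_to(target, (90, 90)))  # (30, -10)
```

The translation problem the brain solves is essentially this subtraction, done between whole maps rather than between number pairs.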

Let's consider our five senses. Only one of them is truly three-dimensional: hearing. Vision has no "rear"; it only happens in front of us. And even though our bodies have sensory receptors in back, the extent of the body map is strictly limited by body space (we can't feel beyond the boundaries of our skin).

The simple example we will consider comes from an area of the human brain called the "superior colliculus". It's in the midbrain, approximately at the same level as the cerebellum. It supports a reflex, specifically an eye-movement reflex: if someone sticks you with a pin, your eyes are directed to look at the spot where it happened. This is a true reflex; it persists even if the cerebrum is removed entirely.

Here is the question: there is no VISUAL stimulus in this scenario. So how do the eyes know where to move?

Obviously, there must be some alignment of the body map with the visual field. The location "right index finger" is translated by the superior colliculus into something like "20 degrees down and 5 degrees to the right", which causes your eyes to move to that location.
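In the crudest possible terms, the collicular translation acts like a lookup from body site to gaze displacement. A hypothetical Python sketch (the sites and angles here are invented illustrations, not measured values):

```python
# Hypothetical body-site -> gaze-displacement table, in degrees.
# Convention: (horizontal, vertical), positive = right/up. Values invented.
body_to_gaze = {
    "right index finger": (5.0, -20.0),   # 5 deg right, 20 deg down
    "left big toe":       (-10.0, -60.0),
    "forehead":           (0.0, 15.0),
}

def saccade_command(body_site):
    """Translate a touched body location into an eye-movement vector."""
    return body_to_gaze[body_site]

print(saccade_command("right index finger"))  # (5.0, -20.0)
```

The real circuit is of course a continuous topographic map, not a table, but the input/output relationship is the same.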

How do these maps align? What is the algorithm by which alignment occurs?

First of all, the superior colliculus (SC) gets input from the retina and the visual cortex. This is a topographic mapping. If I flash a light in front of you, your eyes will move to exactly the coordinates of the flash. This part is fairly intuitive and easy to understand: wherever the light appears is the target for your saccade.

And we kinda-sorta understand how the visual map is registered with the eye movement map. But the SC is a layered structure, and in addition to topographic input from the retina and visual cortex, there is also a layer that receives input from the somatic sensory system. It is this latter layer we are concerned with.

Now it turns out that in genetically engineered mice with a duplicated mapping from the retina to the SC, the axons from V1 split as they grow, forming two entire mappings of the visual field. This occurs very early in development, as soon as the mice can see.


This indicates the alignment is "activity dependent", in other words the active cortical axons are trying to find the active retinal signal - and if the retinal signal exists in two places, the axons will split and connect with both.

The body layer, however, is different. There is no coactivation in the body layer. Instead, a chemical gradient lines up the maps: the body map and visual map both generate gradients of ephrin-A, and the axons align themselves along the gradients.
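The gradient idea can be sketched in a few lines: give every axon and every target position an ephrin-A level, and let each axon wire to the position whose level best matches its own. No coactivity is needed, and topographic order falls out automatically. (The gradients here are idealized straight lines; real gradients are messier.)

```python
import numpy as np

# Idealized ephrin-A gradients across two maps (arbitrary units).
source_gradient = np.linspace(0.0, 1.0, 5)    # 5 source positions
target_gradient = np.linspace(0.0, 1.0, 50)   # 50 target positions

def wire_by_gradient(source, target):
    """Each source axon connects to the target index with the closest ephrin level."""
    return [int(np.argmin(np.abs(target - s))) for s in source]

wiring = wire_by_gradient(source_gradient, target_gradient)
print(wiring)  # monotonically increasing: topographic order is preserved
```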


The stimulus seeking in the photic reflex is a primitive form of vectorization. It says "move to this location" independently of the starting coordinates of the eyes. In that way, it is much like reaching for an occluded target, which tells the body to "reach here" independently of posture or current body location.

What is particularly fascinating about the alignment of the body map to the visual map is that it occurs from the somatic sensory cortex, not the motor cortex.

We know what the sensory map looks like; it has no relationship whatsoever to the visual field. This is what it looks like:

[image: the sensory homunculus]
 
The human body is G-d's greatest invention. Within it, the brain is the most impressive part.

It's akin to exploring the bottom of the ocean: so much is mysterious and unknown. What is known is fascinating.
 
I'm not sure you are aware of how computer vision works. The computer can easily determine the distance to the target, as well as the vector to it based on the computer's location. With that data, it would be no different from a programmer's input.

This simple processing program does what you claim is impossible.

import cv2
import math

def detect_target(frame):
    # Simple color threshold or object detection.
    # Should return the (x, y) pixel coordinates of the detected object.
    pass

def calculate_distance(angle1, angle2, pos1, pos2):
    # Simple trig to estimate distance from two angular sightings.
    # (pos1/pos2 are unused in this rough estimate.)
    delta_angle = math.radians(abs(angle2 - angle1))
    baseline = 10  # cm between the two sighting positions (depends on your setup)
    distance = baseline / math.tan(delta_angle)
    return distance

# Main loop
angles = []
positions = []
for angle in range(0, 180, 10):
    rotate_camera_to(angle)   # hardware-specific helper
    frame = capture_frame()   # hardware-specific helper
    pos = detect_target(frame)
    if pos:
        angles.append(angle)
        positions.append(pos)
        if len(angles) == 2:
            dist = calculate_distance(angles[0], angles[1], positions[0], positions[1])
            print(f"Target at {dist:.2f} cm, angle {angles[0]}°")
            break
 
This simple processing program does what you claim is impossible.

Very good, Grasshopper. You're a good programmer.

But... what did I say was impossible?

Look carefully at the sensory homunculus. With ONE exception it is linear, from bottom to top of the body.

Therefore, if I stick a pin in your big toe, your eyes will be commanded to move farther "down". Like the citation says, there's a gradient. It's a weird-looking one, though, compared to the visual field.

Because in our brains, "down" is relative to body position. If you're standing up, down is indeed "down". But if you're reclining on the couch with your legs crossed so one leg is kinda in the air, then "down" is kinda straight ahead of you. "Downer" is farther away, along the depth axis; your gaze remains approximately parallel to the couch - and most likely slightly off to one side, the side where your big toe is pointing (the one that's in the air). Unless you're watching TV. :p

Relative to the visual field, the axes of the mapping from your body have changed. That's where the vestibular system comes in, which is also connected into this pathway we're talking about. You mentioned computer graphics - if you play with Maya or Blender or something, this is a type of "perspective transformation", right?

But it's more than that. Let's continue with the graphics analogy, and simplify to three senses (vision, hearing, touch). You have three "point clouds", with principal components (or axes) that aren't entirely fixed (they're a little noisy and uncertain). Your job is to align the axes of the point clouds so that a transformation (a vector, matrix, or tensor) can be extracted that maps each cloud into its final position. In other words: what were the steps that converted each cloud into its final position? Those same steps have to be applied whenever we want to locate an object.

This is actually a "hard" problem, computationally, because you have to optimize around the inverses. But neural networks can come up with a ballpark solution in milliseconds. It's one of the things they're really good at, searching a feature space. Your neural network can snapshot a map, and then keep the search space within a radius of those features until told to do otherwise.

Viewed this way, the circuitry is both a data system and a control system. I didn't say it was impossible, I said it was beautiful and clever. In each case the result is a single transformation matrix for each sense. The added benefit is that you get to choose which sense directs the others: do I want to draw your eyes to the pin prick, or to some important event in auditory space, or maybe to some prey you happened to spot out of the corner of your eye? To get from (map) any sensory cloud to any other, all you have to do is go "through" the alignment, which is two matrix multiplications, max.
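The "two matrix multiplies, max" point can be shown directly with numpy. The rotation angles below are invented stand-ins for per-sense alignments; the point is only that chaining reference frames is composition of matrices.

```python
import numpy as np

def rot(deg):
    """2-D rotation matrix (a toy stand-in for a sensory alignment)."""
    r = np.deg2rad(deg)
    return np.array([[np.cos(r), -np.sin(r)],
                     [np.sin(r),  np.cos(r)]])

# Invented alignments: body map -> shared frame, shared frame -> visual map.
T_touch_to_common = rot(30)
T_common_to_vision = rot(-30)

# Touch -> vision is just the product of the two alignments.
T_touch_to_vision = T_common_to_vision @ T_touch_to_common

p_touch = np.array([1.0, 0.0])
p_vision = T_touch_to_vision @ p_touch
print(np.allclose(p_vision, p_touch))  # True: the two 30-degree alignments cancel
```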

It's a very clever way of doing business. These mathematical procedures are enabled by the neural network connections that form during development, and the mouse example shows how easy it is to disrupt proper development. From that example we learn that cerebral neurons map to their sensory counterparts on the basis of (mutual) activity; this way they "automatically" get a topographic mapping whose registration stays within pretty narrow limits of the base map. In the auditory system, attention to an important auditory stimulus involves a lot of head and body turning, and not necessarily so many eye movements. That is the job of the cousin colliculus, the "inferior" colliculus. (By the way, these areas are called the "optic tectum" and "auditory tectum" in birds and fish - we humans call ours "colliculus" because they're literally bumps on the brain stem.)

So our brain stem has all these high-level reflexes that involve mappings between the senses. At some point higher up in the brain, all these mappings result in one integrated "perception" of sensory reality. The point is that some facility with these mappings is required as you go about your life. During a typical day you'll use each of the 9 mappings (for three senses) multiple times, in various ways. You control them automatically, without even thinking about them.
 
It's called proprioception. Proprioception is the sense that tells your brain about the position of your body parts in space, even without looking. It's the body's ability to sense its own movement, posture, and changes in equilibrium. Essentially, it's your body's awareness of itself.
It's generated by nerve plexuses in your synovial joints.
 
You said "We can look at a target, close our eyes, and reach for the target and grasp it without opening our eyes," implying a computer couldn't do the same. I just showed you that it can. After plotting the target location, the camera can be cut off, and the location will be stored and accessed just like any other G-code coordinate.

I just love instant internet experts who read a web site or two on a complex subject and think they have it all figured out.
Would you like me to explain G-code now?
 
Bullshit.

Show me a video of a robot bending down to pick up a dime after its vision has been turned off.

Doesn't exist.

Not saying it can't, just saying it doesn't.
 
Admit you were wrong on this one, and let it go. I don't have any dime videos, but a computer can easily control location and depth given the XYZ coordinate, which it can easily acquire with a camera. That location is stored and is easily retrieved. Once the coordinates are determined, it doesn't need the camera to know where to locate its manipulators. Note that in the video, each XYZ movement is automatic. Yes, I know that the floor might be lower than shown in the video, but that is just a matter of longer manipulators.
 

I'm not wrong. I'm always right. You should know that by now. :p

I'll tell you why it's very difficult for a robot. It's not a vision issue, it's a grasping issue.

The scenario goes like this:

1. See the dime on the floor
2. Close your eyes
3. Reach down and pick it up

The issue occurs at the intersection between calculating the optimal path and actually executing it. Optimal path calculation is difficult - it's a "hard" computing problem, NP-hard in the general case.

So what a neural network does is reduce the solution space to a neighborhood. It does the best it can in the allotted milliseconds (remember, this is a real-time problem; you cannot take minutes to calculate the optimal path).

The result from trajectory calculation is "one path is as good as another" WITHIN the calculated neighborhood. This is why humans never arrive at the dime "exactly" - they arrive in the NEIGHBORHOOD of "exactly". And a robot will behave the same way, given the same constraints.

At the very tail end of the reaching phase, humans use a STRATEGY to find the target. We "fish around a little" to find the exact location of the target, and we can do that because we know the approximate size of the neighborhood.

The WAY we fish around depends on the exact characteristics of the target. We fish differently for a dime than we would for a contact lens, and maybe differently if it's on a rug or a concrete floor.

There is no robot in existence today that can do this any better than a human can. You won't find a video, it doesn't exist. Even if the visual map is very precise, it won't help you in the reaching phase. Typically in reaching the neighborhood is a few seconds of visual arc in any direction from the estimated target location. When your hands hit the floor, you feel around a little to find the dime, and then you feel around some more to find the edge of the dime so you can pick it up.
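A two-phase reach like this can be caricatured in a few lines of Python: a ballistic move that lands in a noisy neighborhood of the target, followed by a touch-guided spiral search. All numbers (noise radius, touch radius, spiral pitch) are invented for illustration.

```python
import math
import random

def blind_reach_and_search(target, reach_noise=2.0, touch_radius=0.5):
    """Ballistic reach into a noisy neighborhood, then a tactile spiral search."""
    # Phase 1: the ballistic reach lands NEAR the target, not ON it.
    landing = (target[0] + random.uniform(-reach_noise, reach_noise),
               target[1] + random.uniform(-reach_noise, reach_noise))
    # Phase 2: spiral outward from the landing point until touch finds the dime.
    hand, t, steps = landing, 0.0, 0
    while math.dist(hand, target) > touch_radius and steps < 5000:
        t += 0.05
        hand = (landing[0] + 0.05 * t * math.cos(t),
                landing[1] + 0.05 * t * math.sin(t))
        steps += 1
    return steps

random.seed(7)
print(blind_reach_and_search((10.0, 10.0)), "spiral steps to find the dime")
```

The spiral pitch is chosen smaller than the touch radius, so the search is guaranteed to sweep the whole neighborhood - which is exactly the "fishing around" strategy, minus the fingernails.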
 
Computers can easily sense if the item is acquired. If not, a subprogram directs the preprogrammed search until it is found. You are obviously too childish to admit you are wrong. Believe what you want. This is obviously not the first time you chose to believe bullshit over a proven fact.
 
Computers can easily sense if the item is acquired. If not, a subprogram directs the preprogrammed search until it is found.

You're fantasizing.

Show me.

If it exists, it's on video.

I don't think you can produce a video.

You are obviously too childish to admit you are wrong. Believe what you want. This is obviously not the first time you chose to believe bullshit over a proven fact.

You haven't proven a damn thing.

Let's see your proof.

Show us.
 
This is obviously not the first time you chose to believe bullshit over a proven fact.

lol

Well, that was easy. :p

In case you're still around, and for your reference, here is the state of the art as of this morning:


Robots are so notoriously bad at picking things up blindfolded that this has actually become a hot topic.

Sensory fabrics are not new; they've been around for a while. Connecting them into neural networks with sufficient resolution is new - that hasn't happened yet.

As of this morning, a robot still has zero chance of picking up a dime blindfolded.

Actually to do so, it would require fingernails... think about it...
 
What a stupid thing to say. Do you really think it would be impossible to build a manipulator that can pick up a dime?
 
lmao !

Libtard reading comprehension is near zero. :p
 
For those of you who actually are following, the first half of this video provides an excellent overview of the machine learning context - how you get your transformation matrices and what they mean.



In the case of objects in the visual field, the graphs take the form of simplicial complexes, or "meshes" - the same thing animators use when they're building structure in a virtual world, and the same thing analysts use when they're looking at point clouds.

Turns out, when you're aligning reference frames, you don't have to use coordinates. You can just as easily line them up using "landmarks". Sometimes the calculations are easier and faster that way.
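Landmark-based alignment can be sketched with the standard Kabsch/Procrustes construction: given matched landmarks in two frames, recover the rotation and translation relating them, with no shared coordinate system required. The landmark values below are invented.

```python
import numpy as np

def align_landmarks(A, B):
    """Rotation R and translation t such that B ≈ A @ R.T + t (rows are landmarks)."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cB - cA @ R.T
    return R, t

# Frame B is frame A rotated 90 degrees and shifted by (3, 1).
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
R_true = np.array([[0.0, -1.0], [1.0, 0.0]])
B = A @ R_true.T + np.array([3.0, 1.0])

R, t = align_landmarks(A, B)
print(np.allclose(A @ R.T + t, B))  # True: frames aligned from landmarks alone
```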

In humans, alignment has a lot to do with attention. For example, look at the auditory system and an area of the brain called the "nucleus of the brachium of the inferior colliculus", which equates with the "external inferior colliculus" in barn owls. The auditory system is interesting insofar as its coordinate system isn't built in, like it is in the retina - instead it's calculated, from tiny timing and level differences as sound enters the two ears. This mapping is aligned with the gaze map in the neighboring superior colliculus, and then the whole thing is passed directly to the lateral posterior nucleus of the thalamus, which is the beginning of the attention system.

The mapping from inferior to superior colliculus (auditory to visual) is plastic, it adapts to changes in the reference frames. You can show this by putting prism goggles on the barn owl's eyes and then measuring the auditory map in the inferior colliculus.

The relevance of the machine learning video is that it shows how to do this with just matrix multiplication. The regression automatically adapts to the correct alignment this way; there's nothing else you have to do.
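A minimal sketch of that idea: treat paired observations of the same events in two frames as a regression problem, and ordinary least squares recovers the alignment matrix. `W_true` below is a hypothetical ground-truth mapping; re-running the fit on new pairs (say, after "prism goggles" shift one map) re-aligns automatically, with no extra machinery:

```python
import numpy as np

rng = np.random.default_rng(1)

# Paired observations: the same events located in an "auditory" frame X
# and a "visual" frame Y.  Suppose Y = X @ W_true, unknown to the learner.
W_true = np.array([[0.8, -0.6],
                   [0.6,  0.8]])            # a rotation, for illustration
X = rng.normal(size=(200, 2))
Y = X @ W_true + 0.01 * rng.normal(size=(200, 2))   # slightly noisy pairs

# Ordinary least squares recovers the alignment matrix directly.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# After learning, translating between frames is one matrix multiplication.
mapped = X @ W_hat
```

If the frames drift (new goggles, new sensor), the same fit on fresh pairs yields the new alignment - that's the whole adaptation story in this toy version.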

The second half of the video talks about how the interest in graph symmetries came out of the world of chemistry and molecular design. If you understand what's being said in the video, you understand about 90% of machine learning and at least half of neural plasticity.

The part they don't talk about is the information geometry, where you approximate an arbitrary function (or shape, or graph) using a library of primitives (like Gaussian distributions with varying means and variances). It turns out that this is a relatively simple extension of geometric networks that can be easily accomplished with predictive coding. So for instance in their example of the rabbit, it's computationally expensive to build a 3d mesh from a point cloud, maybe it can be done faster by estimating a set of Gaussians that match the contours. In either case the permutations make the representation resistant to geometric variations (rotations, scaling, displacement). A rabbit is still a rabbit, no matter how big or small it is, which direction it's facing, or which side of the retina it's on.
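As a toy stand-in for that idea, here's a contour approximated by a library of Gaussian primitives with fixed means and a shared width, where only the amplitudes are fitted - which reduces to linear least squares. The target profile and the number of primitives are arbitrary choices for illustration, not anything from the video:

```python
import numpy as np

# Target "contour": an arbitrary smooth profile sampled on a grid.
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

# Library of primitives: Gaussians with fixed means and a shared width.
means = np.linspace(0.0, 1.0, 15)
sigma = 0.06
basis = np.exp(-0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2)

# Fit only the amplitudes -- a plain linear least-squares problem,
# far cheaper than reconstructing an explicit mesh.
weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
approx = basis @ weights
rms_error = np.sqrt(np.mean((approx - target) ** 2))
```

Letting the means and variances move as well gives a richer fit, but then it's nonlinear estimation (EM or gradient descent) rather than one matrix solve.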

In the case of alignment, one of the reference frames is the rabbit. It's pretty much that simple.
 
AI will never duplicate human thought. The human mind is stimulated by emotion, which is the energy that makes the brain work. Every thought begins as an emotional message from the limbic system. It goes up to the prefrontal cortex, which sets goals and actions. AI can't create emotions; it's 100% equal to the PFC as a processor of explicit thoughts. Emotions determine what we think we know.
 
AI will never duplicate human thought. The human mind is stimulated by emotion, which is the energy that makes the brain work. Every thought begins as an emotional message from the limbic system. It goes up to the prefrontal cortex, which sets goals and actions. AI can't create emotions; it's 100% equal to the PFC as a processor of explicit thoughts. Emotions determine what we think we know.

So this idea of aligning reference frames isn't exactly "thought", although the facility is used by thought - it's kind of "sub-thought", part of the infrastructure that supports thought.

In humans, in the sensory domain, there are two main systems that don't require "thought". One is the geometric system in the parietal lobe and the other is the attention system which has some top-down control from the anterior cingulate cortex. These two systems are very much related to each other - what's interesting about this is the way the geometry is used in different ways for specific purposes. Like, you have "place cells" in the hippocampus that help you navigate, and the episodic memory makes use of the attention and orienting information and stores it in a form that can be easily recovered.

What's cool about the geometric deep learning trick is you're essentially mapping these various reference frames onto the surface of a sphere, which has intrinsic symmetry and in turn induces symmetry even in a dataset that had none. Rotations of the sphere let you do invariant convolutions and then the mapping between any two reference frames becomes nothing but a single matrix multiplication.
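A bare-bones sketch of that last point: project points onto the unit sphere, and then the mapping between two reference frames really is a single (rotation) matrix multiplication, with lengths and angles preserved. The 40-degree rotation about z is an arbitrary example:

```python
import numpy as np

def to_sphere(points):
    """Project points onto the unit sphere (normalize each row)."""
    return points / np.linalg.norm(points, axis=1, keepdims=True)

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

rng = np.random.default_rng(2)
frame_a = to_sphere(rng.normal(size=(50, 3)))   # one reference frame

# A second reference frame: the same directions, rotated 40 degrees.
R = rotation_z(np.radians(40.0))
frame_b = frame_a @ R.T

# Mapping between the frames is one matrix multiplication; since R is
# a rotation, its inverse is its transpose, so frame_b @ R undoes it.
recovered = frame_b @ R
```

Everything stays on the sphere throughout - that's the intrinsic symmetry doing the work, and it's why the convolutions can be made rotation-invariant.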

In humans, the mapping to a sphere can be accomplished anatomically. There are many examples of spherical mappings in the brain, one of which is the auditory field we were talking about. These mappings can be easily set up on the basis of chemical gradients during development. The alignment between any two of them then comes down to the center and a single parameter r, the radius of the sphere. The rest is just rotations, which is what gives us the invariant convolutions.

You can map just about anything to the surface of a sphere. It's one of the most highly studied areas of mathematics, the Riemann sphere and all that. And once your data is in this form you can do things with it, and yes you can definitely generalize this mechanism into "thought", as the same types of mappings can be used to handle graphs of any kind.

Viewed this way, the mechanisms for mapping sensory activity, motor activity, and "thought" become one and the same. The question of aligning two sensory reference frames becomes the same as the question of finding the shortest path to an object, and navigating a maze the same as navigating the maze of thought.

One of the ways this matters is that "deep" learning doesn't have to be so deep anymore. You can look at some of the AI architectures: they have insane numbers of layers, and each layer does something slightly different. Our brains are shorter and wider; the clever use of geometry is what allows one part to work seamlessly with another.

Another view of spherical mapping is compactification, which was the topic of an earlier thread. The same principle applies in the time domain: you can take an interval of time, compactify it into a circle, and then create invariances by rotating the circle. It's the same geometry that causes our episodic memory to be played "backwards" during consolidation.
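Here's the time-domain version in miniature: wrap a sampled interval onto a circle (periodic indexing), so that a time shift becomes a rotation of the circle, and the Fourier magnitude spectrum is the invariant that survives any such rotation. The signal and the shift amount are arbitrary:

```python
import numpy as np

# A signal on an interval, compactified: index n lives on a circle,
# so a time shift is just a rotation of the circle.
N = 64
t = np.arange(N)
signal = np.sin(2 * np.pi * 3 * t / N) + 0.3 * np.cos(2 * np.pi * 5 * t / N)

shifted = np.roll(signal, 17)    # rotate the circle by 17 samples

# Shifting only changes the phases of the Fourier coefficients,
# never their magnitudes -- the magnitude spectrum is the invariant.
mag = np.abs(np.fft.rfft(signal))
mag_shifted = np.abs(np.fft.rfft(shifted))
invariance_gap = np.abs(mag - mag_shifted).max()
```

That's the shift theorem in action: the circle's rotations act on phase alone, which is exactly the kind of built-in symmetry the spherical case exploits in higher dimensions.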
 
Emotions determine what you think you know. Do you know what you really feel?
 
See? Scruffy is always right. This suddenly became a hot topic in robotics. Now someone has designed a network that automatically integrates new sensors.

 