Happy New Year everybody. I spent some time over the holidays away from California and not working, which was nice. Upon return though, there was a large box full of robot parts at the front door, and I've been trying to make room for it.
So I'm gonna stop talking about this for a while, and just "do". There's a lot of work in it. But I'll share this one interesting thing with you, check this out -
So a human eye is about an inch across. It weighs about 8 grams. To move it we attach six motors: three pairs in push-pull for horizontal, vertical, and oblique movement.
The first part of the requirement is that it has to cover the field of vision, which looks like this: about 200 degrees horizontally and 130 degrees vertically. The central 120 degrees or so of the horizontal field handles depth perception, but the fovea covers only 6 degrees of that.
The eye has to move very fast! Humans can do 3 saccades per second, with a maximal velocity of 700 degrees/sec or so. So these servo motors have to be very tight!
It turns out, the fastest commercially available servo motor BARELY meets this requirement. The PTK 7308MGD can handle about 900 degrees/sec. It runs on 8.4 volts, which is convenient. It can easily move 8 grams, and we have six of them.
So here's the idea: you're walking down the street and suddenly you see a shadow in the extreme periphery of your visual field. It causes you to turn your eyes and then your head. The initial saccade might cover 100 degrees, and it has to be completed and accurate within 1/3 second.
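We can sanity-check that timing budget with a quick sketch. Assume a symmetric accelerate-cruise-decelerate (trapezoidal) velocity profile; the acceleration figure below is an illustrative assumption, not a datasheet number:

```python
def min_saccade_time(angle_deg, max_vel_deg_s, accel_deg_s2):
    """Minimum time to sweep angle_deg with a symmetric
    accelerate-cruise-decelerate (trapezoidal) velocity profile."""
    # Angle covered while ramping up to max velocity and back down
    ramp_angle = max_vel_deg_s ** 2 / accel_deg_s2
    if angle_deg <= ramp_angle:
        # Triangular profile: never reaches max velocity
        return 2 * (angle_deg / accel_deg_s2) ** 0.5
    cruise_angle = angle_deg - ramp_angle
    return 2 * max_vel_deg_s / accel_deg_s2 + cruise_angle / max_vel_deg_s

# 100-degree saccade, servo capped at 900 deg/s,
# assumed acceleration of 20,000 deg/s^2
t = min_saccade_time(100, 900, 20000)
print(f"{t:.3f} s")  # well inside the 1/3-second budget if the assumption holds
```

If the servo really can accelerate that hard, the 100-degree emergency saccade fits in the 1/3-second window with room to spare; the margin shrinks fast as the acceleration assumption gets weaker.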
Our reflexes are arranged so the eyes move first, and then to foveate a spot in the extreme periphery the head has to turn. WHILE the head is turning the eyes remain focused on the target (this is the vestibulo-ocular reflex).
This is do-able with servo motors and Arduinos, but remember, we're going to train this thing like a baby. There is NO programming involved, only self organization. Humans have stretch receptors and pain receptors in the eyes, so if a baby tries to move its eyes too far or too fast it'll stop; it's self-limiting. We have to put very careful controls on our robot version, to keep it from going off the rails, so to speak.
I tried doing the control systems theory on this stuff. It's not easy. The plant requires a high degree of stability to handle the full range of motion. So we need to introduce some nonlinearity, maybe sigmoid curves so the deviation can't exceed 100 degrees, that kind of thing.
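A minimal sketch of that kind of saturating nonlinearity, using tanh as the sigmoid (the 100-degree limit is the only parameter, and the function name is mine, not from any library):

```python
import math

def limited_command(raw_deg, limit_deg=100.0):
    """Squash a raw position command through a tanh so the output
    can never exceed +/- limit_deg, no matter how large the input.
    Near zero the curve is nearly linear, so small commands pass
    through almost unchanged."""
    return limit_deg * math.tanh(raw_deg / limit_deg)
```

The point is that even if the self-organizing network emits a wild command during training, the plant physically cannot be driven past the mechanical limit.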
Before we can train the gaze reflex, we have to train the opposing oculomotor pairs so they work smoothly with each other. This is done using the spindle (stretch) receptors in a closed feedback loop with both sets of drivers. This part is pretty easy because the feedback is direct - if one motor moves, the other one senses it and reports it. Piece o' cake for a self-organizing neural network.
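Here's a toy version of that closed loop, with everything simplified: proportional corrections stand in for the network, and the "stretch" signal is just the negative of the agonist's position. All names and gains are mine, for illustration only:

```python
def settle_pair(agonist_target, gain=0.5, steps=50):
    """Toy closed loop for one push-pull pair: the agonist chases its
    target while the antagonist is driven by the stretch it senses,
    so the pair settles into equal-and-opposite positions."""
    agonist, antagonist = 0.0, 0.0
    for _ in range(steps):
        # agonist moves toward its commanded position
        agonist += gain * (agonist_target - agonist)
        # antagonist yields by the stretch the agonist imposes on it
        antagonist += gain * (-agonist - antagonist)
    return agonist, antagonist
```

Run it and the pair converges: the agonist lands on its target and the antagonist mirrors it, which is exactly the coordination the real stretch-receptor loop has to learn.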
The harder part is the fixation reflex, which requires feedback from visual cortex. A very VERY crude version of it could be done with the brainstem alone, but it would not be very accurate (the eyes would end up "in the vicinity of" the shadow but not "on" it). Remember, the SAME system that does these 100 degree emergency saccades, also handles precision reading. One could argue it's a different "kind" of saccade because one occurs within the fovea while the other occurs in the periphery, and it is true there are two independent channels (called X cells and Y cells in the retina, X cells handling detail near the fovea and Y cells handling motion in the periphery).
Which leads us to a whole 'nother discussion about how fast the VIDEO has to be. 30 fps is not enough; to calculate motion with any accuracy you want something closer to 200 fps. The motion capture people mostly use 120 fps, and if you've played with motion capture you know the limitations: moving outlines come out "pretty" good but not perfect.
At 120 fps, 1/3 of a second will give us 40 frames, and in that time we have to turn every pixel into a Y cell. The parallel processing in the aggregate of Y cells will give us VERY good resolution. Basically we're doing hard-wired convolutions on every pixel (using CUDA), turning every pixel into a filter. For every pixel, we get all possible information for that point, including the velocity of the changes in luminance and color. If we have this information for neighboring pixels, we can completely determine the motion and direction of motion at every point.
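A minimal sketch of the per-pixel idea: from the spatial and temporal luminance gradients at each point, the brightness-constancy equation (Ix·u + Iy·v + It = 0) recovers the motion component along the gradient direction, the so-called normal flow. NumPy stands in here for the CUDA kernels; the function name and the 120 fps default are assumptions for illustration:

```python
import numpy as np

def normal_flow(frame_prev, frame_next, dt=1 / 120):
    """Per-pixel 'Y cell' sketch: estimate, at every pixel, the
    component of motion along the luminance gradient, from two
    consecutive frames dt seconds apart. Returns (u, v) in
    pixels/second."""
    # Spatial gradients (axis 0 = rows = y, axis 1 = cols = x)
    Iy, Ix = np.gradient(frame_prev.astype(float))
    # Temporal gradient
    It = (frame_next.astype(float) - frame_prev.astype(float)) / dt
    # Squared gradient magnitude; epsilon avoids 0/0 in flat regions
    mag2 = Ix ** 2 + Iy ** 2 + 1e-9
    u = -It * Ix / mag2
    v = -It * Iy / mag2
    return u, v
```

Only the gradient-parallel component is recoverable from a single pixel (the aperture problem); combining neighboring pixels whose gradients point in different directions is what pins down the full motion vector, which is exactly the "neighboring pixels" point above.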
'Nuff said. It's a challenge. I need a name for my robot. What should I call it?