My first eye movement!

scruffy

Well, other than a dumbass bug in my code, it worked!

So let me explain: here I'm using the Raspberry Pi running Python under Linux. It has a camera pointed at a picture of Scruffy on the wall. The purple box is the eyeball; it starts out perfectly centered in the image. Then I tell it to find Scruffy's face, and I want the center of the purple box to land exactly on Scruffy's upper lip.

You can see it mostly worked: the vertical alignment is perfect, but I forgot to add w/2 to the left side of the box. I got lazy because I wanted something working, so now I have to go back and properly align the centers by adding half the width and half the height.
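The fix boils down to something like this (a minimal sketch of the math, not the actual code):

```python
# Centering a box on a target point: the top-left corner has to be
# offset by BOTH half the width and half the height, otherwise only
# one axis lines up -- which is exactly the bug described above.

def center_box_on(tx, ty, w, h):
    """Return the (left, top) corner that centers a w x h box on (tx, ty)."""
    left = tx - w // 2   # the half-width term that was missing
    top = ty - h // 2
    return left, top

print(center_box_on(100, 60, 40, 20))  # -> (80, 50)
```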

The pic of Scruffy is from the Stanford Dogs dataset, a standardized training set for machine learning.

At this point the motor is still disconnected from the Raspberry Pi; I'm just getting the control loop to work. Next week I'll have the motor connected, and then we can try some movies.

IMG_20260224_223502052_BURST005.webp
 
Bounding box fixed.

I'm testing with video already.

Here's a random scene from a music video. You can see all the little blue bounding boxes around the people.

IMG_20260225_234401827.webp


Unfortunately YOLO has identified the mandala as a "clock" lol. :p
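For anyone following along, the per-frame logic is roughly this (my own sketch, not the actual code; the detection tuple format is an assumption, loosely based on what common OpenCV/darknet YOLO wrappers return):

```python
# Filtering raw YOLO detections down to confident "person" boxes.
# Each detection is assumed to be (class_name, confidence, (x, y, w, h)).

def person_boxes(detections, min_conf=0.5):
    """Keep only the boxes labeled 'person' above a confidence threshold."""
    return [box for cls, conf, box in detections
            if cls == "person" and conf >= min_conf]

detections = [
    ("person", 0.91, (40, 10, 30, 80)),
    ("clock",  0.62, (200, 50, 60, 60)),   # the mandala misfire
    ("person", 0.33, (300, 12, 25, 70)),   # too low-confidence to keep
]
print(person_boxes(detections))  # -> [(40, 10, 30, 80)]
```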

I'll hook in PyTorch this weekend, with my own neural network trained from scratch. We'll kick some booty.

So now for my next trick, I'll get the eyes to move from one person to the next, using an "attention" mechanism.

So far this is like an underperforming transformer. In about two weeks I'll be ahead of the curve. There's 256 GB of CUDA hardware arriving tomorrow...
 
It's not bad, all things considered.

YOLO was trained for autonomous vehicles, so it's picking up things like traffic lights real good.

I just wanted to get something running real fast, and YOLO is an industry standard for small devices like the Raspberry (think "slightly better security camera", it's about like that). It picks up cars real good but has trouble with detached faces and very small people. This one came out pretty good.

IMG_20260226_001632939.webp


On this one though, it's missing quite a few of the cops.

IMG_20260226_001514952_HDR.webp


These political ones worked pretty good lol - you can see the robot has been correctly identified as "not a person". (Note the traffic light on the top right of the first image). :p

IMG_20260226_001754750.webp


IMG_20260226_001602514_HDR.webp
 
Oh, these are all videos running in real time; the source was an mp4 file, and I'm taking pictures with my phone while the video is playing on the screen. So far it plays back at around half speed. That problem should be solved next week.

YOLO is not a full transformer; it's been greatly reduced for performance reasons. Instead of scanning the image region by region, it predicts every bounding box in a single forward pass over the whole image (hence the name, "You Only Look Once"). That's also why it has trouble with scale; it doesn't handle very small objects well. My network will fix all that. I'm doing predictive coding in the dendrites, something I haven't seen tried before. My theory is that one layer with active dendrites is equivalent to two without.
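To make the "active dendrites" idea concrete, here's a toy illustration (entirely my own sketch under stated assumptions, not the actual network): each unit gets several dendritic segments, and the most active segment gates the unit's feedforward response, which is what lets one gated layer carve up the input space like two plain layers.

```python
import numpy as np

def active_dendrite_layer(x, w_ff, w_dendrites):
    """Toy gated layer: x (inputs,), w_ff (inputs, units),
    w_dendrites (inputs, units, segments)."""
    ff = x @ w_ff                                    # feedforward drive, (units,)
    seg = np.einsum('i,ijk->jk', x, w_dendrites)     # (units, segments)
    gate = 1.0 / (1.0 + np.exp(-seg.max(axis=1)))    # sigmoid of best segment
    return ff * gate                                 # gated output, (units,)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
out = active_dendrite_layer(x, rng.normal(size=(8, 4)), rng.normal(size=(8, 4, 3)))
print(out.shape)  # -> (4,)
```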

If things go well I'll be able to triple the performance of ResNet, because I only have two derivatives to calculate instead of six! Training will take longer in my version, but recognition will be much MUCH faster.
 
The FedEx man just brought me two boxes that are bigger than my bass amp! Where am I going to put all this stuff? :(

Got an RTX 5090 and two H200s; that's about 280 gigs of GPU memory altogether. Plus a new motherboard and chassis to fit it all. Now I get to have fun configuring Linux for all this. :)

Shouldn't take more than two days... meanwhile I'm limping along with two GTX 1080s. This new system is promising me 280 frames per second. Yowza.
 
This is my simulator software. Using this, I can quickly and easily design neural networks. I just tell it what the network looks like, and what kind of neurons and connections to use, and it does the rest. The pic shows an eye movement system. On the left is my targeting network and on the right are some motor neurons that drive eye muscles. Right now, there are sliders on the left that specify object locations. On the right are the oculomotor commands needed to move the eyes from one object to another. If you look carefully you can see the spikes from the motor neurons in the colored traces.

IMG_20260226_134342996_AE.webp


This now requires an attention network to provide a "winner take all" capability to the list of available eye movements (so one and only one colored trace is applied to the muscles, all the rest are inhibited).
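The winner-take-all step itself is simple in principle (a minimal sketch of the idea, not the simulator's implementation): the strongest candidate command passes through and every other one is inhibited.

```python
# Winner-take-all over a list of candidate eye-movement activations:
# the strongest candidate survives, the rest are zeroed (inhibited).

def winner_take_all(activations):
    """Zero out everything except the single strongest activation."""
    winner = max(range(len(activations)), key=lambda i: activations[i])
    return [a if i == winner else 0.0 for i, a in enumerate(activations)]

print(winner_take_all([0.2, 0.9, 0.5]))  # -> [0.0, 0.9, 0.0]
```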

With this simulator I can do in one day what it takes Google an entire month to accomplish with a dozen of their best people. It can switch from a spiking network to a rate code with one line of Python, and all the displays adjust automatically. It interfaces with TensorFlow, PyTorch, SpiNNaker, and HP supercomputers (anything that'll run MIDL-C or Concurrent C), and it'll accept as many Raspberries as you want to throw at it, up to the connector limit or the performance limit, whichever comes first.
 

You may ask yourself, "self, why is that crazy guy doing all this?"

The answer's real simple: "because I can". :p

There's 2 million dollars attached to the next ILSVRC. Do you realize the network that won the last round had a 15.3% error rate? I have that already, with only two days' work. My error rate will match the best transformer LLMs; it'll be down around 0.5%.

Besides, there's something I know that Yann LeCun doesn't. So they can pay me the 2 million, instead of him. :)

Meanwhile I get to show you guys how it's done, that way you can get 2 million too. :lmao: Seriously, the forum keeps me honest, I have to post work product on a regular basis. If I get bored (which happens frequently) the work has to continue.

We're not going to make it to the stars without AI, and I'd like to do my part. I ain't no physicist, but I know computers and I know biology. I want to make AI really work, not like the BS that passes for intelligence at Google, and not like self-driving cars that tear down fences and mailboxes and end up driving down train tracks.

If this project turns out to be a dead end there's 40 other projects I have in mind. AI with a conscience could be an interesting one. (We're halfway there already, Meta is already extracting intention from the inflections of speech). But I figured industrial robotics is a sure money maker, as distinct from ethics which usually costs rather than makes.
 
Well, Linux is working. According to the benchmark I'm getting 242 fps at 4K without the GPUs. That's good enough. For training I can use the GPUs in Google Colab for free. So that's the next step: train up a full dog model on "my" network.

How the training will work:

3 phases

(this is a targeting system, driven by visual input)

1. train on stills - 282 dog breeds, annotated

2. train on movies of moving and running dogs, one dog at a time

3. train on movies of groups of dogs running in the dog park

The goal is to be able to maintain the bounding box on a moving target, and to accurately extract the location information for the eye movement needed to center the target on the fovea.

Once that's done I get to play with the motors. That's when the real fun begins. The first acceptance test will be moving the eyes from one target to the next, in some order (size, color, doesn't matter), while the dogs are actually running.
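The fovea-centering step from phase 3 reduces to a simple offset (my own sketch of the geometry, not the actual targeting code): the saccade vector is the displacement from the image center to the center of the target's bounding box.

```python
# Compute the eye-movement (saccade) vector that would center a
# bounding box on the fovea, taken here as the image center.

def saccade_vector(box, frame_w, frame_h):
    """box = (left, top, w, h); returns (dx, dy) from fovea to box center."""
    x, y, w, h = box
    box_cx, box_cy = x + w / 2, y + h / 2
    fovea_x, fovea_y = frame_w / 2, frame_h / 2
    return box_cx - fovea_x, box_cy - fovea_y

print(saccade_vector((110, 20, 20, 20), 200, 100))  # -> (20.0, -20.0)
```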
 