exciting new evidence supports Scruffy's brain model

Okay, no reverb. Shucks. So here's another way to do it. Today's math exercise is fractional calculus.

We start with the differentiation operator D, defined as

D (f(x)) = d/dx (f(x))

and we can get second and third derivatives by simply composing D with itself, like

f'' = (D ∘ D)(f)

which is like D^2(f).

So, question: what is sqrt(D)? What is D^(1/2)?

Enter fractional calculus.
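You can actually compute D^(1/2) numerically. Here's a minimal sketch using the Grünwald–Letnikov construction, one of the standard discretizations of the fractional derivative (the grid spacing h and the test function are just illustrative choices, not anything canonical):

```python
from math import pi, sqrt

def gl_fractional_derivative(f, alpha, x, h=1e-3):
    # Grunwald-Letnikov approximation of (D^alpha f)(x),
    # summing over the grid x, x-h, ..., down to 0.
    n = int(x / h)
    total, c = 0.0, 1.0          # c = (-1)^k * binom(alpha, k)
    for k in range(n + 1):
        total += c * f(x - k * h)
        c *= (k - alpha) / (k + 1)
    return total / h ** alpha
```

For f(x) = x the exact half-derivative is 2·sqrt(x/π), so you can check the approximation directly; and applying D^(1/2) twice recovers the ordinary derivative, which is the whole point of calling it a square root of D.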


What good is it?


The chaotic modes of a Hopfield network "with delay" cannot be adequately explained by integer-order derivatives (alone).

Where this helps right now, is as follows: there are two dozen or more types of ion channels that contribute to bursting in neurons. Some combination of three of them appears to be essential.

1. A voltage-gated calcium channel that inactivates with hyperpolarization
2. A hyperpolarizing potassium channel
3. A tonic depolarizing sodium channel

The dynamics of these three channels is what requires fractional calculus.

Remember, we're trying to extend the predictive timeline down to the molecular level, because we want the limit as dt => 0. The fractional calculus is good for this because it describes past, present, and future.
 
Oh - here's the connection.

Gamma function <=> gamma distribution

Gamma function <=> fractional integral

Gamma distribution <=> Bayesian statistics

The gamma function is the normalizing constant of the gamma distribution: Γ(k) is the integral of the unnormalized density x^(k-1) e^(-x).

This directly links Bayesian probability to neural firing rates. A gamma distribution with integer shape n and rate 1 describes the waiting time to the n-th event in a Poisson process with rate 1.

Hence you can model your spike train as a stochastic Poisson process and apply the methods of information geometry to extract the resulting population dynamics.
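The gamma/Poisson link is easy to verify by simulation before touching a real spike train. A minimal sketch (sample size, seed, and n = 5 are arbitrary choices): the time of the n-th event in a rate-1 Poisson process is a sum of n independent exponential intervals, i.e. Gamma(n, 1), with mean n.

```python
import random

def nth_event_time(n, rate=1.0, rng=random):
    # Time of the n-th event in a Poisson process: the sum of n
    # independent exponential inter-event intervals.
    return sum(rng.expovariate(rate) for _ in range(n))

# Empirical check: this waiting time is Gamma(n, rate=1), mean n.
rng = random.Random(42)
samples = [nth_event_time(5, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

The sample mean comes out close to 5, as the gamma distribution predicts.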

This relationship should be very easy to test on a simulator.
 
This is great stuff. It's all coming together now. All that's needed is a distinctly non-Hebbian synapse. Of which we already have... y'know... two dozen examples in biology.

First of all, the modes of a neuron are determined by its ion channels. Simple membranes like the squid giant axon have only two modes: they're either spiking or they're not. More complicated neurons, like Aplysia central pattern generators or human LGN relay cells, can have multiple modes. There is a quiescent "off" mode with no spikes, and two different "on" modes depending on whether the membrane is in an up or down state. In the down state, only a single action potential is generated, but in the up state, the neuron starts bursting.

The interval between spikes therefore has a biphasic distribution. There is one peak at a high frequency corresponding to the spike interval during bursts, and there is another broader peak at a much lower frequency corresponding to single spikes. Both the length of the burst and the interval between spikes can be controlled, and in a circuit where the bursting neuron is part of an oscillator, or in central pattern generators, the burst rate can be controlled as well.

Burst rate means brain waves: the burst rate is what contributes to the frequencies you pick up in an EEG.

Fundamentally, rates and intervals, as well as delays, mean durations. Duration is what you specify in the scale parameter of a gamma distribution, which is the reciprocal of the rate parameter of the corresponding Poisson process. And, duration corresponds with the inter-spike interval, which is simply the time to the next event in the Poisson model.

ThereFORE - if you have a library of time constants, you can model any sequence of events as a combination of Poisson distributions. How do you get a library of time constants? Two ways:

1. Genetically. This method is used in the ramp cells in the entorhinal cortex. There are six time constants, determined by proteins that bind to NMDA (glutamate) receptor channels.

2. By self organization. This method requires a synapse that programs the inter-spike interval. In other words, it has to have the right set of conductances.

Method (2) is the domain of information geometry. You can NOT use traditional gradient descent on a Poisson process because the coordinate system isn't Euclidean (Poisson, gamma, and related distributions are part of the "exponential family"). Instead you have to use information geometry, which treats the parameters as a Riemannian manifold (a "surface"), the metric for which is the Fisher information metric.
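To make the Fisher-metric point concrete, here's a minimal sketch for the simplest possible case: a single Poisson rate λ, whose Fisher information is 1/λ. Preconditioning the ordinary gradient by the inverse Fisher information gives the natural gradient (the data and starting values below are illustrative, not from any real spike train):

```python
def natural_gradient_step(lam, data, lr=1.0):
    # Ordinary gradient of the per-sample Poisson log-likelihood
    # x*log(lam) - lam, averaged over the data: xbar/lam - 1.
    xbar = sum(data) / len(data)
    grad = xbar / lam - 1.0
    fisher = 1.0 / lam                 # Fisher information of Poisson(lam)
    return lam + lr * grad / fisher    # precondition by the inverse metric
```

With lr = 1.0 the step is exact: from any starting λ it lands on the sample mean, which is the maximum-likelihood estimate. The ordinary Euclidean gradient has no such property, which is exactly why the exponential family calls for the Fisher metric rather than plain gradient descent.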

So the synapse that makes this work, is not a traditional Hebbian synapse whose "strength" is modified by correlation, it's a different kind of synapse whose time constant is modified by correlation.

In the Hebbian version you get algebra, the synapses come to model the statistics of the input. In the extended version you get dynamics too, you get lead and lag correlations instead of just correlations at a point in time.

And here's the kicker: these temporal correlations can push both individual neurons and populations into different modes. An example of this was just given in the other thread on Ising machines, where input pushes the dynamic to a different region in the phase plane.

So in the frontal lobe, how you get programmable delay, is with these special synapses. It turns out that for y'r average refractory period of 1 msec or so, you can program delays from seconds to days, simply by altering the synaptic time constants. In a Hopfield network this behavior is measured by "time to return", which is a term the mathematician Poincaré came up with to describe how long it takes for the same pattern to recur in the network.
 
Beautiful. This is how the visual system works, geometrically.



There is one such point cloud for each explicitly defined output of the visual system. One for color, one for motion, one for direction, one for spatial frequency, and so on. They are overlaid to obtain a meaningful reconstruction of the visual scene.

The information in the scene is then heavily compressed. We remember "chair, blue, over there" and we don't remember the pile of dog hair that was next to it, or the air freshener someone sprayed moments before. The "other" visual system, the one from the retina to superior colliculus to pulvinar to parietal cortex, is the one that determines what to pay attention to. Ultimately that ends up in the frontal eye fields and guides eye movements.

In memory, the object label is combined with its physics, and a value is attached to it. The value is bound to the label and the geometry, so it won't be activated again till the input occurs again. This way, you can have practically infinite delay.
 
At about the 9:51 mark in the video, is an embedding that's flying under the radar screen. See if you can spot it.

Start at 9 min.

They're showing you a point cloud created from sensors, and then they talk about comparing it with the ground truth. Then they show you a picture of the two indicator grids.

Then they say the magic words. L2 norm... they say it so casually it's like they don't even know what they're saying. It goes by really fast. But that's the key, that's what makes the whole thing work. That's a joint probability distribution that equates with a topological embedding. In this case it's the world's simplest embedding, it's just point to point.
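And the "world's simplest embedding" really is just point to point: if the sensor cloud and the ground-truth cloud are already in correspondence, the comparison collapses to a single L2 norm over the paired differences. A minimal sketch (3-D point tuples assumed for concreteness):

```python
import math

def l2_error(cloud_a, cloud_b):
    # Point-to-point L2 norm between two clouds that are already in
    # correspondence (same length, same ordering of points).
    assert len(cloud_a) == len(cloud_b)
    return math.sqrt(sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(cloud_a, cloud_b)))
```

Zero means the reconstruction matches the ground truth exactly; anything else is the total displacement of the embedding.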
 
Very, very clever stuff. Check this out - the brain has a GUID! it's located near the hippocampus in an area called entorhinal cortex (EC) at the inside tip of the temporal lobe.

What happens is this: in scene mapping, spatial information is first completely separated from information about time, and then recombined with it in a different way. In the EC there are "grid cells" that projectively map a two dimensional view along several different orientations and spatial frequencies, limited by "border cells" at the boundaries. The result is a bunch of "place cells" that fire in sequence as the organism traverses the scene.

For time though, EC uses a collection of "ramp cells" with different time constants ranging from seconds to hours. They're periodic like grid cells, but they only have one axis. The result is an internal time to space code that's completely independent of the timeline mapping, and it's needed only for memory. The particular configuration of ramp activity forms a spatial pattern that gets stamped into episodic memory along with object location. It's essentially a GUID, a unique time stamp.

The GUID is needed for playback. If you inhibit the GUID all the frames are still there but they're in the wrong order, and subjects show marked deficits in sequential recall. Apparently all that's being stored is "this object, in that place, at this time".
 
And... finally provides a reason for the consolidation of memory in human beings.

You'll recall that short term memory is eventually consolidated into long term memory.

And, you'll remember the earring model, where the outer hoops represent long time constants and the inner hoops go all the way to dt.

And you'll remember all the noise I made about predictive coding.

Well, look here - look what happens while memories are being consolidated:


According to the earring model, the memories are moving straight up, along a line segment connecting all the points at infinity.


And you'll recall I said two very specific things, predicted by this model:

1. Memory lives at the point at infinity. Because this is where the ends of the compactified timeline meet. Memory is the ultimate destination of all information, and the ultimate source of all information. The only way a sensory configuration turns into a motor action is by going "through" the point at infinity.

2. Awareness lives slightly ahead of "now". It's a prediction, is what it is. A prediction in the limit as dt => 0. It is a prediction that is mapped into the timeline, so it can be treated just like any other piece of information.

The amazing speed (resolution) of the network is what makes it work. Calculated at 0.05 picoseconds for 20 billion neurons, 5 ps if we only do cortical mini-columns, and a fraction of a nanosecond if we take full cortical columns.

That thing that you experience to be about a second wide, isn't. It's a critical state that lasts a few nanoseconds at most. The smoothness around state changes is due to the asynchronous nature of the updates, and is supported by an embedding into a sea of relatively long time constants.

The areas that glue the compactified timeline together are exactly the hippocampus and the dorsolateral prefrontal cortex - the two areas most closely related to memory.

And guess what? This also explains why phase coding is necessary! It's the only way to represent an entire pattern of activity along the timeline, as a single spike train in a single neuron.

And it also explains the sparse coding in the hippocampus.

I think, ladies and gentlemen, we have a winner. Now that the underlying mechanism is established, we can turn to more fundamental questions like "why is red red" and "where am I".

I'm pretty sure this last piece of evidence clinches the deal on the model. I'm not aware of any contradictory evidence. (Like, "any").

It makes perfect sense, too. The outer hoop is the width of the whole network, maybe a second in each direction from "now". It corresponds with "biological time frames", in other words this is how fast signals typically change when you're, say, pursuing prey or trying not to get eaten. To access that as quickly as possible the information has to move down into the inner hoops where the radius is narrower, and there, at the end of the journey in the limit as dt => 0, you get molecular memory which equates with electrical memory.
 
I find your model interesting, the extensions are also interesting.

Thank you. Three things I learned today:

1. There are 7 separate conductances in thalamo-cortical relay cells.

2. The variation in EC ramp times is due to a genetically programmed chemical gradient.

3. Little patches of dendrite can form their own oscillators.

I'm still struggling to understand the role of the "coupling constant" in nonequilibrium thermodynamics. The equations treat it like it's dimensionless, and there's no process delay associated with it. It seems to be an "instantaneous" effect related to correlation, much like quantum entanglement.
 
Well... I can program the most fundamental part of this, which is a duration. (A "time interval"). On a time scale of msec to days, in a single neuron.

How it works is a little complicated. It uses the "glutamate spillover" mechanism from an earlier post.

Turns out, spiny neurons have two different stable states in their dendrites, plus one unstable state. The stable states are called "down" and "up", and the unstable state is when the dendrite "fires", it generates a mini-spike that travels down to the cell body in the usual way.

Transition from down to up requires many successive strong excitations (three or more). But up is a stable state, once it's been achieved it stays there, until it's specifically turned off.

In the down state, a strong excitation may create a mini-spike that travels to the cell body and then dissipates, or results in at most one action potential. Whereas, in the up state, the same mini-spike can initiate bursting.

I can control the duration of the up state, synaptically. Because, it's a stable state, it's regenerative but balanced. However the required learning rule is unclear. The ordinary Hebbian rule would have to have two different time courses to work in this context. And maybe that's the secret, maybe it does.

This mechanism requires the neurotransmitter glutamate and two different kinds of receptors, called AMPA-R and NMDA-R. Both receptors require interaction with a hyperpolarization-inactivated calcium channel.

Reset is accomplished through inhibition, could be a chloride or potassium channel or both.

During the UP state the neuron has high reliability throughput, whereas in the DOWN state it behaves like an ordinary neuron. What I can regulate specifically is how long it's up.

In theory, I can put together a spike train using a combination of small intervals. This mechanism would have to work across neurons to be effective, and there is plenty of evidence that it does.
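None of the biophysics, but the state logic above can be sketched as a toy model: some number of successive strong excitations flips down to up, up persists until an inhibitory reset, down yields at most one spike per input, and up yields a burst. (The threshold of 3 comes from the post; the burst length of 4 is an arbitrary stand-in.)

```python
def make_dendrite(up_threshold=3, burst_len=4):
    # Toy two-state dendrite, logic only: 'up_threshold' successive
    # strong excitations switch down -> up; up is stable until an
    # inhibitory reset; down yields at most one spike per input,
    # up yields a burst.
    state = {"mode": "down", "streak": 0}

    def step(excited, inhibited=False):
        if inhibited:                       # reset via Cl-/K+ channels
            state["mode"], state["streak"] = "down", 0
            return []
        if not excited:
            state["streak"] = 0
            return []
        state["streak"] += 1
        if state["streak"] >= up_threshold:
            state["mode"] = "up"            # stable until reset
        return ["spike"] * (burst_len if state["mode"] == "up" else 1)

    return step
```

Driving it with three strong excitations in a row produces single spikes, then a burst; an inhibitory input drops it back to the down state.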
 
Why can't you tickle yourself?
 
Okay, got it. Altogether, this is very clever.

Earlier I showed the concept of extracting real time causality from a predictive neural network. (That was the Granger example).

This type of analysis makes certain simplifying assumptions about the data. In real life, all data is highly nonlinear and so machines that use separability to estimate parameters (which is most of them) can easily fail.

However you can use nonlinear thermodynamics to map the data into a coordinate system that's linearly separable. And it turns out that to do this, the network has to operate at the edge of chaos.

This paper proves it. The "mean field model" is straight out of nonlinear thermodynamics.


"Ordered dynamics" means you have nicely behaved brain waves, and your neurons are firing more or less in synchrony. "Chaotic dynamics" means your neurons are all over the place, firing at wild intervals, bursting, all of the above. The transition between order and chaos is a "phase change", like ice-water-steam. Just like in an Ising model (when you heat the magnet the dipoles disalign and you lose magnetism), when the neurons disalign you lose coherence and the system goes into a critical state. Right at the edge of that transition, is the optimal point for both calculations and storage. You can actually measure the separability using the KL divergence.

At the critical point, the boundaries between states take on a fractal character and exhibit power law dynamics.

Okay, so how you control this is by controlling the interval between spikes (called the ISI, the inter-spike interval). You can use an inhomogeneous Poisson model of the spiking, or even a leaky integrate-and-fire with noise. You want a renewal process for the ISIs; this way you get a gamma distribution that's very easy to manipulate by simply changing rates (which in this case equates with delays, or ISIs).
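A minimal sketch of such a renewal process (shape, rate, duration, and seed below are arbitrary illustrative values): draw the ISIs i.i.d. from a gamma distribution. Shape = 1 recovers the Poisson case, and changing the rate rescales every delay in the train.

```python
import random

def gamma_renewal_spike_train(shape, rate, t_max, seed=0):
    # Renewal process: ISIs drawn i.i.d. from Gamma(shape, scale=1/rate),
    # so the mean ISI is shape/rate. shape=1 recovers a Poisson process.
    rng = random.Random(seed)
    t, spikes = 0.0, []
    while True:
        t += rng.gammavariate(shape, 1.0 / rate)
        if t > t_max:
            return spikes
        spikes.append(t)
```

With shape 2 and rate 20 the mean ISI is 0.1 s, so a 100 s run yields on the order of a thousand spikes; doubling the rate halves every delay without changing the shape of the ISI distribution.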

These notes explain the math in detail.


With this math and Scruffy's topological model you can tie everything together.

The good news is, you can do this in hardware too. Right now, today. Two different ways! You can use memristors, and you can use bitstream generators in the binary case.

This mechanism will absolutely work, I guarantee it. I will prove it in the coming weeks. With this mechanism you can generate precisely timed signals along the timeline, from phase encoded memories, in something approximating real time.
 

Can I trade mine in for a newer model?
 
I find your model interesting, the extensions are also interesting.

He has quite an interesting past:

But I lucked out, got a high number in the last year of the draft and then the war ended.
He avoided combat... Keep that in mind as you keep reading his "accomplishments". Such as this one:




And I'm very proud of my son, he's been arrested three times already

I happen to be a credentialed security expert with LOTS of field experience,

I have all kinds of great Navy stories.

Yesterday I was in the music studio with Dr Dre's horn section.

I'm a seasoned combat veteran and I know bullshit when I see it.

Jeez - we live in the middle of Oklahoma, 30 miles from the second largest Indian reservation in the country.

I used to run security for a big medical provider.

For the record, I've been smoking a Schedule 1 substance since I was 9 years old.

in 1979 I was decoding cavitation noises from Soviet submarines

Yesterday I helped a food bank in one of the poorest areas of Philadelphia. This morning I helped an old lady across the street in the middle of rush hour. Tonight I'm putting together a concert in the park for charity. Tomorrow I'll be donating four enormous boxes of designer clothing to the local Goodwill. Sunday I'm leading the choir at the local church. Monday I'm helping a store owner avoid bankruptcy by offloading his backlog of unfixable electronics.
 
Okay. For my first trick, I'll get my robot to correctly alternate between smooth pursuit and saccades.

The rules are:

1. Use smooth pursuit until the error exceeds threshold, then initiate a saccade to correct the error.

2. If the error is over the threshold to begin with, use a saccade to foveate the target.

3. After either (1) or (2), once the target has been (re)acquired and the error is below threshold, (re)initiate smooth pursuit.

4. If the eyes happen to be on target to begin with, condition (3) applies and begin smooth pursuit right away.

The input is a retinotopic map identifying the target. (The location of the target comes from the bounding box logic). This then becomes a map relative to current eye position. The error is target position - current eye position. When the target is foveated the error is 0 by definition.
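The four rules collapse to a tiny per-tick decision on the error signal. A minimal sketch (one spatial dimension; the threshold and pursuit gain are placeholder values, and the saccade is modeled as an instantaneous jump, ignoring real saccadic latency and ballistics):

```python
def eye_controller(target, eye, threshold=2.0, pursuit_gain=0.3):
    # One control tick. Error = target position - current eye position.
    # Above threshold: saccade, modeled as an instantaneous jump to target.
    # At or below threshold: smooth pursuit closes a fraction of the error.
    error = target - eye
    if abs(error) > threshold:
        return target, "saccade"
    return eye + pursuit_gain * error, "pursuit"
```

Called in a loop, this reproduces rules 1 through 4: a large initial error triggers a saccade, and once the error drops below threshold the controller stays in pursuit, including the trivial case where the eye starts on target.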

This system will self organize, except for the following:

1. The threshold value.

2. The axes of the retina.

3. The selection of the target.

Number 3 will change when we get to attention, next round. (Attention is hard, it's like meta-learning).

Number 1 is assumed to be a network attractor, which means it's determined by neural wiring and synaptic weights. This attractor can be guided by a single genetically programmed setpoint.

Number 2 is genetically programmed by extracellular chemical gradients. That requires two molecules (one per axis), so between 1 and 2 we require only 3 genes to program the entire network.

To be safe we probably want to use two extra genes for the vergent and torsional components of eye movements, which don't directly align with retinal position. Not sure if we'll need them, but we'll note them anyway.
 
Wait -- damn -- I forgot what I was going to say.
 
This project replicates a big part of the brain. It has a sensory system, a motor system, and a closed loop control system that's required to optimize around "now" (T=0). It is, in effect, my timeline - which I'll then use to show how voluntary eye movements work. (I know how they work, but no one else does, so I'll just have to show em).

And, this project will contribute to medical science in countless ways. Doing this will expose the reason for the existence of some important clinical indicators. Eye movements are indicative of a lot of things, you can use them to predict Parkinson's, Alzheimer's, even schizophrenia.

And, from a hobby standpoint, this is a great engineering project. Just the idea of ocular drift (and correcting it) is enough to keep most engineers up at night. You can check out those Chinese robots, they don't drift. Their eye movements (if they even have any) are very unnatural.

Think of the commercial applications! I want to do this with a Raspberry Pi, which costs all of 50 bucks. For like 150 bucks you could have the world's best security camera. All that wasted footage in your overnight stream would go away instantly, leaving you with a direct view of any targets along with their information. Once the attention system is in place you can program it to look for faces. The possibilities are endless.

The dog and pony show is just to have the robot track you around the room. Trust me, it's really really scary to see a machine with human eyes. You won't forget the experience.

All of this is self organized. Driven by the data, by the statistical organization of natural visual scenes. All the control systems are closed loop, although there may be some performance benefit in having an initial ballistic component. The information theoretic aspects of eye movements are fascinating, they maximize the information transfer between the world and the brain. Check this out - you see the boundary, but what's so attractive about the ear?

 
Okay well, the bounding box is already working. Piece of cake. I'm using the YOLO Docker image, I hooked it up to my webcam, it works. The disappointing part is I'm only getting about 8 fps out of the Raspberry. On the PC I have an Nvidia card with CUDA, it's giving me the full 60 fps. YOLO uses PyTorch, so I may have to split the workload between more than one Raspberry. Which is painful, because it requires about 1 MB of shared memory to get the point map back and forth. Or sending 1 MB between machines every 1/60 second.

The other possibility is to go fully modular but then the cost multiplies and software management becomes complex and the device grows and the power requirements grow.

Maybe I can do it with 8 fps. We'll see.
 
Coupla world class links for you guys.

This is everything you need to know about information geometry.


This is more than you ever wanted to know about information geometry.


Thank me later. :p
 