This is what Friston so elegantly points out in the video. In machine learning, the machine is given a goal. Friston uses the example of a hungry owl ("find food, find a mouse"): the owl will then scan the visual field looking for a mouse. But machines don't work that way; a machine will find the optimal algorithm to minimize the hunger signal it's experiencing, never once engaging in active search.
Active search is a "capability", so whenever it's brought into play it has to be staged along the timeline. That's what the generative part of the frontal cortex does: it puts events into the timeline at T > 0. The basal ganglia are responsible for the subsequent tracking (again: caudate nucleus, whole-brain map; putamen, whole-brain map). The staging includes the expected sensory consequences.
So "minimizing error" involves updating either the goal or the environment, and sometimes updating the environment is impossible (politics being an example). We are sometimes left with "residual error", which may cause discomfort whenever it enters our consciousness.
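The two routes to minimizing error, and the residual that remains when the environment can't budge, can be caricatured in a few lines. This is a toy model of my own construction, not anyone's actual algorithm; the function name and numbers are hypothetical:

```python
def residual_error(goal, world, can_change_world):
    """Minimize |goal - world|: act on the world when possible;
    otherwise the mismatch persists as residual error (toy model)."""
    if can_change_world:
        world = goal              # action drives the environment to the goal
    return abs(goal - world)      # whatever mismatch remains

# Environment is changeable: error is driven to zero.
zero = residual_error(goal=5, world=2, can_change_world=True)
# Environment is fixed (politics, say): a residual remains.
stuck = residual_error(goal=5, world=2, can_change_world=False)
```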
"Residual error" is generative of emotions. When you can't do something, maybe you get frustrated; you say "God, this sucks" even while knowing full well it's your fault. Unresolved residual error can be deconstructed easily; it happens every day in places like AA. On the other hand, an inability to minimize residual error affects self-esteem. It can generate anger and fear.
According to this model, then, the capability to predict is necessary for perception to occur. There can be no perception without prediction. In the human visual system, the oculomotor tremor during fixation runs at about 100 Hz and covers about 0.004 degrees of arc in the visual field (about 0.24 arcminutes, which is considerably larger than the receptive field of a foveal cone). That means that no matter what the visual system predicts, it will always be wrong: by a very small amount, but still wrong. During a typical fixation of 1/3 second, at 100 Hz, the visual cortex will receive 33 slightly different images of the same scene. Gradient descent can thus occur 33 times, no more. In that window, the visual system has to model the image and generate the next prediction (for what it will see after the next eye movement).
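That iteration budget can be sketched in a few lines. This is a deliberately crude illustration, assuming one gradient step on the prediction error per retinal sample; all numbers and names here are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

scene = rng.normal(size=100)       # the scene (hypothetical stand-in for an image)
prediction = np.zeros(100)         # the system's current estimate
lr = 0.2                           # learning rate (arbitrary)

# One fixation of ~1/3 s at ~100 Hz retinal updates yields ~33 error signals,
# so the estimate can be refined only 33 times before the next saccade.
for step in range(33):
    sample = scene + rng.normal(scale=0.05, size=100)  # tremor-jittered input
    error = sample - prediction    # prediction error for this sample
    prediction += lr * error       # one gradient-descent step on squared error

residual = np.mean((prediction - scene) ** 2)
```

After 33 steps the estimate has nearly converged; the tremor noise guarantees the residual never reaches exactly zero, which is the point of the passage above.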
How the visual system does this is very clever. It breaks the 1/3-second window into three windows of 1/10 second each (at the alpha frequency) and optimizes each image three times; this way, any noise due to the microtremor cancels itself out. The last estimate goes into the hippocampus with a delay of about 200 msec, and there, if the free energy indicates surprise, you get a P300 about 100 msec later (just about enough time for a round trip through the frontal cortex: a context search to verify that the information really is surprising).
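A minimal sketch of the cancellation idea, assuming the three-window scheme amounts to averaging three independent estimates so the zero-mean tremor jitter washes out (the function name and the specific numbers are my assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

scene = 1.0                      # a single feature value (toy stand-in for the image)
tremor_sd = 0.3                  # jitter injected by the microtremor (arbitrary)

def window_estimate():
    # One ~100 ms alpha window: ~10 jittered samples at 100 Hz,
    # averaged into a single estimate of the scene.
    samples = scene + rng.normal(scale=tremor_sd, size=10)
    return samples.mean()

# Three successive windows within one 1/3 s fixation; their average is the
# final estimate that (per the text) is forwarded to the hippocampus.
estimates = [window_estimate() for _ in range(3)]
final = float(np.mean(estimates))
```

Averaging 30 jittered samples shrinks the noise standard deviation by a factor of sqrt(30), which is one plausible reading of "the noise cancels itself out".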
(btw I can give you references for all this stuff if you'd like)
Free energy is a unifying principle that ties physics and psychology together. It is accessible mathematically through statistical thermodynamics, as first shown by Shun-ichi Amari in 1978 (the godfather of information geometry; if life is fair he should be the next Nobel Prize winner). The brain is a physical device, and it obeys physical principles. It is agnostic to the data, except insofar as said data may create unresolved residual errors.
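For reference, the standard variational form of the free energy, in conventional notation (this particular decomposition is mine to spell out; the text only alludes to it): for observations $o$ and hidden states $s$,

```latex
F = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
  = \underbrace{D_{\mathrm{KL}}\!\left[\,q(s)\,\|\,p(s \mid o)\,\right]}_{\ge 0} \;-\; \ln p(o)
```

Since the KL term is non-negative, minimizing $F$ simultaneously fits the approximate posterior $q(s)$ to the true posterior and drives $F$ toward the surprise $-\ln p(o)$, which it upper-bounds.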
One of the specific predictions of the free energy model was the need for a closed control loop in the oculomotor system, first noticed by David Robinson in 1981 (although he was unable to prove it at the time, because the free energy principle didn't exist yet; the predictive coding model behind it was introduced in 1999 by Rajesh Rao and Dana Ballard at the Salk Institute, working with Francis Crick of DNA fame and Terry Sejnowski, who built NetTalk, the first artificial neural network that learned to read aloud all by itself). The closed loop in the oculomotor system has now been found: it starts in the palisade endings of the ocular muscles, and the reason it wasn't found before is that ocular proprioceptors enter through the trigeminal nerve, not the oculomotor nerves. No one yet knows exactly how this system works, but we can see its effects. And no one knows "why 100 Hz" either: whether this is an oscillation in the motor neurons themselves or a property of the network. But we know for sure the closed loop is there, and this year we'll find out how it develops.
Here's an easy way to modify your perception: go to Disneyland and put your hands on the Electric Genie on Main Street. Note carefully what you perceive while you're getting 60 Hz AC. (You'll perceive stuff, for sure.) Human beings have no electric sense; we don't have lateral line organs. But you're sure as hell going to perceive that electricity. You have my word on it.
