Exciting new evidence supports Scruffy's brain model

This is good. In the first 100 iterations (0.1 nanoseconds) we arrive at a stable optimal partition of a complex data set.

[attached GIF animation]


Yeah baby. That's what I'm talkin about.

[attached GIF animation]
 
At this point some of the support mechanisms start to make sense, yes?

Think information theory: S = -Σᵢ pᵢ log pᵢ

Before you can estimate information, you need to know the probability. Which means you need the distribution.

To get the probability, you have to be able to separate the event from the background, which usually involves labeling (assigning a name to it) and estimating the distribution from sparse data using Bayesian inference.
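As a toy example, once events are labeled and counted you can estimate the distribution and plug it straight into that entropy formula (the counts here are made up for illustration):

```python
import numpy as np

# Hypothetical event counts after labeling and separation from background
counts = np.array([40, 30, 20, 10], dtype=float)

p = counts / counts.sum()        # empirical probability distribution
S = -np.sum(p * np.log2(p))      # Shannon entropy, S = -sum p log p (in bits)
print(round(S, 3))               # ~1.846 bits
```

The point being: no counts, no distribution; no distribution, no entropy estimate.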

The first step is object individuation and memorization, triggered by the late component of the P300 event-related potential, which comes out of the dorsal attention stream and signals "I don't have this information".

This triggers a memory process that's very different from scene mapping. Here you're not putting together episodes; you're doing the exact opposite. You're removing time-related information so you can store the invariances. Scene mapping results in memory traces in the dorsolateral prefrontal cortex, whereas object individuation results in memory traces in the inferior temporal cortex.

To estimate the distribution, you can do one of two things:

A. optimize thermodynamically using an energy function

B. draw from a library of distributions, weighting each one to approximate your data

Method A is the Hopfield approach (Ising model, statistical mechanics); method B is a mixture network (information geometry).
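Here's a minimal sketch of method A: a tiny Hopfield net in NumPy that stores one pattern in an Ising-style energy landscape and recovers it from a corrupted cue by descending the energy function (pattern and sizes are purely illustrative):

```python
import numpy as np

# Store one pattern with the Hebbian outer-product rule
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)            # no self-connections

def energy(s):
    # Ising-style energy; asynchronous updates only ever lower it
    return -0.5 * s @ W @ s

state = pattern.copy()
state[[0, 3]] *= -1                 # corrupt two units
e_start = energy(state)

for _ in range(5):                  # asynchronous sign updates
    for i in range(len(state)):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(np.array_equal(state, pattern))   # True: stored pattern recovered
print(energy(state) < e_start)          # True: energy strictly decreased
```

Method B would instead fit a weighted sum of library distributions to the data; same goal, different geometry.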

The resulting distribution then becomes your initial guess for future Bayesian inferences.
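As a toy illustration of that hand-off, here's a conjugate Beta-Bernoulli update in plain Python: the previously estimated distribution becomes the prior, and each new observation updates it in closed form (the pseudo-counts and data are made up):

```python
# Prior pseudo-counts carried over from the earlier distribution estimate
alpha, beta = 4.0, 6.0

# Sparse new binary observations
observations = [1, 1, 0, 1]

# Conjugate update: each observation adjusts the pseudo-counts
for x in observations:
    alpha += x
    beta += 1 - x

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)   # (4+3) / (4+6+4) = 0.5
```

The posterior then becomes the prior for the next round, which is exactly the "initial guess" role described above.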

Attached to each object is a value that indicates what it's good for (what its capabilities are, so to speak). This is yet a third kind of memory: it happens in the medial prefrontal cortex, and the contextual input for scene mapping arrives via the nucleus accumbens. The three kinds of memory come together in the so-called VLAM, the vision-language-action model: an LLM with vision attached to it, used in robots with motor systems.


VLAMs are extensions of the VLMs that power AI systems like Canva: you give it a command in English and it generates an image for you. CoPilot is another example.

The "actions" are intended for warehouse workers, but you can just as easily have a kung fu master. In my case they'll be simple eye movements. The input will be a visual scene with moving objects and a command saying which object to target. The output will be a sequence of eye movements designed to maximize the information gained from the target.
 
Here's a nice bit on generative neural networks. In machine learning terminology these are called "variational auto-encoders". This video shows you why you need information geometry. Around the 3 min mark, they show you exactly why.



It's for recall. This is what gives you the context for scenes. If you try to recall a distribution from an ordinary synaptic matrix you'll get garbage. You need to "shape" the data so objects with similar features are clustered together in memory space. This way the extent of your context is determined by a simple radius. Which makes it perfectly suitable for hot spots with near-critical oscillators.

The hot spots are in the input (say, a visual image), but they may not be the most important features you need to look at. That's what attention is for. If you're a rat in a maze and your reward pellet is on top of one of three brightly colored pedestals, the three pedestals will be the hot spots in the image, but the tiny little pellet is the feature you want to look for.
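A toy sketch of that radius-based recall, with made-up coordinates and names: objects with similar features sit close together in memory space, so "context" is simply everything within a radius of the cue.

```python
import numpy as np

# Hypothetical feature coordinates after shaping the memory space
memory = {
    "pellet":    np.array([0.9, 0.8]),
    "pedestal1": np.array([0.2, 0.1]),
    "pedestal2": np.array([0.25, 0.15]),
    "pedestal3": np.array([0.3, 0.05]),
}

def recall(cue, radius):
    # Context = everything within a fixed radius of the cue
    return sorted(name for name, v in memory.items()
                  if np.linalg.norm(v - cue) <= radius)

# Cue near the pedestals pulls in all three, but not the distant pellet
print(recall(np.array([0.22, 0.1]), radius=0.15))
```

Attention would then be what overrides the radius and drags the search toward the pellet anyway.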

This stuff is actually pretty easy once you get going. You can write a variational auto-encoder in 8 lines of code in PyTorch. But it takes 40 lines of code to plot the results and 200 lines to vectorize your data into probabilistic form suitable for presentation to the machine.

Nevertheless - this mechanism provides you with everything you need to convert between egocentric and allocentric reference frames. Because you have a sequence of data that unfolds in real physical time (that's allocentric), and then you have an organism with a neural network that only thinks in terms of "previous" and "next" (that's egocentric, past and future). You take the allocentric frame, align it and compactify it, and then you have a "loop" topology that dovetails with both brain wiring and the Riemannian manifolds needed to represent the features of sequences of data. If you can represent your data set as clusters in three dimensions you can parametrize it with just three numbers. With the variational auto-encoder you can generate thousands of examples of a cluster with just two angles and a radius. If your distribution is Gaussian you can map the mean and variance, and if you're Poisson, lambda is your rate and you can use the Fano factor to manipulate the variance.
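A quick sanity check of those parametrizations with simulated data (illustrative numbers): a Gaussian cluster really is pinned down by its mean and variance, and a Poisson process by its rate lambda, whose Fano factor (variance divided by mean) sits at 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian cluster: two numbers (mean, variance) describe it completely
gauss = rng.normal(loc=2.0, scale=0.5, size=100_000)
print(round(gauss.mean(), 1), round(gauss.var(), 2))   # ~2.0, ~0.25

# Poisson process: lambda is the rate, and variance/mean (Fano factor) ~= 1
pois = rng.poisson(lam=4.0, size=100_000)
fano = pois.var() / pois.mean()
print(round(fano, 1))                                  # ~1.0
```

Pushing the Fano factor away from 1 is then how you'd dial the variance up or down relative to the rate.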

This method may or may not be faster than thermodynamic optimization; we'll have to see. Tensor math is very powerful, but there are a lot of calculations to get to an information manifold.
 
Great and entertaining vid on the state of the art in robotics

 
All this evidence supports Scruffy's hypothesis. Which is specifically:

1. The perceptual part of our awareness requires the timeline. IS the timeline. There is no subjective experience without it.

2. The conscious part of our awareness is something entirely different, it's an unfolding of physical time into the timeline that involves active tracking of predictions along the timeline. It still requires the timeline, that's how we "perceive" our conscious thoughts.

3. The boundary of these two processes is in the circuitry around the hippocampus that performs three essential functions all at once:

a. translation of reference frames
b. scene mapping
c. short term memory

These three functions together enable navigation.

4. The two essential computational components of all neural circuitry are:

a. predictive coding
b. error (energy) minimization

5. A timeline about a second wide, consisting of 100 billion neurons, enables computational precision in the sub-picosecond range. This equates to the update time in an MCMC model (which includes the Hopfield model and many others when they're implemented on machines), and it's about equivalent to a CPU generating random numbers at 1,000 GHz.

6. Noise is error. To distinguish measurement noise from process noise, a timeline is required. There's no other way to do it. Auto-correlations and cross-correlations are required, which means you have to slide the signals against themselves, and the cheapest and easiest way to do that is with a compactified timeline you can simply rotate at any given scale. Hence the earring topology.
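A toy version of that rotate-and-correlate trick on a ring of samples: once the signal lives on a compactified "loop" timeline, sliding it against itself is just a rotation (np.roll here), and the autocorrelation at every lag falls out in one pass.

```python
import numpy as np

t = np.arange(64)
signal = np.sin(2 * np.pi * t / 16)    # period-16 test signal on a 64-sample ring

def circular_autocorr(x):
    # Rotate the ring by every lag and correlate with the unrotated copy
    x = x - x.mean()
    return np.array([np.dot(x, np.roll(x, k))
                     for k in range(len(x))]) / np.dot(x, x)

ac = circular_autocorr(signal)
print(round(ac[0], 3), round(ac[16], 3), round(ac[8], 3))  # 1.0 1.0 -1.0
```

Full-period rotations line the signal up with itself, half-period rotations anti-align it, and no zero-padding or edge handling is ever needed, which is the whole appeal of the earring topology.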

7. The need for this architecture to self-organize explains ALL the different kinds of synaptic plasticity, including the time scales.

And even with all this, I still can't tell why red is red. That requires a lot more study and experimentation. For sure it's related to oscillators somehow. But after this, with Scruffy's model, the computational part becomes straightforward.

Here endeth the thread, I guess.

This is really no different from building an electronic circuit. You have components, and you have wires. The difference is that in both electronics and biology you have hundreds of different components available to choose from, but in biology there are different kinds of wires too. The wires themselves have computational properties, they become more like components that way. But... y'know, same principle. You color code your wires, put the blue one here and the green one there, according to the schematic, and the circuit magically starts working. Physics is great that way. Works every time.
 