time series, windows, and fractals

scruffy

A common technique in time series analysis is the "moving window" (either the time series flows through the window or the window moves along the time series).

Statistics are calculated "within the window"; for example, when analyzing a stock ticker you might want to look at a 30-day or 90-day moving average.

Usually people use fixed-size windows, and in that case, the larger the window the smoother the result. For example, the average (mean) will change less with a 90-day window than with a 7-day window. The shorter window will tend to pick up the "jaggies" in the daily trading.
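
For concreteness, here's a minimal sketch of the fixed-window idea in pandas (the synthetic price series and the window sizes are just placeholders):

import numpy as np
import pandas as pd

# Hypothetical daily closing prices -- a stand-in for a real ticker feed.
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 500).cumsum(),
                   index=pd.date_range("2022-01-01", periods=500, freq="D"),
                   name="close")

# Fixed-size moving windows: the wider the window, the smoother the curve.
ma7 = prices.rolling(window=7).mean()    # picks up the daily "jaggies"
ma30 = prices.rolling(window=30).mean()  # smoother
ma90 = prices.rolling(window=90).mean()  # smoothest, lags the most

print(pd.concat([prices, ma7, ma30, ma90], axis=1,
                keys=["close", "ma7", "ma30", "ma90"]).tail())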

So this situation is very much like Benoit Mandelbrot's famous paper on fractals, "How Long Is the Coast of Britain?". The answer is: it depends on the size of your yardstick. A short ruler will pick up the jaggies, whereas a long measuring stick will give you distance as the crow flies.

In fractal geometry there is a way of calculating how the measurement changes with the size of the yardstick - but only for certain types of functions, those that are "self-similar". In information theory this concept is extended into "relative entropy", via the KL divergence or Rényi's generalization of entropy.
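
To make the yardstick idea concrete, here's a rough sketch of the standard ruler (divider) method applied to a synthetic series - not Mandelbrot's own procedure, just the textbook version, with a random walk standing in for a real ticker:

import numpy as np

def yardstick_lengths(y, steps):
    """Measure a curve's apparent length with rulers of different coarseness.

    y     : 1-D signal sampled at unit spacing (the "coastline")
    steps : stride lengths to use as yardstick sizes
    """
    lengths = []
    for k in steps:
        pts = y[::k]                       # coarser sampling = a longer ruler
        dy = np.diff(pts)
        lengths.append(np.hypot(float(k), dy).sum())
    return np.array(lengths)

# Synthetic rough series: a random walk standing in for a price track.
rng = np.random.default_rng(1)
y = rng.normal(0, 1, 4096).cumsum()

steps = np.array([1, 2, 4, 8, 16, 32, 64])
L = yardstick_lengths(y, steps)

# Richardson's relation: for a self-similar curve the log-log plot of measured
# length against yardstick size is a straight line with slope 1 - D.
slope, _ = np.polyfit(np.log(steps), np.log(L), 1)
for k, l in zip(steps, L):
    print(f"yardstick {k:3d}  measured length {l:10.1f}")
print("log-log slope:", round(slope, 3), "(= 1 - D for a self-similar curve)")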

In the stock market, the purpose is to predict the next stock value. "Trends" can be seasonal, or industry related, or tied to interest rates or closing days. Stock tickers can also be self similar, meaning they have fractal content tied to volatility. In the latter case, changing the window size can reveal some of the fractal content.

Wall Street has been using neural networks for time series analysis and prediction for years. The networks are "trained" with historical time series, on the assumption that historical behavior is likely to continue. Typically such analysis is an overnight affair: historical data is "passed through" various size windows and the most recent trends are used to strategize the next day's trading. Typically such analysis is also multidimensional - they don't analyze just one ticker, they analyze all stocks within an industry, or a basket, or an index. When the stocks are related, it looks very much like a retina, where each pixel is its own time series. And, much like the human visual system, the analytics extract periodicity in both time and space, along with the covariances and relative entropies.

So here is my question: how are the various window sizes related to each other? If you pick up a periodicity in a 7 day window that doesn't appear in a 30 day window, what does it mean?

Mandelbrot would tell you it means you have pebbles of a certain size, along your coastline. They only become visible within a narrow range of yardsticks. If the yardstick is too big, they won't be visible at all. If the yardstick is too small, they'll get lost in the molecules of sand. If the yardstick is somewhere between the size of a pebble and the distance between pebbles, you'll start seeing them.

But this requires one of two things: either foreknowledge of the size of the pebbles (so you can choose the proper sized yardstick), or a "sweeping window width" which then requires further analysis to identify which widths show the pebbles.

A sweeping window width is computationally expensive, because you have to pass the data through multiple times, once for each width. But what if you could do all the widths at once, in parallel? Then you could just look for the peaks and there's your periodicity. The issue becomes one of visualization. How do you visualize stochastic periodicity, for instance where the pebbles are all the same size but the distance between pebbles varies widely? Or vice versa?
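
Here's one hedged way to sketch a "sweep all the widths" pass in plain numpy - the synthetic series and the knee-reading interpretation are mine, purely for illustration:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(2)
# Synthetic series: a slow drift plus a ~20-sample oscillation plus noise.
t = np.arange(2048)
x = 0.001 * t + np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.3, t.size)

widths = np.arange(2, 101)
in_window_std = np.empty(widths.size)

for i, w in enumerate(widths):
    windows = sliding_window_view(x, w)            # every window of width w
    in_window_std[i] = windows.std(axis=1).mean()  # average roughness at that scale

# The average in-window spread rises roughly until the window spans a full cycle
# of the hidden oscillation, then levels off - the "pebble size" shows up as the
# knee in this scale profile.
for w, s in zip(widths[::7], in_window_std[::7]):
    print(f"width {w:3d}  mean in-window std {s:.3f}")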

The math gets pretty hairy. For example:


Can anyone suggest a better way?

Here's a hint. This one's called a "mesochronic plot".

[attachment: mesochronic plot]


Here's another version of it:

[attachment: mesochronic plot, another version]


Note the things that look like planes. What do they mean? Note the axes f1 and f2: those are frequencies; in the case of pebbles they would be spatial frequencies. The equidistant planes are showing us an orientation. What does it mean?

This is very cool stuff. Our brains do this in real time, whereas it takes a Python program overnight to calculate the orientation of the planes. A further hint: the planes are covectors. The blue picture was generated with TensorFlow, Google's open-source machine learning library.
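
Not the mesochronic plot itself, but here's a generic sketch of how a plane's orientation falls out of the 2-D spectrum of a "retina" of related series - the toy field, its frequencies, and the f1/f2 labeling are all made up for illustration:

import numpy as np

# A toy "retina": 64 related series (space axis) over 512 ticks (time axis),
# all carrying one traveling oscillation plus noise.  The frequencies are
# chosen to land exactly on FFT bins; everything here is illustrative.
rng = np.random.default_rng(3)
time = np.arange(512)[None, :]
space = np.arange(64)[:, None]
f_t, f_s = 16 / 512, 8 / 64              # temporal and spatial frequency
field = np.cos(2 * np.pi * (f_t * time + f_s * space)) + rng.normal(0, 1.0, (64, 512))

# In the 2-D power spectrum a plane wave shows up as a sharp peak at (f1, f2),
# and that frequency pair is the normal (a covector) to the constant-phase planes.
spec = np.abs(np.fft.rfft2(field)) ** 2
spec[0, 0] = 0.0                         # ignore the DC term
i, j = np.unravel_index(np.argmax(spec), spec.shape)
f1 = np.fft.fftfreq(64)[i]               # spatial frequency axis
f2 = np.fft.rfftfreq(512)[j]             # temporal frequency axis
print(f"dominant plane normal ~ (f1={f1:.3f}, f2={f2:.3f})")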

We are trying to visualize the pebbles along the coastline of Britain, with no knowledge whatsoever about their size and shape, or even their existence.
 
I used TensorFlow in the past, pre-Covid. It was a passionate, personal interest of mine at the time; I still have about a dozen fully filled notebooks from my journey to understanding ML, of which TensorFlow was one module/library I learned about and leveraged. At the time, I actually went further and tried to understand the black box behind the coding. TensorFlow, I believe, was a Google invention and made everything easier, but I felt it was "cheating" per se, as I wanted to understand the underlying math.

Covid took me away from all of that at a time when I was really in the flow. Then the evils of life hit me hard, but I digress :(
In fact, if I recall correctly, the code for calling the module in Python was "tf.(function)", am I right? That is how often I used it, along with Keras and others which I could probably remember if I sat and focused on trying to.

Regardless, the time series for stock transactions is very tricky, and I wrapped my mind around this for a few hours, as I had designed a simple chat bot (based on NLP, natural language processing, pointing to a website for its cortex) and watched videos of others who claimed to have successfully designed a time series bot.

I concluded that there are a number of difficulties in trying to design such an application based on historic information, not least of which, as you suggest, is the duration you choose and how the durations intersect. Determining the yardstick to apply is just one aspect, and the mind can spin from the infinite factors involved in trying to narrow down the correct measurements to return optimal results. It might not be impossible, to be honest.

I'm not trying to avoid the question, just to point out that more questions arise even if you answer your initial inquiry, and they are numerous, I assure you.

As it is all theoretical, I might suggest a weighted average applying the premises you included. It's something I had considered if I were to pursue this; I'm sure I have the code saved in some Python file. Essentially you apply a weighted average based on the time duration you believe is most accurate (90 days instead of 7 days, for instance). You can adjust the weights based on duration; in my case I was looking at a range from five years down to a few days or so. Again, it was just theory.

In your case, perhaps you apply a weighted average placing more weight on the 90-day period and less on the 7-day period, find some modal or mean value, and proceed in that manner. I assure you there are many more factors to include, though; there are already A.I. mutual funds. I'm sure with the right minds and the right questions answered there may be more efficient and accurate models, but stocks include human nature and large institutional investors who can alter a price, especially more junior stocks with far smaller floats than, say, a Microsoft or McDonald's.
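
Roughly what I mean, sketched in pandas (the weights and the synthetic series are arbitrary, just to show the shape of the idea):

import numpy as np
import pandas as pd

# Hypothetical close prices, a stand-in for a real feed.
rng = np.random.default_rng(4)
close = pd.Series(50 + rng.normal(0, 1, 400).cumsum(), name="close")

# Rolling means at the two horizons under discussion.
ma7 = close.rolling(7).mean()
ma90 = close.rolling(90).mean()

# Weighted blend, weighting the longer horizon more heavily as suggested.
w7, w90 = 0.25, 0.75          # arbitrary illustrative weights
blend = w7 * ma7 + w90 * ma90

print(pd.concat([close, ma7, ma90, blend], axis=1,
                keys=["close", "ma7", "ma90", "blend"]).dropna().tail())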
 
This is the magic topology. (You've seen it before).

[attachment: the "Hawaiian earring" topology diagram]


Timeline for the time series is horizontal across the top. Future to the right, past on the left, information moves from right to left. Stuff to the right are predictions, stuff to the left is memory. The intersection of the circles is "now". Circles are generated by compactifying various intervals ("window widths"). They are projection mappings. A vertical slice through all points at infinity gives you the relationship between the widths. e1...en are mapped with information geometry using the KL divergence. Distances between e's are mapped with the Fisher information metric.
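
One hedged way to make the e1...en idea concrete: fit a simple per-width model (a Gaussian here, purely as a stand-in for whatever the network actually learns) and tabulate the pairwise KL divergences. The Fisher information metric would then be the local, second-order version of this table; the sketch below only computes the KL part.

import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """KL( N(mu1,var1) || N(mu2,var2) ), closed form for 1-D Gaussians."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# Hypothetical price series; each window width gets its own little "model"
# (a Gaussian fitted to returns aggregated at that width).
rng = np.random.default_rng(5)
price = 100 + rng.normal(0, 1, 5000).cumsum()
widths = [7, 30, 90]

models = {}
for w in widths:
    r = np.diff(price[::w])          # coarse-grained returns at that width
    models[w] = (r.mean(), r.var())

# Pairwise relative entropies between the per-width models (the e1...en above).
for a in widths:
    for b in widths:
        kl = gaussian_kl(*models[a], *models[b])
        print(f"KL(width {a:>2} || width {b:>2}) = {kl:8.3f}")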

Neural networks perform these computations in real time, literally milliseconds. You can build a chip to do this with memristors.
 
The .png image isn't showing for me, just the file name.

I don't doubt the speed of the calculations, but how would this apply to the buying and selling of stocks for profit? Trying to gauge the market is only half the battle, and that obviously doesn't include breaking news on a stock that can impact it negatively or positively. Even absent that consideration, how would you buy and sell so maximally? Buying and selling isn't instant (not as instant as the calculations, at least) and there are broker fees.
 
The .png image isn't showing for me, just the file name.

I don't doubt the speed of the calculations, but how would this apply to the buying and selling of stocks for profit?

The usual use of a predictive neural network is to generate the predictions on the right. It does this using the backward arrows in the pic, which usually means error correction through back propagation and gradient descent.

Trying to gauge the market is only half the battle, and that obviously doesn't include breaking news on a stock that can impact it negatively or positively.

Exactly. So in this case, there will be an additional neural network layer with nodes connected to the points at infinity on the circles, which are those directly opposite their intersection. These nodes represent the current time series model for each window size. The network will learn the relationships between the window sizes.

Your example is a special case of "volatility"; we could call it a point event with discontinuous volatility. When the inputs no longer match the training model the network will react: the state will move away from its learned attractor. This is the same way fraud detection networks work.
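
Stripped of the network, the attractor idea can be sketched like this - here the learned attractor is just stood in for by the training-set statistics of some per-window features, and everything below is illustrative:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Training phase: learn what "normal" per-window statistics look like.
rng = np.random.default_rng(6)
train = rng.normal(0, 1, 5000)                 # stand-in for historical returns
w = 30
train_windows = sliding_window_view(train, w)
feats = np.c_[train_windows.mean(axis=1), train_windows.std(axis=1)]
mu, sigma = feats.mean(axis=0), feats.std(axis=0)

def anomaly_score(window):
    """How far the current window's (mean, std) sits from the learned 'attractor'."""
    f = np.array([window.mean(), window.std()])
    return np.linalg.norm((f - mu) / sigma)     # z-scored distance

# A discontinuous-volatility event: the spread suddenly triples.
normal_day = rng.normal(0, 1, w)
news_day = rng.normal(0, 3, w)
print("normal window score:", round(anomaly_score(normal_day), 2))
print("shock  window score:", round(anomaly_score(news_day), 2))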

Even absent that consideration, how would you buy and sell so maximally? Buying and selling isn't instant (not as instant as the calculations, at least) and there are broker fees.

You'll notice that by projecting the bottom half of any circle back onto the original time series axis, we get a reversed time series for free. (The arrows show the direction of information flow). So now we have the forward and backward series needed for causal analysis.

Additionally, since these are projective maps we can "extract" a complex coordinate along an axis, turning it into a real number that enters the network in the usual way. Thus a conformal mapping is just a tensor a + bi where the i is implicit and "drops out" with the simple substitution of an inhibitory synapse for an excitatory one. This explains the prevalence of "biphasic" synapses in the brain, and in spiking networks this becomes nothing more than LTP in one direction and LTD in the other. In oscillatory spiking networks this generates the "phase precession" observed in the hippocampus, so for example this is the same mechanism that results in place cells, grid cells, and time cells. In a 3-d volume of the network the encoding is therefore 4-dimensional, we have effectively "synthesized" an additional dimension. Topologically this lets us treat the information geometry like a fiber bundle, at each point there is a fiber that maps back down into the base space.
 
Unfortunately the image you attached isn't visible. Is the reference to an actual brain an abstract comparison? If not, I don't understand the relevance to a neural network using TensorFlow.
 
Not sure why the PNG isn't showing up for you, I can see it fine. Here are the links to the original image and the Google search:


https://www.researchgate.net/publication/2137630/figure/fig3/AS:669007984226322@1536515341817/The-Hawaiian-earring.pbm

This model is a topological expansion of the fundamental building block of a real brain, which is the reflex loop. (Google "monosynaptic reflex arc").

In TensorFlow there is no convenient way to represent this topology, except for what they call a "recurrent neural network", which computationally is something completely different.

TensorFlow handles "stacks", which are linear sequences of neural network layers. Whereas a reflex arc is more like a control system, using feedback to keep the muscle length (or tension) constant.

To get from a reflex loop to a stack, you have to cut the loop, at the synapse between the sensory neuron and the motor neuron, and lay it out flat, so the motor neuron is on the right, and the sensory neuron is on the left. You get this:

SN <= (now) <= MN

and in TensorFlow your stack would look like this:

C1 <= C2 <= (now) <= C3 <= C4

where the Cn's might be convolutional layers ("feed-forward").

To get from the stack to a loop, you have to join the ends of the stack, like this:

C1 <= C2 <= (now) <= C3 <= C4
^=======================^

Mathematically this is called "compactification", because you're making the topology compact. Beginning with an open interval of the real line which is not compact, you add a single point when you join the ends, which is designated "the point at infinity". This is called an "Alexandrov 1-point compactification", and there are other ways of doing this but this is the simplest and easiest.

So now you have a circle instead of a straight line, and this is a projective map in the Riemann sense. When you do this repeatedly for different size intervals, you get the "Hawaiian earring" construction, which is the PNG. The salient feature is that all the origins are the same point, which we designate as "now" - whereas the points at infinity form a new axis, which is your synthetic dimension.
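
Here's a small numerical sketch of the construction - one circle per window width, all sharing the origin "now", with the single added point at the top. The particular scaling (circle size tracks window width) is my own choice, just to make the earring visible:

import numpy as np

def compactify(t, width):
    """One-point compactification of the timeline, scaled by a window width.

    Maps a time offset t (t = 0 means "now") onto a circle tangent to the
    t-axis at the origin; t -> +infinity and t -> -infinity both land on the
    single added point at the top of the circle.
    """
    theta = 2.0 * np.arctan(t / width)       # stereographic-style angle
    r = width                                 # one circle per window width
    x = r * np.sin(theta)
    y = r * (1.0 - np.cos(theta))             # all circles share (0, 0) = "now"
    return x, y

t = np.linspace(-500, 500, 2001)              # a stretch of the timeline
for width in (7, 30, 90):                      # the window widths in the thread
    x, y = compactify(t, width)
    print(f"width {width:2d}: t=-500 maps to y={y[0]:.2f}, t=+500 to y={y[-1]:.2f}; "
          f"the added point at infinity sits at y={2 * width}")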

So now, if you're in TensorFlow, you add an additional layer, call it a vector V, and you connect it so each Vi is one of the points at infinity, and V0 is the origin (which is "now", the point where all the circles meet).

If you trace the information flow, it goes from right to left along the stack, which becomes the top half of each circle - but then it reverses along the bottom half of each circle and goes from left to right. So if you run a time series through the stack you end up going round and round around the circles.

If you then look at the stack like a shift register, you can map each layer to points in time. So let's say it takes 1 ms for the information to travel from one layer to the next. Then C1(t) becomes f(C2(t-1)) and so on. But we set this up so the origin is always "now", so essentially this whole picture moves through physical time; in this way the stuff on the right becomes the future and the stuff on the left becomes the past, and T=0 is always "now". So this is just a change in the reference frame.

So when you make a prediction, you're using data from the left and your prediction ends up on the right, and the comparison (cost function) occurs at T=0. You're comparing your prediction to the data at "now".

This is the simplest way of presenting the concept. In real life it's slightly more complicated, but still the same topology. In real life your sensory data enters through a separate channel that introduces new information "now" (so the channel enters from the top at the origin T=0), and the comparison is made between this new information and the predictions from the loops. In this way you're always optimizing around "now".

The purpose of the stack layers on the left is to buffer t-n, to extract features and handle non-linearities in the sensory data. The purpose of the stack layers on the right is to make better and better predictions as you get closer and closer to "now". The purpose of the vector V is to control which predictions get emphasized.
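
For what it's worth, here's one hypothetical way to lay this out in tf.keras - the layer sizes, the causal Conv1D choice, and the softmax gating for V are all my assumptions, not a claim about how anyone actually builds it:

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical rendering of the cut-open loop as a Keras stack: causal conv
# layers on the "past" side extract features at increasing depth, a small
# softmax vector V weights those depths (one entry per effective window size),
# and dense layers on the "future" side emit the predictions compared at T=0.

WINDOW, HORIZON, DEPTH = 64, 4, 3

past = layers.Input(shape=(WINDOW, 1), name="past_window")

x, taps = past, []
for i in range(DEPTH):
    x = layers.Conv1D(16, kernel_size=5, padding="causal",
                      activation="relu", name=f"left_C{i + 1}")(x)
    taps.append(layers.GlobalAveragePooling1D(name=f"tap_{i + 1}")(x))

# One summary per depth: the "points at infinity", shape (batch, DEPTH, 16).
flat = layers.Concatenate(name="points_at_infinity")(taps)
tap_stack = layers.Reshape((DEPTH, 16), name="taps")(flat)

# The vector V: softmax weights deciding which depth gets emphasized.
v = layers.Dense(DEPTH, activation="softmax", name="V")(flat)

# Apply V: weight each depth's summary, then pool over depths.
v_r = layers.Reshape((DEPTH, 1), name="V_broadcast")(v)
weighted = layers.Multiply(name="apply_V")([tap_stack, v_r])
context = layers.GlobalAveragePooling1D(name="mix")(weighted)

# Right side of the stack: the prediction ("motor") layers.
h = layers.Dense(32, activation="relu", name="right_C1")(context)
future = layers.Dense(HORIZON, name="predictions")(h)

model = tf.keras.Model(past, future)
model.compile(optimizer="adam", loss="mse")   # error correction by backprop
model.summary()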

So in your scenario where something suddenly changes, it is very likely that the predictions at T>>0 will be faulty, and therefore we would wish to invalidate the entire right side of the stack and replace it with updated predictions. So in addition to the vector V we need a control system built around V that performs this function. In a real brain this manifests as a P300 brain wave, it's like a "reset" of the forward portion of the timeline.
 
Darn, I wish we were having this discussion three years ago. I was a sponge and had learned so much that I was even answering questions on Stack Overflow for others on the topic of ML. I would literally go to sleep listening to advanced coding or theoretical discussions on a specific neural network, or cutting-edge developments. That is the beauty of machine learning: it moves fast and there are infinite amounts of new research and information. I pursued some of the most technical concepts in nature related to neural networks.

Alas, I've forgotten many of the granular details. I get the general idea of what you are trying to achieve, however. What I do remember is that RNNs were the critical NN used for time series data, due to the memory function obviously. I did a course or two on RNNs. I also learned CNNs, which at the time were fairly new, as far as I remember.

I really enjoyed Natural Language Processing as a subject matter and perhaps my favourite model was Reinforcement Learning which to me is the future of AI (unless something more dynamic has surpassed it). Your task is a difficult but noble one, attempted by others. If you perfect this model/application, you could find yourself with deep pocket investors who would support you further no doubt.

If anything you have motivated me to pull out my old notebooks of double-sided details from the courses I completed (literally dozens of them). I had already installed a Python IDE, so who knows... maybe I will be downloading libraries and getting back into coding :)
 
A very interesting subplot in this scenario, is access to a global associative memory store.

In TensorFlow, each layer of a stack can have its own learning rule, and the learning that takes place is layer-specific.

However in this Hawaiian earring model, you can see by inspection that each different interval (window size) wants access to a global information store that's "not" layer specific. So what we can do is embed this entire loop architecture into a planar Hopfield network, which will serve as our global associative memory. This Hopfield network should be "much larger than" any of the layers in the stack, in fact it should contain at least stack_size * neurons_per_stack_layer nodes. Here is a thought experiment to understand how it would work.

Let's say we're looking at a 12x12 basket of stocks, where one axis is the industry and the other axis is the stocks within the industry. And let's say we're looking at this VISUALLY, so the source data consists of 144 separate time series (we can color code them for convenience). At each "tick" in the network (no pun intended) we will feed into the network, a graph of 144 stock tickers, let's say each ticker is one day's worth of trading information.

Initially, we will disconnect the forward portion of the timeline and the vector V, and just allow the convolutional layers at T < 0 to learn the features of the input data. (In other words, in this first phase the stack is like a developing visual system). We expect that after this training phase, the left side of our stack will look very much like the visual system in the human brain - the first layer will respond to orientation (slopes of tickers), the second layer will be translation invariant, the third layer might process color, and so on. Overall the convolutional layers are extracting the "features" of the input data.

Now let's say we turn on the stack layers on the right, so the network starts making predictions. We expect these predictions will occur based on the processing of the corresponding sensory layer for each window size, in other words the first layer at T > 0 will learn to predict according to orientation (which equates with the instantaneous slopes of our 144 tickers), the second layer will generalize invariances, the third layer will predict per-color, and so on. As these predictions develop, we will return error information to each right-sided stack layer using back propagation (or whatever other method is convenient). Eventually we will expect increasing prediction accuracy on the right, as we train this "motor" portion of the timeline. We expect that the predictions will become more specific as we move inward from the extreme right towards T=0, in other words the very next layer at T=1 is going to give us a slope (which we interpret as the amount and direction of the next predicted trade for that stock), whereas the layers on the far right at T >> 0 will be high level (maybe even strategic), and somewhere in the middle at Ti we'll get the relationships between individual stocks and their respective industries.
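
A hypothetical skeleton of the "retina" phase in tf.keras might look like this - the 12x12x30 input shape, the layer names, and the single prediction head are just placeholders for the thought experiment:

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical "retina" input: a 12 x 12 basket (industry x stock) with,
# say, 30 days of returns stacked as channels.
basket = layers.Input(shape=(12, 12, 30), name="basket_returns")

# "Sensory" convolutional layers: small filters that can pick up local
# orientation (co-moving neighbours) and become progressively more abstract.
x = layers.Conv2D(32, 3, padding="same", activation="relu", name="orientation")(basket)
x = layers.Conv2D(32, 3, padding="same", activation="relu", name="invariances")(x)
x = layers.GlobalAveragePooling2D(name="summary")(x)

# One illustrative "motor" head: a next-day slope prediction for every ticker.
next_slopes = layers.Dense(12 * 12, name="predicted_slopes")(x)
next_slopes = layers.Reshape((12, 12), name="per_ticker")(next_slopes)

model = tf.keras.Model(basket, next_slopes)
model.compile(optimizer="adam", loss="mse")
model.summary()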

This completes the training of the stack. Now, we introduce a global associative layer that sits horizontally across the entire stack. It is connected locally per layer, so the neurons on the far left get high level sensory information, and the neurons on the far right get high level motor information. When we train this layer, it will learn the "associations" between the data in each stack layer, at successive points in time. After training, if we allow only one stack layer at a time to access the global store, we expect it will return the predicted configurations of the OTHER stack layers, given that input. While there is some redundancy in this concept, it's not equivalent to knowing the responses of the other layers, because the global layer trains on ALL layers instead of just one at a time. (In real life the sequential access of stack layers to the global store corresponds to alpha brain waves).
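
A toy Hopfield store in numpy shows the flavor of what the global layer buys you - the pattern sizes and the partial-cue scheme are illustrative only:

import numpy as np

# A tiny Hopfield-style associative store, standing in for the "global layer".
# Patterns are +/-1 vectors summarizing a whole stack configuration; recalling
# from one layer's partial view should return the associated global pattern.
rng = np.random.default_rng(7)
N = 256                                           # nodes in the global store
patterns = rng.choice([-1.0, 1.0], size=(5, N))   # 5 stored stack snapshots

# Hebbian outer-product learning, zero diagonal.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def recall(cue, steps=20):
    """Iterate the network from a noisy/partial cue until it settles."""
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

# Cue with only a quarter of one pattern visible (one "stack layer's" view).
cue = np.zeros(N)
cue[:N // 4] = patterns[2, :N // 4]
out = recall(cue)
print("overlap with the stored pattern:", float(out @ patterns[2]) / N)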

Finally we engage our vector V, and allow it to begin managing the weight of each interval in the decision making process. We expect that "under certain global conditions" V will overweight or underweight a particular stage in the stack - and note that "a stage" as defined in V is a sensory-motor pair, consisting of two stack layers with corresponding indices T(-n) and T(+n).

This architecture suggests plenty of useful modifications and enhancements, for example we can let V modify the information coming out of the global store, or vice versa. We can also organize a secondary motor structure whose purpose is to maintain the trading "posture" (or position). If we want a single signal that says "trade now" we can easily get that, but we can also get signals that say "unwind this position and move into another", and such high level commands would then result in a flurry of trading at T=0, which is why we need the supplementary motor area to organize this activity (the idea being that such trades are required to be simultaneous, or nearly so).
In real life the supplementary motor coordination system resembles the function of the basal ganglia and the cerebellum in the human brain, which organize and coordinate rhythmic and timed motor activity.

A peculiarity of the supplemental motor system in humans is that it's driven by reward. What ends up happening is the global store feeds a range of possible actions to the supplementary system, which then selects between them based on which is most advantageous. This becomes especially important when there are conflicting trading signals, and in this case the determination of advantage requires a global optimization. In its full behavior the supplemental system has to be able to start and stop trading in progress, without thrashing.
 
One more note on this - the additional requirement for starting and stopping alternate prediction strategies introduces the need for a short term memory that can replicate patterns in the stack on demand.

In a real brain this role is fulfilled by the hippocampus, which lives on the far left of the stack line, and communicates with the prefrontal cortex which lives on the right. The hippocampus is where we find the place cells, time cells, and grid cells that translate the external reference frame into the internal reference frame.

It is also where we find the "phase coding" that operates in conjunction with the theta rhythm. Phase coding is a way of compressing stack configurations into a small number of neural spike trains. The only requirement is a topographic mapping of the left side of the stack. The hippocampus converts stack configurations into events ("episodes", the psychologists say it handles "episodic" memory, which lasts about half an hour in most vertebrates).

The way it does this is really interesting: it takes the most important features of an event and creates a map using TTFS encoding ("time to first spike" relative to the theta rhythm), then about 3 msec later builds a feature map from the details, relative to the hot spots in the initial encoding. So in 1 msec it'll tell you where the important features are, and in less than 5 msec it'll detail those features in a way that allows them to be replicated on demand in the stack.
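
A toy version of the TTFS-then-burst code, with made-up cycle lengths and scalings, just to show the two stages:

import numpy as np

THETA_MS = 125.0                     # ~8 Hz theta cycle (illustrative)
rng = np.random.default_rng(8)
salience = rng.random(6)             # importance of 6 hypothetical features
detail = rng.random(6)               # the finer-grained value of each feature

# Stage 1: the more salient a feature, the earlier it fires in the cycle.
ttfs = (1.0 - salience) * THETA_MS * 0.5          # first spikes in 0..62.5 ms

# Stage 2: ~3 ms later, a short burst whose intra-burst interval encodes detail.
def burst(t0, value, n_spikes=3):
    isi = 2.0 + 6.0 * value                        # interval scales with detail
    return t0 + 3.0 + isi * np.arange(n_spikes)

for k in np.argsort(ttfs):
    times = burst(ttfs[k], detail[k])
    print(f"feature {k}: first spike {ttfs[k]:6.1f} ms, "
          f"burst at {np.round(times, 1)} ms")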

Also with this architecture it becomes quickly obvious that careful control of the information entering the global store is essential. This is why we have "consolidation" of short term memory into long term memory. Only the important events are consolidated, the rest are discarded.
 
This is what a real brain looks like in action. In this particular image, the outer portions of the stack are active, and the inner portions around T=0 not so much.

[attachment: brain activity image, outer portions of the stack active]


The red thing in the middle is approximately the hippocampus. You can see how it connects to the entire extent of the stack. Here's a more precise anatomical picture, in this pic you can see how it emphasizes the left (posterior) portion of the stack.

[attachment: anatomical image of hippocampal connections, emphasizing the posterior stack]


Here is a stack image where the central neighborhood around T=0 is more active than the edges.

[attachment: brain activity image, central region around T=0 active]
 
Here's a wiring diagram of the hippocampus. You can see this would be fairly easy to implement in TensorFlow.

[attachment: hippocampus wiring diagram]


The stack itself is a lot harder. Each stack layer has a "stack on top of a stack": there are sub-layers within the stack and they have complex connectivity. This pic shows the frontal cortex that the hippocampus talks to, and it also shows the auxiliary motor system that selects from the available behaviors.

[attachment: frontal cortex and auxiliary motor system diagram]



The top portion of this pic is a stack layer. It has two sub-layers, on top and on the bottom. The input connects in the middle. The top sublayer connects only to other stack layers, whereas the bottom sublayer provides the output. The green cells at the top of L5 provide the yellow input at the top of the hippocampus pic (indirectly, there's a layer in between).

Consider one of the green cells in the hippocampus. It will fire when it has strong enough convergent input at the top and in the middle. When it fires, two things happen:

1. It generates the TTFS signal
2. A "notification of firing" goes backwards up the cell into the yellow layer, which then integrates its synaptic inputs in the usual way (sum of weights times activity). As soon as the TTFS signal has cleared, the feature map is sent back down the green cell, which transmits its output in the form of a burst (several spikes in a row). The burst contains the phase encoded feature map.

This behavior of TTFS-then-burst is not easily represented in TensorFlow. It's even hard to model in "real" network languages like NeuroML. It's highly nonlinear and it depends on an accompanying dynamic oscillation.

The further complication is that the learning rule in the yellow-to-green synapses is non-Hebbian, it can't be modeled with simple correlation. It depends on two types of excitatory calcium channels, one is very fast with a fast recovery time and the other is slower with a much longer recovery time. The learning in the yellow-to-green connection has to use STDP ("spike timing dependent plasticity"): if the input precedes the spike within a narrow window the synapse is reinforced, whereas if it lags the spike in a much longer window the synapse is depressed.
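
A minimal sketch of that asymmetric STDP rule (the amplitudes and time constants below are illustrative, not measured values):

import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.005, tau_plus=10.0, tau_minus=40.0):
    """Weight change for one pre/post spike pair under the rule described above.

    dt = t_post - t_pre (ms).  If the input precedes the spike within a narrow
    window the synapse is strengthened; if it lags, it is depressed over a much
    longer window.
    """
    if dt >= 0:                                   # pre before post: potentiation
        return a_plus * np.exp(-dt / tau_plus)
    return -a_minus * np.exp(dt / tau_minus)      # pre after post: depression

for dt in (-80, -40, -10, -2, 2, 5, 10, 30):
    print(f"t_post - t_pre = {dt:+4d} ms  ->  dw = {stdp_dw(dt):+.5f}")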

So in the creation of the feature map, each cell has to distinguish gaps in the spike train from an actual lack of activity. It does this by having the dendrites themselves generate mini-spikes. The dendritic spikes are not ordinary action potentials: they don't faithfully propagate back down into the cell body and the axon, and they can fail at branch points (which is an important part of the learning mechanism). The dendritic spikes are generated in "spines" that look like little mushrooms in the microscope; they have high impedance stalks that generate but do not faithfully transmit the calcium spikes. This is what they look like:

[attachment: dendritic spine micrograph]



[attachment: dendritic spine micrograph, second view]
 
This is the same mechanism we're talking about, applied to materials.

Here, scientists have succeeded in creating and accessing additional dimensions in a photonic lattice.

Same mechanism. Synthetic dimensionality using information geometry.

 
