Imaginary time is used in neural networks, particularly within the fields of quantum computing and quantum field theory, to prepare ground states and thermal states of quantum systems. Instead of direct real-time evolution, the process uses a mathematical transformation (imaginary time) to project an initial state towards the ground state. Artificial neural networks (ANNs) are employed to approximate this evolution, train on quantum data, or learn the properties of the system at different temperatures.
How it works
Wick rotation: The core idea is a mathematical trick called a Wick rotation, which replaces the real time t with an imaginary time τ = it. This transformation turns the unitary real-time evolution into a non-unitary imaginary-time evolution, which can be used to find the ground state.
Quantum state preparation: The imaginary-time evolution acts as a projection, exponentially cooling the system down to its lowest energy state (the ground state).
Neural network integration: ANNs are used to overcome the limitations of quantum computers. They can be trained to:
Approximate the wave function: Train a network to represent the target wave function derived from the Schrödinger equation.
Learn from data: Train on data from quantum simulations at different temperatures to estimate the system's properties, such as its action.
Guide evolution: Use deep reinforcement learning to guide the imaginary-time evolution and mitigate algorithmic errors.
Classical analogy: In a sense, the neural network is trained on the probability of a system configuration at a given temperature, which is related to the Boltzmann factor e^(-S).
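As a toy illustration of the projection idea (a sketch, not any particular paper's method; the Hamiltonian and all parameters here are made up for illustration): build a random Hermitian "Hamiltonian", repeatedly apply the imaginary-time propagator exp(-dτ H) to a random state, renormalize, and watch the energy relax to the ground-state eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4x4 Hermitian "Hamiltonian" (a stand-in for a real quantum system)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2

# Imaginary-time propagator exp(-dtau * H), built via eigendecomposition
dtau = 0.1
evals, evecs = np.linalg.eigh(H)
U = evecs @ np.diag(np.exp(-dtau * evals)) @ evecs.conj().T

# Repeatedly apply the propagator to a random state, renormalizing each
# step (imaginary-time evolution is non-unitary, so the norm decays)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
for _ in range(5000):
    psi = U @ psi
    psi /= np.linalg.norm(psi)

energy = (psi.conj() @ H @ psi).real
print(energy, evals[0])  # the energy has relaxed to the ground-state eigenvalue
```

Excited-state components are suppressed as exp(-τ ΔE), which is the "exponential cooling" described above.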
Applications
Quantum state preparation: Prepare ground states and thermal states on near-term quantum devices.
Quantum field theory: Estimate the action in quantum field theories, helping to explore phase diagrams.
Quantum many-body systems: Represent the thermal states of many-body quantum systems.
Key takeaway
Imaginary time is not "unreal" but a mathematical tool used in quantum physics. In the context of neural networks, it enables the use of classical ANNs and quantum-classical hybrid methods to solve complex quantum problems, such as finding the ground state or learning about quantum systems at different temperatures.
It's used in physics, in relativity. By introducing an imaginary variable in the right place, difficult problems in Minkowski space can be "rotated" into Euclidean space, solved there, and rotated back (or vice versa).
But there is a deeper more mysterious and more significant application of Wick rotations, in probability theory. You'll see why it's significant in a moment.
First we will need the concept of moments. You may know these by their colloquial names: mean, variance, skewness, kurtosis, and so on. There are actually an infinite number of such moments; together they form an infinite series that describes the probability distribution. One can speak of (and calculate) the n-th order moment, which in probability theory is defined as E[X^n]. Most often we're interested in a moment "around a point" (like the mean), in which case these are "central" moments, defined as E[(X - x0)^n].
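As a quick numeric check of those definitions (a sketch using sampled data rather than closed forms), the central moments of a Gaussian recover the familiar quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)  # mean 2, sigma 3

mean = x.mean()

def central_moment(n):
    # n-th central moment: E[(X - mean)^n]
    return np.mean((x - mean) ** n)

variance = central_moment(2)                    # ~ sigma^2 = 9
skewness = central_moment(3) / variance ** 1.5  # ~ 0 for a Gaussian
kurtosis = central_moment(4) / variance ** 2    # ~ 3 for a Gaussian
print(variance, skewness, kurtosis)
```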
The idea is that we're sampling the variable n times: in theory, if everything is independent and nicely behaved, the probability after n outcomes is simply the product of the n probabilities, hence the outcome raised to the n-th power.
In SOME nicely behaved cases, we can have a function that generates the moments for us, called the moment generating function: M(t) = E[e^(tX)], whose n-th derivative at t = 0 gives the n-th moment. For example, if you have a family of Gaussians with widths varying from infinity to 0, you can parametrize them from the unit interval.
The moment generating function, if it exists, determines the shape of the probability distribution. Which is where this topic gets interesting. There is a deep relationship between the moment generating function and the Fourier transform of the probability distribution (the characteristic function): evaluate the MGF at an imaginary argument and you get the characteristic function, φ(t) = M(it). This happens in the complex plane and supports exactly the kind of "Wick rotation" that links statistical mechanics with quantum mechanics.
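Concretely, for a standard normal the MGF is M(t) = exp(t²/2) and the characteristic function (the Fourier transform of the density) is φ(t) = exp(-t²/2); evaluating M at an imaginary argument is exactly the rotation. A small sketch, checked against a sampled average:

```python
import numpy as np

def mgf(t):
    # Closed-form MGF of a standard normal: M(t) = E[e^(tX)] = exp(t^2 / 2)
    return np.exp(t ** 2 / 2)

t = 1.3
phi_closed = np.exp(-t ** 2 / 2)   # characteristic function of N(0, 1)
phi_rotated = mgf(1j * t)          # "Wick rotate": evaluate the MGF at i*t

# Empirical check: phi(t) = E[e^(i t X)] estimated from samples
rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)
phi_sampled = np.mean(np.exp(1j * t * x))
print(phi_closed, phi_rotated, phi_sampled)
```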
Turns out, understanding Wick rotations is vital for any serious study of chaos and criticality. There are already a number of clever methods and a huge literature on phase-plane methods in physics; one of them directly represents the Schrödinger equation as a 2n-dimensional probability map and uses it to solve spin-glass geometry and the like. It all leads to Feynman's path integrals, which are directly analogous to moment generating functions.
If you want to know how a particle gets from here to there, you have to know "all possible paths" by which that can happen, and each of the possible paths has a probability assigned to it. You have a "probability distribution" where the outcome is a linear combination of possible paths. There is an "evolution" of the state of the system according to Schrödinger's equation, which in turn maps back to Wiener's original math around Brownian motion. The math is the same, and it's the same math needed to understand criticality.
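The Wiener connection can be seen numerically: sum independent Gaussian increments and the variance of the endpoint grows linearly with time, which is the same diffusion kernel that appears when the Schrödinger equation is Wick-rotated into a heat equation. A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample many Brownian paths: each increment is ~ N(0, dt)
n_paths, n_steps, dt = 20_000, 100, 0.01
increments = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
paths = increments.cumsum(axis=1)

# Wiener scaling: Var[W(t)] = t
t_final = n_steps * dt          # = 1.0
endpoint_var = paths[:, -1].var()
print(endpoint_var, t_final)
```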
One of the best-studied chaotic systems in physics is the spin glass. Heat a magnet beyond its Curie temperature and the little magnetic dipoles that used to align start pointing in various directions like a liquid; in a spin glass, that disorder gets frozen in, so the magnetic spins stay disordered and frustrated even when cooled.
And it just so happens that spin glasses are an excellent case study because we have working models for both Wick and non-Wick solutions.
Me too. I want to know how to get a neural network to "automatically" perform complex math.
In other words, the weirdo dimension "i" is just a different set of rules for the algebra. The rules are derived from the Clifford algebra of the appropriate signature, that's how you get your conjugation (reflection through the real axis).
You can derive a complex map from Cartesian coordinates by means of a perspective transformation. The easiest way to do that (the way the computer people do it, so they can use the matrix math in their GPUs) is to use homogeneous coordinates: (x,y,z) becomes (wx, wy, wz, w), where w is an arbitrary nonzero real, and dividing back out, (x,y,z,w) maps to (x/w, y/w, z/w). In this way any point lying along the same line through the origin is the same point in Cartesian space ([1,2,3,1] and [2,4,6,2] are the same point).
The computational benefit of doing this is you can now treat translation as a part of a single matrix multiplication, and you can compose transformations (multiply matrices) to get an end result.
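A minimal sketch of both points: equivalent homogeneous representatives of one Cartesian point, and translation folded into a single matrix multiply (composition is just a matrix product):

```python
import numpy as np

def to_cartesian(h):
    # Divide through by w: [x, y, z, w] -> (x/w, y/w, z/w)
    return h[:3] / h[3]

# Two homogeneous representatives of the same Cartesian point
p1 = np.array([1.0, 2.0, 3.0, 1.0])
p2 = np.array([2.0, 4.0, 6.0, 2.0])
print(to_cartesian(p1), to_cartesian(p2))  # identical

# Translation by (5, 0, 0) as a single 4x4 matrix
T = np.eye(4)
T[:3, 3] = [5.0, 0.0, 0.0]

# 90-degree rotation about z, in the same 4x4 form
R = np.eye(4)
R[:2, :2] = [[0.0, -1.0], [1.0, 0.0]]

# Compose: rotate, then translate, as one matrix
M = T @ R
print(to_cartesian(M @ p1))  # rotate (1,2,3) -> (-2,1,3), then shift -> (3,1,3)
```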
When you do this, you are "compactifying" the Cartesian coordinate space. I already showed you the simple example in one dimension. In two dimensions, you need to add a whole line at infinity instead of just a point (and in three, a plane at infinity), because the endpoint in each direction becomes a point on the new line.
These concepts of algebraic symmetries and geometric mappings are related by the groups in the Clifford algebra; the solutions admit Lie groups in the continuous case. In the discrete case there are interesting results too: for instance the Fano plane, with 7 points and 7 lines, is the smallest finite projective plane.
Here's a graphic of a simple projective space: the real projective line. We are "compactifying" the real line, using a projective mapping. You'll notice the coordinates.
So this coordinate system can be transformed into ordinary complex coordinates, then "Wick rotated" so the North Pole is no longer a singularity (perhaps some other point takes its place, but we can now do math on the "point at infinity").
A fundamental principle of geometry is that shapes are "invariant" even when they are described differently using different coordinate systems. A circle is still a circle, whether it's described in polar coordinates or Cartesian coordinates.
Riemannian geometry generalizes this into the concept of a "manifold", and "charts" that describe its relationship to a coordinate system. Some coordinate systems have peculiarities: for example one cannot divide by 0, and so one cannot take derivatives at badly behaved points. Often, however, if one changes coordinates the offending behavior disappears and math can be done smoothly and efficiently. A simple example is taking the derivative of a circle at the point x = 1, where the tangent is vertical: undefined in Cartesian coordinates but nicely behaved in polar coordinates. In other words, the defect is in the chart, not in the surface.
Consider an application that is important in information processing. Let's say we have a neural network where every neuron has spontaneous activity in the absence of input (in other words it's an oscillator, and we'll wave our hands over the stochastic aspects by talking about the "average" firing rate). There are inputs to this network, and excitatory inputs will increase the frequency of oscillation, whereas inhibitory inputs will decrease it. Let's say our baseline firing rate is 10 Hz.
So now, when we start out, all of our oscillators are at random phases so the "average" coherence in the network is 0. But now let's say we have a "reset command", a way to synchronize the entire network to line up all the phases of the oscillators so they all start at the same time. We issue the "sync pulse" and then the average of our network activity now oscillates at 10 Hz instead of hovering around 0 - because all the phases of the individual oscillators now line up.
Here's the magic: superimpose on this structure a linear gradient. Something simple, let's say ... head position. Straight ahead is 0, look to the right is positive (excitatory), and look to the left is negative (inhibitory). If we look to the left by one unit so we get -1 of inhibition, the neuron will oscillate at 9 Hz instead of 10, which means it slips one full cycle relative to the population every second, re-aligning in phase once per second. The same goes for 8 Hz and two re-alignments per second, 7 Hz and three, and so on. So the rate of re-alignments is proportional to the angle of gaze, and we have traded a time difference for spatial position in the network. This lets us couple "where are you looking" with "what do you see there", and sequences of these information pairs are the basis of episodic memory.
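A back-of-the-envelope sketch of that arithmetic (toy numbers, not a biophysical model): the re-alignment rate equals the beat frequency between the neuron and the 10 Hz population, so it tracks gaze angle linearly.

```python
import numpy as np

base_f = 10.0   # baseline population rate (Hz)
T = 3.05        # simulated duration (s), offset slightly to avoid edge counts

def realignments(neuron_f):
    """Count how often a neuron at neuron_f Hz comes back into phase
    with the 10 Hz population over T seconds (one per full relative cycle)."""
    beat = abs(base_f - neuron_f)       # relative (beat) frequency, Hz
    return int(np.floor(beat * T))

# 1, 2, 3 units of inhibition -> 9, 8, 7 Hz -> ~1, 2, 3 re-alignments per second
print([realignments(f) for f in (9.0, 8.0, 7.0)])  # [3, 6, 9] over ~3 s
```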
However we have to remember the paths because we'll need them again. As we navigate, there is a natural dual relationship between the endpoints a and b and the path p(a,b) we took between them. This situation is formally analogous to a "pullback" in Riemannian geometry, except it operates on a space of functions (it's a "functional"). Scientists have tested this model in complicated scenarios like continual-reversal and hairpin mazes, and it works.
We "could" try doing all the math in Euclidean space, in which case we need cosines and therefore inner products. But really we have an affine situation: the surface is just a surface and doesn't necessarily have a special point called the origin. So we don't have to fix the angles by defining what one "unit" means; we just create a ring attractor (using Lie groups) and let the system figure out the roots of unity.
Therefore our most natural and convenient representation of these scenarios is in the complex domain (which, once compactified, is the Riemann sphere, i.e. the complex projective line). In fact right off the bat we have some important similarities with physics, for instance the idea that the energy that goes into changing the angle of gaze is conserved as the frequency of network oscillations. (E = hf; it's just not quantized the same way, even though the symmetries still exist elsewhere.)
And you'll notice that "units" of gaze angle are periodic and all we're really doing is a spatial Fourier transform.
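That spatial Fourier reading can be made literal with a toy ring of N neurons (all names here are illustrative): encode the gaze angle as a phase pattern around the ring and recover it from the first spatial Fourier coefficient.

```python
import numpy as np

N = 16                      # neurons arranged around a ring
k = np.arange(N)

def encode(theta):
    # One cycle of spatial phase around the ring, offset by the gaze angle
    return np.exp(1j * (2 * np.pi * k / N + theta))

def decode(z):
    # First spatial Fourier coefficient; its argument is the angle
    c = np.mean(z * np.exp(-2j * np.pi * k / N))
    return np.angle(c)

theta = 0.7
print(decode(encode(theta)))  # recovers 0.7
```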
What this is good for, is when the charts change in midstream. Like let's say you get to a hairpin turn in a maze. As you take the turn, your entire visual scene rotates 180 degrees, so what used to be on the left is now on the right. The shape of the room hasn't changed, only the coordinate system used to describe it.
Softmax has no complex terms. However, the softmax function (more precisely, softargmax) is formally identical to the Boltzmann distribution in statistical mechanics, where the (inverse) temperature can be represented in complex terms.
In machine learning, you can have a vector of outputs {y} and apply a softmax to it, scaled by a "coldness" parameter. Here, temperature means the same thing it does in physics: a higher temperature results in a more "random" (higher entropy) output with a more uniform "shape" (configuration).
The specific quantity linking coldness to the complex domain is "thermodynamic beta", the inverse temperature.
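A minimal sketch of softmax with an explicit coldness parameter beta = 1/T, showing that a hotter (smaller-beta) output has higher entropy:

```python
import numpy as np

def softmax(y, beta=1.0):
    # beta is the coldness (inverse temperature): p_i proportional to exp(beta * y_i)
    z = beta * np.asarray(y, dtype=float)
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

y = [2.0, 1.0, 0.5]
hot = softmax(y, beta=0.1)     # high temperature: nearly uniform
cold = softmax(y, beta=10.0)   # low temperature: peaked on the argmax
print(entropy(hot), entropy(cold))  # the hotter distribution has higher entropy
```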
Note the outer circle has 0 on top and infinity on the bottom, whereas the inner circle has infinity on top and 0 at the bottom.
And, you'll note the arrows indicating the circle isn't "quite" closed - to close it we have to "compactify" the temperature.
In machine learning though, the circle is never closed - in fact quite the opposite, it's built up in "patches" and you can only see a small piece of it at a time. There is often other circuitry that treats the patchwork as if it were a circle and estimates the radius and angles.
In the brain, such a projective transformation happens in additional dimensions in the visual system. Building structures from patchworks and stick figures seems to be a recurring theme in brain architecture.
Yes. And for his theory of black hole evaporation too.