Friday, November 20, 2020

Getting Started with Random Matrices: A Step-by-Step Guide

Bringing it all together

In this section I will give a “dirty” evaluation of the eigenvalue density. Although this evaluation gives the correct result, it hides several subtle points that I will only briefly mention for now. I will follow a statistical mechanics approach, mostly focused on the calculation side. We will also focus, for now, only on the simplest of the RMT ensembles, the Gaussian Orthogonal Ensemble, or GOE for short. Matrices belonging to this ensemble have Gaussian-distributed elements; the other defining condition is the symmetry class under which the ensemble is invariant, namely rotational invariance. Let O be an orthogonal matrix, so that O^T = O^{-1}, where the superscript “T” stands for the transpose operation. Such a matrix acts on the elements of a matrix M and rotates them. For example, in two and three dimensions these matrices have the familiar form:

$$ O(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad O_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, $$
where the suffix “z” in the three dimensional matrix means that rotations are performed around the z axis.
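If you want to check these properties numerically, here is a minimal numpy sketch (not from the post's repository, just an illustration): it verifies the orthogonality condition O^T O = 1 for a 2D rotation and previews the spectral invariance discussed in the next paragraph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-dimensional rotation matrix O(theta): orthogonality means O^T = O^{-1}.
theta = 0.3
O = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(O.T @ O, np.eye(2))

# A 2x2 symmetric matrix with Gaussian entries (a toy GOE-like sample).
X = rng.normal(size=(2, 2))
M = (X + X.T) / 2.0

# The rotated matrix M' = O M O^T has the same eigenvalues as M.
M_rot = O @ M @ O.T
print(np.linalg.eigvalsh(M))      # eigenvalues of M (ascending)
print(np.linalg.eigvalsh(M_rot))  # same spectrum, rotated eigenvectors
```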

What does it mean for M to be rotationally invariant? The rotated matrix M' = O M O^T has the same eigenvalues as M, but rotated eigenvectors w = O v; in other words, the spectra of M' and M coincide. The first step in the calculation consists in manipulating the trace of the resolvent as follows:

$$ \frac{1}{N}\,\mathrm{Tr}\,(z\mathbb{1} - M)^{-1} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{z-\lambda_i} = \frac{1}{N}\,\frac{\partial}{\partial z}\log\det\left(z\mathbb{1} - M\right) = -\frac{2}{N}\,\frac{\partial}{\partial z}\log\left[\det\left(z\mathbb{1} - M\right)\right]^{-1/2}. $$
Looking at the above expression you may think we have taken a step backward instead of forward! The idea, however, is to rewrite the trace of the resolvent as a more familiar object in statistical mechanics. Consider the following m-dimensional Gaussian integral, valid for a symmetric, positive-definite matrix A:

$$ \int_{\mathbb{R}^{m}} \frac{\mathrm{d}\varphi_1\cdots\mathrm{d}\varphi_m}{(2\pi)^{m/2}}\; e^{-\frac{1}{2}\varphi^{T} A\,\varphi} = \left[\det A\right]^{-1/2}; $$
the idea is to use this identity “backwards” to rewrite:

$$ \left[\det\left(z\mathbb{1} - M\right)\right]^{-1/2} = \int \mathcal{D}\varphi\; \exp\!\left[-\frac{1}{2}\sum_{i,j}\varphi_i\left(z\,\delta_{ij} - M_{ij}\right)\varphi_j\right] \equiv Z_M(z), $$
where Z_M now has the form of a statistical mechanics partition function for a specific realization of M, and the measure is 𝒟φ = ∏ dφ_i/(2π)^(N/2), with the product running over i = 1, 2, …, N. So far we have just performed some formal manipulations to massage our initial expression into a form that can be treated with the tools of statistical mechanics. We can also identify F_N(z) = -(1/N) log Z(z) as the free energy of the system (with inverse temperature β = 1/T); note that, from a probability point of view, this is the (average) cumulant generating function defined in Eq.(14). At this point we need to average the trace of the resolvent over the different realizations of M, using the rotationally invariant Gaussian probability distribution:

where the first term on the right-hand side is a normalization constant, and we have used zero mean and a standard deviation scaled with N, σ = 1/√N. This brings us to the first (and, for this post, only) subtlety I mentioned before: when we take the expectation value of the trace of the resolvent in Eq.(17), we need to evaluate < log Z_M >_M, where < · >_M denotes the expectation w.r.t. M. The problem is: how do we evaluate the expectation value of the logarithm of a function, and what is its meaning? A detailed answer to this question will be given in a follow-up post; for the moment, let me state a few facts without proof:

• < log Z_M >_M and log <Z_M >_M are known respectively as the quenched and annealed averages.

  • For classical Random Matrix ensembles, it turns out that < log Z_M >_M = log < Z_M >_M; the result presented here therefore gives the right answer within a simpler calculation scheme. For now, however, it is not obvious why these two types of averages should be equal (or different); a quick numerical comparison is sketched below.
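To get a feeling for this point, here is a minimal numpy sketch (the function name and parameters are mine) comparing the two averages for the GOE. It uses the representation Z_M(z) = [det(z1 - M)]^(-1/2) discussed above, evaluated at a point z outside the spectrum, and drops the constant measure factor, which shifts both averages by the same amount.

```python
import numpy as np

def sample_goe(N, rng):
    """Symmetric N x N matrix with Gaussian entries (standard deviation ~ 1/sqrt(N))."""
    X = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
    return (X + X.T) / np.sqrt(2.0)

rng = np.random.default_rng(1)
N, n_samples, z = 50, 2000, 3.0   # z = 3 lies outside the bulk [-2, 2]

log_Z = np.empty(n_samples)
for s in range(n_samples):
    M = sample_goe(N, rng)
    # log Z_M(z) = -1/2 log det(z*I - M); the (2*pi)^(-N/2) factor is dropped,
    # since it shifts quenched and annealed averages identically.
    _, logdet = np.linalg.slogdet(z * np.eye(N) - M)
    log_Z[s] = -0.5 * logdet

quenched = log_Z.mean()                    # <log Z>_M
annealed = np.log(np.mean(np.exp(log_Z)))  # log <Z>_M
# Per degree of freedom the two agree up to O(1/N) corrections:
print(quenched / N, annealed / N)
```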

In what follows we evaluate the annealed average of the partition function. In order to get all pre-factors right, it is convenient to write everything in components and separate the diagonal from the off-diagonal part. In doing so, we use the fact that for a symmetric matrix X we can separate the sum over the elements as:

$$ \sum_{i,j} X_{ij} = \sum_{i} X_{ii} + 2\sum_{i<j} X_{ij}, $$
where the second summation spans only the upper triangular part of the matrix and the factor of 2 accounts for the fact that the matrix elements in the lower triangular part are identical. Using this property in both the probability distribution and the resolvent, we can take the expectation value of the diagonal and off-diagonal elements separately and then sum back the two components as follows:

Pay attention to the newly generated non-Gaussian term in the exponent, proportional to φ⁓. In a physical theory, such a term would describe interactions of some sort, e.g. a density-density interaction between “spins” at sites i and j. More generally, this term describes how the fluctuations of φ (the variance) at site i are correlated with those at site j. In order to decouple this term, we introduce an additional parameter that also plays the role of an order parameter in statistical mechanics. You may think of this as a change of variable in the integral, or as a generalized Hubbard-Stratonovich transformation in condensed matter theory. The idea is to use the delta function to enforce the scalar “constraint”:

$$ q = \frac{1}{N}\sum_{i=1}^{N}\varphi_i^{2}. $$
Before jumping into the calculation, let us understand what this parameter means. Certainly, it redefines a quadratic variable as a linear one. Its meaning, however, is to measure the fluctuations of φ at site i; clearly, this is not a simple change of variable! Order parameters are central objects in statistical mechanics and quantum field theory, as they are used to study transitions across different phases of a system.
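For reference, the step used in Eq.(23) below relies on the integral (Fourier) representation of the Dirac delta, which trades the constraint for an auxiliary integration variable ξ:

$$ 1 = \int \mathrm{d}q\; \delta\!\left(q - \frac{1}{N}\sum_{i}\varphi_i^2\right) = \int \mathrm{d}q \int_{-\infty}^{+\infty}\frac{\mathrm{d}\xi}{2\pi}\; \exp\!\left[\,i\xi\left(q - \frac{1}{N}\sum_{i}\varphi_i^2\right)\right]. $$

Inserting this “fat unity” inside the partition function is what decouples the quartic term.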

This definitely looks better than our initial expression! In the second line of Eq.(23) we have used the Fourier representation of the Dirac delta and, in this way, introduced the scalar Lagrange multiplier ξ [6]. This looks like a neat trick, but it is an incredibly useful and far-reaching way of enforcing constraints in statistical mechanics and quantum field theory; in the latter, the delta becomes a functional that can be used, e.g., to regularize gauge theories [7]! Before proceeding, it is convenient to redefine the Lagrange multiplier as η = 2iξ/N and perform the Gaussian integral over φ:

In case you wonder, the log η term comes from re-exponentiating the result of the Gaussian integral over φ. We are interested in solving these two integrals in the limit of large N, i.e. we are looking for a continuous approximation to the eigenvalue density. In this limit, the evaluation of the two integrals becomes particularly simple, as we can use the steepest descent method [8]. If you are familiar with the concept of mean-field theory in physics, steepest descent gives you exactly that kind of result (for the moment just keep this in mind; I will come back to it in another post). The idea is to evaluate the integral at the stationary points of ℒ(η, q):

This gives a quadratic equation for q and therefore two stationary points:

As is often the case in physics, only one of the two solutions makes sense; here we select the solution giving a finite result. We can now use q* to evaluate the integral and, finally, the trace of the resolvent in Eq.(15):

Note that the factor of N in the definition of the trace of the resolvent cancels out with our solution, and we obtain a finite, N-independent result (remember, this is a continuous approximation). In order to obtain the eigenvalue density, we first need to perform the analytic continuation z → λ + iε, then take the imaginary part, and finally the limit ε → 0^(+). Note that the order of these operations is important! The tricky part is the imaginary part appearing in the square root; rather than having isolated poles, the square root has a branch cut in the complex plane. A neat and simple way to take care of this problem without using advanced complex analysis is given in Ref. [3], by means of the following identity:

where I have defined:

Using this result back in the definition of the eigenvalue density, and taking the ε → 0^(+) limit, we obtain:

From a direct inspection you can see that if |λ| > 2 the spectrum is zero (this is a real function!), while it is finite for |λ| < 2. As the spectrum is defined as a positive quantity, the sign of the solution should be chosen as + for positive λ and - otherwise. We finally obtain the celebrated Wigner semicircle law:

$$ \rho(\lambda) = \frac{1}{2\pi}\sqrt{4-\lambda^{2}}, \qquad |\lambda| \le 2 . $$
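As a quick sanity check, the density is correctly normalized: the area under a half-disk of radius 2 is 2π, so

$$ \int_{-2}^{2}\frac{1}{2\pi}\sqrt{4-\lambda^{2}}\,\mathrm{d}\lambda = \frac{1}{2\pi}\cdot 2\pi = 1. $$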
So much sweat for such a little formula :) In the next section I will plot the above expression against an explicit numerical evaluation.

Numerical experiments

To conclude, we check the analytical result against a direct numerical evaluation. Details can be found in this GitHub repository; here I will only go through the main steps.

At the time Wigner solved this problem, it was probably prohibitive to check results numerically, if not via lengthy runs on some gym-room-sized computer. Nowadays you can evaluate these expressions explicitly on your laptop and check that they agree with the analytical results.

One of the problems you face when running simulations is the finite size of the system. Remember that Eq.(31) has been obtained in the limit of large N. But how large is large? How big does our simulation need to be in order to agree with the analytical formula? To answer this question, let us write a function to perform the whole evaluation:

The pseudocode is as follows, with a minimal Python sketch right after the list:

  1. Generate an N×N matrix instance X by sampling from the normal distribution with mean zero and standard deviation σ = 1/√N.
  2. Create a symmetric version of the matrix using: Xs = (X+X^T)/√2.
  3. Get the eigenvalues using numpy.linalg.eigvalsh(Xs); note that this routine is numerically optimized for symmetric (Hermitian) matrices.
  4. Repeat n_samples times in order to sample different realizations of X.
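Here is a minimal Python sketch of such a function; the actual code in the repository may differ, and the function names wigner_semicircle and sample_goe_eigenvalues are mine.

```python
import numpy as np

def wigner_semicircle(lam):
    """Wigner semicircle density: sqrt(4 - lam^2) / (2*pi) on [-2, 2], zero outside."""
    lam = np.asarray(lam, dtype=float)
    inside = np.clip(4.0 - lam**2, 0.0, None)
    return np.sqrt(inside) / (2.0 * np.pi)

def sample_goe_eigenvalues(N, n_samples, seed=0):
    """Collect the eigenvalues of n_samples GOE matrices of size N x N."""
    rng = np.random.default_rng(seed)
    eigs = []
    for _ in range(n_samples):
        # 1. Gaussian matrix with zero mean and standard deviation 1/sqrt(N)
        X = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
        # 2. symmetrize: Xs = (X + X^T) / sqrt(2)
        Xs = (X + X.T) / np.sqrt(2.0)
        # 3. eigenvalues via the symmetric-matrix routine
        eigs.append(np.linalg.eigvalsh(Xs))
    # 4. stack the eigenvalues of all realizations
    return np.concatenate(eigs)

# Example: empirical density for N = 10 and 1000 samples (first figure below)
eigs = sample_goe_eigenvalues(N=10, n_samples=1000)
density, edges = np.histogram(eigs, bins=50, range=(-3, 3), density=True)
```

The histogram (with density=True) can then be plotted on top of wigner_semicircle to reproduce the figures below.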

For a small 10×10 matrix you will find something like this:

Eigenvalue distribution for 1000 samples of an N=10 random matrix in the GOE ensemble. The continuous line is the plot of the Wigner semicircle law.

You can see two things here: first, the tails of the numerical distribution exceed the domain [-2, 2] of the bulk semicircle; second, the bins do not agree too well with the analytical expression. We can repeat the evaluation for N=500:

Eigenvalue distribution for 1000 samples of an N=500 random matrix in the GOE ensemble. The continuous line is the plot of the Wigner semicircle law and the x markers indicate the bin values.

You can see that the results get noticeably better. We can evaluate the error between the analytical and numerical values as a function of the matrix size. Here I compute the root mean square error (RMSE):

$$ \mathrm{RMSE} = \sqrt{\frac{1}{S}\sum_{k=1}^{S}\left[\rho_{\mathrm{num}}(\lambda_k) - \rho(\lambda_k)\right]^{2}}, $$

where the first eigenvalue density is the one evaluated numerically, the second is the expression we obtained analytically, and S is the number of samples. A sketch of this error computation is given below; in the plot that follows, I separate the tail error from the bulk error:
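A possible implementation, reusing the hypothetical helpers defined in the previous sketch, could look as follows (the split into bulk and tails mirrors the plot below):

```python
import numpy as np

def rmse_vs_semicircle(eigs, bins=50, lam_range=(-3.0, 3.0)):
    """RMSE between the empirical eigenvalue density and the semicircle law,
    split into a bulk (|lambda| < 2) and a tail (|lambda| >= 2) contribution."""
    density, edges = np.histogram(eigs, bins=bins, range=lam_range, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    residual = density - wigner_semicircle(centers)  # analytic curve from the sketch above
    bulk = np.abs(centers) < 2.0
    rmse_bulk = np.sqrt(np.mean(residual[bulk] ** 2))
    rmse_tail = np.sqrt(np.mean(residual[~bulk] ** 2))
    return rmse_bulk, rmse_tail
```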

RMSE as a function of the matrix size, for 1000 samples per size.

You can see how the error goes down to ~0 as the matrix size increases (you could add more points to get a smoother curve), providing a nice numerical validation of the calculation above.
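For completeness, the error-versus-size scan of the plot above can be sketched with the same hypothetical helpers:

```python
# Error as a function of the matrix size, reusing the helpers sketched above.
sizes = [10, 50, 100, 200, 500]
for N in sizes:
    eigs = sample_goe_eigenvalues(N=N, n_samples=1000)
    rmse_bulk, rmse_tail = rmse_vs_semicircle(eigs)
    print(f"N = {N:4d}   bulk RMSE = {rmse_bulk:.4f}   tail RMSE = {rmse_tail:.4f}")
```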

Closing remarks

In this article I presented a simple step-by-step introduction to Random Matrix Theory, and in particular to the Gaussian Orthogonal Ensemble. The calculation I presented is not the standard one you will find in books, the reason being that some of the steps I performed, e.g. the annealed average, are difficult to justify within this approach. I will present a more standard derivation, based on the so-called “Coulomb gas analogy”, in a follow-up article.

Hope you enjoyed this first look at random matrices; please leave questions and feedback in the comments.



