## The Golden-Thompson inequality

Let ${A, B}$ be two Hermitian ${n \\times n}$ matrices. When ${A}$ and ${B}$ commute, we have the identity

$\\displaystyle e^{A+B} = e^A e^B.$

When ${A}$ and ${B}$ do not commute, the situation is more complicated; we have the Baker-Campbell-Hausdorff formula

$\\displaystyle e^{A+B} = e^A e^B e^{-\\frac{1}{2}[A,B]} \\ldots$

where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

$\\displaystyle \\hbox{det}(e^{A+B}) = \\hbox{det}(e^A e^B).$
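The determinant identity follows from ${\\hbox{det}(e^X) = e^{\\hbox{tr}(X)}}$ together with the multiplicativity of the determinant. Here is a minimal numerical sketch of this, assuming NumPy is available; the helper names `random_hermitian` and `expm_hermitian` are illustrative and do not come from any library.

```python
import numpy as np


def random_hermitian(n, rng):
    """Return a random n x n complex Hermitian matrix (illustrative helper)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m + m.conj().T) / 2


def expm_hermitian(h):
    """Exponential of a Hermitian matrix, computed from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(h)
    # Scale the columns of eigvecs by exp(eigvals), then rotate back.
    return (eigvecs * np.exp(eigvals)) @ eigvecs.conj().T


rng = np.random.default_rng(0)
A = random_hermitian(4, rng)
B = random_hermitian(4, rng)

det_of_sum = np.linalg.det(expm_hermitian(A + B))
det_of_product = np.linalg.det(expm_hermitian(A) @ expm_hermitian(B))
exp_of_trace = np.exp(np.trace(A + B).real)

# All three numbers should agree up to rounding error.
print(det_of_sum, det_of_product, exp_of_trace)
```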

Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between ${e^{A+B}}$ and ${e^A e^B}$, namely the Golden-Thompson inequality

$\\displaystyle \\hbox{tr}(e^{A+B}) \\leq \\hbox{tr}(e^A e^B). \\ \\ \\ \\ \\ (1)$

The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices ${A, B}$ are required. Note that the right-hand side can be rearranged using the cyclic property of trace as ${\\hbox{tr}( e^{B/2} e^A e^{B/2} )}$; the expression inside the trace is positive definite so the right-hand side is positive. (On the other hand, there is no reason why expressions such as ${\\hbox{tr}(e^A e^B e^C)}$ need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.
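As a sanity check (not a proof), one can test (1) on random Hermitian matrices. The sketch below assumes NumPy; the helpers `random_hermitian` and `expm_hermitian` and the sampling scheme are illustrative choices only.

```python
import numpy as np


def random_hermitian(n, rng):
    """Return a random n x n complex Hermitian matrix (illustrative helper)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m + m.conj().T) / 2


def expm_hermitian(h):
    """Exponential of a Hermitian matrix, computed from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(h)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.conj().T


rng = np.random.default_rng(1)
relative_gaps = []
for _ in range(200):
    n = int(rng.integers(2, 7))  # matrix sizes 2..6
    A = random_hermitian(n, rng)
    B = random_hermitian(n, rng)
    lhs = np.trace(expm_hermitian(A + B)).real
    # tr(e^A e^B) is real and positive; .real discards rounding noise.
    rhs = np.trace(expm_hermitian(A) @ expm_hermitian(B)).real
    relative_gaps.append((rhs - lhs) / rhs)

min_gap = min(relative_gaps)
print("smallest relative gap (rhs - lhs) / rhs:", min_gap)
```

If (1) holds, the smallest relative gap should be non-negative up to floating-point error.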

To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in ${A, B}$. The left-hand side expands as

$\\displaystyle \\hbox{tr} 1 + \\hbox{tr} (A+B) + \\frac{1}{2} \\hbox{tr} (A^2 + AB + BA + B^2) + \\frac{1}{6} \\hbox{tr} (A+B)^3$

$\\displaystyle + \\frac{1}{24} \\hbox{tr} (A+B)^4 + \\ldots$

while the right-hand side expands as

$\\displaystyle \\hbox{tr} 1 + \\hbox{tr} (A+B) + \\frac{1}{2} \\hbox{tr} (A^2 + 2AB + B^2)$

$\\displaystyle + \\frac{1}{6} \\hbox{tr} (A^3 + 3A^2 B + 3 A B^2+B^3) +$

$\\displaystyle \\frac{1}{24} \\hbox{tr} (A^4 + 4 A^3 B + 6 A^2 B^2 + 4 A B^3 +B^4) + \\ldots$

Using the cyclic property of trace ${\\hbox{tr}(AB) = \\hbox{tr}(BA)}$, one can verify that all terms up to third order agree. Turning to the fourth order terms, after expanding out ${(A+B)^4}$ and using the cyclic property of trace as much as possible, we see that they almost agree, but the left-hand side contains a term ${\\frac{1}{12} \\hbox{tr}(ABAB)}$ whose counterpart on the right-hand side is ${\\frac{1}{12} \\hbox{tr}(ABBA)}$. The difference between the two (right-hand side minus left-hand side) can be factorised (again using the cyclic property of trace) as ${-\\frac{1}{24} \\hbox{tr} [A,B]^2}$. Since ${[A,B] := AB-BA}$ is skew-Hermitian, ${-[A,B]^2}$ is positive semi-definite, and so we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)
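The fourth-order bookkeeping can be checked numerically. The following sketch (assuming NumPy; the helper `random_hermitian` is illustrative) compares the fourth-order terms of the two expansions and confirms that their difference is ${-\\frac{1}{24} \\hbox{tr} [A,B]^2}$, which equals ${\\frac{1}{24}}$ times the squared Frobenius norm of the commutator.

```python
import numpy as np


def random_hermitian(n, rng):
    """Return a random n x n complex Hermitian matrix (illustrative helper)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m + m.conj().T) / 2


rng = np.random.default_rng(2)
A = random_hermitian(4, rng)
B = random_hermitian(4, rng)
A2, B2 = A @ A, B @ B

# Fourth-order term of tr(e^{A+B}).
lhs4 = np.trace(np.linalg.matrix_power(A + B, 4)).real / 24
# Fourth-order term of tr(e^A e^B).
rhs4 = np.trace(
    A2 @ A2 + 4 * A2 @ A @ B + 6 * A2 @ B2 + 4 * A @ B2 @ B + B2 @ B2
).real / 24

comm = A @ B - B @ A  # skew-Hermitian commutator [A, B]
predicted = -np.trace(comm @ comm).real / 24
frobenius = np.linalg.norm(comm, "fro") ** 2 / 24

print(rhs4 - lhs4, predicted, frobenius)
```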

Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair ${A, B}$ of non-commuting Hermitian matrices are strongest when cross-interactions are kept to a minimum, so that all the ${A}$ factors lie on one side of a product and all the ${B}$ factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.

The proof of the Golden-Thompson inequality relies on the somewhat magical power of the tensor power trick. For any even integer ${p = 2,4,6,\\ldots}$ and any ${n \\times n}$ matrix ${A}$ (not necessarily Hermitian), we define the ${p}$-Schatten norm ${\\|A\\|_p}$ of ${A}$ by the formula

$\\displaystyle \\| A \\|_p := (\\hbox{tr}(AA^*)^{p/2})^{1/p}.$

(This formula in fact defines a norm for any ${p \\geq 1}$, but we will only need the even integer case here.) This norm can be viewed as a non-commutative analogue of the ${\\ell^p}$ norm; indeed, the ${p}$-Schatten norm of a diagonal matrix is just the ${\\ell^p}$ norm of the coefficients.
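To make the definition concrete, here is a small sketch of the ${p}$-Schatten norm for even ${p}$, assuming NumPy; the function name `schatten_norm` is an illustrative choice. It also checks the diagonal case against the ordinary ${\\ell^p}$ norm, and compares with the ${\\ell^p}$ norm of the singular values (a standard equivalent description of the Schatten norm).

```python
import numpy as np


def schatten_norm(a, p):
    """p-Schatten norm (tr (A A^*)^{p/2})^{1/p} for an even integer p >= 2."""
    if p < 2 or p % 2 != 0:
        raise ValueError("this sketch only handles even integers p >= 2")
    gram = a @ a.conj().T  # A A^*, positive semi-definite
    return np.trace(np.linalg.matrix_power(gram, p // 2)).real ** (1.0 / p)


# Diagonal matrix: the Schatten norm is the l^p norm of the diagonal entries.
diag_entries = np.array([3.0, -4.0])
D = np.diag(diag_entries)
diag_norm_2 = schatten_norm(D, 2)
diag_norm_4 = schatten_norm(D, 4)

# General matrix: compare with the l^p norm of the singular values.
rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
direct = schatten_norm(M, 6)
via_svd = np.linalg.norm(np.linalg.svd(M, compute_uv=False), ord=6)

print(diag_norm_2, diag_norm_4, direct, via_svd)
```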

Note that the ${2}$-Schatten norm

$\\displaystyle \\|A\\|_2 := (\\hbox{tr}(AA^*))^{1/2}$

is the Hilbert space norm associated to the Frobenius inner product (or Hilbert-Schmidt inner product)

$\\displaystyle \\langle A, B \\rangle := \\hbox{tr}(A B^*).$

This is clearly a non-negative Hermitian inner product, so by the Cauchy-Schwarz inequality we conclude that

$\\displaystyle |\\hbox{tr}(A_1 A_2^*)| \\leq \\| A_1 \\|_2 \\|A_2\\|_2$

for any ${n \\times n}$ matrices ${A_1, A_2}$. As ${\\|A_2\\|_2 = \\|A_2^*\\|_2}$, we conclude in particular that

$\\displaystyle |\\hbox{tr}(A_1 A_2)| \\leq \\| A_1 \\|_2 \\|A_2\\|_2$

We can iterate this and establish the non-commutative Hölder inequality

$\\displaystyle |\\hbox{tr}(A_1 A_2 \\ldots A_p)| \\leq \\| A_1 \\|_p \\|A_2\\|_p \\ldots \\|A_p\\|_p \\ \\ \\ \\ \\ (2)$

whenever ${p=2,4,8,\\ldots}$ is a power of ${2}$. Indeed, we induct on ${p}$, the case ${p=2}$ already having been established. If ${p \\geq 4}$ is a power of ${2}$, then by the induction hypothesis (grouping ${A_1 \\ldots A_p}$ into ${p/2}$ pairs) we can bound

$\\displaystyle |\\hbox{tr}(A_1 A_2 \\ldots A_p)| \\leq \\| A_1 A_2 \\|_{p/2} \\|A_3 A_4\\|_{p/2} \\ldots \\|A_{p-1} A_p\\|_{p/2}. \\ \\ \\ \\ \\ (3)$

On the other hand, we may expand

$\\displaystyle \\| A_1 A_2\\|_{p/2}^{p/2} = \\hbox{tr} A_1 A_2 A_2^* A_1^* \\ldots A_1 A_2 A_2^* A_1^*.$

We use the cyclic property of trace to move the rightmost ${A_1^*}$ factor to the left. Applying the induction hypothesis again, we conclude that

$\\displaystyle \\| A_1 A_2\\|_{p/2}^{p/2} \\leq \\| A_1^* A_1 \\|_{p/2} \\|A_2 A_2^*\\|_{p/2} \\ldots \\| A_1^* A_1 \\|_{p/2} \\| A_2 A_2^* \\|_{p/2}.$

But from the cyclic property of trace again, we have ${\\| A_1^* A_1 \\|_{p/2} = \\|A_1\\|_p^2}$ and ${\\| A_2 A_2^* \\|_{p/2} = \\|A_2\\|_p^2}$. We conclude that

$\\displaystyle \\|A_1 A_2 \\|_{p/2} \\leq \\|A_1\\|_p \\|A_2\\|_p$

and similarly for ${\\|A_3 A_4\\|_{p/2}}$, etc. Inserting this into (3) we obtain (2).

Remark 1 Though we will not need to do so here, it is interesting to note that one can use the tensor power trick to amplify (2) for ${p}$ equal to a power of two, to obtain (2) for all positive integers ${p}$, at least when the ${A_i}$ are all Hermitian. Indeed, pick a large integer ${m}$ and let ${N}$ be the integer part of ${2^m/p}$. Then expand the left-hand side of (2) as ${\\hbox{tr}( A_1^{1/N} \\ldots A_1^{1/N} A_2^{1/N} \\ldots A_p^{1/N} \\ldots A_p^{1/N} )}$ and apply (2) with ${p}$ replaced by ${2^m}$ to bound this by ${\\| A_1^{1/N} \\|_{2^m}^N \\ldots \\|A_p^{1/N}\\|_{2^m}^N \\| 1 \\|_{2^m}^{2^m-pN}}$. Sending ${m \\rightarrow \\infty}$ (noting that ${2^m = (1+o(1)) Np}$) we obtain the claim.
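The non-commutative Hölder inequality (2), and the pairing step ${\\|A_1 A_2\\|_{p/2} \\leq \\|A_1\\|_p \\|A_2\\|_p}$ used in its proof, can be spot-checked on random matrices. This is only a sketch under the assumption that NumPy is available; `schatten_norm` is an illustrative helper name, and the matrices are arbitrary complex Gaussian samples rather than anything from the argument above.

```python
from functools import reduce

import numpy as np


def schatten_norm(a, p):
    """p-Schatten norm (tr (A A^*)^{p/2})^{1/p} for an even integer p >= 2."""
    gram = a @ a.conj().T
    return np.trace(np.linalg.matrix_power(gram, p // 2)).real ** (1.0 / p)


rng = np.random.default_rng(4)
n = 3
ratios = {}
for p in (2, 4, 8):
    mats = [
        rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        for _ in range(p)
    ]
    lhs = abs(np.trace(reduce(np.matmul, mats)))  # |tr(A_1 ... A_p)|
    rhs = np.prod([schatten_norm(m, p) for m in mats])
    ratios[p] = lhs / rhs  # should not exceed 1

# Pairing step for p = 4: ||A_1 A_2||_2 <= ||A_1||_4 ||A_2||_4.
A1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A2 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
pair_lhs = schatten_norm(A1 @ A2, 2)
pair_rhs = schatten_norm(A1, 4) * schatten_norm(A2, 4)

print(ratios, pair_lhs, pair_rhs)
```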

Specialising (2) to the case where ${A_1=\\ldots=A_p = AB}$ for some Hermitian matrices ${A, B}$, we conclude that

$\\displaystyle \\hbox{tr}( (AB)^{p} ) \\leq \\| AB \\|_p^p$

and hence by cyclic permutation

$\\displaystyle \\hbox{tr}( (AB)^{p} ) \\leq \\hbox{tr}( (A^2 B^2)^{p/2} )$

for any ${p = 2,4,8,\\ldots}$ that is a power of ${2}$. Iterating this (replacing ${A, B, p}$ by ${A^2, B^2, p/2}$ at each step) we conclude that

$\\displaystyle \\hbox{tr}( (AB)^{p} ) \\leq \\hbox{tr}( A^p B^p ). \\ \\ \\ \\ \\ (4)$

Applying this with ${A, B}$ replaced by ${e^{A/p}}$ and ${e^{B/p}}$ respectively, we obtain

$\\displaystyle \\hbox{tr}( (e^{A/p} e^{B/p})^{p} ) \\leq \\hbox{tr}( e^A e^B ).$

Now we send ${p \\rightarrow \\infty}$. Since ${e^{A/p} = 1 + A/p + O(1/p^2)}$ and ${e^{B/p} = 1 + B/p + O(1/p^2)}$, we have ${e^{A/p} e^{B/p} = e^{(A+B)/p + O(1/p^2)}}$, and so the left-hand side is ${\\hbox{tr}( e^{A+B + O(1/p)} )}$; taking the limit as ${p \\rightarrow \\infty}$ we obtain the Golden-Thompson inequality. (See also these notes of Vershynin for a slight variant of this proof.)
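The limiting argument can be watched numerically: along ${p = 1, 2, 4, 8, \\ldots}$ the quantities ${\\hbox{tr}( (e^{A/p} e^{B/p})^{p} )}$ should be non-increasing (by the inequality ${\\hbox{tr}((AB)^p) \\leq \\hbox{tr}((A^2 B^2)^{p/2})}$ applied to ${e^{A/p}, e^{B/p}}$) and should approach ${\\hbox{tr}(e^{A+B})}$. The sketch below assumes NumPy; the helper names and the choice of random ${3 \\times 3}$ matrices are illustrative.

```python
import numpy as np


def random_hermitian(n, rng):
    """Return a random n x n complex Hermitian matrix (illustrative helper)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m + m.conj().T) / 2


def expm_hermitian(h):
    """Exponential of a Hermitian matrix, computed from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(h)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.conj().T


rng = np.random.default_rng(5)
A = random_hermitian(3, rng)
B = random_hermitian(3, rng)
target = np.trace(expm_hermitian(A + B)).real  # tr(e^{A+B})

values = []
for k in range(11):  # p = 1, 2, 4, ..., 1024
    p = 2 ** k
    step = expm_hermitian(A / p) @ expm_hermitian(B / p)
    values.append(np.trace(np.linalg.matrix_power(step, p)).real)

for k, v in enumerate(values):
    print(f"p = {2 ** k:5d}   tr((e^(A/p) e^(B/p))^p) = {v:.10f}")
print(f"target      tr(e^(A+B))             = {target:.10f}")
```

The first entry (${p=1}$) is the right-hand side of (1), so the printed column interpolates between the two sides of the Golden-Thompson inequality.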

If we stop the iteration at an earlier point, then the same argument gives the inequality

$\\displaystyle \\| e^{A+B} \\|_p \\leq \\| e^A e^B \\|_p$

for ${p=2,4,8,\\ldots}$ a power of two; one can view the original Golden-Thompson inequality as the ${p=1}$ endpoint of this case in some sense. (In fact, the Golden-Thompson inequality is true in any operator norm; see Theorem 9.3.7 of Bhatia