The Golden-Thompson inequality

Let {A, B} be two Hermitian {n \\times n} matrices. When {A} and {B} commute, we have the identity

\\displaystyle  e^{A+B} = e^A e^B.

When {A} and {B} do not commute, the situation is more complicated; we have the Baker-Campbell-Hausdorff formula (or more precisely, its product form, usually called the Zassenhaus formula)

\\displaystyle  e^{A+B} = e^A e^B e^{-\\frac{1}{2}[A,B]} \\ldots

where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

\\displaystyle  \\hbox{det}(e^{A+B}) = \\hbox{det}(e^A e^B).
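
A minimal NumPy sketch of the last two displays, assuming random complex Hermitian test matrices (the helpers `expm_hermitian` and `random_hermitian` are hypothetical names used only here). It computes {e^H} for Hermitian {H} from the spectral decomposition {H = V \\hbox{diag}(w) V^*}, so that {e^H = V \\hbox{diag}(e^w) V^*}. For a generic non-commuting pair the matrices {e^{A+B}} and {e^A e^B} differ, while both determinants equal {e^{\\hbox{tr}(A+B)}}.

```python
import numpy as np


def expm_hermitian(H):
    """e^H for Hermitian H, via H = V diag(w) V^*  =>  e^H = V diag(e^w) V^*."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T


def random_hermitian(n, rng):
    """A random n x n complex Hermitian matrix (test data only)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2


rng = np.random.default_rng(0)
A, B = random_hermitian(4, rng), random_hermitian(4, rng)

lhs = expm_hermitian(A + B)
rhs = expm_hermitian(A) @ expm_hermitian(B)
target = np.exp(np.trace(A + B))  # det(e^H) = e^{tr H}

print("e^{A+B} == e^A e^B ?", np.allclose(lhs, rhs))  # expected False: A, B do not commute
print("det e^{A+B} :", np.linalg.det(lhs))
print("det e^A e^B :", np.linalg.det(rhs))
print("e^{tr(A+B)} :", target)
```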

Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between {e^{A+B}} and {e^A e^B}, namely the Golden-Thompson inequality

\\displaystyle  \\hbox{tr}(e^{A+B}) \\leq \\hbox{tr}(e^A e^B). \\ \\ \\ \\  \\ (1)

The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices {A, B} are required. Note that the right-hand side can be rearranged using the cyclic property of trace as {\\hbox{tr}( e^{B/2} e^A e^{B/2} )}; the expression inside the trace is positive definite so the right-hand side is positive. (On the other hand, there is no reason why expressions such as {\\hbox{tr}(e^A e^B e^C)} need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.
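
Both (1) and the failure of reality for three matrices can be probed numerically. The following sketch (NumPy, random {5 \\times 5} Hermitian test matrices, hypothetical helper names) records the smallest observed value of {\\hbox{tr}(e^A e^B) - \\hbox{tr}(e^{A+B})}, which (1) predicts is non-negative, and the largest relative imaginary part of {\\hbox{tr}(e^A e^B e^C)}. A random test of course proves nothing; it only illustrates the two statements.

```python
import numpy as np


def expm_hermitian(H):
    """e^H for Hermitian H, computed from the spectral decomposition of H."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T


def random_hermitian(n, rng):
    """A random n x n complex Hermitian matrix (test data only)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2


rng = np.random.default_rng(1)
worst_gap = np.inf       # smallest observed tr(e^A e^B) - tr(e^{A+B})
max_rel_imag = 0.0       # largest observed |Im tr(e^A e^B e^C)| / |tr(e^A e^B e^C)|

for trial in range(200):
    A, B, C = (random_hermitian(5, rng) for _ in range(3))
    eA, eB, eC = expm_hermitian(A), expm_hermitian(B), expm_hermitian(C)

    lhs = np.trace(expm_hermitian(A + B))
    rhs = np.trace(eA @ eB)
    # tr(e^A e^B) = tr(e^{B/2} e^A e^{B/2}) is real: its imaginary part is rounding noise.
    assert abs(rhs.imag) < 1e-9 * (1 + abs(rhs))
    worst_gap = min(worst_gap, rhs.real - lhs.real)

    t3 = np.trace(eA @ eB @ eC)
    max_rel_imag = max(max_rel_imag, abs(t3.imag) / abs(t3))

print("smallest Golden-Thompson gap:", worst_gap)             # should be >= 0 by (1)
print("largest relative Im tr(e^A e^B e^C):", max_rel_imag)   # typically far from 0
```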

To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in {A, B}. The left-hand side expands as

\\displaystyle  \\hbox{tr} 1 + \\hbox{tr} (A+B) + \\frac{1}{2}  \\hbox{tr} (A^2 + AB + BA + B^2) + \\frac{1}{6} \\hbox{tr} (A+B)^3

\\displaystyle  + \\frac{1}{24} \\hbox{tr} (A+B)^4 + \\ldots

while the right-hand side expands as

\\displaystyle  \\hbox{tr} 1 + \\hbox{tr} (A+B) + \\frac{1}{2}  \\hbox{tr} (A^2 + 2AB + B^2)

\\displaystyle  + \\frac{1}{6} \\hbox{tr} (A^3 + 3A^2 B + 3 A  B^2+B^3) +

\\displaystyle  \\frac{1}{24} \\hbox{tr} (A^4 + 4 A^3 B + 6 A^2 B^2 +  4 A B^3 +B^4) + \\ldots

Using the cyclic property of trace {\\hbox{tr}(AB) = \\hbox{tr}(BA)}, one can verify that all terms up to third order agree. Turning to the fourth order terms, after expanding out {(A+B)^4} and using the cyclic property of trace as much as possible, we see that the fourth order terms almost agree, but the left-hand side contains a term {\\frac{1}{12} \\hbox{tr}(ABAB)} whose counterpart on the right-hand side is {\\frac{1}{12} \\hbox{tr}(ABBA)}. The difference between the two (right-hand side minus left-hand side) can be factorised (again using the cyclic property of trace) as {-\\frac{1}{24} \\hbox{tr} [A,B]^2}. Since {[A,B] := AB-BA} is skew-Hermitian, {-[A,B]^2 = [A,B]^* [A,B]} is positive semi-definite, so this difference is non-negative and we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)
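
The fourth order bookkeeping is easy to get wrong by hand, so here is a small NumPy check on a random Hermitian pair (an illustration, not a proof; `random_hermitian` is a hypothetical helper). It confirms that the third order terms agree, that the fourth order terms differ by exactly {-\\frac{1}{24} \\hbox{tr} [A,B]^2}, and the Cauchy-Schwarz form {\\hbox{tr}(ABAB) = \\langle AB, BA \\rangle \\leq \\|AB\\|_2 \\|BA\\|_2 = \\hbox{tr}(ABBA)}.

```python
import numpy as np


def random_hermitian(n, rng):
    """A random n x n complex Hermitian matrix (test data only)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2


def tr(M):
    # Every trace below is real when A and B are Hermitian; drop the rounding noise.
    return np.trace(M).real


mp = np.linalg.matrix_power
rng = np.random.default_rng(2)
A, B = random_hermitian(4, rng), random_hermitian(4, rng)
S = A + B

# Third order terms of both sides agree.
lhs3 = tr(mp(S, 3)) / 6
rhs3 = tr(mp(A, 3) + 3 * mp(A, 2) @ B + 3 * A @ mp(B, 2) + mp(B, 3)) / 6

# Fourth order terms differ by exactly -tr([A,B]^2)/24.
lhs4 = tr(mp(S, 4)) / 24
rhs4 = tr(mp(A, 4) + 4 * mp(A, 3) @ B + 6 * mp(A, 2) @ mp(B, 2)
          + 4 * A @ mp(B, 3) + mp(B, 4)) / 24
comm = A @ B - B @ A
predicted = -tr(comm @ comm) / 24

# Cauchy-Schwarz form: tr(ABAB) = <AB, BA> <= ||AB||_2 ||BA||_2 = tr(ABBA).
tr_abab, tr_abba = tr(A @ B @ A @ B), tr(A @ B @ B @ A)

print(lhs3 - rhs3)             # rounding noise only
print(rhs4 - lhs4, predicted)  # agree up to rounding, and are >= 0
print(tr_abab <= tr_abba)
```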

Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair {A, B} of non-commuting Hermitian matrices are strongest when cross-interactions are kept to a minimum, so that all the {A} factors lie on one side of a product and all the {B} factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.

The proof of the Golden-Thompson inequality relies on the somewhat magical power of the tensor power trick. For any even integer {p = 2,4,6,\\ldots} and any {n \\times n} matrix {A} (not necessarily Hermitian), we define the {p}-Schatten norm {\\|A\\|_p} of {A} by the formula

\\displaystyle  \\| A \\|_p := (\\hbox{tr}(AA^*)^{p/2})^{1/p}.

(This formula in fact defines a norm for any {p \\geq 1}, but we will only need the even integer case here.) This norm can be viewed as a non-commutative analogue of the {\\ell^p} norm; indeed, the {p}-Schatten norm of a diagonal matrix is just the {\\ell^p} norm of the coefficients.
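
A short sketch comparing the trace formula for {\\|A\\|_p} with the standard equivalent description of the {p}-Schatten norm as the {\\ell^p} norm of the singular values of {A} (a known fact that the argument here does not use; the function names are hypothetical). The diagonal case reduces to the {\\ell^p} norm of the diagonal entries.

```python
import numpy as np


def schatten_norm_trace(A, p):
    """||A||_p = (tr (A A^*)^{p/2})^{1/p}, for an even integer p >= 2."""
    if p < 2 or p % 2:
        raise ValueError("this formula is used here only for even p >= 2")
    M = np.linalg.matrix_power(A @ A.conj().T, p // 2)
    return np.trace(M).real ** (1.0 / p)


def schatten_norm_svd(A, p):
    """The l^p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)


rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
for p in (2, 4, 6, 8):
    print(p, schatten_norm_trace(A, p), schatten_norm_svd(A, p))

# For a diagonal matrix this is the l^p norm of the diagonal entries.
d = np.array([3.0, -4.0, 1j])
D = np.diag(d)
print(schatten_norm_trace(D, 4), np.sum(np.abs(d) ** 4) ** 0.25)
```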

Note that the {2}-Schatten norm

\\displaystyle  \\|A\\|_2 := (\\hbox{tr}(AA^*))^{1/2}

is the Hilbert space norm associated to the Frobenius inner product (or Hilbert-Schmidt inner product)

\\displaystyle  \\langle A, B \\rangle := \\hbox{tr}(A B^*).

This is clearly a non-negative Hermitian inner product, so by the Cauchy-Schwarz inequality we conclude that

\\displaystyle  |\\hbox{tr}(A_1 A_2^*)| \\leq \\| A_1 \\|_2 \\|A_2\\|_2

for any {n \\times n} matrices {A_1, A_2}. As {\\|A_2\\|_2 = \\|A_2^*\\|_2}, we conclude in particular that

\\displaystyle  |\\hbox{tr}(A_1 A_2)| \\leq \\| A_1 \\|_2 \\|A_2\\|_2.
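
The Frobenius inner product is just the standard inner product on the {n^2} entries, since {\\hbox{tr}(A B^*) = \\sum_{i,j} A_{ij} \\overline{B_{ij}}}; this is what makes Cauchy-Schwarz available. A minimal NumPy illustration with random complex matrices (helper names hypothetical):

```python
import numpy as np


def frobenius_inner(X, Y):
    """<X, Y> := tr(X Y^*)."""
    return np.trace(X @ Y.conj().T)


def norm2(X):
    """||X||_2 = <X, X>^{1/2}, the Frobenius (Hilbert-Schmidt) norm."""
    return np.linalg.norm(X, "fro")


rng = np.random.default_rng(4)
A1 = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A2 = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# tr(X Y^*) = sum_ij X_ij conj(Y_ij): the usual inner product on the entries.
entrywise = np.sum(A1 * A2.conj())
print(frobenius_inner(A1, A2), entrywise)

# |tr(A1 A2^*)| <= ||A1||_2 ||A2||_2, and (as ||A2||_2 = ||A2^*||_2) likewise for tr(A1 A2).
bound = norm2(A1) * norm2(A2)
print(abs(frobenius_inner(A1, A2)) <= bound, abs(np.trace(A1 @ A2)) <= bound)
```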

We can iterate this and establish the non-commutative Hölder inequality

\\displaystyle  |\\hbox{tr}(A_1 A_2 \\ldots A_p)| \\leq \\| A_1 \\|_p  \\|A_2\\|_p \\ldots \\|A_p\\|_p \\ \\ \\ \\ \\ (2)

whenever {p=2,4,8,\\ldots} is a power of {2}. Indeed, we induct on {p}, the case {p=2} already having been established. If {p \\geq 4} is a power of {2}, then by the induction hypothesis (grouping {A_1 \\ldots A_p} into {p/2} pairs) we can bound

\\displaystyle  |\\hbox{tr}(A_1 A_2 \\ldots A_p)| \\leq \\| A_1 A_2  \\|_{p/2} \\|A_3 A_4\\|_{p/2} \\ldots \\|A_{p-1} A_p\\|_{p/2}. \\ \\ \\ \\ \\ (3)

On the other hand, we may expand

\\displaystyle  \\| A_1 A_2\\|_{p/2}^{p/2} = \\hbox{tr} A_1 A_2 A_2^*  A_1^* \\ldots A_1 A_2 A_2^* A_1^*.

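We use the cyclic property of trace to move the rightmost {A_1^*} factor to the left, so that the trace becomes that of an alternating product of {p/2} factors {A_1^* A_1} and {A_2 A_2^*}. Applying the induction hypothesis again (now with {p/2} factors), we conclude that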

\\displaystyle  \\| A_1 A_2\\|_{p/2}^{p/2} \\leq \\| A_1^* A_1 \\|_{p/2}  \\|A_2 A_2^*\\|_{p/2} \\ldots \\| A_1^* A_1 \\|_{p/2} \\| A_2 A_2^*  \\|_{p/2}.

But from the cyclic property of trace again, we have {\\| A_1^* A_1 \\|_{p/2} = \\|A_1\\|_p^2} and {\\| A_2 A_2^* \\|_{p/2} = \\|A_2\\|_p^2}. We conclude that

\\displaystyle  \\|A_1 A_2 \\|_{p/2} \\leq \\|A_1\\|_p \\|A_2\\|_p

and similarly for {\\|A_3 A_4\\|_{p/2}}, etc. Inserting this into (3) we obtain (2).

Remark 1 Though we will not need to do so here, it is interesting to note that one can use the tensor power trick to amplify (2) for {p} equal to a power of two, to obtain (2) for all positive integers {p}, at least when the {A_i} are all Hermitian. Indeed, pick a large integer {m} and let {N} be the integer part of {2^m/p}. Then expand the left-hand side of (2) as {\\hbox{tr}( A_1^{1/N} \\ldots A_1^{1/N} A_2^{1/N} \\ldots A_p^{1/N}  \\ldots A_p^{1/N} )} and apply (2) with {p} replaced by {2^m} to bound this by {\\| A_1^{1/N} \\|_{2^m}^N \\ldots \\|A_p^{1/N}\\|_{2^m}^N \\| 1  \\|_{2^m}^{2^m-pN}}. Sending {m \\rightarrow \\infty} (noting that {2^m = (1+o(1)) Np}) we obtain the claim.
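
A numerical illustration of (2), of the key step {\\|A_1 A_2\\|_{p/2} \\leq \\|A_1\\|_p \\|A_2\\|_p}, and of the extension in Remark 1 to exponents that are not powers of two (tested only with Hermitian matrices, matching the remark). The sketch assumes NumPy and random test matrices, and computes Schatten norms from singular values; the helper names are hypothetical.

```python
import numpy as np
from functools import reduce


def schatten_norm(A, p):
    """l^p norm of the singular values; equals (tr (A A^*)^{p/2})^{1/p} for even p."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)


def holder_sides(mats, p):
    """Return (|tr(A_1 ... A_p)|, ||A_1||_p ... ||A_p||_p) for a list of p matrices."""
    assert len(mats) == p
    lhs = abs(np.trace(reduce(np.matmul, mats)))
    rhs = np.prod([schatten_norm(M, p) for M in mats])
    return lhs, rhs


rng = np.random.default_rng(5)


def random_complex(n):
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))


results = []
# (2) for p a power of two: no Hermitian assumption is needed.
for p in (2, 4, 8):
    lhs, rhs = holder_sides([random_complex(4) for _ in range(p)], p)
    results.append((p, lhs, rhs))

# Remark 1: other positive integers p, here with Hermitian matrices.
for p in (3, 5, 6):
    mats = [(X + X.conj().T) / 2 for X in (random_complex(4) for _ in range(p))]
    lhs, rhs = holder_sides(mats, p)
    results.append((p, lhs, rhs))

# The key step of the induction: ||A_1 A_2||_{p/2} <= ||A_1||_p ||A_2||_p.
A1, A2 = random_complex(4), random_complex(4)
step = [(schatten_norm(A1 @ A2, p // 2), schatten_norm(A1, p) * schatten_norm(A2, p))
        for p in (4, 8, 16)]

for p, lhs, rhs in results:
    print(f"p={p}: |tr| = {lhs:.4g} <= {rhs:.4g}")
```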

Specialising (2) to the case where {A_1=\\ldots=A_p = AB} for some Hermitian matrices {A, B}, we conclude that

\\displaystyle  \\hbox{tr}( (AB)^{p} ) \\leq \\| AB \\|_p^p

and hence by cyclic permutation

\\displaystyle  \\hbox{tr}( (AB)^{p} ) \\leq \\hbox{tr}( (A^2  B^2)^{p/2} )

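for any power of two {p = 2,4,8,\\ldots}. Iterating this (replacing {A, B} by the Hermitian matrices {A^2, B^2}, then {A^4, B^4}, and so on), we conclude that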

\\displaystyle  \\hbox{tr}( (AB)^{p} ) \\leq \\hbox{tr}( A^p B^p ). \\ \\  \\ \\ \\ (4)
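
The chain of inequalities leading to (4) can be traced numerically. The sketch below (NumPy; hypothetical helper names; random Hermitian test matrices rescaled to operator norm {1} so that high powers stay moderate) lists {\\hbox{tr}((A^q B^q)^{p/q})} for {q = 1, 2, 4, \\ldots, p}; by the argument above this list should be non-decreasing, running from {\\hbox{tr}((AB)^p)} to {\\hbox{tr}(A^p B^p)}.

```python
import numpy as np


def random_hermitian(n, rng):
    """Random Hermitian test matrix, rescaled to operator norm 1."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (X + X.conj().T) / 2
    return H / np.linalg.norm(H, 2)


def trace_chain(A, B, p):
    """[tr((A^q B^q)^{p/q}) for q = 1, 2, 4, ..., p], with p a power of two.

    The first entry is tr((AB)^p) and the last is tr(A^p B^p).
    """
    mp = np.linalg.matrix_power
    vals, q = [], 1
    while q <= p:
        # These traces are real for Hermitian A, B; .real drops rounding noise.
        vals.append(np.trace(mp(mp(A, q) @ mp(B, q), p // q)).real)
        q *= 2
    return vals


rng = np.random.default_rng(6)
A, B = random_hermitian(4, rng), random_hermitian(4, rng)
chain = trace_chain(A, B, 16)
print(chain)
```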

Applying this with {A, B} replaced by {e^{A/p}} and {e^{B/p}} respectively, we obtain

\\displaystyle  \\hbox{tr}( (e^{A/p} e^{B/p})^{p} ) \\leq \\hbox{tr}(  e^A e^B ).

Now we send {p \\rightarrow \\infty}. Since {e^{A/p} = 1 + A/p + O(1/p^2)} and {e^{B/p} = 1 + B/p + O(1/p^2)}, we have {e^{A/p} e^{B/p} = e^{(A+B)/p + O(1/p^2)}}, and so the left-hand side is {\\hbox{tr}( e^{A+B + O(1/p)} )}; taking the limit as {p \\rightarrow \\infty} we obtain the Golden-Thompson inequality. (See also these notes of Vershynin for a slight variant of this proof.)
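
The limiting argument can be watched numerically. In the sketch below (NumPy; hypothetical helper names; random Hermitian test matrices of operator norm {1}), {a_k := \\hbox{tr}((e^{A/p} e^{B/p})^p)} with {p = 2^k} starts at {\\hbox{tr}(e^A e^B)} when {k=0}. Applying the inequality {\\hbox{tr}((AB)^{p}) \\leq \\hbox{tr}((A^2 B^2)^{p/2})} to {e^{A/2p}, e^{B/2p}} with exponent {2p} gives {a_{k+1} \\leq a_k}, so the sequence should decrease towards {\\hbox{tr}(e^{A+B})}.

```python
import numpy as np


def expm_hermitian(H):
    """e^H for Hermitian H, computed from the spectral decomposition of H."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T


def random_hermitian(n, rng):
    """Random Hermitian test matrix with operator norm 1."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (X + X.conj().T) / 2
    return H / np.linalg.norm(H, 2)


rng = np.random.default_rng(7)
A, B = random_hermitian(4, rng), random_hermitian(4, rng)

gt_rhs = np.trace(expm_hermitian(A) @ expm_hermitian(B)).real  # tr(e^A e^B)
limit = np.trace(expm_hermitian(A + B)).real                   # tr(e^{A+B})

seq = []
for k in range(13):  # p = 1, 2, 4, ..., 4096
    p = 2 ** k
    M = expm_hermitian(A / p) @ expm_hermitian(B / p)
    seq.append(np.trace(np.linalg.matrix_power(M, p)).real)

for k, a in enumerate(seq):
    print(f"p = {2 ** k:5d}: {a:.10f}")
print(f"tr e^(A+B)  : {limit:.10f}")
```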

If we stop the iteration at an earlier point, then the same argument gives the inequality

\\displaystyle  \\| e^{A+B} \\|_p \\leq \\| e^A e^B \\|_p

for {p=2,4,8,\\ldots} a power of two; one can view the original Golden-Thompson inequality as the {p=1} endpoint of this case in some sense. (In fact, the Golden-Thompson inequality is true in any unitarily invariant norm; see Theorem 9.3.7 of Bhatia