\chapter{Quadratic Forms}
Symmetric matrices often appear in Geometry, Statistics, Physics, and other fields as \textit{quadratic forms}. They can be used to describe a family of geometric shapes known as \textit{conic sections}, including ellipses and hyperbolas. A common application of quadratic forms in Earth Sciences is the construction of a \textit{covariance matrix} between different variables, which eventually leads to \textit{Principal Component Analysis (PCA)}, a method that decomposes the variables into uncorrelated modes that explain the spread of their distribution. In Atmospheric Science, the technique is more commonly known as \textit{Empirical Orthogonal Functions (EOF)} and has been widely used to analyze prominent, large-scale climate patterns such as the \textit{El Niño–Southern Oscillation (ENSO)}.
\section{Mathematical and Geometric Ideas of Quadratic Forms}
\subsection{Definition of Quadratic Forms}
The word \textit{quadratic} is commonly associated with \textit{quadratic equations} in the form of $y = ax^2 + bx + c$. \index{Quadratic Form}\keywordhl{Quadratic forms} are the generalization of quadratic equations when there are multiple variables $x_1, x_2, x_3, \ldots$: The possible quadratic forms will be made up of the usual quadratic terms $x_1^2, x_2^2, x_3^2, \ldots$, as well as the \textit{cross-product terms} (not to be confused with the cross product in Section \ref{section:crossprod}) $x_px_q$, $p \neq q$. The usual quadratic terms then can be seen as just another kind of cross-product terms when $p = q$. We will limit our discussion to real, finite-dimensional vector spaces first.
\begin{defn}[Quadratic Form]
\label{defn:quadform}
A real quadratic form in multiple variables $\vec{x} = (x_1, x_2, x_3, \ldots, x_n)^T$ has the structure
\begin{align*}
Q(\vec{x}) = \sum_{p=1}^{n}\sum_{q=1}^{n} b_{pq} x_px_q
\end{align*}
Note that the coefficients $b_{pq}$ are real, so the form produces a real scalar.
\end{defn}
For example, in a two-variable situation, $x^2 + 3xy + y^2$ and $3x^2 - 4xy$ are quadratic forms, while $x^2 + y$ and $xy + xy^2$ are not. Notice that $x_px_q$ and $x_qx_p$ are actually the same term, so we will replace both $b_{pq}$ and $b_{qp}$ by the single coefficient $\frac{1}{2}(b_{pq} + b_{qp})$.
\begin{proper}[Matrix Representation of a Quadratic Form]
\label{proper:quadformmatrix}
All quadratic forms as given in Definition \ref{defn:quadform} for any real vector $\vec{x} = (x_1, x_2, x_3, \ldots, x_n)^T \in \mathcal{V}$ can be expressed as
\begin{align*}
Q: \mathcal{V} \to \mathbb{R},\; Q(\vec{x}) = \vec{x}^TB\vec{x}
\end{align*}
where $B$ is real symmetric and has the form of
\begin{align*}
\left[\begin{array}{@{}ccccc@{}}
b_{11} & \frac{1}{2}(b_{12} + b_{21}) & \frac{1}{2}(b_{13} + b_{31}) & \cdots & \frac{1}{2}(b_{1n} + b_{n1}) \\
\frac{1}{2}(b_{12} + b_{21}) & b_{22} & \frac{1}{2}(b_{23} + b_{32}) & & \frac{1}{2}(b_{2n} + b_{n2}) \\
\frac{1}{2}(b_{13} + b_{31}) & \frac{1}{2}(b_{23} + b_{32}) & b_{33} & \cdots & \frac{1}{2}(b_{3n} + b_{n3})\\
\vdots & & \vdots & \ddots & \vdots \\
\frac{1}{2}(b_{1n} + b_{n1}) & \frac{1}{2}(b_{2n} + b_{n2}) & \frac{1}{2}(b_{3n} + b_{n3}) & \cdots & b_{nn}
\end{array}\right]
\end{align*}
\end{proper}
The readers can check that the above matrix expression indeed leads to the desired quadratic form by a direct expansion.\footnote{\begin{align*}
&\quad \begin{bmatrix}
x_1 & x_2 & x_3 & \cdots
\end{bmatrix}
\left[\begin{array}{@{}cccc@{}}
b_{11} & \frac{1}{2}(b_{12} + b_{21}) & \frac{1}{2}(b_{13} + b_{31}) & \cdots \\
\frac{1}{2}(b_{12} + b_{21}) & b_{22} & \frac{1}{2}(b_{23} + b_{32}) & \\
\frac{1}{2}(b_{13} + b_{31}) & \frac{1}{2}(b_{23} + b_{32}) & b_{33} & \\
\vdots & & & \ddots
\end{array}\right]
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
\vdots
\end{bmatrix} \\
&=
\begin{bmatrix}
x_1 & x_2 & x_3 & \cdots
\end{bmatrix}
\left[\begin{array}{@{}cccc@{}}
b_{11}x_1 + \frac{1}{2}(b_{12} + b_{21})x_2 + \frac{1}{2}(b_{13} + b_{31})x_3 + \cdots \\
\frac{1}{2}(b_{12} + b_{21})x_1 + b_{22}x_2 + \frac{1}{2}(b_{23} + b_{32})x_3 + \cdots \\
\frac{1}{2}(b_{13} + b_{31})x_1 + \frac{1}{2}(b_{23} + b_{32})x_2 + b_{33}x_3 + \cdots \\
\vdots
\end{array}\right] \\
&= x_1(b_{11}x_1 + \frac{1}{2}(b_{12} + b_{21})x_2 + \frac{1}{2}(b_{13} + b_{31})x_3 + \cdots) \;\; = b_{11}x_1^2 + b_{12}x_1x_2 + b_{13}x_1x_3 + \cdots \\
&\quad + x_2(\frac{1}{2}(b_{12} + b_{21})x_1 + b_{22}x_2 + \frac{1}{2}(b_{23} + b_{32})x_3 + \cdots) \quad + b_{21}x_2x_1 + b_{22}x_2^2 + b_{23}x_2x_3 + \cdots \\
&\quad + x_3(\frac{1}{2}(b_{13} + b_{31})x_1 + \frac{1}{2}(b_{23} + b_{32})x_2 + b_{33}x_3 + \cdots) \quad + b_{31}x_3x_1 + b_{32}x_3x_2 + b_{33}x_3^2 + \cdots
\end{align*}} In the same essence, we can always express any quadratic form using the symmetric part of a matrix. For instance, the quadratic form $x^2 - 2xy + 3y^2$ can be rewritten as
\begin{align*}
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
1 & -1 \\
-1 & 3
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix}
\end{align*}
Short Exercise: Verify the quadratic form by expanding it.\footnote{\begin{align*}
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
1 & -1 \\
-1 & 3
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix}
&=
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
x - y \\
-x + 3y
\end{bmatrix} \\
&= x(x-y) + y(-x+3y) \\
&= x^2 - xy - xy + 3y^2 = x^2 - 2xy + 3y^2
\end{align*}} \par
Even if we are given $\vec{x}^TA\vec{x}$ where $A$ is not symmetric to begin with, we can extract the symmetric part of $A$ (see Exercise \ref{ex:symskew}), that is, $B = \frac{1}{2}(A + A^T)$. Reconstruction of the quadratic form by $\vec{x}^TB\vec{x}$ will still be equivalent to the original $\vec{x}^TA\vec{x}$:
\begin{align*}
\vec{x}^TB\vec{x} &= \frac{1}{2}\vec{x}^T(A + A^T)\vec{x} \\
&= \frac{1}{2}\vec{x}^TA\vec{x} + \frac{1}{2}\vec{x}^TA^T\vec{x} \\
&= \frac{1}{2}\vec{x}^TA\vec{x} + \frac{1}{2}(\vec{x}^TA^T\vec{x})^T & \begin{aligned}
\text{($\vec{x}^TA^T\vec{x} = (\vec{x}^TA^T\vec{x})^T$ since it is just} \\
\text{a scalar in a $1 \times 1$ singleton block)}
\end{aligned} \\
&= \frac{1}{2}\vec{x}^TA\vec{x} + \frac{1}{2}\vec{x}^TA\vec{x} & \text{(Properties \ref{proper:transp})} \\
&= \vec{x}^TA\vec{x}
\end{align*}
Similarly the skew-symmetric part does not contribute anything to the quadratic form\footnote{$\frac{1}{2}\vec{x}^T(A - A^T)\vec{x} = \frac{1}{2}\vec{x}^TA\vec{x} - \frac{1}{2}\vec{x}^TA^T\vec{x} = \frac{1}{2}\vec{x}^TA\vec{x} - \frac{1}{2}(\vec{x}^TA^T\vec{x})^T = \frac{1}{2}\vec{x}^TA\vec{x} - \frac{1}{2}\vec{x}^TA\vec{x} = 0$}, and we will characterize all quadratic forms using symmetric matrices henceforth.
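As a quick numerical sanity check, the following minimal Python/\texttt{numpy} sketch (illustrative only, not part of the derivation) confirms that only the symmetric part of a matrix contributes to the quadratic form:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # a generic, non-symmetric matrix
B = 0.5 * (A + A.T)               # symmetric part of A
S = 0.5 * (A - A.T)               # skew-symmetric part of A
x = rng.standard_normal(3)

print(x @ A @ x, x @ B @ x)       # the two values agree
print(x @ S @ x)                  # essentially zero (up to rounding)
\end{verbatim}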
Such a symmetric matrix $B$ is also frequently considered to behave as a \index{Bilinear Form}\index{Symmetric Bilinear Form}\keywordhl{symmetric bilinear form} in a more general setting.
\begin{defn}[(Symmetric) Bilinear Form]
A real bilinear form $B(\vec{x}, \vec{y}): \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ over a real vector space $\mathcal{V}$ takes two vectors $\vec{x}$, $\vec{y} \in \mathcal{V}$ and returns a real scalar, in a way that is linear in each of the two arguments. A symmetric bilinear form is one that satisfies $B(\vec{x}, \vec{y}) = B(\vec{y}, \vec{x})$ for any pair of $\vec{x}$ and $\vec{y}$. For finite-dimensional cases, the expression becomes
\begin{align*}
B(\vec{x}, \vec{y}) = \vec{x}^T B \vec{y}
\end{align*}
where $B^T = B$ is an $n \times n$ real symmetric matrix and $n$ is the dimension of $\mathcal{V}$.
\end{defn}
It is not hard to see that the form $\vec{x}^TB\vec{y}$ fulfills the requirement of $B(\vec{x}, \vec{y}) = B(\vec{y}, \vec{x})$.\footnote{$B(\vec{x}, \vec{y}) = \vec{x}^TB\vec{y} = (\vec{x}^TB\vec{y})^T = \vec{y}^TB^T\vec{x} = \vec{y}^TB\vec{x} = B(\vec{y}, \vec{x})$, where $\vec{x}^TB\vec{y} = (\vec{x}^TB\vec{y})^T$ as it is just a real number.} It is also not hard to see that the pattern of quadratic forms proposed in Properties \ref{proper:quadformmatrix} can be derived from a symmetric bilinear form as we set $\vec{y} = \vec{x}$. In this sense, we say that a quadratic form is characterized by the symmetric bilinear form induced by the appropriate symmetric matrix $B$.
\subsection{(Semi)Definiteness and Congruence}
\label{subsection:definiteness}
An important attribute of quadratic forms is their \index{Definiteness}\index{Semidefiniteness}\keywordhl{(semi)definiteness}. If a quadratic form $Q(\vec{x})$ is \index{Positive-Definite}\index{Negative-Definite}\keywordhl{positive/negative-definite}, it means that it always outputs positive/negative numbers no matter what the input vector $\vec{x}$ is, as long as $\vec{x} \neq \textbf{0}$ is a non-zero vector. Semidefiniteness relaxes the restriction such that the quadratic form can also return zero for some non-zero $\vec{x}$, in other words, a positive(negative)-semidefinite quadratic form always gives non-negative (non-positive) numbers. Now we will show that definiteness is related to the eigenvalues of the symmetric matrix that characterizes the quadratic form.
\begin{defn}
\label{defn:quaddefinite}
For any real quadratic form $Q(\vec{x}) = \vec{x}^T B\vec{x}$, $Q$ (or $B$) is called
\begin{enumerate}[label=(\alph*)]
\item positive-definite, if for any $\vec{x} \neq \vec{0}$, $\vec{x}^T B\vec{x} > 0$ (positive-semidefinite if $\vec{x}^T B\vec{x} \geq 0$),
\item negative-definite, if for any $\vec{x} \neq \vec{0}$, $\vec{x}^T B\vec{x} < 0$ (negative-semidefinite if $\vec{x}^T B\vec{x} \leq 0$),
\item indefinite if $\vec{x}^T B\vec{x}$ can take both positive and negative values,
\end{enumerate}
\end{defn}
\begin{thm}
\label{thm:quaddefinite}
The quadratic form $Q(\vec{x}) = \vec{x}^T B\vec{x}$, where $B$ is real symmetric, is
\begin{enumerate}[label=(\alph*)]
\item positive definite, if and only if all eigenvalues of $B$ are positive (positive semi-definite if and only if all eigenvalues of $B$ are non-negative),
\item negative definite, if and only if all eigenvalues of $B$ are negative (negative semi-definite if and only if all eigenvalues of $B$ are non-positive),
\item indefinite when there are both positive and negative eigenvalues for $B$.
\end{enumerate}
\end{thm}
\begin{proof}
We will only show the positive-definite part; the other cases essentially follow the same logic. Since we are given a symmetric matrix, the Spectral Theorem (Theorem \ref{thm:spectral}) naturally comes in handy. Assume we are working in an $n$-dimensional real vector space $\mathcal{V}$. Part (d) of the Spectral Theorem shows that any vector $\vec{x} \in \mathcal{V}$ can be rewritten as $\vec{x} = \vec{x}_{J_1} + \vec{x}_{J_2} + \cdots + \vec{x}_{J_k}$, where each $\vec{x}_{J_i} \in \mathcal{E}_{J_i}$ belongs to the respective eigenspace of $B$. Then, as in the derivation for part (e) of the Spectral Theorem, we have
\begin{align*}
B\vec{x} &= \lambda_{J_1}\vec{x}_{J_1} + \lambda_{J_2}\vec{x}_{J_2} + \cdots + \lambda_{J_k}\vec{x}_{J_k}
\end{align*}
Subsequently,
\begin{align*}
Q(\vec{x}) &= \vec{x}^T B\vec{x} \\
&= (\vec{x}_{J_1} + \vec{x}_{J_2} + \cdots + \vec{x}_{J_k}) \cdot (\lambda_{J_1}\vec{x}_{J_1} + \lambda_{J_2}\vec{x}_{J_2} + \cdots + \lambda_{J_k}\vec{x}_{J_k}) \\
&= \lambda_{J_1}(\vec{x}_{J_1} \cdot \vec{x}_{J_1}) + \lambda_{J_2}(\vec{x}_{J_2} \cdot \vec{x}_{J_2}) + \cdots + \lambda_{J_k}(\vec{x}_{J_k} \cdot \vec{x}_{J_k}) \\
&= \lambda_{J_1}\norm{\vec{x}_{J_1}}^2 + \lambda_{J_2}\norm{\vec{x}_{J_2}}^2 + \cdots + \lambda_{J_k}\norm{\vec{x}_{J_k}}^2
\end{align*}
where $\vec{x}_{J_i} \cdot \vec{x}_{J_{i'}} = 0$ whenever $i \neq i'$ (Properties \ref{proper:symortho}), which takes us from the second line to the third. Now, for any $\vec{x}$, all the squared lengths satisfy $\norm{\vec{x}_{J_i}}^2 \geq 0$. If all the eigenvalues $\lambda_{J_i} > 0$ are positive and $\vec{x}$ is not the zero vector, so that at least one $\norm{\vec{x}_{J_i}}^2 > 0$ is positive, then the quadratic form $Q(\vec{x}) > 0$ always takes a positive value. The converse can be established by considering the contrapositive and following the same line of reasoning.
\end{proof}
\begin{exmp}
Show that the symmetric matrix
\begin{align*}
B =
\begin{bmatrix}
4 & 1 \\
1 & 4
\end{bmatrix}
\end{align*}
is positive-definite.
\end{exmp}
\begin{solution}
By Theorem \ref{thm:quaddefinite}, we simply check whether all of its eigenvalues are positive. Its characteristic polynomial
\begin{align*}
\det(B-\lambda I) =
\begin{vmatrix}
4 - \lambda & 1 \\
1 & 4 - \lambda
\end{vmatrix}
&= (4-\lambda)(4-\lambda) - (1)(1) \\
&= (16 - 8\lambda + \lambda^2) - 1 \\
&= 15 - 8\lambda + \lambda^2 = (3-\lambda)(5-\lambda)
\end{align*}
has $\lambda = 3,5$ as its roots which are both positive. Hence $B$ is positive-definite. We can double-check by the explicit method of completing the square:
\begin{align*}
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
4 & 1 \\
1 & 4
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix}
&= 4x^2 + 2xy + 4y^2 \\
&= (x^2 + 2xy + y^2) + 3x^2 + 3y^2 \\
&= (x+y)^2 + 3x^2 + 3y^2 > 0
\end{align*}
which is always positive as long as $x$ and $y$ are not both zero.
\end{solution}
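In practice, such a definiteness check is usually done numerically by computing the eigenvalues directly. A minimal Python/\texttt{numpy} sketch for the matrix in this example (illustrative only) is:
\begin{verbatim}
import numpy as np

B = np.array([[4.0, 1.0],
              [1.0, 4.0]])
eigs = np.linalg.eigvalsh(B)    # eigenvalues of a symmetric matrix, in ascending order
print(eigs)                     # [3. 5.]
print(bool(np.all(eigs > 0)))   # True, so B is positive-definite
\end{verbatim}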
Since it has been shown that symmetric matrices can undergo orthogonal diagonalization, which is essentially a change of coordinates that makes the matrix representation of a linear operator diagonal, we may ask in general how a coordinate transformation works for a quadratic/symmetric bilinear form. However, bear in mind that the previous way of changing coordinates (Properties \ref{proper:endomorph}, $A' = P^{-1}AP$) is based on regarding the matrix as a linear transformation, and hence it is reasonable that the rule of coordinate transformation will be somewhat different when the matrix acts as a quadratic form instead. Let's take a step back and consider the change of coordinates for a vector in Theorem \ref{thm:bijectivechincoord}: $[\vec{x}]_\beta = P_{\beta'}^\beta [\vec{x}']_{\beta'}$, hence
\begin{align*}
Q(\vec{x}) &\equiv [\vec{x}]_\beta^T B [\vec{x}]_\beta \\
&= (P_{\beta'}^\beta [\vec{x}']_{\beta'})^T B (P_{\beta'}^\beta [\vec{x}']_{\beta'}) \\
&= [\vec{x}']_{\beta'}^T ((P_{\beta'}^\beta)^T B P_{\beta'}^\beta) [\vec{x}']_{\beta'} = [\vec{x}']_{\beta'}^T B' [\vec{x}']_{\beta'}
\end{align*}
so we identify the coordinate transformation of a quadratic form by $B' = P^TBP$ where $P$ is some invertible coordinate basis matrix. In this case, $B'$ and $B$ are known as \index{Congruent}\keywordhl{congruent}.
\begin{defn}
\label{defn:coordtransquad}
The coordinate transformation of a symmetric matrix $B$ as a quadratic form follows
\begin{align*}
B' = P^TBP
\end{align*}
where $P$ is invertible and consists of column vectors in the new basis expressed relative to the old basis. Any pair of $B'$ and $B$ related in this way are referred to as \textit{congruent}.
\end{defn}
Fortunately, in orthogonal diagonalization we have $P^{-1} = P^T$, and hence the coordinate transformations of a symmetric matrix obtained by treating it as a linear operator or as a quadratic form coincide. Hence the quadratic form $B$ can be transformed into (and is congruent to) a diagonal matrix $D = P^TBP$ where the columns of $P$ are the orthonormal eigenvectors of $B$ (Properties \ref{proper:orthobasissym})\footnote{Notice that this is not the only way to make a quadratic form diagonal (unlike a linear transformation, dictated by its eigenvectors) and there exist many other $P$ that can do it. (However, an important observation about them is Theorem \ref{thm:sylvester}, to be introduced soon.)}. Furthermore, such a diagonal matrix $D$ contains the eigenvalues of $B$: say $r$ of them are positive, $\lambda_1^+, \lambda_2^+, \cdots, \lambda_r^+$, $s$ of them are negative, $\lambda_1^-, \lambda_2^-, \cdots, \lambda_s^-$, and the remaining eigenvalues are zero. Arranging them in the order of positive-negative-zero, and applying an extra real diagonal factor matrix $F$, where
\begin{align*}
F_{kk} =
\begin{cases}
\frac{1}{\sqrt{\lambda_{k}^+}} & 1 \leq k \leq r \\
\frac{1}{\sqrt{-\lambda_{k-r}^-}} & r+1 \leq k \leq r+s \\
1 & r+s+1 \leq k \leq n
\end{cases}
\end{align*}
it is easy to see that $F$ is invertible and we can further transform the quadratic form into
\begin{align*}
C &= F^TDF \\
&= \left[\begin{smallmatrix}
\frac{1}{\sqrt{\lambda_{1}^+}} & & 0 & & & & &\\
& \ddots & & & & & & \\
0 & & \frac{1}{\sqrt{\lambda_{r}^+}} & 0 & & & &\\
& & 0 & \frac{1}{\sqrt{-\lambda_{1}^-}} & & 0 & & \\
& & & & \ddots & & & \\
& & & 0 & & \frac{1}{\sqrt{-\lambda_{s}^-}} & 0 & \\
& & & & & 0 & 1 & 0\\
& & & & & & 0 & \ddots
\end{smallmatrix}\right]^T
\left[\begin{smallmatrix}
\lambda_1^+ & & 0 & & & & &\\
& \ddots & & & & & & \\
0 & & \lambda_{r}^+ & 0 & & & &\\
& & 0 & \lambda_{1}^- & & 0 & & \\
& & & & \ddots & & & \\
& & & 0 & & \lambda_{s}^- & 0 & \\
& & & & & 0 & 0 & 0\\
& & & & & & 0 & \ddots
\end{smallmatrix}\right] \\
& \quad \left[\begin{smallmatrix}
\frac{1}{\sqrt{\lambda_{1}^+}} & & 0 & & & & &\\
& \ddots & & & & & & \\
0 & & \frac{1}{\sqrt{\lambda_{r}^+}} & 0 & & & &\\
& & 0 & \frac{1}{\sqrt{-\lambda_{1}^-}} & & 0 & & \\
& & & & \ddots & & & \\
& & & 0 & & \frac{1}{\sqrt{-\lambda_{s}^-}} & 0 & \\
& & & & & 0 & 1 & 0\\
& & & & & & 0 & \ddots
\end{smallmatrix}\right] \\
&=
\begin{bmatrix}
1 & & 0 & & & & &\\
& \ddots & & & & & & \\
0 & & 1 & 0 & & & &\\
& & 0 & -1 & & 0 & & \\
& & & & \ddots & & & \\
& & & 0 & & -1 & 0 & \\
& & & & & 0 & 0 & 0\\
& & & & & & 0 & \ddots
\end{bmatrix} =
\begin{bmatrix}
I_r & & \\
& -I_s & \\
& & [\textbf{0}]
\end{bmatrix}
\end{align*}
which is known as the \index{Canonical Quadratic Form}\keywordhl{canonical quadratic form} for $B$. To summarize, the matrix product $PF$ converts $B$ into such a form by $(PF)^TB(PF) = F^T (P^T BP)F = F^TDF = C$, and therefore $B$ is congruent to its canonical quadratic form; a small numerical illustration is given below. The theorem that follows then shows that two different canonical quadratic forms cannot be congruent, and hence the canonical quadratic form of any matrix is unique.
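The construction above can be followed step by step on a computer. Below is a small Python/\texttt{numpy} sketch (the helper \texttt{canonical\_form} is written here purely for illustration and is not a library routine):
\begin{verbatim}
import numpy as np

def canonical_form(B, tol=1e-12):
    # Orthogonally diagonalize B, reorder the eigenvalues as positive/negative/zero,
    # then rescale by the factor matrix F so that (PF)^T B (PF) = diag(I_r, -I_s, 0).
    lam, P = np.linalg.eigh(B)
    key = np.where(lam > tol, 0, np.where(lam < -tol, 1, 2))
    order = np.argsort(key, kind="stable")
    lam, P = lam[order], P[:, order]
    scale = np.where(np.abs(lam) > tol, 1.0 / np.sqrt(np.abs(lam)), 1.0)
    PF = P * scale                      # multiply column k of P by F_kk
    return PF, PF.T @ B @ PF

B = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])         # eigenvalues 3, -1, 0
PF, C = canonical_form(B)
print(np.round(C, 10))                  # diag(1, -1, 0), so r = 1 and s = 1
\end{verbatim}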
\begin{thm}
\label{thm:prepsylvester}
Two canonical quadratic forms of the same size $n$
\begin{align*}
C &=
\begin{bmatrix}
I_r & & \\
& -I_s & \\
& & [\textbf{0}]
\end{bmatrix} &
& C' = \begin{bmatrix}
I_{r'} & & \\
& -I_{s'} & \\
& & [\textbf{0}]
\end{bmatrix}
\end{align*}
are congruent if and only if $r = r'$ and $s = s'$. As a corollary, this shows the uniqueness of the canonical quadratic form for a quadratic form.
\end{thm}
\begin{proof}
The "if" part is trivial. For the "only if" part, without loss of generality, assume $r' > r$. If the congruence relation $C' = P^TCP$ has to hold where $P$ is some matrix, then consider
\begin{align*}
\textbf{e}^{(j)T}C'\textbf{e}^{(j)} &= 1 > 0 & \text{ where $1 \leq j \leq r'$}
\end{align*}
which is also equal to
\begin{align*}
\textbf{e}^{(j)T}P^TCP\textbf{e}^{(j)} = \textbf{p}^{(j)T}C\textbf{p}^{(j)}
\end{align*}
where $P = \begin{bmatrix}
\textbf{p}^{(1)} | \cdots | \textbf{p}^{(n)}
\end{bmatrix}$ consists of $n$ column vectors $\textbf{p}^{(j)}$, and $P\textbf{e}^{(j)} = \textbf{p}^{(j)}$. We claim that there exists a non-trivial linear combination of the $\textbf{p}^{(j)}$, $1 \leq j \leq r'$, i.e.\ $\textbf{q} = \sum_{j=1}^{r'} c_j\textbf{p}^{(j)}$, such that $\textbf{q}_i = 0$ for $1 \leq i \leq r$.\footnote{The corresponding system is
\begin{align*}
\begin{bmatrix}
\textbf{p}_1^{(1)} & \cdots &\textbf{p}_1^{(r)} & \cdots & \textbf{p}_1^{(r')} \\
\vdots & & \vdots & & \vdots \\
\textbf{p}_r^{(1)} & \cdots & \textbf{p}_r^{(r)} & \cdots & \textbf{p}_r^{(r')} \\
\vdots & & \vdots & & \vdots \\
\textbf{p}_n^{(1)} & \cdots & \textbf{p}_n^{(r)} & \cdots & \textbf{p}_n^{(r')}
\end{bmatrix}
\begin{bmatrix}
c_1 \\
\vdots \\
c_r \\
\vdots \\
c_{r'}
\end{bmatrix} =
\begin{bmatrix}
0 \\
\vdots \\
0 {\scriptsize \text{ (the $r$-th entry)}}\\
*
\end{bmatrix}
\end{align*}
The part below the $r$-th row does not matter since the constraints only involve the first $r$ rows, so it is effectively an $r \times r'$ underdetermined homogeneous linear system with $r < r'$. By the discussion in Section \ref{subsection:SolLinSysGauss}, we know that there will be non-trivial solutions for the $c_j$, $1 \leq j \leq r'$.} Subsequently, consider $\textbf{x} = \sum_{j=1}^{r'} c_j \textbf{e}^{(j)}$, and
\begin{align*}
\textbf{x}^T C'\textbf{x} &=
\begin{bmatrix}
c_1 & \cdots & c_{r'} & 0 & \cdots
\end{bmatrix}
\begin{bmatrix}
I_{r'} & & \\
& -I_{s'} & \\
& & [\textbf{0}]
\end{bmatrix}
\begin{bmatrix}
c_1 \\
\vdots \\
c_{r'} \\
0 \\
\vdots
\end{bmatrix} \\
&= c_1^2 + \cdots + c_{r'}^2 = \sum_{j=1}^{r'} c_j^2 > 0
\end{align*}
but also $P\textbf{x} = \sum_{j=1}^{r'} c_j P\textbf{e}^{(j)} = \sum_{j=1}^{r'} c_j \textbf{p}^{(j)} = \textbf{q}$, thus similarly
\begin{align*}
\textbf{x}^T P^T C P\textbf{x} &= (P\textbf{x})^T C P\textbf{x} \\
&= \textbf{q}^T C \textbf{q} \\
&=
\begin{bmatrix}
0 & \cdots & 0 {\scriptsize \text{ (the $r$-th entry)}} & *
\end{bmatrix}
\begin{bmatrix}
I_r & & \\
& -I_s & \\
& & [\textbf{0}]
\end{bmatrix}
\begin{bmatrix}
0 \\
\vdots \\
0 {\scriptsize \text{ (the $r$-th entry)}} \\
*
\end{bmatrix} \leq 0
\end{align*}
Hence $0 < \textbf{x}^T C'\textbf{x} = \textbf{x}^T P^T CP\textbf{x} \leq 0$, which is a contradiction, and it must be that $r' = r$.\footnote{The same argument in the opposite direction shows that it is also not possible to have $r' < r$.} By the same logic, we have $s = s'$ as well.
\end{proof}
An immediate result from this is \index{Sylvester's Law of Inertia}\keywordhl{Sylvester's Law of Inertia}.
\begin{thm}[Sylvester's Law of Inertia]
\label{thm:sylvester}
All diagonalized representations of any quadratic form have the same numbers of positive, negative, and zero diagonal entries. They are collectively known as the \index{Signature}\keywordhl{signature} of the quadratic form. Furthermore, if two diagonalized quadratic forms have the same signature, they are congruent.
\end{thm}
\begin{proof}
If two diagonalized representations of a quadratic form have different signatures, then they can be transformed into two canonical quadratic forms with those two sets of signatures using suitable factor matrices as introduced previously. However, this violates the uniqueness of canonical quadratic form in Theorem \ref{thm:prepsylvester}, and hence the two diagonalized representations of a quadratic form must have the same signature. The last statement follows as they will have the same canonical quadratic form and both are congruent to it.
\end{proof}
\begin{exmp}
Show that
\begin{align*}
B &= \begin{bmatrix}
2 & 3 \\
3 & 0
\end{bmatrix}
& \text{and} &
& B' =
\begin{bmatrix}
1 & 0 \\
0 & -2
\end{bmatrix}
\end{align*}
are congruent.
\end{exmp}
\begin{solution}
By Sylvester's Law of Inertia above, we simply check whether the two (symmetric) quadratic forms have the same numbers of positive/negative/zero eigenvalues, as these will be the diagonal entries when the forms are converted via orthogonal diagonalization. The eigenvalues of $B$ are found by
\begin{align*}
\det(B - \lambda I) =
\begin{vmatrix}
2 - \lambda & 3 \\
3 & - \lambda
\end{vmatrix} &= 0 \\
(2-\lambda)(-\lambda) - (3)(3) = -9 - 2\lambda + \lambda^2 &= 0
\end{align*}
whose solution is
\begin{align*}
\lambda &= \frac{-(-2) \pm \sqrt{(-2)^2 - 4(1)(-9)}}{2} \\
&= 1 \pm \sqrt{10}
\end{align*}
so $B$ has one positive and one negative eigenvalue. It is obvious that the eigenvalues of $B'$ are $\lambda = 1, -2$, so one of them is positive and the other is negative as well. Therefore, they are congruent.\footnote{One possible choice of $P$ as in $B' = P^TBP$ is $P =
\begin{bmatrix}
\frac{1}{\sqrt{2}}&-1\\
0&\frac{2}{3}
\end{bmatrix}$.}
\end{solution}
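The claim in the footnote can also be verified numerically. The short Python/\texttt{numpy} sketch below (illustrative only) checks both the congruence relation and, in line with Sylvester's Law of Inertia, the signatures:
\begin{verbatim}
import numpy as np

B  = np.array([[2.0, 3.0],
               [3.0, 0.0]])
Bp = np.array([[1.0,  0.0],
               [0.0, -2.0]])
P  = np.array([[1.0 / np.sqrt(2.0), -1.0],
               [0.0,                 2.0 / 3.0]])

print(np.round(P.T @ B @ P, 10))         # reproduces B'
print(np.sign(np.linalg.eigvalsh(B)))    # one negative, one positive eigenvalue
print(np.sign(np.linalg.eigvalsh(Bp)))   # the same signature
\end{verbatim}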
\subsection{Conic Sections}
\label{Conic}
\keywordhl{Conic Sections} is the name given to three types of geometric curves in a two-dimensional space: \textit{ellipses/circles}, \textit{parabolas} and \textit{hyperbolas}. The name originates from the fact that they can be obtained by intersecting a plane with a \textit{double cone}. They can be described by the general equation form given below.
\begin{center}
% Copyleft 2015 | Ridlo W. Wibowo
\begin{tikzpicture}
\draw[black] (-3,3) ellipse (2 and 0.5);
\draw[black] (3,3) ellipse (2 and 0.5);
\draw[] (-4.95,2.88) -- (-1.05,-2.88);
\draw[] (-4.95,-2.88) -- (-1.05,2.88);
\draw[] (4.95,2.88) -- (1.05,-2.88);
\draw[] (4.95,-2.88) -- (1.05,2.88);
\draw[blue, thick] (-3,-0.6) ellipse (0.4 and 0.1);
\draw[] (-3.8,-0.8) -- (-2.2,-0.8);
\draw[] (-3.8,-0.8) -- (-3.4,-0.4);
\draw[] (-2.2,-0.8) -- (-2.6,-0.4);
\draw[] (-3.4,-0.4) -- (-3.3,-0.4);
\draw[] (-2.6,-0.4) -- (-2.7,-0.4);
\draw[dashed] (-2.7,-0.4) -- (-3.3,-0.4);
\node[] at (-1.8,-0.4) {\textcolor{blue}{Circle}};
\draw[rotate around={20:(-3.26,-1.72)}, red, thick] (-3.26,-1.72) ellipse (1.2 and 0.2);
\draw[] (-4.5,-2.68) -- (-1.5,-1.5);
\draw[] (-4.5,-2.68) -- (-5,-2.06);
\draw[] (-5,-2.06) -- (-4.1,-1.7);
\draw[] (-1.5,-1.5) -- (-2,-0.88);
\draw[] (-2,-0.88) -- (-2.3,-1);
\draw[dashed] (-2.3,-1) -- (-4.1,-1.7);
\node[] at (-1.6,-2) {\textcolor{red}{Ellipse}};
\draw[Green, thick] (-4,3.4) .. controls (-4.5,1.2) and (-4.2,1.5) .. (-3,2.5);
\draw[dotted, Green, thick] (-4,3.4) -- (-3,2.5);
\draw[] (-2.6,2.45) -- (-3.8,-0.15);
\draw[] (-2.6,2.45) -- (-4.1,3.8) coordinate (A);
\draw[] (-3.8,-0.15) -- (-5.3,1.1) coordinate (B);
\draw[dashed] (A) -- (B);
\draw[] (A) -- (-4.28,3.4);
\draw[] (B) -- (-4.7,2.45);
\node[] at (-2.7,1.7) {\textcolor{Green}{Parabola}};
\draw[] (4.5,-3.8) coordinate (A) -- (4.5,2.8) coordinate (B);
\draw[dashed] (A) -- (3.2,-2.63) coordinate (C);
\draw[] (B) -- (3.2,3.97) coordinate (D);
\draw[dashed] (C) -- (D);
\draw[purple, thick] (4.3,-3.4) coordinate (E) .. controls (4,-0.9) and (3.8,-0.6) .. (3.3,-2.5) coordinate (F);
\draw[dotted, purple, thick] (E) -- (F);
\draw[purple, thick] (4.3,2.6) coordinate (G) .. controls (4,1) and (3.8,0.7) .. (3.3,3.5) coordinate (H);
\draw[dotted, purple, thick] (G) -- (H);
\draw[] (A) -- (4.1,-3.44);
\node[] at (5,1) {\textcolor{purple}{Hyperbola}};
\pgfpathmoveto{\pgfpoint{-1cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{0}{\pgfpoint{-3cm}{-3.5cm}}
\pgfpathmoveto{\pgfpoint{-5cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{-1}{\pgfpoint{-3cm}{-3.5cm}}
\pgfstroke
\pgfsetdash{{3pt}{3pt}}{0pt}
\pgfpathmoveto{\pgfpoint{-1cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{-1}{\pgfpoint{-3cm}{-2.5cm}}
\pgfpathmoveto{\pgfpoint{-5cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{0}{\pgfpoint{-3cm}{-2.5cm}}
\pgfstroke
\pgfsetdash{{3pt}{0pt}}{0pt}
\pgfpathmoveto{\pgfpoint{1cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{-1}{\pgfpoint{3cm}{-3.5cm}}
\pgfpathmoveto{\pgfpoint{5cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{0}{\pgfpoint{3cm}{-3.5cm}}
\pgfstroke
\pgfsetdash{{3pt}{3pt}}{0pt}
\pgfpathmoveto{\pgfpoint{1cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{0}{\pgfpoint{3cm}{-2.5cm}}
\pgfpathmoveto{\pgfpoint{5cm}{-3cm}}
\pgfpatharcto{2cm}{0.5cm}{0}{0}{-1}{\pgfpoint{3cm}{-2.5cm}}
\pgfstroke
\end{tikzpicture}\\
\textit{(Adapted from the code of Ridlo W. Wibowo)}
\end{center}
\begin{defn}[Conic Sections]
\label{defn:conic}
Conic Sections (circles, ellipses, parabolas, hyperbolas) are the curves generated by a second-degree polynomial equation in two variables $(x, y)$ that takes the general form of
\begin{align*}
ax^2 + bxy + cy^2 + mx + ny - h = 0
\end{align*}
where $a$, $b$, $c$, $m$, $n$ and $h$ are all constants. It can be expressed as a quadratic form:
\begin{align*}
\textbf{x}^T B\textbf{x} =
\begin{bmatrix}
x & y & 1
\end{bmatrix}
\begin{bmatrix}
a & \frac{b}{2} & \frac{m}{2} \\
\frac{b}{2} & c & \frac{n}{2} \\
\frac{m}{2} & \frac{n}{2} & -h
\end{bmatrix}
\begin{bmatrix}
x \\
y \\
1
\end{bmatrix} = 0
\end{align*}
where $\textbf{x} = (x,y,1)^T$.
\end{defn}
To see what type of conic section a quadratic form represents, we can examine the determinants of $B$ and of its $2 \times 2$ submatrix $B_{33}$. We simply state the results below.
\begin{proper}
\label{proper:quadgentype}
The quadratic form constructed in Definition \ref{defn:conic} represents a degenerate conic if $\det(B) = 0$. Otherwise, if $\det(B) \neq 0$, it indicates
\begin{itemize}
\item a hyperbola if $\det(B_{33}) < 0$;
\item a parabola if $\det(B_{33}) = 0$;
\item an ellipse if $\det(B_{33}) > 0$.
\end{itemize}
where
\begin{align*}
B_{33} =
\begin{bmatrix}
a & \frac{b}{2} \\
\frac{b}{2} & c
\end{bmatrix}
\end{align*}
In the case of an ellipse so that $\det(B_{33}) > 0$, if $a = c$ and $b = 0$, then it is further reduced to a circle.
\end{proper}
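The classification rules above are easy to automate. The following Python/\texttt{numpy} sketch (the helper \texttt{classify\_conic} is ours, written only to mirror the rules just stated) is one possible implementation:
\begin{verbatim}
import numpy as np

def classify_conic(a, b, c, m, n, h):
    # Build B for ax^2 + bxy + cy^2 + mx + ny - h = 0 and apply the rules above.
    B = np.array([[a,     b / 2, m / 2],
                  [b / 2, c,     n / 2],
                  [m / 2, n / 2, -h   ]])
    if np.isclose(np.linalg.det(B), 0.0):
        return "degenerate"
    d = np.linalg.det(B[:2, :2])        # det(B_33)
    if np.isclose(d, 0.0):
        return "parabola"
    if d < 0:
        return "hyperbola"
    return "circle" if np.isclose(a, c) and np.isclose(b, 0.0) else "ellipse"

print(classify_conic(1, 0, 2, 0, 0, 3))    # ellipse   (x^2 + 2y^2 = 3)
print(classify_conic(1, 0, -1, 0, 0, 1))   # hyperbola (x^2 - y^2 = 1)
print(classify_conic(0, 0, 1, -1, 0, 0))   # parabola  (y^2 = x)
\end{verbatim}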
However, for simplicity, we will only discuss the \index{Central Conics}\keywordhl{central conics}, where the linear terms $mx$ and $ny$ do not appear. This excludes the case of a parabola, keeping only ellipses and hyperbolas. The quadratic form then can be simplified as follows.
\begin{proper}[Central Conics]
\label{proper:quadcentraltype}
Ellipses (including circles) and hyperbolas centered at the origin are called \textit{central conics} and have the form of
\begin{align*}
ax^2 + bxy + cy^2 = h
\end{align*}
or can be expressed as a quadratic form of
\begin{align*}
\textbf{x}^T B_{33} \textbf{x} = h
\end{align*}
where now $\textbf{x} = (x,y)^T$ only and $B_{33}$ is as defined in Properties \ref{proper:quadgentype}.
\end{proper}
By Properties \ref{proper:quadgentype}, they can be classified by the discriminant $\Delta = b^2 - 4ac$, which is easily seen to be equal to $-4\det(B_{33})$: The discriminant is positive (negative) when the graph is a hyperbola (ellipse) and $\det(B_{33})$ is negative (positive). A zero discriminant actually represents a "parabola", but the removal of linear terms in the conic section equation reduces the parabola to degenerate straight lines. \par
Short Exercise: Identify the types of curve generated by $x^2 - xy + 2y^2 = 3$ and $x^2 + xy - y^2 = 1$.\footnote{The first one is an ellipse ($\Delta = (-1)^2 - 4(1)(2) = -7 < 0$) and the second one is a hyperbola ($\Delta = (1)^2 - 4(1)(-1) = 5 > 0$).}\par
Notice that sometimes an "ellipse" where $\det(B_{33}) > 0$ may not produce a real graph and is imaginary. (Take $x^2 + 2y^2 = -3$ as an example.) To address this, we can link the definiteness property of quadratic forms to arrive at an equivalent classification:
\begin{thm}
\label{thm:quadcentraltypealt}
Given $\textbf{x}^TB_{33}\textbf{x} = h$ as in Properties \ref{proper:quadcentraltype}, where $h$ is chosen to be $1$ for scaling, it represents
\begin{itemize}
\item an ellipse if $B_{33}$ is positive definite,
\item a hyperbola if $B_{33}$ is indefinite,
\item no real graph if $B_{33}$ is negative definite.
\end{itemize}
\end{thm}
The above works because if the central conic is a hyperbola and $\det(B_{33})$ is negative, then from the viewpoint of orthogonal diagonalization the $2 \times 2$ $B_{33}$ matrix must have one positive and one negative eigenvalue, which by Theorem \ref{thm:quaddefinite} is the same as being indefinite. It is similar for an ellipse where $B_{33}$ being positive-definite means that its two eigenvalues are both positive and $\det(B_{33})$ is positive as well. Meanwhile, when $B_{33}$ is negative-definite, the two eigenvalues are both negative and $\det(B_{33})$ is still positive. Nevertheless, as $h$ is chosen to be $1$, the negative-definiteness means that $\textbf{x}$ has no real solution.
\begin{center}
\begin{tikzpicture}
\draw[thick, ->] (-3,0) -- (3,0) node[right]{$x$};
\draw[thick, ->] (0,-3) -- (0,3) node[above]{$y$};
\draw[Green,rotate=30] plot[domain=-1.2:1.3] ({1*cosh(\x)},{1*sinh(\x)});
\draw[Green,rotate=30] plot[domain=-1.2:1.3] ({-1*cosh(\x)},{1*sinh(\x)});
\draw[red,rotate=30] plot[domain=-3:3] ({\x},{0});
\draw[blue,rotate=-60] plot[domain=-3:3] ({\x},{0});
\end{tikzpicture}
\begin{tikzpicture}
\draw[thick, ->] (-3,0) -- (3,0) node[right]{$x$};
\draw[thick, ->] (0,-3) -- (0,3) node[above]{$y$};
\draw[Green, rotate=-30] (0,0) ellipse (2 and 1);
\draw[red,rotate=-30] plot[domain=-3:3] ({\x},{0});
\draw[blue,rotate=60] plot[domain=-3:3] ({\x},{0});
\end{tikzpicture}
\end{center}
\textit{Left: A hyperbola ($x^2 + 2\sqrt{3}xy - y^2 = 1$), Right: An ellipse ($\frac{7}{4}x^2 + \frac{3}{2}\sqrt{3}xy + \frac{13}{4}y^2 = 1$). Both of them are rotated from their standard position so that their major axis (red) and minor axis (blue) are not aligned with the $x$/$y$ axes and make an angle of 30 degrees.}\par
For example, the quadratic equation represented by the quadratic form $\textbf{x}^TB\textbf{x} = 1$, where
\begin{align*}
B &=
\begin{bmatrix}
1 & -2 \\
-2 & 3
\end{bmatrix}
\end{align*}
is just $x^2 - 4xy + 3y^2 = 1$. $B$ can be shown to have eigenvalues $\lambda = 2 \pm \sqrt{5}$. As $\lambda_+ = 2 + \sqrt{5} > 0$ and $\lambda_- = 2 - \sqrt{5} < 0$, $B$ is indefinite and the curve is a hyperbola (with two branches) by Theorem \ref{thm:quadcentraltypealt}.\\
\\
The figure above shows that hyperbolas and ellipses can be rotated away from their \textit{standard position}. The effect of such a coordinate transformation by an orthogonal matrix on the quadratic equation is to produce cross-product terms ($xy$ in the two-dimensional case), which can be eliminated by an inverse rotation that restores the curves so that the major and minor axes are again oriented along the $x$ and $y$ axes. If the graph starts out tilted by an angle $\theta$, we can rotate it by the same angle $\theta$ in the opposite direction to recover the standard position. This is equivalent to rotating the coordinate system by an angle of $\theta$ in the same sense as the initial tilting. The readers can refer back to Section \ref{section:orthogeometricsub} and Definition \ref{defn:coordtransquad} about the rotation of a coordinate system for a quadratic form.
\begin{exmp}
Rotate the quadratic equation $x^2 - xy + y^2 = 1$ so that the major axis lies along the $x$-axis.
\end{exmp}
First, we cast the equation into the quadratic form $\textbf{x}^T B\textbf{x} = 1$, with
\begin{align*}
B &=
\begin{bmatrix}
1 & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
\end{align*}
To find the eigenvalues of $B$, we solve the characteristic equation
\begin{align*}
\begin{vmatrix}
1-\lambda & -\frac{1}{2} \\
-\frac{1}{2} & 1-\lambda
\end{vmatrix} = (1-\lambda)^2 - (-\frac{1}{2})^2 &= 0 \\
\lambda^2 - 2\lambda + \frac{3}{4} &= 0 \\
\lambda &= \frac{1}{2} \text{ or } \frac{3}{2}
\end{align*}
So by Theorem \ref{thm:quadcentraltypealt}, $B$ is positive-definite and the curve is an ellipse. The smaller (larger) eigenvalue corresponds to the major (minor) axis. Now we consider an orthogonal matrix $P$ that performs a rotation of the coordinate system, with the old coordinates related to the new coordinates by $\textbf{x} = P\textbf{x}'$. The quadratic form is then transformed to
\begin{align*}
(P\textbf{x}')^T B (P\textbf{x}') &= \textbf{x}'^T (P^T BP) \textbf{x}'
\end{align*}
We immediately identify $P^T BP$ as a rotation of the coordinate system for the matrix $B$, as noted by Definition \ref{defn:coordtransquad}. Section \ref{section:orthogonaldiagreal} tells us that we can deal with the cross-product terms by orthogonal diagonalization, which turns the off-diagonal entries in $B$ into zeros. The normalized eigenvectors of $B$ are found to be
\begin{align*}
&\vec{v}_\lambda = \begin{bmatrix}
\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}}
\end{bmatrix}
\text{ for } \lambda = \frac{1}{2}
& \begin{bmatrix}
-\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}}
\end{bmatrix}
\text{ for } \lambda = \frac{3}{2}
\end{align*}
Hence we can set
\begin{align*}
P =
\begin{bmatrix}
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
\end{align*}
So that
\begin{align*}
P^T BP =
\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
\begin{bmatrix}
1 & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
=
\begin{bmatrix}
\frac{1}{2} & 0\\
0 & \frac{3}{2}
\end{bmatrix}
= D
\end{align*}
The new equation is seen to be $\textbf{x}'^T D\textbf{x}' = 1$, or $\frac{1}{2}(x')^2 + \frac{3}{2}(y')^2 = 1$. Below are the diagrams before and after the rotation. The major (minor) axis now matches the $x'$($y'$)-axis as we place the smaller (larger) eigenvalue in the first (second) diagonal entry of $D$.
\begin{center}
\begin{tikzpicture}
\draw[thick, ->] (-3,0) -- (3,0) node[right](vecu){$x$};
\draw[thick, ->] (0,-3) -- (0,3) node[above]{$y$};
\draw[Green, thick, rotate=45] (0,0) ellipse ({1.2*sqrt(2)} and {1.2*sqrt(2/3)});
\draw[red, ->] (0,0) -- (2,2) node[above, xshift=10](vecv){$x'$, Major axis, $\lambda = 1/2$};
\draw[blue, ->] (0,0) -- (-1,1) node[above left, xshift=10]{$y'$, Minor axis, $\lambda = 3/2$};
\node[Green] at (2,-2) {$x^2 - xy + y^2 = 1$};
\pic[draw, ->, "$45^\circ$", angle eccentricity=1.875] {angle = vecu--0--vecv};
\end{tikzpicture} \textit{(Before rotation)} \\
\begin{tikzpicture}
\draw[red, thick, ->] (-3,0) -- (3,0) node[right](vecu){$x'$};
\draw[blue, thick, ->] (0,-3) -- (0,3) node[above]{$y'$};
\draw[Green, thick, rotate=0] (0,0) ellipse ({1.2*sqrt(2)} and {1.2*sqrt(2/3)});
\node[Green] at (2,-2) {$\frac{1}{2}(x')^2 + \frac{3}{2}(y')^2 = 1$};
\end{tikzpicture} \textit{(After rotation)}
\end{center}
The degree of tilting can be found to be exactly $\pi/4 = 45^{\circ}$, by comparing the general two-dimensional rotation matrix
\begin{align*}
\begin{bmatrix}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta
\end{bmatrix}
\end{align*}
against $P$: $\cos \theta = \frac{1}{\sqrt{2}}$ and $\sin \theta = \frac{1}{\sqrt{2}}$ implies that $\tan \theta = 1$, and $\theta = \pi/4$. The possibility of eliminating the cross-product terms in quadratic forms is formally known as the \index{Principal Axes Theorem}\keywordhl{Principal Axes Theorem}.
\begin{thm}[Principal Axes Theorem]
For a quadratic form $\textbf{x}^TB\textbf{x}$, where $B$ is a real symmetric matrix, we can always make an orthogonal change of variables $\textbf{x}' = P^T\textbf{x}$ (or equivalently $\textbf{x} = P\textbf{x}'$) such that it turns into $\textbf{x}'^TD\textbf{x}' = \lambda_1 x_1'^2 + \lambda_2 x_2'^2 + \cdots$ which contains purely quadratic terms and no cross-product terms. The primed coordinates $\textbf{x}'$ then represent the \textit{principal axes}. $P$ is formed by the set of orthonormal column eigenvectors of $B$ and $D$ is a diagonal matrix with entries being the eigenvalues of $B$.
\end{thm}
This is simply a rephrasing of Definition \ref{defn:orthodiagonal} and Properties \ref{proper:orthobasissym}. In general, a two-dimensional quadratic form
\begin{align*}
\begin{bmatrix}
a & \frac{b}{2} \\
\frac{b}{2} & c
\end{bmatrix}
\end{align*}
can undergo a rotation of the coordinate system by an angle $\theta$ such that
\begin{align*}
\begin{bmatrix}
\cos \theta & \sin \theta \\
-\sin \theta & \cos \theta
\end{bmatrix}
\begin{bmatrix}
a & \frac{b}{2} \\
\frac{b}{2} & c
\end{bmatrix}
\begin{bmatrix}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta
\end{bmatrix}
=
\begin{bmatrix}
* & 0\\
0 & *
\end{bmatrix}
\end{align*}
the off-diagonal elements become zero. The required $\theta$ is found by expanding the L.H.S. and equating both sides, which gives
\begin{align*}
-\sin \theta (a \cos\theta + \frac{b}{2}\sin \theta) + \cos\theta (\frac{b}{2} \cos \theta + c\sin \theta) &= 0 \\
\frac{c-a}{2} \sin (2\theta) + \frac{b}{2}\cos(2\theta) &= 0 \\
\cot(2\theta) &= \frac{a-c}{b}
\end{align*}
where we have applied the familiar double-angle formulas from the first to second line.
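The worked example above can be reproduced numerically; in the following Python/\texttt{numpy} sketch (illustrative only), $a = c$, so $\cot(2\theta) = 0$ and $\theta = 45^{\circ}$, the tilt found earlier:
\begin{verbatim}
import numpy as np

a, b, c = 1.0, -1.0, 1.0                    # the ellipse x^2 - xy + y^2 = 1
B = np.array([[a,     b / 2],
              [b / 2, c    ]])

theta = np.pi / 4                           # cot(2*theta) = (a - c)/b = 0 here
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(P.T @ B @ P, 10))            # diag(1/2, 3/2): cross term removed

# Orthogonal diagonalization finds the same principal axes automatically.
print(np.linalg.eigvalsh(B))                # [0.5 1.5]
\end{verbatim}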
\subsubsection{Generalizing to the Three-dimensional Space}
Since physically we live in a three-dimensional world, it is natural to ask if we can extend the idea of these quadratic shapes from two spatial axes to three. This is possible: we only need to modify the $\textbf{x}$ in the quadratic form to encompass the third axis, so that $\textbf{x} = (x,y,z)^T$ and $B$ in $\textbf{x}^TB\textbf{x}$ is now a $3 \times 3$ symmetric matrix. The quadratic shapes now include \textit{ellipsoids} and \textit{hyperboloids}, and the change of coordinates that converts them into the standard position follows the exact same orthogonal diagonalization procedure. We ask the readers to try working with them in Exercise \ref{ex:ellipsoid}.
\subsection{Hermitian Forms}
\label{section:hermform}
The concept of quadratic forms/symmetric bilinear forms can be readily promoted to complex vector spaces. Since Hermiticity is the complex analogue of symmetry, it is not surprising that we have \index{Hermitian Form}\keywordhl{Hermitian forms} as the complex counterpart of symmetric bilinear forms.
\begin{defn}[Hermitian Form]
A (complex) Hermitian form $H(\vec{x}, \vec{y}): \mathcal{V} \times \mathcal{V} \to \mathbb{C}$ takes two vectors $\vec{x}$, $\vec{y} \in \mathcal{V}$ from a complex vector space and returns a complex scalar, that satisfies $H(\vec{x}, \vec{y}) = \overline{H(\vec{y}, \vec{x})}$ for any pair of $\vec{x}$ and $\vec{y}$. For finite-dimensional cases, it takes the general form of
\begin{align*}
H(\vec{x}, \vec{y}) = \textbf{x}^T H \overline{\textbf{y}}
\end{align*}
where $H^* = H$ is an $n \times n$ Hermitian matrix and $n$ is the dimension of $\mathcal{V}$.
\end{defn}
Note that the second argument is conjugated just like the complex dot product given in Definition \ref{defn:complexdotproduct}. Due to Properties \ref{proper:hermrealeig}, the eigenvalues of a Hermitian form/matrix $H$ are always real, hence the logic of Theorem \ref{thm:quaddefinite} is still valid and it can be applied when $\vec{x} = \vec{y}$. We simply note the transferred results below.
\begin{thm}
\label{thm:hermdefinite}
The Hermitian form $H(\vec{x}) = \textbf{x}^T H \overline{\textbf{x}}$ where the two input complex vectors are now identical and $H$ is Hermitian, is
\begin{enumerate}[label=(\alph*)]
\item positive definite, if and only if all eigenvalues of $H$ are positive (positive semi-definite if and only if all eigenvalues of $H$ are non-negative),
\item negative definite, if and only if all eigenvalues of $H$ are negative (negative semi-definite if and only if all eigenvalues of $H$ are non-positive),
\item indefinite when there are both positive and negative eigenvalues for $H$.
\end{enumerate}
\end{thm}
\begin{exmp}
Show that the Hermitian form characterized by
\begin{align*}
H =
\begin{bmatrix}
-\frac{3}{2} & \frac{1}{2}i \\
-\frac{1}{2}i & -\frac{3}{2}
\end{bmatrix}
\end{align*}
is negative-definite.
\end{exmp}
\begin{solution}
The readers should check that $H$ is indeed Hermitian. By Theorem \ref{thm:hermdefinite}, we just need to show that the eigenvalues of $H$ are all negative. So we now solve the characteristic polynomial
\begin{align*}
&\quad \begin{vmatrix}
-\frac{3}{2}-\lambda & \frac{1}{2}i \\
-\frac{1}{2}i & -\frac{3}{2} - \lambda
\end{vmatrix} \\
&= (-\frac{3}{2}-\lambda)(-\frac{3}{2} - \lambda) - (-\frac{1}{2}i)(\frac{1}{2}i) \\
&= (\frac{9}{4} + 3\lambda + \lambda^2) - \frac{1}{4} \\
&= \lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2)
\end{align*}
which yields two negative roots $\lambda = -1, -2$ and we are done.
\end{solution}
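Numerically, the Hermiticity and definiteness of the matrix in this example can be checked with a few lines of Python/\texttt{numpy} (illustrative only):
\begin{verbatim}
import numpy as np

H = np.array([[-1.5 + 0.0j,  0.0 + 0.5j],
              [ 0.0 - 0.5j, -1.5 + 0.0j]])

print(bool(np.allclose(H, H.conj().T)))   # True: H is Hermitian
eigs = np.linalg.eigvalsh(H)              # real eigenvalues, in ascending order
print(eigs)                               # [-2. -1.]
print(bool(np.all(eigs < 0)))             # True, so H is negative-definite
\end{verbatim}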
\section{Statistics with Quadratic Forms}
\subsection{Variance and Covariance}
\label{section:variancesec}
One important quantity in the world of Statistics is the \index{Variance}\keywordhl{variance} of a \index{Random Variable}\keywordhl{random variable} or \textit{time-series}. Variance measures the spread of the distribution behind the random variable: the larger the variance, the more dispersed the data points are. In Earth Sciences, we often use it to quantify the variability of certain phenomena or patterns, e.g. the variance of spacetime-filtered winds can tell us how active the corresponding wave type is.
\subsubsection{Single Distribution}
We start with the simplest case: the variance of the distribution of a single random variable. Since in real life we can only take a finite number of samples, the variance of a random variable is always approximated and inferred from the data points if we do not know the underlying statistical distribution.
\begin{defn}
\label{defn:variance}
For a distribution $X$, with $m$ data $x_1, x_2, x_3, \ldots, x_m$, its \index{Population Variance}\keywordhl{population variance} is
\begin{align*}
\sigma^2 = \text{Var}(X) &= \frac{1}{m} ((x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \cdots + (x_m - \mu)^2) \\
&= \frac{1}{m} \sum_{k=1}^m (x_k - \mu)^2
\end{align*}
where $\mu$ is the \index{Mean}\keywordhl{mean}, or \index{Expected Value}\keywordhl{expected value} of $X$, and is computed by
\begin{align*}
\mu = E(X) = \frac{1}{m} (x_1 + x_2 + x_3 + \cdots + x_m)
\end{align*}
that is, the average of all data. Hence, variance is the average of squares of differences between the data and their mean, equivalent to $E((X-\mu)^2)$.
\end{defn}
A simpler formula for computing the population variance is
\begin{align*}
\sigma^2 &= E((X-\mu)^2) \\
&= E((X-E(X))^2) \\
&= E(X^2-2XE(X)+(E(X))^2) \\
&= E(X^2) - 2E(X)E(X) + (E(X))^2 & \begin{aligned}\text{(Note that $E(X)$ is a constant}\\ \text{and $E(E(X)) = E(X)$,} \\
\text{$E(XE(X)) = E(X)E(X)$.)}\end{aligned} \\
&= E(X^2) - (E(X))^2 = E(X^2) - \mu^2
\end{align*}
As noted before, we always have a finite sample. To account for this, we sometimes have to use the sample variance $s^2$, which is the same as the population variance but with the $\frac{1}{m}$ factor replaced by $\frac{1}{m-1}$. (Hence $s^2 = \frac{m}{m-1}\sigma^2$.) As an example, given a dataset $X$ with $5$ data $\vec{x} = (1, 3, 6, 9, 11)^T$, their mean is
\begin{align*}
\mu = \frac{1}{5}(1 + 3 + 6 + 9 + 11) = 6
\end{align*}
and the population variance is
\begin{align*}
\sigma^2 = \frac{1}{5}((1-6)^2 + (3-6)^2 + (6-6)^2 + (9-6)^2 + (11-6)^2) = 13.6
\end{align*}
We can also use the aforementioned short-cut formula.
\begin{align*}
\sigma^2 &= E(X^2) - \mu^2 \\
&= \frac{1}{5} (1^2 + 3^2 + 6^2 + 9^2 + 11^2) - 6^2 \\
&= 49.6 - 36 \\
&= 13.6
\end{align*}
Short Exercise: Find the sample variance of $X$.\footnote{It is $s^2 = \frac{1}{5-1}((1-6)^2 + (3-6)^2 + (6-6)^2 + (9-6)^2 + (11-6)^2) = 17$.\\(Or simply compute $\frac{5}{5-1}\sigma^2$.)} \par
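The small computations above are easily reproduced with Python/\texttt{numpy} (an illustrative sketch; note that \texttt{numpy}'s default normalization is the population one):
\begin{verbatim}
import numpy as np

x = np.array([1.0, 3.0, 6.0, 9.0, 11.0])

mu = x.mean()
print(mu)                      # 6.0
print(np.mean((x - mu)**2))    # 13.6, the population variance (1/m factor)
print(x.var())                 # 13.6 again: the default ddof=0 is the population version
print(x.var(ddof=1))           # 17.0, the sample variance (1/(m-1) factor)
\end{verbatim}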
Note that the variance formula in Definition \ref{defn:variance} can be written as a dot product shown below.
\begin{proper}
Given a distribution $X$, with $m$ data $\vec{x} = (x_1, x_2, x_3, \cdots, x_m)^T$, and a mean of $\mu$, the population variance can be written as
\begin{align*}
\frac{1}{m} (\vec{x}'\cdot\vec{x}') = \frac{1}{m} \textbf{x}'^T \textbf{x}'
\end{align*}
where $\textbf{x}' = \vec{x}' = \vec{x} - \mu$ is the centered distribution with the mean $\mu$ removed.
\end{proper}
It is simply a matter of observing that
\begin{align*}
\text{Var}(X) &= \frac{1}{m} ((x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_m - \mu)^2) \\ &= \frac{1}{m} (x_1 - \mu, x_2 - \mu, \ldots, x_m - \mu)^T \cdot (x_1 - \mu, x_2 - \mu, \ldots, x_m - \mu)^T
\end{align*}
Notice that variance, as a sum of squares, can never be negative, and is a positive-semidefinite quantity. \par
Short Exercise: Discuss under what situation the variance will be zero.\footnote{When all data are equal (to the mean).}
\subsubsection{Linear Combination of Multiple Distributions}
Sometimes we may need to consider the "overall" distribution of the sum of multiple variables. More generally, given any linear combination of multiple ($n$) distributions, like $Z = c_1X^{(1)} + c_2X^{(2)} + \cdots + c_nX^{(n)}$, we may want to know how to compute its mean and variance. The mean will be simply
\begin{align*}
\mu_Z &= E(c_1X^{(1)} + c_2X^{(2)} + \cdots + c_nX^{(n)}) \\
&= c_1E(X^{(1)}) + c_2E(X^{(2)}) + \cdots + c_nE(X^{(n)}) \\
&= c_1\mu_1 + c_2\mu_2 + \cdots + c_n\mu_n
\end{align*}
where $E$ is linear and $E(X^{(j)}) = \mu_j$ is the mean of $X^{(j)}$. The variance $\text{Var}(Z)$ is a bit more complicated. First, we need to introduce the concept of \index{Covariance}\keywordhl{covariance} between any two variables, which indicates how they change together.
\begin{defn}
\label{defn:covariance}
For two distributions $X$ and $Y$ consisting of $m$ pairs of data, their \index{Population Covariance}\keywordhl{population covariance} is
\begin{align*}
\text{Cov}(X,Y) &= \frac{1}{m}((x_1-\mu_x)(y_1-\mu_y) + (x_2-\mu_x)(y_2-\mu_y) \\
&\quad + \cdots + (x_m-\mu_x)(y_m-\mu_y)) \\
&= \frac{1}{m}\sum_{k=1}^{m} (x_k-\mu_x)(y_k-\mu_y)
\end{align*}
where $\mu_x$ and $\mu_y$ are the population means of $X$ and $Y$ respectively. It can be easily seen that $\text{Cov}(X,Y) = \text{Cov}(Y,X)$ so the order does not matter. If $\vec{x}'$ and $\vec{y}'$ are the centered data with their respective mean subtracted away, then their covariance can be denoted by a dot product as
\begin{align*}
\frac{1}{m} \vec{x}' \cdot \vec{y}' = \frac{1}{m} \textbf{x}'^T \textbf{y}'
\end{align*}
For \index{Sample Covariance}\keywordhl{sample covariance}, it is
\begin{align*}
q_{xy} = \frac{1}{m-1} \sum_{k=1}^{m} (x_k-\bar{x})(y_k-\bar{y})
\end{align*}
where the $\frac{1}{m-1}$ factor replaces $\frac{1}{m}$ as for sample variance, $\bar{x}$ and $\bar{y}$ are the \index{Sample Mean}\keywordhl{sample means} of $X$ and $Y$ which happen to have the same values as $\mu_x$ and $\mu_y$.
\end{defn}
There is also a short-cut formula very similar to that for variance:
\begin{align*}
\text{Cov}(X,Y) &= E((X-E(X))(Y-E(Y))) \\
&= E(XY - E(X)Y - XE(Y) + E(X)E(Y)) \\
&= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) \\
&= E(XY) - E(X)E(Y)
\end{align*}
In general, if $\text{Cov}(X,Y)$ is positive (negative), it means that when $X$ increases, $Y$ tends to increase (decrease) together. Finally, a direct comparison reveals that $\text{Cov}(X,X) = \text{Var}(X)$ for any distribution $X$.
\begin{exmp}
Two time-series of measured zonal and meridional wind speeds $U$ and $V$ at a weather station are shown in the table below.
\begin{center}
\begin{tabular}{|c|c|c|}
\hline
(in \si{\m \per \s}) & $U$ & $V$\\
\hline
1st Measurement & $4.4$ & $-3.5$ \\
\hline
2nd Measurement & $3.8$ & $-2.6$ \\
\hline
3rd Measurement & $3.3$ & $-2.7$ \\
\hline
4th Measurement & $2.8$ & $-1.4$ \\
\hline
5th Measurement & $2.9$ & $-1.2$ \\
\hline
6th Measurement & $1.7$ & $-0.8$ \\
\hline
7th Measurement & $2.1$ & $-1.1$ \\
\hline
\end{tabular}
\end{center}
Find the covariance of $U$ and $V$.
\end{exmp}
\begin{solution}
It is not hard to get $\mu_U = 3.0$ and $\mu_V = -1.9$. By Definition \ref{defn:covariance}, we have
\begin{align*}
\text{Cov}(U,V) &= \frac{1}{7} [(4.4-3.0)((-3.5)-(-1.9))+(3.8-3.0)((-2.6)-(-1.9)) \\
&\quad+(3.3-3.0)((-2.7)-(-1.9))+(2.8-3.0)((-1.4)-(-1.9)) \\
&\quad+(2.9-3.0)((-1.2)-(-1.9))+(1.7-3.0)((-0.8)-(-1.9)) \\
&\quad+(2.1-3.0)((-1.1)-(-1.9))] \\
&= \frac{-5.36}{7} = \SI{-0.77}{\square\m \per \square\s}
\end{align*}
Alternatively, the short-cut formula gives
\begin{align*}
\text{Cov}(U,V) &= E(UV) - \mu_U \mu_V \\
&= \frac{1}{7}[(4.4)(-3.5) + (3.8)(-2.6) + (3.3)(-2.7) + (2.8)(-1.4) \\
&\quad + (2.9)(-1.2) + (1.7)(-0.8) + (2.1)(-1.1)] - (3.0)(-1.9) \\
&= (-6.466) - (-5.7) = \SI{-0.77}{\square\m \per \square\s}
\end{align*}
\end{solution}
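The same computation in Python/\texttt{numpy} (an illustrative sketch) looks as follows; note that \texttt{np.cov} defaults to the sample normalization $\frac{1}{m-1}$, so \texttt{bias=True} is needed for the population version used here:
\begin{verbatim}
import numpy as np

U = np.array([4.4, 3.8, 3.3, 2.8, 2.9, 1.7, 2.1])
V = np.array([-3.5, -2.6, -2.7, -1.4, -1.2, -0.8, -1.1])

# Population covariance (1/m factor), as in the definition above
cov_uv = np.mean((U - U.mean()) * (V - V.mean()))
print(round(cov_uv, 3))                  # -0.766, i.e. about -0.77 m^2/s^2

print(np.cov(U, V, bias=True)[0, 1])     # the same value from np.cov
\end{verbatim}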
There are two take-away observations from the above example. First, if $X$ and $Y$ both have the same unit $a$, then the unit of their covariance, or the variance for each of them individually, will have a unit of $a^2$. If $Y$ has a unit of $b$ instead then their covariance will have a unit of $ab$. Also, covariance can take negative values, which is different from variance which is always non-negative.\par
Another useful measure related to variance and covariance is \index{Correlation}\keywordhl{correlation}. For two distributions $X$ and $Y$, the correlation is defined by the following formula.
\begin{defn}[Correlation]
\label{defn:correlation}
The correlation of two distributions $X$ and $Y$ is
\begin{align*}
\rho_{xy} &= \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}} \\
&= \frac{\text{Cov}(X,Y)}{\sqrt{\text{Cov}(X,X) \text{Cov}(Y,Y)}}
\end{align*}
where Var and Cov are computed as given in Definitions \ref{defn:variance} and \ref{defn:covariance}.
\end{defn}
Moreover,
\begin{proper}
The correlation between any two distributions $X$ and $Y$ falls in the range between $-1$ and $1$, i.e. $-1 \leq \rho_{xy} \leq 1$.
\end{proper}
\begin{proof}
We can rewrite the correlation using the vector notation for covariance in Definition \ref{defn:covariance}, which gives
\begin{align*}
\rho_{xy} &= \frac{(\vec{x}' \cdot \vec{y}')}{\sqrt{(\vec{x}'\cdot\vec{x}')(\vec{y}'\cdot\vec{y}')}} \\
&= \frac{\vec{x}' \cdot \vec{y}'}{\norm{\vec{x}'}\norm{\vec{y}'}}
\end{align*}
where $\vec{x}' = \vec{x} - \mu_x$ and $\vec{y}' = \vec{y} - \mu_y$ are centered by removing the mean from the original distributions. Observe that this quantity takes the same form as the one in the Cauchy-Schwarz Inequality (Theorem \ref{thm:CauchySch}), and by that we promptly know that $\abs{\rho_{xy}} \leq 1$.
\end{proof}
Correlation between two distributions $X$ and $Y$ indicates how their data vary together, just like covariance, but it is normalized by their variances so that it is dimensionless and does not depend on the units used. Therefore, correlation can be considered a standardized version of covariance that can be compared across different pairs of variables and is more interpretable. If the correlation is positive, then $X$ and $Y$ generally increase or decrease together. On the other hand, if the correlation is negative, then when one of them increases, the other tends to decrease, and vice versa. The higher the magnitude of the correlation, the stronger the \textit{linear} relationship. Notice the word "linear" here: if the correlation is close to zero, it simply means that there is no clear linear relationship between them, but this does not exclude the possibility of other relationships, e.g. exponential or quadratic.\par
In the last example, $\text{Cov}(U,V) = \SI{-0.77}{\square\m \per \square\s}$, $\text{Var}(U) = \SI{0.75}{\square\m \per \square\s}$, $\text{Var}(V) = \SI{0.90}{\square\m \per \square\s}$, and $\rho_{uv} = \frac{-0.77}{\sqrt{(0.75)(0.90)}} \approx -0.94$. We have used the population variance and covariance for the computation, but they can be replaced by the sample counterparts. It may be tempting to claim that a strong negative relationship exists in this case, however, the sample size here is a bit small for this result to be meaningful. \par
Short Exercise: When will $\rho_{xy}$ take the value of $1$ (or $-1$)?\footnote{It will happen if $X$ and $Y$ have a perfect linear positive (negative) relationship so they appear as a straight line $Y = aX + b$ on the $xy$-plane, $a > 0$ ($a < 0$).}\par
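Continuing the wind example, the correlation can be computed directly from the centered data (a Python/\texttt{numpy} sketch, illustrative only):
\begin{verbatim}
import numpy as np

U = np.array([4.4, 3.8, 3.3, 2.8, 2.9, 1.7, 2.1])
V = np.array([-3.5, -2.6, -2.7, -1.4, -1.2, -0.8, -1.1])

Uc, Vc = U - U.mean(), V - V.mean()         # centered data
rho = (Uc @ Vc) / np.sqrt((Uc @ Uc) * (Vc @ Vc))
print(round(rho, 2))            # -0.93 (the -0.94 in the text uses rounded Var/Cov values)
print(np.corrcoef(U, V)[0, 1])  # the same value
\end{verbatim}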
We are now prepared to derive the variance formula for linear combinations of multiple variables.
\begin{proper}
\label{proper:variancemul}
For a distribution constructed by a linear combination of multiple random variables, in the form of $Z = c_1X^{(1)} + c_2X^{(2)} + \cdots + c_nX^{(n)}$, where the coefficients $\vec{c} = (c_1, c_2, \cdots, c_n)^T$ are all constants, the population variance $\text{Var}(Z)$ can be expressed as a quadratic form $\vec{c}^TQ\vec{c}$, where
\begin{align*}
Q &=
\begin{bmatrix}
\text{Cov}(X^{(1)}, X^{(1)}) & \text{Cov}(X^{(1)}, X^{(2)}) & \cdots & \text{Cov}(X^{(1)}, X^{(n)}) \\
\text{Cov}(X^{(2)}, X^{(1)}) & \text{Cov}(X^{(2)}, X^{(2)}) & \cdots & \text{Cov}(X^{(2)}, X^{(n)}) \\
\vdots & \vdots & & \vdots \\
\text{Cov}(X^{(n)}, X^{(1)}) & \text{Cov}(X^{(n)}, X^{(2)}) & \cdots & \text{Cov}(X^{(n)}, X^{(n)}) \\
\end{bmatrix} \\
&=
\begin{bmatrix}
\text{Var}(X^{(1)}) & \text{Cov}(X^{(1)}, X^{(2)}) & \cdots & \text{Cov}(X^{(1)}, X^{(n)}) \\
\text{Cov}(X^{(2)}, X^{(1)}) & \text{Var}(X^{(2)}) & \cdots & \text{Cov}(X^{(2)}, X^{(n)}) \\
\vdots & \vdots & & \vdots \\
\text{Cov}(X^{(n)}, X^{(1)}) & \text{Cov}(X^{(n)}, X^{(2)}) & \cdots & \text{Var}(X^{(n)}) \\
\end{bmatrix}
\end{align*}
is the so-called \index{Covariance Matrix}\keywordhl{covariance matrix}, so that $Q_{ij} = \text{Cov}(X^{(i)}, X^{(j)})$. If $[X'] = [X'^{(1)}|X'^{(2)}|\cdots|X'^{(n)}]$ is the matrix consisting of the centered variables $X'^{(j)} = X^{(j)} - E(X^{(j)})$ in columns, then we have $Q = \frac{1}{m}[X']^T[X']$, where $m$ is the number of data. (If sample variances and covariances are used throughout, the factor becomes $\frac{1}{m-1}$ instead.)
\end{proper}
\begin{proof}
Let's say we have $m$ data for $Z$: $z_1, z_2, \ldots, z_m$, as well as each of the $X^{(j)}$: $x_1^{(j)}, x_2^{(j)}, \ldots, x_m^{(j)}$. Denote the mean of $X^{(j)}$ by $\mu_{j}$. Starting from the expression in Definition \ref{defn:variance}, we have
\begin{align*}
\text{Var}(Z) &= \frac{1}{m} \sum_{k=1}^m (z_k - \mu_z)^2 \\
&= \frac{1}{m} \sum_{k=1}^m (\sum_{j=1}^{n} c_jx^{(j)}_k - \sum_{j=1}^{n} c_j\mu_{j})^2 \\
&= \frac{1}{m} \sum_{k=1}^m (\sum_{j=1}^{n} (c_jx^{(j)}_k - c_j\mu_{j}))^2 \\
&= \frac{1}{m} \sum_{k=1}^m [(\sum_{i=1}^{n} c_i(x^{(i)}_k - \mu_{i}))(\sum_{j=1}^{n} c_j(x^{(j)}_k - \mu_{j}))] \\
&\quad \text{(Changing to a new dummy summation variable)} \\
&= \frac{1}{m} \sum_{k=1}^m (\sum_{i=1}^{n}\sum_{j=1}^{n} c_ic_j (x^{(i)}_k - \mu_{i})(x^{(j)}_k - \mu_{j})) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} c_ic_j (\frac{1}{m} \sum_{k=1}^m (x^{(i)}_k - \mu_{i})(x^{(j)}_k - \mu_{j})) \\
&\quad \text{(Switching the order of summation)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} c_ic_j\text{Cov}(X^{(i)}, X^{(j)}) \quad \text{(Definition \ref{defn:covariance})} \\
&= \vec{c}^T Q\vec{c}
\end{align*}
\end{proof}
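A quick numerical check of Properties \ref{proper:variancemul} with Python/\texttt{numpy} (the random data and coefficients below are illustrative only):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 3
X = rng.standard_normal((m, n))       # m observations of n variables, one variable per column
c = np.array([2.0, -1.0, 0.5])        # Z = 2 X^(1) - X^(2) + 0.5 X^(3)

Xc = X - X.mean(axis=0)               # centered columns
Q = (Xc.T @ Xc) / m                   # population covariance matrix (m - 1 for the sample one)

Z = X @ c
var_Z = np.mean((Z - Z.mean())**2)    # population variance of Z, computed directly
print(bool(np.isclose(var_Z, c @ Q @ c)))   # True: Var(Z) = c^T Q c
\end{verbatim}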
For the two-variable situation, the quadratic form $\vec{c}^TQ\vec{c}$ reduces to
\begin{align*}
\vec{c}^TQ\vec{c} =
\begin{bmatrix}
c_1 & c_2
\end{bmatrix}
\begin{bmatrix}
\text{Cov}(X^{(1)}, X^{(1)}) & \text{Cov}(X^{(1)}, X^{(2)}) \\
\text{Cov}(X^{(2)}, X^{(1)}) & \text{Cov}(X^{(2)}, X^{(2)})
\end{bmatrix}
\begin{bmatrix}
c_1 \\
c_2
\end{bmatrix}