PCA of tropical Pacific monthly SSTA dataset (Jan 1970- Dec 2002)

Load data
Define dimensions
Graphically time-step through it
Do PCA using svd
A more efficient way to compute leading SVD modes
Extract leading space and time modes of variability
Renormalize patterns and time series for easier interpretation
Linearity
Look if PDF of PC1 is multimodal (shows clustering)
Now add plots for mode 2

Load data

clear
load SSTPac
whos

  Name           Size               Bytes  Class      Attributes

  PSSTA       2520x396            7983360  double               
  land        2520x1                 2520  logical              
  lat         2520x1                20160  double               
  latmax         1x1                    8  double               
  latmin         1x1                    8  double               
  lon         2520x1                20160  double               
  lonmax         1x1                    8  double               
  lonmin         1x1                    8  double               
  yd             1x396               3168  double

Define dimensions

nt = length(yd);
nlat = (latmax - latmin)/2 + 1;
nlon = (lonmax - lonmin)/2 + 1;
ns = nlat*nlon; % Number of spatial gridpoints

Graphically time-step through it

figure(1)
clf
for i = 1:nt
  PSSTAi = reshape(PSSTA(:,i),nlon,nlat);
  image([lonmin lonmax],[latmin latmax],PSSTAi','CDataMapping','scaled')
  xlabel('lon')
  ylabel('lat')
  axis xy
  title(num2str(yd(i)))
  caxis([-4 4])
  colorbar
  getframe; % Used to force 'run' mode to spit out updated plot
end

Do PCA using svd

% Remove the land points before doing analysis to avoid issues with NaNs.

SST = PSSTA(~land,:);

% Remove any residual time-mean at each location.

SST = SST - mean(SST,2)*ones(1,nt);

[U,Sigma,V] = svd(SST,0);  % Always use 0 argument (economy SVD option) to
                       % prevent possible massive unneeded computation.
sizeU=size(U)  % Since ns>nt this only gives the first nt columns of U, which is
         % thus nt x nt
         % Other columns would correspond to zero singular values
sizeSig=size(Sigma)  % Also, we only get the first nt rows of S, which is thus nt x nt.
sizeV=size(V)  % This will be the full nt x nt matrix in this case.
s = diag(Sigma); % Vector of singular values

% Plot cumulative variance explained by first k modes

normsqS = sum(s.^2)         % Show sum of SVs equals
normsqSST = sum(SST(:).^2)  % sum of squares of entries of SST

plot(cumsum(s.^2)/(ns*nt),'x')  % Divide by number of entries in SST
                                % to plot a cumulative mean square variance
xlabel('Number of modes')
ylabel('Cumulative explained variance [C^2]')

varfrac1 = s(1)^2/normsqS   % Fraction of variance explained by modes 1&2
varfrac2 = s(2)^2/normsqS

sizeU =

        2261         396


sizeSig =

   396   396


sizeV =

   396   396


normsqS =

   3.7026e+05


normsqSST =

   3.7026e+05


varfrac1 =

    0.3892


varfrac2 =

    0.0926

A more efficient way to compute leading SVD modes

% Note:  For efficiency we can just compute the first two SVD modes as
% follows - now U is ns x 2, V is nt by 2, and S is diagonal 2x2.

[U,Sigma,V] = svds(SST,2);
sizeU=size(U)
sizeSig=size(Sigma)
sizeV=size(V)

sizeU =

        2261           2


sizeSig =

     2     2


sizeV =

   396     2

Extract leading space and time modes of variability

This code will work whether svd or svds was used to find U and V Columns of U give spatial patterns Columns of V give the corresponding time series Note each column of U, V has 2-norm of 1.

Uall = nan(ns,size(U,2));
Uall(~land,:) = U;

U1 = reshape(Uall(:,1),nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points
V1 = V(:,1);

subplot(2,2,1)
image([lonmin lonmax],[latmin latmax],U1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
title(['U_1; \sigma_1 = ' num2str(s(1)) ' C'])
axis xy
colorbar

subplot(2,2,2)
plot(yd,V1)
xlabel('year')
xlim([yd(1) yd(end)])
title('V_1')

Renormalize patterns and time series for easier interpretation

PC1 = -V1*sqrt(nt-1);        % PC1 = V1 renormalized to have std=1
Ut1 = -U1*s(1)/sqrt(nt-1);   % Ut1 = SST for PC1 = +1 std.
                             % With 20-20 hindsight we also flip signs
                             % of pattern and PC (which are ambiguous)
                             % to match normal conventions.
subplot(2,2,1)
image([lonmin lonmax],[latmin latmax],Ut1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
title('Pattern 1 [C]')
caxis([-1 1])
axis xy
colorbar

hold on

% Add Nino3.4 region as a box

plot([190 190 240 240 190],[-5 5 5 -5 -5],'k')
hold off

subplot(2,2,2)
plot(yd,PC1)
xlabel('year')
ylabel('standardized amplitude')
xlim([yd(1) yd(end)])
title('PC 1')

load SSTA  % SSTA = Nino 3.4 SSTA dataset from nino1.m;
           % also contains yr = corresponding decimal year
inperiod = (yr>=yd(1)) & (yr <= yd(end));
SSTAN34 = SSTA(inperiod);

R=corrcoef(PC1,SSTAN34);
R = R(2,1)  % Corr coeff of PC1 with Nino3.4 SSTA

hold on
plot(yr(inperiod),SSTAN34,'r')
legend('PC1','Nino3.4 SSTA[C]')
hold off

% Note PC1 is very similar to Nino3.4 SSTA (see nino1.html), not
% surprising since pattern 1 has a strong maximum loading in the Nino3.4
% averaging region of 120-170 W, 5S-5N.

R =

    0.8243

Linearity

% Compare patterns for strong positive and negative polarities of PC1, each
% scaled to an equivalent PC1 = 1 pattern.

ElNino = (PC1 > 1);
LaNina = (PC1 < -1);

SSTA_ElNino = mean(SST(:,ElNino'),2)/mean(PC1(ElNino));
SSTA_LaNina = mean(SST(:,LaNina'),2)/mean(PC1(LaNina));

UposPC1 = nan(ns,1);
UposPC1(~land,:) = SSTA_ElNino;
UposPC1 = reshape(UposPC1,nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points

subplot(2,2,3)
image([lonmin lonmax],[latmin latmax],UposPC1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
axis xy
caxis([-1 1])
colorbar
title('El Nino (PC1>1)')

UnegPC1 = nan(ns,1);
UnegPC1(~land,:) = SSTA_LaNina;
UnegPC1 = reshape(UnegPC1,nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points

subplot(2,2,4)
image([lonmin lonmax],[latmin latmax],UnegPC1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
axis xy
caxis([-1 1])
colorbar
title('La Nina (PC1<-1)')

% Slight spatial pattern nonlinearity is evident, esp. for La Nina phase.

Look if PDF of PC1 is multimodal (shows clustering)

subplot(2,2,4)
hist(PC1)
xlabel('PC1')
title('Histogram of PC1')

% PDF of PC1 is slightly skewed but has a single main peak - no evidence of
% clustering

Now add plots for mode 2

U2 = reshape(Uall(:,2),nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points
Ut2 = U2*s(2)/sqrt(nt-1);
Vt2 = V(:,2)*sqrt(nt-1);               % Renormalize V to have std=1

subplot(2,2,3)
image([lonmin lonmax],[latmin latmax],Ut2','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
title('Pattern 2 [C]')
caxis([-1 1])
axis xy
colorbar

subplot(2,2,4)
plot(yd,Vt2)
xlabel('year')
ylabel('standardized amplitude')
xlim([yd(1) yd(end)])
title('PC 2')

% Note patterns 1 and 2, and time series of PCs 1 and 2 are orthogonal.

PCA of tropical Pacific monthly SSTA dataset (Jan 1970- Dec 2002)

Contents

Load data

Define dimensions

Graphically time-step through it

Do PCA using svd

A more efficient way to compute leading SVD modes

Extract leading space and time modes of variability

Renormalize patterns and time series for easier interpretation

Linearity

Look if PDF of PC1 is multimodal (shows clustering)

Now add plots for mode 2