PCA of tropical Pacific monthly SSTA dataset (Jan 1970- Dec 2002)

Contents

Load data

clear
load SSTPac
whos
  Name           Size               Bytes  Class      Attributes

  PSSTA       2520x396            7983360  double               
  land        2520x1                 2520  logical              
  lat         2520x1                20160  double               
  latmax         1x1                    8  double               
  latmin         1x1                    8  double               
  lon         2520x1                20160  double               
  lonmax         1x1                    8  double               
  lonmin         1x1                    8  double               
  yd             1x396               3168  double               

Define dimensions

nt = length(yd);
nlat = (latmax - latmin)/2 + 1;
nlon = (lonmax - lonmin)/2 + 1;
ns = nlat*nlon; % Number of spatial gridpoints

Graphically time-step through it

figure(1)
clf
for i = 1:nt
  PSSTAi = reshape(PSSTA(:,i),nlon,nlat);
  image([lonmin lonmax],[latmin latmax],PSSTAi','CDataMapping','scaled')
  xlabel('lon')
  ylabel('lat')
  axis xy
  title(num2str(yd(i)))
  caxis([-4 4])
  colorbar
  getframe; % Used to force 'run' mode to spit out updated plot
end

Do PCA using svd

% Remove the land points before doing analysis to avoid issues with NaNs.

SST = PSSTA(~land,:);

% Remove any residual time-mean at each location.

SST = SST - mean(SST,2)*ones(1,nt);

[U,Sigma,V] = svd(SST,0);  % Always use 0 argument (economy SVD option) to
                       % prevent possible massive unneeded computation.
sizeU=size(U)  % Since ns>nt this only gives the first nt columns of U, which is
         % thus nt x nt
         % Other columns would correspond to zero singular values
sizeSig=size(Sigma)  % Also, we only get the first nt rows of S, which is thus nt x nt.
sizeV=size(V)  % This will be the full nt x nt matrix in this case.
s = diag(Sigma); % Vector of singular values

% Plot cumulative variance explained by first k modes

normsqS = sum(s.^2)         % Show sum of SVs equals
normsqSST = sum(SST(:).^2)  % sum of squares of entries of SST

plot(cumsum(s.^2)/(ns*nt),'x')  % Divide by number of entries in SST
                                % to plot a cumulative mean square variance
xlabel('Number of modes')
ylabel('Cumulative explained variance [C^2]')

varfrac1 = s(1)^2/normsqS   % Fraction of variance explained by modes 1&2
varfrac2 = s(2)^2/normsqS
sizeU =

        2261         396


sizeSig =

   396   396


sizeV =

   396   396


normsqS =

   3.7026e+05


normsqSST =

   3.7026e+05


varfrac1 =

    0.3892


varfrac2 =

    0.0926

A more efficient way to compute leading SVD modes

% Note:  For efficiency we can just compute the first two SVD modes as
% follows - now U is ns x 2, V is nt by 2, and S is diagonal 2x2.

[U,Sigma,V] = svds(SST,2);
sizeU=size(U)
sizeSig=size(Sigma)
sizeV=size(V)
sizeU =

        2261           2


sizeSig =

     2     2


sizeV =

   396     2

Extract leading space and time modes of variability

This code will work whether svd or svds was used to find U and V Columns of U give spatial patterns Columns of V give the corresponding time series Note each column of U, V has 2-norm of 1.

Uall = nan(ns,size(U,2));
Uall(~land,:) = U;

U1 = reshape(Uall(:,1),nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points
V1 = V(:,1);

subplot(2,2,1)
image([lonmin lonmax],[latmin latmax],U1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
title(['U_1; \sigma_1 = ' num2str(s(1)) ' C'])
axis xy
colorbar

subplot(2,2,2)
plot(yd,V1)
xlabel('year')
xlim([yd(1) yd(end)])
title('V_1')

Renormalize patterns and time series for easier interpretation

PC1 = -V1*sqrt(nt-1);        % PC1 = V1 renormalized to have std=1
Ut1 = -U1*s(1)/sqrt(nt-1);   % Ut1 = SST for PC1 = +1 std.
                             % With 20-20 hindsight we also flip signs
                             % of pattern and PC (which are ambiguous)
                             % to match normal conventions.
subplot(2,2,1)
image([lonmin lonmax],[latmin latmax],Ut1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
title('Pattern 1 [C]')
caxis([-1 1])
axis xy
colorbar

hold on

% Add Nino3.4 region as a box

plot([190 190 240 240 190],[-5 5 5 -5 -5],'k')
hold off

subplot(2,2,2)
plot(yd,PC1)
xlabel('year')
ylabel('standardized amplitude')
xlim([yd(1) yd(end)])
title('PC 1')

load SSTA  % SSTA = Nino 3.4 SSTA dataset from nino1.m;
           % also contains yr = corresponding decimal year
inperiod = (yr>=yd(1)) & (yr <= yd(end));
SSTAN34 = SSTA(inperiod);

R=corrcoef(PC1,SSTAN34);
R = R(2,1)  % Corr coeff of PC1 with Nino3.4 SSTA

hold on
plot(yr(inperiod),SSTAN34,'r')
legend('PC1','Nino3.4 SSTA[C]')
hold off

% Note PC1 is very similar to Nino3.4 SSTA (see nino1.html), not
% surprising since pattern 1 has a strong maximum loading in the Nino3.4
% averaging region of 120-170 W, 5S-5N.
R =

    0.8243

Linearity

% Compare patterns for strong positive and negative polarities of PC1, each
% scaled to an equivalent PC1 = 1 pattern.

ElNino = (PC1 > 1);
LaNina = (PC1 < -1);

SSTA_ElNino = mean(SST(:,ElNino'),2)/mean(PC1(ElNino));
SSTA_LaNina = mean(SST(:,LaNina'),2)/mean(PC1(LaNina));

UposPC1 = nan(ns,1);
UposPC1(~land,:) = SSTA_ElNino;
UposPC1 = reshape(UposPC1,nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points

subplot(2,2,3)
image([lonmin lonmax],[latmin latmax],UposPC1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
axis xy
caxis([-1 1])
colorbar
title('El Nino (PC1>1)')

UnegPC1 = nan(ns,1);
UnegPC1(~land,:) = SSTA_LaNina;
UnegPC1 = reshape(UnegPC1,nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points

subplot(2,2,4)
image([lonmin lonmax],[latmin latmax],UnegPC1','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
axis xy
caxis([-1 1])
colorbar
title('La Nina (PC1<-1)')

% Slight spatial pattern nonlinearity is evident, esp. for La Nina phase.

Look if PDF of PC1 is multimodal (shows clustering)

subplot(2,2,4)
hist(PC1)
xlabel('PC1')
title('Histogram of PC1')

% PDF of PC1 is slightly skewed but has a single main peak - no evidence of
% clustering

Now add plots for mode 2

U2 = reshape(Uall(:,2),nlon,nlat);  % Turn column of U into a lat/lon
                                    % matrix including nan's at land points
Ut2 = U2*s(2)/sqrt(nt-1);
Vt2 = V(:,2)*sqrt(nt-1);               % Renormalize V to have std=1

subplot(2,2,3)
image([lonmin lonmax],[latmin latmax],Ut2','CDataMapping','scaled')
xlabel('lon')
ylabel('lat')
title('Pattern 2 [C]')
caxis([-1 1])
axis xy
colorbar

subplot(2,2,4)
plot(yd,Vt2)
xlabel('year')
ylabel('standardized amplitude')
xlim([yd(1) yd(end)])
title('PC 2')

% Note patterns 1 and 2, and time series of PCs 1 and 2 are orthogonal.