ENNEC 472:  Quantitative Analysis in Earth Sciences

Spring 2009

PROBSET #4 (DUE: Th Apr 30 at start of class)

Multivariate Regression and PCA



1.  Multivariate Statistical Model for Variation over time in Atlantic Named Tropical Cyclones

 

While tropical cyclone counts are best treated as a Poisson process, annual totals of Atlantic named tropical storms can be treated as approximately Gaussian. You will formulate a multivariate statistical model for named Atlantic TCs (1870-2008 data are here) in terms of three predictors: North Atlantic Main Development Region (‘MDR’) sea surface temperatures (here; use 2nd column), the DJF Nino3.4 series (here; use 2nd column), and the DJFM North Atlantic Oscillation (NAO) (here; use 2nd column). Use the 1870-2006 interval of overlap with the TC data. For the Nino3.4 and NAO series, the years in this case reflect the ‘D’ in DJF, so that the 1870 Nino3.4 and NAO values are the ones used to predict the 1870 storm totals. Note that this is different from the convention used in problem set #3.

 

1. Which predictors are found to be statistically significant? Write down the statistical model relating Atlantic TC counts to those predictors.

 

2. Now perform ‘cross validation’ by training the statistical model on one half of the data and seeing how well it predicts the other half. Do this twice, alternately using the 1st and 2nd halves of the data to train the model.
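The two steps above (a least-squares fit with a significance check, then split-half validation) can be sketched as follows. This is a minimal illustration in Python/NumPy rather than the class routines, and it runs on synthetic data: the function names (`fit_ols`, `predict`, `split_half_validation`), the synthetic coefficients, and the choice of correlation and reduction-of-error (RE) as skill measures are all assumptions made for the example. A coefficient with |t| greater than about 2 is roughly significant at the 5% level only if the residuals are independent and Gaussian.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.
    Returns the coefficients (intercept first) and their t-statistics."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)     # coefficient covariance matrix
    t = beta / np.sqrt(np.diag(cov))
    return beta, t

def predict(beta, X):
    """Apply fitted coefficients (intercept first) to new predictor values."""
    return beta[0] + X @ beta[1:]

def split_half_validation(X, y):
    """Train on one half, predict the other half, then swap the roles."""
    n = len(y)
    first, second = slice(0, n // 2), slice(n // 2, n)
    results = {}
    for name, train, test in [("train_first", first, second),
                              ("train_second", second, first)]:
        beta, _ = fit_ols(X[train], y[train])
        y_hat = predict(beta, X[test])
        r = np.corrcoef(y_hat, y[test])[0, 1]
        # RE: skill relative to always predicting the training-period mean
        sse = np.sum((y[test] - y_hat) ** 2)
        sse_ref = np.sum((y[test] - y[train].mean()) ** 2)
        results[name] = {"r": r, "RE": 1.0 - sse / sse_ref}
    return results

# Synthetic stand-in for the real series (assumption: 137 years, 3 predictors;
# the third predictor has no true influence on y).
rng = np.random.default_rng(0)
n_years = 137
X = rng.standard_normal((n_years, 3))
y = 9.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_normal(n_years)

beta, t = fit_ols(X, y)
cv = split_half_validation(X, y)
print("coefficients:", beta)
print("t-statistics:", t)
print("validation:", cv)
```

With the real data, the columns of `X` would be the MDR SST, Nino3.4, and NAO series aligned on the common years, and `y` the annual storm counts.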

 

Briefly discuss your results, commenting on any issues that may affect the reliability of the conclusions that can be drawn from this analysis. You might find it useful to use the routine ‘lincor’ employed in class (note that this routine also requires the routine ‘standardize’).

 

 

2.  PCA of North Pacific Sea Level Pressure Data

 

Perform PCA on the field of winter (DJF) gridded North Pacific Sea Level Pressure (SLP) data (5 degree latitude x 5 degree longitude gridpoints distributed over the latitude range 20N-70N and the longitude range 120E-240E). There are 101 years of data (winter 1898/1899 through winter 1998/1999) for each of the 275 gridpoints contained in the data file npacslpwint.dat.

(Note: there are 276 columns; the first column is the year (corresponding to the “JF” months), and the subsequent 275 columns are the 275 available gridpoint series.)

 

Make sure to remove the means for all SLP series before the analysis.
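As an illustration of the demeaning and decomposition steps, here is a minimal Python/NumPy sketch that removes the time mean from each gridpoint series and obtains EOFs and PCs from a singular value decomposition. It is one common way to compute PCA, not necessarily the method used in class. The function name `pca_svd` and the synthetic field are assumptions for the example; the real input would be the 101 x 275 block of SLP values from npacslpwint.dat with the year column dropped.

```python
import numpy as np

def pca_svd(data):
    """PCA of a (n_times, n_gridpoints) array via SVD of the demeaned data.
    Returns EOFs (one per row), PCs (one per column), and the fraction of
    total variance explained by each mode."""
    anomalies = data - data.mean(axis=0)      # remove each gridpoint's time mean
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = U * s                               # time series, shape (n_times, n_modes)
    eofs = Vt                                 # spatial patterns, shape (n_modes, n_gridpoints)
    var_frac = s**2 / np.sum(s**2)
    return eofs, pcs, var_frac

# Synthetic stand-in for the SLP field (assumption): one dominant spatial
# pattern with a random amplitude each year, plus noise and a constant offset.
rng = np.random.default_rng(1)
n_years, n_grid = 101, 275
pattern = np.sin(np.linspace(0.0, np.pi, n_grid))
amplitude = rng.standard_normal(n_years)
field = (1010.0 + 5.0 * np.outer(amplitude, pattern)
         + rng.standard_normal((n_years, n_grid)))

eofs, pcs, var_frac = pca_svd(field)
print("variance fraction of leading 3 modes:", var_frac[:3])
```

The sign of each EOF/PC pair is arbitrary, so a pattern and its time series may both come out flipped relative to another implementation. Reshaping an EOF row onto the latitude/longitude grid for contouring depends on the gridpoint ordering in npacslploc.dat, which is not assumed here.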

 

1. For each of the three leading eigenvectors, plot the corresponding principal component (PC) time series and EOF spatial pattern. Plot the latter as a latitude/longitude contour plot using the Matlab routine contour and the latitude/longitude coordinates for gridpoints #1-275 contained in npacslploc.dat. You may find the program spatial.m useful for converting the EOF and lat/lon coordinate information into a format more appropriate for the contour routine. You may also use the subroutine demean.m, which removes the mean from each gridpoint time series, but make sure the matrix is in the right format for this routine. You might also find the commands in this script useful for creating a map of the North Pacific as the background of your contour plot.

 

2. Use the first 6 principal component (PC) time series determined above as predictors in a multivariate forward screening regression to determine a statistical model for the winter Nino3.4 index used in problem #1 above. Recall that the series starts in winter 1870/1871 and continues through winter 2005/2006; you should use only the interval of overlap between the SLP data and the Nino3.4 series (winter 1898/1899 through winter 1998/1999).
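A forward screening regression adds predictors one at a time, at each step choosing the candidate that most reduces the residual sum of squares, and stops when the best remaining candidate no longer passes an entry test. The sketch below is a minimal Python/NumPy version run on synthetic data. The function names and the F-to-enter threshold of 4.0 (roughly the 5% level for one added predictor and about 100 residual degrees of freedom) are assumptions for the example, not values prescribed by the course.

```python
import numpy as np

def rss_of_fit(X, y):
    """Residual sum of squares of a least-squares fit of y on X plus intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_screening(X, y, f_to_enter=4.0):
    """Return the column indices of X selected by forward screening, in order
    of entry. A candidate enters only if its partial F-statistic reaches
    f_to_enter (hypothetical default)."""
    n, p = X.shape
    selected = []
    rss_current = float(np.sum((y - y.mean()) ** 2))   # intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        trial_rss = {j: rss_of_fit(X[:, selected + [j]], y) for j in candidates}
        best = min(trial_rss, key=trial_rss.get)
        dof = n - (len(selected) + 2)                  # intercept + old + new predictor
        f_stat = (rss_current - trial_rss[best]) / (trial_rss[best] / dof)
        if f_stat < f_to_enter:
            break                                      # no remaining candidate qualifies
        selected.append(best)
        rss_current = trial_rss[best]
    return selected

# Synthetic stand-in (assumption): 101 winters, 6 candidate predictors,
# of which only columns 1 and 3 actually influence the target series.
rng = np.random.default_rng(2)
X = rng.standard_normal((101, 6))
y = 1.0 * X[:, 1] - 0.6 * X[:, 3] + 0.5 * rng.standard_normal(101)

selected = forward_screening(X, y)
print("predictors selected, in order of entry:", selected)
```

With the real data, the columns of `X` would be the six leading PC time series and `y` the Nino3.4 values for the same winters. Because PCs from a single analysis are mutually uncorrelated over the analysis period, the order of entry then matters less than it does for correlated predictors.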

 

Briefly discuss and interpret your results.