Response to Reviewer Comments

Responses to Potential Misconceptions Regarding “Atlantic hurricanes and climate over the past 1,500 years”

M.E. Mann, J.D. Woodruff, J.P. Donnelly, and Z. Zhang (Nature, Aug 13, 2009).

Assertion #1: The merging together of the more recent instrumental tropical cyclone data with the proxy hurricane strike observations is not appropriate, since the sediment overwash deposits are generally recording direct major hurricane strikes. To attempt to infer basin-wide statistics from major hurricane strikes at four sites is not appropriate.

Firstly, there is no merging of the sediment overwash and historical TC count data, or in fact of any data in our analysis. We simply compare each of the three independent records (the historical TC record, the sediment-based estimate, and the statistical model estimate) over the historical interval (see inset of Figure 3 of article). For the purpose of comparison, the sediment-based record of landfalling hurricanes is normalized and centered so that it has the same scale as the other (TC count) records.
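As an illustration only (this is not the code used in the article), the centering and scaling step can be sketched in Python. The sketch assumes that "same scale" means matching the mean and standard deviation of the target record over a common interval; the function name match_scale and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def match_scale(series, target):
    """Shift and stretch `series` so its mean and standard deviation match `target`.

    Assumption: "normalized and centered" is read here as matching the first
    two moments of the target record; the article may define it differently.
    """
    m_s, sd_s = mean(series), pstdev(series)
    m_t, sd_t = mean(target), pstdev(target)
    if sd_s == 0:
        raise ValueError("series has zero variance and cannot be rescaled")
    return [m_t + (x - m_s) * sd_t / sd_s for x in series]


# Made-up values, for illustration only.
sediment_index = [0.2, 0.5, 0.9, 0.4, 0.7]   # hypothetical sediment composite
tc_counts = [8.0, 10.0, 13.0, 9.0, 12.0]     # hypothetical TC counts

rescaled = match_scale(sediment_index, tc_counts)
print(rescaled)
```

Only the mean and the spread of the target enter the rescaling; the shape of the rescaled series (its peaks, troughs, and trend) is inherited entirely from the input series.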

As to the appropriateness of comparing long-term datasets of landfalling hurricanes with the instrumental record of total named storms (i.e. annual TC counts), we have addressed this in detail in the article. We make the reasonable assumption that an adequately weighted combination of the information from a set of sites spanning the range of regions influenced by landfalling Atlantic hurricanes yields, at the centennial timescales of interest to our study, an overall signal that is likely to mirror basin-wide activity. Three points, each taken up in our response to assertion #2 below, underlie this assumption: the weighting accounts for the relative importance of the sites as indicators of basin-wide activity; the records reflect landfalling hurricanes generally, not just ‘major’ hurricanes; and the temporal averaging allows for a relatively robust signal even from a somewhat limited set of sites. In Figure 3 of the article, we compared the sediment composite record directly against the observational record of basin-wide total named storms (inset of Figure 3 of article). This comparison shows that the weighted sediment composite record closely mirrors the historical basin-wide TC series at the multidecadal timescales of interest, including the multidecadal variability (i.e. the peaks during the late 19th century, mid 20th century, and most recent 1-2 decades, and troughs during the early 20th century and during the 1970s/1980s). The only information that was used from the historical basin-wide TC record in the sediment composite series was the centering and scaling of series for purpose of comparison, so the similarity in the patterns of variability and trend is in no way built in to our analysis, and instead provides independent confirmation for its validity.
Furthermore, because the sediments are not subject to the time-dependent observation biases and uncertainties which inevitably will always leave the historical observational TC/hurricane record in some dispute, the fact that the long-term variability in the sediment composite record, as we have shown, appears to mirror that in the observational record, serves to some degree as independent validation of the historical record itself. In short, our appropriately weighted sediment composite record is plausibly representative of total basin-wide activity on the multidecadal timescales of interest.

Assertion #2: A more appropriate comparison would be with a simple arithmetic average of the historical data on major hurricane strikes for the 4 sites used. Such a record shows differences for individual years with the basin-wide record used in the manuscript.

There are several problems with this assertion. First of all, “major” hurricanes means category 3 or larger hurricanes. This would eliminate many of the hurricanes contributing to our chronology, since several of the regional sediment series that were used (the New England and Mid-Atlantic composites) are sensitive to the considerably more numerous category 2 or larger storms.

The assertion also misunderstands the nature of our estimate. We are not simply computing the total summed activity among the sites and regional composites used. Instead, we are attempting to estimate basin-wide activity from these sites/composites. Statistically, these are not the same thing. A straight average of the sites is equivalent to a ‘uniform’ weighting scheme. This is not correct, for reasons spelled out in our manuscript and Supplementary Information. Given a small number of sites, one needs to weight the sites with respect to their inverse return periods to obtain an appropriate estimate of basin-wide activity: over a short period of time, it is possible to get more events making landfall in e.g. New England than in e.g. Puerto Rico. Yet an understanding of hurricane climatology would tell you that in trying to estimate the true basin-wide activity, you should give more weight to a record from Puerto Rico, given that a much larger number of storms climatologically make landfall in the Caribbean than in New England, and this record therefore is more likely to inform an estimate of basin-wide trends. We have been careful about our wording to guard against any such misinterpretation of our analysis.
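The difference between a uniform average and a weighting by inverse return period can be made concrete with a minimal Python sketch. It is not the procedure of the article or its Supplementary Information: the return periods and series below are made up, and normalizing the weights to sum to one is an assumption made here for illustration.

```python
def inverse_return_period_weights(return_periods):
    """Weights proportional to 1 / return period, normalized to sum to 1.

    Assumption: the normalization to unit sum is illustrative and is not
    taken from the article.
    """
    rates = {site: 1.0 / rp for site, rp in return_periods.items()}
    total = sum(rates.values())
    return {site: rate / total for site, rate in rates.items()}


def weighted_composite(series_by_site, weights):
    """Weighted sum across sites at each time step."""
    n_steps = len(next(iter(series_by_site.values())))
    return [
        sum(weights[site] * series[t] for site, series in series_by_site.items())
        for t in range(n_steps)
    ]


# Hypothetical return periods in years (shorter = struck more often).
return_periods = {"Caribbean": 10.0, "New England": 40.0}
# Hypothetical event counts per time window at each site.
series_by_site = {"Caribbean": [1, 0, 2], "New England": [0, 1, 0]}

weights = inverse_return_period_weights(return_periods)
composite = weighted_composite(series_by_site, weights)
uniform = [sum(vals) / len(series_by_site) for vals in zip(*series_by_site.values())]
print(weights, composite, uniform)
```

With these made-up numbers the frequently struck site carries four times the weight of the rarely struck one, so a single event at the rarely struck site moves the composite far less than it would move a uniform average.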

Finally, the assertion, which focuses primarily on interannual timescales, misses a key point, what is sometimes referred to as the ‘ergodic hypothesis’. The point is that the information gathered at a sparse but representative set of locations over long timescales (i.e., the multidecadal and longer timescales of interest in our study) is increasingly likely to mirror the information contained in a more extensive network. The principle is that what one loses in spatial sampling, one can gain back through greater temporal sampling, to the extent that vagaries of interannual variability are essentially stochastic. Now there are legitimate reasons why this may not apply strictly in this case (and we’re quite clear about this in the manuscript), but it certainly motivates the view that one should be comparing the records on multidecadal and longer timescales. As discussed in our response to assertion #1 above, such a comparison shows that the sediment composite record closely tracks the total basin-wide TC record at these longer timescales; both indicate the same multidecadal pattern of variability.
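The trade-off between spatial and temporal sampling can be illustrated with a toy Monte Carlo experiment in Python. Everything in it is an assumption made for illustration (four sites, a slowly varying strike probability, independent strikes from year to year); it is not a model of the actual sediment records.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_means(series, width):
    """Means over consecutive, non-overlapping blocks of `width` samples."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]


rng = random.Random(0)              # fixed seed so the run is repeatable
years, n_sites, width = 1400, 4, 40

# Hypothetical "true" activity: a per-site annual strike probability that
# varies slowly, with a 200-year period.
rate = [0.10 + 0.08 * math.sin(2 * math.pi * t / 200) for t in range(years)]

# Each year, each of the few sites is struck independently with that probability.
strikes = [sum(rng.random() < p for _ in range(n_sites)) for p in rate]

r_annual = pearson(strikes, rate)
r_smoothed = pearson(block_means(strikes, width), block_means(rate, width))
print(f"annual r = {r_annual:.2f}, 40-year block r = {r_smoothed:.2f}")
```

In this toy setting the year-to-year strike counts follow the underlying activity only loosely, while the 40-year means follow it much more closely, because the stochastic interannual component averages out over each block.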

Assertion #3: The study’s findings are inconsistent with other studies (Landsea 2005, Vecchi and Knutson 2008) asserting that there is no statistically significant linear trend in landfalling major hurricanes for the continental U.S. In fact, there is no statistically significant linear trend in the straight arithmetic sum of historical TC counts for the regions used in the study.

See response to assertion #2 above: (1) Our composite does not reflect only “major hurricanes” (cat 3 or larger), but in fact reflects the considerably more prevalent cat 2 storms for 2 of the 4 cases. (2) Our analysis is not restricted to the continental U.S. but includes the Caribbean. In fact, our analysis is based on 5 (and arguably, essentially all) major regions for landfalling Atlantic hurricanes (New England, mid-Atlantic, southeastern U.S. Atlantic, Gulf Coast, and Caribbean). (3) We are not attempting to assess the mean number of events across our domain. Indeed, we think that such a number is relatively meaningless. As discussed in some detail, we are computing a weighted average of the data which explicitly takes into account (through the estimated return periods for different sites) their appropriate weighted contribution to any estimated basin-wide average of TC activity. We have been careful in the article to make sure that this key aspect of our analysis is clear to readers. Finally, (4) whether or not there is a trend in historical Atlantic hurricane or TC activity, regional or otherwise, is not the primary focus of our paper. Indeed, in the first two sentences of our abstract we make clear that the existence of a trend is a matter of current scientific dispute. And in no way does any assumption about whether there is a modern trend in the historical data enter into our analyses. Our statistical model is trained on interannual variability, and the sediment composite data are entirely independent of the historical record. To the extent that any modern positive trends emerge in our analyses, they do so independently of whether there is a trend in the historical data.
The focus of our analysis is not on the modern trends, but on the history of TC and hurricane activity prior to the historical interval, and our primary conclusion, as expressed in the abstract, is that the recent high levels of activity may indeed not be anomalous in the context of the long-term history provided by our analysis.

Assertion #4: Other studies by e.g. Chang and Guo (2007) and Vecchi and Knutson (2008) quantify the number of “missing” Atlantic TCs based upon the density of ship observations during the last century. Both studies suggested that a significant upward trend remains in the counts of TCs when starting from about 1900, although the latter paper found that the trend from 1878 onward was not significant.

Once again, this paper is not focused on whether simple linear trends fit to different time intervals are statistically significant. The model of a linear trend is inappropriate in describing the time evolution of TCs given the non-linear temporal pattern of the factors underlying long-term changes in TC activity (see e.g. Mann, M.E., Emanuel, K.A., Atlantic Hurricane Trends linked to Climate Change, Eos, 87, 24, 233-241, 2006). Issues involving the reality and significance of any linear trends are not in any case central to this paper. A more appropriate question is whether the most recent activity (i.e., that since 1995) is anomalous in the context of the historical record, and the answer to that question appears to be ‘yes’, even using the upper range of published estimates (Landsea 2007; ref #3 of our article) of the degree of undercount bias in the early part of the record (see Mann et al 2007; reference #29 of our article). But even this is not the focus of our manuscript. The focus of this manuscript is, instead, how the level of activity recorded in the modern record compares against paleo-evidence of the past 1500 years, using two entirely independent sources and approaches.

Assertion #5: Landsea et al. (2009) argue that the increase in total TC frequency since the late 19th century in the database is primarily due to an increase in very short-lived TCs due to improvements in the quantity and quality of observations, along with enhanced interpretation techniques, which allow storms to be better monitored and detected. When these storms are added back in, there is no statistically significant linear trend since the late 19th century.

We would note that the majority of teams who have looked at the degree of undercount bias in the record find it modest-to-minimal (no more than 1 or 2 missed storms per year) at least back through the beginning of the 20th century, and that the increase in frequency over the past decade thus does appear anomalous in that context. We stress, however, that this is not the focus of this paper. Indeed, if we assume that the historical undercount bias is at the upper end of what has been argued in the published literature (Landsea, 2007; see refs. #3 and #29 in our article), our key conclusion (that levels of activity during the Medieval era might have equaled or even exceeded current levels of activity) is actually strengthened, not weakened.

Assertion #6: Mann et al. (2007) state that their statistical modeling approach “assumes that past Atlantic TC activity continues to have been influenced by the same three basin climate factors that have primarily governed year-to-year variations in TC counts during the historical period: Tropical Atlantic warmth…”. Therefore it must assume a priori that the large trend in Atlantic SSTs is linked to the trend in the Atlantic TC record from the 1940s until today.

This assertion misunderstands the nature of the statistical model. The statistical model used by Mann et al (2007; ref #29 in our article) knows nothing about the long-term trend in TCs. It is trained on the interannual (i.e. ‘year-to-year’) relationship between predictors (including MDR SST, ENSO, and the NAO) and predictand (individual annual TC counts). Any trends that appear in model-predicted TC counts are an emergent result of the model, produced purely by the behavior of the underlying predictors. Indeed, as shown by Mann et al (2007), when the model is trained on the first half of the record, ending in the mid 1940s, it successfully predicts the subsequent rise of the past two decades. Kerry Emanuel (Emanuel et al, 2008) notes that the Mann et al (2007) statistical model exhibits a similar level of skill to his dynamical downscaling approach (discussed further below in response to assertion #7). That the model successfully captures much of the long-term variability in the sediment-based hurricane history in the current article indeed appears to provide some additional long-term validation of the statistical model (though we felt no need to actually state so in the manuscript).

Assertion #7: Recent modeling studies regarding anthropogenic climate change impacts upon Atlantic TC frequency (e.g., Chauvin et al. 2006, Bengtsson et al. 2007, Emanuel et al. 2008) indicate little or no trend in TC counts in response to warming SSTs. The authors’ underlying assumptions are therefore not physically valid.

Firstly, this is not an accurate characterization of the recent literature. Emanuel et al (2008), using a particularly elegant dynamical downscaling approach, finds a modest projected increase in Atlantic TC counts in response to anthropogenic forcing averaged over all models, and an especially large increase when the large-scale conditions are taken from certain models (such as the GFDL coupled model). In more recent work (K. Emanuel, pers. comm.) performed shortly after Emanuel et al (2008) went to press, Emanuel finds, using a further refinement of the technique, an increase in frequency averaged over all the models of the IPCC AR4 assessment (SRES A1B scenario) from 13.5 to 15.1, and for the GFDL model a much larger increase from 18.9 to 26.9 (nearly 50% increase). The approach used in other studies of anthropogenic impacts on TCs, in which regional climate models are fed with large-scale boundary conditions, can be quite model dependent. In some cases, the sign of the response can even be changed simply by changing the nature of model parameterizations. For example, Yoshimura et al (2006) found an increase in TC number over the Indian Ocean if the model used the Kuo cumulus parameterization but a decrease if the Arakawa-Schubert cumulus parameterization scheme was used.

What models may or may not project with regard to future climate change is in any case not necessarily relevant to interpreting historical trends. We note, for example, that Knutson et al (2008), while projecting little or no 21st century changes in TC counts when driving their regional model with certain climate change projections, nonetheless use as a validation of their approach the fact that their regional model is able to reproduce the positive trend in Atlantic TC counts over recent decades when driven with late 20th century reanalysis data. In fact, the model produces a 40% larger trend than has actually been witnessed. Given the open questions that still exist with climate model-based projections of future TC activity, we find it premature at best to question observed historical trends (which is the focus of our article) based on studies of the projected future behavior.

Assertion #8: The Atlantic basin is the only one to have seen an increase in tropical cyclone frequency over the last few decades. Thus the statistical model used by the authors is not valid.

We frankly find this criticism (which has indeed been made against our study) particularly puzzling. It is unclear how the observation that an increase in TC counts is seen only for the Atlantic constitutes a shortcoming of our statistical model, since our model only attempts to assess the influences on TC activity that are specific to the Atlantic basin. There are aspects of the Atlantic (e.g. a particularly large area where SSTs are on the cusp of the threshold necessary for supporting TC genesis) that make it potentially unique in its response to modest increases in global SST. Indeed, the Emanuel et al (2008) theoretical modeling study shows the Atlantic basin to be the most sensitive, in terms of TC activity (i.e. annual TC counts), to further increases in SST.

Assertion #9: Return periods calculated from a 270 km radius, as in this study, are not appropriate for interpreting the information from the sediment overwash deposits.

We agree that the radius of influence for each site is likely less than 270 km, and therefore the return periods for overwash at each site would likely be longer than those predicted using this radius. However, these derived return periods are used only to obtain relative weights when assimilating the different records, with actual return frequencies determined by the reconstructions themselves. The radius of 270 km was chosen in order to have a large enough area for obtaining appropriate statistics using the HurRisk model, yet small enough that the return periods reflect the relative activity at a site compared to the others within the composite. In summary, results using the 270 km radius are used only to vary the relative weighting of the different records and in no way affect the recurrence rate of events within each reconstruction.

Assertion #10: The two different estimates of past activity (sediments and proxy-climate driven statistical model) don’t look all that similar; there are discrepancies between them.

It is true that there are discrepancies between these records. In fact, the article gives substantial discussion to these discrepancies (e.g. the 15th century peak that appears in the sediment record but not the statistical model reconstruction) and to possible reasons for them, which include the caveats and limitations specific to either approach discussed in the article.

A statistical correlation is probably not the best comparison of the two estimates. A better question is whether or not the estimates are consistent within their respective uncertainties, not whether or not all of the multidecadal wiggles in the record are the same (they are not, and some of the differences are interesting and are probably telling us something important as well). That having been said, it turns out that the correlation between the two records is nonetheless both relatively high and statistically significant.

The two series are smoothed on timescales of 40 years and longer (i.e. using a filter with passband centered at f=0.025 cycles/year) to emphasize the timescales of variability they are likely to record most reliably. The correlation between the two smoothed series during the 1350 year interval of overlap (AD 500-1849) is r=0.4387.

To evaluate the statistical significance of this correlation properly, we must first account for the degrees of freedom in the series being compared. The nominal number of effective samples n is in this case the length of overlap (1350 years) divided by the effective sampling spacing (40 years), which gives n = 1350/40 ≈ 34.

However, this does not account for the reduced degrees of freedom due to the autocorrelation present in each series. This is evaluated from the lagged correlation between effectively independent samples, which is a lag of 40 years for the 40 year smoothed series being compared. The effective number of samples is n' = n(1-rho1*rho2)/(1+rho1*rho2), where rho1 and rho2 are the autocorrelations of the two series at a lag of 40 years.

For the two time series in question we have rho1 = 0.4387 and rho2 = 0.5559. This yields n' = 34(1-0.4387*0.5559)/(1+0.4387*0.5559) = 34(1-0.2439)/(1+0.2439) = 34(0.6079) ≈ 21.

The number of degrees of freedom in the correlation is the number of effective samples minus 2, so there are approximately n'-2= 19 effective degrees of freedom in the correlation.
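The arithmetic of the last few paragraphs can be checked with a few lines of Python. The sketch uses the product form of the correction, n(1-rho1*rho2)/(1+rho1*rho2), which is the form used in the worked numbers above; the function name is hypothetical.

```python
def effective_sample_size(n, rho1, rho2):
    """Effective number of samples for correlating two autocorrelated series."""
    return n * (1 - rho1 * rho2) / (1 + rho1 * rho2)


overlap_years = 1350      # AD 500-1849
smoothing_years = 40      # smoothing timescale of both series
rho1, rho2 = 0.4387, 0.5559

n_nominal = round(overlap_years / smoothing_years)       # nominal samples
n_effective = effective_sample_size(n_nominal, rho1, rho2)
degrees_of_freedom = round(n_effective) - 2

print(n_nominal, round(n_effective, 2), degrees_of_freedom)
```

Rounding is applied at the same two places as in the text: once to the nominal sample count and once to the effective sample count before subtracting 2.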

We must therefore determine the significance of a correlation of r=0.4387 over the full 1350 year overlap with 19 statistical degrees of freedom. A one-sided hypothesis test is required to establish statistical significance, since we would reject anticorrelation of the two series as failure.

Using online lookup tables, e.g. here: http://faculty.vassar.edu/lowry/tabs.html#r (note that "N" here is 21, and the degrees of freedom "N-2" is 19), we find that the correlation of r=0.4387 is statistically significant at the p=0.02 level.
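For readers who prefer to compute the significance level directly, the following Python sketch is one way to do so. It assumes the standard t-test for a Pearson correlation (the test on which such lookup tables are typically based) and integrates the Student-t density numerically using only the standard library; the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def one_sided_p_value(r, df, steps=2000):
    """One-sided p-value for a positive correlation r with df degrees of freedom.

    Converts r to a t statistic, integrates the density from 0 to t with
    Simpson's rule (`steps` must be even), and returns the upper-tail area.
    """
    t = r * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return t, 0.5 - total * h / 3


t_stat, p_value = one_sided_p_value(0.4387, 19)
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```

The printed value can be compared with the level read from the lookup table. A one-sided test is used here, matching the hypothesis test described above.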

This is not too shabby. That having been said, we wouldn't place much emphasis on the correlation of the two series. We are more comfortable drawing what we feel are only the most robust inferences, e.g. that there is simultaneous evidence for a medieval peak which indeed might exceed current (1995-present) activity within our uncertainties, and that later centuries demonstrate a lull in activity prior to the recent rise.

Assertion #11: Doesn’t the inactive 2009 Atlantic tropical storm season (at least thus far, as of August 13 2009) disprove the relationships between climate and tropical cyclones argued for in the paper?

Actually, somewhat the contrary is true. Prior to the 2007 Atlantic tropical storm season, Mann and colleagues used the very same statistical model used in the current study to forecast the number of named storms that would occur. Their prediction (15 named storms) turned out to be spot on. Prior to the 2009 season, Mann and colleagues also made a prediction. Given the relatively cool tropical Atlantic SSTs going into the season and the possibility of a developing El Nino, they forecast that if an El Nino event indeed did emerge (which we now know it has), we would expect a total of between 6 and 12 named storms (a quite inactive season by modern standards). We see no evidence yet that this forecast is not realistic, but in a few months we’ll know for sure. Further details of the forecasts are available here.