Responses to Potential Misconceptions Regarding
"Atlantic hurricanes and climate over the past 1,500 years"
M.E. Mann, J.D. Woodruff, J.P. Donnelly, and Z. Zhang
(Nature, Aug 13, 2009).
Assertion #1: The
merging together of the more recent instrumental tropical cyclone data with the
proxy hurricane strike observations is not appropriate, since the sediment
overwash deposits are generally recording direct major hurricane strikes. To
attempt to infer basin-wide statistics from major hurricane strikes at four
sites is not appropriate.
Firstly, there is no merging of the sediment
overwash and historical TC count data, or in fact of any data in our
analysis. We simply compare
each of the three independent records (the historical TC record, the
sediment-based estimate, and the statistical model estimate) over the
historical interval (see inset of Figure 3 of article). For the purpose of
comparison, the sediment-based record of landfalling hurricanes is normalized
and centered so that it has the same scale as the other (TC count) records.
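As a rough illustration of what centering and scaling one series onto another can look like, the sketch below shifts and rescales a series so that its mean and standard deviation match those of a reference. The function name and the numbers are hypothetical, and the exact normalization used in the article may differ.

```python
from statistics import mean, pstdev

def match_mean_and_scale(series, reference):
    """Shift and rescale `series` to have the mean and spread of `reference`."""
    s_mean, s_std = mean(series), pstdev(series)
    r_mean, r_std = mean(reference), pstdev(reference)
    if s_std == 0:
        raise ValueError("series has no variance to rescale")
    return [r_mean + (x - s_mean) * (r_std / s_std) for x in series]

# Hypothetical values for illustration only (not data from the article).
sediment_index = [0.2, 0.5, 0.3, 0.8, 0.6]
tc_counts = [8.0, 11.0, 9.0, 14.0, 12.0]
scaled = match_mean_and_scale(sediment_index, tc_counts)
```

Only the mean and spread of the reference enter the transformation, so the shape of the rescaled series (its peaks, troughs, and trend) is left unchanged.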
As to the appropriateness of comparing long-term
datasets of landfalling hurricanes with the instrumental record of total named
storms (i.e. annual TC counts), we have addressed this in detail in the
article. As discussed in the article, we have made the reasonable assumption
that an adequately weighted combination (accounting for relative importance of
the sites as indicators of basin-wide activity—see response to assertion
#2 below) of the information from a set of sites spanning the range of regions
influenced by landfalling Atlantic hurricanes (not just 'major'
hurricanes—see response to assertion #2 below), at the centennial
timescales of interest to our study (where the temporal averaging allows for a
relatively robust signal even from a somewhat limited set of sites—see
reply to assertion #2 below) yields an overall signal that is likely to mirror
basin-wide activity. In figure 3 of the article, we compared the sediment
composite records directly against the observational record of basin-wide total
named storms (inset of Figure 3 of article). This comparison shows that the
weighted sediment composite record closely mirrors the historical basin-wide TC
series at the multidecadal timescales of interest, including the multidecadal
variability (i.e. the peaks during the late 19th century, mid 20th
century, and most recent 1-2 decades, and troughs during the early 20th
century and during the 1970s/1980s). The only information that was used from
the historical basin-wide TC record in the sediment composite series was the
centering and scaling of series for purpose of comparison, so the similarity in
the patterns of variability and trend is in no way built in to our analysis,
and instead provides independent confirmation for its validity. Furthermore, because the sediments are
not subject to the time-dependent observation biases and uncertainties which
inevitably will always leave the historical observational TC/hurricane record
in some dispute, the fact that the long-term variability in the sediment
composite record, as we have shown, appears to mirror that in the observational
record, serves to some degree as independent validation of the historical
record itself. In short, our appropriately
weighted sediment composite record is plausibly representative of total
basin-wide activity on the multidecadal
timescales of interest.
Assertion #2: A more appropriate comparison would be with simple
arithmetic average of the historical data on major hurricane strikes for the 4
sites used. Such a record shows differences for individual years with the
basin-wide record used in the manuscript.
There are several problems with this assertion.
First of all, "major" hurricanes means category
3 or larger hurricanes. This would eliminate many of the hurricanes
contributing to our chronology, since several of the regional sediment series
that were used (the New England and Mid-Atlantic composites), are sensitive to
the considerably more numerous category
2 or larger storms.
The assertion also misunderstands the nature of our
estimate. We are not simply computing the total summed activity among the sites
and regional composites used. Instead, we are attempting to estimate basin-wide
activity from these sites/composites. Statistically, these are not the same
thing. A straight average of the sites is equivalent to a 'uniform' weighting scheme. This is
not correct, for reasons spelled out in our manuscript and Supplementary
Information. Given a small number of sites, one needs to weight the sites with
respect to their inverse return periods, to obtain an appropriate estimate of
basin-wide activity: over a short period of time, it is possible to get more
events making landfall in e.g. New England than in e.g. Puerto Rico. Yet, an
understanding of hurricane climatology would tell you that in trying to
estimate the true basin-wide activity, you should give more weight to a record
from Puerto Rico, given that a much larger number of storms climatologically
make landfall in the Caribbean than in New England, and this record therefore
is more likely to inform an estimate of basin-wide trends. We have been
careful in our wording to guard against any such misinterpretation of our
analysis.
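The contrast between a uniform average and a weighting by inverse return period can be sketched as follows. This is one plausible reading of the scheme, assuming weights proportional to 1/(return period) and normalized to sum to one. The site names, return periods, and series are hypothetical, and the details in the article's Supplementary Information may differ.

```python
def inverse_return_period_weights(return_periods_years):
    """Weights proportional to 1 / return period, normalized to sum to 1."""
    rates = {site: 1.0 / rp for site, rp in return_periods_years.items()}
    total = sum(rates.values())
    return {site: rate / total for site, rate in rates.items()}

def weighted_composite(series_by_site, weights):
    """Weighted sum of the site series at each time step."""
    n = len(next(iter(series_by_site.values())))
    return [sum(weights[s] * series_by_site[s][t] for s in series_by_site)
            for t in range(n)]

# Hypothetical return periods (years) and event series, for illustration only.
return_periods = {"caribbean": 10.0, "new_england": 40.0}
series = {"caribbean": [1.0, 0.0, 2.0], "new_england": [0.0, 1.0, 0.0]}

weights = inverse_return_period_weights(return_periods)
composite = weighted_composite(series, weights)
```

With these made-up numbers the more frequently struck site receives four times the weight of the less frequently struck one, whereas a straight average would treat the two equally.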
Finally, the assertion, which focuses primarily on
interannual timescales, misses a key point, what is sometimes referred to as
the 'ergodic hypothesis'. The point is that the information gathered at a
sparse but representative set of locations over long timescales (i.e., the
multidecadal and longer timescales of interest in our study) is increasingly
likely to mirror the information contained in a more extensive network. The
principle is that what one loses in spatial sampling, one can gain back through
greater temporal sampling, to the extent that vagaries of interannual variability
are essentially stochastic. Now there are legitimate reasons why this may
not apply strictly in this case (and we're quite clear about this in the
manuscript), but it certainly motivates the view that one should be comparing
the records on multidecadal and longer timescales. As discussed in our
response to assertion #1 above, such a comparison shows that the sediment
composite record closely tracks the total basin-wide TC record at these longer
timescales: both indicate the same multidecadal pattern of variability.
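The trade-off between spatial and temporal sampling can be illustrated with a toy simulation. It is purely a sketch: the slowly varying basin-wide rate, the strike probability, and the 40-year block averaging are all assumptions chosen for illustration, not values or methods from the article. A single site records only a small random fraction of basin-wide storms, so its annual counts are dominated by noise, but averaging over longer blocks lets the shared slow signal emerge.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson variate with mean `lam` (Knuth's method; small lam only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def block_means(series, width):
    """Means over consecutive non-overlapping blocks of `width` samples."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]

rng = random.Random(0)
years, strike_prob = 1500, 0.05   # hypothetical record length and strike chance
basin, site = [], []
for t in range(years):
    # Hypothetical basin-wide rate with a slow 300-year oscillation.
    rate = 10.0 + 4.0 * math.sin(2.0 * math.pi * t / 300.0)
    storms = poisson(rng, rate)
    basin.append(storms)
    # Each basin-wide storm independently strikes the site with small probability.
    site.append(sum(1 for _ in range(storms) if rng.random() < strike_prob))

r_annual = pearson(basin, site)
r_40yr = pearson(block_means(basin, 40), block_means(site, 40))
```

In this toy setting the correlation between the site and the basin-wide series is expected to be considerably higher for the 40-year block means than for the annual values, which is the sense in which temporal averaging compensates for sparse spatial sampling when the year-to-year scatter is essentially random.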
Assertion #3: The study's findings are inconsistent with other studies
(Landsea 2005, Vecchi and Knutson 2008) asserting
that there is no statistically significant linear trend in landfalling major
hurricanes for the continental U.S.
In fact, there is no statistically significant linear trend in the
straight arithmetic sum of historical TC counts for the regions used in the
study.
See response to assertion #2 above: (1) Our
composite does not reflect only "major hurricanes" (cat 3 or larger), but in
fact reflects considerably more prevalent cat 2 storms for 2 of the 4 cases. (2)
Our analysis is not restricted to the continental U.S. but includes the
Caribbean. In fact, our analysis
is based on 5 (and arguably, essentially all) major regions for landfalling
Atlantic hurricanes (New England, mid-Atlantic, southeastern U.S. Atlantic,
Gulf Coast, and Caribbean). (3) We are not attempting to assess the mean number
of events across our domain. Indeed, we think that such a number is relatively
meaningless. As discussed in some detail, we are computing a weighted average
of the data which explicitly takes into account (through the estimated return
periods for different sites) their appropriate weighted contribution to any
estimated basin-wide average of TC activity. We have been careful in the
article to make sure that this key aspect of our analysis is clear to readers.
Finally, (4) whether or not there is a trend in historical Atlantic
hurricane or TC activity, regional or otherwise, is not the primary focus of
our paper. Indeed, in the first two sentences of our abstract we make clear
that the existence of a trend is a matter of current scientific dispute. And in
no way does the assumption of whether or not there is a modern trend in the
historical data enter into our analyses. Our statistical model is trained on
interannual variability, and the sediment composite data are entirely
independent of the historical record. To the extent that any modern positive
trends emerge in our analyses, they do so independently of whether there is a
trend in the historical data. The focus of our analysis is not on the modern
trends, but on the history of TC and hurricane activity prior to the historical
interval, and our primary conclusion, as expressed in the abstract, is that the
recent high levels of activity may indeed not be anomalous in the context of
the long-term history provided by our analysis.
Assertion #4: Other studies by e.g. Chang and
Guo (2007) and Vecchi and
Knutson (2008) quantify the number of "missing" Atlantic TCs
based upon the density of ship observations during the last century. Both
studies suggested that a significant upward trend remains in the counts of TCs when starting from about 1900, although the latter
paper found that the trend from 1878 onward was not significant.
Once again, this paper is not focused on whether
simple linear trends fit to different time intervals are statistically
significant. The model of a linear
trend is inappropriate in describing the time evolution of TCs
given the non-linear temporal pattern of the factors underlying long-term
changes in TC activity (see e.g. Mann, M.E., Emanuel, K.A.,
Atlantic Hurricane Trends linked to Climate Change, Eos, 87, 24, 233-241,
2006). Issues involving the reality and significance of any linear trends are
not in any case central to this paper. A more appropriate question is whether
the most recent activity (i.e., that since 1995) is anomalous in the context of
the historical record, and the answer to that question appears to be 'yes',
even using the upper range of published estimates (Landsea 2007, ref #3 of
our article) of the degree of undercount bias in the early part of the record
(see Mann et al 2007, reference #29 of our article). But even this is not
the focus of our manuscript. The focus of this manuscript is, instead, how the
level of activity recorded in the modern record compares against paleo-evidence
of the past 1500 years, using two entirely independent sources and approaches.
Assertion #5: Landsea et al. (2009) argue
that the increase in total TC frequency since the late 19th Century in the
database is primarily due to an increase in very short-lived TCs due to
improvements in the quantity and
quality of observations, along with enhanced interpretation techniques, which
allow storms to be better monitored and detected. When these storms are added
back in, there is no statistically significant linear trend since the late 19th
century.
Though we would note that the majority of teams who
have looked at the degree of undercount bias in the record find it
modest-to-minimal (no more than 1 or 2 missed storms per year) at least back
through the beginning of the 20th century, and the increase in
frequency over the past decade thus does appear anomalous in that context, we
stress that this is not the focus of this paper. Indeed, if we assume that the
historical undercount bias is at the upper end of what has been argued in the
published literature (Landsea, 2007; see refs.
#3 and #29 in our article), our key conclusion (that levels of activity during
the Medieval era might have equaled or even exceeded current levels of activity)
is actually strengthened, not
weakened.
Assertion #6: Mann et al. (2007) state that
their statistical modeling approach "assumes that past Atlantic TC activity
continues to have been influenced by the same three basin climate factors that
have primarily governed year-to-year variations in TC counts during the
historical period: Tropical Atlantic warmth...". Therefore it must assume a priori that the large trend in
Atlantic SSTs is linked to the trend in the Atlantic TC record for the 1940s
until today.
This assertion misunderstands the nature of the
statistical model. The statistical model used by Mann et al (2007; ref #3 in
our article) knows nothing about the long-term trend in TCs.
It is trained on the interannual (i.e. 'year-to-year') relationship between
predictors (including MDR SST, ENSO, and the NAO) and predictand (individual
annual TC counts). Any trends that emerge in model-predicted TC counts are an
emergent result of the model, produced purely by the behavior of the underlying
predictors. Indeed, as shown by Mann et al (2007), when the model is trained on
the first half of the record, ending in the mid 1940s, it successfully predicts
the subsequent rise of the past two decades. Kerry Emanuel (Emanuel et al,
2008) notes that the Mann et al (2007) statistical model exhibits a similar level
of skill to his dynamical downscaling approach (discussed further below in response
to assertion #7). That the model
successfully captures much of the long-term variability in the sediment-based
hurricane history in the current article indeed appears to provide some
additional long-term validation of the statistical model (though we felt no
need to actually state so in the manuscript).
Assertion #7: Recent modeling studies
regarding anthropogenic climate change impacts upon Atlantic TC frequency
(e.g., Chauvin et al. 2006, Bengtsson et al. 2007,
Emanuel et al. 2008) indicate little or no trend in TC counts in response to
warming SSTs. The authors' underlying assumptions are therefore not physically
valid.
Firstly, this is not an accurate characterization
of the recent literature. Emanuel et al (2008), using a particularly elegant dynamical downscaling approach, finds a
modest projected increase in Atlantic TC counts in response to anthropogenic
forcing averaged over all models, and an especially large increase when using
large-scale conditions from certain models (such as the GFDL
coupled model). In more recent work (K. Emanuel, pers. comm.) performed shortly after Emanuel et al
(2008) went to press, Emanuel finds, using a further refinement of the
technique, an increase in frequency averaged over all the models of the IPCC
AR4 assessment (SRES A1B scenario) from 13.5 to 15.1, and for the GFDL model a
much larger increase from 18.9 to 26.9 (an increase of more than 40%). The approach of
using regional climate models fed with large-scale boundary conditions used in
other studies looking at
anthropogenic impacts on TCs can be quite model
dependent. In some cases, the sign of the response can even be changed simply
by changing the nature of model parameterizations. For example, Yoshimura et al
(2006) found an increase in TC number over the Indian Ocean if the model used the
Kuo cumulus parameterization but a decrease if the Arakawa-Schubert cumulus parameterization scheme was used.
What models may or may not project with regard to
future climate change is in any case not necessarily relevant to interpreting
historical trends. We note, for example, that Knutson et al (2008), while
projecting little or no 21st century changes in TC counts when
driving their regional model with certain climate change projections,
nonetheless use as a validation of their approach the fact that their regional
model is able to reproduce the positive trend in Atlantic TC counts over recent
decades when driven with late 20th century reanalysis data. In fact, the model produces a 40%
larger trend than has actually been witnessed. Given the open questions that
still exist with climate model-based projections of future TC activity, we find
it premature at best to question observed historical trends (which is the focus
of our article) based on studies of the projected future behavior.
Assertion #8: The Atlantic basin is the only
one to have seen an increase in tropical cyclone frequency over the last few
decades. Thus the statistical model used by the authors is not valid.
We frankly find this criticism (which has indeed
been made against our study) particularly puzzling. It is unclear how the
observation that an increase in TC counts is seen only for the Atlantic constitutes
a shortcoming of our statistical model, since our model only attempts to assess
the influences on TC activity that are specific to the Atlantic basin. There
are aspects of the Atlantic (e.g. a particularly large area where SSTs are on
the cusp of the threshold necessary for supporting TC genesis) that make it
potentially unique in its response to modest increases in global SST. Indeed, the Emanuel et al (2008) theoretical
modeling study shows the Atlantic basin to be most sensitive, in terms of TC
activity (i.e. annual TC counts), to further increases in SST.
Assertion #9: Return periods calculated from
a 270 km radius, as in this study, are not appropriate for interpreting the
information from the sediment overwash deposits.
We
agree that the radius of influence for each site is likely less than 270 km,
and therefore the return periods for overwash at each site would likely be
longer than that predicted using this radius. However, these derived return
periods are used only to obtain relative weights when assimilating the
different records, with actual return frequencies determined by the
reconstructions themselves. The radius of 270 km was chosen in order to have a
large enough area for obtaining appropriate statistics using the HurRisk model, yet small enough that the return periods
reflect the relative activity at a site compared to the others within the
composite. In summary, results using the 270 km radius are only used to vary
the relative weighting for the different records and in no way affect the
reoccurrence rate of events within each reconstruction.
Assertion #10: The two different estimates of
past activity (sediments and proxy-climate driven statistical model) don't look
all that similar; there are discrepancies between them.
It is
true that there are discrepancies between these records. In fact, there is
substantial discussion given to the discrepancies (e.g. the 15th
century peak that appears in the sediment record but not the statistical model
reconstruction) and possible reasons for this, which include the caveats and
limitations specific to either approach discussed in the article.
A
statistical correlation is probably not the best comparison of the two
estimates, as a better question is whether or not the estimates are consistent
within their respective uncertainties, not whether or not all of the
multidecadal wiggles in the record are the same (they are not---and some of the
differences are interesting and are probably telling us something important as
well). That having been said, it
turns out that the correlation between the two records is nonetheless both
relatively high and statistically significant.
The two series are smoothed on timescales of 40
years and longer (i.e. using a filter with passband
centered at f=0.025 cycles/year) to emphasize the timescales of variability
they are likely to record most reliably. The correlation between the two
smoothed series during the 1350 year interval of overlap (AD 500-1849) is r=0.4387.
To evaluate the statistical significance of
this correlation properly, we must first account for the degrees of freedom in
the series being compared. The nominal number of effective samples n is in this case the length of
overlap (1350 years) divided by the effective sampling spacing (40 years); this gives n = 1350/40 = 33.75, or approximately 34.
However, this does not account for the reduced
degrees of freedom due to the autocorrelation present in each series. This is
evaluated from the lagged correlation between effectively independent samples,
which is a lag of 40 years for the 40 year smoothed series being compared. The effective number of samples is n'
= n(1-rho1*rho2)/(1+rho1*rho2), where rho1 and rho2 are the autocorrelations
of the two series at a lag of 40 years.
For the two time series in question we have
rho1 = 0.4387 and rho2 = 0.5559. This yields n'
= 34 (1-0.4387*0.5559)/(1+0.4387*0.5559) = 34 (1-0.2438)/(1+0.2438) = 34 (0.6079)
= 20.7, or approximately 21.
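The arithmetic of this step can be checked with a few lines of code. The sketch below follows the worked calculation above, in which the product rho1*rho2 appears in both numerator and denominator; the function and variable names are ours, chosen for illustration.

```python
def effective_sample_size(n_nominal, rho1, rho2):
    """Effective sample size adjusted for autocorrelation in both series."""
    return n_nominal * (1.0 - rho1 * rho2) / (1.0 + rho1 * rho2)

n_nominal = round(1350 / 40)            # 33.75 rounds to 34 nominal samples
n_eff = effective_sample_size(n_nominal, 0.4387, 0.5559)
dof = round(n_eff) - 2                  # degrees of freedom for the correlation
```

Rounding the effective sample size to the nearest integer before subtracting 2 reproduces the 19 degrees of freedom quoted in the text.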
The number of degrees of freedom in the
correlation is the number of effective samples minus 2, so there are
approximately n'-2= 19 effective
degrees of freedom in the correlation.
We must therefore determine the significance of
a correlation of r=0.4387 over the full 1350 year
overlap with 19 statistical degrees of freedom. A one-sided hypothesis test is required to establish statistical
significance, since we would reject anticorrelation
of the two series as failure.
Using online lookup tables, e.g. here: http://faculty.vassar.edu/lowry/tabs.html#r
(note that "N" here
is 21, and the degrees of freedom "N-2"
is 19), we find that the correlation of r=0.4387
is statistically significant at the p=0.02
level.
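As a cross-check on the table lookup, the one-sided significance can also be estimated directly by converting r to a Student's t statistic, t = r*sqrt(dof/(1-r^2)), and integrating the upper tail of the t distribution numerically. The sketch below uses only the Python standard library and Simpson's rule; the function names and step count are our own choices, and the result should be read as approximate.

```python
import math

def t_pdf(x, dof):
    """Probability density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1.0 + x * x / dof) ** (-(dof + 1) / 2)

def one_sided_p(r, dof, steps=2000):
    """Return (t, p) for a positive correlation r, with p the upper-tail probability."""
    t = r * math.sqrt(dof / (1.0 - r * r))
    # Simpson's rule for the integral of the density from 0 to t (steps must be even).
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    # Half of the probability lies above zero; subtract the part between 0 and t.
    return t, 0.5 - total * h / 3.0

t_stat, p_value = one_sided_p(0.4387, 19)
```

For r=0.4387 and 19 degrees of freedom this gives a t statistic a little above 2.1 and a one-sided p-value between 0.02 and 0.025, consistent with the tabulated result.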
This is not too shabby. That having been said,
we wouldn't place much emphasis on the correlation of the two series. We are
more comfortable drawing what we feel are only the most robust inferences, e.g.
that there is simultaneous evidence for a medieval peak which indeed might
exceed current (1995-present) activity within our uncertainties, and that later
centuries demonstrate a lull in activity prior to the recent rise.
Assertion #11: Doesn't the inactive 2009
Atlantic tropical storm season (at least thus far, as of August 13 2009)
disprove the relationships between climate and tropical cyclones argued for in the
paper?
Actually,
somewhat the contrary is true. Prior to the 2007 Atlantic tropical storm
season, Mann and colleagues used the very same statistical model used in the
current study to forecast the number of named storms that would occur. Their
prediction (15 named storms) turned out to be spot on. Prior to the 2009
season, Mann and colleagues also made a prediction. Given the relatively cool
tropical Atlantic SSTs going into the season and the possibility of a
developing El Nino, they forecast that if an El Nino event indeed did emerge
(which we now know it has), we would expect a total of between 6 and 12 named
storms (a quite inactive season by modern standards). We see no evidence yet
that this forecast is not realistic, but in a few months we'll know for sure.
Further details of the forecasts are available here.