Field Data Summary
In total, 50 plots were established: 39 conifer (>80% conifer dominant) and 11 mixed. For several reasons, the resulting frequency distribution of defoliation severity across stand types was not optimal (Figure 5). Topography rendered some of the pre-selected plots inaccessible. Defoliation severity did not appear to be evenly distributed across stand types (i.e. there was a low probability of finding severe defoliation in mixed stands). In addition, the ocular rating tended to overestimate light defoliation, which left few plots with moderate defoliation. The compounded effects of these issues altered the intended distribution of field plots across the desired attributes (i.e. defoliation severity, stand type, and age class).
If height can be used as a surrogate for age, there appears to be a sufficient age distribution across defoliation severity (Figure 6a). As noted above, mid-range defoliation was undersampled across all age classes, and lightly defoliated mature stands were not represented at all. This may be attributable to the stage of the outbreak at which the field samples were taken (i.e. most mature stands had already been severely affected).
To further understand how defoliation severity varies across conifer stands, the distribution can be broken down by species composition (Figure 6b). Most severely defoliated plots are balsam fir dominant, while plots exhibiting lower levels of defoliation are spruce dominant. These results are consistent with the literature (6,8). If there are significant differences in spectral response between conifer stand types, it may be useful to further dissect the data by species dominance.
Although an even sample from each stand type and age class across defoliation severity would be preferable, the uneven sample is not expected to affect the ability to discriminate defoliation severity across stand types.
Image Data Summary
Before attempting statistical analyses on the image data, it is important to investigate the relationships between raw and normalized image pairs. Figure 7 shows a sample of pixels under the pseudo-invariant mask used to derive the linear normalization equations. These graphs illustrate the importance of normalization between time-series images. There are no major differences in the near-infrared (NIR) wavelengths between the raw and normalized data of T1 and T2. However, there is a significant shift in the short-wave infrared (SWIR) after normalization. When detecting change, it is important that pseudo-invariant pixels show a 1:1 relationship between the time-series images.
The next stage of the analysis is to examine the spectral response to defoliation severity across different wavelengths (Figure 8). This can assist in identifying outliers, determining data distributions, choosing which vegetation indices to use, and deciding whether any transformations need to be applied to the data. Showing both the differences and the relative differences of all bands and vegetation indices would be redundant, so only the relative differences between T1 and T2 are shown.
Green, red and SWIR wavelengths show no obvious relationship to defoliation severity, which makes the identification of outliers difficult. It is also difficult to determine whether the distributions are normal and whether the relationship to cumulative defoliation is linear. Abnormal distributions in these wavelengths may indicate the presence of outliers, but may also reveal holes within the data (i.e. unsampled populations). The NIR, NDVI and ISR relative differences identify possible outliers (circled in red). Once these are removed, a potential linear relationship to cumulative defoliation emerges. Of the 4 outliers identified, 2 plots were mixed conifer (i.e. roughly 65% fir and 35% spruce) and appear in the top right of the NIR, SWIR and NDVI plots in Figure 8. Of the remaining two outliers, one plot is spruce dominant and the other has high species diversity. This suggests that homogeneous plots may be needed before a model of cumulative defoliation from multispectral imagery can be accurately calibrated.
Although transformations of the predictor variables may be required, determining the best transformation proved difficult and was deferred to the residual analysis.
Mapping the vegetation indices across the entire image also provides useful clues about the data. Simply observing the change in greenness between the time-series images allows qualitative inferences about the results. The ISR relative difference appears to delineate defoliation well (Figure 9).
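A minimal sketch of the normalization step follows. For one band, the gain and offset are estimated by ordinary least squares over the pixels under the pseudo-invariant mask and then applied to the whole subject image. The sketch assumes that T2 is normalized to T1, since the section does not say which date serves as the reference. The function names are hypothetical. Refitting the normalized pseudo-invariant pixels against the reference should return the 1:1 line (slope 1, intercept 0).

```python
def fit_pif_normalization(subject, reference):
    """Least-squares gain (slope) and offset (intercept) mapping subject-image
    values onto reference-image values, using only pseudo-invariant pixels."""
    n = len(subject)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired pseudo-invariant pixels")
    mean_x = sum(subject) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in subject)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(subject, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalize(values, slope, intercept):
    """Apply the gain and offset to every pixel of the subject band."""
    return [slope * v + intercept for v in values]


# Hypothetical pseudo-invariant pixel values for one band (T2 = subject, T1 = reference)
pif_t2 = [10, 20, 30, 40]
pif_t1 = [21, 41, 61, 81]
gain, offset = fit_pif_normalization(pif_t2, pif_t1)    # (2.0, 1.0)
pif_t2_norm = normalize(pif_t2, gain, offset)
check = fit_pif_normalization(pif_t2_norm, pif_t1)      # 1:1 line: (1.0, 0.0)
```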
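The sketch below shows one way the plotted relative differences could be computed. The section gives no formulas, so two definitions here are assumptions:

- the relative difference is taken as (T2 - T1) / T1;
- ISR is taken as the infrared simple ratio NIR / SWIR.

NDVI uses its standard definition. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def isr(nir, swir):
    """Infrared simple ratio, ASSUMED here to be NIR / SWIR (not defined in the text)."""
    return nir / swir


def relative_difference(t1, t2):
    """ASSUMED definition: change from T1 to T2 as a fraction of the T1 value."""
    return (t2 - t1) / t1


# Hypothetical normalized reflectances for one plot at T1 and T2
t1 = {"nir": 0.30, "red": 0.05, "swir": 0.15}
t2 = {"nir": 0.24, "red": 0.06, "swir": 0.16}

rd_nir = relative_difference(t1["nir"], t2["nir"])               # about -0.20
rd_ndvi = relative_difference(ndvi(t1["nir"], t1["red"]),
                              ndvi(t2["nir"], t2["red"]))       # about -0.16
rd_isr = relative_difference(isr(t1["nir"], t1["swir"]),
                             isr(t2["nir"], t2["swir"]))        # about -0.25
```

With these definitions, a loss of foliage lowers NIR and therefore pushes all three relative differences below zero.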
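The outliers above were identified visually. A hypothetical numerical screen could complement that inspection, sketched below. It fits a straight line of cumulative defoliation on one predictor and flags plots whose residual exceeds a chosen multiple of the residual standard error (standardized, not studentized, residuals). Note the following caveats:

- the threshold of 2 is an arbitrary choice;
- with few plots, one extreme point inflates the residual standard error;
- flagged plots should still be checked against the field records, such as stand composition.

The toy data are hypothetical.

```python
def flag_outliers(x, y, threshold=2.0):
    """Return indices of plots whose residual from a least-squares line
    exceeds `threshold` times the residual standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters)
    s = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs(r) / s > threshold]


# Hypothetical plots: defoliation rises linearly except for plot 4
predictor = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
defoliation = [0, 10, 20, 30, 90, 50, 60, 70, 80, 90]
print(flag_outliers(predictor, defoliation))  # [4]
```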
The preliminary results indicate that defoliation severity can be detected using multispectral satellite imagery.
Statistical Analyses
DISCRIMINANT ANALYSIS
Of all the datasets run through the CANDISC procedure, only 2 provided meaningful results: all plots (ALL) and conifer plots (CONIFER). The results for the remaining datasets were erroneous, with every plot within a given severity class receiving the same scores. This was thought to be attributable to the small number of samples in each severity class within each subset (i.e. more predictor variables than observations within a given class).
Two canonical discriminants were found for both ALL and CONIFER; however, only the first discriminant is significant for either dataset (Table 2).
Although only one significant canonical discriminant was found, it accounts for a large proportion of the variation in the data (93% and 87% for ALL and CONIFER, respectively). The F statistic and the variation explained suggest that defoliation severity may be more readily discriminated in pure conifer stands than across variable stand types. It may prove useful to collect more data from homogeneous stand types across severity classes in order to derive stand-specific models (i.e. conifer and mixed).
The results presented in Figures 10 and 11 suggest that a multidimensional approach may not be appropriate for discriminating defoliation severity (for graphical purposes, only the relative differences are displayed). There is some redundancy in the predictor variables: the band relative differences are negatively correlated with discriminant 1, whereas the indices are positively correlated with it. Using the indices together with the bands they are derived from may therefore add little. The first canonical discriminant separates lightly defoliated plots from more severely defoliated plots in both ALL and CONIFER. Discriminating moderate from severe defoliation appears more difficult. Along the second canonical discriminant, discriminatory power is better (although not significant) in CONIFER than in ALL. It appears that discriminating classes of defoliation severity is either not a multidimensional problem, or that normality violations are present in the input variables.
It is proposed that the difficulty in discriminating defoliation severity classes arises from the classification of severity itself. Imposing arbitrary boundaries on a continuous variable inevitably creates overlap between classes. It therefore seems more suitable to model defoliation severity directly as a continuous variable.
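For reference, the quantities behind the 93% and 87% figures can be written out. In canonical discriminant analysis, the technique the CANDISC procedure implements, the discriminants are eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$. Here $\mathbf{W}$ and $\mathbf{B}$ are the pooled within-class and between-class sums-of-squares-and-cross-products matrices of the $p$ predictors, and $g$ is the number of severity classes:

```latex
\mathbf{W}^{-1}\mathbf{B}\,\mathbf{v}_k = \lambda_k\,\mathbf{v}_k,
\qquad \lambda_1 \ge \lambda_2 \ge \dots,
\qquad k = 1, \dots, \min(p,\, g-1)

\text{proportion explained by discriminant } k
  = \frac{\lambda_k}{\sum_{j=1}^{\min(p,\,g-1)} \lambda_j}
```

Assuming three severity classes (light, moderate and severe), $g - 1 = 2$, which is consistent with the two discriminants found. With two discriminants, a proportion of 93% implies $\lambda_1/\lambda_2 = 0.93/0.07 \approx 13.3$ for ALL, and 87% implies $\lambda_1/\lambda_2 = 0.87/0.13 \approx 6.7$ for CONIFER. For $N$ plots, $\mathbf{W}$ has rank at most $N - g$, so it cannot be inverted when $N - g < p$. This is one plausible mechanism for the degenerate results obtained for the smaller subsets.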
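A toy classifier illustrates the boundary problem. The class boundaries below (30% and 70% cumulative defoliation) are hypothetical; the section does not state the boundaries used in the ocular rating.

```python
# HYPOTHETICAL class boundaries in % cumulative defoliation (upper bound, label)
BOUNDARIES = [(30.0, "light"), (70.0, "moderate")]


def severity_class(defoliation_pct):
    """Map a continuous defoliation value onto a discrete severity class."""
    for upper, label in BOUNDARIES:
        if defoliation_pct < upper:
            return label
    return "severe"


# Two plots 0.4 points apart fall in different classes...
print(severity_class(29.8), severity_class(30.2))  # light moderate
# ...while two plots 39 points apart share a class.
print(severity_class(30.2), severity_class(69.2))  # moderate moderate
```

Any error in the ocular rating near a boundary, such as the tendency to overestimate light defoliation, therefore moves plots between classes. A continuous response avoids this.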
MANOVA
The discriminant analysis reveals relationships between the predictor and response variables; however, it does not determine the significance of each predictor variable. The MANOVA test identifies the predictor variables with the highest correlation to cumulative defoliation. Significant variables can be identified from the r-square, root mean square error (RMSE), F-statistic and p-value (Table 3). Only variables with a p-value <= 0.005 were selected for further analysis.
The MANOVA test using all plots suggests that all of the predictor variables (i.e. band and index relative differences) significantly explain the variation observed in cumulative defoliation. Table 3 shows that all of the predictor variables are almost equally correlated with cumulative defoliation. However, the relative differences in NDVI and ISR have low RMSE and large F-statistics. This suggests that these 2 variables may be the most useful for assessing cumulative defoliation with multispectral satellite imagery. The relative differences of the individual bands are still significantly correlated, but their higher RMSE suggests more variability within the data.
The MANOVA test using only conifer plots tells a slightly different story. All of the input variables were significant except the red relative difference. The correlation coefficients differ considerably from those obtained using all of the plots. The vegetation indices still have the lowest RMSE; however, ISR fares much worse than NDVI. For conifer plots, the green and near-infrared wavelengths and the normalized difference vegetation index appear to have the most impact on modelling defoliation severity.
The difference in output between using all plots and using only conifer plots may reflect the variation in spectral response introduced by stand heterogeneity. Despite differences between the variables within a given run, all variables were used in the multiple regression analysis for their respective groups.
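The per-variable statistics in Table 3 (r-square, RMSE and F) can be computed for one predictor at a time with a simple linear regression, as sketched below. That Table 3 was produced exactly this way is an assumption. The sketch only shows how the three statistics relate for a single predictor and $n$ plots, using hypothetical data.

```python
def simple_regression_stats(x, y):
    """r-square, RMSE (root mean square error, n - 2 df) and F (1, n - 2 df)
    for a least-squares line of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # residual SS
    sst = sum((b - my) ** 2 for b in y)                                 # total SS
    r2 = 1.0 - sse / sst
    rmse = (sse / (n - 2)) ** 0.5
    f_stat = (sst - sse) / (sse / (n - 2))  # model SS (1 df) over mean squared error
    return r2, rmse, f_stat


r2, rmse, f_stat = simple_regression_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# r2 = 0.6, rmse = sqrt(0.8) ~ 0.894, f_stat = 4.5
```

For one predictor, F = r²(n - 2) / (1 - r²) with (1, n - 2) degrees of freedom. Its p-value is the upper tail of that F distribution, which is what would be compared against the 0.005 cutoff. With SciPy available, for example, this is `scipy.stats.f.sf(f_stat, 1, n - 2)`.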
MULTIPLE REGRESSION
The output equations from the multiple regression were significant. The models explained roughly 70% of the variation over all defoliated forested areas and 79% of the variation over defoliated coniferous forest (Table 4). The parameters estimated using all plots show that only the relative differences in NDVI, ISR, red and short-wave infrared significantly affect the model of the response variable (Table 5). Of those variables, SWIR and NDVI have the most influence. The model derived for conifer plots shows that only the relative differences in ISR, NIR and SWIR significantly affect the model of cumulative defoliation. The intuitive nature of this result lends support to the model. Interestingly, the bands themselves have a greater influence on the model than the index derived from them. The large value and error associated with the intercept are troubling and suggest that further refinement of the model is needed.
When each predictor variable's relationship to cumulative defoliation is examined independently, the vegetation indices and the near-infrared wavelength are weakly (but significantly) correlated with it (Table 6). Multicollinearity may weaken the ability to assess the impact of a given predictor variable, but it should not affect the predictive ability of the model as a whole.
Despite the potential problems of including insignificant parameters, the models were nonetheless applied to the image for their respective cover types (i.e. the conifer model applied to coniferous forest, and the all-plots model applied to mixed forest) (Figure 12). In Figure 12a, brown areas in T2 (which appear green in T1) are thought to represent defoliated areas, and the model classifies them as moderately to severely defoliated. These results show that cumulative defoliation can be modelled from multispectral satellite imagery. However, light defoliation is clearly overestimated and severe defoliation underestimated.
To assess how well the model fares, residuals between predicted and observed cumulative defoliation were plotted (Figure 13). The residual plot confirms the suspicion that the model overestimates defoliation at low levels and underestimates it at high levels. The data appear to require transformation before an accurate model can be fit.
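A multiple regression of the kind summarized in Tables 4 and 5 can be sketched in pure Python by solving the normal equations. This is an illustrative stand-in, not the software used for the analysis. The toy data are constructed to satisfy y = 1 + 2*x1 - 3*x2 exactly, so the fit should recover those coefficients with R² = 1.

```python
def ols_fit(rows, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    rows: list of predictor tuples (x1, ..., xp), one per plot.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Returns (coefficients with b0 first, r_squared).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # intercept column first
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(k)]
        + [sum(xr[i] * yi for xr, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:  # crude absolute singularity check
            raise ValueError("X'X is singular: predictors are perfectly collinear")
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coef, xr)) for xr in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - sse / sst


rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
coef, r2 = ols_fit(rows, y)  # coef ~ [1.0, 2.0, -3.0], r2 ~ 1.0
```

The normal equations are used here for brevity. They square the condition number of the design matrix. With correlated predictors, such as a band and an index derived from it, an SVD-based solver such as `numpy.linalg.lstsq` is the safer choice.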
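The variance inflation factor (VIF), a standard diagnostic not reported in this section, quantifies the effect of multicollinearity on individual coefficients. $R_j^2$ is the coefficient of determination from regressing predictor $j$ on all other predictors in the model, $\sigma^2$ is the error variance, and $s_{x_j}^2$ is the sample variance of predictor $j$:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{(n-1)\,s_{x_j}^2}\,\mathrm{VIF}_j
```

For example, suppose the NDVI relative difference could be predicted from the NIR and red relative differences with $R_j^2 = 0.9$. Then $\mathrm{VIF}_j = 1/0.1 = 10$, so the sampling variance of that coefficient is ten times what it would be with uncorrelated predictors. The fitted values are not inflated in this way, which is why collinearity mainly obscures the contribution of individual predictors.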
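Applying the two models by cover type amounts to choosing a coefficient set for each pixel from a cover-type map. The sketch below assumes that per-pixel predictor values and a cover-type label are already available. The coefficient values and predictor names are hypothetical placeholders, not the estimates in Table 5.

```python
def predict_pixel(features, model):
    """Linear prediction: intercept plus sum of coefficient * predictor value."""
    return model["intercept"] + sum(
        coef * features[name] for name, coef in model.items() if name != "intercept"
    )


def map_defoliation(pixels, cover_types, models, cover_to_model):
    """Predict cumulative defoliation pixel by pixel.

    pixels: list of dicts mapping predictor name -> relative difference.
    cover_types: parallel list of cover labels from a forest cover map.
    cover_to_model: which model applies to each cover label; pixels with an
    unmapped label (e.g. non-forest) are left as None.
    """
    predictions = []
    for features, cover in zip(pixels, cover_types):
        model_name = cover_to_model.get(cover)
        if model_name is None:
            predictions.append(None)
        else:
            predictions.append(predict_pixel(features, models[model_name]))
    return predictions


# HYPOTHETICAL coefficients and predictor names, for illustration only
models = {
    "conifer": {"intercept": 10.0, "rd_nir": -50.0},
    "all_plots": {"intercept": 5.0, "rd_ndvi": -40.0},
}
cover_to_model = {"coniferous": "conifer", "mixed": "all_plots"}
pixels = [
    {"rd_nir": -0.2, "rd_ndvi": -0.1},
    {"rd_nir": -0.2, "rd_ndvi": -0.1},
    {"rd_nir": 0.0, "rd_ndvi": 0.0},
]
result = map_defoliation(pixels, ["coniferous", "mixed", "water"], models, cover_to_model)
# result ~ [20.0, 9.0, None]
```

The linear predictions are not clipped, so a pixel can receive a value below 0% or above 100% cumulative defoliation. Clipping or a transformed response would be needed if the maps are to be read as percentages.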
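The residual pattern described above can be summarized numerically, as sketched below:

- split the plots at a chosen level of observed defoliation;
- average the residuals (observed minus predicted) on each side of the split.

The split point and the values are hypothetical. A negative mean residual below the split indicates overestimation at low defoliation. A positive mean above it indicates underestimation at high defoliation.

```python
def mean_or_nan(values):
    """Arithmetic mean, or NaN for an empty group."""
    return sum(values) / len(values) if values else float("nan")


def residual_pattern(observed, predicted, split):
    """Mean residual (observed - predicted) for plots with observed
    defoliation below `split` and at or above it."""
    low = [o - p for o, p in zip(observed, predicted) if o < split]
    high = [o - p for o, p in zip(observed, predicted) if o >= split]
    return mean_or_nan(low), mean_or_nan(high)


# Hypothetical plots showing the pattern in Figure 13
observed = [10, 20, 80, 90]
predicted = [25, 30, 65, 70]
print(residual_pattern(observed, predicted, split=50))  # (-12.5, 17.5)
```

A systematic sign change like this, rather than residuals scattered around zero, is the signal that a transformation or a non-linear form is needed.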