Monica Bobra and Sebastien Couvidat
W.W. Hansen Experimental Physics Laboratory, Stanford University, Stanford, CA 94305
We attempt to forecast M- and X-class flares using a machine-learning algorithm, the support vector machine (SVM), and four years of HMI data. Most flare forecasting efforts described in the literature use either line-of-sight magnetograms or a relatively small number of ground-based vector magnetograms. This is the first time a large dataset of vector magnetograms has been used. The soft-margin SVM with Gaussian kernels that we implement here is a binary non-linear classifier: after training on a catalog of examples characterized by a set of features, the SVM predicts whether new examples belong to the positive or the negative class. By convention, we assign the positive class to flaring active regions (ARs), and the negative class to non-flaring ARs.
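For reference, the usual textbook form of a soft-margin SVM with a Gaussian kernel can be written as below. The notation is assumed here rather than taken from the abstract: labels y_i in {+1, -1}, feature vectors x_i, slack variables \xi_i, a regularization constant C, and a kernel width parameter \gamma.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{subject to}\quad
y_i\,\bigl(w\cdot\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,

K(x_i, x_j) = \phi(x_i)\cdot\phi(x_j) = \exp\!\bigl(-\gamma\,\lVert x_i - x_j\rVert^{2}\bigr).
```

In this form, C sets how strongly training errors are penalized (the "soft" margin), and \gamma sets how quickly the influence of a training example falls off with distance in feature space.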
We build a catalog of positive (303) and negative (5000) examples, characterized by 25 SHARP parameters. For the flaring ARs, we select the SHARP parameters 24 hours before the peak GOES X-ray flux. We follow the definitions of : they posited two forms, operational and segmented, of associations between ARs and flares. For the operational case, an AR that flares within 24 hours after a sample time belongs to the positive class. Conversely, an AR that does not flare within 24 hours after a sample time belongs to the negative class. The positive class is defined in the same way for the segmented case; however, the negative class is defined differently: an AR belongs to the negative class only if it does not flare within ±48 hours of the sample time. We train and test the SVM by randomly separating our catalog of examples into a training set (70% of the data) and a testing set (30%), and by selecting the SVM parameters that provide the best results. The SVM routine comes from the Scikit-Learn module for the Python programming language.
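The two labeling conventions can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `label_sample`, the use of hours as the time unit, the handling of the interval boundaries, and the choice to return `None` for segmented-mode samples that are neither positive nor negative are all assumptions.

```python
from typing import Optional, Sequence


def label_sample(sample_time_h: float,
                 flare_peak_times_h: Sequence[float],
                 mode: str = "operational") -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (sample left unlabeled).

    Times are in hours on a common clock; flare_peak_times_h lists the peak
    times of the major flares produced by one active region (hypothetical input).
    """
    # Positive class, identical in both modes:
    # the region flares within 24 hours after the sample time.
    if any(0.0 < t - sample_time_h <= 24.0 for t in flare_peak_times_h):
        return 1
    if mode == "operational":
        # Negative class: no flare within 24 hours after the sample time.
        return 0
    if mode == "segmented":
        # Negative class: no flare within +/- 48 hours of the sample time.
        if all(abs(t - sample_time_h) > 48.0 for t in flare_peak_times_h):
            return 0
        # Assumption: samples that fall in between are not used.
        return None
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, a sample taken 30 hours before a flare is a negative example in operational mode but is left out of the catalog in segmented mode.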
To estimate the performance of the SVM, we compute metrics such as recall (characterizing the ability of the classifier to find all of the positive examples), precision (characterizing the ability not to label as positive an example that is negative), and the Heidke Skill Score (HSS; measuring the improvement of the forecast over a random one). Here, however, we focus on the true skill statistic (TSS), following . In solar-flare forecasting, the two classes are imbalanced: there are many more negative examples than positive ones, reflecting the fact that most ARs do not produce major flares in any given 24- or 48-hour period. This class imbalance is a major issue for machine-learning algorithms: a classifier may strongly favor the majority class and neglect the minority one.
Always predicting that an AR will not flare is then likely to give seemingly very good results. The class imbalance also strongly affects many performance metrics. For instance, the accuracy (the ratio of the number of correct predictions to the total number of predictions) is meaningless: a classifier always predicting that an AR will not flare would achieve a high accuracy even though it would be useless for our purpose. Similarly, precision and HSS exhibit some dependence on the class imbalance in the testing set. To alleviate this problem,  suggest the use of the TSS, also known as the Hanssen-Kuipers skill score or Peirce skill score. The TSS is the recall minus the false alarm rate. It reaches 1 only when every example is classified correctly; any misclassification, of either positive or negative examples, reduces the score accordingly.
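The metrics discussed above can be computed from the four entries of a confusion matrix, as in the following sketch. The function name `skill_scores` and the example counts are illustrative assumptions, not values from this work; the false alarm rate is taken to be FP / (FP + TN), and the HSS is written in one commonly used form that compares the forecast with a random one.

```python
def skill_scores(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute forecast metrics from true/false positives and negatives."""
    total = tp + fn + fp + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total
    # TSS: recall minus false alarm rate.
    tss = recall - false_alarm_rate
    # HSS (one common form): improvement over a random forecast.
    hss_den = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    hss = 2.0 * (tp * tn - fn * fp) / hss_den if hss_den else 0.0
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "tss": tss, "hss": hss}


# Illustrative imbalanced test set: 91 positive and 1500 negative examples.
# A classifier that always predicts "no flare":
always_negative = skill_scores(tp=0, fn=91, fp=0, tn=1500)
# A classifier that gets every example right:
perfect = skill_scores(tp=91, fn=0, fp=0, tn=1500)

print(always_negative["accuracy"], always_negative["tss"])
print(perfect["accuracy"], perfect["tss"])
```

With these illustrative counts, the always-negative classifier reaches an accuracy above 94% while its TSS is exactly zero, which is the behavior that motivates preferring the TSS on imbalanced data.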
The TSS is equitable: random or constant forecasts score 0, perfect forecasts score 1, and forecasts that are always wrong score -1. Here, we obtain high TSS scores and good overall predictive abilities: the highest TSS reached in operational mode is 0.76, while it is 0.82 in segmented mode. These high TSS values are accompanied by a low precision (only 50% in segmented mode and 42% in operational mode) but a high recall. By using other SVM parameters we can compromise between a high TSS (e.g. 0.703) and a high HSS (0.737 in segmented mode). We surmise that these high values are partly due to fine-tuning the SVM for this purpose and also to an advantageous set of features calculated from vector magnetograms.
We also apply a feature selection algorithm based on the univariate Fisher ranking score to determine which of our 25 features are useful for discriminating between flaring and non-flaring ARs. We conclude that only a handful are needed for good predictive abilities (see Figure 1): the total unsigned current helicity, total magnitude of the Lorentz force, total photospheric magnetic free energy density, and total unsigned flux. This confirms the conclusion found in .
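A univariate Fisher ranking score can be sketched as below. This uses one common two-class form of the score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances); whether it matches the exact variant used in this work is an assumption, and the function name `fisher_score` and the toy feature values are purely illustrative.

```python
from statistics import mean, variance
from typing import Sequence


def fisher_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Score one feature from its values in the positive and negative classes.

    Each class needs at least two values so that a sample variance exists.
    """
    overall = mean(list(pos) + list(neg))
    # Between-class separation of the class means from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class scatter: sample variances (n - 1 in the denominator).
    within = variance(pos) + variance(neg)
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return between / within


# Toy values for two hypothetical features (not SHARP data).
features = {
    "feature_a": ([9.0, 10.0, 11.0], [1.0, 2.0, 3.0]),  # well separated
    "feature_b": ([4.0, 5.0, 6.0], [4.5, 5.0, 5.5]),    # overlapping
}
ranking = sorted(features, key=lambda k: fisher_score(*features[k]), reverse=True)
print(ranking)
```

Features are scored one at a time, so a ranking of this kind does not account for correlations between features; it only indicates how well each feature separates the two classes on its own.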
 Leka, K.D., & Barnes, G. 2007, Astrophysical Journal, 656, 1173
 Bobra, M.G., Sun, X., Hoeksema, J.T., et al. 2014, Solar Physics, 289, 3549
 Ahmed, O.W., Qahwaji, R., Colak, T., et al. 2013, Solar Physics, 283, 157
 Bloomfield, D.S., Higgins, P.A., McAteer, R.T.J., & Gallagher, P.T. 2012, Astrophysical Journal, 747, L41
 Leka, K.D., & Barnes, G. 2003, Astrophysical Journal, 595, 1296