Machine-learned Identification of RR Lyrae Stars from Sparse, Multi-band Data: The PS1 Sample
Published: May 1, 2017
Latest article update: Nov. 29, 2022
RR Lyrae stars may be the best practical tracers of Galactic halo (sub-)structure and kinematics. The PanSTARRSl (PSI) Зтг survey offers multi-band, multi-epoch, precise photometry across much of the sky, but a robust identification of RR Lyrae stars in this data set poses a challenge, given PSl’s sparse, asynchronous multi-band light curves (<12 epochs in each of five bands, taken over a 4.5 year period). We present a novel template fitting technique that uses well-defined and physically motivated multi-band light curves of RR Lyrae stars, and demonstrate that we get accurate period estimates, precise to 2 s in >80% of cases. We augment these light-curve fits with other features from photometric time-series and provide them to progressively more detailed machine- learned classification models. From these models, we are able to select the widest (three-fourths of the sky) and deepest (reaching 120 kpc) sample of RR Lyrae stars to date. The PSI sample of ~45,000 RRab stars is pure (90%) and complete (80% at 80 kpc) at high galactic latitudes. It also provides distances that are precise to 3%, measured with newly derived period-luminosity relations for optical/near-infrared PSI bands. With the addition of proper motions from Gaia and radial velocity measurements from multi-object spectroscopic surveys, we expect the PSI sample of RR Lyrae stars to become the premier source for studying the structure, kinematics, and the gravitational potential of the Galactic halo. The techniques presented in this study should translate well to other sparse, multi-band data sets, such as those produced by the Dark Energy Survey and the upcoming Large Synoptic Survey Telescope Galactic plane sub-survey.
Statistical - stars, variables: RR Lyrae - surveys Supporting material, machine-readable tables, halo - methods, data analysis - methods, catalogs - Galaxy
Studies of the Galactic halo can help constrain the formation history of the Milky Way and the galaxy formation process in general. For example, in a recent theoretical study, Hargis et al. (2014) suggested that there may be between 300 and 700 low-luminosity () dwarf satellite galaxies orbiting the Milky Way within 300 kpc of the Sun. However, the census of such low-luminosity galaxies is currently complete only within ∼45 kpc of the Sun (Table 3 of Koposov et al. 2008), and only at high galactic latitudes (). A deeper, wider, and more complete census of Milky Way dwarfs would be extremely valuable because it would allow us to test our assumptions about ΛCDM cosmology and galaxy formation, by comparing the observed distribution and properties of discovered dwarfs against those present in state-of-the-art hydrodynamic simulations, such as the APOSTLE (Sawala et al. 2016) and FIRE/Gizmo simulation suites (Hopkins et al. 2014).
Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the authors) and the title of the work, journal citation and DOI. hydrodynamic simulations, such as the APOSTLE (Sawala et al. 2016) and FTRE/Gizmo simulation suites (Hopkins et al. 2014).
The Galactic halo also contains remnants of accreted satellites (i.e., dwarf galaxies and globular clusters) that were disrupted by tidal forces and stretched into stellar tidal streams and clouds (e.g., Ibata et al. 2001; Belokurov et al. 2007; Sesar et al. 2015; Bernard et al. 2016). Stellar streams are especially interesting because their orbits are sensitive to the properties of the Galactic potential (e.g., its shape and total mass of the Milky Way) and thus can be used to constrain it over the range of distances spanned by the streams (e.g., Koposov et al. 2010; Newberg et al. 2010; Sesar et al. 2013; Belokurov et al. 2014). For example, the total mass of the Milky Way is currently uncertain at a factor of two, and its more precise measurement using stellar streams may help resolve (or further aggravate) some apparent issues in the theory of galaxy formation, such as the so-called “Too-Big-To-Fail” problem (Boylan-Kolchin et al. 2011; Wang et al. 2012). A more precise measurement of the total mass requires detailed modeling of stellar streams, such as the Sagittarius stream, as well as precise 3D kinematics and positions of stars in distant streams (Price-Whelan et al. 2014). While the Gaia mission (Perryman et al. 2001) can, and will, deliver precise proper motions of halo stars, in most cases Gaia's parallax estimates will only be marginal beyond a few kiloparsecs. Studies focusing on the 6D phase space structure of the Galactic halo will need to rely on tracers with precise (spectro-)photometric distances for the foreseeable future, making “standard candles,” such as RR Lyrae stars, enormously valuable.
To measure the total mass of the Milky Way and find the faintest dwarf satellites, we need to trace the spatial and kinematic structure and substructure (i.e., stellar streams) of the Galactic halo over the greatest possible distances and with the highest possible precision in distance and velocity, and the best tracers for this task are RR Lyrae stars.
RR Lyrae stars are old ( Gyr), metal-poor ( dex), pulsating horizontal branch stars with periodically variable light curves (periods ranging from 0.2 to 0.9 days; Smith 2004). They are bright stars ( mag) with distinct light curves, which makes them easy to identify with time-domain imaging surveys, even to large distances (5–120 kpc for surveys with a magnitude range; e.g., Sesar et al. 2010). These properties, and the fact that almost every Milky Way dwarf satellite galaxy has at least one RR Lyrae star (including the faintest one, Segue 1; Simon et al. 2011), open up the exciting possibility of locating very-low-luminosity Milky Way dwarf satellites by using distant RR Lyrae stars, as first proposed by Sesar et al. (2014; also see Baker & Willman 2015).
RR Lyrae stars are also precise standard candles (i.e., their intrinsic brightness is well-determined). While distances to RR Lyrae stars can be measured with 3% uncertainty using optical data (Section 3.3), thanks to a tight period-luminosity relation in the near-infrared, distances to RR Lyrae stars can be measured with 2% or better precision using, for example, К-band observations (Braga et al. 2015; Beaton et al. 2016). Having precise distances is crucial for measuring tangential velocities and thus the Galactic potential, as the uncertainty in tangential velocity increases proportionally with the uncertainty in distance.
As made evident by several existing catalogs of Milky Way halo RR Lyrae stars (e.g., Vivas et al. 2001; Keller et al. 2008; Miceli et al. 2008; Sesar et al. 2010, 2013; Drake et al. 2013; Abbas et al. 2014), selection of RR Lyrae stars has become an almost routine procedure, as long as one has access to ∼40 or more observations per star in a single photometric bandpass. While very useful for many Galactic studies, the above catalogs are not ideal: they are either deep with limited sky coverage (e.g., the SDSS Stripe 82 catalog covers 100 deg2 and is complete up to 110 kpc, Sesar et al. 2010) or have wide coverage but are not very deep (e.g., the CRTS catalog covers 20,000 deg2 and is complete up to 30 kpc, Drake et al. 2013). In addition, none of the above catalogs cover the Galactic plane, and thus cannot support studies of the old ( Gyr) Galactic disk. Currently the only existing imaging survey that has the potential to overcome all of the above drawbacks, and provide a deep and wide-area catalog of RR Lyrae stars in the northern skies (), is the Pan-STARRS1 (PS1) survey.
Even though the PS1 survey holds great potential for Galactic studies due to its depth and sky coverage, it is a challenging data set for selection of RR Lyrae stars due to its sparse temporal coverage, cadence, and asynchronous multi-band observations (see Figure 1). As we described in our previous work (Hernitschek et al. 2016), we overcame these challenges by characterizing time-series of PS1 sources using three statistics: a -based variability indicator, a variability amplitude (in the -band) , and a variability timescale τ, where the latter two were obtained by fitting a damped random walk model to observed PS1 multi-band structure functions (see Section 3.2 of Hernitschek et al. 2016). When applied to the second internal PS1 data release (PV2), our approach yielded a candidate sample of ∼150,000 RR Lyrae stars covering three quarters of the sky and reaching up to 120 kpc from the Sun.
Building on the work by Hemitschek et al. (2016), in this paper, we use the final PSI data release (PV3) to significantly increase the completeness and purity of the PSI sample of RR Lyrae stars. Compared to Hemitschek et al. (2016), we achieve higher completeness and purity by (1) having more observation epochs per object (72 in PV3 versus 55 in PV2), (2) by excising fewer and thereby retaining more of these observations (using a machine-learning algorithm that more efficiently identifies bad photometric data), (3) by building a more detailed machine-learned model of RR Lyrae stars in data space, and (4) by developing and running a CPU-intensive multi-band light-curve fitting on PSI time-series, thereby directly determining the RR Lyrae periods. The purer samples of RR Lyrae stars that this work delivers are especially important for studies of the Galactic halo (e.g., when searching for low-luminosity dwarf satellites), as stars incorrectly identified as RR Lyrae stars may cause the appearance of spurious halo substructures (Sesar et al. 2010).
From an observational point of view, RR Lyrae stars are A- F-type stars with distinct, periodically variable light curves. In the following sections, we describe data that capture these properties of RR Lyrae stars.
Pan-STARRS1 (Kaiser et al. 2010, PS1) is a wide-field optical/near-IR survey telescope system located at the Haleakalā Observatory on the island of Maui in Hawai'i. The largest survey undertaken by the telescope, the PS1 survey (Chambers 2011), has observed the entire sky north of decl. in five filter bands (Stubbs et al. 2010; Tonry et al. 2012), reaching single-epoch depths of about 22.0, 22.0, 21.9, 21.0, and 19.8 mag in , , , , and bands, respectively. The uncertainty in photometric calibration of the survey is mag (Schlafly et al. 2012), and the astrometric precision of single-epoch detections is 10 mas (Magnier et al. 2008).
The PSI Зтг survey aimed to observe each position in two pairs of exposures per filter per year, where the observations within each so-called transit-time-interval pair were taken ~25 minutes apart and in the same band. Thus, the survey should have obtained about 16 observations in each band (for a total of 80), but due to bad weather and telescope downtime, fewer epochs were observed (~70 on average).
Unlike Hernitschek et al. (2016, see their Section 2.2), we do not use bit-flags or other ad hoc procedures to exclude detections that may appear as non-astrophysical photometric outliers in PS1 time-series data (e.g., badly calibrated data, blended objects, etc.). We define a non-astrophysical photometric outlier as a photometric measurement that deviates by more than from its "expected" value, where σ is the total photometric uncertainty of that detection. Instead, to identify and remove non-astrophysical photometric outliers, we employ a machine-learned model that uses other properties associated with a detection (e.g., its position on the chip, level of agreement with a point-spread function model, seeing, etc.) to predict whether a detection will be a outlier or not (B. Sesar et al. 2017, in preparation).
Validation tests have shown that our machine-learned outlier model identifies 80% of all true outliers, and only misclassifies one good observation for every true outlier. For comparison, the outlier rejection approach adopted by Hernitschek et al. (2016) identifies almost all of the outliers, but it misclassifies eight good observations for every true outlier.
After removing photometric outliers from the PV3 time-series (using our machine-learned outlier model), the average number of observations per object is 67 (out of the initial 72 observations). If we would have used the outlier rejection method of Hemitschek et al. (2016), the number of observations per object would have decreased to ~30.
To select objects with enough epochs for multi-band light-curve fitting (Section 3.2), and signal-to-noise ratios appropriate for variability studies, we require that PS1 light curves have (after outlier rejection):
Dereddened optical colors are useful because they provide a rough estimate of the spectral type, and could help with the identification of RR Lyrae stars (which are A–F-type stars). Thus, we correct observed PS1 magnitudes for extinction using the extinction coefficients of Schlafly & Finkbeiner (2011; see their Table 6) and the Schlafly et al. (2014) dust map and calculate , , , , and colors, where indicates an uncertainty-weighted mean magnitude. If for some reason an object is not observed in a particular PS1 band, the value of the color involving that band is reset to 9999.99.
To extract variability information from multi-band PS1 light curves, we calculate the variability indicator (Equation (1) of Hernitschek et al. 2016), and fit a damped random walk model to PS1 multi-band structure functions. From the best-fit damped random walk model, we measure the variability amplitude (in the -band), and the variability timescale τ (see Sections 3.1 to 3.3 of Hernitschek et al. 2016). As Hernitschek et al. (2016) have shown, these three parameters are very useful for separating different types of variable sources (e.g., quasars and RR Lyrae stars).
3. Light Curve and Period Fitting
In this section, we describe several approaches to fitting the multi-band light curves, which will result in a period determination and a fit to the phased light curve.
3.1. Multi-band Periodogram
A more detailed separation of variables can be obtained by studying the properties of phased (i.e., period-folded) light curves, such as the amplitude and shape (Dubath et al. 2011; Richards et al. 2011; Elorrieta et al. 2016). However, lightcurve folding requires an assumed or known period.
To measure the period of variability of a PSI light curve, we use the multi-band periodogram of VanderPlas & Ivezic (2015) as implemented in gat spy, an open-source Python package for general astronomical time-series analysis (Vanderplas 2015). Briefly, the algorithm of VanderPlas & Ivezic (2015) models the phased light curves in each band as an arbitrary truncated Fourier series, with the period and optionally the phase, shared across all bands.
Since the phase offsets between RR Lyrae griz light curves are smaller than 1% (Sesar et al. 2010), we adopt the VanderPlas & Ivezić (2015) shared-phase model when calculating the multi-band periodogram (i.e., we set gatspy parameters and see Section 5 of VanderPlas & Ivezić 2015 for details). Furthermore, since RR Lyrae stars have periods between 0.2 and 0.9 days, we limit the period search to the same range.
To test the accuracy of the VanderPlas & Ivezic (2015) period-finding algorithm on PSI data, we use gatspy to calculate multi-band periodograms for 440 RR Lyrae stars previously studied by Sesar et al. (2010), but with the PSI data at hand. From each periodogram, we select the period associated with the highest periodogram peak. We considered the periods measured by Sesar et al. (2010) from more densely sampled Sloan Digital Sky Survey (SDSS; York et al. 2000) Stripe 82 observations to be “true” periods (see Section 2 of Sesar et al. 2010 for details on SDSS Stripe 82). We define a period to be accurately measured if the selected multi-band periodogram peak is within 2 s of the true period.
We find that the VanderPlas & Ivezić (2015) period-finding algorithm accurately identifies the true period for 53% of RR Lyrae stars observed by PS1 (37% if the accuracy of 1 s is required). Changing the gatspy and parameters did not significantly improve this result.
Using a mathematical, not physical, multi-band fight-curve model is one of the reasons why the VanderPlas & Ivezic (2015) period-finding algorithm fails to identify the true period for half of the RR Lyrae stars observed by PSI. When the light curves are sparse, there is no guarantee that the resulting best-fit multi-band model will be physical. If the light-curve model is not constrained by external information on the physics of the problem at hand, inaccurate, or not robust, period estimates may result.
An example of a best-fit, but non-physical (mathematical) multi-band model is shown in Figure 2. The g − r color predicted by the best-fit model does not change as a function of phase (i.e., the difference between the green and red line), while in reality, it is well known that RR Lyrae stars have bluer g − r color when they are brightest (e.g., Sesar et al. 2010). Furthermore, the model predicts mag, while in reality mag. Due to a combination of a non-physical multi-band model and sparse PS1 data, the VanderPlas & Ivezić (2015) algorithm is unable to accurately measure the pulsation period of that particular star.
Since the VanderPlas & Ivezic (2015) algorithm fails to accurately measure the period for almost half of the RR Lyrae stars with the PSI data at hand, we cannot use it to phase the light curves. However, we still calculate and use the multi-band periodogram in this work in the subsequent classification because it improves the selection of RR Lyrae stars (see Section 4.5 below).
3.2. Multi-band Light-curve Fitting and Periods
We now show that it is possible to accurately measure periods of RR Lyrae stars observed by PSI, by using a more realistic and physically constrained multi-band light-curve model.
In principle, such a model could be obtained by extracting theoretical griz light curves from pulsation models, such as those created by Marconi et al. (2006). However, a comparison of theoretical (Marconi et al. 2006) and empirical (i.e., observed) SDSS ugriz light curves by Sesar et al. (2010, see their Figure 8) has shown differences between the two lightcurve sets that cannot be explained by observational uncertainties. Due to these differences, we are reluctant to use theoretical multi-band models of RR Lyrae stars when measuring periods.
Instead of using theoretical multi-band models, we adopt a set of 483 empirical griz models. These models consist of griz light-curve templates that were fitted by Sesar et al. (2010) to observed griz light curves of 483 RR Lyrae stars in SDSS Stripe 82. The curves in Figure 3 illustrate one of the 483 empirical multi-band models. The set contains 379 type ab multi-band templates, corresponding to RR Lyrae stars pulsating in the fundamental model (RRab stars), and 104 type c multi-band templates, corresponding to RR Lyrae stars pulsating in the first overtone (RRc stars).
We define the kth empirical multi-band model, which is a function of the pulsation phase ϕ as
where is the best-fit template light curve, is the (known and fixed) amplitude of this template, and is the (known and fixed) best-fit magnitude at peak brightness (i.e., at ) in the band of the kth RR Lyrae star in SDSS Stripe 82 (see Table 2 of Sesar et al. 2010 for values of and ). Note that the free parameter acts as a zero-point offset in our model, since the griz light curves have been normalized by subtracting from each light curve. The free parameter F allows the amplitudes of model griz light curves to vary by up to 20% from their original values (which are listed in Table 2 of Sesar et al. 2010).
Qualitative inspection of phased PSI z and у band light curves has shown that the two are roughly similar within photometric uncertainties. Therefore, we (can) treat all y-band observations as z-band observations in the remainder of the analysis.
Assuming a period of P days and a phase offset , we calculate the phase of each PS1 observation epoch as
where the time of observation t is in units of heliocentric Julian days, and . The purpose of the phase offset is to make sure the maximum light of the best-fit multi-band template occurs at . Note that the phase of an observation needs to be in the range. If it is outside of that range, one should add or subtract 1.
To find the best-fit values of F, , , and P parameters for a given multi-band template, k, we minimize a -like statistic calculated as
where , , and are the time, magnitude, and the photometric uncertainty of the nth observation in the band (e.g., , , ). The best-fit parameters (period, phase offset, etc.) are measured using the Differential Evolution algorithm of Storn & Price (1997) as implemented in scipy, an open-source Python package for scientific computing (Oliphant 2007; Millman & Aivazis 2011).
We perform multi-band light-curve fitting in two runs. In the first run, we fit every multi-band template to a PS1 multi-band time-series, and for each template we record the best-fit and model parameter values. When fitting a type ab multi-band template, we constrain the minimization to periods ranging from 0.4 to 0.9 days. The minimization is constrained to periods ranging from 0.2 to 0.5 days when fitting a type c multi-band template. The above period ranges are typical of type ab (RRab) and type c (RRc) RR Lyrae stars pulsating in the fundamental or the first-overtone mode, respectively. For illustration, Figure 4 shows the 483 best-fit and period values, one for each template light curve, measured for a faint RR Lyrae star and a faint non-RR Lyrae object. Clearly, there is a global minimum for the RR Lyrae light curve among the values.
In a second round of light-curve fitting, we fit only type ab or type c templates, depending on the type of the best-fit template (i.e., the template with the lowest value) found during the first run. This time, the period range is restricted to ±2 minutes around the period associated with the best-fit template from the first run. At the end of the second fitting iteration, we save only the best-fit and model parameter values associated with the best-fit template (of the second run).
The result of applying this procedure to the PSI fight curves of 440 RR Lyrae stars in SDSS Stripe 82 is illustrated in Figure 5. Our multi-band template fitting method accurately measures periods for 85% of RR Lyrae stars (87% of RRab and 74% of RRc stars), a 32% improvement in period recovery over the VanderPlas & Ivezic (2015) algorithm. Within 1 s, the period is recovered for 73% of RR Lyrae stars (a 36% improvement versus the VanderPlas & Ivezic 2015 algorithm). If the period fitting returns a discrepant value, this can predominately be attributed to one-day beat frequency aliasing (see Figure 5).
When fitting multi-band templates to PSI light curves of 440 RR Lyrae stars in SDSS Stripe 82, there is a possibility that some Stripe 82 RR Lyrae stars will be best fit with their own multi-band templates (recall that multi-band templates were constructed from observed SDSS light curves of individual Stripe 82 RR Lyrae stars). This “self-fitting” can be considered to be a form of overfitting, which, if it happens frequently, may inflate our estimate of the accuracy of period recovery. However, only 6 out of 440 Stripe 82 RR Lyrae stars are fit with their own multi-band templates, indicating that self-fitting does not happen frequently and that it does not significantly inflate our estimate of the accuracy of period recovery. The lack of self-fitting implies that many multi-band templates are quite similar to each other, and suggests that there is a potential for a computational speedup by removing redundant multi-band templates from the set.
As Figure 4 illustrates, the multi-band light-curve fitting also provides useful information for separating RR Lyrae stars and non-RR Lyrae objects. For example, the average and best-fit values measured for the RR Lyrae star are vastly lower than the corresponding values measured for the non-RR Lyrae object, even though both objects have PS1 light curves of similar signal-to-noise ratio and sampling. This result is not unexpected, since measures the statistical agreement between the (observed) phased PS1 light curve and a best-fit empirical multi-band light-curve model of an RR Lyrae star, and thus quantifies how "RR Lyrae-like" an object is.
In order to further characterize how RR-Lyrae-like a PSI multi-band light curve is, we measure additional properties of phased PSI light curves, such as the entropy of the phased light curve, the Stetson (1996) J index, as well as ~20 other properties (called features in machine learning), and describe them in more detail in Appendix A.
3.3. Resulting RR Lyrae Distance Precision
Along with measuring an accurate period (87% of the time for RRab stars), an important aspect of multi-band light-curve fitting is also the increased precision in estimating the star’s average flux (or magnitude). Because RR Lyrae stars follow a tight period-absolute magnitude-metallicity (PLZ) relation, we show that PSI data constrain distances of RR Lyrae stars with 3% precision, even if the metallicity of an RR Lyrae star is unknown.
Theoretical and empirical studies (e.g., Catelan et al. 2004; Sollima et al. 2006; Braga et al. 2015; Marconi et al. 2015) have shown that the absolute magnitudes of RR Lyrae stars can be modeled as
where P is the period of pulsation, is the absolute magnitude at some reference period and metallicity (here chosen to be days and dex), and α and β describe the dependence of the absolute magnitude on period and metallicity, respectively. The is a standard normal random variable with mean 0 and standard deviation , that models the intrinsic scatter in the absolute magnitude convolved with unaccounted measurement uncertainties.
To constrain the PLZ relations for RRab stars in PSI bandpasses, we use a probabilistic approach described in detail in Appendix B, where the data include metallicities and distance moduli of PSI RR Lyrae stars in five Galactic globular clusters. The end product of this approach is a joint posterior distribution of all model parameters. To describe the marginal posterior distributions of individual model parameters, we measure the median, the difference between the 84th percentile and the median, and the difference between the median and the 16th percentile of each marginal posterior distribution (for a Gaussian distribution, these differences are equal to ±1 standard deviation). We report these values in Table 1.
Overall, the PLZ relations behave as expected: as the bandpass moves to redder wavelengths, the dependence on the period increases (i.e., the α parameter becomes more negative), the dependence on the metallicity (i.e., the β parameter) and the scatter decrease, and the reference absolute magnitude becomes brighter. Similar trends were also observed in previous theoretical and observational studies (e.g., Braga et al. 2015; Marconi et al. 2015). Since the PLZ relation for the band is most tightly constrained and has low-metallicity dependence, we use it hereafter when measuring distances.
The metallicity information is not available for the vast majority of stars in PS1. To estimate the uncertainty in absolute magnitudes when the metallicity is unknown, we assume that RR Lyrae stars are drawn from the halo metallicity distribution function, represented with a Gaussian distribution centered on −1.5 dex and with a standard deviation of 0.3 dex (Ivezić et al. 2008). The resulting uncertainty in is then mag, and the expression for simplifies to
To calculate distance moduli of PS1 RR Lyrae stars, we use flux-averaged -band magnitude and Equation (5). For the uncertainty in distance modulus, we adopt mag. This corresponds to a distance precision of , as long as dust extinction is not an important issue.
To validate Equation (5), we compute median distance moduli for three dwarf spheroidal galaxies, using the PSI data for their RR Lyrae stars. We find DM = 19.51 ± 0.03 mag for Draco, DM = 19.67 ± 0.03 mag for Sextans, and DM — 19.17 ± 0.03 mag for Ursa Minor, where the uncertainty in distance moduli is dominated by the systematic uncertainty in absolute magnitude. The values for Draco and Sextans agree well with the literature values of 19.40 ± 0.17 and 19.67 ±0.1 (Bonanos et al. 2004; Lee et al. 2009), respectively. The DM = 19.11 ± 0.03 mag we measure for Ursa Minor agrees with the 19.18 ± 0.12 mag measurement of Mighell & Burke (1999), but disagrees with the 19.4 ± 0.1 mag value measured by Carrera et al. (2002). The rms scatter of distance moduli of RR Lyrae stars in these dwarf galaxies is odm ~ 0.05 mag. This scatter empirically verifies the intrinsic scatter of oM = 0.05 mag, we measured when fitting the PLZ relation in the ipi band.
3.4. WISE Data
Quasars (QSOs) are some of the biggest sources of contamination when selecting RR Lyrae stars, especially at faint magnitudes (i.e., as the probed volume of the universe increases). They overlap with RR Lyrae stars in g − r and redder optical colors (e.g., Figure 4 of Sesar et al. 2007), and may look as variable as RR Lyrae stars when observed in sparse data sets (such as PS1, Figure 3 of Hernitschek et al. 2016). Because most QSOs have a hot dust torus, they show an excess of radiation in the mid-infrared part of the spectrum, and have the WISE (Wright et al. 2010) mid-infrared color mag (Figure 2 of Nikutta et al. 2014). RR Lyrae stars, on the other hand, have mag.
To better separate QSOs and RR Lyrae stars, we supplement PS1 data with the W12 color provided by the all-sky WISE mission by matching PS1 and WISE positions using a radius. If a PS1 object does not have a WISE W1 or W2 measurement, or those measurements have uncertainties greater than 0.3 mag (i.e., the WISE detection is less than above the background), we reset its W12 color to 9999.99 (this happens for about 50% of objects). We also calculate the color, and set its value to 9999.99 if one of its magnitudes is missing.
We wish to build a model that returns the probability of an object to be an RR Lyrae star, given the data from Section 2 and the light-curve fits from Section 3. We will address this problem with a supervised machine-learning approach, where we use a training set (labeled or classified objects and then- data) to infer a function that determines the class of unlabeled objects from their data. Since we have a reliable training set (see Section 4.1), supervised learning techniques (Section 4.2) represent a natural choice for building a classification model for a selection of RR Lyrae stars. The light-curve fitting and its \ 2 value (Section 3.2) play a crucial role in this process.
4.1. Training Set
To “learn” how to classify objects, supervised algorithms need to be trained using a subset of the data in which each object is labeled, i.e., their class is known in advance.
Our main training set consists of 1.9 million PS1 objects located in the SDSS Stripe 82 region (, ) that are brighter than 21.5 mag, have at least 23 observations (Section 2), and are at least (two tidal radii; Harris 1996 (2010 edition)) away from the center of globular cluster NGC 7089. To label the objects in the training set, we match them to Sesar et al. (2010) and Süveges et al. (2012) catalogs of RR Lyrae stars. If the position of an RR Lyrae star in one of these two catalogs matches the position of the closest PS1 object within , we label the PS1 object as an "RR Lyrae" (class 1; there are 462 such matches, and only three RR Lyrae stars do not have a PS1 match). The remaining PS1 objects are labeled as "non-RR Lyrae" (class 0). Out of 465 matched RR Lyrae stars, we know that 364 are of type ab (RRab) and 98 are of type c (RRc).
Since the SDSS Stripe 82 observations are slightly deeper and have six times more epochs than PSI observations, we consider the Sesar et al. (2010) and Siiveges et al. (2012) RR Lyrae catalogs to be 100% pure and complete up to the adopted faint PSI magnitude limit (and likely beyond). Consequently, we consider the labels of PSI objects in SDSS Stripe 82 to be the “ground truth” when measuring the efficiency of our selection method (i.e., the selection completeness and purity).
In SDSS Stripe 82, the majority of previously identified RR Lyrae stars are located within ∼30 kpc of the Sun (400 stars or 83% of the sample, see Figure 10 of Sesar et al. 2010), and are thus fairly bright (). This distribution is the result of Galactic structure, and is not a selection effect. To enhance the training set with fainter, and thus more distant RR Lyrae stars, we use RR Lyrae stars in the Draco dwarf spheroidal galaxy, located at a heliocentric distance of ∼80 kpc (Kinemuchi et al. 2008). If the position of an RR Lyrae star in the Kinemuchi et al. (2008) catalog matches the position of the closest PS1 object (with at least 23 observations) within , we label the PS1 object as an "RR Lyrae" (there are 261 such matches, and 5 RR Lyrae stars do not have a PS1 match). Out of 261 matched RR Lyrae stars, 205 are of type ab (RRab), 30 are of type c (RRc), and 25 are d-type RR Lyrae stars (RRd) that pulsate simultaneously in the fundamental mode and first overtone.
4.2. Supervised Learning
To build this classification model, we use XGBoost (Chen & Guestrin 2016), an open-source implementation of the gradient tree boosting supervised machine-learning technique (Friedman 2001).
We use gradient tree boosting because the technique produces a prediction model in the form of an ensemble of decision trees, and because tree-based models are robust to uninformative features (Hastie et al. 2009; Dubath et al. 2011; Richards et al. 2011). This fact supports the usage of a
large number of features when building the classification model, even when some of them may not be useful. By permitting the classification algorithm to consider even seemingly uninformative features, we allow it to consider potential correlations between data that may improve the classification in the end.
Given the resilience of gradient tree boosting to uninformative features, and the improvement in classification that additional features may bring, the best approach seems to be to train the classifier using the full set of features. However, this is impractical for the data set at hand. While calculating mean optical colors, low-level variability statistics (Section 2), and the multi-band periodogram takes less than a second per object, multi-band fight-curve fitting takes ~30 min per object. Given that our training set contains about 1.9 million objects, calculating all of these features for all objects in the training set would be computationally prohibitive.
Instead of training a single classifier using the full set of features, we build three progressively more detailed classifiers using progressively smaller, but purer training sets (purer in the sense that the fraction of RR Lyrae stars in the training set increases). We describe these classifiers in Sections 4.4-Л.6, but first give an overview of how a classifier is trained in Section 4.3.
4.3. Overview of Classifier Training
In brief, the steps in training a classifier are to
While the first step is fairly self-explanatory, the remaining steps require further explanation.
The gradient tree boosting technique produces a prediction model in the form of an ensemble of decision trees. The number of trees in the ensemble, the maximum depth of a tree, the fraction of features that are considered when constructing a tree, and many other parameters that affect the manner in which the trees are grown and pruned, can be controlled via parameters exposed by the XGBoost package. By properly tuning these model hyperparameters, we can ensure that the classification produced by the model is not sub-optimal (e.g., not overfitted).
Before tuning the hyperparameters, we select input features, and shuffle and split the training set into two equal-sized sets, which we call development and evaluation sets. We use stratified splitting, i.e., we make sure that the ratio of RR Lyrae and non-RR Lyrae objects is equal in both sets.
To find the optimal hyperparameters, we use the Grid- SearchCV function in the scikit-learn open-source package for machine learning (Pedregosa et al. 2011). GridSearchCV selects test values of hyperparameters from a grid, and then measures the performance of the classification model (for the given hyperparameters) using tenfold stratified cross-validation on the development set. In detail, the development set is split into 10 subsets (using stratified splitting, see above), the model is trained on 9 subsets, and the probability of being an RR Lyrae star (hereafter, the classification score) is obtained from the trained model for objects in the tenth (i.e., withheld) subset. The performance of the classification is evaluated on the withheld set using some metric (see Sections 4.4-4.6 for details), and the whole procedure is repeated nine more times, each time with a different withheld set. The average of the 10 performance evaluations is stored, and the set of hyperparameters with the best average performance is used when training the classifier (step 3).
To verify whether the choice of the development set significantly affects the tuning of hyperparameters, we evaluate the performance of the tuned model on the evaluation set (which was not used by GridSearchCV during hyperparameter optimization), and then repeat the tuning process, but this time we use the evaluation set for tuning and the development set for evaluation. We find that the tuning procedure returns similar values of hyperparameters, regardless of the choice of the development set, indicating that the tuning of hyperparameters is not significantly biased by our choice of the development set.
Once the hyperparameters are tuned and the classifier is trained, we evaluate the performance of the classification using a purity versus completeness curve (see Figure 6 for examples). To measure this curve, we use the classification scores of objects in the Stripe 82 part of the training set (see Section 4.1 for details on this set). The classification scores of these objects were calculated using the tenfold cross-validation on the full (Stripe 82 and Draco) training set. For any threshold on the score and knowing the true class of each object in SDSS Stripe 82, we obtain the fraction of recovered RR Lyrae stars (completeness), and the fraction of RR Lyrae stars in the selected sample (purity).
4.3. First Classification Step: Optical/IR Colors and Variability
To train the first classifier, we use the full training set of 1.9 million objects (Section 4.1) and adopt their variability statistics, as well as average PSI and WISE colors as input features for the classifier (for a total of 10 features, see Sections 2 and 3.4 for details); we do not use the light-curve fitting of Section 3. When tuning the classifier, we select the values of hyperparameters that maximize the area under the purity versus completeness curve. The black dotted line in Figure characterizes the performance of the trained classifier.
Our first classification outperforms the one obtained by Hemitschek et al. (2016) for all choices of sample purity and completeness (i.e., for all thresholds on the classification score). This is attributable perhaps foremost to a substantially greater number of observations per object in our data set (67 versus 35 in Hemitschek et al. 2016). The hyperparameter tuning, and use of a different machine-learning algorithm (XGBoost versus scikit-learn Random Forest) may also contribute to better performance.
Using a cut on the classification score of (), we are able to reduce the number of objects under consideration by more than three orders of magnitude (from about 1.9 million to ∼1500), while losing only 2% of RR Lyrae stars. However, the purity of the selected sample is still unacceptably low (only 34%). In order to improve the purity of the selected sample, we need to train the classifier using additional features, such as the multi-band periodogram (Section 3.1).
4.4. Second Classification Step: Multi-band Periodogram
For 1568 objects that pass the first classification cut (), we calculate multi-band periodograms and extract the top 20 periods from each periodogram. Along with the periods, we also extract the power of each period (i.e., the height of the periodogram at that period).
As Figure 7 illustrates, the multi-band periodogram contains useful information for separating RR Lyrae stars from non-RR Lyrae objects. In principle, we could improve the purity of the selection by simply keeping all objects with , without a loss of completeness. On the other hand, we may achieve even better classification if we provide the entire set of periods and their powers to XGBoost, and let the algorithm decide which features to use.
To improve the classification of RR Lyrae stars, we create a new feature set by combining 10 features used by the first classifier, with the top 20 periods and their powers obtained from the multi-band periodogram (for a total of 50 features). As the training set, we use ∼1500 objects remaining from the initial training set after the cut. When tuning hyperparameters, we adopt values that optimize the area under the purity versus completeness curve. The blue dashed line in Figure 6 characterizes the performance of the second classifier.
We find that the addition of multi-band periodogram data improves the selection of RR Lyrae stars, as evidenced by higher sample purity at a given completeness (blue dashed line in Figure 6). For example, at 90% completeness, adding multiband periodogram data increases the purity of the selected sample by 15% (with respect to the purity delivered by the first classifier, the black dotted line).
4.5. Final Classification Step: Multi-band Light-curve Fitting
Given the information in hand, a nearly optimal classification may be obtained by also including features extracted from phased multi-band light curves (Section 3.2 and Appendix A), but first, we need to fit multi-band light curves to objects under consideration.
Since multi-band light-curve fitting is computationally quite expensive (∼30 minute per CPU and per object), we only do it for 910 training objects that have , where is the classification score produced by the second classifier. According to Figure 6, this selection cut returns a sample with 66% purity and 95% completeness. By using this cut, we avoid fitting objects that are not likely to be RR Lyrae stars (i.e., we do not waste CPU time), but at the same time, do not reject many true RR Lyrae stars (i.e., the completeness decreases by 2% due to this cut, with respect to the 97% completeness obtained after the cut). In principle, we could have reached the same sample purity using a cut on the first classification score (), but the decrease in completeness would have been much greater (a 6% decrease).
In Sections 4.4 and 4.5, we have trained classifiers using non-RR Lyrae objects (class 0) and RR Lyrae stars (class 1), that is, we have performed binary classifications. The reason for this two-step procedure was practical. In order to make the multi-band template fitting computationally feasible, we had to reduce the number of objects under consideration to a manageable level (by increasing the purity of the selected sample), while retaining as many true RR Lyrae stars as possible (i.e., by keeping the selection completeness as high as possible). Doing this required only knowing whether an object is likely an RR Lyrae star or not. By using cuts on scorei and scorer (binary) classification scores, we were able to reduce the number of objects from about 1.9 million to 900, while retaining 95% of RR Lyrae stars.
We now take a further step, determining whether an object is a non-RR Lyrae object (class 0), a type ab (class 1), or a type c or d RR Lyrae star (class 2) through a multiclass classification.
To train the final (multiclass) classifier, we use 910 training objects that have (and, of course, ). Of these 910 objects, 541 are RRab stars, and 144 are RRc or RRd stars (based on Sesar et al. 2010 and Kinemuchi et al. 2008 classifications). The remaining 225 objects are non-RR Lyrae objects. The feature set consists of 50 features employed by the second classifier, and 20 features extracted from phased multi-band light curves (Appendix A). Since we are training a multiclass classifier, when tuning hyperparameters we adopt values that minimize the logistic (or cross-entropy) loss (Bishop 2006). The thick red line in Figure 6 characterizes the purity and completeness of the selection as a function of the threshold on , where and are RRab and RRc classification scores, respectively.
5. Verification and Analysis of the RR Lyrae Selection at High Galactic Latitudes
The purity versus completeness curve obtained using the full classifier (the thick red line in Figure 6) shows that we can select samples of RR Lyrae stars that are 90% complete and 90% pure; we deem this a gratifying and impressive success. However, it is important to keep in mind that the purity and completeness shown in Figure 6 are integrated over RRab and RRc stars, and over a range of distances (or magnitudes; roughly, 5–120 kpc, ). Since we have reasons to expect variations in purity and completeness as a function of type and distance (e.g., because the classification becomes more uncertain as objects get fainter and light curves become noisier), and because the knowledge of such variations is important for studies of Galactic structure (e.g., when measuring the stellar number density profile), below we present a more detailed analysis of the selection of RR Lyrae stars at high galactic latitudes ().
5.1. Purity and Completeness in Detail
The solid line in Figure 8 shows the purity of the RRab selection at the faint end (at ∼80 kpc or mag), given a threshold on . To make this curve, we use 80 labeled objects from the SDSS Stripe 82 training set with and , and calculate the fraction of true RRab stars in selected samples (given a threshold on ).
To quantify the completeness of the RRab selection at the faint end, we use 242 RRab stars from the Draco dSph training set (see Section 4.1) that have . The dashed line in Figure 8 shows the completeness of the selection (i.e., the fraction of recovered RRab stars) as a function of the threshold on . This completeness includes all losses due to initial data quality cuts, and classification cuts (i.e., and ). For convenience, we tabulate the purity and completeness in Table 2.
Above, we have used the Stripe 82 sample to measure the purity, and the Draco sample to measure the completeness. We did so because the S82 sample covers a large area and thus contains a more representative sample of contaminants that we may expect to encounter elsewhere on the sky. The Draco sample was used because it contains more faint (r ~ 20 mag) RR Lyrae stars than the Stripe 82 sample, and thus the estimate of the completeness has a lower Poisson noise.
Given the sparseness and the multi-band nature of PS1 data, it is remarkable that our selection method can deliver samples of RRab stars that are pure and complete (e.g., for ), even at distances as far as ∼80 kpc from the Sun. This raises hope for even better performance at the bright end.
To measure the purity and completeness at the bright end, we select objects from the SDSS Stripe 82 training set with and mag (i.e., within ∼40 kpc from the Sun). We use the mag brightness cut because the vast majority of halo RR Lyrae stars are located within that magnitude range (Sesar et al. 2010). The relevant curves are plotted in Figure 9 and tabulated in Table 3.
Finally, the purity and completeness curves characterizing the selection of bright RRc stars are shown in Figure 10 and tabulated in Table 4. Figure 10 shows that the selection of pure and complete samples of RRc stars is more challenging, both due to the lower amplitude of the pulsation and to the contamination by contact binaries with similar sinusoidal right curves. Nonetheless, it is still possible to produce samples that are over 80% complete and pure within ~40 kpc from the Sun. We do not discuss RRc stars further because they are less numerous than RRab stars (by a factor of three), and thus are of lesser importance for Galactic studies.
5.2. RRab Selection Function
Given a position on the sky and the flux-averaged r-band magnitude of an RR Lyrae star, what is the probability of selecting that star using the PS1 data at hand? Characterizing this selection function is of obvious importance for studies of the Galactic structure, especially when modeling the number density distribution of stars (e.g., Bovy et al. 2012; Xue et al. 2015). In this section, we restrict ourselves to characterizing the selection function of RRab stars at high galactic latitudes (), because (1) they are three times more numerous than RRc stars, and (2) at a given purity, they can be recovered at a much higher rate than RRc stars (compare Figure 9 versus Figure 10). Characterizing the selection function at low galactic latitudes would require an appropriate training set (Stripe 82 and Draco are both located at ).
We assume that the selection function S depends only on the flux-averaged r-band magnitude of an RRab star (not corrected for interstellar extinction), and not on its position (i.e., )). This is a reasonable assumption given the uniformity of dust extinction away from the Galactic plane, and the uniformity of PS1 multi-epoch coverage. The selection function will also depend on the threshold imposed on the final classification score, . For the sake of simplicity, we only consider the case when , as this selection cut returns a sample that is appropriate for many studies (90% purity and 80% completeness, even at the faint end; see Section 5.1). By assuming spatial independence, we can now use the SDSS Stripe 82 and Draco training sets to determine the PS1 selection function of RRab stars at high galactic latitudes. The result is illustrated in Figure 11.
We find that the RRab selection function is approximately constant at for mag, after which it steeply drops to zero at mag. To characterize the selection function, we construct a simple probabilistic model.
There are 577 RRab stars in our training set (in SDSS Stripe 82 and Draco), of which 483 pass the selection cut. With each RRab star, we associate a pair of values, where if the star is selected, otherwise ( is the star's extincted flux-averaged r-band magnitude). We denote the full data set of 577 pairs of values as .
The likelihood of this data set is given by
where is the Bernoulli probability mass function with success probability given by the selection function, .
To model the selection function, we use the logistic curve
where L is the curve's maximum value, k is the steepness of the curve, and is the magnitude at which the completeness drops to 50%.
The probability of this model given data is then
where is the prior probability of model parameters. We impose uniform priors such that (i.e., completeness within 40 kpc is between 40% and 100%) and .
We explore the probability of various model parameters using the Goodman & Weare (2010) Affine Invariant Markov chain Monte Carlo (MCMC) ensemble sampler, as implemented in the emcee package22 (v2.2.1, Foreman-Mackey et al. 2013). The most probable model of the selection function (yellow curve) is shown in Figure 11, with the best-fit logistic curve (Equation (7)) being L = 0.91, k = 4.1, mag. To illustrate the uncertainty in the model, we also plot the curves associated with 200 randomly selected models from the posterior distribution (thin blue lines).
6. PSI Catalog of RR Lyrae Stars
We have applied the above multi-step-selection procedure to about 500 million PS1 objects that pass PS1 data quality cuts (Section 2), and have calculated final RRab and RRc classification scores (, and ) for 240,000 objects. We report their positions, distances, PS1 photometry, and classification scores in Table 5. A total of ∼400,000 CPU hours of super-computing time was used to process all of the data and calculate the final classification scores. Below, we illustrate some properties of this sample and leave a more detailed analysis of the distribution of RR Lyrae stars in the Galactic halo for future studies.
To illustrate the coverage of the PS1 catalog of RR Lyrae stars, we have selected a sample of ∼45,000 highly probable RRab stars (, expected purity of 90% and completeness of at 80 kpc), and have plotted their angular distribution in Figure 12.
The leading arm of the Sagittarius tidal stream (Ibata et al. 2001) and four Milky Way satellite galaxies are most easily discernible features in Figure 12. However, another notable feature is an almost complete absence of high probability RRab stars (i.e., those that have ) in regions with high ISM extinction (e.g., ). Improperly dereddened photometry is the most likely reason for this lack of high probability RRab stars at low galactic latitudes.
Briefly, when dereddening photometry, we assume that all sources are located behind the dust layer. At low galactic latitudes, this may not always be true because sources may be embedded in the dust layer. After dereddening, the photometry of such sources will be overcorrected for extinction, and their optical PS1 colors will be shifted blueward from their dust-free values. In addition, improperly dereddened light curves will not be well-fit by multi-band templates (Section 3.2). As a result of these effects, true RR Lyrae stars may look like non-RR Lyrae objects, and may end up with low values.
The lack of high probability RRab stars at low galactic latitudes also demonstrates the resilience of the classifier to contamination. Due to the increase in stellar number density, and the fact that some fraction of stars will be incorrectly tagged as RR Lyrae stars, one would naively expect for the density of objects tagged as RR Lyrae stars to increase toward the Galactic plane. However, no such increase is observed in Figure 12. The features extracted during multiband template fitting are most likely responsible for this resilience, as even a significant increase in the number of contaminants is not sufficient to produce objects that match multi-band light-curve characteristics of RR Lyrae stars.
Finally, to illustrate the efficiency of the final multiclass classifier (Section 4.6) at separating RRab and RRc stars, we show their distribution in the period versus the -band amplitude diagram (Figure 13).
7. Discussion and Summary
In this paper, we have explored with what fidelity RR Lyrae stars can be identified in multi-epoch, asynchronous multi-band photometric data. We have done this for the specific case of the PS1 light curves, which are very sparse; epochs per band over 3000 typical RR Lyrae pulsation periods (i.e., four years). To identify RR Lyrae stars, we have employed, in particular, the fitting (and period phasing) of very specific empirical RR Lyrae light curves, and have utilized supervised machine-learning tools. While we have applied our selection only to this specific data set, many of the approaches described here will be applicable to other sparse, asynchronous multi-band data sets, such as those produced by the Dark Energy Survey (DES; Dark Energy Survey Collaboration et al. 2016) and the Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008). For example, for its Galactic plane sub-survey, LSST is currently planning to obtain 12 observations over four years in each of its ugrizy bandpasses (i.e., 30 observations per band over 10 years; Ž. Ivezić 2017, private communication), making its data set very similar to the one produced by the PS1 survey. Compared to PS1, however, LSST will go deeper by at least 2 mag in izy bands, allowing us to select RR Lyrae stars and study old stellar populations close to the Galactic plane, even at the far side of the Galaxy.
We demonstrated that we can precisely and accurately measure the periods (to within 2 s) for the vast majority of RR Lyrae within PS1 , extending to distance moduli of ~20, or ~100kpc. The high precision of the period determination may seem surprising at face-value, but is owed to the long time baseline: a 2 s/period difference causes a cumulative fight-curve shift of 90 minutes across 3000 pulsation periods. Accurate periods are crucial for calculating the phase of spectroscopic observations, and for transforming the observed radial velocity to the center-of-mass velocity needed for kinematic studies (Sesar 2012). The ephemerides (i.e., periods and phase offsets) provided by our catalog can thus be readily used to turn RR Lyrae stars observed by current (e.g., SDSS-IV/TDSS; Ruan et al. 2016) and upcoming multi-object spectroscopic (MOS) surveys (e.g., Gaia, WEAVE, DESI; Perryman et al. 2001; Levi et al. 2013; Dalton et al. 2014) into precise kinematic tracers of the halo structure and substructure (i.e., stellar streams). With a density of 1 deg-1, PSI RR Lyrae stars represent a unique “piggyback” project for MOS surveys, with a potentially high impact and certainly low cost (~1 target per MOS field).
Using these light-curve fits as one (crucial) feature in a supervised classification of RR Lyrae, we showed that we can—at least at high Galactic latitudes—construct a sample of ~45,000 RRab stars that has 90% purity and 80% completeness, even at 80 kpc from the Sun. In comparison with previous catalogs, our sample is deeper than the SDSS Stripe 82 sample of Sesar et al. (2010), while covering more of the sky than the CRTS sample of Drake et al. (2013). The PSI Зтг data and the classification presented here even allow for a quite reliable separation of RRab and RRc type of RR Lyrae stars, as shown by the period-amplitude diagram (Figure 13)
All this opens up many avenues in exploring the Galactic halo. With its second data release (DR2) expected in April 2018, the Gaia astrometric mission will provide unprecedented proper motions for PSI RR Lyrae stars brighter than mag, but no competitive distance information (beyond a few kiloparsecs). Having precise distances is crucial for measuring tangential velocities, and thus the Galactic potential, as the uncertainty in tangential velocity increases proportionally with the uncertainty in distance. Therefore, it is particularly remarkable that, using PSI data and a period-absolute magnitude relation, we can measure distances to RR Lyrae stars with a precision of , even for stars at 100 kpc from the Sun.
An important avenue to explore with the resulting RR Lyrae catalog is the question of RR Lyrae at low latitudes (covered by PSI). These objects open up the possibility to explore the oldest portion of the Galactic disk. At low latitudes, the selection function of the sample will, however, be considerably
B.S., N.H., and H.-W.R. acknowledge funding from the European Research Council under the European Unions Seventh Framework Programme (EP 7) ERC Grant Agreement n. . H.-W.R. acknowledges support of the Miller Institute at UC Berkeley through a visiting professorship during the completion of this work. We thank the anonymous referee for the thorough review, positive comments, and constructive remarks on this manuscript. The Pan-STARRSl Surveys (PSI) have been made possible through contributions by the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg, and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, the Queen’s University Belfast, the Harvard-Smithsonian Center for Astrophysics, the Las Cumbres Observatory Global Telescope Network Incorporated, the National Central University of Taiwan, the Space Telescope Science Institute, and the National Aeronautics and Space Administration under Grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate, the National Science Foundation Grant No. AST-1238877, the University of Maryland, Eotvos Lorand University (ELTE), and the Los Alamos National Laboratory.
Appendix A. Features Extracted From Phased Multi-band Light Curves
In this section, we describe features extracted from phased multi-band light curves.
Appendix В Constraining Period-Absolute Magnitude-Metallicity Relations for PSI Bands
To constrain Equation (4), we use average (apparent) PSI griz magnitudes of 55 RR Lyrae stars located in five Galactic globular clusters that have metallicities ranging from —1.02 dex to —2.37 dex (on the Carretta et al. 2009 scale and taken from the Harris 1996, 2010 catalog; NGCs 6171, 5904, 4590, 6341, and 7078). The distance moduli of these globular clusters were measured by Sollima et al. (2006, see their Table 4) using a period-luminosity-metallicity (PLZ) relation whose zero point of —1.05 ± 0.13 mag was constrained using the Hubble Space Telescope (HST) parallax of the star RR Lyrae. Using HST parallaxes of five RR Lyrae stars, Benedict et al. (2011) measured the zero point of the Sollima et al. (2006) PLZ relation to be —1.08 ± 0.03 mag. Thus, for distance moduli of 55 RR Lyrae stars, we adopt the distance moduli of then- globular clusters (see Table 4 of Sollima et al. 2006), shifted by 0.03 mag to match the more precise zero point measured by Benedict et al. (2011). As the systematic uncertainty in distance moduli of calibration RR Lyrae stars, we assume a value of 0.03 mag.
We measure average (apparent) griz magnitudes of RR Lyrae stars by first converting their best-fit griz model light curves (e.g., see sold lines in Figure 3) from magnitude to flux units. The model curves in flux units are then integrated over a pulsation cycle, and the integrated fluxes are expressed in units of magnitude. Hereafter, we call these flux-averaged magnitudes (, , etc.), and distinguish them from uncertainty-weighted average magnitudes (, , etc.).
Given the above data, we wish to calculate the probability of some set of model parameters for the PS1 band, where is the full data set of 55 stars, and is the data set associated with the kth star that contains its distance modulus and metallicity (assumed to be the same as that of the cluster), period, and flux-averaged PS1 apparent magnitudes.
Using the relations of conditional probability, the probability can be written as
where is the prior probability of model parameters, and
is the likelihood of the full data set given some values of model parameters, .
To make the fitting of PLZ relations robust to possible outliers (e.g., due to an incorrectly measured period, which may happen for 13% of RRab stars; Section 3.2), we model the likelihood of the kth data set using a mixture model (see Section 3 and Equation (17) of Hogg et al. 2010)
where and are the likelihoods of drawing the data set from the inlier or outlier distribution, respectively, and is the mixing proportion.
The likelihood of drawing the data set from the inlier distribution is
where , is the apparent flux-averaged magnitude in the PS1 band, Mb is the absolute magnitude in the b band
The likelihood of drawing the data set from the outlier distribution is defined as
where has the same form as Equation (4), but different parameters, . Most importantly, the distribution of outliers is assumed to be wider than the distribution of inliers, that is, .
Before we can calculate probability , we need to define the prior probabilities of model parameters, . For the parameter and and bands, we adopt informative Gaussian priors based on the slopes of PLZ relations in the globular cluster M4 (Braga et al. 2015): and . For and , we adopt Jeffreys log-uniform priors (Jaynes 1968), and require that . Uniform priors are adopted for the remaining model parameters and bands.
To efficiently explore the parameter space, we use the Goodman & Weare (2010) Affine Invariant MCMC Ensemble sampler as implemented in the emcee package (v2.2.1, Foreman-Mackey et al. 2013). We use 200 walkers and obtain convergence after a short bum-in phase of 300 steps per walker. The chains are then evolved for another 2000 steps, and the first 300 (bum-in) steps are discarded.
To describe the marginal posterior distributions of individual model parameters, we measure the median, the difference between the 84th percentile and the median, and the difference between the median and the 16th percentile of each marginal posterior distribution (for a Gaussian distribution, these differences are equal to ±1 standard deviation). We report these values in Table 1 (Section 3.3). We do not characterize parameters of the outlier distribution, and instead simply marginalize over them (i.e., we consider them to be nuisance parameters).