Multithreading comes to Stan

It’s been available for a while actually, but not in a form I’ve been able to figure out how to use. That changed recently with the introduction of the reduce_sum “higher order function” in Stan 2.23.0. What reduce_sum does is allow data to be partitioned into conditionally independent slices that can be dispatched to parallel threads if the Stan program is compiled with threading enabled. My SFH modeling code turns out to be an ideal candidate for parallelization with this technique since flux errors given model coefficients and predictor values are treated as independent (this is an oversimplification since there is certainly some covariance between nearby wavelength bins; it is rarely if ever modeled though, and I haven’t figured out how to do it. In any case adding covariance between measurements would complicate matters but not necessarily make it impossible to partition into independent chunks). Here is the line of Stan code that defines the log-likelihood for the current version of my model (see my last post for the complete listing):

    gflux ~ normal(a * (sp_st*b_st_s) .* calzetti_mod(lambda, tauv, delta) 
                    + sp_em*b_em, g_std);

This just says the galaxy flux values are drawn from independent gaussians with mean given by the model and known standard deviations. Modifying the code for reduce_sum requires a function to compute a partial sum over any subset of the data for, in this case, the log-likelihood. The partial sum function has to have a specific but fairly flexible function signature. One thing that confused me initially is that the first argument to the partial sum function is a variable to be sliced over that, according to the documentation, must be an array of any type. What was confusing was that passing a variable of vector type as the first argument will fail to compile. In other words a declaration in the data block like

vector[N] y;

with, in the functions block:

real partial_sum(vector y, ...);

fails, while

real y[N];

real partial_sum(real[] y, ...);

works. The problem here is that all of the non-scalar model parameters and data are declared as vector and matrix types because the model involves vector and matrix operations that are more efficiently implemented with high level matrix expressions. Making gflux the sliced variable, which seemed the most obvious candidate, won’t work unless it’s declared as real[], and that won’t work in the sampling statement unless it is cast to a vector, since the mean in the expression at the top of the post is a vector. Preliminary investigation with a simpler model indicated that approach works but is so slow that it’s not worth the effort. But, based on a suggestion in the Stan discussion forum, a simple trick solved the issue: declare a dummy variable with an array type, pass it as the first argument to the partial sum function, and just don’t use it. So now the partial sum function, which is defined in the functions block, is

  real sum_ll(int[] ind, int start, int end, matrix sp_st, matrix sp_em, 
              vector gflux, vector g_std, vector lambda, real a, real tauv, real delta, 
              vector b_st_s, vector b_em) {
    return normal_lpdf(gflux[start:end] | a * (sp_st[start:end, :]*b_st_s) .* calzetti_mod(lambda[start:end], tauv, delta) 
                    + sp_em[start:end, :]*b_em, g_std[start:end]);
  }

I chose to declare the dummy variable ind in a transformed data block. I also declare a tuning variable grainsize there and set it to 1, which tells Stan to set the slice size automatically at run time.

transformed data {
  int grainsize = 1;
  int ind[nl] = rep_array(1, nl);
}

Then the sampling statement in the model block is

    target += reduce_sum(sum_ll, ind, grainsize, sp_st, sp_em, gflux, g_std, lambda,
                          a, tauv, delta, b_st_s, b_em);

The rest of the Stan code is the same as presented in the last post.

So why go to this much trouble? As it happens a couple years ago I built a new PC with what was at the time Intel’s second most powerful consumer level CPU, the i9-7960X, which has 16 physical cores and supports 32 threads. I’ve also installed 64GB of DRAM (recently upgraded from 32), which is overkill for most applications but great for running Stan programs, which can be rather memory intensive. Besides managing my photo collection I use it for Stan models, but until now I haven’t really been able to make use of its full power. By default Stan runs 4 chains for MCMC sampling, and these can be run in parallel if at least 4 cores are available. It would certainly be possible to run more than 4 chains but this doesn’t buy much: increasing the effective sample size by a factor of 2 or even 4 doesn’t really improve the precision of posterior inferences enough to matter. Once I tried running 16 chains in parallel with a proportionate reduction in post-warmup iterations, but that turned out to be much slower than just using 4 chains. The introduction of reduce_sum raised at least the possibility of making full use of my CPU, and in fact working through the minimal example in Stan’s library of case studies indicated that close to factor-of-4 speedups with 4 chains and 4 threads per chain are achievable. I got almost no further improvement setting the threads per chain to 8 and thus using all virtual cores, which apparently is expected at least with Intel CPUs. I haven’t yet tried other combinations, but using all physical cores with the default 4 chains seems likely to be close to optimal. I also haven’t experimented much with the single available tuning parameter, the grainsize. Setting it equal to the sample size divided by 4, and therefore presumably giving each thread an equal size slice, did not work better than letting the system set it; in fact it was rather worse, IIRC.

I’ve run both the threaded and unthreaded models on one complete set of data for one MaNGA galaxy. This particular data set used the smallest, 19 fiber IFU, which produces 57 spectra in the stacked RSS data. This was binned to 55 spectra, all with SNR > 5. The galaxy is a passively evolving S0 in or near the Coma cluster, but I’m not going to discuss its star formation history in any detail here. I’m only interested in comparative runtimes and the reproducibility of sampling. And here is the main result of interest, the runtimes (for sampling only) of the threaded and unthreaded code. All runs used 4 chains with 250 warmup iterations and 750 post-warmup. I’ve found 250 warmup iterations to be sufficient almost always and 3000 total samples is more than enough for my purposes. This is actually an increase from what had been my standard practice.

exectime_threaded_unthreaded
Total execution time for sampling (warmup + post-warmup) in the slowest chain. Graph shows threaded vs. unthreaded time on the same data (N=55 spectra). All threaded runs used 4 chains and 4 threads per chain on a 16 core CPU.

On average the threaded code ran a factor of 3.1 times faster than the unthreaded, and was never worse than about a factor 2 faster (but also never better than a factor 3.7 faster). This is in line with expectations: there’s some overhead involved in threading, so speedups are rarely proportional to the number of threads. This is also what I found with the published minimal example and with a slightly-beyond-minimal multiple regression model with lots of (fake) data that I experimented with. I’ve also found the execution time for these models, particularly in the adaptation phase, to be highly variable, with some chains occasionally much faster or slower than others. Stan’s developers are apparently working on communicating between chains during the warmup iterations (they currently do not) and this might reduce between-chain disparities in execution time in the future.

I’m mostly interested in model star formation histories, so here’s the comparison of star formation rate densities for both sets of model runs on all 55 bins, ordered by distance from the IFU center (which coincides with the galaxy nucleus):

sfr_all_8950-1902
Star formation rate density. Estimates are for threaded and unthreaded model runs.

Notice that all the ribbons are purple (red + blue), indicating the two sets of runs were nearly enough identical. Exact reproducibility is apparently hard to achieve in multithreaded code, and I didn’t try. I only care about reproducibility within expected Monte Carlo variations, and that was clearly achieved here.

There are only a few down sides to multithreading. The main one is that rstan is currently several dot releases behind Stan itself and seems unlikely to catch up to 2.23.x before 2021. I do extensive pre- and post-processing in R and all of my visualization tools use ggplot2 (an R package) and various extensions. At present the best way to integrate R with a current release of Stan is a (so far) pre-release package called cmdstanr, which bills itself as a “lightweight interface to Stan for R users.” In practice it is an interface between R and cmdstan, which is in turn the command line driven interface to Stan.

By following directions closely and with a few hints from various sources I was able to install cmdstan from source with threading support in Windows 10 using Rtools 4.0. The development version of cmdstanr is trivial to install and seems to work exactly as advertised. It is not a drop-in replacement for rstan though, so I had to write some additional functions to fit it into my workflow. Fortunately there’s an rstan function that reads the output files from cmdstan and creates a stanfit object, and this enables me to use virtually all of my existing post-processing code intact.
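For what it’s worth, here is a minimal sketch of what that workflow looks like with cmdstanr. The model file name and data list are placeholders; the chain and thread settings match the runs described above.

library(cmdstanr)
library(rstan)

# compile with threading enabled (the model file name is a placeholder)
mod <- cmdstan_model("sfh_threaded.stan",
                     cpp_options = list(stan_threads = TRUE))

# 4 chains x 4 threads per chain = 16 physical cores;
# `stan_dat` is the usual list of data block inputs
fit <- mod$sample(data = stan_dat,
                  chains = 4, parallel_chains = 4,
                  threads_per_chain = 4,
                  iter_warmup = 250, iter_sampling = 750)

# read cmdstan's csv output back into a stanfit object so existing
# rstan-based post-processing code can be reused
sfit <- rstan::read_stan_csv(fit$output_files())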

The other consequence is that cmdstan takes its inputs as R dump or JSON format files and writes its outputs as CSV files: the inputs have to be created on the fly, and the outputs have to be read back into memory. Cmdstanr, with some help from rstan, handles this automatically, but there’s some additional overhead compared to rstan, which AFAIK keeps all data and sampler output in memory. On my machine the temporary directories reside on an SSD, so all this file I/O goes fairly quickly, but it’s still slower than in-memory operations. I also worry a bit about the number of read-write cycles, but so far the SSD is performing flawlessly.

Multithreading is beneficial to the extent that physical cores are available. This is great for people with access to HPC clusters and it’s helpful to me with an overspec’d and underutilized PC. My current Linux box, which is really better suited to this type of work, only has a several-generations-old Intel i7 with 4 cores. It’s still a competent and highly reliable machine, but it’s not likely to see any performance improvement from multithreading. Fortunately the current trend is to add more cores to mainstream CPUs: AMD’s Ryzen line currently tops out at 16 cores and their Threadripper series has 24-64 cores. This would be a tempting next build, but alas I’m trying not to spend a lot of money on hobbies these days for more or less obvious reasons.


The current state of my star formation history modeling

I’m stranded at sea at the moment so I may as well make a short post. First, here is the complete Stan code of the current model that I’m running. I haven’t gotten around to version controlling this yet or uploading a copy to my github account, so this is the only non-local copy for the moment.

One thing I mentioned in passing way back in this post is that making the stellar contribution parameter vector a simplex seemed to improve sampler performance both in terms of execution time and convergence diagnostics. This was for a toy model where the inputs were nonnegative and summed to one, but it turns out to work better with real data as well. At some point I may want to resurrect old code to document this more carefully, but the main thing I’ve noticed is that post-warmup execution time per sample is often dramatically lower than previously (the execution time for the same number of warmup iterations is about the same or larger though). This appears to be largely due to needing fewer “leapfrog” steps per iteration. It’s fairly common for a model run to have a few “divergent transitions,” which Stan’s developers seem to think is a strong indicator of model failure (I’ve rarely actually seen anything particularly odd about the locations of the divergences, and the inferences I care about seem unaffected). I’m seeing fewer of these and a larger percentage of model runs with no divergences at all, which indicates the geometry in the unconstrained space may be more favorable.

Here’s the Stan source:

// calzetti attenuation
// precomputed emission lines

functions {
  vector calzetti_mod(vector lambda, real tauv, real delta) {
    int nl = rows(lambda);
    real lt;
    vector[nl] fk;
    for (i in 1:nl) {
      lt = 5500./lambda[i];
      fk[i] = -0.10177 + 0.549882*lt + 1.393039*pow(lt, 2) - 1.098615*pow(lt, 3)
            +0.260618*pow(lt, 4);
      fk[i] *= pow(lt, delta);
    }

    return exp(-tauv * fk);
  }

}
data {
    int<lower=1> nt;  //number of stellar ages
    int<lower=1> nz;  //number of metallicity bins
    int<lower=1> nl;  //number of data points
    int<lower=1> n_em; //number of emission lines
    vector[nl] lambda;
    vector[nl] gflux;
    vector[nl] g_std;
    real norm_g;
    vector[nt*nz] norm_st;
    real norm_em;
    matrix[nl, nt*nz] sp_st; //the stellar library
    vector[nt*nz] dT;        // width of age bins
    matrix[nl, n_em] sp_em; //emission line profiles
}
parameters {
    real a;
    simplex[nt*nz] b_st_s;
    vector<lower=0>[n_em] b_em;
    real<lower=0> tauv;
    real delta;
}
model {
    b_em ~ normal(0, 100.);
    a ~ normal(1, 10.);
    tauv ~ normal(0, 1.);
    delta ~ normal(0., 0.1);
    gflux ~ normal(a * (sp_st*b_st_s) .* calzetti_mod(lambda, tauv, delta) 
                    + sp_em*b_em, g_std);
}

// put in for posterior predictive checking and log_lik

generated quantities {
    vector[nt*nz] b_st;
    vector[nl] gflux_rep;
    vector[nl] mu_st;
    vector[nl] mu_g;
    vector[nl] log_lik;
    real ll;
    
    b_st = a * b_st_s;
    
    mu_st = (sp_st*b_st) .* calzetti_mod(lambda, tauv, delta);
    mu_g = mu_st + sp_em*b_em;
    
    for (i in 1:nl) {
        gflux_rep[i] = norm_g*normal_rng(mu_g[i] , g_std[i]);
        log_lik[i] = normal_lpdf(gflux[i] | mu_g[i], g_std[i]);
        mu_g[i] = norm_g * mu_g[i];
    }
    ll = sum(log_lik);
}

The functions block at the top of the code defines the modified Calzetti attenuation relation that I’ve been discussing in the last several posts. This adds a single parameter delta to the model that acts as a sort of slope difference from the standard curve. As I explained in the last couple posts I give it a tighter prior in the model block than seems physically justified.

There are two sets of parameters for the stellar contributions declared in the parameters block. Besides the simplex declaration for the individual SSP contributions there is an overall multiplicative scale parameter a. The way I (re)scale the data, which I’ll get to momentarily, means this should be approximately unit valued, and that is expressed in its prior in the model block. Even though negative values of a would be unphysical I don’t explicitly constrain it, since the data should guarantee that almost all posterior probability mass is on the positive side of the real line.

The current version of the code doesn’t have an explicit prior for the simplex valued SSP model contributions which means they have an implicit prior of a maximally dispersed Dirichlet distribution. This isn’t necessarily what I would choose based on domain knowledge and I plan to investigate further, but as I discussed in that post I linked it’s really hard to influence the modeled star formation history much through the choice of prior.

In the MILES SSP models one solar mass is distributed over a given IMF and then allowed to evolve according to one of a choice of stellar isochrones. According to the MILES website the flux units of the model spectra have the peculiar scaling

\((L_\lambda/L_\odot)\, M_\odot^{-1}\, \text{Å}^{-1}\)

This convention, which is the same as and dates back to at least BC03, is convenient for stellar mass computations since the model coefficient values are proportional to the number of initially solar mass units of each SSP. Calculating quantities like total stellar masses or star formation rates is then a relatively trivial operation. Forward modeling (evolutionary synthesis) is also convenient since there’s a straightforward relationship between star formation histories and predictions for spectra. It may be less convenient for spectrum fitting however, especially using MCMC based methods. The plot below shows a small subset of the SSP library that I’m currently using. Not too surprisingly there’s a multiple order of magnitude range of absolute flux levels, with the main source of variation being age. What might be less evident in this plot is that in the flux units SDSS uses the coefficient values might be in the tens or hundreds of thousands, with multiple order of magnitude variations among SSP contributors. A simple rescaling of the library spectra to make at least some of the coefficients more nearly unit scaled will address the first problem, but won’t affect the multiple order of magnitude variations among coefficients.

emiles_ssp_spectra
Small sample of EMILES SSP model spectra in original flux units. Top (black): age 1 Myr, from the 2016 update of the BC03 models; middle (red): 1 Gyr; bottom (green): 10 Gyr.

One alternative that’s close to standard statistical practice is to rescale the individual inputs to have approximately the same scale and central value. A popular choice for SED fitting is to normalize to 1 at some fiducial wavelength, often (for UV-optical-NIR data) around V. For my latest model runs I tried renormalizing each SSP spectrum to have a mean value of 1 in a ±150Å window around 5500Å, which happens to be relatively lacking in strong absorption lines at all ages and metallicities.
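In R the renormalization is just a couple of lines. Here is a sketch, with sp_st the wavelength by SSP library matrix and lambda the wavelength grid (the names follow the Stan data block; this is a description of the operation, not a verbatim excerpt of my preprocessing code):

# per-SSP mean flux in a +/- 150 Angstrom window around 5500
vwin <- lambda >= 5350 & lambda <= 5650
norm_st <- colMeans(sp_st[vwin, ])

# divide each column by its V-window mean so every library spectrum
# has mean ~1 near V
sp_st_n <- sweep(sp_st, 2, norm_st, "/")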

emiles_ssp_normalized
Same spectra normalized to 1 in neighborhood of 5500 Å

This normalization makes the SSP coefficients estimates of light fractions at V, which is intuitively appealing since we are after all fitting light. The effect on sampler performance is hard to predict, but as I said at the top of the post a combination of light weighted fitting with the stellar contribution vector declared as a simplex seems to have favorable performance both in terms of execution time and convergence diagnostics. That’s based on a very small set of model runs and no rigorous performance comparisons at all (among other issues, I’ve run versions of these models on at least 4 separate PCs with different generations of Intel processors and both Windows and Linux OSs; this alone produces factor of 2 or more variations in execution time).

For a quick visual comparison here are coefficient values for a single model run on a spectrum from somewhere in the disk of the spiral galaxy that’s been the subject of the last few posts. These are “interval” plots of the marginal posterior distributions of the SSP model contributions, arranged by metallicity bin (of which there are 4) from lowest to highest and within each bin by age from lowest to highest. The left hand strip of plots shows the actual coefficient values representing light fractions at V, while the right hand strip shows values proportional to the mass contributions. The latter is what gets transformed into star formation histories and other related quantities. The largest fraction of the light is coming from the youngest populations, which is not untypical for a star forming region. Most of the mass is in older populations, which again is typical.

Another thing that’s very typical is that contributions come from all age and metallicity bins, with similar but not quite identical age trends within each metallicity bin. I find it a real stretch to believe this tells us anything about the real chemical evolution distribution trend with lookback time, but how to interpret this still needs investigation.
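As a rough sketch of the conversion between the two strips in the plot below, assuming norm_st holds the V-window means that the library columns were divided by (as in the data block above), dividing the light-scaled coefficients by the same normalization gives values proportional to the initial stellar mass contributed by each SSP:

# `post` is rstan::extract() applied to the fitted model;
# post$b_st (= a * b_st_s, computed in generated quantities) holds the
# light-fraction coefficients scaled by a, one row per posterior draw
b_light <- post$b_st

# undo the per-SSP normalization to get mass-proportional coefficients
# (assumes the library columns were divided by norm_st, as sketched earlier)
b_mass <- sweep(b_light, 2, norm_st, "/")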

light_mass_fits
Light (L) and mass (R) weighted marginal distributions of coefficient values for a single model run on a disk galaxy spectrum

First complete model run with modified attenuation curve

Here’s an SDSS finder chart image of one of the two grand design spirals that found its way into my “transitional” galaxy sample:

Central galaxy: Mangaid 1-382712 (plateifu 9491-6101), aka CGCG 088-005

and here’s a zoomed in thumbnail with IFU footprint:

plateifu 9491-6101 IFU footprint

This galaxy is in a compact group and slightly tidally distorted by interaction with its neighbors, but is otherwise a fairly normal star forming system. I picked it because I had a recent set of model runs and because it binned to a manageable but not too small number of spectra (112). The fits to the data in the first set of model runs were good and the (likely) AGN doesn’t have broad lines. Here are a few selected results from the set of model runs using the modified Calzetti attenuation relation on the same binned spectra. First, using a tight prior on the slope parameter δ had the desired effect of returning the prior for δ when τ was near zero, while the marginal posterior for τ was essentially unchanged:

9491-6101_tauv_calzetti_mod
Estimated optical depth for modified Calzetti attenuation vs. unmodified. MaNGA plateifu 9491-6101

At larger optical depths the data do constrain both the shape of the attenuation curve and the optical depth. At low optical depth the posterior uncertainty in δ is about the same as the prior, while it decreases more or less monotonically for higher values of τ. The (posterior mean) values of δ range from slightly shallower than the Calzetti relation to somewhat steeper. The general trend in the dustier regions (per the models) of the galaxy is toward a steeper relation with lower optical depth.

9491-6101_tauv_delta_std
Posterior marginal standard deviation of parameter δ vs. posterior mean optical depth. MaNGA plateifu 9491-6101

There’s an interesting pattern of correlations here (astronomers like to call these “degeneracies,” and it’s fairly well known that they exist among attenuation, stellar age, stellar mass, and other properties), some of which are summarized in the sequence of plots below. The main result is that a steeper (shallower) attenuation curve requires a smaller (larger) optical depth to create a fixed amount of reddening, so there’s a negative correlation between the slope parameter δ and the change in optical depth between the modified and unmodified curves. A lower optical depth means that a smaller amount of unattenuated light, and therefore lower stellar mass, is needed to produce a given observed flux, so there’s a negative correlation between the slope parameter and stellar mass density, or a positive correlation between optical depth and stellar mass. The star formation rate density is correlated in the same sense but slightly more weakly. In this data set both changed by less than about ± 0.05 dex.

(TL) change in optical depth (modified – unmodified Calzetti) vs. slope parameter δ
(TR) change in stellar mass density vs. δ
(BL) change in stellar mass density vs. change in optical depth
(BR) change in SFR density vs. change in optical depth. Note: all quantities are marginal posterior mean estimates.

Here are a few relationships that I’ve shown previously. First between stellar mass and star formation rate density. The points with error bars (which are 95% marginal credible limits) are from the modified Calzetti run, while the red points are from the original. Regions with small stellar mass density and low star formation rate have small attenuation as well in this system, so the estimates hardly differ at all. Only at the high end are differences about as large as the nominal uncertainties.

SFR density vs. stellar mass density. Red points are point estimates for unmodified Calzetti attenuation. MaNGA plateifu 9491-6101

Finally, here is the relation between Hα luminosity density and star formation rate with the former corrected for attenuation using the Balmer decrement. The straight line is, once again, the calibration of Moustakas et al. Allowing the shape of the attenuation curve to vary has a larger effect on the luminosity correction than it does on the SFR estimates, but both sets of estimates straddle the line with roughly equal scatter.

9491-6101_sigma_ha_sfr_mod
SFR density vs. Hα luminosity density. Red points are point estimates for unmodified Calzetti attenuation. MaNGA plateifu 9491-6101

To conclude for now, adding the more flexible attenuation prescription proposed by Salim et al. has some quantitative effects on model posteriors, but so far at least qualitative inferences aren’t significantly affected. I haven’t yet looked in detail at star formation histories or at effects on metallicity or metallicity evolution. I’ve been skeptical that SFH modeling constrains stellar metallicity or (especially) metallicity evolution well, but perhaps it’s time to take another look.

A more flexible prescription for dust attenuation – continued

Here’s a problem that isn’t a huge surprise but that I hadn’t quite anticipated. I initially chose, without a lot of thought, a prior \(\delta \sim \mathcal{N}(0, 0.5)\) for the slope parameter delta in the modified Calzetti relation from the last post. This seemed reasonable given Salim et al.’s result showing that most galaxies have slopes \(-0.1 \lesssim \delta \lesssim +1\) (my wavelength parameterization reverses the sign of \(\delta\) ). At the end of the last post I made the obvious comment that if the modeled optical depth is near 0 the data can’t constrain the shape of the attenuation curve and in that case the best that can be hoped for is the model to return the prior for delta. Unfortunately the actual model behavior was more complicated than that for a spectrum that had a posterior mean tauv near 0:

9491-6101_modcalzetti_loose_pairs
Pairs plot of parameters tauv and delta, plus summed log likelihood with “loose” prior on delta.

Although there weren’t any indications of convergence issues in this run, experts in the Stan community often warn that “banana” shaped posteriors like the joint distribution of tauv and delta above are difficult for the HMC algorithm in Stan to explore. I also suspect the distribution is multi-modal and at least one mode was missed, since the fit to the flux data as indicated by the summed log-likelihood is slightly worse than for the unmodified Calzetti model.

A value of \(\delta\) this small actually reverses the slope of the attenuation curve, making it larger in the red than in the blue. It also resulted in a stellar mass estimate about 0.17 dex larger than the original model, which is well outside the statistical uncertainty:

9491-6101_stellar_mass_dust3ways
Stellar mass estimates for loose and tight priors on delta + unmodified Calzetti attenuation curve

I next tried a tighter prior on delta of 0.1 for the scale parameter, with the following result:

9491-6101_modcalzetti_tight_pairs
Pairs plot of parameters tauv and delta, plus summed log likelihood with “tight” prior on delta.

Now this is what I hoped to see. The marginal posterior of delta almost exactly returns its prior, properly reflecting the inability of the data to say anything about it. The posterior of tauv looks almost identical to the original model with a very slightly longer tail:

9491-6101_calzetti_orig_pairs
Pairs plot of parameters tauv and summed log likelihood, unmodified Calzetti attenuation curve

So this solves one problem but now I must worry that the prior is too tight in general since Salim’s results predict a considerably larger spread of slopes. As a first tentative test I ran another spectrum from the same spiral galaxy (this is mangaid 1-382712) that had moderately large attenuation in the original model (1.08 ± 0.04) with both “tight” and “loose” priors on delta with the following results:

9491-6101_tauv1_modcalzetti_tight_pairs
Joint posterior of parameters tau and delta with “tight” prior
9491-6101_tauv1_modcalzetti_loose_pairs
Joint posterior of parameters tau and delta with “loose” prior

The distributions of tauv look nearly the same, while delta shrinks very slightly towards 0 with a tight prior, but with almost the same variance. Some shrinkage towards Calzetti’s original curve might be OK. Anyway, on to a larger sample.

A more flexible prescription for dust attenuation (?)

I finally got around to reading a paper by Salim, Boquien, and Lee (2018), who proposed a simple modification to Calzetti‘s well known starburst attenuation relation that they claim accounts for most of the diversity of curves in both star forming and quiescent galaxies. For my purposes their equation 3, which summarizes the relation, can be simplified in two ways. First, for mostly historical reasons optical astronomers typically quantify the effect of dust with a color excess, usually E(B-V). If the absolute attenuation is needed, which is certainly the case in SFH modeling, the ratio of absolute to “selective” attenuation, usually written as \(\mathrm{R_V = A_V/E(B-V)}\), is needed as well. Parameterizing attenuation by color excess adds an unnecessary complication for my purposes. Salim et al. use a family of curves that differ only in a “slope” parameter \(\delta\) in their notation. Changing \(\delta\) changes \(\mathrm{R_V}\) according to their equation 4. But I have always parametrized dust attenuation by optical depth at V, \(\tau_V\), so that the relation between intrinsic and observed flux is

\(F_o(\lambda) = F_i(\lambda) \mathrm{e}^{-\tau_V k(\lambda)}\)

Note that parameterizing by optical depth rather than total attenuation \(\mathrm{A_V}\) is just a matter of taste since they only differ by a constant factor (≈ 1.086). The wavelength dependent part of the relationship is the same.
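Explicitly, since attenuation in magnitudes is just \(-2.5\log_{10}\) of the flux ratio,

\(A_V = -2.5\log_{10}(F_o/F_i)\big|_{5500} = 2.5\log_{10}(\mathrm{e})\,\tau_V \approx 1.086\,\tau_V\)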

The second simplification results from the fact that the UV or 2175Å “bump” in the attenuation curve will never be redshifted into the spectral range of MaNGA data and in any case the subset of the EMILES library I currently use doesn’t extend that far into the UV. That removes the bump amplitude parameter and the second term in Salim et al.’s equation 3, reducing it to the form

\(k_{mod}(\lambda) = k_{Cal}(\lambda)(\frac{\lambda}{5500})^\delta\)

The published expression for \(k(\lambda)\) in Calzetti et al. (2000) is given in two segments with a small discontinuity due to rounding at the transition wavelength of 6300Å. This produces a small unphysical discontinuity when applied to spectra, so I just made a polynomial fit to the Calzetti curve over a longer wavelength range than gets used for modeling MaNGA or SDSS data. Also, I make the wavelength parameter \(y = 5500/\lambda\) instead of using the wavelength in microns as in Calzetti. With these adjustments Calzetti’s relation is, to more digits than necessary:

\(k_{Cal}(y) = -0.10177 + 0.549882y + 1.393039 y^2 - 1.098615 y^3 + 0.260628 y^4\)

and Salim’s modified curve is

\(k_{mod}(\lambda) = k_{Cal}(\lambda)y^\delta\)

Note that δ has the opposite sign as in Salim et al. Here is what the curve looks like over the observer frame wavelength range of MaNGA. A positive value of δ produces a steeper attenuation curve than Calzetti’s, while a negative value is shallower (grayer in common astronomers’ jargon). Salim et al. found typical values to range between about -0.1 and +1.
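For reference, here is the same modified curve as a small R function (using the coefficients from the Stan version of calzetti_mod), which is handy for drawing curves like the one shown below; the wavelength grid is just an approximation to MaNGA’s observer frame coverage.

# modified Calzetti attenuation: fraction of flux transmitted at each
# wavelength, normalized so k = 1 at 5500 Angstroms; delta = 0 recovers
# the polynomial fit to the original Calzetti curve
calzetti_mod <- function(lambda, tauv, delta) {
  y <- 5500 / lambda
  k <- -0.10177 + 0.549882 * y + 1.393039 * y^2 - 1.098615 * y^3 + 0.260618 * y^4
  exp(-tauv * k * y^delta)
}

lambda <- seq(3600, 10300, by = 10)   # approximate MaNGA observer frame range
att_steep <- calzetti_mod(lambda, tauv = 1, delta =  0.3)
att_gray  <- calzetti_mod(lambda, tauv = 1, delta = -0.3)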

calzetti_mod_curve
Calzetti attenuation relation with modification proposed by Salim, Boquien, and Lee (2018). Dashed lines show shift in curve for their parameter δ = ± 0.3.

For a first pass attempt at modeling some real data I chose a spectrum from near the northern nucleus of Mrk 848 but outside the region of the highest velocity outflows. This spectrum had large optical depth \(\tau_V \approx 1.52\) and the unusual nature of the galaxy gave reason to think its extinction curve might differ significantly from Calzetti’s.

Encouragingly, the Stan model converged without difficulty and with an acceptable run time. Below are some posterior plots of the two attenuation parameters and a comparison to the optical depth parameter in the unmodified Calzetti dust model. I used a fairly informative prior of Normal(0, 0.5) for the parameter delta. The data actually constrain the value of delta, since its posterior marginal density is centered around -0.06 with a standard deviation of just 0.02. In the pairs plot below we can see there’s some correlation between the posteriors of tauv and delta, but not so much as to be concerning (yet).

mrk848_tauv_delta
(TL) marginal posteriors of optical depth parameter for Calzetti (red) and modified (dark gray) relation. (TR) parameter δ in modified Calzetti relation (BL) pairs plot of `tauv` and `delta` (BR) trace plots of `tauv` and `delta`

Overall the modified Calzetti model favors a slightly grayer attenuation curve with lower absolute attenuation:

mrk848_n_attenuation
Total attenuation for original and modified Calzetti relations. Spectrum was randomly selected near the northern nucleus of Mrk 848.

Here’s a quick look at the effect of the model modification on some key quantities. In the plot below the red symbols are for the unmodified Calzetti attenuation model, and gray or blue the modified one. These show histograms of the marginal posterior density of total stellar mass, 100Myr averaged star formation rate, and specific star formation rate. Because the modified model has lower total attenuation it needs fewer stars, so the lower stellar mass (by ≈ 0.05 dex) is a fairly predictable consequence. The star formation rate is also lower by a similar amount, making the estimates of specific star formation rate nearly identical.

The lower right pane compares model mass growth histories. I don’t have any immediate intuition about how the difference in attenuation models affects the SFH models, but notice that both show recent acceleration in star formation, which was a main theme of my Markarian 848 posts.

Stellar mass, SFR, SSFR, and mass growth histories for original and modified Calzetti attenuation relation.

So, this first run looks ok. Of course the problem with real data is there’s no way to tell if a model modification actually brings it closer to reality — in this case it did improve the fit to the data a little bit (by about 0.2% in log likelihood) but some improvement is expected just from adding a parameter.

My concern right now is that if the dust attenuation is near 0 the data can’t constrain the value of δ. The best that can happen in this situation is for the model to return the prior. Preliminary investigation of a low attenuation spectrum (per the original model) suggests that in fact a tighter prior on delta is needed than what I originally chose.

Dust attenuation measured two ways

One more post making use of the measurement error model introduced last time and then I think I move on. I estimate the dust attenuation of the starlight in my SFH models using a Calzetti attenuation relation parametrized by the optical depth at V (τV). If good estimates of Hα and Hβ emission line fluxes are obtained we can also make optical depth estimates of emission line regions. Just to quickly review the math we have:

\(A_\lambda = \mathrm{e}^{-\tau_V k(\lambda)}\)

where \(k(\lambda)\) is the attenuation curve normalized to 1 at V (5500Å) and \(A_\lambda\) is the fractional flux attenuation at wavelength λ. Assuming an intrinsic Balmer decrement of 2.86, which is more or less the canonical value for H II regions, the estimated optical depth at V from the observed fluxes is:

\(\tau_V^{bd} = \ln\left(\frac{\mathrm{F}(\mathrm{H}\alpha)/\mathrm{F}(\mathrm{H}\beta)}{2.86}\right) \Big/ \left(k(4861)-k(6563)\right)\)

The SFH models return samples from the posteriors of the emission lines, from which are calculated sample values of \(\tau_V^{bd}\).
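A sketch of that calculation in R, with f_ha and f_hb standing in for posterior draws of the Hα and Hβ fluxes (placeholder names, not the ones in my code) and the polynomial fit to the Calzetti curve from earlier posts:

# polynomial fit to the Calzetti curve, normalized to 1 at 5500 Angstroms
k_cal <- function(lambda) {
  y <- 5500 / lambda
  -0.10177 + 0.549882 * y + 1.393039 * y^2 - 1.098615 * y^3 + 0.260618 * y^4
}

# one tau estimate per posterior draw of the line fluxes
tauv_bd <- log((f_ha / f_hb) / 2.86) / (k_cal(4861) - k_cal(6563))
c(mean(tauv_bd), sd(tauv_bd))   # point estimate and uncertainty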

Here is a plot of the estimated attenuation from the Balmer decrement vs. the SFH model estimates for all spectra from the 28 galaxy sample in the last two posts that have BPT classifications other than no or weak emission. Error bars are ±1 standard deviation.

tauv_bd__tauv
τVbd vs. τVstellar for 962 binned spectra in 28 MaNGA galaxies. Cloud of lines is from fit described in text. Solid and dashed lines are 1:1 and 2:1 relations.

It’s well known that attenuation in emission line regions is larger than that of the surrounding starlight, with a typical reddening ratio of ∼2 (among many references see the review by Calzetti (2001) and Charlot and Fall (2000)). One thing that’s clear in this plot that I haven’t seen explicitly mentioned in the literature is that even in the limit of no attenuation of starlight there typically is some in the emission lines. I ran the regression with measurement error model on this data set, and got the estimated relationship \(\tau_V^{bd} = 0.8 (\pm 0.05) + 1.7 ( \pm 0.09) \tau_V^{stellar}\) with a rather large estimated scatter of ≈ 0.45. So the slope is a little shallower than what’s typically assumed. The non-zero intercept seems to be a robust result, although it’s possible the models are systematically underestimating Hβ emission. I have no other reason to suspect that, though.

The large scatter shouldn’t be too much of a surprise. The shape of the attenuation curve is known to vary between and even within galaxies. Adopting a single canonical value for the Balmer decrement may be an oversimplification too, especially for regions ionized by mechanisms other than hot young stars. My models may be overdue for a more flexible prescription for attenuation.

The statistical assumptions of the measurement error model are a little suspect in this data set as well. The attenuation parameter tauv is constrained to be positive in the models. When it wants to be near 0 the samples from the posterior will pile up near 0 with a tail of larger values, looking more like draws from an exponential or gamma distribution than a gaussian. Here is an example from one galaxy in the sample that happens to have a wide range of mean attenuation estimates:

tauv_example_posteriors
Histograms of samples from the marginal posterior distributions of the parameter tauv for 4 spectra from plateifu 8080-3702.

I like theoretical quantile-quantile plots better than histograms for this type of visualization:

tauv_example_posterior_qqnorm
Normal quantile-quantile plots of samples from the marginal posterior distributions of the parameter tauv for 4 spectra from plateifu 8080-3702.
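A plot like this takes only a couple of lines of base R graphics applied to the posterior draws; a minimal sketch for one spectrum, with sfit standing in for its stanfit object:

# normal QQ plot of the marginal posterior of tauv for one spectrum
tauv_draws <- rstan::extract(sfit, pars = "tauv")$tauv
qqnorm(tauv_draws, main = "tauv")
qqline(tauv_draws)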

I haven’t looked at the distributions of emission line ratios in much detail. They might behave strangely in some cases too. But regardless of the validity of the statistical model applied to this data set it’s apparent that there is a fairly strong correlation, which is encouraging.

A simple Bayesian model for linear regression with measurement errors

Here’s a pretty common situation with astronomical data: two or more quantities are measured with nominally known uncertainties that in general will differ between observations. We’d like to explore the relationship among them, if any. After graphing the data and establishing that there is a relationship a first cut quantitative analysis would usually be a linear regression model fit to the data. But the ordinary least squares fit is biased and might be severely misleading if the measurement errors are large enough. I’m going to present a simple measurement error model formulation that’s amenable to Bayesian analysis and that I’ve implemented in Stan. This model is not my invention by the way — in the astronomical literature it dates at least to Kelly (2007), who also explored a number of generalizations. I’m only going to discuss the simplest case of a single predictor or covariate, and I’m also going to assume all conditional distributions are gaussian.

The basic idea of the model is that the real, unknown quantities (statisticians call these latent variables, and so will I) are related through a linear regression. The conditional distribution of the latent dependent variable is

\(y^{lat}_i | x^{lat}_i, \beta_0, \beta_1, \sigma \sim \mathcal{N}(\beta_0 + \beta_1 x^{lat}_i, \sigma)~~~ i = 1, \cdots, N\)

The observed values then are generated from the latent ones with distributions

\(y^{obs}_i| y^{lat}_i \sim \mathcal{N}(y^{lat}_i, \sigma_{y, i})\)

\(x^{obs}_i| x^{lat}_i \sim \mathcal{N}(x^{lat}_i, \sigma_{x, i})~~~ i = 1, \cdots, N\)

where \(\sigma_{x, i}, \sigma_{y, i}\) are the known standard deviations. The full joint distribution is completed by specifying priors for the parameters \(\beta_0, \beta_1, \sigma\). This model is very easy to implement in the Stan language and the complete code is listed below. I’ve also uploaded the code, a script to reproduce the simulated data example discussed below, and the SFR-stellar mass data from the last page in a dropbox folder.

/**
 * Simple regression with measurement error in x and y
*/

data {
  int<lower=0> N;
  vector[N] x;
  vector<lower=0>[N] sd_x;
  vector[N] y;
  vector<lower=0>[N] sd_y;
} 

// standardize data
transformed data {
  real mean_x = mean(x);
  real sd_xt = sd(x);
  real mean_y = mean(y);
  real sd_yt = sd(y);
  
  vector[N] xhat = (x - mean_x)/sd_xt;
  vector<lower=0>[N] sd_xhat = sd_x/sd_xt;
  
  vector[N] yhat = (y - mean_y)/sd_yt;
  vector<lower=0>[N] sd_yhat = sd_y/sd_yt;
  
}


parameters {
  vector[N] x_lat;
  vector[N] y_lat;
  real beta0;
  real beta1;
  real<lower=0> sigma;
}
transformed parameters {
  vector[N] mu_yhat = beta0 + beta1 * x_lat;
}
model {
  x_lat ~ normal(0., 1000.);
  beta0 ~ normal(0., 5.);
  beta1 ~ normal(0., 5.);
  sigma ~ normal(0., 10.);
  
  xhat ~ normal(x_lat, sd_xhat);
  y_lat ~ normal(mu_yhat, sigma);
  yhat ~ normal(y_lat, sd_yhat);
  
} 

generated quantities {
  vector[N] xhat_new;
  vector[N] yhat_new;
  vector[N] y_lat_new;
  vector[N] x_new;
  vector[N] y_new;
  vector[N] mu_x;
  vector[N] mu_y;
  real b0;
  real b1;
  real sigma_unorm;
  
  b0 = mean_y + sd_yt*beta0 - beta1*sd_yt*mean_x/sd_xt;
  b1 = beta1*sd_yt/sd_xt;
  sigma_unorm = sd_yt * sigma;
  
  mu_x = x_lat*sd_xt + mean_x;
  mu_y = mu_yhat*sd_yt + mean_y;
  
  for (n in 1:N) {
    xhat_new[n] = normal_rng(x_lat[n], sd_xhat[n]);
    y_lat_new[n] = normal_rng(beta0 + beta1 * x_lat[n], sigma);
    yhat_new[n] = normal_rng(y_lat_new[n], sd_yhat[n]);
    x_new[n] = sd_xt * xhat_new[n] + mean_x;
    y_new[n] = sd_yt * yhat_new[n] + mean_y;
  }
}

Most of this code should be self explanatory, but there are a few things to note. In the transformed data section I standardize both variables, that is I subtract the means and divide by the standard deviations. The individual observation standard deviations are also scaled. The transformed parameters block is strictly optional in this model. All it does is spell out the linear part of the linear regression model.

In the model block I give a very vague prior for the latent x variables. This has almost no effect on the model output since the posterior distributions of the latent values are strongly constrained by the observed data. The parameters of the regression model are given less vague priors. Since we standardized the data we know beta0 should be centered near 0 and beta1 near 1 and all three should be approximately unit scaled. The rest of the model block just encodes the conditional distributions I wrote out above.

Finally the generated quantities block does two things: generate some new simulated data values under the model using the sampled parameters, and rescale the parameter values to the original data scale. The first task enables what’s called “posterior predictive checking.” Informally the idea is that if the model is successful, simulated data generated under it should look like the data that was input.

It’s always a good idea to try out a model with some simulated data that conforms to it, so here’s a script to generate some data, then print and graph some basic results. This is also in the dropbox folder.

testme <- function(N=200, mu_x=0, b0=10, b1=2, sigma=0.5, seed1=1234, seed2=23455, ...) {
  require(rstan)
  require(ggplot2)
  set.seed(seed1)
  x_lat <- rnorm(N, mean=mu_x, sd=1.)
  sd_x <- runif(N, min=0.1, max=0.2)
  sd_y <- runif(N, min=0.1, max=0.2)
  x_obs <- x_lat+rnorm(N, sd=sd_x)
  y_lat <- b0 + b1*x_lat + rnorm(N, sd=sigma)
  y_obs <- y_lat + rnorm(N, sd=sd_y)
  stan_dat <- list(N=N, x=x_obs, sd_x=sd_x, y=y_obs, sd_y=sd_y)
  df1 <- data.frame(x_lat=x_lat, x=x_obs, sd_x=sd_x, y_lat=y_lat, y=y_obs, sd_y=sd_y)
  sfit <- stan(file="ls_me.stan", data=stan_dat, seed=seed2, chains=4, cores=4, ...)
  print(sfit, pars=c("b0", "b1", "sigma_unorm"), digits=3)
  g1 <- ggplot(df1)+geom_point(aes(x=x, y=y))+geom_errorbar(aes(x=x,ymin=y-sd_y,ymax=y+sd_y)) +
                      geom_errorbarh(aes(y=y, xmin=x-sd_x, xmax=x+sd_x))
  post <- extract(sfit)
  df2 <- data.frame(x_new=as.numeric(post$x_new), y_new=as.numeric(post$y_new))
  g1 <- g1 + stat_ellipse(aes(x=x_new, y=y_new), data=df2, geom="path", type="norm", linetype=2, color="blue") +
            geom_abline(slope=post$b1, intercept=post$b0, alpha=1/100)
  plot(g1)
  list(sfit=sfit, stan_dat=stan_dat, df=df1, graph=g1)
}
  

This should print the following output:

Inference for Stan model: ls_me.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean    sd  2.5%   25%   50%    75%  97.5% n_eff  Rhat
b0          9.995   0.001 0.042 9.912 9.966 9.994 10.023 10.078  5041 0.999
b1          1.993   0.001 0.040 1.916 1.965 1.993  2.020  2.073  4478 1.000
sigma_unorm 0.472   0.001 0.037 0.402 0.447 0.471  0.496  0.549  2459 1.003

Samples were drawn using NUTS(diag_e) at Thu Jan  9 10:09:34 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
> 

and graph:

bayes_regression_measurment_error_fake
Simulated data and model fit from script in text. Ellipse is 95% confidence interval for new data generated from model.

This all looks pretty good. The true parameters are well within the 95% confidence bounds of the estimates. In the graph the simulated new data is summarized with a 95% confidence ellipse, which encloses just about 95% of the input data, so the posterior predictive check indicates a good model. Stan is quite aggressive at flagging potential convergence failures, and no warnings were generated.

Turning to some real data I also included in the dropbox folder the star formation rate density versus stellar mass density data that I discussed in the last post. This is in something called R “dump” format, which is just an ascii file with R assignment statements for the input data. This isn’t actually a convenient form for input to rstan’s sampler or ggplot2’s plotting commands, so once loaded the data are copied into a list and a data frame. The interactive session for analyzing the data was:

 source("sf_mstar_sfr.txt")
> sf_dat <- list(N=N, x=x, sd_x=sd_x, y=y, sd_y=sd_y)
> df <- data.frame(x=x,sd_x=sd_x,y=y,sd_y=sd_y)
> stan_sf <- stan(file="ls_me.stan",data=sf_dat, chains=4,seed=12345L)

> post <- extract(stan_sf)
> odr <- pracma::odregress(df$x,df$y)
> ggplot(df, aes(x=x,y=y))+geom_point()+geom_errorbar(aes(x=x,ymin=y-sd_y,ymax=y+sd_y))+geom_errorbarh(aes(y=y,xmin=x-sd_x,xmax=x+sd_x))+geom_abline(slope=post$b1,intercept=post$b0,alpha=1/100)+stat_ellipse(aes(x=x_new,y=y_new),data=data.frame(x_new=as.numeric(post$x_new),y_new=as.numeric(post$y_new)),geom="path",type="norm",linetype=2, color="blue")+geom_abline(slope=odr$coef[1],intercept=odr$coef[2],color='red',linetype=2)
> print(stan_sf, pars=c("b0","b1","sigma_unorm"))

 
bayes_regression_measurment_error_sfms
Star formation rate vs. Stellar mass for star forming regions. Data from previous post. Semi-transparent lines – model fits to the regression line as described in text. Dashed red line – “orthogonal distance regression” fit.
Inference for Stan model: ls_me.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
b0          -11.20       0 0.26 -11.73 -11.37 -11.19 -11.02 -10.67  7429    1
b1            1.18       0 0.03   1.12   1.16   1.18   1.20   1.24  7453    1
sigma_unorm   0.27       0 0.01   0.25   0.27   0.27   0.28   0.29  7385    1

Samples were drawn using NUTS(diag_e) at Thu Jan  9 10:49:42 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Once again the model seems to fit the data pretty well, the posterior predictive check is reasonable, and Stan didn’t complain. Astronomers are often interested in the “cosmic variance” of various quantities, that is the true amount of unaccounted for variation. If the error estimates in this data set are reasonable the cosmic variance in SFR density is estimated by the parameter σ. The mean estimate of around 0.3 dex is consistent with other estimates in the literature (see the compilation in the paper by Speagle et al. that I cited last time, for example).

I noticed in preparing the last post that several authors have used something called “orthogonal distance regression” (ODR) to estimate the star forming main sequence relationship. I don’t know much about the technique beyond that it’s an errors-in-variables regression method. There is an R implementation in the package pracma. The red dashed line in the plot above is the estimate for this dataset. The estimated slope (1.41) is much steeper than the range of estimates from this model. On casual visual inspection though it’s not clearly worse at capturing the mean relationship.

A few properties of my transitional galaxy candidate sample

It took a few months but I did manage to analyze 28 of the 29 galaxies in the sample I introduced last time. One member — mangaid 1-604907 — hosts a broad line AGN and has broad emission lines throughout. That’s not favorable for my modeling methods, so I left it out. It took a while to develop a more or less standardized analysis protocol, so there may be some variation in S/N cuts in binning the spectra and in details of model runs in Stan. Most runs used 250 warmup and 750 total iterations for each of 4 chains run in parallel, with some adaptation parameters changed from their default values (I set the target acceptance probability adapt_delta to 0.925 or 0.95 and the maximum treedepth for the No U-Turn Sampler max_treedepth to 11-12). A total post-warmup sample size of 2000 is enough for the inferences I want to make. One of the major advantages of the NUTS sampler is that once it converges it tends to produce draws from the posterior with very low autocorrelation, so effective sample sizes tend to be close to the number of samples.

I’m just going to look at a few measured properties of the sample in this post. In future ones I may look in more detail at some individual galaxies or the sample as a whole. Without a control sample it’s hard to say if this one is significantly different from a randomly chosen sample of galaxies, and I’m not going to try. In the plots shown below each point represents measurements on a single binned spectrum. The number of binned spectra per galaxy ranged from 15 to 153 with a median of 51.5, so a relatively small number of galaxies contribute disproportionately to these plots.

One of the more important empirical results in extragalactic astrophysics is the existence of a fairly well defined and approximately linear relationship between stellar mass and star formation rate for star forming galaxies, which has come to be known as the “star forming main sequence.” Thanks to CALIFA and MaNGA it’s been established in recent years that the SFMS extends to subgalactic scales as well, at least down to the ∼kpc resolution of these surveys. This first plot is of the star formation rate surface density vs. stellar mass surface density, where recall my estimate of SFR is for a time scale of 100 Myr. Units are \(\mathrm{M_\odot /yr/kpc^2} \) and \(\mathrm{M_\odot /kpc^2} \), logarithmically scaled. These estimates are uncorrected for inclination and are color coded by BPT class using Kauffmann’s classification scheme for [N II] 6584, with two additional classes for spectra with weak or no emission lines.

If we take spectra with star forming line ratios as comprising the SFMS there is a fairly tight relation: the cloud of lines are estimates from a Bayesian simple linear regression with measurement error model fit to the points with star forming BPT classification only (N = 428). The modeled relationship is \(\Sigma_{sfr} = -11.2 (\pm 0.5) + 1.18 (\pm 0.06)~ \Sigma_{M^*}\) (95% marginal confidence limits), with a scatter around the mean relation of ≈ 0.27 dex. The slope here is rather steeper than most estimates (for example, in a large compilation by Speagle et al. (2014) none of the estimates exceeded a slope of 1), but perhaps coincidentally it is very close to an estimate for a small sample of MaNGA starforming galaxies in Lin et al. (2019). I don’t assign any particular significance to this result. The slope of the SFMS is highly sensitive to the fitting method used, the SFR and stellar mass calibrators, and selection effects. Also, the slope and intercept estimates are highly correlated for both Bayesian and frequentist fitting methods.

One notable feature of this plot is the rather clear stratification by BPT class, with regions having AGN/LINER line ratios and weak emission line regions offset downwards by ~1 dex. Interestingly, regions with “composite” line ratios straddle both sides of the main sequence, with some of the largest outliers on the high side. This is mostly due to the presence of Markarian 848 in the sample, which we saw in recent posts has composite line ratios in most of the area of the IFU footprint and high star formation rates near the northern nucleus (with even more hidden by dust).

sigma_sfrXsigma_mstar
Σsfr vs. ΣM*. Cloud of straight lines is an estimate of the star-forming main sequence relation based on spectra with star-forming line ratios. Sample is all analyzed spectra from the set of “transitional” candidates of the previous post.

Another notable relationship that I’ve shown previously for a few individual galaxies is between the star formation rate estimated from the SFH models and Hα luminosity, which is the main SFR calibrator in optical spectra. In the left hand plot below Hα is corrected for the estimated attenuation for the stellar component in the SFH models. The straight line is the SFR-Hα calibration of Moustakas et al. (2006), which can be traced back to early ’90s work by Kennicutt.

Most of the sample does follow a linear relationship between SFR density and Hα luminosity density with an offset from the Kennicutt-Moustakas calibration, but there appears to be a departure from linearity at the low SFR end, in the sense that the 100 Myr averaged SFR exceeds the amount predicted by Hα (which recall traces star formation on 5-10 Myr scales). This might be interpreted as indicating that the sample includes a significant number of regions that have been very recently quenched (that is, within the past 10-100 Myr). There are other possible interpretations though, including biased estimates of Hα luminosity when emission lines are weak.

In the right hand panel below I plot the same relationship but with Hα corrected for attenuation using the Balmer decrement for spectra with firm detections in the four lines that go into the [N II]/Hα vs. [O III]/Hβ BPT classification, and therefore have firm detections in Hβ. The sample now nicely straddles the calibration line over the ∼ 4 orders of magnitude of SFR density estimates. So, the attenuation in the regions where emission lines arise is systematically higher than the estimated attenuation of stellar light. This is a well known result. What’s encouraging is it implies my model attenuation estimates actually contain useful information.

sigma_sfrXsigma_logl_ha
(L) Estimated Σsfr vs. Σlog L(Hα) corrected for attenuation using stellar attenuation estimate. (R) same but Hα luminosity corrected using Balmer decrement. Spectra with detected Hβ emission only.

One final relation: some measure of the 4000Å break strength has been used as a calibrator of specific star formation rate since at least Brinchmann et al. (2004). Below is my version using the “narrow” definition of D4000. I haven’t attempted a quantitative comparison with any other work, but clearly there’s a well defined relationship. Maybe worth noting is that “red and dead” ETGs typically have \(\mathrm{D_n(4000)} \gtrsim 1.75\) (see my previous post for example). Very few of the spectra in this sample fall in that region, and most of those that do are low S/N spectra in the outskirts of a few of the galaxies.
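For reference, here is a sketch of the narrow index as usually defined (the ratio of mean \(F_\nu\) in 4000-4100Å to that in 3850-3950Å, rest frame); the function and variable names are placeholders, not my actual measurement code.

# narrow 4000 Angstrom break strength for one rest-frame spectrum;
# `lambda` in Angstroms, `flux` in F_lambda units
d4000_n <- function(lambda, flux) {
  fnu <- flux * lambda^2          # F_nu is proportional to lambda^2 * F_lambda
  red  <- lambda >= 4000 & lambda <= 4100
  blue <- lambda >= 3850 & lambda <= 3950
  mean(fnu[red]) / mean(fnu[blue])
}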

ssfrXd4000
Specific star formation rate vs. Dn4000

Two obvious false positives in this sample were a pair of grand design spirals (mangaids 1-23746 and 1-382712) with H II regions sprinkled along the full length of their arms. To see why they were selected, and to verify that they’re in fact false positives, here are BPT maps:

8611-12702_bptmap
Map of BPT classification — mangaid 1-23746 (plateifu 8611-12702)
9491-6101_bptmap
Map of BPT classification — mangaid 1-382712 (plateifu 9491-6101)

These are perfect illustrations of the perils of using single fiber spectra for sample selection when global galaxy properties are of interest. The central regions of both galaxies have “composite” spectra, which might actually indicate that the emission is from a combination of AGN and star forming regions, but outside the nuclear regions star forming line ratios prevail throughout.

These two galaxies contribute about 45% of the binned spectra with star forming line ratios, so the SFMS would be much more sparsely populated without their contribution. Only one other galaxy (mangaid 1-523050) is similarly dominated by SF regions and it has significantly disturbed morphology.

I may return to this sample or individual members in the future. Probably my next posts will be about Bayesian modelling though.

Selecting “transitioning” galaxies

I’ve mentioned a few times that I took a shot several years ago at selecting a sample of post-starburst, or to use a term that’s gained some currency recently, “transitioning” galaxies1transitioning in this context implies that star formation has recently ceased or is in the process of shutting down. To cite one example Alatalo et al. (2017) use the word 18 times with this meaning. One thing that astronomers have come to realize is that traditional K+A spectroscopic selection criteria (basically requiring a combination of strong Balmer absorption lines and weak emission) potentially miss important populations, for example galaxies hosting an AGN. Selecting for weak emission also implies that star formation has already shut down, which precludes finding galaxies that are in the process of shutting down but still forming stars. On the other hand simply dropping constraints on emission won’t work because the resulting sample would be contaminated with a large fraction of normal star forming galaxies. Here is a popular spectroscopic diagnostic diagram, plotting the (pseudo) Lick HδA index against the 4000Å break strength index Dn(4000). This version uses the MPA-JHU measurements, available for approximately half of the SDSS single fiber galaxy spectra from DR8.

hd_d4000_380K_mpa_jhu
The Lick Hδ – 4000Å break strength plane for a large sample of SDSS galaxies. Measurements from the MPA-JHU pipeline.

As is true of many observed properties of galaxies the distribution in this diagram is strongly bimodal, with one of the modes centered right around HδA ≈ 5Å. For the most part this region of the Hδ-D4000 plane is occupied by starforming galaxies — somewhere around 8-10% of all SDSS spectra with MPA measurements have HδA ≥ 5Å, while true K+A galaxies are no more than a fraction of a percent of the local population. I took the simplest possible approach to try to minimize contamination from starforming galaxies: besides a traditional K+A selection with slightly relaxed emission line constraints I made a selection of spectra with emission line ratios in the “other than starforming” region of the [N II]/Hα vs [O III]/Hβ BPT diagnostic. I decided to use Kauffmann’s empirical starforming boundary as the selection criterion, and therefore included the so-called “composite” region of the BPT diagnostic. For the sake of completeness and reproducibility here is part of the CASJobs query I used. I saved a number of measurements from the MPA spectroscopic pipeline as well as the SDSS photo pipeline that aren’t included in the listing below. I did include a few lines just to point out that it’s important to use the emission line subtracted version of the Lick Hδ index because it can be significantly contaminated. Most of the where clause of the query is selecting spectra with firm detections and good error estimates of the relevant emission and absorption lines. There is also a redshift range constraint — the upper limit is set to avoid contamination of Hδ by the 5577Å terrestrial night sky line. Finally there are some modest quality constraints.

select into mydb.kaufpostagn
  gi.lick_hd_a_sub as lick_hd_a,
  gi.lick_hd_a_sub_err as lick_hd_a_err,
  gi.d4000_n,
  gi.d4000_n_err
from specObj s
  left outer join PhotoObj as p on s.bestObjid = p.Objid
  left outer join galSpecline as g on s.specObjid = g.specObjid
  left outer join galSpecIndx as gi on s.specObjid = gi.specObjid
  left outer join galSpecExtra as ge on s.specObjid = ge.specObjid
  where
  (g.h_alpha_flux/g.h_alpha_flux_err > 3 and g.h_alpha_flux_err > 0) and
  (g.nii_6584_flux/g.nii_6584_flux_err > 3 and g.nii_6584_flux_err > 0) and
  (g.h_beta_flux/g.h_beta_flux_err > 3 and g.h_beta_flux_err > 0) and
  (g.oiii_flux/g.oiii_flux_err > 3 and g.oiii_flux_err > 0) and
  ((0.61/(log10(g.nii_6584_flux/g.h_alpha_flux)-0.05)+1.3 < 
    log10(g.oiii_flux/g.h_beta_flux)) or (log10(g.nii_6584_flux/g.h_alpha_flux)>=0.05)) and
  (gi.lick_hd_a_sub > 5 and gi.lick_hd_a_sub_err > 0) and
  (s.z > .02 and s.z < 0.35) and
  (s.snMedian > 10) and
  (s.zWarning = 0 or s.zWarning = 16)
  order by
  s.plate, s.mjd, s.fiberid
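The BPT condition in the where clause is just the Kauffmann et al. (2003) starforming boundary rearranged for SQL: a spectrum is selected if it lies above the curve

\[
\log_{10}([\mathrm{O\,III}]/\mathrm{H\beta}) > \frac{0.61}{\log_{10}([\mathrm{N\,II}]/\mathrm{H\alpha}) - 0.05} + 1.3
\]

or to the right of its vertical asymptote at \(\log_{10}([\mathrm{N\,II}]/\mathrm{H\alpha}) = 0.05\).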

I used a second query to make a more traditional K+A selection. This all could have been done with a single query but the conditions get a bit complicated. These queries, run in the DR10 “context”2the data release doesn’t actually matter as long as it’s later than DR7, since the last release the MPA pipeline was run on was DR8., should return 4,235 and 874 hits respectively, with some overlap.

select into mydb.myka
  s.ra,
  s.dec,
  s.plate,
  s.mjd,
  s.fiberid,
  s.specObjid
from specObj s
left outer join PhotoObj as p on s.bestObjid = p.Objid
left outer join galSpecline as g on s.specObjid = g.specObjid
left outer join galSpecIndx as gi on s.specObjid = gi.specObjid
left outer join galSpecExtra as ge on s.specObjid = ge.specObjid
where
  (g.oii_3729_eqw > -5 and g.oii_3729_eqw_err > 0) and
  (g.h_alpha_eqw >  -5 and g.h_alpha_eqw_err > 0) and
  (gi.lick_hd_a_sub > 5 and gi.lick_hd_a_sub_err > 0) and
  (s.z > .02 and s.z < 0.35) and
  (s.snMedian > 10) and
  (s.zWarning = 0 or s.zWarning = 16)
order by
  s.plate, s.mjd, s.fiberid

I thought at the time, based solely on examining SDSS thumbnails, that there were rather too many false positives in the form of normal starforming galaxies for the intended purpose of the sample. The choice to include galaxies in Kauffmann’s composite region played some role in this: although even the people responsible for this classification scheme now admit it’s too simple, some galaxies really do have spectra that are composites of starforming regions and AGN. But the more important reason, I think, is the well known problem that single fiber spectra only cover a portion of most nearby galaxies, and galaxy nuclei (which were the intended targets) just aren’t representative of the global properties of galaxies. The SPOGS3an acronym formed somehow from the term “shocked post-starburst galaxy survey.” sample of Alatalo et al. 2016, which has similar aims to my selection but more elaborate selection criteria, has (in my opinion) a similar issue of false positives for the same reason.

MaNGA provides a unique opportunity to examine the causes of false positives in my transitional sample, and I hope also to confirm some real positives. A simple position crossmatch of my sample, which has about 5500 members, with MaNGA DR15 object coordinates produced 29 matches with a position tolerance of about 3″. SDSS thumbnails of the 29 are shown below. By my count at least 13 of the 29, or about 45%, are visibly disturbed in some way, with 7 of those being clear merger remnants with evident shells and tidal tails.

I’ve been calculating star formation history models for these at a leisurely pace and plan to write more about the sample in future posts. Two of these I’ve already written about: besides Mrk 848, number 15 in the postage stamps was the subject of a series of posts starting here. I may write more case studies like those, or perhaps take a more holistic look at the sample and compare it to a control group.

As something of an aside, with a looser position matching tolerance a 30th object turns up:

CGCG 390-066 (SDSS finder chart image)
CGCG 390-066 (Legacy survey cutout)

This object made it into my sample based on the spectrum of the eastern nucleus. It turns out to be doubly strange. Besides having two apparent nuclei, when I did some preliminary spectral fitting I noticed a peculiar pattern of residuals that turned out to be due to this:

8083-9101_spec_extract
Extract from a single spectrum of plateifu 8083-9101 (mangaid 1-38017)

Many of the spectra have emission lines at exactly their rest frame wavelengths. This particular spectrum, which came from somewhere in the SE quadrant of the IFU footprint, clearly shows Hα plus the [N II] and [S II] doublets offset from the same lines at the redshift of the galaxy. It turns out this galaxy lies near the edge of the Orion-Eridanus superbubble, a large region of diffuse emission in our galaxy. Should I choose to study this galaxy in more detail these lines could be masked easily enough, but the metadata for this data cube says it shouldn’t be used.

Markarian 848 – Closing topics

I’m going to close out my analysis of Mrk 848 for now with three topics. First, dust. Like most SED fitting codes mine produces an estimate of the internal attenuation, which I parameterize with τV, the optical depth at V assuming a conventional Calzetti attenuation curve. Before getting into a discussion, here for context is a map of the posterior mean estimate for the higher S/N target binning of the data. For reference, isophotes of the synthesized r band surface brightness taken from the MaNGA data cube are superimposed:

mrk848_tauv_map
Map of posterior mean of τV from single component dust model fits with Calzetti attenuation

This compares reasonably well with my visual impression of the dust distribution. Both nuclei have very large dust optical depths with a gradual decline outside, while the northern tidal tail has relatively little attenuation.

The paper by Yuan et al. that I looked at last time devoted most of its space to different ways of modeling dust attenuation, ultimately concluding that a two component dust model of the sort advocated by Charlot and Fall (2000) was needed to bring results of full spectral fitting using ppxf on the same MaNGA data as I’ve examined into reasonable agreement with broad band UV-IR SED fits.

There’s certainly some evidence in support of this. Here is a plot I’ve shown for some other systems: the estimated optical depth of the Balmer emission line emitting regions, based on the observed vs. theoretical Balmer decrement (I’ve assumed an intrinsic Hα/Hβ ratio of 2.86 and a Calzetti attenuation relation), plotted against the optical depth estimated from the SFH models, which roughly measures the amount of reddening needed to fit the SSP model spectra to the observed continuum. In some respects this is a more favorable system than some I’ve looked at because Hβ emission is at measurable levels throughout. On the other hand there is clear evidence that multiple ionization mechanisms are at work, so the assumption of a single canonical value of Hα/Hβ is likely too simple. This might be a partial cause of the scatter in the modeled relationship, but it’s encouraging that there is a strong positive correlation (for those who care, the correlation coefficient between the mean values is 0.8).
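For the record, the conversion from Balmer decrement to optical depth is the standard one. Writing the attenuated line ratio in terms of the line optical depths and scaling them with the attenuation curve, \(\tau_\lambda = \tau_V k(\lambda)/k(V)\), gives

\[
\left(\frac{\mathrm{H\alpha}}{\mathrm{H\beta}}\right)_{obs} = 2.86\, e^{\,\tau_{\mathrm{H\beta}} - \tau_{\mathrm{H\alpha}}}
\quad\Rightarrow\quad
\tau_{V,bd} = \frac{k(V)}{k(\mathrm{H\beta}) - k(\mathrm{H\alpha})}\,
\ln\!\left[\frac{(\mathrm{H\alpha}/\mathrm{H\beta})_{obs}}{2.86}\right]
\approx 3.8\, \ln\!\left[\frac{(\mathrm{H\alpha}/\mathrm{H\beta})_{obs}}{2.86}\right].
\]

The numerical coefficient uses Calzetti curve values quoted from memory (k(Hα) ≈ 2.53, k(Hβ) ≈ 3.61, k(V) = RV ≈ 4.05), so treat it as approximate.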

The solid line in the graph below is 1:1. The semi-transparent cloud of lines shows the sampled relationships from a Bayesian errors in variables regression model. The mean (and marginal 1σ uncertainty) is \(\tau_{V, bd} = (0.94\pm 0.11) + (1.21 \pm 0.12) \tau_V\). So the estimated relationship is just a little steeper than 1:1 but with an offset of about 1, which is a little different from the Charlot & Fall model and from what Yuan et al. found, where the youngest stellar component has about 2-3 times as much dust attenuation as the older stellar population. I’ve seen a similar, less steep than typically assumed, relationship in every system I’ve looked at and don’t know why it differs from what is usually adopted. I may look into it some day.

τV estimated from Balmer decrement vs. τV from model fits. Straight line is 1:1 relation. Cloud of lines are from Bayesian errors in variables regression model.

I did have time to run some 2 dust component SFH models. This is a very simple extension of the single component models: a single optical depth is applied to all SSP spectra as before, and a second component with the optical depth fixed at 1 greater than the bulk value is applied only to the youngest model spectra, which recall were taken from unevolved SSPs from the updated BC03 library. I’m just going to show the most basic results from the models for now in the form of maps of the SFR density and specific star formation rate. Compared to the same maps displayed at the end of the last post there is very little difference in the spatial variation of these quantities. The main effect of adding more reddened young populations to the model is to replace some of the older light — this is the well known dust-age degeneracy. The average effect was to increase the stellar mass density (by ≈ 0.05 dex overall) while slightly decreasing the 100 Myr average SFR (by ≈ 0.04 dex), leading to an average decrease in specific star formation rate of ≈ 0.09 dex. While there are some spatial variations in all of these quantities no qualitative conclusion changes very much.
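The change to the likelihood is small. Here is a sketch of what the model block might look like; this isn’t my actual code, and the attenuation helper, the split of the SSP library into sp_young and sp_old, and the other names are just for illustration:

    model {
      // nl = number of flux bins (data)
      // bulk dust: optical depth tauv applied to all SSP spectra;
      // extra dust: tauv + 1 applied only to the unevolved (youngest) SSPs
      vector[nl] mu = a * ( (sp_old * b_old) .* attenuation(lambda, tauv)
                          + (sp_young * b_young) .* attenuation(lambda, tauv + 1) )
                      + sp_em * b_em;
      flux ~ normal(mu, flux_std);
    }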

mrk848_sigma_sfr_sfr_2dust_maps
Results from fits with 2 component dust models. (L) SFR density. (R) Specific SFR

Contrary to Yuan+ I don’t find a clear need for a 2 component dust model. Without trying to replicate their results I can’t say why exactly we disagree, but I think they erred in aggregating the MaNGA data to the lowest spatial resolution of the broad band photometric data they used, which was 5″ radius. There are clear variations in physical conditions on much smaller scales than this.

Second topic: the most widely accepted SFR indicator at visual wavelengths is Hα luminosity. Here is another plot I’ve displayed previously: a simple scatterplot of Hα luminosity density against the 100 Myr averaged star formation rate density from the SFH models. The luminosity density is corrected for attenuation estimated from the Balmer decrement; for comparison the light gray points are the uncorrected values. Points are color coded by BPT class determined in the usual manner. The straight line is the Hα – SFR calibration of Moustakas et al. (2006), which in turn is taken from earlier work by Kennicutt.

Model SFR density vs. Hα luminosity density corrected for extinction estimated from Balmer decrement. Light colored points are uncorrected for extinction. Straight line is Hα-SFR calibration from Moustakas et al. (2006)

Keeping in mind that Hα emission tracks star formation on timescales of ∼10 Myr, this graph strongly supports the scenario I outlined in the last post1to the extent that ionization is caused by hot young stars. There are evidently multiple ionizing sources in this system, but disentangling their effects seems hard. Note there’s no clear stratification by BPT class in this plot. At the highest Hα luminosities the SFR-Hα trend nicely straddles the Kennicutt-Moustakas calibration, consistent with the finding that the central regions of the two galaxies have had ∼constant or slowly rising star formation rates in the recent past. At lower Hα luminosities the 100 Myr average trends consistently above the calibration line, implying a recent fading of star formation.

The maps below add some detail, and here the perceptual uniformity of the viridis color palette really helps. If star formation exactly tracked Hα luminosity these two maps would look the same. Instead the northern tidal tail in particular and the small piece of the southern one within the IFU footprint are underluminous in Hα, again implying a recent fading of star formation in the peripheries.

(L) Hα luminosity density, corrected for extinction estimated by Balmer decrement. (R) SFR density (100 Myr average).

Final topic: the fit to the data, and in particular the emission lines. As I’ve mentioned previously I fit the stellar contribution and emission lines simultaneously, generally assuming separate single component gaussian velocity dispersions and a common system velocity offset. This works well for most galaxies, but for active galaxies or systems like this one with complex velocity profiles maybe not so much. In particular the northern nuclear region is known to have high velocity outflows in both ionized and neutral gas, presumably due to supernova driven winds. I’m just going to look at the central fiber spectrum for now. I haven’t examined the fits in detail, but in general they get better outside the immediate region of the center. First, here is the fit to the data using my standard model. In the top panel the gray line, which mostly can’t be seen, is the observed spectrum. Blue shows quantiles of the posterior fit; this is actually a ribbon, although its width is too thin to be discernible. The bottom panel shows the residuals in standard deviations. Yes, they run as large as ±50σ, with conspicuous problems around all emission lines. There are also a number of usually weak emission lines present in this spectrum that I don’t track.

mrk848_fit_central_spec
Fit to central fiber spectrum; model with single gaussian velocity distributions.

I have a solution for cases like this which I call partially parametric. I assume the usual Gauss-Hermite form for the emission lines (as in, for example, ppxf) while the stellar velocity distribution is modeled with a convolution kernel2I think I’ve discussed this previously but I’m too lazy to check right now. If I haven’t I’ll post about it someday. Unfortunately the Stan implementation of this model takes at least an order of magnitude longer to execute than my standard one, which makes its wholesale use prohibitively expensive. It does materially improve the fit to this spectrum, although there are still problems with the stronger emission lines. Let’s zoom in on a few crucial regions of the spectrum:

Zoomed in fit to central fiber spectrum using “partially parametric velocity distribution” model. Grey: observed flux. Blue: model.

Two things are evident here: there is a clear sign of outflow in the forbidden emission lines, particularly [O III] and [N II], while the Balmer lines are relatively more symmetrical, as are the [S II] doublet at 6717, 6730Å. The existence of rather significant residuals is likely because emission is coming from at least two physically distinct regions, while the fit to the data is mostly driven by Hα, which as usual is the strongest emission line. The fit captures the emission line cores in the high order Balmer lines rather well, and also the absorption lines on the blue side of the 4000Å break except for the region around the [Ne III] line at 3869Å.
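For reference, the emission line profile in the partially parametric model takes the standard Gauss-Hermite form. Here is a minimal Stan sketch of that profile following the usual van der Marel & Franx (1993) parameterization; this isn’t my production code, and the function name and normalization choice are just illustrative:

    functions {
      // Gauss-Hermite (h3, h4) line profile evaluated at velocities v
      vector gauss_hermite_profile(vector v, real v0, real sigma, real h3, real h4) {
        int n = num_elements(v);
        vector[n] y = (v - v0) / sigma;                 // scaled velocity offsets
        vector[n] H3 = (2 * sqrt(2.0) * y .* y .* y - 3 * sqrt(2.0) * y) / sqrt(6.0);
        vector[n] H4 = (4 * y .* y .* y .* y - 12 * y .* y + 3) / sqrt(24.0);
        vector[n] p = exp(-0.5 * y .* y) .* (1 + h3 * H3 + h4 * H4);
        return p / sum(p);                              // normalize to unit total flux
      }
    }

A skewed line (an outflow signature) shows up as nonzero h3, and heavier or weaker tails than a gaussian as nonzero h4.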

I’m mostly interested in star formation histories, and it’s important to see what differences are present. Here is a comparison of three models: my standard one, the two dust component model, and the partially parametric velocity dispersion model:

mrk848_centralsfr3ways
Detailed star formation history models for the northern nucleus using 3 different models.

In fact the differences are small and not clearly outside the range of normal MCMC variability. The two dust model slightly increases the contribution of the youngest stellar component at the expense of slightly older contributors. All three have the presumably artifactual uptick in SFR at 4 Gyr and very similar estimated histories for ages > 1 Gyr.

I still envision a number of future model refinements. The current version of the official data analysis pipeline tracks several more emission lines than I do at present and has updated wavelengths that may be more accurate than the ones from the SDSS spectro pipeline. It might be useful to allow at least two distinct emission line velocity distributions, with for example one for recombination lines and one for forbidden. Unfortunately the computational expense of this sort of generalization at present is prohibitive.

I’m not impressed with the two dust model that I tried, but there may still be useful refinements to the attenuation model to be made. A more flexible form of the Calzetti relation might be useful for example3there is recent relevant literature on this topic that I’m again too lazy to look up.

My initial impression of this system was that it was a clear false positive that was selected mostly because of a spurious BPT classification. On further reflection, with MaNGA data available, it’s not so clear. A slight surprise is the strong Balmer absorption virtually throughout the system, with evidence for a recent shutdown of star formation in the tidal tails. A popular scenario for the formation of K+A galaxies through major mergers is that they experience a centrally concentrated starburst after coalescence which, once the dust clears and assuming some feedback mechanism shuts off star formation, leads to a period of up to a Gyr or so with a classic K+A signature4I previously cited Bekki et al. 2005, who examine this scenario in considerable detail. Capturing a merger in the instant before final coalescence provides important clues about this process.

To the best of my knowledge there have been no attempts at dynamical modeling of this particular system. There is now reasonably good kinematic information for the region covered by the MaNGA IFU, and there is good photometric data from both HST and several imaging surveys. Together these make detailed dynamical modeling technically feasible. It would be interesting if star formation histories could further constrain such models. Being aware of the multiple “degeneracies” between stellar age and other physical properties, I’m not highly confident, but it seems provocative that we can apparently identify distinct stages in the evolutionary history of this system.