I have been working mostly with spectroscopic data from SDSS-IV [MaNGA](http://www.sdss.org/dr14/manga/) since its first public release in 2016. The first few releases include little in the way of value-added data beyond calibrated spectra and some metadata taken mostly from the [NASA-Sloan Atlas](http://nsatlas.org/). In particular no kinematic measurements are provided, which are essential to most analyses. In the next several posts I’m going to write about estimating and using the first moment of the line of sight velocity distributions (LOSVD), that is velocity offsets from the system velocity. Keep in mind that estimating the moments of the LOSVD was the original objective of Cappellari and Emsellem’s `ppxf`, and anyone professionally engaged with the data should be using it or a similar tool that has passed peer review. But, since this is a hobby I developed my own tools. All software needed to replicate the examples posted here is available on my github repository at
The first step in any data analysis is to acquire some data, and SDSS makes this really easy. The base URL for MaNGA data in DR14, which is still the current release at the time of writing, is
![UGC 4300](http://skyserver.sdss.org/dr14/SkyServerWS/ImgCutout/getjpeg?TaskName=Skyserver.Chart.Image&ra=124.003321406132&dec=27.0758846629965&scale=0.4&width=512&height=512&opt=GL&query=&Grid=on&Label=on)
Browser based downloads are fine to get a file or two, but for bulk access it’s easiest to use a tool like `wget` or `rsync`. This is documented at
“`{bash}
#! /bin/bash
wget -nv -nc -r -nH -nd -P. \
-i $1 \
-B http://data.sdss.org/sas/dr14/manga/spectro/redux/v2_1_2/
“`
This reads from a text file with the remainder of the path to the files to be downloaded listed one per line, fetches the files from the Science Archive Server, and stores them in the current working directory.
Next we need to get the data into R’s workspace. For this I use the R package [FITSio](https://cran.r-project.org/package=FITSio), which is the only package I’ve found for reading FITS files. While not as complete or polished as `astropy`’s FITS module it’s fine for reading in binary data tables. We also need some metadata from the MaNGA catalog, which is described [here](http://www.sdss.org/dr14/manga/manga-data/catalogs/). I have created a flat data table from the catalog and stored it in the R data file [drpcat.rda](https://github.com/mlpeck/vrot_stanmodels/blob/master/drpcat.rda) containing the R object `drpcat`. With the source files sourced into the R session the command to read a data cube is
“`{R}
gdat.cube <- readcube("manga-8942-12702-LOGCUBE.fits", drpcat)
```
For any reader unfamiliar with R syntax the symbol `<-` is the assignment operator (don't ask), so this line is just assigning the value returned by `readcube()` to the variable `gdat.cube`, which...
```{R}
> str(gdat.cube)
List of 13
$ meta :List of 10
..$ mangaid : chr “1-218280”
..$ plateifu: chr “8942-12702”
..$ ra : num 124
..$ dec : num 27.1
..$ l : num 196
..$ b : num 29.6
..$ ebv : num 0.0367
..$ z : num 0.0255
..$ zdist : num 0.0254
..$ cpix : num [1:2] 38 38
$ xpos : num [1:74] 0.00514 0.005 0.00486 0.00472 0.00458 …
$ ypos : num [1:74] -0.00514 -0.005 -0.00486 -0.00472 -0.00458 …
$ lambda: num [1:4563(1d)] 3622 3622 3623 3624 3625 …
$ ra.f : num [1:74, 1:74] 124 124 124 124 124 …
$ dec.f : num [1:74, 1:74] 27.1 27.1 27.1 27.1 27.1 …
$ flux : num [1:74, 1:74, 1:4563] NA NA NA NA NA NA NA NA NA NA …
$ ivar : num [1:74, 1:74, 1:4563] NA NA NA NA NA NA NA NA NA NA …
$ snr : num [1:74, 1:74] NA NA NA NA NA NA NA NA NA NA …
$ gimg : num [1:74, 1:74] 0 0 0 0 0 0 0 0 0 0 …
$ rimg : num [1:74, 1:74] 0 0 0 0 0 0 0 0 0 0 …
$ iimg : num [1:74, 1:74] 0 0 0 0 0 0 0 0 0 0 …
$ zimg : num [1:74, 1:74] 0 0 0 0 0 0 0 0 0 0 …
“`
is a large list containing another list with some metadata and a number of numerical arrays, some taken directly from the FITS file and some processed in various ways. I correct the fluxes and inverse variances for galactic foreground extinction and also check the `mask` array for spaxels that should be masked. These are set to `NA` in the `flux` and `ivar` arrays. For this observation using the largest, 127 fiber, IFU these are stored in 74 x 74 x 4563 arrays, with the number of wavelength bins equal to 4563. The 74 x 74 spaxel field is somewhat larger than the IFU footprint so only some spaxels have data — in this example just 3,049 out of 5,476, or about 56%.
RSS files are a little more complicated to work with. They contain one spectrum for each exposure and each fiber, while for every observation the IFU is dithered to 3 distinct positions. Therefore if there are Nexp total exposures there will be Nexp/3 exposures for each fiber/pointing combination. In the RSS FITS files the flux and other data are stored in 4563 x Nexp image arrays, but to use the data we need to identify the distinct fiber/pointing combinations and combine the data. This is complicated by the fact that it isn’t entirely clear how the rows are stacked — this varies between data sets, although the first 3Nfiber spectra appear always to comprise one complete set of 3 pointings. A further complication is that because of differential refraction the position on the sky seen by each fiber varies with wavelength and with exposure. I chose to gloss over that by taking the mean position (over each wavelength range)
as the fiber position for that exposure. I then do a k-nearest neighbor lookup to find the Nexp-1 neighbors of the first 3Nfiber exposures. Then, with a bit of rearranging of data I end up storing the flux and inverse variance measurements in 3Nfiber x Nexp/3 x 4563 arrays.
Now this isn’t what I really want. What I want is to combine the exposures for each fiber/pointing combination to get a single spectrum and inverse variance estimates. I call this a stacked data set, and I wrote separate functions to read and manipulate the RSS files and to stack the data. These could be combined of course; I really have no use for the RSS data itself. In any case the current workflow goes like
“`{R}
> gdat.rss <- readrss("manga-8942-12702-LOGRSS.fits", drpcat)
> str(gdat.rss)
List of 7
$ meta :List of 10
..$ mangaid : chr “1-218280”
..$ plateifu: chr “8942-12702”
..$ nexp : num 9
..$ ra : num 124
..$ dec : num 27.1
..$ l : num 196
..$ b : num 29.6
..$ ebv : num 0.0367
..$ z : num 0.0255
..$ zdist : num 0.0254
$ lambda: num [1:4563] 3622 3622 3623 3624 3625 …
$ flux : num [1:381, 1:3, 1:4563] 1.913 0.861 2.618 -2.61 5.185 …
$ ivar : num [1:381, 1:3, 1:4563] 0.124 0.161 0.14 0.177 0.119 …
$ snr : num [1:381, 1:3] 2.28 2.2 2.42 2.71 2.76 …
$ xpos : num [1:381, 1:3] 6.67 7.89 5.37 2.78 4.13 …
$ ypos : num [1:381, 1:3] -12.6 -10.2 -10.4 -10.5 -12.7 …
> gdat.stack <- stackrss(gdat.rss)
> str(gdat.stack)
List of 10
$ meta :List of 10
..$ mangaid : chr “1-218280”
..$ plateifu: chr “8942-12702”
..$ nexp : num 9
..$ ra : num 124
..$ dec : num 27.1
..$ l : num 196
..$ b : num 29.6
..$ ebv : num 0.0367
..$ z : num 0.0255
..$ zdist : num 0.0254
$ lambda: num [1:4563] 3622 3622 3623 3624 3625 …
$ flux : num [1:381, 1, 1:4563] -0.538 2.972 1.319 -0.642 1.778 …
$ ivar : num [1:381, 1, 1:4563] 0.396 0.383 0.403 0.436 0.388 …
$ snr : num [1:381, 1] 3.93 3.89 4.18 4.7 4.81 …
$ xpos : num [1:381] 6.67 7.88 5.36 2.78 4.12 …
$ ypos : num [1:381] -12.6 -10.2 -10.4 -10.5 -12.7 …
$ ra.f : num [1:381] 124 124 124 124 124 …
$ dec.f : num [1:381] 27.1 27.1 27.1 27.1 27.1 …
$ dz : NULL
“`
Now, I work almost exclusively with these stacked derivatives of the RSS files. The main reason is necessity: the Bayesian models I work with, especially for star formation history modeling, are extremely computationally expensive. These can take some 10’s of minutes per spectrum with a state of the art desktop PC even with a smaller number of samples than I’d like and some other compromises. At, say, 3 per hour analyzing ā¼3000spectra would take about 42 days worth of CPU time, while the stacked RSS files have at most 381 spectra, making a single IFU’s analysis at most a 5 days project. Eliminating the compromises could increase execution time by factors of several, making even the latter analysis difficult for any reasonable size sample of galaxies.
There are some principled reasons as well to prefer the RSS files. The data cubes are generated from the RSS data via an interpolation scheme which can’t add to the information content but does introduce spatial covariance into any derived data. This should properly be modeled, but in practice usually isn’t — the MaNGA science team recommends inflating variance estimates by some factor. For many analyses it’s necessary to bin spaxels, and again the spatial covariance should properly be accounted for. The fiber spectra on the other hand can be treated as independent samples. The one significant disadvantage of working with the RSS data is that new visualization tools are needed. I’ve developed a number of such tools, and continue to work on them.
This post is longer than I originally intended, so I will save results for the next one and then move on to discussing one use of velocity data.