QSO Catalog | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Building the QsoCatalogAll and QsoConcordanceAll tablesJim Gray, Sebastian Jester, Gordon Richards, Alex Szalay, Ani ThakarMarch 2006 Abstract: We constructed a catalog of all quasar candidates and gathered their "vital signs" from the many different SDSS data sources into one Quasar Concordance table. 1. The Target, Best, and Spec SDSS DatasetsThe SDSS Target Database is used to select the targets that will be observed with the SDSS spectrographs. Once made, these targeting decisions are never changed but the targeting algorithm has improved over time. The SDSS pipeline software is always improving so the underlying pixels are re-analyzed with each data release. To have a consistent catalog, all the mosaiced pixels, both from early and recent observations are reprocessed with the new software in subsequent data releases. The output of each of these uniform processing steps is called a Best Database. So at any instant there is the historical cumulative Target database and the current Best database. As of early 2006 we have the Early Data Release (EDR) databases and then five "real" data releases DR1, DR2, DR3, DR4, and DR5. The target selection is done by the various branches (galaxy, quasar, serendipity) of the TARGET selection algorithm. These targets are organized for spectroscopic follow-up by the TILING (Blanton et al. 2003) [0] algorithm as part of a tiling run that works within a tiling geometry. The tiling run places a 2.5 deg. circle over a tiling geometry and then assigns spectroscopic targets to be observed. The circle corresponds to a plate that can be mounted on the SDSS telescope to observe 640 targets at a time. The plates are "drilled" and "plugged" with optical fibers and then "observed". These spectroscopic observations are fed through a pipeline that builds the Spec dataset. Because Spec is relatively small (2% the size of Best), it is included in the Best database. Unfortunately, only the "main" SDSS target photometry is exported to the Target database (the target photometry for Southern and Special plates is not exported - at best we have the later Best photometry for these objects in the database). The SDSS catalogs are cross-matched with the FIRST, ROSAT, Stetson 2. Overview: Finding Everything That MIGHT be a Quasar We look in the Target..PhotoObjAll, Best..SpecObjAll, and Best..PhotoObjAll tables to find any object that might be a quasar (a QSO). We build a QsoCatalogAll table that has a row for every combination of nearby TargPhoto-Spec-BestPhoto objects from these lists that are within 1.5 arcseconds of one another. If no matching object can be found from the QSO candidate list we find a surrogate object -- the nearest primary object from the corresponding catalog (Spec, BestPhoto, TargPhoto) if one can be found (again using the 1.5" radius.) If an object is still unmatched, we look for a secondary object, or put a zero for that ObjectID (in general, we use zero rather than the SQL null value to represent missing data). 2.1. Overview: QSO TablesThe tables and views created by the quasar concordance algorithm on the Best, Target and Spectro datasets are part of the Best database. The following sections explain how they are computed.
2.2. Overview: Quasar Bunches
The algorithm uses spatial proximity (aka: "is it nearby?") to cross-correlate objects in the Target, Best, and Spec databases. The definition of nearby is fairly loose: The SDSS Photo Survey pixels are 0.4 arcsecond and the positioning is accurate to .1 arcsecond, but the Spectroscopic survey has fibers that are 1.5 arcseconds in diameter. Therefore, the QSO concordance uses the 1.5" fiber radius to define nearby for all 3 datasets. In a perfect world, one SpecObj matches one BestObj and one TargetObj, and they are all marked as QSOs. Some objects have no match in the other catalogs -- so we have zeros in those slots of that object's row. But, sometimes 2 SpecObj match 3 TargetObj and 4 BestObj, and all 9 objects are marked as QSOs. In this case we get 2x3x4 rows. We group together all the objects that are related in this way as a bunch. Each bunch has a head object ID: the first member of the bunch to be recognized as a possible QSO. The precedence is TargetObjID first, if there is no target in the bunch then the first SpecObjID (highest S/N primary first), else the first BestObjID. This ordering reflects the first time the object was considered for follow-up spectroscopy. This order avoids a selection bias in the dataset (e.g., Malmquist bias if we were to order on decreasing S/N). 2.3 The QSO Catalog and Concordance
3. Overview: A Walkthrough of the Algorithm.Phase 1: Gather the Quasars and Quasar Candidates: As a first step, gather the Target, Spec, and Best quasar candidate or confirmed objects into a Zones table [1] containing their object identifiers and positions. These are copied from the Best and Target PhotoObjAll tables and the Best SpecObjAll table. These copies are filtered by flags indicating that the objects are QSOs or are targeted as QSOs. For the photo objects (target and best), this means they are primary or secondary and flagged (primTarget) as: TARGET_QSO_HIZ OR TARGET_QSO_CAP OR TARGET_QSO_SKIRT OR TARGET_QSO_FIRST_CAP OR TARGET_QSO_FIRST_SKIRT ( = 0x0000001F). For the spectroscopic objects, they must have one or more of the following properties:
That logic is fine for most Spectroscopic objects, but there are "special plates" whose authors overloaded the primary target flags (yes, they made it much harder to understand the data and cost many hours of discussion trying to disambiguate the data.) One can recognize the standard cases with the predicate plate.programType = 0 meaning that the plate was processed as a "Main" (programType=0 is "Main") chunk, not as a "special" (programType=2) or "Southern" (programType=1) plate. The three-case logic about works fine for "main" targets. The "targets for special plates" have SpecObj.primtarget & 0x80000000≠ 0. Once you know it is "special" plate you have to ask if it is a "special target". If it is, you have to ask is it the "Fstar72" group? If not you can use the standard test ((primTarget & 0x1F) ≠ 0) - those nice people did not "overload" the primTarget flags. But the folks who did "Fstar72" overloaded the flags and so we get the following complex logic: -- select SpecObjects that are either declared QSOs from their spectra -- or that were targeted as likely QSOs Select S.SpecObjID from BestDr5.dbo.platex as P join BestDr5.dbo.specobjall as S on P.plateid = S.plateid where specClass in (3,4,0) -- class is QSO or HiZ_QSO or Unknown. or z > 0.6 -- or high redshift or ( -- standard-survey plates px.programtype = 0 -- MAIN targeting survey and so.primtarget & 0x1f != 0 ) or ( -- special quasar targets from special plates -- see http://www.sdss.org/dr4/products/spectra/special.html so.primtarget & 0x80000000 != 0 and ( ( px.programname in ('merged48','south22') and so.primtarget & 0x1f != 0 ) or ( px.programname = 'fstar72' and so.primtarget & 4 != 0 ) or ( -- bent double-lobed FIRST source counterparts from specialplates -- The "straight double" counterparts have already been snuck -- into the usual FIRST counterpart quasar category 0x10. px.programname = 'merged48' and so.primtarget & 0x200000 != 0 ) ) ) or ( -- non-special quasar targets from special plates so.primtarget & 0x80000000 = 0 and px.programname in ('merged73','merged48','south22') and so.primtarget & 0x1f != 0 ) ---------------------------------------------------------------------------------------------- Phase 2: Find the Neighbors. Once the zone table is assembled containing all the candidates, a zones algorithm [1] is used to build a neighbors table among all these objects. Two objects are QSO neighbors if they are within 1.5 arcseconds of one another. The relationship is made transitive so that friends of friends are all part of the same neighborhood. Phase 3: Build the Bunches. The Neighbors relationship partitions the objects into bunches. We pick a distinguished member from each bunch to represent that bunch - called the bunch head. The selection favors Target then Spec, then Photo Objects and within that category it favors primary, then secondary, then outside objects if there is a tie within one group (e.g. multiple target objects in a bunch.) If there are multiple selections within these groups, the tie is broken by taking the minimum object ID for PhotoObj (again, to avoid any selection bias) and the highest S/N for specObjs. Given these bunch heads, we record a summary record for each bunch in the QsoBunch table:
Where the difference between TargetObjs and TargetPrimaries (etc.) is that TargetObjs indicates multiple entries of the same object in the database (e.g. both as a primary and a secondary), whereas TargetPrimaries helps us to identify objects that are either very close together or that were deblended into two objects separated by less than 1.5" (or are in a circle of 1.5" radius). Because the object primary flags are not handy at this point of the computation, the Bunch statistics are actually computed in Phase 9. Phase 4: Build the Catalog. Now we grow the QsoCatalogAll table which, for each bunch, has triples drawn from each class of the bunch (a target, a spec, and a best object). For example, the bunch of Figure 1 would produce 4 triples. If there is no object in one of the classes, we fill in with a non-QSO surrogate object - the primary object from that database (Targ, Photo, Spec) closest to the bunch head, or if there is no primary then a secondary (the test insists on the 1.5 arcsecond radius.) If no such object can be found we fill in that slot with a zero object. The resulting table looks like this:
The last 5 "quality fields" are computed in Phase 9. Phase 5: Find Surrogates for missing objects. Some objects in the Catalog entries have no matching Target, Best, or Spec objects. In these cases we look in the database to find a surrogate object (which was not a QSO candidate) that is nearby the bunch head object - as usual the search radius is 1.5 arcseconds and we favor primary over secondary objects and favor low-signal-to noise ratio SpecObjs. Phase 6: Get the Vital Signs. We now go to the source databases and get the "vital signs" of these photo and spetro objects (both quasar candidates and also surrogates) , building a QsoSpec, QsoTarget, and QsoBest tables holding these values and for the photo objects, some additional values from ROSAT and FIRST if there is a match. We then define QsoConcordanceAll as a view on these base tables with the following (~100) fields. Phase 7: Define QsoConcordanceAll and QsoConcordance Views: Now we are ready to "glue together the QsoCatalog with the vital signs to make a "fat table" with all the attributes.
Phase 9: Mark the primary triple of each bunch, compute some derived magnitude values and cleanup: Having the QsoConcordanceAll view and all the vital signs in place we compute some derived values: Picking the best triple of each bunch, computing the distances among members of the triple and computing some derived psf magnitudes. In the end, the DR5 database has 265,697 bunches, 329,871 triples in the concordance and 114,883 confirmed quasars. Most bunches have one catalog entry, but about 10% have multiple matches (generally and primary and secondary best or target object where both are flagged as QSO candidates or multiple observations of a spectroscopic object). The catalog itself has some interesting cases. In DR5 there are 82,142 cases where the Target, Spec, and Best all agree that it is a quasar. Since SDSS spectroscopy lags the imaging, it is not surprising that there are 81,011 objects where both the Target and Best indicate a likely QSO, but there is no spectrogram for the object (the Spec Zero case). With the QsoCatalogAll and QsoConcordanceAll in place we define two views: QsoCatalog (the best of the bunch) and QsoConcordance (the wide version) by picking the best targetObj, spec, and bestObj of each bunch.
References [0] "An Efficient Targeting Strategy for Multiobject Spectrograph Surveys: The Sloan Digital Sky Survey," Blanton et al., AJ 125:2276 (2003) [1] "There Goes the Neighborhood: Relational Algebra for Spatial Data Search", pdf, Alexander S. Szalay, Gyorgy Fekete, Wil O'Mullane, Maria A. Nieto-Santisteban, Aniruddha R. Thakar, Gerd Heber, Arnold H. Rots, MSR-TR-2004-32, April 2004 [2] "Creating Sectors," Alex Szalay, Gyorgy Fekete,
Tamas Budavari, Jim Gray, Adrian Pope, Ani Thakar, August 2003, |