Special Issue of Computing in Science and Engineering
Dedicated to the SDSS Science Archive
The January/February 2008 issue of the journal Computing in Science and
Engineering (CiSE), a joint publication of the American Institute of Physics
and the IEEE Computer Society, was dedicated to the SDSS Science Archive. The issue
featured several in-depth, peer-reviewed articles on various components
of the SDSS-II Science Archive. For SDSS-III, the Data Archive Server (DAS)
has been replaced with the Science Archive Server
(SAS), whereas the Catalog Archive Server (CAS) continues (with
significant enhancements and schema changes) to provide access to the
catalog data via the SkyServer Web interface and
the CasJobs batch query service.
The November/December 2008 issue of CiSE also had a follow-up article on lessons
learned from the SDSS-II CAS deployment.
These articles are described below with links to the PDF for each article.
The Sloan Digital Sky Survey Science Archive represents a thousand-fold
increase in the total amount of data that astronomers have collected to
date. The pioneering instrumentation technology that made this possible
is matched by groundbreaking tools that let anyone in the world access
terabytes of SDSS data online.
The Sloan Digital Sky Survey's Data Archive Server (DAS) provides public
access to data files produced by the SDSS data reduction pipeline. This
article discusses challenges in public distribution of data of this
complexity and how the project addressed them.
The Sloan Digital Sky Survey's (SDSS's) multiterabyte catalog data is
stored in a commercial relational database management system with SQL
query access and a built-in query optimizer. The SDSS Catalog Archive
Server adds advanced data mining features to the DBMS to provide fast
online access to the data.
Using a database management system (DBMS) is essential to ensure the
data integrity and reliability of large, multidimensional data
sets. However, loading multiterabyte data into a DBMS is a
time-consuming and error-prone task that the authors have tried to
automate by developing the sqlLoader pipeline--a distributed workflow
system for data loading.
Catalog Archive Server Jobs (CasJobs) is an asynchronous query workbench
service that lets users run unrestricted SQL queries against scientific
catalog archives. After running queries in batch mode, users can save
their results to a personal database called MyDB before downloading
them, letting users manage their query workloads, results, and histories
without causing network overloads.
The SDSS is one of the first very large archives in astronomy and other sciences, as we enter the era of data-intensive science. Here the authors summarize some of the important and generally applicable insights they have gained (often the hard way!) over the past decade of SDSS development.