Archive contents

The HST archive at ESAC/ESA contains data from the following Hubble collections and catalogues:

  • The HST Collection
  • The HLA Collection
  • The HLSP Collection
  • The Hubble Source Catalog (HSC)

Details on the collections and catalogues are given below.

The HST Collection

The HST collection in the eHST is the standard HST archive content and contains the following:

  • All public (non-proprietary) HST data.
  • The standard HST archive products from the active instruments (ACS, COS, STIS, WFC3). These products are kept current by the HST Cache system described at the bottom of this page.
  • Data from legacy instruments (FOC, FOS, HRS, NICMOS, WFPC, WFPC2). These data have gone through a final calibration run and are not expected to change further.

Detailed information on the data produced by each instrument aboard the Hubble Space Telescope can be found in the official HST Data Handbooks.

The HLA Collection

  • Release: Data Release 8 (3rd December 2014)

The HLA is a joint project of the Space Telescope Science Institute (STScI), the Space Telescope European Coordinating Facility (ST-ECF), and the Canadian Astronomy Data Centre (CADC). The HLA collection in the eHST contains enhanced image products for WFPC2 (produced by the CADC), ACS, NICMOS and WFC3 (produced by the STScI), and extracted spectra from NICMOS and ACS grism observations (produced by the ST-ECF).

The enhanced data products were generated from the standard HST pipeline products. The ACS, WFPC2, WFC3, and NICMOS images have been combined using MultiDrizzle, are aligned north up, and, except for WFC3, have been astrometrically corrected when possible (in approximately 80% of cases). The HLA-produced WFC3 images introduce an improved format described in the Image Formats FAQ. All HLA-produced images are in units of electrons/second, which for WFPC2 and NICMOS differs from the units of the calibration pipeline products; photometric zeropoints must therefore be adjusted to match the units used.
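The zeropoint adjustment follows directly from the definition of magnitude. Because a rate in DN/s equals the rate in electrons/s divided by the detector gain (electrons per DN), a zeropoint defined for DN/s converts as ZP_e = ZP_DN + 2.5 log10(gain). The sketch below is a minimal illustration under that assumption; the function names and the example values (gain 7.0, zeropoint 21.0) are hypothetical and are not official WFPC2 or NICMOS calibration numbers.

```python
import math


def zeropoint_for_electrons(zp_dn: float, gain: float) -> float:
    """Convert a zeropoint defined for DN/s into one for electrons/s.

    With m = -2.5*log10(rate_dn) + zp_dn and rate_dn = rate_e / gain,
    m = -2.5*log10(rate_e) + zp_dn + 2.5*log10(gain).
    """
    if gain <= 0:
        raise ValueError("gain must be positive (electrons per DN)")
    return zp_dn + 2.5 * math.log10(gain)


def magnitude(rate: float, zeropoint: float) -> float:
    """Instrumental magnitude for a count rate expressed in the zeropoint's units."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return -2.5 * math.log10(rate) + zeropoint


if __name__ == "__main__":
    gain = 7.0               # hypothetical gain in electrons/DN
    zp_dn = 21.0             # hypothetical zeropoint for DN/s
    rate_dn = 100.0          # source rate measured in a pipeline (DN/s) image
    rate_e = rate_dn * gain  # the same source measured in an HLA (electrons/s) image

    zp_e = zeropoint_for_electrons(zp_dn, gain)
    # Both measurements give the same magnitude once the zeropoint matches the units.
    print(magnitude(rate_dn, zp_dn), magnitude(rate_e, zp_e))
```

Using the pipeline zeropoint unchanged on an electrons/s image would make every source appear brighter by 2.5 log10(gain) magnitudes.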

The HLA Frequently Asked Questions page gives a detailed description of the types of data that are available in the HLA, including a matrix of available data.

The HLSP Collection

The High Level Science Products (HLSP) are community-contributed, fully processed (reduced, co-added, cosmic-ray cleaned, etc.) images and spectra that are ready for scientific analysis. Currently, the eHST contains the HLSPs from GOODS North and South.

More information on the HLSPs can be found here.

Hubble Source Catalog (HSC)

The Hubble Source Catalog combines all the visit-based source lists in the Hubble Legacy Archive (HLA) into a single catalogue. It contains about 80 million detections of 30 million objects, involving 112 different detector/filter combinations and about 160,000 HST exposures. Source lists from Data Release 8 of the HLA are matched using an algorithm developed by Budavári & Lubow (2012). The mean photometric accuracy is better than 0.10 mag, with relative accuracies as good as 0.02 mag in certain cases. The relative astrometric residuals are typically within 10 mas, and the absolute astrometric accuracy is better than about 0.1 arcsec for most sources.
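The Budavári & Lubow (2012) matcher is based on Bayesian cross-matching and also corrects the relative astrometry between visits; it is not reproduced here. As a much simpler illustration of what "matching source lists into objects" means, the sketch below groups detections from several visits with a friends-of-friends rule: two detections belong to the same object if they lie within a fixed radius of each other. It assumes positions already projected onto a common tangent plane in arcseconds and a hypothetical 0.1 arcsec radius; the `Detection` type and `group_detections` function are illustrative names only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    visit: str  # identifier of the visit-based source list the detection came from
    x: float    # tangent-plane offset in arcsec
    y: float    # tangent-plane offset in arcsec


def group_detections(dets: list[Detection], radius: float = 0.1) -> list[list[Detection]]:
    """Friends-of-friends grouping: link detections closer than `radius` arcsec."""
    parent = list(range(len(dets)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair test; acceptable for a small illustrative field.
    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            if math.hypot(dets[i].x - dets[j].x, dets[i].y - dets[j].y) <= radius:
                parent[find(i)] = find(j)

    # Collect detections by the root of their linked set: one list per object.
    groups: dict[int, list[Detection]] = {}
    for i, d in enumerate(dets):
        groups.setdefault(find(i), []).append(d)
    return list(groups.values())
```

Friends-of-friends linking can chain neighbouring objects together in crowded fields and ignores per-visit astrometric offsets, which is why a production matcher uses a probabilistic approach with astrometric correction instead.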

HSC data provenance

Instrument   Number of filters   Number of sources   Number of detections   Retrieval date
WFPC2        38                  29,146              13 × 10⁶               2009/05/11
ACS/WFC      12                  9,021               21 × 10⁶               2011/02/16
WFC3/UVIS    47                  4,772               31 × 10⁶               2014/06/01
WFC3/IR      15                  6,763               14 × 10⁶               2014/06/01
All          112                 49,702              79 × 10⁶               -


Version 2 of the HSC includes (see here for more details):

  • Four additional years of ACS source lists (i.e., through June 9, 2015). All ACS source lists go deeper than in version 1. See current HLA holdings for details.
  • One additional year of WFC3 source lists (i.e., through June 9, 2015).
  • Cross-matching between HSC sources and spectroscopic COS, FOS, and GHRS observations.
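A positional cross-match between catalogue sources and spectroscopic observations can be sketched as a cone search: for each spectroscopic pointing, keep the sources whose angular separation from the aperture centre is below some radius. The sketch assumes RA/Dec in degrees, computes separations with the haversine formula, and uses a hypothetical 1.0 arcsec matching radius; `match_spectra` and the dictionary layouts are illustrative and do not describe how the HSC cross-match was actually performed.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_spectra(sources: dict, spectra: dict, radius_arcsec: float = 1.0) -> dict:
    """Return {spectrum_id: [source_id, ...]} for sources within radius_arcsec.

    sources: {source_id: (ra_deg, dec_deg)}   (hypothetical catalogue layout)
    spectra: {spectrum_id: (ra_deg, dec_deg)} (aperture centre of each observation)
    """
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for spec_id, (sra, sdec) in spectra.items():
        matches[spec_id] = [
            src_id
            for src_id, (ra, dec) in sources.items()
            if angular_separation_deg(sra, sdec, ra, dec) <= radius_deg
        ]
    return matches
```

The haversine form is used because it stays numerically well behaved at the sub-arcsecond separations relevant here, where the plain spherical law of cosines loses precision.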

What is the HST Cache?

The cache is an envelope around HST archive file production. It is a set of database tables and software agents that ensures that all publicly available HST science pipeline products are preprocessed and readily available from storage at all times. This includes mechanisms to discover and insert newly observed datasets, and to automatically reprocess datasets that benefit from updated reference files, newly available meta-data, and general upgrades to the processing software.
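The bookkeeping described above, discovering new datasets and reprocessing those whose inputs have changed, can be sketched as a comparison between the processing state recorded for each dataset and the current state of the software and reference files. This is a minimal illustration under stated assumptions: the `DatasetRecord` fields, the dictionary layouts, and the version strings are hypothetical and are not the actual eHST cache schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical cache-table row describing how a dataset was last produced."""
    dataset_id: str
    software_version: str
    reference_files: dict[str, str] = field(default_factory=dict)  # type -> file name


def datasets_to_process(cache, archive_ids, current_software, current_refs):
    """Return ids that are new or whose software/reference files are out of date.

    cache:            {dataset_id: DatasetRecord} for datasets already produced
    archive_ids:      ids of all public datasets known to the archive
    current_software: pipeline version currently in operation
    current_refs:     {dataset_id: {reference type: best file name}}
    """
    queue = []
    for ds in sorted(archive_ids):
        record = cache.get(ds)
        if record is None:
            queue.append(ds)  # newly observed or released dataset: insert it
        elif record.software_version != current_software:
            queue.append(ds)  # processing software upgraded: reprocess
        elif record.reference_files != current_refs.get(ds, {}):
            queue.append(ds)  # a better reference file is now available: reprocess
    return queue
```

An agent running this check periodically and feeding the queue to the pipeline keeps the stored products current without any user having to trigger processing.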

Why do we need a cache?

Since 2002, all data from active instruments has been produced from scratch, triggered by user requests. The reasoning behind the On The Fly Reprocessing (OTFR) and On The Fly Calibration (OTFC) pipelines was that they would guarantee that the archive user would always get her data equipped with the newest set of meta-data and calibrated according to the best methods available. This was a clear advantage over the previous system, where the raw data was produced centrally at the STScI and delivered to the partner sites, essentially freezing that data in time. Another advantage of the system was that it conserved storage space, as only the Hubble Space Telescope telemetry files and a few smaller auxiliary files needed to be stored, an important consideration when data was stored on optical disks in jukeboxes.

With the advent of cheap mass storage in the form of hard-disk arrays, this aspect became less important, and a number of other drawbacks of the on-the-fly paradigm also became apparent over time. Live processing of data requires that support is available at all times to resolve errors and bugs in the pipeline, an inevitable task when a system becomes as complex as this one and takes such a heterogeneous set of data as input. Another drawback is processing speed: producing a dataset could take anywhere from several minutes to hours, which might not be an issue for the patient astronomer, but makes it impossible to expose the data through synchronous VO protocols. Next-level efforts such as data mining, meta-data harvesting, and the production of high-level data products are also extremely difficult in the on-the-fly world.

The advantages of the HST Cache are:

  • Faster access speed
  • Shields users from processing errors
  • Direct programmatic & VO protocol access to the data
  • Makes the archive less prone to overall system breakdowns
  • Allows site interoperability and redundancy
  • Less maintenance in the long run
  • Allows harvesting of meta-data and data-mining