DataLink Service

Authors: Héctor Cánovas and Jos de Bruijne

 

In each Gaia data release, the key parameters of all sources are stored in the gaia_source table, which contains the (mean) astrometric, photometric, and radial-velocity data as well as astrophysical parameters. This table, along with complementary tables covering, for instance, variable stars or solar system objects, is accessible through the IVOA-compliant TAP+ table access protocol, which allows users to explore astronomical datasets stored in relational databases using the ADQL query language. In addition to the gaia_source table, Gaia DR3 includes vast amounts of non-tabular data such as mean spectra, epoch photometry, and Monte Carlo Markov Chain samples for millions of sources (while Gaia DR4 will include epoch astrometry and epoch photometry for the whole sample plus billions of mean and epoch spectra). Storing these non-tabular datasets as plain tables in a monolithic relational database is impractical. Instead, these products are hosted by a dedicated service, designed to handle massive data requests, that is accessible via the DataLink protocol. DataLink is a data access protocol compliant with the IVOA architecture that provides a linking mechanism between datasets offered by different services. In practice, it can be seen and used as a web service that lists the additional data products available for each object outside the main catalogue(s).

Since the Archive upgrade to version 2.14, the VOTables generated by the Archive contain a new resource that gives IVOA-compliant clients (like TOPCAT) easier access to the DataLink products:

</RESOURCE>
<RESOURCE type="meta" utype="adhoc:service" name="ancillary">
    <DESCRIPTION>Retrieve DataLink file containing ancillary data for source</DESCRIPTION>
    <PARAM name="standardID" datatype="char" arraysize="*" value="ivo://ivoa.net/std/DataLink#links-1.0"/>
    <PARAM name="accessURL" datatype="char" arraysize="*" value="https://gea.esac.esa.int/data-server/datalink/links"/>
    <PARAM name="contentType" datatype="char" arraysize="*" value="application/x-votable+xml;content=datalink"/>
    <GROUP name="inputParams">
        <PARAM name="ID"  datatype="char" arraysize="*" value="" ref="DESIGNATION"/>
    </GROUP>
</RESOURCE>

The entry point to the DataLink server is given by the "accessURL" parameter. To invoke the service and find out which resources are associated with a given source, combine the entry point with the target ID as follows:

https://gea.esac.esa.int/data-server/datalink/links?ID=Gaia+DR3+30343944744320
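
The same URL can be assembled programmatically. The snippet below is a minimal sketch that uses only the Python standard library; the function name build_datalink_url is a hypothetical helper introduced here for illustration, while the entry point and the designation are the ones quoted above.

```python
from urllib.parse import urlencode

# Entry point copied from the "accessURL" PARAM of the VOTable resource above.
DATALINK_ACCESS_URL = "https://gea.esac.esa.int/data-server/datalink/links"


def build_datalink_url(designation: str, access_url: str = DATALINK_ACCESS_URL) -> str:
    """Return the DataLink URL that lists the products available for one source.

    Hypothetical helper: it only combines the entry point with the "ID" parameter.
    """
    # urlencode() escapes the spaces of the designation as '+'.
    return f"{access_url}?{urlencode({'ID': designation})}"


print(build_datalink_url("Gaia DR3 30343944744320"))
# https://gea.esac.esa.int/data-server/datalink/links?ID=Gaia+DR3+30343944744320
```

The helper performs no network request; it only builds the string that an IVOA-compliant client (or a tool like curl) would then resolve.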

The output is an XML file intended for IVOA-compliant clients rather than for human readers. Opening this file with TOPCAT reveals the following content:

Figure 1: Content of the XML file generated by the DataLink service when invoked as explained above.

 

This file contains the URLs that give access to the DataLink products associated with the target source. For information about how to access these products from the Gaia Archive web interface and programmatically, please see the DataLink: Access from the Archive GUI and the Command line access: DataLink tutorials, respectively. The structure and content of the DataLink products are described in the Datamodel chapter of the Gaia DR3 documentation and in the DataLink products serialisation tutorial.

DataLink: Access from the Archive web interface

Authors: Héctor Cánovas, Jos de Bruijne, and Alcione Mora

 

Gaia DR3 includes vast amounts of non-tabular data such as high- and low-resolution (mean) spectra, epoch photometry, and Monte Carlo Markov Chain samples for millions of sources. These products are hosted by a dedicated service designed to handle massive data requests that is accessible via the DataLink protocol. This intermediate-level tutorial introduces the concepts needed to access and retrieve these products using the Gaia Archive web interface via its Advanced (ADQL) form. The complementary DataLink: command line access and DataLink: Python Access tutorials describe the programmatic access to these products using the Unix curl command-line utility and the Python package Astroquery.Gaia, respectively, while the DataLink products serialisation tutorial describes the structure of these products. In case of difficulties following this tutorial, please consult the DataLink service and Advanced (ADQL) tab tutorials.

Important: it is not possible to search for and retrieve the DataLink products associated with more than 5000 sources in a single call. However, it is possible to overcome this limit programmatically using a sequential download, as explained in this tutorial.
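
The idea behind a sequential download is simply to split the input list into batches that respect the 5000-source limit and to request each batch in turn. The snippet below is only a sketch of that splitting step under this assumption; the names split_in_batches and MAX_SOURCES_PER_CALL are hypothetical, and the linked tutorial remains the reference for the actual download procedure.

```python
from typing import Iterator, List, Sequence

MAX_SOURCES_PER_CALL = 5000  # limit quoted in the note above


def split_in_batches(source_ids: Sequence[int],
                     batch_size: int = MAX_SOURCES_PER_CALL) -> Iterator[List[int]]:
    """Yield consecutive batches holding at most `batch_size` source IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(source_ids), batch_size):
        yield list(source_ids[start:start + batch_size])


# Toy input: 12000 dummy identifiers instead of real Gaia source IDs.
batches = list(split_in_batches(list(range(12000))))
print([len(batch) for batch in batches])  # [5000, 5000, 2000]
```

Each batch can then be submitted as a separate DataLink request, one after the other.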

Tutorial content:

  1. How it works
  2. Basic use cases
  3. Advanced use cases

 

1. How it works

The DataLink service searches for the DataLink product(s) associated with a list of Gaia designation(s) or, alternatively, a combination of Gaia source ID(s) and Gaia data release. By default, the server searches for the products associated with Gaia DR3. Users interested in the DataLink products from Gaia DR2 (which only contains epoch photometry data) simply have to select "Gaia DR2" in the "Data release" dropdown menu indicated by the inclined arrow in Fig. 1.

 

2. Basic use cases

One of the simplest use cases that can be defined is: "I want to search for the DataLink products associated with the output of this (Gaia DR3) query". As a first attempt, we may run a simple ADQL cone search with a radius of 0.25 degrees, similar to the first example directly accessible from the ADQL query editor (see also the Query examples):

SELECT DISTANCE(
  POINT(266.41683, -29.00781),
  POINT(ra, dec)) AS separation, *
FROM gaiadr3.gaia_source
WHERE 1 = CONTAINS(
  POINT(266.41683, -29.00781),
  CIRCLE(ra, dec, 0.25))
ORDER BY separation ASC

The first step to retrieve the DataLink products associated with this sample is to click on the double chain ("paperclip") icon available in the job list area of the Advanced (ADQL) form (see Fig. 1). In this case, we will receive an error message explaining that the result of the query above contains 19,758 sources, which exceeds the threshold (5000 sources) imposed to avoid overloading the DataLink server. To avoid this problem, we may reduce the cone search radius or, even better, make use of the "has_<datalink_product>" fields available in the gaiadr3.gaia_source table. Filtering the previous query as follows:

SELECT DISTANCE(
  POINT(266.41683, -29.00781),
  POINT(ra, dec)) AS separation, *
FROM gaiadr3.gaia_source
WHERE 1 = CONTAINS(
  POINT(266.41683, -29.00781),
  CIRCLE(ra, dec, 0.25))
AND
-- Retrieve only sources with associated DataLink products 
has_epoch_photometry = 'True' AND
has_xp_sampled = 'True'
ORDER BY separation ASC

reduces the query output to just 26 sources, all of them having epoch photometry and XP sampled (as well as XP continuous) spectra. Note: advanced users may use the job_upload mechanism to apply these filters to their old queries without having to re-run the entire query. Clicking again on the double chain icon in the job list area launches the DataLink wizard, as shown in Fig. 1 below. This window lists all the available products associated with the sample generated by the previous query. It is possible to retrieve only selected products (e.g., just RVS mean spectra) or to download all the DataLink products at once (by clicking on the "Save All Data" button). Note that, depending on the amount of data to be retrieved, preparing the dataset before the download starts may take up to several minutes. From the DataLink wizard, it is also possible to select different combinations of data structures and download formats (see the DataLink: products serialisation tutorial for details).
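
For users who prefer to script the filtering step, the query above can be generated from a list of required products. The snippet below is a minimal sketch: build_filtered_cone_search and KNOWN_PRODUCT_FLAGS are hypothetical names, and the set only contains the "has_<datalink_product>" flags that are quoted in this documentation (it is not meant to be exhaustive).

```python
from typing import Iterable

# Suffixes of the "has_<datalink_product>" flags quoted in this documentation
# (assumption: other flags may exist in gaiadr3.gaia_source).
KNOWN_PRODUCT_FLAGS = {"epoch_photometry", "xp_sampled", "rvs", "mcmc_msc", "mcmc_gspphot"}


def build_filtered_cone_search(ra: float, dec: float, radius: float,
                               required_products: Iterable[str],
                               table: str = "gaiadr3.gaia_source") -> str:
    """Return an ADQL cone search restricted to sources having all required products."""
    lines = [
        "SELECT DISTANCE(",
        f"  POINT({ra}, {dec}),",
        "  POINT(ra, dec)) AS separation, *",
        f"FROM {table}",
        "WHERE 1 = CONTAINS(",
        f"  POINT({ra}, {dec}),",
        f"  CIRCLE(ra, dec, {radius}))",
    ]
    for product in required_products:
        if product not in KNOWN_PRODUCT_FLAGS:
            raise ValueError(f"Unknown DataLink product flag: has_{product}")
        # One extra condition per required product.
        lines.append(f"AND has_{product} = 'True'")
    lines.append("ORDER BY separation ASC")
    return "\n".join(lines)


print(build_filtered_cone_search(266.41683, -29.00781, 0.25,
                                 ["epoch_photometry", "xp_sampled"]))
```

The resulting string can be pasted into the Advanced (ADQL) form or submitted programmatically.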

Figure 1: Gaia ESA Archive web interface DataLink wizard that appears when clicking on the DataLink icon (double chain link encompassed by a red circle above) in the job list area.

The vertical arrows point to the drop-down menus used to select the data structure (output serialisation) and the file format of the files (see the DataLink: Products serialisation tutorial).

 

The "phot_variable_flag" field in the main Gaia DR2 catalogue (gaiadr2.gaia_source) makes it possible to select the sources that have (DR2) epoch photometry. The DR2 equivalent of the last ADQL query would then be:

SELECT DISTANCE(
  POINT(266.41683, -29.00781),
  POINT(ra, dec)) AS separation, *
FROM gaiadr2.gaia_source
WHERE 1 = CONTAINS(
  POINT(266.41683, -29.00781),
  CIRCLE(ra, dec, 0.25))
AND
-- Retrieve only sources with associated DataLink products 
phot_variable_flag = 'VARIABLE'
ORDER BY separation ASC

 

3. Advanced use cases

From the DataLink wizard, it is also possible to 1) select the input ID columns to be used by the massive data server when searching for the DataLink products associated with a given sample, and 2) select the associated data release (see the dropdown menus highlighted by the inclined arrows in Fig. 1). These two options become relevant for users aiming to search for DataLink products in external or user-provided tables (see the Upload a user table and Download data from an external TAP server tutorials to learn how to upload a table to the user space). The example shown in Fig. 2 illustrates the case of a table uploaded by a (registered) user that contains three different ID fields (plus right ascension and declination). If the selected column contains Gaia source IDs but not designations, the user must also indicate the desired data release.

Figure 2: Same as Fig. 1, but illustrating what happens when searching for DataLink products in catalogues containing several fields that could be used as input ID columns for the DataLink service.

 

 

DataLink products serialisation

Authors: Héctor Cánovas, María Henar, Jos de Bruijne, Elena Racero, and Alcione Mora

 

The DataLink IVOA protocol implemented in the Gaia ESA Archive gives access to six different products (epoch photometry, medium- and low-resolution spectra, and probability density distributions for the different astrophysical parameters) available for a significant fraction of the sources included in the main Gaia DR3 table (gaiadr3.gaia_source). These products are serialised according to different data models, and all of them can be retrieved in multiple file formats as well as multiple data structures. This document describes the contents (both data and metadata) of the DataLink products in the various serialisations generated by the Archive. The implementation of the DataLink protocol in the Archive is briefly described in the DataLink Service tutorial, while the DataLink: Access from the web interface and the DataLink: Command line access tutorials describe how to retrieve the DataLink products through the Archive web interface and programmatically, respectively. This other tutorial shows how to use the Astroquery.Gaia Python package to download these products.

The DataLink products served by the Archive (and the data models applied to serialise them) are listed below:

| Product | Retrieval type | Short description | Data model | Data release |
|---|---|---|---|---|
| Epoch photometry | EPOCH_PHOTOMETRY | Each row in this table contains the light curve for a given object in the G, BP, and RP bands as stored in the DataLink massive database. | IVOA Time Series Cube | DR3 & DR2 |
| MCMC GSP-Phot samples | MCMC_GSPPHOT | Monte-Carlo Markov Chain (MCMC) samples for the posterior probability distribution of all parameters derived from the General Stellar Parametrizer from Photometry (GSP-Phot). Some 2000 random MCMC samples are provided for (1) all sources brighter than G=12 mag, (2) a random subset of 0.1% of the sources fainter than G=12 mag. For all other sources fainter than G=12, the sample size is 100 (the last 100 samples in the MCMC). | DPAC Data model | DR3 |
| MCMC MSC samples | MCMC_MSC | Monte-Carlo Markov Chain (MCMC) samples for the posterior probability distribution of all parameters derived from the Multiple Star Classifier (MSC). Some 100 random MCMC samples are provided for each source. | DPAC Data model | DR3 |
| XP mean continuous spectra | XP_CONTINUOUS | Time-averaged (mean) BP/RP spectra based on the continuous representation in basis functions (see this Chapter). | DPAC Data model | DR3 |
| XP mean sampled spectra | XP_SAMPLED | Time-averaged (mean) BP/RP externally-calibrated and sampled spectra are provided for a subset of all sources. All spectra are sampled to the same set of absolute wavelength positions, which can be found in the xp_merge table (<link to Cosmos>). | IVOA spectrum | DR3 |
| RVS mean spectra | RVS | Time-averaged (mean) RVS normalised and sampled spectra are provided for a subset of all sources. | IVOA spectrum | DR3 |

 

The serialisation of each product is detailed in the following sections.

Tutorial content:

  1. Retrieval parameters
    1. Data structure
    2. Download file format
    3. Output file naming
  2. Data models
    1. Epoch Photometry
    2. MCMC GSP-Phot Samples
    3. MCMC MSC Samples
    4. XP Continuous Spectra
    5. XP Sampled Spectra
    6. RVS Spectra

 

1. Retrieval parameters

 

1.1 Data structure

This parameter defines the structure of the file that is being prepared for download. There are three possible options:

  1. INDIVIDUAL (default): one file per product and per selected source, with the data serialised in tabular format (one element per table cell).
  2. COMBINED: one single file per product, with the data for multiple sources serialised in a tabular format (one element per table cell).
  3. RAW: one single file per product, with the data for multiple sources serialised in a tabular format (one or more elements per table cell).

The latter format is the one used internally by the DPAC consortium, and it is documented in the Gaia Data Release 3 documentation (see the Datamodel description chapter).

1.2 Download file format

Available file download formats are: 

  1. VOTable (both binary and plain-text formats, .xml extension)
  2. FITS
  3. CSV
  4. *ECSV (Enhanced Character Separated Values)

*Note: It is not possible to download XP mean sampled spectra or RVS mean spectra in a COMBINED data structure using ECSV as file format. This format fundamentally does not support storing multiple tables (with their associated metadata) in a single file.

The VOTable, FITS, and ECSV file formats provide the table fields and metadata with column descriptions, UCDs, UTYPEs, and units when applicable. The CSV file format only includes the column names.
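
The restriction on ECSV can be checked before submitting a request. The snippet below is a minimal sketch that encodes only the rules stated in sections 1.1 and 1.2; the function name is_supported_combination and the upper-case format labels are hypothetical conventions chosen for this example, not Archive identifiers.

```python
# Options listed in sections 1.1 and 1.2 (labels are this example's own convention).
DATA_STRUCTURES = {"INDIVIDUAL", "COMBINED", "RAW"}
FILE_FORMATS = {"VOTABLE", "FITS", "CSV", "ECSV"}

# Products that cannot be downloaded as COMBINED + ECSV according to the note above.
NO_COMBINED_ECSV = {"XP_SAMPLED", "RVS"}


def is_supported_combination(retrieval_type: str, data_structure: str, file_format: str) -> bool:
    """Return False for the combinations excluded by the ECSV note, True otherwise."""
    if data_structure not in DATA_STRUCTURES:
        raise ValueError(f"Unknown data structure: {data_structure}")
    if file_format not in FILE_FORMATS:
        raise ValueError(f"Unknown file format: {file_format}")
    excluded = (file_format == "ECSV"
                and data_structure == "COMBINED"
                and retrieval_type in NO_COMBINED_ECSV)
    return not excluded


print(is_supported_combination("RVS", "COMBINED", "ECSV"))    # False
print(is_supported_combination("RVS", "INDIVIDUAL", "ECSV"))  # True
```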

 

1.3 Output file naming

The data structure and download format define the names of the retrieved files as follows:

| Data structure | File name | Example |
|---|---|---|
| INDIVIDUAL (one file per source) | <RETRIEVAL_TYPE>-<DESIGNATION>.<xml/fits/csv/ecsv> | RVS-Gaia DR3 30343944744320.<xml/fits/csv/ecsv> |
| COMBINED (one file with all sources) | <RETRIEVAL_TYPE>_COMBINED.<xml/fits/csv/ecsv> | EPOCH_PHOTOMETRY_COMBINED.<xml/fits/csv/ecsv> |
| RAW (one file with all sources) | <RETRIEVAL_TYPE>_RAW.<xml/fits/csv/ecsv> | XP_SAMPLED_RAW.<xml/fits/csv/ecsv> |

 

By default, the output data is downloaded as a compressed .gzip file. However, some internet browsers (for instance, Safari) automatically expand these files without asking the user.
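
When post-processing a download, it can be handy to predict the file names from the naming scheme in the table above. The snippet below is a minimal sketch of that scheme; expected_file_name is a hypothetical helper, and it describes the names of the (uncompressed) product files only.

```python
from typing import Optional


def expected_file_name(retrieval_type: str, data_structure: str, extension: str,
                       designation: Optional[str] = None) -> str:
    """Return the product file name following the naming table above."""
    if data_structure == "INDIVIDUAL":
        # One file per source: the designation is part of the name.
        if designation is None:
            raise ValueError("INDIVIDUAL files need the source designation")
        return f"{retrieval_type}-{designation}.{extension}"
    if data_structure in ("COMBINED", "RAW"):
        # One file holding all sources.
        return f"{retrieval_type}_{data_structure}.{extension}"
    raise ValueError(f"Unknown data structure: {data_structure}")


print(expected_file_name("RVS", "INDIVIDUAL", "xml", "Gaia DR3 30343944744320"))
# RVS-Gaia DR3 30343944744320.xml
print(expected_file_name("EPOCH_PHOTOMETRY", "COMBINED", "fits"))
# EPOCH_PHOTOMETRY_COMBINED.fits
```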

2. Data models

The data model for the products serialised in the INDIVIDUAL and COMBINED data structures is described in the following subsections. In all the tables below, the fields that are added by the Archive (i.e., those that are not included in the DPAC data model) when serialising the different products are highlighted in bold.

2.1 Epoch Photometry

The serialisation of the epoch photometry is based on the IVOA Time Series Cube data model. No metadata is added to the file header, and all the information is repeated throughout the output table so that each row is self-contained. Column names therefore remain identical in the INDIVIDUAL and COMBINED data structures.

| Field | Unit | Data type | UCD | UTYPE |
|---|---|---|---|---|
| source_id | | long | meta.id;meta.main | |
| transit_id | | long | meta.id | |
| band | | string | instr.bandpass | ssa:DataID.Bandpass |
| time | d | double | time.epoch | |
| mag | mag | double | phot.mag;em.opt | |
| flux | 'electron'.s**-1 | double | phot.flux;stat.mean | |
| flux_error | 'electron'.s**-1 | double | stat.error;phot.flux;em.opt | |
| flux_over_error | | double | stat.snr;phot.flux;em.opt | |
| rejected_by_photometry | | boolean | meta.code.status | |
| rejected_by_variability | | boolean | meta.code.status | |
| other_flags | | long | meta.code.status | |
| solution_id | | long | meta.version | |

 

 

Long descriptions for the added fields:

band: Photometric band. Values: G (per-transit combined SM-AF flux), BP (blue photometer integrated flux), and RP (red photometer integrated flux).

rejected_by_photometry: Rejected by DPAC photometric processing. The flag is set when the transit is unavailable or was rejected by the DPAC photometric processing, or when the flux is negative (unphysical).

other_flags: Additional processing flags. This field contains extra information on the data used to compute the fluxes and their quality. It provides debugging information that may be safely ignored for many general purpose applications. The field is a collection of binary flags, whose values can be recovered by applying bit shifting and masking operations. Each band has different binary flags in different positions, as shown below. Bit numbering is as follows: least significant bit = 1 and most significant bit = 64.

  • G band:
    • Bit 1: SM transit rejected by photometric processing.
    • Bit 2: AF1 transit rejected by photometric processing.
    • Bit 3: AF2 transit rejected by photometric processing.
    • Bit 4: AF3 transit rejected by photometric processing.
    • Bit 5: AF4 transit rejected by photometric processing.
    • Bit 6: AF5 transit rejected by photometric processing.
    • Bit 7: AF6 transit rejected by photometric processing.
    • Bit 8: AF7 transit rejected by photometric processing.
    • Bit 9: AF8 transit rejected by photometric processing.
    • Bit 10: AF9 transit rejected by photometric processing.
    • Bit 13: G band flux scatter larger than expected (all CCDs considered).
    • Bit 14: SM transit unavailable by photometric processing.
    • Bit 15: AF1 transit unavailable by photometric processing.
    • Bit 16: AF2 transit unavailable by photometric processing.
    • Bit 17: AF3 transit unavailable by photometric processing.
    • Bit 18: AF4 transit unavailable by photometric processing.
    • Bit 19: AF5 transit unavailable by photometric processing.
    • Bit 20: AF6 transit unavailable by photometric processing.
    • Bit 21: AF7 transit unavailable by photometric processing.
    • Bit 22: AF8 transit unavailable by photometric processing.
    • Bit 23: AF9 transit unavailable by photometric processing.
  • BP band:
    • Bit 11: BP transit rejected by photometric processing.
    • Bit 24: BP transit photometry rejected by variability processing.
  • RP band:
    • Bit 12: RP transit rejected by photometric processing.
    • Bit 25: RP transit photometry rejected by variability processing.
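
The bit shifting and masking mentioned above can be sketched as follows. This is only an illustration built from the bit list above (with bit 1 being the least significant bit); the function names flag_meanings and decode_other_flags are hypothetical and not part of any Gaia software.

```python
from typing import Dict, List

# CCD strips contributing to the G band: SM followed by AF1 ... AF9.
G_BAND_CCDS = ["SM"] + [f"AF{i}" for i in range(1, 10)]


def flag_meanings(band: str) -> Dict[int, str]:
    """Map bit positions (1 = least significant bit) to their meaning for one band."""
    if band == "G":
        meanings = {}
        for offset, ccd in enumerate(G_BAND_CCDS):
            meanings[1 + offset] = f"{ccd} transit rejected by photometric processing"
            meanings[14 + offset] = f"{ccd} transit unavailable by photometric processing"
        meanings[13] = "G band flux scatter larger than expected (all CCDs considered)"
        return meanings
    if band == "BP":
        return {11: "BP transit rejected by photometric processing",
                24: "BP transit photometry rejected by variability processing"}
    if band == "RP":
        return {12: "RP transit rejected by photometric processing",
                25: "RP transit photometry rejected by variability processing"}
    raise ValueError(f"Unknown band: {band}")


def decode_other_flags(other_flags: int, band: str) -> List[str]:
    """Return the meanings of all bits set in `other_flags` for the given band."""
    meanings = flag_meanings(band)
    # Shift bit N down to position 0 and mask it: (value >> (N - 1)) & 1.
    return [text for bit, text in sorted(meanings.items())
            if (other_flags >> (bit - 1)) & 1]


# Example value with bits 1 and 13 set (2**0 + 2**12 = 4097).
print(decode_other_flags(4097, "G"))
```

For the example value, the function reports that the SM transit was rejected and that the G band flux scatter is larger than expected.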

 

Figure 1: excerpt from the Epoch Photometry table as shown by TOPCAT.

 

2.2 MCMC GSP-Phot Samples

The serialisation of the MCMC GSP-Phot samples follows the DPAC data model, with the only exceptions being the "nsamples" field that is not included and the array serialisation (the array columns are flattened to one value per entry). No metadata is added into the file header, and all the information is repeated through the output table so each row is self-contained. Column names therefore remain identical in the INDIVIDUAL and COMBINED data structures. The table metadata does not contain any UTYPEs.

| Field | Unit | Data type | UCD |
|---|---|---|---|
| source_id | | long | meta.id |
| solution_id | | long | meta.version |
| teff | K | float | phys.temperature.effective |
| azero | mag | float | phys.absorption;em.opt |
| logg | log(cm.s**-2) | float | phys.gravity |
| mh | 'dex' | float | phys.abund |
| ag | mag | float | phys.absorption;em.opt |
| mg | mag | float | phys.magAbs;em.opt |
| distancepc | pc | float | pos.distance |
| abp | mag | float | phys.absorption;em.opt.B |
| arp | mag | float | phys.absorption;em.opt.R |
| ebpminrp | mag | float | phot.color.excess;em.opt |
| log_pos | | float | stat.probability |
| log_lik | | float | stat.likelihood |
| radius | solRad | float | phys.size.radius |

 

Figure 2: excerpt from the MCMC GSP-Phot table as shown by TOPCAT.
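
Because the array columns are flattened to one MCMC sample per row and every row carries its source_id, per-source summary statistics can be obtained by grouping the rows. The snippet below is a minimal sketch of that grouping using only the Python standard library; median_per_source is a hypothetical helper, and the rows are made-up toy values (not real Gaia data) that merely mimic the table layout.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Mapping


def median_per_source(rows: Iterable[Mapping], field: str) -> Dict[int, float]:
    """Group flattened MCMC rows by source_id and return the median of `field`."""
    samples = defaultdict(list)
    for row in rows:
        samples[row["source_id"]].append(row[field])
    return {source_id: median(values) for source_id, values in samples.items()}


# Toy rows (invented numbers): three samples for source 1, two for source 2.
toy_rows = [
    {"source_id": 1, "teff": 5000.0},
    {"source_id": 1, "teff": 5200.0},
    {"source_id": 1, "teff": 5100.0},
    {"source_id": 2, "teff": 4000.0},
    {"source_id": 2, "teff": 4100.0},
]
print(median_per_source(toy_rows, "teff"))  # {1: 5100.0, 2: 4050.0}
```

The same grouping works for any other column of the table, and for the MCMC MSC samples, which follow the same flattened layout.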

 

 

2.3 MCMC MSC Samples

The serialisation of the MCMC MSC samples follows the DPAC data model, with the only exceptions being the "nsamples" field that is not included and the array serialisation (the array columns are flattened to one value per entry). No metadata is added into the file header, and all the information is repeated through the output table so each row is self-contained. Column names therefore remain identical in the INDIVIDUAL and COMBINED data structures. The table metadata does not contain any UTYPEs.

| Field | Unit | Data type | UCD |
|---|---|---|---|
| source_id | | long | meta.id |
| solution_id | | long | meta.version |
| teff1 | K | float | phys.temperature.effective |
| teff2 | K | float | phys.temperature.effective |
| logg1 | log(cm.s**-2) | float | phys.gravity |
| logg2 | log(cm.s**-2) | float | phys.gravity |
| azero | mag | short | phys.absorption;em.opt |
| mh | 'dex' | float | phys.abund.Fe |
| distancepc | pc | float | pos.distance |
| log_pos | | float | stat.probability |
| log_lik | | float | stat.likelihood |

 

Figure 3: excerpt from the MCMC MSC table as shown by TOPCAT.

 

2.4 XP Continuous Spectra

The XP Continuous mean spectra are serialised without deviations from the original DPAC data model for any data structure. The table metadata contains neither units nor UTYPEs.

| Field | Data type | UCD |
|---|---|---|
| source_id | long | meta.id;meta.main |
| solution_id | long | meta.version |
| bp_basis_function_id | short | meta.id |
| bp_degrees_of_freedom | short | stat.fit.dof |
| bp_n_parameters | short | stat.fit.param |
| bp_n_measurements | short | meta.number |
| bp_n_rejected_measurements | short | meta.number |
| bp_standard_deviation | float | stat.stdev |
| bp_chi_squared | float | stat.fit.chi2 |
| bp_coefficients | double[] | stat.fit.param |
| bp_coefficient_errors | float[] | stat.error |
| bp_coefficient_correlations | float[] | stat.correlation |
| bp_n_relevant_bases | short | meta.number |
| bp_relative_shrinking | float | stat.fit.param |
| rp_basis_function_id | short | meta.number |
| rp_degrees_of_freedom | short | stat.fit.dof |
| rp_n_parameters | short | stat.fit.param |
| rp_n_measurements | short | meta.number |
| rp_n_rejected_measurements | short | meta.number |
| rp_standard_deviation | float | stat.stdev |
| rp_chi_squared | float | stat.fit.chi2 |
| rp_coefficients | double[] | stat.fit.param |
| rp_coefficient_errors | float[] | stat.error |
| rp_coefficient_correlations | float[] | stat.correlation |
| rp_n_relevant_bases | short | meta.number |
| rp_relative_shrinking | float | meta.number |

 

Figure 4: excerpt from the XP Continuous Spectrum table as shown by TOPCAT.

 

2.5 XP Sampled Spectra

The serialisation of the XP sampled mean spectra follows the IVOA Spectra Data Model. Both the INDIVIDUAL and COMBINED data structures contain the same data and metadata. However, due to the 8-character length limit imposed by the FITS format on keyword names, the (added) table metadata parameters are renamed when serialising this product in FITS format. In the table below, the rows with white and green backgrounds indicate the fields included in the table metadata and data, respectively. None of the metadata fields are included in the files generated in .csv format, which follow a particular serialisation (similar to the DPAC RAW serialisation but including a wavelength column).

| Field (VOTable) | Field (FITS) | Unit | Data type | UCD | UTYPE |
|---|---|---|---|---|---|
| source_id | SOURCEID | | long | meta.id;src | spec:Target.Name |
| solution_id | SOLUTION | | long | meta.version | |
| spatialLocation | POS | deg | double[] | pos.eq | spec:Char.SpatialAxis.Coverage.Location.Value |
| SpatialExtent | APERTURE | deg | double | instr.fov | spec:Spectrum.Char.SpatialAxis.Coverage.Bounds.Extent |
| TimeAxisCoverageLocation | REFEPOCH | yr | double | time.epoch | spec:Char.TimeAxis.Coverage.Location.Value |
| TimeAxisCoverageBoundsExtent | EPOCHEXT | yr | double | time.duration | spec:Char.TimeAxis.Coverage.Bounds.Extent |
| spectralLocation | - | nm | double | instr.bandpass | spec:Char.SpectralAxis.Coverage.Location.Value |
| spectralCoverageBoundsExtent | WAVEEXTE | nm | double | instr.bandwidth | spec:Char.SpectralAxis.Coverage.Bounds.Extent |
| spectralCoverageBoundsStart | WAVESTAR | nm | double | stat.min | spec:Char.SpectralAxis.Coverage.Bounds.Start |
| spectralCoverageBoundsStop | WAVEEND | nm | double | stat.max | spec:Char.SpectralAxis.Coverage.Bounds.Stop |
| spectralAccuracyStatError | WAVEERRO | nm | double | stat.error;em.wl | spec:Char.SpectralAxis.Accuracy.StatError |
| DataModel | DATAMODE | nm | string | | spec:Spectrum.DataModel |
| Publisher | PUBLISHE | | string | meta.curation | spec:Curation.Publisher |
| Title | TITLE | | string | | spec:DataID.Title |
| SpectralAxisUcd | - | | string | | spec:Spectrum.Char.SpectralAxis.Ucd |
| SpectralAxisUnit | SPECTRAL | | string | | spec:Spectrum.Char.SpectralAxis.Unit |
| FluxAxisUcd | - | | string | | spec:Spectrum.Char.FluxAxis.Ucd |
| FluxAxisUnit | FLUXAXIS | | string | | spec:Spectrum.Char.FluxAxis.Unit |
| wavelength | | nm | double | em.wl | spec:Data.SpectralAxis.Value |
| flux | | W.m**-2.nm**-1 | float | spect | |
| flux_error | | W.m**-2.nm**-1 | float | stat.error;spect | |

 

 

 

Figure 5: excerpt from the XP Sampled Spectrum table and metadata as shown by TOPCAT.
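
Code that must read both VOTable and FITS serialisations can hide the renaming behind a small lookup. The snippet below is a minimal sketch that transcribes the VOTable-to-FITS name pairs of the XP sampled table above; the names XP_SAMPLED_FITS_KEYWORDS and fits_keyword are hypothetical, and the sketch assumes that a "-" in the FITS column means that no FITS keyword exists for that parameter.

```python
from typing import Optional

# VOTable parameter name -> FITS keyword, transcribed from the table above.
XP_SAMPLED_FITS_KEYWORDS = {
    "source_id": "SOURCEID",
    "solution_id": "SOLUTION",
    "spatialLocation": "POS",
    "SpatialExtent": "APERTURE",
    "TimeAxisCoverageLocation": "REFEPOCH",
    "TimeAxisCoverageBoundsExtent": "EPOCHEXT",
    "spectralCoverageBoundsExtent": "WAVEEXTE",
    "spectralCoverageBoundsStart": "WAVESTAR",
    "spectralCoverageBoundsStop": "WAVEEND",
    "spectralAccuracyStatError": "WAVEERRO",
    "DataModel": "DATAMODE",
    "Publisher": "PUBLISHE",
    "Title": "TITLE",
    "SpectralAxisUnit": "SPECTRAL",
    "FluxAxisUnit": "FLUXAXIS",
}


def fits_keyword(votable_name: str) -> Optional[str]:
    """Return the FITS keyword for a VOTable parameter, or None if the table lists "-"."""
    return XP_SAMPLED_FITS_KEYWORDS.get(votable_name)


# Every keyword respects the 8-character FITS limit mentioned above.
print(all(len(keyword) <= 8 for keyword in XP_SAMPLED_FITS_KEYWORDS.values()))  # True
print(fits_keyword("TimeAxisCoverageLocation"))  # REFEPOCH
```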

 

 

2.6 RVS Spectra

The serialisation of the RVS mean spectra follows the IVOA Spectra Data Model. Both the INDIVIDUAL and COMBINED data structures contain the same data and metadata. However, due to the 8-character length limit imposed by the FITS format on keyword names, the (added) table metadata parameters are renamed when serialising this product in FITS format. In the table below, the rows with white and green backgrounds indicate the fields included in the table metadata and data, respectively. None of the metadata fields are included in the files generated in .csv format, which follow a particular serialisation (similar to the DPAC RAW serialisation but including a wavelength column).

| Field (VOTable) | Field (FITS) | Unit | Data type | UCD | UTYPE |
|---|---|---|---|---|---|
| source_id | SOURCEID | | long | meta.id;src | spec:Target.Name |
| solution_id | SOLUTION | | long | meta.version | |
| combined_transits | NTRANSIT | | int | | |
| combined_ccds | NCCDS | | int | | |
| deblended_ccd | NDEBLEND | | int | | |
| spatialLocation | POS | deg | double[] | pos.eq | spec:Char.SpatialAxis.Coverage.Location.Value |
| TimeAxisCoverageLocation | REFEPOCH | yr | double | time.epoch | spec:Char.TimeAxis.Coverage.Location.Value |
| TimeAxisCoverageBoundsExtent | EPOCHEXT | yr | double | time.duration | spec:Char.TimeAxis.Coverage.Bounds.Extent |
| spectralAccuracyStatError | WAVEERRO | nm | double | stat.error;em.wl | spec:Char.SpectralAxis.Accuracy.StatError |
| spectralLocation | - | nm | double | instr.bandpass | spec:Char.SpectralAxis.Coverage.Location.Value |
| spectralCoverageBoundsExtent | WAVEEXTE | nm | double | instr.bandwidth | spec:Char.SpectralAxis.Coverage.Bounds.Extent |
| spectralCoverageBoundsStart | WAVESTAR | nm | double | stat.min | spec:Char.SpectralAxis.Coverage.Bounds.Start |
| spectralCoverageBoundsStop | WAVEEND | nm | double | stat.max | spec:Char.SpectralAxis.Coverage.Bounds.Stop |
| SpatialExtent | APERTURE | deg | double | instr.fov | spec:Spectrum.Char.SpatialAxis.Coverage.Bounds.Extent |
| DataModel | DATAMODE | | string | | spec:Spectrum.DataModel |
| Publisher | PUBLISHE | | string | meta.curation | spec:Curation.Publisher |
| Title | TITLE | | string | | spec:DataID.Title |
| SpectralAxisUcd | - | | string | | spec:Spectrum.Char.SpectralAxis.Ucd |
| SpectralAxisUnit | SPECTRAL | | string | | spec:Spectrum.Char.SpectralAxis.Unit |
| FluxAxisUcd | FLUXAXIS | | string | | spec:Spectrum.Char.FluxAxis.Ucd |
| FluxAxisUnit | | | string | | spec:Spectrum.Char.FluxAxis.Unit |
| wavelength | | nm | double | em.wl | spec:Data.SpectralAxis.Value |
| flux | | | float | phot.flux;em.opt.I | |
| flux_error | | | float | stat.error;phot.flux;em.opt.I | |

 

 

Figure 6: excerpt from the RVS Spectra table and metadata as shown by TOPCAT.

 

DataLink: Python access

Authors: Héctor Cánovas and Jos de Bruijne

 

The main goal of the Jupyter Notebook displayed below is to show how to retrieve and inspect the DataLink products using the Astroquery.Gaia Python package. The code and its associated requirements.txt file can be downloaded from here.

 

Tutorial: Retrieve (all) the DataLink products associated to a sample





 

Release number: v1.0.1 (2022-12-06)

Applicable Gaia Data Releases: Gaia EDR3, Gaia DR3

Author: Héctor Cánovas Cabrera; hector.canovas@esa.int

Summary:

This code shows how to retrieve the different DataLink products from an input list of Gaia DR3 sources. These products are serialised in three different data structures:

  • INDIVIDUAL
  • COMBINED, and
  • RAW

Although all data structures contain virtually the same information, the RAW format (the internal format used by the Gaia collaboration) is not intended for end users (see the DataLink: Products serialisation tutorial for details). This notebook shows the content of the INDIVIDUAL and COMBINED products, whose serialisation follows different IVOA data model recommendations and makes it easy to inspect the product content. We recommend selecting the COMBINED format when downloading DataLink products for large numbers (>1000) of sources, to reduce the total download time.

Useful URLs:

In [1]:
from astroquery.gaia import Gaia
import matplotlib.pyplot as plt
In [2]:
def extract_dl_ind(datalink_dict, key, figsize = [15,5], fontsize = 12, linewidth = 2, show_legend = True, show_grid = True):
    """
    Extract individual DataLink products and export them to an Astropy Table
    """
    dl_out  = datalink_dict[key][0].to_table()
    if 'time' in dl_out.keys():
        plot_e_phot(dl_out, colours  = ['green', 'red', 'blue'], title = 'Epoch photometry', fontsize = fontsize, show_legend = show_legend, show_grid = show_grid, figsize = figsize)
    if 'wavelength' in dl_out.keys():
        title = ''  # fallback title for spectra with an unexpected number of samples
        if len(dl_out) == 343:  title = 'XP Sampled'
        if len(dl_out) == 2401: title = 'RVS'
        plot_sampled_spec(dl_out, color = 'blue', title = title, fontsize = fontsize, show_legend = False, show_grid = show_grid, linewidth = linewidth, legend = '', figsize = figsize)
    return dl_out


def plot_e_phot(inp_table, colours  = ['green', 'red', 'blue'], title = 'Epoch photometry', fontsize = 12, show_legend = True, show_grid = True, figsize = [15,5]):
    """
    Epoch photometry plotter. 'inp_table' MUST be an Astropy-table object.
    """
    fig      = plt.figure(figsize=figsize)
    xlabel   = f'JD date [{inp_table["time"].unit}]'
    ylabel   = f'magnitude [{inp_table["mag"].unit}]'
    gbands   = ['G', 'RP', 'BP']
    colours  = iter(colours)

    plt.gca().invert_yaxis()
    for band in gbands:
        phot_set = inp_table[inp_table['band'] == band]
        plt.plot(phot_set['time'], phot_set['mag'], 'o', label = band, color = next(colours))
    make_canvas(title = title, xlabel = xlabel, ylabel = ylabel, fontsize= fontsize, show_legend=show_legend, show_grid = show_grid)
    plt.show()


def plot_sampled_spec(inp_table, color = 'blue', title = '', fontsize = 14, show_legend = True, show_grid = True, linewidth = 2, legend = '', figsize = [12,4], show_plot = True):
    """
    RVS & XP sampled spectrum plotter. 'inp_table' MUST be an Astropy-table object.
    """
    if show_plot:
        fig      = plt.figure(figsize=figsize)
    xlabel   = f'Wavelength [{inp_table["wavelength"].unit}]'
    ylabel   = f'Flux [{inp_table["flux"].unit}]'
    # Pass the 'color' argument through so that it is actually applied to the line.
    plt.plot(inp_table['wavelength'], inp_table['flux'], '-', linewidth = linewidth, label = legend, color = color)
    make_canvas(title = title, xlabel = xlabel, ylabel = ylabel, fontsize= fontsize, show_legend=show_legend, show_grid = show_grid)
    if show_plot:
        plt.show()


def make_canvas(title = '', xlabel = '', ylabel = '', show_grid = False, show_legend = False, fontsize = 12):
    """
    Create generic canvas for plots
    """
    plt.title(title,    fontsize = fontsize)
    plt.xlabel(xlabel,  fontsize = fontsize)
    plt.ylabel(ylabel , fontsize = fontsize)
    plt.xticks(fontsize = fontsize)
    plt.yticks(fontsize = fontsize)
    if show_grid:
        plt.grid()
    if show_legend:
        plt.legend(fontsize = fontsize*0.75)
 

Connect to the Gaia Archive

The DataLink products are available to both registered and anonymous users. However, we recommend logging in as a registered user because of the extra benefits this provides when executing long queries (as explained in this FAQ).

In [3]:
Gaia.login()
 
INFO: Login to gaia TAP server [astroquery.gaia.core]
User: hcanovas
Password: ········
OK
INFO: Login to gaia data server [astroquery.gaia.core]
OK
 

Download data sample

The query below retrieves a random sample of Gaia (E)DR3 sources having all types of DataLink products.

In [4]:
query = "SELECT source_id, ra, dec, pmra, pmdec, parallax \
FROM gaiadr3.gaia_source \
WHERE has_epoch_photometry = 'True' \
AND has_xp_sampled = 'True' \
AND has_rvs = 'True' \
AND has_mcmc_msc = 'True' \
AND has_mcmc_gspphot = 'True' \
AND random_index BETWEEN 0 AND 200000"


job     = Gaia.launch_job_async(query)
results = job.get_results()
print(f'Table size (rows): {len(results)}')
results
 
INFO: Query finished. [astroquery.utils.tap.core]
Table size (rows): 3
Out[4]:
Table length=3
source_id ra dec pmra pmdec parallax
  deg deg mas / yr mas / yr mas
int64 float64 float64 float64 float64 float64
6196457933368101888 202.80436078238418 -21.178991138861807 80.54562044679744 -32.95247075512294 10.167137280246173
5924045608237672448 257.635024432604 -53.35065341915946 -4.404105752618793 -6.63122508730231 0.19938320884996538
4911590910260264960 24.783541498908786 -55.317468647500505 40.64757827861938 10.758104689073546 6.2453699013330555
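
The query above is a chain of has_* conditions joined by AND, so it can also be assembled programmatically. The sketch below is a minimal illustration using only the Python standard library; the helper name build_datalink_query and its arguments are hypothetical and not part of Astroquery. Joining the conditions from a list avoids missing-whitespace mistakes between lines that are easy to make with backslash continuations.

```python
def build_datalink_query(flags, max_random_index, table="gaiadr3.gaia_source"):
    """Assemble an ADQL query selecting sources for which every flag in 'flags' is 'True'."""
    columns    = "source_id, ra, dec, pmra, pmdec, parallax"
    conditions = [f"{flag} = 'True'" for flag in flags]
    conditions.append(f"random_index BETWEEN 0 AND {int(max_random_index)}")
    where      = "\n  AND ".join(conditions)
    return f"SELECT {columns}\nFROM {table}\nWHERE {where}"


# Same flags as in the query of the cell above.
flags = ["has_epoch_photometry", "has_xp_sampled", "has_rvs", "has_mcmc_msc", "has_mcmc_gspphot"]
query = build_datalink_query(flags, 200000)
print(query)
```

The resulting string can be passed to Gaia.launch_job_async(query) exactly as in the cell above.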
 

The example below retrieves ALL available DataLink products for the input sample of Gaia Source IDs. This option significantly increases the total download time, and it is selected here only for teaching purposes. If you are not interested in downloading all products, we recommend specifying the DataLink product of interest in retrieval_type.

The downloaded files can be stored locally by specifying the output file directory via the output_file option in the load_data method below. Note that:

  • The DataLink products are stored in a .gz compressed directory. To avoid errors, this should be taken into account when naming the output file, e.g., output_file = 'datalink_output.gz'
  • The individual files will also be saved in the same directory from where this notebook is being launched. This is a known bug and we are working to fix it.
  • Finally, the metadata of some of the products raises an Astropy units warning. This is a known issue and we are also working on it.
In [5]:
retrieval_type = 'ALL'          # Options are: 'EPOCH_PHOTOMETRY', 'MCMC_GSPPHOT', 'MCMC_MSC', 'XP_SAMPLED', 'XP_CONTINUOUS', 'RVS', 'ALL'
data_structure = 'INDIVIDUAL'   # Options are: 'INDIVIDUAL', 'COMBINED', 'RAW'
data_release   = 'Gaia DR3'     # Options are: 'Gaia DR3' (default), 'Gaia DR2'


datalink = Gaia.load_data(ids=results['source_id'], data_release = data_release, retrieval_type=retrieval_type, data_structure = data_structure, verbose = False, output_file = None)
dl_keys  = [inp for inp in datalink.keys()]
dl_keys.sort()

print()
print(f'The following Datalink products have been downloaded:')
for dl_key in dl_keys:
    print(f' * {dl_key}')
 
WARNING: UnitsWarning: Unit 'em' not supported by the VOUnit standard. Did you mean Elm or Em? [astropy.units.format.vounit]
WARNING: UnitsWarning: Unit 'wl' not supported by the VOUnit standard.  [astropy.units.format.vounit]
 
The following Datalink products have been downloaded:
 * EPOCH_PHOTOMETRY-Gaia DR3 4911590910260264960.xml
 * EPOCH_PHOTOMETRY-Gaia DR3 5924045608237672448.xml
 * EPOCH_PHOTOMETRY-Gaia DR3 6196457933368101888.xml
 * MCMC_GSPPHOT-Gaia DR3 4911590910260264960.xml
 * MCMC_GSPPHOT-Gaia DR3 5924045608237672448.xml
 * MCMC_GSPPHOT-Gaia DR3 6196457933368101888.xml
 * MCMC_MSC-Gaia DR3 4911590910260264960.xml
 * MCMC_MSC-Gaia DR3 5924045608237672448.xml
 * MCMC_MSC-Gaia DR3 6196457933368101888.xml
 * RVS-Gaia DR3 4911590910260264960.xml
 * RVS-Gaia DR3 5924045608237672448.xml
 * RVS-Gaia DR3 6196457933368101888.xml
 * XP_CONTINUOUS-Gaia DR3 4911590910260264960.xml
 * XP_CONTINUOUS-Gaia DR3 5924045608237672448.xml
 * XP_CONTINUOUS-Gaia DR3 6196457933368101888.xml
 * XP_SAMPLED-Gaia DR3 4911590910260264960.xml
 * XP_SAMPLED-Gaia DR3 5924045608237672448.xml
 * XP_SAMPLED-Gaia DR3 6196457933368101888.xml
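
The key names printed above appear to follow the pattern '<PRODUCT>-<release> <source_id>.xml'. Assuming that this pattern holds (it is inferred from the output above, not taken from the Archive documentation), the sketch below splits a key into its parts and groups the source IDs by product type. The function name parse_dl_key is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the printed key names: '<PRODUCT>-<release> <source_id>.xml'
KEY_PATTERN = re.compile(r"^(?P<product>[A-Z_]+)-(?P<release>.+) (?P<source_id>\d+)\.xml$")


def parse_dl_key(dl_key):
    """Return (product type, data release, source_id) for an INDIVIDUAL DataLink key."""
    match = KEY_PATTERN.match(dl_key)
    if match is None:
        raise ValueError(f"Unexpected DataLink key: {dl_key!r}")
    return match["product"], match["release"], int(match["source_id"])


# Three of the keys listed in the output above.
dl_keys = ['EPOCH_PHOTOMETRY-Gaia DR3 4911590910260264960.xml',
           'RVS-Gaia DR3 5924045608237672448.xml',
           'RVS-Gaia DR3 6196457933368101888.xml']

by_product = defaultdict(list)
for key in dl_keys:
    product, release, source_id = parse_dl_key(key)
    by_product[product].append(source_id)

for product, ids in by_product.items():
    print(f' * {product}: {ids}')
```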
 

Detailed content

The DataLink products are stored inside a Python Dictionary, where each element (key) contains a one-element list. In addition:

  • The epoch photometry, MCMC's, and XP continuous products consist of a table that includes a "source_id" field.

  • The XP sampled and RVS products consist of a table that is serialised following the IVOA Spectrum Data Model (for details, see the DataLink: Products serialisation tutorial). As a result, a number of parameters (including the source_id) associated with these files are stored in the table metadata. The cell below shows how to extract these parameters, and how to export the table content to an Astropy Table object.

In [12]:
dl_key   = 'RVS-Gaia DR3 6196457933368101888.xml' # Try out using other XP_Sampled or RVS products (e.g., 'XP_SAMPLED-Gaia DR3 4911590910260264960.xml')
product  = datalink[dl_key][0]
items    = [item for item in product.iter_fields_and_params()]

if 'RVS' in dl_key or 'XP_SAMPLED' in dl_key:
    for item in items:
        print(item)
    print()
    print(f'Showing data for source_id: {product.get_field_by_id("source_id").value}')

prod_tab = product.to_table()
prod_tab[0:5]
 
<PARAM ID="source_id" datatype="long" name="source_id" ucd="meta.id;meta.main" value="6196457933368101888"/>
<PARAM ID="solution_id" datatype="long" name="solution_id" ucd="meta.version" value="5950420259779346465"/>
<PARAM ID="combined_transits" datatype="int" name="combined_transits" ucd="meta.number" value="26"/>
<PARAM ID="combined_ccds" datatype="int" name="combined_ccds" ucd="meta.number" value="73"/>
<PARAM ID="deblended_ccds" datatype="int" name="deblended_ccds" ucd="meta.number" value="6"/>
<PARAM ID="spatialLocation" arraysize="2" datatype="double" name="spatialLocation" ucd="pos.eq" unit="deg" utype="spec:Char.SpatialAxis.Coverage.Location.Value" value="[202.80436078 -21.17899114]"/>
<PARAM ID="TimeAxisCoverageLocation" datatype="double" name="TimeAxisCoverageLocation" ucd="time.epoch" unit="yr" utype="spec:Char.TimeAxis.Coverage.Location.Value" value="2016.0"/>
<PARAM ID="TimeAxisCoverageBoundsExtent" datatype="double" name="TimeAxisCoverageBoundsExtent" ucd="time.duration" unit="yr" utype="spec:Char.TimeAxis.Coverage.Bounds.Extent" value="2.83"/>
<PARAM ID="spectralAccuracyStatError" datatype="double" name="spectralAccuracyStatError" ucd="stat.error;em.wl" unit="nm" utype="spec:Char.SpectralAxis.Accuracy.StatError" value="0.0"/>
<PARAM ID="spectralLocation" datatype="double" name="spectralLocation" ucd="instr.bandpass" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Location.Value" value="0.0"/>
<PARAM ID="spectralCoverageBoundsExtent" datatype="double" name="spectralCoverageBoundsExtent" ucd="instr.bandwidth" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Bounds.Extent" value="24.0"/>
<PARAM ID="spectralCoverageBoundsStart" datatype="double" name="spectralCoverageBoundsStart" ucd="stat.min" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Bounds.Start" value="846.0"/>
<PARAM ID="spectralCoverageBoundsStop" datatype="double" name="spectralCoverageBoundsStop" ucd="stat.max" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Bounds.Stop" value="870.0"/>
<PARAM ID="SpatialExtent" datatype="double" name="SpatialExtent" ucd="instr.fov" unit="deg" utype="spec:Spectrum.Char.SpatialAxis.Coverage.Bounds.Extent" value="0.000491106"/>
<PARAM ID="DataModel" arraysize="13" datatype="char" name="DataModel" utype="spec:Spectrum.DataModel" value="Spectrum 1.01"/>
<PARAM ID="Publisher" arraysize="13" datatype="char" name="Publisher" ucd="meta.curation" utype="spec:Curation.Publisher" value="ESA/Gaia/DPAC"/>
<PARAM ID="Title" arraysize="*" datatype="char" name="Title" utype="spec:DataID.Title" value=""/>
<PARAM ID="SpectralAxisUcd" arraysize="5" datatype="char" name="SpectralAxisUcd" unit="em wl" utype="spec:Spectrum.Char.SpectralAxis.Ucd" value="em.wl"/>
<PARAM ID="SpectralAxisUnit" arraysize="2" datatype="char" name="SpectralAxisUnit" unit="nm" utype="spec:Spectrum.Char.SpectralAxis.Unit" value="nm"/>
<PARAM ID="FluxAxisUcd" arraysize="29" datatype="char" name="FluxAxisUcd" utype="spec:Spectrum.Char.FluxAxis.Ucd" value="phot.flux.density;arith.ratio"/>
<PARAM ID="FluxAxisUnit" arraysize="*" datatype="char" name="FluxAxisUnit" utype="spec:Spectrum.Char.FluxAxis.Unit" value=""/>
<FIELD ID="wavelength" datatype="double" name="wavelength" ucd="em.wl" unit="nm" utype="spec:Data.SpectralAxis.Value"/>
<FIELD ID="flux" datatype="float" name="flux" ucd="phot.flux;em.opt.I"/>
<FIELD ID="flux_error" datatype="float" name="flux_error" ucd="stat.error;phot.flux;em.opt.I"/>

Showing data for source_id: 6196457933368101888
Out[12]:
Table length=5
wavelength flux flux_error
nm    
float64 float32 float32
846.0 0.9810099 0.016027793
846.01 0.96773505 0.02387072
846.02 0.9449764 0.02818237
846.03 0.9631098 0.022609279
846.04 0.99672407 0.020400077
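
The PARAM elements printed above are plain XML, so their values can also be collected into a dictionary. The sketch below is an illustration only: it uses the Python standard library instead of Astropy, and it parses a hand-written, abridged fragment that copies a few of the PARAM lines shown above (wrapped in a TABLE element, without the utype attributes and without the XML namespace that a complete VOTable file carries). The helper name params_to_dict and the CASTS mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged fragment copied from the PARAM/FIELD lines printed above.
fragment = """<TABLE>
  <PARAM ID="source_id" datatype="long" name="source_id" ucd="meta.id;meta.main" value="6196457933368101888"/>
  <PARAM ID="combined_transits" datatype="int" name="combined_transits" ucd="meta.number" value="26"/>
  <PARAM ID="spectralCoverageBoundsStart" datatype="double" name="spectralCoverageBoundsStart" ucd="stat.min" unit="nm" value="846.0"/>
  <PARAM ID="spectralCoverageBoundsStop" datatype="double" name="spectralCoverageBoundsStop" ucd="stat.max" unit="nm" value="870.0"/>
  <PARAM ID="Publisher" arraysize="13" datatype="char" name="Publisher" ucd="meta.curation" value="ESA/Gaia/DPAC"/>
  <FIELD ID="wavelength" datatype="double" name="wavelength" ucd="em.wl" unit="nm"/>
</TABLE>"""

# Map the VOTable datatypes used in this fragment to Python types; anything else stays a string.
CASTS = {"long": int, "int": int, "double": float}


def params_to_dict(xml_text):
    """Collect the PARAM elements of a VOTable fragment into {ID: value}; FIELD elements are ignored."""
    root = ET.fromstring(xml_text)
    out  = {}
    for param in root.iter("PARAM"):
        cast = CASTS.get(param.get("datatype"), str)
        out[param.get("ID")] = cast(param.get("value"))
    return out


metadata = params_to_dict(fragment)
print(f'source_id: {metadata["source_id"]}')
print(f'Spectral coverage: {metadata["spectralCoverageBoundsStart"]} - {metadata["spectralCoverageBoundsStop"]} nm')
```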
 

The code below creates a plot if the downloaded product contains epoch photometry or a sampled spectrum (RVS or XP). Try it yourself and examine the content of the different products by commenting/uncommenting the dl_key variable below. The table displayed below only shows the first 5 rows to shorten this Notebook.

In [13]:
dl_key  = 'EPOCH_PHOTOMETRY-Gaia DR3 4911590910260264960.xml'
# dl_key  = 'MCMC_MSC-Gaia DR3 5924045608237672448.xml'
# dl_key  = 'MCMC_GSPPHOT-Gaia DR3 5924045608237672448.xml'
# dl_key  = 'XP_CONTINUOUS-Gaia DR3 4911590910260264960.xml'
# dl_key  = 'RVS-Gaia DR3 6196457933368101888.xml'
# dl_key  = 'XP_SAMPLED-Gaia DR3 6196457933368101888.xml'

dl_out  = extract_dl_ind(datalink, dl_key, figsize=[20,7])   # Change figsize (e.g., figsize=[30,7]) to adjust the size of the displayed image.
dl_out[0:5]                                                  # Remove the '[0:5]' to display the entire table.
 
Out[13]:
Table length=5
source_id transit_id band time mag flux flux_error flux_over_error rejected_by_photometry rejected_by_variability other_flags solution_id
      d mag electron / s electron / s          
int64 int64 object float64 float64 float64 float64 float32 bool bool int64 int64
4911590910260264960 14922212505719965 G 1666.7066178487407 11.642262897057917 414993.78049738676 487.08184643327724 852.0001 False False 4097 375316653866487564
4911590910260264960 14936045572629106 G 1666.9567829719524 11.641661753520346 415223.6152432905 756.6865282971511 548.73926 False False 4609 375316653866487564
4911590910260264960 14945786367190460 G 1667.132946715427 11.648596812514562 412579.85777377436 264.43392219174564 1560.238 False False 4097 375316653866487564
4911590910260264960 14949878655266891 G 1667.2069563968746 11.643335932713073 414583.84376818856 531.3042654941166 780.3134 False False 4097 375316653866487564
4911590910260264960 14959619461231507 G 1667.3831203042998 11.649341777555474 412296.8680857516 249.81574135962032 1650.4039 False False 4097 375316653866487564
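
As a quick sanity check on the columns shown above, the flux_over_error values are numerically consistent with the ratio flux / flux_error. The sketch below verifies this for the first three rows of the output, copying the values by hand; the tolerance is an assumption chosen to allow for the float32 precision of the flux_over_error column.

```python
import math

# (flux, flux_error, flux_over_error) copied from the first three rows of the table above.
rows = [
    (414993.78049738676, 487.08184643327724, 852.0001),
    (415223.6152432905,  756.6865282971511,  548.73926),
    (412579.85777377436, 264.43392219174564, 1560.238),
]

checks = []
for flux, flux_error, flux_over_error in rows:
    ratio = flux / flux_error
    # flux_over_error is stored as float32, so compare with a relative tolerance.
    checks.append(math.isclose(ratio, flux_over_error, rel_tol=1e-6))
    print(f'flux / flux_error = {ratio:.4f}   (tabulated flux_over_error: {flux_over_error})')
```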
 

As in the INDIVIDUAL example above, the following example retrieves ALL available DataLink products for the input sample of Gaia Source IDs. If you are not interested in downloading all products, we recommend specifying the DataLink product of interest in retrieval_type (e.g., retrieval_type = 'RVS').

In [14]:
retrieval_type = 'ALL'          # Options are: 'EPOCH_PHOTOMETRY', 'MCMC_GSPPHOT', 'MCMC_MSC', 'XP_SAMPLED', 'XP_CONTINUOUS', 'RVS', 'ALL'
data_structure = 'COMBINED'     # Options are: 'INDIVIDUAL', 'COMBINED', 'RAW'
data_release   = 'Gaia DR3'     # Options are: 'Gaia DR3' (default), 'Gaia DR2'


datalink  = Gaia.load_data(ids=results['source_id'], data_release = data_release, retrieval_type=retrieval_type, data_structure = data_structure, verbose = False, output_file = None)
dl_keys  = [inp for inp in datalink.keys()]
dl_keys.sort()

print()
print(f'The following Datalink products have been downloaded:')
for dl_key in dl_keys:
    print(f' * {dl_key}')
 
WARNING: UnitsWarning: Unit 'em' not supported by the VOUnit standard. Did you mean Elm or Em? [astropy.units.format.vounit]
WARNING: UnitsWarning: Unit 'wl' not supported by the VOUnit standard.  [astropy.units.format.vounit]
 
The following Datalink products have been downloaded:
 * EPOCH_PHOTOMETRY_COMBINED.xml
 * MCMC_GSPPHOT_COMBINED.xml
 * MCMC_MSC_COMBINED.xml
 * RVS_COMBINED.xml
 * XP_CONTINUOUS_COMBINED.xml
 * XP_SAMPLED_COMBINED.xml
 

Detailed content

The DataLink products are stored inside a Python Dictionary, where each element (key) contains a one- or a multi-element list, depending on the product:

  • The epoch photometry, MCMC's, and XP continuous products consist of a single-element list, whose element is a table that includes a "source_id" field.

  • The XP sampled and RVS products consist of a multi-element list, where each element is a table serialised following the IVOA Spectrum Data Model.

 

Extract data for individual sources (epoch photometry, MCMC's, and XP continuous)

The table displayed below only shows the first 5 elements to shorten this Notebook.

In [15]:
dl_key      = 'EPOCH_PHOTOMETRY_COMBINED.xml'     # Try also with 'XP_CONTINUOUS_COMBINED.xml', 'MCMC_MSC_COMBINED.xml', 'MCMC_GSPPHOT_COMBINED.xml'
product     = datalink[dl_key][0]
product_tb  = product.to_table()                  # Export to Astropy Table object.
source_ids  = list(set(product_tb['source_id']))  # Detect source_ids.
print(f' There is data for the following Source IDs:')
for source_id in source_ids:
    print(f'* {source_id}')


inp_source = source_ids[0]                        # Replace "0" by "1" or "2" to show the data for the other sources.
product_tb = product_tb[product_tb['source_id'] == inp_source]

print()
print(f'Showing data for source_id {inp_source}')
product_tb[0:5]                                   # Remove the '[0:5]' to display the entire table.
 
 There is data for the following Source IDs:
* 4911590910260264960
* 6196457933368101888
* 5924045608237672448

Showing data for source_id 4911590910260264960
Out[15]:
Table length=5
source_id transit_id band time mag flux flux_error flux_over_error rejected_by_photometry rejected_by_variability other_flags solution_id
      d mag electron / s electron / s          
int64 int64 object float64 float64 float64 float64 float32 bool bool int64 int64
4911590910260264960 14922212505719965 G 1666.7066178487407 11.642262897057917 414993.78049738676 487.08184643327724 852.0001 False False 4097 375316653866487564
4911590910260264960 14936045572629106 G 1666.9567829719524 11.641661753520346 415223.6152432905 756.6865282971511 548.73926 False False 4609 375316653866487564
4911590910260264960 14945786367190460 G 1667.132946715427 11.648596812514562 412579.85777377436 264.43392219174564 1560.238 False False 4097 375316653866487564
4911590910260264960 14949878655266891 G 1667.2069563968746 11.643335932713073 414583.84376818856 531.3042654941166 780.3134 False False 4097 375316653866487564
4911590910260264960 14959619461231507 G 1667.3831203042998 11.649341777555474 412296.8680857516 249.81574135962032 1650.4039 False False 4097 375316653866487564
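
Note that list(set(...)) does not keep the source IDs in the order in which they appear in the table, which is why the printed list above is not sorted. If a reproducible order is preferred, the IDs can be de-duplicated in order of first appearance, or sorted. The sketch below illustrates both options on a plain Python list standing in for the source_id column; the number of repetitions of each ID is made up for illustration.

```python
# Stand-in for product_tb['source_id']: the three source IDs of this tutorial, repeated an arbitrary number of times.
source_id_column = [4911590910260264960, 4911590910260264960,
                    5924045608237672448, 6196457933368101888,
                    5924045608237672448]

unique_in_order = list(dict.fromkeys(source_id_column))   # Keeps the order of first appearance.
unique_sorted   = sorted(set(source_id_column))           # Ascending numerical order.

print(unique_in_order)
print(unique_sorted)
```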
 

Extract data for individual sources (XP sampled and RVS)

In [16]:
dl_key   = 'RVS_COMBINED.xml'    # Try also with 'XP_SAMPLED_COMBINED.xml'
product  = datalink[dl_key][0]   # Replace "0" by "1" or "2" to show the data for the other sources.
items    = [item for item in product.iter_fields_and_params()]

for item in items:
    print(item)

print()
print(f'Showing data for source_id: {product.get_field_by_id("source_id").value}')
prod_tab = product.to_table()
prod_tab[0:5]
 
<PARAM ID="source_id" datatype="long" name="source_id" ucd="meta.id;meta.main" value="6196457933368101888"/>
<PARAM ID="solution_id" datatype="long" name="solution_id" ucd="meta.version" value="5950420259779346465"/>
<PARAM ID="combined_transits" datatype="int" name="combined_transits" ucd="meta.number" value="26"/>
<PARAM ID="combined_ccds" datatype="int" name="combined_ccds" ucd="meta.number" value="73"/>
<PARAM ID="deblended_ccds" datatype="int" name="deblended_ccds" ucd="meta.number" value="6"/>
<PARAM ID="spatialLocation" arraysize="2" datatype="double" name="spatialLocation" ucd="pos.eq" unit="deg" utype="spec:Char.SpatialAxis.Coverage.Location.Value" value="[202.80436078 -21.17899114]"/>
<PARAM ID="TimeAxisCoverageLocation" datatype="double" name="TimeAxisCoverageLocation" ucd="time.epoch" unit="yr" utype="spec:Char.TimeAxis.Coverage.Location.Value" value="2016.0"/>
<PARAM ID="TimeAxisCoverageBoundsExtent" datatype="double" name="TimeAxisCoverageBoundsExtent" ucd="time.duration" unit="yr" utype="spec:Char.TimeAxis.Coverage.Bounds.Extent" value="2.83"/>
<PARAM ID="spectralAccuracyStatError" datatype="double" name="spectralAccuracyStatError" ucd="stat.error;em.wl" unit="nm" utype="spec:Char.SpectralAxis.Accuracy.StatError" value="0.0"/>
<PARAM ID="spectralLocation" datatype="double" name="spectralLocation" ucd="instr.bandpass" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Location.Value" value="0.0"/>
<PARAM ID="spectralCoverageBoundsExtent" datatype="double" name="spectralCoverageBoundsExtent" ucd="instr.bandwidth" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Bounds.Extent" value="24.0"/>
<PARAM ID="spectralCoverageBoundsStart" datatype="double" name="spectralCoverageBoundsStart" ucd="stat.min" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Bounds.Start" value="846.0"/>
<PARAM ID="spectralCoverageBoundsStop" datatype="double" name="spectralCoverageBoundsStop" ucd="stat.max" unit="nm" utype="spec:Char.SpectralAxis.Coverage.Bounds.Stop" value="870.0"/>
<PARAM ID="SpatialExtent" datatype="double" name="SpatialExtent" ucd="instr.fov" unit="deg" utype="spec:Spectrum.Char.SpatialAxis.Coverage.Bounds.Extent" value="0.000491106"/>
<PARAM ID="DataModel" arraysize="13" datatype="char" name="DataModel" utype="spec:Spectrum.DataModel" value="Spectrum 1.01"/>
<PARAM ID="Publisher" arraysize="13" datatype="char" name="Publisher" ucd="meta.curation" utype="spec:Curation.Publisher" value="ESA/Gaia/DPAC"/>
<PARAM ID="Title" arraysize="*" datatype="char" name="Title" utype="spec:DataID.Title" value=""/>
<PARAM ID="SpectralAxisUcd" arraysize="5" datatype="char" name="SpectralAxisUcd" unit="em wl" utype="spec:Spectrum.Char.SpectralAxis.Ucd" value="em.wl"/>
<PARAM ID="SpectralAxisUnit" arraysize="2" datatype="char" name="SpectralAxisUnit" unit="nm" utype="spec:Spectrum.Char.SpectralAxis.Unit" value="nm"/>
<PARAM ID="FluxAxisUcd" arraysize="29" datatype="char" name="FluxAxisUcd" utype="spec:Spectrum.Char.FluxAxis.Ucd" value="phot.flux.density;arith.ratio"/>
<PARAM ID="FluxAxisUnit" arraysize="*" datatype="char" name="FluxAxisUnit" utype="spec:Spectrum.Char.FluxAxis.Unit" value=""/>
<FIELD ID="wavelength" datatype="double" name="wavelength" ucd="em.wl" unit="nm" utype="spec:Data.SpectralAxis.Value"/>
<FIELD ID="flux" datatype="float" name="flux" ucd="phot.flux;em.opt.I"/>
<FIELD ID="flux_error" datatype="float" name="flux_error" ucd="stat.error;phot.flux;em.opt.I"/>

Showing data for source_id: 6196457933368101888
Out[16]:
Table length=5
wavelength flux flux_error
nm    
float64 float32 float32
846.0 0.9810099 0.016027793
846.01 0.96773505 0.02387072
846.02 0.9449764 0.02818237
846.03 0.9631098 0.022609279
846.04 0.99672407 0.020400077
In [19]:
dl_key      = 'XP_SAMPLED_COMBINED.xml'          # Try also with 'RVS_COMBINED.xml'
source_ids  = [product.get_field_by_id("source_id").value for product in datalink[dl_key]]
tables      = [product.to_table()                         for product in datalink[dl_key]]


fig          = plt.figure(figsize=[20,7])        # Change the figsize to e.g. figsize=[30,7] to increase the size of the displayed image.
source_ids_i = iter(source_ids)
for inp_table in tables:
    plot_sampled_spec(inp_table, title=dl_key.replace('_COMBINED.xml', ''), legend = f'source ID = {next(source_ids_i)}', show_plot=False)
plt.show()
 

Tutorial - Programmatic download of large datasets through DataLink

Authors: Héctor Cánovas and Jos de Bruijne

 

The main goal of the Jupyter Notebook displayed below is to show how to retrieve DataLink products for large samples (more than 5000 sources) using the Astroquery.Gaia Python package. The code and its associated requirements.txt file can be downloaded from this link.

 

Tutorial: Download DataLink products for >5000 sources





 

Release number: v1.0 (2022-07-06)

Applicable Gaia Data Releases: Gaia EDR3, Gaia DR3

Author: Héctor Cánovas Cabrera; hector.canovas@esa.int

Summary:

This Jupyter Notebook shows how to work around the Gaia Archive DataLink products download threshold by first splitting an input source list into multiple chunks, each of them containing $\leq$ 5000 sources. The chunks are then downloaded sequentially and the multiple outputs are finally merged. As explained in the DataLink: products serialisation tutorial, it is possible to retrieve DataLink products in various data structures and formats. We suggest retrieving the DataLink products in the COMBINED data structure (as shown in all the examples below) because our tests indicate that this is the most efficient data structure for downloading large amounts of products. For simplicity, all the products in the following examples are downloaded in VOTable format. This makes it easy to export them to several other formats using the tools available within the Astropy.table module. This complementary tutorial shows how to download and inspect all the different DataLink products via Astroquery.Gaia for a small sample of sources. Finally, while executing this notebook it is possible to receive a few warnings about the units included in the product metadata. Those are known issues and we are working on them.

Useful URLs:

In [1]:
from astropy.table import Table, vstack
from astroquery.gaia import Gaia
import numpy as np
In [2]:
def chunks(lst, n):
    """
    Split an input list into multiple chunks of size <= n
    """
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
 

Connect to the Gaia Archive

The DataLink products are available to both registered and anonymous users. However, we recommend logging in as a registered user because of the extra benefits this provides when executing long queries (as explained in this FAQ).

In [3]:
Gaia.login()
 
INFO: Login to gaia TAP server [astroquery.gaia.core]
User: hcanovas
Password: ········
OK
INFO: Login to gaia data server [astroquery.gaia.core]
OK
 

Execute ADQL Query

The query below retrieves data for 5100 sources that have the DataLink products selected in the query (epoch photometry, MCMC's, XP sampled spectra, and RVS spectra) available in Gaia DR3.

In [4]:
query = "SELECT TOP 5100 source_id, ra, dec, parallax from gaiadr3.gaia_source \
WHERE has_epoch_photometry = 'True' AND \
has_mcmc_gspphot = 'True' AND \
has_mcmc_msc = 'True' AND \
has_xp_sampled = 'True' AND \
has_rvs = 'True'"

job     = Gaia.launch_job_async(query)
results = job.get_results()
results[0:5]
 
INFO: Query finished. [astroquery.utils.tap.core]
Out[4]:
Table length=5
source_id ra dec parallax
  deg deg mas
int64 float64 float64 float64
2263166706630078848 295.13035167754015 70.28624696426813 17.357227526090668
2263178457660566784 294.86955515586925 70.52640371163079 5.99456673538563
2268372099615724288 285.63663592006975 75.41851051257491 23.857068308325488
5912901375001820288 263.99225124991324 -58.82661905857226 6.476061657906406
2266609140096698112 275.7457014457717 72.17444369607303 7.253739784978569
 

Warning: The load_data method can retrieve all types of DataLink products (epoch photometry, MCMC's, and spectra) in a single call (see below). However, selecting this option when retrieving DataLink products for a large number (>1000) of sources can severely delay the dataset preparation on the server side, and even result in a download error. Therefore, we strongly recommend selecting one product at a time in this case.

 

Split the input list into several chunks containing <=5000 elements each

In [5]:
dl_threshold = 5000               # DataLink server threshold. It is not possible to download products for more than 5000 sources in one single call.
ids          = results['source_id']
ids_chunks   = list(chunks(ids, dl_threshold))
datalink_all = []


print(f'* Input list contains {len(ids)} source_IDs')
print(f'* This list is split into {len(ids_chunks)} chunks of <= {dl_threshold} elements each')
 
* Input list contains 5100 source_IDs
* This list is split into 2 chunks of <= 5000 elements each
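
The number of chunks reported above follows from a ceiling division: ceil(5100 / 5000) = 2, with 5000 sources in the first chunk and the remaining 100 in the second. The sketch below reproduces this arithmetic with the chunks function defined earlier in this notebook, using a plain list of integers as a stand-in for the source_id column.

```python
import math


def chunks(lst, n):
    """
    Split an input list into multiple chunks of size <= n
    """
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


n_sources       = 5100
dl_threshold    = 5000
expected_chunks = math.ceil(n_sources / dl_threshold)

dummy_ids   = list(range(n_sources))                       # Stand-in for results['source_id'].
chunk_sizes = [len(chunk) for chunk in chunks(dummy_ids, dl_threshold)]

print(f'* Expected number of chunks: {expected_chunks}')   # 2
print(f'* Chunk sizes: {chunk_sizes}')                     # [5000, 100]
```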
In [6]:
retrieval_type = 'RVS'        # Options are: 'EPOCH_PHOTOMETRY', 'MCMC_GSPPHOT', 'MCMC_MSC', 'XP_SAMPLED', 'XP_CONTINUOUS', 'RVS' 
data_structure = 'COMBINED'   # Options are: 'INDIVIDUAL', 'COMBINED', 'RAW' - but as explained above, we strongly recommend to use COMBINED for massive downloads.
data_release   = 'Gaia DR3'   # Options are: 'Gaia DR3' (default), 'Gaia DR2'
dl_key         = f'{retrieval_type}_{data_structure}.xml'


ii = 0
for chunk in ids_chunks:
    ii = ii + 1
    print(f'Downloading Chunk #{ii}; N_files = {len(chunk)}')
    datalink  = Gaia.load_data(ids=chunk, data_release = data_release, retrieval_type=retrieval_type, format = 'votable', data_structure = data_structure)
    datalink_all.append(datalink)
 
Downloading Chunk #1; N_files = 5000
 
WARNING: UnitsWarning: Unit 'em' not supported by the VOUnit standard. Did you mean Elm or Em? [astropy.units.format.vounit]
WARNING: UnitsWarning: Unit 'wl' not supported by the VOUnit standard.  [astropy.units.format.vounit]
 
Downloading Chunk #2; N_files = 100
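
Because large requests can occasionally end in a download error, it can be convenient to retry a failed chunk instead of restarting the whole loop. The sketch below is one possible way to do this and is not part of Astroquery: download_chunks, max_attempts, and wait_seconds are hypothetical names, and the fetch argument is any callable that downloads one chunk. In this notebook it could be, for example, lambda chunk: Gaia.load_data(ids=chunk, data_release = data_release, retrieval_type=retrieval_type, format = 'votable', data_structure = data_structure). Here a fake function that fails once is used so that the example runs without a connection to the Archive.

```python
import time


def download_chunks(ids_chunks, fetch, max_attempts=3, wait_seconds=0.0):
    """Call fetch(chunk) for every chunk, retrying each chunk up to max_attempts times."""
    results = []
    for number, chunk in enumerate(ids_chunks, start=1):
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(fetch(chunk))
                break
            except Exception as error:
                print(f'Chunk #{number}: attempt {attempt} failed ({error})')
                if attempt == max_attempts:
                    raise                      # Give up after the last attempt.
                time.sleep(wait_seconds)
    return results


# Fake download function (stand-in for Gaia.load_data): fails on its first call only.
calls = {'n': 0}

def fake_fetch(chunk):
    calls['n'] += 1
    if calls['n'] == 1:
        raise RuntimeError('simulated server error')
    return {'RVS_COMBINED.xml': list(chunk)}


datalink_all = download_chunks([[1, 2, 3], [4]], fake_fetch)
print(f'Downloaded {len(datalink_all)} chunks in {calls["n"]} calls')
```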
 

The sampled spectra (XP and RVS) are serialised following the IVOA Spectrum Data Model and as a result a number of parameters, including the associated source_id, are stored in the table metadata. This is taken into account in the cells below.

 

Epoch Photometry, MCMC, or XP Continuous

In this case, the merged product is one single table that includes the source_id in one of the table fields. The code below includes an example showing how to write the entire table using the Astropy.table module.

Warning: the written table can have a size >1 GB.

In [7]:
if 'RVS' not in dl_key and 'XP_SAMPLED' not in dl_key:
    temp       = [inp[dl_key][0].to_table() for inp in datalink_all]
    merged     = vstack(temp)
    file_name  = f"{dl_key}_{data_release.replace(' ','_')}.vot"

    print(f'Writing table as: {file_name}')
    merged.write(file_name, format = 'votable', overwrite = True)

    display(merged)
 

XP sampled or RVS

In this case, the merged product is a single Python list whose elements are the individual products of all the chunks. The code below includes an example showing how to write an individual table using the Astropy.table module.

In [9]:
if 'RVS' in dl_key or 'XP_SAMPLED'  in dl_key:
    product_list_tb  = [item                                    for sublist in datalink_all for item in sublist[dl_key]]
    product_list_ids = [item.get_field_by_id("source_id").value for sublist in datalink_all for item in sublist[dl_key]]


    ii          = 12     # Try different values to display the content of the individual products.
    source_id   = product_list_ids[ii]
    product_tab = product_list_tb[ii].to_table()
    file_name   = f"{dl_key.replace('_COMBINED.xml', '')}_{data_release.replace(' ','_')}_{source_id}.vot"

    print(f'Writing table as: {file_name}')
    product_tab.write(file_name, format = 'votable', overwrite = True)
    print()
    print(f'Showing {retrieval_type} for source_id = {source_id}')
    display(product_tab[:5])
 
Writing table as: RVS_Gaia_DR3_5912768368455408000.vot

Showing RVS for source_id = 5912768368455408000
 
Table length=5
wavelength flux flux_error
nm    
float64 float32 float32
846.0 0.961344 0.03171042
846.01 0.9489333 0.020776467
846.02 0.9774552 0.017536303
846.03 0.9911668 0.014816107
846.04 0.9947043 0.013110418
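
The nested list comprehensions used in the cell above flatten the per-chunk lists into one single list. The sketch below shows the same operation on dummy data: plain strings stand in for the downloaded products, and the dictionary layout mimics that of datalink_all. The itertools-based variant is an equivalent alternative, not the method used in this notebook.

```python
from itertools import chain

dl_key = 'RVS_COMBINED.xml'

# Dummy stand-in for datalink_all: one dictionary per downloaded chunk.
datalink_all = [{dl_key: ['spectrum_a', 'spectrum_b']},
                {dl_key: ['spectrum_c']}]

# Same construction as in the cell above.
flat_comprehension = [item for sublist in datalink_all for item in sublist[dl_key]]

# Equivalent alternative using the standard library.
flat_chain = list(chain.from_iterable(sublist[dl_key] for sublist in datalink_all))

print(flat_comprehension)
```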