Help support

Should you have any question, please check the Gaia FAQ section or contact the Gaia Helpdesk

​​​​​​​Cross-match tutorial

Author: Alcione Mora

This tutorial provides tips for cross-matching Gaia with other catalogues. The main tool is the geometrical query, typically complemented with additional conditions and coordinate propagation in special cases when the epoch of both catalogues is significantly different.

For a simple cross-match between Gaia and a user provided list of source names or coordinates use the Archive GUI: "Basic" => "File" (100 sources or less), which is described to some extent in the introductory tutorial Simple form (DR1).

This is an intermediate level tutorial that assumes a basic knowledge of the general interface and workflow. The introductory tutorials White dwarfs exploration and Cluster analysis (GUI) are recommended in case of difficulties following this exercise.

Introduction

One of the most Frequently Asked Questions is how to combine Gaia with other catalogues, either a major survey (e.g. 2MASS, Pan-STARRs) or a user provided table.

The initial step is to recognise cross-matching catalogues as a scientific exercise. That is, there is no general rule to determine whether two sources observed differently (telescope, instrument, wavelength, resolution, epoch, …) are physically related to the same object. This decision depends on the context and science case and can only be done by the astronomer.

For example, some Gaia DR2 sources (visible, sub-arcsec resolution, epoch J2015.5) might be related to IRAS 100 micrometres photometry (far-infrared, 2 arcmin resolution, epoch 1983) for the purpose of constructing Spectral Energy Distributions of pre-main sequence stars surrounded by circumstellar discs. However, VLBI coeval quasar astrometry needs careful combination with Gaia when comparing ICRF and Gaia-CRF realisations, due to possible microarcsec offsets between the radio and visible emitting regions, see e.g. Mignard+ (2018).

Having said that, there are tips that can help us in this endeavour. Some of them are discussed in the following. The most important tools are the geometrical conditions. They help us select potential match pairs between catalogues, typically based on a maximum distance on the sphere. Proximity alone is not a match guarantee and is often supplemented by additional restrictions. Popular ones are maximum flux difference (e.g. |G - K| < 2), 3D location in space (e.g. 4 < parallax < 5) or astrometric quality (e.g. RUWE < 1.4). Finally, proper motions and other parameters need to be taken into account for fast moving objects when the time difference between catalogues is significant.

CDS cross-match service and J2000

Significant effort has been invested by data centres like CDS to homogenise catalogues as much as possible. For example, Vizier catalogues include the synthetic columns RAJ2000 and DEJ2000, which aim at providing ICRS coordinates, propagated to epoch J2000 when proper motions are available. The excellent CDS cross-match service makes use of these columns.

However, great care is needed to interpret the results, in particular when significant proper motions are suspected, because most catalogues lack proper motions, which means the comparison at epoch J2000 might only be partly true. For example, in a cross-match between Gaia DR2 and Pan-STARRs DR1, the former might be time transformed from J2015.15 to J2000, but the latter cannot be shifted from J2013 (approx.) to J2000 because Pan-STARRs lacks proper motions. For a detailed discussion on the meanings of J2000 (reference system and epoch) look at the following FAQ.

Retrieve some sample data

The first step is to retrieve some data which will serve as the external catalogue to cross-match against Gaia. We will use the HST/ACS photometry from the NGC 346 open cluster analysed by Gouliermis+ (2006). It can be retrieved in multiple ways. Maybe the fastest is to paste the following link in a browser and click on "Submit" after loading the Vizier page.

http://vizier.u-strasbg.fr/viz-bin/VizieR-3?-source=J/ApJS/166/549/table2&-out.max=unlimited&-out.form=VOTable&-out.add=_RAJ,_DEJ&-oc.form=decimal

It should produce a VOTable that can be uploaded to the Gaia Archive user space while authenticated, e.g.: user_amora.gouliermis_2006 in my case.

Cross-match against small user table: ADQL cone search

The most basic (and powerful) geometric condition is the cone search. It selects potential matches based on angular proximity. Please, take your time to learn it. In many cases, this restriction alone is what most astronomers need. More advanced cross-matches also use cone searches as the first step.

The following query cross-matches the previously uploaded Gouliermis+ (2006) catalogue against Gaia using a 1 arcsec cone search radius.

select num, source_id, ra, dec, parallax, pmra, pmdec,
vmag, imag, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
distance(
  point('', col_raj2000, col_dej2000),
  point('', ra, dec)
) * 3600 as dist_arcsec
from user_amora.gouliermis_2006 as gouliermis
join gaiadr2.gaia_source as gaia
  on 1 = contains(
    point('', col_raj2000, col_dej2000),
    circle('', ra, dec, 1. / 3600)
  )

Output: 22406 records

The geometrical condition is applied in the join (match) between tables.

on 1 = contains(
  point('', col_raj2000, col_dej2000),
  circle('', ra, dec, 1. / 3600)
)

Circles of 1 arcsec = 1 deg / 3600 are traced around objects in Gaia to look for points in the Gouliermis+ (2006) catalogue.

Order matters! When cross-matching a small vs a big catalogue, put the coordinates from the small one in the "point" function and the large ones in the "circle". This is required to ensure the table geometric indices are properly used. The effect of reversing the order ranges from unnoticeable to query time-out.

 ​​​​

Cross-match against small table in external Archive: external TAPs + JOB_UPLOAD

If the catalogue exists in an external TAP Archive, as for the Gouliermis+ (2006) table in TAP-Vizier , it can be directly retrieved by the Gaia Archive. Note this table in particular is used in the first steps of the External Archives (TAPs) tutorial. This avoids downloading the table in the users' computer and the subsequent upload to the Gaia Archive.

Search keywords: gouliermis ngc 346 hst/acs

External TAP: TAP-Vizier http://tapvizier.u-strasbg.fr/TAPVizieR/tap

ADQL query:

select * from "J/ApJS/166/549/table2"

Output: 99079 records

The resulting job (e.g. job ID 1561683712009O) can now be uploaded as a user table (e.g. user_amora.gouliermis_2006) and cross-matched against Gaia following the steps in the previous section.

Alternatively, JOB_UPLOAD, which will be used extensively in this tutorial, allows us to do the cross-match on-the-fly without creating a user table due to the small size of the Gouliermis+ (2006) catalogue. The following ADQL query shows how to do it (remember to switch back to the Gaia Archive after using TAP-Vizier).

select num, source_id, ra, dec, parallax, pmra, pmdec,
vmag, imag, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
distance(
  point('', raj2000, dej2000),
  point('', ra, dec)
) * 3600 as dist_arcsec
from job_upload.job1561683712009O as gouliermis
join gaiadr2.gaia_source as gaia
  on 1 = contains(
    point('', raj2000, dej2000),
    circle('', ra, dec, 1. / 3600)
  )

where the previous job has been referenced in the query as follows.

from job_upload.job1561683712009O as gouliermis

Cross-match against small user table: Built-in tool

A built-in tool is available for authenticated users. It produces the same results for pure cone searches (no additional restrictions) and might be a good starting point for users with little experience in ADQL. The dedicated introductory tutorial White dwarfs exploration explains how to do it in detail. We Just need to replace the example white dwarfs catalogue by the Gouliermis+ (2006) table and use a radius of 1 arcsec.

The first step is to identify col_raj2000 and cold_dej2000 as the RA-DEC pair in the table properties editor

 

Then we need to run the built-in cross-match tool to compare user_amora.gouliermis_2006 against gaiadr2.gaia_source using a 1 arcsec radius

 

which produces the user table user_amora.xmatch_gouliermis_2006_gaia_source with the cross-match and the empty job xmatch_gouliermis_2006_gaia_source. Clicking on the last icon on the right in that job generates the final join query between Gouliermis+ (2006), Gaia and the cross-match table. It can be edited as follows to provide results directly comparable to those obtained in the previous sections.

SELECT
a."num", b."source_id", b."ra", b."dec",
b."parallax", b."pmra", b."pmdec",
a."vmag", a."imag", b."phot_g_mean_mag",
b."phot_bp_mean_mag", b."phot_rp_mean_mag", c."dist"
FROM user_amora.gouliermis_2006 AS a,
  gaiadr2.gaia_source AS b,
  user_amora.xmatch_gouliermis_2006_gaia_source AS c
WHERE (c.gouliermis_2006_gouliermis_2006_oid = a.gouliermis_2006_oid
  AND c.gaia_source_source_id = b.source_id)

Output: 22406 records

Cross-match against small user table: ADQL cone search + extra condition

As explained in the introduction, cross-matches are scientific problems. This means additional insight is needed on top of basic cone searches. Those criteria must be provided by the astronomers. For example: colours or parallaxes within a given range.

The following query enforces |V - G| < 2, which is a reasonable filter to reduce the number of false positives matches due to the similarity in wavelength and spatial resolution of Gaia and HST/ACS. This condition is particularly effective due to this HST/ACS catalogue containing several crowded areas (stellar clusters and associations).

select num, source_id, ra, dec, parallax, pmra, pmdec,
vmag, imag, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
distance(
  point('', col_raj2000, col_dej2000),
  point('', ra, dec)
) * 3600 as dist_arcsec
from user_amora.gouliermis_2006 as gouliermis
join gaiadr2.gaia_source as gaia
  on 1 = contains(
    point('', col_raj2000, col_dej2000),
    circle('', ra, dec, 1. / 3600)
  )
where abs(vmag - phot_g_mean_mag) < 2

Output: 6206 records

The photometric condition below is applied at the end of the query. Note it reduces the number of possible matches by a factor 3.6.

where abs(vmag - phot_g_mean_mag) < 2

Two large tables cross-match against one small user table: ADQL cone search + extra condition + join

Sometimes the data are scattered over more than two tables. For example, imagine we are only interested in matches where the overall astrometric quality is good. In our example, it would translate into combining three tables: Gouliermis+ (2016) for the HST/ACS photometry, gaiadr2.gaia_source for the astrometry and photometry and gaiadr2.ruwe for the astrometric RUWE metrics.

This means three tables need to be combined (joined) together. The first question to answer is: can it be done in several steps?. For example, could we combine HST/ACS with Gaia first, and then filter the results using the RUWE? If yes, please try to do it when possible.

For example, imagine the results obtained in the previous section (cone search + condition) produce job ID 1561684519759O. These results, being small, could be reused using JOB_UPLOAD in a subsequent query to filter by RUWE < 1.4, the usual cut applied for selecting good astrometry candidates, see more details in Lindegren (2018) technical note.

select intermediate.*, ruwe
from job_upload.job1561684519759O as intermediate
join gaiadr2.ruwe using(source_id)
where ruwe < 1.4

Output: 3297 records

Alternatively, the results obtained in the previous query could have been uploaded to a user table (e.g. user_amora.intermediate) for subsequent filtering.

select intermediate.*, ruwe
from user_amora.intermediate
join gaiadr2.ruwe using(source_id)
where ruwe < 1.4

There are many situations when the two steps process is impossible or undesirable. For example, if the intermediate table is very large. The following query shows a tentative naïve combination of the three tables into a single query (please do not execute).

select num, source_id, ra, dec, parallax, pmra, pmdec, ruwe,
vmag, imag, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
distance(
  point('', col_raj2000, col_dej2000),
  point('', ra, dec)
) * 3600 as dist_arcsec
from user_amora.gouliermis_2006 as gouliermis
join gaiadr2.gaia_source as gaia
  on 1 = contains(
    point('', col_raj2000, col_dej2000),
    circle('', ra, dec, 1. / 3600)
  )
join gaiadr2.ruwe using(source_id)
where abs(vmag - phot_g_mean_mag) < 2

However, it fails after extending for longer than most users' time allocation. The reason is the data base does not know how to efficiently plan the query. This is particularly true when combining geometric conditions with additional restrictions and or multiple table joins.

In these cases, we might use our insight to instruct the Archive on which parts of the query (subqueries) should be executed first. Note this is typically a bad idea. Data bases are usually smarter than humans at planning and rewriting queries. Only try to do it for failing queries, particularly if geometric queries are involved.

The recipe consists of two steps. First use a subquery to isolate whatever block needs to be executed earlier (typically the geometrical condition). Then, add OFFSET 0 at the end of the subquery. This condition has no semantic meaning (do not skip any records). However, it instructs the Archive PostgreSQL data base backend to respect the logical order (subquery is executed first and other conditions applied afterwards).

Let us see an example

select * from (
  select * from (
    select num, source_id, ra, dec, parallax, pmra, pmdec,
    vmag, imag, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag
    from user_amora.gouliermis_2006 as gouliermis
    join gaiadr2.gaia_source as gaia
      on 1 = contains(
        point('', col_raj2000, col_dej2000),
        circle('', ra, dec, 1. / 3600)
      )
    offset 0
  ) as subquery
  join gaiadr2.ruwe using(source_id)
  offset 0
) as subquery2
where abs(vmag - phot_g_mean_mag) < 2
  and ruwe < 1.4

It uses two subqueries with OFFSET 0 to enforce the following logical sequence: cone search => ruwe condition => photometric condition. The resulting query is elegant, and produces all the results in one go. However, it takes more execution time than splitting in two (556 vs a few seconds).

Cross-match against a major survey in the Archive: 2MASS vs Gaia at Pleiades

Sometimes we are interested in the combination of two big catalogues (hundreds of millions of rows). For example, Gaia + 2MASS. Should we be interested in ALL matches, then the Archive is probably not the tool of choice (see next section for pre-computed matches, though). Note Gaia data may be downloaded in bulk for offline analysis.

However, there are many science cases where astronomers are only interested in a small sample of the combined catalogues. This is why copies of several major surveys are hosted in the Gaia Archive. Some of them have been curated by DPAC and are described in the Data model documentation (e.g. gaiadr1.tmass_original_valid, gaiadr2.panstarrs1_original_valid) while others have been directly copied or adapted from external archives (e.g. external.apassdr9, public.hipparcos_newreduction). They can be found in the Archive GUI tree under branch "Other". See the contens as of July 2019.

 

For example, let us assume we want to retrieve good astrometry (RUWE < 1.4) 2MASS-Gaia sources closer than 2 arcsec in a circle of 2 deg radius around the Pleiades centre. The following query shows how to do it.

select * from (
  select source_id, ra, dec, parallax, pmra, pmdec, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag
  from gaiadr2.gaia_source
  where
  1 = contains(
    point('', 56.75, 24.1167),
    circle('', ra, dec, 2)
  )
  offset 0
) as gaia
join gaiadr1.tmass_original_valid as tmass
  on 1 = contains(
    point('', gaia.ra, gaia.dec),
	circle('', tmass.ra, tmass.dec, 2. / 3600)
  )
join gaiadr2.ruwe as ruwe
  using(source_id)

Output: 49873 records

Note it uses a subquery with OFFSET 0 to enforce only a subset of Gaia (2 deg radius cone search around the Pleiades cluster centre) is selected before cross-matching it with 2MASS (another cross-match, with a smaller radius of 2 arcsec) and filtering by RUWE.

2MASS vs Gaia at Pleiades: pre-computed cross-match

Comparing Gaia with other major surveys is one of the most popular actions in the Archive. Therefore, pre-computed cross-match tables exist for a limited number of catalogues. See Marrese+ (2018) for further details.

In these cases, both the cross-match and a copy of the external catalogue are hosted in the Gaia Archive. For example, gaiadr2.tmass_best_neighbour and gaiadr2.tmass_neighbourhood contain the pre-computed cross-matches between Gaia DR2 and 2MASS, for which a curated copy containing a subset of columns is also served in the Archive as table gaiadr1.tmass_original_valid. In some cases, the external catalogue has undergone minimal or no curation, and is closer to the original source, e.g. external.ravedr5_dr5, external.apassdr9. Note the steps in the previous section (ad-hoc cone search, not necessarily slower) can always be done, even if no pre-computed cross-match exists.

The following query shows how to select the best 2MASS neighbour and get RUWE values for the same Gaia sample used in previous section (2 deg around the Pleiades centre)

select source_id, gaia.ra, gaia.dec, parallax, pmra, pmdec, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
j_m, h_m, ks_m, ruwe
from gaiadr2.gaia_source as gaia
join gaiadr2.tmass_best_neighbour using (source_id)
join gaiadr1.tmass_original_valid using (tmass_oid)
join gaiadr2.ruwe as ruwe using(source_id)
where
  1 = contains(
    point('', 56.75, 24.1167),
    circle('', gaia.ra, gaia.dec, 2)
  )

Output: 48843 records

​​​​​​​ Epoch propagation has been applied to Gaia data to ensure a proper comparison is done between both catalogues. In most cases, neighbours are ranked based on geometrical criteria alone (distances and position errors).

​​Note the output is somewhat different to that obtained in the previous section. On the one hand this is expected: only best neighbours have been selected. On the other hand, keep in mind pre-computed cross-matches give priority to purity over completeness.

That is, any cross-match process will exclude genuine matches (completeness loss, false negatives) and include false pairs (purity loss, false positives). There is always a trade-off between purity and completeness. Pre-computed cross-matches are helpful in many situations (e.g. constructing a statistically significant sample with little contamination). Should them not be the tool required for your science case (e.g. identifying all possible matches, possibly reducing purity, for a one-by-one subsequent analysis), please follow the steps outlined in the previous section (ad-hoc cone search).

Proper-motion corrected cross-match. Introduction

Stars move in the sky. This means the full astrometry (mostly proper motions) needs to be considered when cross-matching catalogues with fast moving objects separated by a significant time difference. In addition, reference systems also evolve, as our knowledge of the sky improves. For example, the Gaia DR1 and DR2 Celestial Reference Frames are different, see Mignard+ (2018) .

In any case, the science case drives the level of detail. In many situations, astronomers are only interested in lists of potential matches, and arcsec level position tolerance is acceptable. This is the focus for the remainder of this tutorial.

A more careful work is needed to e.g. determine long baseline proper motions or accelerations. This is outside the scope of this tutorial. See e.g. Brandt (2018) for a recent comparison between Hipparcos and Gaia DR2.

The first thing to consider is most stars move very slowly. Gaia DR2 contains slightly less than 1 million stars moving faster than 50 mas/yr, as shown by the following query.

select source_id, ra, dec, parallax, pmra, pmdec,
sqrt(pmra * pmra + pmdec * pmdec) as pm, radial_velocity, ref_epoch
from gaiadr2.gaia_source
where (pmra  < -50 / sqrt(2) or pmra  > 50 / sqrt(2))
  and (pmdec < -50 / sqrt(2) or pmdec > 50 / sqrt(2))
  and sqrt(pmra * pmra + pmdec * pmdec) > 50

Output: 976538 records

This means the bulk of the catalogue (1.3 billion) moves less than 5 arcsec per century. Should the objects of interest belong to this group, cross-match candidates could be selected just enlarging the cone search radius appropriately.

Proper-motion enlarged cross-match. USNO B1 vs Gaia, M4 globular cluster

Imagine we want to combine USNO B1 photometry with Gaia DR2 in a circle of 0.5 deg centred around the globular cluster M4, which has a small proper motion. This external catalogue can be explored as an external TAP. Using keywords "usno b1", reveals two copies at CDS and IRSA. If the former is selected, the following query retrieves the data

select "USNO-B1.0", "Tycho-2", raj2000, dej2000, epoch, pmra, pmde, mupr,
  b1mag, r1mag, b2mag, r2mag
from "I/284/out"
where 1 = contains(
  point('ICRS', raj2000, dej2000),
  circle('ICRS', 245.89675000000003, -26.52575, 0.5)
)

Output: 43060 records

Statistics on the observation epoch can be retrieved analysing the output with JOB_UPLOAD (job ID for this exercise: 1564403845697O). Remember to switch back to the Gaia Archive to reuse jobs.

select min(epoch), max(epoch), avg(epoch)
from job_upload.job1564403845697O

The first epoch in the M4 sample is thus 1954.2. Imagine we want a cone search including all neighbours that have been closer than 2 arcsec, irrespective of the observation epoch. The cross-match radius between Gaia and M4 can be estimated as follows: 2 arcsec (tolerance) + 50 mas/yr (maximum proper motion for slow stars) * (2015.5 – 1954.2) yr = 5.0 arcsec. A first list of potential Gaia DR2 - USNO B1 matches can be obtained using such a cross-match radius. JOB_UPLOAD will be used to upload the previous USNO results (job ID 1564407875966O) and compare them to gaia_source.

select usno.*,
  gaia.source_id, gaia.ra, gaia.dec, gaia.parallax,
  gaia.pmra as pmra_gaia, gaia.pmdec as pmdec_gaia,
  radial_velocity, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag
from job_upload.job1564407875966O as usno
join gaiadr2.gaia_source as gaia
  on 1 = contains(
    point('', usno.raj2000, usno.dej2000),
    circle('', gaia.ra, gaia.dec, 5. / 3600.)
  )

Output: 80410 records

Such query (job ID job1564413301477O) contains all potential matches between both catalogues, and might be enough for many users intending to select the good matches a posteriori on their computers.

Proper-motion enlarged cross-match. USNO B1 vs Gaia, high proper motion stars

Imagine we are interested on the fastest moving objects in Gaia. This is the complementary exercise to the previous section. For simplicity, only objects faster than 2 arcsec/yr will be considered. The sample can be selected using the following query.

select source_id, ra, dec, parallax, pmra, pmdec,
sqrt(pmra * pmra + pmdec * pmdec) as pm, radial_velocity, ref_epoch
from gaiadr2.gaia_source
where (pmra  < -2000 / sqrt(2) or pmra  > 2000 / sqrt(2))
  and (pmdec < -2000 / sqrt(2) or pmdec > 2000 / sqrt(2))
  and sqrt(pmra * pmra + pmdec * pmdec) > 2000
order by pm desc

Output: 28 records

The highest proper motion corresponds to Barnard's star: 10.4 arcsec/yr. If the earliest epoch in USNO B1 of 1949 is considered, see Monet+ (2003), the search radius can be estimated as 2 arcsec + 10.4 arcsec/yr * (2015.5 - 1949) = 693.6 arcsec = 11.5 arcmin. The following query in the external Archive TAP-Vizier (remember to switch) can be requested from the Gaia Archive using TAP_UPLOAD (job ID 1564409893418O) to provide the high proper motion Gaia stars and retrieve potential matches

select "USNO-B1.0", "Tycho-2", raj2000, dej2000, epoch, usno.pmra, usno.pmde, usno.mupr,
  b1mag, r1mag, b2mag, r2mag, gaia.*
from "I/284/out" as usno
join TAP_UPLOAD.JOB1564409893418O as gaia
  on 1 = contains(
    point('ICRS', raj2000, dej2000),
    circle('ICRS', ra, dec, 11.5 / 60.)
  )

Output: 194097 records

Note the big search radius produces a large list of potential matches (almost 7000 candidates per input star). Reducing the proper motion threshold to e.g. 1 arcsec/yr quickly saturates the TAP_UPLOAD synchronous query limits in TAP-Vizier, due to the large area to explore.

The lesson learnt is brute force can only be applied to small samples, specially if working with different archives. See next section for tips on how to use proper motions to reduce the search radius.

Proper-motion corrected cross-match. USNO B1 vs Gaia, M4 globular cluster

Source properties, most notably proper motions, can be used to propagate source positions to the other catalogue epoch(s). The search radius can then be reduced, as well as the number of possible matches. Two of the most popular ways are the approximate linear proper motion correction and the rigorous Hipparcos calculation. See more details in "The Hipparcos and Tycho Catalogues" ESA 1997 SP-1200 , sections 1.2.8, 1.5.4, 1.5.5. The approximated propagation is as follows.

RA_propagated_approx = RA_catalogue + pm_RA * (epoch_propagated – epoch_catalogue) / cos(DEC)
DEC_propagated_approx = DEC_catalogue + pm_DEC * (epoch_propagated – epoch_catalogue)

While the rigorous Hipparcos calculation takes into account the spherical geometry and some relativistic corrections. They have been introduced into the Archive as the user defined functions EPOCH_PROP and EPOCH_PROP_POS. In addition to proper motion (the main contributor), parallax and radial velocity also play a role for the closest (and thus fastest moving) objects. The corresponding SQL code would be:

RA_propagated = SELECT COORD1(
  EPOCH_PROP_POS(ra, dec, parallax, pmra, pmdec, radial_velocity, epoch_catalogue, epoch_propagated))
DEC_propagated = SELECT COORD2(
  EPOCH_PROP_POS(ra, dec, parallax, pmra, pmdec, radial_velocity, epoch_catalogue, epoch_propagated))

The following plot shows the angular distance as a function of the proper motion between the Hipparcos catalogue propagated rigorously to epoch J1900 (91.25 years baseline) and using the approximate expressions (blue points) or committing an error of 100 km/s in the radial velocity (red points).

Note the difference is most times negligible. Only for the highest proper motions (>500 mas/yr) it is above the arcsec level, See zoom below.

It is important to notice the errors grow with the square of the time difference. This means for catalogues not farther away than ~20 yr, the effect becomes very small. See below the equivalent plot for the J1991.25 => J1970.0 propagation, which shows the errors are smaller than 300 mas even for Barnard's star.

​​​​​​​

Going back to our original example, we may safely use the approximate propagation routines without radial velocities to refine the M4 neighbour search (job ID job1564413301477O). The first step is to approximately propagate the coordinates back to the original epochs (USNO B1 positions are propagated to epoch J2000 when proper motions are available).

raj2000 + 1. / 3600e3 * pmra * (epoch - 2000.0) / cos(radians(dej2000)) as ra_orig,
dej2000 + 1. / 3600e3 * pmde * (epoch - 2000.0) as dec_orig,

The second step is to approximately propagate the Gaia coordinates to the catalogue epoch, which is different for each potential match.

ra  + 1. / 3600e3 * pmra_gaia  * (epoch - 2015.5) / cos(radians(dec)) as ra_gaia,
dec + 1. / 3600e3 * pmdec_gaia * (epoch - 2015.5) as dec_gaia

The third step is to apply a more stringent criterion (e.g. 2 arcsec radius).

where
  distance(
    point('', ra_orig, dec_orig),
    point('', ra_gaia, dec_gaia)
  ) * 3600 <= 2
  or (
    pmra_gaia is null and
    distance (
        point('', ra_orig, dec_orig),
        point('', ra, dec)
	  ) * 3600 <= 2
  )

Note the original Gaia positions are used for stars without proper motion, which produce null ra_gaia, dec_gaia. The following complete query shows how to do all steps at once.

select *,
distance(
  point('', ra_orig, dec_orig),
  point('', ra_gaia, dec_gaia)
) * 3600 as dist_arcsec
from (
  select *,
  raj2000 + 1. / 3600e3 * pmra * (epoch - 2000.0) / cos(radians(dej2000)) as ra_orig,
  dej2000 + 1. / 3600e3 * pmde * (epoch - 2000.0) as dec_orig,
  ra  + 1. / 3600e3 * pmra_gaia  * (epoch - 2015.5) / cos(radians(dec)) as ra_gaia,
  dec + 1. / 3600e3 * pmdec_gaia * (epoch - 2015.5) as dec_gaia
  from job_upload.job1564413301477O
) as subquery
where
  distance(
    point('', ra_orig, dec_orig),
    point('', ra_gaia, dec_gaia)
  ) * 3600 <= 2
  or (
    pmra_gaia is null and
    distance (
        point('', ra_orig, dec_orig),
        point('', ra, dec)
	  ) * 3600 <= 2
  )

Output: 41481 records

which is closer to the 43060 original number of USNO B1 M4 candidates. The Gaia proper motions plot below nicely reveals the cluster against the field stars.

A histogram of the propagated distances shows a peak below 1 arcsec and a plateau afterwards. This suggests additional cleaning is still needed, probably based on additional criteria rather than pure geometrical distances.

Alternatively, the high precision epoch propagation functions could be used as follows

select *,
distance(
  point('', ra_orig, dec_orig),
  point('', ra_gaia, dec_gaia)
) * 3600 as dist_arcsec
from (
  select *,
  COORD1(EPOCH_PROP_POS(raj2000, dej2000, 0, pmra, pmde, 0, 2000, epoch)) as ra_orig,
  COORD2(EPOCH_PROP_POS(raj2000, dej2000, 0, pmra, pmde, 0, 2000, epoch)) as dec_orig,
  COORD1(EPOCH_PROP_POS(ra, dec, parallax, pmra_gaia, pmdec_gaia, radial_velocity, 2015.5, epoch)) as ra_gaia,
  COORD2(EPOCH_PROP_POS(ra, dec, parallax, pmra_gaia, pmdec_gaia, radial_velocity, 2015.5, epoch)) as dec_gaia
  from job_upload.job1564413301477O
) as subquery
where distance(
  point('', ra_orig, dec_orig),
  point('', ra_gaia, dec_gaia)
) * 3600 <= 2

Output: 41481 records

which provides the same candidates matches. The distances are more precisely computed. However, the differences are minimal for M4 (below 50 microarcsec), which is not very close to the Sun.

Interestingly enough, the proper motions in Gaia DR2 are clustered around the average value for M4, as opposed to USNO B1, which has a much larger dispersion. This is a good example of why Gaia positions are better propagated and compared to the values at the observation epoch rather than those propagated to a common epoch (e.g. J2000).

Revisiting the simple query cross-match

Several of the techniques presented above have been incorporated into the simple query cross-match interface, accessible through tabs "Search" => "Basic" => "File". For example Let us assume we use the same input file described in the "Simple Query tutorial (DR1)"

 

If we click on "Submit query" with the default settings, the following output is produced (divided into three chunks for clarity).

 

 

​​​​​​​

The first two blocks are the default gaia_source column selection, while the third one contains the input source name plus some astrometric parameters taken from the name resolver (Simbad, NED or Vizier), together with the distance to the Gaia source. The corresponding query can be retrieved by clicking on "Show query in ADQL form".

SELECT TOP 500
  gaia_source.source_id, gaia_source.ra, gaia_source.ra_error, gaia_source.dec,
  gaia_source.dec_error, gaia_source.parallax, gaia_source.parallax_error,
  gaia_source.phot_g_mean_mag, gaia_source.bp_rp, gaia_source.radial_velocity,
  gaia_source.radial_velocity_error, gaia_source.phot_variable_flag,
  gaia_source.teff_val, gaia_source.a_g_val,
  temp.id AS target_id, temp.ra AS target_ra, temp.dec AS target_dec,
  temp.parallax AS target_parallax, temp.pm_ra AS target_pm_ra, temp.pm_dec AS target_pm_dec,
  temp.radial_velocity AS target_radial_velocity,
  DISTANCE(
    POINT('ICRS', gaiadr2.gaia_source.ra, gaiadr2.gaia_source.dec),
    POINT('ICRS',
      COORD1(EPOCH_PROP_POS(temp.ra, temp.dec, temp.parallax, temp.pm_ra, temp.pm_dec, temp.radial_velocity, 2000, 2015.5)),
      COORD2(EPOCH_PROP_POS(temp.ra, temp.dec, temp.parallax, temp.pm_ra, temp.pm_dec, temp.radial_velocity, 2000, 2015.5)))
  ) AS target_distance
FROM JOB_UPLOAD.JOB1564417933358O AS temp, gaiadr2.gaia_source
WHERE (
  CONTAINS(
    POINT('ICRS',
      COORD1(EPOCH_PROP_POS(temp.ra, temp.dec, temp.parallax, temp.pm_ra, temp.pm_dec, temp.radial_velocity, 2000, 2015.5)),
      COORD2(EPOCH_PROP_POS(temp.ra, temp.dec, temp.parallax, temp.pm_ra, temp.pm_dec, temp.radial_velocity, 2000, 2015.5))),
    CIRCLE('ICRS', gaiadr2.gaia_source.ra, gaiadr2.gaia_source.dec,0.001388888888888889)
  ) = 1
)

Note it uses the rigorous epoch propagation EPOCH_PROP_POS function to produce the best possible distances between the Gaia sources and the positions provided by the name resolver.

External Archives (TAPs) access

Author: Alcione Mora

 

Gaia data are better exploited when combined with other catalogues. This tutorial shows ways to do it for data hosted in external TAP services. the same technology used by the Gaia Archive.

This is an intermediate level tutorial that assumes a basic knowledge of the general interface and workflow. The introductory tutorials White dwarfs exploration and Cluster analysis (GUI) are recommended in case of difficulties following this exercise.

Introduction

The Gaia Archive is based on the Virtual Observatory Table Access Protocol (TAP) . It allows users to retrieve data from astronomical data bases using ADQL queries. There are many TAP services around the globe. Two notable cases are the ARI mirror of the Gaia Archive (Heidelberg, Germany) and TAP Vizier (CDS, Strasbourg, France). The latter is particularly comprehensive, containing a mixture of large catalogues (e.g. Gaia, 2MASS, Pan-STARRs) and small to medium tables included in many refereed publications.

This tutorial explains how to discover, explore and combine data from external TAPs within the Gaia Archive. Some key ideas have been obtained from Topcat, which provides an excellent TAP client.

Discovering resources

Imagine we want to combine HST/ACS photometry from the NGC 346 open cluster analysed by Gouliermis+ (2006). The first step is to look for such resource. First, we need to go navigate through Search => Advanced (ADQL) tab in the Gaia Archive GUI. Then, we need to click on the scrollable menu showing "gaia" on top of the tables tree:

And then in "External TAP search":

which displays a pop-up menu where keywords can be introduced. If we type "gouliermis ngc 346" and click on "Submit Query", the following three tables appear.

If we select one or more items and click on "Add selected"

the Archive is instructed to send future requests to the external TAP of our choice. This means only tables from that TAP are accessible at this time. That is, resources from different TAPs cannot be combined in the same query.

Querying external archives

When selecting an external TAP, the interface changes as follows

The Gaia Archive table tree is now replaced by the selected tables in the active external TAP. Metadata (table and column description, data type, …) can be accessed as usual clicking on the tree items.

The ADQL background colour changes to blue and the external TAP address is shown on top.

The following query retrieves the full Gouliermis+ (2006) catalogue.

select * from "J/ApJS/166/549/table2"

 

Any job executed will also use blue colour and a downward pointing arrow to indicate they were executed in and downloaded from an external system.

Note Vizier table names include special characters and must be enclosed within quotes. In our case, job ID 1561587589604O was successfully created with the results. Note they are now hosted in the Gaia Archive, in the user or anonymous area depending on whether we were authenticated or not.

Any external TAP table can be used in queries irrespectively of whether they are shown in the tree. That is, the following query could also have been executed without the need to select the corresponding table, provided we knew its name.

select * from "J/ApJ/672/914/table2"

Combination with Gaia data: JOB_UPLOAD

The typical workflow for using external catalogues requires the combination (join) of the retrieved data with those hosted in the Gaia Archive. One possibility is uploading the table to the user space, as explained in tutorials White dwarfs exploration and Cluster analysis (GUI).

For small catalogues, or those to be used only once it is probably better to use JOB_UPLOAD, which temporarily uploads the result of an existing job for its use within a query. The first step is to go back to the Gaia Archive, selecting it from the drop-down menu on top of the table tree.

We can now execute the following query

select * from gaiadr1.gaia_source as gaia1
join job_upload.job1561587589604O as gouliermis
on 1 = contains(
	point('', gouliermis.raj2000, gouliermis.dej2000),
	circle('', gaia1.ra, gaia1.dec, 2. / 3600)
)
and abs(phot_g_mean_mag - vmag) < 2.

which does a cross-match between the Gouliermis+ (2006) data and Gaia DR2 using a cone search radius of 2" and a maximum magnitude difference |V-G| < 2. Note the big catalogue (Gaia) is used in the "circle" function, while the small one (Gouliermis) is used within "point". This is the usual convention in the Gaia Archive. Inverting it might produce queries that take a long time or just time out.

The output job has ID 1561588324694O and is shown in black font because has been executed locally in the Gaia Archive.

The pre-existing job with the TAP Vizier results and ID 1561587589604O is referenced (and uploaded) using the following syntax

JOB_UPLOAD.JOB1561587589604O as gouliermis

Topcat users will find it similar to the TAP_UPLOAD.tXX mechanism, where tXX means table number XX in the interface.

Combination with external data: TAP_UPLOAD

Sometimes the combination goes in the opposite direction. Imagine we want to complement the Gouliermis+ (2006) ACS + Gaia data with 2MASS photometry in the ARI Archive (there is a copy of 2MASS in the Gaia Archive, but it will not be used in this example). We can locate it searching for external TAPs using keywords: "2mass point source catalogue short"

The following query will upload our intermediate Gouliermis + Gaia data to ARI and do a 2" geometrical cross-match with 2MASS

select * from tap_upload.job1561588324694O  as upload
join twomass as tmass on 1 = contains(
	point('icrs', tmass.raj2000, tmass.dej2000),
	circle('icrs', upload.ra, upload.dec, 2. / 3600.)
)

Note the big catalogue (2MASS) is now used in the "point" function, while the small one (Gouliermis + Gaia) is used within "circle". This is the convention in the ARI and Vizier TAPs, and opposite to that in the Gaia Archive. Inverting it might produce queries that take a long time or just time out. This means some experimentation is needed when using geometrical queries in an external TAP to produce optimum results.

The output job has ID 1561589380377O and is shown in blue font because has been executed externally in the ARI Gaia Archive.

 

The pre-existing job with the Gouliermis + Gaia intermediate results is referenced (and uploaded) using syntax

TAP_UPLOAD.JOB1561588324694O as upload

Note we use JOB_UPLOAD to upload a local job into the Gaia Archive and TAP_UPLOAD for uploads to an external TAP Archive.

Finally, we may use the combined ACS/Gaia/2MASS data as needed. The following quick Topcat plot shows a G vs V-Ks colour magnitude diagram.

 

Combination with external cross-matched data: source_id

Sometimes, the external catalogue has already been cross-matched with Gaia, and the quoted source_id can be used to do an exact table join, instead of a geometrical one.

For example, imagine we are interested in the TAP Vizier catalogue of Gaia DR2 planetary nebulae from Kimeswenger & Barría (2018). It can be explored using the external TAP search keywords: "gaia dr2 pn"

​​​​​

It can be retrieved using query

select * from "J/A+A/616/L2/tablea"

which produces a job with ID 1561657027843O. It contains valid Gaia DR2 source_id in column "source". They can be used to do an exact cross-match (table join) as follows. First, we need to go back to the Gaia Archive, as explained above. Then, the following query is executed.

select pn.*,
parallax, pmra, pmdec, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag
from job_upload.job1561657027843O as pn
join gaiadr2.gaia_source as gaia
  on pn.source = gaia.source_id

which does the final data combination.