How to extract Gaia data - Gaia Users
The Graphical User Interface (GUI) of the ESA Gaia Archive offers the possibility to carry out basic and advanced queries using ADQL (Astronomical Data Query Language). The outputs of these queries can be easily downloaded in a variety of formats, including VOTable, FITS and CSV. Below you can find a brief description of the main features offered by the Archive, as well as two video tutorials explaining how to use it.
Graphic User Interface main page.
|Basic query form.
This form allows to easily search for data in all the catalogues hosted by the Archive. Restrictions can be added to the query using the 'Extra conditions' wizard. The output fields can be selected by means of the 'Display columns' option panel.
Advanced (ADQL) query form.
This form allows to execute ADQL queries. Each query generates a job at server side. The jobs executed by the user can be inspected in the list provided in this page. All the public tables and the user-uploaded tables are visible in the left side of the web.
The output of the queries are displayed in this window. The ADQL query that generated these results can be inspected by clicking on the 'Show query in ADQL form' link.
Video tutorial: How to use the Archive
Author: Deborah Baines
Video tutorial: How to use the simple form of the Archive
Author: Alcione Mora
Authors: Héctor Cánovas & Alcione Mora
The main function of the Gaia Archive is to provide data to the astronomers. The
Search tab in the GUI landing page provides two different ways of accesing the Archive for
Basic (default option) and
Advanced (ADQL) queries. The main objectives for the
Basic tab are:
- To ease the exploration of the Archive catalogues for simple use cases, and
- To help users into the transition towards the
Advanced (ADQL)tab for complex use cases.
To that end, the
Basic tab allows to perform two of the most common operations executed when exploring an astronomy archive in a very simple and intuitive manner. These operations are the ADQL cone search, which allows to search by coordinates for one or more sources in a given catalogue, and an ADQL query to retrieve all the sources encompassed by a circular or box-like region in the projected sky.
Basic queries are synchronous, which means that they will time out after 60 and 90 seconds for non-registered and registered users, respectively (see this FAQ). Furthermore, the output of these queries is limited to 2000 sources. Therefore, we recommend to use the
Advanced (ADQL) tab to execute complex and/or long queries. Through this tutorial you will learn to use the
Basic tab to:
- Retrieve data for a single source.
- Retrieve data for multiple sources.
- Search for data inside a sky region.
- Understanding the query results.
The most basic use case could be formulated as "I want the most relevant Gaia results for a single object". It can be accomplished using the
Basic > Position subtab. The first step is to fill in the "Name" box with either the object identifier (e.g. "UX Ori") or its ICRS coordinates. The accepted input formats are described in the pop-up window that appears when clicking on top of the "Name" tooltip (see Fig. 1). The single object search launches an ADQL cone search around the celestial coordinates of the input object, which are provided by the Sesame name resolver. The drop-down menu highlighted by the thin solid circle in Fig. 1 allows to choose the service that will be queried to obtain the object coordinates (by default the system tries Simbad, then NED, and then Vizier, but if the input name is a Gaia designation the system queries the Gaia Archive). Once the name or coordinates are successfully resolved the object box turns green and a confirmation message is shown. Note that epoch propagation (see this tutorial) is applied if the proper motions of the resolved source are included in the databases searched for by the name resolver. The cone search radius can be tuned using the "Radius" box, and its units can be adjusted using its associated drop-down menu (highlighted by the dashed circle in Fig. 1).
Figure 1: Content of the
Basic > Position subtab (single source resolver). The vertical arrows highlight the tooltips with explanatory text, while the circles and the horizontal arrow highlight the drop-down menus and extra options available to customize the query, respectively.
The next step is to choose the catalogue that is going to be explored. The latest Gaia data release is the default one, but all the catalogues hosted by the Archive (e.g., previous Gaia data releases, external catalogues) containing geometric information in the form of celestial coordinates can be explored by clicking on the drop-down menu highlighted by the thick circle in Fig. 1. Registered users can also access to their user-uploaded tables provided that their tables contain indexed celestial coordinates (see this tutorial). By default, only a few pre-selected columns of the choosen catalogue are shown in the query outputs. Therefore, you may want to verify that the output of your search will contain the columns that you are interested in. To do so, simply click on the "Display columns" menu (indicated by the dashed horizontal arrow in Fig. 1) and mark the columns that you want to retrieve.
Now you are ready to hit the "Submit Query" button. If you are interested in learning how your query is expressed in the ADQL language, you can hit the "Show Query" button. The query results are shown in the
Query Results tab, whose contents are explained in the Understanding the query results section below.
Three common use cases are:
- I want to look for Gaia counterparts on my list of known objects,
- I want to look for the neighbours of my favourite source, and
- I want to look for the neighbours of my list of favourite sources.
This popular use case can be accomplished using the
Basic > File subtab of the GUI. First of all, you must prepare a single column ascii file (without header) with the names or ICRS oordinates of the objects you are interested in. The input file must be formatted as described in the pop-up window that appears when clicking on top of the "Select a file with Target names" tooltip (see Fig. 2). Second, select the file that you want to upload using the wizard that appears when clicking on top of the "Choose file" button. Be aware that the query output is limitted to 2000 sources. If you aim to retrieve a larger dataset you should upload your target list to your user space as explained in this tutorial, and then use the
Advanced (ADQL) tab to perform a cross-match as explained in this other tutorial.
Figure 2: Content of the
Basic > File subtab (multi source resolver). The arrow highlight the tooltip with explanatory text.
Imagine that we want to retrieve all the sources from the Gaia EDR3 catalogue that, on-sky, are separated by less than 5 arc minutes from UX Ori. This can easily be achieved by simply updating the cone search radius units (using the drop-down menu highlighted by the red dashed circle in Fig. 1). This query outputs 195 sources, including several sources without parallax data. The
Basic > Position subtab allows to apply additional selection criteria to e.g., filter out the targets with parallax signal-to-noise below a given threshold. To do so simply click on the "Extra conditions" drop-down menu as highlighted by the horizontal solid arrow in Fig. 1. A menu that allows to apply and combine different filters will show up as illustrated by Fig. 3. Applying the condition exemplified in Fig. 3 reduces the output of the previous query to 30 sources.
Figure 3: Content of the "Extra conditions" menu of the
Basic > Position subtab. The arrow indicate how to select the operators that will be included in the query. The solid and dashed circles highlight the drop-down menus allowing to select the column over which the operator is going to be applied and the filter combination, respectively.
This use case is a combination of the previous two cases. If you are interested in retrieving the neighbours of a pre-computed source list you should 1) upload the target list as explained in Sect 2.1 and 2) then adjust the cone search radius as described in Sect 2.2. As before, you should keep in mind that the output catalogue of the query is limitted to 2000 sources so you may want to apply a selection criteria using the "Extra conditions" menu as described in Sect 2.2.
A cone search with an arbitrary radius can be generated around any point on the celestial sphere by entering the target coordinates in the "Name" box and adjusting the cone search radius as explained in the Single source data retrieval section. Alternatively, it is also possible to use the "Equatorial" button under the
Basic > Position subtab (see Fig. 1) and choosing the "Circle" option (see Fig. 4). The
Basic > Position subtab also allows to retrieve the sources encompassed by a box region in the projected sky. To do so, simply select the "Box" option, as illustrated by Fig. 4. The accepted formats of the input coordinates are described in the RA and Dec tooltips of the coordinate boxes. As with the previous options, it is also possible to add different selection criteria to filter out the query output by means of the "Extra conditions" menu.
Figure 4: Content of the "Box" menu of the
Basic > Position (Equatorial) subtab. The arrows indicate the tooltips with explanatory text.
The query results are presented in tabular format in the
Query Results subtab that is automatically opened once the query is finished, as shown by Fig. 5. The columns provide units when available. Further details on the meaning of each field can be found in the "<Target Catalogue> Data Model" accessible in the bottom part. Figure 5 shows the results of searching for the Gaia EDR3 counterparts of a list of sources (Sect. 2.1) and therefore the link points to the authoritative reference on the Gaia EDR3 contents. For this example only five columns ("source_id", "ra", "dec", "parallax", and "phot_g_mean_mag") were selected for displaying (using the "Displayed columns" menu as explained in Sect. 1). When the query consist in finding counterparts of an input source list in a given catalogue (as in this example), the output table contains addicional columns that are automatically added by the Archive to aid the user in finding the correspondence between the input targets and the query results. These columns are:
- "target_id": contains the input target name as provided by the user in the input target list.
- "target_id, target_ra, target_dec, target_parallax, target_pm_ra, target_pm_dec, target_radial_velocity": contain data provided by the Name resolver (see Sect. 1).
- "target_distance": contains the on-sky angular separation (in units of degrees) between the target coordinates provided by the Name resolver and the target coordinates of the selected catalogue (Gaia EDR3 in this example).
Query Results subtab (see text for detailed explanation). The arrows indicate the tooltips with explanatory text.
Clicking over the "Show query in ADQL form" button will open the
Advanced (ADQL) tab to show the ADQL query that has been launched. This utility can be helpful for non-expert users aiming to learn the ADQL query language. Finally,the query output can be downloaded using the "Download results" menu (indicated by a vertical arrow in Fig. 5). The format of the output file can be set by means of the drop-down menu highlighted by the red circle in Fig. 5.
Authors: Héctor Cánovas & Alcione Mora
Advanced (ADQL) form of the GUI allows to execute complex ADQL queries, inspect the Archive catalogues, upload tables to the user space, and take advantage of a number of utilities developed by our team to facilitate the exploration and scientific exploitation of the Gaia Archive. All
Advanced (ADQL) queries are asynchronous, which means that they will time out after 90 and 120 minutes for non-registered and registered users, respectively (see the "Why does my query time out after 90 minutes? Why is my query limited to 3 million rows?" FAQ). The output of these queries is limited to 3,000,000 sources for non-authenticated users, whereas it is unlimited for authenticated ones. The
Advanced (ADQL) tab is divided in three main areas that are highlighted in Fig. 1.:
Figure 1: Content of the
Advanced (ADQL) tab. The large red-wrinkled and blue-solid rectangles encompass the ADQL query editor and Jobs list areas, respectively. The large black rectangle contains the tables tree.
This box allows to introduce ADQL queries to explore the catalogues hosted by the Archive. Many of the most common types of queries that are executed in the Archive (like cone searches, cross-matches, or computing the proper motion propagation of a given sample of objects) can be loaded by clicking on top of the "Query Examples" text (see Fig. 1). All these examples, along with a basic explanation and a reference (when appropriate) can also be found under the "Writting Queries/Query Examples" section of the left-side menu of this page. For more information and references about the ADQL language please take a look at the "Writting queries/ADQL syntax" section of that menu. Pressing the "Submit Query" button launches the query, which in turn generates a "job" in the Archive. A job has several attributes, including the results that contain the output of the launched ADQL query. The job ID can be specified before launching the query using the "Job name" box. This utility helps reminding the goal of a given query when visiting the Archive in the future. The job results are presented in a tabular form and - once a job succesfully finishes - they can be examined by clicking on top of the table-like icon located in the Jobs list (highlighted by the red circle in Fig. 1). This action will show up a table in the
Query Results tab (see the "Understanding the Query Results" section of this tutorial to learn more details about the content of this tab).
Each row of the Jobs list contains relevant information about the jobs previously launched, like:
- The status (a job can be succesfully finished, failed, or it can be under execution),
- The job ID (either an alpha-numeric code assigned by the Archive or a user-defined name),
- The creation date,
- The number of rows of the job result, and
- The size of the job result.
The jobs launched by registered users are stored in their accounts, which have a job quota limit of 20 Gbs. The number of jobs stored in an account is shown at the bottom left of the page (in the example shown in Fig. 1 there are 183 jobs stored). This number appears surrounded by two pair of arrows that allow to navigate through the job list. The expandable menu "Apply jobs filter" allows to select jobs executed within a given range of time, as well as to filter out jobs by their status (e.g., completed, aborted, executing, etc.) Jobs can be deleted by means of the "Delete selected jobs" option. Additionaly, the Jobs list contains 7 buttons (encompassed by the small blue rectangle in Fig. 1) that, from left to right, allow to:
- Obtain detailed job information,
- Export the job result to a table and upload it to the user account (see this tutorial for details),
- Download the job result as a table (the file format of the downloaded file can be set using the "Download format" menu),
- Upload the job result to the user VOspace,
- Examine the job result in the
- Show the ADQL query that generated the job in the ADQL parser, and
- Explore the datalink products (if any) associated to the job results.
The tables tree allows to directly access to all the catalogues hosted by the Archive. These catalogues are organised under different branches that can be expanded by clicking on the "+" sign next to each schema name (see Fig. 2). A first look at this box shows that the tables have three different types of associated icons. Tables having indexed celestial coordinates are marked with an spherical-like icon, and they can be explored within the
Basic query tab (see Fig. 1 in this tutorial). The tables lacking indexed coordinates column are marked with a table-like icon, while the cross-matches between two tables are marked with a double-star like icon. Clicking on top of a table name will open a pop-up window with basic information about the table. From that window it is also possible to directly show the first 20 rows of the table, which will open in the
Query Results tab. Clicking on the "+" sign next to each table name will expand the table content to show its column names. Indexed columns appear with bold fonts, and the (indexed) primary key of the table appears in bold red fonts.
Figure 2. Left: Selected content of the Catalogue box. The yellow pop-up window appears after clicking on top of the table name. Right: Content of the "gaiaedr3.gaia_source" catalogue, that is revealed by clicking on top of the "+" sign next to the table name.
From left to right, the five buttons placed on top of the catalogue schemas (encompassed by the small black rectangle in Fig. 1) allow to:
- Upload a local table to the user space (see more details about this utility in this tutorial),
- Remove a table from the user space,
- Run a positional cross-match between two tables,
- Edit the table properties, and
- Share a table with the members of a given group (see this tutorial for more information about the sharing capabilities of the Archive).
Furthermore, it is possible to easily combine the catalogues and tables hosted by the Gaia Archive with other catalogues hosted by external TAP services. To access to this advanced utility click on top of the arrow encompassed by the black circle as indicated in Fig. 1. This dedicated tutorial explains how to make use of this this powerful functionality.