HOW TO USE TAP TO ACCESS THE CSA

CSA's new TAP system gives access to the data held in the archive, and to the metadata in the database that helps you find the data you need.

 

Update from CAIO: the data requests have changed very little (allowing continuity with your existing scripts), but the metadata requests are now far more powerful. If you already have HTTP requests or code that retrieves data with the CAIO, please refer to this page for converting CAIO to TAP.

 

This page is essentially a 'Where To Start Guide', with step-by-step sections for forming data and metadata requests (including inventory requests), but also touching briefly on all aspects covered in more detail on the other pages. Use cases are also provided for the different types of request, here and on each specialised page.

 

DIFFERENT TYPES OF REQUESTS

There are several different ways to access data using TAP:

REQUEST TYPE: DIRECT DATA REQUEST (UP TO 1 GB)

DESCRIPTION: Use to retrieve data up to a total compressed size of 1 GB. This data will be delivered to you as soon as the package is ready. Direct data requests do not require authentication. Requests for headers also use this syntax.

SYNTAX: https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product

REQUEST TYPE: METADATA REQUEST (INFORMATION ABOUT DATA PRODUCTS)

DESCRIPTION: Use to retrieve metadata about the data products, to find the data you need. This could be lists of datasets for a given instrument or measurement type, inventory information, or a list of files that fulfil certain criteria.

SYNTAX: https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=

REQUEST TYPE: STREAMING DATA REQUEST (ONE FILE)

DESCRIPTION: Use to retrieve one file of data via the streaming functionality - this is used for graphics applications, for instance. Streaming requests do not require authentication.

SYNTAX: https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=streamed

REQUEST TYPE: ASYNCHRONOUS DATA REQUEST (UP TO 50 GB)

DESCRIPTION: Retrieve data files up to a total compressed size of 50 GB. Once requested, the data will be prepared on our server and you will be provided with a link by email when it is ready for download. This type of request requires that you register with the CSA and authenticate.

SYNTAX: https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED

REQUEST TYPE: LOGIN REQUEST (FOR ASYNC)

DESCRIPTION: Log in for asynchronous data requests - registration with the CSA is required first.

SYNTAX: https://csa.esac.esa.int/csa-sl-tap/login

REQUEST TYPE: HEADER REQUEST

DESCRIPTION: Request header files for datasets.

SYNTAX: https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=header

Full details for using each request type are given on the page dedicated to that request type.

 

CONSTRUCTING A SIMPLE REQUEST FOR DATA

For a simple data request like this, you need to know:

  1. that this selection will be under 1 GB (compressed); it will download as soon as it has been prepared on our server. Downloading more than 1 GB in one request requires an Asynchronous Request.

  2. the name (ID) of the dataset you want to access. This is mandatory.

  3. what time period you want. This is effectively mandatory for most datasets: if it is not provided, the default time span is the whole mission, which will probably exceed 1 GB compressed unless the dataset is small (e.g., FGM caveat files for one spacecraft).

For this example, we will use the spin resolution (~4 s) magnetic field dataset from the FGM instrument on spacecraft 1 (C1), C1_CP_FGM_SPIN, and we will request a two-day period in March 2003. This request will be roughly 730 kB compressed, so it meets criterion 1 above; uncompressed, it will be about 3.7 MB.

The syntax for this type of request begins with:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&

Next we include the DATASET_ID that we want:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN

Additional items are connected with an ampersand (&). We will add our dates in ISO format, and we have our final complete request:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z

This code and all of the use cases given here can be directly copied and pasted into the address bar of a browser and the data will download. This is equivalent to the hyperlink:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z
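The same request can also be assembled in a script. The following is a minimal Python sketch using only the standard library; the helper name build_data_request is illustrative and is not part of any CSA client library.

```python
from urllib.parse import urlencode

# Base URL for data requests, as given on this page.
DATA_URL = "https://csa.esac.esa.int/csa-sl-tap/data"


def build_data_request(dataset_id, start_date, end_date):
    """Return the URL for a direct (synchronous) data request.

    Illustrative helper only. Dates are ISO strings such as
    '2003-03-03T00:00:00Z'.
    """
    params = [
        ("RETRIEVAL_TYPE", "product"),
        ("DATASET_ID", dataset_id),
        ("START_DATE", start_date),
        ("END_DATE", end_date),
    ]
    # safe=":" leaves the colons in the timestamps unescaped, so the
    # result reads the same as the URL built by hand above.
    return DATA_URL + "?" + urlencode(params, safe=":")


url = build_data_request(
    "C1_CP_FGM_SPIN", "2003-03-03T00:00:00Z", "2003-03-05T00:00:00Z"
)
print(url)
```

To fetch the package from a script, the resulting URL could be passed to urllib.request.urlretrieve(url, "csa_download.tgz"), where the local file name is an arbitrary choice.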

 

Structure of downloaded file name and package

The download is a file called anonymousXXXXXXXXXXXXX.tgz or <youruserid>XXXXXXXXXXXXX.tgz, where the X's represent the number allocated to that request. Depending on your browser, a request pasted into the address bar or opened by following that link may or may not be gunzipped automatically. Once the package has been gunzipped and untarred, it is a folder called 'CSA_Download_yyyyMMdd_hhmm', named using the date and time of the retrieval. Inside this folder is another folder with the name of the dataset requested, containing the requested data file(s):

CSA_Download_yyyyMMdd_hhmm/<DATASET_ID>/<DATASET_ID>__<START_DATE>_<END_DATE>_<VERSION_NO>.<DELIVERY_FORMAT>

For example:

CSA_Download_20210825_1231/C1_CP_FGM_SPIN/C1_CP_FGM_SPIN__20030303_000000_20030304_000000_V140305.cef
CSA_Download_20210825_1231/C1_CP_FGM_SPIN/C1_CP_FGM_SPIN__20030304_000000_20030305_000000_V140305.cef
CSA_Download_20210825_1231/C1_CQ_FGM_CAVF/C1_CQ_FGM_CAVF__20030303_000000_20030304_000000_V140305.cef

By default, if this dataset comes with additional information, such as active caveats, this will be included in the package. In our example here, this time interval includes a caveat about calibration, so two folders will be in the package.
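When post-processing a package in a script, the file names can be split back into their components. The sketch below assumes, based only on the examples above, that dates are written as yyyyMMdd_hhmmss and that the version is 'V' followed by digits; the function name parse_product_path is illustrative.

```python
import re
from datetime import datetime

# <DATASET_ID>__<START_DATE>_<END_DATE>_<VERSION_NO>.<DELIVERY_FORMAT>
# Assumption: date and version shapes are inferred from the examples above.
FILE_RE = re.compile(
    r"^(?P<dataset_id>.+)__"
    r"(?P<start>\d{8}_\d{6})_"
    r"(?P<end>\d{8}_\d{6})_"
    r"(?P<version>V\d+)"
    r"\.(?P<fmt>[A-Za-z0-9]+)$"
)


def parse_product_path(path):
    """Split a path inside the package into its documented components."""
    filename = path.rsplit("/", 1)[-1]
    match = FILE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    info = match.groupdict()
    info["start"] = datetime.strptime(info["start"], "%Y%m%d_%H%M%S")
    info["end"] = datetime.strptime(info["end"], "%Y%m%d_%H%M%S")
    return info


example = (
    "CSA_Download_20210825_1231/C1_CP_FGM_SPIN/"
    "C1_CP_FGM_SPIN__20030303_000000_20030304_000000_V140305.cef"
)
info = parse_product_path(example)
print(info["dataset_id"], info["start"], info["end"], info["version"], info["fmt"])
```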

Defaults

In our simple example here, there are some defaults being applied. The options for these, plus other options not included as default, are listed on the Data Requests page:

  • DELIVERY_FORMAT=CEF

    • The default data file format is CEF (Cluster Exchange Format) which is ASCII with a header of metadata.

  • DELIVERY_INTERVAL=Daily

    • The interval covered by each returned data file. The default is 24 h, counted from the start date provided rather than from midnight to midnight.

  • REF_DOC=1

    • Most datasets have related datasets which may be caveats, experiment settings, or other necessary accompanying data. These files are only included if there is information relevant for the requested time period.

  • INCLUDE_EMPTY=1

    • Some of the requested files may contain no records; you can choose whether to receive these or not, but by default they are included.

So, with all these parameters and their defaults included (but not needed), our example from above would look like this:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z&DELIVERY_FORMAT=CEF&DELIVERY_INTERVAL=Daily&REF_DOC=1&INCLUDE_EMPTY=1
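In a script, the defaults can be kept in one place and overridden only when needed. This is a hedged sketch rather than an official client: the DEFAULTS dictionary simply mirrors the four defaults listed above, and build_request is a hypothetical helper name.

```python
from urllib.parse import urlencode

DATA_URL = "https://csa.esac.esa.int/csa-sl-tap/data"

# The defaults listed on this page, spelled out explicitly.
DEFAULTS = {
    "DELIVERY_FORMAT": "CEF",
    "DELIVERY_INTERVAL": "Daily",
    "REF_DOC": "1",
    "INCLUDE_EMPTY": "1",
}


def build_request(dataset_id, start_date, end_date, **overrides):
    """Build a data request URL with every default written out.

    Keyword arguments replace the matching default, for example
    DELIVERY_FORMAT='CDF'. Other options are rejected in this sketch.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"option(s) not handled here: {sorted(unknown)}")
    params = {
        "RETRIEVAL_TYPE": "product",
        "DATASET_ID": dataset_id,
        "START_DATE": start_date,
        "END_DATE": end_date,
        **DEFAULTS,
        **overrides,
    }
    return DATA_URL + "?" + urlencode(params, safe=":")


explicit = build_request(
    "C1_CP_FGM_SPIN", "2003-03-03T00:00:00Z", "2003-03-05T00:00:00Z"
)
as_cdf = build_request(
    "C1_CP_FGM_SPIN",
    "2003-03-03T00:00:00Z",
    "2003-03-05T00:00:00Z",
    DELIVERY_FORMAT="CDF",
)
print(explicit)
print(as_cdf)
```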

 

Data Request Use Cases

These use cases demonstrate different request types and some of the criteria that may be used with them. Details of all the criteria available for use in these commands can be found on the pages dedicated to those types of requests. Only asynchronous data requests (UC4) require authentication.

 

How-To UC1 - DATA REQUEST: CEF

This use case retrieves data from 2 datasets in CEF format (default) for an interval of 1 day, split into 6-hour files; the data will be assembled and download will start as soon as the package is ready.
Type of request: (Synchronous) Data (product) request

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C3_CP_ASP_IONC&DATASET_ID=C1_CP_EFW_L1_P2&START_DATE=2004-06-18T00:00:00Z&END_DATE=2004-06-19T00:00:00Z&DELIVERY_INTERVAL=hourly6

 

How-To UC2 - DATA REQUEST: CDF

This use case retrieves data from 1 dataset for all spacecraft in CDF format for an interval of 8 hours, split into 1-hour files; the data will be assembled and download will start as soon as the package is ready.
Type of request: (Synchronous) Data (product) request

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C*_CP_PEA_MOMENTS&START_DATE=2004-07-15T00:00:00Z&END_DATE=2004-07-15T08:00:00Z&DELIVERY_INTERVAL=hourly&DELIVERY_FORMAT=CDF
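UC1 and UC2 differ from the simple example in two ways: DATASET_ID may be repeated, and it may contain the * wildcard. A minimal Python sketch of both follows; build_multi_request is an illustrative name, and passing a list of pairs to urlencode is what allows the repeated key.

```python
from urllib.parse import urlencode

DATA_URL = "https://csa.esac.esa.int/csa-sl-tap/data"


def build_multi_request(dataset_ids, start_date, end_date, **options):
    """Build a data request for one or more dataset IDs (wildcards allowed)."""
    params = [("RETRIEVAL_TYPE", "product")]
    # One DATASET_ID entry per dataset; a dict could not hold repeated keys.
    params += [("DATASET_ID", dataset_id) for dataset_id in dataset_ids]
    params += [("START_DATE", start_date), ("END_DATE", end_date)]
    params += list(options.items())
    # Keep ':' and '*' unescaped so the URL matches the hand-written form.
    return DATA_URL + "?" + urlencode(params, safe=":*")


uc1 = build_multi_request(
    ["C3_CP_ASP_IONC", "C1_CP_EFW_L1_P2"],
    "2004-06-18T00:00:00Z",
    "2004-06-19T00:00:00Z",
    DELIVERY_INTERVAL="hourly6",
)
uc2 = build_multi_request(
    ["C*_CP_PEA_MOMENTS"],
    "2004-07-15T00:00:00Z",
    "2004-07-15T08:00:00Z",
    DELIVERY_INTERVAL="hourly",
    DELIVERY_FORMAT="CDF",
)
print(uc1)
print(uc2)
```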

 

How-To UC3 - GET HEADER INFORMATION

This use case retrieves the dataset headers, in XML format, that match a search criterion. Remember that although this is a request for metadata, header metadata requires a Data request - note the value of RETRIEVAL_TYPE, which is PRODUCT by default, but HEADER here. This will download the headers for 60 datasets: all headers for RPA mode for all spacecraft operating CIS-CODIF (C1, C3 and C4), all particles (H+, He+, He++ and O+) and all units (counts/s, particle energy flux, particle flux, phase space density, and raw counts).
Type of request: Data Header request

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=header&DATASET_ID=C*_CIS-CODIF_RPA_*

Note that this is a relatively large request for this type of file (60 files) and currently, this will take up to a minute to process.

 

How-To UC4 - LARGE DATA REQUEST

For downloads greater than 1 GB and less than 50 GB, an asynchronous request can be used. This use case retrieves data from 3 datasets in CEF format for an interval of 4 days, split into daily files (small volume request, for demonstration purposes). The request will be executed asynchronously, so the server will assemble it and will send an email to the registered email address containing the collection link when complete.
Type of request: Asynchronous data request so include RETRIEVAL_ACCESS=deferred and ensure you have registered with the CSA and logged in (see authentication).

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&DATASET_ID=C1_CP_EDI_EGD&DATASET_ID=C1_CP_EFW_L3_P&DATASET_ID=C1_CP_FGM_FULL&START_DATE=2001-02-01T00:00:00Z&END_DATE=2001-02-05T00:00:00Z&DELIVERY_INTERVAL=Daily
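A script that knows roughly how large a request will be can choose between a direct and a deferred request using the 1 GB and 50 GB limits quoted on this page. The sketch below is an assumption-laden illustration: the size estimate must come from the caller, the helper names are hypothetical, and the login step needed before a deferred request is not shown because its parameters are not described here.

```python
from urllib.parse import urlencode

DATA_URL = "https://csa.esac.esa.int/csa-sl-tap/data"

# Limits quoted on this page (total compressed size, in GB).
DIRECT_LIMIT_GB = 1
DEFERRED_LIMIT_GB = 50


def retrieval_access_for(estimated_gb):
    """Return the RETRIEVAL_ACCESS value to send, or None for a direct request."""
    if estimated_gb <= DIRECT_LIMIT_GB:
        return None
    if estimated_gb <= DEFERRED_LIMIT_GB:
        return "DEFERRED"
    # Suggestion of this sketch, not a documented server behaviour.
    raise ValueError("estimate exceeds 50 GB; split into smaller requests")


def build_request(dataset_ids, start_date, end_date, estimated_gb, **options):
    params = [("RETRIEVAL_TYPE", "product")]
    access = retrieval_access_for(estimated_gb)
    if access is not None:
        params.append(("RETRIEVAL_ACCESS", access))
    params += [("DATASET_ID", dataset_id) for dataset_id in dataset_ids]
    params += [("START_DATE", start_date), ("END_DATE", end_date)]
    params += list(options.items())
    return DATA_URL + "?" + urlencode(params, safe=":*")


# UC4 is a small request in practice; an estimate above 1 GB is passed
# here only to exercise the deferred branch.
url = build_request(
    ["C1_CP_EDI_EGD", "C1_CP_EFW_L3_P", "C1_CP_FGM_FULL"],
    "2001-02-01T00:00:00Z",
    "2001-02-05T00:00:00Z",
    estimated_gb=5,
    DELIVERY_INTERVAL="Daily",
)
print(url)
```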

 

CONSTRUCTING A SIMPLE REQUEST FOR METADATA

For a simple metadata request, you need to know:

  1. what information (fields, or columns) you are looking for (using SELECT)

  2. the table that holds this information (see TAP Table and Views), given using FROM

  3. any criteria to be applied

Note that SELECT and FROM are mandatory for this query. Once these mandatory elements are in place, this request can be directly copied and pasted into the address bar of a browser and the data will download, although you'll probably want to add more options.

For the following example, we want to have a list of all the dataset IDs and their titles for FGM.

The URL for this type of request is

https://csa.esac.esa.int/csa-sl-tap/tap/sync

and the query (using the ?) begins with REQUEST=doQuery&LANG=ADQL&QUERY=

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=

The results will be in VOTable format by default; add FORMAT=CSV or FORMAT=JSON for the other formats.

Next we include the fields that we want using SELECT. The space between SELECT and the field names must be replaced with a plus sign ('+'), and the fields are separated by commas:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+dataset_id,title

This information is about datasets, which are described in the table csa.v_dataset, so we include this after FROM:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+dataset_id,title+FROM+csa.v_dataset

This is the minimum required for a valid request. The syntax above may be copied into the address bar of a browser and will download all dataset IDs and their titles for all datasets, in VOTable format (the default).

We'll add the CSV format, since this is the easiest to read:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,title+FROM+csa.v_dataset

To restrict the results to a single experiment, say FGM, there are options depending on exactly what you want.

For example, we can add the conditional statement using WHERE, that the experiments field is equal to 'FGM' (this field can contain more than one experiment name):

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,title+FROM+csa.v_dataset+WHERE+experiments='FGM'

This gives us the list of dataset IDs where FGM is the only experiment involved in that dataset, 63 results in total. However, this is not the same as the list of datasets that include 'FGM' in the dataset ID. For that we need the following request, which uses LIKE with a wildcard. The wildcard here is the percent sign (%), which must itself be URL encoded as %25, so the pattern becomes '%25FGM%25':

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,title+FROM+csa.v_dataset+WHERE+dataset_id+like+'%25FGM%25'

This request returns 98 results, which includes the graphical cross calibration datasets, for example.

However, they are not sorted. To sort the results by dataset ID, include ORDER+BY+dataset_id at the end.

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,title+FROM+csa.v_dataset+WHERE+dataset_id+like+'%25FGM%25'+ORDER+BY+dataset_id

The page on metadata requests gives more detailed information on the parameters that can be given and returned. This code can be directly copied and pasted into the address bar of a browser and the data will download.
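When building metadata requests in a script, the plus signs and the %25 encoding do not need to be typed by hand: the query can be written as ordinary ADQL and encoded afterwards. The sketch below uses Python's standard urlencode; build_metadata_request is an illustrative name. Note that urlencode also percent-encodes commas, quotes and equals signs, which should be equivalent once the server decodes the URL, although the result looks different from the hand-written form.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TAP_SYNC_URL = "https://csa.esac.esa.int/csa-sl-tap/tap/sync"


def build_metadata_request(adql, fmt="CSV"):
    """Return the URL for a synchronous metadata (ADQL) request."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": fmt,
        "QUERY": adql,
    }
    # urlencode turns spaces into '+' and '%' into '%25' automatically.
    return TAP_SYNC_URL + "?" + urlencode(params)


adql = (
    "SELECT dataset_id,title FROM csa.v_dataset "
    "WHERE dataset_id like '%FGM%' ORDER BY dataset_id"
)
url = build_metadata_request(adql)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["QUERY"][0]
print(decoded)
```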

 

Metadata Request Use Cases

Metadata requests do not require authentication.

 

How-To UC5 - LIST DATASETS WITH A PARTICULAR PARAMETER

This use case requests dataset ID, parameter description (cat_description), and parameter ID for datasets that include electron temperature (entity = Electron, property = Temperature - these are case sensitive, see UC6, below). We need to query the csa.v_parameter table:
Type of request: Metadata request

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,cat_description,parameter_id+FROM+csa.v_parameter+WHERE+entity='Electron'+AND+property='Temperature'
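Because these values are case sensitive, a script can normalise user input against the known values before building the WHERE clause. The following sketch hard-codes the entity values listed under UC6 on this page as a snapshot; in practice the list would be refreshed from the DISTINCT query, and canonical_value is a hypothetical helper.

```python
# Entity values as listed under UC6 on this page (a snapshot; the archive
# may add values over time).
KNOWN_ENTITIES = [
    "Photon", "Instrument", "Transformation", "Particles", "Ion_CNO",
    "Compound", "Magnetic_Field", "Proton", "Electric_Field", "Other1",
    "Helium+", "Alpha", "Electron", "Observatory", "Oxygen+", "Ion",
]


def canonical_value(user_input, known_values):
    """Return the exact-case form of user_input, or raise ValueError."""
    lookup = {value.lower(): value for value in known_values}
    key = user_input.strip().lower()
    if key not in lookup:
        raise ValueError(f"{user_input!r} is not a known value")
    return lookup[key]


entity = canonical_value("electron", KNOWN_ENTITIES)
where_clause = f"entity='{entity}' AND property='Temperature'"
print(where_clause)
```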

 

How-To UC6 - LIST AVAILABLE VALUES

In the Use Case above, our request includes entity='Electron', where 'Electron' is case sensitive because it's the value in that field. So to find out the exact form of the values we're interested in, we can use DISTINCT. This will return a list where all the duplicates have been removed. Here, we want to find out the whole list of entities included in the CSA datasets.
Type of request: Metadata request

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+DISTINCT+entity+FROM+csa.v_parameter

The results are:

"Photon", "Instrument", "Transformation", "Particles", "Ion_CNO", "Compound", "Magnetic_Field", "Proton", "Electric_Field", "Other1", "Helium+", "Alpha", "Electron", "Observatory", "Oxygen+", "Ion"

 

How-To UC7 - INVENTORY INFORMATION

Inventory information is held in the csa.v_dataset_inventory table. This table only includes the fields (columns) that used to be given in the old CAIO inventory requests, that is: dataset_id, start_time, end_time, num_instances and inventory_version. So we could put * after SELECT, but then the results will be given with those fields in alphabetical order, which will not be the same as the list above. To get a particular order we need to list them explicitly. Remember that the times given here, since we are dealing with results in intervals, need to be given the 'wrong' way round to ensure that all applicable intervals are included - see Metadata Requests for a graphical explanation.
Type of request: Metadata request

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+FROM+csa.v_dataset_inventory+WHERE+dataset_id='C1_CP_FGM_SPIN'+AND+start_time<='2002-05-01T00:00:00'+AND+end_time>='2002-04-01T00:00:00'+ORDER+BY+start_time
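The 'wrong way round' comparison is the standard test for overlapping intervals: an inventory interval overlaps the requested window if it starts before the window ends and ends after the window starts. A comparison written the other way (start_time >= window start, end_time <= window end) would miss intervals that straddle either edge of the window. The Python sketch below illustrates this with made-up interval times; the function names are illustrative.

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%S"


def overlaps(start_time, end_time, window_start, window_end):
    """True if [start_time, end_time] shares any time with the window."""
    return start_time <= window_end and end_time >= window_start


def inventory_where(dataset_id, window_start, window_end):
    """Build the WHERE clause used in the inventory request above."""
    return (
        f"dataset_id='{dataset_id}'"
        f" AND start_time<='{window_end.strftime(ISO)}'"
        f" AND end_time>='{window_start.strftime(ISO)}'"
    )


window = (datetime(2002, 4, 1), datetime(2002, 5, 1))
# Hypothetical inventory intervals, for illustration only.
straddling = (datetime(2002, 3, 25), datetime(2002, 4, 3))
outside = (datetime(2002, 3, 1), datetime(2002, 3, 20))

print(overlaps(*straddling, *window))
print(overlaps(*outside, *window))
print(inventory_where("C1_CP_FGM_SPIN", *window))
```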