CAIO to TAP - Cluster Science Archive

Differences between CAIO and TAP

Note that full user pages are available at https://www.cosmos.esa.int/web/csa-guide/home. This is a 'quick start' guide on how to adapt your CAIO script to use TAP instead.

Several types of request are possible using the CAIO:

Synchronous product requests (data), via:
- Browser
- wget
- Python
- MATLAB
- IDL
Asynchronous product requests (data) via a browser
Streamed data requests (data)
Metadata requests, including
- Inventory requests

The above features are already available in the TAP limited-functionality beta release and the sections below describe how to adapt scripts for these services.

The following features, already available in CAIO, will be added shortly to the TAP functionality:

Header requests (counts as data)

If you require assistance to alter code, please contact us.

Adapting scripts for:

Synchronous Data Download

The format is slightly different, but the development team have worked hard to limit the changes necessary to a request. The two biggest changes are as follows:

Change of downloaded format suffix:

The CAIO package downloaded has a .tar.gz extension; the TAP system will give you a .tgz. This can be unpacked in the same way as the .tar.gz package, but scripts that rely on the .tar.gz file name (for example when renaming or unpacking the download) may need to be updated.
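As a minimal sketch of how a Python script could stay indifferent to the suffix change, the helper below opens a package with an explicit gzip mode and accepts either extension. The function names and the demo file names are illustrative assumptions, and the package is built locally rather than downloaded from the archive.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def is_csa_package(name):
    """Return True for either suffix (.tgz from TAP, .tar.gz from CAIO)."""
    return str(name).endswith((".tgz", ".tar.gz"))


def list_package(path):
    """Return the member names of a gzip-compressed tar package."""
    # "r:gz" selects gzip decompression explicitly, so the suffix does not matter
    with tarfile.open(path, mode="r:gz") as tar:
        return tar.getnames()


# Build two small local packages (hypothetical names) to show both suffixes behave alike
workdir = Path(tempfile.mkdtemp())
payload = b"example"
for suffix in (".tgz", ".tar.gz"):
    package = workdir / f"demo{suffix}"
    with tarfile.open(package, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo/data.cef")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    print(package.name, is_csa_package(package), list_package(package))
```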

Initial URL:

Essentially, replacing the start of the request with the following is the only change that needs to be made between the CAIO and TAP for synchronous data download:

https://csa.esac.esa.int/csa/aio/product-action?

becomes

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&

After this, the request will look exactly the same.

An example:

(CAIO:)

https://csa.esac.esa.int/csa/aio/product-action?DATASET_ID=C3_CP_FGM_SPIN&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

becomes

(TAP:)

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C3_CP_FGM_SPIN&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z
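For scripts that hold many stored CAIO URLs, the prefix swap described above could be automated. The following is a sketch only: the function name caio_to_tap is an assumption, and it handles only synchronous product-action URLs of the form shown in this section.

```python
CAIO_PREFIX = "https://csa.esac.esa.int/csa/aio/product-action?"
TAP_PREFIX = "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&"


def caio_to_tap(url):
    """Swap the CAIO product-action prefix for the TAP data prefix."""
    if not url.startswith(CAIO_PREFIX):
        raise ValueError("not a CAIO product-action URL: " + url)
    # Everything after the prefix is left untouched
    return TAP_PREFIX + url[len(CAIO_PREFIX):]


caio_url = (
    "https://csa.esac.esa.int/csa/aio/product-action?"
    "DATASET_ID=C3_CP_FGM_SPIN"
    "&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z"
)
print(caio_to_tap(caio_url))
```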

wget Example for Synchronous Data Download

To use the URL request with wget, put the whole request in double quotes and add wget at the beginning. Adding '--content-disposition' can help with the naming of the downloaded file.

This example:

wget --content-disposition "https://csa.esac.esa.int/csa/aio/product-action?DATASET_ID=C1_CP_WHI_NATURAL&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"

will change to:

wget --content-disposition "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_WHI_NATURAL&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"

Python Example

To adapt the example given on the CAIO web site to use the TAP server, the URL needs to change and another item needs to be added to the query_specs dictionary. Note that to get more than one dataset, put the strings of the datasets in a list as the DATASET_ID value. Times including fractions of seconds will be accepted but rounded down.

from requests import get  # to make GET request
import tarfile

def download(url, params, file_name):
    # get request; stop on an HTTP error rather than saving an error page
    response = get(url, params=params)
    response.raise_for_status()
    # open in binary mode and write the package to file
    with open(file_name, "wb") as file:
        file.write(response.content)

# Update the URL:
myurl = 'https://csa.esac.esa.int/csa-sl-tap/data'
# Add another item to the query parameters dictionary:
query_specs = {'RETRIEVAL_TYPE': 'product',
               'DATASET_ID': 'C1_CP_FGM_SPIN',
               'START_DATE': '2003-03-03T12:00:00Z',
               'END_DATE': '2003-03-04T12:00:00Z',
               'DELIVERY_FORMAT': 'CEF',
               'DELIVERY_INTERVAL': 'hourly'}

download(myurl, query_specs, '2021taptest.tar.gz')

with tarfile.open("2021taptest.tar.gz") as tar:
    tarname = tar.getnames()
    tar.extractall()
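To illustrate the note about requesting more than one dataset: when a dictionary value is a list, the requests library sends it as a repeated query key. The standard-library sketch below shows the query string that results, without contacting the server. The second dataset ID is chosen only for illustration.

```python
from urllib.parse import urlencode

# DATASET_ID given as a list, as suggested above for multiple datasets
query_specs = {
    "RETRIEVAL_TYPE": "product",
    "DATASET_ID": ["C1_CP_FGM_SPIN", "C3_CP_FGM_SPIN"],
    "START_DATE": "2003-03-03T12:00:00Z",
    "END_DATE": "2003-03-04T12:00:00Z",
}

# doseq=True repeats the key once per list item; safe=":" keeps the times readable
query_string = urlencode(query_specs, doseq=True, safe=":")
print(query_string)
```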

MATLAB Example

Just like the Python example, change the URL and add RETRIEVAL_TYPE=product.

URL = 'https://csa.esac.esa.int/csa-sl-tap/data';
fileName=tempname;
gzFileName = [fileName '.gz'];
options = weboptions('RequestMethod', 'get', 'Timeout', Inf);
tgzFileName = websave(gzFileName, URL, 'RETRIEVAL_TYPE', 'product', ...
    'DATASET_ID', 'C1_CP_FGM_SPIN', ...
    'START_DATE', '2003-03-03T00:00:00Z', ...
    'END_DATE', '2003-03-04T00:00:00Z', options);
gunzip(gzFileName);
fileNames=untar(fileName);
for iFile = 1:numel(fileNames), disp(fileNames{iFile}); end

IDL Example

The credentials are not needed, since only synchronous download is available and this does not require logging in. As in the Python example, the URL changes and a parameter is added to the query. Note, however, that the following code does not untar or gunzip the package. The downloaded package has the extension .tgz (previously .tar.gz) and IDL cannot untar it directly; you can use the function csa_untar.pro to gunzip and untar it. DO NOT USE IDL's FILE_GUNZIP on the .tgz file: it will expand until it fills your hard drive. This has been reported to the owners of IDL.

function csa_product

  ; Construct URL query from parameters and keywords.
  csa_product_query = 'RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T12:00:00Z&END_DATE=2003-03-03T14:00:00Z&DELIVERY_FORMAT=CDF&DELIVERY_INTERVAL=hourly'

  ;Create IDLnetURL object and set properties
  csa_product_obj = obj_new('IDLnetUrl')
  csa_product_obj->SetProperty, VERBOSE=1
  csa_product_obj->SetProperty, url_scheme = 'https'
  csa_product_obj->SetProperty, url_host = 'csa.esac.esa.int/'
  csa_product_obj->SetProperty, url_path = 'csa-sl-tap/data'
  csa_product_obj->SetProperty, url_query = csa_product_query

  ;send request to CSA TAP system, saving response in csa_buffer.dat
  csa_product_response = csa_product_obj->get(filename='csa_buffer.dat')
  csa_product_obj->getproperty, response_header=csa_product_header

  ;check a .tgz (or .tar.gz) file was downloaded and if so rename buffer to correct filename and return correct filename, otherwise return 0
  csa_filestart = strpos(csa_product_header,'filename=')
  if csa_filestart ne -1 then begin
    csa_fileend =  strpos(csa_product_header,'gz"')
    csa_filename = strmid(csa_product_header,csa_filestart+10,csa_fileend-csa_filestart-8)
    csa_dir_end = strpos(csa_product_response,'csa_buffer.dat')
    csa_working_dir = strmid(csa_product_response,0,csa_dir_end)
    file_move, csa_product_response, csa_working_dir+csa_filename
    print, 'Downloaded data to '+csa_working_dir+csa_filename
    outfile = csa_working_dir+csa_filename
    return, outfile
  endif else begin
    print, 'Something went wrong.'
    return, 0
  endelse
end
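The IDL function above finds the package name by searching the response header for filename="...". For readers who want the same step in a Python script, it might be sketched with the standard library as follows. The header value used here is a made-up illustration, not a captured server response, and the function name is an assumption.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    message = Message()
    message["Content-Disposition"] = header_value
    # get_filename() reads the filename parameter and strips the quotes
    return message.get_filename()


# Hypothetical header value for illustration only
example_header = 'attachment; filename="CSA_Download_example.tgz"'
print(filename_from_content_disposition(example_header))
```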

Asynchronous Data Requests

To make a synchronous data request (up to 1 GB) into an asynchronous data request (up to 50 GB), you need to log in at

https://csa.esac.esa.int/csa-sl-tap/login

and then add the following to the request (note that 'DEFERRED' is case sensitive):

RETRIEVAL_ACCESS=DEFERRED

to make:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&DATASET_ID=C3_CP_FGM_SPIN&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

The log-in will create a session cookie for the request in the browser, and you will receive an email (to the email address registered to your profile) when the package is ready.
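The URL change for an asynchronous request can be scripted as sketched below. This only builds the URL: it does not perform the log-in, so the session cookie described above must still be obtained separately. The function name is an assumption, and the set of accepted values is limited to the two that appear in this guide.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Values taken from this guide; both are matched exactly (case sensitive)
ALLOWED_ACCESS = ("DEFERRED", "streamed")


def add_retrieval_access(url, access):
    """Return url with RETRIEVAL_ACCESS=<access> placed after RETRIEVAL_TYPE."""
    if access not in ALLOWED_ACCESS:
        raise ValueError("unexpected RETRIEVAL_ACCESS value: " + repr(access))
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query) if k != "RETRIEVAL_ACCESS"]
    # Insert directly after RETRIEVAL_TYPE when present, otherwise at the front
    index = next((i + 1 for i, (k, _) in enumerate(pairs) if k == "RETRIEVAL_TYPE"), 0)
    pairs.insert(index, ("RETRIEVAL_ACCESS", access))
    return urlunsplit(parts._replace(query=urlencode(pairs, safe=":")))


sync_url = (
    "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product"
    "&DATASET_ID=C3_CP_FGM_SPIN"
    "&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z"
)
print(add_retrieval_access(sync_url, "DEFERRED"))
```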

Streamed Data Requests

There are restrictions that apply to a streamed request:

Only CEF products can be downloaded using these requests
Only one dataset can be requested
Only one file is delivered for the time period requested, i.e. delivery interval option is not available
Header only cannot be requested
If the internet connection is broken before file download has completed, the request must be made again to retrieve the whole file

To make a synchronous data request into a streamed data request, add

RETRIEVAL_ACCESS=streamed

to the request:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=streamed&DATASET_ID=C3_CP_FGM_SPIN&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z
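Because a streamed request is more restricted than a normal product request, a script may want to check its parameters before sending them. The sketch below encodes the restrictions listed in this section as a local check. The function name is hypothetical, and treating a missing DELIVERY_FORMAT as acceptable is an assumption based on the example URL above, which omits it.

```python
def check_streamed_request(params):
    """Return a list of problems that would break a streamed request."""
    problems = []
    dataset = params.get("DATASET_ID")
    if isinstance(dataset, (list, tuple)) and len(dataset) != 1:
        problems.append("only one dataset can be requested")
    delivery_format = params.get("DELIVERY_FORMAT")
    if delivery_format is not None and delivery_format.upper() != "CEF":
        problems.append("only CEF products can be streamed")
    if "DELIVERY_INTERVAL" in params:
        problems.append("delivery interval option is not available")
    return problems


streamed_ok = {
    "RETRIEVAL_TYPE": "product",
    "RETRIEVAL_ACCESS": "streamed",
    "DATASET_ID": "C3_CP_FGM_SPIN",
    "START_DATE": "2004-06-18T11:35:00Z",
    "END_DATE": "2004-06-19T18:35:00Z",
}
streamed_bad = dict(
    streamed_ok,
    DATASET_ID=["C1_CP_FGM_SPIN", "C3_CP_FGM_SPIN"],
    DELIVERY_FORMAT="CDF",
    DELIVERY_INTERVAL="hourly",
)
print(check_streamed_request(streamed_ok))
print(check_streamed_request(streamed_bad))
```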

Metadata Requests

Jupyter Notebook with some examples.

For a metadata request, the URL is:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=

[Default FORMAT=VOTable, can also be JSON or CSV]

Then add the mandatory SELECT <parameter> and FROM <table> clauses, plus any optional conditions or conditional statements, joining the parts with the appropriate delimiters (+ , =).

The metadata requests are directed at tables of information. For the CSA, there are 11 different tables containing different levels of information, much like the resource_class used in the old CAIO queries. The first example below queries the table called csa.v_dataset, which contains a column labelled dataset_id. The inventory example further down accesses a different table called csa.v_dataset_inventory. The other tables include those that contain information on files (csa.file) and parameters (csa.parameter). A full list of the tables and their columns will be given in the user manual; in the meantime, the Jupyter Notebook contains instructions for listing all tables and their columns.

Example: to get a list of all dataset IDs in CSV format, we need to query the csa.v_dataset table (SELECT dataset_id FROM csa.v_dataset):

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id+FROM+csa.v_dataset

This list will be unordered. To sort it in ascending order, add +ORDER+BY+1 (when only one field is selected) or +ORDER+BY+<field_name>; for descending order, append +desc, as in +ORDER+BY+1+desc. The ORDER BY clause needs to go at the end of the query:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id+FROM+csa.v_dataset+ORDER+BY+dataset_id

Example: to get a list of all datasets that include FGM, we need to add a WHERE clause and use quotes and wildcards. The wildcard is % (percent sign) rather than the more usual *, and %25 is its URL encoding.

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id+FROM+csa.v_dataset+WHERE+dataset_id+like+'%25FGM%25'
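Writing the + and %25 encodings by hand is error-prone, so a script can write the query as plain ADQL and encode it. The sketch below does this with the standard library; the function name adql_url is an assumption, and single quotes are deliberately left unencoded so that the output matches the hand-written URLs in this section.

```python
from urllib.parse import quote_plus

TAP_SYNC = "https://csa.esac.esa.int/csa-sl-tap/tap/sync"


def adql_url(adql, fmt="CSV"):
    """Build a synchronous TAP metadata URL from a plain ADQL string."""
    # quote_plus turns spaces into + and % into %25; quotes are kept literal
    encoded = quote_plus(adql, safe="'")
    return f"{TAP_SYNC}?REQUEST=doQuery&LANG=ADQL&FORMAT={fmt}&QUERY={encoded}"


ordered = adql_url("SELECT dataset_id FROM csa.v_dataset ORDER BY dataset_id")
fgm_only = adql_url("SELECT dataset_id FROM csa.v_dataset WHERE dataset_id like '%FGM%'")
print(ordered)
print(fgm_only)
```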


Inventory Requests

In the old (CAIO) system, an inventory request was self-contained: the selected field was DATASET_INVENTORY and this included the fields of dataset_id, start_time, end_time, num_instances and inventory_version. In the new (TAP) system, these fields must be listed separately in the query; however, this also means that it's fully customisable.

The CAIO request asks for the inventory and gives the start time and end time:

https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET_INVENTORY&RESOURCE_CLASS=DATASET_INVENTORY&RETURN_TYPE=CSV&QUERY=DATASET_INVENTORY.DATASET_ID%20like%20'C1_CP_FGM_SPIN'%20AND%20DATASET_INVENTORY.START_TIME%20%3C=%20'2002-05-01T00:00:00Z'%20AND%20DATASET_INVENTORY.END_TIME%20%3E=%20'2002-04-01T00:00:00Z'

Broken down into its constituent parts:

https://csa.esac.esa.int/csa/aio/metadata-action?

SELECTED_FIELDS=DATASET_INVENTORY&

RESOURCE_CLASS=DATASET_INVENTORY&

RETURN_TYPE=CSV&

QUERY=

DATASET_INVENTORY.DATASET_ID%20like%20'C1_CP_FGM_SPIN'%20

AND%20

DATASET_INVENTORY.START_TIME%20%3C=%20'2002-05-01T00:00:00Z'%20

AND%20

DATASET_INVENTORY.END_TIME%20%3E=%20'2002-04-01T00:00:00Z'

The closest equivalent TAP command is:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+FROM+csa.v_dataset_inventory+WHERE+dataset_id='C1_CP_FGM_SPIN'+AND+start_time<='2002-05-01T00:00:00'+AND+end_time>='2002-04-01T00:00:00'+ORDER+BY+start_time

Broken down into parts, this looks like:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?

REQUEST=doQuery&

LANG=ADQL&

FORMAT=CSV&

QUERY=

SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+

FROM+csa.v_dataset_inventory+

WHERE+dataset_id='C1_CP_FGM_SPIN'+

AND+

start_time<='2002-05-01T00:00:00'+

AND+

end_time>='2002-04-01T00:00:00'+

ORDER+BY+start_time

Remember that, as stated above for metadata, the TAP query results are not ordered by default; you have to request that the results are ordered by a given field name. Note also that the time conditions may look reversed: start_time is compared with the end of the requested interval and end_time with its start, so that every record overlapping the interval is returned. This has not changed since the move from CAIO.
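The inventory query and its time conditions can be sketched in Python as follows. The helper names are assumptions; the fields and table are those used in the TAP example above. The overlaps function applies the same two comparisons locally to show why they select every record that touches the requested interval, and the record times used with it are invented for illustration.

```python
from urllib.parse import quote_plus, unquote_plus


def inventory_adql(dataset_id, window_start, window_end):
    """Build the ADQL text of an inventory query for one time window."""
    return (
        "SELECT dataset_id,start_time,end_time,num_instances,inventory_version "
        "FROM csa.v_dataset_inventory "
        f"WHERE dataset_id='{dataset_id}' "
        f"AND start_time<='{window_end}' "
        f"AND end_time>='{window_start}' "
        "ORDER BY start_time"
    )


def overlaps(record_start, record_end, window_start, window_end):
    """Same test as the WHERE clause; ISO times compare correctly as strings."""
    return record_start <= window_end and record_end >= window_start


adql = inventory_adql("C1_CP_FGM_SPIN", "2002-04-01T00:00:00", "2002-05-01T00:00:00")
encoded_query = quote_plus(adql)  # ready to append after QUERY=
print(adql)

# A record that starts before the window but ends inside it is still selected
print(overlaps("2002-03-31T12:00:00", "2002-04-01T12:00:00",
               "2002-04-01T00:00:00", "2002-05-01T00:00:00"))
```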