Differences between CAIO and TAP

Several different kinds of request are possible using the CAIO. Of these, synchronous data download, metadata requests and inventory requests are already available in the TAP limited-functionality beta release, and the sections below describe how to adapt scripts for these services.

The following features, already available in CAIO, will be added shortly to the TAP functionality:

  • Asynchronous product requests (data)
  • Header requests (counts as data)
  • Streaming data requests (data)

If you require assistance to alter code, please do not hesitate to get in touch with Helen Middleton - any enquiries will also help in the writing of the User Manual/Help pages.

Adapting scripts for:

Synchronous Data Download

The format is slightly different, but the development team have worked hard to limit the changes necessary to a request. The two biggest changes are as follows:

Change of downloaded format suffix:

The package downloaded from the CAIO has a .tar.gz extension; the TAP system will give you a .tgz. This can be unpacked in the same way as the .tar.gz package, but scripts that rely on the .tar.gz file name may need to be updated.
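
Scripts that recognise the downloaded package by its extension may need to accept both suffixes. The following is a minimal Python sketch, assuming the check is made on the file name only; the helper names is_gzipped_tar and strip_package_suffix are hypothetical and not part of any CSA tool:

```python
from pathlib import Path

# Suffixes used by CAIO (.tar.gz) and TAP (.tgz) packages.
GZIPPED_TAR_SUFFIXES = (".tar.gz", ".tgz")


def is_gzipped_tar(file_name):
    """Return True if the name ends in .tar.gz or .tgz (case-insensitive)."""
    return Path(file_name).name.lower().endswith(GZIPPED_TAR_SUFFIXES)


def strip_package_suffix(file_name):
    """Return the bare file name without its .tar.gz or .tgz suffix."""
    name = Path(file_name).name
    for suffix in GZIPPED_TAR_SUFFIXES:
        if name.lower().endswith(suffix):
            return name[: -len(suffix)]
    return name
```

Python's tarfile.open detects gzip compression from the file contents, so the extraction step itself does not depend on which suffix the file carries.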

Initial URL:

Essentially, the only change needed between the CAIO and TAP for synchronous data download is to replace the start of the request:

https://csa.esac.esa.int/csa/aio/product-action?

becomes

https://csa.esac.esa.int/csa-sl-tap/data?retrieval_type=PRODUCT&

After this, the request will look exactly the same.

An example:

(CAIO:)

https://csa.esac.esa.int/csa/aio/product-action?DATASET_ID=C3_CP_FGM_SPIN&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

becomes

(TAP:) 

https://csa.esac.esa.int/csa-sl-tap/data?retrieval_type=PRODUCT&DATASET_ID=C3_CP_FGM_SPIN&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z
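
If many stored request URLs need converting, the prefix swap described above can be automated. This is only a sketch, assuming every stored URL starts with exactly the CAIO product-action prefix shown above; the function name caio_to_tap is hypothetical:

```python
CAIO_PREFIX = "https://csa.esac.esa.int/csa/aio/product-action?"
TAP_PREFIX = "https://csa.esac.esa.int/csa-sl-tap/data?retrieval_type=PRODUCT&"


def caio_to_tap(url):
    """Swap the CAIO product-action prefix for the TAP one; keep the rest as-is."""
    if not url.startswith(CAIO_PREFIX):
        raise ValueError("not a CAIO product-action URL: " + url)
    return TAP_PREFIX + url[len(CAIO_PREFIX):]


caio_url = (
    "https://csa.esac.esa.int/csa/aio/product-action?"
    "DATASET_ID=C3_CP_FGM_SPIN"
    "&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z"
)
print(caio_to_tap(caio_url))
```

URLs that do not start with the expected prefix are rejected rather than silently passed through, so that metadata or inventory requests (which need a different conversion) are not mangled.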

 

wget Example for SYNCHRONOUS DATA DOWNLOAD

To use the URL request with wget, put the whole request in double quotes and add wget at the beginning. Adding '--content-disposition' can help with the naming of the downloaded file.

This example:

wget --content-disposition "https://csa.esac.esa.int/csa/aio/product-action?DATASET_ID=C1_CP_WHI_NATURAL&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"

will change to:

wget --content-disposition "https://csa.esac.esa.int/csa-sl-tap/data?retrieval_type=PRODUCT&DATASET_ID=C1_CP_WHI_NATURAL&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"

 

Python Example

To adapt the example given on the CAIO web site for use with the TAP server, change the URL and add another item to the query_specs dictionary. To get more than one dataset, give the dataset IDs as a list of strings in the DATASET_ID value. Times including fractions of seconds will be accepted but rounded down.

from requests import get  # to make GET request
import tarfile

def download(url, params, file_name):
    # send the GET request and stop if the server reports an HTTP error
    response = get(url, params=params)
    response.raise_for_status()
    # write the response body to the file in binary mode
    with open(file_name, "wb") as file:
        file.write(response.content)

# Update the URL:
myurl = 'https://csa.esac.esa.int/csa-sl-tap/data'
# Add another item to the query parameters dictionary:
query_specs = {'retrieval_type': 'PRODUCT',
               'DATASET_ID': 'C1_CP_FGM_SPIN',
               'START_DATE': '2003-03-03T12:00:00Z',
               'END_DATE': '2003-03-04T12:00:00Z',
               'DELIVERY_FORMAT': 'CEF',
               'DELIVERY_INTERVAL': 'hourly'}

download(myurl, query_specs, '2021taptest.tar.gz')

with tarfile.open("2021taptest.tar.gz") as tar:
    tarname = tar.getnames()
    tar.extractall()
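
To illustrate the note about multiple datasets: when DATASET_ID is a list, the requests library sends one DATASET_ID parameter per entry. The sketch below reproduces that encoding with the standard library's urlencode, so the resulting query string can be inspected without contacting the server. Both dataset IDs are taken from examples on this page; combining them in one request is an assumption made for illustration.

```python
from urllib.parse import urlencode

query_specs = {
    "retrieval_type": "PRODUCT",
    "DATASET_ID": ["C1_CP_FGM_SPIN", "C3_CP_FGM_SPIN"],
    "START_DATE": "2003-03-03T12:00:00Z",
    "END_DATE": "2003-03-04T12:00:00Z",
}

# doseq=True expands the list into repeated DATASET_ID parameters.
# safe=":" keeps the colons in the timestamps readable.
query_string = urlencode(query_specs, doseq=True, safe=":")
print(query_string)
```

This is the same repeated-parameter form that the IDL example below builds by hand, one DATASET_ID= entry per dataset.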

 

MATLAB Example

Just like the Python example, change the URL and add retrieval_type=PRODUCT to the request parameters:

URL = 'https://csa.esac.esa.int/csa-sl-tap/data';
fileName=tempname;
gzFileName = [fileName '.gz'];
options = weboptions('RequestMethod', 'get', 'Timeout', Inf);
tgzFileName = websave(gzFileName, URL, 'retrieval_type', 'PRODUCT', ...
    'DATASET_ID', 'C1_CP_FGM_SPIN', ...
    'START_DATE', '2003-03-03T00:00:00Z', ...
    'END_DATE', '2003-03-04T00:00:00Z', options);
gunzip(gzFileName);
fileNames=untar(fileName);
for iFile = 1:numel(fileNames), disp(fileNames{iFile}); end

 

IDL Example

The credentials are not needed, since only synchronous download is available and this does not require logging in. Like the Python example, the URL changes and a parameter is added to the query:

function csa_product, csa_dataset_id, csa_start_date, csa_end_date, $
  delivery_format=csa_delivery_format,                     $
  delivery_interval=csa_delivery_interval,                 $
  ref_doc=csa_ref_doc, ingested_since=csa_ingested_since,  $
  max_file_size = csa_max_file_size,                       $
  include_empty=csa_include_empty

  ;define URL host and path
  csa_host = 'csa.esac.esa.int/'
  csa_product_path = 'csa-sl-tap/data'

  ; Establish error handler. When errors occur, the error index is returned in the variable Error_status:
  CATCH, Error_status

  ; Construct URL query from supplied parameters and keywords.
  csa_product_query = 'retrieval_type=PRODUCT&'

  ; Add all required DATASET_IDs
  for i = 0, n_elements(csa_dataset_id) -1 do begin
    csa_product_query=csa_product_query+'DATASET_ID='+csa_dataset_id[i]+'&'
  endfor

  ; Also basically mandatory is the start and end date (otherwise you're requesting the whole mission)
  csa_product_query = csa_product_query+'START_DATE='+csa_start_date+'&END_DATE='+csa_end_date

  ; Optional stuff:
  if keyword_set(csa_delivery_format) then csa_product_query=csa_product_query+'&DELIVERY_FORMAT='+csa_delivery_format
  if keyword_set(csa_delivery_interval) then csa_product_query=csa_product_query+'&DELIVERY_INTERVAL='+csa_delivery_interval
  if keyword_set(csa_ref_doc) then csa_product_query=csa_product_query+'&REF_DOC='+csa_ref_doc
  if keyword_set(csa_ingested_since) then csa_product_query=csa_product_query+'&INGESTED_SINCE='+csa_ingested_since
  if keyword_set(csa_max_file_size) then csa_product_query=csa_product_query+'&MAX_FILE_SIZE='+csa_max_file_size
  if keyword_set(csa_include_empty) then csa_product_query=csa_product_query+'&INCLUDE_EMPTY='+csa_include_empty

  ;Create IDLnetURL object and set properties
  csa_product_obj = obj_new('IDLnetUrl')
  csa_product_obj->SetProperty, VERBOSE=1
  csa_product_obj->SetProperty, url_scheme = 'https'
  csa_product_obj->SetProperty, url_host = csa_host
  csa_product_obj->SetProperty, url_path = csa_product_path
  csa_product_obj->SetProperty, url_query = csa_product_query

  ;send request to CSA TAP system, saving response in csa_buffer.dat
  print, 'Getting data from CSA, please wait.'
  if Error_status NE 0 then begin
    print,'ERROR index  : ',Error_status
    print,'ERROR message: ',!error_state.msg
    print,'Most likely missing data file(s). Quitting.'
    return, 0
  endif

  csa_product_response = csa_product_obj->get(filename='csa_buffer.dat')
  csa_product_obj->getproperty, response_header=csa_product_header

  ;check that a gzipped tar file (.tgz or .tar.gz) was downloaded; if so, rename the buffer to the correct filename and return that filename, otherwise return 0
  csa_filestart = strpos(csa_product_header,'filename=')
  if csa_filestart ne -1 then begin
    csa_fileend =  strpos(csa_product_header,'gz"')
    csa_filename = strmid(csa_product_header,csa_filestart+10,csa_fileend-csa_filestart-8)
    csa_dir_end = strpos(csa_product_response,'csa_buffer.dat')
    csa_working_dir = strmid(csa_product_response,0,csa_dir_end)
    file_move, csa_product_response, csa_working_dir+csa_filename
    print, 'Downloaded data to '+csa_working_dir+csa_filename
    outfile = csa_working_dir+csa_filename
    return, outfile
  endif else begin
    print, 'Something went wrong.'
    return, 0
  endelse
end

Metadata Requests

 

For a metadata request, the URL is:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=

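The default output FORMAT is VOTable; JSON or CSV can be requested instead by adding, for example, FORMAT=CSV& before QUERY=.

After QUERY=, add the mandatory SELECT <parameter> and FROM <table> clauses, plus any optional conditions or conditional statements, separating the parts with the appropriate delimiters (+ , =).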

 

Example: to get a list of all dataset IDs in CSV format, we need to query the csa.v_dataset table (SELECT dataset_id FROM csa.v_dataset):

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id+FROM+csa.v_dataset

This list will be unordered. To sort it in ascending order, add +ORDER+BY+<field_name> (or +ORDER+BY+1, which is sufficient if only one field is selected); for descending order, append +desc, as in +ORDER+BY+1+desc. The ORDER BY clause needs to go at the end of the query:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id+FROM+csa.v_dataset+ORDER+BY+dataset_id

 

Example: to get a list of all datasets whose ID includes FGM, we need to add a WHERE clause with a quoted pattern and wildcards. The wildcard is % (percent sign) rather than the more usual *, and %25 is the URL encoding of %.

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id+FROM+csa.v_dataset+WHERE+dataset_id+like+'%25FGM%25'
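
Rather than encoding spaces and percent signs by hand, the request URL can be assembled from a plainly written ADQL query. The following Python sketch uses only the standard library; the function name build_metadata_url is hypothetical, and the choice to leave quotes and commas unencoded is an assumption made so that the output matches the URLs shown above:

```python
from urllib.parse import urlencode

TAP_SYNC_URL = "https://csa.esac.esa.int/csa-sl-tap/tap/sync"


def build_metadata_url(adql, output_format="CSV"):
    """Return a synchronous TAP URL for an ADQL query written with plain spaces and %."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": output_format,
        "QUERY": adql,
    }
    # urlencode turns spaces into '+' and '%' into '%25';
    # safe="'," leaves single quotes and commas unencoded.
    return TAP_SYNC_URL + "?" + urlencode(params, safe="',")


print(build_metadata_url(
    "SELECT dataset_id FROM csa.v_dataset WHERE dataset_id like '%FGM%'"
))
```

The printed URL is identical to the FGM example above, which gives a quick way to check that a hand-written request has been encoded correctly.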

 

If you require assistance to construct a metadata request, please do not hesitate to get in touch with Helen Middleton - any enquiries will also help in the writing of the User Manual/Help pages.

 

Inventory Requests

In the old (CAIO) system, an inventory request was self-contained: the selected field was DATASET_INVENTORY and this included the fields of dataset_id, start_time, end_time, num_instances and inventory_version. In the new (TAP) system, these fields must be listed separately in the query; however, this also means that it's fully customisable.

The CAIO request asks for the inventory and gives the start time and end time:

https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET_INVENTORY&RESOURCE_CLASS=DATASET_INVENTORY&RETURN_TYPE=CSV&QUERY=DATASET_INVENTORY.DATASET_ID%20like%20'C1_CP_FGM_SPIN'%20AND%20DATASET_INVENTORY.START_TIME%20%3C=%20'2002-05-01T00:00:00Z'%20AND%20DATASET_INVENTORY.END_TIME%20%3E=%20'2002-04-01T00:00:00Z'

Broken down into its constituent parts:

https://csa.esac.esa.int/csa/aio/metadata-action?

SELECTED_FIELDS=DATASET_INVENTORY&

RESOURCE_CLASS=DATASET_INVENTORY&

RETURN_TYPE=CSV&

QUERY=

DATASET_INVENTORY.DATASET_ID%20like%20'C1_CP_FGM_SPIN'%20

AND%20

DATASET_INVENTORY.START_TIME%20%3C=%20'2002-05-01T00:00:00Z'%20

AND%20

DATASET_INVENTORY.END_TIME%20%3E=%20'2002-04-01T00:00:00Z'

 

The closest equivalent TAP command is:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+FROM+csa.v_dataset_inventory+WHERE+dataset_id='C1_CP_FGM_SPIN'+AND+start_time<='2002-05-01T00:00:00'+AND+end_time>='2002-04-01T00:00:00'+ORDER+BY+start_time

Broken down into parts, this looks like:

https://csa.esac.esa.int/csa-sl-tap/tap/sync?

REQUEST=doQuery&

LANG=ADQL&

FORMAT=CSV&

QUERY=

SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+

FROM+csa.v_dataset_inventory+

WHERE+dataset_id='C1_CP_FGM_SPIN'+

AND+

start_time<='2002-05-01T00:00:00'+

AND+

end_time>='2002-04-01T00:00:00'+

ORDER+BY+start_time

Remember that, as stated above for metadata, the TAP query results are not ordered by default; the results must be explicitly ordered by a given field name. Note also that the time conditions look slightly counterintuitive: start_time is compared with the end of the requested interval and end_time with its start, so that every record overlapping the interval is included. This has not changed since the move from CAIO.
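
The inventory query can also be generated from a dataset ID and a time interval. The sketch below is one possible way to do this in Python, assuming the same host as the metadata requests above; the function name build_inventory_url is hypothetical. Unlike the hand-written URL, it percent-encodes the < and > characters (as the CAIO request did), which is an editorial choice rather than a documented requirement.

```python
from urllib.parse import urlencode

TAP_SYNC_URL = "https://csa.esac.esa.int/csa-sl-tap/tap/sync"
INVENTORY_FIELDS = ("dataset_id", "start_time", "end_time",
                    "num_instances", "inventory_version")


def build_inventory_url(dataset_id, interval_start, interval_end,
                        fields=INVENTORY_FIELDS):
    """Return a TAP URL listing inventory records that overlap the interval."""
    adql = (
        f"SELECT {','.join(fields)} "
        "FROM csa.v_dataset_inventory "
        f"WHERE dataset_id='{dataset_id}' "
        # Overlap test: a record qualifies if it starts before the
        # interval ends and ends after the interval starts.
        f"AND start_time<='{interval_end}' "
        f"AND end_time>='{interval_start}' "
        "ORDER BY start_time"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL",
              "FORMAT": "CSV", "QUERY": adql}
    # Spaces become '+', '<' and '>' become %3C and %3E;
    # quotes, commas, '=' and ':' are left unencoded for readability.
    return TAP_SYNC_URL + "?" + urlencode(params, safe="',=:")


print(build_inventory_url("C1_CP_FGM_SPIN",
                          "2002-04-01T00:00:00", "2002-05-01T00:00:00"))
```

Because the fields are passed as a list, the same function can request a subset of the inventory columns, which reflects the customisability mentioned above.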