Asynchronous Data Requests - CSA Guide

ASYNCHRONOUS DATA REQUESTS (UP TO 50 GB)

If the requested data package is larger than 1 GB (compressed), an asynchronous data request is required. Two changes are necessary compared to the Synchronous Data Request:

Authentication is required - either in the web interface, or with wget, curl or a programming language.
The keyword 'RETRIEVAL_ACCESS' is added, with the value 'DEFERRED' (note that DEFERRED is case sensitive).

After this request is made, the CSA server will assemble the package, then send an email to the registered address when it is ready for collection - so make sure that the email registered to the CSA user ID is up to date.

Alternatively, rather than waiting for an email, details are provided below on how to check the status of the asynchronous job, how to tell if it is complete and ready for download, and where to go to get the completed package.

The package will be retained on the CSA server for 3 days.

If you want to download large portions of the archive, e.g., one dataset for the entire mission, use an asynchronous request. Note that very large files cannot be converted to CDF because the conversion software cannot handle them - either reduce the size of the file or download as CEF (ASCII with a metadata header).

THE BASE

The details of the request need to be built on the following base:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&

Then add the DATASET_ID, START_DATE and END_DATE details, which are mandatory:

PARAMETER NAME(S)	DESCRIPTION	MANDATORY?	POSSIBLE VALUES
RETRIEVAL_TYPE	Type of retrieval. PRODUCT (Cluster data file) or HEADER	YES	PRODUCT or HEADER
RETRIEVAL_ACCESS	Type of retrieval access - DEFERRED in this case	YES (for asynchronous requests)	DEFERRED (case sensitive)
DATASET_ID	Unique identifier of the dataset whose data we want to retrieve. It can appear more than once, and can contain wild cards ('*')	YES	Any of the Cluster datasets.
START_DATE	Start time of the time range that the retrieved files should cover.	YES	ISO time format: 'yyyy-MM-ddTHH:mm:ssZ'
END_DATE	End time of the time range that the retrieved files should cover.	YES	ISO time format: 'yyyy-MM-ddTHH:mm:ssZ'

If the dates aren't included, the request will be accepted but the dates will default to the start and end of the Cluster mission.
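As a minimal sketch of how the base and the mandatory parameters fit together, the request URL can be assembled in Python. The function name build_async_url is hypothetical and only used for illustration; the parameter names and the base URL are the ones given above.

```python
from urllib.parse import urlencode

BASE_URL = "https://csa.esac.esa.int/csa-sl-tap/data"


def build_async_url(dataset_ids, start_date, end_date, **optional):
    """Return a deferred (asynchronous) data request URL.

    dataset_ids is a list because DATASET_ID may appear more than once.
    Dates are strings in the 'yyyy-MM-ddTHH:mm:ssZ' format.
    """
    params = [("RETRIEVAL_TYPE", "product"), ("RETRIEVAL_ACCESS", "DEFERRED")]
    params += [("DATASET_ID", ds) for ds in dataset_ids]
    params += [("START_DATE", start_date), ("END_DATE", end_date)]
    # Optional keywords are upper-cased, e.g. delivery_format -> DELIVERY_FORMAT.
    params += [(name.upper(), str(value)) for name, value in optional.items()]
    # Leave '*' and ':' unescaped so the URL looks like the ones in this guide.
    return BASE_URL + "?" + urlencode(params, safe="*:")


print(build_async_url(["C3_CP_ASP_ION*"],
                      "2004-06-18T11:35:00Z", "2004-06-19T18:35:00Z"))
```

Optional parameters can be passed as keyword arguments, for example delivery_format="CDF".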

The optional parameters are listed here:

PARAMETER NAME(S)	DESCRIPTION	MANDATORY?	POSSIBLE VALUES
DELIVERY_FORMAT	Delivery format, applicable only to files originally in CEF. By default, CEF	NO	CEF, CDF, CDF_2_7 or CDF_ISTP
DELIVERY_INTERVAL	Specifies the delivery file interval in time. By default, "Daily"	NO	All, Daily, TenMin, Hourly, hourly3, hourly6, hourly12
FLAT	By default, FLAT=FALSE and files will be returned in a directory structure for each dataset ID. With FLAT=TRUE, all files will be returned in a single directory.	NO	FALSE or TRUE; 0 or 1
FORCEPACK	By default TRUE: package will be tarballed, then gzipped. If FALSE, package is not tarballed, and a zip will download.	NO	TRUE or FALSE; 1 or 0
GZIP	By default, GZIP=FALSE and the files inside the package are not gzipped themselves; make GZIP=TRUE to have all the files within the package gzipped - good for large CEF files, and many programs can read gzipped ASCII files without gunzipping. CURRENTLY THERE IS A BUG FOR THIS - GZIP=TRUE REQUESTS WILL FAIL - WORK IN PROGRESS	NO	FALSE or TRUE
INCLUDE_EMPTY	By default, files for the requested dataset that are empty (i.e., contain no records) are included in the delivery. Set this to 0 if empty files are not wanted.	NO	0 or 1
INGESTED_SINCE	Only download files that have been ingested since a particular date.	NO	ISO time format: 'yyyy-MM-ddTHH:mm:ssZ'
MAX_FILE_SIZE	The maximum amount of uncompressed data, in kB, that can be downloaded. If the query results in a delivery that exceeds this size, the user will receive an error message.	NO	A positive integer representing the maximum file size in kB
REF_DOC	By default, the files for the requested dataset are delivered together with the files of any available datasets referenced in its metadata. Referenced documents are typically CAVEAT files. If the referenced documents are CEF files, only the requested interval is delivered; if they are non-CEF files, the file(s) containing the requested interval are delivered. Set this to 0 if the referenced files are not wanted.	NO	0 or 1
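Because a mistyped optional value is only discovered after the request has been sent, it may help to check values locally first. The following is a rough sketch, not part of the CSA interface: the function name check_optional is hypothetical, and it assumes that the server expects values spelled exactly as in the table above (only DEFERRED is documented here as case sensitive).

```python
# Allowed values copied from the optional-parameter table above.
ALLOWED = {
    "DELIVERY_FORMAT": {"CEF", "CDF", "CDF_2_7", "CDF_ISTP"},
    "DELIVERY_INTERVAL": {"All", "Daily", "TenMin", "Hourly",
                          "hourly3", "hourly6", "hourly12"},
    "FLAT": {"FALSE", "TRUE", "0", "1"},
    "FORCEPACK": {"TRUE", "FALSE", "1", "0"},
    "GZIP": {"FALSE", "TRUE"},
    "INCLUDE_EMPTY": {"0", "1"},
    "REF_DOC": {"0", "1"},
}


def check_optional(params):
    """Return a list of human-readable problems found in params (a dict)."""
    problems = []
    for name, value in params.items():
        allowed = ALLOWED.get(name)
        # ASSUMPTION: exact, case-sensitive match against the table.
        if allowed is not None and str(value) not in allowed:
            problems.append(f"{name}={value} is not one of {sorted(allowed)}")
    size = params.get("MAX_FILE_SIZE")
    if size is not None and not (str(size).isdigit() and int(size) > 0):
        problems.append("MAX_FILE_SIZE must be a positive integer (kB)")
    if str(params.get("GZIP", "FALSE")) == "TRUE":
        # The table above reports that GZIP=TRUE requests currently fail.
        problems.append("GZIP=TRUE is reported as currently failing")
    return problems


print(check_optional({"DELIVERY_FORMAT": "CDF", "DELIVERY_INTERVAL": "Hourly"}))
```

An empty list means that no problem was found; free-form values such as INGESTED_SINCE are not checked in this sketch.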

USE CASE:

So, to take the first direct data request use case (Data UC1, Data Requests page):

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C3_CP_ASP_ION*&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

To make this an asynchronous data request, authentication is required first, at:

https://csa.esac.esa.int/csa-sl-tap/login

and then include RETRIEVAL_ACCESS=DEFERRED:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&DATASET_ID=C3_CP_ASP_ION*&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

and an email containing the data collection link will be sent to the registered email address as soon as the package is ready.
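For scripted use, the same two steps (log in, then send the deferred request) might look like the sketch below, using only the Python standard library. This is an illustration under assumptions: the form field names 'username' and 'password' are not confirmed on this page and should be checked against the CSA authentication instructions, and the function names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://csa.esac.esa.int/csa-sl-tap/login"


def make_opener():
    """Return an opener that remembers the session cookie set at login."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(username, password):
    """Build (but do not send) the login POST request."""
    # ASSUMPTION: the login form fields are named 'username' and 'password'.
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(LOGIN_URL, data=body.encode("ascii"),
                                  method="POST")


def submit_deferred(opener, username, password, request_url):
    """Log in, send the deferred request and return the XML response text."""
    with opener.open(build_login_request(username, password)) as resp:
        resp.read()
    with opener.open(request_url) as resp:
        return resp.read().decode("utf-8")
```

Here request_url is a full request such as the RETRIEVAL_ACCESS=DEFERRED example above; the same opener should be reused for later status checks so that the session cookie is sent again.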

Example of the email sent when package is ready for collection, including the link to retrieve the package, where it will be kept for 3 days:

The XML Response

Performing the request generates a response in the form of an XML file containing details of the request. Putting the request into the address bar of a browser will display the XML in the browser as shown below. However, if you wish to perform asynchronous requests within a script (particularly if you don't want to rely on the email for the collection address), the status and the collection address for the package must be read from the contents of the XML.

The following screenshot shows the XML response to an asynchronous request made in the address bar of a browser. If you are not familiar with reading XML files, the image below shows that the content is arranged in fields, each with a start tag and an end tag (the end tag is marked by '/'); you may need developer tools to be active in your browser to view it like this.

For example, the first field highlighted in this screenshot is for the <uws:phase>:

<uws:phase>EXECUTING</uws:phase>

in this case the value is EXECUTING, which means that the package is still being assembled.
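In a script, the phase can be read from the XML text with the standard library. The sketch below is an illustration only: the function name job_phase is hypothetical, and the sample document is a trimmed stand-in whose root element and namespace URI are assumed from the UWS standard, not copied from a real CSA response.

```python
import xml.etree.ElementTree as ET


def job_phase(xml_text):
    """Return the text of the first <uws:phase> element, or None if absent."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree expands 'uws:phase' to '{namespace-uri}phase';
        # compare only the local part so the exact URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "phase":
            return (elem.text or "").strip()
    return None


# Trimmed, illustrative document (not a full server response).
SAMPLE = (
    '<uws:job xmlns:uws="http://www.ivoa.net/xml/UWS/v1.0">'
    "<uws:phase>EXECUTING</uws:phase>"
    "</uws:job>"
)
print(job_phase(SAMPLE))  # EXECUTING
```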

The status of the job can be checked by reloading the page; note that this is also the URL value inside the <uws:parameter id="email_base_url">, the second field highlighted below; for example:

<uws:parameter id="email_base_url">
<![CDATA[ http://csa.esac.esa.int/csa-sl-tap/tap/async/1651071708120OPE ]]>
</uws:parameter>
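The status URL can be extracted from the same XML in a script. The following sketch assumes the structure shown above (a uws:parameter element with id="email_base_url" whose content is a CDATA section); the function name status_url and the enclosing sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def status_url(xml_text):
    """Return the URL stored in the 'email_base_url' parameter, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        is_parameter = elem.tag.rsplit("}", 1)[-1] == "parameter"
        if is_parameter and elem.get("id") == "email_base_url":
            # The CDATA content arrives as plain text; strip the padding
            # spaces and line breaks around the URL.
            return (elem.text or "").strip()
    return None


# Illustrative wrapper around the parameter shown above.
SAMPLE = """<uws:job xmlns:uws="http://www.ivoa.net/xml/UWS/v1.0">
<uws:parameter id="email_base_url">
<![CDATA[ http://csa.esac.esa.int/csa-sl-tap/tap/async/1651071708120OPE ]]>
</uws:parameter>
</uws:job>"""
print(status_url(SAMPLE))
```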

Reload the page until the <uws:phase> changes to COMPLETED (see image at bottom of page):

<uws:phase>COMPLETED</uws:phase>

then there will be a value inside the <uws:results> field at the bottom of the XML (see image at bottom of page), for example:

<uws:results>
<uws:result id="hmiddlet1651071708155" xlink:type="simple" xlink:href="http://csa.esac.esa.int/csa-sl-tap/tap/async/1651071708120OPE/results/hmiddlet1651071708155" size="953398938"/>
</uws:results>

The package can be downloaded from the URL given in the xlink:href attribute of the <uws:result> element inside the <uws:results> field.
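The reload-until-COMPLETED procedure described above can be sketched as a polling loop. This is an illustration under assumptions: fetch is a caller-supplied function that returns the XML text for a URL (for example via an authenticated session), the function names are hypothetical, and the ERROR and ABORTED phase names come from the UWS standard and are not confirmed on this page.

```python
import time
import xml.etree.ElementTree as ET


def _local(name):
    """Strip the '{namespace}' prefix that ElementTree adds to names."""
    return name.rsplit("}", 1)[-1]


def read_job(xml_text):
    """Return (phase, [(href, size), ...]) from a job XML document."""
    root = ET.fromstring(xml_text)
    phase, results = None, []
    for elem in root.iter():
        if _local(elem.tag) == "phase":
            phase = (elem.text or "").strip()
        elif _local(elem.tag) == "result":
            # Find xlink:href whatever URI the xlink prefix is bound to.
            href = next((v for k, v in elem.attrib.items()
                         if _local(k) == "href"), None)
            results.append((href, elem.get("size")))
    return phase, results


def wait_for_results(fetch, url, interval_s=60, max_checks=100,
                     sleep=time.sleep):
    """Poll url until the phase is COMPLETED, then return the result list."""
    for _ in range(max_checks):
        phase, results = read_job(fetch(url))
        if phase == "COMPLETED":
            return results
        # ASSUMPTION: failure phases follow the UWS standard names.
        if phase in ("ERROR", "ABORTED"):
            raise RuntimeError(f"job ended in phase {phase}")
        sleep(interval_s)
    raise TimeoutError("job did not complete within the allowed checks")
```

Each returned pair holds the download URL and the size attribute as given in the XML. The polling interval and the number of checks are arbitrary defaults chosen for this sketch.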

The COMPLETED package XML: