ASYNCHRONOUS DATA REQUESTS (UP TO 50 GB)

 

If the requested data package is larger than 1 GB (compressed), an asynchronous data request is required. Two changes are needed compared with a synchronous data request:

  1. Authentication is required, either through the web interface or with wget, curl or a programming language.

  2. The keyword 'RETRIEVAL_ACCESS' is added with the value 'DEFERRED' (note that DEFERRED is case sensitive).

After the request is made, the CSA server assembles the package and then sends an email to the registered address when it is ready for collection, so make sure that the email address registered to your CSA user ID is up to date.

Alternatively, rather than waiting for the email, you can check the job yourself: details are provided below on how to check the status of the asynchronous job, how to tell when it is complete and ready for download, and where to collect the completed package.

The package will be retained on the CSA server for 3 days.

 

Asynchronous requests are the way to download large portions of the archive, e.g., one dataset for the entire mission. Note that very large files cannot be converted to CDF because the conversion software cannot handle them; either reduce the size of the file or download it as CEF (ASCII with a metadata header).

THE BASE

Every request is built on the following base:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&

Then add the DATASET_ID, START_DATE and END_DATE parameters, which are mandatory:

RETRIEVAL_TYPE
  Description: Type of retrieval: PRODUCT (Cluster data file) or HEADER.
  Mandatory: Yes
  Possible values: PRODUCT or HEADER

RETRIEVAL_ACCESS
  Description: Type of retrieval access; DEFERRED in this case.
  Mandatory: Yes, for asynchronous requests
  Possible values: DEFERRED (case sensitive)

DATASET_ID
  Description: Unique identifier of the dataset whose data we want to retrieve. It can appear more than once, and can contain wildcards ('*').
  Mandatory: Yes
  Possible values: Any of the Cluster datasets

START_DATE
  Description: Start time of the time range that the retrieved files should cover.
  Mandatory: Yes
  Possible values: ISO time format 'yyyy-MM-ddTHH:mm:ssZ'

END_DATE
  Description: End time of the time range that the retrieved files should cover.
  Mandatory: Yes
  Possible values: ISO time format 'yyyy-MM-ddTHH:mm:ssZ'

If the dates are omitted, the request is still accepted, but the dates default to the start and end of the Cluster mission.
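As a minimal sketch, the request URL can be assembled in Python from the base and the mandatory parameters. The helper name build_async_url is hypothetical. It passes safe="*:" to urlencode so that wildcards and the colons in the dates stay readable and the result matches the URLs shown on this page; percent-encoded forms (%2A, %3A) should be equivalent, but that is an assumption about the server.

```python
from urllib.parse import urlencode

# Base of every asynchronous request, as given above.
BASE = "https://csa.esac.esa.int/csa-sl-tap/data"


def build_async_url(dataset_ids, start_date, end_date, **optional):
    """Build a deferred (asynchronous) CSA data request URL.

    dataset_ids: one dataset ID or a list of IDs (wildcards '*' allowed);
                 a list yields one DATASET_ID=... pair per entry.
    start_date, end_date: ISO strings such as '2004-06-18T11:35:00Z'.
    optional: optional parameters, e.g. DELIVERY_FORMAT="CDF".
    """
    if isinstance(dataset_ids, str):
        dataset_ids = [dataset_ids]
    params = [
        ("RETRIEVAL_TYPE", "product"),
        ("RETRIEVAL_ACCESS", "DEFERRED"),  # value is case sensitive
    ]
    params += [("DATASET_ID", d) for d in dataset_ids]
    params += [("START_DATE", start_date), ("END_DATE", end_date)]
    params += list(optional.items())
    # Keep '*' and ':' unescaped instead of %2A / %3A.
    return BASE + "?" + urlencode(params, safe="*:")
```

For example, build_async_url("C3_CP_ASP_ION*", "2004-06-18T11:35:00Z", "2004-06-19T18:35:00Z") produces exactly the asynchronous use-case URL given further down this page. Passing a list of IDs repeats DATASET_ID, which the table above allows.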

 

The optional parameters are listed here:

DELIVERY_FORMAT
  Description: Delivery format, applicable only to files originally in CEF. Default: CEF.
  Mandatory: No
  Possible values: CEF, CDF, CDF_2_7 or CDF_ISTP

DELIVERY_INTERVAL
  Description: Time interval covered by each delivered file. Default: "Daily".
  Mandatory: No
  Possible values: All, Daily, TenMin, Hourly, hourly3, hourly6, hourly12

FLAT
  Description: By default, FLAT=FALSE and files are returned in a directory structure for each dataset ID. With FLAT=TRUE, all files are returned in a single directory.
  Mandatory: No
  Possible values: FALSE or TRUE; 0 or 1

FORCEPACK
  Description: Default: TRUE, and the package is packed into a tar archive and then gzipped. If FALSE, the package is not tarred and a zip file is downloaded instead.
  Mandatory: No
  Possible values: TRUE or FALSE; 1 or 0

GZIP
  Description: By default, GZIP=FALSE and the files inside the package are not individually gzipped. Set GZIP=TRUE to gzip every file within the package; this is useful for large CEF files, and many programs can read gzipped ASCII files without gunzipping them first.
  Note: there is currently a bug affecting this parameter, and GZIP=TRUE requests will fail. A fix is in progress.
  Mandatory: No
  Possible values: FALSE or TRUE

INCLUDE_EMPTY
  Description: By default, dataset files that are empty (contain no records) are included in the delivery. Set this to 0 if empty files are not wanted.
  Mandatory: No
  Possible values: 0 or 1

INGESTED_SINCE
  Description: Only download files that have been ingested since a particular date.
  Mandatory: No
  Possible values: ISO time format 'yyyy-MM-ddTHH:mm:ssZ'

MAX_FILE_SIZE
  Description: Maximum amount of uncompressed data, in kB, that the delivery may contain. If the query results in a delivery larger than this, an error message is returned.
  Mandatory: No
  Possible values: A positive integer representing the maximum size in kB

REF_DOC
  Description: By default, the requested dataset files are delivered together with the files of the available datasets referenced in the requested dataset's metadata; these referenced documents are typically CAVEAT files. If a referenced document is a CEF file, only the requested interval is delivered; if it is a non-CEF file, the file(s) containing the requested interval are delivered. Set this to 0 if the referenced files are not wanted.
  Mandatory: No
  Possible values: 0 or 1
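Before submitting a large request, it can help to check the optional parameters locally against the values listed above. The sketch below does that with a hypothetical helper, check_optional; it treats the listed values as exact, case-sensitive strings, which is an assumption (the page does not say whether the server accepts other spellings), and it flags GZIP=TRUE because of the known bug.

```python
import re

# Allowed values, copied from the optional-parameter list above.
ALLOWED = {
    "DELIVERY_FORMAT": {"CEF", "CDF", "CDF_2_7", "CDF_ISTP"},
    "DELIVERY_INTERVAL": {"All", "Daily", "TenMin", "Hourly",
                          "hourly3", "hourly6", "hourly12"},
    "FLAT": {"FALSE", "TRUE", "0", "1"},
    "FORCEPACK": {"TRUE", "FALSE", "1", "0"},
    "GZIP": {"FALSE", "TRUE"},
    "INCLUDE_EMPTY": {"0", "1"},
    "REF_DOC": {"0", "1"},
}
# 'yyyy-MM-ddTHH:mm:ssZ'
ISO_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def check_optional(params):
    """Return a list of problems found in a dict of optional parameters."""
    problems = []
    for name, value in params.items():
        text = str(value)
        if name in ALLOWED:
            if text not in ALLOWED[name]:
                problems.append(
                    f"{name}={text}: expected one of {sorted(ALLOWED[name])}")
        elif name == "INGESTED_SINCE":
            if not ISO_UTC.match(text):
                problems.append(f"{name}={text}: expected yyyy-MM-ddTHH:mm:ssZ")
        elif name == "MAX_FILE_SIZE":
            if not (text.isdigit() and int(text) > 0):
                problems.append(f"{name}={text}: expected a positive integer (kB)")
        else:
            problems.append(f"{name}: not one of the listed optional parameters")
    # GZIP=TRUE is valid but currently fails on the server (known bug).
    if str(params.get("GZIP", "")).upper() == "TRUE":
        problems.append("GZIP=TRUE: requests currently fail (known bug)")
    return problems
```

An empty list means every optional parameter matches the documented values; the parameters can then be appended to the request URL as NAME=VALUE pairs.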

USE CASE:

To take the first direct data request use case (Data UC1 on the Data Requests page):

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C3_CP_ASP_ION*&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

To make this an asynchronous data request, first authenticate at:

https://csa.esac.esa.int/csa-sl-tap/login

and then include RETRIEVAL_ACCESS=DEFERRED:

https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&DATASET_ID=C3_CP_ASP_ION*&START_DATE=2004-06-18T11:35:00Z&END_DATE=2004-06-19T18:35:00Z

and an email containing the data collection link will be sent to the registered email address as soon as the package is ready.
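In a script, the login and the submission can be done in one session. This is a sketch under assumptions the page does not confirm: that the login endpoint accepts a form POST with fields named username and password, and that the resulting session is kept in a cookie. The function names login_request, open_session and submit are hypothetical; check the field names against the CSA documentation before relying on this.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://csa.esac.esa.int/csa-sl-tap/login"


def login_request(username, password):
    """Build the POST request for the login endpoint.

    ASSUMPTION: the form field names 'username' and 'password'
    are not given on this page.
    """
    body = urllib.parse.urlencode({"username": username,
                                   "password": password}).encode()
    return urllib.request.Request(LOGIN_URL, data=body, method="POST")


def open_session(username, password):
    """Log in and return an opener that keeps the session cookie.

    ASSUMPTION: authentication is carried by a session cookie.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    with opener.open(login_request(username, password)) as resp:
        resp.read()  # the login response body itself is not used
    return opener


def submit(opener, request_url):
    """Submit a DEFERRED request and return the XML job description."""
    with opener.open(request_url) as resp:
        return resp.read().decode("utf-8")
```

Usage would be opener = open_session(user, password) followed by submit(opener, url) with the asynchronous URL above; the returned text is the XML response described in the next section.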

Example of the email sent when the package is ready for collection. It includes the link to retrieve the package, which is kept on the server for 3 days:

THE XML RESPONSE

Performing the request returns an XML document containing details of the request. Entering the request in the address bar of a browser displays this XML in the browser, as shown below. If you perform asynchronous requests from a script, however (particularly if you don't want to rely on the email for the collection address), you need to read the job status and the package collection address from this XML.

The following screenshot shows the XML response to an asynchronous request made in the address bar of a browser. If you are not familiar with XML, the content is arranged in fields, each with an opening tag and a closing tag (the closing tag is marked by '/'). You may need to enable your browser's developer tools to view it formatted like this.

For example, the first field highlighted in the screenshot is <uws:phase>:

<uws:phase>EXECUTING</uws:phase>

in this case the value is EXECUTING, which means that the package is still being assembled. 

The status of the job can be checked by reloading the page. The page's URL is also the value inside <uws:parameter id="email_base_url">, the second field highlighted below, for example:

<uws:parameter id="email_base_url">
<![CDATA[ http://csa.esac.esa.int/csa-sl-tap/tap/async/1651071708120OPE ]]>
</uws:parameter>
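Both values can also be read programmatically. A minimal sketch with Python's standard ElementTree parser: job_status is a hypothetical helper, and it matches elements by local name (phase, parameter) so that it does not depend on the exact namespace URI the server declares for the uws: prefix.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def job_status(xml_text):
    """Return (phase, job_url) from an asynchronous-job XML response.

    phase is the text of <uws:phase>; job_url is the (CDATA) text of
    <uws:parameter id="email_base_url">, or None if it is absent.
    """
    root = ET.fromstring(xml_text)
    phase, job_url = None, None
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "phase" and phase is None:
            phase = (elem.text or "").strip()
        elif name == "parameter" and elem.get("id") == "email_base_url":
            # CDATA content arrives as ordinary text; trim the padding.
            job_url = (elem.text or "").strip()
    return phase, job_url
```

For the response shown above, this would return the phase "EXECUTING" and the job URL ending in 1651071708120OPE.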

Reload the page until the <uws:phase> changes to COMPLETED (see image at bottom of page):

<uws:phase>COMPLETED</uws:phase>

Once the phase is COMPLETED, the <uws:results> field at the bottom of the XML contains a value (see image at bottom of page), e.g.:

<uws:results>
<uws:result id="hmiddlet1651071708155" xlink:type="simple" xlink:href="http://csa.esac.esa.int/csa-sl-tap/tap/async/1651071708120OPE/results/hmiddlet1651071708155" size="953398938"/>
</uws:results>

The package can be downloaded from the URL in the xlink:href attribute of the <uws:result> element inside the <uws:results> field.
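In a script, "reload until COMPLETED" becomes a polling loop. The sketch below is illustrative only: fetch is any function that returns the job XML as text (for instance one built on the session used to submit the request), the 60-second interval and 6-hour timeout are arbitrary choices, and treating ERROR and ABORTED as failed final phases comes from the IVOA UWS standard rather than from this page. The result URLs are read from the xlink:href attribute in the standard XLink namespace; wait_for_results and download are hypothetical names.

```python
import shutil
import time
import urllib.request
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_job(xml_text):
    """Return (phase, [result URLs]) from a job XML document."""
    root = ET.fromstring(xml_text)
    phase, urls = None, []
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "phase" and phase is None:
            phase = (elem.text or "").strip()
        elif name == "result" and elem.get(XLINK_HREF):
            urls.append(elem.get(XLINK_HREF))
    return phase, urls


def wait_for_results(job_url, fetch, interval=60.0, timeout=6 * 3600.0,
                     sleep=time.sleep):
    """Poll job_url until the phase is COMPLETED; return the result URLs.

    Raises RuntimeError if the job ends in ERROR or ABORTED (UWS phases,
    assumed here), TimeoutError if it is still running after `timeout` s.
    """
    waited = 0.0
    while True:
        phase, urls = parse_job(fetch(job_url))
        if phase == "COMPLETED":
            return urls
        if phase in ("ERROR", "ABORTED"):
            raise RuntimeError(f"job ended in phase {phase}")
        if waited >= timeout:
            raise TimeoutError(f"still {phase} after {waited:.0f} s")
        sleep(interval)
        waited += interval


def download(url, filename, opener=None):
    """Stream one result package to `filename`.

    Pass the authenticated opener if the server requires the session
    for downloads (not stated on this page).
    """
    opener = opener or urllib.request.build_opener()
    with opener.open(url) as resp, open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Remember that the package is only kept for 3 days, so a script should download it soon after the phase changes to COMPLETED.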

The COMPLETED package XML: