Cone search > Bulk download¶

Release number: v1.1 (2022-08-06)

Applicable Gaia Data Releases: Gaia EDR3, Gaia DR3

Author: Héctor Cánovas Cabrera; hector.canovas@esa.int

Summary:

This code computes the list of Gaia (E)DR3 files associated with a circular region of the sky defined by the user. The granularity of this region is set by the selected HEALPix level.

Input parameters:

the target catalogue (e.g., gaia_source, auxiliary/agn_cross_id, or auxiliary/frame_rotator_source),
the cone-search parameters (centre and radius), and
the desired HEALPix level.

Once the variables above are set, the notebook creates a reference file that contains the min/max HEALPix index (levels 6, 7, 8, and 9) encompassed by each gaia_source file available in the (E)DR3 bulk download directory. The conversion between the different HEALPix levels is done by means of bit-shifting operations.
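The bit-shifting conversion can be illustrated with a minimal sketch. It assumes the nested HEALPix numbering scheme (the one used later in this notebook), in which each pixel splits into four children at the next level, so that moving one level up or down corresponds to a shift of two bits. The helper names `to_coarser_level` and `to_finer_range` are hypothetical and are not part of the notebook.

```python
def to_coarser_level(index, level_from, level_to):
    """Return the nested-scheme parent pixel of `index` at a coarser level."""
    if level_to > level_from:
        raise ValueError('level_to must not be finer than level_from')
    # Each level step removes two bits (four children per parent pixel)
    return index >> (2 * (level_from - level_to))


def to_finer_range(index, level_from, level_to):
    """Return the (min, max) child pixel indices of `index` at a finer level."""
    if level_to < level_from:
        raise ValueError('level_to must not be coarser than level_from')
    shift = 2 * (level_to - level_from)
    first = index << shift
    # The children occupy a contiguous block of 4**(level difference) indices
    return first, first + (1 << shift) - 1


print(to_coarser_level(3112, 8, 6))  # 194
print(to_finer_range(3111, 8, 9))    # (12444, 12447)
```

The two printed values agree with the first rows of the reference table produced by this notebook: the level-8 index 3112 maps to the level-6 index 194, and the level-8 index 3111 covers the level-9 indices up to 12447.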

Useful URLs:

import os
from datetime import datetime
import numpy as np
import pandas as pd
from astropy import units as u
from astropy_healpix import HEALPix

Set input variables¶

Default input parameters:

DR3 = True (default value). Set it to False to retrieve EDR3 files.
target_table = 'gaia_source'. Alternative values: 'Astrophysical_parameters/astrophysical_parameters', 'Variability/vari_cepheid', etc. See the full content at: http://cdn.gea.esac.esa.int/Gaia/gdr3/ & http://cdn.gea.esac.esa.int/Gaia/gedr3/
Cone-search parameters: radius = 0.5 degrees, centred on the Large Magellanic Cloud (in ICRS coordinates).
HEALPix level = 6 (choose a larger one to increase granularity, and vice versa).

# Set input parameters below ===========
DR3            = True            # Set it to False to select EDR3
target_table   = 'gaia_source'   # Alternative values: 'Astrophysical_parameters/astrophysical_parameters', etc (no trailing slash)
hpx_level      = 6
lon            =  80.894 * u.deg # Right Ascension (ICRS)
lat            = -69.756 * u.deg # Declination (ICRS)
radius         = 0.5     * u.deg
print('Input Variables: ')
print(f'* HEALPix level                      = {hpx_level} ')
print(f'* ICRS longitude (~ Right Ascension) = {lon} ')
print(f'* ICRS latitude  (~ Declination)     = {lat} ')
print(f'* Radius                             = {radius} ')
print()

Input Variables:
* HEALPix level                      = 6
* ICRS longitude (~ Right Ascension) = 80.894 deg
* ICRS latitude  (~ Declination)     = -69.756 deg
* Radius                             = 0.5 deg

# Download basic parameters ============
output_file    = 'bulk_download_files.txt'
output_dir     = 'downloads'

if os.path.isdir(output_dir):
   now = datetime.now()
   output_dir_2 = output_dir + now.strftime("_%Y-%m-%d")
   print(f'>> Warning: {output_dir} directory exists. Creating alternative directory: {output_dir_2}')
   print()
   output_dir   = output_dir_2

# Create the selected directory (portable replacement for the "mkdir" system call)
os.makedirs(output_dir, exist_ok=True)

print(f'Files will be downloaded to: {output_dir}')

Files will be downloaded to: downloads

Create reference file¶

if DR3:
   gaia_dr_flag = 'DR3'
else:
   gaia_dr_flag = 'EDR3'

print('='*120)
print(f'Preparing selection of Gaia {gaia_dr_flag}: "{target_table}" files')
print('='*120)

url_prefix      = f'http://cdn.gea.esac.esa.int/Gaia/g{gaia_dr_flag.lower()}/{target_table}/'
md5sum_file_url = url_prefix + '_MD5SUM.txt'
# sep=r'\s+' splits on whitespace (equivalent to the deprecated delim_whitespace=True)
md5sum_file     = pd.read_csv(md5sum_file_url, header=None, sep=r'\s+', names=['md5Sum', 'file'])

if DR3:
   # The last row of the "_MD5SUM.txt" file in the DR3 directories holds the md5Sum value of the _MD5SUM.txt file itself
   md5sum_file.drop(md5sum_file.tail(1).index, inplace=True)
md5sum_file

========================================================================================================================
Preparing selection of Gaia DR3: "gaia_source" files
========================================================================================================================

# Extract HEALPix level-8 from file name ======================================
healpix_8_min  = [int(file[file.find('_')+1:file.rfind('-')])     for file in md5sum_file['file']]
healpix_8_max  = [int(file[file.rfind('-')+1:file.rfind('.csv')]) for file in md5sum_file['file']]
reference_file = pd.DataFrame({'file':md5sum_file['file'], 'healpix8_min':healpix_8_min, 'healpix8_max':healpix_8_max}).reset_index(drop=True)

# Compute HEALPix levels 6,7, and 9 ===========================================
reference_file['healpix7_min'] = [inp >> 2 for inp in reference_file['healpix8_min']]
reference_file['healpix7_max'] = [inp >> 2 for inp in reference_file['healpix8_max']]

reference_file['healpix6_min'] = [inp >> 2 for inp in reference_file['healpix7_min']]
reference_file['healpix6_max'] = [inp >> 2 for inp in reference_file['healpix7_max']]

reference_file['healpix9_min'] = [inp << 2       for inp in reference_file['healpix8_min']]
reference_file['healpix9_max'] = [(inp << 2) + 3 for inp in reference_file['healpix8_max']]

# Generate reference file =====================================================
ncols          = ['file', 'healpix6_min', 'healpix6_max', 'healpix7_min', 'healpix7_max', 'healpix8_min', 'healpix8_max', 'healpix9_min', 'healpix9_max']
reference_file = reference_file[ncols]
reference_file
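As an alternative to the string slicing used above, the min/max level-8 indices can be read from a file name with a regular expression. This is only a sketch: it assumes that every data file follows the `<Prefix>_<min>-<max>.csv.gz` naming pattern seen in the listing of this notebook, and the names `FILE_PATTERN` and `parse_healpix_range` are hypothetical.

```python
import re

# Assumed naming pattern: "<Prefix>_<min>-<max>.csv.gz"
FILE_PATTERN = re.compile(r'_(\d+)-(\d+)\.csv')


def parse_healpix_range(file_name):
    """Return the (min, max) HEALPix level-8 indices encoded in a file name."""
    match = FILE_PATTERN.search(file_name)
    if match is None:
        # Raised for entries that are not data files, e.g. "_MD5SUM.txt"
        raise ValueError(f'Unexpected file name: {file_name}')
    return int(match.group(1)), int(match.group(2))


print(parse_healpix_range('GaiaSource_000000-003111.csv.gz'))  # (0, 3111)
```

A file name that does not match the pattern raises an error instead of silently producing a wrong index, which makes stray rows in the checksum listing easier to spot.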

Compute Healpix indexes associated to the selected circular region¶

print('='*120)
print(f'Computing HEALPix Level {hpx_level} encompassing a Cone Search (Radius, longitude, latitude): {radius.value} {radius.unit},  {lon.value} {lon.unit}, {lat.value} {lat.unit}')
print('='*120)

hp             = HEALPix(nside=2**hpx_level, order='nested')
hp_cone_search = hp.cone_search_lonlat(lon, lat, radius = radius)

========================================================================================================================
Computing HEALPix Level 6 encompassing a Cone Search (Radius, longitude, latitude): 0.5 deg,  80.894 deg, -69.756 deg
========================================================================================================================

Download files¶

A .txt file with the list of files to be downloaded is created first. This file is then read and the listed files are downloaded sequentially. A progress message is shown in the terminal from which this notebook was launched.

subset = []
with open(output_file, "w") as f:   # The file is closed automatically at the end of the block
    for index in reference_file.index:
        row = reference_file.iloc[index]
        hp_min, hp_max = row[f'healpix{hpx_level}_min'], row[f'healpix{hpx_level}_max']
        # Keep the file if any cone-search pixel falls inside its [min, max] HEALPix range
        if np.any(np.logical_and(hp_min <= hp_cone_search, hp_cone_search <= hp_max)):
            bulk_file = url_prefix + row['file'] + '\n'
            f.write(bulk_file)
            subset.append(bulk_file)

print('='*120)
print(f'A total of {len(subset)} files for download were written in {output_file}')
print('='*120)

========================================================================================================================
A total of 8 files for download were written in bulk_download_files.txt
========================================================================================================================
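The selection loop above tests each file in turn. The same test can be sketched in vectorised form with NumPy broadcasting, which may be convenient for long file lists. The function name `select_files` is hypothetical, and the toy pixel list below is made up for illustration; it is not the output of the cone search performed in this notebook.

```python
import numpy as np
import pandas as pd


def select_files(reference, pixels, level):
    """Return the files whose [min, max] HEALPix range contains any of `pixels`."""
    lo = reference[f'healpix{level}_min'].to_numpy()[:, None]   # shape (n_files, 1)
    hi = reference[f'healpix{level}_max'].to_numpy()[:, None]   # shape (n_files, 1)
    pix = np.asarray(pixels)[None, :]                           # shape (1, n_pixels)
    # Broadcast to (n_files, n_pixels) and flag files matching at least one pixel
    mask = ((lo <= pix) & (pix <= hi)).any(axis=1)
    return reference.loc[mask, 'file'].tolist()


# First three rows of the reference table shown in this notebook (level 6 only)
toy_reference = pd.DataFrame({
    'file': ['GaiaSource_000000-003111.csv.gz',
             'GaiaSource_003112-005263.csv.gz',
             'GaiaSource_005264-006601.csv.gz'],
    'healpix6_min': [0, 194, 329],
    'healpix6_max': [194, 328, 412],
})
toy_pixels = [194, 200]   # Hypothetical pixel indices, for illustration only

print(select_files(toy_reference, toy_pixels, 6))
# ['GaiaSource_000000-003111.csv.gz', 'GaiaSource_003112-005263.csv.gz']
```

Pixel 194 is selected twice because, at level 6, it is shared by the first two files: a coarse pixel can straddle the boundary between two files that are split at level 8.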

print('='*120)
print(f'Bulk download files are stored in directory: {output_dir}')
print('='*120)
os.system(f'wget -i {output_file} -P {output_dir}/ -q  --show-progress --progress=bar:force 2>&1')

========================================================================================================================
Bulk download files are stored in directory: downloads
========================================================================================================================
GaiaSource_529516-5 100%[===================>] 194,36M  63,9MB/s    in 3,0s
GaiaSource_529539-5 100%[===================>] 192,17M  68,3MB/s    in 2,8s
GaiaSource_529548-5 100%[===================>] 204,84M  66,2MB/s    in 3,1s
GaiaSource_529557-5 100%[===================>] 196,44M  62,6MB/s    in 3,2s
GaiaSource_529567-5 100%[===================>] 193,80M  66,0MB/s    in 2,9s
GaiaSource_529576-5 100%[===================>] 190,83M  60,2MB/s    in 3,2s
GaiaSource_529585-5 100%[===================>] 201,90M  61,6MB/s    in 3,3s
GaiaSource_529597-5 100%[===================>] 196,50M  64,4MB/s    in 3,1s

0
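The _MD5SUM.txt listing read earlier provides one checksum per file, so the downloaded files can be checked for corruption. The sketch below is not part of the original notebook: the function names `md5_of_file` and `find_corrupted` are hypothetical, and the `expected` mapping is assumed to be built from the `md5sum_file` table, for instance with `dict(zip(md5sum_file['file'], md5sum_file['md5Sum']))` restricted to the downloaded files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(download_dir, expected):
    """Return the file names that are missing or whose checksum does not match.

    `expected` maps each file name to its reference MD5 hex digest.
    """
    bad = []
    for name, checksum in expected.items():
        path = Path(download_dir) / name
        if not path.is_file() or md5_of_file(path) != checksum:
            bad.append(name)
    return bad
```

An empty list returned by `find_corrupted(output_dir, expected)` would indicate that every listed file is present and matches its reference checksum. Reading in chunks keeps memory usage low for files of a few hundred megabytes, such as the ones downloaded above.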

Output of the md5sum_file cell (first and last rows):

	md5Sum	file
0	52d800747f32a025bdcf1594bbbad29b	GaiaSource_000000-003111.csv.gz
1	aa91dfad8cc5122f109aacb462fb8304	GaiaSource_003112-005263.csv.gz
2	0ee8a887c3db8cb5110354c42289b0a2	GaiaSource_005264-006601.csv.gz
3	11ced35b2cead1950331222f6c5690f0	GaiaSource_006602-007952.csv.gz
4	b178d83fbe020b1f131c0aedcdc0cd29	GaiaSource_007953-010234.csv.gz
...	...	...
3381	20c69195b88742d586b0227d1831893a	GaiaSource_783518-784479.csv.gz
3382	bf8bf6562d676df85787a23e62a05b79	GaiaSource_784480-784992.csv.gz
3383	158b295db422e9d024b8f6c8429aaa9d	GaiaSource_784993-785417.csv.gz
3384	57c5737e81e2548ff0f1d57d7b667096	GaiaSource_785418-786096.csv.gz
3385	c084a93c691e81e616cb020d1bba2c60	GaiaSource_786097-786431.csv.gz

Output of the reference_file cell (first and last rows):

	file	healpix6_min	healpix6_max	healpix7_min	healpix7_max	healpix8_min	healpix8_max	healpix9_min	healpix9_max
0	GaiaSource_000000-003111.csv.gz	0	194	0	777	0	3111	0	12447
1	GaiaSource_003112-005263.csv.gz	194	328	778	1315	3112	5263	12448	21055
2	GaiaSource_005264-006601.csv.gz	329	412	1316	1650	5264	6601	21056	26407
3	GaiaSource_006602-007952.csv.gz	412	497	1650	1988	6602	7952	26408	31811
4	GaiaSource_007953-010234.csv.gz	497	639	1988	2558	7953	10234	31812	40939
...	...	...	...	...	...	...	...	...	...
3381	GaiaSource_783518-784479.csv.gz	48969	49029	195879	196119	783518	784479	3134072	3137919
3382	GaiaSource_784480-784992.csv.gz	49030	49062	196120	196248	784480	784992	3137920	3139971
3383	GaiaSource_784993-785417.csv.gz	49062	49088	196248	196354	784993	785417	3139972	3141671
3384	GaiaSource_785418-786096.csv.gz	49088	49131	196354	196524	785418	786096	3141672	3144387
3385	GaiaSource_786097-786431.csv.gz	49131	49151	196524	196607	786097	786431	3144388	3145727
