Earlier this year, in the week of January 23, the GREAT School on the Science and Techniques of Gaia, organized by the GREAT Initial Training Network, was held at the Lorentz Center in Leiden. The school was intended for the ITN PhD students, most of whom had just started their PhD research, and a number of students from outside the network were also invited. The school consisted of morning lectures, covering the major Gaia science topics addressed by the GREAT ITN, and afternoon exercises aimed at giving the students a flavour of what it will be like to work with the huge Gaia catalogue.
The exercises consisted of exploring the simulated data contained in the Gaia Universe Model Snapshot version 10 (described here). The students worked in groups on exercises that involved interrogating the GUMS data set in order to research various aspects of the structure and dynamics of the Milky Way. GUMS contains 1.6 billion stars and represents a non-trivial data set to work with. For these exercises the students used a custom-developed Java framework based on Hadoop, provided to them on a virtual machine (containing a small subset of the GUMS data set) which they installed on their notebooks. The virtual machine and Hadoop framework were prepared by Daniel Tapiador and William O'Mullane at ESAC.
The exercise groups that managed to get their code working quickly had the opportunity to run it against the full GUMS data set, which was stored in 'the cloud' at Amazon. This provided a taste of the future possibility of 'bringing the processing to the data'.
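The kind of counting done in these exercises maps naturally onto the map/reduce pattern that Hadoop implements. The following is a minimal plain-Python sketch of that pattern, not the Java framework used at the school: the function names, the 100 pc bin width, and the input heights are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BIN_WIDTH_PC = 100.0  # assumed bin width in parsecs (hypothetical choice)


def map_star(z_pc: float) -> Tuple[int, int]:
    """Map step: emit (bin index, 1) for one star's height above the plane."""
    return (math.floor(z_pc / BIN_WIDTH_PC), 1)


def reduce_counts(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Reduce step: sum the emitted values for each bin index."""
    counts: Dict[int, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def star_counts(heights_pc: Iterable[float]) -> Dict[int, int]:
    """Run the map step over every star, then reduce to per-bin counts."""
    return reduce_counts(map_star(z) for z in heights_pc)


if __name__ == "__main__":
    # Toy heights in parsecs, invented for this example (not GUMS data).
    print(star_counts([10.0, 50.0, 150.0, -20.0]))
```

Because each star is mapped independently and the reduce step only sums per key, the same logic can in principle be split over many machines holding different parts of a catalogue, which is the idea behind running the code where the data lives.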
The pictures above show the results obtained by the students:
- Figure 1. The distribution of stars near the Sun in the space of U and V velocities.
- Figure 2. The distribution of stars in energy and angular momentum, showing prominently the disk (red colours) and halo (blue cloud of points) of the Milky Way.
- Figure 3. The colour-magnitude diagram for a field toward the Galactic pole. The diagram includes the effects of extinction (in the colour only) and was obtained by running the Hadoop code on Amazon.
- Figure 4. All-sky star count map for O, B, and A stars, illustrating that these types of stars are confined mostly to the Milky Way's disk.
- Figure 5. Star counts for stars within 2.5 kpc of the Sun. The 'holes' in the star count maps are artefacts of the small GUMS subset used to produce the diagrams.
- Figure 6. Stellar density as a function of height above the Galactic plane. The full GUMS data set on Amazon was used in this case as well.
- Figure 7. The V-component of the space velocity of stars plotted as a function of [Fe/H] for stars near the Sun. The diagram shows the disk stars at high [Fe/H] and V values and the halo at low values of [Fe/H] and V.
- Figure 8. The W-component of the space velocity of stars near the Sun as a function of age, showing the well-known increase of the vertical velocity dispersion with age.
- Figure 9. Stellar density in the Galactic plane. This shows the expected reach of Gaia and the effects of high-extinction regions (the 'spokes' radiating out from the position of the Sun). This result was also obtained from the full GUMS data set in the cloud. See also this picture of the week.
- Figure 10. The average heliocentric radial velocity of stars within 2.5 kpc of the Sun. The asymmetric distribution is due to a combination of asymmetric drift and the Sun's motion with respect to the Local Standard of Rest. This effect is also seen clearly in the RAVE survey data. This last diagram was also obtained from the GUMS data set in the cloud.
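
As an illustration of the kind of calculation behind a diagram such as Figure 8, the sketch below bins stars by age and computes the dispersion (population standard deviation) of the W velocity in each bin. It is a hedged example only: the function name, the 1 Gyr bin width, and the toy input values are assumptions, and this is not the code the students wrote.

```python
from collections import defaultdict
from statistics import pstdev
from typing import Dict, Iterable, List, Tuple


def w_dispersion_by_age(
    stars: Iterable[Tuple[float, float]], bin_width_gyr: float = 1.0
) -> Dict[float, float]:
    """Return {age bin start in Gyr: W velocity dispersion in km/s}.

    `stars` is an iterable of (age in Gyr, W velocity in km/s) pairs.
    Bins holding fewer than two stars are skipped.
    """
    groups: Dict[float, List[float]] = defaultdict(list)
    for age_gyr, w_kms in stars:
        bin_start = int(age_gyr // bin_width_gyr) * bin_width_gyr
        groups[bin_start].append(w_kms)
    return {
        start: pstdev(ws)
        for start, ws in sorted(groups.items())
        if len(ws) > 1
    }


if __name__ == "__main__":
    # Toy (age, W) pairs invented for this example (not GUMS data).
    toy_stars = [(0.5, -5.0), (0.7, 5.0), (5.2, -20.0), (5.9, 20.0)]
    print(w_dispersion_by_age(toy_stars))
```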