Creating a large interacting galaxy dataset with the ESA Hubble Archive, Galaxy Zoo labels and deep learning



David O’Ryan (Ph.D. student at the University of Lancaster, ESAC visitor from the Archival Research Visitor Programme)


Studying interacting galaxy populations statistically is notoriously difficult. The underlying parameter space of any individual interaction is complex and degenerate, so learning how mergers affect the full galaxy population requires statistical approaches applied to large samples of interacting systems. The ESA data archives provide a rich source of these systems, with the potential to yield samples many times larger than existing catalogs. Visual identification has been an excellent way to find the low-surface-brightness signatures of interacting galaxies, but traditional visual methods face scalability and reproducibility issues. This project builds on previous efforts by training machine learning algorithms on existing visual catalogs from the Galaxy Zoo citizen science project and applying the trained merger detection algorithms to the ESA archive. The new sample of mergers is a valuable precursor dataset to the full set of mergers expected from the full-depth HSC catalog, and will provide a unique degeneracy-breaking opportunity to explore population parameter spaces in a Bayesian framework.

The seminar will present results obtained during a three-month visit to ESAC in 2022, show the interacting galaxy candidates newly identified by the convolutional neural network (CNN), and discuss ways of extending these types of ML-powered archival searches to other datasets via transfer learning, or to other types of objects using the same methodology. In particular, it will discuss the use of the private beta of the ESA Datalabs system to carry out the science project described above.


The images below show five of the approximately 14,000 interacting galaxies found within the Hubble Source Catalog. Each received a prediction score of >= 0.95 from our CNN for being an interacting galaxy. These are major interactions in which clear tidal features have formed, and they are prime examples of the interacting systems we are looking for in the archives.
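
The score cut quoted above can be illustrated with a minimal sketch. Only the 0.95 threshold comes from the text; the function name, the source identifiers and the scores are hypothetical placeholders, and the actual pipeline used in the project may differ.

```python
from __future__ import annotations

# Threshold quoted in the text; everything else here is illustrative.
SCORE_THRESHOLD = 0.95


def select_candidates(scores: dict[str, float],
                      threshold: float = SCORE_THRESHOLD) -> list[str]:
    """Return the IDs whose CNN score is >= threshold, highest score first."""
    kept = [(source_id, score) for source_id, score in scores.items()
            if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [source_id for source_id, _ in kept]


# Hypothetical scores for four made-up sources (not real catalog entries).
example_scores = {"src_a": 0.99, "src_b": 0.42, "src_c": 0.95, "src_d": 0.949}
candidates = select_candidates(example_scores)
```

A strict cut like this favours purity over completeness: it keeps only the most confident predictions, which is consistent with the clear tidal features seen in the selected examples.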