Unraveling the evolution of the Milky Way with big data

Unraveling the evolution of our home galaxy, the Milky Way, is a challenge similar to mapping the human genome, according to the European Space Agency (ESA). ESA’s galaxy mapper Gaia takes billions of measurements of 2 billion of the brightest stars in the sky. Here we look at what it takes to unpack those measurements to reveal the secrets of the galaxy.

On June 13, the Gaia Data Processing and Analysis Consortium (DPAC), a collaboration of 450 European astronomers and engineers supporting the galaxy-mapping mission, released what DPAC Chairman Anthony Brown described as “the richest set of astronomical data ever published.”

To create the 10-terabyte catalog of compressed data, DPAC computers had to ingest 940 billion observations of 2 billion of the sky’s brightest light sources, said Brown, an astronomer at Leiden University in the Netherlands, at an ESA press conference on June 13.
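As a back-of-the-envelope check, dividing the two figures Brown quoted gives the average number of observations behind each source. This is only a simple average that assumes the observations are spread evenly; in practice, coverage varies across the sky.

```python
# Figures quoted by DPAC for the June 13 release
observations = 940e9   # 940 billion observations
sources = 2e9          # about 2 billion light sources

# Simple average; real per-source counts vary with sky position
average_per_source = observations / sources
print(f"about {average_per_source:.0f} observations per source")
```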


The data, captured by Gaia between June 2014 and June 2017, contained information about the precise positions and movements of 1.5 billion stars in the sky; details of the ages, temperatures and brightness of around half a billion of those stars; and the detailed chemical compositions of several million of them.

It took five years for the data to pass through the sophisticated IT pipeline of validation, calibration and analysis procedures, which involve six supercomputing centers in six European countries. It would take a thousand years for a single (and rather powerful) personal computer to process the dataset, Gonzalo Gracia, DPAC project coordinator for data processing, told Space.com.

As of 2022, Gaia’s main database contains 1 petabyte of data, Gracia added, which equates to the data capacity of 200,000 DVDs. To date, the telescope has made more than 100 measurements of each of the 2 billion light sources it sees.
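The DVD comparison is easy to check. The sketch below assumes decimal storage units and a standard 4.7-gigabyte single-layer DVD (the comparison does not specify a disc format). It lands at roughly 213,000 discs, consistent with the rounded figure of 200,000.

```python
# Decimal (SI) units, as storage capacities are usually quoted
PETABYTE = 10**15               # bytes
DVD_CAPACITY = 4_700_000_000    # bytes; assumes a single-layer DVD

dvds = PETABYTE / DVD_CAPACITY
print(f"1 PB is about {dvds:,.0f} single-layer DVDs")
```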

“Every day, Gaia sends us between 20 and 100 gigabytes of data,” Gracia said. “That may not seem like much if you compare it to the bandwidth you have at home, but we’re talking about a satellite 1.5 million kilometers [930,000 miles] away from Earth.”
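To put Gracia's figures in home-internet terms, the sketch below converts the daily data volume into an average bit rate and computes how long a signal takes to cover 1.5 million kilometers. The rate is averaged over a full 24 hours. If, as is typical for deep-space missions, the data are sent only while a ground station is in contact, the rate during each pass must be higher than this average.

```python
SECONDS_PER_DAY = 86_400
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition

def average_rate_mbps(gigabytes_per_day: float) -> float:
    """Average rate in megabits per second, spreading the volume over 24 h."""
    bits = gigabytes_per_day * 1e9 * 8   # decimal gigabytes to bits
    return bits / SECONDS_PER_DAY / 1e6

for gb in (20, 100):
    print(f"{gb} GB/day is about {average_rate_mbps(gb):.1f} Mbit/s averaged over a day")

distance_km = 1.5e6
print(f"one-way signal travel time: {distance_km / SPEED_OF_LIGHT_KM_S:.1f} s")
```

Even the top figure averages out to under 10 megabits per second.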

The Gaia telescope observes 2 billion of the brightest stars in our galaxy, the Milky Way. (Image credit: ESA/ATG medialab; background: ESO/S. Brunier, CC BY-NC)

The data journey

Gaia orbits the sun-Earth Lagrange point 2 (L2), about 1.5 million kilometers beyond Earth on the side facing away from the sun, where the combined gravity of the two bodies lets a spacecraft keep pace with Earth's orbit. From there, the spacecraft observes the cosmos sheltered from the sun's glare.
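How far away is L2? A standard first-order estimate, valid when one body is much lighter than the other, is r ≈ a·(m / 3M)^(1/3). Here a is the sun-Earth distance, m is the combined mass of Earth and the moon, and M is the sun's mass. The constants below are standard published values. The estimate reproduces the roughly 1.5 million kilometers quoted for Gaia's distance.

```python
AU_KM = 149_597_870.7                # 1 astronomical unit, km
SUN_TO_EARTH_MOON_MASS = 328_900.56  # sun mass / (Earth + moon mass)

def l2_distance_km(a_km: float, mass_ratio: float) -> float:
    """First-order distance of L2 beyond the smaller body.

    Uses r ~= a * (m / (3 M))**(1/3), where mass_ratio = M / m >> 1.
    """
    return a_km * (1.0 / (3.0 * mass_ratio)) ** (1.0 / 3.0)

r = l2_distance_km(AU_KM, SUN_TO_EARTH_MOON_MASS)
print(f"L2 lies roughly {r / 1e6:.2f} million km beyond Earth")
```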

Three ESA ground antennas (one near Madrid, one in Malargüe in Argentina and one in New Norcia in Australia) receive the data collected by the spacecraft's two telescopes and other instruments. From these ground stations, the measurements travel over conventional internet lines to the European Space Operations Center in Darmstadt, Germany, for basic checks. The data are then sent to the agency's Science Operations Center in Madrid.

“That’s when we do the first round of treatment,” Gracia said. “We do some initial calibrations and run the data through software to assess the health of the satellite. This happens within the first few hours of receiving the data.”

Then things start to get complicated. A data processing center at CNES, the French space agency, in Toulouse scans the data set for fast-moving objects in the solar system: asteroids and comets, some of which could be on a collision course with Earth.

“They have a pipeline that detects these objects and checks whether they are already known,” Gracia said. “If they’re not known, they raise an alarm with the worldwide community of solar system researchers, who can make follow-up observations and find out what the object is and what its trajectory is.”
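The CNES pipeline itself is not described here. The sketch below only illustrates the basic "is it already known?" check. It compares each detection's sky position with the positions predicted for catalogued objects at the same moment, and it flags any detection that has no counterpart within a small matching radius. The function names and the 1-arcsecond radius are hypothetical choices.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    ra_deg: float   # right ascension, degrees
    dec_deg: float  # declination, degrees

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def unknown_detections(detections, known, radius_deg=1 / 3600):
    """Return detections with no known object within radius_deg.

    `known` holds (ra, dec) positions predicted for catalogued objects
    at the time of observation (hypothetical input format).
    """
    flagged = []
    for d in detections:
        matched = any(
            angular_separation_deg(d.ra_deg, d.dec_deg, ra, dec) <= radius_deg
            for ra, dec in known
        )
        if not matched:
            flagged.append(d)
    return flagged

known = [(10.0, 5.0), (120.5, -30.2)]
detections = [Detection(10.0001, 5.0), Detection(200.0, 12.0)]
print(unknown_detections(detections, known))  # only the second detection
```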

Gaia is pretty good at monitoring asteroids and might even see some that aren’t visible from Earth. The release of mission data on June 13 contained information on the detailed trajectories of 60,000 space rocks in the solar system. In addition to this, Gaia measured the light spectra of these space rocks, revealing their chemical compositions. Previously, astronomers knew the detailed chemical compositions of only 4,500 asteroids.

In addition, a team in Cambridge, England, is comparing Gaia's new brightness measurements with earlier data. Significant changes in a star's brightness are always cause for excitement, because they could indicate supernova explosions, which occur when massive stars die and collapse into black holes or neutron stars.
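The criteria the Cambridge team uses are not spelled out here. A common, simple approach, assumed in this sketch, is to compare the change in magnitude with the combined uncertainty of the two measurements. Astronomical magnitudes decrease as objects get brighter, and the 5-sigma threshold is an arbitrary illustrative choice.

```python
import math

def significant_change(mag_old, err_old, mag_new, err_new, n_sigma=5.0):
    """True if the magnitude change exceeds n_sigma combined uncertainties."""
    delta = mag_new - mag_old
    sigma = math.hypot(err_old, err_new)  # errors added in quadrature
    return abs(delta) > n_sigma * sigma

print(significant_change(15.00, 0.01, 14.20, 0.01))  # True: brightened by 0.8 mag
print(significant_change(15.00, 0.01, 15.02, 0.01))  # False: within the noise
```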

Sometimes faint, distant stars and galaxies can temporarily brighten through microlensing, a strange phenomenon that occurs when a massive object passes between the dim star and the observer, its powerful gravity bending the light and acting as a magnifying glass. Gaia, which scans the entire sky every two months, sees all of this.
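The magnifying-glass effect has a standard textbook description for the simplest case of a single point-like lens passing in front of a point-like source, often called the Paczyński light curve. The sketch below evaluates it. The parameter names are illustrative, and real events involving binary lenses or extended sources need more elaborate models than this formula.

```python
import math

def point_lens_magnification(u: float) -> float:
    """Magnification of a point source by a point lens.

    u is the lens-source separation in units of the Einstein radius.
    """
    return (u * u + 2) / (u * math.sqrt(u * u + 4))

def light_curve(t, t0, u0, t_e):
    """Magnification at time t for closest approach u0 reached at time t0.

    t_e is the Einstein-radius crossing time, in the same units as t.
    """
    u = math.sqrt(u0 ** 2 + ((t - t0) / t_e) ** 2)
    return point_lens_magnification(u)

# A hypothetical event: closest approach 0.1 Einstein radii, 20-day timescale
for day in (-40, -20, 0, 20, 40):
    print(day, round(light_curve(day, t0=0, u0=0.1, t_e=20), 2))
```

The brightening rises and falls symmetrically around the moment of closest approach, which is one way such events can be distinguished from other kinds of variability.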

Again and again

Meanwhile, the rest of the consortium conducts what Gracia calls “cycle processing”: repeated cycles of redigesting, validating and analyzing the data to extract the most accurate information possible, which astronomers can use to map the Milky Way and model its past and future. Several thousand servers running tens of thousands of processor cores are involved in the operation.

“We have to process the data multiple times,” Gracia said. “We process it, we give it to scientists for verification, and then we have to adjust our calibrations, our algorithms; we have to improve them every time.”
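DPAC's real calibration models are far richer than anything that fits here. As a loose illustration of why the same data must be processed more than once, the toy model below assumes that each measurement equals a source's true value plus an unknown offset for the detector that recorded it. Neither set of unknowns can be fixed without the other, so the sketch alternates between them (an alternating least-squares fit) until both settle. The model, the function name and the synthetic data are all hypothetical.

```python
import random

def iterative_calibration(obs, n_iter=50):
    """Toy alternating least-squares fit of sources and detector offsets.

    obs[i][j] is source i measured on detector j (None if not observed),
    modelled as source[i] + offset[j] + noise. Offsets are pinned to a
    zero mean, since a constant could otherwise shift freely between
    the sources and the offsets.
    """
    n_src, n_det = len(obs), len(obs[0])
    source = [0.0] * n_src
    offset = [0.0] * n_det
    for _ in range(n_iter):
        # Step 1: re-derive every source using the current calibration
        for i in range(n_src):
            vals = [obs[i][j] - offset[j] for j in range(n_det) if obs[i][j] is not None]
            source[i] = sum(vals) / len(vals)
        # Step 2: re-fit the calibration against the updated sources
        for j in range(n_det):
            vals = [obs[i][j] - source[i] for i in range(n_src) if obs[i][j] is not None]
            offset[j] = sum(vals) / len(vals)
        # Step 3: enforce the zero-mean convention without changing the fit
        shift = sum(offset) / n_det
        offset = [o - shift for o in offset]
        source = [s + shift for s in source]
    return source, offset

# Synthetic data: each source is seen by only two of the three detectors
random.seed(1)
true_offset = [0.3, -0.1, -0.2]
true_source = [random.uniform(10, 20) for _ in range(60)]
obs = [[None if (i + j) % 3 == 0 else s + o + random.gauss(0, 0.01)
        for j, o in enumerate(true_offset)]
       for i, s in enumerate(true_source)]
source, offset = iterative_calibration(obs)
print([round(o, 3) for o in offset])
```

Because no source is seen by every detector, a single pass is not enough. Each round of re-deriving the sources and refitting the calibration improves both.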

The datasets are also dependent on each other. For example, without information on the precise positions of observed objects, data on changes in brightness or movements of asteroids would be worthless.

“We basically have information about the number of photons hitting the Gaia telescopes, and from their position in the window we derive the positions in the sky,” Gracia said. “It’s done in Barcelona, where we produce this astrometric information for all the sources in the sky. It’s the input for pretty much all of the other processing that we do. It takes a long time to do all of this, and to do it with a sufficient amount of data to ensure that the data is truly of the highest quality.”
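The exact method Gaia uses to turn photon counts into positions is not given here. This is a minimal sketch of the first step under simple assumptions: a one-dimensional window of photon counts with the background already subtracted, and a flux-weighted centroid, which locates a star's image to a fraction of a pixel. Production pipelines typically fit a model of the instrument's image profile instead. Converting the pixel position into sky coordinates also requires the satellite's orientation and geometric calibration, which are not shown.

```python
def window_centroid(counts):
    """Flux-weighted centroid of a 1-D window of photon counts.

    Returns a position in pixels, measured from the centre of pixel 0.
    Assumes a constant background has already been subtracted.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("window contains no signal")
    return sum(i * c for i, c in enumerate(counts)) / total

print(round(window_centroid([2, 10, 40, 60, 30, 8]), 3))
```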

This amount of processing explains the delay between data acquisition and publication. Gaia was launched in December 2013, but the astronomy community didn’t get its hands on the first batch of data until September 2016. The second data release followed in April 2018. The June 13 data release was preceded by a partial early release in December 2020. Each new catalog increases the accuracy of the data as well as the amount of information available on each of the 2 billion light sources the telescope sees. Although the mission is already in its ninth year, the work never stops for the 450 researchers and engineers of DPAC.

As Milky Way researchers around the world unwrap the gifts of the June 13 data release, searching for evidence of the galaxy’s dynamic life, Gracia and his colleagues are already busy working on the next data release. Among other things, it promises to unleash Gaia’s ability to spot planets around distant stars. Thousands of new discoveries should enrich the existing exoplanet catalog as DPAC researchers train their algorithms to spot the characteristic faint fading of a star caused by a planet crossing in front of its disk.
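How faint is that fading? To first order, the fractional drop in a star's brightness during a transit equals the ratio of the planet's disk area to the star's, (Rp / Rs)^2. The sketch below uses standard mean radii for Jupiter, Earth and the sun. It ignores limb darkening and grazing transits and does not attempt to reproduce how DPAC's algorithms search for these dips.

```python
def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional dip in brightness while the planet is fully on the disk.

    Uses depth ~= (Rp / Rs)**2 and ignores limb darkening.
    """
    return (planet_radius_km / star_radius_km) ** 2

SUN_RADIUS_KM = 695_700
for name, radius_km in [("Jupiter", 69_911), ("Earth", 6_371)]:
    depth = transit_depth(radius_km, SUN_RADIUS_KM)
    print(f"{name}-sized planet, sun-like star: {depth:.4%} dip")
```

A Jupiter-sized planet dims a sun-like star by about 1%, while an Earth-sized one dims it by less than 0.01%.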

“We started processing data from the fourth cycle two years ago and are already planning for the fifth cycle,” Gracia said. “It’s really non-stop.”

Follow Tereza Pultarova on Twitter @TerezaPultarova. Follow us on Twitter @Spacedotcom and on Facebook.

About Johnnie Gross
