From July 3 to July 7, a series of conferences was held on data science and the environment. The event, organized by IMT Atlantique, builds bridges between two communities that have so far collaborated little in Europe. Environmental data could benefit from new processing methods that may help explain what physics alone has so far been unable to explain.
Some marine and atmospheric phenomena still lack physical explanations, despite the observations that have been made. Could these explanations be found through a new method of analysis? Collaboration between data science and the environmental sciences is currently underdeveloped in Europe. Yet data scientists offer tools and methodologies that could be helpful in processing environmental data. To establish a connection between these scientific communities, IMT Atlantique created a special conference series, “Data science & Environment”, bringing together researchers from around the world. The event is paired with a summer school to raise awareness of these mixed approaches among future researchers. Both events were initiated by Pierre Tandeo, a researcher already convinced that this collaboration will bear fruit. A specialist in mathematics applied to oceans and meteorology, he presents the issues at stake in this collaboration.
What is data science?
Pierre Tandeo: Data science is built on the analysis of data using mathematical and statistical tools. It is often confused with big data. Yet data science involves a “professional” aspect, meaning that it uses a scientific approach to extract relevant, physics-related information about a specific subject matter. Big data, on the other hand, is not necessarily aimed at addressing questions related to physics.
It is often said that data scientists wear three hats, since they must master the mathematical tools, the IT tools, and the data for a given subject. It is not easy to combine these three areas of expertise, which explains why we organized this conference. The goal is to bring the applied mathematics community together with the physics community that works on environmental data, so that they can pool their skills and move towards an environmental data science.
What kinds of environmental data can data scientists process?
PT: The conference focuses on the study of oceans, the atmosphere and climate. Within these areas, there are three main types of data: satellite observations, in situ measurements at sea or in the atmosphere, and simulations from computer models. These simulations are intended to describe the phenomena using physical equations.
Today, this data is becoming increasingly easy to access. It includes large volumes of information that have not yet been used, due to the processing challenges presented by these large sets of data. Manipulating the data sets is a complex undertaking, and special IT and statistical tools must be used to process them.
What can data science contribute to environmental research and vice versa?
PT: Major environmental questions remain, and our physical understanding remains insufficient. What this means is that we are not able to convert what is observed into equations. The question is, can we try to understand these environmental phenomena using data, since the connections are undoubtedly hidden within it? To reveal these connections, a suitable mathematical tool must be built.
Also, when we check the weather, for example, we don’t trust the forecasts that are made beyond one week’s time, because the system is complex. Such a system is called “chaotic.” The difficulty in forecasting environmental data lies in the fact that many interactions can take place between the variables, some of which physics cannot even explain. This complexity requires a revision of the applied mathematical techniques that are commonly used. The environment forces us to rethink the way data is processed. This makes it an ideal field for data science, since it is hard to master, thus providing a challenge for mathematicians.
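To illustrate what “chaotic” means in this context, here is a minimal sketch. It assumes the classic Lorenz-63 system as a stand-in for the atmosphere; this toy model, its standard parameter values, and the helper names (`lorenz_rhs`, `rk4_step`, `separation_over_time`) are illustrative choices and are not mentioned in the interview. Two simulations start from almost identical states, and the gap between them is tracked over time.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz-63 toy model (assumed stand-in system)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one time step with the fourth-order Runge-Kutta scheme."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_over_time(perturbation=1e-8, dt=0.01, steps=4000):
    """Distance between two runs whose initial states differ by `perturbation`."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)
    separations = [math.dist(a, b)]
    for _ in range(steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        separations.append(math.dist(a, b))
    return separations


separations = separation_over_time()
print(f"initial gap: {separations[0]:.1e}, largest gap: {max(separations):.1e}")
```

In this sketch the initial gap is tiny, yet the two runs end up far apart on the scale of the system itself. This is the kind of sensitivity to initial conditions that limits how far ahead a forecast can be trusted.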
Can you give us an example of an environmental issue that has benefited from a mathematical approach?
PT: Some statistical approaches have proven successful. Forecasting the coupled atmosphere-ocean phenomenon called ENSO (with its two opposite phases: El Niño/La Niña) is a good example. The two ENSO phases appear irregularly (every 2 to 7 years) and have extremely significant human, economic and ecological impacts [they particularly affect North and South America]. Therefore, physicists try to predict six months in advance whether we will experience a normal year, El Niño (unusually hot) or La Niña (unusually cold). The ENSO predictions from statistical models were often found to be better than the predictions provided by physical models. These statistical forecasts are based on learning from a historical record that keeps growing, particularly since the advent of satellites.
This conference also provided an opportunity to identify other environmental challenges that remain unresolved, for which data science could provide a solution. It is a vast and rapidly growing field.
What topics will be discussed at the conferences?
PT: The first half focuses on the applications of data science for the climate, atmosphere, and oceans. Yet we have observed that applied mathematical methods are more widespread among the atmosphere and climate community. I think oceanographers have things to learn from what is being done elsewhere. That is also why the event is being held in Brest, one of the major European oceanographic centers.
The other sessions are devoted to mathematical methodologies, and are aimed at presenting how high-dimensional problems (those with a large volume of information) can be processed, and how to extract relevant information. Data assimilation is also addressed. This looks at the question of how physical forecast models can be combined with satellite data. The last focus is on analog methods, which involve using learning techniques based on historical observations and trying to project them onto current or future data.
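The idea behind analog methods can be sketched in a few lines. The example below is a simplified illustration and not the method presented at the conference: it assumes a one-dimensional time series, describes each “state” by its last two values, and uses a synthetic sine wave in place of real observations. The function name `analog_forecast` and its parameters `k` and `window` are hypothetical.

```python
import math


def analog_forecast(history, k=3, window=2):
    """Forecast the next value of `history` from its k closest past analogs.

    The current state is the last `window` values. Each earlier stretch of the
    same length is compared with it, and the values that followed the k closest
    stretches are averaged to form the forecast.
    """
    current = history[-window:]
    candidates = []
    for start in range(len(history) - window):
        past = history[start:start + window]
        successor = history[start + window]
        candidates.append((math.dist(past, current), successor))
    candidates.sort()  # closest analogs first
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Synthetic "historical record" (an assumption for illustration only).
history = [math.sin(0.3 * t) for t in range(200)]
truth = math.sin(0.3 * 200)

forecast = analog_forecast(history)
print(f"forecast: {forecast:.3f}, true next value: {truth:.3f}")
```

The longer the historical record, the more likely it is to contain a close analog of the present situation, which is one reason the growing archive of satellite observations matters for this family of methods.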
What are the anticipated outcomes of these sessions?
PT: In the short term, the goal is to start conversations. I would like to see researchers from the two communities finding common ground, because they both have something to gain. In the medium term, the goal is to make this an ongoing event. Ideally, we would like to repeat the event in other locations in France, or in Europe, and open it up to other types of environmental data over the next two years. Finally, the long-term goal would be to initiate projects involving international collaboration. Along with several colleagues, we are currently working to establish a French-American project on the applications of applied mathematics to climate. The creation of international mixed research units in these areas would mark a true culmination.