As environmental scientists move towards understanding Earth systems at greater resolution than ever before, it’s critical that they have access to the data sets they need. Yet many of these data are not archived, publicly available, or collected in a standardized format, due to the challenges of coordinating efforts across independent research groups and institutions worldwide.
Now researchers at Berkeley Lab are taking action to address these challenges. Thanks to $3.6 million in funding from the U.S. Department of Energy (DOE) Office of Science, the Lab’s Computing Sciences and Earth & Environmental Sciences Area (EESA) are partnering on a three-year project to develop an archive that will serve as a repository for data from hundreds of DOE-funded research projects under the agency’s Environmental System Science (ESS) umbrella. The ESS domain includes both large-scale and smaller studies in Subsurface Biogeochemical Research and Terrestrial Ecosystem Science around the world.
“Our basic mission is to enable all of DOE’s ESS projects to archive their data with us so that it’s available, and won’t get lost,” said Deborah Agarwal, a senior scientist at Computing Sciences who is leading the effort. “Just as important is to make the data available to the public, as well as to DOE researchers.”
Dubbed ESS-DIVE (Environmental System Science Data Infrastructure for a Virtual Ecosystem), the Lab-hosted archive will make a significant difference for researchers and the public, says Margaret Torn, EESA senior scientist. Torn leads EESA’s Biosphere-Atmosphere Interactions program domain, which encompasses large ESS projects such as AmeriFlux, Next Generation Ecosystem Experiment (NGEE)-Arctic, and NGEE-Tropics.
In addition to providing an archive for her team’s data, Torn says that ESS-DIVE will let scientists studying similar topics know that other data exist. And by helping the community establish protocols and standards for the archived data, such as using the same variable names and units, it will enable scientists to integrate data from across teams and projects for broader analyses.
“People who aren’t researchers will also benefit from these data,” Torn said, “such as water utilities, farmers, and stewards of environmental remediation.”
The ESS-DIVE team will set up user capabilities in the archive such as advanced data search and data visualization. The team also plans to conduct a user needs assessment to ensure a quality user experience.
“The preservation and appropriate curation of data—as well as being able to reuse it—is a key component of good science,” said Jay Hnilo, DOE Program Manager for Data Informatics. ESS-DIVE will create an integrated data environment and help to accelerate DOE’s science going forward, he added.
“We all want to extend our understanding from the sites that we are studying to as much of the Earth as possible, and connect our research with similar research at other sites,” Torn said. “This will allow us to speak a common language and have a broader impact.”
The ESS-DIVE team is composed of an interdisciplinary group of data scientists, digital librarians, and environmental scientists, as well as the National Center for Ecological Analysis and Synthesis, a research center based at the University of California, Santa Barbara. Key Berkeley Lab personnel working on the project include Charuleka Varadharajan, Shreyas Cholia, Cory Snavely, Valerie Hendrix, Dan Gunter, and William Riley.