The ocean is changing rapidly through warming, sea-level rise, acidification, declining oxygen concentrations, and altered circulation patterns, all of which are reshaping marine ecosystems worldwide. Understanding these changes requires a holistic view of the affected ecosystems, and long-term ocean monitoring programs are crucial for building that view and for guiding sustainable management. The California Cooperative Oceanic Fisheries Investigations (CalCOFI)*1, initiated in 1949, is one of the world's longest-running and most comprehensive marine ecosystem observing programs. Four times a year, CalCOFI collects extensive ocean data across all Essential Ocean Variables in the Southern California Current (Gallo et al. 2022). These data encompass over 36 physical and chemical parameters, 2,500 biological parameters, and 50,000 environmental DNA sequences, measured simultaneously (Satterthwaite et al. 2025). They have illuminated long-term ecosystem patterns, supported marine management, and enhanced our understanding of how climate variability and change affect the California Current. Because of the program's long history and collaborative nature, however, CalCOFI data are fragmented across more than 30 disparate datasets that are managed by different entities, served from various locations, and named inconsistently, with limited integration (Fig. 1). This fragmentation hinders a holistic understanding of the ecosystem. CalCOFI thus exemplifies the challenges and opportunities of mobilizing and integrating ocean data at a global scale. Our vision is to modernize CalCOFI by building an integrated data system that automates dataset ingestion, integration, and serving for timely use in research and management (Fig. 1). We are developing workflows to publish these datasets to portals that promote the Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) data principles.
Our initial efforts focused on two core CalCOFI datasets, Conductivity, Temperature, and Depth (CTD) bottle data and larval fish data, which we used to develop a workflow for easier access, integration, and use (Fig. 2). Data providers upload processed datasets to a shared Google Drive in simple formats (e.g., .csv), with globally unique identifiers and standardized naming conventions (Suppl. material 1). New data are automatically detected and integrated into a PostgreSQL database (Fig. 2). Each addition triggers a system rebuild and validation that checks primary keys, spatial geometry, taxonomy, and metadata. If errors are found, managers are notified; otherwise, the system updates the production databases, records a new version, and publishes a GitHub release with source files and a DOI via Zenodo. Original source data files are retained so that past versions can be reconstructed. While PostgreSQL maintains the relationships between data, we also produce an analytical database in DuckDB, which provides a single, portable database with fast querying, even on remote servers (Fig. 2). CalCOFI data and observations are accessible through multiple repositories, platforms, tools, interfaces, and portals, chosen according to user needs and funder requirements (Fig. 2; Suppl. material 1). For example, some tools support flexible analysis and visualization (e.g., NOAA Environmental Research Division's Data Access Program, ERDDAP), whereas others provide long-term archiving with standardized metadata (e.g., Environmental Data Initiative). We are also publishing biological observations in Darwin Core format (Wieczorek et al. 2012) with the Extended Measurement or Fact extension and uploading them to the Ocean Biodiversity Information System (OBIS) so that CalCOFI data can be integrated with global biodiversity datasets. CalCOFI program information is available on the Global Ocean Observing System Biology & Ecosystem Portal, and dataset records are structured for indexing by portals such as the Ocean Data and Information System (ODIS) and Google Dataset Search.
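The validation step described above can be illustrated with a minimal sketch. It is not the CalCOFI implementation: the column names (`cast_id`, `latitude`, `longitude`), the function names, and the sample rows are hypothetical, and only two of the listed checks (primary keys and spatial coordinates) are shown. Taxonomy and metadata checks would follow the same pattern of collecting error messages for managers to review.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real CalCOFI tables may differ.
KEY_FIELD = "cast_id"


def load_csv(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def validate_rows(rows, key_field=KEY_FIELD):
    """Return a list of error messages; an empty list means the batch passes."""
    errors = []

    # Primary-key check: every row needs a key, and keys must be unique.
    keys = [(row.get(key_field) or "").strip() for row in rows]
    for i, key in enumerate(keys, start=1):
        if not key:
            errors.append(f"row {i}: missing {key_field}")
    for key, count in Counter(k for k in keys if k).items():
        if count > 1:
            errors.append(f"duplicate {key_field}: {key} ({count} rows)")

    # Spatial check: coordinates must parse and lie within valid ranges.
    for i, row in enumerate(rows, start=1):
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            errors.append(f"row {i}: unparseable coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            errors.append(f"row {i}: coordinates out of range ({lat}, {lon})")

    return errors


# Made-up sample with one duplicated key and one impossible latitude.
SAMPLE = """cast_id,latitude,longitude
c001,32.95,-117.31
c002,33.40,-118.10
c002,95.00,-118.10
"""

if __name__ == "__main__":
    for message in validate_rows(load_csv(SAMPLE)):
        print(message)
```

Returning a list of messages rather than raising on the first failure mirrors the workflow in the text, where a failed validation notifies managers and a clean one allows the production update to proceed.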
To facilitate data access and visualization, we also developed the CalCOFI Integrated Data Viewer*2 (Fig. 3). This web application allows users to explore decades of CalCOFI oceanographic and larval fish data across space, time, and depth. We plan to integrate additional CalCOFI datasets and to collaborate with other data providers to build a more complete ecosystem-level view of the California Current. Our ultimate goal is to enhance the findability, integration, and usability of CalCOFI data for the scientific and management communities, enabling a better understanding of ocean changes and supporting a sustainable future for marine ecosystems and human communities.