Design of an international database of habitats for migratory fish
Author
Oliviéro Jules, Briand Cédric, Helminen Jani
Published
30-01-2026
Diadromous species use both marine and continental environments. They are subjected to multiple stressors acting locally or at the global scale, many of them in continental waters. Currently ICES has designed its databases for the marine environment, and we need a habitat referential to better account for stressors (dams, connectivity, pollution, fisheries) acting at the very local scale in the continental environment. The objective is to propose a referential database of habitats that could be used as a vocabulary for habitat, based on the HydroSHEDS database and covering all continental freshwater and transitional habitats across Europe, but also to provide a referential for units from the river to the wider stock level, including the marine environment, adapted to the needs of the diadromous expert groups (WGNAS, Working Group on Atlantic Salmon; WGBAST, Working Group on Baltic Salmon; WKTRUTTA, Working Group to Develop and Test Assessment Methods for Sea Trout Populations; WGEEL, Working Group on Eel).
River network choice
To create the habitat database we decided to use a single river network. This ensures a consistent network across countries, with similar spatial resolution and attributes. We chose the Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales (HydroSHEDS).
HydroSHEDS
Data description
HydroSHEDS is a global database providing high-resolution hydrographic data derived from NASA’s Shuttle Radar Topography Mission (SRTM). Covering most land areas, it offers structured datasets for hydrological modeling, watershed analysis, and environmental research. The data is available at resolutions up to 3 arc-seconds (~90m) in raster (GeoTIFF) and vector (shapefiles, geodatabases) formats. We use the geodatabases format. It includes river networks, watershed boundaries, drainage directions, and flow accumulation. Core products include HydroBASINS for watershed boundaries, HydroRIVERS for river networks, HydroLAKES for lakes and reservoirs, and HydroATLAS, which adds environmental attributes.
Description of river segment variables
Code
data_description <- dbGetQuery(con_diaspara,
  "SELECT cols.column_name AS var_name,
          pgd.description AS description
   FROM information_schema.columns cols
   LEFT JOIN pg_catalog.pg_statio_all_tables st
     ON st.schemaname = cols.table_schema
    AND st.relname = cols.table_name
   LEFT JOIN pg_catalog.pg_description pgd
     ON pgd.objoid = st.relid
    AND pgd.objsubid = cols.ordinal_position
   WHERE cols.table_schema = 'riveratlas'
     AND cols.table_name = 'riveratlas_v10';")
knitr::kable(data_description) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Variables description
Importing HydroSHEDS
The first step is to download and process the hydrological network from the HydroSHEDS website. To do so, we created a PowerShell script that iterates through predefined file IDs ($files) and the corresponding dataset types ($atlas), constructing download URLs and file names dynamically. The script moves to the source directory, downloads each dataset as a ZIP file using curl, extracts it to a specified output directory, and then connects to the PostgreSQL database (diaspara). Within the database, it ensures a clean schema by dropping any existing schema of the same name and recreating it for a fresh data import.
PowerShell code to download HydroSHEDS
# launch in powershell
# Basin 20082137
# River 20087321
# Lake 35959544
[String[]]$files = "20082137", "20087321", "35959544"
[String[]]$atlas = "Basin", "River", "Lake"
$pathccmsource = "D:\eda\hydroatlas"
$pathccmout = "D:\eda\"
for ($i = 0; $i -lt $files.Length; $i++) {
  $file = $files[$i]
  $atlasName = $atlas[$i]
  Write-Output "Downloading $atlasName ATLAS"
  $namefile = $atlasName + "ATLAS_Data_v10.gdb"
  $schema = $atlasName + "ATLAS"
  cd $pathccmsource
  curl -o "$namefile.zip" "https://figshare.com/ndownloader/files/$file"
  Expand-Archive "$namefile.zip" -DestinationPath "$pathccmout"
  psql --dbname="postgresql://${env:userlocal}:${env:passlocal}@${env:hostdiaspara}/diaspara" -c "DROP SCHEMA IF EXISTS $schema CASCADE; CREATE SCHEMA $schema;"
}
The same process is used to import a country layer and ICES divisions. ICES divisions are downloaded from the ICES website.
Inserting data
We then insert the downloaded data into the database. To do so we use ogr2ogr, a command-line tool from the GDAL library, to import geospatial data from File Geodatabases (GDB) into the PostgreSQL database. Each command processes a different dataset (BasinATLAS, RiverATLAS, LakeATLAS) and loads it into the corresponding schema of the diaspara database.
The -f "PostgreSQL" specifies the output format, while -a_srs EPSG:4326 ensures the spatial reference system is set to WGS 84 (EPSG:4326). The -lco SCHEMA="..." option places each dataset into a designated schema (basinatlas, riveratlas, lakeatlas). The -overwrite flag ensures any existing data in the schema is replaced. The -progress option provides real-time feedback during execution. Lastly, --config PG_USE_COPY YES optimizes performance by using PostgreSQL’s COPY command for faster data insertion.
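The commands above can be sketched as follows. This is an illustrative sketch, not the exact script used by the authors: the geodatabase paths and the connection variables ($PGHOST, $PGUSER, $PGPASS) are assumptions, while the flags are those described in the text.

```shell
# Sketch: import the three ATLAS geodatabases into their schemas.
# Paths and connection variables ($PGHOST, $PGUSER, $PGPASS) are assumptions.
for atlas in Basin River Lake; do
  # Derive the schema name (basinatlas, riveratlas, lakeatlas)
  schema=$(printf '%satlas' "$atlas" | tr '[:upper:]' '[:lower:]')
  ogr2ogr -f "PostgreSQL" \
    PG:"host=$PGHOST dbname=diaspara user=$PGUSER password=$PGPASS" \
    "D:/eda/${atlas}ATLAS_Data_v10.gdb" \
    -a_srs EPSG:4326 \
    -lco SCHEMA="$schema" \
    -overwrite -progress \
    --config PG_USE_COPY YES
done
```

Running ogr2ogr once per geodatabase keeps each ATLAS product isolated in its own schema, so a failed or repeated import (-overwrite) never touches the other two.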
The same process is used to insert other downloaded data to the database.
From HydroSHEDS spatial layers to a diadromous habitat database
One of the objectives of the habitat database is to allow linking any habitat (e.g. any river reach, lake, etc.) with any spatial or biological unit used by expert groups when analysing population dynamics and assessing the species (e.g. stock unit, eel management unit, ecoregion, country, sea basin). In this context, the main task was to build this association between the HydroSHEDS spatial features of continental habitats and the spatial units used by WGBAST, WGEEL and WGNAS. Once the data are downloaded, we first split them into smaller groups following ICES Areas and Ecoregions.
Step 1 : Splitting data into smaller groups for computational efficiency (Europe/N.Africa)
European hydrographic features are extracted by first selecting large-scale catchments from basinatlas.basinatlas_v10_lev03, a layer representing coarse-resolution catchments at level 3. We filter these catchments on specific hybas_id values and store them in a temporary table, tempo.hydro_large_catchments_europe. Next, we refine the selection by extracting smaller, more detailed catchments from basinatlas.basinatlas_v10_lev12, the most detailed level (level 12) of the HydroBASINS hierarchy. We keep only the small catchments spatially contained within the large catchments using ST_Within, save them in tempo.hydro_small_catchments_europe, and add a GIST index on their geometries to improve spatial query efficiency. Finally, we select the river segments from riveratlas.riveratlas_v10 that intersect the chosen small catchments using ST_Intersects, storing the results in tempo.hydro_riversegments_europe.
Code to split data into smaller entities
-- Selecting european data from hydroatlas (large catchments)
DROP TABLE IF EXISTS tempo.hydro_large_catchments_europe;
CREATE TABLE tempo.hydro_large_catchments_europe AS (
  SELECT shape
  FROM basinatlas.basinatlas_v10_lev03
  WHERE hybas_id = ANY(ARRAY[
    2030000010, 2030003440, 2030005690, 2030006590, 2030006600,
    2030007930, 2030007940, 2030008490, 2030008500, 2030009230,
    2030012730, 2030014550, 2030016230, 2030018240, 2030020320,
    2030024230, 2030026030, 2030028220, 2030030090, 2030033480,
    2030037990, 2030041390, 2030045150, 2030046500, 2030047500,
    2030048590, 2030048790, 2030054200, 2030056770, 2030057170,
    2030059370, 2030059450, 2030059500, 2030068680, 2030059510])
); --35

-- Selecting european data from hydroatlas (small catchments)
DROP TABLE IF EXISTS tempo.hydro_small_catchments_europe;
CREATE TABLE tempo.hydro_small_catchments_europe AS (
  SELECT *
  FROM basinatlas.basinatlas_v10_lev12 ba
  WHERE EXISTS (
    SELECT 1
    FROM tempo.hydro_large_catchments_europe hlce
    WHERE ST_Within(ba.shape, hlce.shape)
  )
); --78055
CREATE INDEX idx_tempo_hydro_small_catchments_europe
  ON tempo.hydro_small_catchments_europe USING GIST(shape);

-- Selecting european data from riveratlas
DROP TABLE IF EXISTS tempo.hydro_riversegments_europe;
CREATE TABLE tempo.hydro_riversegments_europe AS (
  SELECT *
  FROM riveratlas.riveratlas_v10 r
  WHERE EXISTS (
    SELECT 1
    FROM tempo.hydro_small_catchments_europe e
    WHERE ST_Intersects(r.geom, e.shape)
  )
); --589947
The same method is used to select catchments and river segments in the Southern Mediterranean area.
Figure 1: Map of all catchments present in the database.
Step 2 : Selecting the most downstream river segment for each reach
Next, to link river basins with sea basins, we need to extract the most downstream river segments from tempo.hydro_riversegments_europe. To achieve this, we filter the table to retain only the river segments where hyriv_id equals main_riv, which identifies the most downstream segment of the river basin, ensuring that for each reach only its most downstream segment is selected. The results are then stored in tempo.riveratlas_mds.
Code to select the most downstream river segments
DROP TABLE IF EXISTS tempo.riveratlas_mds;
CREATE TABLE tempo.riveratlas_mds AS (
  SELECT *
  FROM tempo.hydro_riversegments_europe
  WHERE hydro_riversegments_europe.hyriv_id = hydro_riversegments_europe.main_riv
); --17344
The same method has been used for Southern Mediterranean data.
Figure 2: Map of most downstream river segments in the Baltic area.
Step 3 : Creating a most downstream point
To identify the most downstream point of each river segment in tempo.riveratlas_mds, a new column, downstream_point, is added to store a point geometry (EPSG:4326). The last coordinate of each segment's geometry is extracted with ST_PointN, applied to the individual line geometries returned by ST_Dump, ensuring that the most downstream point is correctly identified. The table is then updated by assigning the extracted point to the corresponding hyriv_id. Finally, a GIST index on downstream_point is created to improve the efficiency of spatial queries.
Code to create the most downstream point
ALTER TABLE tempo.riveratlas_mds
  ADD COLUMN downstream_point geometry(Point, 4326);

WITH downstream_points AS (
  SELECT hyriv_id,
         ST_PointN((ST_Dump(geom)).geom,
                   ST_NumPoints((ST_Dump(geom)).geom)) AS downstream_point
  FROM tempo.riveratlas_mds
)
UPDATE tempo.riveratlas_mds AS t
SET downstream_point = dp.downstream_point
FROM downstream_points AS dp
WHERE t.hyriv_id = dp.hyriv_id; --17344

CREATE INDEX idx_tempo_riveratlas_mds_dwnstrm
  ON tempo.riveratlas_mds USING GIST(downstream_point);
The same method is used for Southern Mediterranean data.