wgtrutta database description

DIASPARA WP3.2 working document

Technical analysis of the wgtrutta database
Author

Briand Cédric, Oliviero Jules, Helminen Jani

Published

28-12-2024

The following working document is a technical analysis of the WGTRUTTA templates. This document does not engage the WGTRUTTA, it’s to try to get how this works. The documents have been kindly provided by Iain Malcolm & Carlos Alexandre. The dataset is analysed in separation without trying to fit it in a common format for the different species (which will be the final product). This analysis doesn’t question the format of the excel table for a datacall (it’s probably best to ask for a wide format, this is what we do in eel). We are just trying to puzzle out how we would store this info, and thus gaining knowledge about the data structure for future discussions. This document is listed as a task there github link to diaspara

Part I : BIO-ECOLOGICAL DATA FOR SEA TROUT RIVERS ACROSS EUROPE

Regional site spatio-temporal units (table general information)

Table 1: General information table metadata in the WGTRUTTA DB
Variable.name Description
Site nº Order in which the site is inserted in the database
Country Country in which the site is located
River Basin Name of the main basin in which the site and respetive river are located
River Name Local name of the river in which the site is located
River ID Unique code attributed to the river (Country_River; e.g. PT_Mondego)
Site ID Unique code attributed to the site (Country_River_Number; e.g. PT_Mondego_#1)
Coordinates Latitute and Longitude of the site; WGS84 datum (Degrees, minutes and seconds format; e.g. 40°39'15.00'' N - 8°16'32.21'' W)
Data Provider Name of the researcher or entity providing the data
Time series available Year, or time series, for which the bioecological data is available for the site
Weblink Please add weblink, if available, with additional information about the river, sampling site or related projects
Comments Please provide additional comments on the data provided (e.g., catch methods and protocols used; information on sampling efficiency and other constraints; Please also indicate restrictions to data use, if they exist)
NOTE WGTRUTTA

Site n° is the primary key

QUESTION WGTRUTTA: River basin

Do you have specific frame for this, like in the Baltic, not all basins have Sea Trouts, so how do we proceed there ? Would there be regional units (quite large) like those of WGNAS or do you intend this to be specific river codes (which we intend to provide) soon in WP3.1 Note if we provide the code you can link it and hopefully get the river name for the referential

QUESTION WGTRUTTA: River ID

If you have a referential why do you need the country ? Because of the country providing the data and these will come as separate lines ?

NOTE WGTRUTTA

ESPG 4326 WGS84 is given as degree decimals like -7.522888 37.663168

Question ICES

Do you have a vocabulary of data providers to ICES. Do you need one. Then wouldn’t this make adding new data time demanding ? Should we leave this open ?

QUESTION WGTRUTTA: Time series available

Would you agree that if the database is built, a query to detailed table should give this information and that it would be tedious to maintain ?

QUESTION WGTRUTTA: comments

This information should be more specific ? We had this in WGEEL and have now to do additional datacalls to qualify the data in a harmonised way among series. Would you agree to have several columns there ?

G global general information sampling sampling table sampling id <site id> country method protocol comment public status site site table <site id> <river id> geom sampling:f1->site:f0 country country table <country id> sampling:f2->country:f0 river river table <river id> <basin id> geom site:f1->river:f0 basin basin table <basin id> geom river:f1->basin:f0

Graph: general information: A temptative table format for the general information.

The ordering of site > river > basin is straithforward. With proper postgis tables, Gis attributes (geometries) should allow constraint to check wheter a site is within the basin. A river (Mondego) or basin can belong to many countries, country is related to the <sampling id>. For this reason there is no link between site and country.

Population data

Table 2: Population data table metadata in the WGTRUTTA DB
Variable.name Description
Site nº Order in which the site is inserted in the database
Country Country in which the site is located
River Basin Name of the main basin in which the site and respetive river are located
River Name Local name of the river in which the site is located
River ID Unique code attributed to the river (Country_River; e.g. PT_Mondego)
Site ID Unique code attributed to the site (Country_River_Number; e.g. PT_Mondego_#1)
Sampling Year Year in which the provided information was collected
Sampling Season Season in whih the provided information was collected (Winter; Spring; Summer; Autumn)
Number of 0+ juveniles Absolute number of 0+ trout juveniles (parr) captured in a given sampling site and sampling campaign
Number of 1+ juveniles Absolute number of 1+ trout juveniles (parr) captured in a given sampling site and sampling campaign
Number of 2+ or older juveniles Absolute number of 2+ or older trout juveniles (parr) captured in a given sampling site and sampling campaign
Number of smolts Absolute number of trout smolts captured in a given sampling site and sampling campaign
Number of finnocks Absolute number of trout finnocks captured in a given sampling site and sampling campaign
Total number of adults Absolute number of trout adults captured in a given sampling site and sampling campaign
0+ Juvenile Density Density of 0+ trout juveniles (parr) captured in a given sampling site and sampling campaign; nº of fish/m2
1+ Juvenile Density Density of 1+ trout juveniles (parr) captured in a given sampling site and sampling campaign; nº of fish/m2
2+ or older Juvenile Density Density of 2+ or older trout juveniles (parr) captured in a given sampling site and sampling campaign; nº of fish/m2
0+ Juvenile Average Length Average total length of 0+ trout juveniles (parr) for a given sampling site and sampling campaign (mm)
1+ Juvenile Average Length Average total length of 1+ trout juveniles (parr) for a given sampling site and sampling campaign (mm)
2+ or older Juvenile Average Length Average total length of 2+ or older trout juveniles (parr) for a given sampling site and sampling campaign (mm)
0+ Juvenile Average Weight Average total weight of 0+ trout juveniles (parr) for a given sampling site and sampling campaign (g)
1+ Juvenile Average Weight Average total weight of 1+ trout juveniles (parr) for a given sampling site and sampling campaign (g)
2+ or older Juvenile Average Weight Average total weight of 2+ or older trout juveniles (parr) for a given sampling site and sampling campaign (g)
Smolt Average Length Average total length of trout smolts for a given sampling site and sampling campaign (mm)
Smolt Average Weight Average total weight of trout smolts for a given sampling site and sampling campaign (g)
Adult Average Length Average total length of trout adults for a given sampling site and sampling campaign (mm)
Adult Average Weight Average total weight of trout adults for a given sampling site and sampling campaign (g)
Average Sea Age Average age (nº of years spent at sea) of adult trouts returning to freshwaters for a given site and sampling campaign
Average River Age Average age (nº of years spent at river) of trouts captured at river (adults, smolts or juveniles) for a given site and sampling campaign
% Anadromous Trouts Proportion (%) of the fish identified as part of the anadromous ecotype for a given river and sampling campaign
% Resident Trouts Proportion (%) of the fish identified as part of the resident ecotype for a given river and sampling campaign
% Marine Survival (Finnock) Ratio (%) of the total number of 0+ sea age fish (finnock) recorded returning to freshwaters from the annual number of smolts recorded, in the same year, migrating downstream for a given site and sampling campaign
% Marine Survival (Total first return) Ratio (%) of the total number of maiden fish recorded returning to freshwaters for a given smolt cohort, site and sampling campaign
Total Ova Deposition Total nº of oocytes deposited by sea trout females estimated for a given river and sampling campaign (please provide details on the method of variable estimation at the comments section)
% Sea Lice (Total) Total proportion (%) of the trouts in a given sampling site and sampling campaign with any sign of sea lice infestation
% Sea Lice (Attached) Proportion (%) of the trouts in a given sampling site and sampling campaign with signs of sea lice infestation (attached sea lice)
% Sea Lice (Mobile) Proportion (%) of the trouts in a given sampling site and sampling campaign with signs of sea lice infestation (mobile sea lice)
% Sea Lice (Wounds) Proportion (%) of the trouts in a given sampling site and sampling campaign with signs of sea lice infestation (only sea lice wounds)
Other Occurring Salmonids Information about the presence of other salmonid species in a given sampling site
Comments Please provide additional comments on the data provided (e.g., catch methods and protocols used; information on how each variable was obtained and/or estimated; restrictions to data use, if they exist)

The structuring of the population data has in fact three different levels. It is obvious that the year / season creates a different population id (a unit for analysis). For this reason, it is separated from the rest, and creates the unicity constraint, this table also stores an eventual comment and info on other occurring salmonids. From this table two types of data would probably best be separated, it would make more sense to store together in a row level format all data pertaining to the same level of complexity, so some data will depend on stage, age and whether the trouts are resident or anadromous. This table is called popdetailmetric table in the diagram. Some other information are however related to the whole population popgroupmetrics table

G population population data popsample popsample table <population_id> <popsample_id> year season omments other occuring salmonids popdetailmetrics popdetailmetrics table <popsample_id> value datatype stage age anadromous popdetailmetrics:f0->popsample:f1 stage stage juvenile smolt finnock adult popdetailmetrics:f2->stage age age   river age sea age   popdetailmetrics:f1->age datatypedetail datatypedetail number density   length   weight popdetailmetrics:f3->datatypedetail popgroupmetrics popgroupmetrics table <popsample_id> value datatypegroupdetail popgroupmetrics:f0->popsample:f1 datatypegroup datatypegroup %anadromous %resident %Marine survival finnock Total Ova Deposition % Sea Lice (Total) % Sea Lice (Attached) % Sea Lice (Mobile) % Sea Lice (Wounds) popgroupmetrics:f1->datatypegroup

Graph:population table. Proposed structure for the population data. The content of referential values for datatype, stage, age is given to facilitate the understanding of the proposed table structure.

Individual data

Table 3: Individual data table metadata in the WGTRUTTA DB
Variable.name Description
Site nº Order in which the site is inserted in the database
Site ID Unique code attributed to the site (Country_River_Number; e.g. PT_Mondego_#1)
Sampling Year Year in which the provided information was collected
Sampling Season Season in whih the provided information was collected (Winter; Spring; Summer; Autumn)
Date of capture Specific date of capture for a fiven fish included in the databse (dd/mm/yyyy)
Fish ID Unique code attributed to each fish (River_Number; e.g. MOND#1)
Life Stage Life stage of the captured fish; Juvenile / Smolt / Adult
Habitat Habitat in which the fish was captured: Sea / Estuary / River
Length Total length of the fish (mm)
Weight Total weight of the fish (g)
Sex Sex of the fish; Male / Female / Undefined
River age Total number of years spent by the fish in freshwater
Sea age Total number of years spent by the fish at sea
Sea Lice Infestation Presence (Y) or absence (N) of signs of sea lice infestation in the fish. If Yes, please state the type of sign: attached sea lice; mobile sea lice; only sea lice wounds)
Catch Method Method used to catch/record the fish (e.g., electrofishing; trap/counter; rod cacth; nets)
Tagged (Y/N) Please state if the fish was subjected to any type of tagging procedure (Y/N)
Type(s) of Tags If Yes in the previous field, please identify the type(s) of tag(s) used in the fish (e.g., conventional; PIT tag; radio; acoustic)
Comments Please provide additional comments on the data provided (e.g., particularities of each fish and/or in its catch method; restrictions to data use, if they exist)

QUESTION WGTRUTTA: comments

Won’t the year and season attibutes be provided by the dates ?

QUESTION WGTRUTTA: site and habitats

What is the definition of a site. Site is the lowest level in the basin / river /site but seems the basins for the population structuring. Is that right ? I was just wondering about the relation between site and habitat, and whether a site if it’s small extended over several habitats.

QUESTION WGTRUTTA: public

Will the public status be managed at the fish level ? Shouldn’t it be related to an upper level ?

QUESTION WGTRUTTA: tag

Can there be several tag per fish ?

G individual Individual data indivtable individual table <fish_id> fish_idorigin <popsample_id> stage date x y catch method habitat comments stage stage juvenile smolt finnock adult indivtable:f2->stage popsample popsample indivtable:f1->popsample:f1 fishmetrics fishmetrics table <fish_id> metric value fishmetrics:f0->indivtable:f0 metric metric length weight sex river age sea age Sea Lice Infestation Sea Lice (Attached) Sea Lice (Mobile) Sea Lice (Wounds) fishmetrics:f1->metric:f0 tag tag table <fish_id> tag_type tag_code tag_actionvalue tag:f0->indivtable:f0 tagaction tagaction tag:f1->tagaction

Graph:individual table. Proposed structure for the population data. The content of referential values for datatype, stage, age is given to facilitate the understanding of the proposed table structure. Fish idorigin is the id in the orginal table (necessary to update the right fish when changes are made). More information about tags have been added. In the end it should just be few fishes. We’ve also added more details on sea lice, those that are at population level have been reported there. Normally everything in value should work as numeric, some values (sex) or sea Lice infestation can be recoded as zero or one.

Part II: ENVIRONMENTAL DATA FOR SEA TROUT RIVERS ACROSS EUROPE

Site level environmental data

Table 4: Site - env table metadata in the WGTRUTTA DB
column
site_no
country
river_basin
river_name
river_id
site_id
year_s
season
month
length
average_width
predominant_substrate_first
predominant_substrate_second
predominant_substrate_third
average_depth
pool
run
riffle
vegetation_extension
vegetation_predominant_type
large_woody_debris
riparian_gallery_density_left_bank
riparian_gallery_density_right_bank
shade
ths_width
ths_depth
ths_velocity
ths_substrate
ths_shade
ths_slope
estimated_ths
alcalinity
comments

QUESTION WGTRUTTA: what is a site ?

The main problem is to relate this information to site. What is a site ? Is it large, several segments, is it the scale of a subcatchment, does it correspond to an electrofishing site ? It seems, as you have reach level information, that the data at a site is maybe an average of data collected in different reaches. We will probably need more explanation.

QUESTION WGTRUTTA: what is a site ?

There is an information about the year season and month. Can there really be information about multiple years ? How do you relate that to month and season ? Do you expect to fill one or the other column ? Do you expect to have chronicles of habitat measures ? If it’s the case would it be possible to simply use a date here ?

Macroscale information

Table 5: Macroscale information columns in the WGTRUTTA DB
column
site_no
country
river_basin
river_name
river_id
site_id
distance_to_sea
distance_to_source
wetted_area
altitude
slope
nb_dam_downstream
nb_dam_upstream
regulated_flow
habitat_restoration
fisheries
restocking
comments

To my understanding, the macroscale information will be collected in a gis table of river segments. There will be several gis, possibly with segments coverage that will be different from the portion of habitat described at site level. This table will carry out the macroscale information and be used to transfer it.

Reach level habitat

Table 6: Reach level habitat columns in the WGTRUTTA DB
column
site_no
country
river_basin
river_name
river_id
site_id
year_s
season
month
predominant_substrate_first
predominant_substrate_second
predominant_substrate_third
pool
run
riffle
vegetation_extension
vegetation_predominant_type
large_woody_debris
riparian_gallery_density_left_bank
riparian_gallery_density_right_bank
shade
ths_width
ths_depth
ths_velocity
ths_substrate
ths_shade
ths_slope
estimated_ths
alcalinity
comments

QUESTION WGTRUTTA: what is a reach ?

Could the reach level habitat be characterized as a point geometry or is it a segment ? How do you relate reach and sites ?

G environmental Envionmental data global general information population population data individual Individual data sampling sampling table sampling id <site id> country method protocol comment public status site site table <site id> <river id> geom sampling:f1->site:f0 country country table <country id> sampling:f2->country:f0 segmentsite segment-site table <site id> <segment id> segment segment table <segment id> distance to sea distance to farthest source wetted area altitude slope nb dam downstream regulated flow habitat restoration fisheries restocking <river id> geom segmentsite:f1->segment:f0 reach reach table <reach id> <site id> geom reach:f1->site:f0 site:f0->segmentsite:f0 river river table <river id> <basin id> geom site:f1->river:f0 basin basin table <basin id> geom river:f1->basin:f0 popsample popsample table <population_id> popsample_id year season omments other occuring salmonids popsample:f0->sampling:f0 popdetailmetrics popdetailmetrics table <popsample_id> value datatype stage age anadromous popdetailmetrics:f0->popsample:f1 stage stage juvenile smolt finnock adult popdetailmetrics:f2->stage age age   river age sea age   popdetailmetrics:f1->age datatypedetail datatypedetail number density   length   weight popdetailmetrics:f3->datatypedetail popgroupmetrics popgroupmetrics table <popsample_id> value datatypegroupdetail popgroupmetrics:f0->popsample:f1 datatypegroup datatypegroup %anadromous %resident %Marine survival finnock Total Ova Deposition % Sea Lice (Total) % Sea Lice (Attached) % Sea Lice (Mobile) % Sea Lice (Wounds) popgroupmetrics:f1->datatypegroup indivtable individual table <fish_id> fish_idorigin <popsample_id> stage date x y catch method habitat comments indivtable:f1->popsample:f1 indivtable:f2->stage fishmetrics fishmetrics table <fish_id> metric value fishmetrics:f0->indivtable:f0 metric metric length weight sex river age sea age Sea Lice Infestation Sea Lice (Attached) Sea Lice (Mobile) Sea Lice (Wounds) fishmetrics:f1->metric:f0 tag tag table <fish_id> tag_type tag_code tag_actionvalue tag:f0->indivtable:f0 tagaction tagaction read remove apply     tag:f1->tagaction sitehabitat site habitat <site id> geom year month length average_width predominant_substrate_first predominant_substrate_second predominant_substrate_third average_depth pool run riffle vegetation_extension vegetation_predominant_type large_woody_debris riparian_gallery_density_left_bank riparian_gallery_density_right_bank shade ths_width ths_depth ths_velocity ths_substrate ths_shade ths_slope estimated_ths alcalinity comments sitehabitat:f0->site:f0 reachhabitat reach habitat <reach id> geom year month length average_width predominant_substrate_first predominant_substrate_second predominant_substrate_third average_depth pool run riffle vegetation_extension vegetation_predominant_type large_woody_debris riparian_gallery_density_left_bank riparian_gallery_density_right_bank shade ths_width ths_depth ths_velocity ths_substrate ths_shade ths_slope estimated_ths alcalinity comments

graph: A temptative table format for the WGTRUTTA datacall template.

 

EU is not reponsible for the content of the project