WGNAS database description

DIASPARA WP3.2 working document

Technical analysis of the salmoglob database
Author

Briand Cédric, Oliviero Jules, Helminen Jani

Published

Last Updated on 16-04-2025

The following working document is just a technical analysis of the WGNAS database (and further its graphical interface) (ICES 2024). It uses different sources, ICES vocabulary, the stock annex (ICES 2021), to analyse the structure of the wgnas database before integrating in a single database (wgnas, wgbast, wgeel) in the DIASPARA project. This document does not engage the WGNAS it’s just a technical analysis, to try to get how this works. This document is listed as a task there : https://github.com/DIASPARAproject/WP3_migdb/issues/11

Main database structure

When loading the first lines of the db, it really looks as following :

database10 <- DBI::dbGetQuery(con_salmoglob, 'SELECT * FROM public.database limit 20')
knitr::kable(database10) %>% kable_styling(bootstrap_options = c("striped","hover", "condensed"))
Table 1: Structure of the WGNAS main database
version year type age area location metric value var_mod date_time
2 NA Conservation limits _ coun_England_Wales _ Mean 209010700.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Finland _ Mean 104278220.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_France _ Mean 55156800.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Gulf _ Mean 248680000.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Iceland_NE _ Mean 23889096.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Iceland_SW _ Mean 51693269.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Ireland _ Mean 710711690.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Labrador _ Mean 243660000.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Newfoundland _ Mean 267780000.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Northern_Ireland _ Mean 93800000.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Norway _ Mean 444064979.75 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Quebec _ Mean 50380000.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Russia _ Mean 357856550.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Scotia Fundy _ Mean 224140000.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Scotland _ Mean 561073322.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_Sweden _ Mean 13997100.00 cons_lim 2022-08-18 08:37:30
2 NA Conservation limits _ coun_US _ Mean 435369000.00 cons_lim 2022-08-18 08:37:30
1 NA Demographic transitions Mixed Atlantic _ Coefficient of variation 0.01 CV_dummy 2020-11-24 18:32:04
1 NA Homewater catches Mixed Atlantic _ Coefficient of variation 0.05 CV_hw 2020-11-24 18:32:04
1 NA Natural mortality rate Mixed Atlantic _ Coefficient of variation 0.03 CV_M 2020-11-24 18:32:04

Area

Figure 1: Phylogeographic structuring observed across studies in Atlantic salmon resolved by screening of phylogenetically informative nuclear microsatellite and SNP variation.

We first analyse the types of area that are found in the database. These are the possible geographical locations.

From (Rivot et al. (2021)) : > The model considers the dynamics of 25 SU (subscript r = 1, … , N with N=25) (Fig. 1):
> * 6 SU from NA CSG, indexed by r = 1, …, 6: 1 = Newfoundland, 2 = Gulf, 3 = Scotia-Fundy, 4 = USA, 5 = Quebec and 6 = Labrador;
> * 8 SU from the SE CSG, indexed by r = 7, …, 14: 7 = France, 8 = UK England and Wales, 9 = Ireland, 10 = UK Northern Ireland - FO, 11 = UK Northern Ireland - FB, 12 = UK Scotland East, 13 = UK Scotland West, 14 = Iceland South-West;
> * 11 SU from NE CSG, indexed by r= 15, …, 25: 15 = Iceland North-East, 16 = Sweden, 17 = Norway South-East, 18 = Norway South-West, 19 = Norway Middle, 20 = Norway North, 21 = Finland, 22 = Russia Kola Barents, 23 = Russia Kola White Sea, 24 = Russia Arkhangelsk Karelia and 25 = Russia River Pechora. SU are defined on the basis of freshwater areas. All salmon within a SU are assumed to have the same demographic parameters and to undertake a similar migration route at sea. Note that Germany and Spain (SE CSG) are not considered at this stage because no complete series of data are provided to ICES WGNAS by these two jurisdictions.

Code to find different values in column area
area <- DBI::dbGetQuery(con_salmoglob, 'SELECT DISTINCT area FROM public.database')
knitr::kable(area[grep("fishery",area$area),,drop=FALSE], caption = "Area corresponding to fisheries", row.names=FALSE)
knitr::kable(area[grep("coun",area$area),,drop=FALSE], caption = "Area corresponding to countries", row.names=FALSE)
knitr::kable(area[!grepl("coun",area$area) & !grepl("fishery",area$area),,drop=FALSE], caption = "Area corresponding to source area", row.names=FALSE)
Table 2: Area types
(a) Fisheries
area
LB fishery
LB/SPM/swNF fishery
neNF fishery
FAR fishery
GLD fishery
(b) Countries
area
coun_Gulf
coun_Ireland
coun_Finland
coun_Scotland
coun_Russia
coun_Norway
coun_Northern_Ireland
coun_Iceland_SW
coun_Iceland_NE
coun_US
coun_Scotia Fundy
coun_England_Wales
coun_Labrador
coun_Newfoundland
coun_Sweden
coun_Quebec
coun_France
(c) Source area
area
GF
NO_SW
SW
NEC
NAC
NO_NO
US
RU_RP
FR
IC_NE
IR
Atlantic
QC
NI_FO
FI
RU_KB
NF
IC_SW
NO_SE
SC_EA
NO_MI
LB
RU_AK
EW
SF
SC_WE
NI_FB
RU_KW

The latter table (Table 2 (c)) is by default, so Atlantic, NEC and NAC are included. Having a label in a reference table and a geom to associate to those labels would be very usefull.

Location

Locations corresponding to neither fisheries nor areas

Code to analyse the data in location
location <- DBI::dbGetQuery(con_salmoglob, 'SELECT DISTINCT location FROM public.database')
knitr::kable(location[grep("fishe",tolower(location$location)),,drop=FALSE],  row.names=FALSE)
location_biz <- data.frame("location"=setdiff(location[!grepl("fishe",tolower(location$location)),],
  area[!grepl("coun",area$area) & !grepl("fishery",area$area),]))
knitr::kable(location_biz, caption = , row.names=FALSE)  
location_area <- data.frame("location"=intersect(location[!grepl("fishe",location$location),],
  area[!grepl("coun",area$area) & !grepl("fishery",area$area),]))
knitr::kable(location_area, row.names=FALSE)
Table 3: Locations
(a) Location corresponding to fisheries
location
Bef. Fisheries
Bef. Gld fisheries
Aft. First fisheries
Aft. Second fisheries
Aft. Gld fisheries
Return aft. Second fishery
Gld fisheries
Return aft. First fishery
(b) These one are neither fishery nor area
location
GLD - by NAC SU
_
FAR - by SU
GLD - by NEC SU
delayed spawners
additional
GLD - by cplx
(c) Location corresponding to area
location
NO_SW
SW
NO_NO
US
RU_RP
FR
IC_NE
IR
QC
NI_FO
FI
RU_KB
NF
IC_SW
NO_SE
SC_EA
NO_MI
LB
RU_AK
EW
SF
SC_WE
NI_FB
RU_KW
GF

Here we search for locations related to fisheries.

QUESTION TO WGNAS

  • Is there some gis reference for these fishing areas
  • Would it possible to use ICES areas there or would this reporting correspond to different boundaries ?

Locations corresponding to fisheries

code to list the parameters with location not being area nor fishery
for (i in 1 : nrow(location_biz)){
  location_bizi <- location_biz$location[i]
  var_biz <- DBI::dbGetQuery(con_salmoglob, glue::glue_sql("SELECT distinct(var_mod) FROM public.database 
where location in ({location_bizi})", .con=con_salmoglob))
colnames(var_biz) <- location_bizi
print(kable(var_biz[,,drop=FALSE], row.names=FALSE))
}
Table 4: List of parameters corresponding to neither fishery nor area (corresponding to weird table…).
GLD - by NAC SU
p_C8_2_NAC_gld_mu
_
theta6_surv
log_N9_sd
N
nSm
CV_N1_pr
CV_M
log_N6_sd
theta6_delSp
p_smolt_pr
min_log_N9
Stocking_2SW
date_begin
date_end_hindcast
log_C6_mu
p_smolt
log_N6_mu
eggs
CV_theta1
N_NEC
log_C6_sd
max_log_N9
CV_dummy
log_C9_sd
log_C9_mu
log_N9_mu
E_M
theta9_surv
E_theta1_pr
cons_lim
prop_female
E_theta1
N_Sample
CV_hw
N_NAC
mu_N1_pr
date_end
theta9_delSp
N_Sample_sm
FAR - by SU
p_C8_NEC_3_far_mu
p_C5_NEC_1_far_mu
p_C8_NEC_1_far_mu
GLD - by NEC SU
p_C8_2_NEC_gld_mu
delayed spawners
log_C9_delSp_sd
log_C6_delSp_mu
log_C6_delSp_sd
log_C9_delSp_mu
additional
log_C9_sup_mu
log_C9_sup_sd
log_C6_sup_sd
log_C6_sup_mu
GLD - by cplx
p_C8_2_NECNAC_gld_mu

Some values have been deprecated and are present in the database archive but not in the main database.

Code to find what is only in database_archive.
database <- DBI::dbGetQuery(con_salmoglob, "SELECT * FROM public.database")
database_archive <- DBI::dbGetQuery(con_salmoglob, "SELECT * FROM public.database_archive")

# List of commun columns
cols <- c("version", "year", "type", "age", "area", "location", "metric", "var_mod")

# getting values that are not present in database anymore
for (col in cols) {
  diff_values <- setdiff(database_archive[[col]], database[[col]])
  if (length(diff_values) > 0) {
    cat("Values present in database_archive but not in database for", col, "column :\n")
    print(diff_values)
  }
}
Values present in database_archive but not in database for version column :
[1] 3 4
Values present in database_archive but not in database for year column :
[1] 51 52
Values present in database_archive but not in database for area column :
[1] "NEAC"
Values present in database_archive but not in database for location column :
[1] "main"
Code to get the number of location per model variable
result <- database %>%
  group_by(var_mod) %>%
  summarise(location_count = n_distinct(location))
knitr::kable(result) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 5: Number of locations per model variables
var_mod location_count
CV_M 1
CV_N1_pr 1
CV_dummy 1
CV_hw 1
CV_theta1 1
E_M 1
E_theta1 1
E_theta1_pr 1
N 1
N_NAC 1
N_NEC 1
N_Sample 1
N_Sample_sm 1
Stocking_2SW 1
cons_lim 1
date_begin 1
date_end 1
date_end_hindcast 1
deltat5_1 1
deltat5_2 1
deltat8_1 1
deltat8_2 1
deltat8_2_1 1
deltat8_2_2 1
eggs 1
log_C5_NAC_1_lbnf_mu 1
log_C5_NAC_1_lbnf_sd 1
log_C5_NAC_2_lbnf_lab_mu 1
log_C5_NAC_2_lbnf_lab_sd 1
log_C5_NAC_2_lbnf_oth_mu 1
log_C5_NAC_2_lbnf_oth_sd 1
log_C5_NEC_1_far_mu 1
log_C5_NEC_1_far_sd 1
log_C6_delSp_mu 1
log_C6_delSp_sd 1
log_C6_mu 1
log_C6_sd 1
log_C6_sup_mu 1
log_C6_sup_sd 1
log_C8_2_gld_tot_mu 1
log_C8_2_gld_tot_sd 1
log_C8_NAC_1_lbnf_mu 1
log_C8_NAC_1_lbnf_sd 1
log_C8_NAC_3_lbnf_mu 1
log_C8_NAC_3_lbnf_sd 1
log_C8_NAC_4_lbnf_lab_mu 1
log_C8_NAC_4_lbnf_lab_sd 1
log_C8_NAC_4_lbnf_oth_mu 1
log_C8_NAC_4_lbnf_oth_sd 1
log_C8_NEC_1_far_mu 1
log_C8_NEC_1_far_sd 1
log_C8_NEC_3_far_mu 1
log_C8_NEC_3_far_sd 1
log_C9_delSp_mu 1
log_C9_delSp_sd 1
log_C9_mu 1
log_C9_sd 1
log_C9_sup_mu 1
log_C9_sup_sd 1
log_N6_mu 1
log_N6_sd 1
log_N9_mu 1
log_N9_sd 1
max_log_N9 1
min_log_N9 1
mu_N1_pr 1
nSm 1
omega 25
p_C5_NEC_1_far_mu 1
p_C8_2_NAC_gld_mu 1
p_C8_2_NECNAC_gld_mu 1
p_C8_2_NEC_gld_mu 1
p_C8_NEC_1_far_mu 1
p_C8_NEC_3_far_mu 1
p_smolt 1
p_smolt_pr 1
prop_female 1
theta6_delSp 1
theta6_surv 1
theta9_delSp 1
theta9_surv 1

So in the end, only omega has 25 locations, and in this case the location corresponds to area. This means we can get rid of the location column. For omega we can keep it,; so we can add a column with a foreign key to area in the parameters table where priors will be stored.

Sea age

Salmon mature at various sea ages, typically returning to freshwater to spawn after one to three years at sea, but also sometimes at older sea ages; this varies widely between populations. Those salmon that return after one year at sea are referred to as one-sea-winter (1SW) salmon, or grilse, with older fish categorised as 2SW, 3SW, etc. In practice, however, for management purposes these older sea age fish are typically aggregated and collectively referred to as multi-sea-winter (MSW) salmon. The sea age when salmon become sexually mature depends on genetics as well as growing conditions in the sea, and possibly freshwater, although the precise proximate factors initiating homeward migration are unknown (Hansen and Quinn, 1998). The sea age of Atlantic salmon is important in the context of stock definition since these different groups of fish have different migration routes, return at different times and are differentially exploited in fisheries. Thus, for example, it is only potential MSW salmon that are exploited in the distant water salmon fishery that operates off the west coast of Greenland (ICES 2021).

Code to describe the content of age column
summary_age <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) N, age FROM public.database group by age')%>% arrange(age)
knitr::kable(summary_age, caption = "Number age data categorized by age") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
summary_age_parm <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) N, age, var_mod FROM public.database group by age, var_mod order by age, var_mod') 
knitr::kable(summary_age_parm %>% filter(age=="_"), caption = "Number age data categorized by age, parameters without age, n is the number of lines in the db") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
knitr::kable(summary_age_parm %>% filter(age!="_"), caption = "Number age data categorized by age, n is the number of lines in the db") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
distinct_age_parm <- DBI::dbGetQuery(con_salmoglob, 
  "WITH t1 AS (
  SELECT age, 
  var_mod 
  FROM public.database 
  group by age, var_mod 
  order by age, var_mod ),
  t2 AS (
  SELECT age as age2, 
  var_mod as var_mod2
  FROM public.database 
  group by age, var_mod 
  order by age, var_mod)
  SELECT age, age2, var_mod from t1 JOIN t2 on t1.var_mod= t2.var_mod2 
  where age!=age2
  ")
knitr::kable(distinct_age_parm, caption = "Variables using more than 1 age (e.g. from which the age cannot be derived from the variable itself.)") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 6: Age structure in the database
(a) Number age data categorized by age
n age
26 0SW
1525 1FW
14687 1SW
212 1SW immature
212 1SW mature
1007 1SW maturing
1007 1SW not maturing
1325 1SW post-return
1525 2FW
18868 2SW
1325 2SW post-return
1525 3FW
1525 4FW
1525 5FW
1525 6FW
57 Mixed
649 _
1551 eggs
(b) Number age data categorized by age, parameters without age, n is the number of lines in the db
n age var_mod
17 _ cons_lim
1 _ date_begin
1 _ date_end
1 _ date_end_hindcast
1 _ N
1 _ N_NAC
1 _ N_NEC
1 _ nSm
625 _ omega
(c) Number age data categorized by age, n is the number of lines in the db
n age var_mod
25 0SW mu_N1_pr
1 0SW N_Sample_sm
1325 1FW p_smolt
200 1FW p_smolt_pr
25 1SW deltat5_1
25 1SW deltat5_2
1325 1SW eggs
53 1SW log_C5_NAC_2_lbnf_lab_mu
53 1SW log_C5_NAC_2_lbnf_lab_sd
53 1SW log_C5_NAC_2_lbnf_oth_mu
53 1SW log_C5_NAC_2_lbnf_oth_sd
1325 1SW log_C6_delSp_mu
1325 1SW log_C6_delSp_sd
1325 1SW log_C6_mu
1325 1SW log_C6_sd
1250 1SW log_C6_sup_mu
1250 1SW log_C6_sup_sd
1325 1SW log_N6_mu
1325 1SW log_N6_sd
1325 1SW prop_female
1325 1SW theta6_delSp
53 1SW immature log_C8_NAC_1_lbnf_mu
53 1SW immature log_C8_NAC_1_lbnf_sd
53 1SW immature log_C8_NEC_1_far_mu
53 1SW immature log_C8_NEC_1_far_sd
53 1SW mature log_C5_NAC_1_lbnf_mu
53 1SW mature log_C5_NAC_1_lbnf_sd
53 1SW mature log_C5_NEC_1_far_mu
53 1SW mature log_C5_NEC_1_far_sd
1007 1SW maturing p_C5_NEC_1_far_mu
1007 1SW not maturing p_C8_NEC_1_far_mu
1325 1SW post-return theta6_surv
1325 2FW p_smolt
200 2FW p_smolt_pr
25 2SW deltat8_1
25 2SW deltat8_2
25 2SW deltat8_2_1
25 2SW deltat8_2_2
1325 2SW eggs
53 2SW log_C8_2_gld_tot_mu
53 2SW log_C8_2_gld_tot_sd
53 2SW log_C8_NAC_3_lbnf_mu
53 2SW log_C8_NAC_3_lbnf_sd
53 2SW log_C8_NAC_4_lbnf_lab_mu
53 2SW log_C8_NAC_4_lbnf_lab_sd
53 2SW log_C8_NAC_4_lbnf_oth_mu
53 2SW log_C8_NAC_4_lbnf_oth_sd
53 2SW log_C8_NEC_3_far_mu
53 2SW log_C8_NEC_3_far_sd
1325 2SW log_C9_delSp_mu
1325 2SW log_C9_delSp_sd
1325 2SW log_C9_mu
1325 2SW log_C9_sd
1250 2SW log_C9_sup_mu
1250 2SW log_C9_sup_sd
1325 2SW log_N9_mu
1325 2SW log_N9_sd
25 2SW max_log_N9
25 2SW min_log_N9
318 2SW p_C8_2_NAC_gld_mu
1007 2SW p_C8_2_NEC_gld_mu
106 2SW p_C8_2_NECNAC_gld_mu
1007 2SW p_C8_NEC_3_far_mu
1325 2SW prop_female
1325 2SW Stocking_2SW
1325 2SW theta9_delSp
1325 2SW post-return theta9_surv
1325 3FW p_smolt
200 3FW p_smolt_pr
1325 4FW p_smolt
200 4FW p_smolt_pr
1325 5FW p_smolt
200 5FW p_smolt_pr
1325 6FW p_smolt
200 6FW p_smolt_pr
1 eggs CV_N1_pr
25 eggs CV_theta1
1325 eggs E_theta1
200 eggs E_theta1_pr
1 Mixed CV_dummy
1 Mixed CV_hw
1 Mixed CV_M
1 Mixed E_M
53 Mixed N_Sample
(d) Variables using more than 1 age (e.g. from which the age cannot be derived from the variable itself.)
age age2 var_mod
1SW 2SW eggs
2SW 1SW eggs
2FW 6FW p_smolt
2FW 1FW p_smolt
2FW 3FW p_smolt
2FW 4FW p_smolt
2FW 5FW p_smolt
6FW 2FW p_smolt
6FW 1FW p_smolt
6FW 3FW p_smolt
6FW 4FW p_smolt
6FW 5FW p_smolt
1FW 2FW p_smolt
1FW 6FW p_smolt
1FW 3FW p_smolt
1FW 4FW p_smolt
1FW 5FW p_smolt
3FW 2FW p_smolt
3FW 6FW p_smolt
3FW 1FW p_smolt
3FW 4FW p_smolt
3FW 5FW p_smolt
4FW 2FW p_smolt
4FW 6FW p_smolt
4FW 1FW p_smolt
4FW 3FW p_smolt
4FW 5FW p_smolt
5FW 2FW p_smolt
5FW 6FW p_smolt
5FW 1FW p_smolt
5FW 3FW p_smolt
5FW 4FW p_smolt
1FW 2FW p_smolt_pr
1FW 3FW p_smolt_pr
1FW 4FW p_smolt_pr
1FW 5FW p_smolt_pr
1FW 6FW p_smolt_pr
2FW 1FW p_smolt_pr
2FW 3FW p_smolt_pr
2FW 4FW p_smolt_pr
2FW 5FW p_smolt_pr
2FW 6FW p_smolt_pr
3FW 1FW p_smolt_pr
3FW 2FW p_smolt_pr
3FW 4FW p_smolt_pr
3FW 5FW p_smolt_pr
3FW 6FW p_smolt_pr
4FW 1FW p_smolt_pr
4FW 2FW p_smolt_pr
4FW 3FW p_smolt_pr
4FW 5FW p_smolt_pr
4FW 6FW p_smolt_pr
5FW 1FW p_smolt_pr
5FW 2FW p_smolt_pr
5FW 3FW p_smolt_pr
5FW 4FW p_smolt_pr
5FW 6FW p_smolt_pr
6FW 1FW p_smolt_pr
6FW 2FW p_smolt_pr
6FW 3FW p_smolt_pr
6FW 4FW p_smolt_pr
6FW 5FW p_smolt_pr
2SW 1SW prop_female
1SW 2SW prop_female
NOTE

Only eggs, p_smolt, p_smolt_pr and prop_female need an age, the other is contained in the name of the variable.

NOTE

So the age also covers information on maturity but the only time it is really used outside from the metadata (eggs, smolt, prop_female) the maturity is not necessary.

QUESTION TO WGNAS

  • Why is omega not mixed ?
  • Would it not make more sense to store 2SW as MSW (multi-sea-winter) ? So in the referential have 1SW 2SW 3SW … how many ? and then MSW ?

Developmental stage

Code to extract count per life stage in metadata
summary_lifestage_metadata <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) as  n,  life_stage FROM public.metadata group by life_stage')%>% arrange(life_stage)
knitr::kable(summary_lifestage_metadata, format = "markdown") %>%
 kable_material(c("striped", "hover")) |>
  kbl() 
Table 7: Number of lines with lifestage in the metadata table
x
<table class=" lightable-material lightable-striped lightable-hover" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> n </th> <th style="text-align:left;"> life_stage </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 103 </td> <td style="text-align:left;"> Adult </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Adults </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Eggs </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Multiple </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Non mature </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> PFA </td> </tr> <tr> <td style="text-align:right;"> 20 </td> <td style="text-align:left;"> Smolts </td> </tr> <tr> <td style="text-align:right;"> 21 </td> <td style="text-align:left;"> Spawners </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> _ </td> </tr> </tbody> </table>
NOTE

get rid of Adults replace with Adult.

NOTE

get rid of Multiple there is only one instance.

NOTE

Now there is the PFA, which is fine, but not really a stage.

We first start by looking at the possible reference tables in ICES corresponding to both maturity and stage. Currently we have to store information about the sea age (0SW, 1SW, 2SW, ...,MSW) , the freshwater age (1FW 2FW 3FW 4FW ...). The maturity could be stored in the metadata as it seems to be linked always with parameters. Currently in wgnas we have maturity : immature, mature, not maturing and post return. In the metadata we have egg, smolts, spawner, adults

QUESTION TO WGNAS

  • Is the correspondence mature / maturing correct or is it pre -spawing?
  • Is the correspondence OK. Do we really need PFA stage or could we just describe it in the metadata ?

QUESTION TO ICES

Could we add smolt and PFA (pre fisheries abundance) somewhere ?

Metric

The metric are linked with coefficients, they don’t need to be stored in the main db

Code to describe the metrics
summary_metric <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) as  n,  metric FROM public.database group by metric')%>% arrange(metric)
knitr::kable(summary_metric,
 caption = "Number of lines with metric in the database table") %>%
 kable_material(c("striped", "hover"))
 summary_metric <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) as  n,  metric FROM public.metadata group by metric')%>% arrange(metric)
knitr::kable(summary_metric,
 caption = "Number of lines with metric in the metadata table") %>%
 kable_material(c("striped", "hover"))
Table 8: Metric overview
(a) Number of lines with metric in the database table
n metric
50 Bound
29 Coefficient of variation
679 Hyperparameter
7 Index
38278 Mean
11033 Standard deviation
(b) Number of lines with metric in the metadata table
n metric
2 Bound
5 Coefficient of variation
75 Estimate
6 Hyperparameter
7 Index
60 Mean
2 Precision
17 Standard deviation

There is a problem of correspondence between the database and metadata, detailed below where we query the database to find where the metric in metatdata, and the metric in database are different :

Code to join metadata and metrics and find correspondence problems
# this query didn't return as expected
 pb <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT database.var_mod, 
  metadata.metric as metric_metadata, 
  database.metric as metric_database  
  FROM public.metadata 
  JOIN public.database
  ON database.var_mod=metadata.var_mod
  WHERE metadata.metric != database.metric
  ")
knitr::kable(pb,
 caption = "Variables with correspondence pb for metric between the metadata and the database table") %>%
 kable_material(c("striped", "hover"))
 pb1 <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT distinct metadata.var_mod, 
  metadata.metric as metric_metadata,
  database.metric as metric_database  
  FROM public.metadata  
  left JOIN public.database
  ON database.var_mod=metadata.var_mod
  WHERE metadata.metric in ('Precision','Estimate')
  ")
knitr::kable(pb1,
 caption = "Variables with correspondence pb for metric between the metadata and the database table") %>%
 kable_material(c("striped", "hover"))
Table 9: Checking correspondence between database and metadata
(a) This row has a problem
var_mod metric_metadata metric_database
NA NA NA
:------- :--------------- :---------------
(b) Variables with correspondence pb for metric between the metadata and the database table
var_mod metric_metadata metric_database
tau_theta4 Precision NA
C5_NEC_1_tot Estimate NA
C6_hw Estimate NA
C5_NAC_1 Estimate NA
h6_hw_delSp Estimate NA
C9_hw Estimate NA
p_C8_2_NECNAC Estimate NA
C5_NAC_2_lab Estimate NA
C8_NAC_4_lab Estimate NA
N9 Estimate NA
theta1_ddp Estimate NA
h9_hw_delSp Estimate NA
tau_theta3 Precision NA
C5_NAC_2 Estimate NA
N8_1 Estimate NA
p_C8_NEC_1 Estimate NA
logit_theta3_pr Estimate NA
h5_NEC_1 Estimate NA
N5 Estimate NA
logN8 Estimate NA
N7 Estimate NA
C8_NEC_1 Estimate NA
h8_NEC_1 Estimate NA
logit_theta4 Estimate NA
h8_NEC_3 Estimate NA
theta3_pr Estimate NA
p_C5_NEC_1 Estimate NA
C5_NAC_2_other Estimate NA
h8_NAC_4 Estimate NA
C8_NEC_3 Estimate NA
C8_NEC_1_tot Estimate NA
logN2 Estimate NA
theta1 Estimate NA
theta4 Estimate NA
C8_2 Estimate NA
C5_NAC_1_tot Estimate NA
C8_NAC_3 Estimate NA
logit_theta3 Estimate NA
C8_2_comp Estimate NA
p_smolt_gamma Estimate NA
h5_NAC_1 Estimate NA
N4 Estimate NA
C8_NAC_1 Estimate NA
h8_NAC_4_lab Estimate NA
C8_NAC_4 Estimate NA
C6_hw_delSp Estimate NA
h8_NAC_1 Estimate NA
p_C8_2_NAC Estimate NA
C9_hw_delSp Estimate NA
N10 Estimate NA
C8_NEC_3_tot Estimate NA
h8_NAC_4_other Estimate NA
h6_hw Estimate NA
logN4 Estimate NA
h8_2 Estimate NA
N3 Estimate NA
p_smolt_stoch Estimate NA
theta8_2 Estimate NA
theta3 Estimate NA
N3_tot Estimate NA
N6 Estimate NA
h9_hw Estimate NA
C8_NAC_1_tot Estimate NA
h5_NAC_2 Estimate NA
C8_NAC_3_tot Estimate NA
log_N9_pr Estimate NA
p_C8_NEC_3 Estimate NA
h8_NAC_3 Estimate NA
N1 Estimate NA
logN5 Estimate NA
N1_pr Estimate NA
N3_pr Estimate NA
p_C8_2_NEC Estimate NA
C8_2_tot Estimate NA
C5_NEC_1 Estimate NA
C8_NAC_4_other Estimate NA
N8_2 Estimate NA

Metadata

The metadata structure is described in table Table 10.

Table 10: Metadata structure

Details on values inserted in the DB are described in subtables from table Table 11.

Code to get all unique values from metadata
DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT type_object FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT nimble FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
 DBI::dbGetQuery(con_salmoglob, 
 model_stage <-  "SELECT DISTINCT model_stage FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
 DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT type FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT locations FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT fishery FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT metric FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
   DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT status FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT environment FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
   DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT life_stage FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
   DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT complex FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT ages  FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
Table 11: Various values in the salmoglob metadata, description of their content
(a) type_object
type_object
single_value
array
matrix
vector
(b) nimble
nimble
Data_nimble
Const_nimble
other
Output
(c) model_stage
model_stage
Fit
other
First year
(d) type
type
Origin distribution in sea catches
Maturation rate
Homewater catches
Conservation limits
Number of years
Initialization first year
Returns
Abundance
High seas harvest rates
Smolt age structure
Time spent at sea
Sex ratio
Number of SU
Number of smolt ages
River harvest rates
Stocking
First year
Survival rate
Proportion of delayed individuals
Sea catches
Demographic transitions
Fecundity rate
Natural mortality rate
Prior hyperparameter
(e) locations
locations
NF fisheries
Bef. Fisheries
_
GLD - by NAC SU
FAR - by SU
Aft. FAR fisheries
Bef. Gld fisheries
Bef. FAR fisheries
LB/SPM fisheries in Lb
LB/SPM fisheries out. Lb
Aft. First fisheries
Btw. GLD - NF/LB fisheries
Aft. NF/LB fisheries
Btw. NF/LB - GLD fisheries
NF/LB fisheries
Aft. Second fisheries
Aft. Gld fisheries
Btw. FAR - GLD fisheries
NF/LB/SPM fisheries
GLD - by NEC SU
Btw. GLD - FAR fisheries
Return aft. Second fishery
LB/SPM fisheries tot
Return aft. First fishery
Gld fisheries
Bef. NF/LB fisheries
FAR fisheries
GLD - by cplx
GLD fisheries
(f) fishery
fishery
LB fishery
LB/SPM/swNF fishery
neNF fishery
FAR fishery
neNF
NA
delayed spawners
GLD fishery
_
main
LB-LB/SPM/swNF fishery
(g) metric
metric
Estimate
Index
Bound
Hyperparameter
Standard deviation
Coefficient of variation
Precision
Mean
(h) status
status
other
parameter_estimate
derived_quantity
parameter_constant
data
(i) environment
environment
Transition
Multiple
River
Sea
_
(j) life_stage
life_stage
Adult
Spawners
PFA
Eggs
Non mature
_
Smolts
Adults
Multiple
(k) complex
complex
Multiple
NEC
NAC
_
(l) ages
ages
0SW
1SW
1SW mature
_
Mixed
Multiple
2SW post-return
1SW post-return
eggs
1SW immature
2SW

version

Versions are simple numbers from 1 to 14, they represent version of variables, how does that work ?

“The database_archive table allows for data versioning to strengthen the data security and quality. Its content matches that of the database table plus supplementary information, i.e. the date (day-month-year/hour-minute) when the data has been updated and an identifier to register the author of data updating (the corresponding information on the author, i.e. first name, name, email address and institute affiliation can be retrieved from the users table - not presented here). The database_archive table contains the data present in the database table and all its previous versions, which are referred by a version number (one single version number by variable). At any time, an older version of database can hence be retrieved” (source Hernvann et al. (2021)).

The date time in database archive is also present in the database. but this date time has several values both in the latest db and in the archive, see below : (Table Table 12).

Code to get version
# this query didn't return as expected
 version <- DBI::dbGetQuery(con_salmoglob, 
  "WITH maxyear AS (SELECT count(*) as N, version, 
  max(year) as year,
  extract(year from date_time) as year_last_update
  FROM public.database
  group by version, date_time, year
  order by version)

  SELECT version,  sum(N) as N, max(year) as year, max(year_last_update) as year_last_update
  FROM maxyear Group by version order by version
  ")
knitr::kable(version,
 ) %>%
 kable_material(c("striped", "hover"))
# to which years correspond the data (2020 to 2024)
 dates_db <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT count(*) as N,  
    extract(year from date_time) as year
  FROM public.database
  group by extract(year from date_time) 
  ")

 # to which years correspond the data archive (2020 to 2024)
 dates_db_archive <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT count(*) as n_arch,  
    extract(year from date_time) as year
  FROM public.database_archive
  group by extract(year from date_time) ")

full_join(dates_db,dates_db_archive)   %>%
select(year, n, n_arch)  %>%
arrange(year)  %>%
knitr::kable() %>%
 kable_material(c("striped", "hover"))
Table 12: Version in the database = number of lines, narch = number of lines in the archive database, year =last year when present in column date, year_last_update = year extracted from date_time
(a) version.
version n year year_last_update
1 738 NA 2021
2 6417 2020 2022
5 150 NA 2024
6 1696 2023 2024
7 10759 2023 2024
8 9911 2023 2024
9 10017 2023 2024
10 6307 2023 2024
11 1749 2023 2024
12 1007 2023 2024
13 318 2023 2024
14 1007 2023 2024
(b) Year of data in date_time, both in database, and in database archive.
year n n_arch
2020 688 35292
2021 5050 106014
2022 1417 107651
2023 NA 93172
2024 42921 49434

QUESTION TO WGNAS

From the results in the database (Table Table 12), it’s not clear what version represents here. What does version mean ? Just a version for each variable. Values have been inserted in 2020, but it’s not clear how you can identify the origin of data in database archive, since those same dates are in the database, and not really updated. Are we right ?

Answer from Pierre-Yves Hernvann

In principle, this database update is performed variable by variable, that is, “var_mod” by “var_mod,” as you can see in the table. For example, we choose to update the values ​​of the sea returns of salmon that have spent two winters at sea (var_mod==“log_N6_mu”) via a csv file containing the return estimates for each region for the entire time series. For practical reasons, we can also update this variable “log_N6_mu” for a specific region (“area” column). The file we upload then contains the “log_N6_mu” values ​​for a single region, such as Newfoundland (NF).

When this file is uploaded with new data, it undergoes a battery of tests to ensure the consistency of format and information with what is already in the database and an upload date (“date_time” column) is assigned. If these tests are passed successfully, the database rows corresponding to the “var_mod” or the “var_mod” / “area” pair are deleted and replaced with those of the updated csv file. This replacement is accompanied by the incrementation of the version number (by 1) which makes it possible to determine that the update dated “date_time” constitutes the Xth modification of the variable (“var_mod” or “var_mod” / “area” pair). The archive table (which I don’t think you’ll be able to access) contains all versions (with version numbers and dates) of the table used by the WGNAS (it’s therefore a long table created by appending all updated versions of the WGNAS table).

IN SUMMARY: - The version number is a strictly positive integer used to track updates to the data available in the database used by the WGNAS. - A version number is not specific to the entire database (not everything is updated every year; some information may be updated several times a year). - A version number is specific to a “var_mod” or a “var_mod” / “area” pair. Thus, in most cases, a “var_mod” will be characterized by the same “version” index, but some “var_mod” may have different version numbers depending on the region. - For a triplet, “version”/“var_mod”/“area” there is only one “date_time” indicating the date and time to the nearest second.

TODO

Remove NA, -, and main ? from fishery

QUESTION TO WGNAS

In the metadata column fishery what is ‘main’ ? Should we remove NA and - and leave it NULL ?

Other comments.

Currently stage in metadata not consistent, replicates information about maturity but not always. The lack of consistency in the metadata is not a problem as this table only serves as a reference to describe var_mod used in the database table.

Additional values about maturity, stage, can be stored in the metadata table.

There is a discrepancy between data in the historical dataset and the current, so we need to create additional parameters (in the metadata table) for the historical database and this will need to be checked.

References

Hernvann, Pierre-Yves, Rémi Patin, Jérôme Guitton, Maxime Olmos, Marie-Pierre Etienne, Maéva Labouyrie, Léa Bézier, and Etienne Rivot. 2021. WGNAS-SalmoGlob ToolBox: A Web Application for Supporting Atlantic Salmon Stock Assessment at the North Atlantic Basin Scale.” {{ICES WGNAS Working Paper}} 2021/27, 22th {{March}} - 1st {{April}} 2021.
ICES. 2021. “Stock Annex: Salmon (Salmo Salar) in Northeast Atlantic.” ICES Stock Annexes. https://doi.org/10.17895/ICES.PUB.18622037.V1.
———. 2024. “The Second ICES/NASCO Workshop on Salmon Mortality at Sea (WKSalmon2; Outputs from 2022 Meeting).” ICES Scientific Reports. https://doi.org/10.17895/ICES.PUB.22560790.
Rivot, Etienne, Rémi Patin, Maxime Olmos, Gérald Chaput, and Pierre-Yves Hernvann. 2021. “A Hierarchical Bayesian Life Cycle Model for Atlantic Slamon Stock Assessment at the North Atlantic Basin Scale.” ICES WGNAS Working Paper 2021/26 22th March - 1st April 2021.