The following working document is just a technical analysis of the WGNAS database (and further its graphical interface) (ICES 2024). It uses different sources, ICES vocabulary, the stock annex (ICES 2021), to analyse the structure of the wgnas database before integrating in a single database (wgnas, wgbast, wgeel) in the DIASPARA project. This document does not engage the WGNAS it’s just a technical analysis, to try to get how this works. This document is listed as a task there : https://github.com/DIASPARAproject/WP3_migdb/issues/11

Main database structure

When loading the first lines of the db, it really looks as following :

database10 <- DBI::dbGetQuery(con_salmoglob, 'SELECT * FROM public.database limit 20')
knitr::kable(database10) %>% kable_styling(bootstrap_options = c("striped","hover", "condensed"))

Table 1: Structure of the WGNAS main database

version	year	type	age	area	location	metric	value	var_mod	date_time
2	NA	Conservation limits	_	coun_England_Wales	_	Mean	209010700.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Finland	_	Mean	104278220.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_France	_	Mean	55156800.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Gulf	_	Mean	248680000.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Iceland_NE	_	Mean	23889096.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Iceland_SW	_	Mean	51693269.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Ireland	_	Mean	710711690.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Labrador	_	Mean	243660000.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Newfoundland	_	Mean	267780000.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Northern_Ireland	_	Mean	93800000.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Norway	_	Mean	444064979.75	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Quebec	_	Mean	50380000.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Russia	_	Mean	357856550.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Scotia Fundy	_	Mean	224140000.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Scotland	_	Mean	561073322.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_Sweden	_	Mean	13997100.00	cons_lim	2022-08-18 08:37:30
2	NA	Conservation limits	_	coun_US	_	Mean	435369000.00	cons_lim	2022-08-18 08:37:30
1	NA	Demographic transitions	Mixed	Atlantic	_	Coefficient of variation	0.01	CV_dummy	2020-11-24 18:32:04
1	NA	Homewater catches	Mixed	Atlantic	_	Coefficient of variation	0.05	CV_hw	2020-11-24 18:32:04
1	NA	Natural mortality rate	Mixed	Atlantic	_	Coefficient of variation	0.03	CV_M	2020-11-24 18:32:04

Area

Figure 1: Phylogeographic structuring observed across studies in Atlantic salmon resolved by screening of phylogenetically informative nuclear microsatellite and SNP variation.

We first analyse the types of area that are found in the database. These are the possible geographical locations.

From (Rivot et al. (2021)) : > The model considers the dynamics of 25 SU (subscript r = 1, … , N with N=25) (Fig. 1):
> * 6 SU from NA CSG, indexed by r = 1, …, 6: 1 = Newfoundland, 2 = Gulf, 3 = Scotia-Fundy, 4 = USA, 5 = Quebec and 6 = Labrador;
> * 8 SU from the SE CSG, indexed by r = 7, …, 14: 7 = France, 8 = UK England and Wales, 9 = Ireland, 10 = UK Northern Ireland - FO, 11 = UK Northern Ireland - FB, 12 = UK Scotland East, 13 = UK Scotland West, 14 = Iceland South-West;
> * 11 SU from NE CSG, indexed by r= 15, …, 25: 15 = Iceland North-East, 16 = Sweden, 17 = Norway South-East, 18 = Norway South-West, 19 = Norway Middle, 20 = Norway North, 21 = Finland, 22 = Russia Kola Barents, 23 = Russia Kola White Sea, 24 = Russia Arkhangelsk Karelia and 25 = Russia River Pechora. SU are defined on the basis of freshwater areas. All salmon within a SU are assumed to have the same demographic parameters and to undertake a similar migration route at sea. Note that Germany and Spain (SE CSG) are not considered at this stage because no complete series of data are provided to ICES WGNAS by these two jurisdictions.

Code to find different values in column area

area <- DBI::dbGetQuery(con_salmoglob, 'SELECT DISTINCT area FROM public.database')
knitr::kable(area[grep("fishery",area$area),,drop=FALSE], caption = "Area corresponding to fisheries", row.names=FALSE)
knitr::kable(area[grep("coun",area$area),,drop=FALSE], caption = "Area corresponding to countries", row.names=FALSE)
knitr::kable(area[!grepl("coun",area$area) & !grepl("fishery",area$area),,drop=FALSE], caption = "Area corresponding to source area", row.names=FALSE)

Table 2: Area types

(a) Fisheries

area
LB fishery
LB/SPM/swNF fishery
neNF fishery
FAR fishery
GLD fishery

(b) Countries

area
coun_Gulf
coun_Ireland
coun_Finland
coun_Scotland
coun_Russia
coun_Norway
coun_Northern_Ireland
coun_Iceland_SW
coun_Iceland_NE
coun_US
coun_Scotia Fundy
coun_England_Wales
coun_Labrador
coun_Newfoundland
coun_Sweden
coun_Quebec
coun_France

area
GF
NO_SW
SW
NEC
NAC
NO_NO
US
RU_RP
FR
IC_NE
IR
Atlantic
QC
NI_FO
FI
RU_KB
NF
IC_SW
NO_SE
SC_EA
NO_MI
LB
RU_AK
EW
SF
SC_WE
NI_FB
RU_KW

The latter table (Table 2 (c)) is by default, so Atlantic, NEC and NAC are included. Having a label in a reference table and a geom to associate to those labels would be very usefull.

Location

Locations corresponding to neither fisheries nor areas

Code to analyse the data in location

location <- DBI::dbGetQuery(con_salmoglob, 'SELECT DISTINCT location FROM public.database')
knitr::kable(location[grep("fishe",tolower(location$location)),,drop=FALSE],  row.names=FALSE)
location_biz <- data.frame("location"=setdiff(location[!grepl("fishe",tolower(location$location)),],
  area[!grepl("coun",area$area) & !grepl("fishery",area$area),]))
knitr::kable(location_biz, caption = , row.names=FALSE)  
location_area <- data.frame("location"=intersect(location[!grepl("fishe",location$location),],
  area[!grepl("coun",area$area) & !grepl("fishery",area$area),]))
knitr::kable(location_area, row.names=FALSE)

Table 3: Locations

(a) Location corresponding to fisheries

location
Bef. Fisheries
Bef. Gld fisheries
Aft. First fisheries
Aft. Second fisheries
Aft. Gld fisheries
Return aft. Second fishery
Gld fisheries
Return aft. First fishery

(b) These one are neither fishery nor area

location
GLD - by NAC SU
_
FAR - by SU
GLD - by NEC SU
delayed spawners
additional
GLD - by cplx

location
NO_SW
SW
NO_NO
US
RU_RP
FR
IC_NE
IR
QC
NI_FO
FI
RU_KB
NF
IC_SW
NO_SE
SC_EA
NO_MI
LB
RU_AK
EW
SF
SC_WE
NI_FB
RU_KW
GF

Here we search for locations related to fisheries.

QUESTION TO WGNAS

Is there some gis reference for these fishing areas
Would it possible to use ICES areas there or would this reporting correspond to different boundaries ?

Locations corresponding to fisheries

code to list the parameters with location not being area nor fishery

for (i in 1 : nrow(location_biz)){
  location_bizi <- location_biz$location[i]
  var_biz <- DBI::dbGetQuery(con_salmoglob, glue::glue_sql("SELECT distinct(var_mod) FROM public.database 
where location in ({location_bizi})", .con=con_salmoglob))
colnames(var_biz) <- location_bizi
print(kable(var_biz[,,drop=FALSE], row.names=FALSE))
}

Table 4: List of parameters corresponding to neither fishery nor area (corresponding to weird table…).

GLD - by NAC SU
p_C8_2_NAC_gld_mu

_
theta6_surv
log_N9_sd
N
nSm
CV_N1_pr
CV_M
log_N6_sd
theta6_delSp
p_smolt_pr
min_log_N9
Stocking_2SW
date_begin
date_end_hindcast
log_C6_mu
p_smolt
log_N6_mu
eggs
CV_theta1
N_NEC
log_C6_sd
max_log_N9
CV_dummy
log_C9_sd
log_C9_mu
log_N9_mu
E_M
theta9_surv
E_theta1_pr
cons_lim
prop_female
E_theta1
N_Sample
CV_hw
N_NAC
mu_N1_pr
date_end
theta9_delSp
N_Sample_sm

FAR - by SU
p_C8_NEC_3_far_mu
p_C5_NEC_1_far_mu
p_C8_NEC_1_far_mu

GLD - by NEC SU
p_C8_2_NEC_gld_mu

delayed spawners
log_C9_delSp_sd
log_C6_delSp_mu
log_C6_delSp_sd
log_C9_delSp_mu

additional
log_C9_sup_mu
log_C9_sup_sd
log_C6_sup_sd
log_C6_sup_mu

GLD - by cplx
p_C8_2_NECNAC_gld_mu

Some values have been deprecated and are present in the database archive but not in the main database.

Code to find what is only in database_archive.

database <- DBI::dbGetQuery(con_salmoglob, "SELECT * FROM public.database")
database_archive <- DBI::dbGetQuery(con_salmoglob, "SELECT * FROM public.database_archive")

# List of commun columns
cols <- c("version", "year", "type", "age", "area", "location", "metric", "var_mod")

# getting values that are not present in database anymore
for (col in cols) {
  diff_values <- setdiff(database_archive[[col]], database[[col]])
  if (length(diff_values) > 0) {
    cat("Values present in database_archive but not in database for", col, "column :\n")
    print(diff_values)
  }
}

Values present in database_archive but not in database for version column :
[1] 3 4
Values present in database_archive but not in database for year column :
[1] 51 52
Values present in database_archive but not in database for area column :
[1] "NEAC"
Values present in database_archive but not in database for location column :
[1] "main"

Code to get the number of location per model variable

result <- database %>%
  group_by(var_mod) %>%
  summarise(location_count = n_distinct(location))
knitr::kable(result) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Table 5: Number of locations per model variables

var_mod	location_count
CV_M	1
CV_N1_pr	1
CV_dummy	1
CV_hw	1
CV_theta1	1
E_M	1
E_theta1	1
E_theta1_pr	1
N	1
N_NAC	1
N_NEC	1
N_Sample	1
N_Sample_sm	1
Stocking_2SW	1
cons_lim	1
date_begin	1
date_end	1
date_end_hindcast	1
deltat5_1	1
deltat5_2	1
deltat8_1	1
deltat8_2	1
deltat8_2_1	1
deltat8_2_2	1
eggs	1
log_C5_NAC_1_lbnf_mu	1
log_C5_NAC_1_lbnf_sd	1
log_C5_NAC_2_lbnf_lab_mu	1
log_C5_NAC_2_lbnf_lab_sd	1
log_C5_NAC_2_lbnf_oth_mu	1
log_C5_NAC_2_lbnf_oth_sd	1
log_C5_NEC_1_far_mu	1
log_C5_NEC_1_far_sd	1
log_C6_delSp_mu	1
log_C6_delSp_sd	1
log_C6_mu	1
log_C6_sd	1
log_C6_sup_mu	1
log_C6_sup_sd	1
log_C8_2_gld_tot_mu	1
log_C8_2_gld_tot_sd	1
log_C8_NAC_1_lbnf_mu	1
log_C8_NAC_1_lbnf_sd	1
log_C8_NAC_3_lbnf_mu	1
log_C8_NAC_3_lbnf_sd	1
log_C8_NAC_4_lbnf_lab_mu	1
log_C8_NAC_4_lbnf_lab_sd	1
log_C8_NAC_4_lbnf_oth_mu	1
log_C8_NAC_4_lbnf_oth_sd	1
log_C8_NEC_1_far_mu	1
log_C8_NEC_1_far_sd	1
log_C8_NEC_3_far_mu	1
log_C8_NEC_3_far_sd	1
log_C9_delSp_mu	1
log_C9_delSp_sd	1
log_C9_mu	1
log_C9_sd	1
log_C9_sup_mu	1
log_C9_sup_sd	1
log_N6_mu	1
log_N6_sd	1
log_N9_mu	1
log_N9_sd	1
max_log_N9	1
min_log_N9	1
mu_N1_pr	1
nSm	1
omega	25
p_C5_NEC_1_far_mu	1
p_C8_2_NAC_gld_mu	1
p_C8_2_NECNAC_gld_mu	1
p_C8_2_NEC_gld_mu	1
p_C8_NEC_1_far_mu	1
p_C8_NEC_3_far_mu	1
p_smolt	1
p_smolt_pr	1
prop_female	1
theta6_delSp	1
theta6_surv	1
theta9_delSp	1
theta9_surv	1

So in the end, only omega has 25 locations, and in this case the location corresponds to area. This means we can get rid of the location column. For omega we can keep it,; so we can add a column with a foreign key to area in the parameters table where priors will be stored.

Age

Salmon mature at various sea ages, typically returning to freshwater to spawn after one to three years at sea, but also sometimes at older sea ages; this varies widely between populations. Those salmon that return after one year at sea are referred to as one-sea-winter (1SW) salmon, or grilse, with older fish categorised as 2SW, 3SW, etc. In practice, however, for management purposes these older sea age fish are typically aggregated and collectively referred to as multi-sea-winter (MSW) salmon. The sea age when salmon become sexually mature depends on genetics as well as growing conditions in the sea, and possibly freshwater, although the precise proximate factors initiating homeward migration are unknown (Hansen and Quinn, 1998). The sea age of Atlantic salmon is important in the context of stock definition since these different groups of fish have different migration routes, return at different times and are differentially exploited in fisheries. Thus, for example, it is only potential MSW salmon that are exploited in the distant water salmon fishery that operates off the west coast of Greenland (ICES 2021).

Code to describe the content of age column

summary_age <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) N, age FROM public.database group by age')%>% arrange(age)
knitr::kable(summary_age, caption = "Number age data categorized by age") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
summary_age_parm <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) N, age, var_mod FROM public.database group by age, var_mod order by age, var_mod') 
knitr::kable(summary_age_parm %>% filter(age=="_"), caption = "Number age data categorized by age, parameters without age, n is the number of lines in the db") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
knitr::kable(summary_age_parm %>% filter(age!="_"), caption = "Number age data categorized by age, n is the number of lines in the db") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
distinct_age_parm <- DBI::dbGetQuery(con_salmoglob, 
  "WITH t1 AS (
  SELECT age, 
  var_mod 
  FROM public.database 
  group by age, var_mod 
  order by age, var_mod ),
  t2 AS (
  SELECT age as age2, 
  var_mod as var_mod2
  FROM public.database 
  group by age, var_mod 
  order by age, var_mod)
  SELECT age, age2, var_mod from t1 JOIN t2 on t1.var_mod= t2.var_mod2 
  where age!=age2
  ")
knitr::kable(distinct_age_parm, caption = "Variables using more than 1 age (e.g. from which the age cannot be derived from the variable itself.)") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Table 6: Age structure in the database

(a) Number age data categorized by age

n	age
26	0SW
1525	1FW
14687	1SW
212	1SW immature
212	1SW mature
1007	1SW maturing
1007	1SW not maturing
1325	1SW post-return
1525	2FW
18868	2SW
1325	2SW post-return
1525	3FW
1525	4FW
1525	5FW
1525	6FW
57	Mixed
649	_
1551	eggs

(b) Number age data categorized by age, parameters without age, n is the number of lines in the db

n	age	var_mod
17	_	cons_lim
1	_	date_begin
1	_	date_end
1	_	date_end_hindcast
1	_	N
1	_	N_NAC
1	_	N_NEC
1	_	nSm
625	_	omega

n	age	var_mod
25	0SW	mu_N1_pr
1	0SW	N_Sample_sm
1325	1FW	p_smolt
200	1FW	p_smolt_pr
25	1SW	deltat5_1
25	1SW	deltat5_2
1325	1SW	eggs
53	1SW	log_C5_NAC_2_lbnf_lab_mu
53	1SW	log_C5_NAC_2_lbnf_lab_sd
53	1SW	log_C5_NAC_2_lbnf_oth_mu
53	1SW	log_C5_NAC_2_lbnf_oth_sd
1325	1SW	log_C6_delSp_mu
1325	1SW	log_C6_delSp_sd
1325	1SW	log_C6_mu
1325	1SW	log_C6_sd
1250	1SW	log_C6_sup_mu
1250	1SW	log_C6_sup_sd
1325	1SW	log_N6_mu
1325	1SW	log_N6_sd
1325	1SW	prop_female
1325	1SW	theta6_delSp
53	1SW immature	log_C8_NAC_1_lbnf_mu
53	1SW immature	log_C8_NAC_1_lbnf_sd
53	1SW immature	log_C8_NEC_1_far_mu
53	1SW immature	log_C8_NEC_1_far_sd
53	1SW mature	log_C5_NAC_1_lbnf_mu
53	1SW mature	log_C5_NAC_1_lbnf_sd
53	1SW mature	log_C5_NEC_1_far_mu
53	1SW mature	log_C5_NEC_1_far_sd
1007	1SW maturing	p_C5_NEC_1_far_mu
1007	1SW not maturing	p_C8_NEC_1_far_mu
1325	1SW post-return	theta6_surv
1325	2FW	p_smolt
200	2FW	p_smolt_pr
25	2SW	deltat8_1
25	2SW	deltat8_2
25	2SW	deltat8_2_1
25	2SW	deltat8_2_2
1325	2SW	eggs
53	2SW	log_C8_2_gld_tot_mu
53	2SW	log_C8_2_gld_tot_sd
53	2SW	log_C8_NAC_3_lbnf_mu
53	2SW	log_C8_NAC_3_lbnf_sd
53	2SW	log_C8_NAC_4_lbnf_lab_mu
53	2SW	log_C8_NAC_4_lbnf_lab_sd
53	2SW	log_C8_NAC_4_lbnf_oth_mu
53	2SW	log_C8_NAC_4_lbnf_oth_sd
53	2SW	log_C8_NEC_3_far_mu
53	2SW	log_C8_NEC_3_far_sd
1325	2SW	log_C9_delSp_mu
1325	2SW	log_C9_delSp_sd
1325	2SW	log_C9_mu
1325	2SW	log_C9_sd
1250	2SW	log_C9_sup_mu
1250	2SW	log_C9_sup_sd
1325	2SW	log_N9_mu
1325	2SW	log_N9_sd
25	2SW	max_log_N9
25	2SW	min_log_N9
318	2SW	p_C8_2_NAC_gld_mu
1007	2SW	p_C8_2_NEC_gld_mu
106	2SW	p_C8_2_NECNAC_gld_mu
1007	2SW	p_C8_NEC_3_far_mu
1325	2SW	prop_female
1325	2SW	Stocking_2SW
1325	2SW	theta9_delSp
1325	2SW post-return	theta9_surv
1325	3FW	p_smolt
200	3FW	p_smolt_pr
1325	4FW	p_smolt
200	4FW	p_smolt_pr
1325	5FW	p_smolt
200	5FW	p_smolt_pr
1325	6FW	p_smolt
200	6FW	p_smolt_pr
1	eggs	CV_N1_pr
25	eggs	CV_theta1
1325	eggs	E_theta1
200	eggs	E_theta1_pr
1	Mixed	CV_dummy
1	Mixed	CV_hw
1	Mixed	CV_M
1	Mixed	E_M
53	Mixed	N_Sample

(d) Variables using more than 1 age (e.g. from which the age cannot be derived from the variable itself.)

age	age2	var_mod
1SW	2SW	eggs
2SW	1SW	eggs
2FW	6FW	p_smolt
2FW	1FW	p_smolt
2FW	3FW	p_smolt
2FW	4FW	p_smolt
2FW	5FW	p_smolt
6FW	2FW	p_smolt
6FW	1FW	p_smolt
6FW	3FW	p_smolt
6FW	4FW	p_smolt
6FW	5FW	p_smolt
1FW	2FW	p_smolt
1FW	6FW	p_smolt
1FW	3FW	p_smolt
1FW	4FW	p_smolt
1FW	5FW	p_smolt
3FW	2FW	p_smolt
3FW	6FW	p_smolt
3FW	1FW	p_smolt
3FW	4FW	p_smolt
3FW	5FW	p_smolt
4FW	2FW	p_smolt
4FW	6FW	p_smolt
4FW	1FW	p_smolt
4FW	3FW	p_smolt
4FW	5FW	p_smolt
5FW	2FW	p_smolt
5FW	6FW	p_smolt
5FW	1FW	p_smolt
5FW	3FW	p_smolt
5FW	4FW	p_smolt
1FW	2FW	p_smolt_pr
1FW	3FW	p_smolt_pr
1FW	4FW	p_smolt_pr
1FW	5FW	p_smolt_pr
1FW	6FW	p_smolt_pr
2FW	1FW	p_smolt_pr
2FW	3FW	p_smolt_pr
2FW	4FW	p_smolt_pr
2FW	5FW	p_smolt_pr
2FW	6FW	p_smolt_pr
3FW	1FW	p_smolt_pr
3FW	2FW	p_smolt_pr
3FW	4FW	p_smolt_pr
3FW	5FW	p_smolt_pr
3FW	6FW	p_smolt_pr
4FW	1FW	p_smolt_pr
4FW	2FW	p_smolt_pr
4FW	3FW	p_smolt_pr
4FW	5FW	p_smolt_pr
4FW	6FW	p_smolt_pr
5FW	1FW	p_smolt_pr
5FW	2FW	p_smolt_pr
5FW	3FW	p_smolt_pr
5FW	4FW	p_smolt_pr
5FW	6FW	p_smolt_pr
6FW	1FW	p_smolt_pr
6FW	2FW	p_smolt_pr
6FW	3FW	p_smolt_pr
6FW	4FW	p_smolt_pr
6FW	5FW	p_smolt_pr
2SW	1SW	prop_female
1SW	2SW	prop_female

Only the following parameters need a dimension with age.

extract parameters with more than one age

more_than_one_age <- DBI::dbGetQuery(con_salmoglob, 
  ' WITH uk AS (SELECT DISTINCT age, var_mod FROM DATABASE),
 W AS (SELECT *, count (*) OVER (PARTITION BY var_mod) AS n FROM uk)
 SELECT * FROM W  WHERE n> 1')
knitr::kable(more_than_one_age, caption = "Parameters with more than one age") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Table 7: parameters with more than one age in the db

Parameters with more than one age
age	var_mod	n
1SW	eggs	2
2SW	eggs	2
3FW	p_smolt	6
2FW	p_smolt	6
1FW	p_smolt	6
4FW	p_smolt	6
6FW	p_smolt	6
5FW	p_smolt	6
3FW	p_smolt_pr	6
5FW	p_smolt_pr	6
1FW	p_smolt_pr	6
2FW	p_smolt_pr	6
4FW	p_smolt_pr	6
6FW	p_smolt_pr	6
2SW	prop_female	2
1SW	prop_female	2

NOTE

Only eggs, p_smolt, p_smolt_pr and prop_female need an age, the other is contained in the name of the variable.

NOTE

So the age also covers information on maturity but the only time it is really used outside from the metadata (eggs, smolt, prop_female) the maturity is not necessary.

QUESTION TO WGNAS

Why is omega not mixed ?
Would it not make more sense to store 2SW as MSW (multi-sea-winter) ? So in the referential have 1SW 2SW 3SW … how many ? and then MSW ?

Developmental stage

Code to extract count per life stage in metadata

summary_lifestage_metadata <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) as  n,  life_stage FROM public.metadata group by life_stage')%>% arrange(life_stage)
knitr::kable(summary_lifestage_metadata, format = "markdown") %>%
 kable_material(c("striped", "hover"))

Table 8: Number of lines with lifestage in the metadata table

n	life_stage
103	Adult
1	Adults
13	Eggs
1	Multiple
2	Non mature
5	PFA
20	Smolts
21	Spawners
8	_

NOTE

get rid of Adults replace with Adult.

NOTE

get rid of Multiple there is only one instance.

NOTE

Now there is the PFA, which is fine, but not really a stage.

We first start by looking at the possible reference tables in ICES corresponding to both maturity and stage. Currently we have to store information about the sea age (0SW, 1SW, 2SW, ...,MSW) , the freshwater age (1FW 2FW 3FW 4FW ...). The maturity could be stored in the metadata as it seems to be linked always with parameters. Currently in wgnas we have maturity : immature, mature, not maturing and post return. In the metadata we have egg, smolts, spawner, adults

QUESTION TO WGNAS

Is the correspondence mature / maturing correct or is it pre -spawing?
Is the correspondence OK. Do we really need PFA stage or could we just describe it in the metadata ?

QUESTION TO ICES

Could we add smolt and PFA (pre fisheries abundance) somewhere ?

Metric

The metric are linked with coefficients, they don’t need to be stored in the main db

Code to describe the metrics

summary_metric <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) as  n,  metric FROM public.database group by metric')%>% arrange(metric)
knitr::kable(summary_metric,
 caption = "Number of lines with metric in the database table") %>%
 kable_material(c("striped", "hover"))
 summary_metric <- DBI::dbGetQuery(con_salmoglob, 
  'SELECT count(*) as  n,  metric FROM public.metadata group by metric')%>% arrange(metric)
knitr::kable(summary_metric,
 caption = "Number of lines with metric in the metadata table") %>%
 kable_material(c("striped", "hover"))

Table 9: Metric overview

(a) Number of lines with metric in the database table

n	metric
50	Bound
29	Coefficient of variation
679	Hyperparameter
7	Index
38278	Mean
11033	Standard deviation

(b) Number of lines with metric in the metadata table

n	metric
2	Bound
5	Coefficient of variation
75	Estimate
6	Hyperparameter
7	Index
60	Mean
2	Precision
17	Standard deviation

There is a problem of correspondence between the database and metadata, detailed below where we query the database to find where the metric in metatdata, and the metric in database are different :

Code to join metadata and metrics and find correspondence problems

# this query didn't return as expected
 pb <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT database.var_mod, 
  metadata.metric as metric_metadata, 
  database.metric as metric_database  
  FROM public.metadata 
  JOIN public.database
  ON database.var_mod=metadata.var_mod
  WHERE metadata.metric != database.metric
  ")
knitr::kable(pb,
 caption = "Variables with correspondence pb for metric between the metadata and the database table") %>%
 kable_material(c("striped", "hover"))
 pb1 <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT distinct metadata.var_mod, 
  metadata.metric as metric_metadata,
  database.metric as metric_database  
  FROM public.metadata  
  left JOIN public.database
  ON database.var_mod=metadata.var_mod
  WHERE metadata.metric in ('Precision','Estimate')
  ")
knitr::kable(pb1,
 caption = "Variables with correspondence pb for metric between the metadata and the database table") %>%
 kable_material(c("striped", "hover"))

Table 10: Checking correspondence between database and metadata

(a) This row has a problem

var_mod	metric_metadata	metric_database
NA	NA	NA
:-------	:---------------	:---------------

(b) Variables with correspondence pb for metric between the metadata and the database table

var_mod	metric_metadata	metric_database
tau_theta4	Precision	NA
C5_NEC_1_tot	Estimate	NA
C6_hw	Estimate	NA
C5_NAC_1	Estimate	NA
h6_hw_delSp	Estimate	NA
C9_hw	Estimate	NA
p_C8_2_NECNAC	Estimate	NA
C5_NAC_2_lab	Estimate	NA
C8_NAC_4_lab	Estimate	NA
N9	Estimate	NA
theta1_ddp	Estimate	NA
h9_hw_delSp	Estimate	NA
tau_theta3	Precision	NA
C5_NAC_2	Estimate	NA
N8_1	Estimate	NA
p_C8_NEC_1	Estimate	NA
logit_theta3_pr	Estimate	NA
h5_NEC_1	Estimate	NA
N5	Estimate	NA
logN8	Estimate	NA
N7	Estimate	NA
C8_NEC_1	Estimate	NA
h8_NEC_1	Estimate	NA
logit_theta4	Estimate	NA
h8_NEC_3	Estimate	NA
theta3_pr	Estimate	NA
p_C5_NEC_1	Estimate	NA
C5_NAC_2_other	Estimate	NA
h8_NAC_4	Estimate	NA
C8_NEC_3	Estimate	NA
C8_NEC_1_tot	Estimate	NA
logN2	Estimate	NA
theta1	Estimate	NA
theta4	Estimate	NA
C8_2	Estimate	NA
C5_NAC_1_tot	Estimate	NA
C8_NAC_3	Estimate	NA
logit_theta3	Estimate	NA
C8_2_comp	Estimate	NA
p_smolt_gamma	Estimate	NA
h5_NAC_1	Estimate	NA
N4	Estimate	NA
C8_NAC_1	Estimate	NA
h8_NAC_4_lab	Estimate	NA
C8_NAC_4	Estimate	NA
C6_hw_delSp	Estimate	NA
h8_NAC_1	Estimate	NA
p_C8_2_NAC	Estimate	NA
C9_hw_delSp	Estimate	NA
N10	Estimate	NA
C8_NEC_3_tot	Estimate	NA
h8_NAC_4_other	Estimate	NA
h6_hw	Estimate	NA
logN4	Estimate	NA
h8_2	Estimate	NA
N3	Estimate	NA
p_smolt_stoch	Estimate	NA
theta8_2	Estimate	NA
theta3	Estimate	NA
N3_tot	Estimate	NA
N6	Estimate	NA
h9_hw	Estimate	NA
C8_NAC_1_tot	Estimate	NA
h5_NAC_2	Estimate	NA
C8_NAC_3_tot	Estimate	NA
log_N9_pr	Estimate	NA
p_C8_NEC_3	Estimate	NA
h8_NAC_3	Estimate	NA
N1	Estimate	NA
logN5	Estimate	NA
N1_pr	Estimate	NA
N3_pr	Estimate	NA
p_C8_2_NEC	Estimate	NA
C8_2_tot	Estimate	NA
C5_NEC_1	Estimate	NA
C8_NAC_4_other	Estimate	NA
N8_2	Estimate	NA

Metadata

The metadata structure is described in table Table 11.

Table 11: Metadata structure

Details on values inserted in the DB are described in subtables from table Table 12.

Code to get all unique values from metadata

DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT type_object FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT nimble FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
 DBI::dbGetQuery(con_salmoglob, 
 model_stage <-  "SELECT DISTINCT model_stage FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
 DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT type FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT locations FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT fishery FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT metric FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
   DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT status FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT environment FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
   DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT life_stage FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
   DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT complex FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))
  DBI::dbGetQuery(con_salmoglob, 
  "SELECT DISTINCT ages  FROM metadata") %>%
  knitr::kable() %>%
 kable_material(c("striped", "hover"))

Table 12: Various values in the salmoglob metadata, description of their content

(a) type_object

type_object
single_value
array
matrix
vector

(b) nimble

nimble
Data_nimble
Const_nimble
other
Output

model_stage
Fit
other
First year

(d) type

type
Origin distribution in sea catches
Maturation rate
Homewater catches
Conservation limits
Number of years
Initialization first year
Returns
Abundance
High seas harvest rates
Smolt age structure
Time spent at sea
Sex ratio
Number of SU
Number of smolt ages
River harvest rates
Stocking
First year
Survival rate
Proportion of delayed individuals
Sea catches
Demographic transitions
Fecundity rate
Natural mortality rate
Prior hyperparameter

(e) locations

locations
NF fisheries
Bef. Fisheries
_
GLD - by NAC SU
FAR - by SU
Aft. FAR fisheries
Bef. Gld fisheries
Bef. FAR fisheries
LB/SPM fisheries in Lb
LB/SPM fisheries out. Lb
Aft. First fisheries
Btw. GLD - NF/LB fisheries
Aft. NF/LB fisheries
Btw. NF/LB - GLD fisheries
NF/LB fisheries
Aft. Second fisheries
Aft. Gld fisheries
Btw. FAR - GLD fisheries
NF/LB/SPM fisheries
GLD - by NEC SU
Btw. GLD - FAR fisheries
Return aft. Second fishery
LB/SPM fisheries tot
Return aft. First fishery
Gld fisheries
Bef. NF/LB fisheries
FAR fisheries
GLD - by cplx
GLD fisheries

(f) fishery

fishery
LB fishery
LB/SPM/swNF fishery
neNF fishery
FAR fishery
neNF
NA
delayed spawners
GLD fishery
_
main
LB-LB/SPM/swNF fishery

(g) metric

metric
Estimate
Index
Bound
Hyperparameter
Standard deviation
Coefficient of variation
Precision
Mean

(h) status

status
other
parameter_estimate
derived_quantity
parameter_constant
data

(i) environment

environment
Transition
Multiple
River

Sea
_

(j) life_stage

life_stage
Adult
Spawners
PFA
Eggs
Non mature
_
Smolts
Adults
Multiple

(k) complex

complex
Multiple
NEC
NAC
_

(l) ages

ages
0SW
1SW
1SW mature
_
Mixed
Multiple
2SW post-return
1SW post-return
eggs
1SW immature
2SW

version

Versions are simple numbers from 1 to 14, they represent version of variables, how does that work ?

“The database_archive table allows for data versioning to strengthen the data security and quality. Its content matches that of the database table plus supplementary information, i.e. the date (day-month-year/hour-minute) when the data has been updated and an identifier to register the author of data updating (the corresponding information on the author, i.e. first name, name, email address and institute affiliation can be retrieved from the users table - not presented here). The database_archive table contains the data present in the database table and all its previous versions, which are referred by a version number (one single version number by variable). At any time, an older version of database can hence be retrieved” (source Hernvann et al. (2021)).

The date time in database archive is also present in the database. but this date time has several values both in the latest db and in the archive, see below : (Table Table 13).

Code to get version

# this query didn't return as expected
 version <- DBI::dbGetQuery(con_salmoglob, 
  "WITH maxyear AS (SELECT count(*) as N, version, 
  max(year) as year,
  extract(year from date_time) as year_last_update
  FROM public.database
  group by version, date_time, year
  order by version)

  SELECT version,  sum(N) as N, max(year) as year, max(year_last_update) as year_last_update
  FROM maxyear Group by version order by version
  ")
knitr::kable(version,
 ) %>%
 kable_material(c("striped", "hover"))
# to which years correspond the data (2020 to 2024)
 dates_db <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT count(*) as N,  
    extract(year from date_time) as year
  FROM public.database
  group by extract(year from date_time) 
  ")

 # to which years correspond the data archive (2020 to 2024)
 dates_db_archive <- DBI::dbGetQuery(con_salmoglob, 
  "SELECT count(*) as n_arch,  
    extract(year from date_time) as year
  FROM public.database_archive
  group by extract(year from date_time) ")

full_join(dates_db,dates_db_archive)   %>%
select(year, n, n_arch)  %>%
arrange(year)  %>%
knitr::kable() %>%
 kable_material(c("striped", "hover"))

Table 13: Version in the database = number of lines, narch = number of lines in the archive database, year =last year when present in column date, year_last_update = year extracted from date_time

(a) version.

version	n	year	year_last_update
1	738	NA	2021
2	6417	2020	2022
5	150	NA	2024
6	1696	2023	2024
7	10759	2023	2024
8	9911	2023	2024
9	10017	2023	2024
10	6307	2023	2024
11	1749	2023	2024
12	1007	2023	2024
13	318	2023	2024
14	1007	2023	2024

(b) Year of data in date_time, both in database, and in database archive.

year	n	n_arch
2020	688	35292
2021	5050	106014
2022	1417	107651
2023	NA	93172
2024	42921	49434

QUESTION TO WGNAS

From the results in the database (Table Table 13), it’s not clear what version represents here. What does version mean ? Just a version for each variable. Values have been inserted in 2020, but it’s not clear how you can identify the origin of data in database archive, since those same dates are in the database, and not really updated. Are we right ?

Answer from Pierre-Yves Hernvann

In principle, this database update is performed variable by variable, that is, “var_mod” by “var_mod,” as you can see in the table. For example, we choose to update the values of the sea returns of salmon that have spent two winters at sea (var_mod==“log_N6_mu”) via a csv file containing the return estimates for each region for the entire time series. For practical reasons, we can also update this variable “log_N6_mu” for a specific region (“area” column). The file we upload then contains the “log_N6_mu” values for a single region, such as Newfoundland (NF).

When this file is uploaded with new data, it undergoes a battery of tests to ensure the consistency of format and information with what is already in the database and an upload date (“date_time” column) is assigned. If these tests are passed successfully, the database rows corresponding to the “var_mod” or the “var_mod” / “area” pair are deleted and replaced with those of the updated csv file. This replacement is accompanied by the incrementation of the version number (by 1) which makes it possible to determine that the update dated “date_time” constitutes the Xth modification of the variable (“var_mod” or “var_mod” / “area” pair). The archive table (which I don’t think you’ll be able to access) contains all versions (with version numbers and dates) of the table used by the WGNAS (it’s therefore a long table created by appending all updated versions of the WGNAS table).

IN SUMMARY: - The version number is a strictly positive integer used to track updates to the data available in the database used by the WGNAS. - A version number is not specific to the entire database (not everything is updated every year; some information may be updated several times a year). - A version number is specific to a “var_mod” or a “var_mod” / “area” pair. Thus, in most cases, a “var_mod” will be characterized by the same “version” index, but some “var_mod” may have different version numbers depending on the region. - For a triplet, “version”/“var_mod”/“area” there is only one “date_time” indicating the date and time to the nearest second.

TODO

Remove NA, -, and main ? from fishery

QUESTION TO WGNAS

In the metadata column fishery what is ‘main’ ? Should we remove NA and - and leave it NULL ?

Other comments.

Currently stage in metadata not consistent, replicates information about maturity but not always. The lack of consistency in the metadata is not a problem as this table only serves as a reference to describe var_mod used in the database table.

Additional values about maturity, stage, can be stored in the metadata table.

There is a discrepancy between data in the historical dataset and the current, so we need to create additional parameters (in the metadata table) for the historical database and this will need to be checked.

References

Hernvann, Pierre-Yves, Rémi Patin, Jérôme Guitton, Maxime Olmos, Marie-Pierre Etienne, Maéva Labouyrie, Léa Bézier, and Etienne Rivot. 2021. “WGNAS-SalmoGlob ToolBox: A Web Application for Supporting Atlantic Salmon Stock Assessment at the North Atlantic Basin Scale.” {{ICES WGNAS Working Paper}} 2021/27, 22th {{March}} - 1st {{April}} 2021.

ICES. 2021. “Stock Annex: Salmon (Salmo Salar) in Northeast Atlantic.” ICES Stock Annexes. https://doi.org/10.17895/ICES.PUB.18622037.V1.

———. 2024. “The Second ICES/NASCO Workshop on Salmon Mortality at Sea (WKSalmon2; Outputs from 2022 Meeting).” ICES Scientific Reports. https://doi.org/10.17895/ICES.PUB.22560790.

Rivot, Etienne, Rémi Patin, Maxime Olmos, Gérald Chaput, and Pierre-Yves Hernvann. 2021. “A Hierarchical Bayesian Life Cycle Model for Atlantic Slamon Stock Assessment at the North Atlantic Basin Scale.” ICES WGNAS Working Paper 2021/26 22th March - 1st April 2021.