The following working document is just a technical analysis of the WGNAS database (and further its graphical interface) (ICES 2024). It uses different sources, ICES vocabulary, the stock annex (ICES 2021), to analyse the structure of the wgnas database before integrating in a single database (wgnas, wgbast, wgeel) in the DIASPARA project. This document does not engage the WGNAS it’s just a technical analysis, to try to get how this works. This document is listed as a task there : https://github.com/DIASPARAproject/WP3_migdb/issues/11
Main database structure
When loading the first lines of the db, it really looks as following :
Figure 1: Phylogeographic structuring observed across studies in Atlantic salmon resolved by screening of phylogenetically informative nuclear microsatellite and SNP variation.
We first analyse the types of area that are found in the database. These are the possible geographical locations.
From (Rivot et al. (2021)) : > The model considers the dynamics of 25 SU (subscript r = 1, … , N with N=25) (Fig. 1):
> * 6 SU from NA CSG, indexed by r = 1, …, 6: 1 = Newfoundland, 2 = Gulf, 3 = Scotia-Fundy, 4 = USA, 5 = Quebec and 6 = Labrador;
> * 8 SU from the SE CSG, indexed by r = 7, …, 14: 7 = France, 8 = UK England and Wales, 9 = Ireland, 10 = UK Northern Ireland - FO, 11 = UK Northern Ireland - FB, 12 = UK Scotland East, 13 = UK Scotland West, 14 = Iceland South-West;
> * 11 SU from NE CSG, indexed by r= 15, …, 25: 15 = Iceland North-East, 16 = Sweden, 17 = Norway South-East, 18 = Norway South-West, 19 = Norway Middle, 20 = Norway North, 21 = Finland, 22 = Russia Kola Barents, 23 = Russia Kola White Sea, 24 = Russia Arkhangelsk Karelia and 25 = Russia River Pechora. SU are defined on the basis of freshwater areas. All salmon within a SU are assumed to have the same demographic parameters and to undertake a similar migration route at sea. Note that Germany and Spain (SE CSG) are not considered at this stage because no complete series of data are provided to ICES WGNAS by these two jurisdictions.
Code to find different values in column area
area <- DBI::dbGetQuery(con_salmoglob, 'SELECT DISTINCT area FROM public.database')knitr::kable(area[grep("fishery",area$area),,drop=FALSE], caption ="Area corresponding to fisheries", row.names=FALSE)knitr::kable(area[grep("coun",area$area),,drop=FALSE], caption ="Area corresponding to countries", row.names=FALSE)knitr::kable(area[!grepl("coun",area$area) &!grepl("fishery",area$area),,drop=FALSE], caption ="Area corresponding to source area", row.names=FALSE)
Table 2: Area types
(a) Fisheries
area
LB fishery
LB/SPM/swNF fishery
neNF fishery
FAR fishery
GLD fishery
(b) Countries
area
coun_Gulf
coun_Ireland
coun_Finland
coun_Scotland
coun_Russia
coun_Norway
coun_Northern_Ireland
coun_Iceland_SW
coun_Iceland_NE
coun_US
coun_Scotia Fundy
coun_England_Wales
coun_Labrador
coun_Newfoundland
coun_Sweden
coun_Quebec
coun_France
(c) Source area
area
GF
NO_SW
SW
NEC
NAC
NO_NO
US
RU_RP
FR
IC_NE
IR
Atlantic
QC
NI_FO
FI
RU_KB
NF
IC_SW
NO_SE
SC_EA
NO_MI
LB
RU_AK
EW
SF
SC_WE
NI_FB
RU_KW
The latter table (Table 2 (c)) is by default, so Atlantic, NEC and NAC are included. Having a label in a reference table and a geom to associate to those labels would be very usefull.
Location
Locations corresponding to neither fisheries nor areas
Here we search for locations related to fisheries.
QUESTION TO WGNAS
Is there some gis reference for these fishing areas
Would it possible to use ICES areas there or would this reporting correspond to different boundaries ?
Locations corresponding to fisheries
code to list the parameters with location not being area nor fishery
for (i in1:nrow(location_biz)){ location_bizi <- location_biz$location[i] var_biz <- DBI::dbGetQuery(con_salmoglob, glue::glue_sql("SELECT distinct(var_mod) FROM public.database where location in ({location_bizi})", .con=con_salmoglob))colnames(var_biz) <- location_biziprint(kable(var_biz[,,drop=FALSE], row.names=FALSE))}
Table 4: List of parameters corresponding to neither fishery nor area (corresponding to weird table…).
GLD - by NAC SU
p_C8_2_NAC_gld_mu
_
theta6_surv
log_N9_sd
N
nSm
CV_N1_pr
CV_M
log_N6_sd
theta6_delSp
p_smolt_pr
min_log_N9
Stocking_2SW
date_begin
date_end_hindcast
log_C6_mu
p_smolt
log_N6_mu
eggs
CV_theta1
N_NEC
log_C6_sd
max_log_N9
CV_dummy
log_C9_sd
log_C9_mu
log_N9_mu
E_M
theta9_surv
E_theta1_pr
cons_lim
prop_female
E_theta1
N_Sample
CV_hw
N_NAC
mu_N1_pr
date_end
theta9_delSp
N_Sample_sm
FAR - by SU
p_C8_NEC_3_far_mu
p_C5_NEC_1_far_mu
p_C8_NEC_1_far_mu
GLD - by NEC SU
p_C8_2_NEC_gld_mu
delayed spawners
log_C9_delSp_sd
log_C6_delSp_mu
log_C6_delSp_sd
log_C9_delSp_mu
additional
log_C9_sup_mu
log_C9_sup_sd
log_C6_sup_sd
log_C6_sup_mu
GLD - by cplx
p_C8_2_NECNAC_gld_mu
Some values have been deprecated and are present in the database archive but not in the main database.
Code to find what is only in database_archive.
database <- DBI::dbGetQuery(con_salmoglob, "SELECT * FROM public.database")database_archive <- DBI::dbGetQuery(con_salmoglob, "SELECT * FROM public.database_archive")# List of commun columnscols <-c("version", "year", "type", "age", "area", "location", "metric", "var_mod")# getting values that are not present in database anymorefor (col in cols) { diff_values <-setdiff(database_archive[[col]], database[[col]])if (length(diff_values) >0) {cat("Values present in database_archive but not in database for", col, "column :\n")print(diff_values) }}
Values present in database_archive but not in database for version column :
[1] 3 4
Values present in database_archive but not in database for year column :
[1] 51 52
Values present in database_archive but not in database for area column :
[1] "NEAC"
Values present in database_archive but not in database for location column :
[1] "main"
Code to get the number of location per model variable
result <- database %>%group_by(var_mod) %>%summarise(location_count =n_distinct(location))knitr::kable(result) %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed"))
Table 5: Number of locations per model variables
var_mod
location_count
CV_M
1
CV_N1_pr
1
CV_dummy
1
CV_hw
1
CV_theta1
1
E_M
1
E_theta1
1
E_theta1_pr
1
N
1
N_NAC
1
N_NEC
1
N_Sample
1
N_Sample_sm
1
Stocking_2SW
1
cons_lim
1
date_begin
1
date_end
1
date_end_hindcast
1
deltat5_1
1
deltat5_2
1
deltat8_1
1
deltat8_2
1
deltat8_2_1
1
deltat8_2_2
1
eggs
1
log_C5_NAC_1_lbnf_mu
1
log_C5_NAC_1_lbnf_sd
1
log_C5_NAC_2_lbnf_lab_mu
1
log_C5_NAC_2_lbnf_lab_sd
1
log_C5_NAC_2_lbnf_oth_mu
1
log_C5_NAC_2_lbnf_oth_sd
1
log_C5_NEC_1_far_mu
1
log_C5_NEC_1_far_sd
1
log_C6_delSp_mu
1
log_C6_delSp_sd
1
log_C6_mu
1
log_C6_sd
1
log_C6_sup_mu
1
log_C6_sup_sd
1
log_C8_2_gld_tot_mu
1
log_C8_2_gld_tot_sd
1
log_C8_NAC_1_lbnf_mu
1
log_C8_NAC_1_lbnf_sd
1
log_C8_NAC_3_lbnf_mu
1
log_C8_NAC_3_lbnf_sd
1
log_C8_NAC_4_lbnf_lab_mu
1
log_C8_NAC_4_lbnf_lab_sd
1
log_C8_NAC_4_lbnf_oth_mu
1
log_C8_NAC_4_lbnf_oth_sd
1
log_C8_NEC_1_far_mu
1
log_C8_NEC_1_far_sd
1
log_C8_NEC_3_far_mu
1
log_C8_NEC_3_far_sd
1
log_C9_delSp_mu
1
log_C9_delSp_sd
1
log_C9_mu
1
log_C9_sd
1
log_C9_sup_mu
1
log_C9_sup_sd
1
log_N6_mu
1
log_N6_sd
1
log_N9_mu
1
log_N9_sd
1
max_log_N9
1
min_log_N9
1
mu_N1_pr
1
nSm
1
omega
25
p_C5_NEC_1_far_mu
1
p_C8_2_NAC_gld_mu
1
p_C8_2_NECNAC_gld_mu
1
p_C8_2_NEC_gld_mu
1
p_C8_NEC_1_far_mu
1
p_C8_NEC_3_far_mu
1
p_smolt
1
p_smolt_pr
1
prop_female
1
theta6_delSp
1
theta6_surv
1
theta9_delSp
1
theta9_surv
1
So in the end, only omega has 25 locations, and in this case the location corresponds to area. This means we can get rid of the location column. For omega we can keep it,; so we can add a column with a foreign key to area in the parameters table where priors will be stored.
Sea age
Salmon mature at various sea ages, typically returning to freshwater to spawn after one to three years at sea, but also sometimes at older sea ages; this varies widely between populations. Those salmon that return after one year at sea are referred to as one-sea-winter (1SW) salmon, or grilse, with older fish categorised as 2SW, 3SW, etc. In practice, however, for management purposes these older sea age fish are typically aggregated and collectively referred to as multi-sea-winter (MSW) salmon. The sea age when salmon become sexually mature depends on genetics as well as growing conditions in the sea, and possibly freshwater, although the precise proximate factors initiating homeward migration are unknown (Hansen and Quinn, 1998). The sea age of Atlantic salmon is important in the context of stock definition since these different groups of fish have different migration routes, return at different times and are differentially exploited in fisheries. Thus, for example, it is only potential MSW salmon that are exploited in the distant water salmon fishery that operates off the west coast of Greenland (ICES 2021).
Code to describe the content of age column
summary_age <- DBI::dbGetQuery(con_salmoglob, 'SELECT count(*) N, age FROM public.database group by age')%>%arrange(age)knitr::kable(summary_age, caption ="Number age data categorized by age") %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed"))summary_age_parm <- DBI::dbGetQuery(con_salmoglob, 'SELECT count(*) N, age, var_mod FROM public.database group by age, var_mod order by age, var_mod') knitr::kable(summary_age_parm %>%filter(age=="_"), caption ="Number age data categorized by age, parameters without age, n is the number of lines in the db") %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed"))knitr::kable(summary_age_parm %>%filter(age!="_"), caption ="Number age data categorized by age, n is the number of lines in the db") %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed"))distinct_age_parm <- DBI::dbGetQuery(con_salmoglob, "WITH t1 AS ( SELECT age, var_mod FROM public.database group by age, var_mod order by age, var_mod ), t2 AS ( SELECT age as age2, var_mod as var_mod2 FROM public.database group by age, var_mod order by age, var_mod) SELECT age, age2, var_mod from t1 JOIN t2 on t1.var_mod= t2.var_mod2 where age!=age2 ")knitr::kable(distinct_age_parm, caption ="Variables using more than 1 age (e.g. from which the age cannot be derived from the variable itself.)") %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed"))
Table 6: Age structure in the database
(a) Number age data categorized by age
n
age
26
0SW
1525
1FW
14687
1SW
212
1SW immature
212
1SW mature
1007
1SW maturing
1007
1SW not maturing
1325
1SW post-return
1525
2FW
18868
2SW
1325
2SW post-return
1525
3FW
1525
4FW
1525
5FW
1525
6FW
57
Mixed
649
_
1551
eggs
(b) Number age data categorized by age, parameters without age, n is the number of lines in the db
n
age
var_mod
17
_
cons_lim
1
_
date_begin
1
_
date_end
1
_
date_end_hindcast
1
_
N
1
_
N_NAC
1
_
N_NEC
1
_
nSm
625
_
omega
(c) Number age data categorized by age, n is the number of lines in the db
n
age
var_mod
25
0SW
mu_N1_pr
1
0SW
N_Sample_sm
1325
1FW
p_smolt
200
1FW
p_smolt_pr
25
1SW
deltat5_1
25
1SW
deltat5_2
1325
1SW
eggs
53
1SW
log_C5_NAC_2_lbnf_lab_mu
53
1SW
log_C5_NAC_2_lbnf_lab_sd
53
1SW
log_C5_NAC_2_lbnf_oth_mu
53
1SW
log_C5_NAC_2_lbnf_oth_sd
1325
1SW
log_C6_delSp_mu
1325
1SW
log_C6_delSp_sd
1325
1SW
log_C6_mu
1325
1SW
log_C6_sd
1250
1SW
log_C6_sup_mu
1250
1SW
log_C6_sup_sd
1325
1SW
log_N6_mu
1325
1SW
log_N6_sd
1325
1SW
prop_female
1325
1SW
theta6_delSp
53
1SW immature
log_C8_NAC_1_lbnf_mu
53
1SW immature
log_C8_NAC_1_lbnf_sd
53
1SW immature
log_C8_NEC_1_far_mu
53
1SW immature
log_C8_NEC_1_far_sd
53
1SW mature
log_C5_NAC_1_lbnf_mu
53
1SW mature
log_C5_NAC_1_lbnf_sd
53
1SW mature
log_C5_NEC_1_far_mu
53
1SW mature
log_C5_NEC_1_far_sd
1007
1SW maturing
p_C5_NEC_1_far_mu
1007
1SW not maturing
p_C8_NEC_1_far_mu
1325
1SW post-return
theta6_surv
1325
2FW
p_smolt
200
2FW
p_smolt_pr
25
2SW
deltat8_1
25
2SW
deltat8_2
25
2SW
deltat8_2_1
25
2SW
deltat8_2_2
1325
2SW
eggs
53
2SW
log_C8_2_gld_tot_mu
53
2SW
log_C8_2_gld_tot_sd
53
2SW
log_C8_NAC_3_lbnf_mu
53
2SW
log_C8_NAC_3_lbnf_sd
53
2SW
log_C8_NAC_4_lbnf_lab_mu
53
2SW
log_C8_NAC_4_lbnf_lab_sd
53
2SW
log_C8_NAC_4_lbnf_oth_mu
53
2SW
log_C8_NAC_4_lbnf_oth_sd
53
2SW
log_C8_NEC_3_far_mu
53
2SW
log_C8_NEC_3_far_sd
1325
2SW
log_C9_delSp_mu
1325
2SW
log_C9_delSp_sd
1325
2SW
log_C9_mu
1325
2SW
log_C9_sd
1250
2SW
log_C9_sup_mu
1250
2SW
log_C9_sup_sd
1325
2SW
log_N9_mu
1325
2SW
log_N9_sd
25
2SW
max_log_N9
25
2SW
min_log_N9
318
2SW
p_C8_2_NAC_gld_mu
1007
2SW
p_C8_2_NEC_gld_mu
106
2SW
p_C8_2_NECNAC_gld_mu
1007
2SW
p_C8_NEC_3_far_mu
1325
2SW
prop_female
1325
2SW
Stocking_2SW
1325
2SW
theta9_delSp
1325
2SW post-return
theta9_surv
1325
3FW
p_smolt
200
3FW
p_smolt_pr
1325
4FW
p_smolt
200
4FW
p_smolt_pr
1325
5FW
p_smolt
200
5FW
p_smolt_pr
1325
6FW
p_smolt
200
6FW
p_smolt_pr
1
eggs
CV_N1_pr
25
eggs
CV_theta1
1325
eggs
E_theta1
200
eggs
E_theta1_pr
1
Mixed
CV_dummy
1
Mixed
CV_hw
1
Mixed
CV_M
1
Mixed
E_M
53
Mixed
N_Sample
(d) Variables using more than 1 age (e.g. from which the age cannot be derived from the variable itself.)
age
age2
var_mod
1SW
2SW
eggs
2SW
1SW
eggs
2FW
6FW
p_smolt
2FW
1FW
p_smolt
2FW
3FW
p_smolt
2FW
4FW
p_smolt
2FW
5FW
p_smolt
6FW
2FW
p_smolt
6FW
1FW
p_smolt
6FW
3FW
p_smolt
6FW
4FW
p_smolt
6FW
5FW
p_smolt
1FW
2FW
p_smolt
1FW
6FW
p_smolt
1FW
3FW
p_smolt
1FW
4FW
p_smolt
1FW
5FW
p_smolt
3FW
2FW
p_smolt
3FW
6FW
p_smolt
3FW
1FW
p_smolt
3FW
4FW
p_smolt
3FW
5FW
p_smolt
4FW
2FW
p_smolt
4FW
6FW
p_smolt
4FW
1FW
p_smolt
4FW
3FW
p_smolt
4FW
5FW
p_smolt
5FW
2FW
p_smolt
5FW
6FW
p_smolt
5FW
1FW
p_smolt
5FW
3FW
p_smolt
5FW
4FW
p_smolt
1FW
2FW
p_smolt_pr
1FW
3FW
p_smolt_pr
1FW
4FW
p_smolt_pr
1FW
5FW
p_smolt_pr
1FW
6FW
p_smolt_pr
2FW
1FW
p_smolt_pr
2FW
3FW
p_smolt_pr
2FW
4FW
p_smolt_pr
2FW
5FW
p_smolt_pr
2FW
6FW
p_smolt_pr
3FW
1FW
p_smolt_pr
3FW
2FW
p_smolt_pr
3FW
4FW
p_smolt_pr
3FW
5FW
p_smolt_pr
3FW
6FW
p_smolt_pr
4FW
1FW
p_smolt_pr
4FW
2FW
p_smolt_pr
4FW
3FW
p_smolt_pr
4FW
5FW
p_smolt_pr
4FW
6FW
p_smolt_pr
5FW
1FW
p_smolt_pr
5FW
2FW
p_smolt_pr
5FW
3FW
p_smolt_pr
5FW
4FW
p_smolt_pr
5FW
6FW
p_smolt_pr
6FW
1FW
p_smolt_pr
6FW
2FW
p_smolt_pr
6FW
3FW
p_smolt_pr
6FW
4FW
p_smolt_pr
6FW
5FW
p_smolt_pr
2SW
1SW
prop_female
1SW
2SW
prop_female
NOTE
Only eggs, p_smolt, p_smolt_pr and prop_female need an age, the other is contained in the name of the variable.
NOTE
So the age also covers information on maturity but the only time it is really used outside from the metadata (eggs, smolt, prop_female) the maturity is not necessary.
QUESTION TO WGNAS
Why is omega not mixed ?
Would it not make more sense to store 2SW as MSW (multi-sea-winter) ? So in the referential have 1SW 2SW 3SW … how many ? and then MSW ?
Developmental stage
Code to extract count per life stage in metadata
summary_lifestage_metadata <- DBI::dbGetQuery(con_salmoglob, 'SELECT count(*) as n, life_stage FROM public.metadata group by life_stage')%>%arrange(life_stage)knitr::kable(summary_lifestage_metadata, format ="markdown") %>%kable_material(c("striped", "hover")) |>kbl()
Table 7: Number of lines with lifestage in the metadata table
Now there is the PFA, which is fine, but not really a stage.
We first start by looking at the possible reference tables in ICES corresponding to both maturity and stage. Currently we have to store information about the sea age (0SW, 1SW, 2SW, ...,MSW) , the freshwater age (1FW 2FW 3FW 4FW ...). The maturity could be stored in the metadata as it seems to be linked always with parameters. Currently in wgnas we have maturity : immature, mature, not maturing and post return. In the metadata we have egg, smolts, spawner, adults
QUESTION TO WGNAS
Is the correspondence mature / maturing correct or is it pre -spawing?
Is the correspondence OK. Do we really need PFA stage or could we just describe it in the metadata ?
QUESTION TO ICES
Could we add smolt and PFA (pre fisheries abundance) somewhere ?
Metric
The metric are linked with coefficients, they don’t need to be stored in the main db
Code to describe the metrics
summary_metric <- DBI::dbGetQuery(con_salmoglob, 'SELECT count(*) as n, metric FROM public.database group by metric')%>%arrange(metric)knitr::kable(summary_metric,caption ="Number of lines with metric in the database table") %>%kable_material(c("striped", "hover")) summary_metric <- DBI::dbGetQuery(con_salmoglob, 'SELECT count(*) as n, metric FROM public.metadata group by metric')%>%arrange(metric)knitr::kable(summary_metric,caption ="Number of lines with metric in the metadata table") %>%kable_material(c("striped", "hover"))
Table 8: Metric overview
(a) Number of lines with metric in the database table
n
metric
50
Bound
29
Coefficient of variation
679
Hyperparameter
7
Index
38278
Mean
11033
Standard deviation
(b) Number of lines with metric in the metadata table
n
metric
2
Bound
5
Coefficient of variation
75
Estimate
6
Hyperparameter
7
Index
60
Mean
2
Precision
17
Standard deviation
There is a problem of correspondence between the database and metadata, detailed below where we query the database to find where the metric in metatdata, and the metric in database are different :
Code to join metadata and metrics and find correspondence problems
# this query didn't return as expected pb <- DBI::dbGetQuery(con_salmoglob, "SELECT database.var_mod, metadata.metric as metric_metadata, database.metric as metric_database FROM public.metadata JOIN public.database ON database.var_mod=metadata.var_mod WHERE metadata.metric != database.metric ")knitr::kable(pb,caption ="Variables with correspondence pb for metric between the metadata and the database table") %>%kable_material(c("striped", "hover")) pb1 <- DBI::dbGetQuery(con_salmoglob, "SELECT distinct metadata.var_mod, metadata.metric as metric_metadata, database.metric as metric_database FROM public.metadata left JOIN public.database ON database.var_mod=metadata.var_mod WHERE metadata.metric in ('Precision','Estimate') ")knitr::kable(pb1,caption ="Variables with correspondence pb for metric between the metadata and the database table") %>%kable_material(c("striped", "hover"))
Table 9: Checking correspondence between database and metadata
(a) This row has a problem
var_mod
metric_metadata
metric_database
NA
NA
NA
:-------
:---------------
:---------------
(b) Variables with correspondence pb for metric between the metadata and the database table
var_mod
metric_metadata
metric_database
tau_theta4
Precision
NA
C5_NEC_1_tot
Estimate
NA
C6_hw
Estimate
NA
C5_NAC_1
Estimate
NA
h6_hw_delSp
Estimate
NA
C9_hw
Estimate
NA
p_C8_2_NECNAC
Estimate
NA
C5_NAC_2_lab
Estimate
NA
C8_NAC_4_lab
Estimate
NA
N9
Estimate
NA
theta1_ddp
Estimate
NA
h9_hw_delSp
Estimate
NA
tau_theta3
Precision
NA
C5_NAC_2
Estimate
NA
N8_1
Estimate
NA
p_C8_NEC_1
Estimate
NA
logit_theta3_pr
Estimate
NA
h5_NEC_1
Estimate
NA
N5
Estimate
NA
logN8
Estimate
NA
N7
Estimate
NA
C8_NEC_1
Estimate
NA
h8_NEC_1
Estimate
NA
logit_theta4
Estimate
NA
h8_NEC_3
Estimate
NA
theta3_pr
Estimate
NA
p_C5_NEC_1
Estimate
NA
C5_NAC_2_other
Estimate
NA
h8_NAC_4
Estimate
NA
C8_NEC_3
Estimate
NA
C8_NEC_1_tot
Estimate
NA
logN2
Estimate
NA
theta1
Estimate
NA
theta4
Estimate
NA
C8_2
Estimate
NA
C5_NAC_1_tot
Estimate
NA
C8_NAC_3
Estimate
NA
logit_theta3
Estimate
NA
C8_2_comp
Estimate
NA
p_smolt_gamma
Estimate
NA
h5_NAC_1
Estimate
NA
N4
Estimate
NA
C8_NAC_1
Estimate
NA
h8_NAC_4_lab
Estimate
NA
C8_NAC_4
Estimate
NA
C6_hw_delSp
Estimate
NA
h8_NAC_1
Estimate
NA
p_C8_2_NAC
Estimate
NA
C9_hw_delSp
Estimate
NA
N10
Estimate
NA
C8_NEC_3_tot
Estimate
NA
h8_NAC_4_other
Estimate
NA
h6_hw
Estimate
NA
logN4
Estimate
NA
h8_2
Estimate
NA
N3
Estimate
NA
p_smolt_stoch
Estimate
NA
theta8_2
Estimate
NA
theta3
Estimate
NA
N3_tot
Estimate
NA
N6
Estimate
NA
h9_hw
Estimate
NA
C8_NAC_1_tot
Estimate
NA
h5_NAC_2
Estimate
NA
C8_NAC_3_tot
Estimate
NA
log_N9_pr
Estimate
NA
p_C8_NEC_3
Estimate
NA
h8_NAC_3
Estimate
NA
N1
Estimate
NA
logN5
Estimate
NA
N1_pr
Estimate
NA
N3_pr
Estimate
NA
p_C8_2_NEC
Estimate
NA
C8_2_tot
Estimate
NA
C5_NEC_1
Estimate
NA
C8_NAC_4_other
Estimate
NA
N8_2
Estimate
NA
Metadata
The metadata structure is described in table Table 10.
Table 10: Metadata structure
Details on values inserted in the DB are described in subtables from table Table 11.
Code to get all unique values from metadata
DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT type_object FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover"))DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT nimble FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, model_stage <-"SELECT DISTINCT model_stage FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT type FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT locations FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT fishery FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT metric FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT status FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT environment FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT life_stage FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT complex FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover")) DBI::dbGetQuery(con_salmoglob, "SELECT DISTINCT ages FROM metadata") %>% knitr::kable() %>%kable_material(c("striped", "hover"))
Table 11: Various values in the salmoglob metadata, description of their content
(a) type_object
type_object
single_value
array
matrix
vector
(b) nimble
nimble
Data_nimble
Const_nimble
other
Output
(c) model_stage
model_stage
Fit
other
First year
(d) type
type
Origin distribution in sea catches
Maturation rate
Homewater catches
Conservation limits
Number of years
Initialization first year
Returns
Abundance
High seas harvest rates
Smolt age structure
Time spent at sea
Sex ratio
Number of SU
Number of smolt ages
River harvest rates
Stocking
First year
Survival rate
Proportion of delayed individuals
Sea catches
Demographic transitions
Fecundity rate
Natural mortality rate
Prior hyperparameter
(e) locations
locations
NF fisheries
Bef. Fisheries
_
GLD - by NAC SU
FAR - by SU
Aft. FAR fisheries
Bef. Gld fisheries
Bef. FAR fisheries
LB/SPM fisheries in Lb
LB/SPM fisheries out. Lb
Aft. First fisheries
Btw. GLD - NF/LB fisheries
Aft. NF/LB fisheries
Btw. NF/LB - GLD fisheries
NF/LB fisheries
Aft. Second fisheries
Aft. Gld fisheries
Btw. FAR - GLD fisheries
NF/LB/SPM fisheries
GLD - by NEC SU
Btw. GLD - FAR fisheries
Return aft. Second fishery
LB/SPM fisheries tot
Return aft. First fishery
Gld fisheries
Bef. NF/LB fisheries
FAR fisheries
GLD - by cplx
GLD fisheries
(f) fishery
fishery
LB fishery
LB/SPM/swNF fishery
neNF fishery
FAR fishery
neNF
NA
delayed spawners
GLD fishery
_
main
LB-LB/SPM/swNF fishery
(g) metric
metric
Estimate
Index
Bound
Hyperparameter
Standard deviation
Coefficient of variation
Precision
Mean
(h) status
status
other
parameter_estimate
derived_quantity
parameter_constant
data
(i) environment
environment
Transition
Multiple
River
Sea
_
(j) life_stage
life_stage
Adult
Spawners
PFA
Eggs
Non mature
_
Smolts
Adults
Multiple
(k) complex
complex
Multiple
NEC
NAC
_
(l) ages
ages
0SW
1SW
1SW mature
_
Mixed
Multiple
2SW post-return
1SW post-return
eggs
1SW immature
2SW
version
Versions are simple numbers from 1 to 14, they represent version of variables, how does that work ?
“The database_archive table allows for data versioning to strengthen the data security and quality. Its content matches that of the database table plus supplementary information, i.e. the date (day-month-year/hour-minute) when the data has been updated and an identifier to register the author of data updating (the corresponding information on the author, i.e. first name, name, email address and institute affiliation can be retrieved from the users table - not presented here). The database_archive table contains the data present in the database table and all its previous versions, which are referred by a version number (one single version number by variable). At any time, an older version of database can hence be retrieved” (source Hernvann et al. (2021)).
The date time in database archive is also present in the database. but this date time has several values both in the latest db and in the archive, see below : (Table Table 12).
Code to get version
# this query didn't return as expected version <- DBI::dbGetQuery(con_salmoglob, "WITH maxyear AS (SELECT count(*) as N, version, max(year) as year, extract(year from date_time) as year_last_update FROM public.database group by version, date_time, year order by version) SELECT version, sum(N) as N, max(year) as year, max(year_last_update) as year_last_update FROM maxyear Group by version order by version ")knitr::kable(version, ) %>%kable_material(c("striped", "hover"))# to which years correspond the data (2020 to 2024) dates_db <- DBI::dbGetQuery(con_salmoglob, "SELECT count(*) as N, extract(year from date_time) as year FROM public.database group by extract(year from date_time) ")# to which years correspond the data archive (2020 to 2024) dates_db_archive <- DBI::dbGetQuery(con_salmoglob, "SELECT count(*) as n_arch, extract(year from date_time) as year FROM public.database_archive group by extract(year from date_time) ")full_join(dates_db,dates_db_archive) %>%select(year, n, n_arch) %>%arrange(year) %>%knitr::kable() %>%kable_material(c("striped", "hover"))
Table 12: Version in the database = number of lines, narch = number of lines in the archive database, year =last year when present in column date, year_last_update = year extracted from date_time
(a) version.
version
n
year
year_last_update
1
738
NA
2021
2
6417
2020
2022
5
150
NA
2024
6
1696
2023
2024
7
10759
2023
2024
8
9911
2023
2024
9
10017
2023
2024
10
6307
2023
2024
11
1749
2023
2024
12
1007
2023
2024
13
318
2023
2024
14
1007
2023
2024
(b) Year of data in date_time, both in database, and in database archive.
year
n
n_arch
2020
688
35292
2021
5050
106014
2022
1417
107651
2023
NA
93172
2024
42921
49434
QUESTION TO WGNAS
From the results in the database (Table Table 12), it’s not clear what version represents here. What does version mean ? Just a version for each variable. Values have been inserted in 2020, but it’s not clear how you can identify the origin of data in database archive, since those same dates are in the database, and not really updated. Are we right ?
Answer from Pierre-Yves Hernvann
In principle, this database update is performed variable by variable, that is, “var_mod” by “var_mod,” as you can see in the table. For example, we choose to update the values of the sea returns of salmon that have spent two winters at sea (var_mod==“log_N6_mu”) via a csv file containing the return estimates for each region for the entire time series. For practical reasons, we can also update this variable “log_N6_mu” for a specific region (“area” column). The file we upload then contains the “log_N6_mu” values for a single region, such as Newfoundland (NF).
When this file is uploaded with new data, it undergoes a battery of tests to ensure the consistency of format and information with what is already in the database and an upload date (“date_time” column) is assigned. If these tests are passed successfully, the database rows corresponding to the “var_mod” or the “var_mod” / “area” pair are deleted and replaced with those of the updated csv file. This replacement is accompanied by the incrementation of the version number (by 1) which makes it possible to determine that the update dated “date_time” constitutes the Xth modification of the variable (“var_mod” or “var_mod” / “area” pair). The archive table (which I don’t think you’ll be able to access) contains all versions (with version numbers and dates) of the table used by the WGNAS (it’s therefore a long table created by appending all updated versions of the WGNAS table).
IN SUMMARY: - The version number is a strictly positive integer used to track updates to the data available in the database used by the WGNAS. - A version number is not specific to the entire database (not everything is updated every year; some information may be updated several times a year). - A version number is specific to a “var_mod” or a “var_mod” / “area” pair. Thus, in most cases, a “var_mod” will be characterized by the same “version” index, but some “var_mod” may have different version numbers depending on the region. - For a triplet, “version”/“var_mod”/“area” there is only one “date_time” indicating the date and time to the nearest second.
TODO
Remove NA, -, and main ? from fishery
QUESTION TO WGNAS
In the metadata column fishery what is ‘main’ ? Should we remove NA and - and leave it NULL ?
Other comments.
Currently stage in metadata not consistent, replicates information about maturity but not always. The lack of consistency in the metadata is not a problem as this table only serves as a reference to describe var_mod used in the database table.
Additional values about maturity, stage, can be stored in the metadata table.
There is a discrepancy between data in the historical dataset and the current, so we need to create additional parameters (in the metadata table) for the historical database and this will need to be checked.
References
Hernvann, Pierre-Yves, Rémi Patin, Jérôme Guitton, Maxime Olmos, Marie-Pierre Etienne, Maéva Labouyrie, Léa Bézier, and Etienne Rivot. 2021. “WGNAS-SalmoGlob ToolBox: A Web Application for Supporting Atlantic Salmon Stock Assessment at the North Atlantic Basin Scale.” {{ICES WGNAS Working Paper}} 2021/27, 22th {{March}} - 1st {{April}} 2021.
———. 2024. “The Second ICES/NASCO Workshop on Salmon Mortality at Sea (WKSalmon2; Outputs from 2022 Meeting).” ICES Scientific Reports. https://doi.org/10.17895/ICES.PUB.22560790.
Rivot, Etienne, Rémi Patin, Maxime Olmos, Gérald Chaput, and Pierre-Yves Hernvann. 2021. “A Hierarchical Bayesian Life Cycle Model for Atlantic Slamon Stock Assessment at the North Atlantic Basin Scale.” ICES WGNAS Working Paper 2021/26 22th March - 1st April 2021.