0:00:00.000,0:00:06.499 Welcome to another video focusing on methodological approaches 0:00:06.499,0:00:12.198 to estimating effective population size using genomic data. 0:00:16.431,0:00:23.396 The effective population size is one of the most important parameters in conservation genetics. 0:00:23.396,0:00:32.661 The effective population size of a real population X is defined as the size of a hypothetical ideal population 0:00:32.661,0:00:40.959 that exhibits the same genetic drift as the real population under study. 0:00:40.959,0:00:49.324 The ideal population, as defined by Ronald Fisher and Sewall Wright, 0:00:49.324,0:00:55.556 is a population characterized by the following properties: 0:00:55.556,0:01:02.788 non-overlapping generations, diploid individuals that reproduce sexually, 0:01:02.788,0:01:09.320 with a sex ratio of 1:1 and random mating. 0:01:09.320,0:01:20.818 In addition, in an ideal population defined in this way, migration, mutation and selection do not occur. 0:01:20.818,0:01:27.550 The population is assumed to have a constant size across generations. 0:01:27.550,0:01:36.581 In an ideal infinitely large population, there is no change in allele frequency, 0:01:36.581,0:01:43.913 no change in inbreeding coefficient, no random loss of alleles across time. 0:01:43.913,0:01:50.079 The alleles of such an ideal population are in linkage equilibrium. 0:01:50.079,0:01:58.977 On the other hand, in an ideal population of finite size, genetic drift does occur, 0:01:58.977,0:02:08.075 which changes the frequencies of individual alleles as well as the value of the inbreeding coefficient. 0:02:08.075,0:02:17.373 Furthermore, genetic drift in the finite ideal population causes loss or fixation of alleles 0:02:17.373,0:02:24.339 and the occurrence of linkage disequilibrium between alleles in the population.
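The effect of drift in a finite ideal population described above can be illustrated with the standard textbook expectations for such a population. The sketch below is not taken from the talk: the function names are hypothetical, and it assumes a constant effective size and an initial inbreeding coefficient of zero.

```python
def expected_inbreeding(n_e: float, generations: int) -> float:
    """Expected inbreeding coefficient after `generations` of drift.

    Assumes an ideal population of constant effective size `n_e`
    that starts from F = 0 (illustrative assumption).
    """
    if n_e <= 0:
        raise ValueError("n_e must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** generations


def expected_heterozygosity(h0: float, n_e: float, generations: int) -> float:
    """Expected heterozygosity remaining after `generations` of drift."""
    return h0 * (1.0 - expected_inbreeding(n_e, generations))


if __name__ == "__main__":
    # Smaller populations accumulate inbreeding faster.
    for size in (25, 100, 1000):
        print(size, round(expected_inbreeding(size, 20), 4))
```

As the effective size grows, the per-generation change approaches zero, which matches the statement that an infinitely large ideal population shows no change in the inbreeding coefficient.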
0:02:24.339,0:02:34.437 Since the effective population size is one of the most important population parameters, 0:02:34.437,0:02:39.002 there are a number of methodological approaches for estimating this parameter. 0:02:39.002,0:02:44.135 This slide shows some of the most important approaches. 0:02:44.135,0:02:51.367 The effective population size can be estimated from population variability, 0:02:51.367,0:02:57.499 evaluated by changes in the frequencies of alleles in the population; 0:02:57.499,0:03:03.497 another approach is based on the change in the level of the inbreeding coefficient; 0:03:03.497,0:03:15.662 and last but not least, the approach that evaluates the level of linkage disequilibrium in the population should be mentioned. 0:03:15.662,0:03:23.394 Different estimates of effective population size are based on different methodological approaches, 0:03:23.394,0:03:36.724 and therefore the estimates obtained with these different approaches may not always be the same. 0:03:36.724,0:03:47.689 From the point of view of using genomic data, the most widely used approach to estimating the effective population size 0:03:47.689,0:03:52.188 is based on determining the level of linkage disequilibrium in the population. 0:03:52.188,0:04:01.853 However, this approach is based on a different definition of effective population size than the other approaches mentioned. 0:04:01.853,0:04:10.284 Here, the effective population size of a real population X with an observed level of linkage disequilibrium 0:04:10.284,0:04:17.983 for a given genomic segment corresponds to the size of a hypothetical idealized population 0:04:17.983,0:04:29.780 that shows an identical level of linkage disequilibrium for the same genomic distance interval as observed in the real finite population.
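The linkage-disequilibrium definition above can be made concrete with Sved's classical approximation, E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination fraction between two loci. The talk does not state which formula each program uses, so the sketch below is only an illustration under that assumption; the function names are hypothetical, and sample-size and mutation corrections are deliberately left out.

```python
def expected_r2(n_e: float, c: float) -> float:
    """Expected squared correlation between loci (Sved's approximation).

    n_e: effective population size of the idealized population
    c:   recombination fraction between the two loci (0 < c <= 0.5)
    """
    if n_e <= 0 or not 0 < c <= 0.5:
        raise ValueError("n_e must be positive and c must lie in (0, 0.5]")
    return 1.0 / (1.0 + 4.0 * n_e * c)


def ne_from_r2(r2: float, c: float) -> float:
    """Invert the approximation: the idealized size giving the observed r2."""
    if not 0 < r2 < 1 or not 0 < c <= 0.5:
        raise ValueError("r2 must lie in (0, 1) and c in (0, 0.5]")
    return (1.0 / r2 - 1.0) / (4.0 * c)


if __name__ == "__main__":
    # Loci 1 cM apart (c = 0.01) with an observed mean r2 of 0.2
    print(ne_from_r2(0.2, 0.01))
```

Reading the formula in reverse is exactly the definition given in the talk: the observed level of linkage disequilibrium at a given genetic distance is matched to the idealized population size that would produce it.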
0:04:29.780,0:04:36.312 Note that the definition of effective population size based on linkage disequilibrium 0:04:36.312,0:04:44.444 is different from the other definitions of effective population size mentioned above. 0:04:46.810,0:04:57.775 As already mentioned, in an ideal infinite population that has reached an equilibrium state, all loci are in linkage equilibrium. 0:04:57.775,0:05:06.740 However, in an ideal population with a finite number of individuals that has reached an equilibrium state, 0:05:06.740,0:05:19.471 the loci are in linkage disequilibrium, and the linkage disequilibrium is a function of the population size and the genetic distance between loci. 0:05:19.471,0:05:24.370 And precisely on the basis of the distances between loci, 0:05:24.370,0:05:32.001 it is possible to estimate not only the effective size of the current population, 0:05:32.001,0:05:41.466 but also the effective size of the population in previous generations. 0:05:41.466,0:05:54.964 This slide shows the most common computer programs used to estimate effective population size based on both pedigree and molecular genetic data. 0:05:54.964,0:06:02.529 The programs using molecular genetic data can be divided into two groups. 0:06:02.529,0:06:09.461 The first group comprises programs estimating effective population size based on linkage disequilibrium. 0:06:09.461,0:06:23.425 This group includes, for example, the very popular program NeEstimator, which works not only with genomic data but also with microsatellites, 0:06:23.425,0:06:29.290 and the recently published programs GONE and CurrentNe. 0:06:29.290,0:06:41.888 The SNeP program also belongs to this group, but is mainly suitable for estimating historical effective population size. 0:06:41.888,0:06:55.785 The second group comprises NeIBD and HapNe, which are based on estimating the coefficient of inbreeding.
0:06:55.785,0:07:01.584 These two programs are primarily designed for the analysis of human data 0:07:01.584,0:07:09.549 and are not well suited for estimating the effective population size of livestock. 0:07:13.448,0:07:26.612 We used genomic data from 215 Old Kladruber horses to compare the individual programs for estimating effective population size. 0:07:26.612,0:07:33.477 This population, which is classified among the genetic resources of the Czech Republic, 0:07:33.477,0:07:44.175 has very accurate pedigree records that also accurately capture the historical development of the breed, 0:07:44.175,0:07:56.106 which allows us to verify the suitability of individual programs and the accuracy of estimating the effective population size. 0:07:56.106,0:08:08.470 The pedigree records of this breed date back to 1773 and include 49 generations of ancestors, 0:08:08.470,0:08:15.136 with an equivalent of 16 complete generations. 0:08:15.136,0:08:27.367 In addition, the molecular genetic data were filtered for the analysis according to looser or stricter quality-control criteria, 0:08:27.367,0:08:41.730 namely quality control 1 (Q1), where approximately 39,000 markers are included in the analysis, 0:08:41.730,0:08:50.929 and quality control 2 (Q2), which included approximately 61,000 markers. 0:08:50.929,0:08:59.227 In addition, the effective population size was estimated based on pedigree data for comparison. 0:09:01.627,0:09:10.258 The results of the comparison of the different computer programs show that when using the larger number of SNPs, 0:09:10.258,0:09:16.890 most of the programs used showed very similar values; 0:09:16.890,0:09:32.720 the only exception was the SNeP program, which showed significantly lower estimates of effective population size with both the larger and the smaller number of SNPs, 0:09:32.720,0:09:51.350 that is, with both quality control 1 and quality control 2. The computer program NeEstimator also showed lower estimates when using data with a lower SNP density (quality control 1).
0:09:51.350,0:10:00.048 Next, the estimation of the historical effective population size was verified using the SNeP and GONE programs. 0:10:00.048,0:10:09.780 Looking at the results, it is clear that the estimate of the historical effective population size by the GONE program 0:10:09.780,0:10:28.743 very realistically reproduces the historical development of the breed under study and reveals the occurrence of genetic drift, which generally affected most horse populations in Europe after the First World War, 0:10:28.743,0:10:51.138 the collapse of the Habsburg Monarchy and the development of mechanization in agriculture. The SNeP program substantially underestimates the effective population size in all generations studied. 0:10:53.438,0:10:59.936 Based on the conclusions reached, the GONE program shows the highest suitability 0:10:59.936,0:11:09.335 for estimating the effective population size of livestock, companion animals and wildlife. 0:11:09.335,0:11:20.766 The following slides show the parameter settings and how to run the GONE program to estimate the current and historical effective population size. 0:11:20.766,0:11:27.964 The program is freely available on the following GitHub page. 0:11:27.964,0:11:36.196 The methodological approach, testing of the program and verification of the accuracy of the program 0:11:36.196,0:11:41.961 on simulated data are presented in this scientific publication. 0:11:43.861,0:11:47.427 After downloading, we get the following files. 0:11:47.427,0:11:56.725 Among these files are sample data files, marked with the name "example". 0:11:56.725,0:12:02.391 There is also a parameter file, "INPUT_PARAMETERS_FILE", 0:12:02.391,0:12:11.922 and the most important file, which allows us to run the program: "script_GONE.sh".
0:12:11.922,0:12:22.087 The GONE software supports input data in PLINK format, i.e. files with .ped and .map extensions. 0:12:22.087,0:12:34.684 Genotypes can be encoded as biallelic markers, using either the codes 1 and 2 or the nucleotide bases A, C, G and T. 0:12:34.684,0:12:48.448 The GONE program has only a few limitations on the data: it accepts at most 100,000 SNPs per chromosome 0:12:48.448,0:12:54.180 and a maximum of 1 million SNPs for the whole genome. 0:12:54.180,0:13:07.478 In terms of the number of individuals, the dataset must contain between 2 and 1,800 individuals. 0:13:07.478,0:13:21.475 However, the recommended minimum number of individuals for accurate estimation is 20 to 24. 0:13:21.475,0:13:26.274 There are no other restrictions on the datasets. 0:13:26.274,0:13:32.873 This slide shows the parameter file with the default settings. 0:13:32.873,0:13:40.338 Highlighted are the most important parameters that should be verified before analysis. 0:13:40.338,0:13:49.436 The first parameter specifies whether the data are phased or not. 0:13:49.436,0:14:00.901 For analyses, it is recommended to use phased data, which provide more accurate estimates of the effective population size. 0:14:00.901,0:14:13.631 However, it is useful to consider the precision of phasing for datasets with small numbers of individuals. 0:14:13.631,0:14:25.596 In datasets with small numbers of individuals, phasing may introduce more 'noise', which affects the final estimates of effective population size. 0:14:25.596,0:14:30.561 Another parameter is the recombination rate. 0:14:30.561,0:14:45.625 For more accurate estimates it is preferable to supply the actual recombination rate that occurs in a given breed or population. 0:14:45.625,0:14:57.556 However, past analyses suggest that the default setting provides sufficiently accurate estimates of effective population size.
0:14:57.556,0:15:02.522 The last parameter is the minor allele frequency setting. 0:15:02.522,0:15:15.386 In other population analyses or other computer programs, a minor allele frequency threshold of more than 0.05 is recommended. 0:15:15.386,0:15:22.384 However, in this program, when low-frequency alleles are filtered out, 0:15:22.384,0:15:32.582 information about linkage disequilibrium is lost and the estimates of effective population size are not as accurate. 0:15:32.582,0:15:36.982 The GONE program is run from the command line. 0:15:36.982,0:15:43.980 The program is started by typing the command shown, where the last term, "example", 0:15:43.980,0:15:52.712 is the name of the genotype data files, here named "example", without the extension. 0:15:55.211,0:16:01.343 The following procedure is recommended for estimating the effective population size. 0:16:01.343,0:16:13.841 The program uses a maximum of 100,000 SNPs per chromosome, with a default setting of 50,000 SNPs. 0:16:13.841,0:16:20.506 At lower SNP counts, all SNPs are included in the analysis. 0:16:20.506,0:16:29.038 If more than 50,000 SNPs are present in the dataset, SNPs are selected randomly. 0:16:29.038,0:16:39.969 This can be used to estimate confidence intervals when the program is run repeatedly with random selection of SNPs. 0:16:39.969,0:16:51.000 However, it should be emphasized that the analyses should be run sequentially rather than in parallel, because random SNP selection 0:16:51.000,0:17:01.232 is based on an initial random "seed" that is derived from the computer's clock time. 0:17:01.232,0:17:13.829 Here we show the confidence interval estimated from 100 random SNP samples, that is, 100 runs of the program. 0:17:13.829,0:17:25.060 It is recommended to take the median of these 100 runs, marked in blue, as the estimate of the effective population size. 0:17:25.060,0:17:32.092 The mean is shown in addition, and is marked in black.
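The repeated-run procedure described above can be summarized with a few lines of Python. This is a minimal sketch, not part of GONE: `summarize_runs` is a hypothetical helper, it assumes the per-run estimates for one generation have already been collected into a list, and the 2.5th to 97.5th percentile range is only one simple choice of interval, since the talk does not say how the interval on the slide was calculated.

```python
import statistics


def summarize_runs(ne_estimates):
    """Summarize Ne estimates from repeated runs with random SNP subsets.

    Returns the median (the recommended point estimate), the mean,
    and an empirical 2.5%-97.5% percentile interval.
    """
    if len(ne_estimates) < 2:
        raise ValueError("at least two runs are needed")
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the first and the last.
    cuts = statistics.quantiles(ne_estimates, n=40, method="inclusive")
    return {
        "median": statistics.median(ne_estimates),
        "mean": statistics.fmean(ne_estimates),
        "lower": cuts[0],
        "upper": cuts[-1],
    }


if __name__ == "__main__":
    # Made-up numbers standing in for per-run estimates; not real results.
    demo = [92.0, 101.5, 97.3, 110.2, 95.8, 99.1, 104.4, 98.7]
    print(summarize_runs(demo))
```

The median is less sensitive than the mean to occasional extreme runs, which is consistent with the recommendation to report the median.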
0:17:32.092,0:17:42.590 Another recommendation is to adjust the number of generations for which the effective population size is estimated. 0:17:42.590,0:17:53.288 The default setting is up to 2000 generations, but for medium-density SNP chips, 0:17:53.288,0:18:08.485 that is, chips with around 50,000 SNPs, it is recommended to consider a maximum of only 100 generations into the past for the sake of estimation accuracy. 0:18:08.485,0:18:17.117 For example, for horses, where the generation interval is equal to 10 years, 0:18:17.117,0:18:25.582 the mentioned 100 generations represent an interval of 1000 years into the past. 0:18:25.582,0:18:35.879 This is a long span of time, since improved breeds began to take shape only about 200 years ago. 0:18:35.879,0:18:46.244 This slide again shows an estimate of the effective population size with a prediction 100 years into the past. 0:18:48.344,0:18:59.741 Next, we focused on a case study: estimating the current and historical effective population size of sheep and goats in the Czech and Slovak Republics. 0:18:59.741,0:19:09.273 As already mentioned, in this case study we will look at the estimation of the current effective population size 0:19:09.273,0:19:19.371 and the historical effective population size of selected breeds of sheep and goats reared in the Czech and Slovak Republics. 0:19:19.371,0:19:32.068 Different types of breeds are included in the analysis of current and historical effective population size in order to verify the effect of breed admixture: 0:19:32.068,0:19:44.466 autochthonous breeds as well as international breeds and crossbreds, which are marked in grey and blue in the slide. 0:19:44.466,0:19:52.798 Based on a previous study, three computer programs were used to estimate effective population size, 0:19:52.798,0:20:04.495 namely NeEstimator v.2, which has been one of the most commonly used programs for estimating effective population size in the last decade.
0:20:04.495,0:20:14.593 Its advantages include the ability to estimate effective population size from both microsatellite data and genomic SNP data, 0:20:14.593,0:20:20.159 the ability to estimate confidence intervals for individual estimates, 0:20:20.159,0:20:27.191 and its availability for all types of operating systems. 0:20:27.191,0:20:37.022 Disadvantages, however, include the time-consuming estimation process, especially for large datasets, 0:20:37.022,0:20:45.287 and the fact that for datasets with small numbers of individuals that were randomly selected from a large population, 0:20:45.287,0:20:50.320 the estimate is often an infinite effective population size. 0:20:50.320,0:20:55.718 The other programs used were GONE and CurrentNe. 0:20:55.718,0:21:06.383 These programs work only with genomic SNP data and are designed for macOS and Linux only. 0:21:06.383,0:21:13.815 In addition, the GONE program allows estimation of historical effective population size. 0:21:13.815,0:21:27.612 Unlike with NeEstimator, the estimate of current effective population size from GONE is not significantly affected by previous population admixture. 0:21:27.612,0:21:39.343 However, the effect of admixture is reflected in the estimation of the historical effective population size by this program. 0:21:39.343,0:21:46.408 In addition, CurrentNe accommodates closer kinship among individuals in the analysed datasets. 0:21:46.408,0:21:51.874 However, this benefit is not very applicable to most population studies, 0:21:51.874,0:21:59.039 because most of these studies include randomly selected individuals in the analysis, 0:21:59.039,0:22:04.472 chosen to have the lowest possible level of relationship. 0:22:04.472,0:22:15.369 The following three slides present individual estimates of the current effective population size using the three programs tested.
0:22:15.369,0:22:23.368 The results are split by dataset, starting with the larger datasets, represented mostly by sheep breeds, 0:22:23.368,0:22:27.867 followed by the smaller datasets, represented by goat breeds. 0:22:27.867,0:22:36.365 Finally, populations of crossbred and synthetic sheep breeds and the Texel breed, 0:22:36.365,0:22:47.296 whose dataset contains only 5 individuals, are presented and used to test the ability of the programs to handle small datasets. 0:22:47.296,0:22:59.261 This first table shows that, although the NeEstimator program provided lower estimates of effective population size for the larger datasets 0:22:59.261,0:23:09.925 than the other two programs, the differences are not significant based on the estimated confidence intervals. 0:23:09.925,0:23:16.724 The other two programs produced very similar estimates in most cases. 0:23:16.724,0:23:28.522 For the smaller datasets, represented here by goat breeds, the differences between programs are more pronounced. 0:23:28.522,0:23:38.753 Some estimates even differed significantly, with no overlap between their confidence intervals. 0:23:38.753,0:23:47.784 From the results of the estimation of the effective population size for the crossbred and synthetic populations, 0:23:47.784,0:23:51.384 which include the Slovak dairy sheep, 0:23:51.384,0:24:01.582 it is clear that the NeEstimator estimates for the current population are affected by admixture. 0:24:01.582,0:24:09.847 Also, for a population with a very small number of individuals, here represented by the Texel breed, 0:24:09.847,0:24:13.646 the NeEstimator estimates are not valid. 0:24:13.646,0:24:18.745 For the other two programs, the estimates are acceptable, 0:24:18.745,0:24:26.077 although with a very wide confidence interval, especially for CurrentNe.
0:24:26.077,0:24:36.508 In terms of estimating the historical population size and the effect of admixture on these estimates, 0:24:36.508,0:24:47.706 the populations were again divided according to the level of expected admixture into breeds with expected low admixture, 0:24:47.706,0:24:51.138 the results of which are shown on the following slide, 0:24:51.138,0:24:59.537 and breeds expected to have high admixture, the results of which are shown on a later slide. 0:24:59.537,0:25:08.268 The results show a clear decrease in effective population size over the last 40 generations. 0:25:08.268,0:25:14.534 The most pronounced decrease occurred approximately 6 generations ago. 0:25:14.534,0:25:29.850 Considering a generation interval of 4-5 years for the breeds studied, this takes us to the close of the 1990s. 0:25:29.850,0:25:37.463 This was a very turbulent period for small ruminant breeding in the study area, 0:25:37.463,0:25:45.994 during which there was a significant reduction in the number of breeds bred. 0:25:45.994,0:25:58.825 However, for breeds that were expected to have a higher admixture rate, here represented for example by crossbreds or synthetic breeds, 0:25:58.825,0:26:02.191 very different results were obtained. 0:26:02.191,0:26:12.456 The slide shows that the presence of admixture in the analysed populations can be inferred from these results. 0:26:12.456,0:26:24.587 A high degree of admixture is manifested in very different patterns in the historical estimates of effective population size, 0:26:24.587,0:26:26.586 namely a sharp, steep increase in effective population size, 0:26:26.586,0:26:43.949 which is unrealistic for a typical population and is clearly visible in this slide. 0:26:43.949,0:26:54.614 Similar trends in effective population size were estimated for the analysed goat breeds as for the analysed sheep breeds.
0:26:54.614,0:27:01.079 The slide shows two distinct occurrences of genetic drift: 0:27:01.079,0:27:10.977 the first 8 to 9 generations ago and the second 23 to 26 generations ago. 0:27:10.977,0:27:25.808 Again, as in the sheep breeds, these occurrences of genetic drift are consistent with the historical development of the breeds. 0:27:25.808,0:27:38.272 In conclusion, the analysis showed that the tested computer programs are applicable for estimating effective population size based on genomic data. 0:27:38.272,0:27:48.603 The different programs have certain advantages and disadvantages that have been presented in this case study. 0:27:48.603,0:27:52.403 From the comparison of the different computer programs, 0:27:52.403,0:28:02.401 the GONE program emerges as the most suitable and universal program for estimating effective population size. 0:28:05.833,0:28:13.532 In this presentation we focused on options and procedures for estimating effective population size 0:28:13.532,0:28:15.965 using genomic data. 0:28:15.965,0:28:19.197 Thank you for your attention.