0:00:00.000,0:00:06.499
Welcome to another video focusing on methodological approaches
0:00:06.499,0:00:12.198
to estimating effective population size using genomic data.
0:00:16.431,0:00:23.396
The effective population size is one of the most important parameters in conservation genetics.
0:00:23.396,0:00:32.661
The effective population size of a real population X is defined as the size of a hypothetical ideal population
0:00:32.661,0:00:40.959
that exhibits the same genetic drift as the real population under study.
0:00:40.959,0:00:49.324
The ideal population, as defined by Ronald Fisher and Sewall Wright,
0:00:49.324,0:00:55.556
is a population that is defined by the following parameters:
0:00:55.556,0:01:02.788
non-overlapping generations, diploid individuals that reproduce sexually,
0:01:02.788,0:01:09.320
with a sex ratio of 1:1 and random mating.
0:01:09.320,0:01:20.818
In addition, in an ideal population defined in this way, migration, mutation and selection do not occur.
0:01:20.818,0:01:27.550
The population size is assumed to be constant across generations.
0:01:27.550,0:01:36.581
In an ideal infinitely large population, there is no change in allele frequency,
0:01:36.581,0:01:43.913
no change in the inbreeding coefficient, and no random loss of alleles over time.
0:01:43.913,0:01:50.079
The alleles of such an ideal population are in linkage equilibrium.
0:01:50.079,0:01:58.977
On the other hand, in an ideal population of finite size, genetic drift does occur,
0:01:58.977,0:02:08.075
which changes the frequencies of individual alleles as well as the value of the inbreeding coefficient.
0:02:08.075,0:02:17.373
Furthermore, genetic drift in the finite ideal population causes loss or fixation of alleles
0:02:17.373,0:02:24.339
and the occurrence of linkage disequilibrium between alleles of the population.
0:02:24.339,0:02:34.437
Since the effective population size is one of the most important population parameters,
0:02:34.437,0:02:39.002
there are a number of methodological approaches for estimating this parameter.
0:02:39.002,0:02:44.135
This slide shows some of the most important approaches.
0:02:44.135,0:02:51.367
The first estimates the effective population size from population variability,
0:02:51.367,0:02:57.499
evaluated as changes in allele frequencies in the population;
0:02:57.499,0:03:03.497
another approach is based on the change in the level of the inbreeding coefficient,
0:03:03.497,0:03:15.662
and last but not least, the approach that evaluates the level of change in the linkage disequilibrium in the population should be mentioned.
0:03:15.662,0:03:23.394
Different estimates of effective population size are based on different methodological approaches
0:03:23.394,0:03:36.724
and therefore the estimates of effective population size based on these different approaches may not always be the same.
0:03:36.724,0:03:47.689
When working with genomic data, the most widely used approach to estimate the effective population size
0:03:47.689,0:03:52.188
is based on determining the level of linkage disequilibrium in the population.
0:03:52.188,0:04:01.853
However, this approach is based on a different definition of effective population size than the other approaches mentioned.
0:04:01.853,0:04:10.284
Here, the effective population size of a real population X with an observed level of linkage disequilibrium
0:04:10.284,0:04:17.983
for a given genomic segment corresponds to the size of a hypothetical idealized population
0:04:17.983,0:04:29.780
that shows the same level of linkage disequilibrium over the same genomic distance interval as observed in the real finite population.
0:04:29.780,0:04:36.312
Note that the definition of effective population size based on linkage disequilibrium
0:04:36.312,0:04:44.444
differs from the other definitions of effective population size mentioned above.
0:04:46.810,0:04:57.775
As already mentioned, in an ideal infinite population that has reached an equilibrium state, all loci are in linkage equilibrium.
0:04:57.775,0:05:06.740
However, in an ideal population with a finite number of individuals that has reached an equilibrium state,
0:05:06.740,0:05:19.471
the loci are in linkage disequilibrium, and the linkage disequilibrium is a function of the population size and the genetic distance between loci.
0:05:19.471,0:05:24.370
Based on the distances between loci,
0:05:24.370,0:05:32.001
it is possible to estimate not only the effective size of the current population,
0:05:32.001,0:05:41.466
but also the effective size of the population in the previous generations.
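As a rough illustration of how linkage disequilibrium depends on population size and genetic distance, the sketch below uses a commonly cited approximation, E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination fraction between two loci, together with the rule of thumb that loci separated by c reflect the population roughly 1 / (2c) generations ago. Both formulas and all function names are assumptions introduced here for illustration; they are not stated in the video and are not necessarily what any of the programs discussed actually implement.

```python
def expected_r2(ne: float, c: float) -> float:
    """Approximate expected LD (r^2) for effective size ne and recombination fraction c.

    Assumed approximation: E[r^2] = 1 / (1 + 4 * Ne * c).
    """
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_r2(r2: float, c: float) -> float:
    """Invert the approximation above to get Ne from an observed mean r^2."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)


def generations_ago(c: float) -> float:
    """Rule of thumb: loci at recombination fraction c inform about ~1/(2c) generations ago."""
    if c <= 0.0:
        raise ValueError("c must be positive")
    return 1.0 / (2.0 * c)


if __name__ == "__main__":
    # Closely linked loci (small c) look further into the past than loosely linked ones.
    for c in (0.001, 0.01, 0.1):
        print(c, round(expected_r2(100, c), 3), generations_ago(c))
```

Under these assumptions, tightly linked marker pairs carry information about older generations and loosely linked pairs about recent ones, which is why a single sample can inform both current and historical effective population size.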
0:05:41.466,0:05:54.964
This slide shows the most common computer programs used to estimate effective population size based on both pedigree and molecular genetic data.
0:05:54.964,0:06:02.529
The group of programs using molecular genetic data can be divided into two groups,
0:06:02.529,0:06:09.461
the first being programs that estimate effective population size based on linkage disequilibrium.
0:06:09.461,0:06:23.425
This group includes, for example, the very popular program NeEstimator, which works not only with genomic data but also with microsatellites,
0:06:23.425,0:06:29.290
and the recently published programs GONE and CurrentNe.
0:06:29.290,0:06:41.888
The SNeP program also belongs to this group, but is mainly suitable for estimating historical effective population size.
0:06:41.888,0:06:55.785
The other group includes NeIBD and HapNe, which use an approach based on estimating the coefficient of inbreeding.
0:06:55.785,0:07:01.584
These two programs are primarily designed for the analysis of human data
0:07:01.584,0:07:09.549
and are not well suited for estimating the effective population size of livestock.
0:07:13.448,0:07:26.612
We used genomic data from 215 Old Kladruber horses to compare the individual programs for estimating effective population size.
0:07:26.612,0:07:33.477
This population, which is classified among the genetic resources of the Czech Republic,
0:07:33.477,0:07:44.175
has very accurate pedigree records that also accurately capture the historical development of the breed,
0:07:44.175,0:07:56.106
which allows us to verify the suitability of individual programs and the accuracy of estimating the effective population size.
0:07:56.106,0:08:08.470
The pedigree records of this breed date back to 1773 and include 49 generations of ancestors
0:08:08.470,0:08:15.136
with 16 equivalent complete generations.
0:08:15.136,0:08:27.367
In addition, the molecular genetic data were filtered for the analysis using looser or stricter quality control,
0:08:27.367,0:08:41.730
namely quality control 1 (Q1), in which approximately 39,000 SNPs were included in the analysis,
0:08:41.730,0:08:50.929
and quality control 2 (Q2), which included approximately 61,000 SNPs.
0:08:50.929,0:08:59.227
In addition, the effective population size was estimated based on pedigree data for comparison.
0:09:01.627,0:09:10.258
The results of the comparison of the different computer programs show that when using a larger number of SNPs,
0:09:10.258,0:09:16.890
most of the programs used showed very similar values;
0:09:16.890,0:09:32.720
the only exception was the SNeP program, which showed significantly lower estimates of effective population size with both the larger and the smaller number of SNPs,
0:09:32.720,0:09:51.350
that is, under both quality control 1 and quality control 2. NeEstimator also showed lower estimates when using the data with lower SNP density (quality control 1).
0:09:51.350,0:10:00.048
Next, the estimation of the historical effective population size was verified using the SNeP and GONE programs.
0:10:00.048,0:10:09.780
Looking at the results, it is clear that the estimate of the historical effective population size by the GONE program
0:10:09.780,0:10:28.743
very realistically reproduces the historical development of the breed under study and reveals the genetic drift that occurred in most horse populations in Europe after the First World War,
0:10:28.743,0:10:51.138
the collapse of the Habsburg Monarchy and the development of mechanization in agriculture. The SNeP program significantly underestimates the effective population size in all generations studied.
0:10:53.438,0:10:59.936
Based on the conclusions reached, the GONE program shows the highest suitability
0:10:59.936,0:11:09.335
for estimating the effective population size of livestock, companion animals and wild animals.
0:11:09.335,0:11:20.766
The following slides show the parameter settings and how to run the GONE program to estimate the current and historical effective population size.
0:11:20.766,0:11:27.964
The program is freely available on the following github.com web page.
0:11:27.964,0:11:36.196
The methodological approach, testing of the program and verification of the accuracy of the program
0:11:36.196,0:11:41.961
on simulated data are presented in this scientific publication.
0:11:43.861,0:11:47.427
After downloading we will get the following files.
0:11:47.427,0:11:56.725
Among these files are sample data files, marked with the name "example".
0:11:56.725,0:12:02.391
There is also a parameter file, "INPUT_PARAMETERS_FILE",
0:12:02.391,0:12:11.922
and the most important file that allows us to run the program - "script_GONE.sh".
0:12:11.922,0:12:22.087
The software GONE supports input data in PLINK format, i.e. files with .ped and .map extensions.
0:12:22.087,0:12:34.684
Genotypes must be biallelic and can be coded either as 1 and 2 or as the nucleotide bases A, G, C and T.
0:12:34.684,0:12:48.448
The GONE program limits the number of SNPs: it accepts at most 100,000 SNPs per chromosome
0:12:48.448,0:12:54.180
and a maximum of 1 million SNPs per whole genome.
0:12:54.180,0:13:07.478
In terms of the number of individuals, the dataset must contain between 2 and 1,800 individuals.
0:13:07.478,0:13:21.475
However, the recommended minimum number of individuals for accurate estimation is 20 to 24.
0:13:21.475,0:13:26.274
There are no other restrictions on the datasets.
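The limits just described can be checked before running the program. The following is a minimal sketch, assuming a standard four-column PLINK .map file (chromosome, SNP identifier, genetic distance, base-pair position); the function name and the message wording are hypothetical and not part of GONE itself.

```python
from collections import Counter

# Limits as stated in the presentation.
MAX_SNPS_PER_CHROMOSOME = 100_000
MAX_SNPS_TOTAL = 1_000_000
MIN_INDIVIDUALS = 2
MAX_INDIVIDUALS = 1_800
RECOMMENDED_MIN_INDIVIDUALS = 20


def check_gone_limits(map_lines, n_individuals):
    """Return a list of messages describing violated limits (empty if all is fine).

    map_lines: iterable of lines from a PLINK .map file.
    n_individuals: number of individuals (rows) in the matching .ped file.
    """
    # The first whitespace-separated field of each .map line is the chromosome.
    per_chromosome = Counter(
        line.split()[0] for line in map_lines if line.strip()
    )
    messages = []
    for chromosome, count in sorted(per_chromosome.items()):
        if count > MAX_SNPS_PER_CHROMOSOME:
            messages.append(
                f"chromosome {chromosome}: {count} SNPs exceeds {MAX_SNPS_PER_CHROMOSOME}"
            )
    total = sum(per_chromosome.values())
    if total > MAX_SNPS_TOTAL:
        messages.append(f"genome: {total} SNPs exceeds {MAX_SNPS_TOTAL}")
    if not MIN_INDIVIDUALS <= n_individuals <= MAX_INDIVIDUALS:
        messages.append(
            f"individuals: {n_individuals} is not between "
            f"{MIN_INDIVIDUALS} and {MAX_INDIVIDUALS}"
        )
    elif n_individuals < RECOMMENDED_MIN_INDIVIDUALS:
        messages.append(
            f"warning: {n_individuals} individuals is below the recommended "
            f"minimum of {RECOMMENDED_MIN_INDIVIDUALS}"
        )
    return messages


if __name__ == "__main__":
    example_map = ["1 rs1 0 100", "1 rs2 0 200", "2 rs3 0 50"]
    print(check_gone_limits(example_map, 215))
```

In practice the lines would come from reading the .map file, and the number of individuals from counting the lines of the .ped file.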
0:13:26.274,0:13:32.873
This slide shows the parameter file with the default settings.
0:13:32.873,0:13:40.338
Highlighted are the most important parameters that should be verified before analysis.
0:13:40.338,0:13:49.436
The first parameter specifies whether the data are phased or not.
0:13:49.436,0:14:00.901
For analyses, it is recommended to use phased data, which provides more accurate estimates of the effective population size.
0:14:00.901,0:14:13.631
However, it is useful to consider the precision of phasing for files with small numbers of individuals.
0:14:13.631,0:14:25.596
In files with small numbers of individuals, phasing may result in higher 'noise' that affects the final estimates of effective population size.
0:14:25.596,0:14:30.561
Another parameter is the recombination rate.
0:14:30.561,0:14:45.625
For more accurate estimates, it is preferable to supply the actual recombination rate of the given breed or population.
0:14:45.625,0:14:57.556
However, past analyses suggest that the default setting provides sufficiently accurate estimates of effective population size.
0:14:57.556,0:15:02.522
The last parameter represents the minor allele frequency setting.
0:15:02.522,0:15:15.386
In other population analyses or other computer programs, a minor allele frequency setting of more than 0.05 is recommended.
0:15:15.386,0:15:22.384
However, in this program, when alleles are removed by a minor allele frequency filter,
0:15:22.384,0:15:32.582
information about linkage disequilibrium is lost and the estimates of effective population size are not as accurate.
0:15:32.582,0:15:36.982
The GONE program is run from the command line.
0:15:36.982,0:15:43.980
The program is started by typing the command shown, where the last term "example"
0:15:43.980,0:15:52.712
is the name of the genotype data files, given without the file extension.
0:15:55.211,0:16:01.343
The following procedures should be used to estimate the effective population size.
0:16:01.343,0:16:13.841
The program uses a maximum number of SNPs per chromosome of 100,000 with a default setting of 50,000 SNPs.
0:16:13.841,0:16:20.506
At lower SNP counts, all SNPs are included in the analysis.
0:16:20.506,0:16:29.038
If more than 50,000 SNPs are present in the dataset, SNPs are selected randomly.
0:16:29.038,0:16:39.969
This can be used to estimate confidence intervals when the program is run repeatedly with random selection of SNPs.
0:16:39.969,0:16:51.000
However, it should be emphasized that the analyses should be run sequentially, because random SNP selection
0:16:51.000,0:17:01.232
relies on an initial "seed" generated from the computer clock time, so runs started at the same moment could draw the same SNPs.
0:17:01.232,0:17:13.829
Here we show the confidence interval estimated from 100 runs of the program, each using a different random sample of SNPs.
0:17:13.829,0:17:25.060
Estimates of the effective population size are recommended to be taken as the median of these 100 runs - marked in blue.
0:17:25.060,0:17:32.092
The mean is also shown, marked in black.
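The summary over repeated runs can be sketched as follows. This is only an illustration under stated assumptions: the per-run estimates are assumed to have already been read into plain Python lists (one Ne value per generation, index 0 being the most recent generation), the 2.5th and 97.5th percentiles are used as one possible empirical interval, and the function name is hypothetical. Reading the program's output files is deliberately left out.

```python
from statistics import mean, median, quantiles


def summarise_runs(runs):
    """Summarise Ne estimates from repeated runs, generation by generation.

    runs: list of runs; each run is a list with one Ne estimate per generation.
    Returns one dict per generation with the median, mean and an empirical
    95% interval (2.5th and 97.5th percentiles) across runs.
    """
    if len(runs) < 2:
        raise ValueError("at least two runs are needed")
    n_generations = len(runs[0])
    if any(len(run) != n_generations for run in runs):
        raise ValueError("all runs must cover the same number of generations")

    summary = []
    for index in range(n_generations):
        values = [run[index] for run in runs]
        # n=40 yields cut points every 2.5%; the first and last bound 95%.
        cuts = quantiles(values, n=40, method="inclusive")
        summary.append(
            {
                "generation": index + 1,
                "median": median(values),
                "mean": mean(values),
                "lower": cuts[0],
                "upper": cuts[-1],
            }
        )
    return summary


if __name__ == "__main__":
    # Made-up numbers, used only to show the call.
    toy_runs = [[100.0, 200.0], [110.0, 190.0], [90.0, 210.0]]
    for row in summarise_runs(toy_runs):
        print(row)
```

The median is reported alongside the mean because, as recommended above, the median of the runs is taken as the estimate.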
0:17:32.092,0:17:42.590
Another recommendation is to adjust the number of generations for which the effective population size is estimated.
0:17:42.590,0:17:53.288
The default setting is up to 2000 generations, but for medium-density SNP chips,
0:17:53.288,0:18:08.485
that is, chips with around 50,000 SNPs, it is recommended to consider at most 100 generations into the past to keep the estimates accurate.
0:18:08.485,0:18:17.117
For example, for horses where the generation interval is equal to 10 years,
0:18:17.117,0:18:25.582
the mentioned 100 generations represent an interval of 1000 years into the past.
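The conversion from generations to years is simple multiplication by the generation interval. A minimal sketch, with a hypothetical function name:

```python
def years_into_past(generations: float, generation_interval_years: float) -> float:
    """Convert a number of generations into years, given the generation interval."""
    if generations < 0 or generation_interval_years <= 0:
        raise ValueError("generations must be >= 0 and the interval positive")
    return generations * generation_interval_years


if __name__ == "__main__":
    # Horses: 100 generations at a 10-year generation interval.
    print(years_into_past(100, 10))
```

The same arithmetic applies to any species once its generation interval is known.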
0:18:25.582,0:18:35.879
This is a long time span, since improved breeds began to take shape only about 200 years ago.
0:18:35.879,0:18:46.244
This slide again shows an estimate of the effective population size with a prediction 100 years into the past.
0:18:48.344,0:18:59.741
Next, we focused on a case study: estimating the current and historical effective population size of sheep and goats in the Czech and Slovak Republics.
0:18:59.741,0:19:09.273
As already mentioned, in this case study we will look at the estimation of the current effective population size
0:19:09.273,0:19:19.371
and the historical effective population size of selected breeds of sheep and goats reared in the Czech and Slovak Republics.
0:19:19.371,0:19:32.068
Different types of breeds are included in the analysis of current and historical effective population size in order to verify the effect of admixture of breeds,
0:19:32.068,0:19:44.466
namely autochthonous breeds as well as international breeds and hybrids, which are marked in grey and blue on the slide.
0:19:44.466,0:19:52.798
Based on a previous study, three computer programs were used to estimate effective population size,
0:19:52.798,0:20:04.495
namely NeEstimator v.2, which is one of the most commonly used programs for estimating effective population size in the last decade.
0:20:04.495,0:20:14.593
Its advantages include the ability to estimate effective population size from both microsatellite data and genomic SNP data,
0:20:14.593,0:20:20.159
the ability to estimate confidence intervals for individual estimates,
0:20:20.159,0:20:27.191
and the fact that the program can be used on all types of computer systems.
0:20:27.191,0:20:37.022
Disadvantages, however, include the time-consuming estimation process, especially for large datasets,
0:20:37.022,0:20:45.287
and the fact that for datasets with small numbers of individuals randomly selected from a large population,
0:20:45.287,0:20:50.320
the estimate of effective population size is often infinite.
0:20:50.320,0:20:55.718
Other programs used were GONE and CurrentNe.
0:20:55.718,0:21:06.383
These programs work only with genomic SNP data and are designed for macOS and Linux only.
0:21:06.383,0:21:13.815
In addition, the GONE program allows estimation of historical effective population size.
0:21:13.815,0:21:27.612
Unlike NeEstimator, the GONE estimate of current effective population size is not significantly affected by previous population admixture.
0:21:27.612,0:21:39.343
However, the effect of admixture is reflected in the estimation of the historical effective population size for this program.
0:21:39.343,0:21:46.408
In addition, CurrentNe can accommodate closely related individuals in the analysed datasets.
0:21:46.408,0:21:51.874
However, this benefit is not very applicable to most population studies,
0:21:51.874,0:21:59.039
because most of these studies include randomly selected individuals in the analysis,
0:21:59.039,0:22:04.472
chosen so that their relationship is as low as possible.
0:22:04.472,0:22:15.369
The following three slides present individual estimates of the current effective population size using the three programs tested.
0:22:15.369,0:22:23.368
The results are split by dataset, with the larger datasets represented mostly by sheep breeds.
0:22:23.368,0:22:27.867
Next come the smaller datasets, represented by goat breeds.
0:22:27.867,0:22:36.365
Finally, populations of crossbred and synthetic sheep breeds and the Texel breed,
0:22:36.365,0:22:47.296
whose dataset contains only 5 individuals, are presented and used to test the ability of the programs to handle small files.
0:22:47.296,0:22:59.261
This first table shows that, although the NeEstimator program provided lower estimates of effective population size for larger files
0:22:59.261,0:23:09.925
than the other two programs, the differences are not significant based on the estimated confidence intervals.
0:23:09.925,0:23:16.724
The other two programs produced very similar estimates in most cases.
0:23:16.724,0:23:28.522
For the smaller datasets, represented here by goat breeds, the differences between programs are more pronounced.
0:23:28.522,0:23:38.753
Some estimates differed significantly, with confidence intervals that did not overlap.
0:23:38.753,0:23:47.784
From the results of the estimation of the effective population size for the crossbred and synthetic populations,
0:23:47.784,0:23:51.384
which include the Slovak dairy sheep,
0:23:51.384,0:24:01.582
it is clear that the NeEstimator estimates for the current population are affected by the occurrence of admixture of genes.
0:24:01.582,0:24:09.847
Also, for a population with a very small number of individuals, here represented by the Texel breed,
0:24:09.847,0:24:13.646
the NeEstimator estimates are not valid.
0:24:13.646,0:24:18.745
For the other two programs, the estimates are acceptable,
0:24:18.745,0:24:26.077
although with a very wide confidence interval, especially for CurrentNe.
0:24:26.077,0:24:36.508
In terms of estimating the historical population size and the effect of admixture on these estimations,
0:24:36.508,0:24:47.706
the populations were again divided according to the level of expected admixture into groups of breeds with expected low admixture,
0:24:47.706,0:24:51.138
the results of which are shown on the following slide,
0:24:51.138,0:24:59.537
and breeds expected to have high admixture, the results of which are shown afterwards.
0:24:59.537,0:25:08.268
The results show a clear decrease in effective population size over the 40 generations.
0:25:08.268,0:25:14.534
The sharpest decrease occurred approximately 6 generations ago.
0:25:14.534,0:25:29.850
With a generation interval of 4 to 5 years for the breeds studied, this takes us back to the late 1990s.
0:25:29.850,0:25:37.463
This was a very turbulent period for small ruminant breeding in the study area,
0:25:37.463,0:25:45.994
during which there was a significant reduction in the number of breeds bred.
0:25:45.994,0:25:58.825
However, for breeds that were expected to have a higher admixture rate, here represented for example by crossbreeds or synthetic breeds,
0:25:58.825,0:26:02.191
very different results were obtained.
0:26:02.191,0:26:12.456
The slide shows that the presence of admixture in the analysed populations can be inferred from these results.
0:26:12.456,0:26:24.587
A high degree of admixture produces very different patterns in the historical estimates of effective population size.
0:26:24.587,0:26:26.586
In particular, it appears as a sharp increase in effective population size,
0:26:26.586,0:26:43.949
which is unrealistic for a typical population and is clearly visible on this slide.
0:26:43.949,0:26:54.614
Similar trends in effective population size were estimated for the analysed goat breeds as for the analysed sheep breeds.
0:26:54.614,0:27:01.079
The slide shows two distinct occurrences of genetic drift.
0:27:01.079,0:27:10.977
The first occurred 8 to 9 generations ago and the second 23 to 26 generations ago.
0:27:10.977,0:27:25.808
Again, as in the sheep breeds, these occurrences of genetic drift are consistent with the historical development of the breeds.
0:27:25.808,0:27:38.272
In conclusion, the analysis showed that the tested computer programs are applicable for estimating effective population size based on genomic data.
0:27:38.272,0:27:48.603
The different programs have certain advantages and disadvantages that have been presented in this case study.
0:27:48.603,0:27:52.403
From the comparison of the different computer programs,
0:27:52.403,0:28:02.401
the GONE program emerges as the most suitable and universal program for estimating effective population size.
0:28:05.833,0:28:13.532
In this presentation we focused on options and procedures for estimation of effective population size
0:28:13.532,0:28:15.965
using genomic data.
0:28:15.965,0:28:19.197
Thank you for your attention.