1
00:00:01,160 --> 00:00:01,456
Hello

2
00:00:01,478 --> 00:00:05,806
my name is Nina Moravčíková, and today I would like to present to you a

3
00:00:05,838 --> 00:00:09,086
lecture with the title "Analysis of the

4
00:00:09,238 --> 00:00:23,650
animal genetic resources biodiversity status using genomic data", which is part of Module 2: "Conservation and sustainable use of animal genetic resources" within the ISAGREED project.

5
00:00:24,190 --> 00:00:37,028
Even though this lecture is intended for students at the second level of education, it may also be beneficial in teaching both first and third level students.

6
00:00:37,028 --> 00:00:47,510
In the context of the evaluation of genomic data, it is first necessary to briefly describe how such information can be obtained.

7
00:00:47,810 --> 00:00:58,452
To analyze the genome, we can use a variety of tools, including single genetic markers, genotyping chips, or whole genome sequencing.

8
00:00:58,626 --> 00:01:11,860
The difference between these methods is related both to the laboratory procedure used and to the amount of genomic data that we obtain with them.

9
00:01:13,200 --> 00:01:14,912
What can we understand

10
00:01:15,056 --> 00:01:27,940
by a genetic marker? It is any characteristic trait or manifestation of an organism that can be used to identify a specific chromosome, cell, or individual.

11
00:01:28,450 --> 00:01:39,986
The term genetic marker may refer to a gene, short segment of DNA, or other manifestations of genotype, chromosome, or karyotype.

12
00:01:40,138 --> 00:01:52,112
However, it is important to remember that a genetic marker is usually a polymorphic variant that shows Mendelian inheritance and is correlated with variation

13
00:01:52,112 --> 00:01:58,910
in a phenotypic trait that is of importance, for example, from a breeding point of view.

14
00:01:59,950 --> 00:02:17,050
In terms of livestock production traits, two kinds of markers are monitored: candidate genes, because their alleles and genotypes influence the formation of quantitative traits, and quantitative trait loci.

15
00:02:18,430 --> 00:02:32,950
The advantage of DNA markers is mainly that they are directly detectable in the nucleotide sequence and show a high level of polymorphism and dominant or codominant inheritance.

16
00:02:33,330 --> 00:02:44,430
DNA markers are relatively common in the genome and can be tested relatively easily and rapidly with a high degree of repeatability.

17
00:02:45,890 --> 00:02:53,950
Nowadays, the most commonly used genetic markers are single nucleotide polymorphisms, called SNPs.

18
00:02:54,610 --> 00:03:04,570
SNPs are usually generated by a point mutation, for example, a single nucleotide substitution in the DNA at a particular site.

19
00:03:04,990 --> 00:03:14,530
Compared to other types of genetic markers, they occur frequently in the genome, every 100 to 300 base pairs.

20
00:03:14,870 --> 00:03:26,376
Mutations that occur at a frequency of more than 1% in a given population, which means that the minor, or less frequent, allele is present

21
00:03:26,376 --> 00:03:35,580
in the genotype of at least 1% of individuals belonging to that population, are usually considered SNPs.
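A minimal sketch of this 1% criterion, assuming it is applied to the allele frequency and that genotypes are coded as the number of copies of one allele (0, 1, or 2) per individual. The function names and the example data are hypothetical:
```python
def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele at one biallelic locus."""
    n_alleles = 2 * len(genotypes)        # two alleles per diploid individual
    freq = sum(genotypes) / n_alleles     # frequency of the counted allele
    return min(freq, 1.0 - freq)          # the minor allele is the rarer one
def is_snp(genotypes, threshold=0.01):
    """Apply the 1% rule: keep the locus if the minor allele is frequent enough."""
    return minor_allele_frequency(genotypes) >= threshold
# Hypothetical population: 95 individuals with 0 copies, 4 with 1 copy, 1 with 2 copies.
example = [0] * 95 + [1] * 4 + [2] * 1
print(minor_allele_frequency(example), is_snp(example))
```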

22
00:03:36,280 --> 00:03:46,296
It is a biallelic marker, which means that within a population we recognize only two alleles for a SNP, namely dominant and recessive.

23
00:03:46,488 --> 00:03:52,400
The term dominant indicates that it is the predominant allele in individuals

24
00:04:04,750 --> 00:04:17,090
However, it is important to note that even if an allele is dominant in one population, it may not be dominant in another population with different genetic origin.

25
00:04:17,590 --> 00:04:26,370
SNP markers have a wide range of applications, from biodiversity evaluation to genomic selection.

26
00:04:27,990 --> 00:04:39,214
Whole genome sequencing is the term used to refer to the process of determining the exact order of nucleotides in a strand of DNA molecule

27
00:04:39,382 --> 00:04:43,158
that means determining its primary structure.

28
00:04:43,334 --> 00:04:53,630
The classical methods are the Maxam-Gilbert and Sanger methods, from which the currently used next generation sequencing (NGS) methods are derived.

29
00:04:54,010 --> 00:05:04,510
Within the NGS methods, there are several platforms which, even though they differ in their technological approach, yield comparable outputs.

30
00:05:06,090 --> 00:05:14,197
Even though the cost of whole genome sequencing has decreased significantly compared to the previous period,

31
00:05:14,197 --> 00:05:21,778
it is still high if we want to obtain whole population data, especially for high coverage sequencing.

32
00:05:21,954 --> 00:05:31,004
For this reason, SNP genotyping chips are now being used in population-wide studies, which make it possible to obtain information

33
00:05:31,004 --> 00:05:37,830
on a large number of SNP markers uniformly distributed across the genome at a lower cost.

34
00:05:38,530 --> 00:05:46,870
SNP chips allow genotyping from a few thousand up to 700,000 SNP markers.

35
00:05:47,210 --> 00:05:52,504
They are available for most livestock and companion animal species.

36
00:05:52,682 --> 00:06:02,647
The information obtained in this way can be used for a variety of purposes, including testing parentage, genomic diversity status,

37
00:06:02,647 --> 00:06:08,320
genome wide association studies, or estimation of genomic breeding values.

38
00:06:09,780 --> 00:06:14,916
In the following slides, we will discuss indicators that are used to estimate

39
00:06:14,948 --> 00:06:19,244
biodiversity status of animal genetic resources based

40
00:06:19,292 --> 00:06:21,440
on genomic data analysis.

41
00:06:21,620 --> 00:06:25,056
The first indicator is genome homozygosity and

42
00:06:25,088 --> 00:06:28,608
genomic inbreeding. In the context of genome

43
00:06:28,664 --> 00:06:32,296
homozygosity, you will often find two terms

44
00:06:32,368 --> 00:06:39,460
in the literature: autozygosity and runs of homozygosity, abbreviated as ROH.

45
00:06:40,200 --> 00:06:49,420
Basically, autozygosity reflects all alleles or chromosomal segments of DNA that are identical by descent

46
00:06:49,720 --> 00:06:52,660
that means coming from a common ancestor.

47
00:06:52,990 --> 00:06:56,478
Runs of homozygosity are considered to be

48
00:06:56,574 --> 00:06:59,470
all genomic regions with a specific number

49
00:06:59,510 --> 00:07:04,054
of consecutive homozygous genotypes or, when talking

50
00:07:04,102 --> 00:07:08,062
about SNP markers testing, all homozygous SNP

51
00:07:08,126 --> 00:07:19,730
markers. The distribution, number, and length of runs of homozygosity depend on various factors affecting the livestock genome.

52
00:07:19,910 --> 00:07:22,826
The most significant in this context can

53
00:07:22,858 --> 00:07:25,386
be considered to be artificial selection and

54
00:07:25,418 --> 00:07:27,230
the intensity of inbreeding.

55
00:07:27,570 --> 00:07:38,070
The length of the ROH segments in an individual's genome corresponds to how far back the common ancestors are in the individual's pedigree.

56
00:07:38,450 --> 00:07:46,546
If the parents of an individual have a common ancestor, their genome will share the same genetic variants in certain regions

57
00:07:46,618 --> 00:07:50,070
that means these regions in the parents will be identical by descent.

58
00:07:50,370 --> 00:07:53,298
If both parents transfer the same region

59
00:07:53,394 --> 00:07:57,474
to the offspring, then the offspring will be homozygous

60
00:07:57,562 --> 00:08:06,426
for these genetic variants, which creates an ROH region in the offspring's genome.

61
00:08:06,618 --> 00:08:18,070
This assumption is the basis of the approach for estimating the genomic inbreeding coefficient through the coverage of the genome by runs of homozygosity.

62
00:08:18,960 --> 00:08:31,167
However, information about the occurrence and length of ROH segments in the genome can be used not only to estimate genomic inbreeding, but also to test the impact of artificial

63
00:08:31,167 --> 00:08:41,580
selection on specific regions in the genome, or to identify causal variants involved in the control of preferred phenotypic traits and characteristics.

64
00:08:43,240 --> 00:08:56,209
In this slide, you can see in the first part the formula for estimating genomic inbreeding, referred to as Froh, where the numerator expresses the total length of homozygous

65
00:08:56,209 --> 00:09:06,480
segments in an individual's genome, and the denominator the total genome length derived from the physical position of the markers tested.

66
00:09:06,860 --> 00:09:21,331
Froh makes it possible to establish the trend of inbreeding, where segments longer than 4 megabases reflect autozygous regions derived from ancestors approximately 12 generations ago,

67
00:09:21,331 --> 00:09:32,066
segments longer than 8 megabases are derived from ancestors 6 generations ago, and segments longer than 16 megabases correspond

68
00:09:32,066 --> 00:09:39,420
to the proportion of autozygosity inherited from ancestors from the last 3 generations.
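A minimal sketch of the Froh calculation and of the length classes described above. It assumes ROH segments are already available as (start, end) positions in base pairs, and it uses the rule of thumb g = 100 / (2 x length in cM) with 1 cM taken as roughly 1 Mb, which is an assumption added here and not stated in the lecture. The segment list and genome length are hypothetical:
```python
def f_roh(roh_segments_bp, genome_length_bp, min_length_mb=4.0):
    """Total length of ROH longer than the threshold divided by the genome length covered by markers."""
    min_bp = min_length_mb * 1_000_000
    total = sum(end - start for start, end in roh_segments_bp if end - start > min_bp)
    return total / genome_length_bp
def generations_to_ancestor(length_mb):
    """Approximate age of a segment in generations, assuming 1 cM is about 1 Mb."""
    return 100.0 / (2.0 * length_mb)
# Hypothetical individual with three ROH segments of 5, 10 and 20 Mb.
segments = [(1_000_000, 6_000_000), (20_000_000, 30_000_000), (50_000_000, 70_000_000)]
genome = 2_500_000_000  # hypothetical autosomal length covered by the SNP chip, in bp
for threshold_mb in (4, 8, 16):
    print(threshold_mb, f_roh(segments, genome, threshold_mb), generations_to_ancestor(threshold_mb))
```
With these inputs the three thresholds give about 12.5, 6.25 and 3.1 generations, which matches the approximate values of 12, 6 and 3 generations mentioned above.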

69
00:09:40,240 --> 00:09:51,380
Similar to pedigree inbreeding, the genomic inbreeding values range from 0 to 1, or in percentage terms, from 0 to 100%.

70
00:09:52,130 --> 00:10:03,918
Information on the increase in inbreeding per generation and the overall inbreeding coefficient is important both in terms of the occurrence of inbreeding depression

71
00:10:03,918 --> 00:10:08,990
and, at the same time, the survival of the population in the long term.

72
00:10:09,490 --> 00:10:18,510
One of the reasons is that the accumulation of inbreeding across generations leads to a reduction in genetic diversity.

73
00:10:18,960 --> 00:10:30,340
It is generally accepted that the increase in inbreeding per generation should not exceed 1% in small populations and 4% in large populations.

74
00:10:30,680 --> 00:10:40,568
The most commonly used programs to estimate genomic inbreeding coefficients include detectRUNS, PLINK, or cgaTOH.

75
00:10:40,744 --> 00:10:55,240
In the figure you can see the results from a comparative analysis of the inbreeding coefficient in 15 cattle breeds based on ROH segments longer than 4 and 8 Mbp.

76
00:10:57,540 --> 00:11:06,061
Another indicator of biodiversity that we can estimate by testing genomic markers is the linkage disequilibrium

77
00:11:06,061 --> 00:11:13,200
between SNP markers in the genome and consequently the effective population size based on it.

78
00:11:13,580 --> 00:11:26,265
The term linkage disequilibrium essentially refers to a non-random relationship or association between alleles of different SNP markers in the genome of the evaluated

79
00:11:26,265 --> 00:11:33,830
population, which is likely to be due to selection, mating system, recombination, or genetic drift.

80
00:11:34,170 --> 00:11:35,066
As a result,

81
00:11:35,178 --> 00:11:44,390
this means that such genetic variants can produce specific combinations of genotypes in a population, also called haplotypes.

82
00:11:44,810 --> 00:11:57,595
Information on the level of linkage disequilibrium can be used to assess the evolutionary shaping of a population, to estimate effective population size, or, as in the case of ROH

83
00:11:57,595 --> 00:12:07,130
segments, to test the occurrence of specific genetic variants that have been strongly influenced by artificial or natural selection.

84
00:12:08,710 --> 00:12:16,810
The most commonly used formula for calculating linkage disequilibrium between SNP markers is shown in the slide.

85
00:12:17,240 --> 00:12:23,656
However, in addition to this formula proposed by Hill and Robertson, there are other

86
00:12:23,656 --> 00:12:35,648
modifications of it that take into account, for example, the mutation rate or the nature of the genetic markers tested that can be biallelic or multiallelic.

87
00:12:35,824 --> 00:12:51,360
Linkage disequilibrium values range from 0 to 1, with 0 indicating linkage equilibrium between markers and 1 indicating complete linkage disequilibrium.
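A minimal sketch of this measure, assuming the slide shows the usual r² statistic attributed to Hill and Robertson, computed from haplotype and allele frequencies at two biallelic SNPs. The function name and the example frequencies are hypothetical:
```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation between alleles at two biallelic loci.
    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # deviation from random association of the two alleles
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
print(ld_r2(0.25, 0.5, 0.5))  # alleles combine at random: linkage equilibrium
print(ld_r2(0.5, 0.5, 0.5))   # allele A always occurs with allele B: complete disequilibrium
```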

88
00:12:52,740 --> 00:13:01,920
Effective population size essentially reflects the number of individuals that are active in reproduction in a given population

89
00:13:02,540 --> 00:13:07,000
that means individuals that can produce offspring for the next generation.

90
00:13:07,700 --> 00:13:17,979
Estimation of this parameter in the case of genomic data is most often based on its relation to the degree of linkage disequilibrium in the genome,

91
00:13:17,979 --> 00:13:26,650
where it is possible to test not only the current effective population size but also the trend of its evolution in the past.

92
00:13:28,190 --> 00:13:37,310
The effective population size, abbreviated as Ne, can be determined using, for example, the formula proposed by Corbin et al.

93
00:13:37,390 --> 00:13:38,890
as shown in this slide.

94
00:13:39,390 --> 00:13:48,610
This formula takes into account the inheritance model, the physical distance between SNP markers or the intensity of mutations.

95
00:13:49,220 --> 00:13:55,527
In this case, the historical effective size is estimated as a function of time and the physical

96
00:13:55,527 --> 00:14:03,360
distance between the two markers, assuming a constant linear growth of Ne with the time expressed by past generations.
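A simplified sketch of this kind of estimate, assuming the relation Ne(t) = (1 / (4c)) x (1 / r² - alpha) with t = 1 / (2c), where c is the recombination distance in Morgans and alpha is a mutation correction. The formula on the slide may include further terms, such as a sample size correction or a mapping function. The conversion of 1 Mb to 0.01 Morgan and the r² values are assumptions made only for illustration:
```python
def ne_from_ld(mean_r2, distance_mb, alpha=1.0, morgans_per_mb=0.01):
    """Return (generations ago, Ne) for marker pairs at a given physical distance."""
    c = distance_mb * morgans_per_mb        # approximate recombination distance in Morgans
    generations_ago = 1.0 / (2.0 * c)       # more distant markers inform about the more recent past
    ne = (1.0 / (4.0 * c)) * (1.0 / mean_r2 - alpha)
    return generations_ago, ne
# Hypothetical mean r2 values for marker pairs 1 Mb and 0.1 Mb apart.
for distance_mb, mean_r2 in [(1.0, 0.20), (0.1, 0.40)]:
    print(distance_mb, ne_from_ld(mean_r2, distance_mb))
```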

97
00:14:04,020 --> 00:14:16,200
The figure on the right shows representative results of the analysis of effective population size trend in two cattle breeds: Slovak Spotted and Slovak Pinzgau.

98
00:14:16,510 --> 00:14:23,290
Similar to pedigree information, the effective population size can range from 0 to n.

99
00:14:23,830 --> 00:14:37,766
It is generally accepted that the effective population size should not be less than 50 individuals in the case of small populations or 100 individuals in the case of large populations.

100
00:14:37,958 --> 00:14:53,900
In terms of long-term sustainability, the effective population size should be at least 500 individuals. In the case of genomic data, programs such as SNeP or GONE can be used for its calculation.

101
00:14:55,280 --> 00:15:07,192
In animal genetic resources, indicators describing population structure at the intra- and interpopulation level are often evaluated.

102
00:15:07,256 --> 00:15:17,910
Genetic distances are most often analyzed as they reflect the degree of genetic differences between individuals, populations or species.

103
00:15:18,330 --> 00:15:27,750
The most commonly discussed in the literature are Nei's genetic distance, Wright's fixation index Fst, principal component analysis

104
00:15:27,750 --> 00:15:34,150
or methods quantifying the degree of genetic admixture and gene flow between populations.

105
00:15:35,370 --> 00:15:48,290
Nei's genetic distance theory assumes that if two populations show a low genetic distance, they are similar and, with a high degree of confidence, share common ancestors.

106
00:15:48,750 --> 00:15:59,370
For this reason, this indicator can also be considered as the molecular equivalent of the relatedness coefficient calculated on the basis of pedigree information.

107
00:16:00,350 --> 00:16:06,382
You can see the formula for calculating the standard Nei's genetic distance on the slide.

108
00:16:06,566 --> 00:16:11,382
The minimum value that the Nei's genetic distance can take is 0.

109
00:16:11,566 --> 00:16:20,402
This value means that individuals or populations have the same variants (alleles or genotypes) in the genome

110
00:16:20,586 --> 00:16:23,270
that means they are genetically identical.

111
00:16:23,690 --> 00:16:28,266
The maximum value that the Nei's genetic distance can take is 1.

112
00:16:28,458 --> 00:16:39,630
This value reflects the fact that due to completely different genetic variants, individuals or populations are genetically different and we can say, unrelated.

113
00:16:40,130 --> 00:16:50,168
To calculate Nei's genetic distances, we can use the R package StAMPP or other programs. Compared to Nei's genetic distance,

114
00:16:50,264 --> 00:16:57,300
Wright's fixation index Fst only allows the level of diversity to be estimated at the population level.

115
00:16:57,680 --> 00:17:02,480
This index is essentially an indicator of the intensity of population

116
00:17:02,600 --> 00:17:10,336
fragmentation, expressed as a decrease in heterozygosity in subpopulations due to the effect of genetic drift.

117
00:17:10,528 --> 00:17:21,884
Hence, to calculate this index, we need to have information about the expected heterozygosity within the metapopulation and the average heterozygosity within subpopulations

118
00:17:21,972 --> 00:17:33,804
as you can see in the formula on the slide. Wright's fixation index Fst takes values from 0 to 1 and the interpretation of the values is similar to that of Nei's

119
00:17:33,852 --> 00:17:35,040
genetic distances.

120
00:17:35,740 --> 00:17:42,760
If the value of the index is equal to 0, the populations are genetically identical, and conversely,

121
00:17:43,060 --> 00:17:48,000
if the value is equal to 1, the populations are genetically distinct.

122
00:17:48,400 --> 00:17:55,580
In real livestock populations, the value of this index usually ranges from 0 to 0.5,

123
00:17:55,920 --> 00:18:09,220
of course, if we are testing a single species. Populations with an Fst value higher than 0.25 are considered to be genetically differentiated.
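A minimal sketch of the Fst calculation for one biallelic SNP, assuming the common form Fst = (HT - HS) / HT, where HT is the expected heterozygosity in the metapopulation and HS the average expected heterozygosity within subpopulations. Equal subpopulation sizes are assumed, and the function names and allele frequencies are hypothetical:
```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)
def fst(subpop_freqs):
    """Fst from the frequency of one allele in each subpopulation (equal sizes assumed)."""
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)
    p_total = sum(subpop_freqs) / len(subpop_freqs)  # allele frequency in the metapopulation
    h_t = expected_heterozygosity(p_total)
    return (h_t - h_s) / h_t
print(fst([0.5, 0.5]))  # same allele frequencies: no differentiation
print(fst([1.0, 0.0]))  # each subpopulation fixed for a different allele
print(fst([0.7, 0.3]))  # intermediate case
```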

124
00:18:10,280 --> 00:18:24,040
Other commonly used approaches to evaluate population structure and genetic relationships between populations include principal component analysis and Bayesian analysis of genetic admixture.

125
00:18:24,460 --> 00:18:36,100
Principal component analysis is a popular multivariate statistical method that has found applications in various scientific fields, including population genetics.

126
00:18:36,260 --> 00:18:48,250
Simply said, this analysis is used to represent high-dimensional data, for example, genomic information about individuals or populations, in fewer dimensions.

127
00:18:48,790 --> 00:18:54,490
Bayesian statistics is a method that is used in other scientific disciplines as well.

128
00:18:54,910 --> 00:19:08,010
This approach operates with conditional probability and allows the probability of the initial hypothesis to be refined step by step as other relevant facts appear.

129
00:19:08,750 --> 00:19:17,504
This slide shows representative results of testing the proportion of genetic admixture, principal component analysis and gene flow.

130
00:19:17,702 --> 00:19:30,724
In the case of the first figure, this is a Bayesian analysis of admixture within 15 cattle breeds, with the proportion of admixture within breeds represented by different colors of the lines.

131
00:19:30,892 --> 00:19:45,440
The second figure on the left shows representative results of the principal component analysis, with the degree of admixture being best seen in part D through the overlapping peaks of different colors.

132
00:19:45,930 --> 00:19:58,710
In the third figure on the right, we can see the results of the genetic admixture analysis of the four breeds and the numerical representation of the gene flow between their gene pools.

133
00:19:59,410 --> 00:20:09,771
As I mentioned before in the case of ROH segments and linkage disequilibrium, in addition to standard indicators such as Ne and F,

134
00:20:09,771 --> 00:20:20,930
we can also evaluate the effect of selection on the genomic structure or identify specific genetic variants under strong selection pressure.

135
00:20:21,350 --> 00:20:30,410
Unlike genome-wide association studies, this approach does not require access to phenotypic information about individuals.

136
00:20:30,790 --> 00:20:42,450
It is essentially the identification of so-called selection signals, the occurrence of which depends on a variety of factors (in livestock mainly artificial selection).

137
00:20:42,910 --> 00:20:52,134
Two groups of methods are basically used for this purpose: methods testing differences between populations or

138
00:20:52,134 --> 00:21:00,342
breeds and methods analyzing intrapopulation differences. In terms of interpopulation differences,

139
00:21:00,486 --> 00:21:16,062
whole genome screening of the Fst fixation index, analysis of variability in linkage disequilibrium, or calculation of integrated haplotype scores are most commonly used to identify selection signals.

140
00:21:16,246 --> 00:21:23,850
A number of programs exist for this purpose, such as PLINK, varLD, or the R package rehh.

141
00:21:24,550 --> 00:21:34,870
The figure on the right shows the results of the analysis of testing selection signals reflecting differences between Slovak Spotted and Slovak Pinzgau cattle.

142
00:21:35,030 --> 00:21:50,074
The strongest signals were found in the casein gene family and in the KIT and KDR genes responsible for spotting. Within intrapopulation differences,

143
00:21:50,202 --> 00:21:59,578
selection signals are usually determined based on the distribution of runs of homozygosity or variation in linkage disequilibrium.

144
00:21:59,714 --> 00:22:04,630
The same programs as for the previous methods can be used for the calculation.

145
00:22:05,010 --> 00:22:13,502
The figure on the right shows the results of testing ROH segments distribution in the genome of Slovak Spotted and Slovak Pinzgau cattle.

146
00:22:13,666 --> 00:22:23,850
The results show that, similar to the previous approach, the selection signals are strongest in the genomic regions of casein family genes.

147
00:22:26,230 --> 00:22:34,330
If you have any questions about the presentation, please contact me at the email address shown in the slide.

148
00:22:34,790 --> 00:22:42,890
Information about the project, including access to other presentations, can be found by scanning the barcode on the left.

149
00:22:43,290 --> 00:22:44,690
Thank you for your attention.