1 00:00:01,160 --> 00:00:01,456 Hello, 2 00:00:01,478 --> 00:00:05,806 my name is Nina Moravčíková, and today I would like to present to you a 3 00:00:05,838 --> 00:00:09,086 lecture with the title "Analysis of the 4 00:00:09,238 --> 00:00:23,650 animal genetic resources biodiversity status using genomic data", which is part of module 2, "Conservation and sustainable use of animal genetic resources", within the ISAGREED project. 5 00:00:24,190 --> 00:00:37,028 Even though this lecture is intended for students at the second level of education, it may also be beneficial in teaching both first and third level students. 6 00:00:37,028 --> 00:00:47,510 In the context of the evaluation of genomic data, it is first necessary to briefly describe how such information can be obtained. 7 00:00:47,810 --> 00:00:58,452 To analyze the genome, we can use a variety of tools, including single genetic markers, genotyping chips, or whole genome sequencing. 8 00:00:58,626 --> 00:01:11,860 The difference between these methods is related both to the laboratory procedure for their determination and to the amount of genome data that we obtain with them. 9 00:01:13,200 --> 00:01:14,912 What can we understand 10 00:01:15,056 --> 00:01:27,940 by a genetic marker? It is any characteristic trait or manifestation of an organism that can be used to identify a specific chromosome, cell, or individual. 11 00:01:28,450 --> 00:01:39,986 The term genetic marker may refer to a gene, a short segment of DNA, or other manifestations of genotype, chromosome, or karyotype. 12 00:01:40,138 --> 00:01:52,112 However, it is important to remember that a genetic marker is usually a polymorphic variant that shows Mendelian inheritance and is correlated with variation 13 00:01:52,112 --> 00:01:58,910 in a phenotypic trait that is of importance, for example, from a breeding point of view. 
14 00:01:59,950 --> 00:02:17,050 In terms of livestock production traits, both candidate genes, whose alleles and genotypes influence the formation of quantitative traits, and quantitative trait loci are monitored. 15 00:02:18,430 --> 00:02:32,950 The advantage of DNA markers is mainly that they are directly detectable in nucleotide sequences and show an increased level of polymorphism and dominant or codominant inheritance. 16 00:02:33,330 --> 00:02:44,430 DNA markers are relatively common in the genome and can be tested relatively easily and rapidly with a high degree of repeatability. 17 00:02:45,890 --> 00:02:53,950 The most commonly used genetic markers nowadays are single nucleotide polymorphisms, called SNPs. 18 00:02:54,610 --> 00:03:04,570 SNPs are usually generated by point mutation, for example, a single substitution in the DNA at a particular site. 19 00:03:04,990 --> 00:03:14,530 Compared to other types of genetic markers, they occur frequently in the genome, every 100 to 300 base pairs. 20 00:03:14,870 --> 00:03:26,376 Mutations that occur at a frequency of more than 1% in a given population, which means that the minor or less frequent allele is present 21 00:03:26,376 --> 00:03:35,580 in the genotype of at least 1% of individuals belonging to that population, are usually considered SNPs. 22 00:03:36,280 --> 00:03:46,296 A SNP is a biallelic marker, which means that within a population we recognize only two alleles for a SNP, namely dominant and recessive. 23 00:03:46,488 --> 00:03:52,400 The term dominant indicates that it is the predominant allele in individuals 24 00:04:04,750 --> 00:04:17,090 However, it is important to note that even if an allele is dominant in one population, it may not be dominant in another population with a different genetic origin. 25 00:04:17,590 --> 00:04:26,370 SNP markers have a wide range of applications, from biodiversity evaluation to genomic selection. 
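The 1% criterion described above can be illustrated with a minimal Python sketch. It assumes that genotypes at one biallelic locus are coded as 0, 1 or 2 copies of one allele and that the threshold is applied to the minor allele frequency; the function names and the coding are illustrative assumptions, not part of the lecture.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one biallelic locus.

    genotypes: iterable of 0, 1 or 2 (copies of one of the two alleles);
    None marks a missing call and is skipped. This coding is an assumption.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each diploid individual carries two alleles at the locus.
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1.0 - allele_freq)


def is_snp(genotypes, threshold=0.01):
    """Apply the 1% rule of thumb to the minor allele frequency."""
    return minor_allele_frequency(genotypes) >= threshold
```

For example, 90 individuals coded 0 and 10 coded 1 give a minor allele frequency of 10 / 200 = 0.05, so the locus would pass the 1% threshold.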
26 00:04:27,990 --> 00:04:39,214 Whole genome sequencing is the term used to refer to the process of determining the exact order of nucleotides in a strand of a DNA molecule, 27 00:04:39,382 --> 00:04:43,158 which means determining its primary structure. 28 00:04:43,334 --> 00:04:53,630 The classical methods are the Maxam-Gilbert and Sanger methods, from which the currently used next generation sequencing methods are derived. 29 00:04:54,010 --> 00:05:04,510 Within the NGS methods, there are several platforms which, even though they differ in their technological approach, yield comparable outputs. 30 00:05:06,090 --> 00:05:14,197 Even though the cost of whole genome sequencing has decreased significantly compared to the previous period, 31 00:05:14,197 --> 00:05:21,778 it is still high if we want to obtain whole population data, especially for high coverage sequencing. 32 00:05:21,954 --> 00:05:31,004 For this reason, SNP genotyping chips are now being used in population wide studies, which make it possible to obtain information 33 00:05:31,004 --> 00:05:37,830 on a large number of SNP markers uniformly distributed across the genome at a lower cost. 34 00:05:38,530 --> 00:05:46,870 SNP chips allow genotyping from a few thousand up to 700,000 SNP markers. 35 00:05:47,210 --> 00:05:52,504 They are available for most livestock and companion animal species. 36 00:05:52,682 --> 00:06:02,647 The information obtained in this way can be used for a variety of purposes, including testing parentage, genomic diversity status, 37 00:06:02,647 --> 00:06:08,320 genome wide association studies, or estimation of genomic breeding values. 38 00:06:09,780 --> 00:06:14,916 In the following slides, we will discuss indicators that are used to estimate 39 00:06:14,948 --> 00:06:19,244 the biodiversity status of animal genetic resources based 40 00:06:19,292 --> 00:06:21,440 on genomic data analysis. 
41 00:06:21,620 --> 00:06:25,056 The first indicator is genome homozygosity and 42 00:06:25,088 --> 00:06:28,608 genomic inbreeding. In the context of genome 43 00:06:28,664 --> 00:06:32,296 homozygosity, you will often find two terms 44 00:06:32,368 --> 00:06:39,460 in the literature: autozygosity and runs of homozygosity, abbreviated as ROH. 45 00:06:40,200 --> 00:06:49,420 Basically, autozygosity reflects all alleles or chromosomal segments of DNA that are identical by descent, 46 00:06:49,720 --> 00:06:52,660 which means that they come from a common ancestor. 47 00:06:52,990 --> 00:06:56,478 Runs of homozygosity are considered to be 48 00:06:56,574 --> 00:06:59,470 all genomic regions with a specific number 49 00:06:59,510 --> 00:07:04,054 of consecutive homozygous genotypes or, when talking 50 00:07:04,102 --> 00:07:08,062 about SNP marker testing, all homozygous SNP 51 00:07:08,126 --> 00:07:19,730 markers. The distribution, number and length of runs of homozygosity depend on various factors affecting the livestock genome. 52 00:07:19,910 --> 00:07:22,826 The most significant factors in this context can 53 00:07:22,858 --> 00:07:25,386 be considered to be artificial selection and 54 00:07:25,418 --> 00:07:27,230 the intensity of inbreeding. 55 00:07:27,570 --> 00:07:38,070 The length of the ROH segments in an individual's genome corresponds to how distant the common ancestors are in the individual's pedigree. 56 00:07:38,450 --> 00:07:46,546 If the parents of an individual have a common ancestor, their genomes will share the same genetic variants in certain regions, 57 00:07:46,618 --> 00:07:50,070 which means that such regions will be identical by descent. 58 00:07:50,370 --> 00:07:53,298 If both parents transfer the same region 59 00:07:53,394 --> 00:07:57,474 to the offspring, then the offspring will be homozygous 60 00:07:57,562 --> 00:08:06,426 for the genetic variants, which creates an ROH region in the offspring's genome. 
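The idea of a run of consecutive homozygous SNP genotypes can be sketched in Python as follows. This is a deliberately simplified scan along one chromosome, not the sliding-window procedure implemented in dedicated ROH software; the 0/1/2 genotype coding, the function name and the default thresholds are assumptions made for illustration.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=0):
    """Find runs of consecutive homozygous SNPs on one chromosome.

    genotypes: 0/1/2 per SNP, where 1 is heterozygous (assumed coding);
               any other value, such as None, also breaks a run.
    positions: physical positions in base pairs, sorted ascending.
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have the same length")

    runs = []

    def close_run(first, last):
        # Keep the run only if it passes both illustrative thresholds.
        n_snps = last - first + 1
        length_bp = positions[last] - positions[first]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            runs.append((positions[first], positions[last], n_snps))

    start = None  # index of the first SNP of the current run, if any
    for i, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        if homozygous and start is None:
            start = i
        elif not homozygous and start is not None:
            close_run(start, i - 1)
            start = None
    if start is not None:
        close_run(start, len(genotypes) - 1)
    return runs
```

In practice, ROH software usually also tolerates a small number of heterozygous or missing calls inside a run; that tolerance is left out here to keep the sketch short.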
61 00:08:06,618 --> 00:08:18,070 This assumption is the basis of the approach for estimating the genomic inbreeding coefficient through the coverage of the genome by runs of homozygosity. 62 00:08:18,960 --> 00:08:31,167 However, information about the occurrence and length of ROH segments in the genome can be used not only to estimate genomic inbreeding, but also to test the impact of artificial 63 00:08:31,167 --> 00:08:41,580 selection on specific regions in the genome, or to identify causal variants involved in the control of preferred phenotypic traits and characteristics. 64 00:08:43,240 --> 00:08:56,209 In this slide, you can see in the first part the formula for estimating genomic inbreeding, referred to as Froh, where the numerator expresses the total length of homozygous 65 00:08:56,209 --> 00:09:06,480 segments in an individual's genome, and the denominator the total genome length derived from the physical positions of the markers tested. 66 00:09:06,860 --> 00:09:21,331 Froh makes it possible to establish the trend of inbreeding, where segments longer than 4 megabases reflect autozygous regions derived from ancestors approximately 12 generations ago, 67 00:09:21,331 --> 00:09:32,066 segments longer than 8 megabases are derived from ancestors 6 generations ago, and segments longer than 16 megabases correspond 68 00:09:32,066 --> 00:09:39,420 to the proportion of autozygosity inherited from ancestors from the last 3 generations. 69 00:09:40,240 --> 00:09:51,380 Similar to pedigree inbreeding, the genomic inbreeding values range from 0 to 1, or in percentage terms, from 0 to 100%. 70 00:09:52,130 --> 00:10:03,918 Information on the increase in inbreeding per generation and the overall inbreeding coefficient is important both in terms of the occurrence of inbreeding depression 71 00:10:03,918 --> 00:10:08,990 and, at the same time, the survival of the population in the long term. 
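As described above, Froh is the total length of ROH segments divided by the genome length covered by the markers. A minimal Python sketch of this ratio follows. The function names are illustrative, and the conversion from segment length to generations uses the common rule of thumb g = 100 / (2 * L), with L in centimorgans and an assumed rate of 1 cM per megabase; this is an assumption, not a formula taken from the slide, although it reproduces the approximately 12, 6 and 3 generations quoted for 4, 8 and 16 megabases.

```python
def f_roh(segment_lengths_bp, genome_length_bp, min_length_bp=0):
    """Genomic inbreeding as the share of the genome covered by ROH.

    Only segments longer than min_length_bp are counted, which gives
    Froh for a chosen length class (for example > 4 Mb or > 8 Mb).
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    total = sum(length for length in segment_lengths_bp
                if length > min_length_bp)
    return total / genome_length_bp


def approx_generations(length_mb, cm_per_mb=1.0):
    """Rule-of-thumb age of an ROH segment: g = 100 / (2 * L in cM).

    cm_per_mb = 1.0 is an assumed average recombination rate.
    """
    return 100.0 / (2.0 * length_mb * cm_per_mb)


# Hypothetical individual with four ROH segments on a 2.5 Gb genome.
segments = [2_000_000, 5_000_000, 10_000_000, 20_000_000]
genome = 2_500_000_000
for threshold_mb in (4, 8, 16):
    value = f_roh(segments, genome, min_length_bp=threshold_mb * 1_000_000)
    print(f"Froh > {threshold_mb} Mb: {value:.4f}")
```

The segment lengths and the genome length in the usage lines are made-up numbers chosen only to show how the length classes are applied.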
72 00:10:09,490 --> 00:10:18,510 One of the reasons is that the accumulation of inbreeding across generations leads to a reduction in genetic diversity. 73 00:10:18,960 --> 00:10:30,340 It is generally accepted that the increase in inbreeding per generation should not exceed 1% in small populations and 4% in large populations. 74 00:10:30,680 --> 00:10:40,568 The most commonly used programs to estimate genomic inbreeding coefficients include detectRUNS, PLINK or cgaTOH. 75 00:10:40,744 --> 00:10:55,240 In the figure, you can see the results from a comparative analysis of the inbreeding coefficient in 15 cattle breeds based on ROH segments longer than 4 and 8 Mbp. 76 00:10:57,540 --> 00:11:06,061 Another indicator of biodiversity that we can estimate by testing genomic markers is the linkage disequilibrium 77 00:11:06,061 --> 00:11:13,200 between SNP markers in the genome and consequently the effective population size based on it. 78 00:11:13,580 --> 00:11:26,265 The term linkage disequilibrium essentially refers to a non-random relationship or association between alleles of different SNP markers in the genome of the evaluated 79 00:11:26,265 --> 00:11:33,830 population, which is likely to be due to selection, mating system, recombination, or genetic drift. 80 00:11:34,170 --> 00:11:35,066 As a result, 81 00:11:35,178 --> 00:11:44,390 this means that such genetic variants can produce specific combinations of genotypes in a population, also called haplotypes. 82 00:11:44,810 --> 00:11:57,595 Information on the level of linkage disequilibrium can be used to assess the evolutionary shaping of a population, to estimate effective population size, or, as in the case of ROH 83 00:11:57,595 --> 00:12:07,130 segments, to test the occurrence of specific genetic variants that have been strongly influenced by artificial or natural selection. 
84 00:12:08,710 --> 00:12:16,810 The most commonly used formula for calculating linkage disequilibrium between SNP markers is shown in the slide. 85 00:12:17,240 --> 00:12:23,656 However, in addition to this formula proposed by Hill and Robertson, there are other 86 00:12:23,656 --> 00:12:35,648 modifications of it that take into account, for example, the mutation rate or the nature of the genetic markers tested, which can be biallelic or multiallelic. 87 00:12:35,824 --> 00:12:51,360 Linkage disequilibrium values range from 0 to 1, with 0 indicating linkage equilibrium between markers and 1 indicating complete linkage disequilibrium. 88 00:12:52,740 --> 00:13:01,920 Effective population size essentially reflects the number of individuals that are active in reproduction in a given population, 89 00:13:02,540 --> 00:13:07,000 which means that they can produce individuals for the next generation. 90 00:13:07,700 --> 00:13:17,979 Estimation of this parameter in the case of genomic data is most often based on its relation to the degree of linkage disequilibrium in the genome, 91 00:13:17,979 --> 00:13:26,650 where it is possible to test not only the current effective population size but also the trend of its evolution in the past. 92 00:13:28,190 --> 00:13:37,310 The effective population size, abbreviated as Ne, can be determined using, for example, the formula proposed by Corbin et al. 93 00:13:37,390 --> 00:13:38,890 as shown in this slide. 94 00:13:39,390 --> 00:13:48,610 This formula takes into account the inheritance model, the physical distance between SNP markers or the intensity of mutations. 95 00:13:49,220 --> 00:13:55,527 In this case, the historical effective size is estimated as a function of time and the physical 96 00:13:55,527 --> 00:14:03,360 distance between the two markers, assuming constant linear growth of Ne over time, expressed in past generations. 
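The relation between linkage disequilibrium and effective population size described above can be sketched in Python. The first function computes r² for two biallelic markers from haplotype and allele frequencies, which is the squared-correlation measure usually attributed to Hill and Robertson. The second function is not the Corbin et al. formula from the slide: it uses the simpler approximation E[r²] = 1 / (1 + 4 * Ne * c), usually attributed to Sved, without corrections for sample size or mutation. Function and parameter names are assumptions made for illustration.

```python
def r_squared(p_ab, p_a, p_b):
    """Linkage disequilibrium r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the first
          marker and allele B at the second marker.
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b  # deviation from independent association
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


def ne_from_r2(mean_r2, c):
    """Effective population size from E[r^2] = 1 / (1 + 4 * Ne * c).

    mean_r2: average r^2 of marker pairs at recombination distance c,
             with c expressed in Morgans.
    """
    if not 0.0 < mean_r2 <= 1.0 or c <= 0.0:
        raise ValueError("need 0 < mean_r2 <= 1 and c > 0")
    return (1.0 / mean_r2 - 1.0) / (4.0 * c)


def generations_ago(c):
    """Approximate time, in generations, reflected by pairs at distance c."""
    return 1.0 / (2.0 * c)
```

Under these assumptions, marker pairs 0.01 Morgans apart with a mean r² of 0.2 correspond to Ne = (1 / 0.2 - 1) / (4 * 0.01) = 100, roughly 50 generations ago. Closer marker pairs inform about more distant generations, which is how a historical trend of Ne can be traced.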
97 00:14:04,020 --> 00:14:16,200 The figure on the right shows representative results of the analysis of effective population size trend in two cattle breeds: Slovak Spotted and Slovak Pinzgau. 98 00:14:16,510 --> 00:14:23,290 Similar to pedigree information, the effective population size can range from 0 to n. 99 00:14:23,830 --> 00:14:37,766 It is generally accepted that the effective population size should not be less than 50 individuals in the case of small populations or 100 individuals in the case of large populations. 100 00:14:37,958 --> 00:14:53,900 In terms of long term sustainability, the effective population size should be at least 500 individuals. In the case of genomic data, programs such as SNeP or GONE can be used for its calculation. 101 00:14:55,280 --> 00:15:07,192 In animal genetic resources, indicators describing population structure at the intra- and interpopulation level are often evaluated. 102 00:15:07,256 --> 00:15:17,910 Genetic distances are most often analyzed as they reflect the degree of genetic differences between individuals, populations or species. 103 00:15:18,330 --> 00:15:27,750 The most commonly discussed in the literature are Nei's genetic distance, Wright's fixation index Fst, principal component analysis 104 00:15:27,750 --> 00:15:34,150 or methods quantifying the degree of genetic admixture and gene flow between populations. 105 00:15:35,370 --> 00:15:48,290 Nei's genetic distance theory assumes that if two populations show a low genetic distance, they are similar and, with a high degree of confidence, share common ancestors. 106 00:15:48,750 --> 00:15:59,370 For this reason, this indicator can also be considered as the molecular equivalent of the relatedness coefficient calculated on the basis of pedigree information. 107 00:16:00,350 --> 00:16:06,382 You can see the formula for calculating the standard Nei's genetic distance on the slide. 
108 00:16:06,566 --> 00:16:11,382 The minimum value that Nei's genetic distance can take is 0. 109 00:16:11,566 --> 00:16:20,402 This value means that individuals or populations have the same variants (alleles or genotypes) in the genome, 110 00:16:20,586 --> 00:16:23,270 which means they are genetically identical. 111 00:16:23,690 --> 00:16:28,266 The maximum value that Nei's genetic distance can take is 1. 112 00:16:28,458 --> 00:16:39,630 This value reflects the fact that due to completely different genetic variants, individuals or populations are genetically different and, we can say, unrelated. 113 00:16:40,130 --> 00:16:50,168 To calculate Nei's genetic distances, we can use the R package StAMPP or other programs. Compared to Nei's genetic distance, 114 00:16:50,264 --> 00:16:57,300 Wright's Fst fixation index only allows the level of diversity to be estimated at the population level. 115 00:16:57,680 --> 00:17:02,480 This index is essentially an indicator of the intensity of population 116 00:17:02,600 --> 00:17:10,336 fragmentation, expressed as a decrease in heterozygosity in subpopulations due to the effect of genetic drift. 117 00:17:10,528 --> 00:17:21,884 Hence, to calculate this index, we need to have information about the expected heterozygosity within the metapopulation and the average heterozygosity within subpopulations, 118 00:17:21,972 --> 00:17:33,804 as you can see in the formula on the slide. Wright's fixation index Fst takes values from 0 to 1 and the interpretation of the values is similar to that of Nei's 119 00:17:33,852 --> 00:17:35,040 genetic distances. 120 00:17:35,740 --> 00:17:42,760 If the value of the index is equal to 0, the populations are genetically identical and, conversely, 121 00:17:43,060 --> 00:17:48,000 if the value is equal to 1, the populations are genetically distinct. 
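The Fst calculation described above, based on the expected heterozygosity within the metapopulation (H_T) and the average heterozygosity within subpopulations (H_S), can be sketched in Python for a single biallelic marker as Fst = (H_T - H_S) / H_T. The sketch assumes equally weighted subpopulations and expected heterozygosity 2p(1 - p); the function name and these simplifications are illustrative assumptions, and real analyses typically average over many markers and correct for sample size.

```python
def fst(allele_freqs):
    """Wright's Fst for one biallelic marker: (H_T - H_S) / H_T.

    allele_freqs: frequency of the same allele in each subpopulation.
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    if not allele_freqs:
        raise ValueError("at least one subpopulation is required")
    k = len(allele_freqs)
    # Average expected heterozygosity within subpopulations.
    h_s = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / k
    # Expected heterozygosity of the pooled metapopulation.
    p_bar = sum(allele_freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0:
        raise ValueError("marker is monomorphic in the metapopulation")
    return (h_t - h_s) / h_t
```

Two subpopulations with identical allele frequencies give Fst = 0, while two subpopulations fixed for different alleles (frequencies 0 and 1) give Fst = 1, matching the interpretation of the two extremes given above.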
122 00:17:48,400 --> 00:17:55,580 In real livestock populations, the value of this index usually ranges from 0 to 0.5, 123 00:17:55,920 --> 00:18:09,220 provided, of course, that we are testing a single species. Populations with an Fst value higher than 0.25 are considered to be genetically differentiated. 124 00:18:10,280 --> 00:18:24,040 Other commonly used approaches to evaluate population structure and genetic relationships between populations include principal component analysis and Bayesian analysis of genetic admixture. 125 00:18:24,460 --> 00:18:36,100 Principal component analysis is a popular multivariate statistical method that has found applications in various scientific fields, including population genetics. 126 00:18:36,260 --> 00:18:48,250 Simply said, this analysis is used to represent high dimensional data, for example, genomic information about individuals or populations, in fewer dimensions. 127 00:18:48,790 --> 00:18:54,490 Bayesian statistics is an approach that is used in other scientific disciplines as well. 128 00:18:54,910 --> 00:19:08,010 This approach operates with conditional probability and allows the probability of the initial hypothesis to be refined sequentially as other relevant facts appear. 129 00:19:08,750 --> 00:19:17,504 This slide shows representative results of testing the proportion of genetic admixture, principal component analysis and gene flow. 130 00:19:17,702 --> 00:19:30,724 In the case of the first figure, this is a Bayesian analysis of admixture within 15 cattle breeds with the proportion of admixture within breeds represented by different colors of the lines. 131 00:19:30,892 --> 00:19:45,440 The second figure on the left shows representative results of the principal component analysis, with the degree of admixture being best seen in part D through the overlapping peaks of different colors. 
132 00:19:45,930 --> 00:19:58,710 In the third figure on the right, we can see the results of the genetic admixture analysis of the four breeds and the numerical representation of the gene flow between their gene pools. 133 00:19:59,410 --> 00:20:09,771 As I mentioned before in the case of ROH segments and linkage disequilibrium, in addition to standard indicators such as Ne and F, 134 00:20:09,771 --> 00:20:20,930 we can also evaluate the effect of selection on the genomic structure or identify specific genetic variants under strong selection pressure. 135 00:20:21,350 --> 00:20:30,410 Unlike genome-wide association studies, this approach does not require access to phenotypic information about individuals. 136 00:20:30,790 --> 00:20:42,450 It is essentially the identification of so-called selection signals, the occurrence of which depends on a variety of factors (in livestock mainly artificial selection). 137 00:20:42,910 --> 00:20:52,134 Two groups of methods are basically used for this purpose: methods testing differences between populations or 138 00:20:52,134 --> 00:21:00,342 breeds and methods analyzing intrapopulation differences. In terms of interpopulation differences, 139 00:21:00,486 --> 00:21:16,062 whole genome screening of the Fst fixation index, analysis of variability in linkage disequilibrium, or calculation of integrated haplotype scores are most commonly used to identify selection signals. 140 00:21:16,246 --> 00:21:23,850 A number of programs exist for this purpose, such as PLINK, varLD or the R package rehh. 141 00:21:24,550 --> 00:21:34,870 The figure on the right shows the results of testing selection signals reflecting differences between Slovak Spotted and Slovak Pinzgau cattle. 142 00:21:35,030 --> 00:21:50,074 The strongest signals were found in the casein gene family and in the KIT and KDR genes responsible for spotting. 
Within intrapopulation differences, 143 00:21:50,202 --> 00:21:59,578 selection signals are usually determined based on the distribution of runs of homozygosity or variation in linkage disequilibrium. 144 00:21:59,714 --> 00:22:04,630 The same programs as for the previous methods can be used for the calculation. 145 00:22:05,010 --> 00:22:13,502 The figure on the right shows the results of testing ROH segment distribution in the genome of Slovak Spotted and Slovak Pinzgau cattle. 146 00:22:13,666 --> 00:22:23,850 The results show that, similar to the previous approach, the selection signals are strongest in the genomic regions of casein family genes. 147 00:22:26,230 --> 00:22:34,330 If you have any questions about the presentation, please contact me at the email address shown in the slide. 148 00:22:34,790 --> 00:22:42,890 Information about the project, including access to other presentations, can be found by scanning the barcode on the left. 149 00:22:43,290 --> 00:22:44,690 Thank you for your attention.