1 00:00:03,840 --> 00:00:04,464 Hello. 2 00:00:04,582 --> 00:00:19,090 In this video I would like to explain to you how to calculate genetic diversity parameters, which is the practical example for the presentation within module 2. 3 00:00:19,910 --> 00:00:24,010 For this purpose we will use a database of genotype information. 4 00:00:24,710 --> 00:00:29,370 The name of the file which contains the database is Testset. 5 00:00:30,150 --> 00:00:44,650 This database includes information about five microsatellite markers which were genotyped for 15 animals originating from three subpopulations. 6 00:00:44,810 --> 00:00:48,910 For the calculation we will need the program GenAlEx. 7 00:00:49,370 --> 00:00:58,544 Our first task is to calculate the average observed and expected heterozygosity, effective allele number and Wright's 8 00:00:58,544 --> 00:01:11,046 fixation index Fis across markers and subpopulations in the dataset, and also to describe the level of diversity which results from all analyzed parameters. 9 00:01:11,238 --> 00:01:22,650 The second task is to calculate the F statistics, that means the Fis, Fit and Fst indices, and to describe the level of diversity in the dataset. 10 00:01:23,030 --> 00:01:42,268 The third task is related to the evaluation of genetic distances across subpopulations, which are derived from Wright's fixation index and Nei's genetic distances, and the fourth task is related to the state of diversity, 11 00:01:42,268 --> 00:01:57,150 the explanation of the level of diversity within and across breeds, and also to the analysis of genetic differentiation in the dataset based on principal component analysis. 12 00:01:58,330 --> 00:02:06,280 First, I would like to explain to you the organization of the data in the Testset file. 13 00:02:06,580 --> 00:02:09,524 The type of this file is Genepop. 14 00:02:09,652 --> 00:02:13,720 If you open this file you can see several pieces of information. 15 00:02:14,620 --> 00:02:18,892 In the first row you have the name of the dataset. 
16 00:02:18,996 --> 00:02:31,840 You can use any other name. If you want, you can use for example the name of the breed or the name of the location and so on. 17 00:02:32,180 --> 00:02:38,892 In the next few rows you can see the IDs of the genetic markers. 18 00:02:38,996 --> 00:02:42,868 In this case, the IDs are generic. 19 00:02:43,004 --> 00:02:50,720 That means the name of the first genetic marker is loci 1, the name of the second one is loci 2, and so on. 20 00:02:51,020 --> 00:02:54,680 In the next row you have the abbreviation POP. 21 00:02:55,060 --> 00:03:08,660 This row is used to separate the names of the markers from the genotype information for the animals in population number one. 22 00:03:09,040 --> 00:03:16,060 But you also need to use this row if you would like to differentiate between two different populations. 23 00:03:16,600 --> 00:03:24,020 These animals belong to population number one, these animals belong to population number two, and so on. 24 00:03:24,800 --> 00:03:37,320 In this row you have information about the animal ID, in this case A1, and then information about the genotypes for the microsatellite genetic markers. 25 00:03:37,820 --> 00:03:52,080 In the first column you have information about the genotype for the first genetic marker, in the second column for the second genetic marker, in the third column for the third genetic marker, and so on. 26 00:03:52,940 --> 00:04:03,240 Also in this case, each animal has two alleles in the genotype, but each allele is encoded by three digits. 27 00:04:03,540 --> 00:04:13,936 You can see for example in this row that this is the genotype for the first genetic marker for animal 1 in population 1. 28 00:04:14,128 --> 00:04:21,340 The first allele is encoded as 262 and the second one as 264. 29 00:04:21,880 --> 00:04:27,856 For this marker, the first animal from the first population is heterozygous. 
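The allele coding described here can be illustrated with a short Python sketch. It assumes a genotype is stored as one string of digits, for example 262264 for the alleles 262 and 264; the function names are hypothetical helpers for illustration and are not part of GenAlEx or Genepop.

```python
def split_genotype(code: str, digits: int = 3) -> tuple[str, str]:
    """Split one genotype code into its two allele codes.

    `digits` is the number of digits used per allele (3 in this example).
    """
    if len(code) != 2 * digits or not code.isdigit():
        raise ValueError(f"expected {2 * digits} digits, got {code!r}")
    return code[:digits], code[digits:]


def is_heterozygous(code: str, digits: int = 3) -> bool:
    """A genotype is heterozygous when its two allele codes differ."""
    first, second = split_genotype(code, digits)
    return first != second


# Genotype of animal A1 at the first marker, as read out in the video.
print(split_genotype("262264"))
print(is_heterozygous("262264"))
```

The `digits` parameter is there only so that the same helper also covers shorter codings, such as one digit per allele.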
30 00:04:27,928 --> 00:04:37,980 If we go to the second one, we can see that for the second genetic marker the first animal from the first population is homozygous. 31 00:04:38,650 --> 00:04:56,510 If you would like to analyze biallelic genetic markers such as single nucleotide polymorphisms, you can basically use only one digit for encoding one allele, 32 00:04:57,850 --> 00:05:06,226 that means you will have two digits in the genotype, not six as in this case. But that depends on you. 33 00:05:06,418 --> 00:05:11,830 Please use only numbers for the coding, not letters. 34 00:05:13,890 --> 00:05:27,602 If you would like, you can use the program GenAlEx directly from the folder because you don't need to install it, but you can also very easily download this program from the Internet. 35 00:05:27,706 --> 00:05:37,594 Just open Google, write the name of the program, open the first link and in the Download section you can see that 36 00:05:37,594 --> 00:05:53,470 here you can find the program, including also a short manual for the program and also an explanation of the formulas which are used for the calculation of genetic diversity parameters. 37 00:05:55,570 --> 00:06:09,570 As I said, you don't need to install the program. If you want to open this program, double-click on the icon of the program and then, this is important, you need to allow macros. 38 00:06:13,070 --> 00:06:18,222 Now the program is starting. Where do you find the program? 39 00:06:18,366 --> 00:06:34,878 You find the program directly here, or just open Add-ins and inside Add-ins you can see the name of the program and all of the options which are included in this program. 40 00:06:35,054 --> 00:06:40,350 You just open it by clicking on the name of the program. 41 00:06:41,610 --> 00:06:47,490 The first step is the import of the data into the program. 42 00:06:47,610 --> 00:06:54,306 As I said, the type of the database which we will use for the calculation is Genepop. 
43 00:06:54,458 --> 00:06:56,826 Thus, please click on Genepop. 44 00:06:57,018 --> 00:07:02,378 In this window you can select the type of the data. 45 00:07:02,474 --> 00:07:04,922 We know that we have codominant data. 46 00:07:05,026 --> 00:07:09,494 We can also change the source of the data. 47 00:07:09,582 --> 00:07:13,130 But right now we will use a single Genepop file. 48 00:07:13,630 --> 00:07:22,926 Just click on OK and now you need to select the location of your database on your computer. 49 00:07:22,998 --> 00:07:30,810 In my case the database is stored in a folder on the desktop. 50 00:07:45,720 --> 00:07:49,820 Double-click on the name of the database or the name of the file 51 00:07:50,920 --> 00:07:57,432 and now the program is asking you if you would like to save this Excel sheet. 52 00:07:57,496 --> 00:07:59,896 If not, just click on Cancel. 53 00:08:00,088 --> 00:08:06,960 It is very easy to repeat the analysis because the procedure is not time-consuming. 54 00:08:08,300 --> 00:08:14,444 After importing the data you can see that you have several pieces of information in Excel. 55 00:08:14,532 --> 00:08:25,548 In the first row you have information about the number of markers in the database, then the number of animals in the database 56 00:08:25,644 --> 00:08:29,600 and the number of subpopulations in the database. 57 00:08:30,060 --> 00:08:38,780 In the next columns of this row you have information about the number of animals within each subpopulation. 58 00:08:40,000 --> 00:08:43,232 That means in the first subpopulation we have five animals. 59 00:08:43,296 --> 00:08:45,864 In the second one, we also have five animals 60 00:08:45,912 --> 00:08:47,780 and also in the third one. 61 00:08:48,240 --> 00:08:58,740 In the second row, you have information about the name of the dataset, the type of import and also the names of the populations. 62 00:08:59,360 --> 00:09:06,808 If you would like, you can change the names of the populations to the names of the breeds which you analyze. 
63 00:09:06,944 --> 00:09:09,940 But right now this is not so important. 64 00:09:10,440 --> 00:09:17,936 In the third row you have information about the data which are below this row. 65 00:09:18,008 --> 00:09:23,540 That means in this column we have information about the animal ID. 66 00:09:23,880 --> 00:09:30,940 In this one, we have information about the population from which the animals come, 67 00:09:31,480 --> 00:09:38,020 and in the next columns we have information about the genotypes of the animals. 68 00:09:38,420 --> 00:09:52,320 This is the genotype of the first animal for loci 1, the genotype of the first animal for loci 2, the genotype of the first animal for loci 3, and so on. 69 00:09:53,500 --> 00:09:57,396 Now we would like to run the analysis. How to do it? 70 00:09:57,468 --> 00:10:05,564 Please click on Add-ins and select the name of the program. Because observed and expected heterozygosity, 71 00:10:05,564 --> 00:10:15,840 effective number of alleles and also Wright's fixation index are parameters which are calculated from the frequency of alleles, 72 00:10:16,000 --> 00:10:20,420 in the first step we need to select the option Frequency. 73 00:10:21,400 --> 00:10:26,552 You can see that the program is asking you about the type of data. 74 00:10:26,616 --> 00:10:30,672 We know that the type of data is in our case codominant. 75 00:10:30,816 --> 00:10:39,420 Then it is also informing you how many markers you have in the database, how many samples, how many subpopulations. 76 00:10:40,010 --> 00:10:48,630 Click on OK and now you can select the type of analysis which you would like to perform. 77 00:10:49,890 --> 00:11:00,390 We would like to calculate heterozygosity (observed and expected), Wright's statistics and also the effective number of alleles. 78 00:11:00,890 --> 00:11:09,756 For this task, please select the option Het, Fstat, Poly by pop. 
Because we know that we also need to calculate 79 00:11:09,756 --> 00:11:20,060 Nei's genetic distances and Wright's Fst index to quantify the level of genetic differentiation in the population/metapopulation, 80 00:11:20,760 --> 00:11:32,020 we need to select in the third part of this window the option Nei's distance and Fst. Please click on it and then OK. 81 00:11:34,500 --> 00:11:39,400 Now you can see that the program has calculated the results. 82 00:11:39,860 --> 00:11:47,480 The results are stored in the sheets HFP, NeiP and FstP. 83 00:11:49,060 --> 00:11:53,960 Please select the HFP sheet. 84 00:11:54,260 --> 00:12:00,440 Here you can see in the first row the type of analysis or the name of the analysis. 85 00:12:01,420 --> 00:12:11,700 In this sheet you have information about heterozygosity, F statistics and polymorphism by population for codominant data. 86 00:12:12,160 --> 00:12:24,660 In this part you have only a description of the dataset: the name of the dataset, number of loci (or genetic markers), number of samples, number of subpopulations. 87 00:12:25,160 --> 00:12:28,100 In this row you have information 88 00:12:30,390 --> 00:12:35,854 about the abbreviations which are used in the table below. 89 00:12:36,022 --> 00:12:43,490 For example, N means sample size, Na means number of alleles, 90 00:12:44,630 --> 00:13:00,290 Ne means effective number of alleles, I is the information index, Ho is observed heterozygosity, He is expected heterozygosity and F is Wright's fixation index Fis. 91 00:13:00,990 --> 00:13:11,390 In this table you can see the values which were calculated separately for each marker and population. 92 00:13:12,170 --> 00:13:30,282 But we basically need average values for each population across all genetic markers evaluated. Hence, this table is more important for us. 
93 00:13:30,466 --> 00:13:40,975 In this table you can see the number of animals in each population, the average number of alleles, the average effective 94 00:13:40,975 --> 00:13:53,040 number of alleles, the average observed heterozygosity, the average expected heterozygosity and the average Wright's fixation index Fis. 95 00:13:54,860 --> 00:14:09,120 For the practical example, we need to speak about the observed heterozygosity, expected heterozygosity, effective number of alleles and Wright's fixation index Fis. 96 00:14:09,700 --> 00:14:24,028 If we compare the average values for our three populations, we can see that population 1 reached the highest average effective number of alleles, 97 00:14:24,084 --> 00:14:34,198 that means in this case, if we are speaking only about the effective number of alleles, the level of diversity is the highest in this population. 98 00:14:34,374 --> 00:14:47,050 But when we check observed heterozygosity, we can see that, based on average values, the value is the highest in population number 2. 99 00:14:47,950 --> 00:14:51,254 In the case of expected heterozygosity, 100 00:14:51,382 --> 00:14:57,370 on the other hand, the value is highest in population number 1. 101 00:14:58,540 --> 00:15:09,388 But if we check all of the values for heterozygosity, we can see that each value is higher than 0.5, 102 00:15:09,484 --> 00:15:19,160 that means in each population there is a higher proportion of heterozygous genotypes compared to homozygous ones. 103 00:15:20,460 --> 00:15:25,116 Wright's fixation index can reach values from -1 to 1. 104 00:15:25,268 --> 00:15:33,804 A negative value means that in your population you have a higher proportion of heterozygous genotypes than expected, and conversely, 105 00:15:33,804 --> 00:15:43,660 if you have values greater than 0, that means positive values, you have an excess of homozygous genotypes in your population. 
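As a rough illustration of how the parameters discussed here relate to each other, the following Python sketch computes N, Na, Ne, Ho, He and F for one marker in one population from a list of genotypes. It uses the common textbook definitions (He = 1 - sum of squared allele frequencies, Ne = 1 / sum of squared allele frequencies, F = (He - Ho) / He); the exact formulas used by the program are documented in its manual, so treat this as an assumption. The genotypes in the example are invented and are not taken from the Testset file.

```python
from collections import Counter


def locus_stats(genotypes):
    """Diversity parameters for one marker in one population.

    `genotypes` is a list of (allele, allele) tuples, one per animal.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in counts.values()]
    homozygosity = sum(p * p for p in freqs)        # expected homozygosity
    ho = sum(a != b for a, b in genotypes) / n      # observed heterozygosity
    he = 1 - homozygosity                           # expected heterozygosity
    f = (he - ho) / he if he > 0 else float("nan")  # fixation index
    return {"N": n, "Na": len(counts), "Ne": 1 / homozygosity,
            "Ho": ho, "He": he, "F": f}


def mean_across_loci(per_locus, key):
    """Average one parameter over several markers of the same population."""
    return sum(stats[key] for stats in per_locus) / len(per_locus)


# Invented genotypes for five animals at one marker (illustration only).
genotypes = [("262", "264"), ("262", "262"), ("264", "266"),
             ("262", "264"), ("264", "264")]
stats = locus_stats(genotypes)
print({name: round(value, 3) for name, value in stats.items()})
```

In this invented example Ho is slightly higher than He, so F comes out slightly negative, which matches the interpretation of negative values given above.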
106 00:15:44,200 --> 00:16:04,946 In our case, the values of Wright's Fis index confirm that the level of diversity is really highest in population number 2 compared to population number 1 and population number 3. 107 00:16:05,058 --> 00:16:12,162 But we can see that the value is negative also in the case of population number 3. 108 00:16:12,226 --> 00:16:23,162 This means that the proportion of heterozygous animals is higher. In the case of population number 1, the average value of Wright's Fis index is close to zero, 109 00:16:23,226 --> 00:16:30,050 that means the proportion of homozygous and heterozygous genotypes is relatively balanced. 110 00:16:31,550 --> 00:16:44,650 This program is very good at summary statistics because it also provides you with the total average for all populations and all genetic markers evaluated. 111 00:16:45,510 --> 00:16:51,342 The second task was the calculation of Wright's F indices. 112 00:16:51,406 --> 00:17:01,298 You can find information about the average values of Wright's fixation indices in this part of the results. You can 113 00:17:01,298 --> 00:17:15,790 see that even if the proportion of homozygous genotypes is higher in the metapopulation, within the populations we really observed a higher proportion of heterozygous genotypes. 114 00:17:16,850 --> 00:17:25,250 According to the Fst value, we can say that the populations are really well differentiated from each other. 115 00:17:26,510 --> 00:17:35,030 When we check the information below this table, we can also see the percentage of polymorphic loci and other 116 00:17:35,030 --> 00:17:42,930 information related to the methodology for the calculation of the parameters which are included in this sheet. 117 00:17:43,390 --> 00:17:50,890 Now we can go to the third Excel sheet with the name NeiP. 118 00:17:51,450 --> 00:17:55,314 In this Excel sheet you can find two matrices. 119 00:17:55,402 --> 00:18:04,230 The first one is the genetic distance matrix and the second one is the genetic identity matrix. 
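The relationship between the identity and the distance matrix can be sketched in Python. This is a minimal sketch of Nei's standard genetic identity I and distance D = -ln(I) between two populations, assuming the allele frequencies per marker are already known; whether the program applies exactly this variant or a bias-corrected one is not stated in the video. The frequencies below are invented for illustration.

```python
import math


def nei_identity(pop_x, pop_y):
    """Nei's standard genetic identity between two populations.

    Each population is a list with one dict per marker, mapping
    allele code -> allele frequency.
    """
    jxy = jx = jy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in set(fx) | set(fy))
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
    return jxy / math.sqrt(jx * jy)


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance, D = -ln(I)."""
    identity = nei_identity(pop_x, pop_y)
    # No shared alleles at all gives I = 0 and an unbounded distance.
    return math.inf if identity == 0 else -math.log(identity)


# Invented allele frequencies for two markers (illustration only).
pop_a = [{"262": 0.5, "264": 0.5}, {"100": 1.0}]
pop_b = [{"262": 0.1, "266": 0.9}, {"100": 0.2, "102": 0.8}]
print(round(nei_identity(pop_a, pop_b), 3), round(nei_distance(pop_a, pop_b), 3))
```

Because the distance is the negative logarithm of the identity, identical populations give I = 1 and D = 0, while populations sharing few alleles give a small I and a D that is not capped at 1.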
120 00:18:04,690 --> 00:18:23,580 For our practical example, we mainly need information from the first matrix. Based on the matrix, we can see which population is the most differentiated from the others in our dataset. 121 00:18:23,660 --> 00:18:38,388 This value is basically the value which tells us the genetic distance between population 1 and population 1. The value is zero, 122 00:18:38,484 --> 00:18:43,000 that means logically that these two populations are the same. 123 00:18:44,330 --> 00:18:52,350 If the value of Nei's genetic distance is zero, that means that the populations really are genetically the same. 124 00:18:52,770 --> 00:19:00,150 If the value is close to one or higher, that means that the populations are genetically different. 125 00:19:01,170 --> 00:19:12,960 You can see that here we can also find values which are higher than 1, and this is mainly due to the formula for the calculation of Nei's genetic distances. 126 00:19:13,540 --> 00:19:23,652 We have several methods which are modifications of the standard Nei's genetic distance. 127 00:19:23,796 --> 00:19:29,320 This is also the reason why the maximum value is in this case not 1. 128 00:19:30,140 --> 00:19:46,980 From our three populations, we can say that population 1 and population 3 are really genetically connected and the highest genetic distance is between population 1 and population 2. 129 00:19:47,920 --> 00:19:59,260 We can compare Nei's genetic distances with Wright's Fst matrix to see if the results are the same or if there is some difference. 130 00:20:00,280 --> 00:20:03,904 In the case of Wright's Fst index, 131 00:20:03,952 --> 00:20:10,260 the same approach to the explanation of the results is valid, 132 00:20:10,760 --> 00:20:24,040 that means if the value is 0 the populations are genetically similar and if the value is 1 the populations are totally genetically different. 133 00:20:25,020 --> 00:20:30,204 We can say that Wright's Fst matrix shows us the same results. 
134 00:20:30,332 --> 00:20:42,720 That means population 1 and population 3 are genetically connected and the highest genetic distance is between population 1 and population 2. 135 00:20:44,120 --> 00:20:57,176 The last task in the practical example is the calculation or estimation of genetic differentiation based on principal component analysis. 136 00:20:57,368 --> 00:21:10,860 If we would like to make a principal component analysis in GenAlEx, the first step is the calculation of genetic distances not on the population level but on the individual level. 137 00:21:11,740 --> 00:21:23,220 We need to go back to the original dataset and now we need to calculate genetic distances on the individual level. 138 00:21:23,300 --> 00:21:37,880 For this purpose, we use the option Distance and then please click on Genetic. You don't need to change the parameters which are set, 139 00:21:37,880 --> 00:21:50,200 just click on OK, and now we can make the principal component analysis by selecting the option PCoA and then analysis. 140 00:21:50,900 --> 00:22:01,040 We can change the type of methodological approach, but right now this is not so important. 141 00:22:02,040 --> 00:22:17,955 We can leave it as it is. What is good to change is the graph options, because it will be good to have different colors for animals from different populations. 142 00:22:17,955 --> 00:22:30,100 We can just select color code pops and then click on OK. Here are the results from the principal component analysis. 143 00:22:30,560 --> 00:22:38,976 In the first part of the Excel sheet you can see general information about the dataset and the procedure which was performed. 144 00:22:39,168 --> 00:22:48,560 Then in the table below, you have information about the proportion of variance which is explained by the axes. 145 00:22:48,680 --> 00:23:02,756 Usually the first and second axes are used for visualization because these two axes generally describe a really high percentage of the variation in the dataset. 
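How a percentage of variance per axis is obtained can be sketched in Python. The sketch assumes that each axis has an eigenvalue and that the share of an axis is its eigenvalue divided by the sum of all positive eigenvalues; how the program treats negative eigenvalues is not covered in the video, so that part is an assumption. The eigenvalues below are invented and are not the results for the Testset file.

```python
def explained_variance(eigenvalues):
    """Percentage of variance per axis and cumulative percentage.

    Only positive eigenvalues are counted (an assumption of this sketch);
    axes are returned from the largest to the smallest eigenvalue.
    """
    positive = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        raise ValueError("no positive eigenvalues")
    percents = [100 * v / total for v in positive]
    cumulative = []
    running = 0.0
    for percent in percents:
        running += percent
        cumulative.append(running)
    return percents, cumulative


# Invented eigenvalues for four axes (illustration only).
percents, cumulative = explained_variance([4.0, 2.0, 1.0, 1.0])
print(percents)        # share of each axis
print(cumulative[1])   # share of the first two axes together
```

The second cumulative value is the number to look at when only the first two axes are plotted.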
146 00:23:02,828 --> 00:23:12,560 You can also see here that the first and second axes describe 42.04% of the variance in the dataset. 147 00:23:13,780 --> 00:23:29,840 What do we see in the figure? We can see that our populations are genetically separated: population number 2 is here and populations number 1 and 3 are on the left. 148 00:23:31,520 --> 00:23:47,260 This confirms the results from Nei's genetic distances and Wright's Fst index, because based on these two matrices we know that there is a closer connection between population 1 and population 3. 149 00:23:49,320 --> 00:23:52,656 This is all from this practical part. 150 00:23:52,848 --> 00:24:08,524 If you have more questions about the program GenAlEx or the coding of data, please write me an email. You can find my email address in the presentation from the theoretical part. 151 00:24:08,692 --> 00:24:10,140 Thank you for your attention.