1
00:00:03,840 --> 00:00:04,464
Hello.

2
00:00:04,582 --> 00:00:19,090
In this video I would like to explain how to calculate genetic diversity parameters, as a practical example for the presentation in module 2.

3
00:00:19,910 --> 00:00:24,010
For this purpose we will use a database of genotype information.

4
00:00:24,710 --> 00:00:29,370
The name of the file that contains the database is Testset.

5
00:00:30,150 --> 00:00:44,650
This database includes information about five microsatellite markers, which were genotyped for 15 animals originating from three subpopulations.

6
00:00:44,810 --> 00:00:48,910
For the calculation we will need the program GenAlEx.

7
00:00:49,370 --> 00:00:58,544
Our first task is to calculate average observed and expected heterozygosity, effective allele number and Wright's

8
00:00:58,544 --> 00:01:11,046
fixation index Fis across markers and subpopulations in the dataset, and also to describe the level of diversity that results from all analyzed parameters.

9
00:01:11,238 --> 00:01:22,650
The second task is to calculate F-statistics, that is the Fis, Fit and Fst indexes, and to describe the level of diversity in the dataset.

10
00:01:23,030 --> 00:01:42,268
The third task is the evaluation of genetic distances across subpopulations, which are derived from Wright's fixation index and Nei's genetic distances, and the fourth task is related to the state of diversity,

11
00:01:42,268 --> 00:01:57,150
the explanation of the level of diversity within and across breeds, and the analysis of genetic differentiation in the dataset based on principal component analysis.

12
00:01:58,330 --> 00:02:06,280
First, I would like to explain the organization of the data in the Testset file.

13
00:02:06,580 --> 00:02:09,524
The type of this file is Genepop.

14
00:02:09,652 --> 00:02:13,720
If you open this file, you can see several pieces of information.

15
00:02:14,620 --> 00:02:18,892
In the first row you have the name of the dataset.

16
00:02:18,996 --> 00:02:31,840
You can use any other name. If you want, you can use, for example, the name of the breed or the name of the location, and so on.

17
00:02:32,180 --> 00:02:38,892
In the next few rows you can see the IDs of the genetic markers.

18
00:02:38,996 --> 00:02:42,868
In this case, the IDs are generic.

19
00:02:43,004 --> 00:02:50,720
That means the name of the first genetic marker is loci 1, the name of the second one is loci 2, and so on.

20
00:02:51,020 --> 00:02:54,680
In the next row you have the abbreviation POP.

21
00:02:55,060 --> 00:03:08,660
This row is used to separate the names of the markers from the genotype information for the animals in population number one.

22
00:03:09,040 --> 00:03:16,060
But you also need to use this row if you would like to differentiate between two different populations.

23
00:03:16,600 --> 00:03:24,020
These animals belong to population number one, these animals belong to population number two, and so on.

24
00:03:24,800 --> 00:03:37,320
In this row you have information about the animal ID, in this case A1, and then information about the genotypes for microsatellite genetic markers.

25
00:03:37,820 --> 00:03:52,080
In the first column you have the genotype for the first genetic marker, in the second column for the second genetic marker, in the third column for the third genetic marker, and so on.

26
00:03:52,940 --> 00:04:03,240
Also in this case, each animal has two alleles in the genotype, and each allele is encoded by three digits.

27
00:04:03,540 --> 00:04:13,936
You can see, for example, in this row that this is the genotype for the first genetic marker for animal 1 in population 1.

28
00:04:14,128 --> 00:04:21,340
The first allele is encoded as 262 and the second one is encoded as 264.

29
00:04:21,880 --> 00:04:27,856
For this marker, the first animal from the first population is heterozygous.

30
00:04:27,928 --> 00:04:37,980
If we go to the second one, we can see that for the second genetic marker the first animal from the first population is homozygous.

31
00:04:38,650 --> 00:04:56,510
If you would like to analyze biallelic genetic markers such as single nucleotide polymorphisms, you can basically use only one digit to encode each allele,

32
00:04:57,850 --> 00:05:06,226
that means you will have two digits in the genotype, not six as in this case. But that is up to you.

33
00:05:06,418 --> 00:05:11,830
Please encode alleles using numbers only, not letters.
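The file layout described above can be read with a few lines of Python. This is only a sketch under stated assumptions: it expects one marker ID per row, a comma between the animal ID and its genotypes (as in the standard Genepop layout), and three digits per allele. The function names and the example rows are hypothetical; only the genotype 262264 for animal A1 comes from the video.

```python
def split_genotype(code, digits=3):
    """Split a genotype string such as '262264' into its two allele codes."""
    if len(code) != 2 * digits or not code.isdigit():
        raise ValueError(f"unexpected genotype code: {code!r}")
    return code[:digits], code[digits:]


def parse_genepop(text, digits=3):
    """Return (title, marker IDs, populations) from Genepop-formatted text.

    Each population is a list of (animal_id, [(allele1, allele2), ...]).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title, loci, pops = lines[0], [], []
    for line in lines[1:]:
        if line.upper() == "POP":
            pops.append([])            # a POP row opens a new population
        elif not pops:
            loci.append(line)          # rows before the first POP are marker IDs
        else:
            animal_id, genotypes = line.split(",", 1)
            pairs = [split_genotype(g, digits) for g in genotypes.split()]
            pops[-1].append((animal_id.strip(), pairs))
    return title, loci, pops


# Hypothetical two-marker excerpt, not the real Testset file.
example = """Testset
loci1
loci2
POP
A1 , 262264 180180
A2 , 262262 180184
POP
B1 , 264266 184184
"""

title, loci, pops = parse_genepop(example)
first_animal, genotypes = pops[0][0]
for locus, (a, b) in zip(loci, genotypes):
    state = "homozygous" if a == b else "heterozygous"
    print(first_animal, locus, a, b, state)
```

For biallelic markers coded with one digit per allele, the same sketch would be called with digits=1.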

34
00:05:13,890 --> 00:05:27,602
If you would like, you can run the program GenAlEx directly from the folder, because you don't need to install it, but you can also very easily download the program from the Internet.

35
00:05:27,706 --> 00:05:37,594
Just open Google, type the name of the program, open the first link, and in the Download section you can see that

36
00:05:37,594 --> 00:05:53,470
here you can find the program, together with a short manual and an explanation of the formulas that are used for the calculation of genetic diversity parameters.

37
00:05:55,570 --> 00:06:09,570
As I said, you don't need to install the program. If you want to open it, double-click on the program icon and then, this is important, you need to enable macros.

38
00:06:13,070 --> 00:06:18,222
Now the program is starting. Where do you find the program?

39
00:06:18,366 --> 00:06:34,878
You find the program directly here, or you just open Add-ins, and inside Add-ins you can see the name of the program and all of the options that are included in it.

40
00:06:35,054 --> 00:06:40,350
You just open it by clicking on the name of the program.

41
00:06:41,610 --> 00:06:47,490
The first step is importing the data into the program.

42
00:06:47,610 --> 00:06:54,306
As I said, the type of the database which we will use for the calculation is Genepop.

43
00:06:54,458 --> 00:06:56,826
So please click on Genepop.

44
00:06:57,018 --> 00:07:02,378
In this window you can select the type of the data.

45
00:07:02,474 --> 00:07:04,922
We know that we have codominant data.

46
00:07:05,026 --> 00:07:09,494
We can also change the source of data.

47
00:07:09,582 --> 00:07:13,130
But right now we will use a single Genepop file.

48
00:07:13,630 --> 00:07:22,926
Just click on OK, and now you need to select the location of your database on your computer.

49
00:07:22,998 --> 00:07:30,810
In my case, the database is stored in a folder on the desktop.

50
00:07:45,720 --> 00:07:49,820
Double click on the name of the database or the name of the file

51
00:07:50,920 --> 00:07:57,432
and now the program is asking you if you would like to save this Excel sheet.

52
00:07:57,496 --> 00:07:59,896
If not, just click on Cancel.

53
00:08:00,088 --> 00:08:06,960
It is very easy to repeat the analysis because the procedure is not time-consuming.

54
00:08:08,300 --> 00:08:14,444
After importing the data, you can see that you have several pieces of information in the Excel sheet.

55
00:08:14,532 --> 00:08:25,548
In the first row you have information about the number of markers in the database, then number of animals in the database

56
00:08:25,644 --> 00:08:29,600
and the number of subpopulations in the database.

57
00:08:30,060 --> 00:08:38,780
In the next columns of this row, you have information about the number of animals within each subpopulation.

58
00:08:40,000 --> 00:08:43,232
That means in the first subpopulation we have five animals.

59
00:08:43,296 --> 00:08:45,864
In the second one, we also have five animals

60
00:08:45,912 --> 00:08:47,780
and also in the third one.

61
00:08:48,240 --> 00:08:58,740
In the second row, you have the name of the dataset, the type of import and also the names of the populations.

62
00:08:59,360 --> 00:09:06,808
If you would like, you can change the names of the populations to the names of the breeds that you analyze.

63
00:09:06,944 --> 00:09:09,940
But right now this is not so important.

64
00:09:10,440 --> 00:09:17,936
In the third row you have information about the data which are below this row.

65
00:09:18,008 --> 00:09:23,540
That means in this column we have information about the animal ID.

66
00:09:23,880 --> 00:09:30,940
In this one, we have information about the population from which animals are coming,

67
00:09:31,480 --> 00:09:38,020
and in the next columns we have information about the genotypes of animals.

68
00:09:38,420 --> 00:09:52,320
This is the genotype of the first animal for loci 1, then the genotype of the first animal for loci 2, then the genotype of the first animal for loci 3, and so on.

69
00:09:53,500 --> 00:09:57,396
Now we would like to run the analysis. How do we do it?

70
00:09:57,468 --> 00:10:05,564
Please click on Add-ins and select the name of the program. Because observed and expected heterozygosity,

71
00:10:05,564 --> 00:10:15,840
effective number of alleles and also Wright's fixation index are parameters that are calculated from allele frequencies,

72
00:10:16,000 --> 00:10:20,420
in the first step we need to select the option Frequency.

73
00:10:21,400 --> 00:10:26,552
You can see that the program asks you about the type of data.

74
00:10:26,616 --> 00:10:30,672
We know that in our case the type of data is codominant.

75
00:10:30,816 --> 00:10:39,420
It also tells you how many markers you have in the database, how many samples and how many subpopulations.

76
00:10:40,010 --> 00:10:48,630
Click on OK and now you can select the type of analysis which you would like to perform.

77
00:10:49,890 --> 00:11:00,390
We would like to calculate heterozygosity (observed and expected), Wright's statistics and also effective number of alleles.

78
00:11:00,890 --> 00:11:09,756
For this task, please select the option Het, Fstat, Poly by pop. Because we know that we also need to calculate

79
00:11:09,756 --> 00:11:20,060
Nei's genetic distances and Wright's Fst index to quantify the level of genetic differentiation in the population or metapopulation,

80
00:11:20,760 --> 00:11:32,020
we also need to select, in the third part of this window, the option Nei's distance and Fst. Please click on it and then OK.

81
00:11:34,500 --> 00:11:39,400
Now you can see that the program has calculated the results.

82
00:11:39,860 --> 00:11:47,480
The results are stored in the sheets HFP, NeiP and FstP.

83
00:11:49,060 --> 00:11:53,960
Please select the HFP sheet.

84
00:11:54,260 --> 00:12:00,440
Here you can see in the first row the type of analysis or the name of the analysis.

85
00:12:01,420 --> 00:12:11,700
In this sheet you have information about heterozygosity, F statistics and polymorphism by population for codominant data.

86
00:12:12,160 --> 00:12:24,660
In this part you have only a description of the dataset: its name, the number of loci (or genetic markers), the number of samples and the number of subpopulations.

87
00:12:25,160 --> 00:12:28,100
In this row you have information

88
00:12:30,390 --> 00:12:35,854
about the abbreviations that are used in the table below.

89
00:12:36,022 --> 00:12:43,490
For example, N means sample size, Na means number of alleles.

90
00:12:44,630 --> 00:13:00,290
Ne means effective number of alleles, I is information index, Ho is observed heterozygosity, He is expected heterozygosity and F is Wright's fixation index Fis.
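To make these abbreviations concrete, here is a small Python sketch that computes them for one marker in one population. It assumes the usual textbook definitions (He = 1 - sum of squared allele frequencies, Ne = 1 / sum of squared allele frequencies, I as Shannon's index with the natural logarithm, F = (He - Ho) / He). The function name and the genotypes are hypothetical, and the manual supplied with the program remains the reference for the formulas it actually applies.

```python
from collections import Counter
from math import log


def locus_diversity(genotypes):
    """Diversity statistics for one marker in one population.

    genotypes: list of (allele1, allele2) tuples, one per animal.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    ho = sum(a != b for a, b in genotypes) / n   # observed heterozygosity
    he = 1 - sum_p2                              # expected heterozygosity
    return {
        "N": n,                                  # sample size
        "Na": len(counts),                       # number of different alleles
        "Ne": 1 / sum_p2,                        # effective number of alleles
        "I": -sum(p * log(p) for p in freqs),    # Shannon's information index
        "Ho": ho,
        "He": he,
        "F": (he - ho) / he if he > 0 else float("nan"),  # fixation index
    }


# Hypothetical genotypes for five animals at one marker.
sample = [("262", "264"), ("262", "262"), ("264", "266"),
          ("262", "264"), ("264", "264")]
stats = locus_diversity(sample)
print({name: round(value, 3) for name, value in stats.items()})
```

In this made-up sample three of the five animals are heterozygous, so Ho is slightly above He and F comes out slightly negative.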

91
00:13:00,990 --> 00:13:11,390
In this table you can see the values that were calculated separately for each marker and population.

92
00:13:12,170 --> 00:13:30,282
But what we really need are the average values for each population across all genetic markers evaluated. Hence, this table is more important for us.

93
00:13:30,466 --> 00:13:40,975
In this table you can see the number of animals in each population, the average number of alleles, the average effective

94
00:13:40,975 --> 00:13:53,040
number of alleles, average observed heterozygosity, average expected heterozygosity and average Wright's fixation index Fis.

95
00:13:54,860 --> 00:14:09,120
For the practical example, we need to speak about the observed heterozygosity, expected heterozygosity, effective number of alleles and Wright's fixation index Fis.

96
00:14:09,700 --> 00:14:24,028
If we compare average values for our three populations, we can see that population 1 reached the highest average effective number of alleles

97
00:14:24,084 --> 00:14:34,198
that means that, if we are speaking only about the effective number of alleles, the level of diversity is highest in this population.

98
00:14:34,374 --> 00:14:47,050
But when we check observed heterozygosity, we can see that, based on average values, the value is highest in population number 2.

99
00:14:47,950 --> 00:14:51,254
In the case of expected heterozygosity,

100
00:14:51,382 --> 00:14:57,370
on the other hand, the value is highest in population number 1.

101
00:14:58,540 --> 00:15:09,388
But if we check all of the heterozygosity values, we can see that each value is higher than 0.5,

102
00:15:09,484 --> 00:15:19,160
that means each population has a higher proportion of heterozygous genotypes than homozygous ones.

103
00:15:20,460 --> 00:15:25,116
Wright's fixation index can take values from -1 to 1.

104
00:15:25,268 --> 00:15:33,804
A negative value means that your population has an excess of heterozygous genotypes compared to the expected heterozygosity, and conversely,

105
00:15:33,804 --> 00:15:43,660
if you have values greater than 0, that is positive values, your population has an excess of homozygous genotypes.

106
00:15:44,200 --> 00:16:04,946
In our case, the values of Wright's Fis index confirm that the level of diversity is really highest in the case of population number 2 compared to population number 1 and population number 3.

107
00:16:05,058 --> 00:16:12,162
But we can see that the value is also negative in the case of population number 3.

108
00:16:12,226 --> 00:16:23,162
This means that the proportion of heterozygous animals is higher. In the case of population number 1, the average value of Wright's Fis index is close to zero

109
00:16:23,226 --> 00:16:30,050
that means the proportion of homozygous and heterozygous genotypes is relatively balanced.

110
00:16:31,550 --> 00:16:44,650
This program is very good at summary statistics because it also provides the total average across all populations and all genetic markers evaluated.

111
00:16:45,510 --> 00:16:51,342
The second task was the calculation of Wright's F indexes.

112
00:16:51,406 --> 00:17:01,298
You can find information about the average value of Wright's fixation indexes in this part of the results. You can

113
00:17:01,298 --> 00:17:15,790
see that even though the proportion of homozygous genotypes is higher in the metapopulation, within the populations we actually observe a higher proportion of heterozygous genotypes.

114
00:17:16,850 --> 00:17:25,250
According to the Fst value, we can say that the populations are well differentiated from each other.
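The relationship between the three F indexes can be sketched in Python for a single marker. This assumes the common heterozygosity-based definitions (Fis from mean Ho and mean He within populations, Fit and Fst from the total expected heterozygosity Ht), equal weighting of populations and no averaging across markers. The function names and genotypes are hypothetical, and the program's own output may differ in detail.

```python
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies for one marker in one population."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * len(genotypes)
    return {allele: c / total for allele, c in counts.items()}


def f_statistics(populations):
    """Fis, Fit and Fst for one marker across several populations.

    populations: list of populations, each a list of (allele1, allele2).
    """
    k = len(populations)
    freqs = [allele_freqs(pop) for pop in populations]

    # Mean observed and expected heterozygosity within populations.
    mean_ho = sum(sum(a != b for a, b in pop) / len(pop) for pop in populations) / k
    mean_he = sum(1 - sum(p * p for p in f.values()) for f in freqs) / k

    # Total expected heterozygosity from allele frequencies averaged over populations.
    alleles = set().union(*freqs)
    ht = 1 - sum((sum(f.get(a, 0.0) for f in freqs) / k) ** 2 for a in alleles)

    return {
        "Fis": (mean_he - mean_ho) / mean_he,
        "Fit": (ht - mean_ho) / ht,
        "Fst": (ht - mean_he) / ht,
    }


# Hypothetical genotypes for two small populations at one marker.
pop_a = [("262", "264"), ("262", "262"), ("262", "264")]
pop_b = [("266", "264"), ("266", "266"), ("264", "266")]
result = f_statistics([pop_a, pop_b])
print({name: round(value, 3) for name, value in result.items()})
```

The made-up data give a negative Fis together with a positive Fst, the same kind of pattern described for the practical example: more heterozygotes than expected within populations, yet clear differentiation between them.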

115
00:17:26,510 --> 00:17:35,030
When we check the information below this table, we can also see the percentage of polymorphic loci and other

116
00:17:35,030 --> 00:17:42,930
information related to the methodology for calculation of parameters which are included in this sheet.

117
00:17:43,390 --> 00:17:50,890
Now we can go to the third Excel sheet, named NeiP.

118
00:17:51,450 --> 00:17:55,314
In this Excel sheet you can find two matrices.

119
00:17:55,402 --> 00:18:04,230
The first one is the genetic distance matrix and the second one is the genetic identity matrix.

120
00:18:04,690 --> 00:18:23,580
For our practical example, we need mainly information from the first matrix. Based on the matrix, we can see which population is more differentiated or the most differentiated from others in our dataset.

121
00:18:23,660 --> 00:18:38,388
This value tells us the genetic distance between population 1 and population 1. The value is zero,

122
00:18:38,484 --> 00:18:43,000
which logically means that these two populations are the same.

123
00:18:44,330 --> 00:18:52,350
If Nei's genetic distance is zero, that means the populations are genetically the same.

124
00:18:52,770 --> 00:19:00,150
If the value is close to one or higher, that means that the populations are genetically different.

125
00:19:01,170 --> 00:19:12,960
You can see that here we can also find values higher than 1, and this is mainly due to the formula used for the calculation of Nei's genetic distances.

126
00:19:13,540 --> 00:19:23,652
There are several methods that are modifications of Nei's standard genetic distance.

127
00:19:23,796 --> 00:19:29,320
This is also the reason why the maximum value in this case is not 1.
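Why the distance can exceed 1 is easiest to see from the formula for Nei's standard genetic distance, D = -ln(I), where I is the genetic identity shown in the second matrix. The Python sketch below assumes that standard formula; the function name and the allele frequencies are hypothetical, and the modified distances mentioned above are not covered.

```python
from math import log, sqrt


def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance from per-marker allele frequencies.

    freqs_x, freqs_y: one dict per marker mapping allele -> frequency.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    identity = jxy / sqrt(jx * jy)      # Nei's genetic identity, between 0 and 1
    return -log(identity) if identity > 0 else float("inf")


# Hypothetical allele frequencies for one marker in two populations.
pop_x = [{"262": 0.9, "264": 0.1}]
pop_y = [{"264": 0.1, "266": 0.9}]

print(nei_distance(pop_x, pop_x))   # a population compared with itself
print(nei_distance(pop_x, pop_y))   # two populations sharing few alleles
```

Because the identity is passed through a logarithm, the distance has no upper limit: the fewer alleles two populations share, the closer the identity gets to zero and the larger the distance becomes.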

128
00:19:30,140 --> 00:19:46,980
Of our three populations, we can say that population 1 and population 3 are genetically closely connected, and the highest genetic distance is between population 1 and population 2.

129
00:19:47,920 --> 00:19:59,260
We can compare Nei's genetic distances with Wright's Fst matrix to see whether the results are the same or whether there is some difference.

130
00:20:00,280 --> 00:20:03,904
For Wright's Fst index, the same approach

131
00:20:03,952 --> 00:20:10,260
applies to the interpretation of the results,

132
00:20:10,760 --> 00:20:24,040
that means if the value is 0, the populations are genetically similar, and if the value is 1, the populations are completely genetically different.

133
00:20:25,020 --> 00:20:30,204
We can say that Wright's Fst matrix shows us the same results.

134
00:20:30,332 --> 00:20:42,720
That means population 1 and population 3 are genetically connected and the highest genetic distance is between population 1 and population 2.

135
00:20:44,120 --> 00:20:57,176
The last task in the practical example is the estimation of genetic differentiation based on principal component analysis.

136
00:20:57,368 --> 00:21:10,860
If we would like to run principal component analysis in GenAlEx, the first step is the calculation of genetic distances not at the population level but at the individual level.

137
00:21:11,740 --> 00:21:23,220
We need to go back to the original dataset, and now we need to calculate genetic distances at the individual level.

138
00:21:23,300 --> 00:21:37,880
For this purpose, we use the option Distance and then click on Genetic. You don't need to change the preset parameters,

139
00:21:37,880 --> 00:21:50,200
just click on OK, and now we can run the principal component analysis by selecting the option PCoA and then Analysis.

140
00:21:50,900 --> 00:22:01,040
We can change the type of methodological approach, but right now this is not so important.

141
00:22:02,040 --> 00:22:17,955
We can leave it as it is. What is worth changing is the graph options, because it is good to have different colors for animals from different populations.

142
00:22:17,955 --> 00:22:30,100
We can just select color code pops and then click on OK. Here are the results from the principal component analysis.

143
00:22:30,560 --> 00:22:38,976
In the first part of the Excel sheet you can see general information about the dataset and the procedure that was performed.

144
00:22:39,168 --> 00:22:48,560
Then, in the table below, you have information about the proportion of variance that is explained by each axis.

145
00:22:48,680 --> 00:23:02,756
Usually the first and second axes are used for visualization, because these two axes generally describe a high percentage of the variation in the dataset.

146
00:23:02,828 --> 00:23:12,560
You can also see here that the first and second axes describe 42.04% of the variance in the dataset.

147
00:23:13,780 --> 00:23:29,840
In the figure, we can see that our populations are genetically separated: population number 2 is here, and populations number one and three are on the left.

148
00:23:31,520 --> 00:23:47,260
This confirms the results from Nei's genetic distances and Wright's Fst index, because based on these two matrices we know that there is a closer connection between population 1 and population 3.

149
00:23:49,320 --> 00:23:52,656
That is all for this practical part.

150
00:23:52,848 --> 00:24:08,524
If you have more questions about the program GenAlEx or the coding of data, please write me an email. You can find my email address in the presentation from the theoretical part.

151
00:24:08,692 --> 00:24:10,140
Thank you for your attention.