<pre><div class="text_to_html">1
00:00:01,000 --&gt; 00:00:01,480
Hello,

2
00:00:01,546 --&gt; 00:00:05,090
I would like to welcome you
to a presentation on the topic

3
00:00:05,160 --&gt; 00:00:10,130
&quot;Fine-scale analysis of population
structure based on genomic data

4
00:00:10,200 --&gt; 00:00:15,090
and quantification of selection effect
on livestock genome&quot;, which was prepared

5
00:00:15,160 --&gt; 00:00:19,050
for the third degree of education.

6
00:00:19,120 --&gt; 00:00:25,200
This presentation is part of the ISAGREED
project, which is supported

7
00:00:25,266 --&gt; 00:00:28,090
by the European Union.

8
00:00:28,160 --&gt; 00:00:33,760
This presentation belongs to module
number 2: Conservation and sustainable

9
00:00:33,826 --&gt; 00:00:37,170
use of animal genetic resources.

10
00:00:37,240 --&gt; 00:00:41,640
My name is Nina Moravč&iacute;kov&aacute;.
This presentation was also

11
00:00:41,706 --&gt; 00:00:44,010
prepared by Professor Kasarda.

12
00:00:44,080 --&gt; 00:00:48,280
We work at the Slovak University
of Agriculture in Nitra,

13
00:00:48,346 --&gt; 00:00:51,680
at the Faculty of Agrobiology
and Food Resources, Institute

14
00:00:51,746 --&gt; 00:00:55,610
of Nutrition and Genomics.

15
00:00:55,680 --&gt; 00:01:02,640
This presentation is divided into four
parts: quality control of genomic data,

16
00:01:02,706 --&gt; 00:01:06,050
approaches and tools for population
structure analysis,

17
00:01:06,120 --&gt; 00:01:10,960
approaches and tools for evaluating
the impact of selection on the livestock

18
00:01:11,026 --&gt; 00:01:15,160
genome, and the last part is functional annotation

19
00:01:15,226 --&gt; 00:01:20,570
of regions significantly
affected by selection pressure.

20
00:01:20,640 --&gt; 00:01:27,010
Quality control of genomic data is a really important step

21
00:01:27,080 --&gt; 00:01:30,160
before any type of analysis.

22
00:01:30,226 --&gt; 00:01:34,200
I would like to speak mainly about

23
00:01:34,266 --&gt; 00:01:38,970
the quality control of data which is
related to the genomic data

24
00:01:39,040 --&gt; 00:01:42,370
obtained by using SNP chips.

25
00:01:42,440 --&gt; 00:01:46,770
If we have incorrect or low-quality data,

26
00:01:46,840 --&gt; 00:01:49,560
this can usually lead to errors

27
00:01:49,626 --&gt; 00:01:56,760
in analysis and mainly errors related
to the interpretation of results.

28
00:01:57,400 --&gt; 00:02:04,040
Data quality indicators which are usually
used are call rate of SNP markers overall

29
00:02:04,106 --&gt; 00:02:10,920
in the meta-population,
and then also call rate of SNP markers

30
00:02:10,986 --&gt; 00:02:14,130
within individuals in the population,

31
00:02:14,200 --&gt; 00:02:20,160
then the minor allele
frequency, and also deviation

32
00:02:20,226 --&gt; 00:02:23,090
from the Hardy-Weinberg equilibrium.

33
00:02:23,160 --&gt; 00:02:30,650
Sometimes it&#039;s also good to apply quality
control for linkage disequilibrium.

34
00:02:30,720 --&gt; 00:02:37,440
The type of quality control which is used
before the analysis depends mainly

35
00:02:37,506 --&gt; 00:02:45,010
on the type of the analysis and also
the main objective of the analysis.

36
00:02:45,080 --&gt; 00:02:51,160
On this slide, you can see
the standard quality control which is used if

37
00:02:51,226 --&gt; 00:02:56,050
we would like to analyze
population genetic structure.

38
00:02:56,120 --&gt; 00:03:02,410
Usually, this quality control of genomic
data covers call rate across SNPs

39
00:03:02,480 --&gt; 00:03:08,960
and across animals,
whose minimum value is usually set to 90%,

40
00:03:09,026 --&gt; 00:03:12,210
and then also minor allele frequency.

41
00:03:12,280 --&gt; 00:03:16,880
The minimum value for minor allele
frequency is based on the

42
00:03:16,946 --&gt; 00:03:20,690
Mendelian Inheritance Law.

43
00:03:20,760 --&gt; 00:03:25,570
Then we also apply the
Hardy-Weinberg equilibrium test.
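
A minimal Python sketch of the call rate and minor allele frequency indicators described above. It assumes genotypes are stored as an animals-by-SNPs NumPy array coded 0, 1, 2 (copies of one allele) with NaN for missing calls. The function name qc_summary, the genotype coding, and the toy data are assumptions made for illustration; this is not how PLINK or the study on the slide implements the filters.

```python
import numpy as np

def qc_summary(genotypes):
    """Return per-SNP call rate, per-animal call rate, and per-SNP MAF.

    genotypes: animals x SNPs array coded 0/1/2, with np.nan for missing calls
    (this coding is an assumption made for the example).
    """
    called = ~np.isnan(genotypes)                      # True where a genotype was called
    snp_call_rate = called.mean(axis=0)                # share of animals called at each SNP
    animal_call_rate = called.mean(axis=1)             # share of SNPs called in each animal
    allele_freq = np.nanmean(genotypes, axis=0) / 2.0  # frequency of the counted allele
    maf = np.minimum(allele_freq, 1.0 - allele_freq)   # minor allele frequency
    return snp_call_rate, animal_call_rate, maf

# Toy data: 4 animals, 3 SNPs
toy = np.array([[0, 1, 2],
                [0, np.nan, 2],
                [1, 1, 2],
                [0, 1, np.nan]], dtype=float)
snp_cr, animal_cr, maf = qc_summary(toy)
keep_snps = np.greater_equal(snp_cr, 0.90)   # SNPs passing a 90% call-rate filter
```

In this toy example only the first SNP passes the 90% call-rate filter, and the third SNP has a minor allele frequency of zero because every called genotype is the same.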

44
00:03:25,640 --&gt; 00:03:32,040
Sometimes it&#039;s also good to control
the level of linkage disequilibrium across

45
00:03:32,106 --&gt; 00:03:37,360
SNPs because if we would like to analyze population

46
00:03:37,426 --&gt; 00:03:41,050
structure, it will be good to have only

47
00:03:41,120 --&gt; 00:03:45,130
information about neutral genetic markers.

48
00:03:45,200 --&gt; 00:03:52,330
For this purpose, we can use several
types of programs and web-based tools.

49
00:03:52,400 --&gt; 00:03:55,690
For example, we can use the program PLINK.

50
00:03:55,760 --&gt; 00:04:00,800
On the right side of this slide,
you can see a graphical visualization

51
00:04:00,866 --&gt; 00:04:07,160
of quality control of SNP chip
data in the case of horses.

52
00:04:07,920 --&gt; 00:04:14,920
Which types of analysis can we perform when
we are speaking about population

53
00:04:14,986 --&gt; 00:04:20,050
structure and the use of SNP data?

54
00:04:20,120 --&gt; 00:04:25,610
We can analyze genetic differentiation
within and between populations,

55
00:04:25,680 --&gt; 00:04:31,970
then we can also evaluate or estimate
the degree of genetic admixture

56
00:04:32,040 --&gt; 00:04:36,640
within or between them,
as well as changes in their gene pool

57
00:04:36,706 --&gt; 00:04:42,610
which have arisen, for example, due
to selection, migration, or genetic drift.

58
00:04:42,680 --&gt; 00:04:47,240
But we can also estimate other parameters, for example,

59
00:04:47,306 --&gt; 00:04:52,570
the genomic relationship matrix, and based
on the results, optimize mating plans.

60
00:04:52,640 --&gt; 00:04:58,080
The most common methods which can
be used for the analysis of population

61
00:04:58,146 --&gt; 00:05:01,730
structure are calculation of Wright&#039;s FST index,

62
00:05:01,800 --&gt; 00:05:05,770
calculation of genetic distance
and relationship matrices,

63
00:05:05,840 --&gt; 00:05:10,120
principal component analysis,
discriminant analysis of principal

64
00:05:10,186 --&gt; 00:05:14,330
components, and Bayesian analysis
of genetic admixture and gene flow

65
00:05:14,400 --&gt; 00:05:18,520
between populations,
and also construction of phylogenetic

66
00:05:18,586 --&gt; 00:05:22,170
trees and genetic networks.

67
00:05:22,240 --&gt; 00:05:27,210
Wright&#039;s fixation index FST 
is one of the most commonly used

68
00:05:27,280 --&gt; 00:05:32,360
parameters for evaluation of the degree
of genetic differentiation

69
00:05:32,426 --&gt; 00:05:34,770
between and within populations.

70
00:05:34,840 --&gt; 00:05:37,600
Its value ranges from zero to one.

71
00:05:37,666 --&gt; 00:05:42,530
If the value is equal to zero, then
the populations are genetically identical.

72
00:05:42,600 --&gt; 00:05:46,640
But if the value is equal to one,
we can say that the populations

73
00:05:46,706 --&gt; 00:05:49,930
are genetically totally different.
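
A minimal Python sketch of the two extreme cases just described, for a single biallelic SNP and two populations. It uses the textbook form FST = (HT - HS) / HT with equal population weights. The function name wright_fst and the example frequencies are assumptions made for illustration; tools such as Arlequin, Genepop, or StAMPP use their own estimators, which also correct for sample size.

```python
def wright_fst(p1, p2):
    """FST for one biallelic SNP from allele frequencies in two populations.

    Uses FST = (HT - HS) / HT with equal population weights.
    """
    p_mean = (p1 + p2) / 2.0
    h_total = 2.0 * p_mean * (1.0 - p_mean)   # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within populations
    if h_total == 0.0:                         # SNP fixed for the same allele in both populations
        return 0.0
    return (h_total - h_within) / h_total

identical = wright_fst(0.3, 0.3)   # same allele frequency in both populations
fixed = wright_fst(1.0, 0.0)       # populations fixed for different alleles
```

Here identical evaluates to zero and fixed evaluates to one, matching the two ends of the range.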

74
00:05:50,000 --&gt; 00:05:55,720
The interpretation of Wright&#039;s FST index is
relatively easy, and also the time

75
00:05:55,786 --&gt; 00:05:58,850
for the computation is relatively short.

76
00:05:58,920 --&gt; 00:06:03,160
But the FST index cannot be used
for the quantification of genetic

77
00:06:03,226 --&gt; 00:06:05,730
relationship between individuals

78
00:06:05,800 --&gt; 00:06:08,090
that is, at the individual level.

79
00:06:08,160 --&gt; 00:06:14,090
Also, if the level of diversity
in the population is low, then also

80
00:06:14,160 --&gt; 00:06:18,770
the reliability of the results
is relatively low.

81
00:06:18,840 --&gt; 00:06:24,800
For the calculation of FST index,
we can use many tools, for example,

82
00:06:24,866 --&gt; 00:06:30,080
Arlequin, Genepop, and Genalex,
but these three programs are limited

83
00:06:30,146 --&gt; 00:06:34,040
mainly in connection with the number

84
00:06:34,106 --&gt; 00:06:38,330
of SNPs for which we have genetic data.

85
00:06:38,400 --&gt; 00:06:43,850
But we can also use many R packages,
for example, StAMPP.

86
00:06:43,920 --&gt; 00:06:50,970
In the figure on the left side,
you can see a dendrogram, which was

87
00:06:51,040 --&gt; 00:06:57,810
made based on the FST matrix
for the 16 cattle breeds.

88
00:06:57,880 --&gt; 00:07:03,770
This visualization is relatively nice
because we see that we have

89
00:07:03,840 --&gt; 00:07:09,850
two genetic clusters composed of breeds
which are somehow connected

90
00:07:09,920 --&gt; 00:07:16,490
from a historical point of view or
from a phylogenetic point of view.

91
00:07:16,560 --&gt; 00:07:21,330
Relationship matrices
express genetic similarities and also

92
00:07:21,400 --&gt; 00:07:24,760
kinship between individuals
within a population.

93
00:07:24,826 --&gt; 00:07:30,840
That means these matrices can be used
for the quantification of the level of genetic

94
00:07:30,906 --&gt; 00:07:34,210
relationship between individuals.

95
00:07:34,280 --&gt; 00:07:38,600
Each element of the matrix represents
a measure of genetic similarity

96
00:07:38,666 --&gt; 00:07:41,050
between a pair of individuals.

97
00:07:41,120 --&gt; 00:07:45,840
Relationship matrices are most often
calculated based on the frequency

98
00:07:45,906 --&gt; 00:07:51,320
of alleles in the population,
while the calculation itself can be based

99
00:07:51,386 --&gt; 00:07:56,730
on various approaches, for example,
calculation of the IBD matrix

100
00:07:56,800 --&gt; 00:07:59,890
or Nei&#039;s genetic distances.

101
00:07:59,960 --&gt; 00:08:05,480
The calculation of relationship matrices
is also relatively easy,

102
00:08:05,546 --&gt; 00:08:09,800
and after calculation,
we have relatively accurate estimates

103
00:08:09,866 --&gt; 00:08:13,610
of relationship between
animals in the population.

104
00:08:13,680 --&gt; 00:08:20,040
But sometimes, if we have information
about a high number of individuals or

105
00:08:20,106 --&gt; 00:08:25,850
animals in the population,
this type of analysis is time-consuming.

106
00:08:25,920 --&gt; 00:08:30,640
For the calculation of relationship
matrices, we can use, for example,

107
00:08:30,706 --&gt; 00:08:33,890
PLINK, if you would like to calculate an IBD
matrix,

108
00:08:33,960 --&gt; 00:08:38,720
or we can also use different R packages,
for example, StAMPP, if you would like

109
00:08:38,786 --&gt; 00:08:42,090
to calculate Nei&#039;s genetic distance matrix.

110
00:08:42,160 --&gt; 00:08:46,040
On the left side,
you can see an example of the visualization

111
00:08:46,106 --&gt; 00:08:51,370
of a genetic distance matrix,
which is valid for

112
00:08:51,440 --&gt; 00:08:53,490
five breeds of dogs.

113
00:08:53,560 --&gt; 00:08:58,010
Based on the obtained result, we can say that

114
00:08:58,080 --&gt; 00:09:01,280
animals which belong to the same breed

115
00:09:01,346 --&gt; 00:09:08,930
are grouped together
and form one genetic cluster.

116
00:09:09,000 --&gt; 00:09:13,600
Another type of method which can be used
for the evaluation of population

117
00:09:13,666 --&gt; 00:09:17,490
structure is principal component analysis.

118
00:09:17,560 --&gt; 00:09:23,090
PCA is a multivariate statistical method
that decomposes a covariance matrix

119
00:09:23,160 --&gt; 00:09:28,800
of genetic data and extracts the principal
components that reflect the variability

120
00:09:28,866 --&gt; 00:09:31,290
of the data in the dataset.

121
00:09:31,360 --&gt; 00:09:36,640
For the visualization of the result,
usually the first two principal

122
00:09:36,706 --&gt; 00:09:42,170
components are used because
these first two principal components

123
00:09:42,240 --&gt; 00:09:46,810
explain the highest proportion
of variability in the dataset.
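
A minimal Python sketch of this decomposition, assuming an animals-by-SNPs genotype matrix coded 0, 1, 2 with no missing values. The function name genotype_pca and the simulated data are assumptions made for illustration. The singular value decomposition of the centred matrix is used here as a standard equivalent of decomposing the covariance matrix; PLINK and adegenet have their own implementations and scaling options.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an animals x SNPs genotype matrix via singular value decomposition."""
    centered = genotypes - genotypes.mean(axis=0)        # centre each SNP column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]      # coordinates of animals on the PCs
    explained = s ** 2 / np.sum(s ** 2)                  # share of variance per component
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(20, 50)).astype(float)    # simulated 0/1/2 genotypes
scores, explained = genotype_pca(toy)
```

The two columns of scores are what would be plotted on the x and y axes, and explained gives the proportion of variability carried by each of the two components.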

124
00:09:46,880 --&gt; 00:09:51,170
PCA provides basic information about
the genetic structure,

125
00:09:51,240 --&gt; 00:09:56,530
which is useful when testing databases
with a large number of individuals.

126
00:09:56,600 --&gt; 00:10:00,720
PCA is a time-saving method for assessing the state

127
00:10:00,786 --&gt; 00:10:04,080
of genetic differentiation.

128
00:10:04,720 --&gt; 00:10:07,640
Visualization of PCA components is

129
00:10:07,706 --&gt; 00:10:12,770
really simple and easy to interpret.

130
00:10:12,840 --&gt; 00:10:15,000
But what is the disadvantage

131
00:10:15,066 --&gt; 00:10:16,840
of PCA analysis?

132
00:10:16,906 --&gt; 00:10:18,400
It&#039;s mainly low sensitivity if

133
00:10:18,466 --&gt; 00:10:22,730
we would like to estimate the degree
of genetic admixture

134
00:10:22,800 --&gt; 00:10:26,570
within and between populations.

135
00:10:26,640 --&gt; 00:10:32,920
For the calculation of principal component
analysis, we can also use many tools,

136
00:10:32,986 --&gt; 00:10:37,840
for example, PLINK or the R package adegenet.

137
00:10:37,906 --&gt; 00:10:43,240
On the left side,
you can see an example of the visualization

138
00:10:43,306 --&gt; 00:10:48,970
of principal component analysis
in the case of 16 sheep breeds.

139
00:10:49,040 --&gt; 00:10:55,410
In the figure, you can see that by using
this method, we really found three genetic

140
00:10:55,480 --&gt; 00:10:58,250
groups, and deeper

141
00:10:58,320 --&gt; 00:11:04,570
evaluation of the groups showed us that

142
00:11:04,640 --&gt; 00:11:10,120
the obtained differentiation is connected

143
00:11:10,186 --&gt; 00:11:13,010
mainly to the origin of each breed.

144
00:11:13,080 --&gt; 00:11:16,920
Discriminant analysis of principal components is

145
00:11:16,986 --&gt; 00:11:23,800
a method of discriminant analysis,
which is usually used for the evaluation

146
00:11:23,866 --&gt; 00:11:28,210
of genetic structure between
predefined groups or clusters.

147
00:11:28,280 --&gt; 00:11:33,760
It uses PCA to reduce the dimension
of the data and then discriminant analysis

148
00:11:33,826 --&gt; 00:11:37,530
to maximize the resolution
between populations.

149
00:11:37,600 --&gt; 00:11:42,160
Discriminant analysis of principal
components provides a more accurate

150
00:11:42,226 --&gt; 00:11:46,610
representation of the genetic structure
between predefined clusters,

151
00:11:46,680 --&gt; 00:11:49,970
compared to, for example, classical PCA.

152
00:11:50,040 --&gt; 00:11:55,800
But sometimes, this analysis is sensitive
to a low level of diversity

153
00:11:55,866 --&gt; 00:11:57,850
in the population.

154
00:11:57,920 --&gt; 00:12:02,010
If we use discriminant analysis
of principal components, we can

155
00:12:02,080 --&gt; 00:12:06,080
expect relatively high accuracy
in detecting differences between

156
00:12:06,146 --&gt; 00:12:09,840
populations, and also results which are

157
00:12:09,906 --&gt; 00:12:14,520
relatively simple and easy to interpret.

158
00:12:15,440 --&gt; 00:12:21,010
On this slide, on the left side,
you can see representative results

159
00:12:21,080 --&gt; 00:12:25,050
from the discriminant analysis
of principal components.

160
00:12:25,120 --&gt; 00:12:28,570
In this case, genomic data were used

161
00:12:28,640 --&gt; 00:12:32,800
for red deer populations, seven farmed,

162
00:12:32,866 --&gt; 00:12:36,210
and two wild red deer populations.

163
00:12:36,280 --&gt; 00:12:41,890
By applying this method,
we found three clusters.

164
00:12:41,960 --&gt; 00:12:48,410
The first two clusters were composed of wild
populations, Slovak and Spanish,

165
00:12:48,480 --&gt; 00:12:55,170
and the third cluster was composed
of the populations of farmed animals.

166
00:12:55,240 --&gt; 00:13:00,000
For the calculation of discriminant
analysis of principal components, we can

167
00:13:00,066 --&gt; 00:13:03,530
use, for example, the R package adegenet.

168
00:13:03,600 --&gt; 00:13:09,960
If you would like to estimate
the proportion of genetic admixture within

169
00:13:10,026 --&gt; 00:13:14,130
the gene pool of a population,
we can use a Bayesian approach.

170
00:13:14,200 --&gt; 00:13:18,810
The Bayesian approach allows
the identification of genetic groups

171
00:13:18,880 --&gt; 00:13:23,290
and the degree of admixture within
individuals without the need

172
00:13:23,360 --&gt; 00:13:26,960
to predefine groups or clusters.

173
00:13:27,160 --&gt; 00:13:33,250
The Bayesian approach provides relatively
accurate identification of genetic

174
00:13:33,320 --&gt; 00:13:37,680
clusters, and this method is flexible
if we are speaking about

175
00:13:37,746 --&gt; 00:13:39,730
complex structures.

176
00:13:39,800 --&gt; 00:13:45,250
But mainly if we have information for a high
number of animals, this method is

177
00:13:45,320 --&gt; 00:13:48,840
time-consuming compared to others.

178
00:13:49,160 --&gt; 00:13:54,600
For analysis or for testing of degree
of genetic admixture, based on the

179
00:13:54,666 --&gt; 00:13:57,570
Bayesian approach, we can use many tools.

180
00:13:57,640 --&gt; 00:14:02,930
We can use, for example, the programs
STRUCTURE, ADMIXTURE, or fastSTRUCTURE.

181
00:14:03,000 --&gt; 00:14:06,450
On the left side, you can see representative results

182
00:14:06,520 --&gt; 00:14:12,320
from the estimation of genetic admixture
between seven farmed and two

183
00:14:12,386 --&gt; 00:14:16,090
wild populations of red deer.

184
00:14:16,160 --&gt; 00:14:23,210
Similarly to discriminant analysis
of principal components, we found that the two

185
00:14:23,280 --&gt; 00:14:29,610
wild populations from Slovakia and Spain
were totally differentiated

186
00:14:29,680 --&gt; 00:14:31,890
from farmed populations.

187
00:14:31,960 --&gt; 00:14:38,530
As you can see in the figure,
farmed populations were relatively admixed.

188
00:14:38,600 --&gt; 00:14:43,690
That means we found a relatively high degree
of admixture between

189
00:14:43,760 --&gt; 00:14:50,610
farmed populations of red deer,
mainly due to the migration of animals

190
00:14:50,680 --&gt; 00:14:55,520
and also artificial insemination.

191
00:14:55,640 --&gt; 00:15:00,120
We can also use a Bayesian approach
for estimation of gene

192
00:15:00,186 --&gt; 00:15:02,490
flow between populations.

193
00:15:02,560 --&gt; 00:15:05,770
We can use, for example, the program TreeMix.

194
00:15:05,840 --&gt; 00:15:11,760
The program TreeMix is based on allele
frequencies, and it creates phylogenetic

195
00:15:11,826 --&gt; 00:15:15,400
trees with the possibility of testing
the intensity of migration

196
00:15:15,466 --&gt; 00:15:17,610
between populations.

197
00:15:17,680 --&gt; 00:15:23,000
This method is based on maximum
likelihood and allows estimation

198
00:15:23,066 --&gt; 00:15:27,850
of phylogenetic relationships
and migration between populations.

199
00:15:27,920 --&gt; 00:15:32,880
The program TreeMix allows the detection
of the intensity of migration and gene

200
00:15:32,946 --&gt; 00:15:36,160
flow in the past,
but sometimes the reliability

201
00:15:36,226 --&gt; 00:15:42,410
of the results depends on the amount of
available genomic data as well as

202
00:15:42,480 --&gt; 00:15:48,090
on the reliability of the allele
frequency estimation.

203
00:15:48,160 --&gt; 00:15:53,760
On this slide, on the right side,
you can see results from the analysis

204
00:15:53,826 --&gt; 00:15:58,240
of gene flow intensity between red deer
populations based

205
00:15:58,306 --&gt; 00:16:00,010
on the Bayesian approach.

206
00:16:00,080 --&gt; 00:16:04,010
In this case, we used the program BayesAss,

207
00:16:04,080 --&gt; 00:16:07,880
which allows us to determine the intensity

208
00:16:07,946 --&gt; 00:16:12,130
of gene flow between
and also within populations.

209
00:16:12,200 --&gt; 00:16:18,880
This program, compared to TreeMix,
provides us information about the recent

210
00:16:18,946 --&gt; 00:16:23,200
migration rate, not the migration rate in the past.

211
00:16:24,160 --&gt; 00:16:30,890
Population structure can also be evaluated
by constructing genetic networks,

212
00:16:30,960 --&gt; 00:16:34,930
for example, by using the package NetView.

213
00:16:35,000 --&gt; 00:16:40,160
This package is a visualization tool
that uses genetic networks to show

214
00:16:40,226 --&gt; 00:16:44,090
relationships between
individuals or populations.

215
00:16:44,160 --&gt; 00:16:48,650
It creates genetic networks that show
genetic relationships

216
00:16:48,720 --&gt; 00:16:52,050
and gene flow between populations.

217
00:16:52,120 --&gt; 00:16:57,520
The NetView package is really
suitable for assessing complex

218
00:16:57,586 --&gt; 00:17:01,410
relationships as well as
the impact of migration.

219
00:17:01,480 --&gt; 00:17:07,560
Its visualization is intuitive
and suitable for displaying

220
00:17:07,626 --&gt; 00:17:10,170
admixture and differentiation.

221
00:17:10,240 --&gt; 00:17:16,130
But if we have information about a large
number of individuals,

222
00:17:16,200 --&gt; 00:17:20,490
its utilization is relatively limited.

223
00:17:20,560 --&gt; 00:17:25,400
On this slide, you can see graphical
visualization of the results of testing

224
00:17:25,466 --&gt; 00:17:30,280
three different scenarios
of development of intra-population

225
00:17:30,346 --&gt; 00:17:37,450
and inter-population genetic relationships
within 16 cattle breeds using NetView.

226
00:17:37,520 --&gt; 00:17:42,680
Compared to the results from, for example,
PCA or discriminant analysis

227
00:17:42,746 --&gt; 00:17:48,890
of principal components,
we found that animals are clustered

228
00:17:48,960 --&gt; 00:17:53,890
together if they have a common historical background.

229
00:17:53,960 --&gt; 00:17:57,680
That means if there is a really high

230
00:17:57,746 --&gt; 00:18:03,050
intensity of gene flow between them.

231
00:18:03,120 --&gt; 00:18:08,610
Another type of graphical visualization

232
00:18:08,680 --&gt; 00:18:12,530
of genetic relationships between animals

233
00:18:12,600 --&gt; 00:18:18,330
or between populations is
the construction of phylogenetic trees.

234
00:18:18,400 --&gt; 00:18:23,250
Phylogenetic trees are graphical
representations

235
00:18:23,320 --&gt; 00:18:28,690
of evolutionary relationships between
populations or species

236
00:18:28,760 --&gt; 00:18:31,250
derived from the genetic data.

237
00:18:31,320 --&gt; 00:18:37,570
They are usually used to visualize
genealogical or genetic relationships,

238
00:18:37,640 --&gt; 00:18:41,960
model evolutionary processes,
and also track population

239
00:18:42,026 --&gt; 00:18:44,730
differentiation and migration.

240
00:18:44,800 --&gt; 00:18:49,570
They can be created using a variety
of algorithms and models,

241
00:18:49,640 --&gt; 00:18:55,160
but the most commonly used models are based
on genetic distances, for example,

242
00:18:55,226 --&gt; 00:19:00,200
Nei&#039;s genetic distance,
or probabilistic models like maximum

243
00:19:00,266 --&gt; 00:19:04,120
likelihood and Bayesian methods.

244
00:19:04,560 --&gt; 00:19:09,250
For the preparation of a phylogenetic tree,
we can use different tools,

245
00:19:09,320 --&gt; 00:19:13,250
for example, SplitsTree or various R packages.

246
00:19:13,320 --&gt; 00:19:20,770
On this slide, you can see an example
of a phylogenetic tree, and this tree was

247
00:19:20,840 --&gt; 00:19:27,610
derived from Nei&#039;s genetic distance
matrix calculated for eight horse breeds.

248
00:19:27,680 --&gt; 00:19:34,160
This study was mainly oriented to the analysis
of the genetic relationship of the Slovak

249
00:19:34,226 --&gt; 00:19:39,720
Warmblood horse to other historically connected horse

250
00:19:39,786 --&gt; 00:19:44,210
breeds which can be found in Europe.

251
00:19:44,280 --&gt; 00:19:48,400
Now, we are moving to the next part of this
presentation,

252
00:19:48,466 --&gt; 00:19:52,650
which is related to the approaches
and tools that can be used

253
00:19:52,720 --&gt; 00:19:58,450
for the evaluation of the impact
of selection on the livestock genome.

254
00:19:58,520 --&gt; 00:20:02,600
Genomic regions under strong selection
pressure are usually

255
00:20:02,666 --&gt; 00:20:05,330
called selection signals.

256
00:20:05,400 --&gt; 00:20:10,130
Analysis of selection signals
distribution in the genome allows

257
00:20:10,200 --&gt; 00:20:15,560
for a better understanding of evolutionary
processes and also the impact

258
00:20:15,626 --&gt; 00:20:19,960
of domestication,
and then also the impact of natural

259
00:20:20,026 --&gt; 00:20:25,610
and intensive artificial selection
of specific genomic regions which control

260
00:20:25,680 --&gt; 00:20:31,600
preferred phenotypic traits
in terms of adaptability, resilience,

261
00:20:31,666 --&gt; 00:20:38,410
or performance of individuals,
populations, and also livestock species.

262
00:20:38,480 --&gt; 00:20:41,170
Analysis of selection signals or selection

263
00:20:41,240 --&gt; 00:20:46,440
signatures also allows us to identify

264
00:20:46,506 --&gt; 00:20:51,200
genomic regions showing a decrease or
increase in genetic variability

265
00:20:51,266 --&gt; 00:20:53,250
or genetic diversity.

266
00:20:53,320 --&gt; 00:20:58,760
In this type of analysis,
we don&#039;t need to have information

267
00:20:58,826 --&gt; 00:21:02,050
about the phenotype of animals.

268
00:21:02,120 --&gt; 00:21:06,930
Approaches and methods for evaluation
of the selection signals distribution

269
00:21:07,000 --&gt; 00:21:11,690
in the livestock genome can
be divided into two groups.

270
00:21:11,760 --&gt; 00:21:18,360
The first group of methods is
based on the evaluation of inter-population

271
00:21:18,426 --&gt; 00:21:21,210
or inter-breed differences.

272
00:21:21,280 --&gt; 00:21:26,720
The second one is a group of methods
for evaluation of variability

273
00:21:26,786 --&gt; 00:21:29,250
at the intra-population level.

274
00:21:29,320 --&gt; 00:21:34,520
In the case of the first group,
we can speak about the calculation

275
00:21:34,586 --&gt; 00:21:38,810
of Wright&#039;s FST index at the genome-wide level,

276
00:21:38,880 --&gt; 00:21:43,570
quantification of differences in linkage disequilibrium,

277
00:21:43,640 --&gt; 00:21:47,720
which is a method based on the analysis of

278
00:21:47,786 --&gt; 00:21:52,770
haplotype structure and also PCA analysis.

279
00:21:52,840 --&gt; 00:21:58,760
In the case of the second group of methods,
we can speak about the distribution

280
00:21:58,826 --&gt; 00:22:05,130
of runs of homozygosity or
heterozygosity-rich regions in the genome,

281
00:22:05,200 --&gt; 00:22:10,570
and also level of linkage disequilibrium,

282
00:22:10,640 --&gt; 00:22:16,080
RDA analysis, or Tajima&#039;s D statistics.

283
00:22:16,280 --&gt; 00:22:20,520
Similarly, as in the case of the analysis
of population structure,

284
00:22:20,586 --&gt; 00:22:25,440
also in this case, Wright&#039;s
FST is one of the most commonly used

285
00:22:25,506 --&gt; 00:22:32,450
approaches for the analysis of selection
signals distribution in the genome.

286
00:22:32,520 --&gt; 00:22:37,400
In this case, selection signals are
identified based on the differences

287
00:22:37,466 --&gt; 00:22:40,730
in allelic frequencies between
populations,

288
00:22:40,800 --&gt; 00:22:45,080
which arose as a result of, for example,
different breeding goals

289
00:22:45,146 --&gt; 00:22:47,170
or breed standards.

290
00:22:47,240 --&gt; 00:22:54,490
We can obtain two basic types of signals
if we use this approach:

291
00:22:54,560 --&gt; 00:23:00,080
different types of selection
correspond to regions represented

292
00:23:00,146 --&gt; 00:23:05,210
by several loci or SNP markers with a high
value of the FST index,

293
00:23:05,280 --&gt; 00:23:09,810
and on the other hand,
regions with a low value

294
00:23:09,880 --&gt; 00:23:13,760
represent genomic regions that were
subject to the same type

295
00:23:13,826 --&gt; 00:23:16,800
of selection in the given breeds.

296
00:23:17,400 --&gt; 00:23:22,320
The threshold value defining the signal
is usually set as 1%

297
00:23:22,386 --&gt; 00:23:25,250
of the highest FST values.
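
A minimal Python sketch of such a threshold, assuming one FST value per SNP held in a NumPy array. The function name top_fraction_snps and the toy values are assumptions made for illustration; real studies often apply the cutoff to windowed or smoothed FST values rather than to single SNPs.

```python
import numpy as np

def top_fraction_snps(values, fraction=0.01):
    """Return the cutoff and the indices of SNPs in the top `fraction` of values."""
    cutoff = np.quantile(values, 1.0 - fraction)              # 99th percentile for 1%
    selected = np.flatnonzero(np.greater_equal(values, cutoff))
    return cutoff, selected

fst = np.arange(1000) / 1000.0        # toy per-SNP FST values
cutoff, selected = top_fraction_snps(fst)
```

With these 1000 toy values, the ten SNPs with the highest FST are selected as candidate signals.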

298
00:23:25,320 --&gt; 00:23:31,330
This method is relatively simple
to calculate and is widely

299
00:23:31,400 --&gt; 00:23:34,730
used in population genetics.

300
00:23:34,800 --&gt; 00:23:39,570
But this method cannot be used if you
would like to analyze selection

301
00:23:39,640 --&gt; 00:23:43,810
signals at the intra-population level.

302
00:23:43,880 --&gt; 00:23:49,610
For the calculation of Wright&#039;s FST index
at the genome-wide level,

303
00:23:49,680 --&gt; 00:23:54,770
we can use, for example, PLINK
and, for the visualization, the program R.

304
00:23:54,840 --&gt; 00:23:58,520
On the left side,
you can see an example of the visualization

305
00:23:58,586 --&gt; 00:24:04,330
of Wright&#039;s FST distribution
in the autosomal genome.

306
00:24:04,400 --&gt; 00:24:10,450
This study was based on the genomic
data for beef cattle breeds.

307
00:24:10,520 --&gt; 00:24:16,960
What is typical for this type of study
is also a description of selection signals.

308
00:24:17,026 --&gt; 00:24:22,570
That means we usually analyze the start
and end positions of the selection signals,

309
00:24:22,640 --&gt; 00:24:29,480
protein coding genes which are located
directly or very close to the selection

310
00:24:29,546 --&gt; 00:24:36,890
signals, and also QTLs, which are located
in the region of selection signal.

311
00:24:36,960 --&gt; 00:24:42,330
In the table on the right side,
you can really see that we found

312
00:24:42,400 --&gt; 00:24:47,400
many QTLs, which were previously
associated with important

313
00:24:47,466 --&gt; 00:24:51,010
phenotypic traits in cattle.

314
00:24:51,080 --&gt; 00:24:56,560
Another approach for estimation
of selection signals distribution

315
00:24:56,626 --&gt; 00:25:02,210
in the livestock genome is an approach
which is based on the variability in

316
00:25:02,280 --&gt; 00:25:09,370
linkage disequilibrium, or we can say,
differences in linkage disequilibrium

317
00:25:09,440 --&gt; 00:25:11,330
between breeds.

318
00:25:11,400 --&gt; 00:25:15,720
In this case, I would like to speak about
the integrated haplotype score,

319
00:25:15,786 --&gt; 00:25:19,640
which is very frequently used
for the analysis of selection

320
00:25:19,706 --&gt; 00:25:23,330
signals distribution in the genome.

321
00:25:23,400 --&gt; 00:25:28,360
In this case, selection signals are
derived from a change in the linkage

322
00:25:28,426 --&gt; 00:25:32,210
disequilibrium in the genome of the evaluated breeds

323
00:25:32,280 --&gt; 00:25:39,050
and the emergence of specific haplotypes 
due to the linkage disequilibrium.

324
00:25:39,120 --&gt; 00:25:45,690
The integrated haplotype score value can be
defined simply as a measure of how

325
00:25:45,760 --&gt; 00:25:51,010
unusual a haplotype consisting
of a specific SNP marker is

326
00:25:51,080 --&gt; 00:25:53,200
compared to the rest of the genome.

327
00:25:53,266 --&gt; 00:25:58,240
The integrated haplotype score is a particularly

328
00:25:58,306 --&gt; 00:26:02,810
sensitive method for detecting the effect
of recent selection that led

329
00:26:02,880 --&gt; 00:26:06,050
to an increase in the frequency of a certain

330
00:26:06,120 --&gt; 00:26:10,960
allelic variant in a population,
but has not yet eliminated

331
00:26:11,026 --&gt; 00:26:14,080
other variants at a given locus.

332
00:26:14,160 --&gt; 00:26:20,000
The analysis begins with the calculation
of extended haplotype homozygosity,

333
00:26:20,066 --&gt; 00:26:24,250
which quantifies the decrease
in homozygosity of the haplotype

334
00:26:24,320 --&gt; 00:26:29,800
from a certain SNP marker,
and then continues with the calculation

335
00:26:29,866 --&gt; 00:26:34,880
of the integrated haplotype score value,
which is based on the logarithm

336
00:26:34,946 --&gt; 00:26:39,850
of the ratio of integrated extended
haplotype homozygosity values

337
00:26:39,920 --&gt; 00:26:43,250
for two allelic variants.
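
A minimal Python sketch of the last step, assuming the integrated extended haplotype homozygosity (iHH) has already been computed for both alleles of each SNP. The function names, the toy iHH values, and the choice of which allele goes in the numerator are assumptions made for illustration; the sign convention differs between software packages. The z-score step is a simplification, since in practice scores are usually standardized within groups of SNPs with similar allele frequency.

```python
import math

def unstandardized_ihs(ihh_first, ihh_second):
    """Log ratio of integrated EHH for the two alleles of one SNP."""
    return math.log(ihh_first / ihh_second)

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))
    return [(x - mean) / sd for x in scores]

# Toy iHH pairs (first allele, second allele) for three SNPs
pairs = [(2.0, 1.0), (1.0, 1.0), (1.0, 2.0)]
raw = [unstandardized_ihs(a, b) for a, b in pairs]
z = standardize(raw)
```

The first SNP, where the first allele sits on the longer haplotype, gets a positive score; the third gets a negative score of the same size; the second, with equal iHH for both alleles, scores zero.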

338
00:26:43,320 --&gt; 00:26:48,960
The integrated haplotype score can reach
positive values when the haplotype carrying

339
00:26:49,026 --&gt; 00:26:54,080
a single allele is longer
and has a higher extended haplotype

340
00:26:54,146 --&gt; 00:26:59,490
homozygosity, indicating a significant
effect of positive selection

341
00:26:59,560 --&gt; 00:27:04,320
or negative values
when an alternative allele has a higher

342
00:27:04,386 --&gt; 00:27:08,440
extended haplotype homozygosity, which can
also reflect selection

343
00:27:08,506 --&gt; 00:27:11,010
but in the opposite direction.

344
00:27:11,080 --&gt; 00:27:15,210
The threshold value defining the signal is set

345
00:27:15,280 --&gt; 00:27:18,250
similarly to the previous approach, for example,

346
00:27:18,320 --&gt; 00:27:24,850
as the top 1% of positive
values of the integrated haplotype score.

347
00:27:24,920 --&gt; 00:27:30,040
This approach is suitable for detecting
the effect of recent selection

348
00:27:30,106 --&gt; 00:27:35,730
and identification of signals which can
arise as a result of adaptation,

349
00:27:35,800 --&gt; 00:27:42,760
but it is also sensitive to data
quality, and if you would like to obtain

350
00:27:42,826 --&gt; 00:27:48,330
reliable estimates, you need high-quality
and robust genomic data.

351
00:27:48,400 --&gt; 00:27:52,880
For the calculation of integrated
haplotype score, we can use, for example,

352
00:27:52,946 --&gt; 00:27:57,250
the program Haploview or various R packages.

353
00:27:57,320 --&gt; 00:28:00,200
On the left side,
you can see an example

354
00:28:00,266 --&gt; 00:28:04,800
from the analysis of variability
in linkage disequilibrium in the genome

355
00:28:04,866 --&gt; 00:28:09,760
of dairy and beef cattle breeds,
and on the right side,

356
00:28:09,826 --&gt; 00:28:12,650
you can see a description of the identified

357
00:28:12,720 --&gt; 00:28:17,880
selection signals and also genes and QTLs,

358
00:28:17,946 --&gt; 00:28:23,760
which were located directly
in the region of the signals.

359
00:28:24,640 --&gt; 00:28:30,320
Evaluation of the inter-population or
interbreed differences and the following

360
00:28:30,386 --&gt; 00:28:38,090
analysis of selection signatures can
also be performed using PCA.

361
00:28:38,160 --&gt; 00:28:43,040
In this case, this analysis assumes
that the signals in the genome arose as

362
00:28:43,106 --&gt; 00:28:46,360
a result of the local adaptation
of individuals to the

363
00:28:46,426 --&gt; 00:28:49,490
environmental conditions.

364
00:28:49,560 --&gt; 00:28:55,800
PCA is, in this context,
an alternative method for identifying

365
00:28:55,866 --&gt; 00:28:59,570
selection signals to Wright&#039;s FST index.

366
00:28:59,640 --&gt; 00:29:04,880
Detection of selection signals is based
on the assumption of the existence

367
00:29:04,946 --&gt; 00:29:11,080
of a correlation between genetic variants
and principal components which reflects

368
00:29:11,146 --&gt; 00:29:15,810
the local adaptation of the population
to the production environment.

369
00:29:15,880 --&gt; 00:29:20,690
To identify selection signals,
different tests can be used,

370
00:29:20,760 --&gt; 00:29:23,690
for example, the Mahalanobis distance test.

371
00:29:23,760 --&gt; 00:29:29,370
In this case, the identification of SNP
markers showing association with positive

372
00:29:29,440 --&gt; 00:29:34,370
selection is based on the construction
of a Z-score vector

373
00:29:34,440 --&gt; 00:29:39,530
obtained by regression analysis
of the relationship between SNP markers

374
00:29:39,600 --&gt; 00:29:42,770
and the K principal components.

375
00:29:42,840 --&gt; 00:29:48,050
The threshold value which defines
the signal of selection can be,

376
00:29:48,120 --&gt; 00:29:54,250
in this case, determined, for example,
based on the false discovery rate test.
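
A minimal sketch of one common way to set such a threshold, assuming the per-SNP test statistics have already been converted to p-values and that the Benjamini-Hochberg procedure is the false discovery rate method used (the lecture does not say which one). The p-values and the function name are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests declared significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value does not exceed alpha * rank / m.
        if alpha * rank / m >= p_values[idx]:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-SNP p-values from an outlier test.
snp_p_values = [0.001, 0.20, 0.012, 0.74, 0.03, 0.004]
outlier_snps = benjamini_hochberg(snp_p_values, alpha=0.05)
print(outlier_snps)
```

All SNPs ranked at or above the cutoff are reported as candidate selection signals, so the expected share of false positives among them is controlled at the chosen level.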

377
00:29:54,320 --&gt; 00:29:59,730
This method is really effective
for visualization.

378
00:29:59,800 --&gt; 00:30:06,130
But because this is an alternative method,
it&#039;s not used so often for the

379
00:30:06,200 --&gt; 00:30:11,530
quantification of selection
signals in the genome.

380
00:30:11,600 --&gt; 00:30:16,480
For the analysis of distribution
of selection signals in the genome

381
00:30:16,546 --&gt; 00:30:22,850
by using PCA, we can use,
for example, the R package PCAdapt.

382
00:30:22,920 --&gt; 00:30:28,200
This method also allows you to quantify

383
00:30:28,266 --&gt; 00:30:31,330
genetic differentiation in the data set

384
00:30:31,400 --&gt; 00:30:36,880
and then provides you with information about the
selection signals distribution, as you

385
00:30:36,946 --&gt; 00:30:40,730
can see on the slide in figure 13.

386
00:30:40,800 --&gt; 00:30:46,120
Then the last step of the analysis is usually
a description of the selection signals,

387
00:30:46,186 --&gt; 00:30:50,530
that means a description of the start
and the end position of the signal,

388
00:30:50,600 --&gt; 00:30:55,280
the number of genes, and the number of QTLs which are located

389
00:30:55,346 --&gt; 00:30:58,890
directly or very close to the signal.

390
00:30:58,960 --&gt; 00:31:03,520
If we would like to analyze
distribution of selection signals

391
00:31:03,586 --&gt; 00:31:08,960
at the intra-population level,
we can use a method which is based

392
00:31:09,026 --&gt; 00:31:13,530
on the identification of runs
of homozygosity in the genome.

393
00:31:13,600 --&gt; 00:31:18,440
This approach assumes that regions
in the genome showing strong selection

394
00:31:18,506 --&gt; 00:31:24,250
signals are the results of an increase
in local homozygosity due to intensive

395
00:31:24,320 --&gt; 00:31:28,240
breeding for traits defined
in the breed standard of each breed.

396
00:31:28,306 --&gt; 00:31:31,640
Runs of homozygosity forming

397
00:31:31,706 --&gt; 00:31:36,730
selection signals in the genome
are composed of alleles derived

398
00:31:36,800 --&gt; 00:31:41,360
from common ancestors,
which can be inherited from generation

399
00:31:41,426 --&gt; 00:31:46,370
to generation in unchanged form.

400
00:31:46,440 --&gt; 00:31:52,680
Selection signals are then
identified based on the frequency of SNP

401
00:31:52,746 --&gt; 00:31:57,960
markers in runs of homozygosity
in a specific region across

402
00:31:58,026 --&gt; 00:32:00,410
individuals in the population.

403
00:32:00,480 --&gt; 00:32:07,450
The threshold value for defining the signal is,
similarly to the other approaches, set

404
00:32:07,520 --&gt; 00:32:10,570
as the top 1% of values.

405
00:32:10,640 --&gt; 00:32:16,970
This method allows us to detect regions where
there has been a decrease in diversity.
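
A minimal sketch of this frequency-based detection, assuming that runs of homozygosity have already been called for each animal and are available as start and end positions on one chromosome. For every SNP it counts the share of animals in which the SNP lies inside a run and then applies a top 1% cutoff. All positions, segments and function names are hypothetical toy values.

```python
def roh_snp_frequency(snp_positions, roh_segments_per_animal):
    """Share of animals in which each SNP lies inside a run of homozygosity."""
    n_animals = len(roh_segments_per_animal)
    frequencies = []
    for pos in snp_positions:
        hits = sum(
            1
            for segments in roh_segments_per_animal
            if any(pos in range(start, end + 1) for start, end in segments)
        )
        frequencies.append(hits / n_animals)
    return frequencies


def top_fraction_threshold(values, fraction=0.01):
    """Smallest value within the top `fraction` of values (at least one value)."""
    n_top = max(1, round(len(values) * fraction))
    return sorted(values, reverse=True)[n_top - 1]


# Hypothetical SNP positions (bp) and ROH segments for three animals.
snp_positions = [100, 200, 300, 400, 500]
roh_segments_per_animal = [
    [(150, 450)],
    [(250, 350)],
    [(50, 120), (280, 520)],
]

freqs = roh_snp_frequency(snp_positions, roh_segments_per_animal)
threshold = top_fraction_threshold(freqs, fraction=0.01)
signal_snps = [p for p, f in zip(snp_positions, freqs) if f >= threshold]
print(freqs, threshold, signal_snps)
```

With real SNP array data the same counting runs over tens of thousands of markers per chromosome, and neighbouring SNPs above the cutoff are merged into one signal region.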

406
00:32:17,040 --&gt; 00:32:23,200
Because of this, this method also
can serve as a good indicator

407
00:32:23,266 --&gt; 00:32:25,730
of the effect of positive selection.

408
00:32:25,800 --&gt; 00:32:31,520
But if we would like to obtain reliable
estimates or reliable results,

409
00:32:31,586 --&gt; 00:32:36,970
we also need to have high-quality
and robust genomic data.

410
00:32:37,040 --&gt; 00:32:40,810
On this slide, you can see results from the analysis

411
00:32:40,880 --&gt; 00:32:46,130
of distribution of runs of homozygosity
segments in the genome

412
00:32:46,200 --&gt; 00:32:49,090
of the Slovak Warmblood horse.

413
00:32:49,160 --&gt; 00:32:53,610
Based on the threshold value, we found the

414
00:32:53,680 --&gt; 00:32:57,170
selection signals on chromosomes 1, 2, 6,

415
00:32:57,240 --&gt; 00:33:03,050
9, 11, 15, and 16,
and we also identified many genes

416
00:33:03,120 --&gt; 00:33:10,120
inside the regions of selection signals
which are involved in the formation or

417
00:33:10,186 --&gt; 00:33:15,050
in the genetic control of important
phenotypic traits for horses.

418
00:33:15,120 --&gt; 00:33:19,640
On the other hand,
we can also analyze selection signals

419
00:33:19,706 --&gt; 00:33:24,840
distribution in the genome based on the
regions showing a high

420
00:33:24,906 --&gt; 00:33:27,850
level of heterozygosity.

421
00:33:27,920 --&gt; 00:33:34,440
This method is usually used to detect
regions which may be important,

422
00:33:34,506 --&gt; 00:33:40,200
for example, in terms of adaptability or
response to environmental changes

423
00:33:40,266 --&gt; 00:33:42,530
or the occurrence of pathogens.

424
00:33:42,600 --&gt; 00:33:47,720
This method is based on the assumption
that heterozygous individuals usually

425
00:33:47,786 --&gt; 00:33:52,840
have higher fitness than
homozygous ones.

426
00:33:52,960 --&gt; 00:33:56,800
In this case, a high level of heterozygosity may be

427
00:33:56,866 --&gt; 00:34:02,320
the result of balancing selection,
that is, the preservation of genetic

428
00:34:02,386 --&gt; 00:34:05,280
diversity within a population.

429
00:34:05,346 --&gt; 00:34:10,930
Similarly to the analysis of

430
00:34:11,000 --&gt; 00:34:12,930
runs of homozygosity,

431
00:34:13,000 --&gt; 00:34:18,450
selection signals are
derived from the frequency of SNP markers

432
00:34:18,520 --&gt; 00:34:24,000
in heterozygosity-rich regions
in a specific genomic region across

433
00:34:24,066 --&gt; 00:34:26,370
individuals in the population.

434
00:34:26,440 --&gt; 00:34:29,120
The threshold value is usually set based

435
00:34:29,186 --&gt; 00:34:34,250
on the top 1% of values.

436
00:34:34,320 --&gt; 00:34:39,480
This approach allows us to detect regions
in which there is an increased

437
00:34:39,546 --&gt; 00:34:42,530
proportion of heterozygous genotypes.

438
00:34:42,600 --&gt; 00:34:49,440
That means it can also serve as
an indicator of genomic regions which can

439
00:34:49,506 --&gt; 00:34:54,610
be important in terms of adaptation
or evolutionary potential.

440
00:34:54,680 --&gt; 00:35:00,130
But if we would like to have reliable
results, we also need to analyze high

441
00:35:00,200 --&gt; 00:35:04,130
quality and robust genomic data.

442
00:35:04,200 --&gt; 00:35:08,440
On this slide,
you can see results from the analysis

443
00:35:08,506 --&gt; 00:35:12,050
of distribution of heterozygosity-rich

444
00:35:12,120 --&gt; 00:35:15,650
regions in the five horse breeds,

445
00:35:15,720 --&gt; 00:35:20,760
and this study was based especially on the
analysis of distribution

446
00:35:20,826 --&gt; 00:35:26,280
of heterozygosity-rich regions
in the genomic coordinates of the major

447
00:35:26,346 --&gt; 00:35:29,480
histocompatibility complex.

448
00:35:29,960 --&gt; 00:35:36,640
Another interesting approach is
identification of selection signals

449
00:35:36,706 --&gt; 00:35:40,130
in the genome based on redundancy analysis (RDA).

450
00:35:40,200 --&gt; 00:35:44,680
RDA tests the relationship between genetic
variability and

451
00:35:44,746 --&gt; 00:35:46,690
environmental factors.

452
00:35:46,760 --&gt; 00:35:53,650
That means it quantifies the influence of
natural selection on the genome structure.

453
00:35:53,720 --&gt; 00:35:59,810
This approach is basically a method
of evaluating genotype-environment

454
00:35:59,880 --&gt; 00:36:05,240
association that evaluates the percentage
of genomic variability explained

455
00:36:05,306 --&gt; 00:36:10,280
by environmental variables and also
detects loci under a strong

456
00:36:10,346 --&gt; 00:36:12,360
selection pressure.

457
00:36:12,480 --&gt; 00:36:15,120
This method is a two-step analysis

458
00:36:15,186 --&gt; 00:36:19,810
in which genetic and environmental data
are evaluated using

459
00:36:19,880 --&gt; 00:36:23,720
multivariate linear regression.

460
00:36:24,480 --&gt; 00:36:30,290
Among the advantages of this method, we can

461
00:36:30,360 --&gt; 00:36:33,170
mention that this method is a really

462
00:36:33,240 --&gt; 00:36:36,920
good approach to evaluate
the relationships between genetic

463
00:36:36,986 --&gt; 00:36:41,410
variability within a population
and environmental factors.

464
00:36:41,480 --&gt; 00:36:47,050
But similarly to previous approaches,
if you would like to have

465
00:36:47,120 --&gt; 00:36:51,530
good results or results with high

466
00:36:51,600 --&gt; 00:36:54,600
reliability, you also need to have information

467
00:36:54,666 --&gt; 00:36:59,410
about a high number of SNP
markers and animals.

468
00:36:59,480 --&gt; 00:37:04,170
For RDA analysis, we can use, for example,

469
00:37:04,240 --&gt; 00:37:07,880
the R package vegan or the DeepGenomeScan program.

470
00:37:08,960 --&gt; 00:37:15,690
The last approach which I would like
to mention is Tajima&#039;s D statistic,

471
00:37:15,760 --&gt; 00:37:21,130
which evaluates population diversity
and can be used

472
00:37:21,200 --&gt; 00:37:25,920
as an indicator of balancing selection.

473
00:37:26,120 --&gt; 00:37:30,210
Tajima&#039;s D can reach positive
or negative values.
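
A minimal sketch of how the statistic can be computed, assuming a small set of aligned sequences of equal length without missing data. It uses the standard formula, which compares the mean number of pairwise differences with the number of segregating sites scaled by the sample size. The toy sequences are hypothetical, and with SNP array data the statistic is normally evaluated in windows along the genome, which is not shown here.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima D for aligned sequences; None if no site is polymorphic."""
    n = len(sequences)
    length = len(sequences[0])
    # Number of segregating (polymorphic) sites.
    seg_sites = sum(
        1 for i in range(length) if len({s[i] for s in sequences}) != 1
    )
    if seg_sites == 0:
        return None
    # Mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants that depend only on the sample size.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical toy alignments.
rare_variants = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA"]
intermediate_variants = ["AAAA", "AAAA", "TTTT", "TTTT"]

print(tajimas_d(rare_variants))          # excess of rare alleles
print(tajimas_d(intermediate_variants))  # alleles at intermediate frequency
```

The first toy sample contains only singleton variants and gives a negative value, while the second, with alleles at intermediate frequency, gives a positive value.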

474
00:37:30,280 --&gt; 00:37:36,290
Positive values indicate a significant
effect of balancing selection,

475
00:37:36,360 --&gt; 00:37:41,880
and negative values, on the other hand,
can be associated with the effect

476
00:37:41,946 --&gt; 00:37:47,890
of positive selection on the genome
of the analyzed population or breed.

477
00:37:47,960 --&gt; 00:37:53,890
The threshold value is defined similarly
to the other approaches, for example,

478
00:37:53,960 --&gt; 00:37:59,730
as the top 1% of positive values.

479
00:37:59,800 --&gt; 00:38:04,400
This method allows us to detect
regions in which there is an increased

480
00:38:04,466 --&gt; 00:38:06,610
proportion of heterozygous genotypes.

481
00:38:06,680 --&gt; 00:38:12,640
That means it&#039;s a relatively good
indicator of regions important,

482
00:38:12,706 --&gt; 00:38:17,330
for example, in terms of adaptation.

483
00:38:17,400 --&gt; 00:38:21,930
But also, if we would like to have results

484
00:38:22,000 --&gt; 00:38:25,200
with good quality,

485
00:38:25,266 --&gt; 00:38:29,650
we also need to have information about
a high number of markers

486
00:38:29,720 --&gt; 00:38:33,720
and a high number of animals.

487
00:38:34,320 --&gt; 00:38:39,400
Here you can see the results
from the analysis of selection signals

488
00:38:39,466 --&gt; 00:38:43,480
distribution derived from the Tajima&#039;s D statistic

489
00:38:43,546 --&gt; 00:38:50,370
across the genome of five horse breeds
coming from the Czech Republic and Slovakia.

490
00:38:50,440 --&gt; 00:38:54,760
As you can see, we found that selection

491
00:38:54,826 --&gt; 00:38:58,080
signals were distributed non-uniformly

492
00:38:58,146 --&gt; 00:39:04,490
across the genome of tested horse breeds,
but we also found that in some

493
00:39:04,560 --&gt; 00:39:10,840
genomic regions, selection
signals overlapped across breeds.

494
00:39:12,320 --&gt; 00:39:16,640
The next step after the identification of selection signals

495
00:39:16,706 --&gt; 00:39:22,530
in the genome is usually a description
of the regions of selection signals.

496
00:39:22,600 --&gt; 00:39:29,130
This description is usually based
on searching for quantitative trait

497
00:39:29,200 --&gt; 00:39:36,040
loci or protein-coding genes located
directly or very close to the

498
00:39:36,106 --&gt; 00:39:39,160
region of selection signals.

499
00:39:39,360 --&gt; 00:39:44,330
Then it&#039;s also important to analyze

500
00:39:44,400 --&gt; 00:39:49,120
the biological function of QTLs or genes.

501
00:39:49,186 --&gt; 00:39:51,770
For this purpose, we can use

502
00:39:51,840 --&gt; 00:39:56,800
several databases or tools, for example,

503
00:39:56,866 --&gt; 00:40:02,040
GO, which is Gene Ontology, or
KEGG, which is Kyoto Encyclopedia

504
00:40:02,106 --&gt; 00:40:05,560
of Genes and Genomes.

505
00:40:06,120 --&gt; 00:40:13,810
Here you can see really good databases
for the identification of QTLs or genes.

506
00:40:13,880 --&gt; 00:40:18,760
For the identification of QTLs,
you can use the Animal QTL Database

507
00:40:18,826 --&gt; 00:40:24,610
in which you can find information
about different livestock species.

508
00:40:24,680 --&gt; 00:40:30,200
A really good and simple web-based tool
for obtaining information

509
00:40:30,266 --&gt; 00:40:34,640
about the genes in a certain region is

510
00:40:34,960 --&gt; 00:40:37,520
a tool, BioMart, provided

511
00:40:37,586 --&gt; 00:40:40,370
by the Ensembl database.

512
00:40:40,440 --&gt; 00:40:43,560
If you would like to analyze

513
00:40:44,200 --&gt; 00:40:48,210
the biological function of genes or the biological

514
00:40:48,280 --&gt; 00:40:51,640
pathways in which the genes are involved,

515
00:40:51,706 --&gt; 00:40:56,080
you can use, for example,
the web-based tool DAVID.

516
00:40:58,760 --&gt; 00:41:03,360
What are the advantages of functional
annotation of regions significantly

517
00:41:03,426 --&gt; 00:41:05,850
affected by selection pressure?

518
00:41:05,920 --&gt; 00:41:12,160
The main advantage is the fact
that the detailed analysis of regions

519
00:41:12,226 --&gt; 00:41:16,290
in the genome significantly affected
by selection pressure

520
00:41:16,360 --&gt; 00:41:21,680
allows the identification of specific
genes and biological pathways

521
00:41:21,746 --&gt; 00:41:24,530
responsible for phenotypic traits.

522
00:41:24,600 --&gt; 00:41:29,360
Future research on identified genes or

523
00:41:29,426 --&gt; 00:41:33,240
QTLs in regions under strong selection

524
00:41:33,306 --&gt; 00:41:39,890
pressure can potentially be
used in breeding programs.

525
00:41:39,960 --&gt; 00:41:44,640
But functional annotation also has

526
00:41:44,706 --&gt; 00:41:46,090
disadvantages.

527
00:41:46,160 --&gt; 00:41:50,760
The most important problem is the fact

528
00:41:50,826 --&gt; 00:41:53,560
that the overlap between selection signals

529
00:41:53,626 --&gt; 00:42:00,000
and functional regions does not
always imply a causal relationship

530
00:42:00,066 --&gt; 00:42:06,530
and also the fact that the information
in the available databases is

531
00:42:06,600 --&gt; 00:42:12,640
limited to the current knowledge and may
not always cover all

532
00:42:12,706 --&gt; 00:42:16,840
relevant genes or QTLs.

533
00:42:17,800 --&gt; 00:42:23,400
On this slide,
you can find the list of papers which were

534
00:42:23,466 --&gt; 00:42:27,610
used for the preparation of this presentation,

535
00:42:27,680 --&gt; 00:42:31,520
and the full texts of the papers are also

536
00:42:31,586 --&gt; 00:42:36,040
available in the folder Study Materials.

537
00:42:36,920 --&gt; 00:42:41,410
With this slide, I would like
to thank you for your attention.

538
00:42:41,480 --&gt; 00:42:48,360
If you have questions or if you would
like to continue with this topic

539
00:42:48,426 --&gt; 00:42:53,680
in the future and need help,
please contact me at my email address,

540
00:42:53,746 --&gt; 00:42:56,130
which you can see on the slide.

541
00:42:56,200 --&gt; 00:42:59,930
On the slide, there is also a QR code.

542
00:43:00,000 --&gt; 00:43:03,120
By scanning this QR code,

543
00:43:03,186 --&gt; 00:43:07,280
you can obtain access to other modules

544
00:43:07,346 --&gt; 00:43:10,960
which were prepared
within the project ISAGREED.

</div></pre>