1
00:00:01,131 --> 00:00:03,715
Hello, in this lecture for PhD
2
00:00:03,715 --> 00:00:06,097
students, we will focus on the
3
00:00:06,097 --> 00:00:08,116
possibilities of assessing genetic
4
00:00:08,237 --> 00:00:11,063
diversity and population structure using
5
00:00:11,224 --> 00:00:13,728
mitochondrial DNA and nuclear
6
00:00:13,728 --> 00:00:16,473
microsatellite markers applied to the
7
00:00:16,473 --> 00:00:19,138
honeybee. The lecture is part of module
8
00:00:19,138 --> 00:00:22,126
1, Animal Genetics. The creation
9
00:00:22,247 --> 00:00:24,266
of this presentation was supported by the
10
00:00:24,629 --> 00:00:27,456
Erasmus+ KA2 grant within
11
00:00:27,456 --> 00:00:29,959
the ISAGREED project, innovation of the
12
00:00:29,959 --> 00:00:32,866
content and structure of study programs
13
00:00:32,866 --> 00:00:35,531
in the field of animal genetics and food
14
00:00:35,531 --> 00:00:37,469
resource management using
15
00:00:37,469 --> 00:00:38,680
digitalization.
16
00:00:40,739 --> 00:00:43,202
This lecture will cover topics such as
17
00:00:43,202 --> 00:00:45,786
genetic data acquisition, assessment of
18
00:00:45,786 --> 00:00:48,370
genetic variability using mitochondrial
19
00:00:48,370 --> 00:00:50,752
DNA sequences, and assessment of genetic
20
00:00:50,752 --> 00:00:53,700
variability using nuclear STRs
21
00:00:53,700 --> 00:00:56,526
or microsatellite markers. Why
22
00:00:56,526 --> 00:00:59,514
is genetic diversity important? Genetic
23
00:00:59,514 --> 00:01:02,259
diversity is important for the yield of
24
00:01:02,259 --> 00:01:05,167
populations. It is a key source of
25
00:01:05,328 --> 00:01:07,791
the ability to build tolerance or
26
00:01:07,791 --> 00:01:10,052
resistance to current and future
27
00:01:10,052 --> 00:01:12,919
diseases, pathogens and predators.
28
00:01:13,726 --> 00:01:16,230
The current state of bee populations can
29
00:01:16,230 --> 00:01:19,137
be attributed in part to a reduction in
30
00:01:19,137 --> 00:01:22,124
diversity. Bee diversity
31
00:01:22,286 --> 00:01:24,466
has been assessed using morphometric
32
00:01:24,466 --> 00:01:26,808
traits such as wing parameters,
33
00:01:27,010 --> 00:01:29,877
pigmentation, etc. The
34
00:01:29,877 --> 00:01:32,622
honey bee, Apis mellifera, is
35
00:01:32,622 --> 00:01:35,045
now known to comprise 31
36
00:01:35,045 --> 00:01:37,629
subspecies, breeds or races.
37
00:01:38,032 --> 00:01:40,132
DNA analyses, particularly of
38
00:01:40,132 --> 00:01:42,635
mitochondrial origin, have
39
00:01:42,635 --> 00:01:45,058
facilitated the description of
40
00:01:45,098 --> 00:01:47,278
evolutionary lineages including the
41
00:01:47,278 --> 00:01:50,145
Western Mediterranean type M, the
42
00:01:50,145 --> 00:01:52,406
Northern Mediterranean type C,
43
00:01:53,093 --> 00:01:55,677
the African lineage A, and the
44
00:01:55,677 --> 00:01:57,817
oriental lineage O. The
45
00:01:59,916 --> 00:02:02,258
honeybee genome has been completely
46
00:02:02,258 --> 00:02:04,923
sequenced on multiple occasions. The
47
00:02:04,923 --> 00:02:07,911
individual chromosomes are visible and
48
00:02:07,911 --> 00:02:10,212
the penultimate column illustrates the
49
00:02:10,212 --> 00:02:13,200
size of each chromosome in terms of the
50
00:02:13,200 --> 00:02:15,986
number of base pairs. A total
51
00:02:17,116 --> 00:02:19,781
of 12,398
52
00:02:20,589 --> 00:02:23,415
genes have been described or are
53
00:02:23,415 --> 00:02:26,282
estimated to exist in the honeybee genome.
54
00:02:27,008 --> 00:02:27,614
Of these,
55
00:02:27,856 --> 00:02:30,602
9935
56
00:02:30,602 --> 00:02:33,509
genes code for some kind of
57
00:02:33,509 --> 00:02:34,841
protein, while
58
00:02:36,335 --> 00:02:38,919
2421 genes don't
59
00:02:38,919 --> 00:02:41,665
code for proteins, but rather code for
60
00:02:41,745 --> 00:02:44,006
other RNAs, such as transfer
61
00:02:44,006 --> 00:02:46,348
RNAs or other small nuclear
62
00:02:46,348 --> 00:02:47,156
RNAs.
63
00:02:49,659 --> 00:02:52,243
The mitochondrial DNA of most species is
64
00:02:52,243 --> 00:02:54,746
estimated to be within the range of
65
00:02:54,746 --> 00:02:57,734
approximately 16 to 20 kilobases.
66
00:02:58,461 --> 00:03:00,480
The mitochondrial genome of the Western
67
00:03:00,480 --> 00:03:02,822
honeybee is estimated to comprise
68
00:03:02,822 --> 00:03:03,871
approximately
69
00:03:04,033 --> 00:03:06,778
16,500
70
00:03:06,859 --> 00:03:09,605
base pairs. In the reference genome
71
00:03:09,605 --> 00:03:12,593
NC_001566,
72
00:03:13,239 --> 00:03:15,015
the mitochondrial DNA is
73
00:03:15,177 --> 00:03:18,730
16,343
74
00:03:18,810 --> 00:03:21,717
base pairs in size. In the
75
00:03:21,717 --> 00:03:23,656
complete genome of the Carpathian
76
00:03:23,656 --> 00:03:26,401
mitochondrial DNA, the size of the
77
00:03:26,482 --> 00:03:27,733
mitochondrial DNA is
78
00:03:27,733 --> 00:03:29,712
16,358
79
00:03:30,883 --> 00:03:33,830
base pairs. The
80
00:03:33,830 --> 00:03:35,849
mitochondrial DNA of the honey bee
81
00:03:35,849 --> 00:03:38,595
contains 13 genes that encode
82
00:03:38,595 --> 00:03:41,582
proteins, as well as 22 genes
83
00:03:41,825 --> 00:03:44,732
that encode tRNA and two genes that
84
00:03:44,732 --> 00:03:47,720
encode ribosomal RNA. In
85
00:03:47,720 --> 00:03:50,586
particular, the barcoding sequence, which
86
00:03:50,586 --> 00:03:53,049
is the cytochrome oxidase 1 site
87
00:03:53,049 --> 00:03:55,351
sequence, and the so-called intergenic
88
00:03:55,351 --> 00:03:57,692
region, which includes parts of the
89
00:03:57,894 --> 00:04:00,478
tRNA genes for leucine and
90
00:04:00,478 --> 00:04:03,305
cytochrome oxidase 2, are employed
91
00:04:03,305 --> 00:04:05,969
for phylogenetic and
92
00:04:06,212 --> 00:04:07,746
phylogeographic analysis.
93
00:04:09,603 --> 00:04:11,703
What degree of variability can be
94
00:04:11,703 --> 00:04:14,206
observed at the DNA level and which
95
00:04:14,206 --> 00:04:17,113
molecular genetic markers exist? At
96
00:04:17,113 --> 00:04:19,859
present, the most frequently utilized
97
00:04:20,262 --> 00:04:22,039
are biallelic single nucleotide
98
00:04:22,039 --> 00:04:24,865
polymorphisms which can
99
00:04:24,865 --> 00:04:27,409
be identified in both coding and
100
00:04:27,409 --> 00:04:28,984
non-coding regions.
101
00:04:29,751 --> 00:04:32,496
Additionally, data on insertions and
102
00:04:32,496 --> 00:04:35,444
deletions, defined as the presence or
103
00:04:35,444 --> 00:04:37,947
absence of a base, can be employed.
104
00:04:38,512 --> 00:04:41,258
These are commonly referred to as indels.
105
00:04:41,742 --> 00:04:44,326
The image on the right illustrates this.
106
00:04:44,326 --> 00:04:46,991
The top row depicts SNP markers,
107
00:04:47,476 --> 00:04:50,463
while the bottom row displays deletions of
108
00:04:50,463 --> 00:04:53,169
cytosine in the ACA sequence
109
00:04:53,169 --> 00:04:53,613
region.
110
00:04:57,812 --> 00:05:00,638
Following the isolation of the DNA from
111
00:05:00,638 --> 00:05:03,626
the bee sample and the amplification of
112
00:05:03,626 --> 00:05:06,049
a specific small section, for
113
00:05:06,049 --> 00:05:08,996
example, one of the two genes mentioned
114
00:05:09,036 --> 00:05:11,943
above, sequencing is conducted
115
00:05:11,943 --> 00:05:14,406
using a capillary electrophoresis-based
116
00:05:14,406 --> 00:05:16,506
sequencer. The resulting
117
00:05:16,506 --> 00:05:19,373
identification of the individual bases
118
00:05:19,696 --> 00:05:21,714
in the sequence is illustrated in the
119
00:05:21,714 --> 00:05:22,280
figure. The
120
00:05:26,600 --> 00:05:28,901
individual peaks are represented by the
121
00:05:28,901 --> 00:05:31,808
colors used to identify the individual
122
00:05:31,808 --> 00:05:32,374
bases.
123
00:05:36,734 --> 00:05:39,480
Two mitochondrial DNA sequences
124
00:05:39,641 --> 00:05:41,741
were employed for the purpose of
125
00:05:41,822 --> 00:05:44,809
identifying subtypes, mitotypes or
126
00:05:44,809 --> 00:05:47,716
haplotypes within the lineage. These are
127
00:05:47,716 --> 00:05:50,462
the aforementioned intergenic region
128
00:05:50,866 --> 00:05:53,046
tRNA Leucine-Cox2
129
00:05:54,984 --> 00:05:57,972
and cytochrome oxidase 1 region.
130
00:05:59,425 --> 00:06:01,686
This section comprises two mitochondrial
131
00:06:01,686 --> 00:06:04,190
genes, the transfer RNA for
132
00:06:04,190 --> 00:06:07,016
Leucine and the cytochrome oxidase 2.
133
00:06:07,581 --> 00:06:10,327
This sequence is distinguished by a
134
00:06:10,327 --> 00:06:13,072
high mutation content and notable
135
00:06:13,072 --> 00:06:15,939
variations in nucleotide length and
136
00:06:15,939 --> 00:06:18,240
compositions across honey bee
137
00:06:18,240 --> 00:06:21,067
populations. This amplicon
138
00:06:21,067 --> 00:06:23,409
is cleaved by the restriction endonuclease
139
00:06:23,409 --> 00:06:25,831
DraI, which
140
00:06:25,831 --> 00:06:27,688
specifically recognizes the
141
00:06:27,769 --> 00:06:30,353
TTTAAA sequence
142
00:06:30,757 --> 00:06:32,776
to identify each lineage.
143
00:06:33,745 --> 00:06:35,925
The second sequence is the sequence for
144
00:06:35,925 --> 00:06:38,913
the barcoding region, and this is part
145
00:06:38,953 --> 00:06:41,820
of cytochrome oxidase 1, cox1 gene.
146
00:06:43,031 --> 00:06:45,736
This sequence is compared to
147
00:06:45,777 --> 00:06:48,643
sequences stored in databases such as the
148
00:06:48,643 --> 00:06:50,743
BOLD system or GenBank.
149
00:06:51,591 --> 00:06:54,013
The DNA fragment is highly conserved
150
00:06:54,336 --> 00:06:56,557
within taxa and is often used to
151
00:06:56,557 --> 00:06:59,101
distinguish taxa and species.
152
00:07:01,927 --> 00:07:04,511
The tRNA-Leucine-cox2
153
00:07:04,511 --> 00:07:07,418
sequence structure allows for the
154
00:07:07,459 --> 00:07:09,921
identification of distinct evolutionary
155
00:07:10,164 --> 00:07:12,909
lineages within the honeybee:
156
00:07:13,394 --> 00:07:15,736
the C lineage, which
157
00:07:15,736 --> 00:07:17,997
encompasses the honeybee,
158
00:07:17,997 --> 00:07:20,540
Apis mellifera, as well as the ligustica,
159
00:07:20,540 --> 00:07:22,761
macedonica and other related
160
00:07:22,761 --> 00:07:25,587
subspecies, is characterized
161
00:07:25,587 --> 00:07:28,575
by the presence of a single copy of the Q
162
00:07:28,575 --> 00:07:31,482
sequence. The aforementioned
163
00:07:31,482 --> 00:07:34,470
lines contain one to two copies
164
00:07:34,793 --> 00:07:37,700
of the aforementioned Q sequence, in
165
00:07:37,700 --> 00:07:40,526
addition to the so-called
166
00:07:40,526 --> 00:07:43,393
P0 segment. The
167
00:07:43,514 --> 00:07:46,260
M lineage, which is the original black
168
00:07:46,260 --> 00:07:49,086
bee, Apis mellifera mellifera, which is no
169
00:07:49,086 --> 00:07:51,832
longer found in the Czech Republic, may
170
00:07:51,832 --> 00:07:54,739
contain one, two or three repeats of the
171
00:07:54,819 --> 00:07:57,000
aforementioned Q sequence,
172
00:07:57,646 --> 00:08:00,553
in addition to the so-called P
173
00:08:00,553 --> 00:08:02,975
sequence. By sequencing and
174
00:08:02,975 --> 00:08:05,398
comparing individual sequences, it is
175
00:08:05,398 --> 00:08:07,901
possible to identify an evolutionary
176
00:08:07,901 --> 00:08:09,516
lineage in each bee.
177
00:08:11,939 --> 00:08:14,927
The identification of a particular lineage
178
00:08:15,169 --> 00:08:17,349
is possible through cleavage with the
179
00:08:17,349 --> 00:08:19,691
restriction enzyme DraI,
180
00:08:20,256 --> 00:08:21,952
which recognizes the unaltered
181
00:08:22,194 --> 00:08:24,778
TTTAAA sequence.
182
00:08:25,586 --> 00:08:28,574
Once a change occurs as a result of
183
00:08:28,574 --> 00:08:31,158
mutation, this enzyme is unable to
184
00:08:31,158 --> 00:08:33,580
recognize this sequence. Consequently, it is
185
00:08:33,580 --> 00:08:36,083
unable to cleave it. Using a
186
00:08:36,083 --> 00:08:38,506
classical PCR reaction
187
00:08:38,829 --> 00:08:40,606
based on the length of the individual
188
00:08:40,606 --> 00:08:43,513
fragments, it is possible to distinguish
189
00:08:43,513 --> 00:08:46,097
between variants such as C and
190
00:08:46,097 --> 00:08:48,358
A1 or A4.
191
00:08:50,538 --> 00:08:53,284
The second option is to obtain the entire
192
00:08:53,284 --> 00:08:56,029
sequence of a given segment by sequencing
193
00:08:56,514 --> 00:08:59,098
and subsequently analyzing it. The
194
00:08:59,098 --> 00:09:00,834
following example illustrates the
195
00:09:00,874 --> 00:09:03,781
sequencing and subsequent analysis of
196
00:09:03,862 --> 00:09:06,608
tRNA-Leucine - cox2 sequences from
197
00:09:07,415 --> 00:09:08,788
several individuals.
198
00:09:11,049 --> 00:09:13,310
Some software, such as Unipro
199
00:09:13,794 --> 00:09:16,540
UGENE, enables the user to
200
00:09:16,540 --> 00:09:19,447
perform the cleavage with the DraI
201
00:09:19,447 --> 00:09:22,273
restriction enzyme, in silico, that is
202
00:09:22,475 --> 00:09:24,777
on a computer. The following
203
00:09:24,777 --> 00:09:27,441
example illustrates the cleavage of a
204
00:09:27,441 --> 00:09:30,349
sequence belonging to the C lineage,
205
00:09:30,672 --> 00:09:33,094
which contains 3 cleavage sites,
206
00:09:33,417 --> 00:09:35,840
resulting in three fragments of specific
207
00:09:35,840 --> 00:09:36,203
length.
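The in silico cleavage described here can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used by Unipro UGENE: it assumes a linear amplicon, complete digestion, and the blunt DraI cut between TTT and AAA. The function names and the toy sequence are hypothetical and are not real honeybee data.

```python
DRAI_SITE = "TTTAAA"  # DraI recognition sequence
CUT_OFFSET = 3        # assumed blunt cut between TTT and AAA


def drai_cut_positions(seq):
    """Return 0-based cut positions of DraI in a linear sequence."""
    seq = seq.upper()
    positions = []
    start = seq.find(DRAI_SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(DRAI_SITE, start + 1)
    return positions


def drai_fragment_lengths(seq):
    """Return fragment lengths after complete digestion of a linear sequence."""
    bounds = [0] + drai_cut_positions(seq) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy sequence for illustration only (hypothetical, not a real amplicon).
toy = "ACGTTTAAACCGGATTTAAAGG"
print(drai_cut_positions(toy))
print(drai_fragment_lengths(toy))

# A point mutation inside the site removes the cut entirely.
print(drai_fragment_lengths("ACGTTCAAACC"))
```

Comparing the resulting list of fragment lengths between samples is the computational counterpart of comparing band patterns on a gel.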
208
00:09:38,747 --> 00:09:41,411
In another sample, only two
209
00:09:41,411 --> 00:09:44,076
cleavage sites were identified and the
210
00:09:44,076 --> 00:09:46,499
length of the fragment suggests that this
211
00:09:46,499 --> 00:09:49,244
is a bee belonging to the A lineage or
212
00:09:49,244 --> 00:09:50,456
African lineage.
213
00:09:53,605 --> 00:09:56,108
Subsequently, the sequences
214
00:09:56,108 --> 00:09:58,531
obtained from the larger population are
215
00:09:58,531 --> 00:10:01,276
compared using the method of multiple
216
00:10:01,276 --> 00:10:04,022
sequence alignment, MSA.
217
00:10:05,072 --> 00:10:07,979
This process could be completed
218
00:10:07,979 --> 00:10:10,321
manually; however, software has been
219
00:10:10,321 --> 00:10:13,228
developed with algorithms that facilitate
220
00:10:13,470 --> 00:10:16,458
this task. Some of these programs are
221
00:10:16,458 --> 00:10:18,719
accessible online, for example on the
222
00:10:18,799 --> 00:10:20,738
European Bioinformatics Institute
223
00:10:20,738 --> 00:10:22,918
servers. For our purposes,
224
00:10:22,918 --> 00:10:25,421
Kalign was the most suitable.
225
00:10:25,744 --> 00:10:28,328
However, there are other tools such as
226
00:10:28,328 --> 00:10:31,316
Clustal Omega, MAFFT
227
00:10:31,477 --> 00:10:34,223
and so on. Additionally, there are
228
00:10:34,223 --> 00:10:36,646
programs that can be downloaded and
229
00:10:36,646 --> 00:10:39,512
installed to perform this analysis, such
230
00:10:39,512 --> 00:10:41,612
as MEGA or Unipro UGENE.
231
00:10:44,519 --> 00:10:47,063
The DnaSP program was
232
00:10:47,063 --> 00:10:49,889
employed to identify DNA polymorphisms
233
00:10:49,889 --> 00:10:52,796
and haplotypes in both mitochondrial DNA
234
00:10:52,796 --> 00:10:55,541
regions, utilizing all sequences from
235
00:10:55,541 --> 00:10:57,883
multiple sequence alignments in FASTA
236
00:10:57,883 --> 00:10:58,126
format.
237
00:11:03,960 --> 00:11:06,786
Moreover, nucleotide substitutions and
238
00:11:06,786 --> 00:11:09,350
insertions/deletions for each haplotype
239
00:11:09,431 --> 00:11:11,692
were compared with the reference genome.
240
00:11:12,338 --> 00:11:14,881
To identify specific haplotypes in the
241
00:11:14,881 --> 00:11:17,425
tRNA-Leucine - cox2, lineage
242
00:11:17,667 --> 00:11:20,453
C and A, reference sequences
243
00:11:20,574 --> 00:11:23,562
with 100% identity were
244
00:11:23,562 --> 00:11:26,106
further searched using BLAST
245
00:11:26,469 --> 00:11:29,376
local pairwise alignment tools against
246
00:11:29,376 --> 00:11:31,799
sequences found in the National Center
247
00:11:31,799 --> 00:11:34,221
for Biotechnology Information (NCBI)
248
00:11:34,867 --> 00:11:37,371
database at the US GenBank.
249
00:11:38,098 --> 00:11:40,682
BLAST was also employed to verify the
250
00:11:40,682 --> 00:11:43,669
cox1 haplotypes, with the sequences
251
00:11:43,669 --> 00:11:46,496
subsequently validated using the
252
00:11:46,496 --> 00:11:48,636
BOLD database based on the multiple
253
00:11:48,636 --> 00:11:51,583
alignments using Kalign, necessitating
254
00:11:51,583 --> 00:11:53,521
additional manual refinement.
255
00:11:56,065 --> 00:11:58,931
A total of 13 haplotypes were
256
00:11:58,931 --> 00:12:01,596
identified, three of which belonged to
257
00:12:01,637 --> 00:12:04,342
the A lineage and the rest to the
258
00:12:04,342 --> 00:12:06,885
C lineage. The most prevalent
259
00:12:06,885 --> 00:12:09,550
haplotype was C1a, which is
260
00:12:09,550 --> 00:12:11,932
typical for Apis mellifera ligustica, the
261
00:12:11,973 --> 00:12:14,920
Italian bee. The table illustrates the
262
00:12:14,920 --> 00:12:17,262
classification of individual haplotypes
263
00:12:17,262 --> 00:12:20,210
into C and A lineages based
264
00:12:20,210 --> 00:12:22,713
on DraI restriction cleavage and
265
00:12:22,713 --> 00:12:25,660
sequencing. The individual haplotypes
266
00:12:25,660 --> 00:12:28,567
and their sequences have been uploaded to
267
00:12:28,567 --> 00:12:31,313
the GenBank database on the NCBI
268
00:12:31,313 --> 00:12:34,180
server. In the third column, the
269
00:12:34,180 --> 00:12:36,885
reference sequences are
270
00:12:36,885 --> 00:12:39,792
displayed. Additionally, the numbers
271
00:12:39,792 --> 00:12:42,134
and lengths of the fragments produced by
272
00:12:42,134 --> 00:12:45,121
the cleavage are shown, which also
273
00:12:45,121 --> 00:12:47,625
demonstrate the considerable variability.
274
00:12:48,594 --> 00:12:50,653
In the last two columns, the
275
00:12:50,774 --> 00:12:53,520
identification of or comparison with
276
00:12:53,641 --> 00:12:56,023
other sequences in the GenBank database
277
00:12:56,023 --> 00:12:58,607
is presented, where sequences
278
00:12:58,688 --> 00:13:01,595
with 100% identity to our
279
00:13:01,595 --> 00:13:03,936
sequences have been selected.
280
00:13:05,148 --> 00:13:07,813
It is notable that all haplotypes
281
00:13:07,813 --> 00:13:10,558
belonging to the C lineage have been
282
00:13:10,558 --> 00:13:12,819
previously described in Apis mellifera
283
00:13:12,819 --> 00:13:15,484
carnica. Additionally, three
284
00:13:15,484 --> 00:13:17,341
distinct African haplotypes are
285
00:13:17,785 --> 00:13:20,046
identified as Apis mellifera
286
00:13:20,168 --> 00:13:23,155
iberica. However, a single
287
00:13:23,398 --> 00:13:26,022
sequence exhibiting complete
288
00:13:26,022 --> 00:13:28,566
identity wasn't assigned to any
289
00:13:28,566 --> 00:13:30,262
particular subspecies.
290
00:13:32,523 --> 00:13:34,420
This table illustrates the
291
00:13:34,420 --> 00:13:36,843
identification of the most significant
292
00:13:36,843 --> 00:13:39,548
polymorphic sites indicating the bases
293
00:13:39,548 --> 00:13:42,294
present in each haplotype. The
294
00:13:42,294 --> 00:13:44,878
positions were found to exhibit mainly
295
00:13:44,878 --> 00:13:47,502
single nucleotide polymorphisms and
296
00:13:47,502 --> 00:13:50,046
deletions. It is notable that
297
00:13:50,126 --> 00:13:52,064
position 50 displays
298
00:13:52,387 --> 00:13:55,052
polymorphisms with the standard
299
00:13:55,052 --> 00:13:58,040
allele identified as C. The remaining
300
00:13:58,040 --> 00:14:00,624
haplotypes at this position exhibited a
301
00:14:00,624 --> 00:14:01,432
deletion.
302
00:14:03,450 --> 00:14:05,873
Similarly, the cox1 sequence
303
00:14:06,115 --> 00:14:08,942
was analyzed, whereby 13
304
00:14:08,942 --> 00:14:11,687
different haplotypes for barcoding
305
00:14:12,252 --> 00:14:15,159
were identified. As with the
306
00:14:15,159 --> 00:14:17,340
previous analysis, individual SNP
307
00:14:17,340 --> 00:14:20,085
mutations were observed. No insertion or
308
00:14:20,085 --> 00:14:22,508
deletion was identified within the
309
00:14:22,508 --> 00:14:25,011
barcoding sequence. Only SNP
310
00:14:25,011 --> 00:14:27,191
substitutions were present.
311
00:14:30,341 --> 00:14:32,925
The tables below present the results of
312
00:14:32,925 --> 00:14:35,186
the haplotype frequencies in the
313
00:14:35,186 --> 00:14:37,851
tRNA-Leucine - cox2 gene
314
00:14:38,254 --> 00:14:40,354
and in the cox1 gene in the Czech
315
00:14:40,354 --> 00:14:43,261
Republic. It can be observed that there
316
00:14:43,261 --> 00:14:46,007
are a number of haplotypes that are
317
00:14:46,087 --> 00:14:48,348
relatively well represented, such as
318
00:14:48,348 --> 00:14:50,690
C1a, C2l,
319
00:14:50,932 --> 00:14:52,951
C2e, and C2c.
320
00:14:53,759 --> 00:14:56,423
In contrast, there are haplotypes that
321
00:14:56,423 --> 00:14:59,169
have been identified in only one or a
322
00:14:59,169 --> 00:15:02,157
few individuals. A similar
323
00:15:02,238 --> 00:15:05,064
situation was observed in the cox1
324
00:15:05,064 --> 00:15:07,244
gene where the first four
325
00:15:07,244 --> 00:15:09,747
haplotypes were the most frequently
326
00:15:09,747 --> 00:15:11,928
represented, while the others were
327
00:15:12,009 --> 00:15:14,431
present in minority or individual
328
00:15:14,431 --> 00:15:15,239
samples.
329
00:15:18,469 --> 00:15:20,487
From a population of bees in the Czech
330
00:15:20,487 --> 00:15:23,071
Republic comprising over 300
331
00:15:23,071 --> 00:15:26,059
samples, certain characteristics were
332
00:15:26,059 --> 00:15:28,724
calculated, including haplotype diversity
333
00:15:28,724 --> 00:15:31,389
parameters in the tRNA leucine -
334
00:15:31,389 --> 00:15:34,054
cox2 and cox1 sequences.
335
00:15:34,700 --> 00:15:37,284
The genetic diversity indices, namely
336
00:15:37,284 --> 00:15:40,272
haplotype diversity Hd, molecular
337
00:15:40,272 --> 00:15:43,098
diversity π, and Tajima's D
338
00:15:43,421 --> 00:15:46,409
were estimated. These indices
339
00:15:47,135 --> 00:15:49,154
were evaluated using the pegas package
340
00:15:50,043 --> 00:15:52,304
in R, although
341
00:15:52,465 --> 00:15:55,211
alternative software such as DnaSP,
342
00:15:55,614 --> 00:15:58,521
MEGA or Arlequin can also be employed.
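To make two of these indices concrete, here is a minimal Python sketch of haplotype diversity (Hd) and nucleotide diversity (π). It is not the pegas implementation. It assumes equal-length aligned sequences and, as a simplification, counts a gap as an ordinary mismatch. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(aligned):
    """Mean number of pairwise differences per site (pi)."""
    if len(aligned) < 2:
        raise ValueError("at least two sequences are required")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(aligned, 2))
    # Simplification: every differing column, including gaps, counts as 1.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length


# Toy alignment for illustration only (hypothetical data).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(toy), 4))
print(round(nucleotide_diversity(toy), 4))
```

Dedicated tools treat gaps and missing data more carefully, so values from this sketch may differ slightly from theirs on real alignments.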
343
00:16:01,509 --> 00:16:03,851
Additionally, the so-called
344
00:16:04,093 --> 00:16:06,677
haplotype networks were determined.
345
00:16:07,485 --> 00:16:09,907
We used the Randomized Minimum Spanning
346
00:16:09,988 --> 00:16:12,814
Tree method (RMST),
347
00:16:13,622 --> 00:16:16,085
which takes into account frequencies and
348
00:16:16,206 --> 00:16:18,306
relationships between haplotypes.
349
00:16:19,436 --> 00:16:22,101
These haplotype networks were processed
350
00:16:22,182 --> 00:16:24,443
using the pegas package in R.
351
00:16:25,169 --> 00:16:27,754
However, other programs such as PopArt
352
00:16:28,521 --> 00:16:29,853
and others can be used.
353
00:16:34,698 --> 00:16:37,524
Here we see the result of haplotype
354
00:16:37,524 --> 00:16:39,543
network analysis for
355
00:16:39,543 --> 00:16:42,006
308 individuals in the
356
00:16:42,006 --> 00:16:44,711
tRNA-Leucine - cox2 sequence.
357
00:16:45,438 --> 00:16:48,103
The most common haplotype is C1a,
358
00:16:48,507 --> 00:16:51,494
followed by C2e, which has two point
359
00:16:51,494 --> 00:16:54,402
mutations, and C2c, which is the
360
00:16:54,402 --> 00:16:57,228
third most common haplotype in the Czech
361
00:16:57,228 --> 00:17:00,054
Republic. Each color in the circle
362
00:17:00,054 --> 00:17:02,800
represents a specific region from which
363
00:17:03,607 --> 00:17:06,272
the bee was obtained. The aim was to
364
00:17:06,272 --> 00:17:09,139
cover the whole Czech Republic, with
365
00:17:09,179 --> 00:17:11,723
the sampling evenly distributed
366
00:17:11,723 --> 00:17:12,894
across all regions.
367
00:17:14,993 --> 00:17:17,900
This slide presents the analysis
368
00:17:17,941 --> 00:17:20,161
of the haplotype network for the cox1
369
00:17:20,161 --> 00:17:22,342
sequence. As observed
370
00:17:22,422 --> 00:17:24,199
previously, if a haplotype was
371
00:17:24,199 --> 00:17:26,864
sufficiently abundant, it occurred in
372
00:17:26,944 --> 00:17:29,448
almost all regions. This is
373
00:17:29,771 --> 00:17:32,637
exemplified by HpB02,
374
00:17:34,616 --> 00:17:36,150
HpB03,
375
00:17:36,473 --> 00:17:38,411
HpB01 and
376
00:17:38,492 --> 00:17:40,349
HpB04.
377
00:17:41,722 --> 00:17:44,064
Conversely, the other haplotypes were
378
00:17:44,064 --> 00:17:46,971
present in a few individuals or in only
379
00:17:46,971 --> 00:17:48,021
one individual.
380
00:17:51,251 --> 00:17:53,350
Subsequently, further
381
00:17:53,512 --> 00:17:56,177
phylogenetic analysis was conducted on
382
00:17:56,177 --> 00:17:58,599
these sequences. Following the
383
00:17:58,599 --> 00:18:01,425
completion of the MSA, that is, multiple
384
00:18:01,425 --> 00:18:04,252
sequence alignment, phylogenetic tree
385
00:18:04,817 --> 00:18:07,482
generation was conducted using the maximum
386
00:18:07,482 --> 00:18:10,268
likelihood method and the Tamura-Nei
387
00:18:10,268 --> 00:18:12,731
model in MEGA X software.
388
00:18:13,619 --> 00:18:16,284
This method entailed the construction of
389
00:18:16,607 --> 00:18:19,352
a bootstrap consensus tree based on
390
00:18:19,433 --> 00:18:21,936
10,000 replicates. The
391
00:18:21,936 --> 00:18:24,520
branches corresponding to
392
00:18:24,601 --> 00:18:27,104
partitions reproduced in less than 50% of
393
00:18:27,104 --> 00:18:30,092
bootstrap replicates are collapsed, and
394
00:18:30,092 --> 00:18:32,353
the percentage of replicate trees
395
00:18:32,353 --> 00:18:35,341
that clustered related haplotypes in
396
00:18:35,341 --> 00:18:37,925
the bootstrap test is shown. The
397
00:18:37,925 --> 00:18:40,913
initial tree for the heuristic
398
00:18:40,994 --> 00:18:43,255
search was obtained automatically by
399
00:18:43,255 --> 00:18:46,081
applying the Neighbor-Joining and BioNJ
400
00:18:46,848 --> 00:18:49,473
algorithms to the pairwise distances
401
00:18:49,473 --> 00:18:52,460
matrix estimated by the Tamura-Nei model,
402
00:18:52,945 --> 00:18:55,448
and then selecting the topology with the
403
00:18:55,448 --> 00:18:57,467
highest log-likelihood value.
404
00:18:58,396 --> 00:19:00,495
As a result, the following phylogenetic
405
00:19:00,495 --> 00:19:02,716
trees were obtained.
406
00:19:04,977 --> 00:19:07,319
The phylogenetic tree based on the
407
00:19:07,319 --> 00:19:10,064
analysis of the tRNA leucine-cox2
408
00:19:10,064 --> 00:19:12,648
sequence is displayed on the left.
409
00:19:13,133 --> 00:19:15,798
The phylogenetic tree based on the cox1
410
00:19:15,798 --> 00:19:18,705
sequence is displayed on the right. It
411
00:19:18,705 --> 00:19:21,127
can be observed that the bees from lineage
412
00:19:21,127 --> 00:19:24,115
A cluster together, in
413
00:19:24,115 --> 00:19:26,376
contrast to the other bees,
414
00:19:26,699 --> 00:19:29,687
namely those from lineage C, which are
415
00:19:29,687 --> 00:19:30,979
marked in red.
416
00:19:35,662 --> 00:19:38,610
The second type of markers used
417
00:19:38,610 --> 00:19:40,508
to assess genetic variability are
418
00:19:40,508 --> 00:19:42,849
microsatellites. These are
419
00:19:42,849 --> 00:19:45,797
polymorphisms that occur exclusively in
420
00:19:45,797 --> 00:19:48,098
nuclear DNA. These
421
00:19:48,098 --> 00:19:50,521
polymorphisms are characterized by the
422
00:19:50,521 --> 00:19:53,105
repetition of a particular motif,
423
00:19:53,589 --> 00:19:56,456
such as GC, in a series of
424
00:19:56,456 --> 00:19:59,323
units called tandem repeats. Each
425
00:19:59,323 --> 00:20:01,665
allele is referred to by the length of
426
00:20:01,665 --> 00:20:04,572
this repeat. To illustrate, we
427
00:20:04,572 --> 00:20:06,752
have an allele with eight repeats,
428
00:20:07,236 --> 00:20:09,659
another with three repeats, and the last
429
00:20:09,659 --> 00:20:11,637
with ten repeats.
430
00:20:12,727 --> 00:20:15,312
The designation of an allele is dependent
431
00:20:16,200 --> 00:20:18,784
on the region in which the microsatellite
432
00:20:18,784 --> 00:20:21,287
is located, which is bounded by
433
00:20:21,287 --> 00:20:23,669
primers. The length of the segment
434
00:20:23,669 --> 00:20:25,648
containing the microsatellite can be
435
00:20:25,648 --> 00:20:28,232
determined using a sequencer and
436
00:20:28,232 --> 00:20:31,179
fragment analysis. The figure
437
00:20:31,179 --> 00:20:33,804
below illustrates the genotyping for a
438
00:20:33,804 --> 00:20:36,711
particular microsatellite, which in this
439
00:20:36,711 --> 00:20:39,699
case is characterized by three alleles
440
00:20:39,941 --> 00:20:42,040
numbered 156,
441
00:20:42,484 --> 00:20:45,028
152, and
442
00:20:46,966 --> 00:20:48,016
142.
443
00:20:52,457 --> 00:20:54,718
We can see that the microsatellites are
444
00:20:54,718 --> 00:20:56,979
very polymorphic; they can have
445
00:20:57,868 --> 00:21:00,654
not just three alleles, but 20
446
00:21:00,654 --> 00:21:03,318
alleles, which in a population can mean a
447
00:21:03,318 --> 00:21:05,943
large number of different combinations of
448
00:21:05,943 --> 00:21:08,688
genotypes. So they are useful
449
00:21:08,688 --> 00:21:11,192
for assessing diversity in populations.
450
00:21:11,757 --> 00:21:14,583
Here is an example of variability in
451
00:21:14,583 --> 00:21:17,571
two populations. The population on
452
00:21:17,571 --> 00:21:19,913
the left has little variability
453
00:21:20,397 --> 00:21:22,901
containing only three types of alleles
454
00:21:23,143 --> 00:21:25,162
and a large number of homozygous
455
00:21:25,242 --> 00:21:27,503
individuals. The second
456
00:21:27,503 --> 00:21:30,047
population on the right contains a
457
00:21:30,047 --> 00:21:32,591
large number of alleles and may often
458
00:21:32,591 --> 00:21:35,256
contain heterozygous genotypes.
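The contrast between the two populations can be put into numbers with a short Python sketch. It is an illustration under stated assumptions, not the method used in the lecture: individuals are assumed diploid, alleles are coded by fragment length, and expected heterozygosity is taken as one minus the sum of squared allele frequencies. The function name and the genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarize one microsatellite locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # two allele copies per diploid individual
    observed = sum(a != b for a, b in genotypes) / n
    expected = 1 - sum((c / total) ** 2 for c in counts.values())
    return {"n_alleles": len(counts), "Ho": observed, "He": expected}


# Hypothetical genotypes, alleles named by fragment length in base pairs.
low = [(142, 142), (142, 142), (152, 152), (142, 156)]
high = [(142, 152), (146, 156), (150, 160), (144, 158)]
print(locus_diversity(low))
print(locus_diversity(high))
```

In this toy data the first population has few alleles and mostly homozygous individuals, while the second has many alleles and only heterozygous genotypes.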
459
00:21:42,533 --> 00:21:45,198
Why are microsatellite markers still
460
00:21:45,198 --> 00:21:47,621
used when we have whole genome
461
00:21:47,621 --> 00:21:50,285
sequences? Microsatellite
462
00:21:50,285 --> 00:21:52,708
markers are relatively inexpensive,
463
00:21:53,354 --> 00:21:55,534
and their identification provides
464
00:21:55,534 --> 00:21:57,795
multilocus genotype information.
465
00:21:58,997 --> 00:22:01,177
They can easily be used to estimate the
466
00:22:01,177 --> 00:22:03,882
genetic diversity of populations and
467
00:22:03,882 --> 00:22:06,587
their structure, which is also important in
468
00:22:06,587 --> 00:22:08,768
conservation genetics and breeding.
469
00:22:11,836 --> 00:22:14,016
The only method used to determine
470
00:22:14,016 --> 00:22:16,358
genotypes is fragment analysis,
471
00:22:17,004 --> 00:22:19,750
which is performed in a sequencer using
472
00:22:19,750 --> 00:22:22,657
capillary electrophoresis. The fragments
473
00:22:22,738 --> 00:22:25,322
are separated according to their size,
474
00:22:25,564 --> 00:22:27,421
and the sensor detects the passage of the
475
00:22:27,421 --> 00:22:29,844
molecules, their color and signal
476
00:22:29,844 --> 00:22:32,024
intensity over time, providing
477
00:22:32,024 --> 00:22:33,720
information about the length of the
478
00:22:33,720 --> 00:22:36,465
fragments. The instrument used
479
00:22:36,546 --> 00:22:39,049
is a genetic analyzer. In our
480
00:22:39,049 --> 00:22:41,512
case, we used the ABI Prism
481
00:22:41,512 --> 00:22:43,168
3500
482
00:22:44,056 --> 00:22:47,044
genetic analyzer. Fragment
483
00:22:47,044 --> 00:22:49,305
sizes were actually determined using
484
00:22:49,305 --> 00:22:52,131
GeneScan software and genotypes were
485
00:22:52,131 --> 00:22:54,554
determined using GeneMapper software.
486
00:22:58,834 --> 00:23:01,095
As we were looking at 22
487
00:23:01,095 --> 00:23:03,759
microsatellite loci, we grouped
488
00:23:03,759 --> 00:23:05,778
certain microsatellites in a single
489
00:23:05,778 --> 00:23:08,685
multiplex reaction. We were able
490
00:23:08,685 --> 00:23:11,390
to identify several microsatellites under
491
00:23:11,390 --> 00:23:14,015
the same conditions, distinguished by
492
00:23:14,015 --> 00:23:15,226
different colors.
493
00:23:18,698 --> 00:23:20,879
The result of the analysis is displayed
494
00:23:21,121 --> 00:23:23,988
here. For a particular microsatellite
495
00:23:23,988 --> 00:23:26,935
locus, we see the color-coded peaks,
496
00:23:27,137 --> 00:23:29,559
which are identified fragments of a
497
00:23:29,559 --> 00:23:30,771
particular length.
498
00:23:35,575 --> 00:23:38,563
The figure shows the evaluation
499
00:23:38,604 --> 00:23:40,744
of genotype variability at
500
00:23:40,744 --> 00:23:43,247
microsatellite loci in three
501
00:23:43,247 --> 00:23:46,235
individuals. We can see that different
502
00:23:46,235 --> 00:23:48,052
alleles can be present at certain
503
00:23:48,052 --> 00:23:50,757
positions of the region, which allows
504
00:23:50,757 --> 00:23:53,018
easy discrimination of individuals.
505
00:24:00,124 --> 00:24:02,183
After determining the genotypes at
506
00:24:02,304 --> 00:24:04,969
all loci and for all individuals in the
507
00:24:04,969 --> 00:24:07,795
population, the next step is to
508
00:24:07,795 --> 00:24:10,460
perform a diversity analysis, that is,
509
00:24:10,783 --> 00:24:13,609
to determine diversity parameters such as
510
00:24:13,609 --> 00:24:15,386
the number of alleles (Na), the
511
00:24:16,597 --> 00:24:19,424
effective number of alleles (Ne), the
512
00:24:19,424 --> 00:24:21,372
Shannon information index (I),
513
00:24:22,492 --> 00:24:23,865
the observed and expected
514
00:24:24,430 --> 00:24:27,257
heterozygosity, Ho and He,
515
00:24:27,257 --> 00:24:30,244
respectively, and the unbiased expected
516
00:24:30,244 --> 00:24:33,071
heterozygosity (uHe),
517
00:24:33,717 --> 00:24:36,462
and the so-called fixation index F.
518
00:24:37,916 --> 00:24:40,823
We use the GenAlEx program, which runs
519
00:24:40,984 --> 00:24:43,528
in Microsoft Excel, but the data and
520
00:24:43,528 --> 00:24:46,152
parameters can also be calculated using,
521
00:24:46,475 --> 00:24:49,382
for example, the diveRsity package
522
00:24:49,382 --> 00:24:51,886
in R. We see
523
00:24:53,420 --> 00:24:56,004
that the expected heterozygosity averaged
524
00:24:56,166 --> 00:24:58,992
over all loci combined is
525
00:24:58,992 --> 00:25:01,818
0.579
526
00:25:02,182 --> 00:25:04,806
and the actual observed heterozygosity
527
00:25:05,129 --> 00:25:05,371
is
528
00:25:06,340 --> 00:25:09,328
0.556. This is a
529
00:25:09,328 --> 00:25:12,154
relatively high heterozygosity, which
530
00:25:12,154 --> 00:25:15,061
characterizes these bee populations
531
00:25:15,465 --> 00:25:17,524
as sufficiently diverse.
532
00:25:20,795 --> 00:25:23,540
Since we knew which area, or district,
533
00:25:23,904 --> 00:25:25,801
of the Czech Republic each individual
534
00:25:25,801 --> 00:25:28,466
came from, we divided the
535
00:25:28,466 --> 00:25:31,131
population of the Czech Republic into 77
536
00:25:31,131 --> 00:25:33,311
districts, which characterized the
537
00:25:33,311 --> 00:25:36,218
geographical areas. This allowed
538
00:25:36,299 --> 00:25:38,883
us to calculate the so-called Wright's F
539
00:25:38,883 --> 00:25:41,790
statistics, Fst, Fis,
540
00:25:41,952 --> 00:25:44,778
and Fit, and the so-called
541
00:25:44,778 --> 00:25:47,160
analysis of molecular variance, which
542
00:25:47,160 --> 00:25:49,139
determine the proportion of
543
00:25:49,219 --> 00:25:51,965
variability between populations,
544
00:25:52,126 --> 00:25:54,307
between individuals within populations,
545
00:25:54,387 --> 00:25:57,214
and within individuals. The
546
00:25:57,214 --> 00:25:59,717
table on the left shows
547
00:26:00,282 --> 00:26:03,189
the individual and the mean values of
548
00:26:03,189 --> 00:26:06,016
the F statistics. The Fst is
549
00:26:06,016 --> 00:26:08,519
most interesting because it measures
550
00:26:08,761 --> 00:26:10,296
the degree of variation between
551
00:26:10,296 --> 00:26:12,718
subpopulations, that is, districts.
552
00:26:13,364 --> 00:26:14,575
The value of
553
00:26:14,979 --> 00:26:17,805
0.086 is not very high,
554
00:26:17,805 --> 00:26:19,743
but it shows some
555
00:26:19,743 --> 00:26:22,731
diversification. In the table
556
00:26:22,731 --> 00:26:25,638
on the right, we see the result of the
557
00:26:25,638 --> 00:26:27,738
analysis of molecular variance,
558
00:26:28,949 --> 00:26:31,291
where in the last column the variation
559
00:26:31,291 --> 00:26:33,875
between regional populations (districts)
560
00:26:34,198 --> 00:26:36,782
accounts for only 1% of the total
561
00:26:36,782 --> 00:26:39,285
variation. But even
562
00:26:39,528 --> 00:26:42,273
1 to 3% of variability between
563
00:26:42,273 --> 00:26:45,180
geographical areas is a common figure
564
00:26:45,180 --> 00:26:47,603
according to other publications.
565
00:26:48,733 --> 00:26:51,075
Variability between individuals within
566
00:26:51,075 --> 00:26:53,901
populations is expressed as 6%,
567
00:26:54,628 --> 00:26:56,687
and within-individual variability
568
00:26:56,687 --> 00:26:58,868
accounts for most of the variability
569
00:26:58,868 --> 00:27:00,039
within populations.
570
00:27:04,884 --> 00:27:07,548
Pairwise Nei's and pairwise FST
571
00:27:07,791 --> 00:27:10,455
genetic distances were calculated
572
00:27:10,455 --> 00:27:12,313
using the GenAlEx program.
573
00:27:13,201 --> 00:27:15,462
Pairwise values of these distances were
574
00:27:15,462 --> 00:27:17,885
used for principal component analysis
575
00:27:17,885 --> 00:27:20,388
calculations. The top graph
576
00:27:20,388 --> 00:27:22,730
depicts the distances between the first
577
00:27:22,810 --> 00:27:24,991
and second components for each district.
578
00:27:25,718 --> 00:27:28,221
The bottom graph represents a
579
00:27:28,221 --> 00:27:30,118
calculation comparing the first and
580
00:27:30,118 --> 00:27:32,339
second components using the pairwise FST
581
00:27:32,339 --> 00:27:35,327
distances. We can see that, for
582
00:27:35,327 --> 00:27:37,830
example, the district of Děčín,
583
00:27:38,153 --> 00:27:40,657
DC, or Frýdek-Místek, FM,
584
00:27:41,222 --> 00:27:43,241
and similar other districts are a little
585
00:27:43,241 --> 00:27:45,421
more distant from the central cluster,
586
00:27:46,228 --> 00:27:48,328
and there are some distances between
587
00:27:48,328 --> 00:27:51,114
them, but they are not significant.
588
00:27:51,639 --> 00:27:54,384
So there are areas that are more
589
00:27:54,384 --> 00:27:56,080
distinct from other areas.
590
00:27:59,795 --> 00:28:02,338
The Bayesian clustering method and
591
00:28:02,338 --> 00:28:05,205
the program STRUCTURE were used to
592
00:28:05,245 --> 00:28:07,547
analyze the genetic diversity
593
00:28:08,031 --> 00:28:09,808
and admixture rate of honeybee
594
00:28:09,808 --> 00:28:12,150
populations. Ten
595
00:28:12,150 --> 00:28:14,330
independent simulations were run,
596
00:28:14,976 --> 00:28:17,641
each involving 10,000 burn-in steps,
597
00:28:18,045 --> 00:28:20,911
followed by 100,000
598
00:28:21,113 --> 00:28:23,778
iterations of Markov Chain Monte
599
00:28:23,778 --> 00:28:26,604
Carlo. We then used the
600
00:28:26,604 --> 00:28:29,269
CLUMPAK and StructureSelector programs,
601
00:28:29,754 --> 00:28:32,741
which implement Evanno's method and
602
00:28:32,741 --> 00:28:35,164
Puechmaille's method, respectively,
603
00:28:35,891 --> 00:28:37,829
to determine the optimal number of
604
00:28:37,829 --> 00:28:40,817
clusters K. The best
605
00:28:40,817 --> 00:28:43,481
fit was assessed using Delta
606
00:28:43,481 --> 00:28:46,469
K, MedMedK, MaxMedK,
607
00:28:46,550 --> 00:28:48,932
MedMeaK and
608
00:28:48,932 --> 00:28:51,879
MaxMeaK. In
609
00:28:51,879 --> 00:28:54,544
the case of this population, both methods
610
00:28:54,625 --> 00:28:56,725
determined that there are
611
00:28:57,128 --> 00:29:00,116
three genetically distinct populations
612
00:29:00,439 --> 00:29:02,539
in the honeybee population in the Czech
613
00:29:02,539 --> 00:29:05,446
Republic based on these 22
614
00:29:05,446 --> 00:29:08,111
microsatellite markers, and that each
615
00:29:08,111 --> 00:29:10,856
individual can be assigned to one of
616
00:29:10,856 --> 00:29:13,440
these three groups with a high
617
00:29:13,440 --> 00:29:15,136
degree of probability.
618
00:29:20,466 --> 00:29:23,090
Other methods can also be used to
619
00:29:23,090 --> 00:29:25,714
determine the genetic structure, such as
620
00:29:25,714 --> 00:29:28,137
discriminant analysis of
621
00:29:28,137 --> 00:29:30,802
principal components, DAPC,
622
00:29:31,448 --> 00:29:33,790
implemented in the adegenet
623
00:29:34,032 --> 00:29:36,616
package in R. This
624
00:29:36,616 --> 00:29:39,523
method, in turn, can be used to determine
625
00:29:39,523 --> 00:29:42,309
the structure of the population and to
626
00:29:42,309 --> 00:29:45,256
identify clusters, groups that are
627
00:29:45,256 --> 00:29:48,002
genetically distinct from each other
628
00:29:48,163 --> 00:29:51,070
and to which individuals can be
629
00:29:51,353 --> 00:29:53,331
unambiguously assigned.
630
00:29:54,220 --> 00:29:56,723
In this population of honeybees in the
631
00:29:56,723 --> 00:29:59,630
Czech Republic, five groups, five
632
00:29:59,630 --> 00:30:02,497
clusters, were estimated to be distinct.
633
00:30:06,534 --> 00:30:08,755
And thank you for your attention.