1 00:00:01,131 --> 00:00:03,715 Hello, in this lecture for PhD 2 00:00:03,715 --> 00:00:06,097 students, we will focus on the 3 00:00:06,097 --> 00:00:08,116 possibilities of assessing genetic 4 00:00:08,237 --> 00:00:11,063 diversity and population structure using 5 00:00:11,224 --> 00:00:13,728 mitochondrial DNA and nuclear 6 00:00:13,728 --> 00:00:16,473 microsatellite markers applied to the 7 00:00:16,473 --> 00:00:19,138 honeybee. The lecture is part of module 8 00:00:19,138 --> 00:00:22,126 1, Animal Genetics. The creation 9 00:00:22,247 --> 00:00:24,266 of this presentation was supported by the 10 00:00:24,629 --> 00:00:27,456 Erasmus+ KA2 grant within 11 00:00:27,456 --> 00:00:29,959 the ISAGREED project, Innovation of the 12 00:00:29,959 --> 00:00:32,866 content and structure of study programs 13 00:00:32,866 --> 00:00:35,531 in the field of animal genetics and food 14 00:00:35,531 --> 00:00:37,469 resource management using 15 00:00:37,469 --> 00:00:38,680 digitalization. 16 00:00:40,739 --> 00:00:43,202 This lecture will cover topics such as 17 00:00:43,202 --> 00:00:45,786 genetic data acquisition, assessment of 18 00:00:45,786 --> 00:00:48,370 genetic variability using mitochondrial 19 00:00:48,370 --> 00:00:50,752 DNA sequences, and assessment of genetic 20 00:00:50,752 --> 00:00:53,700 variability using nuclear STRs, 21 00:00:53,700 --> 00:00:56,526 or microsatellite markers. Why 22 00:00:56,526 --> 00:00:59,514 is genetic diversity important? Genetic 23 00:00:59,514 --> 00:01:02,259 diversity is important for the yield of 24 00:01:02,259 --> 00:01:05,167 populations. It is a key source of 25 00:01:05,328 --> 00:01:07,791 the ability to build tolerance or 26 00:01:07,791 --> 00:01:10,052 resistance to current and future 27 00:01:10,052 --> 00:01:12,919 diseases, pathogens and predators. 
28 00:01:13,726 --> 00:01:16,230 The current state of bee populations can 29 00:01:16,230 --> 00:01:19,137 be attributed in part to a reduction in 30 00:01:19,137 --> 00:01:22,124 diversity. Bee diversity 31 00:01:22,286 --> 00:01:24,466 has been assessed using morphometric 32 00:01:24,466 --> 00:01:26,808 traits such as wing parameters, 33 00:01:27,010 --> 00:01:29,877 pigmentation, etc. The 34 00:01:29,877 --> 00:01:32,622 honey bee, Apis mellifera, is 35 00:01:32,622 --> 00:01:35,045 now known to comprise 31 36 00:01:35,045 --> 00:01:37,629 subspecies, breeds or races. 37 00:01:38,032 --> 00:01:40,132 DNA analyses, particularly of 38 00:01:40,132 --> 00:01:42,635 mitochondrial origin, have 39 00:01:42,635 --> 00:01:45,058 facilitated the description of 40 00:01:45,098 --> 00:01:47,278 evolutionary lineages including the 41 00:01:47,278 --> 00:01:50,145 Western Mediterranean type M, the 42 00:01:50,145 --> 00:01:52,406 Northern Mediterranean type C, 43 00:01:53,093 --> 00:01:55,677 the African lineage A, and the 44 00:01:55,677 --> 00:01:57,817 Oriental lineage O. The 45 00:01:59,916 --> 00:02:02,258 honeybee genome has been completely 46 00:02:02,258 --> 00:02:04,923 sequenced on multiple occasions. The 47 00:02:04,923 --> 00:02:07,911 individual chromosomes are visible and 48 00:02:07,911 --> 00:02:10,212 the penultimate column illustrates the 49 00:02:10,212 --> 00:02:13,200 size of each chromosome in terms of the 50 00:02:13,200 --> 00:02:15,986 number of base pairs. A total 51 00:02:17,116 --> 00:02:19,781 of 12,398 52 00:02:20,589 --> 00:02:23,415 genes have been described or are 53 00:02:23,415 --> 00:02:26,282 estimated to exist in the honeybee genome. 
54 00:02:27,008 --> 00:02:27,614 Of these, 55 00:02:27,856 --> 00:02:30,602 9,935 56 00:02:30,602 --> 00:02:33,509 genes code for some kind of 57 00:02:33,509 --> 00:02:34,841 protein, while 58 00:02:36,335 --> 00:02:38,919 2,421 genes do not 59 00:02:38,919 --> 00:02:41,665 code for proteins, but rather code for 60 00:02:41,745 --> 00:02:44,006 other RNAs, such as transfer 61 00:02:44,006 --> 00:02:46,348 RNAs or small nuclear 62 00:02:46,348 --> 00:02:47,156 RNAs. 63 00:02:49,659 --> 00:02:52,243 The mitochondrial DNA of most species is 64 00:02:52,243 --> 00:02:54,746 estimated to be within the range of 65 00:02:54,746 --> 00:02:57,734 approximately 16 to 20 kilobases. 66 00:02:58,461 --> 00:03:00,480 The mitochondrial genome of the Western 67 00:03:00,480 --> 00:03:02,822 honeybee is estimated to comprise 68 00:03:02,822 --> 00:03:03,871 approximately 69 00:03:04,033 --> 00:03:06,778 16,500 70 00:03:06,859 --> 00:03:09,605 base pairs. In the reference genome 71 00:03:09,605 --> 00:03:12,593 NC_001566, 72 00:03:13,239 --> 00:03:15,015 the mitochondrial DNA is 73 00:03:15,177 --> 00:03:18,730 16,343 74 00:03:18,810 --> 00:03:21,717 base pairs in size. In the 75 00:03:21,717 --> 00:03:23,656 complete mitochondrial genome of the Carpathian 76 00:03:23,656 --> 00:03:26,401 bee, the size of the 77 00:03:26,482 --> 00:03:27,733 mitochondrial DNA is 78 00:03:27,733 --> 00:03:29,712 16,358 79 00:03:30,883 --> 00:03:33,830 base pairs. The 80 00:03:33,830 --> 00:03:35,849 mitochondrial DNA of the honey bee 81 00:03:35,849 --> 00:03:38,595 contains 13 genes that encode 82 00:03:38,595 --> 00:03:41,582 proteins, as well as 22 genes 83 00:03:41,825 --> 00:03:44,732 that encode tRNA and two genes that 84 00:03:44,732 --> 00:03:47,720 encode ribosomal RNA. 
In 85 00:03:47,720 --> 00:03:50,586 particular, the barcoding sequence, which 86 00:03:50,586 --> 00:03:53,049 is the cytochrome oxidase 1 87 00:03:53,049 --> 00:03:55,351 sequence, and the so-called intergenic 88 00:03:55,351 --> 00:03:57,692 region, which includes parts of the 89 00:03:57,894 --> 00:04:00,478 tRNA genes for leucine and 90 00:04:00,478 --> 00:04:03,305 cytochrome oxidase 2, are employed 91 00:04:03,305 --> 00:04:05,969 for phylogenetic and 92 00:04:06,212 --> 00:04:07,746 phylogeographic analysis. 93 00:04:09,603 --> 00:04:11,703 What degree of variability can be 94 00:04:11,703 --> 00:04:14,206 observed at the DNA level and which 95 00:04:14,206 --> 00:04:17,113 molecular genetic markers exist? At 96 00:04:17,113 --> 00:04:19,859 present, the most frequently utilized 97 00:04:20,262 --> 00:04:22,039 are biallelic single nucleotide 98 00:04:22,039 --> 00:04:24,865 polymorphisms (SNPs), which can 99 00:04:24,865 --> 00:04:27,409 be identified in both coding and 100 00:04:27,409 --> 00:04:28,984 non-coding regions. 101 00:04:29,751 --> 00:04:32,496 Additionally, data on insertions and 102 00:04:32,496 --> 00:04:35,444 deletions, defined as the presence or 103 00:04:35,444 --> 00:04:37,947 absence of a base, can be employed. 104 00:04:38,512 --> 00:04:41,258 These are commonly referred to as indels. 105 00:04:41,742 --> 00:04:44,326 The image on the right illustrates this. 106 00:04:44,326 --> 00:04:46,991 The top row depicts SNP markers, 107 00:04:47,476 --> 00:04:50,463 while the bottom row displays deletions of 108 00:04:50,463 --> 00:04:53,169 cytosine in the ACA sequence 109 00:04:53,169 --> 00:04:53,613 region. 
110 00:04:57,812 --> 00:05:00,638 Following the isolation of the DNA from 111 00:05:00,638 --> 00:05:03,626 the bee sample and the amplification of 112 00:05:03,626 --> 00:05:06,049 a specific small section, for 113 00:05:06,049 --> 00:05:08,996 example, one of the two genes mentioned 114 00:05:09,036 --> 00:05:11,943 above, sequencing is conducted 115 00:05:11,943 --> 00:05:14,406 using a capillary electrophoresis-based 116 00:05:14,406 --> 00:05:16,506 sequencer. The resulting 117 00:05:16,506 --> 00:05:19,373 identification of the individual bases 118 00:05:19,696 --> 00:05:21,714 in the sequence is illustrated in the 119 00:05:21,714 --> 00:05:22,280 figure. The 120 00:05:26,600 --> 00:05:28,901 individual peaks are represented by the 121 00:05:28,901 --> 00:05:31,808 colors used to identify the individual 122 00:05:31,808 --> 00:05:32,374 bases. 123 00:05:36,734 --> 00:05:39,480 Two mitochondrial DNA sequences 124 00:05:39,641 --> 00:05:41,741 were employed for the purpose of 125 00:05:41,822 --> 00:05:44,809 identifying subtypes, mitotypes or 126 00:05:44,809 --> 00:05:47,716 haplotypes within the lineage. These are 127 00:05:47,716 --> 00:05:50,462 the aforementioned intergenic region 128 00:05:50,866 --> 00:05:53,046 tRNA-Leucine-cox2 129 00:05:54,984 --> 00:05:57,972 and the cytochrome oxidase 1 region. 130 00:05:59,425 --> 00:06:01,686 This section comprises two mitochondrial 131 00:06:01,686 --> 00:06:04,190 genes, the transfer RNA for 132 00:06:04,190 --> 00:06:07,016 leucine and cytochrome oxidase 2. 133 00:06:07,581 --> 00:06:10,327 This sequence is distinguished by a 134 00:06:10,327 --> 00:06:13,072 high mutation content and notable 135 00:06:13,072 --> 00:06:15,939 variations in nucleotide length and 136 00:06:15,939 --> 00:06:18,240 composition across honey bee 137 00:06:18,240 --> 00:06:21,067 populations. 
This amplicon 138 00:06:21,067 --> 00:06:23,409 is cleaved by the restriction endonuclease 139 00:06:23,409 --> 00:06:25,831 DraI, which 140 00:06:25,831 --> 00:06:27,688 specifically recognizes the 141 00:06:27,769 --> 00:06:30,353 TTTAAA sequence, 142 00:06:30,757 --> 00:06:32,776 to identify each lineage. 143 00:06:33,745 --> 00:06:35,925 The second sequence is the sequence for 144 00:06:35,925 --> 00:06:38,913 the barcoding region, and this is part 145 00:06:38,953 --> 00:06:41,820 of the cytochrome oxidase 1 (cox1) gene. 146 00:06:43,031 --> 00:06:45,736 This sequence is compared to 147 00:06:45,777 --> 00:06:48,643 sequences stored in databases such as the 148 00:06:48,643 --> 00:06:50,743 BOLD system or GenBank. 149 00:06:51,591 --> 00:06:54,013 The DNA fragment is highly conserved 150 00:06:54,336 --> 00:06:56,557 within taxa and is often used to 151 00:06:56,557 --> 00:06:59,101 distinguish taxa and species. 152 00:07:01,927 --> 00:07:04,511 The tRNA-Leucine-cox2 153 00:07:04,511 --> 00:07:07,418 sequence structure allows for the 154 00:07:07,459 --> 00:07:09,921 identification of distinct evolutionary 155 00:07:10,164 --> 00:07:12,909 lineages within the honeybee. 156 00:07:13,394 --> 00:07:15,736 The C lineage, which 157 00:07:15,736 --> 00:07:17,997 encompasses the honeybee, 158 00:07:17,997 --> 00:07:20,540 Apis mellifera, as well as the ligustica, 159 00:07:20,540 --> 00:07:22,761 macedonica and other related 160 00:07:22,761 --> 00:07:25,587 subspecies, is characterized 161 00:07:25,587 --> 00:07:28,575 by the presence of a single copy of the Q 162 00:07:28,575 --> 00:07:31,482 sequence. 
The A 163 00:07:31,482 --> 00:07:34,470 lineage contains one to two copies 164 00:07:34,793 --> 00:07:37,700 of the aforementioned Q sequence, in 165 00:07:37,700 --> 00:07:40,526 addition to the so-called 166 00:07:40,526 --> 00:07:43,393 P0 segment. The 167 00:07:43,514 --> 00:07:46,260 M lineage, which is the original black 168 00:07:46,260 --> 00:07:49,086 bee, Apis mellifera mellifera, which is no 169 00:07:49,086 --> 00:07:51,832 longer found in the Czech Republic, may 170 00:07:51,832 --> 00:07:54,739 contain one, two or three repeats of the 171 00:07:54,819 --> 00:07:57,000 aforementioned Q sequence, 172 00:07:57,646 --> 00:08:00,553 in addition to the so-called P 173 00:08:00,553 --> 00:08:02,975 sequence. By sequencing and 174 00:08:02,975 --> 00:08:05,398 comparing individual sequences, it is 175 00:08:05,398 --> 00:08:07,901 possible to identify the evolutionary 176 00:08:07,901 --> 00:08:09,516 lineage of each bee. 177 00:08:11,939 --> 00:08:14,927 The identification of a particular lineage 178 00:08:15,169 --> 00:08:17,349 is possible through cleavage with the 179 00:08:17,349 --> 00:08:19,691 restriction enzyme DraI, 180 00:08:20,256 --> 00:08:21,952 which recognizes the 181 00:08:22,194 --> 00:08:24,778 TTTAAA sequence. 182 00:08:25,586 --> 00:08:28,574 Once this sequence changes as a result of 183 00:08:28,574 --> 00:08:31,158 mutation, the enzyme is unable to 184 00:08:31,158 --> 00:08:33,580 recognize it and is therefore 185 00:08:33,580 --> 00:08:36,083 unable to cleave it. Using a 186 00:08:36,083 --> 00:08:38,506 classical PCR reaction 187 00:08:38,829 --> 00:08:40,606 and the lengths of the individual 188 00:08:40,606 --> 00:08:43,513 fragments, it is possible to distinguish 189 00:08:43,513 --> 00:08:46,097 between variants such as C and 190 00:08:46,097 --> 00:08:48,358 A1 or A4. 
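The DraI cleavage described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `dra1_fragments` and the toy sequences are hypothetical and are not real honeybee amplicons, and the cut is placed in the middle of the site because DraI is a blunt cutter (TTT^AAA).

```python
def dra1_fragments(seq: str, site: str = "TTTAAA", cut_offset: int = 3) -> list[int]:
    """Return the fragment lengths after cutting a linear sequence at every DraI site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        # DraI cuts between TTT and AAA, i.e. 3 bases into the site
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy amplicon with two intact sites (hypothetical sequence)
intact = "GGTTTAAACCCCTTTAAAG"
# Same amplicon with a point mutation in the first site (TTTAAA -> TTTGAA)
mutated = "GGTTTGAACCCCTTTAAAG"

print(dra1_fragments(intact))   # [5, 10, 4]
print(dra1_fragments(mutated))  # [15, 4]
```

A single substitution inside the recognition site removes one cut, so two fragments merge into one longer fragment. That change in the fragment-length pattern is what distinguishes the variants on a gel.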
191 00:08:50,538 --> 00:08:53,284 The second option is to obtain the entire 192 00:08:53,284 --> 00:08:56,029 sequence of a given segment by sequencing 193 00:08:56,514 --> 00:08:59,098 and subsequently analyzing it. The 194 00:08:59,098 --> 00:09:00,834 following example illustrates the 195 00:09:00,874 --> 00:09:03,781 sequencing and subsequent analysis of 196 00:09:03,862 --> 00:09:06,608 tRNA-Leucine-cox2 sequences from 197 00:09:07,415 --> 00:09:08,788 several individuals. 198 00:09:11,049 --> 00:09:13,310 Some software, such as Unipro 199 00:09:13,794 --> 00:09:16,540 UGENE, enables the user to 200 00:09:16,540 --> 00:09:19,447 perform the cleavage with the DraI 201 00:09:19,447 --> 00:09:22,273 restriction enzyme in silico, that is, 202 00:09:22,475 --> 00:09:24,777 on a computer. The following 203 00:09:24,777 --> 00:09:27,441 example illustrates the cleavage of a 204 00:09:27,441 --> 00:09:30,349 sequence belonging to the C lineage, 205 00:09:30,672 --> 00:09:33,094 which contains three cleavage sites, 206 00:09:33,417 --> 00:09:35,840 resulting in three fragments of specific 207 00:09:35,840 --> 00:09:36,203 length. 208 00:09:38,747 --> 00:09:41,411 In another sample, only two 209 00:09:41,411 --> 00:09:44,076 cleavage sites were identified and the 210 00:09:44,076 --> 00:09:46,499 length of the fragments suggests that this 211 00:09:46,499 --> 00:09:49,244 is a bee belonging to the A lineage or 212 00:09:49,244 --> 00:09:50,456 African lineage. 213 00:09:53,605 --> 00:09:56,108 Subsequently, the sequences 214 00:09:56,108 --> 00:09:58,531 obtained from the larger population are 215 00:09:58,531 --> 00:10:01,276 compared using the method of multiple 216 00:10:01,276 --> 00:10:04,022 sequence alignment, MSA. 217 00:10:05,072 --> 00:10:07,979 This process could be completed 218 00:10:07,979 --> 00:10:10,321 manually; however, software has been 219 00:10:10,321 --> 00:10:13,228 developed with algorithms that facilitate 220 00:10:13,470 --> 00:10:16,458 this task. 
Some of these programs are 221 00:10:16,458 --> 00:10:18,719 accessible online, for example on the 222 00:10:18,799 --> 00:10:20,738 European Bioinformatics Institute 223 00:10:20,738 --> 00:10:22,918 servers. For our purposes, 224 00:10:22,918 --> 00:10:25,421 Kalign was the most suitable. 225 00:10:25,744 --> 00:10:28,328 However, there are other tools such as 226 00:10:28,328 --> 00:10:31,316 Clustal Omega, MAFFT 227 00:10:31,477 --> 00:10:34,223 and so on. Additionally, there are 228 00:10:34,223 --> 00:10:36,646 programs that can be downloaded and 229 00:10:36,646 --> 00:10:39,512 installed to perform this analysis, such 230 00:10:39,512 --> 00:10:41,612 as MEGA or Unipro UGENE. 231 00:10:44,519 --> 00:10:47,063 The DnaSP program was 232 00:10:47,063 --> 00:10:49,889 employed to identify DNA polymorphisms 233 00:10:49,889 --> 00:10:52,796 and haplotypes in both mitochondrial DNA 234 00:10:52,796 --> 00:10:55,541 regions, utilizing all sequences from 235 00:10:55,541 --> 00:10:57,883 multiple sequence alignments in FASTA 236 00:10:57,883 --> 00:10:58,126 format. 237 00:11:03,960 --> 00:11:06,786 Moreover, nucleotide substitutions and 238 00:11:06,786 --> 00:11:09,350 insertions/deletions for each haplotype 239 00:11:09,431 --> 00:11:11,692 were compared with the reference genome. 240 00:11:12,338 --> 00:11:14,881 To identify specific haplotypes in the 241 00:11:14,881 --> 00:11:17,425 tRNA-Leucine-cox2 region of lineages 242 00:11:17,667 --> 00:11:20,453 C and A, reference sequences 243 00:11:20,574 --> 00:11:23,562 with 100% identity were 244 00:11:23,562 --> 00:11:26,106 further searched using the BLAST 245 00:11:26,469 --> 00:11:29,376 local pairwise alignment tool, among 246 00:11:29,376 --> 00:11:31,799 sequences found in the National Center 247 00:11:31,799 --> 00:11:34,221 for Biotechnology Information (NCBI) 248 00:11:34,867 --> 00:11:37,371 GenBank database in the US. 
249 00:11:38,098 --> 00:11:40,682 BLAST was also employed to verify the 250 00:11:40,682 --> 00:11:43,669 cox1 haplotypes, with the sequences 251 00:11:43,669 --> 00:11:46,496 subsequently validated using the 252 00:11:46,496 --> 00:11:48,636 BOLD database based on the multiple 253 00:11:48,636 --> 00:11:51,583 alignments using Kalign, necessitating 254 00:11:51,583 --> 00:11:53,521 additional manual refinement. 255 00:11:56,065 --> 00:11:58,931 A total of 13 haplotypes were 256 00:11:58,931 --> 00:12:01,596 identified, three of which belonged to 257 00:12:01,637 --> 00:12:04,342 the A lineage and the rest to the 258 00:12:04,342 --> 00:12:06,885 C lineage. The most prevalent 259 00:12:06,885 --> 00:12:09,550 haplotype was C1a, which is 260 00:12:09,550 --> 00:12:11,932 typical for Apis mellifera ligustica, the 261 00:12:11,973 --> 00:12:14,920 Italian bee. The table illustrates the 262 00:12:14,920 --> 00:12:17,262 classification of individual haplotypes 263 00:12:17,262 --> 00:12:20,210 into C and A lineages based 264 00:12:20,210 --> 00:12:22,713 on DraI restriction cleavage and 265 00:12:22,713 --> 00:12:25,660 sequencing. The individual haplotypes 266 00:12:25,660 --> 00:12:28,567 and their sequences have been uploaded to 267 00:12:28,567 --> 00:12:31,313 the GenBank database on the NCBI 268 00:12:31,313 --> 00:12:34,180 server. In the third column, the 269 00:12:34,180 --> 00:12:36,885 reference sequences are 270 00:12:36,885 --> 00:12:39,792 displayed. Additionally, the numbers 271 00:12:39,792 --> 00:12:42,134 and lengths of the fragments produced by 272 00:12:42,134 --> 00:12:45,121 the cleavage are shown, which also 273 00:12:45,121 --> 00:12:47,625 demonstrate the considerable variability. 
274 00:12:48,594 --> 00:12:50,653 In the last two columns, the 275 00:12:50,774 --> 00:12:53,520 identification of or comparison with 276 00:12:53,641 --> 00:12:56,023 other sequences in the GenBank database 277 00:12:56,023 --> 00:12:58,607 is presented, where sequences 278 00:12:58,688 --> 00:13:01,595 with 100% identity to our 279 00:13:01,595 --> 00:13:03,936 sequences have been selected. 280 00:13:05,148 --> 00:13:07,813 It is notable that all haplotypes 281 00:13:07,813 --> 00:13:10,558 belonging to the C lineage have been 282 00:13:10,558 --> 00:13:12,819 previously described in Apis mellifera 283 00:13:12,819 --> 00:13:15,484 carnica. Additionally, three 284 00:13:15,484 --> 00:13:17,341 distinct African haplotypes are 285 00:13:17,785 --> 00:13:20,046 identified as Apis mellifera 286 00:13:20,168 --> 00:13:23,155 iberica. However, a single 287 00:13:23,398 --> 00:13:26,022 sequence exhibiting complete 288 00:13:26,022 --> 00:13:28,566 identity was not assigned to any 289 00:13:28,566 --> 00:13:30,262 particular subspecies. 290 00:13:32,523 --> 00:13:34,420 This table illustrates the 291 00:13:34,420 --> 00:13:36,843 identification of the most significant 292 00:13:36,843 --> 00:13:39,548 polymorphic sites, indicating the bases 293 00:13:39,548 --> 00:13:42,294 present in each haplotype. The 294 00:13:42,294 --> 00:13:44,878 positions were found to exhibit mainly 295 00:13:44,878 --> 00:13:47,502 single nucleotide polymorphisms and 296 00:13:47,502 --> 00:13:50,046 deletions. It is notable that 297 00:13:50,126 --> 00:13:52,064 position 50 displays 298 00:13:52,387 --> 00:13:55,052 a polymorphism with the standard 299 00:13:55,052 --> 00:13:58,040 allele identified as C. The remaining 300 00:13:58,040 --> 00:14:00,624 haplotypes at this position exhibited a 301 00:14:00,624 --> 00:14:01,432 deletion. 
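A table of polymorphic sites like the one described above can be derived directly from a multiple sequence alignment. The following is a minimal sketch, assuming the sequences are already aligned to equal length with `-` marking a deletion. The function name `polymorphic_sites` and the three toy sequences are hypothetical and do not come from the lecture data.

```python
def polymorphic_sites(alignment: dict[str, str]) -> dict[int, dict[str, str]]:
    """Map each variable alignment column (1-based) to the base seen in every sequence."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    sites = {}
    for i in range(length):
        column = {name: alignment[name][i].upper() for name in names}
        # A column is polymorphic when more than one state (base or gap) occurs
        if len(set(column.values())) > 1:
            sites[i + 1] = column
    return sites


# Toy alignment: one SNP at position 2 and one single-base deletion at position 5
toy_alignment = {
    "reference": "ACGTCA",
    "hap_1": "ACGT-A",
    "hap_2": "ATGTCA",
}

for position, bases in polymorphic_sites(toy_alignment).items():
    print(position, bases)
```

Gaps are treated here as just another character state, which is enough to list SNPs and deletions side by side. Dedicated tools such as DnaSP offer more options for handling gaps.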
302 00:14:03,450 --> 00:14:05,873 Similarly, the cox1 sequence 303 00:14:06,115 --> 00:14:08,942 was analyzed, whereby 13 304 00:14:08,942 --> 00:14:11,687 different haplotypes for barcoding 305 00:14:12,252 --> 00:14:15,159 were identified. As with the 306 00:14:15,159 --> 00:14:17,340 previous analysis, individual SNP 307 00:14:17,340 --> 00:14:20,085 mutations were observed. No insertions or 308 00:14:20,085 --> 00:14:22,508 deletions were identified in the 309 00:14:22,508 --> 00:14:25,011 barcoding sequence. Only SNP 310 00:14:25,011 --> 00:14:27,191 substitutions were present. 311 00:14:30,341 --> 00:14:32,925 The tables below present the results of 312 00:14:32,925 --> 00:14:35,186 the haplotype frequencies in the 313 00:14:35,186 --> 00:14:37,851 tRNA-Leucine-cox2 region 314 00:14:38,254 --> 00:14:40,354 and in the cox1 gene in the Czech 315 00:14:40,354 --> 00:14:43,261 Republic. It can be observed that there 316 00:14:43,261 --> 00:14:46,007 are a number of haplotypes that are 317 00:14:46,087 --> 00:14:48,348 relatively well represented, such as 318 00:14:48,348 --> 00:14:50,690 C1a, C2l, 319 00:14:50,932 --> 00:14:52,951 C2e, and C2c. 320 00:14:53,759 --> 00:14:56,423 In contrast, there are haplotypes that 321 00:14:56,423 --> 00:14:59,169 have been identified in only one or a 322 00:14:59,169 --> 00:15:02,157 few individuals. A similar 323 00:15:02,238 --> 00:15:05,064 situation was observed in the cox1 324 00:15:05,064 --> 00:15:07,244 gene, where the first four 325 00:15:07,244 --> 00:15:09,747 haplotypes were the most frequently 326 00:15:09,747 --> 00:15:11,928 represented, while the others were 327 00:15:12,009 --> 00:15:14,431 present in only a few or in individual 328 00:15:14,431 --> 00:15:15,239 samples. 
329 00:15:18,469 --> 00:15:20,487 From a population of bees in the Czech 330 00:15:20,487 --> 00:15:23,071 Republic comprising over 300 331 00:15:23,071 --> 00:15:26,059 samples, certain characteristics were 332 00:15:26,059 --> 00:15:28,724 calculated, including haplotype diversity 333 00:15:28,724 --> 00:15:31,389 parameters in the tRNA-Leucine- 334 00:15:31,389 --> 00:15:34,054 cox2 and cox1 sequences. 335 00:15:34,700 --> 00:15:37,284 The genetic diversity indices, namely 336 00:15:37,284 --> 00:15:40,272 haplotype diversity (Hd), nucleotide 337 00:15:40,272 --> 00:15:43,098 diversity (π) and Tajima's D, 338 00:15:43,421 --> 00:15:46,409 were estimated. These indices 339 00:15:47,135 --> 00:15:49,154 were evaluated using the pegas package 340 00:15:50,043 --> 00:15:52,304 in R, although 341 00:15:52,465 --> 00:15:55,211 alternative software such as DnaSP, 342 00:15:55,614 --> 00:15:58,521 MEGA or Arlequin can also be employed. 343 00:16:01,509 --> 00:16:03,851 Additionally, the so-called 344 00:16:04,093 --> 00:16:06,677 haplotype networks were determined. 345 00:16:07,485 --> 00:16:09,907 We used the Randomized Minimum Spanning 346 00:16:09,988 --> 00:16:12,814 Tree method (RMST), 347 00:16:13,622 --> 00:16:16,085 which takes into account frequencies and 348 00:16:16,206 --> 00:16:18,306 relationships between haplotypes. 349 00:16:19,436 --> 00:16:22,101 These haplotype networks were processed 350 00:16:22,182 --> 00:16:24,443 using the pegas package in R. 351 00:16:25,169 --> 00:16:27,754 However, other programs such as PopART 352 00:16:28,521 --> 00:16:29,853 can also be used. 353 00:16:34,698 --> 00:16:37,524 Here we see the result of haplotype 354 00:16:37,524 --> 00:16:39,543 network analysis for 355 00:16:39,543 --> 00:16:42,006 308 individuals in the 356 00:16:42,006 --> 00:16:44,711 tRNA-Leucine-cox2 sequence. 
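The haplotype diversity and π indices named above have simple closed forms that can be checked by hand. The sketch below is an illustration only, not the pegas or DnaSP implementation. It assumes Nei's estimator Hd = n/(n-1) · (1 - Σp²) and defines π as the mean number of pairwise differences per site. The function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's haplotype diversity from one haplotype label per individual."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(sequences: list[str]) -> float:
    """Mean pairwise differences per site for equal-length aligned sequences."""
    pairs = list(combinations(sequences, 2))
    length = len(sequences[0])
    # Gaps are compared like any other character in this simplified version
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)


# Toy sample: three individuals carry one haplotype, one carries another
labels = ["C1a", "C1a", "C1a", "C2e"]
print(haplotype_diversity(labels))  # 0.5

toy_sequences = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(nucleotide_diversity(toy_sequences))
```

Hd depends only on haplotype frequencies, while π also reflects how different the haplotypes are from each other. That is why the two indices are usually reported together.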
357 00:16:45,438 --> 00:16:48,103 The most common haplotype is C1a, 358 00:16:48,507 --> 00:16:51,494 followed by C2e, which has two point 359 00:16:51,494 --> 00:16:54,402 mutations, and C2c, which is the 360 00:16:54,402 --> 00:16:57,228 third most common haplotype in the Czech 361 00:16:57,228 --> 00:17:00,054 Republic. Each color in the circle 362 00:17:00,054 --> 00:17:02,800 represents a specific region from which 363 00:17:03,607 --> 00:17:06,272 the bee was obtained. The aim was to 364 00:17:06,272 --> 00:17:09,139 cover the whole Czech Republic, with 365 00:17:09,179 --> 00:17:11,723 the sampling evenly distributed 366 00:17:11,723 --> 00:17:12,894 across all regions. 367 00:17:14,993 --> 00:17:17,900 This slide presents the analysis 368 00:17:17,941 --> 00:17:20,161 of the haplotype network for the cox1 369 00:17:20,161 --> 00:17:22,342 sequence. As observed 370 00:17:22,422 --> 00:17:24,199 previously, if a haplotype was 371 00:17:24,199 --> 00:17:26,864 sufficiently abundant, it occurred in 372 00:17:26,944 --> 00:17:29,448 almost all regions. This is 373 00:17:29,771 --> 00:17:32,637 exemplified by HpB02, 374 00:17:34,616 --> 00:17:36,150 HpB03, 375 00:17:36,473 --> 00:17:38,411 HpB01 and 376 00:17:38,492 --> 00:17:40,349 HpB04. 377 00:17:41,722 --> 00:17:44,064 Conversely, the other haplotypes were 378 00:17:44,064 --> 00:17:46,971 present in a few individuals or in only 379 00:17:46,971 --> 00:17:48,021 one individual. 380 00:17:51,251 --> 00:17:53,350 Subsequently, further 381 00:17:53,512 --> 00:17:56,177 phylogenetic analysis was conducted on 382 00:17:56,177 --> 00:17:58,599 these sequences. Following the 383 00:17:58,599 --> 00:18:01,425 completion of the multiple 384 00:18:01,425 --> 00:18:04,252 sequence alignment (MSA), phylogenetic tree 385 00:18:04,817 --> 00:18:07,482 generation was conducted using the maximum 386 00:18:07,482 --> 00:18:10,268 likelihood method and the Tamura-Nei 387 00:18:10,268 --> 00:18:12,731 model in MEGA X software. 
388 00:18:13,619 --> 00:18:16,284 This method entailed the construction of 389 00:18:16,607 --> 00:18:19,352 a bootstrap consensus tree based on 390 00:18:19,433 --> 00:18:21,936 10,000 replicates. 391 00:18:21,936 --> 00:18:24,520 Branches corresponding to 392 00:18:24,601 --> 00:18:27,104 partitions reproduced in less than 50% of 393 00:18:27,104 --> 00:18:30,092 bootstrap replicates are collapsed, and 394 00:18:30,092 --> 00:18:32,353 the percentage of replicate trees 395 00:18:32,353 --> 00:18:35,341 that clustered related haplotypes in 396 00:18:35,341 --> 00:18:37,925 the bootstrap test is shown. The 397 00:18:37,925 --> 00:18:40,913 initial tree for the heuristic 398 00:18:40,994 --> 00:18:43,255 search was obtained automatically by 399 00:18:43,255 --> 00:18:46,081 applying the Neighbor-Joining and BioNJ 400 00:18:46,848 --> 00:18:49,473 algorithms to the pairwise distance 401 00:18:49,473 --> 00:18:52,460 matrix estimated by the Tamura-Nei model, 402 00:18:52,945 --> 00:18:55,448 and then selecting the topology with the 403 00:18:55,448 --> 00:18:57,467 highest log-likelihood value. 404 00:18:58,396 --> 00:19:00,495 As a result, the following phylogenetic 405 00:19:00,495 --> 00:19:02,716 trees were obtained. 406 00:19:04,977 --> 00:19:07,319 The phylogenetic tree based on the 407 00:19:07,319 --> 00:19:10,064 analysis of the tRNA-Leucine-cox2 408 00:19:10,064 --> 00:19:12,648 sequence is displayed on the left. 409 00:19:13,133 --> 00:19:15,798 The phylogenetic tree based on the cox1 410 00:19:15,798 --> 00:19:18,705 sequence is displayed on the right. It 411 00:19:18,705 --> 00:19:21,127 can be observed that the bees from lineage 412 00:19:21,127 --> 00:19:24,115 A cluster together, in 413 00:19:24,115 --> 00:19:26,376 contrast to the other bees, 414 00:19:26,699 --> 00:19:29,687 namely those from lineage C, which are 415 00:19:29,687 --> 00:19:30,979 marked in red. 
416 00:19:35,662 --> 00:19:38,610 The second type of markers used 417 00:19:38,610 --> 00:19:40,508 to assess genetic variability are 418 00:19:40,508 --> 00:19:42,849 microsatellites. These are 419 00:19:42,849 --> 00:19:45,797 polymorphisms that occur exclusively in 420 00:19:45,797 --> 00:19:48,098 nuclear DNA. These 421 00:19:48,098 --> 00:19:50,521 polymorphisms are characterized by the 422 00:19:50,521 --> 00:19:53,105 repetition of a particular motif, 423 00:19:53,589 --> 00:19:56,456 such as GC, in a series of 424 00:19:56,456 --> 00:19:59,323 units called tandem repeats. Each 425 00:19:59,323 --> 00:20:01,665 allele is defined by the length of 426 00:20:01,665 --> 00:20:04,572 this repeat. To illustrate, we 427 00:20:04,572 --> 00:20:06,752 have an allele with eight repeats, 428 00:20:07,236 --> 00:20:09,659 another with three repeats, and the last 429 00:20:09,659 --> 00:20:11,637 with ten repeats. 430 00:20:12,727 --> 00:20:15,312 The designation of an allele depends 431 00:20:16,200 --> 00:20:18,784 on the region in which the microsatellite 432 00:20:18,784 --> 00:20:21,287 is located, which is bounded by 433 00:20:21,287 --> 00:20:23,669 primers. The length of the segment 434 00:20:23,669 --> 00:20:25,648 containing the microsatellite can be 435 00:20:25,648 --> 00:20:28,232 determined using a sequencer and 436 00:20:28,232 --> 00:20:31,179 fragment analysis. The figure 437 00:20:31,179 --> 00:20:33,804 below illustrates the genotyping for a 438 00:20:33,804 --> 00:20:36,711 particular microsatellite, which in this 439 00:20:36,711 --> 00:20:39,699 case is characterized by three alleles 440 00:20:39,941 --> 00:20:42,040 numbered 156, 441 00:20:42,484 --> 00:20:45,028 152, and 442 00:20:46,966 --> 00:20:48,016 142. 
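The idea that a microsatellite allele is defined by its number of tandem repeats can be illustrated with a short Python sketch. The function name `longest_repeat_run`, the flanking bases, and the toy alleles are hypothetical. In practice the allele is read as a fragment length from the genetic analyzer rather than counted from sequence.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` found in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy alleles: identical flanking regions, different numbers of GC repeats
allele_long = "AAT" + "GC" * 8 + "TTA"
allele_short = "AAT" + "GC" * 3 + "TTA"

print(longest_repeat_run(allele_long, "GC"))   # 8
print(longest_repeat_run(allele_short, "GC"))  # 3
# The amplified fragments differ in length by (8 - 3) * 2 = 10 bases
print(len(allele_long) - len(allele_short))    # 10
```

Because the flanking regions between the primers are constant, a difference in repeat number translates directly into a difference in fragment length. This is what fragment analysis measures.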
443 00:20:52,457 --> 00:20:54,718 We can see that the microsatellites are 444 00:20:54,718 --> 00:20:56,979 very polymorphic; they can have 445 00:20:57,868 --> 00:21:00,654 not just three alleles, but 20 446 00:21:00,654 --> 00:21:03,318 alleles, which in a population can mean a 447 00:21:03,318 --> 00:21:05,943 large number of different combinations of 448 00:21:05,943 --> 00:21:08,688 genotypes. So they are useful 449 00:21:08,688 --> 00:21:11,192 for assessing diversity in populations. 450 00:21:11,757 --> 00:21:14,583 Here is an example of variability in 451 00:21:14,583 --> 00:21:17,571 two populations. The population on 452 00:21:17,571 --> 00:21:19,913 the left has little variability, 453 00:21:20,397 --> 00:21:22,901 containing only three types of alleles 454 00:21:23,143 --> 00:21:25,162 and a large number of homozygous 455 00:21:25,242 --> 00:21:27,503 individuals. The second 456 00:21:27,503 --> 00:21:30,047 population on the right contains a 457 00:21:30,047 --> 00:21:32,591 large number of alleles and may often 458 00:21:32,591 --> 00:21:35,256 contain heterozygous genotypes. 459 00:21:42,533 --> 00:21:45,198 Why are microsatellite markers still 460 00:21:45,198 --> 00:21:47,621 used when we have whole genome 461 00:21:47,621 --> 00:21:50,285 sequences? Microsatellite 462 00:21:50,285 --> 00:21:52,708 markers are relatively inexpensive, 463 00:21:53,354 --> 00:21:55,534 and their identification provides 464 00:21:55,534 --> 00:21:57,795 multilocus genotype information. 465 00:21:58,997 --> 00:22:01,177 They can easily be used to estimate the 466 00:22:01,177 --> 00:22:03,882 genetic diversity and structure of 467 00:22:03,882 --> 00:22:06,587 populations, which is also important in 468 00:22:06,587 --> 00:22:08,768 conservation genetics and breeding. 
469 00:22:11,836 --> 00:22:14,016 The only method used to determine 470 00:22:14,016 --> 00:22:16,358 genotypes is fragment analysis, 471 00:22:17,004 --> 00:22:19,750 which is performed in a sequencer using 472 00:22:19,750 --> 00:22:22,657 capillary electrophoresis. The fragments 473 00:22:22,738 --> 00:22:25,322 are separated according to their size, 474 00:22:25,564 --> 00:22:27,421 and the sensor detects the passage of the 475 00:22:27,421 --> 00:22:29,844 molecules, their color and signal 476 00:22:29,844 --> 00:22:32,024 intensity over time, providing 477 00:22:32,024 --> 00:22:33,720 information about the length of the 478 00:22:33,720 --> 00:22:36,465 fragments. The instrument used 479 00:22:36,546 --> 00:22:39,049 is a genetic analyzer. In our 480 00:22:39,049 --> 00:22:41,512 case, we used the ABI Prism 481 00:22:41,512 --> 00:22:43,168 3500 482 00:22:44,056 --> 00:22:47,044 genetic analyzer. Fragment 483 00:22:47,044 --> 00:22:49,305 sizes were determined using 484 00:22:49,305 --> 00:22:52,131 GeneScan software and genotypes were 485 00:22:52,131 --> 00:22:54,554 determined using GeneMapper software. 486 00:22:58,834 --> 00:23:01,095 As we were looking at 22 487 00:23:01,095 --> 00:23:03,759 microsatellite loci, we grouped 488 00:23:03,759 --> 00:23:05,778 certain microsatellites in a single 489 00:23:05,778 --> 00:23:08,685 multiplex reaction. We were able 490 00:23:08,685 --> 00:23:11,390 to identify several microsatellites under 491 00:23:11,390 --> 00:23:14,015 the same conditions, distinguished by 492 00:23:14,015 --> 00:23:15,226 different colors. 493 00:23:18,698 --> 00:23:20,879 The result of the analysis is displayed 494 00:23:21,121 --> 00:23:23,988 here. For a particular microsatellite 495 00:23:23,988 --> 00:23:26,935 locus, we see the color-coded peaks, 496 00:23:27,137 --> 00:23:29,559 which are identified fragments of a 497 00:23:29,559 --> 00:23:30,771 particular length. 
498 00:23:35,575 --> 00:23:38,563 The figure shows the evaluation 499 00:23:38,604 --> 00:23:40,744 of genotype variability at 500 00:23:40,744 --> 00:23:43,247 microsatellite loci in three 501 00:23:43,247 --> 00:23:46,235 individuals. We can see that different 502 00:23:46,235 --> 00:23:48,052 alleles can be present at certain 503 00:23:48,052 --> 00:23:50,757 positions of the region, which allows 504 00:23:50,757 --> 00:23:53,018 easy discrimination of individuals. 505 00:24:00,124 --> 00:24:02,183 After determining the genotypes at 506 00:24:02,304 --> 00:24:04,969 all loci and for all individuals in the 507 00:24:04,969 --> 00:24:07,795 population, the next step is to 508 00:24:07,795 --> 00:24:10,460 perform a diversity analysis, that is, 509 00:24:10,783 --> 00:24:13,609 to determine diversity parameters such as 510 00:24:13,609 --> 00:24:15,386 the number of alleles (Na), the 511 00:24:16,597 --> 00:24:19,424 effective number of alleles (Ne), the 512 00:24:19,424 --> 00:24:21,372 Shannon Information Index (I), 513 00:24:22,492 --> 00:24:23,865 the observed and expected 514 00:24:24,430 --> 00:24:27,257 heterozygosity (Ho and He, 515 00:24:27,257 --> 00:24:30,244 respectively), the unbiased expected 516 00:24:30,244 --> 00:24:33,071 heterozygosity (uHe), 517 00:24:33,717 --> 00:24:36,462 and the so-called fixation index (F). 518 00:24:37,916 --> 00:24:40,823 We use the GenAlEx program, which runs 519 00:24:40,984 --> 00:24:43,528 in Microsoft Excel, but the 520 00:24:43,528 --> 00:24:46,152 parameters can also be calculated using, 521 00:24:46,475 --> 00:24:49,382 for example, the diveRsity package 522 00:24:49,382 --> 00:24:51,886 in R. We see 523 00:24:53,420 --> 00:24:56,004 that the expected heterozygosity averaged 524 00:24:56,166 --> 00:24:58,992 over all loci combined is 525 00:24:58,992 --> 00:25:01,818 0.579 526 00:25:02,182 --> 00:25:04,806 and the actual observed heterozygosity 527 00:25:05,129 --> 00:25:05,371 is 528 00:25:06,340 --> 00:25:09,328 0.556. 
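To make the diversity parameters listed above more concrete, here is a minimal Python sketch that computes them for a single locus from diploid genotypes. It uses the standard textbook definitions (Ne = 1/Σp², He = 1 − Σp², uHe = 2N/(2N − 1) · He, F = 1 − Ho/He); the function name `locus_diversity` and the genotype values are hypothetical and are not taken from the lecture data or from GenAlEx itself.

```python
from collections import Counter
from math import log


def locus_diversity(genotypes):
    """Per-locus diversity parameters for diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    # Count every allele copy (two per diploid individual).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    he = 1 - sum_p2                                   # expected heterozygosity
    return {
        "Na": len(counts),                        # number of alleles
        "Ne": 1 / sum_p2,                         # effective number of alleles
        "I": -sum(p * log(p) for p in freqs),     # Shannon information index
        "Ho": ho,
        "He": he,
        "uHe": 2 * n / (2 * n - 1) * he,          # unbiased expected heterozygosity
        "F": 1 - ho / he if he > 0 else float("nan"),  # fixation index
    }


# Hypothetical fragment lengths (bp) for four worker bees at one locus.
example = [(100, 102), (100, 100), (102, 104), (100, 104)]
stats = locus_diversity(example)
print(stats)
```

As a rough cross-check against the values quoted in the lecture, applying the same fixation index formula to the mean heterozygosities gives 1 − 0.556/0.579, or about 0.04; a mean of per-locus F values may differ slightly from this.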
This is a 529 00:25:09,328 --> 00:25:12,154 relatively high heterozygosity, which 530 00:25:12,154 --> 00:25:15,061 characterizes these bee populations 531 00:25:15,465 --> 00:25:17,524 as sufficiently diverse. 532 00:25:20,795 --> 00:25:23,540 Since we knew which area, or district, 533 00:25:23,904 --> 00:25:25,801 of the Czech Republic each individual 534 00:25:25,801 --> 00:25:28,466 came from, we divided the 535 00:25:28,466 --> 00:25:31,131 population of the Czech Republic into 77 536 00:25:31,131 --> 00:25:33,311 districts, which characterized the 537 00:25:33,311 --> 00:25:36,218 geographical areas. This allowed 538 00:25:36,299 --> 00:25:38,883 us to calculate the so-called Wright's F 539 00:25:38,883 --> 00:25:41,790 statistics, Fst, Fis, 540 00:25:41,952 --> 00:25:44,778 and Fit, and the so-called 541 00:25:44,778 --> 00:25:47,160 analysis of molecular variance, which 542 00:25:47,160 --> 00:25:49,139 determines the proportion of 543 00:25:49,219 --> 00:25:51,965 variability between populations, 544 00:25:52,126 --> 00:25:54,307 between individuals within populations, 545 00:25:54,387 --> 00:25:57,214 and within individuals. The 546 00:25:57,214 --> 00:25:59,717 table on the left shows 547 00:26:00,282 --> 00:26:03,189 the individual and mean values of 548 00:26:03,189 --> 00:26:06,016 the F statistics. The Fst is 549 00:26:06,016 --> 00:26:08,519 most interesting because it determines 550 00:26:08,761 --> 00:26:10,296 the degree of variation between 551 00:26:10,296 --> 00:26:12,718 subpopulations, that is, districts. 552 00:26:13,364 --> 00:26:14,575 The value of 553 00:26:14,979 --> 00:26:17,805 0.086 is not very high, 554 00:26:17,805 --> 00:26:19,743 but it does show some 555 00:26:19,743 --> 00:26:22,731 differentiation. In the table 556 00:26:22,731 --> 00:26:25,638 on the right, 
we see the result of the 557 00:26:25,638 --> 00:26:27,738 analysis of molecular variance, 558 00:26:28,949 --> 00:26:31,291 where in the last column the variation 559 00:26:31,291 --> 00:26:33,875 between regional populations (districts) 560 00:26:34,198 --> 00:26:36,782 accounts for only 1% of the total 561 00:26:36,782 --> 00:26:39,285 variation. But even 562 00:26:39,528 --> 00:26:42,273 1 to 3% of variability between 563 00:26:42,273 --> 00:26:45,180 geographical areas is a common figure 564 00:26:45,180 --> 00:26:47,603 according to other publications. 565 00:26:48,733 --> 00:26:51,075 Variability between individuals within 566 00:26:51,075 --> 00:26:53,901 populations is expressed as 6%, 567 00:26:54,628 --> 00:26:56,687 and within-individual variability 568 00:26:56,687 --> 00:26:58,868 accounts for most of the 569 00:26:58,868 --> 00:27:00,039 variability. 570 00:27:04,884 --> 00:27:07,548 Pairwise Nei's and pairwise FST 571 00:27:07,791 --> 00:27:10,455 genetic distances were calculated 572 00:27:10,455 --> 00:27:12,313 using the GenAlEx program. 573 00:27:13,201 --> 00:27:15,462 These pairwise distances were 574 00:27:15,462 --> 00:27:17,885 used for principal component analysis 575 00:27:17,885 --> 00:27:20,388 calculations. The top graph 576 00:27:20,388 --> 00:27:22,730 plots the first 577 00:27:22,810 --> 00:27:24,991 and second components for each district. 578 00:27:25,718 --> 00:27:28,221 The bottom graph represents a 579 00:27:28,221 --> 00:27:30,118 calculation comparing the first and 580 00:27:30,118 --> 00:27:32,339 second components using the pairwise FST 581 00:27:32,339 --> 00:27:35,327 distances. 
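The pairwise distances between districts mentioned above can be illustrated with a small Python sketch. It computes Nei's standard genetic distance, D = −ln(Jxy / √(Jx·Jy)), and a simple heterozygosity-based Fst, (Ht − Hs)/Ht, from allele frequencies of two populations. This is a simplified illustration under stated assumptions (equal population weights, no sample-size correction), not the exact procedure GenAlEx applies; the function names and the allele frequencies are hypothetical.

```python
from math import log, sqrt


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: one dict {allele: frequency} per locus, same locus order.
    Undefined (math error) if the populations share no alleles at all.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    return -log(jxy / sqrt(jx * jy))


def pairwise_fst(pop_x, pop_y):
    """Fst as (Ht - Hs) / Ht, summed over loci, equal population weights.

    Undefined (division by zero) if both populations are fixed for the
    same allele at every locus.
    """
    hs = ht = 0.0
    for fx, fy in zip(pop_x, pop_y):
        # Mean expected heterozygosity within the two populations.
        hs += 1 - 0.5 * (sum(p * p for p in fx.values())
                         + sum(p * p for p in fy.values()))
        # Expected heterozygosity of the pooled (averaged) frequencies.
        alleles = set(fx) | set(fy)
        ht += 1 - sum(((fx.get(a, 0.0) + fy.get(a, 0.0)) / 2) ** 2
                      for a in alleles)
    return (ht - hs) / ht


# Hypothetical allele frequencies at one locus in two districts.
district_a = [{100: 0.8, 102: 0.2}]
district_b = [{100: 0.2, 102: 0.8}]
print(nei_distance(district_a, district_b), pairwise_fst(district_a, district_b))
```

Computing such a value for every pair of districts yields the distance matrix that an ordination method can then project onto its first two axes.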
We can see that, for 582 00:27:35,327 --> 00:27:37,830 example, the district of Děčín, 583 00:27:38,153 --> 00:27:40,657 DC, or Frýdek-Místek, FM, 584 00:27:41,222 --> 00:27:43,241 and other similar districts are a little 585 00:27:43,241 --> 00:27:45,421 more distant from the central cluster, 586 00:27:46,228 --> 00:27:48,328 and there are some distances between 587 00:27:48,328 --> 00:27:51,114 them, but they are not significant. 588 00:27:51,639 --> 00:27:54,384 So there are areas that are somewhat more 589 00:27:54,384 --> 00:27:56,080 distinct from other areas. 590 00:27:59,795 --> 00:28:02,338 The Bayesian clustering method and 591 00:28:02,338 --> 00:28:05,205 the STRUCTURE program were used to 592 00:28:05,245 --> 00:28:07,547 analyze the genetic diversity 593 00:28:08,031 --> 00:28:09,808 and admixture rate of honeybee 594 00:28:09,808 --> 00:28:12,150 populations. Ten 595 00:28:12,150 --> 00:28:14,330 independent simulations were run, 596 00:28:14,976 --> 00:28:17,641 each involving 10,000 burn-in steps, 597 00:28:18,045 --> 00:28:20,911 followed by 100,000 598 00:28:21,113 --> 00:28:23,778 iterations of Markov Chain Monte 599 00:28:23,778 --> 00:28:26,604 Carlo. We then used the 600 00:28:26,604 --> 00:28:29,269 CLUMPAK and StructureSelector programs, 601 00:28:29,754 --> 00:28:32,741 which implement Evanno's method and 602 00:28:32,741 --> 00:28:35,164 Puechmaille's method, respectively, 603 00:28:35,891 --> 00:28:37,829 to determine the optimal number of 604 00:28:37,829 --> 00:28:40,817 clusters K. The best 605 00:28:40,817 --> 00:28:43,481 fit to the data was assessed using Delta 606 00:28:43,481 --> 00:28:46,469 K, MedMedK, MaxMedK, 607 00:28:46,550 --> 00:28:48,932 MedMeaK and 608 00:28:48,932 --> 00:28:51,879 MaxMeaK. 
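The Delta K criterion named above can be sketched in a few lines of Python. The sketch assumes the commonly used form of Evanno's statistic: the absolute second-order difference of the mean log-likelihood ln P(D|K) across replicate runs, divided by the standard deviation of ln P(D|K) at that K. The function name `delta_k` and all log-likelihood values are hypothetical; they are not output from the lecture's STRUCTURE runs, and the tools mentioned in the lecture may differ in detail.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style Delta K for each interior K.

    lnp_by_k: {K: [ln P(D|K) of each replicate run]} for consecutive K.
    Delta K is undefined for the smallest and largest K tested, and
    fails with a division by zero if all replicates at a K are identical.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for k in ks[1:-1]:
        # Absolute second-order rate of change of the mean log-likelihood.
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical ln P(D|K) values from three replicate runs per K.
runs = {
    1: [-5000, -5002, -4998],
    2: [-4800, -4801, -4799],
    3: [-4600, -4602, -4598],
    4: [-4590, -4610, -4600],
    5: [-4595, -4620, -4585],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this made-up example the log-likelihood stops improving after K = 3, so Delta K peaks there; with real data the peak is usually less clear-cut, which is one reason to compare it with Puechmaille's estimators.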
In 609 00:28:51,879 --> 00:28:54,544 the case of this population, both methods 610 00:28:54,625 --> 00:28:56,725 determined that there are 611 00:28:57,128 --> 00:29:00,116 three genetically distinct groups 612 00:29:00,439 --> 00:29:02,539 in the honeybee population in the Czech 613 00:29:02,539 --> 00:29:05,446 Republic based on these 22 614 00:29:05,446 --> 00:29:08,111 microsatellite markers, and that each 615 00:29:08,111 --> 00:29:10,856 individual can be assigned to one of 616 00:29:10,856 --> 00:29:13,440 these three groups with a high 617 00:29:13,440 --> 00:29:15,136 degree of probability. 618 00:29:20,466 --> 00:29:23,090 Other methods can also be used to 619 00:29:23,090 --> 00:29:25,714 determine the genetic structure, such as 620 00:29:25,714 --> 00:29:28,137 the discriminant analysis of 621 00:29:28,137 --> 00:29:30,802 principal components, DAPC, 622 00:29:31,448 --> 00:29:33,790 implemented in the adegenet 623 00:29:34,032 --> 00:29:36,616 package in R. This 624 00:29:36,616 --> 00:29:39,523 method, in turn, can be used to determine 625 00:29:39,523 --> 00:29:42,309 the structure of the population and to 626 00:29:42,309 --> 00:29:45,256 identify clusters, groups that are 627 00:29:45,256 --> 00:29:48,002 genetically distinct from each other 628 00:29:48,163 --> 00:29:51,070 and to which individuals can be 629 00:29:51,353 --> 00:29:53,331 unambiguously assigned. 630 00:29:54,220 --> 00:29:56,723 In this population of honey bees in the 631 00:29:56,723 --> 00:29:59,630 Czech Republic, five groups, five 632 00:29:59,630 --> 00:30:02,497 clusters, were estimated to be different. 633 00:30:06,534 --> 00:30:08,755 And thank you for your attention.