1 00:00:00,962 --> 00:00:03,806 Hello! In this lecture, we will deal 2 00:00:03,806 --> 00:00:06,731 with genomics in farm animals and, above 3 00:00:06,811 --> 00:00:09,776 all, with sequencing. The lecture is 4 00:00:09,776 --> 00:00:12,100 part of module 1, Animal Genetics. 5 00:00:12,741 --> 00:00:15,145 The creation of this presentation was 6 00:00:15,145 --> 00:00:17,549 supported by the ERASMUS+ 7 00:00:18,030 --> 00:00:20,034 KA2 grant within the 8 00:00:20,034 --> 00:00:22,798 ISAGREED project: Innovation of the 9 00:00:22,798 --> 00:00:25,323 content and structure of study programmes 10 00:00:25,643 --> 00:00:27,847 in the field of animal genetics and food 11 00:00:27,847 --> 00:00:29,970 resource management using 12 00:00:30,211 --> 00:00:31,253 digitization. 13 00:00:34,778 --> 00:00:37,743 DNA sequencing is a basic molecular 14 00:00:37,743 --> 00:00:40,628 genetic method. The method makes it 15 00:00:40,628 --> 00:00:43,032 possible to read genetic information, 16 00:00:43,353 --> 00:00:46,077 which is stored in the form of a sequence 17 00:00:46,198 --> 00:00:48,722 of nucleotides in a DNA molecule. 18 00:00:49,363 --> 00:00:52,328 In this figure, I present an overview 19 00:00:52,328 --> 00:00:55,133 of the most important sequencing methods, 20 00:00:55,373 --> 00:00:57,937 from the oldest to the most modern. 21 00:00:58,338 --> 00:01:00,702 A chemical method based on the specific 22 00:01:00,702 --> 00:01:03,186 cleavage of the DNA chain, and 23 00:01:03,186 --> 00:01:05,710 named after its discoverers 24 00:01:05,710 --> 00:01:08,114 Maxam and Gilbert, was the first 25 00:01:08,114 --> 00:01:10,679 sequencing method, but is no 26 00:01:10,679 --> 00:01:13,484 longer used today. The 27 00:01:13,484 --> 00:01:16,368 method discovered by Frederick Sanger, 28 00:01:16,689 --> 00:01:19,454 after whom it is named, is still used 29 00:01:19,454 --> 00:01:22,218 today. 
It is also referred to 30 00:01:22,218 --> 00:01:24,742 as the dideoxy method 31 00:01:25,023 --> 00:01:27,507 because it uses specially modified 32 00:01:27,507 --> 00:01:29,791 nucleotides, so-called end 33 00:01:29,831 --> 00:01:32,796 terminators. These are added 34 00:01:32,796 --> 00:01:34,879 to the mixture with the normal 35 00:01:34,879 --> 00:01:37,324 nucleotides, and if they are 36 00:01:37,364 --> 00:01:39,928 incorporated into the chain, the 37 00:01:39,928 --> 00:01:42,011 synthesis is terminated. 38 00:01:43,213 --> 00:01:45,698 Pyrosequencing and sequencing by 39 00:01:45,698 --> 00:01:48,342 hybridization are alternative 40 00:01:48,342 --> 00:01:50,826 methods that are 41 00:01:50,826 --> 00:01:53,471 currently little used. Conversely, 42 00:01:53,631 --> 00:01:55,194 next-generation sequencing 43 00:01:56,596 --> 00:01:59,481 is very widely used due to its 44 00:01:59,481 --> 00:02:02,045 high speed and sequencing capacity, 45 00:02:02,526 --> 00:02:05,491 as it can also be used to sequence entire 46 00:02:05,491 --> 00:02:07,975 genomes; it is also referred to 47 00:02:08,616 --> 00:02:10,940 as second-generation 48 00:02:10,940 --> 00:02:11,822 sequencing. 49 00:02:13,504 --> 00:02:16,229 Methods based on the sequencing of single 50 00:02:16,229 --> 00:02:18,713 DNA molecules, referred to as 51 00:02:18,793 --> 00:02:21,518 third-generation sequencing, are 52 00:02:21,518 --> 00:02:24,202 increasingly being used. A 53 00:02:24,323 --> 00:02:27,127 great advantage of sequencing is 54 00:02:27,127 --> 00:02:29,531 that it enables simple and accurate 55 00:02:29,531 --> 00:02:32,336 identification of a polymorphic 56 00:02:32,336 --> 00:02:32,977 site. 57 00:02:36,102 --> 00:02:38,827 Modern automatic sequencers are 58 00:02:38,947 --> 00:02:41,632 also suitable for direct detection 59 00:02:41,952 --> 00:02:43,154 of polymorphisms. 
60 00:02:45,158 --> 00:02:46,921 The principle of the method was 61 00:02:46,961 --> 00:02:49,405 discovered in 1977 62 00:02:49,966 --> 00:02:52,209 by the English biochemist Frederick 63 00:02:52,209 --> 00:02:55,134 Sanger. Sanger sequencing 64 00:02:55,415 --> 00:02:57,979 is based on dideoxynucleotides, 65 00:02:58,300 --> 00:03:00,704 called ddNTPs or 66 00:03:00,864 --> 00:03:01,986 end terminators. 67 00:03:02,947 --> 00:03:05,191 A ddNTP is a modified 68 00:03:05,191 --> 00:03:08,156 nucleotide that lacks the OH 69 00:03:08,557 --> 00:03:11,442 group on the 3' carbon of 70 00:03:11,442 --> 00:03:14,246 deoxyribose, which is necessary for 71 00:03:14,246 --> 00:03:15,849 binding the next nucleotide. 72 00:03:16,971 --> 00:03:19,375 Thus, upon incorporation of the 73 00:03:19,375 --> 00:03:22,140 ddNTP, chain synthesis 74 00:03:22,140 --> 00:03:25,025 stops, producing a chain whose length 75 00:03:25,025 --> 00:03:27,268 and colour correspond to the 76 00:03:27,308 --> 00:03:29,913 respective nucleotide on the template. 77 00:03:32,357 --> 00:03:34,521 The sequencing process consists of 78 00:03:34,521 --> 00:03:37,365 several steps. It starts with preparation 79 00:03:37,606 --> 00:03:40,210 of the sample, which is most often a 80 00:03:40,210 --> 00:03:43,095 PCR product. This must be 81 00:03:43,135 --> 00:03:45,659 purified to contain only template 82 00:03:45,739 --> 00:03:48,584 DNA. Furthermore, we need to 83 00:03:48,584 --> 00:03:50,708 know at least its approximate 84 00:03:50,788 --> 00:03:53,713 concentration. The sequencing 85 00:03:53,753 --> 00:03:56,437 reaction itself is a linear cyclic 86 00:03:56,558 --> 00:03:59,002 enzymatic reaction that must 87 00:03:59,002 --> 00:04:01,686 contain the listed components. The 88 00:04:01,686 --> 00:04:04,251 sequencing primer defines the start of 89 00:04:04,251 --> 00:04:06,975 sequencing; only one primer is used, 90 00:04:07,135 --> 00:04:09,660 not two as in PCR. 
The 91 00:04:09,660 --> 00:04:12,264 reaction is catalyzed by DNA polymerase 92 00:04:12,665 --> 00:04:15,469 in the reaction buffer with the addition 93 00:04:15,469 --> 00:04:17,232 of standard dNTPs 94 00:04:19,236 --> 00:04:21,800 and additionally ddNTPs. 95 00:04:23,082 --> 00:04:25,847 This is followed by purification of the 96 00:04:25,887 --> 00:04:28,652 sequencing reaction, in which free labelled 97 00:04:28,652 --> 00:04:31,296 ddNTPs must be removed. 98 00:04:32,298 --> 00:04:35,102 The mixture of fragments is separated by 99 00:04:35,102 --> 00:04:37,947 capillary electrophoresis in a sequencer 100 00:04:38,468 --> 00:04:40,151 and the result evaluated. 101 00:04:43,356 --> 00:04:46,041 Currently, a variant known as 4-colour 102 00:04:46,041 --> 00:04:48,445 terminator sequencing is routinely 103 00:04:48,445 --> 00:04:51,169 used. Individual termination 104 00:04:51,169 --> 00:04:54,054 nucleotides are marked with fluorescent 105 00:04:54,054 --> 00:04:56,739 colours according to the heterocyclic 106 00:04:56,739 --> 00:04:59,664 base they carry, that is, according to which nucleotide 107 00:04:59,784 --> 00:05:02,549 on the template they pair with. 108 00:05:03,390 --> 00:05:06,115 By the action of the polymerase, chains 109 00:05:06,115 --> 00:05:08,278 complementary to the template are 110 00:05:08,278 --> 00:05:11,083 synthesized. Usually normal 111 00:05:11,163 --> 00:05:13,447 dNTPs are incorporated 112 00:05:14,048 --> 00:05:16,933 and the chain is lengthened, but 113 00:05:17,213 --> 00:05:19,898 occasionally ddNTPs 114 00:05:19,978 --> 00:05:22,783 are also incorporated, which 115 00:05:22,823 --> 00:05:24,946 then terminates the synthesis of the 116 00:05:25,026 --> 00:05:27,711 chain. The resulting fragment 117 00:05:28,152 --> 00:05:30,556 is labelled with a fluorescent colour 118 00:05:30,956 --> 00:05:33,480 that corresponds to the nucleotide at the 119 00:05:33,480 --> 00:05:36,165 appropriate position of the template. 
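The chain-termination principle described so far can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not a model of a real sequencer: the template is given in the order in which the polymerase reads it, starting right after the primer; strand orientation is ignored; the dye colours, the ddNTP fraction `dd_fraction` and all function names are hypothetical choices made for this example.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye colours, one per terminator base (real kits may differ).
DYE = {"A": "green", "T": "red", "C": "blue", "G": "yellow"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def synthesize_fragment(template, dd_fraction, rng):
    """Extend one chain along the template until a ddNTP is incorporated.

    Returns (length, dye colour) of the terminated fragment, or None if the
    chain reached the end of the template without a terminator.
    """
    for position, template_base in enumerate(template, start=1):
        incorporated = COMPLEMENT[template_base]
        if rng.random() < dd_fraction:  # a labelled ddNTP was used here
            return position, DYE[incorporated]
        # otherwise a normal dNTP was incorporated and the chain grows
    return None


def sanger_read(template, n_chains=20000, dd_fraction=0.05, seed=1):
    """Simulate many chains, sort fragments by length, read the colours."""
    rng = random.Random(seed)
    colour_by_length = {}
    for _ in range(n_chains):
        fragment = synthesize_fragment(template, dd_fraction, rng)
        if fragment is not None:
            length, colour = fragment
            colour_by_length[length] = colour
    # "Electrophoresis": shortest fragments are detected first.
    ordered_lengths = sorted(colour_by_length)
    return "".join(COLOUR_TO_BASE[colour_by_length[n]] for n in ordered_lengths)


if __name__ == "__main__":
    print(sanger_read("ATGCCGTAAGCT"))
```

Because every fragment of a given length ends with the same labelled base, sorting the fragments by length and reading off the colours recovers the sequence of the newly synthesized strand, which is complementary to the template.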
120 00:05:38,088 --> 00:05:41,053 A mixture of single-chain molecules of 121 00:05:41,053 --> 00:05:44,018 different lengths and different colours is 122 00:05:44,018 --> 00:05:46,983 formed. In order to determine at 123 00:05:46,983 --> 00:05:49,788 which position which base occurs, an 124 00:05:49,828 --> 00:05:52,593 accurate electrophoretic separation of 125 00:05:52,593 --> 00:05:55,157 this mixture of fragments must take 126 00:05:55,157 --> 00:05:57,521 place. This is done with 127 00:05:57,561 --> 00:05:59,604 fluorescent capillary 128 00:05:59,685 --> 00:06:02,609 electrophoresis, which is the basis of 129 00:06:02,609 --> 00:06:05,054 genetic analyzers or sequencers. 130 00:06:05,855 --> 00:06:08,539 This device can not only separate and sort 131 00:06:08,539 --> 00:06:11,384 the fragments according to size, but 132 00:06:11,384 --> 00:06:13,588 also directly read the sequence of 133 00:06:13,588 --> 00:06:16,072 nucleotides based on different coloured 134 00:06:16,152 --> 00:06:19,117 peaks. For example, 135 00:06:19,117 --> 00:06:21,922 the chain 100 nucleotides 136 00:06:21,922 --> 00:06:23,925 long emits green light, 137 00:06:24,807 --> 00:06:27,772 i.e. there was adenine in position 138 00:06:27,772 --> 00:06:30,456 100 of the template; the 139 00:06:30,496 --> 00:06:33,301 chain 101 nucleotides long 140 00:06:33,301 --> 00:06:35,705 lights up red, i.e. in 141 00:06:35,705 --> 00:06:38,510 position 101 there was 142 00:06:38,510 --> 00:06:40,273 thymine, etc. 143 00:06:42,517 --> 00:06:45,161 The Sanger method is the most widely used 144 00:06:45,161 --> 00:06:47,325 sequencing method. Its 145 00:06:47,605 --> 00:06:50,250 advantages include the reading of relatively 146 00:06:50,330 --> 00:06:51,892 long DNA chains 147 00:06:52,213 --> 00:06:55,018 (approximately 800 bp) 148 00:06:55,498 --> 00:06:58,183 and at the same time high accuracy and 149 00:06:58,183 --> 00:07:00,867 reliability. 
If we need to know the 150 00:07:00,867 --> 00:07:03,432 sequence or detect a mutation in a 151 00:07:03,432 --> 00:07:06,276 specific section of the genome in one or two 152 00:07:06,357 --> 00:07:08,640 individuals, it is the most 153 00:07:08,640 --> 00:07:11,485 cost-effective option. Compared to 154 00:07:11,565 --> 00:07:14,050 NGS methods, however, the 155 00:07:14,090 --> 00:07:16,414 throughput is low and the price per 156 00:07:16,414 --> 00:07:19,298 sequenced base is very high. So the 157 00:07:19,298 --> 00:07:21,823 method is not suitable for sequencing 158 00:07:21,823 --> 00:07:24,427 large sections of the genome or even 159 00:07:24,427 --> 00:07:25,869 entire genomes. 160 00:07:28,514 --> 00:07:31,158 For whole-genome sequencing, it is 161 00:07:31,158 --> 00:07:33,843 more advisable to use another method, 162 00:07:34,364 --> 00:07:36,848 the so-called next-generation sequencing, 163 00:07:37,329 --> 00:07:39,893 NGS for short. The 164 00:07:39,893 --> 00:07:42,377 method is also referred to as second-generation 165 00:07:42,377 --> 00:07:45,302 sequencing. The method 166 00:07:45,502 --> 00:07:48,307 is based on DNA fragmentation 167 00:07:48,307 --> 00:07:51,272 and sequencing of short fragments, but 168 00:07:51,352 --> 00:07:54,117 in huge quantities at the same time, and 169 00:07:54,718 --> 00:07:57,603 is thus referred to as massively 170 00:07:57,683 --> 00:08:00,367 parallel sequencing. The method 171 00:08:00,448 --> 00:08:03,212 enables a generally very high sequencing 172 00:08:03,212 --> 00:08:05,416 capacity. However, you can choose 173 00:08:05,416 --> 00:08:07,940 sequencer variants with a capacity 174 00:08:08,261 --> 00:08:10,585 according to your needs, from 1 175 00:08:10,665 --> 00:08:13,550 gigabase to 8 terabases. 176 00:08:16,434 --> 00:08:19,399 Common NGS sequencers include Illumina 177 00:08:19,399 --> 00:08:22,324 devices. 
The basis of the method is the 178 00:08:22,324 --> 00:08:25,129 so-called bridge PCR and 179 00:08:25,129 --> 00:08:27,934 sequencing by synthesis using 180 00:08:27,934 --> 00:08:30,418 four-colour fluorescence. The 181 00:08:30,538 --> 00:08:32,862 actual sequencing takes place in 182 00:08:32,982 --> 00:08:35,346 clusters of the same sequence: the 183 00:08:35,346 --> 00:08:38,151 corresponding spot lights up in a colour 184 00:08:38,151 --> 00:08:40,956 according to the base. Sections 185 00:08:40,956 --> 00:08:43,840 of 150 to 300 186 00:08:43,840 --> 00:08:46,565 nucleotides in length are sequenced. 187 00:08:47,326 --> 00:08:49,770 A more detailed explanation is beyond the 188 00:08:49,770 --> 00:08:52,415 timeframe of this lecture, and I 189 00:08:52,415 --> 00:08:55,139 recommend watching videos on the internet 190 00:08:55,540 --> 00:08:58,185 to those interested in understanding 191 00:08:58,185 --> 00:08:59,948 this rather complex method. 192 00:09:02,191 --> 00:09:04,796 Next-generation sequencing is today a 193 00:09:04,796 --> 00:09:07,640 widely used tool in genetics for 194 00:09:07,640 --> 00:09:10,525 both research and diagnosis. The 195 00:09:10,525 --> 00:09:12,889 high capacity of the method makes it 196 00:09:12,929 --> 00:09:15,734 possible to obtain sequences of 197 00:09:15,734 --> 00:09:18,058 even large genomes of animals, 198 00:09:19,020 --> 00:09:21,985 e.g. mammals, the size of which is around 199 00:09:21,985 --> 00:09:24,308 3 billion nucleotides, 200 00:09:24,789 --> 00:09:27,754 within a few hours to days, which 201 00:09:27,754 --> 00:09:30,158 would take several years of work using 202 00:09:30,158 --> 00:09:33,123 the Sanger sequencing method. It is 203 00:09:33,123 --> 00:09:35,607 possible to obtain the entire genetic 204 00:09:35,607 --> 00:09:38,492 information of an individual in this 205 00:09:38,492 --> 00:09:41,057 way. 
It therefore allows 206 00:09:41,057 --> 00:09:43,461 finding a large number of new, or 207 00:09:43,461 --> 00:09:46,305 detecting all known, polymorphisms and 208 00:09:46,305 --> 00:09:49,070 mutations. However, there is a 209 00:09:49,070 --> 00:09:50,953 problem with the processing and 210 00:09:51,033 --> 00:09:53,317 evaluation of large amounts of data, 211 00:09:53,798 --> 00:09:56,362 which is why it is necessary to use very 212 00:09:56,362 --> 00:09:59,327 powerful bioinformatics 213 00:09:59,327 --> 00:10:01,892 tools. Whole-genome data are 214 00:10:01,892 --> 00:10:04,696 stored in the genome databases of 215 00:10:04,776 --> 00:10:07,301 individual organisms, and can be 216 00:10:07,301 --> 00:10:09,585 compared with a specific sample. 217 00:10:10,226 --> 00:10:12,469 The method is also important for 218 00:10:12,469 --> 00:10:14,152 sequencing individuals, 219 00:10:14,954 --> 00:10:17,798 e.g. patients, which is the 220 00:10:17,798 --> 00:10:19,762 field of personal genomics. 221 00:10:21,845 --> 00:10:24,089 Sequencing techniques based on the 222 00:10:24,169 --> 00:10:26,333 sequencing of a single molecule are 223 00:10:26,814 --> 00:10:29,298 referred to as third-generation 224 00:10:29,298 --> 00:10:32,183 sequencing. The advantage is long 225 00:10:32,183 --> 00:10:35,067 reads, in the case of de novo sequencing, 226 00:10:35,388 --> 00:10:36,269 up to seven kb. 227 00:10:38,673 --> 00:10:41,158 This makes it possible to correctly 228 00:10:41,318 --> 00:10:43,722 identify variants on one chain, 229 00:10:44,603 --> 00:10:47,488 the so-called haplotypes, which the reading 230 00:10:47,528 --> 00:10:49,291 of short sequences in the second 231 00:10:49,291 --> 00:10:51,495 generation does not allow. 232 00:10:52,296 --> 00:10:54,861 The method is gradually being improved, 233 00:10:55,382 --> 00:10:57,746 as the reading error rate is still higher 234 00:10:57,746 --> 00:11:00,590 compared to the two previous generations. 
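Why long reads help with haplotypes can be shown with simple arithmetic: a read can tell us that two variants lie on the same chain only if it covers both variant positions. The sketch below counts how many start positions allow that. The variant positions are hypothetical; the read lengths of 300 and 7000 nucleotides follow the figures mentioned in the lecture, and chromosome ends are ignored.

```python
def spans_both(read_start, read_length, site_a, site_b):
    """True if one read covers both variant positions (0-based coordinates)."""
    read_end = read_start + read_length  # exclusive end of the read
    return read_start <= min(site_a, site_b) and max(site_a, site_b) < read_end


def linking_read_starts(read_length, site_a, site_b):
    """Count the start positions from which a read spans both sites."""
    distance = abs(site_b - site_a)
    return max(0, read_length - distance)


if __name__ == "__main__":
    # Two hypothetical heterozygous sites 2500 nucleotides apart.
    site_a, site_b = 10_000, 12_500
    for read_length in (300, 7000):
        n = linking_read_starts(read_length, site_a, site_b)
        print(f"read length {read_length}: {n} start positions link both sites")
```

With these assumed positions, no 300-nucleotide read can contain both sites, so the two variants cannot be assigned to one chain from a single short read, whereas a 7 kb read can cover both.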
235 00:11:01,352 --> 00:11:03,635 Methods such as single-molecule 236 00:11:03,635 --> 00:11:06,240 real-time sequencing from Pacific 237 00:11:06,240 --> 00:11:09,004 Biosciences or nanopore sequencers from 238 00:11:09,044 --> 00:11:11,529 Oxford Nanopore are available. 239 00:11:13,933 --> 00:11:16,457 The results of whole-genome sequencing 240 00:11:16,577 --> 00:11:19,542 are stored in genomic databases and 241 00:11:19,542 --> 00:11:22,066 are often freely available. For 242 00:11:22,066 --> 00:11:24,711 example, the NCBI Internet 243 00:11:24,711 --> 00:11:27,235 database allows viewing 244 00:11:27,315 --> 00:11:29,880 down to the sequence level of 245 00:11:29,960 --> 00:11:32,203 individual gene nucleotides. 246 00:11:34,367 --> 00:11:36,731 Another practical use of sequencing in 247 00:11:36,771 --> 00:11:39,736 animals is species identification: 248 00:11:40,057 --> 00:11:43,022 molecular taxonomy with so-called DNA 249 00:11:43,102 --> 00:11:45,586 barcoding, which makes it possible to 250 00:11:45,586 --> 00:11:48,230 identify a species where it is not 251 00:11:48,230 --> 00:11:50,594 possible with the classical method, 252 00:11:51,195 --> 00:11:52,958 e.g., in insect larvae. 253 00:11:54,200 --> 00:11:56,965 A fragment of the gene for cytochrome 254 00:11:56,965 --> 00:11:59,770 c oxidase 1, which is located 255 00:11:59,850 --> 00:12:02,815 in mitochondrial DNA, is used for 256 00:12:02,815 --> 00:12:05,379 this purpose. This 257 00:12:05,379 --> 00:12:07,703 fragment is usually sequenced by 258 00:12:07,703 --> 00:12:10,107 the Sanger method, and species 259 00:12:10,107 --> 00:12:12,191 identification is performed by 260 00:12:12,271 --> 00:12:14,755 comparison with reference 261 00:12:14,835 --> 00:12:17,520 sequences stored in the BOLD or BLAST 262 00:12:17,640 --> 00:12:20,244 databases. The 263 00:12:20,244 --> 00:12:22,728 advantage of mitochondrial 264 00:12:22,728 --> 00:12:24,612 DNA is its higher copy number, i.e. 
265 00:12:25,974 --> 00:12:28,899 a greater amount of DNA per amount 266 00:12:28,979 --> 00:12:31,904 of biological material, and higher 267 00:12:31,904 --> 00:12:34,628 stability. It is therefore also used for the 268 00:12:34,628 --> 00:12:37,433 identification of older or museum 269 00:12:37,433 --> 00:12:39,597 samples. The 270 00:12:39,597 --> 00:12:42,241 disadvantage is the possibility of 271 00:12:42,241 --> 00:12:44,805 contamination with a bacterial genome 272 00:12:45,126 --> 00:12:46,809 (Wolbachia, etc.). 273 00:12:50,014 --> 00:12:52,498 This image shows a sample of a 274 00:12:52,498 --> 00:12:55,143 specific sequence obtained by 275 00:12:55,143 --> 00:12:57,707 mitochondrial DNA sequencing 276 00:12:58,228 --> 00:13:01,073 of an old museum butterfly specimen. 277 00:13:04,599 --> 00:13:07,043 By entering the sequence into the 278 00:13:07,243 --> 00:13:08,926 online alignment search tool 279 00:13:08,926 --> 00:13:11,570 called BLAST, it was 280 00:13:11,570 --> 00:13:14,135 possible to identify the species 281 00:13:14,135 --> 00:13:15,657 based on homology. 282 00:13:18,943 --> 00:13:21,828 The lecture is finished. I believe 283 00:13:21,828 --> 00:13:24,392 that you have understood the basic 284 00:13:24,392 --> 00:13:27,197 principles of one of the most important 285 00:13:27,357 --> 00:13:29,721 molecular-genetic methods: sequencing. 286 00:13:30,562 --> 00:13:33,006 I also recommend watching the follow-up 287 00:13:33,006 --> 00:13:35,931 presentation: Laboratory examples. 288 00:13:36,653 --> 00:13:38,175 Thank you for your attention.