1
00:00:00,962 --> 00:00:03,806
Hello! In this lecture, we will deal

2
00:00:03,806 --> 00:00:06,731
with genomics in farm animals and, above

3
00:00:06,811 --> 00:00:09,776
all, with sequencing. The lecture is

4
00:00:09,776 --> 00:00:12,100
part of module 1, Animal Genetics.

5
00:00:12,741 --> 00:00:15,145
The creation of this presentation was

6
00:00:15,145 --> 00:00:17,549
supported by the ERASMUS+

7
00:00:18,030 --> 00:00:20,034
KA2 grant within the

8
00:00:20,034 --> 00:00:22,798
ISAGREED project: Innovation of the

9
00:00:22,798 --> 00:00:25,323
content and structure of study programmes

10
00:00:25,643 --> 00:00:27,847
in the field of animal genetics and food

11
00:00:27,847 --> 00:00:29,970
resource management using

12
00:00:30,211 --> 00:00:31,253
digitization.

13
00:00:34,778 --> 00:00:37,743
DNA sequencing is a basic molecular

14
00:00:37,743 --> 00:00:40,628
genetic method. The method makes it

15
00:00:40,628 --> 00:00:43,032
possible to read genetic information,

16
00:00:43,353 --> 00:00:46,077
which is stored in the form of a sequence

17
00:00:46,198 --> 00:00:48,722
of nucleotides in a DNA molecule.

18
00:00:49,363 --> 00:00:52,328
In this figure, I present an overview

19
00:00:52,328 --> 00:00:55,133
of the most important sequencing methods,

20
00:00:55,373 --> 00:00:57,937
from the oldest to the most modern.

21
00:00:58,338 --> 00:01:00,702
A chemical method based on the specific

22
00:01:00,702 --> 00:01:03,186
cleavage of the DNA chain and

23
00:01:03,186 --> 00:01:05,710
named after its discoverers,

24
00:01:05,710 --> 00:01:08,114
Maxam-Gilbert, was the first

25
00:01:08,114 --> 00:01:10,679
sequencing method, but is no

26
00:01:10,679 --> 00:01:13,484
longer used today. The

27
00:01:13,484 --> 00:01:16,368
method discovered by Frederick Sanger,

28
00:01:16,689 --> 00:01:19,454
after whom it is named, is still used

29
00:01:19,454 --> 00:01:22,218
today. It is also referred to

30
00:01:22,218 --> 00:01:24,742
as the dideoxy method

31
00:01:25,023 --> 00:01:27,507
because it uses specially modified

32
00:01:27,507 --> 00:01:29,791
nucleotides, so-called end

33
00:01:29,831 --> 00:01:32,796
terminators. These are added

34
00:01:32,796 --> 00:01:34,879
to the mixture with the normal

35
00:01:34,879 --> 00:01:37,324
nucleotides, and if they are

36
00:01:37,364 --> 00:01:39,928
incorporated into the chain, the

37
00:01:39,928 --> 00:01:42,011
synthesis is terminated.

38
00:01:43,213 --> 00:01:45,698
Pyrosequencing and sequencing by

39
00:01:45,698 --> 00:01:48,342
hybridization are alternative

40
00:01:48,342 --> 00:01:50,826
methods, the use of which is

41
00:01:50,826 --> 00:01:53,471
currently limited. Conversely,

42
00:01:53,631 --> 00:01:55,194
next-generation sequencing

43
00:01:56,596 --> 00:01:59,481
is very widely used due to its

44
00:01:59,481 --> 00:02:02,045
high speed and sequencing capacity,

45
00:02:02,526 --> 00:02:05,491
as it can also be used to sequence entire

46
00:02:05,491 --> 00:02:07,975
genomes and is referred to

47
00:02:08,616 --> 00:02:10,940
as second-generation

48
00:02:10,940 --> 00:02:11,822
sequencing.

49
00:02:13,504 --> 00:02:16,229
Methods based on the sequencing of a single

50
00:02:16,229 --> 00:02:18,713
DNA molecule, referred to as

51
00:02:18,793 --> 00:02:21,518
third-generation sequencing, are

52
00:02:21,518 --> 00:02:24,202
increasingly being used. A

53
00:02:24,323 --> 00:02:27,127
great advantage of sequencing is

54
00:02:27,127 --> 00:02:29,531
that it enables simple and accurate

55
00:02:29,531 --> 00:02:32,336
identification of a polymorphic

56
00:02:32,336 --> 00:02:32,977
site.

57
00:02:36,102 --> 00:02:38,827
Modern automatic sequencers are

58
00:02:38,947 --> 00:02:41,632
also suitable for direct detection

59
00:02:41,952 --> 00:02:43,154
of polymorphisms.

60
00:02:45,158 --> 00:02:46,921
The principle of the method was

61
00:02:46,961 --> 00:02:49,405
discovered in 1977

62
00:02:49,966 --> 00:02:52,209
by the English biochemist Frederick

63
00:02:52,209 --> 00:02:55,134
Sanger. Sanger sequencing is based

64
00:02:55,415 --> 00:02:57,979
on dideoxynucleotides,

65
00:02:58,300 --> 00:03:00,704
called ddNTPs or

66
00:03:00,864 --> 00:03:01,986
end terminators.

67
00:03:02,947 --> 00:03:05,191
ddNTP is a modified

68
00:03:05,191 --> 00:03:08,156
nucleotide that lacks the OH

69
00:03:08,557 --> 00:03:11,442
group on the 3' carbon of

70
00:03:11,442 --> 00:03:14,246
deoxyribose, which is necessary for

71
00:03:14,246 --> 00:03:15,849
binding another nucleotide.

72
00:03:16,971 --> 00:03:19,375
Thus, upon incorporation of the

73
00:03:19,375 --> 00:03:22,140
ddNTP, chain synthesis

74
00:03:22,140 --> 00:03:25,025
stops, producing a chain of the size

75
00:03:25,025 --> 00:03:27,268
and colour corresponding to the

76
00:03:27,308 --> 00:03:29,913
matching nucleotide on the template.

77
00:03:32,357 --> 00:03:34,521
The sequencing process consists of

78
00:03:34,521 --> 00:03:37,365
several steps. It starts with sample

79
00:03:37,606 --> 00:03:40,210
preparation, which is most often a

80
00:03:40,210 --> 00:03:43,095
PCR product. This must be

81
00:03:43,135 --> 00:03:45,659
purified to contain only template

82
00:03:45,739 --> 00:03:48,584
DNA. Furthermore, we need to

83
00:03:48,584 --> 00:03:50,708
know at least its approximate

84
00:03:50,788 --> 00:03:53,713
concentration. The sequencing

85
00:03:53,753 --> 00:03:56,437
reaction itself is a linear cyclic

86
00:03:56,558 --> 00:03:59,002
enzymatic reaction that must

87
00:03:59,002 --> 00:04:01,686
contain the listed components. The

88
00:04:01,686 --> 00:04:04,251
sequencing primer defines the start of

89
00:04:04,251 --> 00:04:06,975
sequencing; only one must be used,

90
00:04:07,135 --> 00:04:09,660
not two as in PCR. The

91
00:04:09,660 --> 00:04:12,264
reaction is catalyzed by DNA polymerase

92
00:04:12,665 --> 00:04:15,469
inside the reaction buffer with the addition

93
00:04:15,469 --> 00:04:17,232
of standard dNTPs

94
00:04:19,236 --> 00:04:21,800
and additionally ddNTPs.

95
00:04:23,082 --> 00:04:25,847
This is followed by purification of the

96
00:04:25,887 --> 00:04:28,652
sequencing reaction, when free labelled

97
00:04:28,652 --> 00:04:31,296
ddNTPs must be removed.

98
00:04:32,298 --> 00:04:35,102
The mixture of fragments is separated by

99
00:04:35,102 --> 00:04:37,947
capillary electrophoresis in a sequencer,

100
00:04:38,468 --> 00:04:40,151
and the result is evaluated.

101
00:04:43,356 --> 00:04:46,041
Currently, a variant known as 4-colour

102
00:04:46,041 --> 00:04:48,445
terminator sequencing is routinely

103
00:04:48,445 --> 00:04:51,169
used. Individual termination

104
00:04:51,169 --> 00:04:54,054
nucleotides are marked with fluorescent

105
00:04:54,054 --> 00:04:56,739
colours according to the heterocyclic

106
00:04:56,739 --> 00:04:59,664
base they carry, i.e. which nucleotide

107
00:04:59,784 --> 00:05:02,549
on the template they pair with.

108
00:05:03,390 --> 00:05:06,115
By the action of the polymerase, chains

109
00:05:06,115 --> 00:05:08,278
complementary to the template are

110
00:05:08,278 --> 00:05:11,083
synthesized, while usually normal

111
00:05:11,163 --> 00:05:13,447
dNTPs are incorporated

112
00:05:14,048 --> 00:05:16,933
and the chain is lengthened, but

113
00:05:17,213 --> 00:05:19,898
occasionally ddNTPs

114
00:05:19,978 --> 00:05:22,783
are also incorporated, which

115
00:05:22,823 --> 00:05:24,946
then terminates the synthesis of the

116
00:05:25,026 --> 00:05:27,711
chain. The resulting fragment

117
00:05:28,152 --> 00:05:30,556
is labelled with a fluorescent colour

118
00:05:30,956 --> 00:05:33,480
that corresponds to the nucleotide at the

119
00:05:33,480 --> 00:05:36,165
appropriate position of the template.

120
00:05:38,088 --> 00:05:41,053
A mixture of single-chain molecules of

121
00:05:41,053 --> 00:05:44,018
different lengths and different colours is

122
00:05:44,018 --> 00:05:46,983
formed. In order to determine at

123
00:05:46,983 --> 00:05:49,788
which position which base occurs, an

124
00:05:49,828 --> 00:05:52,593
accurate electrophoretic separation of

125
00:05:52,593 --> 00:05:55,157
this mixture of fragments must take

126
00:05:55,157 --> 00:05:57,521
place. To do this, we

127
00:05:57,561 --> 00:05:59,604
use fluorescent capillary

128
00:05:59,685 --> 00:06:02,609
electrophoresis, which is the basis of

129
00:06:02,609 --> 00:06:05,054
genetic analyzers or sequencers.

130
00:06:05,855 --> 00:06:08,539
This device can not only separate and sort

131
00:06:08,539 --> 00:06:11,384
the fragments according to size, but

132
00:06:11,384 --> 00:06:13,588
also directly read the sequence of

133
00:06:13,588 --> 00:06:16,072
nucleotides based on different coloured

134
00:06:16,152 --> 00:06:19,117
peaks. For example,

135
00:06:19,117 --> 00:06:21,922
a chain 100 nucleotides

136
00:06:21,922 --> 00:06:23,925
long emits green light,

137
00:06:24,807 --> 00:06:27,772
i.e. there was adenine in position

138
00:06:27,772 --> 00:06:30,456
100 of the template; the

139
00:06:30,496 --> 00:06:33,301
chain 101 nucleotides long

140
00:06:33,301 --> 00:06:35,705
lights up red, i.e. there was

141
00:06:35,705 --> 00:06:38,510
thymine in position

142
00:06:38,510 --> 00:06:40,273
101, etc.

143
00:06:42,517 --> 00:06:45,161
The Sanger method is the most widely used

144
00:06:45,161 --> 00:06:47,325
sequencing method. Among its

145
00:06:47,605 --> 00:06:50,250
advantages are the reading of relatively

146
00:06:50,330 --> 00:06:51,892
long DNA chains

147
00:06:52,213 --> 00:06:55,018
(approximately 800 bp)

148
00:06:55,498 --> 00:06:58,183
and at the same time high accuracy and

149
00:06:58,183 --> 00:07:00,867
reliability. If we need to know the

150
00:07:00,867 --> 00:07:03,432
sequence or detect a mutation in a

151
00:07:03,432 --> 00:07:06,276
specific section of the genome in one or two

152
00:07:06,357 --> 00:07:08,640
individuals, it is the most

153
00:07:08,640 --> 00:07:11,485
cost-effective option. Compared to

154
00:07:11,565 --> 00:07:14,050
NGS methods, however, the

155
00:07:14,090 --> 00:07:16,414
throughput is low and the price per

156
00:07:16,414 --> 00:07:19,298
sequenced base is very high. So the

157
00:07:19,298 --> 00:07:21,823
method is not suitable for sequencing

158
00:07:21,823 --> 00:07:24,427
large sections of the genome or even

159
00:07:24,427 --> 00:07:25,869
entire genomes.

160
00:07:28,514 --> 00:07:31,158
For whole genome sequencing, it is

161
00:07:31,158 --> 00:07:33,843
more advisable to use another method,

162
00:07:34,364 --> 00:07:36,848
the so-called next-generation sequencing,

163
00:07:37,329 --> 00:07:39,893
NGS for short. The

164
00:07:39,893 --> 00:07:42,377
method is also referred to as second

165
00:07:42,377 --> 00:07:45,302
generation sequencing. The method

166
00:07:45,502 --> 00:07:48,307
is based on DNA fragmentation

167
00:07:48,307 --> 00:07:51,272
and sequencing of short fragments, but

168
00:07:51,352 --> 00:07:54,117
in huge quantity at the same time, and

169
00:07:54,718 --> 00:07:57,603
is thus referred to as massively

170
00:07:57,683 --> 00:08:00,367
parallel sequencing. The method

171
00:08:00,448 --> 00:08:03,212
enables a generally very high sequencing

172
00:08:03,212 --> 00:08:05,416
capacity. However, you can choose

173
00:08:05,416 --> 00:08:07,940
sequencer variants with a capacity

174
00:08:08,261 --> 00:08:10,585
according to your needs from 1

175
00:08:10,665 --> 00:08:13,550
gigabase to 8 terabases.

176
00:08:16,434 --> 00:08:19,399
Common NGS sequencers include Illumina

177
00:08:19,399 --> 00:08:22,324
devices. The basis of the method is the

178
00:08:22,324 --> 00:08:25,129
so-called bridge PCR and

179
00:08:25,129 --> 00:08:27,934
sequencing by synthesis using

180
00:08:27,934 --> 00:08:30,418
four-colour fluorescence. The

181
00:08:30,538 --> 00:08:32,862
actual sequencing takes place in

182
00:08:32,982 --> 00:08:35,346
clusters of the same sequence - the

183
00:08:35,346 --> 00:08:38,151
corresponding spot lights up in colour

184
00:08:38,151 --> 00:08:40,956
according to the base. Sections

185
00:08:40,956 --> 00:08:43,840
from 150 to 300

186
00:08:43,840 --> 00:08:46,565
nucleotides in size are sequenced.

187
00:08:47,326 --> 00:08:49,770
A more detailed explanation is beyond the

188
00:08:49,770 --> 00:08:52,415
timeframe of this lecture, and I

189
00:08:52,415 --> 00:08:55,139
recommend watching videos on the internet

190
00:08:55,540 --> 00:08:58,185
for those interested in understanding

191
00:08:58,185 --> 00:08:59,948
this rather complex method.

192
00:09:02,191 --> 00:09:04,796
Next-generation sequencing is today a

193
00:09:04,796 --> 00:09:07,640
widely used tool in genetics for

194
00:09:07,640 --> 00:09:10,525
both research and diagnosis. The

195
00:09:10,525 --> 00:09:12,889
high capacity of the methods makes it

196
00:09:12,929 --> 00:09:15,734
possible to obtain sequences of

197
00:09:15,734 --> 00:09:18,058
even large genomes of animals,

198
00:09:19,020 --> 00:09:21,985
e.g. mammals, the size of which is around

199
00:09:21,985 --> 00:09:24,308
3 billion nucleotides,

200
00:09:24,789 --> 00:09:27,754
within a few hours to days, which

201
00:09:27,754 --> 00:09:30,158
would take several years of work using

202
00:09:30,158 --> 00:09:33,123
the Sanger sequencing method. It is

203
00:09:33,123 --> 00:09:35,607
possible to obtain the entire genetic

204
00:09:35,607 --> 00:09:38,492
information of an individual in this

205
00:09:38,492 --> 00:09:41,057
way. It therefore allows

206
00:09:41,057 --> 00:09:43,461
finding a large number of new or

207
00:09:43,461 --> 00:09:46,305
detecting all known polymorphisms and

208
00:09:46,305 --> 00:09:49,070
mutations. However, there is a

209
00:09:49,070 --> 00:09:50,953
problem with the processing and

210
00:09:51,033 --> 00:09:53,317
evaluation of large amounts of data,

211
00:09:53,798 --> 00:09:56,362
which is why it is necessary to use very

212
00:09:56,362 --> 00:09:59,327
powerful bioinformatics

213
00:09:59,327 --> 00:10:01,892
tools. Whole-genome data are

214
00:10:01,892 --> 00:10:04,696
stored in the genome databases of

215
00:10:04,776 --> 00:10:07,301
individual organisms, and can be

216
00:10:07,301 --> 00:10:09,585
compared with a specific sample.

217
00:10:10,226 --> 00:10:12,469
The method is also important for

218
00:10:12,469 --> 00:10:14,152
sequencing an individual,

219
00:10:14,954 --> 00:10:17,798
e.g. a patient, which is the

220
00:10:17,798 --> 00:10:19,762
field of personal genomics.

221
00:10:21,845 --> 00:10:24,089
Sequencing techniques based on the

222
00:10:24,169 --> 00:10:26,333
sequencing of a single molecule are

223
00:10:26,814 --> 00:10:29,298
referred to as third-generation

224
00:10:29,298 --> 00:10:32,183
sequencing. The advantage is long

225
00:10:32,183 --> 00:10:35,067
reads, in the case of de novo sequencing,

226
00:10:35,388 --> 00:10:36,269
up to seven kb.

227
00:10:38,673 --> 00:10:41,158
This makes it possible to correctly

228
00:10:41,318 --> 00:10:43,722
identify variants on one chain,

229
00:10:44,603 --> 00:10:47,488
the so-called haplotypes, which the

230
00:10:47,528 --> 00:10:49,291
short reads of second-generation

231
00:10:49,291 --> 00:10:51,495
sequencing do not allow.

232
00:10:52,296 --> 00:10:54,861
The method is gradually being improved,

233
00:10:55,382 --> 00:10:57,746
as the reading error rate is still higher

234
00:10:57,746 --> 00:11:00,590
compared to the two previous generations.

235
00:11:01,352 --> 00:11:03,635
A method known as single molecule

236
00:11:03,635 --> 00:11:06,240
real-time sequencing from Pacific

237
00:11:06,240 --> 00:11:09,004
Biosciences or nanopore sequencers from

238
00:11:09,044 --> 00:11:11,529
Oxford Nanopore is available.

239
00:11:13,933 --> 00:11:16,457
The results of whole-genome sequencing

240
00:11:16,577 --> 00:11:19,542
are stored in genomic databases and

241
00:11:19,542 --> 00:11:22,066
are often freely available. For

242
00:11:22,066 --> 00:11:24,711
example, the NCBI Internet

243
00:11:24,711 --> 00:11:27,235
database allows viewing genes

244
00:11:27,315 --> 00:11:29,880
down to the level of

245
00:11:29,960 --> 00:11:32,203
individual nucleotides.

246
00:11:34,367 --> 00:11:36,731
Another practical use of sequencing in

247
00:11:36,771 --> 00:11:39,736
animals is species identification.

248
00:11:40,057 --> 00:11:43,022
Molecular taxonomy uses so-called DNA

249
00:11:43,102 --> 00:11:45,586
barcoding, which makes it possible to

250
00:11:45,586 --> 00:11:48,230
identify a species where it is not

251
00:11:48,230 --> 00:11:50,594
possible with the classical method,

252
00:11:51,195 --> 00:11:52,958
e.g., in insect larvae.

253
00:11:54,200 --> 00:11:56,965
A fragment of the gene for cytochrome

254
00:11:56,965 --> 00:11:59,770
c oxidase 1, which is located

255
00:11:59,850 --> 00:12:02,815
in mitochondrial DNA, is used for

256
00:12:02,815 --> 00:12:05,379
this purpose. This

257
00:12:05,379 --> 00:12:07,703
fragment is usually sequenced by

258
00:12:07,703 --> 00:12:10,107
the Sanger method, and species

259
00:12:10,107 --> 00:12:12,191
identification is performed by

260
00:12:12,271 --> 00:12:14,755
comparison with determined

261
00:12:14,835 --> 00:12:17,520
sequences stored in the BOLD or BLAST

262
00:12:17,640 --> 00:12:20,244
databases. The

263
00:12:20,244 --> 00:12:22,728
advantage of mitochondrial

264
00:12:22,728 --> 00:12:24,612
DNA is its higher copy number, i.e.

265
00:12:25,974 --> 00:12:28,899
a greater amount of DNA per amount

266
00:12:28,979 --> 00:12:31,904
of biological material, and higher

267
00:12:31,904 --> 00:12:34,628
stability. It is also used for the

268
00:12:34,628 --> 00:12:37,433
determination of older or museum

269
00:12:37,433 --> 00:12:39,597
samples. The

270
00:12:39,597 --> 00:12:42,241
disadvantage is the possibility of

271
00:12:42,241 --> 00:12:44,805
contamination with a bacterial genome

272
00:12:45,126 --> 00:12:46,809
(Wolbachia, etc.).

273
00:12:50,014 --> 00:12:52,498
This image shows a sample of a

274
00:12:52,498 --> 00:12:55,143
specific sequence obtained by

275
00:12:55,143 --> 00:12:57,707
mitochondrial DNA sequencing

276
00:12:58,228 --> 00:13:01,073
of an old museum butterfly specimen.

277
00:13:04,599 --> 00:13:07,043
By inserting the sequence into the

278
00:13:07,243 --> 00:13:08,926
Internet alignment search tool

279
00:13:08,926 --> 00:13:11,570
called BLAST, it was

280
00:13:11,570 --> 00:13:14,135
possible to identify the species

281
00:13:14,135 --> 00:13:15,657
based on homology.

282
00:13:18,943 --> 00:13:21,828
The lecture is finished. I believe

283
00:13:21,828 --> 00:13:24,392
that you have understood the basic

284
00:13:24,392 --> 00:13:27,197
principles of one of the most important

285
00:13:27,357 --> 00:13:29,721
molecular-genetic methods - sequencing.

286
00:13:30,562 --> 00:13:33,006
I also recommend watching the follow-up

287
00:13:33,006 --> 00:13:35,931
presentation: Laboratory examples.

288
00:13:36,653 --> 00:13:38,175
Thank you for your attention.