1
00:00:01,990 --> 00:00:04,350
We will show you how to navigate genomic

2
00:00:04,350 --> 00:00:06,990
databases and use various bioinformatics

3
00:00:06,990 --> 00:00:08,910
tools to extract relevant

4
00:00:09,350 --> 00:00:11,350
genomic information, including gene

5
00:00:11,350 --> 00:00:14,030
sequence, its structure and functional

6
00:00:14,030 --> 00:00:16,990
annotation. This exercise will help you

7
00:00:16,990 --> 00:00:19,070
to better understand gene exploration in

8
00:00:19,070 --> 00:00:21,670
the context of bioinformatics and its

9
00:00:21,670 --> 00:00:23,950
potential use in animal genetics.

10
00:00:25,390 --> 00:00:28,110
First, we will use the GenBank database. Into

11
00:00:28,110 --> 00:00:30,710
the search field, we enter the name of the

12
00:00:30,710 --> 00:00:33,430
RYR1 gene and the species

13
00:00:33,710 --> 00:00:36,270
designation, pig, and start the

14
00:00:36,270 --> 00:00:38,990
search. After a while,

15
00:00:38,990 --> 00:00:40,150
the results are displayed.
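The same search can also be scripted. The sketch below only builds a query URL for NCBI's E-utilities `esearch` endpoint against the Gene database; it assumes that the `[sym]` (gene symbol) and `[orgn]` (organism) field tags and the organism name `Sus scrofa` suit your search, and it deliberately does not send the request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public E-utilities esearch endpoint at NCBI.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol: str, organism: str) -> str:
    """Build an esearch URL that looks up a gene symbol in one organism."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


url = gene_search_url("RYR1", "Sus scrofa")
print(url)
# Decode the query string again to confirm the search term survived encoding.
print(parse_qs(urlparse(url).query)["term"])
```

Opening such a URL should return the matching Gene IDs as JSON; scripted use should keep to NCBI's published request-rate limits.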

16
00:00:41,750 --> 00:00:44,070
We see information about the gene here,

17
00:00:44,070 --> 00:00:46,990
but also about its

18
00:00:46,990 --> 00:00:48,310
messenger RNA.

19
00:00:49,750 --> 00:00:52,310
We see a unique gene number here

20
00:00:52,550 --> 00:00:53,150
its ID.

21
00:00:56,030 --> 00:00:58,950
By clicking on the name of the gene, we get

22
00:00:58,950 --> 00:01:00,390
more detailed information.

23
00:01:01,960 --> 00:01:03,160
We can see that the gene is located on

24
00:01:03,160 --> 00:01:05,800
chromosome 6 and we have information about two

25
00:01:05,800 --> 00:01:08,670
genome annotations: one

26
00:01:08,670 --> 00:01:11,430
older, release 105, and one

27
00:01:11,430 --> 00:01:14,190
newer, release 106. At

28
00:01:14,190 --> 00:01:16,470
the end of the table are the exact positions, in

29
00:01:16,790 --> 00:01:19,630
nucleotides, within the genome of our

30
00:01:19,630 --> 00:01:20,590
RYR1 gene.

31
00:01:23,740 --> 00:01:26,580
Below is a graphic representation of the position of

32
00:01:26,580 --> 00:01:29,390
the RYR1 gene and of the genes

33
00:01:29,390 --> 00:01:31,830
adjacent to this gene,

34
00:01:32,030 --> 00:01:33,030
to the right and left.

35
00:01:34,830 --> 00:01:36,830
We can also see a graphic representation of the

36
00:01:36,830 --> 00:01:39,350
gene structure, including the

37
00:01:39,350 --> 00:01:42,230
positions of individual exons

38
00:01:42,230 --> 00:01:44,710
and introns. Hovering the

39
00:01:45,190 --> 00:01:47,670
mouse cursor over this sequence,

40
00:01:48,150 --> 00:01:50,070
we see more details and the option to

41
00:01:50,070 --> 00:01:52,430
download subsequences.

42
00:01:54,960 --> 00:01:57,440
This is the sequence listed in the NCBI

43
00:01:57,440 --> 00:01:58,640
GenBank database.

44
00:02:02,980 --> 00:02:05,660
We also see another sequence of the same

45
00:02:05,660 --> 00:02:07,860
gene listed in the Ensembl database.

46
00:02:11,270 --> 00:02:13,910
Here we can read information such as

47
00:02:13,910 --> 00:02:15,670
that it is a structural gene,

48
00:02:16,670 --> 00:02:18,510
which protein it encodes,

49
00:02:19,470 --> 00:02:22,350
its length, and more.

50
00:02:26,320 --> 00:02:29,200
If we go down to the links

51
00:02:29,200 --> 00:02:31,640
and click on the

52
00:02:31,800 --> 00:02:32,880
FASTA record,

53
00:02:34,810 --> 00:02:37,090
we get to the sequence itself.
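A FASTA record is plain text: a header line starting with `>` followed by one or more sequence lines. As a minimal sketch (the parser and the short record are invented for illustration, not taken from the RYR1 entry), a downloaded record can be read and its length checked like this:

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Invented two-line record; the real RYR1 genomic record is far longer.
example = """>example_region invented sequence
ATGGCTTCAGGTAAACCG
TTGGAC
"""
for header, seq in parse_fasta(example):
    print(header, len(seq))
```

Joining the sequence lines before measuring matters, because FASTA files wrap long sequences over many lines.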

54
00:02:46,860 --> 00:02:49,300
We can see that it really is a

55
00:02:49,300 --> 00:02:51,900
genomic sequence, a whole-genome shotgun

56
00:02:51,900 --> 00:02:54,760
sequence, from the genome

57
00:02:54,760 --> 00:02:57,680
Sscrofa11.1, at the

58
00:02:57,680 --> 00:03:00,360
positions of our selected region with the gene

59
00:03:00,360 --> 00:03:01,040
RYR1.

60
00:03:02,970 --> 00:03:05,410
We can see that the sequence is really long.

61
00:03:06,950 --> 00:03:09,670
RYR1 is one of the longest genes.

62
00:03:11,140 --> 00:03:13,420
We will get more information on this sequence

63
00:03:13,420 --> 00:03:15,020
by expanding the FASTA menu,

64
00:03:17,440 --> 00:03:19,920
which offers different options,

65
00:03:19,920 --> 00:03:22,520
such as downloading this

66
00:03:22,520 --> 00:03:25,370
sequence in various formats. When

67
00:03:25,370 --> 00:03:27,710
we go back to the

68
00:03:27,710 --> 00:03:30,670
information on the RYR1 gene

69
00:03:30,670 --> 00:03:33,550
and scroll a little lower, we see

70
00:03:33,630 --> 00:03:36,030
expression data.

71
00:03:37,190 --> 00:03:38,870
We can see that this gene is most highly

72
00:03:38,870 --> 00:03:41,790
expressed in skeletal muscle.

73
00:03:44,460 --> 00:03:46,220
These are results from various

74
00:03:46,220 --> 00:03:46,860
publications.

75
00:03:49,110 --> 00:03:50,950
More information about this

76
00:03:50,950 --> 00:03:52,150
gene can be found below.

77
00:03:54,200 --> 00:03:56,760
In particular, note that it is a gene

78
00:03:58,030 --> 00:04:00,670
involved in the transport of calcium ions

79
00:04:04,310 --> 00:04:06,750
in the sarcoplasmic reticulum of the muscle

80
00:04:06,750 --> 00:04:09,710
cells. This is followed by

81
00:04:09,710 --> 00:04:12,350
information about the protein and its functions.

82
00:04:12,910 --> 00:04:14,990
Among other things, we read that it is associated with

83
00:04:15,230 --> 00:04:16,990
porcine stress syndrome.

84
00:04:18,350 --> 00:04:21,190
And that it is essentially a

85
00:04:21,190 --> 00:04:23,550
protein that functions as a calcium

86
00:04:23,550 --> 00:04:24,190
channel.

87
00:04:27,510 --> 00:04:29,870
A little further down, we see

88
00:04:29,870 --> 00:04:32,190
reference sequence information on

89
00:04:32,190 --> 00:04:35,070
transcripts, which were described in the

90
00:04:35,070 --> 00:04:37,640
literature and stored in the

91
00:04:37,640 --> 00:04:39,000
GenBank database.

92
00:04:42,060 --> 00:04:45,020
In the last part we can see

93
00:04:46,860 --> 00:04:49,020
genomic or mRNA sequences,

94
00:04:49,700 --> 00:04:52,580
which may be related to the

95
00:04:52,580 --> 00:04:55,460
sequence of the RYR1 gene in the

96
00:04:55,460 --> 00:04:56,060
pig.

97
00:05:06,160 --> 00:05:08,360
The last thing we can see is the link

98
00:05:09,440 --> 00:05:11,920
to the RYR1 protein in

99
00:05:11,920 --> 00:05:13,280
UniProt.

100
00:05:28,310 --> 00:05:31,070
However, we will go back up and

101
00:05:31,070 --> 00:05:33,830
click on the GenBank link

102
00:05:33,830 --> 00:05:35,470
on the right.

103
00:05:37,680 --> 00:05:40,440
This brings us to the reference sequence

104
00:05:40,800 --> 00:05:41,960
of the gene region.

105
00:05:46,430 --> 00:05:48,870
We find its unique number,

106
00:05:52,640 --> 00:05:55,240
Furthermore, information on the type of sequence.

107
00:05:59,450 --> 00:06:01,450
And then the length of this sequence.

108
00:06:02,510 --> 00:06:04,990
We see that it is over 118,000 base

109
00:06:04,990 --> 00:06:07,790
pairs long. When

110
00:06:07,790 --> 00:06:10,750
we move even lower, we see

111
00:06:10,750 --> 00:06:12,150
information about the gene:

112
00:06:13,880 --> 00:06:16,440
its length, and details about the individual

113
00:06:19,140 --> 00:06:21,620
sections that then form the

114
00:06:21,980 --> 00:06:24,940
transcribed mRNA, that is,

115
00:06:25,180 --> 00:06:27,020
the coding sequences.

116
00:06:28,920 --> 00:06:30,720
When we click on mRNA,

117
00:06:33,400 --> 00:06:36,360
we see all the transcribed

118
00:06:37,230 --> 00:06:39,830
exon sequences within the genomic

119
00:06:39,830 --> 00:06:40,350
sequence.

120
00:06:43,630 --> 00:06:46,270
And when we click on CDS,

121
00:06:46,510 --> 00:06:48,030
i.e. the coding sequences,

122
00:06:49,390 --> 00:06:51,830
only those

123
00:06:52,110 --> 00:06:54,630
parts of the sequence

124
00:06:54,710 --> 00:06:56,670
are highlighted that will be

125
00:06:57,270 --> 00:07:00,110
translated into

126
00:07:00,110 --> 00:07:00,710
protein.

127
00:07:03,040 --> 00:07:05,800
And we see that the first triplet,

128
00:07:06,120 --> 00:07:07,720
the first three bases,

129
00:07:08,080 --> 00:07:10,960
is indeed ATG, which in the mRNA

130
00:07:10,960 --> 00:07:13,760
corresponds to AUG, the initiation codon

131
00:07:14,280 --> 00:07:16,320
coding for the first amino acid of each

132
00:07:16,320 --> 00:07:17,760
protein, methionine.
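The ATG-to-AUG correspondence is just the T-to-U substitution between the coding DNA strand and the mRNA. A minimal sketch, using an invented CDS prefix rather than the real RYR1 sequence:

```python
START_CODON = "AUG"  # codes for methionine (Met) in the standard genetic code


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


cds = "ATGGCTTCAGGT"  # invented CDS prefix, not the RYR1 sequence
mrna = transcribe(cds)
print(mrna)                      # AUGGCUUCAGGU
print(mrna[:3] == START_CODON)   # True
```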

133
00:07:23,780 --> 00:07:25,380
Because this gene is really very

134
00:07:25,380 --> 00:07:28,260
long, the protein encoded

135
00:07:28,260 --> 00:07:30,500
by this gene is also quite long.

136
00:07:31,430 --> 00:07:33,510
We can also see its sequence here.

137
00:07:41,460 --> 00:07:43,780
It also includes its identification number.

138
00:07:50,890 --> 00:07:53,410
If we go back to the beginning

139
00:07:53,730 --> 00:07:55,050
of our search,

140
00:07:56,590 --> 00:07:59,110
we can click on the link to the

141
00:07:59,110 --> 00:08:01,070
messenger RNA sequence.

142
00:08:01,990 --> 00:08:04,630
Here, as you can see, it has over 15,000 base pairs.

143
00:08:06,890 --> 00:08:09,330
Again, this is a reference sequence,

144
00:08:10,010 --> 00:08:11,890
which has its own unique number.

145
00:08:13,270 --> 00:08:15,070
A reference sequence is one that has been

146
00:08:15,070 --> 00:08:17,990
assembled from many mRNA subsequences

147
00:08:17,990 --> 00:08:20,030
by various authors.

148
00:08:23,490 --> 00:08:26,410
Here, too, we have various information, including

149
00:08:26,410 --> 00:08:29,190
information about the protein and its

150
00:08:29,190 --> 00:08:31,870
sequence. And when we

151
00:08:31,870 --> 00:08:34,710
click on the CDS link, we get the coding

152
00:08:34,710 --> 00:08:37,430
part of this sequence, that is, the part that

153
00:08:37,430 --> 00:08:40,310
encodes this protein. Because

154
00:08:40,310 --> 00:08:43,070
it is mRNA, there are no

155
00:08:43,070 --> 00:08:45,550
introns and we have one continuous unit.

156
00:08:46,670 --> 00:08:49,350
Nevertheless, even here the

157
00:08:49,350 --> 00:08:52,270
individual exons that make up this mRNA

158
00:08:52,270 --> 00:08:53,270
are marked.
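Because the mRNA is the exons joined end to end, the relationship between the genomic record and the mRNA record can be sketched as below. The function name, the toy sequence and the exon coordinates are invented for illustration, and the sketch assumes a gene on the plus strand (a minus-strand gene would also need reverse-complementing).

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as 1-based, inclusive (start, end) pairs."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


# Invented toy gene: exons in upper case, introns (GT...AG) in lower case.
genomic = "ATGGCC" + "gtaagtttcag" + "TTAGGA" + "gtgagcag" + "CTTTAA"
exons = [(1, 6), (18, 23), (32, 37)]
print(splice(genomic, exons))  # ATGGCCTTAGGACTTTAA
```

The spliced result has no intron letters left, which mirrors why the mRNA record shows one continuous unit.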

159
00:08:55,030 --> 00:08:57,110
The second genomic database that we will

160
00:08:57,110 --> 00:08:58,470
work with is Ensembl.

161
00:09:00,730 --> 00:09:02,690
It's again a public database,

162
00:09:03,710 --> 00:09:06,590
containing vertebrate genomes.

163
00:09:07,910 --> 00:09:09,670
When we expand the drop-down menu,

164
00:09:10,310 --> 00:09:13,150
we can search for various

165
00:09:13,150 --> 00:09:16,030
species, including humans, of course.

166
00:09:18,410 --> 00:09:21,390
And we can find our

167
00:09:21,390 --> 00:09:23,190
species under study, the pig.

168
00:09:26,820 --> 00:09:29,420
Again, we enter the name

169
00:09:29,540 --> 00:09:32,340
or symbol of our gene, RYR1, in the search box.
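The Ensembl search can likewise be done through the Ensembl REST API. The sketch assumes the `lookup/symbol/:species/:symbol` endpoint and the species name `sus_scrofa`; it only builds the URL and does not send the request.

```python
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build an Ensembl REST 'lookup/symbol' URL that requests JSON output."""
    path = f"/lookup/symbol/{quote(species)}/{quote(symbol)}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


print(lookup_symbol_url("sus_scrofa", "RYR1"))
```

The JSON response should carry the Ensembl gene ID together with the chromosome and start/end coordinates that the web page shows.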

170
00:09:34,780 --> 00:09:37,660
And in a moment we will see many results.

171
00:09:38,140 --> 00:09:40,580
We will be interested in the reference

172
00:09:40,580 --> 00:09:41,380
sequence.

173
00:09:44,730 --> 00:09:47,210
Apart from that, we can find other sources of

174
00:09:47,210 --> 00:09:49,890
information here, sequences from

175
00:09:50,410 --> 00:09:52,930
other authors, or even

176
00:09:53,210 --> 00:09:55,970
individual pig breeds,

177
00:09:55,970 --> 00:09:58,250
in which the same sequence was investigated.

178
00:10:00,540 --> 00:10:02,700
On the next page of the results

179
00:10:03,300 --> 00:10:05,660
we again see similar information, that is,

180
00:10:05,660 --> 00:10:08,620
information on where in the genome the gene is located.

181
00:10:08,620 --> 00:10:11,500
Again, we see chromosome 6 and the

182
00:10:11,660 --> 00:10:14,220
sequence range in which

183
00:10:14,220 --> 00:10:16,020
our gene is found.

184
00:10:17,630 --> 00:10:19,390
In the graphic display, we again see

185
00:10:19,390 --> 00:10:21,710
results of various studies that

186
00:10:21,710 --> 00:10:23,270
investigated the same gene.

187
00:10:32,030 --> 00:10:34,830
But we will look at the menu

188
00:10:35,110 --> 00:10:38,000
at the top left, where

189
00:10:38,000 --> 00:10:40,960
there is a structure in which we choose the

190
00:10:40,960 --> 00:10:43,830
Ontologies section and look

191
00:10:43,950 --> 00:10:46,810
at information on

192
00:10:46,810 --> 00:10:48,370
molecular functions.

193
00:10:51,030 --> 00:10:53,790
Here we actually see similar functions to those

194
00:10:53,790 --> 00:10:56,670
in the GenBank database. These are

195
00:10:56,670 --> 00:10:58,430
the same features.

196
00:11:00,140 --> 00:11:02,500
Next, we will look at biological

197
00:11:02,500 --> 00:11:05,340
processes. Again, it is the

198
00:11:05,340 --> 00:11:07,740
same information. It is a

199
00:11:07,740 --> 00:11:10,500
protein that is involved in the transport of

200
00:11:10,500 --> 00:11:11,260
calcium

201
00:11:15,400 --> 00:11:17,400
and transmembrane transport.

202
00:11:19,410 --> 00:11:22,370
As for the cellular component,

203
00:11:22,810 --> 00:11:24,370
it is again a

204
00:11:25,050 --> 00:11:27,410
membrane protein, a transmembrane

205
00:11:27,410 --> 00:11:29,530
protein found in the

206
00:11:29,530 --> 00:11:31,570
sarcoplasmic reticulum.

207
00:11:40,720 --> 00:11:43,400
In the Genetic variation section, we will

208
00:11:43,400 --> 00:11:45,440
be interested in the already described

209
00:11:45,440 --> 00:11:47,640
variability in this gene.

210
00:11:51,490 --> 00:11:54,170
In tabular form, we can then

211
00:11:54,170 --> 00:11:56,570
see a large number of

212
00:11:56,930 --> 00:11:59,670
mutations. These are mainly

213
00:12:00,230 --> 00:12:02,950
single nucleotide polymorphisms (SNPs),

214
00:12:03,710 --> 00:12:05,830
but also deletions and insertions.

215
00:12:08,490 --> 00:12:10,610
They are located from the beginning of the gene

216
00:12:11,900 --> 00:12:14,660
through the introns and exons.

217
00:12:16,320 --> 00:12:18,760
There is information about a specific position

218
00:12:19,200 --> 00:12:21,120
of this polymorphism,

219
00:12:22,660 --> 00:12:25,620
which nucleotides alternate there,

220
00:12:27,790 --> 00:12:28,710
and what type of

221
00:12:30,410 --> 00:12:31,290
variation it is.

222
00:12:36,350 --> 00:12:38,630
The most interesting mutations are those that

223
00:12:38,630 --> 00:12:41,350
lie in exons, in particular

224
00:12:41,350 --> 00:12:44,150
missense variants, which can

225
00:12:44,150 --> 00:12:46,190
cause an amino acid substitution.
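Filtering such a table for missense variants is a one-line operation once the rows are loaded. The sketch below uses invented rows; the consequence labels are Sequence Ontology terms of the kind Ensembl uses, and the `Variant` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    alleles: str       # e.g. "C/T"
    consequence: str   # Sequence Ontology term, e.g. "missense_variant"


# Invented rows imitating a variant table; these are not real RYR1 variants.
variants = [
    Variant("var1", "A/G", "intron_variant"),
    Variant("var2", "C/T", "missense_variant"),
    Variant("var3", "G/A", "synonymous_variant"),
    Variant("var4", "AT/-", "frameshift_variant"),
]

# Keep only missense variants (single amino acid substitutions).
missense = [v for v in variants if v.consequence == "missense_variant"]
for v in missense:
    print(v.variant_id, v.alleles)
```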

226
00:12:47,620 --> 00:12:49,860
We will be interested in the SNP that causes

227
00:12:49,860 --> 00:12:52,420
an amino acid substitution at position 612.

228
00:12:55,240 --> 00:12:57,560
This is the mutation that was described by

229
00:12:57,880 --> 00:12:59,760
Fujii et al., published in

230
00:12:59,760 --> 00:13:01,680
1991.

231
00:13:02,860 --> 00:13:04,980
It is associated with

232
00:13:04,980 --> 00:13:07,260
stress sensitivity in pigs,

233
00:13:07,900 --> 00:13:09,420
so-called malignant hyperthermia.

234
00:13:10,740 --> 00:13:13,220
To identify this mutation in the

235
00:13:13,220 --> 00:13:15,820
genome, we will use the sequence

236
00:13:15,820 --> 00:13:17,500
described in this publication,

237
00:13:18,900 --> 00:13:21,020
which we copy and paste into a new

238
00:13:21,020 --> 00:13:22,900
file in a text editor.

239
00:13:24,130 --> 00:13:26,730
Next, from the genomic sequence view,

240
00:13:26,730 --> 00:13:29,410
we download the sequence of the

241
00:13:29,490 --> 00:13:31,730
gene. We then

242
00:13:31,730 --> 00:13:34,410
copy the sequence to the

243
00:13:34,410 --> 00:13:36,770
clipboard.

244
00:13:39,850 --> 00:13:41,730
Because it is really long, it will

245
00:13:41,730 --> 00:13:43,490
take a while to select the whole

246
00:13:43,490 --> 00:13:44,210
sequence.

247
00:13:49,630 --> 00:13:51,470
Again, we can paste this sequence into a

248
00:13:51,470 --> 00:13:54,110
separate text file so that

249
00:13:54,110 --> 00:13:55,430
we can continue working with it.

250
00:13:57,750 --> 00:14:00,190
As the main tool for finding the region

251
00:14:00,190 --> 00:14:02,850
in the genome where our

252
00:14:02,850 --> 00:14:05,530
mutation lies, we will use

253
00:14:05,530 --> 00:14:08,370
BLAST, which performs a local

254
00:14:08,370 --> 00:14:10,930
alignment of the sequence from the publication

255
00:14:11,690 --> 00:14:13,250
with the genomic sequence.
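BLAST itself is a heuristic seed-and-extend search, so the code below is not BLAST. It is a minimal Smith-Waterman sketch of what "local alignment" means: only the best-matching stretch of the two sequences is scored, rather than forcing an end-to-end alignment. The scoring values (match +2, mismatch -1, gap -2) and the toy sequences are illustrative assumptions, not megablast defaults.

```python
def smith_waterman(query: str, subject: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> tuple[int, tuple[int, int]]:
    """Return the best local alignment score and the 1-based positions
    in query and subject where that best-scoring alignment ends."""
    prev = [0] * (len(subject) + 1)
    best, best_end = 0, (0, 0)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == subject[j - 1] else mismatch)
            # Local alignment: scores never drop below zero, so a new
            # alignment may start at any position.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_end = cur[j], (i, j)
        prev = cur
    return best, best_end


# Invented toy sequences: the query sits inside a longer subject.
print(smith_waterman("GGCTTA", "AAAAGGCTTACCC"))
```

The full dynamic-programming table makes this exact but slow for a gene of over 100,000 bases, which is the reason tools like BLAST use heuristics instead.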

256
00:14:14,450 --> 00:14:17,050
Therefore, we tick the option to align two or more

257
00:14:17,050 --> 00:14:19,700
sequences. Into the

258
00:14:19,700 --> 00:14:21,860
upper window, we paste our genomic

259
00:14:21,860 --> 00:14:24,660
sequence, and into the lower one the

260
00:14:24,660 --> 00:14:25,780
sequence from the

261
00:14:27,210 --> 00:14:27,890
publication.

262
00:14:37,540 --> 00:14:40,340
We leave the option

263
00:14:40,340 --> 00:14:43,220
Highly similar sequences (megablast) selected

264
00:14:43,260 --> 00:14:45,660
and tick the option to show the results

265
00:14:45,660 --> 00:14:48,610
in a new window. Clicking

266
00:14:48,610 --> 00:14:50,890
BLAST starts the search.

267
00:14:53,480 --> 00:14:55,800
In the results in the new window, we can see in the

268
00:14:55,800 --> 00:14:58,400
table of

269
00:14:58,400 --> 00:15:01,200
alignments, in our case only one.

270
00:15:02,570 --> 00:15:04,410
If we click on it, we see the

271
00:15:04,570 --> 00:15:07,530
alignment result

272
00:15:07,610 --> 00:15:10,410
of our sequence from the publication with the whole gene

273
00:15:10,410 --> 00:15:13,330
sequence. Our sequence has 74

274
00:15:13,330 --> 00:15:16,090
base pairs, and on the gene

275
00:15:16,090 --> 00:15:17,410
sequence it starts at

276
00:15:17,730 --> 00:15:20,610
base 18176 and

277
00:15:20,610 --> 00:15:23,450
ends at base 18249.

278
00:15:23,450 --> 00:15:26,130
We copy the

279
00:15:26,130 --> 00:15:28,610
result into Word so that we can work

280
00:15:28,610 --> 00:15:29,730
with the text more easily.
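The reported coordinates can be checked with simple arithmetic: an inclusive span from base 18176 to base 18249 covers 18249 - 18176 + 1 = 74 bases, matching the 74-bp sequence from the publication. The helper below is a hypothetical sketch that reports such a 1-based, inclusive span for an exact match (BLAST reports coordinates the same way); the toy sequences are invented.

```python
from typing import Optional


def locate(query: str, subject: str) -> Optional[tuple[int, int]]:
    """1-based, inclusive (start, end) of the first exact occurrence of query."""
    idx = subject.find(query)  # 0-based index, or -1 if absent
    if idx == -1:
        return None
    return idx + 1, idx + len(query)


start, end = 18176, 18249  # span reported in the alignment result
print(end - start + 1)     # 74
print(locate("GGCTTA", "AAAAGGCTTACCC"))  # (5, 10)
```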

281
00:15:31,770 --> 00:15:34,250
We mark the position of the C/T mutation.

282
00:15:35,610 --> 00:15:37,570
This is the position of our SNP

283
00:15:38,410 --> 00:15:40,810
that causes the amino acid substitution,

284
00:15:41,130 --> 00:15:44,090
because it is one base in the triplet

285
00:15:44,130 --> 00:15:46,130
TGC or CGC.

286
00:15:46,890 --> 00:15:49,690
We know from the publication that the CGC variant

287
00:15:49,690 --> 00:15:52,450
represents the original dominant

288
00:15:52,450 --> 00:15:55,250
allele, capital N, which results in

289
00:15:55,250 --> 00:15:58,130
arginine at amino acid position 615.

290
00:15:59,530 --> 00:16:02,290
The variant with the T mutation, TGC,

291
00:16:02,370 --> 00:16:04,970
causes an amino acid substitution to

292
00:16:04,970 --> 00:16:07,370
cysteine and represents the

293
00:16:07,490 --> 00:16:10,370
recessive allele n; nn homozygous pigs

294
00:16:10,370 --> 00:16:12,330
carrying this mutation are susceptible to stress.
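The effect of this C/T change can be reproduced with a two-entry excerpt of the standard genetic code. The sketch below (the helper name is hypothetical) puts T in place of C at the first position of the CGC codon and looks up both codons.

```python
# Two codons from the standard genetic code (coding-strand DNA notation).
CODON_TABLE = {"CGC": "Arg", "TGC": "Cys"}


def substitute(codon: str, offset: int, base: str) -> str:
    """Return the codon with the base at 0-based offset replaced."""
    return codon[:offset] + base + codon[offset + 1:]


ref = "CGC"                     # allele N: arginine at position 615
alt = substitute(ref, 0, "T")   # C -> T at the first codon position
print(ref, CODON_TABLE[ref], "->", alt, CODON_TABLE[alt])
```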

295
00:16:13,890 --> 00:16:14,930
Thank you for your attention.