1 00:00:01,990 --> 00:00:04,350 We will show you how to navigate genomic 2 00:00:04,350 --> 00:00:06,990 databases and use various bioinformatics 3 00:00:06,990 --> 00:00:08,910 tools to extract relevant 4 00:00:09,350 --> 00:00:11,350 genomic information, including a gene's 5 00:00:11,350 --> 00:00:14,030 sequence, its structure and functional 6 00:00:14,030 --> 00:00:16,990 annotation. This exercise will contribute 7 00:00:16,990 --> 00:00:19,070 to a better understanding of gene exploration in 8 00:00:19,070 --> 00:00:21,670 the context of bioinformatics and its 9 00:00:21,670 --> 00:00:23,950 potential use in animal genetics. 10 00:00:25,390 --> 00:00:28,110 First, we will use the GenBank database. In 11 00:00:28,110 --> 00:00:30,710 the search field, we enter the name 12 00:00:30,710 --> 00:00:33,430 of the gene, RYR1, and the 13 00:00:33,710 --> 00:00:36,270 species, pig, and click 14 00:00:36,270 --> 00:00:38,990 Search. After a while, the 15 00:00:38,990 --> 00:00:40,150 results are displayed. 16 00:00:41,750 --> 00:00:44,070 We see information about the gene, here, 17 00:00:44,070 --> 00:00:46,990 but also about its 18 00:00:46,990 --> 00:00:48,310 messenger RNA. 19 00:00:49,750 --> 00:00:52,310 Here we see the unique number of the gene, 20 00:00:52,550 --> 00:00:53,150 its ID. 21 00:00:56,030 --> 00:00:58,950 By clicking on the name of the gene, we get to 22 00:00:58,950 --> 00:01:00,390 more detailed information. 23 00:01:01,960 --> 00:01:03,160 We can see that the gene is located on 24 00:01:03,160 --> 00:01:05,800 chromosome 6, and we have information about two 25 00:01:05,800 --> 00:01:08,670 genomic assemblies: one 26 00:01:08,670 --> 00:01:11,430 older, annotation release 105, and one 27 00:01:11,430 --> 00:01:14,190 newer, 106. At 28 00:01:14,190 --> 00:01:16,470 the end of the table are the exact positions, in 29 00:01:16,790 --> 00:01:19,630 nucleotides, of our gene RYR1 30 00:01:19,630 --> 00:01:20,590 within the genome.
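The search for the gene name and species described above can also be expressed as a programmatic query. The following is a minimal sketch, assuming the public NCBI E-utilities `esearch` endpoint and the `[Gene Name]` and `[Organism]` field tags of the Gene database; the helper name `build_gene_search_url` is hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(gene: str, organism: str) -> str:
    """Build an ESearch URL for a gene symbol restricted to one organism.

    Hypothetical helper for illustration; it does not send the request.
    """
    # Field tags narrow the query the same way the web search form does.
    term = f"{gene}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_gene_search_url("RYR1", "Sus scrofa")
print(url)
```

Opening the printed URL in a browser should return a list of matching Gene IDs, which corresponds to the unique gene number shown on the results page.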
31 00:01:23,740 --> 00:01:26,580 Below is a graphic representation of the position of 32 00:01:26,580 --> 00:01:29,390 the RYR1 gene and of the genes 33 00:01:29,390 --> 00:01:31,830 that are adjacent to it, 34 00:01:32,030 --> 00:01:33,030 to the right and left. 35 00:01:34,830 --> 00:01:36,830 We can also see a graphic representation of the 36 00:01:36,830 --> 00:01:39,350 gene structure, including 37 00:01:39,350 --> 00:01:42,230 the positions of the individual exons 38 00:01:42,230 --> 00:01:44,710 and introns. When we hover 39 00:01:45,190 --> 00:01:47,670 the mouse cursor over this sequence, 40 00:01:48,150 --> 00:01:50,070 we see more details and the option 41 00:01:50,070 --> 00:01:52,430 to download subsequences. 42 00:01:54,960 --> 00:01:57,440 This is the sequence listed in the 43 00:01:57,440 --> 00:01:58,640 NCBI GenBank database. 44 00:02:02,980 --> 00:02:05,660 We also see another sequence of the same 45 00:02:05,660 --> 00:02:07,860 gene, listed in the Ensembl database. 46 00:02:11,270 --> 00:02:13,910 Here we can read information such as 47 00:02:13,910 --> 00:02:15,670 that it is a structural gene, 48 00:02:16,670 --> 00:02:18,510 what protein it encodes, 49 00:02:19,470 --> 00:02:22,350 what its length is, and more. 50 00:02:26,320 --> 00:02:29,200 If we go down to the links 51 00:02:29,200 --> 00:02:31,640 and click on the FASTA 52 00:02:31,800 --> 00:02:32,880 record, 53 00:02:34,810 --> 00:02:37,090 we get to the sequence itself. 54 00:02:46,860 --> 00:02:49,300 We can see that it really is a 55 00:02:49,300 --> 00:02:51,900 genomic sequence, a whole-genome shotgun 56 00:02:51,900 --> 00:02:54,760 sequence, from the genome assembly 57 00:02:54,760 --> 00:02:57,680 Sscrofa11.1, at 58 00:02:57,680 --> 00:03:00,360 the positions of our selected region with the gene 59 00:03:00,360 --> 00:03:01,040 RYR1. 60 00:03:02,970 --> 00:03:05,410 We can see that the sequence is really long. 61 00:03:06,950 --> 00:03:09,670 The RYR1 gene is one of the longest genes.
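To check how long a downloaded FASTA record really is, the file can be read with a few lines of code. This is a minimal sketch, assuming plain FASTA text (a header line starting with `>` followed by sequence lines); the function name `parse_fasta` and the toy record are hypothetical and are not the real RYR1 sequence.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a mapping of header -> sequence (uppercase)."""
    chunks: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line.upper())
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in chunks.items()}


# Toy record for illustration only, not the real gene sequence.
example = """>toy_region hypothetical example
ATGGGTGACG
gaggcgaa
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

Applied to the record downloaded from the database, the printed length would show directly how long the gene region is.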
62 00:03:11,140 --> 00:03:13,420 We will get more information on this sequence 63 00:03:13,420 --> 00:03:15,020 by expanding the FASTA menu. 64 00:03:17,440 --> 00:03:19,920 It offers us various options 65 00:03:19,920 --> 00:03:22,520 for downloading this sequence in different 66 00:03:22,520 --> 00:03:25,370 formats. When 67 00:03:25,370 --> 00:03:27,710 we go back to the 68 00:03:27,710 --> 00:03:30,670 information on the RYR1 gene 69 00:03:30,670 --> 00:03:33,550 and scroll a little lower, we see 70 00:03:33,630 --> 00:03:36,030 expression data. 71 00:03:37,190 --> 00:03:38,870 We can see that this gene is most highly 72 00:03:38,870 --> 00:03:41,790 expressed in skeletal muscle. 73 00:03:44,460 --> 00:03:46,220 These are results from various 74 00:03:46,220 --> 00:03:46,860 publications. 75 00:03:49,110 --> 00:03:50,950 More information about this 76 00:03:50,950 --> 00:03:52,150 gene can be found below. 77 00:03:54,200 --> 00:03:56,760 In particular, note that it is a gene 78 00:03:58,030 --> 00:04:00,670 that is involved in the transport of calcium ions 79 00:04:04,310 --> 00:04:06,750 in the sarcoplasmic reticulum of muscle 80 00:04:06,750 --> 00:04:09,710 cells. This is followed by 81 00:04:09,710 --> 00:04:12,350 information about the protein and its functions. 82 00:04:12,910 --> 00:04:14,990 Among other things, we read there that it is related to 83 00:04:15,230 --> 00:04:16,990 porcine stress syndrome, 84 00:04:18,350 --> 00:04:21,190 and that it is essentially a 85 00:04:21,190 --> 00:04:23,550 protein that functions as a calcium 86 00:04:23,550 --> 00:04:24,190 channel. 87 00:04:27,510 --> 00:04:29,870 A little further down, 88 00:04:29,870 --> 00:04:32,190 information is shown on the reference sequences 89 00:04:32,190 --> 00:04:35,070 of transcripts, which in turn have been described in 90 00:04:35,070 --> 00:04:37,640 the literature and stored in the 91 00:04:37,640 --> 00:04:39,000 GenBank database.
92 00:04:42,060 --> 00:04:45,020 In the last part we can see 93 00:04:46,860 --> 00:04:49,020 genomic or mRNA sequences 94 00:04:49,700 --> 00:04:52,580 that may be related to 95 00:04:52,580 --> 00:04:55,460 the sequence of the RYR1 gene in the 96 00:04:55,460 --> 00:04:56,060 pig. 97 00:05:06,160 --> 00:05:08,360 The last thing we can see is the link 98 00:05:09,440 --> 00:05:11,920 from the protein of the RYR1 gene to 99 00:05:11,920 --> 00:05:13,280 UniProt. 100 00:05:28,310 --> 00:05:31,070 However, we will go back up and 101 00:05:31,070 --> 00:05:33,830 click on the GenBank link 102 00:05:33,830 --> 00:05:35,470 on the right, 103 00:05:37,680 --> 00:05:40,440 which brings us to the reference sequence 104 00:05:40,800 --> 00:05:41,960 of the gene region. 105 00:05:46,430 --> 00:05:48,870 We find its unique number, 106 00:05:52,640 --> 00:05:55,240 then information on which species it comes from, 107 00:05:59,450 --> 00:06:01,450 and then the length of this sequence. 108 00:06:02,510 --> 00:06:04,990 We see that it is over 118,000 base 109 00:06:04,990 --> 00:06:07,790 pairs. And when 110 00:06:07,790 --> 00:06:10,750 we move even lower, we see 111 00:06:10,750 --> 00:06:12,150 information about the gene: 112 00:06:13,880 --> 00:06:16,440 how long it is, and further, the individual 113 00:06:19,140 --> 00:06:21,620 sections that then form the 114 00:06:21,980 --> 00:06:24,940 transcribed mRNA, that is, 115 00:06:25,180 --> 00:06:27,020 those coding sequences. 116 00:06:28,920 --> 00:06:30,720 And when we click on mRNA, 117 00:06:33,400 --> 00:06:36,360 we see all these coding, 118 00:06:37,230 --> 00:06:39,830 exon sequences within the genomic 119 00:06:39,830 --> 00:06:40,350 sequence. 120 00:06:43,630 --> 00:06:46,270 We can also click on CDS, 121 00:06:46,510 --> 00:06:48,030 i.e. the coding sequences.
122 00:06:49,390 --> 00:06:51,830 In this way, we highlight only those 123 00:06:52,110 --> 00:06:54,630 parts of the sequence, 124 00:06:54,710 --> 00:06:56,670 the sequences that will be 125 00:06:57,270 --> 00:07:00,110 translated into 126 00:07:00,110 --> 00:07:00,710 protein. 127 00:07:03,040 --> 00:07:05,800 And we see that the first triplet really is, 128 00:07:06,120 --> 00:07:07,720 that the first three bases 129 00:07:08,080 --> 00:07:10,960 are ATG, which in mRNA language 130 00:07:10,960 --> 00:07:13,760 means AUG, the initiation codon 131 00:07:14,280 --> 00:07:16,320 coding for the first amino acid in each 132 00:07:16,320 --> 00:07:17,760 protein, methionine. 133 00:07:23,780 --> 00:07:25,380 Because this gene is really very 134 00:07:25,380 --> 00:07:28,260 long, the protein encoded 135 00:07:28,260 --> 00:07:30,500 by this gene is also quite long. 136 00:07:31,430 --> 00:07:33,510 We can also see its sequence here, 137 00:07:41,460 --> 00:07:43,780 including its identification number. 138 00:07:50,890 --> 00:07:53,410 If we go back to the beginning 139 00:07:53,730 --> 00:07:55,050 of our search, 140 00:07:56,590 --> 00:07:59,110 we can click on the link to the 141 00:07:59,110 --> 00:08:01,070 sequence of the messenger RNA. 142 00:08:01,990 --> 00:08:04,630 Here, as you can see, it has over 15 thousand base pairs. 143 00:08:06,890 --> 00:08:09,330 Again, this is a reference sequence, 144 00:08:10,010 --> 00:08:11,890 which has its own unique number. 145 00:08:13,270 --> 00:08:15,070 A reference sequence means that it has been 146 00:08:15,070 --> 00:08:17,990 assembled from many partial mRNA sequences 147 00:08:17,990 --> 00:08:20,030 by various authors. 148 00:08:23,490 --> 00:08:26,410 Here, too, we have various information, including 149 00:08:26,410 --> 00:08:29,190 information about the protein and its 150 00:08:29,190 --> 00:08:31,870 sequence.
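The remark about the first triplet of the CDS (ATG in DNA, AUG in mRNA, coding for methionine) can be checked in code. This is a minimal sketch, assuming the CDS is given as coding-strand DNA; the function names and the toy sequence are hypothetical and are not taken from the RYR1 record.

```python
def transcribe(dna: str) -> str:
    """Write a coding-strand DNA sequence in mRNA letters (T becomes U)."""
    return dna.upper().replace("T", "U")


def starts_with_start_codon(cds: str) -> bool:
    """Return True if the first three bases correspond to the AUG start codon."""
    return transcribe(cds[:3]) == "AUG"


# Toy coding sequence for illustration only.
toy_cds = "ATGCGCTGCTAA"

print(transcribe(toy_cds[:3]))
print(starts_with_start_codon(toy_cds))
```

The same check could be run on the CDS copied from the GenBank record to confirm that it begins with the initiation codon.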
And when 151 00:08:31,870 --> 00:08:34,710 we click on the CDS link, we get the coding 152 00:08:34,710 --> 00:08:37,430 part of this sequence, that is, the part that 153 00:08:37,430 --> 00:08:40,310 encodes this protein. Because 154 00:08:40,310 --> 00:08:43,070 it is mRNA, there are no 155 00:08:43,070 --> 00:08:45,550 introns and we have one large unit. 156 00:08:46,670 --> 00:08:49,350 Nevertheless, even here the 157 00:08:49,350 --> 00:08:52,270 individual exons from which this mRNA 158 00:08:52,270 --> 00:08:53,270 is made up are marked. 159 00:08:55,030 --> 00:08:57,110 The second genomic database that we will 160 00:08:57,110 --> 00:08:58,470 deal with is Ensembl. 161 00:09:00,730 --> 00:09:02,690 It is again a public database, 162 00:09:03,710 --> 00:09:06,590 which contains vertebrate genomes. 163 00:09:07,910 --> 00:09:09,670 When we expand the drop-down list, 164 00:09:10,310 --> 00:09:13,150 we can search for various 165 00:09:13,150 --> 00:09:16,030 species, including humans, of course. 166 00:09:18,410 --> 00:09:21,390 And we can find our 167 00:09:21,390 --> 00:09:23,190 species under study, the pig. 168 00:09:26,820 --> 00:09:29,420 In the search box, we again enter the name 169 00:09:29,540 --> 00:09:32,340 or the abbreviation of our gene, RYR1. 170 00:09:34,780 --> 00:09:37,660 And in a moment we see many results. 171 00:09:38,140 --> 00:09:40,580 We will be interested in the reference 172 00:09:40,580 --> 00:09:41,380 sequence. 173 00:09:44,730 --> 00:09:47,210 Otherwise, we can find other sources of 174 00:09:47,210 --> 00:09:49,890 information here, sequences from 175 00:09:50,410 --> 00:09:52,930 other authors, or even from 176 00:09:53,210 --> 00:09:55,970 individual pig breeds 177 00:09:55,970 --> 00:09:58,250 in which the same sequence was investigated.
178 00:10:00,540 --> 00:10:02,700 On the next page of the results 179 00:10:03,300 --> 00:10:05,660 we see similar information again, that is, 180 00:10:05,660 --> 00:10:08,620 information on where in the genome the gene is located. 181 00:10:08,620 --> 00:10:11,500 Again, we see chromosome 6, and in which 182 00:10:11,660 --> 00:10:14,220 range of the sequence 183 00:10:14,220 --> 00:10:16,020 our gene is found. 184 00:10:17,630 --> 00:10:19,390 And in the graphic display, we again see the 185 00:10:19,390 --> 00:10:21,710 results of various studies that 186 00:10:21,710 --> 00:10:23,270 examined the same gene. 187 00:10:32,030 --> 00:10:34,830 But we will look at the menu 188 00:10:35,110 --> 00:10:38,000 at the top left, where 189 00:10:38,000 --> 00:10:40,960 there is a structure in which we choose the section 190 00:10:40,960 --> 00:10:43,830 Ontologies and look 191 00:10:43,950 --> 00:10:46,810 at the information on 192 00:10:46,810 --> 00:10:48,370 molecular functions. 193 00:10:51,030 --> 00:10:53,790 Here we see functions similar to those that 194 00:10:53,790 --> 00:10:56,670 were in the GenBank database. These are 195 00:10:56,670 --> 00:10:58,430 essentially the same functions. 196 00:11:00,140 --> 00:11:02,500 Next, we will look at biological 197 00:11:02,500 --> 00:11:05,340 processes. Again, it is the 198 00:11:05,340 --> 00:11:07,740 same information. This is a 199 00:11:07,740 --> 00:11:10,500 protein that is involved in calcium 200 00:11:10,500 --> 00:11:11,260 transport 201 00:11:15,400 --> 00:11:17,400 and transmembrane transport. 202 00:11:19,410 --> 00:11:22,370 And at the cellular level, 203 00:11:22,810 --> 00:11:24,370 it is again a 204 00:11:25,050 --> 00:11:27,410 membrane protein, a transmembrane 205 00:11:27,410 --> 00:11:29,530 protein found in the 206 00:11:29,530 --> 00:11:31,570 sarcoplasmic reticulum.
207 00:11:40,720 --> 00:11:43,400 In the Genetic variation section, we will 208 00:11:43,400 --> 00:11:45,440 be interested in what variability has already been described 209 00:11:45,440 --> 00:11:47,640 in this gene. 210 00:11:51,490 --> 00:11:54,170 In tabular form, we can then 211 00:11:54,170 --> 00:11:56,570 see a large number of 212 00:11:56,930 --> 00:11:59,670 mutations. In particular, 213 00:12:00,230 --> 00:12:02,950 these are single nucleotide polymorphisms, SNPs, 214 00:12:03,710 --> 00:12:05,830 but also deletions or insertions. 215 00:12:08,490 --> 00:12:10,610 They are listed from the beginning of the gene 216 00:12:11,900 --> 00:12:14,660 onward, within introns and exons. 217 00:12:16,320 --> 00:12:18,760 There is information about the specific position 218 00:12:19,200 --> 00:12:21,120 of each polymorphism, 219 00:12:22,660 --> 00:12:25,620 what the nucleotide change is there, 220 00:12:27,790 --> 00:12:28,710 and what type of 221 00:12:30,410 --> 00:12:31,290 variability it is. 222 00:12:36,350 --> 00:12:38,630 The most interesting are the mutations that 223 00:12:38,630 --> 00:12:41,350 are part of an exon, in particular 224 00:12:41,350 --> 00:12:44,150 missense variants, which can 225 00:12:44,150 --> 00:12:46,190 cause an amino acid substitution. 226 00:12:47,620 --> 00:12:49,860 We will be interested in the SNP that causes an 227 00:12:49,860 --> 00:12:52,420 amino acid substitution at position 612. 228 00:12:55,240 --> 00:12:57,560 This is a mutation that was described by 229 00:12:57,880 --> 00:12:59,760 Fujii et al. in a paper published in 230 00:12:59,760 --> 00:13:01,680 1991. 231 00:13:02,860 --> 00:13:04,980 It is currently associated with 232 00:13:04,980 --> 00:13:07,260 the sensitivity of pigs to stress, 233 00:13:07,900 --> 00:13:09,420 so-called malignant hyperthermia.
234 00:13:10,740 --> 00:13:13,220 In order to identify this mutation in the 235 00:13:13,220 --> 00:13:15,820 genome, we will use the sequence 236 00:13:15,820 --> 00:13:17,500 described in this publication, 237 00:13:18,900 --> 00:13:21,020 which we copy and paste into a new 238 00:13:21,020 --> 00:13:22,900 file in a text editor. 239 00:13:24,130 --> 00:13:26,730 Then, from the genomic sequence in FASTA 240 00:13:26,730 --> 00:13:29,410 format, we download the sequence of the 241 00:13:29,490 --> 00:13:31,730 gene, and we 242 00:13:31,730 --> 00:13:34,410 copy this sequence to the 243 00:13:34,410 --> 00:13:36,770 clipboard. 244 00:13:39,850 --> 00:13:41,730 And because it is really long, it will 245 00:13:41,730 --> 00:13:43,490 take a while to select the whole 246 00:13:43,490 --> 00:13:44,210 sequence. 247 00:13:49,630 --> 00:13:51,470 Again, we can paste this sequence into a 248 00:13:51,470 --> 00:13:54,110 separate text file so that 249 00:13:54,110 --> 00:13:55,430 we can continue to work with it. 250 00:13:57,750 --> 00:14:00,190 As the main tool for finding the region 251 00:14:00,190 --> 00:14:02,850 in the genome where our 252 00:14:02,850 --> 00:14:05,530 mutation is located, we will use 253 00:14:05,530 --> 00:14:08,370 BLAST, which performs a local 254 00:14:08,370 --> 00:14:10,930 alignment of the sequence from the publication 255 00:14:11,690 --> 00:14:13,250 with the genomic sequence. 256 00:14:14,450 --> 00:14:17,050 Therefore, we tick "Align two or more 257 00:14:17,050 --> 00:14:19,700 sequences". Into the 258 00:14:19,700 --> 00:14:21,860 upper window, we paste our genomic 259 00:14:21,860 --> 00:14:24,660 sequence, and into the lower one the 260 00:14:24,660 --> 00:14:25,780 sequence from the 261 00:14:27,210 --> 00:14:27,890 publication.
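The idea of locating a short published fragment inside a long genomic sequence can be illustrated without BLAST. The sketch below is not the BLAST algorithm: it is a simple ungapped sliding-window comparison that counts mismatches, assuming both sequences are on the same strand. The function name `find_best_window` and the toy sequences are hypothetical.

```python
def find_best_window(genomic: str, fragment: str) -> tuple[int, int, int]:
    """Find the ungapped placement of `fragment` in `genomic` with fewest mismatches.

    Returns (start, end, mismatches) with 1-based, inclusive coordinates.
    """
    genomic, fragment = genomic.upper(), fragment.upper()
    n, m = len(genomic), len(fragment)
    if m == 0 or m > n:
        raise ValueError("fragment must be non-empty and no longer than genomic")
    best = None
    for i in range(n - m + 1):
        window = genomic[i:i + m]
        mismatches = sum(a != b for a, b in zip(window, fragment))
        # Keep the first window with the lowest mismatch count.
        if best is None or mismatches < best[2]:
            best = (i + 1, i + m, mismatches)
    return best


# Toy sequences for illustration only; they differ at a single C/T site.
toy_genomic = "GGATCCTTACGCTGCAAGTTTGACC"
toy_fragment = "TTATGCTGCAAG"

start, end, mismatches = find_best_window(toy_genomic, toy_fragment)
print(start, end, mismatches)
```

A placement with a single mismatch is what one would expect when the published fragment carries one allele of a SNP and the genomic sequence carries the other. For real data BLAST remains the appropriate tool, since it also handles gaps and the reverse strand.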
262 00:14:37,540 --> 00:14:40,340 We leave the option 263 00:14:40,340 --> 00:14:43,220 "Highly similar sequences (megablast)" selected 264 00:14:43,260 --> 00:14:45,660 and tick the option to show the results 265 00:14:45,660 --> 00:14:48,610 in a new window. Clicking 266 00:14:48,610 --> 00:14:50,890 BLAST starts the search. 267 00:14:53,480 --> 00:14:55,800 In the new window with the results, we can see a 268 00:14:55,800 --> 00:14:58,400 table of 269 00:14:58,400 --> 00:15:01,200 alignments, in our case only one. 270 00:15:02,570 --> 00:15:04,410 And if we click on it, we see 271 00:15:04,570 --> 00:15:07,530 the result of the alignment 272 00:15:07,610 --> 00:15:10,410 of our sequence from the publication with the whole gene 273 00:15:10,410 --> 00:15:13,330 sequence. Our sequence has 74 274 00:15:13,330 --> 00:15:16,090 base pairs, and in the whole gene 275 00:15:16,090 --> 00:15:17,410 sequence it starts at 276 00:15:17,730 --> 00:15:20,610 base 18176 and 277 00:15:20,610 --> 00:15:23,450 ends at 18249, 278 00:15:23,450 --> 00:15:26,130 inclusive. We copy the 279 00:15:26,130 --> 00:15:28,610 result into Word so that it is easier 280 00:15:28,610 --> 00:15:29,730 to work with the text. 281 00:15:31,770 --> 00:15:34,250 We mark the site of the C/T mutation. 282 00:15:35,610 --> 00:15:37,570 And this is the position of our SNP 283 00:15:38,410 --> 00:15:40,810 that causes the amino acid substitution, 284 00:15:41,130 --> 00:15:44,090 because it is one base in the triplet 285 00:15:44,130 --> 00:15:46,130 TGC or CGC. 286 00:15:46,890 --> 00:15:49,690 We know from the publication that the CGC variant 287 00:15:49,690 --> 00:15:52,450 represents the original dominant 288 00:15:52,450 --> 00:15:55,250 allele, capital N, which means that 289 00:15:55,250 --> 00:15:58,130 the amino acid at position 615 is arginine.
290 00:15:59,530 --> 00:16:02,290 The variant with the T mutation, TGC, 291 00:16:02,370 --> 00:16:04,970 causes an amino acid substitution to 292 00:16:04,970 --> 00:16:07,370 cysteine and represents the 293 00:16:07,490 --> 00:16:10,370 recessive allele n. Individuals homozygous for 294 00:16:10,370 --> 00:16:12,330 this mutation are susceptible to stress. 295 00:16:13,890 --> 00:16:14,930 Thank you for your attention.
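The effect of the C/T change on the codon can be worked through in a few lines. This is a minimal sketch using only the two entries of the standard genetic code that are needed here (CGC for arginine, TGC for cysteine) and the position 615 quoted from the publication; the function name `describe_substitution` is hypothetical.

```python
# Subset of the standard genetic code (DNA codons), enough for this example.
CODON_TABLE = {"CGC": "Arg", "TGC": "Cys"}


def describe_substitution(ref_codon: str, alt_codon: str, position: int) -> str:
    """Describe the amino acid change caused by a single-base codon change."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons that differ at exactly one base")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"{ref_aa}{position}{alt_aa}"


# CGC is the original allele N, TGC the mutated allele n.
print(describe_substitution("CGC", "TGC", 615))
```

Because the two codons encode different amino acids, the SNP is a missense variant, which matches the classification seen in the variant table.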