1 00:00:00,881 --> 00:00:02,483 The topic of this lecture is 2 00:00:02,483 --> 00:00:05,287 Bioinformatics Tools for Assessing Genome 3 00:00:05,287 --> 00:00:07,930 Variability of Animal Genetic Resources. 4 00:00:08,651 --> 00:00:10,653 The lecture is part of Module 2, 5 00:00:11,214 --> 00:00:13,297 Conservation and Sustainable 6 00:00:13,297 --> 00:00:16,100 Utilization of Animal Genetic Resources. 7 00:00:17,302 --> 00:00:19,865 The creation of this presentation was 8 00:00:19,865 --> 00:00:22,749 supported by the Erasmus+ KA2 grant 9 00:00:23,149 --> 00:00:25,232 as part of the project ISAGREED, 10 00:00:25,472 --> 00:00:28,436 Innovation of Content and Structure 11 00:00:28,436 --> 00:00:31,239 of Study Programs in the Management of 12 00:00:31,239 --> 00:00:33,883 Animal Genetic and Food Resources 13 00:00:33,883 --> 00:00:35,485 Using Digitalization. 14 00:00:38,088 --> 00:00:41,012 What is bioinformatics and what is its 15 00:00:41,012 --> 00:00:43,895 importance? Bioinformatics is 16 00:00:43,935 --> 00:00:46,459 an interdisciplinary field that combines 17 00:00:46,539 --> 00:00:48,902 biology, computer science, 18 00:00:49,182 --> 00:00:51,946 statistics, and mathematics to 19 00:00:51,986 --> 00:00:54,629 analyze and interpret biological data 20 00:00:55,029 --> 00:00:57,913 using computational tools and algorithms. 21 00:00:58,794 --> 00:01:01,598 In animal genetics, it plays a crucial 22 00:01:01,598 --> 00:01:04,481 role in obtaining and analyzing 23 00:01:04,481 --> 00:01:06,804 extensive genomic data, such as 24 00:01:06,884 --> 00:01:09,688 DNA sequences, to gain 25 00:01:09,688 --> 00:01:12,091 insight into the genetic makeup and 26 00:01:12,091 --> 00:01:14,093 biological processes of animals. 
27 00:01:14,894 --> 00:01:17,538 In animal genetics, bioinformatics 28 00:01:17,538 --> 00:01:20,181 helps identify and characterize 29 00:01:20,421 --> 00:01:23,225 genes responsible for specific traits, 30 00:01:23,866 --> 00:01:26,669 diseases, or abnormalities in animals. 31 00:01:27,470 --> 00:01:30,354 For example, it can help identify genes 32 00:01:30,354 --> 00:01:32,837 associated with coat color, 33 00:01:33,158 --> 00:01:35,721 milk production, growth rate, 34 00:01:35,961 --> 00:01:38,284 or susceptibility to disease. 35 00:01:41,408 --> 00:01:43,931 It allows researchers to compare and 36 00:01:43,931 --> 00:01:46,615 analyze genomes of different animal 37 00:01:46,615 --> 00:01:49,338 species, understand their 38 00:01:49,338 --> 00:01:51,701 evolutionary relationships, and 39 00:01:51,781 --> 00:01:53,904 identify common genetic elements. 40 00:01:54,785 --> 00:01:57,028 This can provide insights into the 41 00:01:57,028 --> 00:01:59,831 diversity and evolution of animal 42 00:01:59,831 --> 00:02:02,635 species. Bioinformatics 43 00:02:02,635 --> 00:02:04,878 helps in developing new breeding 44 00:02:04,878 --> 00:02:07,681 strategies and improving breeding 45 00:02:07,681 --> 00:02:10,565 practices. By analyzing 46 00:02:10,565 --> 00:02:13,288 genetic data, it helps identify 47 00:02:13,488 --> 00:02:16,412 animals with desirable traits for 48 00:02:16,412 --> 00:02:19,096 breeding programs, improving traits 49 00:02:19,096 --> 00:02:21,939 such as productivity, disease resistance, 50 00:02:21,939 --> 00:02:23,221 or adaptability. 
51 00:02:24,422 --> 00:02:26,665 Bioinformatics supports the protection 52 00:02:26,745 --> 00:02:29,108 and conservation of endangered animal 53 00:02:29,108 --> 00:02:31,311 species by studying their 54 00:02:31,311 --> 00:02:34,034 genomes and identifying 55 00:02:34,034 --> 00:02:36,357 genetic markers for monitoring 56 00:02:36,357 --> 00:02:38,600 populations, assessing genetic 57 00:02:38,600 --> 00:02:41,083 diversity, and assisting 58 00:02:41,644 --> 00:02:43,646 in captive breeding programs. 59 00:02:45,249 --> 00:02:47,812 How are the data used? For 60 00:02:47,812 --> 00:02:50,375 bioinformatics analysis, an internet 61 00:02:50,375 --> 00:02:52,297 connection, a computer, 62 00:02:52,698 --> 00:02:55,582 programs, and of course data are needed. 63 00:02:56,463 --> 00:02:59,106 Data such as DNA, RNA, or protein 64 00:02:59,106 --> 00:03:01,990 sequences are used, as well as genomic, 65 00:03:01,990 --> 00:03:04,953 transcriptomic, proteomic, metabolomic, 66 00:03:05,033 --> 00:03:07,917 phylogenetic, or structural data. 67 00:03:09,279 --> 00:03:11,842 Data for bioinformatics analysis are 68 00:03:11,842 --> 00:03:14,085 initially uploaded to online 69 00:03:14,085 --> 00:03:16,768 databases. There were about 70 00:03:16,768 --> 00:03:19,692 2,000 databases available online in 71 00:03:19,692 --> 00:03:21,855 January 2024. 72 00:03:22,896 --> 00:03:24,658 The most significant sequence 73 00:03:25,019 --> 00:03:27,862 databases are GenBank, ENA, 74 00:03:28,343 --> 00:03:30,826 UniProt, and the genome database 75 00:03:30,986 --> 00:03:31,707 Ensembl. 76 00:03:34,431 --> 00:03:36,193 On the servers where the 77 00:03:36,193 --> 00:03:38,516 databases are located, 78 00:03:39,157 --> 00:03:40,999 there are tools for searching, 79 00:03:41,560 --> 00:03:44,443 aligning, and analyzing bioinformatic 80 00:03:44,443 --> 00:03:47,407 data. 
Pairwise sequence 81 00:03:47,407 --> 00:03:49,570 alignment is used to identify 82 00:03:49,570 --> 00:03:52,293 regions of similarity that may 83 00:03:52,293 --> 00:03:54,776 indicate functional, structural, 84 00:03:55,337 --> 00:03:57,660 and/or evolutionary relationships 85 00:03:58,060 --> 00:04:00,143 between two biological sequences, 86 00:04:00,303 --> 00:04:02,586 proteins or nucleic acids. 87 00:04:04,268 --> 00:04:06,631 Multiple sequence alignment, MSA, 88 00:04:07,432 --> 00:04:09,675 is the alignment of three or more 89 00:04:09,675 --> 00:04:12,519 biological sequences of similar length. 90 00:04:13,760 --> 00:04:16,243 From the output of MSA applications, 91 00:04:16,684 --> 00:04:19,367 homology can be inferred and the 92 00:04:19,367 --> 00:04:21,610 evolutionary relationship between 93 00:04:21,610 --> 00:04:23,092 sequences can be studied. 94 00:04:26,576 --> 00:04:28,739 The basic tool for aligning two 95 00:04:28,739 --> 00:04:31,623 sequences is BLAST, Basic 96 00:04:31,623 --> 00:04:34,586 Local Alignment Search Tool, on the 97 00:04:34,586 --> 00:04:36,269 NCBI server. 98 00:04:37,390 --> 00:04:40,033 BLAST searches for areas of 99 00:04:40,033 --> 00:04:42,436 similarity between biological sequences. 100 00:04:43,798 --> 00:04:45,801 The program compares nucleotide or 101 00:04:45,801 --> 00:04:48,244 protein sequences to sequence 102 00:04:48,244 --> 00:04:50,927 databases and calculates the 103 00:04:50,927 --> 00:04:53,450 statistical significance. It is 104 00:04:53,450 --> 00:04:56,054 possible to compare your own sequence 105 00:04:56,054 --> 00:04:58,296 with database sequences in GenBank. 106 00:04:59,258 --> 00:05:01,661 It is also possible to compare two specific 107 00:05:01,741 --> 00:05:04,304 sequences. The primary 108 00:05:04,304 --> 00:05:06,747 focus is on local alignment, 109 00:05:06,867 --> 00:05:09,511 but global alignment is also available. 
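To make the idea of a local alignment concrete, here is a minimal Python sketch that scores the best-matching region shared by two sequences. It is only an illustration: BLAST itself relies on a faster heuristic search, whereas this sketch uses the exact Smith-Waterman dynamic-programming recurrence. The function name, the example sequences, and the scoring values (match 1, mismatch -1, gap -1) are assumptions chosen for the example, not values from the lecture.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Illustrative Smith-Waterman recurrence; the scoring values are assumed.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical sequences that share only the middle region "ACGTACG".
print(local_alignment_score("TTTACGTACGAAA", "CCCACGTACGGGG"))  # 7
```

The score reflects only the shared middle segment; the dissimilar flanks do not lower it, which is the defining property of a local alignment.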
110 00:05:11,753 --> 00:05:14,317 Another tool for pairwise sequence 111 00:05:14,317 --> 00:05:17,040 alignment is on the European 112 00:05:17,240 --> 00:05:20,084 EMBL-EBI server, and it is 113 00:05:20,164 --> 00:05:20,805 EMBOSS. 114 00:05:23,168 --> 00:05:25,371 When do you use local or global 115 00:05:25,531 --> 00:05:28,414 alignment? Local alignment, using the 116 00:05:28,414 --> 00:05:31,258 Smith-Waterman algorithm, is used 117 00:05:31,258 --> 00:05:33,941 for more divergent, evolutionarily 118 00:05:33,941 --> 00:05:36,745 distant sequences. It is 119 00:05:36,745 --> 00:05:39,669 limited to aligning similar segments 120 00:05:39,789 --> 00:05:42,192 and stops where the sequences 121 00:05:42,993 --> 00:05:44,755 diverge significantly. 122 00:05:46,838 --> 00:05:48,920 Global alignment, using the 123 00:05:48,920 --> 00:05:51,604 Needleman-Wunsch algorithm, is the 124 00:05:51,604 --> 00:05:53,566 most suitable for 125 00:05:53,566 --> 00:05:56,410 sequences that are similar and 126 00:05:56,410 --> 00:05:58,492 approximately the same length. 127 00:05:59,614 --> 00:06:02,057 It attempts to align sequences over their 128 00:06:02,057 --> 00:06:04,860 entire length, even at the cost of 129 00:06:04,860 --> 00:06:07,784 introducing gaps into 130 00:06:07,784 --> 00:06:09,426 one or both sequences. 131 00:06:15,033 --> 00:06:17,436 We will demonstrate a model alignment 132 00:06:17,436 --> 00:06:19,679 process. We want to 133 00:06:19,679 --> 00:06:22,282 determine which two sequences, A 134 00:06:22,282 --> 00:06:24,966 and B or C and D, 135 00:06:25,286 --> 00:06:26,968 are more similar to each other. 136 00:06:28,170 --> 00:06:30,973 We align the sequences over their entire 137 00:06:30,973 --> 00:06:33,456 length. That is, we write 138 00:06:33,697 --> 00:06:36,580 them in two rows placed below each 139 00:06:36,660 --> 00:06:39,304 other so that identical positions, 140 00:06:39,865 --> 00:06:42,348 bases or amino acids, are aligned. 
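The global alignment described above, where two sequences are written in two rows over their entire length with gaps where needed, can be sketched in Python as follows. This is a minimal illustration of the Needleman-Wunsch dynamic-programming approach, not the implementation used by EMBOSS or BLAST. The function name `global_align`, the example sequences, and the default gap value of 0 are assumptions for this sketch.

```python
def global_align(a, b, match=1, mismatch=0, gap=0):
    """Align a and b over their entire length (Needleman-Wunsch sketch).

    Returns (row_a, row_b, score); "-" marks a gap. Scoring values are assumed.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover the two rows.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


# Hypothetical sequences of different lengths, with an assumed gap penalty of -1.
top, bottom, total = global_align("GATTACA", "GATCA", gap=-1)
print(top)
print(bottom)
print(total)
```

With a gap penalty of -1 the shorter sequence receives two gaps so that both rows span the full length of the longer one. Other gap values change which alignment is preferred.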
141 00:06:43,469 --> 00:06:46,193 Each matching and non-matching pair will be 142 00:06:46,193 --> 00:06:48,676 assigned a value. For example, 143 00:06:48,756 --> 00:06:51,279 1 for a match and 144 00:06:51,399 --> 00:06:53,081 0 for a mismatch. 145 00:06:55,644 --> 00:06:58,088 Both alignments show that the first pair 146 00:06:58,088 --> 00:07:01,011 of sequences, A and B, has eight 147 00:07:01,171 --> 00:07:04,055 matches and two mismatches, and the 148 00:07:04,055 --> 00:07:06,698 second pair of sequences, C and D, 149 00:07:06,939 --> 00:07:09,742 has 17 matches and three mismatches. 150 00:07:10,463 --> 00:07:13,187 However, which pair of 151 00:07:13,187 --> 00:07:15,029 sequences is more similar? 152 00:07:18,553 --> 00:07:21,117 It is necessary to calculate a normalized 153 00:07:21,437 --> 00:07:24,321 similarity value, the score, so that we 154 00:07:24,321 --> 00:07:26,804 can compare the similarity of pairs of 155 00:07:26,804 --> 00:07:28,566 sequences of different lengths. 156 00:07:29,607 --> 00:07:32,171 We multiply the number of matches by their 157 00:07:32,171 --> 00:07:34,654 value, 1, and add to 158 00:07:35,134 --> 00:07:37,297 it the number of mismatches 159 00:07:37,537 --> 00:07:39,860 multiplied by their value, 0. 160 00:07:40,741 --> 00:07:43,705 The normalized score is determined 161 00:07:43,705 --> 00:07:46,188 by dividing the calculated value by the 162 00:07:46,188 --> 00:07:48,591 length of the alignment. In our 163 00:07:48,591 --> 00:07:51,154 case, the alignment of sequences 164 00:07:51,154 --> 00:07:53,718 C and D has a higher score, 165 00:07:54,198 --> 00:07:55,880 so they are more similar. 166 00:08:00,126 --> 00:08:02,769 In another example, we 167 00:08:02,769 --> 00:08:05,252 align two sequences of different lengths. 168 00:08:06,133 --> 00:08:08,777 If we determine the score for 169 00:08:09,097 --> 00:08:11,821 the unaligned sequences, it would have a 170 00:08:11,821 --> 00:08:13,663 value of 6. 
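The normalized score for the two pairs A/B and C/D can be checked with a few lines of Python. This is a small sketch of the arithmetic described in the lecture, using its values of 1 for a match and 0 for a mismatch. The function and variable names are assumptions introduced for illustration.

```python
def normalized_score(matches, mismatches, match_value=1, mismatch_value=0):
    """Raw alignment score divided by the alignment length."""
    length = matches + mismatches
    raw = matches * match_value + mismatches * mismatch_value
    return raw / length


score_ab = normalized_score(8, 2)    # pair A and B: 8 matches, 2 mismatches
score_cd = normalized_score(17, 3)   # pair C and D: 17 matches, 3 mismatches
print(score_ab, score_cd)            # 0.8 0.85
```

The C/D alignment scores 0.85 against 0.8 for A/B, which is why the lecture concludes that C and D are the more similar pair even though both alignments contain mismatches.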
171 00:08:14,624 --> 00:08:17,307 After alignment, the score increases to 172 00:08:17,307 --> 00:08:19,991 9. The score increases by 173 00:08:19,991 --> 00:08:22,474 inserting gaps. Gaps 174 00:08:22,474 --> 00:08:25,117 increase the number of aligned identical 175 00:08:26,159 --> 00:08:26,960 residues. 176 00:08:29,523 --> 00:08:32,126 There are many online tools for multiple 177 00:08:32,126 --> 00:08:33,258 sequence alignment. 178 00:08:34,970 --> 00:08:37,693 Among the oldest is Clustal, but 179 00:08:37,693 --> 00:08:40,296 others have been gradually developed, such 180 00:08:40,296 --> 00:08:43,220 as MAFFT, T-Coffee, MUSCLE, 181 00:08:43,460 --> 00:08:45,703 Kalign, or COBALT. 182 00:08:46,745 --> 00:08:49,068 Each was developed for different types of 183 00:08:49,068 --> 00:08:50,830 sequences and their lengths. 184 00:08:53,073 --> 00:08:55,876 The principle of MSA, aligning 185 00:08:55,956 --> 00:08:58,840 three or more sequences, is similar to 186 00:08:58,840 --> 00:09:01,683 that of BLAST, based on pairwise 187 00:09:01,804 --> 00:09:04,127 alignment. However, the 188 00:09:04,127 --> 00:09:06,930 calculations are more complex. MSA 189 00:09:06,930 --> 00:09:08,452 can reveal mutations, 190 00:09:08,692 --> 00:09:11,496 substitutions, or insertions and deletions. 191 00:09:12,617 --> 00:09:15,100 These comparisons are used to derive 192 00:09:15,100 --> 00:09:17,904 evolutionary relationships through 193 00:09:17,904 --> 00:09:20,788 phylogenetic analyses. They can highlight 194 00:09:20,868 --> 00:09:23,111 homologous features between 195 00:09:23,111 --> 00:09:25,834 sequences. The result may be a 196 00:09:25,834 --> 00:09:27,997 phylogenetic tree expressing 197 00:09:28,077 --> 00:09:30,880 evolutionary distances between sequences. 198 00:09:34,164 --> 00:09:36,167 And thank you for your attention.