1
00:00:00,881 --> 00:00:02,483
The topic of this lecture is

2
00:00:02,483 --> 00:00:05,287
Bioinformatics Tools for Assessing Genome

3
00:00:05,287 --> 00:00:07,930
Variability of Animal Genetic Resources.

4
00:00:08,651 --> 00:00:10,653
The lecture is part of Module 2,

5
00:00:11,214 --> 00:00:13,297
Conservation and Sustainable

6
00:00:13,297 --> 00:00:16,100
Utilization of Animal Genetic Resources.

7
00:00:17,302 --> 00:00:19,865
The creation of this presentation was

8
00:00:19,865 --> 00:00:22,749
supported by the Erasmus+ KA2 grant

9
00:00:23,149 --> 00:00:25,232
as part of the project ISAGREED

10
00:00:25,472 --> 00:00:28,436
Innovation of Content and Structure

11
00:00:28,436 --> 00:00:31,239
of Study Programs in the Management of

12
00:00:31,239 --> 00:00:33,883
Animal Genetics and Food Resources

13
00:00:33,883 --> 00:00:35,485
Using Digitalization.

14
00:00:38,088 --> 00:00:41,012
What is bioinformatics and what is its

15
00:00:41,012 --> 00:00:43,895
importance? Bioinformatics is

16
00:00:43,935 --> 00:00:46,459
an interdisciplinary field that combines

17
00:00:46,539 --> 00:00:48,902
biology, computer science,

18
00:00:49,182 --> 00:00:51,946
statistics, and mathematics to

19
00:00:51,986 --> 00:00:54,629
analyze and interpret biological data

20
00:00:55,029 --> 00:00:57,913
using computational tools and algorithms.

21
00:00:58,794 --> 00:01:01,598
In animal genetics, it plays a crucial

22
00:01:01,598 --> 00:01:04,481
role in obtaining and analyzing

23
00:01:04,481 --> 00:01:06,804
extensive genomic data, such as

24
00:01:06,884 --> 00:01:09,688
DNA sequences, to gain

25
00:01:09,688 --> 00:01:12,091
insight into the genetic makeup and

26
00:01:12,091 --> 00:01:14,093
biological processes of animals.

27
00:01:14,894 --> 00:01:17,538
In animal genetics, bioinformatics

28
00:01:17,538 --> 00:01:20,181
helps identify and characterize

29
00:01:20,421 --> 00:01:23,225
genes responsible for specific traits,

30
00:01:23,866 --> 00:01:26,669
diseases, or abnormalities in animals.

31
00:01:27,470 --> 00:01:30,354
For example, it can help identify genes

32
00:01:30,354 --> 00:01:32,837
associated with coat color,

33
00:01:33,158 --> 00:01:35,721
milk production, growth rate

34
00:01:35,961 --> 00:01:38,284
or susceptibility to disease.

35
00:01:41,408 --> 00:01:43,931
It allows researchers to compare and

36
00:01:43,931 --> 00:01:46,615
analyze genomes of different animal

37
00:01:46,615 --> 00:01:49,338
species, understand their

38
00:01:49,338 --> 00:01:51,701
evolutionary relationships and

39
00:01:51,781 --> 00:01:53,904
identify common genetic elements.

40
00:01:54,785 --> 00:01:57,028
This can provide insights into the

41
00:01:57,028 --> 00:01:59,831
diversity and evolution of animal

42
00:01:59,831 --> 00:02:02,635
species. Bioinformatics

43
00:02:02,635 --> 00:02:04,878
helps in developing new breeding

44
00:02:04,878 --> 00:02:07,681
strategies and improving breeding

45
00:02:07,681 --> 00:02:10,565
practices. By analyzing

46
00:02:10,565 --> 00:02:13,288
genetic data, it helps identify

47
00:02:13,488 --> 00:02:16,412
animals with desirable traits for

48
00:02:16,412 --> 00:02:19,096
breeding programs and improve traits

49
00:02:19,096 --> 00:02:21,939
such as productivity, disease resistance,

50
00:02:21,939 --> 00:02:23,221
or adaptability.

51
00:02:24,422 --> 00:02:26,665
Bioinformatics supports the protection

52
00:02:26,745 --> 00:02:29,108
and conservation of endangered animal

53
00:02:29,108 --> 00:02:31,311
species by studying their

54
00:02:31,311 --> 00:02:34,034
genomes and identifying

55
00:02:34,034 --> 00:02:36,357
genetic markers for monitoring

56
00:02:36,357 --> 00:02:38,600
populations, assessing genetic

57
00:02:38,600 --> 00:02:41,083
diversity, and assisting

58
00:02:41,644 --> 00:02:43,646
in captive breeding programs.

59
00:02:45,249 --> 00:02:47,812
What data are used? For

60
00:02:47,812 --> 00:02:50,375
bioinformatics analysis, an internet

61
00:02:50,375 --> 00:02:52,297
connection, computer

62
00:02:52,698 --> 00:02:55,582
programs, and data are needed.

63
00:02:56,463 --> 00:02:59,106
Data such as DNA, RNA, or protein

64
00:02:59,106 --> 00:03:01,990
sequences are used, as well as genomic,

65
00:03:01,990 --> 00:03:04,953
transcriptomic, proteomic, metabolomic,

66
00:03:05,033 --> 00:03:07,917
phylogenetic, or structural data.

67
00:03:09,279 --> 00:03:11,842
Data for bioinformatics analysis are

68
00:03:11,842 --> 00:03:14,085
initially uploaded to online

69
00:03:14,085 --> 00:03:16,768
databases. There were about

70
00:03:16,768 --> 00:03:19,692
2,000 databases available online in

71
00:03:19,692 --> 00:03:21,855
January 2024.

72
00:03:22,896 --> 00:03:24,658
The most significant sequence

73
00:03:25,019 --> 00:03:27,862
databases are GenBank, ENA,

74
00:03:28,343 --> 00:03:30,826
UniProt, and the genome database

75
00:03:30,986 --> 00:03:31,707
Ensembl.

76
00:03:34,431 --> 00:03:36,193
On servers, where the

77
00:03:36,193 --> 00:03:38,516
databases are located,

78
00:03:39,157 --> 00:03:40,999
there are tools for searching,

79
00:03:41,560 --> 00:03:44,443
aligning, and analyzing bioinformatic

80
00:03:44,443 --> 00:03:47,407
data. Pairwise sequence

81
00:03:47,407 --> 00:03:49,570
alignment is used to identify

82
00:03:49,570 --> 00:03:52,293
regions of similarity that may

83
00:03:52,293 --> 00:03:54,776
indicate functional, structural

84
00:03:55,337 --> 00:03:57,660
and/or evolutionary relationships

85
00:03:58,060 --> 00:04:00,143
between two biological sequences,

86
00:04:00,303 --> 00:04:02,586
proteins or nucleic acids.

87
00:04:04,268 --> 00:04:06,631
Multiple sequence alignment, MSA,

88
00:04:07,432 --> 00:04:09,675
is the alignment of three or more

89
00:04:09,675 --> 00:04:12,519
biological sequences of similar length.

90
00:04:13,760 --> 00:04:16,243
From the output of MSA applications,

91
00:04:16,684 --> 00:04:19,367
homology can be inferred and the

92
00:04:19,367 --> 00:04:21,610
evolutionary relationship between

93
00:04:21,610 --> 00:04:23,092
sequences can be studied.

94
00:04:26,576 --> 00:04:28,739
The basic tool for aligning two

95
00:04:28,739 --> 00:04:31,623
sequences is BLAST, Basic

96
00:04:31,623 --> 00:04:34,586
Local Alignment Search Tool, on the

97
00:04:34,586 --> 00:04:36,269
NCBI server.

98
00:04:37,390 --> 00:04:40,033
BLAST searches for areas of

99
00:04:40,033 --> 00:04:42,436
similarity between biological sequences.

100
00:04:43,798 --> 00:04:45,801
The program compares nucleotide or

101
00:04:45,801 --> 00:04:48,244
protein sequences to sequence

102
00:04:48,244 --> 00:04:50,927
databases and calculates

103
00:04:50,927 --> 00:04:53,450
statistical significance. It is

104
00:04:53,450 --> 00:04:56,054
possible to compare your own sequence

105
00:04:56,054 --> 00:04:58,296
with database sequences in GenBank.

106
00:04:59,258 --> 00:05:01,661
It is also possible to compare two specific

107
00:05:01,741 --> 00:05:04,304
sequences. The primary

108
00:05:04,304 --> 00:05:06,747
focus is on local alignment,

109
00:05:06,867 --> 00:05:09,511
but global alignment is also available.

110
00:05:11,753 --> 00:05:14,317
Another tool for pairwise sequence

111
00:05:14,317 --> 00:05:17,040
alignment is on the European

112
00:05:17,240 --> 00:05:20,084
EMBL-EBI server, and it is

113
00:05:20,164 --> 00:05:20,805
EMBOSS.

114
00:05:23,168 --> 00:05:25,371
When do you use local or global

115
00:05:25,531 --> 00:05:28,414
alignment? Local alignment, using

116
00:05:28,414 --> 00:05:31,258
the Smith-Waterman algorithm, is used

117
00:05:31,258 --> 00:05:33,941
for more divergent, evolutionarily

118
00:05:33,941 --> 00:05:36,745
distant sequences. It is

119
00:05:36,745 --> 00:05:39,669
limited to aligning similar segments

120
00:05:39,789 --> 00:05:42,192
and stops where the sequences

121
00:05:42,993 --> 00:05:44,755
diverge significantly.
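
The local alignment idea described above can be sketched in a few lines. This is a minimal illustration, not the implementation used by BLAST or EMBOSS. The function name and the scoring values (match 1, mismatch -1, gap -1) are assumptions chosen for the example, and the function returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the lecture.
    """
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = h[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = h[i - 1][j] + gap              # gap in b
            left = h[i][j - 1] + gap            # gap in a
            # The 0 lets the alignment restart where the sequences diverge.
            h[i][j] = max(0, diagonal, up, left)
            best = max(best, h[i][j])
    return best


# Two otherwise unrelated sequences that share the segment "ACGT".
shared_score = smith_waterman_score("TTTTACGTAAAA", "GGGACGTCCC")
```

Because a cell can never fall below zero, the score reflects only the best-matching segment, which is why this approach suits distant sequences.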

122
00:05:46,838 --> 00:05:48,920
Global alignment, using

123
00:05:48,920 --> 00:05:51,604
the Needleman-Wunsch algorithm, is the

124
00:05:51,604 --> 00:05:53,566
most suitable for

125
00:05:53,566 --> 00:05:56,410
sequences that are similar and

126
00:05:56,410 --> 00:05:58,492
approximately the same length.

127
00:05:59,614 --> 00:06:02,057
It attempts to align sequences over their

128
00:06:02,057 --> 00:06:04,860
entire length even at the cost of

129
00:06:04,860 --> 00:06:07,784
introducing gaps into

130
00:06:07,784 --> 00:06:09,426
one or both sequences.
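
Global alignment can be sketched in the same way. This is a simplified version of the Needleman-Wunsch recurrence that returns only the score. The function name is hypothetical, and the default values (match 1, mismatch 0, gap 0) are assumptions chosen to mirror the simple scoring used later in this lecture. Real tools normally penalize gaps.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=0, gap=0):
    """Return the best global alignment score between strings a and b.

    Default scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        f[i][0] = i * gap
    for j in range(1, cols):
        f[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(
                f[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                f[i - 1][j] + gap,       # gap in b
                f[i][j - 1] + gap,       # gap in a
            )
    # The score covers both sequences over their entire length.
    return f[rows - 1][cols - 1]


free_gaps = needleman_wunsch_score("ACGT", "AGT")
penalized_gaps = needleman_wunsch_score("ACGT", "AGT", gap=-1)
```

Unlike the local version, the score is read from the last cell of the table, so every residue of both sequences is accounted for, with gaps where needed.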

131
00:06:15,033 --> 00:06:17,436
We will demonstrate a model alignment

132
00:06:17,436 --> 00:06:19,679
process. We want to

133
00:06:19,679 --> 00:06:22,282
determine which pair of sequences, A

134
00:06:22,282 --> 00:06:24,966
and B or C and D,

135
00:06:25,286 --> 00:06:26,968
is more similar.

136
00:06:28,170 --> 00:06:30,973
We align the sequences over their entire

137
00:06:30,973 --> 00:06:33,456
length. That is, we write

138
00:06:33,697 --> 00:06:36,580
them into two rows placed below each

139
00:06:36,660 --> 00:06:39,304
other so that identical positions,

140
00:06:39,865 --> 00:06:42,348
bases or amino acids are aligned.

141
00:06:43,469 --> 00:06:46,193
Each matching and non-matching pair will be

142
00:06:46,193 --> 00:06:48,676
assigned a value. For example,

143
00:06:48,756 --> 00:06:51,279
1 for a match and

144
00:06:51,399 --> 00:06:53,081
0 for a mismatch.

145
00:06:55,644 --> 00:06:58,088
Both alignments show that the first pair

146
00:06:58,088 --> 00:07:01,011
of sequences, A and B, has eight

147
00:07:01,171 --> 00:07:04,055
matches and two mismatches. And the

148
00:07:04,055 --> 00:07:06,698
second pair of sequences, C and D,

149
00:07:06,939 --> 00:07:09,742
has 17 matches and three mismatches.

150
00:07:10,463 --> 00:07:13,187
However, which pair of

151
00:07:13,187 --> 00:07:15,029
sequences is more similar?

152
00:07:18,553 --> 00:07:21,117
It is necessary to calculate normalized

153
00:07:21,437 --> 00:07:24,321
similarity values (scores) so that we

154
00:07:24,321 --> 00:07:26,804
can compare the similarity of pairs of

155
00:07:26,804 --> 00:07:28,566
sequences of different lengths.

156
00:07:29,607 --> 00:07:32,171
We multiply the number of matches by their

157
00:07:32,171 --> 00:07:34,654
value 1, and add to

158
00:07:35,134 --> 00:07:37,297
it the number of mismatches

159
00:07:37,537 --> 00:07:39,860
multiplied by their value 0.

160
00:07:40,741 --> 00:07:43,705
The normalized score is determined

161
00:07:43,705 --> 00:07:46,188
by dividing the calculated value by the

162
00:07:46,188 --> 00:07:48,591
length of the alignment. In our

163
00:07:48,591 --> 00:07:51,154
case, the alignment of sequences

164
00:07:51,154 --> 00:07:53,718
C and D has a higher score,

165
00:07:54,198 --> 00:07:55,880
so they are more similar.
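
The calculation just described can be written out as a short sketch. The function name is hypothetical. It assumes the alignment length equals matches plus mismatches, that is, an alignment without gap columns. The counts are the ones stated above for the two pairs.

```python
def normalized_score(matches, mismatches, match_value=1, mismatch_value=0):
    """Raw score divided by the alignment length.

    Assumes an ungapped alignment, so length = matches + mismatches.
    """
    raw = matches * match_value + mismatches * mismatch_value
    length = matches + mismatches
    return raw / length


score_ab = normalized_score(8, 2)    # pair A and B: 8 / 10
score_cd = normalized_score(17, 3)   # pair C and D: 17 / 20
more_similar = "C and D" if score_cd > score_ab else "A and B"
```

With these counts the scores are 0.8 for A and B and 0.85 for C and D, which is why the second pair is judged more similar even though it has more mismatches in absolute terms.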

166
00:08:00,126 --> 00:08:02,769
In another example, we

167
00:08:02,769 --> 00:08:05,252
align two sequences of different lengths.

168
00:08:06,133 --> 00:08:08,777
If we determine the score for

169
00:08:09,097 --> 00:08:11,821
the unaligned sequences, it would have a

170
00:08:11,821 --> 00:08:13,663
value of 6.

171
00:08:14,624 --> 00:08:17,307
After alignment, the score increases to

172
00:08:17,307 --> 00:08:19,991
9. The score increases by

173
00:08:19,991 --> 00:08:22,474
inserting gaps. Gaps

174
00:08:22,474 --> 00:08:25,117
increase the number of aligned identical

175
00:08:26,159 --> 00:08:26,960
residues.
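
The effect of gaps can be illustrated with a small sketch. The two sequences below are hypothetical and are not the ones shown on the slide, so the scores differ from the 6 and 9 mentioned above. The function name and the "-" gap symbol are likewise assumptions for this example.

```python
def count_matches(row1, row2, gap="-"):
    """Count positions where two rows carry the same residue.

    Rows are compared position by position; a gap symbol never counts
    as a match. Extra residues in the longer row are ignored.
    """
    return sum(1 for x, y in zip(row1, row2) if x == y and x != gap)


# Hypothetical sequences of different lengths, written one below the other.
before = count_matches("GATTACA", "GATACA")    # no gaps inserted
after = count_matches("GATTACA", "GAT-ACA")    # one gap in the shorter row
```

Here a single gap shifts the tail of the shorter sequence back into register, and the number of identical aligned residues rises from 3 to 6.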

176
00:08:29,523 --> 00:08:32,126
There are many online tools for multiple

177
00:08:32,126 --> 00:08:33,258
sequence alignment.

178
00:08:34,970 --> 00:08:37,693
Among the oldest is Clustal, but

179
00:08:37,693 --> 00:08:40,296
others have been gradually developed such

180
00:08:40,296 --> 00:08:43,220
as MAFFT, T-Coffee, MUSCLE,

181
00:08:43,460 --> 00:08:45,703
Kalign, or COBALT.

182
00:08:46,745 --> 00:08:49,068
Each was developed for different types of

183
00:08:49,068 --> 00:08:50,830
sequences and their lengths.

184
00:08:53,073 --> 00:08:55,876
The principle of MSA, aligning

185
00:08:55,956 --> 00:08:58,840
three or more sequences, is similar to

186
00:08:58,840 --> 00:09:01,683
that of BLAST, based on pairwise

187
00:09:01,804 --> 00:09:04,127
alignment. However, the

188
00:09:04,127 --> 00:09:06,930
calculations are more complex. This

189
00:09:06,930 --> 00:09:08,452
can reveal mutations,

190
00:09:08,692 --> 00:09:11,496
substitutions or insertion-deletions.

191
00:09:12,617 --> 00:09:15,100
These comparisons are used to derive

192
00:09:15,100 --> 00:09:17,904
evolutionary relationships through

193
00:09:17,904 --> 00:09:20,788
phylogenetic analyses. It can highlight

194
00:09:20,868 --> 00:09:23,111
homologous features between

195
00:09:23,111 --> 00:09:25,834
sequences. The result may be a

196
00:09:25,834 --> 00:09:27,997
phylogenetic tree expressing

197
00:09:28,077 --> 00:09:30,880
evolutionary distances between sequences.

198
00:09:34,164 --> 00:09:36,167
And thank you for your attention.