[00:00:01] Hello. In this example, we will estimate the coefficient of heritability using analysis of variance on the performance values of related individuals.

[00:00:19] In our case, we use groups of half sibs by sire. Forty progeny were chosen at random. Each sire was mated to 8 dams, with each mating producing one male. Five sire families were chosen at random, and the 8-week body weight of the progeny was recorded in grams.

[00:00:47] The genetic variances and the heritability are to be estimated, and for this we use analysis of variance.

[00:00:57] The first table shows the performance values of the 40 individuals.

[00:01:04] In this statistical analysis we use a simple linear model with one effect, the effect of the sire. In our linear model, y_ij is the performance of the j-th offspring of the i-th sire, and it is equal to the overall mean plus the effect of the i-th sire plus the residual e_ij:

y_ij = μ + s_i + e_ij

The residuals are random effects that we are unable to detect and include in the calculation.
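To experiment with the model before working through the tables, a minimal R sketch can simulate data from y_ij = μ + s_i + e_ij in the same balanced layout (5 sires, 8 sons each). The function name `simulate_sire_model` and all parameter values (mean, sire variance, residual variance, seed) are hypothetical placeholders; this is not the data from the first table.

```r
# Hypothetical simulation of the one-way sire model y_ij = mu + s_i + e_ij
# in a balanced design (n_sires sires, n_per_sire sons each).
# Parameter values are placeholders, not estimates from the video's data.
simulate_sire_model <- function(n_sires = 5, n_per_sire = 8, mu = 500,
                                var_sire = 100, var_resid = 900, seed = 1) {
  set.seed(seed)
  # Sire labels A, B, C, ... repeated once per son
  sire <- factor(rep(LETTERS[seq_len(n_sires)], each = n_per_sire))
  # Random sire effects s_i, one per sire
  s <- rnorm(n_sires, mean = 0, sd = sqrt(var_sire))
  # Residuals e_ij, one per son
  e <- rnorm(n_sires * n_per_sire, mean = 0, sd = sqrt(var_resid))
  data.frame(sire = sire, y = mu + s[as.integer(sire)] + e)
}

sim <- simulate_sire_model()
head(sim)
table(sim$sire)   # 8 sons per sire
```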
[00:01:49] To simplify the manual calculation, we don't have to perform all the basic operations ourselves. They are pre-calculated in the second table, which shows the group sums, the sums of squares and the squares of the sums, as well as the overall totals. We then use these values in the equations for the sums of squares of deviations from the mean.

[00:02:20] For the calculation we use R and RStudio, which you can see on the left. We have written the input values as named objects. Capital Y squared is the square of the grand total. The next value is the square of each group's sum divided by the number of individuals in that group, summed over all groups. The third value is the sum of squares of the individual values over the entire data set. P is the number of sires, 5; N is the number of individuals, 40; and n0 is the weighted number of offspring per sire (here 8, because every sire has 8 offspring).
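The pre-calculated values in the second table can also be derived from raw data. The sketch below assumes a data frame with a grouping column `sire` and a response column `y`; the function and list names are hypothetical stand-ins for the video's input objects. For n0 it uses the usual weighted-family-size formula n0 = (N - Σn_i²/N) / (P - 1), which is assumed (not confirmed) to match the Word document; in a balanced design it reduces to the common family size. The toy data exist only so the sketch runs.

```r
# Summary quantities for the manual one-way ANOVA, computed from raw data.
# Column names `sire` and `y` are assumptions about the data layout.
anova_inputs <- function(d) {
  Yi <- tapply(d$y, d$sire, sum)      # group (sire) totals Y_i.
  ni <- tapply(d$y, d$sire, length)   # number of offspring per sire n_i
  N  <- nrow(d)                       # total number of individuals
  P  <- length(ni)                    # number of sires
  list(
    Y2      = sum(d$y)^2,             # square of the grand total, Y..^2
    sum_Yi2 = sum(Yi^2 / ni),         # sum over sires of Y_i.^2 / n_i
    sum_y2  = sum(d$y^2),             # sum of squared individual values
    P  = P,
    N  = N,
    n0 = (N - sum(ni^2) / N) / (P - 1) # weighted offspring per sire
  )
}

# Toy data (hypothetical): two sires with three sons each
toy <- data.frame(sire = rep(c("A", "B"), each = 3), y = c(1, 2, 3, 4, 5, 6))
inp <- anova_inputs(toy)
str(inp)
```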
[00:03:22] These objects must be loaded into the computer's memory by selecting them with the mouse and running the script with the Ctrl+Enter key combination.

[00:03:39] We must now calculate the sums of squares of deviations from the mean according to the linear ANOVA model.

[00:03:50] First, we calculate the sum of squares of deviations from the mean between the groups of half sibs, that is, between the sires. We call it SSA. The result is displayed by pressing Ctrl+Enter.

[00:04:16] Next, we compute the sum of squares of deviations from the mean within the groups of half sibs, SSE. After writing the script according to the formula, we select it and run it. Again, we can see the results below.

[00:05:01] The next step is to calculate the mean squares, which are the variances. We calculate the mean square between groups, MSA, and the residual mean square within groups, MSE.
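The standard one-way ANOVA identities connect the pre-calculated values to SSA and SSE: SSA = Σ(Y_i.²/n_i) - Y..²/N and SSE = Σy² - Σ(Y_i.²/n_i). The sketch below writes them out in R; the argument names are hypothetical stand-ins for the video's objects, and the toy totals (sires A = 1, 2, 3 and B = 4, 5, 6) are made up so the arithmetic is easy to check by hand.

```r
# Sums of squares of deviations from the mean, from pre-calculated totals.
# Argument names are hypothetical stand-ins for the video's input objects.
sums_of_squares <- function(Y2, sum_Yi2, sum_y2, N) {
  C   <- Y2 / N             # correction term Y..^2 / N
  SSA <- sum_Yi2 - C        # between sires (between half-sib groups)
  SSE <- sum_y2 - sum_Yi2   # within sires (residual)
  SST <- sum_y2 - C         # total; equals SSA + SSE
  c(SSA = SSA, SSE = SSE, SST = SST)
}

# Toy check: Y..^2 = 21^2 = 441, sum(Y_i.^2 / n_i) = 36/3 + 225/3 = 87,
# sum(y^2) = 91, N = 6
ss <- sums_of_squares(Y2 = 441, sum_Yi2 = 87, sum_y2 = 91, N = 6)
ss
```

The total SST is not needed for the heritability, but checking SSA + SSE = SST is a cheap guard against typing errors in the input objects.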
[00:05:28] We calculate the mean square between sire groups, MSA, by dividing SSA by its degrees of freedom, which in this case is the number of sires minus 1.

[00:05:51] We obtain the residual mean square, MSE, by dividing SSE by its degrees of freedom, which is the number of offspring minus the number of sires.

[00:06:08] This is the result of the statistical analysis of variance. The genetic part follows, where the last column of the ANOVA table describes the composition of each mean square. The sire mean square, MSA, contains the genetic variance (multiplied by n0) on top of the residual variance, while the residual mean square, MSE, is directly equal to the within-group variance, which we treat here as the environmental variance.

[00:06:41] We can therefore estimate the genetic variance as the difference between the between-sire variance, MSA, and the residual variance, MSE, divided by n0, the weighted number of offspring per sire. So our genetic variance by sire is 245.7.

[00:07:20] Let's also write down that the environmental variance is directly equal to MSE, the residual mean square. We have now calculated both variance components, genetic and environmental.

[00:07:40] From these two components we calculate the so-called intraclass correlation coefficient, which we denote r (or rho). This coefficient equals the genetic variance divided by the total phenotypic variance, that is, the genetic plus the environmental variance.

[00:08:09] This equation is very reminiscent of heritability, because heritability is defined as the proportion of genetic variance in the total phenotypic variance. Why is it not heritability? Because we calculated it from groups of half sibs. The heritability, h², is simply four times this intraclass correlation coefficient. The reason is that the genetic similarity between half sibs is 25%, or 1/4: the covariance between half sibs is one quarter of the additive genetic variance, so the sire component captures only a quarter of it.
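The chain from sums of squares to heritability described above can be sketched as one R function. The function name and the example sums of squares are hypothetical (they are not the video's results). The function also returns an approximate standard error of h², using Falconer's large-sample formula for the standard error of an intraclass correlation, SE(r) ≈ sqrt(2(1 - r)²(1 + (k - 1)r)² / (k(k - 1)(P - 1))) with k = n0, multiplied by 4; this formula is an assumption, since the one in the video's Word document is not shown and may be a different variant.

```r
# From sums of squares to heritability in a paternal half-sib design.
#   MSA = SSA / (P - 1);  MSE = SSE / (N - P)
#   sire (genetic) component:         var_s = (MSA - MSE) / n0
#   within-group (environmental) one: var_e = MSE
#   intraclass correlation r = var_s / (var_s + var_e);  h2 = 4 * r
half_sib_h2 <- function(SSA, SSE, P, N, n0) {
  MSA   <- SSA / (P - 1)
  MSE   <- SSE / (N - P)
  var_s <- (MSA - MSE) / n0
  var_e <- MSE
  r     <- var_s / (var_s + var_e)
  h2    <- 4 * r
  # Approximate SE of r (Falconer's formula, assumed here), family size k = n0
  se_r  <- sqrt(2 * (1 - r)^2 * (1 + (n0 - 1) * r)^2 /
                  (n0 * (n0 - 1) * (P - 1)))
  c(MSA = MSA, MSE = MSE, var_s = var_s, var_e = var_e,
    r = r, h2 = h2, se_h2 = 4 * se_r)
}

# Hypothetical sums of squares for 5 sires x 8 sons (not the video's data)
res <- half_sib_h2(SSA = 4000, SSE = 21000, P = 5, N = 40, n0 = 8)
round(res, 3)
```

If MSA happens to be smaller than MSE, var_s comes out negative; such estimates are usually reported as zero.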
[00:08:49] We also have to calculate the standard error of the heritability estimate. The formula is again in the Word document.

[00:09:00] As you can see, the standard error of the heritability estimate is very high: 0.556. For us to have some confidence in the calculated heritability, the standard error would have to be less than 0.05. In our case, we are far from that. This is because it is only a model example: we have few offspring and few groups of half sibs, which is why the standard error is so high.

[00:09:46] If we use statistical software such as R, we don't have to calculate the ANOVA manually; we can use a built-in function. But first, you have to load the data. The database is prepared in an Excel spreadsheet that is available to you, and loading it is very simple. In RStudio, open the File menu at the top, choose Import Dataset, and select From Excel.
[00:10:22] We then find our Excel file.

[00:10:25] As you can see, the data are already there and the table has loaded nicely. The first column holds the sire, coded with letters (A, B, C, D and so on), and the second column holds the values of the offspring. We name this data set data1 and click Import.

[00:10:48] We create the object ANOVA1 with the function lm (linear model), which is the basic modelling function in R. We have to define the model equation: y depends on one effect, the effect of sire. Note that the operator used here is not an equals sign but the so-called tilde (~); on the keyboard used here it is typed with Alt+1 on the alphanumeric keys. To view the result, we write anova and, in parentheses, ANOVA1, that is, anova(ANOVA1). When I select these two lines of code and run them, we get the resulting analysis-of-variance table.

[00:11:37] Notice the degrees of freedom for the sire (between-group) effect and for the residual, followed by the sums of squares of deviations from the mean.
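The import and the two-line lm/anova step described above can be sketched as follows. RStudio's From Excel dialog generates a `readxl::read_excel()` call; the file name below is a hypothetical placeholder, so that part is commented out and a small toy data set stands in for the spreadsheet. The column names `sire` and `y` are assumptions about the spreadsheet layout. The last lines pull MSA and MSE out of the Mean Sq column by row name.

```r
# What the RStudio import dialog generates (file name is hypothetical):
# library(readxl)
# data1 <- read_excel("half_sibs.xlsx")

# Toy stand-in data so the sketch runs without the spreadsheet
data1 <- data.frame(
  sire = rep(c("A", "B"), each = 3),
  y    = c(1, 2, 3, 4, 5, 6)
)
# Letter codes would be converted automatically, but numeric sire codes
# would be treated as a covariate, so make the factor explicit
data1$sire <- factor(data1$sire)

ANOVA1 <- lm(y ~ sire, data = data1)   # one fixed effect: sire
tab <- anova(ANOVA1)
print(tab)

# Mean Sq column: sire row = MSA (between), Residuals row = MSE (within)
MSA <- tab["sire", "Mean Sq"]
MSE <- tab["Residuals", "Mean Sq"]
c(MSA = MSA, MSE = MSE)
```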
[00:11:55] The third column, Mean Sq, is the crucial one for us: it contains the mean squares between and within groups. You can see that they are exactly the same as the values we calculated manually before.

[00:12:11] This is where the analysis of variance ends and the genetic analysis begins. We then calculate the genetic and environmental variances, then the intraclass correlation coefficient, and from this coefficient the coefficient of heritability.

[00:12:32] At the end, we show a second analysis that uses the REML method. Here we have to install and load the special package lme4. You install the package from the Packages pane. I have already installed it, so I just activate it by typing library and the package name, library(lme4).

[00:13:01] Here we use the lmer function from the lme4 package, and again we define a simple model equation: y, tilde, 1 plus, in parentheses, 1, a vertical bar and sire, that is, y ~ 1 + (1 | sire).
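A minimal sketch of the REML route, assuming lme4 is installed. Because the course spreadsheet is not available here, the data are simulated with hypothetical parameter values (5 sires, 8 sons each). `lmer()` fits by REML by default; `as.data.frame(VarCorr(...))` returns the variance components with a `grp` column (`"sire"` and `"Residual"`), which are then turned into r and h² exactly as in the manual analysis.

```r
# install.packages("lme4")   # once, or via the Packages pane in RStudio
library(lme4)

# Hypothetical simulated data (5 sires x 8 sons), not the video's data set
set.seed(42)
data1 <- data.frame(sire = factor(rep(LETTERS[1:5], each = 8)))
data1$y <- 500 + rnorm(5, 0, 10)[as.integer(data1$sire)] + rnorm(40, 0, 30)

# Random-intercept sire model fitted by REML (the lmer default)
REML1 <- lmer(y ~ 1 + (1 | sire), data = data1)
summary(REML1)

# Variance components from the "Random effects" part of the output
vc    <- as.data.frame(VarCorr(REML1))
var_s <- vc$vcov[vc$grp == "sire"]       # sire (genetic) variance
var_e <- vc$vcov[vc$grp == "Residual"]   # residual (environmental) variance

r  <- var_s / (var_s + var_e)   # intraclass correlation
h2 <- 4 * r                     # heritability from paternal half sibs
c(var_s = var_s, var_e = var_e, r = r, h2 = h2)
```

Unlike the ANOVA estimator, REML keeps the sire variance non-negative; with small designs it may report a boundary (singular) fit with var_s equal to zero.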
[00:13:30] We also specify that the data are data1. To display the result, we write summary of our REML1 object. When we select the code and press Ctrl+Enter, we see the full output of this analysis.

[00:13:51] Two lines of it are important to us: as you can see, REML directly estimates the genetic variance by sire, and there is also the environmental (residual) variance. We then process them in the same way: we calculate the coefficient r and, from it, the heritability.

[00:14:20] Thank you for your attention.