genomic estimated breeding value definition
Both statistical methods yielded high and similar predictive abilities for the four traits (Figure 2, A and B). (4) can be used to predict either \(r_{M}^{2}\) or \(r^{2}\). A prediction of \(r_{M}^{2}\) is obtained when using \(\theta_{M}\) defined in Eq. Using the estimator of the Ne within a full-sib family, given by Ne = 2n/(n+1) (Resende and Barbosa 2006), the maximum Ne within a full-sib family (as n goes to infinity) is 2. The pedigree-based EBV relates to FI for \(g_{G}\), because pedigree information captures the full genetic effect. Equation (3a) is identical to Eq. (7), with \(r_{1}^{2} = r_{2}^{2} = 0.3658\). (2b) into Eq. Predictive ability was always greater for GWFP methods in both populations and all traits, except for the scenario GWFP_Fam_Ind, which showed similar or lower accuracy than GEBV for most traits (Figure 4). Genomic selection refers to the use of genome-wide dense markers for breeding value estimation and subsequently for selection. An animal's breeding value can be defined as its genetic merit for each trait. The principle of breeding value estimation is based on regression: we want to know differences in breeding value based on observed differences in phenotype. Two traits with different genetic architectures were simulated: (1) oligogenic: 30 QTL were sampled from a gamma distribution with rate 1.66 and shape 0.4, with positive or negative QTL effects (Meuwissen et al. Estimated breeding values reflect the true genetic potential or genetic transmitting ability of animals.
These estimates are called Estimated Breeding Values (EBVs). The traditional genomic prediction approach with individuals in the training set and validation set (GEBV) was contrasted with predictive ability obtained with the family-based (GWFP) method following a 10-fold cross-validation scheme. (9) follows from the general Eq. In summary, the base population was created (G0 = 1000 diploid individuals) by randomly sampling 2000 haplotypes from a population with an effective size of Ne = 10,000 (Johnson et al. 8 of Dekkers et al. Unfortunately, substitution of Eqs. This result is equivalent to well-known expressions for the accuracy of progeny testing, e.g., [7], as evident from substituting \(\sigma_{s}^{2} = \frac{1}{4}h^{2} \sigma_{p}^{2}\), which yields the well-known result \(r^{2} = nh^{2} /\left[ {nh^{2} + \left( {4 - h^{2} } \right)} \right]\), where \(h^{2}\) is the heritability. Predictive ability was estimated by calculating Pearson's correlation between the phenotypic values and the estimated breeding values, and prediction accuracy was estimated by calculating Pearson's correlation between the true breeding value and the estimated breeding value. First, we consider the case of merging genomic information from two subpopulations into a single reference population, followed by the merging of pedigree and genomic information, as in Dekkers et al. (2019), which reported that the accuracy of genomic prediction in wheat was higher for crosses (validation set) with higher phenotypic variance. Remember, the breeding values we use are simply estimates of an animal's genetic potential.
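The progeny-testing expression quoted above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes half-sib progeny groups so that \(\sigma_{s}^{2} = \frac{1}{4}h^{2}\sigma_{p}^{2}\), as stated in the text.

```python
def progeny_test_reliability(n: int, h2: float) -> float:
    """Reliability r^2 = n*h2 / (n*h2 + (4 - h2)) for a sire with n progeny."""
    if n < 1 or not 0.0 < h2 <= 1.0:
        raise ValueError("n must be >= 1 and h2 must lie in (0, 1]")
    return n * h2 / (n * h2 + (4.0 - h2))


def reliability_from_variances(n: int, sigma_p2: float, sigma_s2: float) -> float:
    """Same quantity written as a proportion of variance explained:
    sire variance over the full variance of the progeny means."""
    residual = (sigma_p2 - sigma_s2) / n  # residual variance of a progeny mean
    return sigma_s2 / (sigma_s2 + residual)


if __name__ == "__main__":
    h2, n, sigma_p2 = 0.25, 15, 1.0
    sigma_s2 = 0.25 * h2 * sigma_p2  # assumed half-sib sire variance
    print(progeny_test_reliability(n, h2))
    print(reliability_from_variances(n, sigma_p2, sigma_s2))
```

Both functions should return the same value for matching inputs, which is the point of reading reliability as an \(R^{2}\).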
Dekkers et al. [3] derived predictions of the accuracy of genomic EBV (GEBV) by combining pedigree and genomic information using two approaches: a derivation based on selection index theory (SIT) vs. a derivation based on Fisher information (FI). Both cross-validation schemes, leave-one-out and 10-fold, produced similar results in predicting GWFP, with a slight advantage for the 10-fold scheme due to the large variation in the leave-one-out scheme. First, we ignore the reduction in residual variance that results from fitting all markers simultaneously and from joint analysis of the two populations, in order to mathematically demonstrate the equivalence of the SIT and FI approaches for this case. Therefore, the application of genome-wide family prediction (GWFP) would be advantageous for traits that are phenotyped using family pools in swards or plots. [4] (see also Appendix A in Wientjes et al. In dairy cattle, the method is already a recognized tool to estimate the breeding values of young animals and reduce generation intervals. Note, however, that independence of sampling errors does not require the individuals from one subpopulation to be genetically unrelated to individuals from the other subpopulation. But we can also use information from relatives, such as the sire, the dam, siblings and progeny. The SIT and FI approaches are equivalent when these sources are accounted for. The GWFP approach is well suited for crops that are routinely genotyped and phenotyped at the plot level, but it can be extended to other breeding programs.
Finally, GWFP models could be exploited in scenarios when remnant seeds might be available for the same family, and the goal would be to predict the performance of the family or individuals within the family. Generally, a larger training population (more families in the training population) yields higher accuracy (Voss-Fels et al. PB derived most mathematical results. This result illustrates that the SIT and FI approaches yield the same reliability of predictions when the same assumptions are made. Next, we show that the apparent discrepancy that has been observed between accuracies based on SIT vs. FI originated from two sources. Additionally, GWFP predictive abilities obtained with the leave-one-out approach were slightly lower than for the 10-fold cross-validation scheme (except for trait Stiffness) (Figure 2, A and B). Breeding values define the superiority or inferiority of the offspring of an animal. (9) represents the FI due to genomic relationships deviated from pedigree relationships for estimation of \(g_{G}\), rather than \(g_{M}\). Received 2021 Mar 9; Accepted 2021 Jul 2. A breeding value is an estimate of an animal's genetic merit for a particular trait. GWFP has been used in several forage species that are bred in family bulks and whose phenotyping for critical traits is conducted at the sward/plot level (Fè et al. For one group of individuals, phenotypic and genotypic data were pooled at the family level and used as the training set for GWFP models.
Allele frequency deviations (Figure 1, A-D) and mean phenotypic deviations (Figure 1, E and F) indicated that families with fewer than six individuals were not providing accurate estimates of the family's genotypic and phenotypic means in both populations. (6) yields: Dividing the numerator and denominator by \(1 - q^{2} h^{2} /M_{e}\) yields: Writing all terms with \(\left( {q^{2} - r_{1}^{2} } \right)\left( {q^{2} - r_{2}^{2} } \right)\) as denominator and then cancelling this denominator yields: Dividing the numerator and denominator by \(q^{4}\) yields Eq. Traditional genomic prediction pipelines involve developing a training set, for which available genotypic and phenotypic data are fitted to build a prediction model. The Estimated Breeding Value considers two elements, the performance estimate and environmental factors, with the performance estimation affording guidance on parts of the animal that you cannot actually see but which are influenced by genetics. Resende et al. (2015) studied the effect of the number of families on the accuracy of genomic . First, the FI referred to the genetic component that is captured by the marker genotypes, rather than the full genetic component. In the calculation of EBVs, the performance of individual animals within a . Piter Bijma. Besides, relatedness between the training set and the validation set also influences the predictive ability.
Thus, \(r^{2}\) is the ratio of the variance in the progeny means that is explained by the sire over the full variance in the progeny means, which is the \(R^{2}\) due to the sire. Equations (1a) and (1b) ignore that the genotyped markers may capture only a proportion \(q^{2}\) of the full additive genetic variance [4, 9], which has two consequences. Fusiforme (h2 = 0.21, oligogenic trait), and (4) diameter at breast height (Diameter) at year 6 (cm) (h2 = 0.31, polygenic trait). The predictive ability of GWFP for all traits was lowest and had larger standard errors when the validation set was composed of families exhibiting small and large phenotypic values (bottom and top classes; Figure 3B). To assess the effect that the validation set structure has on the predictive ability of the models, both populations were divided into three different phenotypic classes for each trait: the smallest 10%, the largest 10%, and values between both extremes. To avoid this complexity, we use a numerical example instead. Therefore, using the numbers from this study as an example, and considering the significant reduction in costs incurred in DNA extraction and genotyping of 56 families (training set for GWFP) instead of 844 individuals (training set for GEBV), the approach GWFP_Fam_Ind could still be an affordable option for implementing genomic prediction in breeding programs that select individual plants but have limited budgets to phenotype and genotype all individuals in the training set. This yields \(r_{G}^{2} = 0.7728\), which is the same result as obtained with the SIT approach, and illustrates that the SIT and FI approaches yield the same result when the same assumptions are made.
Scheme for the different genomic prediction scenarios: (A) GEBV: genomic estimated breeding values for individual trees; (B) GWFP_Fam_Fam: genome-wide family prediction for prediction of families; (C) GWFP_Fam_Ind: genome-wide family prediction applied in the selection of individuals. Both populations showed similar trends for the genotypic and phenotypic estimates (Figure 1). However, the minimum number of individuals per family to obtain reasonably accurate estimates of family allele frequency and family phenotypic mean was found to be six. (1a) and (1b), Eqs. (4), it follows that the reliability of GEBV based on analysis of a single subpopulation, \(i\), equals: where \(\theta_{M,i} = N_{i} q^{2} h^{2} /M_{e}\), and \(i =\) 1 or 2. Validation sets with similar phenotypic mean and variance as the training set showed greater predictive ability and more accurate predictions consistently across traits. This allows animals within their breed type to be scored on their genetics without being influenced by their particular management programme or the environment in which they are reared. (3a) using \(\theta_{M} = \theta_{M,1} + \theta_{M,2} = 1.5\) and \(r^{2} = 0.3658\), and yields \(r^{2} = 0.5020\). They consider both genetic and environmental influences to arrive at the Estimated Breeding Value.
The effect of the number of individuals within families on the accuracy of genomic prediction models was also demonstrated in perennial ryegrass (Pembleton et al. 2019). The phenotypic data collection at the plot level could be extended to other organisms grown and evaluated in families, such as turfgrasses (L. perenne L.), forages (M. sativa L.), sugarcane (Saccharum officinarum L.), cassava (Manihot esculenta L.), honey bees, and to aquaculture species such as shrimp (Litopenaeus vannamei; Barbosa et al. The authors declare that they have no competing interests. Although results from simulation studies suggest that different models may yield more accurate genomic estimated breeding values (GEBVs) for different traits, depending on the underlying QTL distribution of the trait, there is so far only little evidence from studies based on real data to support this. Second, the common SIT-based derivations did not account for the increase in the accuracy of GEBV due to a reduction of the residual variance when combining information sources. In our study, the size of the training set refers to the number of families and the number of individuals within a family. If we regress breeding values on phenotypic observations, the slope of the regression line tells us how much difference we have in breeding values per unit of difference in phenotype. GEBV, genomic estimated breeding value; GWFP, genome-wide family prediction; CV, cross-validation. The GWFP approach considers family pools as the measurement unit. (3a), resulting in a quadratic equation in \(r^{2}\).
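The regression idea described above (breeding values regressed on phenotypes) can be illustrated with a small sketch. It is not taken from any of the quoted studies: the function names are hypothetical, and the second function assumes the standard additive model in which the slope for a single own-performance record equals the heritability \(h^{2}\).

```python
def regression_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def ebv_from_own_record(phenotype, group_mean, h2):
    """EBV as a deviation from the group mean, scaled by the regression slope.
    Assumes the slope equals h2 (single record, additive model)."""
    return h2 * (phenotype - group_mean)


if __name__ == "__main__":
    phenotypes = [1.0, 2.0, 3.0, 4.0]
    breeding_values = [0.5, 1.0, 1.5, 2.0]  # toy values, slope is 0.5 by construction
    print(regression_slope(phenotypes, breeding_values))
    print(ebv_from_own_record(110.0, 100.0, 0.3))
```

An animal 10 units above its group mean is therefore credited with only a fraction of that difference, because part of the deviation is environmental.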
Due to the broad diversity in mating systems, breeding schemes, propagation methods, and unit of selection, no universal genomic prediction approach can be applied in all crops. The accuracy represents how close the estimated breeding value is to the true breeding value for a specific trait. An estimate of the breeding value for a particular trait is obtained from pedigree and performance information. Scenarios implemented to design training and validation sets to test predictive ability of genomic prediction models. [3] originated from two sources, which, when accounted for, make the SIT and FI approaches equivalent. Derivations of the accuracy of GEBV make use of the concept of effective chromosomal segments [8]. The (economic) worth of an individual's genotype, as judged by the average performance of its offspring. Instead, we first have to translate \(r_{D}^{2}\) into an FI that refers to the full genetic effect, after which we can add this FI to \(\theta_{A}\) and finally find the full reliability from Eq. It should also be remembered that a calf's genetic makeup comes from its mother as well as its father, so each parent needs to be considered on its merit.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence. Keywords: family selection, training population, predictive ability, genomic prediction. Using this concept, we showed that the apparent discrepancy between predictions of the accuracy of GEBV based on the SIT vs. FI approaches in Dekkers et al. [3], (see their Eq. We choose identical subpopulation sizes because it allows us to easily illustrate the impact of the reduction in residual variance. (4), as explained in the following). The data can be taken from the animal's siblings and heritage, where these data exist. Therefore, the additive genetic variance in full-sib families is half of the additive variance between individuals. The basic principle is that, because of the high marker density, each quantitative trait locus (QTL) is in linkage disequilibrium (LD) with at least one nearby marker. We assumed that the observed values based on 15 individuals per family provide a reasonable estimate of allele frequency and phenotypic mean for a diploid species. (3b) with \(\theta_{M} = 1.5\), which yields \(r^{2} = 0.5114\). Our objective was to investigate the effect of using MFs in genomic prediction for CB performance on estimated variance components, and accuracy and bias of GEBV. (7) as derived from SIT in Appendix 15, and [3, 11]) ignores this reduction in residual variance, while Eqs. (8)) if we ignore a potential reduction in residual variance due to the merger of pedigree and marker information. Formally, it is the variance of the score function, which then equals the expected information [10].
In the case of the genomic data, the allele frequency (p) was calculated for each SNP per family, considering the reference allele (A) as follows: where pij refers to the allele frequency for SNP i in family j; nAAij and nAaij are the numbers of individuals with genotypes AA and Aa, respectively, for SNP i in family j; and Nij is the number of individuals in family j with non-missing genotype data for SNP i. This scheme was repeated until each of the ten subsets had been used as the validation set. Next we consider the reliability of GEBV when merging the two subpopulations. This assumption can be validated using the concept of genetic representativeness, given by the effective population size (Ne) (Vencovsky and Crossa 2003). [3], which is \(r^{2} = \left[ {1 + \theta_{M} - \sqrt {\left( {1 + \theta_{M} } \right)^{2} - 4h^{2} q^{4} \theta_{M} } } \right]/2q^{2} h^{2}\), but accounts for having \(r^{2} h^{2}\) in the denominator of Eq. Average predictive ability obtained with Bayes B for four traits in CCLONES-real (lignin, tree stiffness, rust and stem diameter), and two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim populations using different genomic prediction methods. In the breeding literature, the additive genetic value an individual has for a phenotype is called the breeding value [20,21]. Estimating the breeding value of an organism can be .
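The family-level allele frequency described above can be sketched in code. The expression itself did not survive in the text, so this example assumes the standard diploid count of reference alleles, (2 nAA + nAa) / (2 N); the function name and the coding of genotypes as reference-allele dosages (0, 1, 2, or None for missing) are assumptions for illustration.

```python
def family_allele_frequency(dosages):
    """Reference-allele frequency of one SNP in one family.

    dosages: reference-allele counts per individual (2 = AA, 1 = Aa, 0 = aa),
             with None marking a missing genotype call.
    Returns None when no individual in the family has a genotype call.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    n_AA = sum(1 for d in called if d == 2)
    n_Aa = sum(1 for d in called if d == 1)
    n_called = len(called)  # N: individuals with non-missing data
    return (2 * n_AA + n_Aa) / (2 * n_called)


if __name__ == "__main__":
    # Six sibs, one with a missing call: 2 AA, 2 Aa, 1 aa among 5 called.
    print(family_allele_frequency([2, 2, 1, 0, None, 1]))
```

Missing calls are dropped before counting, so the denominator changes from SNP to SNP within the same family.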
The reliability of GEBV based on one of the two subpopulations, accounting for the reduction in residual variance from fitting all markers simultaneously, follows from Eq. The scenarios GWFP_Fam_Ind and GWFP_Fam_Fam were run only once because CCLONES (real and simulated) had a limited number of individuals per family. : where \(\sigma_{P}^{2}\) is the phenotypic variance, \(\sigma_{s}^{2}\) is the variance in progeny means that is explained by the effect of the sire, \(n\) is the progeny group size, and \(\left( {\sigma_{P}^{2} - \sigma_{s}^{2} } \right)/n\) is the residual variance of the progeny means after accounting for the sire effect. After fitting the model described above for each trait, the GWFP and GEBV of family/individual j (gj) were obtained using the following expression: where Zij is the allele frequency/marker dosage of the ith marker on family/individual j, p is the total number of markers, and \(\hat{m}_{i}\) is the estimated effect of the ith SNP. Suppose we have a pedigree-based EBV, \(\hat{g}_{A}\), with reliability \(r_{A}^{2}\), and an EBV based on deviations of genomic relationships from pedigree relationships, \(\hat{g}_{D}\), with reliability \(r_{D}^{2}\), as in Dekkers et al. Using SIT, the reliability of the total GEBV of \(g_{G}\) follows from Eq. The relevant solution is: Equation (3b) accounts both for \(q^{2} < 1\) and for the reduction of residual variance because all markers are fitted simultaneously in GP. (4) in the previous paragraph ignores a potential reduction in residual variance due to the merger of pedigree and marker information. (7) is different from the SIT result of combining pedigree and genomic information derived by Dekkers et al. Parameter \(\theta_{{D_{G} }}\) can be solved for by entering \(r_{D}^{2}\) into Eq.
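The prediction step described above (marker dosages or family allele frequencies multiplied by estimated SNP effects and summed over markers) might look like the following. The expression is missing from the text, so this sketch assumes the usual linear form, g_j = sum over i of Z_ij times m_i; the function name and the toy numbers are illustrative, and the marker effects would in practice come from a fitted model such as Bayes B.

```python
def predict_genetic_values(Z, marker_effects):
    """Return one predicted value per row of Z.

    Z: list of rows, one per family (allele frequencies) or individual
       (marker dosages); each row has one entry per marker.
    marker_effects: estimated effect of each marker, in the same order.
    """
    p = len(marker_effects)
    predictions = []
    for row in Z:
        if len(row) != p:
            raise ValueError("every row of Z needs one entry per marker")
        predictions.append(sum(z * m for z, m in zip(row, marker_effects)))
    return predictions


if __name__ == "__main__":
    effects = [0.2, -0.1, 0.05]           # hypothetical estimated SNP effects
    individual = [0, 1, 2]                # dosages for one individual
    family = [0.5, 0.25, 1.0]             # allele frequencies for one family pool
    print(predict_genetic_values([individual, family], effects))
```

The same function serves both GEBV and GWFP; only the meaning of the entries in Z changes.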
Thus, in their study, family variance was accurately represented with six individuals per family in this autotetraploid species. The squared accuracy of GEBV can be understood as a proportion of the variance explained. For creating true breeding values for the single lines of the breeding and diversity set, we assumed varying frequencies of the positive allele for each locus, which differed between the breeding and the diversity set. In genomic prediction (GP), the accuracy of EBV can be increased by combining information sources, such as pedigree and marker information [1], or information from multiple genomic reference populations [2]. (6) for both \(\theta_{M,1}\) and \(\theta_{M,2}\) and simplifying the result yields a FI-based prediction of the reliability of GEBV based on the full reference population (see Appendix 13) that is equal to: Alternatively, we can derive \(r^{2}\) based on SIT. These training models were used and validated in the G3 generation using GEBV and GWFP, and models were assessed by calculating predictive ability and prediction accuracy. The minimum number of individuals per family was calculated by assessing allele frequency and phenotypic mean deviations using families with at least 15 individuals. Accounting for the effect of fitting all markers simultaneously in the SIT approach can be accommodated by including the effect of the other \(M_{e} - 1\) segments as an information source in the index, as illustrated in Appendix 10, and gives identical accuracy predictions as accounting for this effect in the FI approach. The remaining seeds from the selected families can be used later to test their merits in further replicated field trials.
All phenotypic and genotypic data utilized in this study have been previously published as a standard data set for development of genomic prediction methods (Resende et al. 2014). To explain these differences, first we show that existing expressions for the squared accuracy, or reliability, of GEBV [3,4,5] can be understood as a proportion of the variance explained (\(R^{2}\)), which simplifies subsequent derivations. The genotype of an animal, however, cannot be influenced by these environmental factors. Fitting all markers simultaneously reduces the residual variance and, therefore, increases the reliability (Appendix S1 in [4]). [2], i.e., to connect expressions for the reliability of GEBV that are based on FI to the corresponding expressions based on SIT. (3a), prior to assuming that \(h^{2}(q^{2}-r^{2})/M_{e}\ll 1\). The higher accuracy in the GWFP method was expected since the additive genetic variance explored in this method is just 50% of the additive genetic variance compared with the GEBV. For the 10-fold cross-validation, data were randomly partitioned into ten subsets, and training set populations were created with 90% of the families/individuals, whereas the remaining 10% of families/individuals were used as the validation set. When the families in the validation set had phenotypic values outside the range of phenotypes present in the training set (bottom and top classes), lower and much more variable predictive abilities were obtained.
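The 10-fold scheme described above, together with predictive ability measured as a Pearson correlation, can be sketched as follows. This is a generic illustration rather than the authors' pipeline: the function names, the fixed seed, and the use of family identifiers as the unit being partitioned are assumptions.

```python
import math
import random


def k_fold_splits(ids, k=10, seed=0):
    """Yield (training, validation) lists; each id is validated exactly once."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal subsets
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


def pearson(x, y):
    """Pearson correlation, e.g. between phenotypes and predicted values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    families = [f"fam{i}" for i in range(56)]  # 56 families, as in the text
    for training, validation in k_fold_splits(families):
        # A model would be fitted on `training` and evaluated on `validation`.
        print(len(training), len(validation))
```

With 56 families, each round trains on roughly 90% of them and validates on the rest, and the per-fold correlations would then be averaged.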
Although the full-sib family average explores only half of the additive genetic variance, the error variance is reduced by the larger number of observations due to progeny replication, when compared with single observations (Hallauer et al. This example will also illustrate that the difference between accuracy predictions based on the SIT approach used in Dekkers et al. With n equal to 6 individuals the Ne is 1.71, which is 86% of the maximum 2. The genomic prediction models were developed by using the G2 CCLONES_sim population as the training set. All these plant-specific characteristics are key factors affecting predictive ability in genomic prediction due to their influence on breeding methods, effective population size, population structure, and linkage disequilibrium (Lin et al. as a methodology to use genome-wide dense marker information to estimate genetic values for selection of breeding populations. The main difference between GS and previous approaches, such as marker-assisted selection (MAS), is that in MAS a requirement is to identify quantitative trait loci (QTL) first by linkage . (3a) represents \(r_{M}^{2}\). Genomic relationship matrices are used to obtain genomic inbreeding coefficients. Comparing an individual animal with the benchmark of a herd or particular breed and expressing the difference, either positively or negatively, gives you an animal's EBV or estimated breeding value, which is then expressed as a + or - from the starting point for an average animal of zero.
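The within-family effective size quoted above (Ne = 2n/(n+1), with a limit of 2) is easy to verify. The short sketch below uses a hypothetical function name and simply reproduces the arithmetic for n = 6 given in the text.

```python
def full_sib_effective_size(n: int) -> float:
    """Effective population size within a full-sib family of n individuals."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 2 * n / (n + 1)


if __name__ == "__main__":
    maximum = 2.0  # limit of 2n/(n+1) as n grows
    for n in (1, 6, 15, 100):
        ne = full_sib_effective_size(n)
        print(n, round(ne, 2), f"{ne / maximum:.0%} of maximum")
```

Sampling 15 sibs instead of 6 moves Ne from about 1.71 to about 1.88, which shows how quickly the gain from extra individuals per family levels off.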