# The DNA methylation profile in beef cattle is influenced by additive genetics and age

### Demographic resources and data collection

All animal procedures performed in this study were approved by the University of Nebraska-Lincoln Animal Care and Use Committee. All methods were performed in accordance with current guidelines and regulations. The methods are reported in the manuscript in accordance with the recommendations of the ARRIVE guidelines.

Blood collected from 136 cows was centrifuged at 2500 × g for 10 min at room temperature. The buffy coat was collected and stored at -80°C. All samples were collected from October 2 to November 27, 2018 at the Eastern Nebraska Research and Extension Center at the University of Nebraska-Lincoln. Cows were chosen to provide age variation and ranged from 217 to 3192 days (0.6 to 8.7 years) in age at the time of sample collection. The animals came from a mixed population of purebred Angus and composites of different proportions of Angus, Simmental and Red Angus.

All animals were genotyped with the Illumina BovineSNP50 (~50 K SNP) medium density BeadChip (Illumina, San Diego, CA, USA). Genotypic screening included deletion of non-autosomal SNPs, SNPs with minor allele frequency P-value > 10−5.

### DNA methylation

DNA was extracted from buffy coats using the DNeasy Blood and Tissue Kit (Qiagen, Cat No. 69506) and subsequently bisulfite converted using the EZ DNA Methylation Kit (ZymoResearch, Irvine, CA, USA). mDNA data was obtained from bisulfite-treated samples using the Mammalian Array (HorvathMammalMethylChip40)31. The mDNA level for each site was calculated as methylation β value (β value = methylated allele intensity/unmethylated allele intensity + methylated allele intensity + 100). The addition of 100 was used to stabilize β values ​​when the intensities of methylated and unmethylated alleles were low32. The SeSAme pipeline33 was used to generate normalized β values ​​and for quality control. The β-value exhibits severe heteroscedasticity outside the intermediate methylation range; thus, a logit transformation of β values ​​(M values) was used to approximate homoscedasticity. M values ​​of 0 correspond to 50% methylation, and positive and negative values ​​correspond to a level of methylation above and below 50%, respectively. M values ​​were used to quantify mDNA level by region (i.e. promoter, 5′ and 3′ UTR, exonic, intronic, and intergenic) and CpG island-bound location (i.e. i.e. inside or outside). DNA methylation charge was calculated as the sum of all mDNA levels (M value).

The mDNA status of each site was determined by the distribution of β methylation values. For example, β values ​​below, within, and above 2 standard deviations were classified as unmethylated (0), intermediately methylated (1), and methylated (2), respectively. mDNA status was used for prediction purposes.

### statistical analyzes

#### Genetic parameters for mtDNA level and mtDNA load

Genetic parameters (i.e., genetic and additive residual variances) for mDNA level (M values) and mDNA load were estimated using the following animal model fitted in a Bayesian framework genomics best linear unbiased prediction (GBLUP).

$${varvec{y}}={varvec{X}}{varvec{b}}+{varvec{Z}}{varvec{u}}+{varvec{e}}$$

where ({varvec{y}}) corresponds to the vector of phenotypes (mDNA level or mDNA load); ({varvec{X}}) corresponds to the design matrix linking fixed effects to phenotypes; ({varvec{b}}) is the fixed effects vector including the y-intercept, linear and quadratic (mDNA only) covariates of age, and proportion covariates of each race; ({varvec{Z}}) corresponds to the incidence matrix linking the random animal effect to the phenotypes; ({varvec{u}}) corresponds to the vector of random animal effects, where ({varvec{u}}) ~N(0, g ({sigma}_{u}^{2})), where g corresponds to the genomic relationship matrix constructed according to VanRaden’s first method34 and ({sigma}_{u}^{2}) corresponds to the additive genetic variance; and ({varvec{e}}) is the vector of random residual effects associated with the phenotype, where ({varvec{e}}) ~N(0, I ({sigma}_{e}^{2})), where I is the identity matrix and ({sigma}_{e}^{2}) corresponds to the residual variance. The H2 was obtained as the ratio of additive genetic variance divided by phenotypic variance (additive genetic variance + residual variance).

Gibbs sampling was used to sample a posterior parameter distribution with a string length of 20,000 iterations, a burn-in of 2,000 samples, and a thinning interval of 100. Analyzes were performed at the help from BGLR package35 in R software.

#### Effect of age on mtDNA load and age prediction

The effect of age on mDNA load was estimated by fitting an exponential regression of mDNA load on animal age (years). mDNA status was included as a variable to predict animal age using five Bayesian regression models: BRR29BayesA29BayesB29BayesCπ36and Bayesian LASSO (BLASSO)37as following:

$${varvec{y}}={varvec{X}}{varvec{b}}+sum_{i=1}^{k}{{varvec{m}}}_{{varvec {i}}}{boldsymbol{alpha }}_{{varvec{i}}}{{varvec{updelta}}}_{{varvec{i}}}+{varvec{e} }$$

where ({varvec{X}}) and ({varvec{e}}) have been previously described; ({varvec{y}}) corresponds to the vector of ages in years; ({varvec{b}}) is the vector of fixed effects, including the intercept and linear covariates for each race; ({{varvec{m}}}_{{varvec{i}}}) is the mtDNA status vector for the site I (coded as 0, 1 and 2); ({boldsymbol{alpha }}_{{varvec{i}}}) i is the effect of DNAm status for the site I for each of k sites; ({{varvec{updelta}}}_{{varvec{i}}}) is an indicator if the DNAm status site I was included (({{varvec{updelta}}}_{{varvec{i}}}) = 1) or excluded (({{varvec{updelta}}}_{{varvec{i}}}) = 0) of the model for a given iteration of Gibbs sampling (BayesRR and BayesA, ({{varvec{updelta}}}_{{varvec{i}}}) = 1). In BayesRR and BayesCπ, the effect of DNAm status is assumed to follow a normal distribution. In BayesA and BayesB, the effect of DNAm status is assumed to follow a you-distribution with site-specific variances. In Bayes Lasso, the effect of mtDNA status is assumed to follow a double exponential distribution.

Gibbs sampling was used to sample a posterior parameter distribution with a string length of 20,000 iterations, a burn-in of 2,000 samples, and a thinning interval of 100. Bootstrapping (not= 400) was used to evaluate the performance of models with 102 and 34 individuals as training and validation populations, respectively. The performance of the models was evaluated based on the correlation between the actual and predicted age, the root mean square error and the slope of the regression of the predicted age on the actual age. Analyzes were performed in the BGLR package35 in R software.