Using AI to find biomarkers of calorie restriction

We know that calorie restriction (CR) substantially extends the lifespan of many model organisms, and recent high-throughput mRNA studies have started to catalog the extensive genetic changes that occur in response to CR. But which of these (very many) differentially regulated genes are the crucial ones that best distinguish the calorie-restricted state from the normal one?

Traditional methods of statistical analysis are based on the simple principle that if an individual gene is substantially differentially expressed in CR animals vs. normal animals, then that gene should be a good biomarker for CR. These methods are limited because they consider only single genes acting alone – but biological reality is usually more complicated than that. It is quite possible that the CR state can be better characterized by some more subtle combined pattern of expression of a group of genes.
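
To make the single-gene limitation concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the two genes ("geneA", "geneB"), the expression values, and the rules are invented and do not come from the paper or any real dataset. In this made-up data, neither gene differs much on average between the groups, yet the two genes taken together separate the groups cleanly.

```python
from statistics import mean

# Hypothetical, hand-made expression values (arbitrary units) for two
# invented genes. In this toy data, CR samples have both genes on the same
# side of zero; normal samples have them on opposite sides.
samples = [
    # (geneA, geneB, label)
    (2.0, 2.1, "CR"),
    (-2.0, -1.9, "CR"),
    (1.9, 2.0, "CR"),
    (-2.1, -2.0, "CR"),
    (2.0, -2.0, "normal"),
    (-2.0, 2.0, "normal"),
    (2.1, -1.9, "normal"),
    (-1.9, 2.1, "normal"),
]


def mean_difference(gene_index):
    """Single-gene view: difference in mean expression, CR minus normal."""
    cr = [s[gene_index] for s in samples if s[2] == "CR"]
    normal = [s[gene_index] for s in samples if s[2] == "normal"]
    return mean(cr) - mean(normal)


def single_gene_rule(gene_a, gene_b):
    """Classify using geneA alone (geneB is ignored)."""
    return "CR" if gene_a > 0 else "normal"


def combined_rule(gene_a, gene_b):
    """Classify using the joint pattern: do the two genes share a sign?"""
    return "CR" if gene_a * gene_b > 0 else "normal"


def accuracy(rule):
    """Fraction of toy samples the rule labels correctly."""
    hits = sum(rule(a, b) == label for a, b, label in samples)
    return hits / len(samples)


if __name__ == "__main__":
    print("geneA mean difference:", mean_difference(0))
    print("geneB mean difference:", mean_difference(1))
    print("single-gene accuracy:", accuracy(single_gene_rule))
    print("combined-rule accuracy:", accuracy(combined_rule))
```

On this toy data the single-gene rule gets only half the samples right, while the combined rule gets all of them. Real expression data is of course far messier, but this is the kind of pattern a gene-by-gene test would miss.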

To discover these complex multigene relationships, we need more sophisticated tools. In the latest issue of Rejuvenation Research, Goertzel et al. apply artificial intelligence to this problem:

Identifying the Genes and Genetic Interrelationships Underlying the Impact of Calorie Restriction on Maximum Lifespan: An Artificial Intelligence-Based Approach

Novel artificial intelligence methodologies were applied to analyze gene expression microarray data gathered from mice under a calorie restriction (CR) regimen. The data were gathered from three previously published mouse studies; these datasets were merged together into a single composite dataset for the purpose of conducting a broader-based analysis. The result was a list of genes that are important for the impact of CR on lifespan, not necessarily in terms of their individual actions but in terms of their interactions with other genes. Furthermore, a map of gene interrelationships was provided, suggesting which intergene interactions are most important for the effect of CR on life extension. In particular our analysis showed that the genes Mrpl12, Uqcrh, and Snip1 play central roles regarding the effects of CR on life extension, interacting with many other genes (which the analysis enumerates) in carrying out their roles. This is the first time that the genes Snip1 and Mrpl12 have been identified in the context of aging. In a follow-up analysis aimed at validating these results, the analytic process was rerun with a fourth dataset included, yielding largely comparable results. Broadly, the biological interpretation of these analytical results suggests that the effects of CR on life extension are due to multiple factors, including factors identified in prior theories of aging, such as the hormesis, development, cellular, and free radical theories.

So what are Snip1 and Mrpl12 up to? We can hope that some enterprising biologists will follow up this work with experimental validation…

A couple of features of this paper that I found particularly interesting:

  • They pooled data from several different microarray studies, making their job a lot harder and their results all the more impressive. The various studies they combined used different strains of mice, and measured gene expression at different ages and in different tissues (liver, skeletal muscle, and brain). Dealing with noisy data like this is difficult; it’s notable that they were still able to classify mouse samples as CR or normal with high accuracy.
    In fact, I would like to see them try their approach on an even noisier problem: finding CR biomarkers shared by different species. So far, traditional statistical analysis has identified few if any genes that are significantly differentially expressed in several species of calorie-restricted animals; it would be interesting to see if the method of Goertzel et al. could make any sense of a multi-species pool of expression data.
  • They used an unusual algorithm to classify samples as CR or normal (for the computer scientists: they used genetic programming to learn an ensemble of classification rules). Basically, their algorithm votes on whether a sample is CR or normal based on the outputs of several short classification rules (short because each rule looks only at the expression levels of a few genes).
    An advantage of this type of approach (over, say, a ‘black box’ neural network) is that the classification rules are easy to interpret biologically: you can search through them to identify important genes and genetic relationships. A gene is considered important for CR if it appears in many different rules, and two (or more) genes are considered related if they appear together in many rules.
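
For readers who want to see the voting-and-counting idea in code, here is a rough sketch. It is not the authors' implementation. The rules below are hand-written placeholders, not rules learned by genetic programming. The gene names ("g1" to "g4") and thresholds are invented, and simple majority voting is my assumption about how the votes are combined.

```python
from collections import Counter
from itertools import combinations

# Each hypothetical rule is a pair: the set of genes it reads, and a
# predicate over a sample (a dict mapping gene name -> expression level)
# that returns True when the rule votes "CR".
RULES = [
    ({"g1", "g2"}, lambda s: s["g1"] > 1.0 and s["g2"] < 0.5),
    ({"g1", "g3"}, lambda s: s["g1"] > s["g3"]),
    ({"g2", "g4"}, lambda s: s["g2"] + s["g4"] < 1.0),
]


def classify(sample):
    """Label a sample by majority vote over the rules (an assumed scheme)."""
    votes = sum(1 for _, predicate in RULES if predicate(sample))
    return "CR" if votes * 2 > len(RULES) else "normal"


def gene_importance():
    """Count how many rules each gene appears in."""
    return Counter(gene for genes, _ in RULES for gene in genes)


def gene_pairs():
    """Count how many rules each pair of genes appears in together."""
    return Counter(
        pair
        for genes, _ in RULES
        for pair in combinations(sorted(genes), 2)
    )


if __name__ == "__main__":
    sample = {"g1": 2.0, "g2": 0.2, "g3": 1.0, "g4": 0.3}
    print(classify(sample))
    print(gene_importance().most_common())
    print(gene_pairs().most_common())
```

Because each rule touches only a few genes, the two counters are all you need to pull out candidate "important" genes and candidate gene-gene relationships, which is the interpretability advantage described above.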


  1. Very nice post – thanks for dissecting the paper for us. I especially like the paper’s approach, which you explain so well, of examining combined gene patterns rather than a single gene acting alone. (Though it was somewhat ironic that the authors go on to pick out three particular genes that ‘play central roles regarding the effects of CR on life extension’. But they do go on to say these interact with many other genes.)
    I will have to read more about the genetic programming they used to classify the two groups (it’s been a long time since I read about this general idea).
