Bioinformatics


Two recent computational studies show that expression relationships between genes change with age – for example, some genes have expression levels that are highly correlated in early adulthood but not in old age. Both studies propose new methods for identifying gene groups with this behaviour, and the second also makes a compelling case that many related genes lose coexpression with age. Crucially, the correlation between a pair of genes may change with age even when the average expression levels of both genes do not – so these new coexpression methods are complementary to traditional differential expression analyses of microarray data.

Gillis et al. developed a new framework for identifying pairs of genes differentially coexpressed with age that is based on Haar wavelets, and tested it on a large set of human expression data mined from the handy GEMMA database. Unlike other methods that can interpret data coming from only two groups (e.g. young mice vs. old), the new wavelet method is designed to handle multiple ordered groups – such as animals of many different ages. The authors don’t discuss the biological implications of their results in any detail, instead promising these will be explored in a later paper.

Southworth et al. showed that coexpression patterns of groups of related genes become less coherent as animals age. Using several different methods for grouping genes together (e.g. assigning genes to the same group if they share a function, or if they are targets of the same transcription factor), they calculated intra-group correlation in 16- and 24-month-old mice using data from the AGEMAP study. They identified a surprisingly large number of groups with lower correlation in old mice. One of these is the targets of NF-κB – a transcription factor that, when knocked down, can reverse skin aging. Only a few groups (including one enriched for DNA damage genes) showed higher correlation in old mice. Also, the authors found that genes showing decreases in correlation aren’t randomly located on the chromosome – instead, they form several clusters.
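To make the idea of intra-group correlation concrete, here is a minimal sketch on simulated data. It is not the authors' pipeline: the function names, group sizes and noise levels are all assumptions chosen for illustration. Each simulated gene follows one shared regulatory signal plus its own independent noise, and the "old" group simply gets more noise. Average expression is the same in both groups, so a differential expression analysis would see nothing, while the mean pairwise correlation drops.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(expr):
    """expr maps gene name -> list of expression values (one per animal)."""
    pairs = list(combinations(sorted(expr), 2))
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)


def simulate_group(n_animals, n_genes, noise_sd, rng):
    """Hypothetical gene group: shared signal plus independent per-gene noise."""
    signal = [rng.gauss(0, 1) for _ in range(n_animals)]
    return {
        f"gene{g}": [s + rng.gauss(0, noise_sd) for s in signal]
        for g in range(n_genes)
    }


rng = random.Random(0)
young = simulate_group(100, 5, 0.5, rng)  # tightly co-regulated
old = simulate_group(100, 5, 2.0, rng)    # same mean level, noisier

r_young = mean_pairwise_correlation(young)
r_old = mean_pairwise_correlation(old)
print(f"mean intra-group correlation, young: {r_young:.2f}")
print(f"mean intra-group correlation, old:   {r_old:.2f}")
```

A real analysis would also need a significance test for the difference between the two group correlations (for example by permuting age labels), which this sketch leaves out.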

What are the causes and consequences of these changes in gene group correlation? Previous single-cell studies have shown that transcriptional noise, or cell-to-cell variation in the expression levels of individual genes, increases with age. Clearly transcriptional noise is going to affect coexpression to some degree: any increase in a gene’s noise level will automatically reduce its calculated coexpression with other genes. But changes in coexpression can also occur without any corresponding change in noise. These changes may reflect cellular processes that are active or suppressed at different times of life, and many or all such changes (such as a ramped-up DNA damage response in old age) may be adaptive. Further analyses are needed to tease out which age-related coexpression differences result from noise, and which ones are telling us something new.
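The link between noise and coexpression can be written down under one simplifying assumption (mine, not the papers'): the measured expression of a gene is its "true" level X plus a noise term ε that is independent of both X and the partner gene Y. The covariance with Y is then unchanged while the variance grows, so the correlation shrinks by a fixed factor:

```latex
\[
\operatorname{corr}(X + \varepsilon,\, Y)
  = \frac{\operatorname{cov}(X, Y)}
         {\sqrt{\left(\sigma_X^2 + \sigma_\varepsilon^2\right)\sigma_Y^2}}
  = \operatorname{corr}(X, Y)\,
    \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + \sigma_\varepsilon^2}}
\]
```

The factor under the square root is always below 1 when the noise variance is positive, which is why more noise means lower measured coexpression. It also suggests how one might separate the two effects: a drop in correlation larger than this factor predicts cannot be explained by independent noise alone.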

Gillis, J., & Pavlidis, P. (2009). A methodology for the analysis of differential coexpression across the human lifespan. BMC Bioinformatics, 10 (1). DOI: 10.1186/1471-2105-10-306

Southworth, L., Owen, A., & Kim, S. (2009). Aging Mice Show a Decreasing Correlation of Gene Expression within Genetic Modules. PLoS Genetics, 5 (12). DOI: 10.1371/journal.pgen.1000776


An overwhelming number of natural products and nutraceuticals vie for our attention. Each is associated with a variety of claims of health benefits, often without any reference to the experimental evidence (if any) supporting those claims – or with reference only to dubious, poorly controlled studies in backwater journals. I don’t spend a lot of time following these compounds, but occasionally one gets mentioned often enough that it breaks through into the literature (e.g., resveratrol, green tea, carnitine/lipoate, or other supplements) and I discuss it here.

If only because of the size of the heap, I nonetheless still suspect that there’s a pony in there somewhere; I’ve often wished I had the time to do a comprehensive literature review of my own, so that I could identify the compounds whose associated claims are supported by the best evidence. Now it looks like I can start wishing for something else, because someone did it for me.

At the (amazing) blog Information is Beautiful, David McCandless and Andy Perkins have assembled a “generative data-visualisation of all the scientific evidence for popular health supplements”. In David’s words:

I’m a bit of a health nut. Keeping fit. Streamlining my diet. I plan to live to the age of 150 in fact. But I get frustrated by constant, conflicting reports and studies about health supplements.

Is Vitamin C worth taking or not? Does Echinacea kill colds? Am I missing out not drinking litres of Goji juice, wheatgrass extract and flaxseed oil every day?

In an effort to give myself a quick reference guide, I dove into the scientific evidence and created a visualization for my book. And then worked with the awesome Andy Perkins on a further interactive, generative “living image”.

The image itself is dynamic with respect to both user input about what information is desired, and introduction of new data – it is based on the information in a spreadsheet, which can be updated (new compounds, or information about compounds already mentioned), altering the visual rendering of the dynamic image. You can play with the image here; I’ve attached a still snapshot below.

The rendering is imperfect (as also discussed elsewhere): More reliable claims are near the top, and more dubious claims are near the bottom, but this positioning is the result of a single variable, “evidence,” which may be based largely on a citation count. This is a problem because not all citations that mention a compound should be weighted equally; furthermore, it’s not clear how conflicting claims end up getting counted. The abstraction of a complex body of data into a single number unquestionably involves some judgment calls that could be made differently – that’s not necessarily a lethal criticism, but the process should be as transparent as possible.

On a visual level, the image is attractive, but color is mostly a wasted variable: position along the color spectrum is synonymous with height — except in the case of orange, which indicates a compound with “low evidence, promising results”. The orange compounds are still assigned an evidentiary weight, according to an algorithm I can’t fathom; this is particularly confusing at both ends: beta-glucan is in the “high evidence” position, which seems to contradict the label’s definition (“low evidence”); whereas noni and astragalus are in the “no evidence” position, raising questions about how there could be “promising results”.

The strength of the project, however, is that it can evolve; the creators are already enthusiastically updating it. So far the changes (as detailed in this log) are content-oriented; one hopes that the methodology of generative data visualization will also enjoy improvements as time goes by.

(For another example of user-driven visualization, see the Timeline of Discoveries in the Science of aging, which we discussed here previously (1 2). That piece hasn’t been updated in a while – perhaps it could use some new contributors.)


Individuals of the same species age at different rates, and these differences should be reflected in their gene expression profiles. However, most microarray studies of aging are designed only to capture the gene changes that occur with age in a “typical” individual and (with rare exceptions) ignore individual variability – all animals of a given age are lumped together into a group, and different age groups are compared.

To study how gene changes are related to individual longevity, we need another type of data in addition to gene expression profiles: the survival time of individual animals after their gene expression is measured. With this information, we could determine which transcriptional responses are associated with a longer lifespan, and in principle even develop a personalized medicine approach to aging: we could train a machine learning algorithm to peek at the expression levels of a handful of crucial genes and predict your physiological age – and the number of healthy years you have left.

Previous microarray studies of aging animals didn’t include survival times because the animals were sacrificed at the time of sample collection (in order to get enough RNA), and studies of aging humans haven’t included survival times because we live too long.

Recently, some human survival data – together with matching gene expression data from lymphoblastoid cell lines – have become available from a long-range study that began in the early 1980s. In the first aging study to take advantage of this resource, Kerber et al. mine the data to identify gene changes associated with longevity:

Gene Expression Profiles Associated with Aging and Mortality in Humans
We investigated the hypothesis that gene expression profiles in cultured cell lines from adults, aged 57-97 years, contain information about the biological age and potential longevity of the donors. We studied 104 unrelated grandparents from 31 Utah CEU (Centre d’Etude du Polymorphisme Humain – Utah) families, for whom lymphoblastoid cell lines were established in the 1980s. Combining publicly available gene expression data from these cell lines, and survival data from the Utah Population Database, we tested the relationship between expression of 2,151 always-expressed genes, age, and survival of the donors. Approximately 16% of 2,151 expression levels were associated with donor age: 10% decreased in expression with age, and 6% increased with age. CDC42 and CORO1A exhibited strong associations both with age at draw and survival after draw, (multiple comparisons-adjusted Monte Carlo p-value < 0.05). In general, gene expressions that increased with age were associated with increased mortality. Gene expressions that decreased with age were generally associated with reduced mortality. A multivariate estimate of biological age modeled from expression data was dominated by CDC42 expression, and was a significant predictor of survival after blood draw. A multivariate model of survival as a function of gene expression was dominated by CORO1A expression. This model accounted for approximately 23% of the variation in survival among the CEU grandparents. Some expression levels were negligibly associated with age in this cross-sectional dataset, but strongly associated with inter-individual differences in survival. These observations may lead to new insights regarding the genetic contribution to exceptional longevity.

The novel aspect of this study was the integration of gene expression and survival data to identify genes associated with longevity; the authors also identified genes associated with chronological age using both univariate and multivariate models.

A brief summary of some of their major findings:

  • A six-gene model accounts for 23% of the variation in survival time
    The authors trained a penalized regression model to predict survival time on the basis of the expression levels of roughly 2000 genes. After training, only six genes had non-zero model coefficients: CORO1A, FXR2, CBX5, PIK3CA, AKAP2, and CUL3. The model was dominated by the expression levels of CORO1A (which is negatively associated with mortality) and FXR2 (which is positively associated with mortality). CORO1A has been implicated in mitochondrial apoptosis, and FXR2 is involved in Fragile X syndrome; the exact role of these two genes in aging has yet to be determined.
  • Genes associated with age are not necessarily associated with survival (and vice versa)
    The authors used linear regression to identify individual gene changes that were associated with chronological age, and a proportional hazards model to identify changes associated with survival. Among the top 10 genes identified by each test, only one gene appears on both lists (CORO1A) – i.e., genes that are strongly associated with age are not necessarily strongly associated with survival. This is an important point – it means that in order to identify gene expression biomarkers of physiological age and longevity, we need more microarray studies that report survival data.
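For readers wondering how a model trained on roughly 2000 genes can end up using only six: that is what an L1 ("lasso") penalty does. Below is a minimal sketch using L1-penalized linear regression fitted by coordinate descent. This is a stand-in of my own and not necessarily the exact penalized model used by Kerber et al.; the data are simulated, and the penalty strength `lam`, the sample size and the gene indices are illustrative assumptions.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date as b changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of gene j with the residual, ignoring gene j's own fit
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Simulated data (an assumption): 80 donors, 20 genes, and "survival" that
# depends only on genes 0 and 1.
rng = random.Random(1)
n_donors, n_genes = 80, 20
X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_donors)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]

beta = lasso(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("genes with non-zero coefficients:", selected)
```

The penalty trades a little accuracy for sparsity: coefficients of genes that add no predictive signal are set to exactly zero rather than merely shrunk, which is why the surviving gene list is short enough to interpret.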

Looking at expression data alone, it is difficult to tell which of the very many age-related gene changes are good and which are bad, i.e., whether a given gene change causes a problem associated with aging or is part of some beneficial damage-control response – an issue which we previously discussed in the context of gender differences in brain aging. With survival data, we can now ask a specific question of each gene: is its age-related response associated with increased or with reduced mortality? For nine of the ten genes most strongly related to survival in this study, relative overexpression was associated with reduced mortality. This strongly suggests that those genes (including CORO1A) are doing something good, i.e. that they are involved in some sort of defense or repair mechanisms.

The expression dataset used by the authors of this study is publicly available through GEO: GSE1485, GSE2552.

Kerber, R., O’Brien, E., & Cawthon, R. (2009). Gene expression profiles associated with aging and mortality in humans. Aging Cell, 8 (3), 239-250. DOI: 10.1111/j.1474-9726.2009.00467.x

In recent years, dozens of large-scale gene expression studies (many of them available through the Gene Aging Nexus) have tracked the transcriptional changes that occur with aging. However, these studies usually identify few genes showing statistically significant changes; worse, there is poor overlap across studies – i.e. genes found to be very significant in one study are often not significant in others.

It’s true that these problems are common to microarray studies of other phenotypes – experimental noise and biological variability make this type of data hard to interpret – but for aging the difficulties seem especially pronounced. Aging is complex and global: it happens in every tissue (and possibly differently in every tissue), at both the cellular and organismal levels, and involves many independent biochemical pathways. On top of that, rates of aging can vary substantially for different individuals in the same species, while within the same individual, transcriptional noise increases with age.

So how can we identify a set of genes that are consistently age-associated? In the latest issue of Bioinformatics, Magalhães et al. (the developers of HAGR) develop a statistical methodology for identifying trends of age-regulation across studies and apply it to a collection of 27 different mammalian microarray studies of aging:

Meta-analysis of age-related gene expression profiles identifies common signatures of aging

Motivation: Numerous microarray studies of aging have been conducted, yet given the noisy nature of gene expression changes with age, elucidating the transcriptional features of aging and how these relate to physiological, biochemical and pathological changes remains a critical problem.
Results: We performed a meta-analysis of age-related gene expression profiles using 27 datasets from mice, rats and humans. Our results reveal several common signatures of aging, including 56 genes consistently overexpressed with age, the most significant of which was APOD, and 17 genes underexpressed with age. We characterized the biological processes associated with these signatures and found that age-related gene expression changes most notably involve an overexpression of inflammation and immune response genes and of genes associated with the lysosome. An underexpression of collagen genes and of genes associated with energy metabolism, particularly mitochondrial genes, as well as alterations in the expression of genes related to apoptosis, cell cycle and cellular senescence biomarkers, were also observed. By employing a new method that emphasizes sensitivity, our work further reveals previously unknown transcriptional changes with age in many genes, processes and functions. We suggest these molecular signatures reflect a combination of degenerative processes but also transcriptional responses to the process of aging. Overall, our results help to understand how transcriptional changes relate to the process of aging and could serve as targets for future studies.
Availability: http://genomics.senescence.info/uarrays/signatures.html

To summarize their basic method: the authors reanalyzed data in each of the 27 microarray studies separately to produce a list of differentially expressed genes for each one. Then, they counted up the number of times a gene was differentially expressed with age in the group of studies, and determined whether that number was significantly larger than what would be expected by chance.
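The counting step can be sketched in a few lines. This is a simplified illustration under assumptions of my own: every gene has the same chance probability `p_chance` of being called differentially expressed in any one study, studies are independent, and no multiple-testing correction is applied. The gene names and hit counts below are made up.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def recurrent_genes(study_hits, p_chance, alpha):
    """study_hits: one set of differentially expressed genes per study.

    Returns {gene: (number of studies, p-value)} for genes that appear in
    more studies than expected by chance at significance level alpha.
    """
    n_studies = len(study_hits)
    counts = {}
    for hits in study_hits:
        for gene in hits:
            counts[gene] = counts.get(gene, 0) + 1
    significant = {}
    for gene, k in counts.items():
        pval = binomial_tail(k, n_studies, p_chance)
        if pval < alpha:
            significant[gene] = (k, pval)
    return significant


# Toy input: 27 studies; hypothetical geneA is called in 8, geneB in only 2.
studies = [set() for _ in range(27)]
for s in studies[:8]:
    s.add("geneA")
for s in studies[:2]:
    s.add("geneB")

result = recurrent_genes(studies, p_chance=0.05, alpha=0.001)
for gene, (k, pval) in result.items():
    print(f"{gene}: {k} of 27 studies, p = {pval:.2e}")
```

With a 5% chance rate, two hits in 27 studies is unremarkable, but eight is very unlikely, so only geneA passes. With thousands of genes tested, a real analysis would need a much stricter threshold or an explicit correction for multiple comparisons.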

Of the 73 genes they found to be consistently age-regulated, 13 have been previously validated (e.g. by qRT-PCR) – a corroboration that strongly supports the new method. The other 60 genes have yet to be investigated.

A couple of points worth noting:

  • This is the first rigorous, large-scale integration of mammalian aging microarray data
    Mining collections of dozens or even hundreds of gene expression datasets to identify global trends is becoming increasingly popular, especially in cancer research (cancer seems to be the research area that sees the most sophisticated applications of bioinformatics). But for aging – an area where the data are noisier, and there is perhaps an even stronger need for integrative computational approaches – few studies have compared more than a handful of expression datasets at once, and none in mammals. Several studies have compared multiple mammalian microarrays on a smaller scale (e.g. Goertzel et al. investigated the effect of calorie restriction on mouse aging; as part of larger studies, Zahn et al. and Adler et al. compared aging in humans and mice).
  • Their analysis is designed to pick out genes that participate in a general aging program
    The microarray studies used in this meta-analysis span a diverse range of tissues, and even multiple species (human, mouse, and rat), so genes emerge as significant here only if they demonstrate a strong age-associated profile across a range of very different conditions. While this approach will likely fail to identify those genes that are age-regulated only in a single tissue, the advantage is that those genes that do come out of this analysis are likely to be the really interesting ones – components of a common aging program that operates in multiple tissues.

de Magalhaes, J., Curado, J., & Church, G. (2009). Meta-analysis of age-related gene expression profiles identifies common signatures of aging. Bioinformatics, 25 (7), 875-881. DOI: 10.1093/bioinformatics/btp073

The genome era and the advent of high-throughput technologies have brought about a huge increase in the amount of data available to biologists: each genome contains tens of thousands of genes, whose products can potentially interact with each other in an astronomical number of ways. This quantitative change has created a need for a qualitative change in the way we perform analyses: the human brain is not very good at understanding thousands of things at once, let alone millions or billions, so we must find new ways to extract comprehensible patterns from torrents of data.

Many of the techniques being developed to analyze large biological networks fall under the umbrella of systems biology. Some of the newest tools have been used to guide genetic perturbation studies in yeast, resulting in the discovery of novel lifespan control genes. What can such network analysis tell us about human aging?

To address this question, Bell et al. compiled a list of gerontogenes (i.e., genes whose wildtype function is associated with accelerated aging, and whose loss-of-function mutants are associated with longer life) from model systems, and studied the connectivity of these genes within the context of interaction data obtained from a large-scale (though not comprehensive) two-hybrid screen of human proteins.

A Human Protein Interaction Network Shows Conservation of Aging Processes between Human and Invertebrate Species
We have mapped a protein interaction network of human homologs of proteins that modify longevity in invertebrate species. This network is derived from a proteome-scale human protein interaction Core Network generated through unbiased high-throughput yeast two-hybrid searches. The longevity network is composed of 175 human homologs of proteins known to confer increased longevity through loss of function in yeast, nematode, or fly, and 2,163 additional human proteins that interact with these homologs. Overall, the network consists of 3,271 binary interactions among 2,338 unique proteins. A comparison of the average node degree of the human longevity homologs with random sets of proteins in the Core Network indicates that human homologs of longevity proteins are highly connected hubs with a mean node degree of 18.8 partners. Shortest path length analysis shows that proteins in this network are significantly more connected than would be expected by chance. To examine the relationship of this network to human aging phenotypes, we compared the genes encoding longevity network proteins to genes known to be changed transcriptionally during aging in human muscle. In the case of both the longevity protein homologs and their interactors, we observed enrichments for differentially expressed genes in the network. To determine whether homologs of human longevity interacting proteins can modulate life span in invertebrates, homologs of 18 human FRAP1 interacting proteins showing significant changes in human aging muscle were tested for effects on nematode life span using RNAi. Of 18 genes tested, 33% extended life span when knocked-down in Caenorhabditis elegans. These observations indicate that a broad class of longevity genes identified in invertebrate models of aging have relevance to human aging. They also indicate that the longevity protein interaction network presented here is enriched for novel conserved longevity proteins.

The authors’ focus on genes studied in model organisms is well motivated; genes that control aging in one species are more likely than one would expect from chance to affect aging in another species, even if those species are as diverged as yeast and worms.

The findings: compared to the genome as a whole, longevity genes tend to be more highly connected, often acting as “hubs” within the network; furthermore, these genes are more connected to one another than the average gene, forming a “longevity network” that stands out against the web of all interactions.
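The "hub" claim rests on comparing the average node degree of the longevity homologs with that of random protein sets of the same size. Here is a minimal sketch of that kind of comparison on a toy network. The network, the node names and the number of random draws are all assumptions for illustration; the authors' exact procedure may differ.

```python
import random


def degrees(edges):
    """Count interaction partners per protein from a list of (a, b) pairs."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def mean_degree(nodes, deg):
    return sum(deg.get(n, 0) for n in nodes) / len(nodes)


def degree_permutation_test(edges, focus, n_perm=2000, seed=0):
    """Compare mean degree of `focus` with random node sets of equal size.

    Returns (observed mean degree, empirical one-sided p-value).
    """
    deg = degrees(edges)
    all_nodes = sorted(deg)
    observed = mean_degree(focus, deg)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_nodes, len(focus))
        if mean_degree(sample, deg) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: two hypothetical hub proteins with 10 partners each,
# plus one interaction between the hubs themselves.
edges = [("H1", f"a{i}") for i in range(10)]
edges += [("H2", f"b{i}") for i in range(10)]
edges.append(("H1", "H2"))

observed, pval = degree_permutation_test(edges, focus=["H1", "H2"])
print(f"mean degree of focus set: {observed:.1f}, empirical p = {pval:.4f}")
```

One caveat worth keeping in mind with two-hybrid data: well-studied proteins tend to have more recorded interactions, so a high degree can partly reflect how intensively a protein has been screened.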

In conjunction with expression data, this network has predictive power: of 18 genes that interact with FRAP1 in the longevity network and change significantly in expression in aging human muscle, a third extended lifespan when their homologs were knocked down in C. elegans; that is, they behave as gerontogenes in the worm. This finding demonstrates once again the significant conservation of lifespan control systems across large evolutionary distances. Perhaps more importantly, it also shows that applying network analyses to large data sets can do more than merely catalog information. With the right combination of high-throughput data, a good network model and the right kinds of statistics, the tools of systems biology can reveal new biology that otherwise would have taken us a very long time to discover.

Most microarray studies of aging animals try to associate gene expression with chronological age: they look for groups of genes that are upregulated or downregulated as we get older. But chronological age is often an imperfect proxy for the quantity we are really interested in – physiological age, or bodily health, which is notoriously difficult to quantify.

The search is on for informative correlates of physiological age (like telomere length) that can be used to assess current health and predict remaining lifespan. One seemingly relevant quantity that is simple to measure is liveliness: we might expect a sprightly 90-year-old woman to look forward to more years of healthy life than a lethargic 80-year-old.

In the latest issue of Aging Cell, Golden et al. argue that this kind of behavior is a useful proxy for physiological age, and then show that gene expression can be used to predict behavior:

Age-related behaviors have distinct transcriptional profiles in C.elegans

There has been a great deal of interest in identifying potential biomarkers of aging (Butler et al. 2004). Biomarkers of aging would be useful to predict potential vulnerabilities in an individual that may arise well before they are chronologically expected, due to idiosyncratic aging rates that occur between individuals. Prior attempts to identify biomarkers of aging have often relied on the comparisons of long-lived animals to a wild-type control (Dhahbi et al. 2004). However, the effect of interventions in model systems that prolong lifespan (such as single gene mutations, or caloric restriction) can sometimes be difficult to interpret due to the manipulation itself having multiple unforeseen consequences on physiology, unrelated to aging itself (Gems et al. 2002; Partridge and Gems 2006). The search for predictive biomarkers of aging therefore is problematic, and the identification of metrics that can be used to predict either physiological or chronological age would be of great value (Butler et al. 2004). One methodology which has been used to identify biomarkers for numerous pathologies is gene expression profiling. Here, we report whole-genome expression profiles of individual wild-type Caenorhabditis elegans covering the entire wild-type nematode life span. Individual nematodes were scored for either age-related behavioral phenotypes, or survival, and then subsequently associated with their respective gene expression profiles. This facilitated the identification of transcriptional profiles that were highly associated with either physiological or chronological age. Overall, our approach serves as a paradigm for identifying potential biomarkers of aging in higher organisms that can be repeatedly sampled throughout their lifespan.

In the study, worms are grouped into 3 broad categories based on their behavior: category A if they show “symmetric, spontaneous, and smooth movement,” C if they “only move their nose or tail when prodded,” and B for anything in between. Using data from a previous work, Golden et al. show that behavior is a useful proxy for physiological age: C worms have a significantly shorter remaining lifespan than A worms (after controlling for chronological age). In other words, in our search for biomarkers of physiological aging, we should include genes whose expression levels predict behavior – not just those that predict chronological age.

Golden et al. then get on to their main experiment: they measure mRNA expression levels in worms of 7 different ages, and grade the behavior of each worm. Using a machine learning approach, they show that 71% of the time, gene expression levels accurately predict behavior class. Several of the genes that are more highly expressed in inert C worms (versus in active A worms) are involved in amino acid metabolism and/or related to actin – the authors speculate that physiologically old worms might upregulate these genes to try to repair their damaged cytoskeletons.

Their full results are available online in the GEO database (GSE12290) – this should prove a wonderful resource for bioinformaticians, especially as it is only the second full-genome microarray study of aging in the worm.

We know that calorie restriction substantially extends the lifespan of many model organisms, and recent high-throughput mRNA studies have started to catalog the extensive genetic changes that occur in response to CR. But which of these (very many) differentially regulated genes are the crucial ones that best distinguish the calorie restricted state from the normal one?

Traditional methods of statistical analysis are based on the simple principle that if an individual gene is substantially differentially expressed in CR vs in normal animals, then that gene should be a good biomarker for CR. These methods are limited because they consider only single genes acting alone – but biological reality is usually more complicated than that. It is quite possible that the CR state can be better characterized by some more subtle combined pattern of expression of a group of genes.

To discover these complex multigene relationships, we need more sophisticated tools. In the latest issue of Rejuvenation Research, Goertzel et al. apply artificial intelligence to this problem:

Identifying the Genes and Genetic Interrelationships Underlying the Impact of Calorie Restriction on Maximum Lifespan: An Artificial Intelligence-Based Approach

Novel artificial intelligence methodologies were applied to analyze gene expression microarray data gathered from mice under a calorie restriction (CR) regimen. The data were gathered from three previously published mouse studies; these datasets were merged together into a single composite dataset for the purpose of conducting a broader-based analysis. The result was a list of genes that are important for the impact of CR on lifespan, not necessarily in terms of their individual actions but in terms of their interactions with other genes. Furthermore, a map of gene interrelationships was provided, suggesting which intergene interactions are most important for the effect of CR on life extension. In particular our analysis showed that the genes Mrpl12, Uqcrh, and Snip1 play central roles regarding the effects of CR on life extension, interacting with many other genes (which the analysis enumerates) in carrying out their roles. This is the first time that the genes Snip1 and Mrpl12 have been identified in the context of aging. In a follow-up analysis aimed at validating these results, the analytic process was rerun with a fourth dataset included, yielding largely comparable results. Broadly, the biological interpretation of these analytical results suggests that the effects of CR on life extension are due to multiple factors, including factors identified in prior theories of aging, such as the hormesis, development, cellular, and free radical theories.

So what are Snip1 and Mrpl12 up to? We can hope that some enterprising biologists will follow up this work with experimental validation…

A couple of features of this paper that I found particularly interesting:

  • They pooled data from several different microarray studies, making their job a lot harder — and their results all the more impressive. The various studies they combined used different strains of mice, and measured gene expression at different ages and in different tissues (liver, skeletal muscle, and brain). Dealing with noisy data like this is difficult; it’s notable that they were still able to classify mouse samples as CR or normal with high accuracy.
          In fact, I would like to see them try their approach on an even noisier problem: finding CR biomarkers shared by different species. So far, traditional statistical analysis has identified few if any genes that are significantly differentially expressed in several species of calorie restricted animals; it would be interesting to see if the method of Goertzel et al. could make any sense of a multi-species pool of expression data.
  • They used an unusual algorithm to classify samples as CR or normal (for the computer scientists: they used genetic programming to learn an ensemble of classification rules). Basically, their algorithm votes for whether a sample is CR or normal based on the outputs of several short classification rules (short because each rule looks only at the expression levels of a few genes).
          An advantage of this type of approach (over, say, a ‘black box’ neural network) is that the classification rules are easy to interpret biologically: you can search through them to identify important genes and genetic relationships. A gene is important for CR if it appears in many different rules, and two (or more) genes are related if they appear together in many rules.
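To show why a rule ensemble is easy to read, here is a minimal sketch of the voting and counting steps described above. The three rules are hand-written and hypothetical, as are the gene names and thresholds; in the paper the rules are learned by genetic programming, which this sketch does not attempt.

```python
from collections import Counter
from itertools import combinations

# Each rule is (genes it looks at, predicate that returns True for "CR").
# These rules are invented for illustration only.
rules = [
    (("geneA", "geneB"), lambda e: e["geneA"] > e["geneB"]),
    (("geneA", "geneC"), lambda e: e["geneA"] > 1.0 and e["geneC"] < 0.5),
    (("geneB",), lambda e: e["geneB"] < 0.8),
]


def classify(sample, rules):
    """Majority vote over the rules; sample maps gene -> expression level."""
    votes = sum(1 for _, predicate in rules if predicate(sample))
    return "CR" if votes * 2 > len(rules) else "normal"


def gene_importance(rules):
    """How many rules mention each gene."""
    return Counter(g for genes, _ in rules for g in genes)


def gene_pairs(rules):
    """How many rules mention each pair of genes together."""
    return Counter(
        pair for genes, _ in rules for pair in combinations(sorted(genes), 2)
    )


cr_like = {"geneA": 1.5, "geneB": 0.6, "geneC": 0.2}
normal_like = {"geneA": 0.4, "geneB": 1.2, "geneC": 0.9}

print(classify(cr_like, rules), classify(normal_like, rules))
print(gene_importance(rules).most_common())
print(gene_pairs(rules).most_common())
```

The interpretability comes from the last two functions: importance and relatedness are read directly off the rule set by counting, with no need to probe the classifier's internals.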
