Since the completion of the Human Genome Project, DNA sequencing has been trained on bigger and bigger targets: first the $1000 Genome, then the 100,000 Genomes Project and now the Human Cell Atlas. The aim of this most recent initiative is nothing short of mapping gene expression and the proteins in all human cells.
All this gives us some fantastic insights, but it is certainly not the whole picture.
As we understand more, gaps in our knowledge also become more apparent – from genomics-related processes such as epigenetics and non-coding RNA (of any size) to protein structure and form.
Perhaps the greatest challenges in structural biology will be encountered when we tackle lipid and carbohydrate structures. They can be vastly more complex and difficult to analyse, yet play roles in many biological processes and, with greater understanding, may one day become a boon to personalised medicine where the predictive power of genomic information reaches its limits.
Why are DNA and RNA in the spotlight?
Part of the answer is simplicity. Ignoring epigenetic modifications for the moment, DNA and RNA each have only four naturally occurring bases and an in-built mechanism for replication.
While protein sequencing and structural analysis are possible, it is far easier – and for many purposes sufficient – to go back to nucleic acids. Since the translation of messenger RNA (mRNA) into protein is understood, it is a simple task to derive protein sequences from mRNA sequences.
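To illustrate how mechanical this step is, here is a minimal Python sketch that translates an mRNA string into a one-letter protein sequence using the standard genetic code. The function name `translate` and the example sequence are made up for illustration, and the sketch assumes a single open reading frame that begins at the first AUG; real transcripts need more careful handling.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# U, C, A, G order at each position ("*" marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first stop codon (hypothetical helper)."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up example sequence: AUG GCC UUU UAA, flanked by untranslated bases
print(translate("GGAUGGCCUUUUAAGG"))
```

A lookup table and a loop are all it takes, which is part of why nucleic acid data so often stands in for direct protein analysis.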
But that’s not the whole picture.
Anyone who follows nutrition will know of the three macronutrients: proteins, carbohydrates and lipids (fats). Clearly the body is composed of more than (DNA, RNA and) protein.
The body primarily burns carbohydrates for energy, but complex carbohydrates also populate cell surfaces, contribute to cell-cell recognition and serve as antigens, as anticoagulants and as components of hormones. This is without even touching on the many roles of glycosylation – carbohydrate modification of proteins – in modulating protein activity.
Lipids are similarly essential at the cellular level. They make up the phospholipid bilayer of the cell membrane and perform essential cell signalling functions, e.g. as intra- and extracellular vesicles and as a component of hormones. Lipid composition further affects the physical and electrochemical properties of the plasma membrane as well as the trafficking of membrane proteins.
Yet we still know very little about lipid and carbohydrate composition and, in many cases, function – certainly nothing close to what the Human Cell Atlas promises to deliver.
Arguably this ignorance weighs even heavier when we consider the variety of lipid and carbohydrate structures.
Although lipids are generally synthesised in the endoplasmic reticulum from simple, free fatty acid building blocks, lipid variety is considerable: the LIPID MAPS Structure Database currently contains 43111 unique lipid structures.
Carbohydrates are similarly assembled from simple monosaccharides, disaccharides and hundreds of oligosaccharides. The number of building blocks, plus the possibility of isomerism, creates vast combinatorial complexity in possible carbohydrate structures and carbohydrate modifications of other molecules.
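A rough back-of-the-envelope count shows how quickly this complexity grows. The Python sketch below compares the number of possible linear chains of a given length for nucleic acids, peptides and carbohydrates. The carbohydrate figures are illustrative assumptions rather than measured values: 10 distinct monosaccharide types, 4 possible linkage positions on the neighbouring sugar and 2 anomeric configurations (alpha or beta) per glycosidic bond. Branching is ignored, and including it would only raise the count further.

```python
def linear_sequences(n_units: int, alphabet: int) -> int:
    """Number of linear chains when each position only varies by building block."""
    return alphabet ** n_units


def linear_glycans(
    n_units: int,
    n_monosaccharides: int = 10,  # assumed number of sugar types
    linkage_positions: int = 4,   # assumed attachment positions per bond
    anomers: int = 2,             # alpha or beta configuration
) -> int:
    """Unbranched glycans: each of the n-1 bonds adds position and anomer choices."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    bond_choices = linkage_positions * anomers
    return n_monosaccharides ** n_units * bond_choices ** (n_units - 1)


n = 6
print("nucleic acid hexamers:", linear_sequences(n, 4))
print("peptide hexamers:     ", linear_sequences(n, 20))
print("linear glycan hexamers:", linear_glycans(n))
```

Under these assumptions, the linkage choices alone multiply the carbohydrate count by a factor of 8 for every bond, a source of variety that nucleic acids and peptides simply do not have.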
So, given that the structural versatility of carbohydrates and lipids probably exceeds that of nucleic acids and proteins, why don’t they receive the same level of attention?
One reason is that analytic methods for carbohydrates and lipids are less advanced than those for nucleic acid analysis.
Chromatography and rapidly improving mass spectrometry techniques are favoured for lipid analysis, but neither of these is currently suited to rapid point-of-clinic analysis.
There is more hope for carbohydrates. In recent reports, recognition tunnelling by means of capture molecules and electronic signal detection has been extended to carbohydrates. The lower concentrations required and the potential to incorporate these tunnels into nanopores open up the possibility of carbohydrate and glycoprotein sequencing on a wider scale. In the future we may thus be able to see nanopore sequencing applied to carbohydrates, just as it has already launched a new era of DNA sequencing.
At this point, a genomic optimist might argue that many lipid- and carbohydrate-based diseases have a genetic cause and can be diagnosed by genotyping – so is this necessary? Why not just focus on genetics and epigenetics and from this derive RNA and protein expression and thus lipid and carbohydrate composition? What is there to be gained from understanding carbohydrate and lipid composition?
For one, inferences from DNA sequencing have limits. Many “phenotypes” only correlate imperfectly with genotypes, and many diseases are determined by more than simple genetics.
As a case in point, recent progress in next-generation DNA sequencing and proteomics has once again shown that gene expression and translation are regulated at many levels, often leaving only a weak positive correlation between mRNA expression and protein levels.
The prediction of lipid composition from genomic data is expected to be similarly limited.
At the same time, one can imagine that lipid and carbohydrate composition may one day be of value for personalised medicine.
According to the “membrane-centric” view of diabetes, for example, the lipid composition of cell membranes per se contributes to insulin resistance.
It has been known since 1978 that the membranes of red blood cells of diabetics are unusually rigid – so rigid, in fact, that vesicles containing glucose transporters have a hard time fusing with the membranes of muscle cells. The resulting paucity of transporters embedded in the cell membrane reduces glucose transport from the blood into the cells, thus contributing to high blood sugar.
Membrane composition is not a simple genetic trait: it is affected by diet, and individuals with the greatest proportion of membrane-hardening saturated fatty acids in their red blood cell membranes are the most likely to later develop type 2 diabetes. One can thus imagine that lipid composition could be a marker that precedes diabetes.
Similarly, carbohydrate signatures, such as the structure of tumour-specific Tn antigen, which is within the scope of what electron tunnelling can currently differentiate, could be detected by carbohydrate sequencing for personalised medicine. Much the same has been argued for protein glycosylation signatures.
There are good reasons, then, not to forget the glycome and lipidome when we think about the future of personalised medicine. Challenges include the dynamic nature of the glycome and the lipidome – but, as in genomics and personalised medicine, dynamic processes will likely also create new opportunities as analytic techniques improve.