Tuesday, 26 September 2017

The real problem of the highly gifted

Two years ago I wrote an essay (link) in which I addressed the problems faced by the highly gifted. At the time, however, I had not yet recognized the core of the problem, or at least had not formulated it clearly enough.

The last paragraph turned out rather well:
What is the role of the highly gifted in society? Is there such a role at all? Opinions on this differ widely. Those who think that gifted people should be obliged to deliver exceptional achievements forget that gifted people are only human, too. Ignoring gifted people, or even putting them at a disadvantage, is the other extreme. In reality, however, both occur. Unfortunately, very few people hit on the idea that gifted people could simply be treated like any other human being.
What I admittedly had not yet recognized or expressed back then: even the demand that the highly gifted should deliver exceptional achievements is itself a form of discrimination.

The real problem of the highly gifted, however, is that their fellow human beings regard them as enemies because they are too clever for them, and no matter what they (the highly gifted) do, they will always be hated!

This is particularly problematic because the highly gifted also have to earn a living, yet really clever people are often disadvantaged when looking for a job. Since in this case it is not permissible to state the true reason, namely that these people are too clever for the employers, arbitrary reasons are put forward that would play no role for less clever applicants.

Friday, 22 September 2017

Computational meta'omics for microbial community studies

Segata et al. (2013): Computational meta'omics for microbial community studies

This article reviews "the technological and computational meta’omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges". As the abstract says, the technologies that are already available make it possible to "comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts".

What kinds of approaches is this review about? The authors write:
Although the ubiquity and complexity of microbial communities have been well studied for decades, advances in high-throughput sequencing have provided new tools that supplement culture-based approaches both in their molecular detail and in their accessibility to a broad scientific community. [...] More recently, genome-wide sequencing approaches, such as metagenomics and metatranscriptomics, have further expanded the experimental tools available for studying the microbiome. Such ‘meta’omic’ approaches expose the genes, transcripts, and eventually proteins and metabolites from thousands of microbes to analysis of biochemical function and systems-level microbial interactions. [...] Metagenomic, metatranscriptomic, and other whole-community functional assays provide new ways to study complex ecosystems involving host organisms, biogeochemical environments, pathogens, biochemistry and metabolism, and the interactions among them. Interaction modeling is particularly relevant for human health, and current host–microbe–microbiome systems most often rely on mouse models of the interplay of commensal microbes, pathogens, and hosts. [...] [I]ntegrative meta’omic approaches and advanced computational tools are key for a system-level understanding of relevant biomedical and environmental processes[.]
What is the aim of a meta’omic study and how is it done? Quoting the authors of this paper:
A meta’omic study typically aims to identify a panel of microbial organisms, genes, variants, pathways, or metabolic functions characterizing the microbial community populating an uncultured sample. [...] Metagenomic sequencing, if performed at a sufficiently high coverage, can in some cases allow reconstruction of complete genomes of organisms in a community. [...] [R]ecent years have seen an explosion of metagenome-specific assemblers, which use strategies to tease apart sequencing artifacts from true biological ambiguity within communities. [...] Whole-genome assembly from metagenomes is impossible in most cases, and such assemblers instead aim to provide the largest reliable and useful contigs achievable from their input sequence reads.
These approaches "rely on reference genome catalogs" such as the Human Microbiome Project and the Genomic Encyclopedia of Bacteria and Archaea, which "are systematically filling the gaps in the sequenced portion of the phylogeny".

Another purpose of this is "gene function annotation and metabolic reconstruction":
Microbial communities can be seen not only as groups of individual microbes, but also as collections of biochemical functions affecting and responding to an environment or host organism. Metagenomics can thus also identify the genes and pathways carried by a microbial community, and metatranscriptomics can profile their expressed function. [...] Functional profiling using reference information can be based either on reference genome read mapping (at the nucleotide level) or on translated protein database searches.
Meta’omics can also be used to investigate "microbial ecosystem interaction and association networks", but:
All of these current approaches, however, identify only the descriptive covariation of multiple microbes; they characterize neither the mechanisms of nor the regulatory ramifications of such variation. There is thus a pressing need for multiorganism metabolic models to explain such interactions and for a systems-level understanding of their effect on microbial signaling and growth.
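To make "descriptive covariation" a bit more tangible: the simplest form of such an analysis is a correlation between the abundance profiles of two microbes across samples. The following is only a minimal sketch of my own, not a method taken from the review; the function name `pearson` and the toy abundance values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two abundance profiles of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical abundances of two microbes in four samples.
microbe_a = [1.0, 2.0, 3.0, 4.0]
microbe_b = [8.0, 6.0, 4.0, 2.0]
r = pearson(microbe_a, microbe_b)  # strongly negative: the two co-vary inversely
```

A value like this says that the two microbes co-vary, but, as the authors point out, nothing about the mechanism behind it.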
Metatranscriptomics in particular can be used to unravel community expression patterns:
Most current meta’omic tools and studies focus on metagenomic DNA sequencing, but metatranscriptomics is becoming increasingly practical as a window into the regulation and dynamics of microbial community transcription. [...] The major challenge faced in metatranscriptomics is the isolation of microbial mRNA, which usually makes up only a small percentage of total microbial RNA and an even smaller proportion of total RNA if host nucleotides are present.
Single-cell sequencing provides an alternative approach to accessing novel information about uncultured microbes. [...] Current single-cell approaches first isolate single microbial cells by sorting them, lyse them separately, amplify and label them separately, and sequence the resulting pool. The subsequent analysis of single-cell sequence data thus relies much more heavily than do meta’omics on assembly, but fortunately in a less-challenging setting. Recently, elegant combinations of both single-cell genomics and metagenomics have begun to emerge, e.g., in the sequencing of a novel, low-salinity ammonia-oxidizing archaeon from an enrichment culture. Such a combinatorial approach may continue to prove very useful, as the single-cell perspective on novel organism-specific sequences tends to complement whole-metagenome and metatranscriptome overviews of diverse communities.
Meta’omics provides an important tool for studying evolution within microbial communities, which can occur on two very different time scales. Over the course of days, weeks, or the years of a host’s lifetime, microbial genome plasticity allows remarkably rapid acquisitions of novel mutations and laterally transferred genes. Over the course of millennia, however, the overall structure of host-associated communities, their phylogenetic composition, and their microbial pan-genomes can evolve more slowly in tandem with their hosts’ physiology and immune systems. [...] Characterizing the coevolution of quickly evolving complex microbial communities with relatively slowly evolving eukaryotic hosts remains a challenging and largely unexplored field.
One of the ultimate goals of microbial community systems biology is to develop predictive models of the whole-community response to changing stimuli, be it their temperature or pH in the environment, or dietary components in a host gut. Such models may be mechanistic, relying on joint metabolic networks as discussed above, or a descriptive systems biology of microbial physiological ‘rules’ may emerge as a simpler alternative. No unifying approach yet exists, although meta’omic data have provided training input for several first attempts. [...] Given the complexity of most ‘wild’ microbial communities, one of the most promising approaches for such validation has been in the construction of model microbial communities. These have been successful both entirely in vitro, by scaling up the ex vivo coculture of multiple organisms, and when associated with hosts in vivo.
The authors conclude:
In combination with innovative computational models, meta’omics in such environments and in vivo will continue to improve our understanding of microbial community systems biology.

Exploring atomic resolution physiology using molecular dynamics simulations

Dror et al. (2010): Exploring atomic resolution physiology on a femtosecond to millisecond timescale using molecular dynamics simulations

The article begins with a dramatic introduction:
Recent dramatic methodological advances have made all-atom molecular dynamics (MD) simulations an ever more useful partner to experiment because MD simulations capture the atomic resolution behavior of biological systems on timescales spanning 12 orders of magnitude, covering a spatiotemporal domain where experimental characterization is often difficult if not impossible.
The motivation for this:
Computational models, especially those arising from MD simulations, are useful because they can provide crucial mechanistic insights that may be difficult or impossible to garner otherwise[.]
This is further explained in the introduction:
An all-atom MD simulation typically comprises thousands to millions of individual atoms representing, for example, all the atoms of a membrane protein and of the surrounding lipid bilayer and water bath. The simulation progresses in a series of short, discrete time steps; the force on each atom is computed at each time step, and the position and velocity of each atom are then updated according to Newton’s laws of motion. Each atom in the system under study is thus followed intimately: its position in space, relative to all the other atoms, is known at all times during the simulation. This exquisite spatial resolution is accompanied by the unique ability to observe atomic motion over an extremely broad range of timescales—12 orders of magnitude - from about 1 femtosecond (10^-15 s), less than the time it takes for a chemical bond to vibrate, to >1 ms (10^-3 s), the time it takes for some proteins to fold, for a substrate to be actively transported across a membrane, or for an action potential to be initiated by the opening of voltage-gated sodium channels. MD simulations thus allow access to a spatiotemporal domain that is difficult to probe experimentally.
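The time-stepping loop described in this quote can be sketched in a few lines. This is only a toy illustration under my own assumptions: a single particle on a harmonic spring in one dimension instead of a real force field, and the velocity Verlet integrator, which the quote does not name. The names `spring_force`, `velocity_verlet` and `total_energy` and all parameter values are made up for this example.

```python
def spring_force(x, k=1.0):
    """Force of a harmonic spring (toy stand-in for a real force field)."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=spring_force):
    """Advance position x and velocity v through n_steps discrete time steps."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # recompute the force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


trajectory = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = trajectory[-1]
```

In a real MD code the force evaluation over all atom pairs dominates the cost, which is why the accessible timescales depend so strongly on hardware and algorithms.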
What is this for? The authors write:
Simulations can be particularly valuable for membrane proteins, for which experimental characterization of structural dynamics tends to be challenging. [...] A wide variety of physiological processes are amenable to study at the atomic level by MD simulation. Examples relevant to membrane protein function include the active transport of solutes across bilayers by antiporters and symporters; the passive transport of water, ions, and other solutes by structurally diverse channels; the interconversion of transmembrane electrochemical gradients and chemical potential energy by pumps such as the F1F0-ATPase and the Na+/K+-ATPase; the transmission of extracellular stimuli to the cell interior by G protein–coupled receptors (GPCRs) and tyrosine kinase receptors; and the structural coupling of cells and organelles to one another by integrins and membrane curvature modulators.
The paper further presents several case studies, such as "Permeation through a water channel: aquaporin 0 (AQP0)", "Reconciling discordant experimental results: β2-adrenergic receptor (β2AR)" and "Permeation and gating of an ion channel: Kv1.2".

As "major strengths and limitations of MD as a technique for molecular physiology", the authors primarily name "accessible timescales" ("MD simulations have historically been most powerful for simulating motions that take place on submicrosecond timescales"). A further paragraph in this chapter deals with "accuracy and errors". "System size" also needs to be considered when designing an MD simulation study, and:
Classical MD simulations treat covalent bonds as unchanging. To simulate chemical reactions, one must use alternative techniques such as quantum mechanics/molecular mechanics simulations, in which the bulk of the system is simulated as in classical MD, but a small part is evaluated using more computationally intensive quantum mechanical approaches.

Computational imaging in cell biology

Eils et al. (2003): Computational imaging in cell biology

This paper deals with "computational methods that (semi-) automatically quantify objects, distances, concentrations, and velocities of cells and subcellular structures" and thus generate quantitative data that "provide the basis for mathematical modeling of protein kinetics and biochemical signaling networks".

In the introduction, the authors write:
Fluorescent dyes such as fluorescein and rhodamine, together with recombinant fluorescent protein technology and voltage- and pH-sensitive dyes allow virtually any cellular structure to be tagged. In combination with techniques in live cells like FRAP and fluorescence resonance energy transfer, it is now possible to obtain spatio-temporal, biochemical, and biophysical information about the cell in a manner not imaginable before.
This is followed by a discussion of "methods for segmentation and tracking of cells".
Nowadays, techniques for fully automated analysis and time–space visualization of time series from living cells involve either segmentation and tracking of individual structures, or continuous motion estimation. For tracking a large number of small particles that move individually and independently from each other, single particle tracking approaches are most appropriate.
For the determination of more complex movement, two independent approaches were initially developed, but recently have been merged. Optical flow methods estimate the local motion directly from local gray value changes in image sequences. Image registration aims at identifying and allocating certain objects in the real world as they appear in an internal computer model. The main application of image registration in cell biology is the automated correction of rotational and translational movements over time (rigid transformation). This allows the identification of local dynamics, in particular when the movement is a result of the superposition of two or more independent dynamics. Registration also helps to identify global movements when local changes are artifacts and should be neglected.
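The idea of correcting a global movement by image registration can be illustrated with a deliberately simple sketch. It is not the method of the paper: I assume pure translation by whole pixels (no rotation), small gray-value images stored as nested lists, and an exhaustive search that minimizes the mean squared gray-value difference over the overlapping area. The name `estimate_shift` and the parameter `max_shift` are made up for this example.

```python
def estimate_shift(reference, moved, max_shift=3):
    """Return (dy, dx) such that moved[y + dy][x + dx] best matches reference[y][x]."""
    height, width = len(reference), len(reference[0])
    best_shift, best_error = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(height):
                for x in range(width):
                    yy, xx = y + dy, x + dx
                    # Only compare pixels that exist in both images.
                    if 0 <= yy < height and 0 <= xx < width:
                        diff = reference[y][x] - moved[yy][xx]
                        total += diff * diff
                        count += 1
            if count and total / count < best_error:
                best_error = total / count
                best_shift = (dy, dx)
    return best_shift
```

Once the global shift between two frames is known, it can be subtracted, so that only the local dynamics remain in the registered sequence.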
Several paragraphs follow that explain how these methods work. The paper also mentions computer vision, visualization and quantitative image analysis.
A great advantage of the combination of segmentation and surface reconstruction is the immediate access to quantitative information that corresponds to visual data. These approaches were designed to deal particularly with the high degree of anisotropy typical for 4-D live-cell recordings and to directly estimate quantitative parameters, e.g., the gray values in the segmented area of corresponding images can be measured to determine the amount and concentration of fluorescently labeled proteins in the segmented cellular compartments.
A challenge for future work is to better understand the biomechanical behavior of cellular structures, e.g., cellular membranes, by fitting a biophysical model to the data - an approach already successfully implemented in various fields of medical image analysis.
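The measurement of gray values in a segmented area mentioned above can be sketched as follows. This is a strongly simplified illustration under my own assumptions: the segmentation is a plain intensity threshold on a 2-D image (the paper deals with 4-D recordings and surface reconstruction), and the summed gray value is taken as a stand-in for the amount of labeled protein, which in practice requires calibration. The name `quantify_region` and the example values are hypothetical.

```python
def quantify_region(image, threshold):
    """Segment by thresholding and measure the gray values inside the segment."""
    pixels = [value for row in image for value in row if value >= threshold]
    area = len(pixels)                       # number of segmented pixels
    total = sum(pixels)                      # proxy for the amount of label
    mean = total / area if area else 0.0     # proxy for the concentration
    return {"area": area, "total_intensity": total, "mean_intensity": mean}


# Hypothetical 3x3 gray-value image with a bright structure in the lower right.
image = [
    [0, 0, 0],
    [0, 10, 20],
    [0, 30, 0],
]
result = quantify_region(image, threshold=5)
```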
Finally, the paper mentions a couple of applications and concludes:
In combination with models of biochemical processes and regulatory networks, computational imaging as part of the emerging field of systems biology will lead to the identification of novel principles of cellular regulation derived from the huge amount of experimental data that are currently generated.

Applications of genome-scale metabolic reconstructions

Oberhardt et al. (2009): Applications of genome-scale metabolic reconstructions

This is a review that examines "the many uses and future directions of genome-scale metabolic reconstructions" and highlights "trends and opportunities in the field that will make the greatest impact on many fields of biology" ten years after the publication of the first genome-scale metabolic reconstruction, a metabolic model of Haemophilus influenzae (Edwards et al. (1999): Systems properties of the Haemophilus influenzae Rd metabolic genotype).
[T]oday [more than] 50 genome-scale metabolic reconstructions have been published[.] [...] Of all organisms that have been analyzed through a constraint-based metabolic reconstruction, Escherichia coli has gained the most attention as a model organism.
Since there has already been a review focusing on E. coli (Feist et al. (2008): The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli), this paper excludes E. coli and focuses on the other organisms instead.

The papers this review is about can be put into five different categories:
(1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery[.]
The authors summarize the process of metabolic reconstruction as follows:
First, an initial reconstruction is built from gene-annotation data coupled with information from online databases such as KEGG and EXPASY, which link known genes to functional categories and help bridge the genotype–phenotype gap. Second, the initial reconstruction is curated through an examination of the primary literature. Then, the reconstruction as a knowledge base is converted into a mathematical model that can be analyzed through constraint-based approaches. Third, the reconstruction is validated through comparison of model predictions to phenotypic data. In a final fourth step, a metabolic reconstruction is subjected to continued wet- and dry-lab cycles, which improve accuracy and allow investigation of key hypotheses.
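What "converted into a mathematical model that can be analyzed through constraint-based approaches" means can be sketched with a toy example. Constraint-based models commonly describe the network by a stoichiometric matrix S and require that a flux vector v satisfies the steady-state mass balance S·v = 0 within given flux bounds. The three-reaction network, the bounds and the function name `is_feasible` below are my own illustration and are not taken from the paper.

```python
def is_feasible(stoich, fluxes, bounds, tol=1e-9):
    """Check steady state (S v = 0) and flux bounds for a candidate flux vector."""
    for row in stoich:
        # Net production of one metabolite must vanish at steady state.
        if abs(sum(coeff * v for coeff, v in zip(row, fluxes))) > tol:
            return False
    return all(lo - tol <= v <= hi + tol for v, (lo, hi) in zip(fluxes, bounds))


# Toy network: R1 imports A, R2 converts A to B, R3 exports B.
# Rows are the metabolites A and B, columns the reactions R1, R2, R3.
stoich = [
    [1, -1, 0],
    [0, 1, -1],
]
bounds = [(0.0, 10.0)] * 3

balanced = is_feasible(stoich, [5.0, 5.0, 5.0], bounds)       # steady state holds
accumulating = is_feasible(stoich, [5.0, 3.0, 5.0], bounds)   # A would accumulate
too_fast = is_feasible(stoich, [12.0, 12.0, 12.0], bounds)    # exceeds the bounds
```

Real analyses go one step further and search the feasible set for a flux vector that optimizes an objective such as growth, which requires a linear programming solver and is omitted here.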
What data does this process deliver to us? The authors write:
Through gap analysis and subsequent pathway analysis, studies have elucidated both the stoichiometry of certain reactions and the most efficient pathways for production of certain metabolites, and in some cases have even proposed methods for engineering more efficient strains. Also, it is common for reconstruction efforts to provide high-quality estimates of cellular parameters such as growth yield, specific fluxes, P/O ratio, and ATP maintenance costs, and these theoretical values are often used for hypothesis building or validation in biological studies. Several published metabolic reconstruction studies also include in silico predictions for minimal medium design.
Which organisms have been reconstructed and what kind of data have we gained by this? The paper provides the following answer:
Metabolic GENREs of prokaryotes encompass an average of 600 metabolites, 650 genes, and 800 reactions, whereas metabolic GENREs of eukaryotes include on average 1200 metabolites, 1000 genes, and 1500 reactions. Excluding the two existing reconstructions of Homo sapiens metabolism lowers the average eukaryotic network size to 800, 800, and 1300, metabolites, genes, and reactions, respectively, a closer but still higher distribution to that of prokaryotes. [...] Existing reconstructions span the domains Eukaryota, Bacteria, and Archaea. The most represented domain is bacteria, with 25 species reconstructed.
Now comes something that is interesting for us - the relationship with Computational Systems Biology:
With biology increasingly becoming a data-rich field, an emerging challenge has been determining how to organize, sort, interrelate, and contextualize all of the high-throughput datasets now available. This challenge has motivated the field of top–down systems biology, wherein statistical analyses of high-throughput data are used to infer biochemical network structures and functions.
This metabolic data is "often linked with other data types, such as protein expression data, protein–protein interaction data, protein–metabolite interaction data, and physical interaction data." It can also be used for metabolic engineering, which is "the use of recombinant DNA technology to selectively alter cell metabolism and improve a targeted cellular function".

Regarding hypothesis-driven discovery, the authors write:
Gene microarrays serve as a prime example; a traditional hypothesis-driven study might include examination of 1 or 2 genes in a microarray that are of particular interest. This approach would ignore the thousands of other genes on the chip, however, and could miss important information or trends embedded in those data. Therefore, a systematic framework for incorporating genome-scale data available from multiple high-throughput methods would allow hypothesis-driven biology to benefit from the full range of tools available today. Metabolic GENREs represent concise collections of existing hypotheses, and taken together as a broad context they enable systematic identification of new hypotheses that can be tested and resolved. Therefore, they represent a crucial framework for incorporating the flood of biological data now available into the biological discovery process.
Metabolic GENREs intrinsically represent a simplification of cellular function. The distinct biochemical networks categorized by scientists (e.g. metabolism, regulation, and signaling) blend together in a living cell, creating a far more complicated web of interactions than is convenient or possible to model. This web is fundamentally stochastic, and co-habits the cell with many other simultaneous phenomena including transcription and translation, protein modification, cell division, adhesion, motility, and mechanical transduction of external forces. The very simplifications that make metabolic GENREs powerful tools also make them challenging to use for the study of totally unknown or novel phenomena.
About the interrogation of multi-species relationships the authors write:
A promising direction for computational systems biology is the incorporation of network-level analysis into the field of comparative genomics, which is currently driven by bioinformatics. [...] However, most multi-species analyses reported to date have involved either sub-genome-scale metabolic models or models that have not been carefully annotated. [...] Of the five categories of uses of metabolic GENREs described in this paper, multi-species studies have been represented the least in literature so far. With more genome-scale metabolic models being built and an increased focus on studying multicellular systems, however, we anticipate that this field will see a major increase in activity in the coming years.
Finally, regarding the fifth category, network property discovery, the main point conveyed by the authors of this paper is:
The field of computational systems biology has produced a rich array of methods for network-based analysis, offering tremendous insight into the functioning of metabolic networks. However, many of these methods produce results that can be difficult to link to observable phenotypes. Forging this link poses the greatest challenge toward development of useful network-based tools. For instance, several methods exist to analyze redundancy in metabolic networks. Although these techniques define ‘redundancy’ intuitively in terms of the number of available paths between a given set of inputs and outputs, relating ‘redundancy’ to an observable phenotype poses a difficult challenge.
Each chapter of the paper comes with a wealth of examples and references to concrete research projects that illustrate what has been done in the respective fields so far.

A Strategy for Integrative Computational Physiology

Hunter et al. (2005): A Strategy for Integrative Computational Physiology

This paper describes a "quantitative modeling framework" being developed "under the auspices of the Physiome and Bioengineering Committee (co-chaired by P. Hunter and A. Popel) of the International Union of Physiological Sciences (IUPS)" that can deal with organ function "through knowledge of molecular and cellular processes within the constraints of structure-function relations at the tissue level".

It follows what other authors have called a "top-down approach":
The challenge is to develop mathematical models of structure-function relations appropriate to each (limited) spatial and temporal domain and then to link the parameters of a model at one scale to a more detailed description of structure and function at the level below.
In the authors' opinion, the concept of a "field" as defined by physicists of the 19th century is essential for this endeavour:
The application of continuum field concepts and constitutive laws, whose parameters are derived from separate, finer-scale models, is the key to linking molecular systems biology (with its characterization of molecular processes and pathways) to larger-scale systems physiology (with its characterization of the integrated function of the body’s organ systems).
The authors also state what this branch of science should be called in their opinion:
The appropriate name for this application of physical and engineering principles to physiology is computational physiology. The term systems biology, currently inappropriately limited to the molecular scale, needs to be associated with all spatial scales.
Next, the authors state that computational modeling must be applied "at the scale of whole organs", "at the tissue level" and "even at the protein level".
Good progress is being made on modeling the anatomy and biophysics of the heart, the lungs, the digestive system, and the musculoskeletal system. [...] Linking the organ and organ systems together to yield models that can predict and interpret multiorgan physiological behavior is the focus of systems physiology. [...] The organ-level models [...] are based on finite-element models of the anatomic fields (geometry and tissue structure) encoded in a markup language called FieldML (http://www.physiome.org.nz/fieldml/pages).
For "modeling cell function", a framework "has been developed over the past five years by the Bioengineering Institute at the University of Auckland". It employs a markup language called CellML. At the URL http://www.cellml.org/examples/repository there are about 300 models in various categories, such as signal transduction or metabolic pathway models.

The next chapter of the paper focuses on models of the heart. The authors explain:
Molecular dynamics (MD) models of the atomic structure of ion channels, pumps, exchangers, etc. are needed that can predict the open-channel permeation of the channels, the voltage dependence of the channel permeability, and the time- and voltage-dependent gating behavior. [...] MD calculations, based on ~100,000 atoms in current models, are very expensive and are typically run for periods of only 10 ns. Sometimes homology modeling is used in combination with MD simulation to generate, test, and refine models of mammalian potassium channels based on bacterial templates. The structures of sodium and calcium channels are also on the horizon, as well as those of key pumps and exchangers.
A major challenge now is to develop coarse-grained models of these ion channels and other proteins with parameters calculated from the MD models. This will allow the models to include transient gating behavior for time intervals up to ~100 ms. [...] One of the challenges now for the Heart Physiome Project is to derive the parameters of the Hodgkin-Huxley or Markov models from the MD models via coarse-grained intermediate models as the molecular structures of these proteins become available.
The next stage of development of cell models will need to take account of the spatial distribution of proteins within a cell and subcellular compartments, where second messengers (Ca2+, IP3, cAMP, etc.) are localized. [...] Developing 3-D models at the cellular level will help to fill the large gap in spatial scales between proteins and intact cells.
Current work is linking myocardial mechanics to the fluid mechanics of blood flow in the ventricles and to the function of the heart valves. Future work will need to include models of the Purkinje network and the autonomic nervous system.
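To give an idea of what a Markov model of channel gating looks like, here is the simplest possible case: a channel with one closed and one open state. This is my own minimal sketch, not a model from the paper; I assume constant opening and closing rates `alpha` and `beta` (in a realistic model they would depend on voltage), for which the open probability has a closed-form solution. The function name `open_probability` and the rate values are made up.

```python
import math


def open_probability(t, alpha, beta, p0=0.0):
    """Open probability of a closed <-> open channel at time t.

    alpha is the opening rate, beta the closing rate (same time unit as t),
    p0 the open probability at t = 0.
    """
    p_inf = alpha / (alpha + beta)   # steady-state open probability
    tau = 1.0 / (alpha + beta)       # relaxation time constant
    return p_inf + (p0 - p_inf) * math.exp(-t / tau)


# Hypothetical rates: opening twice as fast as closing.
p_start = open_probability(0.0, alpha=2.0, beta=1.0)
p_late = open_probability(100.0, alpha=2.0, beta=1.0)   # close to 2/3
```

The challenge described by the authors is to obtain parameters such as `alpha` and `beta` from the MD models instead of fitting them to electrophysiological recordings.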
In their conclusions, the authors appear to be very optimistic:
Anatomically and biophysically based models of 4 of the 12 organ systems in the human body are now quite well developed at the organ and tissue levels (the cardiovascular, respiratory, digestive, and musculoskeletal systems). Others (the lymphatic system, the kidney and urinary system, the skin, the female reproductive system, and the special sense organs) are at an early stage of development, and the remainder (the endocrine, male reproductive, and brain and nervous systems) will be addressed over the next few years.
An important goal for the Physiome Project is also to use this modeling framework to help interpret clinical images for diagnostic purposes and to aid in the development of new medical devices. Another goal is to apply the anatomically and physiologically based models to virtual surgery, surgical training, and education. A longer-term goal is to help lower the cost of drug discovery by providing a rational multiscale and multiphysics modeling-based framework for dealing with the enormous complexity of physiological systems in the human body.

Computational Cell Biology: Spatiotemporal Simulation of Cellular Events

Slepchenko et al. (2002): Computational Cell Biology: Spatiotemporal Simulation of Cellular Events

This is an introduction to Computational Cell Biology focusing on the system the authors developed, which is called "Virtual Cell". It also mentions several other programs, in particular StochSim and MCell. To illustrate their ideas, the authors provide examples relating to "RNA trafficking" and "neuronal calcium dynamics".

The paper first mentions a couple of pieces of technology that have contributed to the progress of Cell Biology in general in the past twenty years:
Confocal and two-photon excited fluorescence microscopies permit investigators to study the structure and dynamics of living cells with submicrometer three-dimensional (3D) spatial resolution and with time resolutions as fast as milliseconds. These quantitative microscopies can be combined with fluorescent indicators and fluorescent protein constructs to enable the study of the spatiotemporal behavior of individual molecules in cells. Patch clamp electrophysiological recording can be used to study ion currents through single-channel proteins or across the entire cell membrane. All these techniques can be further combined with methods to impart specific perturbations to cells such as photorelease of caged compounds to deliver controlled doses of second messengers or laser tweezer manipulations to determine the response of cells to mechanical stresses.
With all these advances, scientists have gained the following data:
Massive structural biology efforts have produced extensive databases of 3D protein structures. High-throughput molecular biology and molecular genetics technologies have led to descriptions of the full genomes of several organisms, including, of course, the human genome. More recently, high-throughput proteomics technologies promise to catalog, for a given state of a given cell, the dynamic levels of and interactions between all proteins and their posttranslational modifications.
To "link all the molecular-level data to the cellular processes that can be probed with the microscope", computational approaches are needed.

Regarding the mathematical knowledge required to implement these approaches, the authors write:
The concentrations of reacting molecular species as a function of time in a well-mixed reactor can be obtained by solving ordinary differential equations (ODEs) that specify the rate of change of each species as a function of the concentrations of the molecules in the system. If membrane transport and electrical potential are to be included in the model, the rate expressions can become more complex but can still be formulated in terms of a system of ODEs. However, when diffusion of molecules within the complex geometry of a cell is also considered, the resultant “reaction/diffusion” system requires the solution of partial differential equations (PDEs) that describe variations in concentration over space and time.
The finite volume method, developed originally for problems in heat transfer, is especially well-suited to simulations in cell biological systems. It is closely related to finite difference methods but allows for good control of boundary conditions and surface profile assumptions while preserving the conservative nature of the equations. Most importantly, the finite volume formalism accommodates the heterogeneous spatial organization of cellular compartments. [...] Within such elements, the rate of change of the concentration of a given molecular species is simply the sum of fluxes entering the volume element from its adjacent neighbors plus the rate of production of the given species via reactions. [...] Linear solvers based on Krylov space approximations, such as the conjugate gradient method, in conjunction with a preconditioner (an operator that approximates the inverse of the matrix but can be applied at a low computational cost), become powerful and robust. There are commercial packages that implement a range of Krylov space methods, as well as many of the well-known preconditioners (e.g., PCGPAK, Scientific Computing Associates, New Haven, Connecticut).
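To make the flux-balance idea in the quote more concrete, here is a minimal sketch in Python. It is not taken from the paper or from Virtual Cell: the function name `finite_volume_step`, its parameters, the one-dimensional geometry, the closed (zero-flux) outer boundaries and the explicit time stepping are all assumptions chosen for brevity. The Krylov-space solvers mentioned in the quote would come into play with implicit time stepping, which this sketch does not attempt.

```python
def finite_volume_step(conc, diffusion, dx, dt, production=None):
    """Advance a 1D reaction/diffusion system by one explicit time step.

    conc       -- concentration in each volume element (list of floats)
    diffusion  -- diffusion coefficient D
    dx         -- width of each volume element
    dt         -- time step; needs dt <= dx**2 / (2 * D) for stability
    production -- optional function mapping a concentration to a reaction rate
    """
    n = len(conc)
    # Flux through each interior face, counted from element i into element i + 1
    # (Fick's law). The two outer boundaries are closed, so they carry no flux.
    face_flux = [-diffusion * (conc[i + 1] - conc[i]) / dx for i in range(n - 1)]

    new_conc = []
    for i in range(n):
        inflow = face_flux[i - 1] if i > 0 else 0.0    # entering from the left
        outflow = face_flux[i] if i < n - 1 else 0.0   # leaving to the right
        # Rate of change = net flux into the element plus local production.
        rate = (inflow - outflow) / dx
        if production is not None:
            rate += production(conc[i])
        new_conc.append(conc[i] + dt * rate)
    return new_conc


# Usage: a pulse in the first element spreads out over five elements.
profile = [1.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(1000):
    profile = finite_volume_step(profile, diffusion=1.0, dx=1.0, dt=0.2)
```

Because every flux that leaves one element enters its neighbor, the total amount of substance is conserved as long as no production term is given, which is the "conservative nature" the authors refer to.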
When can we use deterministic models and when do we have to use stochastic models instead? The authors write:
If the number of molecules involved in a process is relatively small, the fluctuations can become important. In this case, the continuous description is no longer sufficient and stochastic effects have to be included in a model. Single-channel ionic currents are one such example. [...] Stochastic fluctuations of macromolecules are crucial for understanding the dynamics of vesicles and granules driven by competing molecular motors. In the case of a relatively small number of participating particles, a system that would be described deterministically by reaction-diffusion PDEs requires fully stochastic treatment. In this approach, diffusion is described as Brownian random walks of individual particles, and chemical kinetics is simulated as stochastic reaction events. Numerical stochastic simulations in this case are based on pseudo-random-number generation. They are often called Monte Carlo simulations (the term, originally introduced by Ulam and von Neumann in the days of the Manhattan Project) since throwing a dice is actually a way to generate a random number.
They also provide an example of a stochastic model:
As an example, in the Hodgkin-Huxley model, the membrane voltage is treated as a continuous deterministic variable described through a set of differential equations, whereas the single channel behavior is random. A natural way to introduce stochasticity in the model is to replace open probabilities by the actual numbers of open channels. In fact, Hodgkin and Huxley introduced variables in their model to represent the proportion of open gates for various ions. The number of open channels is random and is governed by a corresponding Markov kinetic model that explicitly incorporates the internal workings of the ion channels. Mathematically, the membrane potential is now described by a stochastic differential equation with a discrete random process.
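The idea of replacing open probabilities by actual numbers of open channels can be sketched as follows. This is only an illustration under strong assumptions, not the model from the paper: each channel has just two states (closed and open), the opening and closing rates are constants rather than functions of the membrane voltage, and the membrane potential itself is left out. The function name `simulate_open_channels` and its parameters are made up for this example.

```python
import random


def simulate_open_channels(n_channels, open_rate, close_rate, dt, steps, seed=0):
    """Track the number of open channels in a population of two-state channels.

    In each time step, every closed channel opens with probability
    open_rate * dt and every open channel closes with probability
    close_rate * dt (both products should be much smaller than 1).
    """
    rng = random.Random(seed)
    p_open = open_rate * dt
    p_close = close_rate * dt
    n_open = 0
    history = [n_open]
    for _step in range(steps):
        n_closed = n_channels - n_open
        opening = sum(1 for _c in range(n_closed) if rng.random() < p_open)
        closing = sum(1 for _c in range(n_open) if rng.random() < p_close)
        n_open += opening - closing
        history.append(n_open)
    return history


# Usage: 100 channels with equal opening and closing rates.
history = simulate_open_channels(100, open_rate=1.0, close_rate=1.0, dt=0.01, steps=5000)
```

With equal rates the number of open channels fluctuates around half of the population. A deterministic description would only give the mean, whereas the fluctuations are what the stochastic treatment adds.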
The authors further mention two papers on stochastic methods by Gillespie, which they deem "especially relevant" for Computational Cell Biology (Gillespie (1977): Exact stochastic simulation of coupled chemical reactions; Gillespie (2001): Approximate accelerated stochastic simulation of chemically reacting systems). Regarding the pros and cons of the algorithm described in these papers, the authors write:
The extraordinary efficiency of the Gillespie stochastic kinetics algorithm is achieved by restricting the decision process to selecting which reaction will occur and adjusting the time step accordingly. Focusing exclusively on the reaction avoids consideration of the properties of individual reactive species as discrete entities, which minimizes processing time when the number of reacting species is large. However, processing time increases in proportion to the number of different reactions. Furthermore, the Gillespie approach does not easily accommodate the existence of multiple states of different substrates, which may affect their reactivities, and since individual reactive species are not identified as discrete elements, their states, positions, and velocities within the reaction volume cannot be followed over time.
This type of approach has been utilized in the Virtual Cell to combine the deterministic description of a continuously distributed species (RNA) with the stochastic treatment of discrete particles (RNA granules)[.]
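For readers who have not seen it, the core of Gillespie's exact method (the "direct method" from the 1977 paper) can be sketched in a few lines of Python. The data layout used here (a dict of molecule counts and a list of `(rate_constant, reactants, products)` tuples) and the function name `gillespie` are assumptions made for this example. The propensity calculation is also simplified: it is only valid for reactions whose reactants are all different species.

```python
import math
import random


def gillespie(counts, reactions, t_end, seed=0):
    """Simulate a well-mixed reaction system with Gillespie's direct method.

    counts    -- dict mapping species name to molecule count
    reactions -- list of (rate_constant, reactants, products) tuples, where
                 reactants and products are lists of species names
    """
    rng = random.Random(seed)
    counts = dict(counts)
    t = 0.0
    trajectory = [(t, dict(counts))]
    while True:
        # Propensity: rate constant times the counts of all reactants.
        propensities = []
        for k, reactants, _products in reactions:
            a = k
            for species in reactants:
                a *= counts[species]
            propensities.append(a)
        total = sum(propensities)
        if total == 0:
            break  # nothing can react any more
        # The waiting time until the next reaction is exponentially distributed.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for a, (_k, reactants, products) in zip(propensities, reactions):
            cumulative += a
            if threshold < cumulative:
                for species in reactants:
                    counts[species] -= 1
                for species in products:
                    counts[species] += 1
                break
        trajectory.append((t, dict(counts)))
    return trajectory


# Usage: reversible isomerization A <-> B starting from 100 molecules of A.
steps = gillespie({"A": 100, "B": 0},
                  [(1.0, ["A"], ["B"]), (1.0, ["B"], ["A"])],
                  t_end=10.0)
```

The sketch also shows the limitations named in the quote. The loop over `reactions` makes the cost per step grow with the number of different reactions, and since only counts are stored, individual molecules, their states and their positions cannot be followed.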
What follows is a review of programs used in Computational Neuroscience. The authors mention the programs NEURON and GENESIS, both of which "use cable theory to treat the dynamics of electrical signals in the complex geometries of neurons", which "solves the equation for membrane potential in a series of connected segments with the overall topology of the neuron". Further, they mention the model description language NMODL, which has been added to NEURON, and the interface KINETIKIT, which makes GENESIS work with chemical kinetics.

The authors also write about software that is meant "to build complex biochemical reaction pathways and numerically simulate the time course of the individual molecular species within them", such as GEPASI, Jarnac/Scamp, DBSolve, Berkeley Madonna, ECELL, BioSpice and JSIM.

Then, the authors introduce StochSim:
In this program individual molecules or molecular complexes are represented as discrete software objects or intracellular automata. The time step is set to accommodate the most rapid reaction in the system. [...] When a reaction occurs the system is updated according to the stoichiometry of the reaction. Molecules that exist in more than one state are encoded as “multi-state molecules” using a series of binary flags to represent different states of the molecule such as conformation, ligand binding, or covalent modification. The flags can modify the reactivity of the molecule, and reactions can modify the flags associated with a multi-state molecule.
Compared to the Gillespie algorithm, StochSim is supposed to be faster "in systems where molecules can exist in multiple states".
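The "multi-state molecule" idea, with binary flags that both influence and are changed by reactions, can be illustrated with a small Python sketch. This is not StochSim code. The flag names, the function names and the numbers in the reactivity rule are all made up for the example.

```python
from enum import IntFlag


class State(IntFlag):
    """Binary flags describing the state of a hypothetical multi-state molecule."""
    NONE = 0
    ACTIVE = 1          # conformation
    LIGAND_BOUND = 2    # ligand binding
    PHOSPHORYLATED = 4  # covalent modification


def phosphorylation_probability(state, base=0.1):
    """Flags modify reactivity (the rule and the numbers are invented)."""
    if state & State.PHOSPHORYLATED:
        return 0.0                   # already modified, nothing to do
    if state & State.LIGAND_BOUND:
        return min(1.0, base * 5)    # a bound ligand raises reactivity
    return base


def phosphorylate(state):
    """Reactions modify flags: set the PHOSPHORYLATED bit, keep the others."""
    return state | State.PHOSPHORYLATED


# Usage: a ligand-bound, active molecule is phosphorylated.
molecule = State.ACTIVE | State.LIGAND_BOUND
molecule = phosphorylate(molecule)
```

With three flags a molecule has eight possible states. In a scheme that only counts molecules per species, each of those states would need to be listed as a separate species with its own reactions, which is presumably why an object-based program is said to handle such systems better.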

Next, they write about MCell, "a general Monte Carlo simulator of cellular microphysiology":
MCell utilizes Monte Carlo random walk and chemical reaction algorithms using pseudo-random-number generation. One of MCell’s convenient features is checkpointing, which involves stopping and restarting a simulation as many times as desired. [...] To speed up simulations, MCell is optimized by using 3D spatial partitioning that makes computing speed virtually independent of microdomain geometric complexity. Running parallel computations, another way to speed up Monte Carlo simulations, is also being pursued in MCell.
The paper mentions "microphysiology of synaptic transmission, [...] statistical chemistry, diffusion theory, single-channel simulation and data analysis, noise analysis, and Markov processes" as possible applications of MCell.
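What a Monte Carlo random walk means in practice can be shown with a very small Python sketch. It has nothing to do with MCell's actual algorithms, which work in 3D and handle surfaces and reactions. It only simulates free diffusion in one dimension, and the function name `mean_squared_displacement` and its parameters are assumptions for this example.

```python
import random


def mean_squared_displacement(n_particles, n_steps, diffusion, dt, seed=0):
    """Estimate the mean squared displacement of freely diffusing particles.

    Each particle performs a 1D Brownian random walk: in every time step it
    moves by a Gaussian random distance with standard deviation sqrt(2 * D * dt).
    """
    rng = random.Random(seed)
    sigma = (2.0 * diffusion * dt) ** 0.5
    total = 0.0
    for _particle in range(n_particles):
        x = 0.0
        for _step in range(n_steps):
            x += rng.gauss(0.0, sigma)
        total += x * x
    return total / n_particles


# Usage: 2000 particles, 100 steps of length 0.01, so the elapsed time is 1.
msd = mean_squared_displacement(2000, 100, diffusion=1.0, dt=0.01)
```

For free diffusion in one dimension, theory predicts a mean squared displacement of 2 * D * t, so the estimate should come out close to 2 for these parameters, up to statistical noise.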

Finally comes the main part of the publication, which is about the authors' own program, Virtual Cell:
Simulations of both nonspatial (i.e., ODEs) and spatial (PDEs) models can be performed. For nonspatial models, compartments are assigned appropriate volume fractions relative to their parents in the model topology and surface-to-volume ratios for the proper treatment of membrane fluxes. In spatial models, the segmented regions within a 1D, 2D, or 3D image are connected to the corresponding compartments in the topology. The geometry is prepared for a model in a separate Geometry workspace and can come from a segmented experimental image or can be defined analytically.
The Virtual Cell software displays spatial and nonspatial simulation solutions for the variables over time. The spatial data viewer displays a single plane section of a 3D data set and can sample the solution along an arbitrary curve (piecewise linear or Bezier spline) or at a set of points. Membranes are displayed as curves superimposed on the volume mesh, and membrane variables are displayed along these curves. The nonspatial data viewer plots any number of variables over time on the same plot.
The authors summarize two studies conducted using Virtual Cell, which are about a "model of Calcium dynamics in a neuronal cell" and "stochastic models for RNA trafficking", and conclude:
The Virtual Cell program has several important advantages for stochastic modeling in eukaryotic cells. First, realistic image-based cell geometries are used to define intracellular reaction volumes, which constrain the stochastic behavior of intracellular reactants in unexpected ways. Second, definitions of reactive species can include multiple states described as either discrete parameters or continuous variables, which provide extraordinary contextual richness and behavioral versatility. Third, dynamic transformation and translocation of multiple individual reactive species can be tracked over time, facilitating integration of spatially heterogeneous stochastic models with simultaneous deterministic reaction/diffusion models. A major future challenge for the Virtual Cell will be to integrate dynamic shape changes in the reaction volume within the powerful and flexible stochastic modeling platform already developed. If this can be accomplished, the holy grail of stochastic modeling of cell motility may be attainable using the Virtual Cell.
In the last chapter of the publication, the authors address future challenges for Computational Cell Biology. Among other things, they write:
To improve stability, accuracy, and overall efficiency of numerical simulations, the issues of reaction stiffness in the PDEs, more accurate representation of irregular boundaries, and choice of effective linear solvers need to be addressed. [...] [A]dditional features are being developed, including modeling membrane potential, stochastic processes, lateral diffusion in membranes, and one-dimensional structures such as microtubules and microfilaments. [...] Also needed are computational tools to treat cell structural dynamics to enable the construction of models of such processes as cell migration or mitosis.

Wednesday, 20 September 2017

Computational disease modeling – fact or fiction?

Tegnér et al. (2009): Computational disease modeling - fact or fiction?

In the Abstract, we can learn about the two main approaches towards computational systems biology:
There are two conceptual traditions in biological computational-modeling. The bottom-up approach emphasizes complex intracellular molecular models and is well represented within the systems biology community. On the other hand, the physics-inspired top-down modeling strategy identifies and selects features of (presumably) essential relevance to the phenomena of interest and combines available data in models of modest complexity.
[T]he development of predictive hierarchical models spanning several scales beyond intracellular molecular networks was identified as a major objective. This contrasts with the current focus within the systems biology community on complex molecular modeling.
A couple more quotes from the paper. First, about standards for data-collection and representation:
Successful modeling of diseases is greatly facilitated by standards for data-collection and storage, interoperable representation, and computational tools enabling pattern/network analysis and modeling. There are several important initiatives in this direction, such as the ELIXIR program providing sustainable bioinformatics infrastructure for biomedical data in Europe. Similar initiatives are in progress in the USA and Asia.
Next, about model uncertainty:
Across different application areas, a key question concerns the handling of model uncertainty. This refers to the fact that for any biological system there are numerous competing models. Any discursive model of a biological system therefore involves uncertainty and incompleteness. Computational model selection has to cope systematically with the fact that there could be additional relevant interactions and components beyond those that are represented in the discursive model. For instance, there is often insufficient experimental determination of kinetic values for mechanisms contemplated in a verbal model, leading to serious indetermination of parameters in a computational model. Hence, biological models, unlike models describing physical laws, are as a rule highly over-parameterized with respect to the available data. This means that different regions of the parameter space can describe the available data equally well from a statistical point-of-view.
A successful strategy in computational neuroscience has been to identify minimal models that adequately describe and predict the biology, but at the potential price of selecting a too narrowly focused model. This approach is justified if adequate knowledge of the underlying mechanisms involved in a given condition exists.
An alternative approach, recently employed within the systems biology and computational neuroscience fields, is to search for parameter dimensions (as opposed to individual parameter sets) that are important for model performance. This concept of model ensembles represents a promising approach.
[A] mechanistic model is not very helpful unless there are experimental means to assess its predictive validity[.]
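The statement that "different regions of the parameter space can describe the available data equally well" can be made tangible with a deliberately trivial Python example. The model, the data and the function names are invented for illustration and do not come from the paper.

```python
def model(x, a, b):
    """Hypothetical model in which only the product a * b affects the output."""
    return a * b * x


def sum_of_squares(data, a, b):
    """Sum of squared residuals between the data and the model prediction."""
    return sum((y - model(x, a, b)) ** 2 for x, y in data)


# Made-up data lying exactly on the line y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Three very different parameter sets that fit the data equally well.
fits = [sum_of_squares(data, a, b) for a, b in [(1.0, 2.0), (2.0, 1.0), (0.5, 4.0)]]
```

All three parameter sets give a residual of zero, so the data cannot tell them apart. In a realistic kinetic model the dependencies between parameters are far less obvious, which is where approaches such as the model ensembles mentioned in the quote come in.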
How do systems biology and computational neuroscience differ?
It appears that the systems biology community focuses on intracellular networks whereas computational neuroscience emphasizes top-down modeling.
It must also be recognized that top-down models of insufficient richness may excessively constrain model space and lose predictive ability.
Further, the authors write about theory and formal models:
There is a lack of theory for how to integrate model selection with constraint propagation across several layers of biological organization. Development of such a theory could be useful in modeling complex diseases even when only sparse data is available. One useful practical first approximation is the notion of disease networks – i.e. network representations of shared attributes among different diseases and their (potential) molecular underpinnings.
[In computational systems biology], much attention is given to formal methods of model selection and data-driven model construction. In contrast, in computational neuroscience (with the notable exception of computational neuroimaging), formal model selection methods are almost completely absent.

What it takes to understand and cure a living system

Swat et al. (2011): What it takes to understand and cure a living system: computational systems biology and a systems biology-driven pharmacokinetics-pharmacodynamics platform

This publication serves as a general introduction to Computational Systems Biology, as well as an introduction to the SBPKPD platform, more about which can be found at this website: http://www.sbpkpd.org/

One of the notable features of this platform is its statistical capabilities:
An automated statistical analysis provides parameter estimates with their standard errors, covariance matrix, residual plots and goodness-of-fit measures, such as the Akaike Information and Schwartz criteria.
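As a reminder of what these two goodness-of-fit measures look like, here is a small Python sketch. It uses the common least-squares form of the Akaike and Schwarz (Bayesian) information criteria, which assumes Gaussian errors and drops additive constants. Whether SBPKPD computes them in exactly this form is not stated in the paper, and the function name `aic_bic` is an assumption for this example.

```python
import math


def aic_bic(residuals, n_params):
    """Akaike and Schwarz criteria for a least-squares fit.

    residuals -- differences between observed and predicted values
    n_params  -- number of fitted parameters
    Lower values indicate a better trade-off between fit and model size.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)       # residual sum of squares
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n)   # stronger penalty once n > 7
    return aic, bic


# Usage: four residuals from a fit with two parameters.
aic, bic = aic_bic([1.0, -1.0, 1.0, -1.0], n_params=2)
```

Both criteria reward a small residual sum of squares and penalize additional parameters. The Schwarz criterion penalizes them more heavily as the number of data points grows.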
On the technical foundations of the platform:
To avoid typical problems of accessibility (owing to restriction to one platform or browser type), we based our SBPKPD on the platform-free Java-based Google Widget Toolkit technology. All models are implemented and run in R, a programming language for statistical computing (http://www.r-project.org/). [...] To our knowledge, no tool in this area has been designed so far for execution on an R-based cluster, and we would like to use this exciting possibility for computationally expensive tasks.
Further development will focus on the following things:
With its solid conceptual base and its mathematical background, our SBPKPD platform is suitable for further development into more specialized facilities. In a (semi)automatic in vitro–in vivo correlation system, existing models and approaches such as PK fitting supported by new processes like numerical deconvolution, could establish mathematical relations between the in vitro drug dissolution and its in vivo behaviour. Such an ‘IVIVC’ system could be quite useful for clinical and pharmaceutical research in the process of new drug admission, for which few tools exist, and which are all commercial: it is our goal to stimulate cross-institutional cooperation in this area by providing an open-source simulation and modelling platform, the development of which will also be guided by clinical users informed best about current needs in daily medical practice.
Here are some more quotations from the paper, giving an introduction to Computational Systems Biology in general. First, a quote about the history of systems biology.
Genomics started from biochemistry and then molecular biology. It was paralleled by a development in physics and mathematics, which led to applications of non-equilibrium thermodynamics in biology, mathematical biology and ultimately to metabolic control analysis, flux balance analysis and dynamic network modelling. These two upward movements have since been combined into a scientific discipline called Systems Biology. Systems Biology (SB) aims at understanding how biological function emerges in the interactions between components of biological systems. Ultimately, SB should enable one to understand how improper networking of the macromolecules of living organisms leads to their diseases and how molecular interference may redirect those networks to their proper functioning. SB has progressed to new understanding of the organization and functioning of metabolic and signal transduction pathways in ways that had been impossible with molecular and cell biology, and indeed with functional genomics, alone. Moreover, not even SB has delivered yet the understanding of the functioning of entire organisms, such as in an understanding of disease or in actual drug discovery.
What approach to systems biology is realistic?
[T]he extreme bottom-up approach to whole organism SB that would describe the activity of every individual macromolecule, is not within the reach of the present computation methodologies, and, even worse, not within the reach of the necessary experimentation facilities.
How about pharmacokinetics?
The equations used in pharmacokinetics (PK) [...] use abstractions of physiological processes to fit equations to observed dynamics of the concentrations of drugs in the patient. Parameters again refer to abstractions of real components of the systems; they include ‘distribution volumes’, which often much exceed realistic volumes, as they comprise the effects of partition coefficients. This is fine for quasisteady states, but may not work well in dynamic situations, or when saturable kinetics determines distributions. Indeed, mechanistic PK is probably the most neglected field in the area of medically relevant biosimulations.
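The simplest example of such an abstraction is the textbook one-compartment model for an intravenous bolus dose, sketched below in Python. It is a standard model, not code from SBPKPD, and the function names and the numbers in the usage example are made up.

```python
import math


def concentration(dose, volume_of_distribution, clearance, t):
    """Plasma concentration at time t after an intravenous bolus dose.

    One-compartment model with first-order elimination:
    C(t) = dose / V * exp(-(CL / V) * t)
    """
    k_el = clearance / volume_of_distribution  # elimination rate constant
    return dose / volume_of_distribution * math.exp(-k_el * t)


def half_life(volume_of_distribution, clearance):
    """Time after which the concentration has fallen to half its value."""
    return math.log(2) * volume_of_distribution / clearance


# Usage: 100 mg dose, V = 50 L, CL = 5 L/h.
c0 = concentration(100.0, 50.0, 5.0, t=0.0)
t_half = half_life(50.0, 5.0)
```

Here the distribution volume is simply the number that makes dose divided by volume equal to the initial concentration. A drug that accumulates in tissue produces a low plasma concentration and therefore a fitted volume far larger than any real body compartment, which is the effect the authors describe.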
Why models are often unrealistic:
The lack of quantitative and standardized in vivo measurement techniques at the molecular level forces one to obtain in vitro data in artificial or cell line-derived constructs (e.g. Caco-2) or to interrogate animal models barely resembling the human. The accompanying hurdle is the in vitro-to-in vivo and/or inter-species extrapolation (often based on phenomenological and disputable allometric ‘laws’). Each of these steps is full of simplifications distorting the reality one thinks to observe.
Further, the paper lists tools for model creation:
There exist of course a number of excellent tools for physiologically based whole-body models like SIMCYP, GASTROPLUS and PKSIM or ADAPT II, WINNONLIN, NONMEM and KINETICA for compartmental (population) PK analysis. The tools in the first category suffer from their closed architecture making open source collaboration impossible. On the other hand, tools in the latter category are accessible as standalone applications, running to a large extent under Windows only. Their user-friendliness varies between very sophisticated but expensive, and disputable (e.g. Fortran syntax in NONMEM) but free or inexpensive. [...] In this paper we shall introduce SBPKPD, a platform for [...] an open-source collaboration.
A number of open-source model repositories exist, with a broad spectrum of models and simulation facilities (Java Web Simulation - JWS and Biomodels.net, or more specialized (e.g. CCDB, which contains cell cycle-related models only)). Together, JWS online and Biomodels store hundreds of kinetic models for metabolic, signal transduction and gene-expression pathways.
Regarding Java Web Simulation, the authors further explain:
JWS online also offers the possibility to run simulations and multiple analysis options (e.g. steady-state and metabolic-control analysis) for any of its models online, i.e. without downloading of software tools. This is what defines it as a ‘live’-model repository, i.e. the models are alive through the web. Through the web, one can change parameter values in any of the models and calculate the implications for model behaviour. One can also determine which steps in a modelled pathway most determine a specific flux or concentration. The view is to make mathematical models produced by SB useful to scientists who are ignorant of mathematics. The use of JWS online is close to experimentation. It may be important that quality control of models is disentangled from the application or validation of the models. If these important activities are mixed, internal inconsistency of modelling may cover up for lack of experimental validation.
JWS online also has the perspective of the silicon organism, also called the virtual biochemical organism (human) (http://vbhuman.org/). This means that it hopes that its models can be linked up with each other such that they grow, ultimately to cover significant parts of entire organisms. This may seem less efficient than the approach of genome-wide kinetic models for entire organisms, but it may not be. The automobile industry is using modular production lines to improve the robustness of the overall production flow to fluctuations in the activities in individual steps. Modularity also makes the quality control manageable. Checking the quality of a genome-wide model is impossible for any individual because of the great complexity. Scientific experts may still be able to check the quality of pathway models.
An important deliverable of the JWS online and Biomodels facilities will become the connecting of adjacent models into larger models of part of the whole cell. Such an activity could greatly reduce the total complexity of the modelling of whole organisms. Success is not guaranteed however; it will depend on whether the biological function is indeed modular and on advances in multi-scale modelling approaches. The organization of whole organisms into tissues, of tissues into cells, and of cells into organelles, as well as the separation between transcription, translation and metabolism, suggests that biology is indeed modular, perhaps because of the same robustness requirements as the automobile industry. At the same time, where such obvious modules are absent this may signal a functional reason, and the approach might not work.

Friday, 15 September 2017

Model approach for stress induced steroidal hormone cascade changes in severe mental diseases

Since I have been talking about it with a fellow Doctor of Medicine, I would like to point out that my most important scientific publication so far, "Model approach for stress induced steroidal hormone cascade changes in severe mental diseases", can be accessed free of charge here.

The publication has not yet gained the attention it deserves. Basically, it proposes a model of how changes to the steroidal hormone cascade might be the cause, or at least a symptom, of several mental illnesses. What this publication does not mention is that we have found that applying high doses of isoflavones alters the steroidal hormone cascade, which has a beneficial effect in severe mental illness.

Direct link to the PDF file

Artificial Life: An Introduction

In addition to the subjects I listed in my blog post "New Year's Resolutions", I would like to learn more about the emerging field of artificial life. I chose the paper "Open Problems in Artificial Life" (published in 2001) as an introduction to the matter. Here are some crucial quotes from that paper:
In contrast with mathematics, artificial life is quite young and essentially interdisciplinary. The phrase “artificial life” was coined by C. Langton (1986), who envisaged an investigation of life as it is in the context of life as it could be. [...] This broadly based area of study embraces the possibility of discovering lifelike behavior in unfamiliar settings and creating new and unfamiliar forms of life, and its major aim is to develop a coherent theory of life in all its manifestations, rather than an historically contingent documentation bifurcated by discipline. [...] Artificial life is foremost a scientific rather than an engineering endeavor. Given how ignorant we still are about the emergence and evolution of living systems, artificial life should emphasize understanding first and applications second, so the challenges we list below focus on the former.
The challenges that sound most intriguing to me are:
Achieve the transition to life in an artificial chemistry in silico.
Artificial chemistries are computer-based model systems composed of objects (abstractions of molecules), which are generated by collision between existing objects according to a predefined interaction law. [...] Bimolecular chemistry is assumed to be sufficient to display the transition to life, but this may involve complex structures. The chemistry may be stochastic rather than deterministic, but should be constructive rather than descriptive; that is, an interaction law should predict (like an algorithm) the product molecules for colliding objects of arbitrary complexity. [...] Artificial chemistries have been investigated by many authors in spaces of various dimensionalities, with deterministic and probabilistic interaction laws. Molecules have been abstracted using cellular automata, secondary structure folding algorithms, finite state automata, Turing machines, von Neumann machines, and the lambda calculus.
Simulate a unicellular organism over its entire lifecycle.
The artificial organism should exhibit virtually its complete spectrum of behavior, including its ability to evolve. [...] The integration of the simulation of many thousands of proteins, and genetic as well as regulatory networks, at the level of deterministic kinetics would already provide important novel quantitative understanding of cell cycle dynamics. However, for moderate completeness, simulating the folding of all biopolymers and their reactions and supramolecular interactions is still a formidable challenge, since current successes in folding are statistical rather than ab initio, and vast progress in integrating molecular dynamics on time scales of minutes to hours is needed. [...] [C]ombinations of (for example) reaction kinetics, molecular dynamics simulations, and lattice gas simulations would be more powerful than any single simulation approach.
Determine what is inevitable in the open-ended evolution of life.
In different historical unfoldings of the evolutionary process and in evolution in other media, two related questions arise: (a) What are the features common to all evolutionary processes, or to broad classes of evolutionary processes? (b) Do different evolutionary processes contain fundamentally different evolutionary potential?
Determine the predictability of evolutionary consequences of manipulating organisms and ecosystems.
The ecosystems of interest include those as different as the entire global biosphere and individual human immune systems, and ecological manipulations range from industrial pollution, climate change, and large-scale mono-crop agriculture to the introduction of genetically engineered organisms. [...] How far can one rationally redesign or rapidly select organisms to fulfill multiple novel criteria without disturbing the viability of the organisms’ organization and defense systems? Is there a tradeoff between utility and viability, or between size of modification and duration of organism utilization? [...] With increasing understanding of the genetic control of development, it will be possible to create novel multicellular organisms through sequential genetic reprogramming. Do we need long-term evolutionary optimization to support or perfect such major changes to organisms?
Develop a theory of information processing, information flow, and information generation for evolving systems.
Firstly, there appear to be two complementary kinds of information transmission in living systems. One is the conservative hereditary transmission of information through evolutionary time. The other is transmission of information specified in a system’s physical environment to components of the system, possibly mediated by the components themselves, with the concomitant possibility of a combination of information processing and transmission. The latter is clearly also linked with the generation of information[.] [...] Secondly, the challenge is to unify evolution with information processing. One starting point is the observation that components of evolving systems (organisms or groups of organisms) seem to solve problems as part of their existence. More generally, theory must address what the capacity of an evolving system’s information processing is, and how it changes with evolution. Are there thresholds between levels of information processing during evolution that match the levels identified in automata theory—for example, from finite state machines to universal computation? How do the algorithms employed by organisms classify in terms of their problem solving efficiency? The third and least-understood role of information is its generation during evolution. As evolution takes place, evolving systems seem to become more complex; successfully quantifying complexity and its increase during evolution is one important part of understanding information generation. Another problem in this area is that of understanding how complexity in an evolving system’s environment can affect the complexity of the organisms that are evolving within the environment.
Demonstrate the emergence of intelligence and mind in an artificial living system.
Two deep issues in this area arise for artificial life. The first is substantive: whether and, if so, how the natures of life and mind are intrinsically connected. The second is methodological: whether it is most profitable to study mind and intelligence only when embodied in living systems. Both issues motivate artificial life’s existing attention to autonomous agents and embodied cognition, and they bear on artificial life’s relation to its elder sister, artificial intelligence. Progress on this challenge will shed new light on many current controversies in both fields, such as the extent to which life and mind should be viewed as “computational.” A constructive approach to all these concerns is to try to demonstrate the emergence of intelligence and mind in an artificial living system.
I am looking for people with similar interests, and perhaps even experience in these fields, in order to discuss the state of this emerging branch of science and possibly cooperate on projects.