The notion of a worldwide HGDP was stifled by a couple of things. The upshot was that the HGDP was never funded. What exactly are the ‘populations’, and what does ‘diversity’ properly include? The worthiness, potential, and humane need for properly sampling humans beyond the major large populations in Europe and North America are obvious. However, the new paper makes the case mainly for the larger ‘mainline’ populations other than Europeans.

Unfortunately, even though they are numerous in the census sense, these populations are heterogeneous, and it is unclear who, exactly, and exactly how, current data represent them. Can we just blithely say we need to include ‘Africans’ to address the representativeness problem? Are African-Americans, for example, not to mention ‘Hispanic-Americans’, among the possible examples? The same questions apply to Asians.

The current paper deals with these issues at least somewhat. But then what about, say, New Zealand natives, Cherokees, or other small populations? And from which castes, and from which parts of India, must we collect data? How exhaustively should we sample, and how can complex genomes effectively be parsed in this way (not forgetting environments, a subject at least acknowledged by Sirugo et al.)? Now, while I concur that increasing the sampling of individual diversity is very important for many reasons, not least fairness, the paper also promises that doing so will increase or improve ‘precision’ medicine.

To me, that is sloganeering, and it avoids facing up to what Big Data ‘omics’ have already shown us about the causal difficulty of the key non-Mendelian traits: complexity not only in the genomic sense but also in the environmental one. There are many obvious, but evidently easily overlooked, known reasons for this. First, ‘genetic’ causation involves more than inherited genetic variation. Important variation arises during life, when cells divide.

This somatic variation is genetic but is not sampled in the usual genome-sequencing way. Yet somatic variation obviously has important consequences, because a cell doesn’t ‘know’ whether its genome sequences were inherited from the individual’s parents or arose during the individual’s life. Second, the whole enterprise assumes that induction can lead to deduction, that is, that what we’ve seen in the past allows us to predict the future.

It is not just inherited and somatic mutations whose future is generally unpredictable; the same is true of lifestyle exposures. Yet lifestyle exposures are essential components of complex disease risks. They cannot be predicted, even in principle. That means past exposures (to environments or mutations) do not predict future ones. This is not a dark secret, no matter how inconvenient it is for the ‘omics’ prediction sectors. Unlike in many areas of chemistry and physics, induction does not lead to deduction in life. What we need is a deep rethinking of the nagging problem of genetic effects on disease and other traits.

But that is not easy to bring about when professions and institutions depend on large, very predictable, basically permanent funding for the people involved. To improve these aspects of our science, we need a different way to support it: new economics, not bigger data or even more sequencing. I also have to note that the propensity to ignore, or be ignorant of, prior work is on display in this paper, which does not mention the HGDP.

We are in an “I’m first!” period in science. I believe Shakespeare grasped the clearer truth: ‘What’s past is prologue’. Ideas need to be followed up, and properly sampling the world is one particularly good idea. But this paper doesn’t really deal with the small, traditional aboriginal populations. In the case of the HGDP effort, there was simply too little support for sampling small, relatively isolated populations to build a picture of individual genomic diversity in the context out of which it actually arose. Nonetheless, it was an effort that recognized the problems explicitly, as they stood in those days.