Researchers at the Big Data Institute at Oxford University have achieved a remarkable feat. They have created the largest-ever human family tree, comprising 231 million lineages.
What makes a human being? The human genome provides the vast amount of information needed to build a human. And what is the human genome? It is the complete set of nucleic acid sequences for humans, encoded as DNA in the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. The two are usually treated separately as the nuclear genome and the mitochondrial genome. The human genome contains both protein-coding DNA genes and noncoding DNA.
How did we find this out? Scientists first started mapping the entire human genome in the 1990s as part of the Human Genome Project. The first almost complete sequence was published in 2001 by the International Human Genome Sequencing Consortium, “with the sequence of the entire genome’s three billion base pairs some 90 percent complete”. The full sequence was completed and published in April 2003, but even then there were gaps. In any case, this revealed that there are about 20,000 human genes that code for proteins. However, there are others that do not encode proteins but instead express regulatory RNA. Plus, there are also micro-RNA genes.
So, it is quite clear that there is a lot of data required to make a human and we still do not have access to all of it. Further complications arise when we take into account the different technologies and formats used, as well as the differences in samples.
What is clear, though, is that our DNA is not only what makes us human but also a link to our past through our ancestral lines. This has been further clarified by the Big Data Institute, whose researchers have traced “the entirety of genetic relationships among humans: a single genealogy that traces the ancestry of all of us”.
According to the team, “The past two decades have seen extraordinary advancements in human genetic research, generating genomic data for hundreds of thousands of individuals, including from thousands of prehistoric people. This raises the exciting possibility of tracing the origins of human genetic diversity to produce a complete map of how individuals across the world are related to each other.”
They devised a new method, “which can easily combine data from multiple sources and scale to accommodate millions of genome sequences”.
Dr Yan Wong, an evolutionary geneticist at the BDI and one of the principal authors, explained: ‘We have basically built a huge family tree, a genealogy for all of humanity that models as exactly as we can the history that generated all the variation in the modern human genome. This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome.’
As per the news article, “Since individual genomic regions are only inherited from one parent, either the mother or the father, the ancestry of each point on the genome can be thought of as a tree. The set of trees, known as a ‘tree sequence’ or ‘ancestral recombination graph’, links genetic regions back through time to ancestors where the genetic variation first appeared.”
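To make the idea of one tree per genomic region concrete, here is a minimal sketch in Python. It is only an illustration, not the researchers' method or data: the genome length, the breakpoint, the sample names ("A", "B", "C") and the ancestor names ("x", "y", "z", "w") are all invented for the example.

```python
from bisect import bisect_right

# Hypothetical toy data: a genome of length 100 with one recombination
# breakpoint at position 40. Interval i covers
# [BREAKPOINTS[i], BREAKPOINTS[i + 1]).
BREAKPOINTS = [0, 40, 100]

# One tree per interval, stored as a child -> parent map (None marks the root).
LOCAL_TREES = [
    {"A": "x", "B": "x", "x": "y", "C": "y", "y": None},  # positions 0-39
    {"B": "z", "C": "z", "z": "w", "A": "w", "w": None},  # positions 40-99
]


def tree_at(position):
    """Return the local tree that describes ancestry at this genome position."""
    if not BREAKPOINTS[0] <= position < BREAKPOINTS[-1]:
        raise ValueError(f"position {position} is outside the genome")
    index = bisect_right(BREAKPOINTS, position) - 1
    return LOCAL_TREES[index]


def lineage(sample, position):
    """Follow one sample back through its ancestors at a given position."""
    tree = tree_at(position)
    path = [sample]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path


def most_recent_common_ancestor(a, b, position):
    """Return the first ancestor shared by two samples at a given position."""
    ancestors_of_a = set(lineage(a, position))
    for node in lineage(b, position):
        if node in ancestors_of_a:
            return node
    return None
```

In this toy data, samples A and B share the recent ancestor "x" at position 10, but at position 70 their lines only meet at the root "w". The same two people can therefore be closely related in one stretch of the genome and more distantly related in another, which is why a single family tree is not enough.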
Lead author Dr Anthony Wilder Wohns, who undertook the research as part of his DPhil at the BDI, said, “Essentially, we are reconstructing the genomes of our ancestors and using them to form a series of linked evolutionary trees that we call a ‘tree sequence’. We can then estimate when and where these ancestors lived. The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples.”
For the analysis, the researchers integrated data on modern and ancient human genomes from eight different databases, covering a total of 3,609 individual genome sequences from 215 populations. Amongst the ancient genomes were “three Neanderthal genomes, a Denisovan genome, and a family of four people who lived in Siberia around 4.6 thousand years ago. The algorithms predicted where common ancestors must be present in the evolutionary trees to explain the patterns of genetic variation. The resulting network contained almost 27 million ancestors.”
Adding location data to the samples also enabled them to estimate where the predicted common ancestors had lived. The results successfully recaptured key events in human evolutionary history, including the migration out of Africa.
They then converted the data into tree sequences: graphs in which nodes represent the various lineages and which also record mutations. This means that they can pack huge databases into a relatively small space that can be accessed easily. The researchers explain their work further in an accompanying video.
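Why a graph of nodes and mutations saves space can be shown with a small Python sketch. It assumes a simple table layout in which each parent-child link is stored once, together with the stretch of genome over which it holds. The class names, node names and positions are hypothetical and chosen for illustration; this is not the study's code or file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    left: int    # start of the genomic interval (inclusive)
    right: int   # end of the genomic interval (exclusive)
    parent: str
    child: str


@dataclass(frozen=True)
class Mutation:
    position: int
    node: str    # node on whose lineage the mutation first appeared


# Hypothetical toy data: three samples and two ancestors on a genome of
# length 100. Links that hold along the whole genome are stored only once.
SAMPLES = ["A", "B", "C"]
EDGES = [
    Edge(0, 100, "x", "B"),
    Edge(0, 40, "x", "A"),
    Edge(40, 100, "y", "A"),
    Edge(0, 100, "y", "x"),
    Edge(0, 100, "y", "C"),
]
MUTATIONS = [Mutation(10, "x"), Mutation(55, "C"), Mutation(70, "x")]


def parent_map(position):
    """Rebuild the child -> parent map of the tree covering this position."""
    return {e.child: e.parent for e in EDGES if e.left <= position < e.right}


def carriers(mutation):
    """List the samples that inherit a mutation, by walking up the local tree."""
    parents = parent_map(mutation.position)
    result = []
    for sample in SAMPLES:
        node = sample
        while node is not None and node != mutation.node:
            node = parents.get(node)
        if node == mutation.node:
            result.append(sample)
    return result
```

Here five stored links are enough to rebuild both local trees, which would need eight links if each tree were written out in full. Each sample's sequence is never stored directly. It is recovered from where the mutations sit in the trees, so the mutation at position 10 on ancestor "x" is carried by A and B, while the one at position 70 on the same ancestor reaches only B.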
The team plans to add more genetic data as it becomes available to make the tree even more comprehensive. Because tree sequences store data very efficiently, it can accommodate millions more genomes.
“This study is laying the groundwork for the next generation of DNA sequencing. As the quality of genome sequences from modern and ancient DNA samples improves, the tree will become even more accurate and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today,” said Dr Wong. He further added, “While humans are the focus of this study, the method is valid for most living things; from orang-utans to bacteria. It could be particularly beneficial in medical genetics, in separating out true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history.”
The study has been published in Science.