Human ancestry: how to run your own PCA and ADMIXTURE analyses for human evolutionary and genealogical studies

Two days ago, in the post announcing the revised version (October 2017) of the Indo-European demic diffusion model, I wrote about dumping the information I had on doing PCA and ADMIXTURE analyses as unreviewed ‘drafts’ in the new section of this website called Human Ancestry.

I had some time today to review them and to correct gross mistakes in the texts, so they should be more usable now.

I began to work with free datasets to see if I could learn something more about the results of recent genetic research by using the available free software. For the moment, I don’t see the need to continue working with samples myself, because there are many professionals in Bioinformatics doing an excellent job with their publications – much better than I could do – and publishing results early (as pre-prints) and under free licenses, which allow us to reuse and modify their material. Working with their samples again seems, most of the time, like reinventing the wheel.

After all, my interpretation of Indo-European migrations does not depend on my own analysis of free datasets – or on genetic analysis, or on archaeological fieldwork, for that matter – but on the study of all anthropological questions involved. I am actually more interested in Linguistics, and – only marginally – in Archaeology, as is the field of Indo-European Studies in general.

I did find certain interesting aspects that I have commented on in the model, though: by labelling all samples and reading about them carefully (usually in the supplementary notes of the published papers), you can observe certain patterns and derive some information that others might have missed. Examples include the Corded Ware outlier from Esperstedt (see more on the Corded Ware migration) and the differences among the three samples from early Khvalynsk.

Now that most published data seem to keep supporting what I have suggested – regarding the more complex nature of the steppe component (the so-called ‘Yamnaya component’), the migration from Yamna to Bell Beaker, and the migration of a different population (and probably language) with Corded Ware – I don’t find it worthwhile to spend more of my quite limited time on these tasks.

However, if I need to work with datasets again, I will try to complete the drafts as best I can, especially the parts on F3 statistics and qpGraph, which I didn’t even try. If you want to help improve the sections, you are of course welcome.

If I find time, I might be able to help with your work. And even though modern genealogy does not interest me (for the moment), I guess it can also be relevant for drawing conclusions about more recent migrations, so if I can be of help to any interesting work, I will be glad to do so.

3D plot of the Minoans and Mycenaeans + Scythians and Sarmatians datasets, using the same colours as in the Indo-European demic diffusion model.

Related:

  • The concept of “outlier” in studies of Human Ancestry, and the Corded Ware outlier from Esperstedt
  • New Ukraine Eneolithic sample from late Sredni Stog, near homeland of the Corded Ware culture