Y-Chromosome Single Nucleotide Polymorphism testing

Y-Chromosome Single Nucleotide Polymorphism testing aka Y-SNP testing explained

Y-Chromosome Single Nucleotide Polymorphisms or Y-SNPs (pronounced “snips”) are used in Y-Chromosomal DNA (Y-DNA) testing for Haplogroup and Haplotype confirmation. SNPs are defined as: “A single-nucleotide polymorphism (SNP is pronounced snip) is a DNA sequence variation occurring when a single nucleotide adenine (A), thymine (T), cytosine (C), or guanine (G) in the genome (or other shared sequence) differs between members of a species or paired chromosomes in an individual.” SNPs need to be confirmed by specific DNA SNP tests and are resolved as a series of SNPs, from the basic Haplogroup down into the specific haplotypes or sub-types. The ages of Haplogroups are measured in the thousands and tens of thousands of years.

In general, Haplogroups can be estimated from the first 10 or so STRs (short tandem repeats) or Y-DNA markers, but the actual haplotype must be confirmed with a specific SNP test.

SNPs, when tested, are either positive ( + ), also called derived, or negative ( - ). By following the positive SNPs to the end of the chain, one can see the progression of confirmations through the Phylogenetic Tree down to the last, or terminal, SNP that is derived. See the "Another example" paragraph below.
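Following the positive results down the chain can be sketched in a few lines of Python. This is only an illustration, not any testing company's method: the function name `terminal_snp` and the sample results are hypothetical, and the chain is a short excerpt of the R1a example cited later in this article. The sketch assumes a single unbranched chain, so a negative result ends the walk.

```python
from typing import Optional

# Ordered from older (higher on the tree) to younger (lower on the tree).
# Excerpt of the R1a chain cited later in this article.
CHAIN = ["M207", "M173", "M420", "M459", "M17", "M417", "Z645", "Z283", "Z282"]


def terminal_snp(chain: list[str], results: dict[str, str]) -> Optional[str]:
    """Return the last SNP in the chain that tested positive ("+").

    SNPs that were never tested are skipped. A negative ("-") result
    stops the walk (assumption: one unbranched chain).
    """
    last_positive = None
    for snp in chain:
        status = results.get(snp)
        if status == "+":
            last_positive = snp
        elif status == "-":
            break
    return last_positive


# Hypothetical results: positive down to Z283, negative for Z282.
results = {"M207": "+", "M173": "+", "M417": "+", "Z283": "+", "Z282": "-"}
print(terminal_snp(CHAIN, results))
```

With these made-up results the terminal SNP is Z283, the last positive before the negative Z282.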

Color coding and confusion
DNA testing companies like FTDNA.com (Family Tree DNA) use the color green to represent a positive ( + ) or derived SNP test. SNPs shown in red are estimated; the person has not been tested for that SNP. Past SNP test results are also listed in green, showing the last positive ( + ) SNP tested at that time. This means that those who SNP tested many years ago have a lower resolution SNP result compared to today's SNP testing. The newer SNPs are confirmed by tests such as the Big Y test at FTDNA. Without understanding this, Y-DNA genetic groups with a common genealogical ancestry can appear to have multiple levels or types of positive or tested SNPs when that is not actually the case. This can cause much confusion, because the older, lower resolution SNP results are displayed alongside the newer ones tested for today.

For an example of this at FTDNA, please see the Carpenter Cousins Y-DNA Project version at FTDNA: https://www.familytreedna.com/public/carpenter%20cousins%20%20dna/default.aspx?section=ycolorized - Group 2 provides an example of different green colored SNP names tested at different times.
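The point that differently named green SNPs within one group need not conflict can be sketched as follows. This is a hedged illustration only: the helper names `on_same_chain` and `higher_resolution` and the two sample results are hypothetical, and the chain is an excerpt of the R1a example cited later in this article.

```python
# Ordered oldest (top of the tree) to youngest; excerpt of the R1a chain
# from the Carpenter Cousins example cited in this article.
CHAIN = ["M173", "M420", "M459", "M17", "M417", "Z645", "Z283", "Z282"]


def on_same_chain(chain, snp_a, snp_b):
    """True when both SNPs sit on this one chain, so neither result
    contradicts the other; one is simply further down the tree."""
    return snp_a in chain and snp_b in chain


def higher_resolution(chain, snp_a, snp_b):
    """Return whichever SNP is lower on the tree (the newer, finer result)."""
    return max(snp_a, snp_b, key=chain.index)


# Hypothetical group members: one tested years ago, one tested recently.
older_result, newer_result = "M17", "Z282"
print(on_same_chain(CHAIN, older_result, newer_result))
print(higher_resolution(CHAIN, older_result, newer_result))
```

The older result is not wrong; it stops higher on the same chain. It does not prove that the earlier tester would also be positive for the lower SNP, only that the two green labels are not in conflict.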

Background
Before we go into the understanding or resolution of such SNP confusion, the next couple of paragraphs provide a quick review of some important points.

It is very important to understand that “Genetic Genealogy” is the use of genealogical techniques together with specific DNA testing to help focus research or overcome choke points in genealogical research, and that this is often done by triangulation. “Genetic Genealogy” is NOT genetic anthropology, the DNA study of humankind.

The time before genealogical records is often referred to as "Deep Ancestry." Any DNA or genetic ancestry testing at this level is anthropological; it delves into the DNA data points of genetic anthropology. Genealogy uses personal data (documentation) to make familial connections from one person to another, whereas anthropology uses impersonal, unrelated data points in computational models, involving things such as geological time and the place of material, to make sense of those data points or to determine their logical sequence.

Y Chromosomal SNP (Y-SNP) testing determines the Haplogroup and its sub-types or haplotypes from the very deep of human ancestry towards the present. This is done in a mathematical progression that is subject to several variables, and the results must be taken as estimates. One example: Y-SNP prediction methodology closely follows the classic S-curve of the binary logistic regression formula: y=exp(a+b*x)/(1+exp(a+b*x)).
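To make the S-curve concrete, the logistic formula can be evaluated directly. This is a minimal sketch: the article does not say what x, a and b stand for in Y-SNP prediction, so the coefficients below are placeholders chosen only to show the shape of the curve.

```python
import math


def logistic(x: float, a: float, b: float) -> float:
    """Binary logistic curve: exp(a + b*x) / (1 + exp(a + b*x))."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))


# Placeholder coefficients (assumption): not taken from any Y-SNP model.
a, b = -5.0, 1.0
for x in range(0, 11, 2):
    print(x, round(logistic(x, a, b), 3))
```

The printed values rise slowly at first, climb steeply around the midpoint where a + b*x = 0 (y is exactly 0.5 there), then level off towards 1, which is the S shape the formula describes.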

Y-SNPs are expressed in either a longhand format or a shorthand format. This is where much (most!) of the confusion in understanding SNPs occurs.

Longhand versus Shorthand SNP designations
The longhand format was a re-organization of over 15 very different regional models, completed by the Y Chromosome Consortium (YCC) via a scholarship group that provided compiled information on the YCC Repository, NRY polymorphisms and changes in nomenclature. The nomenclature system published in 2002 and updated in 2008 is widely used in papers on Y chromosome variation today. While the YCC is no longer active, the International Society of Genetic Genealogy (ISOGG) web-based Y-DNA Haplogroup Tree continues the longhand methodology of the YCC nomenclature.

The shorthand format focuses on the major Haplogroup followed by the SNP name, which is either estimated (unconfirmed, shown in red) or confirmed (derived ( + ), shown in green). This specific focus on the SNP name can cause major confusion when one does not understand the Y-DNA Tree in general. Simply put, Haplogroup A comes before B, and this sequence is generally (mostly!) followed down to Haplogroup T. There have been changes in the estimated ages of various Haplogroups since then, causing adjustments to be made. To see the basic phylogenetic structure, see: https://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroup#Phylogenetic_structure

Using Haplogroup R as an example, it has sub-groups or haplotypes defined by confirmed SNPs. A brief view of its sub-tree can be seen at: https://en.wikipedia.org/wiki/Haplogroup_R_(Y-DNA)#Structure – Please note that the pedigree shows the defined SNPs with their longhand classification.

Another example of the Y-Chromosome Phylogenetic Tree (Y-Tree) breakdown comes from the Carpenter Cousins Y-DNA Project and the example of FTDNA kit number 5734 at: https://carpentercousins.com/RealDeepAncestry.pdf This example follows the longhand progression of Haplogroups to their haplotypes, along with the specific SNPs tested and an estimate of when those specific SNPs occurred. I cite that specific portion here with its references removed. For its references, see the link cited just above.

Y Chromosome Phylogenetic Tree breakdown by Major Haplogroups then Haplotypes:
A (Sample SNPs M42, PR2921, M94, etc) – abt 140,000 years ago
B (SNP M168, P9, M181) – abt 65,000 years ago
F (SNP M89, M213) – abt 50,000 years ago
K (SNP M9, P128) – abt 48,000 years ago
P (SNP M45, M74) – abt 39,000 years ago
R (SNP M207, M306) – abt 32,000 years ago
R1 (SNP M173, M306, P225) – abt 26,000 years ago
R1a (SNP M511, M513, M420) – abt 23,000 years ago
R1a1 (SNP M459, SRY 10831.2) – abt 21,000 years ago
R1a1a (SNP M17, M198) – abt 15,000 years ago
R1a1a1 (SNP M417) – abt 7,000 years ago
R1a1a1b (SNP Z645, S441, CTS4385) – abt 6,500 years ago
R1a1a1b1 (SNP Z283, S339, PF6162) – abt 6,000 years ago
R1a1a1b1a (SNP Z282, S198) – abt 5,500 years ago - shorthand code example: R-Z282
R1a1a1b1a3~ (SNP Y2395) – abt 5,000 years ago
R1a1a1b1a3b? (SNP Z284) – maybe abt 4,000 years ago
R1a1a1b1a3c~ (SNP YP694) – maybe abt 3,000 years ago
R1a1a1b1a3c~? (SNP YP6281) – maybe abt 2,500 years ago
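The relationship between the longhand labels and the shorthand code (such as R-Z282) can be sketched in Python. This is an illustration under stated assumptions, not an ISOGG or FTDNA tool: the table holds only a few rows copied from the breakdown above, and the helper names `shorthand` and `find_longhand` are hypothetical.

```python
from typing import Optional

# (longhand label, SNPs) rows copied from the breakdown above (excerpt only).
TREE = [
    ("R", ["M207", "M306"]),
    ("R1", ["M173", "M306", "P225"]),
    ("R1a", ["M511", "M513", "M420"]),
    ("R1a1a1b1", ["Z283", "S339", "PF6162"]),
    ("R1a1a1b1a", ["Z282", "S198"]),
]


def shorthand(longhand: str, snp: str) -> str:
    """Shorthand form: major Haplogroup letter, a hyphen, then the SNP name."""
    return f"{longhand[0]}-{snp}"


def find_longhand(tree, snp: str) -> Optional[str]:
    """Return the first longhand label whose row lists the SNP, else None."""
    for longhand, snps in tree:
        if snp in snps:
            return longhand
    return None


print(shorthand("R1a1a1b1a", "Z282"))
print(find_longhand(TREE, "Z282"))
```

The first call gives R-Z282 and the second gives R1a1a1b1a. The shorthand keeps only the major Haplogroup letter and the SNP name, so the position on the tree has to be looked up, which is why the shorthand is hard to read without knowing the tree.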

For the current view of the R HaploTree from ISOGG, please see: https://isogg.org/tree/HaplogroupR2019.html - See the link there to download the current version into Excel or a similar spreadsheet. For example, use a keyword search or find function for Z282 to see the example cited above.

Conclusion
In conclusion, the older or lower resolution SNPs are higher on the tree. The lower a defined SNP is on the tree, the farther away in time it is from the top of the tree and the closer it is to the present. This is where you will find the newer or higher resolution SNPs.

Please remember that SNPs are useful for "Deep Ancestry", the time BEFORE the genealogical time period.