Analysis I-M223 Members

Ed Ralston
November 6, 2018

One factor that differentiates the I-M223 members of the Ralston Project is that all the members appear to be related, descending from a most recent common ancestor (MRCA).  This analysis shows the following:

  • All the I-M223 members are indicated by Family Tree DNA as being related at the 37 and 67 marker tests. (Those who have tested at 111 markers are likewise related.)   Matrices showing the genetic distances at 37 and 67 markers are shown below.
  • With any group of related y-DNA matches, the modal or most common value for each STR is the presumed STR value for the most recent common ancestor (MRCA). This presumes that the matches all generally inherited the y-DNA of the MRCA, except where mutations occurred.  Charts showing these mutations, or differences from the mode, for each member at 37, 67, and 111 markers are shown below.
  • Matrices below show the genetic distance between each member and the number of mutations of each member to the MRCA.
  • At least two pairs of cousins exist amongst our members. By comparing the STR mutations of cousins to the modal values, one can tell if a member’s mutations occurred before or after the intermediate ancestor (the ancestor the cousins have in common).  This discussion is below.
  • Some family lines mutate faster than others, which makes it difficult to estimate time to the MRCA based on genetic distance. See below for more information.
  • Sometimes a mutation on a marker might show a difference of two steps compared to the mode or others. How is genetic distance calculated in this case?   See below.
  • Some STRs mutate faster than other STRs. If a mutation occurs on one of these markers it may not be an indication of the passage of many generations.  Some researchers disregard the fastest mutating STRs when doing analysis.  Charts showing the members’ differences from the mode, disregarding the fast-mutating STRs, at 37, 67, and 111 markers are shown below.
  • This analysis implies the possibility that these members are even more closely related than thought. A hypothetical descendancy tree from the MRCA is shown below.

A limitation of this and any y-DNA analysis is that a mutation might occur on an STR several generations back, then at some point that same STR might mutate again and revert to the original value, which would look like no mutation at all.  This is not likely, except on markers that mutate quickly.  Another complication is that all or the majority of the members of a group of matches might coincidentally have mutations on the same marker to the same value, which would skew the modal value.  The larger the group, however, the more remote this possibility is.

The following charts are genetic distance matrices at 37 markers and 67 markers for the I-M223 members of the Ralston Project.  These matrices show that these members are all related to each other.

37 marker matrix:

Note:  Although 206275, 821573 and 783383 show as “not related” at this level to a few others, they are still included in the group because they are related to other members of the group.

67 marker matrix: 

Note:  At 67 markers, 206275 and 783383 are now in the “Related” category for all other members.
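The idea behind a genetic distance matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the member names and STR values are hypothetical (they are not project results), and the simple step count used here is not necessarily how Family Tree DNA computes its own figures.

```python
from itertools import combinations

def distance(hap_a, hap_b):
    """Simple step count: sum of absolute differences on each marker."""
    return sum(abs(hap_a[m] - hap_b[m]) for m in hap_a)

def distance_matrix(haplotypes):
    """Build a symmetric table of pairwise distances between members."""
    names = list(haplotypes)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = distance(haplotypes[a], haplotypes[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix

# Hypothetical members and values, for illustration only.
haplotypes = {
    "member_A": {"DYS393": 15, "DYS390": 23, "DYS449": 28},
    "member_B": {"DYS393": 15, "DYS390": 23, "DYS449": 29},
    "member_C": {"DYS393": 15, "DYS390": 24, "DYS449": 28},
}
matrix = distance_matrix(haplotypes)
for name, row in matrix.items():
    print(name, row)
```

Each row of the printed table corresponds to one row of a matrix like those above: a member's distance to every other member, with 0 on the diagonal.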

Modal Values

Since all these members are related, another factor can then be considered in analyzing their y-DNA.  Dr. Allan H. Westreich wrote in his paper, Using a Y-DNA Surname Project to Dig Deeper Into Your Genealogy: A Case Study, “A common method (Gleeson, 2015b) for determining the genetic proximity of a group of haplotypes is to compare each of them to an approximation of their MRCA’s haplotype. This haplotype is often best estimated by the modal haplotype (Gleeson, 2015b) of the group, which is calculated as the haplotype consisting of the modal (most frequent) values on a marker-by-marker basis.”

On the Ralston Project home page, if one selects to view DNA test results, then selects “Colorized Chart”, the resulting chart shows the modal value, “MODE”, for each subgroup in the project.  The chart also highlights STR values that differ from this mode.  By counting the number of highlighted numbers, one can find their number of mutations since the group’s MRCA.
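For readers who prefer to see the calculation spelled out, the following Python sketch computes a modal haplotype marker by marker and lists where each member differs from it. The member names and STR values are hypothetical and chosen only for illustration; they are not taken from the project's results.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Return the most frequent value for each marker across all members.

    If two values tie, the one seen first is used; a real analysis
    would need to decide how to treat ties.
    """
    markers = next(iter(haplotypes.values())).keys()
    modal = {}
    for marker in markers:
        counts = Counter(h[marker] for h in haplotypes.values())
        modal[marker] = counts.most_common(1)[0][0]
    return modal

def mismatches_to_mode(haplotype, modal):
    """List the markers on which one member differs from the modal value."""
    return [m for m in modal if haplotype[m] != modal[m]]

# Hypothetical members and values, for illustration only.
haplotypes = {
    "member_A": {"DYS393": 15, "DYS390": 23, "DYS449": 28},
    "member_B": {"DYS393": 15, "DYS390": 23, "DYS449": 29},
    "member_C": {"DYS393": 15, "DYS390": 24, "DYS449": 28},
    "member_D": {"DYS393": 15, "DYS390": 23, "DYS449": 28},
}
modal = modal_haplotype(haplotypes)
for name, hap in haplotypes.items():
    diffs = mismatches_to_mode(hap, modal)
    print(name, len(diffs), diffs)
```

The count printed for each member plays the same role as counting the highlighted values on the Colorized Chart.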

Modal Difference Charts:  Following are charts that show all the STRs where each member of the project differs from the modal value, or the STR of the MRCA.  For simplification, the STR values where all agree are omitted.   The modal values in this analysis were based on an actual mode calculation for each STR using an Excel spreadsheet.

At 37 Markers:

At 67 Markers:

At 111 Markers:

Totals are shown for the number of mismatches to the mode, or mutations since the MRCA, for each group of tested markers.  The totals range from 0 to 4, the 4 being one member at 37 markers.  “Modern Mutations”, shaded in green, are explained in the section “Cousins” which will follow.

Following are the matrices again – this time with the mutations from the MRCA added as determined from the Modal Difference Charts above:

37 marker matrix:

67 marker matrix:

Note that at 67 markers all are 0 to 2 mutations from the MRCA, except 783383.  However, 783383 is a second cousin once removed of 752799 (more on that following) so the actual distance to the MRCA can be adjusted for recent mutations to match that of his cousin at 2.


OBSERVATIONS:


  1. COUSINS

    1. There are two sets of known cousins in this group. The first set is 752799 and 783383 who have a common ancestor of the mid-1800s.  Referring back to the modal difference chart, their markers have the same mismatch at 21 (DYS449) and 34 (CDYa) compared to the mode values.  These two markers, therefore, must have mutated prior to their common ancestor for them both to have inherited those values.  783383 has two additional mismatches which his cousin does not share.  These two markers must have mutated after their common ancestor and are what I term “modern mutations.”
    2. The other cousins, 206275 and 859409, have a common ancestor from the early 1700s. They show no mismatches in common, so they inherited no mutations from their common ancestor.  Both of these cousins had three “modern mutations” that occurred after their common ancestor, because where one cousin had a mismatch, the other retained the original value for that marker.  Their common ancestor was closely related to the MRCA of the group (0 mutations after deducting the “modern mutations”).
    3. Theoretically, it would be possible to determine whether two members of the group are more closely related to each other (closer cousins) than to the rest of the group by observing where they agree on mismatches, provided there are enough generations between the cousins’ ancestor and the group MRCA for mutations to have occurred. However, other than 752799 and 783383, who are known cousins, there are no obvious cases among our current members.  821573 matches 783383 on one mismatch, but this mutation of 783383 was from a known intermediate ancestor, not in the lineage of 821573.  Likewise, 868852 matches 783383 on a “modern mutation”.  Both of these matches are coincidental.
    4. This analysis shows that there is a benefit in cousins (the more distant, the better) being y-DNA tested and being part of a surname project. By having one or more cousins to compare against, it can be determined whether the genetic distance to the MRCA comes from recent mutations or from mutations many generations back.  If more than two cousins are tested, with generations between their common ancestors, it should also be possible to determine whether mutations occurred between those ancestors.
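The cousin comparison described above amounts to simple set arithmetic, sketched here in Python. The shared markers (DYS449 and CDYa) come from the discussion of 752799 and 783383; the names "marker_X" and "marker_Y" are hypothetical placeholders for the two additional mismatches of 783383, and the sketch assumes that a shared mismatch carries the same value in both cousins.

```python
def split_mutations(cousin_a, cousin_b):
    """Split two cousins' mismatches to the mode into three groups.

    inherited: mismatches both cousins share, so presumed to have occurred
               before their common ancestor
    modern_a / modern_b: mismatches only one cousin has, so presumed to
               have occurred after their common ancestor
    """
    inherited = cousin_a & cousin_b
    modern_a = cousin_a - cousin_b
    modern_b = cousin_b - cousin_a
    return inherited, modern_a, modern_b

# Mismatches to the mode for each cousin.
# "marker_X" and "marker_Y" are hypothetical placeholder names.
kit_752799 = {"DYS449", "CDYa"}
kit_783383 = {"DYS449", "CDYa", "marker_X", "marker_Y"}

inherited, modern_752799, modern_783383 = split_mutations(kit_752799, kit_783383)
print("Before common ancestor:", sorted(inherited))
print("Modern, 752799:", sorted(modern_752799))
print("Modern, 783383:", sorted(modern_783383))
```

Deducting the "modern" group from a member's total gives the adjusted distance to the MRCA, which here is 2 for both cousins.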

  2. STABLE VS. UNSTABLE LINES

783383 shows that mutations can occur in a short period of time: two mutations occurred within three generations.  Other members show that many generations can pass without a mutation.  FTDNA.com shows the number of mutations as genetic distance between two tested individuals. The estimated closeness of kin has to be considered as only a guideline, based on averages.  It is quite possible for one to be just as closely related to someone at a genetic distance of 4 or more as to someone at a genetic distance of 0.  Charles F. Kerchner, Jr., P.E. stated in his paper, An Overview and Discussion of Various DNA Mutation Rates and DNA Haplotype Mutation Rates. Do the YSTR Haplotypes in some Y Chromosome Male Lines Mutate Faster Than in Other Male Lines?, “Some family male lines appear to have a very stable Y chromosome. And other males lines may have one that mutates far more than the average…”


  3. TWO-STEP MUTATIONS

An interesting observation regarding 868852 is that on STR DYS413b he shows a value of 24, whereas all others in the group have a value of 22.  So did this happen as a result of two mutations, one to 23, then another one later to 24?  Or did this happen in a single event?  Is this one genetic distance or two?  There are different models that count it either way.  FTDNA counts it as one “difference”, therefore one genetic distance.  For more information see https://dna-explained.com/2016/07/27/y-dna-match-changes-at-family-tree-dna-affect-genetic-distance.
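The two ways of counting can be compared in a short Python sketch. The model names "stepwise" and "infinite" are labels chosen for this illustration: the first counts every step of difference, the second counts any difference on a marker as a single event. Only the DYS413b values (22 and 24) come from the discussion above.

```python
def genetic_distance(hap_a, hap_b, model="stepwise"):
    """Count the genetic distance between two haplotypes.

    model="stepwise": each step of difference counts (22 vs 24 is 2).
    model="infinite": any difference on a marker counts once (22 vs 24 is 1).
    """
    total = 0
    for marker in hap_a:
        diff = abs(hap_a[marker] - hap_b[marker])
        if model == "stepwise":
            total += diff
        elif model == "infinite":
            total += 1 if diff else 0
        else:
            raise ValueError(f"unknown model: {model}")
    return total

group_value = {"DYS413b": 22}   # value shared by the rest of the group
kit_868852 = {"DYS413b": 24}    # value shown by 868852

print(genetic_distance(group_value, kit_868852, "stepwise"))  # 2
print(genetic_distance(group_value, kit_868852, "infinite"))  # 1
```

Under the second way of counting, the difference on DYS413b adds one to the genetic distance, which matches how FTDNA is described as treating it here.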


  4. MUTATION RATES

Additionally, different STR markers mutate at differing rates.  Thus, a mutation on one STR might carry more significance than a mutation on another, since a mutation on a marker that rarely mutates would be an indication of many more generations for the mutation to have occurred.  Again, note on the Ralston Project home page, if one selects to view DNA test results, then selects “Colorized Chart”, the STR markers listed across the top are colored.  Those with a red background are the fast mutating markers.

Several web sites, such as http://adamsfamilydna.com and http://www.taylorfamilygenes.info, have posted tables showing the average mutation rate of each STR.  Of the 111 markers tested at FTDNA, the top four fastest mutating markers are CDYa, CDYb, DYS712, and DYS449.  Because these markers mutate so quickly, some analysts exclude these markers from their analysis.

If these fast-mutating markers are excluded, the results are as follows in these revised Modal Difference Charts:

At 37 Markers excluding CDYa, CDYb, DYS712, and DYS449:

At 67 Markers excluding CDYa, CDYb, DYS712, and DYS449:

At 111 Markers excluding CDYa, CDYb, DYS712, and DYS449:

Many of the mutations in the I-M223 group occurred in the STRs that frequently mutate.  With the exclusion of just these four highly mutating markers, the genetic distance appears to be much closer, with 2 or fewer mutations at 37 and 67 markers, and 3 or fewer at 111 markers.
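Excluding the fast markers is a simple filter before counting, as this Python sketch suggests. The four excluded markers are the ones named above; the member's list of mismatches, including the placeholder "marker_X", is hypothetical and used only to show the effect.

```python
# The four fastest-mutating markers named in this analysis.
FAST_MARKERS = {"CDYa", "CDYb", "DYS712", "DYS449"}

def count_mismatches(mismatched_markers, exclude=frozenset()):
    """Count a member's mismatches to the mode, skipping excluded markers."""
    return sum(1 for m in mismatched_markers if m not in exclude)

# Hypothetical member: mismatches on two fast markers and one other marker.
member_mismatches = ["DYS449", "CDYa", "marker_X"]

all_markers = count_mismatches(member_mismatches)
without_fast = count_mismatches(member_mismatches, exclude=FAST_MARKERS)
print(all_markers, without_fast)  # 3 1
```

A member whose mismatches fall mostly on these four markers looks much closer to the mode once they are set aside, which is the pattern the revised charts show.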


CONCLUSION:

The accuracy of these analyses is improved by having as many matches as possible in the group.  Having more matching members makes the modal calculation more accurate and offers greater possibilities for determining if intermediate ancestors existed.

Additionally, accuracy is improved by having as many members as possible test at 111 markers.  Note in the chart that four members show a mutation count of 0 from the MRCA at 37 and 67 markers.  However, the two of these four who have tested at 111 markers showed mutations in the additional markers.  More testing at 111 markers would also make the modal calculation at that level more accurate.

In addition to STR testing, there are also benefits to be gained from SNP testing to further narrow a member’s haplogroup.  Among the I-M223 members, a few have tested further to show they are I-L623.  Because all these members are closely related, that would indicate that all of the I-M223 members can be further refined to I-L623.  Further, a close match to all the I-M223 members has a Big-Y test that indicated he is I-PH3906.  One of the I-M223 members has now also done Big-Y testing, which proves he is, in fact, I-PH3906.  His results are being further analyzed at yFull.com.   Having others do this testing would also help further analysis.

It was already apparent that all the I-M223 members are related, descending from a common ancestor.  This analysis supports that and implies the possibility that these members are even more closely related than thought.

The most likely scenario for the path to the MRCA is that there are branches with intermediate ancestors, from which closer cousins descend, as in our (at least) two cases.  However, if there were many generations between the MRCA and the intermediate ancestor, one would expect mutations to have occurred and been passed down to the end matches.  We currently have only one instance of inherited mismatches: between the MRCA and Robert Ralston 1820, two mutations occurred (in fast-mutating STRs).

 

Hypothetical Paths to Most Recent Common Ancestor

With Mutations at 37 Markers
(Rapidly mutating markers shown in red)

* It is undetermined if this mutation occurred before or after the intermediate ancestor (white box).
^320496 has a cousin not in the project, but based on data obtained from the cousin’s matches, the single mutation occurred prior to Henry McCalvin Rolston.

The tree above shows seven lines back to the MRCA.  Unless our MRCA, Mr. R_lston, had eight sons from whom each of these lines descended, that picture cannot be literally correct.  It is more likely these lines descend from a few of Mr. R_lston’s sons, who would be intermediate ancestors, and/or perhaps grandsons or great-grandsons.  Whoever the intermediate ancestors were, as stated previously, they were close enough to the MRCA for there to have been no mutations in their y-DNA.