Analysis of I-M223 Group Updated

October 2019
by Ed Ralston

The following chart shows FTDNA STR mismatches or mutations for all I-M223 (I-BY61820) matching members of the Ralston Project:

 

The purpose of this analysis is to develop a hypothetical tree connecting each member to the most recent common ancestor (MRCA) of the group.  Where genealogical data falls short, assumptions based on y-DNA are used to fill in the gaps as much as possible.

The criterion used for this prediction is the assumption that, where a mutated value is shared by several members (values in red above), it is more likely that the shared value arose in as few mutation events as possible. For example, if four members share the same mismatch value on an STR, it is unlikely that these arose as four separate mutation events that coincidentally produced the same value.  It is more likely that this happened in one or two events, in common ancestors, and that the values were passed down to the members.  This new value is the “signature”, or part of a signature, of that common ancestor.

Where a presumed signature is precluded by another presumed signature on a different marker, the marker with the slower mutation rate is given precedence. For example, DYS449 and CDYa each have a value shared by four members, and both sets include D C Ralston and J Ralston (known cousins), while the other two members of each set differ.  If the values in both cases came from a single common ancestor, the other two members would have had to back-mutate to the original value on one or the other STR.  Since the probability of that is very low, it is assumed that the mutation on one of the two STRs is the result of more than one event.  In this case, since CDYa mutates about twice as fast as DYS449, it is more likely that the multiple mutation events occurred on CDYa.
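The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the method used by SAPP or any other tool; the function names `compatible` and `marker_to_split` are made up for this example, and the member sets and mutation rates are the ones quoted in this analysis.

```python
# Members carrying the shared off-modal value on each marker (from the lists in this analysis).
dys449 = {"D C Ralston", "J Ralston", "T Ralston", "Private 37"}
cdya = {"D C Ralston", "J Ralston", "K V S Roulston", "K E Ralston"}

# Average number of generations per mutation, as quoted in this analysis.
generations_per_mutation = {"DYS449": 120, "CDYa": 61}


def compatible(a, b):
    """Two single-event signatures fit one tree only if the sets are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def marker_to_split(rates):
    """Pick the fastest marker (fewest generations per mutation) as the multi-event one."""
    return min(rates, key=rates.get)


conflict = not compatible(dys449, cdya)
split = marker_to_split(generations_per_mutation)
print("Signatures conflict:", conflict)
print("Marker assumed to have mutated more than once:", split)
```

The two sets overlap without one containing the other, so they cannot both be single-event signatures without a back-mutation, and the faster marker is the one assumed to have mutated more than once.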

The following hypothetical tree is based primarily on this assumption.  The tree is similar in form to the trees generated by Dave Vance’s SAPP tool at jdvtools.com, and, with the signature sensitivity (pfactor) adjusted, the SAPP tree results are also similar.  Mutations are shown as they are assumed to have happened in the lineages.

 

 

SHARED MISMATCHES to MODE

DYS449: 30 to 29 – This marker is in the 1-37 range at FTDNA.  Although it is listed as one of the faster mutating markers, statistics show that it mutates on average only once every 120 generations, or about every 3,000 years[1].  Four members show a mutation on this marker, all from 30 to 29:

D C Ralston          752799
J Ralston               783383
T Ralston               911024
Private 37             

The conclusion is that these four may share a common ancestor, more recent than the MRCA, who passed the 29 value to them.

DYS576: 20 to 19 – This marker is in the 1-37 range at FTDNA.  It is also one of the faster mutating markers, but it mutates on average only once every 90 generations, or about every 2,250 years.  This mutation is shared by the following:

J Ralston               783383
T S Ralston           868852
L W Ralston         B540602

For all three to have inherited this mutation from a common ancestor, D C Ralston 752799, a close cousin of J Ralston, would also have had to inherit this value and then back-mutate to the original value.  While this is possible, it is more likely that the mutation shown by J Ralston was a separate event.

The conclusion is that T S Ralston’s and L W Ralston’s value here was inherited and that J Ralston’s was a separate event.

CDYa: 31 to 32 – This marker is in the 1-37 range at FTDNA.  It is one of the multi-copy markers.  Some consider CDY to be the fastest mutating marker.  Even so, it only mutates on average every 61 generations, or about every 1,525 years.  This mutation is shared by:

D C Ralston          752799
J Ralston               783383
K V S Roulston     250857
K E Ralston           821573

CDY mutates about twice as fast as DYS449, so the shared mismatches on DYS449 are the ones more likely to be the result of a single event.  J and D C share the DYS449 mismatch with two others.  For J and D C to have also inherited the CDYa mismatch from a common ancestor with K V S and K E, back-mutations would be required.

The conclusion is that this resulted from two mutation events.  D C and J inherited this value from their common ancestor, and K V S and K E may have inherited this value from a common ancestor. 

DYS714: 26 to 27 and DYS549: 11 to 12 – These markers are in the 68-111 range at FTDNA.  DYS714 mutates about every 129 generations and DYS549 about every 200 generations.  These mutations are only shared by these known cousins:

D C Ralston          752799
J Ralston               783383

DYS635: 26 to 25 – This marker is in the 68-111 range at FTDNA, and has the lowest mutation rate for shared mismatches in this group, mutating only about every 295 generations or 7,375 years.  Mathematically this marker has two modes for those so far tested, 25 and 26. (Four have values of 25, four 26, and one 27.)  SAPP shows a mode of 25, FTDNA shows 26, and Excel shows the mode as the first of these values listed.  I agree with the mode of 26 since there is a value of 27 in the list.  An original value of 25 would mean the 27 was the result of two mutation events, instead of one.  If the mode is 26, the mutation is to 25.  It is shared by:

R L Southard        82024
J W Ralston          206275
R O Ralston II       859409
J Ralston               783383
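The mode question for DYS635 can be checked with a short Python sketch. It uses only the counts quoted above (four values of 25, four of 26, one of 27). The "total steps" measure is a rough tie-breaker made up for this example: it counts one-step mutations for each man separately and ignores shared ancestry, so it should be read as an upper bound, not as the number of events in the tree.

```python
from statistics import multimode

# DYS635 values for the nine members tested so far, as counted in the text.
dys635 = [25] * 4 + [26] * 4 + [27]

# multimode returns every value tied for most frequent, in order of first appearance.
modes = multimode(dys635)


def total_steps(ancestral, values):
    """Sum of one-step changes needed if every man mutated independently."""
    return sum(abs(v - ancestral) for v in values)


steps = {candidate: total_steps(candidate, dys635) for candidate in modes}
best = min(steps, key=steps.get)

print("Modes:", modes)
print("Total steps per candidate ancestral value:", steps)
print("Candidate needing the fewest steps:", best)
```

With an ancestral value of 25, the single 27 costs two steps; with 26 it costs one, which is the reason given above for preferring 26.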

Since J Ralston has this value and his cousin D C Ralston does not, at least two mutation events must have produced these values.  If the first event was in a common ancestor of all five (including D C), D C would have had to back-mutate to 26 later.  The rarity of mutations on this marker makes a back-mutation less likely, so it is most probable that J Ralston’s mutation to 25 was a separate event from the other three.

The conclusion is that R L Southard, J W Ralston, and R O Ralston II share a common ancestor from which this value was inherited.  J W and R O II are known cousins, which supports a common ancestor.
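The generation and year figures quoted in this section are consistent with a generation length of 25 years. That figure is not stated in the text; it is an assumption inferred from the numbers (for example, 120 generations and 3,000 years). Under that assumption, the rates for all six markers can be tabulated as follows:

```python
# Assumption inferred from the quoted figures, not stated in the analysis.
YEARS_PER_GENERATION = 25

# Average generations per mutation, as quoted for each marker.
generations_per_mutation = {
    "DYS449": 120,
    "DYS576": 90,
    "CDYa": 61,
    "DYS714": 129,
    "DYS549": 200,
    "DYS635": 295,
}

years_per_mutation = {
    marker: gens * YEARS_PER_GENERATION
    for marker, gens in generations_per_mutation.items()
}

# How much faster CDYa mutates than DYS449 (ratio of generations per mutation).
cdya_vs_dys449 = generations_per_mutation["DYS449"] / generations_per_mutation["CDYa"]

for marker, years in years_per_mutation.items():
    print(f"{marker}: one mutation about every {years:,} years")
print(f"CDYa is about {cdya_vs_dys449:.2f} times as fast as DYS449")
```

The ratio comes out just under 2, which matches the statement that CDYa mutates about twice as fast as DYS449.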

ESTIMATED GENERATIONS

Genetic distances as shown by FTDNA are the result of a direct comparison of STR values between two individuals.  The distance is (generally) a count of the mismatches between the two.  However, the true genetic distance is not the number of mismatches, but the number of mutation events separating the two.  For example, if two individual lines have each mutated on the same marker to the same value, this appears to be a match between the two, but in fact represents two mutation events.  The hypothetical tree above shows all known or speculated mutations, so the number of mutation events between any two individuals can be recalculated.
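The difference between counting mismatches and counting mutation events can be illustrated with a minimal Python sketch. The event lists below are deliberately partial and cover only DYS576, following the conclusion reached earlier that J Ralston’s 19 was a separate event from the 19 inherited by T S Ralston and L W Ralston. The event labels and function names are made up for this example.

```python
# Modal (presumed ancestral) value for the one marker used in this illustration.
MODE = {"DYS576": 20}

# Each event is (marker, step, label). Men who inherited the same event share its label.
events = {
    "J Ralston": [("DYS576", -1, "separate event in J Ralston's line")],
    "T S Ralston": [("DYS576", -1, "ancestor shared by T S and L W")],
    "L W Ralston": [("DYS576", -1, "ancestor shared by T S and L W")],
}


def marker_values(person):
    """Apply a person's events to the modal values to get his tested values."""
    values = dict(MODE)
    for marker, step, _label in events[person]:
        values[marker] += step
    return values


def apparent_distance(a, b):
    """Mismatch count from comparing tested values directly."""
    va, vb = marker_values(a), marker_values(b)
    return sum(abs(va[m] - vb[m]) for m in MODE)


def event_distance(a, b):
    """Number of mutation events on one line but not the other."""
    return len(set(events[a]) ^ set(events[b]))


for pair in [("J Ralston", "T S Ralston"), ("T S Ralston", "L W Ralston")]:
    print(pair, "apparent:", apparent_distance(*pair), "events:", event_distance(*pair))
```

J Ralston and T S Ralston both test at 19, so the direct comparison shows no mismatch, yet two separate events lie between them. T S and L W share the same inherited event, so both measures are zero.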

These revised genetic distances are shown in the following chart:

 

Since all these individuals descended from the same common ancestor, the distance to the MRCA should be no more than the distance between the most distant of the matches.  The greatest genetic distances are shown by J Ralston, but since he is a close cousin to D C Ralston, D C’s values can represent that line. The greatest distance at 111 markers is 9, between K V S Roulston and each of R L Southard, R O Ralston, J W Ralston, and D C Ralston. Below is the graph for R O. The green arrow shows that the greatest probability for the common ancestor is at 9.5 generations, the height of the red line.

 

This graph indicates that the MRCA most likely falls around 10 generations back, or up to around 13 (there is a 66.7% probability that the MRCA is from 7 to 13 generations back, the orange area on the graph).

Another consideration is that we know the MRCA does not fall within the first several generations. K V S can trace his lineage back five generations.  R O can trace his lineage back 8 generations, so we know the MRCA was beyond that.  TiP results between R O and K V S, excluding the first eight generations, again show the highest probability at 9.5 generations.

 

Based on this, the tree shows the MRCA at around 10 generations, which varies in each line.

Similarly, the estimated generations between people sharing more recent ancestors, as shown in the hypothetical tree, all fall within the likely range of generations, except where the estimated range is more recent than the genealogical records allow.  This is due to an apparent lack of mutations, or very few mutations, in some lines.  This will be discussed later.

Estimates of the generations to the MRCA from other sources are slightly higher, but similar.

At 67 markers, Ken Nordtvedt using Chandler's mutation rates predicts 11.0 generations.
James Herold, at 67 markers, using Chandler's mutation rates predicts 14.3 generations.
Ken Nordtvedt's 111 Markers algorithm predicts 12.6 generations for the most distant and an average of 10.5.

The hypothetical tree shows an average of 10 generations, based on the TiP tool.  The MRCA might be a few generations back from that.

SPECIFIC CASES

R L Southard: father born in 1895 in Hill County, Texas; grandfather unknown.  Y-DNA has proven that R L’s father was from the I-M223 Ralston line.  The matching value on DYS635 with R O Ralston II and J W Ralston could mean they all shared a more recent common ancestor.  The known ancestor of R O and J W was born in the early 1700s, and is the closest of the known ancestors in the M223 group to the MRCA.  So, the mutation in DYS635 must have occurred very soon after the MRCA.

If R L is from the line of a common ancestor to R O and J W, that ancestor would likely either be William Rolstone or his father or grandfather.  Doing a TiP tool comparison of R L to R O and to J W shows the most likely number of generations is 7 to J W and 8 to R O.  If R L descended from William Rolstone, actual generations would be 7.5 to J W and 8 to R O.

Who was R L’s mystery grandfather?  This question may never be answered, but it is worth noting that a Raulston family descended from William Rolstone, with two sons, one about the age of R L’s grandmother, was living in the vicinity of Hill County in 1900.  This family moved from Arkansas to Texas between 1893 and 1900.

S Roulston’s y-DNA values are unusual in that he shows no mutations from the MODE at 111 markers, i.e., no deviations from the values (apparently) held by the MRCA:

 

How unlikely is this? Adding the probabilities of a mutation on each marker shows that there is a probability (p) of about 29.07% that there will be a mutation on at least one of the 111 markers in one generation (g). The probability of there not being a mutation in one generation is (1-p), or 70.93%. The probability of no mutation in two generations is (1-p)^2 = 50.32%, and in general the probability of no mutation in g generations is (1-p)^g. The chart below shows the probabilities for up to 10 generations:

Probability of no mutations in:

1 generation 70.93%
2 generations 50.32%
3 generations 35.69%
4 generations 25.32%
5 generations 17.96%
6 generations 12.74%
7 generations 9.04%
8 generations 6.41%
9 generations 4.55%
10 generations 3.23%
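The chart above can be reproduced with a few lines of Python, as a sketch of the (1-p)^g calculation. It uses the rounded value p = 29.07% quoted in the text, so an occasional entry may differ from the chart in the last digit. The function name `no_mutation_probability` is made up for this example.

```python
# Per-generation probability of at least one mutation across 111 markers,
# rounded as quoted in the text.
p = 0.2907


def no_mutation_probability(p_mutation, generations):
    """Probability that no marker mutates in the given number of generations."""
    return (1 - p_mutation) ** generations


table = {g: no_mutation_probability(p, g) for g in range(1, 11)}

for g, prob in table.items():
    label = "generation" if g == 1 else "generations"
    print(f"{g} {label} {prob:.2%}")
```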

S Roulston is also unusual in that his line has a very large average number of years per generation, at 48 years. The projected earliest time to the MRCA would be around 8 generations. Even so, the chart above shows that in 8 generations there is only a 6.41% chance of there being no mutations. (Note: Mutation rate projections vary between sources, but the above results are very similar using other mutation data.)

Although 6.41% sounds like a low probability, it means that on average, in eight generations, about one out of 16 men will have no mutations. So, in this matching group of 14 men, we would expect about one with no mutations.
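The arithmetic behind this expectation can be sketched as follows. The calculation assumes, as a simplification that is not in the original analysis, that all 14 men are eight generations from the MRCA and that their lines are independent; in reality several are known cousins and the generation counts vary.

```python
# Probability of no mutations in eight generations, from the chart above.
p_none = 0.0641
group_size = 14

# Roughly one man in this many is expected to show no mutations.
one_in = 1 / p_none

# Expected number of men in the group with no mutations.
expected_unmutated = group_size * p_none

# Chance that at least one of the 14 shows no mutations (independence assumed).
p_at_least_one = 1 - (1 - p_none) ** group_size

print(f"About one man in {one_in:.1f}")
print(f"Expected number with no mutations among {group_size}: {expected_unmutated:.2f}")
print(f"Chance that at least one has no mutations: {p_at_least_one:.0%}")
```

Under these assumptions, finding one man with no mutations in a group of this size is not surprising.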

As the number of generations to the MRCA increases, the chance of no mutations decreases and the chance of back-mutations increases. Although at eight generations it is more likely that there were no mutations at all than that there was a mutation followed by a back-mutation, a back-mutation is still possible.

If so, which marker(s) mutated? It is possible that the first mutation came from a common ancestor shared with other matches. Alternatively, both the mutation and the back-mutation might have occurred in separate events unique to S Roulston’s line, in which case he may share no common ancestor with the others more recent than the MRCA.

Following is a tree with an example of a mutation on DYS449 from a common ancestor which later mutated back to the original value. This hypothesized line for S Roulston is very speculative, but possible.

 

 


[1] All mutation rates are based on the work of Marko Heinila, Helsinki University of Technology, unpublished, 2012.