Analysis of the genealogy process in forensic genetic genealogy

Mine Su Ertürk; Colleen Fitzpatrick; Margaret Press; Lawrence M. Wein


The genealogy process is typically the most time‐consuming part of — and a limiting factor in the success of — forensic genetic genealogy, which is a new approach to solving violent crimes and identifying human remains. We formulate a stochastic dynamic program that — given the list of matches and their genetic distances to the unknown target — chooses the best decision at each point in time: which match to investigate (i.e., find its ancestors and look for most recent common ancestors between the match and the target), which set of potential most recent common ancestors to descend from (i.e., find its descendants, with the goal of identifying a marriage between the maternal and paternal sides of the target's family tree), or whether to terminate the investigation. The objective is to maximize the probability of finding the target minus a cost associated with the expected size of the final family tree. We estimate the parameters of our model using data from 17 cases (eight solved, nine unsolved) from the DNA Doe Project. We assess the Proposed Strategy using simulated versions of the 17 DNA Doe Project cases, and compare it to a Benchmark Strategy that ranks matches by their genetic distance to the target and only descends from known common ancestors between a pair of matches. The Proposed Strategy solves cases ≈10 − fold faster than the Benchmark Strategy, and does so by aggressively descending from a set of potential most recent common ancestors between the target and a match even when this set has a low probability of containing the correct most recent common ancestor. Our analysis provides a mathematical foundation for improving the genealogy process in forensic genetic genealogy.