A stochastic optimization approach for shredded document reconstruction in forensic investigations
Abstract
Document shredding remains a common method for destroying sensitive information, creating significant challenges for forensic investigators seeking to recover such materials as evidence. This paper addresses shredded document reconstruction through a stochastic optimization approach inspired by Markov chain Monte Carlo (MCMC) methods. Traditional approaches rely on physical edge matching, which suits hand-torn documents but is computationally prohibitive for cross-cut shredding. Our method instead uses edge compatibility metrics to score how well the visual content along adjacent fragment edges matches. We develop a specialized acceptance criterion that balances exploration of diverse configurations with exploitation of promising solutions. The method models edge deviations with a gamma distribution whose parameters are estimated by maximum likelihood, providing an adaptive framework that responds to reconstruction progress. Through evaluation on over 1100 document instances spanning typed text, handwritten notes, photographs, and mixed-content materials, we demonstrate robust performance across diverse document types. Empirical comparisons show that simulated annealing (SA) and genetic algorithms (GA) achieve only marginal cost reductions (1%–13%), whereas our approach successfully reconstructs documents that these standard metaheuristics cannot solve. The algorithm handles intermixed fragments from multiple documents, a common situation in forensic casework, and performance analysis shows that content-rich regions assemble faster than uniform areas. Validation on physically shredded documents from the DARPA Shredder Challenge confirms practical utility where traditional methods fail. For complex reconstructions, our semi-automated approach incorporates human guidance at intermediate stages, reducing computation time while maintaining accuracy. This research advances forensic document examination capabilities, offering a flexible framework adaptable to the various document types encountered in investigative practice.
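The abstract does not give the exact acceptance criterion, so the sketch below is only an illustration of how the two named ingredients could fit together: a maximum-likelihood gamma fit to the current edge deviations, and an MCMC-style accept/reject step. It assumes deviations are strictly positive numbers and that a layout's cost is a scalar where lower is better. The function names `fit_gamma` and `accept_move`, and the choice of using the fitted standard deviation as a Metropolis temperature, are hypothetical and are not the authors' method. The shape estimate uses the standard closed-form approximation to the gamma MLE rather than an exact iterative solve.

```python
import math
import random


def fit_gamma(deviations):
    """Approximate maximum-likelihood fit of a gamma(k, theta) distribution.

    Uses the closed-form approximation to the shape MLE,
    k = (3 - s + sqrt((s - 3)^2 + 24 s)) / (12 s), with s = ln(mean) - mean(ln x),
    which is within about 1.5% of the exact MLE; the scale is then theta = mean / k.
    """
    if len(deviations) < 2 or any(x <= 0 for x in deviations):
        raise ValueError("need at least two strictly positive deviations")
    n = len(deviations)
    mean = sum(deviations) / n
    mean_log = sum(math.log(x) for x in deviations) / n
    s = math.log(mean) - mean_log  # >= 0 by Jensen's inequality
    if s <= 0:
        raise ValueError("deviations are (numerically) constant; gamma fit undefined")
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    return k, mean / k


def accept_move(current_cost, proposed_cost, edge_deviations, rng=random):
    """Metropolis-style acceptance with a temperature derived from the gamma fit.

    Hypothetical choice (not from the paper): the temperature is the fitted
    standard deviation sqrt(k) * theta of the current edge deviations, so
    worse layouts are accepted less often as the deviations shrink.
    """
    if proposed_cost <= current_cost:
        return True  # always keep improvements (exploitation)
    k, theta = fit_gamma(edge_deviations)
    temperature = math.sqrt(k) * theta
    # Occasionally accept a worse layout to escape local optima (exploration).
    return rng.random() < math.exp(-(proposed_cost - current_cost) / temperature)
```

In use, one would recompute `edge_deviations` from the current fragment layout before each proposal (for example, a swap of two fragments), so the acceptance rule adapts as the reconstruction improves. A real implementation would likely refine the shape estimate with a few Newton steps and tune how the fitted distribution maps to an acceptance probability.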