In my recent newsfeed article, I noted that "technological advances are happening at a remarkable, perhaps exponential pace, and will be ubiquitous in almost every field of forensic science in the near future. These tools (e.g., artificial intelligence, forensic genealogy, risk assessment algorithms) have tremendous promise but also great potential peril. In addition to the aforementioned technologies, I'm sure readers can think of other technologies that likely will impact each of their section's members."

In keeping with the 2025 meeting theme, you will see section articles on promoting the responsible, ethical, and just use of technology in each section's discipline. Below is the first submission, provided by the Engineering & Applied Sciences Section.

Forensic Artificial Intelligence? Lessons Pertaining to Justice for All

Source: Carole Chaski, PhD, Engineering & Applied Sciences Section Chair

Forensic science, particularly forensic engineering science, is faced with a multitude of issues regarding Artificial Intelligence (AI) and, in particular, the effects of generative AI. This article will discuss AI and how it relates to our goal of justice for all.

AI is an outgrowth of the very human desire to create and use tools. It is not an evil unto itself, simply because it exists (although Ted Kaczynski and other Luddites would disagree with me). The usual and somewhat realistic fear induced by the introduction of a new tool is that the person who previously performed the service that the tool can now perform is going to be replaced by the tool. On the other hand, proponents of the new tool deploy the usual encouragement of efficiency, profit, and productivity and hope to entice new users of the tool. Recently Sam Altman and other proponents of "generative AI" have presented both of these views to the public, to government officials, and to political leaders, claiming that generative AI will both destroy and save humanity.

Like any tool, AI can be used for good and for evil. But first, a bit of history is needed so that, as forensic scientists, we can know what to keep and what to avoid.

AI has gone through four phases, in my opinion as someone who started working in AI applications in the 1980s. In each phase, there has usually been one spectacular claim and several working software programs related to that claim.

The first phase in AI, in the 1950-60s, is usually associated with machine translation. The first attempts at machine translation were spectacularly bad, and human translators were saved as a profession because the fundamental idea about human language was wrong-headed and naïve. The idea that language is a big dictionary, a big list of words, is simply not true. Yes, each language uses words, but words are only one small part of the cognitive system that language is. If you have ever tried (and failed) to pass a language examination simply by memorizing vocabulary at the end of the chapter, then you have proven to yourself that language is more than a list of words. When an early machine translation program translated the English sentence, "The spirit is willing but the flesh is weak," into Russian as "the vodka is good but the meat is rotten" (which may be apocryphal but is nonetheless realistic), the program relied on a "word-for-word" translation (i.e., that language is a big list of words and the program just has to find the "word matches" from one language to the other language). Modern machine translation programs are much better because their underlying idea about language is better.
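The word-for-word idea can be made concrete with a minimal sketch. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) that stores exactly one gloss per word, and it uses English glosses in place of Russian so the wrong word senses are easy to see. It is an illustration of the naïve idea only, not a reconstruction of any historical program.

```python
# Hypothetical toy lexicon: one gloss per word, chosen without any context.
# English glosses stand in for the target language so the error is visible.
TOY_LEXICON = {
    "the": "the",
    "spirit": "vodka",    # "spirit" read as alcohol, not as will
    "is": "is",
    "willing": "good",
    "but": "but",
    "flesh": "meat",      # "flesh" read as food, not as the body
    "weak": "rotten",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each word with its single lexicon entry, ignoring context."""
    tokens = sentence.lower().replace(",", "").rstrip(".").split()
    # Unknown words are passed through unchanged.
    return " ".join(lexicon.get(token, token) for token in tokens)


result = translate_word_for_word(
    "The spirit is willing but the flesh is weak.", TOY_LEXICON
)
print(result)
```

Every individual lookup succeeds, yet the sentence as a whole fails, because nothing in the procedure represents grammar, idiom, or word sense.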

The lesson for forensic science from phase one is this: AI will not provide a reliable tool if the fundamental idea underlying the tool is wrong or naïve.

The second phase in AI, in the 70s, 80s, and still on-going, is associated with expert systems. After the spectacular failure of early machine translation, the hope that machines could still be programmed as a tool to help humans did not die. Instead, the idea that machines could simulate human cognition came to the forefront. If a human expert performs a task in a methodical way, then a computer program can be written to simulate the steps that the human uses. For instance, an early expert system was developed for selecting wine and cheese pairings. If a wine connoisseur is presented with a particular cheese and asked to select a wine to serve with the cheese, the wine connoisseur relies on his expertise to make a decision. Notice that the expert system is undergirded by knowledge (sometimes called a knowledge base, knowledge graph, or an ontology) that the human expert possesses. During this phase, knowledge engineers are hired to extract this expertise from humans, and usually the knowledge engineers work with multiple human experts to get as much reliable, valid knowledge as possible.

Expert systems are not as popular as they once were, mostly because they are extremely work-intensive: building an ontology — building a knowledge base — takes a lot of time, a lot of experts, and a careful knowledge engineer. You may have experienced a "help system" that was not very helpful: you were dealing with an expert system that had a very poor knowledge base underlying a user interface that, while polite, could only spout its limited knowledge or ask you if you want to speak to a human. In contrast, when IBM's® Watson won Jeopardy, its knowledge base had been worked on for years by knowledge engineers so that it had a huge store of facts and figures and, in some ways, encyclopedic, if not expert, knowledge.

The first lesson for forensic science from phase two is this: AI will provide a reliable tool only if the knowledge underlying the tool has been tested, proven, and collected thoroughly.

The second lesson for forensic science from phase two is this: Knowledge engineering that undergirds a reliable AI tool cannot be based on one expert. The "database in an expert's head" — which is another way of saying the expert's "experience" — is not sufficient to build a reliable tool.

The third phase in AI, starting in the late 80s and still on-going, is associated with pattern recognition. Pattern recognition is a set of mathematical techniques for classifying items into patterns. When one of my 2000s grants from the National Institute of Justice (NIJ) was titled "Pattern Recognition Techniques in Forensic and Investigative Sciences," I was referring to this mathematical analysis, but unfortunately the term "pattern recognition" was hijacked to refer to the subjective decisions about patterns that occur in some areas of forensic science. There was a great deal of fear, for two reasons, that human experts would lose their jobs if intelligent pattern recognition systems were developed: first, the Daubert standard was requiring error rates that subjective decisions could not provide but intelligent systems can, and second, real pattern recognition may actually do a better and more reliable job than subjective decisions.

Pattern recognition and "machine learning" techniques in general are extremely powerful and can be reliable under two conditions. First, the data that contains the patterns has to be "ground-truth" (it is known whether or not the pattern is actually there). Second, the data has to be comprehensive (the data should include a broad range of patterns). The problem is that computer scientists are not taught how to collect data, how to curate data, or how to design experiments, so these two conditions about data are often ignored.

When these two data conditions are ignored, pattern recognition systems are biased and ineffective. For example, facial recognition technologies are biased against people of color and women. The data containing the patterns that are required to build a reliable system is itself unreliable: the data may not be ground truth, and it is certainly not comprehensive.
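One way to see why both data conditions matter is to compute a system's error rate separately for each group represented in a ground-truth test set. The sketch below is illustrative only: the records, the group labels (`group_a`, `group_b`), and the function name are hypothetical and do not come from any real system or study.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return the fraction of wrong decisions per group.

    Each record is (group, ground_truth, system_decision). An error rate
    can only be computed because the ground truth is known for every record.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, decision in records:
        totals[group] += 1
        if truth != decision:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy test set, invented purely for illustration.
toy_records = [
    ("group_a", "match", "match"),
    ("group_a", "non-match", "non-match"),
    ("group_a", "match", "match"),
    ("group_a", "non-match", "non-match"),
    ("group_b", "match", "non-match"),
    ("group_b", "non-match", "non-match"),
    ("group_b", "match", "match"),
    ("group_b", "non-match", "match"),
]

rates = error_rate_by_group(toy_records)
print(rates)
```

A single pooled error rate would hide the gap between the two groups in this toy data, which is why a comprehensive, ground-truth test set is needed before any error rate is reported.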

On the other hand, when the data is good, the breakthroughs from pattern recognition can be game changing. For example, the recent work on fingerprint associations can possibly provide a way for a very strong, data-driven, and accurate error assessment on fingerprint identification.

Some pattern recognition techniques, such as neural networks, are not explainable. That is, the user is not sure how the system is setting parameters and making decisions. Other pattern recognition techniques, such as classifiers, are extensions of established statistical methods such as linear discriminant function analysis and logistic regression, which are explainable.

The first lesson for forensic science from phase three is this: Principled, fair, and honest data collection and curation is essential for any reliable AI tool using pattern recognition.

The second lesson for forensic science from phase three is this: Any pattern recognition method must be explainable, or it will not succeed as courtroom testimony.

The fourth phase of AI, starting in the 80s but only making real headway into the public consciousness in the 2020s, is generative AI. Generative AI systems such as ChatGPT predict the next most likely word. Because generative AI is still word-based, as machine translation was in the 1960s, its fundamental idea about language remains naïve and extremely restrictive. ChatGPT is fluent, but only because it is repeating what it has previously ingested as data. ChatGPT serves the lazy who do not mind spouting unoriginal ideas and platitudes. ChatGPT is not thinking, and it is not conscious. It is computer code.
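The notion of predicting the next most likely word can be sketched with a toy bigram counter. This is a drastic simplification offered only to illustrate the idea: it is not how ChatGPT or any production system is implemented, and the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented purely for illustration.
corpus = "the flesh is weak and the spirit is willing and the spirit is strong"
counts = train_bigram_counts(corpus)

print(predict_next(counts, "the"))    # the most frequent follower of "the"
print(predict_next(counts, "vodka"))  # never seen in the corpus
```

The sketch can only return words it has already ingested, in orders it has already seen, and it has nothing to offer for a word that is absent from its data.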

Fortunately, the Journal of Forensic Sciences, led by Dr. Michael Peat and its publisher, Wiley, and the AAFS Connect Committee, led by Dr. Marcus Rogers, have already released statements about the use of AI-generated text. Gail Groy, Esq., provided legal perspective, and I hope that I provided some theoretical perspective on AI, for the AAFS Connect statement.

Generative AI text has the potential to contaminate linguistic evidence, just as generative AI video and audio have the potential, in some ways already realized, to contaminate multimedia and digital evidence. Further, linguistic evidence produced by generative AI will cause legal issues in copyright cases (which has already started) and in a multitude of other types of cases, both criminal and civil, in which linguistic evidence is involved.

It is important to note that the main supplier of machine-generated text has not been able to differentiate its own productions from human-generated text with better than chance (actually, worse than chance) accuracy. Watermarking is one proactive way to guard against contamination, but I doubt this watermarking will take hold. My team at the Institute for Linguistic Evidence is working on reliable methods for detecting machine-generated text and presented results at the Engineering & Applied Sciences sessions in February 2024.

The first lesson for forensic science from phase four is this: AI systems that mimic human abilities are not trustworthy in the way that expert systems can be: there is a huge difference between mimicry and expertise.

The second lesson for forensic science from phase four is this: We need to be developing reliable methods for detecting machine-generated text, audio, and video because contaminated data as forensic evidence is deadly to any forensic investigation or trial.

So how does this history relate to justice for all?

First, as forensic scientists, we need to be sure that any AI tools, such as expert systems, pattern recognition systems, and generative systems, are built on reliable data (i.e., data that has been collected by design, curated for particular research questions and validation testing) and not swiped from the web.

Second, as forensic scientists, we need to reject systems that are built on biased data.

Third, as forensic scientists, we need to put our algorithmic expertise to good use in working with programmers to build expert systems that are reliable and validated, founded on a knowledge base of more than one expert.

Finally, since the United States has dominated both in AI and in the reliability standard for scientific evidence, as American forensic scientists, we need to ensure that the justice we model is true justice for all. We can do this by applying these lessons from the history of artificial intelligence.