Inconclusive Conclusions in Forensic Science: Rejoinders to Scurich, Morrison, Sinha & Gutierrez
Hal R. Arkes,

Jonathan J. Koehler

SSRN Electronic Journal, Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 1, 2023

We agree with Scurich (2023) that when an examiner knows he or she is being tested, the results of such a test are highly suspect. If examiners can avoid making errors by deeming a comparison to be inconclusive, and if inconclusives are never deemed indicative of error, then "strategic" examiners can inflate their accuracy levels by rendering an inconclusive decision for any difficult test. Such tests will not provide an unbiased measure of an examiner's accuracy. But this is not reason enough to change the way accuracy is measured. We are sympathetic to the view expressed in Morrison (2023), but in our paper we accepted the world as it currently exists, one in which examiners use categorical conclusions. Finally, we are not certain that the blind testing favored by Sinha and Gutierrez would resolve all of the issues mentioned in their final sentence, but we think it would be a major step in the right direction.
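To make the inflation concern concrete (with invented numbers): suppose an examiner faces 100 comparisons and would err on the 10 hardest. If exactly those 10 are labeled inconclusive, and inconclusives are never scored as errors, measured accuracy becomes 90/90 = 100%, versus the 90/100 = 90% that answering every comparison would have revealed.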

Language: English

The false promise of firearms examination validation studies: Lay controls, simplistic comparisons, and the failure to soundly measure misidentification rates

Richard E. Gutierrez,

Emily J. Prokesch

Journal of Forensic Sciences, Journal Year: 2024, Volume and Issue: 69(4), P. 1334 - 1349

Published: April 29, 2024

Several studies have recently attempted to estimate practitioner accuracy when comparing fired ammunition. But whether this research has included comparisons sufficiently challenging, such that expertise is required for accurate conclusions regarding source, remains largely unexplored in the literature. Control groups of lay people comprise one means of vetting this question, by assessing whether the comparison samples were at least difficult enough to distinguish between experts and novices. This article therefore utilizes such a group, specifically 82 attorneys, as a post hoc control and juxtaposes their performance on a set of cartridge case images from a commonly cited study (Duez et al., J Forensic Sci. 2018;63:1069–1084) with that of the original participant pool of professionals. Despite lacking the kind of formalized training and experience common to the latter, our participants generally displayed an ability to distinguish cases fired by the same versus different guns across the 327 comparisons they performed. And while their accuracy rates lagged substantially behind those of the professionals on same-source comparisons, their performance on different-source comparisons was essentially indistinguishable from that of trained examiners. This indicates that, although the study we vetted may provide useful information about professional performance on same-source comparisons, it has little to offer in terms of measuring examiners' ability to distinguish between guns. If similar issues pervade other studies, then there is reason to question reliance on the false-positive rates they have generated.
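As a sketch of the kind of contrast a post hoc control group enables, the snippet below compares two group proportions with Wilson 95% confidence intervals; the counts are invented for illustration and are not the article's data.

```python
# Hypothetical comparison of correct-call rates for professionals vs.
# lay controls, with Wilson 95% confidence intervals (invented counts).
from math import sqrt

def wilson(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

for group, correct, total in [("professionals", 190, 200),
                              ("lay controls", 150, 163)]:
    lo, hi = wilson(correct, total)
    print(f"{group:14s} {correct}/{total} = {correct/total:.3f} "
          f"(95% CI {lo:.3f}-{hi:.3f})")
```

Overlapping intervals on different-source comparisons would mirror the article's finding that novice and professional performance there is hard to distinguish.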

Language: English

Citations

5

The Hawthorne effect in studies of firearm and toolmark examiners
Nicholas Scurich, Thomas D. Albright,

Peter Stout

et al.

Journal of Forensic Sciences, Journal Year: 2025, Volume and Issue: unknown

Published: April 10, 2025

The Hawthorne effect refers to the tendency of individuals to behave differently when they know they are being studied. In the forensic science domain, concerns have been raised about the "strategic examiner," who uses different decision thresholds depending on whether he or she is in a test situation or working an actual case. The blind testing conducted by the Houston Forensic Science Center ("HFSC") in firearms examination presents a unique opportunity to test the hypothesis that the rate of inconclusive calls differs between discovered and undiscovered tests of firearm examination. Over 5 years, 529 test item comparisons were filtered into casework at HFSC. The inconclusive rate for discovered test items was 56.4%, while for undiscovered items it was 39.3%. Thus, the percentage of inconclusive calls was 43.5% higher among discovered than undiscovered items. This pattern of results held for bullet (83% vs. 59%) and cartridge case (29% vs. 20%) comparisons, and for both same-source and different-source comparisons. These findings corroborate the concern that examiners behave differently when they know they are being tested and demonstrate the necessity of blind testing if the research goal is to evaluate performance in actual casework.
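As an arithmetic check on the figures above: 43.5% is the relative, not absolute, difference between the two inconclusive rates, since (56.4 - 39.3) / 39.3 ≈ 0.435.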

Language: English

Citations

0

Shifting decision thresholds can undermine the probative value and legal utility of forensic pattern-matching evidence
William C. Thompson

Proceedings of the National Academy of Sciences, Journal Year: 2023, Volume and Issue: 120(41)

Published: Oct. 2, 2023

Forensic pattern analysis requires examiners to compare the patterns of items such as fingerprints or tool marks to assess whether they have a common source. This article uses signal detection theory to model examiners' reported conclusions (e.g., identification, inconclusive, exclusion), focusing on the connection between an examiner's decision threshold and the probative value of the forensic evidence. It uses a Bayesian network to explore how shifts in decision thresholds may affect the rates and ratios of true and false convictions in a hypothetical legal system. It demonstrates that even small shifts in thresholds, which can arise from contextual bias, can dramatically affect the probative value of pattern-matching evidence and hence its legal utility.
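A minimal signal detection sketch of the threshold-shift point, assuming Gaussian similarity scores and two invented threshold settings (illustrative values only, not Thompson's actual model):

```python
# Signal detection sketch: how lowering the decision thresholds
# dilutes the likelihood ratio (probative value) of an "identification."
from statistics import NormalDist

same = NormalDist(mu=2.0, sigma=1.0)   # similarity scores, same-source pairs
diff = NormalDist(mu=0.0, sigma=1.0)   # similarity scores, different-source pairs

def rates(dist, t_excl, t_ident):
    """P(exclusion), P(inconclusive), P(identification) under one condition."""
    p_excl = dist.cdf(t_excl)
    p_ident = 1 - dist.cdf(t_ident)
    return p_excl, 1 - p_excl - p_ident, p_ident

# A modest downward shift of both thresholds, as contextual bias might produce.
for t_excl, t_ident in [(0.5, 1.5), (0.0, 1.0)]:
    ident_same = rates(same, t_excl, t_ident)[2]
    ident_diff = rates(diff, t_excl, t_ident)[2]
    print(f"thresholds ({t_excl}, {t_ident}): "
          f"LR of an identification = {ident_same / ident_diff:.1f}")
```

Under these toy numbers the shift roughly halves the likelihood ratio of an identification, which is the sense in which threshold changes alter probative value.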

Language: English

Citations

9

Scientific guidelines for evaluating the validity of forensic feature-comparison methods
Nicholas Scurich,

David L. Faigman,

Thomas D. Albright

et al.

Proceedings of the National Academy of Sciences, Journal Year: 2023, Volume and Issue: 120(41)

Published: Oct. 2, 2023

When it comes to questions of fact in a legal context (particularly questions about measurement, association, and causality), courts should employ the ordinary standards of applied science. Applied sciences generally develop along a path that proceeds from basic scientific discovery of some natural process, to the formation of a theory of how the process works and what causes it to fail, to the development of an invention intended to assess, repair, or improve the process, to the specification of predictions of the instrument's actions and, finally, to empirical validation to determine whether the instrument achieves the intended effect. These elements are salient and deeply embedded in the cultures of medicine and engineering, both of which primarily grew out of basic sciences. However, the inventions that underlie most forensic science disciplines have few roots in basic science, and they have neither sound theories to justify their predicted results nor tests to prove that they work as advertised. Inspired by the "Bradford Hill Guidelines," the dominant framework for causal inference in epidemiology, we set forth four guidelines that can be used to establish the validity of forensic comparison methods generally. This is a checklist for establishing a threshold of minimum validity; there is no magic formula that determines when particular hypotheses have passed the necessary threshold. We illustrate the application of these guidelines by considering the discipline of firearm and tool mark examination.

Language: English

Citations

6

Inconclusive conclusions in forensic science: rejoinders to Scurich, Morrison, Sinha and Gutierrez
Hal R. Arkes,

Jonathan J. Koehler

Law Probability and Risk, Journal Year: 2022, Volume and Issue: 21(3-4), P. 175 - 177

Published: Dec. 1, 2022

To the Editor, We thank professors Scurich, Morrison, Sinha and Mr. Gutierrez for their thoughtful comments on our article (Arkes and Koehler, 2021). We agree with Scurich (2023) that when an examiner knows he or she is being tested, the results of such a test are highly suspect. If examiners can avoid making errors by deeming a comparison to be inconclusive, and if inconclusives are never deemed indicative of error, then 'strategic' examiners can inflate their accuracy levels by rendering an inconclusive decision on any difficult test. Such tests will not provide an unbiased measure of an examiner's accuracy. But this is not reason enough to change the way accuracy is measured. In support of a different view of the matter, an analogy offered by Kaye et al. (2022) is cited, in which a student answers 'I don't know' to a true–false question. If the student should know the answer, the 'I don't know' response may be counted as an error. We do not think Kaye's analogy is apt in this context. Teachers are charged with determining whether a student should know the answer: if, for example, the topic was covered in the required reading or in lecture, the student should know the answer. On a forensic test, one cannot make that determination. Dror (2020) suggested strategies that might help determine whether an inconclusive is the correct response, but in our response (Arkes and Koehler, 2022) we gave reasons why those strategies are inadequate.

Language: English

Citations

4

Authors' response

Richard E. Gutierrez,

Emily J. Prokesch

Journal of Forensic Sciences, Journal Year: 2024, Volume and Issue: 69(6), P. 2346 - 2348

Published: Aug. 26, 2024

We read Mr. Marshall's commentary with interest, but unfortunately his submission, riddled with ad hominem attacks on the motivations of firearms examination's critics and just shy of reference-less, concentrates almost uniformly not on our study design or data, but on bemoaning the notion that examiners' grandiose claims of near infallibility (e.g., the "practical impossibility" of error [1]) should rest on empirical grounds rather than on tradition and intention. In this way, Marshall has built a soapbox in lieu of a scientific critique, one that, despite the warnings of the old parable, he has raised up on sand rather than stone. Indeed, his preference for weaving a self-serving narrative of victimization over engaging substantively with our findings evinces itself even in the attention to detail he must have employed in reading our article. How else, other than through a cursory review, could he criticize us for treating Duez et al. [2] as an "error rate study," rather than simply a "validation of the technique of virtual comparison microscopy," given our repeated and explicit discussion of the ways in which the FBI, the DOJ, and one of the study's authors promoted it as proof of a low false-positive rate in the field of firearms examination [3-6]? Had he thoroughly evaluated our treatment of control groups, including its reference to the long and storied history of controls [7] and to studies that utilized novices to contextualize the performance of forensic professionals [3, 8, 9], would he really characterize our suggestion that validation include such groups as "new and unusual"? And why else would he have felt it necessary to point out limitations (the use of attorneys as the group of lay people, and of photos as opposed to 3D scans of cartridge cases) that we had already forthrightly acknowledged in the original article [3]? Had he gone further, had he, for example, managed to muster any argument supported by references or data that we might have erred in suggesting the images actually disadvantaged our participants, he would have found us ready and willing to engage in a robust back and forth and, if warranted, to reconsider our conclusions. That he did not leaves us little to respond to. But because we cannot, at this juncture, ask him to go back and try again, a few points warrant brief discussion. First, consider the distinction between necessary and sufficient conditions. A necessary condition is something that must be present for an outcome to happen: being human is a necessary condition for going to college, since colleges do not admit other animals. But it is not a sufficient condition, one that, when present, the outcome is bound to happen, for not all human beings go to college. By contrast, dropping a lighted match into a bucket of gasoline is a sufficient condition for starting a fire, but not a necessary one, as there are many others, such as rubbing two sticks together. And if the bucket is empty there will be neither explosion nor fire; usually several conditions must concur for an act to have specific causal consequences [11]. Along similar lines, it makes little sense to attack our article's focus (much less to accuse us of malicious intent) merely because the ability to separate professional from novice does not alone suffice to render a study capable of definitively proving validity. Just as no single necessary condition will ever (in isolation) resolve the debate, its absence can still tell us whether research is suitable to answer the question of interest. Physicians, after all, have not ignored HIV simply because it is necessary, but not sufficient, for developing the more serious condition of AIDS [14]. Viewed in this light, his characterization of our study as some sinisterly plotted "gotcha" moment shows its true colors: a knee-jerk reaction rather than a well-reasoned critique. Second, his solitary foray into rebutting our conclusions with data, a citation to the recent paper by Growns et al. [15], relies on gross mischaracterizations of its aims and limitations. In short, he portrays it as establishing that professionals likely outperform novices when comparing fired bullets and cartridge cases. But that paper, which tested only visual comparison abilities and included not a single professional within its participant pool, did nothing of the sort. Its authors specifically reject the very inference he draws from their findings, noting that "[a]s participants were untrained novices, [it is] unclear whether these results generalize [to] practicing professionals" [15].

Really, though, we are grateful to him for directing readers to Growns et al.'s excellent work because, in reality, it supports the points made in our article: citing research showing domain expertise in other fields while acknowledging that none yet exists in the realm of firearms examination, and emphasizing the larger-than-expected role of a "domain-general ability" that "varies naturally in the general population" rather than emerging from training and experience (a finding that should logically impel further research rather than dispense with the need to compare performance). In other words, while we remain perplexed by his belief that the citation served his ends, especially given the inconsistency between his view that validity is settled and the authors' statement that "research [is only] beginning [to] explore [this] comparison" [15], we thank him for including an example of how to adequately establish the accuracy of practitioners. Third, he fares no better in mounting a defense of his own effort to validate virtual comparison microscopy [16]. He complains that when we quoted that study's statement that there "was no intention to select pairs" for the elimination sets in an attempt to lead to a false positive source attribution, we omitted the parenthetical that followed: "(e.g. strong carry subclass characteristics pairs)." But he fails to show how the latter "changes the meaning" of the quotation. The parenthetical uses e.g. (one example among many), not i.e. (that is); the statement of no intention therefore extends beyond subclass-bearing pairs, and in faulting our reading of Knowles' efforts to inject challenging sets, it is he who has misunderstood or misrepresented the paper. Nor are we at fault for declining to assume grammatical usage errors on the part of his coauthors. More to the point, however the parenthetical is read, our concerns about a bias against testing challenging eliminations stand. He provides no support beyond anecdotal conjecture about the rarity of subclass characteristics, which cannot justify ignoring evaluation of how practitioners grapple with such marks, especially when they have produced staggering rates of misidentification in tests of examiners [12]. More troublingly still, he calls our assertions about the nonexistent challenge "categorically misleading," though the evidence he cites applied exclusively to identifications. And while he repeatedly dismisses as inconsequential the different-source comparisons (just 20% of the total study), claiming they were not significant to the final outcome of the research, the study itself lists first (and thus, logically, as primary) the rationale "to reduce potential identification response bias." Given the harms to criminal defendants based on individual misidentifications [17, 18], a laboratory's policies cannot excuse shortcomings in research meant to satisfy those who have demanded sound measures of misidentification. Having dispensed with what substantive critique we could divine from his commentary, a final, albeit philosophical, distinction warrants drawing between our views and his. Early in his remarks, he paradoxically accuses us both of remaining perpetually tethered to the record as it existed in 2016 and of seeking, in his opinion, to "move the goal posts" on what adherents of the discipline must demonstrate to assuage doubts. We frankly do not understand the former claim, given that he himself concedes our expanded understanding of appropriate testing by harkening back to PCAST's criteria. It is the latter assertion we find disquieting. No scientist should bemoan the chance to expand the base of knowledge and develop new techniques and technologies; only the most sickly and stunted version of the scientific method would harbor such reverence for the precedent of the past. Down that road lie the dunkers of women to prove witchcraft, the phrenologists measuring bumps on skulls to uphold racist hegemony, and the backwater physicians still bleeding those burdened with illness, all of them enraged that empiricism would replace tradition. We hold a grander vision of the endeavor: those who research should relish the opportunity to prove the naysayers wrong, or to be proven wrong themselves, because when life and liberty are impacted by the methods they treasure, the bare minimum is to ground beliefs in more than desire. If practitioners hope to see their discipline rest once again on solid literature and stand up under the law, they must take this more expansive and formidable view. They must cease begrudging the goalposts and continue the long, hard grind of reflection and research. And they must make the Marshalls of the world a minority.

Language: English

Citations

0

Authors' response

Richard E. Gutierrez,

Emily J. Prokesch

Journal of Forensic Sciences, Journal Year: 2024, Volume and Issue: 70(1), P. 405 - 408

Published: Nov. 13, 2024

We thank the commentors for drawing our attention to two typographical errors in our original article, the spelling of Dr. Lilien's name and the inconclusive rate provided in table 2 for the "Duez examiners," which should have been 13% as opposed to 15%, though we emphasize that the latter did not carry over into the other figures and calculations (e.g., confidence intervals) throughout the remainder of the piece. Unfortunately, beyond its contribution to copy editing, their letter amounts to, at best, much ado about nothing and, at worst, something akin to statistical malpractice. But wherever along that spectrum readers place Weller et al.'s commentary after reviewing this reply, the analysis Weller et al. provide cannot support their strongly held (and potentially financially motivated) belief that they have contradicted the central claims of our article and proven the value of the work (Duez et al. [1]) even as an evaluation of performance on different-source comparisons. Indeed, given that we set out to explore whether the post hoc inclusion of a control group could offer insights into whether the comparisons in Duez et al. (especially the different-source comparisons) adequately explored the "full range and distribution of types and difficulty normally seen in casework" [2-4], we might well end this response now merely by noting the commentors' admission that the samples used look such that "'laypersons' are unlikely to misidentify them." But because gimmicks might otherwise prove persuasive to casual readers and to those with only the most superficial understanding of hypothesis testing, we feel obliged to offer a more fulsome rebuttal. To begin, the commentors' caveat that they "feel no need" to publish a detailed line-by-line analysis does little to compensate for their perfunctory, self-absolving, and internally inconsistent discussion of the appropriate characteristics of a control group, the use of static images, and the binning of examiners and trainees. In our article, we specifically cautioned that we drew our participant pool from "a sample of convenience" that was not necessarily "representative of defense attorneys as a whole," much less of novices writ large [4]. And we conceded that we would have preferred to test our lay group on the same materials as the participants in the study itself (i.e., 3D scans of the cartridge cases from CCTS2); indeed, we recommended that future studies endeavor to correct both limitations. The commentors' criticisms ultimately fail to expand credibly upon these concessions. Initially, they speculate that the obliquely lit images may have advantaged our participants. Not only does that argument disserve firearms examiners, whose expertise, if they possess any, it reduces to mere lighting rather than comparison ability; it also ignores the commentors' own role in forcing us to rely on less-than-ideal samples: they neither explain their years-long failure to make the CCTS2 scans public by uploading them to the reference database maintained by the National Institute of Standards and Technology [5], nor have they since corrected that lack of transparency. Instead, they have chosen a path of scientific entrapment, ensuring that outside researchers cannot attempt to reproduce or reassess the findings without opening themselves up to criticism regarding inexactly matching items. At bottom, the Hobson's choice they have created cannot coexist with legitimate and open debate. Much the same is true of how to categorize participants. They note again that "some trainees had" more "experience" with "toolmark examination" than the "'lay' attorneys" in the NFC group. However, they do not forthrightly acknowledge their own role (by failing to collect sufficient demographic data) in creating any ambiguities about the level of training each trainee enjoyed. It strikes us as thoroughly unfair and disingenuous to exploit a gap in data collection of their own making while contrasting our supposed failures on both fronts. But putting aside concerns of hypocrisy and self-absolution, their simultaneous treatment of our attorneys as quasi-experts (only three of whom had ever completed such a case before our study, while the majority had never cross-examined an examiner or received training in the field [4]) and of trainees as deserving the 'lay' moniker cannot withstand scrutiny. With all due respect to the Cambridge Dictionary they cite to define "layperson" [6], we prefer to draw not on the expertise of English wordsmiths but on the courts that must grapple with whether to qualify a particular witness [7].

As we noted, the legal definition of an expert sets a low bar indeed [4, 8]. And while trainees relatively early in their programs would likely satisfy such minimal requirements (armed as they may be with days of education specific to, and practice in, such comparisons), our attorneys surely would not (on the basis of indirect knowledge gained through cross-examination, or a 20-minute introduction) [9-11]. Sad day though it may be when practitioners barely outcompete outsiders with zero hands-on experience, given the lines drawn when qualifying witnesses, our decision, for the main conclusions, to bin trainees with professionals accords with best practice guidance for classifying professionals in feature-comparison fields and recommendations concerning introductory control groups [12]. Unless and until the forensic community imposes mandatory requirements that courts respect, simply claiming otherwise does not make it so. Preliminaries aside, the commentors devote the bulk of their letter to using Fisher's exact test to evaluate a cherry-picked portion of the overall data, alleging that this selective approach has "rejected [our] published conclusions" by showing a statistically significant difference between the groups. We doubt the wisdom of non-statisticians debating the merits and application of various mathematical methods for testing the relationship between training/experience and performance. We avoided p-values and significance thresholds outright out of concern that they have provoked controversy across fields, with some journals banning them outright and scholars issuing complex guidance to guard against misleading and unsupported inferences [13-15]. We continue to believe that, by reporting confidence intervals and focusing on patterns and trends in the data, we engaged in sound practices [13]; the commentors' decision to forge ahead into this morass alone (without a statistician in partnership) has made avoiding the accompanying reality of a debate between non-experts impossible. Before diving into their test, we admit considerable confusion about the conclusions they see fit to draw. Specifically, while they repeatedly, and quite forcefully, contend that they have rejected our "published conclusions," the results they report, confined to just Duez et al., almost entirely confirm our findings. We expressed concern about a bias away from eliminations [16, 17], and their own analysis (at least its results) supports that concern: as many as 13 orders of magnitude separate the p-values they report when comparing same-source versus different-source samples. On the point of professional strength on same-source comparisons, we cautioned "against sweeping definitive conclusions" regarding identifications, and stated that our results show separation between professionals and participants on those metrics (performance on sensitivity, false-negative rate, and inconclusive rate). Our conclusion and their result align. We reported that our participants' and the trained examiners' false-positive rates were essentially indistinguishable; again, their analysis confirms that conclusion, as does what we observed in terms of specificity (after further explaining the disadvantages faced by our group, including time constraints and the inability to zoom or align items in a manner comparable to the original participants). The results showing professional and novice separation on some metrics thus merely echo our caveated conclusions. Only their analysis of the overall data (thus incorporating, and overrelying on, the same-source comparisons) can arguably be made to run counter to what we reported; only there can they find contradiction (rather than the widespread correspondence that is the reality), and that is a form of "p-hacking" [14, 15], a claim not rebutted by limiting the term to manipulations of a study's full dataset. To be clear, the problem is not only the inappropriately binary logic of deploying discrete thresholds (like the 0.05 used by Weller et al.) [14]. Rather, "researcher degrees of freedom," that is, the ability to pick and choose what to include or exclude and "[w]hich conditions [are] combined and [which] ones compared," allow authors to manipulate analyses and near-guarantee a favored result [15]. The American Statistical Association has chastised such practices, emphasizing that "[c]herry-picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and 'p-hacking,' leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided" [14]. To remedy such problems, the ASA voiced a consistent mandate of transparency: "P-values and related analyses" should not be "reported selectively," since "[c]onducting multiple" analyses and reporting only those with "certain" p-values "renders" them "essentially uninterpretable" [14], and, when eliminating observations, authors should disclose which observations were included. The commentors do not abide by these principles: although we explained at length why the groups ought to be compared across the full data, they neglect to include the full study in their analysis, choosing instead to test exclusively the slice that favors their desired result (a clean bill for the professionals, and probative value for the study). So regardless of the process that preceded the creation of their commentary, it runs afoul of these principles, and that failure conceals the malignant heart of their inferences. Had they analyzed the data as we did, rather than only in cherry-picked form, they could not have made the claims they did.

Indeed, the p-values generated from the data we flagged would, in a table like their Table 1, uniformly exceed their chosen threshold by wide margins. In other words, were we content to rely on such thresholds, we could have dispensed with our handwringing caveats around specificity: in a world where the bar to qualification as an examiner is nearly non-existent, we remain firm that the study therefore offers little to labs when it comes to questions of ground truth eliminations. Whether the commentors knew this and refused to say so, stubbornly failed to run an analysis that would not reinforce preexisting beliefs, or arrived at their position in ignorance, we cannot say. Beyond these inferential inadequacies, the commentary forcefully brings us back to PCAST's emphasis on the involvement (during validation testing) of "independent third parties" with no "stake" in the "outcome" [3]. While law enforcement has chafed at that requirement [18-20], we see this exchange as emblematic of PCAST's wisdom. Those with financial interests in the virtual comparison microscopy techniques the study sought to validate now defend it by commentary. The study has become a component of the company's marketing, appearing on its website below text proclaiming that "Cadre's scanning hardware, VCM software," and "algorithms" have been "developed, validated," and "peer reviewed" in "journals" [21]. In our view, the sway of those interests may explain several concerning aspects of the analysis that we struggle to imagine disinterested scientists employing, like arguments from authority (their extended recitation of awards received) and the refusal to note that one of them (Mr. Weller), far from being found "remarkably credible," was found "lack[ing] objectivity in virtually every area" by an Illinois judge who excluded, in part, his testimony [22]. No doubt, with finances on the line, they will try to buttress the study by pointing to a follow-up [23], though an author conceded under oath, in a matter where it was at issue, that its sets contained no "similarities" or "features" of a "kind" that would "dupe" or "trick" an examiner into "making [a] false positive" [24]. We do not pretend to independence or disinterest ourselves (given our long careers defending the indigent accused and confronting the admissibility of this evidence), but we have demonstrated our allegiance to transparency by fully supplying our raw materials (including linked data) for scrutiny by other researchers. For commentors whose interest stems not from philosophy but from dollars and cents, we demand as much. Given their prior work, that may be a leap.
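To make the selective-analysis concern concrete, here is a hedged sketch of Fisher's exact test applied to invented 2x2 tables (none of these counts come from the studies discussed); the point is only that the p-value depends heavily on which slice of the data one chooses to test:

```python
# Fisher's exact test (two-sided, via the hypergeometric distribution)
# on invented professional-vs.-novice tables. Illustration only.
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    def p_table(x):
        # Hypergeometric probability of the table with top-left cell x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [p_table(x) for x in range(lo, hi + 1)]
    return sum(p for p in probs if p <= p_obs + 1e-12)

# rows = [professionals, novices]; cols = [correct, incorrect]
print(fisher_two_sided(90, 10, 60, 40))   # one slice: "significant"
print(fisher_two_sided(45, 5, 44, 6))     # another slice: nowhere close
```

Which slice gets reported is exactly the "researcher degree of freedom" the ASA statement warns about.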

Language: English

Citations

0

More unjustified inferences from limited data in Guyll et al. (2023)

Richard E. Gutierrez

Law Probability and Risk, Journal Year: 2024, Volume and Issue: 23(1)

Published: Jan. 1, 2024

In recent years, multiple scholars have criticized the design of studies exploring the accuracy of firearms examination methods. Rosenblum et al. extend those criticisms to the work of Guyll et al. on practitioner performance when comparing fired cartridge cases. But while they thoroughly dissect issues regarding equiprobability bias and positive predictive values in that study, they do not delve as deeply into other areas, such as variability in participant performance, as well as the sampling of participants and test samples, that further undercut the ability to generalize Guyll et al.'s results. This commentary extends what Rosenblum et al. began and explores how the low rates of error reported by Guyll et al. likely underestimate the potential for misidentifications in casework. Ultimately, given their convenience samples, the authors should not have gone beyond descriptive statistics to instead draw conclusive inferences that classify firearms examination as "a highly valid forensic technique."
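A brief simulation sketch of the variability point (all parameters invented): a pooled error rate can look reassuringly low even when a minority of examiners err at a much higher rate, which is one way averages understate the potential for misidentification in casework.

```python
# Pooled error rates can mask examiner-level variability (toy numbers).
import random

random.seed(1)
# Suppose 90 examiners err on 0.5% of comparisons and 10 err on 10%.
examiner_rates = [0.005] * 90 + [0.10] * 10
trials_each = 50

errors = sum(sum(random.random() < r for _ in range(trials_each))
             for r in examiner_rates)
pooled = errors / (len(examiner_rates) * trials_each)
print(f"pooled error rate:  {pooled:.2%}")    # low on average
print(f"weakest performers: {max(examiner_rates):.0%} per comparison")
```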

Language: English

Citations

0
