On Bayesian Sensitivity Analysis in Digital Forensics

The idea of using Bayesian Belief Networks (BBNs) in digital forensics to quantify evidence has been around for a while now. Providing qualitative approaches to Bayesian evidential reasoning in digital meta-forensics, however, is relatively new in decision support systems research. For law enforcement, decision support and the application of data mining techniques to “soft” forensic evidence form a large area of Bayesian forensic statistics. This area has shown how Bayesian networks can be used to infer the probability of the defense and prosecution propositions from the forensic evidence. Kevin B. Korb and Ann E. Nicholson's study “Sally Clark is Wrongly Convicted of Murdering Her Children”, together with work on Linguistic Bayesian Networks for reasoning with subjective probabilities in forensic statistics, gives insight into an important development: quantifying what forensic expert testimony means when it claims “strong support”.
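The defense-versus-prosecution framing is usually expressed through the odds form of Bayes' theorem. As a reminder (standard notation, not taken from the studies cited above), let H_p be the prosecution proposition, H_d the defense proposition and E the evidence:

```latex
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  = \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
  \times \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
```

Verbal expressions such as “strong support” are attempts to attach words to ranges of the likelihood ratio. The exact thresholds differ between published guidelines.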

The IEEE paper “Sensitivity Analysis of a Bayesian Network for Reasoning about Digital Forensic Evidence”, published at the 3rd International Conference on Human-Centric Computing (HumanCom 2010), is of particular interest because it works with a comprehensive, real-world list of evidence items and hypotheses.

In the authors' words: “A Bayesian network representing an actual prosecuted case of illegal file sharing over a peer-to-peer network has been subjected to a systematic and rigorous sensitivity analysis. Our results demonstrate that such networks are usefully insensitive both to the occurrence of missing evidential traces and to the choice of conditional evidential probabilities.”
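The sketch below is a minimal Python illustration of what such a sensitivity analysis measures. It is not the paper's BitTorrent BBN. It assumes a single binary hypothesis H whose evidence items are conditionally independent given H, which is a naive-Bayes simplification. The five evidence names (E1 to E5), their likelihoods and the prior are all hypothetical. The code computes the posterior from all evidence, then recomputes it with each item missing in turn and with every likelihood pair weakened.

```python
import math

# Hypothetical evidence items and likelihoods, for illustration only.
# Each entry is (P(E present | H true), P(E present | H false)).
LIKELIHOODS = {
    "E1": (0.85, 0.15),
    "E2": (0.80, 0.10),
    "E3": (0.90, 0.05),
    "E4": (0.75, 0.20),
    "E5": (0.70, 0.10),
}
PRIOR = 0.5  # assumed prior P(H true); an uninformative starting point


def posterior(likelihoods, observed, prior=PRIOR):
    """P(H true | observed items) under a naive-Bayes simplification.

    Items are assumed conditionally independent given H. Items absent
    from `observed` are treated as unobserved, so their factors drop out.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name in observed:
        p_true, p_false = likelihoods[name]
        log_odds += math.log(p_true / p_false)  # add the log likelihood ratio
    return 1.0 / (1.0 + math.exp(-log_odds))


def leave_one_out(likelihoods, prior=PRIOR):
    """Posterior when each single evidence item is missing in turn."""
    names = list(likelihoods)
    return {
        missing: posterior(likelihoods, [n for n in names if n != missing], prior)
        for missing in names
    }


def weaken(likelihoods, delta):
    """Lower P(E|H) and raise P(E|not H) by `delta`, clamped to [0.01, 0.99]."""
    def clamp(p):
        return min(0.99, max(0.01, p))
    return {
        name: (clamp(p_true - delta), clamp(p_false + delta))
        for name, (p_true, p_false) in likelihoods.items()
    }


if __name__ == "__main__":
    everything = list(LIKELIHOODS)
    print(f"all evidence:      {posterior(LIKELIHOODS, everything):.5f}")
    for missing, p in leave_one_out(LIKELIHOODS).items():
        print(f"{missing} missing:        {p:.5f}")
    for delta in (0.05, 0.10):
        weakened = weaken(LIKELIHOODS, delta)
        print(f"weakened by {delta:.2f}: {posterior(weakened, everything):.5f}")
```

The log-odds form shows the mechanism. Each observed item adds its log likelihood ratio. Dropping one item, or shrinking every ratio toward 1, therefore moves the posterior only slightly while the remaining items still point the same way. A network with intermediate sub-hypotheses, like the one in the paper, would need a proper inference engine rather than this simple product of ratios.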

One of the co-authors, Dr. Overill, has also covered related ground in “A Complexity Based Forensic Analysis of the Trojan Horse Defence”.

The evidence nodes are as follows:

Modification time of the destination file equals that of the source file
Creation time of the destination file is after its own modification time
Hash value of the destination file matches that of the source file
BitTorrent client software is installed on the seized computer
File link for the shared file is created
Shared file exists on the hard disk
Torrent file creation record is found
Torrent file exists on the hard disk
Peer connection information is found
Tracker server login record is found
Torrent file activation time is corroborated by its MAC time and link file
Internet history record about the publishing website is found
Internet connection is available
Cookie of the publishing website is found
URL of the publishing website is stored in the web browser
Web browser software is available
Internet cache record about the publishing of the torrent file is found
Internet history record about the tracker server connection is found

The root hypothesis is that the seized computer was used as the initial seeder to share the pirated file on a BitTorrent network. It is broken down into the following sub-hypotheses:

The pirated file was copied from the seized optical disk to the seized computer
A torrent file was created from the copied file
The torrent file was sent to newsgroups for publishing
The torrent file was activated, which caused the seized computer to connect to the tracker server
The connection between the seized computer and the tracker server was maintained

The authors conclude, in a result that vindicates reasoning from sparse evidence:

The sensitivity analysis reported in this paper demonstrates that the BT BBN used in [...] is insensitive to the occurrence of missing evidence and also to the choice of evidential likelihoods to an unexpected degree.

Our overall finding is gratifying because it implies that the exact choice of values for the inherently subjective evidential likelihoods is not as critical as might have been expected. Values falling within the consensus of experienced expert investigators are sufficiently reliable to be used in the BBN model. Furthermore, our results imply that the inability to recover one or more evidential traces in a digital forensic investigation is not generally critical for the probability of the investigatory hypothesis under consideration.

For some reason, this reminded me of a recent read, SuperFreakonomics, in which the authors describe a terrorist-detection algorithm built around the following black-box variable:

“What finally made it work was one last metric that dramatically sharpened the algorithm. In the interest of national security, we have been asked not to disclose the particulars; we’ll call it Variable X.

What makes Variable X so special?

For one, it is a behavioral metric, not a demographic one. The dream of anti-terrorist authorities everywhere is to somehow become a fly on the wall in a room full of terrorists. In one small important way, Variable X accomplishes that. Unlike most other metrics in the algorithm, which produce a yes or no answer, Variable X measures the intensity of a particular banking activity. While not unusual in low intensities among the general population, this behavior occurs in high intensities much more frequently among those who have other terrorist markers.

This ultimately gave the algorithm great predictive power. Starting with a database of millions of bank customers, Horsley was able to generate a list of about 30 highly suspicious individuals. According to his rather conservative estimate, at least 5 of those 30 are almost certainly involved in activities. Five out of 30 isn’t perfect (the algorithm misses many terrorists and still falsely identifies some innocents), but it sure beats 495 out of 500,495.”
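The comparison at the end of the quote is about precision: the fraction of flagged people who are true positives. The snippet below uses only the two ratios quoted and treats “at least 5” as exactly 5, so the algorithm's figure is a lower bound.

```python
from fractions import Fraction

# Precision = true positives / everyone flagged, from the quoted figures only.
algorithm = Fraction(5, 30)        # Horsley's list: at least 5 of about 30
baseline = Fraction(495, 500_495)  # the comparison figure in the quote

print(f"algorithm precision: {float(algorithm):.2%}")
print(f"baseline precision:  {float(baseline):.3%}")
print(f"improvement factor:  {float(algorithm / baseline):.1f}x")
```

That is roughly 16.7% against about 0.1%, an improvement of around 170 times.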

Bayesian Belief Networks can serve as a better probabilistic graphical model for such network-related algorithms. They offer improved visibility into the reasoning, along with explicit prior and posterior probabilities.
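The prior/posterior point can be illustrated with the smallest possible network. It has one hypothesis node (is this person a target?) and one evidence node (did the algorithm flag them?). The sketch below applies Bayes' theorem to that two-node network. The base rate, sensitivity and false-positive rates are hypothetical values, chosen only to show how a small prior dominates the posterior.

```python
def posterior_given_flag(prior, sensitivity, false_positive_rate):
    """P(target | flagged) by Bayes' theorem for a two-node network.

    prior               -- P(target), the base rate in the screened population
    sensitivity         -- P(flagged | target)
    false_positive_rate -- P(flagged | not target)
    """
    p_flag = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_flag


if __name__ == "__main__":
    # Hypothetical numbers: 1 target per 100,000 people, 99% sensitivity.
    for fpr in (0.01, 0.001, 0.0001):
        p = posterior_given_flag(prior=1e-5, sensitivity=0.99, false_positive_rate=fpr)
        print(f"false-positive rate {fpr}: P(target | flagged) = {p:.4f}")
```

Under these assumptions, a 99%-sensitive screen with a 1% false-positive rate still gives a posterior below one in a thousand. That is the same range as the quote's 495 out of 500,495. Only cutting false positives moves the posterior materially. A fuller BBN would make the same trade-off explicit across many variables at once.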