
\addtocounter{chapter}{1}\setcounter{section}{0}\specialchapt{CHAPTER 2. AUTOMATIC MATCHING OF BULLET LANDS}

\begin{center}

A paper submitted to the \textbf{Annals of Applied Statistics}. \\
Eric Hare, Heike Hofmann, Alicia Carriquiry

\textbf{Abstract}

\end{center}

In 2009, the National Academy of Sciences published a report questioning the scientific validity of many forensic methods including firearm examination. Firearm examination is a forensic tool used to help the court determine whether two bullets were fired from the same gun barrel. During the firing process, rifling, manufacturing defects, and impurities in the barrel create striation marks on the bullet. Identifying these striation markings in an attempt to match two bullets is one of the primary goals of firearm examination. We propose an automated framework for the analysis of the 3D surface measurements of bullet lands that first transcribes the markings into a 2D plotting framework. This makes identification of matches easier and allows for a quantification of both matches and matchability of barrels. The automatic matching routine we propose manages to (a) correctly identify lands (the surface between two bullet grooves) with too much damage to be suitable for comparison, and (b) correctly identify all 10,384 land-to-land matches of the James Hamby study [@hamby:2009].

\newpage

```{r, fig.keep='all', cache=FALSE, echo=FALSE, eval=TRUE, message=FALSE, warning=FALSE}
#rm(list=ls())
#wd <- getwd()
library(extrafont)
library(knitr)
imgdir <- "Figure/"
codedir <- "code"
datadir <- "images/Hamby (2009) Barrel/bullets/"

options(replace.assign=TRUE,scipen=3, digits=2)
bstats <- read.csv("data/data-25-25/bullet-stats-old.csv")

# Strip the directory and the ".x3p" extension from a file path.
scrubPath <- function(x) {
  splits <- strsplit(as.character(x), split="/")
  last <- sapply(splits, function(x) x[length(x)])
  sub("\\.x3p$", "", last)
}

library(RColorBrewer)
library(ggplot2)
library(scales)
library(dplyr)
library(bulletr)
library(grid)
library(gridExtra)
library(zoo)
library(tidyr)
library(rpart)
library(rpart.plot)
library(xtable)
library(sm)
library(reshape2)
library(randomForest)
```

# Introduction

Firearm examination is a forensic tool used to help the court determine whether two bullets were fired from the same gun barrel. This process has broad applicability in terms of convictions in the United States criminal justice system. Firearms identification has long been considered an accepted and reliable procedure, but in the past ten years has undergone more significant scrutiny. In 2005, in *United States vs. Green*, the court ruled that the forensic expert could not confirm that the bullet casings came from a specific weapon with certainty, but could merely "describe" other casings which are similar. Further court cases in the late 2000s expressed caution about the use of firearms identification evidence [@giannelli:2011].

In 2009, the National Academy of Sciences published a report [@NAS:2009] questioning the scientific validity of many forensic methods including firearm examination. The report states that "[m]uch forensic evidence -- including, for example, bite marks and firearm and toolmark identification -- is introduced in criminal trials without any meaningful scientific validation, determination of error rates, or reliability testing to explain the limits of the discipline."

Rifling, manufacturing defects, and impurities in a barrel create striation marks on the bullet during the firing process. These marks are assumed to be unique to the barrel, as described in a 1992 AFTE article [@afte:1992]: "The theory of identification as it pertains to the comparison of toolmarks enables opinions of common origin to be made when the *unique surface contours* of two toolmarks are in sufficient agreement." The article goes on to state that "[s]ignificance is determined by the comparative examination of two or more sets of surface contour patterns comprised of individual peaks, ridges and furrows."

From a statistical standpoint, identification of the gun that fired the bullet(s) requires that we compare the probabilities of observing matching striae under the competing hypotheses that the gun fired, or did not fire, the crime scene bullet. If indeed the uniqueness assumption is plausible, the latter probability approaches zero.

Current firearm examination practice relies mostly on visual assessment and comparison of striations. Indeed, the AFTE Theory of Identification (https://afte.org/about-us/what-is-afte/afte-theory-of-identification) explicitly requires that examiners evaluate the strength of similarity between two samples relative to other comparisons they may have carried out in the past. One attempt to quantify the degree of similarity, first proposed by @biasotti:1959, consists of counting the number of consecutively matching striae (CMS) between two bullets. This approach has two drawbacks, however. First, determining matching striae is still a subjective activity. Second, as discussed by @miller:1998, the number of CMS may be high even if the bullets were not fired by the same gun.
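
To make the CMS count concrete, the following minimal sketch computes the longest run of consecutive matches from a vector of per-stria match decisions. The helper `count_cms` and the input vector are hypothetical and serve only as an illustration; how each stria is judged to match is precisely the subjective step discussed above.

```r
# Hypothetical helper (not part of bulletr): length of the longest run of
# consecutive TRUE values, i.e. the number of consecutively matching striae.
count_cms <- function(is_match) {
  runs <- rle(as.logical(is_match))
  matched_runs <- runs$lengths[runs$values]
  if (length(matched_runs) == 0) 0L else max(matched_runs)
}

# Illustrative input: TRUE marks a stria judged to match, FALSE one that does not.
striae <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
cms <- count_cms(striae)
```

Here the longest run of matches has length three, so `cms` is 3, even though five of the seven striae match overall.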

Here, we focus on the question of defining a metric that can be used to objectively compare two bullets. We propose a framework which allows for the automatic analysis of the surface topographies of bullets, and the transcription of the individual characteristics into a 2D plotting framework.

We work with images from the James Hamby Consecutively Rifled Ruger Barrel Study [@hamby:2009]. Ten consecutively rifled Ruger P-85 pistol barrels were obtained from the manufacturer and fired to produce 20 known test bullets and 15 unknown bullets for comparison. 3D topographical images of each bullet were obtained using a NanoFocus lens at 20x magnification and made publicly available on the NIST Ballistics Database Project\footnote{\url{http://www.nist.gov/forensics/ballisticsdb/hamby-consecutively-rifled-barrels.cfm}} in a format called x3p (XML 3-D Surface Profile). The x3p format conforms to the ISO 5436-2 standard\footnote{\url{http://sourceforge.net/p/open-gps/mwiki/X3p/}}, implemented to provide a simple and standard-conforming way to exchange 2D and 3D profile data. It was adopted by the OpenFMC (Open Forensic Metrology Consortium\footnote{\url{http://www.openfmc.org/}}), a group of academic, industry, and government firearm forensics researchers whose aim is to establish best practices for researchers using metrology in forensic science. We have developed an open-source R package [@R] for analyzing bullet lands. This package, called bulletr [@bulletr], enables direct reading and manipulation of x3p files and implements all of the methods we propose in this paper. A different package for reading x3p files, x3pr [@x3pr], was developed by Petraco (2014), but it is not designed to carry out calculations like the ones we propose after the x3p files have been read.

\begin{figure}[H]
\centering
\includegraphics[width=\linewidth]{images/sidex3p.png}
\caption{View of the data along the circumference of the bullet (circular segment of about 30 degrees).}
\label{fig:sidex3p}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[width=\linewidth]{images/topx3p.png}
\caption{Frontal view of a bullet land (lower end of the view is the bottom of the bullet).}
\label{fig:topx3p}
\end{figure}

```{r, echo=FALSE}
typical_width <- read.csv("csvs/grooves.csv") %>%
    summarise(length = mean(groove_right - groove_left)) %>%
    as.numeric %>%
    round(digits = 2)
```

Each fired bullet is provided in the form of a set of six x3p files, where each file is a surface scan between adjacent grooves on the bullet, called a "land". In the Hamby data, the typical length (groove-to-groove) of a land is about `r typical_width` micrometers or `r typical_width / 1000` millimeters. For notational simplicity, we refer to a particular land of a bullet as bullet X-Y, where X is the bullet identifier, and Y is the land number. An example of one of these lands is plotted in Figures \ref{fig:sidex3p} and \ref{fig:topx3p}, which show the side and top profiles of the land, respectively. The tilt of the lines to the left in Figure \ref{fig:topx3p} is not an artifact, but a direct and expected consequence of the spin induced by the rifling during the firing process. Depending on whether a barrel is rifled clockwise or counter-clockwise, the striations have a left or right tilt. The direction of the rifling is a class characteristic, i.e. a feature that pertains to a particular class of firearms, and is not unique at the individual barrel/bullet level.

The typical number and width of striation markings on bullets vary significantly depending on the gun barrel. For instance, a Smith and Wesson barrel with a land width of 2.4 millimeters contained an average of 60 striae, with an average width of about 0.08 millimeters [@chu:2011].

The purpose of our paper is to present an automatic matching routine that allows for a completely objective assessment of the strength of a match between two bullet lands. While we assess the performance of the algorithm in terms of a binary decision of match vs. non-match using a 50% probability cut-off, our primary goal is to highlight the features that are statistically associated with matches and non-matches, and to provide a quantitative assessment of this association. In a real-world application of our algorithm, the raw scores would need further analysis and scrutiny, and it is likely a 50% cut-off would be an inappropriate choice on the basis of reasonable doubt.

Our algorithm is fully open source and available on GitHub [@bulletr]. This transparency allows for a greater understanding of the individual steps involved in the bullet matching process, and allows other forensic examiners, as well as outside observers, to examine the factors that discriminate between known bullet matches and non-matches. We have chosen to perform the matching on a land-to-land level, rather than bullet-to-bullet level. Although doing so introduces an implicit assumption of independence between lands, assuming independence only serves to make the task more challenging.

The remainder of this paper is structured as follows: We first briefly review some earlier work. We then discuss two methods of modeling the class structure of the bullet surfaces. Finally, we describe an automatic matching routine, which we evaluate on the bullets made available through the Hamby study.

# Previous Work

There have been attempts to develop automatic or semi-automatic matching protocols, but most have focused on breech face and firing pin marks [e.g., @riva:2014] or discuss a single attribute for comparison [e.g., @vorburger:2011; @chu:2011]. Still others refer to proprietary algorithms [@roberge:2006]. We briefly review some of this earlier work in what follows.

The original paper on the complete Hamby study already reports the successful use of several computer-assisted methods. However, aside from a false-positive rate of zero, it reports neither false-negative error rates for bullets nor error rates for land-to-land matches.

@lock2013significance proposed an approach to quantify similarity of toolmarks. Their algorithm determines an optimal matching window between two toolmark signatures, and then performs a set of both coordinated and independent shifts. Given a match, the coordinated shifts would be expected to yield correlation values higher than those obtained from independent shifts. This is assessed using a Mann-Whitney U Statistic. 

A procedure for bullet matching using the BulletTrax3D system is described in @roberge:2006. Their study used a different set of ten consecutively rifled barrels; matches are identified based on a bullet-to-bullet correlation score. The authors state that this process "could be automated", but no implementation of the algorithm is available.

Modern automated techniques using 3D images have also been proposed, e.g. by @riva:2014. However, the authors focused on cartridge cases and not bullets. This might seem like a trivial distinction, but it has implications for the development of the algorithm. Their algorithm aligns striae by a rotation in the XY plane, which does not generalize to bullets, whose surfaces are curved rather than flat.

Other work on 3D images has been described by @petraco:2012, who also focus on cartridge cases, as well as screwdriver striation patterns, and by others [e.g., @chu:2011; @chu:2010; @vorburger:2011].

# Bullet Signatures

To analyze the striation pattern, we extract a **bullet profile** [@ma:2004] by taking a cross section of the surface measurements at a fixed height $x$ along the bullet land. Figure \ref{fig:fixedX} shows a plot of the side profile of a bullet land. It can be seen that the global structure of the land dominates the appearance of the plot. The grooves can be clearly identified on the left and right side, and the curvature of the surface is the most visible feature in the middle.
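
As a minimal sketch of what extracting a profile involves, the code below takes a cross section from a surface stored in long format (one row per grid cell). The data frame `surface`, its made-up heights, and the helper `extract_profile` are assumptions for illustration only; the grid spacing of 1.5625 $\mu m$ is the value that appears in the code of this chapter, and the actual analysis uses the functions provided by bulletr.

```r
# Hypothetical long-format surface: one row per (x, y) grid cell.
resolution <- 1.5625                       # grid spacing in micrometers
surface <- expand.grid(x = (0:200) * resolution,
                       y = (0:300) * resolution)
surface$value <- 0.01 * surface$y + sin(surface$y / 20)   # made-up heights

# Hypothetical helper: return the cross section at the grid height closest
# to the requested x, ordered along y.
extract_profile <- function(surface, x) {
  grid_x <- unique(surface$x)
  nearest <- grid_x[which.min(abs(grid_x - x))]
  profile <- surface[surface$x == nearest, c("x", "y", "value")]
  profile[order(profile$y), ]
}

profile <- extract_profile(surface, x = 243.75)
```

The result contains one surface measurement per value of $y$, which is the form of data plotted in the side profile.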

\begin{figure}[hbtp]
  \centering
```{r fixedX, echo=FALSE, warning=FALSE, message=FALSE, fig.height=2, fig.width=6, out.width='0.65\\textwidth'}

cols = c(alpha("grey60", alpha=0.6), alpha("black", 0.5))

br111 <- read_x3p(paste(datadir,"Br1 Bullet 1-5.x3p", sep = "/"))
dbr111 <- fortify_x3p(br111)

pars <- data.frame(getCircle(dbr111$y, dbr111$value))
dbr111$theta <- acos((dbr111$y-pars$x0)/pars$radius)/pi*180
dbr111 <- dbr111 %>% mutate(
  xpred = cos(theta/180*pi)*pars$radius + pars$x0,
  ypred = sin(theta/180*pi)*pars$radius + pars$y0
)

qplot(data=subset(dbr111, x <= 100*1.5625^2 & x >= 99*1.5625^2), y, value, geom="line", size=I(1)) +
  geom_line(aes(x=xpred, y=ypred, group=x), 
            colour="grey30", size=0.25) +
#  ylab(expression(paste("Surface Measurements (in ",mu,m,")", sep=""))) + 
  ylab("") +
  theme_bw() + 
  theme(legend.position="bottom") #+ coord_equal()
```

\caption{\label{fig:fixedX}Side profile of the surface measurements (in~$\mu m$) of a bullet land at a fixed height $x$. Note that the global features dominate the deviations that correspond to the individual characteristics of striation marks.}
\end{figure}

The smooth curve on the plot represents a segment of a perfect circle with the same radius as the bullet. While the circle is an obvious first choice for fitting the structure, it does not completely capture the bullet surface after it was fired. A discussion of a circular fit and the remaining residual structure can be found in Section \ref{cylindrical-fit}.

Instead of a circular fit, we use multiple loess fits to model the overall structure and extract the bullet markings. 

## Identifying Groove Locations

We first identify the location of the left and right grooves in the image. The grooves are assumed to contain no information relevant for determining matches. They also dominate the structure, and therefore need to be removed.
Fortunately, the location and appearance of the grooves in the surface profiles are quite consistent.
Surface measurements reach local maxima around the peak of the groove at either end of the range of $y$, and we can then follow the descent of the surface measurements inwards to the valley of the groove.
The locations of the valleys mark the points at which we trim the image. The procedure can be described as follows:

1. At a fixed height $x$, extract a bullet's profile (Figure \ref{fig:loess_step1}, with $x = 243.75\mu m$).
2. For each $y$ value, smooth out any deviations occurring near the minima by twice applying a rolling average with a pre-set \emph{smoothing factor} $s$ (Figure \ref{fig:loess_step3}, smoothing factor $s = 35$, corresponding to 55$\mu m$).
3. Determine the location of the peak of the left groove by finding the first doubly-smoothed value $y_i$ that is the maximum within its smoothing window (e.g. such that $y_i > y_{i - 1}$ and $y_i > y_{i + 1}$, where $i$ is between 1 and $\lfloor s/2 \rfloor$). We call the location of this peak $p_{\ell}$ (see Figure \ref{fig:loess_step47}).
4. Similarly, determine the location of the valley of the left groove by finding the first doubly-smoothed value $y_j$ that is the minimum within its smoothing window. Call the location of this valley $v_{\ell}$.
5. Reverse the order of the $y$ values and repeat the previous two steps to find the peak and valley of the right groove, $(p_{r}, v_{r})$.
6. Trim the surface measurements to values within the two grooves (i.e.\ remove all records with $y_i < v_{\ell}$ or $y_i > v_{r}$) (see Figure \ref{fig:loess_step47}).
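
The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the implementation in bulletr: the helpers `roll_mean`, `first_extremum`, `left_groove` and `find_grooves` are hypothetical, extrema are detected by comparing each smoothed value with its two neighbours, and the profile is an idealized, noise-free curve with a raised shoulder at each end.

```r
# Rolling mean with window s; incomplete windows at both ends are dropped.
roll_mean <- function(v, s) {
  sm <- stats::filter(v, rep(1 / s, s), sides = 2)
  as.numeric(sm[!is.na(sm)])
}

# Index of the first interior local maximum (or minimum) of v.
first_extremum <- function(v, type = c("max", "min")) {
  type <- match.arg(type)
  d <- diff(v)
  up_then_down <- d[-length(d)] > 0 & d[-1] < 0
  down_then_up <- d[-length(d)] < 0 & d[-1] > 0
  hits <- which(if (type == "max") up_then_down else down_then_up)
  hits[1] + 1
}

# Steps 3 and 4: first peak, then the first valley after that peak.
left_groove <- function(sm) {
  peak <- first_extremum(sm, "max")
  valley <- peak + first_extremum(sm[-seq_len(peak)], "min")
  c(peak = peak, valley = valley)
}

find_grooves <- function(value, s = 35) {
  sm <- roll_mean(roll_mean(value, s), s)        # step 2: smooth twice
  offset <- (length(value) - length(sm)) / 2     # points lost on each side
  left <- left_groove(sm) + offset
  # Step 5: same search on the reversed series, mapped back to original indices.
  right <- length(value) + 1 - (left_groove(rev(sm)) + offset)
  list(peak_left = left[["peak"]], valley_left = left[["valley"]],
       peak_right = right[["peak"]], valley_right = right[["valley"]])
}

# Idealized profile: curved land with a raised groove shoulder at each end.
i <- 1:1000
profile_value <- -50 * ((i - 500.5) / 500)^2 +
  120 * exp(-((i - 60) / 40)^2) + 120 * exp(-((i - 941) / 40)^2)

grooves <- find_grooves(profile_value, s = 35)
# Step 6: keep only the measurements between the two valleys.
keep <- i >= grooves$valley_left & i <= grooves$valley_right
trimmed <- profile_value[keep]
```

On real profiles the smoothed series still contains small wiggles, which is why the choice of the smoothing factor matters for where the first peak and valley are found.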

\begin{figure}[hbtp]
  \centering
\begin{subfigure}[t]{\textwidth}\centering
\caption{\label{fig:loess_step1}Step 1 of identifying groove locations: For a fixed height ($x = 243.75\mu m$)  surface measurements for bullet 1-5 are plotted across the range of $y$.}
```{r loess_step1, echo=FALSE, fig.width=10, fig.height=3.5, out.width='.5\\textwidth', warning=FALSE,  message=FALSE}
br111_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 1-5.x3p", sep = "/"))
br121_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 2-1.x3p", sep = "/"))
br122_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 2-2.x3p", sep = "/"))
br123_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 2-3.x3p", sep = "/"))
br124_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 2-4.x3p", sep = "/"))
br125_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 2-5.x3p", sep = "/"))
br126_bullet <- get_crosscut(paste(datadir,"Br1 Bullet 2-6.x3p", sep = "/"))

p1 <- qplot(y, value, data = br111_bullet, size=I(1)) + theme_bw() + ylab("Surface measurements")
p1
```
\end{subfigure}
\begin{subfigure}[t]{\textwidth}\centering
\caption{\label{fig:loess_step3}Step 2 of identifying groove locations: The  surface measurements are smoothed twice with a smoothing factor of $s = 35$. The orange rectangle shows an example of the smoothing window. Valleys and peaks are detected, if they are not within the same window.}
```{r loess_step3, echo=FALSE, fig.width=10, fig.height=3.5, out.width='.5\\textwidth', warning=FALSE,  message=FALSE}
subdata <- br111_bullet
smoothfactor <- 35

# fill in missing values, then apply the rolling average twice
value_filled <- na.fill(subdata$value, "extend")
smoothed <- rollapply(value_filled, smoothfactor, mean)
smoothed_truefalse <- rollapply(smoothed, smoothfactor, mean)

xloc=250
p3 <- ggplot() +
  annotate("rect",xmin=xloc-17*1.5625, xmax=xloc+18*1.5625, ymin=80, ymax=200, fill=alpha("orange", alpha=0.5)) + 
  geom_point(aes(x=(1:length(smoothed_truefalse))*1.5625, y=smoothed_truefalse)) + theme_bw() +
    geom_vline(xintercept=xloc, colour="grey50") + xlab("y") + ylab("Surface measurements")

p3
```
\end{subfigure}
\begin{subfigure}[t]{\textwidth}\centering
\caption{\label{fig:loess_step47}Steps 3 -- 6 of identifying groove locations: After smoothing the surface measurements, extrema on the left and right are detected (marked by vertical lines, red indicating peaks and blue indicating valleys). Values outside the blue boundaries are removed (shown in grey).}
```{r loess_step47, echo=FALSE, fig.width=10, fig.height=3.5, out.width='.5\\textwidth', warning=FALSE,  message=FALSE}
    lengthdiff <- length(subdata$value) - length(smoothed_truefalse)
    
    peak_ind_smoothed <- head(which(rollapply(smoothed_truefalse, 3, function(x) which.max(x) == 2)), n = 1)
    peak_ind <- peak_ind_smoothed + floor(lengthdiff / 2)
    groove_ind <- head(which(rollapply(tail(smoothed_truefalse, n = -peak_ind_smoothed), 3, function(x) which.min(x) == 2)), n = 1) + peak_ind
    
    peak_ind2_smoothed_temp <- head(which(rollapply(rev(smoothed_truefalse), 3, function(x) which.max(x) == 2)), n = 1)
    peak_ind2_temp <- peak_ind2_smoothed_temp + floor(lengthdiff / 2)
    groove_ind2_temp <- head(which(rollapply(tail(rev(smoothed_truefalse), n = -peak_ind2_smoothed_temp), 3, function(x) which.min(x) == 2)), n = 1) + peak_ind2_temp
    
    peak_ind2 <- length(subdata$value) - peak_ind2_temp + 1
    groove_ind2 <- length(subdata$value) - groove_ind2_temp + 1

    p4 <- qplot(subdata$y[subdata$y < subdata$y[groove_ind]], subdata$value[subdata$y < subdata$y[groove_ind]], colour=I("grey60"), alpha=I(.25)) +
        ylim(c(min(subdata$value[!is.nan(subdata$value)]) - 25, max(subdata$value[!is.nan(subdata$value)]) + 25)) +
        xlim(c(min(subdata$y) - 25, max(subdata$y) + 25)) +
        geom_point(aes(x, y), data = data.frame(x = subdata$y[subdata$y > subdata$y[groove_ind2]], y = subdata$value[subdata$y > subdata$y[groove_ind2]]), colour=I("grey60"), alpha=I(.25)) +
        geom_point(inherit.aes = FALSE, aes(x, y), data = data.frame(x = subdata$y[subdata$y < subdata$y[groove_ind2] & subdata$y > subdata$y[groove_ind]], y = subdata$value[subdata$y < subdata$y[groove_ind2] & subdata$y > subdata$y[groove_ind]])) +
        theme_bw() +
        geom_vline(xintercept = subdata$y[peak_ind], colour = "red") +
        geom_vline(xintercept = subdata$y[groove_ind], colour = "blue") +
        geom_vline(xintercept = subdata$y[peak_ind2], colour = "red") +
        geom_vline(xintercept = subdata$y[groove_ind2], colour = "blue") +
      xlab("y") + ylab("Surface measurements")

    p4
```
\end{subfigure}
\caption{Overview of all six steps of the smoothing algorithm to identify and remove grooves from the bullet images.}
\end{figure}

The smoothing factor $s$ introduced in the algorithm represents the window size to use for a rolling average. Higher values of $s$ therefore lead to more smoothing. Empirically, a value of $s = 35$ for the smoothing factor seems to work well (the smoothing factor is further discussed in Section \ref{varying-smoothing-factor}). It is important to note that the smoothing pass is done twice. That is, the smoothed data are once again smoothed by computing a new rolling average with the same smoothing factor. This bears some similarity to a smoothing process that John Tukey describes in his book *Exploratory Data Analysis* under the name "twicing", in which a second pass is made on the residuals computed from the first pass and the result is added back to the first smooth [@tukey:1977]. Twicing has the effect of introducing a bit more variance back into the smoothed data.
We instead perform the second smoothing pass on the smoothed data themselves, which has the effect of weighting observations near the center of the window the highest, with the weights dropping off linearly towards either end of the smoothing window.
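
The linear drop-off of the weights can be checked numerically. The sketch below, which uses an arbitrary simulated series and base R's `stats::filter` rather than the functions used elsewhere in this chapter, compares two passes of a rolling average of width $s$ with a single pass of a triangular filter of width $2s - 1$.

```r
s <- 35
box <- rep(1 / s, s)                                 # one rolling-average pass
triangle <- (s - abs(seq(-(s - 1), s - 1))) / s^2    # two passes combined

set.seed(1)
v <- cumsum(rnorm(300))                              # arbitrary test series

twice <- stats::filter(stats::filter(v, box, sides = 2), box, sides = 2)
once  <- stats::filter(v, triangle, sides = 2)

# Compare the two results wherever a full window is available.
complete <- !is.na(once)
max_diff <- max(abs(twice[complete] - once[complete]))
```

The two results agree up to floating point error, so smoothing twice with a window of 35 values is equivalent to a single weighted average over 69 values, with the largest weight, $1/s$, at the center.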

## Removing Curvature

Next, we fit a loess regression to the data. Loess regression [@cleveland:1979] is based on the assumption that the relationship between two random variables $X$ and $Y$ can be described in the form of a smooth, continuous function $f$ with $y_i = f(x_i) + \varepsilon_i$ for all values $i = 1, \ldots, n$. The function $f$ is approximated via locally weighted polynomial regressions. Parameters of the estimation are $\alpha$, the proportion of all points included in the fit (here, $\alpha = 0.75$), the weighting function, and the degree of the polynomial (here, we fit a quadratic regression).

The main idea of locally weighted regression is to use a weighting routine that emphasizes the effect of points close to the fitting location and de-emphasizes the effect of points that are further away. The weighting function used here is the tricubic function $w(d) = \left(1 - d^3\right)^3$ for $d \in [0,1]$, and $w(d) = 0$ otherwise. Here, $d$ is defined as the ratio of the distance between $x_i$ and the location of the fit $x_o$ to the maximum distance between $x_o$ and any of the $x$-values within the span $\alpha$ around $x_o$.
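
A minimal sketch of this weighting scheme is given below. The helpers `tricube` and `local_weights` are hypothetical, and taking the neighbourhood to be the $\lceil \alpha n \rceil$ points closest to the fitting location is an assumption made for illustration; the exact neighbourhood rule of a particular loess implementation may differ.

```r
# Tricubic weight function: positive for scaled distances below 1, else 0.
tricube <- function(d) ifelse(abs(d) < 1, (1 - abs(d)^3)^3, 0)

# Hypothetical helper: weights of all points for a local fit at x0.
local_weights <- function(x, x0, alpha = 0.75) {
  dist <- abs(x - x0)
  q <- ceiling(alpha * length(x))   # assumed number of points in the span
  h <- sort(dist)[q]                # largest distance within the span
  tricube(dist / h)
}

x <- 1:100
w <- local_weights(x, x0 = 50)
```

For a fit at $x_o = 50$, the point at the fitting location receives weight 1, the weights decrease smoothly with distance, and points at or beyond the edge of the span receive weight 0.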

Figure \ref{fig:loess_fit} shows the loess fit, in blue, overlaid on the processed image of bullet 1-5. The fit seems to do a reasonable job of capturing the structure of the image.
Figure \ref{fig:loess_resid} shows the residuals from this fit. These residuals are called the **signature** of bullet 1-5.
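
The idea of obtaining a signature as the residuals of a loess fit can be illustrated on simulated data with base R's `loess`. Everything about the data here is an assumption made for illustration: the arc with a radius of 4500 $\mu m$, the sinusoidal "striae", and the noise level are made up, and the sketch does not reproduce the `fit_loess` function used for the figures.

```r
set.seed(2)
y <- seq(0, 2000, by = 1.5625)                    # positions in micrometers
curvature <- sqrt(4500^2 - (y - 1000)^2) - 4500   # assumed circular arc
striae <- 2 * sin(y / 20)                         # made-up striation pattern
value <- curvature + striae + rnorm(length(y), sd = 0.2)

# Local quadratic fit with span 0.75, as described in the text.
fit <- loess(value ~ y, span = 0.75, degree = 2)
signature <- residuals(fit)
```

Because the span is wide compared with the width of the simulated striae, the fit follows the curvature only, and the residuals retain the striation pattern.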

\begin{figure}[hbtp]
  \centering
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{\label{fig:loess_fit} Loess fit for bullet 1-5.}
```{r loess_fit, echo=FALSE, fig.width=10, fig.height=3.5, out.width='\\textwidth', warning=FALSE,  message=FALSE}

br111.groove <- get_grooves(br111_bullet)

my.loess <- fit_loess(br111_bullet, br111.groove)

my.loess$fitted + ylab("Surface measurements")
```
\end{subfigure}
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{\label{fig:loess_resid} Residuals of loess fit for bullet 1-5.}
```{r boot_loess, echo=FALSE, fig.width=10, fig.height=3.5, out.width='\\textwidth', warning=FALSE}
#boot.loess <- boot_fit_loess(br111_bullet, br111.groove)
#poly <- with(boot.loess, data.frame(x = c(y,rev(y)), y=c(high, rev(low))))
#qplot(x=x, y=y, geom="polygon", fill=I("steelblue"), data=poly) +
ggplot() +
  geom_line(aes(x=y, y=resid), data=my.loess$data) +
  ylab("Residuals of loess fit") + xlab("y") + theme_bw()
```
\end{subfigure}
\caption{Fit and residuals of a loess fit to bullet 1-5 (Barrel~1). The residuals define the {\it signature} of bullet 1-5. %Blue areas around the residuals show  95\% bootstrap confidence intervals (based on $B=1,000$ bootstrap samples).
}
\end{figure}

# Automatic Matching

Applying the loess fit at a range of heights (see Figure \ref{fig:manualmatch-rgl} for signatures extracted at heights between 50$\mu m$ and 150$\mu m$) yields signatures that show the 3D striation marks of two bullets. Signatures of bullet~1 are shown on the left (all extracted from heights below 100$\mu m$) and signatures of bullet 2 are shown on the right (extracted at heights above 100$\mu m$). The signatures are manually aligned, so that many of the striation marks pass continuously from one side to the other. Visually, this allows for an easy assessment of these two bullet lands as a match. However, this match relies on visual inspection and is therefore subjective. The goal of this section is to eliminate the need for visual inspection during the matching process and replace it with an automatic algorithm. This also allows for a quantification of the strength of the match.
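
The manual alignment can be replaced by an alignment based on the cross-correlation function, which is the approach taken by the algorithm in this section. The following minimal sketch uses two simulated series, the second a noisy copy of the first shifted by 23 grid steps; the series, the shift, and the noise level are assumptions made for illustration, and only base R's `ccf` is used.

```r
set.seed(42)
noise <- rnorm(700)
z <- stats::filter(noise, rep(1 / 9, 9), sides = 2)
z <- as.numeric(z[!is.na(z)])               # smooth, signature-like series

true_shift <- 23                            # shift in grid steps
sig1 <- z[1:600]
sig2 <- z[(1 + true_shift):(600 + true_shift)] + rnorm(600, sd = 0.05)

# The lag with the highest cross-correlation estimates the shift.
cc <- ccf(sig1, sig2, lag.max = 100, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]

resolution <- 1.5625                        # grid spacing in micrometers
shift_um <- best_lag * resolution           # shift expressed along y
```

The lag at which the cross-correlation is largest recovers the simulated shift; multiplying it by the grid spacing gives the amount by which one signature is moved along $y$ before the two are compared.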

\begin{figure}[hbtp]
\centering
\includegraphics[width=.65\linewidth]{images/matchup-rgl-copy.png}
\caption{\label{fig:manualmatch-rgl}3D view of the manually adjusted side-by-side comparison of bullet~1-5 and bullet 2-1 after removing the curvature. Bullet 2-1 is shaded light grey in the background.}
\end{figure}

In this section, we describe the algorithm for matching signatures first, and the impact of parameter choices in the subsections thereafter.

## Algorithm

```{r twolands100,  echo=FALSE}
images <- file.path(datadir, dir(datadir))
images <- images[grep(" ", images)]

im1 <- "images/Hamby (2009) Barrel/bullets/Br1 Bullet 1-5.x3p"
im2 <- "images/Hamby (2009) Barrel/bullets/Br1 Bullet 2-1.x3p"

#lof <- processBullets(paths = images[c(5,7)], x = 100)
subLOFx1 <- processBullets(read_x3p(im1), name = im1, x = 100)
subLOFx2 <- processBullets(read_x3p(im2), name = im2, x = 100)
#subLOFx1 <- subset(lof, bullet=="Br1 Bullet 1-5")
#subLOFx2 <- subset(lof, bullet=="Br1 Bullet 2-1")
subLOFx1$y <- subLOFx1$y + 23*1.5625 # manual shift by 23 grid steps to align the two signatures
lofX <- rbind(data.frame(subLOFx1), data.frame(subLOFx2))

# smooth
lofX <- lofX %>% group_by(bullet) %>% mutate(
  l30 = smoothloess(y, resid, span = 0.03)
)

# cut at .75
threshold <- .75
lofX$r05 <- threshold* sign(lofX$l30) * as.numeric(abs(lofX$l30) > threshold)
lofX$type <- factor(lofX$r05)
levels(lofX$type) <- c("groove", NA, "peak")
```

```{r twolands100adjust, echo=FALSE}
images <- file.path(datadir, dir(datadir))
images <- images[grep(" ", images)]

#lof <- processBullets(paths = images[c(5,7)], x = 100)
subLOFx1 <- processBullets(read_x3p(im1), name = im1, x = 100)
subLOFx2 <- processBullets(read_x3p(im2), name = im2, x = 100)
#subLOFx1 <- subset(lof, bullet=="Br1 Bullet 1-5")
#subLOFx2 <- subset(lof, bullet=="Br1 Bullet 2-1")
subLOFx1$y <- subLOFx1$y - min(subLOFx1$y)
subLOFx2$y <- subLOFx2$y - min(subLOFx2$y)
ccf <- ccf(subLOFx1$resid, subLOFx2$resid, lag.max=100, plot=FALSE)
lag <- ccf$lag[which.max(ccf$acf)]
subLOFx1$y <- subLOFx1$y - lag*1.5625 
lofY <- rbind(data.frame(subLOFx1), data.frame(subLOFx2))

```

```{r twolands100match, echo=FALSE, dependson='twolands100'}
matches <- lofX %>% group_by(y) %>% summarise(
  potential = (length(unique(type)) == 1),
  allnas = sum(is.na(type))/n(),
  type1 = na.omit(type)[1],
  type = paste(type, sep="|", collapse="|"),
  n = n()
)

matches$id <- cumsum(matches$allnas == 1) + 1
matches$lineid <- as.numeric(matches$allnas != 1) * matches$id

# Decide whether a group of aligned striation marks is a match: both
# signatures must show a mark, and all marks must be of the same type.
isMatch <- function(id, type) {
  if (id[1] == 0) return(FALSE)
  types <- strsplit(type, split = "|", fixed=TRUE) 
  t1 <- sapply(types, function(x) x[1])
  t2 <- sapply(types, function(x) x[2])
  if (all(t1 == "NA")) return(FALSE)
  if (all(is.na(t2))) return(FALSE)
  if (all(t2 == "NA")) return(FALSE)
  
  # a peak on one signature cannot match a groove on the other
  peak <- length(grep("peak", c(t1, t2))) > 0
  groove <- length(grep("groove", c(t1, t2))) > 0
  if (peak && groove) return(FALSE)

  return(TRUE)
}

lines <- matches %>% group_by(lineid) %>% summarise(
  meany = mean(y, na.rm=TRUE),
  miny = min(y, na.rm=TRUE),
  maxy = max(y, na.rm=TRUE) + 1.5625,
  match = isMatch(lineid, type),
  type = type1[1]
)
lines <- subset(lines, lineid != 0)
```

\begin{figure}[hbtp]
\centering
    \begin{subfigure}[t]{\textwidth}\centering
\caption{Loess smooth of signatures  at a height of $x = 100\mu m$ (span is 0.03).\label{fig:smooth}}{%
```{r smooth, echo=FALSE, dependson='twolands100', fig.width=7, fig.height=3.5, out.width='.65\\textwidth', warning=FALSE}
lofX$bulletname <- scrubPath(lofX$bullet)
qplot(data=lofX, y, resid, colour=I("grey30"), size=I(0.75), geom="line", group=bulletname) +
  geom_line(aes(y=l30), colour="grey70", size=0.5) +
  facet_grid(bulletname~.) +
#  scale_colour_manual("", values=cols) +
  theme_bw() + ylab("") + 
  theme(legend.position="none")
```
    }
\end{subfigure}
\begin{subfigure}[t]{\textwidth}\centering
\caption{Using a rolling median, peaks and valleys are identified for each signature. Peaks and valleys on the signature correspond to striation marks on the bullet's surface. \label{fig:smoothcutb}}{%
```{r smoothcut, echo=FALSE, dependson='twolands100', fig.width=7, fig.height=3.5, out.width='.65\\textwidth', warning=FALSE}
#images <- file.path(datadir, dir(datadir))
  #lof <- processBullets(paths = images[c(5,7)], x = 100)

  subLOFx1 <- processBullets(read_x3p(im1), name = im1, x = 100)
  subLOFx2 <- processBullets(read_x3p(im2), name = im2, x = 100)
  lof <- rbind(subLOFx1, subLOFx2)

  lof <- bulletSmooth(lof)
  bAlign = bulletAlign(lof)
  lofX <- bAlign$bullet  

  b12 <- unique(lof$bullet)
  peaks1 <- get_peaks(subset(lofX, bullet==b12[1]), smoothfactor = 25)
  peaks2 <- get_peaks(subset(lofX, bullet == b12[2]), smoothfactor = 25)
  peaks1$lines$bullet <- b12[1]
  peaks2$lines$bullet <- b12[2]
  peaks <- rbind(peaks1$lines, peaks2$lines)
  
  peaks$bulletname <- scrubPath(peaks$bullet)
  lofX$bulletname <- scrubPath(lofX$bullet)
  ggplot() + theme_bw() +
    geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), ymin=-6, ymax=6, 
              data=peaks,  alpha=0.2) +
    geom_vline(aes(xintercept=extrema, colour=factor(type)), 
               data= peaks, alpha=0.7) +
    scale_colour_brewer(palette="Set2") + 
    scale_fill_brewer(palette="Set2") +
    theme(legend.position="none") + 
    facet_grid(bulletname~.) +
    geom_line(aes(x=y, y=l30, group=bulletname), data=lofX) +
  ylab(expression(paste("Signatures (in ",mu,"m)", sep=""))) 
```
}
\end{subfigure}    
\begin{subfigure}[t]{\textwidth}\centering
    \caption{Rectangles in the back identify a striation mark on one of the bullets.  Matching striation marks are indicated by color filled rectangles and marked by an `o'. Mismatches are filled in grey and  marked by an `x'.   \label{fig:smoothcutd}}{%
```{r smoothmatch, echo=FALSE, dependson='twolands100match', fig.width=7, fig.height=2.75, out.width='.65\\textwidth', warning=FALSE}
peaks1$lines$bullet <- b12[1]
peaks2$lines$bullet <- b12[2]

lines <- striation_identify(peaks1$lines, peaks2$lines)

ggplot() + 
  geom_rect(aes(xmin = xmin, xmax = xmax, fill=factor(type)), ymin = -6, ymax=6.5,  data = lines, alpha=0.2, show.legend = FALSE) +
  theme(legend.position="bottom") +
  geom_text(aes(x = meany), y= -5.5, label= "x", data = subset(lines, !match)) +
  geom_text(aes(x = meany), y= -5.5, label= "o", data = subset(lines, match)) +
  ylim(c(-6,6.5)) + theme_bw() +
  geom_line(data=lofX, aes(x=y, y=l30, group=bulletname, linetype=bulletname)) +
  scale_linetype_discrete("") +
  scale_colour_manual("", values=cols) +
  scale_fill_brewer("", palette="Set2", na.value=alpha("grey60", 0.5)) +
    theme(legend.position = c(1,1.1), legend.justification=c(1,1),
        legend.background = element_rect(fill=alpha('white', 0.4))) + 
  ylab(expression(paste("Signatures (in ",mu,"m)", sep=""))) +
  xlab("y")
```
}
\end{subfigure}
\caption{\label{fig:match}Matching striation marks: smooth (a), identify peaks and valleys (b), and match peaks and valleys between signatures (c).}
\end{figure}

Figure \ref{fig:match} gives an overview of the automated matching routine: 
We first identify a stable region for each bullet land and extract the signature at the lowest height in this region, because typically, individual characteristics are best expressed at the lower end of the bullet, near the base (see Section \ref{signature-intensities} for a more detailed discussion). All of the other steps are done on pairs of bullet lands:

1. **Smooth the two signatures** using a loess with a very small span (see Figure \ref{fig:smooth}). 
2. Use cross-correlation to **find the best alignment** of the two signatures: shift one of the signatures by the lag indicated by the cross-correlation function (see Figure \ref{fig:ccf} for the cross-correlation function and Figure \ref{fig:crosscutX} for the resulting shift).
3. Using a rolling average, **identify peaks and valleys** for each of the signatures. Around each extremum we then define an interval that extends, on each side, one third of the distance to the neighboring extremum (see Figure \ref{fig:smoothcutb}). Peaks and valleys constitute the *striation marks* on the bullet.
4. **Match striations across signatures:** based on the intervals around the extrema as defined above, we identify joint intervals as the areas in which two or more of the individual intervals overlap: a joint interval is defined as the smallest interval that encompasses all of the overlapping intervals. A joint interval is called a match (a matching stria) between the signatures if all of its intervals are of the same type of extremum, i.e. they are either all peaks or all valleys. In Figure \ref{fig:match}, all matches are shown as color-filled rectangles corresponding to their type of extremum (peaks are shown in orange, and valleys in green). Non-matching intervals are left grey. 
5. **Extract features from the aligned signatures and the matches between them:** many different features can be extracted from the aligned signatures. Here, we describe a few of the ones that can be found in the literature and some that we found to be of practical relevance:

i. Maximal number of consecutive matching striae (CMS) and, similarly, the maximal number of consecutive non-matching striae (CNMS), 
ii. Number of matches and non-matches,
iii. The value of the cross-correlation function (ccf) between the aligned signatures [@vorburger:2011],
iv. Average difference $D$ between signatures, defined as the Euclidean vertical distance between surface measurements of aligned signatures. Let $f(t)$ and $g(t)$ be smoothed, aligned signatures:
\[
D^2 = \frac{1}{\text{\#}t}\sum_t \left[f(t) - g(t)\right]^2,
\]
v. The sum $S$ of average absolute heights of matched extrema: for each matched stria, compute the average of the absolute heights of the two signatures at the matched peak or valley. $S$ is then defined as the sum of all these averages. 
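
Step 4 of the list above can be sketched in a few lines of R. This is a simplified illustration, not the `striation_identify` implementation used for the figures: the function name `match_intervals`, the column `sig`, and the toy intervals are hypothetical, and we assume that a joint interval counts as a match only if it contains intervals from both signatures.

```r
# Simplified sketch of joint-interval matching (hypothetical helper,
# not the bulletr implementation).
# a, b: data frames with columns xmin, xmax and type (1 = peak, -1 = valley)
match_intervals <- function(a, b) {
  pooled <- rbind(cbind(a, sig = 1), cbind(b, sig = 2))
  pooled <- pooled[order(pooled$xmin), ]
  # a new joint interval starts when an interval begins after
  # all earlier intervals have ended
  running_max <- cummax(pooled$xmax)
  new_group <- c(TRUE, pooled$xmin[-1] > running_max[-nrow(pooled)])
  pooled$group <- cumsum(new_group)
  do.call(rbind, lapply(split(pooled, pooled$group), function(g) {
    data.frame(
      xmin = min(g$xmin), xmax = max(g$xmax),
      # assumption: both signatures present and all extrema of one type
      match = length(unique(g$sig)) == 2 && length(unique(g$type)) == 1
    )
  }))
}

# toy intervals for two signatures
a <- data.frame(xmin = c(0, 20, 40), xmax = c(10, 30, 50), type = c(1, -1, 1))
b <- data.frame(xmin = c(5, 25, 60), xmax = c(12, 35, 70), type = c(1, 1, -1))
joint <- match_intervals(a, b)
```

In this toy example only the first joint interval is a match: the second combines a valley with a peak, and the last two contain an interval from a single signature only.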

The difference $D$ between signatures is defined here as a Euclidean distance and is therefore measured in $\mu m$. In the paper by @ma:2004, distance is instead defined relative to the first signature, which serves as a comparison reference; the resulting measure is therefore unitless. 
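
As a minimal sketch of the definition of $D$ given above, assuming the two aligned signatures are stored as numeric vectors of equal length (the helper name `avg_difference` and the toy values are ours):

```r
# Average difference D between two aligned signatures f and g (in the
# units of the signatures); locations with a missing value are skipped.
avg_difference <- function(f, g) {
  stopifnot(length(f) == length(g))
  ok <- !is.na(f) & !is.na(g)
  sqrt(mean((f[ok] - g[ok])^2))
}

f <- c(0, 1, 2, 3)
g <- c(0, 1, 2, 5)
D <- avg_difference(f, g)  # sqrt((0 + 0 + 0 + 4) / 4) = 1
```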

Counting the maximal number of CMS is part of the current practice to identify bullet matches [@nichols:1997; @nichols:2003; @nichols:2003b]. 
In the example of Figure \ref{fig:match}, the number of consecutive matching striations (CMS) is fifteen, a high number suggestive of a match between the lands. Note that the definition of CMS we use does not match the one given in @thompson:2013. There, CMS is defined only in terms of matching peaks, without regard to valleys. Additionally, peaks in @thompson:2013 are used only if they can be identified and matched `within a tolerable range' between lands. The definition given here is computationally less complex, but should yield highly correlated values because of the requirement to only consider signatures from a stable region in the land (see Section \ref{impact-of-bullet-height} for further details on the stability of regions). In the Hamby study, the definition of CMS by @thompson:2013 yields values that are approximately half of the CMS values as defined in this paper (with a correlation coefficient between the values of the two definitions of about 0.92). 
For lead bullets, such as those used in the Hamby study, @biasotti:1959 considered four or more consecutive peaks (corresponding to eight or more consecutive lines in our definition) to be sufficient evidence of a match.  
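
Given the sequence of joint intervals ordered along the land, the maximal numbers of CMS and CNMS are the lengths of the longest runs of matches and non-matches, respectively. The following sketch uses base R's `rle`; the helper name `max_run` and the toy vector are hypothetical and only illustrate the counting.

```r
# Length of the longest run of TRUE values in a logical vector
max_run <- function(x) {
  r <- rle(x)
  if (any(r$values)) max(r$lengths[r$values]) else 0L
}

# toy sequence of joint intervals: TRUE = match, FALSE = non-match
match <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
cms  <- max_run(match)   # longest run of matches: 3
cnms <- max_run(!match)  # longest run of non-matches: 1
```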

Determining a threshold such that CMS values above the threshold indicate a match with high reliability is beyond the scope of this work, even though it is critically important in practice. We provide some ideas in the next section, but first we assess the robustness of the matching algorithm to different choices of the parameter values.

## Horizontal Alignment

The signatures of the two lands 1-5 and 2-1 from Figure \ref{fig:manualmatch-rgl}, extracted at a height of $x = 100\mu m$, are shown in Figure \ref{fig:cross100}. Striation marks show up in these representations as peaks and valleys. The individual characteristics are prominent and, again, suggest a match between the lands. A horizontal shift of one of the signatures (result shown in Figure \ref{fig:crosscutX}) emphasizes the strong similarities between signatures. 

\begin{figure}[hbtp]
\centering
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{Raw bullet land signatures.\label{fig:crosscut}}{%
```{r crosscuts, echo=FALSE, dependson='twolands100', fig.width=7, fig.height=3, out.width='\\textwidth'}
lof$bulletname <- scrubPath(lof$bullet)
qplot(y, resid, group=bulletname, colour=bulletname, data=lof, 
      geom="line") + 
  scale_colour_manual("", values=cols) + theme_bw() +
  theme(legend.position = c(1, 1.1), legend.justification=c("right", "top"),
        legend.background = element_rect(fill=alpha('white', 0.4))) + 
  ylab("Residuals of loess fit")
```
    }
\end{subfigure}    
\begin{subfigure}[b]{.49\textwidth}\centering
    \caption{Aligned signatures.\label{fig:crosscutX}}{%
```{r crosscutsX, echo=FALSE, dependson='twolands100', fig.width=7, fig.height=3, out.width='\\textwidth'}
lofY$bulletname <- scrubPath(lofY$bullet)
qplot(y+min(lof$y), resid, group=bulletname, colour=bulletname, data=lofY, 
      geom="line") + xlab("y") +
  scale_colour_manual("", values=cols) + theme_bw() +
  theme(legend.position = c(1, 1.1), legend.justification=c("right", "top"),
        legend.background = element_rect(fill=alpha('white', 0.4))) + 
  ylab("Residuals of loess fit")
```
    }
\end{subfigure}
\caption{\label{fig:cross100}Signatures of bullets 1-5 and 2-1 taken at a height of $x = 100\mu m$. A horizontal shift of the values of bullet 1-5 to the right shows the similarity of the striation marks.}
\end{figure}

For this alignment we use the cross-correlation function to find the horizontal shift that maximizes the agreement between the signatures [@bachrach:2002; @chu:2010; @vorburger:2011; @thompson:2013].
Let $f(t)$ and $g(t)$ denote the signature values at location $t$, where the locations $t$ lie between 0~$\mu m$ and about 2500~$\mu m$ and are spaced 1.5625~$\mu m$ apart.
The cross-correlation between $f$ and $g$ at lag $k$ is then defined as
\[
(f * g) (k) = \sum_t f(t+k) g(t),
\]
with suitably defined limits for the summation.
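
The alignment step can be illustrated with base R's `ccf`, which reports the normalized cross-correlation between `f[t + k]` and `g[t]` at lag `k`. The sketch below uses simulated signatures with a known shift of 17 positions; the simulated data and the helper `align` are assumptions for illustration and not the `bulletAlign` implementation.

```r
set.seed(1)
n <- 600  # number of locations per signature
d <- 17   # true shift between the two signatures

# simulated surface: white noise smoothed by a 5-point moving average
raw <- rnorm(n + d + 4)
s <- stats::filter(raw, rep(1 / 5, 5), sides = 2)
s <- as.numeric(s[!is.na(s)])  # n + d values remain

f <- s[(d + 1):(n + d)]  # f[t] = s[t + d]
g <- s[1:n]              # g[t] = s[t], so that f[t] = g[t + d]

cc <- ccf(f, g, lag.max = 100, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]

# keep only the overlapping parts after shifting by lag k
align <- function(f, g, k) {
  n <- min(length(f), length(g))
  if (k >= 0) {
    list(f = f[(1 + k):n], g = g[1:(n - k)])
  } else {
    list(f = f[1:(n + k)], g = g[(1 - k):n])
  }
}
aligned <- align(f, g, best_lag)
```

Here `best_lag` recovers the simulated shift (with a negative sign under the lag convention of `ccf`), and the two aligned vectors coincide.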

\begin{figure}[hbtp]
  \centering
```{r ccf, echo=FALSE, fig.width = 8, fig.height = 3, out.width = '.65\\textwidth'}
lag <- ccf$lag[which.max(ccf$acf)]
dframe <- data.frame(lag = ccf$lag, yend=ccf$acf)
ggplot(data = dframe) +
  geom_segment(aes(x = lag, xend=lag, yend=yend), y = 0, colour="grey50") +
  theme_bw() + ylab("Correlation") + xlab("Lag k") +
  geom_segment(x = lag, xend = lag, y = 0, yend =  max(ccf$acf), colour="black") +
  scale_x_continuous(breaks = c(-100, -50, lag, 0, 50, 100),
                     minor_breaks = c(-75,-25,25, 75))
```
\caption{\label{fig:ccf}Cross-correlation function between the two signatures shown in Figure \ref{fig:crosscut} at lags between -100 and 100. The correlation is maximized at a lag of -17, indicating the largest amount of agreement between the signatures. Figure \ref{fig:crosscutX} shows the lag-shifted signatures.}
\end{figure}

## Impact of Bullet Height

The height at which signatures are extracted for a comparison between bullet lands matters -- signatures taken from heights that are further apart show more pronounced differences. 
This poses both a caveat for matching attempts and an opportunity for quality control: we have to be aware of the height that was used in a match. Visually, matches degrade if the signatures upon which the match is based are taken from heights more than 200$\mu m$ apart (see Section \ref{cross-correlation-at-multiple-heights} for more discussion). However, we can extract signatures from multiple heights of the same bullet land for an initial assessment of its quality. By comparing signatures from heights that are not too far apart -- 25$\mu m$ to 50$\mu m$ -- we get an indication of whether the signatures come from a rapidly changing section of the surface, indicative of a break-off or some other damage, or from a stable section, where we have a reasonable expectation of finding matches to other signatures. In the approach here, we keep increasing the height $x$ at which the signature is taken until we find a section with a stable pattern. This process is shown in Figure \ref{fig:crosscuts2} for bullet 1-1 from barrel 3, where `stability' is defined as two aligned signatures from heights 25$\mu m$ apart having a cross-correlation of at least 0.95. 
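
The search for a stable region can be sketched as a loop over neighboring heights. The function `first_stable_height`, its arguments, and the simulated signatures below are hypothetical; we also assume that the lower of the two heights in the first stable pair is reported.

```r
# signatures: list of numeric vectors, one per height (ordered by height)
# heights:    the corresponding heights in micrometers
first_stable_height <- function(signatures, heights,
                                threshold = 0.95, lag.max = 50) {
  for (i in seq_len(length(signatures) - 1)) {
    cc <- ccf(signatures[[i]], signatures[[i + 1]],
              lag.max = lag.max, plot = FALSE)
    # maximal cross-correlation over all considered lags
    if (max(cc$acf) >= threshold) return(heights[i])
  }
  NA  # no stable region found: flag the land
}

set.seed(42)
base <- stats::filter(rnorm(504), rep(1 / 5, 5), sides = 2)
base <- as.numeric(base[!is.na(base)])  # 500 values
sigs <- list(
  rnorm(500),                    # damaged region: unrelated signatures
  rnorm(500),
  base,                          # stable region: nearly identical signatures
  base + rnorm(500, sd = 0.01)
)
heights <- c(100, 125, 150, 175)
stable_at <- first_stable_height(sigs, heights)
```

A land for which this search returns `NA` would be a candidate for exclusion from matching.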

\begin{figure}[hbtp]
  \centering
```{r crosscuts-vary-b31, echo=FALSE, fig.width = 12, fig.height = 7, out.width = '\\textwidth', warning = FALSE}
paths <- file.path(datadir, dir(datadir))
paths <- paths[grep("Br[0-9]", paths)]

crosscuts <- seq(100,250, by=25)
lof <- processBullets(read_x3p(paths[37]), name = paths[37], x = crosscuts)
lof$bullet <- paste(scrubPath(lof$bullet), lof$x)

reslist <- lapply(1:length(crosscuts[-1]), function(i) {
  b2 <- subset(lof, x %in% crosscuts[i:(i+1)])
  lofX <- bulletSmooth(b2)
  bAlign <- bulletAlign(lofX)
  lofX <- bAlign$bullet  

  b12 <- unique(b2$bullet)
  peaks1 <- get_peaks(subset(lofX, bullet==b12[1]), smoothfactor = 25)
  peaks2 <- get_peaks(subset(lofX, bullet == b12[2]), smoothfactor = 25)

#  threshold <- bulletPickThreshold(lofX, thresholds = seq(0.3, 1.5, by = 0.05))
#  lines <- striation_identify(lofX, threshold = threshold)
  peaks1$lines$bullet <- b12[1]
  peaks2$lines$bullet <- b12[2]
  lines <- striation_identify(peaks1$lines, peaks2$lines)

  maxCMS <- maxCMS(lines$match==TRUE)
  list(maxCMS = maxCMS, ccf = bAlign$ccf, lines=lines, bullets=lofX)
})

ccfs <- sapply(reslist, function(res) res$ccf)

lop <- lapply(reslist, function(res) {
ggplot() +
  theme_bw() + 
  geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), show.legend=FALSE, ymin=-6, ymax=5.5, data=res$lines, alpha=0.2) +
  geom_line(aes(x = y, y = l30, linetype = bullet),   data = res$bullets) +
  scale_linetype_discrete("") +
#  scale_colour_brewer("", palette="Set1", na.value=alpha("grey50", alpha=0.5)) +
  scale_fill_brewer("", palette="Set2", na.value=alpha("grey50", alpha=0.5)) +
  theme(legend.position = c(1,1.2), legend.justification=c(1,1),
        legend.background = element_rect(fill=alpha('white', 0.4))) + 
  ylim(c(-6,6.5)) + ylab("") + xlab("") + 
  geom_text(aes(x = meany), y= -5.5, label= "x", data = subset(res$lines, !match)) +
  geom_text(aes(x = meany), y= -5.5, label= "o", data = subset(res$lines, match))
})

grid.arrange(lop[[1]], lop[[2]], lop[[3]], 
             lop[[4]], lop[[5]], lop[[6]],
             ncol=2)
```
\caption{\label{fig:crosscuts2}Signatures for barrel 3, bullet 1-1 extracted from varying heights. Initially, the match between signatures taken at heights 25$\mu m$ apart is affected strongly by some break off at the bottom of the bullet. At a  level of $175\mu m$ the bullet's signature stabilizes. For this land, matches should not be attempted at lower heights.}
\end{figure}

## Varying Smoothing Factor

As mentioned earlier, the algorithm for detecting peaks and valleys depends on the selection of a smoothing window, called the smoothing factor or span. A smoothing factor of $k$ means that the $k$ closest observations to $x_o$ are considered for a fit for $x_o$. Because surface measurements are recorded at an equidistant resolution (here, of 1.5625$\mu m$), we decided to only consider odd smoothing factors $2k + 1$, which means that the $k$ observations to the left and right of $x_o$ are considered for a local fit of $x_o$. For detecting and removing the grooves prior to fitting a loess regression we selected a smoothing factor  of 35, while for detecting the peaks/valleys of the loess residuals a smoothing factor of 25 seems more appropriate.
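
To make the role of the smoothing factor concrete, the following sketch applies a centered rolling average with an odd window and marks a peak (valley) wherever the slope of the smoothed curve changes from positive to negative (negative to positive). The helper `find_extrema` and the sine-shaped test signature are assumptions for illustration; `get_peaks` may differ in its details.

```r
find_extrema <- function(sig, window = 25) {
  stopifnot(window %% 2 == 1)  # odd window: k points on either side
  smoothed <- as.numeric(
    stats::filter(sig, rep(1 / window, window), sides = 2)
  )
  s <- sign(diff(smoothed))
  i <- seq_len(length(s) - 1)
  # which() silently drops the NA comparisons at both ends of the window
  peaks   <- which(s[i] > 0 & s[i + 1] < 0) + 1
  valleys <- which(s[i] < 0 & s[i + 1] > 0) + 1
  list(smoothed = smoothed, peaks = peaks, valleys = valleys)
}

# two full periods of a sine curve sampled at 401 equidistant points
sig <- sin(seq(0, 4 * pi, length.out = 401))
ext <- find_extrema(sig, window = 25)
```

For this noise-free signature the two peaks and two valleys are found at the positions of the true extrema; with noisy data, the number and the position of detected extrema depend on the chosen window.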

Figure \ref{fig:varysmooth} displays the peaks and valleys detected in the same signature at smoothing factors of 5, 25, and 45, respectively. The dark line corresponds to the smoothed values, while the grey line in the back shows the raw signature. The choice of smoothing factor is a classic bias/variance trade-off. It is immediately clear that a small smoothing factor like 5 is a poor choice: so much noise remains in the data that even a point or two can skew the rolling average enough for a peak or valley to be detected. Given that striation widths are typically much larger, we are in effect muddying the waters by performing such minimal smoothing. Another consideration is that the smoothing window should not fall below the resolution of the equipment with which the surface measurements are taken, so as to not introduce artifacts into the analysis. 

A larger smoothing factor (like 45), on the other hand, seems to be a more plausible option. Most of the peaks/valleys that are detected with a smoothing factor of 25 are also detected at 45. However, some notable issues arise. Notice that the valley on the right hand side of the image is smoothed out, and thus not detected. On the left hand side, a double peak is detected, which might be a questionable decision, while several peaks in the middle are smoothed out, for example the peak at around $y = 750$. That is, in many cases, large windows smooth out some of the structure that we wish to see. Furthermore, the peaks/valleys are often shifted relative to their position in the original loess residuals, or in the smoothed data with smaller smoothing factors.

\begin{figure}[hbtp]
  \centering
```{r smoothfac1, echo=FALSE, fig.width=7, fig.height=5.2, out.width='.8\\textwidth', warning=FALSE}
br111.groove <- get_grooves(br111_bullet)
br111.loess <- fit_loess(br111_bullet, br111.groove)

peakslist <- lapply(c(5, 25, 45), function(s) {
  peaks <- get_peaks(br111.loess$data, smoothfactor = s)
  peaks$lines$smoothfactor = s
  peaks$plot$data$smoothfactor = s
  
  peaks
})

peaks <- plyr::ldply(peakslist, function(x) x$lines)
smooths <- plyr::ldply(peakslist, function(x) x$plot$data)

ggplot() + theme_bw() +
  geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), ymin=-6, ymax=6, 
            data=peaks,  alpha=0.2) +
  geom_vline(aes(xintercept=extrema, colour=factor(type)), data= peaks, alpha=.7) +
  scale_colour_brewer(palette="Set2") + 
  scale_fill_brewer(palette="Set2") +
  theme(legend.position="none") + 
  facet_grid(smoothfactor~., labeller="label_both") +
  geom_line(aes(x=y, y=resid, group=x), data=br111.loess$data,  
            colour="grey50") +
  geom_line(aes(x=y, y=smoothed), data=smooths) +
  ylab(expression(paste("Signature values (in ",mu,"m)", sep=""))) 
```
\caption{\label{fig:varysmooth} Peak/valley detection at smoothing factors of 5, 25, and 45, respectively. Note that a smoothing factor of 5 leaves enough noise that many very small overlapping peaks and valleys are detected, while a smoothing factor of 45 might over-smooth and cause the peaks/valleys to either disappear or shift horizontally from their original position in the signature.}
\end{figure}

# Evaluation

In order to get a better understanding of how the matching algorithm works for known matches and non-matches, we investigate its performance using the data of the James Hamby study. As a first step, we automatically assess the quality of each of the lands by checking that we can identify a stable region on each land. For this, we compute the cross-correlation of signatures extracted from heights 25$\mu m$ apart. For a stable region, we require a cross-correlation of at least 0.95. Four lands from different bullets are flagged as problematic in this respect. A visual inspection (see Figure \ref{fig:fourflags}) shows that each one of these lands has scratch marks across the surface, also known as `tank rash'.

\begin{figure}
  \centering
\begin{subfigure}[t]{.49\textwidth}\centering
\caption{Barrel 6 Bullet 2-1}
\includegraphics[width=\textwidth]{images/br6-2-1-grey.png}
\end{subfigure}
\begin{subfigure}[t]{.49\textwidth}\centering
\caption{Barrel 9 Bullet 2-4}
\includegraphics[width=\textwidth]{images/br9-2-4-grey.png}
\end{subfigure}
\begin{subfigure}[t]{.49\textwidth}\centering
\caption{Unknown Bullet B-2}
\includegraphics[width=\textwidth]{images/b-2-grey.png}
\end{subfigure}
\begin{subfigure}[t]{.49\textwidth}\centering
\caption{Unknown Bullet Q-4}
\includegraphics[width=\textwidth]{images/q-4-grey.png}
\end{subfigure}
\caption{\label{fig:fourflags}Images of the four lands that got flagged during the quality assessment. All of them show scratch marks (tank rash) across the striation marks from the barrel. They are excluded from the remainder of the analysis.}
\end{figure}

We exclude these four lands from further matching considerations and run all remaining lands from the unknown bullets against all remaining lands from known bullets for matches, i.e. we are comparing $15 \times 6 - 2 = 90 - 2 = 88$ lands from unknown bullets against $2 \times 10 \times 6 - 2 = 120 - 2 = 118$ lands from known bullets, yielding a total of $88 \times 118 = 10{,}384$ land-to-land comparisons. Out of these comparisons, there are 172 known matches (KM), while the rest are known non-matches (KNM). Ideally, results look like those in Figure \ref{fig:hamby-perfect}: Figure \ref{fig:hamby-perfect}a shows the distribution of the maximal number of consecutive matching striae between land C-3 and all 118 lands from known bullets. Two lands show a high CMS. These correspond to the known matches with C-3, shown in Figures \ref{fig:hamby-perfect}b and \ref{fig:hamby-perfect}c. 
Unfortunately, not all results are as clear cut. 
It might not be reasonable to assume that we can match all lands, but the idea is to maximize the number of matches in order to get an overview of what we might be able to expect from an automated match. 

\begin{figure}[hbtp]
\begin{subfigure}[t]{\textwidth}\centering
\caption{Maximal number of CMS between unknown bullet C-3 and all of the other 118 considered (known) lands. For two lands the number of maximum CMS is high. }
```{r cms, echo=FALSE, fig.width=7, fig.height=3.5, out.width='.5\\textwidth'}
load("data/data-25-25/unkn9.RData")
cmsdist <- sapply(reslist, function(x) x$maxCMS)

qplot(cmsdist, geom="bar") + theme_bw() + xlab("Number of CMS")
```
\end{subfigure}
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{Overlaid signatures of C-3 and the land with the top matching CMS.}
```{r top, echo=FALSE, fig.width=7, fig.height=3.25, out.width='\\textwidth', warning = FALSE}
res <- reslist[[which.max(cmsdist)]]  
#res <- reslist[[which.max(cmsdist[-which.max(cmsdist)])]]  # number 2
res$bullets$bullet <- scrubPath(res$bullets$bullet)

print(ggplot() +
  theme_bw() + 
  geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), ymin=-6, ymax=5, data=res$lines, alpha=0.2, show.legend=FALSE) +
  geom_line(aes(x = y, y = l30, linetype = bullet),  data = res$bullets) +
  scale_linetype_discrete("") + ylab("") +
  scale_fill_brewer("", palette="Set2", na.value=alpha("grey50", alpha=0.5)) +
  theme(legend.position = c(1,1), legend.justification=c(1,1)) + 
  ylim(c(-6,6)) +
  geom_text(aes(x = meany), y= -5.5, label= "x", data = subset(res$lines, !match)) +
  geom_text(aes(x = meany), y= -5.5, label= "o", data = subset(res$lines, match)))

```
\end{subfigure}
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{Top 2 match with C-3 based on CMS.}
```{r top2, echo=FALSE, fig.width=7, fig.height=3.25, out.width='\\textwidth', warning = FALSE}
res <- reslist[[which(cmsdist==14)]]  # number 2
res$bullets$bullet <- scrubPath(res$bullets$bullet)

print(ggplot() +
  theme_bw() + 
  geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), ymin=-6, ymax=5, data=res$lines, alpha=0.2, show.legend=FALSE) +
  geom_line(aes(x = y, y = l30, linetype = bullet),  data = res$bullets) +
  scale_linetype_discrete("") + ylab("") +
  scale_fill_brewer("", palette="Set2", na.value=alpha("grey50", alpha=0.5)) +
  theme(legend.position = c(1,1), legend.justification=c(1,1)) + 
  ylim(c(-6,6)) +
  geom_text(aes(x = meany), y= -5.5, label= "x", data = subset(res$lines, !match)) +
  geom_text(aes(x = meany), y= -5.5, label= "o", data = subset(res$lines, match)))

```
\end{subfigure}
\caption{\label{fig:hamby-perfect}Showcase scenario in which matching with CMS works very well. Unfortunately, the matches are not always this convincing.}
\end{figure}

Figure \ref{fig:cms} shows the strong connection between the maximal number of consecutive matching striae and matches in the Hamby study. All 42 pairs of lands with at least thirteen CMS in common are matches. 

\begin{figure}[hbtp]
  \centering
\begin{minipage}[t]{.47\textwidth}
```{r cms-bars, echo=FALSE, fig.width=7, fig.height=3.5, out.width='\\textwidth'}
ggplot(data=subset(bstats, !flagged)) + 
  geom_bar(aes(x=factor(CMS))) + theme_bw() +
  theme(legend.position="bottom") + 
  xlab("maximum CMS")
```
\end{minipage}
\begin{minipage}[t]{.52\textwidth}
```{r cms-spines, echo=FALSE,  fig.width=7, fig.height=4, out.width='\\textwidth'}
bstats$km <- c( "Known non-match", "Known match")[as.numeric(bstats$match)+1]
ggplot(data=subset(bstats, !flagged)) + 
  geom_bar(aes(x=factor(CMS), fill=km), position="fill") +
  theme(legend.position="bottom") + 
  scale_fill_brewer("", palette="Paired") +
  xlab("maximum CMS") + ylab("Proportion")
```
\end{minipage}
\caption{\label{fig:cms}Distribution of maximal CMS (left). Conditional barchart (Hummel 1996) on the right: heights show probability of match/non-match given a specific CMS. All land-to-land comparisons with at least 13 CMS are matches.}
\end{figure}

There are two things that should be noted at this point. First, the automated algorithm finds a relatively high number of CMS even for non-matches. On average, there are `r mean(bstats$CMS[!bstats$match], na.rm=TRUE)` maximal CMS between known non-matches (with a standard deviation of `r sd(bstats$CMS[!bstats$match], na.rm=TRUE)`). Known matches share on average `r mean(bstats$CMS[bstats$match], na.rm=TRUE)` maximal CMS, with a standard deviation of `r sd(bstats$CMS[bstats$match], na.rm=TRUE)`. Second, while the probability of a match increases with the number of maximal CMS, a large number of maximal CMS by itself is not indicative of a match, as was previously pointed out by @miller:1998. Figure \ref{fig:mismatch} shows a known mismatch between two lands that share twelve consecutively matched striae. Visually, we can easily tell that these two lands do not match well.

\begin{figure}[hbtp]
  \centering
```{r strange-res, echo=FALSE, fig.width=7, fig.height=3.25, out.width='.65\\textwidth', warning=FALSE}
load("data/data-25-25/unkn47.RData")
res <- reslist[[106]]
res$bullets$bullet <- scrubPath(res$bullets$bullet)

print(ggplot() +
  theme_bw() + 
  geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), ymin=-6, ymax=5, data=res$lines, alpha=0.2, show.legend=FALSE) +
  geom_line(aes(x = y, y = l30, linetype = bullet),  data = res$bullets) +
  scale_linetype_discrete("") + ylab("") +
  scale_fill_brewer("", palette="Set2", na.value=alpha("grey50", alpha=0.5)) +
  theme(legend.position = c(1,1), legend.justification=c(1,1)) + 
  ylim(c(-6,6)) +
  geom_text(aes(x = meany), y= -5.5, label= "x", data = subset(res$lines, !match)) +
  geom_text(aes(x = meany), y= -5.5, label= "o", data = subset(res$lines, match)))
```
\caption{\label{fig:mismatch}Known mismatch with a relatively large number of maximal consecutive matching striae (twelve) in the middle. The pattern in the middle does look surprisingly similar; however, the outer ends of the signatures easily reveal this comparison as a mismatch.}
\end{figure}

For smaller numbers of CMS, the percentage of false positives quickly increases. However, if we take other features of the image into account, we can increase the number of correct matches considerably: Figure \ref{fig:densities} gives an overview of the densities of all of the features derived earlier, for known matches (KM) and known non-matches (KNM). The densities of almost all of the features show strong differences between matches and non-matches. For example, a high cross-correlation between two signatures is indicative of a match -- in the Hamby study, only known matches have a cross-correlation of 0.75 or higher. There are 97 land-to-land comparisons with a cross-correlation that high.

\begin{figure}[hbtp]
  \centering
```{r density-overview, echo=FALSE, warning=FALSE, fig.width=12.5, fig.height=6.25, out.width='\\textwidth'}
features <- c("CMS", "CNMS",  "num.matches", "num.nonmatches", "D", "S", "ccf")
densities <- plyr::ldply(features, function(x) {
  xr <- range(bstats[,x])
  xx <- seq(xr[1], xr[2], length.out=500)
  
  densKM <- sm.density(bstats[,x][bstats$match], display="none", 
                       eval.points=xx, weights=NA, method="normal")
  densKNM <- sm.density(bstats[,x][!bstats$match], display="none", 
                       eval.points=xx, weights=NA, method="normal")
  dframe <- data.frame(var=x, x = xx, KM=densKM$estimate, KNM = densKNM$estimate)
  dframe <- rbind(data.frame(var=x, x = xr[1], KM=0, KNM=0),
        dframe, data.frame(var=x, x = xr[2], KM=0, KNM=0))
  dframe$order <- 1:nrow(dframe)
  dframe
}) 

dm <- melt(densities, measure.var=c("KM", "KNM"))
dm$var <- factor(dm$var, levels=features)
levels(dm$var)[3:4] <- c("#matches", "#non-matches")
levels(dm$variable) <- c("Known matches (KM)", "Known non-matches (KNM)")
dm$varLabel <- dm$var
levels(dm$varLabel) <- c(
  "Consecutive Matching Striae (CMS)", 
  "Consecutive Non-Matching Striae (CNMS)",
  "#matches",
  "#non-matches",
  "Average difference (D)",
  "Sum of peaks (S)",
  "Cross-correlation function (ccf)")

qplot(x, value, group=variable, geom="polygon", data=dm, fill=variable, 
      alpha=I(0.6), colour=I("grey20")) + 
  facet_wrap(~varLabel, ncol=4, scales="free") +
  theme_bw() + ylab("") + xlab("") +
  scale_fill_brewer("Bullet Land Pairs", palette="Paired") +
  theme(legend.position=c(1,0), 
        legend.justification = c("right", "bottom"),
        legend.background = element_rect(colour="grey75"),
        axis.title.y=element_blank(),
        plot.margin = unit(c(0,0,0,0), unit="cm"))
```
\caption{\label{fig:densities}Overview of all the marginal densities for features described in Section \ref{algorithm}. Shifts in the mode of the density functions between known matches and known non-matches indicate the variable's predictive power in distinguishing matches and non-matches. Predictive power is shown in more detail in Figure \ref{fig:rocs}.}
\end{figure}

\begin{figure}[hbtp]
  \centering
```{r rocs-overview,echo=FALSE, warning=FALSE, fig.width=12.5, fig.height=6.25, out.width='\\textwidth'}
# plot false positive against false negative
# prob of false positive: P(match | KNM)
# prob of false negative: P(non-match | KM)

# for  X < c: we get probability for P(X < c | KNM) and P(X < c | KM)


features <- c("CMS", "CNMS",  "num.matches", "num.nonmatches", "D", "S", "ccf")
errors <- plyr::ldply(features, function(x) {
  xx <- unique(bstats[,x])
  if (length(xx) > 500) {
    xr <- range(bstats[,x])
    xx <- seq(xr[1], xr[2], length.out=500)
  }
  # upper and lower rule:
  # upper: match is defined as X > xx
  # lower: match is defined as X < xx
  subbstats <- subset(bstats, flagged == FALSE)
  subbstats$match <- factor(subbstats$match)
  
  errors <- plyr::ldply(xx, function(cc) {
    idx <- which(subbstats[,x] >= cc)
    upper <- data.frame(xtabs(~match, data= subbstats[idx,,drop=FALSE])/
                          xtabs(~match, data= subbstats))
    idx <- which(subbstats[,x] <= cc)
    lower <- data.frame(xtabs(~match, data= subbstats[idx,,drop=FALSE])/
                          xtabs(~match, data= subbstats))
    
    data.frame(value = cc, match=lower[,1], lower=lower[,2], upper=upper[,2])
  })
  
  errors$variable <- x
  errors
}) 

rocs.lower <- dcast(errors, variable+value~match, value.var="lower")
rocs.lower$type <- "lower"
rocs.upper <- dcast(errors, variable+value~match, value.var="upper")
rocs.upper$type <- "upper"
rocs <- rbind(rocs.lower, rocs.upper)

set.seed(20140105)
aucs <- plyr::ldply(features, function(x) {
  subbstats <- subset(bstats, flagged == FALSE)
  pos.scores <- sample(subbstats[which(subbstats$match),x], 50000, replace=TRUE)
  neg.scores <- sample(subbstats[which(!subbstats$match),x], 50000, replace=TRUE)
  data.frame(x, upper=mean(pos.scores > neg.scores), 
             lower = mean(pos.scores < neg.scores))
})
aucs <- melt(aucs, measure.var=c("lower", "upper"))
names(aucs) <- c("variable", "type", "auc")
rocs <- merge(rocs, aucs, by=c("variable", "type"))
rocs$variable <- reorder(rocs$variable, -rocs$auc, min)
levels(rocs$variable)[3:4] <- c("#non-matches", "#matches")
rocs$varLabel <- rocs$variable
levels(rocs$varLabel) <- c(
  "Sum of peaks (S)",
  "Cross-correlation function (ccf)",
  "#non-matches",
  "#matches",
  "Consecutive Non-Matching Striae (CNMS)",
  "Consecutive Matching Striae (CMS)", 
  "Average difference (D)"
  )


labels <- unique(subset(rocs, auc > 0.5)[,c("variable", "auc")])
labels$labels <- sprintf("AUC: %.2f", labels$auc)
labels$type <- NA

eer <- rocs %>% filter(auc > 0.5) %>% 
  mutate(differror = abs(1-`TRUE` - `FALSE`))
eer <- eer %>% group_by(variable) %>% mutate(
  minerror = differror==min(differror),
  labels = sprintf("EER: %.2f", 0.5*(`FALSE`+1-`TRUE`))
) %>% filter(minerror==TRUE)
  
ggplot(data=subset(rocs, auc > 0.5)) +
  geom_ribbon(aes(x=`FALSE`, ymax=`TRUE`, ymin=0), alpha=0.2) +
  geom_line(aes(x=`FALSE`,y=`TRUE`))+ theme_bw() + 
  xlab("False positive rate") + 
  ylab("True positive rate") + facet_wrap(facets=~varLabel, ncol=4) + 
  geom_label(x=1, y=0, aes(label=labels), data=labels, hjust=1, vjust=0) + 
  geom_abline(linetype=2) + 
  geom_point(aes(x=`FALSE`, y=`TRUE`), data=eer, size=2.5) +
  geom_label(x=1, y=0.125, aes(label=labels), data=eer, hjust=1, vjust=0)
```
\caption{\label{fig:rocs}ROC curves for all of the features described in Section \ref{algorithm}. Variables are sorted according to their area under the curve (AUC). The equal error rate (EER) is marked by a point on the ROC curve. Except for the distance $D$ between signatures, all individual features derived from the surface measurements and the aligned striation marks are more predictive than the maximal CMS.}
\end{figure}

All of the features in Figure \ref{fig:densities} show large, if not significant, differences between matches and non-matches. The predictive power of each of these features is shown in the form of Receiver Operating Characteristic (ROC) curves in Figure \ref{fig:rocs}. The features are arranged in descending order of the area under the curve (AUC).
The dots mark the equal error rate (EER), i.e. the location on the ROC curve where the false positive and false negative error rates are the same. The smaller the EER, the better. We see that in this instance a low EER goes hand in hand with high predictive power as measured by AUC.
The feature with the highest individual predictive power is $S$, the sum of the average heights of two signatures at peaks and valleys. The maximal number of CMS is only in the sixth position here, ahead of only the distance $D$. The overall high AUC values indicate that we can successfully employ machine learning methods to distinguish matches from non-matches.
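
As a minimal sketch of how the two summaries relate, the following base R code computes an empirical AUC and EER for a single score. The scores are simulated, and the helper names `auc_pairs` and `equal_error_rate` are hypothetical and not part of bulletr. The AUC here is computed over all pairs of matches and non-matches rather than by resampling.

```r
# Simulated scores (an assumption for illustration, not Hamby data):
# matches tend to score higher than non-matches.
set.seed(1)
pos <- rnorm(200, mean = 2)   # scores of known matches
neg <- rnorm(2000, mean = 0)  # scores of known non-matches

# AUC as the probability that a randomly chosen match scores higher
# than a randomly chosen non-match (ties count one half).
auc_pairs <- function(pos, neg) {
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# EER: scan all observed scores as thresholds and pick the one where the
# false positive and false negative rates are closest; report their average.
equal_error_rate <- function(pos, neg) {
  thresholds <- sort(unique(c(pos, neg)))
  fpr <- sapply(thresholds, function(t) mean(neg > t))
  fnr <- sapply(thresholds, function(t) mean(pos <= t))
  i <- which.min(abs(fpr - fnr))
  0.5 * (fpr[i] + fnr[i])
}

auc <- auc_pairs(pos, neg)
eer <- equal_error_rate(pos, neg)
```

A feature that separates the two groups perfectly gives an AUC of 1 and an EER of 0, which is why the two orderings tend to agree.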

Using recursive partitioning, we fit a decision tree [@breiman:1984; @rpart; @rpart.plot] to predict matches between lands based on features derived from the image files. The resulting tree is shown in Figure \ref{fig:tree}. A total of `r sum(bstats$match[bstats$pred>0.5])` land-to-land matches are identified correctly. Interestingly, the number of consecutive matching striae does not feature in this evaluation. 
Instead of CMS, the cross-correlation (ccf) between the signatures plays a central role in the decision tree's matching process. Aside from cross-correlation, the total number of matches is also included in the decision rule. 
Between cross-correlation and CMS, cross-correlation has the higher predictive power. This does not contradict earlier findings emphasizing the value of CMS in visual assessments of bullet matches: in those papers, assessments were based purely on visual inspection of either actual bullets or 2D microscopic images of bullets.
Neither of these methods allows for an assessment of cross-correlations. This is one of the benefits of switching to a digitized version of the images that preserves the 3D surface structure. The findings about the discriminating power of cross-correlation are consistent with the results of the study by @ma:2004. However, in that study, the authors did not consider the number of matches and non-matches.

\begin{figure}[hbtp]
  \centering
```{r tree, echo=FALSE, fig.width=7, fig.height=4, out.width='.7\\textwidth'}
vals = alpha(brewer.pal(3, name="Paired"), alpha=0.5)
names(bstats)[9:10] <- c("#matches", "#non-matches")
includesVar <- setdiff(names(bstats), c("b1", "b2", "data", "resID", "id.x", "id.y", "pred", "forest", "bullet", "span", "crosscutdist", "flagged", "km"))
#cc <- read.csv("csvs/crosscuts-25.csv")
# cc$bullet <- gsub(".*//","",cc$path)
# cc$bullet <- gsub(".x3p", "", cc$bullet)
# bstats$flagged <- bstats$b2 %in% cc$bullet[which(is.na(cc$cc))] |
#                   bstats$b1 %in% cc$bullet[which(is.na(cc$cc))]

#excludesObs <- which(bstats$flagged)

rp1 <- rpart(match~., subset(bstats, !flagged)[,includesVar])  # doesn't include cms at all !!!!

per <- rp1$frame$yval # predictive probability for each node in the tree

prp(rp1, extra = 101, box.col=vals[as.numeric(per > 0.5)+1])

bstats$pred <- predict(rp1, newdata=bstats)
bstats$pred[bstats$flagged] <- NA
```
\caption{\label{fig:tree}Decision tree of matching bullets based on recursive partitioning. The rectangular nodes are the leaves, each giving a short summary consisting of the number of observations in the leaf (bottom left) and the corresponding percentage of the total (bottom right). The number at the top shows the fraction of these observations that are a match. A 1 or a 0 therefore indicates a homogeneous (or perfect) node.}
\end{figure}

```{r rforest, echo=FALSE, message=FALSE}
set.seed(20160105)
names(bstats)[9:10] <- c("num.matches", "num.nonmatches")
includesVar <- setdiff(names(bstats), c("b1", "b2", "data", "resID", "id.x", "id.y", "pred", "forest", "bullet", "span", "crosscutdist", "flagged", "km"))

rtrees <- randomForest(factor(match)~., data=subset(bstats, !flagged)[,includesVar], ntree=300)

errors <- data.frame(rtrees$err.rate)
errors$id <- 1:nrow(errors)
bstats$forest <- predict(rtrees, type="prob", newdata=bstats)[,2]
bstats$forest[bstats$flagged] <- NA
```

Another benefit of the digitized version of the images is that we can combine several hundred decision trees into a random forest [@breiman:2001; @randomForest]. For each of the trees in a random forest, only about two thirds of the observations are used for fitting, while the remaining third is used to evaluate the tree's predictive accuracy or, conversely, its error rate. Because errors are determined from the held-back third of the observations, this error rate is called the out-of-bag (OOB) error. 
Figure \ref{fig:oob} shows the cumulative out-of-bag (OOB) error rate for 300 trees. 
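
The split into roughly two thirds and one third follows from bootstrap sampling: when $n$ observations are drawn with replacement, each observation is left out with probability $(1 - 1/n)^n \approx e^{-1} \approx 0.37$. The small simulation below illustrates this. It assumes that each tree is fit on a bootstrap sample of the same size as the data; the sample size and the number of samples are arbitrary choices.

```r
set.seed(1)
n <- 10000       # number of observations (arbitrary)
n_trees <- 300   # number of bootstrap samples, one per tree

# For each tree, draw a bootstrap sample and record the share of
# observations that never appear in it (the out-of-bag share).
oob_share <- replicate(n_trees, {
  in_bag <- sample.int(n, size = n, replace = TRUE)
  1 - length(unique(in_bag)) / n
})

mean_oob <- mean(oob_share)   # close to exp(-1)
```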

\begin{figure}[hbtp]
  \centering
```{r oob, echo=FALSE, fig.width=6, fig.height=3, out.width='.7\\textwidth'}
qplot(id, OOB, geom="line", data=errors) + theme_bw() + ylim(c(0,NA)) + xlab("Number of trees") + ylab("Out of Bag Error (OOB)")
```
\caption{\label{fig:oob}Cumulative out-of-bag error rate of a random forest fit to predict land-to-land matches from image features.}
\end{figure}

After about 100 trees, the error rate of land-to-land comparisons stabilizes at 0.0039. This is a weighted average of a false positive rate of 0.0001 and a false negative rate of 0.2267. The out-of-bag error rate overestimates the actual error in the Hamby study: here, the final random forest based on 300 trees correctly predicts all known matches and non-matches (see Figure \ref{fig:tree-forest}).
Note that this error rate is based on land-to-land comparisons and is expected to be much lower for bullet-to-bullet comparisons. In the case of the Hamby data, even a single tree results in an overall error rate of zero if we declare two bullets a match whenever at least two of their lands are matched. This makes the errors of the automated approach smaller than the human error in the Hamby study: among the 507 participants who returned results, eight bullets (out of $15 \times 507 = 7{,}605$) were not matched conclusively, corresponding to a rate of 0.0011.
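
The bullet-level rule can be sketched as follows. The data frame, its column names and the probabilities are made up for illustration, and for brevity only six land-to-land comparisons are listed per bullet pair.

```r
# Hypothetical land-to-land predictions for one known bullet (K1)
# compared with two unknown bullets (U1, U2).
lands <- data.frame(
  bullet1 = rep("K1", 12),
  bullet2 = rep(c("U1", "U2"), each = 6),
  prob    = c(0.9, 0.8, 0.1, 0.7, 0.2, 0.6,   # four lands above 0.5
              0.1, 0.6, 0.0, 0.2, 0.1, 0.3)   # only one land above 0.5
)

# Declare a land match when the predicted probability exceeds 0.5, then
# count matched lands per bullet pair.
lands$land_match <- lands$prob > 0.5
bullets <- aggregate(land_match ~ bullet1 + bullet2, data = lands, FUN = sum)

# Bullet-level rule from the text: at least two matched lands.
bullets$bullet_match <- bullets$land_match >= 2

# Human error rate quoted in the text: 8 bullets out of 15 * 507.
human_rate <- 8 / (15 * 507)
```

Under this rule a single spurious land match, as for U2, does not produce a bullet-level match.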

For the Hamby data, error rates based on bullet-to-bullet matches do not carry much weight because of the small size of the study: fifteen unknown bullets are successfully matched to ten pairs of known bullets. Bullet matching can only be tested realistically in a much bigger experiment. 
Another thing to note about the random forest's error rates is that they are based on probability cutoffs of 0.5, i.e. whenever the predicted probability of a match exceeds 0.5, a match is declared. Basing this decision on a threshold fixed at 0.5 may not be the best approach. In practice, examiners are allowed a third option of 'inconclusive'. On a probability spectrum of outcomes we could therefore introduce an interval of 'inconclusive' results in the middle of the spectrum -- which turns out to be unnecessary in the Hamby study, because, here, the results from the random forest are very clear cut. Figure \ref{fig:tree-forest} shows a comparison of the predicted probabilities of a match by the tree and the random forest. As expected, the random forest provides a more realistic estimate of the uncertainty in the classification.
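
A three-way decision of this kind could be sketched as below. The cutoffs 0.3 and 0.7 are placeholders chosen for illustration only, and `classify` is a hypothetical helper.

```r
# Map predicted match probabilities to three outcomes. Probabilities in
# (lower, upper] are called inconclusive; NA (e.g. flagged lands) stays NA.
classify <- function(prob, lower = 0.3, upper = 0.7) {
  cut(prob, breaks = c(-Inf, lower, upper, Inf),
      labels = c("non-match", "inconclusive", "match"))
}

calls <- classify(c(0.02, 0.45, 0.98))
```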

\begin{figure}[hbtp]
  \centering
```{r tree:forest, echo=FALSE, fig.width=5, fig.height=2.5, out.width='.6\\textwidth', warning = FALSE}
predm <- melt(bstats, measure.var=c("pred", "forest"))
levels(predm$variable) <- c("Tree", "Forest")
predm$match <- c("KNM", "KM")[as.numeric(predm$match)+1]
predm$barrel1 <- gsub("(.*) .*-[0-9]+", "\\1", predm$b1)
predm$barrel2 <- gsub("(.*)-[0-9]+", "\\1", predm$b2)

qplot(data=predm, value, match, geom="jitter", colour=match, shape=match) + theme_bw() + facet_grid(variable~.) +
    scale_colour_brewer(palette="Paired", labels=c("KNM", "KM")) +
  scale_shape_discrete(labels=c("KNM", "KM")) + ylab("") + 
  xlab("Predicted probability of a match") +
  theme(legend.position="none")
```
\caption{\label{fig:tree-forest}Prediction results from the tree and the forest. Using a cut-off probability of 0.5 the forest correctly predicts every single comparison. Compared to the tree, the forest's prediction probabilities are  shrunk towards either end of the prediction range.}
\end{figure}

Besides resulting in a probabilistic quantification of matches, random forests also provide an assessment of the importance of each of the features derived from the bullets' 3D topological surface measurements. Figure \ref{fig:importance} shows an overview of the importance of each variable measured as the mean decrease in the Gini index when the variable in question is included in a tree (for the exact values please refer to Section \ref{table-of-feature-importance}). 
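
To make the measure concrete, the sketch below computes the decrease in Gini impurity for a single binary split. The labels, the feature values and the threshold are made up, and the helper names are hypothetical. The importance reported by randomForest accumulates such decreases over all splits on a variable and averages them over trees, so its scale differs from this single-split value.

```r
# Gini impurity of a node with binary labels y (TRUE = match).
gini <- function(y) {
  p <- mean(y)
  2 * p * (1 - p)
}

# Decrease in impurity when a node is split by the logical vector `left`:
# parent impurity minus the size-weighted impurities of the two children.
gini_decrease <- function(y, left) {
  n <- length(y)
  gini(y) - (sum(left) / n) * gini(y[left]) - (sum(!left) / n) * gini(y[!left])
}

# Made-up example: eight comparisons, split at ccf > 0.5.
match <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
ccf   <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.2, 0.1)
dec   <- gini_decrease(match, left = ccf > 0.5)
```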

\begin{figure}
```{r featimp, echo=FALSE, fig.width=7, fig.height=2, out.width='.7\\textwidth', fig.align='center'}
imp <- data.frame(importance(rtrees))
imp$Variable <- row.names(imp)
imp$Variable[c(7,8)] <- c("matches", "mismatches")
imp1 <- subset(imp, !(Variable %in% c("x1", "sd.D", "S")))
imp2 <- subset(imp, Variable %in% c("x1", "sd.D", "S"))
imp3 <- subset(imp, !(Variable %in% c("x1", "x2", "sd.D", "lag")))
qplot(MeanDecreaseGini,1, data=imp3, geom="point") + 
  geom_text(aes(y=1.2, label=Variable), angle=45, hjust=0, data=subset(imp3, Variable != "S")) + 
  geom_text(aes(y=1.2, label=Variable), angle=45, vjust=1.2, hjust=0, data=subset(imp3, Variable == "S")) + 
  ylim(c(0, 2.75)) + theme_bw() + ylab("") +
  theme(axis.ticks.y=element_blank(), axis.text.y=element_blank(),
        panel.grid.major.y=element_blank(), panel.grid.minor.y=element_blank()) +
  xlab("Importance (mean decrease in Gini index)")
```
\caption{\label{fig:importance}Importance of features in the random forest. Importance is measured in terms of mean decrease in Gini index when including the variable in a decision tree.}
\end{figure}

The variables with the most predictive power are cross-correlation and the overall number of matching extrema, followed by the total depth of joint striations $S$ and total number of mismatches. CMS is found only in sixth place.

Besides including results from known matches against known non-matches, we can increase the number of comparisons in the Hamby study to include all possible land-to-land comparisons. This effectively doubles the number of data points available. Comparisons not previously included in fitting the random forest can also be used as an additional source for assessing error rates.  Results for this and a more detailed discussion can be found in Section \ref{complete-evaluation-of-the-hamby-study}.

# Discussion

We present an algorithm which detects the most prominent but least relevant structure of a bullet from a firearms identification perspective, removes these features, and produces residuals which allow for the easy identification of markings. We have generalized this algorithm to align the residuals from two bullets to automatically determine whether they are indistinguishable. A random forest model provides a probabilistic assessment of the strength of a match, along with an ordering of the relevance of features. Matching bullets is clearly not a one-step process, but rather a sequence of data analysis tasks each deserving attention. As there is no scientific standard in place at this point in time, our intent is to explain an approach to addressing these tasks, while documenting all steps and providing all code so other researchers and forensic scientists can reproduce and expand on our findings.

The matching algorithm is sensitive to the parameter choices made. The heights at which signatures are extracted (currently 25$\mu m$ apart) to evaluate stability, as well as the cross-correlation factor (currently 0.95) we set as a minimum threshold do affect the final outcome. Another parameter that must be selected is the amount of smoothing when identifying peaks and grooves (currently, a window of 23.4375$\mu m$ is used, corresponding to a window of 7 values to the left and the right of an observation). We try to lay out in the paper the impact that each of the parameter choices has on the matching performance, but more research and better data are needed to define an optimized scenario.
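
The relation between the window width and the number of values can be checked directly. The sketch assumes a resolution of 1.5625$\mu m$ per value and uses a centered moving average as one plausible smoother with that window; it is not necessarily the smoother implemented in bulletr, and `smooth_ma` is a hypothetical helper.

```r
# Assumed scan resolution in micrometers per value.
resolution_um <- 1.5625
half_window   <- 7                      # values to the left and to the right
window_values <- 2 * half_window + 1    # 15 values in total
window_um     <- window_values * resolution_um

# Centered moving average over the window. stats::filter() is called
# explicitly because dplyr masks filter().
smooth_ma <- function(z, k = window_values) {
  as.numeric(stats::filter(z, rep(1 / k, k), sides = 2))
}

smoothed <- smooth_ma(sin(seq(0, 6, length.out = 100)))
```

The first and last seven values of `smoothed` are missing because the window does not fit there, which is one practical cost of a wider window.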

The Hamby study serves as our evaluation `database'. It consists of only 35 bullets -- this is obviously not a particularly realistic scenario for an automatic matching procedure, but for now we are unaware of other databases containing bullets in the x3p format that we could add to our study.
 
The feasibility of creating a database of images that could be used to identify guns used in crimes was evaluated in a 2008 report [@nap:2008] by the National Research Council. The committee investigated the scalability of NIBIN (National Integrated Ballistic Information Network), which uses proprietary matching algorithms provided by IBIS. The bottom line of the report was that in spite of the many technical and practical hurdles, solutions to all but one problem could be found. The problem that remained is that statistically, the quality of the matching algorithm (in this case, of breech-face marks and firing pin impressions) could not withstand a hugely increased number of records without overwhelming forensic examiners, who have to examine possible matches suggested by the system. 
The findings of the NRC report on imaging are based on two-dimensional greyscale images, which the committee argued were not reliable enough for distinguishing between fine marks. This finding coincides with the assessment by @dekinder:2004 based on the IBIS Heritage system. A further re-assessment by @deceuster:2015 came to the same conclusions based on the EvoFinder system. 
The NRC report also found that results from 2D images can be improved when matches are based on 3D images. This is consistent with the importance of features found here: out of the top five features (see Figure \ref{fig:importance}), only the total number of matches and mismatches are available for a match based on 2D features.

By suggesting an automated algorithm that first removes class characteristics, such as the grooves and the curvature of the bullet to reveal the region of the  land, then identifies peaks and valleys on this land, we reduce subjectivity and with it possible sources of bias. In particular, 'the concept of counting striations is subjective and based on experience' [@miller:1998]. The steps outlined in this paper could also help explore other important forensic science problems. In particular, more general toolmark examination can benefit from the approach we discuss.

For a fair assessment of the performance of an algorithm, we need transparency. Our matching algorithm is open: the code is readily available in the form of the R package bulletr [@bulletr], and the code to produce this paper is available at \url{http://www.github.com/erichare/imaging-paper}. To understand whether an automated approach along the lines of the one we propose can accurately identify sets of bullets with indistinguishable markings, it will be necessary to assemble a much larger database that includes a wide range of ammunition types, degrees of damage, gun makes, etc. We are unaware of the existence of any such database. In addition to serving as a realistic testbed for the performance of the automated matching algorithm, such a database would also permit testing the underlying, as yet untested, assumptions of uniqueness and reproducibility of the markings left by a gun on bullets.

# Acknowledgment

Thanks to David Baldwin for pointing us to the NIST database and doing a Firearms 101 for us.
Thanks to the men and women behind the software R [@R], and the authors of the R packages knitr [@knitr] and ggplot2 [@ggplot2].

# Appendix

## Cylindrical Fit

Figure \ref{fig:fixedX2} shows the  profile of surface measurements of bullet 1-5 at a fixed height. The smooth line on top is a circle, with estimated radius and center. The details of this fit are given below:

\begin{figure}[hbtp]
  \centering
```{r fixedX2, dependson='data', echo=FALSE, warning=FALSE, message=FALSE, fig.height=2, fig.width=6, out.width='0.5\\textwidth'}

cols = c(alpha("grey60", alpha=0.6), alpha("black", 0.5))

br111 <- read_x3p(paste(datadir,"Br1 Bullet 1-5.x3p", sep = "/"))
dbr111 <- fortify_x3p(br111)

pars <- data.frame(getCircle(dbr111$y, dbr111$value))
dbr111$theta <- acos((dbr111$y-pars$x0)/pars$radius)/pi*180
dbr111 <- dbr111 %>% mutate(
  xpred = cos(theta/180*pi)*pars$radius + pars$x0,
  ypred = sin(theta/180*pi)*pars$radius + pars$y0
)

qplot(data=subset(dbr111, x <= 100*1.5625^2 & x >= 99*1.5625^2), y, value, geom="line", size=I(1)) +
  geom_line(aes(x=xpred, y=ypred, group=x), 
            colour="grey30", size=0.25) +
#  ylab(expression(paste("Surface Measurements (in ",mu,m,")", sep=""))) + 
  ylab("") +
  theme_bw() + 
  theme(legend.position="bottom") #+ coord_equal()
```
\caption{\label{fig:fixedX2}Side profile of the surface measurements (in $\mu m$) of a bullet land at a fixed height of $x$. Note that the global features dominate any deviations, corresponding to the individual characteristics of striation marks.}
\end{figure}

Assume that $n$ data points are given in the form of data tuples $(x_1, y_1)$, $(x_2, y_2)$, $\ldots$, $(x_n, y_n)$ that are (approximately) located on a circle. We want to estimate the location of the center and the radius of the best fitting circle using a least squares approach.

We minimize the following expression:

\begin{equation}\label{eq:circle}
D = \sum_{i=1}^n \left( r^2 - (x_i-a)^2 - (y_i-b)^2 \right)^2,
\end{equation}

by differentiating $D$ with respect to $r$, $a$, and $b$. We assume that $x_i$ and $y_i$ are centered (i.e. $\sum_i x_i = \sum_i y_i = 0$). If they are not, record the current means, subtract them from the data, and add them back to $(\hat{a}, \hat{b})$ at the end. 

\noindent
The derivative of $D$ with respect to $r$ is:
\begin{eqnarray*}
\frac{d}{dr} D &=& 2 \sum_i \left( r^2 - (x_i-a)^2 - (y_i-b)^2 \right) 2 r = \\
&=& 4 r \left( n r^2 - \sum_i (x_i-a)^2 - \sum_i(y_i-b)^2 \right).
\end{eqnarray*}
At the minimum:
\begin{equation}\label{eq:rmin}
\frac{d}{dr} D = 0 \stackrel{r \neq 0}{\iff} nr^2  = \sum_i (x_i-a)^2 + \sum_i(y_i-b)^2.
\end{equation}

The  derivative of $D$ with respect to $a$ is:
\begin{eqnarray*}
\frac{d}{da} D &=& 2 \sum_i \left( r^2 - (x_i-a)^2 - (y_i-b)^2 \right) 2 (x_i - a) = \\
&=& -4 \left[ a \cdot nr^2 + \sum_i (x_i - a)^3  + \sum_i (x_i - a) (y_i - b)^2 \right].
\end{eqnarray*}
Using (\ref{eq:rmin}) for $nr^2$  in the equation above we get:
\begin{eqnarray*}
\frac{d}{da} D &=& -4 \left[  \sum_i a(x_i-a)^2 +  \sum_i a(y_i-b)^2  + \right. \\
&& \phantom{-4 \ \ } \left . \sum_i (x_i - a)^3  + \sum_i (x_i - a) (y_i - b)^2 \right]  = \\
&=& -4 \left[ \sum_i (x_i-a)^2 (a + x_i - a)  + \right.\\
&& \phantom{-4 \ \ } \left .\sum_i (x_i - a + a) (y_i - b)^2 \right] = \\
&=& -4 \left[ \sum_i (x_i-a)^2 x_i   + \sum_i x_i  (y_i - b)^2 \right] 
\stackrel{\begin{array}{c}\sum_i x_i = 0\\
\sum_i y_i = 0\end{array}}{=} \\
&=& -4 \left[ \sum_i x_i^3   + \sum_i x_i y_i^2  - 2a s_{xx} - 2b s_{xy} \right],
\end{eqnarray*}
where $s_{xx} = \sum_i x_i^2, s_{xy} = \sum_i x_i y_i$ and $s_{yy} = \sum_i y_i^2$.

\noindent
Likewise, we get for the derivative of $D$ with respect to $b$:
\begin{eqnarray*}
\frac{d}{db} D &=& -4 \left[ \sum_i y_i^3   + \sum_i x_i^2 y_i - 2a s_{xy} - 2b s_{yy} \right].
\end{eqnarray*}
To find the minimum we therefore get a system of two linear equations in $a$ and $b$:
\begin{eqnarray*}
2 s_{xx} a + 2 s_{xy} b = c_1 && \text{ with } c_1 = \sum_i \left( x_i^3 + x_i y_i^2 \right) \\
2 s_{xy} a + 2 s_{yy} b = c_2 &&\text{ with } c_2 = \sum_i \left( x_i^2 y_i + y_i^3 \right).
\end{eqnarray*}
The solution to the system is:
\begin{eqnarray*}
\hat{a} &=& \frac{c_1 s_{yy} - c_2 s_{xy}}{2 s_{xx} s_{yy} - 2 s_{xy}^2},\\
\hat{b} &=& \frac{c_2 s_{xx} - c_1 s_{xy}}{2 s_{xx} s_{yy} - 2 s_{xy}^2}, \text{ and}\\
\hat{r}^2 &=& \frac{1}{n}s_{xx} + \frac{1}{n}s_{yy} + \hat{a}^2 + \hat{b}^2.
\end{eqnarray*}
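
The closed-form solution translates directly into a few lines of R. The following is a sketch under the assumptions of the derivation; `fit_circle` is a hypothetical helper name, not the bulletr implementation, and the check uses noise-free points from a known circle.

```r
# Least-squares circle fit following the closed-form solution above.
fit_circle <- function(x, y) {
  # Center the data; the means are added back to the center at the end.
  mx <- mean(x); my <- mean(y)
  x <- x - mx;   y <- y - my

  sxx <- sum(x^2); syy <- sum(y^2); sxy <- sum(x * y)
  c1 <- sum(x^3 + x * y^2)
  c2 <- sum(x^2 * y + y^3)

  # Solve the 2 x 2 linear system for the center (a, b).
  denom <- 2 * (sxx * syy - sxy^2)
  a <- (c1 * syy - c2 * sxy) / denom
  b <- (c2 * sxx - c1 * sxy) / denom
  r <- sqrt((sxx + syy) / length(x) + a^2 + b^2)

  list(x0 = a + mx, y0 = b + my, radius = r)
}

# Points on an arc of a circle with center (10, -2) and radius 4.
theta <- seq(pi / 3, 2 * pi / 3, length.out = 50)
fit <- fit_circle(x = 10 + 4 * cos(theta), y = -2 + 4 * sin(theta))
```

For points that lie exactly on a circle the objective $D$ is zero at the true parameters, so the fit recovers the center and radius up to floating point error.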

The plots in Figure \ref{fig:residual} show the residuals of such a fit.
In this instance, the radius is estimated as $\hat{r} = `r pars$radius`\mu m = `r pars$radius/1000`mm$ and the land covers about `r diff(range(dbr111$theta))` degrees.  Both of these estimates are consistent with a 9 mm bullet fired by a Ruger P-85.
The residuals are dominated, as expected, by the grooves, which show up as large positive residuals. For a profile at height $x = 100\mu m$ there is a residual circular structure that does not show up for all signatures. 

\begin{figure}[hbtp]
  \centering
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{\label{fig:residuala}Residual structure at height $x = 1.5625\mu m$ (bottom of the bullet).}
```{r residual2, dependson='fixedX2', echo=FALSE, warning=FALSE, fig.height=3, out.width='\\textwidth'}
qplot(data=subset(dbr111, x <= 1.5625), y, value-ypred, #colour=factor(x),
      geom="line", size=I(1)) +
#  scale_colour_brewer("x", palette="Paired") + 
  theme_bw() + 
  geom_hline(yintercept = 0, colour="grey50") +
  ylab(expression(paste("Residuals (in ",mu,"m)", sep=""))) + 
  theme(legend.position="bottom")
```
\end{subfigure}    
\begin{subfigure}[b]{.49\textwidth}\centering
\caption{\label{fig:residualb} Residual structure at height $x = 100.00\mu m$}
```{r residual, dependson='fixedX2', echo=FALSE, warning=FALSE, fig.height=3, out.width='\\textwidth'}
#qplot(data=subset(dbr111, x <= 80*1.5625^2 & x >=75*1.5625^2), y, value-ypred,
qplot(data=subset(dbr111, x == 100), y, value-ypred,  #colour=factor(x), 
      geom="line", size=I(1)) +
#  scale_colour_brewer("x", palette="Paired") + 
  geom_hline(yintercept = 0, colour="grey50") +
  theme_bw() + 
  ylab(expression(paste("Residuals (in ",mu,"m)", sep=""))) + 
  theme(legend.position="bottom")
```
\end{subfigure}
\caption{\label{fig:residual} Residual structure of circular fits at two different cross sections. Both residual plots show systematic structures, indicating that a circular fit is not entirely appropriate.}
\end{figure}

A single cylinder is unlikely to fit particularly well, because there appear to be substantial deformations in the vertical direction. Even fitting a circle at each distinct height of the bullet, as in Figure \ref{fig:circlefits}, does not address all of these issues. While the wider circumference at the base of the bullet can be resolved by individual circular fits, the systematic residual structure in Figure \ref{fig:residualb} stays the same.

```{r bullet1, echo=FALSE}
db1 <- NULL
for (i in 1:6) {
  bname <- sprintf(file.path(datadir, "Br1 Bullet 1-%d.x3p"), i)
  dbi <- fortify_x3p(read_x3p(bname))
  dbi$part <- i
  db1 <- rbind(db1, dbi)
}

db1 <- db1 %>% group_by(part, x) %>% do (
    data.frame(., predCircle(.$y, .$value))
  )
```

```{r bullet2, echo=FALSE}
db2 <- NULL
for (i in 1:6) {
  bname <- sprintf(file.path(datadir, "Br1 Bullet 2-%d.x3p"), i)
  dbi <- fortify_x3p(read_x3p(bname))
  dbi$part <- i
  db2 <- rbind(db2, dbi)
}

db2 <- db2 %>% group_by(part, x) %>% do (
    data.frame(., predCircle(.$y, .$value))
  )
```

\begin{figure}[hbtp]
  \centering
```{r circlefits, echo=FALSE, fig.width=10, fig.height=5, out.width='\\textwidth', warning=FALSE}
db2$land <- db2$part
db1$land <- db1$part
qplot(y, resid, data=subset(db2, x == 100), #colour=factor(x), 
      geom="line", size=I(.75), colour=I("grey70")) + 
  facet_wrap(~land, ncol=3, labeller="label_both") + 
  scale_colour_brewer(palette="Paired") +
  theme_bw() + theme(legend.position="bottom") + 
  geom_line(aes(y, resid, group = x), colour="black", size=.75, alpha=0.5,
             data = filter(db1, land==5, x == 100)[,c("y", "resid", "x")]) +
  ylab("Residuals from circular fit") +
  ggtitle("Bullet 1-5 in black")
```
\caption{Circular fit to the signature of each land of bullet 2, with signature from bullet 1-5 overlaid.\label{fig:circlefits} The signature of bullet 1-5 matches best with bullet 2-1.}
\end{figure}

## Cross-Correlation at Multiple Heights

Figure \ref{fig:crosscuts} shows a sequence of signatures for bullet 1-5 (barrel 1) taken at heights 50$\mu m$ apart, between 150$\mu m$ and 400$\mu m$. Each is compared to the signature at a height of 100$\mu m$. Initially, the comparison yields an almost perfect match between the two signatures. However, the match quickly deteriorates with increasing distance between the heights at which the signatures are extracted. Even though we know that the same bullet surface is being used, we only get a good visual match if the signatures are taken from heights within 150$\mu m$ of each other. 
Given that we have to expect some minor variation in the height values due to (manual) alignment in microscopes, we should take height into account in the automatic matching routine by evaluating matches at several heights. 
\begin{figure}[hbtp]
  \centering
```{r crosscuts-vary, echo=FALSE, fig.width = 12, fig.height = 7, out.width = '\\textwidth', warning = FALSE}

scrubPath <- function(x) {
  splits <- strsplit(as.character(x), split="/")
  last <- sapply(splits, function(x) x[length(x)])
  gsub(".x3p","", last)
}

paths <- file.path(datadir, dir(datadir))
paths <- paths[grep(" ", paths)]

im1 <- "images/Hamby (2009) Barrel/bullets/Br1 Bullet 1-5.x3p"

crosscuts <- seq(100, 400, by = 50)
lof <- processBullets(read_x3p(im1), name = scrubPath(im1), x = crosscuts)
lof$bullet <- paste(lof$bullet, lof$x)

reslist <- lapply(crosscuts[-1], function(cc) {
#  browser()
  b2 <- subset(lof, x %in% c(cc, 100))
  lofX <- bulletSmooth(b2)
  bAlign = bulletAlign(lofX)
  lofX <- bAlign$bullet
    b12 <- unique(b2$bullet)
  peaks1 <- get_peaks(subset(lofX, bullet==b12[1]), smoothfactor = 25)
  peaks2 <- get_peaks(subset(lofX, bullet == b12[2]), smoothfactor = 25)

#  threshold <- bulletPickThreshold(lofX, thresholds = seq(0.3, 1.5, by = 0.05))
#  lines <- striation_identify(lofX, threshold = threshold)
  peaks1$lines$bullet <- b12[1]
  peaks2$lines$bullet <- b12[2]
  lines <- striation_identify(peaks1$lines, peaks2$lines)

  maxCMS <- maxCMS(lines$match==TRUE)
  list(maxCMS = maxCMS, ccf = bAlign$ccf, lines=lines, bullets=lofX)
})

ccfs <- sapply(reslist, function(res) res$ccf)

lop <- lapply(reslist, function(res) {
ggplot() +
  theme_bw() + 
  geom_rect(aes(xmin=xmin, xmax=xmax, fill=factor(type)), show.legend=FALSE, ymin=-6, ymax=5, data=res$lines, alpha=0.2) +
  geom_line(aes(x = y, y = l30, linetype=bullet),   data = res$bullets, alpha=0.6) +
  scale_colour_brewer("", palette="Set1", na.value=alpha("grey50", alpha=0.5)) +
  scale_linetype_discrete("") +
  scale_fill_brewer("", palette="Set2", na.value=alpha("grey50", alpha=0.5)) +
  theme(legend.position = c(1,1.2), legend.justification=c(1,1),
        legend.background = element_rect(fill=alpha('white', 0.4))) + 
  ylim(c(-6,6)) +
  geom_text(aes(x = meany), y= -5.5, label= "x", data = subset(res$lines, !match)) +
  geom_text(aes(x = meany), y= -5.5, label= "o", data = subset(res$lines, match)) +
    ylab("") + xlab("")
})


grid.arrange(lop[[1]], lop[[2]], lop[[3]], lop[[4]], lop[[5]], lop[[6]],
             ncol = 2)
```
\caption{\label{fig:crosscuts}Overview of the variations in the signatures at different heights. The signature extracted at $x = 100\mu m$ is compared to signatures at every 50$\mu m$. With every step away from the original height, the number of differences between the signatures increases; the number of maximum CMS decreases from initially 22 to  four or fewer at a height of $x = 300\mu m$ and above.}
\end{figure}
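
The alignment of two signatures by cross-correlation can be illustrated with base R. The sketch uses simulated signals and `stats::ccf`; it is meant to show the idea of picking the lag with maximal cross-correlation, and the alignment in bulletr may differ in detail.

```r
set.seed(1)
n     <- 500
shift <- 15   # true offset between the two simulated signatures

# Simulated surface profile: lightly smoothed noise (illustrative only).
base <- as.numeric(stats::filter(rnorm(n + shift + 4), rep(1 / 5, 5)))
base <- base[!is.na(base)]

sig1 <- base[1:n]
sig2 <- base[(1 + shift):(n + shift)] + rnorm(n, sd = 0.05)

# Cross-correlation over a range of lags; ccf(x, y) at lag k estimates
# the correlation between x[t + k] and y[t].
cc       <- ccf(sig1, sig2, lag.max = 50, plot = FALSE)
best_lag <- as.numeric(cc$lag[which.max(cc$acf)])
max_ccf  <- max(cc$acf)
```

Here `best_lag` recovers the offset between the two signals and `max_ccf` plays the role of the cross-correlation feature after alignment.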

## Signature Intensities

```{r setup-signatures, echo=FALSE, message=FALSE}
knowndatadir <- "images/Hamby (2009) Barrel/bullets"
unknowndatadir <- "images/Hamby (2009) Barrel/bullets"

###############
# can we identify the barrels the unknown bullets came from?

# match unknown land using crosscuts
ccs <- read.csv("csvs/crosscuts-25-old.csv")
ccs$path <- file.path(knowndatadir, basename(as.character(ccs$path)))
all_bullets <- lapply(as.character(ccs$path), function(x) {
  result <- read_x3p(x)
  result[[3]] <- x
  names(result)[3] <- "path"
  
  return(result)
})

knowns <- all_bullets[1:120]
unknowns <- all_bullets[121:210]

if (!file.exists("csvs/crosscuts-sd.csv")) {
  bullets_processed <- lapply(all_bullets, function(bul) {
    #cat("Computing processed bullet", basename(bul$path), "\n")
    crosscuts <- 25*(1:20)
    dframe <- processBullets(bullet = bul, x = crosscuts)
    dframe$bullet <- with(dframe, paste(bullet, x))
    dframe
  })
  names(bullets_processed) <- as.character(ccs$path)
  
  bullets_smoothed <- bullets_processed %>% bind_rows %>% bulletSmooth
  
  stats <- bullets_smoothed %>% group_by(bullet, x) %>% summarize(
    sd = sd(l30, na.rm=T)
  )
  splits <- strsplit(stats$bullet, split=".", fixed=TRUE)
  bnames <- sapply(splits, function(x) paste(x[1],".x3p", sep=""))
  stats$bullet <- bnames
  
  write.csv(stats, "csvs/crosscuts-sd.csv", row.names=FALSE)
} else {
  stats <- read.csv("csvs/crosscuts-sd.csv")
}
```

Figure \ref{fig:overview} shows an overview of the signatures at different heights on a single bullet. 

\begin{figure}[hbtp]
```{r one-bulletland-sd, echo=FALSE, dependson='setup-signatures', fig.height=5, fig.width=8, out.width='0.7\\textwidth', warning=FALSE, fig.align='center'}
k <- 1
x <- as.character(ccs$path)[k]
land <- read_x3p(x)
land[[3]] <- x
names(land)[3] <- "path"

crosscuts <- 25*(1:20)
dframe <- processBullets(bullet = land, x = crosscuts)
dframe$bullet <- with(dframe, paste(bullet, x))

bullet <- dframe %>% bind_rows %>% bulletSmooth 
subbullet <- subset(bullet, x >= ccs$cc[k])
qplot(x=y, geom="line", y=l30, colour=x, group=x, data=subbullet) + theme_bw() +
  scale_colour_gradient("Crosscuts") +
  theme(legend.position="bottom") +
  ylab("Residuals from loess fit")
```
\caption{\label{fig:overview}Signatures of the same bullet at different heights.  With increasing height, peaks and valleys are less pronounced, resulting in a smaller standard deviation.}
\end{figure}

At larger heights individual characteristics become less distinctive, making true matches to other bullets harder. The pattern of decreasing peaks and valleys is generally true for bullet lands, as can be seen in Figure \ref{fig:sds}. 

\begin{figure}[hbtp]
\centering
```{r crosscuts-sd, echo=FALSE, fig.height=5, fig.width=8, out.width='.7\\textwidth', dependson='setup-signatures', message=FALSE}
qplot(x, sd, data=stats, geom="line", group=bullet, alpha=I(0.5)) + theme_bw() +
  geom_smooth(group=1) + 
  xlab("Height (from the bottom of the bullet)") +
  ylab("Standard deviation")
```
\caption{\label{fig:sds}The standard deviation of a signature decreases as height increases.}
\end{figure}

Figure \ref{fig:sds} shows that, on average across bullet lands, the standard deviation of a signature decreases at larger heights.
This makes the standard deviation of a signature one measure of the extent to which a signature is expressed. For identifying matches we should therefore use the lowest height to extract a bullet's signature once a stable surface region is detected. This is in accordance with current standard practice [@afte:1992].

## Complete Evaluation of the Hamby Study

One way to expand the use of the James Hamby study is to not only match all of the unknown bullet lands against the known bullet lands, but to compare every land against every other land. This roughly doubles the number of comparisons, from 10,384 pairwise comparisons of usable bullet lands to 21,115 $\left[= (118+88)\cdot 205/2\right]$, by adding another 10,731 bullet land comparisons made up of known-to-known and unknown-to-unknown comparisons. 
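
The counts can be verified with a few lines of R, taking the 118 usable known lands and 88 usable unknown lands from the expression above.

```r
n_known   <- 118   # usable known lands
n_unknown <- 88    # usable unknown lands

known_unknown <- n_known * n_unknown            # known-to-unknown comparisons
all_pairs     <- choose(n_known + n_unknown, 2) # every land against every other
additional    <- all_pairs - known_unknown      # known-known and unknown-unknown
```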

When we predict the 10,731 new comparisons using the random forest fit to the previously considered 10,384 known-unknown comparisons, we encounter 18 false negatives and 9 false positives, corresponding to an actual false negative rate of 0.19 and a false positive rate of 0.00085, which is close to the random forest's estimated OOB error rates of 0.226744 and 0.000098, respectively. 

However, if we use all of the available comparisons to fit another random forest of 300 trees, the de facto error rates for false positives and false negatives are again 0. The estimated OOB error rates are 0.00024 for the false positive rate and 0.22180 for the false negative rate. The false positive rate is therefore virtually unchanged, while we see a slight improvement in the false negative rate for an overall OOB error rate of 0.3%, i.e., doubling the number of comparisons decreases the estimated error rate by 25%. This is yet another argument in favor of a larger database for training algorithms.
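
For readers unfamiliar with out-of-bag (OOB) estimates, the following sketch shows where class-wise OOB error rates can be read off a fitted `randomForest` object. It uses simulated data with hypothetical feature names (`ccf_sim`, `matches_sim`), not the Hamby comparisons, so the resulting numbers carry no meaning beyond illustration.

```r
library(randomForest)

set.seed(42)
n <- 600
toy_cmp <- data.frame(
  ccf_sim     = runif(n),      # hypothetical similarity feature
  matches_sim = rpois(n, 3)    # hypothetical count feature, unrelated to the response
)
# Simulated response: a "match" is more likely when ccf_sim is large.
toy_cmp$match <- factor(toy_cmp$ccf_sim + rnorm(n, sd = 0.15) > 0.8)

rf_oob <- randomForest(match ~ ., data = toy_cmp, ntree = 300)

# OOB confusion matrix: rows are true classes, the last column is the
# class-wise error. The "TRUE" row is the OOB false negative rate and
# the "FALSE" row is the OOB false positive rate.
rf_oob$confusion

# Overall and class-wise OOB error rates after all 300 trees.
tail(rf_oob$err.rate, 1)
```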

```{r allhamby, echo=FALSE, message=FALSE, warning=FALSE}
knowndatadir <- "images/Hamby (2009) Barrel/bullets"
knowns <- dir(path=knowndatadir)
knowns <- knowns[grep("Br[0-9]", knowns)]
knowns <- sub("\\.x3p$", "", knowns)  # strip the file extension only

unknowndatadir <- "images/Hamby (2009) Barrel/bullets"
unknowns <- dir(path=unknowndatadir)
unknowns <- unknowns[grep("Ukn", unknowns)]
unknowns <- sub("\\.x3p$", "", unknowns)  # strip the file extension only

#bstats <- read.csv("data/data-25-25/bullet-stats.csv")
#flagged <- c("Br6 Bullet 2-1", "Br9 Bullet 2-4", "Ukn Bullet B-2", "Ukn Bullet Q-4")
ballstats <- read.csv("data/data-new-all-25-25/bullet-stats-single.csv")

includesVar <- setdiff(names(ballstats), c("b1", "b2", "data", "resID", "id.x", "id.y", "id", "pred", "forest", "bullet", "span", "crosscutdist", "flagged", "km", "forestsmall",
                                        "left_cms", "right_cms", "left_noncms", "right_noncms"))

library(randomForest)
#set.seed(20151202)
#rtrees <- randomForest(factor(match)~., data=subset(bstats, !flagged)[,includesVar], ntree=100)


# names(ballstats)[10] <- "num.nonmatches"
# names(ballstats)[11] <- "CNMS"
# names(ballstats)[16] <- "S"
# names(ballstats)[17] <- "CMS"
#ballstats$forestsmall <- predict(rtrees, newdata=ballstats, type="prob")[,2]
###########

ballstats$insmall <- with(ballstats, ((b1 %in% knowns) & (b2 %in% unknowns)) | ((b2 %in% knowns) & (b1 %in% unknowns)))


set.seed(20160512)
rtrees2 <- randomForest(factor(match)~., data=subset(ballstats, !flagged & insmall)[,includesVar], ntree=300)


ballstats$forestsmall <- predict(rtrees2, newdata=ballstats, type="prob")[,2]

# xtabs(~(forestsmall>0.5)+match+insmall, data=ballstats[!ballstats$flagged,])


set.seed(20151201)
rtrees3 <- randomForest(factor(match)~., data=subset(ballstats, !flagged)[,includesVar], ntree=300)
ballstats$forest <- predict(rtrees3, newdata=ballstats, type="prob")[,2]
#xtabs(~(forest>0.5)+match+insmall, data=ballstats[!ballstats$flagged,])
#qplot(forest, data=subset(ballstats, !flagged), geom="jitter", y=match)
```

Figures \ref{fig:aligned} and \ref{fig:aligned-second} give an overview of all the signatures from bullet lands in the Hamby study aligned by barrel. Three to five bullets were fired from each barrel. The figures give us some insight both into how well signatures match and into how consistently individual characteristics are impressed on bullets fired from each of the barrels. Signatures for some lands match remarkably well, such as land 5 from barrel 1, whereas all lands from barrel 5 show some variability in both the location and the depth of peaks and valleys.

```{r aligned, echo=FALSE, warning=FALSE}
matches <- read.csv("csvs/matches-old.csv", header = FALSE)
matches$barrel <- rep(1:10, each=6)
matches$id <- 1:nrow(matches)
mm <- melt(matches, id.var=c("barrel", "id"), na.rm=TRUE)
mm <- subset(mm, value != "")
mm$value <- gsub("Br.* Bullet ", "", mm$value)
mm$prefix <- sprintf("Br%d Bullet ", mm$barrel)
unkns <- grep("[A-Z]",mm$value)
mm$prefix[unkns] <- "Ukn Bullet "
mm$path <- "images/Hamby (2009) Barrel/bullets/"
mm$path <-   with(mm, paste0(path, prefix, value, ".x3p"))
  
ccs <- read.csv("csvs/crosscuts-25-old.csv")
ccs$path <- file.path(knowndatadir, basename(as.character(ccs$path)))
mm <- merge(mm, ccs, by="path", all=TRUE)
mm <- subset(mm, !is.na(mm$cc))

crosscuts <- plyr::ldply(
  1:nrow(mm), 
  function(i) {
    dframe <- get_crosscut(mm$path[i], mm$cc[i])
    groove <- get_grooves(dframe)
    data.frame(mm[i,], fit_loess(dframe, groove)$resid$data)
  })

crosscuts$bullet <- crosscuts$path
crossSmooth <- bulletSmooth(crosscuts)

alignme <- function(data) {
#  browser()
  data$bullet <- as.character(data$bullet)
  bullets <- unique(data$bullet)
  b1 <- bullets[1]
  lofs <- NULL
  for (i in bullets[-1]) {
    lof <- subset(data, bullet %in% c(b1, i))
    lofs <- rbind(lofs, bulletAlign(lof)$bullet)
  }
  lofs <- unique(lofs)
  lofs
}

aligned <- crossSmooth %>% group_by(id) %>% do(
  data.frame(alignme(.))
)
aligned$land <- (aligned$id %% 6)
aligned$land[aligned$land==0] <- 6
aligned$bulletland <- aligned$bullet
aligned$bullet <- gsub("-[0-9]","",aligned$bullet)
library(RColorBrewer)
set.seed(20160106)
cols <- rep(brewer.pal(n=10, name="Paired"), length=35)[sample(35, 35, replace=TRUE)]
labels <- aligned %>% group_by(bulletland) %>% summarize(
  x = max(y),
  y = min(l30, na.rm=T),
  label = gsub(" Bullet","",bulletland[1]),
  barrel = barrel[1],
  land = land[1],
  bullet = bullet[1]
)
labels <- labels %>% group_by(barrel, land) %>% mutate(id=order(bulletland))
```

\begin{figure}[hbtp]
```{r aligned-first, dependson='aligned', echo=FALSE, warning=FALSE, fig.width=10, fig.height=15, out.width='7.5in'}
qplot(x=y, y=l30, data=subset(aligned, barrel <=5), 
      geom="line", group=bullet, colour=bullet) + 
  ylim(c(-7,7)) + xlab("") + ylab("") + xlim(c(0,3000)) +
  facet_grid(facets=land~barrel, labeller="label_both") + theme_bw() +
  theme(plot.margin=unit(c(0,0,-1,-1), unit="line"), legend.position="none") + 
  geom_label(aes(y=2.5*(id-3), label=label, colour=bullet), 
             x = 3000, data=subset(labels, barrel <= 5), 
             inherit.aes = FALSE, hjust="right", size=3, 
             fill=alpha("white", .5)) +
  scale_colour_manual(values=cols)
```
\caption{\label{fig:aligned}Overview of aligned signatures for all bullet lands for barrels 1 to 5 of the Hamby study.}
\end{figure}

\begin{figure}[hbtp]
```{r aligned-second, dependson='aligned', echo=FALSE, warning=FALSE, fig.width=10, fig.height=15, out.width='7.5in'}
qplot(x=y, y=l30, data=subset(aligned, barrel > 5), 
      geom="line", group=bullet, colour=bullet) + 
  ylim(c(-7,7)) + xlab("") + ylab("") + xlim(c(0,3000)) +
  facet_grid(facets=land~barrel, labeller="label_both") + theme_bw() +
  theme(plot.margin=unit(c(0,0,-1,-1), unit="line"), legend.position="none") + 
  geom_label(aes(y=2.5*(id-3), label=label, colour=bullet), 
             x = 3000, data=subset(labels, barrel > 5), 
             inherit.aes = FALSE, hjust="right", size=3) +
  scale_colour_manual(values=cols)
```
\caption{\label{fig:aligned-second}Overview of aligned signatures for all bullet lands for barrels 6 to 10 of the Hamby study.}
\end{figure}

## Table of Feature Importance

Two random forests were calculated for the Hamby study. For the first random forest only comparisons of bullet lands from known bullets and unknown bullets were used. The second random forest is based on a full comparison of every land with every other land, increasing the number of comparisons from the original 10,384 (10,212 known non-matches and 172 known matches) by another 10,731 comparisons (10,637 known non-matches and 94 known matches). Random forests allow an assessment of variable importance (also called feature importance) as the mean decrease in Gini index when including each variable.
Table \ref{tab:importance} shows the feature importance for both of these random forests. Importance 1 refers to the random forest based on the smaller subset; Importance 2 is the feature importance derived from the random forest based on all pairwise comparisons.
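
To make the importance measure concrete, the sketch below fits a small random forest to simulated data and extracts the mean decrease in Gini index with `importance()`. The data and the variable names (`informative`, `noise`) are assumptions for illustration and are unrelated to the features in Table \ref{tab:importance}.

```r
library(randomForest)

set.seed(7)
n <- 500
toy_feat <- data.frame(
  informative = rnorm(n),   # hypothetical feature that drives the response
  noise       = rnorm(n)    # hypothetical feature unrelated to the response
)
toy_feat$match <- factor(toy_feat$informative + rnorm(n, sd = 0.5) > 1)

rf_toy <- randomForest(match ~ ., data = toy_feat, ntree = 300)

# For classification forests, importance() returns a matrix with one row
# per variable and a column named MeanDecreaseGini.
imp_toy <- data.frame(importance(rf_toy))
imp_toy$Variable <- row.names(imp_toy)
imp_toy[order(-imp_toy$MeanDecreaseGini), ]
```

A feature that is unrelated to the response still receives a positive importance, so the values are best read relative to one another rather than against zero.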

```{r setup-rtrees, echo=FALSE, message=FALSE}
bstats <- read.csv("data/data-25-25/bullet-stats-old.csv", stringsAsFactors = FALSE)

library(randomForest)
set.seed(20151202)
includesVar <- setdiff(names(bstats), c("b1", "b2", "data", "resID", "id.x", "id.y", "pred", "forest", "bullet", "span", "crosscutdist", "flagged", "km"))

rtrees <- randomForest(factor(match)~., data=subset(bstats, !flagged)[,includesVar], ntree=300)
```

\begin{table}[tbhp]
\caption{\label{tab:importance}Features derived from bullet images, ordered by importance in predicting matches. Importance is measured in terms of mean decrease in Gini index when including the variable in a decision tree. Averages (and standard deviations) for known matches (KM) and known non-matches (KNM) are shown in the last four columns.}
\centering
```{r importance, echo=FALSE, results='asis', warning=FALSE}

imp2 <- data.frame(importance(rtrees3))
imp2$Variable <- row.names(imp2)
names(imp2)[1] <- "Bigforest"

imp <- data.frame(importance(rtrees))
imp$Variable <- row.names(imp)  # use this forest's own variable names as the merge key

imp <- merge(imp, imp2, by="Variable")

imp <- imp[order(-imp$MeanDecreaseGini),]
names(imp)[2:3] <- c("Importance 1", "Importance 2")

bstats <- subset(ballstats, insmall == TRUE)
imp$meanMatch <- sapply(imp$Variable, function(var) {
  mean(subset(bstats, match)[,var])
})
imp$sdMatch <- sapply(imp$Variable, function(var) {
  sd(subset(bstats, match)[,var])
})
imp$meanNonMatch <- sapply(imp$Variable, function(var) {
  mean(subset(bstats, !match)[,var])
})
imp$sdNonMatch <- sapply(imp$Variable, function(var) {
  sd(subset(bstats, !match)[,var])
})


imp$meanMatch <- round(imp$meanMatch,1)
imp$sdMatch <- sprintf("(%5.2f)", imp$sdMatch)
imp$meanNonMatch <- round(imp$meanNonMatch,1)
imp$sdNonMatch <- sprintf("(%7.2f)", imp$sdNonMatch)
row.names(imp) <- 1:nrow(imp)
imp$Variable[c(2,3,4,6,8)] <- c("#matches", "S", "#non-matches", "CMS", "CNMS")
names(imp)[-(1:3)] <- c("KM", "(sd)", "KNM", "(sd)")

result <- subset(imp, !(Variable %in% c("x1", "x2", "lag", "sd.D")))
rownames(result) <- 1:nrow(result)

library(xtable)  # provides xtable() and its print method
print(xtable(result, align="clrrrrrr", digits=c(0,0,1,1,1,2,1,2)), 
      floating = FALSE)
```
\end{table}