-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathDissertation.tex
194 lines (152 loc) · 10.9 KB
/
Dissertation.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
%%
%% ACS project dissertation template.
%%
%% Currently designed for printing two-sided, but if you prefer to
%% print single-sided just remove ",twoside,openright" from the
%% \documentclass[] line below.
%%
%%
%% SMH, May 2010.
\documentclass[a4paper,12pt,twoside,openright]{report}
%%
%% EDIT THE BELOW TO CUSTOMIZE
%%
\def\authorname{Peter E.\ Conn\xspace}
\def\authorcollege{Trinity Hall\xspace}
\def\authoremail{[email protected]}
\def\dissertationtitle{A Safety Enhancing Source Translation for C}
\def\wordcount{xx,xxx}
\usepackage{url}
\usepackage{epstopdf}
\usepackage{epsfig,graphicx,parskip,setspace,tabularx,xspace}
\usepackage{pgfplots}
\usepackage{amssymb}
\usepackage{rotating}
\usepackage{pifont}
\usepackage{tikz}
\usepackage{amsmath}
\usepackage{minted}
\usetikzlibrary{shapes, arrows, decorations.markings}
%% START OF DOCUMENT
\begin{document}
%% FRONTMATTER (TITLE PAGE, DECLARATION, ABSTRACT, ETC)
\pagestyle{empty}
\singlespacing
\input{titlepage}
\onehalfspacing
\input{declaration}
\singlespacing
\newpage
{\Huge \bf Abstract}
\vspace{24pt}
C is one of the most widely used languages, favoured for its low-cost abstractions and the control it gives programmers over memory.
This control frequently leads to bugs, especially while using pointers, causing unpredictable errors and security vulnerabilities.
Much research has been done to providing pointer safety to C programs; however current work suffers from systematic flaws.
The analysis performed by such systems is inextricably linked to the transformation required to provide pointer safety.
%Second, they are studied in isolation and not in combination with commonly used compiler optimizations or as part of a widespread compiler tool chain.
Additionally, the majority of these approaches sacrifice binary compatibility, requiring the entire libraries a program uses to be recompiled as well or even the program to be rewritten.
This dissertation aims to provide solutions to these problems.
I use an approach where the analysis and transformation are kept separate, with a single analysis able to support two drastically different transformations (fat pointers and lookup tables), giving performance overheads comparable to current work.
I then provide methods for maintaining very high binary compatibility with programs that use already compiled libraries, and evaluate the costs of doing so.
%I investigate the effects of these two prevailing methods of ensuring pointer safety on commonly used optimizations passes.
Additionally, I create a taxonomy for classifying and evaluating current methods for providing pointer safety.
The taxonomy goes beyond the currently quantitative measure of performance overhead imposed by a pointer safety system and includes qualitative measures such as the types of safety provided, completeness and compatibility.
Finally I argue that such techniques fall in a trade-off space between complete compatibility and complete safety and place current work on that spectrum, highlighting the trade-offs between the two criteria to aid future system design.
\newpage
\vspace*{\fill}
\pagenumbering{roman}
\setcounter{page}{0}
\pagestyle{plain}
\tableofcontents
\listoffigures
\listoftables
\onehalfspacing
%% START OF MAIN TEXT
\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}
Pointer errors account for many bugs in C, ranging from simple off-by-one errors where the programmer writes one past the end of the array to buffer overflow vulnerabilities, where an attacker inputs data designed to manipulate the code into writing past the end of the array and over some other important data.
C allows many practices that make such errors easy.
For example a pointer can legally point one past the end of an array (although dereferencing it is undefined).
A pointer frequently starts pointing to a valid area of memory and is then modified (through pointer arithmetic) to point to an invalid area of memory.
This pointer is modified again, bringing it back to point to the valid area and, though used in practice, this is not allowed in the standard.
For example, Ghostscript, an open source PostScript implementation violates the standard by issuing pointers to element \verb!-1! of stacks \cite{ghostscript}.
One method of dealing with such bugs is to keep track of data about the valid area of memory that a pointer has access to and consult this data on pointer dereference.
In this dissertation, the data used to determine the validity of a pointer is the address of the start of the area of allocated memory (the base) and the address one past its end (the bound).
The base and bound data must be associated with the pointer in some way.
\begin{figure}
\centering
\tikzstyle{block} = [rectangle, draw,
text width=5em, text centered, rounded corners, minimum height=2em]
\tikzstyle{lib} = [rectangle, draw, fill=green!30,
text width=5em, text centered, rounded corners, minimum height=2em]
\tikzstyle{comp} = [rectangle, draw, fill=blue!20,
text width=5em, text centered, rounded corners, minimum height=2em]
\tikzstyle{line} = [draw, -latex']
\begin{tikzpicture}[node distance = 3cm, auto]
\node [block] (code) {Source Code};
\node [comp, right of=code] (ccured) {Pointer Analysis};
\node [comp, above of=ccured, node distance=2cm] (bandage) {Fat Pointer Transform};
\node [block, right of=bandage] (exe1) {Bounds Checked Binary};
\node [comp, below of=ccured, node distance=2cm] (softbound) {Lookup Table Transform};
\node [block, right of=softbound] (exe2) {Bounds Checked Binary};
\node [lib, below right of=exe2] (hash) {HashTable};
\node [lib, above right of=exe2] (mem) {MemTable};
\path[line] (code) -- (ccured);
\path[line] (ccured) -- (bandage);
\path[line] (ccured) -- (softbound);
\path[line] (code) -- (bandage);
\path[line] (code) -- (softbound);
\path[line] (bandage) -- (exe1);
\path[line] (softbound) -- (exe2);
\path[line] (mem) -- (exe2);
\path[line] (hash) -- (exe2);
\end{tikzpicture}
\caption{Overview of Components of Bandage}
\label{fig:Components}
\end{figure}
I implemented the \textit{Bandage system}, consisting of the parts shown in Figure \ref{fig:Components}.
Bandage contains one analysis pass, two transformation passes and two C libraries.
The pointer analysis pass identifies the many pointers not modified using pointer arithmetic during the life of the program.
The two transformation passes implement different methods of associating data about pointers with the pointers themselves.
With fat pointers, pointers are replaced by structures that contain the pointer value and their bounds information.
With a lookup table approach, the address of the pointer is used as the key in a lookup table containing the bounds information.
Finally, the two C libraries contain different implementations of the lookup table.
By differentiating the pointer analysis pass from the transformation pass, it can be combined with different transformations to achieve different trade-offs in terms of safety, runtime and binary compatibility, allowing a much larger range of potential solutions to a specific need.
Additionally, it allows support for further backends that were not considered during its design, such as MPX \cite{mpx} or CHERI \cite{cheri}.
The Bandage system is designed to provide high binary compatibility by drawing a boundary around the source files it can transform and ensuring that the instrumentation is stripped when it passes through this boundary.
This allows programs using Bandage to also use libraries that haven't been modified, but also restricts the safety completeness it can offer, as pointers that come in through this boundary do not contain sufficient information for full safety.
This is in contrast to work such as CCured \cite{necula2002ccured} that requires a full recompilation of all included libraries.
Finally, Bandage will be used in the classification of pointer-safety systems, providing a reference point to compare other systems with and a in depth study to highlight some subtleties in the taxonomy.
\section{Outline}
This dissertation proceeds as follows, with a brief coverage of the background information required: the types of pointer safety, examples of vulnerabilities caused by lack of it and an overview of how pointers are dealt with in LLVM IR.
A related work section follows, enumerating previous efforts to provide pointer safety, drawing together common themes between them and highlighting the position of this work.
The design and implementation section contains details of the two transformation passes (the fat pointer and the metadata) and the one analysis pass.
The evaluation section covers the framework for evaluating future pointer-safety systems, the runtime penalties, tested on handcrafted microbenchmarks and the Olden benchmark suite and the compilation overhead incurred by the analysis and transformations.
\chapter{Background}
\input{Background/Background}
\chapter{Related Work}
\input{RelatedWork/RelatedWork}
\chapter{Design and Implementation}
\input{Implementation/Implementation}
\chapter{Evaluation}
\input{Evaluation/Evaluation}
\chapter{Summary and Conclusions}
I developed the novel idea of separation of analysis and implementation and implemented a system demonstrating it, paving the way for a reusable ecosystem of components to enforce pointer safety and to identify where it needs to be enforced.
I implemented the two prevalent methods of carrying spatial pointer information and tracking pointer safety that incurred overheads comparable to that of leading research, and a pointer analysis designed to identify pointers that don't need bounds checking.
Due to their modular nature, the transformations and analysis developed can be used in future work.
By placing the existing pointer safety systems in a common evaluation framework they can be compared and contrasted, allowing the users to make a choice of which system to use or allowing researchers to identify gaps in the design space for future work.
This evaluation also identified a core trade-off in this area, that of compatibility vs completeness, with Cyclone taking its place at the completeness end of the spectrum and Heapmon taking its place at the compatibility end.
\section{Future Work}
A potential direction for future work would be to extend the Bandage system to work with C++ or Objective-C, as both can be compiled to LLVM IR but created more complicated code structures.
Further analysis passes could be implemented, for example a Cyclone inspired \textit{Not Null} analysis that detects if the pointer can be proven to be not null at that point in the code and prevents the transformations from inserting their own null checks.
Alternatively, a transformation pass using a poison based approach that tracks valid memory areas but doesn't associate them with pointers could be developed and could use the pointer analysis.
\appendix
\singlespacing
\urlstyle{footnote}
\bibliographystyle{plain}
\begingroup
\raggedright
\bibliography{Bibliography}
\endgroup
\end{document}