-
Notifications
You must be signed in to change notification settings - Fork 1
/
talk.txt
83 lines (42 loc) · 8.09 KB
/
talk.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
Writing Perl extensions in Rust
Perl interpreter is written in C, and, naturally, it is the language of choice for writing Perl extensions. C is a very powerful language, but it is a sharp tool and must be used with care. Where Perl has run-time checks to catch coding errors, C instead relies on the programmer to follow the rules.
For example, when a value is dereferenced in Perl, the interpreter will check and generate a run-time error if a value is not defined. In C, the compiler just assumes the value will be valid when the program runs.
Lots of functions in XS API may return NULL. Reading from a NULL pointer is not okay, may or may not crash your program, so programmers must be diligent and check all returned values before using them.
There is no automatic reference counting in C and programmers are required to track the lifetime of every piece of memory. Reading from a pointer to memory that was released back to OS may or may not crash the program, and there is no way to prevent this but to be extra careful during development.
Programmers also need to keep track of the size of every piece of memory. Reading or writing past the end of an array is just as bad as accessing freed memory.
And there are much more. Just the list of these rules that C compiler assumes to be followed takes fifteen pages in the language standard.
Perl XS API is not making life any easier.
What does this function return? Contrary to what its name suggests, it returns the index of the last element.
C does not have facilities to support automatic reference counting, and naturally, this becomes a responsibility of the programmer. To be fast, Perl does not always count every pointer as a reference, and similar looking pointers require different treatment depending on the situation.
C does not have exceptions, while Perl does. Open files, allocated memory and any other resources held by stack local variables have to be protected from Perl exceptions to avoid leaks.
I'm not saying all this to convince you that C, XS or Perl are bad or wrong in any way. My point is that there is a lot to worry about when writing C and XS. Mistakes in C code don't just make programs crash once in a while, they cause serious security vulnerabilities as well.
Rust is a new language that is built around the idea that being careful is not enough. It is a compiled, statically typed language with a focus on memory safe programming.
It is also binary compatible with C: Rust can call functions written in C and vice versa.
In particular, being binary compatible with C means that Rust can do the same memory operations that C can - do pointer arithmetic, cast pointers between different types, read and write arbitrary memory locations. What is different, is that these and few other operations are considered unsafe in Rust and are only allowed inside special code blocks.
Handling pointers is very common in C programs, so one might think these unsafe code blocks would be everywhere. Many of Rust features are aimed at making unsafe code either not necessary, or at least well contained and isolated behind a safe interface.
This idea is not a new at all. Perl itself can be viewed as a safe abstraction around unsafe C code, but with a different set of tradeoffs. Perl XS API has this too. Internally, scalar variables are quite complex with several different structures used at the same time, but this is something programmer doesn't have to know or worry about as long as they are using official API.
This is what 'perl-xs' Rust package tries to do. It tries to abstract away low level details of using the Perl API, and provide a safe high-level interface while still using a compiled language and keeping some of performance benefits of C XS. And by safe I mean that it should be impossible to crash the interpreter or corrupt its internal state by making a mistake in the Rust program.
Let's write some code.
First, we start with the bare minimum skeleton of a subroutine. Words that end with an exclamation mark are rust macro invocations. XS macro sets up module structure in a way that is compatible with XS using a pseudo-perl syntax.
Each sub gets a single context argument that represents the Perl interpreter and provides access to the argument stack. xs_return can be used to return one or more arguments.
Then we pull an argument from the Perl stack. st_fetch will panic if there are not enough arguments, but we could also check explicitly and die with a more user-friendly message.
Where possible, Rust API uses the same names as C, so scalar values are still SVs, arrays are AVs, and so on.
What's different, is that there are no pointers here. Reference counted Perl objects are represented as smart handles. To meet the safety goals I mentioned earlier each stack local variable is reference counted.
The argument is an array reference, so it needs to be dereferenced first. But what happens if param is not a reference? In C deref_av could return a NULL pointer, but Rust has a special type, Option, to handle this situation. Option wraps around the actual value, so it is impossible to use the wrapped value without checking first if it is there or not.
This program does not compile because I forgot to check. Option type comes with many handy methods. Expect is the Rust equivalent of the common 'or die' Perl idiom.
Then we need a variable to hold the results as we go over the array. Rust can infer types of local variables most of the time, so type annotations are mostly optional.
Inside the loop, we call fetch on the array to get each element. In C array fetch function may return NULL if array element does not exist at a given index, so Rust API uses option type again. This time, the program just checks the result explicitly to skip over empty slots in the array.
Array fetch can automatically convert the element to a basic type. Since it happens immediately after the fetch, it can be done without manipulating element's reference count. In this case, everything aligns nicely, and the compiler can figure out that program wants a floating point number. If there were any ambiguity, the program wouldn't compile.
This isn't bad, but its a lot of code for a simple thing. Lets rewind a little bit and do this again.
Rust standard library contains iterator type, and it comes with about 50 different useful methods.
Like map. Map gets a code block, that replaces undefined values with a zero.
Another standard iterator method is sum. And that's it. A few short lines instead of a loop.
This talk wouldn't be complete without some benchmarks.
I tested the two versions of the program we just discussed. The Rust version is about three times faster than pure Perl implementation, and XS is about five times faster than that.
Rust itself is comparable to C in terms of performance, and most of the difference between Rust and XS versions come from how XS API is handled in Rust.
First reason is reference counting. To guarantee that user code stays free of dangling pointers, I need to tell the interpreter about every pointer kept on the stack.
Second, while XS uses macros to inline do many common operations. Doing the same thing in Rust is not impossible, but would require a considerable effort at this moment.
And the third reason is handling exceptions, in both Perl and Rust code. Why it is a problem?
Because Rust has destructors, the compiler automatically inserts code to do cleanups when variables leave scope. For example if user code allocated some memory, a destructor be called to free it. This is used extensively for all sorts of things, including automatic reference counting I just mentioned.
Now imagine that code called from a Rust module croaks. When this happens, interpreter will jump to the closest eval block, skipping everything in between. To prevent that from happening, a guard layers are inserted on both sides of Rust code. The guard block on the right, catches Perl exceptions and converts them into Rust panics. Panics bubble through the stack doing necesary cleanups and are converted back into Perl exceptions when they reach the left guard.
To summarise, it is indeed possible to write a Perl extension in Rust. It is possible to do it in a safe and relatively efficient manner.