diff --git a/.gitignore b/.gitignore
index 65576b88..98f680fc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,19 +1,19 @@
-/.quarto/
-
-/zig-out/
-/zig-cache/
-/**/zig-out/
-/**/zig-cache/
-
-/**/.zig-cache/
-
-./lldb/
-lldb
-
-foo.txt
-
-*.o
-*.a
-*.aux
-*.log
-*.toc
+/.quarto/
+
+/zig-out/
+/zig-cache/
+/**/zig-out/
+/**/zig-cache/
+
+/**/.zig-cache/
+
+./lldb/
+lldb
+
+foo.txt
+
+*.o
+*.a
+*.aux
+*.log
+*.toc
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00000000..0c3098a8
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,19 @@
+# Contributing
+
+There are some different ways for you to contribute to this project:
+
+- If you don't understand something, or, if a particular section of the book seems
+  confusing to you, you can [create an issue and give a feedback, or, a suggestion of improvement](https://github.com/pedropark99/zig-book/issues).
+
+- If you spot a typo anywhere in the book, feel free to edit the underlying QMD file (QMD files are inside the [`Chapters` folder](https://github.com/pedropark99/zig-book/tree/main/Chapters)) and send a pull request.
+  If you've never done this before, the process is very easy:
+
+  - Click the edit this page on the sidebar.
+
+  - Make the changes using GitHub's in-page editor and save.
+
+  - Submit a pull request and include a brief description of your changes.
+    "Fixing typos" is perfectly adequate.
+
+If you send a Pull Request with significant changes, include the phrase "I assign the copyright of this contribution
+to Pedro Duarte Faria" in the description of your Pull Request - I need this so I can publish the book.
diff --git a/Chapters/03-structs.qmd b/Chapters/03-structs.qmd
index 506b578d..f37416ba 100644
--- a/Chapters/03-structs.qmd
+++ b/Chapters/03-structs.qmd
@@ -1,1055 +1,1055 @@
----
-engine: knitr
-knitr: true
-syntax-definition: "../Assets/zig.xml"
----
-
-```{r}
-#| include: false
-source("../zig_engine.R")
-knitr::opts_chunk$set(
-    auto_main = FALSE,
-    build_type = "lib"
-)
-```
-
-
-
-# Structs, Modules and Control Flow
-
-I introduced a lot of the Zig's syntax to you in the last chapter,
-specially at @sec-root-file and @sec-main-file.
-But we still need to discuss about some other very important
-elements of the language that you will use constantly on your day-to-day
-routine.
-
-We begin this chapter by talking about the different keywords and structures
-in Zig related to control flow (e.g. loops and if statements).
-Then, we talk about structs and how they can be used to do some
-basic Object-Oriented (OOP) patterns in Zig. We also talk about
-type inference, which help us to write less code and achieve the same results.
-Finally, we end this chapter by discussing modules, and how they relate
-to structs.
-
-
-
-## Control flow {#sec-zig-control-flow}
-
-Sometimes, you need to make decisions in your program. Maybe you need to decide
-wether to execute or not a specific piece of code. Or maybe,
-you need to apply the same operation over a sequence of values. These kinds of tasks,
-involve using structures that are capable of changing the "control flow" of our program.
-
-In computer science, the term "control flow" usually refers to the order in which expressions (or commands)
-are evaluated in a given language or program. But this term is also used to refer
-to structures that are capable of changing this "evaluation order" of the commands
-executed by a given language/program.
-
-These structures are better known
-by a set of terms, such as: loops, if/else statements, switch statements, among others. So,
-loops and if/else statements are examples of structures that can change the "control
-flow" of our program. The keywords `continue` and `break` are also examples of symbols
-that can change the order of evaluation, since they can move our program to the next iteration
-of a loop, or make the loop stop completely.
-
-
-### If/else statements
-
-An if/else statement performs an "conditional flow operation".
-A conditional flow control (or choice control) allows you to execute
-or ignore a certain block of commands based on a logical condition.
-Many programmers and computer science professionals also use
-the term "branching" in this case.
-In essence, we use if/else statements to use the result of a logical test
-to decide whether or not to execute a given block of commands.
-
-In Zig, we write if/else statements by using the keywords `if` and `else`.
-We start with the `if` keyword followed by a logical test inside a pair
-of parentheses, and then, a pair of curly braces with contains the lines
-of code to be executed in case the logical test returns the value `true`.
-
-After that, you can optionally add an `else` statement. Just add the `else`
-keyword followed by a pair of curly braces, with the lines of code
-to executed in case the logical test defined in the `if`
-returns `false`.
-
-In the example below, we are testing if the object `x` contains a number
-that is greater than 10. Judging by the output printed to the console,
-we know that this logical test returned `false`. Because the output
-in the console is compatible with the line of code present in the
-`else` branch of the if/else statement.
-
-
-```{zig}
-#| auto_main: true
-#| build_type: "run"
-const x = 5;
-if (x > 10) {
-    try stdout.print(
-        "x > 10!\n", .{}
-    );
-} else {
-    try stdout.print(
-        "x <= 10!\n", .{}
-    );
-}
-```
-
-
-
-### Swith statements {#sec-switch}
-
-Switch statements are also available in Zig.
-A switch statement in Zig have a similar syntax to a switch statement in Rust.
-As you would expect, to write a switch statement in Zig we use the `switch` keyword.
-We provide the value that we want to "switch over" inside a
-pair of parentheses. Then, we list the possible combinations (or "branchs")
-inside a pair of curly braces.
-
-Let's take a look at the code example below. You can see in this example that,
-I'm creating an enum type called `Role`. We talk more about enums at @sec-enum.
-But in essence, this `Role` type is listing different types of roles in a fictitious
-company, like `SE` for Software Engineer, `DE` for Data Engineer, `PM` for Product Manager,
-etc.
-
-Notice that we are using the value from the `role` object in the
-switch statement, to discover which exact area we need to store in the `area` variable object.
-Also notice that we are using type inference inside the switch statement, with the dot character,
-as we described at @sec-type-inference.
-This makes the `zig` compiler infer the correct data type of the values (`PM`, `SE`, etc.) for us.
-
-Also notice that, we are grouping multiple values in the same branch of the switch statement.
-We just separate each possible value with a comma. So, for example, if `role` contains either `DE` or `DA`,
-the `area` variable would contain the value `"Data & Analytics"`, instead of `"Platform"`.
-
-```{zig}
-#| build_type: "run"
-#| auto_main: false
-const std = @import("std");
-const stdout = std.io.getStdOut().writer();
-const Role = enum {
-    SE, DPE, DE, DA, PM, PO, KS
-};
-
-pub fn main() !void {
-    var area: []const u8 = undefined;
-    const role = Role.SE;
-    switch (role) {
-        .PM, .SE, .DPE, .PO => {
-            area = "Platform";
-        },
-        .DE, .DA => {
-            area = "Data & Analytics";
-        },
-        .KS => {
-            area = "Sales";
-        },
-    }
-    try stdout.print("{s}\n", .{area});
-}
-```
-
-Now, one very important aspect about this switch statement presented
-in the code example above, is that it exhaust all existing possibilities.
-In other words, all possible values that could be found inside the `order`
-object are explicitly handled in this switch statement.
-
-Since the `role` object have type `Role`, the only possible values to
-be found inside this object are `PM`, `SE`, `DPE`, `PO`, `DE`, `DA` and `KS`.
-There is no other possible value to be stored in this `role` object.
-This what "exhaust all existing possibilities" means. The switch statement covers
-every possible case.
-
-In Zig, switch statements must exhaust all existing possibilities. You cannot write
-a switch statement, and leave an edge case with no expliciting action to be taken.
-This is a similar behaviour to switch statements in Rust, which also have to
-handle all possible cases.
-
-Take a look at the `dump_hex_fallible()` function below as an example. This function
-also comes from the Zig Standard Library, but this time, it comes from the [`debug.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/debug.zig)[^debug-mod].
-There are multiple lines in this function, but I omitted them to focus solely on the
-switch statement found in this function. Notice that this switch statement have four
-possible cases, or four explicit branches. Also, notice that we used an `else` branch
-in this case. Whenever you have multiple possible cases in your switch statement
-which you want to apply the same exact action, you can use an `else` branch to do that.
-
-[^debug-mod]:
-
-```{zig}
-#| eval: false
-pub fn dump_hex_fallible(bytes: []const u8) !void {
-    // Many lines ...
-    switch (byte) {
-        '\n' => try writer.writeAll("␊"),
-        '\r' => try writer.writeAll("␍"),
-        '\t' => try writer.writeAll("␉"),
-        else => try writer.writeByte('.'),
-    }
-}
-```
-
-Many users would also use an `else` branch to handle a "not supported" case.
-That is, a case that cannot be properly handled by your code, or, just a case that
-should not be "fixed". So many programmers use an `else` branch to panic (or raise an error) to stop
-the current execution.
-
-Take the code example below as an example. We can see that, we are handling the cases
-for the `level` object being either 1, 2, or 3. All other possible cases are not supported by default,
-and, as consequence, we raise an runtime error in these cases, through the `@panic()` built-in function.
-
-Also notice that, we are assigning the result of the switch statement to a new object called `category`.
-This is another thing that you can do with switch statements in Zig. If the branchs in this switch
-statement output some value as result, you can store the result value of the switch statement into
-a new object.
-
-```{zig}
-#| eval: false
-const level: u8 = 4;
-const category = switch (level) {
-    1, 2 => "beginner",
-    3 => "professional",
-    else => {
-        @panic("Not supported level!");
-    },
-};
-try stdout.print("{s}\n", .{category});
-```
-
-```
-thread 13103 panic: Not supported level!
-t.zig:9:13: 0x1033c58 in main (switch2)
-            @panic("Not supported level!");
-            ^
-```
-
-Furthermore, you can also use ranges of values in switch statements.
-That is, you can create a branch in your switch statement that is used
-whenever the input value is contained in a range. These range
-expressions are created with the operator `...`. Is important
-to emphasize that the ranges created by this operator are
-inclusive on both ends.
-
-For example, I could easily change the code example above to support all
-levels between 0 and 100. Like this:
-
-```{zig}
-#| build_type: "run"
-#| auto_main: true
-const level: u8 = 4;
-const category = switch (level) {
-    0...25 => "beginner",
-    26...75 => "intermediary",
-    76...100 => "professional",
-    else => {
-        @panic("Not supported level!");
-    },
-};
-try stdout.print("{s}\n", .{category});
-```
-
-This is neat, and it works with character ranges too. That is, I could
-simply write `'a'...'z'`, to match any character value that is a
-lowercase letter, and it would work fine.
-
-
-
-### The `defer` keyword {#sec-defer}
-
-With the `defer` keyword you can execute expressions at the end of the current scope.
-Take the `foo()` function below as an example. When we execute this function, the expression
-that prints the message "Exiting function ..." get's executed only at
-the end of the function scope.
-
-```{zig}
-#| auto_main: false
-#| build_type: "run"
-#| echo: true
-#| results: "hide"
-const std = @import("std");
-const stdout = std.io.getStdOut().writer();
-fn foo() !void {
-    defer std.debug.print(
-        "Exiting function ...\n", .{}
-    );
-    try stdout.print("Adding some numbers ...\n", .{});
-    const x = 2 + 2; _ = x;
-    try stdout.print("Multiplying ...\n", .{});
-    const y = 2 * 8; _ = y;
-}
-
-pub fn main() !void {
-    try foo();
-}
-```
-
-```
-Adding some numbers ...
-Multiplying ...
-Exiting function ...
-```
-
-It doesn't matter how the function exits (i.e. because
-of an error, or, because of an return statement, or whatever),
-just remember, this expression get's executed when the function exits.
-
-
-
-
-### For loops
-
-A loop allows you to execute the same lines of code multiple times,
-thus, creating a "repetition space" in the execution flow of your program.
-Loops are particularly useful when we want to replicate the same function
-(or the same set of commands) over several different inputs.
-
-There are different types of loops available in Zig. But the most
-essential of them all is probably the *for loop*. A for loop is
-used to apply the same piece of code over the elements of a slice or an array.
-
-For loops in Zig have a slightly different syntax that you are
-probably used to see in other languages. You start with the `for` keyword, then, you
-list the items that you want to iterate
-over inside a pair of parentheses. Then, inside of a pair of pipes (`|`)
-you should declare an identifier that will serve as your iterator, or,
-the "repetition index of the loop".
-
-```{zig}
-#| eval: false
-for (items) |value| {
-    // code to execute
-}
-```
-
-Instead of using a `(value in items)` syntax,
-in Zig, for loops use the syntax `(items) |value|`. In the example
-below, you can see that we are looping through the items
-of the array stored at the object `name`, and printing to the
-console the decimal representation of each character in this array.
-
-If we wanted, we could also iterate through a slice (or a portion) of
-the array, instead of iterating through the entire array stored in the `name` object.
-Just use a range selector to select the section you want. For example,
-I could provide the expression `name[0..3]` to the for loop, to iterate
-just through the first 3 elements in the array.
-
-```{zig}
-#| auto_main: true
-#| build_type: "run"
-const name = [_]u8{'P','e','d','r','o'};
-for (name) |char| {
-    try stdout.print("{d} | ", .{char});
-}
-```
-
-In the above example we are using the value itself of each
-element in the array as our iterator. But there are many situations where
-we need to use an index instead of the actual values of the items.
-
-You can do that by providing a second set of items to iterate over.
-More precisely, you provide the range selector `0..` to the for loop. So,
-yes, you can use two different iterators at the same time in a for
-loop in Zig.
-
-But remember from @sec-assignments that, every object
-you create in Zig must be used in some way. So if you declare two iterators
-in your for loop, you must use both iterators inside the for loop body.
-But if you want to use just the index iterator, and not use the "value iterator",
-then, you can discard the value iterator by maching the
-value items to the underscore character, like in the example below:
-
-
-```{zig}
-#| eval: false
-for (name, 0..) |_, i| {
-    try stdout.print("{d} | ", .{i});
-}
-```
-
-```
-0 | 1 | 2 | 3 | 4 |
-```
-
-
-### While loops
-
-A while loop is created from the `while` keyword. A `for` loop
-iterates through the items of an array, but a `while` loop
-will loop continuously, and infinitely, until a logical test
-(specified by you) becomes false.
-
-You start with the `while` keyword, then, you define a logical
-expression inside a pair of parentheses, and the body of the
-loop is provided inside a pair of curly braces, like in the example below:
-
-```{zig}
-#| auto_main: true
-#| build_type: "run"
-var i: u8 = 1;
-while (i < 5) {
-    try stdout.print("{d} | ", .{i});
-    i += 1;
-}
-```
-
-
-
-### Using `break` and `continue`
-
-In Zig, you can explicitly stop the execution of a loop, or, jump to the next iteration of the loop, using
-the keywords `break` and `continue`, respectively. The `while` loop present in the example below, is
-at first sight, an infinite loop. Because the logical value inside the parenthese will always be equal to `true`.
-What makes this `while` loop stop when the `i` object reaches the count
-10? Is the `break` keyword!
-
-Inside the while loop, we have an if statement that is constantly checking if the `i` variable
-is equal to 10. Since we are increasing the value of this `i` variable at each iteration of the
-while loop. At some point, this `i` variable will be equal to 10, and when it does, the if statement
-will execute the `break` expression, and, as a result, the execution of the while loop is stopped.
-
-Notice the `expect()` function from the Zig standard library after the while loop.
-This `expect()` function is an "assert" type of function.
-This function checks if the logical test provided is equal to true. If this logical test is false,
-the function raises an assertion error. But it is equal to true, then, the function will do nothing.
-
-```{zig}
-#| auto_main: true
-#| build_type: "run"
-var i: usize = 0;
-while (true) {
-    if (i == 10) {
-        break;
-    }
-    i += 1;
-}
-try std.testing.expect(i == 10);
-try stdout.print("Everything worked!", .{});
-```
-
-Since this code example was executed succesfully by the `zig` compiler,
-without raising any errors, then, we known that, after the execution of while loop,
-the `i` variable is equal to 10. Because if it wasn't equal to 10, then, an error would
-be raised by `expect()`.
-
-Now, in the next example, we have an use case for
-the `continue` keyword. The if statement is constantly
-checking if the current index is a multiple of 2. If
-it is, then we jump to the next iteration of the loop
-directly. But it the current index is not a multiple of 2,
-then, the loop will simply print this index to the console.
-
-```{zig}
-#| auto_main: true
-#| build_type: "run"
-const ns = [_]u8{1,2,3,4,5,6};
-for (ns) |i| {
-    if ((i % 2) == 0) {
-        continue;
-    }
-    try stdout.print("{d} | ", .{i});
-}
-```
-
-
-
-## Function parameters are immutable {#sec-fun-pars}
-
-We have already discussed a lot of the syntax behind function declarations at @sec-root-file and @sec-main-file.
-But I want to emphasize a curious fact about function parameters (a.k.a. function arguments) in Zig.
-In summary, function parameters are immutable in Zig.
-
-Take the code example below, where we declare a simple function that just tries to add
-some amount to the input integer, and returns the result back. But if you look closely
-at the body of this `add2()` function, you will notice that we try
-to save the result back into the `x` function argument.
-
-In other words, this function not only use the value that it received through the function argument
-`x`, but it also tries to change the value of this function argument, by assigning the addition result
-into `x`. However, function arguments in Zig are immutable. You cannot change their values, or, you
-cannot assign values to them inside the body's function.
-
-This is the reason why, the code example below do not compile successfully. If you try to compile
-this code example, you get a compile error warning you that you are trying to change the value of a
-immutable (i.e. constant) object.
-
-```{zig}
-#| eval: false
-const std = @import("std");
-fn add2(x: u32) u32 {
-    x = x + 2;
-    return x;
-}
-
-pub fn main() !void {
-    const y = add2(4);
-    std.debug.print("{d}\n", .{y});
-}
-```
-
-```
-t.zig:3:5: error: cannot assign to constant
-    x = x + 2;
-    ^
-```
-
-
-If a function argument receives as input a object whose data type is
-any of the primitive types that we have listed at @sec-primitive-data-types
-this object is always passed by value to the function. In other words, this object
-is copied to the function stack frame.
-
-However, if the input object have a more complex data type, for example, it might
-be a struct instance, or an array, or a union, etc., in cases like that, the `zig` compiler
-will take the liberty of deciding for you which strategy is best. The `zig` compiler will
-pass your object to the function either by value, or by reference. The compiler will always
-choose the strategy that is faster for you.
-This optimization that you get for free is possible only because function arguments are
-immutable in Zig.
-
-To overcome this barrier, we need to take the lead, and explicitly choose to pass the
-object by reference. That is, instead of depending on the `zig` compiler to decide for us, we need
-to explicitly mark the function argument as a pointer. This way, we are telling the compiler
-that this function argument will be passed by reference to the function.
-
-By making it a pointer, we can finally use and alter directly the value of this function argument inside
-the body of the `add2()` function. You can see that the code example below compiles successfully.
-
-```{zig}
-#| build_type: "run"
-#| auto_main: false
-const std = @import("std");
-fn add2(x: *u32) void {
-    const d: u32 = 2;
-    x.* = x.* + d;
-}
-
-pub fn main() !void {
-    var x: u32 = 4;
-    add2(&x);
-    std.debug.print("Result: {d}\n", .{x});
-}
-```
-
-```
-Result: 6
-```
-
-
-
-## Structs and OOP {#sec-structs-and-oop}
-
-Zig is a language more closely related to C (which is a procedural language),
-than it is to C++ or Java (which are object-oriented languages). Because of that, you do not
-have advanced OOP (Object-Oriented Programming) patterns available in Zig, such as classes, interfaces or
-class inheritance. Nonetheless, OOP in Zig is still possible by using struct definitions.
-
-With struct definitions, you can create (or define) a new data type in Zig. These struct definitions work the same way as they work in C.
-You give a name to this new struct (or, to this new data type you are creating), then, you list the data members of this new struct. You can
-also register functions inside this struct, and they become the methods of this particular struct (or data type), so that, every object
-that you create with this new type, will always have these methods available and associated with them.
-
-In C++, when we create a new class, we normally have a constructor method (or, a constructor function) to construct or to instantiate every object
-of this particular class, and you also have a destructor method (or a destructor function) that
-is the function responsible for destroying every object of this class.
-
-In Zig, we normally declare the constructor and the destructor methods
-of our structs, by declaring an `init()` and a `deinit()` methods inside the struct.
-This is just a naming convention that you will find across the entire Zig standard library.
-So, in Zig, the `init()` method of a struct is normally the constructor method of the class represented by this struct.
-While the `deinit()` method is the method used for destroying an existing instance of that struct.
-
-The `init()` and `deinit()` methods are both used extensively in Zig code, and you will see both of
-them being used when we talk about allocators at @sec-allocators.
-But, as another example, let's build a simple `User` struct to represent an user of some sort of system.
-If you look at the `User` struct below, you can see the `struct` keyword, and inside of a
-pair of curly braces, we write the struct's body.
-
-Notice the data members of this struct, `id`, `name` and `email`. Every data member have it's
-type explicitly annotated, with the colon character (`:`) syntax that we described earlier at @sec-root-file.
-But also notice that every line in the struct body that describes a data member, ends with a comma character (`,`).
-So every time you declare a data member in your Zig code, always end the line with a comma character, instead
-of ending it with the traditional semicolon character (`;`).
-
-Next, also notice in this example, that we registrated an `init()` function as a method
-of this `User` struct. This `init()` method is the constructor method that you use to instantiate
-every new `User` object. That is why this `init()` function return an `User` object as result.
-
-
-```{zig}
-#| auto_main: false
-#| build_type: "run"
-const std = @import("std");
-const stdout = std.io.getStdOut().writer();
-const User = struct {
-    id: u64,
-    name: []const u8,
-    email: []const u8,
-
-    pub fn init(id: u64,
-                name: []const u8,
-                email: []const u8) User {
-
-        return User {
-            .id = id,
-            .name = name,
-            .email = email
-        };
-    }
-
-    pub fn print_name(self: User) !void {
-        try stdout.print("{s}\n", .{self.name});
-    }
-};
-
-pub fn main() !void {
-    const u = User.init(1, "pedro", "email@gmail.com");
-    try u.print_name();
-}
-```
-
-The `pub` keyword plays an important role in struct declarations, and OOP in Zig.
-Every method that you declare in your struct that is marked with the keyword `pub`,
-becomes a public method of this particular struct.
-
-So every method that you create in your struct, is, at first, a private method
-of that struct. Meaning that, this method can only be called from within this
-struct. But, if you mark this method as public, with the keyword `pub`, then,
-you can call the method directly from the `User` object you have
-in your code.
-
-In other words, the functions marked by the keyword `pub`
-are members of the public API of that struct.
-For example, if I did not marked the `print_name()` method as public,
-then, I could not execute the line `u.print_name()`. Because I would
-not be authorized to call this method directly in my code.
-
-
-
-### Anonymous struct literals {#sec-anonymous-struct-literals}
-
-You can declare a struct object as a literal value. When we do that, we normally specify the
-data type of this struct literal by writing it's data type just before the opening curly braces.
-For example, I could write a struct literal of type `User` that we defined in the previous section like
-this:
-
-```{zig}
-#| eval: false
-const eu = User {
-    .id = 1,
-    .name = "Pedro",
-    .email = "someemail@gmail.com"
-};
-_ = eu;
-```
-
-However, in Zig, we can also write an anonymous struct literal. That is, you can write a
-struct literal, but not especify explicitly the type of this particular struct.
-An anonymous struct is written by using the syntax `.{}`. So, we essentially
-replaced the explicit type of the struct literal with a dot character (`.`).
-
-As we described at @sec-type-inference, when you put a dot before a struct literal,
-the type of this struct literal is automatically inferred by the `zig` compiler.
-In essence, the `zig` compiler will look for some hint of what is the type of that struct.
-It can be the type annotation of an function argument,
-or the return type annotation of the function that you are using, or the type annotation
-of a variable.
-If the compiler do find such type annotation, then, it will use this
-type in your literal struct.
-
-Anonymous structs are very commom to use in function arguments in Zig.
-One example that you have seen already constantly, is the `print()`
-function from the `stdout` object.
-This function takes two arguments.
-The first argument, is a template string, which should
-contain string format specifiers in it, which tells how the values provided
-in the second argument should be printed into the message.
-
-While the second argument is a struct literal that lists the values
-to be printed into the template message specified in the first argument.
-You normally want to use an anonymous struct literal here, so that, the
-`zig` compiler do the job of specifying the type of this particular
-anonymous struct for you.
-
-```{zig}
-#| build_type: "run"
-#| auto_main: false
-const std = @import("std");
-pub fn main() !void {
-    const stdout = std.io.getStdOut().writer();
-    try stdout.print("Hello, {s}!\n", .{"world"});
-}
-```
-
-
-
-### Struct declarations must be constant
-
-Types in Zig must be `const` or `comptime` (we are going to talk more about comptime at @sec-comptime).
-What this means is that you cannot create a new data type, and mark it as variable with the `var` keyword.
-So struct declarations are always constant. You cannot declare a new struct using the `var` keyword.
-It must be `const`.
-
-In the `Vec3` example below, this declaration is allowed because I'm using the `const` keyword
-to declare this new data type.
-
-```{zig}
-#| build_type: "lib"
-#| auto_main: false
-const Vec3 = struct {
-    x: f64,
-    y: f64,
-    z: f64,
-};
-```
-
-
-### The `self` method argument {#sec-self-arg}
-
-In every language that have OOP, when we declare a method of some class or struct, we
-usually declare this method as a function that have a `self` argument.
-This `self` argument is the reference to the object itself from which the method
-is being called from.
-
-Is not mandatory to use this `self` argument. But why would you not use this `self` argument?
-There is no reason to not use it. Because the only way to get access to the data stored in the
-data members of your struct is to access them through this `self` argument.
-If you don't need to use the data in the data members of your struct inside your method, then, you very likely don't need
-a method, you can just simply declare this logic as a simple function, outside of your
-struct declaration.
-
-
-Take the `Vec3` struct below. Inside this `Vec3` struct we declared a method named `distance()`.
-This method calculates the distance between two `Vec3` objects, by following the distance
-formula in euclidean space. Notice that this `distance()` method takes two `Vec3` objects
-as input, `self` and `other`.
-
-
-```{zig}
-#| build_type: "lib"
-#| auto_main: false
-const std = @import("std");
-const m = std.math;
-const Vec3 = struct {
-    x: f64,
-    y: f64,
-    z: f64,
-
-    pub fn distance(self: Vec3, other: Vec3) f64 {
-        const xd = m.pow(f64, self.x - other.x, 2.0);
-        const yd = m.pow(f64, self.y - other.y, 2.0);
-        const zd = m.pow(f64, self.z - other.z, 2.0);
-        return m.sqrt(xd + yd + zd);
-    }
-};
-```
-
-
-The `self` argument corresponds to the `Vec3` object from which this `distance()` method
-is being called from. While the `other` is a separate `Vec3` object that is given as input
-to this method. In the example below, the `self` argument corresponds to the object
-`v1`, because the `distance()` method is being called from the `v1` object,
-while the `other` argument corresponds to the object `v2`.
-
-
-```{zig}
-#| eval: false
-const v1 = Vec3 {
-    .x = 4.2, .y = 2.4, .z = 0.9
-};
-const v2 = Vec3 {
-    .x = 5.1, .y = 5.6, .z = 1.6
-};
-
-std.debug.print(
-    "Distance: {d}\n",
-    .{v1.distance(v2)}
-);
-```
-
-```
-Distance: 3.3970575502926055
-```
-
-
-
-### About the struct state
-
-Sometimes you don't need to care about the state of your struct object. Sometimes, you just need
-to instantiate and use the objects, without altering their state. You can notice that when you have methods
-inside your struct declaration that might use the values that are present in the data members, but they
-do not alter the values in the data members of the struct in anyway.
-
-The `Vec3` struct that was presented at @sec-self-arg is an example of that.
-This struct have a single method named `distance()`, and this method do use the values
-present in all three data members of the struct (`x`, `y` and `z`). But at the same time,
-this method do not change the values of these data members in any point.
-
-As a result of that, when we create `Vec3` objects we usually create them as
-constant objects, like the `v1` and `v2` objects presented at @sec-self-arg.
-We can create them as variable objects with the `var` keyword,
-if we want to. But because the methods of this `Vec3` struct do not change
-the state of the objects in any point, is unnecessary to mark them
-as variable objects.
-
-But why? Why am I talkin about this here? Is because the `self` argument
-in the methods is affected depending on whether the
-methods present in a struct change or not the state of the object itself.
-More specifically, when you have a method in a struct that changes the state
-of the object (i.e. change the value of a data member), the `self` argument
-in this method must be annotated in a different manner.
-
-As I described at @sec-self-arg, the `self` argument in methods of
-a struct is the argument that receives as input the object from which the method
-was called from. We usually annotate this argument in the methods by writing `self`,
-followed by the colon character (`:`), and the data type of the struct to which
-the method belongs to (e.g. `User`, `Vec3`, etc.).
-
-If we take the `Vec3` struct that we defined in the previous section as an example,
-we can see in the `distance()` method that this `self` argument is annotated as
-`self: Vec3`. Because the state of the `Vec3` object is never altered by this
-method.
-
-But what if we do have a method that alters the state of the object, by altering the
-values of it's data members. How should we annotate `self` in this instance? The answer is:
-"we should annotate `self` as a pointer of `x`, instead of just `x`".
-In other words, you should annotate `self` as `self: *x`, instead of annotating it
-as `self: x`.
-
-If we create a new method inside the `Vec3` object that, for example, expands the
-vector by multiplying it's coordinates by a factor o two, then, we need to follow
-this rule specified in the previous paragraph. The code example below demonstrates
-this idea:
-
-```{zig}
-#| build_type: "lib"
-#| auto_main: false
-const std = @import("std");
-const m = std.math;
-const Vec3 = struct {
-    x: f64,
-    y: f64,
-    z: f64,
-
-    pub fn distance(self: Vec3, other: Vec3) f64 {
-        const xd = m.pow(f64, self.x - other.x, 2.0);
-        const yd = m.pow(f64, self.y - other.y, 2.0);
-        const zd = m.pow(f64, self.z - other.z, 2.0);
-        return m.sqrt(xd + yd + zd);
-    }
-
-    pub fn double(self: *Vec3) void {
-        self.x = self.x * 2.0;
-        self.y = self.y * 2.0;
-        self.z = self.z * 2.0;
-    }
-};
-```
-
-Notice in the code example above that we have added a new method
-to our `Vec3` struct named `double()`. This method essentially doubles the
-coordinate values of our vector object. Also notice that, in the
-case of the `double()` method, we annotated the `self` argument as `*Vec3`,
-indicating that this argument receives a pointer (or a reference, if you prefer to call it this way)
-to a `Vec3` object as input.
-
-```{zig}
-#| eval: false
-var v3 = Vec3 {
-    .x = 4.2, .y = 2.4, .z = 0.9
-};
-v3.double();
-std.debug.print("Doubled: {d}\n", .{v3.x});
-```
-
-```
-Doubled: 8.4
-```
-
-
-
-Now, if you change the `self` argument in this `double()` method to `self: Vec3`, like in the
-`distance()` method, you will get the compiler error exposed below as result. Notice that this
-error message is indicating a line from the `double()` method body,
-indicating that you cannot alter the value of the `x` data member.
-
-```zig
-// If we change the function signature of double to:
-    pub fn double(self: Vec3) void {
-```
-
-This error message indicates that the `x` data member belongs to a constant object,
-and, because of that, it cannot be changed. Ultimately, this error message
-is telling us that the `self` argument is constant.
-
-```
-t.zig:16:13: error: cannot assign to constant
-        self.x = self.x * 2.0;
-        ~~~~^~
-```
-
-If you take some time, and think hard about this error message, you will understand it.
-You already have the tools to understand why we are getting this error message.
-We have talked about it already at @sec-fun-pars.
-So remember, every function argument is immutable in Zig, and `self`
-is included in this rule.
-
-It does not matter if the object that you pass as input to the function argument is
-a variable object or not. In this example, we marked the `v3` object as a variable object.
-But this does not matter. Because it is not about the input object, it is about
-the function argument.
-
-The problem begins when we try to alter the value of `self` directly, which is a function argument,
-and, every function argument is immutable by default.
You may quest yourself how can we overcome -this barrier, and once again, the solution was also discussed at @sec-fun-pars. -We overcome this barrier, by explicitly marking the `self` argument as a pointer. - - -::: {.callout-note} -If a method of your `x` struct alters the state of the object, by -changing the value of any data member, then, remember to use `self: *x`, -instead of `self: x` in the function signature of this method. -::: - - -You could also interpret the content discussed in this section as: -"if you need to alter the state of your `x` struct object in one of it's methods, -you must explicitly pass the `x` struct object by reference to the `self` argument of this method". - - - -## Type inference {#sec-type-inference} - -Zig is kind of a strongly typed language. I say "kind of" because there are situations -where you don't have to explicitly write the type of every single object in your source code, -as you would expect from a traditional strongly typed language, such as C and C++. - -In some situations, the `zig` compiler can use type inference to solves the data types for you, easing some of -the burden that you carry as a developer. -The most commom way this happens is through function arguments that receives struct objects -as input. - -In general, type inference in Zig is done by using the dot character (`.`). -Everytime you see a dot character written before a struct literal, or before a enum value, or something like that, -you know that this dot character is playing a special party in this place. More specifically, it is -telling the `zig` compiler something on the lines of: "Hey! Can you infer the type of this -value for me? Please!". In other words, this dot character is playing a role similar to the `auto` keyword in C++. - -I give you some examples of this at @sec-anonymous-struct-literals, where we present anonymous struct literals. 
-Anonymous struct literals are, essentially, struct literals that use type inference to -infer the exact type of this particular struct literal. -This type inference is done by looking for some minimal hint of the correct data type to be used. -You could say that the `zig` compiler looks for any neighbouring type annotation that might tell him -what would be the correct type. - -Another commom place where we use type inference in Zig is at switch statements (which we talk about at @sec-switch). -So I also gave some other examples of type inference at @sec-switch, where we were inferring the data types of enum values listed inside -of switch statements (e.g. `.DE`). -But as another example, take a look at this `fence()` function reproduced below, -which comes from the [`atomic.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/atomic.zig)[^fence-fn] -of the Zig Standard Library. - -[^fence-fn]: . - -There are a lot of things in this function that we haven't talked about yet, such as: -what `comptime` means? `inline`? `extern`? What is this star symbol before `Self`? -Let's just ignore all of these things, and focus solely on the switch statement -that is inside this function. - -We can see that this switch statement uses the `order` object as input. This `order` -object is one of the inputs of this `fence()` function, and we can see in the type annotation, -that this object is of type `AtomicOrder`. We can also see a bunch of values inside the -switch statements that begins with a dot character, such as `.release` and `.acquire`. - -Because these weird values contain a dot character before them, we are asking the `zig` -compiler to infer the types of these values inside the switch statement. Then, the `zig` -compiler is looking into the current context where these values are being used, and it is -trying to infer the types of these values. 
- -Since they are being used inside a switch statement, the `zig` compiler looks into the type -of the input object given to the switch statement, which is the `order` object in this case. -Because this object have type `AtomicOrder`, the `zig` compiler infers that these values -are data members from this type `AtomicOrder`. - -```{zig} -#| eval: false -pub inline fn fence(self: *Self, comptime order: AtomicOrder) void { - // LLVM's ThreadSanitizer doesn't support the normal fences so we specialize for it. - if (builtin.sanitize_thread) { - const tsan = struct { - extern "c" fn __tsan_acquire(addr: *anyopaque) void; - extern "c" fn __tsan_release(addr: *anyopaque) void; - }; - - const addr: *anyopaque = self; - return switch (order) { - .unordered, .monotonic => @compileError(@tagName(order) ++ " only applies to atomic loads and stores"), - .acquire => tsan.__tsan_acquire(addr), - .release => tsan.__tsan_release(addr), - .acq_rel, .seq_cst => { - tsan.__tsan_acquire(addr); - tsan.__tsan_release(addr); - }, - }; - } - - return @fence(order); -} -``` - -This is how basic type inference is done in Zig. If we didn't use the dot character before -the values inside this switch statement, then, we would be forced to write explicitly -the data types of these values. For example, instead of writing `.release` we would have to -write `AtomicOrder.release`. We would have to do this for every single value -in this switch statement, and this is a lot of work. That is why type inference -is commonly used on switch statements in Zig. - - - -## Modules - -We already talked about what modules are, and also, how to import other modules into -you current module through *import statements*, so that you can use functionality from these other modules in -your current module. -But in this section, I just want to make it clear that modules are actually structs in Zig. - -In other words, every Zig module (i.e. 
a `.zig` file) that you write in your project -is internally stored as a struct object. -Take the line exposed below as an example. In this line we are importing the -Zig Standard Library into our current module. - -```zig -const std = @import("std"); -``` - -When we want to access the functions and objects from the standard library, we -are basically accessing the data members of the struct stored in the `std` -object. That is why we use the same syntax that we use in normal structs, with the dot operator (`.`) -to access the data members and methods of the struct. - -When this "import statement" get's executed, the result of this expression is a struct -object that contains the Zig Standard Library modules, global variables, functions, etc. -And this struct object get's saved (or stored) inside the constant object named `std`. - - -Take the [`thread_pool.zig` module from the project `zap`](https://github.com/kprotty/zap/blob/blog/src/thread_pool.zig)[^thread] -as an example. This module is written as if it was -a big struct. That is why we have a top-level and public `init()` method -written in this module. The idea is that all top-level functions written in this -module are methods from the struct, and all top-level objects and struct declarations -are data members of this struct. The module is the struct itself. 
- -[^thread]: - - -So you would import and use this module by doing something like this: - -```zig -const std = @import("std"); -const ThreadPool = @import("thread_pool.zig"); -const num_cpus = std.Thread.getCpuCount() - catch @panic("failed to get cpu core count"); -const num_threads = std.math.cast(u16, num_cpus) - catch std.math.maxInt(u16); -const pool = ThreadPool.init( - .{ .max_threads = num_threads } -); -``` +--- +engine: knitr +knitr: true +syntax-definition: "../Assets/zig.xml" +--- + +```{r} +#| include: false +source("../zig_engine.R") +knitr::opts_chunk$set( + auto_main = FALSE, + build_type = "lib" +) +``` + + + +# Structs, Modules and Control Flow + +I introduced a lot of the Zig's syntax to you in the last chapter, +specially at @sec-root-file and @sec-main-file. +But we still need to discuss about some other very important +elements of the language that you will use constantly on your day-to-day +routine. + +We begin this chapter by talking about the different keywords and structures +in Zig related to control flow (e.g. loops and if statements). +Then, we talk about structs and how they can be used to do some +basic Object-Oriented (OOP) patterns in Zig. We also talk about +type inference, which help us to write less code and achieve the same results. +Finally, we end this chapter by discussing modules, and how they relate +to structs. + + + +## Control flow {#sec-zig-control-flow} + +Sometimes, you need to make decisions in your program. Maybe you need to decide +wether to execute or not a specific piece of code. Or maybe, +you need to apply the same operation over a sequence of values. These kinds of tasks, +involve using structures that are capable of changing the "control flow" of our program. + +In computer science, the term "control flow" usually refers to the order in which expressions (or commands) +are evaluated in a given language or program. 
But this term is also used to refer +to structures that are capable of changing this "evaluation order" of the commands +executed by a given language/program. + +These structures are better known +by a set of terms, such as: loops, if/else statements, switch statements, among others. So, +loops and if/else statements are examples of structures that can change the "control +flow" of our program. The keywords `continue` and `break` are also examples of symbols +that can change the order of evaluation, since they can move our program to the next iteration +of a loop, or make the loop stop completely. + + +### If/else statements + +An if/else statement performs a "conditional flow operation". +A conditional flow control (or choice control) allows you to execute +or ignore a certain block of commands based on a logical condition. +Many programmers and computer science professionals also use +the term "branching" in this case. +In essence, we use if/else statements to use the result of a logical test +to decide whether or not to execute a given block of commands. + +In Zig, we write if/else statements by using the keywords `if` and `else`. +We start with the `if` keyword followed by a logical test inside a pair +of parentheses, and then, a pair of curly braces which contains the lines +of code to be executed in case the logical test returns the value `true`. + +After that, you can optionally add an `else` statement. Just add the `else` +keyword followed by a pair of curly braces, with the lines of code +to be executed in case the logical test defined in the `if` +returns `false`. + +In the example below, we are testing if the object `x` contains a number +that is greater than 10. Judging by the output printed to the console, +we know that this logical test returned `false`, because the output +in the console is compatible with the line of code present in the +`else` branch of the if/else statement.
+ + +```{zig} +#| auto_main: true +#| build_type: "run" +const x = 5; +if (x > 10) { + try stdout.print( + "x > 10!\n", .{} + ); +} else { + try stdout.print( + "x <= 10!\n", .{} + ); +} +``` + + + +### Switch statements {#sec-switch} + +Switch statements are also available in Zig. +A switch statement in Zig has a similar syntax to a switch statement in Rust. +As you would expect, to write a switch statement in Zig we use the `switch` keyword. +We provide the value that we want to "switch over" inside a +pair of parentheses. Then, we list the possible combinations (or "branches") +inside a pair of curly braces. + +Let's take a look at the code example below. You can see in this example that +I'm creating an enum type called `Role`. We talk more about enums at @sec-enum. +But in essence, this `Role` type is listing different types of roles in a fictitious +company, like `SE` for Software Engineer, `DE` for Data Engineer, `PM` for Product Manager, +etc. + +Notice that we are using the value from the `role` object in the +switch statement, to discover which exact area we need to store in the `area` variable object. +Also notice that we are using type inference inside the switch statement, with the dot character, +as we described at @sec-type-inference. +This makes the `zig` compiler infer the correct data type of the values (`PM`, `SE`, etc.) for us. + +Also notice that we are grouping multiple values in the same branch of the switch statement. +We just separate each possible value with a comma. So, for example, if `role` contains either `DE` or `DA`, +the `area` variable would contain the value `"Data & Analytics"`, instead of `"Platform"`.
+ +```{zig} +#| build_type: "run" +#| auto_main: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Role = enum { + SE, DPE, DE, DA, PM, PO, KS +}; + +pub fn main() !void { + var area: []const u8 = undefined; + const role = Role.SE; + switch (role) { + .PM, .SE, .DPE, .PO => { + area = "Platform"; + }, + .DE, .DA => { + area = "Data & Analytics"; + }, + .KS => { + area = "Sales"; + }, + } + try stdout.print("{s}\n", .{area}); +} +``` + +Now, one very important aspect about this switch statement presented +in the code example above, is that it exhausts all existing possibilities. +In other words, all possible values that could be found inside the `role` +object are explicitly handled in this switch statement. + +Since the `role` object has type `Role`, the only possible values to +be found inside this object are `PM`, `SE`, `DPE`, `PO`, `DE`, `DA` and `KS`. +There is no other possible value to be stored in this `role` object. +This is what "exhaust all existing possibilities" means. The switch statement covers +every possible case. + +In Zig, switch statements must exhaust all existing possibilities. You cannot write +a switch statement, and leave an edge case with no explicit action to be taken. +This is a similar behaviour to switch statements in Rust, which also have to +handle all possible cases. + +Take a look at the `dump_hex_fallible()` function below as an example. This function +also comes from the Zig Standard Library, but this time, it comes from the [`debug.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/debug.zig)[^debug-mod]. +There are multiple lines in this function, but I omitted them to focus solely on the +switch statement found in this function. Notice that this switch statement has four +possible cases, or four explicit branches. Also, notice that we used an `else` branch +in this case.
Whenever you have multiple possible cases in your switch statement +which you want to apply the same exact action, you can use an `else` branch to do that. + +[^debug-mod]: + +```{zig} +#| eval: false +pub fn dump_hex_fallible(bytes: []const u8) !void { + // Many lines ... + switch (byte) { + '\n' => try writer.writeAll("␊"), + '\r' => try writer.writeAll("␍"), + '\t' => try writer.writeAll("␉"), + else => try writer.writeByte('.'), + } +} +``` + +Many users would also use an `else` branch to handle a "not supported" case. +That is, a case that cannot be properly handled by your code, or, just a case that +should not be "fixed". So many programmers use an `else` branch to panic (or raise an error) to stop +the current execution. + +Take the code example below as an example. We can see that, we are handling the cases +for the `level` object being either 1, 2, or 3. All other possible cases are not supported by default, +and, as consequence, we raise an runtime error in these cases, through the `@panic()` built-in function. + +Also notice that, we are assigning the result of the switch statement to a new object called `category`. +This is another thing that you can do with switch statements in Zig. If the branchs in this switch +statement output some value as result, you can store the result value of the switch statement into +a new object. + +```{zig} +#| eval: false +const level: u8 = 4; +const category = switch (level) { + 1, 2 => "beginner", + 3 => "professional", + else => { + @panic("Not supported level!"); + }, +}; +try stdout.print("{s}\n", .{category}); +``` + +``` +thread 13103 panic: Not supported level! +t.zig:9:13: 0x1033c58 in main (switch2) + @panic("Not supported level!"); + ^ +``` + +Furthermore, you can also use ranges of values in switch statements. +That is, you can create a branch in your switch statement that is used +whenever the input value is contained in a range. These range +expressions are created with the operator `...`. 
Is important +to emphasize that the ranges created by this operator are +inclusive on both ends. + +For example, I could easily change the code example above to support all +levels between 0 and 100. Like this: + +```{zig} +#| build_type: "run" +#| auto_main: true +const level: u8 = 4; +const category = switch (level) { + 0...25 => "beginner", + 26...75 => "intermediary", + 76...100 => "professional", + else => { + @panic("Not supported level!"); + }, +}; +try stdout.print("{s}\n", .{category}); +``` + +This is neat, and it works with character ranges too. That is, I could +simply write `'a'...'z'`, to match any character value that is a +lowercase letter, and it would work fine. + + + +### The `defer` keyword {#sec-defer} + +With the `defer` keyword you can execute expressions at the end of the current scope. +Take the `foo()` function below as an example. When we execute this function, the expression +that prints the message "Exiting function ..." get's executed only at +the end of the function scope. + +```{zig} +#| auto_main: false +#| build_type: "run" +#| echo: true +#| results: "hide" +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +fn foo() !void { + defer std.debug.print( + "Exiting function ...\n", .{} + ); + try stdout.print("Adding some numbers ...\n", .{}); + const x = 2 + 2; _ = x; + try stdout.print("Multiplying ...\n", .{}); + const y = 2 * 8; _ = y; +} + +pub fn main() !void { + try foo(); +} +``` + +``` +Adding some numbers ... +Multiplying ... +Exiting function ... +``` + +It doesn't matter how the function exits (i.e. because +of an error, or, because of an return statement, or whatever), +just remember, this expression get's executed when the function exits. + + + + +### For loops + +A loop allows you to execute the same lines of code multiple times, +thus, creating a "repetition space" in the execution flow of your program. 
+Loops are particularly useful when we want to replicate the same function +(or the same set of commands) over several different inputs. + +There are different types of loops available in Zig. But the most +essential of them all is probably the *for loop*. A for loop is +used to apply the same piece of code over the elements of a slice or an array. + +For loops in Zig have a slightly different syntax that you are +probably used to see in other languages. You start with the `for` keyword, then, you +list the items that you want to iterate +over inside a pair of parentheses. Then, inside of a pair of pipes (`|`) +you should declare an identifier that will serve as your iterator, or, +the "repetition index of the loop". + +```{zig} +#| eval: false +for (items) |value| { + // code to execute +} +``` + +Instead of using a `(value in items)` syntax, +in Zig, for loops use the syntax `(items) |value|`. In the example +below, you can see that we are looping through the items +of the array stored at the object `name`, and printing to the +console the decimal representation of each character in this array. + +If we wanted, we could also iterate through a slice (or a portion) of +the array, instead of iterating through the entire array stored in the `name` object. +Just use a range selector to select the section you want. For example, +I could provide the expression `name[0..3]` to the for loop, to iterate +just through the first 3 elements in the array. + +```{zig} +#| auto_main: true +#| build_type: "run" +const name = [_]u8{'P','e','d','r','o'}; +for (name) |char| { + try stdout.print("{d} | ", .{char}); +} +``` + +In the above example we are using the value itself of each +element in the array as our iterator. But there are many situations where +we need to use an index instead of the actual values of the items. + +You can do that by providing a second set of items to iterate over. +More precisely, you provide the range selector `0..` to the for loop. 
So, +yes, you can use two different iterators at the same time in a for +loop in Zig. + +But remember from @sec-assignments that every object +you create in Zig must be used in some way. So if you declare two iterators +in your for loop, you must use both iterators inside the for loop body. +But if you want to use just the index iterator, and not use the "value iterator", +then, you can discard the value iterator by matching the +value items to the underscore character, like in the example below: + + +```{zig} +#| eval: false +for (name, 0..) |_, i| { + try stdout.print("{d} | ", .{i}); +} +``` + +``` +0 | 1 | 2 | 3 | 4 | +``` + + +### While loops + +A while loop is created from the `while` keyword. A `for` loop +iterates through the items of an array, but a `while` loop +will loop continuously, and infinitely, until a logical test +(specified by you) becomes false. + +You start with the `while` keyword, then, you define a logical +expression inside a pair of parentheses, and the body of the +loop is provided inside a pair of curly braces, like in the example below: + +```{zig} +#| auto_main: true +#| build_type: "run" +var i: u8 = 1; +while (i < 5) { + try stdout.print("{d} | ", .{i}); + i += 1; +} +``` + + + +### Using `break` and `continue` + +In Zig, you can explicitly stop the execution of a loop, or, jump to the next iteration of the loop, using +the keywords `break` and `continue`, respectively. The `while` loop present in the example below is, +at first sight, an infinite loop, because the logical value inside the parentheses will always be equal to `true`. +What makes this `while` loop stop when the `i` object reaches the count +10? It is the `break` keyword! + +Inside the while loop, we have an if statement that is constantly checking if the `i` variable +is equal to 10. Since we are increasing the value of this `i` variable at each iteration of the +while loop.
At some point, this `i` variable will be equal to 10, and when it is, the if statement +will execute the `break` expression, and, as a result, the execution of the while loop is stopped. + +Notice the `expect()` function from the Zig standard library after the while loop. +This `expect()` function is an "assert" type of function. +This function checks if the logical test provided is equal to true. If this logical test is false, +the function raises an assertion error. But if it is equal to true, then, the function will do nothing. + +```{zig} +#| auto_main: true +#| build_type: "run" +var i: usize = 0; +while (true) { + if (i == 10) { + break; + } + i += 1; +} +try std.testing.expect(i == 10); +try stdout.print("Everything worked!", .{}); +``` + +Since this code example was executed successfully by the `zig` compiler, +without raising any errors, then, we know that, after the execution of the while loop, +the `i` variable is equal to 10. Because if it wasn't equal to 10, then, an error would +be raised by `expect()`. + +Now, in the next example, we have a use case for +the `continue` keyword. The if statement is constantly +checking if the current index is a multiple of 2. If +it is, then we jump to the next iteration of the loop +directly. But if the current index is not a multiple of 2, +then, the loop will simply print this index to the console. + +```{zig} +#| auto_main: true +#| build_type: "run" +const ns = [_]u8{1,2,3,4,5,6}; +for (ns) |i| { + if ((i % 2) == 0) { + continue; + } + try stdout.print("{d} | ", .{i}); +} +``` + + + +## Function parameters are immutable {#sec-fun-pars} + +We have already discussed a lot of the syntax behind function declarations at @sec-root-file and @sec-main-file. +But I want to emphasize a curious fact about function parameters (a.k.a. function arguments) in Zig. +In summary, function parameters are immutable in Zig.
+ +Take the code example below, where we declare a simple function that just tries to add +some amount to the input integer, and returns the result back. But if you look closely +at the body of this `add2()` function, you will notice that we try +to save the result back into the `x` function argument. + +In other words, this function not only use the value that it received through the function argument +`x`, but it also tries to change the value of this function argument, by assigning the addition result +into `x`. However, function arguments in Zig are immutable. You cannot change their values, or, you +cannot assign values to them inside the body's function. + +This is the reason why, the code example below do not compile successfully. If you try to compile +this code example, you get a compile error warning you that you are trying to change the value of a +immutable (i.e. constant) object. + +```{zig} +#| eval: false +const std = @import("std"); +fn add2(x: u32) u32 { + x = x + 2; + return x; +} + +pub fn main() !void { + const y = add2(4); + std.debug.print("{d}\n", .{y}); +} +``` + +``` +t.zig:3:5: error: cannot assign to constant + x = x + 2; + ^ +``` + + +If a function argument receives as input a object whose data type is +any of the primitive types that we have listed at @sec-primitive-data-types +this object is always passed by value to the function. In other words, this object +is copied to the function stack frame. + +However, if the input object have a more complex data type, for example, it might +be a struct instance, or an array, or a union, etc., in cases like that, the `zig` compiler +will take the liberty of deciding for you which strategy is best. The `zig` compiler will +pass your object to the function either by value, or by reference. The compiler will always +choose the strategy that is faster for you. +This optimization that you get for free is possible only because function arguments are +immutable in Zig. 
+ +To overcome this barrier, we need to take the lead, and explicitly choose to pass the +object by reference. That is, instead of depending on the `zig` compiler to decide for us, we need +to explicitly mark the function argument as a pointer. This way, we are telling the compiler +that this function argument will be passed by reference to the function. + +By making it a pointer, we can finally use and alter directly the value of this function argument inside +the body of the `add2()` function. You can see that the code example below compiles successfully. + +```{zig} +#| build_type: "run" +#| auto_main: false +const std = @import("std"); +fn add2(x: *u32) void { + const d: u32 = 2; + x.* = x.* + d; +} + +pub fn main() !void { + var x: u32 = 4; + add2(&x); + std.debug.print("Result: {d}\n", .{x}); +} +``` + +``` +Result: 6 +``` + + + +## Structs and OOP {#sec-structs-and-oop} + +Zig is a language more closely related to C (which is a procedural language), +than it is to C++ or Java (which are object-oriented languages). Because of that, you do not +have advanced OOP (Object-Oriented Programming) patterns available in Zig, such as classes, interfaces or +class inheritance. Nonetheless, OOP in Zig is still possible by using struct definitions. + +With struct definitions, you can create (or define) a new data type in Zig. These struct definitions work the same way as they work in C. +You give a name to this new struct (or, to this new data type you are creating), then, you list the data members of this new struct. You can +also register functions inside this struct, and they become the methods of this particular struct (or data type), so that, every object +that you create with this new type, will always have these methods available and associated with them. 
+ +In C++, when we create a new class, we normally have a constructor method (or, a constructor function) to construct or to instantiate every object +of this particular class, and you also have a destructor method (or a destructor function) that +is the function responsible for destroying every object of this class. + +In Zig, we normally declare the constructor and the destructor methods +of our structs, by declaring an `init()` and a `deinit()` methods inside the struct. +This is just a naming convention that you will find across the entire Zig standard library. +So, in Zig, the `init()` method of a struct is normally the constructor method of the class represented by this struct. +While the `deinit()` method is the method used for destroying an existing instance of that struct. + +The `init()` and `deinit()` methods are both used extensively in Zig code, and you will see both of +them being used when we talk about allocators at @sec-allocators. +But, as another example, let's build a simple `User` struct to represent an user of some sort of system. +If you look at the `User` struct below, you can see the `struct` keyword, and inside of a +pair of curly braces, we write the struct's body. + +Notice the data members of this struct, `id`, `name` and `email`. Every data member have it's +type explicitly annotated, with the colon character (`:`) syntax that we described earlier at @sec-root-file. +But also notice that every line in the struct body that describes a data member, ends with a comma character (`,`). +So every time you declare a data member in your Zig code, always end the line with a comma character, instead +of ending it with the traditional semicolon character (`;`). + +Next, also notice in this example, that we registrated an `init()` function as a method +of this `User` struct. This `init()` method is the constructor method that you use to instantiate +every new `User` object. That is why this `init()` function return an `User` object as result. 
+ + +```{zig} +#| auto_main: false +#| build_type: "run" +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const User = struct { + id: u64, + name: []const u8, + email: []const u8, + + pub fn init(id: u64, + name: []const u8, + email: []const u8) User { + + return User { + .id = id, + .name = name, + .email = email + }; + } + + pub fn print_name(self: User) !void { + try stdout.print("{s}\n", .{self.name}); + } +}; + +pub fn main() !void { + const u = User.init(1, "pedro", "email@gmail.com"); + try u.print_name(); +} +``` + +The `pub` keyword plays an important role in struct declarations, and OOP in Zig. +Every method that you declare in your struct that is marked with the keyword `pub`, +becomes a public method of this particular struct. + +So every method that you create in your struct, is, at first, a private method +of that struct. Meaning that, this method can only be called from within the +module (i.e. the `.zig` file) where the struct is declared. But, if you mark this method as public, with the keyword `pub`, then, +you can also call the method from any other module that imports +this struct. + +In other words, the functions marked by the keyword `pub` +are members of the public API of that struct. +For example, if I did not mark the `print_name()` method as public, +then, code written in another module could not execute the line `u.print_name()`, because it would +not be authorized to call this method. + + + +### Anonymous struct literals {#sec-anonymous-struct-literals} + +You can declare a struct object as a literal value. When we do that, we normally specify the +data type of this struct literal by writing its data type just before the opening curly braces. +For example, I could write a struct literal of type `User` that we defined in the previous section like +this: + +```{zig} +#| eval: false +const eu = User { + .id = 1, + .name = "Pedro", + .email = "someemail@gmail.com" +}; +_ = eu; +``` + +However, in Zig, we can also write an anonymous struct literal.
That is, you can write a +struct literal, but not specify explicitly the type of this particular struct. +An anonymous struct is written by using the syntax `.{}`. So, we essentially +replace the explicit type of the struct literal with a dot character (`.`). + +As we describe at @sec-type-inference, when you put a dot before a struct literal, +the type of this struct literal is automatically inferred by the `zig` compiler. +In essence, the `zig` compiler will look for some hint of what is the type of that struct. +It can be the type annotation of a function argument, +or the return type annotation of the function that you are using, or the type annotation +of a variable. +If the compiler does find such a type annotation, then, it will use this +type in your literal struct. + +Anonymous structs are very commonly used in function arguments in Zig. +One example that you have already seen many times, is the `print()` +function from the `stdout` object. +This function takes two arguments. +The first argument, is a template string, which should +contain string format specifiers in it, which tell how the values provided +in the second argument should be printed into the message. + +While the second argument is a struct literal that lists the values +to be printed into the template message specified in the first argument. +You normally want to use an anonymous struct literal here, so that, the +`zig` compiler does the job of specifying the type of this particular +anonymous struct for you. + +```{zig} +#| build_type: "run" +#| auto_main: false +const std = @import("std"); +pub fn main() !void { + const stdout = std.io.getStdOut().writer(); + try stdout.print("Hello, {s}!\n", .{"world"}); +} +``` + + + +### Struct declarations must be constant + +Types in Zig must be `const` or `comptime` (we are going to talk more about comptime at @sec-comptime). +What this means is that you cannot create a new data type, and mark it as variable with the `var` keyword.
+So struct declarations are always constant. You cannot declare a new struct using the `var` keyword. +It must be `const`. + +In the `Vec3` example below, this declaration is allowed because I'm using the `const` keyword +to declare this new data type. + +```{zig} +#| build_type: "lib" +#| auto_main: false +const Vec3 = struct { + x: f64, + y: f64, + z: f64, +}; +``` + + +### The `self` method argument {#sec-self-arg} + +In languages that support OOP, when we declare a method of some class or struct, we +usually declare this method as a function that has a `self` argument. +This `self` argument is the reference to the object itself from which the method +is being called. + +It is not mandatory to use this `self` argument. But why would you not use this `self` argument? +There is no reason to not use it, because the only way to get access to the data stored in the +data members of your struct is to access them through this `self` argument. +If you don't need to use the data in the data members of your struct inside your method, then, you very likely don't need +a method, you can just simply declare this logic as a simple function, outside of your +struct declaration. + + +Take the `Vec3` struct below. Inside this `Vec3` struct we declared a method named `distance()`. +This method calculates the distance between two `Vec3` objects, by following the distance +formula in Euclidean space. Notice that this `distance()` method takes two `Vec3` objects +as input, `self` and `other`.
+ + +```{zig} +#| build_type: "lib" +#| auto_main: false +const std = @import("std"); +const m = std.math; +const Vec3 = struct { + x: f64, + y: f64, + z: f64, + + pub fn distance(self: Vec3, other: Vec3) f64 { + const xd = m.pow(f64, self.x - other.x, 2.0); + const yd = m.pow(f64, self.y - other.y, 2.0); + const zd = m.pow(f64, self.z - other.z, 2.0); + return m.sqrt(xd + yd + zd); + } +}; +``` + + +The `self` argument corresponds to the `Vec3` object from which this `distance()` method +is being called, while `other` is a separate `Vec3` object that is given as input +to this method. In the example below, the `self` argument corresponds to the object +`v1`, because the `distance()` method is being called from the `v1` object, +while the `other` argument corresponds to the object `v2`. + + +```{zig} +#| eval: false +const v1 = Vec3 { + .x = 4.2, .y = 2.4, .z = 0.9 +}; +const v2 = Vec3 { + .x = 5.1, .y = 5.6, .z = 1.6 +}; + +std.debug.print( + "Distance: {d}\n", + .{v1.distance(v2)} +); +``` + +``` +Distance: 3.3970575502926055 +``` + + + +### About the struct state + +Sometimes you don't need to care about the state of your struct object. Sometimes, you just need +to instantiate and use the objects, without altering their state. You can notice that when you have methods +inside your struct declaration that might use the values that are present in the data members, but they +do not alter the values in the data members of the struct in any way. + +The `Vec3` struct that was presented at @sec-self-arg is an example of that. +This struct has a single method named `distance()`, and this method does use the values +present in all three data members of the struct (`x`, `y` and `z`). But at the same time, +this method does not change the values of these data members at any point. + +As a result of that, when we create `Vec3` objects we usually create them as +constant objects, like the `v1` and `v2` objects presented at @sec-self-arg.
+We can create them as variable objects with the `var` keyword, +if we want to. But because the methods of this `Vec3` struct do not change +the state of the objects at any point, it is unnecessary to mark them +as variable objects. + +But why? Why am I talking about this here? It is because the `self` argument +in the methods is affected depending on whether the +methods present in a struct change or not the state of the object itself. +More specifically, when you have a method in a struct that changes the state +of the object (i.e. changes the value of a data member), the `self` argument +in this method must be annotated in a different manner. + +As I described at @sec-self-arg, the `self` argument in methods of +a struct is the argument that receives as input the object from which the method +was called. We usually annotate this argument in the methods by writing `self`, +followed by the colon character (`:`), and the data type of the struct to which +the method belongs (e.g. `User`, `Vec3`, etc.). + +If we take the `Vec3` struct that we defined in the previous section as an example, +we can see in the `distance()` method that this `self` argument is annotated as +`self: Vec3`, because the state of the `Vec3` object is never altered by this +method. + +But what if we do have a method that alters the state of the object, by altering the +values of its data members? How should we annotate `self` in this instance? The answer is: +"we should annotate `self` as a pointer of `x`, instead of just `x`" (where `x` stands for the type of your struct). +In other words, you should annotate `self` as `self: *x`, instead of annotating it +as `self: x`. + +If we create a new method inside the `Vec3` object that, for example, expands the +vector by multiplying its coordinates by a factor of two, then, we need to follow +this rule specified in the previous paragraph.
The code example below demonstrates +this idea: + +```{zig} +#| build_type: "lib" +#| auto_main: false +const std = @import("std"); +const m = std.math; +const Vec3 = struct { + x: f64, + y: f64, + z: f64, + + pub fn distance(self: Vec3, other: Vec3) f64 { + const xd = m.pow(f64, self.x - other.x, 2.0); + const yd = m.pow(f64, self.y - other.y, 2.0); + const zd = m.pow(f64, self.z - other.z, 2.0); + return m.sqrt(xd + yd + zd); + } + + pub fn double(self: *Vec3) void { + self.x = self.x * 2.0; + self.y = self.y * 2.0; + self.z = self.z * 2.0; + } +}; +``` + +Notice in the code example above that we have added a new method +to our `Vec3` struct named `double()`. This method essentially doubles the +coordinate values of our vector object. Also notice that, in the +case of the `double()` method, we annotated the `self` argument as `*Vec3`, +indicating that this argument receives a pointer (or a reference, if you prefer to call it this way) +to a `Vec3` object as input. + +```{zig} +#| eval: false +var v3 = Vec3 { + .x = 4.2, .y = 2.4, .z = 0.9 +}; +v3.double(); +std.debug.print("Doubled: {d}\n", .{v3.x}); +``` + +``` +Doubled: 8.4 +``` + + + +Now, if you change the `self` argument in this `double()` method to `self: Vec3`, like in the +`distance()` method, you will get the compiler error exposed below as result. Notice that this +error message is indicating a line from the `double()` method body, +indicating that you cannot alter the value of the `x` data member. + +```zig +// If we change the function signature of double to: + pub fn double(self: Vec3) void { +``` + +This error message indicates that the `x` data member belongs to a constant object, +and, because of that, it cannot be changed. Ultimately, this error message +is telling us that the `self` argument is constant. + +``` +t.zig:16:13: error: cannot assign to constant + self.x = self.x * 2.0; + ~~~~^~ +``` + +If you take some time, and think hard about this error message, you will understand it. 
+You already have the tools to understand why we are getting this error message. +We have talked about it already at @sec-fun-pars. +So remember, every function argument is immutable in Zig, and `self` +is included in this rule. + +It does not matter if the object that you pass as input to the function argument is +a variable object or not. In this example, we marked the `v3` object as a variable object. +But this does not matter. Because it is not about the input object, it is about +the function argument. + +The problem begins when we try to alter the value of `self` directly, which is a function argument, +and, every function argument is immutable by default. You may ask yourself how we can overcome +this barrier, and once again, the solution was also discussed at @sec-fun-pars. +We overcome this barrier, by explicitly marking the `self` argument as a pointer. + + +::: {.callout-note} +If a method of your `x` struct alters the state of the object, by +changing the value of any data member, then, remember to use `self: *x`, +instead of `self: x` in the function signature of this method. +::: + + +You could also interpret the content discussed in this section as: +"if you need to alter the state of your `x` struct object in one of its methods, +you must explicitly pass the `x` struct object by reference to the `self` argument of this method". + + + +## Type inference {#sec-type-inference} + +Zig is kind of a strongly typed language. I say "kind of" because there are situations +where you don't have to explicitly write the type of every single object in your source code, +as you would expect from a traditional strongly typed language, such as C and C++. + +In some situations, the `zig` compiler can use type inference to resolve the data types for you, easing some of +the burden that you carry as a developer. +The most common way this happens is through function arguments that receive struct objects +as input.
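To make this concrete, here is a small sketch of a function argument that receives a struct object as input. The `Point` struct and the `sum()` function are hypothetical names, introduced only for this illustration. The first call writes the type of the struct literal explicitly, while the second call lets the compiler infer it from the type annotation of the `p` argument.

```zig
const std = @import("std");

// Hypothetical struct and function, used only for illustration.
const Point = struct {
    x: i32,
    y: i32,
};

fn sum(p: Point) i32 {
    return p.x + p.y;
}

pub fn main() void {
    // The type is written explicitly before the curly braces.
    const a = sum(Point{ .x = 1, .y = 2 });
    // The type is inferred from the annotation of the `p` argument.
    const b = sum(.{ .x = 1, .y = 2 });
    std.debug.print("{d} {d}\n", .{ a, b });
}
```

Both calls build the same `Point` value, so both of them return 3. The second form is simply less code to write.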
+ +In general, type inference in Zig is done by using the dot character (`.`). +Every time you see a dot character written before a struct literal, or before an enum value, or something like that, +you know that this dot character is playing a special part in this place. More specifically, it is +telling the `zig` compiler something along the lines of: "Hey! Can you infer the type of this +value for me? Please!". In other words, this dot character is playing a role similar to the `auto` keyword in C++. + +I gave you some examples of this at @sec-anonymous-struct-literals, where we present anonymous struct literals. +Anonymous struct literals are, essentially, struct literals that use type inference to +infer the exact type of this particular struct literal. +This type inference is done by looking for some minimal hint of the correct data type to be used. +You could say that the `zig` compiler looks for any neighbouring type annotation that might tell it +what would be the correct type. + +Another common place where we use type inference in Zig is at switch statements (which we talk about at @sec-switch). +So I also gave some other examples of type inference at @sec-switch, where we were inferring the data types of enum values listed inside +of switch statements (e.g. `.DE`). +But as another example, take a look at this `fence()` function reproduced below, +which comes from the [`atomic.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/atomic.zig)[^fence-fn] +of the Zig Standard Library. + +[^fence-fn]: . + +There are a lot of things in this function that we haven't talked about yet, such as: +what does `comptime` mean? `inline`? `extern`? What is this star symbol before `Self`? +Let's just ignore all of these things, and focus solely on the switch statement +that is inside this function. + +We can see that this switch statement uses the `order` object as input.
This `order` +object is one of the inputs of this `fence()` function, and we can see in the type annotation, +that this object is of type `AtomicOrder`. We can also see a bunch of values inside the +switch statement that begin with a dot character, such as `.release` and `.acquire`. + +Because these weird values contain a dot character before them, we are asking the `zig` +compiler to infer the types of these values inside the switch statement. Then, the `zig` +compiler looks into the current context where these values are being used, and it +tries to infer the types of these values. + +Since they are being used inside a switch statement, the `zig` compiler looks into the type +of the input object given to the switch statement, which is the `order` object in this case. +Because this object has type `AtomicOrder`, the `zig` compiler infers that these values +are values of the `AtomicOrder` enum. + +```{zig} +#| eval: false +pub inline fn fence(self: *Self, comptime order: AtomicOrder) void { + // LLVM's ThreadSanitizer doesn't support the normal fences so we specialize for it. + if (builtin.sanitize_thread) { + const tsan = struct { + extern "c" fn __tsan_acquire(addr: *anyopaque) void; + extern "c" fn __tsan_release(addr: *anyopaque) void; + }; + + const addr: *anyopaque = self; + return switch (order) { + .unordered, .monotonic => @compileError(@tagName(order) ++ " only applies to atomic loads and stores"), + .acquire => tsan.__tsan_acquire(addr), + .release => tsan.__tsan_release(addr), + .acq_rel, .seq_cst => { + tsan.__tsan_acquire(addr); + tsan.__tsan_release(addr); + }, + }; + } + + return @fence(order); +} +``` + +This is how basic type inference is done in Zig. If we didn't use the dot character before +the values inside this switch statement, then, we would be forced to write explicitly +the data types of these values. For example, instead of writing `.release` we would have to +write `AtomicOrder.release`.
We would have to do this for every single value +in this switch statement, and this is a lot of work. That is why type inference +is commonly used on switch statements in Zig. + + + +## Modules + +We already talked about what modules are, and also, how to import other modules into +your current module through *import statements*, so that you can use functionality from these other modules in +your current module. +But in this section, I just want to make it clear that modules are actually structs in Zig. + +In other words, every Zig module (i.e. a `.zig` file) that you write in your project +is internally stored as a struct object. +Take the line shown below as an example. In this line we are importing the +Zig Standard Library into our current module. + +```zig +const std = @import("std"); +``` + +When we want to access the functions and objects from the standard library, we +are basically accessing the data members of the struct stored in the `std` +object. That is why we use the same syntax that we use in normal structs, with the dot operator (`.`) +to access the data members and methods of the struct. + +When this "import statement" gets executed, the result of this expression is a struct +object that contains the Zig Standard Library modules, global variables, functions, etc. +And this struct object gets saved (or stored) inside the constant object named `std`. + + +Take the [`thread_pool.zig` module from the project `zap`](https://github.com/kprotty/zap/blob/blog/src/thread_pool.zig)[^thread] +as an example. This module is written as if it was +a big struct. That is why we have a top-level and public `init()` method +written in this module. The idea is that all top-level functions written in this +module are methods from the struct, and all top-level objects and struct declarations +are data members of this struct. The module is the struct itself.
+ +[^thread]: + + +So you would import and use this module by doing something like this: + +```zig +const std = @import("std"); +const ThreadPool = @import("thread_pool.zig"); +const num_cpus = std.Thread.getCpuCount() + catch @panic("failed to get cpu core count"); +const num_threads = std.math.cast(u16, num_cpus) + orelse std.math.maxInt(u16); +const pool = ThreadPool.init( + .{ .max_threads = num_threads } +); +``` diff --git a/Chapters/03-unittests.qmd b/Chapters/03-unittests.qmd index 869138ea..8d40e85c 100644 --- a/Chapters/03-unittests.qmd +++ b/Chapters/03-unittests.qmd @@ -1,351 +1,351 @@ ---- -engine: knitr -knitr: true -syntax-definition: "../Assets/zig.xml" ---- - -```{r} -#| include: false -source("../zig_engine.R") -knitr::opts_chunk$set( - auto_main = FALSE, - build_type = "lib" -) -``` - - - -# Unit tests {#sec-unittests} - -In this chapter, I want to dive in on how unit tests are developed in -Zig. We are going to talk about what is the testing wokflow in Zig, and -also, about the `test` command from the `zig` compiler. - - -## Introducing the `test` block - -In Zig, unit tests are written inside a `test` declaration, or, how I prefer to call it, inside a `test` block. -Every `test` block is written by using the keyword `test`. -You can optionally use a string literal to write a label, which is responsible for identifying -the specific group of unit tests that you are writing inside this specific `test` block. - -In the example below, we are testing if the sum of two objects (`a` and `b`) -is equal to 4. The `expect()` function from the Zig Standard Library -is a function that receives a logical test as input. If this logical test -results in `true`, then, the test passes. But if it results -in `false`, then, the test fails. - -You can write any Zig code you want inside of each `test` block. -Part of this code might be some necessary commands to setup your testing -environment, or just initializing some necessary objects.
- -```{zig} -#| build_type: "test" -const std = @import("std"); -const expect = std.testing.expect; -test "testing simple sum" { - const a: u8 = 2; - const b: u8 = 2; - try expect((a + b) == 4); -} -``` - -You can have multiple `test` blocks written on the same Zig module. -Also, you can mix `test` blocks with your source code, with no problems -or consequences. If you mix `test` blocks with your normal source code, -when you execute the `build`, `build-exe`, `build-obj` or `build-lib` commands from the -`zig` compiler that we exposed at @sec-compile-code, these `test` blocks are automatically ignored by the -compiler. - -In other words, the `zig` compiler only builds and execute your tests -when you ask it to. By default, the compiler always ignore `test` -blocks written in your Zig modules. The compiler normally checks only if -there are any syntax errors in these `test` blocks. - -If you look at the source code for most of the files present in the -Zig Standard Library[^zig-std-lib], you can see that the `test` blocks -are written together with the normal source code of the library. -You can see this for example, at the [`array_list` module](https://github.com/ziglang/zig/blob/master/lib/std/array_list.zig)[^zig-array]. -So, the standard that the Zig developers decided to adopt -is to keep their unit tests together with the source code -of the functionality that they are testing. - -Each programmer might have a different opinion on this. -Some of them might prefer to keep unit tests separate from the actual -source code of their application. If that is your case, you can -simply create a separate `tests` folder in your project, and -start writing Zig modules that contains only unit tests (as would normally do on a Python project with `pytest`, for example), -and everything will work fine. -It boils down to which is your preference here. 
- -[^zig-std-lib]: -[^zig-array]: - - -## How to run your tests - -If the `zig` compiler ignores any `test` block by default, how can -you compile and run your unit tests? The answer is the `test` command from -the `zig` compiler. By running `zig test` command, the compiler will -find every instance of `test` block in your Zig module, and, it will -compile and run the unit tests you wrote. - - -```bash -zig test simple_sum.zig -``` - -``` -1/1 simple_sum.test.testing simple sum... OK -All 1 tests passed. -``` - - -## Testing memory allocations - -One of the advantages of Zig is that it offers great tools -that hep us, programmers, to avoid (but also detect) memory problems, such as -memory leaks and double-frees. The `defer` keyword -is specially helpful in this regard. - -When developing your -source code, you, the programmer, is responsible for making -sure that your code do not produce such problems. However, -you can also use a special type of allocator object in Zig, -that is capable of automatically detect such problems for you. -This is the `std.testing.allocator` object. -This allocator object offers some basic memory safety detection -features, which are capable of detecting memory leaks. - -As we described at @sec-heap, to allocate memory on the heap, you need -to use an allocator object, and your functions that use these objects to allocate memory -on the heap, should receive an allocator object as one of it's inputs. -Every memory on the heap that you allocate using these allocator objects, -must also be freed using this same allocator object. - -So, if you want to test the memory allocations performed by your functions, -and make sure that you don't have problems in these allocations, you can simply -write unit tests for these functions, where you provide the -`std.testing.allocator` object as input to these functions. - -Look at the example below, where I'm defining a function that clearly causes -a memory leak. 
Because we allocate memory with the allocator object, -but we do not free this allocated memory in any point. So, when the function -returns, we lose the reference to the `buffer` object, which contains -the allocated memory, and, as a result, we can no longer free this memory. - -Notice that, inside a `test` block I execute this function with -the `std.testing.allocator`. Since no visible errors were raised inside -the `test` block, the `zig` compiler completes the process indicating that -the unit tests performed inside the `test` block labeled as `"memory leak` -have all passed. But despite this result, the allocator object was capable -of looking deeper in our program, and detecting the memory leak. As a result, -this allocator object returns a message "tests leaked memory", and also, -a stack trace showing the exact point where the memory was leaked. - -```{zig} -#| eval: false -const std = @import("std"); -const Allocator = std.mem.Allocator; -fn some_memory_leak(allocator: Allocator) !void { - const buffer = try allocator.alloc(u32, 10); - _ = buffer; - // Return without freeing the - // allocated memory -} - -test "memory leak" { - const allocator = std.testing.allocator; - try some_memory_leak(allocator); -} -``` - -``` -Test [1/1] leak_memory.test.memory leak... - [gpa] (err): memory address 0x7c1fddf39000 leaked: -./ZigExamples/debugging/leak_memory.zig:4:39: 0x10395f2 - const buffer = try allocator.alloc(u32, 10); - ^ -./ZigExamples/debugging/leak_memory.zig:12:25: 0x10398ea - try some_memory_leak(allocator); - -... more stack trace -``` - - -## Testing errors - -One commom style of unit tests are those that look for -specific errors in your functions. In other words, you write -a unit test that tries to assert if a specific function call -returns any error, or a specific type of error. 
- -In C++ you would normally write this stye of unit test using, for example, -the functions `REQUIRE_THROWS()` or `CHECK_THROWS()` from the [`Catch2` test framework](https://github.com/catchorg/Catch2/tree/devel)[^catch2]. -In the case of a Python project, you would use the -[`raises()` function from `pytest`](https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest-raises)[^pytest]. -While in Rust, you would probably use `assert_eq!()` in conjunction with `Err()`. - -[^pytest]: -[^catch2]: - - -But in Zig, we use the `expectError()` function, from the `std.testing` module. -With this function, you can test if a specific function call returns the exact -type of error that you expect it to return. To use this function, you first write -`try expectError()`. Then, on the first argument, you provide the type of error that you -are expecting from the function call. Then, on the second argument, you write -the function call you expect to fail. - -The code example below demonstrates such type of unit test in Zig. -Notice that, inside the function `alloc_error()` we are allocating -100 bytes of memory, or, an array of 100 elements, for the object `ibuffer`. However, -in the `test` block, we are using the `FixedBufferAllocator()` -allocator object, which is limited to 10 bytes of space, because -the object `buffer`, which we provided to the allocator object, -have only 10 bytes of space. - -That is why, the `alloc_error()` function raises an `OutOfMemory` error -on this case. -Because this function is trying to allocate more space than the allocator -object allows. -So, in essence, we are testing for a specific type of error, -which is `OutOfMemory`. If the `alloc_error()` function returns any other type of error, -then, the `expectError()` function would make the entire test fail. 
- - -```{zig} -#| build_type: "test" -const std = @import("std"); -const Allocator = std.mem.Allocator; -const expectError = std.testing.expectError; -fn alloc_error(allocator: Allocator) !void { - var ibuffer = try allocator.alloc(u8, 100); - defer allocator.free(ibuffer); - ibuffer[0] = 2; -} - -test "testing error" { - var buffer: [10]u8 = undefined; - var fba = std.heap.FixedBufferAllocator.init(&buffer); - const allocator = fba.allocator(); - try expectError(error.OutOfMemory, alloc_error(allocator)); -} -``` - -``` -1/1 oom.test.testing error... OK -All 1 tests passed. -``` - - - -## Testing simple equalities - -In Zig, there are some different ways you can test for an equality. -You already saw that we can use `expect()` with the logical operator `==` -to essentially reproduce an equality test. But we also have -some helper functions that you should know about, specially -`expectEqual()`, `expectEqualSlices()` and `expectEqualStrings()`. - - -The `expectEqual()` function, as the name suggests, is a classic -test equality function. It receives two objects as input. The first -object is the value that you expect to be in the second object. -While second object is the object you have, or, the object that your application -produced as result. So, with `expectEqual()` you are essentially -testing if the values stored inside the two provided objects -are equal or not. - -You can see in the example below that, the test performed by -`expectEqual()` failed. Because the objects `v1` and `v2` contain -different values in them. - -```{zig} -#| eval: false -const std = @import("std"); -test "values are equal?" { - const v1 = 15; - const v2 = 18; - try std.testing.expectEqual(v1, v2); -} -``` - -``` -1/1 ve.test.values are equal?... - expected 15, found 18 - FAIL (TestExpectedEqual) -ve.zig:5:5: test.values are equal? (test) - try std.testing.expectEqual(v1, v2); - ^ -0 passed; 0 skipped; 1 failed. 
-``` - - -Although useful, the `expectEqual()` function does not work with arrays. -For testing if two arrays are equal, you should use the `expectEqualSlices()` -function instead. This function have three arguments. First, you provide -the data type contained in both arrays that you are trying to compare. -While the second and third arguments corresponds to the array objects that you want to compare. - -In the example below, we are using this function to test if two array -objects (`array1` and `array2`) are equal or not. Since they -are in fact equal, the unit test passed with no errors. - -```{zig} -#| build_type: "test" -const std = @import("std"); -test "arrays are equal?" { - const array1 = [3]u32{1, 2, 3}; - const array2 = [3]u32{1, 2, 3}; - try std.testing.expectEqualSlices( - u32, &array1, &array2 - ); -} -``` - -``` -1/1 oom.test.arrays are equal?... OK -All 1 tests passed. -``` - - -At last, you might also want to use the `expectEqualStrings()` function. -As the name suggests, you can use this function to test if two strings -are equal or not. Just provide the two string objects that you want to compare, -as inputs to the functions. - -If the function finds any existing difference between the two strings, -then, the function will raise an error, and also, print an error message -that shows the exact difference between the two string objects provided, -as the example below demonstrates: - - -```{zig} -#| eval: false -const std = @import("std"); -test "strings are equal?" { - const str1 = "hello, world!"; - const str2 = "Hello, world!"; - try std.testing.expectEqualStrings( - str1, str2 - ); -} -``` - -``` -1/1 t.test.strings are equal?... -====== expected this output: ========= -hello, world!␃ -======== instead found this: ========= -Hello, world!␃ -====================================== -First difference occurs on line 1: -expected: -hello, world! -^ ('\x68') -found: -Hello, world! 
-^ ('\x48') -``` - +--- +engine: knitr +knitr: true +syntax-definition: "../Assets/zig.xml" +--- + +```{r} +#| include: false +source("../zig_engine.R") +knitr::opts_chunk$set( + auto_main = FALSE, + build_type = "lib" +) +``` + + + +# Unit tests {#sec-unittests} + +In this chapter, I want to dive in on how unit tests are developed in +Zig. We are going to talk about what is the testing wokflow in Zig, and +also, about the `test` command from the `zig` compiler. + + +## Introducing the `test` block + +In Zig, unit tests are written inside a `test` declaration, or, how I prefer to call it, inside a `test` block. +Every `test` block is written by using the keyword `test`. +You can optionally use a string literal to write a label, which is responsible for identifying +the specific group of unit tests that you are writing inside this specific `test` block. + +In the example below, we are testing if the sum of two objects (`a` and `b`) +is equal to 4. The `expect()` function from the Zig Standard Library +is a function that receives a logical test as input. If this logical test +results in `true`, then, the test passes. But if it results +in `false`, then, the test fails. + +You can write any Zig code you want inside of each `test` block. +Part of this code might be some necessary commands to setup your testing +environment, or just initializing some necessary objects. + +```{zig} +#| build_type: "test" +const std = @import("std"); +const expect = std.testing.expect; +test "testing simple sum" { + const a: u8 = 2; + const b: u8 = 2; + try expect((a + b) == 4); +} +``` + +You can have multiple `test` blocks written on the same Zig module. +Also, you can mix `test` blocks with your source code, with no problems +or consequences. 
If you mix `test` blocks with your normal source code, +when you execute the `build`, `build-exe`, `build-obj` or `build-lib` commands from the +`zig` compiler that we exposed at @sec-compile-code, these `test` blocks are automatically ignored by the +compiler. + +In other words, the `zig` compiler only builds and execute your tests +when you ask it to. By default, the compiler always ignore `test` +blocks written in your Zig modules. The compiler normally checks only if +there are any syntax errors in these `test` blocks. + +If you look at the source code for most of the files present in the +Zig Standard Library[^zig-std-lib], you can see that the `test` blocks +are written together with the normal source code of the library. +You can see this for example, at the [`array_list` module](https://github.com/ziglang/zig/blob/master/lib/std/array_list.zig)[^zig-array]. +So, the standard that the Zig developers decided to adopt +is to keep their unit tests together with the source code +of the functionality that they are testing. + +Each programmer might have a different opinion on this. +Some of them might prefer to keep unit tests separate from the actual +source code of their application. If that is your case, you can +simply create a separate `tests` folder in your project, and +start writing Zig modules that contains only unit tests (as would normally do on a Python project with `pytest`, for example), +and everything will work fine. +It boils down to which is your preference here. + +[^zig-std-lib]: +[^zig-array]: + + +## How to run your tests + +If the `zig` compiler ignores any `test` block by default, how can +you compile and run your unit tests? The answer is the `test` command from +the `zig` compiler. By running `zig test` command, the compiler will +find every instance of `test` block in your Zig module, and, it will +compile and run the unit tests you wrote. + + +```bash +zig test simple_sum.zig +``` + +``` +1/1 simple_sum.test.testing simple sum... 
OK +All 1 tests passed. +``` + + +## Testing memory allocations + +One of the advantages of Zig is that it offers great tools +that hep us, programmers, to avoid (but also detect) memory problems, such as +memory leaks and double-frees. The `defer` keyword +is specially helpful in this regard. + +When developing your +source code, you, the programmer, is responsible for making +sure that your code do not produce such problems. However, +you can also use a special type of allocator object in Zig, +that is capable of automatically detect such problems for you. +This is the `std.testing.allocator` object. +This allocator object offers some basic memory safety detection +features, which are capable of detecting memory leaks. + +As we described at @sec-heap, to allocate memory on the heap, you need +to use an allocator object, and your functions that use these objects to allocate memory +on the heap, should receive an allocator object as one of it's inputs. +Every memory on the heap that you allocate using these allocator objects, +must also be freed using this same allocator object. + +So, if you want to test the memory allocations performed by your functions, +and make sure that you don't have problems in these allocations, you can simply +write unit tests for these functions, where you provide the +`std.testing.allocator` object as input to these functions. + +Look at the example below, where I'm defining a function that clearly causes +a memory leak. Because we allocate memory with the allocator object, +but we do not free this allocated memory in any point. So, when the function +returns, we lose the reference to the `buffer` object, which contains +the allocated memory, and, as a result, we can no longer free this memory. + +Notice that, inside a `test` block I execute this function with +the `std.testing.allocator`. 
Since no visible errors were raised inside +the `test` block, the `zig` compiler completes the process indicating that +the unit tests performed inside the `test` block labeled as `"memory leak` +have all passed. But despite this result, the allocator object was capable +of looking deeper in our program, and detecting the memory leak. As a result, +this allocator object returns a message "tests leaked memory", and also, +a stack trace showing the exact point where the memory was leaked. + +```{zig} +#| eval: false +const std = @import("std"); +const Allocator = std.mem.Allocator; +fn some_memory_leak(allocator: Allocator) !void { + const buffer = try allocator.alloc(u32, 10); + _ = buffer; + // Return without freeing the + // allocated memory +} + +test "memory leak" { + const allocator = std.testing.allocator; + try some_memory_leak(allocator); +} +``` + +``` +Test [1/1] leak_memory.test.memory leak... + [gpa] (err): memory address 0x7c1fddf39000 leaked: +./ZigExamples/debugging/leak_memory.zig:4:39: 0x10395f2 + const buffer = try allocator.alloc(u32, 10); + ^ +./ZigExamples/debugging/leak_memory.zig:12:25: 0x10398ea + try some_memory_leak(allocator); + +... more stack trace +``` + + +## Testing errors + +One commom style of unit tests are those that look for +specific errors in your functions. In other words, you write +a unit test that tries to assert if a specific function call +returns any error, or a specific type of error. + +In C++ you would normally write this stye of unit test using, for example, +the functions `REQUIRE_THROWS()` or `CHECK_THROWS()` from the [`Catch2` test framework](https://github.com/catchorg/Catch2/tree/devel)[^catch2]. +In the case of a Python project, you would use the +[`raises()` function from `pytest`](https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest-raises)[^pytest]. +While in Rust, you would probably use `assert_eq!()` in conjunction with `Err()`. 
+ +[^pytest]: +[^catch2]: + + +But in Zig, we use the `expectError()` function, from the `std.testing` module. +With this function, you can test if a specific function call returns the exact +type of error that you expect it to return. To use this function, you first write +`try expectError()`. Then, on the first argument, you provide the type of error that you +are expecting from the function call. Then, on the second argument, you write +the function call you expect to fail. + +The code example below demonstrates such type of unit test in Zig. +Notice that, inside the function `alloc_error()` we are allocating +100 bytes of memory, or, an array of 100 elements, for the object `ibuffer`. However, +in the `test` block, we are using the `FixedBufferAllocator()` +allocator object, which is limited to 10 bytes of space, because +the object `buffer`, which we provided to the allocator object, +have only 10 bytes of space. + +That is why, the `alloc_error()` function raises an `OutOfMemory` error +on this case. +Because this function is trying to allocate more space than the allocator +object allows. +So, in essence, we are testing for a specific type of error, +which is `OutOfMemory`. If the `alloc_error()` function returns any other type of error, +then, the `expectError()` function would make the entire test fail. + + +```{zig} +#| build_type: "test" +const std = @import("std"); +const Allocator = std.mem.Allocator; +const expectError = std.testing.expectError; +fn alloc_error(allocator: Allocator) !void { + var ibuffer = try allocator.alloc(u8, 100); + defer allocator.free(ibuffer); + ibuffer[0] = 2; +} + +test "testing error" { + var buffer: [10]u8 = undefined; + var fba = std.heap.FixedBufferAllocator.init(&buffer); + const allocator = fba.allocator(); + try expectError(error.OutOfMemory, alloc_error(allocator)); +} +``` + +``` +1/1 oom.test.testing error... OK +All 1 tests passed. 
+``` + + + +## Testing simple equalities + +In Zig, there are some different ways you can test for an equality. +You already saw that we can use `expect()` with the logical operator `==` +to essentially reproduce an equality test. But we also have +some helper functions that you should know about, specially +`expectEqual()`, `expectEqualSlices()` and `expectEqualStrings()`. + + +The `expectEqual()` function, as the name suggests, is a classic +test equality function. It receives two objects as input. The first +object is the value that you expect to be in the second object. +While second object is the object you have, or, the object that your application +produced as result. So, with `expectEqual()` you are essentially +testing if the values stored inside the two provided objects +are equal or not. + +You can see in the example below that, the test performed by +`expectEqual()` failed. Because the objects `v1` and `v2` contain +different values in them. + +```{zig} +#| eval: false +const std = @import("std"); +test "values are equal?" { + const v1 = 15; + const v2 = 18; + try std.testing.expectEqual(v1, v2); +} +``` + +``` +1/1 ve.test.values are equal?... + expected 15, found 18 + FAIL (TestExpectedEqual) +ve.zig:5:5: test.values are equal? (test) + try std.testing.expectEqual(v1, v2); + ^ +0 passed; 0 skipped; 1 failed. +``` + + +Although useful, the `expectEqual()` function does not work with arrays. +For testing if two arrays are equal, you should use the `expectEqualSlices()` +function instead. This function have three arguments. First, you provide +the data type contained in both arrays that you are trying to compare. +While the second and third arguments corresponds to the array objects that you want to compare. + +In the example below, we are using this function to test if two array +objects (`array1` and `array2`) are equal or not. Since they +are in fact equal, the unit test passed with no errors. 
+ +```{zig} +#| build_type: "test" +const std = @import("std"); +test "arrays are equal?" { + const array1 = [3]u32{1, 2, 3}; + const array2 = [3]u32{1, 2, 3}; + try std.testing.expectEqualSlices( + u32, &array1, &array2 + ); +} +``` + +``` +1/1 oom.test.arrays are equal?... OK +All 1 tests passed. +``` + + +At last, you might also want to use the `expectEqualStrings()` function. +As the name suggests, you can use this function to test if two strings +are equal or not. Just provide the two string objects that you want to compare, +as inputs to the functions. + +If the function finds any existing difference between the two strings, +then, the function will raise an error, and also, print an error message +that shows the exact difference between the two string objects provided, +as the example below demonstrates: + + +```{zig} +#| eval: false +const std = @import("std"); +test "strings are equal?" { + const str1 = "hello, world!"; + const str2 = "Hello, world!"; + try std.testing.expectEqualStrings( + str1, str2 + ); +} +``` + +``` +1/1 t.test.strings are equal?... +====== expected this output: ========= +hello, world!␃ +======== instead found this: ========= +Hello, world!␃ +====================================== +First difference occurs on line 1: +expected: +hello, world! +^ ('\x68') +found: +Hello, world! +^ ('\x48') +``` + diff --git a/Chapters/10-stack-project.qmd b/Chapters/10-stack-project.qmd index 5296cbff..5924b140 100644 --- a/Chapters/10-stack-project.qmd +++ b/Chapters/10-stack-project.qmd @@ -1,714 +1,714 @@ ---- -engine: knitr -knitr: true -syntax-definition: "../Assets/zig.xml" ---- - -```{r} -#| include: false -source("../zig_engine.R") -knitr::opts_chunk$set( - auto_main = FALSE, - build_type = "lib" -) -``` - - - -# Project 3 - Building a stack data structure - -In this chapter we are going to implement a stack data structure as our next small project -in this book. 
Implementing basic data structures in any language is kind of a -"kindergarten task" (if this term even exist) in computer science (CS), because -we normally learn and implement them in the first semesters of CS. - -But this is actually good! Since this should be a very easy task, we don't need much to explain -what a stack is, then, we can concentrate on what is really important here, which is learning -how the concept of "generics" are implemented in the Zig language, and how one of the key -features of Zig, which is comptime, works, and use the stack data structure to demonstrate -these concepts on the fly. - -But before we get into building the stack data structure, we first need to understand -what the `comptime` keyword does to your code, and after that, we also need to learn about -how generics work in Zig. - - -## Understanding `comptime` in Zig {#sec-comptime} - -One of the key features of Zig is `comptime`. This keyword introduces a whole -new concept and paradigm, that is tightly connected with the compilation process. -At @sec-compile-time we described the importance and the role that "compile-time vs runtime" -plays into Zig. At that section, we learned that the rules applied to a value/object change -a lot depending on whether this value is known at compile-time, or just at runtime. - -The `comptime` keyword is strongly related to these two spaces of time in the compilation process, -i.e. the compile-time and the runtime. Let's quickly recap the differences. Compile-time is the period of time when your -Zig source code is being compiled by the `zig` compiler, while the runtime is -the period of time when your Zig program is being executed, i.e. when we execute -the binary files outputted by the `zig` compiler. - -There are three ways in which you can apply the `comptime` keyword, which are: - -- apply `comptime` on a function argument. -- apply `comptime` on an object. -- apply `comptime` on a block of expressions. 
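The three forms listed above can be sketched side by side. This is a minimal illustration and not code from the book: the test name, the `counter` object and the `total` block are hypothetical names chosen for the example, and `double()` simply mirrors the function used later in this chapter.

```zig
const std = @import("std");

// 1. `comptime` on a function argument: the caller must pass
//    a value that is known at compile-time.
fn double(comptime num: u32) u32 {
    return num * 2;
}

test "three ways to apply comptime" {
    // 2. `comptime` on an object: `counter` (hypothetical name)
    //    exists only during compilation.
    comptime var counter: u32 = 0;
    counter += 1;

    // 3. `comptime` on a block: every expression inside the
    //    block labeled `blk` is evaluated by the compiler.
    const total = comptime blk: {
        var sum: u32 = 0;
        for ([_]u32{ 1, 2, 3 }) |v| sum += v;
        break :blk sum;
    };

    try std.testing.expectEqual(@as(u32, 6), total);
    try std.testing.expectEqual(@as(u32, 2), double(counter));
}
```

Assuming the file is saved as, say, `comptime_forms.zig`, it can be run with `zig test comptime_forms.zig`.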
- - - -### Applying over a function argument - -When you apply the `comptime` on a function argument you are saying to the `zig` compiler -that the value assigned to that particular argument of the function must be known at compile-time. -We explained in details at @sec-compile-time what exactly "value known at compile-time" means, so, -in case you have doubts about this idea, comeback to that section. - -Now let's think about the consequences of this idea. First of all, we are imposing a limit, or, a requirement -to that particular function argument. If the programmer accidentally tries to give a value to this -function argument that is not known at compile time, the `zig` compiler will notice this problem, and -as a consequence, it will raise a compilation error saying that it cannot compile your program. Because -you are providing a value that is "runtime known" to a function argument that must be "compile-time known". - -Take a look at this very simple example below, where we define a `double()` function, that simply -doubles the input value named `num`. Notice that, we use the `comptime` keyword before the name -of the function argument. This keyword is marking the function argument `num` as a "comptime argument". - -That is a function argument whose value must be compile-time known. This is why the expression `double(5678)` -is valid, and no compilation errors are raised. -Because the value `5678` is compile-time known, so this is the expected behaviour for this function. - -```{zig} -#| auto_main: false -#| build_type: "test" -fn double(comptime num: u32) u32 { - return num * 2; -} -test "test comptime" { - _ = double(5678); -} -``` - -But what if we provide a number that is not compile-time known to this function? -For example, we might provide a different input value to this function depending -on the target OS of our compilation process. The code example delow demonstrates such case. 
- -Because the value of the object `n` is determined at runtime, we cannot provide this object -as input to the `double()` function. The `zig` compiler will not allow it, because we marked -the `num` argument as a "comptime argument". That is why the `zig` compiler raises -the compile-time error exposed below: - -```{zig} -#| eval: false -const builtin = @import("builtin"); -fn double(comptime num: u32) u32 { - return num * 2; -} -test "test comptime" { - var n: u32 = undefined; - if (builtin.target.os.tag == .windows) { - n = 1234; - } else { - n = 5678; - } - _ = double(n); -} -``` - -``` -t.zig:12:16: error: runtime-known argument passed to comptime parameter -``` - -Comptime arguments are frequently used on functions that return some sort -of generic structure. In fact, `comptime` is the essence (or the basis) to make generics in Zig. -We are going to talk more about generics at @sec-generics. - -For now, let's take a look at this code example from @karlseguin_generics. You -can see that this `IntArray()` function have one argument named `length`. -This argument is marked as comptime, and receives a value of type `usize` as input. So the value given to this argument -must be compile-time known. -We can also see that this function returns an array of `i64` values as output. - -```{zig} -#| eval: false -fn IntArray(comptime length: usize) type { - return [length]i64; -} -``` - -Now, the key component of this function is the `length` argument. This argument -is used to determine the size of the array that is produced by the function. Let's -think about the consequences of that. If the size of the array is dependent on -the value assigned to the `length` argument, this means that the data type of the -output of the function depends on the value of this `length` argument. - -Let this statement sink for a bit in your mind. As I described at @sec-root-file, -Zig is a strongly-typed language, specially on function declarations. 
-So every time we write a function in Zig, we have to annotate the data type of -the value returned by the function. But how can we do that, if this data type -depends on the value given to the argument of the function? - -Think about this for a second. If `length` is equal to 3 for example, then, the -return type of the function is `[3]i64`. But if `length` is equal to 40, then, -the return type becomes `[40]i64`. At this point the `zig` compiler would be confused, -and raise a compilation error, saying something like this: - -> Hey! You have annotated that this function should return a `[3]i64` value, but I got a `[40]i64` value instead! This doesn't look right! - -So how can you solve this problem? How do we overcome this barrier? This is when -the `type` keyword comes in. This `type` keyword is basically saying to the -`zig` compiler that this function will return some type, but it doesn't know yet -what exactly type that is. We will talk more about this at @sec-generics. - - - -### Applying over an expression - -When you apply the `comptime` keyword over an expression, then, it is garanteed that the `zig` compiler will -execute this expression at compile-time. If for some reason, this expression cannot be executed at compile-time -(e.g. for example, maybe this expression depends on a value that is only known at runtime), then, the `zig` compiler -will raise a compilation error. - -Take this example from the official documentation of Zig [@zigdocs]. We -are executing the same `fibonacci()` function both at runtime, and, at compile-time. -The function is by default executed at runtime, but because we use the `comptime` -keyword at the second "try expression", this expression is executed at compile-time. - -This might be a bit confusing for some people. Yes! When I say that this expression -is executed at compile-time, I mean that this expression is compiled and executed -while the `zig` compiler is compiling your Zig source code. 
- -If you think hard enough about how the C and C++ macros work, you can reach the conclusion -that they are also executed at compile-time. Because C and C++ macros are -just simple text replacements operations, that are performed at compile-time to form the final version -of the C or C++ source code that get's sent to the compiler. - -```{zig} -#| auto_main: false -#| build_type: "test" -const expect = @import("std").testing.expect; -fn fibonacci(index: u32) u32 { - if (index < 2) return index; - return fibonacci(index - 1) + fibonacci(index - 2); -} - -test "fibonacci" { - // test fibonacci at run-time - try expect(fibonacci(7) == 13); - // test fibonacci at compile-time - try comptime expect(fibonacci(7) == 13); -} -``` - -A lot of your Zig source code might be potentially executed at compile-time, -because the `zig` compiler can figure it out the result value of some types of expressions. -Specially if these expressions involves only compile-time known values. -We have talked about this at @sec-compile-time. - -But when you use the `comptime` keyword on an expression, there is no "it might be executed -at compile-time" anymore. With the `comptime` keyword you are ordering the `zig` compiler -to execute this expression at compile-time. You are imposing this rule, it is garanteed -that the compiler will always execute it at compile-time. Or it at least, the compiler -will try to execute it. If the compiler cannot execute the expression for whatever reason, -the compiler will raise a compilation error. - - -### Applying over a block - -Blocks were described at @sec-blocks. When you apply the `comptime` keyword over a -block of expressions, you get essentially the same effect when you apply this keyword to -a single expression. That is, the entire block of expressions is executed at -compile-time by the `zig` compiler. 
- -In the example below, we mark the block labeled of `blk` as a comptime block, -and, therefore, the expressions inside this block are executed at compile-time. - -```{zig} -#| auto_main: false -#| build_type: "test" -const expect = @import("std").testing.expect; -fn fibonacci(index: u32) u32 { - if (index < 2) return index; - return fibonacci(index - 1) + fibonacci(index - 2); -} - -test "fibonacci in a block" { - const x = comptime blk: { - const n1 = 5; - const n2 = 2; - const n3 = n1 + n2; - try expect(fibonacci(n3) == 13); - break :blk n3; - }; - _ = x; -} -``` - - - - - -## Introducing Generics {#sec-generics} - -First of all, what is a generic? Generic is the idea to allow a type -(`f64`, `u8`, `u32`, `bool`, and also, user-defined types, like the `User` struct -that we defined at @sec-structs-and-oop) to be a parameter to methods, classes and -interfaces [@geeks_generics]. In other words, a "generic" is a class (or a method) that can work -with multiple data types. - -For example, in Java, generics are created through the operator `<>`. With this operator, -a Java class is capable of receiving a data type as input, and therefore, the class can fit -it's features according to this input data type. -As another example, generics in C++ are supported through the concept of templates. -Class templates in C++ are generics. - -In Zig, generics are implemented through `comptime`. The `comptime` keyword -allows us to collect a data type at compile time, and pass this data type as -input to a piece of code. - - -### A generic function {#sec-generic-fun} - -Take the `max()` function exposed below as a first example. -This function is essentially a "generic function". -In this function, we have a comptime function argument named `T`. -Notice that this `T` argument have a data type of `type`. Weird right? This `type` keyword is the -"father of all types", or, "the type of types" in Zig. 
Because we used this `type` keyword in the `T` -argument, we are telling the `zig` compiler that this `T` argument will receive some data type -as input. - -Also notice the use of the `comptime` keyword in this argument. -As I described at @sec-comptime, every time you use this keyword in a function argument, -this means that the value of this argument must be known at compile-time. -This makes sense, right? Because there is no data type that you can make -that is known only at runtime. - -Think about this. Every data type that you will ever write is always -known at compile-time. Specially because data types are an essential -information for the compiler to actually compile your source code. -Having this in mind, makes sense to mark this argument as a comptime argument. - - -```{zig} -#| auto_main: false -#| build_type: "test" -fn max(comptime T: type, a: T, b: T) T { - return if (a > b) a else b; -} -``` - -Also notice that the value of the `T` argument is actually used -to define the data type of the other arguments of the function, `a` and `b`, and also at the -return type annotation of the function. -That is, the data type of these arguments (`a` and `b`), and, the return data type of the function itself, -are determined by the input value given to the `T` argument. - -As a result, we have a generic function that works with different data types. -For example, I can provide `u8` values to this `max()` function, and it will work as expected. -But if I provide `f64` values instead, it will also work as expected. -If I did not use a generic function, I would have to write a `max()` -for each one of the data types that I wanted to use. -This generic function provides a very useful shortcut for us. 
- -```{zig} -#| auto_main: false -#| build_type: "test" -const std = @import("std"); -fn max(comptime T: type, a: T, b: T) T { - return if (a > b) a else b; -} -test "test max" { - const n1 = max(u8, 4, 10); - std.debug.print("Max n1: {d}\n", .{n1}); - const n2 = max(f64, 89.24, 64.001); - std.debug.print("Max n2: {d}\n", .{n2}); -} -``` - -``` -Max n1: 10 -Max n2: 89.24 -``` - - - -### A generic data structure {#sec-generic-struct} - -Every data structure that you find in the Zig Standard Library (e.g. `ArrayList`, `HashMap`, etc.) -is essentially a generic data structure. -These data structures are generic in the sense that they work with any data type you want. -You just say which is the data type of the values that are going to be stored in this data -structure, and they just work as expected. - -A generic data structure in Zig is the way to replicate a generic class from Java, -or, a class template from C++. -But you may quest yourself: how do we build a generic data structure in Zig? - -The basic idea is to write a generic function that creates -the data structure definition for the specific type we want. -In other words, this generic function behaves as a "constructor". The function outputs the `struct` definition -that defines this data structure for a specific data type. - -To create such function, we need to add a comptime argument to this function that receives a data type -as input. We already learned how to do this at the previous section (@sec-generic-fun). - -I think the best way to demonstrate how to create a generic data structure is to actually write one. -This where we go to our next small project in this book. This one is a very small project, -which is to write a stack data structure. - - - - -## What is a stack? {#sec-what-stack} - -A stack data structure is a structure that follows a LIFO (*last in first out*) principle. -Only two operations are normally supported in a stack data structure, which are `push` and `pop`. 
-The `push` operation is used to add new values to the stack, while `pop` is used to remove -values from the stack. - -When people try to explain how the stack data structure works, the most common analogy -that people use is a stack of plates. Imagine that you have -a stack of plates, for example, a stack of 10 plates in your table. -Each plate represents a value that is currently stored in this stack. - -Currently we have a stack with 10 different values, or 10 different plates. Now, imagine that you want to -add a new plate (or a new value) to this stack, which translates to the `push` operation. -You would add this plate (or this value) by just putting the new plate -on the top of the stack. Then, you would increase the stack to 11 plates. - -But how would you remove plates (or remove values) from this stack (a.k.a. the `pop` operation) ? -To do that, we would have to remove the plate on the top of the stack, and, as a result, we would -have, once again, 10 plates in the stack. - -This demonstrates the LIFO concept, because the first plate in the stack, which is the plate -in the bottom of the stack is the last plate to get out of the stack. Think about it. In order -to remove this specific plate from the stack, we have to remove all plates in the -stack. So every operation in the stack, either insertion or deletion, is always made at the top of the stack. -The @fig-stack below exposes this logic visually: - -![A diagram of a stack structure. Source: Wikipedia, the free encyclopedia.](./../Figures/lifo-stack.svg){#fig-stack} - - - -## Writing the stack data structure - -We are going to write the stack data structure in two steps. First, we are going -to implement a stack that can only store `u32` values. Then, after that, we are going -to extend our implementation to make it generic, so that it works with any data type -we want. - -First, we need to decide how the values will be stored inside the stack. 
There are multiple -ways to implement the storage behind a stack structure. Some people prefer to use a doubly linked list, -some others prefer to use a dynamic array, etc. In this example we are going to use an array behind the hood, -to store the values in the stack, which is the `items` data member of our `Stack` struct definition. - -Also notice in our `Stack` struct that we have three other data members: `capacity`, `length` and `allocator`. -The `capacity` member contains the capacity of the underlying array that stores the values in the stack. -The `length` contains the number of values that are currently being stored in the stack. -And the `allocator` contains the allocator object which will be used by the stack structure whenever it -needs to allocate more space for the values that are being stored. - -We begin by defining an `init()` method of this struct, which is going to be -responsible for instantiating a `Stack` object. Notice that, inside this -`init()` method, we start by allocating an array with the capacity specified -in the `capacity` argument. Then, we use the built-in function `@memset()` -to set all elements of this new array that we have allocated to zero. - -In other words, this `@memset()` function is basically equivalent to the `memset()` -function from C Standard Library[^cmemset]. In this example, we are using this function to essentially -assign the value `0` to all elements of this array. We do that to initialize this memory that was allocated. -Uninitialized memory is a commom source of undefined behaviour, so, this step -is made for safety reasons. 
- -[^cmemset]: - - -```{zig} -#| auto_main: false -#| build_type: "lib" -const std = @import("std"); -const Allocator = std.mem.Allocator; - -const Stack = struct { - items: []u32, - capacity: usize, - length: usize, - allocator: Allocator, - - pub fn init(allocator: Allocator, capacity: usize) !Stack { - var buf = try allocator.alloc(u32, capacity); - @memset(buf[0..], 0); - return .{ - .items = buf[0..], - .capacity = capacity, - .length = 0, - .allocator = allocator, - }; - } -}; -``` - - -### Implementing the `push` operation - -Now that we have written the basic logic to create a new `Stack` object, -we can start writing the logic responsible for performing a push operation. -Remember, a push operation in a stack data structure is the operation -responsible for adding a new value to the stack. - -So how can we add a new value to the `Stack` object that we have? -The `push()` function exposed below is an answer to this question. -Remember from what we discussed at @sec-what-stack that values are always added to the top of the stack. -This means that this `push()` function must always find the element in the underlying array of the stack, -that currently represents the top position of the stack, and add this value there. - -First, we have an if statement in this function. This if statement is -checking whether we need to expand the underlying array to store -this new value that we are adding to the stack. In other words, maybe -the underlying array does not have enough capacity to store this new -value, and, in this case, we need to expand our array to get the capacity that we need. - -So, if the logic test in this if statement returns true, it means that the array -does not have enough capacity, and we need to expand it before we store this new value. -So inside this if statement we are executing the necessary expressions -to expand the underlying array. 
- -Notice that we use the allocator object to allocate a new array that is twice as bigger -than the current array (`self.capacity * 2`). Then, we use `memset()` once again to -set all fields in this array to zero, making sure that all memory is properly initialized. - -After that, we use a different built-in function named `@memcpy()`. This built-in function -is equivalent to the `memcpy()` function from the C Standard Library[^cmemcpy]. It is used to -copy the values from one block of memory to another block of memory. In other words, -you can use this function to copy the values from one array into another array. - -[^cmemcpy]: - -We are using this `@memcpy()` built-in function to copy the values that are currently -in the underlying array of the stack object (`self.items`) into our new and bigger array that -we have allocated (`new_buf`). After we execute this function, the `new_buf` contains a copy -of the values that are present at `self.items`. - -Now that we have secured a copy of our current values in the `new_buf` object, we -can now free the memory currently allocated at `self.items`. After that, we just need -to assign our new and bigger array to `self.items`. This is the sequence -of steps necessary to expand our array. - - -```{zig} -#| eval: false -pub fn push(self: *Stack, val: u32) !void { - if ((self.length + 1) > self.capacity) { - var new_buf = try self.allocator.alloc( - u32, self.capacity * 2 - ); - @memset(new_buf[0..], 0); - @memcpy( - new_buf[0..self.capacity], self.items - ); - self.allocator.free(self.items); - self.items = new_buf; - } - - self.items[self.length] = val; - self.length += 1; -} -``` - -After we make sure that we have enough room to store this new value -that we are adding to the stack, all we have to do is to assign -this value to the top element in this stack, and, increase the -value of the `length` attribute by one. We find the top element -in the stack by using the `length` attribute. 
- - - -### Implementing the `pop` operation - -Now, we can implement the pop operation of our stack object. -This is a much easier operation to implement, and the pop method below summarises -all the logic that is needed. - -We just have to find the element in the underlying array that currently represents the top -of the stack, and set this element to zero, to indicate that -this element is "empty". After that, we also need to decrease -the `length` attribute of the stack by one. - -If the current length of the stack is zero, it means that there is -no values being stored in the stack currently. So, in this case, -we could just return from the function and do nothing really. -This is what the if statement inside this function is checking for. - -```{zig} -#| eval: false -pub fn pop(self: *Stack) void { - if (self.length == 0) return; - - self.items[self.length - 1] = 0; - self.length -= 1; -} -``` - - - -### Implementing the `deinit` method - -We have implemented the methods responsible for the two main operations -associated with the stack data structure, which is `pop()` and `push()`, -and we also have implemented the method responsible for instantiating -a new `Stack` object, which is the `init()` method. - -But now, we need to implement also the method responsible for destroying -a `Stack` object. In Zig, this task is commonly associated with the method -named `deinit()`. Most struct objects in Zig have such method, and it -is commonly nicknamed of "the destructor method". - -In theory, all we have to do to destroy the `Stack` object is to make -sure that we free the allocated memory for the underlying array, using -the allocator object that is stored inside the `Stack` object. -This is what the `deinit()` method below is doing. 
- -```{zig} -#| eval: false -pub fn deinit(self: *Stack) void { - self.allocator.free(self.items); -} -``` - - - - -## Making it generic - -Now that we have implemented the basic skeleton of our stack data structure, -we can now focus on discussing how can we make it generic. How can we make -this basic skeleton to work not only with `u32` values, but also, with any other -data type we want? -For example, we might need to create a stack object to store `User` values -in it. How can we make this possible? The answer lies on the use of generics -and `comptime`. - -As I described at @sec-generic-struct, the basic idea is to write a generic -function that returns as result a struct definition as output. -In theory, we do not need much to transform our `Stack` struct into a generic -data structure. All that we need to do, to make our `Stack` struct -generic, is to transform the underlying array in the stack structure -into a generic array. - -In other words, this underlying array needs to be a "chameleon". It needs to adapt, -and transform it into an array of the data type that we give it to. For example, if we need to create -a stack that will store `u8` values, then, this underlying array needs to be -a `u8` array (i.e. `[]u8`). But if we need to store `User` values instead, then, -this array needs to be a `User` array (i.e. `[]User`). Etc. - -We do that by using a generic function. Because a generic function can receive a data type -as input, and we can pass this data type to the struct definition of our `Stack` object. -Therefore, we can use the generic function to create a `Stack` object that can store -the data type we want. If we want to create a stack structure that stores `User` values, -we pass the `User` data type to this generic function, and it will create for us -the struct definition that describes a `Stack` object that can store `User` values in it. - -Take the code example below as an example. 
I truncated some parts of the `Stack` struct -for brevity reasons. But if a specific part of our `Stack` struct is not exposed here -in this example, is because this part did not changed from the previous example. -It remains the same. - - - - -```{zig} -#| eval: false -fn Stack(comptime T: type) type { - return struct { - items: []T, - capacity: usize, - length: usize, - allocator: Allocator, - const Self = @This(); - - pub fn init(allocator: Allocator, capacity: usize) !Stack(T) { - var buf = try allocator.alloc(T, capacity); - @memset(buf[0..], 0); - return .{ - .items = buf[0..], - .capacity = capacity, - .length = 0, - .allocator = allocator, - }; - } - - pub fn push(self: *Self, val: T) !void { - // Truncate the rest of the struct - }; -} -``` - -Notice that we have created a function in this example named `Stack()`. This function -takes a type as input, and passes this type to the struct definition of our -`Stack` object. The data member `items` is now, an array of type `T`, which is the -data type that we passed as input to the function. The function argument -`val` in the `push()` function is now a value of type `T` too. - -Now we have finally a stack data structure that is generic. I mean, we probably still -need to adjust the `@memset()` calls, to trully make it generic for any data type. -But this version is good enough, and it really works well with any of the primitive data -types of Zig. - -We can just provide a data type to this function, and it will create a definition of a -`Stack` object that can store values of the data type that we provided. In the example below, we are creating -the definition of a -`Stack` object that can store `u8` values in it. This definition is stored at the `Stacku8` object. -This `Stacku8` object becomes our new struct, it is the struct that we are going to use -to create our `Stack` object. 
- - -```{zig} -#| eval: false -var gpa = std.heap.GeneralPurposeAllocator(.{}){}; -const allocator = gpa.allocator(); -const Stacku8 = Stack(u8); -var stack = try Stacku8.init(allocator, 10); -defer stack.deinit(); -try stack.push(1); -try stack.push(2); -try stack.push(3); -try stack.push(4); -try stack.push(5); -try stack.push(6); - -std.debug.print("Stack len: {d}\n", .{stack.length}); -std.debug.print("Stack capacity: {d}\n", .{stack.capacity}); - -stack.pop(); -std.debug.print("Stack len: {d}\n", .{stack.length}); -stack.pop(); -std.debug.print("Stack len: {d}\n", .{stack.length}); -std.debug.print("Stack state: {any}\n", .{stack.items}); -``` - -``` -Stack len: 6 -Stack capacity: 10 -Stack len: 5 -Stack len: 4 -Stack state: { 1, 2, 3, 4, 0, 0, 0, 0, 0, 0 } -``` - -Every generic data structure in the Zig Standard Library (`ArrayList`, `HashMap`, `SinlyLinkedList`, etc.) is implemented through this logic, -of using a generic function to create the struct definition that can work -with the data type that you provided as input. - - - - -## Conclusion - -The full source code of the stack structure discussed in this chapter is freely available at the official -repository for this book. Just checkout the [`stack.zig`](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/data-structures/stack.zig)[^zig-stack] -for the `u32` version of our stack, -and the [`generic_stack.zig`](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/data-structures/generic_stack.zig)[^zig-stack2] -for the generic version, -available inside the `ZigExamples` folder of the repository. 
-
-
-[^zig-stack]: 
-[^zig-stack2]: 
-
+---
+engine: knitr
+knitr: true
+syntax-definition: "../Assets/zig.xml"
+---
+
+```{r}
+#| include: false
+source("../zig_engine.R")
+knitr::opts_chunk$set(
+    auto_main = FALSE,
+    build_type = "lib"
+)
+```
+
+
+
+# Project 3 - Building a stack data structure
+
+In this chapter we are going to implement a stack data structure as our next small project
+in this book. Implementing basic data structures in any language is kind of a
+"kindergarten task" (if this term even exists) in computer science (CS), because
+we normally learn and implement them in the first semesters of CS.
+
+But this is actually good! Since this should be a very easy task, we don't need much to explain
+what a stack is, so we can concentrate on what is really important here, which is learning
+how the concept of "generics" is implemented in the Zig language, and how one of the key
+features of Zig, which is comptime, works. We use the stack data structure to demonstrate
+these concepts along the way.
+
+But before we get into building the stack data structure, we first need to understand
+what the `comptime` keyword does to your code, and after that, we also need to learn about
+how generics work in Zig.
+
+
+## Understanding `comptime` in Zig {#sec-comptime}
+
+One of the key features of Zig is `comptime`. This keyword introduces a whole
+new concept and paradigm that is tightly connected with the compilation process.
+At @sec-compile-time we described the importance and the role that "compile-time vs runtime"
+plays in Zig. In that section, we learned that the rules applied to a value/object change
+a lot depending on whether this value is known at compile-time, or just at runtime.
+
+The `comptime` keyword is strongly related to these two spaces of time in the compilation process,
+i.e. the compile-time and the runtime. Let's quickly recap the differences.
Compile-time is the period of time when your
+Zig source code is being compiled by the `zig` compiler, while the runtime is
+the period of time when your Zig program is being executed, i.e. when we execute
+the binary files output by the `zig` compiler.
+
+There are three ways in which you can apply the `comptime` keyword, which are:
+
+- apply `comptime` on a function argument.
+- apply `comptime` on an expression.
+- apply `comptime` on a block of expressions.
+
+
+
+### Applying over a function argument
+
+When you apply the `comptime` keyword on a function argument, you are telling the `zig` compiler
+that the value assigned to that particular argument of the function must be known at compile-time.
+We explained in detail at @sec-compile-time what exactly "value known at compile-time" means, so,
+in case you have doubts about this idea, come back to that section.
+
+Now let's think about the consequences of this idea. First of all, we are imposing a limit, or, a requirement
+on that particular function argument. If the programmer accidentally tries to give a value to this
+function argument that is not known at compile time, the `zig` compiler will notice this problem, and
+as a consequence, it will raise a compilation error saying that it cannot compile your program, because
+you are providing a value that is "runtime known" to a function argument that must be "compile-time known".
+
+Take a look at this very simple example below, where we define a `double()` function, that simply
+doubles the input value named `num`. Notice that we use the `comptime` keyword before the name
+of the function argument. This keyword is marking the function argument `num` as a "comptime argument".
+
+That is a function argument whose value must be compile-time known. This is why the expression `double(5678)`
+is valid, and no compilation errors are raised.
+The value `5678` is compile-time known, so this is the expected behaviour for this function.
+
+```{zig}
+#| auto_main: false
+#| build_type: "test"
+fn double(comptime num: u32) u32 {
+    return num * 2;
+}
+test "test comptime" {
+    _ = double(5678);
+}
+```
+
+But what if we provide a number that is not compile-time known to this function?
+For example, we might provide a different input value to this function depending
+on the target OS of our compilation process. The code example below demonstrates such a case.
+
+Because the value of the object `n` is determined at runtime, we cannot provide this object
+as input to the `double()` function. The `zig` compiler will not allow it, because we marked
+the `num` argument as a "comptime argument". That is why the `zig` compiler raises
+the compile-time error exposed below:
+
+```{zig}
+#| eval: false
+const builtin = @import("builtin");
+fn double(comptime num: u32) u32 {
+    return num * 2;
+}
+test "test comptime" {
+    var n: u32 = undefined;
+    if (builtin.target.os.tag == .windows) {
+        n = 1234;
+    } else {
+        n = 5678;
+    }
+    _ = double(n);
+}
+```
+
+```
+t.zig:12:16: error: runtime-known argument passed to comptime parameter
+```
+
+Comptime arguments are frequently used on functions that return some sort
+of generic structure. In fact, `comptime` is the essence (or the basis) of generics in Zig.
+We are going to talk more about generics at @sec-generics.
+
+For now, let's take a look at this code example from @karlseguin_generics. You
+can see that this `IntArray()` function has one argument named `length`.
+This argument is marked as comptime, and receives a value of type `usize` as input. So the value given to this argument
+must be compile-time known.
+We can also see that this function returns an array of `i64` values as output.
+
+```{zig}
+#| eval: false
+fn IntArray(comptime length: usize) type {
+    return [length]i64;
+}
+```
+
+Now, the key component of this function is the `length` argument. This argument
+is used to determine the size of the array that is produced by the function.
Let's
+think about the consequences of that. If the size of the array is dependent on
+the value assigned to the `length` argument, this means that the data type of the
+output of the function depends on the value of this `length` argument.
+
+Let this statement sink in for a bit. As I described at @sec-root-file,
+Zig is a strongly-typed language, especially in function declarations.
+So every time we write a function in Zig, we have to annotate the data type of
+the value returned by the function. But how can we do that, if this data type
+depends on the value given to the argument of the function?
+
+Think about this for a second. If `length` is equal to 3 for example, then, the
+return type of the function is `[3]i64`. But if `length` is equal to 40, then,
+the return type becomes `[40]i64`. At this point the `zig` compiler would be confused,
+and raise a compilation error, saying something like this:
+
+> Hey! You have annotated that this function should return a `[3]i64` value, but I got a `[40]i64` value instead! This doesn't look right!
+
+So how can you solve this problem? How do we overcome this barrier? This is where
+the `type` keyword comes in. This `type` keyword is basically saying to the
+`zig` compiler that this function will return some type, but it doesn't know yet
+exactly what type that is. We will talk more about this at @sec-generics.
+
+
+
+### Applying over an expression
+
+When you apply the `comptime` keyword over an expression, it is guaranteed that the `zig` compiler will
+execute this expression at compile-time. If for some reason this expression cannot be executed at compile-time
+(for example, because it depends on a value that is only known at runtime), then the `zig` compiler
+will raise a compilation error.
+
+Take this example from the official documentation of Zig [@zigdocs]. We
+are executing the same `fibonacci()` function both at runtime, and, at compile-time.
+The function is by default executed at runtime, but because we use the `comptime`
+keyword at the second "try expression", this expression is executed at compile-time.
+
+This might be a bit confusing for some people. Yes! When I say that this expression
+is executed at compile-time, I mean that this expression is compiled and executed
+while the `zig` compiler is compiling your Zig source code.
+
+If you think hard enough about how the C and C++ macros work, you can reach the conclusion
+that they are also executed at compile-time. Because C and C++ macros are
+just simple text replacement operations, that are performed at compile-time to form the final version
+of the C or C++ source code that gets sent to the compiler.
+
+```{zig}
+#| auto_main: false
+#| build_type: "test"
+const expect = @import("std").testing.expect;
+fn fibonacci(index: u32) u32 {
+    if (index < 2) return index;
+    return fibonacci(index - 1) + fibonacci(index - 2);
+}
+
+test "fibonacci" {
+    // test fibonacci at run-time
+    try expect(fibonacci(7) == 13);
+    // test fibonacci at compile-time
+    try comptime expect(fibonacci(7) == 13);
+}
+```
+
+A lot of your Zig source code might potentially be executed at compile-time,
+because the `zig` compiler can figure out the result value of some types of expressions,
+especially if these expressions involve only compile-time known values.
+We have talked about this at @sec-compile-time.
+
+But when you use the `comptime` keyword on an expression, there is no "it might be executed
+at compile-time" anymore. With the `comptime` keyword you are ordering the `zig` compiler
+to execute this expression at compile-time. You are imposing this rule, so it is guaranteed
+that the compiler will always execute it at compile-time. Or, at least, the compiler
+will try to execute it. If the compiler cannot execute the expression for whatever reason,
+the compiler will raise a compilation error.
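
To make the guarantee described above more concrete, here is a minimal sketch (not taken from the book or the Zig documentation). The `square()` function and the test name are hypothetical, chosen only for illustration. Because the call is prefixed with `comptime`, its result is compile-time known and can be used where Zig requires a compile-time value, such as the repetition count of the `**` array operator.

```zig
const std = @import("std");

// Hypothetical helper used only for this illustration.
fn square(x: usize) usize {
    return x * x;
}

test "comptime forces compile-time evaluation" {
    // `comptime` forces this call to run while the program is being compiled,
    // so `len` is a compile-time known value.
    const len = comptime square(4);
    // The `**` operator needs a compile-time known repetition count.
    const buf = [_]u8{0} ** len;
    try std.testing.expectEqual(@as(usize, 16), buf.len);
}
```

If the `comptime` keyword is removed from the call, the expectation is that the compiler rejects the `**` line, since `len` would then be a runtime value.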
+
+
+### Applying over a block
+
+Blocks were described at @sec-blocks. When you apply the `comptime` keyword over a
+block of expressions, you get essentially the same effect as when you apply this keyword to
+a single expression. That is, the entire block of expressions is executed at
+compile-time by the `zig` compiler.
+
+In the example below, we mark the block labeled `blk` as a comptime block,
+and, therefore, the expressions inside this block are executed at compile-time.
+
+```{zig}
+#| auto_main: false
+#| build_type: "test"
+const expect = @import("std").testing.expect;
+fn fibonacci(index: u32) u32 {
+    if (index < 2) return index;
+    return fibonacci(index - 1) + fibonacci(index - 2);
+}
+
+test "fibonacci in a block" {
+    const x = comptime blk: {
+        const n1 = 5;
+        const n2 = 2;
+        const n3 = n1 + n2;
+        try expect(fibonacci(n3) == 13);
+        break :blk n3;
+    };
+    _ = x;
+}
+```
+
+
+
+
+
+## Introducing Generics {#sec-generics}
+
+First of all, what is a generic? Generics are the idea of allowing a type
+(`f64`, `u8`, `u32`, `bool`, and also, user-defined types, like the `User` struct
+that we defined at @sec-structs-and-oop) to be a parameter to methods, classes and
+interfaces [@geeks_generics]. In other words, a "generic" is a class (or a method) that can work
+with multiple data types.
+
+For example, in Java, generics are created through the operator `<>`. With this operator,
+a Java class is capable of receiving a data type as input, and therefore, the class can adapt
+its features according to this input data type.
+As another example, generics in C++ are supported through the concept of templates.
+Class templates in C++ are generics.
+
+In Zig, generics are implemented through `comptime`. The `comptime` keyword
+allows us to collect a data type at compile time, and pass this data type as
+input to a piece of code.
+
+
+### A generic function {#sec-generic-fun}
+
+Take the `max()` function exposed below as a first example.
+This function is essentially a "generic function".
+In this function, we have a comptime function argument named `T`.
+Notice that this `T` argument has a data type of `type`. Weird, right? This `type` keyword is the
+"father of all types", or, "the type of types" in Zig. Because we used this `type` keyword in the `T`
+argument, we are telling the `zig` compiler that this `T` argument will receive some data type
+as input.
+
+Also notice the use of the `comptime` keyword in this argument.
+As I described at @sec-comptime, every time you use this keyword in a function argument,
+this means that the value of this argument must be known at compile-time.
+This makes sense, right? Because there is no data type that you can make
+that is known only at runtime.
+
+Think about this. Every data type that you will ever write is always
+known at compile-time, especially because data types are essential
+information for the compiler to actually compile your source code.
+With this in mind, it makes sense to mark this argument as a comptime argument.
+
+
+```{zig}
+#| auto_main: false
+#| build_type: "test"
+fn max(comptime T: type, a: T, b: T) T {
+    return if (a > b) a else b;
+}
+```
+
+Also notice that the value of the `T` argument is actually used
+to define the data type of the other arguments of the function, `a` and `b`, and also in the
+return type annotation of the function.
+That is, the data type of these arguments (`a` and `b`), and, the return data type of the function itself,
+are determined by the input value given to the `T` argument.
+
+As a result, we have a generic function that works with different data types.
+For example, I can provide `u8` values to this `max()` function, and it will work as expected.
+But if I provide `f64` values instead, it will also work as expected.
+If I did not use a generic function, I would have to write a `max()`
+for each one of the data types that I wanted to use.
+This generic function provides a very useful shortcut for us.
+
+```{zig}
+#| auto_main: false
+#| build_type: "test"
+const std = @import("std");
+fn max(comptime T: type, a: T, b: T) T {
+    return if (a > b) a else b;
+}
+test "test max" {
+    const n1 = max(u8, 4, 10);
+    std.debug.print("Max n1: {d}\n", .{n1});
+    const n2 = max(f64, 89.24, 64.001);
+    std.debug.print("Max n2: {d}\n", .{n2});
+}
+```
+
+```
+Max n1: 10
+Max n2: 89.24
+```
+
+
+
+### A generic data structure {#sec-generic-struct}
+
+Every data structure that you find in the Zig Standard Library (e.g. `ArrayList`, `HashMap`, etc.)
+is essentially a generic data structure.
+These data structures are generic in the sense that they work with any data type you want.
+You just say what the data type of the values stored in this data
+structure is, and they just work as expected.
+
+A generic data structure in Zig is the way to replicate a generic class from Java,
+or, a class template from C++.
+But you may ask yourself: how do we build a generic data structure in Zig?
+
+The basic idea is to write a generic function that creates
+the data structure definition for the specific type we want.
+In other words, this generic function behaves as a "constructor". The function outputs the `struct` definition
+that defines this data structure for a specific data type.
+
+To create such a function, we need to add a comptime argument to this function that receives a data type
+as input. We already learned how to do this in the previous section (@sec-generic-fun).
+
+I think the best way to demonstrate how to create a generic data structure is to actually write one.
+This is where we go to our next small project in this book. This one is a very small project,
+which is to write a stack data structure.
+
+
+
+
+## What is a stack? {#sec-what-stack}
+
+A stack data structure is a structure that follows a LIFO (*last in first out*) principle.
+Only two operations are normally supported in a stack data structure, which are `push` and `pop`. +The `push` operation is used to add new values to the stack, while `pop` is used to remove +values from the stack. + +When people try to explain how the stack data structure works, the most common analogy +that people use is a stack of plates. Imagine that you have +a stack of plates, for example, a stack of 10 plates in your table. +Each plate represents a value that is currently stored in this stack. + +Currently we have a stack with 10 different values, or 10 different plates. Now, imagine that you want to +add a new plate (or a new value) to this stack, which translates to the `push` operation. +You would add this plate (or this value) by just putting the new plate +on the top of the stack. Then, you would increase the stack to 11 plates. + +But how would you remove plates (or remove values) from this stack (a.k.a. the `pop` operation) ? +To do that, we would have to remove the plate on the top of the stack, and, as a result, we would +have, once again, 10 plates in the stack. + +This demonstrates the LIFO concept, because the first plate in the stack, which is the plate +in the bottom of the stack is the last plate to get out of the stack. Think about it. In order +to remove this specific plate from the stack, we have to remove all plates in the +stack. So every operation in the stack, either insertion or deletion, is always made at the top of the stack. +The @fig-stack below exposes this logic visually: + +![A diagram of a stack structure. Source: Wikipedia, the free encyclopedia.](./../Figures/lifo-stack.svg){#fig-stack} + + + +## Writing the stack data structure + +We are going to write the stack data structure in two steps. First, we are going +to implement a stack that can only store `u32` values. Then, after that, we are going +to extend our implementation to make it generic, so that it works with any data type +we want. 
+
+First, we need to decide how the values will be stored inside the stack. There are multiple
+ways to implement the storage behind a stack structure. Some people prefer to use a doubly linked list,
+some others prefer to use a dynamic array, etc. In this example we are going to use an array under the hood
+to store the values in the stack, which is the `items` data member of our `Stack` struct definition.
+
+Also notice in our `Stack` struct that we have three other data members: `capacity`, `length` and `allocator`.
+The `capacity` member contains the capacity of the underlying array that stores the values in the stack.
+The `length` contains the number of values that are currently being stored in the stack.
+And the `allocator` contains the allocator object which will be used by the stack structure whenever it
+needs to allocate more space for the values that are being stored.
+
+We begin by defining an `init()` method of this struct, which is going to be
+responsible for instantiating a `Stack` object. Notice that, inside this
+`init()` method, we start by allocating an array with the capacity specified
+in the `capacity` argument. Then, we use the built-in function `@memset()`
+to set all elements of this new array that we have allocated to zero.
+
+This `@memset()` function is basically equivalent to the `memset()`
+function from the C Standard Library[^cmemset]. In this example, we are using this function to essentially
+assign the value `0` to all elements of this array. We do that to initialize this memory that was allocated.
+Uninitialized memory is a common source of undefined behaviour, so this step
+is made for safety reasons.
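
As a small, isolated sketch of the "allocate, then zero" step described above, the snippet below allocates a `u32` slice and fills it with zeros using `@memset()`. The helper name `allocZeroed()` is hypothetical and is not part of the chapter's `Stack` struct. The sketch uses `std.testing.allocator` only because it is convenient inside a test.

```zig
const std = @import("std");

// Hypothetical helper (not part of the `Stack` struct): allocates `n`
// values of type `u32` and sets every element to zero.
fn allocZeroed(allocator: std.mem.Allocator, n: usize) ![]u32 {
    const buf = try allocator.alloc(u32, n);
    // Initialize the freshly allocated memory before anyone reads it.
    @memset(buf, 0);
    return buf;
}

test "allocated memory is zeroed" {
    const allocator = std.testing.allocator;
    const buf = try allocZeroed(allocator, 4);
    defer allocator.free(buf);
    for (buf) |item| {
        try std.testing.expectEqual(@as(u32, 0), item);
    }
}
```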
+ +[^cmemset]: + + +```{zig} +#| auto_main: false +#| build_type: "lib" +const std = @import("std"); +const Allocator = std.mem.Allocator; + +const Stack = struct { + items: []u32, + capacity: usize, + length: usize, + allocator: Allocator, + + pub fn init(allocator: Allocator, capacity: usize) !Stack { + var buf = try allocator.alloc(u32, capacity); + @memset(buf[0..], 0); + return .{ + .items = buf[0..], + .capacity = capacity, + .length = 0, + .allocator = allocator, + }; + } +}; +``` + + +### Implementing the `push` operation + +Now that we have written the basic logic to create a new `Stack` object, +we can start writing the logic responsible for performing a push operation. +Remember, a push operation in a stack data structure is the operation +responsible for adding a new value to the stack. + +So how can we add a new value to the `Stack` object that we have? +The `push()` function exposed below is an answer to this question. +Remember from what we discussed at @sec-what-stack that values are always added to the top of the stack. +This means that this `push()` function must always find the element in the underlying array of the stack, +that currently represents the top position of the stack, and add this value there. + +First, we have an if statement in this function. This if statement is +checking whether we need to expand the underlying array to store +this new value that we are adding to the stack. In other words, maybe +the underlying array does not have enough capacity to store this new +value, and, in this case, we need to expand our array to get the capacity that we need. + +So, if the logic test in this if statement returns true, it means that the array +does not have enough capacity, and we need to expand it before we store this new value. +So inside this if statement we are executing the necessary expressions +to expand the underlying array. 
+ +Notice that we use the allocator object to allocate a new array that is twice as big +as the current array (`self.capacity * 2`). Then, we use `@memset()` once again to +set all elements in this array to zero, making sure that all memory is properly initialized. + +After that, we use a different built-in function named `@memcpy()`. This built-in function +is equivalent to the `memcpy()` function from the C Standard Library[^cmemcpy]. It is used to +copy the values from one block of memory to another block of memory. In other words, +you can use this function to copy the values from one array into another array. + +[^cmemcpy]: + +We are using this `@memcpy()` built-in function to copy the values that are currently +in the underlying array of the stack object (`self.items`) into our new and bigger array that +we have allocated (`new_buf`). After we execute this function, the `new_buf` contains a copy +of the values that are present at `self.items`. + +Now that we have secured a copy of our current values in the `new_buf` object, we +can now free the memory currently allocated at `self.items`. After that, we just need +to assign our new and bigger array to `self.items`, and update `self.capacity` to match the new size. +Without this last update, the next expansion would compare against the old capacity and copy into a slice of the wrong length. This is the sequence +of steps necessary to expand our array. + + +```{zig} +#| eval: false +pub fn push(self: *Stack, val: u32) !void { + if ((self.length + 1) > self.capacity) { + var new_buf = try self.allocator.alloc( + u32, self.capacity * 2 + ); + @memset(new_buf[0..], 0); + @memcpy( + new_buf[0..self.capacity], self.items + ); + self.allocator.free(self.items); + self.items = new_buf; + self.capacity = self.capacity * 2; + } + + self.items[self.length] = val; + self.length += 1; +} +``` + +After we make sure that we have enough room to store this new value +that we are adding to the stack, all we have to do is to assign +this value to the top element in this stack, and, increase the +value of the `length` attribute by one. We find the top element +in the stack by using the `length` attribute.
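To see the growth path of `push()` in action, here is a minimal, self-contained sketch. It restates the `u32` stack in condensed form so the file compiles on its own. Two things in it are assumptions of this sketch rather than part of the listing above: it records the doubled size in `capacity` after growing, and it includes a minimal `deinit()` so that the test allocator does not report a leak. It also assumes a Zig version where `@memset()` and `@memcpy()` operate on slices (0.11 or later).

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

// Condensed re-statement of the chapter's u32 stack, so this file compiles alone.
const Stack = struct {
    items: []u32,
    capacity: usize,
    length: usize,
    allocator: Allocator,

    pub fn init(allocator: Allocator, capacity: usize) !Stack {
        const buf = try allocator.alloc(u32, capacity);
        @memset(buf, 0);
        return .{
            .items = buf,
            .capacity = capacity,
            .length = 0,
            .allocator = allocator,
        };
    }

    pub fn push(self: *Stack, val: u32) !void {
        if ((self.length + 1) > self.capacity) {
            const new_buf = try self.allocator.alloc(u32, self.capacity * 2);
            @memset(new_buf, 0);
            @memcpy(new_buf[0..self.capacity], self.items);
            self.allocator.free(self.items);
            self.items = new_buf;
            // Assumption of this sketch: record the new size, so that the
            // next growth check compares against the enlarged array.
            self.capacity = self.capacity * 2;
        }
        self.items[self.length] = val;
        self.length += 1;
    }

    // Minimal destructor, included only so the test below does not leak.
    pub fn deinit(self: *Stack) void {
        self.allocator.free(self.items);
    }
};

test "push doubles the capacity when the array is full" {
    var stack = try Stack.init(std.testing.allocator, 2);
    defer stack.deinit();

    try stack.push(10);
    try stack.push(20);
    try std.testing.expectEqual(@as(usize, 2), stack.capacity);

    // The third value does not fit, so this call takes the growth path.
    try stack.push(30);
    try std.testing.expectEqual(@as(usize, 4), stack.capacity);
    try std.testing.expectEqual(@as(usize, 3), stack.length);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 10, 20, 30, 0 }, stack.items);
}
```

Running this with `zig test` exercises both branches of `push()`: the first two calls store values directly, while the third one allocates, copies, frees and only then stores the value.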
+ + + +### Implementing the `pop` operation + +Now, we can implement the pop operation of our stack object. +This is a much easier operation to implement, and the pop method below summarises +all the logic that is needed. + +We just have to find the element in the underlying array that currently represents the top +of the stack, and set this element to zero, to indicate that +this element is "empty". After that, we also need to decrease +the `length` attribute of the stack by one. + +If the current length of the stack is zero, it means that there are +no values being stored in the stack currently. So, in this case, +we can just return from the function and do nothing. +This is what the if statement inside this function is checking for. + +```{zig} +#| eval: false +pub fn pop(self: *Stack) void { + if (self.length == 0) return; + + self.items[self.length - 1] = 0; + self.length -= 1; +} +``` + + + +### Implementing the `deinit` method + +We have implemented the methods responsible for the two main operations +associated with the stack data structure, which are `pop()` and `push()`, +and we have also implemented the method responsible for instantiating +a new `Stack` object, which is the `init()` method. + +But now, we also need to implement the method responsible for destroying +a `Stack` object. In Zig, this task is commonly associated with the method +named `deinit()`. Most struct objects in Zig have such a method, and it +is commonly nicknamed "the destructor method". + +In theory, all we have to do to destroy the `Stack` object is to make +sure that we free the allocated memory for the underlying array, using +the allocator object that is stored inside the `Stack` object. +This is what the `deinit()` method below is doing.
+ +```{zig} +#| eval: false +pub fn deinit(self: *Stack) void { + self.allocator.free(self.items); +} +``` + + + + +## Making it generic + +Now that we have implemented the basic skeleton of our stack data structure, +we can now focus on discussing how we can make it generic. How can we make +this basic skeleton work not only with `u32` values, but also with any other +data type we want? +For example, we might need to create a stack object to store `User` values +in it. How can we make this possible? The answer lies in the use of generics +and `comptime`. + +As I described at @sec-generic-struct, the basic idea is to write a generic +function that returns a struct definition as output. +In theory, we do not need much to transform our `Stack` struct into a generic +data structure. All that we need to do, to make our `Stack` struct +generic, is to transform the underlying array in the stack structure +into a generic array. + +In other words, this underlying array needs to be a "chameleon". It needs to adapt, +and transform itself into an array of the data type that we give to it. For example, if we need to create +a stack that will store `u8` values, then, this underlying array needs to be +a `u8` array (i.e. `[]u8`). But if we need to store `User` values instead, then, +this array needs to be a `User` array (i.e. `[]User`). Etc. + +We do that by using a generic function, because a generic function can receive a data type +as input, and we can pass this data type to the struct definition of our `Stack` object. +Therefore, we can use the generic function to create a `Stack` object that can store +the data type we want. If we want to create a stack structure that stores `User` values, +we pass the `User` data type to this generic function, and it will create for us +the struct definition that describes a `Stack` object that can store `User` values in it. + +Take the code example below as an example.
I truncated some parts of the `Stack` struct +for brevity. If a specific part of our `Stack` struct is not exposed here +in this example, it is because this part did not change from the previous example. +It remains the same. + + + + +```{zig} +#| eval: false +fn Stack(comptime T: type) type { + return struct { + items: []T, + capacity: usize, + length: usize, + allocator: Allocator, + const Self = @This(); + + pub fn init(allocator: Allocator, capacity: usize) !Stack(T) { + var buf = try allocator.alloc(T, capacity); + @memset(buf[0..], 0); + return .{ + .items = buf[0..], + .capacity = capacity, + .length = 0, + .allocator = allocator, + }; + } + + pub fn push(self: *Self, val: T) !void { + // The rest of the struct is truncated + } + }; +} +``` + +Notice that we have created a function in this example named `Stack()`. This function +takes a type as input, and passes this type to the struct definition of our +`Stack` object. The data member `items` is now an array of type `T`, which is the +data type that we passed as input to the function. The function argument +`val` in the `push()` function is now a value of type `T` too. + +Now we finally have a stack data structure that is generic. I mean, we probably still +need to adjust the `@memset()` calls, to truly make it generic for any data type, +because the literal `0` is only a valid value for numeric types. +But this version is good enough, and it works well with the numeric primitive data +types of Zig. + +We can just provide a data type to this function, and it will create a definition of a +`Stack` object that can store values of the data type that we provided. In the example below, we are creating +the definition of a +`Stack` object that can store `u8` values in it. This definition is stored at the `Stacku8` object. +This `Stacku8` object becomes our new struct, it is the struct that we are going to use +to create our `Stack` object.
+ + +```{zig} +#| eval: false +var gpa = std.heap.GeneralPurposeAllocator(.{}){}; +const allocator = gpa.allocator(); +const Stacku8 = Stack(u8); +var stack = try Stacku8.init(allocator, 10); +defer stack.deinit(); +try stack.push(1); +try stack.push(2); +try stack.push(3); +try stack.push(4); +try stack.push(5); +try stack.push(6); + +std.debug.print("Stack len: {d}\n", .{stack.length}); +std.debug.print("Stack capacity: {d}\n", .{stack.capacity}); + +stack.pop(); +std.debug.print("Stack len: {d}\n", .{stack.length}); +stack.pop(); +std.debug.print("Stack len: {d}\n", .{stack.length}); +std.debug.print("Stack state: {any}\n", .{stack.items}); +``` + +``` +Stack len: 6 +Stack capacity: 10 +Stack len: 5 +Stack len: 4 +Stack state: { 1, 2, 3, 4, 0, 0, 0, 0, 0, 0 } +``` + +Every generic data structure in the Zig Standard Library (`ArrayList`, `HashMap`, `SinglyLinkedList`, etc.) is implemented through this logic, +of using a generic function to create the struct definition that can work +with the data type that you provided as input. + + + + +## Conclusion + +The full source code of the stack structure discussed in this chapter is freely available at the official +repository for this book. Just check out the [`stack.zig`](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/data-structures/stack.zig)[^zig-stack] +for the `u32` version of our stack, +and the [`generic_stack.zig`](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/data-structures/generic_stack.zig)[^zig-stack2] +for the generic version, +available inside the `ZigExamples` folder of the repository.
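One loose end from the generic version is the `@memset()` call, which writes the literal `0` and therefore only fits numeric element types. A possible way around it, shown here only as a sketch and not as the approach used in the repository, is to ask `std.mem.zeroes()` for an "all zeroes" value of the element type. The `User` struct and the `allocZeroed()` helper below are hypothetical names invented for this illustration.

```zig
const std = @import("std");

// Hypothetical element type, standing in for the `User` values mentioned in the text.
const User = struct {
    id: u64,
    age: u8,
};

// Hypothetical helper: allocate `n` elements of type `T` and fill them
// with the zero value of `T`, whatever `T` happens to be.
fn allocZeroed(comptime T: type, allocator: std.mem.Allocator, n: usize) ![]T {
    const buf = try allocator.alloc(T, n);
    @memset(buf, std.mem.zeroes(T));
    return buf;
}

test "zero-initialising a slice of structs" {
    const users = try allocZeroed(User, std.testing.allocator, 3);
    defer std.testing.allocator.free(users);

    try std.testing.expectEqual(@as(usize, 3), users.len);
    try std.testing.expectEqual(@as(u64, 0), users[2].id);
    try std.testing.expectEqual(@as(u8, 0), users[2].age);
}
```

The same expression (`std.mem.zeroes(T)`) could replace the `0` in the `@memset()` calls of `init()` and `push()`, and in the assignment inside `pop()`. Whether an all-zero value is a meaningful "empty" marker for a given type is a design decision that this sketch does not settle.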
+ + +[^zig-stack]: +[^zig-stack2]: + diff --git a/Chapters/12-file-op.qmd b/Chapters/12-file-op.qmd index 3f880833..687c4055 100644 --- a/Chapters/12-file-op.qmd +++ b/Chapters/12-file-op.qmd @@ -1,898 +1,898 @@ ---- -engine: knitr -knitr: true -syntax-definition: "../Assets/zig.xml" ---- - -```{r} -#| include: false -source("../zig_engine.R") -knitr::opts_chunk$set( - auto_main = FALSE, - build_type = "lib" -) -``` - - - - -# Filesystem and Input/Output (IO) {#sec-filesystem} - -In this chapter we are going to discuss how to use the cross-platform structs and functions available -in the Zig Standard Library that executes filesystem operations. Most of these functions and structs -comes from the `std.fs` module. - -We are also going to talk about Input/Output (also known as IO) operations in Zig. Most of -these operations are made by using the structs and functions from `std.io` module, which defines -descriptors for the *standard channels* of your system (`stdout` and `stdin`), and also, -functions to create and use I/O streams. - - -## Input/Output basics {#sec-io-basics} - -If you have programming experience in a high-level language, you certainly have used before -the input and output functionalities of this language. In other words, you certainly have -been in a situation where you needed to sent some output to the user, or, to receive an input -from the user. - -For example, in Python we can receive some input from the user by using the `input()` built-in -function. But we can also print (or "show") some output to the user by using the `print()` -built-in function. So yes, if you have programmed before in Python, you certainly have -used these functions once before. - -But do you know how these functions relate back to your operating system (OS)? How exactly -they are interacting with the resources of your OS to receive or sent input/output. 
-In essence, these input/output functions from high-level languages are just abstractions -over the *standard output* and *standard input* channels of your operating system. - -This means that we receive an input, or send some output, through the operating system. -It is the OS that makes the bridge between the user and your program. Your program -does not have direct access to the user. It is the OS that intermediates every -message exchanged between your program and the user. - -The *standard output* and *standard input* channels of your OS are commonly known as the -`stdout` and `stdin` channels of your OS, respectively. In some contexts, they are also called the *standard output device* -and *standard input device*. As the name suggests, the *standard output* -is the channel through which output flows, while the *standard input* is the channel through which -input flows. - -Furthermore, OS's also normally create a dedicated channel for exchanging error messages, known as the -*standard error* channel, or, the `stderr` channel. This is the channel to which error and warning messages -are usually sent. These are the messages that are normally displayed in red-like or orange-like colors -in your terminal. - -Normally, every OS (e.g. Windows, MacOS, Linux, etc.) creates a dedicated and separate set of -*standard output*, *standard error* and *standard input* channels for every single program (or process) that runs in your computer. -This means that every program you write has a dedicated `stdin`, `stderr` and `stdout` that are separate -from the `stdin`, `stderr` and `stdout` of other programs and processes that are currently running. - -This is a behaviour from your OS. -This does not come from the programming language that you are using. -Because as I said earlier, input and output in programming languages, especially -in high-level ones, are just a simple abstraction over the `stdin`, `stderr` and `stdout` from your current OS.
-That is, your OS is the intermediary between every input/output operation made in your program, -regardless of the programming language that you are using. - - -### The writer and reader pattern {#sec-writer-reader} - -In Zig, there is a pattern around input/output (IO). I (the author of this book) don't know if there is an official name for this pattern. -But here, in this book, I will call it the "writer and reader pattern". In essence, every IO operation in Zig is -made through either a `GenericReader` or a `GenericWriter` object[^gen-zig]. - -These two data types come from the `std.io` module of the Zig Standard Library. As their names suggests, a -`GenericReader` is an object that offers tools to read data from "something" (or "somewhere"), while a `GenericWriter` -offers tools to write data into this "something". -This "something" might be different things: like a file that exists in your filesystem; or, it might be a network socket of your system[^sock]; or, -a continuous stream of data, like a standard input device from your system, that might be constantly -receiving new data from users, or, as another example, a live chat in a game that is constantly receiving and displaying new messages from the -players of the game. - -[^gen-zig]: Previously, these objects were known as the `Reader` and `Writer` objects. -[^sock]: The socket objects that we have created at @sec-create-socket, are examples of network sockets. - -So, if you want to **read** data from something, or somewhere, it means that you need to use a `GenericReader` object. -But if you need instead, to **write** data into this "something", then, you need to use a `GenericWriter` object instead. -Both of these objects are normally created from a file descriptor object. More specifically, through the `writer()` and `reader()` -methods of this file descriptor object. If you are not familiar with this type of object, go to the -next section. 
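As a rough sketch of this writer and reader pair, the test below writes into and reads from an in-memory buffer instead of a real file descriptor, so it can run anywhere. It assumes the Zig 0.13/0.14-era `std.io` API used in this chapter (`std.io.fixedBufferStream()` stands in for the "something"), and the `greet()` helper is a hypothetical name made up for this illustration. It uses the `print()` and `readAll()` methods, which are described in the paragraphs that follow.

```zig
const std = @import("std");

// Hypothetical helper: it accepts any writer object (from a file, a socket,
// `stdout`, ...) and sends a formatted message to it with `print()`.
fn greet(writer: anytype, name: []const u8) !void {
    try writer.print("Hello, {s}!\n", .{name});
}

test "write to and read from an in-memory stream" {
    // A fixed buffer stream plays the role of the "something" here.
    var storage: [32]u8 = undefined;
    var sink = std.io.fixedBufferStream(&storage);
    try greet(sink.writer(), "Pedro");
    try std.testing.expectEqualStrings("Hello, Pedro!\n", sink.getWritten());

    // Read the same bytes back through a reader object.
    var source = std.io.fixedBufferStream(sink.getWritten());
    var first_five: [5]u8 = undefined;
    const n = try source.reader().readAll(&first_five);
    try std.testing.expectEqual(@as(usize, 5), n);
    try std.testing.expectEqualStrings("Hello", &first_five);
}
```

Because `greet()` only relies on the writer interface, the same function could be handed the writer of `stdout` or of an open file without any change.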
- -Every `GenericWriter` object has methods like `print()`, which allows you to write/send a formatted string -(i.e. this formatted string is like a `f` string in Python, or, similar to the `printf()` C function) -into the "something" (file, socket, stream, etc.) that you are using. It also has a `writeAll()` method, which allows you to -write a string, or, an array of bytes into the "something". - -Likewise, every `GenericReader` object has methods like `readAll()`, which allows you to read the -data from the "something" (file, socket, stream, etc.) until it fills a particular array (i.e. a "buffer") object. -In other words, if you provide an array object of 300 `u8` values to `readAll()`, then, this method attempts to read 300 bytes -of data from the "something", and it stores them into the array object that you have provided. - -We also have other methods, like the `readAtLeast()` method, -which allows you to specify the minimum number of bytes you want to read from the "something". -In more detail, if you give the number $n$ as input to this method, then, it will attempt to read at least $n$ bytes of data from the "something". -The "something" might have fewer than $n$ bytes of data available for you to read, so, it is not guaranteed -that you will get precisely $n$ bytes as result. - -Another useful method is `readUntilDelimiterOrEof()`. In this method, you specify a "delimiter character". -The idea is that this function will attempt to read as many bytes of data as possible from the "something", -until it encounters the end of the stream, or, it encounters the "delimiter character" that you have specified. - -If you don't know exactly how many bytes will come from the "something", you may find the `readAllAlloc()` method -useful. In essence, you provide an allocator object to this method, so that it can allocate more space if needed.
-As consequence, this method will try to read all bytes of the "something", and, if it runs out of space at some point -during the "reading process", it uses the allocator object to allocate more space to continue reading the bytes. -As result, this method returns a slice to the array object containing all the bytes read. - -This is just a quick description of the methods present in these types of objects. But I recommend you -to read the official docs, both for -[`GenericWriter`](https://ziglang.org/documentation/master/std/#std.io.GenericWriter)[^gen-write] and -[`GenericReader`](https://ziglang.org/documentation/master/std/#std.io.GenericReader)[^gen-read]. -I also think it is a good idea to read the source code of the modules in the Zig Standard Library -that defines the methods present in these objects, which are the -[`Reader.zig`](https://github.com/ziglang/zig/blob/master/lib/std/io/Reader.zig)[^mod-read] -and [`Writer.zig`]()[^mod-write]. - -[^gen-read]: . -[^gen-write]: . -[^mod-read]: . -[^mod-write]: . - - -### Introducing file descriptors {#sec-file-descriptor} - -A "file descriptor" object is a core component behind every I/O operation that is made in any operating system (OS). -Such object is an identifier for a particular input/output (IO) resource from your OS [@wiki_file_descriptor]. -It describes and identifies this particular resource. An IO resource might be: - -- an existing file in your filesystem. -- an existing network socket. -- other types of stream channels. -- a pipeline (or just "pipe") in your terminal[^pipes]. - -[^pipes]: A pipeline is a mechanism for inter-process communication, or, inter-process IO. You could also interpret a pipeline as a "set of processes that are chained together, through the standard input/output devices of the system". At Linux for example, a pipeline is created inside a terminal, by connecting two or more terminal commands with the "pipe" character (`|`). 
- -From the bullet points listed above, we know that although the term "file" is present, -a "file descriptor" might describe something more than just a file. -This concept of a "file descriptor" comes from the Portable Operating System Interface (POSIX) API, -which is a set of standards that guide how operating systems across the world should be implemented, -to maintain compatibility between them. - -A file descriptor not only identifies the input/output resource that you are using to receive or send some data, -but it also describes where this resource is, and also, which IO mode this resource is currently using. -For example, this IO resource might be using only the "read" IO mode, which means that this resource -is open to "read operations", while "write operations" are closed and not authorized. -These IO modes are essentially the modes that you provide to the argument `mode` -from the `fopen()` C function, and also, from the `open()` Python built-in function. - -In C, you usually work with a file descriptor through a `FILE` pointer (which wraps the underlying integer descriptor), but, in Zig, a file descriptor is a `File` object. -This data type (`File`) is described in the `std.fs` module of the Zig Standard Library. -We normally don't create a `File` object directly in our Zig code. Instead, we normally get such an object as a result when we -open an IO resource. In other words, we normally ask our OS to open and use a particular IO -resource, and, if the OS successfully opens this IO resource, the OS normally hands back to us -a file descriptor to this particular IO resource. - -So you usually get a `File` object by using functions and methods from the Zig Standard Library -that ask the OS to open some IO resources, like the `openFile()` method that opens a file in the -filesystem. The `net.Stream` object that we have created at @sec-create-socket is also a type of -file descriptor object.
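The idea of "asking the OS for a resource and getting a `File` back" can be sketched as follows. This is a minimal illustration under a few assumptions: it uses a temporary directory from `std.testing.tmpDir()` so that it does not depend on any file in your machine, the file name `note.txt` is made up for the example, and it targets the Zig 0.13/0.14-era `std.fs` API.

```zig
const std = @import("std");

test "opening a file hands back a File object" {
    // A throwaway directory, so the example does not touch real files.
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    // First create the file and put some bytes in it.
    {
        const created = try tmp.dir.createFile("note.txt", .{});
        defer created.close();
        try created.writeAll("hello");
    }

    // Now ask the OS to open it. With the default flags (`.{}`),
    // the file is opened in read-only mode.
    const file: std.fs.File = try tmp.dir.openFile("note.txt", .{});
    defer file.close();

    var buf: [16]u8 = undefined;
    const n = try file.readAll(&buf);
    try std.testing.expectEqualStrings("hello", buf[0..n]);
}
```

Notice that the `File` value is never constructed by hand: it comes back from `createFile()` or `openFile()`, and it is released with `close()` when we are done with the resource.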
- - -### The *standard output* - -You already saw across this book, how can we access and use specifically the `stdout` in Zig -to send some output to the user. -For that, we use the `getStdOut()` function from the `std.io` module. This function returns -a file descriptor that describes the `stdout` channel of your current OS. Through this file -descriptor object, we can read from or write stuff to the `stdout` of our program. - -Although we can read stuff recorded into the `stdout` channel, we normally only -write to (or "print") stuff into this channel. The reason is very similar to what we discussed at -@sec-read-http-message, when we were discussing what "reading from" versus "writing to" the connection -object from our small HTTP Server project would mean. - -When we write stuff into a channel, we are essentially sending data to the other end of this channel. -In contrast, when we read stuff from this channel, we are essentially reading the data that was sent -through this channel. Since the `stdout` is a channel to send output to the user, the key verb here -is **send**. We want to send something to someone, and, as consequence, we want to **write** something -into some channel. - -That is why, when we use `getStdOut()`, most of the times, we also use the `writer()` method from the `stdout` file descriptor, -to get access to a writer object that we can use to write stuff into this `stdout` channel. -More specifically, this `writer()` method returns a `GenericWriter` object. One of the -main methods of this `GenericWriter` object is the `print()` method that we have used -before to write (or "print") a formatted string into the `stdout` channel. - -```{zig} -#| auto_main: false -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -pub fn main() !void { - try stdout.writeAll( - "This message was written into stdout.\n" - ); -} -``` - -``` -This message was written into stdout. 
-``` - - -This `GenericWriter` object is like any other generic writer object that you would normally get from a file descriptor object. -So, the same methods from a generic writer object that you would use while writing files to the filesystem for example, you could also -use them here, from the file descriptor object of `stdout`, and vice-versa. - - -### The *standard input* - -You can access the *standard input* (i.e. `stdin`) in Zig by using the `getStdIn()` function from the `std.io` module. -Like its sister (`getStdOut()`), this function also returns a file descriptor object that describes the `stdin` channel -of your OS. - -Since we now want to receive some input from the user, the key verb here becomes **receive**, and, as a consequence, -we usually want to **read** data from the `stdin` channel, instead of writing data into it. So, we normally use -the `reader()` method of the file descriptor object returned by `getStdIn()`, to get access to a `GenericReader` -object that we can use to read data from `stdin`. - -In the example below, we are creating a small buffer capable of holding 20 characters. Then, we try to read -the data from the `stdin` with the `readUntilDelimiterOrEof()` method, and save this data into the `buffer` object. -Also notice that we are reading the data from the `stdin` until we hit a new line character (`'\n'`). - -If you execute this program, you will notice that this program pauses its execution, and starts to wait indefinitely -for some input from the user. In other words, you need to type your name into the terminal, and then, you press Enter to -send your name to `stdin`. After you send your name to `stdin`, the program reads this input, and continues with the execution, -by printing the given name to `stdout`. In the example below, I typed my name (Pedro) into the terminal, and then, pressed Enter.
- - -```{zig} -#| eval: false -#| auto_main: false -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const stdin = std.io.getStdIn().reader(); -pub fn main() !void { - try stdout.writeAll("Type your name\n"); - var buffer: [20]u8 = undefined; - @memset(buffer[0..], 0); - _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\n'); - try stdout.print("Your name is: {s}\n", .{buffer}); -} -``` - -``` -Type your name -Your name is: Pedro - -``` - - -### The *standard error* - -The *standard error* (a.k.a. the `stderr`) works exactly the same way as the `stdout`. -You just call the `getStdErr()` function from the `std.io` module, and you get the file descriptor to `stderr`. -Ideally, you should write only error or warning messages to `stderr`, because this is -the purpose of this channel. - - - - - -## Buffered IO - -As we described at @sec-io-basics, input/output (IO) operations are made directly by the operating system. -It is the OS that manages the IO resource that you want to use for your IO operations. -The consequence of this fact is that IO operations are heavily based on system calls (i.e. calling the operating system directly). - -Just to be clear, there is nothing particularly wrong with system calls. We use them all the time on -any serious codebase written in any low-level programming language. However, system calls are -usually orders of magnitude slower than many other types of operations. - -So it is perfectly fine to use a system call once in a while. But when these system calls start to be used often, -you can, most of the time, clearly notice the loss of performance in your application. So, a good rule of thumb -is to use a system call only when it is needed, and also, only in infrequent situations, to reduce -the number of system calls performed to a minimum. - - -### Understanding how buffered IO works - -Buffered IO is a strategy to achieve better performance.
It is used to reduce the number of system calls made by IO operations, and, as a -consequence, achieve a much higher performance. At @fig-buff-diff you can find two different diagrams which present the differences between -read operations performed in an unbuffered IO environment versus a buffered IO environment. - -To give a better context to these diagrams, let's suppose that we have a text file that contains the famous Lorem ipsum text[^lorem] -in our filesystem. Let's also suppose that these diagrams at @fig-buff-diff -are showing the read operations that we are performing to read the Lorem ipsum text from this text file. -The first thing you notice when looking at the diagrams, is that in an unbuffered environment the read operations lead to many system calls. -More precisely, in the diagram exposed at @fig-unbuffered-io we get one system call for each byte that we read from the text file. -On the other hand, at @fig-buffered-io we have only one system call at the very beginning. - -When we use a buffered IO system, at the first read operation we perform, instead of sending one single byte directly -to our program, the OS first sends a chunk of bytes from the file to a buffer object (i.e. an array). -This chunk of bytes is cached/stored inside this buffer object, and when this operation is done, then -your program receives the byte that it actually asked for. - -From now on, for every new read operation that you perform, instead of making a new system call to ask -the OS for the next byte in the file, this read operation is redirected to the buffer object, which has -this next byte already cached and ready to go. - - -[^lorem]: . - -::: {#fig-buff-diff layout-nrow=2} - -![Unbuffered IO](./../Figures/unbuffered-io.png){#fig-unbuffered-io width=60%} - -![Buffered IO](./../Figures/buffered-io.png){#fig-buffered-io} - -Diagrams of read operations performed in buffered IO and unbuffered IO environments.
- -::: - -This is the basic logic behind buffered IO systems. The size of the buffer object depends, but most of the times, -it is equal to a full page of memory (4096 bytes). If we follow this logic, then, the OS reads the first 4096 bytes -of the file and caches it into the buffer object. As long as your program does not consume all of the 4096 bytes from the buffer, -not a single system call is created. - -However, as soon as you consume all of the 4096 bytes from the buffer, it means that there is no bytes left in the buffer. -In this situation, a new system call is made to ask the OS to send the next 4096 bytes in the file, and once again, -these bytes are cached into the buffer object, and the cycle starts once again. - - -### Buffered IO across different languages - -IO operations made through a `FILE` pointer in C are buffered -by default, so, at least in C, you don't need to worry about this subject. But in contrast, IO operations in both Rust and Zig are not -buffered depending on which functions from the standard libraries that you are using. - -For example, in Rust, buffered IO is implemented through the `BufReader` and `BufWriter` structs, while in Zig, it is implemented -through the `BufferedReader` and `BufferedWriter` structs. -So any IO operation that you perform through the `GenericWriter` and `GenericReader` objects -that I presented at @sec-writer-reader are not buffered, which means that these objects -might create a lot of system calls depending on the situation. - - -### Using buffered IO in Zig - -Using buffered IO in Zig is actually very easy. All you have to do is to just -give the `GenericWriter` object to the `bufferedWriter()` function, or, to give the `GenericReader` -object to the `bufferedReader()` function. These functions come from the `std.io` module, -and they will construct the `BufferedWriter` or `BufferedReader` object for you. 
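The effect of the intermediate buffer can be observed with a small sketch on the writing side. It assumes the Zig 0.13/0.14-era `std.io` API, and it uses an in-memory `std.io.fixedBufferStream()` as the underlying destination, instead of a real file, so that we can inspect what has actually arrived there. The `flush()` method used below is not discussed in this chapter: on a `BufferedWriter` it pushes the cached bytes to the underlying writer.

```zig
const std = @import("std");

test "a buffered writer holds the bytes until flush" {
    // The underlying destination (stands in for a file or for stdout).
    var storage: [32]u8 = undefined;
    var sink = std.io.fixedBufferStream(&storage);

    // Wrap the unbuffered writer with a buffered one.
    var buffered = std.io.bufferedWriter(sink.writer());
    try buffered.writer().writeAll("lorem");

    // The five bytes are still cached inside `buffered`;
    // nothing has reached the underlying destination yet.
    try std.testing.expectEqual(@as(usize, 0), sink.getWritten().len);

    // Flushing sends the cached bytes to the destination in one go.
    try buffered.flush();
    try std.testing.expectEqualStrings("lorem", sink.getWritten());
}
```

With a real file descriptor as the destination, each delivery to the underlying writer corresponds to a write request to the OS, so collecting many small writes into one flush is where the saving comes from. It also means that forgetting to call `flush()` before the program ends can leave the last bytes unwritten.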
- -After you create this new `BufferedWriter` or `BufferedReader` object, you can call the `writer()` -or `reader()` method of this new object, to get access to a new (and buffered) generic reader or -generic writer. - -Let's describe the process once again. Every time that you have a file descriptor object, you first get the generic writer or generic reader -object from it, by calling the `writer()` or `reader()` methods of this file descriptor object. -Then, you provide this generic writer or generic reader to the `bufferedWriter()` or `bufferedReader()` -function, which creates a new `BufferedWriter` or `BufferedReader` object. Then, you call -the `writer()` or `reader()` methods of this buffered writer or buffered reader object, -which gives you access to a generic writer or generic reader object that is buffered. - -Take this program as an example. This program is essentially demonstrating the process exposed at @fig-buffered-io. -We are simply opening a text file that contains the Lorem ipsum text, and then, we create a buffered IO reader object -at `bufreader`, and we use this `bufreader` object to read the contents of this file into a buffer object, then, -we end the program by printing this buffer to `stdout`. - - -```{zig} -#| eval: false -var file = try std.fs.cwd().openFile( - "ZigExamples/file-io/lorem.txt", .{} -); -defer file.close(); -var buffered = std.io.bufferedReader(file.reader()); -var bufreader = buffered.reader(); - -var buffer: [1000]u8 = undefined; -@memset(buffer[0..], 0); - -_ = try bufreader.readUntilDelimiterOrEof( - buffer[0..], '\n' -); -try stdout.print("{s}\n", .{buffer}); -``` - -``` -Lorem ipsum dolor sit amet, consectetur -adipiscing elit. Sed tincidunt erat sed nulla ornare, nec -aliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus, -ultrices eleifend porttitor ut, finibus ut tortor. Maecenas -sapien justo, finibus tincidunt dictum ac, semper et lectus. -Vivamus molestie egestas orci ac viverra. 
Pellentesque nec -arcu facilisis, euismod eros eu, sodales nisl. Ut egestas -sagittis arcu, in accumsan sapien rhoncus sit amet. Aenean -neque lectus, imperdiet ac lobortis a, ullamcorper sed massa. -Nullam porttitor porttitor erat nec dapibus. Ut vel dui nec -nulla vulputate molestie eget non nunc. Ut commodo luctus ipsum, -in finibus libero feugiat eget. Etiam vel ante at urna tincidunt -posuere sit amet ut felis. Maecenas finibus suscipit tristique. -Donec viverra non sapien id suscipit. -``` - -Despite being a buffered IO reader, this `bufreader` object is similar to any other `GenericReader` object, -and has the exact same methods. So, although these two types of objects perform very different IO operations, -they have the same interface, so, you the programmer, can interchangeably use them -without the need to change anything in your source code. -So a buffered IO reader or a buffered IO writer object has the same methods as its generic and unbuffered brothers, -i.e. the generic reader and generic writer objects that I presented at @sec-writer-reader. - -::: {.callout-tip} -In general, you should always use a buffered IO reader or a buffered IO writer object to perform -IO operations in Zig. Because they deliver better performance to your IO operations. -::: - - -## Filesystem basics - -Now that we have discussed the basics around Input/Output operations in Zig, we need to -talk about the basics around filesystems, which are another core part of any operating system. -Also, filesystems are related to input/output, because the files that we store and create in our -computer are considered an IO resource, as we described at @sec-file-descriptor. - -Just as when we were talking about input/output, if you have ever programmed in your life, you probably know -some basics about filesystems and file operations, etc. -But, since I don't know you, I don't know what your background is. 
As a result, -these concepts that I will describe might be clear in your mind, but they also may not be as clear as you think. -Just bear with me, while I try to put everyone on the same page. - - -### The concept of current working directory (CWD) - -The working directory is the folder on your computer where you are currently rooted, -or in other words, it is the folder that your program is currently looking at. -Therefore, whenever you are executing a program, this program is always working with -a specific folder on your computer. It is always in this folder that the program will initially -look for the files you require, and it is also in this folder that the program -will initially save all the files you ask it to save. - -The working directory is determined by the folder from which you invoke your program -in the terminal. In other words, if you are in the terminal of your OS, and you -execute a binary file (i.e. a program) from this terminal, the folder to which your terminal -is pointing is the current working directory of your program that is being executed. - -At @fig-cwd we have an example of me executing a program from the terminal. We are executing -the program outputted by the `zig` compiler by compiling the Zig module named `hello.zig`. -The CWD in this case is the `zig-book` folder. In other words, while the `hello.zig` program -is executing, it will be looking at the `zig-book` folder, and any file operation that we perform -inside this program, will be using this `zig-book` folder as the "starting point", or, as the "central focus". - -![An example of executing a program from the terminal](./../Figures/cwd.png){#fig-cwd} - -Just because we are rooted inside a particular folder (in the case of @fig-cwd, the `zig-book` folder) of our computer, -it doesn't mean that we cannot access or write resources in other locations of our computer. 
-The current working directory (CWD) mechanism just defines where your program will look first -for the files you ask for. This does not prevent you from accessing files that are located -elsewhere on your computer. However, to access any file that is in a folder other than your -current working directory, you must provide a path to that file or folder. - - -### The concept of paths - -A path is essentially a location. It points to a location in your filesystem. We use -paths to describe the location of files and folders in our computer. -One important aspect is that paths are always written inside strings, -i.e. they are always provided as text values. - -There are two types of paths that you can provide to any program in any OS: a relative path, or an absolute path. -Absolute paths are paths that start at the root of your filesystem, and go all the way to the file name or the specific folder -that you are referring to. This type of path is called absolute, because it points to a unique, absolute location on your computer. -That is, there is no other existing location on your computer that corresponds to this path. It is a unique identifier. - -In Windows, an absolute path is a path that starts with a hard disk identifier (e.g. `C:/Users/pedro`). -On the other hand, absolute paths in Linux and MacOS, are paths that start with a forward slash character (e.g. `/usr/local/bin`). -Notice that a path is composed of "segments". Each segment is connected to the next one by a slash character (`\` or `/`). -On Windows, the backward slash (`\`) is normally used to connect the path segments. While on Linux and MacOS, the forward -slash (`/`) is the character used to connect path segments. - -In contrast, a relative path is a path that starts at the CWD. In other words, a relative path is -"relative to the CWD". The path used to access the `hello.zig` file at @fig-cwd is an example of a relative path. This path -is reproduced below. 
This path begins at the CWD, which in the context of @fig-cwd, is the `zig-book` folder, -then, it goes to the `ZigExamples` folder, then, into `zig-basics`, then, to the `hello.zig` file. - -``` -ZigExamples/zig-basics/hello_world.zig -``` - - -### Path wildcards - -When providing paths, specially relative paths, you have the option of using a *wildcard*. -There are two commonly used *wildcards* in paths, which are "one period" (.) and "two periods" (..). -In other words, these two specific characters have special meanings when used in paths, -and can be used on any operating system (Mac, Windows, Linux, etc.). That is, they -are "cross platform". - -The "one period" represents an alias for your current working directory. -This means that the relative paths `"./Course/Data/covid.csv"` and `"Course/Data/covid.csv"` are equivalent. -On the other hand, the "two periods" refers to the previous directory. -For example, the path `"Course/.."` is equivalent to the path `"."`, that is, the current working directory. - -Therefore, the path `"Course/.."` refers to the folder before the `Course` folder. -As another example, the path `"src/writexml/../xml.cpp"` refers to the file `xml.cpp` -that is inside the folder before the `writexml` folder, which in this example is the `src` folder. -Therefore, this path is equivalent to `"src/xml.cpp"`. - - - - -## The CWD handler - -In Zig, filesystem operations are usually made through a directory handler object. -A directory handler in Zig is an object of type `Dir`, which is an object that describes -a particular folder in the filesystem of our computer. -You normally create a `Dir` object, by calling the `std.fs.cwd()` function. -This function returns a `Dir` object that points to (or, that describes) the -current working directory (CWD). - -Through this `Dir` object, you can create new files, or modify, or read existing ones that are -inside your CWD. 
In other words, a `Dir` object is the main entrypoint in Zig to perform -multiple types of filesystem operations. -In the example below, we are creating this `Dir` object, and storing it -inside the `cwd` object. Although we are not using this object at this code example, -we are going to use it a lot over the next examples. - -```{zig} -#| auto_main: true -const cwd = std.fs.cwd(); -_ = cwd; -``` - - - - - - - - -## File operations - -### Creating files {#sec-creating-files} - -We create new files by using the `createFile()` method from the `Dir` object. -Just provide the name of the file that you want to create, and this function will -do the necessary steps to create such file. You can also provide a relative path to this function, -and it will create the file by following this path, which is relative to the CWD. - -This function might return an error, so, you should use `try`, `catch`, or any of the other methods presented -at @sec-error-handling to handle the possible error. But if everything goes well, -this `createFile()` method returns a file descriptor object (i.e. a `File` object) as result, -through which you can add content to the file with the IO operations that I presented before. - -Take this code example below. In this example, we are creating a new text file -named `foo.txt`. If the function `createFile()` succeeds, the object named `file` will contain a file descriptor -object, which we can use to write (or add) new content to the file, like we do in this example, by using -a buffered writer object to write a new line of text to the file. - -Now, a quick note, when we create a file descriptor object in C, by using a C function like `fopen()`, we must always close the file -at the end of our program, or, as soon as we complete all operations that we wanted to perform -on the file. In Zig, this is no different. So everytime we create a new file, this file remains -"open", waiting for some operation to be performed. 
As soon as we are done with it, we always have -to close this file, to free the resources associated with it. -In Zig, we do this by calling the method `close()` from the file descriptor object. - - -```{zig} -#| eval: false -#| auto_main: true -const cwd = std.fs.cwd(); -const file = try cwd.createFile("foo.txt", .{}); -// Don't forget to close the file at the end. -defer file.close(); -// Do things with the file ... -var fw = file.writer(); -_ = try fw.writeAll( - "Writing this line to the file\n" -); -``` - - -So, in this example we have not only created a file in the filesystem, -but we also wrote some data into this file, using the file descriptor object -returned by `createFile()`. If the file that you are trying to create -already exists in your filesystem, this `createFile()` call will -overwrite the contents of the file, or, in other words, it will -in practice erase all the contents of the existing file. - -If you don't want this to happen, meaning, that you don't want to overwrite -the contents of the existing file, but you want to write data to this file anyway -(i.e. you want to append data to the file), you should use the `openFile()` -method from the `Dir` object. - -Another important aspect about `createFile()` is that this method creates a file -that is not open for read operations by default. It means that you cannot read this file. -You are not allowed to. -So for example, you might want to write some stuff into this file at the beginning of the execution -of your program. Then, at a future point in your program you might need to read what you have -written into this file. If you try to read data from this file, you will likely -get a `NotOpenForReading` error as result. - - -But how can you overcome this barrier? How can you create a file that is open -to read operations? All you have to do, is to set the `read` flag to true -in the second argument of `createFile()`. 
When you set this flag to true, -then the file gets created with "read permissions", and, as a consequence, -a program like this one below becomes valid: - - -```{zig} -#| eval: false -#| auto_main: true -#| build_type: "run" -const cwd = std.fs.cwd(); -const file = try cwd.createFile("foo.txt", .{ .read = true }); -defer file.close(); - -var fw = file.writer(); -_ = try fw.writeAll("We are going to read this line\n"); - -var buffer: [300]u8 = undefined; -@memset(buffer[0..], 0); -try file.seekTo(0); -var fr = file.reader(); -_ = try fr.readAll(buffer[0..]); -try stdout.print("{s}\n", .{buffer}); -``` - - -``` -We are going to read this line -``` - - -If you are not familiar with position indicators, you may not recognize what the method -`seekTo()` is, or, what it does. If that is your case, do not worry, -we are going to talk more about this method at @sec-indicators. But essentially -this method is moving the position indicator back to the beginning of the file, -so that we can read the contents of the file from the beginning. - - -### Opening files and appending data to it - -Opening files is easy. Just use the `openFile()` method instead of `createFile()`. -In the first argument of `openFile()` you provide the path to the file that -you want to open. Then, on the second argument you provide the flags (or, the options) -that dictate how the file is opened. - -You can see the full list of options for `openFile()` by visiting the documentation for -[`OpenFlags`](https://ziglang.org/documentation/master/std/#std.fs.File.OpenFlags)[^oflags]. -But the main flag that you will most certainly be worried about is the `mode` flag. -This flag specifies the IO mode that the file will be using when it gets opened. -There are three IO modes, or, three values that you can provide to this flag, which are: - -- `read_only`, allows only read operations on the file. All write operations are blocked. -- `write_only`, allows only write operations on the file. 
All read operations are blocked. -- `read_write`, allows both write and read operations on the file. - -[^oflags]: - -These modes are similar to the modes that you provide to the `mode` argument of the -`open()` Python built-in function[^py-open], or, the `mode` argument of the -`fopen()` C function[^c-open]. -In the code example below, we are opening the `foo.txt` text file with a `write_only` mode, -and appending a new line of text to the end of the file. We use `seekFromEnd()` this time -to guarantee that we are going to append the text to the end of the file. Once again, methods -such as `seekFromEnd()` are described in more depth at @sec-indicators. - -[^py-open]: -[^c-open]: - - - -```{zig} -#| eval: false -#| auto_main: true -#| build_type: "run" -const cwd = std.fs.cwd(); -const file = try cwd.openFile("foo.txt", .{ .mode = .write_only }); -defer file.close(); -try file.seekFromEnd(0); -var fw = file.writer(); -_ = try fw.writeAll("Some random text to write\n"); -``` - - -### Deleting files - -Sometimes, we just need to delete/remove the files that we have. -To do that, we use the `deleteFile()` method. You just provide the path of the -file that you want to delete, and this method will try to delete the file located -at this path. - -```{zig} -#| eval: false -const cwd = std.fs.cwd(); -try cwd.deleteFile("foo.txt"); -``` - -### Copying files - -To copy existing files, we use the `copyFile()` method. The first argument in this method -is the path to the file that you want to copy. The second argument is a `Dir` object, i.e. a directory handler, -more specifically, a `Dir` object that points to the folder in your computer where you want to -copy the file to. The third argument is the new path of the file, or, in other words, the new location -of the file. The fourth argument is the options (or flags) to be used in the copy operation. - -The `Dir` object that you provide as input to this method will be used to copy the file to -the new location. 
You may create this `Dir` object before calling the `copyFile()` method. -Maybe you are planning to copy the file to a completely different location in your computer, -so it might be worth creating a directory handler to that location. But if you are copying the -file to a subfolder of your CWD, then, you can just simply pass the CWD handler to this argument. - -```{zig} -#| eval: false -const cwd = std.fs.cwd(); -try cwd.copyFile( - "foo.txt", - cwd, - "ZigExamples/file-io/foo.txt", - .{} -); -``` - - -### Read the docs! - -There are some other useful methods for file operations available at `Dir` objects, -such as the `writeFile()` method, but I recommend you to read the docs for the -[`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir] -to explore the other available methods, since I already talked too much about them. - - -[^zig-dir]: - - - - -## Position indicators {#sec-indicators} - -A position indicator is like a type of cursor, or, an index. This "index" identifies the current -location in the file (or, in the data stream) that the file descriptor object that you have -is currently looking at. -When you create a file descriptor, the position indicator starts at the beginning of the file, -or, at the beginning of the stream. When you read or write data into the file (or socket, or data stream, etc.) -described by this file descriptor object, you end up moving the position indicator. - -In other words, any IO operation has a common side effect, which is moving the position indicator. -For example, suppose that we have a file of 300 bytes total in size. If you -read 100 bytes from the file, the position indicator moves 100 bytes forward. If you try -to write 50 bytes into this same file, these 50 bytes will be written from the current -position indicated by the position indicator. Since the indicator is 100 bytes forward from -the beginning of the file, these 50 bytes would be written in the middle of the file. 
- -This is why we used the `seekTo()` method in the last code example presented at @sec-creating-files. -We used this method to move the position indicator back to the beginning of the file, which -made sure that we would read the text that we had just written from the beginning of the file, -instead of trying to read from the end of the file. Because before the read operation, we had already -performed a write operation, which means that the position indicator was moved by this write operation. - -The position indicators of a file descriptor object can be changed (or altered) by using the -"seek" methods from this file descriptor, which are: `seekTo()`, `seekFromEnd()` and `seekBy()`. -These methods have the same effect, or, the same responsibility as the -[`fseek()`](https://en.cppreference.com/w/c/io/fseek)[^c-fseek] -C function. - -[^c-fseek]: - - -Considering that `offset` refers to the index that you provide as input to these "seek" methods, -the bullet points below summarise the effect of each of these methods. -A quick note, in the case of `seekFromEnd()` and `seekBy()`, the `offset` provided can be either a -positive or negative index. - -- `seekTo()` will move the position indicator to the location that is `offset` bytes from the beginning of the file. -- `seekFromEnd()` will move the position indicator to the location that is `offset` bytes from the end of the file. -- `seekBy()` will move the position indicator to the location that is `offset` bytes from the current position in the file. - - - - - - - -## Directory operations - -### Iterating through the files in a directory - -One of the most classic tasks related to filesystems is to be able -to iterate through the existing files in a directory. Iteration -over a directory is made in Zig through an iterator pattern. In other words, we need -to create an iterator object, and use this object to iterate through the files. 
- -You can produce such an iterator object by using either the `iterate()` or `walk()` methods -of a `Dir` object. Both methods return an iterator object as result, which you can advance by using -`next()`. The difference between these methods, is that `iterate()` returns a non-recursive iterator, -while `walk()` returns a recursive one. It means that the iterator returned by `walk()` will not only iterate through -the files available in the current directory, but also, through the files from any subdirectory found -inside the current directory. - -In the example below, we are displaying the names of the files stored inside the -directory `ZigExamples/file-io`. Notice that we had to open this directory through -the `openDir()` function. Also notice that we provided the flag `iterate` in the -second argument of `openDir()`. This flag is important, because without this flag, -we would not be allowed to iterate through the files in this directory. - -```{zig} -#| eval: false -#| auto_main: true -#| build_type: "run" -const cwd = std.fs.cwd(); -const dir = try cwd.openDir( - "ZigExamples/file-io/", - .{ .iterate = true } -); -var it = dir.iterate(); -while (try it.next()) |entry| { - try stdout.print( - "File name: {s}\n", - .{entry.name} - ); -} -``` - -``` -File name: create_file_and_write_toit.zig -File name: create_file.zig -File name: lorem.txt -File name: iterate.zig -File name: delete_file.zig -File name: append_to_file.zig -File name: user_input.zig -File name: foo.txt -File name: create_file_and_read.zig -File name: buff_io.zig -File name: copy_file.zig -``` - - -### Creating new directories - -There are two methods that are important when it comes to -creating directories, which are `makeDir()` and `makePath()`. -The difference between these two methods is that `makeDir()` can -only create one single directory in the current directory in each call, -while `makePath()` is capable of recursively creating subdirectories in the same call. 
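To make the 300/100/50 example above concrete, here is a small sketch that records the position indicator after each operation, using the `getPos()` method of the `File` object. The function name `trackPositions` and the scratch file name `position_demo.txt` are hypothetical, chosen only for this illustration. The sketch assumes the same standard-library version used by the other examples in this chapter.

```zig
const std = @import("std");

/// Hypothetical helper: creates a scratch file and returns the position
/// indicator after (1) writing 300 bytes, (2) rewinding and reading
/// 100 bytes, and (3) writing 50 more bytes.
fn trackPositions(dir: std.fs.Dir, name: []const u8) ![3]u64 {
    const file = try dir.createFile(name, .{ .read = true });
    // Defers run in reverse order: the file is closed first, then deleted.
    defer dir.deleteFile(name) catch {};
    defer file.close();

    var positions: [3]u64 = undefined;

    // Fill the file with 300 bytes; the indicator ends at offset 300.
    const content = [_]u8{'a'} ** 300;
    try file.writeAll(&content);
    positions[0] = try file.getPos();

    // Go back to the start and read 100 bytes; the indicator moves to 100.
    try file.seekTo(0);
    var buffer: [100]u8 = undefined;
    _ = try file.readAll(&buffer);
    positions[1] = try file.getPos();

    // Write 50 bytes; they land at offsets 100..149, in the middle of the file.
    const patch = [_]u8{'b'} ** 50;
    try file.writeAll(&patch);
    positions[2] = try file.getPos();

    return positions;
}

pub fn main() !void {
    const p = try trackPositions(std.fs.cwd(), "position_demo.txt");
    std.debug.print("{d} {d} {d}\n", .{ p[0], p[1], p[2] });
}
```

The three recorded positions are 300, 100 and 150, which matches the description above: the 50 bytes are written starting at byte 100, and not at the beginning or at the end of the file.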
- - -This is why the name of this method is "make path". It will create as many -subdirectories as necessary to create the path that you provided as input. -So, if you provide the path `"sub1/sub2/sub3"` as input to this method, -it will create three different subdirectories, `sub1`, `sub2` and `sub3`, -within the same function call. In contrast, if you provided such path -as input to `makeDir()`, you would likely get an error as result, since -this method can only create a single subdirectory. - -```{zig} -#| auto_main: true -#| eval: false -const cwd = std.fs.cwd(); -try cwd.makeDir("src"); -try cwd.makePath("src/decoders/jpg/"); -``` - -### Deleting directories - -To delete a directory, just provide the path to the directory that you want to delete -as input to the `deleteDir()` method from a `Dir` object. In the example below, -we are deleting the `src` directory that we have just created in the previous example. - -```{zig} -#| auto_main: true -#| eval: false -const cwd = std.fs.cwd(); -try cwd.deleteDir("src"); -``` - - -## Conclusion - -In this chapter, I have described how to perform in Zig the most common filesystem and IO operations. -But you might feel the lack of some other, less common, operation in this chapter, such as: how to rename files, -or how to open a directory, or how to create symbolic links, or how to use `access()` to test if a particular -path exists in your computer. But for all of these less common tasks, I recommend you to read -the docs of the [`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir] -, since you can find a good description of these cases there. 
- - - - +--- +engine: knitr +knitr: true +syntax-definition: "../Assets/zig.xml" +--- + +```{r} +#| include: false +source("../zig_engine.R") +knitr::opts_chunk$set( + auto_main = FALSE, + build_type = "lib" +) +``` + + + + +# Filesystem and Input/Output (IO) {#sec-filesystem} + +In this chapter we are going to discuss how to use the cross-platform structs and functions available +in the Zig Standard Library that executes filesystem operations. Most of these functions and structs +comes from the `std.fs` module. + +We are also going to talk about Input/Output (also known as IO) operations in Zig. Most of +these operations are made by using the structs and functions from `std.io` module, which defines +descriptors for the *standard channels* of your system (`stdout` and `stdin`), and also, +functions to create and use I/O streams. + + +## Input/Output basics {#sec-io-basics} + +If you have programming experience in a high-level language, you certainly have used before +the input and output functionalities of this language. In other words, you certainly have +been in a situation where you needed to sent some output to the user, or, to receive an input +from the user. + +For example, in Python we can receive some input from the user by using the `input()` built-in +function. But we can also print (or "show") some output to the user by using the `print()` +built-in function. So yes, if you have programmed before in Python, you certainly have +used these functions once before. + +But do you know how these functions relate back to your operating system (OS)? How exactly +they are interacting with the resources of your OS to receive or sent input/output. +In essence, these input/output functions from high-level languages are just abstractions +over the *standard output* and *standard input* channels of your operating system. + +This means that we receive an input, or send some output, through the operating system. 
+It is the OS that makes the bridge between the user and your program. Your program +does not have direct access to the user. It is the OS that mediates every +message exchanged between your program and the user. + +The *standard output* and *standard input* channels of your OS are commonly known as the +`stdout` and `stdin` channels of your OS, respectively. In some contexts, they are also called the *standard output device* +and *standard input device*. As the name suggests, the *standard output* +is the channel through which output flows, while the *standard input* is the channel through which +input flows. + +Furthermore, operating systems also normally create a dedicated channel for exchanging error messages, known as the +*standard error* channel, or, the `stderr` channel. This is the channel to which error and warning messages +are usually sent. These are the messages that are normally displayed in red-like or orange-like colors +in your terminal. + +Normally, every OS (e.g. Windows, MacOS, Linux, etc.) creates a dedicated and separate set of +*standard output*, *standard error* and *standard input* channels for every single program (or process) that runs in your computer. +This means that every program you write has a dedicated `stdin`, `stderr` and `stdout` that are separate +from the `stdin`, `stderr` and `stdout` of other programs and processes that are currently running. + +This is a behaviour from your OS. +This does not come from the programming language that you are using. +Because as I said earlier, input and output in programming languages, especially +in high-level ones, are just a simple abstraction over the `stdin`, `stderr` and `stdout` from your current OS. +That is, your OS is the intermediary between every input/output operation made in your program, +regardless of the programming language that you are using. + + +### The writer and reader pattern {#sec-writer-reader} + +In Zig, there is a pattern around input/output (IO). 
I (the author of this book) don't know if there is an official name for this pattern. +But here, in this book, I will call it the "writer and reader pattern". In essence, every IO operation in Zig is +made through either a `GenericReader` or a `GenericWriter` object[^gen-zig]. + +These two data types come from the `std.io` module of the Zig Standard Library. As their names suggests, a +`GenericReader` is an object that offers tools to read data from "something" (or "somewhere"), while a `GenericWriter` +offers tools to write data into this "something". +This "something" might be different things: like a file that exists in your filesystem; or, it might be a network socket of your system[^sock]; or, +a continuous stream of data, like a standard input device from your system, that might be constantly +receiving new data from users, or, as another example, a live chat in a game that is constantly receiving and displaying new messages from the +players of the game. + +[^gen-zig]: Previously, these objects were known as the `Reader` and `Writer` objects. +[^sock]: The socket objects that we have created at @sec-create-socket, are examples of network sockets. + +So, if you want to **read** data from something, or somewhere, it means that you need to use a `GenericReader` object. +But if you need instead, to **write** data into this "something", then, you need to use a `GenericWriter` object instead. +Both of these objects are normally created from a file descriptor object. More specifically, through the `writer()` and `reader()` +methods of this file descriptor object. If you are not familiar with this type of object, go to the +next section. + +Every `GenericWriter` object have methods like `print()`, which allows you to write/send a formatted string +(i.e. this formatted string is like a `f` string in Python, or, similar to the `printf()` C function) +into the "something" (file, socket, stream, etc.) that you are using. 
It also has a `writeAll()` method, which allows you to +write a string, or, an array of bytes into the "something". + +Likewise, every `GenericReader` object has methods like `readAll()`, which allows you to read the +data from the "something" (file, socket, stream, etc.) until it fills a particular array (i.e. a "buffer") object. +In other words, if you provide an array object of 300 `u8` values to `readAll()`, then, this method attempts to read 300 bytes +of data from the "something", and it stores them into the array object that you have provided. + +We also have other methods, like the `readAtLeast()` method, +which allows you to specify the minimum number of bytes that you want to read from the "something". +In more detail, if you give the number $n$ as input to this method, then, it will attempt to read at least $n$ bytes of data from the "something". +The "something" might have fewer than $n$ bytes of data available for you to read, so, it is not guaranteed +that you will get at least $n$ bytes as result. + +Another useful method is `readUntilDelimiterOrEof()`. In this method, you specify a "delimiter character". +The idea is that this function will attempt to read as many bytes of data as possible from the "something", +until it encounters the end of the stream, or, it encounters the "delimiter character" that you have specified. + +If you don't know exactly how many bytes will come from the "something", you may find the `readAllAlloc()` method +useful. In essence, you provide an allocator object to this method, so that it can allocate more space if needed. +As a consequence, this method will try to read all bytes of the "something", and, if it runs out of space at some point +during the "reading process", it uses the allocator object to allocate more space to continue reading the bytes. +As a result, this method returns a slice to the array object containing all the bytes read. + +This is just a quick description of the methods present in these types of objects. 
But I recommend you +to read the official docs, both for +[`GenericWriter`](https://ziglang.org/documentation/master/std/#std.io.GenericWriter)[^gen-write] and +[`GenericReader`](https://ziglang.org/documentation/master/std/#std.io.GenericReader)[^gen-read]. +I also think it is a good idea to read the source code of the modules in the Zig Standard Library +that define the methods present in these objects, which are the +[`Reader.zig`](https://github.com/ziglang/zig/blob/master/lib/std/io/Reader.zig)[^mod-read] +and [`Writer.zig`]()[^mod-write]. + +[^gen-read]: . +[^gen-write]: . +[^mod-read]: . +[^mod-write]: . + + +### Introducing file descriptors {#sec-file-descriptor} + +A "file descriptor" object is a core component behind every I/O operation that is made in any operating system (OS). +Such object is an identifier for a particular input/output (IO) resource from your OS [@wiki_file_descriptor]. +It describes and identifies this particular resource. An IO resource might be: + +- an existing file in your filesystem. +- an existing network socket. +- other types of stream channels. +- a pipeline (or just "pipe") in your terminal[^pipes]. + +[^pipes]: A pipeline is a mechanism for inter-process communication, or, inter-process IO. You could also interpret a pipeline as a "set of processes that are chained together, through the standard input/output devices of the system". At Linux for example, a pipeline is created inside a terminal, by connecting two or more terminal commands with the "pipe" character (`|`). + +From the bullet points listed above, we know that although the term "file" is present, +a "file descriptor" might describe something more than just a file. +This concept of a "file descriptor" comes from the Portable Operating System Interface (POSIX) API, +which is a set of standards that guide how operating systems across the world should be implemented, +to maintain compatibility between them. 
+ +A file descriptor not only identifies the input/output resource that you are using to receive or send some data, +but it also describes where this resource is, and also, which IO mode this resource is currently using. +For example, this IO resource might be using only the "read" IO mode, which means that this resource +is open to "read operations", while "write operations" are closed and not authorized. +These IO modes are essentially the modes that you provide to the argument `mode` +from the `fopen()` C function, and also, from the `open()` Python built-in function. + +In C, you usually reach such a resource through a `FILE` pointer (a stream that wraps an integer file descriptor), but, in Zig, a file descriptor is represented by a `File` object. +This data type (`File`) is described in the `std.fs` module of the Zig Standard Library. +We normally don't create a `File` object directly in our Zig code. Instead, we normally get such an object as a result when we +open an IO resource. In other words, we normally ask our OS to open and use a particular IO +resource, and, if the OS successfully opens this IO resource, the OS hands back to us +a file descriptor to this particular IO resource. + +So you usually get a `File` object by using functions and methods from the Zig Standard Library +that ask the OS to open some IO resource, like the `openFile()` method that opens a file in the +filesystem. The `net.Stream` object that we have created at @sec-create-socket is also a type of +file descriptor object. + + +### The *standard output* + +You already saw across this book how we can access and use the `stdout` in Zig +to send some output to the user. +For that, we use the `getStdOut()` function from the `std.io` module. This function returns +a file descriptor that describes the `stdout` channel of your current OS. Through this file +descriptor object, we can read from or write stuff to the `stdout` of our program.
+ +Although we can read stuff recorded into the `stdout` channel, we normally only +write to (or "print") stuff into this channel. The reason is very similar to what we discussed at +@sec-read-http-message, when we were discussing what "reading from" versus "writing to" the connection +object from our small HTTP Server project would mean. + +When we write stuff into a channel, we are essentially sending data to the other end of this channel. +In contrast, when we read stuff from this channel, we are essentially reading the data that was sent +through this channel. Since the `stdout` is a channel to send output to the user, the key verb here +is **send**. We want to send something to someone, and, as consequence, we want to **write** something +into some channel. + +That is why, when we use `getStdOut()`, most of the times, we also use the `writer()` method from the `stdout` file descriptor, +to get access to a writer object that we can use to write stuff into this `stdout` channel. +More specifically, this `writer()` method returns a `GenericWriter` object. One of the +main methods of this `GenericWriter` object is the `print()` method that we have used +before to write (or "print") a formatted string into the `stdout` channel. + +```{zig} +#| auto_main: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +pub fn main() !void { + try stdout.writeAll( + "This message was written into stdout.\n" + ); +} +``` + +``` +This message was written into stdout. +``` + + +This `GenericWriter` object is like any other generic writer object that you would normally get from a file descriptor object. +So, the same methods from a generic writer object that you would use while writing files to the filesystem for example, you could also +use them here, from the file descriptor object of `stdout`, and vice-versa. + + +### The *standard input* + +You can access the *standard input* (i.e. 
`stdin`) in Zig by using the `getStdIn()` function from the `std.io` module. +Like its sister (`getStdOut()`), this function also returns a file descriptor object that describes the `stdin` channel +of your OS. + +Since we now want to receive some input from the user, the key verb here becomes **receive**, and, as a consequence, +we usually want to **read** data from the `stdin` channel, instead of writing data into it. So, we normally use +the `reader()` method of the file descriptor object returned by `getStdIn()`, to get access to a `GenericReader` +object that we can use to read data from `stdin`. + +In the example below, we are creating a small buffer capable of holding 20 characters. Then, we try to read +the data from the `stdin` with the `readUntilDelimiterOrEof()` method, and save this data into the `buffer` object. +Also notice that we are reading the data from the `stdin` until we hit a new line character (`'\n'`). + +If you execute this program, you will notice that it stops the execution, and starts to wait indefinitely +for some input from the user. In other words, you need to type your name into the terminal, and then, you press Enter to +send your name to `stdin`. After you send your name to `stdin`, the program reads this input, and continues with the execution, +by printing the given name to `stdout`. In the example below, I typed my name (Pedro) into the terminal, and then, pressed Enter. + + +```{zig} +#| eval: false +#| auto_main: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const stdin = std.io.getStdIn().reader(); +pub fn main() !void { + try stdout.writeAll("Type your name\n"); + var buffer: [20]u8 = undefined; + @memset(buffer[0..], 0); + _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\n'); + try stdout.print("Your name is: {s}\n", .{buffer}); +} +``` + +``` +Type your name +Your name is: Pedro + +``` + + +### The *standard error* + +The *standard error* (a.k.a.
the `stderr`) works exactly the same as the `stdout`. +You just call the `getStdErr()` function from the `std.io` module, and you get the file descriptor to `stderr`. +Ideally, you should write only error or warning messages to `stderr`, because this is +the purpose of this channel. + + + + + +## Buffered IO + +As we described at @sec-io-basics, input/output (IO) operations are made directly by the operating system. +It is the OS that manages the IO resource that you want to use for your IO operations. +The consequence of this fact is that IO operations are heavily based on system calls (i.e. calling the operating system directly). + +Just to be clear, there is nothing particularly wrong with system calls. We use them all the time on +any serious codebase written in any low-level programming language. However, system calls are +typically orders of magnitude slower than ordinary in-process operations, such as a function call or a memory access. + +So it is perfectly fine to use a system call once in a while. But when these system calls start to be used often, +you can, most of the time, clearly notice the loss of performance in your application. So, a good rule of thumb +is to use a system call only when it is needed, and also, only in infrequent situations, to reduce +the number of system calls performed to a minimum. + + +### Understanding how buffered IO works + +Buffered IO is a strategy to achieve better performance. It is used to reduce the number of system calls made by IO operations, and, as a +consequence, achieve a much higher performance. At @fig-buff-diff you can find two different diagrams which present the differences between +read operations performed in an unbuffered IO environment versus a buffered IO environment. + +To give a better context to these diagrams, let's suppose that we have a text file that contains the famous Lorem ipsum text[^lorem] +in our filesystem.
Let's also suppose that these diagrams at @fig-buff-diff +are showing the read operations that we are performing to read the Lorem ipsum text from this text file. +The first thing you notice when looking at the diagrams, is that in an unbuffered environment the read operations leads to many system calls. +More precisely, in the diagram exposed at @fig-unbuffered-io we get one system call per each byte that we read from the text file. +On the other hand, at @fig-buffered-io we have only one system call at the very beginning. + +When we use a buffered IO system, at the first read operation we perform, instead of sending one single byte directly +to our program, the OS first sends a chunk of bytes from the file to a buffer object (i.e. an array). +This chunk of bytes are cached/stored inside this buffer object, and when this operation is done, then +your program receives the byte that it actually asked for. + +From now on, for every new read operation that you perform, instead of making a new system call to ask +for the next byte in the file to the OS, this read operation is redirected to the buffer object, that have +this next byte already cached and ready to go. + + +[^lorem]: . + +::: {#fig-buff-diff layout-nrow=2} + +![Unbuffered IO](./../Figures/unbuffered-io.png){#fig-unbuffered-io width=60%} + +![Buffered IO](./../Figures/buffered-io.png){#fig-buffered-io} + +Diagrams of read operations performed in buffered IO and unbuffered IO environments. + +::: + +This is the basic logic behind buffered IO systems. The size of the buffer object depends, but most of the times, +it is equal to a full page of memory (4096 bytes). If we follow this logic, then, the OS reads the first 4096 bytes +of the file and caches it into the buffer object. As long as your program does not consume all of the 4096 bytes from the buffer, +not a single system call is created. 
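
To put some numbers on the logic above, here is a minimal sketch that counts read system calls under a deliberately simplified model. It assumes the program consumes the file one byte at a time, that every refill of the buffer costs exactly one system call, and it ignores the extra call that real readers make to detect the end of the file. The function names and the 10,000-byte file size are hypothetical, chosen only for illustration.

```zig
const std = @import("std");

/// Simplified model: number of read system calls needed to consume
/// `file_size` bytes through a buffer of `buffer_size` bytes.
fn bufferedSyscalls(file_size: usize, buffer_size: usize) usize {
    // Each refill of the buffer costs one system call (ceiling division).
    return (file_size + buffer_size - 1) / buffer_size;
}

/// Simplified model: without a buffer, every single-byte read
/// is its own system call.
fn unbufferedSyscalls(file_size: usize) usize {
    return file_size;
}

pub fn main() void {
    const file_size: usize = 10_000; // hypothetical file size, in bytes
    std.debug.print(
        "unbuffered: {d} system calls\n",
        .{unbufferedSyscalls(file_size)},
    );
    std.debug.print(
        "buffered: {d} system calls\n",
        .{bufferedSyscalls(file_size, 4096)},
    );
}
```

Under these assumptions, the first 4096 single-byte reads share one system call in the buffered case, while the unbuffered case pays for each byte separately.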
+ +However, as soon as you consume all of the 4096 bytes from the buffer, it means that there are no bytes left in the buffer. +In this situation, a new system call is made to ask the OS to send the next 4096 bytes in the file, and once again, +these bytes are cached into the buffer object, and the cycle starts once again. + + +### Buffered IO across different languages + +IO operations made through a `FILE` pointer in C are buffered +by default, so, at least in C, you don't need to worry about this subject. But in contrast, IO operations in both Rust and Zig may or may not be +buffered, depending on which functions from the standard library you are using. + +For example, in Rust, buffered IO is implemented through the `BufReader` and `BufWriter` structs, while in Zig, it is implemented +through the `BufferedReader` and `BufferedWriter` structs. +So any IO operation that you perform through the `GenericWriter` and `GenericReader` objects +that I presented at @sec-writer-reader is not buffered, which means that these objects +might create a lot of system calls depending on the situation. + + +### Using buffered IO in Zig + +Using buffered IO in Zig is actually very easy. All you have to do is to +give the `GenericWriter` object to the `bufferedWriter()` function, or, to give the `GenericReader` +object to the `bufferedReader()` function. These functions come from the `std.io` module, +and they will construct the `BufferedWriter` or `BufferedReader` object for you. + +After you create this new `BufferedWriter` or `BufferedReader` object, you can call the `writer()` +or `reader()` method of this new object, to get access to a new (and buffered) generic writer or +generic reader. + +Let's describe the process once again. Every time that you have a file descriptor object, you first get the generic writer or generic reader +object from it, by calling the `writer()` or `reader()` methods of this file descriptor object.
+Then, you provide this generic writer or generic reader to the `bufferedWriter()` or `bufferedReader()` +function, which creates a new `BufferedWriter` or `BufferedReader` object. Then, you call +the `writer()` or `reader()` methods of this buffered writer or buffered reader object, +which gives you access to a generic writer or generic reader object that is buffered. + +Take this program as an example. This program is essentially demonstrating the process exposed at @fig-buffered-io. +We are simply opening a text file that contains the Lorem ipsum text, and then, we create a buffered IO reader object +at `bufreader`, and we use this `bufreader` object to read the contents of this file into a buffer object, then, +we end the program by printing this buffer to `stdout`. + + +```{zig} +#| eval: false +var file = try std.fs.cwd().openFile( + "ZigExamples/file-io/lorem.txt", .{} +); +defer file.close(); +var buffered = std.io.bufferedReader(file.reader()); +var bufreader = buffered.reader(); + +var buffer: [1000]u8 = undefined; +@memset(buffer[0..], 0); + +_ = try bufreader.readUntilDelimiterOrEof( + buffer[0..], '\n' +); +try stdout.print("{s}\n", .{buffer}); +``` + +``` +Lorem ipsum dolor sit amet, consectetur +adipiscing elit. Sed tincidunt erat sed nulla ornare, nec +aliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus, +ultrices eleifend porttitor ut, finibus ut tortor. Maecenas +sapien justo, finibus tincidunt dictum ac, semper et lectus. +Vivamus molestie egestas orci ac viverra. Pellentesque nec +arcu facilisis, euismod eros eu, sodales nisl. Ut egestas +sagittis arcu, in accumsan sapien rhoncus sit amet. Aenean +neque lectus, imperdiet ac lobortis a, ullamcorper sed massa. +Nullam porttitor porttitor erat nec dapibus. Ut vel dui nec +nulla vulputate molestie eget non nunc. Ut commodo luctus ipsum, +in finibus libero feugiat eget. Etiam vel ante at urna tincidunt +posuere sit amet ut felis. Maecenas finibus suscipit tristique. 
+Donec viverra non sapien id suscipit. +``` + +Despite being a buffered IO reader, this `bufreader` object is similar to any other `GenericReader` object, +and has the exact same methods. So, although these two types of objects perform very different IO operations, +they have the same interface, so, you the programmer, can interchangeably use them +without the need to change anything in your source code. +So a buffered IO reader or a buffered IO writer object has the same methods as its generic and unbuffered brothers, +i.e. the generic reader and generic writer objects that I presented at @sec-writer-reader. + +::: {.callout-tip} +In general, you should always use a buffered IO reader or a buffered IO writer object to perform +IO operations in Zig, because they deliver better performance to your IO operations. +::: + + +## Filesystem basics + +Now that we have discussed the basics around Input/Output operations in Zig, we need to +talk about the basics around filesystems, which is another core part of any operating system. +Also, filesystems are related to input/output, because the files that we store and create in our +computer are considered an IO resource, as we described at @sec-file-descriptor. + +Just like when we were talking about input/output, if you have ever programmed in your life, you probably know +some basics about filesystems and file operations, etc. +But, since I don't know you, I don't know what your background is. As a result, +these concepts that I will describe might be clear in your mind, but they also may not be as clear as you think. +Just bear with me, while I'm trying to put everyone on the same basis. + + +### The concept of current working directory (CWD) + +The working directory is the folder on your computer where you are currently rooted at, +or in other words, it is the folder that your program is currently looking at.
+Therefore, whenever you are executing a program, this program is always working with +a specific folder on your computer. It is always in this folder that the program will initially +look for the files you require, and it is also in this folder that the program +will initially save all the files you ask it to save. + +The working directory is determined by the folder from which you invoke your program +in the terminal. In other words, if you are in the terminal of your OS, and you +execute a binary file (i.e. a program) from this terminal, the folder to which your terminal +is pointing at is the current working directory of your program that is being executed. + +At @fig-cwd we have an example of me executing a program from the terminal. We are executing +the program outputted by the `zig` compiler by compiling the Zig module named `hello.zig`. +The CWD in this case is the `zig-book` folder. In other words, while the `hello.zig` program +is executing, it will be looking at the `zig-book` folder, and any file operation that we perform +inside this program, will be using this `zig-book` folder as the "starting point", or, as the "central focus". + +![An example of executing a program from the terminal](./../Figures/cwd.png){#fig-cwd} + +Just because we are rooted inside a particular folder (in the case of @fig-cwd, the `zig-book` folder) of our computer, +it doesn't mean that we cannot access or write resources in other locations of our computer. +The current working directory (CWD) mechanism just defines where your program will look first +for the files you ask for. This does not prevent you from accessing files that are located +elsewhere on your computer. However, to access any file that is in a folder other than your +current working directory, you must provide a path to that file or folder. + + +### The concept of paths + +A path is essentially a location. It points to a location in your filesystem. 
We use +paths to describe the location of files and folders in our computer. +One important aspect is that paths are always written inside strings, +i.e. they are always provided as text values. + +There are two types of paths that you can provide to any program in any OS: a relative path, or an absolute path. +Absolute paths are paths that start at the root of your filesystem, and go all the way to the file name or the specific folder +that you are referring to. This type of path is called absolute, because it points to a unique, absolute location on your computer. +That is, there is no other existing location on your computer that corresponds to this path. It is a unique identifier. + +In Windows, an absolute path is a path that starts with a hard disk identifier (e.g. `C:/Users/pedro`). +On the other hand, absolute paths in Linux and MacOS, are paths that start with a forward slash character (e.g. `/usr/local/bin`). +Notice that a path is composed of "segments". Each segment is connected to the next one by a slash character (`\` or `/`). +On Windows, the backward slash (`\`) is normally used to connect the path segments. While on Linux and MacOS, the forward +slash (`/`) is the character used to connect path segments. + +In contrast, a relative path is a path that starts at the CWD. In other words, a relative path is +"relative to the CWD". The path used to access the `hello.zig` file at @fig-cwd is an example of relative path. This path +is reproduced below. This path begins at the CWD, which in the context of @fig-cwd, is the `zig-book` folder, +then, it goes to the `ZigExamples` folder, then, into `zig-basics`, then, to the `hello.zig` file. + +``` +ZigExamples/zig-basics/hello_world.zig +``` + + +### Path wildcards + +When providing paths, especially relative paths, you have the option of using a *wildcard*. +There are two commonly used *wildcards* in paths, which are "one period" (.) and "two periods" (..).
+In other words, these two specific characters have special meanings when used in paths, +and can be used on any operating system (Mac, Windows, Linux, etc.). That is, they +are "cross platform". + +The "one period" represents an alias for your current working directory. +This means that the relative paths `"./Course/Data/covid.csv"` and `"Course/Data/covid.csv"` are equivalent. +On the other hand, the "two periods" refers to the previous directory. +For example, the path `"Course/.."` is equivalent to the path `"."`, that is, the current working directory. + +Therefore, the path `"Course/.."` refers to the folder before the `Course` folder. +As another example, the path `"src/writexml/../xml.cpp"` refers to the file `xml.cpp` +that is inside the folder before the `writexml` folder, which in this example is the `src` folder. +Therefore, this path is equivalent to `"src/xml.cpp"`. + + + + +## The CWD handler + +In Zig, filesystem operations are usually made through a directory handler object. +A directory handler in Zig is an object of type `Dir`, which is an object that describes +a particular folder in the filesystem of our computer. +You normally create a `Dir` object, by calling the `std.fs.cwd()` function. +This function returns a `Dir` object that points to (or, that describes) the +current working directory (CWD). + +Through this `Dir` object, you can create new files, or modify, or read existing ones that are +inside your CWD. In other words, a `Dir` object is the main entrypoint in Zig to perform +multiple types of filesystem operations. +In the example below, we are creating this `Dir` object, and storing it +inside the `cwd` object. Although we are not using this object at this code example, +we are going to use it a lot over the next examples. 
+ +```{zig} +#| auto_main: true +const cwd = std.fs.cwd(); +_ = cwd; +``` + + + + + + + + +## File operations + +### Creating files {#sec-creating-files} + +We create new files by using the `createFile()` method from the `Dir` object. +Just provide the name of the file that you want to create, and this function will +do the necessary steps to create such file. You can also provide a relative path to this function, +and it will create the file by following this path, which is relative to the CWD. + +This function might return an error, so, you should use `try`, `catch`, or any of the other methods presented +at @sec-error-handling to handle the possible error. But if everything goes well, +this `createFile()` method returns a file descriptor object (i.e. a `File` object) as result, +through which you can add content to the file with the IO operations that I presented before. + +Take this code example below. In this example, we are creating a new text file +named `foo.txt`. If the function `createFile()` succeeds, the object named `file` will contain a file descriptor +object, which we can use to write (or add) new content to the file, like we do in this example, by using +a buffered writer object to write a new line of text to the file. + +Now, a quick note, when we create a file descriptor object in C, by using a C function like `fopen()`, we must always close the file +at the end of our program, or, as soon as we complete all operations that we wanted to perform +on the file. In Zig, this is no different. So everytime we create a new file, this file remains +"open", waiting for some operation to be performed. As soon as we are done with it, we always have +to close this file, to free the resources associated with it. +In Zig, we do this by calling the method `close()` from the file descriptor object. 
+ + +```{zig} +#| eval: false +#| auto_main: true +const cwd = std.fs.cwd(); +const file = try cwd.createFile("foo.txt", .{}); +// Don't forget to close the file at the end. +defer file.close(); +// Do things with the file ... +var fw = file.writer(); +_ = try fw.writeAll( + "Writing this line to the file\n" +); +``` + + +So, in this example we have not only created a file in the filesystem, +but we also wrote some data into this file, using the file descriptor object +returned by `createFile()`. If the file that you are trying to create +already exists in your filesystem, this `createFile()` call will +overwrite the contents of the file, or, in other words, it will +in practice erase all the contents of the existing file. + +If you don't want this to happen, meaning, that you don't want to overwrite +the contents of the existing file, but you want to write data to this file anyway +(i.e. you want to append data to the file), you should use the `openFile()` +method from the `Dir` object. + +Another important aspect about `createFile()` is that this method creates a file +that is not opened to read operations by default. It means that you cannot read this file +through the file descriptor object that `createFile()` returns. +So for example, you might want to write some stuff into this file at the beginning of the execution +of your program. Then, at a future point in your program you might need to read what you have +written into this file. If you try to read data from this file, you will likely +get a `NotOpenForReading` error as a result. + + +But how can you overcome this barrier? How can you create a file that is open +to read operations? All you have to do, is to set the `read` flag to true +in the second argument of `createFile()`.
When you set this flag to true, +then the file gets created with "read permissions", and, as a consequence, +a program like this one below becomes valid: + + +```{zig} +#| eval: false +#| auto_main: true +#| build_type: "run" +const cwd = std.fs.cwd(); +const file = try cwd.createFile("foo.txt", .{ .read = true }); +defer file.close(); + +var fw = file.writer(); +_ = try fw.writeAll("We are going to read this line\n"); + +var buffer: [300]u8 = undefined; +@memset(buffer[0..], 0); +try file.seekTo(0); +var fr = file.reader(); +_ = try fr.readAll(buffer[0..]); +try stdout.print("{s}\n", .{buffer}); +``` + + +``` +We are going to read this line +``` + + +If you are not familiar with position indicators, you may not recognize what the method +`seekTo()` is, or, what it does. If that is your case, do not worry, +we are going to talk more about this method at @sec-indicators. But essentially +this method is moving the position indicator back to the beginning of the file, +so that we can read the contents of the file from the beginning. + + +### Opening files and appending data to it + +Opening files is easy. Just use the `openFile()` method instead of `createFile()`. +In the first argument of `openFile()` you provide the path to the file that +you want to open. Then, on the second argument you provide the flags (or, the options) +that dictate how the file is opened. + +You can see the full list of options for `openFile()` by visiting the documentation for +[`OpenFlags`](https://ziglang.org/documentation/master/std/#std.fs.File.OpenFlags)[^oflags]. +But the main flag that you will most certainly be worried about is the `mode` flag. +This flag specifies the IO mode that the file will be using when it gets opened. +There are three IO modes, or, three values that you can provide to this flag, which are: + +- `read_only`, allows only read operations on the file. All write operations are blocked. +- `write_only`, allows only write operations on the file.
All read operations are blocked. +- `read_write`, allows both write and read operations on the file. + +[^oflags]: + +These modes are similar to the modes that you provide to the `mode` argument of the +`open()` Python built-in function[^py-open], or, the `mode` argument of the +`fopen()` C function[^c-open]. +In the code example below, we are opening the `foo.txt` text file with a `write_only` mode, +and appending a new line of text to the end of the file. We use `seekFromEnd()` this time +to guarantee that we are going to append the text to the end of the file. Once again, methods +such as `seekFromEnd()` are described in more depth at @sec-indicators. + +[^py-open]: +[^c-open]: + + + +```{zig} +#| eval: false +#| auto_main: true +#| build_type: "run" +const cwd = std.fs.cwd(); +const file = try cwd.openFile("foo.txt", .{ .mode = .write_only }); +defer file.close(); +try file.seekFromEnd(0); +var fw = file.writer(); +_ = try fw.writeAll("Some random text to write\n"); +``` + + +### Deleting files + +Sometimes, we just need to delete/remove the files that we have. +To do that, we use the `deleteFile()` method. You just provide the path of the +file that you want to delete, and this method will try to delete the file located +at this path. + +```{zig} +#| eval: false +const cwd = std.fs.cwd(); +try cwd.deleteFile("foo.txt"); +``` + +### Copying files + +To copy existing files, we use the `copyFile()` method. The first argument in this method +is the path to the file that you want to copy. The second argument is a `Dir` object, i.e. a directory handler, +more specifically, a `Dir` object that points to the folder in your computer where you want to +copy the file to. The third argument is the new path of the file, or, in other words, the new location +of the file. The fourth argument is the options (or flags) to be used in the copy operation. + +The `Dir` object that you provide as input to this method will be used to copy the file to +the new location.
You may create this `Dir` object before calling the `copyFile()` method. +Maybe you are planning to copy the file to a completely different location in your computer, +so it might be worth creating a directory handler to that location. But if you are copying the +file to a subfolder of your CWD, then, you can simply pass the CWD handler to this argument. + +```{zig} +#| eval: false +const cwd = std.fs.cwd(); +try cwd.copyFile( + "foo.txt", + cwd, + "ZigExamples/file-io/foo.txt", + .{} +); +``` + + +### Read the docs! + +There are some other useful methods for file operations available at `Dir` objects, +such as the `writeFile()` method, but I recommend you to read the docs for the +[`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir] +to explore the other available methods, since I already talked too much about them. + + +[^zig-dir]: + + + + +## Position indicators {#sec-indicators} + +A position indicator is like a type of cursor, or, an index. This "index" identifies the current +location in the file (or, in the data stream) that the file descriptor object that you have +is currently looking at. +When you create a file descriptor, the position indicator starts at the beginning of the file, +or, at the beginning of the stream. When you read or write data into the file (or socket, or data stream, etc.) +described by this file descriptor object, you end up moving the position indicator. + +In other words, any IO operation has a common side effect, which is moving the position indicator. +For example, suppose that we have a file of 300 bytes total in size. If you +read 100 bytes from the file, the position indicator moves 100 bytes forward. If you try +to write 50 bytes into this same file, these 50 bytes will be written from the current +position indicated by the position indicator. Since the indicator is 100 bytes forward from +the beginning of the file, these 50 bytes would be written in the middle of the file.
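
The scenario above can be sketched in code. This is only an illustration under a few assumptions: the file name `position_demo.txt` and the helper function `positionsAfterReadAndWrite()` are hypothetical, and the sketch relies on the `getPos()` method of the `File` object to report the current position indicator, as available in the Zig versions that this chapter targets.

```zig
const std = @import("std");

/// Hypothetical helper: builds a 300-byte file, reads 100 bytes,
/// writes 50 bytes, and returns the position indicator after the
/// read and after the write.
fn positionsAfterReadAndWrite(dir: std.fs.Dir, name: []const u8) ![2]u64 {
    const file = try dir.createFile(name, .{ .read = true });
    defer file.close();

    // Fill the file with 300 bytes, then go back to the beginning.
    const content = [_]u8{'a'} ** 300;
    try file.writeAll(&content);
    try file.seekTo(0);

    // Reading 100 bytes moves the indicator 100 bytes forward.
    var buffer: [100]u8 = undefined;
    _ = try file.readAll(&buffer);
    const after_read = try file.getPos();

    // These 50 bytes are written starting at offset 100, not at offset 0.
    const patch = [_]u8{'b'} ** 50;
    try file.writeAll(&patch);
    const after_write = try file.getPos();

    return [2]u64{ after_read, after_write };
}

pub fn main() !void {
    const cwd = std.fs.cwd();
    const pos = try positionsAfterReadAndWrite(cwd, "position_demo.txt");
    // Remove the demo file again; ignore a failure here.
    defer cwd.deleteFile("position_demo.txt") catch {};
    std.debug.print(
        "after read: {d}, after write: {d}\n",
        .{ pos[0], pos[1] },
    );
}
```

If the sketch behaves as described, the indicator sits at byte 100 after the read and at byte 150 after the write, and bytes 100 to 149 of the file now hold the letter `b` instead of `a`.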
+ +This is why we have used the `seekTo()` method at the last code example presented at @sec-creating-files. +We have used this method to move the position indicator back to the beginning of the file, which +would make sure that we would read the text that we wanted to read from the beginning of the file, +instead of reading it from the end of the file. Because before the read operation, we had already +performed a write operation, which means that the position indicator was moved in this write operation. + +The position indicators of a file descriptor object can be changed (or altered) by using the +"seek" methods from this file descriptor, which are: `seekTo()`, `seekFromEnd()` and `seekBy()`. +These methods have the same effect, or, the same responsibility as the +[`fseek()`](https://en.cppreference.com/w/c/io/fseek)[^c-fseek] +C function. + +[^c-fseek]: + + +Considering that `offset` refers to the index that you provide as input to these "seek" methods, +the bullet points below summarise the effect of each of these methods. +A quick note, in the case of `seekFromEnd()` and `seekBy()`, the `offset` provided can be either a +positive or negative index. + +- `seekTo()` will move the position indicator to the location that is `offset` bytes from the beginning of the file. +- `seekFromEnd()` will move the position indicator to the location that is `offset` bytes from the end of the file. +- `seekBy()` will move the position indicator to the location that is `offset` bytes from the current position in the file. + + + + + + + +## Directory operations + +### Iterating through the files in a directory + +One of the most classic tasks related to filesystems is to be able +to iterate through the existing files in a directory. Iteration +over a directory is made in Zig through an iterator pattern. In other words, we need +to create an iterator object, and use this object to iterate through the files.
+ +You can produce such an iterator object by using either the `iterate()` or `walk()` methods +of a `Dir` object. Both methods return an iterator object as result, which you can advance by using +`next()`. The difference between these methods is that `iterate()` returns a non-recursive iterator, +while `walk()` returns a recursive one. It means that the iterator returned by `walk()` will not only iterate through +the files available in the current directory, but also, through the files from any subdirectory found +inside the current directory. + +In the example below, we are displaying the names of the files stored inside the +directory `ZigExamples/file-io`. Notice that we had to open this directory through +the `openDir()` function. Also notice that we provided the flag `iterate` in the +second argument of `openDir()`. This flag is important, because without this flag, +we would not be allowed to iterate through the files in this directory. + +```{zig} +#| eval: false +#| auto_main: true +#| build_type: "run" +const cwd = std.fs.cwd(); +const dir = try cwd.openDir( + "ZigExamples/file-io/", + .{ .iterate = true } +); +var it = dir.iterate(); +while (try it.next()) |entry| { + try stdout.print( + "File name: {s}\n", + .{entry.name} + ); +} +``` + +``` +File name: create_file_and_write_toit.zig +File name: create_file.zig +File name: lorem.txt +File name: iterate.zig +File name: delete_file.zig +File name: append_to_file.zig +File name: user_input.zig +File name: foo.txt +File name: create_file_and_read.zig +File name: buff_io.zig +File name: copy_file.zig +``` + + +### Creating new directories + +There are two methods that are important when it comes to +creating directories, which are `makeDir()` and `makePath()`. +The difference between these two methods is that `makeDir()` can +only create one single directory in the current directory in each call, +while `makePath()` is capable of recursively creating subdirectories in the same call. 
+ + +This is why the name of this method is "make path". It will create as many +subdirectories as necessary to create the path that you provided as input. +So, if you provide the path `"sub1/sub2/sub3"` as input to this method, +it will create three different subdirectories, `sub1`, `sub2` and `sub3`, +within the same function call. In contrast, if you provided such a path +as input to `makeDir()`, you would likely get an error as result, since +this method can only create a single subdirectory. + +```{zig} +#| auto_main: true +#| eval: false +const cwd = std.fs.cwd(); +try cwd.makeDir("src"); +try cwd.makePath("src/decoders/jpg/"); +``` + +### Deleting directories + +To delete a directory, just provide the path to the directory that you want to delete +as input to the `deleteDir()` method from a `Dir` object. In the example below, +we are deleting the `src` directory that we have just created in the previous example. +Keep in mind that `deleteDir()` can only delete an empty directory, so the subdirectories +created inside `src` in the previous example would have to be deleted first. If you want to +delete a directory together with all of its contents, look at the `deleteTree()` method. + +```{zig} +#| auto_main: true +#| eval: false +const cwd = std.fs.cwd(); +try cwd.deleteDir("src"); +``` + + +## Conclusion + +In this chapter, I have described how to perform in Zig the most common filesystem and IO operations. +But you might miss some other, less common, operations in this chapter, such as: how to rename files, +or how to open a directory, or how to create symbolic links, or how to use `access()` to test if a particular +path exists in your computer. For all of these less common tasks, I recommend that you read +the docs of the [`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir] +, since you can find a good description of these cases there. 
+ + + + diff --git a/Chapters/13-image-filter.qmd b/Chapters/13-image-filter.qmd index 84c4813e..45027af0 100644 --- a/Chapters/13-image-filter.qmd +++ b/Chapters/13-image-filter.qmd @@ -1,665 +1,665 @@ ---- -engine: knitr -knitr: true -syntax-definition: "../Assets/zig.xml" ---- - -```{r} -#| include: false -source("../zig_engine.R") -knitr::opts_chunk$set( - auto_main = FALSE, - build_type = "lib" -) -``` - - - -# Project 4 - Developing an image filter - -In this chapter we are going to build a new small project. The objective of -this project is to build a program in Zig that applies a filter over an image. -More specifically, a "grayscale filter". This filter essentially transforms -any color image into a grayscale image. - -We are going to use the image displayed at @fig-pascal as the example in this project. -In other words, we are going to transform this colored image, into a grayscale image, -using our "image filter program" written in Zig. - -![A photo of the chilean-american actor Pedro Pascal. Source: Google Images.](../ZigExamples/image_filter/pedro_pascal.png){#fig-pascal} - -We don't need to write a lot of code to build such "image filter program". However, in order for us -to build such program, we first need to understand how digital images work. That is why we begin this chapter -by explaining the theory behind digital images and how colors are represented in modern computers. -We also give a brief explanation about the file format PNG (Portable Network Graphics), which is used -in the example images. - -At the end of this chapter, we will have a full example of a program that takes the PNG image displayed at @fig-pascal -as input, and writes a new image to the current working directory that is the grayscale version of the input image. -This grayscale version of @fig-pascal that is written by the program is exposed at @fig-pascal-gray. 
-You can find the full source code of this small project at the `ZigExamples/image_filter` -[folder at the official repository of this book](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/image_filter)[^img-filter-folder]. - - -![The grayscale version of the photo.](../ZigExamples/image_filter/pedro_pascal_filter.png){#fig-pascal-gray} - - -## How we see things? {#sec-eyes} - -In this section, I want to briefly describe to you how we (humans) actually see things with our own eyes. -I mean, how our eyes work? If you do have a very basic understanding of how our eyes work, you will understand -more easily how digital images are made. Because the techniques used to produce a digital image -were developed by taking a lot of inspiration from how our human eyes work. - -You can interpret a human eye as a light sensor, or, a light receptor. The eye receives some amount of light as input, -and it interprets the colors that are present in this "amount of light". -If no amount of light hits the eye, then, the eye cannot extract color from it, and as result, -we end up seeing nothing, or, more precisely, we see complete blackness. - -So everything depends on light. What we actually see are the colors (blue, red, orange, green, purple, yellow, etc.) that -are being reflected from the light that is hitting our eyes. **Light is the source of all colors!** -This is what Isaac Newton discovered on his famous prism experiment[^newton] in the 1660s. - -[^newton]: - -Inside our eyes, we have a specific type of cell called the "cone cell". -Our eye have three different types, or, three different versions of these "cone cells". -Each of these three types of cone cell is very sensitive to a specific spectrum of the light, -which are the spectrums that define the colors red, green and blue. -So, in summary, our eyes have specific types of cells that -are highly sensitive to these three colors (red, green and blue). 
- -These are the cells responsible for perceiving the color present in the light that hits our eyes. -As a result, our eyes perceives color as a mixture of these three colors (red, green and blue). By having an amount -of each one of these three colors, and mixing them together, we can get any other visible color -that we want. So every color that we see is perceived as a specific mixture of blues, greens and reds, -like 30% of red, plus 20% of green, plus 50% of blue. - -When these cone cells perceive (or, detect) the colors that are found in the -light that is hitting our eyes, these cells produce electrical signals and sent them to the brain. -Our brain interprets these electrical signals, and use them to form the image that we are seeing -inside our head. - -Based on what we have discussed here, the items below describes the sequence of events that -composes this very simplified version of how our human eyes work: - -1. Light hits our eyes. -1. The cone cells perceive the colors that are present in this light. -1. Cone cells produce electrical signals that describes the colors that were perceived in the light. -1. The electrical signals are sent to the brain. -1. Brain interprets these signals, and form the image based on the colors identified by these electrical signals. - - -## How digital images work? {#sec-digital-img} - -A digital image is a "digital representation" of an image that we see with our eyes. -In other words, a digital image is a "digital representation" of the colors that we see -and perceive through the light. -In the digital world, we have two types of images, which are: vector image and raster image. -Vector images are not described here. So just remember that every single aspect that we discuss -here in this chapter about digital images **are related solely to raster images**, and not vector images. - -Raster images are digital images that are represented as a 2D (two dimensional) matrix -of pixels. 
In other words, every raster image is basically a rectangle of pixels. Each pixel have a particular color. -So, a raster image is just a rectangle of pixels, and each of these pixels are displayed in the screen of your computer (or the screen -of any other device, e.g. laptop, tablet, smartphone, etc.) as a color. - -@fig-raster demonstrates this idea. If you take any raster image, and you zoom into it very hard, -you will see the actual pixels of the image. Although JPEG and PNG are file formats to store raster images, -when you zoom into JPEG, PNG, and some other types of raster image files, you usually do not quite see the pixels. -That is because most of these file formats implement techniques that affect how the pixels are displayed, -with the objective of increasing the details in the image. The most famous of these techniques is anti-aliasing, -which you probably know from video-games. But nevertheless, the pixels are still there! They are -just displayed differently in JPEG/PNG files. - -![Zooming over a raster image to see the pixels. Source: Google Images.](../Figures/imagem-raster.png){#fig-raster} - -The more pixels the image has, the more information and detail we can include in the image. -The more precise, sharp and pretty will look the image. That is why photographic cameras -usually produces big raster images, with several megapixels of resolution, to include as much detail as possible into the final image. -As an example, a digital image with dimensions of 1920 pixels of width and 1080 pixels of height, would be a image that -contains $1920 \times 1080 = 2073600$ pixels in total. You could also say that the "total area" of the image is -of 2073600 pixels, although the concept of "area" is not very used here in computer graphics. - -Most digital images we see in our modern world uses the RGB color model. RGB stands for (red, green and blue) if you did not -detected that yet. 
So the color of each pixel in these raster images are usually -represented as a mixture of red, green and blue, just like in our eyes. That is, the color of each pixel is identified by a set of -three different integer values. Each integer value identifies the "amount" of each color (red, green and blue). -For example, the set `(199, 78, 70)` identifies a color that is close to red. We have 199 of red, 78 of green, -and 70 of blue. In contrast, the set `(129, 77, 250)` describes a color that is more close to purple. Et cetera. - - - -### Images are displayed from top to bottom - -This is not a rule written in stone, but the big majority of digital images are displayed from top -to bottom and left to right. Most computers screens also follow this pattern. So, the first pixels -in the image are the ones that are at the top and left corner of the image. You can find a visual representation -of this logic at @fig-img-display. - -Also notice in @fig-img-display that, since a digital image is essentially a 2D matrix of pixels, -the image is organized into rows and columns of pixels. The columns are defined by the horizontal x axis, -while the rows are defined by the vertical y axis. - -Each pixel (i.e. the gray rectangles) exposed at @fig-img-display contains a number inside of it. -These numbers are the indexes of the pixels. You can notice that the first pixels are in the top and left -corner, and also, that the indexes of these pixels "grow to the sides", or, in other words, they grow in the direction of the horizontal x axis. -This means that most digital images are usually organized as rows of pixels. So when these digital images are -displayed, the screen display the first row of pixels, then, the second row, then, the third row, etc. 
- -![How the pixels of raster images are displayed.](./../Figures/image-display.png){#fig-img-display} - - - - - - -### Representing the matrix of pixels in code {#sec-pixel-repr} - -Ok, we know already that digital images are represented as a 2D matrix of pixels. -But we do not have a notion of a 2D matrix in Zig. Most low-level languages in general (Zig, C, Rust, etc.) do not have such notion. -So how do we represent such matrix of pixels in Zig, or any other low-level language. -The strategy that most programmers choose is to just use a normal 1D array to store the values of -this 2D matrix. In other words, you just create an normal 1D array, and store all values from both dimensions into this 1D array. - - -Now, remember, a digital image is represented as a -2D matrix of pixels, and each pixel is represented by 3 unsigned 8bit-integer values. -So, if we have for example a very small image of dimensions 4x3, then, -we have 12 pixels in total in this image. As a result, we need to create a normal array that -can store $3 \times 12 = 36$ integer values, more precisely, an array of 36 `u8` values. - -The reason why unsigned 8-bit integer (`u8`) values are used to represent the amounts of each color, -instead of any other integer type, is because they take the minimum amount of space as possible, or, -the minimum amount of bits as possible. Which helps to reduces the binary size of the image, i.e. of the 2D matrix. -Also, they convey a good amount of precision and detail about the colors, even though they can represent -a relatively small range (from 0 to 255) of "color amounts". - -Coming back to our initial example of a 4x3 image, -the `matrix` object exposed below could be an example of an 1D array that stores -the data that represents this 4x3 image. 
- -```zig -const matrix = [_]u8{ - 201, 10, 25, 185, 65, 70, - 65, 120, 110, 65, 120, 117, - 98, 95, 12, 213, 26, 88, - 143, 112, 65, 97, 99, 205, - 234, 105, 56, 43, 44, 216, - 45, 59, 243, 211, 209, 54, -}; -``` - -The first three integer values in this array are the color amounts of the first pixel in the image. -While the next three integer are the colors amounts for the second pixel. -And the sequence goes on in this pattern. So the size of the array that stores -the values of the pixels from a raster image is usually a multiple of 3. -In this case, the array have a size of 36. - -I mean, the size of the array is **usually** a multiple of 3, because in specific circumstances, -it can also be a multiple of 4. This happens when a transparency amount is -also included into the raster image. In other words, there are some types of raster images -that follow a different color model, which is the RGBA (red, green, blue and alpha) -color model. The "alpha" corresponds to an amount of transparency in the pixel. -So every pixel in a RGBA image is represented by a red, green, blue and alpha values. - -Most raster images uses the standard RGB model, so, for the most part, you will -see arrays sizes that are multiples of 3. But some images, specially the ones -that are stored in PNG files, might be using the RGBA model, and, therefore, are -represented by an array whose size is a multiple of 4. - -In our case here, the example image of our project (@fig-pascal) is a raster image -stored in a PNG file, and this specific image is using the RGBA color model. So -each pixel in the image is represented by 4 different integer values, and, as consequence, -to store this image in our Zig code, we need to create an array whose size is a multiple of 4. - - -## The PNG library we are going to use - -Let's begin our project by focusing on writing the necessary Zig code to -read the data from the PNG file. 
In other words, we want -to read the PNG file exposed at @fig-pascal, and parse it's data to extract the 2D matrix of pixels -that represents the image. - -As we have discussed at @sec-pixel-repr, the image that we are using as example here -is a PNG file that uses the RGBA color model, and, threfore, each pixel of the image -is represented by 4 integer values. You can download this image by visiting the `ZigExamples/image_filter` -[folder at the official repository of this book](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/image_filter)[^img-filter-folder]. -You can also find in this folder the complete source code of this small project that we -are developing here. - -[^img-filter-folder]: - -There are some C libraries available that we can use to read and parse PNG files. -The most famous and used of all is the `libpng`, which is the "official library" for reading and writing -PNG files. This C library is available in most operating system. But this C library is very known -for being a little complex and hard to use. - -That is why, I'm going to use a more modern alternative here in this project, which is the `libspng` library. -I have choose to use this C library here, because it is much, much -simpler to use than `libpng`, and also, offers very good performance for all operations. -You can checkout the [official website of the library](https://libspng.org/)[^libspng] -to know more about it. -You will also find there, some documentation that might help you to understand and -follow the code examples exposed here. - -[^libspng]: - - -First of all, remember to build and install this `libspng` into your system. Because -if you don't do this step, the `zig` compiler will not find the files and resources of -this library in your computer, and link them with the Zig code that we are going to write here -together. 
There is good information about how to build and install the library at the -[build section of the library documentation at the official website](https://libspng.org/docs/build/)[^lib-build]. - -[^lib-build]: - - - - -## Reading the PNG file - -In order to extract the pixel data from the PNG file, we need to read and decode the file. -A PNG file is just a binary file written in the "PNG format". We need to decode the -file in order to extract the pixel data from it. Luckily, the `libspng` library offers -a function called `spng_decode_image()` that does all this heavy work for us. - -Now, since `libspng` is a C library, most of the file and I/O operations in this library are made by using -a `FILE` C pointer. Because of that, is probably a better idea to use the `fopen()` C function -to open our PNG file, instead of using the `openFile()` method that I introduced at @sec-filesystem. -That is why I'm importing the `stdio.h` header in this project, and using the `fopen()` C function to open the file. - -If you look at the snippet below, you can see that we are: - -1. opening the PNG file with `fopen()`. -1. creating the `libspng` context with `spng_ctx_new()`. -1. using `spng_set_png_file()` to specify the `FILE` object that reads the PNG file that we are going to use. - -Every operation in `libspng` is made through a "context object". In our snippet below, this object is `ctx`. -Also, to perform an operation over a PNG file, we need to specify which PNG file we are referring to. -This is the job of `spng_set_png_file()`. We are using this function to specify the file descriptor -object that reads the PNG file that we want to use. 
- - -```zig -const c = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); -const png = @cImport({ - @cInclude("spng.h"); -}); - -const path = "pedro_pascal.png"; -const file_descriptor = c.fopen(path, "rb"); -if (file_descriptor == null) { - @panic("Could not open file!"); -} -const ctx = png.spng_ctx_new(0) orelse unreachable; -_ = png.spng_set_png_file( - ctx, @ptrCast(file_descriptor) -); -``` - -Before we continue, is important to emphasize the following: since we have opened the file with `fopen()`, -we have to remember to close the file at the end of the program, with `fclose()`. -In other words, after we have done everything that we wanted to do with the PNG file -`pedro_pascal.png`, we need to close the file, by applying `fclose()` over the file descriptor object. -We could use also the `defer` keyword to help us in this task, if we want to. -This code snippet below demonstrates this step: - -```zig -if (c.fclose(file_descriptor) != 0) { - return error.CouldNotCloseFileDescriptor; -} -``` - - - - -### Reading the image header section - -Now, the context object `ctx` is aware of our PNG file `pedro_pascal.png`, and it has access to -a file descriptor object to this file, and we can start doing operations in this file. The first thing that we -are going to do is to read the "image header section" of the PNG file. This "image header section" is the section -of the file that contains some basic information about the PNG file, like, the bit depth of the pixel data -of the image, the color model used in the file, the dimensions of the image (height and width in number of pixels), -etc. - -To make things easier, I will encapsulate this "read image header" operation into a -nice small function called `get_image_header()`. All that this function needs to do -is to call the `spng_get_ihdr()` function. This function from `libspng` is responsible -for reading the image header data, and storing it into a C struct named `spng_ihdr`. 
-Thus, an object of type `spng_ihdr` is a C struct that contains the data from the -image header section of the PNG file. - -Since this Zig function is receiving a C object (the `libspng` context object) as input, I marked -the function argument `ctx` as "a pointer to the context object" (`*png.spng_ctx`), following the recommendations -that we have discussed at @sec-pass-c-structs. - -```zig -fn get_image_header(ctx: *png.spng_ctx) !png.spng_ihdr { - var image_header: png.spng_ihdr = undefined; - if (png.spng_get_ihdr(ctx, &image_header) != 0) { - return error.CouldNotGetImageHeader; - } - - return image_header; -} - -var image_header = try get_image_header(ctx); -``` - -Also notice in this function, that I'm checking -if the `spng_get_ihdr()` function call have returned or not an integer value that is different than zero. -Most functions from the `libspng` library return a code status as result, and the code status zero means -"success". So any code status that is different than zero means that an error occurred while running `spng_get_ihdr()`. -This is why I'm returning an error value from the function in case the code status returned by the function -is different than zero. - - -### Allocating space for the pixel data - -Before we read the pixel data from the PNG file, we need to allocate enough space to hold this data. -But in order to allocate such space, we first need to know how much space we need to allocate. -The image dimension is obviously needed to calculate the size of this space. But there are -other elements that also affect this number, such as the color model used in the image, the bit depth, and others. - -Anyway, all of this means that calculating the size of the space that we need, is not a simple task. -That is why the `libspng` library offers a utility function named -`spng_decoded_image_size()` to calculate this size for us. 
Once again, I'm going -to encapsulate the logic around this function into a nice small Zig function -named `calc_output_size()`. -You can see that this function returns a nice integer value as result, informing -the size of the space that we need to allocate. - - -```zig -fn calc_output_size(ctx: *png.spng_ctx) !u64 { - var output_size: u64 = 0; - const status = png.spng_decoded_image_size( - ctx, png.SPNG_FMT_RGBA8, &output_size - ); - if (status != 0) { - return error.CouldNotCalcOutputSize; - } - return output_size; -} -``` - - - -You might quest yourself what the value `SPNG_FMT_RGBA8` means. This value is actually an enum value defined in the `spng.h` header file. -This enum is used to identify a "PNG format". More precisely, it identifies a PNG file -that uses the RGBA color model and 8 bit depth. So by providing this enum value as input to the `spng_decoded_image_size()` function, -we are saying to this function to calculate the size of the decoded pixel data considering a PNG file that follows this -"RGBA color model with 8 bit depth format". - -Having this function, we can use it in conjunction with an allocator object, to allocate an array of bytes (`u8` values) that is big enough -to store the decoded pixel data of the image. Notice that I'm using `@memset()` to properly initialize -the entire array to zero. - -```zig -const output_size = try calc_output_size(ctx); -var buffer = try allocator.alloc(u8, output_size); -@memset(buffer[0..], 0); -``` - - -### Decoding the image data - -Now that we have the necessary space to store the decoded pixel data of the image, -we can start to actually decode and extract this pixel data from the image, -by using the `spng_decode_image()` function. - -The `read_data_to_buffer()` function exposed below summarises the necessary -steps to read this decoded pixel data, and store it into an input buffer. -Notice that this function is encapsulating the logic around the `spng_decode_image()` function. 
-Also, we are using the `SPNG_FMT_RGBA8` enum value once again to inform the corresponding function, -that the PNG image being decoded, uses the RGBA color model and 8 bit depth. - -```zig -fn read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void { - const status = png.spng_decode_image( - ctx, - buffer.ptr, - buffer.len, - png.SPNG_FMT_RGBA8, - 0 - ); - - if (status != 0) { - return error.CouldNotDecodeImage; - } -} -``` - -Having this function at hand, we can apply it over our context object, and also, over -the buffer object that we have created in the previous section to hold the decoded pixel data -of the image: - -```zig -try read_data_to_buffer(ctx, buffer[0..]); -``` - - -### Looking at the pixel data - -Now that we have the pixel data stored in our "buffer object", we can take just -a quick look at the bytes. In the example below, we are looking at the first -12 bytes in the decoded pixel data. - -If you take a close look at these values, you might notice that every 4 bytes -in the sequence is 255. Which, coincidentally is the maximum possible integer value -to be represented by a `u8` value. So, if the range from 0 to 255, which is the range -of integer values that can be represented by an `u8` value, can be represented as a scale from 0% to 100%, -these 255 values are essentially 100% in that scale. - -If you recall from @sec-pixel-repr, I have -described in that section that our `pedro_pascal.png` PNG file uses the RGBA color model, -which adds an alpha (or transparency) byte to each pixel in the image. -As consequence, each pixel in the image is represented by 4 bytes. Since we are looking -here are the first 12 bytes in the image, it means that we are looking at the data of -the first $12 / 4 = 3$ pixels in the image. - -So, based on how these first 12 bytes (or these 3 pixels) look, with these 255 values at every 4 bytes, we can say that is likely -that every pixel in the image have alpha (or transparency) setted to 100%. 
This might not be true, -but, is the most likely possibility. Also, if we look at the image itself, which if your recall is -exposed at @fig-pascal, we can see that the transparency of the image does not change across the image, -which enforces this theory. - - -```zig -try stdout.print( - "{any}\n", .{buffer[0..12]} -); -``` - -``` -{ - 200, 194, 216, 255, 203, 197, - 219, 255, 206, 200, 223, 255 -} -``` - - -We can see in the above result that the first pixel in this image have 200 of red, 194 of green, and 216 of blue. -How do I know the order in which the colors appears in the sequence? If you have not guessed that yet, -is because of the acronym RGB. First RED, then GREEN, then BLUE. If we scale these integer values -according to our scale of 0% to 100% (0 to 255), we get 78% of red, 76% of green and 85% of blue. - - - -## Applying the image filter - -Now that we have the data of each pixel in the image, we can focus on applying our image -filter over these pixels. Remember, our objective here is to apply a grayscale filter over -the image. A grayscale filter is a filter that transforms a colored image into a grayscale image. - -There are different formulas and strategies to transform a colored image into a grayscale image. -But all of these different strategies involve -applying some math over the colors of each pixel. In this project, we are going to use the -most general formula, which is exposed below. -This formula considers $r$ as the red of the pixel, $g$ as the green, $b$ as the blue, and $p'$ as the -linear luminance of the pixel. - -$$ - p' = (0.2126 \times r) + (0.7152 \times g) + (0.0722 \times b) -$${#eq-grayscale} - -This @eq-grayscale is the formula to calculate the linear luminance of a pixel. Is worth noting that this formula -works only for images whose pixels are using the sRGB color space, which is the standard color space -for the web. This means that, ideally, all images on the web should use this color space. 
Luckily, -this is our case here, i.e. the `pedro_pascal.png` image is using this sRGB color space, and, as consequence, -we can use the @eq-grayscale. You can read more about this at the Wikipedia page for grayscale [@wiki_grayscale]. - -The `apply_image_filter()` function exposed below summarises the necessary steps to -apply @eq-grayscale over the pixels in the image. We just apply this function -over our buffer object that contains our pixel data, and, as result, the pixel -data stored in this buffer object should now represent the grayscale version of our image. - -```zig -fn apply_image_filter(buffer:[]u8) !void { - const len = buffer.len; - const red_factor: f16 = 0.2126; - const green_factor: f16 = 0.7152; - const blue_factor: f16 = 0.0722; - var index: u64 = 0; - while (index < (len - 4)) : (index += 4) { - const rf: f16 = @floatFromInt(buffer[index]); - const gf: f16 = @floatFromInt(buffer[index + 1]); - const bf: f16 = @floatFromInt(buffer[index + 2]); - const y_linear: f16 = ( - (rf * red_factor) + (gf * green_factor) - + (bf * blue_factor) - ); - buffer[index] = @intFromFloat(y_linear); - buffer[index + 1] = @intFromFloat(y_linear); - buffer[index + 2] = @intFromFloat(y_linear); - } -} - -try apply_image_filter(buffer[0..]); -``` - - - -## Saving the grayscale version of the image - -Since we have now the grayscale version of our image stored in our buffer object, -we need to encode this buffer object back into the "PNG format", and save the encoded data into -a new PNG file in our filesystem, so that we can access and see the grayscale version of our image -that was produced by our small program. - -To do that, the `libspng` library help us once again by offering an "encode data to PNG" type of function, -which is the `spng_encode_image()` function. But in order to "encode data to PNG" with `libspng`, we need -to create a new context object. This new context object must use a "encoder context", which -is identified by the enum value `SPNG_CTX_ENCODER`. 
- -The `save_png()` function exposed below, summarises all the necessary steps to save the -grayscale version of our image into a new PNG file in the filesystem. By default, this -function will save the grayscale image into a file named `pedro_pascal_filter.png` in the CWD. - -Notice in this code example that we are using the same image header object (`image_header`) that we have -collected previously with the `get_image_header()` function. Remember, this image header object -is a C struct (`spng_ihdr`) that contains basic information about our PNG file, specially -the dimensions of the image, the color model used, etc. - -If we wanted to save a very different image in this new PNG file, e.g. an image -with different dimensions, or, an image that uses a different color model, a different bit depth, etc. -we would need to create a new image header (`spng_ihdr`) object that describes the properties -of this new image. - -But we are essentially saving the same image that we have begin with here (the dimensions of -the image, the color model, etc. are all still the same). The only difference -between the two images are the colors of the pixels, which are now shades of gray. -As consequence, we can safely use the exact same image header data -in this new PNG file. 
- - - -```zig -fn save_png(image_header: *png.spng_ihdr, buffer: []u8) !void { - const path = "pedro_pascal_filter.png"; - const file_descriptor = c.fopen(path.ptr, "wb"); - if (file_descriptor == null) { - return error.CouldNotOpenFile; - } - const ctx = ( - png.spng_ctx_new(png.SPNG_CTX_ENCODER) - orelse unreachable - ); - defer png.spng_ctx_free(ctx); - _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); - _ = png.spng_set_ihdr(ctx, image_header); - - const encode_status = png.spng_encode_image( - ctx, - buffer.ptr, - buffer.len, - png.SPNG_FMT_PNG, - png.SPNG_ENCODE_FINALIZE - ); - if (encode_status != 0) { - return error.CouldNotEncodeImage; - } - if (c.fclose(file_descriptor) != 0) { - return error.CouldNotCloseFileDescriptor; - } -} - -try save_png(&image_header, buffer[0..]); -``` - -After we execute this `save_png()` function, we should have a new PNG file -inside our CWD, named `pedro_pascal_filter.png`. If we open this PNG file, -we will see the same image exposed at @fig-pascal-gray. - - -## Building our project - -Now that we have written the code, let's discuss how can we build/compile this project. -To do that, I'm going to create a `build.zig` file in the root directory of our project, -and start writing the necessary code to compile the project, using the knowledge -that we have acquired from @sec-build-system. - - -We first create the build target for our executable file, that executes our -Zig code. Let's suppose that all of our Zig code was written into a Zig module -named `image_filter.zig`, then, the `exe` object described in the build script below -describes the build target for our executable file. - -Since we have used C code from the `libspng` library in our Zig code, -we need to link our Zig code (which is in the `exe` build target) to both -the C Standard Library, and, to the `libspng` library. We do that, by calling -the `linkLibC()` and `linkSystemLibrary()` methods from our `exe` build target. 
- -```zig -const std = @import("std"); -pub fn build(b: *std.Build) void { - const target = b.standardTargetOptions(.{}); - const optimize = b.standardOptimizeOption(.{}); - const exe = b.addExecutable(.{ - .name = "image_filter", - .root_source_file = b.path("src/image_filter.zig"), - .target = target, - .optimize = optimize, - }); - exe.linkLibC(); - // Link to spng library: - exe.linkSystemLibrary("spng"); - b.installArtifact(exe); -} -``` - -Because we are using the `linkSystemLibrary()` method, it means that the library -files for `libspng` are searched in your system to be linked with the `exe` build target. -If you have not yet built and installed the `libspng` library into your system, this -linkage step will likely not work. Because it will not find the library files. - - -So, just remember to install `libspng` in your system, if you want to build this project. -Having this build script above written, we can finally build our project by -running the `zig build` command in the terminal. - - +--- +engine: knitr +knitr: true +syntax-definition: "../Assets/zig.xml" +--- + +```{r} +#| include: false +source("../zig_engine.R") +knitr::opts_chunk$set( + auto_main = FALSE, + build_type = "lib" +) +``` + + + +# Project 4 - Developing an image filter + +In this chapter we are going to build a new small project. The objective of +this project is to build a program in Zig that applies a filter over an image. +More specifically, a "grayscale filter". This filter essentially transforms +any color image into a grayscale image. + +We are going to use the image displayed at @fig-pascal as the example in this project. +In other words, we are going to transform this colored image, into a grayscale image, +using our "image filter program" written in Zig. + +![A photo of the chilean-american actor Pedro Pascal. Source: Google Images.](../ZigExamples/image_filter/pedro_pascal.png){#fig-pascal} + +We don't need to write a lot of code to build such "image filter program". 
However, in order for us +to build such program, we first need to understand how digital images work. That is why we begin this chapter +by explaining the theory behind digital images and how colors are represented in modern computers. +We also give a brief explanation about the file format PNG (Portable Network Graphics), which is used +in the example images. + +At the end of this chapter, we will have a full example of a program that takes the PNG image displayed at @fig-pascal +as input, and writes a new image to the current working directory that is the grayscale version of the input image. +This grayscale version of @fig-pascal that is written by the program is exposed at @fig-pascal-gray. +You can find the full source code of this small project at the `ZigExamples/image_filter` +[folder at the official repository of this book](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/image_filter)[^img-filter-folder]. + + +![The grayscale version of the photo.](../ZigExamples/image_filter/pedro_pascal_filter.png){#fig-pascal-gray} + + +## How we see things? {#sec-eyes} + +In this section, I want to briefly describe to you how we (humans) actually see things with our own eyes. +I mean, how our eyes work? If you do have a very basic understanding of how our eyes work, you will understand +more easily how digital images are made. Because the techniques used to produce a digital image +were developed by taking a lot of inspiration from how our human eyes work. + +You can interpret a human eye as a light sensor, or, a light receptor. The eye receives some amount of light as input, +and it interprets the colors that are present in this "amount of light". +If no amount of light hits the eye, then, the eye cannot extract color from it, and as result, +we end up seeing nothing, or, more precisely, we see complete blackness. + +So everything depends on light. What we actually see are the colors (blue, red, orange, green, purple, yellow, etc.) 
that +are being reflected from the light that is hitting our eyes. **Light is the source of all colors!** +This is what Isaac Newton discovered on his famous prism experiment[^newton] in the 1660s. + +[^newton]: + +Inside our eyes, we have a specific type of cell called the "cone cell". +Our eye have three different types, or, three different versions of these "cone cells". +Each of these three types of cone cell is very sensitive to a specific spectrum of the light, +which are the spectrums that define the colors red, green and blue. +So, in summary, our eyes have specific types of cells that +are highly sensitive to these three colors (red, green and blue). + +These are the cells responsible for perceiving the color present in the light that hits our eyes. +As a result, our eyes perceives color as a mixture of these three colors (red, green and blue). By having an amount +of each one of these three colors, and mixing them together, we can get any other visible color +that we want. So every color that we see is perceived as a specific mixture of blues, greens and reds, +like 30% of red, plus 20% of green, plus 50% of blue. + +When these cone cells perceive (or, detect) the colors that are found in the +light that is hitting our eyes, these cells produce electrical signals and sent them to the brain. +Our brain interprets these electrical signals, and use them to form the image that we are seeing +inside our head. + +Based on what we have discussed here, the items below describes the sequence of events that +composes this very simplified version of how our human eyes work: + +1. Light hits our eyes. +1. The cone cells perceive the colors that are present in this light. +1. Cone cells produce electrical signals that describes the colors that were perceived in the light. +1. The electrical signals are sent to the brain. +1. Brain interprets these signals, and form the image based on the colors identified by these electrical signals. + + +## How digital images work? 
{#sec-digital-img} + +A digital image is a "digital representation" of an image that we see with our eyes. +In other words, a digital image is a "digital representation" of the colors that we see +and perceive through the light. +In the digital world, we have two types of images, which are: vector image and raster image. +Vector images are not described here. So just remember that every single aspect that we discuss +here in this chapter about digital images **are related solely to raster images**, and not vector images. + +Raster images are digital images that are represented as a 2D (two dimensional) matrix +of pixels. In other words, every raster image is basically a rectangle of pixels. Each pixel have a particular color. +So, a raster image is just a rectangle of pixels, and each of these pixels are displayed in the screen of your computer (or the screen +of any other device, e.g. laptop, tablet, smartphone, etc.) as a color. + +@fig-raster demonstrates this idea. If you take any raster image, and you zoom into it very hard, +you will see the actual pixels of the image. Although JPEG and PNG are file formats to store raster images, +when you zoom into JPEG, PNG, and some other types of raster image files, you usually do not quite see the pixels. +That is because most of these file formats implement techniques that affect how the pixels are displayed, +with the objective of increasing the details in the image. The most famous of these techniques is anti-aliasing, +which you probably know from video-games. But nevertheless, the pixels are still there! They are +just displayed differently in JPEG/PNG files. + +![Zooming over a raster image to see the pixels. Source: Google Images.](../Figures/imagem-raster.png){#fig-raster} + +The more pixels the image has, the more information and detail we can include in the image. +The more precise, sharp and pretty will look the image. 
That is why photographic cameras +usually produces big raster images, with several megapixels of resolution, to include as much detail as possible into the final image. +As an example, a digital image with dimensions of 1920 pixels of width and 1080 pixels of height, would be a image that +contains $1920 \times 1080 = 2073600$ pixels in total. You could also say that the "total area" of the image is +of 2073600 pixels, although the concept of "area" is not very used here in computer graphics. + +Most digital images we see in our modern world uses the RGB color model. RGB stands for (red, green and blue) if you did not +detected that yet. So the color of each pixel in these raster images are usually +represented as a mixture of red, green and blue, just like in our eyes. That is, the color of each pixel is identified by a set of +three different integer values. Each integer value identifies the "amount" of each color (red, green and blue). +For example, the set `(199, 78, 70)` identifies a color that is close to red. We have 199 of red, 78 of green, +and 70 of blue. In contrast, the set `(129, 77, 250)` describes a color that is more close to purple. Et cetera. + + + +### Images are displayed from top to bottom + +This is not a rule written in stone, but the big majority of digital images are displayed from top +to bottom and left to right. Most computers screens also follow this pattern. So, the first pixels +in the image are the ones that are at the top and left corner of the image. You can find a visual representation +of this logic at @fig-img-display. + +Also notice in @fig-img-display that, since a digital image is essentially a 2D matrix of pixels, +the image is organized into rows and columns of pixels. The columns are defined by the horizontal x axis, +while the rows are defined by the vertical y axis. + +Each pixel (i.e. the gray rectangles) exposed at @fig-img-display contains a number inside of it. +These numbers are the indexes of the pixels. 
You can notice that the first pixels are in the top and left +corner, and also, that the indexes of these pixels "grow to the sides", or, in other words, they grow in the direction of the horizontal x axis. +This means that most digital images are usually organized as rows of pixels. So when these digital images are +displayed, the screen display the first row of pixels, then, the second row, then, the third row, etc. + +![How the pixels of raster images are displayed.](./../Figures/image-display.png){#fig-img-display} + + + + + + +### Representing the matrix of pixels in code {#sec-pixel-repr} + +Ok, we know already that digital images are represented as a 2D matrix of pixels. +But we do not have a notion of a 2D matrix in Zig. Most low-level languages in general (Zig, C, Rust, etc.) do not have such notion. +So how do we represent such matrix of pixels in Zig, or any other low-level language. +The strategy that most programmers choose is to just use a normal 1D array to store the values of +this 2D matrix. In other words, you just create an normal 1D array, and store all values from both dimensions into this 1D array. + + +Now, remember, a digital image is represented as a +2D matrix of pixels, and each pixel is represented by 3 unsigned 8bit-integer values. +So, if we have for example a very small image of dimensions 4x3, then, +we have 12 pixels in total in this image. As a result, we need to create a normal array that +can store $3 \times 12 = 36$ integer values, more precisely, an array of 36 `u8` values. + +The reason why unsigned 8-bit integer (`u8`) values are used to represent the amounts of each color, +instead of any other integer type, is because they take the minimum amount of space as possible, or, +the minimum amount of bits as possible. Which helps to reduces the binary size of the image, i.e. of the 2D matrix. 
+Also, they convey a good amount of precision and detail about the colors, even though they can represent +a relatively small range (from 0 to 255) of "color amounts". + +Coming back to our initial example of a 4x3 image, +the `matrix` object exposed below could be an example of an 1D array that stores +the data that represents this 4x3 image. + +```zig +const matrix = [_]u8{ + 201, 10, 25, 185, 65, 70, + 65, 120, 110, 65, 120, 117, + 98, 95, 12, 213, 26, 88, + 143, 112, 65, 97, 99, 205, + 234, 105, 56, 43, 44, 216, + 45, 59, 243, 211, 209, 54, +}; +``` + +The first three integer values in this array are the color amounts of the first pixel in the image. +While the next three integer are the colors amounts for the second pixel. +And the sequence goes on in this pattern. So the size of the array that stores +the values of the pixels from a raster image is usually a multiple of 3. +In this case, the array have a size of 36. + +I mean, the size of the array is **usually** a multiple of 3, because in specific circumstances, +it can also be a multiple of 4. This happens when a transparency amount is +also included into the raster image. In other words, there are some types of raster images +that follow a different color model, which is the RGBA (red, green, blue and alpha) +color model. The "alpha" corresponds to an amount of transparency in the pixel. +So every pixel in a RGBA image is represented by a red, green, blue and alpha values. + +Most raster images uses the standard RGB model, so, for the most part, you will +see arrays sizes that are multiples of 3. But some images, specially the ones +that are stored in PNG files, might be using the RGBA model, and, therefore, are +represented by an array whose size is a multiple of 4. + +In our case here, the example image of our project (@fig-pascal) is a raster image +stored in a PNG file, and this specific image is using the RGBA color model. 
So +each pixel in the image is represented by 4 different integer values, and, as a consequence, +to store this image in our Zig code, we need to create an array whose size is a multiple of 4. + + +## The PNG library we are going to use + +Let's begin our project by focusing on writing the necessary Zig code to +read the data from the PNG file. In other words, we want +to read the PNG file exposed at @fig-pascal, and parse its data to extract the 2D matrix of pixels +that represents the image. + +As we have discussed at @sec-pixel-repr, the image that we are using as example here +is a PNG file that uses the RGBA color model, and, therefore, each pixel of the image +is represented by 4 integer values. You can download this image by visiting the `ZigExamples/image_filter` +[folder at the official repository of this book](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/image_filter)[^img-filter-folder]. +You can also find in this folder the complete source code of this small project that we +are developing here. + +[^img-filter-folder]: + +There are some C libraries available that we can use to read and parse PNG files. +The most famous and used of all is the `libpng`, which is the "official library" for reading and writing +PNG files. This C library is available in most operating systems. But this C library is well known +for being a little complex and hard to use. + +That is why, I'm going to use a more modern alternative here in this project, which is the `libspng` library. +I have chosen to use this C library here, because it is much, much +simpler to use than `libpng`, and also, offers very good performance for all operations. +You can check out the [official website of the library](https://libspng.org/)[^libspng] +to know more about it. +You will also find there, some documentation that might help you to understand and +follow the code examples exposed here. 
+ +[^libspng]: + + +First of all, remember to build and install this `libspng` into your system. Because +if you don't do this step, the `zig` compiler will not find the files and resources of +this library in your computer, and link them with the Zig code that we are going to write here +together. There is good information about how to build and install the library at the +[build section of the library documentation at the official website](https://libspng.org/docs/build/)[^lib-build]. + +[^lib-build]: + + + + +## Reading the PNG file + +In order to extract the pixel data from the PNG file, we need to read and decode the file. +A PNG file is just a binary file written in the "PNG format". We need to decode the +file in order to extract the pixel data from it. Luckily, the `libspng` library offers +a function called `spng_decode_image()` that does all this heavy work for us. + +Now, since `libspng` is a C library, most of the file and I/O operations in this library are made by using +a `FILE` C pointer. Because of that, is probably a better idea to use the `fopen()` C function +to open our PNG file, instead of using the `openFile()` method that I introduced at @sec-filesystem. +That is why I'm importing the `stdio.h` header in this project, and using the `fopen()` C function to open the file. + +If you look at the snippet below, you can see that we are: + +1. opening the PNG file with `fopen()`. +1. creating the `libspng` context with `spng_ctx_new()`. +1. using `spng_set_png_file()` to specify the `FILE` object that reads the PNG file that we are going to use. + +Every operation in `libspng` is made through a "context object". In our snippet below, this object is `ctx`. +Also, to perform an operation over a PNG file, we need to specify which PNG file we are referring to. +This is the job of `spng_set_png_file()`. We are using this function to specify the file descriptor +object that reads the PNG file that we want to use. 
+ + +```zig +const c = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); +const png = @cImport({ + @cInclude("spng.h"); +}); + +const path = "pedro_pascal.png"; +const file_descriptor = c.fopen(path, "rb"); +if (file_descriptor == null) { + @panic("Could not open file!"); +} +const ctx = png.spng_ctx_new(0) orelse unreachable; +_ = png.spng_set_png_file( + ctx, @ptrCast(file_descriptor) +); +``` + +Before we continue, is important to emphasize the following: since we have opened the file with `fopen()`, +we have to remember to close the file at the end of the program, with `fclose()`. +In other words, after we have done everything that we wanted to do with the PNG file +`pedro_pascal.png`, we need to close the file, by applying `fclose()` over the file descriptor object. +We could use also the `defer` keyword to help us in this task, if we want to. +This code snippet below demonstrates this step: + +```zig +if (c.fclose(file_descriptor) != 0) { + return error.CouldNotCloseFileDescriptor; +} +``` + + + + +### Reading the image header section + +Now, the context object `ctx` is aware of our PNG file `pedro_pascal.png`, and it has access to +a file descriptor object to this file, and we can start doing operations in this file. The first thing that we +are going to do is to read the "image header section" of the PNG file. This "image header section" is the section +of the file that contains some basic information about the PNG file, like, the bit depth of the pixel data +of the image, the color model used in the file, the dimensions of the image (height and width in number of pixels), +etc. + +To make things easier, I will encapsulate this "read image header" operation into a +nice small function called `get_image_header()`. All that this function needs to do +is to call the `spng_get_ihdr()` function. This function from `libspng` is responsible +for reading the image header data, and storing it into a C struct named `spng_ihdr`. 
+Thus, an object of type `spng_ihdr` is a C struct that contains the data from the +image header section of the PNG file. + +Since this Zig function is receiving a C object (the `libspng` context object) as input, I marked +the function argument `ctx` as "a pointer to the context object" (`*png.spng_ctx`), following the recommendations +that we have discussed at @sec-pass-c-structs. + +```zig +fn get_image_header(ctx: *png.spng_ctx) !png.spng_ihdr { + var image_header: png.spng_ihdr = undefined; + if (png.spng_get_ihdr(ctx, &image_header) != 0) { + return error.CouldNotGetImageHeader; + } + + return image_header; +} + +var image_header = try get_image_header(ctx); +``` + +Also notice in this function, that I'm checking +if the `spng_get_ihdr()` function call have returned or not an integer value that is different than zero. +Most functions from the `libspng` library return a code status as result, and the code status zero means +"success". So any code status that is different than zero means that an error occurred while running `spng_get_ihdr()`. +This is why I'm returning an error value from the function in case the code status returned by the function +is different than zero. + + +### Allocating space for the pixel data + +Before we read the pixel data from the PNG file, we need to allocate enough space to hold this data. +But in order to allocate such space, we first need to know how much space we need to allocate. +The image dimension is obviously needed to calculate the size of this space. But there are +other elements that also affect this number, such as the color model used in the image, the bit depth, and others. + +Anyway, all of this means that calculating the size of the space that we need, is not a simple task. +That is why the `libspng` library offers a utility function named +`spng_decoded_image_size()` to calculate this size for us. 
Once again, I'm going +to encapsulate the logic around this function into a nice small Zig function +named `calc_output_size()`. +You can see that this function returns a nice integer value as result, informing +the size of the space that we need to allocate. + + +```zig +fn calc_output_size(ctx: *png.spng_ctx) !u64 { + var output_size: u64 = 0; + const status = png.spng_decoded_image_size( + ctx, png.SPNG_FMT_RGBA8, &output_size + ); + if (status != 0) { + return error.CouldNotCalcOutputSize; + } + return output_size; +} +``` + + + +You might ask yourself what the value `SPNG_FMT_RGBA8` means. This value is actually an enum value defined in the `spng.h` header file. +This enum is used to identify a "PNG format". More precisely, it identifies a PNG file +that uses the RGBA color model and 8 bit depth. So by providing this enum value as input to the `spng_decoded_image_size()` function, +we are telling this function to calculate the size of the decoded pixel data considering a PNG file that follows this +"RGBA color model with 8 bit depth format". + +Having this function, we can use it in conjunction with an allocator object, to allocate an array of bytes (`u8` values) that is big enough +to store the decoded pixel data of the image. Notice that I'm using `@memset()` to properly initialize +the entire array to zero. + +```zig +const output_size = try calc_output_size(ctx); +var buffer = try allocator.alloc(u8, output_size); +@memset(buffer[0..], 0); +``` + + +### Decoding the image data + +Now that we have the necessary space to store the decoded pixel data of the image, +we can start to actually decode and extract this pixel data from the image, +by using the `spng_decode_image()` function. + +The `read_data_to_buffer()` function exposed below summarises the necessary +steps to read this decoded pixel data, and store it into an input buffer. +Notice that this function is encapsulating the logic around the `spng_decode_image()` function. 
+Also, we are using the `SPNG_FMT_RGBA8` enum value once again to inform the corresponding function, +that the PNG image being decoded, uses the RGBA color model and 8 bit depth. + +```zig +fn read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void { + const status = png.spng_decode_image( + ctx, + buffer.ptr, + buffer.len, + png.SPNG_FMT_RGBA8, + 0 + ); + + if (status != 0) { + return error.CouldNotDecodeImage; + } +} +``` + +Having this function at hand, we can apply it over our context object, and also, over +the buffer object that we have created in the previous section to hold the decoded pixel data +of the image: + +```zig +try read_data_to_buffer(ctx, buffer[0..]); +``` + + +### Looking at the pixel data + +Now that we have the pixel data stored in our "buffer object", we can take just +a quick look at the bytes. In the example below, we are looking at the first +12 bytes in the decoded pixel data. + +If you take a close look at these values, you might notice that every fourth byte +in the sequence is 255. Which, coincidentally, is the maximum possible integer value +to be represented by a `u8` value. So, if the range from 0 to 255, which is the range +of integer values that can be represented by an `u8` value, can be represented as a scale from 0% to 100%, +these 255 values are essentially 100% in that scale. + +If you recall from @sec-pixel-repr, I have +described in that section that our `pedro_pascal.png` PNG file uses the RGBA color model, +which adds an alpha (or transparency) byte to each pixel in the image. +As a consequence, each pixel in the image is represented by 4 bytes. Since we are looking +here at the first 12 bytes in the image, it means that we are looking at the data of +the first $12 / 4 = 3$ pixels in the image. + +So, based on how these first 12 bytes (or these 3 pixels) look, with these 255 values at every fourth byte, we can say that it is likely +that every pixel in the image has alpha set to 100% (i.e. fully opaque). 
This might not be true, +but it is the most likely possibility. Also, if we look at the image itself, which if you recall is +exposed at @fig-pascal, we can see that the transparency of the image does not change across the image, +which reinforces this theory. + + +```zig +try stdout.print( + "{any}\n", .{buffer[0..12]} +); +``` + +``` +{ + 200, 194, 216, 255, 203, 197, + 219, 255, 206, 200, 223, 255 +} +``` + + +We can see in the above result that the first pixel in this image has 200 of red, 194 of green, and 216 of blue. +How do I know the order in which the colors appear in the sequence? If you have not guessed that yet, +it is because of the acronym RGB. First RED, then GREEN, then BLUE. If we scale these integer values +according to our scale of 0% to 100% (0 to 255), we get 78% of red, 76% of green and 85% of blue. + + + +## Applying the image filter + +Now that we have the data of each pixel in the image, we can focus on applying our image +filter over these pixels. Remember, our objective here is to apply a grayscale filter over +the image. A grayscale filter is a filter that transforms a colored image into a grayscale image. + +There are different formulas and strategies to transform a colored image into a grayscale image. +But all of these different strategies involve +applying some math over the colors of each pixel. In this project, we are going to use the +most general formula, which is exposed below. +This formula considers $r$ as the red of the pixel, $g$ as the green, $b$ as the blue, and $p'$ as the +linear luminance of the pixel. + +$$ + p' = (0.2126 \times r) + (0.7152 \times g) + (0.0722 \times b) +$${#eq-grayscale} + +This @eq-grayscale is the formula to calculate the linear luminance of a pixel. It is worth noting that this formula +works only for images whose pixels are using the sRGB color space, which is the standard color space +for the web. This means that, ideally, all images on the web should use this color space. 
Luckily, +this is our case here, i.e. the `pedro_pascal.png` image is using this sRGB color space, and, as a consequence, +we can use the @eq-grayscale. You can read more about this at the Wikipedia page for grayscale [@wiki_grayscale]. + +The `apply_image_filter()` function exposed below summarises the necessary steps to +apply @eq-grayscale over the pixels in the image. We just apply this function +over our buffer object that contains our pixel data, and, as a result, the pixel +data stored in this buffer object should now represent the grayscale version of our image. + +```zig +fn apply_image_filter(buffer:[]u8) !void { + const len = buffer.len; + const red_factor: f16 = 0.2126; + const green_factor: f16 = 0.7152; + const blue_factor: f16 = 0.0722; + var index: u64 = 0; + while ((index + 4) <= len) : (index += 4) { + const rf: f16 = @floatFromInt(buffer[index]); + const gf: f16 = @floatFromInt(buffer[index + 1]); + const bf: f16 = @floatFromInt(buffer[index + 2]); + const y_linear: f16 = ( + (rf * red_factor) + (gf * green_factor) + + (bf * blue_factor) + ); + buffer[index] = @intFromFloat(y_linear); + buffer[index + 1] = @intFromFloat(y_linear); + buffer[index + 2] = @intFromFloat(y_linear); + } +} + +try apply_image_filter(buffer[0..]); +``` + + + +## Saving the grayscale version of the image + +Since we now have the grayscale version of our image stored in our buffer object, +we need to encode this buffer object back into the "PNG format", and save the encoded data into +a new PNG file in our filesystem, so that we can access and see the grayscale version of our image +that was produced by our small program. + +To do that, the `libspng` library helps us once again by offering an "encode data to PNG" type of function, +which is the `spng_encode_image()` function. But in order to "encode data to PNG" with `libspng`, we need +to create a new context object. This new context object must use an "encoder context", which +is identified by the enum value `SPNG_CTX_ENCODER`. 
+ +The `save_png()` function shown below summarises all the necessary steps to save the +grayscale version of our image into a new PNG file in the filesystem. By default, this +function will save the grayscale image into a file named `pedro_pascal_filter.png` in the CWD. + +Notice in this code example that we are using the same image header object (`image_header`) that we have +collected previously with the `get_image_header()` function. Remember, this image header object +is a C struct (`spng_ihdr`) that contains basic information about our PNG file, especially +the dimensions of the image, the color model used, etc. + +If we wanted to save a very different image in this new PNG file, e.g. an image +with different dimensions, or an image that uses a different color model, a different bit depth, etc., +we would need to create a new image header (`spng_ihdr`) object that describes the properties +of this new image. + +But we are essentially saving the same image that we began with here (the dimensions of +the image, the color model, etc. are all still the same). The only difference +between the two images is the colors of the pixels, which are now shades of gray. +As a consequence, we can safely use the exact same image header data +in this new PNG file. 
+ + + +```zig +fn save_png(image_header: *png.spng_ihdr, buffer: []u8) !void { + const path = "pedro_pascal_filter.png"; + const file_descriptor = c.fopen(path.ptr, "wb"); + if (file_descriptor == null) { + return error.CouldNotOpenFile; + } + const ctx = ( + png.spng_ctx_new(png.SPNG_CTX_ENCODER) + orelse unreachable + ); + defer png.spng_ctx_free(ctx); + _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); + _ = png.spng_set_ihdr(ctx, image_header); + + const encode_status = png.spng_encode_image( + ctx, + buffer.ptr, + buffer.len, + png.SPNG_FMT_PNG, + png.SPNG_ENCODE_FINALIZE + ); + if (encode_status != 0) { + return error.CouldNotEncodeImage; + } + if (c.fclose(file_descriptor) != 0) { + return error.CouldNotCloseFileDescriptor; + } +} + +try save_png(&image_header, buffer[0..]); +``` + +After we execute this `save_png()` function, we should have a new PNG file +inside our CWD, named `pedro_pascal_filter.png`. If we open this PNG file, +we will see the same image exposed at @fig-pascal-gray. + + +## Building our project + +Now that we have written the code, let's discuss how can we build/compile this project. +To do that, I'm going to create a `build.zig` file in the root directory of our project, +and start writing the necessary code to compile the project, using the knowledge +that we have acquired from @sec-build-system. + + +We first create the build target for our executable file, that executes our +Zig code. Let's suppose that all of our Zig code was written into a Zig module +named `image_filter.zig`, then, the `exe` object described in the build script below +describes the build target for our executable file. + +Since we have used C code from the `libspng` library in our Zig code, +we need to link our Zig code (which is in the `exe` build target) to both +the C Standard Library, and, to the `libspng` library. We do that, by calling +the `linkLibC()` and `linkSystemLibrary()` methods from our `exe` build target. 
+ +```zig +const std = @import("std"); +pub fn build(b: *std.Build) void { + const target = b.standardTargetOptions(.{}); + const optimize = b.standardOptimizeOption(.{}); + const exe = b.addExecutable(.{ + .name = "image_filter", + .root_source_file = b.path("src/image_filter.zig"), + .target = target, + .optimize = optimize, + }); + exe.linkLibC(); + // Link to spng library: + exe.linkSystemLibrary("spng"); + b.installArtifact(exe); +} +``` + +Because we are using the `linkSystemLibrary()` method, the library +files for `libspng` are searched for in your system to be linked with the `exe` build target. +If you have not yet built and installed the `libspng` library into your system, this +linkage step will likely fail, because the library files will not be found. + + +So, just remember to install `libspng` in your system, if you want to build this project. +With the build script above written, we can finally build our project by +running the `zig build` command in the terminal. + + diff --git a/Chapters/14-threads.qmd b/Chapters/14-threads.qmd new file mode 100644 index 00000000..a376740c --- /dev/null +++ b/Chapters/14-threads.qmd @@ -0,0 +1,1114 @@ +--- +engine: knitr +knitr: true +syntax-definition: "../Assets/zig.xml" +--- + +```{r} +#| include: false +source("../zig_engine.R") +knitr::opts_chunk$set( + auto_main = FALSE, + build_type = "lib" +) +``` + + + +# Introducing threads and parallelism in Zig {#sec-thread} + +Threads are available in Zig through the `Thread` struct +from the Zig Standard Library. This struct represents a kernel thread, and it follows a POSIX Thread pattern, +meaning that it works similarly to a thread from the `pthread` C library, which is usually available on any distribution +of the GNU C Compiler (`gcc`). If you are not familiar with threads, I will first give you some theory behind them. + + +## What are threads? {#sec-what-thread} + +A thread is basically a separate context of execution. 
+We use threads to introduce parallelism into our program, +which, in most cases, makes the program run faster, because we have multiple tasks +being performed at the same time, parallel to each other. + +Programs are normally single-threaded by default, which means that each program +usually runs on a single thread, or, a single context of execution. When we have only one thread running, we have no +parallelism. And when we don't have parallelism, the commands are executed sequentially, that is, +only one command is executed at a time, one after another. By creating multiple threads inside our program, +we start to execute multiple commands at the same time. + +Programs that create multiple threads are very common in the wild, because many different types +of applications are well suited for parallelism. Good examples are video and photo-editing applications +(e.g. Adobe Photoshop or DaVinci Resolve), +games (e.g. The Witcher 3), and also web browsers (e.g. Google Chrome, Firefox, Microsoft Edge, etc). +For example, in web browsers, threads are normally used to implement tabs. +In other words, the tabs in a web browser usually run as separate threads in the main process of +the web browser. That is, each new tab that you open in your web browser +usually runs on a separate thread of execution. + +By running each tab in a separate thread, we allow all open tabs in the browser to run at the same time, +and independently from each other. For example, you might have YouTube, or Spotify, currently opened in +a tab, and you are listening to some podcast in that tab, while, at the same time, +you are working in another tab, writing an essay on Google Docs. Even if you are not looking +at the YouTube tab, you can still hear the podcast because this YouTube tab is running in parallel +with the other tab where Google Docs is running. + +Without threads, the other alternative would be to run each tab as a completely separate running +process in your computer. 
But that would be a bad choice, because just a few tabs would already consume +too much power and resources from your computer. In other words, it is very expensive to create a completely new process, +compared to creating a new thread of execution. Also, the chances of you experiencing lag and overhead +while using the browser would be significant. Threads are faster to create, and they also consume +far fewer resources from the computer, especially because they share some resources +with the main process. + +Therefore, it is the use of threads in modern web browsers that allows you to hear the podcast +while you are writing something on Google Docs. +Without threads, a web browser would probably be limited to just one single tab. + +Threads are also well-suited for anything that involves serving requests or orders, +because serving a request takes time, and usually involves a lot of "waiting time". +In other words, we spend a lot of time idle, waiting for something to complete. +For example, consider a restaurant. Serving orders in a restaurant usually involves +the following steps: + +1. receive order from the client. +1. pass the order to the kitchen, and wait for the food to be cooked. +1. start cooking the food in the kitchen. +1. when the food is fully cooked, deliver this food to the client. + +If you think about the steps above, you will notice that one big moment of waiting +is present in this whole process, which is while the food is being prepared and cooked +inside the kitchen. While the food is being prepped, both the waiter and the client +are waiting for the food to be ready and delivered. + +If we write a program to represent this restaurant, more specifically, a single-threaded program, then +this program would be very inefficient, because the program would stay idle, waiting for a considerable amount +of time on the "check if food is ready" step. 
+Consider the code snippet shown below that could potentially represent such a +program. + +The problem with this program is the while loop. This program will spend a lot of time +waiting on the while loop, doing nothing more than just checking if the food is ready. +This is a waste of time. Instead of waiting for something to happen, the waiter +could just send the order to the kitchen, and just move on, and continue with receiving +more orders from other clients, and sending more orders to the kitchen, instead +of doing nothing and waiting for the food to be ready. + +```zig +// `Order` and `Waiter` are illustrative types. +const order = Order.init("Pizza Margherita", 1); +const waiter = Waiter.init(); +waiter.receive_order(order); +waiter.ask_kitchen_to_cook(); +var food_not_ready = true; +while (food_not_ready) { + food_not_ready = !waiter.is_food_ready(); +} +const food = waiter.get_food_from_kitchen(); +waiter.send_food_to_client(food); +``` + +This is why threads would be a great fit for this program. We could use threads +to free the waiters from their "waiting duties", so they can go on with their +other tasks, and receive more orders. Take a look at the next example, where I have rewritten the above +program into a different program that uses threads to cook and deliver the orders. + +You can see in this program that when a waiter receives a new order +from a client, this waiter executes the `send_order()` function. +The only thing that this function does is: it creates a new thread +and detaches it. Since creating a thread is a very fast operation, +this `send_order()` function returns almost immediately, +so the waiter spends almost no time worrying about the order, and just +moves on and tries to get the next order from the clients. + +Inside the new thread created, the order gets cooked by a chef, and when the +food is ready, it is delivered to the client's table. 
+ + +```zig +// `Order`, `Chef` and `Waiter` are illustrative types. +fn cook_and_deliver_order(order: Order) void { + const chef = Chef.init(); + const food = chef.cook(order); + chef.deliver_food(food); +} +fn send_order(order: Order) !void { + // Pass the order by value, so the detached thread + // never holds a pointer into a stack frame that is gone. + const cook_thread = try Thread.spawn( + .{}, cook_and_deliver_order, .{order} + ); + cook_thread.detach(); +} + +const waiter = Waiter.init(); +while (true) { + // Assumes `get_new_order()` returns `?Order`. + if (waiter.get_new_order()) |order| { + try send_order(order); + } +} +``` + + + +## Threads versus processes + +When we run a program, this program is executed as a *process* in the operating system. +This is a one-to-one relationship: each program or application that you execute +is a separate process in the operating system. But each program, or each process, +can create and contain multiple threads inside of it. Therefore, +processes and threads have a one-to-many relationship. + +This also means that every thread that we create is always associated with a particular process in our computer. +In other words, a thread is always a subset (or a child) of an existing process. +All threads share some of the resources associated with the process from which they were created. +And because threads share resources with the process, they are very good for making communication +between tasks easier. + +For example, suppose that you were developing a big and complex application +that would be much simpler if you could split it in two, and make these two separate pieces talk +with each other. Some programmers opt to effectively write these two pieces of the codebase as two +completely separate programs, and then, they use IPC (*inter-process communication*) to make these +two separate programs/processes talk to each other, and make them work together. 
+ +However, some programmers find IPC hard to deal with, and, as a consequence, +they prefer to write one piece of the codebase as the "main part of the program", +or, as the part of the code that runs as the process in the operating system, +while the other piece of the codebase is written as a task to be executed in +a new thread. A process and a thread can easily communicate with each other +through both control flow, and also, through data, because they share and have +access to the same standard file descriptors (`stdout`, `stdin`, `stderr`) and also to the same memory space +on the heap and global data section. + + +In more detail, each thread that you create has a separate stack reserved just for that thread, +which essentially means that each local object that you create inside this thread is local to that +thread, i.e. the other threads cannot see this local object, unless this object that you have created +is an object that lives on the heap. In other words, if the memory associated with this object +is on the heap, then the other threads can potentially access this object. + +Therefore, objects that are stored in the stack are local to the thread where they were created. +But objects that are stored on the heap are potentially accessible to other threads. All of this means that +each thread has its own separate stack, but, at the same time, all threads share +the same heap, the same standard file descriptors (which means that they share the same `stdout`, `stdin`, `stderr`), +and the same global data section in the program. + + + +## Creating a thread + +We create new threads in Zig by first importing the `Thread` struct into +our current Zig module, and then, calling the `spawn()` method of this struct, +which creates (or, "spawns") a new thread of execution from our current process. +This method has three arguments, which are, respectively: + +1. a `SpawnConfig` object, which contains configurations for the spawn process. +1. 
the name of the function that is going to be executed (or, that is going to be "called") inside this new thread. +1. a list of arguments (or inputs) to be passed to the function provided in the second argument. + +With these three arguments, you can control how the thread gets created, and also, specify which +work (or "tasks") will be performed inside this new thread. A thread is just a separate context of execution, +and we usually create new threads in our code because we want to perform some work inside this +new context of execution. And we specify which exact work, or, which exact steps are going to be +performed inside this context, by providing the name of a function in the second argument of the `spawn()` method. + +Thus, when this new thread gets created, the function that you provided as input to the `spawn()` +method gets called, or, gets executed inside this new thread. You can control the +arguments, or, the inputs that are passed to this function when it gets called, by providing +a list of arguments (or a list of inputs) in the third argument of the `spawn()` method. +These arguments are passed to the function in the same order that they are +provided to `spawn()`. + +Furthermore, the `SpawnConfig` is a struct object with only two possible fields, or, two possible members, that you +can set to tailor the spawn behaviour. These fields are: + +- `stack_size`: you can provide a `usize` value to specify the size (in bytes) of the thread's stack. By default, this value is: $16 \times 1024 \times 1024$. +- `allocator`: you can provide an allocator object to be used when allocating memory for the thread. + +To use one of these two fields (or, "configs") you just have to create a new object of type `SpawnConfig`, +and provide this object as input to the `spawn()` method. 
But, if you are not interested in using +one of these configs, and you are ok with using just the defaults, you can just provide an anonymous +struct literal (`.{}`) in the place of this `SpawnConfig` argument. + +As our first, and very simple example, consider the code shown below. +Inside the same program, you can create multiple threads of execution if you want to. +But, in this first example, we are creating just a single thread of execution, because +we call `spawn()` only once. + +Also, notice in this example that we are executing the function `do_some_work()` +inside the new thread. Since this function receives no inputs, because it has +no arguments, in this instance, we have passed an empty list, or, more precisely, an empty and anonymous struct (`.{}`) +in the third argument of `spawn()`. + + +```{zig} +#| build_type: "run" +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +fn do_some_work() !void { + _ = try stdout.write("Starting the work.\n"); + std.time.sleep(100 * std.time.ns_per_ms); + _ = try stdout.write("Finishing the work.\n"); +} + +pub fn main() !void { + const thread = try Thread.spawn(.{}, do_some_work, .{}); + thread.join(); +} +``` + +Notice the use of `try` when calling the `spawn()` method. This means +that this method can return an error in some circumstances. One circumstance +in particular is when you attempt to create a new thread when you have already +created too many (i.e. you have exceeded the quota of concurrent threads in your system). + +But, if the new thread is successfully created, the `spawn()` method returns a handler +object (which is just an object of type `Thread`) to this new thread. You can use +this handler object to effectively control all aspects of the thread. + +The instant that you create the new thread, the function that you provided as input to `spawn()` +gets invoked (i.e. gets called) to start the execution on this new thread. 
+In other words, every time you call `spawn()`, not only does a new thread get created, +but also, the "start work button" of this thread gets automatically pressed. +So the work being performed in this thread starts at the moment that the thread is created. +This is similar to how `pthread_create()` from the `pthread` library in C works, +which also starts the execution at the moment that the thread gets created. + + +## Returning from a thread + +We have learned in the previous section that the execution of the thread starts at the moment +that the thread gets created. Now, we will learn how to "join" or "detach" a thread in Zig. +"Join" and "detach" are operations that control how the thread returns to +the main thread, or, to the main process in our program. + +We perform these operations by using the methods `join()` and `detach()` from the thread handler object. +Every thread that you create can be marked as either *joinable* or *detached* [@linux_pthread_create]. +You can turn a thread into a *detached* thread by calling the `detach()` method +from the thread handler object. But if you call the `join()` method instead, then this thread +becomes a *joinable* thread. + +A thread cannot be both *joinable* and *detached*, which in general means +that you cannot call both `join()` and `detach()` on the same thread. +But a thread must be one of the two, meaning that you should always call +either `join()` or `detach()` over a thread. If you don't call +one of these two methods over your thread, you introduce undefined behaviour into your program, +which is described at @sec-not-call-join-detach. + +Now, let's describe what each of these two methods does to your thread. + + +### Joining a thread + +When you join a thread, you are essentially saying: "Hey! Could you please wait for the thread to finish, +before you continue with your execution?". 
For example, if we come back to our first and simplest example +of a thread in Zig, in that example we have created a single thread inside the `main()` function of our program, +and just called `join()` over this thread at the end. This section of the code example is reproduced below. + +Because we are joining this new thread inside the `main()`'s scope, it means that the +execution of the `main()` function is temporarily stopped, to wait for the execution of the thread +to finish. That is, the execution of `main()` stops temporarily at the line where `join()` gets called, +and it will continue only after the thread has finished its tasks. + +```{zig} +#| eval: false +pub fn main() !void { + const thread = try Thread.spawn(.{}, do_some_work, .{}); + thread.join(); +} +``` + +Because we have joined this new thread inside `main()`, by calling `join()`, we have a +guarantee that this new thread will finish before the end of the execution of `main()`, +because it is guaranteed that `main()` will wait for the thread to finish its tasks. +You could also interpret this as: the execution of main will hang at +the line where `join()` is called, and the next lines of code that come after +this `join()` call will be executed only after the execution of main +is "unlocked", which happens when the thread finishes its tasks. + +In the example above, there are no more expressions after the `join()` call. We just have the end +of the `main()`'s scope, and, therefore, after the thread finishes its tasks, the execution +of our program just ends, since there is nothing more to do. But what if we had more stuff to do +after the join call? + +To demonstrate this other possibility, consider the next example shown +below. Here, we create a `print_id()` function, that just receives an id +as input, and prints it to `stdout`. In this example, we are creating two +new threads, one after another. 
Then, we join the first thread, then, +we wait for two whole seconds, then, at last, we join the second thread. + +The idea behind this example is that the last `join()` call is executed +only after the first thread finishes its task (i.e. after the first `join()` call returns), +and also, after the two seconds of delay. If you compile and run this +example, you will notice that most messages are quickly printed to `stdout`, +i.e. they appear almost instantly on your screen. +However, the last message ("Joining thread 2") takes around 2 seconds to appear +on the screen. + + +```zig +fn print_id(id: *const u8) !void { + try stdout.print("Thread ID: {d}\n", .{id.*}); +} + +pub fn main() !void { + const id1: u8 = 1; + const id2: u8 = 2; + const thread1 = try Thread.spawn(.{}, print_id, .{&id1}); + const thread2 = try Thread.spawn(.{}, print_id, .{&id2}); + + _ = try stdout.write("Joining thread 1\n"); + thread1.join(); + std.time.sleep(2 * std.time.ns_per_s); + _ = try stdout.write("Joining thread 2\n"); + thread2.join(); +} +``` + +``` +Thread ID: Joining thread 1 +1 +Thread ID: 2 +Joining thread 2 +``` + +This demonstrates that both threads finish their work (i.e. printing the IDs) +very fast, before the two seconds of delay end. Because of that, the last `join()` call +returns pretty much instantly, because when this last `join()` call happens, the second +thread has already finished its task. + +Now, if you compile and run this example, you will also notice that, in some cases, +the messages get intertwined with each other. In other words, you might see +the message "Joining thread 1" inserted in the middle of the message "Thread ID: 1", +or vice-versa. This happens because: + +- the threads are executing basically at the same time as the main process of the program (i.e. the `main()` function). 
+- the threads share the same `stdout` from the main process of the program, which means that the messages that the threads produce are sent to the exact same place as the messages produced by the main process. + +Both of these points were described previously at @sec-what-thread. +So the messages might get intertwined because they are being produced and +sent to the same `stdout` roughly at the same time. +Anyway, when you call `join()` over a thread, the current process will wait +for the thread to finish before it continues, and, when the thread does finish its +task, the resources associated with this thread are automatically freed, and +the current process continues with its execution. + + +### Detaching a thread + +When you detach a thread, by calling the `detach()` method, the thread is marked as *detached*. +When a *detached* thread terminates, its resources are automatically released back to the system without +the need for another thread to join with this terminated thread. + +In other words, calling `detach()` over a thread is like when your children become adults, +i.e. they become independent from you. A detached thread frees itself, and it does not need to report the results back +to you when the thread finishes its task. Thus, you normally mark a thread as *detached* +when you don't need to use the return value of the thread, or, when you don't care about +when exactly the thread finishes its job, i.e. the thread solves everything by itself. + +Take the code example below. We create a new thread, detach it, and then, we just +print a final message before we end our program. We use the same `print_id()` +function that we have used over the previous examples. 
+ + +```zig +fn print_id(id: *const u8) !void { + try stdout.print("Thread ID: {d}\n", .{id.*}); +} + +pub fn main() !void { + const id1: u8 = 1; + const thread1 = try Thread.spawn(.{}, print_id, .{&id1}); + thread1.detach(); + _ = try stdout.write("Finish main\n"); +} +``` + +``` +Finish main +``` + +Now, if you look closely at the output of this code example, you will notice +that only the final message in main was printed to the console. The message +that was supposed to be printed by `print_id()` did not appear in the console. +Why? Because the main process of our program has finished first, +before the thread was able to say anything. + +And that is perfectly ok behaviour, because the thread was detached, so it was +able to free itself, without the need of the main process. +If you ask main to sleep (or "wait") for some extra nanoseconds, before it ends, you will likely +see the message printed by `print_id()`, because you give enough time for the thread to +finish before the main process ends. + + +## Thread pools + +The thread pool is a very popular programming pattern, which is used especially on servers and daemon processes. A thread pool is just a +set of threads, or, a "pool" of threads. Many programmers like to use this pattern, because it makes +it easier to manage and use multiple threads, instead of manually creating the threads when you need them. + +Also, using thread pools might increase performance as well in your program, +especially if your program is constantly creating threads to perform short-lived tasks. +In such an instance, a thread pool might cause an increase in performance because you do not have to be constantly +creating and destroying threads all the time, so you don't face a lot of the overhead involved +in this constant process of creating and destroying threads. + +The main idea behind a thread pool is to have a set of threads already created and ready to perform +tasks at all times. 
You create a set of threads at the moment that your program starts, and keep +these threads alive while your program runs. Each of these threads will be either performing a task, or, +waiting for a task to be assigned. +Every time a new task emerges in your program, this task is added to a "queue of tasks". +The moment that a thread becomes available and ready to perform a new task, +this thread takes the next task in the "queue of tasks", then, +it simply performs the task. + +The Zig Standard Library offers a thread pool implementation on the `std.Thread.Pool` struct. +You create a new instance of a `Pool` object by providing a `Pool.Options` object +as input to the `init()` method of this struct. A `Pool.Options` object is a struct object that contains +configurations for the pool of threads. The most important settings in this struct object are +the members `n_jobs` and `allocator`. As the name suggests, the member `allocator` should receive an allocator object, +while the member `n_jobs` specifies the number of threads to be created and maintained in this pool. + +Consider the example shown below, that demonstrates how we can create a new thread pool object. +Here, we create a `Pool.Options` object that contains +a general purpose allocator object, and also, the `n_jobs` member was set to 4, which +means that the thread pool will create and use 4 threads. + +Also notice that the `pool` object was initially set to `undefined`. This allows us +to initially declare the thread pool object, but not properly instantiate the +underlying memory of the object. You have to initially declare your thread pool object +by using `undefined` like this, because the `init()` method of `Pool` needs +to have an initial pointer to properly instantiate the object. + +So, just +remember to create your thread pool object by using `undefined`, and then, +after that, you call the `init()` method over the object. 
+You should also not forget to call the `deinit()` method over the thread pool +object, once you are done with it, to release the resources allocated for the thread pool. Otherwise, you will +have a memory leak in your program. + +```{zig} +#| auto_main: false +#| build_type: "run" +const std = @import("std"); +const Pool = std.Thread.Pool; +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + const opt = Pool.Options{ + .n_jobs = 4, + .allocator = allocator, + }; + var pool: Pool = undefined; + _ = try pool.init(opt); + defer pool.deinit(); +} +``` + +Now that we know how to create `Pool` objects, we have +to understand how to assign tasks to be executed by the threads in this pool object. +To assign a task to be performed by a thread, we need to call the `spawn()` method +from the thread pool object. + +This `spawn()` method works almost identically to the `spawn()` method from the +`Thread` object. The method has almost the same arguments as the previous one; +the difference is that we don't have to provide a `SpawnConfig` object in this case. +But instead of creating a new thread, this `spawn()` method from +the thread pool object just registers a new task in the internal "queue of tasks" to be performed, +and any available thread in the pool will get this task, and it will simply perform the task. + +In the example below, we are using our previous `print_id()` function once again. +But you may notice that the `print_id()` function is a little different this time, +because now we are using `catch` instead of `try` in the `print()` call. +Currently, the `Pool` struct only supports functions that don't return errors +as tasks. Thus, when assigning tasks to threads in a thread pool, it is essential to use functions +that don't return errors. That is why we are using `catch` here, so that the +`print_id()` function doesn't return an error. 
+ + +```{zig} +#| eval: false +fn print_id(id: *const u8) void { + // Swallow any write error, so that this function returns plain `void`. + _ = stdout.print("Thread ID: {d}\n", .{id.*}) + catch {}; +} +const id1: u8 = 1; +const id2: u8 = 2; +try pool.spawn(print_id, .{&id1}); +try pool.spawn(print_id, .{&id2}); +``` + +This limitation should probably not exist, and, in fact, it is already on the radar of the +Zig team to fix this issue, and it is being tracked on an [open issue](https://github.com/ziglang/zig/issues/18810)[^issue]. +So, if you do need to provide a function that might return an error as the task +to be performed by the threads in the thread pool, then you are limited to either: + +- implementing your own thread pool that does not have this limitation. +- waiting for the Zig team to actually fix this issue. + +[^issue]: + + + + +## Mutexes + +Mutexes are a classic component of every thread library. In essence, a mutex is a *Mutually Exclusive Flag*, and this flag +acts like a type of "lock", or as a gatekeeper to a particular section of your code. Mutexes are related to thread synchronization, +more specifically, they prevent you from having some classic race conditions in your program, +and, therefore, major bugs and undefined behaviour that are usually difficult to track and understand. + +The main idea behind a mutex is to help us to control the execution of a particular section of the code, and to +prevent two or more threads from executing this particular section of the code at the same time. +Many programmers like to compare a mutex to a bathroom door (which usually has a lock). +When a thread locks the mutex object, it is as if the bathroom door was locked, +and, therefore, the other people (in this case, the other threads) that want to use the same bathroom at the same time +have to be patient, and simply wait for the other person (or the other thread) to unlock the door and get out of the bathroom. 
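
As a rough sketch of the "locked bathroom door" idea, the example below uses the `std.Thread.Mutex` struct from the Zig Standard Library to protect a shared counter. The names `counter`, `counter_lock` and `increment()` are hypothetical, chosen only for illustration, and the code assumes a Zig version in which `Thread.spawn()` takes a `SpawnConfig` as its first argument, as in the other examples of this chapter.

```zig
const std = @import("std");
const Thread = std.Thread;

// Hypothetical shared state, stored in the global data section,
// so every thread in the process can see it.
var counter: usize = 0;
// The mutex that guards every access to `counter`.
var counter_lock: Thread.Mutex = .{};

fn increment(times: usize) void {
    var i: usize = 0;
    while (i < times) : (i += 1) {
        // Lock the "door" before touching the shared counter...
        counter_lock.lock();
        // ...and unlock it at the end of this loop iteration.
        defer counter_lock.unlock();
        counter += 1;
    }
}

pub fn main() !void {
    const n: usize = 100_000;
    const t1 = try Thread.spawn(.{}, increment, .{n});
    const t2 = try Thread.spawn(.{}, increment, .{n});
    t1.join();
    t2.join();
    std.debug.print("Counter value: {d}\n", .{counter});
}
```

Because only one thread at a time can hold `counter_lock`, the two threads never update `counter` simultaneously, so the program should print `Counter value: 200000`. Without the `lock()` and `unlock()` calls, some increments could be lost and the final value could vary from run to run.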
+ +Some other programmers also like to explain mutexes by using the analogy of "each person will have their turn to speak". +This is the analogy used in the [*Multithreading Code* video from the Computerphile project](https://www.youtube.com/watch?v=7ENFeb-J75k&ab_channel=Computerphile)[^computerphile]. +Imagine +that you are in a conversation circle. There is a moderator in this circle, which is the person that decides who +has the right to speak at that particular moment. The moderator gives a green card (or some sort of an authorization card) to the person that +is going to speak, and, as a result, everyone else must be silent and listen to the person that has the green card. +When the person finishes talking, they give the green card back to the moderator, and the moderator decides +who is going to talk next, and delivers the green card to that person. And the cycle goes on like this. + +[^computerphile]: + + +A mutex acts like the moderator in this conversation circle. The mutex authorizes one single thread to execute a specific section of the code, +and it also blocks the other threads from executing this same section of the code. If these other threads want to execute this same +piece of the code, they are forced to wait for the authorized thread to finish first. +When the authorized thread finishes executing this code, the mutex authorizes the next thread to execute this code, +and the other threads are still blocked. Therefore, a mutex is like a moderator that does an "each thread will have their turn to execute this section of the code" +type of control. + + +Mutexes are especially used to prevent data race problems from happening. A data race problem happens when two or more threads +are trying to access the same shared object at the same time, and at least one of them is writing to it. 
+So, when you have an object that is shared with all threads, and, you want to prevent two or more threads from +accessing this same object at the same time, you can use a mutex to lock the part of the code that accesses this specific object. +When a thread tries to run this code that is locked by a mutex, this thread stops its execution, and patiently waits for this section of the codebase to be +unlocked to continue. + +In other words, the execution of the thread is paused while the code section +is locked by the mutex, and it is unpaused the moment that the code section is unlocked by the other thread that +was executing this code section. +Notice that mutexes are normally used to lock areas of the codebase that access/modify data that is **shared** with all threads, +i.e. objects that are either stored in the global data section, or, in the heap space of your program. +So mutexes are not normally used on areas of the codebase that access/modify objects that are local to the thread. + + + +### Critical section {#sec-critical-section} + +Critical section is a concept commonly associated with mutexes and thread synchronization. +In essence, a critical section is the section of the program in which a thread accesses/modifies a shared resource +(i.e. an object, a file descriptor, something that all threads have access to). In other words, +a critical section is the section of the program where race conditions might happen, and, therefore, +where undefined behaviour can be introduced into the program. + +When we use mutexes in our program, the critical section defines the area of the codebase that we want to lock. +So we normally lock the mutex object at the beginning of the critical section, +and then, we unlock it at the end of the critical section. +The two points listed below come from the "Critical Section" article from GeeksforGeeks, +and they summarise well the role that a critical section plays in the thread synchronization problem [@geeks_critical_section]. 
+ + +1. The critical section must be executed as an atomic operation, which means that once one thread or process has entered the critical section, all other threads or processes must wait until the executing thread or process exits the critical section. The purpose of synchronization mechanisms is to ensure that only one thread or process can execute the critical section at a time. +2. The concept of a critical section is central to synchronization in computer systems, as it is necessary to ensure that multiple threads or processes can execute concurrently without interfering with each other. Various synchronization mechanisms such as semaphores, mutexes, monitors, and condition variables are used to implement critical sections and ensure that shared resources are accessed in a mutually exclusive manner. + + +### Atomic operations {#sec-atomic-operation} + +You will also see the term "atomic operation" a lot when reading about threads, race conditions and mutexes. +In summary, an operation is categorized as "atomic", when there is no way for a context switch to happen in +the middle of this operation. In other words, this operation is always done from beginning to end, without interruptions +from another process or operation in the middle of its execution phase. + +Not many operations today are atomic. But why do atomic operations matter here? It is because data races +(which are a type of race condition) cannot happen on operations that are atomic. +So if a particular line in your code performs an atomic operation, then, this line will never +suffer from a data race problem. Therefore, programmers sometimes use an atomic operation +to protect themselves from data race problems in their code. + +When you have an operation that is compiled into just one single assembly instruction, this operation might be atomic, +because it is just one assembly instruction. But this is not guaranteed. This is usually true for old CPU architectures (such as `x86`). 
But nowadays, most +assembly instructions in modern CPU architectures turn into multiple micro-tasks, which inherently makes the operation not atomic anymore, +even though it has just one single assembly instruction. + +The Zig Standard Library offers some atomic functionality in the `std.atomic` module. +In this module, you will find a public and generic function called `Value()`. With this function we create an "atomic object", which is +a value that comes with some native atomic operations, most notably, a `load()` and a `fetchAdd()` operation. +If you have experience with multithreading in C++, you have probably recognized this pattern. So yes, this generic +"atomic object" in Zig is very similar to the template struct `std::atomic` from the C++ Standard Library. +It is important to emphasize that only primitive data types (i.e. the types presented at @sec-primitive-data-types) +are supported by these atomic operations. + + + + + +### Data races and race conditions + +To understand why mutexes are used, we need to better understand the problem that they seek +to solve, which can be summarized as data race problems. A data race problem is a type of race condition, +which happens when one thread is accessing a particular memory location (i.e. a particular shared object) at the same +time that another thread is trying to write/save new data into this same memory location (i.e. the same shared object). + +We can simply define a race condition as any type of bug in your program that is based +on a "who gets there first" problem. A data race problem is a type of race condition, because it occurs when two or more parties +are trying to read and write into the same memory location at the same time, and, therefore, the end result of this operation +depends completely on who gets to this memory location first. +As a consequence, a program that has a data race problem will likely produce a different result each time that we execute it. 
+ +Thus, race conditions produce undefined behaviour and unpredictability, because the program produces +a different answer each time that a different party gets to the target location before the others. +And we have no easy way to either predict or control who is going to get to this target location first. +In other words, in each execution of your program, +you get a different answer, because a different person, or, a different function, or, a different part of the code is finishing +its tasks (or reaching a location) before the others. + +As an example, consider the code snippet shown below. In this example, we create a global counter +variable, and we also create an `increment()` function, whose job is to just increment this global counter +variable in a for loop. + +Since the for loop iterates one hundred thousand times, and, we create two separate threads +in this code example, what number do you expect to see in the final message printed to `stdout`? +The answer should be two hundred thousand. Right? Well, in theory, this program was supposed +to print two hundred thousand at the end, but in practice, every time that I execute this program +I get a different answer. + +In the example shown below, you can see that this time we have executed the program, the end +result was 117254, instead of the expected 200000. The second time I executed this program, +I got the number 108592 as the result. So the end result of this program varies, but it never gets +to the expected 200000 that we want. 
+ + +```{zig} +#| auto_main: false +#| build_type: "run" +#| eval: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +// Global counter variable +var counter: usize = 0; +// Function to increment the counter +fn increment() void { + for (0..100000) |_| { + counter += 1; + } +} + +pub fn main() !void { + const thr1 = try Thread.spawn(.{}, increment, .{}); + const thr2 = try Thread.spawn(.{}, increment, .{}); + thr1.join(); + thr2.join(); + try stdout.print("Counter value: {d}\n", .{counter}); +} +``` + +``` +Counter value: 117254 +``` + + +Why is this happening? The answer is: because this program contains a data race problem. +This program would print the correct number 200000, if, and only if the first thread finishes +its tasks before the second thread starts to execute. But that is very unlikely to happen. +Because the process of creating the thread is too fast, and therefore, both threads start to execute roughly +at the same time. If you change this code to add some nanoseconds of sleep between the first and the second calls to `spawn()`, +you will increase the chances of the program producing the "correct result". + +So the data race problem happens, because both threads are reading and writing to the same +memory location at roughly the same time. In this example, each thread is essentially performing +three basic operations at each iteration of the for loop, which are: + +1. reading the current value of `counter`. +1. incrementing this value by 1. +1. writing the result back into `counter`. + +Ideally, a thread B should read the value of `counter`, only after the other thread A has finished +writing the incremented value back into the `counter` object. Therefore, in the ideal scenario, which is demonstrated +at @tbl-data-race-ideal, the threads should work in sync with each other. But the reality is that these +threads are out of sync, and because of that, they suffer from a data race problem, which is demonstrated +at @tbl-data-race-not. 
+ +Notice that, in the data race scenario (@tbl-data-race-not), the read performed by thread B happens +before the write operation of thread A, and that ultimately leads to wrong results at the end of the program. +Because when thread B reads the value from the `counter` variable, thread A is still processing +the initial value from `counter`, and it has not written the new and incremented value into `counter` yet. So what +happens is that thread B ends up reading the same initial value (the "old" value) from `counter`, instead of +reading the new and incremented version of this value that would be calculated by thread A. + + +::: {#tbl-data-race-ideal} + +| Thread 1 | Thread 2 | Integer value | +|-------------|-------------|---------------| +| read value | | 0 | +| increment | | 1 | +| write value | | 1 | +| | read value | 1 | +| | increment | 2 | +| | write value | 2 | + +: An ideal scenario for two threads incrementing the same integer value +::: + +::: {#tbl-data-race-not} + +| Thread 1 | Thread 2 | Integer value | +|-------------|-------------|---------------| +| read value | | 0 | +| | read value | 0 | +| increment | | 1 | +| | increment | 1 | +| write value | | 1 | +| | write value | 1 | + +: A data race scenario when two threads are incrementing the same integer value +::: + + +If you think about these diagrams presented in the form of tables, you will notice that they relate back to our discussion of atomic operations +at @sec-atomic-operation. Remember, atomic operations are operations that the CPU executes +from beginning to end, without interruptions from other threads or processes. So, +the scenario shown at @tbl-data-race-ideal does not suffer from a data race, because +the operations performed by thread A are not interrupted in the middle by the operations +from thread B. 
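+
+As a minimal sketch of how an atomic operation avoids this kind of interruption, the counter example can be rewritten with the `std.atomic.Value()` function mentioned at @sec-atomic-operation. This variant is only an illustration, and it is not the approach used in the rest of this chapter. It assumes a Zig version in which the memory orderings are written in lowercase (e.g. `.monotonic`), and it prints with `std.debug.print()` just to keep the example self-contained.
+
+```zig
+const std = @import("std");
+const Thread = std.Thread;
+
+// Illustrative variant: the counter is an atomic object
+// instead of a plain `usize`.
+var counter = std.atomic.Value(usize).init(0);
+
+fn increment() void {
+    for (0..100000) |_| {
+        // Read, increment and write happen as one indivisible step.
+        _ = counter.fetchAdd(1, .monotonic);
+    }
+}
+
+pub fn main() !void {
+    const thr1 = try Thread.spawn(.{}, increment, .{});
+    const thr2 = try Thread.spawn(.{}, increment, .{});
+    thr1.join();
+    thr2.join();
+    // Both threads were joined, so all increments are visible here.
+    std.debug.print("Counter value: {d}\n", .{counter.load(.monotonic)});
+}
+```
+
+Because `fetchAdd()` performs the read, the increment and the write as a single indivisible step, no increment should be lost, and this sketch is expected to print 200000.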
+ +If we also think about the discussion of critical section from @sec-critical-section, we can identify +the section that represents the critical section of the program, which is the section that is vulnerable +to data races. In this example, the critical section of the program is the line where we increment +the `counter` variable (`counter += 1`). So, ideally, we want to use a mutex, and lock right before this line, and then, +unlock right after this line. + + + + +### Using mutexes in Zig + +Now that we know the problem that mutexes seek to solve, we can learn how to use them in Zig. +Mutexes in Zig are available through the `std.Thread.Mutex` struct from the Zig Standard Library. +If we take the same code from the previous example, and improve it with mutexes, to solve +our data race problem, we get the code example shown below. + +Notice that this time we had to alter the `increment()` function to receive a pointer to +the `Mutex` object as input. All that we need to do, to make this program safe against +data race problems, is to call the `lock()` method at the beginning of +the critical section, and then, call `unlock()` at the end of the critical section. +Notice that the output of this program is now the correct number of 200000. 
+ +```{zig} +#| build_type: "run" +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +const Mutex = std.Thread.Mutex; +var counter: usize = 0; +fn increment(mutex: *Mutex) void { + for (0..100000) |_| { + mutex.lock(); + counter += 1; + mutex.unlock(); + } +} + +pub fn main() !void { + var mutex: Mutex = .{}; + const thr1 = try Thread.spawn(.{}, increment, .{&mutex}); + const thr2 = try Thread.spawn(.{}, increment, .{&mutex}); + thr1.join(); + thr2.join(); + try stdout.print("Counter value: {d}\n", .{counter}); +} +``` + + + + +## Read/Write locks + +Mutexes are normally used when it is never safe to have two or more threads running the same +piece of code at the same time. In contrast, read/write locks are normally used in situations +where you have a mixture of scenarios, i.e. there are some pieces of the codebase that are safe to run in parallel, and other pieces that +are not safe. + +For example, suppose that you have multiple threads that use the same shared file in the filesystem to store some configurations, or, +statistics. If two or more threads try to read the data from this same file at the same time, nothing bad happens. +So this part of the codebase is perfectly safe to be executed in parallel, with multiple threads reading the same file at the same time. + +However, if two or more threads try to write data into this same file at the same time, then, we cause some race condition +problems. So this other part of the codebase is not safe to be executed in parallel. +More specifically, a thread might end up writing data in the middle of the data written by the other thread. +This process of two or more threads writing to the same location might lead to data corruption. +This specific situation is usually called a *torn write*. 
+ +Thus, what we can extract from this is that there are certain types of operations that cause a race condition, but there +are also other types of operations that do not cause a race condition problem. +You could also say that there are types of operations that are susceptible to race condition problems, +and there are other types of operations that are not. + +A read/write lock is a type of lock that acknowledges the existence of this specific scenario, and you can +use this type of lock to control which parts of the codebase are safe to run in parallel, and which parts are not safe. + + + +### Exclusive lock vs shared lock + +Therefore, a read/write lock is a little different from a mutex. Because a mutex is always an *exclusive lock*, meaning that only +one thread is allowed to execute at any given time. With an exclusive lock, the other threads are always "excluded", +i.e. they are always blocked from executing. +But in a read/write lock, the other threads might be authorized to run at the same time, depending on the type of lock that they acquire. + +We have two types of locks in a read/write lock, which are: an exclusive lock and a shared lock. An exclusive lock works exactly the same +as a mutex, while a shared lock is a lock that does not block the other threads that also hold a shared lock from running. +In the `pthreads` C library, read/write locks are available through the `pthread_rwlock_t` C struct. With +this C struct, you can create a "write lock", which corresponds to an exclusive lock, or, you can create a "read lock", +which corresponds to a shared lock. The terminology might be a little different, but the meaning is the same, +so just remember this relationship: write locks are exclusive locks, while read locks are shared locks. + +When a thread tries to acquire a read lock (i.e. a shared lock), this thread gets the shared lock +if, and only if no other thread currently holds a write lock (i.e.
an exclusive lock), and also, +if there are no other threads that are already in the queue, +waiting for their turn to acquire a write lock. In other words, the thread in the queue has attempted +to get a write lock earlier, but this thread was blocked +because there was another thread running that already had a write lock. As a consequence, this thread is in the queue to get a write lock, +and it's currently waiting for the other thread with a write lock to finish its execution. + +When a thread tries to acquire a read lock, but it fails in acquiring this read lock, either because there is +a thread with a write lock already running, or, because there is a thread in the queue to get a write lock, +the execution of this thread is instantly blocked, i.e. paused. This thread will keep waiting for the +read lock, and its execution will be unblocked (or unpaused) only after this thread successfully acquires the read lock. + +If you think deeply about this dynamic between read locks and write locks, you might notice that a read lock is basically a safety mechanism. +More specifically, it is a way for us to +allow a particular thread to run together with the other threads, only when it's safe to. In other words, if there is currently +a thread with a write lock running, then, it is very likely not safe for the thread that is trying to acquire the read lock to run now. +As a consequence, the read lock protects this thread from running into dangerous waters, and patiently waits for the +"write lock" thread to finish its tasks before it continues. + +On the other hand, if there are only "read lock" (i.e. "shared lock") threads currently running +(i.e. not a single "write lock" thread currently exists), then, +it is perfectly safe for this thread that is acquiring the read lock to run in parallel with the other +threads. As a result, the read lock just +allows this thread to run together with the other threads. 
+ +Thus, by using read locks (shared locks) in conjunction with write locks (exclusive locks), we can control which regions or sections +of our multithreaded code are safe to run in parallel, and which sections are not. + + + + + +### Using read/write locks in Zig + +The Zig Standard Library supports read/write locks through the `std.Thread.RwLock` struct. +If you want a particular thread to acquire a shared lock (i.e. a read lock), you should +call the `lockShared()` method from the `RwLock` object. But, if you want this thread +to acquire an exclusive lock (i.e. a write lock) instead, then, you should call the +`lock()` method from the `RwLock` object. + +As with mutexes, we also have to unlock the shared or exclusive locks that we acquire through a read/write lock object, +once we are at the end of our "critical section". If you have acquired an exclusive lock, then, you unlock +this exclusive lock by calling the `unlock()` method from the read/write lock object. In contrast, +if you have acquired a shared lock instead, then, call `unlockShared()` to unlock this shared lock. + +As a simple example, the code below creates three separate threads responsible for reading the +current value in a `counter` object, and it also creates another thread, responsible for writing +new data into the `counter` object (incrementing it, more specifically). 
+ +```{zig} +#| eval: false +var counter: u32 = 0; +fn reader(lock: *RwLock) !void { + while (true) { + lock.lockShared(); + const v: u32 = counter; + try stdout.print("{d}", .{v}); + lock.unlockShared(); + std.time.sleep(2 * std.time.ns_per_s); + } +} +fn writer(lock: *RwLock) void { + while (true) { + lock.lock(); + counter += 1; + lock.unlock(); + std.time.sleep(2 * std.time.ns_per_s); + } +} + +pub fn main() !void { + var lock: RwLock = .{}; + const thr1 = try Thread.spawn(.{}, reader, .{&lock}); + const thr2 = try Thread.spawn(.{}, reader, .{&lock}); + const thr3 = try Thread.spawn(.{}, reader, .{&lock}); + const wthread = try Thread.spawn(.{}, writer, .{&lock}); + + thr1.join(); + thr2.join(); + thr3.join(); + wthread.join(); +} +``` + + +## Yielding a thread + +The `Thread` struct supports yielding through the `yield()` function, which acts on the thread that calls it. +Yielding a thread means that the execution of the thread is temporarily stopped, +and the thread goes back to the end of the priority queue of the scheduler from +your operating system. + +That is, when you yield a thread, you are essentially saying the following to your OS: +"Hey! Could you please stop executing this thread for now, and come back to continue it later?". +You could also interpret this yield operation as: "Could you please deprioritize this thread, +to focus on doing other things instead?". +So this yield operation is also a way for you +to pause a particular thread, so that the OS can prioritize other threads instead. + +It is important to say that yielding a thread is a "not so common" thread operation these days. +In other words, not many programmers use yielding in production, simply because it is hard to use +this operation and make it work properly, and also, there +are better alternatives. Most programmers prefer to use `join()` instead. +In fact, most of the time, when you see somebody using yield in some code example, they are mostly using it to help them +debug race conditions in their applications. 
That is, yield is mostly used as a debug tool nowadays. + +Anyway, if you want the current thread to yield, just call the `yield()` function from the `Thread` struct, like this: + +```zig +try std.Thread.yield(); +``` + + + + + + +## Common problems in threads + + + +### Deadlocks + +A deadlock occurs when two or more threads are blocked forever, +waiting for each other to release a resource. This usually happens when multiple locks are involved, +and the order of acquiring them is not well managed. + +The code example below demonstrates a deadlock situation. We have two different threads that execute +two different functions (`work1()` and `work2()`) in this example. And we also have two separate +mutexes. If you compile and run this code example, you will notice that the program just runs indefinitely, +without ending. + +When we look into the first thread, which executes the `work1()` function, we can +notice that this function acquires the `mut1` lock first. Because this is the first operation +that is executed inside this thread, which is the first thread created in the program. +After that, the function sleeps for 1 second, to +simulate some type of work, and then, the function tries to acquire the `mut2` lock. + +On the other hand, when we look into the second thread, which executes the `work2()` function, +we can see that this function acquires the `mut2` lock first. Because when this thread gets created and it tries +to acquire this `mut2` lock, the first thread is still sleeping on that "sleep 1 second" line. +After acquiring `mut2`, the `work2()` function also sleeps for 1 second, to +simulate some type of work, and then, the function tries to acquire the `mut1` lock. + +This creates a deadlock situation, because after the "sleep for 1 second" line in both threads, +thread 1 is trying to acquire the `mut2` lock, but this lock is currently being held by thread 2. +However, at this moment, thread 2 is also trying to acquire the `mut1` lock, which is currently +being held by thread 1. 
Therefore, both threads end up waiting forever, each one waiting for its peer to +free the lock that it wants to acquire. + + +```{zig} +#| eval: false +var mut1: Mutex = .{}; var mut2: Mutex = .{}; +fn work1() !void { + mut1.lock(); + std.time.sleep(1 * std.time.ns_per_s); + mut2.lock(); + _ = try stdout.write("Doing some work 1\n"); + mut2.unlock(); mut1.unlock(); +} + +fn work2() !void { + mut2.lock(); + std.time.sleep(1 * std.time.ns_per_s); + mut1.lock(); + _ = try stdout.write("Doing some work 2\n"); + mut1.unlock(); mut2.unlock(); +} + +pub fn main() !void { + const thr1 = try Thread.spawn(.{}, work1, .{}); + const thr2 = try Thread.spawn(.{}, work2, .{}); + thr1.join(); + thr2.join(); +} +``` + + +### Not calling `join()` or `detach()` {#sec-not-call-join-detach} + +When you do not call either `join()` or `detach()` on a thread, then, this thread becomes a "zombie thread", +because it does not have a clear "return point". +You could also interpret this as: "nobody is properly responsible for managing the thread". +When we don't establish if a thread is either *joinable* or *detached*, +nobody becomes responsible for dealing with the return value of this thread, and also, +nobody becomes responsible for clearing (or freeing) the resources associated with this thread. + +You don't want to be in this situation, so remember to always use `join()` or `detach()` +on the threads that you create. When you don't use these methods, the execution of the thread +becomes completely independent from the execution of the main process in your program. +This means that the main process of your program might end before the thread finishes its job, +or vice-versa. The point is that we have no idea who is going to finish first. It +becomes a race condition problem. +In such a case, we lose control over this thread, and its resources are never freed +(i.e. you have leaked resources in the system). 
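+
+To make the two options concrete, here is a minimal sketch that spawns one joinable thread and one detached thread. The `work()` function and the thread ids are hypothetical, introduced only for this illustration, and the sketch prints with `std.debug.print()` to stay self-contained.
+
+```zig
+const std = @import("std");
+const Thread = std.Thread;
+
+// Hypothetical task, used only for illustration.
+fn work(id: u8) void {
+    std.debug.print("Thread {d} working\n", .{id});
+}
+
+pub fn main() !void {
+    // Joinable thread: main waits for it at the end of the scope,
+    // and its resources are released by join().
+    const joined = try Thread.spawn(.{}, work, .{@as(u8, 1)});
+    defer joined.join();
+
+    // Detached thread: its resources are released when it finishes,
+    // but main does not wait for it, so its message may never be printed.
+    const detached = try Thread.spawn(.{}, work, .{@as(u8, 2)});
+    detached.detach();
+}
+```
+
+Either way, somebody is responsible for the thread: with `join()` it is the caller, and with `detach()` it is the thread itself once it finishes.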
+ + +### Cancelling or killing a particular thread + +When we think about the `pthreads` C library, there is a possible way to asynchronously kill or cancel +a thread, which is by sending a `SIGTERM` signal to the thread through the `pthread_kill()` function. +But canceling a thread like this is bad. It is dangerously bad. As a consequence, the Zig implementation +of threads does not have a similar function, or, a similar way to asynchronously cancel or kill +a thread. + +Therefore, if you want to cancel a thread in the middle of its execution in Zig, +then, one good strategy that you can take is to use control flow in your favor in conjunction with `join()`. +More specifically, you can design your thread around a while loop that is constantly +checking if the thread should continue running. +If it is time to cancel the thread, we could make the while loop break, and join the thread with the main thread +by calling `join()`. + +The code example below demonstrates this strategy to some extent. +Here, we are using control flow to break the while loop, and exit the thread earlier than +what we had initially planned. This example also demonstrates how we can use +atomic objects in Zig with the `Value()` generic function that we have mentioned at @sec-atomic-operation. 
+ + +```zig +var running = std.atomic.Value(bool).init(true); +var counter: u64 = 0; +fn do_more_work() void { + std.time.sleep(2 * std.time.ns_per_s); +} +fn work() !void { + while (running.load(.monotonic)) { + for (0..10000) |_| { counter += 1; } + if (counter < 15000) { + _ = try stdout.write("Time to cancel the thread.\n"); + running.store(false, .monotonic); + } else { + _ = try stdout.write("Time to do more work.\n"); + do_more_work(); + running.store(false, .monotonic); + } + } +} + +pub fn main() !void { + const thread = try Thread.spawn(.{}, work, .{}); + thread.join(); +} +``` + diff --git a/Chapters/14-zig-c-interop.qmd b/Chapters/14-zig-c-interop.qmd index a29bcb55..ba4739f5 100644 --- a/Chapters/14-zig-c-interop.qmd +++ b/Chapters/14-zig-c-interop.qmd @@ -1,516 +1,516 @@ ---- -engine: knitr -knitr: true -syntax-definition: "../Assets/zig.xml" ---- - -```{r} -#| include: false -source("../zig_engine.R") -knitr::opts_chunk$set( - auto_main = FALSE, - build_type = "lib" -) -``` - - - - -# Zig interoperability with C - -In this chapter, we are going to discuss the interoperability of Zig with C. -We have discussed at @sec-building-c-code how to build C code using the `zig` compiler. -But we haven't discussed yet how to actually use C code in Zig. In other words, -we haven't discussed yet how to call and use C code from Zig. - -These matters are discussed here, in this chapter. -Also, in our next small project in this book, we are going to use a C library in it. -As consequence, we will put in practice a lot of the knowledge discussed here on -this next project. - - -## How to call C code from Zig - -Interop with C is not something new. Most high-level programming languages have FFI (foreign function interfaces), -which can be used to call C code. For example, Python have Cython, R have `.Call()`, Javascript have `ccall()`, etc. 
-But Zig integrates with C in a deeper level, which affects not only the way that C code get's called, but also, -how this C code is compiled and incorporated into your Zig project. - -In summary, Zig have great interoperability with C. But if you want to call C code from Zig -you will have to perform the following steps: - -- import a C header file into your Zig code. -- link your Zig code with the C library. - -In more details, you should begin by importing into your Zig code the C header file that describes the C functions -that you want to call. Which is pretty much the same thing that you would do in C, by including the header files into your C module. -After you import the C header file, you can start calling and using the C functions described in this header file -directly in your Zig code. - -Everytime you use a C library in your Zig code, **you introduce a dependency in your build process**. -This should come as no surprise to anyone that have any experience with C and C++. -Because this is no different in C. Everytime you use a C library in your C code, you also -have to build and link your C code with this C library that you are using. - -When we use a C library in our Zig code, the `zig` compiler needs to access the definition of the C functions that -are being called in your Zig code. The C header file that we have imported into our Zig code provides the -declarations of these C functions, but not their definitions. So, in order to access these definitions, -the `zig` compiler needs to build your Zig code and link it with the C library in the build process. - -As we discussed across the @sec-build-system, there are different strategies to link something with a library. -This might involve building the C library first, and then, linking it with the Zig code. Or, -it could also involve just the linking step, if this C library is already built and -installed in your system. Anyway, if you have doubts about this, comeback to @sec-build-system. 
- - - -## Importing C header files {#sec-import-c-header} - -To import a C header file into our Zig code, we use the built-in functions `@cInclude()` and `@cImport()`. -Inside the `@cImport()` function, we open a block (with a pair of curly braces). Inside this block -we can (if we need to) include multiple `@cDefine()` calls to define C macros when including this specific C header file. -But for the most part, you will probably need to use just a single call inside this block at `@cImport()`, -which is a call to `@cInclude()`. - -This `@cInclude()` function is equivalent to the `#include` statement in C. -You provide the name of the C header that you want to include as input to this `@cInclude()` function, -then, in conjunction with `@cImport()`, it will perform the necessary steps -to include this C header file into your Zig code. - -You should bind the result of `@cImport()` to a constant object, pretty much like you would do with -`@import()`. You just assign the result to a constant object in your -Zig code, and, as consequence, all C functions, C structs, C macros, etc. that are defined inside the -C header file will be available through this constant object. - -Look at the code example below, where we are importing the Standard I/O C Library (`stdio.h`), -and calling the `printf()`[^printf] C function. Notice that we have also used in this example the C function `powf()`[^powf], -which comes from the C Math Library (`math.h`). -In order to compile this example, you have to link this Zig code with both -the C Standard Library and the C Math Library, by passing the flags `-lc` and `-lm` -to the `zig` compiler. 
- -[^printf]: -[^powf]: - - -```{zig} -#| eval: false -const cmath = @cImport({ - @cInclude("math.h"); -}); -const stdio = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); - -pub fn main() !void { - const x: f32 = 15.2; - const y = cmath.powf(x, @as(f32, 2.6)); - _ = stdio.printf("%.3f\n", y); -} -``` - -``` -1182.478 -``` - - -## About passing Zig values to C functions {#sec-zig-obj-to-c} - -Zig objects have some intrinsic differences between their C equivalents. -Probably the most noticeable one is the difference between C strings and Zig strings, -which I described at @sec-zig-strings. -Zig strings are objects that contains both an array of arbitrary bytes and a length value. -On the other hand, a C string is usually just a pointer to a null-terminated array of arbitrary bytes. - -Because of these intrinsic differences, in some specific cases, you cannot pass Zig objects directly -as inputs to C functions before you convert them into C compatible values. However, in some other cases, -you are allowed to pass Zig objects and Zig literal values directly as inputs to C functions, -and everything will work just fine, because the `zig` compiler will handle everything for you. - -So we have two different scenarios being described here. Let's call them "auto-conversion" and "need-conversion". -The "auto-conversion" scenario is when the `zig` compiler handles everything for you, and automatically convert your -Zig objects/values into C compatible values. In contrast, -the "need-conversion" scenario is when you, the programmer, have the responsibility of converting -that Zig object into a C compatible value, before passing it to C code. - -There is also a third scenario that is not being described here, which is when you create a C object, or, a C struct, or -a C compatible value in your Zig code, and you pass this C object/value as input to a C function in your Zig code. -This scenario will be described later at @sec-c-inputs. 
In this section, we are focused on the scenarios where -we are passing Zig objects/values to C code, instead of C objects/values being passed to C code. - - -### The "auto-conversion" scenario - -An "auto-conversion" scenario is when the `zig` compiler automatically converts our Zig objects into -C compatible values for us. This specific scenario happens mostly in two instances: - -- with string literal values; -- with any of the primitive data types that were introduced at @sec-primitive-data-types. - -When we think about the second instance described above, the `zig` compiler does automatically -convert any of the primitive data types into their C equivalents, because the compiler knows how -to properly convert a `i16` into a `signed short`, or, a `u8` into a `unsigned char`, etc. -Now, when we think about string literal values, they can be automatically -converted into C strings as well, specially because the `zig` compiler does not forces -a specific Zig data type into a string literal at first glance, unless you store this -string literal into a Zig object, and explicitly annotate the data type of this object. - -Thus, with string literal values, the `zig` compiler have more freedom to infer which is the appropriate data type -to be used in each situation. You could say that the string literal value "inherits it's data type" depending on the context that -it is used. Most of the times, this data type is going to be the type that we commonly associate with Zig strings (`[]const u8`). -But it might be a different type depending on the situation. When the `zig` compiler detects that you are providing -a string literal value as input to some C function, the compiler automatically interprets this string -literal as a C string value. - -As an example, look at the code exposed below. Here we are using -the `fopen()` C function to simply open and close a file. If you do not know how this `fopen()` -function works in C, it takes two C strings as input. 
But in this code example below, we are passing some -string literals written in our Zig code directly as inputs to this `fopen()` C function. - -In other words, we are not doing any type of conversion from a Zig string to a C string. -We are just passing the Zig string literals directly as inputs to the C function. And it works just fine! -Because the compiler inteprets the string `"foo.txt"` as a C string, as a result of the current context -that this string literal is being used. - - -```{zig} -#| eval: false -const c = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); - -pub fn main() !void { - const file = c.fopen("foo.txt", "rb"); - if (file == null) { - @panic("Could not open file!"); - } - if (c.fclose(file) != 0) { - return error.CouldNotCloseFileDescriptor; - } -} -``` - -Let's make some experiments, by writing the same code in different manners, and we -see how this affects the program. As a starting point, let's store the `"foo.txt"` string inside -a Zig object, like the `path` object below, and then, we pass this Zig object as input to the `fopen()` C function. - -If we do this, the program still compiles and runs successfully. Notice that I have ommitted most of the code in this example below. -This is just for brevitty reasons, because the remainder of the program is still the same. -The only difference between this example and the previous example is just these two lines exposed below. - -```{zig} -#| eval: false - const path = "foo.txt"; - const file = c.fopen(path, "rb"); - // Remainder of the program -``` - -Now, what happens if you give an explicit data type to the `path` object? Well, if I force -the `zig` compiler to interpret this `path` object as a Zig string object, -by annotating the `path` object with the data type `[]const u8`, then, I actually get a compile error -as demonstrated below. We get this compile error because now I'm forcing the `zig` compiler -to interpret `path` as a Zig string object. 
- -According to the error message, the `fopen()` C function was expecting to receive an -input value of type `[*c]const u8` (C string) instead of a value of type `[]const u8` (Zig string). -In more details, the type `[*c]const u8` is actually the Zig type representation of a C string. -The `[*c]` portion of this type identifies a C pointer. So, this Zig type essentially means: a C pointer to an array (`[*c]`) of -constant bytes (`const u8`). - - -```{zig} -#| eval: false - const path: []const u8 = "foo.txt"; - const file = c.fopen(path, "rb"); - // Remainder of the program -``` - -``` -t.zig:10:26: error: expected type '[*c]const u8', found '[]const u8' - const file = c.fopen(path, "rb"); - ^~~~ -``` - -Therefore, when we talk exclusively about string literal values, as long as you don't give an -explicit data type to these string literal values, the `zig` compiler should be capable of automatically -converting them into C strings as needed. - -But what about using one of the primitive data types that were introduced at @sec-primitive-data-types? -Let's take code exposed below as an example of that. Here, we are giving some float literal values as input -to the C function `powf()`. Notice that this code example compiles and runs succesfully. - -```{zig} -#| eval: false -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const cmath = @cImport({ - @cInclude("math.h"); -}); - -pub fn main() !void { - const y = cmath.powf(15.68, 2.32); - try stdout.print("{d}\n", .{y}); -} -``` - -``` -593.2023 -``` - -Once again, because the `zig` compiler does not associate a specific data type with the literal values -`15.68` and `2.32` at first glance, the compiler can automatically convert these values -into their C `float` (or `double`) equivalents, before it passes to the `powf()` C function. 
-Now, even if I give an explicit Zig data type to these literal values, by storing them into a Zig object, -and explicit annotating the type of these objects, the code still compiles and runs succesfully. - -```{zig} -#| eval: false - const x: f32 = 15.68; - const y = cmath.powf(x, 2.32); - // The remainder of the program -``` - -``` -593.2023 -``` - - - -### The "need-conversion" scenario - -A "need-conversion" scenario is when we need to manually convert our Zig objects into C compatible values -before passing them as input to C functions. You will fall in this scenario, when passing Zig string objects -to C functions. - -We already saw this specific circumstance on the last `fopen()` example, -which is reproduced below. You can see in this example, that we have given an explicit Zig data type -(`[]const u8`) to our `path` object, and, as a consequence of that, we have forced the `zig` compiler -to see this `path` object, as a Zig string object. Because of that, we need now to manually convert -this `path` object into a C string before we pass it to `fopen()`. - - -```{zig} -#| eval: false - const path: []const u8 = "foo.txt"; - const file = c.fopen(path, "rb"); - // Remainder of the program -``` - -``` -t.zig:10:26: error: expected type '[*c]const u8', found '[]const u8' - const file = c.fopen(path, "rb"); - ^~~~ -``` - - -There are different ways to convert a Zig string object into a C string. -One way to solve this problem is to provide the pointer to the underlying array -of bytes, instead of providing the Zig object directly as input. -You can access this pointer by using the `ptr` property of the Zig string object. - -The code example below demonstrates this strategy. Notice that, by giving the -pointer to the underlying array in `path` through the `ptr` property, we get no compile errors as result -while using the `fopen()` C function. 
- -```{zig} -#| eval: false - const path: []const u8 = "foo.txt"; - const file = c.fopen(path.ptr, "rb"); - // Remainder of the program -``` - -This strategy works because the `ptr` property of a slice is a many-item pointer (`[*]const u8`), -which the `zig` compiler can coerce into a C pointer (`[*c]const u8`), i.e. the equivalent of a `const char *` in C. -Be aware that a slice carries no guarantee of a null terminator. It works here because `path` points to a string literal, and Zig string literals are null-terminated. - -Another option is to explicitly convert the Zig string object into a C pointer by using the -built-in function `@ptrCast()`. With this function we can convert -an object of type `[]const u8` into an object of type `[*c]const u8`. -As I described in the previous section, the `[*c]` portion of the type -means that it is a C pointer. This strategy is not recommended, but it is -useful to demonstrate the use of `@ptrCast()`. - -You may recall the `@as()` built-in function, which is used to explicitly convert (or cast) a Zig value from a type `x` -to a type `y`. That is, this `@as()` Zig function is roughly comparable to the -`as` keyword in Rust, and to the C type casting syntax (e.g. `(int) x`). -But in our case here, we are not converting an ordinary value. -More specifically, we are converting something into a pointer, namely a C pointer. -Every time a pointer is involved in some "type casting operation" in Zig, -the `@ptrCast()` function is involved. This `@ptrCast()` function is responsible -for converting a pointer of one type to a pointer of another type. - -In the example below, we are using this function to cast our `path` object -into a C pointer to an array of bytes. Then, we pass this C pointer as input -to the `fopen()` function. Notice that this code example compiles successfully -with no errors.
- -```{zig} -#| eval: false - const path: []const u8 = "foo.txt"; - const c_path: [*c]const u8 = @ptrCast(path); - const file = c.fopen(c_path, "rb"); - // Remainder of the program -``` - - - -## Creating C objects in Zig {#sec-c-inputs} - -Creating C objects, or, in other words, creating instances of C structs in your Zig code -is actually something quite easy to do. You first need to import the C header file (like I described at @sec-import-c-header) that describes -the C struct that you are trying to instantiate in your Zig code. After that, you can just -create a new object in your Zig code, and annotate it with the C type of the struct. - -For example, suppose we have a C header file called `user.h`, and that this header file is declaring a new struct named `User`. -This C header file is exposed below: - -```c -#include - -typedef struct -{ - uint64_t id; - char* name; -} User; -``` - -This `User` C struct have two distinct fields, or two struct members, named `id` and `name`. -The field `id` is a unsigned 64-bit integer value, while the field `name` is just a standard C string. -Now, suppose that I want to create an instance of this `User` struct in my Zig code. -I can do that by importing this `user.h` header file into my Zig code, and creating -a new object with type `User`. These steps are reproduced in the code example below. - -Notice that I have used the keyword `undefined` in this example. This allows me to -create the `new_user` object without the need to provide an initial value to the object. -As consequence, the underlying memory associated with this `new_user` is unintialized, -i.e. the memory is currently populated with "garbage" values. -Thus, this expression have the exact same effect of the expression `User new_user;` in C, -which means "declare a new object named `new_user` of type `User`". 
- -Is our responsibility to properly initialize this memory associated with this `new_user` object, -by assigining valid values to the members (or the fields) of the C struct. In the example below, I am assigning the integer 1 to the -member `id`. I am also saving the string `"pedropark99"` into the member `name`. -Notice in this example that I manually add the null character (zero byte) to the end of the allocated array -for this string. This null character marks the end of the array in C. - -```{zig} -#| auto_main: false -#| eval: false -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const c = @cImport({ - @cInclude("user.h"); -}); - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - - var new_user: c.User = undefined; - new_user.id = 1; - var user_name = try allocator.alloc(u8, 12); - defer allocator.free(user_name); - @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); - user_name[user_name.len - 1] = 0; - new_user.name = user_name.ptr; -} -``` - -So, in this example above, we are manually initializing each field of the C struct. -We could say that, in this instance, we are "manually instantiating -the C struct object". However, when we use C libraries in our Zig code, we rarely need -to manually instantiate the C structs like in the above example. Only because C libraries -usually provide "constructor functions" in their public APIs. As consequence, we normally rely on -these constructor functions to properly initialize the C structs, and -the struct fields for us. - -For example, consider the Harfbuzz C library. This a text shaping C library, -and it works around a "buffer object", or, more specifically, an instance of -the C struct `hb_buffer_t`. Therefore, we need to create an instance of -this C struct if we want to use this C library. Luckily, this library offers -the function `hb_buffer_create()`, which we can use to create such object. 
-So the Zig code necessary to create such object would probably look something like this: - -```zig -const c = @cImport({ - @cInclude("hb.h"); -}); -var buf: c.hb_buffer_t = c.hb_buffer_create(); -// Do stuff with the "buffer object" -``` - -Therefore, we do not need to manually create an instance of the C struct -`hb_buffer_t` here, and manually assign valid values to each field in this C struct. -Because the constructor function `hb_buffer_create()` is doing this heavy job for us. - -Since this `buf` object (and also the `new_user` object) is an instance of a C struct, this -object is, in itself, a C compatible value. It is a C object defined in our Zig code. As consequence, -you can freely pass this object as input to any C function that expects to receive this type -of C struct as input. You do not need to use any special syntax, or, to convert this object in -any special manner to use it in C code. -This is how we create and use C objects in our Zig code. - - - -## Passing C structs across Zig functions {#sec-pass-c-structs} - -Now that we have learned how to create/declare C objects in our Zig code, we -need to learn how to pass these C objects as inputs to Zig functions. -As I described at @sec-c-inputs, we can freely pass these C objects as inputs to C code -that we call from our Zig code. But what about passing these C objects as inputs to Zig functions? - -In essence, this specific case requires one small adjustment in the Zig function declaration. -All you need to do, is to make sure that you pass your C object *by reference* to the function, -instead of passing it *by value*. To do that, you have to annotate the data type of the function argument -that is receiving this C object as "a pointer to the C struct", instead of annotating it as "an instance of the C struct". - -Let's consider the C struct `User` from the `user.h` C header file that we have used at @sec-c-inputs. 
-Now, consider that we want to create a Zig function that sets the value of the `id` field -in this C struct, like the `set_user_id()` function declared below. -Notice that the `user` argument in this function is annotated as a pointer (`*`) to a `c.User` object. - -Therefore, essentially, all you have to do when passing C objects to Zig functions, is to add `*` to the -data type of the function argument that is receiving the C object. This will make sure that -the C object is passed *by reference* to the function. - -Now, because we have transformed the function argument into a pointer, -everytime that you have to access the value pointed by the input pointer inside the function body, for whatever reason (e.g. you want -to read, update, or delete this value), you have to dereference the pointer with the `.*` syntax that we -learned from @sec-pointer. Notice that the `set_user_id()` function is using this syntax to alter -the value in the `id` field of the `User` struct pointed by the input pointer. 
- -```{zig} -#| auto_main: false -#| eval: false -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const c = @cImport({ - @cInclude("user.h"); -}); -fn set_user_id(id: u64, user: *c.User) void { - user.*.id = id; -} - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - - var new_user: c.User = undefined; - new_user.id = 1; - var user_name = try allocator.alloc(u8, 12); - defer allocator.free(user_name); - @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); - user_name[user_name.len - 1] = 0; - new_user.name = user_name.ptr; - - set_user_id(25, &new_user); - try stdout.print("New ID: {any}\n", .{new_user.id}); -} -``` - -``` -New ID: 25 -``` - +--- +engine: knitr +knitr: true +syntax-definition: "../Assets/zig.xml" +--- + +```{r} +#| include: false +source("../zig_engine.R") +knitr::opts_chunk$set( + auto_main = FALSE, + build_type = "lib" +) +``` + + + + +# Zig interoperability with C + +In this chapter, we are going to discuss the interoperability of Zig with C. +We have discussed at @sec-building-c-code how to build C code using the `zig` compiler. +But we haven't discussed yet how to actually use C code in Zig. In other words, +we haven't discussed yet how to call and use C code from Zig. + +These matters are discussed here, in this chapter. +Also, in our next small project in this book, we are going to use a C library in it. +As consequence, we will put in practice a lot of the knowledge discussed here on +this next project. + + +## How to call C code from Zig + +Interop with C is not something new. Most high-level programming languages have FFI (foreign function interfaces), +which can be used to call C code. For example, Python have Cython, R have `.Call()`, Javascript have `ccall()`, etc. 
+But Zig integrates with C in a deeper level, which affects not only the way that C code get's called, but also, +how this C code is compiled and incorporated into your Zig project. + +In summary, Zig have great interoperability with C. But if you want to call C code from Zig +you will have to perform the following steps: + +- import a C header file into your Zig code. +- link your Zig code with the C library. + +In more details, you should begin by importing into your Zig code the C header file that describes the C functions +that you want to call. Which is pretty much the same thing that you would do in C, by including the header files into your C module. +After you import the C header file, you can start calling and using the C functions described in this header file +directly in your Zig code. + +Everytime you use a C library in your Zig code, **you introduce a dependency in your build process**. +This should come as no surprise to anyone that have any experience with C and C++. +Because this is no different in C. Everytime you use a C library in your C code, you also +have to build and link your C code with this C library that you are using. + +When we use a C library in our Zig code, the `zig` compiler needs to access the definition of the C functions that +are being called in your Zig code. The C header file that we have imported into our Zig code provides the +declarations of these C functions, but not their definitions. So, in order to access these definitions, +the `zig` compiler needs to build your Zig code and link it with the C library in the build process. + +As we discussed across the @sec-build-system, there are different strategies to link something with a library. +This might involve building the C library first, and then, linking it with the Zig code. Or, +it could also involve just the linking step, if this C library is already built and +installed in your system. Anyway, if you have doubts about this, comeback to @sec-build-system. 
+ + + +## Importing C header files {#sec-import-c-header} + +To import a C header file into our Zig code, we use the built-in functions `@cInclude()` and `@cImport()`. +Inside the `@cImport()` function, we open a block (with a pair of curly braces). Inside this block +we can (if we need to) include multiple `@cDefine()` calls to define C macros when including this specific C header file. +But for the most part, you will probably need to use just a single call inside this block at `@cImport()`, +which is a call to `@cInclude()`. + +This `@cInclude()` function is equivalent to the `#include` statement in C. +You provide the name of the C header that you want to include as input to this `@cInclude()` function, +then, in conjunction with `@cImport()`, it will perform the necessary steps +to include this C header file into your Zig code. + +You should bind the result of `@cImport()` to a constant object, pretty much like you would do with +`@import()`. You just assign the result to a constant object in your +Zig code, and, as consequence, all C functions, C structs, C macros, etc. that are defined inside the +C header file will be available through this constant object. + +Look at the code example below, where we are importing the Standard I/O C Library (`stdio.h`), +and calling the `printf()`[^printf] C function. Notice that we have also used in this example the C function `powf()`[^powf], +which comes from the C Math Library (`math.h`). +In order to compile this example, you have to link this Zig code with both +the C Standard Library and the C Math Library, by passing the flags `-lc` and `-lm` +to the `zig` compiler. 
+ +[^printf]: +[^powf]: + + +```{zig} +#| eval: false +const cmath = @cImport({ + @cInclude("math.h"); +}); +const stdio = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); + +pub fn main() !void { + const x: f32 = 15.2; + const y = cmath.powf(x, @as(f32, 2.6)); + _ = stdio.printf("%.3f\n", y); +} +``` + +``` +1182.478 +``` + + +## About passing Zig values to C functions {#sec-zig-obj-to-c} + +Zig objects have some intrinsic differences between their C equivalents. +Probably the most noticeable one is the difference between C strings and Zig strings, +which I described at @sec-zig-strings. +Zig strings are objects that contains both an array of arbitrary bytes and a length value. +On the other hand, a C string is usually just a pointer to a null-terminated array of arbitrary bytes. + +Because of these intrinsic differences, in some specific cases, you cannot pass Zig objects directly +as inputs to C functions before you convert them into C compatible values. However, in some other cases, +you are allowed to pass Zig objects and Zig literal values directly as inputs to C functions, +and everything will work just fine, because the `zig` compiler will handle everything for you. + +So we have two different scenarios being described here. Let's call them "auto-conversion" and "need-conversion". +The "auto-conversion" scenario is when the `zig` compiler handles everything for you, and automatically convert your +Zig objects/values into C compatible values. In contrast, +the "need-conversion" scenario is when you, the programmer, have the responsibility of converting +that Zig object into a C compatible value, before passing it to C code. + +There is also a third scenario that is not being described here, which is when you create a C object, or, a C struct, or +a C compatible value in your Zig code, and you pass this C object/value as input to a C function in your Zig code. +This scenario will be described later at @sec-c-inputs. 
In this section, we are focused on the scenarios where +we are passing Zig objects/values to C code, instead of C objects/values being passed to C code. + + +### The "auto-conversion" scenario + +An "auto-conversion" scenario is when the `zig` compiler automatically converts our Zig objects into +C compatible values for us. This specific scenario happens mostly in two instances: + +- with string literal values; +- with any of the primitive data types that were introduced at @sec-primitive-data-types. + +When we think about the second instance described above, the `zig` compiler does automatically +convert any of the primitive data types into their C equivalents, because the compiler knows how +to properly convert a `i16` into a `signed short`, or, a `u8` into a `unsigned char`, etc. +Now, when we think about string literal values, they can be automatically +converted into C strings as well, specially because the `zig` compiler does not forces +a specific Zig data type into a string literal at first glance, unless you store this +string literal into a Zig object, and explicitly annotate the data type of this object. + +Thus, with string literal values, the `zig` compiler have more freedom to infer which is the appropriate data type +to be used in each situation. You could say that the string literal value "inherits it's data type" depending on the context that +it is used. Most of the times, this data type is going to be the type that we commonly associate with Zig strings (`[]const u8`). +But it might be a different type depending on the situation. When the `zig` compiler detects that you are providing +a string literal value as input to some C function, the compiler automatically interprets this string +literal as a C string value. + +As an example, look at the code exposed below. Here we are using +the `fopen()` C function to simply open and close a file. If you do not know how this `fopen()` +function works in C, it takes two C strings as input. 
But in this code example below, we are passing some +string literals written in our Zig code directly as inputs to this `fopen()` C function. + +In other words, we are not doing any type of conversion from a Zig string to a C string. +We are just passing the Zig string literals directly as inputs to the C function. And it works just fine! +Because the compiler inteprets the string `"foo.txt"` as a C string, as a result of the current context +that this string literal is being used. + + +```{zig} +#| eval: false +const c = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); + +pub fn main() !void { + const file = c.fopen("foo.txt", "rb"); + if (file == null) { + @panic("Could not open file!"); + } + if (c.fclose(file) != 0) { + return error.CouldNotCloseFileDescriptor; + } +} +``` + +Let's make some experiments, by writing the same code in different manners, and we +see how this affects the program. As a starting point, let's store the `"foo.txt"` string inside +a Zig object, like the `path` object below, and then, we pass this Zig object as input to the `fopen()` C function. + +If we do this, the program still compiles and runs successfully. Notice that I have ommitted most of the code in this example below. +This is just for brevitty reasons, because the remainder of the program is still the same. +The only difference between this example and the previous example is just these two lines exposed below. + +```{zig} +#| eval: false + const path = "foo.txt"; + const file = c.fopen(path, "rb"); + // Remainder of the program +``` + +Now, what happens if you give an explicit data type to the `path` object? Well, if I force +the `zig` compiler to interpret this `path` object as a Zig string object, +by annotating the `path` object with the data type `[]const u8`, then, I actually get a compile error +as demonstrated below. We get this compile error because now I'm forcing the `zig` compiler +to interpret `path` as a Zig string object. 
+ +According to the error message, the `fopen()` C function was expecting to receive an +input value of type `[*c]const u8` (C string) instead of a value of type `[]const u8` (Zig string). +In more details, the type `[*c]const u8` is actually the Zig type representation of a C string. +The `[*c]` portion of this type identifies a C pointer. So, this Zig type essentially means: a C pointer to an array (`[*c]`) of +constant bytes (`const u8`). + + +```{zig} +#| eval: false + const path: []const u8 = "foo.txt"; + const file = c.fopen(path, "rb"); + // Remainder of the program +``` + +``` +t.zig:10:26: error: expected type '[*c]const u8', found '[]const u8' + const file = c.fopen(path, "rb"); + ^~~~ +``` + +Therefore, when we talk exclusively about string literal values, as long as you don't give an +explicit data type to these string literal values, the `zig` compiler should be capable of automatically +converting them into C strings as needed. + +But what about using one of the primitive data types that were introduced at @sec-primitive-data-types? +Let's take code exposed below as an example of that. Here, we are giving some float literal values as input +to the C function `powf()`. Notice that this code example compiles and runs succesfully. + +```{zig} +#| eval: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const cmath = @cImport({ + @cInclude("math.h"); +}); + +pub fn main() !void { + const y = cmath.powf(15.68, 2.32); + try stdout.print("{d}\n", .{y}); +} +``` + +``` +593.2023 +``` + +Once again, because the `zig` compiler does not associate a specific data type with the literal values +`15.68` and `2.32` at first glance, the compiler can automatically convert these values +into their C `float` (or `double`) equivalents, before it passes to the `powf()` C function. 
+Now, even if I give an explicit Zig data type to these literal values, by storing them into a Zig object +and explicitly annotating the type of this object, the code still compiles and runs successfully. + +```{zig} +#| eval: false + const x: f32 = 15.68; + const y = cmath.powf(x, 2.32); + // The remainder of the program +``` + +``` +593.2023 +``` + + + +### The "need-conversion" scenario + +A "need-conversion" scenario is when we need to manually convert our Zig objects into C compatible values +before passing them as input to C functions. You will fall into this scenario when passing Zig string objects +to C functions. + +We already saw this specific circumstance in the last `fopen()` example, +which is reproduced below. You can see in this example that we have given an explicit Zig data type +(`[]const u8`) to our `path` object and, as a consequence, we have forced the `zig` compiler +to see this `path` object as a Zig string object. Because of that, we now need to manually convert +this `path` object into a C string before we pass it to `fopen()`. + + +```{zig} +#| eval: false + const path: []const u8 = "foo.txt"; + const file = c.fopen(path, "rb"); + // Remainder of the program +``` + +``` +t.zig:10:26: error: expected type '[*c]const u8', found '[]const u8' + const file = c.fopen(path, "rb"); + ^~~~ +``` + + +There are different ways to convert a Zig string object into a C string. +One way to solve this problem is to provide the pointer to the underlying array +of bytes, instead of providing the Zig object directly as input. +You can access this pointer by using the `ptr` property of the Zig string object. + +The code example below demonstrates this strategy. Notice that, by giving the +pointer to the underlying array in `path` through the `ptr` property, we get no compile errors +when calling the `fopen()` C function.
+ +```{zig} +#| eval: false + const path: []const u8 = "foo.txt"; + const file = c.fopen(path.ptr, "rb"); + // Remainder of the program +``` + +This strategy works because the pointer found in the `ptr` property is a many-item pointer (`[*]const u8`) to the underlying array of bytes, +which Zig can coerce to a C pointer (`[*c]const u8`), i.e. the equivalent of a `const char *` in C. +Be aware that a slice carries no guarantee of a null terminator. It works here because the slice points to a string literal, +and Zig string literals are null-terminated in memory. +This is why this option also solves the problem of converting the Zig string into a C string. + +Another option is to explicitly convert the Zig string object into a C pointer by using the +built-in function `@ptrCast()`. With this function we can convert +an object of type `[]const u8` into an object of type `[*c]const u8`. +As I described in the previous section, the `[*c]` portion of the type +means that it is a C pointer. This strategy is not recommended, but it is +useful to demonstrate the use of `@ptrCast()`. + +You may recall the `@as()` built-in function, which is used to explicitly convert (or cast) a Zig value from a type `x` +to a type `y`. In that sense, the `@as()` Zig function is similar in spirit to the +`as` keyword in Rust and to the C type casting syntax (e.g. `(int) x`), although `@as()` only performs conversions that are known to be safe. +But in our case here, we are not converting just any type of object. +We are converting something into a pointer, or, more specifically, into a C pointer. +Every time a pointer is involved in some "type casting operation" in Zig, +the `@ptrCast()` function is involved. This `@ptrCast()` function is responsible +for converting a pointer of one type to a pointer of another type. + +In the example below, we are using this function to cast our `path` object +into a C pointer to an array of bytes. Then, we pass this C pointer as input +to the `fopen()` function. Notice that this code example compiles successfully +with no errors.
+ +```{zig} +#| eval: false + const path: []const u8 = "foo.txt"; + const c_path: [*c]const u8 = @ptrCast(path); + const file = c.fopen(c_path, "rb"); + // Remainder of the program +``` + + + +## Creating C objects in Zig {#sec-c-inputs} + +Creating C objects, or, in other words, creating instances of C structs in your Zig code +is actually quite easy to do. You first need to import the C header file (like I described at @sec-import-c-header) that describes +the C struct that you are trying to instantiate in your Zig code. After that, you can just +create a new object in your Zig code, and annotate it with the C type of the struct. + +For example, suppose we have a C header file called `user.h`, and that this header file declares a new struct named `User`. +This C header file is shown below: + +```c +#include <stdint.h> + +typedef struct +{ + uint64_t id; + char* name; +} User; +``` + +This `User` C struct has two distinct fields, or two struct members, named `id` and `name`. +The field `id` is an unsigned 64-bit integer value, while the field `name` is just a standard C string. +Now, suppose that I want to create an instance of this `User` struct in my Zig code. +I can do that by importing this `user.h` header file into my Zig code, and creating +a new object with type `User`. These steps are reproduced in the code example below. + +Notice that I have used the keyword `undefined` in this example. This allows me to +create the `new_user` object without the need to provide an initial value to the object. +As a consequence, the underlying memory associated with this `new_user` is uninitialized, +i.e. the memory is currently populated with "garbage" values. +Thus, this expression has the exact same effect as the expression `User new_user;` in C, +which means "declare a new object named `new_user` of type `User`".
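As a small sketch of what `undefined` implies, the snippet below declares a struct value with `undefined` and only reads its fields after assigning them. To keep the snippet runnable without `user.h`, it uses a hypothetical Zig `extern struct` named `User` that mirrors the C struct; this mirror is an assumption made for the illustration, not something the header provides. It also shows `std.mem.zeroes()` as one possible alternative when an all-zero starting value is preferable.

```zig
const std = @import("std");

// Hypothetical Zig-side mirror of the C `User` struct, used only so
// that this sketch compiles without the `user.h` header.
const User = extern struct {
    id: u64,
    name: [*c]u8,
};

pub fn main() void {
    // The fields hold unspecified values until we assign them,
    // so we must not read them before that point.
    var new_user: User = undefined;
    new_user.id = 1;
    new_user.name = null;
    std.debug.print("id = {d}\n", .{new_user.id});

    // Alternative: start from a struct whose bytes are all zero.
    const zeroed = std.mem.zeroes(User);
    std.debug.print("zeroed id = {d}\n", .{zeroed.id});
}
```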
+ +It is our responsibility to properly initialize the memory associated with this `new_user` object, +by assigning valid values to the members (or the fields) of the C struct. In the example below, I am assigning the integer 1 to the +member `id`. I am also saving the string `"pedropark99"` into the member `name`. +Notice in this example that I manually add the null character (zero byte) to the end of the allocated array +for this string. This null character marks the end of the string in C. + +```{zig} +#| auto_main: false +#| eval: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const c = @cImport({ + @cInclude("user.h"); +}); + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + + var new_user: c.User = undefined; + new_user.id = 1; + var user_name = try allocator.alloc(u8, 12); + defer allocator.free(user_name); + @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); + user_name[user_name.len - 1] = 0; + new_user.name = user_name.ptr; +} +``` + +So, in the example above, we are manually initializing each field of the C struct. +We could say that, in this instance, we are "manually instantiating +the C struct object". However, when we use C libraries in our Zig code, we rarely need +to manually instantiate the C structs like in the above example. This is because C libraries +usually provide "constructor functions" in their public APIs. As a consequence, we normally rely on +these constructor functions to properly initialize the C structs, and +the struct fields, for us. + +For example, consider the HarfBuzz C library. This is a text shaping C library, +and it works around a "buffer object", or, more specifically, an instance of +the C struct `hb_buffer_t`. Therefore, we need to create an instance of +this C struct if we want to use this C library. Luckily, this library offers +the function `hb_buffer_create()`, which we can use to create such an object.
+So the Zig code necessary to create such an object would probably look something like this: + +```zig +const c = @cImport({ + @cInclude("hb.h"); +}); +// hb_buffer_create() returns a pointer to the buffer object +var buf: ?*c.hb_buffer_t = c.hb_buffer_create(); +// Do stuff with the "buffer object" +``` + +Therefore, we do not need to manually create an instance of the C struct +`hb_buffer_t` here, or manually assign valid values to each field in this C struct, +because the constructor function `hb_buffer_create()` does this heavy job for us and hands back a pointer to the initialized object. + +Since the `buf` object refers to an instance of a C struct (and the `new_user` object is one), it +is, in itself, a C compatible value. It is a C object defined in our Zig code. As a consequence, +you can freely pass this object as input to any C function that expects to receive this type +of C struct as input. You do not need to use any special syntax, or to convert this object in +any special manner, to use it in C code. +This is how we create and use C objects in our Zig code. + + + +## Passing C structs across Zig functions {#sec-pass-c-structs} + +Now that we have learned how to create/declare C objects in our Zig code, we +need to learn how to pass these C objects as inputs to Zig functions. +As I described at @sec-c-inputs, we can freely pass these C objects as inputs to C code +that we call from our Zig code. But what about passing these C objects as inputs to Zig functions? + +In essence, this specific case requires one small adjustment in the Zig function declaration. +Because function arguments are immutable in Zig, a function that needs to change the C object must receive it *by reference*, +instead of receiving it *by value*. To do that, you have to annotate the data type of the function argument +that is receiving this C object as "a pointer to the C struct", instead of annotating it as "an instance of the C struct". + +Let's consider the C struct `User` from the `user.h` C header file that we have used at @sec-c-inputs.
+Now, consider that we want to create a Zig function that sets the value of the `id` field +in this C struct, like the `set_user_id()` function declared below. +Notice that the `user` argument in this function is annotated as a pointer (`*`) to a `c.User` object. + +Therefore, essentially, all you have to do when passing C objects to Zig functions is to add `*` to the +data type of the function argument that is receiving the C object. This will make sure that +the C object is passed *by reference* to the function. + +Now, because we have transformed the function argument into a pointer, +every time you have to access the value pointed to by the input pointer inside the function body, for whatever reason (e.g. you want +to read, update, or delete this value), you have to dereference the pointer with the `.*` syntax that we +learned in @sec-pointer. Notice that the `set_user_id()` function is using this syntax to alter +the value in the `id` field of the `User` struct pointed to by the input pointer.
+ +```{zig} +#| auto_main: false +#| eval: false +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const c = @cImport({ + @cInclude("user.h"); +}); +fn set_user_id(id: u64, user: *c.User) void { + user.*.id = id; +} + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + + var new_user: c.User = undefined; + new_user.id = 1; + var user_name = try allocator.alloc(u8, 12); + defer allocator.free(user_name); + @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); + user_name[user_name.len - 1] = 0; + new_user.name = user_name.ptr; + + set_user_id(25, &new_user); + try stdout.print("New ID: {any}\n", .{new_user.id}); +} +``` + +``` +New ID: 25 +``` +
diff --git a/Figures/hashtable.svg b/Figures/hashtable.svg index 33532a9d..31bf3b61 100644 --- a/Figures/hashtable.svg +++ b/Figures/hashtable.svg @@ -1,170 +1,170 @@ [SVG body omitted: hash table figure with the labels "hash function", "keys" (John Smith, Lisa Smith, Sandra Dee) and "buckets" (00 to 15, holding 521-8976, 521-1234 and 521-9655)]
diff --git a/Figures/lifo-stack.svg b/Figures/lifo-stack.svg index bf304795..7e249ac7 100644 --- a/Figures/lifo-stack.svg +++ b/Figures/lifo-stack.svg @@ -1,165 +1,165 @@ [SVG body omitted: figure titled "LIFO (Last In, First Out) Stack", showing ten numbered steps, five labelled Push followed by five labelled Pop]
diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000..eea272b0 --- /dev/null +++ b/LICENSE @@ -0,0 +1 @@ +Introduction to Zig: a project-based book © 2024 by Pedro Duarte Faria is licensed under Creative Commons Attribution 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
diff --git a/ZigExamples/calling-c/cstring.zig b/ZigExamples/calling-c/cstring.zig index 3d69f530..7e84ee94 100644 --- a/ZigExamples/calling-c/cstring.zig +++ b/ZigExamples/calling-c/cstring.zig @@ -1,12 +1,12 @@ -const std = @import("std"); - -pub fn main() !void { - const zig_string: []const u8 = "test"; - // Every pointer type casting is made with the @ptrCast() function. - const c_string: [*c]const u8 = @ptrCast(zig_string); - _ = c_string; - // Pointer type casting cannot be done only with @as(), the @ptrCast() function must be involved. - // This is why the expression below does not compile succesfully. - const v2 = @as([*c]const u8, zig_string); - _ = v2; -} +const std = @import("std"); + +pub fn main() !void { + const zig_string: []const u8 = "test"; + // Every pointer type casting is made with the @ptrCast() function.
+ const c_string: [*c]const u8 = @ptrCast(zig_string); + _ = c_string; + // Pointer type casting cannot be done only with @as(), the @ptrCast() function must be involved. + // This is why the expression below does not compile succesfully. + const v2 = @as([*c]const u8, zig_string); + _ = v2; +} diff --git a/ZigExamples/calling-c/fopen.zig b/ZigExamples/calling-c/fopen.zig index 7218a86b..dd432470 100644 --- a/ZigExamples/calling-c/fopen.zig +++ b/ZigExamples/calling-c/fopen.zig @@ -1,14 +1,14 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const c = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); - -pub fn main() !void { - const file = c.fopen("ZigExamples/image_filter/build.zig", "rb"); - const close_status = c.fclose(file); - if (close_status != 0) { - return error.CouldNotCloseFileDescriptor; - } -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const c = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); + +pub fn main() !void { + const file = c.fopen("ZigExamples/image_filter/build.zig", "rb"); + const close_status = c.fclose(file); + if (close_status != 0) { + return error.CouldNotCloseFileDescriptor; + } +} diff --git a/ZigExamples/calling-c/instantiating-c-objects.zig b/ZigExamples/calling-c/instantiating-c-objects.zig index 856c86a3..068afa41 100644 --- a/ZigExamples/calling-c/instantiating-c-objects.zig +++ b/ZigExamples/calling-c/instantiating-c-objects.zig @@ -1,20 +1,20 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const c = @cImport({ - @cInclude("user.h"); -}); - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - - var new_user: c.User = undefined; - new_user.id = 1; - var user_name = try allocator.alloc(u8, 12); - defer allocator.free(user_name); - @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); - user_name[user_name.len - 1] 
= 0; - new_user.name = user_name.ptr; - - try stdout.print("{any}\n", .{(new_user.name + 11).*}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const c = @cImport({ + @cInclude("user.h"); +}); + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + + var new_user: c.User = undefined; + new_user.id = 1; + var user_name = try allocator.alloc(u8, 12); + defer allocator.free(user_name); + @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); + user_name[user_name.len - 1] = 0; + new_user.name = user_name.ptr; + + try stdout.print("{any}\n", .{(new_user.name + 11).*}); +} diff --git a/ZigExamples/calling-c/pow.zig b/ZigExamples/calling-c/pow.zig index b5bf4a28..2f6ff4f7 100644 --- a/ZigExamples/calling-c/pow.zig +++ b/ZigExamples/calling-c/pow.zig @@ -1,11 +1,11 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const cmath = @cImport({ - @cInclude("math.h"); -}); - -pub fn main() !void { - const x: f32 = 15.68; - const y = cmath.powf(x, 2.32); - try stdout.print("{d}\n", .{y}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const cmath = @cImport({ + @cInclude("math.h"); +}); + +pub fn main() !void { + const x: f32 = 15.68; + const y = cmath.powf(x, 2.32); + try stdout.print("{d}\n", .{y}); +} diff --git a/ZigExamples/calling-c/set_user_id.zig b/ZigExamples/calling-c/set_user_id.zig index 52380216..b2ab96c6 100644 --- a/ZigExamples/calling-c/set_user_id.zig +++ b/ZigExamples/calling-c/set_user_id.zig @@ -1,25 +1,25 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const c = @cImport({ - @cInclude("user.h"); -}); - -fn set_user_id(id: u64, user: *c.User) void { - user.*.id = id; -} - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - - var new_user: c.User = undefined; - new_user.id = 1; - var user_name = try 
allocator.alloc(u8, 12); - defer allocator.free(user_name); - @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); - user_name[user_name.len - 1] = 0; - new_user.name = user_name.ptr; - - set_user_id(25, &new_user); - try stdout.print("{any}\n", .{new_user.id}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const c = @cImport({ + @cInclude("user.h"); +}); + +fn set_user_id(id: u64, user: *c.User) void { + user.*.id = id; +} + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + + var new_user: c.User = undefined; + new_user.id = 1; + var user_name = try allocator.alloc(u8, 12); + defer allocator.free(user_name); + @memcpy(user_name[0..(user_name.len - 1)], "pedropark99"); + user_name[user_name.len - 1] = 0; + new_user.name = user_name.ptr; + + set_user_id(25, &new_user); + try stdout.print("{any}\n", .{new_user.id}); +} diff --git a/ZigExamples/calling-c/stdio-example.zig b/ZigExamples/calling-c/stdio-example.zig index 32185d82..68726e87 100644 --- a/ZigExamples/calling-c/stdio-example.zig +++ b/ZigExamples/calling-c/stdio-example.zig @@ -1,10 +1,10 @@ -// Compile this program with: `zig build-exe stdio-example.zig -lc` -const stdio = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); - -pub fn main() !void { - const x: f32 = 15.2; - _ = stdio.printf("%.3f\n", x); -} +// Compile this program with: `zig build-exe stdio-example.zig -lc` +const stdio = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); + +pub fn main() !void { + const x: f32 = 15.2; + _ = stdio.printf("%.3f\n", x); +} diff --git a/ZigExamples/calling-c/stdio-math-example.zig b/ZigExamples/calling-c/stdio-math-example.zig index 6e6720d1..9dd8224a 100644 --- a/ZigExamples/calling-c/stdio-math-example.zig +++ b/ZigExamples/calling-c/stdio-math-example.zig @@ -1,13 +1,13 @@ -const cmath = @cImport({ - @cInclude("math.h"); -}); -const stdio = 
@cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); - -pub fn main() !void { - const x: f32 = 15.2; - const y = cmath.powf(x, @as(f32, 2.6)); - _ = stdio.printf("%.3f\n", y); -} +const cmath = @cImport({ + @cInclude("math.h"); +}); +const stdio = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); + +pub fn main() !void { + const x: f32 = 15.2; + const y = cmath.powf(x, @as(f32, 2.6)); + _ = stdio.printf("%.3f\n", y); +} diff --git a/ZigExamples/calling-c/user.h b/ZigExamples/calling-c/user.h index 87650450..1b37b9bd 100644 --- a/ZigExamples/calling-c/user.h +++ b/ZigExamples/calling-c/user.h @@ -1,8 +1,8 @@ -#include - -typedef struct -{ - uint64_t id; - char* name; -} User; - +#include + +typedef struct +{ + uint64_t id; + char* name; +} User; + diff --git a/ZigExamples/data-structures/generic_array.zig b/ZigExamples/data-structures/generic_array.zig index 27754f6a..73247b43 100644 --- a/ZigExamples/data-structures/generic_array.zig +++ b/ZigExamples/data-structures/generic_array.zig @@ -1,13 +1,13 @@ -const std = @import("std"); -fn Array(comptime T: type) type { - return struct { - items: []T, - size: usize, - }; -} - -pub fn main() !void { - var buffer: [5]u8 = undefined; - const ar = Array(u8){ .items = &buffer, .size = 0 }; - std.debug.print("{any}\n", .{@TypeOf(ar)}); -} +const std = @import("std"); +fn Array(comptime T: type) type { + return struct { + items: []T, + size: usize, + }; +} + +pub fn main() !void { + var buffer: [5]u8 = undefined; + const ar = Array(u8){ .items = &buffer, .size = 0 }; + std.debug.print("{any}\n", .{@TypeOf(ar)}); +} diff --git a/ZigExamples/data-structures/generic_linked_list.zig b/ZigExamples/data-structures/generic_linked_list.zig index c99d4c8e..8821d380 100644 --- a/ZigExamples/data-structures/generic_linked_list.zig +++ b/ZigExamples/data-structures/generic_linked_list.zig @@ -1,26 +1,26 @@ -const std = @import("std"); -const Allocator = std.mem.Allocator; - -fn 
LinkedList(comptime T: type) type { - return struct { - pub const Node = struct { - value: T, - next: ?*Node = null, - }; - - const self = @This(); - first: ?*Node = null, - - pub fn insert(list: *self, new_node: *Node) void { - new_node.next = list.first; - list.first = new_node; - } - }; -} - -pub fn main() !void { - const Listu32 = LinkedList(u32); - var list = Listu32{}; - var n1 = Listu32.Node{ .value = 1 }; - list.insert(&n1); -} +const std = @import("std"); +const Allocator = std.mem.Allocator; + +fn LinkedList(comptime T: type) type { + return struct { + pub const Node = struct { + value: T, + next: ?*Node = null, + }; + + const self = @This(); + first: ?*Node = null, + + pub fn insert(list: *self, new_node: *Node) void { + new_node.next = list.first; + list.first = new_node; + } + }; +} + +pub fn main() !void { + const Listu32 = LinkedList(u32); + var list = Listu32{}; + var n1 = Listu32.Node{ .value = 1 }; + list.insert(&n1); +} diff --git a/ZigExamples/data-structures/generic_stack.zig b/ZigExamples/data-structures/generic_stack.zig index e6926a46..04bb8381 100644 --- a/ZigExamples/data-structures/generic_stack.zig +++ b/ZigExamples/data-structures/generic_stack.zig @@ -1,70 +1,70 @@ -const std = @import("std"); -const Allocator = std.mem.Allocator; - -fn Stack(comptime T: type) type { - return struct { - items: []T, - capacity: usize, - length: usize, - allocator: Allocator, - const Self = @This(); - - pub fn init(allocator: Allocator, capacity: usize) !Stack(T) { - var buf = try allocator.alloc(T, capacity); - @memset(buf[0..], 0); - return .{ - .items = buf[0..], - .capacity = capacity, - .length = 0, - .allocator = allocator, - }; - } - - pub fn push(self: *Self, val: T) !void { - if ((self.length + 1) > self.capacity) { - var new_buf = try self.allocator.alloc(T, self.capacity * 2); - @memset(new_buf[0..], 0); - @memcpy(new_buf[0..self.capacity], self.items); - self.allocator.free(self.items); - self.items = new_buf; - } - - self.items[self.length] 
= val; - self.length += 1; - } - - pub fn pop(self: *Self) void { - if (self.length == 0) return; - - self.items[self.length - 1] = 0; - self.length -= 1; - } - - pub fn deinit(self: *Self) void { - self.allocator.free(self.items); - } - }; -} - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - const Stacku8 = Stack(u8); - var stack = try Stacku8.init(allocator, 10); - defer stack.deinit(); - try stack.push(1); - try stack.push(2); - try stack.push(3); - try stack.push(4); - try stack.push(5); - try stack.push(6); - - std.debug.print("Stack len: {d}\n", .{stack.length}); - std.debug.print("Stack capacity: {d}\n", .{stack.capacity}); - - stack.pop(); - std.debug.print("Stack len: {d}\n", .{stack.length}); - stack.pop(); - std.debug.print("Stack len: {d}\n", .{stack.length}); - std.debug.print("Stack state: {any}\n", .{stack.items}); -} +const std = @import("std"); +const Allocator = std.mem.Allocator; + +fn Stack(comptime T: type) type { + return struct { + items: []T, + capacity: usize, + length: usize, + allocator: Allocator, + const Self = @This(); + + pub fn init(allocator: Allocator, capacity: usize) !Stack(T) { + var buf = try allocator.alloc(T, capacity); + @memset(buf[0..], 0); + return .{ + .items = buf[0..], + .capacity = capacity, + .length = 0, + .allocator = allocator, + }; + } + + pub fn push(self: *Self, val: T) !void { + if ((self.length + 1) > self.capacity) { + var new_buf = try self.allocator.alloc(T, self.capacity * 2); + @memset(new_buf[0..], 0); + @memcpy(new_buf[0..self.capacity], self.items); + self.allocator.free(self.items); + self.items = new_buf; + } + + self.items[self.length] = val; + self.length += 1; + } + + pub fn pop(self: *Self) void { + if (self.length == 0) return; + + self.items[self.length - 1] = 0; + self.length -= 1; + } + + pub fn deinit(self: *Self) void { + self.allocator.free(self.items); + } + }; +} + +pub fn main() !void { + var gpa = 
std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + const Stacku8 = Stack(u8); + var stack = try Stacku8.init(allocator, 10); + defer stack.deinit(); + try stack.push(1); + try stack.push(2); + try stack.push(3); + try stack.push(4); + try stack.push(5); + try stack.push(6); + + std.debug.print("Stack len: {d}\n", .{stack.length}); + std.debug.print("Stack capacity: {d}\n", .{stack.capacity}); + + stack.pop(); + std.debug.print("Stack len: {d}\n", .{stack.length}); + stack.pop(); + std.debug.print("Stack len: {d}\n", .{stack.length}); + std.debug.print("Stack state: {any}\n", .{stack.items}); +} diff --git a/ZigExamples/data-structures/hash_table_it_key.zig b/ZigExamples/data-structures/hash_table_it_key.zig index e0d9730e..a8d9e773 100644 --- a/ZigExamples/data-structures/hash_table_it_key.zig +++ b/ZigExamples/data-structures/hash_table_it_key.zig @@ -1,18 +1,18 @@ -const std = @import("std"); -const AutoHashMap = std.hash_map.AutoHashMap; - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var hash_table = AutoHashMap(u32, u16).init(allocator); - defer hash_table.deinit(); - - try hash_table.put(54321, 89); - try hash_table.put(50050, 55); - try hash_table.put(57709, 41); - - var kit = hash_table.keyIterator(); - while (kit.next()) |key| { - std.debug.print("Key: {d}\n", .{key.*}); - } -} +const std = @import("std"); +const AutoHashMap = std.hash_map.AutoHashMap; + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var hash_table = AutoHashMap(u32, u16).init(allocator); + defer hash_table.deinit(); + + try hash_table.put(54321, 89); + try hash_table.put(50050, 55); + try hash_table.put(57709, 41); + + var kit = hash_table.keyIterator(); + while (kit.next()) |key| { + std.debug.print("Key: {d}\n", .{key.*}); + } +} diff --git a/ZigExamples/data-structures/hash_tables.zig 
b/ZigExamples/data-structures/hash_tables.zig index d2073d72..64071f70 100644 --- a/ZigExamples/data-structures/hash_tables.zig +++ b/ZigExamples/data-structures/hash_tables.zig @@ -1,21 +1,21 @@ -const std = @import("std"); -const AutoHashMap = std.hash_map.AutoHashMap; - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var hash_table = AutoHashMap(u32, u32).init(allocator); - defer hash_table.deinit(); - - try hash_table.put(54321, 89); - try hash_table.put(50050, 55); - try hash_table.put(57709, 41); - - std.debug.print("N of values stored: {d}\n", .{hash_table.count()}); - std.debug.print("Value at key 50050: {d}\n", .{hash_table.get(50050).?}); - - if (hash_table.remove(57709)) { - std.debug.print("Value at key 57709 succesfully removed!\n", .{}); - } - std.debug.print("N of values stored: {d}\n", .{hash_table.count()}); -} +const std = @import("std"); +const AutoHashMap = std.hash_map.AutoHashMap; + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var hash_table = AutoHashMap(u32, u32).init(allocator); + defer hash_table.deinit(); + + try hash_table.put(54321, 89); + try hash_table.put(50050, 55); + try hash_table.put(57709, 41); + + std.debug.print("N of values stored: {d}\n", .{hash_table.count()}); + std.debug.print("Value at key 50050: {d}\n", .{hash_table.get(50050).?}); + + if (hash_table.remove(57709)) { + std.debug.print("Value at key 57709 succesfully removed!\n", .{}); + } + std.debug.print("N of values stored: {d}\n", .{hash_table.count()}); +} diff --git a/ZigExamples/data-structures/hash_tables_it.zig b/ZigExamples/data-structures/hash_tables_it.zig index bd5ca3a9..876327c4 100644 --- a/ZigExamples/data-structures/hash_tables_it.zig +++ b/ZigExamples/data-structures/hash_tables_it.zig @@ -1,21 +1,21 @@ -const std = @import("std"); -const AutoHashMap = std.hash_map.AutoHashMap; - -pub fn main() !void { - var gpa = 
std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var hash_table = AutoHashMap(u32, u16).init(allocator); - defer hash_table.deinit(); - - try hash_table.put(54321, 89); - try hash_table.put(50050, 55); - try hash_table.put(57709, 41); - - var it = hash_table.iterator(); - while (it.next()) |kv| { - // Access the current key - std.debug.print("Key: {d} | ", .{kv.key_ptr.*}); - // Access the current value - std.debug.print("Value: {d}\n", .{kv.value_ptr.*}); - } -} +const std = @import("std"); +const AutoHashMap = std.hash_map.AutoHashMap; + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var hash_table = AutoHashMap(u32, u16).init(allocator); + defer hash_table.deinit(); + + try hash_table.put(54321, 89); + try hash_table.put(50050, 55); + try hash_table.put(57709, 41); + + var it = hash_table.iterator(); + while (it.next()) |kv| { + // Access the current key + std.debug.print("Key: {d} | ", .{kv.key_ptr.*}); + // Access the current value + std.debug.print("Value: {d}\n", .{kv.value_ptr.*}); + } +} diff --git a/ZigExamples/data-structures/linked_list.zig b/ZigExamples/data-structures/linked_list.zig index 88922fc1..66818098 100644 --- a/ZigExamples/data-structures/linked_list.zig +++ b/ZigExamples/data-structures/linked_list.zig @@ -1,19 +1,19 @@ -const std = @import("std"); -const testing = std.testing; -const SinglyLinkedList = std.SinglyLinkedList; -const Lu32 = SinglyLinkedList(u32); - -pub fn main() !void { - var list = Lu32{}; - var one = Lu32.Node{ .data = 1 }; - var two = Lu32.Node{ .data = 2 }; - var three = Lu32.Node{ .data = 3 }; - var four = Lu32.Node{ .data = 4 }; - var five = Lu32.Node{ .data = 5 }; - - list.prepend(&two); // {2} - two.insertAfter(&five); // {2, 5} - list.prepend(&one); // {1, 2, 5} - two.insertAfter(&three); // {1, 2, 3, 5} - three.insertAfter(&four); // {1, 2, 3, 4, 5} -} +const std = @import("std"); +const testing = std.testing; +const 
SinglyLinkedList = std.SinglyLinkedList; +const Lu32 = SinglyLinkedList(u32); + +pub fn main() !void { + var list = Lu32{}; + var one = Lu32.Node{ .data = 1 }; + var two = Lu32.Node{ .data = 2 }; + var three = Lu32.Node{ .data = 3 }; + var four = Lu32.Node{ .data = 4 }; + var five = Lu32.Node{ .data = 5 }; + + list.prepend(&two); // {2} + two.insertAfter(&five); // {2, 5} + list.prepend(&one); // {1, 2, 5} + two.insertAfter(&three); // {1, 2, 3, 5} + three.insertAfter(&four); // {1, 2, 3, 4, 5} +} diff --git a/ZigExamples/data-structures/multi-array-list.zig b/ZigExamples/data-structures/multi-array-list.zig index cbb06773..b60552b1 100644 --- a/ZigExamples/data-structures/multi-array-list.zig +++ b/ZigExamples/data-structures/multi-array-list.zig @@ -1,31 +1,31 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const Person = struct { - name: []const u8, - age: u8, - height: f32, -}; -const PersonArray = std.MultiArrayList(Person); - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var people = PersonArray{}; - defer people.deinit(allocator); - - try people.append(allocator, .{ .name = "Auguste", .age = 15, .height = 1.54 }); - try people.append(allocator, .{ .name = "Elena", .age = 26, .height = 1.65 }); - try people.append(allocator, .{ .name = "Michael", .age = 64, .height = 1.87 }); - - for (people.items(.age)) |*age| { - try stdout.print("Age: {d}\n", .{age.*}); - } - - var slice = people.slice(); - for (slice.items(.age)) |*age| { - age.* += 10; - } - for (slice.items(.name), slice.items(.age)) |*n, *a| { - try stdout.print("Name: {s}, Age: {d}\n", .{ n.*, a.* }); - } -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Person = struct { + name: []const u8, + age: u8, + height: f32, +}; +const PersonArray = std.MultiArrayList(Person); + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = 
gpa.allocator(); + var people = PersonArray{}; + defer people.deinit(allocator); + + try people.append(allocator, .{ .name = "Auguste", .age = 15, .height = 1.54 }); + try people.append(allocator, .{ .name = "Elena", .age = 26, .height = 1.65 }); + try people.append(allocator, .{ .name = "Michael", .age = 64, .height = 1.87 }); + + for (people.items(.age)) |*age| { + try stdout.print("Age: {d}\n", .{age.*}); + } + + var slice = people.slice(); + for (slice.items(.age)) |*age| { + age.* += 10; + } + for (slice.items(.name), slice.items(.age)) |*n, *a| { + try stdout.print("Name: {s}, Age: {d}\n", .{ n.*, a.* }); + } +} diff --git a/ZigExamples/data-structures/ordered_remove.zig b/ZigExamples/data-structures/ordered_remove.zig index 548b9676..23367fcf 100644 --- a/ZigExamples/data-structures/ordered_remove.zig +++ b/ZigExamples/data-structures/ordered_remove.zig @@ -1,22 +1,22 @@ -const std = @import("std"); - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var buffer = try std.ArrayList(u8) - .initCapacity(allocator, 100); - defer buffer.deinit(); - - for (0..10) |i| { - const index: u8 = @intCast(i); - try buffer.append(index); - } - - std.debug.print("{any}\n", .{buffer.items}); - std.debug.print("==========================\n", .{}); - _ = buffer.orderedRemove(3); - _ = buffer.orderedRemove(3); - - std.debug.print("{any}\n", .{buffer.items}); - std.debug.print("{any}\n", .{buffer.items.len}); -} +const std = @import("std"); + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var buffer = try std.ArrayList(u8) + .initCapacity(allocator, 100); + defer buffer.deinit(); + + for (0..10) |i| { + const index: u8 = @intCast(i); + try buffer.append(index); + } + + std.debug.print("{any}\n", .{buffer.items}); + std.debug.print("==========================\n", .{}); + _ = buffer.orderedRemove(3); + _ = buffer.orderedRemove(3); + + 
std.debug.print("{any}\n", .{buffer.items}); + std.debug.print("{any}\n", .{buffer.items.len}); +} diff --git a/ZigExamples/data-structures/print_chars.zig b/ZigExamples/data-structures/print_chars.zig index 39832b3b..4c12a295 100644 --- a/ZigExamples/data-structures/print_chars.zig +++ b/ZigExamples/data-structures/print_chars.zig @@ -1,17 +1,17 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var buffer = try std.ArrayList(u8) - .initCapacity(allocator, 10); - defer buffer.deinit(); - - try buffer.appendSlice("My Pedro"); - try buffer.insert(4, '3'); - try buffer.insertSlice(2, " name"); - for (buffer.items) |char| { - try stdout.print("{c}", .{char}); - } -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var buffer = try std.ArrayList(u8) + .initCapacity(allocator, 10); + defer buffer.deinit(); + + try buffer.appendSlice("My Pedro"); + try buffer.insert(4, '3'); + try buffer.insertSlice(2, " name"); + for (buffer.items) |char| { + try stdout.print("{c}", .{char}); + } +} diff --git a/ZigExamples/data-structures/stack.zig b/ZigExamples/data-structures/stack.zig index 8029734f..b2bcfbc1 100644 --- a/ZigExamples/data-structures/stack.zig +++ b/ZigExamples/data-structures/stack.zig @@ -1,66 +1,66 @@ -const std = @import("std"); -const Allocator = std.mem.Allocator; - -const Stack = struct { - items: []u32, - capacity: usize, - length: usize, - allocator: Allocator, - - pub fn init(allocator: Allocator, capacity: usize) !Stack { - var buf = try allocator.alloc(u32, capacity); - @memset(buf[0..], 0); - return .{ - .items = buf[0..], - .capacity = capacity, - .length = 0, - .allocator = allocator, - }; - } - - pub fn push(self: *Stack, val: u32) !void { - if ((self.length + 1) > 
self.capacity) { - var new_buf = try self.allocator.alloc(u32, self.capacity * 2); - @memset(new_buf[0..], 0); - @memcpy(new_buf[0..self.capacity], self.items); - self.allocator.free(self.items); - self.items = new_buf; - } - - self.items[self.length] = val; - self.length += 1; - } - - pub fn pop(self: *Stack) void { - if (self.length == 0) return; - - self.items[self.length - 1] = 0; - self.length -= 1; - } - - pub fn deinit(self: *Stack) void { - self.allocator.free(self.items); - } -}; - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var stack = try Stack.init(allocator, 10); - defer stack.deinit(); - try stack.push(1); - try stack.push(2); - try stack.push(3); - try stack.push(4); - try stack.push(5); - try stack.push(6); - - std.debug.print("Stack len: {d}\n", .{stack.length}); - std.debug.print("Stack capacity: {d}\n", .{stack.capacity}); - - stack.pop(); - std.debug.print("Stack len: {d}\n", .{stack.length}); - stack.pop(); - std.debug.print("Stack len: {d}\n", .{stack.length}); - std.debug.print("Stack state: {any}\n", .{stack.items}); -} +const std = @import("std"); +const Allocator = std.mem.Allocator; + +const Stack = struct { + items: []u32, + capacity: usize, + length: usize, + allocator: Allocator, + + pub fn init(allocator: Allocator, capacity: usize) !Stack { + var buf = try allocator.alloc(u32, capacity); + @memset(buf[0..], 0); + return .{ + .items = buf[0..], + .capacity = capacity, + .length = 0, + .allocator = allocator, + }; + } + + pub fn push(self: *Stack, val: u32) !void { + if ((self.length + 1) > self.capacity) { + var new_buf = try self.allocator.alloc(u32, self.capacity * 2); + @memset(new_buf[0..], 0); + @memcpy(new_buf[0..self.capacity], self.items); + self.allocator.free(self.items); + self.items = new_buf; + } + + self.items[self.length] = val; + self.length += 1; + } + + pub fn pop(self: *Stack) void { + if (self.length == 0) return; + + self.items[self.length - 1] 
= 0; + self.length -= 1; + } + + pub fn deinit(self: *Stack) void { + self.allocator.free(self.items); + } +}; + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var stack = try Stack.init(allocator, 10); + defer stack.deinit(); + try stack.push(1); + try stack.push(2); + try stack.push(3); + try stack.push(4); + try stack.push(5); + try stack.push(6); + + std.debug.print("Stack len: {d}\n", .{stack.length}); + std.debug.print("Stack capacity: {d}\n", .{stack.capacity}); + + stack.pop(); + std.debug.print("Stack len: {d}\n", .{stack.length}); + stack.pop(); + std.debug.print("Stack len: {d}\n", .{stack.length}); + std.debug.print("Stack state: {any}\n", .{stack.items}); +} diff --git a/ZigExamples/data-structures/string_hash.zig b/ZigExamples/data-structures/string_hash.zig index 87a97869..5fb40da7 100644 --- a/ZigExamples/data-structures/string_hash.zig +++ b/ZigExamples/data-structures/string_hash.zig @@ -1,18 +1,18 @@ -const std = @import("std"); - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var ages = std.StringHashMap(u8).init(allocator); - - defer ages.deinit(); - try ages.put("Pedro", 25); - try ages.put("Matheus", 21); - try ages.put("Abgail", 42); - - var it = ages.iterator(); - while (it.next()) |kv| { - std.debug.print("Key: {s} | ", .{kv.key_ptr.*}); - std.debug.print("Age: {d}\n", .{kv.value_ptr.*}); - } -} +const std = @import("std"); + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var ages = std.StringHashMap(u8).init(allocator); + + defer ages.deinit(); + try ages.put("Pedro", 25); + try ages.put("Matheus", 21); + try ages.put("Abgail", 42); + + var it = ages.iterator(); + while (it.next()) |kv| { + std.debug.print("Key: {s} | ", .{kv.key_ptr.*}); + std.debug.print("Age: {d}\n", .{kv.value_ptr.*}); + } +} diff --git 
a/ZigExamples/errors/error_list.zig b/ZigExamples/errors/error_list.zig index 9857e1eb..392db6d7 100644 --- a/ZigExamples/errors/error_list.zig +++ b/ZigExamples/errors/error_list.zig @@ -1,16 +1,16 @@ -const std = @import("std"); -const AllocError = std.heap.Allocator.Error; - -fn print_name() AllocError!void { - const stdout = std.io.getStdOut().writer(); - try stdout.print("My name is Pedro!\n", .{}); - - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - const some_number = try allocator.create(u32); - defer allocator.destroy(some_number); -} - -pub fn main() !void { - try print_name(); -} +const std = @import("std"); +const AllocError = std.heap.Allocator.Error; + +fn print_name() AllocError!void { + const stdout = std.io.getStdOut().writer(); + try stdout.print("My name is Pedro!\n", .{}); + + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + const some_number = try allocator.create(u32); + defer allocator.destroy(some_number); +} + +pub fn main() !void { + try print_name(); +} diff --git a/ZigExamples/file-io/append_to_file.zig b/ZigExamples/file-io/append_to_file.zig index 3a3e1a8f..c99d4c55 100644 --- a/ZigExamples/file-io/append_to_file.zig +++ b/ZigExamples/file-io/append_to_file.zig @@ -1,10 +1,10 @@ -const std = @import("std"); - -pub fn main() !void { - const cwd = std.fs.cwd(); - const file = try cwd.openFile("foo.txt", .{ .mode = .write_only }); - defer file.close(); - try file.seekFromEnd(0); - var fw = file.writer(); - _ = try fw.writeAll("Some random text to write\n"); -} +const std = @import("std"); + +pub fn main() !void { + const cwd = std.fs.cwd(); + const file = try cwd.openFile("foo.txt", .{ .mode = .write_only }); + defer file.close(); + try file.seekFromEnd(0); + var fw = file.writer(); + _ = try fw.writeAll("Some random text to write\n"); +} diff --git a/ZigExamples/file-io/buff_io.zig b/ZigExamples/file-io/buff_io.zig index 691bfb1c..56b72892 100644 --- 
a/ZigExamples/file-io/buff_io.zig +++ b/ZigExamples/file-io/buff_io.zig @@ -1,14 +1,14 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -pub fn main() !void { - var file = try std.fs.cwd().openFile("ZigExamples/file-io/lorem.txt", .{}); - defer file.close(); - var buffered = std.io.bufferedReader(file.reader()); - var reader = buffered.reader(); - - var buffer: [1000]u8 = undefined; - @memset(buffer[0..], 0); - - _ = try reader.readUntilDelimiterOrEof(buffer[0..], '\n'); - try stdout.print("{s}\n", .{buffer}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +pub fn main() !void { + var file = try std.fs.cwd().openFile("ZigExamples/file-io/lorem.txt", .{}); + defer file.close(); + var buffered = std.io.bufferedReader(file.reader()); + var reader = buffered.reader(); + + var buffer: [1000]u8 = undefined; + @memset(buffer[0..], 0); + + _ = try reader.readUntilDelimiterOrEof(buffer[0..], '\n'); + try stdout.print("{s}\n", .{buffer}); +} diff --git a/ZigExamples/file-io/copy_file.zig b/ZigExamples/file-io/copy_file.zig index f4c9fad6..08549e37 100644 --- a/ZigExamples/file-io/copy_file.zig +++ b/ZigExamples/file-io/copy_file.zig @@ -1,6 +1,6 @@ -const std = @import("std"); - -pub fn main() !void { - const cwd = std.fs.cwd(); - try cwd.copyFile("foo.txt", cwd, "ZigExamples/file-io/foo.txt", .{}); -} +const std = @import("std"); + +pub fn main() !void { + const cwd = std.fs.cwd(); + try cwd.copyFile("foo.txt", cwd, "ZigExamples/file-io/foo.txt", .{}); +} diff --git a/ZigExamples/file-io/create_file.zig b/ZigExamples/file-io/create_file.zig index cdcdfe60..d75c8151 100644 --- a/ZigExamples/file-io/create_file.zig +++ b/ZigExamples/file-io/create_file.zig @@ -1,6 +1,6 @@ -const std = @import("std"); -pub fn main() !void { - const cwd = std.fs.cwd(); - const file = try cwd.createFile("foo.txt", .{}); - file.close(); -} +const std = @import("std"); +pub fn main() !void { + const cwd = std.fs.cwd(); + const file = try 
cwd.createFile("foo.txt", .{}); + file.close(); +} diff --git a/ZigExamples/file-io/create_file_and_read.zig b/ZigExamples/file-io/create_file_and_read.zig index 6c29c9ca..0b118039 100644 --- a/ZigExamples/file-io/create_file_and_read.zig +++ b/ZigExamples/file-io/create_file_and_read.zig @@ -1,17 +1,17 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -pub fn main() !void { - const cwd = std.fs.cwd(); - const file = try cwd.createFile("foo.txt", .{ .read = true }); - defer file.close(); - - var fw = file.writer(); - _ = try fw.writeAll("We are going to read this line\n"); - - var buffer: [300]u8 = undefined; - @memset(buffer[0..], 0); - try file.seekTo(0); - var fr = file.reader(); - _ = try fr.readAll(buffer[0..]); - try stdout.print("{s}\n", .{buffer}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +pub fn main() !void { + const cwd = std.fs.cwd(); + const file = try cwd.createFile("foo.txt", .{ .read = true }); + defer file.close(); + + var fw = file.writer(); + _ = try fw.writeAll("We are going to read this line\n"); + + var buffer: [300]u8 = undefined; + @memset(buffer[0..], 0); + try file.seekTo(0); + var fr = file.reader(); + _ = try fr.readAll(buffer[0..]); + try stdout.print("{s}\n", .{buffer}); +} diff --git a/ZigExamples/file-io/create_file_and_write_toit.zig b/ZigExamples/file-io/create_file_and_write_toit.zig index a38079e2..741cb5e4 100644 --- a/ZigExamples/file-io/create_file_and_write_toit.zig +++ b/ZigExamples/file-io/create_file_and_write_toit.zig @@ -1,9 +1,9 @@ -const std = @import("std"); -pub fn main() !void { - const cwd = std.fs.cwd(); - const file = try cwd.createFile("foo.txt", .{}); - defer file.close(); - // Do things with the file ... 
- var fw = file.writer(); - _ = try fw.writeAll("Writing this line to the file\n"); -} +const std = @import("std"); +pub fn main() !void { + const cwd = std.fs.cwd(); + const file = try cwd.createFile("foo.txt", .{}); + defer file.close(); + // Do things with the file ... + var fw = file.writer(); + _ = try fw.writeAll("Writing this line to the file\n"); +} diff --git a/ZigExamples/file-io/delete-dir.zig b/ZigExamples/file-io/delete-dir.zig index bc37fa30..ec847128 100644 --- a/ZigExamples/file-io/delete-dir.zig +++ b/ZigExamples/file-io/delete-dir.zig @@ -1,7 +1,7 @@ -const std = @import("std"); - -pub fn main() !void { - const cwd = std.fs.cwd(); - try cwd.makeDir("src"); - try cwd.deleteDir("src"); -} +const std = @import("std"); + +pub fn main() !void { + const cwd = std.fs.cwd(); + try cwd.makeDir("src"); + try cwd.deleteDir("src"); +} diff --git a/ZigExamples/file-io/delete_file.zig b/ZigExamples/file-io/delete_file.zig index cee31e12..54985dfd 100644 --- a/ZigExamples/file-io/delete_file.zig +++ b/ZigExamples/file-io/delete_file.zig @@ -1,6 +1,6 @@ -const std = @import("std"); - -pub fn main() !void { - const cwd = std.fs.cwd(); - try cwd.deleteFile("foo.txt"); -} +const std = @import("std"); + +pub fn main() !void { + const cwd = std.fs.cwd(); + try cwd.deleteFile("foo.txt"); +} diff --git a/ZigExamples/file-io/iterate.zig b/ZigExamples/file-io/iterate.zig index ffef6c85..bc8ddaa6 100644 --- a/ZigExamples/file-io/iterate.zig +++ b/ZigExamples/file-io/iterate.zig @@ -1,11 +1,11 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); - -pub fn main() !void { - const cwd = std.fs.cwd(); - const dir = try cwd.openDir("ZigExamples/file-io/", .{ .iterate = true }); - var it = dir.iterate(); - while (try it.next()) |entry| { - try stdout.print("File name: {s}\n", .{entry.name}); - } -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); + +pub fn main() !void { + const cwd = std.fs.cwd(); + const dir = try 
cwd.openDir("ZigExamples/file-io/", .{ .iterate = true }); + var it = dir.iterate(); + while (try it.next()) |entry| { + try stdout.print("File name: {s}\n", .{entry.name}); + } +} diff --git a/ZigExamples/file-io/lorem.txt b/ZigExamples/file-io/lorem.txt index b1ebaa32..5c038c7e 100644 --- a/ZigExamples/file-io/lorem.txt +++ b/ZigExamples/file-io/lorem.txt @@ -1 +1 @@ -Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed tincidunt erat sed nulla ornare, nec aliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus, ultrices eleifend porttitor ut, finibus ut tortor. Maecenas sapien justo, finibus tincidunt dictum ac, semper et lectus. Vivamus molestie egestas orci ac viverra. Pellentesque nec arcu facilisis, euismod eros eu, sodales nisl. Ut egestas sagittis arcu, in accumsan sapien rhoncus sit amet. Aenean neque lectus, imperdiet ac lobortis a, ullamcorper sed massa. Nullam porttitor porttitor erat nec dapibus. Ut vel dui nec nulla vulputate molestie eget non nunc. Ut commodo luctus ipsum, in finibus libero feugiat eget. Etiam vel ante at urna tincidunt posuere sit amet ut felis. Maecenas finibus suscipit tristique. Donec viverra non sapien id suscipit. +Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed tincidunt erat sed nulla ornare, nec aliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus, ultrices eleifend porttitor ut, finibus ut tortor. Maecenas sapien justo, finibus tincidunt dictum ac, semper et lectus. Vivamus molestie egestas orci ac viverra. Pellentesque nec arcu facilisis, euismod eros eu, sodales nisl. Ut egestas sagittis arcu, in accumsan sapien rhoncus sit amet. Aenean neque lectus, imperdiet ac lobortis a, ullamcorper sed massa. Nullam porttitor porttitor erat nec dapibus. Ut vel dui nec nulla vulputate molestie eget non nunc. Ut commodo luctus ipsum, in finibus libero feugiat eget. Etiam vel ante at urna tincidunt posuere sit amet ut felis. Maecenas finibus suscipit tristique. Donec viverra non sapien id suscipit. 
diff --git a/ZigExamples/file-io/make-dir.zig b/ZigExamples/file-io/make-dir.zig index db871da0..7b7b4d6d 100644 --- a/ZigExamples/file-io/make-dir.zig +++ b/ZigExamples/file-io/make-dir.zig @@ -1,7 +1,7 @@ -const std = @import("std"); - -pub fn main() !void { - const cwd = std.fs.cwd(); - try cwd.makeDir("src"); - try cwd.makePath("src/decoders/jpg/"); -} +const std = @import("std"); + +pub fn main() !void { + const cwd = std.fs.cwd(); + try cwd.makeDir("src"); + try cwd.makePath("src/decoders/jpg/"); +} diff --git a/ZigExamples/file-io/user_input.zig b/ZigExamples/file-io/user_input.zig index e9a3099f..7c710f09 100644 --- a/ZigExamples/file-io/user_input.zig +++ b/ZigExamples/file-io/user_input.zig @@ -1,10 +1,10 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const stdin = std.io.getStdIn().reader(); -pub fn main() !void { - try stdout.writeAll("Type your name\n"); - var buffer: [20]u8 = undefined; - @memset(buffer[0..], 0); - _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\n'); - try stdout.print("Your name is: {s}\n", .{buffer}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const stdin = std.io.getStdIn().reader(); +pub fn main() !void { + try stdout.writeAll("Type your name\n"); + var buffer: [20]u8 = undefined; + @memset(buffer[0..], 0); + _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\n'); + try stdout.print("Your name is: {s}\n", .{buffer}); +} diff --git a/ZigExamples/image_filter/src/image_filter.zig b/ZigExamples/image_filter/src/image_filter.zig index 5db799f7..f6d237ab 100644 --- a/ZigExamples/image_filter/src/image_filter.zig +++ b/ZigExamples/image_filter/src/image_filter.zig @@ -1,160 +1,160 @@ -const std = @import("std"); -const math = std.math; -const stdout = std.io.getStdOut().writer(); -const stderr = std.io.getStdErr().writer(); -const cmath = @cImport({ - @cInclude("math.h"); -}); -const c = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); 
-}); -const png = @cImport({ - @cInclude("spng.h"); -}); - -const ImageHeader = struct { - width: u32, - height: u32, - bit_depth: u8, - color_type: u8, - compression_method: u8, - filter_method: u8, - interlace_method: u8, -}; -const ImageData = struct { - data: []u8, - decoded_size: u64, - header: ImageHeader, - - pub fn deinit(self: ImageData, allocator: std.mem.Allocator) void { - allocator.free(self.data); - } -}; - -fn get_image_header(ctx: *png.spng_ctx) !ImageHeader { - var image_header: png.spng_ihdr = undefined; - if (png.spng_get_ihdr(ctx, &image_header) != 0) { - return error.CouldNotGetImageHeader; - } - - return ImageHeader{ - .width = image_header.width, - .height = image_header.height, - .bit_depth = image_header.bit_depth, - .color_type = image_header.color_type, - .compression_method = image_header.compression_method, - .filter_method = image_header.filter_method, - .interlace_method = image_header.interlace_method, - }; -} - -fn see_image_properties(image_header: ImageHeader) !void { - try stdout.print("width: {d}, height: {d}, bit depth: {d}\n", .{ image_header.width, image_header.height, image_header.bit_depth }); -} - -fn calc_output_size(ctx: *png.spng_ctx) !u64 { - var output_size: u64 = 0; - const status = png.spng_decoded_image_size(ctx, png.SPNG_FMT_RGBA8, &output_size); - if (status != 0) { - return error.CouldNotCalcOutputSize; - } - return output_size; -} - -fn _read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void { - const status = png.spng_decode_image(ctx, buffer.ptr, buffer.len, png.SPNG_FMT_RGBA8, 0); - if (status != 0) { - return error.CouldNotDecodeImage; - } -} - -pub fn read_png(allocator: std.mem.Allocator, path: []const u8) !ImageData { - const file_descriptor = c.fopen(path.ptr, "rb"); - if (file_descriptor == null) { - return error.CouldNotOpenFile; - } - - const ctx = png.spng_ctx_new(0) orelse unreachable; - defer png.spng_ctx_free(ctx); - _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); - const 
image_header = try get_image_header(ctx); - try see_image_properties(image_header); - - const output_size = try calc_output_size(ctx); - var buffer = try allocator.alloc(u8, output_size); - @memset(buffer[0..], 0); - try _read_data_to_buffer(ctx, buffer[0..]); - - const close_status = c.fclose(file_descriptor); - if (close_status != 0) { - return error.CouldNotCloseFileDescriptor; - } - return ImageData{ - .data = buffer[0..], - .decoded_size = output_size, - .header = image_header, - }; -} - -fn apply_image_filter(image_data: *ImageData) !void { - const len = image_data.data.len; - const red_factor: f16 = 0.2126; - const green_factor: f16 = 0.7152; - const blue_factor: f16 = 0.0722; - var index: u64 = 0; - while (index < (len - 4)) : (index += 4) { - const rf: f16 = @floatFromInt(image_data.data[index]); - const gf: f16 = @floatFromInt(image_data.data[index + 1]); - const bf: f16 = @floatFromInt(image_data.data[index + 2]); - const y_linear: f16 = (rf * red_factor) + (gf * green_factor) + (bf * blue_factor); - image_data.data[index] = @intFromFloat(y_linear); - image_data.data[index + 1] = @intFromFloat(y_linear); - image_data.data[index + 2] = @intFromFloat(y_linear); - } -} - -fn save_png(image_data: *ImageData) !void { - const path = "pedro_pascal_filter.png"; - const file_descriptor = c.fopen(path.ptr, "wb"); - const ctx = png.spng_ctx_new(png.SPNG_CTX_ENCODER) orelse unreachable; - defer png.spng_ctx_free(ctx); - _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); - - // const image_width = image_data.decoded_size / image_data.header.height; - var image_header: png.spng_ihdr = undefined; - image_header.height = image_data.header.height; - image_header.width = image_data.header.width; - image_header.bit_depth = image_data.header.bit_depth; - image_header.color_type = image_data.header.color_type; - image_header.compression_method = image_data.header.compression_method; - image_header.filter_method = image_data.header.filter_method; - 
image_header.interlace_method = image_data.header.interlace_method; - - try stdout.print("Decoded size: {any}\n", .{image_data.decoded_size}); - try stdout.print("Buffer size: {any}\n", .{image_data.data.len}); - - _ = png.spng_set_ihdr(ctx, &image_header); - const encode_status = png.spng_encode_image(ctx, @as([*c]u8, @ptrCast(image_data.data)), image_data.data.len, png.SPNG_FMT_PNG, png.SPNG_ENCODE_FINALIZE); - if (encode_status != 0) { - const status: u32 = @intCast(encode_status); - try stderr.print("Test: {any}\n", .{encode_status == png.SPNG_EFMT}); - try stderr.print("Error code: {d}\n", .{status}); - return error.CouldNotEncodeImage; - } - const close_status = c.fclose(file_descriptor); - if (close_status != 0) { - return error.CouldNotCloseFileDescriptor; - } -} - -pub fn main() !void { - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var image_data = try read_png(allocator, "pedro_pascal.png"); - try apply_image_filter(&image_data); - - try save_png(&image_data); - - image_data.deinit(allocator); -} +const std = @import("std"); +const math = std.math; +const stdout = std.io.getStdOut().writer(); +const stderr = std.io.getStdErr().writer(); +const cmath = @cImport({ + @cInclude("math.h"); +}); +const c = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); +const png = @cImport({ + @cInclude("spng.h"); +}); + +const ImageHeader = struct { + width: u32, + height: u32, + bit_depth: u8, + color_type: u8, + compression_method: u8, + filter_method: u8, + interlace_method: u8, +}; +const ImageData = struct { + data: []u8, + decoded_size: u64, + header: ImageHeader, + + pub fn deinit(self: ImageData, allocator: std.mem.Allocator) void { + allocator.free(self.data); + } +}; + +fn get_image_header(ctx: *png.spng_ctx) !ImageHeader { + var image_header: png.spng_ihdr = undefined; + if (png.spng_get_ihdr(ctx, &image_header) != 0) { + return error.CouldNotGetImageHeader; + } + + return ImageHeader{ 
+ .width = image_header.width, + .height = image_header.height, + .bit_depth = image_header.bit_depth, + .color_type = image_header.color_type, + .compression_method = image_header.compression_method, + .filter_method = image_header.filter_method, + .interlace_method = image_header.interlace_method, + }; +} + +fn see_image_properties(image_header: ImageHeader) !void { + try stdout.print("width: {d}, height: {d}, bit depth: {d}\n", .{ image_header.width, image_header.height, image_header.bit_depth }); +} + +fn calc_output_size(ctx: *png.spng_ctx) !u64 { + var output_size: u64 = 0; + const status = png.spng_decoded_image_size(ctx, png.SPNG_FMT_RGBA8, &output_size); + if (status != 0) { + return error.CouldNotCalcOutputSize; + } + return output_size; +} + +fn _read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void { + const status = png.spng_decode_image(ctx, buffer.ptr, buffer.len, png.SPNG_FMT_RGBA8, 0); + if (status != 0) { + return error.CouldNotDecodeImage; + } +} + +pub fn read_png(allocator: std.mem.Allocator, path: []const u8) !ImageData { + const file_descriptor = c.fopen(path.ptr, "rb"); + if (file_descriptor == null) { + return error.CouldNotOpenFile; + } + + const ctx = png.spng_ctx_new(0) orelse unreachable; + defer png.spng_ctx_free(ctx); + _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); + const image_header = try get_image_header(ctx); + try see_image_properties(image_header); + + const output_size = try calc_output_size(ctx); + var buffer = try allocator.alloc(u8, output_size); + @memset(buffer[0..], 0); + try _read_data_to_buffer(ctx, buffer[0..]); + + const close_status = c.fclose(file_descriptor); + if (close_status != 0) { + return error.CouldNotCloseFileDescriptor; + } + return ImageData{ + .data = buffer[0..], + .decoded_size = output_size, + .header = image_header, + }; +} + +fn apply_image_filter(image_data: *ImageData) !void { + const len = image_data.data.len; + const red_factor: f16 = 0.2126; + const green_factor: f16 = 
0.7152; + const blue_factor: f16 = 0.0722; + var index: u64 = 0; + while (index < (len - 4)) : (index += 4) { + const rf: f16 = @floatFromInt(image_data.data[index]); + const gf: f16 = @floatFromInt(image_data.data[index + 1]); + const bf: f16 = @floatFromInt(image_data.data[index + 2]); + const y_linear: f16 = (rf * red_factor) + (gf * green_factor) + (bf * blue_factor); + image_data.data[index] = @intFromFloat(y_linear); + image_data.data[index + 1] = @intFromFloat(y_linear); + image_data.data[index + 2] = @intFromFloat(y_linear); + } +} + +fn save_png(image_data: *ImageData) !void { + const path = "pedro_pascal_filter.png"; + const file_descriptor = c.fopen(path.ptr, "wb"); + const ctx = png.spng_ctx_new(png.SPNG_CTX_ENCODER) orelse unreachable; + defer png.spng_ctx_free(ctx); + _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); + + // const image_width = image_data.decoded_size / image_data.header.height; + var image_header: png.spng_ihdr = undefined; + image_header.height = image_data.header.height; + image_header.width = image_data.header.width; + image_header.bit_depth = image_data.header.bit_depth; + image_header.color_type = image_data.header.color_type; + image_header.compression_method = image_data.header.compression_method; + image_header.filter_method = image_data.header.filter_method; + image_header.interlace_method = image_data.header.interlace_method; + + try stdout.print("Decoded size: {any}\n", .{image_data.decoded_size}); + try stdout.print("Buffer size: {any}\n", .{image_data.data.len}); + + _ = png.spng_set_ihdr(ctx, &image_header); + const encode_status = png.spng_encode_image(ctx, @as([*c]u8, @ptrCast(image_data.data)), image_data.data.len, png.SPNG_FMT_PNG, png.SPNG_ENCODE_FINALIZE); + if (encode_status != 0) { + const status: u32 = @intCast(encode_status); + try stderr.print("Test: {any}\n", .{encode_status == png.SPNG_EFMT}); + try stderr.print("Error code: {d}\n", .{status}); + return error.CouldNotEncodeImage; + } + const 
close_status = c.fclose(file_descriptor); + if (close_status != 0) { + return error.CouldNotCloseFileDescriptor; + } +} + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var image_data = try read_png(allocator, "pedro_pascal.png"); + try apply_image_filter(&image_data); + + try save_png(&image_data); + + image_data.deinit(allocator); +} diff --git a/ZigExamples/image_filter/src/test.zig b/ZigExamples/image_filter/src/test.zig index d40021f5..9cdbbf09 100644 --- a/ZigExamples/image_filter/src/test.zig +++ b/ZigExamples/image_filter/src/test.zig @@ -1,93 +1,93 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -const c = @cImport({ - @cDefine("_NO_CRT_STDIO_INLINE", "1"); - @cInclude("stdio.h"); -}); -const png = @cImport({ - @cInclude("spng.h"); -}); - -fn get_image_header(ctx: *png.spng_ctx) !png.spng_ihdr { - var image_header: png.spng_ihdr = undefined; - if (png.spng_get_ihdr(ctx, &image_header) != 0) { - return error.CouldNotGetImageHeader; - } - - return image_header; -} - -fn calc_output_size(ctx: *png.spng_ctx) !u64 { - var output_size: u64 = 0; - const status = png.spng_decoded_image_size(ctx, png.SPNG_FMT_RGBA8, &output_size); - if (status != 0) { - return error.CouldNotCalcOutputSize; - } - return output_size; -} - -fn read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void { - const status = png.spng_decode_image(ctx, buffer.ptr, buffer.len, png.SPNG_FMT_RGBA8, 0); - - if (status != 0) { - return error.CouldNotDecodeImage; - } -} - -fn apply_image_filter(buffer: []u8) !void { - const len = buffer.len; - const red_factor: f16 = 0.2126; - const green_factor: f16 = 0.7152; - const blue_factor: f16 = 0.0722; - var index: u64 = 0; - while (index < (len - 4)) : (index += 4) { - const rf: f16 = @floatFromInt(buffer[index]); - const gf: f16 = @floatFromInt(buffer[index + 1]); - const bf: f16 = @floatFromInt(buffer[index + 2]); - const y_linear: f16 = ((rf * 
red_factor) + (gf * green_factor) + (bf * blue_factor)); - buffer[index] = @intFromFloat(y_linear); - buffer[index + 1] = @intFromFloat(y_linear); - buffer[index + 2] = @intFromFloat(y_linear); - } -} - -fn save_png(image_header: *png.spng_ihdr, buffer: []u8) !void { - const path = "pedro_pascal_filter.png"; - const file_descriptor = c.fopen(path.ptr, "wb"); - if (file_descriptor == null) { - return error.CouldNotOpenFile; - } - const ctx = (png.spng_ctx_new(png.SPNG_CTX_ENCODER) orelse unreachable); - defer png.spng_ctx_free(ctx); - _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); - _ = png.spng_set_ihdr(ctx, image_header); - - const encode_status = png.spng_encode_image(ctx, buffer.ptr, buffer.len, png.SPNG_FMT_PNG, png.SPNG_ENCODE_FINALIZE); - if (encode_status != 0) { - return error.CouldNotEncodeImage; - } - if (c.fclose(file_descriptor) != 0) { - return error.CouldNotCloseFileDescriptor; - } -} - -pub fn main() !void { - const path = "pedro_pascal.png"; - const file_descriptor = c.fopen(path, "rb"); - if (file_descriptor == null) { - @panic("Could not open file!"); - } - const ctx = png.spng_ctx_new(0) orelse unreachable; - _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); - - var gpa = std.heap.GeneralPurposeAllocator(.{}){}; - const allocator = gpa.allocator(); - var image_header = try get_image_header(ctx); - const output_size = try calc_output_size(ctx); - var buffer = try allocator.alloc(u8, output_size); - @memset(buffer[0..], 0); - - try read_data_to_buffer(ctx, buffer[0..]); - try apply_image_filter(buffer[0..]); - try save_png(&image_header, buffer[0..]); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const c = @cImport({ + @cDefine("_NO_CRT_STDIO_INLINE", "1"); + @cInclude("stdio.h"); +}); +const png = @cImport({ + @cInclude("spng.h"); +}); + +fn get_image_header(ctx: *png.spng_ctx) !png.spng_ihdr { + var image_header: png.spng_ihdr = undefined; + if (png.spng_get_ihdr(ctx, &image_header) != 0) { + 
return error.CouldNotGetImageHeader; + } + + return image_header; +} + +fn calc_output_size(ctx: *png.spng_ctx) !u64 { + var output_size: u64 = 0; + const status = png.spng_decoded_image_size(ctx, png.SPNG_FMT_RGBA8, &output_size); + if (status != 0) { + return error.CouldNotCalcOutputSize; + } + return output_size; +} + +fn read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void { + const status = png.spng_decode_image(ctx, buffer.ptr, buffer.len, png.SPNG_FMT_RGBA8, 0); + + if (status != 0) { + return error.CouldNotDecodeImage; + } +} + +fn apply_image_filter(buffer: []u8) !void { + const len = buffer.len; + const red_factor: f16 = 0.2126; + const green_factor: f16 = 0.7152; + const blue_factor: f16 = 0.0722; + var index: u64 = 0; + while (index < (len - 4)) : (index += 4) { + const rf: f16 = @floatFromInt(buffer[index]); + const gf: f16 = @floatFromInt(buffer[index + 1]); + const bf: f16 = @floatFromInt(buffer[index + 2]); + const y_linear: f16 = ((rf * red_factor) + (gf * green_factor) + (bf * blue_factor)); + buffer[index] = @intFromFloat(y_linear); + buffer[index + 1] = @intFromFloat(y_linear); + buffer[index + 2] = @intFromFloat(y_linear); + } +} + +fn save_png(image_header: *png.spng_ihdr, buffer: []u8) !void { + const path = "pedro_pascal_filter.png"; + const file_descriptor = c.fopen(path.ptr, "wb"); + if (file_descriptor == null) { + return error.CouldNotOpenFile; + } + const ctx = (png.spng_ctx_new(png.SPNG_CTX_ENCODER) orelse unreachable); + defer png.spng_ctx_free(ctx); + _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); + _ = png.spng_set_ihdr(ctx, image_header); + + const encode_status = png.spng_encode_image(ctx, buffer.ptr, buffer.len, png.SPNG_FMT_PNG, png.SPNG_ENCODE_FINALIZE); + if (encode_status != 0) { + return error.CouldNotEncodeImage; + } + if (c.fclose(file_descriptor) != 0) { + return error.CouldNotCloseFileDescriptor; + } +} + +pub fn main() !void { + const path = "pedro_pascal.png"; + const file_descriptor = c.fopen(path, 
"rb"); + if (file_descriptor == null) { + @panic("Could not open file!"); + } + const ctx = png.spng_ctx_new(0) orelse unreachable; + _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor)); + + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + var image_header = try get_image_header(ctx); + const output_size = try calc_output_size(ctx); + var buffer = try allocator.alloc(u8, output_size); + @memset(buffer[0..], 0); + + try read_data_to_buffer(ctx, buffer[0..]); + try apply_image_filter(buffer[0..]); + try save_png(&image_header, buffer[0..]); +} diff --git a/ZigExamples/threads/cancel_thread.zig b/ZigExamples/threads/cancel_thread.zig new file mode 100644 index 00000000..1b4cd447 --- /dev/null +++ b/ZigExamples/threads/cancel_thread.zig @@ -0,0 +1,28 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +var running = std.atomic.Value(bool).init(true); +var counter: u64 = 0; +fn do_more_work() void { + std.time.sleep(2 * std.time.ns_per_s); +} +fn work() !void { + while (running.load(.monotonic)) { + for (0..10000) |_| { + counter += 1; + } + if (counter < 15000) { + _ = try stdout.write("Time to cancel the thread.\n"); + running.store(false, .monotonic); + } else { + _ = try stdout.write("Time to do more work.\n"); + do_more_work(); + running.store(false, .monotonic); + } + } +} + +pub fn main() !void { + const thread = try Thread.spawn(.{}, work, .{}); + thread.join(); +} diff --git a/ZigExamples/threads/data_race.zig b/ZigExamples/threads/data_race.zig new file mode 100644 index 00000000..9b26adcf --- /dev/null +++ b/ZigExamples/threads/data_race.zig @@ -0,0 +1,20 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; + +// Global counter variable +var counter: usize = 0; +// Function to increment the counter +fn increment() void { + for (0..100000) |_| { + counter += 1; + } +} + +pub fn main() !void { + const thr1 = try 
Thread.spawn(.{}, increment, .{}); + const thr2 = try Thread.spawn(.{}, increment, .{}); + thr1.join(); + thr2.join(); + try stdout.print("Couter value: {d}\n", .{counter}); +} diff --git a/ZigExamples/threads/deadlock.zig b/ZigExamples/threads/deadlock.zig new file mode 100644 index 00000000..6e88161c --- /dev/null +++ b/ZigExamples/threads/deadlock.zig @@ -0,0 +1,29 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +var mut1: std.Thread.Mutex = .{}; +var mut2: std.Thread.Mutex = .{}; +fn do_some_work1() !void { + mut1.lock(); + std.time.sleep(1 * std.time.ns_per_s); + mut2.lock(); + _ = try stdout.write("Doing some work 1\n"); + mut2.unlock(); + mut1.unlock(); +} + +fn do_some_work2() !void { + mut2.lock(); + std.time.sleep(1 * std.time.ns_per_s); + mut1.lock(); + _ = try stdout.write("Doing some work 1\n"); + mut1.unlock(); + mut2.unlock(); +} + +pub fn main() !void { + const thr1 = try Thread.spawn(.{}, do_some_work1, .{}); + const thr2 = try Thread.spawn(.{}, do_some_work2, .{}); + thr1.join(); + thr2.join(); +} diff --git a/ZigExamples/threads/detach.zig b/ZigExamples/threads/detach.zig new file mode 100644 index 00000000..5c2c064d --- /dev/null +++ b/ZigExamples/threads/detach.zig @@ -0,0 +1,13 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +fn print_id(id: *const u8) !void { + try stdout.print("Thread ID: {d}\n", .{id.*}); +} + +pub fn main() !void { + const id1: u8 = 1; + const thread1 = try Thread.spawn(.{}, print_id, .{&id1}); + thread1.detach(); + _ = try stdout.write("Finish main\n"); +} diff --git a/ZigExamples/threads/example1.zig b/ZigExamples/threads/example1.zig new file mode 100644 index 00000000..5c5026b5 --- /dev/null +++ b/ZigExamples/threads/example1.zig @@ -0,0 +1,14 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; + +fn do_some_work() !void { + _ = try stdout.write("Starting 
the work.\n"); + std.time.sleep(100 * std.time.ns_per_ms); + _ = try stdout.write("Finishing the work.\n"); +} + +pub fn main() !void { + const thread = try Thread.spawn(.{}, do_some_work, .{}); + thread.join(); +} diff --git a/ZigExamples/threads/example2.zig b/ZigExamples/threads/example2.zig new file mode 100644 index 00000000..12fa3a0f --- /dev/null +++ b/ZigExamples/threads/example2.zig @@ -0,0 +1,18 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; + +fn do_some_work(thread_id: *const u8) !void { + _ = try stdout.print("Starting thread {d}.\n", .{thread_id.*}); + std.time.sleep(100 * std.time.ns_per_ms); + _ = try stdout.print("Finishing thread {d}.\n", .{thread_id.*}); +} + +pub fn main() !void { + const id1: u8 = 1; + const id2: u8 = 2; + const thread1 = try Thread.spawn(.{}, do_some_work, .{&id1}); + const thread2 = try Thread.spawn(.{}, do_some_work, .{&id2}); + thread1.join(); + thread2.join(); +} diff --git a/ZigExamples/threads/example3.zig b/ZigExamples/threads/example3.zig new file mode 100644 index 00000000..d43f6202 --- /dev/null +++ b/ZigExamples/threads/example3.zig @@ -0,0 +1,21 @@ +// In this example, we are performing the some work +// in a separate thread. However, there is no real garantee that the work +// performed in this thread is going to finish before the execution of the +// main function, simply because we did not called `join()` on the thread. +// The only thing in this program that makes the execution of the thread finish before +// the execution of main() is the `sleep()` call. 
+const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +fn do_some_work() !void { + _ = try stdout.write("Starting the work.\n"); + std.time.sleep(100 * std.time.ns_per_ms); + _ = try stdout.write("Finishing the work.\n"); +} + +pub fn main() !void { + const thread = try Thread.spawn(.{}, do_some_work, .{}); + _ = thread; + + std.time.sleep(2 * std.time.ns_per_s); +} diff --git a/ZigExamples/threads/joining.zig b/ZigExamples/threads/joining.zig new file mode 100644 index 00000000..4cd74e62 --- /dev/null +++ b/ZigExamples/threads/joining.zig @@ -0,0 +1,19 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +fn print_id(id: *const u8) !void { + try stdout.print("Thread ID: {d}\n", .{id.*}); +} + +pub fn main() !void { + const id1: u8 = 1; + const id2: u8 = 2; + const thread1 = try Thread.spawn(.{}, print_id, .{&id1}); + const thread2 = try Thread.spawn(.{}, print_id, .{&id2}); + + _ = try stdout.write("Joining thread 1\n"); + thread1.join(); + std.time.sleep(2 * std.time.ns_per_s); + _ = try stdout.write("Joining thread 2\n"); + thread2.join(); +} diff --git a/ZigExamples/threads/mutex.zig b/ZigExamples/threads/mutex.zig new file mode 100644 index 00000000..f624c178 --- /dev/null +++ b/ZigExamples/threads/mutex.zig @@ -0,0 +1,21 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +const Mutex = std.Thread.Mutex; +var counter: usize = 0; +fn increment(mutex: *Mutex) void { + for (0..100000) |_| { + mutex.lock(); + counter += 1; + mutex.unlock(); + } +} + +pub fn main() !void { + var mutex: Mutex = .{}; + const thr1 = try Thread.spawn(.{}, increment, .{&mutex}); + const thr2 = try Thread.spawn(.{}, increment, .{&mutex}); + thr1.join(); + thr2.join(); + try stdout.print("Couter value: {d}\n", .{counter}); +} diff --git a/ZigExamples/threads/pool.zig b/ZigExamples/threads/pool.zig new file mode 100644 index 
00000000..584d2206 --- /dev/null +++ b/ZigExamples/threads/pool.zig @@ -0,0 +1,13 @@ +const std = @import("std"); +const Pool = std.Thread.Pool; +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + const opt = Pool.Options{ + .n_jobs = 4, + .allocator = allocator, + }; + var pool: Pool = undefined; + _ = try pool.init(opt); + defer pool.deinit(); +} diff --git a/ZigExamples/threads/pool_with_task.zig b/ZigExamples/threads/pool_with_task.zig new file mode 100644 index 00000000..82f97582 --- /dev/null +++ b/ZigExamples/threads/pool_with_task.zig @@ -0,0 +1,24 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Pool = std.Thread.Pool; + +fn print_id(id: *const u8) void { + _ = stdout.print("Thread ID: {d}\n", .{id.*}) catch void; +} + +pub fn main() !void { + var gpa = std.heap.GeneralPurposeAllocator(.{}){}; + const allocator = gpa.allocator(); + const opt = Pool.Options{ + .n_jobs = 4, + .allocator = allocator, + }; + var pool: Pool = undefined; + _ = try pool.init(opt); + defer pool.deinit(); + + const id1: u8 = 1; + const id2: u8 = 2; + try pool.spawn(print_id, .{&id1}); + try pool.spawn(print_id, .{&id2}); +} diff --git a/ZigExamples/threads/rw_lock.zig b/ZigExamples/threads/rw_lock.zig new file mode 100644 index 00000000..e39e509f --- /dev/null +++ b/ZigExamples/threads/rw_lock.zig @@ -0,0 +1,40 @@ +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +const Thread = std.Thread; +const RwLock = std.Thread.RwLock; +var counter: u32 = 0; +var buffer = [4]u32{ 512, 2700, 9921, 112 }; + +fn reader(lock: *RwLock) !void { + while (true) { + lock.lockShared(); + const v: u32 = counter; + try stdout.print("{d}", .{v}); + lock.unlockShared(); + std.time.sleep(2 * std.time.ns_per_s); + } +} + +fn writer(lock: *RwLock) void { + while (true) { + lock.lock(); + counter += 1; + lock.unlock(); + std.time.sleep(2 * std.time.ns_per_s); + } +} + +pub fn main() !void { 
+ const ids = [3]u8{ 1, 2, 3 }; + _ = ids; + var lock: RwLock = .{}; + const thr1 = try Thread.spawn(.{}, reader, .{&lock}); + const thr2 = try Thread.spawn(.{}, reader, .{&lock}); + const thr3 = try Thread.spawn(.{}, reader, .{&lock}); + const wthread = try Thread.spawn(.{}, writer, .{&lock}); + + thr1.join(); + thr2.join(); + thr3.join(); + wthread.join(); +} diff --git a/ZigExamples/zig-basics/function_parameters_immu.zig b/ZigExamples/zig-basics/function_parameters_immu.zig index 04d6819e..ca956d6b 100644 --- a/ZigExamples/zig-basics/function_parameters_immu.zig +++ b/ZigExamples/zig-basics/function_parameters_immu.zig @@ -1,12 +1,12 @@ -const std = @import("std"); -// This code does not compile because we are trying to -// change the value of a function parameter. -fn add2(x: u32) u32 { - x = x + 2; - return x; -} - -pub fn main() !void { - const y = add2(4); - std.debug.print("{d}\n", .{y}); -} +const std = @import("std"); +// This code does not compile because we are trying to +// change the value of a function parameter. 
+fn add2(x: u32) u32 { + x = x + 2; + return x; +} + +pub fn main() !void { + const y = add2(4); + std.debug.print("{d}\n", .{y}); +} diff --git a/ZigExamples/zig-basics/function_parameters_mmu.zig b/ZigExamples/zig-basics/function_parameters_mmu.zig index 67b069ec..6f6f9d87 100644 --- a/ZigExamples/zig-basics/function_parameters_mmu.zig +++ b/ZigExamples/zig-basics/function_parameters_mmu.zig @@ -1,11 +1,11 @@ -const std = @import("std"); -fn add2(x: *u32) void { - const d: u32 = 2; - x.* = x.* + d; -} - -pub fn main() !void { - var x: u32 = 4; - add2(&x); - std.debug.print("{d}\n", .{x}); -} +const std = @import("std"); +fn add2(x: *u32) void { + const d: u32 = 2; + x.* = x.* + d; +} + +pub fn main() !void { + var x: u32 = 4; + add2(&x); + std.debug.print("{d}\n", .{x}); +} diff --git a/ZigExamples/zig-basics/hello_world.zig b/ZigExamples/zig-basics/hello_world.zig index 6d52e3e0..94ee166e 100644 --- a/ZigExamples/zig-basics/hello_world.zig +++ b/ZigExamples/zig-basics/hello_world.zig @@ -1,5 +1,5 @@ -const std = @import("std"); -const stdout = std.io.getStdOut().writer(); -pub fn main() !void { - try stdout.print("Hello World!\n", .{}); -} +const std = @import("std"); +const stdout = std.io.getStdOut().writer(); +pub fn main() !void { + try stdout.print("Hello World!\n", .{}); +} diff --git a/ZigExamples/zig-basics/vec3_struct.zig b/ZigExamples/zig-basics/vec3_struct.zig index 6080870d..a7c6d097 100644 --- a/ZigExamples/zig-basics/vec3_struct.zig +++ b/ZigExamples/zig-basics/vec3_struct.zig @@ -1,37 +1,37 @@ -const std = @import("std"); -const math = std.math; -const Vec3 = struct { - x: f64, - y: f64, - z: f64, - - pub fn distance(self: Vec3, other: Vec3) f64 { - const xd = math.pow(f64, self.x - other.x, 2.0); - const yd = math.pow(f64, self.y - other.y, 2.0); - const zd = math.pow(f64, self.z - other.z, 2.0); - return math.sqrt(xd + yd + zd); - } - - pub fn double(self: *Vec3) void { - self.x = self.x * 2.0; - self.y = self.y * 2.0; - self.z = self.z * 2.0; - 
} -}; - -pub fn main() !void { - var v1 = Vec3{ - .x = 4.2, - .y = 2.4, - .z = 0.9, - }; - const v2 = Vec3{ - .x = 5.1, - .y = 5.6, - .z = 1.6, - }; - - std.debug.print("Distance: {d}\n", .{v1.distance(v2)}); - v1.double(); - std.debug.print("Doubled: {d}\n", .{v1.x}); -} +const std = @import("std"); +const math = std.math; +const Vec3 = struct { + x: f64, + y: f64, + z: f64, + + pub fn distance(self: Vec3, other: Vec3) f64 { + const xd = math.pow(f64, self.x - other.x, 2.0); + const yd = math.pow(f64, self.y - other.y, 2.0); + const zd = math.pow(f64, self.z - other.z, 2.0); + return math.sqrt(xd + yd + zd); + } + + pub fn double(self: *Vec3) void { + self.x = self.x * 2.0; + self.y = self.y * 2.0; + self.z = self.z * 2.0; + } +}; + +pub fn main() !void { + var v1 = Vec3{ + .x = 4.2, + .y = 2.4, + .z = 0.9, + }; + const v2 = Vec3{ + .x = 5.1, + .y = 5.6, + .z = 1.6, + }; + + std.debug.print("Distance: {d}\n", .{v1.distance(v2)}); + v1.double(); + std.debug.print("Doubled: {d}\n", .{v1.x}); +} diff --git a/_freeze/Chapters/01-memory/execute-results/html.json b/_freeze/Chapters/01-memory/execute-results/html.json index 59991719..749c6684 100644 --- a/_freeze/Chapters/01-memory/execute-results/html.json +++ b/_freeze/Chapters/01-memory/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Memory and Allocators\n\n\nIn this chapter, we will talk about memory. How does Zig controls memory? What\ncommom tools are used? Are there any important aspect that makes memory\ndifferent/special in Zig? You will find the answers here.\n\nEvery computer needs memory. Is by having memory that computers can temporarily store\nthe values/results of your calculations. 
Without memory, programming languages would never have\nconcepts such as \"variables\", or \"objects\", to store the values that you generate.\n\n\n## Memory spaces\n\nEvery object that you create in your Zig source code needs to be stored somewhere,\nin your computer's memory. Depending on where and how you define your object, Zig\nwill use a different \"memory space\", or a different\ntype of memory to store this object.\n\nEach type of memory normally serves for different purposes.\nIn Zig, there are 3 types of memory (or 3 different memory spaces) that we care about. They are:\n\n- Global data register (or the \"global data section\");\n- Stack;\n- Heap;\n\n\n### Compile-time known versus runtime known {#sec-compile-time}\n\nOne strategy that Zig uses to decide where it will store each object that you declare, is by looking\nat the value of this particular object. More specifically, by investigating if this value is\nknown at \"compile-time\" or at \"runtime\".\n\nWhen you write a program in Zig, the values of some of the objects that you write in your program are *known\nat compile time*. Meaning that, when you compile your Zig source code, during the compilation process,\nthe `zig` compiler can figure it out what is the exact value of a particular object\nthat exists in your source code.\nKnowing the length (or the size) of each object is also important. So the length (or the size) of each object that you write in your program is,\nin some cases, *known at compile time*.\n\nThe `zig` compiler cares more about knowing the length (or the size) of a particular object\n, than to know it's actual value. But, if the `zig` compiler knows the value of the object, then, it\nautomatically knows the size of this object. 
Because it can simply calculate the\nsize of the object by looking at the size of the value.\n\nTherefore, the priority for the `zig` compiler is to discover the size of each object in your source code.\nIf the value of the object in question is known at compile-time, then, the `zig` compiler\nautomatically knows the size/length of this object. But if the value of this object is not\nknown at compile-time, then, the size of this object is only known at compile-time if,\nand only if, the type of this object have a known fixed size.\n\nIn order to a type have a known fixed size, this type must have data members whose size is fixed.\nIf this type includes, for example, a variable sized array in it, then, this type do not have a known\nfixed size. Because this array can have any size at runtime\n(i.e. it can be an array of 2 elements, or 50 elements, or 1 thousand elements, etc.).\n\nFor example, a string object, which internally is an array of constant u8 values (`[]const u8`)\nhave a variable size. It can be a string object with 100 or 500 characters in it. If we do not\nknow at compile-time, which exact string will be stored inside this string object, then, we cannot calculate\nthe size of this string object at compile-time. So, any type, or any struct declaration that you make, that\nincludes a string data member that do not have an explicit fixed size, makes this type, or this\nnew struct that you are declaring, a type that do not have a known fixed size at compile-time.\n\nIn contrast, if the type or this struct that you are declaring, includes a data member that is an array,\nbut this array have a known fixed size, like `[60]u8` (which declares an array of 60 `u8` values), then,\nthis type, or, this struct that you are declaring, becomes a type with a known fixed size at compile-time.\nAnd because of that, in this case, the `zig` compiler do not need to known at compile-time the exact value of\nany object of this type. 
Since the compiler can find the necessary size to store this object by\nlooking at the size of it's type.\n\n\nLet's look at an example. In the source code below, we have two constant objects (`name` and `array`) declared.\nBecause the values of these particular objects are written down, in the source code itself (`\"Pedro\"`\nand the number sequence from 1 to 4), the `zig` compiler can easily discover the values of these constant\nobjects (`name` and `array`) during the compilation process.\nThis is what \"known at compile time\" means. It refers to any object that you have in your Zig source code\nwhose value can be identified at compile time.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name = \"Pedro\";\nconst array = [_]u8{1, 2, 3, 4};\n_ = name; _ = array;\n\nfn input_length(input: []const u8) usize {\n const n = input.len;\n return n;\n}\n```\n:::\n\n\n\n\nThe other side of the spectrum are objects whose values are not known at compile time.\nFunction arguments are a classic example of this. Because the value of each function\nargument depends on the value that you assign to this particular argument,\nwhen you call the function.\n\nFor example, the function `input_length()` contains an argument named `input`, which is an array of constant `u8` integers (`[]const u8`).\nIs impossible to know at compile time the value of this particular argument. And it also is impossible to know the size/length\nof this particular argument. Because it is an array that do not have a fixed size specified explicitly in the argument type annotation.\n\nSo, we know that this `input` argument will be an array of `u8` integers. But we do not know at compile-time, it's value, and neither his size.\nThis information is known only at runtime, which is the period of time when you program is executed.\nAs a consequence, the value of the expression `input.len` is also known only at runtime.\nThis is an intrinsic characteristic of any function. 
Just remember that the value of function arguments is usually not \"compile-time known\".\n\nHowever, as I mentioned earlier, what really matters to the compiler is to know the size of the object\nat compile-time, and not necessarily it's value. So, although we don't know the value of the object `n`, which is the result of the expression\n`input.len`, at compile-time, we do know it's size. Because the expression `input.len` always return a value of type `usize`,\nand the type `usize` have a known fixed size.\n\n\n\n### Global data register\n\nThe global data register is a specific section of the executable of your Zig program, that is responsible\nfor storing any value that is known at compile time.\n\nEvery constant object whose value is known at compile time that you declare in your source code,\nis stored in the global data register. Also, every literal value that you write in your source code,\nsuch as the string `\"this is a string\"`, or the integer `10`, or a boolean value such as `true`,\nis also stored in the global data register.\n\nHonestly, you don't need to care much about this memory space. Because you can't control it,\nyou can't deliberately access it or use it for your own purposes.\nAlso, this memory space does not affect the logic of your program.\nIt simply exists in your program.\n\n\n### Stack vs Heap\n\nIf you are familiar with system's programming, or just low-level programming in general, you\nprobably have heard of the \"duel\" between Stack vs Heap. These are two different types of memory,\nor different memory spaces, which are both available in Zig.\n\nThese two types of memory don't actually duel with\neach other. This is a commom mistake that beginners have, when seeing \"x vs y\" styles of\ntabloid headlines. 
These two types of memory are actually complementary to each other.\nSo, in almost every Zig program that you ever write, you will likely use a combination of both.\nI will describe each memory space in detail over the next sections. But for now, I just want to\nestablish the main difference between these two types of memory.\n\nIn essence, the stack memory is normally used to store values whose length is fixed and known\nat compile time. In contrast, the heap memory is a *dynamic* type of memory space, meaning that, it is\nused to store values whose length might grow during the execution (runtime) of your program [@jenny2022].\n\nLengths that grow during runtime are intrinsically associated with \"runtime known\" type of values.\nIn other words, if you have an object whose length might grow during runtime, then, the length\nof this object becomes not known at compile time. If the length is not known at compile-time,\nthe value of this object also becomes not known at compile-time.\nThese types of objects should be stored in the heap memory space, which is\na dynamic memory space, which can grow or shrink to fit the size of your objects.\n\n\n\n### Stack {#sec-stack}\n\nThe stack is a type of memory that uses the power of the *stack data structure*, hence the name. \nA \"stack\" is a type of *data structure* that uses a \"last in, first out\" (LIFO) mechanism to store the values\nyou give to it. 
I imagine you are familiar with this data structure.\nBut, if you are not, the [Wikipedia page](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))[^wiki-stack]\n, or, the [Geeks For Geeks page](https://www.geeksforgeeks.org/stack-data-structure/)[^geek-stack] are both\nexcellent and easy resources to fully understand how this data structure works.\n\n[^wiki-stack]: \n[^geek-stack]: \n\nSo, the stack memory space is a type of memory that stores values using a stack data structure.\nIt adds and removes values from the memory by following a \"last in, first out\" (LIFO) principle.\n\nEvery time you make a function call in Zig, an amount of space in the stack is\nreserved for this particular function call [@jenny2022; @zigdocs].\nThe value of each function argument given to the function in this function call is stored in this\nstack space. Also, every local object that you declare inside the function scope is\nusually stored in this same stack space.\n\n\nLooking at the example below, the object `result` is a local object declared inside the scope of the `add()`\nfunction. Because of that, this object is stored inside the stack space reserved for the `add()` function.\nThe `r` object (which is declared outside of the `add()` function scope) is also stored in the stack.\nBut since it is declared in the \"outer\" scope, this object is stored in the\nstack space that belongs to this outer scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst r = add(5, 27);\n_ = r;\n\nfn add(x: u8, y: u8) u8 {\n const result = x + y;\n return result;\n}\n```\n:::\n\n\n\n\n\nSo, any object that you declare inside the scope of a function is always stored inside\nthe space that was reserved for that particular function in the stack memory. 
This\nalso counts for any object declared inside the scope of your `main()` function for example.\nAs you would expect, in this case, they\nare stored inside the stack space reserved for the `main()` function.\n\nOne very important detail about the stack memory is that **it frees itself automatically**.\nThis is very important, remember that. When objects are stored in the stack memory,\nyou don't have the work (or the responsibility) of freeing/destroying these objects.\nBecause they will be automatically destroyed once the stack space is freed at the end of the function scope.\n\nSo, once the function call returns (or ends, if you prefer to call it this way)\nthe space that was reserved in the stack is destroyed, and all of the objects that were in that space go away with it.\nThis mechanism exists because this space, and the objects within it, are not necessary anymore,\nsince the function \"finished its business\".\nUsing the `add()` function that we exposed above as an example, it means that the object `result` is automatically\ndestroyed once the function returns.\n\n::: {.callout-important}\nLocal objects that are stored in the stack space of a function are automatically\nfreed/destroyed at the end of the function scope.\n:::\n\n\nThis same logic applies to any other special structure in Zig that has its own scope by surrounding\nit with curly braces (`{}`).\nFor loops, while loops, if else statements, etc. For example, if you declare any local\nobject in the scope of a for loop, this local object is accessible only within the scope\nof this particular for loop. 
Because once the scope of this for loop ends, the space in the stack\nreserved for this for loop is freed.\nThe example below demonstrates this idea.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This does not compile successfully!\nconst a = [_]u8{0, 1, 2, 3, 4};\nfor (0..a.len) |i| {\n const index = i;\n _ = index;\n}\n// Trying to use an object that was\n// declared in the for loop scope,\n// and that does not exist anymore.\nstd.debug.print(\"{d}\\n\", index);\n```\n:::\n\n\n\n\n\n\nOne important consequence of this mechanism is that, once the function returns, you can no longer access any memory\naddress that was inside the space in the stack reserved for this particular function. Because this space was\ndestroyed. This means that, if this local object is stored in the stack,\nyou cannot make a function that **returns a pointer to this object**.\n\nThink about that for a second. If all local objects in the stack are destroyed at the end of the function scope, why\nwould you even consider returning a pointer to one of these objects? This pointer is at best,\ninvalid, or, more likely, \"undefined\".\n\nIn conclusion, it is totally fine to write a function that returns the local object\nitself as result, because then, you return the value of that object as the result.\nBut, if this local object is stored in the stack, you should never write a function\nthat returns a pointer to this local object. Because the memory address pointed by the pointer\nno longer exists.\n\n\nSo, using again the `add()` function as an example, if you rewrite this function so that it\nreturns a pointer to the local object `result`, the `zig` compiler will actually compile\nyour program, with no warnings or errors. At first glance, it looks like this is good code\nthat works as expected. 
But this is a lie!\n\nIf you try to take a look at the value inside of the `r` object,\nor, if you try to use this `r` object in another expression\nor function call, then, you would have undefined behaviour, and major\nbugs in your program [@zigdocs, see \"Lifetime and Ownership\"[^life] and \"Undefined Behaviour\"[^undef] sections].\n\n[^life]: \n[^undef]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This code compiles successfully. But it has\n// undefined behaviour. Never do this!!!\n\n// The `r` object is undefined!\nconst r = add(5, 27);\n_ = r;\n\nfn add(x: u8, y: u8) *const u8 {\n const result = x + y;\n return &result;\n}\n```\n:::\n\n\n\n\nThis \"invalid pointer to stack variable\" problem is well known across many programming language communities.\nIf you try to do the same thing, for example, in a C or C++ program (i.e. returning an address to\na local object stored in the stack), you would also get undefined behaviour\nin the program.\n\n::: {.callout-important}\nIf a local object in your function is stored in the stack, you should never\nreturn a pointer to this local object from the function, because\nthis pointer will always become undefined after the function returns, since the stack space of the function\nis destroyed at the end of its scope.\n:::\n\nBut what if you really need to use this local object in some way after your function returns?\nHow can you do this? The answer is: \"in the same way you would if this were a C or C++ program: by returning\nan address to an object stored in the heap\". The heap memory has a much more flexible lifecycle,\nand allows you to get a valid pointer to an object created by a function that has already returned\nfrom its scope.\n\n\n### Heap {#sec-heap}\n\nOne important limitation of the stack is that only objects whose length/size is known at compile-time can be\nstored in it. In contrast, the heap is a much more dynamic\n(and flexible) type of memory. 
It is the perfect type of memory to use\nfor objects whose size/length might grow during the execution of your program.\n\nVirtually any application that behaves as a server is a classic use case of the heap.\nAn HTTP server, an SSH server, a DNS server, an LSP server, ... any type of server.\nIn summary, a server is a type of application that runs for long periods of time,\nand that serves (or \"deals with\") any incoming request that reaches this particular server.\n\nThe heap is a good choice for this type of system, mainly because the server does not know upfront\nhow many requests it will receive from users while it is active. It could be one single request,\nor 5 thousand requests, or it could also be zero requests.\nThe server needs to have the ability to allocate and manage its memory according to how many requests it receives.\n\nAnother key difference between the stack and the heap is that the heap is a type\nof memory that you, the programmer, have complete control over. This makes the heap a\nmore flexible type of memory, but it also makes it harder to work with, because you,\nthe programmer, are responsible for managing everything related to it, including where the memory is allocated,\nhow much memory is allocated, and where this memory is freed.\n\n> Unlike stack memory, heap memory is allocated explicitly by programmers and it won’t be deallocated until it is explicitly freed [@jenny2022].\n\nTo store an object in the heap, you, the programmer, need to explicitly tell Zig to do so,\nby using an allocator to allocate some space in the heap. At @sec-allocators, I will present how you can use allocators to allocate memory\nin Zig.\n\n::: {.callout-important}\nAll memory you allocate in the heap needs to be explicitly freed by you, the programmer.\n:::\n\nThe majority of allocators in Zig do allocate memory on the heap. But some exceptions to this rule are\n`ArenaAllocator()` and `FixedBufferAllocator()`. 
The `ArenaAllocator()` is a special\ntype of allocator that works in conjunction with a second type of allocator.\nOn the other hand, the `FixedBufferAllocator()` is an allocator that works based on\nbuffer objects created on the stack. This means that the `FixedBufferAllocator()` makes\nallocations only on the stack.\n\n\n\n\n### Summary\n\nAfter discussing all of these boring details, we can quickly recap what we learned.\nIn summary, the Zig compiler will use the following rules to decide where each\nobject you declare is stored:\n\n1. every literal value (such as `\"this is a string\"`, `10`, or `true`) is stored in the global data section.\n\n1. every constant object (`const`) whose value **is known at compile-time** is also stored in the global data section.\n\n1. every object (constant or not) whose length/size **is known at compile time** is stored in the stack space for the current scope.\n\n1. if an object is created with the method `alloc()` or `create()` of an allocator object, this object is stored in the memory space used by this particular allocator object. Most of the allocators available in Zig use the heap memory, so this object is likely stored in the heap (`FixedBufferAllocator()` is an exception to that).\n\n1. the heap can only be accessed through allocators. If your object was not created through the `alloc()` or `create()` methods of an allocator object, then it is almost certainly not stored in the heap.\n\n\n## Allocators {#sec-allocators}\n\nOne key aspect of Zig is that there are \"no hidden memory allocations\" in Zig.\nWhat that really means is that \"no allocations happen behind your back in the standard library\" [@zigguide].\n\nHidden allocations are a known problem, especially in C++. 
In C++, there are some operators that do allocate\nmemory behind the scenes, and there is no way for you to know that until you actually read the\nsource code of these operators and find the memory allocation calls.\nMany programmers find this behaviour annoying and hard to keep track of.\n\nBut, in Zig, if a function, an operator, or anything from the standard library\nneeds to allocate some memory during its execution, then this function/operator needs to receive (as input) an allocator\nprovided by the user, to actually be able to allocate the memory it needs.\n\nThis creates a clear distinction between functions that \"do not\" and those that \"actually do\"\nallocate memory. Just look at the arguments of the function.\nIf a function, or operator, has an allocator object as one of its inputs/arguments, then you know\nthat this function/operator may allocate some memory during its execution.\n\nAn example is the `allocPrint()` function from the Zig standard library. With this function, you can\nwrite a new string using format specifiers. So, this function is, for example, very similar to the function `sprintf()` in C.\nIn order to write such a new string, the `allocPrint()` function needs to allocate some memory to store the\noutput string.\n\nThat is why the first argument of this function is an allocator object that you, the user/programmer, give\nas input to the function. In the example below, I am using the `GeneralPurposeAllocator()` as my allocator\nobject. 
But I could easily use any other type of allocator object from the Zig standard library.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nconst name = \"Pedro\";\nconst output = try std.fmt.allocPrint(\n allocator,\n \"Hello {s}!!!\",\n .{name}\n);\n// Free the string allocated by allocPrint().\ndefer allocator.free(output);\ntry stdout.print(\"{s}\\n\", .{output});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello Pedro!!!\n```\n\n\n:::\n:::\n\n\n\n\n\nYou get a lot of control\nover where and how much memory this function can allocate, because it is you,\nthe user/programmer, that provides the allocator for the function to use.\nThis makes \"total control\" over memory management easier to achieve in Zig.\n\n### What are allocators?\n\nAllocators in Zig are objects that you can use to allocate memory for your program.\nThey are similar to the memory allocating functions in C, like `malloc()` and `calloc()`.\nSo, if you need to use more memory than you initially have, during the execution of your program, you can simply ask\nfor more memory using an allocator.\n\nZig offers different types of allocators, and they are usually available through the `std.heap` module of\nthe standard library. So, just import the Zig standard library into your Zig module (with `@import(\"std\")`), and you can start\nusing these allocators in your code.\n\nFurthermore, every allocator object is built on top of the `Allocator` interface in Zig. This\nmeans that every allocator object you find in Zig must have the methods `alloc()`,\n`create()`, `free()` and `destroy()`. So, you can change the type of allocator you are using,\nbut you don't need to change the function calls to the methods that do the memory allocation\n(and the free memory operations) for your program.\n\n### Why you need an allocator?\n\nAs we described at @sec-stack, every time you make a function call in Zig,\na space in the stack is reserved for this function call. 
But the stack\nhas a key limitation, which is: every object stored in the stack has a\nknown fixed length.\n\nBut in reality, there are two very common instances where this \"fixed length limitation\" of the stack is a deal breaker:\n\n1. the objects that you create inside your function might grow in size during the execution of the function.\n\n2. sometimes, it is impossible to know upfront how many inputs you will receive, or how big this input will be.\n\nAlso, there is another instance where you might want to use an allocator, which is when you want to write a function that returns a pointer\nto a local object. As I described at @sec-stack, you cannot do that if this local object is stored in the\nstack. However, if this object is stored in the heap, then you can return a pointer to this object at the\nend of the function, because you (the programmer) control the lifetime of any heap memory that you allocate. You decide\nwhen this memory gets destroyed/freed.\n\nThese are common situations that the stack is not good for.\nThat is why you need a different memory management strategy to\nstore these objects inside your function. You need to use\na memory type that can grow together with your objects, or whose\nlifetime you can control.\nThe heap fits this description.\n\nAllocating memory on the heap is commonly known as dynamic memory management. As the objects you create grow in size\nduring the execution of your program, you grow the amount of memory\nyou have by allocating more memory in the heap to store these objects. 
\nAnd you do that in Zig by using an allocator object.\n\n\n### The different types of allocators\n\n\nAt the moment of the writing of this book, in Zig, we have 6 different\nallocators available in the standard library:\n\n- `GeneralPurposeAllocator()`.\n- `page_allocator()`.\n- `FixedBufferAllocator()` and `ThreadSafeFixedBufferAllocator()`.\n- `ArenaAllocator()`.\n- `c_allocator()` (requires you to link to libc).\n\n\nEach allocator has its own perks and limitations. All allocators, except `FixedBufferAllocator()` and `ArenaAllocator()`,\nare allocators that use the heap memory. So any memory that you allocate with\nthese allocators will be placed in the heap.\n\n### General-purpose allocators\n\nThe `GeneralPurposeAllocator()`, as the name suggests, is a \"general purpose\" allocator. You can use it for every type\nof task. In the example below, I'm allocating enough space to store a single integer in the object `some_number`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const some_number = try allocator.create(u32);\n defer allocator.destroy(some_number);\n\n some_number.* = @as(u32, 45);\n}\n```\n:::\n\n\n\n\n\nWhile the general-purpose allocator is useful, you might instead want to use the `c_allocator()`, which is an alias to the C standard allocator `malloc()`. So, yes, you can use\n`malloc()` in Zig if you want to. Just use the `c_allocator()` from the Zig standard library. However,\nif you do use `c_allocator()`, you must link to Libc when compiling your source code with the\n`zig` compiler, by including the flag `-lc` in your compilation process.\nIf you do not link your source code to Libc, Zig will not be able to find the\n`malloc()` implementation in your system.\n\n### Page allocator\n\nThe `page_allocator()` is an allocator that allocates full pages of memory in the heap. 
In other words,\nevery time you allocate memory with `page_allocator()`, a full page of memory in the heap is allocated,\ninstead of just a small piece of it.\n\nThe size of this page depends on the system you are using.\nMost systems use a page size of 4KB in the heap, so that is the amount of memory that is normally\nallocated in each call by `page_allocator()`. That is why `page_allocator()` is considered a\nfast, but also \"wasteful\" allocator in Zig: it allocates a big amount of memory\nin each call, and you most likely will not need that much memory in your program.\n\n### Buffer allocators\n\nThe `FixedBufferAllocator()` and `ThreadSafeFixedBufferAllocator()` are allocator objects that\nwork with a fixed-size buffer that is stored in the stack. So these two allocators only allocate\nmemory in the stack. This also means that, in order to use these allocators, you must first\ncreate a buffer object, and then give this buffer as an input to these allocators.\n\nIn the example below, I am creating a `buffer` object that is 10 elements long.\nNotice that I give this `buffer` object to the `FixedBufferAllocator()` constructor.\nNow, because this `buffer` object is 10 elements long, this means that I am limited to this space.\nI cannot allocate more than 10 elements with this allocator object. If I try to\nallocate more than that, the `alloc()` method will return an `OutOfMemory` error value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar buffer: [10]u8 = undefined;\nfor (0..buffer.len) |i| {\n buffer[i] = 0; // Initialize to zero\n}\n\nvar fba = std.heap.FixedBufferAllocator.init(&buffer);\nconst allocator = fba.allocator();\nconst input = try allocator.alloc(u8, 5);\ndefer allocator.free(input);\n```\n:::\n\n\n\n\n\n### Arena allocator {#sec-arena-allocator}\n\nThe `ArenaAllocator()` is an allocator object that takes a child allocator as input. 
The idea behind the `ArenaAllocator()` in Zig\nis similar to the concept of \"arenas\" in the programming language Go[^go-arena]. It is an allocator object that allows you\nto allocate memory as many times as you want, but free all of it only once.\nIn other words, if you have, for example, called the method `alloc()` of an `ArenaAllocator()` object 5 times, you can\nfree all the memory you allocated over these 5 calls at once, by simply calling the `deinit()` method of the same `ArenaAllocator()` object.\n\n[^go-arena]: \n\nIf you give, for example, a `GeneralPurposeAllocator()` object as input to the `ArenaAllocator()` constructor, like in the example below, then the allocations\nyou perform with `alloc()` will actually be made with the underlying `GeneralPurposeAllocator()` object that was passed.\nSo, with an arena allocator, any new memory you ask for is allocated by the child allocator. The only thing that an arena allocator\nreally does is help you free all the memory you allocated over multiple calls with just a single command. In the example\nbelow, I called `alloc()` 3 times. So, if I had not used an arena allocator, I would need to call\n`free()` 3 times to free all the allocated memory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nvar aa = std.heap.ArenaAllocator.init(gpa.allocator());\ndefer aa.deinit();\nconst allocator = aa.allocator();\n\nconst in1 = try allocator.alloc(u8, 5);\nconst in2 = try allocator.alloc(u8, 10);\nconst in3 = try allocator.alloc(u8, 15);\n_ = in1; _ = in2; _ = in3;\n```\n:::\n\n\n\n\n\n\n### The `alloc()` and `free()` methods\n\nIn the code example below, we are accessing the `stdin`, which is\nthe standard input channel, to receive an input from the\nuser. We read the input given by the user with the `readUntilDelimiterOrEof()`\nmethod.\n\nNow, after reading the input of the user, we need to store this input somewhere in\nour program. That is why I use an allocator in this example. 
I use it to allocate some\namount of memory to store this input given by the user. More specifically, the method `alloc()`\nof the allocator object is used to allocate an array capable of storing 50 `u8` values.\n\nNotice that this `alloc()` method receives two inputs. The first one is a type.\nThis defines what type of values the allocated array will store. In the example\nbelow, we are allocating an array of unsigned 8-bit integers (`u8`). But\nyou can create an array to store any type of value you want. Next, in the second argument, we\ndefine the size of the allocated array, by specifying how many elements\nthis array will contain. In the case below, we are allocating an array of 50 elements.\n\nAt @sec-zig-strings we described that strings in Zig are simply arrays of characters.\nEach character is represented by a `u8` value. So, this means that the array that\nwas allocated in the object `input` is capable of storing a string that is\n50 characters long.\n\nSo, in essence, the expression `var input: [50]u8 = undefined` would create\nan array for 50 `u8` values in the stack of the current scope. 
But, you\ncan allocate the same array in the heap by using the expression `var input = try allocator.alloc(u8, 50)`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdin = std.io.getStdIn();\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var input = try allocator.alloc(u8, 50);\n defer allocator.free(input);\n for (0..input.len) |i| {\n input[i] = 0; // initialize all fields to zero.\n }\n // read user input\n const input_reader = stdin.reader();\n _ = try input_reader.readUntilDelimiterOrEof(\n input,\n '\\n'\n );\n std.debug.print(\"{s}\\n\", .{input});\n}\n```\n:::\n\n\n\n\nAlso, notice that in this example, we use the `defer` keyword (which I described at @sec-defer) to run a small\npiece of code at the end of the current scope, which is the expression `allocator.free(input)`.\nWhen you execute this expression, the allocator will free the memory that it allocated\nfor the `input` object.\n\nWe have talked about this at @sec-heap. You **should always** explicitly free any memory that you allocate\nusing an allocator! You do that by using the `free()` method of the same allocator object you\nused to allocate this memory. The `defer` keyword is used in this example only to help us execute\nthis free operation at the end of the current scope.\n\n\n### The `create()` and `destroy()` methods\n\nWith the `alloc()` and `free()` methods, you can allocate memory to store multiple elements\nat once. In other words, with these methods, we always allocate an array to store multiple elements at once.\nBut what if you need enough space to store just a single item? Should you\nallocate an array of a single element through `alloc()`?\n\nThe answer is no! 
In this case,\nyou should use the `create()` method of the allocator object.\nEvery allocator object offers the `create()` and `destroy()` methods,\nwhich are used to allocate and free memory for a single item, respectively.\n\nSo, in essence, if you want to allocate memory to store an array of elements, you\nshould use `alloc()` and `free()`. But if you need to store just a single item,\nthen, the `create()` and `destroy()` methods are ideal for you.\n\nIn the example below, I'm defining a struct to represent a user of some sort.\nIt could be a user for a game, or a software to manage resources, it doesn't matter.\nNotice that I use the `create()` method this time, to store a single `User` object\nin the program. Also notice that I use the `destroy()` method to free the memory\nused by this object at the end of the scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst User = struct {\n id: usize,\n name: []const u8,\n\n pub fn init(id: usize, name: []const u8) User {\n return .{ .id = id, .name = name };\n }\n};\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const user = try allocator.create(User);\n defer allocator.destroy(user);\n\n user.* = User.init(0, \"Pedro\");\n}\n```\n:::\n", - "supporting": [ - "01-memory_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/01-zig-weird/execute-results/html.json b/_freeze/Chapters/01-zig-weird/execute-results/html.json index 42832303..d9cc8a5a 100644 --- a/_freeze/Chapters/01-zig-weird/execute-results/html.json +++ b/_freeze/Chapters/01-zig-weird/execute-results/html.json @@ -2,7 +2,7 @@ "hash": "27b634b16b3c3fc42493ea4fabf15dd4", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n\n# Introducing Zig\n\nIn this chapter, I want to introduce you to the world of Zig.\nDespite 
it's rapidly growing over the last years, Zig is, still, a very young language^[New programming languages in general, take years and years to be developed.].\nAs a consequence, it's world is still very wild and to be explored.\nThis book is my attempt to help you on your personal journey for\nunderstanding and exploring the exciting world of Zig.\n\nI assume you have previous experience with some programming\nlanguage in this book, not necessarily with a low-level one.\nSo, if you have experience with Python, or Javascript, for example, is fine.\nBut, if you do have experience with low-level languages, such as C, C++, or\nRust, you will probably learn faster throughout this book.\n\n\n\n## What is Zig?\n\nZig is a modern, low-level, and general-purpose programming language. Some programmers interpret\nZig as the \"modern C language\". It is a simple language like C, but with some\nmodern features.\n\nIn the author's personal interpretation, Zig is tightly connected with \"less is more\".\nInstead of trying to become a modern language by adding more and more features,\nmany of the core improvements that Zig brings to the\ntable are actually about removing annoying and evil behaviours/features from C and C++.\nIn other words, Zig tries to be better by simplifying the language, and by having more consistent and robust behaviour.\nAs a result, analyzing, writing and debugging applications become much easier and simpler in Zig, than it is in C or C++.\n\nThis philosophy becomes clear with the following phrase from the official website of Zig:\n\n> \"Focus on debugging your application rather than debugging your programming language knowledge\".\n\nThis phrase is specially true for C++ programmers. Because C++ is a gigantic language,\nwith tons of features, and also, there are lots of different \"flavors of C++\". These elements\nare what makes C++ so much complex and hard to learn. 
Zig tries to go in the opposite direction.\nZig is a very simple language, more closely related to other simple languages such as C and Go.\n\nThe phrase above is still important for C programmers too. Because, even C being a simple\nlanguage, it is still hard sometimes to read and understand C code. For example, pre-processor macros in\nC are an evil source of confusion. They really makes it hard sometimes to debug\nC programs. Because macros are essentially a second language embedded in C that obscures\nyour C code. With macros, you are no longer 100% sure about which pieces\nof code are being sent to the compiler. It obscures the actual source code that you wrote.\n\nYou don't have macros in Zig. In Zig, the code you write, is the actual code that get's compiled by the compiler.\nYou don't have evil features that obscures you code.\nYou also don't have hidden control flow happening behind the scenes. And, you also\ndon't have functions or operators from the standard library that make\nhidden memory allocations behind your back.\n\nBy being a simpler language, Zig becomes much more clear and easier to read/write,\nbut at the same time, it also achieves a much more robust state, with more consistent\nbehaviour in edge situations. 
Once again, less is more.\n\n\n## Hello world in Zig\n\nWe begin our journey in Zig by creating a small \"Hello World\" program.\nTo start a new Zig project in your computer, you simply call the `init` command\nfrom the `zig` compiler.\nJust create a new directory in your computer, then, init a new Zig project\ninside this directory, like this:\n\n```bash\nmkdir hello_world\ncd hello_world\nzig init\n```\n\n```\ninfo: created build.zig\ninfo: created build.zig.zon\ninfo: created src/main.zig\ninfo: created src/root.zig\ninfo: see `zig build --help` for a menu of options\n```\n\n### Understanding the project files {#sec-project-files}\n\nAfter you run the `init` command from the `zig` compiler, some new files\nare created inside of your current directory. First, a \"source\" (`src`) directory\nis created, containing two files, `main.zig` and `root.zig`. Each `.zig` file\nis a separate Zig module, which is simply a text file that contains some Zig code.\n\n\nThe `main.zig` file for example, contains a `main()` function, which represents\nthe entrypoint of your program. It is where the execution of your program begins.\nAs you would expect from a C, C++, Rust or Go,\nto build an executabe program in Zig, you also need to declare a `main()` function in your module.\nSo, the `main.zig` module represents an executable program written in Zig.\n\nOn the other side, the `root.zig` module does not contain a `main()` function. Because\nit represents a library written in Zig. 
Libraries are different than executables.\nThey don't need to have an entrypoint to work.\nSo, you can choose which file (`main.zig` or `root.zig`) you want to follow depending on which type\nof project (executable or library) you want to develop.\n\n```bash\ntree .\n```\n\n```\n.\n├── build.zig\n├── build.zig.zon\n└── src\n ├── main.zig\n └── root.zig\n\n1 directory, 4 files\n```\n\n\nNow, in addition to the source directory, two other files were created in our working directory:\n`build.zig` and `build.zig.zon`. The first file (`build.zig`) represents a build script written in Zig.\nThis script is executed when you call the `build` command from the `zig` compiler.\nIn other words, this file contain Zig code that executes the necessary steps to build the entire project.\n\nIn general, low-level languages normally use a compiler to build your\nsource code into binary executables or binary libraries.\nNevertheless, this process of compiling your source code and building\nbinary executables or binary libraries from it, became a real challenge\nin the programming world, once the projects became bigger and bigger.\nAs a result, programmers created \"build systems\", which are a second set of tools designed to make this process\nof compiling and building complex projects, easier.\n\nExamples of build systems are CMake, GNU Make, GNU Autoconf and Ninja,\nwhich are used to build complex C and C++ projects.\nWith these systems, you can write scripts, which are called \"build scripts\".\nThey simply are scripts that describes the necessary steps to compile/build\nyour project.\n\nHowever, these are separate tools, that do not\nbelong to C/C++ compilers, like `gcc` or `clang`.\nAs a result, in C/C++ projects, you have not only to install and\nmanage your C/C++ compilers, but you also have to install and manage\nthese build systems separately.\n\nBut instead of using a separate build system, in Zig, we use the\nZig language itself to write build scripts.\nIn other words, Zig 
contains a native build system in it. And\nwe can use this build system to write small scripts in Zig,\nwhich describes the necessary steps to build/compile our Zig project[^zig-build-system].\nSo, everything you need to build a complex Zig project is the\n`zig` compiler, and nothing more.\n\n[^zig-build-system]: .\n\n\nNow that we described this topic in more depth, let's focus\non the second generated file (`build.zig.zon`), which is the Zig package manager configuration file,\nwhere you can list and manage the dependencies of your project. Yes, Zig has\na package manager (like `pip` in Python, `cargo` in Rust, or `npm` in Javascript) called Zon,\nand this `build.zig.zon` file is similar to the `package.json` file\nin Javascript projects, or, the `Pipfile` file in Python projects, or the `Cargo.toml` file in Rust projects.\n\n\n### Looking at the `root.zig` file {#sec-root-file}\n\nLet's take a look at the `root.zig` file, and start to analyze some of the\nsyntax of Zig.\nThe first thing that you might notice, is that every line of code\nthat have an expression in it, ends with a semicolon character (`;`). This is\nsimilar syntax to other languages such as C, C++ and Rust,\nwhich have the same rule.\n\nAlso, notice the `@import()` call at the first line. We use this built-in function\nto import functionality from other Zig modules into our current module.\nIn other words, the `@import()` function works similarly to the `#include` pre-processor\nin C or C++, or, to the `import` statement in Python or Javascript code.\nIn this example, we are importing the `std` module,\nwhich gives you access to the Zig standard library.\n\nIn this `root.zig` file, we can also see how assignments (i.e. creating new objects)\nare made in Zig. You can create a new object in Zig by using the following syntax\n`(const|var) name = value;`. In the example below, we are creating two constant\nobjects (`std` and `testing`). 
At @sec-assignments we talk more about objects in general.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst testing = std.testing;\n\nexport fn add(a: i32, b: i32) i32 {\n return a + b;\n}\n```\n:::\n\n\n\n\n\nFunctions in Zig are declared similarly to functions in Rust, using the `fn` keyword. In the example above,\nwe are declaring a function called `add()`, which have two arguments named `a` and `b`, and returns\na integer number (`i32`) as result.\n\nMaybe Zig is not exactly a strongly-typed language, because you do not need\nnecessarily to specify the type of every single object you create across your source code.\nBut you do have to explicitly specify the type of every function argument, and also,\nthe return type of every function you create in Zig. So, at least in function declarations,\nZig is a strongly-typed language.\n\nWe specify the type of an object or a function argument in Zig, by\nusing a colon character (`:`) followed by the type after the name of this object/function argument.\nWith the expressions `a: i32` and `b: i32`, we know that, both `a` and `b` arguments have type `i32`,\nwhich is a signed 32 bit integer. In this part,\nthe syntax in Zig is identical to the syntax in Rust, which also specifies types by\nusing the colon character.\n\nLastly, we have the return type of the function at the end of the line, before we open\nthe curly braces to start writing the function's body, which, in the example above is\nagain a signed 32 bit integer (`i32`) value. This specific part is different than it is in Rust.\nBecause in Rust, the return type of a function is specified after an arrow (`->`).\nWhile in Zig, we simply declare the return type directly after the parentheses with the function arguments.\n\nWe also have an `export` keyword before the function declaration. This keyword\nis similar to the `extern` keyword in C. 
It exposes the function\nto make it available in the library API.\n\nIn other words, if you have a project where you are currently building\na library for other people to use, you need to expose your functions\nso that they are available in the library's API, so that users can use it.\nIf we removed the `export` keyword from the `add()` function declaration,\nthen, this function would be no longer exposed in the library object built\nby the `zig` compiler.\n\n\nHaving that in mind, the keyword `export` is a keyword used in libraries written in Zig.\nSo, if you are not currently writing a library in your project, then, you do not need to\ncare about this keyword.\n\n\n### Looking at the `main.zig` file {#sec-main-file}\n\nNow that we have learned a lot about Zig's syntax from the `root.zig` file,\nlet's take a look at the `main.zig` file.\nA lot of the elements we saw in `root.zig` are also present in `main.zig`.\nBut we have some other elements that we did not have seen yet, so let's dive in.\n\nFirst, look at the return type of the `main()` function in this file.\nWe can see a small change. Now, the return\ntype of the function (`void`) is accompanied by an exclamation mark (`!`).\nWhat this exclamation mark is telling us, is that this `main()` function\nmight also return an error.\n\nSo, in this example, the `main()` function can either return `void`, or, return an error.\nThis is an interesting feature of Zig. 
If you write a function, and, something inside of\nthe body of this function might return an error, then, you are forced to:\n\n- either add the exclamation mark to the return type of the function, to make it clear that\nthis function might return an error.\n- or explicitly handle this error that might occur inside the function, to make sure that,\nif this error does happen, you are prepared, and your function will no longer return an error\nbecause you handled the error inside your function.\n\nIn most programming languages, we normally handle (or deal with) an error through\na *try catch* pattern, and in Zig, this is no different. But, if we look at the `main()` function\nbelow, you can see that we do have a `try` keyword in the 5th line. But we do not have a `catch` keyword\nin this code.\n\nThis means that, we are using the keyword `try` to execute code that might return an error,\nwhich is the `stdout.print()` expression. But because we do not have a `catch` keyword in this line,\nwe are not treating (or dealing with) this error.\nSo, if this expression does return an error, we are not catching and solving this error in any way.\nThat is why the exclamation mark was added to the return type of the function.\n\nSo, in essence, the `try` keyword executes the expression `stdout.print()`. If this expression\nreturns a valid value, then, the `try` keyword does essentially nothing. It simply passes this value forward. But, if the expression does\nreturn an error, then, the `try` keyword will unwrap and return this error from the function, and also print its\nstack trace to `stderr`.\n\nThis might sound weird to you, if you come from a high-level language. Because in\nhigh-level languages, such as Python, if an error occurs somewhere, this error is automatically\nreturned and the execution of your program will automatically stop, even if you don't want\nto stop the execution. 
You are obligated to face the error.\n\nBut if you come from a low-level language, then, maybe, this idea does not sound so weird or distant to you.\nBecause in C for example, functions normally don't raise errors, and they normally don't stop the execution.\nIn C, error handling\nis done by constantly checking the return value of the function. So, you run the function,\nand then, you use an if statement to check if the function returned a value that is valid,\nor, if it returned an error. If an error was returned from the function, then, the if statement\nwill execute some code that fixes this error.\n\nSo, C programmers do need to write a lot of if statements to\nconstantly check for errors around their code. And because of that, this simple feature from Zig, might be\nextraordinary for them. Because this `try` keyword can automatically unwrap the error,\nand warn you about this error, and let you deal with it, without any extra work from the programmer.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n    const stdout = std.io.getStdOut().writer();\n    try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n:::\n\n\n\n\n\nNow, another thing that you might have noticed in this code example, is that\nthe `main()` function is marked with the `pub` keyword. This keyword means\n\"public\". 
It marks the `main()` function as a *public function* from this module.\n\nIn other words, every function that you declare in your Zig module is, by default, a private (or \"static\")\nfunction that belongs to this Zig module, and can only be used (or called) from within this same module,\nunless you explicitly mark this function as a public function with the `pub` keyword.\nThis means that the `pub` keyword in Zig does essentially the opposite of what the `static` keyword\ndoes in C/C++.\n\nBy making a function \"public\", you allow other Zig modules to access and call this function,\nand use it for their own purposes.\nAll these other Zig modules need to do is to import your module with the `@import()`\nbuilt-in function. Then, they get access to all public functions that are present in\nyour Zig module.\n\n\n### Compiling your source code {#sec-compile-code}\n\nYou can compile your Zig modules into a binary executable by running the `build-exe` command\nfrom the `zig` compiler. You simply list all the Zig modules that you want to build after\nthe `build-exe` command, separated by spaces. In the example below, we are compiling the module `main.zig`.\n\n```bash\nzig build-exe src/main.zig\n```\n\nSince we are building an executable, the `zig` compiler will look for a `main()` function\ndeclared in any of the files that you list after the `build-exe` command. If\nthe compiler does not find a `main()` function declared somewhere, a\ncompilation error will be raised, warning about this mistake.\n\nThe `zig` compiler also offers the `build-lib` and `build-obj` commands, which work\nthe exact same way as the `build-exe` command. 
The only difference is that, they compile your\nZig modules into a portable C ABI library, or, into object files, respectively.\n\nIn the case of the `build-exe` command, a binary executable file is created by the `zig`\ncompiler in the root directory of your project.\nIf we take a look now at the contents of our current directory, with a simple `ls` command, we can\nsee the binary file called `main` that was created by the compiler.\n\n```bash\nls\n```\n\n```\nbuild.zig build.zig.zon main src\n```\n\nIf I execute this binary executable, I get the \"Hello World\" message in the terminal,\nas we expected.\n\n```bash\n./main\n```\n\n```\nHello, world!\n```\n\n\n### Compile and execute at the same time {#sec-compile-run-code}\n\nIn the previous section, I presented the `zig build-exe` command, which\ncompiles Zig modules into an executable file. However, this means that,\nin order to execute the executable file, we have to run two different commands.\nFirst, the `zig build-exe` command, and then, we call the executable file\ncreated by the compiler.\n\nBut what if we wanted to perform these two steps,\nall at once, in a single command? We can do that by using the `zig run`\ncommand.\n\n```bash\nzig run src/main.zig\n```\n\n```\nHello, world!\n```\n\n### Compiling the entire project {#sec-compile-project}\n\nJust as I described at @sec-project-files, as our project grows in size and\ncomplexity, we usually prefer to organize the compilation and build process\nof the project into a build script, using some sort of \"build system\".\n\nIn other words, as our project grows in size and complexity,\nthe `build-exe`, `build-lib` and `build-obj` commands become\nharder to use directly. Because then, we start to list\nmore and more modules at the same time. We also\nstart to add built-in compilation flags to customize the\nbuild process for our needs, etc. 
It becomes a lot of work\nto write the necessary commands by hand.\n\nIn C/C++ projects, programmers normally opt to use CMake, Ninja, `Makefile` or `configure` scripts\nto organize this process. However, in Zig, we have a native build system in the language itself.\nSo, we can write build scripts in Zig to compile and build Zig projects. Then, all we\nneed to do, is to call the `zig build` command to build our project.\n\nSo, when you execute the `zig build` command, the `zig` compiler will search\nfor a Zig module named `build.zig` inside your current directory, which\nshould be your build script, containing the necessary code to compile and\nbuild your project. If the compiler does find this `build.zig` file in your directory,\nthen, the compiler will essentially execute a `zig run` command\nover this `build.zig` file, to compile and execute this build\nscript, which in turn, will compile and build your entire project.\n\n\n```bash\nzig build\n```\n\n\nAfter you execute this \"build project\" command, a `zig-out` directory\nis created in the root of your project directory, where you can find\nthe binary executables and libraries created from your Zig modules\naccording to the build commands that you specified at `build.zig`.\nWe will talk more about the build system in Zig later in this book.\n\nIn the example below, I'm executing the binary executable\nnamed `hello_world` that was generated by the compiler after the\n`zig build` command.\n\n```bash\n./zig-out/bin/hello_world\n```\n\n```\nHello, world!\n```\n\n\n\n## How to learn Zig?\n\nWhat are the best strategies to learn Zig? 
\nFirst of all, of course this book will help you a lot on your journey through Zig.\nBut you will also need some extra resources if you want to be really good at Zig.\n\nAs a first tip, you can join a community with Zig programmers to get some help\nwhen you need it:\n\n- Reddit forum: ;\n- Ziggit community: ;\n- Discord, Slack, Telegram, and others: ;\n\nNow, one of the best ways to learn Zig is to simply read Zig code. Try\nto read Zig code often, and things will become more clear.\nA C/C++ programmer would also probably give you this same tip.\nBecause this strategy really works!\n\nNow, where can you find Zig code to read?\nI personally think that, the best way of reading Zig code is to read the source code of the\nZig Standard Library. The Zig Standard Library is available at the [`lib/std` folder](https://github.com/ziglang/zig/tree/master/lib/std)[^zig-lib-std] on\nthe official GitHub repository of Zig. Access this folder, and start exploring the Zig modules.\n\nAlso, a great alternative is to read code from other large Zig\ncodebases, such as:\n\n1. the [JavaScript runtime Bun](https://github.com/oven-sh/bun)[^bunjs].\n1. the [game engine Mach](https://github.com/hexops/mach)[^mach].\n1. a [LLama 2 LLM model implementation in Zig](https://github.com/cgbur/llama2.zig/tree/main)[^ll2].\n1. the [financial transactions database `tigerbeetle`](https://github.com/tigerbeetle/tigerbeetle)[^tiger].\n1. the [command-line arguments parser `zig-clap`](https://github.com/Hejsil/zig-clap)[^clap].\n1. the [UI framework `capy`](https://github.com/capy-ui/capy)[^capy].\n1. the [Language Server Protocol implementation for Zig, `zls`](https://github.com/zigtools/zls)[^zls].\n1. 
the [event-loop library `libxev`](https://github.com/mitchellh/libxev)[^xev].\n\n[^xev]: \n[^zls]: \n[^capy]: \n[^clap]: \n[^tiger]: \n[^ll2]: \n[^mach]: \n[^bunjs]: .\n\nAll these assets are available on GitHub,\nand this is great, because we can use the GitHub search bar to our advantage,\nto find Zig code that fits our description.\nFor example, you can always include `lang:Zig` in the GitHub search bar when you\nare searching for a particular pattern. This will limit the search to only Zig modules.\n\n[^zig-lib-std]: \n\nAlso, a great alternative is to consult online resources and documentation.\nHere is a quick list of resources that I personally use from time to time to learn\nmore about the language each day:\n\n- Zig Language Reference: ;\n- Zig Standard Library Reference: ;\n- Zig Guide: ;\n- Karl Seguin Blog: ;\n- Zig News: ;\n- Read the code written by one of the Zig core team members: ;\n- Some livecoding sessions are transmitted in the Zig Showtime YouTube Channel: ;\n\n\nAnother great strategy to learn Zig, or honestly, to learn any language you want,\nis to practice it by solving exercises. For example, there is a famous repository\nin the Zig community called [Ziglings](https://codeberg.org/ziglings/exercises/)[^ziglings],\nwhich contains more than 100 small exercises that you can solve. It is a repository of\ntiny programs written in Zig that are currently broken, and your responsibility is to\nfix these programs, and make them work again.\n\n[^ziglings]: .\n\nA famous tech YouTuber known as *The Primeagen* also posted some videos (on YouTube)\nwhere he solves these exercises from Ziglings. 
The first video is named\n[\"Trying Zig Part 1\"](https://www.youtube.com/watch?v=OPuztQfM3Fg&t=2524s&ab_channel=TheVimeagen)[^prime1].\n\n[^prime1]: .\n\nAnother great alternative, is to solve the [Advent of Code exercises](https://adventofcode.com/)[^advent-code].\nThere are people that already took the time to learn and solve the exercises, and they posted\ntheir solutions on GitHub as well, so, in case you need some resource to compare while solving\nthe exercises, you can look at these two repositories:\n\n- ;\n- ;\n\n[^advent-code]: \n\n\n\n\n\n\n## Creating new objects in Zig (i.e. identifiers) {#sec-assignments}\n\nLet's talk more about objects in Zig. Readers that have past experience\nwith other programming languages might know this concept through\na different name, such as: \"variable\" or \"identifier\". In this book, I choose\nto use the term \"object\" to refer to this concept.\n\nTo create a new object (or a new \"identifier\") in Zig, we use\nthe keywords `const` or `var`. These keywords specify if the object\nthat you are creating is mutable or not.\nIf you use `const`, then the object you are\ncreating is a constant (or immutable) object, which means that once you declare this object, you\ncan no longer change the value stored inside this object.\n\nOn the other hand, if you use `var`, then, you are creating a variable (or mutable) object.\nYou can change the value of this object as many times as you want. Using the\nkeyword `var` in Zig is similar to using the keywords `let mut` in Rust.\n\n### Constant objects vs variable objects\n\nIn the code example below, we are creating a new constant object called `age`.\nThis object stores a number representing the age of someone. However, this code example\ndoes not compile successfully. 
Because on the next line of code, we are trying to change the value\nof the object `age` to 25.\n\nThe `zig` compiler detects that we are trying to change\nthe value of an object/identifier that is constant, and because of that,\nthe compiler will raise a compilation error, warning us about the mistake.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 24;\n// The line below is not valid!\nage = 25;\n```\n:::\n\n\n\n\n\n```\nt.zig:10:5: error: cannot assign to constant\n age = 25;\n ~~^~~\n```\n\nIn contrast, if you use `var`, then, the object created is a variable object.\nWith `var` you can declare this object in your source code, and then,\nchange the value of this object as many times as you want over future points\nin your source code.\n\nSo, using the same code example shown above, if I change the declaration of the\n`age` object to use the `var` keyword, then, the program gets compiled successfully.\nBecause now, the `zig` compiler detects that we are changing the value of an\nobject that allows this behaviour, because it is a \"variable object\".\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = 24;\nage = 25;\n```\n:::\n\n\n\n\n\n\n### Declaring without an initial value\n\nBy default, when you declare a new object in Zig, you must give it\nan initial value. In other words, this means\nthat we have to declare, and, at the same time, initialize every object we\ncreate in our source code.\n\nOn the other hand, you can, in fact, declare a new object in your source code,\nand not give it an explicit value. 
But we need to use a special keyword for that,\nwhich is the `undefined` keyword.\n\nIt is important to emphasize that, you should avoid using `undefined` as much as possible.\nBecause when you use this keyword, you leave your object uninitialized, and, as a consequence,\nif for some reason, your code uses this object while it is uninitialized, then, you will definitely\nhave undefined behaviour and major bugs in your program.\n\nIn the example below, I'm declaring the `age` object again. But this time,\nI do not give it an initial value. The variable is only initialized at\nthe second line of code, where I store the number 25 in this object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = undefined;\nage = 25;\n```\n:::\n\n\n\n\n\nHaving these points in mind, just remember that you should avoid using `undefined` in your code as much as possible.\nAlways declare and initialize your objects. Because this gives you much more safety in your program.\nBut in case you really need to declare an object without initializing it... the\n`undefined` keyword is the way to do it in Zig.\n\n\n### There is no such thing as unused objects\n\nEvery object (being constant or variable) that you declare in Zig **must be used in some way**. You can give this object\nto a function call, as a function argument, or, you can use it in another expression\nto calculate the value of another object, or, you can call a method that belongs to this\nparticular object. \n\nIt doesn't matter in which way you use it. As long as you use it.\nIf you try to break this rule, i.e. if you try to declare an object, but not use it,\nthe `zig` compiler will not compile your Zig source code, and it will issue an error\nmessage warning that you have unused objects in your code.\n\nLet's demonstrate this with an example. In the source code below, we declare a constant object\ncalled `age`. 
If you try to compile a simple Zig program with this line of code,\nthe compiler will return an error as demonstrated below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 15;\n```\n:::\n\n\n\n\n\n```\nt.zig:4:11: error: unused local constant\n const age = 15;\n ^~~\n```\n\nEvery time you declare a new object in Zig, you have two choices:\n\n1. you either use the value of this object;\n2. or you explicitly discard the value of the object;\n\nTo explicitly discard the value of any object (constant or variable), all you need to do is to assign\nthis object to a special character in Zig, which is the underscore (`_`).\nWhen you assign an object to an underscore, like in the example below, the `zig` compiler will automatically\ndiscard the value of this particular object.\n\nYou can see in the example below that, this time, the compiler did not\ncomplain about any \"unused constant\", and successfully compiled our source code.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It compiles!\nconst age = 15;\n_ = age;\n```\n:::\n\n\n\n\n\nNow, remember, every time you assign a particular object to the underscore, this object\nis essentially destroyed. It is discarded by the compiler. This means that you can no longer\nuse this object further in your code. It doesn't exist anymore.\n\nSo if you try to use the constant `age` in the example below, after we discarded it, you\nwill get a loud error message from the compiler (talking about a \"pointless discard\")\nwarning you about this mistake.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It does not compile.\nconst age = 15;\n_ = age;\n// Using a discarded value!\nstd.debug.print(\"{d}\\n\", .{age + 2});\n```\n:::\n\n\n\n\n\n```\nt.zig:7:5: error: pointless discard\n of local constant\n```\n\n\nThis same rule applies to variable objects. Every variable object must also be used in\nsome way. 
And if you assign a variable object to the underscore,\nthis object also gets discarded, and you can no longer use this object.\n\n\n\n### You must mutate every variable object\n\nEvery variable object that you create in your source code must be mutated at some point.\nIn other words, if you declare an object as a variable\nobject, with the keyword `var`, and you do not change the value of this object\nat some point in the future, the `zig` compiler will detect this,\nand it will raise an error warning you about this mistake.\n\nThe concept behind this is that every object you create in Zig should be preferably a\nconstant object, unless you really need an object whose value will\nchange during the execution of your program.\n\nSo, if I try to declare a variable object such as `where_i_live` below,\nand I do not change the value of this object in some way,\nthe `zig` compiler raises an error message with the phrase \"variable is never mutated\".\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar where_i_live = \"Belo Horizonte\";\n_ = where_i_live;\n```\n:::\n\n\n\n\n\n```\nt.zig:7:5: error: local variable is never mutated\nt.zig:7:5: note: consider using 'const'\n```\n\n## Primitive Data Types {#sec-primitive-data-types}\n\nZig has many different primitive data types available for you to use.\nYou can see the full list of available data types at the official\n[Language Reference page](https://ziglang.org/documentation/master/#Primitive-Types)[^lang-data-types].\n\n[^lang-data-types]: .\n\nBut here is a quick list:\n\n- Unsigned integers: `u8`, 8-bit integer; `u16`, 16-bit integer; `u32`, 32-bit integer; `u64`, 64-bit integer; `u128`, 128-bit integer.\n- Signed integers: `i8`, 8-bit integer; `i16`, 16-bit integer; `i32`, 32-bit integer; `i64`, 64-bit integer; `i128`, 128-bit integer.\n- Floating point numbers: `f16`, 16-bit floating point; `f32`, 32-bit floating point; `f64`, 64-bit floating point; `f128`, 128-bit floating point.\n- Boolean: `bool`, represents true or 
false values.\n- C ABI compatible types: `c_long`, `c_char`, `c_short`, `c_ushort`, `c_int`, `c_uint`, and many others.\n- Pointer sized integers: `isize` and `usize`.\n\n\n\n\n\n\n\n## Arrays {#sec-arrays}\n\nYou create arrays in Zig by using a syntax that resembles the C syntax.\nFirst, you specify the size of the array (i.e. the number of elements that will be stored in the array)\nyou want to create inside a pair of brackets.\n\nThen, you specify the data type of the elements that will be stored inside this array.\nAll elements present in an array in Zig must have the same data type. For example, you cannot mix elements\nof type `f32` with elements of type `i32` in the same array.\n\nAfter that, you simply list the values that you want to store in this array inside\na pair of curly braces.\nIn the example below, I am creating two constant objects that contain different arrays.\nThe first object contains an array of 4 integer values, while the second object contains\nan array of 3 floating point values.\n\nNow, you should notice that in the object `ls`, I am\nnot explicitly specifying the size of the array inside of the brackets. Instead\nof using a literal value (like the value 4 that I used in the `ns` object), I am\nusing the special character underscore (`_`). 
This syntax tells the `zig` compiler\nto fill this field with the number of elements listed inside of the curly braces.\nSo, this syntax `[_]` is for lazy (or smart) programmers who leave the job of\ncounting how many elements there are in the curly braces for the compiler.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst ls = [_]f64{432.1, 87.2, 900.05};\n_ = ns; _ = ls;\n```\n:::\n\n\n\n\n\nIt is worth noting that these are static arrays, meaning that\nthey cannot grow in size.\nOnce you declare your array, you cannot change the size of it.\nThis is very common in low level languages.\nBecause low level languages normally want to give you (the programmer) full control over memory,\nand the way in which arrays are expanded is tightly related to\nmemory management.\n\n\n### Selecting elements of the array\n\nOne very common activity is to select specific portions of an array\nyou have in your source code.\nIn Zig, you can select a specific element from your\narray, by simply providing the index of this particular\nelement inside brackets after the object name.\nIn the example below, I am selecting the third element from the\n`ns` array. Notice that Zig is a \"zero-index\" based language,\nlike C, C++, Rust, Python, and many other languages.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\ntry stdout.print(\"{d}\\n\", .{ ns[2] });\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n12\n```\n\n\n:::\n:::\n\n\n\n\n\nIn contrast, you can also select specific slices (or sections) of your array, by using a\nrange selector. 
Some programmers also call these selectors \"slice selectors\",\nand they also exist in Rust, and have the exact same syntax as in Zig.\nAnyway, a range selector is a special expression in Zig that defines\na range of indexes, and it has the syntax `start..end`.\n\nIn the example below, at the second line of code,\nthe `sl` object stores a slice (or a portion) of the\n`ns` array. More precisely, the elements at index 1 and 2\nin the `ns` array. \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\n_ = sl;\n```\n:::\n\n\n\n\n\nWhen you use the `start..end` syntax,\nthe \"end tail\" of the range selector is non-inclusive,\nmeaning that, the index at the end is not included in the range that is\nselected from the array.\nTherefore, the range `start..end` covers the indexes from `start` up to `end - 1`.\n\nYou can for example, create a slice that goes from the first to the\nlast elements of the array, by using the `ar[0..ar.len]` syntax.\nIn other words, it is a slice that\naccesses all elements in the array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [4]u8{48, 24, 12, 6};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\n\nYou can also use the syntax `start..` in your range selector,\nwhich tells the `zig` compiler to select the portion of the array\nthat begins at the `start` index and goes until the last element of the array.\nIn the example below, we are selecting the range from index 1\nuntil the end of the array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..];\n_ = sl;\n```\n:::\n\n\n\n\n\n\n### More on slices\n\nAs we discussed before, in Zig, you can select specific portions of an existing\narray. 
This is called *slicing* in Zig [@zigguide], because when you select a portion\nof an array, you are creating a slice object from that array.\n\nA slice object is essentially a pointer object accompanied by a length number.\nThe pointer object points to the first element in the slice, and the\nlength number tells the `zig` compiler how many elements there are in this slice.\n\n> Slices can be thought of as a pair of `[*]T` (the pointer to the data) and a `usize` (the element count) [@zigguide].\n\nThrough the pointer contained inside the slice you can access the elements (or values)\nthat are inside this range (or portion) that you selected from the original array.\nBut the length number (which you can access through the `len` property of your slice object)\nis the really big improvement (over C arrays for example) that Zig brings to the table here.\n\nBecause with this length number\nthe `zig` compiler can easily check if you are trying to access an index that is out of the bounds of this particular slice,\nor, if you are causing any buffer overflow problems. In the example below,\nwe access the `len` property of the slice `sl`, which tells us that this slice\nhas 2 elements in it.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\ntry stdout.print(\"{d}\\n\", .{sl.len});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n2\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Array operators\n\nThere are two array operators available in Zig that are very useful.\nThe array concatenation operator (`++`), and the array multiplication operator (`**`). 
As the name suggests,\nthese are array operators.\n\nOne important detail about these two operators is that they work\nonly when both operands have a size (or \"length\") that is compile-time known.\nWe are going to talk more about\nthe differences between \"compile-time known\" and \"runtime known\" at @sec-compile-time.\nBut for now, keep this information in mind, that you cannot use these operators in every situation.\n\nIn summary, the `++` operator creates a new array that is the concatenation\nof both arrays provided as operands. So, the expression `a ++ b` produces\na new array which contains all the elements from arrays `a` and `b`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst b = [_]u8{4,5};\nconst c = a ++ b;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 4, 5 }\n```\n\n\n:::\n:::\n\n\n\n\n\nThis `++` operator is particularly useful to concatenate strings together.\nStrings in Zig are described in depth at @sec-zig-strings. In summary, a string object in Zig\nis essentially an array of bytes. So, you can use this array concatenation operator\nto effectively concatenate strings together.\n\nIn contrast, the `**` operator is used to replicate an array multiple\ntimes. In other words, the expression `a ** 3` creates a new array\nwhich contains the elements of the array `a` repeated 3 times.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst c = a ** 2;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 1, 2, 3 }\n```\n\n\n:::\n:::\n\n\n\n\n\n\n## Blocks and scopes {#sec-blocks}\n\nBlocks are created in Zig by a pair of curly braces. A block is just a group of\nexpressions (or statements) contained inside of a pair of curly braces. 
All of these expressions that\nare contained inside of this pair of curly braces belong to the same scope.\n\nIn other words, a block just delimits a scope in your code.\nThe objects that you define inside the same block belong to the same\nscope, and, therefore, are accessible from within this scope.\nAt the same time, these objects are not accessible outside of this scope.\nSo, you could also say that blocks are used to limit the scope of the objects that you create in\nyour source code. In less technical terms, blocks are used to specify where in your source code\nyou can access whatever object you have in your source code.\n\nSo, a block is just a group of expressions contained inside a pair of curly braces.\nAnd every block has its own scope separated from the others.\nThe body of a function is a classic example of a block. If statements, for and while loops\n(and any other structure in the language that uses the pair of curly braces)\nare also examples of blocks.\n\nThis means that, every if statement, or for loop,\netc., that you create in your source code has its own separate scope.\nThat is why you can't access the objects that you defined inside\nof your for loop (or if statement) in an outer scope, i.e. a scope outside of the for loop.\nBecause you are trying to access an object that belongs to a scope that is different\nthan your current scope.\n\n\nYou can create blocks within blocks, with multiple levels of nesting.\nYou can also (if you want to) give a label to a particular block, with the colon character (`:`).\nJust write `label:` before you open the pair of curly braces that delimits your block. When you label a block\nin Zig, you can use the `break` keyword to return a value from this block, as if it\nwas a function's body. 
You just write the `break` keyword, followed by the block label in the format `:label`,\nand the expression that defines the value that you want to return.\n\nLike in the example below, where we are returning the value from the `y` object\nfrom the block `add_one`, and saving the result inside the `x` object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar y: i32 = 123;\nconst x = add_one: {\n    y += 1;\n    break :add_one y;\n};\nif (x == 124 and y == 124) {\n    try stdout.print(\"Hey!\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHey!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\n\n## How strings work in Zig? {#sec-zig-strings}\n\nThe first project that we are going to build and discuss in this book is a base64 encoder/decoder (@sec-base64).\nBut in order for us to build such a thing, we need to get a better understanding of how strings work in Zig.\nSo let's discuss this specific aspect of Zig.\n\nIn Zig, a string literal value is just a pointer to a null-terminated array of bytes (i.e. the same thing as a C string).\nHowever, a string object in Zig is a little more than just a pointer. A string object\nin Zig is an object of type `[]const u8`, and, this object always contains two things: a pointer to the\nsame array of bytes that you would find in a string literal value, plus a length value.\nEach byte in this \"array of bytes\" is represented by an `u8` value, which is an unsigned 8 bit integer,\nso, it is equivalent to the C data type `unsigned char`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This is a string literal value:\n\"A literal value\";\n// This is a string object:\nconst object: []const u8 = \"A string object\";\n```\n:::\n\n\n\n\n\nZig always assumes that this sequence of bytes is UTF-8 encoded. 
This might not be true for every\nsequence of bytes you have, but it is not really Zig's job to fix the encoding of your strings\n(you can use [`iconv`](https://www.gnu.org/software/libiconv/)[^libiconv] for that).\nToday, most of the text in our modern world, especially on the web, should be UTF-8 encoded.\nSo if your string literal is not UTF-8 encoded, then, you will likely\nhave problems in Zig.\n\n[^libiconv]: \n\nLet’s take for example the word \"Hello\". In UTF-8, this sequence of characters (H, e, l, l, o)\nis represented by the sequence of decimal numbers 72, 101, 108, 108, 111. In hexadecimal, this\nsequence is `0x48`, `0x65`, `0x6C`, `0x6C`, `0x6F`. So if I take this sequence of hexadecimal values,\nand ask Zig to print this sequence of bytes as a sequence of characters (i.e. a string), then,\nthe text \"Hello\" will be printed into the terminal:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const bytes = [_]u8{0x48, 0x65, 0x6C, 0x6C, 0x6F};\n try stdout.print(\"{s}\\n\", .{bytes});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello\n```\n\n\n:::\n:::\n\n\n\n\n\n\nIf you want to see the actual bytes that represent a string in Zig, you can use\na `for` loop to iterate through each byte in the string, and ask Zig to print each byte as a hexadecimal\nvalue to the terminal. 
You do that by using a `print()` statement with the `X` formatting specifier,\nlike you would normally do with the [`printf()` function](https://cplusplus.com/reference/cstdio/printf/)[^printfs] in C.\n\n[^printfs]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n try stdout.print(\"Bytes that represents the string object: \", .{});\n for (string_object) |byte| {\n try stdout.print(\"{X} \", .{byte});\n }\n try stdout.print(\"\\n\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represents the string object: 54 68 69 \n 73 20 69 73 20 61 6E 20 65 78 61 6D 70 6C 65 20 6F\n 66 20 73 74 72 69 6E 67 20 6C 69 74 65 72 61 6C\n 20 69 6E 20 5A 69 67 \n```\n\n\n:::\n:::\n\n\n\n\n\n### Strings in C\n\nAt first glance, this looks very similar to how C treats strings as well. In more detail, string values\nin C are treated internally as an array of arbitrary bytes, and this array is also null-terminated.\n\nBut one key difference between a Zig string and a C string, is that Zig also stores the length of\nthe array inside the string object. This small detail makes your code safer, because it is much\neasier for the Zig compiler to check if you are trying to access an element that is \"out of bounds\", i.e. if\nyou're trying to access memory that does not belong to you.\n\nTo achieve this same kind of safety in C, you have to do a lot of work that kind of seems pointless.\nSo getting this kind of safety is not automatic and much harder to do in C. 
For example, if you want\nto track the length of your string throughout your program in C, then, you first need to loop through\nthe array of bytes that represents this string, and find the null element (`'\\0'`) position to discover\nwhere exactly the array ends, or, in other words, to find how many elements the array of bytes contains.\n\nTo do that, you would need something like this in C. In this example, the C string stored in\nthe object `array` is 25 bytes long:\n\n```c\n#include <stdio.h>\nint main() {\n char* array = \"An example of string in C\";\n int index = 0;\n while (1) {\n if (array[index] == '\\0') {\n break;\n }\n index++;\n }\n printf(\"Number of elements in the array: %d\\n\", index);\n}\n```\n\n```\nNumber of elements in the array: 25\n```\n\nBut in Zig, you do not have to do this, because the object already contains a `len`\nfield which stores the length information of the array. As an example, the `string_object` object below is 43 bytes long:\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n try stdout.print(\"{d}\\n\", .{string_object.len});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n43\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### A better look at the object type\n\nNow, we can better inspect the type of the objects that Zig creates. To check the type of any object in Zig, you can use the\n`@TypeOf()` function. If we look at the type of the `simple_array` object below, you will find that this object\nis an array of 4 elements. Each element is a signed integer of 32 bits which corresponds to the data type `i32` in Zig.\nThat is what an object of type `[4]i32` is.\n\nBut if we look closely at the type of the `string_object` object below, you will find that this object is a\nconstant pointer (hence the `*const` annotation) to an array of 43 elements (or 43 bytes). 
Each element is a\nsingle byte (more precisely, an unsigned 8 bit integer - `u8`), that is why we have the `[43:0]u8` portion of the type below.\nIn other words, the string stored inside the `string_object` object is 43 bytes long.\nThat is why you have the type `*const [43:0]u8` below.\n\nIn the case of `string_object`, it is a constant pointer (`*const`) because the object `string_object` is declared\nas constant in the source code (in the line `const string_object = ...`). So, if we changed that for some reason, if\nwe declare `string_object` as a variable object (i.e. `var string_object = ...`), then, `string_object` would be\njust a normal pointer to an array of unsigned 8-bit integers (i.e. `* [43:0]u8`).\n\nNow, if we create an pointer to the `simple_array` object, then, we get a constant pointer to an array of 4 elements (`*const [4]i32`),\nwhich is very similar to the type of the `string_object` object. This demonstrates that a string object (or a string literal)\nin Zig is already a pointer to an array.\n\nJust remember that a \"pointer to an array\" is different than an \"array\". 
So a string object in Zig is a pointer to an array\nof bytes, and not simply an array of bytes.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n const simple_array = [_]i32{1, 2, 3, 4};\n try stdout.print(\"Type of array object: {}\", .{@TypeOf(simple_array)});\n try stdout.print(\n \"Type of string object: {}\",\n .{@TypeOf(string_object)}\n );\n try stdout.print(\n \"Type of a pointer that points to the array object: {}\",\n .{@TypeOf(&simple_array)}\n );\n}\n```\n:::\n\n\n\n\n\n```\nType of array object: [4]i32\nType of string object: *const [43:0]u8\nType of a pointer that points to\n the array object: *const [4]i32\n```\n\n\n### Byte vs unicode points\n\nIs important to point out that each byte in the array is not necessarily a single character.\nThis fact arises from the difference between a single byte and a single unicode point.\n\nThe encoding UTF-8 works by assigning a number (which is called a unicode point) to each character in\nthe string. For example, the character \"H\" is stored in UTF-8 as the decimal number 72. This means that\nthe number 72 is the unicode point for the character \"H\". Each possible character that can appear in a\nUTF-8 encoded string have its own unicode point.\n\nFor example, the Latin Capital Letter A With Stroke (Ⱥ) is represented by the number (or the unicode point)\n570. However, this decimal number (570) is higher than the maximum number stored inside a single byte, which\nis 255. In other words, the maximum decimal number that can be represented with a single byte is 255. 
That is why,\nthe unicode point 570 is actually stored inside the computer’s memory as the bytes `C8 BA`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"Ⱥ\";\n try stdout.print(\"Bytes that represents the string object: \", .{});\n for (string_object) |char| {\n try stdout.print(\"{X} \", .{char});\n }\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represents the string object: C8 BA \n```\n\n\n:::\n:::\n\n\n\n\n\n\nThis means that to store the character Ⱥ in a UTF-8 encoded string, we need to use two bytes together\nto represent the number 570. That is why the relationship between bytes and unicode points is not always\n1 to 1. Each unicode point is a single character in the string, but not always a single byte corresponds\nto a single unicode point.\n\nAll of this means that if you loop through the elements of a string in Zig, you will be looping through the\nbytes that represent that string, and not through the characters of that string. In the Ⱥ example above,\nthe for loop needed two iterations (instead of a single iteration) to print the two bytes that represent this Ⱥ letter.\n\nNow, all English letters (or ASCII letters if you prefer) can be represented by a single byte in UTF-8. As a\nconsequence, if your UTF-8 string contains only English letters (or ASCII letters), then, you are lucky. Because\nthe number of bytes will be equal to the number of characters in that string. 
In other words, in this specific\nsituation, the relationship between bytes and unicode points is 1 to 1.\n\nBut on the other side, if your string contains other types of letters… for example, you might be working with\ntext data that contains, chinese, japanese or latin letters, then, the number of bytes necessary to represent\nyour UTF-8 string will likely be much higher than the number of characters in that string.\n\nIf you need to iterate through the characters of a string, instead of its bytes, then, you can use the\n`std.unicode.Utf8View` struct to create an iterator that iterates through the unicode points of your string.\n\nIn the example below, we loop through the japanese characters “アメリカ”. Each of the four characters in\nthis string is represented by three bytes. But the for loop iterates four times, one iteration for each\ncharacter/unicode point in this string:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n var utf8 = (\n (try std.unicode.Utf8View.init(\"アメリカ\"))\n .iterator()\n );\n while (utf8.nextCodepointSlice()) |codepoint| {\n try stdout.print(\n \"got codepoint {}\\n\",\n .{std.fmt.fmtSliceHexUpper(codepoint)}\n );\n }\n}\n```\n:::\n\n\n\n\n\n```\ngot codepoint E382A2\ngot codepoint E383A1\ngot codepoint E383AA\ngot codepoint E382AB\n```\n\n\n\n## Safety in Zig\n\nA general trend in modern low-level programming languages is safety. 
As our modern world\nbecomes more interconnected with technology and computers,\nthe data produced by all of this technology becomes one of the most important\n(and also, one of the most dangerous) assets that we have.\n\nThis is probably the main reason why modern low-level programming languages\nhave been giving great attention to safety, especially memory safety, because\nmemory corruption is still the main target for hackers to exploit.\nThe reality is that we don't have an easy solution for this problem.\nFor now, we only have techniques and strategies that mitigate these\nproblems.\n\nAs Richard Feldman explains in his [most recent GOTO conference talk](https://www.youtube.com/watch?v=jIZpKpLCOiU&ab_channel=GOTOConferences)[^gotop]\n, we haven't yet figured out a way to achieve **true safety in technology**.\nIn other words, we haven't found a way to build software that won't be exploited\nwith 100% certainty. We can greatly reduce the risks of our software being\nexploited, by ensuring memory safety for example. But this is not enough\nto reach \"true safety\" territory.\n\nBecause even if you write your program in a \"safe language\", hackers can still\nexploit failures in the operating system where your program is running (e.g. maybe the\nsystem where your code is running has a \"backdoor exploit\" that can still\naffect your code in unexpected ways), or also, they can exploit the features\nfrom the architecture of your computer. A recently found exploit\nthat involves memory invalidation through a feature of \"memory tags\"\npresent in ARM chips is an example of that [@exploit1].\n\n[^gotop]: \n\nThe question is: what have Zig and other languages been doing to mitigate this problem?\nIf we take Rust as an example, Rust is, for the most part[^rust-safe], a memory safe\nlanguage by enforcing specific rules to the developer. 
In other words, the key feature\nof Rust, the *borrow checker*, forces you to follow a specific logic when you are writing\nyour Rust code, and the Rust compiler will always complain every time you try to go out of this\npattern.\n\n[^rust-safe]: Actually, a lot of existing Rust code is still memory unsafe, because it communicates with external libraries through FFI (*foreign function interface*), which disables the borrow-checker features through the `unsafe` keyword.\n\n\nIn contrast, the Zig language is not a memory safe language by default.\nInstead of forcing the developer to follow a specific rule, the Zig language\nachieves memory safety by offering tools that the developer can use for this purpose.\nIn other words, the `zig` compiler does not oblige you to use such tools.\nBut there is often no reason to not use these tools in your Zig code,\nso you often achieve a level of memory safety similar to Rust's in Zig\nby simply using these tools.\n\nThe tools listed below are related to memory safety in Zig. That is, they help you to achieve\nmemory safety in your Zig code:\n\n- `defer` allows you to keep free operations physically close to allocations. This helps you to avoid memory leaks, \"use after free\", and also \"double-free\" problems. Furthermore, it also keeps free operations logically tied to the end of the current scope, which greatly reduces the mental overhead about object lifetime.\n- `errdefer` helps you to guarantee that your program frees the allocated memory, even if a runtime error occurs.\n- pointers and objects are non-nullable by default. This helps you to avoid memory problems that might arise from de-referencing null pointers.\n- Zig offers some native types of allocators (called \"testing allocators\") that can detect memory leaks and double-frees. 
These types of allocators are widely used in unit tests, so they make your unit tests a weapon that you can use to detect memory problems in your code.\n- arrays and slices in Zig have their lengths embedded in the object itself, which makes the `zig` compiler very effective at detecting \"index out-of-range\" type of errors, and avoiding buffer overflows.\n\n\nBesides these features that Zig offers that are related to memory safety issues, the language\nalso has some rules that help you to achieve another type of safety, which is more related to\nprogram logic safety. These rules are:\n\n- pointers and objects are non-nullable by default. Which eliminates an edge case that might break the logic of your program.\n- switch statements must exhaust all possible options.\n- the `zig` compiler forces you to handle every possible error.\n\n\n## Other parts of Zig\n\nWe already learned a lot about Zig's syntax, and also, some pretty technical\ndetails about it. Just as a quick recap:\n\n- We talked about how functions are written in Zig at @sec-root-file and @sec-main-file.\n- How to create new objects/identifiers at @sec-root-file and especially at @sec-assignments.\n- How strings work in Zig at @sec-zig-strings.\n- How to use arrays and slices at @sec-arrays.\n- How to import functionality from other Zig modules at @sec-root-file.\n\n\nBut, for now, this amount of knowledge is enough for us to continue with this book.\nLater, over the next chapters we will still talk more about other parts of\nZig's syntax that are just as important as the parts above. 
Such as:\n\n\n- How Object-Oriented programming can be done in Zig through *struct declarations* at @sec-structs-and-oop.\n- Basic control flow syntax at @sec-zig-control-flow.\n- Enums at @sec-enum;\n- Pointers and Optionals at @sec-pointer;\n- Error handling with `try` and `catch` at @sec-error-handling;\n- Unit tests at @sec-unittests;\n- Vectors;\n- Build System at @sec-build-system;\n\n\n\n\n", + "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Introducing Zig\n\nIn this chapter, I want to introduce you to the world of Zig.\nDespite its rapid growth over the last years, Zig is still a very young language^[New programming languages in general, take years and years to be developed.].\nAs a consequence, its world is still very wild and to be explored.\nThis book is my attempt to help you on your personal journey for\nunderstanding and exploring the exciting world of Zig.\n\nI assume you have previous experience with some programming\nlanguage in this book, not necessarily with a low-level one.\nSo, if you have experience with Python, or Javascript, for example, that is fine.\nBut, if you do have experience with low-level languages, such as C, C++, or\nRust, you will probably learn faster throughout this book.\n\n\n\n## What is Zig?\n\nZig is a modern, low-level, and general-purpose programming language. Some programmers interpret\nZig as the \"modern C language\". 
It is a simple language like C, but with some\nmodern features.\n\nIn the author's personal interpretation, Zig is tightly connected with \"less is more\".\nInstead of trying to become a modern language by adding more and more features,\nmany of the core improvements that Zig brings to the\ntable are actually about removing annoying and evil behaviours/features from C and C++.\nIn other words, Zig tries to be better by simplifying the language, and by having more consistent and robust behaviour.\nAs a result, analyzing, writing and debugging applications become much easier and simpler in Zig, than it is in C or C++.\n\nThis philosophy becomes clear with the following phrase from the official website of Zig:\n\n> \"Focus on debugging your application rather than debugging your programming language knowledge\".\n\nThis phrase is especially true for C++ programmers. Because C++ is a gigantic language,\nwith tons of features, and also, there are lots of different \"flavors of C++\". These elements\nare what make C++ so complex and hard to learn. Zig tries to go in the opposite direction.\nZig is a very simple language, more closely related to other simple languages such as C and Go.\n\nThe phrase above is still important for C programmers too. Because, even though C is a simple\nlanguage, it is still hard sometimes to read and understand C code. For example, pre-processor macros in\nC are an evil source of confusion. They really make it hard sometimes to debug\nC programs. Because macros are essentially a second language embedded in C that obscures\nyour C code. With macros, you are no longer 100% sure about which pieces\nof code are being sent to the compiler. It obscures the actual source code that you wrote.\n\nYou don't have macros in Zig. In Zig, the code you write is the actual code that gets compiled by the compiler.\nYou don't have evil features that obscure your code.\nYou also don't have hidden control flow happening behind the scenes. 
And, you also\ndon't have functions or operators from the standard library that make\nhidden memory allocations behind your back.\n\nBy being a simpler language, Zig becomes much more clear and easier to read/write,\nbut at the same time, it also achieves a much more robust state, with more consistent\nbehaviour in edge situations. Once again, less is more.\n\n\n## Hello world in Zig\n\nWe begin our journey in Zig by creating a small \"Hello World\" program.\nTo start a new Zig project on your computer, you simply call the `init` command\nfrom the `zig` compiler.\nJust create a new directory on your computer, then, init a new Zig project\ninside this directory, like this:\n\n```bash\nmkdir hello_world\ncd hello_world\nzig init\n```\n\n```\ninfo: created build.zig\ninfo: created build.zig.zon\ninfo: created src/main.zig\ninfo: created src/root.zig\ninfo: see `zig build --help` for a menu of options\n```\n\n### Understanding the project files {#sec-project-files}\n\nAfter you run the `init` command from the `zig` compiler, some new files\nare created inside of your current directory. First, a \"source\" (`src`) directory\nis created, containing two files, `main.zig` and `root.zig`. Each `.zig` file\nis a separate Zig module, which is simply a text file that contains some Zig code.\n\n\nThe `main.zig` file for example, contains a `main()` function, which represents\nthe entrypoint of your program. It is where the execution of your program begins.\nAs you would expect from C, C++, Rust or Go,\nto build an executable program in Zig, you also need to declare a `main()` function in your module.\nSo, the `main.zig` module represents an executable program written in Zig.\n\nOn the other side, the `root.zig` module does not contain a `main()` function. Because\nit represents a library written in Zig. 
Libraries are different than executables.\nThey don't need to have an entrypoint to work.\nSo, you can choose which file (`main.zig` or `root.zig`) you want to follow depending on which type\nof project (executable or library) you want to develop.\n\n```bash\ntree .\n```\n\n```\n.\n├── build.zig\n├── build.zig.zon\n└── src\n ├── main.zig\n └── root.zig\n\n1 directory, 4 files\n```\n\n\nNow, in addition to the source directory, two other files were created in our working directory:\n`build.zig` and `build.zig.zon`. The first file (`build.zig`) represents a build script written in Zig.\nThis script is executed when you call the `build` command from the `zig` compiler.\nIn other words, this file contain Zig code that executes the necessary steps to build the entire project.\n\nIn general, low-level languages normally use a compiler to build your\nsource code into binary executables or binary libraries.\nNevertheless, this process of compiling your source code and building\nbinary executables or binary libraries from it, became a real challenge\nin the programming world, once the projects became bigger and bigger.\nAs a result, programmers created \"build systems\", which are a second set of tools designed to make this process\nof compiling and building complex projects, easier.\n\nExamples of build systems are CMake, GNU Make, GNU Autoconf and Ninja,\nwhich are used to build complex C and C++ projects.\nWith these systems, you can write scripts, which are called \"build scripts\".\nThey simply are scripts that describes the necessary steps to compile/build\nyour project.\n\nHowever, these are separate tools, that do not\nbelong to C/C++ compilers, like `gcc` or `clang`.\nAs a result, in C/C++ projects, you have not only to install and\nmanage your C/C++ compilers, but you also have to install and manage\nthese build systems separately.\n\nBut instead of using a separate build system, in Zig, we use the\nZig language itself to write build scripts.\nIn other words, Zig 
contains a native build system in it. And\nwe can use this build system to write small scripts in Zig,\nwhich describe the necessary steps to build/compile our Zig project[^zig-build-system].\nSo, everything you need to build a complex Zig project is the\n`zig` compiler, and nothing more.\n\n[^zig-build-system]: .\n\n\nNow that we described this topic in more depth, let's focus\non the second generated file (`build.zig.zon`), which is the Zig package manager configuration file,\nwhere you can list and manage the dependencies of your project. Yes, Zig has\na package manager (like `pip` in Python, `cargo` in Rust, or `npm` in Javascript),\nand this `build.zig.zon` file, written in ZON (Zig Object Notation), is similar to the `package.json` file\nin Javascript projects, or, the `Pipfile` file in Python projects, or the `Cargo.toml` file in Rust projects.\n\n\n### Looking at the `root.zig` file {#sec-root-file}\n\nLet's take a look at the `root.zig` file, and start to analyze some of the\nsyntax of Zig.\nThe first thing that you might notice, is that every line of code\nthat has an expression in it, ends with a semicolon character (`;`). This is\nsimilar syntax to other languages such as C, C++ and Rust,\nwhich have the same rule.\n\nAlso, notice the `@import()` call at the first line. We use this built-in function\nto import functionality from other Zig modules into our current module.\nIn other words, the `@import()` function works similarly to the `#include` pre-processor directive\nin C or C++, or, to the `import` statement in Python or Javascript code.\nIn this example, we are importing the `std` module,\nwhich gives you access to the Zig standard library.\n\nIn this `root.zig` file, we can also see how assignments (i.e. creating new objects)\nare made in Zig. You can create a new object in Zig by using the following syntax\n`(const|var) name = value;`. In the example below, we are creating two constant\nobjects (`std` and `testing`). 
At @sec-assignments we talk more about objects in general.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst testing = std.testing;\n\nexport fn add(a: i32, b: i32) i32 {\n return a + b;\n}\n```\n:::\n\n\n\n\nFunctions in Zig are declared similarly to functions in Rust, using the `fn` keyword. In the example above,\nwe are declaring a function called `add()`, which have two arguments named `a` and `b`, and returns\na integer number (`i32`) as result.\n\nMaybe Zig is not exactly a strongly-typed language, because you do not need\nnecessarily to specify the type of every single object you create across your source code.\nBut you do have to explicitly specify the type of every function argument, and also,\nthe return type of every function you create in Zig. So, at least in function declarations,\nZig is a strongly-typed language.\n\nWe specify the type of an object or a function argument in Zig, by\nusing a colon character (`:`) followed by the type after the name of this object/function argument.\nWith the expressions `a: i32` and `b: i32`, we know that, both `a` and `b` arguments have type `i32`,\nwhich is a signed 32 bit integer. In this part,\nthe syntax in Zig is identical to the syntax in Rust, which also specifies types by\nusing the colon character.\n\nLastly, we have the return type of the function at the end of the line, before we open\nthe curly braces to start writing the function's body, which, in the example above is\nagain a signed 32 bit integer (`i32`) value. This specific part is different than it is in Rust.\nBecause in Rust, the return type of a function is specified after an arrow (`->`).\nWhile in Zig, we simply declare the return type directly after the parentheses with the function arguments.\n\nWe also have an `export` keyword before the function declaration. This keyword\nis similar to the `extern` keyword in C. 
It exposes the function\nto make it available in the library API.\n\nIn other words, if you have a project where you are currently building\na library for other people to use, you need to expose your functions\nso that they are available in the library's API, so that users can use them.\nIf we removed the `export` keyword from the `add()` function declaration,\nthen, this function would no longer be exposed in the library object built\nby the `zig` compiler.\n\n\nHaving that in mind, the keyword `export` is a keyword used in libraries written in Zig.\nSo, if you are not currently writing a library in your project, then, you do not need to\ncare about this keyword.\n\n\n### Looking at the `main.zig` file {#sec-main-file}\n\nNow that we have learned a lot about Zig's syntax from the `root.zig` file,\nlet's take a look at the `main.zig` file.\nA lot of the elements we saw in `root.zig` are also present in `main.zig`.\nBut we have some other elements that we have not seen yet, so let's dive in.\n\nFirst, look at the return type of the `main()` function in this file.\nWe can see a small change. Now, the return\ntype of the function (`void`) is accompanied by an exclamation mark (`!`).\nWhat this exclamation mark is telling us, is that this `main()` function\nmight also return an error.\n\nSo, in this example, the `main()` function can either return `void`, or, return an error.\nThis is an interesting feature of Zig. 
If you write a function, and, something inside of\nthe body of this function might return an error, then, you are forced to:\n\n- either add the exclamation mark to the return type of the function, to make it clear that\nthis function might return an error.\n- or explicitly handle this error that might occur inside the function, to make sure that,\nif this error does happen, you are prepared, and your function will no longer return an error\nbecause you handled the error inside your function.\n\nIn most programming languages, we normally handle (or deal with) an error through\na *try catch* pattern, and in Zig, this is no different. But, if we look at the `main()` function\nbelow, you can see that we do have a `try` keyword in the 5th line. But we do not have a `catch` keyword\nin this code.\n\nThis means that, we are using the keyword `try` to execute code that might return an error,\nwhich is the `stdout.print()` expression. But because we do not have a `catch` keyword in this line,\nwe are not treating (or dealing with) this error.\nSo, if this expression does return an error, we are not catching and solving this error in any way.\nThat is why the exclamation mark was added to the return type of the function.\n\nSo, in essence, the `try` keyword executes the expression `stdout.print()`. If this expression\nreturns a valid value, then, the `try` keyword essentially does nothing. It simply passes this value forward. But, if the expression does\nreturn an error, then, the `try` keyword will unwrap and return this error from the function, and also print its\nstack trace to `stderr`.\n\nThis might sound weird to you, if you come from a high-level language. Because in\nhigh-level languages, such as Python, if an error occurs somewhere, this error is automatically\nreturned and the execution of your program will automatically stop, even if you don't want\nto stop the execution. 
You are obligated to face the error.\n\nBut if you come from a low-level language, then, maybe, this idea do not sound so weird or distant to you.\nBecause in C for example, normally functions doesn't raise errors, or, they normally don't stop the execution.\nIn C, error handling\nis done by constantly checking the return value of the function. So, you run the function,\nand then, you use an if statement to check if the function returned a value that is valid,\nor, if it returned an error. If an error was returned from the function, then, the if statement\nwill execute some code that fixes this error.\n\nSo, at least for C programmers, they do need to write a lot of if statements to\nconstantly check for errors around their code. And because of that, this simple feature from Zig, might be\nextraordinary for them. Because this `try` keyword can automatically unwrap the error,\nand warn you about this error, and let you deal with it, without any extra work from the programmer.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n const stdout = std.io.getStdOut().writer();\n try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n:::\n\n\n\n\nNow, another thing that you might have noticed in this code example, is that\nthe `main()` function is marked with the `pub` keyword. This keyword means\n\"public\". 
It marks the `main()` function as a *public function* from this module.\n\nIn other words, every function that you declare in your Zig module is, by default, a private (or \"static\")\nfunction that belongs to this Zig module, and can only be used (or called) from within this same module,\nunless you explicitly mark this function as a public function with the `pub` keyword.\nThis means that the `pub` keyword in Zig does essentially the opposite of what the `static` keyword\ndoes in C/C++.\n\nBy making a function \"public\", you allow other Zig modules to access and call this function,\nand use it for their own purposes.\nAll these other Zig modules need to do is import your module with the `@import()`\nbuilt-in function. Then, they get access to all public functions that are present in\nyour Zig module.\n\n\n### Compiling your source code {#sec-compile-code}\n\nYou can compile your Zig modules into a binary executable by running the `build-exe` command\nfrom the `zig` compiler. You simply list all the Zig modules that you want to build after\nthe `build-exe` command, separated by spaces. In the example below, we are compiling the module `main.zig`.\n\n```bash\nzig build-exe src/main.zig\n```\n\nSince we are building an executable, the `zig` compiler will look for a `main()` function\ndeclared in any of the files that you list after the `build-exe` command. If\nthe compiler does not find a `main()` function declared somewhere, a\ncompilation error will be raised, warning about this mistake.\n\nThe `zig` compiler also offers the `build-lib` and `build-obj` commands, which work\nthe exact same way as the `build-exe` command. 
The only difference is that they compile your\nZig modules into a portable C ABI library, or, into object files, respectively.\n\nIn the case of the `build-exe` command, a binary executable file is created by the `zig`\ncompiler in the root directory of your project.\nIf we take a look now at the contents of our current directory, with a simple `ls` command, we can\nsee the binary file called `main` that was created by the compiler.\n\n```bash\nls\n```\n\n```\nbuild.zig build.zig.zon main src\n```\n\nIf I execute this binary executable, I get the \"Hello World\" message in the terminal,\nas we expected.\n\n```bash\n./main\n```\n\n```\nHello, world!\n```\n\n\n### Compile and execute at the same time {#sec-compile-run-code}\n\nIn the previous section, I presented the `zig build-exe` command, which\ncompiles Zig modules into an executable file. However, this means that,\nin order to execute the executable file, we have to run two different commands.\nFirst, the `zig build-exe` command, and then, we call the executable file\ncreated by the compiler.\n\nBut what if we wanted to perform these two steps,\nall at once, in a single command? We can do that by using the `zig run`\ncommand.\n\n```bash\nzig run src/main.zig\n```\n\n```\nHello, world!\n```\n\n### Compiling the entire project {#sec-compile-project}\n\nJust as I described at @sec-project-files, as our project grows in size and\ncomplexity, we usually prefer to organize the compilation and build process\nof the project into a build script, using some sort of \"build system\".\n\nIn other words, as our project grows in size and complexity,\nthe `build-exe`, `build-lib` and `build-obj` commands become\nharder to use directly. Because then, we start to list\nmore and more modules at the same time. We also\nstart to add built-in compilation flags to customize the\nbuild process for our needs, etc. 
It becomes a lot of work\nto write the necessary commands by hand.\n\nIn C/C++ projects, programmers normally opt to use CMake, Ninja, `Makefile` or `configure` scripts\nto organize this process. However, in Zig, we have a native build system in the language itself.\nSo, we can write build scripts in Zig to compile and build Zig projects. Then, all we\nneed to do is call the `zig build` command to build our project.\n\nSo, when you execute the `zig build` command, the `zig` compiler will search\nfor a Zig module named `build.zig` inside your current directory, which\nshould be your build script, containing the necessary code to compile and\nbuild your project. If the compiler does find this `build.zig` file in your directory,\nthen, the compiler will essentially execute a `zig run` command\nover this `build.zig` file, to compile and execute this build\nscript, which in turn, will compile and build your entire project.\n\n\n```bash\nzig build\n```\n\n\nAfter you execute this \"build project\" command, a `zig-out` directory\nis created in the root of your project directory, where you can find\nthe binary executables and libraries created from your Zig modules\naccording to the build commands that you specified in `build.zig`.\nWe will talk more about the build system in Zig later in this book.\n\nIn the example below, I'm executing the binary executable\nnamed `hello_world` that was generated by the compiler after the\n`zig build` command.\n\n```bash\n./zig-out/bin/hello_world\n```\n\n```\nHello, world!\n```\n\n\n\n## How to learn Zig?\n\nWhat are the best strategies to learn Zig? 
\nFirst of all, of course this book will help you a lot on your journey through Zig.\nBut you will also need some extra resources if you want to be really good at Zig.\n\nAs a first tip, you can join a community of Zig programmers to get some help\nwhen you need it:\n\n- Reddit forum: ;\n- Ziggit community: ;\n- Discord, Slack, Telegram, and others: ;\n\nNow, one of the best ways to learn Zig is to simply read Zig code. Try\nto read Zig code often, and things will become more clear.\nA C/C++ programmer would also probably give you this same tip.\nBecause this strategy really works!\n\nNow, where can you find Zig code to read?\nI personally think that the best way of reading Zig code is to read the source code of the\nZig Standard Library. The Zig Standard Library is available at the [`lib/std` folder](https://github.com/ziglang/zig/tree/master/lib/std)[^zig-lib-std] on\nthe official GitHub repository of Zig. Access this folder, and start exploring the Zig modules.\n\nAlso, a great alternative is to read code from other large Zig\ncodebases, such as:\n\n1. the [JavaScript runtime Bun](https://github.com/oven-sh/bun)[^bunjs].\n1. the [game engine Mach](https://github.com/hexops/mach)[^mach].\n1. a [Llama 2 LLM model implementation in Zig](https://github.com/cgbur/llama2.zig/tree/main)[^ll2].\n1. the [financial transactions database `tigerbeetle`](https://github.com/tigerbeetle/tigerbeetle)[^tiger].\n1. the [command-line arguments parser `zig-clap`](https://github.com/Hejsil/zig-clap)[^clap].\n1. the [UI framework `capy`](https://github.com/capy-ui/capy)[^capy].\n1. the [Language Server Protocol implementation for Zig, `zls`](https://github.com/zigtools/zls)[^zls].\n1. 
the [event-loop library `libxev`](https://github.com/mitchellh/libxev)[^xev].\n\n[^xev]: \n[^zls]: \n[^capy]: \n[^clap]: \n[^tiger]: \n[^ll2]: \n[^mach]: \n[^bunjs]: .\n\nAll these assets are available on GitHub,\nand this is great, because we can use the GitHub search bar to our advantage,\nto find Zig code that fits our description.\nFor example, you can always include `lang:Zig` in the GitHub search bar when you\nare searching for a particular pattern. This will limit the search to only Zig modules.\n\n[^zig-lib-std]: \n\nAlso, a great alternative is to consult online resources and documentation.\nHere is a quick list of resources that I personally use from time to time to learn\nmore about the language each day:\n\n- Zig Language Reference: ;\n- Zig Standard Library Reference: ;\n- Zig Guide: ;\n- Karl Seguin Blog: ;\n- Zig News: ;\n- Read the code written by one of the Zig core team members: ;\n- Some livecoding sessions are streamed on the Zig Showtime YouTube channel: ;\n\n\nAnother great strategy to learn Zig, or honestly, to learn any language you want,\nis to practice it by solving exercises. For example, there is a famous repository\nin the Zig community called [Ziglings](https://codeberg.org/ziglings/exercises/)[^ziglings],\nwhich contains more than 100 small exercises that you can solve. It is a repository of\ntiny programs written in Zig that are currently broken, and your responsibility is to\nfix these programs, and make them work again.\n\n[^ziglings]: .\n\nA famous tech YouTuber known as *The Primeagen* also posted some videos (on YouTube)\nwhere he solves these exercises from Ziglings. 
The first video is named\n[\"Trying Zig Part 1\"](https://www.youtube.com/watch?v=OPuztQfM3Fg&t=2524s&ab_channel=TheVimeagen)[^prime1].\n\n[^prime1]: .\n\nAnother great alternative is to solve the [Advent of Code exercises](https://adventofcode.com/)[^advent-code].\nThere are people that already took the time to learn and solve the exercises, and they posted\ntheir solutions on GitHub as well, so, in case you need some resource to compare while solving\nthe exercises, you can look at these two repositories:\n\n- ;\n- ;\n\n[^advent-code]: \n\n\n\n\n\n\n## Creating new objects in Zig (i.e. identifiers) {#sec-assignments}\n\nLet's talk more about objects in Zig. Readers that have past experience\nwith other programming languages might know this concept through\na different name, such as: \"variable\" or \"identifier\". In this book, I choose\nto use the term \"object\" to refer to this concept.\n\nTo create a new object (or a new \"identifier\") in Zig, we use\nthe keywords `const` or `var`. These keywords specify if the object\nthat you are creating is mutable or not.\nIf you use `const`, then the object you are\ncreating is a constant (or immutable) object, which means that once you declare this object, you\ncan no longer change the value stored inside this object.\n\nOn the other hand, if you use `var`, then, you are creating a variable (or mutable) object.\nYou can change the value of this object as many times as you want. Using the\nkeyword `var` in Zig is similar to using the keywords `let mut` in Rust.\n\n### Constant objects vs variable objects\n\nIn the code example below, we are creating a new constant object called `age`.\nThis object stores a number representing the age of someone. However, this code example\ndoes not compile successfully. 
Because on the next line of code, we are trying to change the value\nof the object `age` to 25.\n\nThe `zig` compiler detects that we are trying to change\nthe value of an object/identifier that is constant, and because of that,\nthe compiler will raise a compilation error, warning us about the mistake.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 24;\n// The line below is not valid!\nage = 25;\n```\n:::\n\n\n\n\n```\nt.zig:10:5: error: cannot assign to constant\n    age = 25;\n    ~~^~~\n```\n\nIn contrast, if you use `var`, then, the object created is a variable object.\nWith `var` you can declare this object in your source code, and then,\nchange the value of this object as many times as you want at later points\nin your source code.\n\nSo, using the same code example shown above, if I change the declaration of the\n`age` object to use the `var` keyword, then, the program gets compiled successfully.\nBecause now, the `zig` compiler detects that we are changing the value of an\nobject that allows this behaviour, because it is a \"variable object\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = 24;\nage = 25;\n```\n:::\n\n\n\n\n\n### Declaring without an initial value\n\nBy default, when you declare a new object in Zig, you must give it\nan initial value. In other words, this means\nthat we have to declare, and, at the same time, initialize every object we\ncreate in our source code.\n\nOn the other hand, you can, in fact, declare a new object in your source code,\nand not give it an explicit value. 
But we need to use a special keyword for that,\nwhich is the `undefined` keyword.\n\nIt is important to emphasize that you should avoid using `undefined` as much as possible.\nBecause when you use this keyword, you leave your object uninitialized, and, as a consequence,\nif for some reason, your code uses this object while it is uninitialized, then, you will definitely\nhave undefined behaviour and major bugs in your program.\n\nIn the example below, I'm declaring the `age` object again. But this time,\nI do not give it an initial value. The variable is only initialized at\nthe second line of code, where I store the number 25 in this object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = undefined;\nage = 25;\n```\n:::\n\n\n\n\nHaving these points in mind, just remember that you should avoid using `undefined` in your code as much as possible.\nAlways declare and initialize your objects, because this gives you much more safety in your program.\nBut in case you really need to declare an object without initializing it... the\n`undefined` keyword is the way to do it in Zig.\n\n\n### There is no such thing as unused objects\n\nEvery object (constant or variable) that you declare in Zig **must be used in some way**. You can give this object\nto a function call, as a function argument, or, you can use it in another expression\nto calculate the value of another object, or, you can call a method that belongs to this\nparticular object. \n\nIt doesn't matter in which way you use it, as long as you use it.\nIf you try to break this rule, i.e. if you try to declare an object, but not use it,\nthe `zig` compiler will not compile your Zig source code, and it will issue an error\nmessage warning that you have unused objects in your code.\n\nLet's demonstrate this with an example. In the source code below, we declare a constant object\ncalled `age`. 
If you try to compile a simple Zig program with the line of code below,\nthe compiler will return an error as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 15;\n```\n:::\n\n\n\n\n```\nt.zig:4:11: error: unused local constant\n    const age = 15;\n          ^~~\n```\n\nEvery time you declare a new object in Zig, you have two choices:\n\n1. you either use the value of this object;\n2. or you explicitly discard the value of the object;\n\nTo explicitly discard the value of any object (constant or variable), all you need to do is to assign\nthis object to a special character in Zig, which is the underscore (`_`).\nWhen you assign an object to an underscore, like in the example below, the `zig` compiler will automatically\ndiscard the value of this particular object.\n\nYou can see in the example below that, this time, the compiler did not\ncomplain about any \"unused constant\", and successfully compiled our source code.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It compiles!\nconst age = 15;\n_ = age;\n```\n:::\n\n\n\n\nNow, remember, every time you assign a particular object to the underscore, you are telling\nthe compiler that this object is discarded, i.e. that you will not use it. This means that you can no longer\nuse this object further in your code.\n\nSo if you try to use the constant `age` in the example below, after we discarded it, you\nwill get a loud error message from the compiler (talking about a \"pointless discard\")\nwarning you about this mistake.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It does not compile.\nconst age = 15;\n_ = age;\n// Using a discarded value!\nstd.debug.print(\"{d}\\n\", .{age + 2});\n```\n:::\n\n\n\n\n```\nt.zig:7:5: error: pointless discard of local constant\n```\n\n\nThis same rule applies to variable objects. Every variable object must also be used in\nsome way. 
And if you assign a variable object to the underscore,\nthis object also gets discarded, and you can no longer use this object.\n\n\n\n### You must mutate every variable object\n\nEvery variable object that you create in your source code must be mutated at some point.\nIn other words, if you declare an object as a variable\nobject, with the keyword `var`, and you do not change the value of this object\nat some point in the future, the `zig` compiler will detect this,\nand it will raise an error warning you about this mistake.\n\nThe concept behind this is that every object you create in Zig should preferably be a\nconstant object, unless you really need an object whose value will\nchange during the execution of your program.\n\nSo, if I try to declare a variable object such as `where_i_live` below,\nand I do not change the value of this object in some way,\nthe `zig` compiler raises an error message with the phrase \"variable is never mutated\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar where_i_live = \"Belo Horizonte\";\n_ = where_i_live;\n```\n:::\n\n\n\n\n```\nt.zig:7:5: error: local variable is never mutated\nt.zig:7:5: note: consider using 'const'\n```\n\n## Primitive Data Types {#sec-primitive-data-types}\n\nZig has many different primitive data types available for you to use.\nYou can see the full list of available data types at the official\n[Language Reference page](https://ziglang.org/documentation/master/#Primitive-Types)[^lang-data-types].\n\n[^lang-data-types]: .\n\nBut here is a quick list:\n\n- Unsigned integers: `u8`, 8-bit integer; `u16`, 16-bit integer; `u32`, 32-bit integer; `u64`, 64-bit integer; `u128`, 128-bit integer.\n- Signed integers: `i8`, 8-bit integer; `i16`, 16-bit integer; `i32`, 32-bit integer; `i64`, 64-bit integer; `i128`, 128-bit integer.\n- Floating point numbers: `f16`, 16-bit floating point; `f32`, 32-bit floating point; `f64`, 64-bit floating point; `f128`, 128-bit floating point.\n- Boolean: `bool`, represents true or false 
values.\n- C ABI compatible types: `c_long`, `c_char`, `c_short`, `c_ushort`, `c_int`, `c_uint`, and many others.\n- Pointer sized integers: `isize` and `usize`.\n\n\n\n\n\n\n\n## Arrays {#sec-arrays}\n\nYou create arrays in Zig by using a syntax that resembles the C syntax.\nFirst, you specify the size of the array (i.e. the number of elements that will be stored in the array)\nyou want to create inside a pair of brackets.\n\nThen, you specify the data type of the elements that will be stored inside this array.\nAll elements present in an array in Zig must have the same data type. For example, you cannot mix elements\nof type `f32` with elements of type `i32` in the same array.\n\nAfter that, you simply list the values that you want to store in this array inside\na pair of curly braces.\nIn the example below, I am creating two constant objects that contain different arrays.\nThe first object contains an array of 4 integer values, while the second object,\nan array of 3 floating point values.\n\nNow, you should notice that in the object `ls`, I am\nnot explicitly specifying the size of the array inside of the brackets. Instead\nof using a literal value (like the value 4 that I used in the `ns` object), I am\nusing the special character underscore (`_`). 
This syntax tells the `zig` compiler\nto fill this field with the number of elements listed inside of the curly braces.\nSo, this syntax `[_]` is for lazy (or smart) programmers who leave the job of\ncounting how many elements there are in the curly braces to the compiler.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst ls = [_]f64{432.1, 87.2, 900.05};\n_ = ns; _ = ls;\n```\n:::\n\n\n\n\nIt is worth noting that these are static arrays, meaning that\nthey cannot grow in size.\nOnce you declare your array, you cannot change the size of it.\nThis is very common in low-level languages,\nbecause low-level languages normally want to give you (the programmer) full control over memory,\nand the way in which arrays are expanded is tightly related to\nmemory management.\n\n\n### Selecting elements of the array\n\nOne very common activity is to select specific portions of an array\nyou have in your source code.\nIn Zig, you can select a specific element from your\narray, by simply providing the index of this particular\nelement inside brackets after the object name.\nIn the example below, I am selecting the third element from the\n`ns` array. Notice that Zig is a \"zero-index\" based language,\nlike C, C++, Rust, Python, and many other languages.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\ntry stdout.print(\"{d}\\n\", .{ ns[2] });\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n12\n```\n\n\n:::\n:::\n\n\n\n\nIn contrast, you can also select specific slices (or sections) of your array, by using a\nrange selector. Some programmers also call these selectors \"slice selectors\",\nand they also exist in Rust, and have the exact same syntax as in Zig.\nAnyway, a range selector is a special expression in Zig that defines\na range of indexes, and it has the syntax `start..end`.\n\nIn the example below, at the second line of code,\nthe `sl` object stores a slice (or a portion) of the\n`ns` array. 
More precisely, the elements at indexes 1 and 2\nin the `ns` array. \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\n_ = sl;\n```\n:::\n\n\n\n\nWhen you use the `start..end` syntax,\nthe \"end tail\" of the range selector is non-inclusive,\nmeaning that the index at the end is not included in the range that is\nselected from the array.\nTherefore, the syntax `start..end` actually selects the elements from index `start` up to index `end - 1`.\n\nYou can, for example, create a slice that goes from the first to the\nlast element of the array by using the `ar[0..ar.len]` syntax.\nIn other words, it is a slice that\naccesses all elements in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [4]u8{48, 24, 12, 6};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\nYou can also use the syntax `start..` in your range selector,\nwhich tells the `zig` compiler to select the portion of the array\nthat goes from the `start` index until the last element of the array.\nIn the example below, we are selecting the range from index 1\nuntil the end of the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..];\n_ = sl;\n```\n:::\n\n\n\n\n\n### More on slices\n\nAs we discussed before, in Zig, you can select specific portions of an existing\narray. 
This is called *slicing* in Zig [@zigguide], because when you select a portion\nof an array, you are creating a slice object from that array.\n\nA slice object is essentially a pointer object accompanied by a length number.\nThe pointer object points to the first element in the slice, and the\nlength number tells the `zig` compiler how many elements there are in this slice.\n\n> Slices can be thought of as a pair of `[*]T` (the pointer to the data) and a `usize` (the element count) [@zigguide].\n\nThrough the pointer contained inside the slice you can access the elements (or values)\nthat are inside this range (or portion) that you selected from the original array.\nBut the length number (which you can access through the `len` property of your slice object)\nis the really big improvement (over C arrays for example) that Zig brings to the table here.\n\nBecause with this length number\nthe `zig` compiler can easily check if you are trying to access an index that is out of the bounds of this particular slice,\nor, if you are causing any buffer overflow problems. In the example below,\nwe access the `len` property of the slice `sl`, which tells us that this slice\nhas 2 elements in it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\ntry stdout.print(\"{d}\\n\", .{sl.len});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n2\n```\n\n\n:::\n:::\n\n\n\n\n\n### Array operators\n\nThere are two array operators available in Zig that are very useful.\nThey are the array concatenation operator (`++`) and the array multiplication operator (`**`). 
As the name suggests,\nthese are array operators.\n\nOne important detail about these two operators is that they work\nonly when both operands have a size (or \"length\") that is compile-time known.\nWe are going to talk more about\nthe differences between \"compile-time known\" and \"runtime known\" at @sec-compile-time.\nBut for now, keep in mind that you cannot use these operators in every situation.\n\nIn summary, the `++` operator creates a new array that is the concatenation\nof both arrays provided as operands. So, the expression `a ++ b` produces\na new array which contains all the elements from arrays `a` and `b`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst b = [_]u8{4,5};\nconst c = a ++ b;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 4, 5 }\n```\n\n\n:::\n:::\n\n\n\n\nThis `++` operator is particularly useful to concatenate strings together.\nStrings in Zig are described in depth at @sec-zig-strings. In summary, a string object in Zig\nis essentially an array of bytes. So, you can use this array concatenation operator\nto effectively concatenate strings together.\n\nIn contrast, the `**` operator is used to replicate an array multiple\ntimes. In other words, the expression `a ** 3` creates a new array\nwhich contains the elements of the array `a` repeated 3 times.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst c = a ** 2;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 1, 2, 3 }\n```\n\n\n:::\n:::\n\n\n\n\n\n## Blocks and scopes {#sec-blocks}\n\nBlocks are created in Zig by a pair of curly braces. A block is just a group of\nexpressions (or statements) contained inside of a pair of curly braces. 
All of these expressions that\nare contained inside of this pair of curly braces belong to the same scope.\n\nIn other words, a block just delimits a scope in your code.\nThe objects that you define inside the same block belong to the same\nscope, and, therefore, are accessible from within this scope.\nAt the same time, these objects are not accessible outside of this scope.\nSo, you could also say that blocks are used to limit the scope of the objects that you create in\nyour source code. In less technical terms, blocks are used to specify where in your source code\nyou can access whatever object you have in your source code.\n\nSo, a block is just a group of expressions contained inside a pair of curly braces.\nAnd every block has its own scope separated from the others.\nThe body of a function is a classic example of a block. If statements, for and while loops\n(and any other structure in the language that uses the pair of curly braces)\nare also examples of blocks.\n\nThis means that every if statement, or for loop,\netc., that you create in your source code has its own separate scope.\nThat is why you can't access the objects that you defined inside\nof your for loop (or if statement) in an outer scope, i.e. a scope outside of the for loop,\nbecause you would be trying to access an object that belongs to a scope that is different\nfrom your current scope.\n\n\nYou can create blocks within blocks, with multiple levels of nesting.\nYou can also (if you want to) give a label to a particular block, with the colon character (`:`).\nJust write `label:` before you open the pair of curly braces that delimits your block. When you label a block\nin Zig, you can use the `break` keyword to return a value from this block, as if it\nwere a function's body. 
You just write the `break` keyword, followed by the block label in the format `:label`,\nand the expression that defines the value that you want to return.\n\nIn the example below, we are returning the value of the `y` object\nfrom the block `add_one`, and saving the result inside the `x` object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar y: i32 = 123;\nconst x = add_one: {\n    y += 1;\n    break :add_one y;\n};\nif (x == 124 and y == 124) {\n    try stdout.print(\"Hey!\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHey!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\n## How do strings work in Zig? {#sec-zig-strings}\n\nThe first project that we are going to build and discuss in this book is a base64 encoder/decoder (@sec-base64).\nBut in order for us to build such a thing, we need to get a better understanding of how strings work in Zig.\nSo let's discuss this specific aspect of Zig.\n\nIn Zig, a string literal value is just a pointer to a null-terminated array of bytes (i.e. the same thing as a C string).\nHowever, a string object in Zig is a little more than just a pointer. A string object\nin Zig is an object of type `[]const u8`, and this object always contains two things: a pointer to the\nsame null-terminated array of bytes that you would find in a string literal value, plus a length value.\nEach byte in this \"array of bytes\" is represented by a `u8` value, which is an unsigned 8-bit integer,\nso, it is equivalent to the C data type `unsigned char`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This is a string literal value:\n\"A literal value\";\n// This is a string object:\nconst object: []const u8 = \"A string object\";\n```\n:::\n\n\n\n\nZig always assumes that this sequence of bytes is UTF-8 encoded. 
This might not be true for every\nsequence of bytes you have, but it is not really Zig's job to fix the encoding of your strings\n(you can use [`iconv`](https://www.gnu.org/software/libiconv/)[^libiconv] for that).\nToday, most of the text in our modern world, especially on the web, should be UTF-8 encoded.\nSo if your string literal is not UTF-8 encoded, then, you will likely\nhave problems in Zig.\n\n[^libiconv]: \n\nLet’s take for example the word \"Hello\". In UTF-8, this sequence of characters (H, e, l, l, o)\nis represented by the sequence of decimal numbers 72, 101, 108, 108, 111. In hexadecimal, this\nsequence is `0x48`, `0x65`, `0x6C`, `0x6C`, `0x6F`. So if I take this sequence of hexadecimal values,\nand ask Zig to print this sequence of bytes as a sequence of characters (i.e. a string), then,\nthe text \"Hello\" will be printed to the terminal:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n    const bytes = [_]u8{0x48, 0x65, 0x6C, 0x6C, 0x6F};\n    try stdout.print(\"{s}\\n\", .{bytes});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello\n```\n\n\n:::\n:::\n\n\n\n\n\nIf you want to see the actual bytes that represent a string in Zig, you can use\na `for` loop to iterate through each byte in the string, and ask Zig to print each byte as a hexadecimal\nvalue to the terminal. 
You do that by using a `print()` statement with the `X` formatting specifier,\nlike you would normally do with the [`printf()` function](https://cplusplus.com/reference/cstdio/printf/)[^printfs] in C.\n\n[^printfs]: \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n    const string_object = \"This is an example of string literal in Zig\";\n    try stdout.print(\"Bytes that represent the string object: \", .{});\n    for (string_object) |byte| {\n        try stdout.print(\"{X} \", .{byte});\n    }\n    try stdout.print(\"\\n\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represent the string object: 54 68 69 73 20 69 73 20 61 6E 20 65 78 61 6D 70 6C 65 20 6F 66 20 73 74 72 69 6E 67 20 6C 69 74 65 72 61 6C 20 69 6E 20 5A 69 67 \n```\n\n\n:::\n:::\n\n\n\n\n### Strings in C\n\nAt first glance, this looks very similar to how C treats strings as well. In more detail, string values\nin C are treated internally as an array of arbitrary bytes, and this array is also null-terminated.\n\nBut one key difference between a Zig string and a C string is that Zig also stores the length of\nthe array inside the string object. This small detail makes your code safer, because it is much\neasier for the Zig compiler to check if you are trying to access an element that is \"out of bounds\", i.e. if\nyou are trying to access memory that does not belong to you.\n\nTo achieve this same kind of safety in C, you have to do a lot of work that kind of seems pointless.\nSo getting this kind of safety is not automatic, and is much harder to do, in C. 
For example, if you want\nto track the length of your string throughout your program in C, then, you first need to loop through\nthe array of bytes that represents this string, and find the null element (`'\\0'`) position to discover\nwhere exactly the array ends, or, in other words, to find how many elements the array of bytes contains.\n\nTo do that, you would need something like this in C. In this example, the C string stored in\nthe object `array` is 25 bytes long:\n\n```c\n#include <stdio.h>\nint main() {\n char* array = \"An example of string in C\";\n int index = 0;\n while (1) {\n if (array[index] == '\\0') {\n break;\n }\n index++;\n }\n printf(\"Number of elements in the array: %d\\n\", index);\n}\n```\n\n```\nNumber of elements in the array: 25\n```\n\nBut in Zig, you do not have to do this, because the object already contains a `len`\nfield which stores the length information of the array. As an example, the `string_object` object below is 43 bytes long:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n try stdout.print(\"{d}\\n\", .{string_object.len});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n43\n```\n\n\n:::\n:::\n\n\n\n\n\n### A better look at the object type\n\nNow, we can take a closer look at the types of the objects that Zig creates. To check the type of any object in Zig, you can use the\n`@TypeOf()` function. If we look at the type of the `simple_array` object below, you will find that this object\nis an array of 4 elements. Each element is a signed integer of 32 bits which corresponds to the data type `i32` in Zig.\nThat is what an object of type `[4]i32` is.\n\nBut if we look closely at the type of the `string_object` object below, you will find that this object is a\nconstant pointer (hence the `*const` annotation) to an array of 43 elements (or 43 bytes).
Each element is a\nsingle byte (more precisely, an unsigned 8 bit integer - `u8`), that is why we have the `[43:0]u8` portion of the type below.\nIn other words, the string stored inside the `string_object` object is 43 bytes long.\nThat is why you have the type `*const [43:0]u8` below.\n\nIn the case of `string_object`, it is a constant pointer (`*const`) because the object `string_object` is declared\nas constant in the source code (in the line `const string_object = ...`). So, if we changed that for some reason, if\nwe declare `string_object` as a variable object (i.e. `var string_object = ...`), then, `string_object` would be\njust a normal pointer to an array of unsigned 8-bit integers (i.e. `* [43:0]u8`).\n\nNow, if we create an pointer to the `simple_array` object, then, we get a constant pointer to an array of 4 elements (`*const [4]i32`),\nwhich is very similar to the type of the `string_object` object. This demonstrates that a string object (or a string literal)\nin Zig is already a pointer to an array.\n\nJust remember that a \"pointer to an array\" is different than an \"array\". 
So a string object in Zig is a pointer to an array\nof bytes, and not simply an array of bytes.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n const simple_array = [_]i32{1, 2, 3, 4};\n try stdout.print(\"Type of array object: {}\", .{@TypeOf(simple_array)});\n try stdout.print(\n \"Type of string object: {}\",\n .{@TypeOf(string_object)}\n );\n try stdout.print(\n \"Type of a pointer that points to the array object: {}\",\n .{@TypeOf(&simple_array)}\n );\n}\n```\n:::\n\n\n\n\n```\nType of array object: [4]i32\nType of string object: *const [43:0]u8\nType of a pointer that points to\n the array object: *const [4]i32\n```\n\n\n### Byte vs unicode points\n\nIs important to point out that each byte in the array is not necessarily a single character.\nThis fact arises from the difference between a single byte and a single unicode point.\n\nThe encoding UTF-8 works by assigning a number (which is called a unicode point) to each character in\nthe string. For example, the character \"H\" is stored in UTF-8 as the decimal number 72. This means that\nthe number 72 is the unicode point for the character \"H\". Each possible character that can appear in a\nUTF-8 encoded string have its own unicode point.\n\nFor example, the Latin Capital Letter A With Stroke (Ⱥ) is represented by the number (or the unicode point)\n570. However, this decimal number (570) is higher than the maximum number stored inside a single byte, which\nis 255. In other words, the maximum decimal number that can be represented with a single byte is 255. 
That is why,\nthe unicode point 570 is actually stored inside the computer’s memory as the bytes `C8 BA`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"Ⱥ\";\n try stdout.print(\"Bytes that represents the string object: \", .{});\n for (string_object) |char| {\n try stdout.print(\"{X} \", .{char});\n }\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represents the string object: C8 BA \n```\n\n\n:::\n:::\n\n\n\n\n\nThis means that to store the character Ⱥ in an UTF-8 encoded string, we need to use two bytes together\nto represent the number 570. That is why the relationship between bytes and unicode points is not always\n1 to 1. Each unicode point is a single character in the string, but not always a single byte corresponds\nto a single unicode point.\n\nAll of this means that if you loop trough the elements of a string in Zig, you will be looping through the\nbytes that represents that string, and not through the characters of that string. In the Ⱥ example above,\nthe for loop needed two iterations (instead of a single iteration) to print the two bytes that represents this Ⱥ letter.\n\nNow, all english letters (or ASCII letters if you prefer) can be represented by a single byte in UTF-8. As a\nconsequence, if your UTF-8 string contains only english letters (or ASCII letters), then, you are lucky. Because\nthe number of bytes will be equal to the number of characters in that string. 
In other words, in this specific\nsituation, the relationship between bytes and unicode points is 1 to 1.\n\nBut on the other side, if your string contains other types of letters… for example, you might be working with\ntext data that contains, chinese, japanese or latin letters, then, the number of bytes necessary to represent\nyour UTF-8 string will likely be much higher than the number of characters in that string.\n\nIf you need to iterate through the characters of a string, instead of its bytes, then, you can use the\n`std.unicode.Utf8View` struct to create an iterator that iterates through the unicode points of your string.\n\nIn the example below, we loop through the japanese characters “アメリカ”. Each of the four characters in\nthis string is represented by three bytes. But the for loop iterates four times, one iteration for each\ncharacter/unicode point in this string:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n var utf8 = (\n (try std.unicode.Utf8View.init(\"アメリカ\"))\n .iterator()\n );\n while (utf8.nextCodepointSlice()) |codepoint| {\n try stdout.print(\n \"got codepoint {}\\n\",\n .{std.fmt.fmtSliceHexUpper(codepoint)}\n );\n }\n}\n```\n:::\n\n\n\n\n```\ngot codepoint E382A2\ngot codepoint E383A1\ngot codepoint E383AA\ngot codepoint E382AB\n```\n\n\n\n## Safety in Zig\n\nA general trend in modern low-level programming languages is safety. 
As our modern world\nbecomes more interconnected with technology and computers,\nthe data produced by all of this technology becomes one of the most important\n(and also, one of the most dangerous) assets that we have.\n\nThis is probably the main reason why modern low-level programming languages\nhave been giving great attention to safety, specially memory safety, because\nmemory corruption is still the main target for hackers to exploit.\nThe reality is that we don't have an easy solution for this problem.\nFor now, we only have techniques and strategies that mitigate these\nproblems.\n\nAs Richard Feldman explains on his [most recent GOTO conference talk](https://www.youtube.com/watch?v=jIZpKpLCOiU&ab_channel=GOTOConferences)[^gotop],\nwe haven't yet figured out a way to achieve **true safety in technology**.\nIn other words, we haven't found a way to build software that won't be exploited\nwith 100% certainty. We can greatly reduce the risks of our software being\nexploited, by ensuring memory safety for example. But this is not enough\nto reach \"true safety\" territory.\n\nBecause even if you write your program in a \"safe language\", hackers can still\nexploit failures in the operating system where your program is running (e.g. maybe the\nsystem where your code is running has a \"backdoor exploit\" that can still\naffect your code in unexpected ways), or also, they can exploit the features\nfrom the architecture of your computer. A recently found exploit\nthat involves memory invalidation through a feature of \"memory tags\"\npresent in ARM chips is an example of that [@exploit1].\n\n[^gotop]: <https://www.youtube.com/watch?v=jIZpKpLCOiU&ab_channel=GOTOConferences>\n\nThe question is: what have Zig and other languages been doing to mitigate this problem?\nIf we take Rust as an example, Rust is, for the most part[^rust-safe], a memory safe\nlanguage by enforcing specific rules on the developer.
In other words, the key feature\nof Rust, the *borrow checker*, forces you to follow a specific logic when you are writing\nyour Rust code, and the Rust compiler will always complain every time you try to go out of this\npattern.\n\n[^rust-safe]: Actually, a lot of existing Rust code is still memory unsafe, because they communicate with external libraries through FFI (*foreign function interface*), which disables the borrow-checker features through the `unsafe` keyword.\n\n\nIn contrast, the Zig language is not a memory safe language by default.\nInstead of forcing the developer to follow a specific rule, the Zig language\nachieves memory safety by offering tools that the developer can use for this purpose.\nIn other words, the `zig` compiler does not oblige you to use such tools.\nBut there is often no reason to not use these tools in your Zig code,\nso you often achieve a level of memory safety similar to Rust's in Zig\nby simply using these tools.\n\nThe tools listed below are related to memory safety in Zig. That is, they help you to achieve\nmemory safety in your Zig code:\n\n- `defer` allows you to keep free operations physically close to allocations. This helps you to avoid memory leaks, \"use after free\", and also \"double-free\" problems. Furthermore, it also keeps free operations logically tied to the end of the current scope, which greatly reduces the mental overhead about object lifetime.\n- `errdefer` helps you to guarantee that your program frees the allocated memory, even if a runtime error occurs.\n- pointers and objects are non-nullable by default. This helps you to avoid memory problems that might arise from de-referencing null pointers.\n- Zig offers some native types of allocators (called \"testing allocators\") that can detect memory leaks and double-frees.
These types of allocators are widely used on unit tests, so they make your unit tests a weapon that you can use to detect memory problems in your code.\n- arrays and slices in Zig have their lengths embedded in the object itself, which makes the `zig` compiler very effective at detecting \"index out-of-range\" type of errors, and avoiding buffer overflows.\n\n\nBesides these features that Zig offers that are related to memory safety issues, the language\nalso has some rules that help you to achieve another type of safety, which is more related to\nprogram logic safety. These rules are:\n\n- pointers and objects are non-nullable by default. This eliminates an edge case that might break the logic of your program.\n- switch statements must exhaust all possible options.\n- the `zig` compiler forces you to handle every possible error.\n\n\n## Other parts of Zig\n\nWe already learned a lot about Zig's syntax, and also, some pretty technical\ndetails about it. Just as a quick recap:\n\n- We talked about how functions are written in Zig at @sec-root-file and @sec-main-file.\n- How to create new objects/identifiers at @sec-root-file and specially at @sec-assignments.\n- How strings work in Zig at @sec-zig-strings.\n- How to use arrays and slices at @sec-arrays.\n- How to import functionality from other Zig modules at @sec-root-file.\n\n\nBut, for now, this amount of knowledge is enough for us to continue with this book.\nLater, over the next chapters we will still talk more about other parts of\nZig's syntax that are also equally important as the other parts.
Such as:\n\n\n- How Object-Oriented programming can be done in Zig through *struct declarations* at @sec-structs-and-oop.\n- Basic control flow syntax at @sec-zig-control-flow.\n- Enums at @sec-enum;\n- Pointers and Optionals at @sec-pointer;\n- Error handling with `try` and `catch` at @sec-error-handling;\n- Unit tests at @sec-unittests;\n- Vectors;\n- Build System at @sec-build-system;\n\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/Chapters/02-debugging/execute-results/html.json b/_freeze/Chapters/02-debugging/execute-results/html.json index c53418cc..9e64b5ea 100644 --- a/_freeze/Chapters/02-debugging/execute-results/html.json +++ b/_freeze/Chapters/02-debugging/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Debugging Zig applications\n\nBeing able to debug your programs is essential to any programmer who wants to\ndo serious programming in any language. That is why, in this chapter, we are going to talk about the\navailable strategies and tools to debug applications written in Zig.\n\n\n## Printing debugging\n\nWe begin with the classic and battle-tested *print debugging* strategy.\nThe key advantage that debugging offers you is *visibility*.\nWith *print statements* you can easily see what results and objects\nthat are being generated within your functions.\n\nThat is the essence of *print debugging*. 
Is to use\nprint expressions to see the values that are being generated by your program,\nand, as a result, get a much better understanding of how your program\nis behaving.\n\nMany programmers often resort to the print functions in Zig, such as the `stdout.print()`,\nor, the `std.debug.print()`, to get a better understanding of their programs.\nThis is an known and old strategy that is very simple and effective, and it is better known within\nthe programming community as *print debugging*.\nIn Zig, you can either print information to the `stdout` or `stderr` streams of your system.\n\nLet's begin with `stdout`. First, you\nneed to get access to the `stdout`, by calling the `getStdOut()` method, from\nthe Zig standard library. This method returns a *file descriptor* object,\nand, through this object you can read/write to the `stdout`.\nI recommend you to check out all methods available in this object, by [checking the page in\nthe Zig Standard Library Official Reference for the type `File`](https://ziglang.org/documentation/master/std/#std.fs.File)[^zig-fiile-reference].\n\n[^zig-fiile-reference]: .\n\nFor our purpose here, which is to write something to the `stdout`, specially to debug our\nprogram, I recommend you to use the `writer()` method, which gives your a *writer* object.\nThis *writer* object offers some helper methods to write stuff into the file descriptor object\nthat represents the `stdout` stream. 
In special, the `print()` method.\n\n\nThe `print()` method from this *writer* object is a \"print formatter\" type of a function.\nIn other words, this method works exactly like the `printf()` function from C,\nor, like `println!()` from Rust.\nIn the first argument of the function, you specify a template string, and,\nin the second argument, you provide a list of values (or objects) that you want to insert\ninto your template message.\n\nIdeally, the template string in the first argument should contain some format specifier.\nEach format specifier is matched to a value (or object) that you listed in the second argument.\nSo, if you provided 5 different objects in the second argument, then, the template string\nshould contain 5 format specifiers, one for each object provided.\n\nEach format specifier is represented by a single letter, and\nyou provide this format specifier inside a pair of curly braces. So, if you want to format\nyour object using the string specifier (`s`), then, you can insert the text `{s}` in your template string.\nHere is a quick list of the most used format specifiers:\n\n- `d`: for printing integers and floating-point numbers.\n- `c`: for printing characters.\n- `s`: for printing strings.\n- `p`: for printing memory addresses.\n- `x`: for printing hexadecimal values.\n- `any`: use any compatible format specifier (i.e. 
it automatically selects a format specifier for you).\n\nThe code example below, gives you an example of use of this `print()` method\nwith the `d` format specifier.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn add(x: u8, y: u8) u8 {\n return x + y;\n}\n\npub fn main() !void {\n const result = add(34, 16);\n _ = try stdout.print(\"Result: {d}\", .{result});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nResult: 50\n```\n\n\n:::\n:::\n\n\n\n\nIs important to emphasize that, the `stdout.print()` method, as you would expect,\nprint your template string into the `stdout` stream of your system.\nHowever, you can also print your template string into the `stderr` stream\nif your prefer. All you need to do, is to replace the `stdout.print()`\ncall with the function `std.debug.print()`. Like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add(x: u8, y: u8) u8 {\n return x + y;\n}\n\npub fn main() !void {\n const result = add(34, 16);\n std.debug.print(\"Result: {d}\\n\", .{result});\n}\n```\n:::\n\n\n\n\n```\nResult: 50\n```\n\n\n\n## Debugging through debuggers\n\nAlthough *print debugging* is a valid and very useful strategy,\nmost programmers prefer to use a debugger to debug their programs.\nSince Zig is a low-level language, you can use either GDB (GNU Debugger),\nor LLDB (LLVM Project Debugger) as your debugger.\n\nBoth debuggers can work with Zig code, and it's a matter of taste here.\nYou choose the debugger of your preference, and you work with it.\nIn this book, I will use LLDB as my debugger on the examples.\n\n\n### Compile your source code in Debug mode {#sec-compile-debug-mode}\n\nIn order to debug your program through a debugger, you must compile\nyour source code in Debug mode. 
Because when you compile your\nsource code in other modes (such as Release), the compiler usually\nstrips out some essential information that is used by the debugger\nto read and track your program, like PDB (*Program Database*) files.\n\nBy compiling your source code in Debug mode, you ensure that the debugger\nwill find the necessary information in your program to debug it.\nBy default, the compiler uses the Debug mode. Having this in mind,\nwhen you compile your program with the `build-exe`\ncommand (that we exposed at @sec-compile-code), if you don't specify an explicit mode through the `-O` command-line [^oargument]\nargument, then, the compiler will compile your code in Debug mode.\n\n[^oargument]: See .\n\n\n### Let's debug a program\n\nAs an example, let's debug some Zig code, and demonstrate\nhow can we use LLDB to navigate and check the following\npiece of Zig code:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\n\nfn add_and_increment(a: u8, b: u8) u8 {\n const sum = a + b;\n const incremented = sum + 1;\n return incremented;\n}\n\npub fn main() !void {\n var n = add_and_increment(2, 3);\n n = add_and_increment(n, n);\n _ = try stdout.print(\"Result: {d}!\\n\", .{n});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nResult: 13!\n```\n\n\n:::\n:::\n\n\n\n\nThere is nothing wrong with this program. But it is\na good start for us. 
First, we need to compile\nthis program with the `zig build-exe` command.\nFor this example, suppose that I compiled the above\nZig code into a binary executable called `add_program`.\n\n```bash\nzig build-exe add_program.zig\n```\n\nNow, we can start LLDB with `add_program`, like this:\n\n```bash\nlldb add_program\n```\n\nFrom now on, LLDB is started, and you can know that I'm\nexecuting LLDB commands by looking at the prefix `(lldb)`.\nIf something is prefixed with `(lldb)`, then you know\nthat it is a LLDB command.\n\nThe first thing I will do, is to set a breakpoint at\nthe `main()` function, by executing `b main`.\nAfter that, I just start the execution of the program\nwith `run`.\nYou can see in the output below, that the execution\nstopped at the first line in the function `main()`, as we expected.\n\n```bash\n(lldb) b main\nBreakpoint 1: where = debugging`debug1.main + 22\n at debug1.zig:11:30, address = 0x00000000010341a6\n(lldb) run\nProcess 8654 launched: 'add_program' (x86_64)\nProcess 8654 stopped\n* thread #1, name = 'add_program',\n stop reason = breakpoint 1.1 frame #0: 0x10341a6\n add_program`debug1.main at add_program.zig:11:30\n 8 \t}\n 9 \t\n 10 \tpub fn main() !void {\n-> 11 \t var n = add_and_increment(2, 3);\n 12 \t n = add_and_increment(n, n);\n 13 \t try stdout.print(\"Result: {d}!\\n\", .{n});\n 14 \t}\n```\n\nI can start to navigate through the code, and investigating the variables\nthat are being generated. If you not familiar with the commands\navailable in LLDB, I recommend you to read the official documentation\nof the project[^lldb].\nYou can also look for cheat sheets. Which quickly describes all commands\navailable for you, and, as a result, are also good resources for you[^lldb-quick-list].\n\n[^lldb]: \n[^lldb-quick-list]: .\n\nCurrently, we are in the first line at the `main()` function. 
In this line, we create\nthe `n` object, by executing the `add_and_increment()` function.\nTo execute the current line of code, and go to the next line, we can\nrun the `n` LLDB command. Let's execute this command.\n\nAfter we executed this line, we can also look at the value stored inside this `n` object\nby using the `p` LLDB command. The syntax for this command is `p `.\n\nIf we take a look at the value stored in the `n` object (`p n`),\nnotice that it stores the hexadecimal value `0x06`, which\nis the number 6 in decimal. We can also see that, this value have a type `unsigned char`,\nwhich is an unsigned 8-bit integer.\nWe have talked already about this at @sec-zig-strings, that `u8` integers in Zig are equivalent\nto the C data type `unsigned char`.\n\n\n\n```bash\n(lldb) n\nProcess 4798 stopped\n* thread #1, name = 'debugging',\n stop reason = step over frame #0: 0x10341ae\n debugging`debug1.main at debug1.zig:12:26\n 9 \t\n 10 \tpub fn main() !void {\n 11 \t var n = add_and_increment(2, 3);\n-> 12 \t n = add_and_increment(n, n);\n 13 \t try stdout.print(\"Result: {d}!\\n\", .{n});\n 14 \t}\n(lldb) p n\n(unsigned char) $1 = '\\x06'\n```\n\nNow, on the next line of code, we are executing the `add_and_increment()` function once again.\nWhy not step inside this function? Shall we? We can do that, by executing the `s` LLDB command.\nNotice in the example below that, after executing this command, we entered inside the context of the\n`add_and_increment()` function.\n\nAlso notice in the example below that, I walked two more lines in the functions body, then,\nI executed the `frame variable` LLDB command, to see at once, the value stored in each of the variables\nthat were created inside the current scope.\n\nYou can see in the output below that, the object `sum` stores the value `\\f`,\nwhich represents the *form feed* character. 
This character in the ASCII table,\ncorresponds to the hexadecimal value `0x0C`, or, in decimal, the number 12.\nSo, this means that the result of the expression `a + b` executed at line\n5, resulted in the number 12.\n\n```bash\n(lldb) s\nProcess 4798 stopped\n* thread #1, name = 'debugging',\n stop reason = step in frame #0: 0x10342de\n debugging`debug1.add_and_increment(a='\\x02', b='\\x03')\n at debug1.zig:4:39\n-> 4 \tfn add_and_increment(a: u8, b: u8) u8 {\n 5 \t const sum = a + b;\n 6 \t const incremented = sum + 1;\n 7 \t return incremented;\n(lldb) n\n(lldb) n\n(lldb) frame variable\n(unsigned char) a = '\\x06'\n(unsigned char) b = '\\x06'\n(unsigned char) sum = '\\f'\n(unsigned char) incremented = '\\x06'\n```\n\n\n\n## How to investigate the data type of your objects\n\nSince Zig is a strongly-typed language, the data types associated with your objects\nare very important for your program. So, debugging the data types associated\nwith your objects might be important to understand bugs and errors in your program.\n\nWhen you walk through your program with a debugger, you can inspect the types of\nyour objects by simply printing them to the console, with the LLDB `p` command.\nBut you also have alternatives embedded in the language itself to access the data\ntypes of your objects.\n\nIn Zig, you can retrieve the data type of an object, by using the built-in function\n`@TypeOf()`. 
Just apply this function over the object, and you get access to \nthe data type of the object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst expect = std.testing.expect;\n\npub fn main() !void {\n const number: i32 = 5;\n try expect(@TypeOf(number) == i32);\n try stdout.print(\"{any}\\n\", .{@TypeOf(number)});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\ni32\n```\n\n\n:::\n:::\n\n\n\n\nThis function is similar to the `type()` built-in function from Python,\nor, the `typeof` operator in Javascript.\n", - "supporting": [ - "02-debugging_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/03-structs/execute-results/html.json b/_freeze/Chapters/03-structs/execute-results/html.json index daf89ae8..4c5753b8 100644 --- a/_freeze/Chapters/03-structs/execute-results/html.json +++ b/_freeze/Chapters/03-structs/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Structs, Modules and Control Flow\n\nI introduced a lot of the Zig's syntax to you in the last chapter,\nspecially at @sec-root-file and @sec-main-file.\nBut we still need to discuss about some other very important\nelements of the language that you will use constantly on your day-to-day\nroutine.\n\nWe begin this chapter by talking about the different keywords and structures\nin Zig related to control flow (e.g. loops and if statements).\nThen, we talk about structs and how they can be used to do some\nbasic Object-Oriented (OOP) patterns in Zig. We also talk about\ntype inference, which help us to write less code and achieve the same results.\nFinally, we end this chapter by discussing modules, and how they relate\nto structs.\n\n\n\n## Control flow {#sec-zig-control-flow}\n\nSometimes, you need to make decisions in your program. 
Maybe you need to decide\nwether to execute or not a specific piece of code. Or maybe,\nyou need to apply the same operation over a sequence of values. These kinds of tasks,\ninvolve using structures that are capable of changing the \"control flow\" of our program.\n\nIn computer science, the term \"control flow\" usually refers to the order in which expressions (or commands)\nare evaluated in a given language or program. But this term is also used to refer\nto structures that are capable of changing this \"evaluation order\" of the commands\nexecuted by a given language/program.\n\nThese structures are better known\nby a set of terms, such as: loops, if/else statements, switch statements, among others. So,\nloops and if/else statements are examples of structures that can change the \"control\nflow\" of our program. The keywords `continue` and `break` are also examples of symbols\nthat can change the order of evaluation, since they can move our program to the next iteration\nof a loop, or make the loop stop completely.\n\n\n### If/else statements\n\nAn if/else statement performs an \"conditional flow operation\".\nA conditional flow control (or choice control) allows you to execute\nor ignore a certain block of commands based on a logical condition.\nMany programmers and computer science professionals also use\nthe term \"branching\" in this case.\nIn essence, we use if/else statements to use the result of a logical test\nto decide whether or not to execute a given block of commands.\n\nIn Zig, we write if/else statements by using the keywords `if` and `else`.\nWe start with the `if` keyword followed by a logical test inside a pair\nof parentheses, and then, a pair of curly braces with contains the lines\nof code to be executed in case the logical test returns the value `true`.\n\nAfter that, you can optionally add an `else` statement. 
Just add the `else`\nkeyword followed by a pair of curly braces, with the lines of code\nto be executed in case the logical test defined in the `if`\nreturns `false`.\n\nIn the example below, we are testing if the object `x` contains a number\nthat is greater than 10. Judging by the output printed to the console,\nwe know that this logical test returned `false`, because the output\nin the console is compatible with the line of code present in the\n`else` branch of the if/else statement.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst x = 5;\nif (x > 10) {\n try stdout.print(\n \"x > 10!\\n\", .{}\n );\n} else {\n try stdout.print(\n \"x <= 10!\\n\", .{}\n );\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nx <= 10!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Switch statements {#sec-switch}\n\nSwitch statements are also available in Zig.\nA switch statement in Zig has a similar syntax to a switch statement in Rust.\nAs you would expect, to write a switch statement in Zig we use the `switch` keyword.\nWe provide the value that we want to \"switch over\" inside a\npair of parentheses. Then, we list the possible combinations (or \"branches\")\ninside a pair of curly braces.\n\nLet's take a look at the code example below. You can see in this example that,\nI'm creating an enum type called `Role`. We talk more about enums at @sec-enum.\nBut in essence, this `Role` type is listing different types of roles in a fictitious\ncompany, like `SE` for Software Engineer, `DE` for Data Engineer, `PM` for Product Manager,\netc.\n\nNotice that we are using the value from the `role` object in the\nswitch statement, to discover which exact area we need to store in the `area` variable object.\nAlso notice that we are using type inference inside the switch statement, with the dot character,\nas we described at @sec-type-inference.\nThis makes the `zig` compiler infer the correct data type of the values (`PM`, `SE`, etc.)
for us.\n\nAlso notice that, we are grouping multiple values in the same branch of the switch statement.\nWe just separate each possible value with a comma. So, for example, if `role` contains either `DE` or `DA`,\nthe `area` variable would contain the value `\"Data & Analytics\"`, instead of `\"Platform\"`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Role = enum {\n SE, DPE, DE, DA, PM, PO, KS\n};\n\npub fn main() !void {\n var area: []const u8 = undefined;\n const role = Role.SE;\n switch (role) {\n .PM, .SE, .DPE, .PO => {\n area = \"Platform\";\n },\n .DE, .DA => {\n area = \"Data & Analytics\";\n },\n .KS => {\n area = \"Sales\";\n },\n }\n try stdout.print(\"{s}\\n\", .{area});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nPlatform\n```\n\n\n:::\n:::\n\n\n\n\nNow, one very important aspect about this switch statement presented\nin the code example above, is that it exhausts all existing possibilities.\nIn other words, all possible values that could be found inside the `role`\nobject are explicitly handled in this switch statement.\n\nSince the `role` object has type `Role`, the only possible values to\nbe found inside this object are `PM`, `SE`, `DPE`, `PO`, `DE`, `DA` and `KS`.\nThere is no other possible value to be stored in this `role` object.\nThis is what \"exhaust all existing possibilities\" means. The switch statement covers\nevery possible case.\n\nIn Zig, switch statements must exhaust all existing possibilities. You cannot write\na switch statement, and leave an edge case with no explicit action to be taken.\nThis is a similar behaviour to switch statements in Rust, which also have to\nhandle all possible cases.\n\nTake a look at the `dump_hex_fallible()` function below as an example.
This function\nalso comes from the Zig Standard Library, but this time, it comes from the [`debug.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/debug.zig)[^debug-mod].\nThere are multiple lines in this function, but I omitted them to focus solely on the\nswitch statement found in this function. Notice that this switch statement has four\npossible cases, or four explicit branches. Also, notice that we used an `else` branch\nin this case. Whenever you have multiple possible cases in your switch statement\nto which you want to apply the exact same action, you can use an `else` branch to do that.\n\n[^debug-mod]: \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn dump_hex_fallible(bytes: []const u8) !void {\n // Many lines ...\n switch (byte) {\n '\\n' => try writer.writeAll(\"␊\"),\n '\\r' => try writer.writeAll(\"␍\"),\n '\\t' => try writer.writeAll(\"␉\"),\n else => try writer.writeByte('.'),\n }\n}\n```\n:::\n\n\n\n\nMany users would also use an `else` branch to handle a \"not supported\" case.\nThat is, a case that cannot be properly handled by your code, or, just a case that\nshould not be \"fixed\". So many programmers use an `else` branch to panic (or raise an error) to stop\nthe current execution.\n\nTake the code example below as an example. We can see that we are handling the cases\nfor the `level` object being either 1, 2, or 3. All other possible cases are not supported by default,\nand, as a consequence, we raise a runtime error in these cases, through the `@panic()` built-in function.\n\nAlso notice that we are assigning the result of the switch statement to a new object called `category`.\nThis is another thing that you can do with switch statements in Zig. 
If the branches in this switch\nstatement output some value as result, you can store the result value of the switch statement into\na new object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst level: u8 = 4;\nconst category = switch (level) {\n 1, 2 => \"beginner\",\n 3 => \"professional\",\n else => {\n @panic(\"Not supported level!\");\n },\n};\ntry stdout.print(\"{s}\\n\", .{category});\n```\n:::\n\n\n\n\n```\nthread 13103 panic: Not supported level!\nt.zig:9:13: 0x1033c58 in main (switch2)\n @panic(\"Not supported level!\");\n ^\n```\n\nFurthermore, you can also use ranges of values in switch statements.\nThat is, you can create a branch in your switch statement that is used\nwhenever the input value is contained in a range. These range\nexpressions are created with the operator `...`. It is important\nto emphasize that the ranges created by this operator are\ninclusive on both ends.\n\nFor example, I could easily change the code example above to support all\nlevels between 0 and 100. Like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst level: u8 = 4;\nconst category = switch (level) {\n 0...25 => \"beginner\",\n 26...75 => \"intermediary\",\n 76...100 => \"professional\",\n else => {\n @panic(\"Not supported level!\");\n },\n};\ntry stdout.print(\"{s}\\n\", .{category});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nbeginner\n```\n\n\n:::\n:::\n\n\n\n\nThis is neat, and it works with character ranges too. That is, I could\nsimply write `'a'...'z'`, to match any character value that is a\nlowercase letter, and it would work fine.\n\n\n\n### The `defer` keyword {#sec-defer}\n\nWith the `defer` keyword you can execute expressions at the end of the current scope.\nTake the `foo()` function below as an example. 
When we execute this function, the expression\nthat prints the message \"Exiting function ...\" gets executed only at\nthe end of the function scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn foo() !void {\n defer std.debug.print(\n \"Exiting function ...\\n\", .{}\n );\n try stdout.print(\"Adding some numbers ...\\n\", .{});\n const x = 2 + 2; _ = x;\n try stdout.print(\"Multiplying ...\\n\", .{});\n const y = 2 * 8; _ = y;\n}\n\npub fn main() !void {\n try foo();\n}\n```\n:::\n\n\n\n\n```\nAdding some numbers ...\nMultiplying ...\nExiting function ...\n```\n\nIt doesn't matter how the function exits (i.e. because\nof an error, or, because of a return statement, or whatever),\njust remember, this expression gets executed when the function exits.\n\n\n\n\n### For loops\n\nA loop allows you to execute the same lines of code multiple times,\nthus, creating a \"repetition space\" in the execution flow of your program.\nLoops are particularly useful when we want to replicate the same function\n(or the same set of commands) over several different inputs.\n\nThere are different types of loops available in Zig. But the most\nessential of them all is probably the *for loop*. A for loop is\nused to apply the same piece of code over the elements of a slice or an array.\n\nFor loops in Zig have a slightly different syntax than you are\nprobably used to seeing in other languages. You start with the `for` keyword, then, you\nlist the items that you want to iterate\nover inside a pair of parentheses. Then, inside of a pair of pipes (`|`)\nyou should declare an identifier that will serve as your iterator, or,\nthe \"repetition index of the loop\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (items) |value| {\n // code to execute\n}\n```\n:::\n\n\n\n\nInstead of using a `(value in items)` syntax,\nin Zig, for loops use the syntax `(items) |value|`. 
In the example\nbelow, you can see that we are looping through the items\nof the array stored in the object `name`, and printing to the\nconsole the decimal representation of each character in this array.\n\nIf we wanted, we could also iterate through a slice (or a portion) of\nthe array, instead of iterating through the entire array stored in the `name` object.\nJust use a range selector to select the section you want. For example,\nI could provide the expression `name[0..3]` to the for loop, to iterate\njust through the first 3 elements in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name = [_]u8{'P','e','d','r','o'};\nfor (name) |char| {\n try stdout.print(\"{d} | \", .{char});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n80 | 101 | 100 | 114 | 111 | \n```\n\n\n:::\n:::\n\n\n\n\nIn the above example we are using the value itself of each\nelement in the array as our iterator. But there are many situations where\nwe need to use an index instead of the actual values of the items.\n\nYou can do that by providing a second set of items to iterate over.\nMore precisely, you provide the range selector `0..` to the for loop. So,\nyes, you can use two different iterators at the same time in a for\nloop in Zig.\n\nBut remember from @sec-assignments that every object\nyou create in Zig must be used in some way. So if you declare two iterators\nin your for loop, you must use both iterators inside the for loop body.\nBut if you want to use just the index iterator, and not use the \"value iterator\",\nthen, you can discard the value iterator by matching the\nvalue items to the underscore character, like in the example below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (name, 0..) |_, i| {\n try stdout.print(\"{d} | \", .{i});\n}\n```\n:::\n\n\n\n\n```\n0 | 1 | 2 | 3 | 4 |\n```\n\n\n### While loops\n\nA while loop is created from the `while` keyword. 
A `for` loop\niterates through the items of an array, but a `while` loop\nwill loop continuously, and infinitely, until a logical test\n(specified by you) becomes false.\n\nYou start with the `while` keyword, then, you define a logical\nexpression inside a pair of parentheses, and the body of the\nloop is provided inside a pair of curly braces, like in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: u8 = 1;\nwhile (i < 5) {\n try stdout.print(\"{d} | \", .{i});\n i += 1;\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 2 | 3 | 4 | \n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Using `break` and `continue`\n\nIn Zig, you can explicitly stop the execution of a loop, or, jump to the next iteration of the loop, using\nthe keywords `break` and `continue`, respectively. The `while` loop present in the example below is,\nat first sight, an infinite loop, because the logical value inside the parentheses will always be equal to `true`.\nWhat makes this `while` loop stop when the `i` object reaches the count\n10? It is the `break` keyword!\n\nInside the while loop, we have an if statement that is constantly checking if the `i` variable\nis equal to 10. Since we are increasing the value of this `i` variable at each iteration of the\nwhile loop, at some point, this `i` variable will be equal to 10, and when it is, the if statement\nwill execute the `break` expression, and, as a result, the execution of the while loop is stopped.\n\nNotice the `expect()` function from the Zig standard library after the while loop.\nThis `expect()` function is an \"assert\" type of function.\nThis function checks if the logical test provided is equal to true. If this logical test is false,\nthe function raises an assertion error. 
But if it is equal to true, then, the function will do nothing.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: usize = 0;\nwhile (true) {\n if (i == 10) {\n break;\n }\n i += 1;\n}\ntry std.testing.expect(i == 10);\ntry stdout.print(\"Everything worked!\", .{});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nEverything worked!\n```\n\n\n:::\n:::\n\n\n\n\nSince this code example was executed successfully by the `zig` compiler,\nwithout raising any errors, then, we know that, after the execution of the while loop,\nthe `i` variable is equal to 10. Because if it wasn't equal to 10, then, an error would\nbe raised by `expect()`.\n\nNow, in the next example, we have a use case for\nthe `continue` keyword. The if statement is constantly\nchecking if the current index is a multiple of 2. If\nit is, then we jump to the next iteration of the loop\ndirectly. But if the current index is not a multiple of 2,\nthen, the loop will simply print this index to the console.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [_]u8{1,2,3,4,5,6};\nfor (ns) |i| {\n if ((i % 2) == 0) {\n continue;\n }\n try stdout.print(\"{d} | \", .{i});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 3 | 5 | \n```\n\n\n:::\n:::\n\n\n\n\n\n\n## Function parameters are immutable {#sec-fun-pars}\n\nWe have already discussed a lot of the syntax behind function declarations at @sec-root-file and @sec-main-file.\nBut I want to emphasize a curious fact about function parameters (a.k.a. function arguments) in Zig.\nIn summary, function parameters are immutable in Zig.\n\nTake the code example below, where we declare a simple function that just tries to add\nsome amount to the input integer, and returns the result back. 
But if you look closely\nat the body of this `add2()` function, you will notice that we try\nto save the result back into the `x` function argument.\n\nIn other words, this function not only uses the value that it received through the function argument\n`x`, but it also tries to change the value of this function argument, by assigning the addition result\ninto `x`. However, function arguments in Zig are immutable. You cannot change their values, or, you\ncannot assign values to them inside the function's body.\n\nThis is the reason why the code example below does not compile successfully. If you try to compile\nthis code example, you get a compile error warning you that you are trying to change the value of an\nimmutable (i.e. constant) object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add2(x: u32) u32 {\n x = x + 2;\n return x;\n}\n\npub fn main() !void {\n const y = add2(4);\n std.debug.print(\"{d}\\n\", .{y});\n}\n```\n:::\n\n\n\n\n```\nt.zig:3:5: error: cannot assign to constant\n x = x + 2;\n ^\n```\n\n\nIf a function argument receives as input an object whose data type is\nany of the primitive types that we have listed at @sec-primitive-data-types,\nthis object is always passed by value to the function. In other words, this object\nis copied to the function stack frame.\n\nHowever, if the input object has a more complex data type, for example, it might\nbe a struct instance, or an array, or a union, etc., in cases like that, the `zig` compiler\nwill take the liberty of deciding for you which strategy is best. The `zig` compiler will\npass your object to the function either by value, or by reference. The compiler will always\nchoose the strategy that is faster for you.\nThis optimization that you get for free is possible only because function arguments are\nimmutable in Zig.\n\nTo overcome this barrier, we need to take the lead, and explicitly choose to pass the\nobject by reference. 
That is, instead of depending on the `zig` compiler to decide for us, we need\nto explicitly mark the function argument as a pointer. This way, we are telling the compiler\nthat this function argument will be passed by reference to the function.\n\nBy making it a pointer, we can finally use and alter directly the value of this function argument inside\nthe body of the `add2()` function. You can see that the code example below compiles successfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add2(x: *u32) void {\n const d: u32 = 2;\n x.* = x.* + d;\n}\n\npub fn main() !void {\n var x: u32 = 4;\n add2(&x);\n std.debug.print(\"Result: {d}\\n\", .{x});\n}\n```\n:::\n\n\n\n\n```\nResult: 6\n```\n\n\n\n## Structs and OOP {#sec-structs-and-oop}\n\nZig is a language more closely related to C (which is a procedural language),\nthan it is to C++ or Java (which are object-oriented languages). Because of that, you do not\nhave advanced OOP (Object-Oriented Programming) patterns available in Zig, such as classes, interfaces or\nclass inheritance. Nonetheless, OOP in Zig is still possible by using struct definitions.\n\nWith struct definitions, you can create (or define) a new data type in Zig. These struct definitions work the same way as they work in C.\nYou give a name to this new struct (or, to this new data type you are creating), then, you list the data members of this new struct. 
You can\nalso register functions inside this struct, and they become the methods of this particular struct (or data type), so that every object\nthat you create with this new type will always have these methods available and associated with it.\n\nIn C++, when we create a new class, we normally have a constructor method (or, a constructor function) to construct or to instantiate every object\nof this particular class, and you also have a destructor method (or a destructor function) that\nis the function responsible for destroying every object of this class.\n\nIn Zig, we normally declare the constructor and the destructor methods\nof our structs, by declaring an `init()` and a `deinit()` method inside the struct.\nThis is just a naming convention that you will find across the entire Zig standard library.\nSo, in Zig, the `init()` method of a struct is normally the constructor method of the class represented by this struct.\nWhile the `deinit()` method is the method used for destroying an existing instance of that struct.\n\nThe `init()` and `deinit()` methods are both used extensively in Zig code, and you will see both of\nthem being used when we talk about allocators at @sec-allocators.\nBut, as another example, let's build a simple `User` struct to represent a user of some sort of system.\nIf you look at the `User` struct below, you can see the `struct` keyword, and inside of a\npair of curly braces, we write the struct's body.\n\nNotice the data members of this struct, `id`, `name` and `email`. 
Every data member has its\ntype explicitly annotated, with the colon character (`:`) syntax that we described earlier at @sec-root-file.\nBut also notice that every line in the struct body that describes a data member, ends with a comma character (`,`).\nSo every time you declare a data member in your Zig code, always end the line with a comma character, instead\nof ending it with the traditional semicolon character (`;`).\n\nNext, also notice in this example, that we registered an `init()` function as a method\nof this `User` struct. This `init()` method is the constructor method that you use to instantiate\nevery new `User` object. That is why this `init()` function returns a `User` object as result.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst User = struct {\n id: u64,\n name: []const u8,\n email: []const u8,\n\n pub fn init(id: u64,\n name: []const u8,\n email: []const u8) User {\n\n return User {\n .id = id,\n .name = name,\n .email = email\n };\n }\n\n pub fn print_name(self: User) !void {\n try stdout.print(\"{s}\\n\", .{self.name});\n }\n};\n\npub fn main() !void {\n const u = User.init(1, \"pedro\", \"email@gmail.com\");\n try u.print_name();\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\npedro\n```\n\n\n:::\n:::\n\n\n\n\nThe `pub` keyword plays an important role in struct declarations, and OOP in Zig.\nEvery method that you declare in your struct that is marked with the keyword `pub`,\nbecomes a public method of this particular struct.\n\nSo every method that you create in your struct, is, at first, a private method\nof that struct. Meaning that, this method can only be called from within the\nmodule (i.e. the `.zig` file) where this struct is declared. 
But, if you mark this method as public, with the keyword `pub`, then,\nyou can also call the method from the `User` objects you have\nin other modules of your code.\n\nIn other words, the functions marked by the keyword `pub`\nare members of the public API of that struct.\nFor example, if I did not mark the `print_name()` method as public,\nthen, code in another module that imports `User` could not execute the line `u.print_name()`. Because it would\nnot be authorized to call this method.\n\n\n\n### Anonymous struct literals {#sec-anonymous-struct-literals}\n\nYou can declare a struct object as a literal value. When we do that, we normally specify the\ndata type of this struct literal by writing its data type just before the opening curly brace.\nFor example, I could write a struct literal of type `User` that we defined in the previous section like\nthis:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst eu = User {\n .id = 1,\n .name = \"Pedro\",\n .email = \"someemail@gmail.com\"\n};\n_ = eu;\n```\n:::\n\n\n\n\nHowever, in Zig, we can also write an anonymous struct literal. That is, you can write a\nstruct literal, but not specify explicitly the type of this particular struct.\nAn anonymous struct is written by using the syntax `.{}`. So, we essentially\nreplace the explicit type of the struct literal with a dot character (`.`).\n\nAs we described at @sec-type-inference, when you put a dot before a struct literal,\nthe type of this struct literal is automatically inferred by the `zig` compiler.\nIn essence, the `zig` compiler will look for some hint of what the type of that struct is.\nIt can be the type annotation of a function argument,\nor the return type annotation of the function that you are using, or the type annotation\nof a variable.\nIf the compiler does find such a type annotation, then, it will use this\ntype in your literal struct. 
\n\nAnonymous structs are very common to use in function arguments in Zig.\nOne example that you have already seen many times is the `print()`\nfunction from the `stdout` object.\nThis function takes two arguments.\nThe first argument is a template string, which should\ncontain string format specifiers in it, which tell how the values provided\nin the second argument should be printed into the message.\n\nThe second argument is a struct literal that lists the values\nto be printed into the template message specified in the first argument.\nYou normally want to use an anonymous struct literal here, so that the\n`zig` compiler does the job of specifying the type of this particular\nanonymous struct for you.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn main() !void {\n const stdout = std.io.getStdOut().writer();\n try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello, world!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Struct declarations must be constant\n\nTypes in Zig must be `const` or `comptime` (we are going to talk more about comptime at @sec-comptime).\nWhat this means is that you cannot create a new data type, and mark it as variable with the `var` keyword.\nSo struct declarations are always constant. 
You cannot declare a new struct using the `var` keyword.\nIt must be `const`.\n\nIn the `Vec3` example below, this declaration is allowed because I'm using the `const` keyword\nto declare this new data type.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Vec3 = struct {\n x: f64,\n y: f64,\n z: f64,\n};\n```\n:::\n\n\n\n\n\n### The `self` method argument {#sec-self-arg}\n\nIn every language that has OOP, when we declare a method of some class or struct, we\nusually declare this method as a function that has a `self` argument.\nThis `self` argument is the reference to the object itself from which the method\nis being called.\n\nIt is not mandatory to use this `self` argument. But why would you not use this `self` argument?\nThere is no reason to not use it. Because the only way to get access to the data stored in the\ndata members of your struct is to access them through this `self` argument.\nIf you don't need to use the data in the data members of your struct inside your method, then, you very likely don't need\na method, you can just simply declare this logic as a simple function, outside of your\nstruct declaration.\n\n\nTake the `Vec3` struct below. Inside this `Vec3` struct we declared a method named `distance()`.\nThis method calculates the distance between two `Vec3` objects, by following the distance\nformula in euclidean space. Notice that this `distance()` method takes two `Vec3` objects\nas input, `self` and `other`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst m = std.math;\nconst Vec3 = struct {\n x: f64,\n y: f64,\n z: f64,\n\n pub fn distance(self: Vec3, other: Vec3) f64 {\n const xd = m.pow(f64, self.x - other.x, 2.0);\n const yd = m.pow(f64, self.y - other.y, 2.0);\n const zd = m.pow(f64, self.z - other.z, 2.0);\n return m.sqrt(xd + yd + zd);\n }\n};\n```\n:::\n\n\n\n\n\nThe `self` argument corresponds to the `Vec3` object from which this `distance()` method\nis being called. 
The `other` argument is a separate `Vec3` object that is given as input\nto this method. In the example below, the `self` argument corresponds to the object\n`v1`, because the `distance()` method is being called from the `v1` object,\nwhile the `other` argument corresponds to the object `v2`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst v1 = Vec3 {\n .x = 4.2, .y = 2.4, .z = 0.9\n};\nconst v2 = Vec3 {\n .x = 5.1, .y = 5.6, .z = 1.6\n};\n\nstd.debug.print(\n \"Distance: {d}\\n\",\n .{v1.distance(v2)}\n);\n```\n:::\n\n\n\n\n```\nDistance: 3.3970575502926055\n```\n\n\n\n### About the struct state\n\nSometimes you don't need to care about the state of your struct object. Sometimes, you just need\nto instantiate and use the objects, without altering their state. You can notice that when you have methods\ninside your struct declaration that might use the values that are present in the data members, but they\ndo not alter the values in the data members of the struct in any way.\n\nThe `Vec3` struct that was presented at @sec-self-arg is an example of that.\nThis struct has a single method named `distance()`, and this method does use the values\npresent in all three data members of the struct (`x`, `y` and `z`). But at the same time,\nthis method does not change the values of these data members at any point.\n\nAs a result of that, when we create `Vec3` objects we usually create them as\nconstant objects, like the `v1` and `v2` objects presented at @sec-self-arg.\nWe can create them as variable objects with the `var` keyword,\nif we want to. But because the methods of this `Vec3` struct do not change\nthe state of the objects at any point, it is unnecessary to mark them\nas variable objects.\n\nBut why? Why am I talking about this here? 
It is because the `self` argument\nin the methods is affected depending on whether the\nmethods present in a struct change or not the state of the object itself.\nMore specifically, when you have a method in a struct that changes the state\nof the object (i.e. changes the value of a data member), the `self` argument\nin this method must be annotated in a different manner.\n\nAs I described at @sec-self-arg, the `self` argument in methods of\na struct is the argument that receives as input the object from which the method\nwas called. We usually annotate this argument in the methods by writing `self`,\nfollowed by the colon character (`:`), and the data type of the struct to which\nthe method belongs (e.g. `User`, `Vec3`, etc.).\n\nIf we take the `Vec3` struct that we defined in the previous section as an example,\nwe can see in the `distance()` method that this `self` argument is annotated as\n`self: Vec3`. Because the state of the `Vec3` object is never altered by this\nmethod.\n\nBut what if we do have a method that alters the state of the object, by altering the\nvalues of its data members? How should we annotate `self` in this instance? The answer is:\n\"we should annotate `self` as a pointer to `x`, instead of just `x`\".\nIn other words, you should annotate `self` as `self: *x`, instead of annotating it\nas `self: x`.\n\nIf we create a new method inside the `Vec3` struct that, for example, expands the\nvector by multiplying its coordinates by a factor of two, then, we need to follow\nthis rule specified in the previous paragraph. 
The code example below demonstrates\nthis idea:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst m = std.math;\nconst Vec3 = struct {\n x: f64,\n y: f64,\n z: f64,\n\n pub fn distance(self: Vec3, other: Vec3) f64 {\n const xd = m.pow(f64, self.x - other.x, 2.0);\n const yd = m.pow(f64, self.y - other.y, 2.0);\n const zd = m.pow(f64, self.z - other.z, 2.0);\n return m.sqrt(xd + yd + zd);\n }\n\n pub fn double(self: *Vec3) void {\n self.x = self.x * 2.0;\n self.y = self.y * 2.0;\n self.z = self.z * 2.0;\n }\n};\n```\n:::\n\n\n\n\nNotice in the code example above that we have added a new method\nto our `Vec3` struct named `double()`. This method essentially doubles the\ncoordinate values of our vector object. Also notice that, in the\ncase of the `double()` method, we annotated the `self` argument as `*Vec3`,\nindicating that this argument receives a pointer (or a reference, if you prefer to call it this way)\nto a `Vec3` object as input.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar v3 = Vec3 {\n .x = 4.2, .y = 2.4, .z = 0.9\n};\nv3.double();\nstd.debug.print(\"Doubled: {d}\\n\", .{v3.x});\n```\n:::\n\n\n\n\n```\nDoubled: 8.4\n```\n\n\n\nNow, if you change the `self` argument in this `double()` method to `self: Vec3`, like in the\n`distance()` method, you will get the compiler error exposed below as result. Notice that this\nerror message is indicating a line from the `double()` method body,\nindicating that you cannot alter the value of the `x` data member.\n\n```zig\n// If we change the function signature of double to:\n pub fn double(self: Vec3) void {\n```\n\nThis error message indicates that the `x` data member belongs to a constant object,\nand, because of that, it cannot be changed. 
Ultimately, this error message\nis telling us that the `self` argument is constant.\n\n```\nt.zig:16:13: error: cannot assign to constant\n self.x = self.x * 2.0;\n ~~~~^~\n```\n\nIf you take some time, and think hard about this error message, you will understand it.\nYou already have the tools to understand why we are getting this error message.\nWe have talked about it already at @sec-fun-pars.\nSo remember, every function argument is immutable in Zig, and `self`\nis included in this rule.\n\nIt does not matter if the object that you pass as input to the function argument is\na variable object or not. In this example, we marked the `v3` object as a variable object.\nBut this does not matter. Because it is not about the input object, it is about\nthe function argument.\n\nThe problem begins when we try to alter the value of `self` directly, which is a function argument,\nand, every function argument is immutable by default. You may ask yourself how we can overcome\nthis barrier, and once again, the solution was also discussed at @sec-fun-pars.\nWe overcome this barrier by explicitly marking the `self` argument as a pointer.\n\n\n::: {.callout-note}\nIf a method of your `x` struct alters the state of the object, by\nchanging the value of any data member, then, remember to use `self: *x`,\ninstead of `self: x` in the function signature of this method.\n:::\n\n\nYou could also interpret the content discussed in this section as:\n\"if you need to alter the state of your `x` struct object in one of its methods,\nyou must explicitly pass the `x` struct object by reference to the `self` argument of this method\".\n\n\n\n## Type inference {#sec-type-inference}\n\nZig is kind of a strongly typed language. 
I say \"kind of\" because there are situations\nwhere you don't have to explicitly write the type of every single object in your source code,\nas you would expect from a traditional strongly typed language, such as C and C++.\n\nIn some situations, the `zig` compiler can use type inference to solve the data types for you, easing some of\nthe burden that you carry as a developer.\nThe most common way this happens is through function arguments that receive struct objects\nas input.\n\nIn general, type inference in Zig is done by using the dot character (`.`).\nEvery time you see a dot character written before a struct literal, or before an enum value, or something like that,\nyou know that this dot character is playing a special role in this place. More specifically, it is\ntelling the `zig` compiler something along the lines of: \"Hey! Can you infer the type of this\nvalue for me? Please!\". In other words, this dot character is playing a role similar to the `auto` keyword in C++.\n\nI gave you some examples of this at @sec-anonymous-struct-literals, where we presented anonymous struct literals.\nAnonymous struct literals are, essentially, struct literals that use type inference to\ninfer the exact type of this particular struct literal.\nThis type inference is done by looking for some minimal hint of the correct data type to be used.\nYou could say that the `zig` compiler looks for any neighbouring type annotation that might tell it\nwhat would be the correct type.\n\nAnother common place where we use type inference in Zig is in switch statements (which we talk about at @sec-switch).\nSo I also gave some other examples of type inference at @sec-switch, where we were inferring the data types of enum values listed inside\nof switch statements (e.g. 
`.DE`).\nBut as another example, take a look at this `fence()` function reproduced below,\nwhich comes from the [`atomic.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/atomic.zig)[^fence-fn]\nof the Zig Standard Library.\n\n[^fence-fn]: .\n\nThere are a lot of things in this function that we haven't talked about yet, such as:\nwhat does `comptime` mean? `inline`? `extern`? What is this star symbol before `Self`?\nLet's just ignore all of these things, and focus solely on the switch statement\nthat is inside this function.\n\nWe can see that this switch statement uses the `order` object as input. This `order`\nobject is one of the inputs of this `fence()` function, and we can see in the type annotation,\nthat this object is of type `AtomicOrder`. We can also see a bunch of values inside the\nswitch statement that begin with a dot character, such as `.release` and `.acquire`.\n\nBecause these weird values contain a dot character before them, we are asking the `zig`\ncompiler to infer the types of these values inside the switch statement. 
Then, the `zig`\ncompiler is looking into the current context where these values are being used, and it is\ntrying to infer the types of these values.\n\nSince they are being used inside a switch statement, the `zig` compiler looks into the type\nof the input object given to the switch statement, which is the `order` object in this case.\nBecause this object have type `AtomicOrder`, the `zig` compiler infers that these values\nare data members from this type `AtomicOrder`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub inline fn fence(self: *Self, comptime order: AtomicOrder) void {\n // LLVM's ThreadSanitizer doesn't support the normal fences so we specialize for it.\n if (builtin.sanitize_thread) {\n const tsan = struct {\n extern \"c\" fn __tsan_acquire(addr: *anyopaque) void;\n extern \"c\" fn __tsan_release(addr: *anyopaque) void;\n };\n\n const addr: *anyopaque = self;\n return switch (order) {\n .unordered, .monotonic => @compileError(@tagName(order) ++ \" only applies to atomic loads and stores\"),\n .acquire => tsan.__tsan_acquire(addr),\n .release => tsan.__tsan_release(addr),\n .acq_rel, .seq_cst => {\n tsan.__tsan_acquire(addr);\n tsan.__tsan_release(addr);\n },\n };\n }\n\n return @fence(order);\n}\n```\n:::\n\n\n\n\nThis is how basic type inference is done in Zig. If we didn't use the dot character before\nthe values inside this switch statement, then, we would be forced to write explicitly\nthe data types of these values. For example, instead of writing `.release` we would have to\nwrite `AtomicOrder.release`. We would have to do this for every single value\nin this switch statement, and this is a lot of work. 
That is why type inference\nis commonly used on switch statements in Zig.\n\n\n\n## Modules\n\nWe already talked about what modules are, and also, how to import other modules into\nyour current module through *import statements*, so that you can use functionality from these other modules in\nyour current module.\nBut in this section, I just want to make it clear that modules are actually structs in Zig.\n\nIn other words, every Zig module (i.e. a `.zig` file) that you write in your project\nis internally stored as a struct object.\nTake the line shown below as an example. In this line we are importing the\nZig Standard Library into our current module.\n\n```zig\nconst std = @import(\"std\");\n```\n\nWhen we want to access the functions and objects from the standard library, we\nare basically accessing the data members of the struct stored in the `std`\nobject. That is why we use the same syntax that we use in normal structs, with the dot operator (`.`)\nto access the data members and methods of the struct.\n\nWhen this \"import statement\" gets executed, the result of this expression is a struct\nobject that contains the Zig Standard Library modules, global variables, functions, etc.\nAnd this struct object gets saved (or stored) inside the constant object named `std`.\n\n\nTake the [`thread_pool.zig` module from the project `zap`](https://github.com/kprotty/zap/blob/blog/src/thread_pool.zig)[^thread]\nas an example. This module is written as if it were\na big struct. That is why we have a top-level and public `init()` method\nwritten in this module. The idea is that all top-level functions written in this\nmodule are methods from the struct, and all top-level objects and struct declarations\nare data members of this struct. 
The module is the struct itself.\n\n[^thread]: \n\n\nSo you would import and use this module by doing something like this:\n\n```zig\nconst std = @import(\"std\");\nconst ThreadPool = @import(\"thread_pool.zig\");\nconst num_cpus = std.Thread.getCpuCount()\n catch @panic(\"failed to get cpu core count\");\nconst num_threads = std.math.cast(u16, num_cpus)\n catch std.math.maxInt(u16);\nconst pool = ThreadPool.init(\n .{ .max_threads = num_threads }\n);\n```\n", - "supporting": [ - "03-structs_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/03-unittests/execute-results/html.json b/_freeze/Chapters/03-unittests/execute-results/html.json index d21da93e..0ddb2bab 100644 --- a/_freeze/Chapters/03-unittests/execute-results/html.json +++ b/_freeze/Chapters/03-unittests/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Unit tests {#sec-unittests}\n\nIn this chapter, I want to dive in on how unit tests are developed in\nZig. We are going to talk about what is the testing wokflow in Zig, and\nalso, about the `test` command from the `zig` compiler.\n\n\n## Introducing the `test` block\n\nIn Zig, unit tests are written inside a `test` declaration, or, how I prefer to call it, inside a `test` block.\nEvery `test` block is written by using the keyword `test`.\nYou can optionally use a string literal to write a label, which is responsible for identifying\nthe specific group of unit tests that you are writing inside this specific `test` block.\n\nIn the example below, we are testing if the sum of two objects (`a` and `b`)\nis equal to 4. The `expect()` function from the Zig Standard Library\nis a function that receives a logical test as input. If this logical test\nresults in `true`, then, the test passes. 
But if it results\nin `false`, then, the test fails.\n\nYou can write any Zig code you want inside of each `test` block.\nPart of this code might be some necessary commands to setup your testing\nenvironment, or just initializing some necessary objects.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest \"testing simple sum\" {\n const a: u8 = 2;\n const b: u8 = 2;\n try expect((a + b) == 4);\n}\n```\n:::\n\n\n\n\nYou can have multiple `test` blocks written on the same Zig module.\nAlso, you can mix `test` blocks with your source code, with no problems\nor consequences. If you mix `test` blocks with your normal source code,\nwhen you execute the `build`, `build-exe`, `build-obj` or `build-lib` commands from the\n`zig` compiler that we exposed at @sec-compile-code, these `test` blocks are automatically ignored by the\ncompiler.\n\nIn other words, the `zig` compiler only builds and execute your tests\nwhen you ask it to. By default, the compiler always ignore `test`\nblocks written in your Zig modules. The compiler normally checks only if\nthere are any syntax errors in these `test` blocks.\n\nIf you look at the source code for most of the files present in the\nZig Standard Library[^zig-std-lib], you can see that the `test` blocks\nare written together with the normal source code of the library.\nYou can see this for example, at the [`array_list` module](https://github.com/ziglang/zig/blob/master/lib/std/array_list.zig)[^zig-array].\nSo, the standard that the Zig developers decided to adopt\nis to keep their unit tests together with the source code\nof the functionality that they are testing.\n\nEach programmer might have a different opinion on this.\nSome of them might prefer to keep unit tests separate from the actual\nsource code of their application. 
If that is your case, you can\nsimply create a separate `tests` folder in your project, and\nstart writing Zig modules that contains only unit tests (as would normally do on a Python project with `pytest`, for example),\nand everything will work fine.\nIt boils down to which is your preference here.\n\n[^zig-std-lib]: \n[^zig-array]: \n\n\n## How to run your tests\n\nIf the `zig` compiler ignores any `test` block by default, how can\nyou compile and run your unit tests? The answer is the `test` command from\nthe `zig` compiler. By running `zig test` command, the compiler will\nfind every instance of `test` block in your Zig module, and, it will\ncompile and run the unit tests you wrote.\n\n\n```bash\nzig test simple_sum.zig\n```\n\n```\n1/1 simple_sum.test.testing simple sum... OK\nAll 1 tests passed.\n```\n\n\n## Testing memory allocations\n\nOne of the advantages of Zig is that it offers great tools\nthat hep us, programmers, to avoid (but also detect) memory problems, such as\nmemory leaks and double-frees. The `defer` keyword\nis specially helpful in this regard.\n\nWhen developing your\nsource code, you, the programmer, is responsible for making\nsure that your code do not produce such problems. 
However,\nyou can also use a special type of allocator object in Zig,\nthat is capable of automatically detect such problems for you.\nThis is the `std.testing.allocator` object.\nThis allocator object offers some basic memory safety detection\nfeatures, which are capable of detecting memory leaks.\n\nAs we described at @sec-heap, to allocate memory on the heap, you need\nto use an allocator object, and your functions that use these objects to allocate memory\non the heap, should receive an allocator object as one of it's inputs.\nEvery memory on the heap that you allocate using these allocator objects,\nmust also be freed using this same allocator object.\n\nSo, if you want to test the memory allocations performed by your functions,\nand make sure that you don't have problems in these allocations, you can simply\nwrite unit tests for these functions, where you provide the\n`std.testing.allocator` object as input to these functions.\n\nLook at the example below, where I'm defining a function that clearly causes\na memory leak. Because we allocate memory with the allocator object,\nbut we do not free this allocated memory in any point. So, when the function\nreturns, we lose the reference to the `buffer` object, which contains\nthe allocated memory, and, as a result, we can no longer free this memory.\n\nNotice that, inside a `test` block I execute this function with\nthe `std.testing.allocator`. Since no visible errors were raised inside\nthe `test` block, the `zig` compiler completes the process indicating that\nthe unit tests performed inside the `test` block labeled as `\"memory leak`\nhave all passed. But despite this result, the allocator object was capable\nof looking deeper in our program, and detecting the memory leak. 
As a result,\nthis allocator object returns a message \"tests leaked memory\", and also,\na stack trace showing the exact point where the memory was leaked.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Allocator = std.mem.Allocator;\nfn some_memory_leak(allocator: Allocator) !void {\n const buffer = try allocator.alloc(u32, 10);\n _ = buffer;\n // Return without freeing the\n // allocated memory\n}\n\ntest \"memory leak\" {\n const allocator = std.testing.allocator;\n try some_memory_leak(allocator);\n}\n```\n:::\n\n\n\n\n```\nTest [1/1] leak_memory.test.memory leak...\n [gpa] (err): memory address 0x7c1fddf39000 leaked: \n./ZigExamples/debugging/leak_memory.zig:4:39: 0x10395f2\n const buffer = try allocator.alloc(u32, 10);\n ^\n./ZigExamples/debugging/leak_memory.zig:12:25: 0x10398ea\n try some_memory_leak(allocator);\n\n... more stack trace\n```\n\n\n## Testing errors\n\nOne commom style of unit tests are those that look for\nspecific errors in your functions. In other words, you write\na unit test that tries to assert if a specific function call\nreturns any error, or a specific type of error.\n\nIn C++ you would normally write this stye of unit test using, for example,\nthe functions `REQUIRE_THROWS()` or `CHECK_THROWS()` from the [`Catch2` test framework](https://github.com/catchorg/Catch2/tree/devel)[^catch2].\nIn the case of a Python project, you would use the\n[`raises()` function from `pytest`](https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest-raises)[^pytest].\nWhile in Rust, you would probably use `assert_eq!()` in conjunction with `Err()`.\n\n[^pytest]: \n[^catch2]: \n\n\nBut in Zig, we use the `expectError()` function, from the `std.testing` module.\nWith this function, you can test if a specific function call returns the exact\ntype of error that you expect it to return. To use this function, you first write\n`try expectError()`. 
Then, on the first argument, you provide the type of error that you\nare expecting from the function call. Then, on the second argument, you write\nthe function call you expect to fail.\n\nThe code example below demonstrates such type of unit test in Zig.\nNotice that, inside the function `alloc_error()` we are allocating\n100 bytes of memory, or, an array of 100 elements, for the object `ibuffer`. However,\nin the `test` block, we are using the `FixedBufferAllocator()`\nallocator object, which is limited to 10 bytes of space, because\nthe object `buffer`, which we provided to the allocator object,\nhave only 10 bytes of space.\n\nThat is why, the `alloc_error()` function raises an `OutOfMemory` error\non this case.\nBecause this function is trying to allocate more space than the allocator\nobject allows.\nSo, in essence, we are testing for a specific type of error,\nwhich is `OutOfMemory`. If the `alloc_error()` function returns any other type of error,\nthen, the `expectError()` function would make the entire test fail.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Allocator = std.mem.Allocator;\nconst expectError = std.testing.expectError;\nfn alloc_error(allocator: Allocator) !void {\n var ibuffer = try allocator.alloc(u8, 100);\n defer allocator.free(ibuffer);\n ibuffer[0] = 2;\n}\n\ntest \"testing error\" {\n var buffer: [10]u8 = undefined;\n var fba = std.heap.FixedBufferAllocator.init(&buffer);\n const allocator = fba.allocator();\n try expectError(error.OutOfMemory, alloc_error(allocator));\n}\n```\n:::\n\n\n\n\n```\n1/1 oom.test.testing error... OK\nAll 1 tests passed.\n```\n\n\n\n## Testing simple equalities\n\nIn Zig, there are some different ways you can test for an equality.\nYou already saw that we can use `expect()` with the logical operator `==`\nto essentially reproduce an equality test. 
But we also have\nsome helper functions that you should know about, specially\n`expectEqual()`, `expectEqualSlices()` and `expectEqualStrings()`.\n\n\nThe `expectEqual()` function, as the name suggests, is a classic\ntest equality function. It receives two objects as input. The first\nobject is the value that you expect to be in the second object.\nWhile second object is the object you have, or, the object that your application\nproduced as result. So, with `expectEqual()` you are essentially\ntesting if the values stored inside the two provided objects\nare equal or not.\n\nYou can see in the example below that, the test performed by\n`expectEqual()` failed. Because the objects `v1` and `v2` contain\ndifferent values in them.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\ntest \"values are equal?\" {\n const v1 = 15;\n const v2 = 18;\n try std.testing.expectEqual(v1, v2);\n}\n```\n:::\n\n\n\n\n```\n1/1 ve.test.values are equal?...\n expected 15, found 18\n FAIL (TestExpectedEqual)\nve.zig:5:5: test.values are equal? (test)\n try std.testing.expectEqual(v1, v2);\n ^\n0 passed; 0 skipped; 1 failed.\n```\n\n\nAlthough useful, the `expectEqual()` function does not work with arrays.\nFor testing if two arrays are equal, you should use the `expectEqualSlices()`\nfunction instead. This function have three arguments. First, you provide\nthe data type contained in both arrays that you are trying to compare.\nWhile the second and third arguments corresponds to the array objects that you want to compare.\n\nIn the example below, we are using this function to test if two array\nobjects (`array1` and `array2`) are equal or not. 
Since they\nare in fact equal, the unit test passed with no errors.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\ntest \"arrays are equal?\" {\n const array1 = [3]u32{1, 2, 3};\n const array2 = [3]u32{1, 2, 3};\n try std.testing.expectEqualSlices(\n u32, &array1, &array2\n );\n}\n```\n:::\n\n\n\n\n```\n1/1 oom.test.arrays are equal?... OK\nAll 1 tests passed.\n```\n\n\nAt last, you might also want to use the `expectEqualStrings()` function.\nAs the name suggests, you can use this function to test if two strings\nare equal or not. Just provide the two string objects that you want to compare,\nas inputs to the functions.\n\nIf the function finds any existing difference between the two strings,\nthen, the function will raise an error, and also, print an error message\nthat shows the exact difference between the two string objects provided,\nas the example below demonstrates:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\ntest \"strings are equal?\" {\n const str1 = \"hello, world!\";\n const str2 = \"Hello, world!\";\n try std.testing.expectEqualStrings(\n str1, str2\n );\n}\n```\n:::\n\n\n\n\n```\n1/1 t.test.strings are equal?... 
\n====== expected this output: =========\nhello, world!␃\n======== instead found this: =========\nHello, world!␃\n======================================\nFirst difference occurs on line 1:\nexpected:\nhello, world!\n^ ('\\x68')\nfound:\nHello, world!\n^ ('\\x48')\n```\n\n", - "supporting": [ - "03-unittests_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/04-http-server/execute-results/html.json b/_freeze/Chapters/04-http-server/execute-results/html.json index 36177ba1..8193d20b 100644 --- a/_freeze/Chapters/04-http-server/execute-results/html.json +++ b/_freeze/Chapters/04-http-server/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Project 2 - Building a HTTP Server from scratch\n\nIn this chapter, I want to implement a new\nsmall project with you. This time, we are going\nto implement a basic HTTP Server from scratch.\n\nThe Zig Standard Library already have a HTTP Server\nimplemented, which is available at `std.http.Server`.\nBut again, our objective here in this chapter, is to implement\nit **from scratch**. So we can't use this server object available\nfrom the Zig Standard Library.\n\n## What is a HTTP Server?\n\nFirst of all, what is a HTTP Server?\nA HTTP server, as any other type of server, is essentially\na program that runs indefinetly, on a infinite loop, waiting for incoming connections\nfrom clients. Once the server receives an incoming connection, it will\naccept this connection, and it will send messages back-and-forth to the client\nthrough this connection.\n\nBut the messages that are transmitted inside this connection are in a\nspecific format. They are HTTP messages\n(i.e. 
messages that use the HTTP Protocol specification).\nThe HTTP Protocol is the backbone of the modern web.\nThe world wide web as we know it today, would not exist without the \nHTTP Protocol.\n\nSo, Web servers (which is just a fancy name for\nHTTP Servers) are servers that exchange HTTP messages with clients.\nAnd these HTTP servers and the HTTP Protocol specification\nare simply essential to the operation of the internet of today.\n\nThat is the whole picture of the process.\nAgain, we have two subjects involved here, a server (which is\na program that is running indefinitely, waiting for connections),\nand a client (which is someone that wants to connect to\nand exchange HTTP messages with the server).\n\nYou may find the material about the [HTTP Protocol available at the Mozilla MDN Docs](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)[^mdn-http]\n, a great resource for you to also look at. It gives you a great overview on how\nHTTP works, and what role the server plays in this matter.\n\n[^mdn-http]: .\n\n## How a HTTP Server works? {#sec-how-http-works}\n\nImagine a HTTP Server as if it were the receptionist of a large hotel. In a hotel,\nyou have a reception, and inside that reception there is a receptionist\nwaiting for customers to arrive. A HTTP Server is essentially a receptionist\nthat is indefinitely waiting for new customers (or, in the context of HTTP, new clients)\nto arrive in the hotel.\n\nWhen a customer arrives at the hotel, \nthat customer starts a conversation with the\nreceptionist. He tells the receptionist how many days he wants to stay at the hotel.\nThen, the receptionist searches for an available apartment. 
If there is an available apartment\nat the moment, the customer pays the hotel fees, then, he gets the keys to the apartment,\nand then, he goes to the apartment to rest.\n\nAfter this entire process of dealing with the customer (searching for available rooms, receiving payment, handing over the keys),\nthe receptionist goes back to what he was doing earlier, which is to wait.\nWait for new customers to arrive.\n\nThat is, in a nutshell, what a HTTP Server does. It waits for clients to connect to the\nserver. When a client attempts to connect to the server, the server accepts this connection,\nand it starts to exchange messages with the client through this connection.\nThe first message that happens inside this connection is always a message from the client\nto the server. This message is called the *HTTP Request*.\n\nThis HTTP Request is a HTTP message that contains what\nthe client wants from the server. It is literally a request. The client\nthat connected to the server is asking this server to do something for him.\n\nThere are different \"types of request\" that any client can make to a HTTP Server.\nBut the most basic type of request, is when a client asks the\nHTTP Server to serve (i.e. to send) some specific web page (which is a HTML file) to him.\nWhen you type `google.com` in your web browser, you are essentially sending a HTTP Request to Google's\nHTTP servers, which asks these servers to send the Google webpage to you.\n\nNonetheless, when the server receives this first message, the *HTTP Request*, it\nanalyzes this request, to understand: who the client is? what he wants the server to do?\ndid this client provide all the necessary information to perform the action that he\nasked? 
Etc.\n\nOnce the server understands what the client wants, it simply performs the action\nthat was requested, and, to finish the whole process, the server sends back\na HTTP message to the client, informing if the action performed was successful or not,\nand, at last, the server ends (or closes) the connection with the client.\n\nThis last HTTP message sent from the server to the client, is called the *HTTP Response*.\nBecause the server is responding to the action that was requested by the client.\nThe main objective of this response message is to let the client know if the\naction requested was successful or not, before the server closes the connection.\n\n\n## How a HTTP server is normally implemented? {#sec-http-how-impl}\n\nLet's use the C language as an example. There are many materials\nteaching how to write a simple HTTP server in C code, like @jeffrey_http,\nor @nipun_http, or @eric_http.\nHaving this in mind, I will not show C code examples here, because you\ncan find them on the internet.\nBut I will describe the theory behind the necessary steps to perform in the C code.\n\n\nIn essence, we normally implement a HTTP server in C using TCP sockets,\nwhich involves the following steps:\n\n1. Create a socket object.\n1. Bind a name (or more specifically, an address) to this socket object.\n1. Make this socket object start listening and waiting for incoming connections.\n1. When a connection arrives, we accept this connection, and we exchange the HTTP messages (HTTP Request and HTTP Response).\n1. Then, we simply close this connection.\n\n\nA socket object is essentially a channel of communication.\nYou are creating a channel where people can send messages to.\nWhen you create a socket object, this object is not bound to any particular\naddress. This means that with this object you have a representation of a channel of communication\nin your hands. 
But this channel is not currently available, or, it is not currently accessible,\nbecause it does not have a known address where you can find it.\n\nThat is what the \"bind\" operation does. It binds a name (or more specifically, an address) to\nthis socket object, or, this channel of communication, so that it becomes available,\nor, accessible through this address. While the \"listen\" operation makes the socket object\nlisten for incoming connections in this address. In other words, the \"listen\" operation\nmakes the socket wait for incoming connections.\n\nNow, when a client actually attempts to connect to the server through the address we specify,\nthen, the socket object needs to accept this incoming connection. When we\naccept it, then, the connection is established, client and server are now\nconnected to each other, and they can read or write messages in this\nestablished connection.\n\nAfter we received the HTTP Request from the client,\nand sent the HTTP Response to the client,\nwe can now close the connection, and end\nthis communication.\n\n\n## Implementing the server - Part 1\n\n### Creating the socket object {#sec-create-socket}\n\nLet's begin with creating the socket for our server.\nJust to make things shorter, I will create this socket object in\na separate Zig module. I will name it `config.zig`.\n\nIn Zig, we can create a socket using\nthe `std.posix.socket()` function, from the Zig Standard Library.\nAs I mentioned earlier at @sec-http-how-impl, every socket object we create\nrepresents a communication channel, and we need to bind this channel to a specific address.\nAn \"address\" is defined as an IP address, or, more specifically, an IPv4 address^[It can be also an IPv6 address. But normally, we use an IPv4 address for that.].\nEvery such address is composed of two components. 
The first component is the host,\nwhich is a sequence of 4 numbers separated by dot characters (`.`) that identifies the machine used.\nWhile the second component is a port number, which identifies the specific\ndoor, or, the specific port to use in the host.\n\nThe sequence of 4 numbers (i.e. the host) identifies the machine (i.e. the computer itself) where\nthis socket will live. Every computer normally has multiple \"doors\" available inside it, because \nthis allows the computer to receive multiple connections at the same time.\nIt simply uses a single door for each connection. So the port number, is\nessentially a number that identifies the specific door in the computer that will be responsible\nfor receiving the connection. That is, it identifies the \"door\" in the computer that the socket will use\nto receive incoming connections.\n\nTo make things simpler, I will use an IP address that identifies our current machine in this example.\nThis means that, our socket object will reside on the same computer that we are currently using\n(this is also known as the \"localhost\") to write this Zig source code.\n\nBy convention, the IP address that identifies the \"localhost\", which is the current machine we\nare using, is the IP `127.0.0.1`. So, that is the IP\naddress we are going to use in our server. I can declare it in Zig\nby using an array of 4 integers, like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst localhost = [4]u8{\n 127, 0, 0, 1\n};\n_ = localhost;\n```\n:::\n\n\n\n\nNow, we need to decide which port number to use. By convention, there are some\nport numbers that are reserved, meaning that, we cannot use them for our own\npurposes, like the port 22 (which is normally used for SSH connections).\nFor TCP connections, which is our case here,\na port number is a 16-bit unsigned integer (type `u16` in Zig),\nthus ranging from 0 to 65535 [@wikipedia_port].\nSo, we can choose\na number from 0 to 65535 for our port number. 
In the \nexample of this book, I will use the port number 3490\n(just a random number).\n\n\nNow that we have these two pieces of information at hand, I can\nfinally create our socket object, using the `std.posix.socket()` function.\nFirst, we use the host and the port number to create an `Address` object,\nwith the `std.net.Address.initIp4()` function, like in the example below.\nAfter that, I use this address object inside the `socket()` function\nto create our socket object.\n\nThe `Socket` struct defined below summarizes all the logic behind\nthis process. In this struct, we have two data members, which are:\n1) the address object; 2) and a stream object, which is\nthe object we will use to read and write the messages into any connection we establish.\n\nNotice that, inside the constructor method of this struct,\nwhen we create the socket object, we are using the `IPPROTO.TCP` property as an input to\ntell the function to create a socket for TCP connections.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst builtin = @import(\"builtin\");\nconst net = @import(\"std\").net;\n\npub const Socket = struct {\n _address: std.net.Address,\n _stream: std.net.Stream,\n\n pub fn init() !Socket {\n const host = [4]u8{ 127, 0, 0, 1 };\n const port = 3490;\n const addr = net.Address.initIp4(host, port);\n const socket = try std.posix.socket(\n addr.any.family,\n std.posix.SOCK.STREAM,\n std.posix.IPPROTO.TCP\n );\n const stream = net.Stream{ .handle = socket };\n return Socket{ ._address = addr, ._stream = stream };\n }\n};\n```\n:::\n\n\n\n\n\n### Listening and receiving connections\n\nRemember that we stored the `Socket` struct\ndeclaration that we built at @sec-create-socket inside a Zig module named `config.zig`.\nThis is why I imported this module into our main module (`main.zig`) in the example below, as the `SocketConf` object,\nto access the `Socket` struct.\n\nOnce we created our socket object, we can focus now on making this socket 
object to\nlisten and receive new incoming connections. We do that, by calling the `listen()`\nmethod from the `Address` object that is contained in the socket object, and then,\nwe call the `accept()` method over the result.\n\nThe `listen()` method from the `Address` object produces a server object,\nwhich is an object that will stay open and running indefinitely, waiting\nto receive an incoming connection. Therefore, if you try to run the code\nexample below, by calling the `run` command from the `zig` compiler,\nyou will notice that the programs keeps running indefinitely,\nwithout a clear end.\n\nThis happens, because the program is waiting for something to happen.\nIt is waiting for someone to try to connect to the address (`http://127.0.0.1:3490`) where\nthe server is running and listening for incoming connections. This is what\nthe `listen()` method do, it makes the socket to be active waiting for someone\nto connect.\n\nOn the other side, the `accept()` method is the function that establish the connection\nwhen someone try to connect to the socket. This means that, the `accept()` method\nreturns a new connection object as a result. And you can use this connection object\nto read or write messages from or to the client.\nFor now, we are not doing anything with this connection object.\nBut we are going to use it on the next section.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n _ = connection;\n}\n```\n:::\n\n\n\n\nThis code example allows one single connection. 
In other words, the\nserver will wait for one incoming connection, and as soon as the\nserver is done with the first connection that it establishes, the\nprogram ends, and the server stops.\n\nThis is not the norm in the real world. Most people that write\na HTTP server like this, usually put the `accept()` method\ninside a `while` (infinite) loop, where if a connection\nis created with `accept()`, a new thread of execution is created to deal with\nthis new connection and the client. That is, real-world examples of HTTP Servers\nnormally rely on parallel computing to work.\n\nWith this design, the server simply accepts the connection,\nand the whole process of dealing with the client, and receiving\nthe HTTP Request, and sending the HTTP Response, all of this\nis done in the background, on a separate execution thread.\n\nSo, as soon as the server accepts the connection, and creates\nthe separate thread, the server goes back to what it was doing,\nwhich is to wait indefinitely for a new connection to accept.\nHaving this in mind, the code example shown above, is a\nserver that serves only a single client. Because the program\nterminates as soon as the connection is accepted.\n\n\n\n### Reading the message from the client {#sec-read-http-message}\n\nNow that we have a connection established, i.e. the connection\nobject that we created through the `accept()` function, we can now\nuse this connection object to read any messages that the client\nsends to our server. But we can also use it to send messages back\nto the client.\n\nThe basic idea is, if we **write** any data into this connection object,\nthen, we are sending data to the client, and if we **read** the data present in\nthis connection object, then, we are reading any data that the\nclient sent to us, through this connection object. So, just\nhave this logic in mind. 
\"Read\" is for reading messages from the client,\nand \"write\" is to send a message to the client.\n\nRemember from @sec-how-http-works that, the first thing that we need to do is to read the HTTP Request\nsent by the client to our server. Because it is the first message that happens\ninside the established connection, and, as a consequence, it is the first\nthing that we need to deal with.\n\nThat is why, I'm going to create a new Zig module in this small project, named `request.zig`\nto keep all functions related to the HTTP Request\ntogether. Then, I will create a new function named `read_request()` that will\nuse our connection object to read the message sent by the client,\nwhich is the HTTP Request.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Connection = std.net.Server.Connection;\npub fn read_request(conn: Connection,\n buffer: []u8) !void {\n const reader = conn.stream.reader();\n _ = try reader.read(buffer);\n}\n```\n:::\n\n\n\n\n\nThis function accepts a slice object which behaves as a buffer.\nThe `read_request()` function reads the contents of the message sent into\nthe connection object, and saves this content into this buffer object that we provided\nas input.\n\nNotice that I'm using the connection object that we created to read\nthe message from the client. I first access the `reader` object that lives inside the\nconnection object. Then, I call the `read()` method of this `reader` object\nto effectivelly read and save the data sent by the client into the buffer object\nthat we created earlier. I'm discarting the return value\nof the `read()` method, by assigning it to the underscore character (`_`),\nbecause this return value is not useful for us right now.\n\n\n\n## Looking at the current state of the program\n\n\nI think it is a good time to see how our program is currently working. 
Shall we?\nSo, the first thing I will do is to update the `main.zig` module in our small Zig project,\nso that the `main()` function call this new `read_request()` function that we just created.\nI will also add a print statement at the end of the `main()` function,\njust so that you can see what the HTTP Request we just loaded into the buffer object\nlooks like.\n\nAlso, I'm creating the buffer object in the `main()` function, which will be\nresponsible for storing the message sent by the client, and, I'm also\nusing a `for` loop to initialize all fields of this buffer object to the number zero.\nThis is important to make sure that we don't have uninitialized memory in\nthis object. Because uninitialized memory may cause undefined behaviour in our program.\n\nSince the `read_request()` receives as input the buffer object as a slice object (`[]u8`),\nI am using the syntax `array[0..array.len]` to get access to a slice of this `buffer` object,\nand provide it as input to the function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n _ = try Request.read_request(\n connection, buffer[0..buffer.len]\n );\n try stdout.print(\"{s}\\n\", .{buffer});\n}\n```\n:::\n\n\n\n\nNow, I'm going to execute this program, with the `run` command from the\n`zig` compiler. 
But remember, as we said earlier, as soon as I execute this program, it will\nhang indefinitely, because the program is waiting for a client trying to\nconnect to the server.\n\nMore specifically, the program will pause at the line\nwith the `accept()` call. As soon as a client tries to connect to the\nserver, then, the execution will \"unpause\", and the `accept()` function\nwill finally return the\nconnection object we need, and the remainder of the program\nwill run.\n\nYou can see that at @fig-print-zigrun1. The message\n`Server Addr: 127.0.0.1:3490` is printed to the console, and the program\nis now waiting for an incoming connection.\n\n![A screenshot of running the program](./../Figures/print-zigrun1.png){#fig-print-zigrun1}\n\n\nWell, we can finally try to connect to this server, and there are several ways we can do this.\nFor example, we could use the following Python script:\n\n```python\nimport requests\nrequests.get(\"http://127.0.0.1:3490\")\n```\n\nOr, we could also open any web browser of our preference, and type\nthe URL `localhost:3490`. Note: `localhost` is the same thing as the\nIP `127.0.0.1`. When you press enter, and your web browser goes\nto this address, first, the browser will probably print a message\nsaying that \"this page isn't working\", and, then, it will\nprobably change to a new message saying that \"the site can't be\nreached\".\n\nYou get these \"error messages\" in the web browser, because\nit got no response back from the server. In other words, when the web\nbrowser connected to our server, it did send the HTTP Request through the established connection.\nThen, the web browser was expecting to receive a HTTP Response back, but\nit got no response from the server (we haven't implemented the HTTP Response logic yet).\n\nBut that is okay. 
We achieved the result that we wanted for now,\nwhich is to connect to the server, and see the HTTP Request\nthat was sent by the web browser (or by the Python script)\nto the server.\n\nIf you come back to the console that you left open\nwhen you executed the program, you will see that the\nprogram finished its execution, and, a new message is\nprinted in the console, which is the actual HTTP Request\nmessage that was sent by the web browser to the server.\nYou can see this message at @fig-print-zigrun2.\n\n![A screenshot of the HTTP Request sent by the web browser](./../Figures/print-zigrun2.png){#fig-print-zigrun2}\n\n\n\n\n## Learning about Enums in Zig {#sec-enum}\n\nEnum structures are available in Zig through the `enum` keyword.\nAn enum (short for \"enumeration\") is a special structure that represents a group of constant values.\nSo, if you have a variable which can assume a short and known\nset of values, you might want to associate this variable to an enum structure,\nto make sure that this variable only assumes a value from this set.\n\nA classic example for enums is the set of primary colors. If for some reason, your program\nneeds to represent one of the primary colors, you can create an enum\nthat represents one of these colors.\nIn the example below, we are creating the enum `PrimaryColorRGB`, which\nrepresents a primary color from the RGB color system. 
By using this enum,\nI am guaranteed that the `acolor` object for example, will contain\none of these three values: `RED`, `GREEN` or `BLUE`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst PrimaryColorRGB = enum {\n RED, GREEN, BLUE\n};\nconst acolor = PrimaryColorRGB.RED;\n_ = acolor;\n```\n:::\n\n\n\n\nIf for some reason, my code tries to save in `acolor`,\na value that is not in this set, I will get an error message\nwarning me that a value such as \"MAGENTA\" does not exist\ninside the `PrimaryColorRGB` enum.\nThen I can easily fix my mistake.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst acolor = PrimaryColorRGB.MAGENTA;\n```\n:::\n\n\n\n\n```\ne1.zig:5:36: error: enum 'PrimaryColorRGB' has\n no member named 'MAGENTA':\n const acolor = PrimaryColorRGB.MAGENTA;\n ^~~~~~~\n```\n\nUnder the hood, enums in Zig work the same way that enums\nwork in C. Each enum value is essentially represented as an integer.\nThe first value in the set is represented as zero,\nthen, the second value is one, ... etc.\n\nOne thing that we are going to learn on the next section is that\nenums can have methods in them. Wait... What? This is amazing!\nYes, enums in Zig are similar to structs, and they can have\nprivate and public methods inside them.\n\n\n\n\n\n\n\n## Implementing the server - Part 2\n\nNow, on this section, I want to focus on parsing\nthe HTTP Request we received from the client.\nHowever, to effectively parse a HTTP Request message, we first need to understand its\nstructure.\nIn summary, a HTTP Request is a text message that is divided into 3 different\nsections (or parts):\n\n- The top-level header indicating the method of the HTTP Request, the URI, and the HTTP version used in the message.\n- A list of HTTP Headers.\n- The body of the HTTP Request.\n\n### The top-level header\n\nThe first line of text in a HTTP Request always comes with the three most essential\npieces of information about the request being analyzed. 
These three key attributes of the HTTP Request\nare separated by a single space in this first line of the request.\nThe first piece of information is the HTTP method that is being\nused in the request, second, we have the URI to which this HTTP Request is being sent,\nand third, we have the version of the HTTP protocol that is being used in this HTTP Request.\n\nIn the snippet below, you will find an example of this first line in a HTTP Request.\nFirst, we have the HTTP method of this request (`GET`). Many programmers\nrefer to the URI component (`/users/list`) as the \"API endpoint\" to which the HTTP Request\nis being sent. In the context of this specific request, since it is a GET request,\nyou could also say that the URI component is the path to the resource we want to access,\nor, the path to the document (or the file) that we want to retrieve from the server.\n\n```\nGET /users/list HTTP/1.1\n```\n\nAlso, notice that this HTTP Request is using the version 1.1 of the HTTP protocol,\nwhich is one of the most widely used versions of the protocol on the web.\n\n\n\n### The list of HTTP headers\n\nMost HTTP Requests also include a section of HTTP Headers,\nwhich is just a list of attributes or key-value pairs associated with this\nparticular request. This section always comes right after the \"top-level header\" of the request.\n\nFor our purpose in this chapter, which is to build a simple HTTP Server,\nwe are going to ignore this section of the HTTP Request, for simplicity.\nBut most HTTP servers that exist in the wild parse and use these\nHTTP headers to change the way that the server responds to the request\nsent by the client.\n\nFor example, many requests we encounter in the real world come with\na HTTP header called `Accept`. 
In this header, we find a list of [MIME types](https://en.wikipedia.org/wiki/Media_type)[^mime].\nThis list indicates the file formats that the client can read, or parse, or interpret.\nIn other words, you also interpret this header as the client saying the following phrase\nto the server: \"Hey! Look, I can read only HTML documents, so please, send me back\na document that is in a HTML format.\".\n\n[^mime]: .\n\nSo, if the HTTP server can read and use this `Accept` header, then, the server can identify\nwhich is the best format of the document to send to the client. Maybe the HTTP server have\nthe same document in multiple formats, for example, in JSON, in XML, in HTML and in PDF,\nbut the client can only understand documents in the format HTML. That is the purpose\nof this `Accept` header.\n\n\n### The body\n\nThe body comes after the list of HTTP headers, and it is an optional section of the HTTP Request, meaning that, not\nall HTTP Request will come with a body in it. For example, every HTTP Request that uses the\nGET method usually do not come with a body.\n\nBecause a GET request is used to request data,\ninstead of sending it to the server. So, the body section is more related to the POST method, which is a method that involves\nsending data to the server, to be processed and stored.\n\nSince we are going to support only the GET method in this project, it means that\nwe also do not need to care about the body of the request.\n\n\n\n### Creating the HTTP Method enum\n\nEvery HTTP Request comes with a explicit method. The method used in a HTTP Request\nis identified by one these words:\n\n- GET;\n- POST;\n- OPTIONS;\n- PATCH;\n- DELETE;\n- and some other methods.\n\nEach HTTP method is used for a specific type of task. The POST method for example is normally\nused to post some data into the destination. 
In other words, it is used\nto send some data to the HTTP server, so that it can be processed and stored by the server.\n\nAs another example, the GET method is normally used to get content from the server.\nIn other words, we use this method whenever we want the server to send some\ncontent back to us. It can be any type of content. It can be a web page,\na document file, or some data in JSON format.\n\nWhen a client send a POST HTTP Request, the HTTP Response sent by the server normally have the sole purpose of\nletting the client know if the server processed and stored the data succesfully.\nIn contrast, when the server receives a GET HTTP Request, then, the server sends the content\nthat the client asked for in the HTTP Response itself. This demonstrates that the method associated\nwith the HTTP Request changes a lot on the dynamics and the roles that each party\nplays in the whole process.\n\nSince the HTTP method of the HTTP Request is identified by this very small and specific\nset of words, it would be interesting to create an enum structure to represent a HTTP method.\nThis way, we can easily check if the HTTP Request we receive from the client is a\nHTTP method that we currently support in our small HTTP server project.\n\nThe `Method` structure below represents this enumeration.\nNotice that, for now, only the GET HTTP method is included in this\nenumeration. Because, for the purpose of this chapter, I want to\nimplement only the GET HTTP method. That is why I am not\nincluding the other HTTP methods in this enumeration.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Method = enum {\n GET\n};\n```\n:::\n\n\n\n\n\nNow, I think we should add two methods to this enum structure. One method is `is_supported()`,\nwhich will be a function that returns a boolean value, indicating if the input HTTP method is supported\nor not by our HTTP Server. 
The other is `init()`, which is a constructor function that takes a string as input,\nand tries to convert it into a `Method` value.\n\n\nBut in order to build these functions, I will use a functionality from the Zig Standard Library, called\n`StaticStringMap()`. This function allows us to create a simple map from strings to enum values.\nIn other words, we can use this map structure to map a string to the respective enum value.\nTo some extent, this specific structure from the standard library works almost like a \"hashtable\" structure,\nand it is optmized for a small sets of words, or, a small set of keys, which is our case here.\n\nTo use this function, you have to import it from the `std.static_string_map` module. Just\nto make things shorter and easier to type, I am going to import this function through a different\nand shorter name (`Map`).\n\nWith `Map()` imported, we just apply this function over the enum structure\nthat we are going to use in the resulting map. In our case here, it is the `Method` enum structure\nthat we declared at the last code example. Then, I call the `initComptime()` method with the\nmap, i.e. the list of key-value pairs that we are going to use.\n\nYou can see in the example below that I write this map using multiple anonymous struct literals.\nInside the first (or \"top-level\") struct literal we have a list (or a sequence) of struct literals.\nEach struct literal in this list represents a key-value pair. The first element\nin this struct is the key, while the second element is the value. The first element (or the key)\nin each key-value pair should always be a string value. 
While the second element should\nbe a value from the enum structure that you used inside the `Map()` function.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Map = std.static_string_map.StaticStringMap;\nconst MethodMap = Map(Method).initComptime(.{\n .{ \"GET\", Method.GET },\n});\n```\n:::\n\n\n\n\nTherefore, the `MethodMap` object is basically a `std::map` object from C++, or,\na `dict` object from Python. You can retrieve (or get) the enum value that\ncorresponds to a particular key, by using the `get()` method from the map\nobject. This method returns an optional value, so, the `get()` method might\nresult in a null value.\n\nWe can use this to our advantage to detect if a particular HTTP method is\nsupported or not in our HTTP server. Because, if the `get()` method returns null,\nit means that it did not find this method inside our `MethodMap` object, and,\nas a consequence, this method is not supported by our HTTP server.\nIn this specific case, a null value means \"not supported method\".\n\nThe `init()` method below, takes a string value as input, and then, it simply passes this string value\nto the `get()` method of our `MethodMap` object. As a consequence, we should get the enum value that corresponds\nto this input string.\n\nNotice in the example below that, the `init()` method returns either an error\n(which might happen if the `?` method reaches `unreachable`, check out @sec-null-handling for more details)\nor a `Method` object as result. Since `GET` is currently the only value in our `Method` enum\nstructure, it means that, the `init()` method will most likely return the value `Method.GET` as result.\n\nAlso notice that, in the `is_supported()` method, we are using the optional value returned\nby the `get()` method from our `MethodMap` object. 
The if statement unwraps the optional value\nreturned by this method, and returns `true` in case this optional value is a not-null value.\nOtherwise, it simply returns `false`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Method = enum {\n GET,\n pub fn init(text: []const u8) !Method {\n return MethodMap.get(text).?;\n }\n pub fn is_supported(m: []const u8) bool {\n const method = MethodMap.get(m);\n if (method) |_| {\n return true;\n }\n return false;\n }\n};\n```\n:::\n\n\n\n\n\n\n\n\n\n\n### Writing the parse request function\n\nNow that we created the enum that represents our HTTP method,\nwe should start to write the function responsible for\nactually parsing the HTTP Request. Let's first understand\nthe structure of a HTTP Request.\n\nThe first thing we can do is to write a struct to represent the HTTP Request.\nTake the `Request` struct below as an example. It contains the three\nessential pieces of information from the \"top-level\" header (i.e. the first line)\nin the HTTP Request.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Request = struct {\n method: Method,\n version: []const u8,\n uri: []const u8,\n pub fn init(method: Method,\n uri: []const u8,\n version: []const u8) Request {\n return Request{\n .method = method,\n .uri = uri,\n .version = version,\n };\n }\n};\n```\n:::\n\n\n\n\n\nThe `parse_request()` function should receive a string as input. This input string\ncontains the entire HTTP Request message, and the parsing function should\nread and understand the individual parts of this message.\n\nNow, remember that for the purpose of this chapter, we care only about the first\nline in this message, which contains the \"top-level header\", or, the three essential attributes about the HTTP Request,\nwhich are the HTTP method used, the URI and the HTTP version.\n\nNotice that I use the function `indexOfScalar()` in `parse_request()`. 
This function from the\nZig Standard Library returns the first index where the scalar value that we provide\nhappens in a string. In this case, I'm looking at the first occurrence of the new line character (`\\n`).\nBecause once again, we care only about the first line in the HTTP Request message.\nThis is the line where we have the three information we want to parse\n(version of HTTP, the HTTP method and the URI).\n\nTherefore, we are using this `indexOfScalar()` function\nto limit our parsing process to the first line in the message.\nIs also worth mentioning that, the `indexOfScalar()` function returns an optional value.\nThat is why I use the `orelse` keyword to provide an alternative value, in case\nthe value returned by the function is a null value.\n\nSince each of these three attributes are separated by a simple space, we\ncould use the function `splitScalar()` from the Zig Standard Library to split\nthe input string into sections by looking for every position that appears\na simple space. In other words, this `splitScalar()` function is equivalent\nto the `split()` method in Python, or, the `std::getline()` function from C++,\nor the `strtok()` function in C.\n\nWhen you use this `splitScalar()` function you get an iterator as the result.\nThis iterator have a `next()` method that you can use to advance the iterator\nto the next position, or, to the next section of the splitted string.\nNote that, when you use `next()`, the method not only advances the iterator,\nbut it also returns a slice to the current section of the splitted\nstring as result.\n\nNow, if you want to get a slice to the current section of the splitted\nstring, but not advance the iterator to the next position, you can use\nthe `peek()` method. 
Both `next()` and `peek()` methods return an optional value, that is\nwhy I use the `?` method to unwrap these optional values.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn parse_request(text: []u8) Request {\n const line_index = std.mem.indexOfScalar(\n u8, text, '\\n'\n ) orelse text.len;\n var iterator = std.mem.splitScalar(\n u8, text[0..line_index], ' '\n );\n const method = try Method.init(iterator.next().?);\n const uri = iterator.next().?;\n const version = iterator.next().?;\n const request = Request.init(method, uri, version);\n return request;\n}\n```\n:::\n\n\n\n\nHave you notice that the \"string utility functions\" we imported from the Zig Standard Library,\nand that we applied over the input string, comes from the `mem` module (which is the \"memory utility functions\" module)?\n\nAs I described at @sec-zig-strings, strings in Zig are simply arrays of bytes in the language.\nSo, you will find inside this `mem` module lots of excellent utility functions to work directly with bytes\nand arrays in Zig, and, as a result, these functions will also be very useful if you have to work with string data.\n\n\n\n### Using the parse request function\n\nNow that we wrote the function responsible for parsing the HTTP Request,\nwe can add the function call to `parse_request()` in\nthe `main()` function of our program.\n\nAfter, is a good idea to test once again the state of our program.\nI execute this program again with the `run` command from the `zig` compiler,\nthen, I use my web browser to connect once again to the server through the URL `localhost:3490`, and finally,\nthe end result of our `Request` object is printed to the console.\n\nA quick observation, since I used the `any` format specifier in the\nprint statement, the data members `version` and `uri` of the `Request`\nstruct were printed as raw integer values. 
String data being printed\nas integer values is common in Zig, and remember, these integer values are just the decimal representation of\nthe bytes that form the string in question.\n\nIn the result below, the sequence of decimal values 72, 84, 84, 80, 47, 49, 46 and 49\nare the bytes that form the text \"HTTP/1.1\", while the trailing 13 is the carriage return character (`\\r`)\nthat comes right before the new line character at the end of this first line. And the integer 47 is the decimal value of\nthe character `/`, which represents our URI in this request.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(\n connection, buffer[0..buffer.len]\n );\n const request = Request.parse_request(buffer[0..buffer.len]);\n try stdout.print(\"{any}\\n\", .{request});\n}\n```\n:::\n\n\n\n\n```\nrequest.Request{\n .method = request.Method.GET,\n .version = {72, 84, 84, 80, 47, 49, 46, 49, 13},\n .uri = {47}\n}\n```\n\n\n\n### Sending the HTTP Response to the client\n\nIn this last part, we are going to write the logic responsible for\nsending the HTTP Response from the server to the client. To make things\nsimple, the server in this project will send just a simple web page\ncontaining the text \"Hello world\".\n\nFirst, I create a new Zig module in the project, named `response.zig`.\nIn this module, I will declare just two functions. Each function\ncorresponds to a specific status code in the HTTP Response.\nThe `send_200()` function will send a HTTP Response with status code 200\n(which means \"Success\") to the client. 
While the `send_404()` sends a response\nwith status code 404 (which means \"Not found\").\n\nThis is definitely not the most ergonomic and adequate way of handling the\nHTTP Response, but it works for our case here. We are just building toy projects\nin this book after all. So, the source code we write does not need to be perfect.\nIt just needs to work!\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Connection = std.net.Server.Connection;\npub fn send_200(conn: Connection) !void {\n const message = (\n \"HTTP/1.1 200 OK\\nContent-Length: 48\"\n ++ \"\\nContent-Type: text/html\\n\"\n ++ \"Connection: Closed\\n\\n\"\n ++ \"<html><body><h1>Hello, World!</h1></body></html>\"\n );\n _ = try conn.stream.write(message);\n}\n\npub fn send_404(conn: Connection) !void {\n const message = (\n \"HTTP/1.1 404 Not Found\\nContent-Length: 50\"\n ++ \"\\nContent-Type: text/html\\n\"\n ++ \"Connection: Closed\\n\\n\"\n ++ \"<html><body><h1>File not found!</h1></body></html>\"\n );\n _ = try conn.stream.write(message);\n}\n```\n:::\n\n\n\n\nNotice that both functions receive the connection object as input, and\nuse the `write()` method to write the HTTP Response message directly\ninto this communication channel. As a result, the party on the other\nside of the connection (i.e. the client), will receive such message.\n\nMost real-world HTTP Servers will have a single function (or a single struct) to effectively handle\nthe response. 
It gets the HTTP Request already parsed as input, and then, it tries to build\nthe HTTP Response bit by bit, before the function sends it over the connection.\n\nWe would also have a specialized struct to represent a HTTP Response, and\na lot of methods that would be used to build each part or component of the response object.\nTake the `Response` struct created by the Javascript runtime Bun as an example.\nYou can find this struct in the [`response.zig` module](https://github.com/oven-sh/bun/blob/main/src/bun.js/webcore/response.zig)[^bun-resp]\nin their GitHub project.\n\n[^bun-resp]: .\n\n\n## The end result\n\nWe can now update once again our `main()` to incorporate our new\nfunctions from the `response.zig` module. First I need to import this module\ninto our `main.zig` module, then, I add the function calls to `send_200()`\nand `send_404()`.\n\nNotice that I'm using if statements to decide which \"response function\" to call,\nbased especially on the URI present in the HTTP Request. If the user asked for\na content (or a document) that is not present in our server, we should respond\nwith a 404 status code. But since we have just a simple HTTP server, with no\nreal documents to send, we can just check if the URI is the root path (`/`)\nor not to decide which function to call.\n\nAlso, notice that I'm using the function `std.mem.eql()` from the Zig Standard Library\nto check if the string from `uri` is equal or not to the string `\"/\"`. This function is\nused to check if two arrays are equal or not. 
In the first argument, you provide\nthe type of the elements present in this array. While in the second and third arguments,\nyou provide the arrays to be compared.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst Response = @import(\"response.zig\");\nconst Method = Request.Method;\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(connection, buffer[0..buffer.len]);\n const request = Request.parse_request(buffer[0..buffer.len]);\n if (request.method == Method.GET) {\n if (std.mem.eql(u8, request.uri, \"/\")) {\n try Response.send_200(connection);\n } else {\n try Response.send_404(connection);\n }\n }\n}\n```\n:::\n\n\n\n\n\nNow that we have adjusted our `main()` function, I can now execute our program, and\nsee the effects of these last changes. First, I execute the program once again, with the\n`run` command of the `zig` compiler. The program will hang, waiting for a client to connect.\n\nThen, I open my web browser, and try to connect to the server again, using the URL `localhost:3490`.\nThis time, instead of getting an \"error message\" from the browser, you will get the message\n\"Hello World\" printed into your web browser. 
Because this time, the server sent the HTTP Response\nsuccessfully to the web browser, as demonstrated by @fig-print-zigrun3.\n\n\n![The Hello World message sent in the HTTP Response](./../Figures/print-zigrun3.png){#fig-print-zigrun3}\n\n", - "supporting": [ - "04-http-server_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/05-pointers/execute-results/html.json b/_freeze/Chapters/05-pointers/execute-results/html.json index d635e449..5041df79 100644 --- a/_freeze/Chapters/05-pointers/execute-results/html.json +++ b/_freeze/Chapters/05-pointers/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Pointers and Optionals {#sec-pointer}\n\nOn our next project we are going to build a HTTP server from scratch.\nBut in order to do that, we need to learn more about pointers and how they work in Zig.\nPointers in Zig are similar to pointers in C.\n\nA pointer is an object that contains a memory address. This memory address is the address where\na particular value is stored in the memory. It can be any value. Most of the time,\nit is a value that comes from another object (or variable) present in our code.\n\nIn the example below, I'm creating two objects (`number` and `pointer`).\nThe `pointer` object contains the memory address where the value of the `number` object\n(the number 5) is stored. So, that is a pointer in a nutshell. It is a memory\naddress that points to a particular existing value in the memory. You could\nalso say, that, the `pointer` object points to the memory address where the `number` object is\nstored.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number: u8 = 5;\nconst pointer = &number;\n```\n:::\n\n\n\n\nWe create a pointer object in Zig by using the `&` operator. 
When you put this operator\nbefore the name of an existing object, you get the memory address of this object as result.\nWhen you store this memory address inside an object, this object becomes a pointer object.\nBecause this object stores a memory address.\n\nPeople mostly use pointers as an alternative way to access a particular value.\nFor example, I can use the `pointer` object to access the value stored by\nthe `number` object. We call this operation (i.e. accessing the value that the\npointer points to) of *dereferencing the pointer*. So, we can dereference\na pointer by using the `*` method of the pointer object. Like in the example\nbelow, where we are taking the number 5 pointed by the `pointer` object,\nand doubling it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst doubled = 2 * pointer.*;\nstd.debug.print(\"{d}\\n\", .{doubled});\n```\n:::\n\n\n\n\n```\n10\n```\n\nThis syntax to dereference the pointer is nice. Because we can easily chain it with\nmethods of the value pointed by the pointer. We can use the `User` struct that we\ncreated at @sec-structs-and-oop as an example. If you comeback to that section,\nyou will see that this struct have a method named `print_name()`.\n\nSo, for example, if we have an user object, and a pointer that points to this user object,\nwe can use the pointer to access this user object, and, at the same time, call the method `print_name()`\non it, by chaining the dereference method (`*`) with the `print_name()` method. 
Like in the\nexample below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst u = User.init(1, \"pedro\", \"email@gmail.com\");\nconst pointer = &u;\ntry pointer.*.print_name();\n```\n:::\n\n\n\n\n```\npedro\n```\n\nWe can also use pointers to effectively alter the value of an object.\nFor example, I could use the `pointer` object to set\nthe value of the object `number` to 6, like in the example below.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nconst pointer = &number;\npointer.* = 6;\ntry stdout.print(\"{d}\\n\", .{number});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n6\n```\n\n\n:::\n:::\n\n\n\n\n\nTherefore, as I mentioned earlier, people use pointers as an alternative way to access a particular value.\nAnd they use them especially when they do not want to \"move\"\nthe values they produce in their code. There are situations where\nyou want to access a particular value in a different scope of your code,\nbut you do not want to \"move\" this value to the new scope that you are in.\n\nThis matters especially if this value is big in size. 
Because if it is, then,\nmoving this value becomes an expensive operation to do.\nThe computer will have to spend a considerable amount of time\ncopying this value to a new location.\n\nThat is why many programmers avoid this heavy operation of copying the value,\nby accessing it through pointers.\nWe are going to talk more about this \"moving operation\" in the next sections.\nFor now, just keep in mind that avoiding this \"move operation\" is\none of the main reasons why pointers are used in programming languages.\n\n\n\n\n\n## Constant objects vs variable objects {#sec-pointer-var}\n\nYou can have a pointer that points to a constant object, or, a pointer that points to a variable object.\nBut regardless of which kind of pointer it is, a pointer **must always respect the characteristics of the object that it points to**.\nAs a consequence, if the pointer points to a constant object, then, you cannot use this pointer\nto change the value that it points to, because it points to a value that is constant. As we discussed at @sec-assignments, you cannot\nchange a value that is constant.\n\nFor example, if I have a `number` object, which is constant, I cannot execute\nthe expression below where I'm trying to change the value of `number` to 6 through\nthe `pointer` object. 
As demonstrated below, when you try to do something\nlike that, you get a compile-time error:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number = 5;\nconst pointer = &number;\npointer.* = 6;\n```\n:::\n\n\n\n\n```\np.zig:6:12: error: cannot assign to constant\n pointer.* = 6;\n```\n\nIf I change the `number` object to be a variable object, by introducing the `var` keyword,\nthen I can change the value of this object through a pointer, as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nconst pointer = &number;\npointer.* = 6;\ntry stdout.print(\"{d}\\n\", .{number});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n6\n```\n\n\n:::\n:::\n\n\n\n\nYou can see this relationship in the data type of your pointer object.\nIn other words, the data type of a pointer object already gives you\nsome clues about whether the value that it points to is\nconstant or not.\n\nWhen a pointer object points to a constant value, then, this pointer has the data type `*const T`,\nwhich means \"a pointer to a constant value of type `T`\".\nIn contrast, if the pointer points to a variable value, then, the type of the pointer is usually `*T`, which is\nsimply \"a pointer to a value of type `T`\".\n\nWhenever you see a pointer object whose type is in the format `*const T`, then, you know that you cannot use this pointer\nto change the value that it points to, because this pointer points to a constant value of type `T`.\n\n\nSo far, we talked about the value pointed to by the pointer being constant or not,\nand the consequences that arise from it. But, what about the pointer object itself? I mean, what happens\nif the pointer object itself is constant or not? Think about it.\nWe can have a constant pointer that points to a constant value.\nBut we can also have a variable pointer that points to a constant value. And vice-versa.\n\nUntil this point, the `pointer` object was always constant,\nbut what does this mean for us? 
What is the consequence of the\n`pointer` object being constant? The consequence is that\nwe cannot change the pointer object, because it is constant. We can use the\npointer object in multiple ways, but we cannot change the\nmemory address that is inside this pointer object.\n\nHowever, if we mark the `pointer` object as a variable object,\nthen, we can change the memory address stored in this `pointer` object.\nThe example below demonstrates that. Notice that the object pointed to\nby the `pointer` object changes from `c1` to `c2`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c1: u8 = 5;\nconst c2: u8 = 6;\nvar pointer = &c1;\ntry stdout.print(\"{d}\\n\", .{pointer.*});\npointer = &c2;\ntry stdout.print(\"{d}\\n\", .{pointer.*});\n```\n:::\n\n\n\n\n```\n5\n6\n```\n\nThus, by declaring the `pointer` object as a `var` or `const` object,\nyou specify whether the memory address contained in this pointer object can change or not\nin your program. On the other hand, you can change the value pointed to by the pointer\nif, and only if, this value is stored in a variable object. If this value\nis in a constant object, then, you cannot change this value through a pointer.\n\n\n## Types of pointer\n\nIn Zig, there are two types of pointers [@zigdocs], which are:\n\n- single-item pointer (`*`);\n- many-item pointer (`[*]`);\n\n\nSingle-item pointer objects are objects whose data types are in the format `*T`.\nSo, for example, if an object has the data type `*u32`, it means that this\nobject contains a pointer that points to an unsigned 32-bit integer value.\nAs another example, if an object has the type `*User`, then, it contains\na pointer to a `User` value.\n\nIn contrast, many-item pointer objects are objects whose data types are in the format `[*]T`.\nNotice that the star symbol (`*`) is now inside a pair of brackets (`[]`). 
If the star\nsymbol is inside a pair of brackets, you know that this object is a many-item pointer.\n\nWhen you apply the `&` operator over an object, you will always get a single-item pointer.\nMany-item pointers are more of an \"internal type\" of the language, more closely\nrelated to slices. So, when you deliberately create a pointer with the `&` operator,\nyou always get a single-item pointer as a result. Remember that.\n\n\n\n## Pointer arithmetic\n\nPointer arithmetic is available in Zig, and it works the same way it does in C.\nWhen you have a pointer that points to an array, the pointer usually points to\nthe first element in the array, and you can use pointer arithmetic to\nadvance this pointer and access the other elements in the array.\n\n\nNotice in the example below, that initially, the `ptr` object was pointing\nto the first element in the array `ar`. But then, I started to walk through the array, by advancing\nthe pointer with simple pointer arithmetic.\nBecause Zig only allows pointer arithmetic on many-item pointers, the `ptr` object is annotated\nwith the type `[*]const i32`, and its elements are read with the index syntax (`ptr[0]`).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [_]i32{1,2,3,4};\nvar ptr: [*]const i32 = &ar;\ntry stdout.print(\"{d}\\n\", .{ptr[0]});\nptr += 1;\ntry stdout.print(\"{d}\\n\", .{ptr[0]});\nptr += 1;\ntry stdout.print(\"{d}\\n\", .{ptr[0]});\n```\n:::\n\n\n\n\n```\n1\n2\n3\n```\n\nAlthough you can create a pointer to an array like that, and\nstart to walk through this array by using pointer arithmetic,\nin Zig, we prefer to use slices, which were presented at @sec-arrays.\n\nUnder the hood, slices already are pointers,\nand they also come with the `len` property, which indicates\nhow many elements are in the slice. This is good because the `zig` compiler\ncan use it to check for potential buffer overflows, and other problems like that.\n\nAlso, you don't need to use pointer arithmetic to walk through the elements\nof a slice. 
You can simply use the `slice[index]` syntax to directly access\nany element you want in the slice.\nAs I mentioned at @sec-arrays, you can get a slice from an array by using\na range selector inside brackets. In the example below, I'm creating\na slice (`sl`) that covers the entire `ar` array. I can access any\nelement of `ar` from this slice, and the slice itself already is a pointer\nunder the hood.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [_]i32{1,2,3,4};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\n\n## Optionals and Optional Pointers\n\nLet's talk about optionals and how they relate to pointers in Zig.\nBy default, objects in Zig are **non-nullable**. This means that, in Zig,\nyou can safely assume that any object in your source code is not null.\n\nThis is a powerful feature of Zig when you compare it to the developer experience in C.\nBecause in C, any object can be null at any point, and, as a consequence, a pointer in C\nmight point to a null value. This is a common source of undefined behaviour in C.\nWhen programmers work with pointers in C, they have to constantly check if\ntheir pointers are pointing to null values or not.\n\nIf for some reason, your Zig code produces a null value somewhere, and, this null\nvalue ends up in an object that is non-nullable, a runtime error is always\nraised by your Zig program. Take the program below as an example.\nThe `zig` compiler can see the `null` value at compile time, and, as a result,\nit raises a compile-time error. 
But, if a `null` value is raised during\nruntime, a runtime error is also raised by the Zig program, with an\n\"attempt to use null value\" message.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nnumber = null;\n```\n:::\n\n\n\n```\np5.zig:5:14: error: expected type 'u8',\n found '@TypeOf(null)'\n number = null;\n ^~~~\n```\n\n\nYou don't get this type of security in C.\nIn C, you don't get warnings or errors about null values being produced in your program.\nIf for some reason, your code produces a null value in C, most of the time, you end up getting a segmentation fault error\nas a result, which can mean many things.\nThat is why programmers have to constantly check for null values in C,\nbecause they do not have much visibility of null values.\n\nPointers in Zig are also, by default, **non-nullable**. This is another amazing\nfeature in Zig. So, for any pointer you create in your code, you can\nsafely assume that this pointer is pointing to a non-null value.\nTherefore, you don't have this heavy work of checking if the pointers you create\nin Zig are pointing to a null value.\n\nThis is an important feature, because this is also a common\nsource of undefined behaviour in C. For example, if you have a pointer `x` that points to\nan object `y`, the value of the object `y` might change during the runtime\nof your program, and, as a consequence, the value pointed to by the pointer `x`\nwill change. I demonstrated already how the value pointed to by a pointer\nmight change at @sec-pointer-var.\n\nSo, if for some reason, the value of `y` changes to a null value during runtime,\nand, you try to use the object `y` in some way in your code, the chances of\nproducing undefined behaviour in your program are sky high.\n\nIn C, when the value of `y` changes, you have no guarantees about whether the value\nof the object `y` changed to a null value or not. You simply don't know\nupfront if you have a null value in `y` or not. 
But in Zig,\nyou can safely assume that the object `y` will never have a null value.\n\n\n### What are optionals?\n\nOk, we know now that all objects are non-nullable by default in Zig.\nBut what if we actually need to use an object that might receive a null value?\nHere is where optionals come in.\n\nAn optional object in Zig is an object that can be null.\nTo mark an object as optional, we use the `?` operator. When you put\nthis `?` operator right before the data type of an object, you transform\nthis data type into an optional data type, and the object becomes an optional object.\n\nTake the snippet below as an example. We are creating a new variable object\ncalled `num`. This object has the data type `?i32`, which means that\nthis object contains either a signed 32-bit integer (`i32`), or, a null value.\nBoth alternatives are valid values for the `num` object.\nThat is why I can actually change the value of this object to null, and\nno errors are raised by the `zig` compiler, as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: ?i32 = 5;\nnum = null;\n```\n:::\n\n\n\n\n### Optional pointers\n\nYou can also mark a pointer object as an optional pointer, meaning that\nthis pointer points to either a null value, or, a value that is not null.\nIn other words, by marking a pointer in Zig as optional, you get a pointer in Zig that\nbehaves like a pointer in C.\n\nWhen you mark a pointer as optional, the data type of this pointer object\nbecomes `?*const T` or `?*T`, depending on whether the value pointed to by the pointer\nis a constant value or not. The `?` identifies the object as optional, and\nthe `*` identifies it as a pointer object.\n\nIn the example below, we are creating a variable object named `num`, and an\noptional pointer object named `ptr`. 
Notice that the data type of the object\n`ptr` indicates that it is either a null value, or a pointer to an `i32` value.\nAlso, notice that the pointer object (`ptr`) can be marked as optional, even if\nthe object `num` is not optional.\n\nWhat this code tells us is that the `num` variable will never contain a null value.\nThis variable will always contain a valid `i32` value. But in contrast, the `ptr` object might contain either a null\nvalue, or, a pointer to an `i32` value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: i32 = 5;\nconst ptr: ?*i32 = &num;\n_ = ptr;\n```\n:::\n\n\n\n\nBut what happens if we turn the tables, and mark the `num` object as optional,\ninstead of the pointer object? If we do that, then, the pointer object is\nnot optional anymore. It would be a similar (although different) result, because then we would have\na pointer to an optional value. In other words, a pointer to a value that is either a\nnull value, or, a not-null value.\n\nIn the example below, we are recreating this idea. Now, the `ptr` object\nhas a data type of `*?i32`, instead of `?*i32`. Notice that the `*` symbol comes before the `?`\nthis time. So now, we have a pointer that points to a value that is either null,\nor a signed 32-bit integer.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: ?i32 = 5;\n// ptr has type `*?i32`, instead of `?*i32`.\nconst ptr = &num;\n_ = ptr;\n```\n:::\n\n\n\n\n\n### Null handling in optionals {#sec-null-handling}\n\nWhen you have an optional object in your Zig code, you have to explicitly handle\nthe possibility of this object being null. 
It is like error-handling with `try` and `catch`.\nIn Zig you also have to handle null values as if they were a type of error.\n\nWe can do that, by using either:\n\n- an if statement, like you would do in C.\n- the `orelse` keyword.\n- the `?` method, which unwraps the optional value.\n\nWhen you use an if statement, you use a pair of pipes\nto unwrap the optional value, and use this \"unwrapped object\"\ninside the if block.\nUsing the example below as a reference, if the object `num` is null,\nthen, the code inside the if statement is not executed. Otherwise,\nthe if statement will unwrap the object `num` into the `not_null_num`\nobject. This `not_null_num` object is guaranteed to be not null inside\nthe scope of the if statement.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst num: ?i32 = 5;\nif (num) |not_null_num| {\n    try stdout.print(\n        \"{d}\\n\", .{not_null_num}\n    );\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n5\n```\n\n\n:::\n:::\n\n\n\n\nNow, the `orelse` keyword behaves like a binary operator. You connect two expressions with this keyword.\nOn the left side of `orelse`, you provide the expression that might result\nin a null value, and on the right side of `orelse` you provide another expression\nthat will not result in a null value.\n\nThe idea behind the `orelse` keyword is: if the expression on the left side\nresults in a not-null value, then, this not-null value is used. 
However,\nif this expression on the left side results in a null value, then, the value\nof the expression on the right side is used instead.\n\nLooking at the example below, since the `x` object is currently null, the\n`orelse` keyword uses the alternative value, which is the number 15.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst x: ?i32 = null;\nconst dbl = (x orelse 15) * 2;\ntry stdout.print(\"{d}\\n\", .{dbl});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n30\n```\n\n\n:::\n:::\n\n\n\n\nYou can use the if statement or the `orelse` keyword, when you want to\nsolve (or deal with) this null value. However, if there is no clear solution\nto this null value, and the most logical and sane path is to simply panic\nand raise a loud error in your program when this null value is encountered,\nyou can use the `?` method of your optional object.\n\nIn essence, when you use this `?` method, the optional object is unwrapped.\nIf a not-null value is found in the optional object, then, this not-null value is used.\nOtherwise, the `unreachable` keyword is used. 
You can read more about this\n[`unreachable` keyword at the official documentation](https://ziglang.org/documentation/master/#unreachable)[^un-docs].\nBut in essence, when you build your Zig source code using the build modes `ReleaseSafe` or `Debug`, this\n`unreachable` keyword causes the program to panic and raise an error during runtime,\nlike in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn return_null(n: i32) ?i32 {\n    if (n == 5) return null;\n    return n;\n}\n\npub fn main() !void {\n    const x: i32 = 5;\n    const y: ?i32 = return_null(x);\n    try stdout.print(\"{d}\\n\", .{y.?});\n}\n```\n:::\n\n\n\n\n```\nthread 12767 panic: attempt to use null value\np7.zig:12:34: 0x103419d in main (p7):\n try stdout.print(\"{d}\\n\", .{y.?});\n ^\n```\n\n\n[^un-docs]: <https://ziglang.org/documentation/master/#unreachable>.\n\n\n", - "supporting": [ - "05-pointers_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/07-build-system/execute-results/html.json b/_freeze/Chapters/07-build-system/execute-results/html.json index c0f45cdb..6171182e 100644 --- a/_freeze/Chapters/07-build-system/execute-results/html.json +++ b/_freeze/Chapters/07-build-system/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Build System {#sec-build-system}\n\n\nIn this chapter, we are going to talk about the build system in Zig, and how an entire project\nis built in Zig.\nOne key advantage of Zig is that it includes a build system embedded in the language itself.\nThis is great, because then you do not have to depend on an external system, separated\nfrom the compiler, to build your code.\n\n\nYou can find a good description of Zig's build system\nin the [article entitled \"Build System\"](https://ziglang.org/learn/build-system/#user-provided-options)[^zig-art1]\navailable in 
the official Zig website.\nWe also have the excellent [series of posts written by Felix](https://zig.news/xq/zig-build-explained-part-1-59lf)[^felix].\nBut this chapter represents an extra resource for you to consult and rely on.\n\n[^felix]: <https://zig.news/xq/zig-build-explained-part-1-59lf>\n[^zig-art1]: <https://ziglang.org/learn/build-system/#user-provided-options>\n\nBuilding code is one of the things that Zig is best at. One thing that is particularly\ndifficult in C/C++ and even in Rust, is to cross-compile source code to multiple targets\n(e.g. multiple computer architectures and operating systems),\nand the `zig` compiler is known for being one of the best existing pieces of software\nfor this particular task.\n\n\n\n\n## How is source code built?\n\nWe already talked about the challenges of building source code in low-level languages\nat @sec-project-files. As we described in that section, programmers invented Build Systems\nto overcome these challenges in the build processes of low-level languages.\n\nLow-level languages use a compiler to compile (or to build) your source code into binary instructions.\nIn C and C++, we normally use compilers like `gcc`, `g++` or `clang` to compile\nour C and C++ source code into these instructions.\nEvery language has its own compiler, and this is no different in Zig.\n\nIn Zig, we have the `zig` compiler to compile our Zig source code into\nbinary instructions that can be executed by our computer.\nIn Zig, the compilation (or the build) process involves\nthe following components:\n\n- The Zig modules that contain your source code;\n- Library files (either a dynamic library or a static library);\n- Compiler flags that tailor the build process to your needs.\n\nThese are the things that you need to connect together in order to build your\nsource code in Zig. In C and C++, you would have an extra component, which is the header files of\nthe libraries that you are using. 
But header files do not exist in Zig, so, you only need\nto care about them if you are linking your Zig source code with a C library.\nIf that is not your case, you can forget about them.\n\nYour build process is usually organized in a build script. In Zig, we normally\nwrite this build script into a Zig module in the root directory of our project,\nnamed `build.zig`. You write this build script, then, when you run it, your project\ngets built into binary files that you can use and distribute to your users.\n\nThis build script is normally organized around *target objects*. A target is simply\nsomething to be built, or, in other words, it's something that you want the `zig` compiler\nto build for you. This concept of \"targets\" is present in most Build Systems,\nespecially in CMake[^cmake].\n\n[^cmake]: \n\nThere are four types of target objects that you can build in Zig, which are:\n\n- An executable, which is simply a binary executable file (e.g. a `.exe` file on Windows).\n- A shared library, which is simply a binary library file (e.g. a `.so` file in Linux or a `.dll` file on Windows).\n- A static library, which is simply a binary library file (e.g. a `.a` file in Linux or a `.lib` file on Windows).\n- A unit tests executable, which is an executable file that executes only unit tests.\n\nWe are going to talk more about these target objects at @sec-targets.\n\n\n\n## The `build()` function {#sec-build-fun}\n\nA build script in Zig always contains a public (and top-level) `build()` function.\nIt is like the `main()` function in the main Zig module of your project, that we discussed at @sec-main-file.\nBut instead of creating the entrypoint to your code, this `build()` function is the entrypoint to the build process.\n\nThis `build()` function should accept a pointer to a `Build` object as input, and it should use this \"build object\" to perform\nthe necessary steps to build your project. 
The return type of this function is normally `void`,\nand this `Build` struct comes directly from the Zig Standard Library (`std.Build`). So, you can\naccess this struct by just importing the Zig Standard Library into your `build.zig` module.\n\nJust as a very simple example, here you can see the source code necessary to build\nan executable file from the `hello.zig` Zig module.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn build(b: *std.Build) void {\n    const exe = b.addExecutable(.{\n        .name = \"hello\",\n        .root_source_file = b.path(\"hello.zig\"),\n        .target = b.host,\n    });\n    b.installArtifact(exe);\n}\n```\n:::\n\n\n\n\nYou can define and use other functions and objects in this build script. You can also import\nother Zig modules as you would normally do in any other module of your project.\nThe only real requirement for this build script is to have a public and top-level\n`build()` function defined, that accepts a pointer to a `Build` struct as input.\n\n\n## Target objects {#sec-targets}\n\nAs we described in the previous sections, a build script is organized around target objects.\nEach target object is normally a binary file (or an output) that you want to get from the build process. You can list\nmultiple target objects in your build script, so that the build process generates multiple\nbinary files for you at once.\n\nFor example, you may be a developer working on a cross-platform application,\nand, because this application is cross-platform, you probably need to release\nbinary files of your software for each OS supported by your application to your end users.\nYou can define a target object in your build script\nfor each OS (Windows, Linux, etc.) 
where you want to publish your software.\nThis will make the `zig` compiler build your project for multiple target OSes at once.\nThe Zig Build System official documentation has a [great code example that demonstrates this strategy](https://ziglang.org/learn/build-system/#handy-examples)[^zig-ex].\n\n[^zig-ex]: <https://ziglang.org/learn/build-system/#handy-examples>\n\n\nA target object is created by the following methods of the `Build` struct that we introduced\nat @sec-build-fun:\n\n- `addExecutable()` creates an executable file;\n- `addSharedLibrary()` creates a shared library file;\n- `addStaticLibrary()` creates a static library file;\n- `addTest()` creates an executable file that executes unit tests.\n\n\nThese functions are methods from the `Build` struct that you receive\nas input of the `build()` function. All of them create as output\na `Compile` object, which represents a target object to be compiled\nby the `zig` compiler. All of these functions accept a similar struct literal as input.\nThis struct literal defines three essential specs about this target object that you are building:\n`name`, `target` and `root_source_file`.\n\nWe already saw these three options being used in the previous example,\nwhere we used the `addExecutable()` method to create an executable target object.\nThis example is reproduced below. Notice the use of the `path()` method\nfrom the `Build` struct, to define a path in the `root_source_file` option.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exe = b.addExecutable(.{\n    .name = \"hello\",\n    .root_source_file = b.path(\"hello.zig\"),\n    .target = b.host,\n});\n```\n:::\n\n\n\n\nThe `name` option specifies the name that you want to give to the binary file defined\nby this target object. So, in this example, we are building an executable file named `hello`.\nIt is traditional to set this `name` option to the name of your project.\n\n\nFurthermore, the `target` option specifies the target computer architecture (or the target operating system) of this\nbinary file. 
For example, if you want this target object to run on a Windows machine\nthat uses an `x86_64` architecture, you can set this `target` option to `x86_64-windows-gnu`.\nThis will make the `zig` compiler compile the project to run on an `x86_64` Windows machine.\nYou can see the full list of architectures and OSes that the `zig` compiler supports by running\nthe `zig targets` command in the terminal.\n\nNow, if you are building the project to run on the current machine\nthat you are using to run this build script, you can set this `target`\noption to the `host` field of the `Build` object, like we did in the example above.\nThis `host` field identifies the machine where you are\ncurrently running the `zig` compiler.\n\n\nFinally, the `root_source_file` option specifies the root Zig module of your project.\nThat is the Zig module that contains the entrypoint to your application (i.e. the `main()` function), or, the main API of your library.\nThis also means that all the Zig modules that compose your project are automatically discovered\nfrom the import statements that you have inside this \"root source file\".\nThe `zig` compiler can detect when a Zig module depends on another through the import statements,\nand, as a result, it can discover the entire map of Zig modules used in your project.\n\nThis is handy, and it is different from what happens in other build systems.\nIn CMake for example, you have to explicitly list the paths to all source files that you want to\ninclude in your build process. This is probably a symptom of the \"lack of conditional\ncompilation\" in C and C++ compilers. 
Since they lack this feature, you have\nto explicitly choose which source files are sent to the C/C++ compiler, because not\nall C/C++ code is portable or supported in every operating system, and, therefore,\nsome of it would cause a compilation error in the C/C++ compiler.\n\n\nNow, one important detail about the build process is that you have to **explicitly\ninstall the target objects that you create in your build script**, by using the\n`installArtifact()` method of the `Build` struct.\n\nEvery time you invoke the build process of your project, by calling the `build` command\nof the `zig` compiler, a new directory named `zig-out` is created in the root\ndirectory of your project. This new directory contains the output of the build process,\nthat is, the binary files built from your source code.\n\nWhat the `installArtifact()` method does is install (or copy) the built target objects\nthat you defined to this `zig-out` directory.\nThis means that, if you do not\ninstall the target objects you define in your build script, these target objects are\nessentially discarded at the end of the build process.\n\nFor example, you might be building a project that uses a third-party library that is built\ntogether with the project. So, when you build your project, you would first need to\nbuild the third-party library, and then link it with the source code of your project.\nSo, in this case, we have two binary files that are generated in the build process (the executable file of your project, and the third-party library).\nBut only one is of interest, which is the executable file of our project.\nWe can discard the binary file of the third-party library, by simply not installing it\ninto this `zig-out` directory.\n\nSo, it is easy to use this `installArtifact()` method. 
Just remember to apply it to every\ntarget object that you want to save into the `zig-out` directory, like in the example below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exe = b.addExecutable(.{\n    .name = \"hello\",\n    .root_source_file = b.path(\"hello.zig\"),\n    .target = b.host,\n});\n\nb.installArtifact(exe);\n```\n:::\n\n\n\n\n\n## Setting the build mode\n\nWe talked about the three essential options that are set when you create a new target object.\nBut there is also a fourth option that you can use to set the build mode of this target object,\nwhich is the `optimize` option.\nThis option is called this way, because the build modes in Zig are treated more as\nan \"optimization vs safety\" problem. So optimization plays an important role here.\nDon't worry, I will come back to this question very soon.\n\nIn Zig, we have the four build modes listed below. Each one of these build modes offers\ndifferent advantages and characteristics. As we described at @sec-compile-debug-mode, the `zig` compiler\nuses the `Debug` build mode by default, when you don't explicitly choose a build mode.\n\n- `Debug`, mode that produces and includes debugging information in the output of the build process (i.e. the binary file defined by the target object);\n- `ReleaseSmall`, mode that tries to produce a binary file that is small in size;\n- `ReleaseFast`, mode that tries to optimize your code, in order to produce a binary file that is as fast as possible;\n- `ReleaseSafe`, mode that tries to make your code as safe as possible, by including safeguards when possible.\n\nSo, when you build your project, you can set the build mode of your target object to `ReleaseFast` for example, which will tell\nthe `zig` compiler to apply important optimizations in your code. 
This creates a binary file\nthat simply runs faster in most contexts, because it contains a more optimized version of your code.\nHowever, as a result, we normally lose some safety features in our code,\nbecause some safety checks are removed from the final binary file,\nwhich makes your code run faster, but in a less safe manner.\n\nThis choice depends on your current priorities. If you are building a cryptography or banking system, you might\nprefer to prioritize safety in your code, so you would choose the `ReleaseSafe` build mode, which is a little\nslower to run, but much safer, because it keeps the runtime safety checks in the binary file\nbuilt in the build process. On the other hand, if you are writing a game for example, you might prefer to prioritize performance\nover safety, by using the `ReleaseFast` build mode, so that your users can experience faster frame rates in your game.\n\nIn the example below, we are creating the same\ntarget object that we used in previous examples. 
But this time, we are specifying\nthe build mode of this target object to use the `ReleaseSafe` mode.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exe = b.addExecutable(.{\n .name = \"hello\",\n .root_source_file = b.path(\"hello.zig\"),\n .target = b.host,\n .optimize = .ReleaseSafe\n});\nb.installArtifact(exe);\n```\n:::\n\n\n\n\n\n## Setting the version of your build\n\nEverytime you build a target object in your build script, you can assign a version\nnumber to this specific build, following a semantic versioning framework.\nYou can find more about semantic versioning by visiting the [Semantic Versioning website](https://semver.org/)[^semver].\nAnyway, in Zig,\nyou can specify the version of your build, by providing a `SemanticVersion` struct to\nthe `version` option, like in the example below:\n\n\n[^semver]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exe = b.addExecutable(.{\n .name = \"hello\",\n .root_source_file = b.path(\"hello.zig\"),\n .target = b.host,\n .version = .{\n .major = 2, .minor = 9, .patch = 7\n }\n});\nb.installArtifact(exe);\n```\n:::\n\n\n\n\n\n## Detecting the OS in your build script {#sec-detect-os}\n\nIs very commom in Build Systems to use different options, or, to include different modules, or,\nto link against different libraries depending on the Operational System (OS)\nthat you are targeting in the build process.\n\nIn Zig, you can detect the target OS of the build process, by looking\nat the `os.tag` inside the `builtin` module from the Zig library.\nIn the example below, we are using an if statement to run some\narbitrary code when the target of the build process is a\nWindows system.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst builtin = @import(\"builtin\");\nif (builtin.target.os.tag == .windows) {\n // Code that runs only when the target of\n // the compilation process is Windows.\n}\n```\n:::\n\n\n\n\n\n## Adding a run step to your build process\n\nOne thing that is neat in Rust is that you can 
compile and run your\nsource code with one single command (`cargo run`) from the Rust compiler.\nWe saw at @sec-compile-run-code how can we perform a similar job in Zig, by\nbuilding and running our Zig source code through the `run` command from the `zig` compiler.\n\nBut how can we, at the same time, build and run the binary file specified by a target object in our\nbuild script?\nThe answer is by including a \"run artifact\" in our build script.\nA run artifact is created through the `addRunArtifact()` method from the `Build` struct.\nWe simply provide as input to this function the target object that describes the binary file that we\nwant to execute, and the function creates as output, a run artifact capable of executing\nthis binary file.\n\nIn the example below, we are defining an executable binary file named `hello`,\nand we use this `addRunArtifact()` method to create a run artifact that will execute\nthis `hello` executable file.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exe = b.addExecutable(.{\n .name = \"hello\",\n .root_source_file = b.path(\"src/hello.zig\"),\n .target = b.host\n});\nb.installArtifact(exe);\nconst run_arti = b.addRunArtifact(exe);\n```\n:::\n\n\n\n\nNow that we created the run artifact, we need to include it in\nthe build process. 
We do that by declaring a new step in our build\nscript to call this artifact, through the `step()` method of the `Build`\nstruct.\nWe can give any name we want to this step, but, for our\ncontext here, I'm going to name this step \"run\".\nAlso, I give a brief description to this step (\"Run the project\").\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst run_step = b.step(\n \"run\", \"Run the project\"\n);\n```\n:::\n\n\n\n\n\nNow that we declared this \"run step\" we need to tell Zig that\nthis \"run step\" depends on the run artifact.\nIn other words, a run artifact always depends on a \"step\" to effectively be executed.\nBy creating this dependency\nwe finally establish the necessary commands to build and run the executable file\nfrom the build script.\n\nWe establish a dependency between the run step and the run artifact\nby using the `dependOn()` method from the run step. So, we first\ncreate the run step, and then, we link it with the run artifact, by\nusing this `dependOn()` method from the run step.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nrun_step.dependOn(&run_arti.step);\n```\n:::\n\n\n\n\nThe entire source code of this specific build script that\nwe wrote, piece by piece, in this section, is\navailable in the `build_and_run.zig` module. 
You can\nsee this module by\n[visiting the official repository of this book](https://github.com/pedropark99/zig-book/blob/main/ZigExamples/build_system/build_and_run.zig)\n[^module-src].\n\n\n[^module-src]: \n\nWhen you declare a new step in your build script, this step\nbecomes available through the `build` command in the `zig` compiler.\nYou can actually see this step by running `zig build --help` in the terminal, like\nin the example below, where we can see that this new \"run\"\nstep that we declared in the build script appears in the output.\n\n```bash\nzig build --help\n```\n\n```\nSteps:\n ...\n run Run the project\n ...\n```\n\nNow, all that we need to do is to\ncall this \"run\" step that we created in our build script. We\ncall it by using the name that we gave to this step\nafter the `build` command from the `zig` compiler.\nThis will cause the compiler to build our executable file\nand then execute it, all in a single command.\n\n```bash\nzig build run\n```\n\n## Build unit tests in your project\n\nWe talk at length about writing unit tests in Zig, and we also talk about how to execute these unit tests through\nthe `test` command of the `zig` compiler at @sec-unittests. However,\nas we did with the `run` command in the previous section, we also might want to\ninclude some commands in our build script to build and execute the unit tests in our project.\n\nSo, once again, we are going to discuss how a specific built-in command from the `zig` compiler\n(in this case, the `test` command) can be used in a build script in Zig.\nThis means that we can include a step in our build script to build and run\nall unit tests in our project at once.\n\nHere is where a \"test target object\" comes into play.\nAs was described at @sec-targets, we can create a test target object by using the `addTest()` method of\nthe `Build` struct. 
So the first thing that we need to do is to\ndeclare a test target object in our build script.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst test_exe = b.addTest(.{\n .name = \"unit_tests\",\n .root_source_file = b.path(\"src/main.zig\"),\n .target = b.host,\n});\nb.installArtifact(test_exe);\n```\n:::\n\n\n\n\n\nA test target object essentially filters all `test` blocks in all Zig modules\nacross your project, and builds only the source code present inside\nthese `test` blocks in your project. As a result, this target object\ncreates an executable file that contains only the source code present\nin all of these `test` blocks (i.e. the unit tests) in your project.\n\nPerfect! Now that we declared this test target object, an executable file\nnamed `unit_tests` is built by the `zig` compiler when we trigger the build\nscript with the `build` command. After the build\nprocess is finished, we can simply execute this `unit_tests` executable\nin the terminal.\n\nHowever, if you remember the previous section, we already learned\nhow we can create a run step in our build script, to execute an executable file\nbuilt by the build script.\n\nSo, we could simply add a run step in our build script to run these unit tests\nfrom a single command in the `zig` compiler, to make our lives easier.\nIn the example below, we demonstrate the commands to\nregister a new build step called \"tests\" in our build script\nto run these unit tests.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst run_arti = b.addRunArtifact(test_exe);\nconst run_step = b.step(\"tests\", \"Run unit tests\");\nrun_step.dependOn(&run_arti.step);\n```\n:::\n\n\n\n\nNow that we registered this new build step, we can trigger it by calling the command below\nin the terminal. 
You can also checkout the complete source\ncode for this specific build script at the `build_tests.zig` module at the\n[official repository of this book](https://github.com/pedropark99/zig-book/blob/main/ZigExamples/build_system/build_tests.zig)\n[^module-src2].\n\n\n[^module-src2]: \n\n\n```bash\nzig build tests\n```\n\n\n## Tailoring your build process with user-provided options\n\nSometimes, you want to make a build script that is customizable by the user\nof your project. You can do that by creating user-provided options in\nyour build script. In Zig, we create these options using the\n`option()` method from the `Build` struct.\n\nWith this method, we create a \"build option\" which can be passed\nto the `build.zig` script at the command line. The user have the power of setting\nthis option at the `build` command from the\n`zig` compiler. In other words, each build option that we create\nbecomes a new command line argument accessible in the `build` command\nof the compiler.\n\nThese \"user-provided options\" are set by using the prefix `-D` in the command line.\nFor example, if we declare an option named `use_zlib`, that receives a boolean value which\nindicates if we should link our source code to `zlib` or not, we can set the value\nof this option in the command line with `-Duse_zlib`. The code example below\ndemonstrates this idea:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn build(b: *std.Build) void {\n const use_zlib = b.option(\n bool,\n \"use_zlib\",\n \"Should link to zlib?\"\n ) orelse false;\n const exe = b.addExecutable(.{\n .name = \"hello\",\n .root_source_file = b.path(\"example.zig\"),\n .target = b.host,\n });\n if (use_zlib) {\n exe.linkSystemLibrary(\"zlib\");\n }\n b.installArtifact(exe);\n}\n```\n:::\n\n\n\n\nYou can set this `use_zlib` option at the command line when you are invoking the\n`build` command from the `zig` compiler. 
In the example below, we set this option\nto false, which means that the build script will not link our binary executable to\nthe `zlib` library.\n\n```bash\nzig build -Duse_zlib=false\n```\n\n\n## Linking to external libraries\n\n\nOne essential part of every build process is the linking stage.\nThis stage is responsible for combining the multiple object files\nthat represent your code, into a single executable file. It also links\nthis executable file to external libraries, if you use any in your code.\n\nIn Zig, we have two notions of a \"library\", which are: 1) a system's library;\n2) a local library. A system's library is just a library that is already installed\nin your system, while a local library is a library that belongs to the current\nproject. It is a library that is present in your project directory, and\nthat you are building together with your project source code.\n\nThe basic difference between the two is that a system's library is already\nbuilt and installed in your system, supposedly, and all you need to do\nis to link your source code to this library to start using it.\nWe do that by using the `linkSystemLibrary()` method from a\n`Compile` object. This method accepts the name of the library\nin a string as input. 
Remember from @sec-targets that a `Compile` object\nis a target object that you declare in your build script.\n\nWhen you link a particular target object with a system's library,\nthe `zig` compiler will use `pkg-config` to find where in your system\nare the binary files and also the header files of this library that you requested for.\nWhen it finds these files, the linker present in the `zig` compiler\nwill link your object files with the files of this library to\nproduce a single executable file.\n\nIn the example below, we are creating an executable file named `image_filter`,\nand, we are linking this executable file to the C Standard Library with the\nmethod `linkLibC()`, but we also are linking this executable file to the\nC library `libpng` that is currently installed in my system.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn build(b: *std.Build) void {\n const exe = b.addExecutable(.{\n .name = \"image_filter\",\n .root_source_file = b.path(\"src/main.zig\"),\n .target = target,\n .optimize = optimize,\n });\n exe.linkLibC();\n exe.linkSystemLibrary(\"png\");\n b.installArtifact(exe);\n}\n```\n:::\n\n\n\n\nIf you are linking with a C library in your project, is generally a good idea\nto also link your code with the C Standard Library. Because is very likely\nthat this C library uses some functionality of the C Standard Library at some point.\nThe same goes to C++ libraries. 
So, if you are linking with\nC++ libraries, it is a good idea to link your project with the C++\nStandard Library using the `linkLibCpp()` method.\n\nOn the other hand, when you want to link with a local library,\nyou should use the `linkLibrary()` method of a `Compile` object.\nThis method expects to receive another `Compile` object as input.\nThat is, another target object defined in your build script,\nusing either the `addStaticLibrary()` or `addSharedLibrary()` methods,\nwhich define a library to be built.\n\nAs we discussed earlier, a local library is a library\nthat is local to your project, and that is being built together\nwith your project. So, you need to create a target object in your build script\nto build this local library. Then, you link the target objects of interest in your project\nwith this target object that identifies this local library.\n\nTake a look at this example extracted from the build script of the [`libxev` library](https://github.com/mitchellh/libxev/tree/main)[^libxev2].\nYou can see in this snippet that\nwe are declaring a shared library file, from the `c_api.zig`\nmodule. Then later in the build script, we declare an\nexecutable file named \"dynamic-binding-test\", which\nlinks to this shared library that we defined earlier\nin the script.\n\n[^libxev2]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst optimize = b.standardOptimizeOption(.{});\nconst target = b.standardTargetOptions(.{});\n\nconst dynamic_lib = b.addSharedLibrary(.{\n .name = dynamic_lib_name,\n .root_source_file = b.path(\"src/c_api.zig\"),\n .target = target,\n .optimize = optimize,\n});\nb.installArtifact(dynamic_lib);\n// ... 
more lines in the script\nconst dynamic_binding_test = b.addExecutable(.{\n .name = \"dynamic-binding-test\",\n .target = target,\n .optimize = optimize,\n});\ndynamic_binding_test.linkLibrary(dynamic_lib);\n```\n:::\n\n\n\n\n\n\n## Building C code {#sec-building-c-code}\n\nThe `zig` compiler comes with a C compiler embedded in it. In other words,\nyou can use the `zig` compiler to build C projects. This C compiler is available\nthrough the `cc` command of the `zig` compiler.\n\nAs an example, let's use the famous [FreeType library](https://freetype.org/)[^freetype].\nFreeType is one of the most widely used pieces of software in the world.\nIt is a C library designed to produce high-quality fonts. But it is also\nheavily used in the industry to natively render text and fonts\nin the screen of your computer.\n\nIn this section, we are going to write a build script, piece by piece, that is capable\nof building the FreeType project from source.\nYou can find the source code of this build script on the\n[`freetype-zig` repository](https://github.com/pedropark99/freetype-zig/tree/main)[^freetype-zig]\navailable at GitHub.\n\n[^freetype]: \n[^freetype-zig]: \n\nAfter you download the source code of FreeType from the official website[^freetype],\nyou can start writing the `build.zig` module. 
We begin by defining the target object\nthat defines the binary file that we want to compile.\n\nAs an example, I will build the project as a static library file using the `addStaticLibrary()` method\nto create the target object.\nAlso, since FreeType is a C library, I will also link the library\nagainst `libc` through the `linkLibC()` method, to guarantee that any use\nof the C Standard Library is covered in the compilation process.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst target = b.standardTargetOptions(.{});\nconst opti = b.standardOptimizeOption(.{});\nconst lib = b.addStaticLibrary(.{\n .name = \"freetype\",\n .optimize = opti,\n .target = target,\n});\nlib.linkLibC();\n```\n:::\n\n\n\n\n### Creating C compiler flags\n\nCompiler flags are also known as \"compiler options\" by many programmers,\nor also, as \"command options\" in the GCC official documentation.\nIt's fair to also call them the \"command-line arguments\" of the C compiler.\nIn general, we use compiler flags to turn on (or turn off) some features from the compiler,\nor to tweak the compilation process to fit the needs of our project.\n\nIn build scripts written in Zig, we normally list the C compiler flags to be used in the compilation process\nin a simple array, like in the example below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c_flags = [_][]const u8{\n \"-Wall\",\n \"-Wextra\",\n \"-Werror\",\n};\n```\n:::\n\n\n\n\nIn theory, there is nothing stopping you from using this array to add \"include paths\" (with the `-I` flag)\nor \"library paths\" (with the `-L` flag) to the compilation process. But there are formal ways in Zig to\nadd these types of paths in the compilation process. Both are discussed at @sec-include-paths\nand @sec-library-paths.\n\nAnyway, in Zig, we add C flags to the build process together with the C files that we want to compile, using the\n`addCSourceFile()` and `addCSourceFiles()` methods. 
In the example above, we have just declared\nthe C flags that we want to use. But we haven't added them to the build process yet.\nTo do that, we also need to list the C files to be compiled.\n\n### Listing your C files\n\nThe C files that contain \"cross-platform\" source code are listed in the `c_source_files`\nobject below. These are the C files that are included by default in every platform\nsupported by the FreeType library. Now, since the number of C files in the FreeType library is large,\nI have omitted the rest of the files in the code example below, for brevity purposes.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c_source_files = [_][]const u8{\n \"src/autofit/autofit.c\",\n \"src/base/ftbase.c\",\n // ... and many other C files.\n};\n```\n:::\n\n\n\n\nNow, in addition to \"cross-platform\" source code, we also have some C files in the FreeType project\nthat are platform-specific, meaning that they contain source code that can only be compiled on specific platforms,\nand, as a result, they are only included in the build process on these specific target platforms.\nThe objects that list these C files are exposed in the code example below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst windows_c_source_files = [_][]const u8{\n \"builds/windows/ftdebug.c\",\n \"builds/windows/ftsystem.c\"\n};\nconst linux_c_source_files = [_][]const u8{\n \"src/base/ftsystem.c\",\n \"src/base/ftdebug.c\"\n};\n```\n:::\n\n\n\n\nNow that we declared both the files that we want to include and the C compiler flags to be used,\nwe can add them to the target object that describes the FreeType library, by using the\n`addCSourceFile()` and `addCSourceFiles()` methods.\n\nBoth of these functions are methods from a `Compile` object (i.e. 
a target object).\nThe `addCSourceFile()` method is capable of adding a single C file to the target object,\nwhile the `addCSourceFiles()` method is used to add multiple C files in a single command.\nYou might prefer to use `addCSourceFile()` when you need to use different compiler flags\non specific C files in your project. But, if you can use the same compiler flags\nacross all C files, then, you will probably find `addCSourceFiles()` a better choice.\n\nNotice that we are using the `addCSourceFiles()` method in the example below,\nto add both the C files and the C compiler flags. Also notice that we\nare using the `os.tag` that we learned about at @sec-detect-os, to add the platform-specific\nC files.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst builtin = @import(\"builtin\");\nlib.addCSourceFiles(\n &c_source_files, &c_flags\n);\n\nswitch (builtin.target.os.tag) {\n .windows => {\n lib.addCSourceFiles(\n &windows_c_source_files,\n &c_flags\n );\n },\n .linux => {\n lib.addCSourceFiles(\n &linux_c_source_files,\n &c_flags\n );\n },\n else => {},\n}\n```\n:::\n\n\n\n\n\n### Defining C Macros\n\nC Macros are an essential part of the C programming language,\nand they are commonly defined through the `-D` flag from a C compiler.\nIn Zig, you can define a C Macro to be used in your build process\nby using the `defineCMacro()` method from the target object that\ndefines the binary file that you are building.\n\nIn the example below, we are using the `lib` object that we defined\non the previous sections to define some C Macros used by FreeType\nin the compilation process. 
These C Macros specify if FreeType\nshould (or should not) include functionalities from different\nexternal libraries.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nlib.defineCMacro(\"FT_DISABLE_ZLIB\", \"TRUE\");\nlib.defineCMacro(\"FT_DISABLE_PNG\", \"TRUE\");\nlib.defineCMacro(\"FT_DISABLE_HARFBUZZ\", \"TRUE\");\nlib.defineCMacro(\"FT_DISABLE_BZIP2\", \"TRUE\");\nlib.defineCMacro(\"FT_DISABLE_BROTLI\", \"TRUE\");\nlib.defineCMacro(\"FT2_BUILD_LIBRARY\", \"TRUE\");\n```\n:::\n\n\n\n\n\n### Adding library paths {#sec-library-paths}\n\nLibrary paths are paths in your computer where the C compiler will look (or search) for\nlibrary files to link against your source code. In other words, when you use a library in your\nC source code, and you ask the C compiler to link your source code against this library,\nthe C compiler will search for the binary files of this library across the paths listed\nin this \"library paths\" set.\n\nThese paths are platform specific, and, by default, the C compiler starts by looking at a\npre-defined set of places in your computer. But you can add more paths (or more places)\nto this list. 
For example, you may have a library installed in a non-conventional place of your\ncomputer, and you can make the C compiler \"see\" this \"non-conventional place\" by adding this path\nto this list of pre-defined paths.\n\nIn Zig, you can add more paths to this set by using the `addLibraryPath()` method from your target object.\nFirst, you defined a `LazyPath` object, containing the path you want to add, then,\nyou provide this object as input to the `addLibraryPath()` method, like in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst lib_path: std.Build.LazyPath = .{\n .cwd_relative = \"/usr/local/lib/\"\n};\nlib.addLibraryPath(lib_path);\n```\n:::\n\n\n\n\n\n\n\n### Adding include paths {#sec-include-paths}\n\nThe preprocessor search path is a popular concept from the\nC community, but it is also known by many C programmers as \"include paths\", because\nthe paths in this \"search path\" relate to the `#include` statements found in the C files.\n\nInclude paths are similar to library paths. They are a set of pre-defined places in your computer where\nthe C compiler will look for files during the compilation process. But instead of looking for\nlibrary files, the include paths are places where the compiler looks for header files included\nin your C source code.\nThis is why many C programmers prefer to call these paths as the \"preprocessor search path\".\nBecause header files are processed during the preprocessor stage of the compilation\nprocess.\n\nSo, every header file that you include in your C source code, through a `#include` statements needs to\nbe found somewhere, and the C compiler will search for this header file across the paths listed in this \"include paths\" set.\nInclude paths are added to the compilation process through the `-I` flag.\n\nIn Zig, you can add new paths to this pre-defined set of paths, by using the `addIncludePath()` method\nfrom your target object. 
This method also accepts a `LazyPath` object as input.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst inc_path: std.Build.LazyPath = .{\n .path = \"./include\"\n};\nlib.addIncludePath(inc_path);\n```\n:::\n", - "supporting": [ - "07-build-system_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/09-data-structures/execute-results/html.json b/_freeze/Chapters/09-data-structures/execute-results/html.json index e7e3e797..ea44b35c 100644 --- a/_freeze/Chapters/09-data-structures/execute-results/html.json +++ b/_freeze/Chapters/09-data-structures/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Data Structures\n\nIn this chapter, we are going to discuss some Data Structures that are available from\nthe Zig Standard Library, specially `ArrayList` and also `HashMap`. I'm also want\nto talk about one of the key features of Zig in this chapter, which is `comptime`, and\nhow we can use it to create generics in Zig.\n\n\n## Dynamic Arrays {#sec-dynamic-array}\n\nIn high level languages, arrays are usually dynamic. They easily grow\nin size when they have to, and you don't need to worry about it.\nIn contrast, arrays in low level languages are usually static by default.\nThis is the reality of C, C++, Rust and also Zig. Static arrays were presented at\n@sec-arrays, but in this section, we are going to talk about dynamic arrays.\n\nDynamic arrays are simply arrays that can grow in size during the runtime\nof your program. Most low level languages offer some implementation of\na dynamic array in their standard library. 
C++ has `std::vector`, Rust has `Vec`,\nand Zig has `std.ArrayList`.\n\nThe `std.ArrayList` struct provides a contiguous and growable array for you.\nIt works like any other dynamic array: it allocates a contiguous block of memory, and when this block has no space left,\n`ArrayList` allocates another contiguous and bigger block of memory, copies the\nelements to this new location, and erases (or frees) the previous block of memory.\n\n\n### Capacity vs Length\n\nWhen we talk about dynamic arrays, we have two similar concepts that\nare essential to how a dynamic array works under the hood.\nThese concepts are *capacity* and *length*. In some contexts, especially\nin C++, *length* is also called *size*.\n\nAlthough they look similar, these concepts represent different things\nin the context of dynamic arrays. *Capacity* is the number of items (or elements)\nthat your dynamic array can currently hold without the need to allocate more memory.\n\nIn contrast, the *length* refers to how many elements in the array\nare currently being used, or, in other words, how many elements in this array\nyou have assigned a value to. Every dynamic array works around\na block of allocated memory that represents an array with total capacity of $n$ elements,\nbut only a portion of these $n$ elements are being used most of the time. This portion\nof $n$ is the *length* of the array. So every time you append a new value\nto the array, you are incrementing its *length* by one.\n\nThis means that a dynamic array usually works with an extra margin, or, an extra space\nwhich is currently empty, but it is waiting and ready to be used. This \"extra space\"\nis essentially the difference between *capacity* and *length*. 
*Capacity* represents\nthe total number of elements that the array can hold without the need to re-allocate\nor re-expand the array, while the *length* represents how much of this capacity\nis currently being used to hold/store values.\n\n@fig-capacity-length presents this idea visually. Notice that, at first,\nthe capacity of the array is greater than the length of the array.\nSo, the dynamic array has extra space that is currently empty, but it\nis ready to receive a value to be stored.\n\n![Difference between capacity and length in a dynamic array](./../Figures/dynamic-array.png){#fig-capacity-length}\n\nWe can also see at @fig-capacity-length that, when *length* and *capacity* are equal, it means that the array has no space left.\nWe reached the limit of our capacity, and because of that, if we want to store more values\nin this array, we need to expand it. We need to get a bigger space that can hold more values\nthan we currently have.\n\nA dynamic array works by expanding the underlying array, whenever the *length* becomes equal\nto the *capacity* of the array. It basically allocates a new contiguous block of memory that is bigger\nthan the previous one, then, it copies all values that are currently being stored to this new\nlocation (i.e. this new block of memory), then, it frees the previous block of\nmemory. At the end of this process, the new underlying array has a bigger *capacity*, and, therefore,\nthe *length* becomes once again smaller than the *capacity* of the array.\n\nThis is the cycle of a dynamic array. Notice that, throughout this cycle, the *capacity* is always\neither equal to or higher than the *length* of the array. 
If you have an `ArrayList` object, let's suppose\nyou named it `buffer`, you can check the current capacity of your array by accessing the `capacity`\nattribute of your `ArrayList` object, while the current *length* of it is available through the `items.len`\nattribute of your `ArrayList` object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// Check capacity\nbuffer.capacity;\n// Check length\nbuffer.items.len;\n```\n:::\n\n\n\n\n### Creating an `ArrayList` object\n\nIn order to use `ArrayList`, you must provide an allocator object to it.\nRemember, Zig does not have a default memory allocator. And as I described at @sec-allocators, all memory\nallocations must be done by allocator objects that you define, that\nyou have control over. In our example here, I'm going to use\na general purpose allocator, but you can use any other allocator\nof your preference.\n\nWhen you initialize an `ArrayList` object, you must provide the data type of the elements of\nthe array. In other words, this defines the type of data that this array (or container) will\nstore. Therefore, if I provide the `u8` type to it, then, I will create a dynamic\narray of `u8` values. However, if I provide a struct that I defined instead, like the struct `User`\nfrom @sec-structs-and-oop, then, a dynamic array of `User` values\nwill be created. 
In the example below, with the expression `ArrayList(u8)` we\nare creating a dynamic array of `u8` values.\n\nAfter you provide the data type of the elements of the array, you can initialize\nan `ArrayList` object by either using the `init()` or the `initCapacity()` method.\nThe former method receives only the allocator object\nas input, while the latter method receives both the allocator object and a capacity number as inputs.\nWith the latter method, you not only initialize the struct, but you\nalso set the starting capacity of the allocated array.\n\nUsing the `initCapacity()` method is the preferred way to initialize your dynamic array.\nBecause reallocations, or, in other words, the process of expanding the capacity of the array,\nis always a high cost operation. You should take any possible opportunity to avoid reallocations in\nyour array. If you know how much space your array needs to occupy at the beginning,\nyou should always use `initCapacity()` to create your dynamic array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 100);\ndefer buffer.deinit();\n```\n:::\n\n\n\n\n\nIn the example above, the `buffer` object starts as an array of 100 elements. If this\n`buffer` object needs to create more space to accomodate more elements during the runtime of your program, the `ArrayList`\ninternals will perform the necessary actions for you automatically.\nAlso notice the `deinit()` method being used to destroy the `buffer` object at the\nend of the current scope, by freeing all the memory that was allocated for the dynamic\narray stored in this `buffer` object.\n\n\n### Adding new elements to the array\n\nNow that we created our dynamic array, we can start to use it. You can append (a.k.a \"add\")\nnew values to this array by using the `append()` method. 
This method works the same way\nas the `append()` method from a Python list, or, the `emplace_back()` method from `std::vector` of C++.\nYou provide a single value to this method, and the method appends this value to the array.\n\nYou can also use the `appendSlice()` method to append multiple values at once. You provide\na slice (slices were described at @sec-arrays) to this method, and the method adds all values present\nin this slice to your dynamic array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\ntry buffer.append('H');\ntry buffer.append('e');\ntry buffer.append('l');\ntry buffer.append('l');\ntry buffer.append('o');\ntry buffer.appendSlice(\" World!\");\n```\n:::\n\n\n\n\n### Removing elements from the array {#sec-dynamic-array-remove}\n\nYou can use the `pop()` method to \"pop\" or remove\nthe last element in the array. It is worth noting that this method\ndoes not change the capacity of the array. It just deletes or erases\nthe last value stored in the array.\n\nAlso, this method returns the value that got deleted as its result. That is, you can\nuse this method to both get the last value in the array, and also, remove\nit from the array. It is a \"get and remove value\" type of method.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exclamation_mark = buffer.pop();\n```\n:::\n\n\n\n\nNow, if you want to remove specific elements from specific positions\nof your array, you can use the `orderedRemove()` method from your\n`ArrayList` object. With this method, you can provide an index as input,\nthen, the method will delete the value that is at this index in the array.\nThis effectively reduces the *length* of the array every time you execute\nan `orderedRemove()` operation.\n\nIn the example below, we first create an `ArrayList` object, and we fill it\nwith numbers. Then, we use `orderedRemove()` to remove the value at\nindex 3 in the array, two consecutive times.\n\nAlso, notice that we are assigning the result of `orderedRemove()` to the\nunderscore character. 
So we are discarding the result value of this method.\nAs the result value, the `orderedRemove()` method returns the value that\ngot deleted, in a similar style to the `pop()` method.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 100);\ndefer buffer.deinit();\n\nfor (0..10) |i| {\n const index: u8 = @intCast(i);\n try buffer.append(index);\n}\n\nstd.debug.print(\n \"{any}\\n\", .{buffer.items}\n);\n_ = buffer.orderedRemove(3);\n_ = buffer.orderedRemove(3);\n\nstd.debug.print(\n \"{any}\\n\", .{buffer.items}\n);\nstd.debug.print(\n \"{any}\\n\", .{buffer.items.len}\n);\n```\n:::\n\n\n\n\n```\n{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }\n{ 0, 1, 2, 5, 6, 7, 8, 9 }\n8\n```\n\nOne key characteristic about `orderedRemove()` is that it preserves the order\nof the values in the array. So, it deletes the value that you asked it to\nremove, but it also makes sure that the order of the values that remain in the array\nstays the same as before.\n\nNow, if you don't care about the order of the values, for example, maybe you want to treat\nyour dynamic array as a set of values, like the `std::unordered_set`\nstructure from C++, you can use the `swapRemove()` method instead. This method\nworks similarly to the `orderedRemove()` method. You give an index to this\nmethod, then, it deletes the value that is at this index in the array.\nBut this method does not preserve the original order of the values that remain\nin the array. 
As a result, `swapRemove()` is, in general, faster than `orderedRemove()`.\n\n\n### Inserting elements at specific indexes\n\nWhen you need to insert values in the middle of your array,\ninstead of just appending them to the end of the array, you need to use\nthe `insert()` and `insertSlice()` methods, instead of\nthe `append()` and `appendSlice()` methods.\n\nThese two methods work very similarly to `insert()` and `insert_range()`\nfrom the C++ vector class. You provide an index to these methods,\nand they insert the values that you provide at that index in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 10);\ndefer buffer.deinit();\n\ntry buffer.appendSlice(\"My Pedro\");\ntry buffer.insert(4, '3');\ntry buffer.insertSlice(2, \" name\");\nfor (buffer.items) |char| {\n try stdout.print(\"{c}\", .{char});\n}\n```\n:::\n\n\n\n\n```\nMy name P3edro\n```\n\n\n### Conclusion\n\nIf you miss some other method, I recommend\nthat you read the [official documentation for the `ArrayListAligned`](https://ziglang.org/documentation/master/std/#std.array_list.ArrayListAligned)[^zig-array2]\nstruct, which describes most of the methods available\nthrough the `ArrayList` object.\n\nYou will notice that there are a lot of other methods on this page that\nI did not describe here, and I recommend that you explore these methods,\nand understand how they work.\n\n[^zig-array2]: \n\n\n\n## Maps or HashTables\n\nSome professionals know this type of data structure by different terms, like \"map\", \"hashmap\" or \"associative arrays\". But most professionals\nknow this structure by the name *hashtable*.\nEvery programming language normally has some implementation of a hashtable in its\nstandard library. 
Python has `dict()`, C++ has `std::map` and `std::unordered_map`, Rust\nhas `HashMap`, Javascript has `Object()` and `Map()`,\nC# has `Hashtable()`, etc.\n\n\n\n### What is a hashtable?\n\nA hashtable is a data structure based on key-value pairs.\nYou provide a key and a value to this structure, then, the hashtable will store\nthe input value at a location that can be identified by the input\nkey that you provided.\nIt does that by using an underlying array and a hash function.\nThese two components are essential to how a hashtable works.\n\nUnder the hood, the hashtable contains an array. This array is where the values\nare stored, and the elements of this array are usually called *buckets*.\nSo the values that you provide to the hashtable are stored inside buckets,\nand you access each bucket by using an index.\n\nWhen you provide a key to a hashtable, it passes this key to the\nhash function. This hash function uses some sort of hashing algorithm to transform\nthis key into an index. This index is actually an array index. It is a position\nin the underlying array of the hashtable.\nThis is how a key identifies a specific position (or location) inside the hashtable\nstructure.\n\nSo you provide a key to the hashtable, and this key identifies a specific location\ninside the hashtable, then, the hashtable takes the input value that you provided,\nand stores this value in the location identified by the input key that you provided.\nYou could say that the key maps to the value stored in the hashtable. You find\nthe value, by using the key that identifies the location where the value is stored.\nThe @fig-hashtable presents this process visually.\n\n\n![A diagram of a Hashtable. 
Source: Wikipedia, the free encyclopedia.](./../Figures/hashtable.svg){#fig-hashtable}\n\n\nThe operation described in the previous paragraph is normally called an *insertion* operation.\nBecause you are inserting new values into the hashtable.\nBut there are other types of operations in hashtables such as *delete* and *lookup*.\nDelete is self-explanatory: it is when you delete (or remove) a value from the hashtable.\nLookup, in turn, corresponds to when you retrieve (or look at) a value that is stored in\nthe hashtable, by using the key that identifies the location where this value is stored.\n\nSometimes, instead of storing the values directly, the underlying array of the hashtable might be an array of pointers,\ni.e. the buckets of the array store pointers that point to the values,\nor it may also be an array of linked lists.\nThese cases are common in hashtables that allow duplicate keys, or, in other words,\nin hashtables that effectively handle \"collisions\" that may arise from the hash function.\n\nDuplicate keys, or this \"collision\" thing that I'm talking about, is when you have two different keys that point to the same location (i.e. to the same index)\nin the underlying array of the hashtable. This might happen depending on the characteristics of the hash function\nthat is being used in the hashtable. Some implementations of the hashtable will actively deal with collisions,\nmeaning that, they will handle this case in some way. For example, the hashtable\nmight transform all buckets into linked lists. Because with a linked list you can store\nmultiple values in a single bucket.\n\nThere are different techniques to handle collisions in hashtables, which I will not describe\nin this book, because it is not our main scope here. 
But you can find a good description of\nsome of the most common techniques at the Wikipedia page of hashtables [@wikipedia_hashtables].\n\n\n### Hashtables in Zig {#sec-hashmap}\n\nThe Zig Standard Library provides different implementations of a hashtable,\nlike the struct `HashMap`. Each implementation has its own pros and cons, which we will\ndiscuss later on, and all of them are available through the `std.hash_map` module.\n\nThe `HashMap` struct is a general-purpose hashtable,\nwhich has very fast operations (lookup, insertion, delete), and also,\nquite high load factors for low memory usage. You can create and provide a context object\nto the `HashMap` constructor. This context object allows you to tailor\nthe behaviour of the hashtable itself, because you can\nprovide a hash function implementation to be used by the hashtable\nthrough this context object.\n\nBut let's not worry about this context object now, because it is meant to be used\nby \"experts in the field of hashtables\". Since we are most likely not\nexperts in this field, we are going to take the easy way to create\na hashtable. Which is by using the `AutoHashMap()` function.\n\n\nThis `AutoHashMap()` function is essentially a \"create a hashtable object that uses the default settings\"\ntype of function. It chooses a context object, and, therefore, a hash function implementation,\nautomatically for you. This function receives two data types as input, the first data type is the data type of the keys\nthat will be used in this hashtable, while the second data type is the data type of the data that will be\nstored inside the hashtable, that is, the data type of the values to be stored.\n\nIn the example below, we are providing the data type `u32` in the first argument, and `u16` in the second argument of this\nfunction. 
It means that we are going to use `u32` values as keys in this hashtable, while `u16` values are the actual values\nthat are going to be stored into this hashtable.\nAt the end of this process, the `hash_table` object contains a `HashMap` object as output\nthat uses the default context, and the default load factor.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst AutoHashMap = std.hash_map.AutoHashMap;\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var hash_table = AutoHashMap(u32, u16).init(allocator);\n defer hash_table.deinit();\n\n try hash_table.put(54321, 89);\n try hash_table.put(50050, 55);\n try hash_table.put(57709, 41);\n std.debug.print(\n \"N of values stored: {d}\\n\",\n .{hash_table.count()}\n );\n std.debug.print(\n \"Value at key 50050: {d}\\n\",\n .{hash_table.get(50050).?}\n );\n\n if (hash_table.remove(57709)) {\n std.debug.print(\n \"Value at key 57709 successfully removed!\\n\",\n .{}\n );\n }\n std.debug.print(\n \"N of values stored: {d}\\n\",\n .{hash_table.count()}\n );\n}\n```\n:::\n\n\n\n\n```\nN of values stored: 3\nValue at key 50050: 55\nValue at key 57709 successfully removed!\nN of values stored: 2\n```\n\nYou can add/put new values into the hashtable by using the `put()` method. The first argument\nis the key to be used, and the second argument is the actual value that you want to store inside\nthe hashtable. In the example above, we first add the value 89 using the key 54321, next, we add\nthe value 55 using the key 50050, etc.\n\nNotice that we used the method `count()` to see how many values are currently stored in the\nhashtable. After that, we also used the `get()` method to access (or look) at the value stored in\nthe position identified by the key 50050. 
The output of this `get()` method is an optional value,\nand that is why we use the `.?` operator at the end to unwrap it and get access to the actual value.\n\nAlso notice that we can remove (or delete) values from a hashtable by using the `remove()` method.\nYou provide the key that identifies the value that you want to delete, then, the method will\ndelete this value and return a `true` value as output. This `true` value essentially tells us\nthat the method successfully deleted the value.\n\nBut this delete operation might not always be successful. For example, you might provide the wrong\nkey to this method. I mean, maybe you provide\n(either intentionally or unintentionally) a key that points to an empty bucket,\ni.e. a bucket that still doesn't have a value in it.\nIn this case, the `remove()` method would return a `false` value.\n\n\n\n### Iterating through the hashtable\n\nIterating through the keys and values that are currently being stored in\nthe hashtable is a very common need.\nYou can do that in Zig by using an iterator object that can iterate\nthrough the elements of your hashtable object.\n\nThis iterator object works like any other iterator object that you would\nfind in languages such as C++ and Rust. It is basically a pointer object\nthat points to some value in the container, and has a `next()` method\nthat you can use to navigate (or iterate) through the next values in the\ncontainer.\n\nYou can create such iterator object by using the `iterator()` method of the hashtable object.\nThis method returns an iterator object, from which you can use the `next()` method in conjunction\nwith a while loop to iterate through the elements of your hashtable. 
The `next()` method returns an optional\n`Entry` value, and therefore, you must unwrap this optional value to get the actual `Entry` value\nfrom which you can access the key and also the value identified by this key.\n\nWith this `Entry` value at hand, you can access the key of this current entry by using the `key_ptr`\nattribute and dereferencing the pointer that lives inside of it, while the value identified by this\nkey is accessed through the `value_ptr` attribute instead, which is also a pointer to be dereferenced.\nThe code example below demonstrates the use of these elements:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst AutoHashMap = std.hash_map.AutoHashMap;\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var hash_table = AutoHashMap(u32, u16).init(allocator);\n defer hash_table.deinit();\n\n try hash_table.put(54321, 89);\n try hash_table.put(50050, 55);\n try hash_table.put(57709, 41);\n\n var it = hash_table.iterator();\n while (it.next()) |kv| {\n // Access the current key\n std.debug.print(\"Key: {d} | \", .{kv.key_ptr.*});\n // Access the current value\n std.debug.print(\"Value: {d}\\n\", .{kv.value_ptr.*});\n }\n}\n```\n:::\n\n\n\n\n```\nKey: 54321 | Value: 89\nKey: 50050 | Value: 55\nKey: 57709 | Value: 41\n```\n\n\nIf you want to iterate through only the values or the keys of your hashtable,\nyou can create a key iterator or a value iterator object. 
These are also iterator\nobjects, which have the same `next()` method that you can use to iterate through the\nsequence of values.\n\nKey iterators are created from the `keyIterator()` method of your\nhashtable object, while value iterators are created from the `valueIterator()` method.\nAll you have to do is to unwrap the value from the `next()` method and dereference it\ndirectly to access the key or value that you are iterating over.\nThe code example below demonstrates what this would look like for a key iterator,\nbut you can replicate the same logic for a value iterator.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar kit = hash_table.keyIterator();\nwhile (kit.next()) |key| {\n std.debug.print(\"Key: {d}\\n\", .{key.*});\n}\n```\n:::\n\n\n\n\n```\nKey: 54321\nKey: 50050\nKey: 57709\n```\n\n\n### The `ArrayHashMap` hashtable {#sec-array-map}\n\nIf you need to iterate through the elements of your hashtable constantly,\nyou might want to use the `ArrayHashMap` struct for your specific case,\ninstead of going with the usual and general-purpose `HashMap` struct.\n\nThe `ArrayHashMap` struct creates a hashtable that is faster to iterate over.\nThat is why this specific type of hashtable might be valuable to you.\nSome other properties of an `ArrayHashMap` hashtable are:\n\n- the order of insertion is preserved. So the order of the values you find while iterating through this hashtable\nis actually the order in which these values were inserted in the hashtable.\n\n- the key-value pairs are stored sequentially, one after another.\n\n\nYou can create an `ArrayHashMap` object by using, once again, a helper function that\nchooses automatically for you a hash function implementation. This is the\n`AutoArrayHashMap()` function, which works very similarly to the `AutoHashMap()`\nfunction that we presented at @sec-hashmap.\n\nYou provide two data types to this function. 
The data type of the keys that will be\nused in this hashtable, and the data type of the values that will be stored in\nthis hashtable.\n\nAn `ArrayHashMap` object has essentially the same methods as the `HashMap` struct.\nSo you can insert new values into the hashtable by using the `put()` method, and you can look up (or get)\na value from the hashtable by using the `get()` method. But the `remove()` method is not available\nin this specific type of hashtable.\n\nIn order to delete values from the hashtable, you would use the same methods that you find in\nan `ArrayList` object, i.e. a dynamic array. I presented these methods at @sec-dynamic-array-remove,\nwhich are the `swapRemove()` and `orderedRemove()` methods. These methods have here the same meaning, or,\nthe same effect that they have in an `ArrayList` object.\n\nThis means that, with `swapRemove()` you remove the value from the hashtable, but you do not preserve\nthe order in which the values were inserted into the structure, while `orderedRemove()` is capable\nof retaining the insertion order of these values.\n\nBut instead of providing an index as input to `swapRemove()` or `orderedRemove()`, like I described\nat @sec-dynamic-array-remove, these methods here in an `ArrayHashMap` take a key as input, like\nthe `remove()` method from a `HashMap` object. 
If you want to provide an index as input, instead\nof a key, you should use the `swapRemoveAt()` and `orderedRemoveAt()` methods.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar hash_table = AutoArrayHashMap(u32, u16)\n .init(allocator);\ndefer hash_table.deinit();\n```\n:::\n\n\n\n\n\n\n### The `StringHashMap` hashtable {#sec-string-hash-map}\n\nOne thing that you will notice in the other two types of hashtables that I\npresented in the last sections, is that neither of them accepts a slice data type\nin their keys.\nWhat this means is that you cannot use a slice value to represent a key in\nthese types of hashtable.\n\nThe most obvious consequence of this, is that you cannot use strings as keys\nin these hashtables. But it is extremely common to use string values as keys\nin hashtables.\n\nTake this very simple Javascript code snippet as an example. We are creating\na simple hashtable object named `people`. Then, we add a new entry to this\nhashtable, which is identified by the string `'Pedro'`. This string is the\nkey in this case, while the object containing different personal information such as\nage, height and city, is the value to be stored in the hashtable.\n\n```js\nvar people = new Object();\npeople['Pedro'] = {\n 'age': 25,\n 'height': 1.67,\n 'city': 'Belo Horizonte'\n};\n```\n\nThis pattern of using strings as keys is very common in\nall sorts of situations. That is why the Zig Standard Library offers a\nspecific type of hashtable for this purpose, which is created through the `StringHashMap()` function.\nThis function creates a hashtable that uses strings as keys. 
The only input of this\nfunction is the data type of the values that will be stored into this hashtable.\n\nIn the example below, I'm creating a hashtable to store the ages of different people.\nThe keys to be used in this hashtable are the names of each person, while the value stored in the\nhashtable is the age of the person identified by the key.\n\nThat is why I provide the `u8` data type (which is the data type used by the age values) as input to this `StringHashMap()` function.\nAs a result, it creates a hashtable that uses string values as keys, and, that stores\n`u8` values in it. Notice that an allocator object is provided at the `init()` method of the\nresulting object from the `StringHashMap()` function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var ages = std.StringHashMap(u8).init(allocator);\n defer ages.deinit();\n\n try ages.put(\"Pedro\", 25);\n try ages.put(\"Matheus\", 21);\n try ages.put(\"Abgail\", 42);\n\n var it = ages.iterator();\n while (it.next()) |kv| {\n std.debug.print(\"Key: {s} | \", .{kv.key_ptr.*});\n std.debug.print(\"Age: {d}\\n\", .{kv.value_ptr.*});\n }\n}\n```\n:::\n\n\n\n\n```\nKey: Pedro | Age: 25\nKey: Abgail | Age: 42\nKey: Matheus | Age: 21\n```\n\n\n### The `StringArrayHashMap` hashtable\n\nThe Zig Standard Library also provides a type of hashtable that mixes the pros and cons of the\ntypes of hashtables that were presented in the previous two sections. 
That is, a hashtable\nthat uses strings as keys, but also has the advantages of the `ArrayHashMap` struct.\nIn other words, you can have a hashtable that is fast to iterate over,\nthat preserves insertion order, and also, that uses strings as keys.\n\nYou can create such type of hashtable by using the `StringArrayHashMap()` function.\nThis function accepts a data type as input, which is the data type of the values that are\ngoing to be stored inside this hashtable, in the same style as the function presented\nat @sec-string-hash-map.\n\nYou can insert new values into this hashtable by using the same `put()` method that\nI presented at @sec-string-hash-map. And you can also get values from the hashtable\nby using the same `get()` method that I presented in previous sections.\nLike its `ArrayHashMap` brother, to delete values from this specific type of hashtable,\nwe also use the `orderedRemove()` and `swapRemove()` methods, with the same effects that\nI described at @sec-array-map.\n\nIf we take the code example that was presented at @sec-string-hash-map, we can\nachieve the exact same result with `StringArrayHashMap()`. All we have to do\nis to change the use of `StringHashMap()` to `StringArrayHashMap()` at the\nfifth line in this code example. It would change to this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar ages = std.StringArrayHashMap(u8).init(allocator);\n```\n:::\n\n\n\n\n\n\n## Linked lists\n\nThe Zig Standard Library provides implementations for both singly and doubly linked lists.\nA linked list is a linear data structure that looks like a chain, or, a rope.\nThe main advantage of this data structure is that you normally have fast\ninsertion and deletion operations. But, as a disadvantage, iterating through\nthis data structure is usually not as fast as iterating through an array.\n\nThe idea behind a linked list is basically to build a structure that consists of a series of nodes\nconnected to each other by pointers. 
This means that linked lists are usually not contiguous\nin memory, because each node might be south in memory, but the next node might be north\nin memory, then the next node might be left in memory, anyway, you get it, they can be anywhere.\n\nAt @fig-linked-list we can see a diagram of a singly linked list. Notice that we begin with\na first node. This first node is usually called \"the head of the linked list\". Then, from this\nfirst node we uncover the remaining nodes in the structure, by following the locations pointed\nby the pointers.\n\nEvery node has two things in it. It has the value that is stored in the current node,\nand it also has a pointer. This pointer points to the next node in the list. If this pointer\nis null, then, it means that we reached the end of our linked list.\n\n![A diagram of a singly linked list.](./../Figures/linked-list.png){#fig-linked-list}\n\n\nAt @fig-linked-list2 we can see a diagram of a doubly linked list. The only thing that really\nchanges is that every node in the linked list has both a pointer to the previous node,\nand, a pointer to the next node. So every node now has two pointers in it. These are\nusually called the `prev` (for \"previous\") and `next` (for \"next\") pointers of the node.\n\nIn the singly linked list example, we had only one single pointer in each node, and this singular\npointer was always pointing to the next node in the sequence. In other words, singly linked lists\nnormally have only the `next` pointer in them.\n\n![A diagram of a doubly linked list.](./../Figures/doubly-linked-list.png){#fig-linked-list2}\n\n\n\nLinked lists are available in Zig through the functions `SinglyLinkedList()` and\n`DoublyLinkedList()`, for \"singly linked lists\" and \"doubly linked lists\", respectively. 
These functions are\nactually generic functions, which we are going to talk more about at @sec-generic-fun.\n\nFor now, just understand that, in order to create a linked list object,\nwe begin by providing a data type to these functions. This data type defines\nthe type of data that this linked list will store. In the example below,\nwe are creating a singly linked list capable of storing `u32` values.\nSo each node in this linked list will store a `u32` value.\n\nBoth the `SinglyLinkedList()` and `DoublyLinkedList()` functions return a type, i.e. a struct definition, as a result. This means that\nthe object `Lu32` is actually a type definition, or a struct definition. It defines\nthe type \"singly linked list of `u32` values\".\n\nSo now that we have the definition of the struct, we have to instantiate a `Lu32` object.\nWe normally instantiate struct objects in Zig by using an `init()` method.\nBut in this case, we are instantiating the struct directly, by using an empty\n`struct` literal, in the expression `Lu32{}`.\n\nIn this example, we first create multiple node objects, and after we create them,\nwe start to insert and connect these nodes to build the linked list, using the\n`prepend()` and `insertAfter()` methods. Notice that the `prepend()` method\nis a method from the linked list object, while the `insertAfter()` is a method\npresent in the node objects.\n\nIn essence, the `prepend()` method inserts a node at the beginning of the linked\nlist. In other words, the node that you provide to this method becomes the new\n\"head node\" of the linked list. 
It becomes the first node in the list (see @fig-linked-list).\n\nOn the other hand, the `insertAfter()` method is used to basically connect two nodes together.\nWhen you provide a node to this method, it creates a pointer to this input node,\nand stores this pointer in the current node, from which the method was called.\nIn other words, this method creates the pointer that connects these two nodes together\nand stores it in the `next` attribute of the current node.\n\nSince doubly linked lists have both a `next` and a `prev` pointer in each node,\nreferring to the next and previous nodes in the sequence, respectively,\nas I described at @fig-linked-list2, a node object created from\na `DoublyLinkedList()` object would have both an\n`insertBefore()` (for `prev`) and an `insertAfter()` (for `next`) method\navailable.\n\nThis means that, if we used a doubly linked list, we could use the `insertBefore()` method\nto store the pointer to the input node in the `prev` attribute. This would put the input\nnode as the \"previous node\", or, the node before the current node. 
The `insertAfter()` method\nhas \"after\" in its name to indicate that this method puts the pointer created to the input\nnode in the `next` attribute of the current node, and as a result, the input node becomes\nthe \"next node\" of the current node.\n\nSince we are using a singly linked list in this example, we have only the `insertAfter()` method\navailable in the node objects that we create from our `Lu32` type.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SinglyLinkedList = std.SinglyLinkedList;\nconst Lu32 = SinglyLinkedList(u32);\n\npub fn main() !void {\n var list = Lu32{};\n var one = Lu32.Node{ .data = 1 };\n var two = Lu32.Node{ .data = 2 };\n var three = Lu32.Node{ .data = 3 };\n var four = Lu32.Node{ .data = 4 };\n var five = Lu32.Node{ .data = 5 };\n\n list.prepend(&two); // {2}\n two.insertAfter(&five); // {2, 5}\n list.prepend(&one); // {1, 2, 5}\n two.insertAfter(&three); // {1, 2, 3, 5}\n three.insertAfter(&four); // {1, 2, 3, 4, 5}\n}\n```\n:::\n\n\n\n\n\nThere are other methods available from the linked list object, depending on whether this object is\na singly linked list or a doubly linked list, that might be very useful for you, like:\n\n- `remove()` to remove a specific node from the linked list.\n- `popFirst()` to remove the first node from the linked list.\n- if singly linked list, `len()` to count how many nodes there are in the linked list.\n- if doubly linked list, check out the `len` attribute to see how many nodes there are in the linked list.\n- if singly linked list, `popFirst()` to remove the first node from the linked list.\n- if doubly linked list, `pop()` and `popFirst()` to remove the last and first nodes from the linked list, respectively.\n- if doubly linked list, `append()` to add a new node to the end of the linked list (i.e. inverse of `prepend()`).\n\n\n\n## Multi array structure\n\nZig introduces a new data structure called `MultiArrayList()`. 
It is a different version of the dynamic array\nthat we have introduced at @sec-dynamic-array. The difference between this structure and the `ArrayList()`\nthat we know from @sec-dynamic-array, is that `MultiArrayList()` creates a separate dynamic array\nfor each field of the struct that you provide as input.\n\nConsider the following code example. We create a new custom struct called `Person`. This\nstruct contains three different data members, or, three different fields. As a consequence,\nwhen we provide this `Person` data type as input to `MultiArrayList()`, this\ncreates a \"struct of three different arrays\" called `PersonArray`. In other words,\nthis `PersonArray` is a struct that contains three internal dynamic arrays in it.\nOne array for each field found in the `Person` struct definition.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Person = struct {\n name: []const u8,\n age: u8,\n height: f32,\n};\nconst PersonArray = std.MultiArrayList(Person);\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var people = PersonArray{};\n defer people.deinit(allocator);\n\n try people.append(allocator, .{\n .name = \"Auguste\", .age = 15, .height = 1.54\n });\n try people.append(allocator, .{\n .name = \"Elena\", .age = 26, .height = 1.65\n });\n try people.append(allocator, .{\n .name = \"Michael\", .age = 64, .height = 1.87\n });\n}\n```\n:::\n\n\n\n\nIn other words, instead of creating an array of \"persons\", the `MultiArrayList()` function\ncreates a \"struct of arrays\". Each data member of this struct is a different array that stores\nthe values of a specific field from the `Person` struct values that were added (or, appended) to this \"struct of arrays\".\nOne important detail is that each of these separate internal arrays stored inside `PersonArray`\nare dynamic arrays. 
This means that these arrays can grow in capacity automatically as needed, to accommodate\nmore values.\n\nThe @fig-multi-array exposed below presents a diagram that describes the `PersonArray` struct\nthat we have created in the previous code example. Notice that the values of the data members\npresent in each of the three `Person` values that we have appended into the `PersonArray` object\nthat we have instantiated, are scattered across three different internal arrays of the `PersonArray` object.\n\n![A diagram of the `PersonArray` struct.](./../Figures/multi-array.png){#fig-multi-array}\n\nYou can easily access each of these arrays separately, and iterate over the values of each array.\nFor that, you will need to call the `items()` method from the `PersonArray` object, and provide as input\nto this method, the name of the field that you want to iterate over.\nIf you want to iterate through the `.age` array for example, then, you need to call `items(.age)` from\nthe `PersonArray` object, like in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (people.items(.age)) |*age| {\n try stdout.print(\"Age: {d}\\n\", .{age.*});\n}\n```\n:::\n\n\n\n\n```\nAge: 15\nAge: 26\nAge: 64\n```\n\n\nIn the above example, we are iterating over the values of the `.age` array, or,\nthe internal array of the `PersonArray` object that contains the values of the `age`\ndata member from the `Person` values that were added to the multi array struct.\n\nIn this example we are calling the `items()` method directly from the `PersonArray`\nobject.
However, it is recommended in most situations to call this `items()` method\nfrom a \"slice object\", which you can create from the `slice()` method.\nThe reason for this is that calling `items()` multiple times has better performance\nif you use a slice object.\n\nIn other words, if you are planning to access only one of the\ninternal arrays from your \"multi array struct\", it is fine to call `items()` directly\nfrom the multi array object. But if you need to access many of the internal arrays\nfrom your \"multi array struct\", then, you will likely need to call `items()` more\nthan once, and, in such a circumstance, it is better to call `items()` through a slice object.\nThe example below demonstrates the use of such an object:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar slice = people.slice();\nfor (slice.items(.age)) |*age| {\n age.* += 10;\n}\nfor (slice.items(.name), slice.items(.age)) |*n,*a| {\n try stdout.print(\n \"Name: {s}, Age: {d}\\n\", .{n.*, a.*}\n );\n}\n```\n:::\n\n\n\n\n```\nName: Auguste, Age: 25\nName: Elena, Age: 36\nName: Michael, Age: 74\n```\n\n\n## Conclusion\n\nThere are many other data structures that I did not present here.\nBut you can check them out at the official Zig Standard Library documentation page.\nActually, when you get into the [homepage of the documentation](https://ziglang.org/documentation/master/std/#)[^home], the first thing\nthat appears to you in this page, is a list of types and data structures.\n\n\nIn this section you can see a list of the many different data structures available in\nthe Zig Standard Library.
There are some very specific structures in this list, like a\n[`BoundedArray` struct](https://ziglang.org/documentation/master/std/#std.bounded_array.BoundedArray)[^bounded]\n, but there is also some more general structures, such as a\n[`PriorityQueue` struct](https://ziglang.org/documentation/master/std/#std.priority_queue.PriorityQueue)[^priority].\n\n\n[^home]: \n[^priority]: .\n[^bounded]: \n\n\n\n\n\n\n", - "supporting": [ - "09-data-structures_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/09-error-handling/execute-results/html.json b/_freeze/Chapters/09-error-handling/execute-results/html.json index 5ede25bf..a5b3e7c1 100644 --- a/_freeze/Chapters/09-error-handling/execute-results/html.json +++ b/_freeze/Chapters/09-error-handling/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Error handling and unions {#sec-error-handling}\n\nIn this chapter, I want to discuss how error handling is done in Zig.\nWe already briefly learned about one of the available strategies to handle errors in Zig,\nwhich is the `try` keyword presented at @sec-main-file. But we still haven't learned about\nthe other methods, such as the `catch` keyword.\nI also want to discuss in this chapter how enum types are created in Zig.\n\n## Learning more about errors in Zig\n\nBefore we get into how error handling is done, we need to learn more about what errors are in Zig.\nAn error is actually a value in Zig [@zigoverview]. 
In other words, when an error occurs inside your Zig program,\nit means that somewhere in your Zig codebase, an error value is being generated.\nAn error value is similar to any integer value that you create in your Zig code.\nYou can take an error value and pass it as input to a function,\nand you can also cast (or coerce) it into a different type of error value.\n\nThis has some similarities with exceptions in C++ and Python.\nBecause in C++ and Python, when an exception happens inside a `try` block,\nyou can use a `catch` block (in C++) or an `except` block (in Python)\nto capture the exception produced in the `try` block,\nand pass it to functions as an input.\n\n\nAlthough they are normal values as any other, you cannot ignore error values in your Zig code. Meaning that, if an error\nvalue appears somewhere in your source code, this error value must be explicitly handled in some way.\nThis also means that you cannot discard error values by assigning them to an underscore,\nas you could do with normal values and objects.\n\nTake the source code below as an example. Here we are trying to open a file that does not exist\nin my computer, and as a result, an obvious error value of `FileNotFound` is returned from the `openFile()`\nfunction.
But because I'm assigning the result of this function to an underscore, I end up\ntrying to discard an error value.\n\nThe `zig` compiler detects this mistake, and raises a compile\nerror telling me that I'm trying to discard an error value.\nIt also adds a note message that suggests the use of `try`,\n`catch` or an if statement to explicitly handle this error value.\nThis note is reinforcing that every possible error value must be explicitly handled in Zig.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst dir = std.fs.cwd();\n_ = dir.openFile(\"doesnt_exist.txt\", .{});\n```\n:::\n\n\n\n\n```\nt.zig:8:17: error: error set is discarded\nt.zig:8:17: note: consider using 'try', 'catch', or 'if'\n```\n\n### Returning errors from functions\n\nAs we described at @sec-main-file, when we have a function that might return an error\nvalue, this function normally includes an exclamation mark (`!`) in its return type\nannotation. The presence of this exclamation mark indicates that this function might\nreturn an error value as a result, and, the `zig` compiler forces you to always handle explicitly\nthe case of this function returning an error value.\n\nTake a look at the `print_name()` function below. This function might return an error in the `stdout.print()` function call,\nand, as a consequence, its return type (`!void`) includes an exclamation mark in it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn print_name() !void {\n const stdout = std.io.getStdOut().writer();\n try stdout.print(\"My name is Pedro!\", .{});\n}\n```\n:::\n\n\n\n\nIn the example above, we are using the exclamation mark to tell the `zig` compiler\nthat this function might return some error. But which error exactly is returned from\nthis function? For now, we are not specifying a specific error value.
We only\nknow for now that some error value (whatever it is) might be returned.\n\nBut in fact, you can (if you want to) specify clearly which exact error values\nmight be returned from this function. There are a lot of examples of\nthis in the Zig Standard Library. Take this `fill()` function from\nthe `http.Client` module as an example. This function returns\neither an error value of type `ReadError`, or `void`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn fill(conn: *Connection) ReadError!void {\n // The body of this function ...\n}\n```\n:::\n\n\n\n\nThis idea of specifying the exact error values that you expect to be returned\nfrom the function is interesting. Because they automatically become some sort of documentation\nof your function, and also, it allows the `zig` compiler to perform some extra checks over\nyour code. Because it can check if there is any other type of error value\nthat is being generated inside your function, and, that it is not being accounted\nfor in this return type annotation.\n\nAnyway, you can list the types of errors that can be returned from the function\nby listing them on the left side of the exclamation mark. While the valid values\nstay on the right side of the exclamation mark. So the syntax format becomes:\n\n```\n<error-value>!<valid-value>\n```\n\n### Error sets\n\nBut what about when we have a single function that might return different types of errors?\nWhen you have such a function, you can list\nall of these different types of errors that can be returned from this function,\nthrough a structure in Zig that we call an *error set*.\n\nAn error set is a special case of a union type.\nIt essentially is a union that contains error values in it.\nNot all programming languages have a notion of a \"union object\".\nBut in summary, a union is just a list of the options that\nan object can be.
For example, a union of `x`, `y` and `z`, means that\nan object can be either of type `x`, or type `y` or type `z`.\n\nWe are going to talk in more depth about unions at @sec-unions.\nBut you can write an error set by writing the keyword `error` before\na pair of curly braces, then you list the error values that can be\nreturned from the function inside this pair of curly braces.\n\nTake the `resolvePath()` function below as an example, which comes from the\n`introspect.zig` module of the Zig Standard Library. We can see in its return type annotation, that this\nfunction returns either: 1) a valid slice of `u8` values (`[]u8`); or, 2) one of the three different\ntypes of error values listed inside the error set (`OutOfMemory`, `Unexpected`, etc.).\nThis is an example of the use of an error set.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn resolvePath(\n ally: mem.Allocator,\n p: []const u8,\n) error{\n OutOfMemory,\n CurrentWorkingDirectoryUnlinked,\n Unexpected,\n}![]u8 {\n // The body of the function ...\n}\n```\n:::\n\n\n\n\n\nThis is a valid way of annotating the return value of a Zig function. But, if you navigate through\nthe modules that compose the Zig Standard Library, you will notice that, for the majority of cases,\nthe programmers prefer to give a descriptive name to this error set, and then, use this name (or this \"label\")\nof the error set in the return type annotation, instead of using the error set directly.\n\nWe can see that in the `ReadError` error set that we showed earlier in the `fill()` function,\nwhich is defined in the `http.Client` module.\nSo yes, I presented the `ReadError` as if it was just a standard and single error value, but in fact,\nit is an error set defined in the `http.Client` module, and therefore, it actually represents\na set of different error values that might happen in the `fill()` and other functions.\n\n\nTake a look at the `ReadError` definition reproduced below.
Notice that we are grouping all of these\ndifferent error values into a single object, and then, we use this object into the return type annotation of the functions.\nLike the `fill()` function that we showed earlier, or, the `readvDirect()` function from the same module,\nwhich is reproduced below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const ReadError = error{\n TlsFailure,\n TlsAlert,\n ConnectionTimedOut,\n ConnectionResetByPeer,\n UnexpectedReadFailure,\n EndOfStream,\n};\n// Some lines of code\npub fn readvDirect(\n conn: *Connection,\n buffers: []std.posix.iovec\n ) ReadError!usize {\n // The body of the function ...\n}\n```\n:::\n\n\n\n\nSo, an error set is just a convenient way of grouping a set of\npossible error values into a single object, or a single type of an error value.\n\n\n### Casting error values\n\nLet's suppose you have two different error sets, named `A` and `B`.\nIf error set `A` is a superset of error set `B`, then, you can cast (or coerce)\nerror values from `B` into error values of `A`.\n\nError sets are just a set of error values. So, if the error set `A`\ncontains all error values from the error set `B`, then `A`\nbecomes a superset of `B`. You could also say\nthat the error set `B` is a subset of error set `A`.\n\nThe example below demonstrates this idea. 
Because `A` contains all\nvalues from `B`, `A` is a superset of `B`.\nIn math notation, we would say that $A \\supset B$.\nAs a consequence, we can give an error value from `B` as input to the `cast()`\nfunction, and, implicitly cast this input into the same error value, but from the `A` set.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst A = error{\n ConnectionTimeoutError,\n DatabaseNotFound,\n OutOfMemory,\n InvalidToken,\n};\nconst B = error {\n OutOfMemory,\n};\n\nfn cast(err: B) A {\n return err;\n}\n\ntest \"coerce error value\" {\n const error_value = cast(B.OutOfMemory);\n try std.testing.expect(\n error_value == A.OutOfMemory\n );\n}\n```\n:::\n\n\n\n\n\n## How to handle errors\n\nNow that we learned more about what errors are in Zig,\nlet's discuss the available strategies to handle these errors,\nwhich are:\n\n- `try` keyword;\n- `catch` keyword;\n- an if statement;\n- `errdefer` keyword;\n\n\n\n### What `try` means?\n\nAs I described over the previous sections, when we say that an expression might\nreturn an error, we are basically referring to an expression that has\na return type in the format `!T`.\nThe `!` indicates that this expression returns either an error value, or a value of type `T`.\n\nAt @sec-main-file, I presented the `try` keyword and where to use it.\nBut I did not talk about what exactly this keyword does to your code,\nor, in other words, I have not explained yet what `try` means in your code.\n\nIn essence, when you use the `try` keyword in an expression, you are telling\nthe `zig` compiler the following: \"Hey! Execute this expression for me,\nand, if this expression returns an error, please, return this error for me\nand stop the execution of my program.
But if this expression returns a valid\nvalue, then, return this value, and move on\".\n\nIn other words, the `try` keyword is essentially, a strategy to enter in panic mode, and stop\nthe execution of your program in case an error occurs.\nWith the `try` keyword, you are telling the `zig` compiler, that stopping the execution\nof your program is the most reasonable strategy to take if an error occurs\nin that particular expression.\n\n### The `catch` keyword\n\nOk, now that we understand properly what `try` means, let's discuss `catch` now.\nOne important detail here, is that you can use `try` or `catch` to handle your errors,\nbut you **cannot use `try` and `catch` together**. In other words, `try` and `catch`\nare different and completely separate strategies in the Zig language.\n\nThis is uncommon, and different than what happens in other languages. Most\nprogramming languages that adopt the *try catch* pattern (such as C++, R, Python, JavaScript, etc.), normally use\nthese two keywords in conjunction to form the complete logic to\nproperly handle the errors.\nAnyway, Zig tries a different approach in the *try catch* pattern.\n\nSo, we learned already about what `try` means, and we also know that both\n`try` and `catch` should be used alone, separate from each other. But\nwhat exactly does `catch` do in Zig? With `catch`, we can construct a block of\nlogic to handle the error value, in case it happens in the current expression.\n\nLook at the code example below. Once again, we go back to the previous\nexample where we were trying to open a file that doesn't exist in my computer,\nbut this time, I use `catch` to actually implement a logic to handle the error, instead of\njust stopping the execution right away.\n\nMore specifically, in this example, I'm using a logger object to record some logs into\nthe system, before I return the error, and stop the execution of the program.
For example,\nthis could be some part of the codebase of a complex system that I do not have full control over,\nand I want to record these logs before the program crashes, so that I can debug it later\n(e.g. maybe I cannot compile the full program, and properly debug it with a debugger. So, these logs might\nbe a valid strategy to surpass this barrier).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst dir = std.fs.cwd();\nconst file = dir.openFile(\n \"doesnt_exist.txt\", .{}\n) catch |err| {\n logger.record_context();\n logger.log_error(err);\n return err;\n};\n```\n:::\n\n\n\n\n\nTherefore, we use `catch` to create a block of expressions that will handle the error.\nI can return the error value from this block of expressions, like I did in the above example,\nwhich, will make the program enter in panic mode, and, stop the execution.\nBut I could also, return a valid value from this block of code, which would\nbe stored in the `file` object.\n\nNotice that, instead of writing the keyword before the expression that might return the error,\nlike we do with `try`,\nwe write `catch` after the expression. We can open the pair of pipes (`|`),\nwhich captures the error value returned by the expression, and makes\nthis error value available in the scope of the `catch` block as the object named `err`.\nIn other words, because I wrote `|err|` in the code, I can access the error value\nreturned by the expression, by using the `err` object.\n\nAlthough this being the most common use of `catch`, you can also use this keyword\nto handle the error in a \"default value\" style. That is, if the expression returns\nan error, we use the default value instead. Otherwise, we use the valid value returned\nby the expression.\n\nThe Zig official language reference, provides a great example of this \"default value\"\nstrategy with `catch`. This example is reproduced below. Notice that we are trying to parse\nsome unsigned integer from a string object named `str`. 
In other words, this function\nis trying to transform an object of type `[]const u8` (i.e. an array of characters, a string, etc.)\ninto an object of type `u64`.\n\nBut this parsing process done by the function `parseU64()` may fail, resulting in a runtime error.\nThe `catch` keyword used in this example provides an alternative value (13) to be used in case\nthis `parseU64()` function raises an error. So, the expression below essentially means:\n\"Hey! Please, parse this string into a `u64` for me, and store the results into the\nobject `number`. But, if an error occurs, then, return the value `13` instead\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number = parseU64(str, 10) catch 13;\n```\n:::\n\n\n\n\nSo, at the end of this process, the object `number` will contain either a `u64` integer\nthat was parsed successfully from the input string `str`, or, if an error in the\nparsing process occurs, it will contain the `u64` value `13` that was provided by the `catch`\nkeyword as the \"default\", or, the \"alternative\" value.\n\n\n\n### Using if statements\n\nNow, you can also use if statements to handle errors in your Zig code.\nIn the example below, I'm reproducing the previous example, where\nwe try to parse an integer value from an input string with a function\nnamed `parseU64()`.\n\nWe execute the expression inside the \"if\". If this expression returns an\nerror value, the \"if branch\" (or, the \"true branch\") of the if statement is not executed.\nBut if this expression returns a valid value instead, then, this value is unwrapped\ninto the `number` object.\n\nThis means that, if the `parseU64()` expression returns a valid value, this value becomes available\ninside the scope of this \"if branch\" (i.e. the \"true branch\") through the object that we listed inside the pair\nof pipe characters (`|`), which is the object `number`.\n\nIf an error occurs, we can use an \"else branch\" (or the \"false branch\") of the if statement\nto handle the error.
In the example below, we are using the `else` in the if statement\nto unwrap the error value (that was returned by `parseU64()`) into the `err` object,\nand handle the error.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nif (parseU64(str, 10)) |number| {\n // do something with `number` here\n} else |err| {\n // handle the error value.\n}\n```\n:::\n\n\n\n\nNow, if the expression that you are executing returns different types of error values,\nand you want to take a different action for each of these types of error values, the\n`catch` keyword becomes limited.\n\nFor this type of situation, the official documentation\nof the language suggests the use of a switch statement with an if statement [@zigdocs].\nThe basic idea is, to use the if statement to execute the expression, and\nuse the \"else branch\" to pass the error value to a switch statement, where\nyou define a different action for each type of error value that might be\nreturned by the expression executed in the if statement.\n\nThe example below demonstrates this idea. We first try to add (or register) a set of\ntasks to a queue. If this \"registration process\" goes well, we then try\nto distribute these tasks across the workers of our system. But\nif this \"registration process\" returns an error value, we then use a switch\nstatement in the \"else branch\" to handle each possible error value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nif (add_tasks_to_queue(&queue, tasks)) |_| {\n distribute_tasks(&queue);\n} else |err| switch (err) {\n error.InvalidTaskName => {\n // do something\n },\n error.TimeoutTooBig => {\n // do something\n },\n error.QueueNotFound => {\n // do something\n },\n // and all the other error options ...\n}\n```\n:::\n\n\n\n\n\n### The `errdefer` keyword\n\nA common pattern in C programs in general, is to clean resources when an error occurs during\nthe execution of the program.
In other words, one common way to handle errors, is to perform\n\"cleanup actions\" before we exit our program. This guarantees that a runtime error does not make\nour program leak resources of the system.\n\n\nThe `errdefer` keyword is a tool to perform such \"cleanup actions\" in hostile situations.\nThis keyword is commonly used to clean (or to free) allocated resources, before the execution of our program\ngets stopped because of an error value being generated.\n\nThe basic idea is to provide an expression to the `errdefer` keyword. Then,\n`errdefer` executes this expression if, and only if, an error occurs\nduring the execution of the current scope.\nIn the example below, we are using an allocator object (that we presented at @sec-allocators)\nto create a new `User` object. If we are successful in creating and registering this new user,\nthis `create_user()` function will return this new `User` object as its return value.\n\nHowever, if for some reason, an error value is generated by some expression\nthat is after the `errdefer` line, for example, in the `db.register_user(user)` expression,\nthe expression registered by `errdefer` gets executed before the error value is returned\nfrom the function, and before the program enters in panic mode and stops the\ncurrent execution.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn create_user(db: Database, allocator: Allocator) !*User {\n const user = try allocator.create(User);\n errdefer allocator.destroy(user);\n\n // Register new user in the Database.\n _ = try db.register_user(user);\n return user;\n}\n```\n:::\n\n\n\n\nBy using `errdefer` to destroy the `user` object that we have just created,\nwe guarantee that the memory allocated for this `user` object\ngets freed, before the execution of the program stops.\nBecause if the expression `try db.register_user(user)` returns an error value,\nthe execution of our program stops, and we lose all references and control over the memory\nthat we have allocated for the `user`
object.\nAs a result, if we do not free the memory associated with the `user` object before the program stops,\nwe cannot free this memory anymore. We simply lose our chance to do the right thing.\nThat is why `errdefer` is essential in this situation.\n\nJust to make very clear the differences between `defer` (which I described at @sec-defer)\nand `errdefer`, it might be worth discussing the subject a bit further.\nYou might still have the question \"why use `errdefer` if we can use `defer` instead?\"\nin your mind.\n\nAlthough they are similar, the key difference between the `errdefer` and `defer` keywords\nis when the provided expression gets executed.\nThe `defer` keyword always executes the provided expression at the end of the\ncurrent scope, no matter how your code exits this scope.\nIn contrast, `errdefer` executes the provided expression only when an error occurs in the\ncurrent scope.\n\nThis becomes important if a resource that you allocate in the\ncurrent scope gets freed later in your code, in a different scope.\nThe `create_user()` function is an example of this. If you think\nclosely about this function, you will notice that this function returns\nthe `user` object as the result.\n\nIn other words, the allocated memory for the `user` object does not get\nfreed inside the `create_user()`, if the function returns successfully.\nSo, if an error does not occur inside this function, the `user` object\nis returned from the function, and probably, the code that runs after\nthis `create_user()` function will be responsible for freeing\nthe memory of the `user` object.\n\nBut what if an error does occur inside the `create_user()`?
What happens then?\nThis would mean that the execution of your code would stop in this `create_user()`\nfunction, and, as a consequence, the code that runs after this `create_user()`\nfunction would simply not run, and, as a result, the memory of the `user` object\nwould not be freed before your program stops.\n\nThis is the perfect scenario for `errdefer`. We use this keyword to guarantee\nthat our program will free the allocated memory for the `user` object,\neven if an error occurs inside the `create_user()` function.\n\nIf you allocate and free some memory for an object in the same scope, then,\njust use `defer` and be happy, `errdefer` has no use for you in such a situation.\nBut if you allocate some memory in a scope A, but you only free this memory\nlater, in a scope B for example, then, `errdefer` becomes useful to avoid leaking memory\nin sketchy situations.\n\n\n\n## Union type in Zig {#sec-unions}\n\nA union type defines a set of types that an object can be. It is like a list of\noptions. Each option is a type that an object can assume. Therefore, unions in Zig\nhave the same meaning, or, the same role as unions in C. They are used for the same purpose.\nYou could also say that unions in Zig produce a similar effect to\n[`typing.Union` in Python](https://docs.python.org/3/library/typing.html#typing.Union)[^pyunion].\n\n[^pyunion]: <https://docs.python.org/3/library/typing.html#typing.Union>\n\nFor example, you might be creating an API that sends data to a data lake, hosted\nin some private cloud infrastructure. Suppose you created different structs in your codebase,\nto store the necessary information that you need, in order to connect to the services of\neach mainstream data lake service (Amazon S3, Azure Blob, etc.).\n\nNow, suppose you also have a function named `send_event()` that receives an event as input,\nand, a target data lake, and it sends the input event to the data lake specified in the\ntarget data lake argument.
But this target data lake could be any of the three mainstream data lake\nservices (Amazon S3, Azure Blob, etc.). Here is where a union can help you.\n\nThe union `LakeTarget` defined below allows the `lake_target` argument of `send_event()`\nto be either an object of type `AzureBlob`, or type `AmazonS3`, or type `GoogleGCP`.\nThis union allows the `send_event()` function to receive an object of any of these three types\nas input in the `lake_target` argument.\n\nRemember that each of these three types\n(`AmazonS3`, `GoogleGCP` and `AzureBlob`) are separate structs that we defined in\nour source code. So, at first glance, they are separate data types in our source code.\nBut it is the `union` keyword that unifies them into a single data type called `LakeTarget`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst LakeTarget = union {\n azure: AzureBlob,\n amazon: AmazonS3,\n google: GoogleGCP,\n};\n\nfn send_event(\n event: Event,\n lake_target: LakeTarget\n) bool {\n // body of the function ...\n}\n```\n:::\n\n\n\n\nA union definition is composed of a list of data members. Each data member is of a specific data type.\nIn the example above, the `LakeTarget` union has three data members (`azure`, `amazon`, `google`).\nWhen you instantiate an object that uses a union type, you can only use one of its data members\nin this instantiation.\n\nYou could also interpret this as: only one data member of a union type can be activated at a time, the other data\nmembers remain deactivated and inaccessible. For example, if you create a `LakeTarget` object that uses\nthe `azure` data member, you can no longer use or access the data members `google` or `amazon`.\nIt is as if these other data members didn't exist at all in the `LakeTarget` type.\n\nYou can see this logic in the example below. Notice that, we first instantiate the union\nobject using the `azure` data member. As a result, this `target` object contains only\nthe `azure` data member inside of it.
Only this data member is active in this object.\nThat is why the last line in this code example is invalid. Because we are trying to instantiate the data member\n`google`, which is currently inactive for this `target` object, and as a result, the program\nenters in panic mode warning us about this mistake through a loud error message.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar target = LakeTarget {\n .azure = AzureBlob.init()\n};\n// Only the `azure` data member exists inside\n// the `target` object, and, as a result, this\n// line below is invalid:\ntarget.google = GoogleGCP.init();\n```\n:::\n\n\n\n\n```\nthread 2177312 panic: access of union field 'google' while\n field 'azure' is active:\n target.google = GoogleGCP.init();\n ^\n```\n\nSo, when you instantiate a union object, you must choose one of the data types (or, one of the data members)\nlisted in the union type. In the example above, I chose to use the `azure` data member, and, as a result,\nall other data members were automatically deactivated,\nand you can no longer use them after you instantiate the object.\n\nYou can activate another data member by completely redefining the entire union object.\nIn the example below, I initially use the `azure` data member. But then, I redefine the\n`target` object to use a new `LakeTarget` object, which uses this time the `google` data member.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar target = LakeTarget {\n .azure = AzureBlob.init()\n};\ntarget = LakeTarget {\n .google = GoogleGCP.init()\n};\n```\n:::\n\n\n\n\nA curious fact about union types, is that, at first, you cannot use them in switch statements (that we presented at @sec-switch).\nIn other words, if you have an object of type `LakeTarget` for example, you cannot give this object\nto a switch statement as input.\n\nBut what if you really need to do so? What if you actually need to\nprovide a \"union object\" to a switch statement?
The answer to this question relies on another special type in Zig,\nwhich are the *tagged unions*. To create a tagged union, all you have to do is to add\nan enum type into your union declaration.\n\nAs an example of a tagged union in Zig, take the `Registry` type exposed\nbelow. This type comes from the\n[`grammar.zig` module](https://github.com/ziglang/zig/blob/30b4a87db711c368853b3eff8e214ab681810ef9/tools/spirv/grammar.zig)[^grammar]\nfrom the Zig repository. This union type lists different types of registries.\nBut notice this time, the use of `(enum)` after the `union` keyword. This is what makes\nthis union type a tagged union. Also, by being a tagged union, an object of this `Registry` type\ncan be used as input in a switch statement. This is all you have to do. Just add `(enum)`\nto your `union` declaration, and you can use it in switch statements.\n\n[^grammar]: .\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Registry = union(enum) {\n core: CoreRegistry,\n extension: ExtensionRegistry,\n};\n```\n:::\n", - "supporting": [ - "09-error-handling_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/10-stack-project/execute-results/html.json b/_freeze/Chapters/10-stack-project/execute-results/html.json index f4498127..65790c42 100644 --- a/_freeze/Chapters/10-stack-project/execute-results/html.json +++ b/_freeze/Chapters/10-stack-project/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Project 3 - Building a stack data structure\n\nIn this chapter we are going to implement a stack data structure as our next small project\nin this book. 
Implementing basic data structures in any language is kind of a\n\"kindergarten task\" (if this term even exists) in computer science (CS), because\nwe normally learn and implement them in the first semesters of CS.\n\nBut this is actually good! Since this should be a very easy task, we don't need to spend much time explaining\nwhat a stack is, so we can concentrate on what is really important here, which is learning\nhow the concept of \"generics\" is implemented in the Zig language, and how one of the key\nfeatures of Zig, which is comptime, works, and use the stack data structure to demonstrate\nthese concepts on the fly.\n\nBut before we get into building the stack data structure, we first need to understand\nwhat the `comptime` keyword does to your code, and after that, we also need to learn about\nhow generics work in Zig.\n\n\n## Understanding `comptime` in Zig {#sec-comptime}\n\nOne of the key features of Zig is `comptime`. This keyword introduces a whole\nnew concept and paradigm, that is tightly connected with the compilation process.\nAt @sec-compile-time we described the importance and the role that \"compile-time vs runtime\"\nplays in Zig. In that section, we learned that the rules applied to a value/object change\na lot depending on whether this value is known at compile-time, or just at runtime.\n\nThe `comptime` keyword is strongly related to these two periods of time,\ni.e. the compile-time and the runtime. Let's quickly recap the differences. Compile-time is the period of time when your\nZig source code is being compiled by the `zig` compiler, while the runtime is\nthe period of time when your Zig program is being executed, i.e. 
when we execute\nthe binary files outputted by the `zig` compiler.\n\nThere are three ways in which you can apply the `comptime` keyword, which are:\n\n- apply `comptime` on a function argument.\n- apply `comptime` on an object.\n- apply `comptime` on a block of expressions.\n\n\n\n### Applying over a function argument\n\nWhen you apply `comptime` on a function argument you are saying to the `zig` compiler\nthat the value assigned to that particular argument of the function must be known at compile-time.\nWe explained in detail at @sec-compile-time what exactly \"value known at compile-time\" means, so,\nin case you have doubts about this idea, come back to that section.\n\nNow let's think about the consequences of this idea. First of all, we are imposing a limit, or, a requirement\nto that particular function argument. If the programmer accidentally tries to give a value to this\nfunction argument that is not known at compile time, the `zig` compiler will notice this problem, and\nas a consequence, it will raise a compilation error saying that it cannot compile your program. Because\nyou are providing a value that is \"runtime known\" to a function argument that must be \"compile-time known\".\n\nTake a look at this very simple example below, where we define a `double()` function, that simply\ndoubles the input value named `num`. Notice that, we use the `comptime` keyword before the name\nof the function argument. This keyword is marking the function argument `num` as a \"comptime argument\".\n\nThat is a function argument whose value must be compile-time known. 
This is why the expression `double(5678)`\nis valid, and no compilation errors are raised.\nBecause the value `5678` is compile-time known, so this is the expected behaviour for this function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn double(comptime num: u32) u32 {\n return num * 2;\n}\ntest \"test comptime\" {\n _ = double(5678);\n}\n```\n:::\n\n\n\n\nBut what if we provide a number that is not compile-time known to this function?\nFor example, we might provide a different input value to this function depending\non the target OS of our compilation process. The code example below demonstrates such a case.\n\nBecause the value of the object `n` is determined at runtime, we cannot provide this object\nas input to the `double()` function. The `zig` compiler will not allow it, because we marked\nthe `num` argument as a \"comptime argument\". That is why the `zig` compiler raises\nthe compile-time error exposed below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst builtin = @import(\"builtin\");\nfn double(comptime num: u32) u32 {\n return num * 2;\n}\ntest \"test comptime\" {\n var n: u32 = undefined;\n if (builtin.target.os.tag == .windows) {\n n = 1234;\n } else {\n n = 5678;\n }\n _ = double(n);\n}\n```\n:::\n\n\n\n\n```\nt.zig:12:16: error: runtime-known argument passed to comptime parameter \n```\n\nComptime arguments are frequently used on functions that return some sort\nof generic structure. In fact, `comptime` is the essence (or the basis) to make generics in Zig.\nWe are going to talk more about generics at @sec-generics.\n\nFor now, let's take a look at this code example from @karlseguin_generics. You\ncan see that this `IntArray()` function has one argument named `length`.\nThis argument is marked as comptime, and receives a value of type `usize` as input. 
So the value given to this argument\nmust be compile-time known.\nWe can also see that this function returns an array type, made of `i64` values, as output.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn IntArray(comptime length: usize) type {\n return [length]i64;\n}\n```\n:::\n\n\n\n\nNow, the key component of this function is the `length` argument. This argument\nis used to determine the size of the array that is produced by the function. Let's\nthink about the consequences of that. If the size of the array is dependent on\nthe value assigned to the `length` argument, this means that the data type of the\noutput of the function depends on the value of this `length` argument.\n\nLet this statement sink in for a bit. As I described at @sec-root-file,\nZig is a strongly-typed language, especially on function declarations.\nSo every time we write a function in Zig, we have to annotate the data type of\nthe value returned by the function. But how can we do that, if this data type\ndepends on the value given to the argument of the function?\n\nThink about this for a second. If `length` is equal to 3 for example, then, the\nreturn type of the function is `[3]i64`. But if `length` is equal to 40, then,\nthe return type becomes `[40]i64`. At this point the `zig` compiler would be confused,\nand raise a compilation error, saying something like this:\n\n> Hey! You have annotated that this function should return a `[3]i64` value, but I got a `[40]i64` value instead! This doesn't look right!\n\nSo how can you solve this problem? How do we overcome this barrier? This is when\nthe `type` keyword comes in. This `type` keyword is basically saying to the\n`zig` compiler that this function will return some type, but it doesn't know yet\nexactly which type that is. 
We will talk more about this at @sec-generics.\n\n\n\n### Applying over an expression\n\nWhen you apply the `comptime` keyword over an expression, then, it is guaranteed that the `zig` compiler will\nexecute this expression at compile-time. If for some reason, this expression cannot be executed at compile-time\n(e.g. maybe this expression depends on a value that is only known at runtime), then, the `zig` compiler\nwill raise a compilation error.\n\nTake this example from the official documentation of Zig [@zigdocs]. We\nare executing the same `fibonacci()` function both at runtime, and, at compile-time.\nThe function is by default executed at runtime, but because we use the `comptime`\nkeyword at the second \"try expression\", this expression is executed at compile-time.\n\nThis might be a bit confusing for some people. Yes! When I say that this expression\nis executed at compile-time, I mean that this expression is compiled and executed\nwhile the `zig` compiler is compiling your Zig source code.\n\nIf you think hard enough about how the C and C++ macros work, you can reach the conclusion\nthat they are also executed at compile-time. 
Because C and C++ macros are\njust simple text replacement operations, that are performed at compile-time to form the final version\nof the C or C++ source code that gets sent to the compiler.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst expect = @import(\"std\").testing.expect;\nfn fibonacci(index: u32) u32 {\n if (index < 2) return index;\n return fibonacci(index - 1) + fibonacci(index - 2);\n}\n\ntest \"fibonacci\" {\n // test fibonacci at run-time\n try expect(fibonacci(7) == 13);\n // test fibonacci at compile-time\n try comptime expect(fibonacci(7) == 13);\n}\n```\n:::\n\n\n\n\nA lot of your Zig source code might be potentially executed at compile-time,\nbecause the `zig` compiler can figure out the result value of some types of expressions.\nEspecially if these expressions involve only compile-time known values.\nWe have talked about this at @sec-compile-time.\n\nBut when you use the `comptime` keyword on an expression, there is no \"it might be executed\nat compile-time\" anymore. With the `comptime` keyword you are ordering the `zig` compiler\nto execute this expression at compile-time. You are imposing this rule, it is guaranteed\nthat the compiler will always execute it at compile-time. Or, at least, the compiler\nwill try to execute it. If the compiler cannot execute the expression for whatever reason,\nthe compiler will raise a compilation error.\n\n\n### Applying over a block\n\nBlocks were described at @sec-blocks. When you apply the `comptime` keyword over a\nblock of expressions, you get essentially the same effect as when you apply this keyword to\na single expression. 
That is, the entire block of expressions is executed at\ncompile-time by the `zig` compiler.\n\nIn the example below, we mark the block labeled `blk` as a comptime block,\nand, therefore, the expressions inside this block are executed at compile-time.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst expect = @import(\"std\").testing.expect;\nfn fibonacci(index: u32) u32 {\n if (index < 2) return index;\n return fibonacci(index - 1) + fibonacci(index - 2);\n}\n\ntest \"fibonacci in a block\" {\n const x = comptime blk: {\n const n1 = 5;\n const n2 = 2;\n const n3 = n1 + n2;\n try expect(fibonacci(n3) == 13);\n break :blk n3;\n };\n _ = x;\n}\n```\n:::\n\n\n\n\n\n\n\n\n## Introducing Generics {#sec-generics}\n\nFirst of all, what is a generic? Generics are the idea of allowing a type\n(`f64`, `u8`, `u32`, `bool`, and also, user-defined types, like the `User` struct\nthat we defined at @sec-structs-and-oop) to be a parameter to methods, classes and\ninterfaces [@geeks_generics]. In other words, a \"generic\" is a class (or a method) that can work\nwith multiple data types.\n\nFor example, in Java, generics are created through the operator `<>`. With this operator,\na Java class is capable of receiving a data type as input, and therefore, the class can fit\nits features according to this input data type.\nAs another example, generics in C++ are supported through the concept of templates.\nClass templates in C++ are generics.\n\nIn Zig, generics are implemented through `comptime`. The `comptime` keyword\nallows us to collect a data type at compile time, and pass this data type as\ninput to a piece of code.\n\n\n### A generic function {#sec-generic-fun}\n\nTake the `max()` function exposed below as a first example.\nThis function is essentially a \"generic function\".\nIn this function, we have a comptime function argument named `T`.\nNotice that this `T` argument has a data type of `type`. Weird right? 
This `type` keyword is the\n\"father of all types\", or, \"the type of types\" in Zig. Because we used this `type` keyword in the `T`\nargument, we are telling the `zig` compiler that this `T` argument will receive some data type\nas input.\n\nAlso notice the use of the `comptime` keyword in this argument.\nAs I described at @sec-comptime, every time you use this keyword in a function argument,\nthis means that the value of this argument must be known at compile-time.\nThis makes sense, right? Because there is no data type that you can make\nthat is known only at runtime.\n\nThink about this. Every data type that you will ever write is always\nknown at compile-time. Especially because data types are essential\ninformation for the compiler to actually compile your source code.\nWith this in mind, it makes sense to mark this argument as a comptime argument.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn max(comptime T: type, a: T, b: T) T {\n return if (a > b) a else b;\n}\n```\n:::\n\n\n\n\nAlso notice that the value of the `T` argument is actually used\nto define the data type of the other arguments of the function, `a` and `b`, and also at the\nreturn type annotation of the function.\nThat is, the data type of these arguments (`a` and `b`), and, the return data type of the function itself,\nare determined by the input value given to the `T` argument.\n\nAs a result, we have a generic function that works with different data types.\nFor example, I can provide `u8` values to this `max()` function, and it will work as expected.\nBut if I provide `f64` values instead, it will also work as expected.\nIf I did not use a generic function, I would have to write a `max()`\nfor each one of the data types that I wanted to use.\nThis generic function provides a very useful shortcut for us.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn max(comptime T: type, a: T, b: T) T {\n return if (a > b) a else b;\n}\ntest \"test max\" {\n const n1 = 
max(u8, 4, 10);\n std.debug.print(\"Max n1: {d}\\n\", .{n1});\n const n2 = max(f64, 89.24, 64.001);\n std.debug.print(\"Max n2: {d}\\n\", .{n2});\n}\n```\n:::\n\n\n\n\n```\nMax n1: 10\nMax n2: 89.24\n```\n\n\n\n### A generic data structure {#sec-generic-struct}\n\nEvery data structure that you find in the Zig Standard Library (e.g. `ArrayList`, `HashMap`, etc.)\nis essentially a generic data structure.\nThese data structures are generic in the sense that they work with any data type you want.\nYou just say which is the data type of the values that are going to be stored in this data\nstructure, and they just work as expected.\n\nA generic data structure in Zig is the way to replicate a generic class from Java,\nor, a class template from C++.\nBut you may ask yourself: how do we build a generic data structure in Zig?\n\nThe basic idea is to write a generic function that creates\nthe data structure definition for the specific type we want.\nIn other words, this generic function behaves as a \"constructor\". The function outputs the `struct` definition\nthat defines this data structure for a specific data type.\n\nTo create such a function, we need to add a comptime argument to this function that receives a data type\nas input. We already learned how to do this in the previous section (@sec-generic-fun).\n\nI think the best way to demonstrate how to create a generic data structure is to actually write one.\nThis is where we move on to our next small project in this book. This one is a very small project,\nwhich is to write a stack data structure.\n\n\n\n\n## What is a stack? 
{#sec-what-stack}\n\nA stack data structure is a structure that follows a LIFO (*last in first out*) principle.\nOnly two operations are normally supported in a stack data structure, which are `push` and `pop`.\nThe `push` operation is used to add new values to the stack, while `pop` is used to remove\nvalues from the stack.\n\nWhen people try to explain how the stack data structure works, the most common analogy\nthat people use is a stack of plates. Imagine that you have\na stack of plates, for example, a stack of 10 plates on your table.\nEach plate represents a value that is currently stored in this stack.\n\nCurrently we have a stack with 10 different values, or 10 different plates. Now, imagine that you want to\nadd a new plate (or a new value) to this stack, which translates to the `push` operation.\nYou would add this plate (or this value) by just putting the new plate\non the top of the stack. Then, you would increase the stack to 11 plates.\n\nBut how would you remove plates (or remove values) from this stack (a.k.a. the `pop` operation)?\nTo do that, we would have to remove the plate on the top of the stack, and, as a result, we would\nhave, once again, 10 plates in the stack.\n\nThis demonstrates the LIFO concept, because the first plate in the stack, which is the plate\nat the bottom of the stack, is the last plate to get out of the stack. Think about it. In order\nto remove this specific plate from the stack, we have to remove all the plates above it in the\nstack. So every operation in the stack, either insertion or deletion, is always made at the top of the stack.\nThe @fig-stack below exposes this logic visually:\n\n![A diagram of a stack structure. Source: Wikipedia, the free encyclopedia.](./../Figures/lifo-stack.svg){#fig-stack}\n\n\n\n## Writing the stack data structure\n\nWe are going to write the stack data structure in two steps. First, we are going\nto implement a stack that can only store `u32` values. 
Then, after that, we are going\nto extend our implementation to make it generic, so that it works with any data type\nwe want.\n\nFirst, we need to decide how the values will be stored inside the stack. There are multiple\nways to implement the storage behind a stack structure. Some people prefer to use a doubly linked list,\nsome others prefer to use a dynamic array, etc. In this example we are going to use an array under the hood,\nto store the values in the stack, which is the `items` data member of our `Stack` struct definition.\n\nAlso notice in our `Stack` struct that we have three other data members: `capacity`, `length` and `allocator`.\nThe `capacity` member contains the capacity of the underlying array that stores the values in the stack.\nThe `length` contains the number of values that are currently being stored in the stack.\nAnd the `allocator` contains the allocator object which will be used by the stack structure whenever it\nneeds to allocate more space for the values that are being stored.\n\nWe begin by defining an `init()` method of this struct, which is going to be\nresponsible for instantiating a `Stack` object. Notice that, inside this\n`init()` method, we start by allocating an array with the capacity specified\nin the `capacity` argument. Then, we use the built-in function `@memset()`\nto set all elements of this new array that we have allocated to zero.\n\nThis `@memset()` function is basically equivalent to the `memset()`\nfunction from C Standard Library[^cmemset]. In this example, we are using this function to essentially\nassign the value `0` to all elements of this array. 
We do that to initialize this memory that was allocated.\nUninitialized memory is a common source of undefined behaviour, so, this step\nis made for safety reasons.\n\n[^cmemset]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Allocator = std.mem.Allocator;\n\nconst Stack = struct {\n items: []u32,\n capacity: usize,\n length: usize,\n allocator: Allocator,\n\n pub fn init(allocator: Allocator, capacity: usize) !Stack {\n var buf = try allocator.alloc(u32, capacity);\n @memset(buf[0..], 0);\n return .{\n .items = buf[0..],\n .capacity = capacity,\n .length = 0,\n .allocator = allocator,\n };\n }\n};\n```\n:::\n\n\n\n\n\n### Implementing the `push` operation\n\nNow that we have written the basic logic to create a new `Stack` object,\nwe can start writing the logic responsible for performing a push operation.\nRemember, a push operation in a stack data structure is the operation\nresponsible for adding a new value to the stack.\n\nSo how can we add a new value to the `Stack` object that we have?\nThe `push()` function exposed below is an answer to this question.\nRemember from what we discussed at @sec-what-stack that values are always added to the top of the stack.\nThis means that this `push()` function must always find the element in the underlying array of the stack,\nthat currently represents the top position of the stack, and add this value there.\n\nFirst, we have an if statement in this function. This if statement is\nchecking whether we need to expand the underlying array to store\nthis new value that we are adding to the stack. 
In other words, maybe\nthe underlying array does not have enough capacity to store this new\nvalue, and, in this case, we need to expand our array to get the capacity that we need.\n\nSo, if the logic test in this if statement returns true, it means that the array\ndoes not have enough capacity, and we need to expand it before we store this new value.\nSo inside this if statement we are executing the necessary expressions\nto expand the underlying array.\n\nNotice that we use the allocator object to allocate a new array that is twice as big\nas the current array (`self.capacity * 2`). Then, we use `@memset()` once again to\nset all fields in this array to zero, making sure that all memory is properly initialized.\n\nAfter that, we use a different built-in function named `@memcpy()`. This built-in function\nis equivalent to the `memcpy()` function from the C Standard Library[^cmemcpy]. It is used to\ncopy the values from one block of memory to another block of memory. In other words,\nyou can use this function to copy the values from one array into another array.\n\n[^cmemcpy]: \n\nWe are using this `@memcpy()` built-in function to copy the values that are currently\nin the underlying array of the stack object (`self.items`) into our new and bigger array that\nwe have allocated (`new_buf`). After we execute this function, the `new_buf` contains a copy\nof the values that are present at `self.items`.\n\nNow that we have secured a copy of our current values in the `new_buf` object, we\ncan now free the memory currently allocated at `self.items`. After that, we just need\nto assign our new and bigger array to `self.items`. 
We also update `self.capacity`, so that it matches the size of the new array. This is the sequence\nof steps necessary to expand our array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn push(self: *Stack, val: u32) !void {\n if ((self.length + 1) > self.capacity) {\n var new_buf = try self.allocator.alloc(\n u32, self.capacity * 2\n );\n @memset(new_buf[0..], 0);\n @memcpy(\n new_buf[0..self.capacity], self.items\n );\n self.allocator.free(self.items);\n self.items = new_buf;\n // Keep `capacity` in sync with the new array\n self.capacity = new_buf.len;\n }\n\n self.items[self.length] = val;\n self.length += 1;\n}\n```\n:::\n\n\n\n\nAfter we make sure that we have enough room to store this new value\nthat we are adding to the stack, all we have to do is to assign\nthis value to the top element in this stack, and, increase the\nvalue of the `length` attribute by one. We find the top element\nin the stack by using the `length` attribute.\n\n\n\n### Implementing the `pop` operation\n\nNow, we can implement the pop operation of our stack object.\nThis is a much easier operation to implement, and the pop method below summarises\nall the logic that is needed.\n\nWe just have to find the element in the underlying array that currently represents the top\nof the stack, and set this element to zero, to indicate that\nthis element is \"empty\". After that, we also need to decrease\nthe `length` attribute of the stack by one.\n\nIf the current length of the stack is zero, it means that there are\nno values being stored in the stack currently. 
So, in this case,\nwe can just return from the function and do nothing.\nThis is what the if statement inside this function is checking for.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn pop(self: *Stack) void {\n if (self.length == 0) return;\n\n self.items[self.length - 1] = 0;\n self.length -= 1;\n}\n```\n:::\n\n\n\n\n\n\n### Implementing the `deinit` method\n\nWe have implemented the methods responsible for the two main operations\nassociated with the stack data structure, which are `pop()` and `push()`,\nand we also have implemented the method responsible for instantiating\na new `Stack` object, which is the `init()` method.\n\nBut now, we also need to implement the method responsible for destroying\na `Stack` object. In Zig, this task is commonly associated with the method\nnamed `deinit()`. Most struct objects in Zig have such a method, and it\nis commonly nicknamed \"the destructor method\".\n\nIn theory, all we have to do to destroy the `Stack` object is to make\nsure that we free the allocated memory for the underlying array, using\nthe allocator object that is stored inside the `Stack` object.\nThis is what the `deinit()` method below is doing.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn deinit(self: *Stack) void {\n self.allocator.free(self.items);\n}\n```\n:::\n\n\n\n\n\n\n\n## Making it generic\n\nNow that we have implemented the basic skeleton of our stack data structure,\nwe can now focus on discussing how we can make it generic. How can we make\nthis basic skeleton work not only with `u32` values, but also, with any other\ndata type we want?\nFor example, we might need to create a stack object to store `User` values\nin it. How can we make this possible? 
The answer lies in the use of generics\nand `comptime`.\n\nAs I described at @sec-generic-struct, the basic idea is to write a generic\nfunction that returns a struct definition as output.\nIn theory, we do not need much to transform our `Stack` struct into a generic\ndata structure. All that we need to do, to make our `Stack` struct\ngeneric, is to transform the underlying array in the stack structure\ninto a generic array.\n\nIn other words, this underlying array needs to be a \"chameleon\". It needs to adapt,\nand transform itself into an array of whatever data type we give it. For example, if we need to create\na stack that will store `u8` values, then, this underlying array needs to be\na `u8` array (i.e. `[]u8`). But if we need to store `User` values instead, then,\nthis array needs to be a `User` array (i.e. `[]User`). Etc.\n\nWe do that by using a generic function. Because a generic function can receive a data type\nas input, and we can pass this data type to the struct definition of our `Stack` object.\nTherefore, we can use the generic function to create a `Stack` object that can store\nthe data type we want. If we want to create a stack structure that stores `User` values,\nwe pass the `User` data type to this generic function, and it will create for us\nthe struct definition that describes a `Stack` object that can store `User` values in it.\n\nTake the code below as an example. I truncated some parts of the `Stack` struct\nfor brevity. 
But if a specific part of our `Stack` struct is not exposed here\nin this example, it is because this part did not change from the previous example.\nIt remains the same.\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn Stack(comptime T: type) type {\n return struct {\n items: []T,\n capacity: usize,\n length: usize,\n allocator: Allocator,\n const Self = @This();\n\n pub fn init(allocator: Allocator, capacity: usize) !Stack(T) {\n var buf = try allocator.alloc(T, capacity);\n @memset(buf[0..], 0);\n return .{\n .items = buf[0..],\n .capacity = capacity,\n .length = 0,\n .allocator = allocator,\n };\n }\n\n pub fn push(self: *Self, val: T) !void {\n // Body truncated: same logic as before,\n // with `u32` replaced by `T`\n }\n\n // Truncate the rest of the struct\n };\n}\n```\n:::\n\n\n\n\nNotice that we have created a function in this example named `Stack()`. This function\ntakes a type as input, and passes this type to the struct definition of our\n`Stack` object. The data member `items` is now an array of type `T`, which is the\ndata type that we passed as input to the function. The function argument\n`val` in the `push()` function is now a value of type `T` too.\n\nNow we finally have a stack data structure that is generic. I mean, we probably still\nneed to adjust the `@memset()` calls, to truly make it generic for any data type.\nBut this version is good enough, and it really works well with any of the primitive data\ntypes of Zig.\n\nWe can just provide a data type to this function, and it will create a definition of a\n`Stack` object that can store values of the data type that we provided. In the example below, we are creating\nthe definition of a\n`Stack` object that can store `u8` values in it. 
This definition is stored in the `Stacku8` object.\nThis `Stacku8` object becomes our new struct, it is the struct that we are going to use\nto create our `Stack` object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nconst Stacku8 = Stack(u8);\nvar stack = try Stacku8.init(allocator, 10);\ndefer stack.deinit();\ntry stack.push(1);\ntry stack.push(2);\ntry stack.push(3);\ntry stack.push(4);\ntry stack.push(5);\ntry stack.push(6);\n\nstd.debug.print(\"Stack len: {d}\\n\", .{stack.length});\nstd.debug.print(\"Stack capacity: {d}\\n\", .{stack.capacity});\n\nstack.pop();\nstd.debug.print(\"Stack len: {d}\\n\", .{stack.length});\nstack.pop();\nstd.debug.print(\"Stack len: {d}\\n\", .{stack.length});\nstd.debug.print(\"Stack state: {any}\\n\", .{stack.items});\n```\n:::\n\n\n\n\n```\nStack len: 6\nStack capacity: 10\nStack len: 5\nStack len: 4\nStack state: { 1, 2, 3, 4, 0, 0, 0, 0, 0, 0 }\n```\n\nEvery generic data structure in the Zig Standard Library (`ArrayList`, `HashMap`, `SinglyLinkedList`, etc.) is implemented through this logic,\nof using a generic function to create the struct definition that can work\nwith the data type that you provided as input.\n\n\n\n\n## Conclusion\n\nThe full source code of the stack structure discussed in this chapter is freely available at the official\nrepository for this book. 
Just checkout the [`stack.zig`](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/data-structures/stack.zig)[^zig-stack]\nfor the `u32` version of our stack,\nand the [`generic_stack.zig`](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/data-structures/generic_stack.zig)[^zig-stack2]\nfor the generic version,\navailable inside the `ZigExamples` folder of the repository.\n\n\n[^zig-stack]: \n[^zig-stack2]: \n\n", - "supporting": [ - "10-stack-project_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/12-file-op/execute-results/html.json b/_freeze/Chapters/12-file-op/execute-results/html.json index ece7d694..f1092b56 100644 --- a/_freeze/Chapters/12-file-op/execute-results/html.json +++ b/_freeze/Chapters/12-file-op/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Filesystem and Input/Output (IO) {#sec-filesystem}\n\nIn this chapter we are going to discuss how to use the cross-platform structs and functions available\nin the Zig Standard Library that executes filesystem operations. Most of these functions and structs\ncomes from the `std.fs` module.\n\nWe are also going to talk about Input/Output (also known as IO) operations in Zig. Most of\nthese operations are made by using the structs and functions from `std.io` module, which defines\ndescriptors for the *standard channels* of your system (`stdout` and `stdin`), and also,\nfunctions to create and use I/O streams.\n\n\n## Input/Output basics {#sec-io-basics}\n\nIf you have programming experience in a high-level language, you certainly have used before\nthe input and output functionalities of this language. 
In other words, you certainly have\nbeen in a situation where you needed to send some output to the user, or, to receive an input\nfrom the user.\n\nFor example, in Python we can receive some input from the user by using the `input()` built-in\nfunction. But we can also print (or \"show\") some output to the user by using the `print()`\nbuilt-in function. So yes, if you have programmed before in Python, you certainly have\nused these functions before.\n\nBut do you know how these functions relate back to your operating system (OS)? How exactly\ndo they interact with the resources of your OS to receive or send input/output?\nIn essence, these input/output functions from high-level languages are just abstractions\nover the *standard output* and *standard input* channels of your operating system.\n\nThis means that we receive an input, or send some output, through the operating system.\nIt is the OS that makes the bridge between the user and your program. Your program\ndoes not have a direct access to the user. It is the OS that intermediates every\nmessage exchanged between your program and the user.\n\nThe *standard output* and *standard input* channels of your OS are commonly known as the\n`stdout` and `stdin` channels of your OS, respectively. In some contexts, they are also called the *standard output device*\nand *standard input device*. As the name suggests, the *standard output*\nis the channel through which output flows, while the *standard input* is the channel in which\ninput flows.\n\nFurthermore, operating systems also normally create a dedicated channel for exchanging error messages, known as the\n*standard error* channel, or, the `stderr` channel. This is the channel to which error and warning messages\nare usually sent. These are the messages that are normally displayed in red-like or orange-like colors\nin your terminal.\n\nNormally, every OS (e.g. Windows, MacOS, Linux, etc.) 
creates a dedicated and separate set of\n*standard output*, *standard error* and *standard input* channels for every single program (or process) that runs in your computer.\nThis means that every program you write has a dedicated `stdin`, `stderr` and `stdout` that are separate\nfrom the `stdin`, `stderr` and `stdout` of other programs and processes that are currently running.\n\nThis is a behaviour of your OS.\nIt does not come from the programming language that you are using.\nBecause, as I said earlier, input and output in programming languages, especially\nin high-level ones, are just a simple abstraction over the `stdin`, `stderr` and `stdout` from your current OS.\nThat is, your OS is the intermediary in every input/output operation made in your program,\nregardless of the programming language that you are using.\n\n\n### The writer and reader pattern {#sec-writer-reader}\n\nIn Zig, there is a pattern around input/output (IO). I (the author of this book) don't know if there is an official name for this pattern.\nBut here, in this book, I will call it the \"writer and reader pattern\". In essence, every IO operation in Zig is\nmade through either a `GenericReader` or a `GenericWriter` object[^gen-zig].\n\nThese two data types come from the `std.io` module of the Zig Standard Library. 
As their names suggest, a\n`GenericReader` is an object that offers tools to read data from \"something\" (or \"somewhere\"), while a `GenericWriter`\noffers tools to write data into this \"something\".\nThis \"something\" might be different things: like a file that exists in your filesystem; or, it might be a network socket of your system[^sock]; or,\na continuous stream of data, like a standard input device from your system, that might be constantly\nreceiving new data from users, or, as another example, a live chat in a game that is constantly receiving and displaying new messages from the\nplayers of the game.\n\n[^gen-zig]: Previously, these objects were known as the `Reader` and `Writer` objects.\n[^sock]: The socket objects that we have created at @sec-create-socket, are examples of network sockets.\n\nSo, if you want to **read** data from something, or somewhere, it means that you need to use a `GenericReader` object.\nBut if you need to **write** data into this \"something\", then you need to use a `GenericWriter` object instead.\nBoth of these objects are normally created from a file descriptor object. More specifically, through the `writer()` and `reader()`\nmethods of this file descriptor object. If you are not familiar with this type of object, go to the\nnext section.\n\nEvery `GenericWriter` object has methods like `print()`, which allows you to write/send a formatted string\n(i.e. this formatted string is like an `f` string in Python, or, similar to the `printf()` C function)\ninto the \"something\" (file, socket, stream, etc.) that you are using. It also has a `writeAll()` method, which allows you to\nwrite a string, or, an array of bytes into the \"something\".\n\nLikewise, every `GenericReader` object has methods like `readAll()`, which allows you to read the\ndata from the \"something\" (file, socket, stream, etc.) until it fills a particular array (i.e. 
a \"buffer\") object.\nIn other words, if you provide an array object of 300 `u8` values to `readAll()`, then, this method attempts to read 300 bytes\nof data from the \"something\", and it stores them into the array object that you have provided.\n\nWe also have other methods, like the `readAtLeast()` method,\nwhich allows you to specify the minimum number of bytes that you want to read from the \"something\".\nIn more detail, if you give the number $n$ as input to this method, then it will attempt to read at least $n$ bytes of data from the \"something\".\nThe \"something\" might have fewer than $n$ bytes of data available for you to read, so it is not guaranteed\nthat you will get $n$ bytes as a result.\n\nAnother useful method is `readUntilDelimiterOrEof()`. In this method, you specify a \"delimiter character\".\nThe idea is that this function will attempt to read as many bytes of data as possible from the \"something\",\nuntil it encounters the end of the stream, or, it encounters the \"delimiter character\" that you have specified.\n\nIf you don't know exactly how many bytes will come from the \"something\", you may find the `readAllAlloc()` method\nuseful. In essence, you provide an allocator object to this method, so that it can allocate more space if needed.\nAs a consequence, this method will try to read all bytes of the \"something\", and, if it runs out of space at some point\nduring the \"reading process\", it uses the allocator object to allocate more space to continue reading the bytes.\nAs a result, this method returns a slice to the array object containing all the bytes read.\n\nThis is just a quick description of the methods present in these types of objects. 
But I recommend you\nto read the official docs, both for\n[`GenericWriter`](https://ziglang.org/documentation/master/std/#std.io.GenericWriter)[^gen-write] and\n[`GenericReader`](https://ziglang.org/documentation/master/std/#std.io.GenericReader)[^gen-read].\nI also think it is a good idea to read the source code of the modules in the Zig Standard Library\nthat defines the methods present in these objects, which are the\n[`Reader.zig`](https://github.com/ziglang/zig/blob/master/lib/std/io/Reader.zig)[^mod-read]\nand [`Writer.zig`]()[^mod-write].\n\n[^gen-read]: .\n[^gen-write]: .\n[^mod-read]: .\n[^mod-write]: .\n\n\n### Introducing file descriptors {#sec-file-descriptor}\n\nA \"file descriptor\" object is a core component behind every I/O operation that is made in any operating system (OS).\nSuch object is an identifier for a particular input/output (IO) resource from your OS [@wiki_file_descriptor].\nIt describes and identifies this particular resource. An IO resource might be:\n\n- an existing file in your filesystem.\n- an existing network socket.\n- other types of stream channels.\n- a pipeline (or just \"pipe\") in your terminal[^pipes].\n\n[^pipes]: A pipeline is a mechanism for inter-process communication, or, inter-process IO. You could also interpret a pipeline as a \"set of processes that are chained together, through the standard input/output devices of the system\". 
On Linux, for example, a pipeline is created inside a terminal, by connecting two or more terminal commands with the \"pipe\" character (`|`).\n\nFrom the bullet points listed above, we know that although the term \"file\" is present,\na \"file descriptor\" might describe something more than just a file.\nThis concept of a \"file descriptor\" comes from the Portable Operating System Interface (POSIX) API,\nwhich is a set of standards that guide how operating systems across the world should be implemented,\nto maintain compatibility between them.\n\nA file descriptor not only identifies the input/output resource that you are using to receive or send some data,\nbut it also describes where this resource is, and also, which IO mode this resource is currently using.\nFor example, this IO resource might be using only the \"read\" IO mode, which means that this resource\nis open to \"read operations\", while \"write operations\" are closed and not authorized.\nThese IO modes are essentially the modes that you provide to the argument `mode`\nof the `fopen()` C function, and also, of the `open()` Python built-in function.\n\nIn C, you typically work with an IO resource through a `FILE` pointer (a stream object that wraps the underlying file descriptor), but, in Zig, a file descriptor is a `File` object.\nThis data type (`File`) is described in the `std.fs` module of the Zig Standard Library.\nWe normally don't create a `File` object directly in our Zig code. Instead, we normally get such an object as a result when we\nopen an IO resource. In other words, we normally ask our OS to open and use a particular IO\nresource, and, if the OS does open this IO resource successfully, the OS normally hands back to us\na file descriptor to this particular IO resource.\n\nSo you usually get a `File` object by using functions and methods from the Zig Standard Library\nthat ask the OS to open some IO resources, like the `openFile()` method that opens a file in the\nfilesystem. 
The `net.Stream` object that we have created at @sec-create-socket is also a type of\nfile descriptor object.\n\n\n### The *standard output*\n\nYou have already seen throughout this book how we can access and use the `stdout` in Zig\nto send some output to the user.\nFor that, we use the `getStdOut()` function from the `std.io` module. This function returns\na file descriptor that describes the `stdout` channel of your current OS. Through this file\ndescriptor object, we can read from or write stuff to the `stdout` of our program.\n\nAlthough we can read stuff recorded into the `stdout` channel, we normally only\nwrite (or \"print\") stuff into this channel. The reason is very similar to what we discussed at\n@sec-read-http-message, when we were discussing what \"reading from\" versus \"writing to\" the connection\nobject from our small HTTP Server project would mean.\n\nWhen we write stuff into a channel, we are essentially sending data to the other end of this channel.\nIn contrast, when we read stuff from this channel, we are essentially reading the data that was sent\nthrough this channel. Since the `stdout` is a channel to send output to the user, the key verb here\nis **send**. We want to send something to someone, and, as a consequence, we want to **write** something\ninto some channel.\n\nThat is why, when we use `getStdOut()`, most of the time, we also use the `writer()` method from the `stdout` file descriptor,\nto get access to a writer object that we can use to write stuff into this `stdout` channel.\nMore specifically, this `writer()` method returns a `GenericWriter` object. 
One of the\nmain methods of this `GenericWriter` object is the `print()` method that we have used\nbefore to write (or \"print\") a formatted string into the `stdout` channel.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n try stdout.writeAll(\n \"This message was written into stdout.\\n\"\n );\n}\n```\n:::\n\n\n\n\n```\nThis message was written into stdout.\n```\n\n\nThis `GenericWriter` object is like any other generic writer object that you would normally get from a file descriptor object.\nSo, the same methods from a generic writer object that you would use while writing files to the filesystem for example, you could also\nuse them here, from the file descriptor object of `stdout`, and vice-versa.\n\n\n### The *standard input*\n\nYou can access the *standard input* (i.e. `stdin`) in Zig by using the `getStdIn()` function from the `std.io` module.\nLike its sister (`getStdOut()`), this function also returns a file descriptor object that describes the `stdin` channel\nof your OS.\n\nSince we now want to receive some input from the user, the key verb here becomes **receive**, and, as a consequence,\nwe usually want to **read** data from the `stdin` channel, instead of writing data into it. So, we normally use\nthe `reader()` method of the file descriptor object returned by `getStdIn()`, to get access to a `GenericReader`\nobject that we can use to read data from `stdin`.\n\nIn the example below, we are creating a small buffer capable of holding 20 characters. Then, we try to read\nthe data from the `stdin` with the `readUntilDelimiterOrEof()` method, and save this data into the `buffer` object.\nAlso notice that we are reading the data from the `stdin` until we hit a new line character (`'\\n'`).\n\nIf you execute this program, you will notice that it pauses its execution and waits indefinitely\nfor some input from the user. 
In other words, you need to type your name into the terminal, and then, you press Enter to\nsend your name to `stdin`. After you send your name to `stdin`, the program reads this input, and continues with the execution,\nby printing the given name to `stdout`. In the example below, I typed my name (Pedro) into the terminal, and then, pressed Enter.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst stdin = std.io.getStdIn().reader();\npub fn main() !void {\n try stdout.writeAll(\"Type your name\\n\");\n var buffer: [20]u8 = undefined;\n @memset(buffer[0..], 0);\n _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\\n');\n try stdout.print(\"Your name is: {s}\\n\", .{buffer});\n}\n```\n:::\n\n\n\n\n```\nType your name\nYour name is: Pedro\n\n```\n\n\n### The *standard error*\n\nThe *standard error* (a.k.a. the `stderr`) works exactly the same as the `stdout`.\nYou just call the `getStdErr()` function from the `std.io` module, and you get the file descriptor to `stderr`.\nIdeally, you should write only error or warning messages to `stderr`, because this is\nthe purpose of this channel.\n\n\n\n\n\n## Buffered IO\n\nAs we described at @sec-io-basics, input/output (IO) operations are made directly by the operating system.\nIt is the OS that manages the IO resource that you want to use for your IO operations.\nThe consequence of this fact is that IO operations are heavily based on system calls (i.e. calling the operating system directly).\n\nJust to be clear, there is nothing particularly wrong with system calls. We use them all the time on\nany serious codebase written in any low-level programming language. However, system calls are\ntypically orders of magnitude slower than most other types of operations.\n\nSo it is perfectly fine to use a system call once in a while. 
But when these system calls start to be used often,\nyou can usually notice a clear loss of performance in your application. So, a good rule of thumb\nis to use a system call only when it is needed, and also, only in infrequent situations, to reduce\nthe number of system calls performed to a minimum.\n\n\n### Understanding how buffered IO works\n\nBuffered IO is a strategy to achieve better performance. It is used to reduce the number of system calls made by IO operations, and, as\na consequence, achieve a much higher performance. At @fig-buff-diff you can find two different diagrams which present the differences between\nread operations performed in an unbuffered IO environment versus a buffered IO environment.\n\nTo give a better context to these diagrams, let's suppose that we have a text file that contains the famous Lorem ipsum text[^lorem]\nin our filesystem. Let's also suppose that these diagrams at @fig-buff-diff\nare showing the read operations that we are performing to read the Lorem ipsum text from this text file.\nThe first thing you notice when looking at the diagrams, is that in an unbuffered environment the read operations lead to many system calls.\nMore precisely, in the diagram exposed at @fig-unbuffered-io we get one system call for each byte that we read from the text file.\nOn the other hand, at @fig-buffered-io we have only one system call at the very beginning.\n\nWhen we use a buffered IO system, at the first read operation we perform, instead of sending one single byte directly\nto our program, the OS first sends a chunk of bytes from the file to a buffer object (i.e. 
an array).\nThis chunk of bytes is cached/stored inside this buffer object, and when this operation is done, then\nyour program receives the byte that it actually asked for.\n\nFrom now on, for every new read operation that you perform, instead of making a new system call to ask\nthe OS for the next byte in the file, this read operation is redirected to the buffer object, which has\nthis next byte already cached and ready to go.\n\n\n[^lorem]: .\n\n::: {#fig-buff-diff layout-nrow=2}\n\n![Unbuffered IO](./../Figures/unbuffered-io.png){#fig-unbuffered-io width=60%}\n\n![Buffered IO](./../Figures/buffered-io.png){#fig-buffered-io}\n\nDiagrams of read operations performed in buffered IO and unbuffered IO environments.\n\n:::\n\nThis is the basic logic behind buffered IO systems. The size of the buffer object depends, but most of the time,\nit is equal to a full page of memory (4096 bytes). If we follow this logic, then, the OS reads the first 4096 bytes\nof the file and caches it into the buffer object. As long as your program does not consume all of the 4096 bytes from the buffer,\nnot a single system call is made.\n\nHowever, as soon as you consume all of the 4096 bytes from the buffer, it means that there are no bytes left in the buffer.\nIn this situation, a new system call is made to ask the OS to send the next 4096 bytes in the file, and once again,\nthese bytes are cached into the buffer object, and the cycle starts once again.\n\n\n### Buffered IO across different languages\n\nIO operations made through a `FILE` pointer in C are buffered\nby default, so, at least in C, you don't need to worry about this subject. 
But in contrast, IO operations in both Rust and Zig may or may not be\nbuffered, depending on which functions from the standard library you are using.\n\nFor example, in Rust, buffered IO is implemented through the `BufReader` and `BufWriter` structs, while in Zig, it is implemented\nthrough the `BufferedReader` and `BufferedWriter` structs.\nSo any IO operation that you perform through the `GenericWriter` and `GenericReader` objects\nthat I presented at @sec-writer-reader is not buffered, which means that these objects\nmight create a lot of system calls depending on the situation.\n\n\n### Using buffered IO in Zig\n\nUsing buffered IO in Zig is actually very easy. All you have to do is\ngive the `GenericWriter` object to the `bufferedWriter()` function, or give the `GenericReader`\nobject to the `bufferedReader()` function. These functions come from the `std.io` module,\nand they will construct the `BufferedWriter` or `BufferedReader` object for you.\n\nAfter you create this new `BufferedWriter` or `BufferedReader` object, you can call the `writer()`\nor `reader()` method of this new object, to get access to a new (and buffered) generic writer or\ngeneric reader.\n\nLet's describe the process once again. Every time that you have a file descriptor object, you first get the generic writer or generic reader\nobject from it, by calling the `writer()` or `reader()` methods of this file descriptor object.\nThen, you provide this generic writer or generic reader to the `bufferedWriter()` or `bufferedReader()`\nfunction, which creates a new `BufferedWriter` or `BufferedReader` object. Then, you call\nthe `writer()` or `reader()` methods of this buffered writer or buffered reader object,\nwhich gives you access to a generic writer or generic reader object that is buffered.\n\nTake this program as an example. 
This program is essentially demonstrating the process exposed at @fig-buffered-io.\nWe are simply opening a text file that contains the Lorem ipsum text, and then, we create a buffered IO reader object\nat `bufreader`, and we use this `bufreader` object to read the contents of this file into a buffer object, then,\nwe end the program by printing this buffer to `stdout`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar file = try std.fs.cwd().openFile(\n \"ZigExamples/file-io/lorem.txt\", .{}\n);\ndefer file.close();\nvar buffered = std.io.bufferedReader(file.reader());\nvar bufreader = buffered.reader();\n\nvar buffer: [1000]u8 = undefined;\n@memset(buffer[0..], 0);\n\n_ = try bufreader.readUntilDelimiterOrEof(\n buffer[0..], '\\n'\n);\ntry stdout.print(\"{s}\\n\", .{buffer});\n```\n:::\n\n\n\n\n```\nLorem ipsum dolor sit amet, consectetur\nadipiscing elit. Sed tincidunt erat sed nulla ornare, nec\naliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus,\nultrices eleifend porttitor ut, finibus ut tortor. Maecenas\nsapien justo, finibus tincidunt dictum ac, semper et lectus.\nVivamus molestie egestas orci ac viverra. Pellentesque nec\narcu facilisis, euismod eros eu, sodales nisl. Ut egestas\nsagittis arcu, in accumsan sapien rhoncus sit amet. Aenean\nneque lectus, imperdiet ac lobortis a, ullamcorper sed massa.\nNullam porttitor porttitor erat nec dapibus. Ut vel dui nec\nnulla vulputate molestie eget non nunc. Ut commodo luctus ipsum,\nin finibus libero feugiat eget. Etiam vel ante at urna tincidunt\nposuere sit amet ut felis. Maecenas finibus suscipit tristique.\nDonec viverra non sapien id suscipit.\n```\n\nDespite being a buffered IO reader, this `bufreader` object is similar to any other `GenericReader` object,\nand have the exact same methods. 
So, although these two types of objects perform very different IO operations,\nthey have the same interface, so, you the programmer, can interchangeably use them\nwithout the need to change anything in your source code.\nSo a buffered IO reader or a buffered IO writer object has the same methods as its generic and unbuffered siblings,\ni.e. the generic reader and generic writer objects that I presented at @sec-writer-reader.\n\n::: {.callout-tip}\nIn general, you should always use a buffered IO reader or a buffered IO writer object to perform\nIO operations in Zig, because they deliver better performance to your IO operations.\n:::\n\n\n## Filesystem basics\n\nNow that we have discussed the basics around Input/Output operations in Zig, we need to\ntalk about the basics around filesystems, which is another core part of any operating system.\nAlso, filesystems are related to input/output, because the files that we store and create in our\ncomputer are considered an IO resource, as we described at @sec-file-descriptor.\n\nJust like when we were talking about input/output, if you have ever programmed in your life, you probably know\nsome basics about filesystems and file operations, etc.\nBut, since I don't know you, I don't know what your background is. As a result,\nthese concepts that I will describe might be clear in your mind, but they also may not be as clear as you think.\nJust bear with me, while I'm trying to put everyone on the same basis.\n\n\n### The concept of current working directory (CWD)\n\nThe working directory is the folder on your computer where you are currently rooted at,\nor in other words, it is the folder that your program is currently looking at.\nTherefore, whenever you are executing a program, this program is always working with\na specific folder on your computer. 
It is always in this folder that the program will initially\nlook for the files you require, and it is also in this folder that the program\nwill initially save all the files you ask it to save.\n\nThe working directory is determined by the folder from which you invoke your program\nin the terminal. In other words, if you are in the terminal of your OS, and you\nexecute a binary file (i.e. a program) from this terminal, the folder to which your terminal\nis pointing at is the current working directory of your program that is being executed.\n\nAt @fig-cwd we have an example of me executing a program from the terminal. We are executing\nthe program outputted by the `zig` compiler by compiling the Zig module named `hello.zig`.\nThe CWD in this case is the `zig-book` folder. In other words, while the `hello.zig` program\nis executing, it will be looking at the `zig-book` folder, and any file operation that we perform\ninside this program, will be using this `zig-book` folder as the \"starting point\", or, as the \"central focus\".\n\n![An example of executing a program from the terminal](./../Figures/cwd.png){#fig-cwd}\n\nJust because we are rooted inside a particular folder (in the case of @fig-cwd, the `zig-book` folder) of our computer,\nit doesn't mean that we cannot access or write resources in other locations of our computer.\nThe current working directory (CWD) mechanism just defines where your program will look first\nfor the files you ask for. This does not prevent you from accessing files that are located\nelsewhere on your computer. However, to access any file that is in a folder other than your\ncurrent working directory, you must provide a path to that file or folder.\n\n\n### The concept of paths\n\nA path is essentially a location. It points to a location in your filesystem. We use\npaths to describe the location of files and folders in our computer.\nOne important aspect is that paths are always written inside strings,\ni.e. 
they are always provided as text values.\n\nThere are two types of paths that you can provide to any program in any OS: a relative path, or an absolute path.\nAbsolute paths are paths that start at the root of your filesystem, and go all the way to the file name or the specific folder\nthat you are referring to. This type of path is called absolute, because it points to a unique, absolute location on your computer.\nThat is, there is no other existing location on your computer that corresponds to this path. It is a unique identifier.\n\nIn Windows, an absolute path is a path that starts with a hard disk identifier (e.g. `C:/Users/pedro`).\nOn the other hand, absolute paths in Linux and MacOS, are paths that start with a forward slash character (e.g. `/usr/local/bin`).\nNotice that a path is composed of \"segments\". Segments are connected to each other by a slash character (`\` or `/`).\nOn Windows, the backward slash (`\`) is normally used to connect the path segments. While on Linux and MacOS, the forward\nslash (`/`) is the character used to connect path segments.\n\nIn contrast, a relative path is a path that starts at the CWD. In other words, a relative path is\n\"relative to the CWD\". The path used to access the `hello.zig` file at @fig-cwd is an example of a relative path. This path\nis reproduced below. This path begins at the CWD, which in the context of @fig-cwd, is the `zig-book` folder,\nthen, it goes to the `ZigExamples` folder, then, into `zig-basics`, then, to the `hello.zig` file.\n\n```\nZigExamples/zig-basics/hello_world.zig\n```\n\n\n### Path wildcards\n\nWhen providing paths, especially relative paths, you have the option of using a *wildcard*.\nThere are two commonly used *wildcards* in paths, which are \"one period\" (.) and \"two periods\" (..).\nIn other words, these two specific characters have special meanings when used in paths,\nand can be used on any operating system (Mac, Windows, Linux, etc.). 
That is, they\nare \"cross platform\".\n\nThe \"one period\" represents an alias for your current working directory.\nThis means that the relative paths `\"./Course/Data/covid.csv\"` and `\"Course/Data/covid.csv\"` are equivalent.\nOn the other hand, the \"two periods\" refers to the previous directory.\nFor example, the path `\"Course/..\"` is equivalent to the path `\".\"`, that is, the current working directory.\n\nTherefore, the path `\"Course/..\"` refers to the folder before the `Course` folder.\nAs another example, the path `\"src/writexml/../xml.cpp\"` refers to the file `xml.cpp`\nthat is inside the folder before the `writexml` folder, which in this example is the `src` folder.\nTherefore, this path is equivalent to `\"src/xml.cpp\"`.\n\n\n\n\n## The CWD handler\n\nIn Zig, filesystem operations are usually made through a directory handler object.\nA directory handler in Zig is an object of type `Dir`, which is an object that describes\na particular folder in the filesystem of our computer.\nYou normally create a `Dir` object, by calling the `std.fs.cwd()` function.\nThis function returns a `Dir` object that points to (or, that describes) the\ncurrent working directory (CWD).\n\nThrough this `Dir` object, you can create new files, or modify, or read existing ones that are\ninside your CWD. In other words, a `Dir` object is the main entrypoint in Zig to perform\nmultiple types of filesystem operations.\nIn the example below, we are creating this `Dir` object, and storing it\ninside the `cwd` object. 
Although we are not using this object in this code example,\nwe are going to use it a lot over the next examples.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\n_ = cwd;\n```\n:::\n\n\n\n\n\n\n\n\n\n\n\n## File operations\n\n### Creating files {#sec-creating-files}\n\nWe create new files by using the `createFile()` method from the `Dir` object.\nJust provide the name of the file that you want to create, and this function will\ndo the necessary steps to create such file. You can also provide a relative path to this function,\nand it will create the file by following this path, which is relative to the CWD.\n\nThis function might return an error, so, you should use `try`, `catch`, or any of the other methods presented\nat @sec-error-handling to handle the possible error. But if everything goes well,\nthis `createFile()` method returns a file descriptor object (i.e. a `File` object) as a result,\nthrough which you can add content to the file with the IO operations that I presented before.\n\nTake this code example below. In this example, we are creating a new text file\nnamed `foo.txt`. If the function `createFile()` succeeds, the object named `file` will contain a file descriptor\nobject, which we can use to write (or add) new content to the file, like we do in this example, by using\na writer object to write a new line of text to the file.\n\nNow, a quick note, when we create a file descriptor object in C, by using a C function like `fopen()`, we must always close the file\nat the end of our program, or, as soon as we complete all operations that we wanted to perform\non the file. In Zig, this is no different. So every time we create a new file, this file remains\n\"open\", waiting for some operation to be performed. 
As soon as we are done with it, we always have\nto close this file, to free the resources associated with it.\nIn Zig, we do this by calling the method `close()` from the file descriptor object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.createFile(\"foo.txt\", .{});\n// Don't forget to close the file at the end.\ndefer file.close();\n// Do things with the file ...\nvar fw = file.writer();\n_ = try fw.writeAll(\n \"Writing this line to the file\\n\"\n);\n```\n:::\n\n\n\n\n\nSo, in this example we not only have created a file into the filesystem,\nbut we also wrote some data into this file, using the file descriptor object\nreturned by `createFile()`. If the file that you are trying to create\nalready exists in your filesystem, this `createFile()` call will\noverwrite the contents of the file, or, in other words, it will\nin practice erase all the contents of the existing file.\n\nIf you don't want this to happen, meaning, that you don't want to overwrite\nthe contents of the existing file, but you want to write data to this file anyway\n(i.e. you want to append data to the file), you should use the `openFile()`\nmethod from the `Dir` object.\n\nAnother important aspect about `createFile()` is that this method creates a file\nthat is not opened to read operations by default. It means that you cannot read this file.\nYou are not allowed to.\nSo for example, you might want to write some stuff into this file at the beginning of the execution\nof your program. Then, at a future point in your program you might need to read what you have\nwritten into this file. If you try to read data from this file, you will likely\nget a `NotOpenForReading` error as a result.\n\n\nBut how can you overcome this barrier? How can you create a file that is open\nto read operations? All you have to do, is to set the `read` flag to true\nin the second argument of `createFile()`. 
When you set this flag to true,\nthen the file gets created with \"read permissions\", and, as a consequence,\na program like this one below becomes valid:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.createFile(\"foo.txt\", .{ .read = true });\ndefer file.close();\n\nvar fw = file.writer();\n_ = try fw.writeAll(\"We are going to read this line\\n\");\n\nvar buffer: [300]u8 = undefined;\n@memset(buffer[0..], 0);\ntry file.seekTo(0);\nvar fr = file.reader();\n_ = try fr.readAll(buffer[0..]);\ntry stdout.print(\"{s}\\n\", .{buffer});\n```\n:::\n\n\n\n\n\n```\nWe are going to read this line\n```\n\n\nIf you are not familiar with position indicators, you may not recognize what the method\n`seekTo()` is, or, what it does. If that is your case, do not worry,\nwe are going to talk more about this method at @sec-indicators. But essentially\nthis method is moving the position indicator back to the beginning of the file,\nso that we can read the contents of the file from the beginning.\n\n\n### Opening files and appending data to it\n\nOpening files is easy. Just use the `openFile()` method instead of `createFile()`.\nIn the first argument of `openFile()` you provide the path to the file that\nyou want to open. Then, on the second argument you provide the flags (or, the options)\nthat dictate how the file is opened.\n\nYou can see the full list of options for `openFile()` by visiting the documentation for\n[`OpenFlags`](https://ziglang.org/documentation/master/std/#std.fs.File.OpenFlags)[^oflags].\nBut the main flag that you will most certainly be worried about is the `mode` flag.\nThis flag specifies the IO mode that the file will be using when it gets opened.\nThere are three IO modes, or, three values that you can provide to this flag, which are:\n\n- `read_only`, allows only read operations on the file. All write operations are blocked.\n- `write_only`, allows only write operations on the file. 
All read operations are blocked. \n- `read_write`, allows both write and read operations on the file.\n\n[^oflags]: \n\nThese modes are similar to the modes that you provide to the `mode` argument of the\n`open()` Python built-in function[^py-open], or, the `mode` argument of the\n`fopen()` C function[^c-open].\nIn the code example below, we are opening the `foo.txt` text file with a `write_only` mode,\nand appending a new line of text to the end of the file. We use `seekFromEnd()` this time\nto guarantee that we are going to append the text to the end of the file. Once again, methods\nsuch as `seekFromEnd()` are described in more depth at @sec-indicators.\n\n[^py-open]: \n[^c-open]: \n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.openFile(\"foo.txt\", .{ .mode = .write_only });\ndefer file.close();\ntry file.seekFromEnd(0);\nvar fw = file.writer();\n_ = try fw.writeAll(\"Some random text to write\\n\");\n```\n:::\n\n\n\n\n\n### Deleting files\n\nSometimes, we just need to delete/remove the files that we have.\nTo do that, we use the `deleteFile()` method. You just provide the path of the\nfile that you want to delete, and this method will try to delete the file located\nat this path.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.deleteFile(\"foo.txt\");\n```\n:::\n\n\n\n\n### Copying files\n\nTo copy existing files, we use the `copyFile()` method. The first argument in this method\nis the path to the file that you want to copy. The second argument is a `Dir` object, i.e. a directory handler,\nmore specifically, a `Dir` object that points to the folder in your computer where you want to\ncopy the file to. The third argument is the new path of the file, or, in other words, the new location\nof the file. 
The fourth argument is the options (or flags) to be used in the copy operation.\n\nThe `Dir` object that you provide as input to this method will be used to copy the file to\nthe new location. You may create this `Dir` object before calling the `copyFile()` method.\nMaybe you are planning to copy the file to a completely different location in your computer,\nso it might be worth creating a directory handler to that location. But if you are copying the\nfile to a subfolder of your CWD, then, you can just simply pass the CWD handler to this argument.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.copyFile(\n \"foo.txt\",\n cwd,\n \"ZigExamples/file-io/foo.txt\",\n .{}\n);\n```\n:::\n\n\n\n\n\n### Read the docs!\n\nThere are some other useful methods for file operations available at `Dir` objects,\nsuch as the `writeFile()` method, but I recommend that you read the docs for the\n[`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir]\nto explore the other available methods, since I already talked too much about them.\n\n\n[^zig-dir]: \n\n\n\n\n## Position indicators {#sec-indicators}\n\nA position indicator is like a type of cursor, or, an index. This \"index\" identifies the current\nlocation in the file (or, in the data stream) that the file descriptor object that you have\nis currently looking at.\nWhen you create a file descriptor, the position indicator starts at the beginning of the file,\nor, at the beginning of the stream. When you read or write data into the file (or socket, or data stream, etc.)\ndescribed by this file descriptor object, you end up moving the position indicator.\n\nIn other words, any IO operation has a common side effect, which is moving the position indicator.\nFor example, suppose that we have a file of 300 bytes total in size. If you\nread 100 bytes from the file, the position indicator moves 100 bytes forward. 
If you try\nto write 50 bytes into this same file, these 50 bytes will be written from the current\nposition indicated by the position indicator. Since the indicator is 100 bytes forward from\nthe beginning of the file, these 50 bytes would be written in the middle of the file.\n\nThis is why we have used the `seekTo()` method at the last code example presented at @sec-creating-files.\nWe have used this method to move the position indicator back to the beginning of the file, which\nmade sure that we would read the text from the beginning of the file,\ninstead of starting to read after the text that we had just written. Because before the read operation, we had already\nperformed a write operation, which means that the position indicator was moved by this write operation.\n\nThe position indicators of a file descriptor object can be changed (or altered) by using the\n\"seek\" methods from this file descriptor, which are: `seekTo()`, `seekFromEnd()` and `seekBy()`.\nThese methods have the same effect, or, the same responsibility as the\n[`fseek()`](https://en.cppreference.com/w/c/io/fseek)[^c-fseek]\nC function.\n\n[^c-fseek]: \n\n\nConsidering that `offset` refers to the index that you provide as input to these \"seek\" methods,\nthe bullet points below summarise the effect of each of these methods.\nA quick note, in the case of `seekFromEnd()` and `seekBy()`, the `offset` provided can be either a\npositive or negative index.\n\n- `seekTo()` will move the position indicator to the location that is `offset` bytes from the beginning of the file.\n- `seekFromEnd()` will move the position indicator to the location that is `offset` bytes from the end of the file.\n- `seekBy()` will move the position indicator to the location that is `offset` bytes from the current position in the file.\n\n\n\n\n\n\n\n## Directory operations\n\n### Iterating through the files in a directory\n\nOne of the most classic tasks related to filesystem is to be able\nto 
iterate through the existing files in a directory. Iteration\nover a directory is made in Zig through an iterator pattern. In other words, we need\nto create an iterator object, and use this object to iterate through the files.\n\nYou can produce such an iterator object by using either the `iterate()` or `walk()` methods\nof a `Dir` object. Both methods return an iterator object as a result, which you can advance by using\n`next()`. The difference between these methods is that `iterate()` returns a non-recursive iterator,\nwhile `walk()` returns a recursive one. It means that the iterator returned by `walk()` will not only iterate through\nthe files available in the current directory, but also, through the files from any subdirectory found\ninside the current directory.\n\nIn the example below, we are displaying the names of the files stored inside the\ndirectory `ZigExamples/file-io`. Notice that we had to open this directory through\nthe `openDir()` function. Also notice that we provided the flag `iterate` in the\nsecond argument of `openDir()`. 
This flag is important, because without this flag,\nwe would not be allowed to iterate through the files in this directory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst dir = try cwd.openDir(\n \"ZigExamples/file-io/\",\n .{ .iterate = true }\n);\nvar it = dir.iterate();\nwhile (try it.next()) |entry| {\n try stdout.print(\n \"File name: {s}\\n\",\n .{entry.name}\n );\n}\n```\n:::\n\n\n\n\n```\nFile name: create_file_and_write_toit.zig\nFile name: create_file.zig\nFile name: lorem.txt\nFile name: iterate.zig\nFile name: delete_file.zig\nFile name: append_to_file.zig\nFile name: user_input.zig\nFile name: foo.txt\nFile name: create_file_and_read.zig\nFile name: buff_io.zig\nFile name: copy_file.zig\n```\n\n\n### Creating new directories\n\nThere are two methods that are important when it comes to\ncreating directories, which are `makeDir()` and `makePath()`.\nThe difference between these two methods is that `makeDir()` can\nonly create one single directory in the current directory in each call,\nwhile `makePath()` is capable of recursively creating subdirectories in the same call.\n\n\nThis is why the name of this method is \"make path\". It will create as many\nsubdirectories as necessary to create the path that you provided as input.\nSo, if you provide the path `\"sub1/sub2/sub3\"` as input to this method,\nit will create three different subdirectories, `sub1`, `sub2` and `sub3`,\nwithin the same function call. In contrast, if you provided such a path\nas input to `makeDir()`, you would likely get an error as a result, since\nthis method can only create a single subdirectory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.makeDir(\"src\");\ntry cwd.makePath(\"src/decoders/jpg/\");\n```\n:::\n\n\n\n\n### Deleting directories\n\nTo delete a directory, just provide the path to the directory that you want to delete\nas input to the `deleteDir()` method from a `Dir` object. 
In the example below,\nwe are deleting the `src` directory that we have just created in the previous example.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.deleteDir(\"src\");\n```\n:::\n\n\n\n\n\n## Conclusion\n\nIn this chapter, I have described how to perform in Zig the most common filesystem and IO operations.\nBut you might feel the lack of some other, less common, operation in this chapter, such as: how to rename files,\nor how to open a directory, or how to create symbolic links, or how to use `access()` to test if a particular\npath exists in your computer. But for all of these less common tasks, I recommend you to read\nthe docs of the [`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir]\n, since you can find a good description of these cases there.\n\n\n\n\n", - "supporting": [ - "12-file-op_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/13-image-filter/execute-results/html.json b/_freeze/Chapters/13-image-filter/execute-results/html.json index 8596e7a3..d8c7a80b 100644 --- a/_freeze/Chapters/13-image-filter/execute-results/html.json +++ b/_freeze/Chapters/13-image-filter/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Project 4 - Developing an image filter\n\nIn this chapter we are going to build a new small project. The objective of\nthis project is to build a program in Zig that applies a filter over an image.\nMore specifically, a \"grayscale filter\". 
This filter essentially transforms\nany color image into a grayscale image.\n\nWe are going to use the image displayed at @fig-pascal as the example in this project.\nIn other words, we are going to transform this colored image, into a grayscale image,\nusing our \"image filter program\" written in Zig.\n\n![A photo of the chilean-american actor Pedro Pascal. Source: Google Images.](../ZigExamples/image_filter/pedro_pascal.png){#fig-pascal}\n\nWe don't need to write a lot of code to build such \"image filter program\". However, in order for us\nto build such program, we first need to understand how digital images work. That is why we begin this chapter\nby explaining the theory behind digital images and how colors are represented in modern computers.\nWe also give a brief explanation about the file format PNG (Portable Network Graphics), which is used\nin the example images.\n\nAt the end of this chapter, we will have a full example of a program that takes the PNG image displayed at @fig-pascal\nas input, and writes a new image to the current working directory that is the grayscale version of the input image.\nThis grayscale version of @fig-pascal that is written by the program is exposed at @fig-pascal-gray.\nYou can find the full source code of this small project at the `ZigExamples/image_filter`\n[folder at the official repository of this book](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/image_filter)[^img-filter-folder].\n\n\n![The grayscale version of the photo.](../ZigExamples/image_filter/pedro_pascal_filter.png){#fig-pascal-gray}\n\n\n## How we see things? {#sec-eyes}\n\nIn this section, I want to briefly describe to you how we (humans) actually see things with our own eyes.\nI mean, how our eyes work? If you do have a very basic understanding of how our eyes work, you will understand\nmore easily how digital images are made. 
Because the techniques used to produce a digital image\nwere developed by taking a lot of inspiration from how our human eyes work.\n\nYou can interpret a human eye as a light sensor, or, a light receptor. The eye receives some amount of light as input,\nand it interprets the colors that are present in this \"amount of light\".\nIf no amount of light hits the eye, then, the eye cannot extract color from it, and as a result,\nwe end up seeing nothing, or, more precisely, we see complete blackness.\n\nSo everything depends on light. What we actually see are the colors (blue, red, orange, green, purple, yellow, etc.) that\nare being reflected from the light that is hitting our eyes. **Light is the source of all colors!**\nThis is what Isaac Newton discovered in his famous prism experiment[^newton] in the 1660s.\n\n[^newton]: \n\nInside our eyes, we have a specific type of cell called the \"cone cell\".\nOur eyes have three different types, or, three different versions of these \"cone cells\".\nEach of these three types of cone cell is very sensitive to a specific spectrum of the light,\nwhich are the spectrums that define the colors red, green and blue.\nSo, in summary, our eyes have specific types of cells that\nare highly sensitive to these three colors (red, green and blue).\n\nThese are the cells responsible for perceiving the color present in the light that hits our eyes.\nAs a result, our eyes perceive color as a mixture of these three colors (red, green and blue). By having an amount\nof each one of these three colors, and mixing them together, we can get any other visible color\nthat we want. 
So every color that we see is perceived as a specific mixture of blues, greens and reds,\nlike 30% of red, plus 20% of green, plus 50% of blue.\n\nWhen these cone cells perceive (or, detect) the colors that are found in the\nlight that is hitting our eyes, these cells produce electrical signals and send them to the brain.\nOur brain interprets these electrical signals, and uses them to form the image that we are seeing\ninside our head.\n\nBased on what we have discussed here, the items below describe the sequence of events that\ncompose this very simplified version of how our human eyes work:\n\n1. Light hits our eyes.\n1. The cone cells perceive the colors that are present in this light.\n1. Cone cells produce electrical signals that describe the colors that were perceived in the light.\n1. The electrical signals are sent to the brain.\n1. The brain interprets these signals, and forms the image based on the colors identified by these electrical signals.\n\n\n## How digital images work? {#sec-digital-img}\n\nA digital image is a \"digital representation\" of an image that we see with our eyes.\nIn other words, a digital image is a \"digital representation\" of the colors that we see\nand perceive through the light.\nIn the digital world, we have two types of images, which are: vector image and raster image.\nVector images are not described here. So just remember that every single aspect that we discuss\nhere in this chapter about digital images **is related solely to raster images**, and not vector images.\n\nRaster images are digital images that are represented as a 2D (two dimensional) matrix\nof pixels. In other words, every raster image is basically a rectangle of pixels. Each pixel has a particular color.\nSo, a raster image is just a rectangle of pixels, and each of these pixels is displayed on the screen of your computer (or the screen\nof any other device, e.g. laptop, tablet, smartphone, etc.) as a color.\n\n@fig-raster demonstrates this idea. 
If you take any raster image, and you zoom into it very hard,\nyou will see the actual pixels of the image. Although JPEG and PNG are file formats to store raster images,\nwhen you zoom into JPEG, PNG, and some other types of raster image files, you usually do not quite see the pixels.\nThat is because most of these file formats implement techniques that affect how the pixels are displayed,\nwith the objective of increasing the details in the image. The most famous of these techniques is anti-aliasing,\nwhich you probably know from video-games. But nevertheless, the pixels are still there! They are\njust displayed differently in JPEG/PNG files.\n\n![Zooming over a raster image to see the pixels. Source: Google Images.](../Figures/imagem-raster.png){#fig-raster}\n\nThe more pixels the image has, the more information and detail we can include in the image,\nand the more precise, sharp and pretty the image will look. That is why photographic cameras\nusually produce big raster images, with several megapixels of resolution, to include as much detail as possible into the final image.\nAs an example, a digital image with dimensions of 1920 pixels of width and 1080 pixels of height, would be an image that\ncontains $1920 \\times 1080 = 2073600$ pixels in total. You could also say that the \"total area\" of the image is\n2073600 pixels, although the concept of \"area\" is not commonly used here in computer graphics.\n\nMost digital images we see in our modern world use the RGB color model. RGB stands for red, green and blue, if you have not\nnoticed that yet. So the color of each pixel in these raster images is usually\nrepresented as a mixture of red, green and blue, just like in our eyes. That is, the color of each pixel is identified by a set of\nthree different integer values. Each integer value identifies the \"amount\" of each color (red, green and blue).\nFor example, the set `(199, 78, 70)` identifies a color that is close to red. 
We have 199 of red, 78 of green,\nand 70 of blue. In contrast, the set `(129, 77, 250)` describes a color that is closer to purple. Et cetera.\n\n\n\n### Images are displayed from top to bottom\n\nThis is not a rule written in stone, but the vast majority of digital images are displayed from top\nto bottom and left to right. Most computer screens also follow this pattern. So, the first pixels\nin the image are the ones that are at the top and left corner of the image. You can find a visual representation\nof this logic at @fig-img-display.\n\nAlso notice in @fig-img-display that, since a digital image is essentially a 2D matrix of pixels,\nthe image is organized into rows and columns of pixels. The columns are defined by the horizontal x axis,\nwhile the rows are defined by the vertical y axis.\n\nEach pixel (i.e. the gray rectangles) exposed at @fig-img-display contains a number inside of it.\nThese numbers are the indexes of the pixels. You can notice that the first pixels are in the top and left\ncorner, and also, that the indexes of these pixels \"grow to the sides\", or, in other words, they grow in the direction of the horizontal x axis.\nThis means that most digital images are usually organized as rows of pixels. So when these digital images are\ndisplayed, the screen displays the first row of pixels, then, the second row, then, the third row, etc.\n\n![How the pixels of raster images are displayed.](./../Figures/image-display.png){#fig-img-display}\n\n\n\n\n\n\n### Representing the matrix of pixels in code {#sec-pixel-repr}\n\nOk, we know already that digital images are represented as a 2D matrix of pixels.\nBut we do not have a notion of a 2D matrix in Zig. Most low-level languages in general (Zig, C, Rust, etc.) do not have such a notion.\nSo how do we represent such a matrix of pixels in Zig, or any other low-level language?\nThe strategy that most programmers choose is to just use a normal 1D array to store the values of\nthis 2D matrix. 
In other words, you just create a normal 1D array, and store all values from both dimensions into this 1D array.\n\n\nNow, remember, a digital image is represented as a\n2D matrix of pixels, and each pixel is represented by 3 unsigned 8-bit integer values.\nSo, if we have for example a very small image of dimensions 4x3, then,\nwe have 12 pixels in total in this image. As a result, we need to create a normal array that\ncan store $3 \\times 12 = 36$ integer values, more precisely, an array of 36 `u8` values.\n\nThe reason why unsigned 8-bit integer (`u8`) values are used to represent the amounts of each color,\ninstead of any other integer type, is because they take the minimum amount of space possible, or,\nthe minimum amount of bits possible, which helps to reduce the binary size of the image, i.e. of the 2D matrix.\nAlso, they convey a good amount of precision and detail about the colors, even though they can represent\na relatively small range (from 0 to 255) of \"color amounts\".\n\nComing back to our initial example of a 4x3 image,\nthe `matrix` object exposed below could be an example of a 1D array that stores\nthe data that represents this 4x3 image.\n\n```zig\nconst matrix = [_]u8{\n 201, 10, 25, 185, 65, 70,\n 65, 120, 110, 65, 120, 117,\n 98, 95, 12, 213, 26, 88,\n 143, 112, 65, 97, 99, 205,\n 234, 105, 56, 43, 44, 216,\n 45, 59, 243, 211, 209, 54,\n};\n```\n\nThe first three integer values in this array are the color amounts of the first pixel in the image,\nwhile the next three integers are the color amounts for the second pixel.\nAnd the sequence goes on in this pattern. So the size of the array that stores\nthe values of the pixels from a raster image is usually a multiple of 3.\nIn this case, the array has a size of 36.\n\nI mean, the size of the array is **usually** a multiple of 3, because in specific circumstances,\nit can also be a multiple of 4. This happens when a transparency amount is\nalso included into the raster image. 
In other words, there are some types of raster images\nthat follow a different color model, which is the RGBA (red, green, blue and alpha)\ncolor model. The \"alpha\" corresponds to an amount of transparency in the pixel.\nSo every pixel in an RGBA image is represented by red, green, blue and alpha values.\n\nMost raster images use the standard RGB model, so, for the most part, you will\nsee array sizes that are multiples of 3. But some images, especially the ones\nthat are stored in PNG files, might be using the RGBA model, and, therefore, are\nrepresented by an array whose size is a multiple of 4.\n\nIn our case here, the example image of our project (@fig-pascal) is a raster image\nstored in a PNG file, and this specific image is using the RGBA color model. So\neach pixel in the image is represented by 4 different integer values, and, as a consequence,\nto store this image in our Zig code, we need to create an array whose size is a multiple of 4.\n\n\n## The PNG library we are going to use\n\nLet's begin our project by focusing on writing the necessary Zig code to\nread the data from the PNG file. In other words, we want\nto read the PNG file exposed at @fig-pascal, and parse its data to extract the 2D matrix of pixels\nthat represents the image.\n\nAs we have discussed at @sec-pixel-repr, the image that we are using as example here\nis a PNG file that uses the RGBA color model, and, therefore, each pixel of the image\nis represented by 4 integer values. 
You can download this image by visiting the `ZigExamples/image_filter`\n[folder at the official repository of this book](https://github.com/pedropark99/zig-book/tree/main/ZigExamples/image_filter)[^img-filter-folder].\nYou can also find in this folder the complete source code of this small project that we\nare developing here.\n\n[^img-filter-folder]: \n\nThere are some C libraries available that we can use to read and parse PNG files.\nThe most famous and widely used of all is `libpng`, which is the \"official library\" for reading and writing\nPNG files. This C library is available in most operating systems. But this C library is well known\nfor being a little complex and hard to use.\n\nThat is why I'm going to use a more modern alternative here in this project, which is the `libspng` library.\nI have chosen to use this C library here, because it is much, much\nsimpler to use than `libpng`, and also, offers very good performance for all operations.\nYou can check out the [official website of the library](https://libspng.org/)[^libspng]\nto know more about it.\nYou will also find there some documentation that might help you to understand and\nfollow the code examples exposed here.\n\n[^libspng]: \n\n\nFirst of all, remember to build and install this `libspng` into your system. Because\nif you don't do this step, the `zig` compiler will not be able to find the files and resources of\nthis library in your computer, and link them with the Zig code that we are going to write here\ntogether. There is good information about how to build and install the library at the\n[build section of the library documentation at the official website](https://libspng.org/docs/build/)[^lib-build].\n\n[^lib-build]: \n\n\n\n\n## Reading the PNG file\n\nIn order to extract the pixel data from the PNG file, we need to read and decode the file.\nA PNG file is just a binary file written in the \"PNG format\". We need to decode the\nfile in order to extract the pixel data from it. 
Luckily, the `libspng` library offers\na function called `spng_decode_image()` that does all this heavy work for us.\n\nNow, since `libspng` is a C library, most of the file and I/O operations in this library are made by using\na `FILE` C pointer. Because of that, it is probably a better idea to use the `fopen()` C function\nto open our PNG file, instead of using the `openFile()` method that I introduced at @sec-filesystem.\nThat is why I'm importing the `stdio.h` header in this project, and using the `fopen()` C function to open the file.\n\nIf you look at the snippet below, you can see that we are:\n\n1. opening the PNG file with `fopen()`.\n1. creating the `libspng` context with `spng_ctx_new()`.\n1. using `spng_set_png_file()` to specify the `FILE` object that reads the PNG file that we are going to use.\n\nEvery operation in `libspng` is made through a \"context object\". In our snippet below, this object is `ctx`.\nAlso, to perform an operation over a PNG file, we need to specify which PNG file we are referring to.\nThis is the job of `spng_set_png_file()`. 
We are using this function to specify the file descriptor\nobject that reads the PNG file that we want to use.\n\n\n```zig\nconst c = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n});\nconst png = @cImport({\n @cInclude(\"spng.h\");\n});\n\nconst path = \"pedro_pascal.png\";\nconst file_descriptor = c.fopen(path, \"rb\");\nif (file_descriptor == null) {\n @panic(\"Could not open file!\");\n}\nconst ctx = png.spng_ctx_new(0) orelse unreachable;\n_ = png.spng_set_png_file(\n ctx, @ptrCast(file_descriptor)\n);\n```\n\nBefore we continue, it is important to emphasize the following: since we have opened the file with `fopen()`,\nwe have to remember to close the file at the end of the program, with `fclose()`.\nIn other words, after we have done everything that we wanted to do with the PNG file\n`pedro_pascal.png`, we need to close the file, by applying `fclose()` over the file descriptor object.\nWe could also use the `defer` keyword to help us in this task, if we want to.\nThis code snippet below demonstrates this step:\n\n```zig\nif (c.fclose(file_descriptor) != 0) {\n return error.CouldNotCloseFileDescriptor;\n}\n```\n\n\n\n\n### Reading the image header section\n\nNow, the context object `ctx` is aware of our PNG file `pedro_pascal.png`, and it has access to\na file descriptor object to this file, so we can start doing operations on this file. The first thing that we\nare going to do is to read the \"image header section\" of the PNG file. This \"image header section\" is the section\nof the file that contains some basic information about the PNG file, like, the bit depth of the pixel data\nof the image, the color model used in the file, the dimensions of the image (height and width in number of pixels),\netc.\n\nTo make things easier, I will encapsulate this \"read image header\" operation into a\nnice small function called `get_image_header()`. All that this function needs to do\nis to call the `spng_get_ihdr()` function. 
This function from `libspng` is responsible\nfor reading the image header data, and storing it into a C struct named `spng_ihdr`.\nThus, an object of type `spng_ihdr` is a C struct that contains the data from the\nimage header section of the PNG file.\n\nSince this Zig function is receiving a C object (the `libspng` context object) as input, I marked\nthe function argument `ctx` as \"a pointer to the context object\" (`*png.spng_ctx`), following the recommendations\nthat we have discussed at @sec-pass-c-structs.\n\n```zig\nfn get_image_header(ctx: *png.spng_ctx) !png.spng_ihdr {\n var image_header: png.spng_ihdr = undefined;\n if (png.spng_get_ihdr(ctx, &image_header) != 0) {\n return error.CouldNotGetImageHeader;\n }\n\n return image_header;\n}\n\nvar image_header = try get_image_header(ctx);\n```\n\nAlso notice in this function, that I'm checking\nwhether the `spng_get_ihdr()` function call has returned an integer value that is different from zero.\nMost functions from the `libspng` library return a status code as a result, and the status code zero means\n\"success\". So any status code that is different from zero means that an error occurred while running `spng_get_ihdr()`.\nThis is why I'm returning an error value from the function in case the status code returned by the function\nis different from zero.\n\n\n### Allocating space for the pixel data\n\nBefore we read the pixel data from the PNG file, we need to allocate enough space to hold this data.\nBut in order to allocate such space, we first need to know how much space we need to allocate.\nThe image dimensions are obviously needed to calculate the size of this space. 
But there are\nother elements that also affect this number, such as the color model used in the image, the bit depth, and others.\n\nAnyway, all of this means that calculating the size of the space that we need is not a simple task.\nThat is why the `libspng` library offers a utility function named\n`spng_decoded_image_size()` to calculate this size for us. Once again, I'm going\nto encapsulate the logic around this function into a nice small Zig function\nnamed `calc_output_size()`.\nYou can see that this function returns a nice integer value as a result, informing\nthe size of the space that we need to allocate.\n\n\n```zig\nfn calc_output_size(ctx: *png.spng_ctx) !u64 {\n var output_size: u64 = 0;\n const status = png.spng_decoded_image_size(\n ctx, png.SPNG_FMT_RGBA8, &output_size\n );\n if (status != 0) {\n return error.CouldNotCalcOutputSize;\n }\n return output_size;\n}\n```\n\n\n\nYou might ask yourself what the value `SPNG_FMT_RGBA8` means. This value is actually an enum value defined in the `spng.h` header file.\nThis enum is used to identify a \"PNG format\". More precisely, it identifies a PNG file\nthat uses the RGBA color model and 8 bit depth. So by providing this enum value as input to the `spng_decoded_image_size()` function,\nwe are telling this function to calculate the size of the decoded pixel data considering a PNG file that follows this\n\"RGBA color model with 8 bit depth format\".\n\nHaving this function, we can use it in conjunction with an allocator object, to allocate an array of bytes (`u8` values) that is big enough\nto store the decoded pixel data of the image. 
Notice that I'm using `@memset()` to properly initialize\nthe entire array to zero.\n\n```zig\nconst output_size = try calc_output_size(ctx);\nvar buffer = try allocator.alloc(u8, output_size);\n@memset(buffer[0..], 0);\n```\n\n\n### Decoding the image data\n\nNow that we have the necessary space to store the decoded pixel data of the image,\nwe can start to actually decode and extract this pixel data from the image,\nby using the `spng_decode_image()` function.\n\nThe `read_data_to_buffer()` function exposed below summarises the necessary\nsteps to read this decoded pixel data, and store it into an input buffer.\nNotice that this function is encapsulating the logic around the `spng_decode_image()` function.\nAlso, we are using the `SPNG_FMT_RGBA8` enum value once again to inform the corresponding function,\nthat the PNG image being decoded, uses the RGBA color model and 8 bit depth.\n\n```zig\nfn read_data_to_buffer(ctx: *png.spng_ctx, buffer: []u8) !void {\n const status = png.spng_decode_image(\n ctx,\n buffer.ptr,\n buffer.len,\n png.SPNG_FMT_RGBA8,\n 0\n );\n\n if (status != 0) {\n return error.CouldNotDecodeImage;\n }\n}\n```\n\nHaving this function at hand, we can apply it over our context object, and also, over\nthe buffer object that we have created in the previous section to hold the decoded pixel data\nof the image:\n\n```zig\ntry read_data_to_buffer(ctx, buffer[0..]);\n```\n\n\n### Looking at the pixel data\n\nNow that we have the pixel data stored in our \"buffer object\", we can take just\na quick look at the bytes. In the example below, we are looking at the first\n12 bytes in the decoded pixel data.\n\nIf you take a close look at these values, you might notice that every 4 bytes\nin the sequence is 255. Which, coincidentally is the maximum possible integer value\nto be represented by a `u8` value. 
So, if the range from 0 to 255, which is the range\nof integer values that can be represented by a `u8` value, can be represented as a scale from 0% to 100%,\nthese 255 values are essentially 100% in that scale.\n\nIf you recall from @sec-pixel-repr, I have\ndescribed in that section that our `pedro_pascal.png` PNG file uses the RGBA color model,\nwhich adds an alpha (or transparency) byte to each pixel in the image.\nAs a consequence, each pixel in the image is represented by 4 bytes. Since we are looking\nhere at the first 12 bytes in the image, it means that we are looking at the data of\nthe first $12 / 4 = 3$ pixels in the image.\n\nSo, based on how these first 12 bytes (or these 3 pixels) look, with these 255 values at every 4 bytes, we can say that it is likely\nthat every pixel in the image has alpha (or transparency) set to 100%. This might not be true,\nbut it is the most likely possibility. Also, if we look at the image itself, which, if you recall, is\nshown at @fig-pascal, we can see that the transparency of the image does not change across the image,\nwhich reinforces this theory.\n\n\n```zig\ntry stdout.print(\n \"{any}\\n\", .{buffer[0..12]}\n);\n```\n\n```\n{\n 200, 194, 216, 255, 203, 197,\n 219, 255, 206, 200, 223, 255\n}\n```\n\n\nWe can see in the above result that the first pixel in this image has 200 of red, 194 of green, and 216 of blue.\nHow do I know the order in which the colors appear in the sequence? If you have not guessed that yet,\nit is because of the acronym RGB. First RED, then GREEN, then BLUE. If we scale these integer values\naccording to our scale of 0% to 100% (0 to 255), we get 78% of red, 76% of green and 85% of blue.\n\n\n\n## Applying the image filter\n\nNow that we have the data of each pixel in the image, we can focus on applying our image\nfilter over these pixels. Remember, our objective here is to apply a grayscale filter over\nthe image. 
A grayscale filter is a filter that transforms a colored image into a grayscale image.\n\nThere are different formulas and strategies to transform a colored image into a grayscale image.\nBut all of these different strategies involve\napplying some math over the colors of each pixel. In this project, we are going to use the\nmost general formula, which is shown below.\nThis formula considers $r$ as the red of the pixel, $g$ as the green, $b$ as the blue, and $p'$ as the\nlinear luminance of the pixel.\n\n$$\n p' = (0.2126 \\times r) + (0.7152 \\times g) + (0.0722 \\times b)\n$${#eq-grayscale}\n\n@eq-grayscale is the formula to calculate the linear luminance of a pixel. It is worth noting that this formula\nworks only for images whose pixels are using the sRGB color space, which is the standard color space\nfor the web. This means that, ideally, all images on the web should use this color space. Luckily,\nthis is our case here, i.e. the `pedro_pascal.png` image is using this sRGB color space, and, as a consequence,\nwe can use @eq-grayscale. You can read more about this at the Wikipedia page for grayscale [@wiki_grayscale].\n\nThe `apply_image_filter()` function shown below summarises the necessary steps to\napply @eq-grayscale over the pixels in the image. 
We just apply this function\nover our buffer object that contains our pixel data, and, as a result, the pixel\ndata stored in this buffer object should now represent the grayscale version of our image.\n\n```zig\nfn apply_image_filter(buffer:[]u8) !void {\n const len = buffer.len;\n const red_factor: f16 = 0.2126;\n const green_factor: f16 = 0.7152;\n const blue_factor: f16 = 0.0722;\n var index: u64 = 0;\n // Each pixel takes 4 bytes (RGBA); the alpha byte is left untouched.\n // The condition below also includes the last pixel in the buffer.\n while (index + 4 <= len) : (index += 4) {\n const rf: f16 = @floatFromInt(buffer[index]);\n const gf: f16 = @floatFromInt(buffer[index + 1]);\n const bf: f16 = @floatFromInt(buffer[index + 2]);\n const y_linear: f16 = (\n (rf * red_factor) + (gf * green_factor)\n + (bf * blue_factor)\n );\n buffer[index] = @intFromFloat(y_linear);\n buffer[index + 1] = @intFromFloat(y_linear);\n buffer[index + 2] = @intFromFloat(y_linear);\n }\n}\n\ntry apply_image_filter(buffer[0..]);\n```\n\n\n\n## Saving the grayscale version of the image\n\nSince we have now the grayscale version of our image stored in our buffer object,\nwe need to encode this buffer object back into the \"PNG format\", and save the encoded data into\na new PNG file in our filesystem, so that we can access and see the grayscale version of our image\nthat was produced by our small program.\n\nTo do that, the `libspng` library helps us once again by offering an \"encode data to PNG\" type of function,\nwhich is the `spng_encode_image()` function. But in order to \"encode data to PNG\" with `libspng`, we need\nto create a new context object. This new context object must use an \"encoder context\", which\nis identified by the enum value `SPNG_CTX_ENCODER`.\n\nThe `save_png()` function shown below summarises all the necessary steps to save the\ngrayscale version of our image into a new PNG file in the filesystem. 
By default, this\nfunction will save the grayscale image into a file named `pedro_pascal_filter.png` in the CWD.\n\nNotice in this code example that we are using the same image header object (`image_header`) that we have\ncollected previously with the `get_image_header()` function. Remember, this image header object\nis a C struct (`spng_ihdr`) that contains basic information about our PNG file, especially\nthe dimensions of the image, the color model used, etc.\n\nIf we wanted to save a very different image in this new PNG file, e.g. an image\nwith different dimensions, or, an image that uses a different color model, a different bit depth, etc.\nwe would need to create a new image header (`spng_ihdr`) object that describes the properties\nof this new image.\n\nBut we are essentially saving the same image that we began with here (the dimensions of\nthe image, the color model, etc. are all still the same). The only difference\nbetween the two images is the colors of the pixels, which are now shades of gray.\nAs a consequence, we can safely use the exact same image header data\nin this new PNG file.\n\n\n\n```zig\nfn save_png(image_header: *png.spng_ihdr, buffer: []u8) !void {\n const path = \"pedro_pascal_filter.png\";\n const file_descriptor = c.fopen(path.ptr, \"wb\");\n if (file_descriptor == null) {\n return error.CouldNotOpenFile;\n }\n const ctx = (\n png.spng_ctx_new(png.SPNG_CTX_ENCODER)\n orelse unreachable\n );\n defer png.spng_ctx_free(ctx);\n _ = png.spng_set_png_file(ctx, @ptrCast(file_descriptor));\n _ = png.spng_set_ihdr(ctx, image_header);\n\n const encode_status = png.spng_encode_image(\n ctx,\n buffer.ptr,\n buffer.len,\n png.SPNG_FMT_PNG,\n png.SPNG_ENCODE_FINALIZE\n );\n if (encode_status != 0) {\n return error.CouldNotEncodeImage;\n }\n if (c.fclose(file_descriptor) != 0) {\n return error.CouldNotCloseFileDescriptor;\n }\n}\n\ntry save_png(&image_header, buffer[0..]);\n```\n\nAfter we execute this `save_png()` function, we should have a new 
PNG file\ninside our CWD, named `pedro_pascal_filter.png`. If we open this PNG file,\nwe will see the same image exposed at @fig-pascal-gray.\n\n\n## Building our project\n\nNow that we have written the code, let's discuss how can we build/compile this project.\nTo do that, I'm going to create a `build.zig` file in the root directory of our project,\nand start writing the necessary code to compile the project, using the knowledge\nthat we have acquired from @sec-build-system.\n\n\nWe first create the build target for our executable file, that executes our\nZig code. Let's suppose that all of our Zig code was written into a Zig module\nnamed `image_filter.zig`, then, the `exe` object described in the build script below\ndescribes the build target for our executable file.\n\nSince we have used C code from the `libspng` library in our Zig code,\nwe need to link our Zig code (which is in the `exe` build target) to both\nthe C Standard Library, and, to the `libspng` library. We do that, by calling\nthe `linkLibC()` and `linkSystemLibrary()` methods from our `exe` build target.\n\n```zig\nconst std = @import(\"std\");\npub fn build(b: *std.Build) void {\n const target = b.standardTargetOptions(.{});\n const optimize = b.standardOptimizeOption(.{});\n const exe = b.addExecutable(.{\n .name = \"image_filter\",\n .root_source_file = b.path(\"src/image_filter.zig\"),\n .target = target,\n .optimize = optimize,\n });\n exe.linkLibC();\n // Link to spng library:\n exe.linkSystemLibrary(\"spng\");\n b.installArtifact(exe);\n}\n```\n\nBecause we are using the `linkSystemLibrary()` method, it means that the library\nfiles for `libspng` are searched in your system to be linked with the `exe` build target.\nIf you have not yet built and installed the `libspng` library into your system, this\nlinkage step will likely not work. 
Because it will not find the library files.\n\n\nSo, just remember to install `libspng` in your system, if you want to build this project.\nHaving this build script above written, we can finally build our project by\nrunning the `zig build` command in the terminal.\n\n\n", - "supporting": [ - "13-image-filter_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/Chapters/14-threads/execute-results/html.json b/_freeze/Chapters/14-threads/execute-results/html.json new file mode 100644 index 00000000..1f378664 --- /dev/null +++ b/_freeze/Chapters/14-threads/execute-results/html.json @@ -0,0 +1,15 @@ +{ + "hash": "0b600f7714de797984e518f84475a873", + "result": { + "engine": "knitr", + "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Introducing threads and parallelism in Zig {#sec-thread}\n\nThreads are available in Zig through the `Thread` struct\nfrom the Zig Standard Library. This struct represents a kernel thread, and it follows a POSIX Thread pattern,\nmeaning that, it works similarly to a thread from the `pthread` C library, which is usually available on any distribution\nof the GNU C Compiler (`gcc`). If you are not familiar with threads, I will give you some theory behind them first.\n\n\n## What are threads? {#sec-what-thread}\n\nA thread is basically a separate context of execution.\nWe use threads to introduce parallelism into our program,\nwhich in most cases, makes the program run faster, because we have multiple tasks\nbeing performed at the same time, parallel to each other.\n\nPrograms are normally single-threaded by default. Which means that each program\nusually runs on a single thread, or, a single context of execution. When we have only one thread running, we have no\nparallelism. And when we don't have parallelism, the commands are executed sequentially, that is,\nonly one command is executed at a time, one after another. 
By creating multiple threads inside our program,\nwe start to execute multiple commands at the same time.\n\nPrograms that create multiple threads are very common in the wild. Because many different types\nof applications are well suited for parallelism. Good examples are video and photo-editing applications\n(e.g. Adobe Photoshop or DaVinci Resolve),\ngames (e.g. The Witcher 3), and also web browsers (e.g. Google Chrome, Firefox, Microsoft Edge, etc).\nFor example, in web browsers, threads are normally used to implement tabs.\nIn other words, the tabs in a web browser usually run as separate threads in the main process of\nthe web browser. That is, each new tab that you open in your web browser,\nusually runs on a separate thread of execution.\n\nBy running each tab in a separate thread, we allow all open tabs in the browser to run at the same time,\nand independently from each other. For example, you might have YouTube, or Spotify, currently opened in\na tab, and you are listening to some podcast in that tab, while, at the same time,\nyou are working in another tab, writing an essay on Google Docs. Even if you are not looking\ninto the YouTube tab, you can still hear the podcast only because this YouTube tab is running in parallel\nwith the other tab where Google Docs is running.\n\nWithout threads, the other alternative would be to run each tab as a completely separate running\nprocess in your computer. But that would be a bad choice, because just a few tabs would already consume\ntoo much power and resources from your computer. In other words, it is very expensive to create a completely new process,\ncompared to creating a new thread of execution. Also, the chances of you experiencing lag and overhead\nwhile using the browser would be significant. 
Threads are faster to create, and they also consume\nmuch, much less resources from the computer, especially because they share some resources\nwith the main process.\n\nTherefore, it is the use of threads in modern web browsers that allows you to hear the podcast\nat the same time while you are writing something on Google Docs.\nWithout threads, a web browser would probably be limited to just one single tab.\n\nThreads are also well-suited for anything that involves serving requests or orders.\nBecause serving a request takes time, and usually involves a lot of \"waiting time\".\nIn other words, we spend a lot of time idle, waiting for something to complete.\nFor example, consider a restaurant. Serving orders in a restaurant usually involves\nthe following steps:\n\n1. receive order from the client.\n1. pass the order to the kitchen, and wait for the food to be cooked.\n1. start cooking the food in the kitchen.\n1. when the food is fully cooked deliver this food to the client.\n\nIf you think about the steps above, you will notice that one big moment of waiting\nis present in this whole process, which is while the food is being prepared and cooked\ninside the kitchen. Because while the food is being prepped, both the waiter and the client\nare waiting for the food to be ready and delivered.\n\nIf we write a program to represent this restaurant, more specifically, a single-threaded program, then,\nthis program would be very inefficient. Because the program would stay idle, waiting for a considerable amount\nof time on the \"check if food is ready\" step.\nConsider the code snippet shown below that could potentially represent such\nprogram.\n\nThe problem with this program is the while loop. This program will spend a lot of time\nwaiting on the while loop, doing nothing more than just checking if the food is ready.\nThis is a waste of time. 
Instead of waiting for something to happen, the waiter\ncould just send the order to the kitchen, and just move on, and continue with receiving\nmore orders from other clients, and sending more orders to the kitchen, instead\nof doing nothing and waiting for the food to be ready.\n\n```zig\nconst order = Order.init(\"Pizza Margherita\", n = 1);\nconst waiter = Waiter.init();\nwaiter.receive_order(order);\nwaiter.ask_kitchen_to_cook();\nvar food_not_ready = true;\nwhile (food_not_ready) {\n food_not_ready = !waiter.is_food_ready();\n}\nconst food = waiter.get_food_from_kitchen();\nwaiter.send_food_to_client(food);\n```\n\nThis is why threads would be a great fit for this program. We could use threads\nto free the waiters from their \"waiting duties\", so they can go on with their\nother tasks, and receive more orders. Take a look at the next example, where I have re-written the above\nprogram into a different program that uses threads to cook and deliver the orders.\n\nYou can see in this program that when a waiter receives a new order\nfrom a client, this waiter executes the `send_order()` function.\nThe only thing that this function does is: it creates a new thread\nand detaches it. 
Since creating a thread is a very fast operation,\nthis `send_order()` function returns almost immediately,\nso the waiter spends almost no time worrying about the order, and just\nmoves on and tries to get the next order from the clients.\n\nInside the new thread created, the order gets cooked by a chef, and when the\nfood is ready, it is delivered to the client's table.\n\n\n```zig\nfn cook_and_deliver_order(order: Order) void {\n const chef = Chef.init();\n const food = chef.cook(order);\n chef.deliver_food(food);\n}\nfn send_order(order: Order) !void {\n // Pass the order by value, so the detached thread\n // never holds a pointer to a local that is gone.\n const cook_thread = try Thread.spawn(\n .{}, cook_and_deliver_order, .{order}\n );\n cook_thread.detach();\n}\n\nconst waiter = Waiter.init();\nwhile (true) {\n const order = waiter.get_new_order();\n if (order) |new_order| {\n try send_order(new_order);\n }\n}\n```\n\n\n\n## Threads versus processes\n\nWhen we run a program, this program is executed as a *process* in the operating system.\nThis is a one to one relationship, each program or application that you execute\nis a separate process in the operating system. But each program, or each process,\ncan create and contain multiple threads inside of it. Therefore,\nprocesses and threads have a one to many relationship.\n\nThis also means that every thread that we create is always associated with a particular process in our computer.\nIn other words, a thread is always a subset (or a child) of an existing process.\nAll threads share some of the resources associated with the process from which they were created.\nAnd because threads share resources with the process, they are very good for making communication\nbetween tasks easier.\n\nFor example, suppose that you were developing a big and complex application\nthat would be much simpler if you could split it in two, and make these two separate pieces talk\nwith each other. 
Some programmers opt to effectively write these two pieces of the codebase as two\ncompletely separate programs, and then, they use IPC (*inter-process communication*) to make these\ntwo separate programs/processes talk to each other, and make them work together.\n\nHowever, some programmers find IPC hard to deal with, and, as a consequence,\nthey prefer to write one piece of the codebase as the \"main part of the program\",\nor, as the part of the code that runs as the process in the operating system,\nwhile the other piece of the codebase is written as a task to be executed in\na new thread. A process and a thread can easily communicate with each other\nthrough both control flow, and also, through data, because they share and have\naccess to the same standard file descriptors (`stdout`, `stdin`, `stderr`) and also to the same memory space\non the heap and global data section.\n\n\nIn more detail, each thread that you create has a separate stack frame reserved just for that thread,\nwhich essentially means that each local object that you create inside this thread, is local to that\nthread, i.e. the other threads cannot see this local object. Unless this object that you have created\nis an object that lives on the heap. In other words, if the memory associated with this object\nis on the heap, then, the other threads can potentially access this object.\n\nTherefore, objects that are stored in the stack are local to the thread where they were created.\nBut objects that are stored on the heap are potentially accessible to other threads. 
All of this means that,\neach thread has its own separate stack frame, but, at the same time, all threads share\nthe same heap, the same standard file descriptors (which means that they share the same `stdout`, `stdin`, `stderr`),\nand the same global data section in the program.\n\n\n\n## Creating a thread\n\nWe create new threads in Zig, by first, importing the `Thread` struct into\nour current Zig module, and then, calling the `spawn()` method of this struct,\nwhich creates (or, \"spawns\") a new thread of execution from our current process.\nThis method has three arguments, which are, respectively:\n\n1. a `SpawnConfig` object, which contains configurations for the spawn process.\n1. the name of the function that is going to be executed (or, that is going to be \"called\") inside this new thread.\n1. a list of arguments (or inputs) to be passed to the function provided in the second argument.\n\nWith these three arguments, you can control how the thread gets created, and also, specify which\nwork (or \"tasks\") will be performed inside this new thread. A thread is just a separate context of execution,\nand we usually create new threads in our code, because we want to perform some work inside this\nnew context of execution. And we specify which exact work, or, which exact steps are going to be\nperformed inside this context, by providing the name of a function on the second argument of the `spawn()` method.\n\nThus, when this new thread gets created, this function that you provided as input to the `spawn()`\nmethod gets called, or, gets executed inside this new thread. 
You can control the\narguments, or, the inputs that are passed to this function when it gets called, by providing\na list of arguments (or a list of inputs) on the third argument of the `spawn()` method.\nThese arguments are passed to the function in the same order that they are\nprovided to `spawn()`.\n\nFurthermore, the `SpawnConfig` is a struct object with only two possible fields, or, two possible members, that you\ncan set to tailor the spawn behaviour. These fields are:\n\n- `stack_size`: you can provide a `usize` value to specify the size (in bytes) of the thread's stack frame. By default, this value is: $16 \\times 1024 \\times 1024$.\n- `allocator`: you can provide an allocator object to be used when allocating memory for the thread.\n\nTo use one of these two fields (or, \"configs\") you just have to create a new object of type `SpawnConfig`,\nand provide this object as input to the `spawn()` method. But, if you are not interested in using\none of these configs, and you are ok with using just the defaults, you can just provide an anonymous\nstruct literal (`.{}`) in the place of this `SpawnConfig` argument.\n\nAs our first, and very simple example, consider the code shown below.\nInside the same program, you can create multiple threads of execution if you want to.\nBut, in this first example, we are creating just a single thread of execution, because\nwe call `spawn()` only once.\n\nAlso, notice in this example that we are executing the function `do_some_work()`\ninside the new thread. 
Since this function receives no inputs, because it has\nno arguments, in this instance, we have passed an empty list, or, more precisely, an empty and anonymous struct (`.{}`)\nin the third argument of `spawn()`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nfn do_some_work() !void {\n _ = try stdout.write(\"Starting the work.\\n\");\n std.time.sleep(100 * std.time.ns_per_ms);\n _ = try stdout.write(\"Finishing the work.\\n\");\n}\n\npub fn main() !void {\n const thread = try Thread.spawn(.{}, do_some_work, .{});\n thread.join();\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nStarting the work.Finishing the work.\n```\n\n\n:::\n:::\n\n\n\n\nNotice the use of `try` when calling the `spawn()` method. This means\nthat this method can return an error in some circumstances. One circumstance\nin particular is when you attempt to create a new thread, when you have already\ncreated too many (i.e. you have exceeded the quota of concurrent threads in your system).\n\nBut, if the new thread is successfully created, the `spawn()` method returns a handler\nobject (which is just an object of type `Thread`) to this new thread. You can use\nthis handler object to effectively control all aspects of the thread.\n\nThe instant that you create the new thread, the function that you provided as input to `spawn()`\ngets invoked (i.e. 
gets called) to start the execution on this new thread.\nIn other words, every time you call `spawn()`, not only a new thread gets created,\nbut also, the \"start work button\" of this thread gets automatically pressed.\nSo the work being performed in this thread starts at the moment that the thread is created.\nThis is similar to how `pthread_create()` from the `pthreads` library in C works,\nwhich also starts the execution at the moment that the thread gets created.\n\n\n## Returning from a thread\n\nWe have learned on the previous section that the execution of the thread starts at the moment\nthat the thread gets created. Now, we will learn how to \"join\" or \"detach\" a thread in Zig.\n\"Join\" and \"detach\" are operations that control how the thread returns to\nthe main thread, or, to the main process in our program.\n\nWe perform these operations by using the methods `join()` and `detach()` from the thread handler object.\nEvery thread that you create can be marked as either *joinable* or *detached* [@linux_pthread_create].\nYou can turn a thread into a *detached* thread by calling the `detach()` method\nfrom the thread handler object. But if you call the `join()` method instead, then, this thread\nbecomes a *joinable* thread.\n\nA thread cannot be both *joinable* and *detached*. Which in general means\nthat you cannot call both `join()` and `detach()` on the same thread.\nBut a thread must be one of the two, meaning that, you should always call\neither `join()` or `detach()` over a thread. If you don't call\none of these two methods over your thread, you introduce undefined behaviour into your program,\nwhich is described at @sec-not-call-join-detach.\n\nNow, let's describe what each of these two methods do to your thread.\n\n\n### Joining a thread\n\nWhen you join a thread, you are essentially saying: \"Hey! Could you please wait for the thread to finish,\nbefore you continue with your execution?\". 
For example, if we come back to our first and simplest example\nof a thread in Zig, in that example we have created a single thread inside the `main()` function of our program,\nand just called `join()` over this thread at the end. This section of the code example is reproduced below.\n\nBecause we are joining this new thread inside the `main()`'s scope, it means that the\nexecution of the `main()` function is temporarily stopped, to wait for the execution of the thread\nto finish. That is, the execution of `main()` stops temporarily at the line where `join()` gets called,\nand it will continue only after the thread has finished its tasks.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn main() !void {\n const thread = try Thread.spawn(.{}, do_some_work, .{});\n thread.join();\n}\n```\n:::\n\n\n\n\nBecause we have joined this new thread inside `main()`, by calling `join()`, we have a\nguarantee that this new thread will finish before the end of the execution of `main()`.\nBecause it is guaranteed that `main()` will wait for the thread to finish its tasks.\nYou could also interpret this as: the execution of main will hang at\nthe line where `join()` is called, and the next lines of code that come after\nthis `join()` call, will be executed solely after the execution of main\nis \"unlocked\" after the thread finishes its tasks.\n\nIn the example above, there are no more expressions after the `join()` call. We just have the end\nof the `main()`'s scope, and, therefore after the thread finishes its tasks, the execution\nof our program just ends, since there is nothing more to do. But what if we had more stuff to do\nafter the join call?\n\nTo demonstrate this other possibility, consider the next example shown\nbelow. Here, we create a `print_id()` function, that just receives an id\nas input, and prints it to `stdout`. In this example, we are creating two\nnew threads, one after another. 
Then, we join the first thread, then,\nwe wait for two whole seconds, then, at last, we join the second thread.\n\nThe idea behind this example is that the last `join()` call is executed\nonly after the first thread finishes its task (i.e. the first `join()` call),\nand also, after the two seconds of delay. If you compile and run this\nexample, you will notice that most messages are quickly printed to `stdout`,\ni.e. they appear almost instantly on your screen.\nHowever, the last message (\"Joining thread 2\") takes around 2 seconds to appear\non the screen.\n\n\n```zig\nfn print_id(id: *const u8) !void {\n try stdout.print(\"Thread ID: {d}\\n\", .{id.*});\n}\n\npub fn main() !void {\n const id1: u8 = 1;\n const id2: u8 = 2;\n const thread1 = try Thread.spawn(.{}, print_id, .{&id1});\n const thread2 = try Thread.spawn(.{}, print_id, .{&id2});\n\n _ = try stdout.write(\"Joining thread 1\\n\");\n thread1.join();\n std.time.sleep(2 * std.time.ns_per_s);\n _ = try stdout.write(\"Joining thread 2\\n\");\n thread2.join();\n}\n```\n\n```\nThread ID: Joining thread 1\n1\nThread ID: 2\nJoining thread 2\n```\n\nThis demonstrates that both threads finish their work (i.e. printing the IDs)\nvery fast, before the two seconds of delay end. Because of that, the last `join()` call\nreturns pretty much instantly. Because when this last `join()` call happens, the second\nthread has already finished its task.\n\nNow, if you compile and run this example, you will also notice that, in some cases,\nthe messages get intertwined with each other. In other words, you might see\nthe message \"Joining thread 1\" inserted in the middle of the message \"Thread ID: 1\",\nor vice-versa. This happens because:\n\n- the threads are executing basically at the same time as the main process of the program (i.e. 
the `main()` function).\n- the threads share the same `stdout` from the main process of the program, which means that the messages that the threads produce are sent to exact same place as the messages produced by the main process.\n\nBoth of these points were described previously at @sec-what-thread.\nSo the messages might get intertwined because they are being produced and\nsent to the same `stdout` roughly at the same time.\nAnyway, when you call `join()` over a thread, the current process will wait\nfor the thread to finish before it continues, and, when the thread does finish its\ntask, the resources associated with this thread are automatically freed, and,\nthe current process continues with its execution.\n\n\n### Detaching a thread\n\nWhen you detach a thread, by calling the `detach()` method, the thread is marked as *detached*.\nWhen a *detached* thread terminates, its resources are automatically released back to the system without\nthe need for another thread to join with this terminated thread.\n\nIn other words, when you call `detach()` over a thread is like when your children become adults,\ni.e. they become independent from you. A detached thread frees itself, and it does not need to report the results back\nto you, when the thread finishes its task. Thus, you normally mark a thread as *detached*\nwhen you don't need to use the return value of the thread, or, when you don't care about\nwhen exactly the thread finishes its job, i.e. the thread solves everything by itself.\n\nTake the code example below. We create a new thread, detach it, and then, we just\nprint a final message before we end our program. 
We use the same `print_id()`\nfunction that we have used in the previous examples.\n\n\n```zig\nfn print_id(id: *const u8) !void {\n try stdout.print(\"Thread ID: {d}\\n\", .{id.*});\n}\n\npub fn main() !void {\n const id1: u8 = 1;\n const thread1 = try Thread.spawn(.{}, print_id, .{&id1});\n thread1.detach();\n _ = try stdout.write(\"Finish main\\n\");\n}\n```\n\n```\nFinish main\n```\n\nNow, if you look closely at the output of this code example, you will notice\nthat only the final message in main was printed to the console. The message\nthat was supposed to be printed by `print_id()` did not appear in the console.\nWhy? It is because the main process of our program has finished first,\nbefore the thread was able to say anything.\n\nAnd that is perfectly ok behaviour, because the thread was detached, so, it was\nable to free itself, without the need of the main process.\nIf you ask main to sleep (or \"wait\") for some extra nanoseconds, before it ends, you will likely\nsee the message printed by `print_id()`, because you give enough time for the thread to\nfinish before the main process ends.\n\n\n## Thread pools\n\nThe thread pool is a very popular programming pattern, which is used especially in servers and daemon processes. A thread pool is just a\nset of threads, or, a \"pool\" of threads. 
Many programmers like to use this pattern, because it makes it\neasier to manage and use multiple threads, instead of manually creating the threads when you need them.\n\nAlso, using thread pools might increase the performance of your program,\nespecially if your program is constantly creating threads to perform short-lived tasks.\nIn such cases, a thread pool might improve performance because you do not have to\ncreate and destroy threads all the time, so you avoid a lot of the overhead involved\nin this constant process of creating and destroying threads.\n\nThe main idea behind a thread pool is to have a set of threads already created and ready to perform\ntasks at all times. You create a set of threads at the moment that your program starts, and keep\nthese threads alive while your program runs. Each of these threads will be either performing a task, or,\nwaiting for a task to be assigned.\nEvery time a new task emerges in your program, this task is added to a \"queue of tasks\".\nThe moment that a thread becomes available and ready to perform a new task,\nthis thread takes the next task in the \"queue of tasks\", then,\nit simply performs the task.\n\nThe Zig Standard Library offers a thread pool implementation in the `std.Thread.Pool` struct.\nYou create a new instance of a `Pool` object by providing a `Pool.Options` object\nas input to the `init()` method of this struct. A `Pool.Options` object is a struct object that contains\nconfigurations for the pool of threads. The most important settings in this struct object are\nthe members `n_jobs` and `allocator`. 
As the name suggests, the member `allocator` should receive an allocator object,\nwhile the member `n_jobs` specifies the number of threads to be created and maintained in this pool.\n\nConsider the example below, which demonstrates how we can create a new thread pool object.\nHere, we create a `Pool.Options` object that contains\na general purpose allocator object, and also, the `n_jobs` member was set to 4, which\nmeans that the thread pool will create and use 4 threads.\n\nAlso notice that the `pool` object was initially set to `undefined`. This allows us\nto initially declare the thread pool object, but not properly instantiate the\nunderlying memory of the object. You have to initially declare your thread pool object\nby using `undefined` like this, because the `init()` method of `Pool` needs\nto have an initial pointer to properly instantiate the object.\n\nSo, just\nremember to create your thread pool object by using `undefined`, and then,\nafter that, you call the `init()` method over the object.\nYou should also not forget to call the `deinit()` method over the thread pool\nobject, once you are done with it, to release the resources allocated for the thread pool. Otherwise, you will\nhave a memory leak in your program.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Pool = std.Thread.Pool;\npub fn main() !void {\n    var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n    const allocator = gpa.allocator();\n    const opt = Pool.Options{\n        .n_jobs = 4,\n        .allocator = allocator,\n    };\n    var pool: Pool = undefined;\n    _ = try pool.init(opt);\n    defer pool.deinit();\n}\n```\n:::\n\n\n\n\nNow that we know how to create `Pool` objects, we have\nto understand how to assign tasks to be executed by the threads in this pool object.\nTo assign a task to be performed by a thread, we need to call the `spawn()` method\nfrom the thread pool object.\n\nThis `spawn()` method works almost identically to the `spawn()` method from the\n`Thread` object. 
The method takes almost the same arguments as the previous one;\nthe difference is that we don't have to provide a `SpawnConfig` object in this case.\nBut instead of creating a new thread, this `spawn()` method from\nthe thread pool object just registers a new task in the internal \"queue of tasks\" to be performed,\nand any available thread in the pool will get this task, and it will simply perform the task.\n\nIn the example below, we are using our previous `print_id()` function once again.\nBut you may notice that the `print_id()` function is a little different this time,\nbecause now we are using `catch` instead of `try` in the `print()` call.\nCurrently, the `Pool` struct only supports functions that don't return errors\nas tasks. Thus, when assigning tasks to threads in a thread pool, it is essential to use functions\nthat don't return errors. That is why we are using `catch` here, so that the\n`print_id()` function doesn't return an error.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn print_id(id: *const u8) void {\n    // Discard any write error, because pool tasks cannot return errors.\n    stdout.print(\"Thread ID: {d}\\n\", .{id.*}) catch {};\n}\nconst id1: u8 = 1;\nconst id2: u8 = 2;\ntry pool.spawn(print_id, .{&id1});\ntry pool.spawn(print_id, .{&id2});\n```\n:::\n\n\n\n\nThis limitation should probably not exist, and, in fact, it is already on the radar of the\nZig team to fix this issue, and it is being tracked on an [open issue](https://github.com/ziglang/zig/issues/18810)[^issue].\nSo, if you do need to provide a function that might return an error as the task\nto be performed by the threads in the thread pool, then, you are limited to either:\n\n- implementing your own thread pool that does not have this limitation.\n- waiting for the Zig team to actually fix this issue.\n\n[^issue]: \n\n\n\n\n## Mutexes\n\nMutexes are a classic component of every thread library. In essence, a mutex is a *Mutually Exclusive Flag*, and this flag\nacts like a type of \"lock\", or as a gatekeeper to a particular section of your code. 
Mutexes are related to thread synchronization,\nmore specifically, they prevent you from having some classic race conditions in your program,\nand, therefore, major bugs and undefined behaviour that are usually difficult to track and understand.\n\nThe main idea behind a mutex is to help us to control the execution of a particular section of the code, and to\nprevent two or more threads from executing this particular section of the code at the same time.\nMany programmers like to compare a mutex to a bathroom door (which usually has a lock).\nWhen a thread locks a mutex object, it is as if the bathroom door was locked,\nand, therefore, the other people (in this case, the other threads) that want to use the same bathroom at the same time\nhave to be patient, and simply wait for the other person (or the other thread) to unlock the door and get out of the bathroom.\n\nSome other programmers also like to explain mutexes by using the analogy of \"each person will have their turn to speak\".\nThis is the analogy used on the [*Multithreading Code* video from the Computerphile project](https://www.youtube.com/watch?v=7ENFeb-J75k&ab_channel=Computerphile)[^computerphile].\nImagine\nthat you are in a conversation circle. There is a moderator in this circle, who is the person that decides who\nhas the right to speak at that particular moment. The moderator gives a green card (or some sort of an authorization card) to the person that\nis going to speak, and, as a result, everyone else must be silent and listen to the person that has the green card.\nWhen the person finishes talking, they give the green card back to the moderator, and the moderator decides\nwho is going to talk next, and delivers the green card to that person. And the cycle goes on like this.\n\n[^computerphile]: \n\n\nA mutex acts like the moderator in this conversation circle. 
The mutex authorizes one single thread to execute a specific section of the code,\nand it also blocks the other threads from executing this same section of the code. If these other threads want to execute this same\npiece of the code, they are forced to wait for the authorized thread to finish first.\nWhen the authorized thread finishes executing this code, the mutex authorizes the next thread to execute this code,\nand the other threads are still blocked. Therefore, a mutex is like a moderator that does a \"each thread will have their turn to execute this section of the code\"\ntype of control.\n\n\nMutexes are especially used to prevent data race problems from happening. A data race problem happens when two or more threads\naccess the same shared object at the same time, and at least one of them is writing to it.\nSo, when you have an object that is shared with all threads, and, you want to prevent two or more threads from\naccessing this same object at the same time, you can use a mutex to lock the part of the code that accesses this specific object.\nWhen a thread tries to run this code that is locked by a mutex, this thread stops its execution, and patiently waits for this section of the codebase to be\nunlocked to continue.\n\nIn other words, the execution of the thread is paused while the code section\nis locked by the mutex, and it is unpaused the moment that the code section is unlocked by the other thread that\nwas executing this code section.\nNotice that mutexes are normally used to lock areas of the codebase that access/modify data that is **shared** with all threads,\ni.e. 
objects that are either stored in the global data section, or, in the heap space of your program.\nSo mutexes are not normally used on areas of the codebase that access/modify objects that are local to the thread.\n\n\n\n### Critical section {#sec-critical-section}\n\nCritical section is a concept commonly associated with mutexes and thread synchronization.\nIn essence, a critical section is the section of the program where a thread accesses or modifies a shared resource\n(i.e. an object, a file descriptor, something that all threads have access to). In other words,\na critical section is the section of the program where race conditions might happen, and, therefore,\nwhere undefined behaviour can be introduced into the program.\n\nWhen we use mutexes in our program, the critical section defines the area of the codebase that we want to lock.\nSo we normally lock the mutex object at the beginning of the critical section,\nand then, we unlock it at the end of the critical section.\nThe two points below come from the \"Critical Section\" article from GeeksforGeeks,\nand they summarise well the role that a critical section plays in the thread synchronization problem [@geeks_critical_section].\n\n\n1. The critical section must be executed as an atomic operation, which means that once one thread or process has entered the critical section, all other threads or processes must wait until the executing thread or process exits the critical section. The purpose of synchronization mechanisms is to ensure that only one thread or process can execute the critical section at a time.\n2. The concept of a critical section is central to synchronization in computer systems, as it is necessary to ensure that multiple threads or processes can execute concurrently without interfering with each other. 
Various synchronization mechanisms such as semaphores, mutexes, monitors, and condition variables are used to implement critical sections and ensure that shared resources are accessed in a mutually exclusive manner.\n\n\n### Atomic operations {#sec-atomic-operation}\n\nYou will also see the term \"atomic operation\" a lot when reading about threads, race conditions and mutexes.\nIn summary, an operation is categorized as \"atomic\" when a context switch cannot happen in\nthe middle of this operation. In other words, this operation is always done from beginning to end, without interruptions\nfrom another process or operation in the middle of its execution phase.\n\nNot many operations today are atomic. But why do atomic operations matter here? Because data races\n(which are a type of race condition) cannot happen on operations that are atomic.\nSo if a particular line in your code performs an atomic operation, then, this line will never\nsuffer from a data race problem. Therefore, programmers sometimes use an atomic operation\nto protect themselves from data race problems in their code.\n\nWhen you have an operation that is compiled into just one single assembly instruction, this operation might be atomic,\nbecause it is just one assembly instruction. But this is not guaranteed. This is usually true for old CPU architectures (such as `x86`). But nowadays, most\nassembly instructions in modern CPU architectures turn into multiple micro-tasks, which inherently makes the operation not atomic anymore,\neven though it has just one single assembly instruction.\n\nThe Zig Standard Library offers some atomic functionality in the `std.atomic` module.\nIn this module, you will find a public and generic function called `Value()`. 
With this function we create an \"atomic object\", which is\na value that contains some native atomic operations, most notably, a `load()` and a `fetchAdd()` operation.\nIf you have experience with multithreading in C++, you probably have recognized this pattern. So yes, this generic\n\"atomic object\" in Zig is essentially identical to the template struct `std::atomic` from the C++ Standard Library.\nIt is important to emphasize that only primitive data types (i.e. the types presented at @sec-primitive-data-types)\nare supported by these atomic operations.\n\n\n\n\n\n### Data races and race conditions\n\nTo understand why mutexes are used, we need to understand better the problem that they seek\nto solve, which can be summarized as data race problems. A data race problem is a type of race condition,\nwhich happens when one thread is accessing a particular memory location (i.e. a particular shared object) at the same\ntime that another thread is trying to write/save new data into this same memory location (i.e. the same shared object).\n\nWe can simply define a race condition as any type of bug in your program that is based\non a \"who gets there first\" problem. 
A data race problem is a type of race condition, because it occurs when two or more parties\nare trying to read and write into the same memory location at the same time, and, therefore, the end result of this operation\ndepends completely on who gets to this memory location first.\nAs a consequence, a program that has a data race problem will likely produce a different result each time that we execute it.\n\nThus, race conditions produce undefined behaviour and unpredictability because the program produces\na different answer each time that a different party gets to the target location before the others.\nAnd we have no easy way to either predict or control who is going to get to this target location first.\nIn other words, in each execution of your program,\nyou get a different answer, because a different person, or, a different function, or, a different part of the code is finishing\nits tasks (or it is reaching a location) before the others.\n\nAs an example, consider the code snippet below. In this example, we create a global counter\nvariable, and we also create an `increment()` function, whose job is to just increment this global counter\nvariable in a for loop.\n\nSince the for loop iterates one hundred thousand times, and, we create two separate threads\nin this code example, what number do you expect to see in the final message printed to `stdout`?\nThe answer should be two hundred thousand. Right? Well, in theory, this program was supposed\nto print two hundred thousand at the end, but in practice, every time that I execute this program\nI get a different answer.\n\nIn the example below, you can see that this time we have executed the program, the end\nresult was 117254, instead of the expected 200000. The second time I have executed this program,\nI got the number 108592 as result. 
So the end result of this program is varying, but it never gets\nto the expected 200000 that we want.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// Global counter variable\nvar counter: usize = 0;\n// Function to increment the counter\nfn increment() void {\n    for (0..100000) |_| {\n        counter += 1;\n    }\n}\n\npub fn main() !void {\n    const thr1 = try Thread.spawn(.{}, increment, .{});\n    const thr2 = try Thread.spawn(.{}, increment, .{});\n    thr1.join();\n    thr2.join();\n    try stdout.print(\"Counter value: {d}\\n\", .{counter});\n}\n```\n:::\n\n\n\n\n```\nCounter value: 117254\n```\n\n\nWhy is this happening? The answer is: because this program contains a data race problem.\nThis program would print the correct number 200000, if, and only if the first thread finishes\nits tasks before the second thread starts to execute. But that is very unlikely to happen.\nBecause the process of creating the thread is too fast, and therefore, both threads start to execute roughly\nat the same time. If you change this code to add some nanoseconds of sleep between the first and the second calls to `spawn()`,\nyou will increase the chances of the program producing the \"correct result\".\n\nSo the data race problem happens, because both threads are reading and writing to the same\nmemory location at roughly the same time. In this example, each thread is essentially performing\nthree basic operations at each iteration of the for loop, which are:\n\n1. reading the current value of `counter`.\n1. incrementing this value by 1.\n1. writing the result back into `counter`.\n\nIdeally, a thread B should read the value of `counter`, only after the other thread A has finished\nwriting the incremented value back into the `counter` object. Therefore, in the ideal scenario, which is demonstrated\nat @tbl-data-race-ideal, the threads should work in sync with each other. 
But the reality is that these\nthreads are out of sync, and because of that, they suffer from a data race problem, which is demonstrated\nat @tbl-data-race-not.\n\nNotice that, in the data race scenario (@tbl-data-race-not), the read performed by thread B (Thread 2 in the table) happens\nbefore the write operation of thread A (Thread 1), and that ultimately leads to wrong results at the end of the program.\nBecause when thread B reads the value from the `counter` variable, thread A is still processing\nthe initial value from `counter`, and it did not write the new and incremented value into `counter` yet. So what\nhappens is that thread B ends up reading the same initial value (the \"old\" value) from `counter`, instead of\nreading the new and incremented version of this value that would be calculated by thread A.\n\n\n::: {#tbl-data-race-ideal}\n\n| Thread 1 | Thread 2 | Integer value |\n|-------------|-------------|---------------|\n| read value | | 0 |\n| increment | | 1 |\n| write value | | 1 |\n| | read value | 1 |\n| | increment | 2 |\n| | write value | 2 |\n\n: An ideal scenario for two threads incrementing the same integer value\n:::\n\n::: {#tbl-data-race-not}\n\n| Thread 1 | Thread 2 | Integer value |\n|-------------|-------------|---------------|\n| read value | | 0 |\n| | read value | 0 |\n| increment | | 1 |\n| | increment | 1 |\n| write value | | 1 |\n| | write value | 1 |\n\n: A data race scenario when two threads are incrementing the same integer value\n:::\n\n\nIf you think about these two tables, you will notice that they relate back to our discussion of atomic operations\nat @sec-atomic-operation. Remember, atomic operations are operations that the CPU executes\nfrom beginning to end, without interruptions from other threads or processes. 
So,\nthe scenario shown at @tbl-data-race-ideal does not suffer from a data race, because\nthe operations performed by thread A are not interrupted in the middle by the operations\nfrom thread B.\n\nIf we also think about the discussion of critical section from @sec-critical-section, we can identify\nthe section that represents the critical section of the program, which is the section that is vulnerable\nto data race conditions. In this example, the critical section of the program is the line where we increment\nthe `counter` variable (`counter += 1`). So, ideally, we want to use a mutex, and lock right before this line, and then,\nunlock right after this line.\n\n\n\n\n### Using mutexes in Zig\n\nNow that we know the problem that mutexes seek to solve, we can learn how to use them in Zig.\nMutexes in Zig are available through the `std.Thread.Mutex` struct from the Zig Standard Library.\nIf we take the same code from the previous example, and improve it with mutexes, to solve\nour data race problem, we get the code example below.\n\nNotice that this time we had to alter the `increment()` function to receive a pointer to\nthe `Mutex` object as input. 
All that we need to do, to make this program safe against\ndata race problems, is to call the `lock()` method at the beginning of\nthe critical section, and then, call `unlock()` at the end of the critical section.\nNotice that the output of this program is now the correct number of 200000.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nconst Mutex = std.Thread.Mutex;\nvar counter: usize = 0;\nfn increment(mutex: *Mutex) void {\n    for (0..100000) |_| {\n        mutex.lock();\n        counter += 1;\n        mutex.unlock();\n    }\n}\n\npub fn main() !void {\n    var mutex: Mutex = .{};\n    const thr1 = try Thread.spawn(.{}, increment, .{&mutex});\n    const thr2 = try Thread.spawn(.{}, increment, .{&mutex});\n    thr1.join();\n    thr2.join();\n    try stdout.print(\"Counter value: {d}\\n\", .{counter});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCounter value: 200000\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n## Read/Write locks\n\nMutexes are normally used when it is never safe to have two or more threads running the same\npiece of code at the same time. In contrast, read/write locks are normally used in situations\nwhere you have a mixture of scenarios, i.e. there are some pieces of the codebase that are safe to run in parallel, and other pieces that\nare not safe.\n\nFor example, suppose that you have multiple threads that use the same shared file in the filesystem to store some configurations, or,\nstatistics. If two or more threads try to read the data from this same file at the same time, nothing bad happens.\nSo this part of the codebase is perfectly safe to be executed in parallel, with multiple threads reading the same file at the same time.\n\nHowever, if two or more threads try to write data into this same file at the same time, then, we cause some race condition\nproblems. 
So this other part of the codebase is not safe to be executed in parallel.\nMore specifically, a thread might end up writing data in the middle of the data written by the other thread.\nThis process of two or more threads writing to the same location might lead to data corruption.\nThis specific situation is usually called a *torn write*.\n\nThus, what we can extract from this is that there are certain types of operations that cause a race condition, but there\nare also other types of operations that do not cause a race condition problem.\nYou could also say that there are types of operations that are susceptible to race condition problems,\nand there are other types of operations that are not.\n\nA read/write lock is a type of lock that acknowledges the existence of this specific scenario, and you can\nuse this type of lock to control which parts of the codebase are safe to run in parallel, and which parts are not safe.\n\n\n\n### Exclusive lock vs shared lock\n\nTherefore, a read/write lock is a little different from a mutex. Because a mutex is always an *exclusive lock*, meaning that only\none thread is allowed to execute the locked section at any given time. With an exclusive lock, the other threads are always \"excluded\",\ni.e. they are always blocked from executing.\nBut in a read/write lock, the other threads might be authorized to run at the same time, depending on the type of lock that they acquire.\n\nWe have two types of locks in a read/write lock, which are: an exclusive lock and a shared lock. An exclusive lock works exactly the same\nas a mutex, while a shared lock is a lock that does not block other threads that also hold a shared lock from running.\nIn the `pthreads` C library, read/write locks are available through the `pthread_rwlock_t` C struct. With\nthis C struct, you can create a \"write lock\", which corresponds to an exclusive lock, or, you can create a \"read lock\",\nwhich corresponds to a shared lock. 
The terminology might be a little different, but the meaning is the same,\nso just remember this relationship: write locks are exclusive locks, while read locks are shared locks.\n\nWhen a thread tries to acquire a read lock (i.e. a shared lock), this thread gets the shared lock\nif, and only if another thread does not currently hold a write lock (i.e. an exclusive lock), and also,\nif there are no other threads that are already in the queue,\nwaiting for their turn to acquire a write lock. In other words, the thread in the queue has attempted\nto get a write lock earlier, but this thread was blocked\nbecause there was another thread running that already had a write lock. As a consequence, this thread is on the queue to get a write lock,\nand it's currently waiting for the other thread with a write lock to finish its execution.\n\nWhen a thread tries to acquire a read lock, but it fails in acquiring this read lock, either because there is\na thread with a write lock already running, or, because there is a thread in the queue to get a write lock,\nthe execution of this thread is instantly blocked, i.e. paused. This thread will indefinitely attempt to get the\nread lock, and its execution will be unblocked (or unpaused) only after this thread successfully acquires the read lock.\n\nIf you think deeply about this dynamic between read locks versus write locks, you might notice that a read lock is basically a safety mechanism.\nMore specifically, it is a way for us to\nallow a particular thread to run together with the other threads, only when it's safe to. In other words, if there is currently\na thread with a write lock running, then, it is very likely not safe for the thread that is trying to acquire the read lock to run now.\nAs a consequence, the read lock protects this thread from running into dangerous waters, and patiently waits for the\n\"write lock\" thread to finish its tasks before it continues.\n\nOn the other hand, if there are only \"read lock\" (i.e. 
\"shared lock\") threads currently running\n(i.e. not a single \"write lock\" thread currently exists), then,\nis perfectly safe for this thread that is acquiring the read lock to run in parallel with the other\nthreads. As a result, the read lock just\nallows for this thread to run together with the other threads.\n\nThus, by using read locks (shared locks) in conjunction with write locks (exclusive locks), we can control which regions or sections\nof our multithreaded code is safe for us to have parallelism, and which sections are not safe to have parallelism.\n\n\n\n\n\n### Using read/write locks in Zig\n\nThe Zig Standard Library supports read/write locks through the `std.Thread.RwLock` module.\nIf you want to a particular thread to acquire a shared lock (i.e. a read lock), you should\ncall the `lockShared()` method from the `RwLock` object. But, if you want for this thread\nto acquire an exclusive lock (i.e. a write lock) instead, then, you should call the\n`lock()` method from the `RwLock` object.\n\nAs with mutexes, we also have to unlock the shared or exclusive locks that we acquire through a read/write lock object,\nonce we are at the end of our \"critical section\". If you have acquired an exclusive lock, then, you unlock\nthis exclusive lock by calling the `unlock()` method from the read/write lock object. 
In contrast,\nif you have acquired a shared lock instead, then, call `unlockShared()` to unlock this shared lock.\n\nAs a simple example, the code below creates three separate threads responsible for reading the\ncurrent value in a `counter` object, and it also creates another thread, responsible for writing\nnew data into the `counter` object (incrementing it, more specifically).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar counter: u32 = 0;\nfn reader(lock: *RwLock) !void {\n    while (true) {\n        lock.lockShared();\n        const v: u32 = counter;\n        try stdout.print(\"{d}\", .{v});\n        lock.unlockShared();\n        std.time.sleep(2 * std.time.ns_per_s);\n    }\n}\nfn writer(lock: *RwLock) void {\n    while (true) {\n        lock.lock();\n        counter += 1;\n        lock.unlock();\n        std.time.sleep(2 * std.time.ns_per_s);\n    }\n}\n\npub fn main() !void {\n    var lock: RwLock = .{};\n    const thr1 = try Thread.spawn(.{}, reader, .{&lock});\n    const thr2 = try Thread.spawn(.{}, reader, .{&lock});\n    const thr3 = try Thread.spawn(.{}, reader, .{&lock});\n    const wthread = try Thread.spawn(.{}, writer, .{&lock});\n\n    thr1.join();\n    thr2.join();\n    thr3.join();\n    wthread.join();\n}\n```\n:::\n\n\n\n\n\n## Yielding a thread\n\nThe `Thread` struct supports yielding through the `yield()` function, which acts on the thread that calls it.\nYielding a thread means that the execution of the thread is temporarily stopped,\nand the thread goes back to the end of the priority queue of the scheduler from\nyour operating system.\n\nThat is, when you yield a thread, you are essentially saying the following to your OS:\n\"Hey! 
Could you please stop executing this thread for now, and come back to continue it later?\".\nYou could also interpret this yield operation as: \"Could you please deprioritize this thread,\nto focus on doing other things instead?\".\nSo this yield operation is also a way for you\nto stop a particular thread, so that you can work and prioritize other threads instead.\n\nIt is important to say that yielding a thread is a \"not so common\" thread operation these days.\nIn other words, not many programmers use yielding in production, simply because it is hard to use\nthis operation and make it work properly, and also, there\nare better alternatives. Most programmers prefer to use `join()` instead.\nIn fact, most of the time, when you see somebody using yield in some code example, they are mostly using it to help them\ndebug race conditions in their applications. That is, yield is mostly used as a debug tool nowadays.\n\nAnyway, if you want the current thread to yield, call `std.Thread.yield()` from inside that thread.\nThis function takes no arguments and can return an error, so we use `try`, like this:\n\n```zig\ntry std.Thread.yield();\n```\n\n\n\n\n\n\n## Common problems in threads\n\n\n\n### Deadlocks\n\nA deadlock occurs when two or more threads are blocked forever,\nwaiting for each other to release a resource. This usually happens when multiple locks are involved,\nand the order of acquiring them is not well managed.\n\nThe code example below demonstrates a deadlock situation. We have two different threads that execute\ntwo different functions (`work1()` and `work2()`) in this example. And we also have two separate\nmutexes. If you compile and run this code example, you will notice that the program just runs indefinitely,\nwithout ending.\n\nWhen we look into the first thread, which executes the `work1()` function, we can\nnotice that this function acquires the `mut1` lock first. 
Because this is the first operation\nthat is executed inside this thread, which is the first thread created in the program.\nAfter that, the function sleeps for 1 second, to\nsimulate some type of work, and then, the function tries to acquire the `mut2` lock.\n\nOn the other hand, when we look into the second thread, which executes the `work2()` function,\nwe can see that this function acquires the `mut2` lock first. Because when this thread gets created and it tries\nto acquire this `mut2` lock, the first thread is still sleeping on that \"sleep 1 second\" line.\nAfter acquiring `mut2`, the `work2()` function also sleeps for 1 second, to\nsimulate some type of work, and then, the function tries to acquire the `mut1` lock.\n\nThis creates a deadlock situation, because after the \"sleep for 1 second\" line in both threads,\nthread 1 is trying to acquire the `mut2` lock, but this lock is currently being held by thread 2.\nHowever, at this moment, thread 2 is also trying to acquire the `mut1` lock, which is currently\nbeing held by thread 1. Therefore, both threads end up waiting forever. 
Each one waits for its peer to\nfree the lock that it wants to acquire.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nconst Mutex = std.Thread.Mutex;\n\nvar mut1: Mutex = .{}; var mut2: Mutex = .{};\n// Locks mut1 first, then mut2.\nfn work1() !void {\n mut1.lock();\n std.time.sleep(1 * std.time.ns_per_s);\n mut2.lock();\n _ = try stdout.write(\"Doing some work 1\\n\");\n mut2.unlock(); mut1.unlock();\n}\n\n// Locks mut2 first, then mut1 (the opposite order).\nfn work2() !void {\n mut2.lock();\n std.time.sleep(1 * std.time.ns_per_s);\n mut1.lock();\n _ = try stdout.write(\"Doing some work 2\\n\");\n mut1.unlock(); mut2.unlock();\n}\n\npub fn main() !void {\n const thr1 = try Thread.spawn(.{}, work1, .{});\n const thr2 = try Thread.spawn(.{}, work2, .{});\n thr1.join();\n thr2.join();\n}\n```\n:::\n\n\n\n\n\n### Not calling `join()` or `detach()` {#sec-not-call-join-detach}\n\nWhen you do not call either `join()` or `detach()` over a thread, then, this thread becomes a \"zombie thread\",\nbecause it does not have a clear \"return point\".\nYou could also interpret this as: \"nobody is properly responsible for managing the thread\".\nWhen we don't establish if a thread is either *joinable* or *detached*,\nnobody becomes responsible for dealing with the return value of this thread, and also,\nnobody becomes responsible for clearing (or freeing) the resources associated with this thread.\n\nYou don't want to be in this situation, so remember to always use `join()` or `detach()`\non the threads that you create. When you don't use these methods, the execution of the thread\nbecomes completely independent from the execution of the main process in your program.\nThis means that the main process of your program might end before the thread finishes its job,\nor vice-versa. The point is that we cannot know who is going to finish first. It\nbecomes a race condition problem.\nIn such a case, we lose control over this thread, and its resources are never freed\n(i.e. 
you have leaked resources in the system).\n\n\n### Cancelling or killing a particular thread\n\nWhen we think about the `pthreads` C library, there is a possible way to asynchronously kill or cancel\na thread, which is by sending a `SIGTERM` signal to the thread through the `pthread_kill()` function.\nBut canceling a thread like this is bad. It is dangerously bad. As a consequence, the Zig implementation\nof threads does not have a similar function, or, a similar way to asynchronously cancel or kill\na thread.\n\nTherefore, if you want to cancel a thread in the middle of its execution in Zig,\nthen, one good strategy that you can take is to use control flow in your favor in conjunction with `join()`.\nMore specifically, you can design your thread around a while loop, that is constantly\nchecking if the thread should continue running.\nIf it is time to cancel the thread, we could make the while loop break, and join the thread with the main thread\nby calling `join()`.\n\nThe code example below demonstrates this strategy to some extent.\nHere, we are using control flow to break the while loop, and exit the thread earlier than\nwe had initially planned to. 
This example also demonstrates how can we use\natomic objects in Zig with the `Value()` generic function that we have mentioned at @sec-atomic-operation.\n\n\n```zig\nvar running = std.atomic.Value(bool).init(true);\nvar counter: u64 = 0;\nfn do_more_work() void {\n std.time.sleep(2 * std.time.ns_per_s);\n}\nfn work() !void {\n while (running.load(.monotonic)) {\n for (0..10000) |_| { counter += 1; }\n if (counter < 15000) {\n _ = try stdout.write(\"Time to cancel the thread.\\n\");\n running.store(false, .monotonic);\n } else {\n _ = try stdout.write(\"Time to do more work.\\n\");\n do_more_work();\n running.store(false, .monotonic);\n }\n }\n}\n\npub fn main() !void {\n const thread = try Thread.spawn(.{}, work, .{});\n thread.join();\n}\n```\n\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/Chapters/14-zig-c-interop/execute-results/html.json b/_freeze/Chapters/14-zig-c-interop/execute-results/html.json index 9d459347..246bfa20 100644 --- a/_freeze/Chapters/14-zig-c-interop/execute-results/html.json +++ b/_freeze/Chapters/14-zig-c-interop/execute-results/html.json @@ -3,9 +3,7 @@ "result": { "engine": "knitr", "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Zig interoperability with C\n\nIn this chapter, we are going to discuss the interoperability of Zig with C.\nWe have discussed at @sec-building-c-code how to build C code using the `zig` compiler.\nBut we haven't discussed yet how to actually use C code in Zig. 
In other words,\nwe haven't discussed yet how to call and use C code from Zig.\n\nThese matters are discussed here, in this chapter.\nAlso, in our next small project in this book, we are going to use a C library in it.\nAs consequence, we will put in practice a lot of the knowledge discussed here on\nthis next project.\n\n\n## How to call C code from Zig\n\nInterop with C is not something new. Most high-level programming languages have FFI (foreign function interfaces),\nwhich can be used to call C code. For example, Python have Cython, R have `.Call()`, Javascript have `ccall()`, etc.\nBut Zig integrates with C in a deeper level, which affects not only the way that C code get's called, but also,\nhow this C code is compiled and incorporated into your Zig project.\n\nIn summary, Zig have great interoperability with C. But if you want to call C code from Zig\nyou will have to perform the following steps:\n\n- import a C header file into your Zig code.\n- link your Zig code with the C library.\n\nIn more details, you should begin by importing into your Zig code the C header file that describes the C functions\nthat you want to call. Which is pretty much the same thing that you would do in C, by including the header files into your C module.\nAfter you import the C header file, you can start calling and using the C functions described in this header file\ndirectly in your Zig code.\n\nEverytime you use a C library in your Zig code, **you introduce a dependency in your build process**.\nThis should come as no surprise to anyone that have any experience with C and C++.\nBecause this is no different in C. Everytime you use a C library in your C code, you also\nhave to build and link your C code with this C library that you are using.\n\nWhen we use a C library in our Zig code, the `zig` compiler needs to access the definition of the C functions that\nare being called in your Zig code. 
The C header file that we have imported into our Zig code provides the\ndeclarations of these C functions, but not their definitions. So, in order to access these definitions,\nthe `zig` compiler needs to build your Zig code and link it with the C library in the build process.\n\nAs we discussed across the @sec-build-system, there are different strategies to link something with a library.\nThis might involve building the C library first, and then, linking it with the Zig code. Or,\nit could also involve just the linking step, if this C library is already built and\ninstalled in your system. Anyway, if you have doubts about this, comeback to @sec-build-system.\n\n\n\n## Importing C header files {#sec-import-c-header}\n\nTo import a C header file into our Zig code, we use the built-in functions `@cInclude()` and `@cImport()`.\nInside the `@cImport()` function, we open a block (with a pair of curly braces). Inside this block\nwe can (if we need to) include multiple `@cDefine()` calls to define C macros when including this specific C header file.\nBut for the most part, you will probably need to use just a single call inside this block at `@cImport()`,\nwhich is a call to `@cInclude()`.\n\nThis `@cInclude()` function is equivalent to the `#include` statement in C.\nYou provide the name of the C header that you want to include as input to this `@cInclude()` function,\nthen, in conjunction with `@cImport()`, it will perform the necessary steps\nto include this C header file into your Zig code.\n\nYou should bind the result of `@cImport()` to a constant object, pretty much like you would do with\n`@import()`. You just assign the result to a constant object in your\nZig code, and, as consequence, all C functions, C structs, C macros, etc. that are defined inside the\nC header file will be available through this constant object.\n\nLook at the code example below, where we are importing the Standard I/O C Library (`stdio.h`),\nand calling the `printf()`[^printf] C function. 
Notice that we have also used in this example the C function `powf()`[^powf],\nwhich comes from the C Math Library (`math.h`).\nIn order to compile this example, you have to link this Zig code with both\nthe C Standard Library and the C Math Library, by passing the flags `-lc` and `-lm`\nto the `zig` compiler.\n\n[^printf]: \n[^powf]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cmath = @cImport({\n @cInclude(\"math.h\");\n});\nconst stdio = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n});\n\npub fn main() !void {\n const x: f32 = 15.2;\n const y = cmath.powf(x, @as(f32, 2.6));\n _ = stdio.printf(\"%.3f\\n\", y);\n}\n```\n:::\n\n\n\n\n```\n1182.478\n```\n\n\n## About passing Zig values to C functions {#sec-zig-obj-to-c}\n\nZig objects have some intrinsic differences between their C equivalents.\nProbably the most noticeable one is the difference between C strings and Zig strings,\nwhich I described at @sec-zig-strings.\nZig strings are objects that contains both an array of arbitrary bytes and a length value.\nOn the other hand, a C string is usually just a pointer to a null-terminated array of arbitrary bytes.\n\nBecause of these intrinsic differences, in some specific cases, you cannot pass Zig objects directly\nas inputs to C functions before you convert them into C compatible values. However, in some other cases,\nyou are allowed to pass Zig objects and Zig literal values directly as inputs to C functions,\nand everything will work just fine, because the `zig` compiler will handle everything for you.\n\nSo we have two different scenarios being described here. Let's call them \"auto-conversion\" and \"need-conversion\".\nThe \"auto-conversion\" scenario is when the `zig` compiler handles everything for you, and automatically convert your\nZig objects/values into C compatible values. 
In contrast,\nthe \"need-conversion\" scenario is when you, the programmer, have the responsibility of converting\nthat Zig object into a C compatible value, before passing it to C code.\n\nThere is also a third scenario that is not being described here, which is when you create a C object, or, a C struct, or\na C compatible value in your Zig code, and you pass this C object/value as input to a C function in your Zig code.\nThis scenario will be described later at @sec-c-inputs. In this section, we are focused on the scenarios where\nwe are passing Zig objects/values to C code, instead of C objects/values being passed to C code.\n\n\n### The \"auto-conversion\" scenario\n\nAn \"auto-conversion\" scenario is when the `zig` compiler automatically converts our Zig objects into\nC compatible values for us. This specific scenario happens mostly in two instances:\n\n- with string literal values;\n- with any of the primitive data types that were introduced at @sec-primitive-data-types.\n\nWhen we think about the second instance described above, the `zig` compiler does automatically\nconvert any of the primitive data types into their C equivalents, because the compiler knows how\nto properly convert a `i16` into a `signed short`, or, a `u8` into a `unsigned char`, etc.\nNow, when we think about string literal values, they can be automatically\nconverted into C strings as well, specially because the `zig` compiler does not forces\na specific Zig data type into a string literal at first glance, unless you store this\nstring literal into a Zig object, and explicitly annotate the data type of this object.\n\nThus, with string literal values, the `zig` compiler have more freedom to infer which is the appropriate data type\nto be used in each situation. You could say that the string literal value \"inherits it's data type\" depending on the context that\nit is used. 
Most of the times, this data type is going to be the type that we commonly associate with Zig strings (`[]const u8`).\nBut it might be a different type depending on the situation. When the `zig` compiler detects that you are providing\na string literal value as input to some C function, the compiler automatically interprets this string\nliteral as a C string value.\n\nAs an example, look at the code exposed below. Here we are using\nthe `fopen()` C function to simply open and close a file. If you do not know how this `fopen()`\nfunction works in C, it takes two C strings as input. But in this code example below, we are passing some\nstring literals written in our Zig code directly as inputs to this `fopen()` C function.\n\nIn other words, we are not doing any type of conversion from a Zig string to a C string.\nWe are just passing the Zig string literals directly as inputs to the C function. And it works just fine!\nBecause the compiler inteprets the string `\"foo.txt\"` as a C string, as a result of the current context\nthat this string literal is being used.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n});\n\npub fn main() !void {\n const file = c.fopen(\"foo.txt\", \"rb\");\n if (file == null) {\n @panic(\"Could not open file!\");\n }\n if (c.fclose(file) != 0) {\n return error.CouldNotCloseFileDescriptor;\n }\n}\n```\n:::\n\n\n\n\nLet's make some experiments, by writing the same code in different manners, and we\nsee how this affects the program. As a starting point, let's store the `\"foo.txt\"` string inside\na Zig object, like the `path` object below, and then, we pass this Zig object as input to the `fopen()` C function.\n\nIf we do this, the program still compiles and runs successfully. 
Notice that I have ommitted most of the code in this example below.\nThis is just for brevitty reasons, because the remainder of the program is still the same.\nThe only difference between this example and the previous example is just these two lines exposed below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\nNow, what happens if you give an explicit data type to the `path` object? Well, if I force\nthe `zig` compiler to interpret this `path` object as a Zig string object,\nby annotating the `path` object with the data type `[]const u8`, then, I actually get a compile error\nas demonstrated below. We get this compile error because now I'm forcing the `zig` compiler\nto interpret `path` as a Zig string object.\n\nAccording to the error message, the `fopen()` C function was expecting to receive an\ninput value of type `[*c]const u8` (C string) instead of a value of type `[]const u8` (Zig string).\nIn more details, the type `[*c]const u8` is actually the Zig type representation of a C string.\nThe `[*c]` portion of this type identifies a C pointer. So, this Zig type essentially means: a C pointer to an array (`[*c]`) of\nconstant bytes (`const u8`).\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n```\nt.zig:10:26: error: expected type '[*c]const u8', found '[]const u8'\n const file = c.fopen(path, \"rb\");\n ^~~~\n```\n\nTherefore, when we talk exclusively about string literal values, as long as you don't give an\nexplicit data type to these string literal values, the `zig` compiler should be capable of automatically\nconverting them into C strings as needed.\n\nBut what about using one of the primitive data types that were introduced at @sec-primitive-data-types?\nLet's take code exposed below as an example of that. 
Here, we are giving some float literal values as input\nto the C function `powf()`. Notice that this code example compiles and runs succesfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst cmath = @cImport({\n @cInclude(\"math.h\");\n});\n\npub fn main() !void {\n const y = cmath.powf(15.68, 2.32);\n try stdout.print(\"{d}\\n\", .{y});\n}\n```\n:::\n\n\n\n\n```\n593.2023\n```\n\nOnce again, because the `zig` compiler does not associate a specific data type with the literal values\n`15.68` and `2.32` at first glance, the compiler can automatically convert these values\ninto their C `float` (or `double`) equivalents, before it passes to the `powf()` C function.\nNow, even if I give an explicit Zig data type to these literal values, by storing them into a Zig object,\nand explicit annotating the type of these objects, the code still compiles and runs succesfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const x: f32 = 15.68;\n const y = cmath.powf(x, 2.32);\n // The remainder of the program\n```\n:::\n\n\n\n\n```\n593.2023\n```\n\n\n\n### The \"need-conversion\" scenario\n\nA \"need-conversion\" scenario is when we need to manually convert our Zig objects into C compatible values\nbefore passing them as input to C functions. You will fall in this scenario, when passing Zig string objects\nto C functions.\n\nWe already saw this specific circumstance on the last `fopen()` example,\nwhich is reproduced below. You can see in this example, that we have given an explicit Zig data type\n(`[]const u8`) to our `path` object, and, as a consequence of that, we have forced the `zig` compiler\nto see this `path` object, as a Zig string object. 
Because of that, we need now to manually convert\nthis `path` object into a C string before we pass it to `fopen()`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n```\nt.zig:10:26: error: expected type '[*c]const u8', found '[]const u8'\n const file = c.fopen(path, \"rb\");\n ^~~~\n```\n\n\nThere are different ways to convert a Zig string object into a C string.\nOne way to solve this problem is to provide the pointer to the underlying array\nof bytes, instead of providing the Zig object directly as input.\nYou can access this pointer by using the `ptr` property of the Zig string object.\n\nThe code example below demonstrates this strategy. Notice that, by giving the\npointer to the underlying array in `path` through the `ptr` property, we get no compile errors as result\nwhile using the `fopen()` C function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path.ptr, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\nThis strategy works because this pointer to the underlying array found in the `ptr` property,\nis semantically identical to a C pointer to a null-terminated array of bytes, i.e. a C object of type `*unsigned char`.\nThis is why this option also solves the problem of converting the Zig string into a C string.\n\nAnother option is to explicitly convert the Zig string object into a C pointer by using the\nbuilt-in function `@ptrCast()`. With this function we can convert\nan object of type `[]const u8` into an object of type `[*c]const u8`.\nAs I described at the previous section, the `[*c]` portion of the type\nmeans that it is a C pointer. This strategy is not-recommended. 
But it is\nuseful to demonstrate the use of `@ptrCast()`.\n\nYou may recall of the `@as()` built-in function, which is used to explicit convert (or cast) a Zig value from a type `x`\nto a type `y`, etc. That is, this `@as()` Zig function is equivalent to the\n`as` keyword in Rust, and the C type casting syntax (e.g. `(int) x`).\nBut in our case here, we are not converting any type of object.\nMore specifically, we are converting something into a pointer, or, a C pointer more specifically.\nEverytime a pointer is involved in some \"type casting operation\" in Zig,\nthe `@ptrCast()` function is involved. This `@ptrCast()` function is responsible\nfor converting a pointer of one type to a pointer of another type.\n\nIn the example below, we are using this function to cast our `path` object\ninto a C pointer to an array of bytes. Then, we pass this C pointer as input\nto the `fopen()` function. Notice that this code example compiles succesfully\nwith no errors.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const c_path: [*c]const u8 = @ptrCast(path);\n const file = c.fopen(c_path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n\n\n## Creating C objects in Zig {#sec-c-inputs}\n\nCreating C objects, or, in other words, creating instances of C structs in your Zig code\nis actually something quite easy to do. You first need to import the C header file (like I described at @sec-import-c-header) that describes\nthe C struct that you are trying to instantiate in your Zig code. 
After that, you can just\ncreate a new object in your Zig code, and annotate it with the C type of the struct.\n\nFor example, suppose we have a C header file called `user.h`, and that this header file is declaring a new struct named `User`.\nThis C header file is exposed below:\n\n```c\n#include \n\ntypedef struct\n{\n uint64_t id;\n char* name;\n} User;\n```\n\nThis `User` C struct have two distinct fields, or two struct members, named `id` and `name`.\nThe field `id` is a unsigned 64-bit integer value, while the field `name` is just a standard C string.\nNow, suppose that I want to create an instance of this `User` struct in my Zig code.\nI can do that by importing this `user.h` header file into my Zig code, and creating\na new object with type `User`. These steps are reproduced in the code example below.\n\nNotice that I have used the keyword `undefined` in this example. This allows me to\ncreate the `new_user` object without the need to provide an initial value to the object.\nAs consequence, the underlying memory associated with this `new_user` is unintialized,\ni.e. the memory is currently populated with \"garbage\" values.\nThus, this expression have the exact same effect of the expression `User new_user;` in C,\nwhich means \"declare a new object named `new_user` of type `User`\".\n\nIs our responsibility to properly initialize this memory associated with this `new_user` object,\nby assigining valid values to the members (or the fields) of the C struct. In the example below, I am assigning the integer 1 to the\nmember `id`. I am also saving the string `\"pedropark99\"` into the member `name`.\nNotice in this example that I manually add the null character (zero byte) to the end of the allocated array\nfor this string. 
This null character marks the end of the array in C.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst c = @cImport({\n @cInclude(\"user.h\");\n});\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n\n var new_user: c.User = undefined;\n new_user.id = 1;\n var user_name = try allocator.alloc(u8, 12);\n defer allocator.free(user_name);\n @memcpy(user_name[0..(user_name.len - 1)], \"pedropark99\");\n user_name[user_name.len - 1] = 0;\n new_user.name = user_name.ptr;\n}\n```\n:::\n\n\n\n\nSo, in this example above, we are manually initializing each field of the C struct.\nWe could say that, in this instance, we are \"manually instantiating\nthe C struct object\". However, when we use C libraries in our Zig code, we rarely need\nto manually instantiate the C structs like in the above example. Only because C libraries\nusually provide \"constructor functions\" in their public APIs. As consequence, we normally rely on\nthese constructor functions to properly initialize the C structs, and\nthe struct fields for us.\n\nFor example, consider the Harfbuzz C library. This a text shaping C library,\nand it works around a \"buffer object\", or, more specifically, an instance of\nthe C struct `hb_buffer_t`. Therefore, we need to create an instance of\nthis C struct if we want to use this C library. 
Luckily, this library offers\nthe function `hb_buffer_create()`, which we can use to create such object.\nSo the Zig code necessary to create such object would probably look something like this:\n\n```zig\nconst c = @cImport({\n @cInclude(\"hb.h\");\n});\nvar buf: c.hb_buffer_t = c.hb_buffer_create();\n// Do stuff with the \"buffer object\"\n```\n\nTherefore, we do not need to manually create an instance of the C struct\n`hb_buffer_t` here, and manually assign valid values to each field in this C struct.\nBecause the constructor function `hb_buffer_create()` is doing this heavy job for us.\n\nSince this `buf` object (and also the `new_user` object) is an instance of a C struct, this\nobject is, in itself, a C compatible value. It is a C object defined in our Zig code. As consequence,\nyou can freely pass this object as input to any C function that expects to receive this type\nof C struct as input. You do not need to use any special syntax, or, to convert this object in\nany special manner to use it in C code.\nThis is how we create and use C objects in our Zig code.\n\n\n\n## Passing C structs across Zig functions {#sec-pass-c-structs}\n\nNow that we have learned how to create/declare C objects in our Zig code, we\nneed to learn how to pass these C objects as inputs to Zig functions.\nAs I described at @sec-c-inputs, we can freely pass these C objects as inputs to C code\nthat we call from our Zig code. But what about passing these C objects as inputs to Zig functions?\n\nIn essence, this specific case requires one small adjustment in the Zig function declaration.\nAll you need to do, is to make sure that you pass your C object *by reference* to the function,\ninstead of passing it *by value*. 
To do that, you have to annotate the data type of the function argument\nthat is receiving this C object as \"a pointer to the C struct\", instead of annotating it as \"an instance of the C struct\".\n\nLet's consider the C struct `User` from the `user.h` C header file that we have used at @sec-c-inputs.\nNow, consider that we want to create a Zig function that sets the value of the `id` field\nin this C struct, like the `set_user_id()` function declared below.\nNotice that the `user` argument in this function is annotated as a pointer (`*`) to a `c.User` object.\n\nTherefore, essentially, all you have to do when passing C objects to Zig functions, is to add `*` to the\ndata type of the function argument that is receiving the C object. This will make sure that\nthe C object is passed *by reference* to the function.\n\nNow, because we have transformed the function argument into a pointer,\neverytime that you have to access the value pointed by the input pointer inside the function body, for whatever reason (e.g. you want\nto read, update, or delete this value), you have to dereference the pointer with the `.*` syntax that we\nlearned from @sec-pointer. 
Notice that the `set_user_id()` function is using this syntax to alter\nthe value in the `id` field of the `User` struct pointed by the input pointer.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst c = @cImport({\n @cInclude(\"user.h\");\n});\nfn set_user_id(id: u64, user: *c.User) void {\n user.*.id = id;\n}\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n\n var new_user: c.User = undefined;\n new_user.id = 1;\n var user_name = try allocator.alloc(u8, 12);\n defer allocator.free(user_name);\n @memcpy(user_name[0..(user_name.len - 1)], \"pedropark99\");\n user_name[user_name.len - 1] = 0;\n new_user.name = user_name.ptr;\n\n set_user_id(25, &new_user);\n try stdout.print(\"New ID: {any}\\n\", .{new_user.id});\n}\n```\n:::\n\n\n\n\n```\nNew ID: 25\n```\n\n", - "supporting": [ - "14-zig-c-interop_files" - ], + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json index 9fc44143..123378ea 100644 --- a/_freeze/index/execute-results/html.json +++ b/_freeze/index/execute-results/html.json @@ -2,7 +2,7 @@ "hash": "59aac007bbff258bb5af0e77915366c8", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"./Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n::: {.content-visible when-format=\"html\"}\n\n# Welcome {.unnumbered}\n\nWelcome! 
This is the initial page for the \"Open Access\" HTML version of the book \"Introduction to Zig: a project-based book\",\nwritten by [Pedro Duarte Faria](https://pedro-faria.netlify.app/).\nThis is an open book that provides an introduction to the [Zig programming language](https://ziglang.org/),\nwhich is a new general-purpose, and low-level language for building robust and optimal software.\n\n## About this book {.unnumbered}\n\nThis an open book, meaning that, it is open-source, and it will always be open\nfor anyone that wants to read it. However, this book is still under construction 🚧 and active development,\nso, it's contents might change drastically in the near future.\n\nAlso, this is a project-based book, which means that we learn how to use the Zig programming language\nthrough small and simple projects, in a similar style to the famous \"Python Crash Course\" book from Eric Matthes.\n\nOfficial book's repository: \n\n:::\n\n\n## About the author {.unnumbered}\n\nPedro Duarte Faria have a bachelor degree in Economics from Federal University of Ouro Preto - Brazil.\nCurrently, he is a Data Platform Engineer at [Blip](https://www.blip.ai/en/)[^blip], and\nan Associate Developer for Apache Spark 3.0 certified by Databricks.\n\n[^blip]: \n\n\nThe author have more than 4 years of experience in the data industry. 
Developing data products, pipelines,\nreports and analysis for research institutions and some of the largest companies in the\nbrazilian financial sector, such as the BMG Bank, Sodexo and Pan Bank.\n\nBut Pedro is also a passionate software developer that loves to\nlearn and teach about programming.\nAlthough Pedro uses many different languages in his work, he is specialized in the R programming language, and have given several\nlectures and courses about it, inside graduate centers (such as PPEA-UFOP^[]),\nin addition to federal and state organizations (such as FJP-MG^[]).\n\n\nPersonal Website: \n\nLinkedin: \n\nMastodon: [\\@pedropark99\\@fosstodon.org](https://fosstodon.org/@pedropark99)\n\nTwitter (X): [\\@PedroPark9](https://twitter.com/PedroPark9)\n\n## License {.unnumbered}\n\nCopyright © 2024 Pedro Duarte Faria. This book is licensed by the [CC-BY 4.0 Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/)[^cc-license].\n\n[^cc-license]: \n\n![](Figures/creative-commoms-88x31.png){width=88px}\n\n\n## Book compilation metadata {.unnumbered}\n\nThis book was compiled using the following versions of [Zig](https://ziglang.org) and [Quarto](https://quarto.org):\n\n\n\n\n\n- System version: Linux, 6.8.0-41-generic, NA, x86_64.\n\n- Zig version: 0.14.0-dev.1166+bb7050106.\n\n- Quarto version: 1.5.56.\n\n\n\n\n\n## Book citation {.unnumbered}\n\nYou can use the following BibTex entry to cite this book:\n\n```\n@book{pedro2024,\n author = {Pedro Duarte Faria},\n title = {Introduction to Zig},\n subtitle = {a project-based book},\n month = {December},\n edition = {1},\n year = {2024},\n address = {Belo Horizonte},\n url = {https://pedropark99.github.io/zig-book/}\n}\n```\n\n## Corresponding author and maintainer {.unnumbered}\n\nPedro Duarte Faria\n\nContact: [pedropark99\\@gmail.com](mailto:pedropark99@gmail.com)\n\nPersonal website: \n", + "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: 
\"./Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n::: {.content-visible when-format=\"html\"}\n\n# Welcome {.unnumbered}\n\nWelcome! This is the initial page for the \"Open Access\" HTML version of the book \"Introduction to Zig: a project-based book\",\nwritten by [Pedro Duarte Faria](https://pedro-faria.netlify.app/).\nThis is an open book that provides an introduction to the [Zig programming language](https://ziglang.org/),\nwhich is a new general-purpose, and low-level language for building robust and optimal software.\n\n## About this book {.unnumbered}\n\nThis an open book, meaning that, it is open-source, and it will always be open\nfor anyone that wants to read it. However, this book is still under construction 🚧 and active development,\nso, it's contents might change drastically in the near future.\n\nAlso, this is a project-based book, which means that we learn how to use the Zig programming language\nthrough small and simple projects, in a similar style to the famous \"Python Crash Course\" book from Eric Matthes.\n\nOfficial book's repository: \n\n:::\n\n\n## About the author {.unnumbered}\n\nPedro Duarte Faria have a bachelor degree in Economics from Federal University of Ouro Preto - Brazil.\nCurrently, he is a Data Platform Engineer at [Blip](https://www.blip.ai/en/)[^blip], and\nan Associate Developer for Apache Spark 3.0 certified by Databricks.\n\n[^blip]: \n\n\nThe author have more than 4 years of experience in the data industry. 
Developing data products, pipelines,\nreports and analysis for research institutions and some of the largest companies in the\nbrazilian financial sector, such as the BMG Bank, Sodexo and Pan Bank.\n\nBut Pedro is also a passionate software developer that loves to\nlearn and teach about programming.\nAlthough Pedro uses many different languages in his work, he is specialized in the R programming language, and have given several\nlectures and courses about it, inside graduate centers (such as PPEA-UFOP^[]),\nin addition to federal and state organizations (such as FJP-MG^[]).\n\n\nPersonal Website: \n\nLinkedin: \n\nMastodon: [\\@pedropark99\\@fosstodon.org](https://fosstodon.org/@pedropark99)\n\nTwitter (X): [\\@PedroPark9](https://twitter.com/PedroPark9)\n\n## License {.unnumbered}\n\nCopyright © 2024 Pedro Duarte Faria. This book is licensed by the [CC-BY 4.0 Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/)[^cc-license].\n\n[^cc-license]: \n\n![](Figures/creative-commoms-88x31.png){width=88px}\n\n\n## Book compilation metadata {.unnumbered}\n\nThis book was compiled using the following versions of [Zig](https://ziglang.org) and [Quarto](https://quarto.org):\n\n\n\n\n- System version: Linux, 6.8.0-44-generic, NA, x86_64.\n\n- Zig version: 0.14.0-dev.1166+bb7050106.\n\n- Quarto version: 1.5.56.\n\n\n\n\n## Book citation {.unnumbered}\n\nYou can use the following BibTex entry to cite this book:\n\n```\n@book{pedro2024,\n author = {Pedro Duarte Faria},\n title = {Introduction to Zig},\n subtitle = {a project-based book},\n month = {December},\n edition = {1},\n year = {2024},\n address = {Belo Horizonte},\n url = {https://pedropark99.github.io/zig-book/}\n}\n```\n\n## Corresponding author and maintainer {.unnumbered}\n\nPedro Duarte Faria\n\nContact: [pedropark99\\@gmail.com](mailto:pedropark99@gmail.com)\n\nPersonal website: \n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git 
a/_quarto.yml b/_quarto.yml index 6acfbeb6..f762c283 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -24,6 +24,7 @@ book: - Chapters/12-file-op.qmd - Chapters/14-zig-c-interop.qmd - Chapters/13-image-filter.qmd + - Chapters/14-threads.qmd - references.qmd bibliography: references.bib diff --git a/contributors.txt b/contributors.txt new file mode 100644 index 00000000..578eb4cc --- /dev/null +++ b/contributors.txt @@ -0,0 +1,2 @@ +name,user_name, +Calin Martinconi,@martinconic, diff --git a/docs/Chapters/01-base64.html b/docs/Chapters/01-base64.html index 5bc8ac37..131e2b36 100644 --- a/docs/Chapters/01-base64.html +++ b/docs/Chapters/01-base64.html @@ -289,6 +289,12 @@ 15  Project 4 - Developing an image filter + + + + + + + + + + + + + + + +