Strings

The string slice type &str is a reference to a sequence of bytes (&[u8]) that is guaranteed to be valid UTF-8.

The String type is an owned, growable wrapper around Vec<u8> that is guaranteed to hold valid UTF-8 and provides utility methods for string manipulation:

fn main() {
    let mut s = String::new();

    for c in "Hello".chars() {
        s.push(c);
    }
    s.push_str(", world!");

    println!("{}", s);
}
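The relationship between the two types can be sketched as follows. The helper shout is a made-up name used only for illustration; the point is that a &String coerces to &str, and that both types expose the same UTF-8 bytes:

```rust
// Hypothetical helper: accepts any string slice and returns a new owned String.
fn shout(s: &str) -> String {
    s.to_uppercase() + "!"
}

fn main() {
    let literal: &str = "hello";             // borrowed string slice
    let owned: String = literal.to_string(); // owned, heap-allocated copy

    // A &String coerces to &str, so both can be passed to the same function.
    println!("{}", shout(literal));
    println!("{}", shout(&owned));

    // Both expose the same underlying UTF-8 bytes.
    assert_eq!(literal.as_bytes(), owned.as_bytes());
    println!("{:?}", owned.as_bytes()); // [104, 101, 108, 108, 111]
}
```

Taking &str as the parameter type is usually the more flexible choice, since callers can pass either kind of string.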

Characters

The char type is a 4-byte primitive type that holds a single Unicode scalar value, that is, any Unicode code point except a surrogate. These code points form graphemes, either individually or combined into grapheme clusters:

fn main() {
    let chars: &[char] = &['न', 'म', 'स', '्', 'त', 'े'];
    let graphemes = ["न", "म", "स्", "ते"]; //'स', '्' makes "स्", 'त', 'े' makes "ते"
}

The chars of a string can be iterated over using the chars method:

fn main() {
    for c in "नमस्ते".chars() {
        println!("{}", c); // prints न म स ् त े
    }
}

A sequence of chars usually takes up more space than the same text stored in a string, because a char is always 4 bytes in size, whereas a character encoded as UTF-8 inside a string takes 1 to 4 bytes.
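The size difference can be checked directly. This is a minimal sketch; size_as_chars is a hypothetical helper that computes how many bytes the text would need if stored as a sequence of chars:

```rust
use std::mem::size_of;

// Hypothetical helper: bytes needed to hold the text as a sequence of chars.
fn size_as_chars(s: &str) -> usize {
    s.chars().count() * size_of::<char>()
}

fn main() {
    // A char always occupies 4 bytes, whatever it holds.
    assert_eq!(size_of::<char>(), 4);

    // The UTF-8 encoding of a char takes between 1 and 4 bytes.
    for c in "aяन🙀".chars() {
        println!("{} takes {} byte(s) in UTF-8", c, c.len_utf8());
    }

    let s = "नमस्ते";
    println!("as &str: {} bytes, as chars: {} bytes", s.len(), size_as_chars(s));
}
```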

Indexing

Indexing a string with s[i] is ambiguous, because it is not clear whether i refers to a byte or to a char, so Rust does not allow it. Instead, the element is selected explicitly via:

  • .chars().nth(i) for chars
  • .bytes().nth(i) for bytes

fn main() {
    let ciao = "Здравствуйте";

    // prints 12 characters
    for i in 0..ciao.chars().count() {
        println!("ciao.chars().nth({}) = {}", i, ciao.chars().nth(i).unwrap());
    }

    // prints 24 bytes
    for i in 0..ciao.len() {
        println!("ciao.bytes().nth({}) = {}", i, ciao.bytes().nth(i).unwrap());
    }
}

Note that the len method returns the number of bytes in a string, not the number of chars.
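Calling nth(i) in a loop walks the string from the start on every iteration. When both the char and its position are needed, char_indices yields each char together with its byte offset in a single pass. The sketch below assumes a hypothetical helper byte_offset_of_char and also shows that range slicing works on byte offsets that must fall on char boundaries:

```rust
// Hypothetical helper: byte offset at which the i-th char starts, if any.
fn byte_offset_of_char(s: &str, i: usize) -> Option<usize> {
    s.char_indices().nth(i).map(|(offset, _)| offset)
}

fn main() {
    let s = "Здравствуйте";
    assert_eq!(s.len(), 24);           // bytes
    assert_eq!(s.chars().count(), 12); // chars

    // One pass over the string, yielding (byte offset, char) pairs.
    for (offset, c) in s.char_indices() {
        println!("byte offset {:2}: {}", offset, c);
    }

    // Range slicing uses byte offsets, which must fall on char boundaries.
    assert_eq!(&s[0..2], "З");
    assert_eq!(s.get(0..1), None); // offset 1 is in the middle of 'З'
    assert_eq!(byte_offset_of_char(s, 3), Some(6));
}
```

Using get instead of the [] range syntax returns None for an invalid range, where [] would panic.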

Escaping

The \ character is used for escaping. To write a literal \, it has to be escaped as \\. A literal's own delimiter must also be escaped when it appears inside the literal, that is, ' in a character literal and " in a string literal:

fn main() {
    println!("backslash: \\");
    println!("chars: {}", '\'');
    println!("strings: {}", "\"");
}

Escapes can also be used to write characters by their hexadecimal value (\xNN, limited to 7-bit values up to \x7F in strings and chars) or by their Unicode code point (\u{...}):

fn main() {
    println!("how about \x74\x68\x65\x20\x67\x61\x6d\x65");  // bytes
    println!("Unicode char U+211D is \u{211D}");             // Unicode
}
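As a quick check of what these escapes produce, the following sketch compares escaped literals with their plain equivalents:

```rust
fn main() {
    // Each \xNN escape stands for one ASCII character.
    assert_eq!("\x74\x68\x65\x20\x67\x61\x6d\x65", "the game");

    // \u{...} stands for one code point, stored as UTF-8 (3 bytes here).
    assert_eq!("\u{211D}", "ℝ");
    assert_eq!("\u{211D}".len(), 3);

    // The same escapes work in char literals.
    assert_eq!('\x41', 'A');
    assert_eq!('\u{211D}' as u32, 0x211D);
}
```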

A \ at the end of a line continues the string on the next line, skipping the line break and the leading whitespace of that next line:

fn main() {
    let s = "Did your \
    mother fuck \
    a snowman?";

    println!("{}", s);
}
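The difference between a continued line and a plain multiline literal can be sketched like this; the function names continued and multiline are made up for the example:

```rust
// Hypothetical helper: the trailing \ swallows the line break and indentation.
fn continued() -> &'static str {
    "one \
     two"
}

// Hypothetical helper: without \, line break and indentation stay in the string.
fn multiline() -> &'static str {
    "one
    two"
}

fn main() {
    assert_eq!(continued(), "one two");
    assert_eq!(multiline(), "one\n    two");
    println!("{:?}", continued());
    println!("{:?}", multiline());
}
```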

Raw strings

Raw strings are useful when no escaping at all is desired. They are written as r"", optionally with the same number of # characters on each side of the quotes. The hashes are needed when the string contains ", and there must be enough of them that the closing sequence (a " followed by that many #) does not occur inside the string:

fn main() {
    let raw = r"nope: \u{211D}, nope: \x67\x61\x6d\x65";
    let raw = r#"even more "nope" here"#;
    let raw = r###"nope #nope ##nope"###;
}
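To see what a raw string actually contains, it can be compared with the equivalent escaped literal. A minimal sketch:

```rust
fn main() {
    // In a raw string, \n is two characters: a backslash and an 'n'.
    assert_eq!(r"\n".len(), 2);
    assert_eq!(r"\n", "\\n");

    // Hashes let the string contain quotes without escaping them.
    assert_eq!(r#"say "hi""#, "say \"hi\"");

    // A "# inside the string requires at least two hashes as delimiters.
    assert_eq!(r##"end with "# here"##, "end with \"# here");
}
```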

Byte strings

Byte strings hold bytes that are mostly text. They are created using b"" and have the type &[u8; N], a reference to a fixed-size byte array:

fn main() {
    let bytes = b"raw bytes amirite?"; // type &[u8; 18]
}

They support the same escapes as regular strings, except for Unicode code points (\u{...}). In addition, \x escapes in byte strings may use the full byte range up to \xFF:

fn main() {
    let bytes = b"the \x67\x61\x6d\x65 again lmao";  // ok
    // let bytes = b"nope \u{211D}";                 // nope 🙀
}
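A byte string is just an array of u8 values, which the following sketch makes visible by comparing literals with explicit arrays:

```rust
fn main() {
    // A byte string literal is a reference to a fixed-size array of bytes.
    let bytes: &[u8; 3] = b"abc";
    assert_eq!(bytes, &[97u8, 98, 99]);

    // Unlike in regular strings, \x escapes may exceed \x7F.
    assert_eq!(b"\xff", &[255u8]);

    // The array reference coerces to a byte slice where one is expected.
    let slice: &[u8] = b"abc";
    assert_eq!(slice.len(), 3);
}
```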

Byte strings don't have to be valid UTF-8:

use std::str;

fn main() {
    let shift_jis = b"\x82\xe6\x82\xa4\x82\xb1\x82\xbb"; // "ようこそ" in SHIFT-JIS

    match str::from_utf8(shift_jis) {
        Ok(s) => println!("Like that's ever going to happen: {}", s),
        Err(e) => println!("Told ya: {}", e),
    };
}
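When an error is not wanted, the standard library also offers a lossy conversion. The sketch below contrasts the strict str::from_utf8 with String::from_utf8_lossy, which replaces invalid sequences with U+FFFD (the replacement character):

```rust
use std::str;

fn main() {
    let not_utf8 = b"\x82\xe6"; // "よ" in SHIFT-JIS, invalid as UTF-8

    // Strict conversion reports an error instead of guessing.
    assert!(str::from_utf8(not_utf8).is_err());

    // Lossy conversion substitutes U+FFFD for the invalid bytes.
    let lossy = String::from_utf8_lossy(not_utf8);
    assert!(lossy.contains('\u{FFFD}'));
    println!("{}", lossy);

    // Valid UTF-8 converts without any change.
    assert_eq!(str::from_utf8(b"ok"), Ok("ok"));
}
```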

They can be made raw the same way as regular strings, using the br prefix:

fn main() {
    let rbs = br##"hashtag #raw "strings" amirite?"##; // type &[u8; 31]
}

Formatting

Strings can be concatenated using the + operator, which takes ownership of the String on its left and borrows a &str on its right:

fn main() {
    let s = "top".to_string();
    println!("{}", s + "kek"); // topkek
}
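The ownership rules of + are easy to trip over, so here is a small sketch. The helper join_with_space is a hypothetical name used only for illustration:

```rust
// Hypothetical helper: builds a new String from two borrowed slices.
fn join_with_space(a: &str, b: &str) -> String {
    a.to_string() + " " + b
}

fn main() {
    let top = "top".to_string();
    let kek = "kek".to_string();

    // The left operand is moved; the right operand is only borrowed.
    let topkek = top + &kek;
    assert_eq!(topkek, "topkek");
    // println!("{}", top); // would not compile: `top` was moved
    println!("{}", kek);    // still usable

    println!("{}", join_with_space("top", "kek")); // top kek
}
```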

More complex formatting can be done using the format! macro:

fn main() {
    let s = format!("{}, {}!", "hello", "world");
}
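Unlike +, format! only borrows its arguments and can mix values of different types. A minimal sketch, where greet is a made-up function:

```rust
// Hypothetical helper: combines a string and a number into one String.
fn greet(name: &str, visits: u32) -> String {
    format!("Hello, {}! You have visited {} times.", name, visits)
}

fn main() {
    let name = String::from("world");
    let greeting = greet(&name, 3);
    println!("{}", greeting);

    // `name` was only borrowed, so it can still be used afterwards.
    println!("{}", name);
}
```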

Styles

A format placeholder has the form {<position>:<format>}, where both parts are optional. When no <format> is given, the : can be omitted as well. Format strings are checked at compile time.

The <position> part can be the argument position, or a named argument:

fn main() {
    println!("Rofl {}", "lmao");           // implicit position
    println!("Rofl {0}", "lmao");          // explicit position
    println!("Rofl {arg}", arg = "lmao");  // named argument
}
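Explicit positions and names allow an argument to be used more than once or out of order. The following sketch checks a few combinations with format!:

```rust
fn main() {
    // An explicit position can repeat an argument.
    assert_eq!(format!("{0}-{1}-{0}", "a", "b"), "a-b-a");

    // Positional and named arguments can be mixed.
    assert_eq!(format!("{x} {0} {x}", 1, x = 2), "2 1 2");

    // Implicit {} placeholders count on their own, ignoring explicit ones.
    assert_eq!(format!("{1} {} {0} {}", "a", "b"), "b a a b");
}
```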

The <format> part determines which trait to use when formatting:

  • nothing for Display
  • ? for Debug
  • o for Octal
  • x for LowerHex
  • X for UpperHex
  • p for Pointer
  • b for Binary
  • e for LowerExp
  • E for UpperExp

fn main() {
    println!("{:?}", 1337);  // debug
    println!("{:b}", 1337);  // binary
    println!("{:X}", 1337);  // upper-case hexadecimal

    println!("1337 = {leet:X}, 420 = {:?}", 420, leet = 1337);  // mishmash
}
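Since each format selects a trait, a user-defined type decides its own output by implementing that trait. The sketch below uses a hypothetical Point type with a derived Debug and a hand-written Display, next to a few of the built-in formats for numbers:

```rust
use std::fmt;

// Hypothetical type used only to illustrate trait selection.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    assert_eq!(format!("{:o}", 8), "10");            // Octal
    assert_eq!(format!("{:x}", 255), "ff");          // LowerHex
    assert_eq!(format!("{:X}", 255), "FF");          // UpperHex
    assert_eq!(format!("{:b}", 5), "101");           // Binary
    assert_eq!(format!("{:e}", 1234.5), "1.2345e3"); // LowerExp

    let p = Point { x: 1, y: 2 };
    assert_eq!(format!("{}", p), "(1, 2)");                 // Display
    assert_eq!(format!("{:?}", p), "Point { x: 1, y: 2 }"); // Debug
}
```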

Further traits can be added in the future.