Day Four: Passport Processing

Problem statement
Solution code

Part 1

In today's puzzle, we are given a data file that represents a batch of passports to validate. I think the hardest part of the solution you'll see is that there are only so many synonyms for "validation," so it can get a little wordy. Let's start!

As usual, the first step is to handle data validation. The input file contains a list of passports, where a passport is a set of key-value pairs that are separated by colons. A passport may take up one or more lines, and each passport is separated by a blank line. As I mentioned in a previous article, I'm going to create separate functions for parse-input and parse-passport, where the former breaks up the input file into each unparsed passport, and the latter parses the passport itself. This also makes testing with the REPL easier.

For parse-input, the tricky part is dealing with the variable number of lines. There's probably a simpler solution, but I'm not too upset with my function. Let's go through it line by line.

NOTE: The description below of parse-input and parse-passport will work and was my solution, but I've since found a simpler solution. My source code now shows the new functions. See the "Revisions" section at the bottom of this document for details. So either skip these next two functions, or read them and then see how they changed below.

Split the input into each line, so we get a sequence of strings. The shape of the result will be ("a" "b" "c" "" "d" "e" "").
Use partition-by to create groupings of lines, separated by the empty lines. partition-by takes in a sequence and returns a sequence of sequences that are broken apart based on the predicate. In this case, my predicate is #(str/blank? %), which means to group together all lines until we reach an empty line; then start a new group. The shape of the result will be (("a" "b" "c") ("") ("d" "e") ("")).
Now we need to get rid of the empty groups, since we don't need them. A simple filter should work, and (filter #(not= % '(""))) does the trick. Note that we need to filter out groups with a single blank string, rather than filtering out blank strings themselves, so (filter #(not (str/empty? %))) is incorrect. The shape of the result will be (("a" "b" "c") ("d" "e")).
To finish the initial parsing, we want each passport's data to be represented as a single line, since the passport parser shouldn't care how many lines the input originally used. (map #(str/join " " %)) works well here, as we join all strings into a single large string, adding in an empty string to avoid accidentally joining the last key-value pair from one line with the first pair from the next. The shape of the result is a nice clean ("a b c" "d e").
Finally, we send each value on to the yet-unwritten parse-passport function.

(defn parse-input [input]
  (->> (str/split-lines input)
       (partition-by #(str/blank? %))
       (filter #(not= '("") %))
       (map #(str/join " " %))
       (map parse-passport)))

Let's move on to parse-passport. The input line is a string of key:value strings, separated by spaces. A reasonable structure for a passport would be a simple map, so this isn't too hard. First we split the line into each key:value string using a space regex - (str/split line #" "). Then for each substring, we can again use a regex split on the colon character, and that will map each substring into a two-element vector. Finally, merge it all into a map.

In my original implementation, I did a single regex match on both the space and the colon, but that meant having to use partition again, and I thought this is simpler to read.

(defn parse-passport [line]
  (->> (str/split line #" ")
       (map #(str/split % #":"))
       (into {})))

Now we need to test if a passport is valid, which requires that it has a value for all required fields. The cid field represents the Country ID which can be missing according to the instructions. So let's make a simple set of all fields we do require, and then assert that a given passport has those fields. Because the passport is just a map, we can take all of the keys out using the keys function, turn that into a set, and use set/difference to verify that there are no required fields that are missing from those keys.

(def required-fields #{"byr" "iyr" "eyr" "hgt" "hcl" "ecl" "pid"})

(defn has-required-fields? [passport]
  (->> (keys passport)
       set
       (set/difference required-fields)
       empty?))

Ok, let's throw it all together with our part1 function. We'll take the input string, parse it into the passports, and count the number of them that have the required fields.

(defn part1 [input]
  (->> (parse-input input)
       (filter has-required-fields?)
       count))

Part 2

Oh, airport security. Thank you for taking a simple process and making it more complicated. We now need to verify each of the fields within each passport to ensure that they are not only present, but also pass basic data validations. The good news is that the validations are pretty straightforward.

Three validations (birth year, issue year, and expiration year) require a numeric value within a range. I don't know for sure that the passport values will be numeric, so I wrote a little helper function called in-range? that tries to parse a string as an integer, and if it works, verifies if the result is within the min and max values, inclusive. This is the first time I'm showing Clojure's try-catch structure, which is very straightforward. It takes the form (try (expression to try) (catch ThrowableClass binding catch-expression) (finally expression)) where the catch and finally blocks are optional. It should look just like Java, but with parentheses instead of curly braces.

Also what's neat here is the when-let function. Like a let function, it takes in a vector, but it only allows a single binding. If the bind expression returns a non-nil value, then bind it and run the expression. If the bind expression is nil, then skip everything else and just return nil.

(defn in-range? [min max s]
  (when-let [i (try (Integer/parseInt s)
                    (catch NumberFormatException _ nil))]
    (<= min i max)))

(defn birth-year? [s] (in-range? 1920 2002 s))
(defn issue-year? [s] (in-range? 2010 2020 s))
(defn expiration-year? [s] (in-range? 2020 2030 s))

The hardest data validation is for height, which required a number followed by either cm or in, and different range checks for each unit. We'll use a bunch of functions we've already seen before -- re-matches to attempt a regular expression match, when-let to process the resulting values if the regex matches, and then a case statement to decide which min and max values to apply. We might as well use the in-range? function since it's already there. Note that once again we destructure the result of re-matches, ignoring the first value since it's the entire String s.

(defn height? [s] (when-let [[_ amt unit] (re-matches #"(\d+)(in|cm)" s)]
                    (case unit
                      "cm" (in-range? 150 193 amt)
                      "in" (in-range? 59 76 amt))))

The last three checks are much simpler. Hair color and passport ID just need regular expressions that test the shape of the data, but don't require range checks. The functions hair-color? and passport-id? will return nil if the values don't match their regexes. Clojure predicates are truthy, so any response that isn't false or nil will resolve to true, which is good enough here. And eye-color? just ensures that the input value exists within a set; I could test that with (contains? the-set s), but calling (the-set s) works just as well, since it returns either s if it's in the set, or nil if not.

(defn hair-color?  [s] (re-matches #"\#[0-9a-f]{6}" s))
(defn eye-color?   [s] (#{"amb" "blu" "brn" "gry" "grn" "hzl" "oth"} s))
(defn passport-id? [s] (re-matches #"[\d]{9}" s))

Now the instructions want us to count the number of valid passports, which are "those that have all required fields and valid values." In part 1, we only needed the number of passports with the required fields, so we've figured out the point of reuse between parts 1 and 2 -- given a set of validations to apply, how many passports are pass those validations? It might be easiest in this case to think about how we want the part1 and part2 functions to look, as they should only differ in the validation rules to apply:

(defn part1 [input]
  (num-valid-passports input [has-required-fields?]))
(defn part2 [input]
  (num-valid-passports input [has-required-fields? passport-passes-data-integrity?]))

Alright, so we need a function num-valid-passports that takes in an input string and a sequence of validations. I like little functions, so we'll parse the input into its passports, filter based on those that pass an as-yet defined function called valid-passport?, and then count the number of results.

(defn num-valid-passports [input validations]
  (->> (parse-input input)
       (filter (fn [passport] (valid-passport? passport validations)))
       count))

On to the valid-passport? function. For this, we want to apply all of the validations we receive onto the passport, and return true only if all validations are truthy. I could do this with a map, but I think reduce conveys our intentions better. A passport is valid until a validation rule says otherwise, so our initialization value (second param) is true, which we then and with each validation check.

(defn valid-passport? [passport validations]
  (reduce (fn [acc check] (and acc (check passport)))
          true
          validations))

Finally, let's fix up our validation functions has-required-fields? and passport-passes-data-integrity?. Sorry again about the long names. We already implemented has-required-fields? in part 1, so let's look at passport-passes-data-integrity?. For this, we want to look at each field within the passport, look up the data integrity rule (if any) for that field, and return the result. If we were to have a passport with an unknown field, we should accept it without running any validations. The easiest way to manage these rules is to map the field name to its validation rule, and if there's no rule for a given field, then always return true. The any? function returns true for any input, so that seems perfect. We'll iterate over every element of the passport as a [k v] vector, call field-passes-data-integrity?, and then assert that all results are truthy using the every? function.

There is one other trick here I haven't shown before. or is a wonderful function because it can be used both for normal boolean operations and to return "the first non-nil value in a sequence." So the test function we want to apply is either the value in the passport-fields map, or if there isn't one, then move on to the function any?. The or function can take in any number of arguments, returning the first truthy value.

(def passport-fields {"byr" birth-year?
                      "iyr" issue-year?
                      "eyr" expiration-year?
                      "hgt" height?
                      "hcl" hair-color?
                      "ecl" eye-color?
                      "pid" passport-id?})

(defn field-passes-data-integrity? [[k v]]
  (let [check (or (passport-fields k)
                 any?)]
    (check v)))

(defn passport-passes-data-integrity? [passport]
  (every? field-passes-data-integrity? passport))

That works! But let's make one last change - the set required-fields and the map passport-fields seem redundant. We can remove the set and rewrite the has-required-fields? function to leverage the keys of the passport-fields map. For each passport field name, we'll try to retrieve it from the passport, and then check (every? some?) to assert that every mapped value has some non-nil value:

(defn has-required-fields? [passport]
  (->> (keys passport-fields)
       (map #(passport %))
       (every? some?)))

Easy peasy! On to day 5.

Revisions Ideas

Part of the joy of Advent Of Code is reading other folks' solutions. I saw Nufflee's solution solution via Twitter with a Rust solution that I think is better than my original one. I'm keeping my explanation up top for historical reasons, but here is a much cleaner solution for parsing the input.

Instead of using partition-by and filter, the idea is that parse-input just has to split the giant string based on the presence of two newlines. Now I'm coding on a Windows box, so my test String showed \n\n as a blank line, while the puzzle data showed \r\n\r\n as a blank line, so I accommodated for that. As a result, parse-input gets rid of all silly \r characters, splits the original line by \n\n, and then calls parse-passport. I could have replaced the remaining \n characters with spaces, but instead I opted to update parse-passport to split the input data by either a newline or a space. The resulting parse logic is very clean.

One implementation note for parse-input - I wanted to use a threading pipeline as usual, but in this case neither -> nor ->> would suffice. When calling str/split, the String is the first argument because the regex is second. But when calling map, the threaded sequence is the last argument instead of the first. The as-> function shines here, since it defines a binding to apply to each step in the pipeline. So I bind str/replace to x, inject the result as x as the first argument in str/split; and then inject the result of that, again as x, as the last argument in map.

(defn parse-passport [line]
  (->> (str/split line #"[\n ]")
       (map #(str/split % #":"))
       (into {})))

(defn parse-input [input]
  (as-> (str/replace input "\r" "") x
        (str/split x #"\n\n")
        (map parse-passport x)))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!