Source: High-Entropy Passphrases
Part 1: Given a list of passphrases, count how many contain no duplicate words.
There are two ways that I worked out how to solve this problem. First, you can use regular expressions with the
\b ‘word boundary’ marker:
valid_count = 0 for line in lib.input(): if not re.search(r'\b(\w+)\b.*?\b\1\b'): valid_count += 1 print(valid_count)
\b(\w+)\b- match any complete word (
\bmatches only boundaries between non-word characters (like spaces and the start/end of a string) and word-characters)
.*- match anything between the two words
\b\1\b- match the first group above, but only if it’s a complete word (this avoids matching
Alternatively, you can use a
to compare the words. If the number of words in the list and the set are different, there was at least one duplicate:
valid_count = 0 for line in lib.input(): words = line.split() if len(words) == len(set(words)): valid_count += 1 print(valid_count)
Part 2: Passphrases may no longer contain anagrams. How many are still valid?
This would be rather more difficult to solve with regular expressions1, but it’s actually pretty straight forward with the
set based solution:
valid_count = 0 for line in lib.input(): words = [''.join(sorted(word)) for word in line.split()] if len(words) == len(set(words)): valid_count += 1 print(valid_count)
Rather than trying to scramble all of the words to find anagrams, we replace each word with the letters in lexicographic order. That means that any anagrams will become the same word. We can then use the same
len(words) == len(set(words)) code as before to look for duplicates.
Run it to check timing:
$ python3 run-all.py day-04 day-04 python3 password-validator.py input.txt 0.062463998794555664 337 day-04 python3 password-validator.py input.txt --no-anagrams 0.06604528427124023 231
Is this even possible? Since the words can be arbitrarily long, I’m not certain that it is. ↩︎