Source: High-Entropy Passphrases
Part 1: Given a list of passphrases, count how many contain no duplicate words.
There are two ways that I worked out how to solve this problem. First, you can use regular expressions with the \b
‘word boundary’ marker:
valid_count = 0
for line in lib.input():
if not re.search(r'\b(\w+)\b.*?\b\1\b'):
valid_count += 1
print(valid_count)
This means:
\b(\w+)\b
- match any complete word (\b
matches only boundaries between non-word characters (like spaces and the start/end of a string) and word-characters).*
- match anything between the two words\b\1\b
- match the first group above, but only if it’s a complete word (this avoids matchingaa aaa
)
Alternatively, you can use a set
to compare the words. If the number of words in the list and the set are different, there was at least one duplicate:
valid_count = 0
for line in lib.input():
words = line.split()
if len(words) == len(set(words)):
valid_count += 1
print(valid_count)
Part 2: Passphrases may no longer contain anagrams. How many are still valid?
This would be rather more difficult to solve with regular expressions1, but it’s actually pretty straight forward with the set
based solution:
valid_count = 0
for line in lib.input():
words = [''.join(sorted(word)) for word in line.split()]
if len(words) == len(set(words)):
valid_count += 1
print(valid_count)
Rather than trying to scramble all of the words to find anagrams, we replace each word with the letters in lexicographic order. That means that any anagrams will become the same word. We can then use the same len(words) == len(set(words))
code as before to look for duplicates.
Run it to check timing:
$ python3 run-all.py day-04
day-04 python3 password-validator.py input.txt 0.062463998794555664 337
day-04 python3 password-validator.py input.txt --no-anagrams 0.06604528427124023 231
Is this even possible? Since the words can be arbitrarily long, I’m not certain that it is. ↩︎