Categorizing r/Fantasy Book Bingo Books

I’ve been working through the r/Fantasy 2021 Book Bingo this year:

2021 Book Bingo

  • Attack on Titan, Vol. 1, by Hajime Isayama. Hard Mode ✓. Square: 5 SFF Short Stories (Hard: An entire anthology or collection)
  • The Poppy War, by R.F. Kuang. Hard Mode ✓. Square: Set in Asia (Hard: by an Asian author)
  • The Changeling, by Victor LaValle. Hard Mode ✓. Square: r/Fantasy A to Z Genre Guide (Hard: by a BIPOC author)
  • The House in the Cerulean Sea, by T.J. Klune. Hard Mode ✓. Square: Found Family (Hard: featuring an LGBTQ+ character)
  • The Scorpio Races, by Maggie Stiefvater. Hard Mode ✓. Square: First person POV (Hard: Multiple)
  • The Wyrmling Horde, by David Farland. Square: r/Fantasy Book Club (Hard: with participation). Replaced with: Sequel: Not the First Book in the Series (2017)
  • The Borrowers Afield, by Mary Norton. Hard Mode ✓. Square: New to you author (Hard: haven’t heard much about)
  • Mexican Gothic, by Silvia Moreno-Garcia. Hard Mode ✓. Square: Gothic Fantasy (Hard: not in the Book Riot article)
  • Transmetropolitan, Vol. 1: Back on the Street, by Warren Ellis. Hard Mode ✓. Square: Backlist book (Hard: published before 2000)
  • Red Sister, by Mark Lawrence. Hard Mode ✓. Square: Revenge-seeking character (Hard: revenge as the major book plot)
  • Six Wakes, by Mur Lafferty. Hard Mode ✓. Square: Mystery plot (Hard: not primary world urban fantasy)
  • Wild Sign, by Patricia Briggs. Hard Mode ✓. Square: Comfort read (Hard: that isn’t a reread)
  • Tales of Nezura: Book 1: The Zevolra, by Randall Cooper. Hard Mode ✓. Square: Debut novel (Hard: published in 2021)
  • Hellblazer, Vol. 1: Original Sins, by Jamie Delano. Hard Mode ✓. Square: Cat squasher (500+ pages; Hard: 800+ pages)
  • Daemon Voices, by Philip Pullman. Hard Mode ✓. Square: SFF-related nonfiction (Hard: published in the last 5 years)
  • Cece Rios and the Desert of Souls, by Kaela Rivera. Hard Mode ✓. Square: Latinx or Latin American author (Hard: with fewer than 1000 Goodreads ratings)
  • Black Rain and Paper Cranes, by R.S. Craig. Hard Mode ✓. Square: Self published (Hard: with fewer than 50 Goodreads ratings)
  • Annihilation, by Jeff VanderMeer. Hard Mode ✓. Square: Forest setting (Hard: for the entire book)
  • Gideon the Ninth, by Tamsyn Muir. Hard Mode ✓. Square: Genre mashup (Hard: of three or more genres)
  • The Midnight Library, by Matt Haig. Hard Mode ✓. Square: Has chapter titles of more than one word (Hard: for every chapter)
  • An Alchemy of Masques and Mirrors, by Curtis Craddock. Hard Mode ✓. Square: ___ of ___ (Hard: and ___)
  • Project Hail Mary, by Andy Weir. Hard Mode ✓. Square: First contact (Hard: that doesn’t lead to war)
  • Black Sun, by Rebecca Roanhorse. Square: Trans or Nonbinary (Hard: protagonist)
  • The Long Way to a Small, Angry Planet, by Becky Chambers. Hard Mode ✓. Square: Debut author (Hard: with an AMA)
  • A Great and Terrible Beauty, by Libba Bray. Hard Mode ✓. Square: Witches (Hard: as the main protagonist)

One thing I’ve been having a bit of trouble with is figuring out which squares a given book counts for. There is a very active recommendations thread, but without the ability to load the entire thread at once, it … isn’t great to search. So let’s make that easier.

First things first, let’s get the raw data. It turns out that Reddit has a wonderfully simple API to start with: just add .json to a thread’s URL to get it in JSON format (Example). It’s a bit of a weird format, but it’s parsable. From there, you have references to child nodes that you can download in order to build one giant JSON object for the entire thread. That sounds like a fascinating problem, but this time around I skipped it and used this code. Give it a thread, wait a bit (it’s a large thread), get JSON.
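As a rough sketch of the .json trick (this is not the code linked above): the helper names here are made up for illustration, and the sketch assumes the usual raw shape, a listing whose data.children hold comments (kind t1) with a body and nested replies. It skips the "more" stubs for comments that weren't loaded, which is exactly the part the linked script takes care of.

```python
import json
import urllib.request


def json_url(thread_url):
    """Turn a normal thread URL into its JSON equivalent by appending .json."""
    return thread_url.rstrip('/') + '.json'


def fetch_thread(thread_url):
    """Download the raw JSON for a thread (needs network access)."""
    request = urllib.request.Request(
        json_url(thread_url),
        # Assumption: a descriptive User-Agent, since the default one tends to get throttled
        headers={'User-Agent': 'book-bingo-sketch/0.1'},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def simplify(listing):
    """Reduce a raw comment listing to nested {'body', 'replies'} dicts."""
    comments = []
    for child in listing['data']['children']:
        if child['kind'] != 't1':
            # 'more' stubs point at comments that were not loaded; skipped here
            continue
        replies = child['data'].get('replies')
        comments.append({
            'body': child['data']['body'],
            # An empty string stands in for "no replies" in the raw format
            'replies': simplify(replies) if replies else [],
        })
    return comments
```

The raw response should be a two-item list (the post, then the comments), so something like simplify(fetch_thread(url)[1]) would give the top-level comments, minus anything hidden behind a "more" stub.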

You could keep that as a JSON file, but I wanted to be sneaky/weird and put it straight in the script. It’s a fair chunk of data with a number of weird characters, so storing it could be tricky… unless you just base64 encode the entire thing. You can then store it straight inline and get it all back out with data = json.loads(base64.b64decode('W3siYm9keS...')). It’s actually not that unusual an idea. You see the same thing with images inlined into webpages as data: URIs, or games that embed art assets directly in the compiled file for optimization/distribution reasons.
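Here is a small sketch of that round trip. The thread value is a made-up stand-in for the real download, and encode_inline / decode_inline are names I picked for illustration.

```python
import base64
import json

# Hypothetical stand-in for the downloaded thread, with some awkward characters
thread = [{'body': '**Mystery Plot** \u2013 "quotes",\nnewlines, and a \u2713', 'replies': []}]


def encode_inline(obj):
    """Serialize to JSON, then base64, giving a plain-ASCII string."""
    return base64.b64encode(json.dumps(obj).encode('utf-8')).decode('ascii')


def decode_inline(blob):
    """Reverse of encode_inline: base64 text back to Python objects."""
    return json.loads(base64.b64decode(blob))


blob = encode_inline(thread)
print(blob[:10])                      # safe to paste between quotes in a script
print(decode_inline(blob) == thread)  # the round trip is lossless
```

The encoded text starts with W3siYm9keS, the same prefix as the snippet above, because that is just [{"body run through base64. The cost is size: base64 output is about a third larger than its input.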

Next, parsing. The recommendations thread has one top-level response for each of the categories in the bingo, but below that, a reply at any depth could contain book titles. So what we want is to search the JSON object recursively:

  • For dictionaries, search the ‘body’ (for text) and ‘replies’ (for further children)
  • For lists, search all entries (lists of replies)
  • For strings (bodies), search the text (case insensitive)
import base64
import json
import sys

data = json.loads(base64.b64decode('W3siYm9keS...'))
    
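def search(key, data, path=None):
    """Yield the chain of comment bodies leading to each comment that mentions key."""
    path = path or []

    if isinstance(data, list):
        # A list of replies: search each one
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        # A comment: check its own text, then descend into its replies
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        # Comment text: case-insensitive substring match
        if key.lower() in data.lower():
            yield path + [data]

def top_level_search(key):
    """Return the distinct top-level comments (bingo squares) whose threads mention key."""
    results = set()
    for result in search(key, data):
        results.add(result[0])
    return sorted(results)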

for arg in sys.argv[1:]:
    print(arg)
    for result in top_level_search(arg):
        print(result)
    print()

I really do love generators for this, especially with yield from. You can recursively scan through the entire structure and get what is effectively a flat list of results for free. I’m keeping track of the path through the nodes that I took to get to each match, although top_level_search only ends up returning the first element of each path, which is the top-level comment for the bingo square (I printed the whole path at first, which was neat).
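To show what the whole path looks like, here is the same search function run against a tiny made-up thread (the sample data is hypothetical, not from the real thread). It needs Python 3.8+ for the := operator.

```python
def search(key, data, path=None):
    path = path or []
    if isinstance(data, list):
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        if key.lower() in data.lower():
            yield path + [data]


# Hypothetical miniature thread: two squares, a few nested replies
sample = [
    {'body': 'Mystery Plot', 'replies': [
        {'body': 'I recommend Six Wakes', 'replies': [
            {'body': 'Seconding six wakes!', 'replies': []},
        ]},
    ]},
    {'body': 'Forest Setting', 'replies': [
        {'body': 'Annihilation', 'replies': []},
    ]},
]

paths = list(search('Six Wakes', sample))
for path in paths:
    # Each path runs from the top-level comment down to the matching comment
    print(' > '.join(path))
```

Both matches sit under the same top-level comment, so collecting path[0] into a set collapses them into a single square, which is all top_level_search does.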

And as a result:

$ python3 ~/Dropbox/book-bingo.py 'Six Wakes'

Six Wakes
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

$ python3 ~/Dropbox/book-bingo.py 'Annihilation'

Annihilation
**First Contact** \- From Wikipedia:  Science Fiction about the first meeting between humans and extraterrestrial life, or of any sentient species' first encounter with another one, given they are from different planets or natural satellites. **HARD MODE:** War does not break out as a result of contact.
**First Person POV** \- defined as:  a literary style in which the narrative is told from the perspective of a narrator speaking directly about themselves. [Link for examples.](https://examples.yourdictionary.com/examples-of-point-of-view.html) **HARD MODE:**  There is more than one perspective, but each perspective is written in First Person.
**Forest Setting** \-  This setting must be used be for a good portion of the book. **HARD MODE:** The entire book takes place in this setting.
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

Pretty handy!