Categorizing r/Fantasy Book Bingo Books

I’ve been working through the r/Fantasy 2021 Book Bingo this year:

2021 Book Bingo

Attack on Titan, Vol. 1

by Hajime Isayama

Hard Mode ✓

5 SFF Short Stories (Hard: An entire anthology or collection)

The Poppy War

by R.F. Kuang

Hard Mode ✓

Set in Asia (Hard: by an Asian author)

The Changeling

by Victor LaValle

Hard Mode ✓

r/Fantasy A to Z Genre Guide (Hard: by a BIPOC author)

The House in the Cerulean Sea

by T.J. Klune

Hard Mode ✓

Found Family (Hard: featuring an LGBTQ+ character)

The Scorpio Races

by Maggie Stiefvater

Hard Mode ✓

First person POV (Hard: Multiple)

The Wyrmling Horde

by David Farland

r/Fantasy Book Club (Hard: with participation)

Replaced with: Sequel: Not the First Book in the Series (2017)

The Borrowers Afield

by Mary Norton

Hard Mode ✓

New to you author (Hard: haven’t heard much about)

Mexican Gothic

by Silvia Moreno-Garcia

Hard Mode ✓

Gothic Fantasy (Hard: not in the Book Riot article)

Transmetropolitan, Vol. 1: Back on the Street

by Warren Ellis

Hard Mode ✓

Backlist book (Hard: published before 2000)

Red Sister

by Mark Lawrence

Hard Mode ✓

Revenge-seeking character (Hard: revenge as the major book plot)

Six Wakes

by Mur Lafferty

Hard Mode ✓

Mystery plot (Hard: not primary world urban fantasy)

Wild Sign

by Patricia Briggs

Hard Mode ✓

Comfort read (Hard: that isn’t a reread)

Tales of Nezura: Book 1: The Zevolra

by Randall Cooper

Hard Mode ✓

Debut novel (Hard: published in 2021)

Hellblazer, Vol. 1: Original Sins

by Jamie Delano

Hard Mode ✓

Cat squasher (500+ pages; Hard: 800+ pages)

Daemon Voices

by Philip Pullman

Hard Mode ✓

SFF-related nonfiction (Hard: published in the last 5 years)

Cece Rios and the Desert of Souls

by Kaela Rivera

Hard Mode ✓

Latinx or Latin American author (Hard: with fewer than 1000 Goodreads ratings)

Black Rain and Paper Cranes

by R.S. Craig

Hard Mode ✓

Self published (Hard: with fewer than 50 Goodreads ratings)

Annihilation

by Jeff VanderMeer

Hard Mode ✓

Forest setting (Hard: for the entire book)

Gideon the Ninth

by Tamsyn Muir

Hard Mode ✓

Genre mashup (Hard: of three or more genres)

The Midnight Library

by Matt Haig

Hard Mode ✓

Has chapter titles of more than one word (Hard: for every chapter)

An Alchemy of Masques and Mirrors

by Curtis Craddock

Hard Mode ✓

___ of ___ (Hard: and ___)

Project Hail Mary

by Andy Weir

Hard Mode ✓

First contact (Hard: that doesn’t lead to war)

Black Sun

by Rebecca Roanhorse

Trans or Nonbinary (Hard: protagonist)

The Long Way to a Small, Angry Planet

by Becky Chambers

Hard Mode ✓

Debut author (Hard: with an AMA)

A Great and Terrible Beauty

by Libba Bray

Hard Mode ✓

Witches (Hard: as the main protagonist)

One thing I’ve been having a bit of trouble with is working out which books fit which squares. There is a very active recommendations thread, but without a way to load the entire thread at once, it … isn’t great to search. So let’s make that easier.

First things first, let’s get the raw data. It turns out that Reddit has a wonderfully simple API to start with: just add .json to a URL to get a thread in JSON format. Example. It’s a bit of a weird format, but it’s parsable. From there, you have references to child nodes that you can download in turn to build one giant JSON object for the entire thread. That sounds like a fascinating problem, but this time around I just skipped it and used this code. Give it a thread, wait a bit (it’s a large thread), get JSON.
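To give a rough idea of what that download-and-flatten step involves, here is a minimal sketch. It is not the script linked above: thread_json_url and simplify are made-up helper names, the URL is a placeholder, and the nested 'Listing' layout is written from my understanding of Reddit's JSON output, so treat it as an assumption. It runs on a tiny hand-written sample rather than a real download, and it simply skips the 'more' stubs that a full downloader would have to fetch.

```python
import json

def thread_json_url(url):
    """Turn a thread URL into its JSON equivalent by appending .json."""
    return url.rstrip('/') + '.json'

def simplify(listing):
    """Reduce a Reddit-style 'Listing' to [{'body': ..., 'replies': [...]}]."""
    comments = []
    for child in listing['data']['children']:
        # 't1' marks a comment; 'more' stubs (unloaded children) are skipped here
        if child['kind'] != 't1':
            continue
        comment = child['data']
        replies = comment.get('replies')  # an empty string when there are none
        comments.append({
            'body': comment['body'],
            'replies': simplify(replies) if replies else [],
        })
    return comments

# Placeholder URL, for illustration only
url = thread_json_url('https://www.reddit.com/r/Fantasy/comments/abc123/example_thread/')

# A hand-written stand-in for a downloaded comment listing
sample = {'kind': 'Listing', 'data': {'children': [
    {'kind': 't1', 'data': {
        'body': '**Mystery Plot**',
        'replies': {'kind': 'Listing', 'data': {'children': [
            {'kind': 't1', 'data': {'body': 'Six Wakes by Mur Lafferty', 'replies': ''}},
            {'kind': 'more', 'data': {'children': ['abc123']}},
        ]}},
    }},
]}}

tree = simplify(sample)
print(url)
print(json.dumps(tree, indent=2))
```

The simplified shape (a list of dictionaries with 'body' and 'replies') is chosen to match what the search code later in this post expects.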

You could keep that as a JSON file, but I wanted to be sneaky/weird and put it straight in the script. It’s a fair chunk of data with a number of weird characters, so storing it in a string literal could be tricky… unless you just base64 encode the entire thing. You can then store it inline and get it all back out with data = json.loads(base64.b64decode('W3siYm9keS...')). It’s actually not that unusual an idea. You see the same thing with inline data: URI images in webpages, or with games that embed art assets directly in the compiled file for optimization/distribution reasons.
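For the curious, the round trip looks roughly like this. It is a small sketch with a made-up one-comment thread standing in for the real data; the decode line is the same call used in the script.

```python
import base64
import json

# Made-up stand-in for the downloaded thread, with some awkward characters
thread = [{'body': 'Café “quoted” text\nwith a newline', 'replies': []}]

# Encode: JSON text -> UTF-8 bytes -> base64, which is plain ASCII and
# therefore safe to paste inside a Python string literal
encoded = base64.b64encode(json.dumps(thread).encode('utf-8')).decode('ascii')

# Decode: json.loads accepts the bytes that b64decode returns
decoded = json.loads(base64.b64decode(encoded))

print(encoded[:10])       # 'W3siYm9keS' is just '[{"body' in base64
print(decoded == thread)  # the round trip is lossless
```

The trade-off is size: base64 uses four output characters for every three input bytes, so the embedded string is about a third larger than the JSON it holds.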

Next, parsing. In this case, the recommendations thread has one first level response for each of the categories in the bingo, but after that just about any level of response could contain book titles. So what we want is to search the JSON object recursively.

  • For dictionaries, search the ‘body’ (for text) and ‘replies’ (for further children)
  • For lists, search all entries (lists of replies)
  • For strings (bodies), search the text (case insensitive)
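
import base64
import json
import sys

# The whole thread, base64-encoded inline (truncated here)
data = json.loads(base64.b64decode('W3siYm9keS...'))

def search(key, data, path=None):
    """Recursively yield the path of comment bodies leading to each match."""
    path = path or []
    if isinstance(data, list):
        # A list of replies: search every entry
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        # A comment: check its own text, then descend into its replies
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        # A comment body: case-insensitive substring match
        if key.lower() in data.lower():
            yield path + [data]

def top_level_search(key):
    results = set()
    for result in search(key, data):
        # Loop body missing from the listing; reconstructed on the assumption
        # that it keeps the first path entry (the top-level category comment)
        results.add(result[0])
    return sorted(results)

for arg in sys.argv[1:]:
    # Loop body also missing; reconstructed to match the sample output below
    print(arg)
    for result in top_level_search(arg):
        print(result)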
I really do love generators in this case, with yield from. You can recursively scan through the entire structure and get a flat sequence of results back for free. I’m keeping track of the path through the nodes that I took to get to each match, although in the end top_level_search only reports the top-level comment (the bingo category) for each one. I printed the whole path at first, which was neat.
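If you want to see the whole-path idea in isolation, here is a pared-down sketch on a toy thread. The name find_paths and the sample comments are invented for illustration; it follows the same shape as search above but is not the exact code from the script.

```python
def find_paths(key, node, path=None):
    """Yield the chain of comment bodies leading to each match of key."""
    path = path or []
    if isinstance(node, list):
        # A list of comments: yield from flattens each child's results
        for child in node:
            yield from find_paths(key, child, path)
    elif isinstance(node, dict):
        body = node.get('body', '')
        if key.lower() in body.lower():
            yield path + [body]
        # Descend into replies, remembering this comment as part of the path
        yield from find_paths(key, node.get('replies', []), path + [body])

# Invented toy thread: two category comments, one with nested replies
thread = [
    {'body': '**Mystery Plot**', 'replies': [
        {'body': 'Any space mysteries?', 'replies': [
            {'body': 'Six Wakes fits', 'replies': []},
        ]},
    ]},
    {'body': '**Forest Setting**', 'replies': []},
]

paths = list(find_paths('six wakes', thread))
for path in paths:
    print(' > '.join(path))
# prints: **Mystery Plot** > Any space mysteries? > Six Wakes fits
```

The first element of each path is always the top-level comment, which is why keeping only that element is enough to name the bingo square.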

And as a result:

$ python3 ~/Dropbox/ 'Six Wakes'

Six Wakes
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

$ python3 ~/Dropbox/ 'Annihilation'

Annihilation
**First Contact** \- From Wikipedia:  Science Fiction about the first meeting between humans and extraterrestrial life, or of any sentient species' first encounter with another one, given they are from different planets or natural satellites. **HARD MODE:** War does not break out as a result of contact.
**First Person POV** \- defined as:  a literary style in which the narrative is told from the perspective of a narrator speaking directly about themselves. [Link for examples.]( **HARD MODE:**  There is more than one perspective, but each perspective is written in First Person.
**Forest Setting** \-  This setting must be used be for a good portion of the book. **HARD MODE:** The entire book takes place in this setting.
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

Pretty handy!