Categorizing r/Fantasy Book Bingo Books

I’ve been working through the r/Fantasy 2021 Book Bingo this year:

2021 Book Bingo

  • 5 SFF Short Stories (Hard: An entire anthology or collection): Attack on Titan, Vol. 1 (Hard Mode)
  • Set in Asia (Hard: by an Asian author): The Poppy War (Hard Mode)
  • r/Fantasy A to Z Genre Guide (Hard: by a BIPOC author): The Changeling (Hard Mode)
  • Found Family (Hard: featuring an LGBTQ+ character): The House in the Cerulean Sea (Hard Mode)
  • First person POV (Hard: Multiple): The Scorpio Races (Hard Mode)
  • r/Fantasy Book Club (Hard: with participation), replaced with Sequel: Not the First Book in the Series (2017): The Wyrmling Horde
  • New to you author (Hard: haven’t heard much about): The Borrowers Afield (Hard Mode)
  • Gothic Fantasy (Hard: not in the Book Riot article): Mexican Gothic (Hard Mode)
  • Backlist book (Hard: published before 2000): Transmetropolitan, Vol. 1: Back on the Street (Hard Mode)
  • Revenge-seeking character (Hard: revenge as the major book plot): Red Sister (Hard Mode)
  • Mystery plot (Hard: not primary world urban fantasy): Six Wakes (Hard Mode)
  • Comfort read (Hard: that isn’t a reread): Wild Sign (Hard Mode)
  • Debut novel (Hard: published in 2021): Tales of Nezura: Book 1: The Zevolra (Hard Mode)
  • Cat squasher (500+ pages; Hard: 800+ pages): Hellblazer, Vol. 1: Original Sins (Hard Mode)
  • SFF-related nonfiction (Hard: published in the last 5 years): Daemon Voices (Hard Mode)
  • Latinx or Latin American author (Hard: with fewer than 1000 Goodreads ratings): Cece Rios and the Desert of Souls (Hard Mode)
  • Self published (Hard: with fewer than 50 Goodreads ratings): Black Rain and Paper Cranes (Hard Mode)
  • Forest setting (Hard: for the entire book): Annihilation (Hard Mode)
  • Genre mashup (Hard: of three or more genres): Gideon the Ninth (Hard Mode)
  • Has chapter titles of more than one word (Hard: for every chapter): The Midnight Library (Hard Mode)
  • ___ of ___ (Hard: and ___): An Alchemy of Masques and Mirrors (Hard Mode)
  • First contact (Hard: that doesn’t lead to war): Project Hail Mary (Hard Mode)
  • Trans or Nonbinary (Hard: protagonist): Black Sun
  • Debut author (Hard: with an AMA): The Long Way to a Small, Angry Planet (Hard Mode)
  • Witches (Hard: as the main protagonist): A Great and Terrible Beauty (Hard Mode)

One thing I’ve had a bit of trouble with is figuring out which categories a given book fits. There is a very active recommendations thread, but without the ability to load the entire thread, it … isn’t great to search. So let’s make that easier.

First things first, let’s get the raw data. It turns out that Reddit has a wonderfully simple API to start with: just add .json to a URL to get a thread in JSON format. Example. It’s a bit of a weird format, but it’s parsable. From there, the JSON contains references to child comments that weren’t included, which you can download in turn to build one giant JSON object for the entire thread. That sounds like a fascinating problem, but this time around I skipped it and used this code. Give it a thread, wait a bit (it’s a large thread), and get JSON.
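This is not the script used for the post. It is a rough sketch of what the fetching step involves, and it assumes the usual shape of Reddit’s .json output. That output is a two-element list: the post, then a Listing of comments. Each comment (kind t1) has a body, plus a replies field that is either an empty string or another Listing. The function names and the User-Agent string are made up for illustration. The kind "more" stubs, which only hold IDs of comments that would need extra requests, are simply skipped rather than expanded.

```python
import json
import urllib.request


def fetch_thread(url):
    '''Download a Reddit thread as JSON by appending .json to its URL.'''
    # Assumption: Reddit prefers a descriptive User-Agent; this one is made up
    request = urllib.request.Request(
        url.rstrip('/') + '.json',
        headers={'User-Agent': 'book-bingo-sketch/0.1'},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten(listing):
    '''Convert a Reddit Listing into nested {'body': ..., 'replies': [...]} dicts.

    'more' stubs only contain IDs of comments that were not included,
    so this sketch skips them instead of fetching them.
    '''
    comments = []
    if not isinstance(listing, dict):  # 'replies' is '' when there are none
        return comments
    for child in listing.get('data', {}).get('children', []):
        if child.get('kind') != 't1':  # t1 = comment; skip 'more' stubs
            continue
        comment = child['data']
        comments.append({
            'body': comment.get('body', ''),
            'replies': flatten(comment.get('replies')),
        })
    return comments


# Hypothetical usage (thread URL elided):
# thread = fetch_thread('https://www.reddit.com/r/Fantasy/comments/...')
# data = flatten(thread[1])  # thread[0] is the post, thread[1] the comments
```

The nested body/replies shape is the one the search code later in the post walks. Expanding the skipped "more" stubs is the part this sketch leaves out.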

You could keep that as a JSON file, but I wanted to be sneaky/weird and put it straight in the script. It’s a fair chunk of data with a number of awkward characters (quotes, backslashes, non-ASCII text), so storing it as a string literal could be tricky… unless you just base64 encode the entire thing. You can then store it straight inline and get it all back out with data = json.loads(base64.b64decode('W3siYm9keS...')). It’s actually not that unusual an idea. You see the same thing with inline data: URIs for images in webpages, or games that embed art assets directly in the compiled file for optimization/distribution reasons.
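Here is a minimal sketch of the round trip, assuming the data is plain JSON-serializable Python. The helper names are hypothetical. Encoding produces text drawn only from the base64 alphabet (A–Z, a–z, 0–9, +, / and =), so it can be pasted between single quotes without any escaping.

```python
import base64
import json


def encode_inline(obj):
    '''Serialize obj to JSON, then base64 it into a plain-ASCII string.'''
    return base64.b64encode(json.dumps(obj).encode('utf-8')).decode('ascii')


def decode_inline(encoded):
    '''Reverse encode_inline; json.loads accepts the decoded bytes directly.'''
    return json.loads(base64.b64decode(encoded))


# Hypothetical miniature thread containing the kinds of characters that
# make a raw string literal awkward
thread = [{'body': 'Curly “quotes”, \'apostrophes\', and \\backslashes\\', 'replies': []}]
encoded = encode_inline(thread)

# Emit a line that can be pasted straight into the script
print(f"data = json.loads(base64.b64decode('{encoded}'))")
```

Incidentally, the W3siYm9keS prefix in the post is just the base64 encoding of [{"body. That fits a top-level list of comment dicts, each with a body key.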

Next, parsing. The recommendations thread has one top-level comment for each of the bingo categories, but below that, a reply at just about any depth could contain book titles. So what we want is to search the JSON object recursively:

  • For dictionaries, search the ‘body’ (for text) and ‘replies’ (for further children)
  • For lists, search all entries (lists of replies)
  • For strings (bodies), search the text (case insensitive)
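import base64
import json
import sys

# The entire thread, pre-fetched and base64-encoded inline (truncated here)
data = json.loads(base64.b64decode('W3siYm9keS...'))

def search(key, data, path=None):
    '''Recursively yield the chain of comment bodies leading to each match of key.'''
    path = path or []

    if isinstance(data, list):
        # A list of replies: search each one
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        # A comment: search its text, then its replies (extending the path)
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        # Comment text: case-insensitive substring match
        if key.lower() in data.lower():
            yield path + [data]

def top_level_search(key):
    '''Return the distinct top-level (category) comments whose subtree mentions key.'''
    results = set()
    for result in search(key, data):
        results.add(result[0])
    return sorted(results)

for arg in sys.argv[1:]:
    print(arg)
    for result in top_level_search(arg):
        print(result)
    print()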

I really do love generators (with yield from) for this. You can recursively scan through the entire structure and get a flat sequence of results for free. I’m also keeping track of the path through the comments that led to each match, although top_level_search only keeps the first element of each path, the top-level category comment. (I returned the whole path at first, which was neat.)

And as a result:

$ python3 ~/Dropbox/book-bingo.py 'Six Wakes'

Six Wakes
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

$ python3 ~/Dropbox/book-bingo.py 'Annihilation'

Annihilation
**First Contact** \- From Wikipedia:  Science Fiction about the first meeting between humans and extraterrestrial life, or of any sentient species' first encounter with another one, given they are from different planets or natural satellites. **HARD MODE:** War does not break out as a result of contact.
**First Person POV** \- defined as:  a literary style in which the narrative is told from the perspective of a narrator speaking directly about themselves. [Link for examples.](https://examples.yourdictionary.com/examples-of-point-of-view.html) **HARD MODE:**  There is more than one perspective, but each perspective is written in First Person.
**Forest Setting** \-  This setting must be used be for a good portion of the book. **HARD MODE:** The entire book takes place in this setting.
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

Pretty handy!
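For the whole-path version mentioned earlier, here is a hypothetical reconstruction, not the original code. It reuses the same recursive generator and joins each path into a breadcrumb like category > reply > matching comment. Category comments can be long, so this sketch keeps only the first line of each body. The sample thread is made up and only mirrors the body/replies shape the script expects.

```python
def search(key, data, path=None):
    '''Yield the chain of comment bodies leading to each case-insensitive match.'''
    path = path or []
    if isinstance(data, list):
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        if key.lower() in data.lower():
            yield path + [data]


def full_path_search(key, data):
    '''Render every match as a breadcrumb of first lines, in thread order.'''
    return [' > '.join(part.splitlines()[0] for part in path)
            for path in search(key, data)]


# Hypothetical miniature thread
sample = [
    {'body': 'Mystery Plot\nHARD MODE: Not a primary world Urban Fantasy', 'replies': [
        {'body': 'Six Wakes', 'replies': [
            {'body': 'Seconding Six Wakes!', 'replies': []},
        ]},
    ]},
    {'body': 'Forest Setting\nHARD MODE: The entire book takes place in this setting.', 'replies': [
        {'body': 'Annihilation', 'replies': []},
    ]},
]

for line in full_path_search('six wakes', sample):
    print(line)
```

Unlike top_level_search, this keeps duplicates. A book recommended and then seconded shows up once per comment, which is noisier but shows where each mention came from.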