Categorizing r/Fantasy Book Bingo Books

I’ve been working through the r/Fantasy 2021 Book Bingo this year:

2021 Book Bingo

  • SFF anthology or collection: Attack on Titan, Vol. 1
  • Set in Asia (Hard: by an Asian author)
  • r/Fantasy A to Z Genre Guide (Hard: by a BIPOC author)
  • Found Family (Hard: featuring an LGBTQ+ character)
  • First person POV (Hard: Multiple)
  • r/Fantasy Book Club (Hard: with participation)
  • New to you author (Hard: haven’t heard much about)
  • Gothic Fantasy (Hard: not in the Book Riot article): Mexican Gothic (Hard Mode)
  • Backlist book (Hard: published before 2000)
  • Revenge-seeking character (Hard: revenge as the major book plot)
  • Mystery plot (Hard: not primary world urban fantasy): Six Wakes (Hard Mode)
  • Comfort read (Hard: that isn’t a reread): Wild Sign (Hard Mode)
  • Debut novel (Hard: published in 2021): Tales of Nezura: Book 1: The Zevolra (Hard Mode)
  • Cat squasher (500+ pages; Hard: 800+ pages)
  • SFF-related nonfiction (Hard: published in the last 5 years)
  • Latinx or Latin American author (Hard: with fewer than 1000 Goodreads ratings)
  • Self published (Hard: with fewer than 50 Goodreads ratings): Tales of Nezura: Book 1: The Zevolra (Hard Mode)
  • Forest setting (Hard: for the entire book): Annihilation (Hard Mode)
  • Genre mashup (Hard: of three or more genres): Gideon the Ninth (Hard Mode)
  • Has chapter titles of more than one word (Hard: for every chapter): The Midnight Library (Hard Mode)
  • ___ of ___ (Hard: and ___)
  • First contact (Hard: that doesn’t lead to war): Project Hail Mary (Hard Mode)
  • Trans or Nonbinary (Hard: protagonist)
  • Debut author (Hard: with an AMA)
  • Witches (Hard: as the main protagonist): A Great and Terrible Beauty (Hard Mode)

One thing I’ve been having a bit of trouble with is figuring out which books fit which squares. There is a very active recommendations thread, but without the ability to load the entire thread at once, it … isn’t great to search. So let’s make that easier.

First things first, let’s get the raw data. It turns out that Reddit has a wonderfully simple API to start with: just add .json to a thread’s URL to get it in JSON format. It’s a bit of a weird format, but it’s parsable. From there, you have references to child nodes that you can download in order to build one giant JSON object for the entire thread. That sounds like a fascinating problem, but this time around I just skipped it and used this code. Give it a thread, wait a bit (for such a large thread), get JSON.
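Here’s a rough sketch of that first step. It is not the script I actually used, and it assumes the usual shape of Reddit’s thread JSON: a two-element list of listings (the post, then the comments), where each comment is a `t1` object whose `replies` is either another listing or an empty string, and unloaded children show up as `more` objects holding a list of comment IDs. The names `thread_json_url`, `fetch_thread`, `simplify_listing`, and `simplify_thread` are hypothetical. It flattens a single response into the `body`/`replies` shape the search code below expects (the embedded blob starts with `W3siYm9keS`, which decodes to `[{"body`), and just collects the `more` IDs rather than fetching them.

```python
import json
import urllib.request


def thread_json_url(url):
    """Reddit serves a JSON version of a thread if you append .json to its URL."""
    return url.rstrip('/') + '.json'


def fetch_thread(url, user_agent='book-bingo-sketch/0.1'):
    """Download one thread's JSON. Assumption: Reddit wants a descriptive
    User-Agent and tends to throttle generic ones, so we set our own."""
    request = urllib.request.Request(thread_json_url(url), headers={'User-Agent': user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def simplify_listing(listing, more_ids):
    """Convert a Reddit Listing into a list of {'body', 'replies'} dicts.
    'more' placeholders are not fetched; their comment IDs are appended to more_ids."""
    comments = []
    if not isinstance(listing, dict):
        # A comment with no replies has replies == '' rather than an empty Listing
        return comments
    for child in listing['data']['children']:
        if child['kind'] == 't1':  # an actual comment
            data = child['data']
            comments.append({
                'body': data['body'],
                'replies': simplify_listing(data.get('replies'), more_ids),
            })
        elif child['kind'] == 'more':  # children not included in this response
            more_ids.extend(child['data']['children'])
    return comments


def simplify_thread(thread):
    """A thread's .json is [post_listing, comment_listing]; keep only the comments."""
    more_ids = []
    return simplify_listing(thread[1], more_ids), more_ids
```

Something like `comments, missing = simplify_thread(fetch_thread('<thread-url>'))` gives a first pass. For a thread as large as the recommendations thread, `missing` is likely non-empty, and expanding those IDs is exactly the part I skipped.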

You could keep that as a JSON file, but I wanted to be sneaky/weird and put it straight in the script. It’s a fair chunk of data with a number of weird characters, so storing it inline could be tricky… unless you just base64-encode the entire thing. Then you can store it directly in the source and get it all back out with data = json.loads(base64.b64decode('W3siYm9keS...')). It’s actually not that unusual an idea: you see the same thing with inline data: URIs for images in webpages, or with games that embed art assets directly in the compiled file for optimization/distribution reasons.
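The embedding trick is just a round trip through `json.dumps`, UTF-8, and base64. A minimal sketch (the `encode_inline` and `decode_inline` names are mine, for illustration), including a line that prints something ready to paste into a script:

```python
import base64
import json


def encode_inline(obj):
    """JSON-serialize obj, then base64-encode it into a plain ASCII string."""
    return base64.b64encode(json.dumps(obj).encode('utf-8')).decode('ascii')


def decode_inline(blob):
    """Reverse of encode_inline: base64-decode to bytes, then parse the JSON."""
    return json.loads(base64.b64decode(blob))


if __name__ == '__main__':
    # Hypothetical sample: quotes, backslashes, and non-ASCII survive the round trip
    tree = [{'body': 'Mystery Plot \\- "quotes", café', 'replies': []}]
    blob = encode_inline(tree)
    assert decode_inline(blob) == tree
    # Emit a line that can be pasted straight into the search script
    print(f'data = json.loads(base64.b64decode({blob!r}))')
```

Since the base64 alphabet is only letters, digits, `+`, `/`, and `=`, the result never needs escaping inside a Python string literal.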

Next, parsing. In this case, the recommendations thread has one top-level comment for each bingo category, but below that, a reply at just about any depth could mention book titles. So what we want is to search the JSON object recursively:

  • For dictionaries, search the ‘body’ (for text) and ‘replies’ (for further children)
  • For lists, search all entries (lists of replies)
  • For strings (bodies), search the text (case insensitive)
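
```python
import base64
import json
import sys

# The entire thread, stored inline as base64-encoded JSON
data = json.loads(base64.b64decode('W3siYm9keS...'))

def search(key, data, path=None):
    '''Recursively yield the path (list of comment bodies) to every body containing key.'''
    path = path or []

    if isinstance(data, list):
        # A list of replies: search each one
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        # A comment: search its own text, then its replies with this body added to the path
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        # Comment text: case-insensitive substring match
        if key.lower() in data.lower():
            yield path + [data]

def top_level_search(key):
    '''Return the sorted, de-duplicated top-level comments with a match anywhere beneath them.'''
    results = set()
    for result in search(key, data):
        results.add(result[0])
    return sorted(results)

for arg in sys.argv[1:]:
    print(arg)
    for result in top_level_search(arg):
        print(result)
    print()
```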

I really do love generators (with yield from) for this. You can recursively scan through the entire structure and get a flat sequence of results essentially for free. Here, I’m keeping track of the path of comments taken to reach each match, although top_level_search only keeps the first element of each path: the top-level comment, i.e. the bingo category (I printed the whole path at first, which was neat).

And as a result:

```
$ python3 ~/Dropbox/book-bingo.py 'Six Wakes'
Six Wakes
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)

$ python3 ~/Dropbox/book-bingo.py 'Annihilation'
Annihilation
**First Contact** \- From Wikipedia:  Science Fiction about the first meeting between humans and extraterrestrial life, or of any sentient species' first encounter with another one, given they are from different planets or natural satellites. **HARD MODE:** War does not break out as a result of contact.
**First Person POV** \- defined as:  a literary style in which the narrative is told from the perspective of a narrator speaking directly about themselves. [Link for examples.](https://examples.yourdictionary.com/examples-of-point-of-view.html) **HARD MODE:**  There is more than one perspective, but each perspective is written in First Person.
**Forest Setting** \-  This setting must be used be for a good portion of the book. **HARD MODE:** The entire book takes place in this setting.
**Mystery Plot** \- The main plot of the book centers around solving a mystery. **HARD MODE:** Not a primary world Urban Fantasy (secondary world urban fantasy is okay!)
```

Pretty handy!
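For the whole-path version mentioned earlier, a minimal sketch: it reuses the same recursive `search` generator, plus a hypothetical `describe` helper that joins the first line of each body along the path (truncated to `width` characters) with ` > `. The tiny thread here is made up, just in the same `body`/`replies` shape.

```python
def search(key, data, path=None):
    """Same recursive search as above: yield the path of bodies down to each match."""
    path = path or []
    if isinstance(data, list):
        for child in data:
            yield from search(key, child, path)
    elif isinstance(data, dict):
        if body := data.get('body'):
            yield from search(key, body, path)
            yield from search(key, data.get('replies'), path + [body])
    elif isinstance(data, str):
        if key.lower() in data.lower():
            yield path + [data]


def describe(path, width=40):
    """Hypothetical helper: first line of each body, truncated, joined with ' > '."""
    return ' > '.join(step.split('\n', 1)[0][:width] for step in path)


# A tiny, made-up thread in the same {'body', 'replies'} shape
tree = [
    {'body': '**Mystery Plot**\nThe main plot centers around a mystery.', 'replies': [
        {'body': 'Six Wakes?', 'replies': []},
        {'body': 'Seconding this!', 'replies': [
            {'body': 'Also try Annihilation.', 'replies': []},
        ]},
    ]},
    {'body': '**Forest Setting**\nThis setting must be used for a good portion.', 'replies': [
        {'body': 'Annihilation, definitely.', 'replies': []},
    ]},
]

for path in search('annihilation', tree):
    print(describe(path))
# prints:
# **Mystery Plot** > Seconding this! > Also try Annihilation.
# **Forest Setting** > Annihilation, definitely.
```

Showing only the first line of each body keeps the long category descriptions from drowning out the actual recommendation chain.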