I’m a bit of an avid reader, or so I thought before I undertook this project. When I pulled my Goodreads data, I was shocked to see significantly fewer books than I remembered reading. I’ve been doing reading challenges for a few years now, all along the lines of 50 books a year, so why did I only have about 185 records? When I dug into the data, it became clear: I hadn’t actually used Goodreads to document my reading challenges prior to 2018.
This turn of events led me to my research question:
What did my usage of Goodreads look like before and after I started using it to document every single book I’ve been reading, and what publishers and authors did I most document?
My audience for these visualizations is pretty much limited to my friends and family. My friends in particular are in the literary community and would have a lot of context for the data in question, as I talked a lot about these books while reading them. They would probably be able to tell that I hadn’t documented the data completely.
Below is a dashboard consisting of three separate visualizations of data I documented on Goodreads: 1) a bar chart of the books I documented, broken down by month and year; 2) a bubble chart of the publishing imprints I documented most; and 3) a tree map showing the authors with the most books I documented reading. They can be filtered by year, and the colors in the bubble chart represent the average number of pages.
Overall, I wanted a layout that was organized, neat, and as legible as I could get it. I chose the primary color scheme as a kind of homage to the yellowing of book pages as they grow older. The charts below are the visualizations from the dashboard, broken down individually.
The bar chart is the most interesting visualization overall for seeing usage. 2018 and 2019 are the longest sections in it, with significantly higher bars, because that’s when I began using the built-in reading challenge on Goodreads, which allowed me to note down when I finished a book and counted it towards a real, visible goal. I chose a bar chart rather than a line chart or a scatter plot because the bar chart really illustrates how the data gets fuller around 2018 and 2019. I also chose to put the number of books above each bar so the counts can be read without checking the axis.
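As a rough sketch of the counting behind a chart like this one (not how the dashboard itself was built), the snippet below tallies finished books per year and month using only Python’s standard library. The sample rows are invented for illustration, and the `Date Read` field name and its `YYYY/MM/DD` format are assumptions about the export, not something confirmed here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: titles and dates are made up for illustration.
# The "Date Read" key and the YYYY/MM/DD format are assumptions.
rows = [
    {"Title": "Book A", "Date Read": "2014/03/02"},
    {"Title": "Book B", "Date Read": "2018/01/15"},
    {"Title": "Book C", "Date Read": "2018/01/28"},
    {"Title": "Book D", "Date Read": "2019/07/04"},
    {"Title": "Book E", "Date Read": ""},  # no finish date recorded
]


def books_per_month(rows):
    """Count books per (year, month), skipping rows with no finish date."""
    counts = Counter()
    for row in rows:
        raw = row.get("Date Read", "").strip()
        if not raw:
            continue  # undocumented finish dates cannot be placed on the chart
        finished = datetime.strptime(raw, "%Y/%m/%d")
        counts[(finished.year, finished.month)] += 1
    return counts


monthly = books_per_month(rows)
for (year, month), n in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {n}")
```

Rows without a finish date are skipped rather than guessed at, which mirrors the point of the chart: it shows what was documented, not everything that was read.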
The moment I thought of representing publishers, I felt the call of a bubble chart. It’s whimsical in the way publishing can feel when you engage with publishers’ social media, and it really represents the idea of big imprints versus smaller ones. When I created it, it was all one color, so I decided to add a variable for the average number of pages per publisher, which created some surprising results, like highlighting a much smaller publisher (Gollancz). The filter can do a lot of work here. For example, Tor is at the same size level as Orbit, which is an imprint of Little, Brown (part of Hachette, one of the Big Five), but if you add 2013, it becomes much, much bigger.
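To make the two variables in the bubble chart concrete, here is a minimal sketch of the aggregation it implies: book count per publisher (bubble size) and average page count (bubble color), with an optional year filter. The imprint names, page counts, and field names are hypothetical placeholders, not values from my data.

```python
from collections import defaultdict

# Hypothetical rows: imprint names, page counts, and years are invented.
rows = [
    {"Publisher": "Imprint A", "Pages": 300, "Year": 2018},
    {"Publisher": "Imprint A", "Pages": 500, "Year": 2019},
    {"Publisher": "Imprint A", "Pages": 400, "Year": 2013},
    {"Publisher": "Imprint B", "Pages": 800, "Year": 2018},
    {"Publisher": "Imprint B", "Pages": None, "Year": 2019},  # page count missing
]


def publisher_stats(rows, years=None):
    """Return {publisher: (book_count, average_pages)}.

    If `years` is given, only rows from those years are included.
    Books with no page count still add to the count but are left
    out of the average.
    """
    counts = defaultdict(int)
    pages = defaultdict(list)
    for row in rows:
        if years is not None and row["Year"] not in years:
            continue
        name = row["Publisher"]
        counts[name] += 1
        if row["Pages"] is not None:
            pages[name].append(row["Pages"])
    return {
        name: (counts[name],
               sum(pages[name]) / len(pages[name]) if pages[name] else None)
        for name in counts
    }


all_years = publisher_stats(rows)
recent_only = publisher_stats(rows, years={2018, 2019})
print(all_years)
print(recent_only)
```

In this toy data, the two imprints tie at two books each for 2018 and 2019, and including 2013 pushes one ahead, which is the kind of shift the year filter produces in the chart.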
The tree map ended up being one of my favorites because, as in the bubble chart, the filter significantly changes what you see. In 2014, I documented a lot of what I read by Brandon Sanderson, but he drops almost completely off the map if you exclude 2014, and the forefront of the map gets mostly taken up by female authors. I chose a tree map because I wanted to be able to see the authors’ names while also seeing just how many books I’d been reading and documenting by those authors.
Most of the challenges I ran into with the data were tied to the cleaning, as I suspected when I began this. There was a lot of missing data, some of which I just couldn’t find or get a hold of. I tried to find the missing data, first contacting the NYPL to see if they had my checkout history, and then combing through my old social media posts to see if I could find mentions of the books I read (which was surprisingly exhausting, emotionally). However, doing that made me realize I was working with data that was not representative of my reading habits as a whole, but rather of how I documented my reading on this one specific website. It’s one thing to read a book, and it’s a whole different thing to sit down, search for the book, mark it read, remember what dates I read it, and then star and review it, especially back when the mobile app wasn’t as usable. I had actually listed out the books in my Notes app instead from 2014 to 2016, and thought about pulling that data in, but in the end I found that there was a much more compelling story in the fact that I’ve started to actually use Goodreads to track my reading over the past two years.
One of the things I wanted to do with this project, but ended up scrapping for lack of time and sheer lack of data, was a visualization of my actual reading habits. I’d love to be able to dig deeper and compare my actual reading habits to what I was able to document on Goodreads. From what I can see so far, there is some crossover between the data, but the missing data amounts to around 40-50 records for 2014, 2015, and 2016 combined. My estimate for 2017’s missing data is around 30-40 records, so it would be really interesting to pull up an area chart of both sets of records to compare them. If I could also list out which imprints belong to which of the “Big Five” publishing companies, I think that would be a really compelling visualization.