Using Calibre to embrace my inner librarian for ebooks

I have used Calibre for years to manage all my ebooks. It started way back when Kindle was doing a huge business in free ebooks, with authors pushing freebies of their work. Some good, some slush, all free. But it meant a LOT of ebooks to manage. So I tried a couple of programs, most of which were little more than list managers in a database format: “collection” managers adapted from album, CD or physical book trackers.

Calibre was different. It had lots of fields, and it kept multiple formats of the books together. And you could even convert from one format to another. Digital Rights Management was a small, noisy fly easily swatted away by entering your Kindle serial number into a plugin, and everyone justified it by saying they were making “backups” of their books in case Amazon ever went away. Or something like that.

Over the years, I’ve played with multiple options. I tried having different libraries for different things, like a library for mysteries, a library for non-fiction, or a library for books I finished reading. Sometimes it took way too long to move between libraries, sometimes it was fine. I like to call those options Poly Library 1.0.

Poly Library 2.0

Eventually, I went back to a single library and realized that what I was really trying to do was create a good workflow. The most basic workflow for books is a To Be Read pile/category, an Active Reading pile/category and a Finished pile/category.

Of course, it gets a bit messy with just three piles. What about ones that I have finished reading, but I haven’t reviewed yet? That clearly goes between reading and being completely finished.

And what about new books that I add to the library but I haven’t catalogued yet or validated that the format is readable, etc.? I created an “intake” heading.

But wait, there’s more. My anal-retentive inner librarian showed up.

And suddenly I had folders for TBR – Fiction and – Non-fiction; Mystery – Series and – Standalone; Fantasy / Sci Fi – Series and – Standalone; Non-fiction folders for – Astronomy, – Biography, – Books & Writing, – Business, – Goals, – Government, – Health, – Hobbies and Crafts, – HR, – Learning, and – Other.

Most metadata is automatically imported by plugins that scrape Goodreads, LibraryThing, WorldCat, Amazon, Indigo, Google Books, Smashwords, and more. I don’t really have to “catalogue” the books much; I mostly just clean up the data. If the title says “My Big Beautiful Life: A Novel”, I tend to take the “A Novel” part out, and I make sure books that start with “A” or “The” sort properly. Not every site formats things the same way, so there’s a small cleansing role.

Most of these sound like simple tags, and in most library setups in Calibre, that would be true. But I got cute. I discovered that if you create a custom category for Workflow (for example) and make it single-option, i.e., a book can’t carry more than one of its values at the same time, you essentially get a virtual workflow where things start at “Intake” and go all the way to “Final – Fiction” or “Final – Non-fiction”.
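The single-value workflow idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Calibre stores a custom column: the stage names are loosely based on the piles described in this post, and the `Book` class and `advance` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Workflow stages in order. Names are illustrative only."""
    INTAKE = 1
    TBR = 2
    READING = 3
    REVIEWING = 4
    FINAL = 5


@dataclass
class Book:
    title: str
    # One field holding exactly one Stage, so a book can never sit in two piles.
    stage: Stage = Stage.INTAKE


def advance(book: Book) -> Stage:
    """Move a book to the next stage; books already at FINAL stay there."""
    if book.stage is not Stage.FINAL:
        book.stage = Stage(book.stage.value + 1)
    return book.stage


book = Book("Sample Title")  # hypothetical record
advance(book)                # INTAKE -> TBR
advance(book)                # TBR -> READING
print(book.stage.name)       # READING
```

Because the field holds a single value rather than a list of tags, "moving" a book is just replacing that value, which is also why one bad bulk edit can overwrite the stage of every selected book at once.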

Except I borked it. I was playing with the database after making major revisions, and I hadn’t backed up in the couple of weeks I’d been working on this part. I went to highlight about 20 books and move them from one workflow category to another. Except, oops, I accidentally clicked the category twice instead of once and didn’t notice. If I click it once, I get all the books in “TBR – Fiction” (about 20), which I could then move to Standalone Fiction. If I click it TWICE, which I did, it doesn’t show all the books in that category; it shows all the books that are NOT in that category. So the whole library, except for those 20. And I moved all of them to the new category.

Did I mention that, while that sounds relatively simple, the database part is actually really quite complicated? Thousands of books with one change in them. Not one change easily undone, but several thousand little changes in sequence. And you CAN’T undo it. It’s done. Permanent. Without a backup, no way to revert the index. Frak.

I asked for help online, and the best advice was basically, “Next time, do a backup, dodo bird!” Ook.
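For what it’s worth, the backup itself doesn’t need to be fancy. A Calibre library is a folder containing a `metadata.db` file plus the book files, so a minimal sketch, assuming Calibre is closed while it runs, is to zip that folder with a timestamp before any bulk edit. The `backup_library` name and the paths in the example call are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_library(library_dir: str, backup_dir: str) -> Path:
    """Zip the whole library folder into backup_dir and return the archive path."""
    library = Path(library_dir)
    if not (library / "metadata.db").exists():
        raise FileNotFoundError(f"{library} does not look like a Calibre library")

    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = target_dir / f"calibre-backup-{stamp}"
    # make_archive appends ".zip" and returns the full path of the archive.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(library))
    return Path(archive)


# Example call (hypothetical paths; keep the backup folder outside the library):
# backup_library("/home/me/Calibre Library", "/home/me/backups")
```

Restoring is then a matter of unzipping the archive over (or beside) the damaged library folder, which brings back the database and the book files together.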

It sounds bad, but honestly, I could easily revert everything to the basic three buckets: TBR, active, and finished. I had more categories than that, but I also had a lot of books that were not sorted well.

Hmm…perhaps this is an opportunity in disguise! Enter Classification Man! A super hero librarian with the resources of the internet to design the ultimate in metadata sorting and fields. The ultimate library setup. Muahhahahah!

(Sorry, that laugh makes me think he’s more of a Super Villain than a Super Hero. But I digress.)

Playing with a classification “menu”

With all the time and energy I’ve put into tweaks over the years, I thought it was time to do some serious analysis before I start PolyLibrary 3.0.

The first area of “tags” is generally what I would call the “book profile”. It includes the obvious ones from any list, like the title and author, although it gets a little more sophisticated in the details. The title includes the title itself (i.e., the “presentation” title) as well as a field for the sort order. So a book like “The Whispering Pines” would show up in presentation as “The Whispering Pines” but in the sort field as “Whispering Pines, The”. Authors get a little more sophisticated still: if you enter a collaboration as “John Smith and Jane Doe”, Calibre will treat that as one author’s name. If, instead, you enter “John Smith & Jane Doe”, it will treat it as two authors. The title is an important field, and I’m not sure whether it allows listing both ways, like the author field does, so I’ll experiment to see if I can add a subtitle option that lets it show either way. I really don’t like when it says “Make It So: The blah blah blah of Captain Jean Luc Picard”; I just want the main title. As I mentioned, I can download the metadata from various sites so that I don’t have to enter it myself, but every site varies slightly, so I still have to clean it up.
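The title and author rules above are easy to picture as small string helpers. The sketch below is a standalone illustration, not Calibre’s own code: `title_sort`, `split_authors`, and `main_title` are hypothetical names, and the article list covers English only.

```python
ARTICLES = ("a", "an", "the")


def title_sort(title: str) -> str:
    """Move a leading article to the end: 'The X' -> 'X, The'."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return title


def split_authors(author_field: str) -> list[str]:
    """Split on '&' only; the word 'and' stays inside a single name."""
    return [name.strip() for name in author_field.split("&") if name.strip()]


def main_title(title: str) -> str:
    """Drop everything after the first colon, treating it as a subtitle."""
    return title.split(":", 1)[0].strip()


print(title_sort("The Whispering Pines"))            # Whispering Pines, The
print(split_authors("John Smith & Jane Doe"))        # ['John Smith', 'Jane Doe']
print(split_authors("John Smith and Jane Doe"))      # ['John Smith and Jane Doe']
print(main_title("My Big Beautiful Life: A Novel"))  # My Big Beautiful Life
```

The colon rule is deliberately blunt: a book whose real title contains a colon would be cut short, which is one argument for keeping the subtitle in its own field instead of throwing it away.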

Of course, the title and author fields are not nearly enough. There is also a publisher field, the publication date, the book’s formats (i.e., which e-formats I have, not which other formats it comes in), and a cover. Technically, the cover isn’t really part of the database; it’s just a link to an image file stored separately with the book, rather than embedded (i.e., if the book has it embedded, that’s a separate thing). And then there is the biggie — an ID number.

Calibre actually has space for three ID numbers. It has a “Universal Unique ID” (UUID), a long alphanumeric string it generates for each book, so the database can never confuse the record with any other. It has nothing specific to do with the book; it was just generated so it can be tracked in the database. It also has a relatively simple ID, which is more like “which record is it?” i.e., #1, 2, 3, 4, 5. Except, of course, like any good little database, you can move stuff around, copy it to other libraries, copy it back, etc. The simple ID can change, the UUID will not. And then there is the ID field for the book’s public ID numbers, like an ISBN # that all commercial books use, an ASIN number that Amazon uses, a Goodreads tracker number, a DOI #, etc. There are quite a few that get tracked by the field so that it can sync with various sites. Which is really useful when you have an ebook that doesn’t have ANY public ID numbers (often indie- or self-published ebooks on non-large commercial sites don’t have the big numbers that everyone else does, partly because some countries charge fees for ISBN #s, although in Canada it is free).
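To make the three kinds of ID concrete, here is a rough sketch of a record that carries all of them. It is an illustration under assumptions, not Calibre’s schema: the `BookRecord` class is hypothetical, the identifier values are placeholders, and the parser assumes public IDs are written as comma-separated `scheme:value` pairs.

```python
import uuid
from dataclasses import dataclass, field


def parse_identifiers(text: str) -> dict[str, str]:
    """Turn 'isbn:111, asin:B22' into {'isbn': '111', 'asin': 'B22'}."""
    result = {}
    for pair in text.split(","):
        scheme, sep, value = pair.strip().partition(":")
        if sep and scheme and value:
            result[scheme.lower()] = value
    return result


@dataclass
class BookRecord:
    record_id: int  # simple "which record is it?" number; can differ between libraries
    title: str
    # Public IDs (ISBN, ASIN, ...); may legitimately be empty for indie books.
    identifiers: dict[str, str] = field(default_factory=dict)
    # Generated once for the record; says nothing about the book itself.
    book_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


# Placeholder values, not real identifiers.
commercial = BookRecord(1, "Commercial Book", parse_identifiers("isbn:111, asin:B22"))
indie = BookRecord(2, "Indie Book")

print(commercial.identifiers)  # {'isbn': '111', 'asin': 'B22'}
print(indie.identifiers)       # {}
print(len(indie.book_uuid))    # 36
```

The indie record shows the point made above: even with no public ID numbers at all, the record still has a generated UUID to identify it.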

There are also default fields for three dates: the published date I mentioned above; the date and timestamp when the book was added to Calibre; and the date when the record was last modified. I find it a bit amusing how many people online, and even some of the documentation, describe the third field as the date the book was modified. It isn’t “changing the book”; it is changing the metadata for the book, basically updating the catalogue information only (although, technically, Calibre IS powerful enough to edit the actual book file in many cases). You know, updating the database record for that book. Because in the end, that is what Calibre is: a database with fields for all this info, including links to the actual ebooks themselves. Which brings me to the last field in the main area, the path to the folder where the books are stored.

Those are the main fields. You can add as many as you want, and as part of my inner librarian duties, I looked into what else people use in this tag category. Some like to add information about physical copies, including condition, where they are kept (in different libraries in the house or loaned to someone), trim size, weight, whether they are signed copies, etc. None of which is really useful to me in the “ebook” world, as I’ve purged almost my entire physical library. I am considering one simple field, though, to flag whether I still have a paper copy, too.

Another group of people are really into the production elements of the books. Are there different editions? Is there a formal subtitle (mentioned above)? What about editors or translators? Or even library catalogue info like Dewey decimal numbers or BIPAD/ISSN numbers for periodicals. In a similar vein, some people read online books that might have multiple versions or publication and/or revision dates. Most of which don’t really apply to my usage.

There are even those who want to get hardcore into the Digital Rights Management side of things, including the DRM status at purchase, what it is now, whether it’s a personal copy, and so on. I understand their interest; I don’t share their desire.

There is a last sub-category that I find interesting, before I come to a gap in the above framework. There is a plugin for Calibre, and several online sites, that track other details about books, documents, etc. It is a literacy overview, of sorts, with the # of pages, the # of words, and with the help of the plugin, an estimated literacy grade of the level of reading difficulty. I love all three, I really do, and I have them for every finished book, and yet I do nothing with that info. I have no idea what it would be good for, particularly as it is a generic set of numbers unique to how **I** calculate it or rather how I have the system calculate it. It isn’t a formal piece of information that the publishers always provide. I’d also like to include an estimated reading time, but that’s just a rough estimate. Average reading speeds range from 200 to 300 words per minute, so any estimate would depend on what number I choose. I read closer to the high end, while others might be closer to the low end. And is it really relevant?
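The reading-time arithmetic is simple enough to show directly. This is a minimal sketch: the 90,000-word count is a made-up example, and the function names are hypothetical.

```python
def reading_time_minutes(word_count: int, words_per_minute: int = 250) -> float:
    """Estimated minutes to read a book at a given speed."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


def format_duration(minutes: float) -> str:
    """Render minutes as hours and minutes, e.g. 450 -> '7h 30m'."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h {mins:02d}m"


word_count = 90_000  # hypothetical book length
for wpm in (200, 250, 300):
    print(wpm, format_duration(reading_time_minutes(word_count, wpm)))
# 200 7h 30m
# 250 6h 00m
# 300 5h 00m
```

For the same book, the estimate swings from five hours to seven and a half depending on the chosen speed, which is exactly why the number only means something relative to one reader.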

I mentioned above that there is a gap in my profile framework. I posted my outline on the Calibre subreddit to see if any other inner librarians might embrace my framework and comment. Several did, and one pointed out a field that they use regularly: the country of publication. I love the premise at first blush, but then it gets complicated. Take J. K. Rowling’s Harry Potter series. The first book in the series was called The Philosopher’s Stone in the UK, but was retitled The Sorcerer’s Stone for the US market. Which means it’s a UK book published in Canada and the US, with different titles in some cases. But even for the version in Canada, with the original title, do I call it a UK book because Rowling is from the UK, or do I count it as Canadian because I got the Canadian e-version? I know it matters a whole lot to a certain sub-group of people, mostly because some American readers hate British spelling, and some Canadian and British readers hate American spelling. But I don’t really care. I read so fast that an American or British spelling doesn’t stop my train of thought. I’m used to both. I’d like to flag Aussie or Norwegian authors, but I’m running into the same issue: should I code the AUTHOR or the BOOK? I haven’t wrestled that to the ground yet. The funny part is that those who DO use country codes often use small country flags in the database to symbolize nationality, and that looks cool visually. I’m a nutbar if I add it just because it looks cool, right?

The second tag category is what I call “user engagement”. I’ll admit that some people don’t separate this section from my next one (user tools), as they are almost all coded by the user, but you’ll see why I do in a moment. To me, this section is about me as the reader dealing with the reading process.

Calibre starts with an obvious field for you to enter a rating from 1 to 5 stars. GoodReads, Amazon, Chapters, and most book sites also use a five-star rating system, and if you download metadata, it will first populate the average rating from that site. Plus all the metadata from the book profile above.

But if you are so inclined, Calibre also has default options for a comments field where you can add a blurb, synopsis, personal notes, or even your review. Of course, the downside of this default field is that many plugins use it to dump info from various websites when they grab metadata for a download. If I add my notes and then run a metadata download from Amazon, it overwrites what I already had. I had forgotten that until recently, when I was testing a different plugin on some sample data, trying to better integrate my library with GoodReads. I write reviews for every book I finish, and I store copies there. Because I had already downloaded the metadata before pasting my review, I never even considered what I might lose if I redownloaded it. I definitely need a new custom field for MY review.

Of course, there are many ways to do that: a single field that has my whole review in it; a series of fields that together “build” the review for the plot/premise, what I liked, what I didn’t like, and my bottom-line / one-line review; or a hybrid of several options. Some reviewers also want to include a reason for abandoning a book if they did not finish (DNF), fields for favourite quotes, or maybe even (in my case), where I have posted my reviews online or even that they ARE posted. Interestingly, I read on my Kindle and soon (there will be a separate post), on a revived tablet for PDFs. In both cases, I can make notes as I go and save them with the book. There is a plugin for the Kindle side, and potentially for the PDFs, that lets all my notes while reading be sent back to the desktop and included as a field. It’s not fully seamless yet for either source, but I’m working to get there. I generally highlight only in my non-fiction reading, and I don’t tend to save quotes from fiction. But I like the premise of saving the annotations, as once I delete it from my Kindle or tablet, those notes are gone forever.

Within user engagement, there is one last area: tracking your reading progress. It generally includes an actual field for progress, which sites like GoodReads will let you sync your Kindle to so that it a) shows what you are reading; b) lets you check in how far you have read in the ebook; and c) registers when you have finished reading. I kind of like the premise, but any book on my Kindle already tells me that. I don’t need Calibre to track it as well. Once I start, I generally go until I’m done. Sites like GoodReads and others also want a Date Started and a Date Finished/Read so you can track the duration. But when I think about my own reading, it makes almost no sense. Or at least it doesn’t really resonate with me. A series like Robert Jordan’s Wheel of Time has really long books, and I can’t just plow through them quickly. Equally, I’m struggling to finish Dostoevsky’s Crime and Punishment, which I have been reading forever. It’s awesome for plot, but the prose is slow as molasses. I plan to finish it this year, and the timing isn’t relevant. Nor is it relevant if I pick up a simple murder mystery and finish it in a day.

Then, my brain borks. Because I consider books only “finished” when I actually review them. And I have over 300 in backlog, with dates I know were 2025, 2024, and then “sometime before that”. I’ve put in dates where I could, or at least years, because I use the date “read” to help me see how many books I’ve read in a given year. I participate in Reading Challenges, but because my “reviewing” list isn’t up to date, my other stats aren’t either.

My third and final tag category is what I call “user tools”. I mentioned above that I separate this from user engagement because most of the information here, while often bibliographic or self-generated, is used to help the user sort lists in various ways, not necessarily to engage with the book. To me, it means engaging with all the books, not just this one.

The obvious field up front is just labelled “tags”. It is a giant catchall field where people can literally tag anything they want…fiction, non-fiction; mystery, suspense; point of view; etc. Most people use it to tag genres, and I do too. Where I differ is that I force the book into a single genre, while someone tagging Harry Potter might tag fiction, magic, UK, male lead, mystery, series, good defeats evil, coming of age, etc. I sort books separately by fiction and non-fiction, but I haven’t added a field for it; I just store them in separate workflows. Calibre also assumes that you might have books in a Series, so there is a field that doubles up to include both the name of the series (such as Harry Potter) and the book’s position in the series (The Philosopher’s Stone is book #1). It seemed weird at first, as the book number has decimal points. I was like, “Huh?” Except often there are prequels, side books, short stories, or novellas between books 2 and 3, for example, so you can actually number one 2.5! I find that kind of cool, actually.
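The decimal trick is easiest to see with a sort. In this small sketch the series name and titles are invented placeholders, and `SeriesEntry` is a hypothetical stand-in for a library record.

```python
from dataclasses import dataclass


@dataclass
class SeriesEntry:
    title: str
    series: str
    index: float  # decimals leave room for in-between stories


entries = [
    SeriesEntry("Book Three", "Example Series", 3.0),
    SeriesEntry("Book One", "Example Series", 1.0),
    SeriesEntry("Bridge Novella", "Example Series", 2.5),
    SeriesEntry("Book Two", "Example Series", 2.0),
]

# Sort by series name first, then by position within the series.
reading_order = sorted(entries, key=lambda e: (e.series, e.index))
for e in reading_order:
    print(f"{e.series} [{e.index:g}] {e.title}")
# Example Series [1] Book One
# Example Series [2] Book Two
# Example Series [2.5] Bridge Novella
# Example Series [3] Book Three
```

With whole numbers only, slotting the novella in would mean renumbering every later book; with 2.5, nothing else has to move.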

But with the power of Calibre, there really is no end to what you can create and tag:

- Genres as nested hierarchies or relational tags for filters and sorts: Fiction / NF categories, type of text (play, SS, novella, poem, full-length book, collection), genre categories (limit one per book)
- Series chronology if the numbering isn’t sufficient?
- Series or standalone (if you fill in the series field with the word standalone, Calibre will think all books by all authors that are standalone are part of the same series!)
- Vibes (mood, pacing, setting)
- Tropes (meh)
- Point of View
- Content warnings
- Status (owned, borrowed, library, store, prices?)
- Shelves (GoodReads is big on this with shelves for read, TBR)
- Context (reading challenge, award, gift, book club, recommended by someone)
- Priority for TBR (aka up next)
- Workflow (staging, sorting, cleaning up metadata, reading, reviewing, final archive); this is where I got into trouble!
- Years only (publication, reading, reviewing), rather than months and days
- Count of how many books you have by that author in the database? (already generated in lists)
- URLs of links to books on places like GRs, review site, etc.

Plus, there are hundreds of plugins that will let you add fields for just about anything. Mostly around creating ways to filter and manipulate your list.

The only other field I added is another ID #: the number I assigned to the review of that book. My list started at 00001 and is now just over 00300. I can go up to 99999, so I’ll never need six digits. I’ll likely break 1000 one day, and I could theoretically hit 10,000, but that’s highly unlikely. I’d have to write a review a day for 26 years. 🙂 (Challenge accepted!)
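The numbering and the back-of-the-envelope math can be checked in a few lines. This is a small sketch; `review_id` is a hypothetical helper, and the figures come straight from the paragraph above.

```python
def review_id(number: int, width: int = 5) -> str:
    """Zero-pad a review number to a fixed width, e.g. 300 -> '00300'."""
    if not 0 < number < 10 ** width:
        raise ValueError(f"review number must fit in {width} digits")
    return f"{number:0{width}d}"


print(review_id(1))    # 00001
print(review_id(300))  # 00300

# Years needed to reach 10,000 reviews from roughly 300 at one review per day.
remaining = 10_000 - 300
print(round(remaining / 365, 1))  # 26.6
```

Fixed-width numbers also sort correctly as plain text, so 00300 lands after 00045 even in a column that isn’t treated as numeric.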

Okay, so what am I actually including in PolyLibrary 3.0?

I mentioned above that I discussed this with a guy on Reddit and a guy I know through another site who is bibliographically inclined, and they both thought, “Holy crap, that’s way too much!” (my interpretation of their words). Apparently, I didn’t explain that it was the full menu, not what I was ordering.

Let’s weed the list above to a more manageable size. Fields marked with an asterisk (*) are downloadable or generated by plugins, not me.

Book Profile
- Title + Title sort (investigate option of alternate titles or just add an alternate title to the same field) (*)
- Author + Author sort (including & for others, and figure out how to best indicate editors) (*)
- Subtitle (a new field, if / where warranted)
- Publisher (*)
- Publication date (change to year only) (*)
- Cover (link) (*)
- Paper copy too (a new field)
- Literacy overview (# of pages, # of words, literacy grade) (*)
- Country of author (still considering)
- Type of text (new field for play, short story, novella, poem, full book, collection/anthology)
- Plus defaults: Formats, ID x 3 (UUID, simple ID, ISBN/ISSN/ASIN), Path (*)
User Engagement
- Rating (original + add a new one for MY rating, not just the metadata download) (*)
- Comments (*)
- Review field (new one for MY reviews + separate one-line review + review tracker for BR # + where posted, including link to PolyWogg URL)
- Annotations field (for notes synced from Kindle or tablet) (*)
- Year Finished, Year Reviewed
User Tools
- Fiction / non-fiction (new toggle field, or potentially nested with the next two)
- Fiction genre (modification to tag field and workflow tags so it’s just MY tags)
- Non-fiction genre (modification to tag field and workflow tags)
- Series name (keep original with position) (*)
- Series / standalone (new toggle field or nested with fiction/non-fiction hierarchy)
- Read / TBR (new toggle field, or could expand to include active or other shelves from GoodReads and modify Workflow)
- Source of recommendation (new for Reading Challenge, award, gift, book club, personal recommendation)

Moving forward

I’m quite proud of that list, actually, and I’m happy that I did the deep dive. However, there are a few little niggly things I want to add to the database, all of which are “calculation” fields for display.

One of the guys on Reddit shared an example of his database, and while he is heavily invested in syncing with GoodReads, what interested me more was that he found a way to take a whole bunch of complex info you need in some fields and turn it into quick visuals. For example, while he has a field called Nationality, he also has another field that looks up the info in the Nationality field and displays a small flag for that country in his columns…the text column is there and hidden, but his display just shows the little flag. Similarly, for say, genre, he might have a magnifying glass icon for mysteries and a moon icon for astronomy. Quick little icons to represent text that takes up a lot of space in other columns that don’t have to show. It made the display really sleek and manageable. So, that’s on my list now… creating columns with images. 🙂 On the positive side, if I do that, they’re basically just fields that calculate content from other fields; I don’t need to calculate those separately or enter data in them. There are several fields above that I can distill into quick visuals.
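The idea of a display field calculated from a hidden text field can be sketched outside Calibre in plain Python. Calibre has its own mechanisms for column icons, which this does not reproduce; the `flag_for` and `display_icons` helpers and the genre-to-icon mapping are hypothetical, and the sketch assumes the nationality is stored as a two-letter country code.

```python
def flag_for(country_code: str) -> str:
    """Build a flag emoji from a two-letter country code such as 'CA'."""
    code = country_code.strip().upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        return ""
    # Regional indicator symbols start at U+1F1E6 for the letter A.
    return "".join(chr(0x1F1E6 + ord(ch) - ord("A")) for ch in code)


# Hypothetical mapping from a genre value to a compact icon.
GENRE_ICONS = {
    "Mystery": "🔍",
    "Astronomy": "🌙",
}


def display_icons(nationality: str, genre: str) -> str:
    """Derive the compact display value from the two underlying text fields."""
    return flag_for(nationality) + GENRE_ICONS.get(genre, "")


print(display_icons("CA", "Mystery"))    # 🇨🇦🔍
print(display_icons("NO", "Astronomy"))  # 🇳🇴🌙
```

Nothing is typed into the display value by hand: change the underlying nationality or genre and the icons follow, which matches the point above about these being calculated fields.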

Pray for my inner librarian. Even just for having to fix 300 book reviews that are in the wrong field.


2 Comments

Carolee Awde

3 months ago

I put these icons in front of my title where applicable
🇨🇦🕵‍♂️🕵‍♀️🏡
Canadian author.
Mystery/Detective /Police Procedural with gender of protagonist
Cozies

Admire your dedication!

Paul Sadler

Reply to Carolee Awde

Cozies! I like all the icons. 🙂

Paul

The PolyBlog

My view from the lilypads
