I built epub-utils: a CLI tool and Python library for inspecting EPUB files

I’ve been working on a Python tool called epub-utils that lets you inspect and extract data from EPUB files directly from the command line. I just shipped some major updates and wanted to share what it can do.

What My Project Does

A command-line tool that treats EPUB files like objects you can query:

    pip install epub-utils

    # Quick metadata extraction
    epub-utils book.epub metadata --format kv
    # title: The Great Gatsby
    # creator: F. Scott Fitzgerald
    # language: en
    # publisher: Scribner

    # See the complete structure
    epub-utils book.epub manifest
    epub-utils book.epub spine
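For context on what a metadata query has to dig through, here is a rough standard-library sketch of the lookup chain inside an EPUB: META-INF/container.xml points at the package (OPF) document, which holds the Dublin Core elements. This is not how epub-utils is implemented; `read_metadata` is a hypothetical helper written only for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the EPUB container/package formats and Dublin Core.
NS = {
    "ocf": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_metadata(epub_path):
    """Return Dublin Core metadata as a list of (name, value) pairs.

    Hypothetical helper for illustration; not part of epub-utils.
    """
    with zipfile.ZipFile(epub_path) as archive:
        # container.xml declares where the package document lives.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("ocf:rootfiles/ocf:rootfile", NS)
        if rootfile is None:
            raise ValueError("container.xml does not declare a rootfile")
        package = ET.fromstring(archive.read(rootfile.attrib["full-path"]))

    metadata = package.find("opf:metadata", NS)
    if metadata is None:
        return []

    prefix = "{%s}" % NS["dc"]
    pairs = []
    for element in metadata:
        # Keep only Dublin Core elements such as dc:title or dc:creator.
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            pairs.append((name, (element.text or "").strip()))
    return pairs
```

Calling `read_metadata("book.epub")` returns pairs that can be printed one per line as `name: value`, which is roughly the shape of the kv output above.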

Target Audience

Developers building publishing tools that make heavy use of EPUB archives.

Comparison

I kept running into situations where I needed to peek inside EPUB files: checking metadata for publishing workflows, extracting content for analysis, debugging malformed files. For this I was simply using the unzip command, but it didn’t give me the structured data access I wanted for scripting. epub-utils instead lets you inspect specific parts of the archive.

The files command lets you access any file in the EPUB by its path relative to the archive root:

    # List all files with compression info
    epub-utils book.epub files

    # Extract specific files directly
    epub-utils book.epub files OEBPS/chapter1.xhtml --format plain
    epub-utils book.epub files OEBPS/styles/main.css
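Since an EPUB is a ZIP archive, the same kind of listing can be approximated with Python's standard library. The sketch below is only an illustration of the idea, not epub-utils code; `list_files` and `read_file` are hypothetical names.

```python
import zipfile


def list_files(epub_path):
    """Return (path, size, compressed size, method) for each file in the archive.

    Hypothetical helper for illustration; not part of epub-utils.
    """
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    rows = []
    with zipfile.ZipFile(epub_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            rows.append((
                info.filename,
                info.file_size,       # uncompressed size in bytes
                info.compress_size,   # size inside the archive in bytes
                methods.get(info.compress_type, str(info.compress_type)),
            ))
    return rows


def read_file(epub_path, member):
    """Return the raw bytes of one file, addressed by its path from the archive root."""
    with zipfile.ZipFile(epub_path) as archive:
        return archive.read(member)
```

For example, `read_file("book.epub", "OEBPS/styles/main.css")` returns the stylesheet bytes, assuming that path exists in the archive.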

Content extraction by manifest ID:

    # Get chapter text for analysis
    epub-utils book.epub content chapter1 --format plain
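To make "by manifest ID" concrete: the package document's manifest maps each ID to an href relative to the package file, and that href names the content document inside the archive. Below is a minimal standard-library sketch of that lookup plus a naive text extraction. It is an assumption-laden illustration, not the epub-utils implementation; `content_as_text` and `_TextExtractor` are hypothetical names, and the text extraction is deliberately simplistic.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

NS = {
    "ocf": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}

_SKIP = {"head", "script", "style"}
_BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "tr"}


class _TextExtractor(HTMLParser):
    """Collect body text, inserting line breaks after block-level elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in _SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in _BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def content_as_text(epub_path, item_id):
    """Resolve a manifest ID to its file and return that file's plain text."""
    with zipfile.ZipFile(epub_path) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("ocf:rootfiles/ocf:rootfile", NS)
        if rootfile is None:
            raise ValueError("container.xml does not declare a rootfile")
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))

        for item in package.iterfind("opf:manifest/opf:item", NS):
            if item.get("id") == item_id:
                # Manifest hrefs are relative to the package document.
                member = posixpath.normpath(posixpath.join(
                    posixpath.dirname(opf_path), unquote(item.attrib["href"])))
                markup = archive.read(member).decode("utf-8")
                break
        else:
            raise KeyError(f"no manifest item with id {item_id!r}")

    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```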

Pretty-printing for all XML output:

    epub-utils book.epub package --pretty-print

A Python API is also available:

    from epub_utils import Document

    doc = Document("book.epub")

    # Direct attribute access to metadata
    print(f"Title: {doc.package.metadata.title}")
    print(f"Author: {doc.package.metadata.creator}")

    # File system access
    css_content = doc.get_file_by_path('OEBPS/styles/main.css')
    chapter_text = doc.find_content_by_id('chapter1').to_plain()

epub-utils handles both EPUB 2.0.1 and EPUB 3.0+ with proper Dublin Core metadata parsing and W3C specification adherence.

It makes it easy to automate publishing pipeline validation, debug EPUB structure issues, extract metadata for catalogs, and quickly inspect EPUB files without opening GUI apps.
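As an example of the kind of structural check a publishing pipeline might run, here is a small standard-library sketch that verifies two basic container rules: the `mimetype` entry comes first, uncompressed, with the expected content, and META-INF/container.xml exists. It is an illustrative assumption about what such a check could look like, not a feature of epub-utils and not a full validator; `check_structure` is a hypothetical name.

```python
import zipfile


def check_structure(epub_path):
    """Return a list of problems found; an empty list means these basic checks passed.

    Hypothetical helper for illustration; covers only a few container rules.
    """
    problems = []
    with zipfile.ZipFile(epub_path) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("'mimetype' must be the first entry in the archive")
        else:
            first = entries[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' must be stored without compression")
            if archive.read(first) != b"application/epub+zip":
                problems.append("'mimetype' must contain 'application/epub+zip'")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("missing META-INF/container.xml")
    return problems
```

A pipeline step could fail the build whenever `check_structure("book.epub")` returns a non-empty list.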

The tool is still in alpha (version 0.0.0a5) but the API is stabilising. I’ve been using it daily for EPUB work and it’s saved me tons of time.

GitHub: https://github.com/ernestofgonzalez/epub-utils
PyPI: https://pypi.org/project/epub-utils/

Would love feedback from anyone else working with EPUB files programmatically!

submitted by /u/makeascript to r/Python