Rick's b.log - entry 2016/07/17

mailto: blog -at- heyrick -dot- eu

Creating an EPUB file

Having purchased myself a little e-reader, and then another one (this time with a touch screen and pretence at being splashproof and dustproof) when the price was marked down a second time (to a quarter of its list price), I began to think about how to get content onto the device. You see, I am sitting outside under the shade of a willow tree to write this on my iPad, but it is not so easy to see the screen. The e-reader, on the other hand, suffers in the reverse - it is difficult to read at night (no backlight on these models) but it is quite clear and visible in bright sunlight.

Plus, if I am ever going to distribute some of my writing electronically, I'm going to need to know how to format content for e-readers. That, then, is the purpose of this b.log article.

What you want

The first thing you will need is a word processor. It doesn't matter what so long as you are happy with it and it is capable of exporting text in UTF-8 format. For this, I am using Google Docs as it is fairly capable as an app (on the iPad, a little less so on an Android phone but that may be more down to the reduced display size than anything else) plus it is fairly simple to log into Docs on a PC for getting at the content.

The next thing you will need is a text editor. For this, I recommend Notepad++. Using the WebEdit plugin, it can be taught to make HTML markup a breeze.

Finally, you will need a large package called Calibre. This is an e-book creation and management program that you will be using to assemble your e-book.

The process is, essentially, as follows:

Write the content. Feel free to use headings, italics, and all that sort of stuff.
Export the content as plain text.
Apply markup. You don't need to worry about the boilerplate HTML document wrapper, just mark out the headings, the italics, and so on.
Use Calibre to create a new EPUB book, then create a new XHTML file for each section of the book, and copy-paste the correct parts in the correct places.
Use Calibre to "install" your book onto your e-reader, then ensure that it is correctly formatted.

Yes, it is a bit of work, especially marking up what should already have styles, however...

What you don't want

My method of creating an e-book is not the simplest, and HTML wizards might prefer to just write the content into a text editor and mark it up as they are going. Alternatively, one might ask why I'm doing all this other rubbish given that Google Docs is actually capable of exporting an EPUB file itself, and Microsoft Word (if you use that) can directly export XHTML that can be imported into Calibre.

Let's address these in reverse order. Firstly, Google Docs is indeed able to export an EPUB file; however, like Microsoft Word, it suffers from being extremely literal in its output - meaning that the markup is horrendous, bloated, and simply does not work. By way of example, here is a draft of this document as exported by Google Docs (web application) as it appears on my e-reader, scaled to 50% for size. Note that it is not possible to change the display parameters - it is all fixed as-is.

The golden rule here is that EPUB markup should be simple. Font sizes should be relative: a heading can be 200% larger, but shouldn't be 48pt. Wrap important words in bold or italics tags, but don't pile a paragraph's worth of styles onto them. You can use simple CSS to apply styles to the whole book, but the more you add, the more is likely to go wrong. Just because it looks okay on your e-reader doesn't necessarily mean that it will look the same on others. That's why keeping the markup as simple as possible is important.
Remember, also, that most e-readers allow the user to alter the text size, line spacing, etc. It is best not to interfere with these options.
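
As a rough illustration of the "relative, not absolute" rule, here is a minimal Python sketch that scans a stylesheet for font sizes given in fixed units. The function name, the list of units, and the sample CSS are all made up for this example; it is a simple pattern match, not a real CSS parser.

```python
import re

# Units that pin text to a fixed size; relative units (%, em, rem) pass through.
ABSOLUTE_UNITS = ("pt", "px", "cm", "mm", "in", "pc")

# Matches declarations such as "font-size: 48pt" or "font-size:200%".
FONT_SIZE = re.compile(r"font-size\s*:\s*([0-9.]+)\s*([a-z%]+)", re.IGNORECASE)

def absolute_font_sizes(css: str) -> list[str]:
    """Return every font-size value in the stylesheet that uses an absolute unit."""
    found = []
    for value, unit in FONT_SIZE.findall(css):
        if unit.lower() in ABSOLUTE_UNITS:
            found.append(value + unit)
    return found

# Hypothetical stylesheet: two relative sizes and one absolute size.
sample_css = """
h2 { font-size: 200%; }
h3 { font-size: 1.5em; }
.title { font-size: 48pt; }
"""

print(absolute_font_sizes(sample_css))  # ['48pt']
```

Anything this reports is a candidate for rewriting as a percentage or an em value.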

Which brings me to my second point. It may seem better to just write and mark up as you go; however, doing so makes it harder to get into the process of writing, and also makes it harder to proofread. Sure, you can pass over the document with a spellcheck, but even so there are times some times when it would be better to see the flow of the text without obstructions. Did you notice the "times some times" in the previous sentence? That is the sort of thing that I am talking about. The more junk that is present in the document, the harder it will be to spot such things.
It is best, perhaps, to consider the mechanics of the document structure as a completely different and distinct step to the writing. In this case, I am writing this document into Google Docs. I'm applying styles (bold, italic, and heading) where appropriate, and I am concentrating entirely on what I want to say. No thought is being given to the markup.
Not at this stage.

How to do it

As I said in the previous part, I am writing this in a word processor, concentrating only on the document's content.

When I have finished, I will ensure that the document has been synced with Google (I don't have good WiFi outside, and the iPad's WiFi capabilities are, without exaggeration, the worst of every device that I own - even that cheap Xperia U phone!).

Then, I would log into docs.google.com and load the document into my browser. I would then go to the File menu (within Docs) and Download as, choosing the option to save as plain text.

The text file is then opened in Notepad++. With the aid of the WebEdit plugin and some remapping of keypresses (look in Settings → Shortcut Mapper), editing the document is a breeze - just select the words you want in a style, then press the shortcut key.
If you have not set up Notepad++ to do this, all is not lost. You'll just have to write the HTML tags in by hand. It's a little bit more fiddly, but not the end of the world.
Verify that the document has been opened in UTF-8 mode. This appears under the Encoding menu; for me it was selected as "UTF-8 BOM". If your text editor cannot make any such distinction, you are probably best not using it.
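
If you would rather check the encoding outside the editor, the following is a minimal Python sketch that classifies a file's raw bytes. The function name and the labels it returns are invented for illustration; it only tests whether the bytes decode as UTF-8 and whether they start with the UTF-8 byte order mark.

```python
import codecs

def describe_encoding(raw: bytes) -> str:
    """Classify raw file bytes as UTF-8 with BOM, plain UTF-8, or not UTF-8."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"
    return "UTF-8 BOM" if has_bom else "UTF-8"

# Three small samples: valid UTF-8, UTF-8 with a BOM, and Latin-1 bytes.
print(describe_encoding("caf\u00e9".encode("utf-8")))    # UTF-8
print(describe_encoding(codecs.BOM_UTF8 + b"plain"))     # UTF-8 BOM
print(describe_encoding("caf\u00e9".encode("latin-1")))  # not UTF-8
```

For a real file, read it in binary mode and pass the bytes in. Note that pure ASCII text also counts as valid UTF-8.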

Now, using the original document as a guide, add in the text effects and headings.

Once the markup has been applied, ensure the document has been saved, but keep it open and on-screen.

Load Calibre.
Once Calibre has loaded, press Ctrl+Shift+E to create a new book. Enter your name in the box, and a series if applicable. Make sure that "Create an empty EPUB file as well" is ticked.
When the book appears, right-click and choose "Edit metadata" and "Edit metadata individually". In the window that appears, fill in the required information - title and such. You can also add an image to be the "front cover" if you wish. Aim for a JPEG that is 600x800, and remember that most e-ink screens can only manage a few shades of grey, so don't go for things that are complex or depend upon colour. Certain combinations, such as red on black, will be practically impossible to see.
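
To get a feel for why a colour pairing like red on black fares badly, here is a rough Python sketch that reduces colours to a single grey value. It assumes the common Rec. 601 luma weights; how any particular e-reader actually converts colour is not known here, so treat the numbers as indicative only.

```python
def grey_level(rgb: tuple[int, int, int]) -> int:
    """Approximate brightness (0-255) of an RGB colour using Rec. 601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def contrast(foreground: tuple[int, int, int], background: tuple[int, int, int]) -> int:
    """Difference in grey level between two colours (0 = identical, 255 = maximum)."""
    return abs(grey_level(foreground) - grey_level(background))

BLACK, WHITE, RED = (0, 0, 0), (255, 255, 255), (255, 0, 0)

print(contrast(BLACK, WHITE))  # 255
print(contrast(RED, BLACK))    # 76
```

Under this assumption pure red lands at a fairly dark grey, so red on black has less than a third of the contrast of black on white.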

Right-click on the book information and choose "Edit book".

Import the cover image (yes, again) and save it as "Images/cover.jpeg".

Delete the file "start.xhtml", and create a new file "Text/cover.xhtml". Some wrapper markup will have been added. Where the cursor is, between the body tags, insert the following:
<div> <img src="../Images/cover.jpeg" alt="cover" style="height: 100%" /> </div>

Setting the height to 100% will request the image to be scaled to fit the display.

Create a new file called "Text/contents.xhtml". Write into this file:
<h2>Contents</h2>

That's all. Don't worry about anything else in this file right now.

The next stage is to create a new file for each one of your headings in the text document, and to transfer (simply copy-paste) the marked up text section by section into the EPUB editor.
For instance, this document would contain the following parts:

Text

cover.xhtml
contents.xhtml
creating.xhtml
want.xhtml
dontwant.xhtml
howto.xhtml
finished.xhtml
hints.xhtml

Images

cover.jpeg
example.jpeg
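
An EPUB file is a ZIP archive underneath, so a layout like the one above can be inspected with any ZIP tool. As a minimal sketch, this Python snippet groups the names inside an archive by folder. The function name is made up, and the demo builds a tiny stand-in archive in memory rather than opening a real book.

```python
import io
import zipfile
from collections import defaultdict

def list_epub_folders(epub_file) -> dict[str, list[str]]:
    """Group the file names inside an EPUB (a ZIP archive) by their folder."""
    folders = defaultdict(list)
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            # Split "Text/cover.xhtml" into folder "Text" and file "cover.xhtml".
            folder, _, filename = name.rpartition("/")
            folders[folder].append(filename)
    return dict(folders)

# Stand-in archive with empty files, so the sketch runs without a real book.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for name in ("Text/cover.xhtml", "Text/contents.xhtml", "Images/cover.jpeg"):
        archive.writestr(name, "")
demo.seek(0)

print(list_epub_folders(demo))
# {'Text': ['cover.xhtml', 'contents.xhtml'], 'Images': ['cover.jpeg']}
```

Passing the path of a real .epub file instead of the in-memory demo should work the same way, although a real book will contain additional housekeeping files.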

The final steps are to create and insert a table of contents. To create one, go to "Tools → Table of Contents → Edit Table of Contents", and in the window that pops up, click on "Generate ToC from files".
Calibre is smart enough to pick up the section names from the headings.

Open the "contents.xhtml" file, then choose "Tools → Table of Contents → Insert inline Table of Contents". A file called "toc.xhtml" will have been created. Copy-paste the contents of that file into your existing contents.xhtml file, amending the title if you prefer it to read simply "Contents".
Then delete the "toc.xhtml" file.

Press Ctrl+S to ensure everything has been saved, then close the editor.

Protip! Don't close the editor just yet! We are dealing with XHTML and the e-reader is not a web browser, so is unlikely to be tolerant of bad markup. Press F7 to check the document markup. If there are issues, correct them. If you do not, you may find some e-readers just give up attempting to display your content.
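
Calibre's own check is the one to rely on, but to show the kind of strictness involved, here is a minimal Python sketch that runs a fragment through a plain XML parser. The function name and the two sample fragments are invented for the example.

```python
import xml.etree.ElementTree as ET

def first_markup_error(xhtml: str):
    """Return None if the text parses as XML, else a description of the first error."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as error:
        return str(error)
    return None

# Identical fragments, except that the second leaves its br tag unclosed.
good = "<div><p>First line<br />second line</p></div>"
bad = "<div><p>First line<br>second line</p></div>"

print(first_markup_error(good))  # None
print(first_markup_error(bad))   # an error message from the parser
```

A web browser would happily display both fragments; an XML parser refuses the second one outright.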

Double-click on the book title to view it in the built-in e-book viewer. If it looks okay there, plug in your e-reader and transfer the book to it to check that it looks okay there as well.

The finished product

When you are happy that the EPUB file looks okay, you can pick up the file from Calibre's library. This will usually be in "My Documents", although if you are running the portable version (as I am), it will be within Calibre itself.
It may be useful to "Convert books" to make a MOBI version that can be used with Kindle devices.
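
The same conversion can also be scripted, since Calibre ships a command-line tool called ebook-convert that takes an input file and an output file. The following is a minimal Python sketch, assuming that tool is on your PATH; the function names and "mybook.epub" are placeholders.

```python
import subprocess
from pathlib import Path

def mobi_command(epub_path: str) -> list[str]:
    """Build the ebook-convert command that turns book.epub into book.mobi."""
    source = Path(epub_path)
    return ["ebook-convert", str(source), str(source.with_suffix(".mobi"))]

def convert_to_mobi(epub_path: str) -> None:
    """Run the conversion; needs Calibre's command-line tools to be installed."""
    subprocess.run(mobi_command(epub_path), check=True)

# Only print the command here, so the sketch runs even without Calibre present.
print(mobi_command("mybook.epub"))  # ['ebook-convert', 'mybook.epub', 'mybook.mobi']
```

Calling convert_to_mobi("mybook.epub") would then perform the conversion, raising an error if ebook-convert fails.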

Here, then, is this document as an e-book:

EPUB

MOBI

There are alternative distribution channels for e-books, if you don't want to host them on your own website or you want to make money from the books. There is a Kindle Publishing system, or you could try smashwords.com?

Some hints and tips

Levels of heading are reflected in the Table of Contents. You can use h2 and h3 headings as chapter and subsection, for instance.
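
To picture how heading levels turn into a nested outline, here is a minimal Python sketch that collects h2 and h3 headings and indents the h3 entries. This is only an illustration of the idea, not how Calibre builds its table of contents, and the sample headings are made up.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h2 and h3 heading."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # heading level currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._level = int(tag[1])
            self.headings.append((self._level, ""))

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            level, text = self.headings[-1]
            self.headings[-1] = (level, text + data)

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self._level = None

def outline(xhtml: str) -> list[str]:
    """Return one line per heading, with h3 entries indented under h2 entries."""
    collector = HeadingCollector()
    collector.feed(xhtml)
    return ["  " * (level - 2) + text.strip() for level, text in collector.headings]

sample = "<h2>How to do it</h2><p>...</p><h3>Load Calibre</h3><h2>Some hints and tips</h2>"
print("\n".join(outline(sample)))
```

With the sample above, the h3 heading appears indented beneath the first h2, and the second h2 starts a new top-level entry.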

Use <p />&nbsp;<p /> for larger paragraph spaces. If the document check complains about "named entities", it is baulking at the "&nbsp;", so simply delete and retype the ';' and Calibre will replace the entity with an actual hard space.
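
If there are many such entities, the replacement could be done in bulk before pasting the text into Calibre. Here is a minimal Python sketch that swaps the named entity for the real no-break space character (U+00A0); the function name and sample markup are invented, and the parser used for the check is Python's generic XML parser rather than anything from Calibre.

```python
import xml.etree.ElementTree as ET

def replace_nbsp(xhtml: str) -> str:
    """Swap the named entity for the actual no-break space character (U+00A0)."""
    return xhtml.replace("&nbsp;", "\u00a0")

before = "<div><p>One</p><p>&nbsp;</p><p>Two</p></div>"
after = replace_nbsp(before)

# A plain XML parser does not know the name "nbsp", so the original is rejected.
try:
    ET.fromstring(before)
except ET.ParseError as error:
    print("before:", error)

# The version with the real character parses without complaint.
ET.fromstring(after)
print("after:", repr(after))
```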

The XHTML parser may be finicky and just give up attempting to render content if the markup is wrong. Use <br /> and <p /> (or <p>...</p>), and don't forget the / at the end of an img tag, etc.
Use Calibre's document check (F7 in book editor) to ensure the markup is good.
Don't take this lightly - because of some <br> tags, my e-reader omitted half of my content!
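
For a document that already contains a lot of browser-style tags, a bulk fix might look like the following. It is a minimal Python sketch using a regular expression, limited to br, hr, and img, and it would be confused by a ">" inside an attribute value, so run the document check afterwards regardless. The function name is made up.

```python
import re

# Elements that never have content, so XHTML requires them to be self-closed.
VOID_TAG = re.compile(r"<(br|hr|img)\b([^<>]*?)\s*/?>", re.IGNORECASE)

def self_close_void_tags(markup: str) -> str:
    """Rewrite <br>, <hr>, and <img ...> so that each one ends with ' />'."""
    return VOID_TAG.sub(lambda m: f"<{m.group(1).lower()}{m.group(2)} />", markup)

print(self_close_void_tags('one<br>two<BR/>three<img src="a.jpeg" alt="a">'))
# one<br />two<br />three<img src="a.jpeg" alt="a" />
```

Tags that are already correctly closed come out unchanged, so it is safe to run more than once.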

Don't specify fonts or absolute sizes. Let the reader use defaults, and let the user change these if they wish.

The text is UTF-8, not an old-fashioned code page of any sort.

However, you generally won't see non-Latin text unless you instruct the e-reader accordingly. For example, you may not see this "鬼束ちひろ" but you may see this "鬼束ちひろ".
The difference? The latter was wrapped in a tag denoting it as being Japanese:
<span xml:lang="ja-JP">鬼束ちひろ</span>
If you don't see the Japanese at all, your device may not have the necessary fonts installed.
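
If a document has many such passages, generating the wrapper is easy to script. This is a minimal Python sketch with a made-up helper name; it simply escapes the text and wraps it in a span carrying the language tag you give it.

```python
from html import escape

def lang_span(text: str, lang: str) -> str:
    """Wrap text in a span whose xml:lang attribute names its language."""
    return f'<span xml:lang="{escape(lang)}">{escape(text)}</span>'

print(lang_span("鬼束ちひろ", "ja-JP"))
# <span xml:lang="ja-JP">鬼束ちひろ</span>
```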

This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted.
RIPA notice: No consent is given for interception of page transmission.

Retrieved from https://heyrick.eu/blog/index.php?diary=20160717 on 21st November 2024