mailto: blog -at- heyrick -dot- eu

I'm invalid... and proud of it!

A friend, Mick, sent me an archive containing nsgmls, a newer C++ version of the sgmls parser. I'll forgive you if you've never heard of it, so in a nutshell: it reads a DTD (Document Type Definition) file, which describes a specific version of the HTML 'language', and then it reads a web page. Then it complains. A lot.

For example, validating my Animé page will have the following effect:

C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:8:11:E: there is no attribute "LANG"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:11:13:E: there is no attribute "TARGET"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:11:19:E: required attribute "HREF" not specified
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:30:12:E: there is no attribute "TYPE"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:107:13:E: there is no attribute "TYPE"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:193:11:E: there is no attribute "LANG"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:194:16:E: there is no attribute "BEGIN"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:194:32:E: there is no attribute "TITLE"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:194:59:E: element "SPLITTER" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:197:19:E: there is no attribute "LANG"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:197:30:E: there is no attribute "STYLE"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:197:98:E: element "SPAN" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:197:101:E: "21644" is not a character number in the document character set
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:198:34:E: element "SPAN" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:199:51:E: element "SPAN" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:199:106:E: element "SPAN" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:200:26:E: there is no attribute "STYLE"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:201:94:E: element "SPAN" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:201:97:E: "24950" is not a character number in the document character set
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:205:60:E: there is no attribute "BORDERCOLOR"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:205:75:E: there is no attribute "BGCOLOR"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:205:112:E: there is no attribute "CLASS"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:205:253:E: element "SPAN" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:206:19:E: value of attribute "ALIGN" cannot be "JUSTIFY"; must be one of "LEFT", "CENTER", "RIGHT"
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:221:35:E: element "NOSCRIPT" undefined
C:\RickMisc\sp\bin\nsgmls:C:\RickMisc\wwwsite\ricksworld\anime\index.html:233:5:E: start tag for "LI" omitted, but its declaration does not permit this
And so on, many pages of it.
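Since most of those pages are the same handful of complaints repeated, a quick tally makes the output easier to digest. What follows is a minimal JavaScript sketch, not part of nsgmls or of my site. It assumes every error line has the shape shown above (program, file, line, column, a one-letter type, then the message), and the names parseNsgmlsLine and tallyMessages are made up for illustration. Because the Windows-style paths themselves contain colons, the regular expression anchors on the trailing ":line:column:type: " group rather than splitting on ':'.

```javascript
// Hypothetical helpers: summarise nsgmls error output by message text.
// Assumed line shape (as in the output above):
//   <program>:<file>:<line>:<column>:<type>: <message>
// The greedy (.*) lets the path keep its own colons; backtracking then
// matches the last ":line:column:type: " group on the line.
var NSGMLS_LINE = /^(.*):(\d+):(\d+):([A-Z]): (.*)$/;

function parseNsgmlsLine(text) {
  var m = NSGMLS_LINE.exec(text);
  if (!m) return null; // not an error line, e.g. "And so on..."
  return {
    where: m[1],               // program name and file path, still joined
    line: parseInt(m[2], 10),
    column: parseInt(m[3], 10),
    type: m[4],                // "E" in every line quoted above
    message: m[5]
  };
}

function tallyMessages(output) {
  var counts = {};
  var lines = output.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var err = parseNsgmlsLine(lines[i]);
    if (err) counts[err.message] = (counts[err.message] || 0) + 1;
  }
  return counts;
}

// Two lines from the output above (backslashes escaped for JavaScript).
var sample =
  'C:\\RickMisc\\sp\\bin\\nsgmls:C:\\RickMisc\\wwwsite\\ricksworld\\anime\\index.html:197:98:E: element "SPAN" undefined\n' +
  'C:\\RickMisc\\sp\\bin\\nsgmls:C:\\RickMisc\\wwwsite\\ricksworld\\anime\\index.html:198:34:E: element "SPAN" undefined';
console.log(tallyMessages(sample)); // one message, counted twice
```

Run against the full output, the repeated "element "SPAN" undefined" and "there is no attribute ..." lines collapse into a short list, which is rather easier to read than many pages of it.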

I had originally declared the document as HTML 3.2 (which is roughly what Fresco can handle), on the basis that browsers can simply ignore any additional markup they don't understand. That failed under the validator because, stupid me, I had written the word "public" in the DOCTYPE in lower case rather than upper case, so it seemed to fall back to something predating HTML 1!

I then changed my document to declare the transitional form of HTML 4. This made the validator understand a few extra things; however, I note that it does not recognise <span>, which I use frequently to mark which words are in Japanese, like this: こにちわ; or in source form, "<span lang="ja">&#12371;&#12395;&#12385;&#12431;</span>".
Notice also that it complains about character codes above 255.
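The "21644 is not a character number in the document character set" errors in the output are the same complaint. To find such references before the validator does, something like the following JavaScript sketch would list every numeric character reference above a chosen limit. The function name is hypothetical, the default limit of 255 is an assumption that mirrors a document character set ending at Latin-1, and it only recognises references written with a trailing semicolon.

```javascript
// Hypothetical helper: list numeric character references (&#NNN; or &#xHHH;)
// whose code point is above `limit` (default 255, i.e. beyond Latin-1).
// References without the trailing ';' are not recognised by this sketch.
function findHighCharRefs(html, limit) {
  if (limit === undefined) limit = 255;
  var re = /&#([xX][0-9a-fA-F]+|[0-9]+);/g;
  var found = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    var ref = m[1];
    var code = /^[xX]/.test(ref) ? parseInt(ref.slice(1), 16)
                                 : parseInt(ref, 10);
    if (code > limit) {
      found.push({ ref: m[0], code: code, index: m.index });
    }
  }
  return found;
}

// The Japanese example from above, plus a Latin-1 "&#244;" (ô) that passes.
var hits = findHighCharRefs(
  '<span lang="ja">&#12371;&#12395;&#12385;&#12431;</span> T&#244;ky&#244;');
console.log(hits.map(function (h) { return h.code; }).join(", "));
// 12371, 12395, 12385, 12431
```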

Some of the complaints are valid. I had been using "<br clear="both">", which I picked up from Wikipedia, only I got it slightly wrong. The value is "both" when used as a CSS property, i.e. "clear: both;", but it is "all" when used as an HTML attribute!
For many of the other errors, I am starting to wonder why I would bother using a validator at all. After all, the majority of browsers work on the principle of performing the actions they are capable of and disregarding the rest. I write my pages using MSIE6, which is the friendliest in off-line mode. These pages are then tested in Firefox and Opera; I take particular care with the scripting, as there are annoying differences between the browsers. Every once in a while I also test my pages with Fresco and NetSurf, though I think we have to accept the inherent limitations of the RISC OS browsers compared to the power and momentum behind those on the more mainstream platforms.

In my opinion, it was a stupid decision to use different values in the CSS version and the HTML version, especially given that left, right and none (the other options) are the same in both. Only "both" and "all" differ. I strongly suspect this is why the browsers obeyed my incorrect attribute: whoever wrote the parser realised how likely people were to get this muddled.
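Since only that one value differs, converting between the two vocabularies is trivial. Here is a small JavaScript sketch; the function names are my own invention, not part of any library. Left, right and none pass straight through, "all" and "both" swap over, and anything else is reported as invalid by returning null.

```javascript
// Hypothetical helpers mapping between the HTML 4 <br clear="..."> attribute
// and the CSS "clear" property. Only HTML "all" <-> CSS "both" differ.
function htmlClearToCss(value) {
  var v = String(value).toLowerCase();
  if (v === "all") return "both";
  if (v === "left" || v === "right" || v === "none") return v;
  return null; // not a valid HTML 4 value, e.g. "both"
}

function cssClearToHtml(value) {
  var v = String(value).toLowerCase();
  if (v === "both") return "all";
  if (v === "left" || v === "right" || v === "none") return v;
  return null; // not a valid CSS value, e.g. "all"
}

console.log(htmlClearToCss("all"));  // both
console.log(cssClearToHtml("both")); // all
console.log(htmlClearToCss("both")); // null
```

Note that htmlClearToCss("both") returns null, which is exactly the complaint a validator makes about <br clear="both">, even though browsers quietly accept it.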

My overall aim is to have pages which perform well across the range of browsers, which is not necessarily the same thing as valid HTML. One of my major complaints about CSS is that it is not backwards compatible. If you mark up a document with lots of "<div class=...>" instead of inline bold, font colours, text sizes... what does it look like on a browser that cannot do CSS properly, or at all? It looks like a crap, flat, badly rendered text document. This is why I use CSS but also in-line a lot of formatting, so the pages look good on MSIE/Firefox/Opera but still look like something on Fresco.

Then comes the most shocking thing. The <noscript> tag is defined as something that belongs in the document header, and only in the header.

WHY?

Again, an example from my animé document. Latinised Japanese, written in the modified Hepburn style, gives the closest representation of the language that an English speaker is likely to pronounce mostly correctly. However, long vowels are marked with a macron, a flat bar over the letter. On systems which broadly support Unicode, I would like to display the character correctly. On RISC OS, most of which is not Unicode (I believe RISC OS 5 is, but perhaps not the browser?), I wish to revert to the cheap'n'cheerful method of whacking a circumflex accent on the character. That is not correct, but it is frequently used as the closest approximation. And, finally, I assume that systems which don't support JavaScript are unlikely to support Unicode, so the fallback is in-lined circumflex-accented characters.

The code at the point of display is as follows:

<script type="text/javascript"><!--
   WriteJapanese("T&#333;ky&#333;", "T&ocirc;ky&ocirc;", ""); //-->
   </script><noscript>T&ocirc;ky&ocirc;</noscript>
I'm not going to bother showing the JavaScript - it's in the document source if you are interested. What happens is the first parameter ("T&#333;ky&#333;") is used for systems deemed capable, and the second parameter ("T&ocirc;ky&ocirc;") is used otherwise. If scripting fails, then "T&ocirc;ky&ocirc;" is used.

  • "T&#333;ky&#333;" looks like "Tōkyō".
  • "T&ocirc;ky&ocirc;" looks like "Tôkyô".
(what you will actually see depends upon your browser/OS capabilities; the picture on the right shows what it should look like)
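I'm not reproducing WriteJapanese itself here, but the idea can be sketched. The following JavaScript is a hypothetical illustration, not the code on my pages. Instead of typing both forms by hand, it derives the circumflex fallback from the macron form by swapping the decimal references for the five long vowels (ā ē ī ō ū and their capitals) for the matching Latin-1 circumflex entities. The `capable` flag stands in for whatever capability test the real script uses, which isn't shown.

```javascript
// Hypothetical sketch: derive the circumflex fallback from the macron form.
// Maps decimal references for A/E/I/O/U with macron (upper and lower case)
// to Latin-1 circumflex entities; hex references and everything else are
// left untouched.
var MACRON_TO_CIRCUMFLEX = {
  256: "&Acirc;", 257: "&acirc;", // Ā ā
  274: "&Ecirc;", 275: "&ecirc;", // Ē ē
  298: "&Icirc;", 299: "&icirc;", // Ī ī
  332: "&Ocirc;", 333: "&ocirc;", // Ō ō
  362: "&Ucirc;", 363: "&ucirc;"  // Ū ū
};

function macronToCircumflex(html) {
  return html.replace(/&#(\d+);/g, function (whole, digits) {
    var sub = MACRON_TO_CIRCUMFLEX[parseInt(digits, 10)];
    return sub !== undefined ? sub : whole;
  });
}

// `capable` is a placeholder for the real capability test.
function japaneseText(unicodeForm, capable) {
  return capable ? unicodeForm : macronToCircumflex(unicodeForm);
}

console.log(japaneseText("T&#333;ky&#333;", false)); // T&ocirc;ky&ocirc;
// In a page: document.write(japaneseText("T&#333;ky&#333;", capable));
```

The <noscript> fallback would still need writing out by hand, since by definition no script runs to generate it.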

But then comes the biggest gripe.
The HTML 4.01 loose DTD defines the document head as follows ("[...]" marks unrelated material that I have clipped out):

<!--================ Document Head =======================================-->
<!-- %head.misc; defined earlier on as "SCRIPT|STYLE|META|LINK|OBJECT" -->
<!ENTITY % head.content "TITLE & ISINDEX? & BASE?">

<!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head -->
<!ATTLIST HEAD
[...]

<!ELEMENT TITLE - - (#PCDATA) -(%head.misc;) -- document title -->
[...]

<!ELEMENT BASE - O EMPTY               -- document base URI -->
[...]

<!ELEMENT META - O EMPTY               -- generic metainformation -->
[...]

<!ELEMENT STYLE - - %StyleSheet        -- style info -->
[...]

<!ELEMENT SCRIPT - - %Script;          -- script statements -->
[...]

<!ELEMENT NOSCRIPT - - (%flow;)*
  -- alternate content container for non script-based rendering -->
<!ATTLIST NOSCRIPT
  %attrs;                              -- %coreattrs, %i18n, %events --
  >
(it is the same in the definitions for HTML 4.01 strict and XHTML 1)

Far be it from little me to question the wisdom of the W3C organisation, but let me ask you a question. The <script> tag introduces some script, most of which appears in the document header where it can be hidden away, right? This script is accessed by way of events such as onClick, right?
The counterpart is, logically enough, <noscript>, which allows a simpler form of 'whatever', a link to a plain version, or a "sorry, you need JavaScript" message to be provided instead. Well, excuse me, but what is a tag that could produce textual output doing in the document header?
Furthermore, it appears that scripting outside of the header is not valid. Script should be in the header, and script functions are called by way of events. Not terribly useful if you wish to in-line a message that changes depending on the time of day, or document.write different colours into a table for the time of year (greens for Spring, yellows for Summer, browns for Autumn, blues for Winter...).
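To make the point concrete, here is roughly what those two in-line uses might look like. This is a hypothetical JavaScript sketch, not code from my site: the function names and colour values are invented, the seasons are assumed to be the Northern-hemisphere meteorological ones, and the cut-offs of 12:00 and 18:00 for the greeting are arbitrary. The date logic could happily live in the header; it is the final document.write that has to sit in the body, at the point where the output should appear.

```javascript
// Hypothetical helpers for in-line, date-dependent output.
// Seasons: Northern-hemisphere meteorological ones (Mar-May spring,
// Jun-Aug summer, Sep-Nov autumn, Dec-Feb winter). Colours are placeholders.
var SEASON_COLOURS = {
  spring: "#66cc66", // greens
  summer: "#ffdd44", // yellows
  autumn: "#996633", // browns
  winter: "#6699cc"  // blues
};

function seasonFor(date) {
  var m = date.getMonth(); // 0 = January ... 11 = December
  if (m >= 2 && m <= 4) return "spring";
  if (m >= 5 && m <= 7) return "summer";
  if (m >= 8 && m <= 10) return "autumn";
  return "winter";
}

function greetingFor(date) {
  var h = date.getHours(); // local time, 0-23
  if (h < 12) return "Good morning";
  if (h < 18) return "Good afternoon";
  return "Good evening";
}

var now = new Date();
console.log(greetingFor(now) + "; table colour " + SEASON_COLOURS[seasonFor(now)]);
// In a page, the same values could be written out in-line, e.g.
//   document.write('<td bgcolor="' + SEASON_COLOURS[seasonFor(now)] + '">');
```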

There are justifiable reasons for wanting small bits of scripting in documents (my Japanese accenting is a good example; note also that the grunt-work is performed by a function in the header, and the in-line part is the minimum necessary to get the job done). It seems to me to be bordering on unforgivable to specify that a tag which is frequently likely to produce displayable output belongs in the document head.
It is my understanding that, to the letter of the law, a browser is not supposed to display anything until it hits the 'body' tag (certainly, my OvHTML importer will read some stuff from the header, but actual conversion begins at the <body> tag), which would therefore render useless most of the content for which the <noscript> might be used.

Browsers accept this stuff in the document body, and you will frequently find books and tutorials instructing you to use these tags there. However, when it comes to validation, the page will fail.

So, with many thanks to Mick, I think I will skip validating my pages and instead will do what I always do - throw the document at a variety of browsers to check it displays as expected. If that makes me invalid, so be it.

It would be nice to use nsgmls once in a while to trap idiot errors, like a <div> with no closing tag; however, don't be fooled into thinking that the quoted line numbers, the ones on the left, bear much relationship to the document being examined.
Unfortunately, the output does not quote the offending lines (unlike the on-line W3C validator), so it is sometimes near-impossible to track down problems. If you look at the last of the quoted problems, you will see "start tag for "LI" omitted, but its declaration does not permit this"; the nearest list block looks correct both in the source and on-screen, so I don't know what it is complaining about.
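Given that nsgmls reports only a line and a column, a few lines of script can at least pull out whatever sits at that position, much as the on-line validator does. This is a hedged JavaScript sketch with a made-up name, showErrorContext. It assumes the column is counted from 1, and, given the doubts above about the line numbers, what it points at should be treated as a starting place rather than proof.

```javascript
// Hypothetical helper: given one nsgmls error line and the text of the
// checked document, return the message, the reported source line, and a
// caret under the reported column (assumed 1-based). Returns null if the
// error line can't be parsed or the line number is outside the document.
function showErrorContext(errorLine, documentText) {
  // Leftmost ":line:column:type: " group; the path before it may contain
  // colons of its own (C:\...), the message after it is the rest of the line.
  var m = /:(\d+):(\d+):[A-Z]: (.*)$/.exec(errorLine);
  if (!m) return null;
  var lineNo = parseInt(m[1], 10);
  var col = parseInt(m[2], 10);
  var lines = documentText.split(/\r?\n/);
  if (lineNo < 1 || lineNo > lines.length) return null;
  var caret = new Array(Math.max(col, 1)).join(" ") + "^"; // col-1 spaces
  return m[3] + "\n" + lines[lineNo - 1] + "\n" + caret;
}

var doc = "<ul>\n  <li>one\n  <li>two\n</ul>";
console.log(showErrorContext(
  'index.html:3:5:E: start tag for "LI" omitted, but its declaration does not permit this',
  doc));
```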

Oh, and my final word on this, to flog a dead donkey: it would be really nice if the W3C would extend the markup language to those of us who use correct English, accepting "colour" and "centre" as alternatives to "color" and "center".

 

Hello Kitty revisited

Following from the publicity I mentioned back on the 18th (re. the spooky girl), mom got one of those T-shirts.
She would also like me to point out that she fits into a 16 year old's size. Not a size 16, but a size-for-a-16-year-old.

Oh, and if you are wondering about her expression, it's a real shame we didn't go all eco and install a wind turbine. August, so far, has been a pretty windy month, and this affected mom's hair. I think this was, what, the eighth attempt to take this photo?
 

Mobile madness

Orange is quite good about sending texts to remind you that your credit is about to run out, though to be honest you would have thought they would have a sanity filter to stop ones like this creeping through:
Translation: Mobicarte (pay-as-you-go) - Primary account. Warning, you have less than a day to use your credit of €0.00.

 

The jury is out on yesterday's plane crash in Madrid. What I hear is a lot of speculation, sensationalism, and graphically morbid detail.

What I am going to talk about in this section of my b.log is reckless cost-cutting. It may be that the Madrid plane was brought down by a bad cost-cutting measure. It may be a mechanic who passed a part as okay when it wasn't (remember, this was apparently the plane's second take-off attempt), it may be the unlikely event of a tiny meteorite hitting the engine, or it may just have been some other random mischance that had a rather catastrophic effect.

This raises a question. There are times when a corner or two can be cut and a few fine details skimped. For example, if I have the washing in the machine and the weather looks about to change, I might skip a bit of the rinse cycle and maybe stop the spin cycle when it sounds like most of the water is spun out. Fairly harmless. You are probably wondering why the hell I'm going on about washing when I'm about to present a picture like:


The connection is actually fairly obvious: I'm cutting a corner or two. Now consider that public transport comes in three main forms, which we can categorise as "planes, trains, and automobiles", to borrow a movie title. The last group is 'buses', but I'm sure you will agree that all three consist of rather large lumps of metal that move at considerable speed, and in many cases contain an explosive mix of fuel and other chemicals.
Over 100 people were erased from this planet so entirely that it will take DNA testing to work out who is who (well, which bits are who). Humans are inherently soft and squishy; when a massive metal structure nosedives into a field at around 150kph, followed by the fireball of a full load of aviation fuel, it is actually something of a miracle that anybody survived. But, sadly, it isn't the first and it won't be the last.

How wise is it to cut costs? I think we are dealing with something of a double-edged sword. Greedy executives and shareholders want a bigger slice of the pie whilst giving their staff insulting pay offers that lead to threats of strikes; yet oddly enough prices keep rising, with the cost of fuel blamed for a lot of it (though I believe that to be a convenient 'excuse'); meanwhile, people want it cheaper and cheaper still. I know people who sift the Internet for flights where the airport tax costs more than the fare for a few hundred miles of flying, and they'd be happy to travel miles out of their way for a flight £10 cheaper.
It is crazy. A fragile machine hurtling around, containing maybe hundreds of equally fragile people, is so utterly NOT the time to be skimping on anything.
In fact, I would like to see an air company put its prices up while ensuring:
  1. Stockholders and directors don't pilfer the profits.
  2. The mechanics are doubled up, well trained, everything is checked and double-checked and if this takes twice as long as everybody else... it takes twice as long as everybody else.
  3. Flight crew are paid a fair wage, are treated well, and are not overworked.
If I could find an airline that runs itself like that, I'd fly with them. We don't need pilots who are stressed and half asleep. We don't need train drivers who are on the go for many hours, with the scant few minutes' walk along the train to swap ends at a turn-around station considered adequate rest time. We don't need flight attendants and conductors who are paid barely above minimum wage and are considering strike action next week. We don't need bus drivers who sneak in MP3 players to keep themselves awake.

If you are the sort of person that habitually takes budget airlines, perhaps you've notched up several dozen flights, maybe more? And maybe it has all gone more or less without incident. Well, this could be a good time to reflect upon the events in Madrid and to realise that minor incidents are no big thing, but the major incidents are the ones to worry about. Five hundred hassle-free flights don't have a lot of value if the 501st nosedives into a field.

So... you take all of my scattered thoughts and you roll them up in a big ball and you ask if you really want to bounce that ball around the inside of the next plane/train/bus that you will be taking.

 

And this happened because?

Sometimes XP can act a bit oddly, like when you are refreshing a long web document, converting a programme to XviD and trying to grab pictures off a live TV source, and you expect a 450MHz processor (with 128MB RAM) to do all of this without the occasional hiccup.
Well, here's the hiccup. A terribly useful message that pinpoints the problem exactly:

 

Safe as houses?

Surely a picture like this ought to tell the government something?

 

Today's word

Today's word is pallor (pah-lor), a word meaning paleness or lack of colour, especially when unnatural: Belinda's pallor is perhaps because she thinks she saw a ghost?
You may alternatively say: Belinda's pallid complexion is perhaps because she thinks she saw a ghost?

 



Valid HTML 4.01 Transitional
Valid CSS
Valid RSS 2.0

 

© 2008 Rick Murray
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted.
RIPA notice: No consent is given for interception of page transmission.

 
