Demoronizer for Microsoft’s HTML

I discovered another piece of computing history related to Microsoft’s disregard for standards. In the early days of the Web, there was ASCII, which supported only American English characters. It was fine for English-speaking users, but obviously not fine for the rest of the world, who wanted to use other symbols. To solve this, the ISO 8859 family of encodings was standardized. ASCII occupies only 7 bits, so these encodings used the 8th bit of each byte to add 128 more code points on top of ASCII; hence, they were backwards compatible with ASCII.
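To make the backwards compatibility concrete, here is a minimal Python sketch. The helper name uses_eighth_bit and the sample strings are made up for illustration; only the standard codecs are used.

```python
def uses_eighth_bit(data: bytes) -> bool:
    """Return True if any byte has its high (8th) bit set."""
    return any(byte & 0x80 for byte in data)


ascii_text = "plain text"
accented = "caf\u00e9"  # "cafe" with an acute accent on the e

# Pure ASCII text is byte-for-byte identical in ASCII and in ISO 8859-1.
assert ascii_text.encode("ascii") == ascii_text.encode("iso-8859-1")
assert not uses_eighth_bit(ascii_text.encode("iso-8859-1"))

# A non-ASCII letter lands in the upper half (0x80-0xFF) of the byte range.
encoded = accented.encode("iso-8859-1")
print(encoded)                   # b'caf\xe9'
print(uses_eighth_bit(encoded))  # True
```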

ISO 8859-1, or “Latin 1”, was the most widely used of these encodings. It was the default charset for Linux, for early HTML versions, for the X server, etc. Since it was used in HTML, it was used by Microsoft software too. Not surprisingly, Microsoft had its own unique vision of standards.

Microsoft delivered its own variant of Latin-1, Windows-1252, with its own set of characters in the range 0x82 through 0x95. ISO Latin-1 assigns no printable characters to that range. What characters were in the proprietary range? Quotes, apostrophes and a few others.
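If you are curious what exactly lives in that range, Python’s built-in cp1252 codec can list it. This is only a sketch: the function name cp1252_extras is hypothetical, and it relies on Python’s codec table, which leaves a few bytes in the range unassigned.

```python
import unicodedata


def cp1252_extras(start: int = 0x82, end: int = 0x95) -> dict[int, str]:
    """Map each byte in [start, end] to the character Windows-1252 assigns it."""
    table = {}
    for byte in range(start, end + 1):
        try:
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves some bytes (e.g. 0x8D) undefined.
            continue
    return table


for byte, char in cp1252_extras().items():
    print(f"0x{byte:02X} -> U+{ord(char):04X} {unicodedata.name(char)}")

# The same bytes decoded as ISO 8859-1 are invisible control codes.
control = bytes([0x93]).decode("iso-8859-1")
print(unicodedata.category(control))  # Cc
```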

If you created HTML in Microsoft software with the “smart quotes” option enabled (it is enabled by default), the resulting HTML document would be in Windows-1252. Hence, it would contain quotes and apostrophes from the range that ISO Latin-1 leaves unused. As a result, when a Windows-1252 file is opened in a browser on a non-Windows OS, the invalid characters are ignored or replaced by question marks or white space, making the page ugly.
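The breakage is easy to reproduce. The sketch below is an illustration, not a model of any particular browser: it encodes a short string with smart quotes as Windows-1252 and then decodes the bytes the way a reader that does not know about Windows-1252 might.

```python
smart = "\u201cHi\u201d"      # Hi wrapped in curly double quotes
raw = smart.encode("cp1252")  # what the Microsoft tool would write to disk
print(raw)                    # b'\x93Hi\x94'

# Read as ISO 8859-1: the quotes turn into invisible control codes.
as_latin1 = raw.decode("iso-8859-1")
print(repr(as_latin1))        # '\x93Hi\x94'

# A strict reader that substitutes a placeholder for unknown bytes.
replaced = raw.decode("ascii", errors="replace")
print(repr(replaced))         # '\ufffdHi\ufffd'

# A reader that silently drops unknown bytes: the quoteless page.
dropped = raw.decode("ascii", errors="ignore")
print(dropped)                # Hi
```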

Surprisingly, there are still quoteless and apostropheless web pages out there on the Internet. If you own one of them, please apply the demoronizer to it.
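For a rough idea of what such a cleanup involves, here is a minimal Python sketch. It is not the demoronizer itself: the function name demoronize and the replacement table are my own assumptions, and the table covers only a handful of the Windows-1252 punctuation characters.

```python
# Hypothetical mapping from Windows-1252 punctuation to plain ASCII.
REPLACEMENTS = {
    "\u2018": "'",    # left single quote  (0x91)
    "\u2019": "'",    # right single quote (0x92)
    "\u201c": '"',    # left double quote  (0x93)
    "\u201d": '"',    # right double quote (0x94)
    "\u2026": "...",  # ellipsis           (0x85)
    "\u2022": "*",    # bullet             (0x95)
}
TABLE = str.maketrans(REPLACEMENTS)


def demoronize(raw: bytes) -> str:
    """Decode Windows-1252 bytes and swap smart punctuation for ASCII."""
    # errors="replace" keeps going if the file contains unassigned bytes.
    text = raw.decode("cp1252", errors="replace")
    return text.translate(TABLE)


page = b"<p>\x93It\x92s fine\x94\x85</p>"
print(demoronize(page))  # <p>"It's fine"...</p>
```

Replacing with plain ASCII is the most conservative choice; another reasonable option would be to keep the curly quotes and re-save the page as UTF-8 with a matching charset declaration.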

Resources: