1. Purpose 2. Usage 3. TODO 4. Installation 5. Example
6. Feedback 7. Requirements 7.1. Compilation problems 8. Changelog 9. Copying 10. Downloading
Standards-compliant HTML is a good thing. One of the goals in the development of this program is that it never makes the HTML more broken than it previously was. Ideally it should even leave the HTML better than it was. So if you see the program do the opposite, please tell me.
htmlrecode 1.2.0 - Copyright (C) 1992,2003 Bisqwit (http://iki.fi/bisqwit/)

Usage: htmlrecode [<option> [<...>]]

Reads stdin, writes stdout.

Options:
  -I, --inset setname    Assumed input character set (default: iso-8859-1)
  -O, --outset setname   Wanted output character set (default: iso-8859-1)
  -V, --version          Displays version information.
  -e, --usehex           Use hexadecimal escapes.
  -g, --signature        Prefix the file with an unicode signature.
  -h, --help             This help.
  -l, --lossy            Disable lossless conversion.
  -q, --quiet            Be less verbose.
  -s, --strict           Turn off support for slightly broken HTML.
  -v, --verbose          Be less quiet.
  -x, --xmlmode          XML mode: all tag param values quoted.

Pipe in the HTML file and redirect the output to the result file.
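The -e (--usehex) option selects hexadecimal escapes. As a rough illustration of the difference between the two notations, the sketch below formats one Unicode code point as an HTML numeric character reference, either decimal or hexadecimal. It is not htmlrecode's own code: the function name numeric_escape is hypothetical, and the uppercase hex digits are an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of htmlrecode: format a Unicode code point
// as an HTML numeric character reference, in decimal or hexadecimal.
std::string numeric_escape(std::uint32_t cp, bool use_hex)
{
    // The longest possible result for a 32-bit value is "&#4294967295;"
    // (13 characters), so 16 bytes always suffice.
    char buf[16];
    const unsigned long value = static_cast<unsigned long>(cp);
    if (use_hex)
        std::snprintf(buf, sizeof buf, "&#x%lX;", value);
    else
        std::snprintf(buf, sizeof buf, "&#%lu;", value);
    return std::string(buf);
}
```

For the ideogram 日 (U+65E5), numeric_escape(0x65E5, false) gives the decimal form and numeric_escape(0x65E5, true) the hexadecimal form of the same reference.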
  $ make
  $ su
  # make install

If you do not want to install libargh (included in the archive), do not use "make install". Instead, edit the Makefile and enable STATIC linking instead of DYNAMIC.
Here are some latin letters: åäöñé
Here are some CJK (chinese/japanese/korean ideograms): 日本
Here are some html escapes: >"äöê
Source code of the above:
  Here are some latin letters: åäöñé<br>
  Here are some CJK (chinese/japanese/korean ideograms): 日本<br>
  Here are some html escapes: >"äöê<br>

What your browser is getting is not &#26085; etc. but the actual UTF-8 characters.
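To make concrete what "the actual UTF-8 characters" means at the byte level, here is a minimal sketch of encoding a single Unicode code point as UTF-8. It is an illustration only and not taken from htmlrecode's source; the function name encode_utf8 is hypothetical, and returning an empty string for invalid input is a choice made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative only, not htmlrecode's code: encode one Unicode code point
// as UTF-8. Returns an empty string for values above U+10FFFF or inside
// the surrogate range, which have no valid UTF-8 encoding.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return out;

    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, 日 (U+65E5, decimal 26085) becomes the three bytes E6 97 A5, and ä (U+00E4) becomes the two bytes C3 A4.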
htmlrecode.hh has some settings you can try to choose between. Try this:

Replace

  //#define wstring ucs4string
  typedef wchar_t ucs4;
  //typedef unsigned int ucs4;
  //typedef basic_string<ucs4> wstring;

With

  //#define wstring ucs4string
  //typedef wchar_t ucs4;
  typedef unsigned int ucs4;
  typedef basic_string<ucs4> wstring;

This might help compiling on g++-2.95.
Since 1.3.0:
- Compilation fixes on more up-to-date compilers. (Thanks Santiago M. Mola)

Since 1.2.0:
- Abruptly terminated multibyte sequences no longer cause htmlrecode to enter an infinite loop

Since 1.1.5:
- Tags are now recognized in all mixed case
- Tag values can be in '', not only in ""
- -:_. are recognized to be part of tag value if no "" is there
- Nonspace are also recognized as above :( (unless -s option was used)
- SCRIPT and STYLE contents are "raw" until the next </, unless -s was used
- SCRIPT/STYLE contents are properly rehidden if necessary
- " and ' quotes (and no quotes) are used wisely
- Warnings from some bad HTML
- Indentations inside tags are now kept mostly intact
- XHTML support
- Unicode signature character support
- Major structural rewrites
- New "configure" script
- Big thanks to Winfried Szukalski for his thorough testing efforts and comments.

Since 1.1.4:
- workaround for g++ versions, now compiles with g++-3

Since 1.1.3:
- optimizations
- error resistance

Since 1.1.2:
- hex support
- g++ string workarounds

Since 1.1.1:
- improved documentation
- fixed &lt; (was output as &gt;, should be &lt;)
Generated from progdesc.php (last updated: Tue, 21 Jul 2009 15:55:58 +0300) with docmaker.php (last updated: Tue, 21 Jul 2009 15:55:58 +0300) at Tue, 21 Jul 2009 15:55:58 +0300