How can I make my existing HTML files work in XML?

Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 06:08
Either convert them to conform to some new document type (with or without a DTD) and write a stylesheet to go with them; or edit them to conform to XHTML.

It is necessary to convert existing HTML files because XML does not permit end-tag minimization (a missing </p>, etc), unquoted attribute values, and a number of other shortcuts which are normal in most HTML DTDs. However, many HTML authoring tools already produce almost (but not quite) well-formed XML. As a preparation for XML, the W3C's HTML Tidy program can clean up some of the formatting mess left behind by inadequate HTML editors, and even separate out some of the formatting to a stylesheet, but there is usually still some hand-editing to do.

Converting to a new document type

If you want to move your files out of HTML into some other DTD entirely, there are already many native XML application DTDs, and several XML versions of popular SGML DTDs like TEI and DocBook to choose from. There is a pilot site run by CommerceNet (http://www.xmlx.com/) for the exchange of XML DTDs.

Alternatively you could just make up your own markup: so long as it makes sense and you create a well-formed file, you should be able to write a CSS or XSLT stylesheet and have your document displayed in a browser.

Converting valid HTML to XHTML

If your HTML files are valid (full formal validation with an SGML parser, not just a simple syntax check), then try validating them as XHTML. If you have been creating clean HTML without embedded formatting then this process should throw up only mismatches in upper/lowercase element and attribute names, and empty elements (plus perhaps the odd non-standard element type name if you use them).
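
A quick first pass before full validation is a plain well-formedness check. The sketch below uses Python's standard XML parser; it is only an illustration (the function name well_formedness_error is made up for this example), and it does not validate against the XHTML DTD, so it will not report uppercase names that are consistently matched or non-standard element types.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup parses as XML, otherwise the parser's message.

    Hypothetical helper: this checks well-formedness only, not validity
    against any DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Typical leftovers from HTML: an unclosed empty element, mismatched
# case between start-tag and end-tag, and an unquoted attribute value.
samples = [
    "<p>one<br/>two</p>",      # well-formed
    "<p>one<br>two</p>",       # <br> is never closed
    "<P>text</p>",             # names are case-sensitive in XML
    "<img src=pic.gif />",     # attribute value must be quoted
]
for sample in samples:
    print(sample, "->", well_formedness_error(sample) or "ok")
```

Each message includes the line and column where the parser gave up, which is usually enough to find the offending tag.
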
Simple hand-editing or a short script should be enough to fix these mismatches.

If your HTML validly uses end-tag omission, this can be fixed automatically by a normalizer program like sgmlnorm (part of SP) or by the sgml-normalize function in an editor like Emacs/psgml (don't be put off by the names, they both do XML).

If you have a lot of valid HTML files, you could write a script to do this in a programming language which understands SGML/XML markup (such as Omnimark, Balise, SGMLC, or a system using one of the SGML libraries for Perl, Python, or Tcl), or you could even use editor macros if you know what you're doing.

Converting invalid HTML to well-formed XHTML

If your files are invalid HTML (95% of the Web) they can be converted to well-formed DTDless files as follows:

1. replace the DOCTYPE Declaration with the XML Declaration <?xml version="1.0" encoding="iso-8859-1"?>. If there was no DOCTYPE Declaration, just prepend the XML Declaration;
2. change any EMPTY elements (eg every <base>, <link>, and <meta> in the header, and every <br>, <hr>, <img>, and <input> in the body of the document) so that they end with /> instead, for example <img src="..." alt="Picture"/>;
3. make all element names and attribute names lowercase;
4. ensure there are correctly-matched explicit end-tags for all non-empty elements; eg every <p> must have a </p>, etc;
5. escape all < and & non-markup (ie literal text) characters as &lt; and &amp; respectively (there shouldn't be any isolated < characters to start with);
6. ensure all attribute values are in quotes.

Be aware that many HTML browsers may not accept XML-style EMPTY elements with the trailing slash, so the above changes may not be backwards-compatible. An alternative is to add a dummy end-tag to all EMPTY elements, so <img src="..." alt="Picture"> becomes <img src="..." alt="Picture"></img>. This is still valid XML provided you guarantee never to put any text content in such elements. Adding a space before the slash (eg <img src="..." alt="Picture" />) may also fool older browsers into accepting XHTML as HTML.

If your HTML files fall into this category (HTML created by some WYSIWYG editors is frequently invalid) then they will almost certainly have to be converted manually, although if the deformities are regular and carefully constructed, the files may actually be almost well-formed, and you could write a program or script to do as described above. The oddities you may need to check for include:

- do the files contain markup syntax errors? For example, are there any missing angle-brackets, backslashes instead of forward slashes on end-tags, or elements which nest incorrectly (eg an element starting inside another but ending outside)?
- are there any URLs (eg in hrefs or srcs) which use backslashes instead of forward slashes?
- do the files contain markup which conflicts with HTML DTDs, such as headings or lists inside paragraphs, list items outside list environments, header elements (such as <title>) appearing before the start-tag that should enclose them, etc?
- do the files use imaginary elements which are not in any known HTML DTD? (large numbers of these are used in proprietary markup systems masquerading as HTML).
  Although this is easy to transform to a DTDless well-formed file (because you don't have to define elements in advance) most proprietary or browser-specific extensions have never been formally defined, so it is often impossible to work out meaningfully where the element types can be used.
- are there any non-ISO Latin-1 (8859-1) characters or wrongly-coded characters in your files? Look especially for native Apple Mac characters left by careless designers, or any of the illegal characters (the 32 characters at decimal codes 128-159 inclusive) inserted by MS-Windows editors. These need to be converted to the correct characters in ISO 8859-1 or the relevant plane of Unicode (and the XML Declaration should show iso-8859-1 encoding unless you specifically know otherwise).
- do your files contain malformed (Mosaic/Netscape-style) comments? Comments must look <!-- like this -->.

If you answer Yes to any of these, you can save yourself a lot of grief by fixing those problems first before doing anything else. You will likely then be getting close to having well-formed files.

Markup which is syntactically correct but semantically meaningless or void should be edited out before conversion. Examples are spacing devices such as repeated empty paragraphs or linebreaks, empty tables, invisible spacing GIFs etc: XML uses stylesheets, so you won't need any of these.

Unfortunately there is rather a lot of work to do if your files are invalid: this is why many professional Webmasters will always insist that only valid or well-formed files are used (and why you should instruct designers to do the same), in order to avoid unnecessary manual maintenance and conversion costs later.
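
For files whose problems are regular, the mechanical parts of the conversion described earlier (lowercase names, quoted attribute values, closed EMPTY elements, escaped literal < and &) can be scripted. What follows is a minimal sketch using Python's standard html.parser module, not a complete converter: the names XhtmlWriter and to_wellformed are invented for this example, the EMPTY set is the HTML 4.01 list and omits proprietary elements, and the script does not supply missing end-tags, repair bad nesting, or add the XML Declaration. Comments and the DOCTYPE are silently dropped.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements declared EMPTY in the HTML 4.01 DTDs (assumption: no
# proprietary empty elements occur in the input).
EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr",
         "img", "input", "isindex", "link", "meta", "param"}


class XhtmlWriter(HTMLParser):
    """Hypothetical re-serializer: HTML tokens in, XML-style tags out."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _attrs(self, attrs):
        # The parser has already lowercased attribute names. A minimized
        # attribute such as "checked" arrives with value None and is
        # expanded to checked="checked".
        return "".join(
            f" {name}={quoteattr(name if value is None else value)}"
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased the element name.
        slash = " /" if tag in EMPTY else ""
        self.out.append(f"<{tag}{self._attrs(attrs)}{slash}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag not in EMPTY:  # stray </br> etc would break well-formedness
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entity references were decoded by the parser; re-escape the
        # characters that XML reserves.
        self.out.append(escape(data))


def to_wellformed(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)


print(to_wellformed('<P ALIGN=center>Fish &amp; chips<BR>'
                    '<IMG SRC=pic.gif ALT="Picture"></P>'))
# <p align="center">Fish &amp; chips<br /><img src="pic.gif" alt="Picture" /></p>
```

The space before the slash follows the backwards-compatibility advice above. Because named entity references are decoded to the characters themselves, the output does not depend on a DTD to define them, but it should then be saved in an encoding that matches the XML Declaration.
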