XML Finally Arrives in Microsoft Office

By William H. DuBay

The XML train is finally pulling into the station. It brings a sea change in the way we create, store, and manage information. In October of last year, Microsoft released Office 2003, which brings the promise of XML to the desktop. Previously, when you converted a Word 2000 file to HTML, only the document's properties were saved as XML, in a block embedded in the HTML file. In this new edition, you can save or export Word, Excel, and Access documents as XML documents. Using XML tags, we can now identify the various elements of our documents for manipulation, storage, and retrieval, much as we would data in a database. XML also makes it easier to share the information in those documents across other applications (including Web applications), networks, and operating systems.

You need to use a schema

The Office 2003 implementation of XML requires strict use of XML schemas to control and validate XML files: you cannot create or use an XML file without one. A schema, which replaces the older DTD, is itself an XML file that defines each of the tagged elements in your XML files. All three applications have default schemas: WordML for Word, ReportML for Access, and XMLSS (XML Spreadsheet) for Excel. Access, Excel, and the stand-alone and Professional versions of Word also let you use your own custom schemas.

The version of Word that comes with Office Standard Edition 2003 supports only the default schema. With it, you can save any Word document as an XML file, tagged automatically using the WordML schema. Unlike the other versions, however, it does not let you tag elements in a file manually. It does let you create and use Smart Documents (see below).

Because an XML file can use more than one schema, the stand-alone and Professional versions of Word 2003 let you create an XML file with any combination of schemas, including the default. When you attach a custom schema to your file, the Task Pane shows the structure of the XML document.
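To make the idea of "validating against a schema" concrete, here is a toy stand-in written with Python's standard library, which has no XSD validator of its own. The "schema" is just a dictionary listing which child elements each element may contain, for a hypothetical article format with title, author, and abstract elements. A real schema, and the validation Office performs, also covers element order, counts, data types, attributes, and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, greatly simplified "schema": for each element, the set of
# child elements it may contain. This is NOT an XSD; it only checks names
# and nesting.
TOY_SCHEMA = {
    "article": {"title", "author", "abstract"},
    "title": set(),
    "author": set(),
    "abstract": set(),
}

def validate(xml_text, schema=TOY_SCHEMA, root_name="article"):
    """Return a list of problems; an empty list means the document passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != root_name:
        problems.append(f"root element is <{root.tag}>, expected <{root_name}>")
    # iter() walks the root and all of its descendants in document order.
    for elem in root.iter():
        allowed = schema.get(elem.tag)
        if allowed is None:
            problems.append(f"<{elem.tag}> is not declared in the schema")
            continue
        for child in elem:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
    return problems

good = "<article><title>XML in Office</title><author>DuBay</author></article>"
bad = "<article><title>XML in Office</title><price>10</price></article>"
print(validate(good))  # []
print(validate(bad))
```

The second document fails twice: the undeclared price element is reported both as an unknown element and as an illegal child of article.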
You apply a tag by first selecting text in the main pane and then selecting the element available for that text in the Task Pane. In the main window, you can turn the display of markup tags on or off.

The default WordML schema supports all the rich-text formatting and objects that we are used to in Word documents. If you have created a regular Word document, you can use a custom schema that tags only certain elements of the file. When you save the file as an XML document, your special elements are tagged and validated according to your own schema. You can also choose to have WordML automatically tag all the items you did not tag manually with your own schema. If you do not choose this option, those untagged items are not saved in the XML document.

The Professional and stand-alone versions of Word also support XSLT files, which you can use to transform your XML files into other formats, such as HTML.

The main problem with the new XML feature is that there is, as yet, no way for general users of Office to do "data mining": retrieving items from tagged documents as you would from a database. That will have to wait for the next version of Microsoft Windows, with its new database indexing system. It will let us do such things as retrieve all the <abstract> items from all the documents written by a certain <author> within a certain period. Other applications, however, will also be able to do that.
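Once documents are consistently tagged, a query like the one above is easy for a program to express, even without an operating-system index. The sketch below runs that example query over a few in-memory documents. It assumes a hypothetical custom schema with author, date, and abstract elements; it illustrates the idea and is not an Office or Windows feature.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical custom-tagged documents. The element names <author>, <date>,
# and <abstract> are illustrative and not part of any Office schema.
DOCS = [
    """<article><author>W. DuBay</author><date>2003-11-02</date>
       <abstract>XML arrives in Office.</abstract></article>""",
    """<article><author>W. DuBay</author><date>2001-05-14</date>
       <abstract>Plain language basics.</abstract></article>""",
    """<article><author>A. Writer</author><date>2003-12-01</date>
       <abstract>Schemas for writers.</abstract></article>""",
]

def find_abstracts(docs, author, start, end):
    """Collect <abstract> text from documents by `author` dated within [start, end]."""
    results = []
    for text in docs:
        root = ET.fromstring(text)
        doc_author = root.findtext("author")
        doc_date = date.fromisoformat(root.findtext("date"))
        if doc_author == author and start <= doc_date <= end:
            results.append(root.findtext("abstract"))
    return results

print(find_abstracts(DOCS, "W. DuBay", date(2003, 1, 1), date(2003, 12, 31)))
# ['XML arrives in Office.']
```

In practice the documents would be read from files with `ET.parse(path)` rather than held as strings; an indexing system would simply avoid re-reading every file for every query.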
Fig. 1. Custom-tagged data displayed in a Word 2003 document, with the XML structure shown in the Task Pane on the right. You can choose whether or not to save, in the XML file, the items not manually tagged with the custom schema. If you choose to save them, they are automatically tagged according to the default WordML schema.

Smart documents and InfoPath

All versions of Word 2003 and Excel 2003 feature Smart Documents, which use XML-enabled Smart Tags. A smart document can automatically retrieve related data and enter it in the correct places. When the smart document recognizes a name, for example, it can place the related address, telephone number, and other information in the appropriate places elsewhere in the file. This database-style efficiency reduces the chance of error.
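The behavior described above, recognizing a name and filling in related fields, can be sketched in a few lines. This is only a conceptual illustration: the contact table, field names, and function are hypothetical, and real smart documents are built with the Office Smart Document SDK, not with code like this.

```python
# Hypothetical contact data standing in for the data source a smart
# document would consult. This is NOT the Office Smart Document SDK.
CONTACTS = {
    "Jane Smith": {"address": "12 Main St, Irvine, CA", "phone": "555-0100"},
    "Raj Patel": {"address": "4 Park Ave, Pune", "phone": "555-0199"},
}

def fill_related_fields(form):
    """If form['name'] is a known contact, fill any empty related fields."""
    record = CONTACTS.get(form.get("name", "").strip())
    if record is None:
        return form  # name not recognized: leave the form unchanged
    filled = dict(form)
    for field, value in record.items():
        # Only fill blanks, so text the user typed is never overwritten.
        if not filled.get(field):
            filled[field] = value
    return filled

print(fill_related_fields({"name": "Jane Smith", "address": "", "phone": ""}))
# {'name': 'Jane Smith', 'address': '12 Main St, Irvine, CA', 'phone': '555-0100'}
```

Filling only empty fields is one reasonable design choice; it is also where the error reduction comes from, since the related data is entered once, from one source.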
To read about smart documents, go to the Microsoft Smart Document Web site:

You can also use Office's new InfoPath application, which also comes with the Professional Edition, to create and use highly structured XML forms. Both technical communicators and IT professionals will find many uses for these new documents.

Microsoft has tons of information about the new technology. You can get a general introduction at:
http://www.microsoft.com/office/editions/prodinfo/technologies/xml.mspx

There is an excellent download, complete with a tutorial and sample XML, schema, and XSLT files, at:
http://msdn.microsoft.com/library/default.asp?url=/downloads/list/office2k3.asp

If you are interested in parsing and accessing the XML files created with the default WordML schema, you can download the complete schema and documentation at:
http://rep.oio.dk/Microsoft.com/officeschemas/welcome.htm
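As a small starting point for that kind of parsing, the sketch below pulls the plain text of each paragraph out of a WordML file using only Python's standard library. It assumes the Word 2003 WordML namespace and the basic nesting of text runs inside paragraphs inside the document body; the sample document is made up and far smaller than what Word actually writes, which also includes styles, fonts, and document properties.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003's WordML format, in ElementTree's {uri}tag form.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

# A made-up, minimal WordML document: body > paragraph (w:p) > run (w:r) > text (w:t).
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>XML Finally Arrives</w:t></w:r></w:p>
    <w:p><w:r><w:t>in </w:t></w:r><w:r><w:t>Microsoft Office</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

def paragraph_texts(root):
    """Return the text of each <w:p> paragraph, joining its <w:t> runs."""
    texts = []
    for p in root.iter(W + "p"):
        texts.append("".join(t.text or "" for t in p.iter(W + "t")))
    return texts

root = ET.fromstring(SAMPLE)  # for a saved file: ET.parse("file.xml").getroot()
print(paragraph_texts(root))  # ['XML Finally Arrives', 'in Microsoft Office']
```

Joining the runs matters because Word often splits a single paragraph into several runs wherever the formatting changes.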
(This article is used with permission of the Orange County STC newsletter TechniScribe. It has undergone minor editing in keeping with Indus publishing style and policy. William DuBay is a plain-language consultant and a senior member of the STC. His website is http://www.impact-information.com).