How does XML handle white-space in documents?
All white-space, including linebreaks, TAB characters, and normal spaces, even between ‘structural’ elements where no text can ever appear, is passed by the parser unchanged to the application (browser, formatter, viewer, converter, etc). Where the information is available to the parser (eg from a DTD or Schema), it also identifies the context in which the white-space was found (element content, data content, or mixed content). This means it is the application’s responsibility to decide what to do with such space, not the parser’s:
- insignificant white-space between structural elements (space which occurs where only element content is allowed, ie between other elements, where text data never occurs) will get passed to the application (in SGML this white-space gets suppressed, which is why you can put all that extra space in HTML documents and not worry about it);
- significant white-space (space which occurs within elements which can contain text and markup mixed together, usually mixed content or PCDATA) will still get passed to the application exactly as under SGML.
It is the application’s responsibility to handle it correctly. The parser must inform the application that white-space has occurred in element content, if it can detect it. (Users of SGML will recognize that this information is not in the ESIS, but it is in the Grove.)
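The behaviour described above can be seen with a small sketch using Python's standard `xml.etree.ElementTree` module. The sample document and the `strip_element_content_space` helper are made up for illustration. Note that ElementTree does not report whether white-space occurred in element content, so the helper guesses by treating white-space-only text next to child elements as insignificant, which is an application-level decision and not something XML mandates.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the indentation sits between structural elements only.
doc = "<list>\n  <item>a</item>\n  <item> b </item>\n</list>"
root = ET.fromstring(doc)

# White-space between elements is passed through, not suppressed by the parser.
between = [root.text] + [child.tail for child in root]

# White-space inside text-bearing elements is passed through as well.
inside = [child.text for child in root]


def strip_element_content_space(elem):
    """Application-level choice: drop white-space-only text around child elements."""
    if len(elem) and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        if child.tail is not None and not child.tail.strip():
            child.tail = None
        strip_element_content_space(child)


print(between)  # the linebreaks and indentation, exactly as written
print(inside)   # the spaces around "b" are still there

strip_element_content_space(root)
print(ET.tostring(root, encoding="unicode"))
```

The parser hands over every space; only the final call, which is the application's own code, removes anything.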
Which parts of an XML document are case-sensitive?
- All of it, both markup and text. This is significantly different from HTML and most other SGML applications. It was done to allow markup in non-Latin-alphabet languages, and to obviate problems with case-folding in writing systems which are caseless.
- Element type names are case-sensitive: you must follow whatever combination of upper- or lower-case you use to define them (either by first usage or in a DTD or Schema). So you can’t say <BODY>…</body>: upper- and lower-case must match. Thus <Img/>, <IMG/>, and <img/> are three different element types;
- For well-formed XML documents with no DTD, the first occurrence of an element type name defines the casing;
- Attribute names are also case-sensitive, for example the two width attributes in <PIC width="7in"/> and <PIC WIDTH="6in"/> (if they occurred in the same file) are separate attributes, because of the different case of width and WIDTH;
- Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML") always have been, but NAME types (ID and IDREF attributes, and token list attributes) are now case-sensitive as well;
- All general and parameter entity names (eg &Aacute;), and your data content (text), are case-sensitive as always.
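A quick way to check these rules is to feed a few fragments to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the fragments are the examples from the list above, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# A start-tag and end-tag that differ only in case do not match,
# so the fragment is not well-formed.
try:
    ET.fromstring("<BODY>text</body>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

# Three spellings, three distinct element type names.
root = ET.fromstring("<doc><Img/><IMG/><img/></doc>")
tags = [child.tag for child in root]

# width and WIDTH are separate attributes, so both can even sit on one element.
pic = ET.fromstring('<PIC width="7in" WIDTH="6in"/>')
attrs = dict(pic.attrib)

print(mismatch_rejected)
print(tags)
print(attrs)
```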
Does XML provide the option to use non-Latin characters?
Yes, the XML Specification explicitly says XML uses ISO 10646, the international standard character repertoire which covers most known languages. Unicode is an identical repertoire, and the two standards track each other. The spec says (2.2): ‘All XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646…’.
There is a Unicode FAQ at http://www.unicode.org/faq/FAQ. UTF-8 is an encoding of Unicode into 8-bit units: the first 128 code points are the same as ASCII, and octets with the high bit set are used to encode everything else in Unicode as sequences of between 2 and 4 bytes (the original definition allowed up to 6, but RFC 3629 restricts UTF-8 to 4). UTF-8 in its single-octet form is therefore the same as ISO 646 IRV (ASCII), so you can continue to use ASCII for English or other languages using the Latin alphabet without diacritics. Note that UTF-8 is incompatible with ISO 8859-1 (ISO Latin-1) after code point 127 decimal (the end of ASCII). UTF-16 is an encoding of Unicode into 16-bit units: characters in the Basic Multilingual Plane (up to U+FFFF) take one unit, and characters in the 16 supplementary planes take a surrogate pair of two units. UTF-16 is incompatible with ASCII because it uses two 8-bit bytes per character (four bytes above U+FFFF).
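The encoding behaviour described above can be checked with a few lines of Python. This is only a sketch: the sample characters and the Greek element name are chosen arbitrarily for illustration, and the last step assumes the parser behind `xml.etree.ElementTree` accepts non-Latin element names, as the XML Specification requires.

```python
import xml.etree.ElementTree as ET

# Arbitrary sample characters, written as escapes so the source stays ASCII.
samples = {
    "A": "A",               # U+0041, plain ASCII
    "e-acute": "\u00e9",    # U+00E9, Latin letter with a diacritic
    "euro": "\u20ac",       # U+20AC, still in the Basic Multilingual Plane
    "emoji": "\U0001F600",  # U+1F600, above U+FFFF
}

# Number of bytes each character occupies in each encoding.
utf8_len = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
utf16_len = {name: len(ch.encode("utf-16-be")) for name, ch in samples.items()}

# Single-octet UTF-8 is identical to ASCII...
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")
# ...but above code point 127 UTF-8 and ISO 8859-1 produce different bytes.
latin1_bytes = "\u00e9".encode("latin-1")
utf8_bytes = "\u00e9".encode("utf-8")

# Non-Latin characters may appear in markup as well as in text.
greek = '<?xml version="1.0" encoding="UTF-8"?><Ελληνικά>κείμενο</Ελληνικά>'
elem = ET.fromstring(greek.encode("utf-8"))

print(utf8_len)
print(utf16_len)
print(same_as_ascii, latin1_bytes, utf8_bytes)
print(elem.tag, elem.text)
```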
Does XML allow the user to make up his own tags?
No, it lets you make up names for your own element types. If you think tags and elements are the same thing you are already in considerable trouble: read the rest of this question carefully.
How can I create my own document type?
Document types usually need a formal description, either a DTD or a Schema. Whilst it is possible to process well-formed XML documents without any such description, trying to create them without one is asking for trouble. A DTD or Schema is used with an XML editor or API to guide and control the construction of the document, making sure the right elements go in the right places. Creating your own document type therefore begins with an analysis of the class of documents you want to describe: reports, invoices, letters, configuration files, credit-card verification requests, or whatever. Once you have the structure correct, you write code to express this formally, using DTD or Schema syntax.
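As a rough illustration of that last step, suppose the analysis of a hypothetical memo document type produced the content model shown in the comment below. Python's standard library has no validating parser, so the check here is a hand-written stand-in for what a validating parser or XML editor would do with a real DTD. The element names, the `CONTENT_MODEL` table, and the `conforms` helper are all invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD fragment that this sketch mimics (the code does not parse it):
#   <!ELEMENT memo (to, from, body)>
#   <!ELEMENT to   (#PCDATA)>
#   <!ELEMENT from (#PCDATA)>
#   <!ELEMENT body (#PCDATA)>
CONTENT_MODEL = {
    "memo": ["to", "from", "body"],  # fixed sequence of child elements
    "to": [],                        # text only, no child elements
    "from": [],
    "body": [],
}


def conforms(elem):
    """Return True if elem and all its descendants match CONTENT_MODEL."""
    expected = CONTENT_MODEL.get(elem.tag)
    if expected is None:
        return False  # element type was never declared
    if [child.tag for child in elem] != expected:
        return False  # missing, extra, or misordered children
    return all(conforms(child) for child in elem)


good = ET.fromstring("<memo><to>Ann</to><from>Bob</from><body>Hi</body></memo>")
bad = ET.fromstring("<memo><from>Bob</from><to>Ann</to></memo>")

print(conforms(good), conforms(bad))
```

Both documents are well-formed, but only the first puts the right elements in the right places.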
How can we get XML into or out of a database?
Ask your database manufacturer: they all provide XML import and export modules to connect XML applications with databases. In some trivial cases there will be a 1:1 match between field names in the database table and element type names in the XML Schema or DTD, but in most cases some programming will be required to establish the desired match. This can usually be stored as a procedure so that subsequent uses are simply commands or calls with the relevant parameters.
In less trivial, but still simple, cases, you could export by writing a report routine that formats the output as an XML document, and you could import by writing an XSLT transformation that formatted the XML data as a load file.
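For the trivial 1:1 case, a report routine of this kind might look like the sketch below, which uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table, column names, and helper functions are hypothetical, and the import side is done directly in Python here instead of through an XSLT-generated load file, purely to keep the example self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table, row_tag):
    """Format every row of `table` as a <row_tag> element; columns become child elements."""
    root = ET.Element(table)
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def import_table(conn, table, xml_text):
    """Insert one row per child of the root, matching element names to column names."""
    for item in ET.fromstring(xml_text):
        names = [child.tag for child in item]
        values = [child.text for child in item]
        marks = ", ".join("?" for _ in names)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(names)}) VALUES ({marks})", values
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE parts_copy (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [(1, "bolt"), (2, "nut & washer")])

xml_text = export_table(conn, "parts", "part")
import_table(conn, "parts_copy", xml_text)
copied = conn.execute("SELECT id, name FROM parts_copy ORDER BY id").fetchall()

print(xml_text)
print(copied)
```

Building the output with an XML library, not string concatenation, means characters such as `&` in the data are escaped correctly.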
How does XML handle metadata?
Because XML lets you define your own markup languages, you can make full use of the extended hypertext features of XML (see the question on Links) to store or link to metadata in any format (eg using ISO 11179, as a Topic Maps Published Subject, with Dublin Core, Warwick Framework, or with Resource Description Framework (RDF), or even Platform for Internet Content Selection (PICS)).
There are no predefined elements in XML, because it is an architecture, not an application, so it is not part of XML’s job to specify how or if authors should or should not implement metadata. You are therefore free to use any suitable method. Browser makers may also have their own architectural recommendations or methods to propose.
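As one possible approach among many, the sketch below embeds a few Dublin Core elements in a made-up report document and reads them back with Python's standard `xml.etree.ElementTree`. The `report`, `meta`, and `body` element names and the sample values are hypothetical; only the Dublin Core namespace URI is taken from that standard.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical document type with a metadata section of its own design.
doc = f"""<report xmlns:dc="{DC}">
  <meta>
    <dc:title>Quarterly Results</dc:title>
    <dc:creator>A. Author</dc:creator>
    <dc:date>2001-03-31</dc:date>
  </meta>
  <body>Report text goes here.</body>
</report>"""

root = ET.fromstring(doc)

# ElementTree spells namespaced names as "{namespace-uri}localname".
prefix = "{" + DC + "}"
metadata = {
    child.tag[len(prefix):]: child.text
    for child in root.find("meta")
    if child.tag.startswith(prefix)
}

print(metadata)
```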