Short notes on XML [ Tutorial ]
Written by Veena Devi
This is a short tutorial on XML
What is XML?
• XML stands for eXtensible Markup Language
• XML is a universal method of representing data
– Used in applications, on the web, and for data exchange
• XML is a markup language much like HTML, but used for different purposes
• XML is not a replacement for HTML
• XML was designed to describe data
• XML is a cross-platform, software and hardware independent tool for transmitting or exchanging information.
• XML is an open-standards-based technology
• XML is extensible: you can define your own tags
• XML is both human and machine readable
What Exactly is XML used for?
• Storing data in a structured manner (tree structure)
• Storing configuration information – typically data in an application which is not stored in a database
– Much server software keeps its configuration files in XML format
• Transmitting data between applications
– Overcomes problems in client-server applications which are cross-platform in nature
• Ex: A Windows program talking to a mainframe
• Little and Big Endian problems
• Data type size variations across platforms
– XML is a universal, standardized language for representing data so that it can be processed independently and exchanged between programs and between clients and servers
– Disparate systems can exchange information in a common format
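To illustrate the byte-order problem mentioned above, here is a minimal Python sketch. The `<count>` tag and the value are made up for the example. It shows that the same integer has two different binary layouts, and that a receiver assuming the wrong layout reads a different number, while the XML text form is read the same way on any platform.

```python
import struct
import xml.etree.ElementTree as ET

value = 1

# The same 32-bit integer has different byte layouts on different platforms.
little = struct.pack("<I", value)  # little endian: least significant byte first
big = struct.pack(">I", value)     # big endian: most significant byte first

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack(">I", little)[0]

# As XML the value travels as text, so byte order and integer size
# on the sending machine do not matter. <count> is a hypothetical tag.
message = f"<count>{value}</count>"
received = int(ET.fromstring(message).text)

print(little, big, misread, received)
```

The binary forms differ and the misread value is far from 1, while the value parsed from the XML message matches what was sent.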
XML Syntax
• The syntax rules of XML are very simple and very strict.
• XML tags are not predefined. You must define your own tags
– <company> Infosys</company>
• All XML elements must have a closing tag
– <para>This is a paragraph</para>
• XML tags are case sensitive
– <Msg>This is incorrect</msg> Incorrect
– <msg>This is correct</msg> Correct
• All XML elements must be properly nested
– <name>Jill<lname>Jack</name></lname> Incorrect
– <name>Jill<lname>Jack</lname></name> Correct
• Attribute values must always be quoted
– <pen color=red>reynolds</pen> Incorrect
– <pen color="red">reynolds</pen> Correct
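The rules above can be checked mechanically. The following is a small sketch that uses Python's standard `xml.etree.ElementTree` parser (our choice for illustration; the tutorial itself does not prescribe a tool) to test the correct and incorrect samples from the list. The helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<msg>This is correct</msg>",
    "<Msg>This is incorrect</msg>",           # tag names differ in case
    "<name>Jill<lname>Jack</lname></name>",
    "<name>Jill<lname>Jack</name></lname>",   # improperly nested
    '<pen color="red">reynolds</pen>',
    "<pen color=red>reynolds</pen>",          # attribute value not quoted
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

Each sample marked incorrect in the list is rejected by the parser, which is what "very strict" means in practice: a parser does not try to guess what was intended.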
XML Syntax
• All XML documents must have a root element
<parent>
  <child>
    <subchild>.....</subchild>
  </child>
</parent>
• With XML, the white space in your document is not truncated
– <name> Bill Joy</name>
• With XML, CR / LF is converted to LF
– In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF).
– In Unix applications, a new line is normally stored as a LF character.
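Both whitespace rules can be observed with a parser. This is a short sketch using Python's standard `xml.etree.ElementTree`; the `<note>` tag is a made-up example.

```python
import xml.etree.ElementTree as ET

# Leading white space inside an element is kept, not trimmed.
name = ET.fromstring("<name> Bill Joy</name>")
print(repr(name.text))

# A Windows-style CR/LF pair in the input reaches the application as one LF.
note = ET.fromstring("<note>line one\r\nline two</note>")
print(repr(note.text))
```

The application therefore sees the same line breaks whether the file was written on Windows or on Unix.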
XML Comments
• Comments in XML
– Comments are similar to HTML
– <!-- This is a comment -->
<?xml version="1.0"?>
<!-- Customer details -->
<customer>
  <name>John Conlon</name>
  <email>...</email>
</customer>
XML Code
<?xml version="1.0"?>
<customers>
  <customer>
    <name>John Conlon</name>
    <email>...</email>
  </customer>
  <customer>
    <name>Tom</name>
    <email/>
  </customer>
</customers>
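A program can read a document like the one above in a few lines. The sketch below uses Python's standard `xml.etree.ElementTree`. The email address is a placeholder invented for the example, and the function name `read_customers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same structure as the customers document; the address is a placeholder.
XML_TEXT = """<?xml version="1.0"?>
<customers>
  <customer>
    <name>John Conlon</name>
    <email>john@example.com</email>
  </customer>
  <customer>
    <name>Tom</name>
    <email/>
  </customer>
</customers>"""


def read_customers(text):
    """Return a list of (name, email) pairs; an empty <email/> becomes None."""
    root = ET.fromstring(text)
    result = []
    for customer in root.findall("customer"):
        name = customer.findtext("name")
        email = customer.findtext("email") or None
        result.append((name, email))
    return result


customers = read_customers(XML_TEXT)
for name, email in customers:
    print(name, email)
```

Note how the empty element `<email/>` is still present in the tree; it simply carries no text.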
Extensibility in XML
• A typical XML document is made up of tags enclosing the data; tag names describe the data
• Because the language is extensible, you can create tags that are specific to your need
• For example, your document may contain tags to structure information about employees
– The tags may include <Name>, <Designation>, and <Address>
• Data stored in XML is self-descriptive
– One can understand the data by just looking at tag names
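As a sketch of what "define your own tags" looks like in code, the following builds an employee record with the tag names from the example above. The root tag `<Employee>` and all the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# No tag name here is predefined by XML; each one is chosen by the author.
employee = ET.Element("Employee")
ET.SubElement(employee, "Name").text = "Asha Rao"
ET.SubElement(employee, "Designation").text = "Engineer"
ET.SubElement(employee, "Address").text = "Bangalore"

xml_text = ET.tostring(employee, encoding="unicode")
tags = [child.tag for child in employee]

print(xml_text)
print(tags)
```

Reading the tag names alone is enough to tell what each value means, which is what "self-descriptive" refers to.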
XML – Exchanging Info Between Apps
• Convert information stored in the database (or any other format) to an XML format
• Once it is in XML format, other applications/programs can parse (read) the XML document and recover the original data
• XML parsers are freely available and are included in most modern programming languages
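The exchange described above can be sketched end to end. In this illustration an in-memory SQLite database stands in for "the database". The table name, column names, sample rows and both function names are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_customers(conn):
    """Sending side: turn database rows into an XML string."""
    root = ET.Element("customers")
    query = "SELECT name, email FROM customer ORDER BY name"
    for name, email in conn.execute(query):
        customer = ET.SubElement(root, "customer")
        ET.SubElement(customer, "name").text = name
        ET.SubElement(customer, "email").text = email
    return ET.tostring(root, encoding="unicode")


def import_customers(xml_text):
    """Receiving side: parse the XML and recover the original data."""
    root = ET.fromstring(xml_text)
    return [(c.findtext("name"), c.findtext("email"))
            for c in root.findall("customer")]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("John Conlon", "john@example.com"), ("Tom", "tom@example.com")],
)

xml_text = export_customers(conn)
rows = import_customers(xml_text)
print(xml_text)
print(rows)
```

The receiving function needs no knowledge of the database, only of the agreed XML structure.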


Document Type Definition (DTD)
• A DTD (Document Type Definition) is used to enforce structure requirements for an XML document
• The document type declaration (<!DOCTYPE ...>) contains the DTD or a reference to it, and tells the parser which DTD to use for validation
<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer)>
  <!ELEMENT customer (name,email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customers>
  <customer>
    <name>John Conlon</name>
    <email>...</email>
  </customer>
</customers>
XML Schema
• An XML-based alternative to DTD
• Richer and more expressive than DTDs
• Written in XML itself, so no separate syntax has to be learned
• Supports data type validation (DTDs do not)
<?xml version="1.0"?>
<addressBook>
  <person>
    <cname>Harrison Ford</cname>
    <email>...</email>
  </person>
  <person>
    <cname>Julia Roberts</cname>
    <email>...</email>
  </person>
</addressBook>
XML Schema
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="record">
    <xs:sequence>
      <xs:element name="cname" type="xs:string"/>
      <xs:element name="email" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="addressBook">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="record"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
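Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below does not validate anything (Python's standard library has no schema validator); it only parses the address book schema, written here with the `xs` prefix bound to the XML Schema namespace, and lists the elements it declares.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA_TEXT = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="record">
    <xs:sequence>
      <xs:element name="cname" type="xs:string"/>
      <xs:element name="email" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="addressBook">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="record"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA_TEXT)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [(el.get("name"), el.get("type"))
            for el in schema.iter(f"{{{XS}}}element")]

for name, type_name in declared:
    print(name, type_name)
```

`addressBook` has no `type` attribute because its type is defined inline by the nested `xs:complexType`.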
Well-formed XML Documents
• A document is made of elements; exactly one of them, called the root or document element, contains all the others
• All other elements, delimited by start- and end-tags, must nest properly within each other
• Attributes, if any, must have their values enclosed within quotes
Valid XML Documents
• An XML document is valid if it has an associated DTD or Schema and if the document complies with the constraints expressed in it
• If an XML document is valid, it is also well-formed
Parsers
• An XML parser is a processor that reads an XML document and determines the structure and properties of the data.
• Validating XML Parser
– Checks for validity of XML documents against the DTD or schema
• Non-validating XML Parser
– Ignores the constraints mentioned in DTD or schema
• Any parser capable of checking for validity is capable of checking for well-formedness of XML as well
• Based on their architecture and way of working, parsers fall into two main types: DOM and SAX
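The difference between validating and non-validating parsers can be seen with Python's standard `xml.etree.ElementTree`, which is non-validating. In this sketch the document deliberately breaks its own DTD: the DTD requires an `<email>` after `<name>`, but the document omits it.

```python
import xml.etree.ElementTree as ET

# Well-formed, but NOT valid: <customer> lacks the <email> the DTD requires.
DOC = """<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer)>
  <!ELEMENT customer (name,email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customers>
  <customer>
    <name>John Conlon</name>
  </customer>
</customers>"""

# A non-validating parser checks well-formedness only, so this succeeds.
root = ET.fromstring(DOC)
children = [child.tag for child in root.find("customer")]
print(children)
```

A validating parser given the same input would report the missing `<email>` element as an error.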
Types of Parsers
• DOM Parser (Document Object Model)
– Parses and loads the entire XML file into memory.
– Creates an instance of org.w3c.dom.Document (in Java).
– Forms a hierarchical tree structure.
– Not well suited to large XML files, since the whole document is held in memory.
– Easy to implement
• SAX Parser (Simple API for XML)
– Does not create any internal representation of the document.
– Event based.
• Calls handler functions when certain events take place.
– Executes from top to bottom (sequential)
– Useful for large XML files.
• The MSXML Parser
– Implemented as a DOM parser
– MSXML (now known as Microsoft XML Core Services) is the API to parse and process XML documents in Visual Basic, Visual C++ etc.
– It ships with the IE5 (and above) browser
– In this course, we use the MSXML parser in the IE browser to validate XML
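The two architectures can be compared side by side. The tutorial refers to Java's org.w3c.dom.Document and to MSXML; the sketch below swaps in Python's standard `xml.dom.minidom` and `xml.sax` modules, which follow the same two models. The class name `NameHandler` and the sample document are made up for the example.

```python
import xml.dom.minidom
import xml.sax

XML_BYTES = b"""<customers>
  <customer><name>John Conlon</name></customer>
  <customer><name>Tom</name></customer>
</customers>"""

# DOM: the whole document is loaded into a tree, then queried.
document = xml.dom.minidom.parseString(XML_BYTES)
dom_names = [node.firstChild.data
             for node in document.getElementsByTagName("name")]


# SAX: no tree is built; the parser calls our handler as it reads.
class NameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameHandler()
xml.sax.parseString(XML_BYTES, handler)
sax_names = handler.names

print(dom_names)
print(sax_names)
```

Both approaches yield the same names. The DOM version is shorter, while the SAX version keeps only the data it needs in memory, which is why it suits large files.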
HTML vs XML
• XML and HTML were designed with different purposes
• XML was designed to describe data and to focus on what data is.
• HTML was designed to display data and to focus on how data looks.
| HTML | XML |
|------|-----|
| Defines how data should be displayed | Defines how data should be stored or structured |
| Predefined tags | User-defined tags |
| Not extensible | Extensible |
| Tags are not case sensitive | Tags are case sensitive |
| Elements need not be well formed | All elements must be properly nested |
| Used for designing Web pages | Not used for presentation directly; stores data for any kind of application; used to create newer markup languages |
| Interpreted by the Web browser | Parsed by an application or parser |
| Plain text format | Plain text format |
Applications of XML
• XML can be used to store data
– XML can be used to store data in files or in databases
• XML is used to create configuration files for different applications
– Many servers use XML configuration files
– Typically used for small amounts of configuration information that must be both human and machine readable
• XML is used to exchange data in cross-platform applications
– Data can be exchanged between incompatible systems
• Used in web applications
– XML is used in a variety of web applications and services
• XML can be used to create new data representation languages
– The Wireless Markup Language (WML) is written in XML
– Chemical Markup Language (CML) is another example
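As an illustration of the configuration-file use mentioned in the list above, here is a minimal sketch of an application reading its settings from XML. The file layout, tag names, attribute names and values are all invented for the example and do not correspond to any particular server.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file contents.
CONFIG_TEXT = """<config>
  <server host="localhost" port="8080"/>
  <logging level="info"/>
</config>"""

config = ET.fromstring(CONFIG_TEXT)

server = config.find("server")
host = server.get("host")
port = int(server.get("port"))   # attribute values are text; convert as needed
log_level = config.find("logging").get("level")

print(host, port, log_level)
```

The same file can be opened and edited by a person in any text editor, which is the "human and machine readable" property in practice.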