XML stands for "Extensible Markup Language". It is a versatile and standardized language for document markup. XML is known for its self-descriptive, human- and machine-readable text format.
- Elements: The fundamental building blocks denoted by start and end tags, e.g., `<name>John</name>`.
- Attributes: Properties described within start tags, e.g., `<book genre="fantasy">`.
- Text Content: The data enclosed within start and end tags.
- Well-Formed: A document that adheres to basic XML syntax rules.
- Valid: Goes beyond being well-formed to comply with a specific Document Type Definition (DTD) or XML Schema.
- Root Element: The topmost element in a hierarchical structure.
- Namespace: A mechanism for avoiding element name conflicts.
While both are based on markup, HTML is a specific application focusing on web content, whereas XML offers a generalized framework. Unlike HTML, XML doesn't come with predefined tags or a rendering mechanism; its strength lies in customizability and data structuring.
- Data Exchange: Widely employed in web services, APIs, and more for data transmission.
- Configuration Files: Many software applications use XML for their configuration needs.
- Document Management: Especially for data storage and retrieval in a consistent format.
XML has been instrumental in shaping various other technologies, including:
- XPath: For navigating and selecting XML nodes.
- XSLT: A tool for transforming XML content.
- XQuery: A query language tailored for XML data sources.
- XML Schema: A standard for defining document structure and data types.
- XML Namespaces: To avoid conflicts when combining XML vocabularies.
XML (eXtensible Markup Language) is a versatile tool for structuring and sharing information across a variety of platforms. Let's look at its key features and benefits:
- Self-Descriptive: XML documents come with clear metadata, making their content understandable without external resources.
- Hierarchical Structure: Data is organized in a tree-like manner with parent and child nodes.
- Text-Based: Being human-readable, XML is an excellent choice for data interchange among applications.
- Platform-Independent: It's compatible with different operating systems and software.
- Unicode Support: XML readily handles multilingual content.
- Extensible: Its design allows for custom tags, catering to specific data requirements.
- Data Validation: XML Schema and Document Type Definitions (DTDs) ensure data integrity.
- Comment and Processing Instruction Support: XML can include non-content directives.
- Versioning: Various XML standards and namespaces allow for content differentiation.
- Interoperability: XML enables data transfer between heterogeneous systems.
- Standardization: Its widespread acceptance ensures a degree of uniformity in data formats.
- Data Integrity: Schemas help ensure documents adhere to predefined structures.
- Documented Structure: The clear organization aids in data comprehension.
- Longevity: XML's stable, non-proprietary nature ensures documents remain accessible over time.
XML and HTML have distinct roles and design philosophies.
HTML is primarily used for displaying and formatting content in web browsers, while XML is a versatile tool for structuring, sharing, and storing data across a variety of applications.
- HTML: Designed for web content. Specifies the presentation of content for web browsers.
- XML: Content-agnostic. Focuses on structuring data in a machine-readable format, adaptable for various domains and applications.
- HTML: Semantically organizes content for display. Emphasizes how data should look.
- XML: Focuses on structuring and describing data semantically. Designed to represent information in a meaningful way, independent of its visual representation.
- HTML: Follows a specific set of rules defined by the HTML standard. HTML documents can be validated against this standard.
- XML: Documents need to be "well-formed," adhering to syntax rules such as having a single root element and proper tag nesting. However, unlike HTML, XML doesn't prescribe a single, fixed vocabulary of tags; each application defines its own. This allows for more flexibility to cater to specific data structuring needs.
- HTML: Tags are predefined and tied to specific rendering behaviors.
- XML: Tag names and document structures can be customized to reflect the nature of the data being represented. An XML Schema or Document Type Definition (DTD) can then formalize the chosen vocabulary, though neither is required.
- HTML: Uses attributes to provide metadata or additional information about certain elements.
- XML: Accommodates both attributes and child elements for representing metadata, offering a more flexible approach.
- HTML: Supports scripting and interactivity through mechanisms like the `<script>` and `<a>` tags.
- XML: Primarily focuses on data representation and doesn't inherently support interactive features.
XML (Extensible Markup Language) serves as a structured and platform-independent means for data representation and transmission.
At its core, an XML document is a collection of labeled data, enclosed within tags, and organized in a hierarchical manner.
- Prolog: This optional section at the document's beginning provides metadata and defines the character encoding.
<?xml version="1.0" encoding="UTF-8"?>
- Root Element: Every XML document must have a single root element, which serves as the starting point for its hierarchical structure.
<root>
<!-- Child elements and data here -->
</root>
- Elements: These are the data containers, marked by start and end tags. Elements can be either empty or contain text and/or child elements.
<example>This is element content.</example>
<empty-element />
- Attributes: Elements can have associated attributes, providing additional metadata. Attributes are represented within the element's start tag.
<book title="Sample Title">...</book>
- Well-Formedness: All XML documents must adhere to a set of syntax rules to be considered "well-formed."
- Validity: Beyond being well-formed, a document can also conform to a predefined structure, as described by a Document Type Definition (DTD) or a schema.
Here is the code:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="001">Title 1</book>
<book id="002">Title 2</book>
</catalog>
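As a rough illustration of how an application might consume the catalog above, the following sketch uses Python's standard `xml.etree.ElementTree` module. The helper name `list_books` and the constant `CATALOG_XML` are made up for this example and are not part of any XML standard.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="001">Title 1</book>
  <book id="002">Title 2</book>
</catalog>"""


def list_books(xml_text):
    """Return (id, title) pairs for every <book> child of the root element."""
    # Passing bytes lets the parser apply the encoding named in the prolog.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(book.get("id"), book.text) for book in root.findall("book")]


if __name__ == "__main__":
    for book_id, title in list_books(CATALOG_XML):
        print(book_id, title)
```

The attribute (`id`) and the element content (the title) are read through different calls, which mirrors the distinction between the two in the document structure.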
XML elements form the foundational units of an XML document and follow a specific structure.
An XML element typically contains:
- Start-Tag: This tag begins the element and may include attributes.
- Element Content: This is the actual data or child elements contained within the element.
- End-Tag: This tag marks the completion of the element, using the element's name.
- Self-Closing Tag: An element with no content may be written as a single self-closing tag, with no need for a separate end tag.
Here is the XML and the JSON:
<book>
<title lang="en">Understanding XML</title>
<author><![CDATA[Joe Author]]></author>
<pubDate>2008-10-01</pubDate>
<price>30.00</price>
<discounted/>
</book>
{
"book": {
"title": {
"_attributes": {"lang": "en"},
"_text": "Understanding XML"
},
"author": "Joe Author",
"pubDate": "2008-10-01",
"price": 30.00,
"discounted": true
}
}
In this example, "book," "title," "author," "pubDate," and "price" are regular elements, while "discounted" is a self-closing element.
Well-formed XML needs to adhere to these fundamental rules:
- Nesting: Elements must nest properly, like containers within containers.
- Matching: Every start-tag must have a corresponding end-tag.
- Attribute Structure: Attributes appear only in the start-tag, their values must be quoted, and each attribute name may appear only once per element.
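One way to see these rules in action is to hand a few fragments to a parser and observe which ones it rejects. The sketch below assumes Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser stops at the first well-formedness violation.
        return False


if __name__ == "__main__":
    samples = [
        "<a><b></b></a>",      # properly nested
        "<a><b></a></b>",      # overlapping tags
        "<a><b></a>",          # missing end tag for <b>
        '<a x="1" x="2"/>',    # attribute name repeated
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```

A check like this only covers well-formedness; it says nothing about whether the document matches a DTD or schema.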
XML attributes are designed to provide information about an element rather than its content. Unlike child elements, which are enclosed in start and end tags, attributes are key-value pairs that sit within a start tag.
- Uniqueness: Each attribute within an element is unique.
- Simplicity: Attributes are limited to scalar values.
- Flexibility: Most attributes can be optional or mandatory, and their values can often be constrained.
- Scope: They apply only to the specific element that hosts them.
- For Meta-Data: Use attributes to convey metadata or properties about an element.
- For Simple Content: Opt for attributes when you need to associate straightforward, scalar values with an element.
Here is the XML code:
<book category="fiction">
<title>1984</title>
<author genre="dystopian">George Orwell</author>
<price currency="USD">9.99</price>
</book>
In this example, both the `category` and `genre` attributes help characterize their respective elements. The `currency` attribute for the `price` element provides context about the monetary unit, and the value "9.99" is inherently tied to the element, denoting the book's price.
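To show how the split between attributes and content surfaces in code, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `edition` attribute is deliberately absent from the document and is only there to show the fallback behavior.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book category="fiction">
  <title>1984</title>
  <author genre="dystopian">George Orwell</author>
  <price currency="USD">9.99</price>
</book>"""

book = ET.fromstring(BOOK_XML)
price = book.find("price")

# Attributes are exposed as string key-value pairs on each element.
category = book.get("category")
currency = price.get("currency")

# Element content is kept separately from attributes and is also a string,
# so numeric values need an explicit conversion.
amount = float(price.text)

# Asking for an attribute that is not present returns the supplied default.
edition = book.get("edition", "unknown")

if __name__ == "__main__":
    print(category, currency, amount, edition)
```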
XML namespaces are a mechanism for avoiding name conflicts when integrating XML documents from multiple sources or defining multiple elements and attributes within the same document.
In a heterogeneous XML environment, different sources or standards can use identical element names for various purposes. Without a mechanism to distinguish these homonymous elements, integrating or interpreting XML data becomes challenging.
For instance, both a music and an astronomy dataset might feature `<note>` elements, each with its own meaning.
Namespaces consist of two primary components:
- Namespace Name: A URI serving as a unique identifier. This does not have to resolve to an actual web resource but should be distinct across XML documents.
- Prefixed Identifiers: In the XML document, elements and attributes are augmented with a prefix linked to the namespace. This creates qualified names like `abc:element`.
music.xml:
<song>
<note>Play softly</note>
</song>
astronomy.xml:
<discovery>
<note>Interesting observation</note>
</discovery>
Merging these documents leads to potential misinterpretations with `<note>`.
To address this, XML documents evolved to support namespaces. Here's how the music and astronomy datasets would look using namespaces:
music_ns.xml:
<m:metadata xmlns:m="https://music-ns.example.com">
<m:composer>Beethoven</m:composer>
<m:note>Play softly</m:note>
</m:metadata>
astronomy_ns.xml:
<a:record xmlns:a="https://astro-ns.example.com">
<a:observation>
<a:note>Interesting phenomenon</a:note>
</a:observation>
</a:record>
Now, with distinct namespace-qualified names (`m:note`, `a:note`), misunderstandings are less likely.
Think of namespaces like virtual Internet domains. Each domain can have a `note` resource (e.g., a web page), and within the context of that domain (namespace), the meaning of `note` is specific and unambiguous.
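A parser resolves each prefix to its namespace URI, so the two `note` elements stay distinct even inside one document. The following sketch uses Python's standard `xml.etree.ElementTree`; the `<library>` wrapper element and the lookup prefixes `music` and `astro` are invented for the example.

```python
import xml.etree.ElementTree as ET

MUSIC_NS = "https://music-ns.example.com"
ASTRO_NS = "https://astro-ns.example.com"

# Hypothetical merged document holding notes from both vocabularies.
MERGED_XML = f"""<library xmlns:m="{MUSIC_NS}" xmlns:a="{ASTRO_NS}">
  <m:note>Play softly</m:note>
  <a:note>Interesting phenomenon</a:note>
</library>"""

root = ET.fromstring(MERGED_XML)

# ElementTree replaces each prefix with its URI: tags look like "{uri}note".
tags = [child.tag for child in root]

# Lookups take a prefix-to-URI mapping. The prefixes used here are local to
# this code and do not have to match the ones written in the document.
ns = {"music": MUSIC_NS, "astro": ASTRO_NS}
music_note = root.find("music:note", ns).text
astro_note = root.find("astro:note", ns).text

if __name__ == "__main__":
    print(tags)
    print(music_note, "|", astro_note)
```

The prefix itself carries no meaning; only the URI it is bound to identifies the namespace.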
Well-formedness is a baseline requirement for any XML document and ensures its structural and syntactic accuracy. Validity, on the other hand, pertains to the conformance of the XML document to a specific Document Type Definition (DTD) or XML Schema Definition (XSD).
- Well-Formedness
  - Serves as a foundational prerequisite.
  - Universally applicable.
  - Ensures structural and syntactic accuracy, covering aspects like balanced tags, attribute quoting, and proper nesting.
- Validity
  - Dependent on the presence of a DTD or XSD.
  - Conforms to a more advanced set of criteria specific to the defined schema or DTD.
  - Includes requirements laid out in the schema, such as data types for attributes, allowed element structures, and more.
- Proper Nesting: Elements must nest properly, i.e., they cannot overlap. For example, `<a><b></a></b>` is not allowed.
- Root Element: The document must have a single root element.
- Attribute Values: Values must be enclosed in single or double quotes.
- Self-Closing Tags: Empty elements must either end with `/>` or be followed by a separate end tag.
- Special Character Escaping: The characters `<` and `&` cannot appear literally in text or attribute values; they are written as `&lt;` and `&amp;`.
- Conformance to Schema: The XML document must align with the rules and structures defined in its associated DTD or XSD.
- Data Type Adherence: Attribute values should match the specified data types. For instance, an attribute expecting an integer should not have a non-numeric value.
- Occurrence Constraints: The number of times an element can appear in a specific context, based on the schema, must be adhered to.
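The escaping rule from the well-formedness list can be illustrated with Python's standard library, which has helpers for it. The standard library does not include a DTD or XSD validator, so this sketch covers only escaping, not validity. The sample strings are arbitrary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "Tom & Jerry <live>"
raw_attr = 'He said "hi" & left'

# escape() replaces &, < and > in character data.
escaped_text = escape(raw_text)

# quoteattr() escapes the value and wraps it in a suitable pair of quotes.
quoted_attr = quoteattr(raw_attr)

document = f"<show title={quoted_attr}>{escaped_text}</show>"

# Parsing reverses the escaping, so the original strings come back intact.
parsed = ET.fromstring(document)

if __name__ == "__main__":
    print(document)
    print(parsed.get("title"), "|", parsed.text)
```

Building XML through a library's element API avoids manual escaping altogether; the helpers above matter mainly when markup is assembled as text.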
The XML declaration is an optional, top-of-the-document statement written as `<?xml ... ?>`. It is not considered a tag because it doesn't describe any data or structure. Instead, its purpose is to provide essential information about the document itself.
- Document Labeling: The XML declaration clearly indicates that the file is an XML document. This simple, recognizable structure helps both humans and software recognize the file type.
- Encoding Specification: XML allows various textual encodings, from UTF-8 to UTF-16 and more. The declaration identifies which encoding the document uses. For instance, `<?xml version="1.0" encoding="UTF-8"?>`.
- Version Tracking: Different XML standards and best practices may evolve. The version attribute in the declaration helps identify which XML standard the document follows. For example, `<?xml version="1.0"?>`.
- Namespace Awareness: XML documents might employ namespaces to prevent naming conflicts or integrate data from diverse sources. Namespaces are declared on elements, not in the XML declaration itself.
- Standalone Mode: A document marked `standalone="yes"` declares that it does not depend on markup declarations in an external DTD.
The XML declaration adheres to a precise structure:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Each value (such as the version and encoding) is enclosed in single or double quotes, and the parts must appear in the order version, encoding, standalone.
Here is an example of the version attribute:
<?xml version="1.1"?>
The encoding attribute ensures consistent display and interpretation of special characters and symbols. Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
Namespaces are a primary mechanism, especially in more advanced XML deployments like SOAP or XHTML. They are declared with `xmlns` attributes on elements rather than in the XML declaration.
This is an example of how to use a namespace in XML:
<note xmlns:ex="http://www.example.com">
<ex:body />
</note>
The `standalone` attribute, when present, must have a value of either "yes" or "no". It signals whether the document is self-contained or relies on external markup declarations.
Here is an example:
<?xml version="1.0" standalone="yes"?>
- Best Practice: While not mandatory, it is generally advisable to include the XML declaration. This is especially crucial for documents that might undergo post-production processing, like XSLT transformations.
- Encoding: It's common to employ `UTF-8` for new documents because of its comprehensive character support and compact representation of ASCII text. Still, when in doubt, the encoding attribute offers clarity.
- Version: `1.0` remains by far the most widely used and supported version. XML 1.1 exists but is rarely needed.
- Standalone Attribute: This attribute's utility is somewhat limited in modern applications and hence is often omitted.
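To make the role of the encoding declaration concrete, the sketch below serializes the same element under two encodings with Python's standard `xml.etree.ElementTree` and parses both back. It assumes Python 3.8 or later, where `tostring` accepts `xml_declaration`. The `<note>` element and its text are arbitrary.

```python
import xml.etree.ElementTree as ET

note = ET.Element("note")
note.text = "caf\u00e9"  # contains one non-ASCII character

# xml_declaration=True prepends the declaration; the encoding argument sets
# both the declared name and the bytes actually written.
utf8_bytes = ET.tostring(note, encoding="UTF-8", xml_declaration=True)
latin1_bytes = ET.tostring(note, encoding="ISO-8859-1", xml_declaration=True)

# The byte sequences differ, but a parser reads the declaration and decodes
# each one back to the same text.
decoded_utf8 = ET.fromstring(utf8_bytes).text
decoded_latin1 = ET.fromstring(latin1_bytes).text

if __name__ == "__main__":
    print(utf8_bytes)
    print(latin1_bytes)
    print(decoded_utf8 == decoded_latin1)
```

Without a declaration, a parser falls back to detecting UTF-8 or UTF-16, which is why the declaration matters most for other encodings.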
XML entities provide a way to represent special characters, reuse data, and reference external resources in an XML document.
- General Entities: Used for storing and reusing arbitrary XML content, such as text or markup, within the document body.
- Parameter Entities: Used only inside DTDs (Document Type Definitions), where they are referenced as `%entityName;` to reuse fragments of declarations.
- Unparsed Entities: Point to non-XML data, such as images or other binary files, which the parser does not read itself.
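- Use `&entityName;` to refer to a general entity.
  - For parsed general entities, the reference is replaced by the entity's text, which may be a single character or a longer sequence of text and markup.
  - Unparsed entities cannot be referenced with `&` and `;`. Their names are given as the values of attributes declared with type ENTITY or ENTITIES.
- Place entity declarations within the DTD, either in the internal subset or in an external subset.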
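Here is the XML code:
<!DOCTYPE note [
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY greeting "Hello! Welcome to the interview.">
<!ENTITY logo SYSTEM "company_logo.gif" NDATA gif>
<!ATTLIST logo src ENTITY #REQUIRED>
]>
<note>
&greeting;
<logo src="logo"/>
</note>
In this example, `&greeting;` is replaced by the declared text, while the `src` attribute of `<logo>` names the unparsed entity `logo`, which points to an external image. The `NDATA` keyword and the `gif` notation mark the entity as unparsed, so loading and displaying the file is left to the application.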
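Entity expansion for internal general entities can be observed directly with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the trailing sentence in the note is added only to show that the replacement text merges with the surrounding content.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<!DOCTYPE note [
  <!ENTITY greeting "Hello! Welcome to the interview.">
]>
<note>&greeting; Please take a seat.</note>"""

# The parser substitutes the entity's replacement text before the
# application ever sees the content.
expanded = ET.fromstring(NOTE_XML).text

# The five predefined entities need no declaration at all.
predefined = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>").text

if __name__ == "__main__":
    print(expanded)
    print(predefined)
```

Because expansion happens inside the parser, Python's documentation advises caution when parsing entity declarations from untrusted input.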
XML elements and attributes have naming rules to follow:
- Elements: Defined by a start and end tag, such as `<title>` and `</title>`.
- Attributes: Provided within start tags; in `<book ISBN="123456">`, "ISBN" is the attribute.
Here are the shared naming rules for both elements and attributes:
- Validity: Must conform to the defined syntax rules for names. A name starts with a letter or underscore and continues with letters, digits, hyphens, underscores, or periods. Colons are reserved for namespace prefixes.
- Uniqueness: Attribute names must be unique within a single element. Element names may repeat among siblings unless a schema says otherwise.
- Reserved Names: Names beginning with the letters "xml" (in any letter case) are reserved for the XML standards themselves. This applies to elements and attributes alike.
- Case Sensitivity: XML is case-sensitive. This trait extends to both element and attribute names.
- Special Characters: Neither element nor attribute names may contain whitespace, quotes (`"` or `'`), `<`, `>`, or `&`.
- Content-vs.-Meta Naming: The name of an element or attribute should reflect its intended semantic meaning.
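The naming rules can be approximated with a short check. The sketch below is deliberately simplified: it accepts only ASCII names and rejects colons, whereas the actual XML grammar also allows a wide range of Unicode characters. The function name `is_plain_xml_name` is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of an XML name: a letter or
# underscore first, then letters, digits, '.', '-' or '_'.
_ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_plain_xml_name(name):
    """Return True for ASCII names usable as element or attribute names."""
    if not _ASCII_NAME.fullmatch(name):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


if __name__ == "__main__":
    for candidate in ["title", "first-name", "2ndEdition", "first name", "XmlData"]:
        print(candidate, is_plain_xml_name(candidate))
```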
XML elements are the building blocks of an XML document and can contain rich, structured data.
Attributes, on the other hand, are used to provide metadata or additional information about an element.
- Elements: They define the hierarchical structure of the XML document, indicating what pieces of data are contained within others. For example, in the markup `<book><title>Sample Title</title></book>`, the `<title>` element is contained within the `<book>` element and represents the title of the book.
- Attributes: Attributes are not containers; they are name-value pairs that provide additional metadata about an element. In the XML `<book genre="novel">`, `genre` is the attribute name, and "novel" is its value.
- Elements: They can contain any kind of data, including simple text, numbers, or nested structures built from child elements. For example, `<title>Sample Title</title>` holds plain text, while `<author><first-name>John</first-name><last-name>Doe</last-name></author>` holds structured data.
- Attributes: Attribute values are simple and are treated as plain text. They can't contain child elements or complex data structures.
- Elements: An element name may appear more than once within the same parent unless a schema or DTD restricts it. For instance, in `<book><title>One</title><title>Two</title></book>`, the intention might be to provide two distinct titles. Such design choices need to be managed consistently across the XML document.
- Attributes: Within the same element, an attribute name can appear only once. For example, the XML markup `<person id="123" name="Alice" age="30" />` gives the person exactly one ID, one name, and one age.
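The contrast between single-valued attributes and repeatable, nestable elements shows up clearly when a document is built programmatically. This sketch uses Python's standard `xml.etree.ElementTree`; the `<email>` elements and their addresses are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Scalar metadata goes into an attribute.
person = ET.Element("person", {"id": "123"})

# Nested data goes into child elements.
name = ET.SubElement(person, "name")
ET.SubElement(name, "first-name").text = "Alice"
ET.SubElement(name, "last-name").text = "Doe"

# Repeated data also uses child elements; siblings may share a name.
for address in ("alice@example.com", "a.doe@example.org"):
    ET.SubElement(person, "email").text = address

# Setting an attribute again overwrites it; an element never holds two
# attributes with the same name.
person.set("id", "456")

serialized = ET.tostring(person, encoding="unicode")

if __name__ == "__main__":
    print(serialized)
```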
XML comments facilitate clear, concise, and structured documentation within XML files.
They are especially useful when sharing XML-based content across teams or for remembering specific details within the editorial process.
XML comments are recognizable through their `<!--` start sequence and `-->` end sequence.
<!-- This is an XML comment -->
- Visibility: Unlike regular content, comments are not visible in many rendering or processing tools. They are, however, accessible in the XML source.
- Parsing: Content within comments isn't parsed as XML. This can be useful for keeping draft sections or details that might not conform to the XML structure. The one restriction is that a comment cannot contain the sequence `--`.
- Versatility: While comments offer a textual method for annotating XML, more comprehensive metadata and documentation approaches can be established using dedicated XML mechanisms like DTDs, Schemas, or namespaces.
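Whether comments are visible to an application depends on the parser. As a small illustration with Python's standard library, `xml.etree.ElementTree` drops comments by default, while `xml.dom.minidom` keeps them as nodes. The `<note>` document below is made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

NOTE_XML = "<note><!-- This is an XML comment --><to>Tove</to></note>"

# ElementTree's default parser discards comments entirely.
et_children = [child.tag for child in ET.fromstring(NOTE_XML)]

# minidom keeps them as Comment nodes alongside the elements.
dom = minidom.parseString(NOTE_XML)
comments = [
    node.data
    for node in dom.documentElement.childNodes
    if node.nodeType == node.COMMENT_NODE
]

if __name__ == "__main__":
    print(et_children)
    print(comments)
```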
XML processing instructions (PI) provide a way to embed machine-readable directives in XML documents. They offer control over various processing tasks, including styling, inclusion of external resources, and application-specific actions.
PIs start with the `<?` sequence and end with `?>`. The target name is case-sensitive, and the rest of the instruction is passed to the application unchanged. PIs typically follow this structure: `<?targetName instruction1="value1" instruction2="value2"?>`.
- `xml`: Reserved for the XML declaration, which looks like a PI but is formally a separate construct.
- `xml-stylesheet`: For associating an XSLT or CSS style sheet with the XML document.
- Custom Names: Used for domain-specific processing.
XSLT can be applied for rendering, whereas `CDATA` sections help with unescaped content, and `entity` references can represent special characters.
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
Associates an XSLT file for styling.
Wrap text content that might contain `<` or `&` characters.
<description><![CDATA[Some text with <markup> and an &]]></description>
Represent special characters, e.g., `&lt;` for `<`.
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
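The following sketch shows how a processing instruction and a CDATA section reach an application, using Python's standard `xml.dom.minidom` and `xml.etree.ElementTree`. The combined document `DOC_XML` is assembled from the snippets above purely for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<description><![CDATA[Some text with <markup> and an &]]></description>"
)

dom = minidom.parseString(DOC_XML.encode("utf-8"))

# Processing instructions placed before the root element are children of
# the document node. The XML declaration is not reported as one of them.
pis = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

# CDATA content arrives as ordinary text, with '<' and '&' left untouched.
description = ET.fromstring(DOC_XML.encode("utf-8")).text

if __name__ == "__main__":
    print(pis)
    print(description)
```

The parser hands over the PI data as one opaque string; interpreting the `type` and `href` parts is up to the consuming application.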
In XML, there are two ways to write an element that contains no content: a start-tag immediately followed by an end-tag, or a single self-closing tag (formally, an empty-element tag). Both describe the same empty element, and a conforming parser treats them identically; the choice is mostly a matter of style and tooling.
- Attributes: Both forms can carry attributes. The self-closing form, known as a "unary tag" in some XML-based languages like XHTML, merges the start and end tags into a single tag ending in `/>`. Simply dropping the end tag of a regular element does not make it empty; it makes the document ill-formed.
- Content Possibility: An empty element has no child elements or text, whichever form is used. Only the start-tag/end-tag form leaves room to add content between the tags later.
The self-closing form is suitable when the element has no child elements or text; it may still have attributes.
<example />
The start-tag/end-tag form is the appropriate choice when the element is empty now but may later contain child elements or text.
<root>
<example></example>
</root>
- HTML: In HTML, elements such as `<br>` and `<img>` are void elements: they never have content or an end tag. HTML5 also accepts a trailing slash, as in `<br />`, but ignores it.
- XHTML: Because XHTML is XML, every element must be closed, so void elements are written in self-closing form.
For instance, an image in XHTML would be:
<img src="example.png" alt="Example" />
Meanwhile, in non-XHTML HTML, such as HTML5, the slash is optional:
<img src="example.png" alt="Example">
- Android Layouts: In the context of Android's XML-based layouts, elements can be written with either a separate end tag or the self-closing format. Both representations are equivalent.
Here is an example for an `ImageView` with a separate end tag:
<ImageView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:src="@drawable/example"></ImageView>
Or as a self-closing element:
<ImageView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:src="@drawable/example" />
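That the two spellings of an empty element are equivalent can be checked with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper `summarize` and the `id` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring('<example id="1" />')
start_end = ET.fromstring('<example id="1"></example>')


def summarize(element):
    """Reduce an element to the parts a consumer can observe."""
    return (element.tag, element.attrib, element.text, len(element))


# Both spellings produce the same parsed element: no text, no children.
same = summarize(self_closing) == summarize(start_end)

# On output, the serializer picks one spelling regardless of the input form.
short_form = ET.tostring(start_end, encoding="unicode")
long_form = ET.tostring(start_end, encoding="unicode", short_empty_elements=False)

if __name__ == "__main__":
    print(same)
    print(short_form)
    print(long_form)
```

Since the original spelling is not preserved after parsing, tools that round-trip XML may rewrite one form into the other without changing the document's meaning.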