CVE-2019-12415: XML processing vulnerability in Apache POI
Apache POI is a popular Java library for working with Microsoft documents. For example, it lets you read and write Microsoft Excel files in Java. While recently looking into the library, I noticed a small vulnerability, which then became CVE-2019-12415. The issue has been fixed in POI 4.1.1. Below are the details.
The issue
Besides many other formats, Apache POI can work with Microsoft Excel documents. In particular, the library contains the XSSFExportToXml class, which is used in processing Microsoft Excel Open XML Spreadsheet (XLSX) files. The class takes a Map element, which is defined in the Office Open XML specification, and converts it to XML.
The Map element contains the internals of an XLSX file. In particular, it contains an XSD schema. The XSSFExportToXml.exportToXml() method may be instructed to use this schema for XML validation:
private boolean isValid(Document xml) throws SAXException {
    try {
        String language = "http://www.w3.org/2001/XMLSchema";
        SchemaFactory factory = SchemaFactory.newInstance(language);
        Source source = new DOMSource(map.getSchema());
        Schema schema = factory.newSchema(source);
        Validator validator = schema.newValidator();
        validator.validate(new DOMSource(xml));
        ...
The problem here is that the SchemaFactory is not configured for secure XML processing, which results in an XXE vulnerability if an attacker can pass a malicious XSD schema to the isValid() method.
How does the schema reach the isValid() method? The answer is simple: it comes from the XLSX document. First of all, an XLSX document is just a ZIP archive. If you extract it, you'll find a bunch of XML documents and other files. The XSD schema comes from the xl/xmlMaps.xml file, which contains something like the following:
<MapInfo xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" SelectionNamespaces="">
<Schema ID="Schema2">
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns="">
...
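Because an XLSX file is a ZIP archive, the map definition can be pulled out with nothing more than java.util.zip. The following is a minimal sketch for inspecting what a given file carries in xl/xmlMaps.xml. The class and method names (XmlMapsExtractor, readXmlMaps) are made up for illustration and are not part of Apache POI, and the content is assumed to be UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Apache POI.
public final class XmlMapsExtractor {

    private XmlMapsExtractor() {}

    /**
     * Returns the text of xl/xmlMaps.xml from an XLSX stream,
     * or null if the archive has no such entry.
     * Assumption: the entry is UTF-8 encoded.
     */
    public static String readXmlMaps(InputStream xlsx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"xl/xmlMaps.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the current entry into memory.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

The returned string is only read as text here; it is deliberately not parsed, so looking at a suspicious file this way does not trigger any XML processing.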
Various payloads are possible. For example, an attacker can inject an <xs:redefine schemaLocation="https://internal.site/endpoint"> element into the schema. Here https://internal.site/endpoint is the URL of a resource on a private network that the attacker can't access directly:
<MapInfo xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" SelectionNamespaces="">
<Schema ID="Schema2">
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns="">
<xs:redefine schemaLocation="https://internal.site/endpoint">
...
Then the attacker zips everything back up, and the malicious XLSX is ready. When SchemaFactory loads the schema, it tries to access https://internal.site/endpoint.
The issue has been fixed by enabling the XMLConstants.FEATURE_SECURE_PROCESSING feature on the SchemaFactory.
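In code, that kind of configuration could look roughly like the sketch below. This is not the actual POI patch; the class name SecureSchemaFactories is made up for illustration. The two ACCESS_EXTERNAL_* properties are an additional measure that goes beyond the fix described above. They come from JAXP 1.5 and are recognized by the JDK's built-in implementation, but a third-party JAXP implementation on the classpath may reject them with a SAXNotRecognizedException.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not taken from the Apache POI sources.
public final class SecureSchemaFactories {

    private SecureSchemaFactories() {}

    /** Creates a W3C XML Schema factory with secure processing enabled. */
    public static SchemaFactory newSecureSchemaFactory() throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        // The feature named in the fix.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

        // Extra hardening (assumption: the JDK's built-in JAXP implementation).
        // An empty string means no protocol is allowed for external DTDs
        // or for schemas referenced through schemaLocation.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

        return factory;
    }
}
```

A caller would then use SecureSchemaFactories.newSecureSchemaFactory() in place of a bare SchemaFactory.newInstance(...) before calling newSchema() on untrusted input.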
What an attacker can do
The attacker can use all features of the XSD format in payloads. Possible consequences include but may not be limited to:
- Server Side Request Forgery (SSRF)
- Sensitive information leak from local and remote resources
Pre-requisites for an exploit
Here is what makes an application vulnerable:
- The application uses Apache POI 4.1.0 or earlier.
- The application allows untrusted data to be processed by the XSSFExportToXml class.
- The third parameter of the XSSFExportToXml.exportToXml() method is set to true, which enables XML validation.
Conclusion
The Java standard library offers a lot of classes for XML processing. DocumentBuilder is one of the most popular classes for loading an XML document. However, there are many other classes, such as SchemaFactory, Transformer and so on, which may parse XML documents. All of these classes should be configured to do so safely if they take data from untrusted sources.
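As an illustration of what "configured in a safe way" can mean for two of the classes mentioned, here is a minimal sketch for DocumentBuilderFactory and TransformerFactory. The class name SafeXmlFactories is made up, and the settings are one common hardening choice rather than the only correct one. The disallow-doctype-decl feature is a Xerces feature that the JDK's built-in parser supports; other parsers may not recognize it, and rejecting every DOCTYPE may be too strict for applications that legitimately need DTDs.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper showing one possible hardening configuration.
public final class SafeXmlFactories {

    private SafeXmlFactories() {}

    public static DocumentBuilderFactory newDocumentBuilderFactory()
            throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any document that contains a DOCTYPE declaration
        // (assumption: a Xerces-based parser such as the JDK default).
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    public static TransformerFactory newTransformerFactory()
            throws TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // JAXP 1.5 attributes: allow no protocol for external DTDs
        // or for stylesheets pulled in via xsl:import / xsl:include.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return tf;
    }
}
```

With this configuration, a DocumentBuilder obtained from newDocumentBuilderFactory() throws a SAXException as soon as it meets a DOCTYPE, so an external entity declaration never gets a chance to be resolved.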