Apache POI is a popular Java library for working with Microsoft documents. For example, it lets you read and write Microsoft Excel files from Java. While looking into the library recently, I noticed a small vulnerability that was later assigned CVE-2019-12415. The issue has been fixed in POI 4.1.1. Below are the details.

CVE-2019-12415: XML processing vulnerability in Apache POI

The issue

Besides many other formats, Apache POI can work with Microsoft Excel documents. In particular, the library contains the XSSFExportToXml class, which is used for processing Microsoft Excel Open XML Spreadsheet (XLSX) files. The class takes a Map element, which is defined in the Office Open XML specification, and converts it to XML.

The Map element contains the internals of an XLSX file. In particular, it contains an XSD schema. The XSSFExportToXml.exportToXml() method may be instructed to use this schema for XML validation:

private boolean isValid(Document xml) throws SAXException {
    try {
        String language = "http://www.w3.org/2001/XMLSchema";
        SchemaFactory factory = SchemaFactory.newInstance(language);
        Source source = new DOMSource(map.getSchema());
        Schema schema = factory.newSchema(source);
        Validator validator = schema.newValidator();
        validator.validate(new DOMSource(xml));
...

The problem here is that SchemaFactory doesn’t turn on secure XML processing, which results in an XXE vulnerability if an attacker can pass a malicious XSD schema to the isValid() method.

How does the schema get to the isValid() method? The answer is simple: it comes from an XLSX document. First of all, an XLSX document is just a ZIP archive. If you extract it, you’ll find a bunch of XML documents and other files. The XSD schema comes from the xl/xmlMaps.xml file, which contains something like the following:

<MapInfo xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" SelectionNamespaces="">
    <Schema ID="Schema2">
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns="">
        ...
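
To see where the schema lives, you can pull xl/xmlMaps.xml out of the archive with nothing but the JDK. Below is a minimal sketch. The XmlMapsReader class and the readXmlMaps() method are names made up for illustration; they are not part of Apache POI:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not a POI class.
public class XmlMapsReader {

    // Path of the XML maps part inside the XLSX archive (from the article).
    static final String XML_MAPS_ENTRY = "xl/xmlMaps.xml";

    // Returns the content of xl/xmlMaps.xml, or null if the archive has no such entry.
    static String readXmlMaps(InputStream xlsx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!XML_MAPS_ENTRY.equals(entry.getName())) {
                    continue;
                }
                // Read the current entry to its end.
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                // Assumes the part is UTF-8 encoded.
                return new String(content.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an XLSX file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readXmlMaps(in));
        }
    }
}
```

Whatever this prints is what ends up in map.getSchema(), so anyone who can edit that file controls the schema.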

There may be various payloads. For example, an attacker can inject a <xs:redefine schemaLocation="https://internal.site/endpoint"> element into the schema. Here https://internal.site/endpoint is a URL to a resource from the private network which can’t be directly accessed by an attacker:

<MapInfo xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" SelectionNamespaces="">
    <Schema ID="Schema2">
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns="">
            <xs:redefine schemaLocation="https://internal.site/endpoint">
            ...

Then, the attacker zips everything back up, and the malicious XLSX is ready. When SchemaFactory loads the schema, it’s going to access https://internal.site/endpoint.

The issue has been fixed by enabling the XMLConstants.FEATURE_SECURE_PROCESSING feature on the SchemaFactory.
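
Here is roughly what such hardening can look like. This is a sketch, not the actual POI patch. The HardenedSchemaFactory class and its methods are hypothetical names. Besides FEATURE_SECURE_PROCESSING, I also set the JAXP 1.5 properties ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA to an empty string, assuming the JDK's built-in JAXP implementation (other implementations may not recognize these properties):

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not the code from the POI fix.
public class HardenedSchemaFactory {

    static SchemaFactory newHardenedFactory() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The feature mentioned in the article.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: JAXP 1.5 properties are supported (true for the JDK's built-in parser).
        // An empty string means "no protocol is allowed" for external DTDs and schemas.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    // Returns true if the hardened factory accepts the schema, false if it rejects it.
    static boolean compiles(String xsd) {
        try {
            newHardenedFactory().newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String malicious =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:redefine schemaLocation=\"https://internal.site/endpoint\"/>"
          + "</xs:schema>";
        System.out.println("malicious schema accepted: " + compiles(malicious));
    }
}
```

With this configuration, a schema that contains an xs:redefine pointing to an external location should be rejected with a SAXException instead of being fetched.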

What an attacker can do

The attacker can use all features of the XSD format in payloads. Possible consequences include but may not be limited to:

  1. Server-Side Request Forgery (SSRF)
  2. Sensitive information leak from local and remote resources

Pre-requisites for an exploit

Here is what makes an application vulnerable:

  1. The application uses Apache POI 4.1.0 or earlier.
  2. The application allows untrusted data to be processed by the XSSFExportToXml class.
  3. The third parameter of the XSSFExportToXml.exportToXml() method is set to true, which enables XML validation.

Conclusion

The Java standard library offers a lot of classes for XML processing. DocumentBuilder is one of the most popular classes for loading an XML document. However, there are many other classes, such as SchemaFactory, Transformer and so on, which may parse XML documents. All of these classes should be configured to do it in a safe way if they take data from untrusted sources.
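
For illustration, here is one way the factories behind DocumentBuilder and Transformer could be locked down. Treat it as a sketch under assumptions, not a complete recipe: SafeXmlFactories is a hypothetical class, the disallow-doctype-decl feature is Xerces-specific (the JDK's built-in parser supports it), and the ACCESS_EXTERNAL_* attributes require JAXP 1.5:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
public class SafeXmlFactories {

    static DocumentBuilderFactory newSafeDocumentBuilderFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-specific feature: reject any document that has a DOCTYPE declaration.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    static TransformerFactory newSafeTransformerFactory() throws TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Empty string: no protocol is allowed for external DTDs and stylesheets.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return tf;
    }

    // Returns true if the hardened DocumentBuilder accepts the document.
    static boolean parses(String xml) {
        try {
            newSafeDocumentBuilderFactory()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A document that declares a DOCTYPE, which is where external entities are defined, should fail to parse with this configuration, while ordinary XML is still accepted.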

References