Blessing Krofegha
21 Apr 2022
The security of any internet application is critical, and safely rendering untrusted data as HTML is especially challenging: without adequate precautions, an attacker can inject malicious code and hijack the application. The HTML Sanitizer API addresses this issue. This article explains what the HTML Sanitizer API is and how to use it in web applications.
HTML sanitization is the process of checking an HTML document for safety. It entails examining an existing HTML document and creating a new one from it that contains only the elements and attributes regarded as safe. This can protect a web application from cross-site scripting (XSS) attacks by allowing basic HTML tags to be inserted into a webpage while disallowing more dangerous tags and attributes, such as script elements or onclick attributes, that attackers can abuse.
To use this API, you instantiate a new object from the Sanitizer class, which you can then use to sanitize strings of HTML so that they can be safely inserted into the DOM. During instantiation, you can pass an optional object parameter to the constructor to configure the Sanitizer instance. By default, a Sanitizer removes XSS-relevant input, including script tags, that would otherwise expose an application to malicious code. Passing the configuration parameter is only necessary to handle application-specific use cases. To instantiate a new Sanitizer object, we use the code below:
// Instantiating the object with the default configurations
let sanitizer = new Sanitizer();
// Instantiating the sanitizer object with a custom configuration
const config = {
  allowElements: ["em", "b", "p"],
  blockElements: ["i", "span"],
  dropElements: ["h6"],
  // allow styles only on divs
  allowAttributes: { style: ["div"] }, // to allow styles on all elements, use { style: ["*"] }
  // drop the id attribute on span
  dropAttributes: { id: ["span"] }, // to drop the id attribute everywhere, use { id: ["*"] }
  allowCustomElements: true,
  allowComments: true,
};
const configured_sanitizer = new Sanitizer(config);
The parameters supplied in the config object in the preceding sample code are described below.
allowElements
The allowElements option is an array of strings naming the elements that the sanitizer should retain in the input.
blockElements
The blockElements option is an array of strings naming the elements that the sanitizer should remove from the input while retaining their children.
dropElements
The dropElements option is an array of strings naming the elements that the sanitizer should remove from the input, including their children.
allowAttributes
The allowAttributes option is an attribute match list, which determines whether an attribute (on a given element) should be allowed.
dropAttributes
The dropAttributes option is an attribute match list, which determines whether an attribute (on a given element) should be dropped.
allowCustomElements
The allowCustomElements option controls whether custom elements are considered at all; by default they are dropped. If this option is true, custom elements are still checked against all other built-in or configured checks.
allowComments
The allowComments option determines whether HTML comments are allowed.
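To make the interplay between these lists easier to follow, here is a toy model in plain JavaScript. It does not call the Sanitizer API at all. The helper names elementAction, matchesList, and attributeAction are hypothetical, and the precedence shown (drop first, then block, then the allow list, with unlisted elements treated as blocked) is an assumption of this sketch based on the option descriptions above. The real API also falls back to built-in default lists when an option is omitted, which this sketch does not model.

```javascript
// Toy model of the config semantics; not the real Sanitizer implementation.
// Assumption: drop is checked first, then block, then the allow list.
function elementAction(tag, config) {
  const { allowElements = [], blockElements = [], dropElements = [] } = config;
  if (dropElements.includes(tag)) return "drop";   // element and its children are removed
  if (blockElements.includes(tag)) return "block"; // element is removed, children are kept
  if (allowElements.includes(tag)) return "keep";  // element is retained
  return "block"; // assumption: unlisted elements are treated like blocked ones
}

// An attribute match list maps an attribute name to the elements it applies to;
// "*" stands for every element.
function matchesList(list = {}, attribute, tag) {
  const elements = list[attribute] || [];
  return elements.includes("*") || elements.includes(tag);
}

function attributeAction(attribute, tag, config) {
  if (matchesList(config.dropAttributes, attribute, tag)) return "drop";
  return matchesList(config.allowAttributes, attribute, tag) ? "keep" : "drop";
}

const exampleConfig = {
  allowElements: ["em", "b", "p"],
  blockElements: ["i", "span"],
  dropElements: ["h6"],
  allowAttributes: { style: ["div"] },
  dropAttributes: { id: ["span"] },
};

console.log(elementAction("p", exampleConfig));    // "keep"
console.log(elementAction("span", exampleConfig)); // "block"
console.log(elementAction("h6", exampleConfig));   // "drop"
console.log(attributeAction("style", "div", exampleConfig)); // "keep"
console.log(attributeAction("id", "span", exampleConfig));   // "drop"
```

The difference between "block" and "drop" is the one most easily confused: a blocked span loses its tags but keeps its text, while a dropped h6 disappears together with everything inside it.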
The API exposes three core methods that developers can use to make an HTML string safe. They are:
Element.setHTML( )
setHTML(input, sanitizer) is the syntax for the setHTML method, where input is the string of HTML to be sanitized and sanitizer is an instance of the Sanitizer class.
The setHTML method is part of the Element interface and is used to parse and sanitize an HTML string. The parsing step removes from the input any HTML elements that are invalid in the context of the target element, while the sanitization step removes any additional dangerous or undesired elements and attributes. Use Element.setHTML instead of the Element.innerHTML property to inject an untrusted string of HTML into an element.
const unsanitized_html_string = "hello <script>alert(123)</script> world"; // Unsanitized string of HTML
const sanitizer = new Sanitizer(); // Default sanitizer;
// Get the Element with id "target" and set it with the sanitized string.
const target = document.getElementById("target");
target.setHTML(unsanitized_html_string, sanitizer);
console.log(target.innerHTML);
// "hello  world" (the script element is removed; the spaces on either side of it remain)
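Because setHTML is not available in every browser, a page may want a fallback. The following is a minimal sketch and not part of the API: setUntrustedHTML is a hypothetical helper that calls setHTML with the signature described in this article when the method exists, and otherwise falls back to textContent, which displays the markup as plain text instead of parsing it.

```javascript
// Hypothetical helper: prefer setHTML(), fall back to plain text.
function setUntrustedHTML(target, html, sanitizer) {
  if (typeof target.setHTML === "function") {
    // Parse and sanitize in one step, as described in this article.
    target.setHTML(html, sanitizer);
    return "sanitized";
  }
  // textContent never parses markup, so nothing in the string can execute.
  target.textContent = html;
  return "text";
}

// Browser-only usage; skipped when no document is available.
if (typeof document !== "undefined") {
  const target = document.getElementById("target");
  const sanitizer = typeof Sanitizer === "function" ? new Sanitizer() : undefined;
  if (target) {
    setUntrustedHTML(target, "hello <script>alert(123)</script> world", sanitizer);
  }
}
```

The return value only reports which path was taken, so a caller can log or measure how often the fallback is used.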
Sanitizer.sanitizeFor( )
sanitizeFor(element, input) is the syntax for the sanitizeFor method. The element parameter is a string naming the element that the input will be inserted into, for example "div", "p", "section", or "article". The input parameter is the string of HTML to be sanitized.
The sanitizeFor method is part of the Sanitizer interface. It accepts the destination tag name of an HTML element as the first parameter and the string to be sanitized as the second. The return value is an HTML element of the requested type that contains the sanitized subtree as its children. For example, if "div" is passed as the first argument, the return value is an HTMLDivElement. Use this method when an untrusted HTML string is already available but will be inserted into the DOM later.
const unsanitized_html_string = "hello <script>alert(123)</script> world"; // Unsanitized string of HTML
const sanitizer = new Sanitizer(); // Default sanitizer;
// sanitizeFor used to sanitize the string
let sanitizedDiv = sanitizer.sanitizeFor("div", unsanitized_html_string);
// We can verify the returned element type and view the sanitized HTML in string form:
console.log(sanitizedDiv instanceof HTMLDivElement);
// true
console.log(sanitizedDiv.innerHTML);
// "hello  world"
// At a later time ...
// Get the element to update. This must be a div to match our sanitizeFor() context.
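// Set its content to the child nodes of our sanitized element. replaceChildren() expects
// individual nodes, so spread childNodes (children would skip the text nodes).
document.querySelector("div#target").replaceChildren(...sanitizedDiv.childNodes);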
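The sanitize-now, insert-later pattern can be wrapped in a small helper. This is a sketch under assumptions: prepareSanitizedInsert is a hypothetical name, it relies on sanitizeFor behaving as described above, and it treats a missing return value defensively even though the article does not say when that would happen.

```javascript
// Hypothetical helper: sanitize now, return a function that inserts later.
function prepareSanitizedInsert(sanitizer, tagName, html) {
  // The holder is an element of type tagName containing the sanitized subtree.
  const holder = sanitizer.sanitizeFor(tagName, html);
  if (!holder) {
    throw new Error(`Could not sanitize input for <${tagName}>`);
  }
  // Nodes are moved out of the holder, so call the returned function once.
  return function insertInto(target) {
    // Spread childNodes (not children) so that text nodes are moved as well.
    target.replaceChildren(...holder.childNodes);
  };
}

// Browser-only usage; skipped where sanitizeFor() is not available.
if (
  typeof document !== "undefined" &&
  typeof Sanitizer === "function" &&
  typeof Sanitizer.prototype.sanitizeFor === "function"
) {
  const insert = prepareSanitizedInsert(
    new Sanitizer(),
    "div",
    "hello <script>alert(123)</script> world"
  );
  const target = document.querySelector("div#target");
  if (target) insert(target);
}
```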
Sanitizer.sanitize( )
The sanitize method is part of the Sanitizer interface; it sanitizes a tree of DOM nodes and removes every unwanted element and attribute. Use this method when the data to be sanitized is already available as nodes in the DOM. The syntax for this method is sanitize(input), where the input parameter is a DocumentFragment or Document.
The sanitize method is used below to sanitize the content of an iframe with the id myFrame:
const sanitizer = new Sanitizer();
const frame_element = document.getElementById("myFrame");
const unsanitized_frame_tree = frame_element.contentWindow.document;
// Sanitize the document tree and update the frame.
const sanitized_frame_tree = sanitizer.sanitize(unsanitized_frame_tree);
frame_element.replaceChildren(sanitized_frame_tree);
The Sanitizer API aims to reduce a web application's exposure to XSS by cleaning untrusted HTML before it is injected into the DOM. As the number of applications on the internet grows, it gives developers a built-in way to strengthen the security of web apps. At the time of writing, this technology is still considered experimental and should not be relied on in production.
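Given the experimental status, a page would normally check for the API before relying on it. A minimal feature-detection sketch follows; supportsSanitizerAPI is a hypothetical helper, and it takes the global object as a parameter only so that it can be exercised outside a browser.

```javascript
// Hypothetical helper: report whether the Sanitizer API described here is present.
function supportsSanitizerAPI(globalObject = globalThis) {
  const hasSanitizer = typeof globalObject.Sanitizer === "function";
  const elementProto = globalObject.Element && globalObject.Element.prototype;
  const hasSetHTML =
    Boolean(elementProto) && typeof elementProto.setHTML === "function";
  return hasSanitizer && hasSetHTML;
}

if (supportsSanitizerAPI()) {
  console.log("Sanitizer API detected; setHTML() can be used.");
} else {
  console.log("Sanitizer API not available; use a fallback instead.");
}
```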
Blessing Krofegha
Blessing Krofegha is a software engineer based in Lagos, Nigeria, with a burning desire to contribute to making the web awesome for all by writing and building solutions.