Package org.jsoup.safety
Class Cleaner
- java.lang.Object
  - org.jsoup.safety.Cleaner
public class Cleaner extends java.lang.Object
The whitelist-based HTML cleaner. Use it to ensure that end-user-provided HTML contains only the elements and attributes that you are expecting: no junk, and no cross-site scripting attacks. The HTML cleaner parses the input as HTML and then runs it through a whitelist, so the output HTML can contain only HTML that the whitelist allows.
The input HTML is assumed to be a body fragment: the clean methods only pull from the source's body, and the canned whitelists only allow tags that can appear in the body.
Rather than interacting directly with a Cleaner object, you will generally want the clean methods in Jsoup.
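A minimal sketch of that convenience route, assuming a jsoup release that still ships org.jsoup.safety.Whitelist (later releases rename it Safelist). Jsoup.clean and Jsoup.isValid each take a body-HTML string and a whitelist; the class and helper names below are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class JsoupCleanExample {
    // Returns the fragment with everything outside the "basic" whitelist removed.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Whitelist.basic());
    }

    // True only if nothing in the fragment would have to be removed.
    static boolean isAllowed(String untrustedHtml) {
        return Jsoup.isValid(untrustedHtml, Whitelist.basic());
    }

    public static void main(String[] args) {
        String dirty = "<p>Hello <b>world</b><script>alert('xss')</script></p>";
        System.out.println(isAllowed(dirty)); // false: <script> is not in the basic whitelist
        System.out.println(sanitize(dirty));  // <p> and <b> are kept; the script is dropped
    }
}
```

Whitelist.basic() is one of the canned whitelists; pick whichever one matches the markup you intend to accept.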
Nested Class Summary
- private class Cleaner.CleaningVisitor: Iterates the input and copies trusted nodes (tags, attributes, text) into the destination.
- private static class Cleaner.ElementMeta
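To make the CleaningVisitor description concrete, here is a self-contained toy version of the same idea that does not use jsoup at all: walk a source tree, copy elements whose tag is whitelisted (keeping only their whitelisted attributes), copy text, and count everything dropped. The Node type, the rule map and all names are hypothetical, and the toy makes its own simplifying choices that may differ from jsoup: an unsafe element is dropped but its children are still visited, and script or style content is not modelled separately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ToyWhitelistCleaner {
    /** Hypothetical node model: tag == null marks a text node. */
    static final class Node {
        final String tag;
        final String text;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) { this.tag = tag; this.text = text; }

        static Node element(String tag) { return new Node(tag, null); }
        static Node text(String text) { return new Node(null, text); }

        Node attr(String key, String value) { attrs.put(key, value); return this; }
        Node add(Node child) { children.add(child); return this; }

        /** Serializes the subtree, escaping text and attribute values. */
        String html() {
            if (tag == null) return escape(text);
            StringBuilder sb = new StringBuilder("<").append(tag);
            attrs.forEach((k, v) -> sb.append(' ').append(k).append("=\"").append(escape(v)).append('"'));
            sb.append('>');
            for (Node child : children) sb.append(child.html());
            return sb.append("</").append(tag).append('>').toString();
        }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
        }
    }

    private final Map<String, Set<String>> allowed; // tag name -> attribute names allowed on it

    ToyWhitelistCleaner(Map<String, Set<String>> allowed) { this.allowed = allowed; }

    /** Copies trusted children of source into dest; returns how many elements and attributes were dropped. */
    int copySafeNodes(Node source, Node dest) {
        int dropped = 0;
        for (Node child : source.children) dropped += visit(child, dest);
        return dropped;
    }

    private int visit(Node source, Node dest) {
        if (source.tag == null) {                // text is always copied
            dest.add(Node.text(source.text));
            return 0;
        }
        int dropped = 0;
        Node target = dest;
        Set<String> okAttrs = allowed.get(source.tag);
        if (okAttrs != null) {                   // safe tag: shallow copy with safe attributes only
            Node copy = Node.element(source.tag);
            for (Map.Entry<String, String> a : source.attrs.entrySet()) {
                if (okAttrs.contains(a.getKey())) copy.attr(a.getKey(), a.getValue());
                else dropped++;
            }
            dest.add(copy);
            target = copy;
        } else {
            dropped++;                           // unsafe tag: drop it, but keep visiting its children
        }
        for (Node child : source.children) dropped += visit(child, target);
        return dropped;
    }

    static Map<String, Set<String>> sampleRules() {
        return Map.of("p", Set.of(), "a", Set.of("href"));
    }

    static Node sampleDirty() {
        return Node.element("body")
            .add(Node.element("p").attr("onclick", "steal()").add(Node.text("hi")))
            .add(Node.element("div").add(Node.text("there")));
    }

    /** Cleans sampleDirty() with sampleRules(); returns "<dropped count> <clean html>". */
    static String demo() {
        Node clean = Node.element("body");
        int dropped = new ToyWhitelistCleaner(sampleRules()).copySafeNodes(sampleDirty(), clean);
        return dropped + " " + clean.html();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 <body><p>hi</p>there</body>
    }
}
```

A drop count of zero from the toy copySafeNodes plays roughly the role of the "nothing needs to be removed" condition that isValid reports.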
Method Summary
- Document clean(Document dirtyDocument): Creates a new, clean document from the original dirty document, containing only elements allowed by the whitelist.
- private int copySafeNodes(Element source, Element dest)
- private Cleaner.ElementMeta createSafeElement(Element sourceEl)
- boolean isValid(Document dirtyDocument): Determines whether the input document's body is valid against the whitelist.
- boolean isValidBodyHtml(java.lang.String bodyHtml)
Field Detail
whitelist
private Whitelist whitelist
Constructor Detail
Cleaner
public Cleaner(Whitelist whitelist)
Creates a new cleaner that sanitizes documents using the supplied whitelist.
Parameters:
whitelist - the whitelist to clean with
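A minimal sketch of using the constructor directly, assuming the Whitelist-era jsoup API: one Cleaner is built from a whitelist and then applied to several inputs, each parsed with Jsoup.parseBodyFragment because the cleaner only reads the body. The class and helper names are hypothetical, and since this page says nothing about thread safety, the sketch calls the shared instance from a single thread only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

public class CleanerReuseExample {
    // One Cleaner, bound to one whitelist, applied to several inputs below.
    private static final Cleaner CLEANER = new Cleaner(Whitelist.basic());

    // Parses untrusted HTML as a body fragment and returns the cleaned body HTML.
    static String cleanComment(String untrustedHtml) {
        Document dirty = Jsoup.parseBodyFragment(untrustedHtml);
        Document clean = CLEANER.clean(dirty);
        return clean.body().html();
    }

    public static void main(String[] args) {
        String[] comments = {"<b>bold</b>", "<img src=x onerror=alert(1)>"};
        for (String comment : comments) {
            // <img> is not in basic(), so the second comment cleans to an empty string
            System.out.println(cleanComment(comment));
        }
    }
}
```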
Method Detail
clean
public Document clean(Document dirtyDocument)
Creates a new, clean document from the original dirty document, containing only elements allowed by the whitelist. The original document is not modified. Only elements from the dirty document's body are used.
Parameters:
dirtyDocument - Untrusted base document to clean.
Returns:
cleaned document.
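The following sketch, again assuming the Whitelist-era API, illustrates what the description says: the cleaned document's head is empty even though the dirty one has a title and a script, only whitelisted body content survives, and the dirty document still contains what was removed. The sample markup and names are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

public class CleanDocumentExample {
    static final String DIRTY_HTML =
        "<html><head><title>Hi</title><script>steal()</script></head>"
            + "<body><p>Keep me</p><iframe src=\"http://evil.example\"></iframe></body></html>";

    // Returns {dirty, clean}: the parsed input and the cleaned copy made from its body.
    static Document[] parseAndClean() {
        Document dirty = Jsoup.parse(DIRTY_HTML);
        Document clean = new Cleaner(Whitelist.basic()).clean(dirty);
        return new Document[] {dirty, clean};
    }

    public static void main(String[] args) {
        Document[] docs = parseAndClean();
        System.out.println(docs[1].head().childNodeSize());  // 0: nothing is copied into the new head
        System.out.println(docs[1].body().html());           // only the whitelisted <p> survives
        System.out.println(docs[0].select("iframe").size()); // 1: the dirty document is untouched
    }
}
```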
isValid
public boolean isValid(Document dirtyDocument)
Determines whether the input document's body is valid against the whitelist. It is considered valid if all the tags and attributes in the input HTML are allowed by the whitelist and there is no content in the head.
This method can be used as a validator for user input. An invalid document will still be cleaned successfully by the clean(Document) method. If you use this method as a validator, it is still recommended to clean the document, to ensure that enforced attributes are set correctly and that the output is tidied.
Parameters:
dirtyDocument - document to test
Returns:
true if no tags or attributes need to be removed; false if they do
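The validate-then-clean pattern recommended for isValid might look like this sketch (Whitelist-era API assumed; the class name, helper name and the reject-by-returning-null policy are hypothetical). Whitelist.basic() enforces rel="nofollow" on links, so even a fully valid link comes back changed after cleaning.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

public class ValidateThenCleanExample {
    private static final Cleaner CLEANER = new Cleaner(Whitelist.basic());

    // Returns the cleaned body HTML, or null when the submission should be rejected.
    static String acceptOrReject(String userHtml) {
        Document dirty = Jsoup.parseBodyFragment(userHtml);
        if (!CLEANER.isValid(dirty)) {
            return null; // some tag or attribute would have to be removed
        }
        // Clean anyway: basic() enforces rel="nofollow" on <a>, and the output is tidied.
        return CLEANER.clean(dirty).body().html();
    }

    public static void main(String[] args) {
        System.out.println(acceptOrReject("<p onclick=\"steal()\">hi</p>")); // null: onclick is not allowed
        System.out.println(acceptOrReject("<a href=\"http://example.com\">link</a>"));
    }
}
```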
isValidBodyHtml
public boolean isValidBodyHtml(java.lang.String bodyHtml)
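This entry has no description. Judging by its name and signature, it appears to apply the isValid check directly to a body-HTML string instead of a parsed Document, which suits form validation. A sketch under that assumption, using the canned Whitelist.simpleText() (Whitelist-era API assumed; the class and helper names are hypothetical):

```java
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

public class ValidBodyHtmlExample {
    // simpleText() allows only simple inline formatting: <b>, <em>, <i>, <strong> and <u>.
    private static final Cleaner CLEANER = new Cleaner(Whitelist.simpleText());

    // Hypothetical form-validation helper: accept the comment only if it passes unchanged.
    static boolean isAcceptableComment(String bodyHtml) {
        return CLEANER.isValidBodyHtml(bodyHtml);
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableComment("Nice <b>post</b>!"));                       // true
        System.out.println(isAcceptableComment("<a href=\"http://example.com\">spam</a>")); // false: no <a> in simpleText()
    }
}
```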
createSafeElement
private Cleaner.ElementMeta createSafeElement(Element sourceEl)