Class XMLText


  • public final class XMLText
    extends Object
    A collection of utility functions (static methods) related to XML characters and XML text.
    • Method Summary

      Modifier and Type Method Description
      static String checkId​(String s)
      Equivalent to checkId(s, false, '_').
      static String checkId​(String s, boolean keepStartNCNameChar, char replacementChar)
      Converts specified string to a valid, though not always unique, ID.
      static boolean checkText​(String text)
      Returns false if specified text contains non-XML characters.
      static String collapseWhiteSpace​(String value)
      Replaces successive XML space characters by a single space character (' ') then removes leading and trailing space characters if any.
      static String compressWhiteSpace​(String value)
      Replaces successive XML space characters ('\t', '\r', '\n', ' ') by a single space character (' ').
      static void escapeXML​(char[] chars, int offset, int length, StringBuilder escaped)
      Escapes specified character array (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
      static void escapeXML(char[] chars, int offset, int length, StringBuilder escaped, int maxCode)
      Escapes specified character array (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
      static String escapeXML(String string)
      Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
      static void escapeXML(String string, StringBuilder escaped)
      Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
      static String filterText​(String text)
      Returns a copy of specified text after removing all non-XML characters (if any).
      static boolean isName​(String s)
      Tests if specified string is a lexically correct Name.
      static boolean isNameChar​(char c)
      Tests if specified character can be used in a Name at a position other than the first one.
      static boolean isNameOtherChar(char c)
      Tests if specified character, even if not authorized as the first character of a Name, can be one of the other characters of a Name.
      static boolean isNameStartChar(char c)
      Tests if specified character can be used as the start of a Name.
      static boolean isNCName​(String s)
      Tests if specified string is a lexically correct NCName.
      static boolean isNCNameChar​(char c)
      Tests if specified character can be used in an NCName at a position other than the first one.
      static boolean isNCNameOtherChar​(char c)
      Tests if specified character, even if not authorized as the first character of an NCName, can be one of the other characters of an NCName.
      static boolean isNCNameStartChar​(char c)
      Tests if specified character can be used as the start of an NCName.
      static boolean isNmtoken​(String s)
      Tests if specified string is a lexically correct NMTOKEN.
      static boolean isPITarget​(String s)
      Tests if specified string is a lexically correct target for a processing instruction.
      static boolean isXMLChar(char c)
      Tests if specified character is a character which can be contained in an XML document.
      static boolean isXMLSpace(char c)
      Tests if specified character is an XML space character ('\t', '\r', '\n', ' ').
      static boolean isXMLSpace(CharSequence chars)
      Tests whether specified character sequence contains only XML space characters ('\t', '\r', '\n', ' ').
      static String quoteXML​(String string)
      Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.) then puts the escaped string between quotes (").
      static void quoteXML(String string, StringBuilder quoted)
      Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.) then puts the escaped string between quotes (").
      static String replaceWhiteSpace​(String value)
      Replaces sequence "\r\n" and characters '\t', '\r', '\n' by a single space character ' '.
      static String[] splitList​(String s)
      Splits specified string at XML whitespace character boundaries ('\t', '\r', '\n', ' ').
      static String unescapeXML​(String text)
      Unescapes specified string.
      static void unescapeXML​(String text, int offset, int length, StringBuilder unescaped)
      Unescapes specified string.
    • Method Detail

      • isXMLSpace

        public static boolean isXMLSpace​(char c)
        Tests if specified character is an XML space character ('\t', '\r', '\n', ' ').
        Parameters:
        c - character to be tested
        Returns:
        true if test is successful; false otherwise
      • isXMLSpace

        public static boolean isXMLSpace​(CharSequence chars)
        Tests whether specified character sequence contains only XML space characters ('\t', '\r', '\n', ' ').
        Parameters:
        chars - character sequence to be tested
        Returns:
        true if chars is empty or contains only XML space characters; false otherwise
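
        The two tests above can be sketched with plain JDK code. This is an illustration of the documented behavior, not the library's source; the class name XmlSpaceSketch is hypothetical.

```java
// Hypothetical stand-alone sketch; not the XMLText implementation.
final class XmlSpaceSketch {
    /** True for the four XML space characters: tab, carriage return, line feed, space. */
    static boolean isXMLSpace(char c) {
        return c == '\t' || c == '\r' || c == '\n' || c == ' ';
    }

    /** True if the sequence is empty or contains only XML space characters. */
    static boolean isXMLSpace(CharSequence chars) {
        for (int i = 0; i < chars.length(); i++) {
            if (!isXMLSpace(chars.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

        Note that this set is narrower than what java.lang.Character.isWhitespace accepts: a form feed ('\f'), for instance, is Java whitespace but is not one of the four characters listed above.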
      • isXMLChar

        public static boolean isXMLChar​(char c)
        Tests if specified character is a character which can be contained in an XML document.
        Parameters:
        c - character to be tested
        Returns:
        true if test is successful; false otherwise
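
        As a rough illustration, the XML 1.0 Char production restricted to a single 16-bit Java char could look like the sketch below. The class name XmlCharSketch is hypothetical, and rejecting every surrogate is an assumption of the sketch: this page does not say how the library treats the two halves of a surrogate pair.

```java
// Hypothetical sketch of the XML 1.0 Char production for one UTF-16 code unit.
final class XmlCharSketch {
    static boolean isXMLChar(char c) {
        return c == 0x9 || c == 0xA || c == 0xD   // tab, line feed, carriage return
            || (c >= 0x20 && c <= 0xD7FF)         // everything below the surrogate block
            || (c >= 0xE000 && c <= 0xFFFD);      // above the surrogates, minus U+FFFE and U+FFFF
        // Assumption: surrogates (U+D800 to U+DFFF) are rejected here, because a
        // lone char cannot represent a code point in the range U+10000 to U+10FFFF.
    }
}
```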
      • checkText

        public static boolean checkText​(String text)
        Returns false if specified text contains non-XML characters. Otherwise, returns true.
      • filterText

        public static String filterText​(String text)
        Returns a copy of specified text after removing all non-XML characters (if any). Moreover, this function always replaces '\r' and "\r\n" by '\n'.
        Parameters:
        text - text to be filtered
        Returns:
        filtered text
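
        The checking and filtering described above might be sketched as follows. This is not the library's code: the class name TextFilterSketch and the helper isXMLCodePoint are hypothetical, and working on code points (so that valid surrogate pairs are kept and unpaired surrogates are dropped) is an assumption of the sketch.

```java
// Hypothetical sketch; works on code points so that valid surrogate pairs survive.
final class TextFilterSketch {
    /** XML 1.0 Char production, applied to a full Unicode code point. */
    static boolean isXMLCodePoint(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns false as soon as a non-XML character is found. */
    static boolean checkText(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXMLCodePoint(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    /** Drops non-XML characters and turns "\r\n" and a lone '\r' into '\n'. */
    static String filterText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\r') {
                if (i < text.length() && text.charAt(i) == '\n') {
                    i++; // swallow the '\n' of a "\r\n" pair
                }
                out.append('\n');
            } else if (isXMLCodePoint(cp)) {
                out.appendCodePoint(cp);
            }
            // Any other code point is a non-XML character and is skipped.
        }
        return out.toString();
    }
}
```

        With this sketch, filterText("a\r\nb\rc") yields "a\nb\nc", and a control character such as U+0001 is removed.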
      • isNCNameStartChar

        public static boolean isNCNameStartChar​(char c)
        Tests if specified character can be used as the start of an NCName.

        Corresponds to: Letter | '_'.

        See Also:
        isNCNameOtherChar(char), isNCNameChar(char)
      • isNCNameOtherChar

        public static boolean isNCNameOtherChar​(char c)
        Tests if specified character, even if not authorized as the first character of an NCName, can be one of the other characters of an NCName.

        Corresponds to: Digit | '.' | '-' | CombiningChar | Extender.

        See Also:
        isNCNameStartChar(char), isNCNameChar(char)
      • isNCNameChar

        public static boolean isNCNameChar​(char c)
        Tests if specified character can be used in an NCName at a position other than the first one.

        Corresponds to: Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender.

        See Also:
        isNCNameStartChar(char), isNCNameOtherChar(char)
      • isNCName

        public static boolean isNCName​(String s)
        Tests if specified string is a lexically correct NCName.
        Parameters:
        s - string to be tested
        Returns:
        true if test is successful; false otherwise
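
        To show how the three character tests combine into the string test, here is a deliberately simplified sketch. It covers only the ASCII part of the productions (non-ASCII Letter and Digit, CombiningChar and Extender are omitted), the class name NCNameSketch is hypothetical, and returning false for null is an assumption rather than documented behavior.

```java
// Hypothetical ASCII-only approximation of the NCName productions.
final class NCNameSketch {
    /** ASCII subset of: Letter | '_'. */
    static boolean isNCNameStartChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    /** ASCII subset of: Digit | '.' | '-' (CombiningChar and Extender omitted). */
    static boolean isNCNameOtherChar(char c) {
        return (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    /** Any character allowed after the first position. */
    static boolean isNCNameChar(char c) {
        return isNCNameStartChar(c) || isNCNameOtherChar(c);
    }

    /** One start character followed by zero or more NCName characters. */
    static boolean isNCName(String s) {
        if (s == null || s.isEmpty() || !isNCNameStartChar(s.charAt(0))) {
            return false; // assumption: null and empty strings are not NCNames
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNCNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

        Under these assumptions "section-1.2" and "_id" pass, while "1abc" (starts with a digit) and "xsl:template" (contains a colon) do not.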
      • isNameStartChar

        public static boolean isNameStartChar​(char c)
        Tests if specified character can be used as the start of a Name.

        Corresponds to: Letter | '_' | ':'.

        See Also:
        isNameOtherChar(char), isNameChar(char)
      • isNameOtherChar

        public static boolean isNameOtherChar​(char c)
        Tests if specified character, even if not authorized as the first character of a Name, can be one of the other characters of a Name.

        Corresponds to: Digit | '.' | '-' | ':' | CombiningChar | Extender.

        See Also:
        isNameStartChar(char), isNameChar(char)
      • isNameChar

        public static boolean isNameChar​(char c)
        Tests if specified character can be used in a Name at a position other than the first one.

        Corresponds to: Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender.

        See Also:
        isNameStartChar(char), isNameOtherChar(char)
      • isName

        public static boolean isName​(String s)
        Tests if specified string is a lexically correct Name.
        Parameters:
        s - string to be tested
        Returns:
        true if test is successful; false otherwise
      • isNmtoken

        public static boolean isNmtoken​(String s)
        Tests if specified string is a lexically correct NMTOKEN.
        Parameters:
        s - string to be tested
        Returns:
        true if test is successful; false otherwise
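
        The difference between a Name and an NMTOKEN is easiest to see side by side: in XML 1.0 a Name needs a valid start character, whereas an NMTOKEN is any non-empty run of name characters. The sketch below is ASCII-only (non-ASCII Letter and Digit, CombiningChar and Extender are left out) and the class name NameSketch is hypothetical.

```java
// Hypothetical ASCII-only approximation of the Name and Nmtoken productions.
final class NameSketch {
    /** ASCII subset of: Letter | '_' | ':'. */
    static boolean isNameStartChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
    }

    /** ASCII subset of: Letter | Digit | '.' | '-' | '_' | ':'. */
    static boolean isNameChar(char c) {
        return isNameStartChar(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    /** Non-empty, every character is a name character. */
    static boolean isNmtoken(String s) {
        if (s == null || s.isEmpty()) {
            return false; // assumption: null and empty strings are rejected
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** An NMTOKEN whose first character is also a valid start character. */
    static boolean isName(String s) {
        return isNmtoken(s) && isNameStartChar(s.charAt(0));
    }
}
```

        For example, "1abc" is an NMTOKEN but not a Name, while "xsl:template" is both.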
      • isPITarget

        public static boolean isPITarget​(String s)
        Tests if specified string is a lexically correct target for a processing instruction.

        Note that Names starting with "xml" (case-insensitive) are rejected.

        Parameters:
        s - string to be tested
        Returns:
        true if test is successful; false otherwise
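
        A minimal sketch of the rule stated above, assuming an ASCII-only Name check expressed as a regular expression. The class name PITargetSketch is hypothetical and the real method may accept a wider character repertoire.

```java
// Hypothetical sketch: a Name that does not start with "xml" in any letter case.
final class PITargetSketch {
    static boolean isPITarget(String s) {
        if (s == null) {
            return false; // assumption: null is rejected
        }
        // ASCII-only approximation of the Name production.
        boolean isName = s.matches("[A-Za-z_:][A-Za-z0-9._:-]*");
        // Case-insensitive comparison of the first three characters with "xml".
        boolean startsWithXml = s.regionMatches(true, 0, "xml", 0, 3);
        return isName && !startsWithXml;
    }
}
```

        Following the note above literally, this sketch rejects "xml-stylesheet" and "XMLfoo" as well as "xml" itself, and accepts a target such as "php".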
      • checkId

        public static String checkId​(String s,
                                     boolean keepStartNCNameChar,
                                     char replacementChar)
        Converts specified string to a valid, though not always unique, ID. Returns "_" for an empty or null string.
        Parameters:
        s - string to be checked as an NCName.
        keepStartNCNameChar - if true and the first character is an NCName character, keeps it and prepends replacementChar before it.
        replacementChar - character used to replace invalid ones. Must be a letter or '_'.
        Returns:
        the resulting ID, or replacementChar as a string for an empty or null string.
      • collapseWhiteSpace

        public static String collapseWhiteSpace​(String value)
        Replaces successive XML space characters by a single space character (' ') then removes leading and trailing space characters if any.
        Parameters:
        value - string to be processed
        Returns:
        processed string
      • compressWhiteSpace

        public static String compressWhiteSpace​(String value)
        Replaces successive XML space characters ('\t', '\r', '\n', ' ') by a single space character (' ').
        Parameters:
        value - string to be processed
        Returns:
        processed string
      • replaceWhiteSpace

        public static String replaceWhiteSpace​(String value)
        Replaces sequence "\r\n" and characters '\t', '\r', '\n' by a single space character ' '.
        Parameters:
        value - string to be processed
        Returns:
        processed string
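
        The three whitespace functions differ only in how aggressively they normalize. The sketch below re-creates the documented behavior with JDK regular expressions so the difference can be seen on one input; it is not the library's code and the class name WhiteSpaceSketch is hypothetical.

```java
// Hypothetical sketch of the three whitespace normalizations described above.
final class WhiteSpaceSketch {
    /** Each "\r\n" pair and each '\t', '\r', '\n' becomes one ' '; runs are not merged. */
    static String replaceWhiteSpace(String value) {
        return value.replaceAll("\\r\\n|[\\t\\r\\n]", " ");
    }

    /** Each run of XML space characters becomes a single ' '. */
    static String compressWhiteSpace(String value) {
        return value.replaceAll("[ \\t\\r\\n]+", " ");
    }

    /** Same as compressWhiteSpace, then leading and trailing ' ' are removed. */
    static String collapseWhiteSpace(String value) {
        String s = compressWhiteSpace(value);
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(start, end);
    }
}
```

        For the input "  a \t\r\n b  ", this sketch gives " a b " for compressWhiteSpace and "a b" for collapseWhiteSpace, while replaceWhiteSpace keeps every position and only turns the tab and the "\r\n" pair into spaces.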
      • splitList

        public static String[] splitList​(String s)
        Splits specified string at XML whitespace character boundaries ('\t', '\r', '\n', ' '). Returns list of parts.
        Parameters:
        s - string to be split
        Returns:
        list of parts
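
        One way to picture the splitting is the small tokenizer below. It is only a sketch: the class name SplitListSketch is hypothetical, and ignoring leading or trailing whitespace and returning an empty array for a blank string are assumptions, since this page does not describe those edge cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits at runs of '\t', '\r', '\n' and ' '.
final class SplitListSketch {
    static String[] splitList(String s) {
        List<String> parts = new ArrayList<>();
        int start = -1; // index where the current part began, or -1 between parts
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean space = c == '\t' || c == '\r' || c == '\n' || c == ' ';
            if (space) {
                if (start >= 0) {
                    parts.add(s.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            parts.add(s.substring(start)); // last part runs to the end of the string
        }
        return parts.toArray(new String[0]);
    }
}
```

        A typical use is reading a list-valued attribute such as IDREFS or NMTOKENS, where items are separated by arbitrary runs of XML whitespace.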
      • quoteXML

        public static String quoteXML​(String string)
        Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.) then puts the escaped string between quotes (").
        Parameters:
        string - string to be escaped and quoted
        Returns:
        escaped and quoted string
      • quoteXML

        public static void quoteXML​(String string,
                                    StringBuilder quoted)
        Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.) then puts the escaped string between quotes (").
        Parameters:
        string - string to be escaped and quoted
        quoted - buffer used to store escaped and quoted string (characters are appended to this buffer)
      • escapeXML

        public static String escapeXML​(String string)
        Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
        Parameters:
        string - string to be escaped
        Returns:
        escaped string
      • escapeXML

        public static void escapeXML​(String string,
                                     StringBuilder escaped)
        Escapes specified string (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
        Parameters:
        string - string to be escaped
        escaped - buffer used to store escaped string (characters are appended to this buffer)
      • escapeXML

        public static void escapeXML​(char[] chars,
                                     int offset,
                                     int length,
                                     StringBuilder escaped)
        Escapes specified character array (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
        Parameters:
        chars - character array to be escaped
        offset - specifies first character in array to be escaped
        length - number of characters in array to be escaped
        escaped - buffer used to store escaped string (characters are appended to this buffer)
      • escapeXML

        public static void escapeXML​(char[] chars,
                                     int offset,
                                     int length,
                                     StringBuilder escaped,
                                     int maxCode)
        Escapes specified character array (that is, '<' is replaced by "&#60;", '&' is replaced by "&#38;", etc.).
        Parameters:
        chars - character array to be escaped
        offset - specifies first character in array to be escaped
        length - number of characters in array to be escaped
        escaped - buffer used to store escaped string (characters are appended to this buffer)
        maxCode - characters with code > maxCode are escaped as &#code;. Pass 127 for US-ASCII, 255 for ISO-8859-1, otherwise pass Integer.MAX_VALUE.
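
        The escaping and the role of maxCode can be illustrated as follows. This is a sketch, not the library's implementation: the class name EscapeSketch is hypothetical, the set of markup characters covered by "etc." is assumed here to be '<', '>', '&', '"' and the apostrophe, and treating surrogate pairs as one code point is also an assumption.

```java
// Hypothetical sketch of numeric-character-reference escaping with a maxCode limit.
final class EscapeSketch {
    static void escapeXML(char[] chars, int offset, int length,
                          StringBuilder escaped, int maxCode) {
        int end = offset + length;
        int i = offset;
        while (i < end) {
            int cp = Character.codePointAt(chars, i, end);
            i += Character.charCount(cp);
            // Assumption: these five characters are the ones always escaped.
            boolean markup = cp == '<' || cp == '>' || cp == '&' || cp == '"' || cp == '\'';
            if (markup || cp > maxCode) {
                escaped.append("&#").append(cp).append(';'); // decimal reference
            } else {
                escaped.appendCodePoint(cp);
            }
        }
    }

    /** Assumption: the String variant escapes markup only, that is, maxCode is unlimited. */
    static String escapeXML(String string) {
        StringBuilder escaped = new StringBuilder();
        escapeXML(string.toCharArray(), 0, string.length(), escaped, Integer.MAX_VALUE);
        return escaped.toString();
    }

    /** Escapes, then wraps the result in double quotes. */
    static String quoteXML(String string) {
        return "\"" + escapeXML(string) + "\"";
    }
}
```

        With maxCode set to 127, the text "caf\u00E9" comes out of this sketch as "caf&#233;", which is safe to write in a US-ASCII encoded file; with 255 the same text is left unchanged.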
      • unescapeXML

        public static String unescapeXML​(String text)
        Unescapes specified string. Inverse operation of escapeXML(java.lang.String).
        Parameters:
        text - string to be unescaped
        Returns:
        unescaped string
      • unescapeXML

        public static void unescapeXML​(String text,
                                       int offset,
                                       int length,
                                       StringBuilder unescaped)
        Unescapes specified string. Inverse operation of escapeXML(java.lang.String).
        Parameters:
        text - string to be unescaped
        offset - specifies first character in string to be unescaped
        length - number of characters in string to be unescaped
        unescaped - buffer used to store unescaped string (characters are appended to this buffer)
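
        For completeness, the inverse operation might be sketched like this. The class name UnescapeSketch is hypothetical, and the set of recognized references (decimal and hexadecimal character references plus the five predefined XML entities) is an assumption; this page does not list what the library actually accepts. Unrecognized sequences are copied through unchanged.

```java
// Hypothetical sketch: resolves &#NN;, &#xHH; and the five predefined entities.
final class UnescapeSketch {
    static String unescapeXML(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i + 1) : -1;
            int cp = (semi > 0) ? decode(text.substring(i + 1, semi)) : -1;
            if (cp >= 0) {
                out.appendCodePoint(cp);
                i = semi + 1; // skip the whole reference
            } else {
                out.append(c); // ordinary character or unrecognized '&'
                i++;
            }
        }
        return out.toString();
    }

    /** Returns the code point for a reference body such as "#60", "#x3C" or "lt", else -1. */
    private static int decode(String ref) {
        switch (ref) {
            case "lt": return '<';
            case "gt": return '>';
            case "amp": return '&';
            case "quot": return '"';
            case "apos": return '\'';
            default: break;
        }
        if (ref.length() < 2 || ref.charAt(0) != '#') {
            return -1;
        }
        boolean hex = ref.charAt(1) == 'x';
        String digits = hex ? ref.substring(2) : ref.substring(1);
        if (digits.isEmpty() || digits.charAt(0) == '+' || digits.charAt(0) == '-') {
            return -1; // parseInt would accept a sign; a character reference must not
        }
        try {
            int cp = Integer.parseInt(digits, hex ? 16 : 10);
            return Character.isValidCodePoint(cp) ? cp : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```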