Class LoadText


  • public final class LoadText
    extends Object
    A utility class for loading a text file whose encoding is not known in advance; for example, a CSS stylesheet that starts with a BOM, starts with an @charset rule, or has no encoding specification at all.

    Unlike FileUtil.loadString(java.io.File) and URLUtil.loadString(java.net.URL), this utility class detects the encoding of the text it loads.

    Note that encoding detection always succeeds, because a fallback encoding is used when no other encoding is detected.

    • Method Detail

      • loadText

        public static String loadText(URL url,
                                      boolean followRedirects,
                                      int timeout,
                                      String fallbackEncoding,
                                      String[] encoding,
                                      LoadText.EncodingDetector... detectors)
                               throws IOException
        Loads the contents of the specified text file.
        Parameters:
        url - the location of the text file.
        followRedirects - if true, follow redirections ("301: Moved Permanently", "302: Temporary Redirect"), including the very common http to https ones. Has no effect unless url is an http/https URL.
        timeout - the connect and read timeout, in milliseconds; the same value is used for both. 0 means an infinite timeout; a negative value means the default timeout.
        fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is determined automatically.
        encoding - an output array: the encoding actually used to load the text is copied into it. May be null.
        detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";
        Returns:
        the contents of the text file
        Throws:
        IOException - if there is an I/O problem
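
        The timeout convention described above (0 for infinite, negative for the default) can be sketched with the standard java.net API. This is only an illustration under stated assumptions, not the LoadText implementation: the class TimeoutSketch and its open method are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of LoadText.
final class TimeoutSketch {
    // Opens (but does not connect) a connection configured the way the
    // followRedirects and timeout parameters are described.
    static URLConnection open(URL url, boolean followRedirects, int timeout)
            throws IOException {
        URLConnection connection = url.openConnection();
        if (timeout >= 0) {
            // One value for both timeouts; 0 means "wait forever".
            connection.setConnectTimeout(timeout);
            connection.setReadTimeout(timeout);
        }
        // A negative timeout leaves the connection's defaults untouched.
        if (connection instanceof HttpURLConnection) {
            // Redirections only make sense for http/https URLs.
            ((HttpURLConnection) connection)
                .setInstanceFollowRedirects(followRedirects);
        }
        return connection;
    }
}
```

        Note that java.net.HttpURLConnection does not by itself follow a redirection that changes the protocol, so following http to https redirections presumably requires handling the Location header explicitly; that part is not shown here.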
      • loadText

        public static String loadText(InputStream in,
                                      String fallbackEncoding,
                                      String[] encoding,
                                      LoadText.EncodingDetector... detectors)
                               throws IOException
        Loads the contents of the specified text source.

        This method detects the encoding of the text source.

        Note that encoding detection always succeeds, because a fallback encoding is used when no other encoding is detected.

        Parameters:
        in - the text source.
        fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is determined automatically.
        encoding - an output array: the encoding actually used to load the text is copied into it. May be null.
        detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";
        Returns:
        the contents of the text source
        Throws:
        IOException - if there is an I/O problem
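
        The order of precedence described above (BOM first, then the detectors, then the fallback) can be sketched as follows. This is a simplified illustration, not the LoadText implementation. The class FallbackSketch and its Detector interface are hypothetical stand-ins, only the UTF-8 BOM is recognized, and storing the result in encoding[0] is an assumption about how the output array is used.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not the LoadText implementation.
final class FallbackSketch {
    // Stand-in for an encoding detector: returns an encoding name or null.
    interface Detector {
        String detect(byte[] head);
    }

    static String decode(byte[] bytes, String fallbackEncoding,
                         String[] encoding, Detector... detectors) {
        String name = null;
        int skip = 0;
        // 1. A BOM takes precedence (only the UTF-8 BOM is handled here).
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            name = "UTF-8";
            skip = 3;
        }
        // 2. Without a BOM, ask each detector in turn.
        for (int i = 0; name == null && i < detectors.length; i++) {
            name = detectors[i].detect(bytes);
        }
        // 3. Otherwise use the fallback, so a result is always available.
        if (name == null) {
            name = (fallbackEncoding != null)
                ? fallbackEncoding
                : Charset.defaultCharset().name();
        }
        if (encoding != null && encoding.length > 0) {
            encoding[0] = name; // assumption: first element receives the name
        }
        // Charset.forName throws if a detector returned an unsupported name.
        return new String(bytes, skip, bytes.length - skip,
                          Charset.forName(name));
    }
}
```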
      • loadChars

        public static String loadChars(Reader in)
                                throws IOException
        Loads the characters contained in the specified source.
        Parameters:
        in - the character source
        Returns:
        the contents of the character source
        Throws:
        IOException - if there is an I/O problem
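
        For readers unfamiliar with the pattern, draining a Reader into a String typically looks like the sketch below. LoadCharsSketch and readAll are hypothetical names, and whether loadChars closes its argument is not stated here; this sketch leaves the reader open.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch, not the LoadText implementation.
final class LoadCharsSketch {
    // Reads every remaining character; does not close the reader.
    static String readAll(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[8192];
        int count;
        while ((count = in.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }
}
```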
      • createReader

        public static Reader createReader(InputStream in,
                                          String fallbackEncoding,
                                          String[] encoding,
                                          LoadText.EncodingDetector... detectors)
                                   throws IOException
        Creates a reader which can be used to read the contents of the specified text source.
        Parameters:
        in - the text source.
        fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is determined automatically.
        encoding - an output array: the encoding actually used to load the text is copied into it. May be null.
        detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";
        Returns:
        a reader for the contents of the text source. This reader automatically skips the BOM, if any.
        Throws:
        IOException - if there is an I/O problem
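
        One common way to obtain a reader that skips the BOM is to peek at the first bytes through a java.io.PushbackInputStream and push back whatever is not part of a BOM. The sketch below illustrates that technique only; it is not the createReader implementation. BomReaderSketch is a hypothetical name, the UTF-32 BOMs are deliberately ignored, and no encoding detectors are involved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the LoadText implementation.
final class BomReaderSketch {
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int count = 0;
        // Read up to 3 bytes; a short stream yields fewer.
        while (count < head.length) {
            int n = pushback.read(head, count, head.length - count);
            if (n < 0) {
                break;
            }
            count += n;
        }
        int b0 = count > 0 ? head[0] & 0xFF : -1;
        int b1 = count > 1 ? head[1] & 0xFF : -1;
        int b2 = count > 2 ? head[2] & 0xFF : -1;

        Charset charset = fallback;
        int bomLength = 0;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (b0 == 0xFE && b1 == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            charset = StandardCharsets.UTF_16LE; // UTF-32LE is not considered
            bomLength = 2;
        }
        // Give back every byte that is not part of the BOM.
        if (count > bomLength) {
            pushback.unread(head, bomLength, count - bomLength);
        }
        return new InputStreamReader(pushback, charset);
    }
}
```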
      • checkEncoding

        public static final String checkEncoding(String encoding)
        Returns the canonical name of the specified encoding if it is valid; null otherwise.
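
        A comparable check can be written with java.nio.charset.Charset, whose name() method returns the canonical name of a charset. This is a sketch of the idea, assuming "valid" means "supported by the Java runtime"; it is not necessarily how checkEncoding is implemented, and CheckEncodingSketch is a hypothetical name.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not the LoadText implementation.
final class CheckEncodingSketch {
    // Returns the canonical charset name, or null if the name is
    // null, syntactically illegal, or not supported by this JVM.
    static String canonicalName(String encoding) {
        try {
            return Charset.forName(encoding).name();
        } catch (IllegalArgumentException e) {
            // Also covers IllegalCharsetNameException and
            // UnsupportedCharsetException, which are subclasses.
            return null;
        }
    }
}
```

        With this sketch, an alias such as "utf8" maps to "UTF-8", while an unknown name yields null.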
      • detectEncoding

        public static String detectEncoding(byte[] bytes,
                                            int byteCount,
                                            int[] bomLength,
                                            LoadText.EncodingDetector... detectors)
        Detects the encoding by examining the specified bytes, which have been read at the very start of a text file.
        Parameters:
        bytes - bytes read at the beginning of a text file.
        byteCount - number of bytes read at the beginning of a text file.
        bomLength - the length of the BOM is stored as the first element of this array, which allows the caller to skip the BOM. May be null.
        detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";
        Returns:
        encoding if detected; null otherwise
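
        To make the detector idea concrete, the sketch below looks for a leading @charset rule in the first bytes of a stylesheet. It is a standalone function, not an implementation of LoadText.EncodingDetector, whose methods are not shown in this section. CharsetRuleSketch is a hypothetical name, and the pattern is slightly more lenient about whitespace than the CSS specification.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the LoadText implementation.
final class CharsetRuleSketch {
    // Matches: @charset "NAME"; at the very start of the text.
    private static final Pattern CHARSET_RULE =
        Pattern.compile("^@charset\\s+\"([^\"]+)\"\\s*;");

    // Returns the encoding named by a leading @charset rule, or null.
    static String detect(byte[] bytes, int byteCount) {
        // ISO-8859-1 maps each byte to one char, so this decoding
        // step cannot fail whatever the real encoding is.
        String head = new String(bytes, 0, byteCount,
                                 StandardCharsets.ISO_8859_1);
        Matcher matcher = CHARSET_RULE.matcher(head);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```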
      • guessEncoding

        public static LoadText.Encoding guessEncoding(byte[] bytes,
                                                      int offset,
                                                      int length)
        Guesses the encoding of a text file by examining its first few bytes.
        Parameters:
        bytes - byte buffer.
        offset - byte buffer offset.
        length - byte buffer length. Must be at least 4 for this method to work.
        Returns:
        encoding if detected; Encoding.UNKNOWN otherwise
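
        The 4-byte minimum is consistent with the classic heuristic of inspecting the first four bytes for a BOM or for the zero-byte pattern that an ASCII first character produces in UTF-16 and UTF-32. The sketch below shows that heuristic; whether guessEncoding works exactly this way is an assumption. GuessSketch and its Guess enum are hypothetical (the members of LoadText.Encoding other than UNKNOWN are not listed in this section), and the caller is assumed to pass offset + length <= bytes.length.

```java
// Hypothetical sketch, not the LoadText implementation.
final class GuessSketch {
    enum Guess { UTF_8, UTF_16BE, UTF_16LE, UTF_32BE, UTF_32LE, UNKNOWN }

    static Guess guess(byte[] bytes, int offset, int length) {
        if (length < 4) {
            return Guess.UNKNOWN; // not enough bytes to decide
        }
        int b0 = bytes[offset] & 0xFF;
        int b1 = bytes[offset + 1] & 0xFF;
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;

        // Byte order marks. UTF-32 is tested before UTF-16 because
        // FF FE 00 00 also starts with the UTF-16LE mark FF FE.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return Guess.UTF_32BE;
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return Guess.UTF_32LE;
        if (b0 == 0xFE && b1 == 0xFF) return Guess.UTF_16BE;
        if (b0 == 0xFF && b1 == 0xFE) return Guess.UTF_16LE;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return Guess.UTF_8;

        // No BOM: an ASCII first character leaves a telltale
        // pattern of zero bytes in the wider encodings.
        if (b0 == 0 && b1 == 0 && b2 == 0 && b3 != 0) return Guess.UTF_32BE;
        if (b0 != 0 && b1 == 0 && b2 == 0 && b3 == 0) return Guess.UTF_32LE;
        if (b0 == 0 && b1 != 0 && b2 == 0 && b3 != 0) return Guess.UTF_16BE;
        if (b0 != 0 && b1 == 0 && b2 != 0 && b3 == 0) return Guess.UTF_16LE;
        return Guess.UNKNOWN;
    }
}
```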