Package com.xmlmind.util
Class LoadText
java.lang.Object
    com.xmlmind.util.LoadText
public final class LoadText extends Object
A utility class for loading a text file whose encoding must be detected: for example, a CSS stylesheet that starts with a BOM, one that begins with an @charset rule, or one with no encoding specification at all.
Unlike FileUtil.loadString(java.io.File) and URLUtil.loadString(java.net.URL), this utility class detects the encoding. Note that encoding detection always succeeds because it falls back to a default value.
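The detection order described above (BOM first, then a declaration, then a fallback) can be illustrated with a small, self-contained sketch for the CSS case. It uses only the JDK; the class name CssTextDecoder, the 1024-byte scan limit and the strict @charset pattern are assumptions made for illustration, and only the UTF-8 BOM is checked. It is not LoadText's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not LoadText's implementation: decodes a CSS stylesheet's
// bytes by trying a UTF-8 BOM, then an @charset rule, then a fallback charset.
final class CssTextDecoder {
    // An @charset rule at the very start of the text, e.g. @charset "UTF-8";
    private static final Pattern CHARSET_RULE =
            Pattern.compile("^@charset \"([^\"]+)\";");

    static String decode(byte[] bytes, Charset fallback) {
        // 1. A UTF-8 BOM (EF BB BF) wins; it is stripped from the result.
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB && (bytes[2] & 0xFF) == 0xBF) {
            return new String(bytes, 3, bytes.length - 3, StandardCharsets.UTF_8);
        }
        // 2. Otherwise look for @charset. ISO-8859-1 maps every byte to one char,
        //    which is enough to read an ASCII declaration.
        String head = new String(bytes, 0, Math.min(bytes.length, 1024),
                StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET_RULE.matcher(head);
        if (m.find()) {
            Charset declared = lookup(m.group(1));
            if (declared != null) {
                return new String(bytes, declared);
            }
        }
        // 3. Nothing usable was found: the fallback always applies.
        return new String(bytes, fallback);
    }

    // Returns the named charset, or null if the name is illegal or unsupported.
    private static Charset lookup(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return null;
        }
    }
}
```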
Nested Class Summary
- static class LoadText.EmacsStyleDetector: Detects an encoding by parsing -*- coding: ENCODING -*-.
- static class LoadText.Encoding: Encoding returned by guessEncoding(byte[], int, int).
- static interface LoadText.EncodingDetector: Detects an encoding by parsing an ASCII encoding specification (example: @charset "UTF-8";).
- static class LoadText.EncodingDetectorBase: A base class which checks the validity of the encoding returned by LoadText.EncodingDetectorBase.doDetectEncoding(java.lang.String).
- static class LoadText.HTMLCharsetDetector: Detects an encoding by parsing <meta charset="ENCODING"> or <meta http-equiv="Content-Type" content="text/html; charset=ENCODING">.
- static class LoadText.KeywordBasedDetector: Detects an encoding by parsing KEYWORD "ENCODING";, for example @charset "ENCODING";.
- static class LoadText.XMLEncodingDetector: Detects an encoding by parsing <?xml encoding="ENCODING"?>.
Field Summary
- static LoadText.EncodingDetector[] ALL_ENCODING_DETECTORS: A ready-to-use array containing all LoadText.EncodingDetectors.
- static byte[] BOM_UTF16_BE: The UTF-16BE BOM (Byte Order Mark).
- static byte[] BOM_UTF16_LE: The UTF-16LE BOM (Byte Order Mark).
- static byte[] BOM_UTF8: The UTF-8 BOM (Byte Order Mark).
- static LoadText.KeywordBasedDetector CSS_CHARSET_DETECTOR: A ready-to-use instance of KeywordBasedDetector("@charset") (CSS stylesheets).
- static LoadText.EmacsStyleDetector EMACS_STYLE_DETECTOR: A ready-to-use instance of LoadText.EmacsStyleDetector.
- static LoadText.HTMLCharsetDetector HTML_CHARSET_DETECTOR: A ready-to-use instance of LoadText.HTMLCharsetDetector.
- static LoadText.XMLEncodingDetector XML_ENCODING_DETECTOR: A ready-to-use instance of LoadText.XMLEncodingDetector.
Method Summary
- static String checkEncoding(String encoding): Returns the canonical name of encoding if valid; null otherwise.
- static Reader createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Creates a reader which can be used to read the contents of the specified text source.
- static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors): Detects the encoding by examining the specified bytes, which have been read at the very start of a text file.
- static LoadText.Encoding guessEncoding(byte[] bytes, int offset, int length): Guesses the encoding of a text file by examining its first few bytes.
- static String loadChars(Reader in): Loads the characters contained in the specified source.
- static String loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)
- static String loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Loads the contents of the specified text source.
- static String loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Loads the contents of the specified text file.
- static String loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)
Field Detail
XML_ENCODING_DETECTOR
public static final LoadText.XMLEncodingDetector XML_ENCODING_DETECTOR
A ready-to-use instance of LoadText.XMLEncodingDetector.
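A minimal sketch of the kind of declaration XML_ENCODING_DETECTOR looks for, written against the EncName production of the XML 1.0 specification (single or double quotes, a name starting with a letter). The class XmlDeclSniffer and its detect(String) method are hypothetical; the real detector's methods are not documented on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts NAME from <?xml ... encoding="NAME" ... ?>.
final class XmlDeclSniffer {
    // The declaration must open the text; \1 requires the same closing quote.
    private static final Pattern XML_DECL = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the declared encoding name, or null if there is none. */
    static String detect(String head) {
        Matcher m = XML_DECL.matcher(head);
        return m.find() ? m.group(2) : null;
    }
}
```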
CSS_CHARSET_DETECTOR
public static final LoadText.KeywordBasedDetector CSS_CHARSET_DETECTOR
A ready-to-use instance of KeywordBasedDetector("@charset") (CSS stylesheets).
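The KEYWORD "ENCODING"; pattern handled by KeywordBasedDetector can be sketched as plain string matching. KeywordSniffer is a hypothetical class. This sketch requires exactly one space between the keyword and the opening double quote, as the CSS specification does for @charset; whether LoadText is stricter or more lenient is not stated on this page.

```java
// Hypothetical sketch of a keyword-based detector for KEYWORD "ENCODING";
final class KeywordSniffer {
    private final String keyword;

    KeywordSniffer(String keyword) {
        this.keyword = keyword; // e.g. "@charset"
    }

    /** Returns ENCODING if the text starts with KEYWORD "ENCODING";, else null. */
    String detect(String head) {
        String prefix = keyword + " \"";                 // e.g. @charset "
        if (!head.startsWith(prefix)) {
            return null;
        }
        int close = head.indexOf("\";", prefix.length()); // closing quote + semicolon
        if (close <= prefix.length()) {
            return null;                                  // missing or empty name
        }
        return head.substring(prefix.length(), close);
    }
}
```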
HTML_CHARSET_DETECTOR
public static final LoadText.HTMLCharsetDetector HTML_CHARSET_DETECTOR
A ready-to-use instance of LoadText.HTMLCharsetDetector.
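Both <meta> forms listed for HTMLCharsetDetector can be caught by one case-insensitive regular expression, because in both the encoding name follows the text charset=. HtmlCharsetSniffer is hypothetical and deliberately simplistic: it does not really parse attributes, so it is only a sketch of the idea, not of LoadText's parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sketch: both <meta> forms contain "charset=NAME".
final class HtmlCharsetSniffer {
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset named by the first matching meta element, or null. */
    static String detect(String head) {
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }
}
```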
EMACS_STYLE_DETECTOR
public static final LoadText.EmacsStyleDetector EMACS_STYLE_DETECTOR
A ready-to-use instance of LoadText.EmacsStyleDetector.
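An Emacs-style file variable line such as -*- mode: css; coding: utf-8 -*- can be matched as sketched below. EmacsCodingSniffer is hypothetical, and limiting the search to the first two lines is an assumption borrowed from Python's PEP 263, not something this page states about EmacsStyleDetector.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: finds "coding: NAME" inside a -*- ... -*- line.
final class EmacsCodingSniffer {
    private static final Pattern EMACS_CODING = Pattern.compile(
            "-\\*-.*?\\bcoding:\\s*([A-Za-z0-9._-]+).*?-\\*-");

    /** Returns the encoding named on one of the first two lines, or null. */
    static String detect(String head) {
        String[] lines = head.split("\\R", 3); // \R matches any line break
        for (int i = 0; i < Math.min(2, lines.length); i++) {
            Matcher m = EMACS_CODING.matcher(lines[i]);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }
}
```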
ALL_ENCODING_DETECTORS
public static final LoadText.EncodingDetector[] ALL_ENCODING_DETECTORS
A ready-to-use array containing all LoadText.EncodingDetectors.
BOM_UTF16_BE
public static final byte[] BOM_UTF16_BE
The UTF-16BE BOM (Byte Order Mark).
BOM_UTF16_LE
public static final byte[] BOM_UTF16_LE
The UTF-16LE BOM (Byte Order Mark).
BOM_UTF8
public static final byte[] BOM_UTF8
The UTF-8 BOM (Byte Order Mark).
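Each of these BOM constants is simply the character U+FEFF encoded in the corresponding charset. The sketch below builds the three arrays that way instead of hard-coding the bytes, and adds a prefix test. Boms and startsWith are hypothetical names; the expected byte values are noted in the comments.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a BOM is U+FEFF encoded in the target charset.
final class Boms {
    static final byte[] BOM_UTF8 = "\uFEFF".getBytes(StandardCharsets.UTF_8);         // EF BB BF
    static final byte[] BOM_UTF16_BE = "\uFEFF".getBytes(StandardCharsets.UTF_16BE); // FE FF
    static final byte[] BOM_UTF16_LE = "\uFEFF".getBytes(StandardCharsets.UTF_16LE); // FF FE

    /** Returns true if the first byteCount bytes of bytes begin with bom. */
    static boolean startsWith(byte[] bytes, int byteCount, byte[] bom) {
        if (byteCount < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if (bytes[i] != bom[i]) {
                return false;
            }
        }
        return true;
    }
}
```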
Method Detail
loadText
public static String loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Throws:
- IOException
loadText
public static String loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Throws:
- IOException
loadText
public static String loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Loads the contents of the specified text file.
Parameters:
- url - the location of the text file.
- followRedirects - if true, follow redirects ("301 Moved Permanently", "302 Found"), including the very common http-to-https ones. Has no effect unless url is an http or https URL.
- timeout - both the connect and the read timeout, in milliseconds. 0 means an infinite timeout; a negative value means the default value.
- fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is determined automatically.
- encoding - the encoding actually used to load the text is copied into this array. May be null.
- detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";.
Returns:
- the contents of the text file.
Throws:
- IOException - if there is an I/O problem.
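The followRedirects and timeout parameters map naturally onto java.net.URLConnection. The helper below is a hypothetical sketch, not LoadText's code, and its 10-redirect limit is an arbitrary choice. It turns off HttpURLConnection's automatic redirects because the JDK does not follow a redirect that switches between http and https, which is precisely the common case mentioned above. Non-HTTP URLs (file:, jar:) are opened directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch of how followRedirects and timeout could be honored.
final class UrlOpener {
    private static final int MAX_REDIRECTS = 10; // arbitrary limit for this sketch

    static InputStream open(URL url, boolean followRedirects, int timeout)
            throws IOException {
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            URLConnection conn = url.openConnection();
            if (timeout >= 0) {                  // negative: keep the JDK defaults
                conn.setConnectTimeout(timeout); // 0 also means infinite in the JDK
                conn.setReadTimeout(timeout);
            }
            if (!(conn instanceof HttpURLConnection)) {
                return conn.getInputStream();    // file:, jar:, ... URLs
            }
            HttpURLConnection http = (HttpURLConnection) conn;
            // The JDK does not follow redirects that change the protocol
            // (http -> https), so automatic following is disabled and done by hand.
            http.setInstanceFollowRedirects(false);
            int status = http.getResponseCode();
            boolean redirect = status == HttpURLConnection.HTTP_MOVED_PERM
                    || status == HttpURLConnection.HTTP_MOVED_TEMP;
            if (!followRedirects || !redirect) {
                return http.getInputStream();
            }
            String location = http.getHeaderField("Location");
            http.disconnect();
            if (location == null) {
                throw new IOException("redirect without Location header: " + url);
            }
            url = new URL(url, location); // resolves relative Location values
        }
        throw new IOException("too many redirects");
    }
}
```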
loadText
public static String loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Loads the contents of the specified text source. This method detects the encoding.
Note that encoding detection always succeeds because it falls back to a default value.
Parameters:
- in - the text source.
- fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is determined automatically.
- encoding - the encoding actually used to load the text is copied into this array. May be null.
- detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";.
Returns:
- the contents of the text source.
Throws:
- IOException - if there is an I/O problem.
loadChars
public static String loadChars(Reader in) throws IOException
Loads the characters contained in the specified source.
Parameters:
- in - the character source.
Returns:
- the contents of the character source.
Throws:
- IOException - if there is an I/O problem.
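A minimal sketch of what loadChars amounts to: draining a Reader into a String through a char buffer. CharLoader is hypothetical. Whether LoadText.loadChars closes its argument is not documented on this page, so this sketch leaves closing to the caller. On Java 10 and later, Reader.transferTo(Writer) with a StringWriter achieves the same result.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch: reads every character until end of stream.
final class CharLoader {
    /** Returns all remaining characters of in; the caller closes in. */
    static String loadChars(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[8192];
        int count;
        while ((count = in.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }
}
```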
createReader
public static Reader createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Creates a reader which can be used to read the contents of the specified text source.
Parameters:
- in - the text source.
- fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is determined automatically.
- encoding - the encoding actually used to load the text is copied into this array. May be null.
- detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";.
Returns:
- a reader for reading the contents of the text source. This reader automatically skips the BOM, if any.
Throws:
- IOException - if there is an I/O problem.
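One way to build a reader that skips the BOM without buffering the whole source is a PushbackInputStream: peek at the first three bytes, then push back whatever is not part of a BOM. BomSkippingReaders is hypothetical; it ignores encoding detectors (scanning the first lines would need a larger pushback buffer), and storing the chosen name in encoding[0] is an assumption about the out-parameter convention. InputStream.readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: BOM or fallback only; no encoding detectors.
final class BomSkippingReaders {
    static Reader create(InputStream in, Charset fallback, String[] encoding)
            throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pin.readNBytes(head, 0, 3); // fewer than 3 at end of stream
        Charset cs = fallback;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        // Push back the bytes after the BOM so the reader never sees the BOM.
        pin.unread(head, bomLength, n - bomLength);
        if (encoding != null) {
            encoding[0] = cs.name();
        }
        return new InputStreamReader(pin, cs);
    }
}
```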
checkEncoding
public static final String checkEncoding(String encoding)
Returns the canonical name of encoding if valid; null otherwise.
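A plausible JDK-only reading of "canonical name" is java.nio.charset.Charset.name(), which returns a charset's canonical name whichever alias was used to look it up. The sketch below assumes that interpretation; EncodingCheck is a hypothetical class.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical sketch: canonical name of a supported charset, else null.
final class EncodingCheck {
    static String checkEncoding(String encoding) {
        if (encoding == null) {
            return null; // Charset.isSupported(null) would throw
        }
        try {
            return Charset.isSupported(encoding) ? Charset.forName(encoding).name() : null;
        } catch (IllegalCharsetNameException e) {
            return null; // syntactically illegal name, e.g. containing a space
        }
    }
}
```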
detectEncoding
public static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors)
Detects the encoding by examining the specified bytes, which have been read at the very start of a text file.
Parameters:
- bytes - bytes read at the beginning of a text file.
- byteCount - number of bytes read at the beginning of a text file.
- bomLength - the length of the BOM is stored as the first element of this array, which allows the caller to skip the BOM. May be null.
- detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification such as @charset "UTF-8";.
Returns:
- the encoding if detected; null otherwise.
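The following sketch shows the bomLength out-parameter and the rule that detectors are consulted only when no BOM is present. EncodingSniffing and its Detector interface are hypothetical stand-ins, since the methods of LoadText.EncodingDetector are not documented on this page.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a detectEncoding-style helper.
final class EncodingSniffing {
    // Stand-in for LoadText.EncodingDetector.
    interface Detector {
        String detect(String asciiHead); // declared encoding, or null
    }

    private static final byte[][] BOMS = {
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},
        {(byte) 0xFE, (byte) 0xFF},
        {(byte) 0xFF, (byte) 0xFE},
    };
    private static final String[] BOM_CHARSETS = {"UTF-8", "UTF-16BE", "UTF-16LE"};

    static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength,
                                 Detector... detectors) {
        if (bomLength != null) {
            bomLength[0] = 0;
        }
        // A BOM takes precedence: its length is reported and no detector runs.
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(bytes, byteCount, BOMS[i])) {
                if (bomLength != null) {
                    bomLength[0] = BOMS[i].length;
                }
                return BOM_CHARSETS[i];
            }
        }
        // Otherwise give each detector a one-char-per-byte view of the bytes.
        String head = new String(bytes, 0, byteCount, StandardCharsets.ISO_8859_1);
        for (Detector detector : detectors) {
            String found = detector.detect(head);
            if (found != null) {
                return found;
            }
        }
        return null; // no fallback at this level
    }

    private static boolean startsWith(byte[] bytes, int byteCount, byte[] prefix) {
        if (byteCount < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A detector can then be a simple lambda over the ASCII head, as in the checks that accompany this sketch.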
guessEncoding
public static LoadText.Encoding guessEncoding(byte[] bytes, int offset, int length)
Guesses the encoding of a text file by examining its first few bytes.
Parameters:
- bytes - byte buffer.
- offset - byte buffer offset.
- length - byte buffer length. Must be at least 4 for this function to work.
Returns:
- the encoding if detected; Encoding.UNKNOWN otherwise.
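With at least four bytes, a guess can follow the table in Appendix F of the XML 1.0 specification, which recognizes how "<?" or "<?xm" is encoded in each encoding family. Whether LoadText.guessEncoding uses exactly these heuristics is not stated, and the Guess enum below is hypothetical, since the constants of LoadText.Encoding other than UNKNOWN are not listed on this page.

```java
// Hypothetical sketch based on XML 1.0 Appendix F (no BOM present).
final class FirstBytesGuess {
    enum Guess { UCS4_BE, UCS4_LE, UTF16_BE, UTF16_LE, ASCII_COMPATIBLE, EBCDIC, UNKNOWN }

    static Guess guess(byte[] bytes, int offset, int length) {
        if (length < 4) {
            return Guess.UNKNOWN; // four bytes are needed, as documented above
        }
        int word = (bytes[offset] & 0xFF) << 24 | (bytes[offset + 1] & 0xFF) << 16
                | (bytes[offset + 2] & 0xFF) << 8 | (bytes[offset + 3] & 0xFF);
        switch (word) {
            case 0x0000003C: return Guess.UCS4_BE;          // "<" in UCS-4BE
            case 0x3C000000: return Guess.UCS4_LE;          // "<" in UCS-4LE
            case 0x003C003F: return Guess.UTF16_BE;         // "<?" in UTF-16BE
            case 0x3C003F00: return Guess.UTF16_LE;         // "<?" in UTF-16LE
            case 0x3C3F786D: return Guess.ASCII_COMPATIBLE; // "<?xm": UTF-8, ASCII, ...
            case 0x4C6FA794: return Guess.EBCDIC;           // "<?xm" in EBCDIC
            default:         return Guess.UNKNOWN;
        }
    }
}
```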