public final class LoadText extends Object
@charset
or no special encoding specification.
Unlike FileUtil.loadString(java.io.File) and URLUtil.loadString(java.net.URL), this utility class implements the detection of the encoding.
Note that the detection of the encoding always succeeds because it uses a fallback value.
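
The reason detection cannot fail can be sketched with the JDK alone. This is only an illustration, not the LoadText implementation: the class FallbackSketch and its resolve method are hypothetical names, and using Charset.defaultCharset() as the last resort is an assumption standing in for whatever default LoadText actually picks.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of LoadText.
final class FallbackSketch {
    private FallbackSketch() {}

    // Returns a usable charset: the detected one if it is valid,
    // else the fallback if it is valid, else the platform default.
    // Because the last step never fails, a charset is always returned.
    static Charset resolve(String detected, String fallback) {
        Charset cs = lookup(detected);
        if (cs == null) {
            cs = lookup(fallback);
        }
        return (cs != null) ? cs : Charset.defaultCharset();
    }

    // Returns null instead of throwing when the name is missing,
    // syntactically illegal, or not supported by this JVM.
    private static Charset lookup(String name) {
        if (name == null) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
    }
}
```

For instance, FallbackSketch.resolve("no-such-charset", "ISO-8859-1") falls through to the ISO-8859-1 charset, while resolve(null, null) ends at the platform default.
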
Modifier and Type | Class and Description |
---|---|
static class | LoadText.EmacsStyleDetector: Detects an encoding by parsing -*- coding: ENCODING -*-. |
static class | LoadText.Encoding: Encoding returned by guessEncoding(byte[], int, int). |
static interface | LoadText.EncodingDetector: Detects an encoding by parsing an ASCII encoding specification (example: @charset "UTF-8";). |
static class | LoadText.EncodingDetectorBase: A base class which checks the validity of the encoding returned by LoadText.EncodingDetectorBase.doDetectEncoding(java.lang.String). |
static class | LoadText.HTMLCharsetDetector: Detects an encoding by parsing <meta charset="ENCODING"> or <meta http-equiv="Content-Type" content="text/html; charset=ENCODING">. |
static class | LoadText.KeywordBasedDetector: Detects an encoding by parsing KEYWORD "ENCODING";, for example @charset "ENCODING";. |
static class | LoadText.XMLEncodingDetector: Detects an encoding by parsing <?xml encoding="ENCODING"?>. |
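
The kinds of ASCII encoding specifications listed above can be recognized with regular expressions. The following is a rough sketch under stated assumptions, not the code of the LoadText detectors: DetectorSketch and its three methods are hypothetical names, the patterns are deliberately simple, and the returned name is not checked for validity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real LoadText detectors may work differently.
final class DetectorSketch {
    private DetectorSketch() {}

    // Characters accepted in an encoding name by this sketch.
    private static final String NAME = "([A-Za-z0-9._:-]+)";

    // Matches KEYWORD "ENCODING"; for a keyword such as @charset.
    static String detectKeyword(String keyword, String head) {
        Pattern p = Pattern.compile(
            Pattern.quote(keyword) + "\\s+\"" + NAME + "\"\\s*;");
        Matcher m = p.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // Matches the encoding pseudo-attribute of an XML declaration.
    static String detectXml(String head) {
        Pattern p = Pattern.compile(
            "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']" + NAME + "[\"']");
        Matcher m = p.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // Matches the Emacs-style -*- coding: ENCODING -*- marker.
    static String detectEmacs(String head) {
        Pattern p = Pattern.compile(
            "-\\*-.*?\\bcoding:\\s*" + NAME + ".*?-\\*-");
        Matcher m = p.matcher(head);
        return m.find() ? m.group(1) : null;
    }
}
```

Each method takes the first lines of the file, already decoded as ASCII, and returns null when no specification is found, so several detectors can be tried in turn.
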
Modifier and Type | Field and Description |
---|---|
static LoadText.EncodingDetector[] | ALL_ENCODING_DETECTORS: A ready-to-use array containing all LoadText.EncodingDetectors. |
static byte[] | BOM_UTF16_BE: The UTF-16BE BOM (Byte Order Mark). |
static byte[] | BOM_UTF16_LE: The UTF-16LE BOM (Byte Order Mark). |
static byte[] | BOM_UTF8: The UTF-8 BOM (Byte Order Mark). |
static LoadText.KeywordBasedDetector | CSS_CHARSET_DETECTOR: A ready-to-use instance of KeywordBasedDetector("@charset") (CSS stylesheets). |
static LoadText.EmacsStyleDetector | EMACS_STYLE_DETECTOR: A ready-to-use instance of LoadText.EmacsStyleDetector. |
static LoadText.HTMLCharsetDetector | HTML_CHARSET_DETECTOR: A ready-to-use instance of LoadText.HTMLCharsetDetector. |
static LoadText.XMLEncodingDetector | XML_ENCODING_DETECTOR: A ready-to-use instance of LoadText.XMLEncodingDetector. |
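
For reference, the three byte order marks have fixed, standard byte values. The sketch below spells them out and shows one way to test for them. BomSketch, startsWith and detectBom are hypothetical names; the constants reuse the field names above on the assumption that the LoadText fields hold the same standard bytes. UTF-32 marks are ignored here.

```java
// Hypothetical sketch of BOM matching; not the LoadText code.
final class BomSketch {
    private BomSketch() {}

    // Standard Unicode byte order marks.
    static final byte[] BOM_UTF8 = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
    static final byte[] BOM_UTF16_BE = { (byte) 0xFE, (byte) 0xFF };
    static final byte[] BOM_UTF16_LE = { (byte) 0xFF, (byte) 0xFE };

    // True if the first byteCount bytes begin with the given BOM.
    static boolean startsWith(byte[] bytes, int byteCount, byte[] bom) {
        int available = Math.min(byteCount, bytes.length);
        if (available < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if (bytes[i] != bom[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the encoding implied by a leading BOM, or null if none.
    // When bomLength is non-null, its first element receives the BOM
    // length (0 if no BOM), so that the caller can skip the mark.
    static String detectBom(byte[] bytes, int byteCount, int[] bomLength) {
        String name = null;
        int length = 0;
        if (startsWith(bytes, byteCount, BOM_UTF8)) {
            name = "UTF-8";
            length = BOM_UTF8.length;
        } else if (startsWith(bytes, byteCount, BOM_UTF16_BE)) {
            name = "UTF-16BE";
            length = BOM_UTF16_BE.length;
        } else if (startsWith(bytes, byteCount, BOM_UTF16_LE)) {
            name = "UTF-16LE";
            length = BOM_UTF16_LE.length;
        }
        if (bomLength != null && bomLength.length > 0) {
            bomLength[0] = length;
        }
        return name;
    }
}
```
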
Modifier and Type | Method and Description |
---|---|
static String | checkEncoding(String encoding): Returns the canonical name of encoding if valid; null otherwise. |
static Reader | createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Creates a reader which can be used to read the contents of the specified text source. |
static String | detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors): Detects the encoding by examining the specified bytes, which have been read at the very start of a text file. |
static LoadText.Encoding | guessEncoding(byte[] bytes, int offset, int length): Guesses the encoding of a text file by examining its first few bytes. |
static String | loadChars(Reader in): Loads the characters contained in the specified source. |
static String | loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) |
static String | loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Loads the contents of the specified text source. |
static String | loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Loads the contents of the specified text file. |
static String | loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) |
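
How a reader of this kind can be built on top of a raw byte stream is sketched below with JDK classes only. It is a simplified stand-in, not the LoadText code: ReaderSketch is a hypothetical class, it recognizes only the UTF-8 BOM, it takes a Charset rather than an encoding name as the fallback, and it runs no encoding detectors.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: BOM check, fallback, then decode.
final class ReaderSketch {
    private ReaderSketch() {}

    // Returns a reader positioned after the UTF-8 BOM, if there is one.
    // The name of the charset actually used is copied to encoding[0]
    // when encoding is non-null.
    static Reader createReader(InputStream in, Charset fallback,
                               String[] encoding) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = readHead(pin, head);

        Charset cs;
        if (n == 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF) {
            cs = StandardCharsets.UTF_8; // BOM consumed, not pushed back
        } else {
            cs = (fallback != null) ? fallback : Charset.defaultCharset();
            if (n > 0) {
                pin.unread(head, 0, n); // give the text bytes back
            }
        }
        if (encoding != null && encoding.length > 0) {
            encoding[0] = cs.name();
        }
        return new InputStreamReader(pin, cs);
    }

    // Reads up to buf.length bytes; returns the number actually read.
    private static int readHead(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }

    // Drains a reader into a String.
    static String loadChars(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int r;
        while ((r = in.read(buf)) >= 0) {
            sb.append(buf, 0, r);
        }
        return sb.toString();
    }
}
```

Pushing the non-BOM bytes back onto the stream keeps the reader's view of the text intact, which is why a PushbackInputStream is used here rather than a plain InputStream.
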
public static final LoadText.XMLEncodingDetector XML_ENCODING_DETECTOR
A ready-to-use instance of LoadText.XMLEncodingDetector.

public static final LoadText.KeywordBasedDetector CSS_CHARSET_DETECTOR
A ready-to-use instance of KeywordBasedDetector("@charset") (CSS stylesheets).

public static final LoadText.HTMLCharsetDetector HTML_CHARSET_DETECTOR
A ready-to-use instance of LoadText.HTMLCharsetDetector.

public static final LoadText.EmacsStyleDetector EMACS_STYLE_DETECTOR
A ready-to-use instance of LoadText.EmacsStyleDetector.

public static final LoadText.EncodingDetector[] ALL_ENCODING_DETECTORS
A ready-to-use array containing all LoadText.EncodingDetectors.

public static final byte[] BOM_UTF16_BE
The UTF-16BE BOM (Byte Order Mark).

public static final byte[] BOM_UTF16_LE
The UTF-16LE BOM (Byte Order Mark).

public static final byte[] BOM_UTF8
The UTF-8 BOM (Byte Order Mark).

public static String loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Throws:
IOException

public static String loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Throws:
IOException

public static String loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Loads the contents of the specified text file.
Parameters:
url - the location of the text file.
followRedirects - if true, follow redirections ("301: Moved Permanently", "302: Temporary Redirect"), including the very common http to https ones. No effect unless url is an http/https URL.
timeout - specifies both the connect and read timeout values, in milliseconds. 0 means an infinite timeout. A negative value means the default value.
fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is automatically determined.
encoding - the encoding actually used to load the text is copied there. May be null.
detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like @charset "UTF-8";.
Throws:
IOException - if there is an I/O problem

public static String loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Loads the contents of the specified text source.
This method implements the detection of the encoding.
Note that the detection of the encoding always works because it uses a fallback value.
Parameters:
in - the text source.
fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is automatically determined.
encoding - the encoding actually used to load the text is copied there. May be null.
detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like @charset "UTF-8";.
Throws:
IOException - if there is an I/O problem

public static String loadChars(Reader in) throws IOException
Loads the characters contained in the specified source.
Parameters:
in - the character source
Throws:
IOException - if there is an I/O problem

public static Reader createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Creates a reader which can be used to read the contents of the specified text source.
Parameters:
in - the text source.
fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is automatically determined.
encoding - the encoding actually used to load the text is copied there. May be null.
detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like @charset "UTF-8";.
Throws:
IOException - if there is an I/O problem

public static final String checkEncoding(String encoding)
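Returns the canonical name of encoding if valid; null otherwise.

public static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors)
Detects the encoding by examining the specified bytes, which have been read at the very start of a text file.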
Parameters:
bytes - bytes read at the beginning of a text file.
byteCount - number of bytes read at the beginning of a text file.
bomLength - the length of the BOM is stored as the first element of this array. This makes it possible to skip the BOM. May be null.
detectors - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like @charset "UTF-8";.
Returns:
the detected encoding, if any; null otherwise

public static LoadText.Encoding guessEncoding(byte[] bytes, int offset, int length)
Guesses the encoding of a text file by examining its first few bytes.
Parameters:
bytes - byte buffer.
offset - byte buffer offset.
length - byte buffer length. Must be at least 4 for this function to work.
Returns:
the guessed encoding, if any; Encoding.UNKNOWN otherwise
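
One plausible way to guess an encoding from four bytes is shown below, purely as an illustration. The actual rules used by guessEncoding are not described here, so everything in this sketch is an assumption: the GuessSketch class, its Guess enum (a stand-in for LoadText.Encoding, whose constants other than UNKNOWN are not listed in this page), and the heuristic itself, which checks for a BOM and then for the zero-byte pattern that ASCII text produces in UTF-16.

```java
// Hypothetical sketch; not the LoadText.guessEncoding algorithm.
final class GuessSketch {
    private GuessSketch() {}

    // Stand-in for LoadText.Encoding; constant names are assumptions.
    enum Guess { UTF8, UTF16_BE, UTF16_LE, UNKNOWN }

    static Guess guess(byte[] bytes, int offset, int length) {
        // Four bytes are needed to look at two UTF-16 code units.
        if (bytes == null || offset < 0 || length < 4
                || length > bytes.length - offset) {
            return Guess.UNKNOWN;
        }
        int b0 = bytes[offset] & 0xFF;
        int b1 = bytes[offset + 1] & 0xFF;
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;

        // Byte order marks first.
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return Guess.UTF8;
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return Guess.UTF16_BE;
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return Guess.UTF16_LE;
        }
        // No BOM: ASCII characters encoded as UTF-16 alternate
        // zero and non-zero bytes.
        if (b0 == 0 && b1 != 0 && b2 == 0 && b3 != 0) {
            return Guess.UTF16_BE;
        }
        if (b0 != 0 && b1 == 0 && b2 != 0 && b3 == 0) {
            return Guess.UTF16_LE;
        }
        return Guess.UNKNOWN;
    }
}
```

Plain ASCII or UTF-8 text without a BOM yields UNKNOWN in this sketch, which is the case where a caller would go on to the encoding detectors or the fallback encoding.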