Java And UTF-8

When running a Java application,
make sure to pass -Dfile.encoding=UTF-8 to the JVM,
so that UTF-8 becomes its default charset.
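
To confirm that the flag took effect, the JVM can report its default charset at startup. The following is a minimal sketch; the class name DefaultCharsetCheck and its helper methods are made up for illustration. As far as I know, JDK 18 and later already default to UTF-8, so the flag mostly matters on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class DefaultCharsetCheck {

    // The charset the JVM falls back to when none is given explicitly.
    static Charset defaultCharset() {
        return Charset.defaultCharset();
    }

    // True when that fallback charset is UTF-8.
    static boolean isUtf8Default() {
        return StandardCharsets.UTF_8.equals(defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharset());
        if (!isUtf8Default()) {
            System.err.println("Warning: default charset is not UTF-8");
        }
    }
}
```

Running it once with and once without the flag shows whether the wrapper script really passes the option through.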

Interaction with standard input/output
also requires the console’s code page to be set to UTF-8.

On Windows, run chcp 65001 before you start java.

On Unix/Linux, run export LC_ALL="en_US.UTF-8".
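
The code page only covers the console side. On the Java side, standard output can be wrapped in a PrintStream that always encodes as UTF-8, regardless of the default charset. This is a minimal sketch; Utf8Console and utf8Stream are hypothetical names used only for this example.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of any library.
final class Utf8Console {

    // Wraps any byte stream in an auto-flushing PrintStream that encodes as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Non-ASCII characters written as escapes, so the source file encoding does not matter.
        out.println("h\u00e9llo, w\u00f6rld");
    }
}
```

If the console is not in UTF-8 mode, the bytes are still correct, but they will be displayed as garbage.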


Here is an example of a Windows batch file named apktool.cmd
that wraps a jar program named apktool.jar:

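@echo off
chcp 65001 2>nul >nul
rem Each JVM option must be its own argument; quoted together they reach java as one unknown option.
rem Newer JVMs (JDK 10 and later) no longer accept -d64, so drop it there.
call java.exe -d64 -Xverify:none -Duser.language=en -Dfile.encoding=UTF-8 -jar "%~dp0apktool.jar" %*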

And here is why it is important to specify the output’s encoding: github.com/iBotPeaches/Apktool/issues/1595


With the default charset set, Reader classes are covered; for Writer classes, make sure to request UTF-8 explicitly.
Prefer OutputStreamWriter over FileWriter, which had no constructor accepting a charset before Java 11:

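// needs java.io.BufferedWriter, java.io.FileOutputStream, java.io.OutputStreamWriter
BufferedWriter out = new BufferedWriter(
    new OutputStreamWriter(
        new FileOutputStream(path),
        "UTF-8"  // charset given by name; StandardCharsets.UTF_8 also works on Java 7+
    )
);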
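
For completeness, here is the same idea as a self-contained round trip: write a file as UTF-8, then read it back with an explicitly UTF-8 reader. This is only a sketch; Utf8RoundTrip and its two methods are hypothetical names, and try-with-resources is used so the streams are closed even when an exception is thrown.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class Utf8RoundTrip {

    // Writes text to the file, always encoded as UTF-8.
    static void write(File file, String text) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    // Reads the first line of the file, always decoded as UTF-8.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("utf8-demo", ".txt");
        tmp.deleteOnExit();
        write(tmp, "na\u00efve \u2603");          // i with diaeresis, snowman
        System.out.println(readFirstLine(tmp));
    }
}
```

Passing StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException.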

Some swear by explicitly supplying a charset encoder/decoder to the writer/reader,
so here is that too:

 OutputStreamWriter char_output = new OutputStreamWriter(
     new FileOutputStream("some_output.utf8"),
     Charset.forName("UTF-8").newEncoder() 
 );

 InputStreamReader char_input = new InputStreamReader(
     new FileInputStream("some_input.utf8"),
     Charset.forName("UTF-8").newDecoder() 
 );

The encoder can also be configured to handle input errors, for example by reporting them, which is sometimes useful:

CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
encoder.onMalformedInput(CodingErrorAction.REPORT);
encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("jedis.txt"),encoder));
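
To see what the error actions actually do, it helps to feed a decoder some deliberately broken bytes. The sketch below uses the reading side because malformed input is easiest to produce there; StrictDecodeDemo and its decode method are hypothetical names, and the two-byte sample is just one example of invalid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class StrictDecodeDemo {

    // Decodes the bytes as UTF-8, applying the given action to malformed or unmappable input.
    static String decode(byte[] bytes, CodingErrorAction action) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] broken = {(byte) 0xC3, (byte) 0x28};

        // REPLACE substitutes the replacement character and carries on.
        System.out.println(decode(broken, CodingErrorAction.REPLACE));

        // REPORT refuses the input by throwing.
        try {
            decode(broken, CodingErrorAction.REPORT);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

REPORT is the better choice when corrupted data must not pass silently; REPLACE is friendlier for display purposes.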

Everything here is considered quite safe to use,
since UTF-8 is backward compatible with US-ASCII:
plain ASCII text encodes to exactly the same bytes,
and no byte-order mark (BOM) is needed, unlike UTF-16.
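
The difference is easy to see by dumping the bytes Java produces for a single ASCII letter. A small sketch follows; BomDemo and hex are hypothetical names. Note that Java's "UTF-16" charset writes a big-endian byte-order mark when encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class BomDemo {

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "A";
        System.out.println("US-ASCII: " + hex(text.getBytes(StandardCharsets.US_ASCII))); // 41
        System.out.println("UTF-8:    " + hex(text.getBytes(StandardCharsets.UTF_8)));    // 41
        System.out.println("UTF-16:   " + hex(text.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    }
}
```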

Enjoy!