Reading and writing UTF-8 text with a BOM in Java



	 

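	What is a BOM

	A BOM (byte-order mark) is a special marker placed at the start of a Unicode file encoded in UTF-8, UTF-16, or UTF-32 to identify the file's encoding. For UTF-16 and UTF-32 it also records the byte order (big-endian or little-endian). For UTF-8 the BOM is optional: UTF-8 is a byte-oriented encoding with no byte-order ambiguity, so the BOM serves only as a signature marking the file as UTF-8.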

	 

	BOM byte sequences at the start of a file:

	   00 00 FE FF    = UTF-32, big-endian
	   FF FE 00 00    = UTF-32, little-endian
	   EF BB BF       = UTF-8
	   FE FF          = UTF-16, big-endian
	   FF FE          = UTF-16, little-endian
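	Each sequence in the table is simply the character U+FEFF encoded in the corresponding encoding. The sketch below checks this with the JDK's built-in charsets; the class name BomBytes and the helpers encodeBom and toHex are hypothetical names chosen for illustration. It uses the explicit BE/LE charsets because Java's plain "UTF-16" encoder prepends a BOM of its own.

```java
import java.nio.charset.Charset;

public class BomBytes {
    // Encode U+FEFF in the given charset; the result is that charset's BOM.
    public static byte[] encodeBom(String charsetName) {
        return "\uFEFF".getBytes(Charset.forName(charsetName));
    }

    // Format bytes as space-separated upper-case hex, e.g. "EF BB BF".
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Explicit BE/LE variants: plain "UTF-16" would add an extra BOM.
        String[] names = { "UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE" };
        for (String name : names) {
            // e.g. "UTF-8: EF BB BF"
            System.out.println(name + ": " + toHex(encodeBom(name)));
        }
    }
}
```

	Note that the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE), so any detector must test the four-byte patterns before the two-byte ones.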

	 


	 

	Here is an example that handles the BOM of a UTF-8 file:

	 

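	String xmla = StringFileToolkit.file2String(new File("D:\\projects\\mailpost\\src\\a.xml"), "UTF-8");

	byte[] b = xmla.getBytes("UTF-8");

	// Skip the 3 BOM bytes (EF BB BF) only if they are actually present
	int offset = (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) ? 3 : 0;

	String xml = new String(b, offset, b.length - offset, "UTF-8");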

	 

	..............

	 

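	The idea: read the file as UTF-8, skip the first three bytes (EF BB BF, the UTF-8 encoding of the BOM), build a new string from the remaining bytes, and then parse it with Dom4j. With the BOM gone, the parser no longer reports an error.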
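	The round trip through bytes is needed only because Java's UTF-8 decoder does not drop the BOM: it survives decoding as the single character U+FEFF at index 0. A minimal sketch of the equivalent string-level fix follows; the class and method names (StripBom, stripBom) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class StripBom {
    // Remove a leading U+FEFF (a decoded BOM) if present; otherwise return s unchanged.
    public static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>' };
        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(decoded.length());  // 5: U+FEFF plus "<a/>"
        System.out.println(stripBom(decoded)); // <a/>
    }
}
```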

	 

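	Other encodings can be handled the same way. In fact, you could write a general utility that automatically detects the BOM, removes it, and returns the string.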
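	A minimal sketch of such a utility, assuming the whole file has already been read into a byte array (for example with Files.readAllBytes). The class name BomAwareDecoder and its decode method are hypothetical. It checks the BOMs from the table longest first and falls back to a caller-supplied charset when none matches. One inherent ambiguity: UTF-16LE content that begins with U+0000 looks exactly like a UTF-32LE BOM.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class BomAwareDecoder {
    // Known BOMs, longest first: UTF-32LE (FF FE 00 00) must be tested
    // before UTF-16LE (FF FE), otherwise it would be misdetected.
    private static final byte[][] BOMS = {
        { 0x00, 0x00, (byte) 0xFE, (byte) 0xFF },
        { (byte) 0xFF, (byte) 0xFE, 0x00, 0x00 },
        { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF },
        { (byte) 0xFE, (byte) 0xFF },
        { (byte) 0xFF, (byte) 0xFE },
    };
    // Charset for each entry of BOMS, in the same order.
    private static final String[] CHARSETS = { "UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE" };

    // True if data starts with the given prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Decode data with the charset its BOM indicates, excluding the BOM bytes;
    // if no BOM is found, decode everything with defaultCharset.
    public static String decode(byte[] data, Charset defaultCharset) {
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(data, BOMS[i])) {
                int n = BOMS[i].length;
                return new String(data, n, data.length - n, Charset.forName(CHARSETS[i]));
            }
        }
        return new String(data, defaultCharset);
    }
}
```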

	 

	That said, an expert has already solved this problem: http://koti.mbnet.fi/akini/java/unicodereader/

	 

	Java code

	Example code using the UnicodeReader class

	Here is an example method that reads a text file. It recognizes the BOM and skips it while reading.

	 

	// import: http://koti.mbnet.fi/akini/java/unicodereader/UnicodeReader.java.txt
	   public static char[] loadFile(String file) throws IOException {
	      // Read a text file, auto-recognizing the BOM; if no BOM is found,
	      // UnicodeReader uses the given default encoding (null = system default).
	      // try-with-resources closes the BufferedReader, which also closes
	      // the wrapped UnicodeReader and FileInputStream.
	      try (BufferedReader reader = new BufferedReader(
	               new UnicodeReader(new FileInputStream(file), null));
	           CharArrayWriter writer = new CharArrayWriter()) {
	         char[] buffer = new char[16 * 1024];   // 16K buffer
	         int read;
	         while ((read = reader.read(buffer)) != -1) {
	            writer.write(buffer, 0, read);
	         }
	         return writer.toCharArray();
	      }
	   }
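	UnicodeReader is a third-party class from the page linked above; the JDK itself ships no BOM-skipping reader. If only UTF-8 input is expected, a minimal JDK-only sketch is to peek at the first three bytes with a PushbackInputStream and push them back when they are not a BOM. The class and method names (Utf8BomReader, openUtf8) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8BomReader {
    // Wrap a stream in a UTF-8 Reader, skipping a leading UTF-8 BOM if present.
    public static Reader openUtf8(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) break;
            n += r;
        }
        boolean hasBom = n == 3
            && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return new InputStreamReader(pin, StandardCharsets.UTF_8);
    }
}
```

	Unlike UnicodeReader, this sketch does not recognize UTF-16 or UTF-32 BOMs; a file starting with one of those would be decoded as UTF-8 with the BOM bytes left in place.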

	 

	Java code

	Example code to write UTF-8 with a BOM

	Write the BOM bytes at the start of an empty file, and text editors that honor the BOM will choose the correct charset when reading it. Java's OutputStreamWriter does not write the UTF-8 BOM bytes by itself.

	 

	 

	   public static void saveFile(String file, String data, boolean append) throws IOException {
	      File f = new File(file);
	      // With append == false the stream truncates the file, so it is empty here
	      try (FileOutputStream fos = new FileOutputStream(f, append)) {
	         // Write the UTF-8 BOM only if the file is empty
	         if (f.length() < 1) {
	            final byte[] bom = new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
	            fos.write(bom);
	         }
	         // Closing the writer flushes it and also closes fos
	         try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"))) {
	            if (data != null) bw.write(data);
	         }
	      }
	   }
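	Instead of hand-coding the bytes EF BB BF, you can also write the character U+FEFF through the UTF-8 writer and let the encoder produce the BOM. A minimal sketch follows; the names BomWriter and writeUtf8WithBom are hypothetical, and unlike saveFile it writes the BOM unconditionally, leaving the "is the file empty?" decision to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomWriter {
    // Write text as UTF-8 preceded by a BOM. Writing the char U+FEFF makes
    // the UTF-8 encoder emit EF BB BF, so no byte array is needed.
    public static void writeUtf8WithBom(OutputStream out, String text) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write('\uFEFF');
        w.write(text);
        w.flush(); // flush, but leave closing the stream to the caller
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeUtf8WithBom(bos, "hi");
        // prints 5: 3 BOM bytes + 2 ASCII bytes
        System.out.println(bos.toByteArray().length);
    }
}
```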

	  

	 

	 

	A real-world application:

	Java code

	package com.dayo.gerber; 

	 

	import java.io.BufferedReader; 

	import java.io.BufferedWriter; 

	import java.io.File; 

	import java.io.FileInputStream; 

	import java.io.FileOutputStream; 

	import java.io.IOException; 

	import java.io.InputStream; 

	import java.io.InputStreamReader; 

	import java.io.OutputStreamWriter; 

	import java.io.Reader; 

	import java.util.Properties; 

	 

	/**

	 * 

	 * @author 刘飞(liufei)

	 * 

	 */ 

	public class Generate4YYQTPScript { 

	 

	    private static final String ENCODING = "UTF-8"; 

	    private static final String GERBER_CONFIG = "config/gerber4yy.properties"; 

	 

	    private static Properties GERBER_CONFIG_PROPS = null; 

	    private static final String GERBER_FORMAT_DIALOG_TITLE_SCRIPT = "{#GERBER_FORMAT_DIALOG_TITLE}"; 

	    private static String GERBER_FORMAT_DIALOG_TITLE = ""; 

	 

	    /* gerber properties parameter keys config */ 

	    private static final String QTP_SCRIPT_IN = "script.in"; 

	 

	    private static final String QTP_SCRIPT_OUT = "script.out"; 

	 

	    private static final String QTP_SYSTEM_PATH = "QTP.system.path"; 

	    private static final String QTP_SYSTEM_PATH_S
