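Garbled InputStream output when reading a web page with a Socket
I'm reading a Chinese web page over a Socket, and everything it prints comes out garbled. I'd be grateful for any pointers from the experts passing by!
HTTP response headers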
HTTP/1.1 200 OK
Date: Thu, 07 Oct 2010 08:34:53 GMT
Server: Apache
...
Content-Language: zh-CN
Set-Cookie: _lang=zh_CN:GBK; Domain=.xxx.com; Path=/
Set-Cookie: _lang=zh_CN:GBK; Domain=.xxx.com; Path=/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 26499
Connection: close
Content-Type: text/html;charset=GBK
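Two of these headers decide how the body bytes have to be handled: Content-Encoding (here gzip) and the charset parameter of Content-Type (here GBK). Below is a minimal sketch of pulling both values out of header strings. The class and method names are hypothetical and not part of the original post, and it assumes a JDK that ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class ResponseHeaderHints {

    /** Returns the charset named in a Content-Type value, or the fallback if none is given. */
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                return Charset.forName(name); // throws if the JVM does not know the charset
            }
        }
        return fallback;
    }

    /** True when a Content-Encoding value says the body is gzip-compressed. */
    static boolean isGzip(String contentEncoding) {
        return contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip");
    }

    public static void main(String[] args) {
        // Values copied from the response headers above.
        Charset cs = charsetOf("text/html;charset=GBK", StandardCharsets.ISO_8859_1);
        System.out.println("charset = " + cs + ", gzip = " + isGzip("gzip"));
    }
}
```

With the headers above, the sketch reports a GBK charset and a gzip-compressed body, which means the raw bytes would need decompressing before any character decoding.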
Java code
public void getHtml(String host, String address) {
    connectSocket(host);
    try {
        pw = new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));
    } catch (IOException e) {
        System.out.println("socket.getOutputStream *** IOException ");
        e.printStackTrace();
        return; // no writer, so the request cannot be sent
    }
    // HTTP header lines end in CRLF; println() would use the platform line separator.
    pw.print("GET " + address + " HTTP/1.1\r\n");
    pw.print("Host: " + host + "\r\n");
    pw.print("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8\r\n");
    pw.print("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n");
    pw.print("Accept-Language: zh-cn,zh;q=0.5\r\n");
    // Asks for a compressed body; the response above came back with Content-Encoding: gzip.
    pw.print("Accept-Encoding: gzip,deflate\r\n");
    pw.print("Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7\r\n");
    pw.print("Keep-Alive: 300\r\n");
    pw.print("Connection: keep-alive\r\n");
    pw.print("Cookie: " + cookies + "\r\n");
    pw.print("\r\n"); // blank line ends the request headers
    pw.flush();
    try {
        br = new BufferedReader(new InputStreamReader(socket.getInputStream(), "GBK"));
    } catch (IOException e) {
        System.out.println("### socket.getInputStream --- IOException");
        e.printStackTrace();
        return;
    }
    String htmlContent = getHtmlContent(br);
    System.out.println(htmlContent);
}

public String getHtmlContent(BufferedReader br) {
    // Collect the lines so the method returns what it read instead of an empty string.
    StringBuilder htmlContent = new StringBuilder();
    String str;
    try {
        while ((str = br.readLine()) != null) {
            htmlContent.append(str).append('\n');
        }
    } catch (IOException e) {
        System.out.println("br.readLine ++ IOException");
        e.printStackTrace();
    }
    return htmlContent.toString();
}
--------------------Programming Q&A--------------------
public String(byte[] bytes, String charsetName) throws UnsupportedEncodingException
Constructs a new String by decoding the specified array of bytes using the specified charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array.
The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
Parameters:
bytes - the bytes to be decoded into characters
charsetName - the name of a supported charset
Throws:
UnsupportedEncodingException - if the named charset is not supported
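To make the quoted Javadoc concrete, here is a small sketch of decoding the same bytes with the right and the wrong charset, plus a strict variant built on CharsetDecoder that reports bad input instead of silently substituting characters. The class and method names are hypothetical, and the sample text is just the two characters of the word for "Chinese" written as Unicode escapes. It assumes the JVM supports GBK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class DecodeDemo {

    /** Strict decode: throws on malformed or unmappable bytes instead of replacing them. */
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587";          // two Chinese characters
        byte[] gbkBytes = original.getBytes(gbk);   // two bytes per character in GBK

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(new String(gbkBytes, "GBK").equals(original));

        // Decoding with another charset does not fail; it just produces different characters.
        System.out.println(new String(gbkBytes, "UTF-8").equals(original));

        // The strict decoder turns the same mistake into an exception.
        try {
            decodeStrict(gbkBytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

The String constructor is convenient when the charset is known to be right; the strict decoder is more useful while diagnosing, because a wrong guess fails loudly rather than printing garbage.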
--------------------Programming Q&A--------------------
Quoted from Wang Mingzhe
--------------------Programming Q&A--------------------
in = new BufferedReader(new InputStreamReader(socket.getInputStream(),"UTF-8"));
out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(),"UTF-8"));
It's an encoding problem; you should use UTF-8.
--------------------Programming Q&A--------------------
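One thing worth checking before changing the charset name: the response posted in the question declares charset=GBK and also Content-Encoding: gzip, so the garbling may come from decoding compressed bytes as text. The sketch below is one possible way to handle that, not a confirmed fix for the original program. It reads the header block as raw bytes, then decompresses the body and decodes it with the declared charset. All names are hypothetical, and it assumes the body is not chunked and that the JVM supports GBK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class GzipBodyReader {

    /** Reads bytes up to and including the blank line (CRLF CRLF) that ends the headers. */
    static String readHeaders(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            head.write(b);
            if (b == (matched % 2 == 0 ? '\r' : '\n')) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Decompresses a gzip body and decodes the result with the given charset. */
    static String readGzipBody(InputStream body, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new GZIPInputStream(body), charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /** Helper for the demo: gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        // Build a fake response in memory instead of opening a real socket.
        ByteArrayOutputStream resp = new ByteArrayOutputStream();
        resp.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1));
        resp.write(gzip("<html>\u4e2d\u6587</html>".getBytes(gbk)));

        InputStream in = new ByteArrayInputStream(resp.toByteArray());
        System.out.print(readHeaders(in));
        System.out.println(readGzipBody(in, gbk).length() + " characters decoded");
    }
}
```

With a real connection, the same stream object (for example a BufferedInputStream around socket.getInputStream()) would be passed to both methods. A simpler alternative is to leave out the Accept-Encoding request header so the server has no reason to compress the body, in which case the GBK reader from the question may be enough.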
Tags: Java, J2ME