How to extract URLs from a web page with VB
For example, when I search Baidu for “保险柜” (safe), the source of the results page contains: </td></tr></table>
<table border="0" cellpadding="0" cellspacing="0" id="0"><tr><td class=f><a href="http://www.baidu.com/baidu.php?url=qhcK00KWM8-st9rMVn9TaDVjCd6HO4Pr7z9u1i2Kc6_4lCYKNtYP6aPWrk-QyAcurm4YoBphL0lo8HMQpaXeqd易做图Plb6Td8BdjiZrYPm4mZvJfuk-m9SvmXtlXg.DR_jvwwRsQDk83vRyXgZJyAp7WFYeqqp60.THvkCtON8xD0T1Yk0Z0qn0KW5H00UHYs0APzm1YYn1cvn60.UAsqnH010Atqnf" target="_blank"><font size="3">QNN Safe 全能牌<font color=#C60A00>保险柜</font> 全球著名品牌</font></a><br><font size=-1>QNN Safe全能牌<font color=#C60A00>保险柜</font>,全球著名<font color=#C60A00>保险柜</font>品牌,华南地区最大<font color=#C60A00>保险柜</font>制造商,始创于1939年,产品有家用<font color=#C60A00>保险柜</font>,商用<font color=#C60A00>保险柜</font>,酒店<font color=#C60A00>保险柜</font>,抢柜,智能<font color=#C60A00>保险柜</font>,等上百个品种,全国免费服务热线:400-830-4555. <br><font color=#008000>www.qnn.com.cn/ 1K 2009-06 </font> - <a href="http://e.baidu.com" target="_blank" class=m>推广
How can I get the address in a href="http://www.baidu.com/baidu.php?url=qhcK00KWM8-st9rMVn9TaDVjCd6HO4Pr7z9u1i2Kc6_4lCYKNtYP6aPWrk-QyAcurm4YoBphL0lo8HMQpaXeqd易做图Plb6Td8BdjiZrYPm4mZvJfuk-m9SvmXtlXg.DR_jvwwRsQDk83vRyXgZJyAp7WFYeqqp60.THvkCtON8xD0T1Yk0Z0qn0KW5H00UHYs0APzm1YYn1cvn60.UAsqnH010Atqnf"
and, at the same time, the URL in “font color=#008000>www.qnn.com.cn/ 1K 2009-06 </font> ”?
The key question is how to extract it, because I don't know which tag to use to identify it. For a href="http://www.baidu.com/baidu.php?url=qhcK00KWM8-st9rMVn9TaDVjCd6HO4Pr7z9u1i2Kc6_4lCYKNtYP6aPWrk-QyAcurm4YoBphL0lo8HMQpaXeqd易做图Plb6Td8BdjiZrYPm4mZvJfuk-m9SvmXtlXg.DR_jvwwRsQDk83vRyXgZJyAp7WFYeqqp60.THvkCtON8xD0T1Yk0Z0qn0KW5H00UHYs0APzm1YYn1cvn60.UAsqnH010Atqnf"
I can look for the “A” tag and then read its href;
but what do I use to extract “font color=#008000>www.qnn.com.cn/ 1K 2009-06 </font> ”?
--------------------Programming Q&A--------------------
Parsing the page with MSHTML also works quite nicely.
--------------------Programming Q&A--------------------
' Requires a project reference to Microsoft VBScript Regular Expressions 5.5
Function RegExpTest(patrn As String, strng As String) As String
    ' patrn: the regular expression to search for; strng: the string to search in
    Dim regEx As RegExp, Matches As MatchCollection, Match As Object
    Dim RetStr As String
    Set regEx = New RegExp                ' Create the regular expression object.
    regEx.Pattern = patrn                 ' Set the pattern.
    regEx.IgnoreCase = True               ' Ignore case when matching.
    regEx.Global = True                   ' Find every match, not just the first.
    Set Matches = regEx.Execute(strng)    ' Run the search.
    For Each Match In Matches             ' Walk the Matches collection.
        RetStr = RetStr & Match.Value & vbCrLf
    Next
    RegExpTest = RetStr                   ' One match per line.
End Function
Private Sub Command1_Click()
    Dim URLRegExp As String, MailRegExp As String, ChiniRegExp As String
    Dim sFile As String, fNum As Integer
    URLRegExp = "http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?" ' URL pattern
    MailRegExp = "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*" ' E-mail pattern (not used below)
    ChiniRegExp = "[^\x00-\xff]* " ' Run of characters outside \x00-\xff (e.g. Chinese) followed by a space (not used below)
    fNum = FreeFile                       ' Get a free file number instead of hard-coding #1.
    Open "c:\temp.html" For Binary As #fNum ' The saved page source.
    sFile = Space(LOF(fNum))              ' Size the buffer to the whole file.
    Get #fNum, , sFile
    Close #fNum
    Text1.Text = RegExpTest(URLRegExp, sFile) ' Show every URL found, one per line.
End Sub
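The code above lists every http:// URL, but it does not pick up the green display URL the question asks about. Here is a rough sketch of one way to get both at once with the same RegExp library, using capture groups (SubMatches). It assumes every result starts with <td class=f><a href="..."> and that the display URL is the first token inside the next <font color=#008000>, as in the snippet in the question. The function name ExtractResultPairs is made up for this example.

```vb
' Requires a project reference to Microsoft VBScript Regular Expressions 5.5
' Hypothetical helper: returns "link <Tab> display URL" for each search result.
Function ExtractResultPairs(html As String) As String
    Dim regEx As RegExp, Matches As MatchCollection, Match As Object
    Dim RetStr As String
    Set regEx = New RegExp
    ' Group 1: the href of the result link inside <td class=f>.
    ' Group 2: the first run of non-space text inside the following
    '          <font color=#008000> (assumed to be the display URL).
    regEx.Pattern = "<td\s+class=f>\s*<a\s+href=""([^""]+)""[^>]*>" & _
                    "[\s\S]*?<font\s+color=#008000>\s*([^\s<]+)"
    regEx.IgnoreCase = True
    regEx.Global = True
    Set Matches = regEx.Execute(html)
    For Each Match In Matches
        RetStr = RetStr & Match.SubMatches(0) & vbTab & _
                 Match.SubMatches(1) & vbCrLf
    Next
    ExtractResultPairs = RetStr
End Function
```

In Command1_Click you could then write Text1.Text = ExtractResultPairs(sFile). For the snippet in the question this should pair the baidu.php link with www.qnn.com.cn/. Note that if a result has no green font tag, the lazy [\s\S]*? will run on into the next result, so check the output against the real page.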
--------------------Programming Q&A--------------------
Learned something new, great post.
Tags: VB, network programming