C#: How to get a hyperlink from an HTML page by keyword

For example:
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default3.aspx.cs" Inherits="Default3" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title></title>
</head>
<body>
<div id="show">

<a href="www.baidu.com">百度</a>

</div>

</body>
</html>

Given the keyword 百度 (Baidu), I want to get www.baidu.com.

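-------------------- Programming Q&A --------------------
Not sure how to do this in a B/S (browser/server) setup. Here is one idea: walk the whole HTML, find the a tags, and store each tag's href together with its enclosed text (e.g. 百度) as a key-value pair. A regular expression can do this. I don't know whether reading the HTML would be any simpler in a B/S setup, though.
-------------------- Programming Q&A --------------------
Walk the HTML tags, use a regular expression to match the a tags, and extract the href attribute of the a tag whose text is 百度.
-------------------- Programming Q&A --------------------
Create a text file on drive C (the code reads C:\1.txt) with the following content: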



<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default3.aspx.cs" Inherits="Default3" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title></title>
</head>
<body>
<div id="show">
<a href="www.baidu.com">百度</a>
<a href="www.sina.com">新浪</a>
<a href="www.google.cn">谷歌</a>
<a href="www.souhu.com">搜狐</a>
</div>
</body>
</html>



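            // Requires: using System.Collections.Generic; using System.IO;
            //           using System.Text; using System.Text.RegularExpressions;
            Dictionary<string, string> dicstr = new Dictionary<string, string>();
            string strfromtxt = File.ReadAllText(@"C:\1.txt", Encoding.GetEncoding("GB2312"));
            // Capture the href value and the link text of every <a href="...">...</a>
            string res = @"(?is)<a\s+href=""(?<href>[^""]*)""\s*>(?<value>.*?)</a>";
            MatchCollection matches = Regex.Matches(strfromtxt, res);
            foreach (Match match in matches)
            {
                // Key: link text, value: href. The indexer avoids the exception Add throws on duplicate link text.
                dicstr[match.Groups["value"].Value.Trim()] = match.Groups["href"].Value.Trim(); // the results end up in dicstr
            }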
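
To get from the dictionary to the single address the question asks for, a lookup by link text is enough. Below is a minimal sketch, assuming the page is already loaded into a string; the class and method names (`LinkMap`, `Build`, `Lookup`) are made up for illustration, and the pattern is a slightly more tolerant variant that also accepts other attributes around href.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkMap
{
    // Matches <a ... href="..." ...>text</a>; other attributes may surround href.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*""(?<href>[^""]*)""[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Builds a link-text -> href map; a later duplicate text overwrites an earlier one.
    public static Dictionary<string, string> Build(string html)
    {
        var map = new Dictionary<string, string>();
        foreach (Match m in Anchor.Matches(html))
        {
            map[m.Groups["text"].Value.Trim()] = m.Groups["href"].Value.Trim();
        }
        return map;
    }

    // Returns the href for the given link text, or null when no link has that text.
    public static string Lookup(string html, string keyword)
    {
        string href;
        return Build(html).TryGetValue(keyword, out href) ? href : null;
    }
}
```

Calling `LinkMap.Lookup(strfromtxt, "百度")` on the sample file should then give www.baidu.com. This only handles double-quoted href values and plain-text links; nested markup inside the a tag would end up in the key.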

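-------------------- Programming Q&A --------------------
I have the general idea: use a regex to pull out <a href="www.baidu.com">百度</a>, then get the address based on the a tag's text. I just don't know how to write the regular expression. The post above extracts the attributes of all the a tags.
-------------------- Programming Q&A --------------------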



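            // Requires: using System.Text.RegularExpressions; (MessageBox needs System.Windows.Forms)
            string source = @"<a href=""www.baidu.com"">百度</a>";
            // Match only the a tag whose text is 百度 and capture its href as "web"
            Regex reg = new Regex(@"<a\s+href=""(?<web>[^""]+)""\s*>\s*百度\s*</a>");
            MatchCollection mc = reg.Matches(source);
            foreach (Match m in mc)
            {
                MessageBox.Show(m.Groups["web"].Value);
            }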
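
The keyword does not have to be hard-coded into the pattern. Here is a rough sketch, assuming the keyword arrives as a string at run time; `LinkFinder` and `FindHref` are hypothetical names. `Regex.Escape` keeps characters such as `+` or `.` in the keyword from being read as regex syntax.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkFinder
{
    // Returns the href of the first a tag whose text equals the keyword, or null if none matches.
    public static string FindHref(string html, string keyword)
    {
        string pattern =
            @"<a\b[^>]*?\bhref\s*=\s*""(?<web>[^""]+)""[^>]*>\s*"
            + Regex.Escape(keyword)   // treat the keyword as literal text
            + @"\s*</a>";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["web"].Value : null;
    }
}
```

For the sample page, `LinkFinder.FindHref(html, "百度")` should give www.baidu.com. As with the other regex answers, this assumes double-quoted href values and link text with no nested tags.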

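-------------------- Programming Q&A --------------------
Walk the HTML tags, use a regular expression to match the a tags, and extract the href attribute of the a tag whose text is 百度.
-------------------- Programming Q&A --------------------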

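Quoting yuhaichao928's reply in post #4:

I have the general idea: use a regex to pull out <a href="www.baidu.com">百度</a>, then get the address based on the a tag's text. I just don't know how to write the regular expression. The post above extracts the attributes of all the a tags.

In dicstr the key is 百度 and the value is www.baidu.com. Try the code first.
-------------------- Programming Q&A --------------------
A WebRequest object can fetch the page source; then scan it for the string.
-------------------- Programming Q&A --------------------
Would jQuery work?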



$("a").each(function () {
    // Compare the trimmed link text (not the inner HTML) with the keyword
    if ($.trim($(this).text()) === "百度") {
        alert($(this).attr("href"));
    }
});
