
Scraping Google keyword rankings with PHP

 

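Here is the idea: use PHP's cURL functions and store the cookies. A Google results page cannot be opened with file_get_contents; the request has to imitate a real browser completely. Baidu is different: you can fetch the page directly with file_get_contents and then process it with a regular expression, so Baidu is not covered here.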
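For the simpler case mentioned above (a site that does not need full browser emulation), a minimal sketch might look like the following. The helper names `fetchHtml` and `extractLinks`, the User-Agent string, and the `<h3><a href>` markup pattern are assumptions made for illustration; they are not taken from any real search engine's markup.

```php
<?php
// Hypothetical helper: fetch a page with file_get_contents().
// The stream context adds a User-Agent header and a timeout.
function fetchHtml(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => 'Mozilla/5.0 (compatible; RankCheck/0.1)', // assumed value
            'timeout'    => 10,
        ],
    ]);
    $html = @file_get_contents($url, false, $context);

    return $html === false ? '' : $html;
}

// Hypothetical helper: pull link URLs and titles out of the HTML with a regex.
// Assumed markup: each result title is an <h3> element wrapping a link.
function extractLinks(string $html): array
{
    preg_match_all(
        '!<h3[^>]*>\s*<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>!is',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    $links = [];
    foreach ($matches as $hit) {
        // strip_tags() removes inline markup such as <em> from the title.
        $links[] = ['url' => $hit[1], 'title' => strip_tags($hit[2])];
    }

    return $links;
}
```

Typical use would be `extractLinks(fetchHtml($url))`; the pattern has to be adapted to whatever markup the target page really uses.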

 <?php

header("Content-Type: text/html;charset=utf-8");

 

function ggsearch($url_s, $keyword, $page = 1) {

        $enKeyword = urlencode($keyword);

 

        $rsState = false;

 

        $page_num = ($page -1) * 10;

 

 

        if ($page <= 10) {

                $interface = "eth0:" . rand(1, 4); // pick one of several outgoing interfaces so Google is less likely to block the IP

                $cookie_file = dirname(__FILE__) . "/temp/google.txt"; // file that stores the cookies

                $url = "http://www.google.com/search?q=$enKeyword&hl=en&prmd=imvns&ei=JPnJTvLFI8HlggeXwbRl&start=$page_num&sa=N";

                $ch = curl_init();

 

                curl_setopt($ch, CURLOPT_URL, $url);

 

                //curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // alternative: reuse the visitor's browser user agent

                curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5");

                curl_setopt($ch, CURLOPT_INTERFACE, $interface); // outgoing interface (IP address) to use

                curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

 

                curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

 

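                curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // write received cookies to the file when the handle is closed

                curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // send cookies saved by earlier requests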

 

                $contents = curl_exec($ch);

 

                curl_close($ch);

 

                $match = "!<div\s*id=\"search\">(.*)</div>\s+<\!--z-->!";

                preg_match_all("$match", "$contents", $line);

                foreach ($line[0] as $k => $v) { // each() was removed in PHP 8, so iterate with foreach

                        preg_match_all("!<h3\s+class=\"r\"><a[^>]+>(.*?)</a>!", $v, $title);

                        $num = count($title[1]);

                        for ($i = 0; $i < $num; $i++) {

                                if (strstr($title[0][$i], $url_s)) {

                                        $rsState = true;

                                        $j = $i +1;

                                        $sum = $j + (($page) * 10 - 10);

                                        //echo $contents;

                                        echo "Keyword: " . $keyword . "<br>" . "Rank: " . '<font color="red" size="20" >' . $sum . '</font>' . "####" . "page " . '<font color="#00FFFF" size="18" >'.$page . '</font>'. ", position " .'<font color="#8000FF" size="15" >'.$j . '</font>'. " " . $title[0][$i] . "<br>";

                                        echo "<a href='" . $url . "'>" . "Open the search results" . "</a>" . "<br>";

                                        echo "<hr>";

                                        break;

                                }

                        }

                }

                unset ($contents);

                if ($rsState === false) {

                        ggsearch($url_s, $keyword, ++ $page); // not found on this page: continue with the next one

                }

        }

}
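The rank arithmetic in the listing (position on the page plus ten for every earlier page) can also be separated from the fetching, which makes it easy to check without touching the network. The following is only a sketch under assumptions: `rankOnPage`, `findRank`, and the `$fetchResults` callback are hypothetical names, and a loop stands in for the recursive call used above.

```php
<?php
// Hypothetical helper: absolute rank of $domain among one page of result snippets,
// or null if the domain does not appear on that page.
function rankOnPage(array $resultHtml, string $domain, int $page, int $perPage = 10): ?int
{
    foreach (array_values($resultHtml) as $i => $snippet) {
        if (strpos($snippet, $domain) !== false) {
            // Same arithmetic as the listing: results on earlier pages plus 1-based position.
            return ($page - 1) * $perPage + $i + 1;
        }
    }

    return null;
}

// Hypothetical driver: walk pages 1..$maxPages until the domain is found.
// $fetchResults is an assumed callback that returns the result snippets for a page number.
function findRank(callable $fetchResults, string $domain, int $maxPages = 10): ?int
{
    for ($page = 1; $page <= $maxPages; $page++) {
        $rank = rankOnPage($fetchResults($page), $domain, $page);
        if ($rank !== null) {
            return $rank;
        }
    }

    return null;
}
```

In a real run the callback would wrap the cURL request and the `<h3 class="r">` regex from the listing; passing a stub that returns fixed arrays is enough to exercise the arithmetic.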
