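Preface
Most of you are probably familiar with web crawlers. When I first picked up Python I tried scraping a few sites with it, but Python's (magical) indentation rules kept making my programs throw errors (╯°A°)╯︵○○○, so this time I decided to try scraping a site with PHP instead.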
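Main functions
First, a quick introduction to the main functions used today:
file_get_contents -> fetches the HTML of a page
strpos -> searches for a substring and returns the position of its first occurrence (false if it is not found)
substr -> extracts part of a string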
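To make the division of labour between strpos and substr concrete, here is a minimal sketch of the find-then-slice pattern. The helper name extractBetween and the sample markup are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): return the text between
// $startMarker and $endMarker, or null when either marker is missing.
function extractBetween(string $haystack, string $startMarker, string $endMarker): ?string
{
    $start = strpos($haystack, $startMarker);
    if ($start === false) {
        return null; // start marker not found
    }
    $start += strlen($startMarker); // skip past the marker itself

    $end = strpos($haystack, $endMarker, $start); // search only after the start
    if ($end === false) {
        return null; // end marker not found
    }
    return substr($haystack, $start, $end - $start);
}

// Made-up sample markup, not the real page.
$sample = '<span class="temp">21</span>';
echo extractBetween($sample, '<span class="temp">', '</span>'), PHP_EOL;
```

Checking strpos against false with === matters here, because a match at position 0 is also falsy.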
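Implementation
As a demonstration, I'll simply use the source code of a script I wrote earlier, which scrapes the official Moji Weather site for weather information.
TIP: I run this directly in the CLI, so line breaks are written as \r\n rather than <br>.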
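If the same script might also be served through a web server, one option is to pick the line break at runtime. The sketch below is only an illustration: the helper name lineBreak is hypothetical, and it relies on the built-in PHP_SAPI constant, which is 'cli' when PHP runs from the command line.

```php
<?php
// Hypothetical helper: choose a line break that suits where the script runs.
// PHP_SAPI is 'cli' on the command line; anything else is treated as web output.
function lineBreak(string $sapi = PHP_SAPI): string
{
    return $sapi === 'cli' ? "\r\n" : "<br>";
}

echo "first line", lineBreak(), "second line", lineBreak();
```

For CLI-only scripts, the built-in PHP_EOL constant is another way to get the platform's own line ending.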
Source code
<?php
$url = "https://tianqi.moji.com/weather/china/jiangsu/tongzhou-district";
$html = file_get_contents($url); // GET the page HTML
if ($html === false) {
    exit("Failed to fetch $url\r\n");
}
// Note: strpos/substr work on bytes, and the magic offsets below (+13, +22,
// +300, -49) are tied to the page's markup at the time this was written.

/***************************** Current weather *****************************/
$uptime = strpos($html, "info_uptime") + 13;
$htmll = substr($html, $uptime);
$endup = strpos($htmll, "<");
echo $uptimes = substr($htmll, 0, $endup); // last update time
echo "\r\n";
echo "\r\n";
$num = strpos($html, "description") + 22;
$htmlx = substr($html, $num);
$c = strpos($htmlx, ">");
$htmlx = substr($htmlx, 0, $c - 1);
echo $htmlx; // one-sentence summary
echo "\r\n";
echo "\r\nCurrent weather: ";
$tianqistart = strpos($htmlx, "度") + 4; // "度" is 3 bytes in UTF-8
$tianqiend = strpos($htmlx, ",");
$tianqix = $tianqiend - $tianqistart;
$tianqi = substr($htmlx, $tianqistart, $tianqix);
echo $tianqi; // current weather
echo "\r\nCurrent temperature: ";
$tempnowstart = strpos($htmlx, ":") + 3;
$tempnowend = strpos($htmlx, "度") + 3;
$tempnowx = $tempnowend - $tempnowstart;
$tempnow = substr($htmlx, $tempnowstart, $tempnowx);
echo $tempnow; // current temperature

/***************************** Today's weather *****************************/
$numx = strpos($html, "days clearfix") + 300;
$htmlxs = substr($html, $numx);
$htmlxsend = strpos($htmlxs, "</strong>");
$htmlxs = substr($htmlxs, 0, $htmlxsend);
$numxs = strpos($htmlxs, "alt") + 5;
$htmlxs = substr($htmlxs, $numxs); // today's forecast block
echo "\r\n";
echo "\r\nToday's weather: ";
$tianqitodaystart = 0;
$tianqitodayend = strpos($htmlxs, ">") - 1;
$tianqitodayx = $tianqitodayend - $tianqitodaystart;
$tianqitoday = substr($htmlxs, $tianqitodaystart, $tianqitodayx);
echo $tianqitoday; // today's weather
echo "\r\nToday's temperature: ";
$temptodaystart = strpos($htmlxs, "<li>") + 4;
$temptodayend = strpos($htmlxs, "<em>") - 49;
$temptodayx = $temptodayend - $temptodaystart;
$temptoday = substr($htmlxs, $temptodaystart, $temptodayx);
echo $temptoday; // today's temperature

/**************************** Tomorrow's weather ***************************/
echo "\r\n";
// Skip past the first "days clearfix" block to reach the second one.
$numx = strpos($html, "days clearfix") + 300;
$htmlxs = substr($html, $numx);
$numx = strpos($htmlxs, "days clearfix") + 300;
$htmlxsx = substr($htmlxs, $numx);
$htmlxsend = strpos($htmlxsx, "</strong>");
$htmlxsx = substr($htmlxsx, 0, $htmlxsend);
$numxs = strpos($htmlxsx, "alt") + 5;
$htmlxsx = substr($htmlxsx, $numxs); // tomorrow's forecast block
echo "\r\n";
echo "\r\nTomorrow's weather: ";
$tianqitodaystart = 0;
$tianqitodayend = strpos($htmlxsx, ">") - 1;
$tianqitodayx = $tianqitodayend - $tianqitodaystart;
$tianqitoday = substr($htmlxsx, $tianqitodaystart, $tianqitodayx);
echo $tianqitoday; // tomorrow's weather
echo "\r\nTomorrow's temperature: ";
$temptodaystart = strpos($htmlxsx, "<li>") + 4;
$temptodayend = strpos($htmlxsx, "<em>") - 49;
$temptodayx = $temptodayend - $temptodaystart;
$temptoday = substr($htmlxsx, $temptodaystart, $temptodayx);
echo $temptoday; // tomorrow's temperature

/********************** The day after tomorrow's weather *******************/
echo "\r\n";
// Skip past the first two "days clearfix" blocks to reach the third one.
$numx = strpos($html, "days clearfix") + 300;
$htmlxs = substr($html, $numx);
$numx = strpos($htmlxs, "days clearfix") + 300;
$htmlxsx = substr($htmlxs, $numx);
$numx = strpos($htmlxsx, "days clearfix") + 300;
$htmlxsxx = substr($htmlxsx, $numx);
$htmlxsend = strpos($htmlxsxx, "</strong>");
$htmlxsxx = substr($htmlxsxx, 0, $htmlxsend);
$numxs = strpos($htmlxsxx, "alt") + 5;
$htmlxsxx = substr($htmlxsxx, $numxs); // forecast block for the day after tomorrow
echo "\r\n";
echo "\r\nDay after tomorrow's weather: ";
$tianqitodaystart = 0;
$tianqitodayend = strpos($htmlxsxx, ">") - 1;
$tianqitodayx = $tianqitodayend - $tianqitodaystart;
$tianqitoday = substr($htmlxsxx, $tianqitodaystart, $tianqitodayx);
echo $tianqitoday; // weather for the day after tomorrow
echo "\r\nDay after tomorrow's temperature: ";
$temptodaystart = strpos($htmlxsxx, "<li>") + 4;
$temptodayend = strpos($htmlxsxx, "<em>") - 49;
$temptodayx = $temptodayend - $temptodaystart;
$temptoday = substr($htmlxsxx, $temptodaystart, $temptodayx);
echo $temptoday; // temperature for the day after tomorrow
echo "\r\n";
?>
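The three forecast sections above reach the second and third "days clearfix" blocks by cutting the string repeatedly. A possible alternative, sketched here under assumptions, is to collect the position of every occurrence in one pass using the optional third (offset) argument of strpos. The helper name findAllOffsets and the sample markup are hypothetical and do not reflect the real page.

```php
<?php
// Hypothetical helper: byte offsets of every occurrence of $needle in $haystack.
function findAllOffsets(string $haystack, string $needle): array
{
    if ($needle === '') {
        return []; // an empty needle would never advance the search
    }
    $offsets = [];
    $pos = 0;
    // The third argument tells strpos where to resume searching.
    while (($pos = strpos($haystack, $needle, $pos)) !== false) {
        $offsets[] = $pos;
        $pos += strlen($needle);
    }
    return $offsets;
}

// Made-up markup standing in for three forecast blocks.
$sample = '<ul class="day">A</ul><ul class="day">B</ul><ul class="day">C</ul>';
foreach (findAllOffsets($sample, 'class="day"') as $i => $offset) {
    echo "block $i starts at byte $offset", PHP_EOL;
}
```

With the offsets in hand, each day's block could be sliced with substr inside a single loop instead of three copied sections, assuming the page keeps one such marker per day.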