PHP Tutorial: An Example of Converting Relative Paths to Absolute Paths in PHP with Regular Expressions
Preface
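Anyone who has written a web crawler has probably run into this: the hyperlinks the crawler collects usually need post-processing so that every one of them is converted to an absolute path. This article therefore presents a regular-expression-based way to process the collected links. Let's get into the details.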
Typically, the links we collect look like the following:
<!-- empty hyperlink -->
<a href=""></a>
<!-- whitespace only -->
<a href=" " > </a>
<!-- <a> tag with other attributes -->
<a href="index.html" alt="hyperlink"> index.html </a>
<a href="/" target="_blank"> / target="_blank" </a>
<a target="_blank" href="/" alt="hyperlink" > target="_blank" / alt="hyperlink" </a>
<a target="_blank" title="hyperlink" href="/" alt="hyperlink" > target="_blank" title="hyperlink" / alt="hyperlink" </a>
<!-- root directory -->
<a href="/" > / </a>
<a href="a" > a </a>
<!-- with query parameters -->
<a href="/index.html?id=1" > /index.html?id=1 </a>
<a href="?id=2" > ?id=2 </a>
<!-- starting with // -->
<a href="//index.html" > //index.html </a>
<a href="//www.mafutian.net" > //www.mafutian.net </a>
<!-- internal (same-site) link -->
<a href="http://www.hole_1.com/index.html" > http://www.hole_1.com/index.html </a>
<!-- external links -->
<a href="http://www.mafutian.net" > http://www.mafutian.net </a>
<a href="http://www.numberer.net" > http://www.numberer.net </a>
<!-- links to images and text files -->
<a href="1.jpg" > 1.jpg </a>
<a href="1.jpeg" > 1.jpeg </a>
<a href="1.gif" > 1.gif </a>
<a href="1.png" > 1.png </a>
<a href="1.txt" > 1.txt </a>
<!-- ordinary links -->
<a href="index.html" > index.html </a>
<a href="index.html" > index.html </a>
<a href="./index.html" > ./index.html </a>
<a href="../index.html" > ../index.html </a>
<a href=".../" > .../ </a>
<a href="..." > ... </a>
<!-- not real links, but containing a colon -->
<a href="void(0)" > void(0) </a>
<a href="a:b" > a:b </a>
<a href="/a#a:b" > /a#a:b </a>
<a href="mailto:'mafutian@126.com'" > mailto:'mafutian@126.com' </a>
<a href="/tencent://message/?uin=335134463" > /tencent://message/?uin=335134463 </a>
<!-- relative paths -->
<a href="." > . </a>
<a href=".." > .. </a>
<a href="../" > ../ </a>
<a href="/a/b/.." > /a/b/.. </a>
<a href="/a" > /a </a>
<a href="./b" > ./b </a>
<a href="./././././././././b" > ./././././././././b </a> <!-- actually just ./b -->
<a href="../c" > ../c </a>
<a href="../../d" > ../../d </a>
<a href="../a/../b/c/../d" > ../a/../b/c/../d </a>
<a href="./../e" > ./../e </a>
<a href="http://www.hole_1.org/./../e" > http://www.hole_1.org/./../e </a>
<a href="./.././f" > ./.././f </a>
<a href="http://www.hole_1.org/../a/.../../b/c/../d/.." > http://www.hole_1.org/../a/.../../b/c/../d/.. </a>
<!-- with a port number -->
<a href=":8081/index.html" > :8081/index.html </a>
<a > :80/index.html </a>
<a href="http://www.mafutian.net:8081/index.html" > http://www.mafutian.net:8081/index.html </a>
<a href="http://www.mafutian.net:8082/index.html" > http://www.mafutian.net:8082/index.html </a>
The first processing step is to turn every link into an absolute path, which produces strings of the form:
http:// ... / ../ ../
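A minimal sketch of this first step, assuming the URL of the page on which each link was found is known. The helper name make_absolute and the page URL are hypothetical. Protocol-relative, root-relative, query-only and ordinary relative links are each prefixed with the matching part of the page URL, and any dot segments are deliberately left in place.

```php
<?php
// Hypothetical helper for step one: make an href absolute by prefixing the
// matching part of the page URL. Dot segments ('./', '../') are left in
// place on purpose; they are removed in the next step. Assumes $href is
// non-empty and $pageUrl is an absolute http(s) URL.
function make_absolute(string $href, string $pageUrl): string
{
    $p = parse_url($pageUrl);
    $scheme = $p['scheme'];
    $authority = $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = $p['path'] ?? '/';

    if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $href)) {
        return $href;                              // already has a scheme
    }
    if (substr($href, 0, 2) === '//') {
        return $scheme . ':' . $href;              // protocol-relative
    }
    if ($href[0] === '/') {
        return "$scheme://$authority$href";        // root-relative
    }
    if ($href[0] === '?') {
        return "$scheme://$authority$path$href";   // query only
    }
    // Ordinary relative path: append it to the page's directory
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return "$scheme://$authority$dir$href";
}

$page = 'http://www.mytest.org/a/b/page.html';
echo make_absolute('../c', $page), "\n";   // http://www.mytest.org/a/b/../c
echo make_absolute('?id=2', $page), "\n";  // http://www.mytest.org/a/b/page.html?id=2
```

Keeping the two steps separate means the dot-segment cleanup only ever has to deal with one input shape: a full URL that may still contain './' and '../'.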
Next, here is the code that removes './', '../' and '/..' from the absolute path:
<?php
// Normalize an absolute URL by removing './', '../' and '/..' dot segments.
function url_to_absolute($relative)
{
    // Remove every './' that is not the tail of '../'
    $absolute = preg_replace('/(?<!\.)\.\//', '', $relative);

    // Repeatedly collapse '/abc/../' into '/'. A single preg_replace pass
    // cannot handle nested cases such as '/b/c/../../', hence the loop.
    // The (?<!\/) lookbehind keeps the host name after 'http://' from
    // being collapsed as if it were a path segment.
    $segmentUp = '/(?<!\/)\/([^\/]+?)\/\.\.\//';
    while (preg_match($segmentUp, $absolute)) {
        $absolute = preg_replace($segmentUp, '/', $absolute);
    }

    // Collapse a trailing '/abc/..' into '/'
    $absolute = preg_replace('/(?<!\/)\/([^\/]+?)\/\.\.$/', '/', $absolute);
    // Drop any trailing '/..' that is left (e.g. directly after the host)
    $absolute = preg_replace('/\/\.\.$/', '', $absolute);
    // Remove leftover '../' (these would climb above the root)
    $absolute = preg_replace('/(?<!\.)\.\.\//', '', $absolute);
    return $absolute;
}

$relative = 'http://www.mytest.org/../a/.../../b/c/../d/..';
var_dump(url_to_absolute($relative));
// Output: string(26) "http://www.mytest.org/a/b/"
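Because the regular expressions above run over the whole URL, dot sequences inside a query string are rewritten as well; for example, url_to_absolute('http://www.mytest.org/a/./b/../c?next=../x') returns 'http://www.mytest.org/a/c?next=x'. As an alternative, and a different technique from the article's regex approach, the sketch below applies a stack-based version of the dot-segment removal described in RFC 3986, section 5.2.4, to the path component only. The function names remove_dot_segments and normalize_url are hypothetical, and user info (user:pass@) is ignored for brevity.

```php
<?php
// Alternative sketch (not the article's regex method): dot-segment removal in
// the spirit of RFC 3986, section 5.2.4, implemented with a stack of path
// segments and applied to the path component only.
function remove_dot_segments(string $path): string
{
    $absolute = substr($path, 0, 1) === '/';
    $segments = explode('/', $absolute ? substr($path, 1) : $path);
    $stack = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            array_pop($stack);          // '..' above the root is simply dropped
        } elseif ($seg !== '.') {
            $stack[] = $seg;            // '.' is skipped, anything else is kept
        }
    }
    // '/a/b/..' and '/a/.' end in a directory, so keep the trailing slash
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $stack[] = '';
    }
    return ($absolute ? '/' : '') . implode('/', $stack);
}

// Hypothetical wrapper: rebuild the URL around the cleaned path so the query
// string and fragment are left untouched (user:pass@ is ignored for brevity).
function normalize_url(string $url): string
{
    $p = parse_url($url);
    $result = $p['scheme'] . '://' . $p['host']
        . (isset($p['port']) ? ':' . $p['port'] : '')
        . remove_dot_segments($p['path'] ?? '');
    if (isset($p['query'])) {
        $result .= '?' . $p['query'];
    }
    if (isset($p['fragment'])) {
        $result .= '#' . $p['fragment'];
    }
    return $result;
}

echo normalize_url('http://www.mytest.org/../a/.../../b/c/../d/..'), "\n";
// http://www.mytest.org/a/b/
```

On the article's own example this gives the same result as url_to_absolute, while letting parse_url decide where the path ends instead of relying on lookbehind assertions to protect the host name.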
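Summary
That's all for this article. I hope it helps with your study or work; if you have any questions, feel free to leave a comment. Thank you for supporting 維易PHP.
If you reprint this article, please credit this page's URL: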
http://www.snjht.com/jiaocheng/1542.html