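A so-called "web page thief" program is really just a scraper for part of a web page: it uses the XMLHTTP component to request a page from another site, then filters the returned content to extract the information it needs, such as news content or a site's user information.
The JavaScript below fetches the "sentence of the day" from the Chinadaily 英语点津 (Language Tips) page. Save the following as an HTML file and open it in Internet Explorer, since the script relies on ActiveXObject.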
<HTML>
<HEAD>
<TITLE> *** Place Title Here *** </TITLE>
<script language=vbscript>
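' Convert a raw byte array (such as responseBody) into a string.
Function bytes2BSTR(arrBytes)
    Dim strReturn, i, ThisCharCode, NextCharCode
    strReturn = ""
    arrBytes = CStr(arrBytes)
    For i = 1 To LenB(arrBytes)
        ThisCharCode = AscB(MidB(arrBytes, i, 1))
        If ThisCharCode < &H80 Then
            ' Single-byte ASCII character
            strReturn = strReturn & Chr(ThisCharCode)
        Else
            ' Combine this byte with the next one into a double-byte character code
            NextCharCode = AscB(MidB(arrBytes, i+1, 1))
            strReturn = strReturn & Chr(CLng(ThisCharCode) * &H100 + CInt(NextCharCode))
            i = i + 1 ' skip the second byte, which has already been consumed
        End If
    Next
    bytes2BSTR = strReturn
End Function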
'Dim objXMLHTTP, xml
'Set xml = CreateObject("Microsoft.XMLHTTP")
'xml.Open "GET", "http://www.chinadaily.com.cn/language_tips/index.html", False
'xml.Send
'document.Write bytes2BSTR(xml.responseBody)
'Set xml = Nothing
</script>
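<SCRIPT LANGUAGE = JavaScript>
function getHtml()
{
    // Create the XMLHTTP object
    var objXMLHTTP = new ActiveXObject("Microsoft.XMLHTTP");
    // Synchronous GET request (the third argument is false)
    objXMLHTTP.open("GET", "http://www.chinadaily.com.cn/language_tips/index.html", false);
    objXMLHTTP.send(null);
    if (objXMLHTTP.readyState == 4 && objXMLHTTP.status == 200)
    {
        var HTML = objXMLHTTP.responseBody; // raw bytes of the page
        var txt = bytes2BSTR(HTML); // decode the bytes so Chinese text is not garbled
        // The "/" in </span> must be escaped inside a regex literal
        var regx = /<span class="ywzi">([^<a].+?)<\/span>/g;
        var t = regx.exec(txt); // extract the content we want
        if (t) // exec returns null when nothing matches
        {
            var t1 = t[0].split('.');
            document.write(t1[0] + ".<br>" + t1[1]);
        }
    }
}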
</SCRIPT>
</HEAD>
<BODY BGCOLOR="white">
<SCRIPT LANGUAGE = JavaScript>
getHtml();
</SCRIPT>
</BODY>
</HTML>
The resulting output:
Cry wine and sell vinegar.
挂羊头卖狗肉。
As you can see, it is quite simple.
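The filtering step can also be tried on its own, without ActiveX or a network request. The following is only a minimal sketch: the helper name extractDailySentence and the sample markup are made up for illustration, and it assumes the sentence and its translation sit together inside a single span with class "ywzi", separated by the first period, as the output above suggests.

```javascript
// Hypothetical helper (not part of the original page): pulls the first
// <span class="ywzi"> block out of an HTML string and splits it into the
// English sentence and its translation.
function extractDailySentence(html) {
  // Non-greedy match of the span's inner text; [\s\S] also spans line breaks.
  var match = /<span class="ywzi">([\s\S]+?)<\/span>/.exec(html);
  if (match === null) {
    return null; // the span is missing, e.g. the page layout changed
  }
  var text = match[1].replace(/^\s+|\s+$/g, ""); // trim surrounding whitespace
  var dot = text.indexOf(".");
  if (dot === -1) {
    return { english: text, translation: "" };
  }
  return {
    english: text.slice(0, dot + 1),
    translation: text.slice(dot + 1).replace(/^\s+|\s+$/g, "")
  };
}

// Sample markup invented for illustration, not copied from the live page.
var sample = '<div><span class="ywzi">Cry wine and sell vinegar. 挂羊头卖狗肉。</span></div>';
var result = extractDailySentence(sample);
console.log(result.english);
console.log(result.translation);
```

Returning null when the span is absent lets the caller decide what to show if the target page changes its markup, instead of failing on a missing match.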