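Emulating a Website Login, C# Edition (with two complete, runnable versions of the code)
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 16:01
Original post: [Tutorial] Emulating a Website Login, C# Edition (with two complete, runnable versions of the code)
http://www.crifan.com/emulate_login_website_using_csharp/
This site is very good; the author has studied a lot of the basics of POST.
He also seems to have worked on drivers.
Some network-related basics were already covered earlier:
[Summary] Logic/flow and caveats for fetching web pages, analyzing page content, and emulating website login
as was how to implement simple web page fetching in C#:
[Tutorial] Fetching a web page and extracting the required information, C# Edition
This post continues by using login to the Baidu homepage:
http://www.baidu.com/
as the example to show how to emulate a website login in C#.
First, the prerequisites for this post.
It assumes you have already read:
[Summary] Logic/flow and caveats for fetching web pages, analyzing page content, and emulating website login
and understand the basic network concepts;
and have read:
[Summary] Developer tools in the browser (F12 in IE9 and Ctrl+Shift+I in Chrome): powerful tools for web page analysis
and know how to use tools such as F12 in IE9 to analyze what happens when a page runs.
1. Before emulating a website login, work out the internal logic of logging in to that site
Before emulating the Baidu homepage login in a program, that is, in C# code,
you first need to understand for yourself what the internal logic of logging in to the site is.
For how to use the tools to work out the internal logic of the Baidu homepage login, see:
[Tutorial] A step-by-step guide to using a tool (F12 in IE9) to analyze the internal logic of emulating a website login (Baidu homepage)
2. Only then implement the login logic in the chosen language (C#)
Once you understand the internal logic of the Baidu homepage login as analyzed with F12, implementing it in C# is comparatively straightforward.
Notes:
(1) If you are not familiar with using cookies in C#, first read:
[Lessons learned] HTTP, web page access, HttpRequest, and HttpResponse
(2) If you are not familiar with regular expressions, see:
Notes on learning regular expressions
(3) If you are not familiar with the C# regular expression class Regex, see:
Notes on learning regular expressions in C#
The flow worked out in that analysis is reproduced here so that it can be compared with the code: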
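Step | URL | Method | Data sent | Value to obtain/extract from the response
1 | http://www.baidu.com/ | GET | none | BAIDUID, from the returned cookies
2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | includes the BAIDUID cookie | the token value, extracted from the returned HTML
3 | https://passport.baidu.com/v2/api/?login | POST | a set of POST data, in which token is the value extracted earlier | verify that the returned cookies include BDUSS, PTOKEN, STOKEN, SAVEUSERID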
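Step 2 in the flow above only works if the BAIDUID cookie obtained from www.baidu.com is also sent to passport.baidu.com. The following is a minimal sketch of how a CookieContainer decides that, with no network access involved. The class name SharedCookieSketch, the method name CookieHeaderFor, and the cookie value used in the usage note are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Net;

// Illustrative sketch (hypothetical names): shows which Cookie header
// a CookieContainer would produce for a given target URL.
public static class SharedCookieSketch
{
    // Stores one cookie in a fresh container and returns the Cookie header
    // that would be sent to targetUrl. The cookie must have a non-empty Domain,
    // otherwise CookieContainer.Add throws.
    public static string CookieHeaderFor(string targetUrl, Cookie stored)
    {
        CookieContainer jar = new CookieContainer();
        jar.Add(stored);
        return jar.GetCookieHeader(new Uri(targetUrl));
    }
}
```

For example, a cookie created as new Cookie("BAIDUID", "abc", "/", ".baidu.com") is included in the header for https://passport.baidu.com/v2/api/?login, because its domain covers the passport subdomain.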
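With that, the C# code demonstrating the emulated Baidu homepage login can finally be written.
【Version 1: complete C# code for emulating Baidu homepage login, minimal version】
In the UI, clicking "获取cookie BAIDUID" (get cookie BAIDUID)
calls the following code: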
private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    //http://www.baidu.com/
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);
    //a CookieContainer must be attached, otherwise resp.Cookies stays empty
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);
    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }
    if (gotCookieBaiduid)
    {
        //store cookies
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID not found!");
    }
    //release the connection; resp.Cookies remains usable afterwards
    resp.Close();
}
This yields the value of the BAIDUID cookie.
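As a side note, the loop over resp.Cookies can also be expressed with the name indexer of CookieCollection, which returns null when no cookie of that name exists. A minimal sketch follows; the class name CookieLookupSketch and the method name FindCookieValue are illustrative assumptions and do not appear in the original project.

```csharp
using System.Net;

// Illustrative sketch (hypothetical names): look up one cookie by name.
public static class CookieLookupSketch
{
    // Returns the value of the named cookie, or null if it is not in the collection.
    public static string FindCookieValue(CookieCollection cookies, string name)
    {
        Cookie found = cookies[name]; // indexer yields null when the name is absent
        return found == null ? null : found.Value;
    }
}
```

Called as FindCookieValue(resp.Cookies, "BAIDUID"), a non-null result would play the same role as the gotCookieBaiduid flag above.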
Next, clicking "获取token值" (get token value) calls the following code:
private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);
        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();
        //closing the reader also closes the response stream and releases the connection
        sr.Close();
        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "Error: token value not found!";
        }
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID was not obtained beforehand!");
    }
}
This yields the corresponding token value.
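The extraction step can be isolated from the UI and the network, which makes it easy to try against a saved response. The sketch below reuses the same regular expression as the code above; the class name TokenSketch and the method name ExtractToken are illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

// Illustrative sketch (hypothetical names): the token extraction as a pure function.
public static class TokenSketch
{
    // Same pattern as in btnGetToken_Click: the token is the run of word
    // characters between the single quotes after login_token=.
    private static readonly Regex TokenPattern =
        new Regex(@"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token, or null when the pattern does not occur in the HTML.
    public static string ExtractToken(string html)
    {
        Match m = TokenPattern.Match(html);
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }
}
```

Given the sample line quoted in the code comment, bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';, the function returns 5ab690978812b0e7fbbe1bfc267b90b3. If Baidu changes the page, the pattern no longer matches and null is returned.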
Then fill in your Baidu username and password and click "模拟登陆百度首页" (emulate Baidu homepage login), which calls the following code:
private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");

        string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;
        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();

        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();
        sr.Close();

        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
        foreach (string cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }
        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }
        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }

        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "Emulated Baidu homepage login succeeded!";
        }
        else
        {
            txbEmulateLoginResult.Text = "Emulated Baidu homepage login failed!";
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
            txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
        }
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted!");
    }
}
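If the username and password are both correct, the login succeeds.
If you deliberately enter a wrong username or password, a login failure is shown, and the returned headers and HTML are printed.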
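The success check in the handler above boils down to "are all four expected cookies present". A minimal sketch of that check as a standalone function follows; it reports which names are missing, which can help when a login fails. The class name LoginCheckSketch and the method name MissingCookies are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Net;

// Illustrative sketch (hypothetical names): report which required cookies are absent.
public static class LoginCheckSketch
{
    // Returns the required cookie names that do not occur in the received collection.
    // An empty result corresponds to allCookiesFound == true in the handler above.
    public static List<string> MissingCookies(CookieCollection received, IEnumerable<string> required)
    {
        List<string> missing = new List<string>();
        foreach (string name in required)
        {
            if (received[name] == null)
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

With required set to BDUSS, PTOKEN, STOKEN and SAVEUSERID, an empty list would mean the login is considered successful by the same criterion the tutorial uses.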
The complete C# code for emulating the Baidu homepage login is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;
        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            //init
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }
        /******************************************************************************
        functions in crifanLib.cs
        *******************************************************************************/

        //quote the input dict values
        //note: the returned string has no leading '&' before the first parameter
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            foreach (KeyValuePair<string, string> para in paras)
            {
                if (!isFirst)
                {
                    quotedParas += "&";
                }
                isFirst = false;
                //UrlEncode rather than UrlPathEncode, so that '&', '=' and '+' inside a value are escaped
                quotedParas += para.Key + "=" + HttpUtility.UrlEncode(para.Value);
            }
            return quotedParas;
        }
        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/

        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            //generate http request
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);
            //a CookieContainer must be attached, otherwise resp.Cookies stays empty
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(curCookies);
            req.Method = "GET";
            //use request to get response
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }
            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
            //release the connection; resp.Cookies remains usable afterwards
            resp.Close();
        }
        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);
                //add previously got cookies
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                req.Method = "GET";
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string respHtml = sr.ReadToEnd();
                //closing the reader also closes the response stream and releases the connection
                sr.Close();
                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained beforehand!");
            }
        }
        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
                //add cookie
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                //set to POST
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                //prepare post data
                string postDataStr = quoteParas(postDict);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;
                //send post data
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();

                //got response
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                //got returned html
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string loginBaiduRespHtml = sr.ReadToEnd();
                sr.Close();

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
                foreach (string cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }
                foreach (Cookie singleCookie in resp.Cookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }
                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Emulated Baidu homepage login succeeded!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Emulated Baidu homepage login failed!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
                    txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
                    txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted!");
            }
        }
        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";
            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
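The corresponding complete VS2010 C# project can be downloaded here:
emulateLoginBaidu_csharp_2012-11-07.7z
【Version 2: complete C# code for emulating Baidu homepage login, crifanLib.cs version】
Later, the code above was changed to use my C# library crifanLib.cs, so that the network-related library functions can be reused in the future.
Below is the complete C# code of the version that uses crifanLib.cs: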
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;
        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            this.AcceptButton = this.btnEmulateLoginBaidu;

            //init for crifanLib.cs
            curCookies = new CookieCollection();

            //init for demo login
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }
        /******************************************************************************
        functions in crifanLib.cs
        Online browser: http://code.google.com/p/crifanlib/source/browse/trunk/csharp/crifanLib.cs
        Download: http://code.google.com/p/crifanlib/
        *******************************************************************************/

        //quote the input dict values
        //note: the returned string has no leading '&' before the first parameter
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            foreach (KeyValuePair<string, string> para in paras)
            {
                if (!isFirst)
                {
                    quotedParas += "&";
                }
                isFirst = false;
                //UrlEncode rather than UrlPathEncode, so that '&', '=' and '+' inside a value are escaped
                quotedParas += para.Key + "=" + HttpUtility.UrlEncode(para.Value);
            }
            return quotedParas;
        }
        /*********************************************************************/
        /* cookie */
        /*********************************************************************/

        //add a single cookie to cookies; if it already exists, update its value
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)
        {
            bool found = false;

            if (cookies.Count > 0)
            {
                foreach (Cookie originalCookie in cookies)
                {
                    if (originalCookie.Name == toAdd.Name)
                    {
                        // !!! cookies of different domains are different cookies,
                        // so do not overwrite the value when the domains differ,
                        // unless overwriting the domain is explicitly requested
                        if ((originalCookie.Domain == toAdd.Domain) ||
                            ((originalCookie.Domain != toAdd.Domain) && overwriteDomain))
                        {
                            //a CookieCollection cannot be cast to HttpCookieCollection
                            //to remove this cookie and add it again,
                            //so there is no good way to copy all fields at once
                            originalCookie.Value = toAdd.Value;
                            originalCookie.Domain = toAdd.Domain;
                            originalCookie.Expires = toAdd.Expires;
                            originalCookie.Version = toAdd.Version;
                            originalCookie.Path = toAdd.Path;

                            //the following fields apparently should not change
                            //originalCookie.HttpOnly = toAdd.HttpOnly;
                            //originalCookie.Secure = toAdd.Secure;

                            found = true;
                            break;
                        }
                    }
                }
            }

            if (!found)
            {
                if (toAdd.Domain != "")
                {
                    // adding a cookie with an empty domain makes a later req.CookieContainer.Add(cookies) fail !!!
                    cookies.Add(toAdd);
                }
            }
        }//addCookieToCookies

        //add a single cookie to cookies, by default without overwriting the domain
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)
        {
            addCookieToCookies(toAdd, ref cookies, false);
        }
        //check whether cookies contains the ckToCheck cookie
        //support:
        //ckToCheck is Cookie/string
        //cookies is Cookie/string/CookieCollection/string[]
        public bool isContainCookie(object ckToCheck, object cookies)
        {
            bool isContain = false;

            if ((ckToCheck != null) && (cookies != null))
            {
                string ckName = "";
                if (ckToCheck is string)
                {
                    ckName = (string)ckToCheck;
                }
                else if (ckToCheck is Cookie)
                {
                    ckName = ((Cookie)ckToCheck).Name;
                }

                if (ckName != "")
                {
                    // is single Cookie
                    if (cookies is Cookie)
                    {
                        if (ckName == ((Cookie)cookies).Name)
                        {
                            isContain = true;
                        }
                    }
                    // is CookieCollection
                    else if (cookies is CookieCollection)
                    {
                        foreach (Cookie ck in (CookieCollection)cookies)
                        {
                            if (ckName == ck.Name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                    // is single cookie name string
                    else if (cookies is string)
                    {
                        if (ckName == (string)cookies)
                        {
                            isContain = true;
                        }
                    }
                    // is cookie name string[]
                    else if (cookies is string[])
                    {
                        foreach (string name in (string[])cookies)
                        {
                            if (ckName == name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                }
            }

            return isContain;
        }//isContainCookie
        // update cookiesToUpdate to localCookies
        // if omitUpdateCookies is given, skip the cookies in cookiesToUpdate that it names
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)
        {
            if (cookiesToUpdate.Count > 0)
            {
                if (localCookies == null)
                {
                    localCookies = cookiesToUpdate;
                }
                else
                {
                    foreach (Cookie newCookie in cookiesToUpdate)
                    {
                        if (isContainCookie(newCookie, omitUpdateCookies))
                        {
                            // this cookie is to be omitted
                        }
                        else
                        {
                            addCookieToCookies(newCookie, ref localCookies);
                        }
                    }
                }
            }
        }//updateLocalCookies

        //update cookiesToUpdate to localCookies
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)
        {
            updateLocalCookies(cookiesToUpdate, ref localCookies, null);
        }
        /*********************************************************************/
        /* HTTP */
        /*********************************************************************/

        /* get url's response */
        public HttpWebResponse getUrlResponse(string url,
                                              Dictionary<string, string> headerDict,
                                              Dictionary<string, string> postDict,
                                              int timeout,
                                              string postDataStr)
        {
            //CookieCollection parsedCookies;
            HttpWebResponse resp = null;

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

            req.AllowAutoRedirect = true;
            req.Accept = "*/*";

            //const string gAcceptLanguage = "en-US"; // zh-CN/en-US
            //req.Headers["Accept-Language"] = gAcceptLanguage;

            req.KeepAlive = true;

            //IE8
            //const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";
            //IE9
            //const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
            const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86
            //Chrome
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
            //Mozilla Firefox
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
            req.UserAgent = gUserAgent;

            req.Headers["Accept-Encoding"] = "gzip, deflate";
            //decompress both encodings advertised in Accept-Encoding above
            req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

            req.Proxy = null;

            if (timeout > 0)
            {
                req.Timeout = timeout;
            }

            if (curCookies != null)
            {
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.PerDomainCapacity = 40; // the cookies added below may exceed the default maximum of 20 per domain
                req.CookieContainer.Add(curCookies);
            }

            if (headerDict != null)
            {
                foreach (KeyValuePair<string, string> header in headerDict)
                {
                    string headerName = header.Key.ToLower();
                    string headerValue = header.Value;

                    // the following allow the caller to overwrite the default header settings
                    if (headerName == "referer")
                    {
                        req.Referer = headerValue;
                    }
                    else if (headerName == "allowautoredirect")
                    {
                        bool isAllow = false;
                        if (bool.TryParse(headerValue, out isAllow))
                        {
                            req.AllowAutoRedirect = isAllow;
                        }
                    }
                    else if (headerName == "accept")
                    {
                        req.Accept = headerValue;
                    }
                    else if (headerName == "keepalive")
                    {
                        bool isKeepAlive = false;
                        if (bool.TryParse(headerValue, out isKeepAlive))
                        {
                            req.KeepAlive = isKeepAlive;
                        }
                    }
                    else if (headerName == "accept-language")
                    {
                        req.Headers["Accept-Language"] = headerValue;
                    }
                    else if (headerName == "useragent")
                    {
                        req.UserAgent = headerValue;
                    }
                    else
                    {
                        req.Headers[header.Key] = headerValue;
                    }
                }
            }

            //IsNullOrEmpty also covers a null postDataStr, which would otherwise be treated as a POST
            if (postDict != null || !string.IsNullOrEmpty(postDataStr))
            {
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";

                if (postDict != null)
                {
                    postDataStr = quoteParas(postDict);
                }

                //byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;

                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
            }
            else
            {
                req.Method = "GET";
            }

            //may timeout, has fixed in:
            //http://www.crifan.com/fixed_problem_sometime_httpwebrequest_getresponse_timeout/
            resp = (HttpWebResponse)req.GetResponse();

            updateLocalCookies(resp.Cookies, ref curCookies);

            return resp;
        }
        public HttpWebResponse getUrlResponse(string url,
                                              Dictionary<string, string> headerDict,
                                              Dictionary<string, string> postDict)
        {
            return getUrlResponse(url, headerDict, postDict, 0, "");
        }

        public HttpWebResponse getUrlResponse(string url)
        {
            return getUrlResponse(url, null, null, 0, "");
        }
        // valid charset: "GB18030"/"UTF-8", invalid: "UTF8"
        public string getUrlRespHtml(string url,
                                     Dictionary<string, string> headerDict,
                                     string charset,
                                     Dictionary<string, string> postDict,
                                     int timeout,
                                     string postDataStr)
        {
            string respHtml = "";

            //HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);
            HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);

            //long realRespLen = resp.ContentLength;

            StreamReader sr;
            if ((charset != null) && (charset != ""))
            {
                Encoding htmlEncoding = Encoding.GetEncoding(charset);
                sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);
            }
            else
            {
                sr = new StreamReader(resp.GetResponseStream());
            }
            respHtml = sr.ReadToEnd();
            //closing the reader also closes the response stream and releases the connection
            sr.Close();

            return respHtml;
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, string charset, Dictionary<string, string> postDict, string postDataStr)
        {
            return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict)
        {
            return getUrlRespHtml(url, headerDict, "", postDict, "");
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict)
        {
            return getUrlRespHtml(url, headerDict, null);
        }

        public string getUrlRespHtml(string url, string charset, int timeout)
        {
            return getUrlRespHtml(url, null, charset, null, timeout, "");
        }

        public string getUrlRespHtml(string url, string charset)
        {
            return getUrlRespHtml(url, charset, 0);
        }

        public string getUrlRespHtml(string url)
        {
            return getUrlRespHtml(url, "");
        }
        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/

        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            HttpWebResponse resp = getUrlResponse(baiduMainUrl);

            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
            //release the connection; resp.Cookies remains usable afterwards
            resp.Close();
        }
        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                string respHtml = getUrlRespHtml(getapiUrl);

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained beforehand!");
            }
        }
        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                string loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);

                //check whether got all expected cookies
                //(getUrlResponse has already merged the response cookies into curCookies)
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
                foreach (string cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }
                foreach (Cookie singleCookie in curCookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }
                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Emulated Baidu homepage login succeeded!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Emulated Baidu homepage login failed!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted!");
            }
        }
        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";
            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
The complete VS2010 project can be downloaded here:
emulateLoginBaidu_csharp_crifanLibVersion_2012-11-07.7z
About crifanLib.cs:
Browse online: crifanLib.cs
Download: crifanLib_2012-11-07.7z
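【Summary】
As can be seen, although the flow for emulating the Baidu homepage login worked out earlier is not that complicated, implementing it in C# is considerably more involved than implementing it in Python.
The main reason is that Python wraps up many common, convenient library functions, whereas in C# many details have to be handled by hand, including the various parameters of a GET or POST; anything involving cookies is especially tedious.
So for fetching web pages, analyzing their content, and emulating website logins, Python is the more convenient choice.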
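One of the details that has to be handled by hand in C# is the encoding of the POST body. The following is a minimal sketch of building an application/x-www-form-urlencoded body using only Uri.EscapeDataString, so no reference to System.Web is needed. It is an alternative to quoteParas for illustration, not the code used in the projects above; the class name FormEncodingSketch and the method name BuildFormBody are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative sketch (hypothetical names): encode name/value pairs as a form body.
public static class FormEncodingSketch
{
    // Joins the fields as name=value pairs separated by '&'.
    // Both names and values are percent-encoded, so characters such as
    // '&', '=' and space inside a password cannot break the body apart.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0)
            {
                body.Append('&');
            }
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }
}
```

Note that Uri.EscapeDataString writes a space as %20 rather than '+'; servers decoding form data generally accept both. A Dictionary<string, string> such as postDict can be passed directly, since it enumerates as key/value pairs.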
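【Postscript 2013-09-11】
1. After investigating in:
[Record] Investigating why the C# code for emulating Baidu login does not work on .NET 4.0
it turns out that, indeed,
the earlier code works correctly on .NET 3.5 and earlier, but does not work on .NET 4.0;
2. The cause has now been found and fixed.
The cause is:
for a cookie that does not specify an expires field, .NET 4.0 sets the cookie's expires value to the default, year 0001 at 00:00:00, which causes the cookie to be treated as expired. As a result, this Baidu cookie:
H_PS_PSSID
becomes invalid, and all subsequent operations fail.
On .NET 3.5 and earlier, the cookie's expires value is also the default year 0001 at 00:00:00, but the cookie remains usable in practice, so the subsequent steps work and the problem does not occur;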
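The details of the actual fix are in the downloadable projects and are not reproduced in this post. Purely as an illustration of the idea, the sketch below gives every cookie whose Expires is still the default DateTime.MinValue an explicit future expiry before the collection is reused. This is an assumed workaround, not necessarily what the author did; the class name CookieExpirySketch and the method name GiveSessionCookiesAnExpiry are hypothetical.

```csharp
using System;
using System.Net;

// Illustrative sketch (hypothetical names, assumed workaround):
// assign an explicit expiry to cookies that carry none.
public static class CookieExpirySketch
{
    // For each cookie whose Expires is the default DateTime.MinValue
    // (no expires attribute was parsed), set Expires to now + lifetime.
    // Cookies that already have an explicit expiry are left unchanged.
    public static void GiveSessionCookiesAnExpiry(CookieCollection cookies, TimeSpan lifetime)
    {
        foreach (Cookie ck in cookies)
        {
            if (ck.Expires == DateTime.MinValue)
            {
                ck.Expires = DateTime.Now.Add(lifetime);
            }
        }
    }
}
```

A call such as GiveSessionCookiesAnExpiry(curCookies, TimeSpan.FromDays(1)) before adding curCookies to a CookieContainer would be one place to try it. The trade-off is that session cookies are then kept as if they were persistent for that period.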
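3. The fixed code,
available for download:
(1) Emulated Baidu login, standalone complete code version, .NET 4.0
emulateLoginBaidu_csharp_independentCodeVersion_2013-09-11.7z
(2) Emulated Baidu login, (my own) crifanLib version, .NET 4.0
emulateLoginBaidu_csharp_crifanLibVersion_2013-09-11.7z
(I will upload the two files above when I get a chance, because the upload currently fails with:
xxx.7z:unknown Bytes complete FAILED!
:Upload canceled
: VIRUS DETECTED!
(Heuristics.Broken.Executable FOUND)
I will try uploading again at another time. If the same error persists, I will look into it.)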
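【Summary】
Whether in 3.5 and earlier or in the latest 4.0, .NET's parsing of the Set-Cookie header of an HTTP response into a CookieCollection
has always been garbage, with one bug after another.
For details, see:
Bugs in Set-Cookie parsing
From now on, use resp.Cookies as little as possible.
Otherwise C# will trip you up and you will not even know how it happened.
It is better to parse Set-Cookie with a self-written parsing function and get a correct CookieCollection.
For details, see:
Parsing the Set-Cookie string (returned by an HTTP request) into an array of Cookies: parseSetCookie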
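The author's parseSetCookie is not shown in this post. To give a sense of what parsing Set-Cookie by hand involves, here is a minimal sketch that handles one Set-Cookie value at a time. It is a simplified assumption, not the author's function: the class name SetCookieSketch, the method name ParseSetCookie, and the defaultDomain parameter are hypothetical, and splitting several cookies that were joined into one header line is deliberately left out.

```csharp
using System;
using System.Net;

// Illustrative sketch (hypothetical names): parse ONE Set-Cookie value into a Cookie.
public static class SetCookieSketch
{
    // headerValue: e.g. "NAME=VALUE; path=/; domain=.example.com; HttpOnly"
    // defaultDomain: used when the header carries no domain attribute.
    // Returns null when the first segment is not of the form name=value.
    public static Cookie ParseSetCookie(string headerValue, string defaultDomain)
    {
        string[] parts = headerValue.Split(';');
        string first = parts[0].Trim();
        int eq = first.IndexOf('=');          // split on the FIRST '=' only,
        if (eq <= 0)                          // because values may contain '='
        {
            return null;
        }
        Cookie cookie = new Cookie(first.Substring(0, eq).Trim(), first.Substring(eq + 1).Trim());
        cookie.Domain = defaultDomain;
        cookie.Path = "/";

        for (int i = 1; i < parts.Length; i++)
        {
            string attr = parts[i].Trim();
            int attrEq = attr.IndexOf('=');
            string key = (attrEq < 0 ? attr : attr.Substring(0, attrEq)).Trim().ToLowerInvariant();
            string val = attrEq < 0 ? "" : attr.Substring(attrEq + 1).Trim();
            switch (key)
            {
                case "domain": if (val != "") cookie.Domain = val; break;
                case "path": if (val != "") cookie.Path = val; break;
                case "secure": cookie.Secure = true; break;
                case "httponly": cookie.HttpOnly = true; break;
                case "expires":
                    DateTime when;
                    // unparseable dates are ignored, leaving a session cookie
                    if (DateTime.TryParse(val, out when)) cookie.Expires = when;
                    break;
            }
        }
        return cookie;
    }
}
```

Because the expires date itself contains a comma, naively splitting a combined header on commas breaks cookies apart, which is one reason hand-written parsing needs care.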