Emulating a Website Login in C# (with complete, runnable code in two versions)

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 09:13

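Earlier articles covered the networking basics:

[Summary] Logic, workflow, and caveats for fetching web pages, parsing page content, and emulating a website login

and showed how a simple page fetch is done in C#:

[Tutorial] Fetching a web page and extracting the required information, C# version

This article continues from there, using a login to the Baidu home page:

http://www.baidu.com/

as the example of how to emulate a website login in C#.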

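First, the prerequisites for this article.

It assumes you have read:

[Summary] Logic, workflow, and caveats for fetching web pages, parsing page content, and emulating a website login

and understand the basic networking concepts, and that you have read:

[Summary] Browser developer tools (F12 in IE9, Ctrl+Shift+I in Chrome): the essential tool for web page analysis

and know how to use tools such as IE9's F12 to analyze what happens as a page executes.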

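1. Before emulating a login, work out the site's internal login logic

Before writing a program (here, C# code) that emulates a login to the Baidu home page, you need to understand for yourself what happens internally when you log in to the site by hand.

For how to use tools to work out the internal logic of the Baidu home page login, see:

[Tutorial] A step-by-step guide to using tools (IE9's F12) to analyze the internal logic of a website login (the Baidu home page)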

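2. Then implement the login logic in the chosen language (C#)

Once you understand the Baidu login logic worked out with F12 above, implementing it in C# is comparatively straightforward.

Notes:

(1) If you are not familiar with using cookies in C#, first read:

[Lessons learned] HTTP, web page access, HttpRequest, and HttpResponse

(2) If you are not familiar with regular expressions, see:

Notes on learning regular expressions

(3) If you are not familiar with Regex, the C# regular expression class, see:

Notes on learning regular expressions in C#

The flow from that analysis is repeated here so it can be compared against the code: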

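| Step | URL | Method | Data sent | Value to obtain or extract from the response |
|------|-----|--------|-----------|----------------------------------------------|
| 1 | http://www.baidu.com/ | GET | none | BAIDUID in the returned cookies |
| 2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | includes the BAIDUID cookie | the token value, extracted from the returned HTML |
| 3 | https://passport.baidu.com/v2/api/?login | POST | a set of POST fields, with token set to the value extracted earlier | verify that the returned cookies include BDUSS, PTOKEN, STOKEN, SAVEUSERID |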
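The flow depends on the BAIDUID cookie from step 1 being sent again in steps 2 and 3, even though the host changes from www.baidu.com to passport.baidu.com. The following is a minimal offline sketch of how a CookieContainer decides which cookies apply to a URL. The class CookieJarSketch, its methods, and the placeholder cookie value are hypothetical and not part of the original project, and it is an assumption here that Baidu scopes BAIDUID to the domain .baidu.com.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only; not part of the original project.
public static class CookieJarSketch
{
    // Builds a cookie jar holding one BAIDUID cookie scoped to .baidu.com.
    // The value is a placeholder and the domain scope is an assumption.
    public static CookieContainer JarWithBaiduid()
    {
        CookieContainer jar = new CookieContainer();
        jar.Add(new Cookie("BAIDUID", "PLACEHOLDER", "/", ".baidu.com"));
        return jar;
    }

    // Returns the names of the cookies the jar would attach to a request for url.
    public static string[] CookieNamesFor(CookieContainer jar, string url)
    {
        CookieCollection matched = jar.GetCookies(new Uri(url));
        string[] names = new string[matched.Count];
        int i = 0;
        foreach (Cookie ck in matched)
        {
            names[i++] = ck.Name;
        }
        return names;
    }
}
```

Calling CookieJarSketch.CookieNamesFor with the jar and the step 3 URL should list BAIDUID, while an unrelated host gets no cookies. The same domain matching is applied when a CookieContainer is assigned to an HttpWebRequest.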

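With that, the C# code demonstrating an emulated login to the Baidu home page can be written.

[Version 1: complete C# code for emulating a Baidu home page login, minimal version]

In the UI, click "获取cookie BAIDUID" (Get cookie BAIDUID):

[Screenshot: clicking "Get cookie BAIDUID" shows the cookie's value]

This calls the following code: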

private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    //http://www.baidu.com/
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

    //add follow code to handle cookies
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);

    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }

    if (gotCookieBaiduid)
    {
        //store cookies
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("错误:没有找到cookie BAIDUID !");
    }
}

This obtains the value of the BAIDUID cookie shown above.

Next, click "获取token值" (Get token value), which calls the following code:

private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);

        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();

        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "错误:没有找到token的值!";
        }
    }
    else
    {
        MessageBox.Show("错误:之前没有正确获得Cookie:BAIDUID !");
    }
}

This retrieves the corresponding token value:

[Screenshot: clicking "Get token value" shows the extracted token]
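The token extraction can be tried without any network access by running the same regular expression against a sample string. The sketch below reuses the pattern and the sample line from the comment in btnGetToken_Click. The class TokenSketch and the method ExtractToken are hypothetical names introduced only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original project.
public static class TokenSketch
{
    // Same pattern as in btnGetToken_Click: a named group captures the token.
    private static readonly Regex TokenPattern =
        new Regex(@"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token value, or null when the pattern does not match.
    public static string ExtractToken(string respHtml)
    {
        Match found = TokenPattern.Match(respHtml);
        return found.Success ? found.Groups["tokenVal"].Value : null;
    }
}
```

Returning null on a failed match lets the caller distinguish "no token in the page" from an empty token, which is the case the original code reports in txbExtractedTokenVal.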

Then fill in your Baidu username and password and click "模拟登陆百度首页" (Emulate login to Baidu home page), which calls the following code:

private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");

        string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;
        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();
        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();

        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
        foreach (String cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }

        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }

        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }

        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "成功模拟登陆百度首页!";
        }
        else
        {
            txbEmulateLoginResult.Text = "模拟登陆百度首页 失败!";
            txbEmulateLoginResult.Text += Environment.NewLine + "所返回的Header信息为:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "所返回的HTML源码为:";
            txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
        }
    }
    else
    {
        MessageBox.Show("错误:没有正确获得Cookie BAIDUID 和/或 没有正确提取出token值!");
    }
}

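If the username and password are both correct, the login succeeds:

[Screenshot: entering a valid username and password and clicking the login button reports success]

If you deliberately enter a wrong username or password, the login failure is reported and the returned headers and HTML are printed:

[Screenshot: a fake username and password produce a login failure]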
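The success test in btnEmulateLoginBaidu_Click comes down to asking whether every expected cookie name appears in the response. One possible way to isolate that check, so it can be exercised with an in-memory CookieCollection, is sketched below. The class LoginCheckSketch and the method MissingCookies are hypothetical names and not part of the original project. The four cookie names are the ones listed in the flow table.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration only; not part of the original project.
public static class LoginCheckSketch
{
    // Cookie names the article expects after a successful login.
    public static readonly string[] ExpectedCookies =
        { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };

    // Returns the expected cookie names that are absent from cookies.
    // An empty result corresponds to allCookiesFound == true in the original code.
    public static List<string> MissingCookies(CookieCollection cookies)
    {
        HashSet<string> present = new HashSet<string>();
        foreach (Cookie ck in cookies)
        {
            present.Add(ck.Name);
        }

        List<string> missing = new List<string>();
        foreach (string name in ExpectedCookies)
        {
            if (!present.Contains(name))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

Returning the missing names rather than a single bool makes a failed login easier to diagnose, since the message can say which cookie was not set.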

The complete C# code for emulating a login to the Baidu home page is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;

        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            //init
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /******************************************************************************
        functions in crifanLib.cs
        *******************************************************************************/

        //quote the input dict values
        //note: the return result for first para no '&'
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }

            return quotedParas;
        }

        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/

        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            //generate http request
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

            //add follow code to handle cookies
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(curCookies);

            req.Method = "GET";
            //use request to get response
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("错误:没有找到cookie BAIDUID !");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

                //add previously got cookies
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);

                req.Method = "GET";
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string respHtml = sr.ReadToEnd();

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "错误:没有找到token的值!";
                }
            }
            else
            {
                MessageBox.Show("错误:之前没有正确获得Cookie:BAIDUID !");
            }
        }

        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
                //add cookie
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                //set to POST
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                //prepare post data
                string postDataStr = quoteParas(postDict);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;
                //send post data
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
                //got response
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                //got returned html
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string loginBaiduRespHtml = sr.ReadToEnd();

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

                foreach (Cookie singleCookie in resp.Cookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "成功模拟登陆百度首页!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "模拟登陆百度首页 失败!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "所返回的Header信息为:";
                    txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
                    txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
                    txbEmulateLoginResult.Text += Environment.NewLine + "所返回的HTML源码为:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("错误:没有正确获得Cookie BAIDUID 和/或 没有正确提取出token值!");
            }
        }

        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";

            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}

The corresponding complete VS2010 C# project can be downloaded here:

emulateLoginBaidu_csharp_2012-11-07.7z
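One detail of quoteParas is worth a note: it encodes values with HttpUtility.UrlPathEncode, which is intended for the path part of a URL and leaves characters such as '&' and '=' untouched, so a password containing '&' would likely be split into two form fields. The sketch below shows one possible alternative for building an application/x-www-form-urlencoded body with Uri.EscapeDataString, which percent-encodes those characters. The class FormEncodingSketch and the method BuildFormBody are hypothetical names and not part of the original project or of crifanLib.cs. Whether Baidu's server accepts %20 for spaces the same way as '+' is an assumption that was not verified here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the original project.
public static class FormEncodingSketch
{
    // Joins the fields as name=value pairs separated by '&',
    // percent-encoding every name and value so that reserved
    // characters inside them cannot break the pair structure.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        List<string> pairs = new List<string>();
        foreach (KeyValuePair<string, string> field in fields)
        {
            pairs.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", pairs.ToArray());
    }
}
```

Because Dictionary<string, string> implements IEnumerable<KeyValuePair<string, string>>, the existing postDict could be passed to BuildFormBody directly in place of the quoteParas call.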

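[Version 2: complete C# code for emulating a Baidu home page login, crifanLib.cs version]

The code above was later reworked to use my C# library crifanLib.cs, so that the networking helper functions can be reused in the future.

The complete C# code for the version that uses crifanLib.cs follows: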

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;

        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            this.AcceptButton = this.btnEmulateLoginBaidu;

            //init for crifanLib.cs
            curCookies = new CookieCollection();

            //init for demo login
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /******************************************************************************
        functions in crifanLib.cs
        Online browser: http://code.google.com/p/crifanlib/source/browse/trunk/csharp/crifanLib.cs
        Download:       http://code.google.com/p/crifanlib/
        *******************************************************************************/

        //quote the input dict values
        //note: the return result for first para no '&'
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }

            return quotedParas;
        }

        /*********************************************************************/
        /* cookie */
        /*********************************************************************/

        //add a single cookie to cookies, if already exist, update its value
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)
        {
            bool found = false;

            if (cookies.Count > 0)
            {
                foreach (Cookie originalCookie in cookies)
                {
                    if (originalCookie.Name == toAdd.Name)
                    {
                        // !!! for different domain, cookie is not same,
                        // so should not set the cookie value here while their domains is not same
                        // only if it explictly need overwrite domain
                        if ((originalCookie.Domain == toAdd.Domain) ||
                            ((originalCookie.Domain != toAdd.Domain) && overwriteDomain))
                        {
                            //here can not force convert CookieCollection to HttpCookieCollection,
                            //then use .remove to remove this cookie then add
                            // so no good way to copy all field value
                            originalCookie.Value = toAdd.Value;

                            originalCookie.Domain = toAdd.Domain;

                            originalCookie.Expires = toAdd.Expires;
                            originalCookie.Version = toAdd.Version;
                            originalCookie.Path = toAdd.Path;

                            //following fields seems should not change
                            //originalCookie.HttpOnly = toAdd.HttpOnly;
                            //originalCookie.Secure = toAdd.Secure;

                            found = true;
                            break;
                        }
                    }
                }
            }

            if (!found)
            {
                if (toAdd.Domain != "")
                {
                    // if add the null domain, will lead to follow req.CookieContainer.Add(cookies) failed !!!
                    cookies.Add(toAdd);
                }
            }

        }//addCookieToCookies

        //add singel cookie to cookies, default no overwrite domain
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)
        {
            addCookieToCookies(toAdd, ref cookies, false);
        }

        //check whether the cookies contains the ckToCheck cookie
        //support:
        //ckTocheck is Cookie/string
        //cookies is Cookie/string/CookieCollection/string[]
        public bool isContainCookie(object ckToCheck, object cookies)
        {
            bool isContain = false;

            if ((ckToCheck != null) && (cookies != null))
            {
                string ckName = "";
                Type type = ckToCheck.GetType();

                //string typeStr = ckType.ToString();

                //if (ckType.FullName == "System.string")
                if (type.Name.ToLower() == "string")
                {
                    ckName = (string)ckToCheck;
                }
                else if (type.Name == "Cookie")
                {
                    ckName = ((Cookie)ckToCheck).Name;
                }

                if (ckName != "")
                {
                    type = cookies.GetType();

                    // is single Cookie
                    if (type.Name == "Cookie")
                    {
                        if (ckName == ((Cookie)cookies).Name)
                        {
                            isContain = true;
                        }
                    }
                    // is CookieCollection
                    else if (type.Name == "CookieCollection")
                    {
                        foreach (Cookie ck in (CookieCollection)cookies)
                        {
                            if (ckName == ck.Name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                    // is single cookie name string
                    else if (type.Name.ToLower() == "string")
                    {
                        if (ckName == (string)cookies)
                        {
                            isContain = true;
                        }
                    }
                    // is cookie name string[]
                    else if (type.Name.ToLower() == "string[]")
                    {
                        foreach (string name in ((string[])cookies))
                        {
                            if (ckName == name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                }
            }

            return isContain;
        }//isContainCookie

        // update cookiesToUpdate to localCookies
        // if omitUpdateCookies designated, then omit cookies of omitUpdateCookies in cookiesToUpdate
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)
        {
            if (cookiesToUpdate.Count > 0)
            {
                if (localCookies == null)
                {
                    localCookies = cookiesToUpdate;
                }
                else
                {
                    foreach (Cookie newCookie in cookiesToUpdate)
                    {
                        if (isContainCookie(newCookie, omitUpdateCookies))
                        {
                            // need omit process this
                        }
                        else
                        {
                            addCookieToCookies(newCookie, ref localCookies);
                        }
                    }
                }
            }
        }//updateLocalCookies

        //update cookiesToUpdate to localCookies
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)
        {
            updateLocalCookies(cookiesToUpdate, ref localCookies, null);
        }

        /*********************************************************************/
        /* HTTP */
        /*********************************************************************/

        /* get url's response */
        public HttpWebResponse getUrlResponse(string url,
                                        Dictionary<string, string> headerDict,
                                        Dictionary<string, string> postDict,
                                        int timeout,
                                        string postDataStr)
        {
            //CookieCollection parsedCookies;

            HttpWebResponse resp = null;

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

            req.AllowAutoRedirect = true;
            req.Accept = "*/*";

            //const string gAcceptLanguage = "en-US"; // zh-CN/en-US
            //req.Headers["Accept-Language"] = gAcceptLanguage;

            req.KeepAlive = true;

            //IE8
            //const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";
            //IE9
            //const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
            const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86
            //Chrome
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
            //Mozilla Firefox
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
            req.UserAgent = gUserAgent;

            req.Headers["Accept-Encoding"] = "gzip, deflate";
            req.AutomaticDecompression = DecompressionMethods.GZip;

            req.Proxy = null;

            if (timeout > 0)
            {
                req.Timeout = timeout;
            }

            if (curCookies != null)
            {
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.PerDomainCapacity = 40; // following will exceed max default 20 cookie per domain
                req.CookieContainer.Add(curCookies);
            }

            if (headerDict != null)
            {
                foreach (string header in headerDict.Keys)
                {
                    string headerValue = "";
                    if (headerDict.TryGetValue(header, out headerValue))
                    {
                        // following are allow the caller overwrite the default header setting
                        if (header.ToLower() == "referer")
                        {
                            req.Referer = headerValue;
                        }
                        else if (header.ToLower() == "allowautoredirect")
                        {
                            bool isAllow = false;
                            if (bool.TryParse(headerValue, out isAllow))
                            {
                                req.AllowAutoRedirect = isAllow;
                            }
                        }
                        else if (header.ToLower() == "accept")
                        {
                            req.Accept = headerValue;
                        }
                        else if (header.ToLower() == "keepalive")
                        {
                            bool isKeepAlive = false;
                            if (bool.TryParse(headerValue, out isKeepAlive))
                            {
                                req.KeepAlive = isKeepAlive;
                            }
                        }
                        else if (header.ToLower() == "accept-language")
                        {
                            req.Headers["Accept-Language"] = headerValue;
                        }
                        else if (header.ToLower() == "useragent")
                        {
                            req.UserAgent = headerValue;
                        }
                        else
                        {
                            req.Headers[header] = headerValue;
                        }
                    }
                    else
                    {
                        break;
                    }
                }
            }

            if (postDict != null || postDataStr != "")
            {
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";

                if (postDict != null)
                {
                    postDataStr = quoteParas(postDict);
                }

                //byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;

                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
            }
            else
            {
                req.Method = "GET";
            }

            //may timeout, has fixed in:
            //http://www.crifan.com/fixed_problem_sometime_httpwebrequest_getresponse_timeout/
            resp = (HttpWebResponse)req.GetResponse();

            updateLocalCookies(resp.Cookies, ref curCookies);

            return resp;
        }

        public HttpWebResponse getUrlResponse(string url,
                                    Dictionary<string, string> headerDict,
                                    Dictionary<string, string> postDict)
        {
            return getUrlResponse(url, headerDict, postDict, 0, "");
        }

        public HttpWebResponse getUrlResponse(string url)
        {
            return getUrlResponse(url, null, null, 0, "");
        }

        // valid charset:"GB18030"/"UTF-8", invliad:"UTF8"
        public string getUrlRespHtml(string url,
                                        Dictionary<string, string> headerDict,
                                        string charset,
                                        Dictionary<string, string> postDict,
                                        int timeout,
                                        string postDataStr)
        {
            string respHtml = "";

            //HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);
            HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);

            //long realRespLen = resp.ContentLength;

            StreamReader sr;
            if ((charset != null) && (charset != ""))
            {
                Encoding htmlEncoding = Encoding.GetEncoding(charset);
                sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);
            }
            else
            {
                sr = new StreamReader(resp.GetResponseStream());
            }
            respHtml = sr.ReadToEnd();

            return respHtml;
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, string charset, Dictionary<string, string> postDict, string postDataStr)
        {
            return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict)
        {
            return getUrlRespHtml(url, headerDict, "", postDict, "");
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict)
        {
            return getUrlRespHtml(url, headerDict, null);
        }

        public string getUrlRespHtml(string url, string charset, int timeout)
        {
            return getUrlRespHtml(url, null, charset, null, timeout, "");
        }

        public string getUrlRespHtml(string url, string charset)
        {
            return getUrlRespHtml(url, charset, 0);
        }

        public string getUrlRespHtml(string url)
        {
            return getUrlRespHtml(url, "");
        }

        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/

        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            HttpWebResponse resp = getUrlResponse(baiduMainUrl);
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("错误:没有找到cookie BAIDUID !");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                string respHtml = getUrlRespHtml(getapiUrl);

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string
tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);                if (foundTokenVal.Success)                {                    //extracted the token value                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;                    extractTokenValueOK = true;                }                else                {                    txbExtractedTokenVal.Text = "错误:没有找到token的值!";                }             }            else            {                MessageBox.Show("错误:之前没有正确获得Cookie:BAIDUID !");            }        }         private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)        {            if (gotCookieBaiduid && extractTokenValueOK)            {                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";                                 //init post dict info                Dictionary<string, string> postDict = new Dictionary<string, string>();                //postDict.Add("ppui_logintime", "");                postDict.Add("charset", "utf-8");                //postDict.Add("codestring", "");                postDict.Add("token", txbExtractedTokenVal.Text);                postDict.Add("isPhone", "false");                postDict.Add("index", "0");                //postDict.Add("u", "");                //postDict.Add("safeflg", "0");                postDict.Add("staticpage", staticpage);                postDict.Add("loginType", "1");                postDict.Add("tpl", "mn");                postDict.Add("callback", "parent.bdPass.api.login._postCallback");                postDict.Add("username", txbBaiduUsername.Text);                postDict.Add("password", txbBaiduPassword.Text);                //postDict.Add("verifycode", "");                postDict.Add("mem_pass", "on");                 string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";                string 
loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);                 //check whether got all expected cookies                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();                string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};                foreach (String cookieToCheck in cookiesNameList)                {                    cookieCheckDict.Add(cookieToCheck, false);                 }                 foreach (Cookie singleCookie in curCookies)                {                    if (cookieCheckDict.ContainsKey(singleCookie.Name))                    {                        cookieCheckDict[singleCookie.Name] = true;                    }                }                 bool allCookiesFound = true;                foreach (bool foundCurCookie in cookieCheckDict.Values)                {                    allCookiesFound = allCookiesFound && foundCurCookie;                 }                  loginBaiduOk = allCookiesFound;                if (loginBaiduOk)                {                    txbEmulateLoginResult.Text = "成功模拟登陆百度首页!";                }                else                {                    txbEmulateLoginResult.Text = "模拟登陆百度首页 失败!";                    txbEmulateLoginResult.Text += Environment.NewLine + "所返回的HTML源码为:";                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;                }            }            else            {                MessageBox.Show("错误:没有正确获得Cookie BAIDUID 和/或 没有正确提取出token值!");            }        }         private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)        {            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);        }         private void btnClearAll_Click(object sender, EventArgs e)        {            curCookies = new 
CookieCollection();            gotCookieBaiduid = false;            extractTokenValueOK = false;            loginBaiduOk = false;             txbGotBaiduid.Text = "";            txbExtractedTokenVal.Text = "";             txbBaiduUsername.Text = "";            txbBaiduPassword.Text = "";            txbEmulateLoginResult.Text = "";        }     }}

The complete VS2010 project can be downloaded here:

emulateLoginBaidu_csharp_crifanLibVersion_2012-11-07.7z

About crifanLib.cs:

Browse online: crifanLib.cs

Download: crifanLib_2012-11-07.7z

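【Summary】

As the code shows, although the login flow for the Baidu homepage analyzed earlier is not especially complicated, implementing it in C# is considerably more work than implementing it in Python.

The main reason is that Python already wraps many common, convenient library functions. In C#, many details have to be handled by hand, including the various parameters of a GET or POST request, and anything involving cookies is especially tedious.

So for scraping pages, analyzing their content, and emulating website login, Python is still the more convenient choice.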
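As one illustration of the details C# leaves to you, the POST body has to be assembled and percent-encoded by hand before it is written to the request stream. The listing above delegates this to `quoteParas(postDict)`. The sketch below shows roughly what such a helper has to do. It is not the crifanLib implementation: the class name `FormBodySketch` and the method `BuildFormBody` are hypothetical, and the real `quoteParas` may encode differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormBodySketch
{
    // Hypothetical helper (not from crifanLib): joins key/value pairs into an
    // application/x-www-form-urlencoded body, percent-encoding every key and value.
    // A Dictionary<string, string> can be passed directly, because it implements
    // IEnumerable<KeyValuePair<string, string>>.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0)
            {
                body.Append('&');
            }
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            // Treat a null value as an empty string; EscapeDataString rejects null.
            body.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return body.ToString();
    }
}
```

Note that `Uri.EscapeDataString` encodes a space as `%20` rather than the `+` that browsers send for form posts. Servers normally accept both, but that is an assumption worth checking against the target site.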

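【Postscript 2013-09-11】

1. After some investigation:

【Record】Investigating why the C# code for emulating Baidu login does not work in .NET 4.0

it turns out that, indeed:

the earlier code works correctly on .NET 3.5 and earlier, but does not work on .NET 4.0;

2. The cause has now been found and fixed.

The cause is:

For a cookie that does not specify an expires field, .NET 4.0 sets the cookie's expires value to the default of year 0001 (0001-01-01 00:00:00). The cookie is therefore treated as expired, which makes this Baidu cookie:

H_PS_PSSID

invalid, and every subsequent step then fails.

On .NET 3.5 and earlier, the cookie's expires value is also the default of year 0001 (0001-01-01 00:00:00), but the cookie remains usable in practice, so the later steps work and the problem does not occur;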

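3. The fixed code:

Available for download:

(1) Emulate Baidu login, standalone complete-code version, .NET 4.0

emulateLoginBaidu_csharp_independentCodeVersion_2013-09-11.7z

(2) Emulate Baidu login, crifanLib version (using my own library), .NET 4.0

emulateLoginBaidu_csharp_crifanLibVersion_2013-09-11.7z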

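(I will upload the two files above when I have time, because the upload currently fails with:

xxx.7z: 

unknown Bytes complete FAILED!

:Upload canceled

: VIRUS DETECTED!

(Heuristics.Broken.Executable FOUND)

I will try uploading again another time. If the same error persists, I will look into it then.)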
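The fixed code itself is in the downloads above and is not reproduced here. Purely as an illustration of the expires problem described in item 2, one possible workaround is to copy the received cookies and give any cookie that still has the default expiry an explicit future date before storing it. This is a sketch under that assumption, not the author's fix: `CookieExpirySketch` and `WithExplicitExpiry` are hypothetical names, and whether this is enough on .NET 4.0 has not been verified here.

```csharp
using System;
using System.Net;

public static class CookieExpirySketch
{
    // Hypothetical workaround: returns a copy of the collection in which every
    // cookie whose Expires is still the default (DateTime.MinValue, i.e. year 0001)
    // gets the supplied fallback expiry. Cookies with a real expiry keep it.
    public static CookieCollection WithExplicitExpiry(CookieCollection cookies, DateTime fallbackExpiry)
    {
        CookieCollection result = new CookieCollection();
        foreach (Cookie ck in cookies)
        {
            Cookie copy = new Cookie(ck.Name, ck.Value, ck.Path, ck.Domain);
            copy.Secure = ck.Secure;
            copy.HttpOnly = ck.HttpOnly;
            copy.Expires = (ck.Expires == DateTime.MinValue) ? fallbackExpiry : ck.Expires;
            result.Add(copy);
        }
        return result;
    }
}
```

In the listing above, this could be applied to `resp.Cookies` before the result is merged into `curCookies`, for example with a fallback of `DateTime.Now.AddDays(1)`. The input collection is left unchanged.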

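【Summary】

.NET, whether 3.5 and earlier or the latest 4.0, when parsing the Set-Cookie of an HTTP response into a CookieCollection:

has always been crap, with bugs everywhere.

For details, see:

SetCookie parsing has bugs

From now on, use resp.Cookies as little as you can.

Otherwise C# will be the death of you, and you will not even know how it happened.

It is better to parse Set-Cookie with your own parsing function and get a correct CookieCollection that way.

For details, see:

Parsing the Set-Cookie string (returned by an HTTP request) into a Cookie array: parseSetCookie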
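To give a rough idea of what parsing Set-Cookie yourself involves, here is a minimal sketch that turns a single Set-Cookie header value into a `Cookie`. It is not the parseSetCookie function referred to above: `SetCookieSketch` and `ParseSingleSetCookie` are hypothetical names, and the sketch deliberately ignores the harder case where several cookies arrive folded into one comma-separated header, which is awkward because expires dates contain commas themselves.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class SetCookieSketch
{
    // Hypothetical helper: parses ONE Set-Cookie header value such as
    // "BAIDUID=ABC123:FG=1; path=/; domain=.baidu.com; HttpOnly".
    public static Cookie ParseSingleSetCookie(string headerValue)
    {
        string[] parts = headerValue.Split(';');

        // The first segment is name=value; split only on the first '='
        // so that '=' characters inside the value are preserved.
        string[] nameValue = parts[0].Split(new char[] { '=' }, 2);
        if (nameValue.Length != 2 || nameValue[0].Trim().Length == 0)
        {
            throw new FormatException("Set-Cookie does not start with name=value");
        }
        Cookie cookie = new Cookie(nameValue[0].Trim(), nameValue[1].Trim());

        // The remaining segments are attributes; unknown ones are ignored.
        for (int i = 1; i < parts.Length; i++)
        {
            string[] attr = parts[i].Split(new char[] { '=' }, 2);
            string attrName = attr[0].Trim().ToLowerInvariant();
            string attrValue = (attr.Length == 2) ? attr[1].Trim() : "";

            if (attrName == "path")
            {
                cookie.Path = attrValue;
            }
            else if (attrName == "domain")
            {
                cookie.Domain = attrValue;
            }
            else if (attrName == "expires")
            {
                // If the date cannot be parsed, Expires keeps its default value.
                DateTime expires;
                if (DateTime.TryParse(attrValue, CultureInfo.InvariantCulture,
                                      DateTimeStyles.AdjustToUniversal, out expires))
                {
                    cookie.Expires = expires;
                }
            }
            else if (attrName == "secure")
            {
                cookie.Secure = true;
            }
            else if (attrName == "httponly")
            {
                cookie.HttpOnly = true;
            }
        }
        return cookie;
    }
}
```

The raw header text is available from `resp.Headers["Set-Cookie"]`. Splitting that text into individual cookies is the part this sketch leaves out.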
