Emulating Website Login: C# Edition (Includes Two Complete, Runnable Versions of the Code)

Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 16:01

[Tutorial] Emulating Website Login: C# Edition (Includes Two Complete, Runnable Versions of the Code)

http://www.crifan.com/emulate_login_website_using_csharp/

This site is excellent. Its author has researched many of the fundamentals of HTTP POST.

He also seems to have worked on drivers.



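Some networking fundamentals were already covered in:

[Notes] The logic, process, and caveats of crawling web pages, analyzing their content, and emulating website logins

Simple web-page scraping in C# was covered in:

[Tutorial] Crawling a web page and extracting the needed information: C# edition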

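This article builds on those. It uses an emulated login to the Baidu homepage,

http://www.baidu.com/

as the example of how to emulate logging in to a website with C#.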

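This article has two prerequisites. It assumes that you have read:

[Notes] The logic, process, and caveats of crawling web pages, analyzing their content, and emulating website logins

and therefore understand the basic networking concepts. It also assumes that you have read:

[Summary] Developer tools in the browser (IE9's F12 and Chrome's Ctrl+Shift+I): powerful tools for web page analysis

and therefore know how to use tools such as IE9's F12 to analyze what a web page does as it runs.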


1. Before emulating a login, work out the site's internal login logic

 

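Before you implement an emulated login to the Baidu homepage in a program (here, C# code), you need to understand what happens internally when you log in to the site yourself.

To learn how to use tools to work out the internal login process of the Baidu homepage, see:

[Tutorial] A step-by-step guide to analyzing the internal logic of a website login (the Baidu homepage) with tools (IE9's F12)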

2. Only then implement the emulated login logic in the chosen language (C#)

 

Once you understand the internal login process of the Baidu homepage, as analyzed above with F12, implementing it in C# is relatively straightforward.

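Notes:

(1) If you are not familiar with using cookies in C#, read this first:

[Experience summary] HTTP, web page access, HttpRequest and HttpResponse

(2) If you are not familiar with regular expressions, see:

Notes on learning regular expressions

(3) If you are not familiar with Regex, the C# regular expression class, see:

Notes on learning regular expressions in C#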

The analyzed process is repeated here so you can compare it with the code:

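Step | URL | Method | Data sent | Value to obtain/extract from the response
1 | http://www.baidu.com/ | GET | none | the BAIDUID cookie among the returned cookies
2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | the BAIDUID cookie | the token value, extracted from the returned HTML
3 | https://passport.baidu.com/v2/api/?login | POST | a set of POST data, including the token value extracted in step 2 | verify that the returned cookies include BDUSS, PTOKEN, STOKEN, and SAVEUSERID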
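Steps 2 and 3 both need cookies from earlier steps to reach a different host: passport.baidu.com instead of www.baidu.com. The code below handles this by copying a CookieCollection into a new CookieContainer for every request. An alternative is to keep a single CookieContainer for the whole session and let it apply the cookie domain-matching rules itself. This is a minimal sketch under a few assumptions. SharedCookieDemo is a hypothetical name. The cookie's domain (.baidu.com) and its value are made up for illustration, because the article does not show the actual Set-Cookie header.

```csharp
using System;
using System.Net;

// Hypothetical demo: one CookieContainer, reused across requests, carries a cookie
// set for .baidu.com to both www.baidu.com and passport.baidu.com.
public static class SharedCookieDemo
{
    // Builds a container holding a BAIDUID cookie, as if step 1's response had set it.
    // Assumption: the domain ".baidu.com" and the value are illustrative only.
    public static CookieContainer CreateWithBaiduid()
    {
        var container = new CookieContainer();
        container.Add(new Cookie("BAIDUID", "PLACEHOLDER_VALUE", "/", ".baidu.com"));
        return container;
    }

    // Returns true if the container would attach a cookie with this name to a
    // request for the given URL (CookieContainer does the domain/path matching).
    public static bool WouldSend(CookieContainer container, string url, string cookieName)
    {
        foreach (Cookie ck in container.GetCookies(new Uri(url)))
        {
            if (ck.Name == cookieName)
            {
                return true;
            }
        }
        return false;
    }
}
```

In the handlers, this would mean creating the container once (for example, as a form field) and assigning it to req.CookieContainer on every request. When CookieContainer is set, HttpWebRequest adds the cookies from each response to that container.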

With that, we can write the C# code that demonstrates the emulated login to the Baidu homepage.

[Version 1: Complete C# code for emulating the Baidu homepage login, minimal edition]

In the UI, clicking "Get cookie BAIDUID":

[Screenshot: clicking "Get cookie BAIDUID" displays the value of the BAIDUID cookie]

calls the following code:

private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    //http://www.baidu.com/
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

    //attach the cookies stored so far to the request
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);

    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }

    if (gotCookieBaiduid)
    {
        //store cookies
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID not found!");
    }
}

This retrieves the value of the BAIDUID cookie shown above.

Next, clicking "Get token value" calls the following code:

private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);

        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();

        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "Error: token value not found!";
        }

    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID was not obtained correctly beforehand!");
    }
}

This retrieves the corresponding token value:

[Screenshot: clicking "Get token value" displays the extracted token]
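The token is pulled out of the HTML with a regular expression. Isolating that step in a small function lets you test it against saved HTML without any network access. This is a minimal sketch that reuses the pattern from btnGetToken_Click. TokenExtractor and ExtractLoginToken are hypothetical names, and the pattern only matches the page as it looked when this article was written.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the login token from the getapi response HTML.
public static class TokenExtractor
{
    // Same pattern as btnGetToken_Click; the token sits in a JavaScript assignment
    // of the form  bdPass.api.params.login_token='...';
    private static readonly Regex TokenPattern =
        new Regex(@"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token, or null if the page does not contain it
    // (for example, if Baidu has since changed the page).
    public static string ExtractLoginToken(string respHtml)
    {
        Match m = TokenPattern.Match(respHtml ?? "");
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }
}
```

In btnGetToken_Click, the regex code would then become a call such as string token = TokenExtractor.ExtractLoginToken(respHtml); followed by a null check.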

Then fill in your Baidu username and password and click "Emulate login to Baidu homepage". This calls the following code:

private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");

        string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;
        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();
        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();

        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
        foreach (String cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }

        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }

        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }


        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage succeeded!";
        }
        else
        {
            txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage failed!";
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
            txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
        }
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted correctly!");
    }
}
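The login handler sends its POST data as an application/x-www-form-urlencoded body built by quoteParas. In that format every key and value must be percent-encoded, including reserved characters such as &, = and +. HttpUtility.UrlPathEncode does not escape those characters, so a password containing & or + would corrupt the body. A minimal sketch of a form-body builder follows. FormBody.Build is a hypothetical helper. It uses Uri.EscapeDataString, which is a swapped-in choice: it writes spaces as %20 rather than +, and standard form decoders accept both.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for building application/x-www-form-urlencoded bodies.
public static class FormBody
{
    // Joins key=value pairs with '&', percent-encoding every key and value so that
    // '&', '=', '+' and spaces inside a value cannot corrupt the body.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&'); // separator between pairs, never before the first one
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return sb.ToString();
    }
}
```

Dictionary<string, string> implements IEnumerable<KeyValuePair<string, string>>, so postDict can be passed directly: byte[] postBytes = Encoding.UTF8.GetBytes(FormBody.Build(postDict));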

If the username and password are both correct, the login succeeds:

[Screenshot: entering a valid username and password and clicking login reports success]

If you deliberately enter a wrong username or password, a login error is shown, along with the returned headers and HTML:

[Screenshot: a fake username and password cause the login to fail]
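Whether the login succeeded is decided entirely by which cookies come back from the POST to https://passport.baidu.com/v2/api/?login. The dictionary-based loop in btnEmulateLoginBaidu_Click can be factored into a reusable check. This is a minimal sketch: LoginCheck.HasAllCookies is a hypothetical name, and like the original it compares names case-sensitively. The four cookie names come from the analysis above and may no longer match what Baidu returns today.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper mirroring the success test in btnEmulateLoginBaidu_Click.
public static class LoginCheck
{
    // Returns true only if every expected cookie name appears in the response cookies.
    public static bool HasAllCookies(CookieCollection cookies, params string[] expectedNames)
    {
        // Collect the names actually returned (ordinal, i.e. case-sensitive, like the original).
        var found = new HashSet<string>(StringComparer.Ordinal);
        foreach (Cookie ck in cookies)
        {
            found.Add(ck.Name);
        }

        foreach (string name in expectedNames)
        {
            if (!found.Contains(name))
            {
                return false;
            }
        }
        return true;
    }
}
```

In the handler, this would read: loginBaiduOk = LoginCheck.HasAllCookies(resp.Cookies, "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID");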

The complete C# code for emulating the Baidu homepage login is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
  
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;

        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            //init
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }
  
        /******************************************************************************
        functions in crifanLib.cs
        *******************************************************************************/
  
        //percent-encode each value and join the pairs as key1=val1&key2=val2...
        //note: no '&' is added before the first pair
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    //UrlEncode (not UrlPathEncode) so that '&', '=' and '+' inside values are escaped
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }

            return quotedParas;
        }
  
        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/
  
        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            //generate http request
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

            //attach the cookies stored so far to the request
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(curCookies);

            req.Method = "GET";
            //use request to get response
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }
  
        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

                //add previously got cookies
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);

                req.Method = "GET";
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string respHtml = sr.ReadToEnd();

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }

            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained correctly beforehand!");
            }
        }
  
        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
                //add cookie
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                //set to POST
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                //prepare post data
                string postDataStr = quoteParas(postDict);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;
                //send post data
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
                //got response
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                //got returned html
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string loginBaiduRespHtml = sr.ReadToEnd();

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

                foreach (Cookie singleCookie in resp.Cookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }


                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage succeeded!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage failed!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
                    txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
                    txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted correctly!");
            }
        }
  
        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";

            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}

The corresponding complete VS2010 C# project can be downloaded here:

emulateLoginBaidu_csharp_2012-11-07.7z

[Version 2: Complete C# code for emulating the Baidu homepage login, crifanLib.cs edition]

Later, I reworked the code above to use crifanLib.cs, my C# helper library, so that its networking functions can be reused in future projects.
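Most of what crifanLib.cs adds here is cookie bookkeeping. In the helpers shown below, addCookieToCookies and updateLocalCookies merge newly received cookies into a local CookieCollection. They overwrite a cookie only when both its name and its domain match, and they never add a cookie whose domain is empty, because a later req.CookieContainer.Add would fail on it. The following is a simplified sketch of that merge idea. CookieMerge.Merge is a hypothetical name, and it differs from addCookieToCookies in two ways: it replaces the whole Cookie object instead of copying selected fields, and it also drops empty-domain cookies that are already in the local collection.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical, simplified take on updateLocalCookies/addCookieToCookies.
public static class CookieMerge
{
    // Merges 'updates' into 'local': a cookie with the same Name and Domain replaces
    // the earlier one; any other cookie is appended. Cookies with an empty Domain
    // are skipped, because CookieContainer.Add rejects them.
    public static CookieCollection Merge(CookieCollection local, CookieCollection updates)
    {
        var byKey = new Dictionary<string, Cookie>(StringComparer.Ordinal);
        var order = new List<string>(); // keeps first-seen order of keys

        foreach (CookieCollection source in new[] { local ?? new CookieCollection(), updates ?? new CookieCollection() })
        {
            foreach (Cookie ck in source)
            {
                if (string.IsNullOrEmpty(ck.Domain))
                {
                    continue;
                }
                string key = ck.Name + "|" + ck.Domain;
                if (!byKey.ContainsKey(key))
                {
                    order.Add(key);
                }
                byKey[key] = ck; // later cookies (from 'updates') win
            }
        }

        var merged = new CookieCollection();
        foreach (string key in order)
        {
            merged.Add(byKey[key]);
        }
        return merged;
    }
}
```

Keying on name plus domain, rather than name alone, follows the comment in addCookieToCookies: two cookies with the same name but different domains are different cookies.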

Below is the complete C# code of the version that uses crifanLib.cs:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
  
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;

        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            this.AcceptButton = this.btnEmulateLoginBaidu;

            //init for crifanLib.cs
            curCookies = new CookieCollection();

            //init for demo login
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }
  
        /******************************************************************************
        functions in crifanLib.cs
        Online browser:http://code.google.com/p/crifanlib/source/browse/trunk/csharp/crifanLib.cs
        Download:      http://code.google.com/p/crifanlib/
        *******************************************************************************/
  
        //percent-encode each value and join the pairs as key1=val1&key2=val2...
        //note: no '&' is added before the first pair
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    //UrlEncode (not UrlPathEncode) so that '&', '=' and '+' inside values are escaped
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }

            return quotedParas;
        }
  
  
        /*********************************************************************/
        /* cookie */
        /*********************************************************************/
  
        //add a single cookie to cookies; if it already exists, update its value
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)
        {
            bool found = false;

            if (cookies.Count > 0)
            {
                foreach (Cookie originalCookie in cookies)
                {
                    if (originalCookie.Name == toAdd.Name)
                    {
                        // !!! cookies with the same name but different domains are different cookies,
                        // so do not overwrite the value here when the domains differ,
                        // unless overwriting the domain is explicitly requested
                        if ((originalCookie.Domain == toAdd.Domain) ||
                            ((originalCookie.Domain != toAdd.Domain) && overwriteDomain))
                        {
                            //CookieCollection cannot be cast to HttpCookieCollection to .Remove
                            //this cookie and re-add it, and there is no good way to copy
                            //every field, so copy the main fields one by one
                            originalCookie.Value = toAdd.Value;

                            originalCookie.Domain = toAdd.Domain;

                            originalCookie.Expires = toAdd.Expires;
                            originalCookie.Version = toAdd.Version;
                            originalCookie.Path = toAdd.Path;

                            //the following fields apparently should not change
                            //originalCookie.HttpOnly = toAdd.HttpOnly;
                            //originalCookie.Secure = toAdd.Secure;

                            found = true;
                            break;
                        }
                    }
                }
            }

            if (!found)
            {
                if (toAdd.Domain != "")
                {
                    // adding a cookie with an empty domain would make a later req.CookieContainer.Add(cookies) fail !!!
                    cookies.Add(toAdd);
                }
            }

        }//addCookieToCookies

        //add a single cookie to cookies; by default, do not overwrite the domain
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)
        {
            addCookieToCookies(toAdd, ref cookies, false);
        }
  
        //check whether cookies contains the ckToCheck cookie
        //supported types:
        //ckToCheck is Cookie/string
        //cookies is Cookie/string/CookieCollection/string[]
        public bool isContainCookie(object ckToCheck, object cookies)
        {
            bool isContain = false;

            if ((ckToCheck != null) && (cookies != null))
            {
                string ckName = "";
                Type type = ckToCheck.GetType();

                //string typeStr = ckType.ToString();

                //if (ckType.FullName == "System.string")
                if (type.Name.ToLower() == "string")
                {
                    ckName = (string)ckToCheck;
                }
                else if (type.Name == "Cookie")
                {
                    ckName = ((Cookie)ckToCheck).Name;
                }

                if (ckName != "")
                {
                    type = cookies.GetType();

                    // is single Cookie
                    if (type.Name == "Cookie")
                    {
                        if (ckName == ((Cookie)cookies).Name)
                        {
                            isContain = true;
                        }
                    }
                    // is CookieCollection
                    else if (type.Name == "CookieCollection")
                    {
                        foreach (Cookie ck in (CookieCollection)cookies)
                        {
                            if (ckName == ck.Name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                    // is single cookie name string
                    else if (type.Name.ToLower() == "string")
                    {
                        if (ckName == (string)cookies)
                        {
                            isContain = true;
                        }
                    }
                    // is cookie name string[]
                    else if (type.Name.ToLower() == "string[]")
                    {
                        foreach (string name in ((string[])cookies))
                        {
                            if (ckName == name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                }
            }

            return isContain;
        }//isContainCookie
  
        // merge cookiesToUpdate into localCookies
        // if omitUpdateCookies is given, skip every cookie in cookiesToUpdate that it contains
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)
        {
            if (cookiesToUpdate.Count > 0)
            {
                if (localCookies == null)
                {
                    localCookies = cookiesToUpdate;
                }
                else
                {
                    foreach (Cookie newCookie in cookiesToUpdate)
                    {
                        if (isContainCookie(newCookie, omitUpdateCookies))
                        {
                            // this cookie is to be skipped
                        }
                        else
                        {
                            addCookieToCookies(newCookie, ref localCookies);
                        }
                    }
                }
            }
        }//updateLocalCookies

        //merge cookiesToUpdate into localCookies
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)
        {
            updateLocalCookies(cookiesToUpdate, ref localCookies, null);
        }
  
        /*********************************************************************/
        /* HTTP */
        /*********************************************************************/
  
        /* get url's response */
        publicHttpWebResponse getUrlResponse(stringurl,
                                        Dictionary<string,string> headerDict,
                                        Dictionary<string,string> postDict,
                                        inttimeout,
                                        stringpostDataStr)
        {
            //CookieCollection parsedCookies;
  
            HttpWebResponse resp =null;
  
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
  
            req.AllowAutoRedirect =true;
            req.Accept ="*/*";
  
            //const string gAcceptLanguage = "en-US"; // zh-CN/en-US
            //req.Headers["Accept-Language"] = gAcceptLanguage;
  
            req.KeepAlive =true;
  
            //IE8
            //const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";
            //IE9
            //const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
            conststring gUserAgent ="Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";// x86
            //Chrome
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
            //Mozilla Firefox
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
            req.UserAgent = gUserAgent;
  
            req.Headers["Accept-Encoding"] ="gzip, deflate";
            req.AutomaticDecompression = DecompressionMethods.GZip;
  
            req.Proxy =null;
  
            if(timeout > 0)
            {
                req.Timeout = timeout;
            }
  
            if(curCookies != null)
            {
                req.CookieContainer = new CookieContainer();
                // the cookies added below may exceed the default limit of 20 cookies per domain
                req.CookieContainer.PerDomainCapacity = 40;
                req.CookieContainer.Add(curCookies);
            }
  
            if(headerDict != null)
            {
                foreach(string header in headerDict.Keys)
                {
                    string headerValue = "";
                    if (headerDict.TryGetValue(header, out headerValue))
                    {
                        // the following lets the caller overwrite the default header settings
                        if (header.ToLower() == "referer")
                        {
                            req.Referer = headerValue;
                        }
                        else if (header.ToLower() == "allowautoredirect")
                        {
                            bool isAllow = false;
                            if (bool.TryParse(headerValue, out isAllow))
                            {
                                req.AllowAutoRedirect = isAllow;
                            }
                        }
                        else if (header.ToLower() == "accept")
                        {
                            req.Accept = headerValue;
                        }
                        else if (header.ToLower() == "keepalive")
                        {
                            bool isKeepAlive = false;
                            if (bool.TryParse(headerValue, out isKeepAlive))
                            {
                                req.KeepAlive = isKeepAlive;
                            }
                        }
                        else if (header.ToLower() == "accept-language")
                        {
                            req.Headers["Accept-Language"] = headerValue;
                        }
                        else if (header.ToLower() == "useragent")
                        {
                            req.UserAgent = headerValue;
                        }
                        else
                        {
                            req.Headers[header] = headerValue;
                        }
                    }
                    else
                    {
                        break;
                    }
                }
            }
  
            // a null postDataStr must not turn the request into a POST with a null body
            if (postDict != null || !string.IsNullOrEmpty(postDataStr))
            {
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";

                if (postDict != null)
                {
                    postDataStr = quoteParas(postDict);
                }

                //byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;

                // the using block closes the request stream even if Write throws
                using (Stream postDataStream = req.GetRequestStream())
                {
                    postDataStream.Write(postBytes, 0, postBytes.Length);
                }
            }
            else
            {
                req.Method = "GET";
            }
  
            // GetResponse() may time out; the fix is described in:
            //http://www.crifan.com/fixed_problem_sometime_httpwebrequest_getresponse_timeout/
            resp = (HttpWebResponse)req.GetResponse();

            updateLocalCookies(resp.Cookies, ref curCookies);

            return resp;
        }
  
        public HttpWebResponse getUrlResponse(string url,
                                    Dictionary<string,string> headerDict,
                                    Dictionary<string,string> postDict)
        {
            return getUrlResponse(url, headerDict, postDict, 0, "");
        }

        public HttpWebResponse getUrlResponse(string url)
        {
            return getUrlResponse(url, null, null, 0, "");
        }
  
        // valid charset: "GB18030"/"UTF-8", invalid: "UTF8"
        public string getUrlRespHtml(string url,
                                        Dictionary<string,string> headerDict,
                                        string charset,
                                        Dictionary<string,string> postDict,
                                        int timeout,
                                        string postDataStr)
        {
            string respHtml = "";

            //HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);
            HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);

            //long realRespLen = resp.ContentLength;

            StreamReader sr;
            if ((charset != null) && (charset != ""))
            {
                Encoding htmlEncoding = Encoding.GetEncoding(charset);
                sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);
            }
            else
            {
                sr = new StreamReader(resp.GetResponseStream());
            }

            // close the reader (and with it the response stream) and the response once the
            // body is read: responses left open keep their connection busy, which can make
            // later requests to the same host time out
            using (sr)
            {
                respHtml = sr.ReadToEnd();
            }
            resp.Close();

            return respHtml;
        }
  
        public string getUrlRespHtml(string url, Dictionary<string,string> headerDict, string charset, Dictionary<string,string> postDict, string postDataStr)
        {
            return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);
        }

        public string getUrlRespHtml(string url, Dictionary<string,string> headerDict, Dictionary<string,string> postDict)
        {
            return getUrlRespHtml(url, headerDict, "", postDict, "");
        }

        public string getUrlRespHtml(string url, Dictionary<string,string> headerDict)
        {
            return getUrlRespHtml(url, headerDict, null);
        }

        public string getUrlRespHtml(string url, string charset, int timeout)
        {
            return getUrlRespHtml(url, null, charset, null, timeout, "");
        }

        public string getUrlRespHtml(string url, string charset)
        {
            return getUrlRespHtml(url, charset, 0);
        }

        public string getUrlRespHtml(string url)
        {
            return getUrlRespHtml(url, "");
        }
  
  
        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/
  
        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            HttpWebResponse resp = getUrlResponse(baiduMainUrl);
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }
  
        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                string respHtml = getUrlRespHtml(getapiUrl);

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }

            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained correctly in the previous step!");
            }
        }
  
        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string,string> postDict = new Dictionary<string,string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset","utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone","false");
                postDict.Add("index","0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType","1");
                postDict.Add("tpl","mn");
                postDict.Add("callback","parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass","on");
  
                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                string loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);

                //check whether got all expected cookies
                Dictionary<string,bool> cookieCheckDict = new Dictionary<string,bool>();
                string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
                foreach (string cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

                foreach (Cookie singleCookie in curCookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }


                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu home page failed!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted correctly!");
            }
        }
  
        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";

            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
  
    }
}
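The token extraction in btnGetToken_Click and the cookie check in btnEmulateLoginBaidu_Click do not depend on the WinForms UI, so they can be pulled out into plain static functions that are easy to test without going online. Below is a minimal sketch of that; the class BaiduLoginChecks and its method names are hypothetical, the regex is copied from btnGetToken_Click, and the cookie names come from the analysis table earlier in this post (they may no longer match what Baidu returns today).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class: UI-independent versions of the checks done in
// btnGetToken_Click and btnEmulateLoginBaidu_Click.
public static class BaiduLoginChecks
{
    // Same pattern as in btnGetToken_Click; it matches text such as
    // bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
    private static readonly Regex TokenRegex = new Regex(
        @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token value, or null if the page does not contain it.
    public static string ExtractLoginToken(string html)
    {
        Match m = TokenRegex.Match(html ?? "");
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }

    // True only if every expected cookie name occurs in the collection.
    public static bool HasAllCookies(CookieCollection cookies, IEnumerable<string> expectedNames)
    {
        var present = new HashSet<string>(cookies.Cast<Cookie>().Select(c => c.Name));
        return expectedNames.All(present.Contains);
    }
}
```

With these, the handlers could call `BaiduLoginChecks.ExtractLoginToken(respHtml)` followed by a null check, and set `loginBaiduOk = BaiduLoginChecks.HasAllCookies(curCookies, cookiesNameList);`.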

The complete VS2010 project can be downloaded here:

emulateLoginBaidu_csharp_crifanLibVersion_2012-11-07.7z

About crifanLib.cs:

Browse online: crifanLib.cs

Download: crifanLib_2012-11-07.7z

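【Summary】

As you can see, although the login flow for the Baidu home page analyzed earlier is not that complicated, implementing it in C# turns out to be much more complex than implementing it in Python.

The main reason is that Python wraps many commonly used, convenient library functions, whereas in C# you have to handle many details yourself: every parameter of a GET or POST request has to be taken care of, and cookie handling in particular is quite tedious.

So for scraping and analyzing web pages, and for emulating website logins, Python is still the more convenient choice.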
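Part of that cookie bookkeeping (getUrlResponse builds a new CookieContainer for every request and copies curCookies into it) can be handed to a single CookieContainer that lives for the whole login session. The sketch below only exercises such a container offline: it feeds it Set-Cookie values as if returned by passport.baidu.com and checks which cookies it would send to www.baidu.com. The cookie values and the class name SharedCookieJarDemo are made up for illustration. In a real program the same container would be assigned to every HttpWebRequest.CookieContainer (or to HttpClientHandler.CookieContainer on .NET 4.5 and later).

```csharp
using System;
using System.Net;

// Hypothetical demo class: one CookieContainer shared across requests.
public static class SharedCookieJarDemo
{
    public static CookieContainer CreateJarFromLoginResponse()
    {
        var jar = new CookieContainer();
        var loginUri = new Uri("https://passport.baidu.com/v2/api/?login");

        // Made-up Set-Cookie values standing in for a real login response.
        // BDUSS is scoped to the whole .baidu.com domain ...
        jar.SetCookies(loginUri, "BDUSS=fakeValue; domain=.baidu.com; path=/");
        // ... while PTOKEN has no domain attribute, so it stays host-only
        // (sent back to passport.baidu.com only).
        jar.SetCookies(loginUri, "PTOKEN=fakeToken; path=/");
        return jar;
    }
}
```

The container then decides per URL which cookies to send, so there is no need to merge collections by hand after each response.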


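【Postscript 2013-09-11】

1. After some investigation:

【Notes】Investigating why the C# code for emulating Baidu login does not work on .NET 4.0

it turned out that, indeed:

the earlier code works on .NET 3.5 and earlier, but does not work on .NET 4.0.

2. The cause has now been found and fixed.

The cause:

For a cookie that has no expires attribute, .NET 4.0 sets its expires value to the default 0001-01-01 00:00:00 (DateTime.MinValue), which makes the cookie expire. As a result, this Baidu cookie:

H_PS_PSSID

became invalid, and all subsequent steps failed.

On .NET 3.5 and earlier, the cookie's expires value is also the default 0001-01-01 00:00:00, but the cookie nevertheless stays usable, so the subsequent steps work and this problem does not occur.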
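A cookie that came without an expires attribute keeps the default Cookie.Expires value, DateTime.MinValue. The sketch below lists such cookies, so you can check whether one like H_PS_PSSID is affected, and shows one possible workaround: giving those cookies an explicit expiry a little in the future before they are reused. This is not the author's actual fix (that lives in the downloadable projects and is not shown in this article); the class name, method names and the chosen lifetime are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helpers for the "expires = 0001-01-01" problem described above.
public static class CookieExpiryWorkaround
{
    // Cookies that carried no expires attribute keep Cookie.Expires at its
    // default, DateTime.MinValue (0001-01-01 00:00:00).
    public static List<Cookie> FindCookiesWithoutExpiry(CookieCollection cookies)
    {
        var result = new List<Cookie>();
        foreach (Cookie c in cookies)
        {
            if (c.Expires == DateTime.MinValue)
            {
                result.Add(c);
            }
        }
        return result;
    }

    // Possible workaround: turn such session cookies into cookies that expire
    // after 'lifetime', so they cannot be mistaken for already expired ones.
    public static void GiveExplicitExpiry(CookieCollection cookies, TimeSpan lifetime)
    {
        foreach (Cookie c in FindCookiesWithoutExpiry(cookies))
        {
            c.Expires = DateTime.Now.Add(lifetime);
        }
    }
}
```

Note the trade-off: after the patch the cookie is no longer a session cookie, so pick a lifetime that does not outlive the login session you actually need.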

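3. Fixed code:

Available for download:

(1) Emulated Baidu login, standalone complete-code version, .NET 4.0

emulateLoginBaidu_csharp_independentCodeVersion_2013-09-11.7z

(2) Emulated Baidu login, version using my own crifanLib, .NET 4.0

emulateLoginBaidu_csharp_crifanLibVersion_2013-09-11.7z

(I will upload the two files above when I have time, because uploading them here failed with:

xxx.7z:

unknown Bytes complete FAILED!

:Upload canceled

: VIRUS DETECTED!

(Heuristics.Broken.Executable FOUND)

I'll try uploading again another time; if the same error occurs, I'll look into it then.)
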

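【Summary】

Whether it is .NET 3.5 and earlier or the latest 4.0, when it comes to parsing the Set-Cookie headers of an HTTP response into a CookieCollection:

it has always been junk, full of bugs.

For details, see:

Bugs in SetCookie parsing

So from now on, use resp.Cookies as little as you can.

Otherwise C# will trip you up and you won't even know what hit you.

Better to parse Set-Cookie with your own parsing function and get a correct CookieCollection that way.

For details, see:

Parsing the Set-Cookie string (returned by an HTTP request) into a Cookie array: parseSetCookie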
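To give an idea of what such a function involves, here is a minimal sketch that parses one Set-Cookie header line into a Cookie. It is not the author's parseSetCookie (which is not reproduced here); the names SetCookieLineParser and ParseSetCookieLine are hypothetical. It assumes you already have each Set-Cookie line on its own, because once several lines are joined with commas, the comma inside an expires date such as "Wed, 01-Jan-2031" becomes ambiguous. Only the attributes that matter in this article are handled.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class SetCookieLineParser
{
    // Two common expires formats: RFC 1123 and the older Netscape cookie style.
    private static readonly string[] ExpiresFormats =
    {
        "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
        "ddd, dd-MMM-yyyy HH:mm:ss 'GMT'",
    };

    // Parses e.g. "BDUSS=abc; expires=Wed, 01-Jan-2031 00:00:00 GMT; path=/; domain=.baidu.com; HttpOnly"
    public static Cookie ParseSetCookieLine(string line)
    {
        string[] parts = line.Split(';');
        int eq = parts[0].IndexOf('=');
        if (eq <= 0)
        {
            throw new FormatException("Missing name=value pair: " + line);
        }
        // Split only at the first '=' so values like "abc==" survive intact.
        var cookie = new Cookie(parts[0].Substring(0, eq).Trim(), parts[0].Substring(eq + 1).Trim());

        for (int i = 1; i < parts.Length; i++)
        {
            string attr = parts[i].Trim();
            int attrEq = attr.IndexOf('=');
            string key = (attrEq < 0 ? attr : attr.Substring(0, attrEq)).Trim().ToLowerInvariant();
            string val = attrEq < 0 ? "" : attr.Substring(attrEq + 1).Trim();

            switch (key)
            {
                case "domain":
                    cookie.Domain = val;
                    break;
                case "path":
                    cookie.Path = val;
                    break;
                case "httponly":
                    cookie.HttpOnly = true;
                    break;
                case "secure":
                    cookie.Secure = true;
                    break;
                case "expires":
                {
                    DateTime expires;
                    if (DateTime.TryParseExact(val, ExpiresFormats, CultureInfo.InvariantCulture,
                            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out expires))
                    {
                        cookie.Expires = expires;
                    }
                    break;
                }
                default:
                    // other attributes (max-age, samesite, ...) are ignored in this sketch
                    break;
            }
        }
        return cookie;
    }
}
```

A cookie without an expires attribute comes back with Expires left at DateTime.MinValue, i.e. as a session cookie; how you treat those is up to the caller.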
