A Beginner’s Guide to LibCurl

Source: Internet. Editor: 程序博客网. Date: 2024/04/30 10:27

Background/Introduction:


LibCurl is an open-source file transfer library that provides developers with a simple yet robust interface for all common transfer-related needs. While intended primarily for use with C/C++, bindings are available for most languages. Furthermore, LibCurl runs on almost every common platform, making it one of the most versatile libraries of its kind.
In this article, we’re going to look at using LibCurl to make simple HTTP requests with its built-in cookie processor to programmatically log in to websites and download web pages. We’re going to be using C for this, as it’s LibCurl’s native language. Afterwards, we’ll look at how to use this from a practical perspective, including how to use LibCurl to solve several HTS Programming Challenges.

Setup / Installation:


The download page for LibCurl can be found here. I won’t go into too much detail (or any, really) about installation simply because the process is fairly straightforward. The developers even provide an intuitive download wizard at the link provided. The installation guide, if necessary, is well-written and can be found here. When in doubt, ask Google. If all else fails, ask your fellow HTS members.
You will, of course, need to link to the LibCurl libraries when building your programs. You may also have to configure your compiler/linker to look in the appropriate directories for the LibCurl include files. Once again, the process is usually simple, but as all compilers are different, you may have to seek external help in getting things set up if you’re new to this. Once LibCurl is set up, you’re ready to start building your own file transfer programs.

Retrieving a webpage:


Hereafter, I will refer to LibCurl simply as LC, for convenience.


All of the code below has been tested and compiled with Code::Blocks on Windows Vista.

First we’re going to be using LC to retrieve a webpage using the following code snippets. After that, we’ll look at using cookies and HTTP POST data.

First, our main LC include file.

CODE : 
#include <curl/curl.h>


Before making any calls to LC functions, you must initialize LC globally. For our sake, we can do this with one simple line of code:

CODE : 
curl_global_init( CURL_GLOBAL_ALL );


This call will initialize all known LC sub-modules. After this, we’re ready to create a session handle. For this, we’re going to use LC’s “easy interface.” 

CODE : 
CURL * myHandle;
CURLcode result; // We’ll store the result of CURL’s webpage retrieval, for simple error checking.
myHandle = curl_easy_init ( ) ;
// Notice the lack of major error-checking, for brevity


With our new handle, we can now start transferring files. To do this, we must tell LC how to operate: What URL to connect to, amongst other things. LC operates as a state machine, much like OpenGL. This means that once we send parameters to LC telling it how to operate, those settings will stay in effect until we explicitly change them. This can be great for avoiding redundancy between retrievals.

CODE : 
curl_easy_setopt( myHandle, CURLOPT_URL, "http://www.example.com");
result = curl_easy_perform( myHandle ); 
curl_easy_cleanup( myHandle );


That’s all there is to retrieving a webpage with LC. Yes, really. The following C program, made from the code snippets above, will output the HTML from example.com to a console. 

CODE : 

#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

int main()
{
    curl_global_init( CURL_GLOBAL_ALL );
    CURL * myHandle;
    CURLcode result; // We'll store the result of CURL's webpage retrieval, for simple error checking.
    myHandle = curl_easy_init ( ) ;
    // Notice the lack of major error checking, for brevity
    curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.example.com");
    result = curl_easy_perform( myHandle );
    if( result != CURLE_OK )
        fprintf(stderr, "Transfer failed: %s\n", curl_easy_strerror( result ));
    curl_easy_cleanup( myHandle );
    curl_global_cleanup();
    printf("LibCurl rules!\n");
    return 0;
}


But what if you don’t want to output to the console? What if you want to output to, say, a file? Easy. First we define a function that accepts LC’s output into a C-style struct, then we pass a pointer to that function to LC, and finally we send the contents of the struct to a file. I won’t go into much detail here; the source should speak for itself. The following code will download the HTML from example.com and save it to a file, example.html. There’s not much error checking, in order to save space. If you have any questions not related to LibCurl, look around here.

CODE : 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h> // Pulls in curl/easy.h; curl/types.h no longer exists in current LibCurl releases

// Define our struct for accepting LC's output
struct BufferStruct
{
    char * buffer;
    size_t size;
};

// This is the function we pass to LC, which writes the output to a BufferStruct
static size_t WriteMemoryCallback
(void *ptr, size_t size, size_t nmemb, void *data)
{
    size_t realsize = size * nmemb;

    struct BufferStruct * mem = (struct BufferStruct *) data;

    // Use a temporary pointer so the old buffer is not lost if realloc fails
    char * grown = realloc(mem->buffer, mem->size + realsize + 1);

    if ( !grown )
        return 0; // Returning less than realsize tells LC to abort the transfer

    mem->buffer = grown;
    memcpy( &( mem->buffer[ mem->size ] ), ptr, realsize );
    mem->size += realsize;
    mem->buffer[ mem->size ] = 0;
    return realsize;
}


int main()
{
    curl_global_init( CURL_GLOBAL_ALL );
    CURL * myHandle;
    CURLcode result; // We'll store the result of CURL's webpage retrieval, for simple error checking.
    struct BufferStruct output; // Create an instance of our BufferStruct to accept LC's output
    output.buffer = NULL;
    output.size = 0;
    myHandle = curl_easy_init ( ) ;

    /* Notice the lack of major error checking, for brevity */

    curl_easy_setopt(myHandle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback); // Passing the function pointer to LC
    curl_easy_setopt(myHandle, CURLOPT_WRITEDATA, (void *)&output); // Passing our BufferStruct to LC
    curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.example.com");
    result = curl_easy_perform( myHandle );
    curl_easy_cleanup( myHandle );

    if( result != CURLE_OK || !output.buffer )
    {
        free( output.buffer );
        return 1;
    }

    FILE * fp;
    fp = fopen( "example.html","w");
    if( !fp )
    {
        free( output.buffer );
        return 1;
    }
    // Write the raw bytes; using the page as a printf format string would misread any '%' in the HTML
    fwrite( output.buffer, 1, output.size, fp );
    fclose( fp );

    free ( output.buffer );
    output.buffer = 0;
    output.size = 0;

    printf("LibCurl rules!\n");
    return 0;
}



Retrieving a webpage (With cookies!):


The example above is cool if you’re just starting out, but it’s fairly useless. You’ll notice that if you try to retrieve, say, a Facebook profile, LC will just retrieve the Facebook login page. The reason? Cookies. When you log in to a website, the remote server will send you a cookie, and expect the browser to resend that cookie for later HTTP requests. No cookie, no webpage. At least, not the one you were expecting. If you’re new to cookies, it’s imperative to at least read over the CookieCentral FAQ. I also highly suggest reading over how HTTP works.

FACT: Reading doesn’t make you a hacker, but you won’t become a hacker without reading almost every technical guide, FAQ, and manual you can find.

Now, we’re going to look at some more LC commands to send HTTP POST data and then store a remote server’s cookies. This will allow us to simulate filling out an HTML form in a web browser and then log in to a website of our choice. First, we’re going to add a line to define our user agent, since some websites reject HTTP requests without a proper user agent. We’ll also tell LC to follow redirects, as many login pages redirect users to a home screen. Remember, we only need to do this once. LC will keep these settings in effect unless we change them.
CODE : 
curl_easy_setopt(myHandle, CURLOPT_USERAGENT, "Mozilla/4.0");
curl_easy_setopt(myHandle, CURLOPT_AUTOREFERER, 1L ); // These on/off options expect a long
curl_easy_setopt(myHandle, CURLOPT_FOLLOWLOCATION, 1L );


Next, we’ll enable LC’s automatic processing of cookies. With this line, we won’t have to manually work with the cookies ourselves.
CODE : 
curl_easy_setopt(myHandle, CURLOPT_COOKIEFILE, ""); // An empty file name enables the cookie engine without loading any cookies


Finally we have to send the appropriate HTTP POST data, which your browser usually does behind the scenes after you fill out an HTML form. The format for this is a C-style string as follows:
CODE : 
const char * submit_this_please = "name_of_form_input=some_value&name_of_other_input=other_value";


The following code will submit the appropriate data to log in to HackThisSite.
CODE : 
const char *data="username=your_username_here&password=your_password_here";
curl_easy_setopt(myHandle, CURLOPT_POSTFIELDS, data);


The following is a complete, yet brief, example that will show you how to use LC to log in to HackThisSite. Obviously, you will have to modify the hard-coded username and password to log in to your own account. To save space, the output will simply be sent to the console. Additionally, I have not included any real error checking.
CODE : 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h> // Pulls in curl/easy.h; curl/types.h no longer exists in current LibCurl releases


int main()
{
    curl_global_init( CURL_GLOBAL_ALL );
    CURL * myHandle = curl_easy_init ( );

    // Set up a couple of initial parameters that we will not need to modify later.
    curl_easy_setopt(myHandle, CURLOPT_USERAGENT, "Mozilla/4.0");
    curl_easy_setopt(myHandle, CURLOPT_AUTOREFERER, 1L );
    curl_easy_setopt(myHandle, CURLOPT_FOLLOWLOCATION, 1L );
    curl_easy_setopt(myHandle, CURLOPT_COOKIEFILE, "");

    // Visit the login page once to obtain a PHPSESSID cookie
    curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.hackthissite.org/user/login/");
    curl_easy_perform( myHandle );

    // Now we can actually log in. First we forge the HTTP referer field, or HTS will deny the login
    curl_easy_setopt(myHandle, CURLOPT_REFERER, "http://www.hackthissite.org/user/login/");
    // Next we tell LibCurl what HTTP POST data to submit
    const char *data="username=your_username_here&password=your_password_here";
    curl_easy_setopt(myHandle, CURLOPT_POSTFIELDS, data);
    curl_easy_perform( myHandle );

    curl_easy_cleanup( myHandle );
    curl_global_cleanup();

    return 0;
}


Applying LibCurl to HTS Challenges:


None of these examples will directly help you solve the Programming Challenges. However, several challenges consist of obtaining dynamically provided information, using the information to generate a text value, and then submitting that value. By combining the concepts covered in these examples, you should be able to programmatically log in to HTS, download arbitrary webpages, and save those pages into a buffer. It’s up to you to write those programs – I doubt that you’re going to get much more information than what is provided here without doing your own research. 

For example, to solve at least one of the challenges, you will have to URL-escape the POST data. If you take the initiative, you can easily figure out how to do this by referring to the LibCurl website.

Real-world uses:


As you can read here, LibCurl is used in numerous real-world applications. If you’re looking for practice, try using LibCurl to build some of the following examples:
A Simple Text-Based Web Browser
An FTP File Uploader
A Flexible Downloading Tool
A Dictionary Tool (to look up information from, say, dictionary.reference.com)
A Cookie Forging Tool (that can accept and use stolen cookies)
(Insert any file transfer utility here)

Closing remarks:


You should now have a general idea of how to use LibCurl. As stated in the introduction, LC is an immensely flexible library that can work with a large variety of protocols, and we’ve hardly begun to scratch the surface. I highly suggest reading through the provided links to gain a better appreciation for the capabilities of this library, and I equally suggest compiling the provided examples and experimenting with your own programs. Good luck, and happy hacking.