libcurl Programming Tutorial

Source: Internet. Editor: 程序博客网. Time: 2024/06/03 17:13

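For my graduation project I translated part of the libcurl programming tutorial. Both my translation and the original English text are included below, shared here for reference.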


libcurl Programming Tutorial

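1 Name

libcurl-tutorial - libcurl programming tutorial

2 Objective

This document attempts to describe the general principles and some basic approaches to consider when programming with libcurl. It focuses mainly on the C interface, but applies fairly well to other interfaces too, since they usually follow the C one closely.

In this document, 'the user' is the person writing source code that uses libcurl, and 'the program' is the source code developed with libcurl.

For more details on the options and functions described here, please refer to their respective man pages.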

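3 Building

There are many ways to build C programs that use libcurl. This chapter assumes a Unix-style build process. If you use a different build system, you can still read this to get general information that may apply to your environment.

1) Compiling the Program

Your compiler needs to know where the libcurl headers are located, so you must set its include path to the directory where they are installed. The curl-config tool can report this information: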

$ curl-config --cflags

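2) Linking the Program with libcurl

After compiling the program, you need to link your object files to create an executable. For that to succeed, you must link with libcurl and with any libraries libcurl itself depends on. The curl-config tool reports the flags to use: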

$ curl-config --libs

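3) SSL or Not

libcurl can be built and customized in many ways. One thing that varies between builds is support for SSL-based transfers such as HTTPS and FTPS. SSL support has to be enabled when libcurl itself is built. To find out which features an installed libcurl supports, use curl-config:

$ curl-config --feature

If SSL is supported, the keyword 'SSL' is written to stdout.

4) autoconf

For writing a configure script that detects libcurl and sets up variables accordingly, we offer a prewritten macro. See the docs/libcurl/libcurl.m4 file, which documents how to use it.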

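4 Portable Code in a Portable World

The libcurl developers have put considerable effort into making libcurl work on as many operating systems and environments as possible.

You program libcurl the same way on every platform, with only a few minor platform-specific considerations. If you write your own code portably enough, libcurl will not stand in the way of a highly portable program.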

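5 Global Initialization

The program must initialize libcurl before using it. This needs to be done only once for the whole program, using:

curl_global_init()

The function takes one parameter, a bit pattern that tells libcurl what to initialize. CURL_GLOBAL_ALL initializes all known internal sub-modules and is a good default. Two individual bits are also defined: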

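CURL_GLOBAL_WIN32

This only has an effect on Windows, where it tells libcurl to initialize Winsock. On Windows, Winsock must be initialized before sockets can be used.

CURL_GLOBAL_SSL

If libcurl was built with SSL support, this initializes the SSL library.

libcurl has a default protection mechanism: if curl_global_init() has not been called by the time curl_easy_perform() is called, libcurl calls curl_global_init() itself with a guessed bit pattern. Relying on this automatic initialization is not recommended.

When the program no longer uses libcurl, it should call curl_global_cleanup(), which reverses what curl_global_init() did and releases the associated resources.

Avoid repeated calls to curl_global_init() and curl_global_cleanup(). Each should be called only once.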

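6 Features libcurl Provides

It is considered best practice to determine libcurl's features at run time rather than at build time. By calling curl_version_info() and inspecting the returned struct, the program can find out what the running libcurl supports and adapt to it.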

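7 Two Interfaces

libcurl first introduced the easy interface. All functions in the easy interface are prefixed with 'curl_easy'. The easy interface performs single transfers with a synchronous, blocking call.

libcurl also offers another interface that provides asynchronous, multiple simultaneous transfers.

8 The Easy Interface

To use the easy interface, you must first create an easy handle. That handle is used in every easy-interface call. Basically, use one easy handle for each thread that transfers data, and never share the same easy handle between threads. Get an easy handle with: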

easyhandle = curl_easy_init();

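This returns an easy handle. A handle is just a logical entity for the upcoming transfer or series of transfers.

You set properties and options on the easy handle with curl_easy_setopt(). They control how subsequent transfers are made. Options stay set in the handle until they are set again. They belong to the handle, so multiple requests made with the same handle use the same options.

To clear all previously set options at any point, call curl_easy_reset(). You can also clone an easy handle with curl_easy_duphandle(); the clone has the same options as the original.

Many libcurl options are set with strings terminated by '\0'. When you set a string option, libcurl makes its own copy, so the string can be freed after the call.

The most basic property to set in the handle is the URL, with the option CURLOPT_URL: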

 curl_easy_setopt(handle, CURLOPT_URL, "http://domain.com/");

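This sets the URL to transfer.

Suppose you want to retrieve the remote resource the URL identifies. Since you are writing an application, you probably want to store the received data yourself rather than have it printed to the screen. For that you write a callback function matching this prototype:

size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);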

Register the callback with libcurl like this:

 curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);

Here write_data is the name of the callback function. The fourth argument passed to the callback is set with another option:

curl_easy_setopt(easyhandle, CURLOPT_WRITEDATA, &internal_struct);

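If you do not set a callback with CURLOPT_WRITEFUNCTION, libcurl uses a default callback that simply writes the received data to stdout. You can make the default callback write to a different file by passing a FILE * opened for writing with CURLOPT_WRITEDATA.

There is one platform-dependent caveat. On some platforms, libcurl cannot operate on files opened by the program. If you use the default callback and pass it a FILE * with CURLOPT_WRITEDATA there, the program will crash. Avoid this if the program has to run on multiple platforms.

If you use libcurl as a win32 DLL, you must set CURLOPT_WRITEFUNCTION whenever you set CURLOPT_WRITEDATA, or you may experience crashes.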
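One possible write callback, sketched here under the assumption that the program wants the whole response in memory, appends each chunk to a growable buffer. The struct membuf type is hypothetical; it plays the role of internal_struct in the CURLOPT_WRITEDATA call shown earlier. The callback itself needs no libcurl types, so the sketch compiles without the libcurl headers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, passed to the callback as `userp`. */
struct membuf {
    char  *data;  /* heap storage, NUL-terminated once non-NULL */
    size_t len;   /* number of payload bytes stored */
};

/* Matches the write callback prototype from the text. */
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp)
{
    struct membuf *mem = userp;
    size_t nbytes = size * nmemb;   /* bytes delivered in this call */

    char *grown = realloc(mem->data, mem->len + nbytes + 1);
    if (grown == NULL)
        return 0;   /* differs from nbytes, so libcurl aborts the transfer */

    memcpy(grown + mem->len, buffer, nbytes);
    mem->data = grown;
    mem->len += nbytes;
    mem->data[mem->len] = '\0';
    return nbytes;  /* report every byte as handled */
}
```

To use it, start with a zeroed struct (for example struct membuf mem = { NULL, 0 };), pass &mem with CURLOPT_WRITEDATA, and free mem.data when the result is no longer needed.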

There are many more options, some of which are covered later. Once the URL, callback and other options are set, perform the transfer:

success = curl_easy_perform(easyhandle);

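curl_easy_perform() blocks while it connects to the remote host, issues the necessary commands, and receives the data. Whenever data arrives, it calls the callback set earlier. The callback may receive a single byte at a time or many kilobytes at once; libcurl delivers as much as possible as often as possible. The callback should return the number of bytes it handled. If that differs from the number of bytes passed to it, libcurl aborts the transfer and returns an error code. When the transfer is complete, curl_easy_perform() returns a code telling you whether it succeeded. For more detail, use CURLOPT_ERRORBUFFER to point libcurl at a buffer where it will store a human-readable error message.

To make another transfer, reuse the handle from the finished one; libcurl will then also try to reuse the previous connection. For some protocols, transferring a file involves a complicated process of logging in, setting the transfer mode, changing the current directory and finally transferring the data. libcurl takes care of all of that. Given just the URL, libcurl handles every detail needed to move the file from one machine to the other.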

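9 Uploading Data

libcurl tries to keep transfers protocol independent, so uploading a file to a remote FTP site is very similar to uploading data to an HTTP server with a PUT request.

To upload, first create an easy handle or reuse an existing one. Then, as before, set the URL, which is now the remote destination of the upload:

curl_easy_setopt(handle, CURLOPT_URL, "http://domain.com/");

Where a download uses a write callback, an upload needs a read callback:

size_t function(char *bufptr, size_t size, size_t nitems, void *userp);

bufptr points to a buffer that the callback fills with the data to upload. size multiplied by nitems is the size of that buffer, and therefore the largest value the callback may return. userp is a custom pointer, like the one passed to the write callback. Set the options as follows: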

 curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);

 curl_easy_setopt(easyhandle, CURLOPT_READDATA, &filedata);

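Tell libcurl that this transfer is an upload:

curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, 1L);

Some protocols need to know the file size before an upload. Set it with CURLOPT_INFILESIZE_LARGE:

curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size);

file_size must be a variable of type curl_off_t. Then call curl_easy_perform() to perform the upload. It calls the read callback to get the data. The callback should supply as much data as it can on each call and return the number of bytes it wrote into the buffer. Returning 0 signals the end of the upload.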
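A read callback that uploads from a block of memory might look like the sketch below. The struct upload_src type is hypothetical and stands in for filedata in the CURLOPT_READDATA call above; a callback reading from a file would follow the same shape. Like the write callback, it needs no libcurl types.

```c
#include <string.h>

/* Hypothetical upload source, passed to the callback as `userp`. */
struct upload_src {
    const char *data;   /* bytes still to be sent */
    size_t      left;   /* how many of them remain */
};

/* Matches the read callback prototype from the text. */
size_t read_function(char *bufptr, size_t size, size_t nitems, void *userp)
{
    struct upload_src *src = userp;
    size_t room = size * nitems;    /* capacity of libcurl's buffer */
    size_t n = src->left < room ? src->left : room;

    /* Hand over as much as fits in this call. */
    memcpy(bufptr, src->data, n);
    src->data += n;
    src->left -= n;

    return n;   /* 0 once everything has been handed over: end of upload */
}
```

With this source, the value to pass for CURLOPT_INFILESIZE_LARGE would be the initial left count converted to curl_off_t.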

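10 When It Doesn't Work

Transfers will sometimes fail. You may have set the wrong libcurl option or misunderstood what an option does, or the remote server may return non-standard replies that confuse libcurl.

The golden rule when this happens: set the CURLOPT_VERBOSE option to 1. libcurl then prints the protocol details it sends, some internal information, and some received protocol data (especially for FTP). With HTTP, the request and response headers are printed as well, which is a good way to understand how the server behaves. Setting CURLOPT_HEADER to 1 includes the headers in the normal body output.

libcurl also has bugs of its own. If you find one, please report it so that it can be fixed. Include as many details as you can: the protocol dump that CURLOPT_VERBOSE produces, the libcurl version, as much as possible of your code that uses libcurl, and the operating system and compiler details.

If CURLOPT_VERBOSE is not enough, use CURLOPT_DEBUGFUNCTION to increase the level of debug data.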

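In-depth knowledge of the protocols involved is never wasted. Even a brief study of the relevant RFC documents helps you understand the protocol, and the better you understand it, the fewer mistakes you will make with libcurl.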



Libcurl Programming Tutorial

NAME

libcurl-tutorial - libcurl programming tutorial

Objective

This document attempts to describe the general principles and some basic approaches to consider when programming with libcurl. The text will focus mainly on the C interface but might apply fairly well on other interfaces as well as they usually follow the C one pretty closely.

This document will refer to 'the user' as the person writing the source code that uses libcurl. That would probably be you or someone in your position. What will be generally referred to as 'the program' will be the collected source code that you write that is using libcurl for transfers. The program is outside libcurl and libcurl is outside of the program.

To get more details on all options and functions described herein, please refer to their respective man pages.

Building

There are many different ways to build C programs. This chapter will assume a Unix style build process. If you use a different build system, you can still read this to get general information that may apply to your environment as well.

Compiling the Program

Your compiler needs to know where the libcurl headers are located. Therefore you must set your compiler's include path to point to the directory where you installed them. The 'curl-config'[3] tool can be used to get this information:

$ curl-config --cflags

Linking the Program with libcurl

Once you have compiled the program, you need to link your object files to create a single executable. For that to succeed, you need to link with libcurl and possibly also with other libraries that libcurl itself depends on, such as the OpenSSL libraries; even some standard OS libraries may be needed on the command line. To figure out which flags to use, once again the 'curl-config' tool comes to the rescue:

$ curl-config --libs

SSL or Not

libcurl can be built and customized in many ways. One of the things that varies from different libraries and builds is the support for SSL-based transfers, like HTTPS and FTPS. If a supported SSL library was detected properly at build-time, libcurl will be built with SSL support. To figure out if an installed libcurl has been built with SSL support enabled, use 'curl-config' like this:

$ curl-config --feature

And if SSL is supported, the keyword 'SSL' will be written to stdout, possibly together with a few other features that could be either on or off for different libcurls.

See also the "Features libcurl Provides" further down.

autoconf macro

When you write your configure script to detect libcurl and setup variables accordingly, we offer a prewritten macro that probably does everything you need in this area. See docs/libcurl/libcurl.m4 file - it includes docs on how to use it.

Portable Code in a Portable World

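The people behind libcurl have put considerable effort into making libcurl work on a large number of different operating systems and environments.

You program libcurl the same way on all platforms that libcurl runs on. There are only a few minor considerations that differ. If you make sure to write your own code portably enough, you may very well create a very portable program; libcurl shouldn't stop you from that.

Global Preparation

The program must initialize some of the libcurl functionality globally. That means it should be done exactly once, no matter how many times you intend to use the library. Once for your program's entire life time. This is done using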

 curl_global_init()

and it takes one parameter which is a bit pattern that tells libcurl what to initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal sub modules, and might be a good default option. The current two bits that are specified are:

CURL_GLOBAL_WIN32

which only does anything on Windows machines. When used on a Windows machine, it'll make libcurl initialize the win32 socket stuff. Without having that initialized properly, your program cannot use sockets properly. You should only do this once for each application, so if your program already does this or if another library in use does it, you should not tell libcurl to do this as well.

CURL_GLOBAL_SSL

which only does anything on libcurls compiled and built SSL-enabled. On these systems, this will make libcurl initialize the SSL library properly for this application. This only needs to be done once for each application so if your program or another library already does this, this bit should not be needed.

libcurl has a default protection mechanism that detects if curl_global_init hasn't been called by the time curl_easy_perform is called and if that is the case, libcurl runs the function itself with a guessed bit pattern. Please note that depending solely on this is not considered nice nor very good.

When the program no longer uses libcurl, it should call curl_global_cleanup, which is the opposite of the init call. It will then do the reversed operations to cleanup the resources the curl_global_init call initialized.

Repeated calls to curl_global_init and curl_global_cleanup should be avoided. They should only be called once each.

Features libcurl Provides

It is considered best-practice to determine libcurl features at run-time rather than at build-time (if possible of course). By calling curl_version_info and checking out the details of the returned struct, your program can figure out exactly what the currently running libcurl supports.

Two Interfaces

libcurl first introduced the so called easy interface. All operations in the easy interface are prefixed with 'curl_easy'. The easy interface lets you do single transfers with a synchronous and blocking function call.

libcurl also offers another interface that allows multiple simultaneous transfers in a single thread, the so called multi interface. More about that interface is detailed in a separate chapter further down. You still need to understand the easy interface first, so please continue reading for better understanding.

Handle the Easy libcurl

To use the easy interface, you must first create yourself an easy handle. You need one handle for each easy session you want to perform. Basically, you should use one handle for every thread you plan to use for transferring. You must never share the same handle in multiple threads.

Get an easy handle with

easyhandle = curl_easy_init();

It returns an easy handle. Using that you proceed to the next step: setting up your preferred actions. A handle is just a logic entity for the upcoming transfer or series of transfers.

You set properties and options for this handle using curl_easy_setopt. They control how the subsequent transfer or transfers will be made. Options remain set in the handle until set again to something different. They are sticky. Multiple requests using the same handle will use the same options.

If you at any point would like to blank all previously set options for a single easy handle, you can call curl_easy_reset and you can also make a clone of an easy handle (with all its set options) using curl_easy_duphandle.

Many of the options you set in libcurl are "strings", pointers to data terminated with a zero byte. When you set strings with curl_easy_setopt, libcurl makes its own copy so that they don't need to be kept around in your application after being set[4].

One of the most basic properties to set in the handle is the URL. You set your preferred URL to transfer with CURLOPT_URL in a manner similar to:

 curl_easy_setopt(handle, CURLOPT_URL, "http://domain.com/");

Let's assume for a while that you want to receive data as the URL identifies a remote resource you want to get here. Since you write a sort of application that needs this transfer, I assume that you would like to get the data passed to you directly instead of simply getting it passed to stdout. So, you write your own function that matches this prototype:

 size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);

You tell libcurl to pass all data to this function by issuing a function similar to this:

 curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);

You can control what data your callback function gets in the fourth argument by setting another property:

 curl_easy_setopt(easyhandle, CURLOPT_WRITEDATA, &internal_struct);

Using that property, you can easily pass local data between your application and the function that gets invoked by libcurl. libcurl itself won't touch the data you pass with CURLOPT_WRITEDATA.

libcurl offers its own default internal callback that will take care of the data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then simply output the received data to stdout. You can have the default callback write the data to a different file handle by passing a 'FILE *' to a file opened for writing with the CURLOPT_WRITEDATA option.

Now, we need to take a step back and have a deep breath. Here's one of those rare platform-dependent nitpicks. Did you spot it? On some platforms[2], libcurl won't be able to operate on files opened by the program. Thus, if you use the default callback and pass in an open file with CURLOPT_WRITEDATA, it will crash. You should therefore avoid this to make your program run fine virtually everywhere.

(CURLOPT_WRITEDATA was formerly known as CURLOPT_FILE. Both names still work and do the same thing).

If you're using libcurl as a win32 DLL, you MUST use the CURLOPT_WRITEFUNCTION if you set CURLOPT_WRITEDATA - or you will experience crashes.

There are of course many more options you can set, and we'll get back to a few of them later. Let's instead continue to the actual transfer:

 success = curl_easy_perform(easyhandle);

curl_easy_perform will connect to the remote site, do the necessary commands and receive the transfer. Whenever it receives data, it calls the callback function we previously set. The function may get one byte at a time, or it may get many kilobytes at once. libcurl delivers as much as possible as often as possible. Your callback function should return the number of bytes it "took care of". If that is not the exact same amount of bytes that was passed to it, libcurl will abort the operation and return with an error code.

When the transfer is complete, the function returns a return code that informs you if it succeeded in its mission or not. If a return code isn't enough for you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a buffer of yours where it'll store a human readable error message as well.

If you then want to transfer another file, the handle is ready to be used again. Mind you, it is even preferred that you re-use an existing handle if you intend to make another transfer. libcurl will then attempt to re-use the previous connection.

For some protocols, downloading a file can involve a complicated process of logging in, setting the transfer mode, changing the current directory and finally transferring the file data. libcurl takes care of all that complication for you. Given simply the URL to a file, libcurl will take care of all the details needed to get the file moved from one machine to another.

Upload Data to a Remote Site

libcurl tries to keep a protocol independent approach to most transfers, thus uploading to a remote FTP site is very similar to uploading data to an HTTP server with a PUT request.

Of course, first you either create an easy handle or you re-use an existing one. Then you set the URL to operate on just like before. This is the remote URL that we will now upload to.

Since we write an application, we most likely want libcurl to get the upload data by asking us for it. To make it do that, we set the read callback and the custom pointer libcurl will pass to our read callback. The read callback should have a prototype similar to:

 size_t function(char *bufptr, size_t size, size_t nitems, void *userp);

Where bufptr is the pointer to a buffer we fill in with data to upload and size*nitems is the size of the buffer and therefore also the maximum amount of data we can return to libcurl in this call. The 'userp' pointer is the custom pointer we set to point to a struct of ours to pass private data between the application and the callback.

 curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);

 curl_easy_setopt(easyhandle, CURLOPT_READDATA, &filedata);

Tell libcurl that we want to upload:

 curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, 1L);

A few protocols won't behave properly when uploads are done without any prior knowledge of the expected file size. So, set the upload file size using the CURLOPT_INFILESIZE_LARGE for all known file sizes like this[1]:

 /* in this example, file_size must be a curl_off_t variable */

 curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size);

When you call curl_easy_perform this time, it'll perform all the necessary operations and when it has invoked the upload it'll call your supplied callback to get the data to upload. The program should return as much data as possible in every invoke, as that is likely to make the upload perform as fast as possible. The callback should return the number of bytes it wrote in the buffer. Returning 0 will signal the end of the upload.

When It Doesn't Work

There will always be times when the transfer fails for some reason. You might have set the wrong libcurl option or misunderstood what the libcurl option actually does, or the remote server might return non-standard replies that confuse the library which then confuses your program.

There's one golden rule when these things occur: set the CURLOPT_VERBOSE option to 1. It'll cause the library to spew out the entire protocol details it sends, some internal info and some received protocol data as well (especially when using FTP). If you're using HTTP, adding the headers in the received output to study is also a clever way to get a better understanding of why the server behaves the way it does. Include headers in the normal body output with CURLOPT_HEADER set to 1.

Of course, there are bugs left. We need to know about them to be able to fix them, so we're quite dependent on your bug reports! When you do report suspected bugs in libcurl, please include as many details as you possibly can: a protocol dump that CURLOPT_VERBOSE produces, library version, as much as possible of your code that uses libcurl, operating system name and version, compiler name and version etc.

If CURLOPT_VERBOSE is not enough, you can increase the level of debug data your application receives by using the CURLOPT_DEBUGFUNCTION.

Getting some in-depth knowledge about the protocols involved is never wrong, and if you're trying to do funny things, you might very well understand libcurl and how to use it better if you study the appropriate RFC documents at least briefly.

