8-Elementary UDP Sockets

Source: Internet | Published by: Fuzhou University Online Education Platform | Editor: 程序博客网 | Time: 2024/04/27 22:55

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1

Welcome to my github: https://github.com/gaoxiangnumber1

8.1 Introduction

Figure 8.1 shows the function calls for a typical UDP client/server. The client sends a datagram to the server using sendto, which requires the address of the destination(server) as a parameter. The server calls recvfrom which waits until data arrives from some client. recvfrom returns the protocol address of the client, along with the datagram, so the server can send a response to the correct client.

8.2 ‘recvfrom’ and ‘sendto’ Functions

#include <sys/types.h>
#include <sys/socket.h>

ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags, struct sockaddr *from, socklen_t *addrlen);
ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags, const struct sockaddr *to, socklen_t addrlen);

Both return: number of bytes read or written if OK, -1 on error

sockfd, buff, and nbytes are: descriptor, pointer to buffer to read into or write from, and number of bytes to read or write.
The to argument for sendto is a socket address structure containing the protocol address (IP address and port number) of where the data is to be sent. The size of this socket address structure is specified by addrlen. recvfrom fills in the socket address structure pointed to by from with the protocol address of who sent the datagram. The number of bytes stored in this socket address structure is returned to the caller in the integer pointed to by addrlen. Note that the final argument to sendto is an integer value, while the final argument to recvfrom is a pointer to an integer value (a value-result argument).
The final two arguments to recvfrom are similar to the final two arguments to accept: the contents of the socket address structure upon return tell us who sent the datagram (in the case of UDP) or who initiated the connection (in the case of TCP).
The final two arguments to sendto are similar to the final two arguments to connect: we fill in the socket address structure with the protocol address of where to send the datagram (in the case of UDP) or with whom to establish a connection (in the case of TCP).
Both functions return the length of the data that was read or written as the value of the function. With a datagram protocol, the return value of recvfrom is the amount of user data in the datagram received.
Writing a datagram of length 0 is acceptable. In the case of UDP, this results in an IP datagram containing an IP header (20 bytes for IPv4 and 40 bytes for IPv6), an 8-byte UDP header, and no data. It also means that a return value of 0 from recvfrom is acceptable for a datagram protocol: unlike a return value of 0 from read on a TCP socket, it does not mean that the peer has closed the connection. Since UDP is connectionless, there is no such thing as closing a UDP connection.
If from is a null pointer, then the corresponding addrlen must also be a null pointer. This indicates that we are not interested in knowing the protocol address of who sent us data.
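The argument conventions above can be tried out with a minimal sketch like the following. It is not code from the book: udp_loopback_roundtrip is a hypothetical helper that sends one datagram between two sockets on the loopback address and reports what recvfrom returns, including the sender's address. Note how addrlen is passed by value to sendto but by pointer to recvfrom.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram from an unbound socket to a socket
 * bound on the loopback address, and return what recvfrom reports.
 * On success *sender holds the source address. Returns -1 on any error. */
ssize_t udp_loopback_roundtrip(const void *msg, size_t len,
                               void *out, size_t outcap,
                               struct sockaddr_in *sender)
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    socklen_t fromlen = sizeof(*sender);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &addrlen) < 0)
        goto done;

    /* sendto: the address length is an integer value */
    if (sendto(tx, msg, len, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* recvfrom: the address length is a value-result argument */
    n = recvfrom(rx, out, outcap, 0, (struct sockaddr *)sender, &fromlen);

done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Calling the helper with a length of 0 sends an empty datagram, and recvfrom on the receiving side then returns 0 without that meaning anything like end-of-file.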

8.3 UDP Echo Server: ‘main’ Function

Create UDP socket, bind server’s well-known port 7–12

Create UDP socket by specifying the second argument to socket as SOCK_DGRAM(a datagram socket in the IPv4 protocol). The IPv4 address for the bind is specified as INADDR_ANY and the server’s well-known port is the constant SERV_PORT from the unp.h header.
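Since Figure 8.3 is not reproduced in these notes, here is a rough sketch of the same two steps using the plain system calls instead of the capitalized wrappers from unp.h. The function name udp_server_socket and the convention of passing port 0 to let the kernel choose a port are assumptions made for illustration; the book's main uses SERV_PORT and then calls dg_echo.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket and bind it to the wildcard
 * address on the given port (host byte order). Passing 0 lets the kernel
 * choose a port. Returns the descriptor, or -1 on error. */
int udp_server_socket(unsigned short port)
{
    struct sockaddr_in servaddr;
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);   /* datagram socket */
    if (sockfd < 0)
        return -1;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);  /* wildcard address */
    servaddr.sin_port = htons(port);

    if (bind(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```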

8.4 UDP Echo Server: ‘dg_echo’ Function

Read datagram, echo back to sender 8–12

This function reads the next datagram arriving at the server’s port using recvfrom and sends it back using sendto.
This function never terminates: Since UDP is a connectionless protocol, there is nothing like an EOF as we have with TCP.
This function provides an iterative server(a single server process handles any and all clients). In general, TCP servers are concurrent and UDP servers are iterative.
There is implied queuing taking place in the UDP layer for this socket. Each UDP socket has a receive buffer, and each datagram that arrives for this socket is placed in that socket receive buffer. When the process calls recvfrom, the next datagram from the buffer is returned to the process in first-in, first-out (FIFO) order. If multiple datagrams arrive for the socket before the process can read what's already queued for the socket, the arriving datagrams are just added to the socket receive buffer. But this buffer has a limited size, which can be changed with the SO_RCVBUF socket option (Section 7.5).

Figure 8.5 summarizes TCP client/server from Chapter 5 when two clients establish connections with the server: There are two connected sockets and each of the two connected sockets on the server host has its own socket receive buffer.

Figure 8.6 shows the scenario when two clients send datagrams to our UDP server. There is only one server process and it has a single socket on which it receives all arriving datagrams and sends all responses. That socket has a receive buffer into which all arriving datagrams are placed.
The main function in Figure 8.3 is protocol-dependent (it creates a socket of protocol AF_INET and allocates and initializes an IPv4 socket address structure), but the dg_echo function is protocol-independent. This works because the caller (the main function) must allocate a socket address structure of the correct size, and a pointer to this structure, along with its size, is passed as an argument to dg_echo. The function dg_echo never looks inside this protocol-dependent structure; it simply passes a pointer to the structure to recvfrom and sendto. recvfrom fills this structure with the IP address and port number of the client, and since the same pointer (pcliaddr) is then passed to sendto as the destination address, the datagram is echoed back to the client that sent it.
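The behavior described for dg_echo can be sketched as follows. This is an approximation written with plain system calls, not the book's Figure 8.4: the single-datagram helper echo_one and the MAXLINE value are assumptions introduced so that one iteration can be exercised on its own.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAXLINE 4096   /* assumed buffer size; unp.h defines its own MAXLINE */

/* Hypothetical helper: read one datagram and send it back to its sender.
 * The address structure is opaque here, which keeps the function
 * protocol-independent. Returns the datagram length, or -1 on error. */
ssize_t echo_one(int sockfd, struct sockaddr *pcliaddr, socklen_t clilen)
{
    char mesg[MAXLINE];
    socklen_t len = clilen;      /* value-result: reset before each recvfrom */
    ssize_t n = recvfrom(sockfd, mesg, sizeof(mesg), 0, pcliaddr, &len);
    if (n < 0)
        return -1;
    /* the address just filled in by recvfrom is the reply's destination */
    if (sendto(sockfd, mesg, (size_t)n, 0, pcliaddr, len) < 0)
        return -1;
    return n;
}

/* Iterative server loop: there is no EOF with UDP, so this only returns
 * if recvfrom or sendto fails. */
void dg_echo(int sockfd, struct sockaddr *pcliaddr, socklen_t clilen)
{
    while (echo_one(sockfd, pcliaddr, clilen) >= 0)
        ;
}
```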

8.5 UDP Echo Client: ‘main’ Function

8.6 UDP Echo Client: ‘dg_cli’ Function

Four steps in processing loop: read a line from standard input using fgets, send the line to the server using sendto, read back the server’s echo using recvfrom, and print the echoed line to standard output using fputs.
Our client has not asked the kernel to assign an ephemeral port to its socket.(With a TCP client, the call to connect is where this takes place.) With a UDP socket, the first time the process calls sendto, if the socket has not yet had a local port bound to it, that is when an ephemeral port is chosen by the kernel for the socket. As with TCP, the client can call bind explicitly, but this is rarely done.
The call to recvfrom specifies a null pointer as the fifth and sixth arguments to tell the kernel that we are not interested in knowing who sent the reply.
dg_cli is protocol-independent, but the client main function is protocol-dependent. The main function allocates and initializes a socket address structure of some protocol type and then passes a pointer to this structure, along with its size, to dg_cli.

8.7 Lost Datagrams

Our UDP client/server is not reliable.
1. If a client datagram is lost, the client will block forever in its call to recvfrom in the function dg_cli, waiting for a server reply that will never arrive.
2. If the client datagram arrives at the server but the server’s reply is lost, the client will again block forever in its call to recvfrom.
A typical way to prevent this is to place a timeout on the client's call to recvfrom (see Section 14.2).

8.8 Verifying Received Response

Section 8.6: Any process that knows the client’s ephemeral port number could send datagrams to our client, and these would be intermixed with the normal server replies.
What we can do is change the call to recvfrom in Figure 8.8 to return the IP address and port of who sent the reply and ignore any received datagrams that are not from the server to whom we sent the datagram. There are a few pitfalls with this.
First, we change the client main function(Figure 8.7) to use the standard echo server(Figure 2.18). Replace the assignment
servaddr.sin_port = htons(SERV_PORT);
with
servaddr.sin_port = htons(7);
We do this so we can use any host running the standard echo server with our client.
We then recode the dg_cli function to allocate another socket address structure to hold the structure returned by recvfrom. We show this in Figure 8.9.

Allocate another socket address structure 9

We allocate another socket address structure by calling malloc. dg_cli function is still protocol-independent; because we do not care what type of socket address structure we are dealing with, we use only its size in the call to malloc.

Compare returned address 12–18

In the call to recvfrom, we tell the kernel to return the address of the sender of the datagram. We first compare the length returned by recvfrom in the value-result argument and then compare the socket address structures themselves using memcmp.
Section 3.2 mentioned that even if the socket address structure contains a length field, we normally need never set it or examine it. This case is an exception: memcmp compares every byte of data in the two socket address structures, and the kernel sets the length field in the structure that it returns, so we must also set it when constructing the server's sockaddr. If we don't, memcmp will compare a 0 (since we didn't set it) with a 16 (assuming sockaddr_in), and the structures will not match.
This new version of our client works fine if the server is on a host with just a single IP address. But this program can fail if the server is multihomed. We run this program to our host freebsd4, which has two interfaces and two IP addresses.

$ host freebsd4
freebsd4.unpbook.com has address 172.24.37.94
freebsd4.unpbook.com has address 135.197.17.100
$ udpcli02 135.197.17.100
hello
reply from 172.24.37.94:7 (ignored)
goodbye
reply from 172.24.37.94:7 (ignored)

We specified the IP address that does not share the same subnet as the client. This is normally allowed. Most IP implementations accept an arriving IP datagram that is destined for any of the host’s IP addresses, regardless of the interface on which the datagram arrives. RFC 1122 calls this the weak end system model. If a system implemented what this RFC calls the strong end system model, it would accept an arriving datagram only if that datagram arrived on the interface to which it was addressed.
The IP address returned by recvfrom(the source IP address of the UDP datagram) is not the IP address to which we sent the datagram. When the server sends its reply, the destination IP address is 172.24.37.78. The routing function within the kernel on freebsd4 chooses 172.24.37.94 as the outgoing interface. Since the server has not bound an IP address to its socket(the server has bound the wildcard address to its socket), the kernel chooses the source address for the IP datagram. It is chosen to be the primary IP address of the outgoing interface. Also, since it is the primary IP address of the interface, if we send our datagram to a non-primary IP address of the interface(i.e., an alias), this will also cause our test in Figure 8.9 to fail.
One solution is for the client to verify the responding host’s domain name instead of its IP address by looking up the server’s name in the DNS(Chapter 11), given the IP address returned by recvfrom.
Another solution is for the UDP server to create one socket for every IP address that is configured on the host, bind that IP address to the socket, use select across all these sockets (waiting for any one to become readable), and then reply from the socket that is readable. Since the socket used for the reply was bound to the IP address that was the destination address of the client's request (or the datagram would not have been delivered to the socket), this guarantees that the source address of the reply is the same as the destination address of the request.
The scenario in this section is for Berkeley-derived implementations that choose the source IP address based on the outgoing interface.
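The comparison that Figure 8.9 performs (length first, then every byte with memcmp) can be sketched as a small function. The name same_address is hypothetical, and the sketch assumes both structures were zeroed before being filled in so that padding bytes compare equal; on systems with a length field, that field must also be set in the server's structure, as noted above.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if the address returned by recvfrom matches
 * the server address we sent to, 0 otherwise. Compares the lengths first,
 * then every byte of the two socket address structures. */
int same_address(const struct sockaddr *servaddr, socklen_t servlen,
                 const struct sockaddr *replyaddr, socklen_t replylen)
{
    if (replylen != servlen)
        return 0;
    return memcmp(servaddr, replyaddr, servlen) == 0;
}
```

As the multihomed example shows, a byte-for-byte match is stricter than "same host": a legitimate reply sent from a different address of the same server is rejected by this test.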

8.9 Server Not Running

If we start the client without starting the server and type in a single line to the client, nothing happens. The client blocks forever in its call to recvfrom, waiting for a server reply that will never appear.
First we start tcpdump on the host, then start the client on the same host, specifying the host freebsd4 as the server host. We then type a single line, but the line is not echoed.

$ udpcli01 172.24.37.94
hello, world

We type this line but nothing is echoed back. Figure 8.10 shows the tcpdump output.

First an ARP request and reply are needed before the client host can send the UDP datagram to the server host.(We left this exchange in the output to reiterate the potential for an ARP request-reply before an IP datagram can be sent to another host or router on the local network.)
Line 3: the client datagram is sent, but the server host responds in line 4 with an ICMP “port unreachable”. (The length 13 accounts for the 12 characters “hello, world” and the newline.) This ICMP error is not returned to the client process. Instead, the client blocks forever in the call to recvfrom in Figure 8.8.
We call this ICMP error an asynchronous error. The error was caused by sendto, but sendto returned successfully. Section 2.11: A successful return from a UDP output operation only means there was room for the resulting IP datagram on the interface output queue. The ICMP error is not returned until later(4 ms later in Figure 8.10), which is why it is called asynchronous.
The basic rule is that an asynchronous error is not returned for a UDP socket unless the socket has been connected(Section 8.11).
Consider a UDP client that sends three datagrams in a row to three different servers(i.e., three different IP addresses) on a single UDP socket. The client then enters a loop that calls recvfrom to read the replies. Two of the datagrams are correctly delivered(that is, the server was running on two of the three hosts) but the third host was not running the server. This third host responds with an ICMP port unreachable.
This ICMP error message contains the IP header and UDP header of the datagram that caused the error. (ICMPv4 and ICMPv6 error messages always contain the IP header and all of the UDP header or part of the TCP header to allow the receiver of the ICMP error to determine which socket caused the error.) The client that sent the three datagrams needs to know the destination of the datagram that caused the error to distinguish which of the three datagrams caused the error. How can the kernel return this information to the process? recvfrom cannot return the destination IP address and destination UDP port number of the datagram in error; it can only return an errno value. Therefore, the decision was made to return these asynchronous errors to the process only if the process has connected the UDP socket to exactly one peer.

8.10 Summary of UDP Example

Figure 8.11 shows as bullets the four values that must be specified or chosen when the client sends a UDP datagram. The client must specify the server’s IP address and port number for the call to sendto.
Normally, the client's IP address and port are chosen automatically by the kernel. If these two values for the client are chosen by the kernel, the client's ephemeral port is chosen once, on the first sendto, and then it never changes. The client's IP address, however, can change for every UDP datagram that the client sends, assuming the client does not bind a specific IP address to the socket. The reason is shown in Figure 8.11: if the client is multihomed, the client could alternate between two destinations, one going out the datalink on the left, and the other going out the datalink on the right. In this worst-case scenario, the client's IP address, as chosen by the kernel based on the outgoing datalink, would change for every datagram.
What happens if the client binds an IP address to its socket, but the kernel decides that an outgoing datagram must be sent out some other datalink?
In this case the IP datagram will contain a source IP address that is different from the IP address of the outgoing datalink(see Exercise 8.6).

Figure 8.12 shows the four values from the server’s perspective. There are at least four pieces of information that a server might want to know from an arriving IP datagram: source IP address, destination IP address, source port number, and destination port number. Figure 8.13 shows the function calls that return this information for a TCP server and a UDP server.

A TCP server has access to all four pieces of information for a connected socket and these four values remain constant for the lifetime of a connection.
With a UDP socket, the destination IP address can only be obtained by setting the IP_RECVDSTADDR socket option for IPv4 or the IPV6_PKTINFO socket option for IPv6 and then calling recvmsg instead of recvfrom. Since UDP is connectionless, the destination IP address can change for each datagram that is sent to the server. A UDP server can also receive datagrams destined for one of the host’s broadcast addresses or for a multicast address(Chapters 20 and 21).

8.11 ‘connect’ Function with UDP

Section 8.9: An asynchronous error is not returned on a UDP socket unless the socket has been connected. We are able to call connect(Section 4.3) for a UDP socket, but there is no three-way handshake. The kernel just checks for any immediate errors(e.g., an unreachable destination), records the IP address and port number of the peer from the socket address structure passed to connect, and returns immediately to the calling process.
With this capability, we must distinguish between
1. An unconnected UDP socket, the default when we create a UDP socket.
2. A connected UDP socket, the result of calling connect on a UDP socket.
With a connected UDP socket, three things change, compared to the default unconnected UDP socket:
1. We can no longer specify the destination IP address and port for an output operation. We do not use sendto, but write or send instead. Anything written to a connected UDP socket is automatically sent to the protocol address(e.g., IP address and port) specified by connect. Similar to TCP, we can call sendto for a connected UDP socket, but we cannot specify a destination address. The fifth argument to sendto(pointer to the socket address structure) must be a null pointer, and the sixth argument(size of the socket address structure) should be 0.
2. We do not need to use recvfrom to learn the sender of a datagram; we can use read, recv, or recvmsg instead.
  The only datagrams returned by the kernel for an input operation on a connected UDP socket are those arriving from the protocol address specified in connect. Datagrams destined to the connected UDP socket's local protocol address (e.g., IP address and port) but arriving from a protocol address other than the one to which the socket was connected are not passed to the connected socket. Strictly speaking, a connected UDP socket exchanges datagrams with only one IP address rather than with only one peer, because it is possible to connect to a multicast or broadcast address.
3. Asynchronous errors are returned to the process for connected UDP sockets. Unconnected UDP sockets do not receive asynchronous errors.
Figure 8.14 summarizes the first point in the list with respect to 4.4BSD.

POSIX specifies that an output operation that does not specify a destination address on an unconnected UDP socket should return ENOTCONN, not EDESTADDRREQ.
Figure 8.15 summarizes the three points that we made about a connected UDP socket.

The application calls connect, specifying the IP address and port number of its peer. It then uses read and write to exchange data with the peer. Datagrams arriving from any other IP address or port(“???” in Figure 8.15) are not passed to the connected socket because either the source IP address or source UDP port does not match the protocol address to which the socket is connected. These datagrams could be delivered to some other UDP socket on the host. If there is no other matching socket for the arriving datagram, UDP will discard it and generate an ICMP “port unreachable” error.
Summary: A UDP client or server can call connect only if that process uses the UDP socket to communicate with exactly one peer. Normally it is a UDP client that calls connect, but a UDP server that communicates with a single client for a long duration can call connect as well.

A DNS client can be configured to use one or more servers, normally by listing the IP addresses of the servers in the file /etc/resolv.conf. If a single server is listed, the client can call connect; if multiple servers are listed, the client cannot call connect. A DNS server normally handles any client request, so the servers cannot call connect.

Calling connect Multiple Times for a UDP Socket

A process with a connected UDP socket can call connect again for that socket for one of two reasons:
1. To specify a new IP address and port.
2. To unconnect the socket.
The first case, specifying a new peer for a connected UDP socket, differs from the use of connect with a TCP socket: connect can be called only one time for a TCP socket.
To unconnect a UDP socket, we zero out an address structure, set the family to AF_UNSPEC, and pass it to connect. This might return an error of EAFNOSUPPORT, but that is acceptable. It is the process of calling connect on an already connected UDP socket that causes the socket to become unconnected.

Performance

When an application calls sendto on an unconnected UDP socket, Berkeley-derived kernels temporarily connect the socket, send the datagram, and then unconnect the socket. Calling sendto for two datagrams on an unconnected UDP socket then involves the following 6 steps by the kernel:
Connect the socket -> Output the first datagram -> Unconnect the socket ->
Connect the socket -> Output the second datagram -> Unconnect the socket
Another consideration is the number of searches of the routing table. The first temporary connect searches the routing table for the destination IP address and saves(caches) that information. The second temporary connect notices that the destination address equals the destination of the cached routing table information(we are assuming two sendto to the same destination) and we do not need to search the routing table again.
When the application knows it will be sending multiple datagrams to the same peer, it is more efficient to connect the socket explicitly. Calling connect and then calling write two times involves the following steps by the kernel:
Connect the socket -> Output first datagram -> Output second datagram
In this case, the kernel copies only the socket address structure containing the destination IP address and port one time, versus two times when sendto is called twice.

8.12 ‘dg_cli’ Function(Revisited)

$ udpcli04 172.24.37.94
hello, world
read error: Connection refused

We do not receive the error when we start the client process. The error occurs only after we send the first datagram to the server. It is sending this datagram that elicits the ICMP error from the server host.
But when a TCP client calls connect, specifying a server host that is not running the server process, connect returns the error because the call to connect causes the TCP three-way handshake to happen, and the first packet of that handshake elicits an RST from the server TCP(Section 4.3). Figure 8.18 shows the tcpdump output.

Figure A.15: this ICMP error is mapped by the kernel into the error ECONNREFUSED, which corresponds to the message string output by our err_sys function: “Connection refused.”

8.13 Lack of Flow Control with UDP

Figure 8.19: dg_cli writes 2,000 1,400-byte UDP datagrams to the server.

Figure 8.20: the server receives datagrams and counts the number received. This server no longer echoes datagrams back to the client. When we terminate the server with our interrupt key (SIGINT), it prints the number of received datagrams and terminates.

We run netstat -s on the server, both before and after, as the statistics that are output tell us how many datagrams were lost. Figure 8.21 shows the output on the server.

The client sent 2,000 datagrams, but the server application received only 30 of these, for a 98% loss rate.
If we look at the netstat output, the total number of datagrams received by the server host (not the server application) is 2,000 (73,208 - 71,208). The counter “dropped due to full socket buffers” indicates how many datagrams were received by UDP but were discarded because the receiving socket's receive queue was full. This value is 1,970 (3,941 - 1,971), which when added to the counter output by the application (30), equals the 2,000 datagrams received by the host.
The number of datagrams received by the server in this example is not predictable. It depends on many factors, such as the network load, the processing load on the client host, and the processing load on the server host.
If we run the same client and server, but this time with the client on the slow Sun and the server on the faster RS/6000, no datagrams are lost.

$ udpserv06
^?                          #we type our interrupt key after the client is finished
received 2000 datagrams

The number of UDP datagrams that are queued by UDP for a given socket is limited by the size of that socket's receive buffer. We can change this with the SO_RCVBUF socket option (Section 7.5). The default size of the UDP socket receive buffer under FreeBSD is 42,080 bytes, which allows room for only 30 of our 1,400-byte datagrams. If we increase the size of the socket receive buffer, we expect the server to receive additional datagrams. Figure 8.22 shows a modification to the dg_echo function from Figure 8.20 that sets the socket receive buffer to 220 * 1024 bytes (220 KB).

If we run this server on the Sun and the client on the RS/6000, the count of received datagrams is now 103. While this is better than the earlier example with the default socket receive buffer, it is no panacea.
Why set the receive socket buffer size to 220 * 1024 in Figure 8.22?
The maximum size of a socket receive buffer in FreeBSD defaults to 262144 bytes (256 * 1,024), but due to the buffer allocation policy, the actual limit is 233016 bytes.
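The buffer arithmetic and the SO_RCVBUF call can be sketched as below. Both function names are hypothetical, and datagrams_that_fit deliberately ignores per-datagram kernel overhead, which is an assumption that keeps the estimate simple. Reading the option back with getsockopt shows what the kernel actually granted, which may differ from the request (it can be rounded or capped, and Linux reports double the requested value).

```c
#include <sys/socket.h>

/* Hypothetical estimate: how many datagrams of dgram_size bytes fit in
 * buf_size bytes, ignoring any per-datagram bookkeeping overhead. */
int datagrams_that_fit(int buf_size, int dgram_size)
{
    return buf_size / dgram_size;
}

/* Hypothetical helper: request a receive buffer of `bytes` and return the
 * size the kernel reports afterwards, or -1 on error. */
int set_rcvbuf(int sockfd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

With the numbers from the text, datagrams_that_fit(42080, 1400) gives 30, which matches the count the server application reported with the default buffer.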

8.14 Determining Outgoing Interface with UDP

A connected UDP socket can be used to determine the outgoing interface that will be used to a particular destination. This is because of a side effect of the connect function when applied to a UDP socket: The kernel chooses the local IP address assuming the process has not already called bind to explicitly assign this. This local IP address is chosen by searching the routing table for the destination IP address, and then using the primary IP address for the resulting interface.

Figure 8.23: connects to a specified IP address and then calls getsockname, printing the local IP address and port.
If we run the program on the multihomed host freebsd, we have the following output:

$ udpcli09 206.168.112.96
local address 12.106.32.254:52329
$ udpcli09 192.168.42.2
local address 192.168.42.1:52330
$ udpcli09 127.0.0.1
local address 127.0.0.1:52331

The first time: the command-line argument is an IP address that follows the default route. The kernel assigns the local IP address to the primary address of the interface to which the default route points.
The second time: the argument is the IP address of a system connected to a second Ethernet interface, so the kernel assigns the local IP address to the primary address of this second interface.
Calling connect on a UDP socket does not send anything to that host; it is entirely a local operation that saves the peer’s IP address and port. Calling connect on an unbound UDP socket also assigns an ephemeral port to the socket.
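Since Figure 8.23 is not reproduced here, the following is a rough IPv4-only sketch of the same idea using plain system calls. The function name local_address_for and its interface are assumptions made for illustration; the book's program prints the result rather than returning it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a UDP socket to dest_ip:dest_port and report
 * the local address the kernel chose. The dotted-decimal local IP address is
 * written to out (at least INET_ADDRSTRLEN bytes). Returns the local
 * ephemeral port in host byte order, or -1 on error. */
int local_address_for(const char *dest_ip, unsigned short dest_port,
                      char *out, socklen_t outlen)
{
    struct sockaddr_in servaddr, cliaddr;
    socklen_t len = sizeof(cliaddr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(dest_port);

    /* connect is a purely local operation here: no datagram is sent */
    if (inet_pton(AF_INET, dest_ip, &servaddr.sin_addr) == 1 &&
        connect(fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&cliaddr, &len) == 0 &&
        inet_ntop(AF_INET, &cliaddr.sin_addr, out, outlen) != NULL)
        port = ntohs(cliaddr.sin_port);

    close(fd);
    return port;
}
```

For a loopback destination the reported local address is the loopback address itself, as in the third run above; for other destinations the result depends on the host's routing table.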

8.15 TCP and UDP Echo Server Using ‘select’

We combine our concurrent TCP echo server from Chapter 5 with our iterative UDP echo server from this chapter into a single server that uses select to multiplex a TCP and UDP socket.

Create listening TCP socket 14–22

A listening TCP socket is created that is bound to the server’s well-known port. We set the SO_REUSEADDR socket option in case connections exist on this port.

Create UDP socket 23–29

A UDP socket is also created and bound to the same port. Even though the same port is used for TCP and UDP sockets, there is no need to set the SO_REUSEADDR socket option before this call to bind, because TCP ports are independent of UDP ports.

Establish signal handler for SIGCHLD 30

A signal handler is established for SIGCHLD because TCP connections will be handled by a child process. We showed this signal handler in Figure 5.11.

Prepare for select 31–32

We initialize a descriptor set for select and calculate the maximum of the two descriptors for which we will wait.

Call select 34–41

We call select, waiting only for readability on the listening TCP socket or readability on the UDP socket. Since our sig_chld handler can interrupt our call to select, we handle an error of EINTR.

Handle new client connection 42–51

We accept a new client connection when the listening TCP socket is readable, fork a child, and call our str_echo function in the child. This is the same sequence of steps we used in Chapter 5.

Handle arrival of datagram 52–57

If the UDP socket is readable, a datagram has arrived. We read it with recvfrom and send it back to the client with sendto.

8.16 Summary

Converting our echo client/server to use UDP instead of TCP was simple. But lots of features provided by TCP are missing: detecting lost packets and retransmitting, verifying responses as being from the correct peer, and the like. We will return to this topic in Section 22.5 and see what it takes to add some reliability to a UDP application.
UDP sockets can generate asynchronous errors, that is, errors that are reported some time after a packet is sent. TCP sockets always report these errors to the application, but with UDP, the socket must be connected to receive these errors.
UDP has no flow control, and this is easy to demonstrate. Normally, this is not a problem, because many UDP applications are built using a request-reply model, and not for transferring bulk data.
There are still more points to consider when writing UDP applications, but we will save these until Chapter 22, after covering the interface functions, broadcasting, and multicasting.

Exercises
