tcp4---(a summary)---what.if.process.crash.and.reboot

#
# This doc describes:
#
#       the internal handling after a local / remote $(tcp-sock) process crashes and reboots.
#
# including:
#
#       possible result of binding to local address / port
#
#       possible result of receiving *late* segments from remote side
#
#
# Note that some books' and webpages' descriptions are obscure / complicated / inaccurate / scary on these topics. But this doc should be clear ( at least to myself ).

#
# ---------------------------------------------------------------------------------------------------------------
#
# For clarity of description, we need to define the role of the remote side and the local side.
#
#       local $(peer-sock)
#                                       remote $(peer-sock)
#       local $(listen-sock)
#
#
# For simplicity of description, we don't consider:        # otherwise, there would be many branches to follow in our description.
#
#       "simultaneous connection setup" case
#


===============================================================================================================


--- index:


---------------------------------------------------------------------------------------------------------------


=> [*][*][*] don't worry, it is VERY unlikely that *late* segments after a crash cause problems.   # <quote>:  <<ServerFramework---TIME_WAIT and its design implications for protocols and scalable client server systems>>




---------------------------------------------------------------------------------------------------------------


=> local $(peer-sock) process crash and reboot


    => when local $(peer-sock) process suddenly crashes:


    => when local $(peer-sock) process immediately reboots again:


---------------------------------------------------------------------------------------------------------------


=> local $(listen-sock) process crash and reboot


    => when crash, cleanup of local $(listen-sock):    # [*] doesn't affect already-accepted local forked $(peer-sock)


    => when crash, if local $(listen-sock) and local forked $(peer-sock) are not in the same process:


    => when crash, if local $(listen-sock) and local forked $(peer-sock) are in the same process:




    => local $(listen-sock) process immediately reboots again


        => when segments from previous remote $(peer-sock) come


---------------------------------------------------------------------------------------------------------------


===============================================================================================================


@@ [*][*][*] don't worry, it is VERY unlikely that *late* segments after a crash cause problems.   # <quote>:  <<ServerFramework---TIME_WAIT and its design implications for protocols and scalable client server systems>>


 In the diagram above we have two connections from end point 1 to end point 2. The address and port of each end point is the same in each connection. The first connection terminates with the active close initiated by end point 2. If end point 2 wasn't kept in TIME_WAIT for long enough to ensure that all segments from the previous connection had been invalidated then a delayed segment (with appropriate sequence numbers) could be mistaken for part of the second connection...




 Note that it is very unlikely that delayed segments will cause problems like this. Firstly the address and port of each end point needs to be the same; which is normally unlikely as the client's port is usually selected for you by the operating system from the ephemeral port range and thus changes between connections. Secondly, the sequence numbers for the delayed segments need to be valid in the new connection which is also unlikely. However, should both of these things occur then TIME_WAIT will prevent the new connection's data from being corrupted.




 The second reason for the TIME_WAIT state is to implement TCP's full-duplex connection termination reliably. If the final ACK from end point 2 is dropped then the end point 1 will resend the final FIN. If the connection had transitioned to CLOSED on end point 2 then the only response possible would be to send an RST as the retransmitted FIN would be unexpected. This would cause end point 1 to receive an error even though all data was transmitted correctly.




===============================================================================================================


@@ local $(peer-sock) process crash and reboot


===============================================================================================================


@@-@ when local $(peer-sock) process suddenly crashes:


 The whole FD table of the local process is cleaned up by the exit() routine, so the FD referring to the local $(peer-sock) is released.
 
 For a local $(peer-sock), we have the following cleanup:
    
    -----------------------------------------------------
 
    "socket_file_ops->release()" = sock_close() -> sock_release(SOCKET_I(inode));
    -> "inet_stream_ops->release()" = inet_release()
    -> "tcp_prot->close()" = tcp_close()
 
            -----------------------------------------------------


            if (unlikely(tcp_sk(sk)->repair)) {


                sk->sk_prot->disconnect(sk, 0); = tcp_disconnect()


                tcp_set_state(sk, TCP_CLOSE);


                tcp_send_active_reset(sk, gfp_any());


            -----------------------------------------------------


            } else if (data_was_unread) {


                tcp_set_state(sk, TCP_CLOSE);


                tcp_send_active_reset(sk, sk->sk_allocation);


            -----------------------------------------------------


            } else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {


                sk->sk_prot->disconnect(sk, 0); = tcp_disconnect()


                tcp_set_state(sk, TCP_CLOSE);


                tcp_send_active_reset(sk, gfp_any());


            -----------------------------------------------------


            } else if (tcp_close_state(sk)) {   # from ESTABLISHED, transition to FIN_WAIT1


                tcp_send_fin(sk);


            -----------------------------------------------------


            sk_stream_wait_close(sk, timeout);


            -----------------------------------------------------


    -----------------------------------------------------


 
[*][*] That is to say, even if the local process is killed or crashes, its local $(peer-sock) might still exist for a while, in the middle of connection teardown:
 
        [*][*] In what follows, we consider the worst case: the local $(peer-sock) is in FIN_WAIT1 state, hoping for a graceful teardown.
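 
 [*] Side note, a minimal userspace sketch ( assumed code, not from the kernel excerpt above ): which tcp_close()
     branch fires can be chosen from userspace. Setting SO_LINGER with l_linger == 0 drives the
     "SOCK_LINGER && !sk_lingertime" branch, so close() sends a RST instead of entering the FIN_WAIT1 graceful teardown.

        #include <sys/socket.h>
        #include <unistd.h>

        /* force RST-on-close for an already-connected TCP fd */
        static void close_with_rst(int fd)
        {
            struct linger lg = { .l_onoff = 1, .l_linger = 0 };

            setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
            close(fd);      /* tcp_send_active_reset() path; no FIN, no TIME_WAIT left behind */
        }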
 


===============================================================================================================


@@-@ when local $(peer-sock) process immediately reboots again:


 It would perform the following operations:


    -----------------------------------------------------


    #0. create a *current* local $(peer-sock).


    -----------------------------------------------------


    #1. bind to a local socket address ( local IP address + local port )    # maybe autobind


        -----------------------------------------------------


        #1.a. If the local port is randomly selected ( by specifying 0 ), then that is OK.


        -----------------------------------------------------


        #1.b. If the local port is a *fixed* one, then we would fail to bind, because:      # [*][*]


                the *previous* local $(peer-sock), from the last process lifetime, still exists and occupies this local port.


                    --- The previous local $(peer-sock) needs this local port for the remaining teardown packet exchange.


                    --- The previous local $(peer-sock) might be in TCP_TIME_WAIT state, but it is still holding this local port.


                        As shown by:


                            "inet_stream_ops->bind()" = inet_bind()


                                "tcp_prot->get_port()" = inet_csk_get_port()


                                    "ipv4_specific->bind_conflict()" = inet_csk_bind_conflict()


                                        if ((!reuse || !sk2->sk_reuse || sk2->sk_state == TCP_LISTEN) &&
                                            (!reuseport || !sk2->sk_reuseport || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2))))) {


                                            ......
                    
                    --- $(tw-reuse) takes effect in the connect() syscall, not in bind().
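
                [*] A minimal userspace sketch of #1.b ( assumed code; port 50000 is an arbitrary example value ):
                    re-binding the fixed port fails with EADDRINUSE while the previous local $(peer-sock) still owns it.
                    Whether SO_REUSEADDR rescues the rebind follows the inet_csk_bind_conflict() logic quoted above
                    ( the previous sock must also have sk_reuse set, and must not be in TCP_LISTEN ).

                        #include <stdio.h>
                        #include <string.h>
                        #include <arpa/inet.h>
                        #include <sys/socket.h>
                        #include <unistd.h>

                        int main(void)
                        {
                            int fd = socket(AF_INET, SOCK_STREAM, 0);
                            int one = 1;
                            struct sockaddr_in addr;

                            /* relaxes the bind_conflict() check quoted above */
                            setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

                            memset(&addr, 0, sizeof(addr));
                            addr.sin_family = AF_INET;
                            addr.sin_addr.s_addr = htonl(INADDR_ANY);
                            addr.sin_port = htons(50000);   /* the *fixed* local port */

                            if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                                perror("bind");             /* EADDRINUSE while the previous sock lives */

                            close(fd);
                            return 0;
                        }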




    -----------------------------------------------------


    #2. connect to the remote side; suppose the remote side is a remote $(listen-sock)


        -----------------------------------------------------


        #2.a. if local $(peer-sock) is bound to a different local port, then:


            This is fine; remote $(listen-sock) would respond normally with a regular connection setup.




        -----------------------------------------------------


        #2.b. if local $(peer-sock) is bound to a fixed local port, then:


            -----------------------------------------------------


            In most cases, this is also fine.


            -----------------------------------------------------


            But a minor chance exists:

                Suppose previously we sent a RST to remote $(peer-sock), but the RST was lost and never reached
                remote $(peer-sock). Therefore, remote $(peer-sock) still exists ( suppose it is not very active,
                and its keepalive timer has not expired ).


    
            [*][*] In this case, our *current* local $(peer-sock) is trying to send a 1st SYN to an old remote $(peer-sock). But remote $(peer-sock) would not answer it with a SYN/ACK.




            As we can see from the old remote $(peer-sock)'s handling of the 1st SYN:




                "tcp_protocol->handler()" = tcp_v4_rcv()




                    sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);   # [*] remote $(peer-sock) would be found first,
                                                                                        #  not remote $(listen-sock).


                        struct sock *sk = __inet_lookup_established(net, hashinfo, saddr, sport, daddr, hnum, dif);


                        return sk ? : __inet_lookup_listener(net, hashinfo, saddr, sport, daddr, hnum, dif);




                    tcp_v4_do_rcv(sk, skb); -> if (sk->sk_state == TCP_ESTABLISHED) { tcp_rcv_established(sk, skb); }   # remote $(peer-sock) is still in ESTABLISHED state


                        if (!tcp_validate_incoming(sk, skb, th, 1)) return;


                            /* step 4: Check for a SYN
                             * RFC 5961 4.2 : Send a challenge ack
                             */
                            if (th->syn) { 


                                tcp_send_challenge_ack(sk);     # [*] the remote $(peer-sock) sends a plain ACK, not a SYN/ACK


                                goto discard;       # return false;
                        


            [*][*] Then remote $(peer-sock) sends a plain ACK, not a SYN/ACK. That ACK does not acknowledge our SYN, so our local $(peer-sock) ( in SYN_SENT ) answers it with a RST ( tcp_rcv_synsent_state_process() -> "reset_and_undo" -> tcp_v4_send_reset() ). Once the stale remote $(peer-sock) accepts that RST and closes, a retransmitted SYN can finally set up the connection. Until then ( or until remote $(peer-sock) is closed by its keepalive timer or by its upper user-level process ), the expected connection setup is stalled.
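
            [*] A minimal sketch of the #2.b trigger ( assumed code; addresses / ports are example values ):
                a client that insists on a *fixed* local source port must bind() before connect().
                If its SYN keeps hitting a stale remote $(peer-sock), connect() is delayed by the
                challenge-ACK / RST exchange described above.

                    #include <string.h>
                    #include <arpa/inet.h>
                    #include <sys/socket.h>
                    #include <unistd.h>

                    /* connect() from a *fixed* local source port; returns fd or -1 */
                    static int connect_from_fixed_port(const char *dst_ip, unsigned short dst_port,
                                                       unsigned short src_port)
                    {
                        int fd = socket(AF_INET, SOCK_STREAM, 0);
                        struct sockaddr_in src, dst;

                        if (fd < 0)
                            return -1;

                        memset(&src, 0, sizeof(src));
                        src.sin_family = AF_INET;
                        src.sin_addr.s_addr = htonl(INADDR_ANY);
                        src.sin_port = htons(src_port);         /* the *fixed* local port */

                        memset(&dst, 0, sizeof(dst));
                        dst.sin_family = AF_INET;
                        dst.sin_port = htons(dst_port);
                        inet_pton(AF_INET, dst_ip, &dst.sin_addr);

                        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
                            connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
                            close(fd);  /* may block across SYN retransmissions if a stale remote sock is in the way */
                            return -1;
                        }
                        return fd;
                    }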




    -----------------------------------------------------


===============================================================================================================


@@ local $(listen-sock) process crash and reboot


===============================================================================================================


@@-@ when crash, cleanup of local $(listen-sock):  # [*] doesn't affect already-accepted local forked $(peer-sock)


 For a local $(listen-sock), we have the following cleanup:
    
    -----------------------------------------------------
 
    "socket_file_ops->release()" = sock_close() -> sock_release(SOCKET_I(inode));
    -> "inet_stream_ops->release()" = inet_release()
    -> "tcp_prot->close()" = tcp_close()
 
        if (sk->sk_state == TCP_LISTEN) {


            ------------------------------------------------------


            tcp_set_state(sk, TCP_CLOSE); -> case TCP_CLOSE:


                sk->sk_prot->unhash(sk);    # unlink local $(listen-sock) from listen_hash table


                if (inet_csk(sk)->icsk_bind_hash && !(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) 


                    inet_put_port(sk);      # [*] unlink local $(listen-sock) from bhash table, release its local port.


            ------------------------------------------------------


            inet_csk_listen_stop(sk);           


                ------------------------------------------------------


                reqsk_queue_destroy(queue);     # destroy all $(request-sock) in synQ


                ------------------------------------------------------


                while ((req = acc_req) != NULL) {   # destroy all $(request-sock) and corresponding local forked $(peer-sock) in acceptQ ( not yet returned by accept() )
    
                    sk->sk_prot->disconnect(child, O_NONBLOCK); = tcp_disconnect()


                        tcp_set_state(sk, TCP_CLOSE);


                        tcp_send_active_reset(sk, gfp_any());


                ------------------------------------------------------




 [*][*] That is to say, cleanup of a local $(listen-sock) does NOT affect those local forked $(peer-sock) which have already been returned by the accept() syscall.
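
 [*] A minimal sketch of the acceptQ part ( assumed code; loopback + port 50000 are example values ): a connection
     that completed its handshake but was never returned by accept() gets a RST when the $(listen-sock) is closed.

        #include <stdio.h>
        #include <string.h>
        #include <arpa/inet.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main(void)
        {
            int lfd = socket(AF_INET, SOCK_STREAM, 0);
            int cfd = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr;
            char c;

            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_port = htons(50000);
            inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

            bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
            listen(lfd, 128);

            connect(cfd, (struct sockaddr *)&addr, sizeof(addr));  /* handshake completes, child sits in acceptQ */

            close(lfd);             /* inet_csk_listen_stop() -> RST to the un-accept()ed child */

            if (read(cfd, &c, 1) < 0)
                perror("read");     /* ECONNRESET ( loopback delivers the RST almost immediately ) */
            close(cfd);
            return 0;
        }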


                
===============================================================================================================


@@-@ when crash, if local $(listen-sock) and local forked $(peer-sock) are not in the same process:


 It is likely that:


    When the local process owning $(listen-sock) accepts an incoming connection request via the accept() syscall, it forks a new child
*process* specifically to handle this connection ( the local forked $(peer-sock) is inherited by the new child process across fork() ).




 In this case, even if the parent process owning the local $(listen-sock) crashes, the child process owning the local forked $(peer-sock) is still OK.




 [*][*] And remote $(peer-sock) and local forked $(peer-sock) can keep communicating with each other without noticing any problem.
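
 [*] A minimal fork-per-connection sketch ( assumed code, not from this doc; port 50000 is an example value ):
     the parent owns $(listen-sock), each accepted $(peer-sock) is handed to a forked child, so a later parent
     crash only cleans up $(listen-sock), while the children's $(peer-sock) FDs survive untouched.

        #include <signal.h>
        #include <string.h>
        #include <arpa/inet.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main(void)
        {
            int lfd = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr;

            signal(SIGCHLD, SIG_IGN);               /* auto-reap exited children */

            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(50000);           /* example well-known port */

            bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
            listen(lfd, 128);

            for (;;) {
                int cfd = accept(lfd, NULL, NULL);  /* local forked $(peer-sock) */
                if (cfd < 0)
                    continue;
                if (fork() == 0) {                  /* child inherits cfd across fork() */
                    char buf[512];
                    ssize_t n;

                    close(lfd);                     /* child doesn't need $(listen-sock) */
                    while ((n = read(cfd, buf, sizeof(buf))) > 0)
                        write(cfd, buf, n);         /* echo; survives a parent crash */
                    _exit(0);
                }
                close(cfd);                         /* parent drops its reference */
            }
        }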




===============================================================================================================


@@-@ when crash, if local $(listen-sock) and local forked $(peer-sock) are in the same process:


 This is the more usual case. If the process crashes, its whole FD table is cleaned up, including the local $(listen-sock) 
 and the local forked $(peer-sock).


 
 The local $(listen-sock) would be cleaned up as described in:


        "=> when crash, clean up to local $(listen-sock) " section




 The local forked $(peer-sock) ( already returned by the accept() syscall ) would be cleaned up as described in:


        "=> when local $(peer-sock) process suddenly crashes:" section




===============================================================================================================


@@-@ local $(listen-sock) process immediately reboots again


 The process logic would be:


    -----------------------------------------------------


    #0. create a *current* local $(listen-sock).


    -----------------------------------------------------


    #1. bind this *current* local $(listen-sock) to a *fixed* well-known local port.


        This would succeed without any problem, because:


            Cleanup of the *previous* local $(listen-sock) already released the *fixed* local port.


            A local forked $(peer-sock) ( living in another process ) may still be bound to this *fixed* local port, but it does not block the *current* local $(listen-sock)'s binding.


        As shown by:


            "inet_stream_ops->bind()" = inet_bind()


                "tcp_prot->get_port()" = inet_csk_get_port()            


                    if (((tb->fastreuse > 0 &&
                          sk->sk_reuse &&
                          sk->sk_state != TCP_LISTEN) ||        # [*][*] note: at bind() time, sk is not yet in TCP_LISTEN state, so the sk_reuse fast path applies to a to-be $(listen-sock) too
                         (tb->fastreuseport > 0 &&
                          sk->sk_reuseport &&
                          uid_eq(tb->fastuid, uid))) &&
                        (tb->num_owners < smallest_size || smallest_size == -1)) {
                        smallest_size = tb->num_owners;
                        smallest_rover = rover;
                        if (atomic_read(&hashinfo->bsockets) > (high - low) + 1 &&
                            !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false)) {
                            snum = smallest_rover;
                            goto tb_found;
                        }
                    }


    -----------------------------------------------------


    #2. select() / accept() on *current* local $(listen-sock), for new incoming connection request.
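
    [*] Putting #0 / #1 / #2 together, a minimal restart sketch ( assumed code; port 50000 is an example value ):
        the rebooted server re-creates its $(listen-sock) and re-binds the *fixed* well-known port right away.
        SO_REUSEADDR is the usual belt-and-braces here, matching the sk_reuse check in the inet_csk_get_port()
        excerpt above.

            #include <string.h>
            #include <arpa/inet.h>
            #include <sys/socket.h>
            #include <unistd.h>

            /* returns a listening fd bound to the fixed well-known port, or -1 */
            static int make_listen_sock(unsigned short port)
            {
                int fd = socket(AF_INET, SOCK_STREAM, 0);
                int one = 1;
                struct sockaddr_in addr;

                if (fd < 0)
                    return -1;
                setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

                memset(&addr, 0, sizeof(addr));
                addr.sin_family = AF_INET;
                addr.sin_addr.s_addr = htonl(INADDR_ANY);
                addr.sin_port = htons(port);

                if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
                    listen(fd, 128) < 0) {
                    close(fd);
                    return -1;
                }
                return fd;   /* ready for select() / accept() ( step #2 ) */
            }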




    -----------------------------------------------------


===============================================================================================================


@@-@-@ when segments from previous remote $(peer-sock) come


---------------------------------------------------------------------------------------------------------------


 As previously said, if the local forked $(peer-sock) was not in the same process, then it is just OK:


        sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);


            struct sock *sk = __inet_lookup_established(net, hashinfo,
                        saddr, sport, daddr, hnum, dif);
        
            return sk ? : __inet_lookup_listener(net, hashinfo, saddr, sport,
                                 daddr, hnum, dif);




 The lookup above would find the local forked $(peer-sock) in the other process, which was _NOT_ affected at all by the crash of the previous local $(listen-sock) process.




---------------------------------------------------------------------------------------------------------------


 If the local $(peer-sock) was in the same process as the previous $(listen-sock), then there is a minor chance that:


        #a. Either the local $(peer-sock) was still in the previous local $(listen-sock)'s acceptQ during cleanup, and the RST sent to
            remote $(peer-sock) got lost in the network.




        #b. Or the local $(peer-sock) was previously accepted, so its cleanup may attempt a graceful "regular teardown" ( FIN_WAIT1 ).
            ( A RST is another option; that would be the same as #a, the RST might get lost. )




    ------------------------------------------------------


    In case #a, the previous remote $(peer-sock) would not know the previous local $(peer-sock) has been CLOSEd, and would still send segments.


        then:


            "tcp_protocol->handler()" = tcp_v4_rcv()


                sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);   # [*] Would find *current* local $(listen-sock)




                tcp_v4_do_rcv(sk, skb);


                    if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) { goto reset;   # [*] Would send RST to previous remote $(peer-sock)


                        case TCP_LISTEN:    # [*] in TCP_LISTEN, only an ingress SYN is processed; an ingress ACK makes it return 1 ( -> RST )
        


        That is to say, in this case, the previous remote $(peer-sock) would be RSTed by the *current* local $(listen-sock).




    ------------------------------------------------------


    In case #b, when the previous remote $(peer-sock) sends segments, the previous local $(peer-sock) ( still in teardown, e.g. FIN_WAIT1 ) would be found by:


                sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);


        and any ingress segment would go through the graceful "regular teardown" path handling.




    ------------------------------------------------------


---------------------------------------------------------------------------------------------------------------


===============================================================================================================