Asynchronous Servers in Python
Nicholas Piël | December 22, 2009

A lot has already been written on the C10K problem, and it is well known that the only viable option for handling LOTS of concurrent connections is to handle them asynchronously. This also shows that for massively concurrent problems, such as lots of parallel comet connections, the GIL in Python is a non-issue, as we handle the concurrent connections in a single thread.
In this post I am going to look at a selection of asynchronous servers implemented in Python.
Asynchronous Server Specs
Since Python is really rich with (asynchronous) frameworks, I collected a few and looked at the following features:
- What License does the framework have?
- Does it provide documentation?
- Does the documentation contain examples?
- Is it used in production somewhere?
- Does it have some sort of community (mailinglist, irc, etc..)?
- Is there any recent activity?
- Does it have a blog (from the owner)?
- Does it have a twitter account?
- Where can I find the repository?
- Does it have a Thread Pool?
- Does it provide access to a TCP Socket?
- Does it have any Comet features?
- Is it using EPOLL?
- What kind of server is it? (greenlets, callbacks, generators etc..)
This gave me the following table.
This is quite a list, and I probably still missed a few. The main reason for using a framework instead of implementing something yourself is that you hope to accelerate your own development process by standing on the shoulders of other developers. I therefore think it is important that there is documentation, some sort of developer community (e.g. a mailing list) and that the project is still active. If we take this as a requirement, we are left with the following solutions:
- Orbited / Twisted (callbacks)
- Tornado (async)
- Dieselweb (generator)
- Eventlet (greenlet)
- Concurrence (stackless)
- Circuits (async)
- Gevent (greenlet)
- Cogen (generator)
To quickly summarize this list: Twisted has been the de-facto standard for async programming in Python. It has an immense community and a wealth of tools, protocols and features. It has also grown big, and some say overly complex. This is one of the biggest reasons why people are looking elsewhere. Recently Facebook released the code of their async approach, called Tornado, which also uses callbacks, and recent benchmarks show that it outperforms Twisted.
A commonly heard argument against programming with callbacks is that it can get overly complex. A programmatically cleaner approach (imho) is to use light-weight threads. This can be achieved by using a different Python implementation, Stackless (as Concurrence does), or a plugin for regular Python, Greenlet (as Eventlet and Gevent do). Another approach is to simulate these light-weight threads with Python generators, as Dieselweb and Cogen do.
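To make the generator idea concrete, here is a toy round-robin scheduler of my own (an illustrative sketch, not code from any of these frameworks): each task is a plain generator, and every 'yield' hands control back to the scheduler, just as the generator-based frameworks yield whenever they would block on I/O.

```python
import collections

def scheduler(tasks):
    """Run generator 'tasks' round-robin until all are exhausted."""
    queue = collections.deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)        # run the task until its next yield
        except StopIteration:
            continue          # task finished, drop it
        queue.append(task)    # otherwise put it back in the rotation

def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield                 # cooperatively give up control

log = []
scheduler([worker('a', 2, log), worker('b', 2, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)] - the tasks interleave
```

The frameworks replace the bare 'yield' with "yield until this socket is ready", but the control flow is the same.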
This should already show that while all these frameworks provide asynchronous concurrency, they each do so in their own way. I want to invite you to look at these frameworks, as they all have their own code gems. For example, Concurrence has a non-blocking interface to MySQL, Eventlet has a neat thread pool, Tornado can pre-fork over CPUs, Gevent offloads HTTP header parsing and DNS lookups to Libevent, Cogen has sendfile support, and Twisted probably already has a factory doing exactly what you are planning to do next.
The Ping Pong Benchmark
In this benchmark I am going to focus on the performance of each framework when listening on a socket and writing to incoming connections. The client pings the socket by opening it; the server responds with 'Pong!' and closes the socket. This sounds really simple, but it is a pain to implement in an asynchronous, non-blocking way from scratch, and that is exactly why we are looking at these frameworks. It is all about making our lives easier.
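To get a feeling for why from-scratch is painful, here is a minimal hand-rolled sketch of my own using the stdlib select module (not taken from any of the frameworks below); even this toy version already needs per-socket write buffering and its own event loop:

```python
import select
import socket

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

def serve(port, max_conns):
    """Accept connections, write RESPONSE, close - all non-blocking."""
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(('localhost', port))
    listener.listen(500)
    listener.setblocking(False)
    pending = {}   # socket -> bytes still to be written
    served = 0
    while served < max_conns:
        r, w, _ = select.select([listener], list(pending), [], 1.0)
        if r:
            conn, addr = listener.accept()
            conn.setblocking(False)
            pending[conn] = RESPONSE
        for conn in w:
            sent = conn.send(pending[conn])     # may be a partial write
            pending[conn] = pending[conn][sent:]
            if not pending[conn]:               # response fully written
                del pending[conn]
                conn.close()
                served += 1
    listener.close()
```

All of this bookkeeping (and much more: timeouts, errors, reads) is what the frameworks below hide behind their respective abstractions.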
OK, for this benchmark I am going to use httperf, a high-performance tool that understands the HTTP protocol. If we want httperf to play along in our ping-pong benchmark, we have to make it understand the 'Pong!' response. We can do this by mimicking an HTTP server and having our server respond with:
HTTP/1.0 200 OK
Content-Length: 5

Pong!
instead of just 'Pong!'. Also, since most default server configurations are not set up to handle a large number of concurrent requests, we need to make a few adjustments:
- Raise the per-process file limit by compiling httperf after some adjustments.
- Raise the per-user file limit: set 'ulimit -n 10000' on both server and client.
- Raise the kernel limit on file handles: 'echo "128000" > /proc/sys/fs/file-max'.
- Increase the connection backlog: 'sysctl -w net.core.netdev_max_backlog=2500'.
- Raise the maximum connections: 'sysctl -w net.core.somaxconn=250000'.
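The per-process descriptor limit can also be inspected and raised from inside Python itself via the stdlib resource module (a Unix-only sketch; an unprivileged process can only raise its soft limit up to the hard limit):

```python
import resource

# Current per-process file-descriptor limits (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft=%d hard=%d" % (soft, hard))

# Move the soft limit towards the 10000 used above, but never
# beyond the hard limit (only root may raise that one).
if hard == resource.RLIM_INFINITY:
    target = 10000
else:
    target = min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print("soft limit is now %d" % resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```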
With these settings my Debian Lenny system was ready to hammer the different servers at rates far beyond the capacity of the frameworks. I used the following command:
httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
and increased the rate in steps of 100, from 400 up to 9000 requests per second, for a total of 40,000 requests at each step.
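If you wanted to script that sweep instead of launching httperf by hand, a small driver of my own devising could build one command line per rate (actually invoking httperf is left commented out, since it may not be installed):

```python
def build_commands(rates, port=10000):
    """Build one httperf command line per request rate."""
    cmds = []
    for rate in rates:
        cmds.append(["httperf", "--hog", "--timeout=60", "--client=0/1",
                     "--server=localhost", "--port=%d" % port, "--uri=/",
                     "--rate=%d" % rate, "--send-buffer=4096",
                     "--recv-buffer=16384", "--num-conns=40000",
                     "--num-calls=1"])
    return cmds

# Rates from 400 to 9000 in steps of 100, as in the benchmark.
cmds = build_commands(range(400, 9001, 100))
print(len(cmds))  # 87 rate steps
# for cmd in cmds: subprocess.run(cmd)  # would run the actual sweep
```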
Code
What now follows is the implementation of the server side in the different frameworks. It should show the different approaches the frameworks take.
Twisted
Gentlemen start your reactor!
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor

class Pong(Protocol):
    def connectionMade(self):
        self.transport.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
        self.transport.loseConnection()

# Start the reactor
factory = Factory()
factory.protocol = Pong
reactor.listenTCP(8000, factory)
reactor.run()
Tornado
Tornado does not hide the raw socket interface, which makes this example lengthier than the others.
import errno
import functools
import socket
from tornado import ioloop, iostream

def connection_ready(sock, fd, events):
    while True:
        try:
            connection, address = sock.accept()
        except socket.error, e:
            if e[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        connection.setblocking(0)
        stream = iostream.IOStream(connection)
        stream.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n", stream.close)

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind(("", 8010))
    sock.listen(5000)
    io_loop = ioloop.IOLoop.instance()
    callback = functools.partial(connection_ready, sock)
    io_loop.add_handler(sock.fileno(), callback, io_loop.READ)
    try:
        io_loop.start()
    except KeyboardInterrupt:
        io_loop.stop()
        print "exited cleanly"
Dieselweb
While this example is beautifully small, I do not really enjoy the generator approach, which sprinkles 'yield' all over the place.
from diesel import Application, Service

def server_pong(addr):
    yield "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

app = Application()
app.add_service(Service(server_pong, 8020))
app.run()
Circuits
I think the Circuits code is the most beautiful of them all, very elegant.
from circuits.net.sockets import TCPServer

class PongServer(TCPServer):
    def connect(self, sock, host, port):
        self.write(sock, 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n')
        self.close(sock)

PongServer(('localhost', 8050)).run()
Eventlet
Eventlet uses the greenlet approach.
from eventlet import api

def handle_socket(sock):
    sock.makefile('w').write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = api.tcp_listener(('localhost', 8030))
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)
Gevent
Gevent is presented as a rewrite of Eventlet focusing on performance.
import gevent
from gevent import socket

def handle_socket(sock):
    sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = socket.socket()
server.bind(('localhost', 8070))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    gevent.spawn(handle_socket, new_sock)
Concurrence
Concurrence uses the tasklet approach; it can run both under Greenlet and under Stackless Python. In this benchmark there was not really any performance difference between the two engines.
from concurrence import dispatch, Tasklet
from concurrence.io import BufferedStream, Socket

def handler(client_socket):
    stream = BufferedStream(client_socket)
    writer = stream.writer
    writer.write_bytes("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    writer.flush()
    stream.close()

def server():
    server_socket = Socket.new()
    server_socket.bind(('localhost', 8040))
    server_socket.listen()
    while True:
        client_socket = server_socket.accept()
        Tasklet.new(handler)(client_socket)

if __name__ == '__main__':
    dispatch(server)
Cogen
Cogen uses the generator approach as well.
import sys
from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine

@coroutine
def server():
    srv = sockets.Socket()
    adr = ('0.0.0.0', len(sys.argv) > 1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(500)
    while 1:
        conn, addr = yield srv.accept()
        fh = conn.makefile()
        yield fh.write("HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nHello World!\r\n")
        yield fh.flush()
        conn.close()

m = schedulers.Scheduler()
m.add(server)
m.run()
Results
The first graph clearly shows at which connection rate (on the horizontal axis) the successful connection rate starts to degrade. It shows a huge difference between the best performer, Tornado, with 7400 requests per second, and the worst, Circuits, with 1400 requests per second (which does not use EPOLL). This connection rate was sustained for at least 40,000 requests. We can see that when the hammering of the server continues beyond the rates it can handle, performance drops. This is caused by connection errors or timeouts.
This graph shows the response time; it is clearly visible that once the maximum connection rate has been reached, the overall response time starts to increase.
The last graph shows the number of errors, i.e. no 200 return detected by httperf. We can see a correlation between the performance of a server and the errors it returns at a given request rate: the better-performing servers return fewer errors overall. There is, however, one exception. Cogen was able to return ALL its requests successfully no matter how hard it was hammered, and is therefore not visible in this graph. This is interesting: at 9000 requests per second it was still able to answer all requests. However, the average connection time (from socket open till socket close) was about 7 seconds, meaning that Cogen was serving about 28000 concurrent connections at somewhat reduced performance, but without dropping them.
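As a back-of-the-envelope check of my own (not from the article), Little's law ties these figures together: 28000 concurrent connections at roughly 7 seconds each implies an effective completion rate of about 4000 requests per second, well below the 9000 per second being offered, which is consistent with a growing backlog rather than dropped connections.

```python
# Little's law: concurrency = throughput * average time in system.
avg_conn_time = 7.0   # seconds, from the measurement above
concurrency = 28000   # concurrent connections reported above
throughput = concurrency / avg_conn_time
print(throughput)  # 4000.0 -> effective completions per second
```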
Notes
This post should make it clear that Python has a rich set of options for asynchronous programming. All tested frameworks show great performance. I mean, even Circuits' result of 1300 requests per second isn't too bad. Tornado really blew me away with its performance at 7400 requests per second. But if I had to choose a favorite, I would probably go with Gevent; I am really digging its greenlet style.
The clean Greenlet / Stackless style is really cool, especially since Stackless Python is keeping up with CPython nowadays. There was some talk on a mailing list about Gevent running on Stackless. The Concurrence framework already runs on Stackless and can thus already be a great option if you are looking for specific features of Stackless Python such as tasklet pickling.
I want to make clear that this test only shows how these frameworks perform at a relatively simple task. The results could change when more is going on in the background. However, I feel that this benchmark is a great indicator of how each framework handles a socket connection.
In the coming days I plan to investigate this some more. I will also check out how these Python frameworks stack up against their equivalents in other languages, e.g. Ape, CometD, NodeJS. Stay tuned!
Source: http://nichol.as/asynchronous-servers-in-python