Understand Threads in python
来源:互联网 发布:软件调试技术 编辑:程序博客网 时间:2024/05/16 01:14
We will see some examples of using threads in Python and how to avoid race conditions:
You should run each example several times to notice that threads are unpredictable and that your results differ every time.
Disclaimer: Forget anything you heard about GIL for now, because GIL is not going to mess up with scenarios I want to show.
Example 1:
We want to fetch five different urls:
Single threaded way:
def get_responses(): urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com'] start = time.time() for url in urls: print url resp = urllib2.urlopen(url) print resp.getcode() print "Elapsed time: %s" % (time.time()-start)get_response()
Output is:
http://www.google.com 200http://www.amazon.com 200http://www.ebay.com 200http://www.alibaba.com 200http://www.reddit.com 200Elapsed time: 3.0814409256
Explanation:
- All the urls are fetched in sequence.
- Unless the processor got response from a url, it didn't fetch the next url.
- Network operations are time taking, so processor was idle during the time it was expecting the response from a url.
Even in a single threaded program, there is one thread of execution. Let's call itmain thread. So, last example had only one thread, i.e main thread.
Multi threaded way:
You need to create a class which subclasses Thread:
from threading import Threadclass GetUrlThread(Thread): def __init__(self, url): self.url = url super(GetUrlThread, self).__init__() def run(self): resp = urllib2.urlopen(self.url) print self.url, resp.getcode()def get_responses(): urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com'] start = time.time() threads = [] for url in urls: t = GetUrlThread(url) threads.append(t) t.start() for t in threads: t.join() print "Elapsed time: %s" % (time.time()-start)get_responses()
Output is:
http://www.reddit.com 200http://www.google.com 200http://www.amazon.com 200http://www.alibaba.com 200http://www.ebay.com 200Elapsed time: 0.689890861511
Explanation:
- Appreciate the improvement in running time of this program.
- We wrote a multi threaded program to decrease processor's idle time. While waiting for response of a particular thread's url, processor can work on some other thread and fetch the other thread's url.
- We wanted one thread to act on one url, so overridden the constructor of thread class to pass it a url.
- Execution of a thread means execution of a thread's run().
- So, whatever we want the thread to do must go in its run().
- Created one thread for each url and called start() on it. This tells the processor that it can execute the particular thread i.e the run() of thread.
- We don't want the elapsed time to be evaluated until all the threads have executed, join() comes in picture here.
- Calling join() on a thread tells the main thread to wait for this particular thread to finish before the main thread can execute the next instruction.
- We call join() on all the threads, so elapsed time will be printed only after all the threads have run.
Few things about threads
- Processor might not execute run() of a thread immediately after start().
- You can't say in which order run() of different threads will be called.
- For a specific thread, it's guaranteed that the statements inside run() will be executed sequentially.
- It means that first the the url associated with the thread will be fetched and only then the recieved response will be printed.
Example 2:
We will demonstrate race condition with a program and then fix it:
Read wikipedia example to understand what race condition means.
#define a global variablesome_var = 0class IncrementThread(Thread): def run(self): #we want to read a global variable #and then increment it global some_var read_value = some_var print "some_var in %s is %d" % (self.name, read_value) some_var = read_value + 1 print "some_var in %s after increment is %d" % (self.name, some_var)def use_increment_thread(): threads = [] for i in range(50): t = IncrementThread() threads.append(t) t.start() for t in threads: t.join() print "After 50 modifications, some_var should have become 50" print "After 50 modifications, some_var is %d" % (some_var,)use_increment_thread()
Run this and you will find different result every time.
Explanation:
- There is a global variable and all the threads will modify it.
- All threads should add 1 to the existing value of the variable.
- There are 50 threads, so at end the value of some_var should become 50, but it doesn't.
Why some_var didn't reach 50?
- At some point thread t1 read the value of some_var as 15 and then processor took the control from this thread and gave it to thread t2.
- t2 also reads some_var as 15.
- Both t1 and t2 reset the value of some_var to 15+1 i.e 16.
- But when two threads act on some_var we expected it's value to be increased by 2.
- So, we have a race condition here.
- A similar race condition might have occurred few more times and so value ofsome_var at end remains something like 41 or 42 or anything less than 50.
Fix this race condition
Change the run() of IncrementThread to:
from threading import Locklock = Lock()class IncrementThread(Thread): def run(self): #we want to read a global variable #and then increment it global some_var lock.acquire() read_value = some_var print "some_var in %s is %d" % (self.name, read_value) some_var = read_value + 1 print "some_var in %s after increment is %d" % (self.name, some_var) lock.release()
You should run use_increment_thread again and the result will match your expectation.
Explanation:
- Lock is used to guard against race condition.
- If thread t1 has acquired the lock before performing a set of operations, no other thread can perform the same set of operation until t1 releases the lock.
- We want to make sure that once t1 has read some_var, no other thread can read some_var until t1 is done with modifying the value of some_var.
- So reading some_var and modifying it are logically related operations here.
- And that is why we keep read and modify part of some_var guarded by a Lock instance.
- Lock is a separate object and it will be acquired by the thread from whose context it is called.
Example 3
In last example we saw how a global variable gets affected in multithreading. Let's see an example to verify that one thread cannot affect the instance variable of some other thread.
Let's introduce time.sleep() in this example. It will make sure that a thread goes in suspended state and hence enforces thread switching to occur.
import timeclass CreateListThread(Thread): def run(self): self.entries = [] for i in range(10): time.sleep(1) self.entries.append(i) print self.entriesdef use_create_list_thread(): for i in range(3): t = CreateListThread() t.start()use_create_list_thread()
Run it few times and you will notice that the list do not get printed properly.
Possibly the entries of one thread was getting printed and during this operation, processor switched to some other thread and started printing the entries for other thread. We want to ensure that entries get printed one after another for separate threads.
Change run() of CreateListThread to use lock.
class CreateListThread(Thread): def run(self): self.entries = [] for i in range(10): time.sleep(1) self.entries.append(i) lock.acquire() print self.entries lock.release()
So, we put the print operation inside a lock. When one thread has acquired the lock and printing its entries, no other thread can print its entries. And so you will see entries of different threads printed on separate lines.
This will show that all threads' entries, which is an instance variable, is a list with numbers from 0 to 9. So, thread switching doesn't affect the instance variable of a thread.
- Understand Threads in python
- Threads in Python
- [Python]Understand attribute in Python
- Python example of SSH in multi-threads
- Understand ClassLoader in Java
- Understand Selections in D3
- Multi-Threads in COM
- Synchronizing threads in Java
- Creating threads in Java
- Threads in Spring
- Threads in Spring
- Threads in Java
- deeply understand pointer in C
- Understand <Java Concurrency in Practice>
- In order to understand C++
- User Interface Threads in MFC
- Running Multiple Threads in Cocoa
- Processes and Threads in android
- SQL两种简单分页查询方式
- Java 在栈中以常数时间求出栈中最小元素
- activiti工作流退回问题研究
- 黑马程序员——第7天——(面向对象)继承,final关键字,抽象类,接口
- 函数指针与指针函数之间的区别
- Understand Threads in python
- Windows 2003单网卡实现VPN+NAT完整攻略
- html5练习1
- java解惑之表达式之谜(谜题9)
- 安卓http-get请求中有中文怎么办?
- 编译keepalived报错
- http://write.blog.csdn.net/postedit
- 大牛的博客(长期收集新浪、网易等)
- RabbitMq(一) --- Centos5安装RabbitMq