Why Python Concurrency Matters


Why Python concurrency matters:
When a single-threaded program runs I/O-bound tasks (network requests, disk reads and writes, and so on), it spends most of its time waiting for I/O to complete; those waits block the thread and make the program unresponsive. For CPU-bound tasks, a single thread also cannot take advantage of multiple CPU cores.
Concurrency addresses both problems. With threads, processes, or coroutines, a program can release the CPU the moment it starts waiting on I/O and switch to other work, raising overall throughput. Multiple processes can additionally spread CPU-bound work across cores and speed it up.
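The I/O case is easy to see with a small timing sketch. Here a hypothetical io_task stands in for a real network request, with time.sleep simulating the blocking wait:

```python
import threading
import time

def io_task():
    time.sleep(0.2)  # simulate a blocking I/O call (e.g. a network request)

# Sequential: each task waits for the previous one to finish.
start = time.perf_counter()
for _ in range(5):
    io_task()
sequential = time.perf_counter() - start

# Concurrent: all five "requests" wait at the same time.
start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {concurrent:.2f}s")
```

The sequential loop takes roughly five times as long as the threaded version, because the threads all sleep (wait on "I/O") in parallel.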

How to use concurrency in Python:
1. Multithreading:
A thread is the smallest unit of execution that the operating system can schedule. In Python, multithreading is provided by the threading module, whose Thread class creates thread objects.
Example 1: fetch several web pages and append the results to a single file.

import threading
import requests

write_lock = threading.Lock()

def get_page(url):
    res = requests.get(url)
    # Serialize file writes so output from different threads does not interleave.
    with write_lock:
        with open('result.txt', 'a') as f:
            f.write(res.text)

if __name__ == '__main__':
    urls = ['http://www.python.org', 'http://www.baidu.com', 'http://www.cnblogs.com']
    threads = []
    for url in urls:
        t = threading.Thread(target=get_page, args=(url,))
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
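For this fan-out pattern, the standard library also offers a higher-level interface in concurrent.futures. A minimal sketch, using a stand-in get_length function instead of real network requests:

```python
from concurrent.futures import ThreadPoolExecutor

def get_length(text):
    # Stand-in for per-URL work such as fetching and processing a page.
    return len(text)

# map() fans the inputs out across the pool and returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(get_length, ["python", "baidu", "cnblogs"]))

print(results)  # [6, 5, 7]
```

The executor handles thread creation, starting, and joining for you, and the with block waits for all tasks to finish.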

Example 2: increment a shared counter from multiple threads.

import threading

TOTAL = 1_000_000   # total number of increments
NUM_WORKERS = 10

count = 0
lock = threading.Lock()

def increment_count():
    global count
    for _ in range(TOTAL // NUM_WORKERS):
        # Without the lock, count += 1 is a read-modify-write race and
        # the final total would usually come out short.
        with lock:
            count += 1

if __name__ == "__main__":
    workers = []
    for _ in range(NUM_WORKERS):
        worker = threading.Thread(target=increment_count)
        workers.append(worker)
        worker.start()

    for worker in workers:
        worker.join()

    print("Final count:", count)

2. Multiprocessing:
A process is the operating system's basic unit of resource allocation and scheduling. Multiprocessing is provided by the multiprocessing module, whose Process class creates process objects.
Example 1: download several files in parallel with multiple processes.

import urllib.request
import multiprocessing

def download(url):
    filename = url.split('/')[-1]
    urllib.request.urlretrieve(url, filename)
    print(filename, 'downloaded.')

if __name__ == "__main__":
    urls = ['http://www.bing.com', 'http://www.baidu.com', 'http://www.cnblogs.com']
    p_list = []
    for url in urls:
        p = multiprocessing.Process(target=download, args=(url,))
        p_list.append(p)
        p.start()
    for p in p_list:
        p.join()

Example 2: increment a shared counter from multiple processes.

from multiprocessing import Process, Lock, Value

TOTAL = 1_000_000   # total number of increments
NUM_WORKERS = 10

def increment_count(count, lock):
    # Accumulate locally, then update the shared Value once under the
    # lock; touching the shared counter on every increment (or holding
    # the lock across the whole loop) would serialize the workers.
    local = 0
    for _ in range(TOTAL // NUM_WORKERS):
        local += 1
    with lock:
        count.value += local

if __name__ == "__main__":
    lock = Lock()
    count = Value('i', 0)
    workers = []
    for _ in range(NUM_WORKERS):
        worker = Process(target=increment_count, args=(count, lock))
        workers.append(worker)
        worker.start()

    for worker in workers:
        worker.join()

    print("Final count:", count.value)

3. Coroutines:
A coroutine is a lightweight, user-space unit of execution. In Python, coroutines were originally built on generators (the yield statement) and are now usually written with async/await; when a coroutine reaches an I/O operation it can pause and let another coroutine run.
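The yield mechanism can be illustrated with a toy round-robin scheduler (worker and run are illustrative names, not part of any library). Each yield is the point where a coroutine voluntarily gives up control, the way a real coroutine pauses at an I/O call:

```python
log = []

def worker(name, steps):
    # A generator used as a coroutine: each yield hands control back
    # to the scheduler, the way a real coroutine pauses at I/O.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def run(coroutines):
    # A tiny round-robin scheduler: resume each coroutine in turn
    # until every one of them is exhausted.
    while coroutines:
        coro = coroutines.pop(0)
        try:
            next(coro)
            coroutines.append(coro)
        except StopIteration:
            pass

run([worker("A", 2), worker("B", 2)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1']
```

The interleaved output shows the two "tasks" taking turns on a single thread, which is exactly the cooperative scheduling that gevent and asyncio automate.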
Example 1: concurrent requests with the gevent module.

from gevent import monkey; monkey.patch_all()  # must run before other imports

import gevent
import requests

def get_page(url):
    res = requests.get(url)
    print(res.text)

if __name__ == '__main__':
    urls = ['http://www.python.org', 'http://www.baidu.com', 'http://www.cnblogs.com']
    gevent.joinall([gevent.spawn(get_page, url) for url in urls])

Example 2: concurrent coroutines with the asyncio module.

import asyncio

async def acquire_resource():
    await asyncio.sleep(2)
    return 1

async def use_resource(val):
    await asyncio.sleep(1)
    print(val)

async def main():
    # Run five acquisitions concurrently, then use each result concurrently.
    results = await asyncio.gather(*(acquire_resource() for _ in range(5)))
    await asyncio.gather(*(use_resource(res) for res in results))

if __name__ == "__main__":
    asyncio.run(main())

All three approaches provide concurrency in Python, but each has its place. Threads suit I/O-bound tasks (the GIL prevents them from speeding up CPU-bound code); processes suit CPU-bound tasks, since each process has its own interpreter and can run on its own core; coroutines suit I/O-bound tasks with very many concurrent operations, since they are much cheaper than threads. Choose based on the type of task and the situation at hand.