multithreading - Python Socket and Thread pooling, how to get more performance?
I am trying to implement a basic library for issuing HTTP GET requests. My goal is to receive data through socket connections, with a minimalistic design to improve performance, using threads and thread pool(s).
I have a bunch of links that I group by hostname. Here's a simple demonstration of the input URLs:
hostname1.com - 500 links
hostname2.org - 350 links
hostname3.co.uk - 100 links
...
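The grouping step described above could be sketched as follows. This is only an illustration, assuming Python 3 and full URLs as input; `group_by_host` is a hypothetical helper name, not something from the question's code.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Map each hostname to the list of URLs that belong to it."""
    groups = defaultdict(list)
    for url in urls:
        # urlsplit(...).hostname is the lowercased host part, without the port
        groups[urlsplit(url).hostname].append(url)
    return dict(groups)
```

Each resulting list can then be handed to one socket pool, so that every pool only ever talks to a single host.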
I intend to use sockets because of performance issues. I intend to use a number of sockets that stay connected (if possible, and it is) and issue HTTP GET requests over them. The idea came from urllib's low performance on continuous requests. Then I met urllib3, realized it uses httplib, and decided to try sockets. Here's what I have accomplished so far:
A getsocket class, a socketpool class, and threadpool and worker classes.
The getsocket class is a minified, "HTTP only" version of Python's httplib.
So, I use these classes like this:
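    sp = comm.socketpool(host, size=self.poolsize, timeout=5)
    for link in linklist:
        # queue one task per link; each task fetches through the shared socket pool
        pool.add_task(self.__get_url_by_sp, self.count, sp, link, results)
        self.count += 1
    # block until every queued task has finished
    pool.wait_completion()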
The __get_url_by_sp function is a wrapper that calls sp.urlopen and saves the result to the results list. I am using a pool of 5 threads, which has a socket pool of 5 getsocket classes.
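To make the "sockets that stay connected" idea concrete, here is a minimal sketch of several GETs sent over one connection. It is not the getsocket class from the question: `build_get`, `read_response` and `KeepAliveClient` are hypothetical names, Python 3 is assumed, and only responses with a Content-Length header are handled (no chunked encoding, no redirects).

```python
import socket


def build_get(host, path):
    """Return the bytes of a minimal HTTP/1.1 GET that keeps the connection open."""
    request = "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: keep-alive\r\n\r\n"
    return (request % (path, host)).encode("ascii")


def read_response(reader):
    """Read exactly one response from a binary file object; return (status, body).

    Simplification: the body length is taken from Content-Length only.
    """
    status = int(reader.readline().split()[1])
    length = 0
    while True:
        line = reader.readline()
        if line in (b"\r\n", b"\n", b""):  # a blank line ends the headers
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return status, reader.read(length)


class KeepAliveClient:
    """Hypothetical helper: one socket, many GETs to the same host."""

    def __init__(self, host, port=80, timeout=5):
        self.host = host
        self.sock = socket.create_connection((host, port), timeout=timeout)
        self.reader = self.sock.makefile("rb")

    def get(self, path):
        self.sock.sendall(build_get(self.host, path))
        return read_response(self.reader)

    def close(self):
        self.reader.close()
        self.sock.close()
```

Reading exactly Content-Length bytes is what lets the next response start cleanly on the same socket. A server may still close the connection at any time, so a real pool would need to reconnect on failure.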
What I wonder is: is there any other way I can improve the performance of this system?
I've read about asyncore here, but I couldn't figure out how to use the same socket connection with the class httpclient(asyncore.dispatcher) provided.
Another point: I don't know whether I'm using a blocking or a non-blocking socket, which of the two would improve performance, or how to implement either one.
Please be specific about your experiences. I don't intend to import another library just to do HTTP GET; I want to code my own tiny library.
Any help is appreciated, thanks.
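For reference, the difference between the two modes can be seen with a small experiment. This is a sketch assuming Python 3.7 or later (for `getblocking`); a local `socket.socketpair()` stands in for a real client/server connection, and the variable names are illustrative only.

```python
import select
import socket

# A connected pair of local sockets stands in for a client/server link.
a, b = socket.socketpair()

# Sockets are created in blocking mode: recv() waits until data arrives.
was_blocking = a.getblocking()

# In non-blocking mode, recv() raises BlockingIOError when nothing is ready.
a.setblocking(False)
try:
    a.recv(4096)
    got_data = True
except BlockingIOError:
    got_data = False

# select() reports which sockets are readable, so one thread can watch many.
b.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
readable, _, _ = select.select([a], [], [], 1.0)
reply = a.recv(4096) if readable else b""

a.close()
b.close()
```

A socket created with a timeout (as with `timeout=5` in the code above) still makes the calling thread wait, just with an upper bound. Non-blocking sockets only pay off when a single thread multiplexes many connections, for example through `select`.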
Do this.
Use multiprocessing: http://docs.python.org/library/multiprocessing.html.
Write a worker process that puts all of the URLs into a queue.
Write a worker process that gets a URL from the queue and does a GET, saving a file and putting the file info into a second queue. You'll want multiple copies of this process. You'll have to experiment to find out how many is the right number.
Write a worker process that reads file info from the second queue and does whatever it is that you're trying to do.
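The three stages above could be wired together roughly like this. It is a sketch under several assumptions: Python 3, a `None` sentinel to signal the end of each queue, and a hypothetical `fetch_stub` that stands in for the real GET so the example needs no network. For brevity the last stage runs in the parent process instead of in its own worker.

```python
import multiprocessing as mp

STOP = None  # sentinel marking the end of a queue


def produce_urls(urls, url_queue, n_fetchers):
    """Stage 1: put every URL on the queue, then one sentinel per fetcher."""
    for url in urls:
        url_queue.put(url)
    for _ in range(n_fetchers):
        url_queue.put(STOP)


def fetch_stub(url):
    """Hypothetical placeholder for the real GET; returns fake file info."""
    return {"url": url, "size": len(url)}


def fetch_worker(url_queue, info_queue, fetch=fetch_stub):
    """Stage 2: take URLs until the sentinel arrives, pass file info on."""
    while True:
        url = url_queue.get()
        if url is STOP:
            break
        info_queue.put(fetch(url))
    info_queue.put(STOP)  # tell the consumer this fetcher is done


def consume_info(info_queue, n_fetchers):
    """Stage 3: collect file info until every fetcher has sent its sentinel."""
    results, finished = [], 0
    while finished < n_fetchers:
        info = info_queue.get()
        if info is STOP:
            finished += 1
        else:
            results.append(info)
    return results


def run_pipeline(urls, n_fetchers=4):
    url_queue, info_queue = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=produce_urls, args=(urls, url_queue, n_fetchers))]
    procs += [mp.Process(target=fetch_worker, args=(url_queue, info_queue))
              for _ in range(n_fetchers)]
    for p in procs:
        p.start()
    results = consume_info(info_queue, n_fetchers)  # drain before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    demo = ["http://hostname1.com/page%d" % i for i in range(10)]
    print(len(run_pipeline(demo)))
```

The `n_fetchers` argument is the number to experiment with. Draining the result queue before calling `join()` avoids the deadlock that can occur when a child still has unflushed items in a queue.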
python multithreading sockets threadpool http-get