Discussion:
How to handle downloading a file with urllib2 and then saving it
(Too old to reply)
c***@newsmth.net-SPAM.no
2012-12-10 11:22:02 UTC
Permalink
The main issue is the saving step: after getting the file stream via urllib2, I write it out to a file.
This is very slow (only four or five files per minute).
Is there a better way to do this?
Key code:
import urllib2

for url_addr in urls:
    # 1. download the file
    text = urllib2.urlopen(url_addr).read()
    # 2. save it under the last component of the URL
    with open(url_addr.split(url_seperator)[-1], 'wb') as f:
        f.write(text)
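
One thing worth measuring (a sketch not from this thread, reusing the urls and url_seperator variables above) is to stream each response to disk in blocks with shutil.copyfileobj instead of read()-ing the whole file into memory first:

import shutil
import urllib2

for url_addr in urls:
    resp = urllib2.urlopen(url_addr)
    # copy the response body to disk in 16 KB blocks instead of one big read()
    with open(url_addr.split(url_seperator)[-1], 'wb') as f:
        shutil.copyfileobj(resp, f, 16 * 1024)
    resp.close()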

--

※ Source: 水木社区 http://www.newsmth.net [FROM: 60.191.2.*]
oo
2012-12-10 15:00:56 UTC
Permalink
Did you time it?
【 citihome () wrote: 】
: The main issue is the saving step: after getting the file stream via urllib2, I write it out to a file.
: This is very slow (only four or five files per minute).
: Is there a better way to do this?
: ...................

--

※ Source: 水木社区 newsmth.net [FROM: 101.68.91.*]
c***@newsmth.net-SPAM.no
2012-12-11 02:19:03 UTC
Permalink
I did time it: 47 PDF files in 17 minutes total.
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on 10821073b, Standard
collapesed time of url parsing is 0:00:00.020000
collapesed time of downloading and saving file 'skinning.pdf' parsing is 0:00:00.450000
collapesed time of downloading and saving file 'fogshop-supplement.pdf' parsing is 0:00:13.050000
collapesed time of downloading and saving file 'biclustering-preprint.pdf' parsing is 0:00:10.870000
collapesed time of downloading and saving file 'InverseTex.pdf' parsing is 0:01:37.810000
collapesed time of downloading and saving file 'MSR-TR-2008-51.pdf' parsing is 0:00:00.480000
collapesed time of downloading and saving file 'transfer-preprint.pdf' parsing is 0:00:10.890000
collapesed time of downloading and saving file 'dynInterpolation-tr.pdf' parsing is 0:00:38.740000
collapesed time of downloading and saving file 'GPUGI.pdf' parsing is 0:00:45.320000
collapesed time of downloading and saving file 'BSGP.pdf' parsing is 0:00:04.560000
collapesed time of downloading and saving file 'ShadowFields.pdf' parsing is 0:00:07.180000
collapesed time of downloading and saving file 'SHExp.pdf' parsing is 0:00:03.920000
collapesed time of downloading and saving file 'PVT.pdf' parsing is 0:00:12.800000
collapesed time of downloading and saving file 'BSGP-x3dparser.pdf' parsing is 0:00:01.030000
collapesed time of downloading and saving file 'MSR-TR-2007-37.pdf' parsing is 0:00:11.340000
collapesed time of downloading and saving file 'ParallelOctree-preprint.pdf' parsing is 0:00:22.880000
collapesed time of downloading and saving file 'mesh-animation.pdf' parsing is 0:00:10.660000
collapesed time of downloading and saving file 'MeshPuppetry.pdf' parsing is 0:00:04.460000
collapesed time of downloading and saving file 'HDR-egsr07.pdf' parsing is 0:00:06.700000
collapesed time of downloading and saving file 'ooc-mesh.pdf' parsing is 0:00:06.320000
collapesed time of downloading and saving file 'smoke.pdf' parsing is 0:00:13.230000
collapesed time of downloading and saving file 'VGL.pdf' parsing is 0:00:20.790000
collapesed time of downloading and saving file 'BTF-tvcg.pdf' parsing is 0:00:16.960000
collapesed time of downloading and saving file 'dynamicBRDF.pdf' parsing is 0:00:03.370000
collapesed time of downloading and saving file 'renderants.pdf' parsing is 0:00:30.300000
collapesed time of downloading and saving file 'hair-rendering.pdf' parsing is 0:01:20.171000
collapesed time of downloading and saving file 'debug.pdf' parsing is 0:00:26.350000
collapesed time of downloading and saving file 'BSGP-primitive.pdf' parsing is 0:00:04.950000
collapesed time of downloading and saving file 'kdtree.pdf' parsing is 0:00:04.770000
collapesed time of downloading and saving file 'hairSyn.pdf' parsing is 0:00:35.990000
collapesed time of downloading and saving file 'SPAP-TR.pdf' parsing is 0:00:11.180000
collapesed time of downloading and saving file 'PoissonMesh.pdf' parsing is 0:00:46.180000
collapesed time of downloading and saving file 'isochart.pdf' parsing is 0:00:08.990000
collapesed time of downloading and saving file 'SHScaling.pdf' parsing is 0:00:08.940000
collapesed time of downloading and saving file 'joint-aware.pdf' parsing is 0:00:04.190000
collapesed time of downloading and saving file '2DShape.pdf' parsing is 0:00:05.820000
collapesed time of downloading and saving file 'mem-hierarchy-preprint.pdf' parsing is 0:02:04.620000
collapesed time of downloading and saving file 'transcut-tvcg-preprint.pdf' parsing is 0:01:26.190000
collapesed time of downloading and saving file 'meshquilting.pdf' parsing is 0:00:36.600000
collapesed time of downloading and saving file 'refraction.pdf' parsing is 0:00:03.440000
collapesed time of downloading and saving file 'subd-gpu.pdf' parsing is 0:00:04.470000
collapesed time of downloading and saving file 'texturemontage.pdf' parsing is 0:00:23.382000
collapesed time of downloading and saving file 'MotionFieldSyn.pdf' parsing is 0:00:19.330000
collapesed time of downloading and saving file 'subspace.pdf' parsing is 0:00:09.990000
collapesed time of downloading and saving file 'imitation-preprint.pdf' parsing is 0:00:15.460000
collapesed time of downloading and saving file 'mptracing.pdf' parsing is 0:00:42.030000
collapesed time of downloading and saving file 'LSG.pdf' parsing is 0:00:25.520000
collapesed time of downloading and saving file 'VST.pdf' parsing is 0:00:20.440000

【 wuhaochi wrote: 】
: Did you time it?

--

※ Source: 水木社区 http://www.newsmth.net [FROM: 60.191.2.*]
oo
2012-12-11 02:56:51 UTC
Permalink
What did you use to measure these timings?

Time urllib2.urlopen() and file.write() separately and see which of the two is taking the time.

If you can't speed up either one by itself, try multiple threads: several threads downloading over the network at once, plus one more thread writing to disk.
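
A rough sketch of that producer/consumer split (not from this thread; it reuses the urls and url_seperator variables from the original code, the count of 4 downloader threads is arbitrary, and there is no error handling):

import threading
import Queue
import urllib2

url_q = Queue.Queue()   # URLs waiting to be downloaded
data_q = Queue.Queue()  # (filename, bytes) pairs waiting to be written

for u in urls:
    url_q.put(u)

def downloader():
    # pull URLs until none are left, hand the downloaded bytes to the writer
    while True:
        try:
            u = url_q.get_nowait()
        except Queue.Empty:
            return
        data_q.put((u.split(url_seperator)[-1], urllib2.urlopen(u).read()))

def writer(expected):
    # a single thread doing all the disk writes
    for _ in xrange(expected):
        name, data = data_q.get()
        with open(name, 'wb') as f:
            f.write(data)

downloaders = [threading.Thread(target=downloader) for _ in range(4)]
writer_thread = threading.Thread(target=writer, args=(len(urls),))
for t in downloaders + [writer_thread]:
    t.start()
for t in downloaders + [writer_thread]:
    t.join()

Whether this actually helps depends on whether the bottleneck is the network or the disk, which the per-function timing above should tell you.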


【 citihome () wrote: 】
: Subject: Re: How to handle downloading a file with urllib2 and then saving it
: From: 水木社区 (Tue Dec 11 10:19:02 2012)
:
: I did time it: 47 PDF files in 17 minutes total.
: Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on 10821073b, Standard
: >>> collapesed time of url parsing is 0:00:00.020000
: collapesed time of downloading and saving file 'skinning.pdf' parsing is 0:00:00.450000
: ...................


--

※ Source: 水木社区 newsmth.net [FROM: 27.115.104.*]
c***@newsmth.net-SPAM.no
2012-12-11 07:58:40 UTC
Permalink
I did the timing myself with the datetime module (plus format strings).
As far as I know, socket I/O is asynchronous, so multiple threads may well perform better. (I'm not very familiar with how to synchronize the network reads with the disk writes, though; the only pattern I know is to finish all the network reads first and then do the disk writes.) I'll give it a try.
【 wuhaochi wrote: 】
: What did you use to measure these timings?
: Time urllib2.urlopen() and file.write() separately and see which of the two is taking the time.
: If you can't speed up either one by itself, try multiple threads: several threads downloading over the network at once, plus one more thread writing to disk.
: ...................

--

※ Source: 水木社区 http://www.newsmth.net [FROM: 60.191.2.*]
老鱼
2012-12-12 08:45:46 UTC
Permalink
text = urllib2.urlopen(url_addr).read()

If you split this line in two, you'll find that most of the time is spent in the final read().
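
A small sketch of that split, timed with datetime as in the measurements above (url_addr is one URL from the original loop):

import urllib2
from datetime import datetime

t0 = datetime.now()
resp = urllib2.urlopen(url_addr)   # returns once the connection is made and the headers arrive
t1 = datetime.now()
text = resp.read()                 # pulls the whole body over the network
t2 = datetime.now()
print 'urlopen: %s, read: %s' % (t1 - t0, t2 - t1)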

【 citihome () wrote: 】
: The main issue is the saving step: after getting the file stream via urllib2, I write it out to a file.
: This is very slow (only four or five files per minute).
: Is there a better way to do this?
: ...................

--
So inhumane!


※ Source: 水木社区 newsmth.net [FROM: 117.24.144.*]
c***@newsmth.net-SPAM.no
2012-12-12 09:04:01 UTC
Permalink
The most time-consuming part is the data transfer itself.
So my question is: how can I improve that? (The speed is getting hard to bear.)
【 hgoldfish wrote: 】
: text = urllib2.urlopen(url_addr).read()
: If you split this line in two, you'll find that most of the time is spent in the final read().
:

--

※ Source: 水木社区 http://www.newsmth.net [FROM: 60.191.2.*]
老鱼
2012-12-12 09:10:50 UTC
Permalink
Check whether the server supports gzip: try adding an Accept-Encoding header to the HTTP request. See the urllib2 documentation for how to add headers; a quick Google search should also turn it up.

If downloading in multiple processes can speed things up, give multiprocessing.Pool.map a try.
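
A combined sketch of both suggestions (not from this thread: the worker count of 4 is arbitrary, urls is the list from the original code, and the gzip header only helps if the server actually compresses its responses, which is uncommon for PDFs):

import gzip
import io
import urllib2
from multiprocessing import Pool

def fetch(url_addr):
    # ask for a gzip-compressed response and decompress it ourselves if we get one
    req = urllib2.Request(url_addr, headers={'Accept-Encoding': 'gzip'})
    resp = urllib2.urlopen(req)
    data = resp.read()
    if resp.info().get('Content-Encoding') == 'gzip':
        data = gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    with open(url_addr.split('/')[-1], 'wb') as f:
        f.write(data)

if __name__ == '__main__':
    pool = Pool(4)   # 4 worker processes downloading and writing in parallel
    pool.map(fetch, urls)
    pool.close()
    pool.join()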

【 citihome () wrote: 】
: The most time-consuming part is the data transfer itself.
: So my question is: how can I improve that? (The speed is getting hard to bear.)


--
So inhumane!


※ Source: 水木社区 newsmth.net [FROM: 117.24.144.*]