A Kid Who Can Program Is a Treasure

April 25th, 2009

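I stayed up late last night typing up a paper for my boss and didn't get to sleep until 5 a.m. When I woke up at noon and checked my phone: good grief, another task. I have no idea what the ICML2008 website was thinking. It bundles all the conference papers into one archive, with all 158 PDFs named by number, and my job was to rename each of them to "Paper Title.PDF".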

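Text in a PDF often can't be copied reliably (in some software the Copy function is actually OCR-based), so to be sure of accuracy the only option is to retype each title by hand. Even when copying works, you still have to go through every file: open, select, copy, close, select the file, rename, paste, confirm. I nearly passed out on the spot. I want to sleep... So I went to the ICML website hoping to find a list of abstracts with paper numbers, and, heaven have mercy, I found this format: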

paper ID: 158
Localized Multiple Kernel Learning
Mehmet Gonen and Ethem Alpaydin
Recently, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed which uses a convex combination of kernels, where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. In this paper, we develop a localized multiple kernel learning (LMKL) algorithm using a gating model for selecting the appropriate kernel function locally. The localizing gating model and the kernel-based classifier are coupled and their optimization is done in a joint manner. Empirical results on ten benchmark and two bioinformatics data sets validate the applicability of our approach. LMKL achieves statistically similar accuracy results compared with MKL by storing fewer support vectors. LMKL can also combine multiple copies of the same kernel function localized in different parts. For example, LMKL with multiple linear kernels gives better accuracy results than using a single linear kernel on bioinformatics data sets.

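Hahaha, the harmonious society has saved me! Time to write a small program that reads every paper number and its title off this page: the number is whatever follows "ID:", and the title is the text between the next two "\n" characters. The code came out too ugly, so I won't post it. Note that when reading the title you have to drop colons and slashes, since those characters can't appear in file names. Once everything is extracted, print it like this: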

out<<"rename "<<id<<".pdf \""<<title<<".pdf\""<<endl;

which produces:
rename 111.pdf "Preconditioned Temporal Difference Learning.pdf"
rename 113.pdf "The GroupLASSO for Generalized Linear Models: Uniqueness … .pdf"
rename 121.pdf "Autonomous Geometric Precision Error Estimation in … .pdf"
rename 129.pdf "Dirichlet Component Analysis: Feature Extraction for … .pdf"
rename 130.pdf "Adaptive p-Posterior Mixture-Model Kernels for Multiple … .pdf"
and so on for 159 lines...

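Then open a command prompt window, copy the whole batch, paste it in, and you're done. A job that would have taken N hours, finished in 5 minutes with a small program, and back to bed I go. I suspect all you geeks know little tricks like this too. Haha, and when you use them, don't you get the same feeling: a kid who can program is a treasure, and a kid who can't is just a blade of grass...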