Performance comparison of JavaScript and Python with MongoDB

On commodity hardware, MongoDB can achieve 80,000 inserts per second.

A sample time-event document looks like this:


{
    "_id" : ObjectId("5298a5a03b3f4220588fe57c"),
    "created_on" : ISODate("2012-04-22T01:09:53Z"),
    "value" : 0.1647851116706831
}

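We needed random values for these documents, and we considered generating them with either JavaScript or Python (we could have tried Java, but we wanted to write the script as quickly as possible). We did not know which would be faster, so we decided to test both. Our first attempt ran JavaScript in the MongoDB shell: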



// Random timestamps are drawn uniformly from the year 2012.
var minDate = new Date(2012, 0, 1, 0, 0, 0, 0);
var maxDate = new Date(2013, 0, 1, 0, 0, 0, 0);
var delta = maxDate.getTime() - minDate.getTime();

// arg1 (document count) and arg2 (job id) are supplied through --eval.
var job_id = arg2;
var documentNumber = arg1;
var batchNumber = 5 * 1000;

var job_name = 'Job#' + job_id;
var start = new Date();

var batchDocuments = [];
var index = 0;

while (index < documentNumber) {
    var date = new Date(minDate.getTime() + Math.random() * delta);
    var value = Math.random();
    var document = {
        created_on : date,
        value : value
    };
    batchDocuments.push(document);
    // Insert each full batch in a single call, then start a new batch.
    if (batchDocuments.length == batchNumber) {
        db.randomData.insert(batchDocuments);
        batchDocuments = [];
    }
    index++;
    if (index % 100000 == 0) {
        print(job_name + ' inserted ' + index + ' documents.');
    }
}
// Flush the last partial batch when documentNumber is not a multiple of batchNumber.
if (batchDocuments.length > 0) {
    db.randomData.insert(batchDocuments);
}
print(job_name + ' inserted ' + documentNumber + ' in ' + (new Date() - start)/1000.0 + 's');

Running it gives:
mongo random --eval "var arg1=50000000;arg2=1" create_random.js
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 566.294s

The run took 566.294 s, an average of 88,293 inserts/second.
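The average rate follows directly from the totals in the log. As a quick check, here is a minimal Python 3 sketch; the helper name `inserts_per_second` is ours and is not part of the original scripts.

```python
def inserts_per_second(documents, seconds):
    """Average insert rate over a whole run."""
    return documents / seconds


# Figures taken from the JavaScript run: 50,000,000 documents in 566.294 s.
rate = inserts_per_second(50000000, 566.294)
print(int(rate))  # prints 88293
```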

The equivalent Python script produced the following output:

python create_random.py 50000000
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 1713.501 s
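The Python script itself is not listed here. As a rough illustration only, the following Python 3 sketch shows what a batching generator equivalent to the JavaScript version might look like. The names `random_document` and `insert_random_documents` are hypothetical, and passing the insert call in as a parameter is our own choice so that the logic can be exercised without a running database; with PyMongo 3 or later, `collection.insert_many` could be supplied as that parameter.

```python
import random
from datetime import datetime, timedelta

MIN_DATE = datetime(2012, 1, 1)
MAX_DATE = datetime(2013, 1, 1)
DELTA_SECONDS = (MAX_DATE - MIN_DATE).total_seconds()


def random_document():
    """Build one document with a random 2012 timestamp and a random value."""
    offset = timedelta(seconds=random.random() * DELTA_SECONDS)
    return {"created_on": MIN_DATE + offset, "value": random.random()}


def insert_random_documents(insert_batch, document_number, batch_size=5000):
    """Generate documents and hand them to insert_batch in batches.

    insert_batch is any callable that accepts a list of documents
    (an assumption; the original script may insert differently).
    """
    batch = []
    for _ in range(document_number):
        batch.append(random_document())
        if len(batch) == batch_size:
            insert_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)


if __name__ == "__main__":
    # Dry run without a database: record batch sizes instead of inserting.
    sizes = []
    insert_random_documents(lambda docs: sizes.append(len(docs)), 12000)
    print(sizes)  # prints [5000, 5000, 2000]
```

Batching keeps the number of round trips to the server low, which is the same idea as the 5,000-document batches in the JavaScript version.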

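The run took 1713.501 s, an average of 29,180 inserts/second, which is far slower than JavaScript. But there is no need to be discouraged: we can let Python exploit all four cores by running one process per CPU core.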


import sys
import pymongo
import time
import subprocess
import multiprocessing

from datetime import datetime

cpu_count = multiprocessing.cpu_count()

# obtain a mongo connection
connection = pymongo.Connection('mongodb://localhost', safe=True)

# obtain a handle to the random database
db = connection.random
collection = db.randomData

total_documents_count = 50 * 1000 * 1000
inserted_documents_count = 0
sleep_seconds = 1
sleep_count = 0

# start one worker process per CPU core, each inserting an equal share
for i in range(cpu_count):
    documents_number = str(total_documents_count/cpu_count)
    print documents_number
    subprocess.Popen(['python', '../create_random.py', documents_number, str(i)])

start = datetime.now()

# poll the collection once per second until every document has arrived
while inserted_documents_count < total_documents_count:
    inserted_documents_count = collection.count()
    if (sleep_count > 0 and sleep_count % 60 == 0):
        print 'Inserted ', inserted_documents_count, ' documents.'
    if (inserted_documents_count < total_documents_count):
        sleep_count += 1
        time.sleep(sleep_seconds)

print 'Inserting ', total_documents_count, ' took ', (datetime.now() - start).total_seconds(), 's'

This run produced:
python create_random_parallel.py
Job#3 inserted 100000 documents.
Job#2 inserted 100000 documents.
Job#0 inserted 100000 documents.
Job#1 inserted 100000 documents.
Job#3 inserted 200000 documents.
...
Job#2 inserted 12500000 in 571.819 s
Job#0 inserted 12400000 documents.
Job#3 inserted 10800000 documents.
Job#1 inserted 12400000 documents.
Job#0 inserted 12500000 documents.
Job#0 inserted 12500000 in 577.061 s
Job#3 inserted 10900000 documents.
Job#1 inserted 12500000 documents.
Job#1 inserted 12500000 in 578.427 s
Job#3 inserted 11000000 documents.
...
Job#3 inserted 12500000 in 623.999 s
Inserting 50000000 took 624.655 s
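The parallel driver gives each worker `total_documents_count/cpu_count` documents, which is exact for 50 million documents on four cores. When the core count does not divide the total, integer division drops the remainder, and a polling loop that waits for the full total would never finish. A hedged Python 3 sketch of a split that always sums to the total follows; `split_evenly` is a hypothetical helper, not part of the original scripts.

```python
def split_evenly(total, workers):
    """Split total into `workers` integer shares that sum to total."""
    base, extra = divmod(total, workers)
    # The first `extra` workers take one additional document each.
    return [base + 1 if i < extra else base for i in range(workers)]


print(split_evenly(50000000, 4))  # prints [12500000, 12500000, 12500000, 12500000]
print(split_evenly(50000000, 3))  # prints [16666667, 16666667, 16666666]
```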

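That averages 80,044 inserts/second, which is in line with our expectations but still a little slower than JavaScript. Next, we use subprocesses to launch the JavaScript script in several mongo shells at once: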



# write one .bat launcher per CPU core and start it; each runs the JavaScript script
for i in range(cpu_count):
    documents_number = str(total_documents_count/cpu_count)
    script_name = 'create_random_' + str(i + 1) + '.bat'
    script_file = open(script_name, 'w')
    script_file.write('mongo random --eval "var arg1=' + documents_number +';arg2=' + str(i + 1) +'" ../create_random.js')
    script_file.close()
    subprocess.Popen(script_name)
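Writing .bat files ties this loop to Windows. As an alternative, the same `mongo` command line could be passed to `subprocess.Popen` as an argument list. The Python 3 sketch below assumes the `mongo` shell is on the PATH; `build_shell_commands` and `run_jobs` are hypothetical names. This sketch was not benchmarked, and the figures reported in this article come from the .bat-based loop.

```python
import subprocess


def build_shell_commands(total_documents, jobs, script="../create_random.js"):
    """Return one `mongo` argument list per job, mirroring the .bat contents."""
    per_job = total_documents // jobs
    commands = []
    for job_id in range(1, jobs + 1):
        eval_arg = "var arg1=%d;arg2=%d" % (per_job, job_id)
        commands.append(["mongo", "random", "--eval", eval_arg, script])
    return commands


def run_jobs(commands):
    """Start every shell first, then wait for all of them to exit."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]


if __name__ == "__main__":
    # Only print the commands here; call run_jobs(...) to actually start them.
    for cmd in build_shell_commands(50000000, 4):
        print(" ".join(cmd))
```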

The final result was 83,437 inserts/second, which still did not beat the 88,293 inserts/second of the single JavaScript run.
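To put the four results side by side, the following Python 3 sketch expresses each reported rate as a percentage of the single-shell JavaScript run. The rates are the ones quoted in this article; the labels and the helper `relative_to_baseline` are ours, and "four" assumes the four-core machine mentioned earlier.

```python
# Rates in inserts/second as reported in this article.
rates = {
    "JavaScript, one shell": 88293,
    "Python, one process": 29180,
    "Python, four processes": 80044,
    "JavaScript, four shells": 83437,
}

baseline = rates["JavaScript, one shell"]


def relative_to_baseline(rate, baseline_rate=baseline):
    """Express a rate as a percentage of the baseline rate."""
    return 100.0 * rate / baseline_rate


for name, rate in rates.items():
    print("%-24s %6d  %5.1f%%" % (name, rate, relative_to_baseline(rate)))
```

On these figures, single-process Python reaches roughly a third of the baseline, while both parallel variants land above 90% of it.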

Test code: GitHub