On ordinary hardware, MongoDB can insert 80,000 records per second.
Here is a sample time-event document:
{ "_id" : ObjectId("5298a5a03b3f4220588fe57c"), "created_on" : ISODate("2012-04-22T01:09:53Z"), "value" : 0.1647851116706831 }
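A document of this shape needs only two random draws: a timestamp somewhere in 2012 and a float in [0, 1). The sketch below mirrors that idea in Python. `random_document` is a hypothetical helper, it builds UTC dates, and it draws a whole number of microseconds so the timestamp always stays strictly before 2013-01-01. The `_id` is left out because MongoDB adds it on insert.

```python
import random
from datetime import datetime, timedelta, timezone

# Same range as the shell script: all of 2012 (here in UTC).
MIN_DATE = datetime(2012, 1, 1, tzinfo=timezone.utc)
MAX_DATE = datetime(2013, 1, 1, tzinfo=timezone.utc)
SPAN_US = (MAX_DATE - MIN_DATE) // timedelta(microseconds=1)

def random_document(rng=random):
    """Build one time-event document (hypothetical helper; no _id)."""
    # A random offset in [0, SPAN_US) microseconds keeps created_on inside 2012.
    created_on = MIN_DATE + timedelta(microseconds=rng.randrange(SPAN_US))
    return {"created_on": created_on, "value": rng.random()}
```

Passing a seeded `random.Random(42)` as `rng` makes the generated data reproducible between benchmark runs.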
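To get random values, we considered generating them with either JavaScript or Python (we could have tried Java, but we wanted to write it as quickly as possible). We didn't know which would be faster, so we decided to benchmark both. Our first attempt ran JavaScript in the MongoDB shell: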
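// arg1 (number of documents) and arg2 (job id) are passed in with mongo --eval
var minDate = new Date(2012, 0, 1, 0, 0, 0, 0);
var maxDate = new Date(2013, 0, 1, 0, 0, 0, 0);
var delta = maxDate.getTime() - minDate.getTime();

var job_id = arg2;
var documentNumber = arg1;
var batchNumber = 5 * 1000;

var job_name = 'Job#' + job_id;
var start = new Date();

var batchDocuments = new Array();
var index = 0;

while (index < documentNumber) {
    // random timestamp within 2012 and a random value in [0, 1)
    var date = new Date(minDate.getTime() + Math.random() * delta);
    var value = Math.random();
    var document = {
        created_on : date,
        value : value
    };
    batchDocuments[index % batchNumber] = document;
    // send every full batch of 5000 documents as one bulk insert
    if ((index + 1) % batchNumber == 0) {
        db.randomData.insert(batchDocuments);
    }
    index++;
    if (index % 100000 == 0) {
        print(job_name + ' inserted ' + index + ' documents.');
    }
}
// flush the last, partial batch when documentNumber is not a multiple of batchNumber
if (documentNumber % batchNumber != 0) {
    db.randomData.insert(batchDocuments.slice(0, documentNumber % batchNumber));
}
print(job_name + ' inserted ' + documentNumber + ' in ' + (new Date() - start) / 1000.0 + 's');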
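The shell script buffers documents and sends one bulk insert per 5,000 of them. The batching step on its own can be sketched as a Python generator. `batches` is a hypothetical helper; it also yields the final short batch, which a flush that only fires on full batches would skip.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, including a final partial batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        # Take up to batch_size items; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Assuming PyMongo 3 or later, each batch could then be written with one `collection.insert_many(batch)` call, which plays the same role as the shell's bulk `insert` of an array.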
Running the JavaScript version gave:
mongo random --eval "var arg1=50000000;arg2=1" create_random.js
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 566.294s
It took 566.294 s, an average of 88,293 inserts/second.
The Python script produced this output:
python create_random.py 50000000
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 1713.501 s
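It took 1713.501 s, an average of 29,180 inserts/second, roughly three times slower than JavaScript. But there is no need to be discouraged: Python can take advantage of all four cores by running one process per CPU.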
import sys
import time
import subprocess
import multiprocessing
from datetime import datetime

import pymongo

cpu_count = multiprocessing.cpu_count()

# obtain a mongo connection (pymongo.Connection was removed in PyMongo 3.0; MongoClient replaces it)
connection = pymongo.MongoClient('mongodb://localhost')
# obtain a handle to the random database
db = connection.random
collection = db.randomData

total_documents_count = 50 * 1000 * 1000
inserted_documents_count = 0
sleep_seconds = 1
sleep_count = 0

# start one create_random.py worker process per CPU
for i in range(cpu_count):
    documents_number = total_documents_count // cpu_count
    if i == cpu_count - 1:
        # the last worker also takes the remainder, so the counts add up to the total
        documents_number += total_documents_count % cpu_count
    print(documents_number)
    subprocess.Popen([sys.executable, '../create_random.py', str(documents_number), str(i)])

start = datetime.now()
# poll the collection until all workers are done (assumes randomData starts empty)
while inserted_documents_count < total_documents_count:
    inserted_documents_count = collection.estimated_document_count()
    if sleep_count > 0 and sleep_count % 60 == 0:
        print('Inserted', inserted_documents_count, 'documents.')
    if inserted_documents_count < total_documents_count:
        sleep_count += 1
        time.sleep(sleep_seconds)

print('Inserting', total_documents_count, 'took', (datetime.now() - start).total_seconds(), 's')
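Dividing the total by the CPU count only adds up when the division is exact. For example, 50,000,000 split across 3 workers as 16,666,666 each leaves 2 documents unassigned, and a loop that waits for the full count would never finish. A split that spreads the remainder could look like this; `split_work` is a hypothetical helper.

```python
from typing import List

def split_work(total: int, workers: int) -> List[int]:
    """Split total documents across workers; the first total % workers get one extra."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, remainder = divmod(total, workers)
    return [base + 1 if i < remainder else base for i in range(workers)]
```

With four cores and 50,000,000 documents every worker gets exactly 12,500,000, which matches the per-job totals in the log.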
This time the result was:
python create_random_parallel.py
Job#3 inserted 100000 documents.
Job#2 inserted 100000 documents.
Job#0 inserted 100000 documents.
Job#1 inserted 100000 documents.
Job#3 inserted 200000 documents.
...
Job#2 inserted 12500000 in 571.819 s
Job#0 inserted 12400000 documents.
Job#3 inserted 10800000 documents.
Job#1 inserted 12400000 documents.
Job#0 inserted 12500000 documents.
Job#0 inserted 12500000 in 577.061 s
Job#3 inserted 10900000 documents.
Job#1 inserted 12500000 documents.
Job#1 inserted 12500000 in 578.427 s
Job#3 inserted 11000000 documents.
...
Job#3 inserted 12500000 in 623.999 s
Inserting 50000000 took 624.655 s
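That is an average of 80,044 inserts/second, in line with what we expected, but still somewhat slower than JS. Next, we optimized further by having the subprocesses run the JavaScript script in the mongo shell instead: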
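# replaces the for loop of the parallel script: each worker is now a mongo shell
# running create_random.js, launched through a generated .bat file (Windows)
for i in range(cpu_count):
    documents_number = total_documents_count // cpu_count
    if i == cpu_count - 1:
        documents_number += total_documents_count % cpu_count
    script_name = 'create_random_' + str(i + 1) + '.bat'
    with open(script_name, 'w') as script_file:
        script_file.write('mongo random --eval "var arg1=' + str(documents_number)
                          + ';arg2=' + str(i + 1) + '" ../create_random.js')
    subprocess.Popen(script_name)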
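The .bat files tie this variant to Windows. Passing the same mongo command to `subprocess.Popen` as an argument list avoids writing scripts to disk. The helper below is a hypothetical sketch that only assembles the arguments already used in the batch file (database `random`, `--eval`, and the script path); the quotes around the `--eval` code are shell quoting and are not needed in a list.

```python
from typing import List

def mongo_job_command(documents_number: int, job_id: int,
                      script_path: str = "../create_random.js") -> List[str]:
    """Build the argument list for one mongo shell worker (hypothetical helper)."""
    # arg1 = number of documents, arg2 = job id, as create_random.js expects
    eval_code = "var arg1={};arg2={}".format(documents_number, job_id)
    return ["mongo", "random", "--eval", eval_code, script_path]
```

Assuming `mongo` is on the PATH, each worker would then be started with `subprocess.Popen(mongo_job_command(12_500_000, i + 1))`.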
In the end we reached 83,437 inserts/second, which still did not beat the single JavaScript shell's 88,293 inserts/second.
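The throughput figures are just the document count divided by the wall-clock seconds reported in the logs. A quick check of the three runs whose times are shown follows; the 83,437 figure comes from a run whose timing is not listed, so it is left out.

```python
def inserts_per_second(documents: int, seconds: float) -> float:
    """Average insert throughput: documents divided by wall-clock seconds."""
    return documents / seconds

# Wall-clock times taken from the logs above, for 50,000,000 documents each.
runs = {
    "JavaScript, single shell": 566.294,
    "Python, single process": 1713.501,
    "Python, one process per CPU": 624.655,
}
for name, seconds in runs.items():
    print(f"{name}: {inserts_per_second(50_000_000, seconds):,.0f} inserts/second")
```

This prints 88,293, 29,180 and 80,044 inserts/second respectively; the single-process Python run took about 3.03 times as long as the JavaScript one.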
The test code is on GitHub.