AMPLab Big Data Benchmark
参考 https://amplab.cs.berkeley.edu/benchmark/
需要您在Amazon注册一个免费的账号。注册时需要您提供信用卡、邮箱、电话等信息。之后可以在Amazon AWS Console获取新的访问密钥
在控制台运行以下命令:
$ sudo apt-get install s3cmd $ mkdir tiny; cd tiny; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/tiny/ . $ cd .. $ mkdir 1node; cd 1node; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/1node/ . $ cd .. $ mkdir 5nodes; cd 5nodes; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/5nodes/ . $ cd ..
在ClickHouse运行如下查询:
CREATE TABLE rankings_tiny ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_tiny ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); CREATE TABLE rankings_1node ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_1node ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); CREATE TABLE rankings_5nodes_on_single ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_5nodes_on_single ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192);
回到控制台运行如下命令:
$ for i in tiny/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_tiny FORMAT CSV"; done $ for i in tiny/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_tiny FORMAT CSV"; done $ for i in 1node/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_1node FORMAT CSV"; done $ for i in 1node/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_1node FORMAT CSV"; done $ for i in 5nodes/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_5nodes_on_single FORMAT CSV"; done $ for i in 5nodes/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_5nodes_on_single FORMAT CSV"; done
简单的查询示例:
SELECT pageURL, pageRank FROM rankings_1node WHERE pageRank > 1000 SELECT substring(sourceIP, 1, 8), sum(adRevenue) FROM uservisits_1node GROUP BY substring(sourceIP, 1, 8) SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank) AS pageRank FROM rankings_1node ALL INNER JOIN ( SELECT sourceIP, destinationURL AS pageURL, adRevenue FROM uservisits_1node WHERE (visitDate > '1980-01-01') AND (visitDate < '1980-04-01') ) USING pageURL GROUP BY sourceIP ORDER BY totalRevenue DESC LIMIT 1
MgBench is a new analytical benchmark for machine-generated log data, Andrew Crott ...