YCSB-KVTracer integrates YCSB and KVTracer to generate customized traces. It builds on two code repositories: YCSB (brianfrankcooper/YCSB) and KVTracer (seekstar/kvtracer).
- load
bin/ycsb.bat load kvtracer -P workloads/workload -p "kvtracer.tracefile=trace_load.txt" -p "kvtracer.keymapfile=trace_keys.txt"
- run
bin/ycsb.bat run kvtracer -P workloads/workload -p "kvtracer.tracefile=trace_run.txt" -p "kvtracer.keymapfile=trace_keys.txt"
bash ycsb_run.sh
The ycsb_run.sh file needs to be placed in the same directory as the workloads folder, and TARGET_WORKLOAD_DIR can be modified as needed. Its contents are as follows:
TARGET_WORKLOAD_DIR="workloads/**"
# Iterate through each subdirectory of the workloads folder
for workload_subdir in $TARGET_WORKLOAD_DIR; do
    if [ -d "$workload_subdir" ]; then
        echo "process dir: $workload_subdir"
        # Generate trace files with YCSB
        ./bin/ycsb.sh load kvtracer -P "$workload_subdir/workload" -p "kvtracer.tracefile=trace_load.txt" -p "kvtracer.keymapfile=trace_keys.txt"
        ./bin/ycsb.sh run kvtracer -P "$workload_subdir/workload" -p "kvtracer.tracefile=trace_run.txt" -p "kvtracer.keymapfile=trace_keys.txt"
        # Move the trace_run.txt file to the workload directory
        sudo mv "trace_run.txt" "$workload_subdir/"
    fi
done
Note: This script is for Linux and Mac only. On Windows, change ./bin/ycsb.sh to ./bin/ycsb.bat.
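As a possible cross-platform alternative to the shell script, the same loop can be sketched in Python, choosing the launcher by operating system. This is only a sketch: the function names (ycsb_script, build_command, main) are hypothetical and not part of this repository, and the Windows path has not been verified here.

```python
import platform
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ycsb_script(system: Optional[str] = None) -> str:
    """Pick the YCSB launcher for the given OS name (default: current OS)."""
    system = system or platform.system()
    return "bin/ycsb.bat" if system == "Windows" else "./bin/ycsb.sh"


def build_command(phase: str, workload_dir: Path,
                  system: Optional[str] = None) -> List[str]:
    """Build the load or run command for one workload subdirectory."""
    if phase not in ("load", "run"):
        raise ValueError(f"unknown phase: {phase}")
    return [
        ycsb_script(system),
        phase,
        "kvtracer",
        "-P", (workload_dir / "workload").as_posix(),
        "-p", f"kvtracer.tracefile=trace_{phase}.txt",
        "-p", "kvtracer.keymapfile=trace_keys.txt",
    ]


def main(workloads_root: str = "workloads") -> None:
    """Run load and run for every subdirectory, then move trace_run.txt into it."""
    for workload_dir in sorted(Path(workloads_root).iterdir()):
        if not workload_dir.is_dir():
            continue
        print(f"process dir: {workload_dir}")
        for phase in ("load", "run"):
            subprocess.run(build_command(phase, workload_dir), check=True)
        shutil.move("trace_run.txt", str(workload_dir / "trace_run.txt"))
```

Calling main() from the YCSB root directory would mirror what ycsb_run.sh does, without needing sudo for the move.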
The following steps have been performed on this repository and are available for immediate use. If you are interested in learning about the configuration process or would like to configure it yourself, please refer to the following process.
- OS: Ubuntu / Windows / Mac
- Requirements: Maven 3
On Windows and Mac, the files modified in the steps below can be opened and edited with a text editor such as Notepad instead of vim.
git clone https://github.com/brianfrankcooper/YCSB.git
cd YCSB/
git clone https://github.com/seekstar/kvtracer.git
sudo vim pom.xml
Add the lines marked with + as shown:
<properties>
...
<redis.version>2.0.0</redis.version>
+ <kvtracer.version>0.1.0</kvtracer.version>
...
</properties>
<modules>
...
<module>redis</module>
+ <module>kvtracer</module>
...
</modules>
sudo vim bin/ycsb
Add the line marked with + as shown:
DATABASES = {
...
"redis" : "site.ycsb.db.RedisClient",
+ "kvtracer" : "site.ycsb.db.KVTracerClient",
...
}
sudo vim bin/bindings.properties
Add the line marked with + as shown:
redis:site.ycsb.db.RedisClient
+ kvtracer:site.ycsb.db.KVTracerClient
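Before building, it can help to confirm that all three files contain the kvtracer entries. The following is a minimal sketch that searches for the snippets added in the steps above. The helper names (missing_snippets, check_ycsb_root) are hypothetical, and the check is a plain substring search, not a real XML or properties parser.

```python
from pathlib import Path
from typing import Dict, List

# File (relative to the YCSB root) -> snippets added to it in the steps above.
EXPECTED_SNIPPETS: Dict[str, List[str]] = {
    "pom.xml": ["<kvtracer.version>", "<module>kvtracer</module>"],
    "bin/ycsb": ['"kvtracer"', "site.ycsb.db.KVTracerClient"],
    "bin/bindings.properties": ["kvtracer:site.ycsb.db.KVTracerClient"],
}


def missing_snippets(contents: Dict[str, str]) -> List[str]:
    """Return 'file: snippet' entries for every expected snippet not found."""
    missing = []
    for name, snippets in EXPECTED_SNIPPETS.items():
        text = contents.get(name, "")
        for snippet in snippets:
            if snippet not in text:
                missing.append(f"{name}: {snippet}")
    return missing


def check_ycsb_root(root: str = ".") -> List[str]:
    """Read the three files under root and report missing kvtracer entries."""
    contents = {}
    for name in EXPECTED_SNIPPETS:
        path = Path(root) / name
        contents[name] = path.read_text(encoding="utf-8") if path.is_file() else ""
    return missing_snippets(contents)
```

An empty list from check_ycsb_root() suggests the edits are in place. Any entries it returns name the file and the snippet still to be added.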
mvn -pl kvtracer -am clean package
Please configure Maven 3 in advance as required.
bin/ycsb.sh load kvtracer -P workloads/workloada -p "kvtracer.tracefile=tracea_load.txt" -p "kvtracer.keymapfile=tracea_keys.txt"
and
bin/ycsb.sh run kvtracer -P workloads/workloada -p "kvtracer.tracefile=tracea_run.txt" -p "kvtracer.keymapfile=tracea_keys.txt"
bin/ycsb.bat load kvtracer -P workloads/workloada -p "kvtracer.tracefile=tracea_load.txt" -p "kvtracer.keymapfile=tracea_keys.txt"
and
bin/ycsb.bat run kvtracer -P workloads/workloada -p "kvtracer.tracefile=tracea_run.txt" -p "kvtracer.keymapfile=tracea_keys.txt"
The workload file can be customized in the workloads/ directory to generate different traces on demand. The important parameters are explained next.
In YCSB\core\src\main\java\site\ycsb\generator\ZipfianGenerator.java, the ZIPFIAN_CONSTANT parameter is a key configuration item that defines the degree of skewness of the key distribution in the load.
The Zipfian distribution is a probability distribution in which a few items are accessed much more frequently than the rest. The frequency of the nth most frequent item is proportional to 1/n^s, where s is the Zipfian constant; with s = 1 this reduces to the classic 1/n form.
ZIPFIAN_CONSTANT (Zipfian constant) is used to adjust the degree of skewness of this distribution:
- As ZIPFIAN_CONSTANT approaches 0, the distribution approaches a uniform distribution, i.e., all items are accessed with roughly equal probability.
- As ZIPFIAN_CONSTANT increases, the distribution becomes more skewed: smaller key values are accessed more frequently, while most other key values are accessed less frequently.
A typical ZIPFIAN_CONSTANT value is 0.99, which is a reasonable approximation in many real-world scenarios, such as web page accesses, city population distributions, and so on.
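To make the effect of the constant concrete, the following sketch computes normalized access probabilities for a small key space. It assumes the generalized form in which the item of rank n has weight 1/n^s, with s standing in for ZIPFIAN_CONSTANT. The function name zipfian_probabilities is hypothetical, and this illustrates the distribution rather than reproducing YCSB's generator code.

```python
from typing import List


def zipfian_probabilities(num_items: int, constant: float) -> List[float]:
    """Return access probabilities for ranks 1..num_items with weight 1/rank**constant."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    weights = [1.0 / (rank ** constant) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def top_share(num_items: int, constant: float, top_k: int) -> float:
    """Fraction of all accesses that go to the top_k most popular items."""
    return sum(zipfian_probabilities(num_items, constant)[:top_k])
```

For example, comparing top_share(1000, 0.0, 10) with top_share(1000, 0.99, 10) shows how much more traffic the ten hottest keys receive once the constant moves away from 0.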
In the workload file, the recordcount parameter specifies the number of records inserted during the load phase, which is also the number of records that already exist in the table before the run phase begins. For example, setting recordcount to 1000000 loads one million records into the database.
In the workload file, the operationcount parameter defines the total number of operations performed during the run phase. For example, setting operationcount to 150000 means that 150000 database operations (e.g., read, update, insert) are performed during the test.
In the workload file, the requestdistribution parameter defines how requests are distributed over the keyspace. Supported values include zipfian, uniform, and latest.
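The three parameters above can be combined into a workload file programmatically. The sketch below renders such a file as text. The function name render_workload is hypothetical, and the 50/50 read/update mix is an assumption borrowed from the stock workloada file, not something this repository prescribes.

```python
from pathlib import Path

ALLOWED_DISTRIBUTIONS = ("zipfian", "uniform", "latest")


def render_workload(recordcount: int, operationcount: int,
                    requestdistribution: str = "zipfian") -> str:
    """Render the text of a YCSB workload properties file."""
    if requestdistribution not in ALLOWED_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {requestdistribution}")
    if recordcount < 1 or operationcount < 1:
        raise ValueError("recordcount and operationcount must be positive")
    lines = [
        "workload=site.ycsb.workloads.CoreWorkload",
        f"recordcount={recordcount}",          # records inserted in the load phase
        f"operationcount={operationcount}",    # operations issued in the run phase
        "readproportion=0.5",                  # assumed mix, as in workloada
        "updateproportion=0.5",                # assumed mix, as in workloada
        f"requestdistribution={requestdistribution}",
    ]
    return "\n".join(lines) + "\n"


def write_workload(directory: str, **params) -> Path:
    """Write the rendered workload to <directory>/workload and return its path."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "workload"
    path.write_text(render_workload(**params), encoding="utf-8")
    return path
```

For instance, write_workload("workloads/zipf_1m", recordcount=1000000, operationcount=150000) would create a subdirectory laid out the way ycsb_run.sh expects. The directory name here is only an example.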