Unable to run on CloudLab nodes r650 or r6525 (with ConnectX-5 network cards) #3

Open
J-XZ opened this issue May 3, 2024 · 0 comments

Issue 1:

NovaConfig::config->max_stoc_file_size = FLAGS_max_stoc_file_size_mb * 1024;

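Since the flag is given in MB, shouldn't this multiply by 1024 twice (* 1024 * 1024) rather than once? As written, --max_stoc_file_size_mb=4 yields 4096, which would be 4 KB if the field is meant to hold bytes.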

Issue 2:
Can this project run on CloudLab nodes with newer RDMA network cards, such as r650 or r6525 (ConnectX-5)? I tried to run it on a five-node cluster, using the following command on each node:

node0:

stdbuf --output=0 --error=0 ./nova_server_main_debug --ltc_migration_policy=immediate --enable_range_index=false --num_migration_threads=32 --num_sstable_replicas=1 --level=6 --l0_start_compaction_mb=4096 --subrange_no_flush_num_keys=100 --enable_detailed_db_stats=false --major_compaction_type=sc --major_compaction_max_parallism=32 --major_compaction_max_tables_in_a_set=20 --enable_flush_multiple_memtables=true --recover_dbs=false --num_recovery_threads=32  --sampling_ratio=1 --zipfian_dist_ref_counts=/tmp/zipfian --client_access_pattern=zipfian  --memtable_type=static_partition --enable_subrange=true --num_log_replicas=1 --log_record_mode=none --scatter_policy=power_of_two --number_of_ltcs=2 --enable_lookup_index=true --l0_stop_write_mb=10240 --num_memtable_partitions=64 --num_memtables=256 --num_rdma_bg_workers=16 --db_path=/db/nova-db-10000-1024 --num_storage_workers=8 --stoc_files_path=/db/stoc_files --max_stoc_file_size_mb=4 --sstable_size_mb=2 --ltc_num_stocs_scatter_data_blocks=1 --all_servers=node0:10210,node1:10210,node2:10210 --server_id=0 --mem_pool_size_gb=32 --use_fixed_value_size=1024 --ltc_config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config --ltc_num_client_workers=8 --num_rdma_fg_workers=8 --num_compaction_workers=32 --block_cache_mb=0 --row_cache_mb=0 --memtable_size_mb=4 --cc_log_buf_size=1024 --rdma_port=20820 --rdma_max_msg_size=262144 --rdma_max_num_sends=32 --rdma_doorbell_batch_size=8 --enable_rdma=true --enable_load_data=false --use_local_disk=false

node1:

stdbuf --output=0 --error=0 ./nova_server_main_debug --ltc_migration_policy=immediate --enable_range_index=false --num_migration_threads=32 --num_sstable_replicas=1 --level=6 --l0_start_compaction_mb=4096 --subrange_no_flush_num_keys=100 --enable_detailed_db_stats=false --major_compaction_type=sc --major_compaction_max_parallism=32 --major_compaction_max_tables_in_a_set=20 --enable_flush_multiple_memtables=true --recover_dbs=false --num_recovery_threads=32  --sampling_ratio=1 --zipfian_dist_ref_counts=/tmp/zipfian --client_access_pattern=zipfian  --memtable_type=static_partition --enable_subrange=true --num_log_replicas=1 --log_record_mode=none --scatter_policy=power_of_two --number_of_ltcs=2 --enable_lookup_index=true --l0_stop_write_mb=10240 --num_memtable_partitions=64 --num_memtables=256 --num_rdma_bg_workers=16 --db_path=/db/nova-db-10000-1024 --num_storage_workers=8 --stoc_files_path=/db/stoc_files --max_stoc_file_size_mb=4 --sstable_size_mb=2 --ltc_num_stocs_scatter_data_blocks=1 --all_servers=node0:10210,node1:10210,node2:10210 --server_id=1 --mem_pool_size_gb=32 --use_fixed_value_size=1024 --ltc_config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config --ltc_num_client_workers=8 --num_rdma_fg_workers=8 --num_compaction_workers=32 --block_cache_mb=0 --row_cache_mb=0 --memtable_size_mb=4 --cc_log_buf_size=1024 --rdma_port=20820 --rdma_max_msg_size=262144 --rdma_max_num_sends=32 --rdma_doorbell_batch_size=8 --enable_rdma=true --enable_load_data=false --use_local_disk=false

node2:

stdbuf --output=0 --error=0 ./nova_server_main_debug --ltc_migration_policy=immediate --enable_range_index=false --num_migration_threads=32 --num_sstable_replicas=1 --level=6 --l0_start_compaction_mb=4096 --subrange_no_flush_num_keys=100 --enable_detailed_db_stats=false --major_compaction_type=sc --major_compaction_max_parallism=32 --major_compaction_max_tables_in_a_set=20 --enable_flush_multiple_memtables=true --recover_dbs=false --num_recovery_threads=32  --sampling_ratio=1 --zipfian_dist_ref_counts=/tmp/zipfian --client_access_pattern=zipfian  --memtable_type=static_partition --enable_subrange=true --num_log_replicas=1 --log_record_mode=none --scatter_policy=power_of_two --number_of_ltcs=2 --enable_lookup_index=true --l0_stop_write_mb=10240 --num_memtable_partitions=64 --num_memtables=256 --num_rdma_bg_workers=16 --db_path=/db/nova-db-10000-1024 --num_storage_workers=8 --stoc_files_path=/db/stoc_files --max_stoc_file_size_mb=4 --sstable_size_mb=2 --ltc_num_stocs_scatter_data_blocks=1 --all_servers=node0:10210,node1:10210,node2:10210 --server_id=2 --mem_pool_size_gb=32 --use_fixed_value_size=1024 --ltc_config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config --ltc_num_client_workers=8 --num_rdma_fg_workers=8 --num_compaction_workers=32 --block_cache_mb=0 --row_cache_mb=0 --memtable_size_mb=4 --cc_log_buf_size=1024 --rdma_port=20820 --rdma_max_msg_size=262144 --rdma_max_num_sends=32 --rdma_doorbell_batch_size=8 --enable_rdma=true --enable_load_data=false --use_local_disk=false

node3:

java -cp /tmp/YCSB-Nova/jdbc/conf:/tmp/YCSB-Nova/jdbc/target/jdbc-binding-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jta_1.1_spec/1.1.1/geronimo-jta_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/users/ruixuan/.m2/repository/net/sourceforge/serp/serp/1.13.1/serp-1.13.1.jar:/tmp/YCSB-Nova/core/target/core-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-jdbc/2.1.1/openjpa-jdbc-2.1.1.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jms_1.1_spec/1.1.1/geronimo-jms_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-kernel/2.1.1/openjpa-kernel-2.1.1.jar:/users/ruixuan/.m2/repository/net/spy/spymemcached/2.11.4/spymemcached-2.11.4.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/users/ruixuan/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/users/ruixuan/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-lib/2.1.1/openjpa-lib-2.1.1.jar:/users/ruixuan/.m2/repository/commons-pool/commons-pool/1.5.4/commons-pool-1.5.4.jar:/users/ruixuan/.m2/repository/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar:/users/ruixuan/.m2/repository/com/google/guava/guava/21.0/guava-21.0.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.NovaDBClient -P /users/ruixuan/NovaLSM/workloads/workloadw -P /users/ruixuan/NovaLSM/workloads/db.properties -s -threads 16 -p nova_servers=node0:10210,node1:10210 -p debug=false -p partition=range -p stringkey=false -p insertorder=ordered -p recordcount=10000 -p maxexecutiontime=1200 -p requestdistribution=zipfian -p valuesize=1024 -p config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config -p operationcount=0 -p cardinality=10 -p zipfianconstant=0.99 -p offset=0

node4:

java -cp /tmp/YCSB-Nova/jdbc/conf:/tmp/YCSB-Nova/jdbc/target/jdbc-binding-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jta_1.1_spec/1.1.1/geronimo-jta_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/users/ruixuan/.m2/repository/net/sourceforge/serp/serp/1.13.1/serp-1.13.1.jar:/tmp/YCSB-Nova/core/target/core-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-jdbc/2.1.1/openjpa-jdbc-2.1.1.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jms_1.1_spec/1.1.1/geronimo-jms_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-kernel/2.1.1/openjpa-kernel-2.1.1.jar:/users/ruixuan/.m2/repository/net/spy/spymemcached/2.11.4/spymemcached-2.11.4.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/users/ruixuan/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/users/ruixuan/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-lib/2.1.1/openjpa-lib-2.1.1.jar:/users/ruixuan/.m2/repository/commons-pool/commons-pool/1.5.4/commons-pool-1.5.4.jar:/users/ruixuan/.m2/repository/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar:/users/ruixuan/.m2/repository/com/google/guava/guava/21.0/guava-21.0.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.NovaDBClient -P /users/ruixuan/NovaLSM/workloads/workloadw -P /users/ruixuan/NovaLSM/workloads/db.properties -s -threads 16 -p nova_servers=node0:10210,node1:10210 -p debug=false -p partition=range -p stringkey=false -p insertorder=ordered -p recordcount=10000 -p maxexecutiontime=1200 -p requestdistribution=zipfian -p valuesize=1024 -p config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config -p operationcount=0 -p cardinality=10 -p zipfianconstant=0.99 -p offset=0

The goal is to use node0 and node1 as LTCs, node2 as a StoC, and node3 and node4 as clients. Have I understood the commands correctly? Also, when I run them as shown above, node0 crashes with a segmentation fault. The crash seems to happen during the first flush or compaction, after approximately 16348 key-value pairs have been inserted successfully. The failure occurs at:

NOVA_ASSERT(wcs_[i].status == IBV_WC_SUCCESS)
where the value of wcs_[i].status is IBV_WC_RETRY_EXC_ERR. Does this imply that RDMA is not properly configured?

Thank you for your time and assistance.