From 751d0a198159c59858fff37c8dde831e01130d43 Mon Sep 17 00:00:00 2001
From: "Field G. Van Zee"
Date: Thu, 26 Oct 2023 19:26:45 -0500
Subject: [PATCH] Add check to disable armsve on Apple M1.

- (cherry picked from commit c803b03e52a7a6997a8d304a8cfa9acf7c1c555b)

Fix auto-detection of firestorm (Apple M1).

- (cherry picked from commit 2dd692b710b6a9889f7ebdd7934a2108be5c5530)

Added Discord documentation (#677)

Details:
- Added a docs/Discord.md markdown document that walks the reader through
  creating a Discord account, obtaining the invite link, and using the
  link to join the BLIS Discord server.
- Updated README.md to reference the new Discord.md document in multiple
  places, including via the official Discord logo (used with explicit
  permission from representatives at Discord Inc.).
- (cherry picked from commit 88105dbecf0f9dfbfa30215743346e8bd6afb971)

Shuffled checked properties in bli_l3_check.c. (#676)

Details:
- Added certain checks for matrix structure to the level-3 operations'
  _check() functions, and slightly reorganized existing checks.
- (cherry picked from commit 23f5b8df3e802a27bacd92571184ec57bbdfa646)

CREDITS file update.

Details:
- This attribution was intended to go in PR #647.
- (cherry picked from commit 9453e0f163503f64a290256b4be53d8882224863)

Reinstate sanity check in bli_pool_finalize. (#671)

Details:
- Added a reinit argument to bli_pool_finalize(). This bool will signal
  whether or not the function is being called from bli_pool_reinit(). If
  it is not being called from _reinit(), we can safely check to confirm
  that .top_index == 0 (i.e., all blocks have been checked in). But if it
  *is* being called from _reinit(), then that check will be skipped since
  one of the predicted use cases for bli_pool_reinit() anticipates that
  some blocks are (probably) checked out when the pool_t is reinitialized.
- Updated existing invocations of bli_pool_finalize() to pass in either
  FALSE (from bli_apool_free_block() or bli_pba_finalize_pools()) or TRUE
  (from bli_pool_reinit()) for the new reinit argument.
- (cherry picked from commit 76a23bd8c33e161221891935a489df9a9fb9c8c0)

Fix some bugs in bli_pool.c (#670)

Details:
- Add a check for premature pool exhaustion when checking in blocks via
  bli_pool_checkin_block(). This detects "double-free" and other bad
  conditions that don't necessarily result in a segfault.
- Make sure to copy all block pointers when growing the pool size.
  Previously, checked-out block pointers (which are guaranteed to be set
  to NULL) were not being copied, leading to the presence of uninitialized
  data.
- (cherry picked from commit 63470b49e3b9b15e00a8f666e86ccd70c6005fe9)

Add AddressSanitizer (-fsanitize=address) option. (#669)

Details:
- Added support for AddressSanitizer (ASan), a compiler-integrated memory
  error detector. The option (disabled by default) enables compiling and
  linking with the -fsanitize=address flag supported by clang, gcc, and
  probably others. This flag is employed during compilation of all BLIS
  source files *except* for optimized kernels, which are exempted because
  ASan usually requires an extra register, which violates the constraints
  for many gemm microkernels.
- Minor whitespace, comment, ordering, and configure help text updates.
- (cherry picked from commit 42d0e66318b186d25eeb215b40ce26115401ed8b)

Add consistent NaN/Inf handling in sumsqv. (#668)

Details:
- Changed sumsqv implementation as follows:
  - If there is a NaN (either real or imaginary), then return a sum of
    NaN and unit scale.
  - Else, if there is an Inf (either real or imaginary), then return a
    sum of +Inf and unit scale.
  - Otherwise behave as normal.
- (cherry picked from commit b861c71b50c6d48cb07282f44aa9dddffc1f1b3f)

Parameterized test/3 drivers via command line args. (#667)

Details:
- Rewrote the drivers in test/3, the Makefile, and the runme.sh script so
  that most of the important parameters, including parameter combo,
  datatype, storage combo, induced method, problem size range, dimension
  bindings, number of repeats, and alpha/beta values can be passed in via
  command line arguments. (Previously, most of these parameters were
  hard-coded into the driver source, except a few that were hard-coded
  into the Makefile.) If no argument is given for any particular option,
  it will be assigned a sane default. Either way, the values employed at
  runtime will be printed to stdout before the performance data in a
  section that is commented out with '%' characters (which is used by
  matlab and octave for comments), unless the -q option is given, in
  which case the driver will proceed quietly and output only performance
  data. Each driver also provides extensive help via the -h option, with
  the help text tailored for the operation in question (e.g. gemm, hemm,
  herk, etc.). In this help text, the driver reminds the user which
  implementation it was linked to (e.g. blis, openblas, vendor, eigen).
  Thanks to Jeff Diamond for suggesting this CLI-based reimagining of the
  test/3 drivers.
- In the test/3 drivers: converted cpp macro string constants, as well as
  two string literals (for the opname and pc_str) used in each test
  driver, to global (or static) const char* strings, and replaced the use
  of strncpy() for storing the results of the command line argument
  parsing with pointer copies from the corresponding strings in argv.
  This works because the argv array is guaranteed by the C99 standard to
  persist throughout the life of the program. This new approach uses less
  storage and executes faster. Thanks to Minh Quan Ho for recommending
  this change.
- Renamed the IMP_STR cpp macro that gets defined on the command line,
  via the test/3/Makefile, to IMPL_STR.
- Updated runme.sh to set the problem size ranges for single-threaded and
  multithreaded execution independently from one another, as well as on a
  per-system basis.
- Added a 'quiet' variable to runme.sh that can easily toggle quiet mode
  for the test drivers' output.
- Very minor typecast fix in call to bli_getopt() in bli_utils.c.
- In bli_getopt(), changed the nextchar variable from being a local
  static variable to a field of the getopt_t state struct. (Not sure why
  it was ever declared static to begin with.)
- Other minor changes to bli_getopt() to accommodate the rewritten test
  drivers' command line parsing needs.
- (cherry picked from commit ee81efc7887374c974a78bfb3e0865776b2f97a8)
---
 CREDITS                        |   1 +
 Makefile                       |   1 +
 README.md                      |  31 +-
 build/config.mk.in             |   3 +
 common.mk                      |  34 +-
 configure                      |  82 +++-
 docs/Discord.md                | 115 ++++++
 docs/images/discord.svg        |  23 ++
 frame/3/bli_l3_check.c         | 179 ++++++---
 frame/base/bli_apool.c         |   2 +-
 frame/base/bli_cpuid.c         |  19 +-
 frame/base/bli_getopt.c        |  54 ++-
 frame/base/bli_getopt.h        |   1 +
 frame/base/bli_pba.c           |   6 +-
 frame/base/bli_pool.c          |  35 +-
 frame/base/bli_pool.h          |   3 +-
 frame/util/bli_util_unb_var1.c |  56 ++-
 test/3/Makefile                | 304 +++++----------
 test/3/old/runme.sh            | 277 +++++++++++++
 test/3/runme.sh                | 206 ++++++----
 test/3/test_gemm.c             | 335 ++++++++--------
 test/3/test_hemm.c             | 194 ++++++----
 test/3/test_herk.c             | 192 +++++----
 test/3/test_trmm.c             | 203 +++++-----
 test/3/test_trsm.c             | 204 +++++-----
 test/3/test_utils.c            | 684 +++++++++++++++++++++++++++++++++
 test/3/test_utils.h            | 142 +++++++
 27 files changed, 2464 insertions(+), 922 deletions(-)
 create mode 100644 docs/Discord.md
 create mode 100644 docs/images/discord.svg
 create mode 100755 test/3/old/runme.sh
 create mode 100644 test/3/test_utils.c
 create mode 100644 test/3/test_utils.h

diff --git a/CREDITS b/CREDITS
index 476d9c929c..448db24cc8 100644
--- a/CREDITS
+++ b/CREDITS
@@ -37,6 +37,7 @@ but many others have contributed code and feedback, including
   Roman Gareev
@gareevroman Richard Goldschmidt @SuperFluffy Chris Goodyer + Alexander Grund @Flamefire John Gunnels @jagunnels (IBM, T.J. Watson Research Center) Ali Emre Gülcü @Lephar Jeff Hammond @jeffhammond (Intel) diff --git a/Makefile b/Makefile index 5c4a32b59a..04cdca4214 100644 --- a/Makefile +++ b/Makefile @@ -1161,6 +1161,7 @@ showconfig: check-env @echo "install includedir: $(INSTALL_INCDIR)" @echo "install sharedir: $(INSTALL_SHAREDIR)" @echo "debugging status: $(DEBUG_TYPE)" + @echo "enable AddressSanitizer? $(MK_ENABLE_ASAN)" @echo "enabled threading model(s): $(THREADING_MODEL)" @echo "enable BLAS API? $(MK_ENABLE_BLAS)" @echo "enable CBLAS API? $(MK_ENABLE_CBLAS)" diff --git a/README.md b/README.md index 7996cb6766..012861366c 100644 --- a/README.md +++ b/README.md @@ -3,6 +3,8 @@ [![Build Status](https://api.travis-ci.com/flame/blis.svg?branch=master)](https://app.travis-ci.com/github/flame/blis) [![Build Status](https://ci.appveyor.com/api/projects/status/github/flame/blis?branch=master&svg=true)](https://ci.appveyor.com/project/shpc/blis/branch/master) +[Discord logo](docs/Discord.md) + Contents -------- @@ -97,6 +99,17 @@ all of which are available for free via the [edX platform](http://www.edx.org/). What's New ---------- + * **Join us on Discord!** In 2021, we soft-launched our [Discord](https://discord.com/) +server by privately inviting current and former collaborators, attendees of our BLIS +Retreat, as well as other participants within the BLIS ecosystem. We've been thrilled by +the results thus far, and are happy to announce that our new community is now open to +the broader public! If you'd like to hang out with other BLIS users and developers, +ask a question, discuss future features, or just say hello, please feel free to join us! +We've put together a [step-by-step guide](docs/Discord.md) for creating an account and +joining our cozy enclave. 
We even have a monthly "BLIS happy hour" event where people +can casually come together for a video chat, Q&A, brainstorm session, or whatever it +happens to unfold into! + * **Addons feature now available!** Have you ever wanted to quickly extend BLIS's operation support or define new custom BLIS APIs for your application, but were unsure of how to add your source code to BLIS? Do you want to isolate your custom @@ -417,6 +430,9 @@ If/when you have time, we *strongly* encourage you to read the detailed walkthrough of the build system found in our [Build System](docs/BuildSystem.md) guide. +If you are still having trouble, you are welcome to [join us on Discord](docs/Discord.md) +for further information and/or assistance. + Example Code ------------ @@ -500,6 +516,10 @@ empirically measured performance of `gemm` on select hardware architectures within BLIS and other BLAS libraries when performing matrix problems where one or two dimensions is exceedingly small. + * **[Discord](docs/Discord.md).** This document describes how to: create an +account on Discord (if you don't already have one); obtain a private invite +link; and use that invite link to join our BLIS server on Discord. + * **[Release Notes](docs/ReleaseNotes.md).** This document tracks a summary of changes included with each new version of BLIS, along with contributor credits for key features. @@ -610,16 +630,15 @@ has Linux, OSX and Windows binary packages for x86_64. Discussion ---------- -You can keep in touch with developers and other users of the project by joining -one of the following mailing lists: +Most of the active discussions are now happening on our [Discord](https://discord.com/) +server. Users and developers alike are welcome! Please see the +[BLIS Discord guide](docs/Discord.md) for a walkthrough of how to join us. 
+ +You can also still stay in touch by using either of the following mailing lists: * [blis-devel](https://groups.google.com/group/blis-devel): Please join and post to this mailing list if you are a BLIS developer, or if you are trying to use BLIS beyond simply linking to it as a BLAS library. -**Note:** Most of the interesting discussions happen here; don't be afraid to -join! If you would like to submit a bug report, or discuss a possible bug, -please consider opening a [new issue](https://github.com/flame/blis/issues) on -github. * [blis-discuss](https://groups.google.com/group/blis-discuss): Please join and post to this mailing list if you have general questions or feedback regarding diff --git a/build/config.mk.in b/build/config.mk.in index 849a7ccfa9..efb123366b 100644 --- a/build/config.mk.in +++ b/build/config.mk.in @@ -124,6 +124,9 @@ LDFLAGS_PRESET := @ldflags_preset@ # The level of debugging info to generate. DEBUG_TYPE := @debug_type@ +# Whether to compile and link the AddressSanitizer library. +MK_ENABLE_ASAN := @enable_asan@ + # Whether operating system support was requested via --enable-system. 
ENABLE_SYSTEM := @enable_system@ diff --git a/common.mk b/common.mk index c8d1fa273a..f4e58af105 100644 --- a/common.mk +++ b/common.mk @@ -118,6 +118,7 @@ get-noopt-cxxflags-for = $(strip $(CFLAGS_PRESET) \ get-refinit-cflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \ $(call get-noopt-cflags-for,$(1)) \ -DBLIS_CNAME=$(1) \ + $(BUILD_ASANFLAGS) \ $(BUILD_CPPFLAGS) \ $(BUILD_SYMFLAGS) \ -DBLIS_IN_REF_KERNEL=1 \ @@ -129,6 +130,7 @@ get-refkern-cflags-for = $(strip $(call load-var-for,CROPTFLAGS,$(1)) \ $(call get-noopt-cflags-for,$(1)) \ $(COMPSIMDFLAGS) \ -DBLIS_CNAME=$(1) \ + $(BUILD_ASANFLAGS) \ $(BUILD_CPPFLAGS) \ $(BUILD_SYMFLAGS) \ -DBLIS_IN_REF_KERNEL=1 \ @@ -137,12 +139,14 @@ get-refkern-cflags-for = $(strip $(call load-var-for,CROPTFLAGS,$(1)) \ get-config-cflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \ $(call get-noopt-cflags-for,$(1)) \ + $(BUILD_ASANFLAGS) \ $(BUILD_CPPFLAGS) \ $(BUILD_SYMFLAGS) \ ) get-frame-cflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \ $(call get-noopt-cflags-for,$(1)) \ + $(BUILD_ASANFLAGS) \ $(BUILD_CPPFLAGS) \ $(BUILD_SYMFLAGS) \ ) @@ -201,11 +205,14 @@ get-sandbox-cxxflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \ # Define a separate function that will return appropriate flags for use by # applications that want to use the same basic flags as those used when BLIS # was compiled. (NOTE: This is the same as the $(get-frame-cflags-for ...) -# function, except that it omits two variables that contain flags exclusively -# for use when BLIS is being compiled/built: BUILD_CPPFLAGS, which contains a -# cpp macro that confirms that BLIS is being built; and BUILD_SYMFLAGS, which -# contains symbol export flags that are only needed when a shared library is -# being compiled/linked.) 
+# function, except that it omits a few variables that contain flags exclusively +# for use when BLIS is being compiled/built: +# - BUILD_CPPFLAGS, which contains a cpp macro that confirms that BLIS +# is being built; +# - BUILD_SYMFLAGS, which contains symbol export flags that are only +# needed when a shared library is being compiled/linked; and +# - BUILD_ASANFLAGS, which contains a flag that causes the compiler to +# insert instrumentation for memory error detection.) get-user-cflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \ $(call get-noopt-cflags-for,$(1)) \ ) @@ -563,6 +570,11 @@ ifeq ($(DEBUG_TYPE),sde) LDFLAGS := $(filter-out $(LIBMEMKIND),$(LDFLAGS)) endif +# If AddressSanitizer is enabled, add the compiler flag to LDFLAGS. +ifeq ($(MK_ENABLE_ASAN),yes) +LDFLAGS += -fsanitize=address +endif + # Specify the shared library's 'soname' field. # NOTE: The flag for creating shared objects is different for Linux and OS X. ifeq ($(OS_NAME),Darwin) @@ -808,11 +820,19 @@ $(foreach c, $(CONFIG_LIST_FAM), $(eval $(call append-var-for,CXXLANGFLAGS,$(c)) CPPROCFLAGS := -D_POSIX_C_SOURCE=200112L $(foreach c, $(CONFIG_LIST_FAM), $(eval $(call append-var-for,CPPROCFLAGS,$(c)))) +# --- AddressSanitizer flags --- + +ifeq ($(MK_ENABLE_ASAN),yes) +BUILD_ASANFLAGS := -fsanitize=address +else +BUILD_ASANFLAGS := +endif + # --- Threading flags --- # NOTE: We don't have to explicitly omit -pthread when --disable-system is given -# since that option forces --enable-threading=none, and thus -pthread never gets -# added to begin with. +# since that option forces --enable-threading=single, and thus -pthread never +# gets added to begin with.
CTHREADFLAGS := diff --git a/configure b/configure index 858ce55ded..37399fbde2 100755 --- a/configure +++ b/configure @@ -224,12 +224,22 @@ print_usage() echo " " echo " --enable-mem-tracing, --disable-mem-tracing" echo " " - echo " Enable (disable by default) output to stdout that traces" + echo " Enable (disabled by default) output to stdout that traces" echo " the allocation and freeing of memory, including the names" echo " of the functions that triggered the allocation/freeing." echo " Enabling this option WILL NEGATIVELY IMPACT PERFORMANCE." echo " Please use only for informational/debugging purposes." echo " " + echo " --enable-asan, --disable-asan" + echo " " + echo " Enable (disabled by default) compiling and linking BLIS" + echo " framework code with the AddressSanitizer (ASan) library." + echo " Optimized kernels are NOT compiled with ASan support due" + echo " to limitations of register assignment in inline assembly." + echo " WARNING: ENABLING THIS OPTION WILL NEGATIVELY IMPACT" + echo " PERFORMANCE. Please use only for informational/debugging" + echo " purposes." + echo " " echo " -i SIZE, --int-size=SIZE" echo " " echo " Set the size (in bits) of internal BLIS integers and" @@ -1325,6 +1335,17 @@ blacklistbu_add() fi } +blacklistos_add() +{ + # Check whether we've already blacklisted the given sub-config so + # we don't output redundant messages. + if [ $(is_in_list "$1" "${config_blist}") == "false" ]; then + + echowarn "The operating system does not support building '$1'; adding to blacklist." + config_blist="${config_blist} $1" + fi +} + blacklist_init() { config_blist="" @@ -1979,6 +2000,13 @@ check_assembler() fi } +check_os() +{ + if [[ "$(uname -s)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then + blacklistos_add "armsve" + fi +} + try_assemble() { local cc cflags asm_src asm_base asm_bin rval @@ -2451,6 +2479,9 @@ main() debug_type='' debug_flag='' + # A flag indicating whether AddressSanitizer should be used. 
+ enable_asan='no' + # The system flag. enable_system='yes' @@ -2576,6 +2607,12 @@ main() disable-debug) debug_flag=0 ;; + enable-asan) + enable_asan='yes' + ;; + disable-asan) + enable_asan='no' + ;; enable-verbose-make) enable_verbose='yes' ;; @@ -2867,6 +2904,9 @@ main() get_binutils_version check_assembler + # Check if there is any incompatibility due to the operating system. + check_os + # Remove duplicates and whitespace from the blacklist. blacklist_cleanup @@ -3357,6 +3397,20 @@ main() echo "${script_name}: no preset LDFLAGS detected." fi + # Check if the verbose make flag was specified. + if [ "x${enable_verbose}" = "xyes" ]; then + echo "${script_name}: enabling verbose make output. (disable with 'make V=0'.)" + else + echo "${script_name}: disabling verbose make output. (enable with 'make V=1'.)" + fi + + # Check if the ARG_MAX hack was requested. + if [ "x${enable_arg_max_hack}" = "xyes" ]; then + echo "${script_name}: enabling ARG_MAX hack." + else + echo "${script_name}: disabling ARG_MAX hack." + fi + # Check if the debug flag was specified. if [ -n "${debug_flag}" ]; then if [ "x${debug_type}" = "xopt" ]; then @@ -3373,29 +3427,24 @@ main() echo "${script_name}: debug symbols disabled." fi - # Check if the verbose make flag was specified. - if [ "x${enable_verbose}" = "xyes" ]; then - echo "${script_name}: enabling verbose make output. (disable with 'make V=0'.)" - else - echo "${script_name}: disabling verbose make output. (enable with 'make V=1'.)" - fi - - # Check if the ARG_MAX hack was requested. - if [ "x${enable_arg_max_hack}" = "xyes" ]; then - echo "${script_name}: enabling ARG_MAX hack." + # Check if the AddressSanitizer flag was specified. + if [ "x${enable_asan}" = "xyes" ]; then + echo "${script_name}: enabling AddressSanitizer support (except for optimized kernels)." else - echo "${script_name}: disabling ARG_MAX hack." + enable_asan='no' + echo "${script_name}: AddressSanitizer support disabled." 
fi - enable_shared_01=1 # Check if the static lib flag was specified. if [ "x${enable_static}" = "xyes" -a "x${enable_shared}" = "xyes" ]; then echo "${script_name}: building BLIS as both static and shared libraries." + enable_shared_01=1 + elif [ "x${enable_static}" = "xno" -a "x${enable_shared}" = "xyes" ]; then + echo "${script_name}: building BLIS as a shared library (static library disabled)." + enable_shared_01=1 elif [ "x${enable_static}" = "xyes" -a "x${enable_shared}" = "xno" ]; then echo "${script_name}: building BLIS as a static library (shared library disabled)." enable_shared_01=0 - elif [ "x${enable_static}" = "xno" -a "x${enable_shared}" = "xyes" ]; then - echo "${script_name}: building BLIS as a shared library (static library disabled)." else echo "${script_name}: Both static and shared libraries were disabled." echo "${script_name}: *** Please enable one (or both) to continue." @@ -3917,7 +3966,7 @@ main() # Create a #define for the configuration family (config_name). uconf=$(echo ${config_name} | tr '[:lower:]' '[:upper:]') config_name_define="#define BLIS_FAMILY_${uconf}\n" - + # Create a list of #defines, one for each configuration in config_list. 
config_list_defines="" for conf in ${config_list}; do @@ -4012,6 +4061,7 @@ main() | sed -e "s/@libpthread@/${libpthread_esc}/g" \ | sed -e "s/@cflags_preset@/${cflags_preset_esc}/g" \ | sed -e "s/@ldflags_preset@/${ldflags_preset_esc}/g" \ + | sed -e "s/@enable_asan@/${enable_asan}/g" \ | sed -e "s/@debug_type@/${debug_type}/g" \ | sed -e "s/@enable_system@/${enable_system}/g" \ | sed -e "s/@threading_model@/${threading_model}/g" \ diff --git a/docs/Discord.md b/docs/Discord.md new file mode 100644 index 0000000000..b4403f7bc6 --- /dev/null +++ b/docs/Discord.md @@ -0,0 +1,115 @@ +*NOTE: The [BLIS](https://github.com/flame/blis) project is not affiliated with [Discord Inc.](https://discord.com/company) in any way, and we use the Discord logo with their permission.* + + +## Contents + +* **[Welcome](Discord.md#welcome)** +* **[Introduction to Discord](Discord.md#introduction-to-discord)** +* **[Creating an account](Discord.md#creating-an-account)** +* **[Obtaining the invite link](Discord.md#obtaining-the-invite-link)** +* **[Joining the BLIS server](Discord.md#joining-the-blis-server)** +* **[Additional resources](Discord.md#additional-resources)** + +## Welcome + +In 2021, we soft-launched our Discord server by privately inviting current and former collaborators, attendees of our BLIS Retreat, as well as other participants within the BLIS ecosystem. We've been thrilled by the results thus far, and are happy to announce that our new community is now open to the broader public! + +If you'd like to hang out with other BLIS users and developers, ask a question, discuss future features, or just say hello, please feel free to join us! Joining our server is also a great way to get announcements for new versions, workshop events, video chat parties, and other infrequent updates. + +**If you already use Discord** and want to skip straight to the invite link, you can find it [here](#obtaining-the-invite-link). 
Just be sure to manually remove the dashes (`-`) and equal signs (`=`) before using it! + +## Introduction to Discord + +The remaining sections of this file walk the reader through basic instructions for joining the BLIS community on [Discord](https://discord.com). + +Discord is free to use for everyone. You can optionally pay for premium features via their [Nitro](https://discord.com/nitro) subscription, but Nitro is not necessary for most casual users. + +Discord offers several kinds of clients. Users may use Discord via: + +- the official Android and iOS apps on mobile devices +- a [web browser](https://discord.com/login) +- the standalone desktop application, available from their [Download](https://discord.com/download) page. + +You can even stay logged in on multiple devices! Each one will automatically sync itself to newly sent/received messages. + +In this document, we'll walk you through each step necessary to join the BLIS Discord community. First, we'll talk about how to [create a Discord account](#creating-an-account) (if you don't already have one). Then, we'll explain how to [obtain the invite link](#obtaining-the-invite-link). And finally, we'll tell you how to use that invite link to [join the BLIS Discord server](#joining-the-blis-server). + + +## Creating an account + +If you don't already have a Discord account, you'll need to first create one. + +As of this writing, you may follow these steps to create your account: + +*NOTE: We recommend executing these steps using a desktop web browser. Once you've created your account and joined the BLIS server, you can proceed to use your client(s) of choice (mobile app, desktop app, or web browser).* + +1. Go to [https://discord.com](https://discord.com) and click on "Login" at the top-right. +2. At the bottom of the dialog, click the "Register" link. +3. Enter the prompted information, such as username and email, then click "Continue". +4. Perform the Captcha verification. +5. 
This should take you into the web browser version of Discord. You will be asked if you want to create your own server. Close the dialog without making any selection. +6. At this point, you need to verify your email address. Check your email account for a message from Discord. Click the link in the email. This should bring up a dialog confirming your email has been verified. You may now close the web browser tab. + +Congratulations! You're now a member of Discord and ready to join individual communities, or "servers." + + +## Obtaining the invite link + +Since we do not have access to an official Captcha-like service to confirm that you are not a software bot, we have instead obfuscated our invite link in a way that should be easy for a human to unmangle. + +Here's an example invite link (for reference purposes only): `https://discord.gg/abC2jUVeip` + +Notice that the link consists of `https://discord.gg/` followed by a 10-character string consisting of lower- and upper-case letters, and (typically) one numerical digit. + +**The BLIS Discord invite link is: https://discord.gg/e-Zx=p-z9=p-Ks=x** + +*Note that you **must** remove the dashes (`-`) and equal signs (`=`) before using the link!* + +Once you decipher the invite link, copy it to your clipboard so it's ready to use in the appropriate step within the next section, [Joining the BLIS server](#joining-the-blis-server). + + +## Joining the BLIS server + +Once you have the invite link copied to your clipboard, follow these steps in order to join the BLIS server: + +*NOTE: We recommend executing these steps using a desktop web browser. Once you've joined the BLIS server, you can proceed to use your client(s) of choice (mobile app, desktop app, or web browser).* + +1. Log in to the [Discord website](https://discord.com). +2. Once logged in, on the left-hand side of the UI, click on the button with the "+" symbol. This will bring up a dialog asking if you want to create a server. +3. 
At the bottom of the dialog, there will be a section asking, "Have an invite already?" Click the button below it labeled "Join a Server". +4. Paste the invite link into the prompt and click "Join Server". +5. This should bring up a dialog stating that you've been invited to join the BLIS server. Click on "Accept Invite". This will trigger a new dialog informing you that your account has been updated with the invitation. + +That's it! Now that you've joined our server, please consider introducing yourself in `#general`! We love hearing about how application developers and end-users are using BLIS. + +If you had any difficulty joining or with the invite link, please reach out to [field@cs.utexas.edu](mailto:field@cs.utexas.edu). + + +## Additional resources + +Are you new to Discord? Not sure how to work this newfangled technology? Don't worry; once you learn the basics, you'll feel much more at home! + +While a tutorial on Discord is beyond the scope of this document, there are countless articles and YouTube videos that introduce newcomers to Discord's UI. Here are a few articles on the basics: + +- **tom's guide**. [Discord: Everything You Need to Know](https://www.tomsguide.com/us/what-is-discord,review-5203.html) +- **WIRED.** [How to Use Discord: A Beginner's Guide](https://www.wired.com/story/how-to-use-discord/) +- **Discord Support.** [Beginner's Guide to Discord](https://support.discord.com/hc/en-us/articles/360045138571-Beginner-s-Guide-to-Discord) + +And some YouTube videos: + +- **Tech Audit TV.** [How to Use Discord in 2022: The Ultimate Beginner Walkthrough](https://www.youtube.com/watch?v=nPmdafMo1b8) +- **Howfinity.** [How to Use Discord - Beginner's Guide](https://www.youtube.com/watch?v=rnYGrq95ezA) + +Some things I recommend setting up shortly after you create your account: + +- Take note of your username's "tag" or disambiguator. This is a randomly-assigned four-digit number that gets implicitly appended to the end of your username (e.g. 
`bsmith#1234`), which helps when others need to tell you apart from others who have the same username. +- Not happy with your username? You can change it! +- Review your privacy settings, and consider using two-factor authentication. +- Personalize your account with a custom profile image. +- Consider switching to the "dark" theme (if you prefer dark modes on other websites or on mobile devices). +- Tweak other appearance settings such as the font size or UI compactness. +- Set up your notifications. + +There are many other settings in Discord! Feel free to explore all of them by clicking the gear icon in the bottom-left area of your screen, just to the right of your username. + +We hope you found this short guide useful, and we hope to see you on Discord! Thanks for your interest in BLIS and our community! :) diff --git a/docs/images/discord.svg b/docs/images/discord.svg new file mode 100644 index 0000000000..1f483fe8f0 --- /dev/null +++ b/docs/images/discord.svg @@ -0,0 +1,23 @@ + + + + + + + + + + + + + + + + + + + + + + + diff --git a/frame/3/bli_l3_check.c b/frame/3/bli_l3_check.c index 3b4d887468..9ac0a7fbbe 100644 --- a/frame/3/bli_l3_check.c +++ b/frame/3/bli_l3_check.c @@ -44,7 +44,7 @@ void bli_gemm_check const cntx_t* cntx ) { - //err_t e_val; + err_t e_val; // Check basic properties of the operation. @@ -52,15 +52,14 @@ void bli_gemm_check // Check object structure. - // NOTE: Can't perform these checks as long as bli_gemm_check() is called - // from bli_l3_int(), which is in the execution path for structured - // level-3 operations such as hemm. 
+ e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); - //e_val = bli_check_general_object( a ); - //bli_check_error_code( e_val ); + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); - //e_val = bli_check_general_object( b ); - //bli_check_error_code( e_val ); + e_val = bli_check_general_object( c ); + bli_check_error_code( e_val ); } void bli_gemmt_check @@ -83,6 +82,14 @@ void bli_gemmt_check e_val = bli_check_square_object( c ); bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); } void bli_hemm_check @@ -102,10 +109,21 @@ void bli_hemm_check bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + // Check object structure. e_val = bli_check_hermitian_object( a ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( c ); + bli_check_error_code( e_val ); } void bli_herk_check @@ -127,18 +145,26 @@ void bli_herk_check bli_herk_basic_check( alpha, a, &ah, beta, c, cntx ); - // Check for real-valued alpha and beta. - - e_val = bli_check_real_valued_object( alpha ); - bli_check_error_code( e_val ); + // Check matrix squareness. - e_val = bli_check_real_valued_object( beta ); + e_val = bli_check_square_object( c ); bli_check_error_code( e_val ); // Check matrix structure. e_val = bli_check_hermitian_object( c ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); + + // Check for real-valued alpha and beta. 
+ + e_val = bli_check_real_valued_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_real_valued_object( beta ); + bli_check_error_code( e_val ); } void bli_her2k_check @@ -162,15 +188,26 @@ void bli_her2k_check bli_her2k_basic_check( alpha, a, &bh, b, &ah, beta, c, cntx ); - // Check for real-valued beta. + // Check matrix squareness. - e_val = bli_check_real_valued_object( beta ); + e_val = bli_check_square_object( c ); bli_check_error_code( e_val ); // Check matrix structure. e_val = bli_check_hermitian_object( c ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); + + // Check for real-valued beta. + + e_val = bli_check_real_valued_object( beta ); + bli_check_error_code( e_val ); } void bli_symm_check @@ -190,10 +227,21 @@ void bli_symm_check bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + // Check object structure. e_val = bli_check_symmetric_object( a ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( c ); + bli_check_error_code( e_val ); } void bli_syrk_check @@ -215,10 +263,18 @@ void bli_syrk_check bli_herk_basic_check( alpha, a, &at, beta, c, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( c ); + bli_check_error_code( e_val ); + // Check matrix structure. e_val = bli_check_symmetric_object( c ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); } void bli_syr2k_check @@ -242,10 +298,21 @@ void bli_syr2k_check bli_her2k_basic_check( alpha, a, &bt, b, &at, beta, c, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( c ); + bli_check_error_code( e_val ); + // Check matrix structure. 
e_val = bli_check_symmetric_object( c ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); } void bli_trmm3_check @@ -261,14 +328,25 @@ void bli_trmm3_check { err_t e_val; - // Perform checks common to hemm/symm/trmm/trsm. + // Check basic properties of the operation. bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + // Check object structure. e_val = bli_check_triangular_object( a ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( c ); + bli_check_error_code( e_val ); } void bli_trmm_check @@ -282,14 +360,22 @@ void bli_trmm_check { err_t e_val; - // Perform checks common to hemm/symm/trmm/trsm. + // Check basic properties of the operation. bli_hemm_basic_check( side, alpha, a, b, &BLIS_ZERO, b, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + // Check object structure. e_val = bli_check_triangular_object( a ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); } void bli_trsm_check @@ -307,10 +393,18 @@ void bli_trsm_check bli_hemm_basic_check( side, alpha, a, b, &BLIS_ZERO, b, cntx ); + // Check matrix squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + // Check object structure. e_val = bli_check_triangular_object( a ); bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); } // ----------------------------------------------------------------------------- @@ -385,6 +479,14 @@ void bli_gemmt_basic_check e_val = bli_check_level3_dims( a, b, c ); bli_check_error_code( e_val ); + + // Check for consistent datatypes. 
+ + e_val = bli_check_consistent_object_datatypes( c, a ); + bli_check_error_code( e_val ); + + e_val = bli_check_consistent_object_datatypes( c, b ); + bli_check_error_code( e_val ); } void bli_hemm_basic_check @@ -417,11 +519,6 @@ void bli_hemm_basic_check bli_check_error_code( e_val ); } - // Check matrix squareness. - - e_val = bli_check_square_object( a ); - bli_check_error_code( e_val ); - // Check for consistent datatypes. e_val = bli_check_consistent_object_datatypes( c, a ); @@ -452,19 +549,6 @@ void bli_herk_basic_check e_val = bli_check_level3_dims( a, ah, c ); bli_check_error_code( e_val ); - // Check matrix squareness. - - e_val = bli_check_square_object( c ); - bli_check_error_code( e_val ); - - // Check matrix structure. - - e_val = bli_check_general_object( a ); - bli_check_error_code( e_val ); - - e_val = bli_check_general_object( ah ); - bli_check_error_code( e_val ); - // Check for consistent datatypes. e_val = bli_check_consistent_object_datatypes( c, a ); @@ -501,25 +585,6 @@ void bli_her2k_basic_check e_val = bli_check_level3_dims( b, ah, c ); bli_check_error_code( e_val ); - // Check matrix squareness. - - e_val = bli_check_square_object( c ); - bli_check_error_code( e_val ); - - // Check matrix structure. - - e_val = bli_check_general_object( a ); - bli_check_error_code( e_val ); - - e_val = bli_check_general_object( bh ); - bli_check_error_code( e_val ); - - e_val = bli_check_general_object( b ); - bli_check_error_code( e_val ); - - e_val = bli_check_general_object( ah ); - bli_check_error_code( e_val ); - // Check for consistent datatypes. 
e_val = bli_check_consistent_object_datatypes( c, a ); @@ -586,13 +651,13 @@ void bli_l3_basic_check e_val = bli_check_object_buffer( alpha ); bli_check_error_code( e_val ); - e_val = bli_check_object_buffer( a ); + e_val = bli_check_object_buffer( beta ); bli_check_error_code( e_val ); - e_val = bli_check_object_buffer( b ); + e_val = bli_check_object_buffer( a ); bli_check_error_code( e_val ); - e_val = bli_check_object_buffer( beta ); + e_val = bli_check_object_buffer( b ); bli_check_error_code( e_val ); e_val = bli_check_object_buffer( c ); diff --git a/frame/base/bli_apool.c b/frame/base/bli_apool.c index a42c7103e5..693e91bf92 100644 --- a/frame/base/bli_apool.c +++ b/frame/base/bli_apool.c @@ -188,7 +188,7 @@ void bli_apool_free_block if ( pool != NULL ) { // Finalize the pool. - bli_pool_finalize( pool ); + bli_pool_finalize( pool, FALSE ); #ifdef BLIS_ENABLE_MEM_TRACING printf( "bli_apool_free_block(): pool_t %d: ", ( int )i ); diff --git a/frame/base/bli_cpuid.c b/frame/base/bli_cpuid.c index 527db1f5d7..d967cc05d6 100644 --- a/frame/base/bli_cpuid.c +++ b/frame/base/bli_cpuid.c @@ -781,7 +781,7 @@ uint32_t bli_cpuid_query if ( bli_cpuid_has_features( ecx, FEATURE_MASK_AVX ) ) *features |= FEATURE_AVX; if ( bli_cpuid_has_features( ecx, FEATURE_MASK_FMA3 ) ) *features |= FEATURE_FMA3; - // Check whether the hardware supports xsave/xrestor/xsetbv/xgetbv AND + // Check whether the hardware supports xsave/xrestor/xsetbv/xgetbv AND // support for these is enabled by the OS. If so, then we proceed with // checking that various register-state saving features are available. if ( bli_cpuid_has_features( ecx, FEATURE_MASK_XGETBV ) ) @@ -813,7 +813,7 @@ uint32_t bli_cpuid_query // The OS can manage the state of 512-bit zmm (AVX-512) registers // only if the xcr[7:5] bits are set. If they are not set, then - // clear all feature bits related to AVX-512. + // clear all feature bits related to AVX-512. 
if ( !bli_cpuid_has_features( eax, XGETBV_MASK_XMM | XGETBV_MASK_YMM | XGETBV_MASK_ZMM ) ) @@ -829,7 +829,7 @@ uint32_t bli_cpuid_query // The OS can manage the state of 256-bit ymm (AVX) registers // only if the xcr[2] bit is set. If it is not set, then - // clear all feature bits related to AVX. + // clear all feature bits related to AVX. if ( !bli_cpuid_has_features( eax, XGETBV_MASK_XMM | XGETBV_MASK_YMM ) ) { @@ -842,7 +842,7 @@ uint32_t bli_cpuid_query // The OS can manage the state of 128-bit xmm (SSE) registers // only if the xcr[1] bit is set. If it is not set, then // clear all feature bits related to SSE (which means the - // entire bitfield is clear). + // entire bitfield is clear). if ( !bli_cpuid_has_features( eax, XGETBV_MASK_XMM ) ) { *features = 0; @@ -1025,6 +1025,7 @@ static uint32_t get_coretype { int implementer = 0x00, part = 0x000; *features = FEATURE_NEON; + bool has_sve = FALSE; #ifdef __linux__ if ( getauxval( AT_HWCAP ) & HWCAP_CPUID ) @@ -1033,7 +1034,7 @@ static uint32_t get_coretype // /sys/devices/system/cpu/cpu0/regs/identification/midr_el1 // and split out in /proc/cpuinfo (with a tab before the colon): // CPU part : 0x0a1 - + uint64_t midr_el1; __asm("mrs %0, MIDR_EL1" : "=r" (midr_el1)); /* @@ -1047,8 +1048,8 @@ static uint32_t get_coretype implementer = (midr_el1 >> 24) & 0xFF; part = (midr_el1 >> 4) & 0xFFF; } - - bool has_sve = getauxval( AT_HWCAP ) & HWCAP_SVE; + + has_sve = getauxval( AT_HWCAP ) & HWCAP_SVE; if (has_sve) *features |= FEATURE_SVE; #endif //__linux__ @@ -1097,7 +1098,7 @@ static uint32_t get_coretype // CAVIUM_CPU_PART_THUNDERX2 0x0AF // CAVIUM_CPU_PART_THUNDERX3 0x0B8 // taken from OpenBLAS // - // BRCM_CPU_PART_BRAHMA_B53 0x100 + // BRCM_CPU_PART_BRAHMA_B53 0x100 // BRCM_CPU_PART_VULCAN 0x516 // // QCOM_CPU_PART_FALKOR_V1 0x800 @@ -1210,7 +1211,7 @@ uint32_t bli_cpuid_query #elif defined(__arm__) || defined(_M_ARM) || defined(_ARCH_PPC) -/* +/* I can't easily find documentation to do this as for aarch64, 
though it presumably could be unearthed from Linux code. However, on Linux 5.2 (and Androids's 3.4), /proc/cpuinfo has this sort of diff --git a/frame/base/bli_getopt.c b/frame/base/bli_getopt.c index e1d90d3234..bf74eb1d76 100644 --- a/frame/base/bli_getopt.c +++ b/frame/base/bli_getopt.c @@ -37,18 +37,19 @@ static const char OPT_MARKER = '-'; +//bool bli_char_is_in_str( char ch, const char* str ); + void bli_getopt_init_state( int opterr, getopt_t* state ) { - state->optarg = NULL; - state->optind = 1; - state->opterr = opterr; - state->optopt = 0; + state->nextchar = NULL; + state->optarg = NULL; + state->optind = 1; + state->opterr = opterr; + state->optopt = 0; } int bli_getopt( int argc, const char* const * argv, const char* optstring, getopt_t* state ) { - static const char* nextchar = NULL; - const char* elem_str; const char* optstr_char; @@ -60,7 +61,7 @@ int bli_getopt( int argc, const char* const * argv, const char* optstring, getop // an element of argv with more than one option character, in which // case we need to pick up where we left off (which is the address // contained in nextchar). - if ( nextchar == NULL ) + if ( state->nextchar == NULL ) { elem_str = argv[ state->optind ]; @@ -87,10 +88,10 @@ int bli_getopt( int argc, const char* const * argv, const char* optstring, getop // character. // Use the nextchar pointer as our element string. - elem_str = nextchar; + elem_str = state->nextchar; // Reset nextchar to NULL. - nextchar = NULL; + state->nextchar = NULL; } // Find the first occurrence of elem_str[0] in optstring. @@ -130,17 +131,24 @@ int bli_getopt( int argc, const char* const * argv, const char* optstring, getop state->optind += 1; return '?'; } - // If there are still more elements in argv yet to process AND - // the next one is an option, then the argument was omitted. 
+ // If there are still more elements in argv yet to process AND the + // next one is an option marker, then the argument was omitted + // (unless the option marker is actually part of the argument, + // such as with negative numbers, e.g. -1, which is very likely + // if the char *after* the option marker is missing from optstring). else if ( argv[ state->optind + 1 ][0] == OPT_MARKER ) { - if ( state->opterr == 1 ) fprintf( stderr, "bli_getopt(): **error**: option character '%c' is missing an argument (next element of argv is option '%c')\n", elem_str[0], argv[ state->optind + 1 ][1] ); - - state->optopt = *optstr_char; - state->optind += 1; - return '?'; + // If the char after the option marker is present in optstring, + // then the first option argument is missing. + if ( strchr( optstring, argv[ state->optind + 1 ][1] ) != NULL ) + { + if ( state->opterr == 1 ) fprintf( stderr, "bli_getopt(): **error**: option character '%c' is missing an argument (next element of argv is option '%c')\n", elem_str[0], argv[ state->optind + 1 ][1] ); + + state->optopt = *optstr_char; + state->optind += 1; + return '?'; + } } - // If no error was deteced above, we can safely assign optarg // to be the next element in argv and increment optind by two. 
state->optarg = argv[ state->optind + 1 ]; @@ -166,7 +174,7 @@ int bli_getopt( int argc, const char* const * argv, const char* optstring, getop { if ( strchr( optstring, elem_str[1] ) != NULL ) { - nextchar = &elem_str[1]; + state->nextchar = &elem_str[1]; return *optstr_char; } } @@ -176,3 +184,13 @@ int bli_getopt( int argc, const char* const * argv, const char* optstring, getop return *optstr_char; } +#if 0 +bool bli_char_is_in_str( char ch, const char* str ) +{ + int chi = ( int )ch; + + if ( strchr( str, chi ) == NULL ) return FALSE; + + return TRUE; +} +#endif diff --git a/frame/base/bli_getopt.h b/frame/base/bli_getopt.h index bb0e4f2cf1..1e0f7b2509 100644 --- a/frame/base/bli_getopt.h +++ b/frame/base/bli_getopt.h @@ -34,6 +34,7 @@ typedef struct getopt_s { + const char* nextchar; const char* optarg; int optind; int opterr; diff --git a/frame/base/bli_pba.c b/frame/base/bli_pba.c index 68dffd7285..cabaf4ff6a 100644 --- a/frame/base/bli_pba.c +++ b/frame/base/bli_pba.c @@ -389,9 +389,9 @@ void bli_pba_finalize_pools pool_t* pool_c = bli_pba_pool( index_c, pba ); // Finalize the memory pools for A, B, and C. - bli_pool_finalize( pool_a ); - bli_pool_finalize( pool_b ); - bli_pool_finalize( pool_c ); + bli_pool_finalize( pool_a, FALSE ); + bli_pool_finalize( pool_b, FALSE ); + bli_pool_finalize( pool_c, FALSE ); } // ----------------------------------------------------------------------------- diff --git a/frame/base/bli_pool.c b/frame/base/bli_pool.c index 684b0ef736..891f770aef 100644 --- a/frame/base/bli_pool.c +++ b/frame/base/bli_pool.c @@ -115,7 +115,8 @@ void bli_pool_init void bli_pool_finalize ( - pool_t* pool + pool_t* pool, + bool reinit ) { // NOTE: This implementation assumes that either: @@ -129,24 +130,22 @@ void bli_pool_finalize // Query the total number of blocks currently allocated. 
const siz_t num_blocks = bli_pool_num_blocks( pool ); - // NOTE: This sanity check has been disabled because bli_pool_reinit() - // is currently implemented in terms of bli_pool_finalize() followed by - // bli_pool_init(). If that _reinit() takes place when some blocks are - // checked out, then we would expect top_index != 0, and therefore this - // check is not universally appropriate. -#if 0 // Query the top_index of the pool. const siz_t top_index = bli_pool_top_index( pool ); // Sanity check: The top_index should be zero. - if ( top_index != 0 ) + // NOTE: This sanity check is disabled when called from bli_pool_reinit() + // because it is currently implemented in terms of bli_pool_finalize() followed by + // bli_pool_init(). If that _reinit() takes place when some blocks are + // checked out, then we would expect top_index != 0, and therefore this + // check is not universally appropriate. + if ( top_index != 0 && !reinit ) { printf( "bli_pool_finalize(): final top_index == %d (expected 0); block_size: %d.\n", ( int )top_index, ( int )bli_pool_block_size( pool ) ); printf( "bli_pool_finalize(): Implication: not all blocks were checked back in!\n" ); bli_abort(); } -#endif // Query the free() function pointer for the pool. free_ft free_fp = bli_pool_free_fp( pool ); @@ -215,7 +214,7 @@ void bli_pool_reinit // those blocks back into the pool. (This condition can be detected // since the block size is encoded into each pblk, which is copied // upon checkout.) - bli_pool_finalize( pool ); + bli_pool_finalize( pool, TRUE ); // Reinitialize the pool with the new parameters, in particular, // the new block size. @@ -335,6 +334,10 @@ void bli_pool_checkin_block // Query the top_index of the pool. const siz_t top_index = bli_pool_top_index( pool ); + // Check for double-free and other conditions which may prematurely + // exhaust the memory pool. 
+ if ( top_index == 0 ) bli_abort(); + #ifdef BLIS_ENABLE_MEM_TRACING printf( "bli_pool_checkin_block(): checking in block %d of size %d " "(align %d, offset %d).\n", @@ -403,14 +406,12 @@ void bli_pool_grow = bli_malloc_intl( block_ptrs_len_new * sizeof( pblk_t ), &r_val ); - // Query the top_index of the pool. - const siz_t top_index = bli_pool_top_index( pool ); - // Copy the contents of the old block_ptrs array to the new/resized - // array. Notice that we can begin with top_index since all entries - // from 0 to top_index-1 have been (and are currently) checked out - // to threads. - for ( dim_t i = top_index; i < num_blocks_cur; ++i ) + // array. Notice that we copy the entire array, including elements + // corresponding to blocks that have been checked out. Those elements + // were set to NULL upon checkout, and so it's important to copy them + // into the new block_ptrs array. + for ( dim_t i = 0; i < num_blocks_cur; ++i ) { block_ptrs_new[i] = block_ptrs_cur[i]; } diff --git a/frame/base/bli_pool.h b/frame/base/bli_pool.h index 0b16ae8eea..6f199f7a4c 100644 --- a/frame/base/bli_pool.h +++ b/frame/base/bli_pool.h @@ -228,7 +228,8 @@ void bli_pool_init ); void bli_pool_finalize ( - pool_t* pool + pool_t* pool, + bool reinit ); void bli_pool_reinit ( diff --git a/frame/util/bli_util_unb_var1.c b/frame/util/bli_util_unb_var1.c index 2b65c8460f..3c501d1075 100644 --- a/frame/util/bli_util_unb_var1.c +++ b/frame/util/bli_util_unb_var1.c @@ -1068,6 +1068,7 @@ void PASTEMAC(ch,varname) \ ctype_r scale_r; \ ctype_r sumsq_r; \ ctype_r abs_chi1_r; \ + ctype_r abs_chi1_i; \ dim_t i; \ \ /* NOTE: This function attempts to mimic the algorithm for computing @@ -1085,10 +1086,47 @@ void PASTEMAC(ch,varname) \ PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ \ abs_chi1_r = bli_fabs( chi1_r ); \ + abs_chi1_i = bli_fabs( chi1_i ); \ +\ + if ( bli_isnan( abs_chi1_r ) ) \ + { \ + sumsq_r = abs_chi1_r; \ + scale_r = one_r; \ + } \ +\ + if ( bli_isnan( abs_chi1_i ) ) \ + { \ + 
sumsq_r = abs_chi1_i; \ + scale_r = one_r; \ + } \ +\ + if ( bli_isnan( sumsq_r ) ) \ + { \ + chi1 += incx; \ + continue; \ + } \ +\ + if ( bli_isinf( abs_chi1_r ) ) \ + { \ + sumsq_r = abs_chi1_r; \ + scale_r = one_r; \ + } \ +\ + if ( bli_isinf( abs_chi1_i ) ) \ + { \ + sumsq_r = abs_chi1_i; \ + scale_r = one_r; \ + } \ +\ + if ( bli_isinf( sumsq_r ) ) \ + { \ + chi1 += incx; \ + continue; \ + } \ \ /* Accumulate real component into sumsq, adjusting scale if needed. */ \ - if ( abs_chi1_r > zero_r || bli_isnan( abs_chi1_r) ) \ + if ( abs_chi1_r > zero_r ) \ { \ if ( scale_r < abs_chi1_r ) \ { \ @@ -1104,25 +1142,23 @@ void PASTEMAC(ch,varname) \ ( abs_chi1_r / scale_r ); \ } \ } \ -\ - abs_chi1_r = bli_fabs( chi1_i ); \ \ /* Accumulate imaginary component into sumsq, adjusting scale if needed. */ \ - if ( abs_chi1_r > zero_r || bli_isnan( abs_chi1_r) ) \ + if ( abs_chi1_i > zero_r ) \ { \ - if ( scale_r < abs_chi1_r ) \ + if ( scale_r < abs_chi1_i ) \ { \ sumsq_r = one_r + \ - sumsq_r * ( scale_r / abs_chi1_r ) * \ - ( scale_r / abs_chi1_r ); \ + sumsq_r * ( scale_r / abs_chi1_i ) * \ + ( scale_r / abs_chi1_i ); \ \ - PASTEMAC(chr,copys)( abs_chi1_r, scale_r ); \ + PASTEMAC(chr,copys)( abs_chi1_i, scale_r ); \ } \ else \ { \ - sumsq_r = sumsq_r + ( abs_chi1_r / scale_r ) * \ - ( abs_chi1_r / scale_r ); \ + sumsq_r = sumsq_r + ( abs_chi1_i / scale_r ) * \ + ( abs_chi1_i / scale_r ); \ } \ } \ \ diff --git a/test/3/Makefile b/test/3/Makefile index 9dca6f6330..e7cb7235a1 100644 --- a/test/3/Makefile +++ b/test/3/Makefile @@ -126,27 +126,6 @@ VENDOR_LIB := $(MKL_LIB) VENDORP_LIB := $(MKLP_LIB) -# -# --- Problem size definitions ------------------------------------------------- -# - -# Single core (single-threaded) -PS_BEGIN := 48 -PS_MAX := 2400 -PS_INC := 48 - -# Single-socket (multithreaded) -# Coarse Run: -P1_BEGIN := 160 -P1_MAX := 8000 -P1_INC := 160 - -# Dual-socket (multithreaded) -# Coarse Quick Run: -P2_BEGIN := 160 -P2_MAX := 12000 -P2_INC := 160 - # # --- 
General build definitions ------------------------------------------------ @@ -184,30 +163,19 @@ CXXFLAGS_MT := -march=native $(CXXFLAGS) # Which library? -BLI_DEF := -DBLIS -BLA_DEF := -DBLAS -EIG_DEF := -DEIGEN - -# Complex implementation type -D1M := -DIND=BLIS_1M -DNAT := -DIND=BLIS_NAT - -# Implementation string -#STR_1M := -DSTR=\"1m\" -STR_NAT := -DSTR=\"asm_blis\" -STR_OBL := -DSTR=\"openblas\" -STR_EIG := -DSTR=\"eigen\" -STR_VEN := -DSTR=\"vendor\" - -# Single or multithreaded string -STR_ST := -DTHR_STR=\"st\" -STR_1S := -DTHR_STR=\"1s\" -STR_2S := -DTHR_STR=\"2s\" +DEF_BLI := -DBLIS +DEF_BLA := -DBLAS +DEF_EIG := -DEIGEN -# Problem size specification -PDEF_ST := -DP_BEGIN=$(PS_BEGIN) -DP_INC=$(PS_INC) -DP_MAX=$(PS_MAX) -PDEF_1S := -DP_BEGIN=$(P1_BEGIN) -DP_INC=$(P1_INC) -DP_MAX=$(P1_MAX) -PDEF_2S := -DP_BEGIN=$(P2_BEGIN) -DP_INC=$(P2_INC) -DP_MAX=$(P2_MAX) +# Implementation string. +STR_BLI := -DIMPL_STR=\"blis\" +STR_OBL := -DIMPL_STR=\"openblas\" +STR_EIG := -DIMPL_STR=\"eigen\" +STR_VEN := -DIMPL_STR=\"vendor\" + +# Single or multithreaded string. 
+STR_ST := -DTHR_STR=\"st\" +STR_MT := -DTHR_STR=\"mt\" @@ -215,188 +183,132 @@ PDEF_2S := -DP_BEGIN=$(P2_BEGIN) -DP_INC=$(P2_INC) -DP_MAX=$(P2_MAX) # --- Targets/rules ------------------------------------------------------------ # -all: all-st all-1s all-2s -blis: blis-st blis-1s blis-2s -openblas: openblas-st openblas-1s openblas-2s -eigen: eigen-st eigen-1s eigen-2s -vendor: vendor-st vendor-1s vendor-2s -mkl: vendor -armpl: vendor +all: all-st -all-st: blis-st openblas-st mkl-st eigen-st -all-1s: blis-1s openblas-1s mkl-1s eigen-1s -all-2s: blis-2s openblas-2s mkl-2s eigen-2s +all-st: blis-st openblas-st mkl-st eigen-st +all-mt: blis-mt openblas-mt mkl-mt eigen-mt -blis-st: blis-nat-st -blis-1s: blis-nat-1s -blis-2s: blis-nat-2s - -#blis-ind: blis-ind-st blis-ind-mt -blis-nat: blis-nat-st blis-nat-1s blis-nat-2s +blis: blis-st +openblas: openblas-st +eigen: eigen-st +vendor: vendor-st +mkl: mkl-st # Define the datatypes, operations, and implementations. -DTS := s d c z OPS := gemm hemm herk trmm trsm -BIMPLS := asm_blis openblas vendor +BIMPLS := blis openblas vendor EIMPLS := eigen -# Define functions to construct object filenames from the datatypes and -# operations given an implementation. We define one function for single- -# threaded, single-socket, and dual-socket filenames. -get-st-objs = $(foreach dt,$(DTS),$(foreach op,$(OPS),test_$(dt)$(op)_$(PS_MAX)_$(1)_st.o)) -get-1s-objs = $(foreach dt,$(DTS),$(foreach op,$(OPS),test_$(dt)$(op)_$(P1_MAX)_$(1)_1s.o)) -get-2s-objs = $(foreach dt,$(DTS),$(foreach op,$(OPS),test_$(dt)$(op)_$(P2_MAX)_$(1)_2s.o)) - -# Construct object and binary names for single-threaded, single-socket, and -# dual-socket files for BLIS, OpenBLAS, and a vendor library (e.g. MKL). 
-BLIS_NAT_ST_OBJS := $(call get-st-objs,asm_blis) -BLIS_NAT_ST_BINS := $(patsubst %.o,%.x,$(BLIS_NAT_ST_OBJS)) -BLIS_NAT_1S_OBJS := $(call get-1s-objs,asm_blis) -BLIS_NAT_1S_BINS := $(patsubst %.o,%.x,$(BLIS_NAT_1S_OBJS)) -BLIS_NAT_2S_OBJS := $(call get-2s-objs,asm_blis) -BLIS_NAT_2S_BINS := $(patsubst %.o,%.x,$(BLIS_NAT_2S_OBJS)) +# Define a function to construct object filenames from the operations +# given an implementation. +get-st-objs = $(foreach op,$(OPS),test_$(op)_$(1)_st.o) +get-mt-objs = $(foreach op,$(OPS),test_$(op)_$(1)_mt.o) + +# Construct object and binary names for single-threaded and multithreaded +# files for BLIS, OpenBLAS, Eigen, and a vendor library (e.g. MKL). +BLIS_ST_OBJS := $(call get-st-objs,blis) +BLIS_ST_BINS := $(patsubst %.o,%.x,$(BLIS_ST_OBJS)) + +BLIS_MT_OBJS := $(call get-mt-objs,blis) +BLIS_MT_BINS := $(patsubst %.o,%.x,$(BLIS_MT_OBJS)) OPENBLAS_ST_OBJS := $(call get-st-objs,openblas) OPENBLAS_ST_BINS := $(patsubst %.o,%.x,$(OPENBLAS_ST_OBJS)) -OPENBLAS_1S_OBJS := $(call get-1s-objs,openblas) -OPENBLAS_1S_BINS := $(patsubst %.o,%.x,$(OPENBLAS_1S_OBJS)) -OPENBLAS_2S_OBJS := $(call get-2s-objs,openblas) -OPENBLAS_2S_BINS := $(patsubst %.o,%.x,$(OPENBLAS_2S_OBJS)) + +OPENBLAS_MT_OBJS := $(call get-mt-objs,openblas) +OPENBLAS_MT_BINS := $(patsubst %.o,%.x,$(OPENBLAS_MT_OBJS)) EIGEN_ST_OBJS := $(call get-st-objs,eigen) EIGEN_ST_BINS := $(patsubst %.o,%.x,$(EIGEN_ST_OBJS)) -EIGEN_1S_OBJS := $(call get-1s-objs,eigen) -EIGEN_1S_BINS := $(patsubst %.o,%.x,$(EIGEN_1S_OBJS)) -EIGEN_2S_OBJS := $(call get-2s-objs,eigen) -EIGEN_2S_BINS := $(patsubst %.o,%.x,$(EIGEN_2S_OBJS)) + +EIGEN_MT_OBJS := $(call get-mt-objs,eigen) +EIGEN_MT_BINS := $(patsubst %.o,%.x,$(EIGEN_MT_OBJS)) VENDOR_ST_OBJS := $(call get-st-objs,vendor) VENDOR_ST_BINS := $(patsubst %.o,%.x,$(VENDOR_ST_OBJS)) -VENDOR_1S_OBJS := $(call get-1s-objs,vendor) -VENDOR_1S_BINS := $(patsubst %.o,%.x,$(VENDOR_1S_OBJS)) -VENDOR_2S_OBJS := $(call get-2s-objs,vendor) -VENDOR_2S_BINS := 
$(patsubst %.o,%.x,$(VENDOR_2S_OBJS)) - -# Define some targets associated with the above object/binary files. -blis-nat-st: check-env $(BLIS_NAT_ST_BINS) -blis-nat-1s: check-env $(BLIS_NAT_1S_BINS) -blis-nat-2s: check-env $(BLIS_NAT_2S_BINS) - -openblas-st: check-env $(OPENBLAS_ST_BINS) -openblas-1s: check-env $(OPENBLAS_1S_BINS) -openblas-2s: check-env $(OPENBLAS_2S_BINS) -eigen-st: check-env $(EIGEN_ST_BINS) -eigen-1s: check-env $(EIGEN_1S_BINS) -eigen-2s: check-env $(EIGEN_2S_BINS) +VENDOR_MT_OBJS := $(call get-mt-objs,vendor) +VENDOR_MT_BINS := $(patsubst %.o,%.x,$(VENDOR_MT_OBJS)) -vendor-st: check-env $(VENDOR_ST_BINS) -vendor-1s: check-env $(VENDOR_1S_BINS) -vendor-2s: check-env $(VENDOR_2S_BINS) +# List other miscellaneous object files +UTIL_OBJS := test_utils.o +UTIL_HDRS := test_utils.h -mkl-st: vendor-st -mkl-1s: vendor-1s -mkl-2s: vendor-2s - -armpl-st: vendor-st -armpl-1s: vendor-1s -armpl-2s: vendor-2s +# Define some targets associated with the above object/binary files. +blis-st: check-env $(BLIS_ST_BINS) +blis-mt: check-env $(BLIS_MT_BINS) +openblas-st: check-env $(OPENBLAS_ST_BINS) +openblas-mt: check-env $(OPENBLAS_MT_BINS) +eigen-st: check-env $(EIGEN_ST_BINS) +eigen-mt: check-env $(EIGEN_MT_BINS) +vendor-st: check-env $(VENDOR_ST_BINS) +vendor-mt: check-env $(VENDOR_MT_BINS) +mkl-st: vendor-st +mkl-mt: vendor-mt +armpl-st: vendor-st +armpl-mt: vendor-mt # Mark the object files as intermediate so that make will remove them # automatically after building the binaries on which they depend. 
-.INTERMEDIATE: $(BLIS_NAT_ST_OBJS) $(BLIS_NAT_1S_OBJS) $(BLIS_NAT_2S_OBJS) -.INTERMEDIATE: $(OPENBLAS_ST_OBJS) $(OPENBLAS_1S_OBJS) $(OPENBLAS_2S_OBJS) -.INTERMEDIATE: $(EIGEN_ST_OBJS) $(EIGEN_1S_OBJS) $(EIGEN_2S_OBJS) -.INTERMEDIATE: $(VENDOR_ST_OBJS) $(VENDOR_1S_OBJS) $(VENDOR_2S_OBJS) +.INTERMEDIATE: $(BLIS_ST_OBJS) $(BLIS_MT_OBJS) +.INTERMEDIATE: $(OPENBLAS_ST_OBJS) $(OPENBLAS_MT_OBJS) +.INTERMEDIATE: $(EIGEN_ST_OBJS) $(EIGEN_MT_OBJS) +.INTERMEDIATE: $(VENDOR_ST_OBJS) $(VENDOR_MT_OBJS) +.INTERMEDIATE: $(UTIL_OBJS) # -- Object file rules -- -#$(TEST_OBJ_PATH)/%.o: $(TEST_SRC_PATH)/%.c -# $(CC) $(CFLAGS) -c $< -o $@ - -# A function to return the datatype cpp macro def from the datatype -# character. -get-dt-cpp = $(strip \ - $(if $(findstring s,$(1)),-DDT=BLIS_FLOAT -DIS_FLOAT,\ - $(if $(findstring d,$(1)),-DDT=BLIS_DOUBLE -DIS_DOUBLE,\ - $(if $(findstring c,$(1)),-DDT=BLIS_SCOMPLEX -DIS_SCOMPLEX,\ - -DDT=BLIS_DCOMPLEX -DIS_DCOMPLEX)))) - # A function to return other cpp macros that help the test driver # identify the implementation. -#get-bl-cpp = $(strip \ -# $(if $(findstring blis,$(1)),$(STR_NAT) $(BLI_DEF),\ -# $(if $(findstring openblas,$(1)),$(STR_OBL) $(BLA_DEF),\ -# $(if $(findstring eigen,$(1)),$(STR_EIG) $(EIG_DEF),\ -# $(STR_VEN) $(BLA_DEF))))) - get-bl-cpp = $(strip \ - $(if $(findstring blis,$(1)),$(STR_NAT) $(BLI_DEF),\ - $(if $(findstring openblas,$(1)),$(STR_OBL) $(BLA_DEF),\ + $(if $(findstring blis,$(1)),$(STR_BLI) $(DEF_BLI),\ + $(if $(findstring openblas,$(1)),$(STR_OBL) $(DEF_BLA),\ $(if $(and $(findstring eigen,$(1)),\ $(findstring gemm,$(2))),\ - $(STR_EIG) $(EIG_DEF),\ + $(STR_EIG) $(DEF_EIG),\ $(if $(findstring eigen,$(1)),\ - $(STR_EIG) $(BLA_DEF),\ - $(STR_VEN) $(BLA_DEF)))))) + $(STR_EIG) $(DEF_BLA),\ + $(STR_VEN) $(DEF_BLA)))))) +# Rules for miscellaneous files. +test_utils.o: test_utils.c test_utils.h + $(CC) $(CFLAGS) -c $< -o $@ # Rules for BLIS and BLAS libraries. 
define make-st-rule -test_$(1)$(2)_$(PS_MAX)_$(3)_st.o: test_$(op).c Makefile - $(CC) $(CFLAGS) $(PDEF_ST) $(call get-dt-cpp,$(1)) $(call get-bl-cpp,$(3),$(2)) $(DNAT) $(STR_ST) -c $$< -o $$@ -endef - -define make-1s-rule -test_$(1)$(2)_$(P1_MAX)_$(3)_1s.o: test_$(op).c Makefile - $(CC) $(CFLAGS) $(PDEF_1S) $(call get-dt-cpp,$(1)) $(call get-bl-cpp,$(3),$(2)) $(DNAT) $(STR_1S) -c $$< -o $$@ +test_$(1)_$(2)_st.o: test_$(op).c Makefile + $(CC) $(CFLAGS) $(call get-bl-cpp,$(2),$(1)) $(STR_ST) -c $$< -o $$@ endef -define make-2s-rule -test_$(1)$(2)_$(P2_MAX)_$(3)_2s.o: test_$(op).c Makefile - $(CC) $(CFLAGS) $(PDEF_2S) $(call get-dt-cpp,$(1)) $(call get-bl-cpp,$(3),$(2)) $(DNAT) $(STR_2S) -c $$< -o $$@ +define make-mt-rule +test_$(1)_$(2)_mt.o: test_$(op).c Makefile + $(CC) $(CFLAGS) $(call get-bl-cpp,$(2),$(1)) $(STR_MT) -c $$< -o $$@ endef -$(foreach dt,$(DTS), \ $(foreach op,$(OPS), \ -$(foreach im,$(BIMPLS),$(eval $(call make-st-rule,$(dt),$(op),$(im)))))) +$(foreach im,$(BIMPLS),$(eval $(call make-st-rule,$(op),$(im))))) -$(foreach dt,$(DTS), \ $(foreach op,$(OPS), \ -$(foreach im,$(BIMPLS),$(eval $(call make-1s-rule,$(dt),$(op),$(im)))))) - -$(foreach dt,$(DTS), \ -$(foreach op,$(OPS), \ -$(foreach im,$(BIMPLS),$(eval $(call make-2s-rule,$(dt),$(op),$(im)))))) +$(foreach im,$(BIMPLS),$(eval $(call make-mt-rule,$(op),$(im))))) # Rules for Eigen. +# NOTE: Eigen determines single- vs. multithreadedness at compile time. 
define make-eigst-rule -test_$(1)$(2)_$(PS_MAX)_$(3)_st.o: test_$(op).c Makefile - $(CXX) $(CXXFLAGS_ST) $(PDEF_ST) $(call get-dt-cpp,$(1)) $(call get-bl-cpp,$(3),$(2)) $(DNAT) $(STR_ST) -c $$< -o $$@ -endef - -define make-eig1s-rule -test_$(1)$(2)_$(P1_MAX)_$(3)_1s.o: test_$(op).c Makefile - $(CXX) $(CXXFLAGS_MT) $(PDEF_1S) $(call get-dt-cpp,$(1)) $(call get-bl-cpp,$(3),$(2)) $(DNAT) $(STR_1S) -c $$< -o $$@ +test_$(1)_$(2)_st.o: test_$(op).c Makefile + $(CXX) $(CXXFLAGS_ST) $(call get-bl-cpp,$(2),$(1)) $(STR_ST) -c $$< -o $$@ endef -define make-eig2s-rule -test_$(1)$(2)_$(P2_MAX)_$(3)_2s.o: test_$(op).c Makefile - $(CXX) $(CXXFLAGS_MT) $(PDEF_2S) $(call get-dt-cpp,$(1)) $(call get-bl-cpp,$(3),$(2)) $(DNAT) $(STR_2S) -c $$< -o $$@ +define make-eigmt-rule +test_$(1)_$(2)_mt.o: test_$(op).c Makefile + $(CXX) $(CXXFLAGS_MT) $(call get-bl-cpp,$(2),$(1)) $(STR_MT) -c $$< -o $$@ endef -$(foreach dt,$(DTS), \ -$(foreach op,$(OPS), \ -$(foreach im,$(EIMPLS),$(eval $(call make-eigst-rule,$(dt),$(op),$(im)))))) - -$(foreach dt,$(DTS), \ $(foreach op,$(OPS), \ -$(foreach im,$(EIMPLS),$(eval $(call make-eig1s-rule,$(dt),$(op),$(im)))))) +$(foreach im,$(EIMPLS),$(eval $(call make-eigst-rule,$(op),$(im))))) -$(foreach dt,$(DTS), \ $(foreach op,$(OPS), \ -$(foreach im,$(EIMPLS),$(eval $(call make-eig2s-rule,$(dt),$(op),$(im)))))) +$(foreach im,$(EIMPLS),$(eval $(call make-eigmt-rule,$(op),$(im))))) # -- Executable file rules -- @@ -406,44 +318,36 @@ $(foreach im,$(EIMPLS),$(eval $(call make-eig2s-rule,$(dt),$(op),$(im)))))) # compatibility layer. This prevents BLIS from inadvertently getting called # for the BLAS routines we are trying to test with. 
-test_%_$(PS_MAX)_asm_blis_st.x: test_%_$(PS_MAX)_asm_blis_st.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@) - -test_%_$(P1_MAX)_asm_blis_1s.x: test_%_$(P1_MAX)_asm_blis_1s.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@) - -test_%_$(P2_MAX)_asm_blis_2s.x: test_%_$(P2_MAX)_asm_blis_2s.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@) - +# Combine the miscellaneous objects with libblis for conciseness (since all +# driver binaries depend on these objects). +COMMON_OBJS := $(UTIL_OBJS) $(LIBBLIS_LINK) -test_%_$(PS_MAX)_openblas_st.x: test_%_$(PS_MAX)_openblas_st.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(OPENBLAS_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_blis_st.x: test_%_blis_st.o $(COMMON_OBJS) + $(CC) $(strip $< $(COMMON_OBJS) $(LDFLAGS) -o $@) -test_%_$(P1_MAX)_openblas_1s.x: test_%_$(P1_MAX)_openblas_1s.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(OPENBLASP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_blis_mt.x: test_%_blis_mt.o $(COMMON_OBJS) + $(CC) $(strip $< $(COMMON_OBJS) $(LDFLAGS) -o $@) -test_%_$(P2_MAX)_openblas_2s.x: test_%_$(P2_MAX)_openblas_2s.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(OPENBLASP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_openblas_st.x: test_%_openblas_st.o $(COMMON_OBJS) + $(CC) $(strip $< $(OPENBLAS_LIB) $(COMMON_OBJS) $(LDFLAGS) -o $@) -test_%_$(PS_MAX)_eigen_st.x: test_%_$(PS_MAX)_eigen_st.o $(LIBBLIS_LINK) - $(CXX) $(strip $< $(EIGEN_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_openblas_mt.x: test_%_openblas_mt.o $(COMMON_OBJS) + $(CC) $(strip $< $(OPENBLASP_LIB) $(COMMON_OBJS) $(LDFLAGS) -o $@) -test_%_$(P1_MAX)_eigen_1s.x: test_%_$(P1_MAX)_eigen_1s.o $(LIBBLIS_LINK) - $(CXX) $(strip $< $(EIGENP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) -test_%_$(P2_MAX)_eigen_2s.x: test_%_$(P2_MAX)_eigen_2s.o $(LIBBLIS_LINK) - $(CXX) $(strip $< $(EIGENP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_eigen_st.x: test_%_eigen_st.o $(COMMON_OBJS) + $(CXX) $(strip $< $(EIGEN_LIB) 
$(COMMON_OBJS) $(LDFLAGS) -o $@) +test_%_eigen_mt.x: test_%_eigen_mt.o $(COMMON_OBJS) + $(CXX) $(strip $< $(EIGENP_LIB) $(COMMON_OBJS) $(LDFLAGS) -o $@) -test_%_$(PS_MAX)_vendor_st.x: test_%_$(PS_MAX)_vendor_st.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(VENDOR_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) -test_%_$(P1_MAX)_vendor_1s.x: test_%_$(P1_MAX)_vendor_1s.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(VENDORP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_vendor_st.x: test_%_vendor_st.o $(COMMON_OBJS) + $(CC) $(strip $< $(VENDOR_LIB) $(COMMON_OBJS) $(LDFLAGS) -o $@) -test_%_$(P2_MAX)_vendor_2s.x: test_%_$(P2_MAX)_vendor_2s.o $(LIBBLIS_LINK) - $(CC) $(strip $< $(VENDORP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@) +test_%_vendor_mt.x: test_%_vendor_mt.o $(COMMON_OBJS) + $(CC) $(strip $< $(VENDORP_LIB) $(COMMON_OBJS) $(LDFLAGS) -o $@) # -- Environment check rules -- diff --git a/test/3/old/runme.sh b/test/3/old/runme.sh new file mode 100755 index 0000000000..cf84bd1215 --- /dev/null +++ b/test/3/old/runme.sh @@ -0,0 +1,277 @@ +#!/bin/bash + +# File pefixes. +exec_root="test" +out_root="output" +delay=0.1 + +sys="blis" +#sys="stampede2" +#sys="lonestar5" +#sys="ul252" +#sys="ul264" +#sys="ul2128" + +# Bind threads to processors. +#export OMP_PROC_BIND=true +#export GOMP_CPU_AFFINITY="0 2 4 6 8 10 12 14 16 18 20 22 1 3 5 7 9 11 13 15 17 19 21 23" +#export GOMP_CPU_AFFINITY="0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103" + +if [ ${sys} = "blis" ]; then + + export GOMP_CPU_AFFINITY="0-3" + + numactl="" + threads="jc1ic1jr1_st + jc2ic1jr1_1s + jc2ic2jr1_2s" + +elif [ ${sys} = "stampede2" ]; then + + echo "Need to set GOMP_CPU_AFFINITY." 
+ exit 1 + + numactl="" + threads="jc1ic1jr1_st + jc4ic6jr1_1s + jc4ic12jr1_2s" + +elif [ ${sys} = "lonestar5" ]; then + + export GOMP_CPU_AFFINITY="0-23" + + # A hack to use libiomp5 with gcc. + #export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/apps/intel/16.0.1.150/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64" + + numactl="" + threads="jc1ic1jr1_st + jc2ic3jr2_1s + jc4ic3jr2_2s" + +elif [ ${sys} = "ul252" ]; then + + export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/field/intel/mkl/lib/intel64" + export GOMP_CPU_AFFINITY="0-51" + + numactl="" + threads="jc1ic1jr1_st + jc2ic13jr1_1s + jc4ic13jr1_2s" + +elif [ ${sys} = "ul264" ]; then + + export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/field/intel/mkl/lib/intel64" + export GOMP_CPU_AFFINITY="0-63" + + numactl="numactl --interleave=all" + threads="jc1ic1jr1_st + jc1ic8jr4_1s + jc2ic8jr4_2s" + +elif [ ${sys} = "ul2128" ]; then + + export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/field/intel/mkl/lib/intel64" + export GOMP_CPU_AFFINITY="0-127" + + numactl="numactl --interleave=all" + threads="jc1ic1jr1_st + jc4ic4jr4_1s + jc8ic4jr4_2s" + #threads="jc4ic4jr4_1s + # jc8ic4jr4_2s" + #threads="jc1ic1jr1_st" + #threads="jc4ic4jr4_1s" + #threads="jc8ic4jr4_2s" +fi + +# Datatypes to test. +test_dts="d s z c" +#test_dts="s" + +# Operations to test. +test_ops="gemm hemm herk trmm trsm" +#test_ops="herk" + +# Implementations to test. +impls="blis" +#impls="openblas" +#impls="vendor" +#impls="other" +#impls="eigen" +#impls="all" + +if [ "${impls}" = "blis" ]; then + + test_impls="asm_blis" + +elif [ "${impls}" = "openblas" ]; then + + test_impls="openblas" + +elif [ "${impls}" = "vendor" ]; then + + test_impls="vendor" + +elif [ "${impls}" = "eigen" ]; then + + test_impls="eigen" + +elif [ "${impls}" = "other" ]; then + + test_impls="openblas vendor eigen" +else + + test_impls="openblas asm_blis vendor eigen" +fi + +# Save a copy of GOMP_CPU_AFFINITY so that if we have to unset it, we can +# restore the value. 
+GOMP_CPU_AFFINITYsave=${GOMP_CPU_AFFINITY} + + +# Iterate over the threading configs. +for th in ${threads}; do + + # Start with one way of parallelism in each loop. We will now begin + # parsing the 'th' variable to update one or more of these threading + # parameters. + jc_nt=1; pc_nt=1; ic_nt=1; jr_nt=1; ir_nt=1 + + # Strip everything before and after the underscore so that what remains + # is the problem size and threading parameter string, respectively. + #psize=${th##*_}; thinfo=${th%%_*} + tsuf=${th##*_}; thinfo=${th%%_*} + + # Identify each threading parameter and insert a space before it. + thsep=$(echo -e ${thinfo} | sed -e "s/\([jip][cr]\)/ \1/g" ) + + nt=1 + + for loopnum in ${thsep}; do + + # Given the current string, which identifies a loop and the + # number of ways of parallelism for that loop, strip out + # the ways and loop separately to identify each. + loop=$(echo -e ${loopnum} | sed -e "s/[0-9]//g" ) + num=$(echo -e ${loopnum} | sed -e "s/[a-z]//g" ) + + # Construct a string that we can evaluate to set the number + # of ways of parallelism for the current loop. + loop_nt_eq_num="${loop}_nt=${num}" + + # Update the total number of threads. + nt=$(expr ${nt} \* ${num}) + + # Evaluate the string to assign the ways to the variable. + eval ${loop_nt_eq_num} + + done + + # Find a binary using the test driver prefix and the threading suffix. + # Then strip everything before and after the max problem size that's + # encoded into the name of the binary. + binname=$(ls -1 ${exec_root}_*_${tsuf}.x | head -n1) + temp1=${binname#${exec_root}_*_} + psize=${temp1%%_*} + + # Sanity check: If 'ls' couldn't find any binaries, then the user + # probably didn't build them. Inform the user and proceed to the next + # threading config. + if [ "${binname}" = "" ]; then + + echo "Could not find binaries corresponding to '${tsuf}' threading config. Skipping." + continue + fi + + # Let the user know what threading config we are working on. 
+ echo "Switching to: jc${jc_nt} pc${pc_nt} ic${ic_nt} jr${jr_nt} ir${ir_nt} (nt = ${nt}) p_max${psize}" + + # Iterate over the datatypes. + for dt in ${test_dts}; do + + # Iterate over the implementations. + for im in ${test_impls}; do + + # Iterate over the operations. + for op in ${test_ops}; do + + # Eigen does not support multithreading for hemm, herk, trmm, + # or trsm. So if we're getting ready to execute an Eigen driver + # for one of these operations and nt > 1, we skip this test. + if [ "${im}" = "eigen" ] && \ + [ "${op}" != "gemm" ] && \ + [ "${nt}" != "1" ]; then + continue; + fi + + # Find the threading suffix by probing the executable. + binname=$(ls ${exec_root}_${dt}${op}_*_${im}_${tsuf}.x) + + #echo "found file: ${binname} with suffix ${suf}" + + # Set the number of threads according to th. + if [ "${tsuf}" = "1s" ] || [ "${tsuf}" = "2s" ]; then + + # Set the threading parameters based on the implementation + # that we are preparing to run. + if [ "${im}" = "asm_blis" ]; then + unset OMP_NUM_THREADS + export BLIS_JC_NT=${jc_nt} + export BLIS_PC_NT=${pc_nt} + export BLIS_IC_NT=${ic_nt} + export BLIS_JR_NT=${jr_nt} + export BLIS_IR_NT=${ir_nt} + elif [ "${im}" = "openblas" ]; then + unset OMP_NUM_THREADS + export OPENBLAS_NUM_THREADS=${nt} + elif [ "${im}" = "eigen" ]; then + export OMP_NUM_THREADS=${nt} + elif [ "${im}" = "vendor" ]; then + unset OMP_NUM_THREADS + export MKL_NUM_THREADS=${nt} + fi + export nt_use=${nt} + + # Multithreaded OpenBLAS seems to have a problem running + # properly if GOMP_CPU_AFFINITY is set. So we temporarily + # unset it here if we are about to execute OpenBLAS, but + # otherwise restore it. 
+ if [ ${im} = "openblas" ]; then + unset GOMP_CPU_AFFINITY + else + export GOMP_CPU_AFFINITY="${GOMP_CPU_AFFINITYsave}" + fi + else + + export BLIS_JC_NT=1 + export BLIS_PC_NT=1 + export BLIS_IC_NT=1 + export BLIS_JR_NT=1 + export BLIS_IR_NT=1 + export OMP_NUM_THREADS=1 + export OPENBLAS_NUM_THREADS=1 + export MKL_NUM_THREADS=1 + export nt_use=1 + fi + + # Construct the name of the test executable. + exec_name="${exec_root}_${dt}${op}_${psize}_${im}_${tsuf}.x" + + # Construct the name of the output file. + out_file="${out_root}_${tsuf}_${dt}${op}_${im}.m" + + #echo "Running (nt = ${nt_use}) ./${exec_name} > ${out_file}" + echo "Running ${numactl} ./${exec_name} > ${out_file}" + + # Run executable with or without numactl, depending on how + # the numactl variable was set. + ${numactl} ./${exec_name} > ${out_file} + + # Bedtime! + sleep ${delay} + + done + done + done +done + diff --git a/test/3/runme.sh b/test/3/runme.sh index ecb2f6c1e4..1aa946e531 100755 --- a/test/3/runme.sh +++ b/test/3/runme.sh @@ -5,6 +5,18 @@ exec_root="test" out_root="output" delay=0.1 +# Bind threads to processors. +#export OMP_PROC_BIND=true +#export GOMP_CPU_AFFINITY="0 2 4 6 8 10 12 14 16 18 20 22 1 3 5 7 9 11 13 15 17 19 21 23" +#export GOMP_CPU_AFFINITY="0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103" + +# ------------------ + +# Problem size range for single- and multithreaded execution. Set psr_st and +# psr_mt on a per-system basis below to override these default values. +psr_st="100 1000 100" +psr_mt="200 2000 200" + sys="blis" #sys="stampede2" #sys="lonestar5" @@ -14,19 +26,15 @@ sys="blis" sys="altra" # sys="altramax" -# Bind threads to processors. 
-#export OMP_PROC_BIND=true -#export GOMP_CPU_AFFINITY="0 2 4 6 8 10 12 14 16 18 20 22 1 3 5 7 9 11 13 15 17 19 21 23" -#export GOMP_CPU_AFFINITY="0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103" - if [ ${sys} = "blis" ]; then export GOMP_CPU_AFFINITY="0-3" numactl="" threads="jc1ic1jr1_st - jc2ic1jr1_1s - jc2ic2jr1_2s" + jc2ic2jr1_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "stampede2" ]; then @@ -35,8 +43,9 @@ elif [ ${sys} = "stampede2" ]; then numactl="" threads="jc1ic1jr1_st - jc4ic6jr1_1s - jc4ic12jr1_2s" + jc4ic12jr1_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "lonestar5" ]; then @@ -47,8 +56,9 @@ elif [ ${sys} = "lonestar5" ]; then numactl="" threads="jc1ic1jr1_st - jc2ic3jr2_1s - jc4ic3jr2_2s" + jc4ic3jr2_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "ul252" ]; then @@ -57,8 +67,9 @@ elif [ ${sys} = "ul252" ]; then numactl="" threads="jc1ic1jr1_st - jc2ic13jr1_1s - jc4ic13jr1_2s" + jc4ic13jr1_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "ul264" ]; then @@ -67,8 +78,9 @@ elif [ ${sys} = "ul264" ]; then numactl="numactl --interleave=all" threads="jc1ic1jr1_st - jc1ic8jr4_1s - jc2ic8jr4_2s" + jc2ic8jr4_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "ul2128" ]; then @@ -77,8 +89,10 @@ elif [ ${sys} = "ul2128" ]; then numactl="numactl --interleave=all" threads="jc1ic1jr1_st - jc4ic4jr4_1s - jc8ic4jr4_2s" + jc4ic4jr4_mt + jc8ic4jr4_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "altra" ]; then @@ -91,8 +105,10 @@ elif [ ${sys} = "altra" ]; then numactl="numactl --localalloc" # Temporarily reducing run to 12000 & 8000 to save time threads="jc1ic1jr1_st - jc1ic10jr8_1s - jc2ic10jr8_2s" + 
jc1ic10jr8_mt + jc2ic10jr8_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" elif [ ${sys} = "altramax" ]; then @@ -105,50 +121,43 @@ elif [ ${sys} = "altramax" ]; then numactl="numactl --localalloc" # Temporarily reducing run to 12000 to save time threads="jc1ic1jr1_st - jc1ic16jr8_1s - jc2ic16jr8_2s" + jc1ic16jr8_mt + jc2ic16jr8_mt" + #psr_st="40 1000 40" + #psr_mt="40 4000 40" fi # Datatypes to test. -test_dts="d s z c" +test_dts="s d c z" #test_dts="d" # Operations to test. -test_ops="gemm hemm herk trmm trsm" -#test_ops="gemm" +test_ops="gemm_nn hemm_ll herk_ln trmm_llnn trsm_runn" +#test_ops="herk" # Implementations to test. -impls="blis" -#impls="openblas" -#impls="vendor" -#impls="other" -#impls="eigen" -#impls="all" - -if [ "${impls}" = "blis" ]; then - - test_impls="asm_blis" - -elif [ "${impls}" = "openblas" ]; then - - test_impls="openblas" - -elif [ "${impls}" = "vendor" ]; then - - test_impls="vendor" - -elif [ "${impls}" = "eigen" ]; then +test_impls="blis" +#test_impls="openblas" +#test_impls="vendor" +#test_impls="eigen" +#test_impls="all" + +if [ "${test_impls}" = "all" ]; then + test_impls="openblas blis vendor eigen" +fi - test_impls="eigen" +# Number of repeats per problem size. +nrepeats=3 -elif [ "${impls}" = "other" ]; then +# The induced method to use ('native' or '1m'). +ind="native" - test_impls="openblas vendor eigen" -else +# Quiet mode? +#quiet="yes" - test_impls="openblas asm_blis vendor eigen" -fi +# For testing purposes. +#dryrun="yes" # Save a copy of GOMP_CPU_AFFINITY so that if we have to unset it, we can # restore the value. @@ -158,35 +167,41 @@ GOMP_CPU_AFFINITYsave=${GOMP_CPU_AFFINITY} # Iterate over the threading configs. for th in ${threads}; do + #threads="jc1ic1jr1_st + # jc8ic4jr4_mt" + # Start with one way of parallelism in each loop. We will now begin # parsing the 'th' variable to update one or more of these threading # parameters.
jc_nt=1; pc_nt=1; ic_nt=1; jr_nt=1; ir_nt=1 - # Strip everything before and after the underscore so that what remains - # is the problem size and threading parameter string, respectively. - #psize=${th##*_}; thinfo=${th%%_*} - tsuf=${th##*_}; thinfo=${th%%_*} + # Strip everything before the underscore so that what remains is the + # threading suffix. + tsuf=${th##*_}; + + # Strip everything after the underscore so that what remains is the + # parallelism (threading) info. + thinfo=${th%%_*} # Identify each threading parameter and insert a space before it. - thsep=$(echo -e ${thinfo} | sed -e "s/\([jip][cr]\)/ \1/g" ) + thinfo_sep=$(echo -e ${thinfo} | sed -e "s/\([jip][cr]\)/ \1/g" ) nt=1 - for loopnum in ${thsep}; do + for loopnum in ${thinfo_sep}; do - # Given the current string, which identifies a loop and the - # number of ways of parallelism for that loop, strip out - # the ways and loop separately to identify each. + # Given the current string, which identifies a loop and the number of + # ways of parallelism to be obtained from that loop, strip out the ways + # and loop separately to identify each. loop=$(echo -e ${loopnum} | sed -e "s/[0-9]//g" ) - num=$(echo -e ${loopnum} | sed -e "s/[a-z]//g" ) + nways=$(echo -e ${loopnum} | sed -e "s/[a-z]//g" ) - # Construct a string that we can evaluate to set the number - # of ways of parallelism for the current loop. - loop_nt_eq_num="${loop}_nt=${num}" + # Construct a string that we can evaluate to set the number of ways of + # parallelism for the current loop (e.g. jc_nt, ic_nt, jr_nt). + loop_nt_eq_num="${loop}_nt=${nways}" # Update the total number of threads. - nt=$(expr ${nt} \* ${num}) + nt=$(expr ${nt} \* ${nways}) # Evaluate the string to assign the ways to the variable. eval ${loop_nt_eq_num} @@ -197,8 +212,6 @@ for th in ${threads}; do # Then strip everything before and after the max problem size that's # encoded into the name of the binary.
binname=$(ls -1 ${exec_root}_*_${tsuf}.x | head -n1) - temp1=${binname#${exec_root}_*_} - psize=${temp1%%_*} # Sanity check: If 'ls' couldn't find any binaries, then the user # probably didn't build them. Inform the user and proceed to the next @@ -210,7 +223,7 @@ for th in ${threads}; do fi # Let the user know what threading config we are working on. - echo "Switching to: jc${jc_nt} pc${pc_nt} ic${ic_nt} jr${jr_nt} ir${ir_nt} (nt = ${nt}) p_max${psize}" + echo "Switching to: jc${jc_nt} pc${pc_nt} ic${ic_nt} jr${jr_nt} ir${ir_nt} (nt = ${nt})" # Iterate over the datatypes. for dt in ${test_dts}; do @@ -221,26 +234,29 @@ for th in ${threads}; do # Iterate over the operations. for op in ${test_ops}; do + # Strip everything before the underscore so that what remains is + # the operation parameter string. + oppars=${op##*_}; + + # Strip everything after the underscore so that what remains is + # the operation name (sans parameter encoding). + opname=${op%%_*} + # Eigen does not support multithreading for hemm, herk, trmm, # or trsm. So if we're getting ready to execute an Eigen driver # for one of these operations and nt > 1, we skip this test. - if [ "${im}" = "eigen" ] && \ - [ "${op}" != "gemm" ] && \ - [ "${nt}" != "1" ]; then + if [ "${im}" = "eigen" ] && \ + [ "${opname}" != "gemm" ] && \ + [ "${nt}" != "1" ]; then continue; fi - # Find the threading suffix by probing the executable. - binname=$(ls ${exec_root}_${dt}${op}_*_${im}_${tsuf}.x) - - #echo "found file: ${binname} with suffix ${suf}" - # Set the number of threads according to th. - if [ "${tsuf}" = "1s" ] || [ "${tsuf}" = "2s" ]; then + if [ "${tsuf}" = "mt" ]; then # Set the threading parameters based on the implementation # that we are preparing to run.
- if [ "${im}" = "asm_blis" ]; then + if [ "${im}" = "blis" ]; then unset OMP_NUM_THREADS export BLIS_JC_NT=${jc_nt} export BLIS_PC_NT=${pc_nt} @@ -267,8 +283,14 @@ for th in ${threads}; do else export GOMP_CPU_AFFINITY="${GOMP_CPU_AFFINITYsave}" fi + + # Choose the mt problem size range. + psr="${psr_mt}" + else + # Set all environment variables to 1 to ensure single- + # threaded execution. export BLIS_JC_NT=1 export BLIS_PC_NT=1 export BLIS_IC_NT=1 @@ -278,20 +300,38 @@ for th in ${threads}; do export OPENBLAS_NUM_THREADS=1 export MKL_NUM_THREADS=1 export nt_use=1 + + # Choose the st problem size range. + psr="${psr_st}" + fi + + if [ "${quiet}" = "yes" ]; then + qv="-q" # quiet + else + qv="-v" # verbose (the default) fi # Construct the name of the test executable. - exec_name="${exec_root}_${dt}${op}_${psize}_${im}_${tsuf}.x" + exec_name="${exec_root}_${opname}_${im}_${tsuf}.x" # Construct the name of the output file. - out_file="${out_root}_${tsuf}_${dt}${op}_${im}.m" - - #echo "Running (nt = ${nt_use}) ./${exec_name} > ${out_file}" - echo "Running ${numactl} ./${exec_name} > ${out_file}" + out_file="${out_root}_${tsuf}_${dt}${opname}_${oppars}_${im}.m" + + # Use printf for its formatting capabilities. + printf 'Running %s %-21s %s %-7s %s %s %s %s > %s\n' \ + "${numactl}" "./${exec_name}" "-d ${dt}" \ + "-c ${oppars}" \ + "-i ${ind}" \ + "-p \"${psr}\"" \ + "-r ${nrepeats}" \ + "${qv}" \ + "${out_file}" # Run executable with or without numactl, depending on how # the numactl variable was set. - ${numactl} ./${exec_name} > ${out_file} + if [ "${dryrun}" != "yes" ]; then + ${numactl} ./${exec_name} -d ${dt} -c ${oppars} -i ${ind} -p "${psr}" -r ${nrepeats} ${qv} > ${out_file} + fi # Bedtime! 
sleep ${delay} diff --git a/test/3/test_gemm.c b/test/3/test_gemm.c index b85d119e96..8d09a0fc0d 100644 --- a/test/3/test_gemm.c +++ b/test/3/test_gemm.c @@ -36,18 +36,20 @@ #ifdef EIGEN #define BLIS_DISABLE_BLAS_DEFS #include "blis.h" + #include "test_utils.h" #include #include using namespace Eigen; #else #include "blis.h" + #include "test_utils.h" #endif -#define COL_STORAGE -//#define ROW_STORAGE - //#define PRINT +static const char* LOCAL_OPNAME_STR = "gemm"; +static const char* LOCAL_PC_STR = "nn"; + int main( int argc, char** argv ) { obj_t a, b, c; @@ -70,65 +72,43 @@ int main( int argc, char** argv ) double dtime_save; double gflops; - //bli_init(); - - //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - - n_repeats = 3; - - dt = DT; - - ind = IND; - -#if 1 - p_begin = P_BEGIN; - p_max = P_MAX; - p_inc = P_INC; - - m_input = -1; - n_input = -1; - k_input = -1; -#else - p_begin = 40; - p_max = 1000; - p_inc = 40; - - m_input = -1; - n_input = -1; - k_input = -1; -#endif - + params_t params; // Supress compiler warnings about unused variable 'ind'. ( void )ind; -#if 0 - cntx_t* cntx; + //bli_init(); + + //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); + + // Parse the command line options into strings, integers, enums, + // and doubles, as appropriate. + parse_cl_params( argc, argv, init_def_params, &params ); - ind_t ind_mod = ind; + dt = params.dt; - // Initialize a context for the current induced method and datatype. - cntx = bli_gks_query_ind_cntx( ind_mod ); + ind = params.im; - // Set k to the kc blocksize for the current datatype. - k_input = bli_cntx_get_blksz_def_dt( dt, BLIS_KC, cntx ); + p_begin = params.sta; + p_max = params.end; + p_inc = params.inc; -#elif 1 + m_input = params.m; + n_input = params.n; + k_input = params.k; - //k_input = 256; + n_repeats = params.nr; -#endif - // Choose the char corresponding to the requested datatype.
- if ( bli_is_float( dt ) ) dt_ch = 's'; - else if ( bli_is_double( dt ) ) dt_ch = 'd'; - else if ( bli_is_scomplex( dt ) ) dt_ch = 'c'; - else dt_ch = 'z'; + // Map the datatype to its corresponding char. + bli_param_map_blis_to_char_dt( dt, &dt_ch ); - transa = BLIS_NO_TRANSPOSE; - transb = BLIS_NO_TRANSPOSE; + // Map the parameter chars to their corresponding BLIS enum type values. + bli_param_map_char_to_blis_trans( params.pc_str[0], &transa ); + bli_param_map_char_to_blis_trans( params.pc_str[1], &transb ); + // Map the BLIS enum type values to their corresponding BLAS chars. bli_param_map_blis_to_netlib_trans( transa, &f77_transa ); bli_param_map_blis_to_netlib_trans( transb, &f77_transb ); @@ -136,8 +116,8 @@ int main( int argc, char** argv ) // matlab allocates space for the entire array once up-front. for ( p = p_begin; p + p_inc <= p_max; p += p_inc ) ; - printf( "data_%s_%cgemm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:4 ) = [ %4lu %4lu %4lu %7.2f ];\n", + printf( "data_%s_%cgemm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:4 ) = [ %5lu %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )0, ( unsigned long )0, @@ -158,17 +138,20 @@ int main( int argc, char** argv ) bli_obj_create( dt, 1, 1, 0, 0, &alpha ); bli_obj_create( dt, 1, 1, 0, 0, &beta ); - #ifdef COL_STORAGE - bli_obj_create( dt, m, k, 0, 0, &a ); - bli_obj_create( dt, k, n, 0, 0, &b ); - bli_obj_create( dt, m, n, 0, 0, &c ); - bli_obj_create( dt, m, n, 0, 0, &c_save ); - #else - bli_obj_create( dt, m, k, k, 1, &a ); - bli_obj_create( dt, k, n, n, 1, &b ); - bli_obj_create( dt, m, n, n, 1, &c ); - bli_obj_create( dt, m, n, n, 1, &c_save ); - #endif + // Choose the storage of each matrix based on the corresponding + // char in the params_t struct. Note that the expected order of + // storage specifiers in sc_str is CAB (not ABC).
+ if ( params.sc_str[1] == 'c' ) bli_obj_create( dt, m, k, 0, 0, &a ); + else bli_obj_create( dt, m, k, k, 1, &a ); + + if ( params.sc_str[2] == 'c' ) bli_obj_create( dt, k, n, 0, 0, &b ); + else bli_obj_create( dt, k, n, n, 1, &b ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c ); + else bli_obj_create( dt, m, n, n, 1, &c ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c_save ); + else bli_obj_create( dt, m, n, n, 1, &c_save ); bli_randm( &a ); bli_randm( &b ); @@ -177,12 +160,18 @@ int main( int argc, char** argv ) bli_obj_set_conjtrans( transa, &a ); bli_obj_set_conjtrans( transb, &b ); - bli_setsc( (2.0/1.0), 0.0, &alpha ); - bli_setsc( (1.0/1.0), 0.0, &beta ); + //bli_setsc( (2.0/1.0), 0.0, &alpha ); + //bli_setsc( (1.0/1.0), 0.0, &beta ); + bli_setsc( params.alpha, 0.0, &alpha ); + bli_setsc( params.beta, 0.0, &beta ); + + //bli_printm( "alpha:", &alpha, "%7.4e", "" ); + //bli_printm( "beta: ", &beta, "%7.4e", "" ); bli_copym( &c, &c_save ); -#if 0 //def BLIS +#ifdef BLIS + // Switch to the induced method specified by ind. 
bli_ind_disable_all_dt( dt ); bli_ind_enable_dt( ind, dt ); #endif @@ -196,58 +185,66 @@ int main( int argc, char** argv ) void* bp = bli_obj_buffer_at_off( &b ); void* cp = bli_obj_buffer_at_off( &c ); - #ifdef COL_STORAGE - const int os_a = bli_obj_col_stride( &a ); - const int os_b = bli_obj_col_stride( &b ); - const int os_c = bli_obj_col_stride( &c ); - #else - const int os_a = bli_obj_row_stride( &a ); - const int os_b = bli_obj_row_stride( &b ); - const int os_c = bli_obj_row_stride( &c ); - #endif + int os_a, os_b, os_c; + + if ( params.sc_str[0] == 'c' ) + { + os_a = bli_obj_col_stride( &a ); + os_b = bli_obj_col_stride( &b ); + os_c = bli_obj_col_stride( &c ); + } + else + { + os_a = bli_obj_row_stride( &a ); + os_b = bli_obj_row_stride( &b ); + os_c = bli_obj_row_stride( &c ); + } Stride stride_a( os_a, 1 ); Stride stride_b( os_b, 1 ); Stride stride_c( os_c, 1 ); - #ifdef COL_STORAGE - #if defined(IS_FLOAT) - typedef Matrix MatrixXf_; - #elif defined (IS_DOUBLE) - typedef Matrix MatrixXd_; - #elif defined (IS_SCOMPLEX) - typedef Matrix, Dynamic, Dynamic, ColMajor> MatrixXcf_; - #elif defined (IS_DCOMPLEX) - typedef Matrix, Dynamic, Dynamic, ColMajor> MatrixXcd_; - #endif - #else - #if defined(IS_FLOAT) - typedef Matrix MatrixXf_; - #elif defined (IS_DOUBLE) - typedef Matrix MatrixXd_; - #elif defined (IS_SCOMPLEX) - typedef Matrix, Dynamic, Dynamic, RowMajor> MatrixXcf_; - #elif defined (IS_DCOMPLEX) - typedef Matrix, Dynamic, Dynamic, RowMajor> MatrixXcd_; - #endif - #endif - #if defined(IS_FLOAT) - Map > A( ( float* )ap, m, k, stride_a ); - Map > B( ( float* )bp, k, n, stride_b ); - Map > C( ( float* )cp, m, n, stride_c ); - #elif defined (IS_DOUBLE) - Map > A( ( double* )ap, m, k, stride_a ); - Map > B( ( double* )bp, k, n, stride_b ); - Map > C( ( double* )cp, m, n, stride_c ); - #elif defined (IS_SCOMPLEX) - Map > A( ( std::complex* )ap, m, k, stride_a ); - Map > B( ( std::complex* )bp, k, n, stride_b ); - Map > C( ( std::complex* )cp, m, n, 
stride_c ); - #elif defined (IS_DCOMPLEX) - Map > A( ( std::complex* )ap, m, k, stride_a ); - Map > B( ( std::complex* )bp, k, n, stride_b ); - Map > C( ( std::complex* )cp, m, n, stride_c ); - #endif + typedef Matrix MatrixXs_c; + typedef Matrix MatrixXd_c; + typedef Matrix, Dynamic, Dynamic, ColMajor> MatrixXc_c; + typedef Matrix, Dynamic, Dynamic, ColMajor> MatrixXz_c; + + typedef Matrix MatrixXs_r; + typedef Matrix MatrixXd_r; + typedef Matrix, Dynamic, Dynamic, RowMajor> MatrixXc_r; + typedef Matrix, Dynamic, Dynamic, RowMajor> MatrixXz_r; + + Map > As_c( ( float* )ap, m, k, stride_a ); + Map > Bs_c( ( float* )bp, k, n, stride_b ); + Map > Cs_c( ( float* )cp, m, n, stride_c ); + + Map > Ad_c( ( double* )ap, m, k, stride_a ); + Map > Bd_c( ( double* )bp, k, n, stride_b ); + Map > Cd_c( ( double* )cp, m, n, stride_c ); + + Map > Ac_c( ( std::complex* )ap, m, k, stride_a ); + Map > Bc_c( ( std::complex* )bp, k, n, stride_b ); + Map > Cc_c( ( std::complex* )cp, m, n, stride_c ); + + Map > Az_c( ( std::complex* )ap, m, k, stride_a ); + Map > Bz_c( ( std::complex* )bp, k, n, stride_b ); + Map > Cz_c( ( std::complex* )cp, m, n, stride_c ); + + Map > As_r( ( float* )ap, m, k, stride_a ); + Map > Bs_r( ( float* )bp, k, n, stride_b ); + Map > Cs_r( ( float* )cp, m, n, stride_c ); + + Map > Ad_r( ( double* )ap, m, k, stride_a ); + Map > Bd_r( ( double* )bp, k, n, stride_b ); + Map > Cd_r( ( double* )cp, m, n, stride_c ); + + Map > Ac_r( ( std::complex* )ap, m, k, stride_a ); + Map > Bc_r( ( std::complex* )bp, k, n, stride_b ); + Map > Cc_r( ( std::complex* )cp, m, n, stride_c ); + + Map > Az_r( ( std::complex* )ap, m, k, stride_a ); + Map > Bz_r( ( std::complex* )bp, k, n, stride_b ); + Map > Cz_r( ( std::complex* )cp, m, n, stride_c ); #endif dtime_save = DBL_MAX; @@ -274,7 +271,22 @@ int main( int argc, char** argv ) #elif defined(EIGEN) - C.noalias() += alpha_r * A * B; + //C.noalias() += alpha_r * A * B; + + if ( params.sc_str[0] == 'c' ) + { + if ( params.dt_str[0] 
== 's' ) Cs_c.noalias() += alpha_r * As_c * Bs_c; + else if ( params.dt_str[0] == 'd' ) Cd_c.noalias() += alpha_r * Ad_c * Bd_c; + else if ( params.dt_str[0] == 'c' ) Cc_c.noalias() += alpha_r * Ac_c * Bc_c; + else if ( params.dt_str[0] == 'z' ) Cz_c.noalias() += alpha_r * Az_c * Bz_c; + } + else // if ( params.sc_str[0] == 'r' ) + { + if ( params.dt_str[0] == 's' ) Cs_r.noalias() += alpha_r * As_r * Bs_r; + else if ( params.dt_str[0] == 'd' ) Cd_r.noalias() += alpha_r * Ad_r * Bd_r; + else if ( params.dt_str[0] == 'c' ) Cc_r.noalias() += alpha_r * Ac_r * Bc_r; + else if ( params.dt_str[0] == 'z' ) Cz_r.noalias() += alpha_r * Az_r * Bz_r; + } #else // if defined(BLAS) @@ -293,15 +305,15 @@ int main( int argc, char** argv ) float* cp = ( float* )bli_obj_buffer( &c ); sgemm_( &f77_transa, - &f77_transb, - &mm, - &nn, - &kk, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_transb, + &mm, + &nn, + &kk, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } else if ( bli_is_double( dt ) ) { @@ -318,15 +330,15 @@ int main( int argc, char** argv ) double* cp = ( double* )bli_obj_buffer( &c ); dgemm_( &f77_transa, - &f77_transb, - &mm, - &nn, - &kk, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_transb, + &mm, + &nn, + &kk, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } else if ( bli_is_scomplex( dt ) ) { @@ -343,15 +355,15 @@ int main( int argc, char** argv ) scomplex* cp = ( scomplex* )bli_obj_buffer( &c ); cgemm_( &f77_transa, - &f77_transb, - &mm, - &nn, - &kk, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_transb, + &mm, + &nn, + &kk, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } else if ( bli_is_dcomplex( dt ) ) { @@ -368,15 +380,15 @@ int main( int argc, char** argv ) dcomplex* cp = ( dcomplex* )bli_obj_buffer( &c ); zgemm_( &f77_transa, - &f77_transb, - &mm, - &nn, - &kk, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_transb, + &mm, + &nn, + &kk, + alphap, + ap, &lda, + 
bp, &ldb, + betap, + cp, &ldc ); } #endif @@ -392,12 +404,13 @@ int main( int argc, char** argv ) if ( bli_is_complex( dt ) ) gflops *= 4.0; - printf( "data_%s_%cgemm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:4 ) = [ %4lu %4lu %4lu %7.2f ];\n", + printf( "data_%s_%cgemm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:4 ) = [ %5lu %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )m, ( unsigned long )k, ( unsigned long )n, gflops ); + fflush( stdout ); fflush(stdout); @@ -415,3 +428,25 @@ int main( int argc, char** argv ) return 0; } +void init_def_params( params_t* params ) +{ + params->opname = LOCAL_OPNAME_STR; + params->impl = IMPL_STR; + + params->pc_str = LOCAL_PC_STR; + params->dt_str = GLOB_DEF_DT_STR; + params->sc_str = GLOB_DEF_SC_STR; + + params->im_str = GLOB_DEF_IM_STR; + + params->ps_str = GLOB_DEF_PS_STR; + params->m_str = GLOB_DEF_M_STR; + params->n_str = GLOB_DEF_N_STR; + params->k_str = GLOB_DEF_K_STR; + + params->nr_str = GLOB_DEF_NR_STR; + + params->alpha_str = GLOB_DEF_ALPHA_STR; + params->beta_str = GLOB_DEF_BETA_STR; +} + diff --git a/test/3/test_hemm.c b/test/3/test_hemm.c index 4e5c73c674..1c0e5bcad6 100644 --- a/test/3/test_hemm.c +++ b/test/3/test_hemm.c @@ -34,9 +34,13 @@ #include #include "blis.h" +#include "test_utils.h" //#define PRINT +static const char* LOCAL_OPNAME_STR = "hemm"; +static const char* LOCAL_PC_STR = "ll"; + int main( int argc, char** argv ) { obj_t a, b, c; @@ -59,54 +63,42 @@ int main( int argc, char** argv ) double dtime_save; double gflops; - //bli_init(); - - //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - - n_repeats = 3; - - dt = DT; - - ind = IND; - - p_begin = P_BEGIN; - p_max = P_MAX; - p_inc = P_INC; - - m_input = -1; - n_input = -1; - + params_t params; // Supress compiler warnings about unused variable 'ind'. 
( void )ind; -#if 0 - cntx_t* cntx; + //bli_init(); - ind_t ind_mod = ind; + //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - // Initialize a context for the current induced method and datatype. - cntx = bli_gks_query_ind_cntx( ind_mod ); + // Parse the command line options into strings, integers, enums, + // and doubles, as appropriate. + parse_cl_params( argc, argv, init_def_params, &params ); - // Set k to the kc blocksize for the current datatype. - k_input = bli_cntx_get_blksz_def_dt( dt, BLIS_KC, cntx ); + dt = params.dt; -#elif 1 + ind = params.im; - //k_input = 256; + p_begin = params.sta; + p_max = params.end; + p_inc = params.inc; -#endif + m_input = params.m; + n_input = params.n; - // Choose the char corresponding to the requested datatype. - if ( bli_is_float( dt ) ) dt_ch = 's'; - else if ( bli_is_double( dt ) ) dt_ch = 'd'; - else if ( bli_is_scomplex( dt ) ) dt_ch = 'c'; - else dt_ch = 'z'; + n_repeats = params.nr; - side = BLIS_LEFT; - uploa = BLIS_LOWER; + // Map the datatype to its corresponding char. + bli_param_map_blis_to_char_dt( dt, &dt_ch ); + + // Map the parameter chars to their corresponding BLIS enum type values. + bli_param_map_char_to_blis_side( params.pc_str[0], &side ); + bli_param_map_char_to_blis_uplo( params.pc_str[1], &uploa ); + + // Map the BLIS enum type values to their corresponding BLAS chars. bli_param_map_blis_to_netlib_side( side, &f77_side ); bli_param_map_blis_to_netlib_uplo( uploa, &f77_uploa ); @@ -114,8 +106,8 @@ int main( int argc, char** argv ) // matlab allocates space for the entire array once up-front.
for ( p = p_begin; p + p_inc <= p_max; p += p_inc ) ; - printf( "data_%s_%chemm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%chemm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )0, ( unsigned long )0, 0.0 ); @@ -133,13 +125,28 @@ int main( int argc, char** argv ) bli_obj_create( dt, 1, 1, 0, 0, &alpha ); bli_obj_create( dt, 1, 1, 0, 0, &beta ); - if ( bli_is_left( side ) ) - bli_obj_create( dt, m, m, 0, 0, &a ); - else - bli_obj_create( dt, n, n, 0, 0, &a ); - bli_obj_create( dt, m, n, 0, 0, &b ); - bli_obj_create( dt, m, n, 0, 0, &c ); - bli_obj_create( dt, m, n, 0, 0, &c_save ); + // Choose the storage of each matrix based on the corresponding + // char in the params_t struct. Note that the expected order of + // storage specifers in sc_str is CAB (not ABC). + if ( params.sc_str[1] == 'c' ) + { + if ( bli_is_left( side ) ) bli_obj_create( dt, m, m, 0, 0, &a ); + else bli_obj_create( dt, n, n, 0, 0, &a ); + } + else // if ( params.sc_str[1] == 'r' ) + { + if ( bli_is_left( side ) ) bli_obj_create( dt, m, m, m, 1, &a ); + else bli_obj_create( dt, n, n, n, 1, &a ); + } + + if ( params.sc_str[2] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &b ); + else bli_obj_create( dt, m, n, n, 1, &b ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c ); + else bli_obj_create( dt, m, n, n, 1, &c ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c_save ); + else bli_obj_create( dt, m, n, n, 1, &c_save ); bli_randm( &a ); bli_randm( &b ); @@ -153,12 +160,15 @@ int main( int argc, char** argv ) bli_mkherm( &a ); bli_mktrim( &a ); - bli_setsc( (2.0/1.0), 0.0, &alpha ); - bli_setsc( (1.0/1.0), 0.0, &beta ); + //bli_setsc( (2.0/1.0), 0.0, &alpha ); + //bli_setsc( (1.0/1.0), 0.0, &beta ); + bli_setsc( params.alpha, 0.0, &alpha ); + bli_setsc( params.beta, 0.0, &beta ); bli_copym( &c, &c_save ); - -#if 
0 //def BLIS + +#ifdef BLIS + // Switch to the induced method specified by ind. bli_ind_disable_all_dt( dt ); bli_ind_enable_dt( ind, dt ); #endif @@ -202,14 +212,14 @@ int main( int argc, char** argv ) float* cp = ( float* )bli_obj_buffer( &c ); ssymm_( &f77_side, - &f77_uploa, - &mm, - &nn, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_uploa, + &mm, + &nn, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } else if ( bli_is_double( dt ) ) { @@ -225,14 +235,14 @@ int main( int argc, char** argv ) double* cp = ( double* )bli_obj_buffer( &c ); dsymm_( &f77_side, - &f77_uploa, - &mm, - &nn, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_uploa, + &mm, + &nn, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } else if ( bli_is_scomplex( dt ) ) { @@ -256,14 +266,14 @@ int main( int argc, char** argv ) #endif chemm_( &f77_side, - &f77_uploa, - &mm, - &nn, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_uploa, + &mm, + &nn, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } else if ( bli_is_dcomplex( dt ) ) { @@ -287,14 +297,14 @@ int main( int argc, char** argv ) #endif zhemm_( &f77_side, - &f77_uploa, - &mm, - &nn, - alphap, - ap, &lda, - bp, &ldb, - betap, - cp, &ldc ); + &f77_uploa, + &mm, + &nn, + alphap, + ap, &lda, + bp, &ldb, + betap, + cp, &ldc ); } #endif @@ -313,11 +323,12 @@ int main( int argc, char** argv ) if ( bli_is_complex( dt ) ) gflops *= 4.0; - printf( "data_%s_%chemm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%chemm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )m, ( unsigned long )n, gflops ); + fflush( stdout ); fflush(stdout); @@ -335,3 +346,24 @@ int main( int argc, char** argv ) return 0; } +void init_def_params( params_t* params ) +{ + params->opname = LOCAL_OPNAME_STR; + params->impl = IMPL_STR; + + params->pc_str = 
LOCAL_PC_STR; + params->dt_str = GLOB_DEF_DT_STR; + params->sc_str = GLOB_DEF_SC_STR; + + params->im_str = GLOB_DEF_IM_STR; + + params->ps_str = GLOB_DEF_PS_STR; + params->m_str = GLOB_DEF_M_STR; + params->n_str = GLOB_DEF_N_STR; + + params->nr_str = GLOB_DEF_NR_STR; + + params->alpha_str = GLOB_DEF_ALPHA_STR; + params->beta_str = GLOB_DEF_BETA_STR; +} + diff --git a/test/3/test_herk.c b/test/3/test_herk.c index 629f4b57a2..e1bc3beb6c 100644 --- a/test/3/test_herk.c +++ b/test/3/test_herk.c @@ -35,9 +35,13 @@ #include #include "blis.h" +#include "test_utils.h" //#define PRINT +static const char* LOCAL_OPNAME_STR = "herk"; +static const char* LOCAL_PC_STR = "ln"; + int main( int argc, char** argv ) { obj_t a, c; @@ -60,55 +64,43 @@ int main( int argc, char** argv ) double dtime_save; double gflops; - //bli_init(); - - //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - - n_repeats = 3; - - dt = DT; - dt_real = bli_dt_proj_to_real( DT ); - - ind = IND; - - p_begin = P_BEGIN; - p_max = P_MAX; - p_inc = P_INC; - - m_input = -1; - k_input = -1; - + params_t params; // Supress compiler warnings about unused variable 'ind'. ( void )ind; -#if 0 - cntx_t* cntx; + //bli_init(); + + //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); + + // Parse the command line options into strings, integers, enums, + // and doubles, as appropriate. + parse_cl_params( argc, argv, init_def_params, &params ); - ind_t ind_mod = ind; + dt = params.dt; + dt_real = bli_dt_proj_to_real( dt ); - // Initialize a context for the current induced method and datatype. - cntx = bli_gks_query_ind_cntx( ind_mod ); + ind = params.im; - // Set k to the kc blocksize for the current datatype. - k_input = bli_cntx_get_blksz_def_dt( dt, BLIS_KC, cntx ); + p_begin = params.sta; + p_max = params.end; + p_inc = params.inc; -#elif 1 + m_input = params.m; + k_input = params.k; - //k_input = 256; + n_repeats = params.nr; -#endif - // Choose the char corresponding to the requested datatype.
- if ( bli_is_float( dt ) ) dt_ch = 's'; - else if ( bli_is_double( dt ) ) dt_ch = 'd'; - else if ( bli_is_scomplex( dt ) ) dt_ch = 'c'; - else dt_ch = 'z'; + // Map the datatype to its corresponding char. + bli_param_map_blis_to_char_dt( dt, &dt_ch ); - uploc = BLIS_LOWER; - transa = BLIS_NO_TRANSPOSE; + // Map the parameter chars to their corresponding BLIS enum type values. + bli_param_map_char_to_blis_uplo( params.pc_str[0], &uploc ); + bli_param_map_char_to_blis_trans( params.pc_str[1], &transa ); + // Map the BLIS enum type values to their corresponding BLAS chars. bli_param_map_blis_to_netlib_uplo( uploc, &f77_uploc ); bli_param_map_blis_to_netlib_trans( transa, &f77_transa ); @@ -116,8 +108,8 @@ int main( int argc, char** argv ) // matlab allocates space for the entire array once up-front. for ( p = p_begin; p + p_inc <= p_max; p += p_inc ) ; - printf( "data_%s_%cherk_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%cherk_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )0, ( unsigned long )0, 0.0 ); @@ -135,15 +127,25 @@ int main( int argc, char** argv ) bli_obj_create( dt_real, 1, 1, 0, 0, &alpha ); bli_obj_create( dt, 1, 1, 0, 0, &beta ); - if ( bli_does_trans( transa ) ) - bli_obj_create( dt, k, m, 0, 0, &a ); - else - bli_obj_create( dt, m, k, 0, 0, &a ); - bli_obj_create( dt, m, m, 0, 0, &c ); - //bli_obj_create( dt, m, k, 2, 2*m, &a ); - //bli_obj_create( dt, k, n, 2, 2*k, &b ); - //bli_obj_create( dt, m, n, 2, 2*m, &c ); - bli_obj_create( dt, m, m, 0, 0, &c_save ); + // Choose the storage of each matrix based on the corresponding + // char in the params_t struct. Note that the expected order of + // storage specifers in sc_str is CA (not AC). 
+ if ( params.sc_str[1] == 'c' ) + { + if ( bli_does_trans( transa ) ) bli_obj_create( dt, k, m, 0, 0, &a ); + else bli_obj_create( dt, m, k, 0, 0, &a ); + } + else // if ( params.sc_str[1] == 'r' ) + { + if ( bli_does_trans( transa ) ) bli_obj_create( dt, k, m, m, 1, &a ); + else bli_obj_create( dt, m, k, k, 1, &a ); + } + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, m, 0, 0, &c ); + else bli_obj_create( dt, m, m, m, 1, &c ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, m, 0, 0, &c_save ); + else bli_obj_create( dt, m, m, m, 1, &c_save ); bli_randm( &a ); bli_randm( &c ); @@ -151,14 +153,22 @@ int main( int argc, char** argv ) bli_obj_set_struc( BLIS_HERMITIAN, &c ); bli_obj_set_uplo( uploc, &c ); + // Make C densely Hermitian, and zero the unstored triangle to + // ensure the implementation reads only from the stored region. + bli_mkherm( &c ); + bli_mktrim( &c ); + bli_obj_set_conjtrans( transa, &a ); - bli_setsc( (2.0/1.0), 0.0, &alpha ); - bli_setsc( (1.0/1.0), 0.0, &beta ); + //bli_setsc( (2.0/1.0), 0.0, &alpha ); + //bli_setsc( (1.0/1.0), 0.0, &beta ); + bli_setsc( params.alpha, 0.0, &alpha ); + bli_setsc( params.beta, 0.0, &beta ); bli_copym( &c, &c_save ); - -#if 0 //def BLIS + +#ifdef BLIS + // Switch to the induced method specified by ind. 
bli_ind_disable_all_dt( dt ); bli_ind_enable_dt( ind, dt ); #endif @@ -197,13 +207,13 @@ int main( int argc, char** argv ) float* cp = ( float* )bli_obj_buffer( &c ); ssyrk_( &f77_uploc, - &f77_transa, - &mm, - &kk, - alphap, - ap, &lda, - betap, - cp, &ldc ); + &f77_transa, + &mm, + &kk, + alphap, + ap, &lda, + betap, + cp, &ldc ); } else if ( bli_is_double( dt ) ) { @@ -217,13 +227,13 @@ int main( int argc, char** argv ) double* cp = ( double* )bli_obj_buffer( &c ); dsyrk_( &f77_uploc, - &f77_transa, - &mm, - &kk, - alphap, - ap, &lda, - betap, - cp, &ldc ); + &f77_transa, + &mm, + &kk, + alphap, + ap, &lda, + betap, + cp, &ldc ); } else if ( bli_is_scomplex( dt ) ) { @@ -244,13 +254,13 @@ int main( int argc, char** argv ) #endif cherk_( &f77_uploc, - &f77_transa, - &mm, - &kk, - alphap, - ap, &lda, - betap, - cp, &ldc ); + &f77_transa, + &mm, + &kk, + alphap, + ap, &lda, + betap, + cp, &ldc ); } else if ( bli_is_dcomplex( dt ) ) { @@ -271,13 +281,13 @@ int main( int argc, char** argv ) #endif zherk_( &f77_uploc, - &f77_transa, - &mm, - &kk, - alphap, - ap, &lda, - betap, - cp, &ldc ); + &f77_transa, + &mm, + &kk, + alphap, + ap, &lda, + betap, + cp, &ldc ); } #endif @@ -293,11 +303,12 @@ int main( int argc, char** argv ) if ( bli_is_complex( dt ) ) gflops *= 4.0; - printf( "data_%s_%cherk_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%cherk_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )m, ( unsigned long )k, gflops ); + fflush( stdout ); fflush(stdout); @@ -314,3 +325,24 @@ int main( int argc, char** argv ) return 0; } +void init_def_params( params_t* params ) +{ + params->opname = LOCAL_OPNAME_STR; + params->impl = IMPL_STR; + + params->pc_str = LOCAL_PC_STR; + params->dt_str = GLOB_DEF_DT_STR; + params->sc_str = GLOB_DEF_SC_STR; + + params->im_str = GLOB_DEF_IM_STR; + + params->ps_str = GLOB_DEF_PS_STR; + 
params->m_str = GLOB_DEF_M_STR; + params->k_str = GLOB_DEF_K_STR; + + params->nr_str = GLOB_DEF_NR_STR; + + params->alpha_str = GLOB_DEF_ALPHA_STR; + params->beta_str = GLOB_DEF_BETA_STR; +} + diff --git a/test/3/test_trmm.c b/test/3/test_trmm.c index 726746ee04..793c1d7593 100644 --- a/test/3/test_trmm.c +++ b/test/3/test_trmm.c @@ -35,9 +35,13 @@ #include #include "blis.h" +#include "test_utils.h" //#define PRINT +static const char* LOCAL_OPNAME_STR = "trmm"; +static const char* LOCAL_PC_STR = "llnn"; + int main( int argc, char** argv ) { obj_t a, c; @@ -64,65 +68,44 @@ int main( int argc, char** argv ) double dtime_save; double gflops; - //bli_init(); - - //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - - n_repeats = 3; - - dt = DT; - - ind = IND; - - p_begin = P_BEGIN; - p_max = P_MAX; - p_inc = P_INC; - - m_input = -1; - n_input = -1; - + params_t params; // Supress compiler warnings about unused variable 'ind'. ( void )ind; -#if 0 - cntx_t* cntx; + //bli_init(); - ind_t ind_mod = ind; + //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - // Initialize a context for the current induced method and datatype. - cntx = bli_gks_query_ind_cntx( ind_mod ); + // Parse the command line options into strings, integers, enums, + // and doubles, as appropriate. + parse_cl_params( argc, argv, init_def_params, &params ); - // Set k to the kc blocksize for the current datatype. - k_input = bli_cntx_get_blksz_def_dt( dt, BLIS_KC, cntx ); + dt = params.dt; -#elif 1 + ind = params.im; - //k_input = 256; + p_begin = params.sta; + p_max = params.end; + p_inc = params.inc; -#endif + m_input = params.m; + n_input = params.n; - // Choose the char corresponding to the requested datatype. - if ( bli_is_float( dt ) ) dt_ch = 's'; - else if ( bli_is_double( dt ) ) dt_ch = 'd'; - else if ( bli_is_scomplex( dt ) ) dt_ch = 'c'; - else dt_ch = 'z'; + n_repeats = params.nr; -// Change both ifs to 1 IF using on DUAL SOCKET altra, else keep at zero.
-#if 1 - side = BLIS_LEFT; -#else - side = BLIS_RIGHT; -#endif -#if 1 - uploa = BLIS_LOWER; -#else - uploa = BLIS_UPPER; -#endif - transa = BLIS_NO_TRANSPOSE; - diaga = BLIS_NONUNIT_DIAG; + // Map the datatype to its corresponding char. + bli_param_map_blis_to_char_dt( dt, &dt_ch ); + + // Map the parameter chars to their corresponding BLIS enum type values. + bli_param_map_char_to_blis_side( params.pc_str[0], &side ); + bli_param_map_char_to_blis_uplo( params.pc_str[1], &uploa ); + bli_param_map_char_to_blis_trans( params.pc_str[2], &transa ); + bli_param_map_char_to_blis_diag( params.pc_str[3], &diaga ); + + // Map the BLIS enum type values to their corresponding BLAS chars. bli_param_map_blis_to_netlib_side( side, &f77_side ); bli_param_map_blis_to_netlib_uplo( uploa, &f77_uploa ); bli_param_map_blis_to_netlib_trans( transa, &f77_transa ); @@ -132,8 +115,8 @@ int main( int argc, char** argv ) // matlab allocates space for the entire array once up-front. for ( p = p_begin; p + p_inc <= p_max; p += p_inc ) ; - printf( "data_%s_%ctrmm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%ctrmm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )0, ( unsigned long )0, 0.0 ); @@ -150,12 +133,26 @@ int main( int argc, char** argv ) bli_obj_create( dt, 1, 1, 0, 0, &alpha ); - if ( bli_is_left( side ) ) - bli_obj_create( dt, m, m, 0, 0, &a ); - else - bli_obj_create( dt, n, n, 0, 0, &a ); - bli_obj_create( dt, m, n, 0, 0, &c ); - bli_obj_create( dt, m, n, 0, 0, &c_save ); + // Choose the storage of each matrix based on the corresponding + // char in the params_t struct. Note that the expected order of + // storage specifers in sc_str is CA (not AC). Also note that + // C plays the role of matrix B. 
+ if ( params.sc_str[1] == 'c' ) + { + if ( bli_is_left( side ) ) bli_obj_create( dt, m, m, 0, 0, &a ); + else bli_obj_create( dt, n, n, 0, 0, &a ); + } + else // if ( params.sc_str[1] == 'r' ) + { + if ( bli_is_left( side ) ) bli_obj_create( dt, m, m, m, 1, &a ); + else bli_obj_create( dt, n, n, n, 1, &a ); + } + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c ); + else bli_obj_create( dt, m, n, n, 1, &c ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c_save ); + else bli_obj_create( dt, m, n, n, 1, &c_save ); bli_randm( &a ); bli_randm( &c ); @@ -165,14 +162,16 @@ int main( int argc, char** argv ) bli_obj_set_conjtrans( transa, &a ); bli_obj_set_diag( diaga, &a ); - bli_randm( &a ); + // Zero the unstored triangle. bli_mktrim( &a ); - bli_setsc( (2.0/1.0), 0.0, &alpha ); + //bli_setsc( (2.0/1.0), 0.0, &alpha ); + bli_setsc( params.alpha, 0.0, &alpha ); bli_copym( &c, &c_save ); - -#if 0 //def BLIS + +#ifdef BLIS + // Switch to the induced method specified by ind. 
bli_ind_disable_all_dt( dt ); bli_ind_enable_dt( ind, dt ); #endif @@ -210,14 +209,14 @@ int main( int argc, char** argv ) float* cp = ( float* )bli_obj_buffer( &c ); strmm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } else if ( bli_is_double( dt ) ) { @@ -230,14 +229,14 @@ int main( int argc, char** argv ) double* cp = ( double* )bli_obj_buffer( &c ); dtrmm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } else if ( bli_is_scomplex( dt ) ) { @@ -256,14 +255,14 @@ int main( int argc, char** argv ) #endif ctrmm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } else if ( bli_is_dcomplex( dt ) ) { @@ -282,14 +281,14 @@ int main( int argc, char** argv ) #endif ztrmm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } #endif @@ -308,11 +307,12 @@ int main( int argc, char** argv ) if ( bli_is_complex( dt ) ) gflops *= 4.0; - printf( "data_%s_%ctrmm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%ctrmm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )m, ( unsigned long )n, gflops ); + fflush( stdout ); fflush(stdout); @@ -328,3 +328,24 @@ int main( int argc, char** argv ) return 0; } +void init_def_params( params_t* params ) +{ + params->opname = LOCAL_OPNAME_STR; + params->impl = IMPL_STR; + + params->pc_str = LOCAL_PC_STR; + 
params->dt_str = GLOB_DEF_DT_STR; + params->sc_str = GLOB_DEF_SC_STR; + + params->im_str = GLOB_DEF_IM_STR; + + params->ps_str = GLOB_DEF_PS_STR; + params->m_str = GLOB_DEF_M_STR; + params->n_str = GLOB_DEF_N_STR; + + params->nr_str = GLOB_DEF_NR_STR; + + params->alpha_str = GLOB_DEF_ALPHA_STR; + params->beta_str = GLOB_DEF_BETA_STR; +} + diff --git a/test/3/test_trsm.c b/test/3/test_trsm.c index aa1d2aa01c..1b259ba162 100644 --- a/test/3/test_trsm.c +++ b/test/3/test_trsm.c @@ -35,9 +35,13 @@ #include #include "blis.h" +#include "test_utils.h" //#define PRINT +static const char* LOCAL_OPNAME_STR = "trsm"; +static const char* LOCAL_PC_STR = "llnn"; + int main( int argc, char** argv ) { obj_t a, c; @@ -64,65 +68,44 @@ int main( int argc, char** argv ) double dtime_save; double gflops; - //bli_init(); - - //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - - n_repeats = 3; - - dt = DT; - - ind = IND; - - p_begin = P_BEGIN; - p_max = P_MAX; - p_inc = P_INC; - - m_input = -1; - n_input = -1; - + params_t params; // Supress compiler warnings about unused variable 'ind'. ( void )ind; -#if 0 - cntx_t* cntx; + //bli_init(); - ind_t ind_mod = ind; + //bli_error_checking_level_set( BLIS_NO_ERROR_CHECKING ); - // Initialize a context for the current induced method and datatype. - cntx = bli_gks_query_ind_cntx( ind_mod ); + // Parse the command line options into strings, integers, enums, + // and doubles, as appropriate. + parse_cl_params( argc, argv, init_def_params, &params ); - // Set k to the kc blocksize for the current datatype. - k_input = bli_cntx_get_blksz_def_dt( dt, BLIS_KC, cntx ); + dt = params.dt; -#elif 1 + ind = params.im; - //k_input = 256; + p_begin = params.sta; + p_max = params.end; + p_inc = params.inc; -#endif + m_input = params.m; + n_input = params.n; - // Choose the char corresponding to the requested datatype.
- if ( bli_is_float( dt ) ) dt_ch = 's'; - else if ( bli_is_double( dt ) ) dt_ch = 'd'; - else if ( bli_is_scomplex( dt ) ) dt_ch = 'c'; - else dt_ch = 'z'; + n_repeats = params.nr; -// Set both zeros to 1 for current Ampere platforms due to column preferred storage micropanel -#if 1 - side = BLIS_LEFT; -#else - side = BLIS_RIGHT; -#endif -#if 1 - uploa = BLIS_LOWER; -#else - uploa = BLIS_UPPER; -#endif - transa = BLIS_NO_TRANSPOSE; - diaga = BLIS_NONUNIT_DIAG; + // Map the datatype to its corresponding char. + bli_param_map_blis_to_char_dt( dt, &dt_ch ); + + // Map the parameter chars to their corresponding BLIS enum type values. + bli_param_map_char_to_blis_side( params.pc_str[0], &side ); + bli_param_map_char_to_blis_uplo( params.pc_str[1], &uploa ); + bli_param_map_char_to_blis_trans( params.pc_str[2], &transa ); + bli_param_map_char_to_blis_diag( params.pc_str[3], &diaga ); + + // Map the BLIS enum type values to their corresponding BLAS chars. bli_param_map_blis_to_netlib_side( side, &f77_side ); bli_param_map_blis_to_netlib_uplo( uploa, &f77_uploa ); bli_param_map_blis_to_netlib_trans( transa, &f77_transa ); @@ -132,8 +115,8 @@ int main( int argc, char** argv ) // matlab allocates space for the entire array once up-front. 
for ( p = p_begin; p + p_inc <= p_max; p += p_inc ) ; - printf( "data_%s_%ctrsm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%ctrsm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )0, ( unsigned long )0, 0.0 ); @@ -150,13 +133,26 @@ int main( int argc, char** argv ) bli_obj_create( dt, 1, 1, 0, 0, &alpha ); - if ( bli_is_left( side ) ) - bli_obj_create( dt, m, m, 0, 0, &a ); - else - bli_obj_create( dt, n, n, 0, 0, &a ); - bli_obj_create( dt, m, n, 0, 0, &c ); - //bli_obj_create( dt, m, n, n, 1, &c ); - bli_obj_create( dt, m, n, 0, 0, &c_save ); + // Choose the storage of each matrix based on the corresponding + // char in the params_t struct. Note that the expected order of + // storage specifers in sc_str is CA (not AC). Also note that + // C plays the role of matrix B. + if ( params.sc_str[1] == 'c' ) + { + if ( bli_is_left( side ) ) bli_obj_create( dt, m, m, 0, 0, &a ); + else bli_obj_create( dt, n, n, 0, 0, &a ); + } + else // if ( params.sc_str[1] == 'r' ) + { + if ( bli_is_left( side ) ) bli_obj_create( dt, m, m, m, 1, &a ); + else bli_obj_create( dt, n, n, n, 1, &a ); + } + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c ); + else bli_obj_create( dt, m, n, n, 1, &c ); + + if ( params.sc_str[0] == 'c' ) bli_obj_create( dt, m, n, 0, 0, &c_save ); + else bli_obj_create( dt, m, n, n, 1, &c_save ); bli_randm( &a ); bli_randm( &c ); @@ -166,17 +162,19 @@ int main( int argc, char** argv ) bli_obj_set_conjtrans( transa, &a ); bli_obj_set_diag( diaga, &a ); - bli_randm( &a ); + // Zero the unstored triangle. bli_mktrim( &a ); // Load the diagonal of A to make it more likely to be invertible. 
bli_shiftd( &BLIS_TWO, &a ); - bli_setsc( (2.0/1.0), 0.0, &alpha ); + //bli_setsc( (2.0/1.0), 0.0, &alpha ); + bli_setsc( params.alpha, 0.0, &alpha ); bli_copym( &c, &c_save ); - -#if 0 //def BLIS + +#ifdef BLIS + // Switch to the induced method specified by ind. bli_ind_disable_all_dt( dt ); bli_ind_enable_dt( ind, dt ); #endif @@ -214,14 +212,14 @@ int main( int argc, char** argv ) float* cp = ( float* )bli_obj_buffer( &c ); strsm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } else if ( bli_is_double( dt ) ) { @@ -234,14 +232,14 @@ int main( int argc, char** argv ) double* cp = ( double* )bli_obj_buffer( &c ); dtrsm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } else if ( bli_is_scomplex( dt ) ) { @@ -260,14 +258,14 @@ int main( int argc, char** argv ) #endif ctrsm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } else if ( bli_is_dcomplex( dt ) ) { @@ -286,14 +284,14 @@ int main( int argc, char** argv ) #endif ztrsm_( &f77_side, - &f77_uploa, - &f77_transa, - &f77_diaga, - &mm, - &kk, - alphap, - ap, &lda, - cp, &ldc ); + &f77_uploa, + &f77_transa, + &f77_diaga, + &mm, + &kk, + alphap, + ap, &lda, + cp, &ldc ); } #endif @@ -312,11 +310,12 @@ int main( int argc, char** argv ) if ( bli_is_complex( dt ) ) gflops *= 4.0; - printf( "data_%s_%ctrsm_%s", THR_STR, dt_ch, STR ); - printf( "( %2lu, 1:3 ) = [ %4lu %4lu %7.2f ];\n", + printf( "data_%s_%ctrsm_%s", THR_STR, dt_ch, IMPL_STR ); + printf( "( %4lu, 1:3 ) = [ %5lu %5lu %8.2f ];\n", ( unsigned long )(p - p_begin)/p_inc + 1, ( unsigned long )m, ( unsigned long )n, 
gflops ); + fflush( stdout ); fflush(stdout); @@ -332,3 +331,24 @@ int main( int argc, char** argv ) return 0; } +void init_def_params( params_t* params ) +{ + params->opname = LOCAL_OPNAME_STR; + params->impl = IMPL_STR; + + params->pc_str = LOCAL_PC_STR; + params->dt_str = GLOB_DEF_DT_STR; + params->sc_str = GLOB_DEF_SC_STR; + + params->im_str = GLOB_DEF_IM_STR; + + params->ps_str = GLOB_DEF_PS_STR; + params->m_str = GLOB_DEF_M_STR; + params->n_str = GLOB_DEF_N_STR; + + params->nr_str = GLOB_DEF_NR_STR; + + params->alpha_str = GLOB_DEF_ALPHA_STR; + params->beta_str = GLOB_DEF_BETA_STR; +} + diff --git a/test/3/test_utils.c b/test/3/test_utils.c new file mode 100644 index 0000000000..8e441d055e --- /dev/null +++ b/test/3/test_utils.c @@ -0,0 +1,684 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2022, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name(s) of the copyright holder(s) nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" +#include "test_utils.h" + +// Global string constants. +const char* GLOB_DEF_DT_STR = "d"; +const char* GLOB_DEF_SC_STR = "ccc"; +const char* GLOB_DEF_IM_STR = "native"; + +const char* GLOB_DEF_PS_STR = "50 1000 50"; +const char* GLOB_DEF_M_STR = "-1"; +const char* GLOB_DEF_N_STR = "-1"; +const char* GLOB_DEF_K_STR = "-1"; + +const char* GLOB_DEF_NR_STR = "3"; + +const char* GLOB_DEF_ALPHA_STR = "1.0"; +const char* GLOB_DEF_BETA_STR = "1.0"; + + +void parse_cl_params( int argc, char** argv, init_fp fp, params_t* params ) +{ + bool gave_option_c = FALSE; + bool gave_option_d = FALSE; + bool gave_option_s = FALSE; + + bool gave_option_i = FALSE; + + bool gave_option_p = FALSE; + bool gave_option_m = FALSE; + bool gave_option_n = FALSE; + bool gave_option_k = FALSE; + + bool gave_option_r = FALSE; + + bool gave_option_a = FALSE; + bool gave_option_b = FALSE; + + int opt; + char opt_ch; + + getopt_t state; + + // Initialize the params_t struct with the caller-supplied function. + fp( params ); + + // Copy the binary name pointer so we can use it later. + params->bin = argv[0]; + + // Alias the binary name for conciseness. + const char* bin = params->bin; + + // Initialize the state for running bli_getopt(). Here, 0 is the + // initial value for opterr, which suppresses error messages. + bli_getopt_init_state( 0, &state ); + + // Process all option arguments until we get a -1, which means we're done. 
+ while( (opt = bli_getopt( argc, ( const char* const * )argv, "c:d:s:i:p:m:n:k:r:a:b:qvh", &state )) != -1 ) + { + // Explicitly typecast opt, which is an int, to a char. (Failing to + // typecast resulted in at least one user-reported problem whereby + // opt was being filled with garbage.) + opt_ch = ( char )opt; + + switch( opt_ch ) + { + case 'c': + params->pc_str = state.optarg; + gave_option_c = TRUE; + break; + + case 'd': + params->dt_str = state.optarg; + gave_option_d = TRUE; + break; + + case 's': + params->sc_str = state.optarg; + gave_option_s = TRUE; + break; + + + case 'i': + params->im_str = state.optarg; + gave_option_i = TRUE; + break; + + + case 'p': + params->ps_str = state.optarg; + gave_option_p = TRUE; + break; + + case 'm': + params->m_str = state.optarg; + gave_option_m = TRUE; + break; + + case 'n': + params->n_str = state.optarg; + gave_option_n = TRUE; + break; + + case 'k': + params->k_str = state.optarg; + gave_option_k = TRUE; + break; + + + case 'r': + params->nr_str = state.optarg; + gave_option_r = TRUE; + break; + + + case 'a': + params->alpha_str = state.optarg; + gave_option_a = TRUE; + break; + + case 'b': + params->beta_str = state.optarg; + gave_option_b = TRUE; + break; + + + case 'q': + params->verbose = FALSE; + break; + + case 'v': + params->verbose = TRUE; + break; + + case 'h': + { + bool has_trans = FALSE; + bool has_side = FALSE; + bool has_uplo = FALSE; + bool has_unit = FALSE; + + if ( is_gemm( params ) || + is_herk( params ) || + is_trmm( params ) || + is_trsm( params ) ) has_trans = TRUE; + + if ( is_hemm( params ) || + is_trmm( params ) || + is_trsm( params ) ) has_side = TRUE; + + if ( is_hemm( params ) || + is_herk( params ) || + is_trmm( params ) || + is_trsm( params ) ) has_uplo = TRUE; + + if ( is_trmm( params ) || + is_trsm( params ) ) has_unit = TRUE; + + printf( "\n" ); + printf( " %s performance driver\n", params->opname ); + printf( " -----------------------\n" ); + printf( " (part of the BLIS 
framework)\n" ); + printf( "\n" ); + printf( " Measure performance of the '%s' implementation of the '%s' operation:\n", params->impl, params->opname ); + printf( "\n" ); + if ( is_gemm( params ) ) + { + printf( " C := beta * C + alpha * trans(A) * trans(B)\n" ); + printf( "\n" ); + printf( " where C is an m x n matrix, trans(A) is an m x k matrix, and\n" ); + printf( " trans(B) is a k x n matrix.\n" ); + } + else if ( is_hemm( params ) ) + { + printf( " C := beta * C + alpha * uplo(A) * B (side = left)\n" ); + printf( " C := beta * C + alpha * B * uplo(A) (side = right)\n" ); + printf( "\n" ); + printf( " where C and B are m x n matrices and A is a Hermitian matrix stored\n" ); + printf( " in the lower or upper triangle, as specified by uplo(A). When side =\n" ); + printf( " left, A is m x m, and when side = right, A is n x n.\n" ); + } + else if ( is_herk( params ) ) + { + printf( " uplo(C) := beta * uplo(C) + alpha * trans(A) * trans(A)^H\n" ); + printf( "\n" ); + printf( " where C is an m x m Hermitian matrix stored in the lower or upper\n" ); + printf( " triangle, as specified by uplo(C), and trans(A) is an m x k matrix.\n" ); + } + else if ( is_trmm( params ) ) + { + printf( " B := alpha * trans(uplo(A)) * B (side = left)\n" ); + printf( " B := alpha * B * trans(uplo(A)) (side = right)\n" ); + printf( "\n" ); + printf( " where B is an m x n matrix and A is a triangular matrix stored in\n" ); + printf( " the lower or upper triangle, as specified by uplo(A), with unit/non-unit\n" ); + printf( " diagonal specified by diag(A). 
When side = left, A is m x m, and when\n" ); + printf( " side = right, A is n x n.\n" ); + } + else if ( is_trsm( params ) ) + { + printf( " B := alpha * trans(uplo(A))^{-1} * B (side = left)\n" ); + printf( " B := alpha * B * trans(uplo(A))^{-1} (side = right)\n" ); + printf( "\n" ); + printf( " where B is an m x n matrix and A is a triangular matrix stored in\n" ); + printf( " the lower or upper triangle, as specified by uplo(A), with unit/non-unit\n" ); + printf( " diagonal specified by diag(A). When side = left, A is m x m, and when\n" ); + printf( " side = right, A is n x n. Note that while ^{-1} indicates inversion,\n" ); + printf( " trsm does not explicitly invert A, but rather solves for an m x n\n" ); + printf( " solution matrix X, which then overwrites the original contents of B.\n" ); + } + printf( "\n" ); + printf( " Performance measurements are taken for a range of problem sizes with a fixed\n" ); + printf( " set of parameters, and results are printed to stdout in a matlab/octave-\n" ); + printf( " friendly format.\n" ); + printf( "\n" ); + printf( " Usage:\n" ); + printf( "\n" ); + printf( " %s [options]\n", bin ); + printf( "\n" ); + printf( " The following computational options are supported:\n" ); + printf( "\n" ); + printf( " -c pc\n" ); + printf( " Use the operation-specific parameter combination specified by\n" ); + printf( " the 'pc' string. 
The following tables list expected parameters\n" ); + printf( " for the '%s' operation and the valid values for each parameter.\n", params->opname ); + printf( "\n" ); + printf( " Operation List (order) of parameters Example\n" ); + printf( " -------------------------------------------------------\n" ); + if ( is_gemm( params ) ) + { + printf( " gemm trans(A) trans(B) -c tn\n" ); + } + else if ( is_hemm( params ) ) + { + printf( " hemm/symm side(A) uplo(A) -c rl\n" ); + } + else if ( is_herk( params ) ) + { + printf( " herk/syrk uplo(C) trans(A) -c ln\n" ); + } + else if ( is_trmm( params ) ) + { + printf( " trmm side(A) uplo(A) trans(A) unit(A) -c lutn\n" ); + } + else if ( is_trsm( params ) ) + { + printf( " trsm side(A) uplo(A) trans(A) unit(A) -c rlnn\n" ); + } + printf( "\n" ); + printf( " Valid\n" ); + printf( " Param chars Interpretation\n" ); + printf( " ---------------------------------------\n" ); + if ( has_trans ) + { + printf( " trans n No transpose\n" ); + printf( " t Transpose only\n" ); + printf( " c Conjugate only*\n" ); + printf( " h Hermitian transpose\n" ); + printf( "\n" ); + } + if ( has_side ) + { + printf( " side l Left\n" ); + printf( " r Right\n" ); + printf( "\n" ); + } + if ( has_uplo ) + { + printf( " uplo l Lower-stored\n" ); + printf( " u Upper-stored\n" ); + printf( "\n" ); + } + if ( has_unit ) + { + printf( " unit u Unit diagonal\n" ); + printf( " n Non-unit diagonal\n" ); + printf( "\n" ); + } + if ( has_trans ) + { + printf( " *This option is supported by BLIS but not by classic BLAS.\n" ); + } + printf( "\n" ); + printf( " -d dt\n" ); + printf( " Allocate matrix elements using the datatype character specified\n" ); + printf( " by dt, and also perform computation in that same datatype.
Valid\n" ); + printf( " char values for dt are:\n" ); + printf( "\n" ); + printf( " Valid\n" ); + printf( " chars Interpretation\n" ); + printf( " -----------------------------------------\n" ); + printf( " s single-precision real domain\n" ); + printf( " d double-precision real domain\n" ); + printf( " c single-precision complex domain\n" ); + printf( " z double-precision complex domain\n" ); + printf( "\n" ); + printf( " -s sc\n" ); + printf( " Use the characters in sc to determine the storage formats\n" ); + printf( " of each operand matrix used in the performance measurements.\n" ); + printf( " Valid chars are 'r' (row storage) and 'c' (column storage).\n" ); + printf( " The characters encode the storage format for each operand\n" ); + printf( " used by %s, with the mapping of chars to operand interpreted\n", params->opname ); + printf( " in the following order:\n" ); + printf( "\n" ); + printf( " Order of\n" ); + printf( " operand \n" ); + printf( " Operation mapping Example Interpretation\n" ); + printf( " ----------------------------------------------------------\n" ); + if ( is_gemm( params ) ) + { + printf( " gemm C A B -s crr C is col-stored;\n" ); + printf( " A and B are row-stored.\n" ); + } + else if ( is_hemm( params ) ) + { + printf( " hemm/symm C A B -s rcc C is row-stored;\n" ); + printf( " A and B are col-stored.\n" ); + } + else if ( is_herk( params ) ) + { + printf( " herk/syrk C A -s rc C is row-stored;\n" ); + printf( " A is col-stored.\n" ); + } + else if ( is_trmm( params ) ) + { + printf( " trmm B A -s cr B is col-stored;\n" ); + printf( " A is row-stored.\n" ); + } + else if ( is_trsm( params ) ) + { + printf( " trsm B A -s cc B and A are col-stored.\n" ); + } + printf( "\n" ); + printf( " -i im\n" ); + printf( " Use native execution if im is 'native' (or 'nat'). 
Otherwise,\n" ); + printf( " if im is '1m', use the 1m method to induce complex computation\n" ); + printf( " using the equivalent real-domain microkernels.\n" ); + printf( "\n" ); + printf( " -p 'lo hi in'\n" ); + printf( " Perform a sweep of measurements of problem sizes ranging from \n" ); + printf( " 'lo' to 'hi' in increments of 'in'. Note that measurements will\n" ); + printf( " be taken in descending order, starting from 'hi', and so 'lo'\n" ); + printf( " will act as a floor and may not be measured (see 2nd example).\n" ); + printf( "\n" ); + printf( " Example Interpretation\n" ); + printf( " -------------------------------------------------------\n" ); + printf( " -p '40 400 40' Measure performance from 40 to 400\n" ); + printf( " (inclusive) in increments of 40.\n" ); + printf( " -p '40 400 80' Measure performance for problem sizes\n" ); + printf( " {80,160,240,320,400}.\n" ); + printf( "\n" ); + printf( " Note that unlike the other option arguments, quotes are required\n" ); + printf( " around the 'lo hi in' string in order to facilitate parsing.\n" ); + printf( "\n" ); + printf( " -m M\n" ); + if ( is_gemm( params ) || is_hemm( params ) || is_trmm( params ) || is_trsm( params ) ) + printf( " -n N\n" ); + if ( is_gemm( params ) || is_herk( params ) ) + printf( " -k K\n" ); + if ( is_gemm( params ) ) + { + printf( " Bind the m, n, or k dimensions to M, N, or K, respectively.\n" ); + printf( " Binding of matrix dimensions takes place as follows:\n" ); + } + else if ( is_herk( params ) ) + { + printf( " Bind the m or k dimensions to M or K, respectively. Binding\n" ); + printf( " of matrix dimensions takes place as follows:\n" ); + } + else if ( is_hemm( params ) || is_trmm( params ) || is_trsm( params ) ) + { + printf( " Bind the m or n dimensions to M or N, respectively. 
Binding\n" ); + printf( " of matrix dimensions takes place as follows:\n" ); + } + printf( "\n" ); + printf( " if 0 < X: Bind the x dimension to X and hold it constant.\n" ); + printf( " if X = -1: Bind the x dimension to p.\n" ); + printf( " if X < -1: Bind the x dimension to p / abs(X).\n" ); + printf( "\n" ); + printf( " where p is the current problem size. Note: X = 0 is undefined.\n" ); + printf( "\n" ); + printf( " Examples Interpretation\n" ); + printf( " ---------------------------------------------------------\n" ); + if ( is_gemm( params ) ) + { + printf( " -m -1 -n -1 -k -1 Bind m, n, and k to the problem size\n" ); + printf( " to keep all matrices square.\n" ); + printf( " -m -1 -n -1 -k 100 Bind m and n to the problem size, but\n" ); + printf( " hold k = 100 constant.\n" ); + } + else if ( is_hemm( params ) ) + { + printf( " -m -1 -n -1 Bind m and n to the problem size to\n" ); + printf( " keep all matrices square.\n" ); + printf( " -m -1 -n 500 Bind m to the problem size, but hold\n" ); + printf( " n = 500 constant.\n" ); + } + else if ( is_herk( params ) ) + { + printf( " -m -1 -k -1 Bind m and k to the problem size to\n" ); + printf( " keep both matrices square.\n" ); + printf( " -m -1 -k 200 Bind m to the problem size, but hold\n" ); + printf( " k = 200 constant.\n" ); + } + else if ( is_trmm( params ) || is_trsm( params ) ) + { + printf( " -m -1 -n -1 Bind m and n to the problem size to\n" ); + printf( " keep both matrices square.\n" ); + printf( " -m -1 -n 300 Bind m to the problem size, but hold\n" ); + printf( " n = 300 constant.\n" ); + } + printf( "\n" ); + printf( " -r num\n" ); + printf( " When measuring performance for a given problem size, perform num\n" ); + printf( " repetitions and report performance using the best timing.\n" ); + printf( "\n" ); + if ( is_gemm( params ) || is_hemm( params ) || is_herk( params ) ) + { + printf( " -a alpha\n" ); + printf( " -b beta\n" ); + printf( " Specify the value to use for the alpha and/or beta
scalars.\n" ); + } + else // if ( is_trmm( params ) || is_trsm( params ) ) + { + printf( " -a alpha\n" ); + printf( " Specify the value to use for the alpha scalar.\n" ); + } + printf( "\n" ); + printf( " If any of the computational options is not specified, its default value will\n" ); + printf( " be used. (Please use the -v option to see how the driver is interpreting each\n" ); + printf( " option.)\n" ); + printf( "\n" ); + printf( " The following IO options are also supported:\n" ); + printf( "\n" ); + printf( " -q\n" ); + printf( " -v\n" ); + printf( " Enable quiet or verbose output. (By default, output is quiet.)\n" ); + printf( " The verbose option is useful if you are unsure whether your options\n" ); + printf( " are being interpreted as you intended.\n" ); + printf( "\n" ); + printf( " -h\n" ); + printf( " Display this help and exit.\n" ); + printf( "\n" ); + printf( "\n" ); + + exit(0); + + break; + } + + + case '?': + printf( "%s: unexpected option '%c' given or missing option argument\n", bin, state.optopt ); + exit(1); + break; + + default: + printf( "%s: unexpected option character returned from getopt: %c\n", bin, opt_ch ); + exit(1); + } + } + + // Process the command line options from strings to integers/enums/doubles, + // as appropriate. + proc_params( params ); + + // Inform the user of the values that were chosen (or defaulted to). + if ( params->verbose ) + { + const char* def_str = " (default)"; + const char* nul_str = " "; + + printf( "%%\n" ); + printf( "%% operation: %s\n", params->opname ); + printf( "%% parameter combination: %s%s\n", params->pc_str, ( gave_option_c ? nul_str : def_str ) ); + printf( "%% datatype: %s%s\n", params->dt_str, ( gave_option_d ? nul_str : def_str ) ); + printf( "%% storage combination: %s%s\n", params->sc_str, ( gave_option_s ? nul_str : def_str ) ); + printf( "%% induced method: %s%s\n", params->im_str, ( gave_option_i ?
nul_str : def_str ) ); + printf( "%% problem size range: %s%s\n", params->ps_str, ( gave_option_p ? nul_str : def_str ) ); + printf( "%% m dim specifier: %s%s\n", params->m_str, ( gave_option_m ? nul_str : def_str ) ); + if ( is_gemm( params ) || is_hemm( params ) || is_trmm( params ) || is_trsm( params ) ) + printf( "%% n dim specifier: %s%s\n", params->n_str, ( gave_option_n ? nul_str : def_str ) ); + if ( is_gemm( params ) || is_herk( params ) ) + printf( "%% k dim specifier: %s%s\n", params->k_str, ( gave_option_k ? nul_str : def_str ) ); + printf( "%% number of repeats: %s%s\n", params->nr_str, ( gave_option_r ? nul_str : def_str ) ); + printf( "%% alpha scalar: %s%s\n", params->alpha_str, ( gave_option_a ? nul_str : def_str ) ); + if ( is_gemm( params ) || is_hemm( params ) || is_herk( params ) ) + printf( "%% beta scalar: %s%s\n", params->beta_str, ( gave_option_b ? nul_str : def_str ) ); + printf( "%% ---\n" ); + printf( "%% implementation: %s\n", params->impl ); + if ( params->nt == -1 ) + printf( "%% number of threads: %s\n", "unset (defaults to 1)" ); + else + printf( "%% number of threads: %ld\n", params->nt ); + printf( "%% thread affinity: %s\n", ( params->af_str == NULL ? "unset" : params->af_str ) ); + printf( "%%\n" ); + } + + + // If there are still arguments remaining after getopt() processing is + // complete, print an error. + if ( state.optind < argc ) + { + printf( "%s: encountered unexpected non-option argument: %s\n", bin, argv[ state.optind ] ); + exit(1); + } +} + +// ----------------------------------------------------------------------------- + +void proc_params( params_t* params ) +{ + dim_t nt; + + // Binary name doesn't need any conversion. + + // Operation name doesn't need any conversion. + + // Implementation name doesn't need any conversion. + + // Query the multithreading strings and convert them to integers. 
+ if ( strncmp( params->impl, "blis", MAX_STRING_SIZE ) == 0 ) + { + nt = bli_thread_get_num_threads(); + } + else if ( strncmp( params->impl, "mkl", MAX_STRING_SIZE ) == 0 ) + { + nt = bli_env_get_var( "OMP_NUM_THREADS", -1 ); + + if ( nt == -1 ) nt = bli_env_get_var( "MKL_NUM_THREADS", -1 ); + } + else if ( strncmp( params->impl, "openblas", MAX_STRING_SIZE ) == 0 ) + { + nt = bli_env_get_var( "OMP_NUM_THREADS", -1 ); + + if ( nt == -1 ) nt = bli_env_get_var( "OPENBLAS_NUM_THREADS", -1 ); + } + else + { + nt = bli_env_get_var( "OMP_NUM_THREADS", -1 ); + } + + // Store nt to the params_t struct. + params->nt = ( long int )nt; + + // Store the affinity string pointer to the params_t struct. + params->af_str = bli_env_get_str( "GOMP_CPU_AFFINITY" ); + +#if 0 + dim_t nt = bli_thread_get_num_threads(); + dim_t jc_nt = bli_thread_get_jc_nt(); + dim_t pc_nt = bli_thread_get_pc_nt(); + dim_t ic_nt = bli_thread_get_ic_nt(); + dim_t jr_nt = bli_thread_get_jr_nt(); + dim_t ir_nt = bli_thread_get_ir_nt(); + + if ( nt == -1 ) nt = 1; + if ( jc_nt == -1 ) jc_nt = 1; + if ( pc_nt == -1 ) pc_nt = 1; + if ( ic_nt == -1 ) ic_nt = 1; + if ( jr_nt == -1 ) jr_nt = 1; + if ( ir_nt == -1 ) ir_nt = 1; + + params->nt = ( long int )nt; + params->jc_nt = ( long int )jc_nt; + params->pc_nt = ( long int )pc_nt; + params->ic_nt = ( long int )ic_nt; + params->jr_nt = ( long int )jr_nt; + params->ir_nt = ( long int )ir_nt; +#endif + + // Parameter combinations, datatype, and operand storage combination, + // need no conversion. + + // Convert the datatype to a num_t. + bli_param_map_char_to_blis_dt( params->dt_str[0], &params->dt ); + + // Parse the induced method to the corresponding ind_t.
+ if ( strncmp( params->im_str, "native", 6 ) == 0 ) + { + params->im = BLIS_NAT; + } + else if ( strncmp( params->im_str, "1m", 2 ) == 0 ) + { + params->im = BLIS_1M; + } + else + { + printf( "%s: invalid induced method '%s'.\n", params->bin, params->im_str ); + exit(1); + } + + // Convert the problem size range and dimension specifier strings to + // integers. + sscanf( params->ps_str, "%ld %ld %ld", &(params->sta), + &(params->end), + &(params->inc) ); + sscanf( params->m_str, "%ld", &(params->m) ); + sscanf( params->n_str, "%ld", &(params->n) ); + sscanf( params->k_str, "%ld", &(params->k) ); + + // Convert the number of repeats to an integer. + sscanf( params->nr_str, "%ld", &(params->nr) ); + + // Convert the alpha and beta strings to doubles. + //params->alpha = ( double )atof( params->alpha_str ); + //params->beta = ( double )atof( params->beta_str ); + //sscanf( params->alpha_str, "%lf", &(params->alpha) ); + //sscanf( params->beta_str, "%lf", &(params->beta) ); + params->alpha = strtod( params->alpha_str, NULL ); + params->beta = strtod( params->beta_str, NULL ); +} + +// ----------------------------------------------------------------------------- + +bool is_match( const char* str1, const char* str2 ) +{ + if ( strncmp( str1, str2, MAX_STRING_SIZE ) == 0 ) return TRUE; + return FALSE; +} + +bool is_gemm( params_t* params ) +{ + if ( is_match( params->opname, "gemm" ) ) return TRUE; + return FALSE; +} + +bool is_hemm( params_t* params ) +{ + if ( is_match( params->opname, "hemm" ) ) return TRUE; + return FALSE; +} + +bool is_herk( params_t* params ) +{ + if ( is_match( params->opname, "herk" ) ) return TRUE; + return FALSE; +} + +bool is_trmm( params_t* params ) +{ + if ( is_match( params->opname, "trmm" ) ) return TRUE; + return FALSE; +} + +bool is_trsm( params_t* params ) +{ + if ( is_match( params->opname, "trsm" ) ) return TRUE; + return FALSE; +} + diff --git a/test/3/test_utils.h b/test/3/test_utils.h new file mode 100644 index 
0000000000..088f9ce973 --- /dev/null +++ b/test/3/test_utils.h @@ -0,0 +1,142 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2022, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name(s) of the copyright holder(s) nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#ifndef TEST_UTILS_H +#define TEST_UTILS_H + +// Allow C++ users to include this header file in their source code. 
However, +// we make the extern "C" conditional on whether we're using a C++ compiler, +// since regular C compilers don't understand the extern "C" construct. +#ifdef __cplusplus +extern "C" { +#endif + +// String arrays allocated using this constant will always add 1 to +// the value defined below, and so the total allocated will still be +// a nice power of two. +#define MAX_STRING_SIZE 31 + + +extern const char* GLOB_DEF_DT_STR; +extern const char* GLOB_DEF_SC_STR; +extern const char* GLOB_DEF_IM_STR; + +extern const char* GLOB_DEF_PS_STR; +extern const char* GLOB_DEF_M_STR; +extern const char* GLOB_DEF_N_STR; +extern const char* GLOB_DEF_K_STR; + +extern const char* GLOB_DEF_NR_STR; + +extern const char* GLOB_DEF_ALPHA_STR; +extern const char* GLOB_DEF_BETA_STR; + + +typedef struct params_s +{ + // Binary name. + const char* bin; + + // Operation name. + const char* opname; + + // Implementation name. + const char* impl; + + // Multithreading parameters: number of threads and affinity string. + const char* nt_str; + long int nt; + const char* af_str; + + // Parameter combinations, datatype, operand storage combination, + // and induced method. + const char* pc_str; + const char* dt_str; + const char* sc_str; + num_t dt; + + const char* im_str; + ind_t im; + + // Problem size range and dimension specifiers. + const char* ps_str; + const char* m_str; + const char* n_str; + const char* k_str; + long int sta; + long int end; + long int inc; + long int m; + long int n; + long int k; + + // Number of repeats. + const char* nr_str; + long int nr; + + // Value of alpha and beta. + const char* alpha_str; + const char* beta_str; + double alpha; + double beta; + + // A flag controlling whether to print informational messages. 
+ bool verbose; + +} params_t; + +typedef void (*init_fp)( params_t* params ); + +// ----------------------------------------------------------------------------- + +void init_def_params( params_t* params ); +void parse_cl_params( int argc, char** argv, init_fp fp, params_t* params ); +void proc_params( params_t* params ); + +// ----------------------------------------------------------------------------- + +bool is_match( const char* str1, const char* str2 ); +bool is_gemm( params_t* params ); +bool is_hemm( params_t* params ); +bool is_herk( params_t* params ); +bool is_trmm( params_t* params ); +bool is_trsm( params_t* params ); + +#ifdef __cplusplus +} +#endif + +#endif
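Reviewer note: the `-h` text describes how the `-m`/`-n`/`-k` specifiers are bound to the current problem size and how a `-p 'lo hi in'` sweep descends from `hi` with `lo` as a floor. The sketch below restates those two rules in standalone C. The helper names `bind_dim` and `sweep_sizes` are hypothetical and do not appear in this patch, and the use of integer division for `p / abs(X)` is an assumption; the drivers themselves may compute these values differently.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical helper (not part of this patch): resolve one dimension
// specifier X against the current problem size p, following the three
// rules printed by the help text.
static long bind_dim( long spec, long p )
{
	// The help text documents X = 0 as undefined, so reject it here.
	assert( spec != 0 );

	if ( spec > 0 )   return spec;   // 0 < X:  hold the dimension constant.
	if ( spec == -1 ) return p;      // X = -1: track the problem size.
	return p / labs( spec );         // X < -1: p / abs(X), assumed integer division.
}

// Hypothetical helper: write the problem sizes of a '-p "lo hi in"' sweep
// into sizes[] in descending order, starting at hi and never going below
// lo. Returns the number of sizes written (at most max).
static size_t sweep_sizes( long lo, long hi, long in, long* sizes, size_t max )
{
	size_t n = 0;

	for ( long p = hi; lo <= p && n < max; p -= in )
		sizes[ n++ ] = p;

	return n;
}
```

With `-p '40 400 80'` this yields 400, 320, 240, 160, 80, which is the set shown in the second `-p` example of the help text; `lo = 40` is never reached because the sweep counts down from `hi`.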