Patch 4 #30

Open. Wants to merge 22 commits into base: master
Conversation

zijunlii

Thanks for taking the time to contribute to GCC! Please be advised that if you are
viewing this on github.com, the mirror there is unofficial and unmonitored.
The GCC community does not use github.com for their contributions. Instead, we use
a mailing list ([email protected]) for code submissions, code reviews, and
bug reports. Please send patches there instead.

avieira-arm and others added 22 commits June 19, 2024 17:05
This patch adds support in the target agnostic doloop pass for the detection of
predicated vectorized hardware loops.  Arm is currently the only target that
will make use of this feature.

gcc/ChangeLog:

	* df-core.cc (df_bb_regno_only_def_find): New helper function.
	* df.h (df_bb_regno_only_def_find): Declare new function.
	* loop-doloop.cc (doloop_condition_get): Add support for detecting
	predicated vectorized hardware loops.
	(doloop_modify): Add support for GTU condition checks.
	(doloop_optimize): Update costing computation to support alterations to
	desc->niter_expr by the backend.

Co-authored-by: Stam Markianos-Wright <[email protected]>
This patch adds support for MVE Tail-Predicated Low Overhead Loops by using the
doloop functionality added to support predicated vectorized hardware loops.

gcc/ChangeLog:

	* config/arm/arm-protos.h (arm_target_bb_ok_for_lob): Change
	declaration to pass basic_block.
	(arm_attempt_dlstp_transform): New declaration.
	* config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): Define targethook.
	(TARGET_PREDICT_DOLOOP_P): Likewise.
	(arm_target_bb_ok_for_lob): Adapt condition.
	(arm_mve_get_vctp_lanes): New function.
	(arm_dl_usage_type): New internal enum.
	(arm_get_required_vpr_reg): New function.
	(arm_get_required_vpr_reg_param): New function.
	(arm_get_required_vpr_reg_ret_val): New function.
	(arm_mve_get_loop_vctp): New function.
	(arm_mve_insn_predicated_by): New function.
	(arm_mve_across_lane_insn_p): New function.
	(arm_mve_load_store_insn_p): New function.
	(arm_mve_impl_pred_on_outputs_p): New function.
	(arm_mve_impl_pred_on_inputs_p): New function.
	(arm_last_vect_def_insn): New function.
	(arm_mve_impl_predicated_p): New function.
	(arm_mve_check_reg_origin_is_num_elems): New function.
	(arm_mve_dlstp_check_inc_counter): New function.
	(arm_mve_dlstp_check_dec_counter): New function.
	(arm_mve_loop_valid_for_dlstp): New function.
	(arm_predict_doloop_p): New function.
	(arm_loop_unroll_adjust): New function.
	(arm_emit_mve_unpredicated_insn_to_seq): New function.
	(arm_attempt_dlstp_transform): New function.
	* config/arm/arm.opt (mdlstp): New option.
	* config/arm/iterators.md (dlstp_elemsize, letp_num_lanes,
	letp_num_lanes_neg, letp_num_lanes_minus_1): New attributes.
	(DLSTP, LETP): New iterators.
	* config/arm/mve.md (predicated_doloop_end_internal<letp_num_lanes>,
	dlstp<dlstp_elemsize>_insn): New insn patterns.
	* config/arm/thumb2.md (doloop_end): Adapt to support tail-predicated
	loops.
	(doloop_begin): Likewise.
	* config/arm/types.md (mve_misc): New mve type to represent
	predicated_loop_end insn sequences.
	* config/arm/unspecs.md:
	(DLSTP8, DLSTP16, DLSTP32, DLSTP64,
	LETP8, LETP16, LETP32, LETP64): New unspecs for DLSTP and LETP.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/lob.h: Add new helpers.
	* gcc.target/arm/lob1.c: Use new helpers.
	* gcc.target/arm/lob6.c: Likewise.
	* gcc.target/arm/mve/dlstp-compile-asm-1.c: New test.
	* gcc.target/arm/mve/dlstp-compile-asm-2.c: New test.
	* gcc.target/arm/mve/dlstp-compile-asm-3.c: New test.
	* gcc.target/arm/mve/dlstp-int8x16.c: New test.
	* gcc.target/arm/mve/dlstp-int8x16-run.c: New test.
	* gcc.target/arm/mve/dlstp-int16x8.c: New test.
	* gcc.target/arm/mve/dlstp-int16x8-run.c: New test.
	* gcc.target/arm/mve/dlstp-int32x4.c: New test.
	* gcc.target/arm/mve/dlstp-int32x4-run.c: New test.
	* gcc.target/arm/mve/dlstp-int64x2.c: New test.
	* gcc.target/arm/mve/dlstp-int64x2-run.c: New test.
	* gcc.target/arm/mve/dlstp-invalid-asm.c: New test.

Co-authored-by: Stam Markianos-Wright <[email protected]>
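As a rough illustration of what "tail-predicated" means for the loops this patch targets, here is a scalar C++ model. It assumes the loop counter holds the number of remaining elements, that each iteration handles a fixed number of lanes, and that lanes past the end are masked off in the final iteration. The names `kLanes` and `tail_predicated_add` are hypothetical and do not come from the patch; this is a sketch of the concept, not of the RTL transformation. The caller is assumed to pass `b` with at least as many elements as `a`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a tail-predicated loop; not GCC or MVE code.
// kLanes models a vector of four 32-bit elements.
constexpr std::size_t kLanes = 4;

// Adds b to a element-wise, handling kLanes elements per iteration.
// Returns the number of loop iterations executed.
std::size_t tail_predicated_add(std::vector<std::int32_t>& a,
                                const std::vector<std::int32_t>& b)
{
    std::size_t remaining = a.size();  // element count set up before the loop
    std::size_t base = 0;
    std::size_t iterations = 0;
    while (remaining > 0) {
        // Lanes at or beyond `remaining` are inactive: this is the tail predicate.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            if (lane < remaining)
                a[base + lane] += b[base + lane];
        }
        base += kLanes;
        // Loop end: decrement the element counter by the lane count,
        // clamping at zero on the final partial iteration.
        remaining = remaining > kLanes ? remaining - kLanes : 0;
        ++iterations;
    }
    return iterations;
}
```

With six elements and four lanes the model runs two iterations, and the second one only touches two lanes, so no scalar epilogue loop is needed.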
gcc/fortran/ChangeLog:

	PR fortran/115390
	* trans-decl.cc (gfc_conv_cfi_to_gfc): Move derivation of type sizes
	for character via gfc_trans_vla_type_sizes to after character length
	has been set.

gcc/testsuite/ChangeLog:

	PR fortran/115390
	* gfortran.dg/bind_c_char_11.f90: New test.
Most of the std::pair constructors implemented using C++20 concepts have a
conditional noexcept-specifier, but the default constructor doesn't.
This fixes that.

libstdc++-v3/ChangeLog:

	* include/bits/stl_pair.h [__cpp_lib_concepts] (pair()): Add
	conditional noexcept.
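To show what a conditional noexcept-specifier on a default constructor looks like, here is a minimal sketch using a hypothetical `my_pair` type. It is not the libstdc++ implementation (which is guarded by `__cpp_lib_concepts`); it only assumes that the condition is "both members are nothrow default constructible".

```cpp
#include <type_traits>

// Hypothetical stand-in for a pair-like type; not libstdc++'s std::pair.
template <typename T1, typename T2>
struct my_pair {
    T1 first;
    T2 second;

    // noexcept only when both members can be value-initialized
    // without throwing.
    my_pair()
        noexcept(std::is_nothrow_default_constructible_v<T1>
                 && std::is_nothrow_default_constructible_v<T2>)
        : first(), second() {}
};

// A type whose default constructor is allowed to throw.
struct may_throw {
    may_throw() noexcept(false) {}
};

static_assert(std::is_nothrow_default_constructible_v<my_pair<int, long>>);
static_assert(!std::is_nothrow_default_constructible_v<my_pair<int, may_throw>>);
```

The two `static_assert`s show the effect: the specifier propagates the members' guarantees rather than promising noexcept unconditionally.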
Making the state ready for a std::promise<void> only needs to move a
unique_ptr, which cannot throw. Make its call operator noexcept.
Similarly, making the state ready by storing an exception_ptr also can't
throw, so make that call operator noexcept too.

libstdc++-v3/ChangeLog:

	* include/std/future (_State_baseV2::_Setter<R, void>): Add
	noexcept to call operator.
	(_State_baseV2::_Setter<R, __exception_ptr_tag>): Likewise.
libstdc++-v3/ChangeLog:

	* include/std/future: Adjust whitespace to use tabs for
	indentation.
This patch avoids inserting a MEMW instruction before a load/store
instruction with a volatile memory reference if there is already a MEMW
immediately before it.

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (print_operand):
	When outputting MEMW before the instruction, check if the previous
	instruction is already that.
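The check described above can be pictured with a toy emitter. This is only a sketch: the `Emitter` type and its methods are hypothetical, instructions are modelled as plain strings, and the real change is made in `print_operand` in xtensa.cc.

```cpp
#include <string>
#include <vector>

// Hypothetical toy emitter; instructions are modelled as strings.
struct Emitter {
    std::vector<std::string> out;

    // Emit an ordinary instruction unchanged.
    void emit(const std::string& insn) { out.push_back(insn); }

    // Emit a volatile load/store, preceded by a MEMW barrier unless the
    // previously emitted instruction is already a MEMW.
    void emit_volatile_access(const std::string& insn) {
        if (out.empty() || out.back() != "memw")
            out.push_back("memw");
        out.push_back(insn);
    }
};
```

If a `memw` was just emitted, the volatile access is appended directly; otherwise a `memw` is inserted first, so two volatile accesses in a row still each get their own barrier.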
This patch enables -march/-mtune=shijidadao; costs and tunings are set
according to the characteristics of the processor.

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
	* common/config/i386/i386-common.cc: Add shijidadao.
	* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
	Add ZHAOXIN_FAM7H_SHIJIDADAO.
	* config.gcc: Add shijidadao.
	* config/i386/driver-i386.cc (host_detect_local_cpu):
	Let -march=native recognize shijidadao processors.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
	* config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
	(m_SHIJIDADAO): New definition.
	* config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
	* config/i386/x86-tune-costs.h (struct processor_costs):
	Add shijidadao_cost.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
	(ix86_adjust_cost): Ditto.
	* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add m_SHIJIDADAO.
	(X86_TUNE_USE_GATHER_4PARTS): Ditto.
	(X86_TUNE_USE_GATHER_8PARTS): Ditto.
	(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
	* doc/extend.texi: Add details about shijidadao.
	* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv32.C: Handle new -march
	* gcc.target/i386/funcspec-56.inc: Ditto.
We don't really support _Complex _BitInt(N), the only place we use
bitint complex types is for the .{ADD,SUB,MUL}_OVERFLOW internal function
results and COMPLEX_EXPR in the usual case should be either not present
yet because the ifns weren't folded and will be lowered, or optimized
into something simpler, because normally the complex bitint should be
used just for extracting the 2 subparts from it.
Still, with disabled optimizations it can occasionally happen that it
appears in the IL and that is why there is support for lowering those,
but it doesn't handle optimizing those too much, so if it uses SSA_NAME,
it relies on them having a backing VAR_DECL during the lowering.
This is normally achieved through the
                      && ((is_gimple_assign (use_stmt)
                           && (gimple_assign_rhs_code (use_stmt)
                               != COMPLEX_EXPR))
                          || gimple_code (use_stmt) == GIMPLE_COND)
hunk in gimple_lower_bitint, but as the following testcase shows, there
is one thing I've missed, the load optimization isn't guarded by the
above stuff.  So, either we'd need to add support for loads to
lower_complexexpr_stmt, or because they should be really rare, this
patch just disables the load optimization if at least one load use is
a COMPLEX_EXPR (like we do already for PHIs, calls, asm).

2024-06-19  Jakub Jelinek  <[email protected]>

	PR tree-optimization/115544
	* gimple-lower-bitint.cc (gimple_lower_bitint): Disable optimizing
	loads used by COMPLEX_EXPR operands.

	* gcc.dg/bitint-107.c: New test.
Binutils 2.42 and before don't support Zaamo/Zalrsc. When users specify
both Zaamo and Zalrsc, promote them to 'a' in the -march string.

This does not affect testsuite results for users with old versions of binutils.
Testcases that failed due to 'call'/isa string continue to fail after this PATCH
when using an old version of binutils.

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc: Add 'a' extension to
	riscv_combine_info.

Signed-off-by: Patrick O'Neill <[email protected]>
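The promotion rule can be sketched as a small set operation. The helper name `combine_atomics` is hypothetical and this is not the `riscv_combine_info` mechanism itself; whether the sub-extensions stay listed next to 'a' is not stated in the commit message, so this sketch simply keeps them.

```cpp
#include <set>
#include <string>

// Hypothetical helper: if both atomic sub-extensions are present,
// add the combined 'a' extension to the set.
std::set<std::string> combine_atomics(std::set<std::string> exts)
{
    if (exts.count("zaamo") && exts.count("zalrsc"))
        exts.insert("a");
    return exts;
}
```

Only the combination of both sub-extensions triggers the promotion; either one alone leaves the set unchanged.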
We can unify eqne and other comparison operations.

Tested on RV32 and RV64

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
	* config/riscv/vector.md (@pred_eqne<mode>_scalar): Remove patterns
	(*pred_eqne<mode>_scalar_merge_tie_mask): Ditto
	(*pred_eqne<mode>_scalar): Ditto
	(*pred_eqne<mode>_scalar_narrow): Ditto

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/float-point-cmp-eqne.c: New test.
Add a utility function to check whether a statement is a lane-reducing
operation, which simplifies some existing code.

2024-06-16 Feng Xue <[email protected]>

gcc/
	* tree-vectorizer.h (lane_reducing_stmt_p): New function.
	* tree-vect-slp.cc (vect_analyze_slp): Use new function
	lane_reducing_stmt_p to check statement.
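A predicate of this kind might look like the sketch below. It works on a hypothetical `TreeCode` enum rather than on GCC statements, and the set of codes is taken from the operations named elsewhere in this series (DOT_PROD_EXPR, WIDEN_SUM_EXPR, SAD_EXPR). The real `lane_reducing_stmt_p` operates on GCC's own data structures and may differ.

```cpp
// Hypothetical stand-ins for GCC tree codes; only the names are borrowed.
enum class TreeCode {
    PLUS_EXPR,
    MULT_EXPR,
    DOT_PROD_EXPR,
    WIDEN_SUM_EXPR,
    SAD_EXPR
};

// True for the operations treated as lane-reducing in this sketch.
constexpr bool lane_reducing_op_p(TreeCode code)
{
    return code == TreeCode::DOT_PROD_EXPR
        || code == TreeCode::WIDEN_SUM_EXPR
        || code == TreeCode::SAD_EXPR;
}

static_assert(lane_reducing_op_p(TreeCode::SAD_EXPR));
static_assert(!lane_reducing_op_p(TreeCode::PLUS_EXPR));
```

Centralizing the test in one predicate means callers no longer need to repeat the list of codes.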
In vectorizable_reduction, one check on a reduction operand via index is
subsumed by another check via pointer, so remove the former.

2024-06-16 Feng Xue <[email protected]>

gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated
	check.
Two local variables were defined to refer to the same STMT_VINFO_REDUC_TYPE;
it is better to keep only one.

2024-06-16 Feng Xue <[email protected]>

gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and
	replace it with another local variable reduction_type.
It's better to place the 3 related but independent variables into an array,
since the following patch needs to access them via an index. At the same time,
this change makes some duplicated code more compact.

2024-06-16 Feng Xue <[email protected]>

gcc/
	* tree-vect-loop.cc (vect_transform_reduction): Replace vec_oprnds0/1/2
	with one new array variable vec_oprnds[3].
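The benefit of indexing can be seen in a small model. Here `OperandList` and `total_operands` are hypothetical, with plain integer vectors standing in for the vectorizer's operand vectors; the point is only that one loop over an index replaces three near-identical statements.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical model: three operand lists held in one array so they can be
// selected by operand index instead of by separate variable names.
using OperandList = std::vector<int>;

std::size_t total_operands(const std::array<OperandList, 3>& vec_oprnds)
{
    std::size_t total = 0;
    // One indexed loop instead of one statement per named variable.
    for (std::size_t i = 0; i < vec_oprnds.size(); ++i)
        total += vec_oprnds[i].size();
    return total;
}
```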
According to the logic of the code near the assertion, no lane-reducing
operation should appear there, not just DOT_PROD_EXPR: "use_mask_by_cond_expr_p"
treats SAD_EXPR the same as DOT_PROD_EXPR, and WIDEN_SUM_EXPR is already ruled
out by the following assertion "gcc_assert (commutative_binary_op_p (...))".
So tighten the assertion.

2024-06-16 Feng Xue <[email protected]>

gcc/
	* tree-vect-loop.cc (vect_transform_reduction): Change assertion to
	cover all lane-reducing ops.
When dlopen and pthread_create are in libc, the AC_SEARCH_LIBS result
variable is set to "none required", so running configure shows
the following errors:

./configure: line 8997: test: too many arguments
./configure: line 8999: test: too many arguments
./configure: line 9003: test: too many arguments
./configure: line 9005: test: =: unary operator expected

ChangeLog:

	PR bootstrap/115453
	* configure.ac: Quote variable result of AC_SEARCH_LIBS.  Fix
	typo ac_cv_search_pthread_crate.
	* configure: Regenerate.

Signed-off-by: Collin Funk <[email protected]>