Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Filtering by 'Visit converted specific goal name' using CONTAINS is Incorrect #22816

Open
4 tasks done
bojana-z opened this issue Dec 3, 2024 · 1 comment
Open
4 tasks done
Labels
Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. To Triage An issue awaiting triage by a Matomo core team member

Comments

@bojana-z
Copy link

bojana-z commented Dec 3, 2024

What happened?

When applying a filter for the dimension Visit converted specific goal name with the condition CONTAINS (e.g., "foo"), the results are expected to include only goals containing the specified keyword. However, the filter also incorrectly includes unrelated goals (e.g., "bar").

The issue is reproducible in both Custom Reports and Segments.
The customer mentioned instances where goals with the same goal ID from another website were included, but I could not reproduce this aspect.

image
image
image

What should happen?

The filter should only return goals containing "foo" (foo1 and foo2).

How can this be reproduced?

  1. Set up three distinct goals: foo1, foo2, and bar1.
  2. Create visits associated with each goal.
  3. Apply a filter using the condition CONTAINS "foo" for the dimension Visit converted specific goal name.

Matomo version

5.1.2

PHP version

8.2.19

Server operating system

No response

What browsers are you seeing the problem on?

No response

Computer operating system

No response

Relevant log output

No response

Validations

@bojana-z bojana-z added Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. To Triage An issue awaiting triage by a Matomo core team member labels Dec 3, 2024
@kevin-conn
Copy link

Hi,
Adding to this, since we were the ones bringing it up:

At first, we suspected this was simply because we were misinformed about what such a filter is supposed to do, i.e. rather than showing only goal rows that match the filter, it would show goals that were achieved by visitors that matched the behaviour of the filter. In this case, all goals triggered by visitors that achieved the 'foo' goal, as is the case in the situation pertaining to this issue.

However, not all goals we would expect to result from that visitor-based logic showed up on the report, so this seemed like a different case. Hence, we delved into the SQL resulting from the report definition.

Suspected cause

The MySQL statement tied to a report with this filter showed the following :

SELECT  /*+ MAX_EXECUTION_TIME(7200000) */

                                CASE
                                        WHEN counter = 50001 THEN '__mtm_ranking_query_others__'
                                        ELSE `Goals.GoalName`
                                END AS `Goals.GoalName`
                        ,
                                CASE
                                        WHEN counter = 50001 THEN '__mtm_ranking_query_others__'
                                        ELSE `MarketingCampaignsReporting.CampaignMedium`
                                END AS `MarketingCampaignsReporting.CampaignMedium`

                                , `nb_visits_converted`, `nb_visits`
                        FROM (
                        SELECT
                                `Goals.GoalName`, `MarketingCampaignsReporting.CampaignMedium`,

                        CASE
                                WHEN @counter = 50001 THEN 50001
                                ELSE @counter:=@counter+1
                        END
                 AS counter
                                , `nb_visits_converted`, `nb_visits`
                        FROM
                                ( SELECT @counter:=0 ) initCounter,
                                ( SELECT /* CustomReports */ /* sites 15 */ /* 2024-11-19,2024-11-20 */
                                log_inner.name AS 'Goals.GoalName', log_inner.campaign_medium AS 'MarketingCampaignsReporting.CampaignMedium',
                                sum(case log_inner.visit_goal_converted when 1 then 1 else 0end) AS 'nb_visits_converted',
                                count(distinct log_inner.idvisit) AS 'nb_visits'
                        FROM

        (

                        SELECT
                                goal_idgoal.name,
log_visit.campaign_medium,
log_visit.visit_goal_converted,
log_visit.idvisit
                        FROM
                                matomo_log_conversion AS log_conversion
                                LEFT JOIN matomo_goal AS goal_idgoal ON log_conversion.idgoal = goal_idgoal.idgoal AND goal_idgoal.idsite IN ('15')
                                LEFT JOIN matomo_log_visit AS log_visit ON log_visit.idvisit = log_conversion.idvisit
                                LEFT JOIN matomo_goal AS goal_segment_log_conversionidgoal ON log_conversion.idgoal = goal_segment_log_conversionidgoal.idgoal
                        WHERE
                                log_visit.visit_last_action_time >= '2024-11-19 23:00:00'
                                AND log_visit.visit_last_action_time <= '2024-11-20 22:59:59'
                                AND log_visit.idsite IN ('15') AND (( goal_idgoal.name is not null AND log_visit.campaign_medium is not null ) AND (goal_segment_log_conversionidgoal.name LIKE '%foo%'))
                        GROUP BY
                                goal_idgoal.name, log_visit.campaign_medium, `log_visit`.`idvisit`
                        ORDER BY
                                NULL
        ) AS log_inner
                        GROUP BY
                                log_inner.name, log_inner.campaign_medium
                        ORDER BY
                                nb_visits_converted ) actualQuery
                 ) AS withCounter
                        GROUP BY counter

where the most important section is the log_inner-subquery:

 SELECT
                                goal_idgoal.name,
log_visit.campaign_medium,
log_visit.visit_goal_converted,
log_visit.idvisit
                        FROM
                                matomo_log_conversion AS log_conversion
                                LEFT JOIN matomo_goal AS goal_idgoal ON log_conversion.idgoal = goal_idgoal.idgoal AND goal_idgoal.idsite IN ('15')
                                LEFT JOIN matomo_log_visit AS log_visit ON log_visit.idvisit = log_conversion.idvisit
                                LEFT JOIN matomo_goal AS goal_segment_log_conversionidgoal ON log_conversion.idgoal = goal_segment_log_conversionidgoal.idgoal
                        WHERE
                                log_visit.visit_last_action_time >= '2024-11-19 23:00:00'
                                AND log_visit.visit_last_action_time <= '2024-11-20 22:59:59'
                                AND log_visit.idsite IN ('15') AND (( goal_idgoal.name is not null AND log_visit.campaign_medium is not null ) AND (goal_segment_log_conversionidgoal.name LIKE '%foo%'))
                        GROUP BY
                                goal_idgoal.name, log_visit.campaign_medium, `log_visit`.`idvisit`
                        ORDER BY
                                NULL

And in particular

LEFT JOIN matomo_goal AS goal_segment_log_conversionidgoal ON log_conversion.idgoal = goal_segment_log_conversionidgoal.idgoal

The user has multiple sites, with their own goals. It seems idgoal is reused across sites in the tables, making the join above not specific enough. While the goalname filter is applied to the rows in the table aliased as 'goal_segment_log_conversionidgoal ', the join clause - not containing idsite - returns a join set where the id of a goal that matches 'foo' in a different site, is matched against a goal with the same id on site 15.

For clarification, see the table below, which serves as an example for how these goal records are present in matomo_log_conversion as well as in matomo_goal:

image

The intention of the user is to get foo 1 and foo 2 (since those match idsite 15 and the 'foo' like match). However, they got foo 1, foo 2 and bar 1.
This is because bar 1 shares its idgoal with foo 3 on idsite 16. Since 'goal_segment_log_conversionidgoal' is not filtered on idsite nor joined on it, we end up with additional records in matomo_log_conversion.

I have confirmed that manually adding a filter for goal_segment_log_conversionidgoal on idsite on the join clause or in the where clause does return the expected result set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. To Triage An issue awaiting triage by a Matomo core team member
Projects
None yet
Development

No branches or pull requests

2 participants