Skip to content

Debugging Postgres CPU issues

Guy Margalit edited this page May 1, 2024 · 5 revisions

Using pg_stat_statements to get the top CPU\time consuming queries

  • connect to the DB pod and run psql

    oc rsh noobaa-db-pg-0
    # inside the DB pod use psql
    psql -d nbcore
    
  • load the pg_stat_statements extension

    create extension pg_stat_statements;
    

    (remember to remove it after with drop extension pg_stat_statements;)

  • reset the pg_stat_statement counters

    select pg_stat_reset();
    select pg_stat_statements_reset();
    
  • wait for a few minutes to collect stats

  • use the pg_stat_statements table to get stats. e.g.:

    select * from pg_stat_statements order by total_time desc limit 20;
  • a nicer query to get information in a cleaner way:

    SELECT substring(query, 1, 50) AS short_query,
               round(total_time::numeric, 2) AS total_time,
               calls,
               round(mean_time::numeric, 2) AS mean,
               round((100 * total_time /
               sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu
    FROM    pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 20;

    This should print something similar to this:

                        short_query                     | total_time | calls | mean | percentage_cpu
    ----------------------------------------------------+------------+-------+------+----------------
     SELECT data #- $1 as data FROM systems WHERE data- |    3302.72 |  4023 | 0.82 |          18.01
     SELECT _id, SUM(value) AS value FROM map_aggregate |    2418.17 |  1578 | 1.53 |          13.18
     SELECT (data->$1) AS _id,SUM((data->>$2)::NUMERIC) |    2228.66 |  7073 | 0.32 |          12.15
     SELECT * FROM endpointgroupreports WHERE (data->>$ |    1903.97 |  2292 | 0.83 |          10.38
     SELECT _id, SUM(value) AS value FROM map_aggregate |    1594.72 |  1578 | 1.01 |           8.69
     SELECT data #- $1 as data FROM clusters WHERE data |    1481.01 |  4023 | 0.37 |           8.07
     SELECT _id, SUM(value) AS value FROM map_aggregate |    1259.49 |  1578 | 0.80 |           6.87
     SELECT data #- $1 as data FROM pools WHERE data->$ |     642.63 |  4023 | 0.16 |           3.50
     SELECT data #- $1 as data FROM buckets WHERE data- |     460.30 |  4023 | 0.11 |           2.51
     UPDATE systems SET data = jsonb_set(jsonb_set(data |     356.51 |   794 | 0.45 |           1.94
     SELECT data #- $1 as data FROM agent_configs WHERE |     342.80 |  4023 | 0.09 |           1.87
     SELECT data #- $1 as data FROM accounts WHERE data |     328.19 |  4023 | 0.08 |           1.79
     SELECT data #- $1 as data FROM tiers WHERE data->$ |     275.45 |  4023 | 0.07 |           1.50
     SELECT data #- $1 as data FROM tieringpolicies WHE |     253.92 |  4023 | 0.06 |           1.38
     SELECT data #- $1 as data FROM roles WHERE data->$ |     156.36 |  4023 | 0.04 |           0.85
     UPDATE nodes SET data = jsonb_set(jsonb_set(jsonb_ |     150.63 |   195 | 0.77 |           0.82
     UPDATE pools SET data = jsonb_set(data,$1,$2::json |     150.52 |  2367 | 0.06 |           0.82
     SELECT data #- $1 as data FROM master_keys WHERE d |     135.52 |  4023 | 0.03 |           0.74
     SELECT data #- $1 as data FROM namespace_resources |     117.75 |  4023 | 0.03 |           0.64
     SELECT * FROM funcs WHERE (data->>$1=$2 and data-> |     115.08 |  2292 | 0.05 |           0.63
    
  • another nice query:

    SELECT total_time*'1ms'::interval AS total, mean_time*'1ms'::interval AS avg, calls, query from pg_stat_statements ORDER BY total_time DESC LIMIT 5;

Enabling query logging in core\endpoints logs

  • add the env PG_ENABLE_QUERY_LOG='true' to noobaa-core statefulet or noobaa-endpoint deployment
  • after the pods restart, each DB query is logged in the pods logs, with the label QUERY_LOG. e.g.:
    Jul-14 10:18:21.012 [WebServer/41]    [L0] core.util.postgres_client:: QUERY_LOG: {
        "tag": "T00000000|Q00000395",
        "took": "6.0ms",
        "query": { "text": "SELECT * FROM endpointgroupreports WHERE (data->>'group_name'='598aea64-e9c8-4771-9d94-c263f508f644' and data->'start_time'>='1657792800000'::jsonb and data->'end_time'<='1657793400000'::jsonb)" },
        "clients_pool": { "total": 10, "waiting": 0, "idle": 6 },
        "stack": ["    at log_query (/root/node_modules/noobaa-core/src/util/postgres_client.js:179:17)", "    at _do_query (/root/node_modules/noobaa-core/src/util/postgres_client.js:238:19)", "    at runMicrotasks (<anonymous>)", "    at processTicksAndRejections (internal/process/task_queues.js:95:5)", "    at async PostgresTable.find (/root/node_modules/noobaa-core/src/util/postgres_client.js:767:25)", "    at async _get_endpoint_groups (/root/node_modules/noobaa-core/src/server/system_services/system_server.js:1262:21)", "    at async /root/node_modules/noobaa-core/src/util/promise.js:103:46", "    at async Promise.all (index 8)", "    at async Object.map_props (/root/node_modules/noobaa-core/src/util/promise.js:101:9)", "    at async Object.read_system (/root/node_modules/noobaa-core/src/server/system_services/system_server.js:481:9)"]
    }
    
Clone this wiki locally