Add normalized power metrics #227

Closed
wants to merge 2 commits into from

Conversation

tjablin
Collaborator

@tjablin tjablin commented Sep 20, 2021

No description provided.

@github-actions

github-actions bot commented Sep 20, 2021

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@DilipSequeira
Contributor

@tjablin for clarification - this is a number that appears only in the power tables, and is calculated as measured performance / measured energy?

@rnaidu02
Contributor

rnaidu02 commented Oct 5, 2021

Power WG (Sachin) would like to discuss this item with Tom Jablin.

@blakehechtman

This would be great and make the tables more comprehensible for anyone trying to understand energy efficiency.

@tjablin
Collaborator Author

tjablin commented Oct 18, 2021

@DilipSequeira Yes.

Contributor

@DilipSequeira DilipSequeira left a comment

lgtm

@DilipSequeira
Contributor

Adding Sachin for review from power WG.

@s-idgunji
Contributor

  1. Power WG (Sachin) would like to discuss this item with Tom Jablin.

We discussed this in the Power WG. The concern is that a normalized metric such as Perf/W is context dependent and does not hold constant across a range of performance and power. The metric is only relevant to the measured performance and the measured power it was derived from. If that is the case, it can easily be derived from the given primary metrics. What is the proposal for communicating the following to any reviewer referencing it: the published Perf/W metric is only relevant for the associated measured primary metrics for performance and power, and it cannot be applied to any other performance value, measured or otherwise estimated?
It is a fallacy to assume that reviewers understand the limitations of normalized metrics and will use the metric rationally in comparisons. We would need to provide clear guidance on the use of normalized metrics, and we have not seen such a proposal in the Power WG when discussing this PR.

@tjablin
Collaborator Author

tjablin commented Oct 18, 2021

@s-idgunji I don't understand your comment at all. Could you please provide a concrete example?

@s-idgunji
Contributor

@s-idgunji I don't understand your comment at all. Could please provide a concrete example?

@tjablin - System power is elastic. You can constrain your system to a lower power, say Po1, and for a benchmark get performance Pe1; the Perf/W is then Pe1/Po1.

You can increase the system power to Po2 and get performance Pe2; Perf/W for this scenario is Pe2/Po2.

In general, Pe2/Po2 != Pe1/Po1.

But if the submission is just for Pe1 and Po1, one cannot use the normalized metric Pe1/Po1 for any other performance point such as Pe2. That's what I meant. We have to make sure such aspects are understood through explicit communication, and not assume that reviewers understand them.
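
To make the Pe/Po argument concrete, here is a minimal sketch. The two operating points and all of their numbers are hypothetical and chosen only for illustration; they are not measurements of any submitted system.

```python
# Hypothetical operating points of one system (illustrative numbers only).
operating_points = {
    "low_power": {"perf": 1500.0, "watts": 25.0},   # (Pe1, Po1)
    "high_power": {"perf": 2000.0, "watts": 50.0},  # (Pe2, Po2)
}


def perf_per_watt(point):
    """Return performance divided by power for one measured operating point."""
    return point["perf"] / point["watts"]


eff = {name: perf_per_watt(p) for name, p in operating_points.items()}
print(eff)

# Misuse: applying the low-power efficiency (Pe1/Po1) to the high-power
# performance (Pe2) implies a power draw that was never measured.
implied_watts = operating_points["high_power"]["perf"] / eff["low_power"]
print("implied:", implied_watts, "measured:", operating_points["high_power"]["watts"])
```

With these made-up numbers the implied power is well below the measured Po2, which is the kind of wrong conclusion the explicit guidance is meant to prevent.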

@s-idgunji
Contributor

Power WG will propose the appropriate text for the table and the normalized metric added (if this normalized metric is agreed as an addition by Inference WG)

@psyhtest
Contributor

psyhtest commented Oct 20, 2021

I believe that the Results Guidelines are here to help. In particular, requiring the MLPerf result ID to be included automatically identifies the system under test, the software, etc. Including the normalized Pe1/Po1 metric in the result's row will prevent its misuse with another Pe2 result from another row (with another result ID).

The only problem I see is that someone can take the normalized metric for one workload and use it with another workload in the same row. However, given that samples/Joule may be very different for different workloads (e.g. 200 for ResNet, 10 for BERT), it's only a minor concern.
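
One way to picture this is to keep the normalized metric attached to the row and workload it came from. The sketch below is illustrative only: the `PowerResult` class, the `efficiency_ratio` helper, the result IDs and all numbers are hypothetical (they loosely follow the orders of magnitude mentioned above) and are not part of any MLPerf tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerResult:
    """One (result ID, workload) cell with its own measured perf and power."""
    result_id: str             # hypothetical ID, e.g. "X.Y-001"
    workload: str
    samples_per_second: float
    watts: float

    @property
    def samples_per_joule(self) -> float:
        # Normalized metric, valid only for this row and this workload.
        return self.samples_per_second / self.watts


def efficiency_ratio(new: PowerResult, old: PowerResult) -> float:
    """Compare normalized metrics, refusing to mix workloads."""
    if new.workload != old.workload:
        raise ValueError("normalized metrics from different workloads are not comparable")
    return new.samples_per_joule / old.samples_per_joule


resnet_old = PowerResult("X.Y-001", "resnet50", 2000.0, 10.0)  # 200 samples/J
resnet_new = PowerResult("X.Y-002", "resnet50", 6000.0, 20.0)  # 300 samples/J
bert_old = PowerResult("X.Y-001", "bert-99", 100.0, 10.0)      # 10 samples/J

print(efficiency_ratio(resnet_new, resnet_old))
```

Calling `efficiency_ratio(resnet_new, bert_old)` raises a `ValueError`, which covers the cross-workload misuse described above.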

@s-idgunji
Contributor

Notes from Oct 20th Power WG meeting pertaining to this PR

For PR #227, if the Inference WG votes for adding the normalized metric, the Power WG will add a footnote to the table stating that the normalized metric should be used only in the context of the submitted primary metrics. The exact wording will be worked out in a future meeting, since it is not needed prior to the submission date and is not blocking any work towards the v2.0 submission.

@psyhtest
Contributor

psyhtest commented Apr 12, 2022

The below figure has appeared on NVIDIA's blog with the following caption:

MLPerf v2.0 Inference Edge Closed and Edge Closed Power; Performance/Watt from MLPerf results for respective submissions for Data Center and Edge, Offline Throughput, and Power. NVIDIA Xavier AGX Xavier: 1.1-110 and 1.1-111 | Jetson AGX Orin: 2.0-140 and 2.0-141.

[Figure: edge-performance]

For ResNet50, the claimed Performance improvement of ~3x could have been assumed to be taken from either non-MaxQ (6,138.84 / 2,039.11 = 3.01) or MaxQ (4,750.26 / 1,506.53 = 3.15).

| System | Submission | Performance | Power | Performance / Power |
| --- | --- | --- | --- | --- |
| Xavier | 1.1-110 | 2,039.11 | N/A | N/A |
| Xavier (MaxQ) | 1.1-111 | 1,506.53 | 25.24 | 59.67 |
| Orin | 2.0-140 | 6,138.84 | N/A | N/A |
| Orin (MaxQ) | 2.0-141 | 4,750.26 | 42.42 | 111.98 |

But for BERT-99, the claimed Performance improvement of ~5x could have only come from non-MaxQ (476.34 / 96.73 = 4.92), not from MaxQ (394.33 / 61.35 = 6.43).

| System | Submission | Performance | Power | Performance / Power |
| --- | --- | --- | --- | --- |
| Xavier | 1.1-110 | 96.73 | N/A | N/A |
| Xavier (MaxQ) | 1.1-111 | 61.35 | 18.89 | 3.25 |
| Orin | 2.0-140 | 476.34 | N/A | N/A |
| Orin (MaxQ) | 2.0-141 | 394.33 | 53.59 | 7.36 |

Therefore, it appears that the normalized metric is used out of context here, contrary to the last comment. I think it would be cleaner to have two separate figures:

  1. Showing the improvement of Performance from Xavier (non-MaxQ) to Orin (non-MaxQ) with the caption:

NVIDIA Xavier AGX Xavier: 1.1-110 | Jetson AGX Orin: 2.0-140.

  2. Showing the improvement of Performance / Power (and possibly Performance as well) from Xavier (MaxQ) to Orin (MaxQ) with the caption:

NVIDIA Xavier AGX Xavier: 1.1-111 | Jetson AGX Orin: 2.0-141.

I'm not saying the NVIDIA figure violates any rule today, but we should be more explicit about how to make such comparisons in the future.
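
The performance ratios quoted in this comment can be reproduced from the two tables. This is only a sketch: the `results` dictionary and the `perf_ratio` helper are names introduced here for illustration, and the values are copied from the tables in this comment.

```python
# Performance (samples/s) per workload and submission ID, from the tables above.
results = {
    "resnet50": {
        "1.1-110": 2039.11,  # Xavier
        "1.1-111": 1506.53,  # Xavier (MaxQ)
        "2.0-140": 6138.84,  # Orin
        "2.0-141": 4750.26,  # Orin (MaxQ)
    },
    "bert-99": {
        "1.1-110": 96.73,
        "1.1-111": 61.35,
        "2.0-140": 476.34,
        "2.0-141": 394.33,
    },
}


def perf_ratio(workload, new_id, old_id):
    """Performance improvement between two submissions for one workload."""
    return results[workload][new_id] / results[workload][old_id]


for workload in results:
    non_maxq = perf_ratio(workload, "2.0-140", "1.1-110")
    maxq = perf_ratio(workload, "2.0-141", "1.1-111")
    print(f"{workload}: non-MaxQ {non_maxq:.2f}x, MaxQ {maxq:.2f}x")
```

For ResNet50 both pairs round to roughly 3x, while for BERT-99 only the non-MaxQ pair is close to the claimed ~5x.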

@s-idgunji
Contributor

Hi @psyhtest - We discussed this in the Power WG, and from the discussion with @anirban-ghosh in the meeting, the conclusion was that the table is fine. After discussing with @DilipSequeira and/or @georgelyuan, if there are any questions, please bring them up in the Power WG next week. Thanks.

@psyhtest
Contributor

We discussed this in the Power WG and from the discussions with @anirban-ghosh in the meeting, that the table is fine.

I believe Tejus had the same reservation as me. George agrees that this could have been more explicit. Let's revisit in the next meeting.

@s-idgunji
Contributor

We discussed this in the Power WG and from the discussions with @anirban-ghosh in the meeting, that the table is fine.

I believe Tejus had the same reservation as me. George agrees that this could have been more explicit. Let's revisit in the next meeting.

I don't think that was the case. When it was clarified that the comparison is between consistent {Perf, Power} tuples, there was no issue. From the Power WG's side, we'd like to be clear on what exactly the issue is that you are raising.

@psyhtest
Contributor

Let $P_{id}$ be the Performance (samples per second) and $W_{id}$ be the Power of submission number $id$, and let $R_{id} = P_{id} / W_{id}$ be their ratio, i.e. the normalized metric. For example, for ResNet50:

| System | Submission ($id$) | Performance ($P_{id}$) | Power ($W_{id}$) | Performance / Power ($R_{id}$) |
| --- | --- | --- | --- | --- |
| Xavier | 1.1-110 | 2,039.11 | N/A | N/A |
| Xavier (MaxQ) | 1.1-111 | 1,506.53 | 25.24 | 59.67 |
| Orin | 2.0-140 | 6,138.84 | N/A | N/A |
| Orin (MaxQ) | 2.0-141 | 4,750.26 | 42.42 | 111.98 |

The figure effectively shows $P_{\text{2.0-140}} / P_{\text{1.1-110}} = 3.01$ as "Perf" bars, and $R_{\text{2.0-141}} / R_{\text{1.1-111}} = 1.88$ as "Perf/Watt" bars next to each other.

But the latter is actually

$$\frac{R_{\text{2.0-141}}}{R_{\text{1.1-111}}} = \frac{P_{\text{2.0-141}} / W_{\text{2.0-141}}}{P_{\text{1.1-111}} / W_{\text{1.1-111}}} = \frac{P_{\text{2.0-141}}}{P_{\text{1.1-111}}} \cdot \frac{W_{\text{1.1-111}}}{W_{\text{2.0-141}}} = 3.15 \times 0.59.$$

In summary, the adjacent "Perf" and "Perf/Watt" bars are based on the ratio of different Performance values.
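
As a quick numerical check of this decomposition, the sketch below recomputes the Perf/Watt ratio from the MaxQ rows of the ResNet50 table and splits it into a performance factor and a power factor. The variable names are introduced here for illustration only.

```python
import math

# MaxQ rows of the ResNet50 table: performance (samples/s) and power (W).
perf = {"1.1-111": 1506.53, "2.0-141": 4750.26}
watts = {"1.1-111": 25.24, "2.0-141": 42.42}

# Ratio of normalized metrics, R_new / R_old.
eff_ratio = (perf["2.0-141"] / watts["2.0-141"]) / (perf["1.1-111"] / watts["1.1-111"])

# The same quantity written as (performance ratio) * (inverse power ratio).
perf_ratio = perf["2.0-141"] / perf["1.1-111"]
power_ratio = watts["1.1-111"] / watts["2.0-141"]

assert math.isclose(eff_ratio, perf_ratio * power_ratio)
print(f"R ratio {eff_ratio:.3f} = perf ratio {perf_ratio:.3f} * power ratio {power_ratio:.3f}")
```

The performance factor here is the MaxQ one (about 3.15), not the non-MaxQ 3.01 shown in the adjacent "Perf" bars.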

@psyhtest
Contributor

We know the ratio $W_{\text{1.1-111}} / W_{\text{2.0-141}} = 0.59$ because $W_{\text{1.1-111}}$ and $W_{\text{2.0-141}}$ were measured and reported by NVIDIA.

@s-idgunji As you said many times yourself, the Perf/Watt ratio cannot be assumed constant across different modes such as MaxN and MaxQ.

Unfortunately, we do not know the MaxN ratio $W_{\text{1.1-110}} / W_{\text{2.0-140}}$. But, hypothetically, if $W_{\text{1.1-110}} = 50$ and $W_{\text{2.0-140}} = 70$, this ratio is 0.71. Conversely, if $W_{\text{1.1-110}} = 50$ and the ratio is assumed to be the same as the MaxQ ratio (0.59), then $W_{\text{2.0-140}} = 50 / 0.59 \approx 85$. Without taking any Orin measurements myself, I find the former hard to believe.
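
The hypothetical arithmetic above can be spelled out as follows. The 50 W and 70 W values are the made-up MaxN powers from this comment, not measurements, and 0.59 is the MaxQ power ratio as quoted.

```python
maxq_power_ratio = 0.59      # W_1.1-111 / W_2.0-141, as quoted in this comment
xavier_maxn_watts = 50.0     # hypothetical W_1.1-110
orin_maxn_watts = 70.0       # hypothetical W_2.0-140

# Case 1: both MaxN powers are assumed, giving the MaxN power ratio.
ratio_if_70w = xavier_maxn_watts / orin_maxn_watts

# Case 2: the MaxQ ratio is assumed to carry over, giving the implied Orin MaxN power.
orin_watts_if_maxq_ratio = xavier_maxn_watts / maxq_power_ratio

print(f"MaxN ratio with 70 W: {ratio_if_70w:.2f}")
print(f"Orin MaxN power with the MaxQ ratio: {orin_watts_if_maxq_ratio:.1f} W")
```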

@s-idgunji
Contributor

=============================================================================

@s-idgunji As you said many times yourself, the Perf/Watt ratio cannot be assumed constant across different modes such as MaxN and MaxQ.

And nothing in the table is breaking this ask. As far as I can tell, no Watt value is attached to the Perf-only submission, and the Max Perf value is not used in the Perf/W comparison. The rule is that any Perf/W can use Perf with its associated Watt and no other, and one can only refer to the measured Perf/W in the comparison.

If the normalized data in the NVIDIA table needs to be presented separately and footnoted clearly to avoid confusion, and you have suggestions, please reach out to NVIDIA directly. If, on the other hand, you think that the ratios used for Perf/W are incorrect, i.e. an incorrect Perf value paired with an incorrect Power value, then please point this out in our next WG meeting.

And do you want to propose guidelines on how Perf/W should be normalized and how results should be presented by submitters? That can be discussed separately as well.

But we want to be careful, because Perf/W comparisons can eventually also lead to Perf comparisons. For example, how can submitters compare "Perf only" Perf data against "Perf, Power" Perf data? Is that a valid comparison? Some submitter could do that. So you can see that this can become increasingly complicated, as submissions can cover anything from energy-efficient points to max-performance points.

I also want to point out that this PR is specifically about adding a footnote to the externally visible tables on our MLCommons website, stating that any normalized Perf/W data is only valid for that particular entry and no other.

@DilipSequeira
Contributor

DilipSequeira commented Apr 18, 2022

[Just catching up on this thread.]

Suppose that MLPerf had additional metrics (say, DRAM usage, and system noise in dB) with entries optimized for each using different SW configuration options. Would it be legitimate to present a comparison of two systems across the scenarios capturing all of these things in a single chart? I think it's useful to be able to do so, and we should think about what criteria we want to impose in such cases.

I would say, at minimum:

  • each measurement point in the graph is apples-to-apples (so in the case of dB, it should compare the noise-optimized configurations; DRAM should compare the DRAM-optimized configurations; and the same for unconstrained perf and power efficiency.)
  • the measurement axis can be sensibly numbered (which in practice I think requires normalizing to one system so that the values are dimensionless.)
  • footnotes should capture which comparison points are being taken from which lines in the results tables.

AFAICT our blog post conforms to the first two of these, and IMO is consistent with the spirit of the rules as well as the letter. It doesn't conform to the third, and hence is somewhat imprecise about what's being presented. I think it would be reasonable to refine the requirements around footnoting so that it's clearer how such tables should be annotated in future rounds to prevent any possible confusion.
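
The three criteria can be sketched in code: pair each metric with its own submissions, normalize to one system so the axis is dimensionless, and emit a footnote naming the source rows. The `Point` class and `normalize` helper are hypothetical names introduced for this example; the values are the ResNet50 numbers quoted earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    metric: str      # what is being compared
    system: str
    value: float
    result_id: str   # row of the results table the value was taken from


def normalize(points, baseline_system):
    """Return per-metric values relative to the baseline, plus footnote lines."""
    by_metric = {}
    for p in points:
        by_metric.setdefault(p.metric, {})[p.system] = p
    chart, footnotes = {}, []
    for metric, systems in by_metric.items():
        base = systems[baseline_system]
        # Dimensionless values: every system divided by the baseline system.
        chart[metric] = {name: p.value / base.value for name, p in systems.items()}
        sources = ", ".join(f"{name} from {p.result_id}" for name, p in systems.items())
        footnotes.append(f"{metric}: {sources}")
    return chart, footnotes


points = [
    Point("Performance", "Xavier", 2039.11, "1.1-110"),
    Point("Performance", "Orin", 6138.84, "2.0-140"),
    Point("Performance/Watt", "Xavier", 59.67, "1.1-111"),
    Point("Performance/Watt", "Orin", 111.98, "2.0-141"),
]
chart, footnotes = normalize(points, baseline_system="Xavier")
print(chart)
print("\n".join(footnotes))
```

The footnote lines make explicit that the two bars of a pair come from different rows, which is the third criterion.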

@arjunsuresh
Contributor

arjunsuresh commented Apr 15, 2023

I think it would be useful to merge this PR. Joules/stream can be added as an additional column in the results table, as it makes life so much easier for any user reading the table. The following data is taken from the 3.0 results and is for the resnet50 model. The perf and power metric columns are from the results table, and I'm not sure a normal user can do a simple comparison based on the values shown, as different scenarios are involved. But the last column, Samples/Joule, gives a direct comparison and a useful metric to the user, and it also gives a power efficiency comparison across the 4 different scenarios.

PS: In the table below, Power is reported for the Offline and Server scenarios, and energy in millijoules for SingleStream and MultiStream, as in the official results table.

Log path Performance Power metric Samples/Joule
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/dlrm-99.9/offline/performance 3034740 2,741.71 1106.878788
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/dlrm-99/offline/performance 3034740 2,741.71 1106.878788
closed/Neuchips/results/RecAccel-N3000-32GB-PCIEx8/dlrm-99.9/server/performance 856398 807.88 1060.060104
closed/Neuchips/results/RecAccel-N3000-32GB-PCIEx8/dlrm-99/server/performance 856398 807.88 1060.060104
closed/Neuchips/results/RecAccel-N3000-32GB-PCIEx8/dlrm-99.9/offline/performance 811472 785.42 1033.168796
closed/Neuchips/results/RecAccel-N3000-32GB-PCIEx8/dlrm-99/offline/performance 811472 785.42 1033.168796
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/dlrm-99.9/offline/performance 1561860 2,118.61 737.2112893
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/dlrm-99/offline/performance 1561860 2,118.61 737.2112893
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/dlrm-99.9/server/performance 1501100 2,369.85 633.4157061
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/dlrm-99/server/performance 1501100 2,369.85 633.4157061
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/dlrm-99.9/server/performance 1000520 2,027.42 493.4933427
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/dlrm-99/server/performance 1000520 2,027.42 493.4933427
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/resnet50/offline/performance 9801.8 31.23 313.8814512
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/resnet50/offline/performance 6810.02 23.77 286.4782428
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/resnet50/offline/performance 7025.39 24.76 283.7624202
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/resnet50/offline/performance 6792.57 24.11 281.7195911
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/resnet50/offline/performance 7041.3 25.29 278.4752736
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/resnet50/offline/performance 169564 701.31 241.7821122
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/resnet50/server/performance 145881 641.53 227.3953862
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 2.203008 36.32 220.2750183
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/resnet50/offline/performance 333078 1,565.21 212.8006963
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/resnet50/offline/performance 106199 516.56 205.58917
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 2.05857 40.05 199.7746
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/resnet50/server/performance 290034 1,467.63 197.6208108
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/resnet50/offline/performance 361537 1,829.45 197.6207526
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/resnet50/server/performance 345043 1,751.47 197.0023219
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 2.161895 42.12 189.9384042
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 2.128724 42.56 187.9569184
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/resnet50/offline/performance 79193.9 426.60 185.6391474
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/resnet50/offline/performance 94615.2 511.91 184.8291172
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/resnet50/server/performance 93750.3 516.46 181.5263558
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 2.158166 45.02 177.7111309
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/resnet50/offline/performance 353626 2,219.58 159.3213184
closed/NVIDIA/results/Orin_TRT_MaxQ/resnet50/offline/performance 3450.66 22.66 152.2841207
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/resnet50/offline/performance 24768 168.68 146.8366338
closed/SiMa/results/davinci_dualm2/resnet50/offline/performance 2255.01 16.11 140.0053697
closed/SiMa/results/davinci_dualm2/resnet50/multistream/performance 3.708546 58.05 137.8133027
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/resnet50/offline/performance 46417.2 345.90 134.191037
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/resnet50/offline/performance 256554 2,049.73 125.1647797
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/resnet50/offline/performance 23574.7 189.13 124.645759
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.514955 8.26 121.075129
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.668982 8.98 111.3383251
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/resnet50/server/performance 240018 2,213.47 108.4353189
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.64872 10.08 99.24231726
closed/NVIDIA/results/Orin_TRT_MaxQ/resnet50/multistream/performance 4.766884 80.76 99.05938565
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.639139 10.17 98.34434972
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/resnet50/server/performance 203512 2,085.13 97.6016586
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.645762 10.61 94.22404252
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 0.976803 100.65 79.48697832
closed/cTuning/results/nvidia_orin-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/resnet50/offline/performance 4657.25 70.23 66.31824254
closed/SiMa/results/davinci_dualm2/resnet50/singlestream/performance 1.181702 15.29 65.42117983
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 0.918261 124.39 64.31431908
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/resnet50/offline/performance 36428.9 603.53 60.35976178
closed/cTuning/results/nvidia_orin-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/resnet50/multistream/performance 2.56649 132.98 60.15897963
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 0.605694 146.52 54.60054815
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 0.615673 155.29 51.51777049
closed/NVIDIA/results/Orin_TRT_MaxQ/resnet50/singlestream/performance 1.637467 22.19 45.05872747
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/rnnt/offline/performance 98719.4 2,192.92 45.01726173
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/resnet50/multistream/performance 0.576147 180.81 44.24587227
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/resnet50/multistream/performance 0.432391 192.35 41.59120604
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/rnnt/server/performance 87999.5 2,218.62 39.66410377
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/rnnt/offline/performance 77675.7 2,020.98 38.43458232
closed/cTuning/results/nvidia_orin-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/resnet50/singlestream/performance 0.725518 29.06 34.41446313
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/rnnt/server/performance 75001 2,200.49 34.0838213
closed/NVIDIA/results/Orin_TRT_MaxQ/rnnt/offline/performance 692.51 24.22 28.59002698
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.407407 38.73 25.81966126
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/rnnt/offline/performance 15295.1 599.47 25.51439565
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/resnet50/offline/performance 12752.9 634.01 20.11452377
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.408104 52.75 18.95781157
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/resnet50/offline/performance 2796.77 168.66 16.58256632
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/resnet50/multistream/performance 0.832071 511.74 15.63286321
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/resnet50/singlestream/performance 0.226915 72.72 13.75152787
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/resnet50/multistream/performance 3.264295 605.48 13.21266484
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/bert-99/offline/performance 39229.3 2,998.33 13.08371299
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/bert-99/offline/performance 373.874 30.82 12.13129049
closed/NVIDIA/results/Orin_TRT_MaxQ/bert-99/offline/performance 267.572 22.49 11.89850537
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.410575 85.87 11.64562103
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/bert-99/offline/performance 252.32 22.53 11.1990616
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/bert-99/offline/performance 274.943 24.79 11.09164715
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.410102 90.21 11.08563257
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/bert-99/server/performance 33003.6 3,061.41 10.78052622
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/bert-99/offline/performance 253.788 23.56 10.77209862
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/resnet50/singlestream/performance 0.336022 93.39 10.70751928
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/bert-99.9/offline/performance 33419.1 3,130.69 10.67468394
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/bert-99/offline/performance 260.633 24.78 10.51900339
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/bert-99/offline/performance 5009.67 539.40 9.287431489
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/bert-99/server/performance 4802.16 536.71 8.947376756
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/bert-99/offline/performance 22584.3 2,524.63 8.945605009
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/bert-99.9/server/performance 25005 2,952.66 8.468620721
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/bert-99/offline/performance 3327.67 415.60 8.006905384
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/bert-99/server/performance 17299.1 2,164.98 7.990418389
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/resnet50/singlestream/performance 0.701612 125.99 7.936998353
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/bert-99/offline/performance 10055.5 1,302.23 7.721740455
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/bert-99/server/performance 9752.5 1,296.42 7.522617992
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/bert-99/offline/performance 11313.4 1,537.51 7.35826326
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/bert-99/server/performance 11000.3 1,525.94 7.208854591
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/bert-99/offline/performance 2955.46 414.45 7.131007397
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/bert-99/offline/performance 4197.14 608.33 6.899476487
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/bert-99/offline/performance 3069.27 472.39 6.497374177
closed/NVIDIA/results/Orin_TRT_MaxQ/bert-99/singlestream/performance 7.870582 160.19 6.242433762
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/rnnt/offline/performance 3950.42 645.67 6.118295065
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/bert-99/server/performance 2724.61 454.27 5.997759049
closed/cTuning/results/nvidia_orin-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/bert-99/offline/performance 453.309 76.95 5.89078215
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/retinanet/offline/performance 173.697 30.71 5.655264851
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/rnnt/offline/performance 1054.86 186.67 5.651049532
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/bert-99/offline/performance 806.772 159.47 5.058927507
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/retinanet/offline/performance 111.169 22.43 4.955992066
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/bert-99/offline/performance 1461.9 304.13 4.806829063
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/resnet50/singlestream/performance 0.329008 208.14 4.804348381
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/retinanet/offline/performance 112.112 23.42 4.786461685
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/retinanet/offline/performance 116.215 24.40 4.76353692
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/bert-99.9/offline/performance 2854.97 599.98 4.758479497
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/retinanet/offline/performance 115.002 24.37 4.719813595
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/bert-99.9/server/performance 2700.13 587.06 4.599438044
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/bert-99/offline/performance 751.38 164.67 4.562844688
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/bert-99.9/offline/performance 11298 2,535.05 4.456722017
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 12.316776 251.77 3.971885876
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/bert-99.9/offline/performance 5716.29 1,455.17 3.928258
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/retinanet/offline/performance 1941.88 498.91 3.892213817
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/bert-99.9/server/performance 5527.06 1,438.04 3.843473734
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/bert-99.9/offline/performance 6424.67 1,672.21 3.842030693
closed/Qualcomm/results/r282_z93_q8e-qaic-v1.8.3.7-aic100/retinanet/server/performance 1849.27 482.96 3.829002064
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 109.566888 2,100.29 3.80900227
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/bert-99.9/server/performance 6202.78 1,649.90 3.759484276
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 13.107688 274.79 3.639100403
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 113.80329 2,224.38 3.596511687
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 12.944093 284.84 3.510703156
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 12.557293 285.01 3.508676276
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/bert-99.9/server/performance 7503.01 2,158.16 3.476577001
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 12.571941 291.45 3.431110515
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 116.698626 2,338.63 3.420809726
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 113.651634 2,340.85 3.417563383
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/retinanet/offline/performance 1248.81 366.98 3.40291782
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 119.702703 2,448.27 3.267610297
closed/cTuning/results/nvidia_orin-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/bert-99/singlestream/performance 7.89606 307.01 3.257215252
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/retinanet/offline/performance 3885.39 1,208.26 3.215687912
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/bert-99.9/offline/performance 1489.65 463.76 3.212135793
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/bert-99.9/server/performance 1373.07 430.79 3.187307441
closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/retinanet/server/performance 3775.19 1,186.35 3.182183256
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/retinanet/offline/performance 4370.21 1,416.32 3.085603647
closed/Qualcomm/results/g292_z43_q18e-qaic-v1.8.3.7-aic100/retinanet/server/performance 4252.12 1,380.82 3.079425499
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/retinanet/offline/performance 1103.4 359.95 3.065420318
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/retinanet/offline/performance 5982.06 2,253.30 2.654803061
closed/Qualcomm/results/gloria_highend-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 26.351145 381.99 2.617851604
closed/Qualcomm/results/heimdall-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 25.207966 395.74 2.526923234
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/retinanet/offline/performance 1047.38 419.34 2.497703666
closed/Dell/results/r7515_q4_pro-qaic-v1.8.3.7-aic100/retinanet/server/performance 986.641 405.31 2.43426131
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/retinanet/server/performance 5602.54 2,311.03 2.424260519
closed/NVIDIA/results/Orin_TRT_MaxQ/retinanet/offline/performance 49.8437 20.69 2.409569985
closed/Qualcomm/results/gloria-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 25.36629 416.84 2.398978009
closed/Krai/results/rb6-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 26.023671 423.03 2.363886211
closed/Krai/results/eb6-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 25.310616 431.15 2.319353803
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/bert-99/singlestream/performance 1.086149 462.57 2.161843282
closed/NVIDIA/results/Orin_TRT_MaxQ/retinanet/multistream/performance 147.789685 4,025.73 1.98721717
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/retinanet/offline/performance 282.036 142.82 1.974749936
closed/NVIDIA/results/Orin_TRT_MaxQ/retinanet/singlestream/performance 26.523473 521.34 1.918135163
closed/Krai/results/firefly-armnn-v22.11-opencl/resnet50/multistream/performance 687.29616 4,324.38 1.849974936
closed/Krai/results/firefly-armnn-v22.11-opencl/resnet50/singlestream/performance 83.884781 540.55 1.849974936
closed/Krai/results/firefly-armnn-v22.11-opencl/resnet50/offline/performance 12.1753 6.59 1.846687975
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/retinanet/offline/performance 4688.4 2,554.03 1.835689614
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/retinanet/server/performance 3600.34 2,021.05 1.781417904
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/retinanet/offline/performance 485.69 273.12 1.77829712
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/retinanet/offline/performance 252.99 150.40 1.682109044
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/bert-99/offline/performance 916.886 641.04 1.430306992
closed/Krai/results/firefly-armnn-v22.11-neon/resnet50/offline/performance 9.32308 7.46 1.249831143
closed/Krai/results/firefly-armnn-v22.11-neon/resnet50/multistream/performance 1099.089368 6,423.48 1.245430304
closed/Krai/results/firefly-armnn-v22.11-neon/resnet50/singlestream/performance 117.327576 802.94 1.245430304
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 7.936098 835.49 1.196902474
closed/Krai/results/orin-armnn-v22.11-neon/resnet50/multistream/performance 347.573352 6,832.75 1.170831112
closed/Krai/results/orin-armnn-v22.11-neon/resnet50/singlestream/performance 32.320006 854.09 1.170831112
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/bert-99/offline/performance 209.462 179.46 1.167211053
closed/Krai/results/orin-armnn-v22.11-neon/resnet50/offline/performance 34.5406 29.69 1.163399691
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 67.437102 7,059.58 1.13321245
closed/Krai/results/vim4-armnn-v22.11-opencl/resnet50/offline/performance 6.82026 6.11 1.115573903
closed/Krai/results/vim4-armnn-v22.11-opencl/resnet50/singlestream/performance 149.796417 899.28 1.112002581
closed/Krai/results/vim4-armnn-v22.11-opencl/resnet50/multistream/performance 1205.681864 7,194.23 1.112002581
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 26.068952 7,488.56 1.068296883
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 32.805165 7,866.55 1.016964532
closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50/offline/performance 9.61291 11.38 0.8448729852
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 81.780047 10,209.44 0.7835884737
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/retinanet/multistream/performance 45.262914 10,233.24 0.7817664006
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/bert-99/singlestream/performance 8.561233 1,283.86 0.7789016257
closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50/multistream/performance 1065.274624 10,367.49 0.7716430359
closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50/singlestream/performance 116.239871 1,295.94 0.7716430359
closed/Krai/results/orin-tflite-v2.11.0-ruy/resnet50/singlestream/performance 42.463961 1,371.26 0.7292572216
closed/Krai/results/orin-tflite-v2.11.0-ruy/resnet50/multistream/performance 481.127488 10,970.07 0.7292572216
closed/Krai/results/orin-tflite-v2.11.0-ruy/resnet50/offline/performance 23.6254 32.44 0.7282915007
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 10.838145 1,411.97 0.7082310992
closed/Lenovo/results/se350_q1_pro-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 15.555088 1,419.53 0.7044571842
closed/Krai/results/vim4-armnn-v22.11-neon/resnet50/offline/performance 5.1439 7.59 0.6775745766
closed/Krai/results/vim4-armnn-v22.11-neon/resnet50/singlestream/performance 196.290049 1,475.95 0.6775291076
closed/Krai/results/vim4-armnn-v22.11-neon/resnet50/multistream/performance 1759.284824 11,807.61 0.6775291076
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/bert-99/singlestream/performance 2.545482 1,547.83 0.6460673855
closed/NVIDIA/results/Orin_TRT_MaxQ/rnnt/singlestream/performance 117.224916 1,719.19 0.5816709277
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 7.554779 1,802.41 0.5548126757
closed/Krai/results/vim4-tflite-v2.11.0-ruy/resnet50/offline/performance 4.69909 9.93 0.4734411869
closed/Krai/results/vim4-tflite-v2.11.0-ruy/resnet50/multistream/performance 2178.966032 16,911.18 0.4730598279
closed/Krai/results/vim4-tflite-v2.11.0-ruy/resnet50/singlestream/performance 221.74806 2,113.90 0.4730598279
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 10.137517 2,152.61 0.4645533401
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/bert-99/singlestream/performance 10.994449 2,318.62 0.4312916868
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/rnnt/singlestream/performance 9.930471 2,532.51 0.3948645999
closed/Dell/results/xr4520c_q2_lite-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 23.109562 2,689.11 0.3718697456
closed/HPE/results/e920d_q4_std-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 13.183635 2,924.15 0.3419791731
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/retinanet/offline/performance 47.1028 169.15 0.2784725033
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/retinanet/offline/performance 174.883 634.85 0.2754724442
closed/Qualcomm/results/r282_z93_q5e-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 18.562301 3,670.90 0.2724124546
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/retinanet/multistream/performance 148.492157 30,749.65 0.260165563
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/retinanet/multistream/performance 50.54385 31,234.10 0.2561303255
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/retinanet/singlestream/performance 6.329648 3,935.81 0.2540771799
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/retinanet/singlestream/performance 19.853798 4,029.38 0.248177181
closed/Lenovo/results/se450_q4_lite-qaic-v1.8.3.7-aic100/retinanet/singlestream/performance 23.263048 4,474.58 0.223484734
closed/cTuning/results/amd_zen4_workstation-reference-gpu-onnxruntime-v1.14.0-default_config/retinanet/singlestream/performance 16.701465 7,553.37 0.132391233
closed/cTuning/results/amd_zen4_workstation-reference-gpu-onnxruntime-v1.14.0-default_config/retinanet/multistream/performance 146.676104 60,426.96 0.132391233
closed/cTuning/results/amd_zen4_workstation-reference-gpu-onnxruntime-v1.14.0-default_config/retinanet/offline/performance 64.7113 490.28 0.1319873533
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/rnnt/singlestream/performance 49.023009 19,868.42 0.05033111728
closed/Dell/results/XR4520c_MaxQ_A2x1_TRT_MaxQ/rnnt/singlestream/performance 176.672759 22,294.14 0.04485483097
closed/NVIDIA/results/Orin_TRT_MaxQ/3d-unet-99.9/singlestream/performance 5137.065075 72,545.18 0.01378451309
closed/NVIDIA/results/Orin_TRT_MaxQ/3d-unet-99/singlestream/performance 5137.065075 72,545.18 0.01378451309
closed/NVIDIA/results/Orin_TRT_MaxQ/3d-unet-99.9/offline/performance 0.376526 27.33 0.01377843114
closed/NVIDIA/results/Orin_TRT_MaxQ/3d-unet-99/offline/performance 0.376526 27.33 0.01377843114
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/3d-unet-99.9/offline/performance 26.8433 2,115.17 0.01269086449
closed/NVIDIA/results/H100-PCIe-80GBx8_TRT_MaxQ/3d-unet-99/offline/performance 26.8433 2,115.17 0.01269086449
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/3d-unet-99.9/offline/performance 19.7839 1,779.17 0.01111976633
closed/NVIDIA/results/A100-PCIe-80GBx8_TRT_MaxQ/3d-unet-99/offline/performance 19.7839 1,779.17 0.01111976633
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/3d-unet-99/offline/performance 4.51052 588.31 0.007666908607
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/3d-unet-99.9/offline/performance 4.5111 588.59 0.00766422461
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/3d-unet-99/singlestream/performance 431.88247 130,492.84 0.007663255942
closed/cTuning/results/amd_zen4_workstation-nvidia_original-gpu-tensorrt-v8.5.2.2-default_config/3d-unet-99.9/singlestream/performance 431.9247 130,504.53 0.00766256942
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/3d-unet-99.9/offline/performance 1.0704 635.34 0.001684778766
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/3d-unet-99/offline/performance 1.0704 635.34 0.001684778766
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/3d-unet-99.9/singlestream/performance 1831.18923 597,150.05 0.00167462099
closed/Dell/results/XR5610_L4x1_TRT_MaxQ/3d-unet-99/singlestream/performance 1831.18923 597,150.05 0.00167462099

@arjunsuresh
Contributor

Adding the link to the full results with the unified power metric (inferences per joule) here.
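For anyone re-checking the listing above, here is a minimal Python sketch that parses one row. `PowerRow` and `parse_row` are hypothetical names, and the column meanings (result path, performance, measured power or energy, samples/J) are inferred from the rows themselves, not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class PowerRow:
    submitter: str
    system: str
    model: str
    scenario: str
    performance: float        # second column; unit depends on the scenario
    electrical: float         # third column; assumed to be power or energy
    samples_per_joule: float  # fourth column, as published in the listing


def parse_row(line: str) -> PowerRow:
    """Parse one whitespace-separated row of the listing (assumed layout)."""
    path, perf, elec, spj = line.split()
    # Assumed path layout:
    # <division>/<submitter>/results/<system>/<model>/<scenario>/performance
    _division, submitter, _results, system, model, scenario, _leaf = path.split("/")
    return PowerRow(
        submitter=submitter,
        system=system,
        model=model,
        scenario=scenario,
        performance=float(perf),
        electrical=float(elec.replace(",", "")),  # "1,208.26" -> 1208.26
        samples_per_joule=float(spj),
    )


row = parse_row(
    "closed/Qualcomm/results/g292_z43_q16e-qaic-v1.8.3.7-aic100/"
    "retinanet/offline/performance 3885.39 1,208.26 3.215687912"
)
print(row.submitter, row.model, row.scenario, row.electrical)
```

The thousands separators in the third column are the main thing to watch for; a plain `float()` call would reject them.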

@mrmhodak
Contributor

mrmhodak commented Jul 5, 2023

Needs more discussion, need opinion from Power

@s-idgunji
Contributor

Needs more discussion, need opinion from Power

We can discuss this. We had already agreed to adding energy metrics. What is this new update?

@s-idgunji
Contributor

s-idgunji commented Jul 12, 2023

@mrmhodak - We had a discussion in the Power WG meeting.

  1. It is unclear how samples/J is calculated: it cannot be derived unambiguously from Performance, because the performance column has no unit associated with it, nor from Power.

  2. Media and reviewers viewing the table already use the measured Perf and measured Power data to derive the metric, so adding a third column is redundant (no new information, other than ease of reading).

  3. Krai claims that there have been issues with using the measured metrics to derive an energy efficiency figure. Some links/websites were mentioned. Krai will provide the data/evidence for such issues.

  4. Measured metrics must always be in the results table.

  5. The Inference Power methodology document in the MLPerf Power area clearly states which metrics are to be reported: if the measurement is a rate (Queries/Sec or Samples/Sec), the associated electrical metric is Power (Joules/Sec), i.e. a rate. If the measurement is time or latency, the associated electrical metric is Energy (Joules or mJ).

  6. It's even more complicated/ambiguous for single stream and multi-stream given current reporting: the performance metric is latency (with no reference to throughput, or to what specifically is measured). The electrical metric is Energy (the 1:1 corresponding metric). It is not evident how the suggested samples/J is derived from the currently reported metrics {Latency, Energy}. Needs further discussion.
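To make points 5 and 6 concrete, here is a rough Python sketch of one derivation that appears consistent with the samples/J column in the rows listed earlier in this thread. It is not a confirmed description of how the column was produced. The units are assumptions: watts for offline/server, and millijoules per query for singlestream/multistream. The count of 8 samples per multistream query is also an assumption, and `samples_per_joule` is a hypothetical helper name.

```python
import math

# Assumptions, inferred from the listed rows rather than stated in the thread:
#   offline / server:           performance is a rate (samples/s or queries/s),
#                               electrical metric is power in watts (J/s)
#   singlestream / multistream: performance is latency,
#                               electrical metric is energy per query in mJ
SAMPLES_PER_QUERY = {"singlestream": 1, "multistream": 8}  # assumed values


def samples_per_joule(scenario: str, performance: float, electrical: float) -> float:
    """Return samples per joule under the unit assumptions listed above."""
    if scenario in ("offline", "server"):
        # (samples/s) / (J/s) = samples/J
        return performance / electrical
    if scenario in SAMPLES_PER_QUERY:
        # (samples per query) / (energy per query in J); latency is not used
        return SAMPLES_PER_QUERY[scenario] / (electrical / 1000.0)
    raise ValueError(f"unknown scenario: {scenario}")


# Spot checks against three rows of the listing (values copied from the rows;
# a tolerance is needed because the listed inputs are rounded).
checks = [
    ("offline", 3885.39, 1208.26, 3.215687912),
    ("singlestream", 7.89606, 307.01, 3.257215252),
    ("multistream", 147.789685, 4025.73, 1.98721717),
]
for scenario, perf, elec, published in checks:
    derived = samples_per_joule(scenario, perf, elec)
    print(scenario, derived, math.isclose(derived, published, rel_tol=1e-4))
```

Under these assumptions the latency value plays no part in the singlestream and multistream results. This matches the concern in point 6 that the number cannot be traced back to the reported {Latency, Energy} pair without extra, unstated information.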

The next step is to follow up at the next meeting (we had to cover another 3.1 submission topic).

Sachin

@mrmhodak
Copy link
Contributor

Closing, no longer relevant with new spreadsheet coming soon.

@mrmhodak mrmhodak closed this Oct 31, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Oct 31, 2023