Kayla (1.4 GHz ARM Cortex A9)

Christopher Celio edited this page Feb 28, 2014 · 6 revisions

Kayla has four ARMv7 Cortex A9 cores. These are narrow out-of-order processors running at 1.4 GHz.

Cache Sizes and Access Latencies

Figure: access latency vs. array size

  • blue: unit stride
  • green: cache-line stride
  • red: random stride
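The page does not say how the three access patterns were generated. The sketch below shows one common way to lay out a pointer-chase chain for each pattern. It assumes 32-byte cache lines and 4-byte pointers (typical for a 32-bit Cortex A9 build; adjust for your target), and the names `build_chain` and `walk` are hypothetical. Python only illustrates the layout here; a real measurement would time the dependent loads in C.

```python
import random

def build_chain(array_bytes, pattern, line_bytes=32, word_bytes=4, seed=0):
    """Build a pointer-chase chain over an array of `array_bytes` bytes.

    Returns (next_slot, start): next_slot[i] is the word index visited
    after word i. Only words that are part of the chain get meaningful
    entries. Line size and pointer size are assumptions, not measurements.
    """
    n_words = array_bytes // word_bytes
    if pattern == "unit":
        step = 1                          # every word, in address order
    elif pattern in ("line", "random"):
        step = line_bytes // word_bytes   # one word per cache line
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    order = list(range(0, n_words, step))
    if pattern == "random":
        random.Random(seed).shuffle(order)  # random cyclic visit order
    next_slot = [0] * n_words
    # Link each visited slot to the following one, closing the cycle.
    for cur, nxt in zip(order, order[1:] + order[:1]):
        next_slot[cur] = nxt
    return next_slot, order[0]

def walk(next_slot, start, steps):
    """Follow the chain; in C each step would be one dependent load."""
    p = start
    for _ in range(steps):
        p = next_slot[p]
    return p
```

For example, `chain, start = build_chain(256 * 1024, "random")` followed by `walk(chain, start, 10**6)` traverses a 256 kB array in random line order. Because every load depends on the previous one, the time per step approximates the load latency at that array size.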

Results

  • Kayla shows no sign of hardware prefetching: random-stride and cache-line-stride latencies are identical
  • the L1 data cache is 32 kB, with an access latency of 2.85 ns (4 cycles at 1.4 GHz)
  • the L2 cache is 1 MB, with an access latency of ~20 ns
  • off-chip DRAM access latency is ~145 ns
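Converting the measured latencies to core cycles is a single multiplication by the clock frequency. A minimal sketch, using the 1.4 GHz clock stated on this page; `ns_to_cycles` is a hypothetical helper name:

```python
CLOCK_GHZ = 1.4  # Kayla core clock, as stated on this page

def ns_to_cycles(ns, clock_ghz=CLOCK_GHZ):
    """Convert a latency in nanoseconds to core clock cycles (ns * GHz)."""
    return ns * clock_ghz

# Latencies from the results above (L2 and DRAM values are approximate).
for level, ns in (("L1", 2.85), ("L2", 20.0), ("DRAM", 145.0)):
    print(f"{level}: {ns} ns = {ns_to_cycles(ns):.1f} cycles")
```

This puts the L2 at roughly 28 cycles and DRAM at roughly 203 cycles, alongside the 4-cycle L1.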

The latency curves become muddled between 512 kB and 1 MB. According to the ARM Cortex A9 manual, the TLB holds at most 128 entries, which gives a TLB reach of 512 kB (128 entries × 4 kB pages). A single core whose working set exceeds 512 kB will therefore see at least some degradation from TLB misses.
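The TLB-reach argument can be checked with a few lines of arithmetic. This sketch assumes 4 kB pages (the page size implied by 512 kB / 128 entries) and a page-aligned working set; `tlb_reach_bytes` and `pages_needed` are hypothetical helper names.

```python
PAGE_BYTES = 4 * 1024  # assumed 4 kB pages (512 kB / 128 entries)
TLB_ENTRIES = 128      # upper bound quoted from the Cortex A9 manual

def tlb_reach_bytes(entries=TLB_ENTRIES, page_bytes=PAGE_BYTES):
    """Bytes the TLB can map at once: entries x page size."""
    return entries * page_bytes

def pages_needed(working_set_bytes, page_bytes=PAGE_BYTES):
    """Distinct pages a page-aligned working set spans (ceiling division)."""
    return -(-working_set_bytes // page_bytes)

print(tlb_reach_bytes() // 1024, "kB of TLB reach")
for kb in (256, 512, 768, 1024):
    n = pages_needed(kb * 1024)
    print(f"{kb} kB -> {n} pages, fits in TLB: {n <= TLB_ENTRIES}")
```

A 1 MB working set spans 256 pages, twice the TLB capacity, which is consistent with the muddled region sitting between 512 kB and the 1 MB L2 size.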

Multi-threaded Request Bandwidth

Figure: per-thread and aggregate request bandwidth as threads are added

These graphs show both the per-thread request bandwidth and the aggregate bandwidth of the Cortex A9. Each thread performs a number of independent pointer chases. As more threads are added, per-thread bandwidth drops.
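The point of running several independent pointer chases per thread is to give the out-of-order core multiple outstanding misses to overlap. The sketch below shows the loop structure only. It assumes each request fetches one cache line when converting request counts to bandwidth; the page does not say how its bandwidth numbers were computed. `chase_interleaved` and `request_bandwidth` are hypothetical names, and a real measurement would run this loop in C, one copy per thread.

```python
def chase_interleaved(next_slot, starts, steps):
    """Advance several independent chains in lockstep.

    Loads for different chains in the same iteration do not depend on one
    another, so an out-of-order core can overlap their misses; loads within
    a single chain remain serialized.
    """
    ptrs = list(starts)
    for _ in range(steps):
        ptrs = [next_slot[p] for p in ptrs]
    return ptrs

def request_bandwidth(requests, line_bytes, seconds):
    """Bytes per second, assuming each request brings in one cache line."""
    return requests * line_bytes / seconds
```

Aggregate bandwidth would then be the sum of `request_bandwidth` over all threads, while the per-thread figure is each term on its own.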

Peakflops

Performance is comparable to, if a little worse than, Rocket's.

            Add/Mul Mix   Adds Only
MFLOPS      788.9         955.7
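Dividing the peakflops results by the clock gives throughput per cycle, which makes comparison with other cores easier. A minimal sketch, assuming the MFLOPS figures are for a single core at 1.4 GHz (the page does not state the thread count); `flops_per_cycle` is a hypothetical helper name.

```python
CLOCK_MHZ = 1400  # 1.4 GHz core clock

def flops_per_cycle(mflops, clock_mhz=CLOCK_MHZ):
    """Floating-point operations per cycle: MFLOPS / MHz."""
    return mflops / clock_mhz

for name, mflops in (("Add/Mul mix", 788.9), ("Adds only", 955.7)):
    print(f"{name}: {flops_per_cycle(mflops):.2f} flops/cycle")
```

Under that assumption, both workloads sustain well under one floating-point operation per cycle.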