I wrote an instruction cache simulator in PAL code for the DECchip 21064A Alpha processor (running on an "Avanti" machine). I then used ATOM to instrument the kernel and several applications, inserting a PAL call to the cache simulator at each basic block, and used my own PAL tools to instrument the PAL code itself. I ran two parallel simulations of the 16 KB on-chip, virtually indexed instruction cache, one including PAL instruction references and one excluding them. Comparing the two let me measure the effect of PAL code on the instruction cache. Roughly 10% of on-chip instruction cache misses appear to be attributable to PAL code execution on the 21064A. When these misses hit in the board-level cache, each costs 20-28 cycles, depending on whether the instruction cache line was prefetched.
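The two-simulation comparison can be sketched in a few lines of Python. This is not the PAL-code simulator described above. It is a minimal illustration under stated assumptions: the 16 KB cache is modeled as direct-mapped with 32-byte lines, indexed directly by virtual address (no translation), and the trace format `(vaddr, ninsns, is_pal)` per basic block is hypothetical. The names `DirectMappedICache`, `simulate`, and `pal_miss_fraction` are likewise made up for illustration. Both caches see every non-PAL basic block, but only the first also sees PAL blocks. The extra misses in the first cache therefore include both misses on PAL code itself and misses on kernel/user code that PAL code evicted.

```python
# Hypothetical sketch: two parallel direct-mapped I-cache simulations,
# one fed PAL references and one not. Geometry values are assumptions.
CACHE_BYTES = 16 * 1024  # 16 KB on-chip instruction cache
LINE_BYTES = 32          # assumed line size
INSN_BYTES = 4           # Alpha instructions are 4 bytes


class DirectMappedICache:
    """Direct-mapped cache indexed by virtual address (assumed model)."""

    def __init__(self, size=CACHE_BYTES, line=LINE_BYTES):
        self.line = line
        self.nlines = size // line
        # Each slot remembers the full line number it holds; comparing
        # line numbers is equivalent to comparing tags at a fixed index.
        self.slots = [None] * self.nlines
        self.misses = 0

    def fetch_block(self, vaddr, ninsns):
        """Fetch a basic block of ninsns instructions starting at vaddr."""
        first = vaddr // self.line
        last = (vaddr + ninsns * INSN_BYTES - 1) // self.line
        for line_no in range(first, last + 1):
            idx = line_no % self.nlines
            if self.slots[idx] != line_no:
                self.slots[idx] = line_no  # miss: fill the line
                self.misses += 1


def simulate(trace):
    """Return (misses with PAL refs, misses without PAL refs)."""
    with_pal = DirectMappedICache()
    without_pal = DirectMappedICache()
    for vaddr, ninsns, is_pal in trace:
        with_pal.fetch_block(vaddr, ninsns)
        if not is_pal:
            without_pal.fetch_block(vaddr, ninsns)
    return with_pal.misses, without_pal.misses


def pal_miss_fraction(trace):
    """Fraction of misses in the with-PAL run attributable to PAL code."""
    with_pal, without_pal = simulate(trace)
    if with_pal == 0:
        return 0.0
    return (with_pal - without_pal) / with_pal


if __name__ == "__main__":
    # A user block and a PAL block 16 KB apart map to the same index,
    # so each PAL call evicts the user block.
    trace = [(0x10000, 8, False), (0x14000, 8, True)] * 2 + [(0x10000, 8, False)]
    print(simulate(trace), pal_miss_fraction(trace))
```

With this contrived trace, the run that includes PAL references takes 5 misses while the run without them takes 1. The sketch would therefore attribute 80% of the misses to PAL code. Real traces would of course give very different numbers.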

Cache lines affected by PAL code entry points:

Gcc from SPEC92:

Ghostscript and X11:

Ttcp (TCP throughput benchmark):