Compiler Directed Coherence
Compiler (software) manages cache coherence
- No directory hardware (too complex, not scalable)
- Compiler divides the program into computation units (epochs); possibly stale cached data is invalidated at epoch boundaries
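A minimal Python sketch of why epoch boundaries matter. It assumes private write-through caches with no hardware coherence, and a compiler that emits invalidations at an epoch boundary for variables written in the previous epoch. The `Cache` class and `run` function are hypothetical: a toy model, not a real hardware or compiler interface.

```python
class Cache:
    """Hypothetical per-processor cache with NO hardware coherence.

    Writes go through to shared memory (write-through assumption), but
    copies held in other processors' caches are never updated by hardware.
    """

    def __init__(self, memory):
        self.memory = memory  # shared main memory: name -> value
        self.lines = {}       # this processor's cached copies

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: the copy may be stale

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value         # write-through to memory

    def invalidate(self, addrs):
        for a in addrs:
            self.lines.pop(a, None)       # drop the copy if present


def run(compiler_inserts_invalidation):
    memory = {"x": 1}
    p0, p1 = Cache(memory), Cache(memory)

    # Epoch 1: P0 reads x and caches the value 1.
    p0.read("x")

    # Epoch 2: P1 writes x; only memory and P1's cache see the new value.
    p1.write("x", 2)

    # Epoch boundary: in this model the compiler knows x was written in
    # epoch 2, so it emits an invalidation of x on every processor.
    if compiler_inserts_invalidation:
        for c in (p0, p1):
            c.invalidate(["x"])

    # Epoch 3: P0 reads x again.
    return p0.read("x")


print(run(compiler_inserts_invalidation=False))  # 1 (stale copy)
print(run(compiler_inserts_invalidation=True))   # 2 (re-fetched)
```

Invalidating on every processor is the simplest choice; a real compiler-directed scheme would try to invalidate more selectively, which this sketch does not model.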
repeat                                   /* compute vector x_{i+1} = A*x_i + b */
    doall j = 1 to N                     /* create N parallel tasks */
        xtemp[j] = b[j];
        for k = 1 to N
            xtemp[j] = xtemp[j] + A[j,k]*x[k];
        end for
    end doall
    doall j = 1 to N                     /* create N parallel tasks */
        x[j] = xtemp[j];                 /* write new vector */
    end doall                            /* end of epoch */
until all vectors computed
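A runnable Python sketch of the loop above, treating the end of each `doall` as an epoch boundary. Each `doall` is modeled with `ThreadPoolExecutor.map`, and draining its results with `list(...)` makes every task finish before the next loop starts. The function name `solve` and the fixed `num_iters` stopping rule (standing in for "until all vectors computed") are assumptions for illustration. Python threads show the task structure, not parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def solve(A, b, x, num_iters, workers=4):
    """Compute x_{i+1} = A*x_i + b for num_iters iterations (x updated in place)."""
    n = len(b)
    xtemp = [0.0] * n

    def compute_row(j):
        # First doall, task j: reads all of x (read-only here), writes only xtemp[j].
        acc = b[j]
        for k in range(n):
            acc += A[j][k] * x[k]
        xtemp[j] = acc

    def copy_row(j):
        # Second doall, task j: reads xtemp[j], writes only x[j].
        x[j] = xtemp[j]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(num_iters):
            # list(...) waits for all N tasks: the end of the doall acts as a barrier.
            list(pool.map(compute_row, range(n)))
            list(pool.map(copy_row, range(n)))
    return x


print(solve([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], [0.0, 0.0], 3))  # [2.0, 1.0]
```

Splitting the update into two loops is what makes it safe: every row in the first loop reads only old values of x, and x is overwritten only after all of xtemp is complete.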