As stated in Section II-A, in addition to directly measuring the throughput and latency of instruction forms that include memory references in combination with register operands,
The DG matrix can be naturally partitioned into matrix blocks as sketched in Fig. 1.
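A minimal sketch of such a partition follows. It assumes a dense matrix whose rows and columns are ordered element by element, and a hypothetical `basis_per_element` list giving the number of DG basis functions in each element. The functions `block_offsets` and `partition_into_blocks` are illustrative names, not taken from the source, and the actual layout in Fig. 1 may differ.

```python
from itertools import accumulate
from typing import List

Matrix = List[List[float]]


def block_offsets(basis_per_element: List[int]) -> List[int]:
    """Start index of each element's basis functions; the last entry is the total size."""
    return [0] + list(accumulate(basis_per_element))


def partition_into_blocks(A: Matrix, basis_per_element: List[int]) -> List[List[Matrix]]:
    """Split a square matrix into blocks B[i][j] that couple element i with element j.

    Assumes (hypothetically) that rows and columns are grouped contiguously by element.
    """
    offs = block_offsets(basis_per_element)
    n = offs[-1]
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("matrix size does not match total number of basis functions")
    nel = len(basis_per_element)
    blocks: List[List[Matrix]] = []
    for i in range(nel):
        row_blocks: List[Matrix] = []
        for j in range(nel):
            # Rows belonging to element i, columns belonging to element j.
            block = [row[offs[j]:offs[j + 1]] for row in A[offs[i]:offs[i + 1]]]
            row_blocks.append(block)
        blocks.append(row_blocks)
    return blocks


if __name__ == "__main__":
    # Two hypothetical elements with 2 and 1 basis functions give a 3x3 matrix.
    A = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
    B = partition_into_blocks(A, [2, 1])
    print(B[0][1])  # [[3.0], [6.0]]
    print(B[1][0])  # [[7.0, 8.0]]
```

Each block is copied out here for clarity. A production code would more likely keep index ranges or views into the global storage rather than materializing copies.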
However, conventional KS-DFT calculations exhibit cubic computational complexity, O(N^3), with respect to the system size N.
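To illustrate what cubic scaling implies in practice, assume the cost T is dominated by its leading term with some constant c. Under that assumption, doubling the system size multiplies the cost by roughly a factor of eight:

```latex
T(N) \approx c\,N^{3}
\quad\Longrightarrow\quad
\frac{T(2N)}{T(N)} \approx \frac{c\,(2N)^{3}}{c\,N^{3}} = 8
```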