Load/Store Hoisting

August 19, 1998

Mark Mitchell Consulting as part of a continuing contract with the Accelerated Strategic Computing Initiative (ASCI) program at Los Alamos National Laboratory, has contributed code to hoist loads from and stores to memory out of loops. Because many programs spend most of their time inside loops, making loops go faster can result in dramatic improvements in the execution time of the programs.

To see the utility of this new optimization, consider the following function:

  void f(int* k, int j)
  {
    int i;

    for (i = 0; i < j; ++i)
      *k = *k + 2 * ((*k) - 1);
  }

Without the new optimization, the code generated for the loop on an x86 looked like:

.L5:
	movl (%ecx),%eax           # Load *k
	leal -2(%eax,%eax,2),%eax  # Compute *k + 2 * ((*k) - 1)
	movl %eax,(%ecx)           # Store *k
	incl %edx                  # Increment i
	cmpl %ebx,%edx             # Test i < j
	jl .L5                     # Loop if i < j

With the optimization, the code generated for the loop becomes:

	movl (%ebx),%eax           # Load *k
.L5:
	leal -2(%eax,%eax,2),%eax  # Compute *k + 2 * ((*k) - 1)
	incl %edx                  # Increment i
	cmpl %ecx,%edx             # Test i < j
	jl .L5                     # Loop if i < j
	movl %eax,(%ebx)           # Store *k

Note that in the second case the loads and stores occur outside the loop. In this particular case, we can expect the loop to run about 33% faster, since there are 4 instructions instead of 6. (Of course, the vagaries of today's deeply pipelined architectures can play havoc with some estimates, so instruction count is only a rough guess at execution time.)

Another example, of interest to the Fortran community, is code like:

   subroutine gemm(a, b, c, m, n, k)
   integer i,m,n,k,l,j
   dimension a(k,m),  b(n,k),  c(n,m)
   do i=1,m
     do j=1,n
       do l=1,k
	 c(j,i) = c(j,i) + a(l,i)*b(j,l)
       end do
     end do
   end do
   end

Here, the code generated for the inner loop will not load or store c(j,i) inside the loop. Instead, the initial value will be loaded into a register, and the sum will be accumulated there. After the loop is complete, the value will be stored back to memory.


Please send FSF & GNU inquiries & questions to gnu@gnu.org. There are also other ways to contact the FSF.

These pages are maintained by the GCC team.

For questions related to the use of GCC, please consult these web pages and the GCC manuals. If that fails, the gcc-help@gcc.gnu.org mailing list might help.
Please send comments on these web pages and the development of GCC to our developer mailing list at gcc@gnu.org or gcc@gcc.gnu.org. All of our lists have public archives.

Copyright (C) Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110, USA.

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

Last modified 2005-07-11 Valid XHTML 1.0
Mirror provided by HKMirror. Sponsored by Porno Verzameling and webcamsex