Snail-Paced malloc and Lightning-Fast FastMM



	 

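	My Delphi project creates and frees large numbers of small objects very frequently, so I was worried about efficiency. I opened GetMem.inc to take a look and found that FastMM does a lot of work for small blocks: it keeps a set of preallocated memory pools, one per block size. When a block is requested, FastMM serves it from the pool whose block size fits the request most closely, and when the block is freed it goes back into that pool. This wastes a small amount of memory, but it improves speed considerably.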
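	The pool lookup described above can be sketched in a few lines of C++. This is an illustration only: the size classes in `kClassSizes` and the helper names `classIndexFor` and `wastedBytes` are made up for the example and are not FastMM's actual tables or functions.

```cpp
#include <cstddef>

// Hypothetical size classes in bytes. FastMM's real table is different.
static const std::size_t kClassSizes[] = {16, 32, 48, 64, 96, 128, 192, 256};
static const int kNumClasses = sizeof(kClassSizes) / sizeof(kClassSizes[0]);

// Return the index of the smallest class that can hold `size` bytes,
// or -1 if the request is too large for the small-block pools.
int classIndexFor(std::size_t size)
{
    for (int i = 0; i < kNumClasses; ++i)
        if (size <= kClassSizes[i])
            return i;
    return -1;
}

// Bytes left unused when `size` is served from its class
// (the "small amount of waste" mentioned in the text).
std::size_t wastedBytes(std::size_t size)
{
    int i = classIndexFor(size);
    return i < 0 ? 0 : kClassSizes[i] - size;
}
```

	With these assumed classes, a 100-byte request would be served from the 128-byte pool, leaving 28 bytes unused.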

	 

	I decided to run a test to see how fast it really is:

	 

	 

	program MemBench;  // program name is arbitrary
	{$APPTYPE CONSOLE}

	uses
	  Windows, SysUtils;  // QueryPerformance* and Format

	const
	  cSize = 100;    // bytes per block
	  cNum = 10000;   // blocks allocated per pass
	var
	  N, I: Integer;
	  P: array [0..cNum - 1] of Pointer;
	  Fre: Int64;
	  Count1, Count2: Int64;
	  Time: Double;
	begin
	  QueryPerformanceFrequency(Fre);
	  QueryPerformanceCounter(Count1);

	  for I := 0 to 1000 - 1 do
	  begin
	    for N := 0 to cNum - 1 do
	      GetMem(P[N], cSize);
	    for N := 0 to cNum - 1 do
	      FreeMem(P[N]);
	  end;

	  QueryPerformanceCounter(Count2);
	  Time := (Count2 - Count1) / Fre;  // elapsed seconds
	  Writeln(Format('Delphi2007 Release: %f', [Time]));
	end.

	The example above loops 1000 times; each pass allocates and then frees 10000 blocks of 100 bytes. The result (in seconds):

	 

	  Delphi2007 Release: 0.14

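	An excellent result. Now I can freely use small objects to do the work that records used to do.

	That made me think of C++'s malloc. I wondered how fast it is, so I ported the Delphi test code to C++: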

	 

	 

	#include <windows.h>  // QueryPerformanceCounter, LARGE_INTEGER
	#include <stdio.h>    // printf
	#include <stdlib.h>   // malloc, free

	int main()
	{
	    LARGE_INTEGER fre;
	    LARGE_INTEGER count1, count2;
	    double time;
	    QueryPerformanceFrequency(&fre);
	    const int cSize = 100;    // bytes per block
	    const int cNum = 10000;   // blocks allocated per pass
	    void* p[cNum];

	    QueryPerformanceCounter(&count1);
	    for (int i = 0; i < 1000; ++i)
	    {
	        for (int n = 0; n < cNum; ++n)
	            p[n] = malloc(cSize);
	        for (int n = 0; n < cNum; ++n)
	            free(p[n]);
	    }
	    QueryPerformanceCounter(&count2);
	    // elapsed seconds
	    time = (count2.QuadPart - count1.QuadPart) / (double)fre.QuadPart;
	    printf("VC2008 Release: %f\n", time);
	    return 0;
	}

	The result shocked me. malloc really is snail-paced:

	  VC2008 Release: 3.854

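	It appears that malloc does no optimization at all for small blocks, so you have to be careful when using large numbers of dynamically allocated objects in C++, or you can easily run into performance problems. I tried several replacement memory managers, but none of them could match FastMM; the fastest reached only half its speed.

	Finally I tested a restricted memory manager that I wrote myself. It can only allocate blocks of one fixed size, and it also caches blocks in a pool. The test code: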
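	To put the two timings on a per-operation footing, the totals can be divided by the number of allocate/free pairs (1000 passes of 10000 blocks, i.e. 10 million pairs). The helper below is a small sketch of that arithmetic; the function names `nsPerPair` and `printSummary` are made up for the example, and the inputs are the 0.14 s and 3.854 s figures reported in this post.

```cpp
#include <cstdio>

// Average cost, in nanoseconds, of one allocate/free pair.
double nsPerPair(double seconds, long loops, long blocksPerLoop)
{
    double pairs = static_cast<double>(loops) * blocksPerLoop;
    return seconds / pairs * 1e9;
}

// Print the per-pair cost for the two measurements quoted in the text.
void printSummary()
{
    const long loops = 1000, blocks = 10000;        // test parameters from the post
    double fastmm = nsPerPair(0.14, loops, blocks);  // FastMM timing
    double crt = nsPerPair(3.854, loops, blocks);    // VC2008 malloc timing
    std::printf("FastMM: %.1f ns/pair, malloc: %.1f ns/pair, ratio: %.1fx\n",
                fastmm, crt, crt / fastmm);
}
```

	That works out to roughly 14 ns per pair for FastMM against roughly 385 ns for malloc, a factor of about 27.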

	 

	// Same harness as before; only the allocator changes.
	// FixedAlloc is the author's own class (its source is not shown here).
	LARGE_INTEGER fre;
	LARGE_INTEGER count1, count2;
	double time;
	QueryPerformanceFrequency(&fre);

	const int cSize = 100;
	const int cNum = 10000;
	void* p[cNum];
	FixedAlloc myAlloc(cSize);
	QueryPerformanceCounter(&count1);
	for (int i = 0; i < 1000; ++i)
	{
	    for (int n = 0; n < cNum; ++n)
	    {
	        //p[n] = malloc(cSize);
	        p[n] = myAlloc.Alloc();
	    }
	    for (int n = 0; n < cNum; ++n)
	    {
	        //free(p[n]);
	        myAlloc.Free(p[n]);
	    }
	}
	QueryPerformanceCounter(&count2);
	time = (count2.QuadPart - count1.QuadPart) / (double)fre.QuadPart;
	printf("VC2008 Release: %f\n", time);

	This time I was very pleased with the result:

	  VC2008 Release: 0.0806

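	That is nearly twice as fast as FastMM, but it does not mean FixedAlloc is better than FastMM: FastMM is far more general and handles a lot of other logic. If FixedAlloc were made more complete, its speed would probably end up close to FastMM's. The takeaway is that a program that is very sensitive to allocation speed needs a purpose-built memory manager. Leave the job to snail-paced malloc, and everything runs at a snail's pace.

	Going a step further, I wondered how FastMM performs with its multithreading checks turned on, so I wrote the following test: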
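	The post does not show the source of FixedAlloc. As a rough idea of what such a class can look like, here is a minimal sketch, assuming a singly linked free list threaded through the free blocks and memory obtained from malloc in large chunks. Only the interface (a constructor taking the block size, `Alloc`, `Free`) comes from the test code; the internals, the `blocksPerChunk` parameter, and the use of C++11 are assumptions, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Fixed-size pool allocator sketch (internals assumed, not the author's code).
class FixedAlloc
{
public:
    explicit FixedAlloc(std::size_t blockSize, std::size_t blocksPerChunk = 1024)
        : blockSize_(roundUp(blockSize)),
          blocksPerChunk_(blocksPerChunk ? blocksPerChunk : 1),
          freeList_(nullptr) {}

    FixedAlloc(const FixedAlloc&) = delete;
    FixedAlloc& operator=(const FixedAlloc&) = delete;

    ~FixedAlloc()
    {
        for (void* chunk : chunks_)   // release every chunk in one go
            std::free(chunk);
    }

    // Pop a block off the free list, refilling the list when it is empty.
    void* Alloc()
    {
        if (!freeList_)
            grow();
        Node* n = freeList_;
        freeList_ = n->next;
        return n;
    }

    // Push the block back onto the free list; nothing is returned to malloc.
    void Free(void* p)
    {
        if (!p)
            return;
        Node* n = new (p) Node;       // reuse the block's own storage as a link
        n->next = freeList_;
        freeList_ = n;
    }

private:
    struct Node { Node* next; };

    // Round the block size up so every block can hold a Node and stays aligned.
    static std::size_t roundUp(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node))
            n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    // Get one big chunk from malloc and carve it into free blocks.
    void grow()
    {
        char* chunk = static_cast<char*>(std::malloc(blockSize_ * blocksPerChunk_));
        if (!chunk)
            throw std::bad_alloc();
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocksPerChunk_; ++i)
            Free(chunk + i * blockSize_);
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    Node* freeList_;
    std::vector<void*> chunks_;
};
```

	In this design both `Alloc` and `Free` are a couple of pointer assignments in the common case, which is one plausible reason a fixed-size pool can outrun a general-purpose allocator. The price is that it serves only one block size and never gives memory back before it is destroyed.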

	 

	// Same program as the first test; only the timed section is shown.
	IsMultiThread := True;  // make the memory manager take its thread-safe path
	QueryPerformanceCounter(Count1);

	for I := 0 to 1000 - 1 do
	begin
	  for N := 0 to cNum - 1 do
	    GetMem(P[N], cSize);
	  for N := 0 to cNum - 1 do
	    FreeMem(P[N]);
	end;

	QueryPerformanceCounter(Count2);
	Time := (Count2 - Count1) / Fre;
	Writeln(Format('Delphi2007 Release：%f', [Time]));

	Merely turning IsMultiThread on makes an obvious difference:

	  Delphi2007 Release：0.41

	That takes about three times as long as single-threaded mode (0.41 s versus 0.14 s). But what happens if I handle thread safety myself?

	 

	// CS must be declared in the var section as CS: TRTLCriticalSection;
	IsMultiThread := False;  // FastMM's own locking is off; we lock by hand
	InitializeCriticalSection(CS);

	QueryPerformanceCounter(Count1);

	for I := 0 to 1000 - 1 do
	begin
	  for N := 0 to cNum - 1 do
	  begin
	    EnterCriticalSection(CS);
	    GetMem(P[N], cSize);
	    LeaveCriticalSection(CS);
	  end;
	  for N := 0 to cNum - 1 do
	  begin
	    EnterCriticalSection(CS);
	    FreeMem(P[N]);
	    LeaveCriticalSection(CS);
	  end;
	end;

	QueryPerformanceCounter(Count2);
	Time := (Count2 - Count1) / Fre;
	Writeln(Format('Delphi2007 Release：%f', [Time]));
	DeleteCriticalSection(CS);

	The result is poor:

	  Delphi2007 Release：0.71

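	Unlike the Delphi 7 memory manager, FastMM does not use a critical section to achieve thread safety, so it is faster than that approach. FastMM truly is a top-class memory manager.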
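	The post does not show how FastMM synchronizes instead. One common lightweight alternative to a general-purpose critical section is a small spin lock built on an atomic flag, which an allocator can keep per pool so that threads using different block sizes do not contend. The sketch below is a generic C++11 illustration of that idea, not FastMM's code; the `SpinLock` and `GuardedCounter` names are made up for the example.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Minimal test-and-set spin lock. An uncontended lock or unlock is a single
// atomic operation; a waiting thread yields its time slice and retries.
class SpinLock
{
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }

    void lock()
    {
        while (!try_lock())
            std::this_thread::yield();
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical stand-in for per-pool state that must be updated under the lock.
struct GuardedCounter
{
    SpinLock lock;
    long value = 0;

    void increment()
    {
        std::lock_guard<SpinLock> guard(lock);  // unlocks when guard goes out of scope
        ++value;
    }
};
```

	Whether a lock like this beats a critical section depends on contention and on how long the lock is held, so it should be measured in the same way as the tests above rather than assumed.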

	 

	Source: colin小屋
