
[Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex


I have an issue with the non-atomic increment/decrement of hblks
(i.e. n_mmaps) in a multi-threaded, mission-critical process that
monitors heap usage as part of supervision. This value can drift away
from the true count and rarely triggers an assertion in the total heap
calculation. There is a similar, disabled assert in glibc:

#ifdef NO_THREADS
  assert(total <= (unsigned long)(mp_.max_total_mem));
  assert(mp_.n_mmaps >= 0);
#endif
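
To show the race outside glibc: a minimal standalone C demo (my own
illustration, not glibc code) where two threads do matched ++/-- on a
plain int, just as the mmap path and munmap_chunk do with mp_.n_mmaps;
the counter should always read 0 but drifts, exactly like hblks does:

/* race.c - minimal demo of the lost-update race (not glibc code).
   Build: gcc -O2 -pthread race.c -o race */
#include <pthread.h>
#include <stdio.h>

/* volatile only keeps the compiler from optimising the matched
   updates away; it does NOT make them atomic. */
static volatile int counter;   /* stands in for mp_.n_mmaps */

static void *worker(void *arg)
{
  int i;
  (void)arg;
  for (i = 0; i < 1000000; i++) {
    counter++;                 /* "mmap": read-modify-write, not atomic */
    counter--;                 /* "munmap": likewise */
  }
  return NULL;
}

int main(void)
{
  pthread_t t1, t2;
  pthread_create(&t1, NULL, worker, NULL);
  pthread_create(&t2, NULL, worker, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("counter = %d (expected 0)\n", counter);
  return 0;
}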

This seems to be a case of not wanting to pay the cost of mutex
contention and instead paying the (application-specific) cost of
providing wrong values. In the context of a malloc call that asks for
2 MB I have not figured out that part yet, but I can't imagine that
keeping a correct total count in a multi-threaded process is beyond
the state of the art (though I am sure it may be beyond the desire of
hackers or the motivation of maintainers). There is a disabled
THREAD_STATS switch, but that doesn't seem to handle this case. The
problem might already be resolved, but at home I can still reproduce
it with the latest glibc:

with Text_Io;
with Unchecked_Deallocation;
procedure T is

   type Mallinfo_Field is (Arena, Ordblks, Smblks, Hblks, Hblkhd,
                           Usmblks, Fsmblks, Uordblks, Fordblks, Keepcost);

   type Mallinfo_Stats is array (Mallinfo_Field) of Integer;
   -- Matches struct mallinfo: ten C ints on 32 bit.

   function mallinfo return Mallinfo_Stats;
   pragma Interface (C, mallinfo);

   type Large is array (1..2**22) of Character; -- 4M
   type Large_Ptr is access Large;
   procedure Free is new Unchecked_Deallocation (Large, Large_Ptr);

   task type Alloc is
      pragma Priority (0);
   end Alloc;

   task body Alloc is
      L : array (1..10) of Large_Ptr;
   begin
      for Count in Natural loop
         for I in L'Range loop
            L(I) := new Large;
         end loop;
         for I in L'Range loop
            Free (L(I));
         end loop;
         -- To check for sufficient parallelism
         -- Test on multi-core/multi-processor
         if Count mod 10000 = 0 then
           Text_Io.Put_Line (Count'Img);
         end if;
      end loop;
   end;

   Loads : array (1..4) of Alloc;

   procedure Show_Stats is
      Result : constant Mallinfo_Stats := mallinfo;
   begin
      for Field in Result'Range loop
         Text_IO.Put (Field'Img & "=" & Result(Field)'Img & " ");
      end loop;
      Text_Io.New_Line;
      if Result(Hblks) not in 0..40 then
         for T in Loads'Range loop
            abort Loads(T);
         end loop;
         raise Program_Error;
      end if;
   end Show_Stats;

begin
   loop
      delay 1.0;
      Show_Stats;
   end loop;
end;
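
For completeness: the test is t.adb (see the error message below) and
uses GNAT-isms ('Img, pragma Interface), so it was built with GNAT,
presumably along the lines of:

$ gnatmake t.adb
$ ./t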

Output:

ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 0 HBLKHD=-20992000 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22700000
 22620000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 17 HBLKHD= 50380800 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22630000
 22570000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 7 HBLKHD= 8396800 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22710000
 22630000
 22640000
 22580000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 2 HBLKHD=-12595200 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22720000
 22640000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 0 HBLKHD=-20992000 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22650000
 22590000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 26 HBLKHD= 88166400 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22730000
 22650000
 22660000
 22600000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS=-1 HBLKHD=-25190400 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560

raised PROGRAM_ERROR : t.adb:52 explicit raise 

Posted on:
http://groups.google.com/group/comp.os.linux.development.apps/browse_thread/thread/80b0c703fe7004c2/f512a2856b0eb3e3?lnk=raot&pli=1

Found a comment stating mutexing was expensive:
http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/c25c962a87d81895/674575250fd87044?q=mallinfo+hblk#674575250fd87044

I've tried adding a mutex, and I don't see what is expensive. Comparing
the latest glibc malloc with the mutex locking added against the default
system glibc, I don't measure a difference: results agree to within 3%
(some fluctuation in the measurement; I just used the time command). I
guess the mutex cost is much smaller than the mmap cost. In any case,
that is only the performance of iterative malloc/free of a 4 MB block;
adding nothing more than a zero-fill of that memory makes the operation
about 1000 times slower, so I don't see a significant issue with this
mutex overhead.
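
The measurement was essentially the loop below (a sketch of my
harness, not the exact code; iteration count illustrative, timed
externally with the time command):

/* bench.c - iterative malloc/free of a 4 MB block, the pattern
   measured above.  Build: gcc -O2 bench.c -o bench
   Run under "time ./bench", with and without the mutex patch. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
  int i;
  for (i = 0; i < 100000; i++) {
    char *p = malloc(4 * 1024 * 1024); /* well above mmap_threshold,
                                          so each call hits mmap/munmap */
    if (p == NULL)
      return 1;
    /* Uncomment to see the zero-fill dominate (about 1000x slower):
       memset(p, 0, 4 * 1024 * 1024); */
    free(p);
  }
  return 0;
}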

If the overhead were significant, then at least a #define THREAD_SAFE 1
should activate a fix, or a dynamic option could be provided; surely a
correct total heap count is achievable.
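
If even one mutex in the mmap path is unacceptable, the counters could
instead be kept exact with atomic updates; a sketch using the GCC
__sync builtins (my suggestion only, not what the patch below does;
mp_.mmapped_mem would need the same treatment):

/* Sketch: lock-free alternative for the mmap statistics (GCC __sync
   builtins; a suggestion, not an actual glibc patch). */
static int n_mmaps;
static int max_n_mmaps;

static void stat_mmap(void)
{
  int n = __sync_add_and_fetch(&n_mmaps, 1);
  int old;
  /* CAS loop so max_n_mmaps never misses a peak */
  do {
    old = max_n_mmaps;
    if (n <= old)
      break;
  } while (!__sync_bool_compare_and_swap(&max_n_mmaps, old, n));
}

static void stat_munmap(void)
{
  __sync_sub_and_fetch(&n_mmaps, 1);
}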

My experiment:

$ diff -U3 arena.c.orig arena.c
--- arena.c.orig	2009-12-13 18:00:55.000000000 +0100
+++ arena.c	2009-12-13 18:03:12.000000000 +0100
@@ -93,6 +93,13 @@
 /* Mapped memory in non-main arenas (reliable only for NO_THREADS). */
 static unsigned long arena_mem;
 
+static mutex_t mp_mmap_mutex;
+/*Serialize access to (malloc.c) mp_.
+  int              n_mmaps;
+  int              n_mmaps_max;
+  int              max_n_mmaps;
+*/
+
 /* Already initialized? */
 int __malloc_initialized = -1;
 
@@ -354,6 +361,8 @@
     if(ar_ptr == &main_arena) break;
   }
   mutex_init(&list_lock);
+  mutex_init(&mp_mmap_mutex);
+
   atfork_recursive_cntr = 0;
 }
 
$ diff -U3 malloc.c.org malloc.c
--- malloc.c.org	2009-12-13 13:26:22.000000000 +0100
+++ malloc.c	2009-12-13 18:13:13.000000000 +0100
@@ -2359,6 +2359,8 @@
    ----------- Internal state representation and initialization -----------
 */
 
+#define THREAD_STATS 1
+
 struct malloc_state {
   /* Serialize access.  */
   mutex_t mutex;
@@ -2443,6 +2445,12 @@
 /* There is only one instance of the malloc parameters.  */
 
 static struct malloc_par mp_;
+static mutex_t mp_mmap_mutex;
+/*Serialize access to (malloc.c) mp_.
+  int              n_mmaps;
+  int              n_mmaps_max;
+  int              max_n_mmaps;
+*/
 
 
 #ifdef PER_THREAD
@@ -3055,7 +3063,8 @@
 	    set_head(p, size|IS_MMAPPED);
 	  }
 
-	/* update statistics */
+	/* begin update statistics */
+	(void)mutex_lock(&mp_mmap_mutex);
 
 	if (++mp_.n_mmaps > mp_.max_n_mmaps)
 	  mp_.max_n_mmaps = mp_.n_mmaps;
@@ -3071,6 +3080,10 @@
 
 	check_chunk(av, p);
 
+	(void)mutex_unlock(&mp_mmap_mutex);
+	/* end update statistics */
+
+
 	return chunk2mem(p);
       }
     }
@@ -3521,6 +3534,7 @@
 #endif
 {
   INTERNAL_SIZE_T size = chunksize(p);
+  (void)mutex_lock(&mp_mmap_mutex);
 
   assert (chunk_is_mmapped(p));
 #if 0
@@ -3544,6 +3558,7 @@
 
   mp_.n_mmaps--;
   mp_.mmapped_mem -= total_size;
+  (void)mutex_unlock(&mp_mmap_mutex);
 
   int ret __attribute__ ((unused)) = munmap((char *)block, total_size);
 
@@ -3726,6 +3741,7 @@
   _int_free(ar_ptr, p, 0);
 #else
 # if THREAD_STATS
+
   if(!mutex_trylock(&ar_ptr->mutex))
     ++(ar_ptr->stat_lock_direct);
   else {

P.S. This is actually a problem in a mission-critical system planned to
become operational on Red Hat EL5 next year. The assertion on positive
results has now been turned into a recoverable error, so it is
non-critical, but we would like to get heap supervision correct in the
future for monitoring of unexpected memory leaks. We pay for support to
HP (who forward such problems to Red Hat), so a request for a fix might
turn up via official channels.

Test still running, but at least getting further than before:
 64660000
 64800000
 64240000
 64810000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 7 HBLKHD= 29388800 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 64670000
 64810000

-- 
           Summary: mallinfo miscounting hblks because of missing mutex
           Product: glibc
           Version: 2.9
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: stef dot van-vlierberghe at telenet dot be
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=11087
