This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex
- From: "stef dot van-vlierberghe at telenet dot be" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: 14 Dec 2009 08:56:34 -0000
- Subject: [Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex
- Reply-to: sourceware-bugzilla at sourceware dot org
I have an issue with the non-atomic increment/decrement of hblks (i.e. n_mmaps) in a multi-threaded, mission-critical process that monitors heap usage as part of its supervision. The value can drift away from the true count and rarely triggers an assertion in our total-heap calculation. There is a similar, disabled assert in glibc:
#ifdef NO_THREADS
  assert(total <= (unsigned long)(mp_.max_total_mem));
  assert(mp_.n_mmaps >= 0);
#endif
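To make the race concrete, here is a minimal stand-alone sketch (my own illustration, not glibc code): several threads doing an unsynchronized ++/-- on a plain int, the way mp_.n_mmaps is updated. The final value should be 0, but on a multi-core machine it usually is not:

/* Compile with: gcc -std=gnu99 -O2 -pthread race.c */
#include <pthread.h>
#include <stdio.h>

static int n_mmaps;                     /* plays the role of mp_.n_mmaps */

static void *churn(void *arg)
{
  (void)arg;
  for (long i = 0; i < 10000000; i++) {
    n_mmaps++;                          /* "mmap": load, add, store */
    n_mmaps--;                          /* "munmap": load, sub, store */
  }
  return NULL;
}

int main(void)
{
  pthread_t t[4];
  for (int i = 0; i < 4; i++)
    pthread_create(&t[i], NULL, churn, NULL);
  for (int i = 0; i < 4; i++)
    pthread_join(t[i], NULL);
  printf("n_mmaps = %d (expected 0)\n", n_mmaps);
  return 0;
}

Because ++ and -- are separate load/modify/store sequences, concurrent updates can be lost in either direction, which is exactly why hblks can even go negative in the output below.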
This seems to be a case of not wanting to pay the cost of mutex contention, paying instead the (application-specific) cost of providing wrong values. I have not yet worked out what that mutex would cost in the context of a malloc call that asks for 2 Mb, but I can't imagine that keeping a correct total count in a multi-threaded process is beyond the state of the art (though it may be beyond the desire of hackers or the motivation of maintainers). There is a disabled THREAD_STATS switch, but it doesn't seem to cover this case. The problem might already be resolved, but at home I can still reproduce it with the latest glibc:
with Text_Io;
with Unchecked_Deallocation;

procedure T is

   type Mallinfo_Field is (Arena, Ordblks, Smblks, Hblks, Hblkhd,
                           Usmblks, Fsmblks, Uordblks, Fordblks, Keepcost);
   type Mallinfo_Stats is array (Mallinfo_Field) of Integer;
   -- on 32 bit.

   function mallinfo return Mallinfo_Stats;
   pragma Interface (C, mallinfo);

   type Large is array (1..2**22) of Character; -- 4M
   type Large_Ptr is access Large;

   procedure Free is new Unchecked_Deallocation (Large, Large_Ptr);

   task type Alloc is
      pragma Priority (0);
   end Alloc;

   task body Alloc is
      L : array (1..10) of Large_Ptr;
   begin
      for Count in Natural loop
         for I in L'Range loop
            L(I) := new Large;
         end loop;
         for I in L'Range loop
            Free (L(I));
         end loop;
         -- To check for sufficient parallelism
         -- Test on multi-core/multi-processor
         if Count mod 10000 = 0 then
            Text_Io.Put_Line (Count'Img);
         end if;
      end loop;
   end Alloc;

   Loads : array (1..4) of Alloc;

   procedure Show_Stats is
      Result : constant Mallinfo_Stats := mallinfo;
   begin
      for Field in Result'Range loop
         Text_Io.Put (Field'Img & "=" & Result(Field)'Img & " ");
      end loop;
      Text_Io.New_Line;
      if Result(Hblks) not in 0..40 then
         for T in Loads'Range loop
            abort Loads(T);
         end loop;
         raise Program_Error;
      end if;
   end Show_Stats;

begin
   loop
      delay 1.0;
      Show_Stats;
   end loop;
end T;
Output:
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 0 HBLKHD=-20992000 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
22700000
22620000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 17 HBLKHD= 50380800 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
22630000
22570000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 7 HBLKHD= 8396800 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
22710000
22630000
22640000
22580000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 2 HBLKHD=-12595200 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
22720000
22640000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 0 HBLKHD=-20992000 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
22650000
22590000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 26 HBLKHD= 88166400 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
22730000
22650000
22660000
22600000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS=-1 HBLKHD=-25190400 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
raised PROGRAM_ERROR : t.adb:52 explicit raise
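For readers without an Ada toolchain, a rough C translation of the reproducer might look like this (a sketch I have not run against the same glibc; the names and the 0..40 threshold mirror the Ada version; link with -pthread):

#include <malloc.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define LARGE (4UL * 1024 * 1024)       /* 4 MB blocks, served via mmap */

static void *alloc_task(void *arg)      /* one "Alloc" task */
{
  char *l[10];
  (void)arg;
  for (;;) {
    for (int i = 0; i < 10; i++)
      l[i] = malloc(LARGE);
    for (int i = 0; i < 10; i++)
      free(l[i]);
  }
  return NULL;
}

int main(void)
{
  pthread_t loads[4];
  for (int i = 0; i < 4; i++)
    pthread_create(&loads[i], NULL, alloc_task, NULL);
  for (;;) {                            /* the Show_Stats loop */
    sleep(1);
    struct mallinfo mi = mallinfo();
    printf("HBLKS=%d HBLKHD=%d\n", mi.hblks, mi.hblkhd);
    if (mi.hblks < 0 || mi.hblks > 40)
      abort();                          /* counter has drifted */
  }
}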
Posted on:
http://groups.google.com/group/comp.os.linux.development.apps/browse_thread/thread/80b0c703fe7004c2/f512a2856b0eb3e3?lnk=raot&pli=1
I found a comment stating that mutexing was expensive:
http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/c25c962a87d81895/674575250fd87044?q=mallinfo+hblk#674575250fd87044
I've tried adding a mutex, and I don't see what is expensive. Comparing the latest version of glibc malloc with the mutex locking added against the default system glibc, I measure no difference: the results agree to within 3% (there are some fluctuations in the measurement; I just used the time command). I guess the mutex cost is much smaller than the mmap cost. In any case, that is only the performance of an iterative malloc/free of a 4 MB block; adding nothing more than a zero-fill of that memory makes the operation about 1000 times slower, so I don't see a significant issue with this mutex overhead.
Should the overhead prove significant after all, then at least a #define THREAD_SAFE 1 or a dynamic option should activate a fix; surely a correct total heap count is achievable.
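For what it's worth, if the mutex overhead ever did matter, the counters could also be kept exact without any lock. A sketch using the GCC __sync atomic builtins (my own illustration, not part of the patch below; the helper names note_mmap/note_munmap are mine):

static int n_mmaps;
static int max_n_mmaps;

static void note_mmap(void)
{
  /* Atomically count the new mapping. */
  int n = __sync_add_and_fetch(&n_mmaps, 1);
  /* Raise max_n_mmaps to at least n; retry if another thread raced us. */
  int old = max_n_mmaps;
  while (old < n && !__sync_bool_compare_and_swap(&max_n_mmaps, old, n))
    old = max_n_mmaps;
}

static void note_munmap(void)
{
  __sync_sub_and_fetch(&n_mmaps, 1);
}

That would avoid the contention question entirely, at the cost of a slightly trickier update for max_n_mmaps.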
My experiment:
$ diff -U3 arena.c.orig arena.c
--- arena.c.orig 2009-12-13 18:00:55.000000000 +0100
+++ arena.c 2009-12-13 18:03:12.000000000 +0100
@@ -93,6 +93,13 @@
 /* Mapped memory in non-main arenas (reliable only for NO_THREADS). */
 static unsigned long arena_mem;
 
+static mutex_t mp_mmap_mutex;
+/* Serialize access to (malloc.c) mp_.
+     int n_mmaps;
+     int n_mmaps_max;
+     int max_n_mmaps;
+*/
+
 /* Already initialized? */
 int __malloc_initialized = -1;
 
@@ -354,6 +361,8 @@
     if(ar_ptr == &main_arena) break;
   }
   mutex_init(&list_lock);
+  mutex_init(&mp_mmap_mutex);
+
   atfork_recursive_cntr = 0;
 }
$ diff -U3 malloc.c.org malloc.c
--- malloc.c.org 2009-12-13 13:26:22.000000000 +0100
+++ malloc.c 2009-12-13 18:13:13.000000000 +0100
@@ -2359,6 +2359,8 @@
   ----------- Internal state representation and initialization -----------
 */
 
+#define THREAD_STATS 1
+
 struct malloc_state {
   /* Serialize access. */
   mutex_t mutex;
@@ -2443,6 +2445,12 @@
 /* There is only one instance of the malloc parameters. */
 
 static struct malloc_par mp_;
 
+static mutex_t mp_mmap_mutex;
+/* Serialize access to (malloc.c) mp_.
+     int n_mmaps;
+     int n_mmaps_max;
+     int max_n_mmaps;
+*/
 
 #ifdef PER_THREAD
@@ -3055,7 +3063,8 @@
       set_head(p, size|IS_MMAPPED);
     }
 
-    /* update statistics */
+    /* begin update statistics */
+    (void)mutex_lock(&mp_mmap_mutex);
 
     if (++mp_.n_mmaps > mp_.max_n_mmaps)
       mp_.max_n_mmaps = mp_.n_mmaps;
@@ -3071,6 +3080,10 @@
     check_chunk(av, p);
 
+    (void)mutex_unlock(&mp_mmap_mutex);
+    /* end update statistics */
+
+
     return chunk2mem(p);
   }
 }
@@ -3521,6 +3534,7 @@
 #endif
 {
   INTERNAL_SIZE_T size = chunksize(p);
+  (void)mutex_lock(&mp_mmap_mutex);
 
   assert (chunk_is_mmapped(p));
 #if 0
@@ -3544,6 +3558,7 @@
 
   mp_.n_mmaps--;
   mp_.mmapped_mem -= total_size;
+  (void)mutex_unlock(&mp_mmap_mutex);
 
   int ret __attribute__ ((unused)) = munmap((char *)block, total_size);
 
@@ -3726,6 +3741,7 @@
     _int_free(ar_ptr, p, 0);
 #else
 # if THREAD_STATS
+
   if(!mutex_trylock(&ar_ptr->mutex))
     ++(ar_ptr->stat_lock_direct);
   else {
P.S. This actually affects a mission-critical system planned to become operational on Redhat EL5 next year. The assertion that the values are positive has now been turned into a recoverable error, so this is no longer critical, but we would like to get heap supervision correct in the future for monitoring unexpected memory leaks. We pay HP for support (they forward such problems to Redhat), so a request for a fix may also turn up via official channels.
The test is still running, but at least it gets further than before:
64660000
64800000
64240000
64810000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 7 HBLKHD= 29388800 USMBLKS= 0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
64670000
64810000
--
Summary: mallinfo miscounting hblks because of missing mutex
Product: glibc
Version: 2.9
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: stef dot van-vlierberghe at telenet dot be
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=11087