In glibc 2.25, shouldn't this...

	/*
	 * If the query has not already been tried as is then try it
	 * unless RES_NOTLDQUERY is set and there were no dots.
	 */
	if ((dots || !searched || (statp->options & RES_NOTLDQUERY) == 0)
	    && !(tried_as_is || root_on_list)) {
		ret = __libc_res_nquerydomain(statp, name, NULL, class, type,
					      answer, anslen, answerp,
					      answerp2, nanswerp2, resplen2,
					      answerp2_malloced);
		if (ret > 0 || (ret == 0 && resplen2 != NULL && *resplen2 > 0))
			return (ret);
	}

...look like this...

	/*
	 * If the query has not already been tried as is then try it
	 * unless RES_NOTLDQUERY is set and there were no dots.
	 */
	if ((dots || !searched || (statp->options & RES_NOTLDQUERY) == 0)
	    && !(tried_as_is || root_on_list)) {
		ret = __libc_res_nquerydomain(statp, name, NULL, class, type,
					      answer, anslen, answerp,
					      answerp2, nanswerp2, resplen2,
					      answerp2_malloced);
		if (ret > 0 || (ret == 0 && resplen2 != NULL && *resplen2 > 0))
			return (ret);
		if (answerp && *answerp != answer) {
			answer = *answerp;
			anslen = MAXPACKET;
		}
	}

...a leak occurred in this spot in eglibc 2.19 provided by openembedded for Yocto Daisy...
(In reply to Rob Krakora from comment #0)

>		if (answerp && *answerp != answer) {
>			answer = *answerp;
>			anslen = MAXPACKET;
>		}

I don't see how this could make a difference because the answer and anslen variables are dead at this point.

> ...a leak occurred in this spot in eglibc 2.19 provided by openembedded for
> Yocto Daisy...

Can you provide more information on how to reproduce the leak? Thanks.
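To make the dead-variable point concrete, here is a minimal, hypothetical sketch (callee/caller are made-up names and this is not the glibc code path): reassigning locals that are never read again cannot change what the caller of the function sees, so the snippet quoted above cannot by itself plug a leak; only freeing the replacement buffer can.

	/* Minimal sketch, not glibc code: the callee may replace the
	   caller-supplied buffer with a malloc'd one, communicated only
	   through *answerp.  */
	#include <stdlib.h>

	static int
	callee (unsigned char **answerp)
	{
	  unsigned char *newp = malloc (1024);
	  if (newp == NULL)
	    return -1;
	  *answerp = newp;   /* ownership now lives in *answerp */
	  return 0;
	}

	int
	caller (void)
	{
	  unsigned char stackbuf[512];
	  unsigned char *answer = stackbuf;
	  int anslen = sizeof stackbuf;
	  unsigned char *ansp = answer;

	  int ret = callee (&ansp);

	  /* Dead stores: answer and anslen are never read after this
	     point, so these assignments have no effect (a compiler will
	     typically flag them) and cannot fix or cause a leak.  */
	  if (ansp != answer)
	    {
	      answer = ansp;
	      anslen = 1024;
	    }

	  /* The leak is avoided only by actually freeing the buffer.  */
	  if (ansp != stackbuf)
	    free (ansp);
	  return ret;
	}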
Here is a trace flagged by dmalloc (we could not use Valgrind as we are on an ARM v5 and Valgrind support starts at ARM v7) as the source of the 64K memory leak.

#0  send_dg (resplen2=<optimized out>, anssizp2=<optimized out>, ansp2=<optimized out>, anscp=<optimized out>, gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, ns=-1290429468, terrno=0xb3159bcc, anssizp=0xb3159c80, ansp=0xb3159c7c, buflen2=<optimized out>, buf2=<optimized out>, buflen=<optimized out>, buf=<optimized out>, statp=0x0) at res_send.c:1359
#1  __libc_res_nsend (statp=statp@entry=0xb315c704, buf=0x0, buf@entry=0xb3159ca8 "(\255\001", buflen=0, buflen@entry=63, buf2=0xb315c704 "\005", buf2@entry=0xb3159ce8 "\001q\001", buflen2=buflen2@entry=63, ans=ans@entry=0xb315a7f0 "(\255\201\200", anssiz=anssiz@entry=2048, ansp=ansp@entry=0xb315aff8, ansp2=ansp2@entry=0xb315affc, nansp2=nansp2@entry=0xb315b000, resplen2=resplen2@entry=0xb315b004) at res_send.c:575
#2  0xb0927b84 in __GI___libc_res_nquery (statp=statp@entry=0xb315c704, name=name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", class=class@entry=1, type=type@entry=17429524, answer=0xb315a7f0 "(\255\201\200", answer@entry=0xb0928634 <__GI___libc_res_nsearch+656> "", anslen=2048, anslen@entry=62321, answerp=0xb315aff8, answerp@entry=0xb315a7f0, answerp2=0xb315affc, answerp2@entry=0x800, nanswerp2=0xb315b000, nanswerp2@entry=0xb315aff8, resplen2=0xb315b004, resplen2@entry=0xb315affc) at res_query.c:226
#3  0xb0928114 in __libc_res_nquerydomain (statp=statp@entry=0xb315c704, name=name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", domain=domain@entry=0x0, class=1, class@entry=5024748, type=62321, type@entry=0, answer=0xb315a7f0 "(\255\201\200", answer@entry=0xb315b03c "\330K\241\266`\265\025\263l\265\025\263", anslen=anslen@entry=2048, answerp=answerp@entry=0xb315aff8, answerp2=answerp2@entry=0xb315affc, nanswerp2=0xb315b000, nanswerp2@entry=0xb315aff8, resplen2=0xb315b004, resplen2@entry=0xb315affc) at res_query.c:585
#4  0xb0928634 in __GI___libc_res_nsearch (statp=0xb315c704, name=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", class=5024748, class@entry=1, type=0, type@entry=62321, answer=answer@entry=0xb315a7f0 "(\255\201\200", anslen=anslen@entry=2048, answerp=0xb315aff8, answerp2=answerp2@entry=0xb315affc, nanswerp2=nanswerp2@entry=0xb315b000, resplen2=resplen2@entry=0xb315b004) at res_query.c:378
#5  0xb093f4a4 in _nss_dns_gethostbyname4_r (name=<optimized out>, name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", pat=pat@entry=0xb315b55c, buffer=buffer@entry=0xb315b058 "\177", buflen=buflen@entry=1056, errnop=errnop@entry=0xb315b560, herrnop=herrnop@entry=0xb315b56c, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:314
#6  0xb6a14bd8 in gaih_inet (name=<optimized out>, name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", service=<optimized out>, req=req@entry=0xb315b6f8, pai=pai@entry=0xb315b608, naddrs=naddrs@entry=0xb315b614) at ../sysdeps/posix/getaddrinfo.c:850
#7  0xb6a16ab4 in __GI_getaddrinfo (name=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", service=0x0, hints=0xb315b6f8, pai=0xb315b6d8) at ../sysdeps/posix/getaddrinfo.c:2406
#8  0x00220ddc in Poco::Net::DNS::hostByName (hostname=..., hintFlags=hintFlags@entry=34) at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/DNS.cpp:60
#9  0x00202000 in Poco::Net::SocketAddress::init (this=this@entry=0xb315b7a0, hostAddress=..., portNumber=<optimized out>) at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/SocketAddress.cpp:233
#10 0x002022fc in Poco::Net::SocketAddress::SocketAddress (this=0xb315b7a0, hostAddress=..., portNumber=<optimized out>) at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/SocketAddress.cpp:76
#11 0x002186fc in Poco::Net::HTTPClientSession::reconnect (this=this@entry=0x1219a08) at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/HTTPClientSession.cpp:341
#12 0x00219760 in Poco::Net::HTTPClientSession::sendRequest (this=0x1219a08, request=...) at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/HTTPClientSession.cpp:205
#13 0x000be8b0 in AvisePocoClient::sendRequest (this=0x10f7008, req=...)
The trace was against eglibc 2.19 running on a Yocto Daisy build...
(In reply to Rob Krakora from comment #3)
> The trace was against eglibc 2.19 running on a Yocto Daisy build...

Can you reproduce this at will? In this case, it would be very helpful to have a capture of the DNS queries and responses. Do you have any peculiar /etc/resolv.conf or RES_OPTIONS settings? What does the hints structure for getaddrinfo look like?
Yes, I can reproduce this at will. resolv.conf only contains this:

	# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
	#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
	nameserver 8.8.8.8

where 8.8.8.8 is the Google DNS server. It also fails when other DNS servers are used. I can capture the DNS traffic for you when this occurs. Is a tcpdump capture OK?
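For context on the hints question above, a call along the following lines exercises the gaih_inet -> _nss_dns_gethostbyname4_r path shown in the trace (AF_UNSPEC triggers the parallel A/AAAA query). This is only a sketch, not the actual application code: the hostname is copied from the trace, and the AI_CANONNAME | AI_ADDRCONFIG flags are a guess inferred from hintFlags=34 in frame #8, not a confirmed detail of the Poco call.

	/* Hypothetical reproducer sketch; not the application's code.  */
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netdb.h>
	#include <stdio.h>
	#include <string.h>

	int
	main (void)
	{
	  struct addrinfo hints;
	  struct addrinfo *result = NULL;

	  memset (&hints, 0, sizeof hints);
	  hints.ai_family = AF_UNSPEC;                    /* ask for both A and AAAA */
	  hints.ai_socktype = SOCK_STREAM;
	  hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;  /* guess from hintFlags=34 */

	  int err = getaddrinfo ("bp-cloud-api-app-01.eastus.cloudapp.azure.com",
	                         NULL, &hints, &result);
	  if (err != 0)
	    {
	      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (err));
	      return 1;
	    }
	  freeaddrinfo (result);
	  return 0;
	}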
(In reply to Rob Krakora from comment #5)
> I can capture the DNS traffic for you when this occurs. Is a tcpdump capture
> OK?

Yes, please do a binary (pcap file) capture with “-s 0” (complete packets). You should limit the capture with “host 8.8.8.8”, so that UDP, fragmented UDP, and TCP packets are all included. Capturing strace output at the same time would be helpful, too, in case the stub resolver performs unexpected system calls which are not reflected in the network traffic.

How close are your sources to upstream glibc 2.19? I wonder if this could be a faulty backport.
Created attachment 9969 [details]
Attached is the eglibc bitbake recipe provided with Yocto Daisy which we use to build our Yocto image for our ARM-based embedded system.
Created attachment 9970 [details]
Here is the pcap capture...
Created attachment 9971 [details]
Here is the strace capture...
The leak is occurring in the eglibc 2.19 code below. However, in the glibc 2.25 code further below, code has been added to indicate to the layers above that a malloc has occurred, which would negate my proposed patch. eglibc 2.19 and glibc 2.19 are closer, and glibc 2.19 does not have the code to indicate to the layers above that a malloc has occurred, so could it possibly have the same leak?

eglibc 2.19:

		if (*thisanssizp < MAXPACKET
		    /* If the current buffer is not the the static user-supplied buffer
		       then we can reallocate it.  */
		    && (thisansp != NULL && thisansp != ansp)
	#ifdef FIONREAD
		    /* Is the size too small?  */
		    && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
			|| *thisanssizp < *thisresplenp)
	#endif
		    ) {
			/* Always allocate MAXPACKET, callers expect
			   this specific size.  */
			u_char *newp = malloc (MAXPACKET);
			if (newp != NULL) {
				*thisanssizp = MAXPACKET;
				*thisansp = newp;
			}
		}

glibc 2.25:

		if (*thisanssizp < MAXPACKET
		    /* If the current buffer is not the the static user-supplied buffer
		       then we can reallocate it.  */
		    && (thisansp != NULL && thisansp != ansp)
	#ifdef FIONREAD
		    /* Is the size too small?  */
		    && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
			|| *thisanssizp < *thisresplenp)
	#endif
		    ) {
			/* Always allocate MAXPACKET, callers expect
			   this specific size.  */
			u_char *newp = malloc (MAXPACKET);
			if (newp != NULL) {
				*thisanssizp = MAXPACKET;
				*thisansp = newp;
				if (thisansp == ansp2)
					*ansp2_malloced = 1;
			}
		}
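For illustration, here is a minimal sketch of the ownership pattern the *ansp2_malloced flag implements: the callee may swap in a malloc'd buffer, and the flag tells the outermost caller whether that buffer must be freed. This is generic, not glibc's actual res_send/res_query code; query_into(), lookup(), and their parameters are made up for this example.

	/* Generic sketch of the "callee may replace the buffer, flag tells
	   the caller to free it" pattern; not the glibc implementation.  */
	#include <stdlib.h>
	#include <string.h>

	enum { MAXPACKET = 65536 };

	/* Fills *bufp with a response.  If the caller-supplied buffer is
	   too small, a MAXPACKET-sized buffer is malloc'd, *bufp is pointed
	   at it, and *malloced is set so the caller knows it owns it.  */
	static int
	query_into (unsigned char **bufp, size_t *sizep, int *malloced,
	            size_t response_size)
	{
	  if (*sizep < response_size)
	    {
	      unsigned char *newp = malloc (MAXPACKET);
	      if (newp == NULL)
	        return -1;
	      *bufp = newp;
	      *sizep = MAXPACKET;
	      *malloced = 1;      /* without this, the caller cannot know */
	    }
	  memset (*bufp, 0, response_size);   /* stand-in for a real answer */
	  return (int) response_size;
	}

	int
	lookup (void)
	{
	  unsigned char stackbuf[2048];
	  unsigned char *buf = stackbuf;
	  size_t size = sizeof stackbuf;
	  int buf_malloced = 0;

	  int len = query_into (&buf, &size, &buf_malloced, 4096);

	  /* ... use buf[0 .. len) ... */

	  if (buf_malloced)
	    free (buf);          /* the free that the flag makes possible */
	  return len < 0 ? -1 : 0;
	}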
That is not it...adding the code from my previous reply does not fix the leak. Only the code change I originally proposed fixes it...hmmm....
Created attachment 9972 [details] Patch for eglibc-2.19
Created attachment 9973 [details] Patch for eglibc-2.19 - CORRECTION
If you want to continue using 2.19, you are on your own. Unless you can reproduce it with 2.25, this is a WONTFIX.
We are moving to Yocto Krogoth, which employs glibc 2.23 instead of eglibc 2.19, in a couple of months. I will run the same test and see if the leak exists there.
The announcement of CVE-2015-7547 said:

“- Always malloc the second response buffer if needed.
 - Requires fix for sourceware bug 16574 to avoid memory leak.
   commit d668061994a7486a3ba9c7d5e7882d85a2883707
   commit ab09bf616ad527b249aca5f2a4956fd526f0712f”

<https://www.sourceware.org/ml/libc-alpha/2016-02/msg00416.html>

Coincidentally, this regression originally delayed the disclosure of CVE-2015-7547. The upstream glibc 2.19 branch already had the fix for bug 16574 when CVE-2015-7547 was fixed, but our downstream 2.12 and 2.17 branches still needed it.

If you have your own resolv backports, you should really try to get a valgrind-clean pass with the external resolv test suite: <https://pagure.io/glibc-resolv-tests>

I'm in the process of upstreaming the remaining tests.

*** This bug has been marked as a duplicate of bug 16574 ***
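As a rough illustration of the kind of coverage such leak testing aims for, a minimal driver can simply run getaddrinfo/freeaddrinfo in a loop under valgrind or dmalloc. This is a hand-rolled sketch, not part of the linked test suite, and the default hostname is just a placeholder.

	/* Minimal leak-check driver (a sketch, not from glibc-resolv-tests).
	   Run it under "valgrind --leak-check=full" (or with dmalloc) and
	   watch for growth across iterations: a per-query leak in the stub
	   resolver shows up as a steadily increasing amount of lost memory.  */
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netdb.h>
	#include <stdio.h>
	#include <string.h>

	int
	main (int argc, char **argv)
	{
	  const char *host = argc > 1 ? argv[1] : "www.example.com";  /* placeholder */

	  for (int i = 0; i < 100; ++i)
	    {
	      struct addrinfo hints;
	      struct addrinfo *res = NULL;

	      memset (&hints, 0, sizeof hints);
	      hints.ai_family = AF_UNSPEC;      /* exercise the dual A/AAAA path */
	      hints.ai_socktype = SOCK_STREAM;

	      int err = getaddrinfo (host, NULL, &hints, &res);
	      if (err == 0)
	        freeaddrinfo (res);
	      else
	        fprintf (stderr, "iteration %d: %s\n", i, gai_strerror (err));
	    }
	  return 0;
	}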
Yes, I would LOVE to use Valgrind. However, we are on an ARM v5 and Valgrind ARM support starts at v7. :) Thanks so much for your help and valuable time!!!
(In reply to Rob Krakora from comment #17)
> Yes, I would LOVE to use Valgrind. However, we are on an ARM v5 and Valgrind
> ARM support starts at v7.

I assume you could perform memory leak testing on a different architecture than your production architecture. The important bit is to increase test coverage. Architecture-specific memory leaks happen, but are not the norm.
True.