This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[Question] New mmap64 syscall?
- From: Yury Norov <ynorov at caviumnetworks dot com>
- To: <libc-alpha at sourceware dot org>, <linux-arch at vger dot kernel dot org>, <linux-kernel at vger dot kernel dot org>
- Cc: Catalin Marinas <catalin dot marinas at arm dot com>, <szabolcs dot nagy at arm dot com>, <heiko dot carstens at de dot ibm dot com>, <cmetcalf at ezchip dot com>, <philipp dot tomsich at theobroma-systems dot com>, <joseph at codesourcery dot com>, <zhouchengming1 at huawei dot com>, <Prasun dot Kapoor at caviumnetworks dot com>, <agraf at suse dot de>, <geert at linux-m68k dot org>, <kilobyte at angband dot pl>, <manuel dot montezelo at gmail dot com>, <arnd at arndb dot de>, <pinskia at gmail dot com>, <linyongting at huawei dot com>, <klimov dot linux at gmail dot com>, <broonie at kernel dot org>, <bamvor dot zhangjian at huawei dot com>, <linux-arm-kernel at lists dot infradead dot org>, <maxim dot kuvyrkov at linaro dot org>, <Nathan_Lynch at mentor dot com>, <schwidefsky at de dot ibm dot com>, <davem at davemloft dot net>, <christoph dot muellner at theobroma-systems dot com>
- Date: Wed, 7 Dec 2016 00:24:40 +0530
- Subject: [Question] New mmap64 syscall?
Hi all,
(Sorry if there is a similar discussion that I missed. I didn't
find anything on LKML from the last half year.)
In the aarch64/ilp32 discussion, Catalin wondered why we don't pass the
offset in mmap() as a 64-bit value (in 2 registers if needed). Looking at
the kernel code, I found that there's no generic interface for it. But almost
all architectures provide their own implementations, like this:
SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
        unsigned long, prot, unsigned long, flags, unsigned long,
        fd, off_t, offset)
{
        unsigned long result;

        result = -EINVAL;
        if (offset & ~PAGE_MASK)
                goto out;

        result = sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);

out:
        return result;
}
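
The pattern in these per-arch wrappers (reject a misaligned byte offset, otherwise convert it to a page offset) can be modelled in user space. This is only a sketch: the EX_* constants and offset_to_pgoff() are hypothetical names, and a 4K page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed for illustration: 4K pages. Real values are per-architecture. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))

/*
 * Hypothetical user-space model of the check done by the wrapper above.
 * Returns 0 and stores the page offset in *pgoff, or -EINVAL if the
 * byte offset is not page-aligned.
 */
int offset_to_pgoff(uint64_t offset, uint64_t *pgoff)
{
        if (offset & ~EX_PAGE_MASK)
                return -EINVAL;

        *pgoff = offset >> EX_PAGE_SHIFT;
        return 0;
}
```

With these assumptions, a byte offset of 0x3000 becomes page offset 3, while 0x3001 is rejected.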
On the glibc side things are even worse. There's no mmap() implementation
that allows passing a 64-bit offset on a 32-bit architecture. mmap64(), which
is supposed to do this, is simply broken:
void *
__mmap64 (void *addr, size_t len, int prot, int flags, int fd, off64_t offset)
{
  [...]
  void *result;
  result = (void *) INLINE_SYSCALL (mmap2, 6, addr,
                                    len, prot, flags, fd,
                                    (off_t) (offset >> page_shift));
  return result;
}
It explicitly declares offset as a 64-bit value, but casts it to 32-bit
(after the page shift) before passing it to the kernel, which looks wrong to
me. Even if the arch has a 64-bit off_t, like aarch64/ilp32, the truncation
still takes place because the offset is passed in a single register, which is
32-bit.
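
The effect of that narrowing can be shown with a small model. This is a sketch under assumptions: a 12-bit page shift, an unsigned 32-bit value standing in for the single register, and hypothetical helper names that are not part of glibc.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: mmap2 counts the offset in 4K units. */
#define EX_PAGE_SHIFT 12

/* Models the shifted offset after it has been narrowed to one 32-bit register. */
uint32_t pgoff_as_passed(uint64_t offset)
{
        return (uint32_t)(offset >> EX_PAGE_SHIFT);
}

/* The byte offset the kernel would reconstruct from that register. */
uint64_t offset_as_seen(uint64_t offset)
{
        return (uint64_t)pgoff_as_passed(offset) << EX_PAGE_SHIFT;
}

/* Non-zero if a page-aligned offset survives the round trip unchanged. */
int offset_survives(uint64_t offset)
{
        return offset_as_seen(offset) == offset;
}
```

In this model the last page below 2^44 survives, while an offset of exactly 2^44 wraps around to 0 without any error being reported.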
I see 3 solutions for my problem:
1. Reuse the aarch64/lp64 mmap code for ilp32 in glibc, but wrap the offset
with the SYSCALL_LL64() macro, which converts the offset to a register pair
on 32-bit ports. This is a simple but local solution, and most probably it's
enough.
2. Add a new flag to mmap, like MAP_OFFSET_IN_PAIR. This will also work.
The problem here is that too many arches implement their own custom
sys_mmap2(). And, of course, this type of flag looks ugly.
3. Introduce a new mmap64() syscall like this:
sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
(A pointer is used here because otherwise we would have 7 args if we simply
passed off_hi and off_lo in registers.)
With the new 64-bit interface we can deprecate mmap2() and generalize all
the implementations in the kernel.
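
For options 1 and 3, the common idea is to carry the 64-bit offset as two 32-bit halves. A minimal sketch follows; the layout of struct off_pair and the helper names are assumptions made for illustration, not an existing kernel or glibc interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; the proposal above does not define the struct. */
struct off_pair {
        uint32_t off_lo;        /* low 32 bits of the byte offset */
        uint32_t off_hi;        /* high 32 bits of the byte offset */
};

/* Caller side: split a 64-bit offset into two 32-bit halves. */
struct off_pair off_pair_split(uint64_t offset)
{
        struct off_pair p;

        p.off_lo = (uint32_t)offset;
        p.off_hi = (uint32_t)(offset >> 32);
        return p;
}

/* Callee side: rebuild the full 64-bit offset from the pair. */
uint64_t off_pair_join(const struct off_pair *p)
{
        return ((uint64_t)p->off_hi << 32) | p->off_lo;
}
```

Unlike a single 32-bit page offset, the pair round-trips any 64-bit value, including offsets at or above 2^44.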
I think we can discuss it because 64-bit is the default size for off_t
in all new 32-bit architectures, so a generic solution may make sense.
The last question is how important it is to support offsets bigger than
2^44 on 32-bit machines in practice. It may matter for ARM64 servers,
which look like the main aarch64/ilp32 users. If not, we can leave
things as they are and do nothing.
Yury
On Mon, Dec 05, 2016 at 05:12:43PM +0000, Catalin Marinas wrote:
> On Fri, Oct 21, 2016 at 11:33:10PM +0300, Yury Norov wrote:
> > off_t is passed in register pair just like in aarch32.
> > In this patch corresponding aarch32 handlers are shared to
> > ilp32 code.
> [...]
> > +/*
> > + * Note: off_4k (w5) is always in units of 4K. If we can't do the
> > + * requested offset because it is not page-aligned, we return -EINVAL.
> > + */
> > +ENTRY(compat_sys_mmap2_wrapper)
> > +#if PAGE_SHIFT > 12
> > +	tst	w5, #~PAGE_MASK >> 12
> > +	b.ne	1f
> > +	lsr	w5, w5, #PAGE_SHIFT - 12
> > +#endif
> > +	b	sys_mmap_pgoff
> > +1:	mov	x0, #-EINVAL
> > +	ret
> > +ENDPROC(compat_sys_mmap2_wrapper)
>
> For compat sys_mmap2, the pgoff argument is in multiples of 4K. This was
> traditionally used for architectures where off_t is 32-bit to allow
> mapping files to 2^44.
>
> Since off_t is 64-bit with AArch64/ILP32, should we just pass the off_t
> as a 64-bit value in two different registers (w5 and w6)?
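
For reference, the conversion done by the quoted wrapper (from 4K units to native pages) can be modelled in C. This is only a sketch: off4k_to_pgoff() is a hypothetical name, and the page shift is passed as a parameter here rather than fixed at build time.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the wrapper: off_4k counts 4K units, the kernel
 * wants native pages of (1 << page_shift) bytes. Returns 0 and stores the
 * page offset, or -EINVAL if off_4k is not aligned to a native page.
 */
int off4k_to_pgoff(uint32_t off_4k, unsigned int page_shift, uint32_t *pgoff)
{
        unsigned int extra;

        /* Keep the shift below 32 so the mask computation is well defined. */
        assert(page_shift >= 12 && page_shift < 44);
        extra = page_shift - 12;

        if (off_4k & ((UINT32_C(1) << extra) - 1))
                return -EINVAL;

        *pgoff = off_4k >> extra;
        return 0;
}
```

With 64K pages (page_shift of 16), an off_4k of 16 maps to page 1 and an off_4k of 17 is rejected; with 4K pages the value passes through unchanged.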