This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] manual: Document mprotect and introduce section on memory protection


On 11/15/2017 05:40 AM, Florian Weimer wrote:
> On 11/15/2017 07:44 AM, Rical Jasan wrote:
>> On 11/14/2017 06:26 AM, Florian Weimer wrote:
>>> 2017-11-14  Florian Weimer  <fweimer@redhat.com>
>>>
>>>     manual: Document mprotect
>>>     * manual/memory.texi (Memory Protection): New section.
>>>     * manual/llio.texi (Memory-mapped I/O): Reference this section.
>>>     Remove duplicate documentation of PROT_* flags.
>>
>> "this section" sounds like it refers to Memory-mapped I/O.  What about:
>>
>> "Reference the new section.  Move documentation of PROT_* flags there."
> 
> I've replaced it with the section name.
> 
>>> -Note that most hardware designs cannot support write permission without
>>> -read permission, and many do not distinguish read and execute permission.
>>>
>>> -Thus, you may receive wider permissions than you ask for, and mappings of
>>>
>>> -write-only files may be denied even if you do not use @code{PROT_READ}.
>>> +@code{PROT_EXEC}.  The special flag @code{PROT_NONE} reserve a region of
>>>
>>
>> "reserves"
> 
> Thanks, fixed.
> 
>>> --- a/manual/memory.texi
>>> +++ b/manual/memory.texi
>>> @@ -17,6 +17,7 @@ and allocation of real memory.
>>>   * Memory Concepts::             An introduction to concepts and terminology.
>>>
>>>   * Memory Allocation::           Allocating storage for your program data
>>>
>>>   * Resizing the Data Segment::   @code{brk}, @code{sbrk}
>>> +* Memory Protection::          Controlling access to memory regions.
>>
>> Looks like a tab is throwing off the alignment.
> 
> Interesting.  Not sure how this happened.  I switched to spaces, like
> the surrounding entries.
> 
>>> +The following flags are availabe.
>>
>> "available:"
> 
> Fixed.
> 
>>> +@table @code
>>> +@item PROT_WRITE
>>> +@vindex PROT_WRITE
>>
>> Make this an @vtable and drop the @vindex entries (@items in an @vtable
>> are automatically @vindex-ed).  The @items should then have @standards
>> beneath them with the relevant standard and header, which will also add
>> them to the Summary of Library Facilities.
> 
> Nice!
> 
>>> +The memory can be written to.
>>> +
>>> +@item PROT_READ
>>> +@vindex PROT_READ
>>> +The memory can be read.  On some architectures, this flag implies that
>>
>> A personal preference, but I would drop "that" as superfluous.
> 
> I find it easier to read because it indicates that the following nominal
> phrase belongs to a subclause.
> 
>>> +the memory can be executed as well (as if @code{PROT_EXEC} had been
>>> +specified at the same time).
>>
>> This seems a fair warning, but how would one know if their architecture
>> was affected?  Is there a good general method, other than simply trying
>> to execute something?
> 
> There is no general mechanism to figure out what the system will do if
> execution is attempted.  Linux will show the selected protection flags
> in /proc/self/maps even if the hardware does not enforce them.  There's
> a funny story everyone assumed that s390x had non-executable
> data/stack/heap because it shows up in /proc, and the test which
> attempted to execute run-time generated code always crashed. Eventually,
> it turned out that the test crashed no matter what (it used sizeof
> (func) to compute the size of a function, which is always 1, a
> GCC extension). 8-/
> 
> For device memory, all kinds of funny things can happen.  With memory
> protection keys, the situation is even more murky because system may use
> reserved keys to implement otherwise unsupported protection flag
> combinations, so you get different baseline (non-key mediated) access
> privileges if you specify a key explicitly using pkey_mprotect.
> 
>>> +
>>> +@item PROT_EXEC
>>> +@vindex PROT_EXEC
>>> +The memory can be used store instructions which can then be executed.
>>
>> "used to store"
>>
>>> +On most architectures, this flag implies that the memory can be read (as
>>>
>>
>> Similarly for "that" here, as above.
>>
>>> +if @code{PROT_READ} had been specified).
>>
>> The use of "most" with the READ/EXEC caveat here contrasts with the use
>> of "some" above, in a good way, if that was intentional.
> 
> It is.  8-)
> 
>> I think the
>> relationship here is more intuitive, anyway, which the language seems to
>> suggest.  If there is a good way to tell, though, it would be nice to be
>> complete and provide some direction here as well, or maybe for both
>> after the @table.
> 
> A fuller explanation needs to reference the (currently undocumented)
> personality function.  We could say that on GNU, PROT_EXEC always
> implies PROT_READ, as a matter of policy.>
>>> +
>>
>> Would an `@cindex anonymous mappings' make sense here?  (We need one
>> above MAP_ANON, but that's beyond the scope of this patch.)
>>
>>> +@item PROT_NONE
>>> +@vindex PROT_NONE
>>> +This flag must be specified on its own.
>>> +
>>> +The memory is reserved, but cannot be read, written, or executed.  If
>>> +this flag is specified in a call to @code{mmap}, a virtual memory area
>>> +will be set aside for future use in the process, and @code{mmap} calls
>>> +without the @code{MAP_FIXED} flag will not use it for subsequent
>>> +allocations.  For anonymous mappings, the kernel will not reserve any
>>> +physical memory for the allocation, though.
>>> +@end table
> 
> I'm changing the last sentence to:
> 
> “For anonymous mappings, the kernel will not reserve any
> physical memory for the allocation at the time the mapping is created.”
> 
>>> +The operating system may keep of these flags separately even if the
>>
>> "keep track"
> 
> Fixed.
> 
>>> +underlying hardware treats them the same for the purposes of access
>>> +checking (as it happens with @code{PROT_READ} and @code{PROT_EXEC} on
>>
>> I would just say, "as happens".
> 
> Okay.
> 
>>> +some platforms).
>>> +
>>> +Inappropriate access will cause a segfault (@pxref{Program Error
>>> +Signals}).
>>> +
>>> +After allocation, protection flags can be changed using the
>>> +@code{mprotect} function.
>>> +
>>> +@deftypefun int mprotect (void *@var{address}, size_t @var{length}, int @var{protection})
>>>
>>> +@standards{POSIX, sys/mman.h}
>>> +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
>>> +
>>> +A successful call to the @code{mprotect} function changes the protection
>>>
>>> +flags of at least @var{length} bytes of memory, starting at
>>> +@var{address}.
>>> +
>>> +@var{address} must be aligned to the page size for the mapping.  The
>>> +system page size can be obtained using @samp{sysconf (_SC_PAGE_SIZE)}.
>>
>> "_SC_PAGESIZE"
> 
> Oops.
> 
>> I believe we avoid function calls in sentences, so this should either be
>> reworded or made into an @[small]example.  Rewording might make it
>> easier to more tightly bind the reference with an @pxref.  Perhaps:
>>
>> "The system page size can be obtained by calling @code{sysconf} with the
>> @code{_SC_PAGESIZE} parameter (@pxref{Sysconf Definition})."
> 
> I've adopted this suggestions.
> 
>>> +@xref{Sysconf Definition}.  The system page size is the granularity in
>>> +which the page protection of anonymous memory mappings and most file
>>> +mappings can be changed.  Memory which is mapped from special files or
>>> +devices may have larger page granularity than the system page size and
>>> +may require larger alignment.
>>> +
>>> +@var{length} is the number of bytes whose protection flags must be
>>> +changed.  It is automatically rounded up to the next multiple of the
>>> +system page size.
>>> +
>>> +@var{protection} is a combination of the @code{PROT_}* described above.
>>
>> "the @code{PROT_*} flags described above."  (Two changes: the addition
>> of "flags", and the manual puts the wildcard within the brace.)
> 
> Good.
> 
>> Regarding the @table below, the manual has ingrained in me the
>> traditional introductory phrase:
>>
>> "The following @code{errno} error conditions are defined for this function:"
>>
> 
> Right, I missed that.
> 
>>> +@table @code
>>> +@item ENOMEM
>>> +The system was not able to allocate resources to fulfill the request.
>>> +This can happen if there is not enough physical memory in the system for
>>>
>>> +the allocation of backing storage (if the memory was initially mapped
>>> +with @code{PROT_NONE}).  The error can also occur if the new protection
>>
>> Will calling mprotect always cause backing storage to be allocated if it
>> wasn't already?  If so, that might be mentioned above.
> 
> That depends on the overrcommit configuration of the system.  The Linux
> default is not to reserve backing store, but ridiculously large requests
> will still fail (even before hitting architectural virtual memory limits).
> 
>> Also, the PROT_NONE description says virtual memory is reserved, but the
>> use of "though" in the sentence about anonymous mappings makes it sound
>> like this error would only be encountered if PROT_NONE was used with
>> MAP_ANON, so it might be good to clarify whether the use of PROT_NONE
>> without MAP_ANON will, won't, or may allocate backing storage.
> 
> I tweaked the PROT_NONE documentation, as explained above.
> 
> The full story is complex.  For regular file-based memory and a
> MAP_SHARED mapping or a mapping which is never written to, there is
> backing store.  But if you map a file from tmpfs, Linux will by default
> not allocate physical memory (again depending on the overcommit
> configuration).
> 
>>> +flags would cause the memory region to be split from its neighbors, and
>>> +the process limit for the number of such distinct memory regions would
>>> +be exceeded.
>>
>> Is there a way to know that limit, and the process' current count?  May
>> be TMI, but these kinds of warnings always make me wonder.
> 
> On Linux, you can read it from /proc, but it's difficult to predict when
> splitting is needed because it depends on arch-specific memory
> management aspects.
> 
>>> +
>>> +@item EINVAL
>>> +@var{address} is not properly aligned to a page boundary for the
>>> +mapping, or @var{length} (after rounding up to the system page size) is
>>> +not a multiple of the applicable page size for the mapping, or the
>>> +combination of flags in @var{protection} is not valid.
>>> +
>>> +@item EACCES
>>> +The file for a file-based mapping was not opened with flags which match
>>> +@var{protection}.
>>
>> Isn't mprotect supposed to set the flags to protection?
> 
> Right, the wording isn't correct.
> 
> “
> The file for a file-based mapping was not opened with open flags which
> are compatible with @var{protection}.
> “”
> 
>> Is there
>> somewhere we can reference what is special about file-based mappings
>> such that they can prevent the use of mprotect?  How is this different
>> from EPERM, below?  The content below the @deftypefun makes it sound
>> like this might only be returned by some systems.
> 
> Maybe we should just say “non-anonymous mappings“ instead?
> 
> You can map device memory using files under /dev and /proc, even
> starting with regular files, and strange things can happen there.  These
> are technically file-based mappings because that's how you create them,
> but they are very different from what files would behave like.

Given the is-or-isn't nature of "[non-]anonymous", that could work.  If
"file-based" is equivalent to "non-anonymous" and we stuck with the
latter, that'd be fine, but my concern is the (lack of) introduction of
these terms.  "Anonymous mapping" only appears once in the manual, isn't
introduced with an @dfn, and doesn't have an @cindex to find it by; and
this is the first occurrence of "file-based mapping".

I can submit a patch that introduces and indexes the anonymous mapping
concept where it's mentioned in llio.texi, but I don't know where we
would provide a similarly cursory mention of file-based mappings.
Ultimately, I'd rather see this patch go in than get stalled on my own
unfamiliarity with the vernacular, and if "file-based" is how you as the
author would naturally refer to it, that's probably best.

> 
> EACCES vs EPERM is a bit tricky.  On Linux, EACCES generally refers to
> the permission bits/ACLs on the objects directly involved, and EPERM is
> more about policy violations and exceeded limits.  Many library
> functions can return both, and I'm not sure this is the right place to
> elaborate on the difference.
> 
>>> +@item EPERM
>>> +The system security policy does not allow a mapping with the specified
>>> +flags.  For example, mappings which are both @code{PROT_EXEC} and
>>> +@code{PROT_WRITE} at the same time might not be allowed.
>>> +@end table
>>> +@end deftypefun
>>> +
>>> +If the @code{mprotect} function is used to make a region of memory
>>> +inaccessible by specifying the @code{PROT_NONE} protection flag and
>>> +access is later restored, the memory retains its previous contents.
>>> +
>>> +On some systems, it may not be possible to specify additional flags
>>> +which were not present when the mapping was first created.  For example,
>>>
>>> +an attempt make a region of memory executable could fail if the initial
>>
>> "attempt to make"
>>
>>> +protection flags where @samp{PROT_READ | PROT_WRITE}.
>>
>> "were"
> 
> Fixed.
> 
>>> +In general, the @code{mprotect} function can be used to change any
>>> +process memory, no matter how it was allocated.  However, portable use
>>> +of the function requires that it is only used with memory regions
>>> +returned by @code{mmap} or @code{mmap64}.
>>>     @node Locking Pages
>>>   @section Locking Pages
>>>
>>
>> I think this is quality documentation.  Thank you.
> 
> Thanks a lot for your review!
> 
> I'm attaching the updated patch.
> 
> Florian
> 
> mprotect.patch
> 
> 
> 2017-11-14  Florian Weimer  <fweimer@redhat.com>
> 
> 	manual: Document mprotect
> 	* manual/memory.texi (Memory Protection): New section.
> 	* manual/llio.texi (Memory-mapped I/O): Remove duplicate
> 	documentation of PROT_* flags and reference the Memory Protection
> 	section instead.
> 
> diff --git a/manual/llio.texi b/manual/llio.texi
> index 10ad546723..cbcd2221f6 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1411,19 +1411,11 @@ is created, which is not removed by closing the file.
>  address is automatically removed.  The address you give may still be
>  changed, unless you use the @code{MAP_FIXED} flag.
>  
> -@vindex PROT_READ
> -@vindex PROT_WRITE
> -@vindex PROT_EXEC
>  @var{protect} contains flags that control what kind of access is
>  permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
> -@code{PROT_EXEC}, which permit reading, writing, and execution,
> -respectively.  Inappropriate access will cause a segfault (@pxref{Program
> -Error Signals}).
> -
> -Note that most hardware designs cannot support write permission without
> -read permission, and many do not distinguish read and execute permission.
> -Thus, you may receive wider permissions than you ask for, and mappings of
> -write-only files may be denied even if you do not use @code{PROT_READ}.

Coming back to this now, is there a reason not to leave this paragraph in?

> +@code{PROT_EXEC}.  The special flag @code{PROT_NONE} reserves a region
> +of address space for future use.  The @code{mprotect} function can be
> +used to change the protection flags.  @xref{Memory Protection}.
>  
>  @var{flags} contains flags that control the nature of the map.
>  One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
> diff --git a/manual/memory.texi b/manual/memory.texi
> index 51a5f4e83c..dd6ac37c10 100644
> --- a/manual/memory.texi
> +++ b/manual/memory.texi
> @@ -17,6 +17,7 @@ and allocation of real memory.
>  * Memory Concepts::             An introduction to concepts and terminology.
>  * Memory Allocation::           Allocating storage for your program data
>  * Resizing the Data Segment::   @code{brk}, @code{sbrk}
> +* Memory Protection::           Controlling access to memory regions.
>  * Locking Pages::               Preventing page faults
>  @end menu
>  
> @@ -3047,7 +3048,126 @@ of the data segment is.
>  
>  @end deftypefun
>  
> +@node Memory Protection
> +@section Memory Protection
> +@cindex memory protection
> +@cindex page protection
> +@cindex protection flags
>  
> +When a page is mapped using @code{mmap}, page protection flags can be
> +specified using the protection flags argument.  @xref{Memory-mapped
> +I/O}.
> +
> +The following flags are available:
> +
> +@vtable @code
> +@item PROT_WRITE
> +@standards{POSIX, sys/mman.h}

I skipped over this before since we haven't canonicalized standards yet,
but can you provide a specific POSIX version (e.g., POSIX.1-2001)?

> +The memory can be written to.
> +
> +@item PROT_READ
> +@standards{POSIX, sys/mman.h}
> +The memory can be read.  On some architectures, this flag implies that
> +the memory can be executed as well (as if @code{PROT_EXEC} had been
> +specified at the same time).
> +
> +@item PROT_EXEC
> +@standards{POSIX, sys/mman.h}
> +The memory can be used to store instructions which can then be executed.
> +On most architectures, this flag implies that the memory can be read (as
> +if @code{PROT_READ} had been specified).
> +
> +@item PROT_NONE
> +@standards{POSIX, sys/mman.h}
> +This flag must be specified on its own.
> +
> +The memory is reserved, but cannot be read, written, or executed.  If
> +this flag is specified in a call to @code{mmap}, a virtual memory area
> +will be set aside for future use in the process, and @code{mmap} calls
> +without the @code{MAP_FIXED} flag will not use it for subsequent
> +allocations.  For anonymous mappings, the kernel will not reserve any
> +physical memory for the allocation at the time the mapping is created.

Just to make sure I'm reading this correctly, if you do use PROT_NONE
and don't use MAP_ANON, the kernel may allocate the backing storage.

> +@end vtable
> +
> +The operating system may keep track of these flags separately even if
> +the underlying hardware treats them the same for the purposes of access
> +checking (as happens with @code{PROT_READ} and @code{PROT_EXEC} on some
> +platforms).  On GNU systems, @code{PROT_EXEC} always implies
> +@code{PROT_READ}, so that users can view the machine code which is
> +executing on their system.

+1

> +
> +Inappropriate access will cause a segfault (@pxref{Program Error
> +Signals}).
> +
> +After allocation, protection flags can be changed using the
> +@code{mprotect} function.
> +
> +@deftypefun int mprotect (void *@var{address}, size_t @var{length}, int @var{protection})
> +@standards{POSIX, sys/mman.h}
> +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
> +
> +A successful call to the @code{mprotect} function changes the protection
> +flags of at least @var{length} bytes of memory, starting at
> +@var{address}.
> +
> +@var{address} must be aligned to the page size for the mapping.  The
> +system page size can be obtained by calling @code{sysconf} with the
> +@code{_SC_PAGESIZE} parameter (@pxref{Sysconf Definition}).  The system
> +page size is the granularity in which the page protection of anonymous
> +memory mappings and most file mappings can be changed.  Memory which is
> +mapped from special files or devices may have larger page granularity
> +than the system page size and may require larger alignment.
> +
> +@var{length} is the number of bytes whose protection flags must be
> +changed.  It is automatically rounded up to the next multiple of the
> +system page size.
> +
> +@var{protection} is a combination of the @code{PROT_*} flags described
> +above.
> +

The return values of the function are also missing:

"@code{mprotect} returns @code{0} on success or @code{-1} on error."

> +The following @code{errno} error conditions are defined for this
> +function:
> +
> +@table @code
> +@item ENOMEM
> +The system was not able to allocate resources to fulfill the request.
> +This can happen if there is not enough physical memory in the system for
> +the allocation of backing storage (if the memory was initially mapped
> +with @code{PROT_NONE}).  The error can also occur if the new protection

Can we simply drop the parenthetical "if" here?

> +flags would cause the memory region to be split from its neighbors, and
> +the process limit for the number of such distinct memory regions would
> +be exceeded.
> +
> +@item EINVAL
> +@var{address} is not properly aligned to a page boundary for the
> +mapping, or @var{length} (after rounding up to the system page size) is
> +not a multiple of the applicable page size for the mapping, or the
> +combination of flags in @var{protection} is not valid.
> +
> +@item EACCES
> +The file for a file-based mapping was not opened with open flags which
> +are compatible with @var{protection}.
> +
> +@item EPERM
> +The system security policy does not allow a mapping with the specified
> +flags.  For example, mappings which are both @code{PROT_EXEC} and
> +@code{PROT_WRITE} at the same time might not be allowed.
> +@end table
> +@end deftypefun
> +
> +If the @code{mprotect} function is used to make a region of memory
> +inaccessible by specifying the @code{PROT_NONE} protection flag and
> +access is later restored, the memory retains its previous contents.
> +
> +On some systems, it may not be possible to specify additional flags
> +which were not present when the mapping was first created.  For example,
> +an attempt to make a region of memory executable could fail if the
> +initial protection flags were @samp{PROT_READ | PROT_WRITE}.
> +
> +In general, the @code{mprotect} function can be used to change any
> +process memory, no matter how it was allocated.  However, portable use
> +of the function requires that it is only used with memory regions
> +returned by @code{mmap} or @code{mmap64}.
>  
>  @node Locking Pages
>  @section Locking Pages

This is looking really good.

Rical


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]