This is the mail archive of the mailing list for the binutils project.


Re: [GOLD] add new method for computing a build ID

> The patch adds a new mathematical function for build ID, in addition
> to the two that are available now (SHA-1 and MD5). The new function
> does MD5 on chunks of the output file and then does SHA-1 on the MD5
> hashes of the chunks. This is easy to parallelize.

Why use SHA-1 to combine the MD5 hashes? Why not just use MD5
throughout? Or SHA-1 throughout? Is it the case that feeding MD5 into
itself is known to be weaker than one MD5 pass? If the benefit is from
parallelization, I don't really see why you'd need to switch from
SHA-1 to MD5 -- couldn't you just add your approach on top of whatever
hash function is selected?
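
To make the question concrete, here is a minimal sketch (in Python rather than gold's C++) of a chunked scheme layered on top of whichever hash is selected. The function name `chunked_digest`, the `algorithm` parameter, and the 1 MiB default chunk size are all assumptions for illustration; none of them come from the patch.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunked_digest(data: bytes, algorithm: str = "sha1",
                   chunk_size: int = 1 << 20) -> bytes:
    """Hash fixed-size chunks independently, then hash the chunk digests.

    Illustrative only: the same algorithm is used for both levels, so the
    chunking is independent of the hash function that was selected.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # memoryview slices avoid copying the output file's bytes.
    view = memoryview(data)
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]

    def digest_one(chunk) -> bytes:
        return hashlib.new(algorithm, chunk).digest()

    # pool.map preserves input order, so the result does not depend on
    # which worker finishes first.
    with ThreadPoolExecutor() as pool:
        chunk_digests = list(pool.map(digest_one, chunks))

    # Second level: hash the concatenated per-chunk digests.
    return hashlib.new(algorithm, b"".join(chunk_digests)).digest()
```

Calling `chunked_digest(data, "md5")` or `chunked_digest(data, "sha1")` parallelizes the same way in both cases. Note that the result differs from a plain one-pass hash of the same data, so it is effectively a separate build ID style either way.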

I've got an incremental linker patch (haven't posted it yet because I
haven't finished writing the test cases) that recomputes the build ID
for an incremental link by saving the context structure and streaming
just the new data into it. At the time I was implementing that, I was
thinking about rewriting the regular hash so that it would compute the
hashes of chunks in each Relocate_task, then combine the resulting
chunks at the end (adding in a few pieces not covered by the relocate
tasks). The difference is that each chunk would be the set of
contributions from an individual .o file, rather than a fixed-size
chunk of the output file. I think this would have the benefit,
though, of exploiting cache locality as we're writing the data to the
output file, rather than starting up a whole new set of tasks to go
back over the data.
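
Roughly, the two ideas look like this. It is a Python sketch, not the actual patch: `contribution_digest`, `combine_digests`, and `extend_saved_context` are hypothetical names, and SHA-1 is just the assumed default. The in-memory `copy()` stands in for saving the context structure; how the incremental linker would persist it between links is not shown.

```python
import hashlib


def contribution_digest(contribution: bytes, algorithm: str = "sha1") -> bytes:
    """Hash one input file's contribution while its data is still hot in
    cache (what each relocate task would do in this scheme)."""
    return hashlib.new(algorithm, contribution).digest()


def combine_digests(digests_in_link_order, extra: bytes = b"",
                    algorithm: str = "sha1") -> bytes:
    """Combine per-input digests in a fixed order, then add any bytes that
    no relocate task covered."""
    h = hashlib.new(algorithm)
    for d in digests_in_link_order:
        h.update(d)
    h.update(extra)
    return h.digest()


def extend_saved_context(saved_ctx, new_data: bytes):
    """Resume from a saved hash context and stream only the new data.

    The saved context is copied first so it can be reused for later links.
    """
    ctx = saved_ctx.copy()
    ctx.update(new_data)
    return ctx
```

The digests have to be combined in a deterministic order (link order here), since the tasks themselves may finish in any order. With this arrangement, an incremental link would only need to recompute the digests for the inputs that changed before recombining.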

