This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: onwards to git


Thomas Schwinge wrote:

> Hello Jim!
>
> On Fri, May 22, 2009 at 12:27:09PM +0200, I wrote:
>> On Tue, May 12, 2009 at 04:15:17PM +0200, Jim Meyering wrote:
>> > Thomas Schwinge wrote:
>> > > On Mon, May 11, 2009 at 10:49:32PM +0200, Jim Meyering wrote:
>> > >> I've converted the trunk and all branches, filtering
>> > >> to aggregate commits, and cleaning up by removing empty commits
>> > >> and applying heuristics to use reasonable commit messages
>> > >> derived from ChangeLog entries.
>> > >
>> > > This indeed looks very nice in the vast majority of cases!  I'm sure
>> > > there are a number of people who are interested in seeing the scripts and
>> > > techniques you used.  I am, for sure.  :-)
>> >
>> > Thanks for the feedback!
>> >
>> > I'll post the scripts, of course ;-)
>> > If I don't do it this week it's because I forgot or didn't
>> > find the time, so a ping would be welcome.
>>
>> Ping.  :-)
>
> I don't want to trouble you too much, but I would be thankful already if
> you could simply hand me over the script you used for the post-conversion
> cleanup and commit accumulation.

Hi Thomas,
Thanks for the prod.

Here are three scripts:
[definitely not production quality.
 I wanted to clean them up before publishing, but if I wait
 to find time for that, it may never happen, so... ]

git-log-munge: a helper script invoked by glibc-reconstruct-commits

glibc-reconstruct-commits: based on a script by Paolo Bonzini.  I used this
  to aggregate commits on the "master" branch of a
  just-cvs-to-git-converted glibc.git repository.
  However, doing that unhooked all branches and tags from master...

tag-restore: reconnect those branches and tags
  (caveat: contains hard-coded paths)

git-log-munge:

#!/usr/bin/perl -T
# massage a log as pre-filtered by glibc-reconstruct-commits.
use strict;
use warnings;

sub find_bz ($)
{
  my ($lines) = @_;
  my @bzs;
  foreach my $line (grep (/\bBZ \#\d/, @$lines))
    {
      $line =~ s/BZ #2423, #2749/BZ #2423, BZ #2749/; # sole fix-up
      push @bzs, $line =~ /BZ #(\d+)/mg;
    }
  return @bzs;
}

{
  my @line = <>;
  while (@line && $line[$#line] eq "\n") { pop @line; };
  while (1 < @line && $line[0] eq ".\n") { shift @line; };

  # If the first line starts with TABs, remove them.
  @line
    and $line[0] =~ s/^\t+//;

  # If the first line contains any other TABs, split on them,
  # on the assumption that it is a ChangeLog entry that has been
  # concatenated by git.
  if (@line && $line[0] =~ /\t+/)
    {
      my $l = $line[0];
      chomp $l;
      my @spl = split ("\t", $l);
      splice @line, 0, 1, (map {"$_\n"} @spl);
    }

  # If there's a BZ number on the first line, use that as the subject.
  if (@line && $line[0] =~ /BZ #\d+/)
    {
      # BZ is already on the first line; do nothing more.
    }
  else
    {
      # If there are more than 1000 lines, presume it's due to a
      # ChangeLog->ChangeLog.N rotation and keep only the first.
      1000 < @line
        and @line = ($line[0]);

      my @bz = find_bz \@line;
      if (@bz)
        {
          # Filter out duplicates and numerical-sort.
          my %unique = map { $_ => 1 } @bz;
          @bz = sort { $a <=> $b } keys %unique;

          # We're about to prepend subject+blank-line,
          # so if the preexisting 2nd line is blank, remove it.
          2 <= @line && $line[1] eq "\n"
            and splice @line, 1, 1;
          @bz = map { "BZ #$_" } @bz;
          my $subject = "[" . join (', ', @bz) . "]\n";
          unshift @line, $subject, "\n";
        }
    }

  # If there are 3 or more lines, the first looks like date+name+email
  # of a ChangeLog entry, the 2nd is blank, and third starts with a TAB,
  # then use the third (minus its leading TAB).
  if (3 <= @line && $line[1] eq "\n"
      && $line[0] =~ /^2\d\d\d-\d\d-\d\d  \S.*?  <.*>$/
      && $line[2] =~ /^\t[^\t]/)
    {
      shift @line;
      shift @line;
      $line[0] =~ s/^\t//;
    }

  # if there are two or more lines, ensure the 2nd is blank
  2 <= @line && $line[1] ne "\n"
    and splice @line, 1, 0, ("\n");

  print @line;
}

# FIXME:
my $junk = <<'EOF';
This commit log message is messed up:
Note how the subject was precisely the body of the ChangeLog entry.

    * elf/dl-open.c (_dl_open): Bump GL(dl_nns) to 1 if no libraries

    are dlopened in statically linked program even for __LM_ID_CALLER.
    2009-04-16  Jakub Jelinek  <jakub@redhat.com>

        * elf/dl-open.c (_dl_open): Bump GL(dl_nns) to 1 if no libraries
        are dlopened in statically linked program even for __LM_ID_CALLER.
EOF

# Local Variables:
# indent-tabs-mode: nil
# End:
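
For readers who do not read Perl, the bug-number handling in git-log-munge can be approximated in shell. This is only a sketch under assumptions: `bz_subject` is a made-up name, it relies on `grep -o` (present in GNU and BSD grep, but not required by POSIX), and it leaves out the script's one-off fix-up for "BZ #2423, #2749".

```shell
# Hypothetical helper (not part of the posted scripts): read a log message
# on stdin and print a "[BZ #N, BZ #M]" subject line built from every
# "BZ #<digits>" reference, de-duplicated and sorted numerically.
bz_subject() {
    # One "BZ #123" match per line, reduced to the bare number.
    nums=$(grep -o 'BZ #[0-9][0-9]*' | sed 's/^BZ #//' | sort -n -u)

    # No bug references: print nothing and report failure.
    test -n "$nums" || return 1

    subject=
    for n in $nums; do
        # Join with ", " only between entries.
        subject="${subject:+$subject, }BZ #$n"
    done
    printf '[%s]\n' "$subject"
}

printf '* foo.c: Fix (BZ #10).\n* bar.c: Likewise (BZ #9, BZ #10).\n' |
    bz_subject
```

The real script only does this when the first line of the message does not already mention a BZ number, and it then prepends the generated subject plus a blank line to the message.
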

glibc-reconstruct-commits:

#!/bin/bash
# Based on the script from Paolo Bonzini:
# http://sourceware.org/ml/libc-alpha/2009-05/msg00005.html

debug=0
g_prev_commit=

warn () { echo "$*" >&2; }
debug () { test "$debug" = 1 && echo "$*" >&2 || :; }

map()
{
	# if it was not rewritten, take the original
	if test -r "map/$1"
	then
		cat "map/$1"
	else
		echo "$1"
	fi
}
m2()
{
    cat m2/"$1" 2>/dev/null ||
      cat map/"$1" 2>/dev/null ||
        echo "$1"
}

# override die(): this version puts in an extra line break, so that
# the progress is still visible

die()
{
	echo >&2
	echo "$*" >&2
	exit 1
}

# When piped a commit, output a script to set the ident of either
# "author" or "committer".

set_ident () {
	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
	pick_id_script='
		/^'$lid' /{
			s/'\''/'\''\\'\'\''/g
			h
			s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
			s/'\''/'\''\'\'\''/g
			s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p

			g
			s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
			s/'\''/'\''\'\'\''/g
			s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p

			g
			s/^'$lid' [^<]* <[^>]*> \(.*\)$/\1/
			s/'\''/'\''\'\'\''/g
			s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p

			q
		}
	'

	LANG=C LC_ALL=C sed -ne "$pick_id_script"
	# Ensure non-empty id name.
	echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
}

final_rcs_log_msg='Previously uncontrolled files put into CVS.'
: ${skip_up_to=$(git log -1 --pretty=format:%H ":/$final_rcs_log_msg")}

: ${suspended_tree=}
: ${suspended_commit=}

t()
{
    # printf %3.3s "$1"
    test -n "$1" &&
      git log -1 --pretty='format:[%h:%s]' "$1"
      # git log -1 --pretty=format:%h[%s] "$1"
}

do_commit ()
{
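    # Use if/else rather than "A && B || C", so that a failing parentless
    # commit-tree is not retried with an empty -p argument.
    if test -z "$2"; then
      git commit-tree "$1^{tree}" > map/$1
    else
      git commit-tree "$1^{tree}" -p $2 > map/$1
    fi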
    #debug "  do_ci > $(t $1):$(t $(cat map/$1)) p:$(t $2)"
}

skip_commit()
{
    echo "$2" > map/$1
    #debug "skip_ci > $(t $1):$(t $2)"
}

new_msg()
{
    local c=$1

    local orig_log=$(git log -1 --pretty=$'format:%s\n%b' $c)

    # If any of the three variables is empty, reuse the original log message.
    local t=":$old_changelog:$new_changelog:$c:"
    case "$t" in
      *::*) printf %s "$orig_log"; return;;
    esac

    # Find the first pair of differing SHA1s
    local i=1
    for old_sha1 in $(echo $old_changelog); do
	new_sha1=$(echo "$new_changelog"|sed -n ${i}p)
	test $old_sha1 != $new_sha1 && break
	i=$(expr $i + 1)
    done

    { printf '%s\n' "$orig_log"
      git diff $old_sha1 $new_sha1 \
	| sed -n \
	  -e '1,/^+/ s/^ //p' \
	  -e '1,/^@@/d' \
	  -e 's/^+//p'; \
    } \
      | git-log-munge
}

get_changelog_hashes ()
{
    local c=$1
    local p=$2
    test -z "$p" && return
    new_changelog=$(git ls-tree $c^{tree} \
		      ChangeLog \
		      posix/glob/ChangeLog \
		      nptl_db/ChangeLog \
		      nptl/ChangeLog \
		      localedata/ChangeLog \
		      libidn/ChangeLog \
		      linuxthreads/ChangeLog \
		    | awk '{print $3}')
    old_changelog=$(git ls-tree $p \
		      ChangeLog \
		      posix/glob/ChangeLog \
		      nptl_db/ChangeLog \
		      nptl/ChangeLog \
		      localedata/ChangeLog \
		      libidn/ChangeLog \
		      linuxthreads/ChangeLog \
		    | awk '{print $3}')
}

# Ugly, since it uses and updates a global.
# Map deferred commits to the one they're aggregated to in the new tree.
# When aggregating, $g_prev_commit is the most recent commit (in the orig tree)
# that we've added to the new tree.  Map the commits after $g_prev_commit and
# before $COMMIT to the image of $COMMIT in the new tree.
# $g_prev_commit is empty initially, and in that case, we map all commits
# before $COMMIT.
map_suspended_commits ()
{
    local commit="$1"
    local c
    for c in $(git rev-list $g_prev_commit..$commit^); do
	#debug map-susp: $(t $c) $(t $(map $commit))
	skip_commit $c $(map $commit)
    done
    g_prev_commit=$commit
}

filter_commit ()
{
  local commit="$1"
  local parent="$2"
  if test "$parent" = "$skip_up_to"; then
    echo 'initial import' | do_commit $commit
    map_suspended_commits $commit
  else
    if [ "$GIT_AUTHOR_NAME" != "$PREV_AUTHOR_NAME" -a -n "$suspended_commit" ];
        then
      #debug committing suspended "$(t $suspended_commit)" "w/parent $(t $parent)"
      get_changelog_hashes $suspended_commit $suspended_parent
      new_msg $suspended_commit \
	| \
        GIT_COMMITTER_NAME="$PREV_COMMITTER_NAME" \
        GIT_COMMITTER_EMAIL="$PREV_COMMITTER_EMAIL" \
        GIT_COMMITTER_DATE="$PREV_COMMITTER_DATE" \
        GIT_AUTHOR_NAME="$PREV_AUTHOR_NAME" \
        GIT_AUTHOR_EMAIL="$PREV_AUTHOR_EMAIL" \
        GIT_AUTHOR_DATE="$PREV_AUTHOR_DATE" \
	do_commit "$suspended_commit" "$parent"
      parent=$(map "$suspended_commit")
      map_suspended_commits $suspended_commit
    fi
    get_changelog_hashes $commit $parent

    if test "$old_changelog" = "$new_changelog" -a $commit != $head \
	-a "$GIT_AUTHOR_NAME" = "$PREV_AUTHOR_NAME"; then
      #debug deferring "$(t $commit)" "($GIT_AUTHOR_NAME)"
      suspended_commit="$commit"
      suspended_parent="$parent"
      skip_commit "$commit" "$parent"
    else
      #debug commit "$(t $commit)" "($GIT_AUTHOR_NAME)"
      suspended_commit=
      new_msg $commit \
	| do_commit $commit "$parent"
      g_prev_commit=$commit
    fi
  fi
  PREV_COMMITTER_NAME="$GIT_COMMITTER_NAME"
  PREV_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL"
  PREV_COMMITTER_DATE="$GIT_COMMITTER_DATE"
  PREV_AUTHOR_NAME="$GIT_AUTHOR_NAME"
  PREV_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL"
  PREV_AUTHOR_DATE="$GIT_AUTHOR_DATE"
}

USAGE="[--original <namespace>] [-d <directory>] [-f | --force] \
[<rev-list options>...]"

OPTIONS_SPEC=
. "$(git --exec-path)/git-sh-setup"

git diff-files --quiet &&
	git diff-index --cached --quiet HEAD -- ||
	die "Cannot rewrite branch(es) with a dirty working directory."

tempdir=.git-rewrite
orig_namespace=refs/original/
force=
while :
do
	case "$1" in
	--)
		shift
		break
		;;
	--force|-f)
		shift
		force=t
		continue
		;;
	-*)
		;;
	*)
		break;
	esac

	# all switches take one argument
	ARG="$1"
	case "$#" in 1) usage ;; esac
	shift
	OPTARG="$1"
	shift

	case "$ARG" in
	-d)
		tempdir="$OPTARG"
		;;
	--original)
		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
		;;
	*)
		usage
		;;
	esac
done

case "$force" in
t)
	rm -rf "$tempdir"
;;
'')
	test -d "$tempdir" &&
		die "$tempdir already exists, please remove it"
esac
mkdir -p "$tempdir/t" || die ""
rmdir "$tempdir/t" || die ""
cd "$tempdir"
tempdir=$(pwd)

# Remove tempdir on exit
trap 'cd ..; rm -rf "$tempdir"' 0

# Make sure refs/original is empty
git for-each-ref > "$tempdir"/backup-refs
while read sha1 type name
do
	case "$force,$name" in
	,$orig_namespace*)
		die "Namespace $orig_namespace not empty"
	;;
	t,$orig_namespace*)
		git update-ref -d "$name" $sha1
	;;
	esac
done < "$tempdir"/backup-refs

ORIG_GIT_DIR="$GIT_DIR"
ORIG_GIT_WORK_TREE="$GIT_WORK_TREE"
ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE"
GIT_WORK_TREE=.
export GIT_DIR GIT_WORK_TREE

# The refs should be updated if their heads were rewritten
if test "$#" != 0; then die "usage: $0"; fi

# Update only the master branch, and all tags.
set master $(git tag -l)
git rev-parse --no-flags --revs-only --symbolic-full-name "$@" |
  sed -e '/^^/d' >"$tempdir"/heads

test -s "$tempdir"/heads ||
	die "Which ref do you want to rewrite?"

ret=0

# map old->new commit ids for rewriting parents
mkdir map || die "Could not create map/ directory"
mkdir m2 || die "Could not create m2/ directory"

git rev-list --reverse --topo-order --parents "$@" ^$skip_up_to > revs ||
	die "Could not get the commits"
commits=$(wc -l <revs | tr -d " ")

test $commits -eq 0 && die "Found nothing to rewrite"

# Rewrite the commits

head=$(git log -1 --pretty=format:%H)
i=0
elided=0
non_elided_commit=
elided_commits=
parent_tree=
while read commit parent blah; do
	test -n "$blah" && die unexpected merge
	i=$(($i+1))
	# Keep the commit subject out of the format string: it may contain "%".
	printf 'Rewrite (%s/%s) %s\n' "$i" "$commits" "$(t $commit)"

	commit_tree=$(git rev-parse "$commit^{tree}")
	# Elide each empty commit.
	if test "$commit_tree" = "$parent_tree"; then
	    elided_commits="$elided_commits $commit"
	    debug "eliding empty $(t $commit) -> p:$(t $parent) $(t $(map $parent))"
	    elided=1
	    continue
	fi

	test $elided = 1 \
	  && parent=$non_elided_commit \
	  || non_elided_commit=$commit
	elided=0

	git cat-file commit "$commit" >commit ||
		die "Cannot read commit $commit"

	eval "$(set_ident AUTHOR <commit)" ||
		die "setting author failed for commit $commit"
	eval "$(set_ident COMMITTER <commit)" ||
		die "setting committer failed for commit $commit"
	mapped_parent=$(map "$parent")
	#debug FC: "$(t $commit) $(t $parent) [$(t $mapped_parent)]"
	filter_commit $commit $mapped_parent

	mapped_parent=$(map "$parent")
	for e in $elided_commits; do
	    #debug "eliding empty $(t $e) -> p:$(t $parent) $(t $mapped_parent)"
	    echo $(map $parent) > m2/$e
	done
	elided_commits=
	parent_tree=$commit_tree
done <revs

# Finally update the refs

_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
echo
while read ref
do
	# avoid rewriting a ref twice
	test -f "$orig_namespace$ref" && continue

	sha1=$(git rev-parse "$ref"^0)
	rewritten=$(m2 $sha1)

	test $sha1 = "$rewritten" &&
		warn "WARNING: Ref '$ref' is unchanged" &&
		continue

	case "$rewritten" in
	'')
		echo "Ref '$ref' was deleted"
		git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
			{ warn "Could not delete $ref"; ret=1; }
	;;
	$_x40)
		echo "Ref '$ref' was rewritten $(t $rewritten)"
		git update-ref -m "filter-branch: rewrite" \
				"$ref" $rewritten ||
			{ warn "Could not rewrite $ref"; ret=1; }
	;;
	*)
		# NEEDSWORK: possibly add -Werror, making this an error
		warn "WARNING: '$ref' was rewritten into multiple commits:"
		warn "$rewritten"
		warn "WARNING: Ref '$ref' points to the first one now."
		rewritten=$(echo "$rewritten" | head -n 1)
		git update-ref -m "filter-branch: rewrite to first" \
				"$ref" $rewritten $sha1 ||
			{ warn "Could not rewrite $ref"; ret=1; }
	;;
	esac
	git update-ref -m "filter-branch: backup" "$orig_namespace$ref" $sha1
done < "$tempdir"/heads

# Save copies of important pieces, in case we want to redo graft.
cp -a ../.git ../.git-pre-graft-backup
b=$(basename "$tempdir")
cp -a "$tempdir" "../$b-backup"

set -e
set -x
git reset --hard master
# List all branches except master ("sed s/..//" strips the two-column marker).
branch_heads=$(git branch|sed s/..//|grep -v '^master$')
# Each non-master branch, $b, is grafted onto the rewritten master below.
# First record a merge base for each branch: with two or more branches,
# original/refs/heads/master disappears after the first graft because
# git filter-branch is run with -f.
mkdir mp
for b in $(echo "$branch_heads"); do
    git merge-base original/refs/heads/master "$b" > "mp/$b"
    debug "merge-base: $b: $(cat "mp/$b")"
done

# For when a branch has no commits or when it is identical to another.
# In that case, we can't use .git/info/grafts (filter-branch would fail).
# Instead, simply update the ref.
rewrite_ref()
{
    local branch_name=$1
    local commit=$2
    b_full=$(git rev-parse --symbolic-full-name "$branch_name")
    git update-ref -m "graft-empty-branch: rewrite" "$b_full" $commit ||
	{ warn "Could not rewrite $b_full to $commit"; ret=1; }
}

graft_file=../.git/info/grafts
touch $graft_file
for b in $(echo "$branch_heads"); do
    debug "grafting branch: $b"
    branch_point=$(cat "mp/$b")

    # This is the commit on "master" that will be the parent.
    mapped_branch_point=$(map $branch_point)
    test -z "$mapped_branch_point" && die "no branch point for $b"

    # Get first commit on the branch.
    first_commit_on_branch=$(git rev-list $branch_point.."$b"|tail -1)
    test -z "$first_commit_on_branch" &&
        { debug "skipping $b; it has no commit of its own"
	  rewrite_ref $b $mapped_branch_point; continue; }

    # If this is a duplicate graft <commit,parent> pair, skip it.
    grep -B1 "^$first_commit_on_branch $mapped_branch_point$" $graft_file &&
        { debug "skipping $b: it is identical to a preceding one"
	  rewrite_ref $b $mapped_branch_point; continue; }

    # Set graft point.
    printf "# %s\n%s %s\n" "$b" \
      $first_commit_on_branch $mapped_branch_point >> $graft_file

    # Filter the branch to make the graft permanent.
    git filter-branch -f $mapped_branch_point..$b
done

cd ..
rm -rf "$tempdir"

trap - 0

unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
test -z "$ORIG_GIT_DIR" || {
	GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
}
test -z "$ORIG_GIT_WORK_TREE" || {
	GIT_WORK_TREE="$ORIG_GIT_WORK_TREE" &&
	export GIT_WORK_TREE
}
test -z "$ORIG_GIT_INDEX_FILE" || {
	GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" &&
	export GIT_INDEX_FILE
}
git read-tree -u -m HEAD

exit $ret
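
The map/ directory is the piece of bookkeeping that the rest of the script leans on, so here is a toy version of it in isolation. It is a sketch with made-up commit ids (old1, new3, ...) and made-up function names (`lookup`, `record`), with no git involved: each old commit id gets a file whose content is the id that stands in for it in the rewritten history, and deferred commits are simply pointed at the commit they were folded into.

```shell
# Toy illustration of the map/ bookkeeping; ids and names are hypothetical.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' 0
mkdir "$workdir/map"

# Look up the rewritten id; fall back to the original when no entry exists.
lookup() {
    if test -r "$workdir/map/$1"; then
        cat "$workdir/map/$1"
    else
        echo "$1"
    fi
}

# Record that old commit $1 is represented by new commit $2.
record() { echo "$2" > "$workdir/map/$1"; }

# old3 was really committed (as new3); old1 and old2 were deferred and
# aggregated into it, so they are mapped to the same new id.
record old3 new3
for c in old1 old2; do
    record "$c" "$(lookup old3)"
done

lookup old1      # prints new3
lookup old3      # prints new3
lookup other     # prints other (never rewritten)
```

This is why the final ref update can resolve a tag that pointed at a deferred commit: its entry already names the aggregated commit.
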

tag-restore:

#!/bin/bash
test $# = 0 || exit 1

orig=/var/tmp/glibc-pristine/.git
public=$HOME/w/co/glibc/.git

trap 'st=$?; rm -rf "$mapdir" && exit $st' 0
trap 'exit $?' 1 2 13 15
mapdir=$(mktemp -d) || exit 1

# Build a table mapping each tagged SHA1 to its list of tag names:
# Note the leading "*" to get the referent of each tag object.
printf 'building SHA1-to-tag-name map using orig repo...\n'
git --git-dir=$orig for-each-ref --shell \
      --format='r=%(refname) tag=${r#refs/tags/} o=%(*objectname)' refs/tags |\
    while read entry; do
        eval "$entry"
        echo "cvs/$tag" >> $mapdir/"$o"
    done

branch_heads=$(git --git-dir=$orig branch|sed s/..//|grep -v '^master$')
for branch in $(echo "$branch_heads"); do
    # Propagate tags on $BRANCH in a pristine, just-converted-from-CVS git
    # repository to the cset-aggregated and grafted public glibc.git.

    # Use an array to map indices 0..N to the corresponding commit-on-orig-branch:
    i=0
    for c in $(git --git-dir=$orig rev-list master..$branch); do
	old_c[$i]=$c
	i=$((i+1))
    done

    # Apply those tags to the $public tree
    export GIT_DIR=$public
    i=0
    n_tagged=0
    n=$(git rev-list master..origin/cvs/$branch|wc -l)
    printf "  propagating tags to the $n-commit branch, $branch...\n"
    for c in $(git rev-list master..origin/cvs/$branch);do
	f=$mapdir/${old_c[$i]}
	tag_list=$(test -r $f && cat $f) &&
	    for t in $(echo "$tag_list"); do
		git tag -f "$t" $c
		n_tagged=$((n_tagged+1))
	    done
	i=$((i+1))
	printf '  %03d/%03d (#t=%d)\r' $i $n $n_tagged
    done
    printf "\n$branch: applied/moved $n_tagged tags\n"
done
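
The tag propagation above pairs commits purely by position: the i-th commit listed for a branch in the pristine repository is taken to correspond to the i-th commit of the same branch in the public one. A git-free sketch of that pairing follows; `pair_tags`, the three-letter ids and the tag name are all invented for illustration, and it prints the `git tag` commands instead of running them. It assumes both lists have the same length and order, which the real script also takes for granted.

```shell
# Hypothetical helper: $1 and $2 are files listing old and new commit ids,
# one per line, in the same order; $3 is a directory holding one file per
# tagged old commit, each listing that commit's tag names.
pair_tags() {
    old_list=$1 new_list=$2 tag_dir=$3
    # paste joins line i of both lists with a TAB; read splits them again.
    paste "$old_list" "$new_list" |
    while read -r old new; do
        # Untagged old commits have no file in $tag_dir.
        test -r "$tag_dir/$old" || continue
        while read -r tag; do
            printf 'git tag -f %s %s\n' "$tag" "$new"
        done < "$tag_dir/$old"
    done
}

demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' 0
printf '%s\n' aaa bbb ccc > "$demo/old"
printf '%s\n' xxx yyy zzz > "$demo/new"
mkdir "$demo/tags"
echo cvs/example-tag > "$demo/tags/bbb"

pair_tags "$demo/old" "$demo/new" "$demo/tags"
# prints: git tag -f cvs/example-tag yyy
```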
