This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



Re: sgml auto-indenter


At 26 Nov 2000 15:21 -0500, Kevin M. Dunn wrote:
 > Several people have discussed the use of tidy to indent sgml and xml
 > sources. It didn't work for my documents, as tidy did not recognize
 > my entities. Rather than fix tidy, I just wrote a perl script to
 > indent anything with sgml-type tags. Only non-empty tags are
 > indented, and text is justified at 80 characters/line (easily
 > changed). Try it out, if you like, and let me know what needs
 > fixing. I am running perl under redhat 6.1.

I did something similar a while ago. My program reads the DTD to work
out which elements have character data content, so it adds newlines
only where they won't affect parsing. Unlike your script, it doesn't
indent.

This program requires Earl Hood's perlSGML package to parse the DTD
and David Megginson's sgmlspm package to do the right thing for the
tags in the instance.
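
As a rough illustration of the newline rule (a standalone sketch, not the attached script), the following uses a hand-written table of content types in place of the DTD lookup and a list of events in place of the parser callbacks. The element names, the `%content_type` table, and the `pretty` function are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical content types, written by hand for this sketch.
# The attached script derives the same three categories from the DTD.
my %content_type = (
    SECTION => 'element',   # children are elements only
    TITLE   => 'mixed',     # contains character data
    PARA    => 'mixed',
    XREF    => 'empty',     # no content and no end tag
);

# Each event is [kind, name-or-text]; this stands in for the parser's
# start-element, end-element, and character-data callbacks.
sub pretty {
    my @events = @_;
    my @stack;              # names of the currently open elements
    my $out = '';
    for my $event (@events) {
        my ($kind, $value) = @$event;
        if ($kind eq 'start') {
            # A newline before a start tag is safe only when the
            # parent has element content.
            $out .= "\n" if @stack && $content_type{$stack[-1]} eq 'element';
            $out .= "<$value>";
            push @stack, $value;
        } elsif ($kind eq 'end') {
            pop @stack;
            # A newline before an end tag is safe only when the element
            # itself has element content.
            $out .= "\n" if $content_type{$value} eq 'element';
            $out .= "</$value>" unless $content_type{$value} eq 'empty';
        } else {
            $out .= $value;   # character data passes through untouched
        }
    }
    return $out;
}

my $result = pretty(
    [start => 'SECTION'],
    [start => 'TITLE'], [text => 'Intro'], [end => 'TITLE'],
    [start => 'PARA'], [text => 'See '],
    [start => 'XREF'], [end => 'XREF'],
    [text => '.'], [end => 'PARA'],
    [end => 'SECTION'],
);
print "$result\n";
```

Newlines appear only directly inside SECTION, which has element content, so the text inside TITLE and PARA is left exactly as it was.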

Regards,


Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

#! /perl/bin/perl.exe
#
# $Id: prettysgml.pl,v 1.7 1999/04/09 12:59:08 tkg Exp $
#
# Perl script to pretty-print an SGML instance.
#
# Requires Earl Hood's perlSGML package to parse the DTD and David
# Megginson's sgmlspm package to do the right thing for the tags
# in the instance.

############################################################
# Requires

# Earl Hood's perlSGML function library
# Change this path if "dtd.pl" is at a different place on your system.
# Forward slashes work on Windows too and match the path style used below.
require $ENV{'TAGLIBBASE'} . '/lib/dtd.pl';

############################################################
# Uses (Perl5 improvement on 'require')
use SGMLS;
use SGMLS::Output;

############################################################
# Constants
#
# Usage statement
$cUsage = <<"EndOfUsage";
Usage:
perl $0 [-xml] TagLib.sgm

where:
 -xml          = Enable XML-specific behavior
  TagLib.sgm   = Tag Library SGML file
EndOfUsage

# Our own, non-reference-concrete-syntax, SGML declaration
$cSGMLDeclaration = $ENV{'TAGLIBBASE'} . '/lib/taglib.dec';

############################################################
# Process command line

while (@ARGV) {
    if ($ARGV[0] =~ /^-/) {
	if ($ARGV[0] =~ /^-xml$/) {
	    $gXML = shift;
	} else {
	    print STDERR $cUsage;
	    die "\nUnknown option:$ARGV[0]:\n";
	}
    } else {
	last;
    }
}

if (!@ARGV) {
    die $cUsage;
}

# Set XML-specific behavior, if required
if ($gXML) {
    &DTDset_xml($gXML);
}

############################################################
# Parse the tag library file
# 

# Whatever remains, however improbable, must be the instance

$gTagLibFile = shift;

open(TAGLIB, "$gTagLibFile") ||
    die "Couldn't open Tag Library file \"$gTagLibFile\": $!\n";

# Read catalog files from SGML_CATALOG_FILES environment variable
&DTDread_catalog_files();

# Read the Tag Library DTD
&DTDread_dtd("main'TAGLIB") ||
    die "perlSGML library couldn't read Tag Library DTD.\n";

close(TAGLIB);

# Reopen the tag library to get the document type declaration
open(TAGLIB, "$gTagLibFile") ||
    die "Couldn't open Tag Library file \"$gTagLibFile\": $!\n";

while(<TAGLIB>) {
    last if /^<[^!]/;

    print $_;
    $gDocumentTypeDeclaration .= $_;
}

close(TAGLIB);

############################################################
# perlSGML processing to define what to do when the sgmlspl part
# works on the tags

# Get our list of elements courtesy of perlSGML
@gElements = &DTDget_elements(0);

foreach $lElement (@gElements) {
    local($lContentModel) = join(":", &DTDget_base_children($lElement, 0));
    local(%lAttributes) = &DTDget_elem_attr($lElement);

    $lElement =~ tr/a-z/A-Z/;

    if ($lContentModel =~ /#PCDATA/i ||
	$lContentModel =~ /^CDATA$/i ||
	$lContentModel =~ /^RCDATA$/i) {
	$gContentType{$lElement} = 'mixed';
    } elsif ($lContentModel =~ /^EMPTY$/i) {
	$gContentType{$lElement} = 'empty';
    } else {
	$gContentType{$lElement} = 'element';
    }

#    print STDERR ":$lElement:$gContentType{$lElement}:$lContentModel:\n";
}

############################################################

sgml('end_subdoc', '');		# Ignore the ends of subdocument entities.

sgml('re', sub {
    output "\n";
});

sgml('pi', sub {
    my $lProcessingInstruction = shift;

    output "<?$lProcessingInstruction>";
});

#sgml('sdata', sub {
#    my $lSDATA = shift;
#
#    output $lSDATA;
#});

sgml('start_element', sub {
    my $lElement = shift;
    my $lParent = $lElement->parent;
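    # Avoid "my ... if": a conditionally declared lexical has undefined
    # behavior in Perl and can keep a stale value between calls.
    my $lParentName = $lParent ne '' ? $lParent->name : '';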

    if ($lParent ne '' && $gContentType{$lParentName} eq 'element') {
	output "\n";
    }

    output "<" . $lElement->name;

    foreach $lAttribute ($lElement->attribute_names) {
	local($lAttributeValue) = $lElement->attribute($lAttribute)->value;

	if (!$lElement->attribute($lAttribute)->is_implied) {
	    # NOTATION and ENTITY values are objects, so print their names
	    my $lType = $lElement->attribute($lAttribute)->type;
	    if ($lType eq 'NOTATION' || $lType eq 'ENTITY') {
		output " $lAttribute=\"" . 
		    $lElement->attribute($lAttribute)->value->name . "\"";
	    } else {
		output " $lAttribute=\"$lAttributeValue\"";
	    }
	}
    }

    output ">";
});

sgml('end_element', sub {
    my $lElement = shift;

    if ($gContentType{$lElement->name} eq 'element') {
	output "\n";
    }
    if ($gContentType{$lElement->name} ne 'empty') {
	output "</" . $lElement->name . ">";
    }
});
    

########################################################################
# SDATA Handler -- Output the entity that we started with
########################################################################

sgml('sdata', sub {
    my $lSDATA = shift;

    $lSDATA =~ s/\[/\&/;
    $lSDATA =~ s/\s*\]/;/;

    output $lSDATA;
});
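
The two substitutions in the SDATA handler can be tried on their own. The sketch below wraps them in a hypothetical helper, `sdata_to_entity`, and assumes the SDATA replacement text has the bracketed `[name ]` form used by the ISO entity sets; strings in any other form would not be converted back correctly.

```perl
use strict;
use warnings;

# Hypothetical helper: turn SDATA replacement text such as "[amp   ]"
# back into the entity reference "&amp;".
sub sdata_to_entity {
    my ($sdata) = @_;
    $sdata =~ s/\[/&/;       # the first "[" becomes "&"
    $sdata =~ s/\s*\]/;/;    # optional padding plus "]" becomes ";"
    return $sdata;
}

for my $sdata ('[amp   ]', '[mdash ]', '[lt]') {
    printf "%-10s -> %s\n", $sdata, sdata_to_entity($sdata);
}
```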


