This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Muenchian method on nodes with two or more items for indexing
- From: larry_hayashi at hotmail dot com
- To: <xsl-list at lists dot mulberrytech dot com>
- Date: Fri, 20 Sep 2002 11:56:59 -0600
- Subject: Re: [xsl] Muenchian method on nodes with two or more items for indexing
- References: <E392EEA75EC5F54AB75229B693B1B6A70E2675@esebe018.ntc.nokia.com>
- Reply-to: xsl-list at lists dot mulberrytech dot com
Sorry about the mismatch between input and output; I oversimplified. Here is a
file that I actually ran through the XSL, with its output and what I would
LIKE to see. As for my source document, it has approximately 6000 entries
with another 6000 index items.
Thanks for any help!
Larry
Input:
<?xml-stylesheet type="text/xsl"
href="C:\LL2XML\TransXML2HTML\xml2ReverseIndex2.xsl"?>
<LexicalDatabase>
<minor>
<base>'wah 'nabuuysk</base>
<sense num=" 1">
<index enc="ENG">unexpected</index>
</sense>
</minor>
<minor>
<base>'wah wilâontk</base>
</minor>
<major>
<base>'wàhamaniits'à</base>
<sense num=" 1">
<pos>v</pos>
<def enc="ENG">careless</def>
<index enc="ENG">careless</index>
</sense>
</major>
<major>
<base>xbimooksk</base>
<sense num=" 1">
<pos>n</pos>
<def enc="ENG">half-white </def>
<index enc="ENG">metis</index>
<index enc="ENG">half-white</index>
<sense num="1.1">
<pos>n</pos>
<def enc="ENG">test</def>
<index enc="ENG">test</index>
</sense>
</sense>
</major>
<major>
<base>xbismsgèè</base>
<sense num=" 1">
<pos>v</pos>
<index enc="ENG">bow your head</index>
<index enc="ENG">bend down</index>
</sense>
</major>
</LexicalDatabase>
XSL:
<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon">
<xsl:output method="xml" encoding="ISO-8859-1"/>
<xsl:key name="BaseForm" match="LexicalDatabase/*"
use="concat(base,baseHom)"/>
<xsl:key name="entries-by-index" match="//LexicalDatabase/*"
use=".//index"/>
<xsl:template match="/">
<ReverseEntries>
<xsl:apply-templates/>
</ReverseEntries>
</xsl:template>
<xsl:template match="LexicalDatabase">
<xsl:for-each
select="//LexicalDatabase/*[generate-id(.)=generate-id(key('entries-by-index',.//index))]">
<xsl:sort select=".//index" order="ascending" />
<IndexItem>
<xsl:attribute name="value"><xsl:value-of
select=".//index"/></xsl:attribute>
<xsl:for-each select="key('entries-by-index', .//index)">
<!--xsl:sort select="base"/ Should be presorted coming from
LinguaLinks to account for multigraphs-->
<entry>
<xsl:attribute name="base"><xsl:value-of
select="base"/></xsl:attribute>
</entry>
</xsl:for-each>
</IndexItem>
</xsl:for-each>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Actual Output:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
<IndexItem value="bow your head">
<entry base="xbismsgèè"/>
</IndexItem>
<IndexItem value="careless">
<entry base="'wàhamaniits'à"/>
</IndexItem>
<IndexItem value="metis">
<entry base="xbimooksk"/>
</IndexItem>
<IndexItem value="unexpected">
<entry base="'wah 'nabuuysk"/>
</IndexItem>
</ReverseEntries>
Desired Output:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
<IndexItem value="bend down"> <!-- This is missing above. -->
<entry base="xbismsgèè"/>
</IndexItem>
<IndexItem value="bow your head">
<entry base="xbismsgèè"/>
</IndexItem>
<IndexItem value="careless">
<entry base="'wàhamaniits'à"/>
</IndexItem>
<IndexItem value="half-white"> <!-- This is missing above. -->
<entry base="xbimooksk"/>
</IndexItem>
<IndexItem value="metis">
<entry base="xbimooksk"/>
</IndexItem>
<IndexItem value="test"> <!-- This is missing above. Comes from a sense within another sense. -->
<entry base="xbimooksk"/>
</IndexItem>
<IndexItem value="unexpected">
<entry base="'wah 'nabuuysk"/>
</IndexItem>
</ReverseEntries>
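For reference, one possible way to get the desired output is sketched here. It follows the suggestion in the quoted reply (key on the index elements themselves rather than on the entries), since xsl:value-of on .//index only ever outputs the first index of an entry. The key name index-by-value is made up for this sketch, version="1.0" is assumed to be enough, and the ancestor step assumes every major/minor entry is a direct child of LexicalDatabase. It has not been run against the full 6000-entry file.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Hypothetical key: index each <index> element by its own text, so
       every index value forms its own group even when one entry has several. -->
  <xsl:key name="index-by-value" match="index" use="."/>

  <xsl:template match="/LexicalDatabase">
    <ReverseEntries>
      <!-- Muenchian step: keep only the first <index> for each distinct value.
           .//index also reaches senses nested inside other senses. -->
      <xsl:for-each
          select=".//index[generate-id() = generate-id(key('index-by-value', .)[1])]">
        <xsl:sort select="."/>
        <IndexItem value="{.}">
          <!-- Climb from each matching <index> to its top-level major/minor
               entry (assumed to be a direct child of LexicalDatabase). -->
          <xsl:for-each
              select="key('index-by-value', .)/ancestor::*[parent::LexicalDatabase]">
            <entry base="{base}"/>
          </xsl:for-each>
        </IndexItem>
      </xsl:for-each>
    </ReverseEntries>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input above this should yield one IndexItem per distinct index value, including "test" from the nested sense. If index values can carry stray whitespace, normalize-space(.) in the key's use attribute may be worth trying.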
----- Original Message -----
From: <Jarno.Elovirta@nokia.com>
To: <xsl-list@lists.mulberrytech.com>
Sent: Friday, September 20, 2002 12:20 AM
Subject: RE: [xsl] Muenchian method on nodes with two or more items for indexing
> Hi,
>
> > I just tried using an axes method with this problem and it took more than 15
> > minutes to crunch through on a 2 GHz Pentium with lots of RAM. I need to
>
> How big was your source document?
>
> > I have data of the following sort. You will note that minor or major
> > elements and their senses can have one or more index elements.
> >
> > <LexicalDatabase>
> > <minor>
> > <base>'wah 'nabuuysk</base>
> > <sense num=" 1">
> > <index enc="ENG">unexpected</index>
> > </sense>
> > </minor>
> > <minor>
> > <base>'wah wilâontk</base>
> > </minor>
> > <major>
> > <base>'wàhamaniits'à</base>
> > <sense num=" 1">
> > <pos>v</pos>
> > <def enc="ENG">careless</def>
> > <index enc="ENG">careless</index>
> > </sense>
> > </major>
> > <major>
> > <base>xbimooksk</base>
> > <sense num=" 1">
> > <pos>n</pos>
> > <def enc="ENG">half-white </def>
> > <index enc="ENG">metis</index>
> > <index enc="ENG">half-white</index>
> > </sense>
> > </major>
> > <major>
> > <base>xbismsgèè</base>
> > <sense num=" 1">
> > <pos>v</pos>
> > <index enc="ENG">bow your head</index>
> > <index enc="ENG">bend down</index>
> > </sense>
> > </major>
> > </LexicalDatabase>
> >
> > What I would like to do is output a file that has index elements
> > containing their major or minor entries. It is similar to grouping by last
> > name or city, except that each person could have one, two or more of these.
> > Perhaps "Schools attended" would be a good example. Anyhow, here is a sample
> > of what I would like to output.
> >
> > <IndexList>
> > <IndexItem value="metis">
> > <entry base="xbimooksk" baseHom="" />
> > </IndexItem>
> > <IndexItem value="microwave">
> > <entry base="âànuut" baseHom="2"/>
> > </IndexItem>
> > <IndexItem value="midday">
> > <entry base="nsèèlga sah" baseHom=""/>
> > <entry base="sèèlgyàxsk" baseHom=""/>
> > </IndexItem>
> > <IndexItem value="middle (in the _)">
> > <entry base="lusèèlk" baseHom=""/>
> > <entry base="xts'a" baseHom=""/>
> > </IndexItem>
> > </IndexList>
>
> Your source and desired output don't match (e.g. no "microwave" in
> source), so it's a bit hard to see how it should work.
>
> <xsl:key name="entries-by-index" match="index" use="."/>
>
> <xsl:template match="LexicalDatabase">
> <IndexList>
> <xsl:for-each select="*/sense/index[generate-id() = generate-id(key('entries-by-index', .))]">
> <xsl:sort select="." data-type="text"/>
> <IndexItem value="{.}">
> <xsl:for-each select="key('entries-by-index', .)/../../base">
> <entry base="{.}" baseHom=""/>
> </xsl:for-each>
> </IndexItem>
> </xsl:for-each>
> </IndexList>
> </xsl:template>
>
> Will get you somewhere, but I didn't understand where the value of baseHom
> comes from.
>
> J - Wumpscut: Deliverance (Alternative Club Mix)
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list