This is the mail archive of the xsl-list@mulberrytech.com mailing list.


Re: Muenchian method on nodes with two or more items for indexing


Sorry about the mismatch between input and output; I oversimplified. Here is a
file that I actually ran through the XSL, with its actual output and the output
I would LIKE to see. As for my source document, it has approximately 6000
entries with another 6000 index items.

Thanks for any help!

Larry


Input:

<?xml-stylesheet type="text/xsl"
href="C:\LL2XML\TransXML2HTML\xml2ReverseIndex2.xsl"?>
<LexicalDatabase>
  <minor>
    <base>'wah 'nabuuysk</base>
    <sense num=" 1">
      <index enc="ENG">unexpected</index>
    </sense>
  </minor>
  <minor>
    <base>'wah wil&#226;ontk</base>
  </minor>
  <major>
    <base>'w&#224;hamaniits'&#224;</base>
    <sense num=" 1">
      <pos>v</pos>
      <def enc="ENG">careless</def>
      <index enc="ENG">careless</index>
    </sense>
  </major>
  <major>
    <base>xbimooksk</base>
    <sense num=" 1">
      <pos>n</pos>
      <def enc="ENG">half-white </def>
      <index enc="ENG">metis</index>
      <index enc="ENG">half-white</index>
      <sense num="1.1">
        <pos>n</pos>
        <def enc="ENG">test</def>
        <index enc="ENG">test</index>
      </sense>
    </sense>
  </major>
  <major>
    <base>xbismsg&#232;&#232;</base>
    <sense num=" 1">
      <pos>v</pos>
      <index enc="ENG">bow your head</index>
      <index enc="ENG">bend down</index>
    </sense>
  </major>
</LexicalDatabase>


XSL:

<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:key name="BaseForm" match="LexicalDatabase/*"
use="concat(base,baseHom)"/>
  <xsl:key name="entries-by-index" match="//LexicalDatabase/*"
use=".//index"/>
  <xsl:template match="/">
    <ReverseEntries>
      <xsl:apply-templates/>
    </ReverseEntries>
  </xsl:template>
  <xsl:template match="LexicalDatabase">
    <xsl:for-each
select="//LexicalDatabase/*[generate-id(.)=generate-id(key('entries-by-index',
.//index))]">
      <xsl:sort select=".//index" order="ascending" />
      <IndexItem>
        <xsl:attribute name="value"><xsl:value-of
select=".//index"/></xsl:attribute>
        <xsl:for-each select="key('entries-by-index', .//index)">
          <!--xsl:sort select="base"/ Should be presorted coming from
LinguaLinks to account for multigraphs-->
          <entry>
            <xsl:attribute name="base"><xsl:value-of
select="base"/></xsl:attribute>
          </entry>
        </xsl:for-each>
      </IndexItem>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
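
A note on why the stylesheet above seems to produce one IndexItem per entry
rather than one per index value: in XSLT 1.0, converting a node-set to a string
(as xsl:value-of and attribute value templates do) uses only the first node in
document order, and generate-id() with a node-set argument likewise looks only
at the first node. So ".//index" behaves like "the first index of this entry".
The diagnostic below is only a sketch to make that visible; the Diagnostics and
Entry element names and their attributes are made up for illustration and are
not part of the original stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <xsl:template match="/LexicalDatabase">
    <Diagnostics>
      <!-- One row per major/minor entry -->
      <xsl:for-each select="*">
        <!-- indexCount: how many index elements the entry really has;
             firstIndex: what ".//index" yields when used as a string
             (only the first index in document order) -->
        <Entry base="{base}"
               indexCount="{count(.//index)}"
               firstIndex="{.//index}"/>
      </xsl:for-each>
    </Diagnostics>
  </xsl:template>
</xsl:stylesheet>
```

For the xbimooksk entry in the input above, indexCount is 3 while firstIndex is
just "metis", which matches the single IndexItem that entry gets in the actual
output.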

Actual Output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
  <IndexItem value="bow your head">
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="careless">
    <entry base="'wàhamaniits'à"/>
  </IndexItem>
  <IndexItem value="metis">
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="unexpected">
    <entry base="'wah 'nabuuysk"/>
  </IndexItem>
</ReverseEntries>

Desired Output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
  <IndexItem value="bend down">  <!-- This is missing above. -->
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="bow your head">
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="careless">
    <entry base="'wàhamaniits'à"/>
  </IndexItem>
  <IndexItem value="half-white">  <!-- This is missing above. -->
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="metis">
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="test">  <!-- This is missing above. Comes from a sense
within another sense. -->
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="unexpected">
    <entry base="'wah 'nabuuysk"/>
  </IndexItem>

</ReverseEntries>
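
One way the desired output might be reached is to key on the index elements
themselves rather than on the entries, pick the first index element for each
distinct value, and then walk back up to the owning major or minor entry. The
stylesheet below is a minimal sketch of that idea, not a tested solution for the
full 6000-entry file. The key name "index-by-value" is made up for this sketch,
and it assumes that entries are always direct children of LexicalDatabase and
that index values need no whitespace normalisation.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Key on each index element by its own text (hypothetical key name) -->
  <xsl:key name="index-by-value" match="index" use="."/>

  <xsl:template match="/LexicalDatabase">
    <ReverseEntries>
      <!-- Muenchian step: keep only the first index element for each
           distinct value, at any sense depth -->
      <xsl:for-each
          select=".//index[generate-id() =
                           generate-id(key('index-by-value', .)[1])]">
        <xsl:sort select="." order="ascending"/>
        <IndexItem value="{.}">
          <!-- From every index with this value, climb to the entry
               (the ancestor that sits directly under LexicalDatabase).
               Entries come out in document order, without duplicates. -->
          <xsl:for-each
              select="key('index-by-value', .)
                      /ancestor::*[parent::LexicalDatabase]">
            <entry base="{base}"/>
          </xsl:for-each>
        </IndexItem>
      </xsl:for-each>
    </ReverseEntries>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input above this should give seven IndexItem elements,
including "bend down", "half-white" and the nested "test". Because the key
lookup replaces the axis scan, it should also scale far better than an
axes-based approach on a large file, though that is untested here.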


----- Original Message -----
From: <Jarno.Elovirta@nokia.com>
To: <xsl-list@lists.mulberrytech.com>
Sent: Friday, September 20, 2002 12:20 AM
Subject: RE: [xsl] Muenchian method on nodes with two or more items for
indexing


> Hi,
>
> > I just tried using an axes method with this problem and it took more than 15
> > minutes to crunch through on a 2 GHz Pentium with lots of RAM. I need to
>
> How big was your source document?
>
> > I have data of the following sort. You will note that minor or major
> > elements and their senses can have one or more index elements.
> >
> > <LexicalDatabase>
> > <minor>
> > <base>'wah 'nabuuysk</base>
> > <sense num=" 1">
> > <index enc="ENG">unexpected</index>
> > </sense>
> > </minor>
> > <minor>
> > <base>'wah wil&#226;ontk</base>
> > </minor>
> > <major>
> > <base>'w&#224;hamaniits'&#224;</base>
> > <sense num=" 1">
> > <pos>v</pos>
> > <def enc="ENG">careless</def>
> > <index enc="ENG">careless</index>
> > </sense>
> > </major>
> > <major>
> > <base>xbimooksk</base>
> > <sense num=" 1">
> > <pos>n</pos>
> > <def enc="ENG">half-white </def>
> > <index enc="ENG">metis</index>
> > <index enc="ENG">half-white</index>
> > </sense>
> > </major>
> > <major>
> > <base>xbismsg&#232;&#232;</base>
> > <sense num=" 1">
> > <pos>v</pos>
> > <index enc="ENG">bow your head</index>
> > <index enc="ENG">bend down</index>
> > </sense>
> > </major>
> > </LexicalDatabase>
> >
> > What I would like to do is output a file that has index elements
> > containing their major or minor entries. It is similar to grouping by last
> > name or city, except that each person could have one, two or more of these.
> > Perhaps "Schools attended" would be a good example. Anyhow, here is a sample
> > of what I would like to output.
> >
> > <IndexList>
> > <IndexItem value="metis">
> > <entry base="xbimooksk" baseHom="" />
> > </IndexItem>
> > <IndexItem value="microwave">
> > <entry base="âànuut" baseHom="2"/>
> > </IndexItem>
> > <IndexItem value="midday">
> > <entry base="nsèèlga sah" baseHom=""/>
> > <entry base="sèèlgyàxsk" baseHom=""/>
> > </IndexItem>
> > <IndexItem value="middle (in the _)">
> > <entry base="lusèèlk" baseHom=""/>
> > <entry base="xts'a" baseHom=""/>
> > </IndexItem>
> > </IndexList>
>
> Your source and desired output don't match (e.g. no "microwave" in
> source), so it's a bit hard to see how it should work.
>
> <xsl:key name="entries-by-index" match="index" use="."/>
>
> <xsl:template match="LexicalDatabase">
>   <IndexList>
>     <xsl:for-each select="*/sense/index[generate-id() =
>         generate-id(key('entries-by-index', .))]">
>       <xsl:sort select="." data-type="text"/>
>       <IndexItem value="{.}">
>         <xsl:for-each select="key('entries-by-index', .)/../../base">
>           <entry base="{.}" baseHom=""/>
>         </xsl:for-each>
>       </IndexItem>
>     </xsl:for-each>
>   </IndexList>
> </xsl:template>
>
> Will get you somewhere, but I didn't understand where the value of baseHom
> comes from.
>
> J - Wumpscut: Deliverance (Alternative Club Mix)
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>