This is the mail archive of the xsl-list@mulberrytech.com mailing list .



RE: (text processing) lexical context


Thank you all!
I'll try all those solutions and see which one fits best.
I cannot tokenize sentences because the docs I work on are far more
complex, and it would give results such as:
<p><sentence></p><p></sentence><sentence></sentence></p>
Otherwise I would have to write a complex script.

> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com] On Behalf Of Michael Kay
> Sent: Wednesday 24 April 2002 10:32
> To: xsl-list@lists.mulberrytech.com
> Subject: RE: [xsl] (text processing) lexical context
> 
> One other piece of advice (somewhat heretical for this list): XSLT is not
> the only tool in your kitbag. In fact, where you want to identify structure
> in the source that's not explicit in the markup, XSLT is often not the best
> tool for the job.
> 
> You could probably tackle this one more easily by writing a SAX filter that
> inserts a <sentence> start tag immediately after <root>, a </sentence> end
> tag immediately before </root>, and a </sentence><sentence> pair immediately
> after a "." that's followed by whitespace.
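
The SAX filter described above could be sketched roughly as follows. This is only an illustration in Python's standard xml.sax module, not code from the thread: the class name SentenceFilter and the helper add_sentences are hypothetical, and the sketch adds one assumption of its own, namely that a boundary is only recognised in text sitting directly inside <root>, so that a "." inside a nested element never produces overlapping tags.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Sentence boundary rule from the message: a "." that is followed by whitespace.
BOUNDARY = re.compile(r"\.(?=\s)")
NO_ATTRS = AttributesImpl({})


class SentenceFilter(XMLFilterBase):
    """Hypothetical filter that wraps the content of <root> in <sentence> elements."""

    def __init__(self, parent, root_name="root"):
        super().__init__(parent)
        self.root_name = root_name
        self.depth = 0         # number of currently open source elements
        self.wrapping = False  # True once the document element is <root>
        self.pending = []      # character data buffered since the last tag

    def _emit_text(self):
        text = "".join(self.pending)
        self.pending = []
        if not text:
            return
        if not (self.wrapping and self.depth == 1):
            # Text inside nested elements is passed through untouched.
            super().characters(text)
            return
        pos = 0
        for match in BOUNDARY.finditer(text):
            super().characters(text[pos:match.end()])
            super().endElement("sentence")
            super().startElement("sentence", NO_ATTRS)
            pos = match.end()
        if pos < len(text):
            super().characters(text[pos:])

    def startElement(self, name, attrs):
        self._emit_text()
        super().startElement(name, attrs)
        self.depth += 1
        if self.depth == 1 and name == self.root_name:
            self.wrapping = True
            super().startElement("sentence", NO_ATTRS)

    def endElement(self, name):
        self._emit_text()
        if self.wrapping and self.depth == 1:
            super().endElement("sentence")
        self.depth -= 1
        super().endElement(name)

    def characters(self, content):
        # Parsers may deliver text in several chunks, so buffer until the next tag.
        self.pending.append(content)

    def processingInstruction(self, target, data):
        self._emit_text()
        super().processingInstruction(target, data)


def add_sentences(xml_text):
    """Run the filter over an XML string and return the serialised result."""
    out = io.StringIO()
    filt = SentenceFilter(xml.sax.make_parser())
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Note that if the last "." in <root> is followed by whitespace (for example a newline before </root>), this sketch, like the rule as stated, leaves a trailing <sentence> that contains only whitespace; a later step would have to drop it.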
> 
> Michael Kay
> Software AG
> home: Michael.H.Kay@ntlworld.com
> work: Michael.Kay@softwareag.com
> 
> > -----Original Message-----
> > From: owner-xsl-list@lists.mulberrytech.com
> > [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of cutlass
> > Sent: 24 April 2002 09:04
> > To: xsl-list@lists.mulberrytech.com
> > Subject: Re: [xsl] (text processing) lexical context
> >
> >
> > Hello Nicolas,
> >
> > ----- Original Message -----
> > From: "Nicolas Mazziotta" <Nicolas.Mazziotta@ulg.ac.be>
> >
> > > <root>
> > > This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
> > > <i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
> > > </root>
> >
> > This particular form of markup keeps cropping up over and over again,
> > and I suspect that most people will tell you that it is not so good.
> > The main problem with this type of markup is that it tends to be rather
> > open-ended, e.g. there could be a variety of elements, nesting
> > structures, etc.
> >
> > > <html>
> > > <ol>
> > > <li>first: This is the <b>first</b> <i>sentence</i>.
> > > <li>Second: This is the <w>second</b> <i>sentence</i>.
> > > <li>Third: This is the <b>third</b> <i>sentence</i>.
> > > </ol>
> > > </html>
> > >
> >
> > I am assuming you made an error with the opening <w> in the second
> > sentence?
> >
> > Right, so you want to
> >
> > a) tokenize each sentence
> > b) number with words (i.e. First, Second, Third)
> > c) copy all child elements within a sentence across
> > d) replace elements with other elements
> >
> > There are a few approaches:
> >
> > - You are doing too much in one transform. Yes, it is possible to have
> > one large, complicated transform, but why not break it up into small
> > steps so you can conceptualise it?
> >
> > - You can either tokenise each sentence by customising the string
> > tokenise function (available in many places, one of them being
> > www.exslt.org) and splitting on each period,
> >
> > - or, I suspect, this is a rather good use of Dimitre Novatchev's
> > functional library at www.topxml.com.
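
To make the "split on a period" idea concrete, here is a minimal sketch in Python rather than with the EXSLT function itself; split_sentences is a hypothetical helper. It works on a plain string, so any inline markup has already been flattened away, which is exactly the limitation the original question runs into.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows a period; each period stays with its sentence."""
    parts = re.split(r"(?<=\.)\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences(
    "This is the first sentence. This is the second sentence. "
    "This is the third sentence."
)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```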
> >
> > Both approaches will require a little investment in learning.
> >
> > The other stuff, like copying or replacing elements and numbering with
> > words, will come after you get over the first step.
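
Once the sentences are marked up, the remaining steps (numbering with words, copying child elements, replacing <w> with <b>) become a small separate pass. The sketch below is one possible way to do that pass in Python with xml.etree.ElementTree, assuming the input already contains <sentence> elements; ORDINALS, RENAME and sentences_to_list are hypothetical names introduced for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical lookup tables: ordinal words for numbering, and element renames.
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]
RENAME = {"w": "b"}


def sentences_to_list(root):
    """Build <html><ol><li>...</li></ol></html> from the <sentence> elements in root."""
    html = ET.Element("html")
    ol = ET.SubElement(html, "ol")
    for index, sentence in enumerate(root.iter("sentence")):
        li = copy.deepcopy(sentence)  # copies text, children and their tails
        li.tag = "li"
        li.tail = None
        for element in li.iter():
            element.tag = RENAME.get(element.tag, element.tag)
        # Fall back to a plain number when the word list runs out.
        label = ORDINALS[index] if index < len(ORDINALS) else str(index + 1)
        li.text = label + ": " + (li.text or "").lstrip()
        ol.append(li)
    return html


source = ET.fromstring(
    "<root>"
    "<sentence>This is the <w>first</w> <i>sentence</i>.</sentence>"
    "<sentence> This is the <w>second</w> <i>sentence</i>.</sentence>"
    "</root>"
)
print(ET.tostring(sentences_to_list(source), encoding="unicode"))
```

Because each sentence is copied as a subtree, the <i> elements survive, unlike with a string-valued approach.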
> >
> > gl, jim fuller
> >
> >
> > > But I can't figure out how I can select the text surrounding the <w>
> > > element without using <xsl:value-of.../>, which does not allow me to
> > > process the following <i> element...
> > >
> > > i.e., I get
> > >
> > > <html>
> > > <ol>
> > > <li>first: This is the <b>first</b> sentence.
> > > <li>Second: This is the <w>second</b> sentence.
> > > <li>Third: This is the <b>third</b> sentence.
> > > </ol>
> > > </html>
> > >
> > > and the <i> element is lost...
> > >
> > > And I can't do <xsl:template match="substring(...)"> because
> > > substring is not a DOM node.
> > >
> > > Help: is there a way to process substrings or something?
> > >
> > > N. Mazziotta
> > >
> > >
> > >  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> >

