This is the mail archive of the
docbook@lists.oasis-open.org
mailing list for the DocBook project.
Re: Q: how to store articles DOI numbers?
- To: docbook at lists dot oasis-open dot org
- Subject: Re: DOCBOOK: Q: how to store articles DOI numbers?
- From: Terry Allen <tallen at sonic dot net>
- Date: Fri, 24 Mar 2000 09:34:56 -0800 (PST)
- Reply-To: docbook at lists dot oasis-open dot org
I attach an article on DOIs. It appears that we need to
provide for multiple identifiers; we have ISBN and ISSN elements, but
a better solution might be:
<BiblioIdentifier type="doi">afdsafdjsakf;jdsa</>
<BiblioIdentifier type="isbn">fdjsakfjd;lakfjd</>
and we can add to the list of types upon request. I feel sure we'll
see more types in the future.
regards, Terry
http://www.elsevier.co.jp/inca/homepage/about/diginfo/Menu.shtml
<html><head>
<title>General Information</title>
<style type="text/css">
<!--
BODY { font-family: Times,Helvetica; }
TD { font-family: Times,Helvetica; }
DL { font-family: Times,Helvetica; }
P { font-family: Times,Helvetica; }
UL { font-family: Times,Helvetica; }
H1 { font-family: Times,Helvetica; }
FORM { font-family: Times,Helvetica; }
A { font-family: Times,Helvetica; text-decoration: none; }
-->
</style>
</head><body bgcolor="#ffffff">
<a name="Go to top">
<p><font size="5"><b>Digital Information Objects and the STM Publisher</b></font></p>
<p><i>Reproduced from STM Annual Report, 1997</i></p>
<p><b>Introduction</b></p>
<p>This review summarises activities during the past year (to
August 1997) of relevance to STM publishers in defining <i>standards
for identifying digital information objects</i>, and <i>applications
of such standards in electronic publishing</i>. Additional
background information is available in two other documents
published this year:</p>
<ul>
<li>A brief introduction to the topic of identifiers,
recently updated by the authors: <i>Unique Identifiers: a
brief introduction, by Brian Green and Mark Bide. </i>[BIC;
March 1997 <a
href="http://www.bic.org.uk/bic/uniquid.html">http://www.bic.org.uk/bic/uniquid.html</a>]</li>
<li>A more extensive review, expanded from the paper
distributed earlier this year as an insert with STM
Newsletter 101 and since published in both paper and
electronic forms: this contains a full glossary of terms
and detailed references: <i>Information Identifiers, by
Norman Paskin</i>, [Learned Publishing, Vol. 10 No. 2, pp
135-156 (April 1997); also available at <a
href="/locate/infoident">http://www.elsevier.nl/locate/infoident</a>].</li>
</ul>
<p><b>Identifiers, document computing and electronic commerce</b></p>
<p>Information identifiers are of interest because of their
potential applications. A core concept is the distinction between
<i>"simple"</i> ("dumb",
"unintelligent" or "meaningless") identifiers
on the one hand, and "compound"
("intelligent" or "meaningful") identifiers
on the other. Simple identifiers are only a unique label for a
digital object; compound identifiers also contain other
information (<i>metadata</i>) which conveys some additional facts
such as location, format, owner, etc. Simple identifiers can also
be used to provide such information about the object they
identify, by using them to point to repositories of metadata.
These additional pieces of information about a digital object act
as hooks for other actions; in an electronic environment these
other actions typically include format and presentation
instructions (<i>document computing</i>) and rights and sales
transactions (<i>electronic commerce</i>).</p>
<p>Whilst there continues to be active discussion of simple
identifiers (in particular, PII and ISWC), much activity is
currently on potential compound identifiers (DOI, URNs, etc.).
The requirements imposed on a compound identifier for storing
metadata have consequences for the identifier itself: a complete
understanding of the topic of identifiers therefore takes us into
areas of mark-up, multimedia rights clearance systems, and
electronic commerce. </p>
<p><i>Mark-up </i>developments are briefly covered here only in
the context of relevance to identifiers. <i>Multimedia rights
clearance systems</i> are the subject of a number of initiatives,
including EC schemes such as Imprimatur [<a
href="http://www.imprimatur.alcs.co.uk/expert.htm">http://www.imprimatur.alcs.co.uk/expert.htm</a>]
and recently the EC MMRCS project within Info 2000 managed by
PIRA. [<a href="http://www2.echo.lu/info2000/en/infowkpg.html">http://www2.echo.lu/info2000/en/infowkpg.html</a>];
they will not be discussed here. </p>
<p><i>Electronic commerce systems</i> are likely to be determined
by banks and other institutions; publishers need not become
involved in their development but will wish to use proven
systems. A frontrunner is the VISA/MasterCard SET (Secure
Electronic Transaction) proposal of 1996 [<a
href="http://www.rsa.com/set/">http://www.rsa.com/set/</a>],
which aims to have system availability in 40 countries by the end
of 1997, although this now looks optimistic as considerable
problems (due to set-up complexity and transaction times) were
reported in July 1997 by a number of banks currently trialling
SET 1.0. The World Wide Web Consortium (W3C) activity Joint
Electronic Payment Initiative (JEPI) has now been scaled down to
an Interest Group for Electronic Commerce, which had a first
meeting in April 1997 and is currently awaiting member input
regarding next steps for a meeting in September 1997. [<a
href="http://www.w3.org/Payments/Activity">http://www.w3.org/Payments/Activity</a>]
</p>
<p><b>PII: Publisher Item Identifier</b></p>
<p>PII, introduced in 1995 [<a
href="/inca/homepage/about/pii/">http://www.elsevier.nl/inca/homepage/about/pii/</a>]
by the STI group of publishers, remains in active use by
publishers participating in its origination and others (e.g.
American Mathematical Society). Amongst related information
users, ISI are actively considering the use of PII in their
abstracting and indexing services. Several publishers adopting
PII have stated that they intend to use PII as the
publisher-assigned portion of future potential schemes such as
DOI. PII provides an easy-to-use simple identifier which can be
integrated into compound identifiers, and has the advantage of an
ASCII alphanumeric character syntax (e.g. S016538069600403) which
poses no problems for exchange protocols or naming conventions.</p>
<p>It is worth recapping why those publishers who originated the
PII continue to actively use and support it. The PII originators
required an identifier that was short enough to be useful in
document ordering; version 1 of SICI, in effect at the time PII
was established, was grounded in print (page number etc.),
whereas something was needed which worked for electronic
information; and the latest SICI, DOI and URN developments had
not been formally initiated (arguably PII activities spurred them
on, as intended by PII participants). PII remains an effective
and easy-to-implement simple identifier for use within a
publisher's system or for exchange between defined parties; it
also provides a very good basis for integration into the compound
identifiers and systems now being considered for usages such as
rights control and electronic commerce. The PII originators are
currently considering whether extensions to PII to allow for
specification of components to an arbitrary level of granularity
would be a useful recommendation, and if so how this might be
accomplished.</p>
<p>The question has been raised of whether the Year 2000
compliance issue (Y2K or millennium problem of computer data
systems) has any consequences for PII: it does not. A date cannot
be derived from a PII so the Y2K issue is irrelevant. PII, when
used to identify a serial item, may contain as its ninth and
tenth numerical characters two digits derived from the year of
publication (a recommendation made by the PII originators simply
as one way to derive a unique number for any serial item).
However, because PII is a simple (meaningless) identifier, it
cannot be reverse engineered (i.e. meaning cannot be attributed
to individual subsequences from the PII). This is clear if a
publisher opts to use another convention to derive unique
numbers, e.g. assigning the ninth and tenth characters as 01 for
the first year of PII usage, 02 for the next and so on. In theory
there will be an analogous problem after 99 years of usage of the
PII, but it is assumed that by that time other solutions will be
available.</p>
<p><b>ISWC: International Standard Work Code</b></p>
<p>The International Standard Work Code (ISWC) is a proposal made
by CISAC to ISO in September 1996. The ISWC is currently defined
and in use within CISAC for musical works, but is not a formal
ISO standard. The proposal is to extend the scope of the CIS
(Common Information System) to works such as articles and
documents and formalise this as a standard related to other ISO
standards such as ISBN, ISSN, ISMN, etc. ISWC is itself a simple
identifier; it gains intelligence from its linkage to metadata
held elsewhere in the CIS model such as an author (composer)
database etc. [<a href="http://www.cisac.org/iswcfly.htm">http://www.cisac.org/iswcfly.htm</a>]</p>
<p>As used currently (for musical works) each ISWC is made up of
the letter "T" followed by nine digits and a check
digit e.g. ISWC T-034.524.680-1. The components of the ISWC do
not have meaning and the punctuation is for readability only. The
proposal is to create ISWCs for other kinds of works with a
different letter prefix - "L" for literary works and "S"
for scientific works
(definitions of which have not been given). L and S codes
currently have no formal status other than as items under
discussion by ISO.</p>
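<p>The layout described above can be checked mechanically. The following is a minimal Python sketch, assuming only what the text states: the letter "T", nine digits and a check digit, with punctuation that carries no meaning. The function names are illustrative, and the check digit is carried along but not verified, because its algorithm is not given here.</p>

```python
import re

# Assumed layout, taken only from the description above: the letter "T",
# nine digits, then one check digit. Dots and hyphens are for readability.
ISWC_PATTERN = re.compile(r"T\d{9}\d")


def normalise_iswc(text: str) -> str:
    """Strip the label and readability punctuation; return the bare code."""
    bare = text.upper().replace("ISWC", "")
    for mark in ("-", ".", " "):
        bare = bare.replace(mark, "")
    if ISWC_PATTERN.fullmatch(bare) is None:
        raise ValueError(f"not a musical-work ISWC: {text!r}")
    return bare


def display_iswc(bare: str) -> str:
    """Re-insert the conventional punctuation: T-DDD.DDD.DDD-C."""
    return f"{bare[0]}-{bare[1:4]}.{bare[4:7]}.{bare[7:10]}-{bare[10]}"


print(normalise_iswc("ISWC T-034.524.680-1"))
print(display_iswc("T0345246801"))
```

<p>Storing the bare form and adding punctuation only for display keeps comparisons simple, since two differently punctuated copies of the same code normalise to the same string.</p>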
<p>In May 1997 ISO began to consider this proposal as Work Item
15707: Information and documentation - International Standard
Work Code (ISWC) within ISO TC 46/SC 9 and established a Working
Group. Information is available on the ISO web site [<a
href="http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm">http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm</a>].
The purpose of ISWC is "to
provide a means of uniquely identifying intellectual properties,
primarily for applications related to the administration of
copyright and for use within computer databases and related
documentation. The ISWC may be used in conjunction with existing
international identification systems for published materials
(e.g. ISBN, ISRC, etc.) but it is not intended to be an
alternative nor a substitute for those identifiers". The stated target
date for final publication of an approved standard is April 2000,
although attempts to speed up this timetable would be welcomed by
all affected. </p>
<p><b>SICI: Serial Item and Contribution Identifier</b></p>
<p>Although approved in August 1996, the revised Serial Item and
Contribution Identifier (SICI) was not published until April 1997
[ANSI/NISO Z39.56-1996 (Version 2) ISSN: 1041-5653]. The new
availability in this standard of SICI mechanisms for
non-paginated items (or for other identifier systems) in the
CSI-3 format greatly enhances the usefulness of the SICI to the
information industries.</p>
<p>A complementary standard for book items using a similar
methodology (BICI: Book Item and Contribution Identifier) was
formally proposed in April 1997 and is under consideration by
NISO for adoption as a standard. [<a
href="http://www.bic.org.uk/bic/bici.html">http://www.bic.org.uk/bic/bici.html</a>]</p>
<p>The use of SICIs in Internet-based systems may be complicated
by issues of character transmission: the standard naming
conventions for internet objects and resources exclude or
restrict the use of some characters (e.g. URN syntax excludes
angle brackets, square brackets, back slash). A typical SICI
contains some of these (e.g.
0015-6914(19950605)+<>1.0.TX;2-8). Although there are
work-arounds to enable the transmission of such characters, there
may be a loss of transparency to the user. Issues such as
these may well be dealt with as part of the DOI initiative which
encounters the same problem.</p>
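<p>The article does not name a specific work-around. One common option, assumed here purely for illustration, is percent-encoding, which Python's standard <code>urllib.parse</code> module provides. The sketch below encodes the SICI quoted above and shows that decoding restores it, which also illustrates the loss of transparency: the encoded form is harder for a person to read.</p>

```python
from urllib.parse import quote, unquote

# The SICI quoted in the text above.
sici = "0015-6914(19950605)+<>1.0.TX;2-8"

# safe="" asks quote() to escape every character outside its always-safe
# set (ASCII letters, digits, and the four characters _ . - ~).
encoded = quote(sici, safe="")
print(encoded)

# Decoding restores the original identifier exactly.
decoded = unquote(encoded)
print(decoded == sici)
```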
<p><b>DOI: Digital Object Identifier</b></p>
<p>The Association of American Publishers has designed a system
for marking digital objects in order to facilitate electronic
commerce and enable copyright management systems. That system,
called the Digital Object Identifier System, is now under
development, in partnership with the Corporation for National
Research Initiatives (using the CNRI-developed Internet Handle
technology), and is expected to be live on a limited scale in
August, 1997. An internet web site is being maintained with
complete and up to date information about that initiative and
directions for further development of the DOI in the future [<a
href="http://www.doi.org">http://www.doi.org</a>]. </p>
<p>An extensive prototype system has been developed using data
from five publishers which will be extended and demonstrated in
Frankfurt in October 1997. Over 200,000 DOIs have been easily
assigned by publishers participating in the prototype, and
algorithms for automated DOI generation have been developed.
Links to metadata (in Warwick Framework form) are under
consideration; guidelines for creators, publishers and
information providers have been drafted [<a
href="http://www.handle.net/doi-prototype">http://www.handle.net/doi-prototype</a>].
</p>
<p>A DOI will consist of two portions: a <i>prefix</i>
defining where to go for further information, and a <i>suffix</i>
identifying a particular object. Viewed in this way, a DOI
becomes a routing slip on the Internet carrying a ticket
identifying a particular item at its destination. The DOI suffix
will probably be (wholly or in part) an existing identifier
rather than a new scheme; in practice DOI should be able to
accommodate any scheme already in use, becoming interoperable
with "legacy" systems. Thus the DOI
suffix will not be a single format but any of a number of
alternative suffixes including PII, SICI, ISWC, ISRC, etc.</p>
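<p>The two-part structure can be sketched in Python as follows. The article does not specify how the two portions are delimited, so the "/" separator is an assumption, and the prefix "10.9999" is invented for illustration; the suffix reuses the PII quoted earlier. The names <code>Doi</code> and <code>split_doi</code> are hypothetical.</p>

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # routing part: where to go for further information
    suffix: str  # identifies the particular object at that destination


def split_doi(doi: str, separator: str = "/") -> Doi:
    """Split at the FIRST separator only; the suffix may itself contain it."""
    prefix, found, suffix = doi.partition(separator)
    if not found or not prefix or not suffix:
        raise ValueError(f"expected prefix{separator}suffix, got {doi!r}")
    return Doi(prefix, suffix)


# Hypothetical DOI: an invented prefix followed by the PII quoted earlier.
example = split_doi("10.9999/S016538069600403")
print(example.prefix, example.suffix)
```

<p>Splitting only at the first separator leaves the suffix untouched, which matters if an existing identifier used as the suffix contains the separator character itself.</p>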
<p>There are still a number of issues to be resolved, among which
are:</p>
<p>- DOI interoperability with as wide a range of existing
identifier schemes as possible. Among these SICI is considered
essential, yet the Handle technology is an application of the URN
system; as mentioned earlier, current concept definitions of URNs
do not allow use of some characters which are used in SICIs.
Representatives of W3C/IETF have been involved in this issue,
which it is now believed can be readily resolved.</p>
<p>- The governance and commercial control of such a scheme.</p>
<p>- The funding of an operational scheme: suggestions include
creating a body which would recover costs from DOI directory or
number usage.</p>
<p>- The operational issues of such a scheme, such as numbering
agencies, directory services, etc.; an agency which assigns a
number and a directory manager which runs the routing system are
separate functions, even if handled by the same organization.</p>
<p>A recent development is the concept of an ISDI (<i>International
Standard Document Identifier</i>) introduced by NISO at an
informal working group convened in June 1997. This describes the
"identification piece" (the suffix) of the
proposed DOI system. (That meeting did not concern itself with
the trading or registration aspects of the DOI initiative, the
prefix). The "identification
piece" has been
referred to by NISO as ISDI as a generic descriptive term, not
(as the name could imply) another standard: ISDI currently has no
formal status as a standard or proposed standard. At the June
1997 meeting in Washington DC, a preliminary conclusion was that
such an ISDI would need to carry at minimum the following:</p>
<p>- an agency identifier (the agency/registry assigning or
storing the object);</p>
<p>- an identifier type (categories such as SICI, BICI, ISRC,
etc.);</p>
<p>- an indication of the name of the assigner of the identifier
(i.e. the publisher);</p>
<p>- the identifier itself;</p>
<p>- a check digit (to be determined if this is needed).</p>
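<p>As a data structure, the five elements listed above might look like the following Python sketch. This is not a defined format: ISDI has no formal status, the class and field names are invented here, and the agency and publisher values are placeholders.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IsdiSketch:
    """Hypothetical record for the five elements listed above."""
    agency: str            # agency/registry assigning or storing the object
    identifier_type: str   # category such as "SICI", "BICI", "ISRC"
    assigner: str          # name of the assigner, i.e. the publisher
    identifier: str        # the identifier itself
    check_digit: Optional[str] = None  # the text leaves open whether it is needed

    def fields_present(self) -> list:
        """Names of the elements that carry a value."""
        return [name for name, value in vars(self).items() if value is not None]


# Placeholder agency and publisher; the SICI is the one quoted earlier.
record = IsdiSketch(
    agency="example-agency",
    identifier_type="SICI",
    assigner="Example Publisher",
    identifier="0015-6914(19950605)+<>1.0.TX;2-8",
)
print(record.fields_present())
```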
<p>NISO has recommended that only ISDIs be used in the
identification piece (the suffix) of the DOI.</p>
<p>It is not yet clear whether an ISDI is anything more than a
description of the DOI suffix syntax, and if so who should be the
prescriptive authority. Discussions are continuing between NISO
and those involved in the DOI and other activities; at the time
of writing there is no formal position statement on ISDI.</p>
<p>DOI promises to bring together activities on internet routing
of information (Uniform Resource addressing technology) and
practical assignment by publishers of information identifiers
(PII, SICI, etc) into a working model for publishers. </p>
<p><b>STM activities</b></p>
<p>STM and IPA have together convened an Information Identifiers
Committee, chaired by Charles Ellis (Wiley), tasked with
facilitating an international consensus within the publishing
industry on a standard system (or systems) for identification and
application of digital information objects. The committee
includes a wide range of industry expertise, including
individuals representing PII, DOI, SICI and ISWC activities. </p>
<p>An initial statement [<a
href="http://ww.ipa-uie.org/ipa_iic.html">http://ww.ipa-uie.org/ipa_iic.html</a>]
was issued by the STM/IPA Committee in May 1997 supporting the
concept of the DOI, encouraging IPA and STM members and other
organizations to support and play an active role in its
development. Further recommendations are expected following
Frankfurt 1997.</p>
<p><b>Uniform Resource addressing</b></p>
<p>Internet technology is particularly relevant for electronic
interchange of digital objects, as in the case of DOI. Work on
extending the various definitions and standards for Uniform
Resource addressing has recently been transferred from IETF to
the W3C (World Wide Web consortium): [<a
href="http://www.w3.org/pub/WWW/Addressing/Activity#as-h2-5794">http://www.w3.org/pub/WWW/Addressing/Activity#as-h2-5794</a>]</p>
<p>Unfortunately there is still much confusion caused by careless
use or misunderstanding of various addressing terms, summarised
in table 1:</p>
<p>Table 1: Uniform Resource Addressing</p>
<table border="1" cellpadding="8" width="601"
bordercolor="#000000">
<tr>
<td width="33%"><p align="left"><font size="2">URI
(Uniform Resource Identifier)</font></p>
</td>
<td width="67%"><p align="left"><font size="2">the
generic set of all names/addresses that are short strings
that refer to resources.</font></p>
</td>
</tr>
<tr>
<td width="33%"><p align="left"><font size="2">URL
(Uniform Resource Locator)</font></p>
</td>
<td width="67%"><p align="left"><font size="2">the set of
URI schemes that have explicit instructions on how to
access the resource on the internet.</font></p>
</td>
</tr>
<tr>
<td width="33%"><p align="left"><font size="2">URN
(Uniform Resource Name)</font></p>
</td>
<td width="67%"><p align="left"><font size="2">(1) a URI
that has an institutional commitment to persistence,
availability, etc. (may also be a URL, e.g. PURL)</font></p>
<p align="left"><font size="2">(2) A particular scheme
which is currently under development in the W3C and IETF
which should provide for the resolution using internet
protocols of names which have a greater persistence than
that currently associated with internet host names or
organizations. When defined, a URN(2) will be an example
of a URI. </font></p>
</td>
</tr>
<tr>
<td width="33%"><p align="left"><font size="2">URC
(Uniform Resource Citation, or Uniform Resource
Characteristics)</font></p>
</td>
<td width="67%"><p align="left"><font size="2">A set of
attribute/value pairs describing a resource. Some of the
values may be URIs of various kinds. Others may include,
for example, authorship, publisher, datatype, date,
copyright status and shoe size: a set of fields and
values with some defined free formatting. </font></p>
</td>
</tr>
</table>
<p align="left"><font size="2"><i>Based on information from </i></font><a
href="http://www.w3.org/pub/WWW/Addressing/Addressing.html"><font
size="2"><i>http://www.w3.org/pub/WWW/Addressing/Addressing.html</i></font></a></p>
<p align="left">An internet draft on "Using Existing
Bibliographic Identifiers as Uniform Resource Names", which
attempted to bring together the bibliographic standards and
internet worlds, was issued for comment on 22 March 1997
(Internet drafts expire after six months) [<a
href="http://globecom.net/(nobg)/ietf/draft/draft-ietf-urn-biblio-00.shtml">http://globecom.net/(nobg)/ietf/draft/draft-ietf-urn-biblio-00.shtml</a>].</p>
<p align="left">DOI uses CNRI's
"Handle" technology, which is an
application of a URN system. URNs are at present specified
conceptually but not in final implemented form. The W3C web site
describes the current situation and future work on internet
addressing as follows: Unlike web data formats and protocols HTML
and HTTP, there is only one web naming/addressing technology:
URLs. URLs are stable, standard, and ubiquitous. But their
popularity, combined with some design and implementation
oversights, has led to overly fragile service and wasteful use of
IP addresses. The wasteful use of IP addresses has been addressed
by a new specification of the technical transfer protocol, HTTP
1.1, deployment of which W3C consider to be critical. Work in the
W3C<i> Activity on SGML, XML, and Structured Document Interchange</i>
seeks to establish mechanisms for addressing into structured
documents in a general way. The URL specifications are in
revision within the IETF. W3C are considering the issue of how
much staff resource to commit to this effort. W3C are also
investigating the use of metadata to enhance link robustness. </p>
<p align="left"><b>Metadata activities</b></p>
<p align="left">Information identifiers either contain or can
point to supplementary information ("metadata") enabling actions to
be carried out; common agreement on what formats such metadata
should follow will be essential. Prominent among such continuing
activities are the "Dublin Core" (and its
follow-up activities) and Internet developments for metadata
coding such as MCF.</p>
<p align="left">The Dublin Metadata workshop of March 1995 and
the Warwick Metadata Workshop of April 1996 aimed to develop
consensus on network resource description across a broad spectrum
of stakeholders: the computer science community, text markup, and
librarians among others. The result was the Dublin Core Metadata
Element Set - a simple resource description record providing a
foundation for electronic bibliographic description, improving
structured access to information on the Internet and
interoperability among disparate description models. The Dublin
Core has now been updated and as of January 1997 specifies
fifteen elements (table 2): currently many of the elements and
their contents should be considered experimental. The Warwick
Metadata Workshop follow-on activity produced a proposed syntax
for the Dublin Core, the development of guidelines for
applications, and the "Warwick
Framework" to
promote modular, separately accessible and maintainable packages
of metadata. Thus, a Dublin Core package might be one of a number
of other packages, including packages for terms and conditions,
archiving and preservation, content ratings, and others. A third
workshop (September, 1996: CNI/OCLC Image Metadata) addressed
application of the Dublin Core to visual resources and resulted
in minor changes to the original element set. The fourth and most
recent workshop (Canberra, March 1997) addressed issues
concerning deployment of the Dublin Core including extensibility,
element structure, and element refinement.<i> Extensibility</i>
refers to making DC a minimum set on which others may build
additional elements; <i>element structure</i> refers to
identification of default schemes and subelement conventions; <i>element
refinement</i> refers to clearer definitions for certain of the
elements (e.g. coverage, relation, and rights management). [<a
href="http://www.oclc.org:5046/research/dublin_core/">http://www.oclc.org:5046/research/dublin_core/</a>]</p>
<p align="left">Table 2: Dublin Core Element Descriptions (latest
update, January 1997)</p>
<table border="1" cellpadding="8" width="601"
bordercolor="#000000">
<tr>
<td width="20%"><p align="left"><font size="2">TITLE </font></p>
</td>
<td width="80%"><p align="left"><font size="2">The name
given to the resource by the CREATOR or PUBLISHER. </font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">CREATOR</font></p>
</td>
<td width="80%"><p align="left"><font size="2">The
person(s) or organization(s) primarily responsible for
the intellectual content of the resource. For example,
authors in the case of written documents.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">SUBJECT</font></p>
</td>
<td width="80%"><p align="left"><font size="2">The topic
of the resource, or keywords or phrases that describe the
subject or content of the resource. The intent of the
specification of this element is to promote the use of
controlled vocabularies, keywords, classification data
(e.g. Library of Congress Classification Numbers, Dewey
Decimal numbers, Medical Subject Headings).</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">DESCRIPTION</font></p>
</td>
<td width="80%"><p align="left"><font size="2">Text
description of the content of the resource, including
abstracts in the case of document-like objects or
content descriptions in the case of e.g. visual
resources. Future metadata collections might include
computational content description; this field might
contain a link to such a description rather than the
description itself.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">PUBLISHER</font></p>
</td>
<td width="80%"><p align="left"><font size="2">The entity
that provides access to the resource such as a publisher,
a university department, or a corporate entity.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">CONTRIBUTORS
</font></p>
</td>
<td width="80%"><p align="left"><font size="2">Person(s)
or organization(s) in addition to those specified in the
CREATOR element who have made significant
intellectual contributions (e.g. editors, transcribers,
illustrators, and convenors).</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">DATE </font></p>
</td>
<td width="80%"><p align="left"><font size="2">The date
the resource was made available in its present form;
recommended format is an 8-digit number in the form YYYYMMDD.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">TYPE </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Category
of the resource, such as home page, novel, poem, working
paper, preprint, technical report, essay,
dictionary. It is expected that this will be chosen from
a specified list of types.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">FORMAT </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Data
representation of the resource, such as text/html, ASCII,
Postscript file, executable application, or JPEG image.
In principle, formats can include physical media such as
books, serials, or other non-electronic media. </font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">IDENTIFIER
</font></p>
</td>
<td width="80%"><p align="left"><font size="2">String or
number used to uniquely identify the resource. Examples
for networked resources include URLs and URNs (when
implemented), other globally-unique identifiers such as
ISBN, etc. </font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">SOURCE </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Work from
which this resource is derived, if applicable.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">LANGUAGE </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Language(s)
of the intellectual content of the resource.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">RELATION </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Relationship
to other resources: a means to express relationships
among resources that have formal relationships to others,
but exist as discrete resources themselves. For example,
images in a document, chapters in a book, or items in a
collection.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">COVERAGE </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Spatial
locations and temporal durations characteristic of the
resource.</font></p>
</td>
</tr>
<tr>
<td width="20%"><p align="left"><font size="2">RIGHTS </font></p>
</td>
<td width="80%"><p align="left"><font size="2">Link to a
copyright notice, rights-management statement, or server
that would provide such information in a dynamic way.</font></p>
</td>
</tr>
</table>
<p align="left"><font size="2"><i>Adapted from </i></font><a
href="http://purl.org/metadata/dublin_core_elements"><font
size="2"><i>http://purl.org/metadata/dublin_core_elements</i></font></a></p>
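<p align="left">The element set in Table 2 lends itself to a simple validation step. The following Python sketch is one possible reading, not part of the Dublin Core proposal: it treats a record as a dictionary keyed by the fifteen element names and checks the DATE value against the recommended YYYYMMDD form. The function name and the sample DATE value are illustrative.</p>

```python
from datetime import datetime

# The fifteen element names from Table 2.
DUBLIN_CORE_ELEMENTS = (
    "TITLE", "CREATOR", "SUBJECT", "DESCRIPTION", "PUBLISHER",
    "CONTRIBUTORS", "DATE", "TYPE", "FORMAT", "IDENTIFIER",
    "SOURCE", "LANGUAGE", "RELATION", "COVERAGE", "RIGHTS",
)


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    problems = [f"unknown element: {name}" for name in record
                if name not in DUBLIN_CORE_ELEMENTS]
    date = record.get("DATE")
    if date is not None:
        try:
            # Table 2 recommends an 8 digit number in the form YYYYMMDD.
            if len(date) != 8 or not date.isdigit():
                raise ValueError
            datetime.strptime(date, "%Y%m%d")
        except ValueError:
            problems.append(f"DATE is not YYYYMMDD: {date}")
    return problems


good = {
    "TITLE": "Digital Information Objects and the STM Publisher",
    "FORMAT": "text/html",
    "DATE": "19970801",  # illustrative value, not taken from the article
}
bad = {"DATE": "1 Aug 1997", "AUTHOR": "someone"}
print(check_record(good))
print(check_record(bad))
```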
<p align="left">A Convention for Embedding Metadata in
HTML (i.e. tagging of meta information in HTML) was proposed
reflecting the consensus of a break-out group at a May 1996 W3C
Distributed Indexing and Searching Workshop. This group included
representatives of major players: the Dublin Core/Warwick
Framework Metadata meetings, Lycos, Microsoft, WebCrawler, the
IEEE metadata effort, Verity Software, and the W3C. Tagging in
HTML would enable Internet exchange of such metadata. [<a
href="http://www.oclc.org:5046/~weibel/html-meta.html">http://www.oclc.org:5046/~weibel/html-meta.html</a>].
Since then, proposals were tabled with W3C in June 1997 by
Netscape for an interchange format called Meta Content Framework
(MCF), based on work initiated at Apple [<a
href="http://mcf.research.apple.com/hs/mcf.html">http://mcf.research.apple.com/hs/mcf.html</a>]
which provides a system for representing a wide range of
information about content. MCF files contain descriptions of
meta-content objects referred to as "units": a unit
consists of a unit identifier (e.g. URL) and some number of
predicates ("slots"). MCF is not intended
to be an extension of markup languages such as HTML; it provides
a format for holding the metadata externally. MCF should be able
to represent the metadata that proposals such as the Dublin Core
aim to cover. In this way, metadata would become available to
Internet search engines, and in effect all sites that make use of
MCF would have the ability to provide categorisations of their
material: the inventor of MCF, R.V. Guha, has described the effect
as "search engines on steroids".</p>
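<p align="left">To make the idea of tagging meta information in HTML concrete, the Python sketch below renders element/value pairs as HTML <code>meta</code> tags. The "DC." name prefix and the lower-cased element names are assumptions made for this illustration; the article does not spell out the naming scheme, and the function name is hypothetical.</p>

```python
from html import escape


def dublin_core_meta_tags(record: dict, prefix: str = "DC") -> str:
    """Render each element/value pair as one HTML meta tag per line."""
    lines = []
    for element, value in record.items():
        # Escape both attribute values so quotes in a title cannot
        # terminate the attribute early.
        name = escape(f"{prefix}.{element.lower()}", quote=True)
        content = escape(value, quote=True)
        lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)


tags = dublin_core_meta_tags({
    "TITLE": "General Information",
    "FORMAT": "text/html",
})
print(tags)
```

<p align="left">Because the tags sit inside the document head, this approach keeps metadata with the page, whereas MCF as described above holds it in a separate file.</p>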
<p align="left"><b>Mark Up Languages</b></p>
<p align="left">A document placed in an electronic environment
should be identifiable, either by containing mark-up tags for
elements such as "identifier" (explicitly stating
the identifier), or alternatively by enabling the identifier to be
generated implicitly from internal document information
("affordance"), which must therefore
also be made available in a standard format. Documents should
also be "open" or "interoperable", i.e. readable
(exchangeable) via any common software packages through a
commonly agreed standard. Some developments in the past year with
mark-up languages assist both of these aims: the release of a new
version of the standard Internet mark-up, HTML (HyperText Markup
Language); and the proposal for XML (Extensible Markup Language), of
particular interest to publishers already using SGML.</p>
<p align="left">A major potential problem with Internet exchange
of documents, especially for scientific material, is that the
HTML standard used for mark-up (layout and formatting) is being
outgrown by demands for complex document support; this has led to
many extensions of HTML - around 90 exist, many of which are
proprietary and supported only by certain software or browsers.
This problem is being resolved in two different ways. One aims to
widen the HTML standard to encompass known requirements; in July
1997, W3C released a draft of the latest HTML 4.0 intended to
exploit new features without proprietary extensions, including
greater control over forms, frames and tables, and all the
benefits of scripts, style sheets and objects. Of interest to STM
publishers, the feature of "Additional
Named Entities"
adds support for important symbols and glyphs used in
mathematics, markup and internationalization. [<a
href="http://www.w3.org/Press/HTML4">http://www.w3.org/Press/HTML4</a>].
The difficulty in this approach is that such a standard may never
be complete. </p>
<p align="left">An alternative response is represented by
Extensible Markup Language (XML), a subset of SGML (Standard
Generalized Markup Language) designed for delivery on the Web,
proposed at SGML 96 (November 1996) and resulting in a W3C
working draft proposal to the sixth WWW conference in April 1997.
The XML approach is to provide a language which can make HTML
self-extending in the true fashion of SGML, i.e. publishers can
provide their own extensions and definitions akin to DTDs and
define appropriate, readable, tags. XML could also provide a
framework for Java language applets to work in. [<a
href="http://www.w3.org/pub/WWW/TR/WD-xml.html">http://www.w3.org/pub/WWW/TR/WD-xml.html</a>];
[<a href="http://www.w3.org/pub/WWW/XML/Activity.html">http://www.w3.org/pub/WWW/XML/Activity.html</a>];[<i>Extensible
MarkUp Language: SGML On-Ramp and Web Enabler. Tim Bray</i>, The
Information Interchange Report, Vol 4 no 2/3 Nov/Dec 1996 pp1-6]</p>
<p align="left">STM publishers are also interested in
developments with mathematical mark-up; after more than a year of
in-depth study and experimentation, the HTML Math working group
released an updated working draft of MathML (Mathematical Mark-Up
Language), a way of encoding both mathematical content and visual
presentation, in July 1997. [<a
href="http://www.w3.org/pub/WWW/TR/WD-math/">http://www.w3.org/pub/WWW/TR/WD-math/</a>]</p>
<p align="left">The Document Object Model [<a
href="http://www.w3.org/MarkUp/DOM/">http://www.w3.org/MarkUp/DOM/</a>]
is a platform- and language-neutral interface that will allow
programs and scripts to dynamically access and update the
content, structure and style of documents ("Dynamic
HTML" is a term used by some vendors to describe the
combination of HTML, style sheets and scripts). The document can
be further processed and the results of that processing can be
incorporated back into the presented page. Requirements are being
gathered for a first release of "level one" (functionality
equivalent to that currently exposed in Netscape Navigator 3.0
and Microsoft Internet Explorer 3.0) in the second half of 1997.
While of great interest in the long term, it seems unlikely that
such interactive documents will be widely implemented in the STM
world in the next year or so.</p>
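<p align="left">To make the idea of programs accessing and updating a
document through a neutral interface concrete, here is a minimal
sketch using Python's standard xml.dom.minidom module, a minimal
implementation of the DOM interface. It postdates the requirements
work described above, and the element and attribute names (article,
title, status) are hypothetical.</p>

```python
from xml.dom import minidom

# Build a tiny document through the DOM interface (no markup is parsed).
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "article", None)
root = doc.documentElement

title = doc.createElement("title")
title.appendChild(doc.createTextNode("Draft title"))
root.appendChild(title)

# Access: locate the element and read its text content.
node = doc.getElementsByTagName("title")[0]
old_text = node.firstChild.data

# Update: change the text and add an attribute, then serialise the result.
node.firstChild.data = "Final title"
root.setAttribute("status", "reviewed")

print(old_text)
print(root.toxml())
```

<p align="left">The same calls apply whatever produced the document
tree, which is the sense in which the interface is platform- and
language-neutral.</p>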
<p align="left"><b>The way forward</b></p>
<p align="left">Internet standards are inescapably at the centre
of likely future scenarios for our industry. The pace of
development in this area leads to some conflict; for example,
both HTML 4.0 and XML arise within W3C, yet the two are in
tension, even to the extent that Tim Berners-Lee (W3C's Director)
stated in July 1997: <i>"It's
no wonder consumers, buyers and IT managers are concerned.....
Extensible Markup Language (XML) naturally supports a variety of
applications which could compromise the design of HTML".</i> [<a
href="http://www.w3.org/Press/HTML4-pers.html">http://www.w3.org/Press/HTML4-pers.html</a>].</p>
<p align="left">It is clear from recent activities such as MCF
and other Netscape and Microsoft proposals to W3C that Internet
standards (de facto or de jure) are now being heavily influenced
by commercial technology players fighting to provide better
access tools for internet and intranet applications in general
(and by so doing to gain commercial advantage for their
particular tools with a W3C imprimatur). Publishers will no doubt
benefit from these activities but have little chance of
influencing them. The World-Wide Web Consortium has so far not
produced many actions of immediate specific concern to STM
publishers; document identification, rights clearance mechanisms
and so on appear to be taking a relatively minor position in its
priorities compared to technical infrastructure issues and
pressing matters such as "next
generation"
addressing protocols. All of this is understandable, and indeed
inevitable, if one considers that most members of the W3C are
technology companies; few are electronic publishers, and only one
company (Reed-Elsevier) is a major publisher of both traditional
paper and electronic information. We cannot expect that special
cases such as STM material presentation, representing a tiny
proportion of internet traffic, will receive any favoured
treatment; we can however hope that the generation of
sufficiently open standards and technology will enable STM
material and transactions to be satisfactorily accommodated in
future web standards. As W3C reaches the end of its first
three-year funding and considers how to renew funding subscribers
(and attract more) this emphasis may change (which suggests a
possible action for those publishers interested in influencing
such events).</p>
<p align="left">STM publishers view the future scientific article
as containing multimedia elements: <font face="Times New Roman">full
text and abstract text; live "hot spot" references; video or audio clips;
supplementary data tables; software linkages to e.g. 3-D models;
links to other internet sites; forward links to comments,
corrections, future papers, etc. How can identifiers and metadata
assist us in developing such a rich system? A solution for the future
digital object will need to address the following themes:</font></p>
<p align="left"><font face="Times New Roman">- <i>Unique
identification</i>: unambiguous identification of a defined piece
of information, possibly with details of medium, version, format
etc.;</font></p>
<p align="left"><font face="Times New Roman">- <i>Multiple
linkage</i>: by stating which naming convention is used, multiple
naming or identification schemes should be possible (an idea
adopted in SICI and DOI).</font></p>
<p align="left"><font face="Times New Roman">- <i>Multiple
(overlapping) identification</i> of content (e.g. a sound clip
within a digital object may be identified by a music identifier
as well as being part of a document with another identifier; the
Dublin concept of relation may prove useful here); </font></p>
<p align="left"><font face="Times New Roman">- <i>Arbitrary
granularity</i>: if a publisher wants to identify a paragraph or
equation as a separate item he can do so; </font></p>
<p align="left"><font face="Times New Roman">- <i>Cascading
responsibility</i>: once below a certain level, no central agency
permission needed to assign unique numbers (sub-levels assigned
by the owner of the higher level);</font></p>
<p align="left"><font face="Times New Roman">- <i>Links to
metadata</i>: via simple identifiers pointing to specific
repositories for different needs, e.g. copyright, trading, EDI</font></p>
<p align="left"><font face="Times New Roman">- <i>Open standards</i>:
technical architecture interoperable with standard software
packages, making use of W3C approved standards.</font></p>
<p align="left"><font face="Times New Roman">- <i>Distributed
data</i>: not all data and metadata held on one site; a virtual
single network created from multiple interlinked servers.</font></p>
<p align="left"><font face="Times New Roman">- "<i>Many but dumb</i>":
a network of interconnected simple
identifiers and links is preferable to an all-embracing single
standard identifier which attempts to cover everything from a
scientific article to a new music release.</font></p>
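<p align="left">Several of the themes listed above (stating the
naming convention, overlapping identification, arbitrary granularity
and cascading responsibility) can be illustrated together. The
following is a minimal Python sketch, not the syntax of DOI, SICI or
any other scheme: the class names, the scheme labels, the "/"
separator and all identifier values are assumptions made for the
example.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identifier:
    scheme: str  # the naming convention is stated explicitly
    value: str

    def child(self, suffix: str) -> "Identifier":
        # Cascading responsibility: the owner of this identifier assigns
        # sub-level identifiers itself. The "/" separator is an assumption.
        return Identifier(self.scheme, f"{self.value}/{suffix}")


@dataclass
class DigitalObject:
    description: str
    identifiers: list[Identifier] = field(default_factory=list)
    parts: list["DigitalObject"] = field(default_factory=list)

    def find(self, scheme: str) -> list[str]:
        # Return this object's identifier values under one naming scheme.
        return [i.value for i in self.identifiers if i.scheme == scheme]


# All values below are invented placeholders, not real identifiers.
article_id = Identifier("publisher-item", "article-0001")
article = DigitalObject("journal article", [article_id])

# Arbitrary granularity: an equation gets its own sub-level identifier.
equation = DigitalObject("equation 3", [article_id.child("eq3")])

# Overlapping identification: a sound clip is part of the article and
# also carries an identifier from a separate (music) scheme.
clip = DigitalObject(
    "sound clip",
    [article_id.child("clip1"), Identifier("recording", "rec-42")],
)
article.parts.extend([equation, clip])

print(clip.find("publisher-item"))
print(clip.find("recording"))
```

<p align="left">Keeping each identifier simple and tagging it with its
scheme lets new schemes be added without changing the objects that
already carry other identifiers.</p>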
<p align="left"><font face="Times New Roman">Once we have a
recognised interoperable network in which to exchange information
about digital information objects, we can begin to apply some of
the emerging electronic commerce standards to carry out
commercial transactions with them. </font>The demonstration of
DOI at Frankfurt this year holds out the promise of one such
workable system.</p>
<p align="left"><b>Glossary of abbreviations used in this review</b></p>
<table border="0">
<tr>
<td>AAP</td>
<td>Association of American Publishers</td>
</tr>
<tr>
<td>ANSI</td>
<td>American National Standards Institute</td>
</tr>
<tr>
<td>ASCII</td>
<td>7-bit American National Standard Code for Information
Interchange, ANSI X3.4:1986</td>
</tr>
<tr>
<td>BIC</td>
<td>Book Industry Communication (UK organisation)</td>
</tr>
<tr>
<td>BICI</td>
<td>Book Item and Contribution Identifier (proposed NISO
development)</td>
</tr>
<tr>
<td>CIS</td>
<td>Common Information System (CISAC)</td>
</tr>
<tr>
<td>CISAC</td>
<td>Confédération Internationale des Sociétés d'Auteurs et
Compositeurs = International Confederation of Societies
of Authors and Composers</td>
</tr>
<tr>
<td>DOI</td>
<td>Digital Object Identifier (AAP)</td>
</tr>
<tr>
<td>EC</td>
<td>European Commission</td>
</tr>
<tr>
<td>HTTP</td>
<td>Hyper Text Transfer Protocol</td>
</tr>
<tr>
<td>IETF</td>
<td>Internet Engineering Task Force</td>
</tr>
<tr>
<td>IFPI</td>
<td>International Federation of the Phonographic Industry
(London)</td>
</tr>
<tr>
<td>IPA</td>
<td>International Publishers Association</td>
</tr>
<tr>
<td>ISBN</td>
<td>International Standard ISO 2108:1992 <br>
Information and Documentation - International Standard
Book Numbering (ISBN)</td>
</tr>
<tr>
<td>ISDI</td>
<td>International Standard Document Identifier (proposed
term)</td>
</tr>
<tr>
<td>ISI</td>
<td>Institute for Scientific Information, Inc.</td>
</tr>
<tr>
<td>ISMN</td>
<td>International Standard ISO 10957:1993 <br>
Information and Documentation - International Standard
Music Number (ISMN)</td>
</tr>
<tr>
<td>ISO</td>
<td>International Organization for Standardization </td>
</tr>
<tr>
<td>ISRC</td>
<td>International Standard ISO 3901:1986 <br>
Documentation - International Standard Recording
Code (ISRC): administered by IFPI</td>
</tr>
<tr>
<td>ISSN</td>
<td>International Standard ISO 3297:1986 <br>
Documentation - International Standard Serial Numbering
(ISSN)<br>
US equivalent: ANSI Z39.9:1979 (R1984)</td>
</tr>
<tr>
<td>ISWC</td>
<td>International Standard Work Code (currently proposed
to ISO TC 46)</td>
</tr>
<tr>
<td>NISO</td>
<td>National Information Standards Organization (USA)</td>
</tr>
<tr>
<td>OCLC</td>
<td>Online Computer Library Center Inc.</td>
</tr>
<tr>
<td>PII</td>
<td>Publisher Item Identifier</td>
</tr>
<tr>
<td>STI</td>
<td>Scientific, Technical and Information publishers' group (ACS, AIP,
APS, IEEE, Elsevier Science)</td>
</tr>
<tr>
<td>STM</td>
<td>International Association of Scientific, Technical
and Medical Publishers </td>
</tr>
<tr>
<td>URC</td>
<td>(1) Uniform Resource Citation (IETF)<br>
(2) Uniform Resource Characteristic (IETF)</td>
</tr>
<tr>
<td>URI</td>
<td>Uniform Resource Identifier (IETF)</td>
</tr>
<tr>
<td>URL</td>
<td>Uniform Resource Locator (IETF)</td>
</tr>
<tr>
<td>URN</td>
<td>Uniform Resource Name (IETF)</td>
</tr>
<tr>
<td>W3C</td>
<td>World Wide Web Consortium</td>
</tr>
<tr>
<td>XML</td>
<td>Extensible Markup Language (subset of SGML)</td>
</tr>
</table>
<hr>
<p align="left"><b>Dr. Norman Paskin</b><br>
Director, Information Technology Development<br>
Elsevier Science<br>
The Boulevard<br>
Langford Lane<br>
Kidlington<br>
Oxford OX5 1GB, UK<i><br>
Tel: (+44) (0) 1865 843798<br>
Fax: (+44) (0) 1865 843967<br>
E mail: </i><a href="mailto:n.paskin@elsevier.co.uk"><i>n.paskin@elsevier.co.uk</i></a></p>
<hr>
</body>
</html>
<p>
Last update: 17 September 1997
<hr>
© <a href = "/inca/homepage/about/c_right/">Copyright</a> 1997, Elsevier Science, All rights reserved.<br>
</body></html>