Discussion:
Last Call: 'Tags for Identifying Languages' to BCP
JFC (Jefsey) Morfin
2005-08-24 09:03:06 UTC
I would like to understand why
http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
claims to be a BCP when it introduces a standards-track proposition
that conflicts with current practices and development projects under way.

I support it as a transitional standards-track RFC needed by some, as
long as it does not exclude more specific or advanced language
identification formats, processes, or future IANA or ISO 11179
conformant registries. To avoid conflicts, its ABNF should be
completed by dedicating a singleton to the general tag URI
(http://www.ietf.org/internet-drafts/draft-kindberg-tag-uri-07.txt,
an accepted RFC).

jfc


At 18:45 23/08/2005, The IESG wrote:
>The IESG has received a request from the Language Tag Registry Update WG to
>consider the following documents:
>
>- 'Initial Language Subtag Registry '
> <draft-ietf-ltru-initial-04.txt> as an Informational RFC
>- 'Tags for Identifying Languages '
> <draft-ietf-ltru-registry-12.txt> as a BCP
>
>The IESG plans to make a decision in the next few weeks, and solicits
>final comments on this action. Please send any comments to the
>***@ietf.org or ***@ietf.org mailing lists by 2005-09-06.
>
>The file can be obtained via
>http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-04.txt
>http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
Scott Hollenbeck
2005-08-24 11:24:02 UTC
> -----Original Message-----
> From: ietf-***@ietf.org [mailto:ietf-***@ietf.org] On
> Behalf Of JFC (Jefsey) Morfin
> Sent: Wednesday, August 24, 2005 5:03 AM
> To: ***@ietf.org; ***@ietf.org
> Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP
>
> I would like to understand why
> http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
> claims to be a BCP: it introduces a standard track proposition,
> conflicting with current practices and development projects under way?
>
> I support it as a transition standard track RFC needed by some, as
> long as it does not exclude more specific/advanced language
> identification formats, processes or future IANA or ISO 11179
> conformant registries. In order to avoid conflicts, its ABNF should
> be completed in dedicating a singleton to the general tag
> URI
> (http://www.ietf.org/internet-drafts/draft-kindberg-tag-uri-07.txt
> accepted RFC).

Jefsey,

First, let's agree that you've asked this question [1], made this suggestion
[2], and engaged in discussion of these topics on the LTRU working group
mailing list. I know you haven't been happy with the way the discussion
went, but these are not new topics. Agreed?

Why a BCP? Production of this document is a direct requirement of the group
charter:

"This working group will address these issues by developing two documents.
The first is a successor to RFC 3066."

RFC 3066 is BCP 47. The introduction and list of changes included in the
document describe why and how it obsoletes RFC 3066.

The ABNF suggestion has been discussed, partially accepted, and partially
rejected by the working group. If you have new information to describe why
you think the working group decision was a mistake, please describe it.

-Scott-
LTRU Area Advisor

[1] http://www1.ietf.org/mail-archive/web/ltru/current/msg03360.html
[2] http://www1.ietf.org/mail-archive/web/ltru/current/msg03196.html
JFC (Jefsey) Morfin
2005-08-24 13:02:36 UTC
On 13:24 24/08/2005, Scott Hollenbeck said:
> > -----Original Message-----
> > From: ietf-***@ietf.org [mailto:ietf-***@ietf.org] On
> > Behalf Of JFC (Jefsey) Morfin
> > Sent: Wednesday, August 24, 2005 5:03 AM
> > To: ***@ietf.org; ***@ietf.org
> > Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP
> >
> > I would like to understand why
> > http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
> > claims to be a BCP: it introduces a standard track proposition,
> > conflicting with current practices and development projects under way?
> >
> > I support it as a transition standard track RFC needed by some, as
> > long as it does not exclude more specific/advanced language
> > identification formats, processes or future IANA or ISO 11179
> > conformant registries. In order to avoid conflicts, its ABNF should
> > be completed in dedicating a singleton to the general tag
> > URI
> > (http://www.ietf.org/internet-drafts/draft-kindberg-tag-uri-07.txt
> > accepted RFC).
>
>Jefsey,
>First, let's agree that you've asked this question [1], made this suggestion
>[2], and engaged in discussion of these topics on the LTRU working group
>mailing list. I know you haven't been happy with the way the discussion
>went, but these are not new topics. Agreed?

Dear Scott,
This is an IESG Last Call. The IESG solicited final comments on its
intent to decide on the document, not on the WG's methods. I am
honored to be drawn into a discussion internal to the IESG by the AD
in charge, but if the IESG has already made up its mind, what is the
use of a Last Call period?

The Draft under consideration does not describe a practice. It is a
standards-track proposition among many others, existing or possible,
including better ones (according to its author), in an area where
expertise is scarce.

>Why a BCP? Production of this document is a direct requirement of the group
>charter: "This working group will address these issues by developing
>two documents.
>The first is a successor to RFC 3066." 3066 is BCP 47. The
>introduction and list of changes included in the
>document describe why and how it is obsoleting 3066.

A successor is not necessarily a replacement. This question marred
the two previous Last Calls of this proposition. The time has come
to address it by deprecating RFC 3066/BCP 47 and by treating this
Draft as what it is: a standards-track RFC.

>The ABNF suggestion has been discussed, partially accepted, and partially
>rejected by the working group. If you have new information to describe why
>you think the working group decision was a mistake, please describe it.

The IESG is to determine whether or not there is a consensus about the
Draft. It is not news that the sun is not blue, nor that commercial
interests conflict with open source. This list gathers all the IETF
competences needed to evaluate whether the partial acceptance of my
suggestions went far enough, and that is the purpose of a Last Call.

A technical conflict remains a conflict. One may dislike it, but one
has to address it. We have two contradictory propositions, one
accepted as an RFC, one under discussion here. Both cite W3C needs as
a motive, but both authors state (one by a disclaimer in his text, the
other in the WG debate) that they do not represent W3C positions.
Maybe a Last Call is the proper time, and this list the best place,
for the W3C to tell us officially which tags, private-use tags, or
other tag formats it (also) wants. The same goes for all the other
SDOs concerned.

All the best.
jfc
Scott Hollenbeck
2005-08-24 16:25:08 UTC
On Wed, August 24, 2005 9:02 am, JFC (Jefsey) Morfin said:
> On 13:24 24/08/2005, Scott Hollenbeck said:
>> > -----Original Message-----
>> > From: ietf-***@ietf.org [mailto:ietf-***@ietf.org] On
>> > Behalf Of JFC (Jefsey) Morfin
>> > Sent: Wednesday, August 24, 2005 5:03 AM
>> > To: ***@ietf.org; ***@ietf.org
>> > Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP
>> >
>> > I would like to understand why
>> > http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
>> > claims to be a BCP: it introduces a standard track proposition,
>> > conflicting with current practices and development projects under way?
>> >
>> > I support it as a transition standard track RFC needed by some, as
>> > long as it does not exclude more specific/advanced language
>> > identification formats, processes or future IANA or ISO 11179
>> > conformant registries. In order to avoid conflicts, its ABNF should
>> > be completed in dedicating a singleton to the general tag
>> > URI
>> > (http://www.ietf.org/internet-drafts/draft-kindberg-tag-uri-07.txt
>> > accepted RFC).
>>
>>Jefsey,
>>First, let's agree that you've asked this question [1], made this
>>suggestion [2], and engaged in discussion of these topics on the LTRU
>>working group mailing list. I know you haven't been happy with the way
>>the discussion went, but these are not new topics. Agreed?
>
> Dear Scot
> This an IESG last call. The IESG solicited final comments on its
> intent to take a decision on the document - not on the WG methods. I
> am honored to be involved in an internal discussion to the IESG by
> the AD in charge, but if the IESG has already set-up its mind, what
> is the use of a Last Call period?

This is not an internal discussion. You sent a question to both the IESG
mailing list and the IETF discussion list. The people on these lists are
not necessarily familiar with the discussion that took place on the LTRU
working group mailing list. I believe it is necessary to help those readers
understand your questions and comments by putting them in context.

The IESG has not made up its mind about anything. We do, however, need to
understand what transpired during the course of working group deliberations
as we attempt to understand your questions and comments.

> The considered Draft does not describe a practice. It is a standard
> track proposition among many others, existing or possible, including
> better ones (according to his author), in an area where expertise is
> scarce.

Thank you for that comment. It will be considered by the IESG.

>>Why a BCP? Production of this document is a direct requirement of the
>>group charter: "This working group will address these issues by
>>developing two documents. The first is a successor to RFC 3066."
>>3066 is BCP 47. The introduction and list of changes included in the
>>document describe why and how it is obsoleting 3066.
>
> A successor is not necessarily a replacement. This question marred
> the two last previous Last Calls of this proposition. Time has come
> to address this in deprecating RFC 3066/BCP 47 and in considering
> this Draft as what it is: a standard track RFC.

The working group was chartered to produce a document that replaces RFC 3066
according to my reading of the charter. The registry draft notes this
status in the Introduction. Yesterday the working group received an AD
evaluation comment that this status also needs to be more explicitly noted
in the first page document header. I consider this an editorial issue that
can be resolved in the context of any other edits required as a result of
last call review.

If you are suggesting that this document does not meet the requirements of
the working group charter, then that is a valid comment that must be
considered by the IESG.

>>The ABNF suggestion has been discussed, partially accepted, and partially
>>rejected by the working group. If you have new information to describe
>>why you think the working group decision was a mistake, please describe it.
>
> The IESG is to determine is if there is a consensus or not about the
> Draft. It is not new the sun is not blue. It is not new that
> commercial interests are in conflict with open sources. There are on
> this list - and this is the purpose of a LC - all the IETF
> competences to evaluate if the partial acceptance of my suggestions
> went far enough or not.
>
> A technical conflict remains a conflict. One may dislike it, but one
> has to address it. We have two contradicting propositions, one
> accepted as an RFC, one here under discussion. Both use W3C needs as
> a motive, but both authors claim (one by disclaimer in his text, the
> other in the WG debate) they do not represent the W3C positions. May
> be a LC is the proper time, and this list the best place, for W3C to
> tell us officially the tags, the private use tags or the other tag
> formats they (also) want. And the same for all the other concerned SDOs.

I provided a link to your working group comment to help readers of this
exchange (including the IESG) review both your suggestion and the working
group response to your suggestion. Thank you for this clarifying
information.

-Scott-
JFC (Jefsey) Morfin
2005-08-25 03:29:17 UTC
On 18:25 24/08/2005, Scott Hollenbeck said:
>The people on these lists are not necessarily familiar with the
>discussion that took place on the LTRU working group mailing list. I
>believe it is necessary to help those readers understand your questions
>and comments by putting them in context.

Dear Scott,
What is surprising is that a Draft dealing with standards-track
issues, replacing an RFC that deals with the same issues, which itself
replaced the standards-track RFC 1766 on the same issues, can become
a BCP.

As you now know, this is because RFC 3066 became BCP 47. Referring to
the WG-ltru is therefore confusing: what we need is someone with a good
memory to tell us how the content of RFC 3066, which is standards-track
material, was granted BCP status while it deprecated a Standards Track
RFC, and to discuss how to correct that situation.

>The IESG has not made up it's mind about anything. We do, however, need to
>understand what transpired during the course of working group deliberations
>as we attempt to understand your questions and comments.

If you say so. FYI, we never discussed the RFC 1766 origin because
the point was outside the WG-ltru scope.

> > The considered Draft does not describe a practice. It is a standard
> > track proposition among many others, existing or possible, including
> > better ones (according to his author), in an area where expertise is
> > scarce.
>
>Thank you for that comment. It will be considered by the IESG.
>
> >>Why a BCP? Production of this document is a direct requirement of the
> >>group charter: "This working group will address these issues by
> >>developing two documents. The first is a successor to RFC 3066."
> >>3066 is BCP 47. The introduction and list of changes included in the
> >>document describe why and how it is obsoleting 3066.
> >
> > A successor is not necessarily a replacement. This question marred
> > the two last previous Last Calls of this proposition. Time has come
> > to address this in deprecating RFC 3066/BCP 47 and in considering
> > this Draft as what it is: a standard track RFC.
>
>The working group was chartered to produce a document that replaces RFC 3066
>according to my reading of the charter.

I think it is slightly more complex. The Charter says:
"RFC 3066 [a BCP] and its predecessor, RFC 1766 [a Standard Track],
defined language tags for use on the Internet.".

It then says:
"This working group will address these issues by developing two
documents. The first is a successor to RFC 3066."

Then, deep in the Charter, it says:
"The current registry contains pairs like uz-Cyrl/uz-Latn and
sr-Cyrl/sr-Latn, but RFC 3066 contains no general mechanism or
guidance for how scripts should be incorporated into language tags;
this replacement document is expected to provide such a mechanism."

You know that I often complained about the WG's lack of Charter
analysis. This is why the WG has not been in a position to clarify
that "thorny" point. I consider it legitimate and reasonable to say:

1. we have a need: to better address language identification in
Internet protocols
2. this is a standards-track issue introduced by RFC 1766, and we need
to correct the difficulties met by its successor, RFC 3066
3. one of these difficulties is that such a core Internet issue for
the Multilingual Internet is addressed by a closed BCP instead of an
open Standard. Let us make the successor document a Standards Track
document, replacing centralised control with a distributed solution.

>The registry draft notes this
>status in the Introduction. Yesterday the working group received an AD
>evaluation comment that this status also needs to be more explicitly noted
>in the first page document header. I consider this an editorial issue that
>can be resolved in the context of any other edits required as a result of
>last call review.
>
>If you are suggesting that this document does not meet the requirements of
>the working group charter, then that is a valid comment that must be
>considered by the IESG.

I consider that this document meets _some_ but _not_all_ of the
requirements of the working group charter (I could provide a long list
if necessary). I submit this because 25 years of applied R&D,
operations, and surveys in this area have shown me that no one on earth
today has the expertise to stabilise any standard or practice in the
area of brain-to-brain language interoperability. This is why _any_
proposition can only be experimental ("ad experimentum").

We met a very similar, though more limited, situation in the IDN case,
and we all know the result. I suggest we do not repeat the same
approach, and instead deliver something stable that the Internet
community can rely on, live with, and develop on.

To do that there are three options.

1. to replace BCP 47 (RFC 3066) with an Internet standard process
framework dealing with support for the Multilingual Internet. I started
working on such a Draft a year ago. This was delayed by the WG-ltru
work: it has been a tough but fruitful experience that I can now use in
a preliminary proposition. I hope to publish a preliminary draft very
soon, with reasonable expectations of international support. This will
probably be a long process.

2. to mend the closed format proposed by the Draft. For the reasons
above, I submit this is not feasible now, because today no one has the
expertise, the vision, the experience, and the necessary universal
support to do it. I expect, however, that the community process I will
propose will help do it, based on experience. I would not be surprised
if the R&D involved led to work of a magnitude comparable to IPv6, but
calling on many sub-expertises the IETF does not have right now.

3. to make the proposed Draft a default language identification
system, supporting any other identification scheme by using one
singleton as an introductory sequence. I note that during the WG-ltru
process I documented my organisation's support for an ABNF included
in the tag URI RFC. Some refinements are being discussed. I am ready
to document such an extension through a specialised WG-ltru draft.
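Option 3 can be sketched in a few lines. The Python below is a hypothetical illustration, not anything the Draft or the WG-ltru defined: it assumes a reserved singleton (here "q", picked arbitrarily) that introduces a foreign identification scheme, with the first subtag after it naming that scheme. The handler table and the scheme name "demo" are invented. A consumer that does not know the scheme still falls back on the default tag in front of the hook, which is the "default" behaviour described above. Carrying a full tag URI this way would need an encoding step, since tag URIs contain characters such as ':' and ',' that language subtags do not allow; that is left out here.

```python
from typing import Callable, Dict, List, Optional, Tuple

HOOK = "q"  # hypothetical reserved singleton, chosen only for illustration

# Hypothetical table: scheme name -> handler for the subtags that follow it.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "demo": lambda rest: "demo scheme: " + "/".join(rest),
}


def split_at_hook(tag: str, hook: str = HOOK) -> Tuple[str, Optional[List[str]]]:
    """Return (default part, subtags after the hook), or (tag, None) if no hook."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    # A lone "x" starts private use; anything after it is not treated as a hook.
    limit = lowered.index("x") if "x" in lowered else len(lowered)
    # Index 0 is the primary language subtag, so the search starts at 1.
    for i in range(1, limit):
        if lowered[i] == hook:
            return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, None


def describe(tag: str) -> str:
    """Interpret a tag, falling back to its default part for unknown schemes."""
    default, foreign = split_at_hook(tag)
    if foreign is None:
        return f"default tag {default}"
    if not foreign:
        raise ValueError(f"hook singleton without a scheme name in {tag!r}")
    scheme, rest = foreign[0].lower(), foreign[1:]
    handler = HANDLERS.get(scheme)
    if handler is None:
        # Unknown scheme: keep the default identification, ignore the rest.
        return f"default tag {default}; unknown scheme {scheme!r} ignored"
    return f"default tag {default}; {handler(rest)}"


if __name__ == "__main__":
    print(describe("fr-CA"))
    print(describe("fr-CA-q-demo-abc"))
    print(describe("fr-CA-q-other-1"))
```

The design choice worth noting is the fallback: an "open" extension point only stays harmless if software that ignores it still gets a usable default tag.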

Everyone knows that people from large stakeholders, consortia, and
standardisation bodies are involved in the proposition. They may find
some commercial motivation in using the Draft's current closed format.
I submit this would be a mistake, because they would block their own
R&D (which will most probably agree with the results of our own
experimentation once they have done their homework). They would also
run the risk of attaching a poor image to their names if governments
and users are dissatisfied, as they most probably will be.

This is why I think my "default" proposition meets the requirements
and the interests of everyone:

- large stakeholders benefit from a stable solution and a lead role
(rather than a controversial IANA)
- their own R&D can develop
- competing specific propositions and innovation can develop without
harming anything, reinforcing the common approach
- community R&D and grassroots projects can develop and boost the
Multilingual Internet, which is what we missed in the IPv6 case.

> >>The ABNF suggestion has been discussed, partially accepted, and partially
> >>rejected by the working group. If you have new information to describe
> >>why you think the working group decision was a mistake, please describe it.
> >
> > The IESG is to determine is if there is a consensus or not about the
> > Draft. It is not new the sun is not blue. It is not new that
> > commercial interests are in conflict with open sources. There are on
> > this list - and this is the purpose of a LC - all the IETF
> > competences to evaluate if the partial acceptance of my suggestions
> > went far enough or not.
> >
> > A technical conflict remains a conflict. One may dislike it, but one
> > has to address it. We have two contradicting propositions, one
> > accepted as an RFC, one here under discussion. Both use W3C needs as
> > a motive, but both authors claim (one by disclaimer in his text, the
> > other in the WG debate) they do not represent the W3C positions. May
> > be a LC is the proper time, and this list the best place, for W3C to
> > tell us officially the tags, the private use tags or the other tag
> > formats they (also) want. And the same for all the other concerned SDOs.
>
>I provided a link to your working group comment to help readers of this
>exchange (including the IESG) review both your suggestion and the working
>group response to your suggestion. Thank you for this clarifying
>information.

I have repeatedly indicated that I would appeal a positive IESG
decision on the current status of the Draft, because it would harm and
delay the Internet community, for the global reason I give above and
for many "sub-reasons" not detailed here. This led the Shepherding
Chair to send you, and through you the IESG, a summary of my
objections. I do not have a copy of that mail.

jfc
Scott Hollenbeck
2005-08-25 12:42:25 UTC
> -----Original Message-----
> From: JFC (Jefsey) Morfin [mailto:***@jefsey.com]
> Sent: Wednesday, August 24, 2005 11:29 PM
> To: Scott Hollenbeck; ***@ietf.org
> Cc: ***@iesg.org; ***@ietf.org
> Subject: RE: Last Call: 'Tags for Identifying Languages' to BCP

[snip]

> 3. one of these difficulties is that such an Internet core issue to
> the Multilingual Internet is addressed by a closed BCP instead of an
> open Standard. Let make the successor document a Standard Track
> document replacing a centralised control by a distributed solution.

Jefsey,

You seem to be misunderstanding the distinction between BCPs and standards
track documents. There's nothing "closed" about BCPs. They are subject to
the exact same community review, approval, and appeal processes that are
used for standards track documents.

If I've misunderstood what you mean by "closed", please feel free to clarify
so that I and others can better understand your comment.

-Scott-
JFC (Jefsey) Morfin
2005-08-25 15:35:04 UTC
At 14:42 25/08/2005, Scott Hollenbeck wrote:
> > -----Original Message-----
> > From: JFC (Jefsey) Morfin [mailto:***@jefsey.com]
> > Sent: Wednesday, August 24, 2005 11:29 PM
> > To: Scott Hollenbeck; ***@ietf.org
> > Cc: ***@iesg.org; ***@ietf.org
> > Subject: RE: Last Call: 'Tags for Identifying Languages' to BCP
>
>[snip]
>
> > 3. one of these difficulties is that such an Internet core issue to
> > the Multilingual Internet is addressed by a closed BCP instead of an
> > open Standard. Let make the successor document a Standard Track
> > document replacing a centralised control by a distributed solution.
>
>Jefsey,
>
>You seem to be misunderstanding the distinction between BCPs and standards
>track documents. There's nothing "closed" about BCPs. They are subject to
>the exact same community review, approval, and appeal processes that are
>used for standards track documents.
>
>If I've misunderstood what you mean by "closed", please feel free to clarify
>so that I and others can better understand your comment.

Dear Scott,
You ask two totally separate questions here: the BCP nature of the
Draft, and the open/closed ABNF issue.
I will try to explain each of them in detail. I apologise for being
long, but we face a complex new Internet phase (brain-to-brain
interintelligibility), most probably more complex than the whole of
today's Internet. We need to be careful not to make a mistake at its
very root.

1. RFC 2026 says that
"The BCP subseries of the RFC series is designed to be a way to
standardize practices and the results of community deliberations.".

A BCP is meant to describe and stabilise existing practices, to present
the best thinking of the community and of the IESG/IAB leadership, and
to document the Internet standards process itself. Some of the
practical consequences are:
- an appeal does not prevent a BCP from being enforced, since it
should describe and stabilise something that already exists.
- no successful implementations are required before it is confirmed.
- the IESG will not accept an Informational or Experimental document
contradicting it, since it is a community best practice.
- errors are not easy to correct: not every RFC makes a standard, but
every BCP is supposed to document a reality that others must stay
compatible with. If the initial orientation is incorrect, it will
stay forever.

While a standards-track RFC is enforced only once it is published and
confirmed after being used successfully, using a BCP as the vehicle
permits a "fait accompli" strategy, all the more so when an IANA
registry has been implemented. This may be extremely costly to the
Internet and its users, in money and delay. In this case an error would
be dramatic, as we all know that multilingualism and the resulting risk
of balkanisation are key issues for the Internet. Can we afford another
IDNA (whose documents, fortunately, are Standards Track and
Informational)?

The authors' goal (this was still being discussed on the WG-ltru list
yesterday) is to enforce the ABNF in the IANA registry immediately.
This is not normal, since the Draft introduces a new standards-track
proposition. This proposition imposes a strict format on the IANA
langtag registry and is said to be needed on many more new occasions
(XML), while the dangers to the user and the legal privacy aspects
have not been investigated. This format conflicts with other formats:
those resulting from loose application of the outdated RFC 3066, those
waiting for the corrections RFC 3066 needs, and those respecting the
tag URI RFC (among those I know of). In fact, this Draft opposes other
existing or planned practices and projects. The Draft's author
documented this himself: he asked me for a competing Draft, so that the
"best" could "win". He acknowledged that better formats could be
devised, such as an alternative project he had in the past.

IMHO the "winner" of an IETF document should be the user community. A
standards-track document must define the best solution; a BCP must
inclusively report on a community consensus, or respect every serious
or dedicated practice and R&D effort. This is all the more true in an
area where the IETF community knows it has no particular experience,
and where no one in the world has definitive expertise. My own
organisation's R&D "bad practices" are to follow the tag URI RFC and
not the Draft's proposition. This only shows that we are in a
standards-track case and that (even if we are wrong) we must be able to
appeal _before_ what we consider a deadly error is imposed on the
Internet.

Another situation where a BCP could be acceptable would be a community
consensus that the current mix of practices is a real disorder. The
Draft's constraints could then be understandable, but not if they
create a greater disorder. The rigidity of the Draft will create such a
disorder at some stage, because it rules in a human, cultural, and
political area. This rigidity is acceptable in a federating proposed
standard but is opposed to the very concept of a BCP. I documented some
areas it does not address and the security problems it creates. I note
that I obtained no indication from the WG-ltru on how, and in which
ways, the Registry will be used, which is the basis of any practice
description and will affect the practical feasibility of the proposed
standard. The load on the IANA servers may be tremendous if XML
libraries start querying them, or even if locales want to cache the
registry. This would represent a technical and financial bottleneck; as
a ccTLD Registry, we are directly concerned, since we contribute to the
cost of the IANA service. This is still terra quasi incognita.
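One way a consumer could avoid querying the IANA servers on every lookup is to load a local snapshot of the registry once and build an in-memory index from it. The Python below is a minimal sketch that assumes the record-jar layout later published for the registry in RFC 4646 ("%%" lines separating records, "Field-Name: value" lines, indented continuation lines); draft-12 may differ in detail. How the snapshot is fetched and refreshed is deliberately left out, and the two sample records are illustrative, not an excerpt of the real registry.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, List[str]]


def parse_registry(text: str) -> List[Record]:
    """Parse record-jar text: '%%' separates records, fields are 'Name: value'."""
    records: List[Record] = []
    current: Record = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1] in (" ", "\t") and last_field is not None:
            # Indented line: continuation of the previous field's value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last_field = name.strip()
            # A field such as Description may repeat, so keep a list.
            current.setdefault(last_field, []).append(body.strip())
    if current:
        records.append(current)
    return records


def build_index(text: str) -> Set[Tuple[str, str]]:
    """Build a case-insensitive set of (type, subtag) pairs from a snapshot."""
    index: Set[Tuple[str, str]] = set()
    for record in parse_registry(text):
        for kind in record.get("Type", []):
            for subtag in record.get("Subtag", []):
                index.add((kind.lower(), subtag.lower()))
    return index


# Illustrative records only, not copied from the real registry.
SAMPLE = """Type: language
Subtag: fr
Description: French
%%
Type: region
Subtag: CA
Description: Canada
"""

if __name__ == "__main__":
    INDEX = build_index(SAMPLE)  # built once, then reused for every lookup
    print(("region", "ca") in INDEX)
```

Whether libraries would actually behave this way, rather than query IANA directly, is exactly the usage question raised above; the sketch only shows that a cached copy is cheap to consult.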

On 15:23 25/08/2005, Bill Fenner said:
>In March, 1995, when RFC 1766 was published, the BCP track did not
>exist. The Standards Track was being used for things that were not
>protocols and did not fit well into the 3-stage process. Since BCPs
>are subject to the same consensus judging and scrutiny as
>standards-track documents, it's been common practice to obsolete old
>standards-track documents with BCPs when it's reasonable to think
>that the original document would have been a BCP if BCPs had existed
>at the time.

I thank Bill for this input. It clarifies the historical origin and
demonstrates the misunderstanding. RFC 1766, RFC 3066, and the Draft
address two different issues: a registry issue and a protocol issue. I
suppose that a registry management issue can be documented by a BCP.
But since the proposition defines the exclusive management of a
community property (the IANA langtag registry), it can only be
finalised by a serious, positive community consensus. I submit that the
IETF community does not yet have the common expertise and the
multilingual culture to uncover such a consensus. I note that all the
participants in this debate but me have a perfect command of English,
and the majority do not have an equivalent command of another
language: is this the best experience from which to address and
establish rules for managing non-English languages?

But it seems odd to document by a BCP (unless an IAB BCP?) both the
option of using a registry (language tags do not require a central
directory in the tag URI practice) and the core of the complex
interintelligibility-layer protocol architecture still to be developed,
while there is:
- not even a global community awareness,
- no consensus on the nature of what we are discussing,
- a need for a Draft precisely because the use of RFC 3066 is limited
by its shortcomings.

The BCP nature could be accepted if the Draft were intended to
complement RFC 2026 and were dedicated to supporting the specifics of
the Multilingual Internet. This was the intent of the Draft I started
working on. Such a BCP would, for example, introduce a "multilingual
support" section in every Internet standards process document. This is
not the case.

RFC 3066 can also be accepted as a BCP as written by an IESG member
outside of any WG, documenting the still-existing private mailing list
ietf-***@alvestrand.no as the reviewer of the IANA langtag registry. In
that case RFC 3066 is an "IESG incitation". But the Charter obsoletes
that incitation. This should return the Draft to the standards-track,
not-yet-a-standard status of RFC 1766.

Another possibility (by far the one I would have favored) would be a
new "IESG/IAB incitation" in this area: an IESG or IAB document on the
Multilingual Internet framework I am calling for.

I do not think the Draft standardises a common practice. I do not
think the Draft is the result of a deliberated, community-wide consensus.


2. the open/closed nature of the proposition.

The difference between an open and a closed proposition is that a
closed proposition is built to cover every possible issue and prevents
additions, which are then considered a source of confusion, except
through a revision. An open proposition is built to be able to welcome
and support additional possibilities without revision. The current
Draft is a closed proposition: yesterday the WG-ltru list was still
discussing ways to forbid usages they had not thought about.

My "default proposition" makes it an open proposition, by treating the
Draft proposition as a strictly defined default approach and by using a
reserved mechanism as an organised hook for other possibilities: R&D,
experimentation, innovation, user protection through encryption,
multilingualism, etc.

I would say that "closed" means that additions must be made inside the
proposition through revisions (the WG-ltru has already planned an RFC
3066 ter, RFC 3066 tetra, etc.). An "open" proposition permits any
addition to be built on top. Usually an appliance is a closed system,
while a computer is an open one.

To compare with operating systems: Windows, Linux, and QNX range from
closed to open systems. In networking, the same gradation runs from
centralised, through decentralised, to distributed. (Needless to say,
my vision of the Internet is more user-centric than
central-registry-centric: my work is on the granular distribution of
personalised but consistent registries.) This is in line with
TC32/ISO 11179, which the WG-ltru refused to consider, an area where
the IETF is starting to gain real experience (which ISO lacks) with the
URI (IRI, MRI) tags.

I hope this is clearer.
jfc
Bill Fenner
2005-08-25 13:23:12 UTC
JFC,

In March, 1995, when RFC 1766 was published, the BCP track did not exist.
The Standards Track was being used for things that were not protocols
and did not fit well into the 3-stage process. Since BCPs are subject to
the same consensus judging and scrutiny as standards-track documents, it's
been common practice to obsolete old standards-track documents with BCPs
when it's reasonable to think that the original document would have been
a BCP if BCPs had existed at the time.

Bill
David Hopwood
2005-08-24 15:34:22 UTC
JFC (Jefsey) Morfin wrote:
> I would like to understand why
> http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
> claims to be a BCP: it introduces a standard track proposition,
> conflicting with current practices and development projects under way?

I've read this draft and see nothing wrong with it. Having a fixed,
unambiguous way to parse the elements of a language tag is certainly
a good idea. What specific current practices do you think it conflicts
with?
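To illustrate what a fixed parse based on position and length looks like, here is a simplified Python sketch of the language-script-region-variant structure. It is loosely modelled on the Draft's ABNF and deliberately ignores extended language subtags, extensions, private use and grandfathered tags, so it is an illustration rather than a conforming parser.

```python
import re

# Simplified subset of the Draft's tag structure (an assumption for illustration):
#   language = 2-3 letters
#   script   = 4 letters             (optional)
#   region   = 2 letters or 3 digits (optional)
#   variant  = 5-8 alphanumerics, or a digit followed by 3 alphanumerics (repeatable)
TAG = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)",
    re.IGNORECASE,
)

def parse(tag: str) -> dict:
    """Split a tag into named parts, identified by position and length."""
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not covered by this simplified grammar: {tag!r}")
    variants = [v for v in m.group("variants").split("-") if v]
    return {
        "language": m.group("language").lower(),
        "script": (m.group("script") or "").title() or None,
        "region": (m.group("region") or "").upper() or None,
        "variants": [v.lower() for v in variants],
    }

print(parse("sr-Latn-CS"))
# {'language': 'sr', 'script': 'Latn', 'region': 'CS', 'variants': []}
print(parse("sl-rozaj"))
# {'language': 'sl', 'script': None, 'region': None, 'variants': ['rozaj']}
```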

> I support it as a transition standard track RFC needed by some, as long
> as it does not exclude more specific/advanced language identification
> formats, processes or future IANA or ISO 11179 conformant registries.

The grammar defined in the draft is already flexible enough.

--
David Hopwood <***@blueyonder.co.uk>
JFC (Jefsey) Morfin
2005-08-25 01:39:03 UTC
At 17:34 24/08/2005, David Hopwood wrote:
>JFC (Jefsey) Morfin wrote:
>>I would like to understand why
>>http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
>>claims to be a BCP: it introduces a standard track proposition,
>>conflicting with current practices and development projects under way?
>
>I've read this draft and see nothing wrong with it. Having a fixed,
>unambiguous way to parse the elements of a language tag is certainly
>a good idea. What specific current practices do you think it conflicts
>with?

Dear David,
Before parsing language tags, many issues have to be considered,
with important consequences that often lie outside the IETF's scope (L8/9).

I could tell you that I have been working on tools and projects related
to brain-to-brain interintelligibility for 25 years: the inadequacy, the
scarcity and the centralised control of the proposed solution directly
oppose the work of my own R&D organisation. But you could object "too bad
for you" (we are used to that).

So I will tell you something different. Today, the common practice
of nearly one billion Internet users is to be able to turn off
cookies to protect their anonymous, free use of the web. Once the
Draft takes effect, a conflicting privacy violation will be imposed on
them: "tell me what you read, and I will tell you who you are". Any
OPES can monitor the exchange, extract these unambiguous ASCII tags,
and know (or block) what you read. You can search for these tags in
Google and learn a lot about people. There is no proposed way to turn
that personal tagging off, nor to encode it.

>>I support it as a transition standard track RFC needed by some, as
>>long as it does not exclude more specific/advanced language
>>identification formats, processes or future IANA or ISO 11179
>>conformant registries.
>
>The grammar defined in the draft is already flexible enough.

(I suppose you mean more than just the grammar. Talking about the ABNF is
probably clearer?)

I am certainly eager to learn how I can support modal information
(type of voice, accent, signs, icons, feelings, fount, etc.), medium
information, language references (for example, is it plain, basic or
popular English? which dictionary, which software publisher?), the
context (style, relation, etc.), the nature of the text (monolingual,
multilingual, human- or machine-oriented; for example, what tag should
be used for a multilingual file [printed in a language of choice]?),
the date of the langtag version being used, etc.

The Draft ties language tags to a centrally controlled and managed
registry. This is an increasingly outdated concept as the distributed
nature of the Internet becomes more and more of a reality. This is fully
documented by the RFC on URI tags. That RFC proposes some examples,
using standard Internet schemes. It would be great if you could show
me how the Draft can support them.

The Draft has introduced the "script" subtag in addition to RFC 3066
(which is an obvious change). However, in order to stay "compatible"
with RFC 3066, the author says it cannot introduce specific support for
URI tags. This is why I would be more than grateful if you could
show me how the ABNF is "already flexible enough" to support them.
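The compatibility constraint can be checked mechanically. This Python sketch turns the RFC 3066 tag syntax (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT), joined by "-") into a regular expression and tests a made-up tag URI against it.

```python
import re

# RFC 3066: Language-Tag = Primary-subtag *( "-" Subtag )
#           Primary-subtag = 1*8ALPHA ; Subtag = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def is_rfc3066(tag: str) -> bool:
    """True if the string matches the RFC 3066 Language-Tag syntax."""
    return RFC3066_TAG.fullmatch(tag) is not None

for candidate in ["en-US", "zh-Hant-TW", "tag:example.com,2005:abcf"]:
    print(candidate, is_rfc3066(candidate))
# en-US True
# zh-Hant-TW True
# tag:example.com,2005:abcf False
```

Any character other than ASCII letters, digits and "-" (such as the ":", "." and "," of a tag URI) is enough to fail the check.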

Deep thanks.
jfc
David Hopwood
2005-08-25 22:40:56 UTC
JFC (Jefsey) Morfin wrote:
> [...] Today, the common practice of
> nearly one billion of Internet users is to be able to turn off cookies
> to protect their anonymous free usage of the web. Once the Draft enters
> into action they will be imposed a conflicting privacy violation: "tell
> me what you read, I will tell you who you are": any OPES can monitor the
> exchange, extract these unambiguous ASCII tags, and know (or block) what
> you read. You can call these tags in google and learn a lot about
> people. There is no proposed way to turn that personal tagging off, nor
> to encode it.

I don't know which browser you use, but in Firefox, I can configure exactly
which language tags it sends. If it were sending other information using
language tags as a covert channel (which it *could* do regardless of the
draft under discussion), I'd expect that to be treated as at least a bug,
and if it were a deliberate privacy violation, I'd expect that to cause a
big scandal.
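For reference, a user-configured list like the one described above reaches the server as an HTTP Accept-Language value. This Python sketch builds such a value from an ordered list; the decreasing q-value scheme is an assumption for illustration, not a description of what Firefox actually sends.

```python
from typing import List

def accept_language(prefs: List[str], step: float = 0.1) -> str:
    """Build an Accept-Language value from an ordered preference list.

    The first tag gets an implicit q=1; each following tag gets a q-value
    lowered by `step` (a hypothetical scheme), never below 0.1.
    """
    parts = []
    for i, tag in enumerate(prefs):
        if i == 0:
            parts.append(tag)
        else:
            q = max(1.0 - i * step, 0.1)
            parts.append(f"{tag};q={q:.1f}")
    return ", ".join(parts)

print(accept_language(["fr-FR", "fr", "en"]))
# fr-FR, fr;q=0.9, en;q=0.8
```

The header carries only the tags the user has put in the list, which is the control David refers to.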

>>> I support it as a transition standard track RFC needed by some, as
>>> long as it does not exclude more specific/advanced language
>>> identification formats, processes or future IANA or ISO 11179
>>> conformant registries.
>>
>> The grammar defined in the draft is already flexible enough.
>
> (I suppose you mean more than just grammar. Talking of the ABNF is
> probably clearer?).
>
> I am certainly eager to learn how I can support modal information (type
> of voice, accent, signs, icons, feelings, fount, etc.), medium
> information, language references (for example is it plain, basic,
> popular English? used dictionary, used software publisher), nor the
> context (style, relation, etc.), nor the nature of the text (mono,
> multilingual, human or machine oriented - for example what is the tag to
> use for a multilingual file [printed in a language of choice]), the date
> of the langtag version being used, etc.

I mean that the grammar is flexible enough to encode any of the above
attributes (not that it would be useful or a good idea to encode most
of them).

> The Draft has introduced the "script" subtag in addition to RFC 3066
> (what is an obvious change). However in order to stay "compatible" with
> RFC 3066, author says it cannot introduce a specific support of URI
> tags.

This objection seems to be correct: URI tags include characters not
allowed by RFC 3066. But you could easily encode information equivalent
to a URI tag, if you wanted to.
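One possible reading of "encode the equivalent information" is sketched below in Python: the string is base32-encoded and cut into private-use subtags of at most 8 alphanumerics. The scheme (base32, lower case, padding stripped, "x-" prefix) and the function names are hypothetical; neither the Draft nor the tag URI specification defines it.

```python
import base64

def to_private_use(value: str) -> str:
    """Pack an arbitrary string into 'x-' private-use subtags (hypothetical scheme)."""
    # Base32 output uses only A-Z and 2-7, so every chunk is alphanumeric.
    b32 = base64.b32encode(value.encode("utf-8")).decode("ascii").rstrip("=").lower()
    chunks = [b32[i:i + 8] for i in range(0, len(b32), 8)]
    return "x-" + "-".join(chunks)

def from_private_use(tag: str) -> str:
    """Invert to_private_use()."""
    if not tag.lower().startswith("x-"):
        raise ValueError("not a private-use tag")
    b32 = tag[2:].replace("-", "").upper()
    b32 += "=" * (-len(b32) % 8)  # restore the base32 padding
    return base64.b32decode(b32).decode("utf-8")

uri = "tag:example.com,2005:abcf"
encoded = to_private_use(uri)
print(encoded)
assert from_private_use(encoded) == uri
```

The round trip is lossless and also works for IDN labels, at the cost of a long, opaque tag that only cooperating parties can interpret.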

--
David Hopwood <***@blueyonder.co.uk>
JFC (Jefsey) Morfin
2005-08-26 00:55:52 UTC
On 00:40 26/08/2005, David Hopwood said:
>JFC (Jefsey) Morfin wrote:
>>[...] Today, the common practice of nearly one billion of Internet
>>users is to be able to turn off cookies to protect their anonymous
>>free usage of the web. Once the Draft enters into action they will
>>be imposed a conflicting privacy violation: "tell me what you read,
>>I will tell you who you are": any OPES can monitor the exchange,
>>extract these unambiguous ASCII tags, and know (or block) what you
>>read. You can call these tags in google and learn a lot about
>>people. There is no proposed way to turn that personal tagging off,
>>nor to encode it.
>
>I don't know which browser you use, but in Firefox, I can configure exactly
>which language tags it sends. If it were sending other information using
>language tags as a covert channel (which it *could* do regardless of the
>draft under discussion), I'd expect that to be treated as at least a bug,
>and if it were a deliberate privacy violation, I'd expect that to cause a
>big scandal.

Dear David,
the privacy problem is the "what you read, who you are" intelligence
leak. Today langtags are not yet used much (according to the W3C people
in the WG-ltru) compared with how much they should be in XML, HTML, etc.
That is what this proposition is all about: giving, in _one_shot_ and
in a _standardised_ way, the language, the script and the country. It
uses ISO codes for that. ISO never intended to propose a code where:

ar-arab-us designates texts intended for people interested in US
Arabic community issues;
iw-hebr-ru designates texts intended for people interested in the
Jewish Russian community;
etc.

When your browser accepts such langtags and you pursue the relation,
this structured information can be filtered by ISPs (for police,
censorship, intelligence gathering, etc.) to learn about their users.
It can be used for large-scale searches in search engines to find out
which mail you responded to, etc. I suppose that in most countries of
the world this is subject to privacy laws. I think that in France it is
subject to the anti-racism law (the one used against Yahoo a few years ago).

The problem is that there is no way for the _receiver_ to turn it
off. This is potentially dangerous spam: it is digital information
I never asked for, which discloses information about me.

Is that a reason to kill the Draft? I do not think so, but it
certainly shows the complexity of the issue, and the lack of
preparation of the Draft (I proposed the Security section to better
warn about the problem). The IETF offers a solution: OPES. An
OPES on the host side can remove the langtags or encrypt them at
the reader's request, without any change on the host. I tried to
make the WG-ltru understand that failing to consider or mention OPES
while documenting langtags is criminal.

This is the reason for the default proposition I make: the Draft's ABNF
and system are considered a starting default, with hooks open to IRI
tags adapted to each situation at the decision of the user or of
services he trusts.

Let us take the case above. A service provider can offer an OPES
service that changes "he-hebr-us" into "x-abcf", and give the user an
internal OPES plug-in that restores x-abcf to he-hebr-us, so his
libraries still work. And many L9 organisations/governments are
satisfied. He can even provide dynamically updated langtag aliases.
However, a good service should guarantee that it is conflict-free. This
is no problem if the langtag alias is x-service.com:abcf (conforming to
the URI Tag RFC), but this is forbidden by the Draft. My proposition is
to use "0-" as a hook to a specific format, so the Draft's ABNF is fully
respected.
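The aliasing service described above can be sketched as a pair of table lookups in Python. The alias table and function names are hypothetical, and the OPES callout machinery that would actually rewrite the messages is not shown.

```python
# Hypothetical alias table shared by the provider-side OPES service
# and the user's local plug-in.
ALIASES = {"he-hebr-us": "x-abcf"}
REVERSE = {alias: tag for tag, alias in ALIASES.items()}

def mask(tag: str) -> str:
    """Provider side: replace a known tag by its opaque alias."""
    return ALIASES.get(tag.lower(), tag)

def unmask(tag: str) -> str:
    """User side: restore the original tag so local libraries still work."""
    return REVERSE.get(tag.lower(), tag)

print(mask("he-Hebr-US"))  # x-abcf
print(unmask("x-abcf"))    # he-hebr-us
```

Tags without an alias pass through unchanged. Nothing in this sketch prevents two providers from both picking "x-abcf", which is the collision problem raised above.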

In that case "0-service.com:abcf" will not raise an error, and will
not conflict with people using the default format (the format
proposed by the Draft). The advantage of "0-" is that it can be
multilingual, so the hook can work in ASCII but also in punycode, and
in any script. It can also be entirely numeric and possibly refer
directly to an IPv6 address, making the scheme independent of domain names.

>>>>I support it as a transition standard track RFC needed by some,
>>>>as long as it does not exclude more specific/advanced language
>>>>identification formats, processes or future IANA or ISO 11179
>>>>conformant registries.
>>>
>>>The grammar defined in the draft is already flexible enough.
>>(I suppose you mean more than just grammar. Talking of the ABNF is
>>probably clearer?).
>>I am certainly eager to learn how I can support modal information
>>(type of voice, accent, signs, icons, feelings, fount, etc.),
>>medium information, language references (for example is it plain,
>>basic, popular English? used dictionary, used software publisher),
>>nor the context (style, relation, etc.), nor the nature of the text
>>(mono, multilingual, human or machine oriented - for example what
>>is the tag to use for a multilingual file [printed in a language of
>>choice]), the date of the langtag version being used, etc.
>
>I mean that the grammar is flexible enough to encode any of the
>above attributes (not that it would be useful or a good idea to encode most
>of them).

Hmm... you take responsibility for both statements :-)
- that you _can_ encode it. But you do not provide examples.
- that it would not be useful or a good idea to encode basic content
object attributes.

>>The Draft has introduced the "script" subtag in addition to RFC
>>3066 (what is an obvious change). However in order to stay
>>"compatible" with RFC 3066, author says it cannot introduce a
>>specific support of URI tags.
>
>This objection seems to be correct: URI tags include characters not
>allowed by RFC 3066.

So? The purpose of this work is to address the limitations of RFC
3066. URI tags did not exist when RFC 3066 was written. Do you mean,
for example, that langtags must be ASCII-only because RFC 3066 was ASCII-only?

> But you could easily encode the equivalent information to an URI
> tag, if you wanted to.

Please document how you would do it while respecting the hybrid format
of the proposed ABNF, where information is identified not only by fixed
position but also by relative position and size, with "-" as the sole
separator. And they want to keep the labels between "-" at most 8
characters long. Tell me how you support IDNs.

Let us suppose that I have "lang-tags.org:" or "xn--abcdef.com:" as a
scheme. Tell me how you would support them.
jfc
Bruce Lilly
2005-08-28 14:15:00 UTC
> Date: 2005-08-25 20:55
> From: "JFC (Jefsey) Morfin" <***@jefsey.com>

> the privacy problem is the "what you read, who you are" intelligence
> leak.

That is to some extent true of any negotiation mechanism and negotiated
value.

> Today langtags are not yet much used (say the W3C people in the
> WG-ltru) when compared with  what they should in XML, HTML, etc.

XML, HTML, etc. are not IETF protocols and should not be the main
consideration in IETF work on IETF documents, especially as language tags
are heavily used by IETF protocols, notably MIME (RFCs 2045, 2047, 2231,
3282) and widely-deployed core IETF application protocols which use MIME
(e.g. the Internet Message Format and its applications (email, news, voice
messaging, EDI, etc.), and HTTP and applications using HTTP as a substrate).

> This
> is all what this proposition is about. This proposition is to give
> _one_shot_ in a _standardised_ way the language, the script and the
> country.

This was discussed during Last Call of the previous non-IETF (individual
submission) attempt. IIRC David Singer brought up several examples of
other pieces of information (e.g. legal/copyright variations) that could
also be negotiated and which might affect the presentation of content (or
choice among alternative content). Lumping all of these separate items into
one tag is a poor design as it impedes negotiation and tends toward lengthy
tags which are incompatible with fixed-length mechanisms such as MIME
encoded-words. While there is some mention of this issue in the document
under discussion, its treatment and resolving the underlying issue in a
manner that would minimize the problems are lacking.
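The length concern can be made concrete. RFC 2047 limits an encoded-word to 75 characters, and RFC 2231 puts the language tag inside it (=?charset*language?encoding?encoded-text?=), so every character of the tag is taken from the room left for text. A Python sketch of that budget, with example tags chosen only for illustration:

```python
MAX_ENCODED_WORD = 75  # RFC 2047, section 2

def text_budget(charset: str, language: str, encoding: str = "Q") -> int:
    """Characters left for encoded-text in one RFC 2231 encoded-word."""
    overhead = len(f"=?{charset}*{language}?{encoding}??=")
    return MAX_ENCODED_WORD - overhead

for lang in ["en", "sr-Latn-CS", "sl-Latn-IT-rozaj"]:
    print(lang, text_budget("utf-8", lang))
# en 60
# sr-Latn-CS 52
# sl-Latn-IT-rozaj 46
```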

> It uses for that ISO codes. ISO never wanted to propose such
> a code where:
>
> ar-arab-us are texts destined the people interested in US Arabic
> community issues.
> iw-hebr-ru are texts destined to people interested in Jewish Russian
> community,
> etc.
>
> When you browser accept that langtags and you pursue the relation,
> this structured information can be filtered by ISP (for police,
> censoring, intelligence gathering, etc.) to know about their users.
> It can be used for searches on a large scale in search engines to
> know the mail you responded, etc. I suppose that in most of the world
> countries this is subject to privacy laws. I think that in France it
> is subject to the anti-racist law (the one used against Yahoo a few years ago).

Let's separate three issues:
1. privacy
2. tagging
3. negotiation

The privacy issue exists whenever any information is conveyed; the user
needs to balance privacy concerns with facilitation of communication.
Mechanisms such as TLS can be used to limit the visibility of the information
to the end points of communication; ultimately it boils down to a matter of
trust in the end-point partner in the communication exchange. I believe
that the issue is dealt with adequately in the security considerations
section of the document under discussion (some mention of transport-level
protection of privacy would be welcome).

Tagging identifies characteristics of a particular piece of content. For
that purpose alone, it makes little difference (other than regarding the
aforementioned compatibility issues with existing IETF mechanisms) whether
the characteristics are lumped or separate. There are existing IETF
mechanisms which permit handling of either lumped or individual characteristics
(e.g. the extensible header field mechanism of RFC 2045 and the "feature/filter"
mechanism of RFC 2533/2738/2912). Tagging per se identifies characteristics
of content. While that may be used to infer something about the content
provider, such inferences may be unreliable, particularly for providers that
support a wide variety of characteristics for the content in question.

Negotiation of characteristics is where several issues arise. One such
issue, as discussed here in December 2004/January 2005, relates to an
algorithm for matching content characteristics (e.g. between a particular
piece of content and a specified range of acceptance, as in an RFC 3282
Accept-Language field). RFC 3066 skirted that issue as it stopped short of
specification of an algorithm, and as it specified a mere two particular
characteristics (language per se, and country) which could be combined in
a tag. That was not true of the individual submission, which combined at
least 5 characteristics and specified an algorithm. As a result of issues
with that approach, the LTRU WG was established with a charter to produce a
BCP (for registration procedures) and a separate Standards Track document
for topics such as algorithms which are unsuitable for BCP. A related issue
is the interaction of the established negotiation mechanism (viz. the RFC
3282 Accept-Language field) and potential use of the other (feature/filter)
mechanism for negotiation. The Accept-Language field provides for
specification of language ranges and for associating a preference value
with specific languages (as defined in RFC 3066) or ranges. The proposed
mechanism in the individual submission of late last year (essentially
unchanged in the LTRU product (see discussion below)) does not address the
language range issue, and that issue is greatly complicated by conflating
separate characteristics into a single tag. Addressing the language range
issue is not a WG work item and, unfortunately, the algorithm issue is
scheduled to be a later work item than the registry issue. Added to that
is the fact that the specification of the tag format is mixed with
registration procedures. Negotiation of separate characteristics is much
simpler than that of a combined conflation of characteristics; each
characteristic can be assigned separate preference values, and irrelevant
characteristics (e.g. script w.r.t. spoken language) can be easily ignored.
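The range issue can be illustrated with the prefix rule HTTP/1.1 uses for Accept-Language (RFC 2616, section 14.4): a range matches a tag if it equals the tag, or equals a prefix of it followed by "-". The Python sketch below implements only that rule; it is not the matching algorithm the WG was chartered to produce. It shows how a script subtag placed between language and region stops "sr-CS" from matching "sr-Latn-CS".

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 2616-style prefix match, case-insensitive; '*' matches anything."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")

print(range_matches("sr", "sr-Latn-CS"))     # True
print(range_matches("sr-CS", "sr-CS"))       # True
print(range_matches("sr-CS", "sr-Latn-CS"))  # False: the script subtag breaks the prefix
```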

As negotiation and related issues represent a critical technical issue for
the design of language tags (viz. keeping separate characteristics out of
*language* tags), it is essential that such negotiation issues be considered
carefully before specifying the format of tags. Unfortunately, that has not
been done, and considering the published WG milestones it appears that that
issue has not been taken into consideration. It should be pointed out that
such issues have been raised, both in the discussion during Last Call of the
individual submission and as a result of subsequent work. However, it
appears that the WG has not considered the issues, with the effect that the
WG product lacks the "particular care" expected of BCP documents (RFC 2026).
Note that it is not the registration procedural issues that are typical of
BCP documents that are problematic; rather it is the conflation of separate
characteristics into a single tag syntax, specified in the same document,
which raises problems related to content negotiation.

Part of the problem is the scheduling of WG work items as noted above
(viz. negotiation issues are critical to design of tag syntax, and should not
have been deferred until after syntax specification). Another large part of
the problem is WG management; in addition to the issues raised by John
Klensin the last time that LTRU participation was discussed on the IETF
discussion list -- and with which I wholeheartedly agree -- it appears that
management of WG participant conduct has been rather lax; proponents of the
individual submission effort who are participating in the WG tend to resort
to ad-hominem attacks when a problem is identified or when an alternative
approach is raised, with no visible intervention by the WG co-chairs. That
has also (i.e. in addition to the factors which John identified) had the
effect of limiting WG participation by individuals.

Specification of "language" tag syntax which conflates other content
characteristics prior to open and professional discussion of negotiation
issues and alternative approaches would be a premature lock-in of a design
choice. As the document under discussion specifies a conflation of such
characteristics without open discussion -- indeed hampered by unchecked
unprofessional conduct -- it should not be approved as BCP in its current
form. Separation of syntax specification to a separate document, to be
specified after due consideration of negotiation issues, leaving purely
procedural issues of registration, would be one approach to enable making
a decision on BCP registration procedures independently of and in advance of
a concrete specification of negotiation issues and tag syntax. However,
as it stands, the document cannot be evaluated for soundness of the tag
syntax design in the absence of a specification that addresses negotiation
issues (in a manner backwards-compatible with the existing negotiation
mechanisms, viz. MIME Content- and Accept- fields and feature/filter
negotiation).

Therefore, at minimum, I recommend that the IESG defer a decision on the
subject document until such time as the full impact of the early design
choice to conflate multiple characteristics into a single tag can be fully
evaluated w.r.t. proposed matching algorithms and impact on existing
IETF-approved negotiation mechanisms. Revision to move the syntax
specification to a separate document, as mentioned above, would permit
evaluation of the registration procedures per se independently of such
concerns, and would be one way to move forward on those registration
procedures quickly (i.e. independently of analysis of the syntax design)
if that is deemed desirable.

Aside from that, the IESG (via the cognizant ADs) should address the issues
of WG charter work items and milestones as they relate to consideration of
negotiation issues prior to locking down a tag syntax specification, should
emphasize the importance of backwards compatibility with established,
approved, and widely deployed IETF protocols and mechanisms, and should
discuss WG participant conduct (viz. ad-hominem attacks) and mailing list
issues (as identified by JCK) with the WG co-chairs.
Frank Ellermann
2005-08-28 20:25:31 UTC
Bruce Lilly wrote:

> While there is some mention of this issue in the document
> under discussion, its treatment and resolving the underlying
> issue in a manner that would minimize the problems are
> lacking.

This is a Last Call: if you have better ideas than those in the
draft, speak up. Your Content-Script idea is good, but won't
help e.g. in encoded words (2047+2231). We definitely tried
to minimize this problem in particular.

This is a ready-for-Bruce's-review draft as far as I can judge
this, but for obvious reasons only you can really judge it. ;-)

> Addressing the language range issue is not a WG work item
> and, unfortunately, the algorithm issue is scheduled to be a
> later work item than the registry issue.

Only my personal view of course, but the matching draft offers
a syntactical form for ranges, and the Suppress-Script in the
registry provides for backwards compatibility where possible.
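A Python sketch of how a tag producer might apply Suppress-Script, which records a script subtag that should not normally be used with a given language. The three-entry table is a hypothetical excerpt standing in for the registry, and the function handles only the simple case where the script directly follows the language subtag.

```python
# Hypothetical registry excerpt: language subtag -> Suppress-Script value.
SUPPRESS_SCRIPT = {"en": "Latn", "fr": "Latn", "ru": "Cyrl"}

def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that is redundant for the tag's language."""
    parts = tag.split("-")
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        if SUPPRESS_SCRIPT.get(parts[0].lower(), "").lower() == parts[1].lower():
            del parts[1]
    return "-".join(parts)

print(drop_suppressed_script("en-Latn-US"))  # en-US
print(drop_suppressed_script("sr-Latn-CS"))  # sr-Latn-CS
```

A producer following this keeps emitting "en-US" rather than "en-Latn-US", the form RFC 3066 consumers already expect.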

> it is essential that such negotiation issues be considered
> carefully before specifying the format of tags.
> Unfortunately, that has not been done

IBTD, we considered the script issues. Anything else is as
good or as bad as it is with 3066, with some minor problems fixed of
course: if ISO 3166-1 pulls another "CS", 3066bis will handle it
better than 3066 (no potential worldwide retagging confusion).

> the WG product lacks the "particular care" expected of BCP
> documents (RFC 2026).

IBTD as far as scripts are concerned.

> it appears that management of WG participant conduct has been
> rather lax

IBTD, the WG Chairs and the responsible AD did a very good job.

> as it stands, the document cannot be evaluated for soundness
> of the tag syntax design in the absence of a specification
> that addresses negotiation issues (in a backwards-compatible
> manner with the existing negotiation mechanisms (viz. MIME
> Content- and Accept- fields and feature/filter negotiation).

IBTD, see above. Your idea to split tag and registry syntax
from all procedural aspects of tag registration is possible,
but you get the same effect by "ignore the procedural stuff
in chapter 3" (= about 14 of the 60 pages in the draft).

> Revision to move the syntax specification to a separate
> document, as mentioned above, would permit evaluation of the
> registration procedures per se

You can also read chapter 3 per se, the mentioned 14 pages
plus 3.1 as introduction (5 pages, format of the registry).

I'm not violently against splitting the document, but it's
IMHO unnecessary.
Bye, Frank (also posted on the LTRU list)
JFC (Jefsey) Morfin
2005-08-29 00:33:30 UTC
Dear Bruce,
I will try to quickly comment/respond/suggest on some of your well made points.

At 16:15 28/08/2005, Bruce Lilly wrote:
> > Date: 2005-08-25 20:55
> > From: "JFC (Jefsey) Morfin" <***@jefsey.com>
> > the privacy problem is the "what you read, who you are" intelligence
> > leak.
>
>That is to some extent true of any negotiation mechanism and negotiated
>value.

True. The problems are:
- the unnecessary accumulation of orthogonal information
- the easily identified characteristic format: there is an enormous
difference between "xx-xx-xxx-xx" (Draft) and "xxxx" (ISO 639-6)
- the lack of alternatives (are we sure there is no other
architectural way to address the same need without an information leak?)
- the lack of encryption
- the "spam" aspect: I am forced to receive the langtag.

> > Today langtags are not yet much used (say the W3C people in the
> > WG-ltru) when compared with what they should in XML, HTML, etc.
>
>XML, HTML, etc. are not IETF protocols and should not be the main
>consideration in IETF work on IETF documents,

They are specifically cited in the Charter. Also, CLDR is a private
proposal to unify "locale" files, which is of interest but also faces competition.

>especially as language tags
>are heavily used by IETF protocols, notably MIME (RFCs 2045, 2047, 2231,
>3282) and widely-deployed core IETF application protocols which use MIME
>(e.g. the Internet Message Format and its applications (email, news, voice
>messaging, EDI, etc.) and HTTP and applications using HTTP as a substrate.

RFC 2231 is among the references cited. I am more interested in R&D. My
concern is that OPES has been disregarded.

> > This
> > is all what this proposition is about. This proposition is to give
> > _one_shot_ in a _standardised_ way the language, the script and the
> > country.
>
>This was discussed during Last Call of the previous non-IETF (individual
>submission) attempt. IIRC David Singer brought up several examples of
>other pieces of information (e.g. legal/copyright variations) that could
>also be negotiated and which might affect the presentation of content (or
>choice among alternative content). Lumping all of these separate items into
>one tag is a poor design as it impedes negotiation and tends toward lengthy
>tags which are incompatible with fixed-length mechanisms such as MIME
>encoded-words. While there is some mention of this issue in the document
>under discussion, its treatment and resolving the underlying issue in a
>manner that would minimize the problems are lacking.

The work we carried out on language in a common reference centre (where
the common parameters of a relation are stored) showed us that two
classes of additional information must be included in negotiation: the
parameters of the community (what we call the referent, i.e. dictionary,
etc.) and the context of the exchange (style, personal meanings,
circumstances, etc.). These elements are necessary for OPES call-out
servers supervising a relation. These elements are used by default by
... Word (language, script, country, dictionary, style).

The Draft proposes a system that makes it possible to determine the
locale the computer should support for end-to-end interoperability
purposes. It does not necessarily make it possible to establish,
maintain and serve brain-to-brain interintelligibility.

>Let's separate three issues:
>1. privacy
>2. tagging
>3. negotiation
>
>The privacy issue exists whenever any information is conveyed; the user
>needs to balance privacy concerns with facilitation of communication.
>Mechanisms such as TLS can be used to limit the visibility of the information
>to the end points of communication; ultimately it boils down to a matter of
>trust in the end-point partner in the communication exchange. I believe
>that the issue is dealt with adequately in the security considerations
>section of the document under discussion (some mention of transport-level
>protection of privacy would be welcome).

Not really: see above. The concept is an aid to privacy violation:
- more secure alternatives should be investigated and proposed
- the danger is not worth the result, since necessary information is missing.

>Tagging identifies characteristics of a particular piece of content. For
>that purpose alone, it makes little difference (other than regarding the
>aforementioned compatibility issues with existing IETF mechanisms) whether
>the characteristics are lumped or separate. There are existing IETF
>mechanisms which permit handling of either lumped or individual characteristics
>(e.g. the extensible header field mechanism of RFC 2045 and the "feature/filter"
>mechanism of RFC 2533/2738/2912). Tagging per se identifies characteristics
>of content. While that may be used to infer something about the content
>provider, such inferences may be unreliable, particularly for providers that
>support a wide variety of characteristics for the content in question.

This confusion will be an increasing problem. More and more, the
"architext" we use (the data from which we infer the text we read) is
becoming intelligent and multilingual. I currently use a multilingual
site generator: it uses multilingual texts to generate unilingual
versions of a web site. It uses a default langtag scheme (:xxx) to
indicate the language of the lingual parts.

>Negotiation of characteristics is where several issues arise. One such
>issue, as discussed here in December 2004/January 2005 relates to an
>algorithm for matching content characteristics (e.g. between a particular
>piece of content and a specified range of acceptance (as in an RFC 3282
>Accept-Language field). RFC 3066 skirted that issue as it stopped short of
>specification of an algorithm, and as it specified a mere two particular
>characteristics (language per se, and country) which could be combined in
>a tag. That was not true of the individual submission, which combined at
>least 5 characteristics and specified an algorithm. As a result of issues
>with that approach, the LTRU WG was established with a charter to produce a
>BCP (for registration procedures) and a separate Standards Track document
>for topics such as algorithms which are unsuitable for BCP. A related issue
>is the interaction of the established negotiation mechanism (viz. the RFC
>3282 Accept-Language field) and potential use of the other (feature/filter)
>mechanism for negotiation. The Accept-Language field provides for
>specification of language ranges and for associating a preference value
>with specific languages (as defined in RFC 3066) or ranges. The proposed
>mechanism in the individual submission of late last year (essentially
>unchanged in the LTRU product (see discussion below)) does not address the
>language range issue, and that issue is greatly complicated by conflating
>separate characteristics into a single tag. Addressing the language range
>issue is not a WG work item and, unfortunately, the algorithm issue is
>scheduled to be a later work item than the registry issue.

The language negotiation issue is independent of any language
identifier format. But obviously some langtag formats may serve
language negotiation better than others.
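To make the negotiation side concrete, here is a minimal Python sketch of how an RFC 3282 Accept-Language value might be parsed and matched against the tags of available content, using RFC 3066-style prefix matching (a range such as "en" matches "en" and "en-US" but not "eng"). The function names are hypothetical, and the sketch simplifies the real rules: for instance, it does not let the most specific matching range determine a tag's quality value.

```python
def parse_accept_language(value):
    """Split an Accept-Language value into (range, q) pairs, best first."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(number)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((parts[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def range_matches(lang_range, tag):
    """RFC 3066-style prefix match: 'en' matches 'en' and 'en-US', not 'eng'."""
    tag = tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def best_match(accept_language, available_tags):
    """Return the first available tag matched by the highest-q range, or None."""
    for lang_range, q in parse_accept_language(accept_language):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for tag in available_tags:
            if range_matches(lang_range, tag):
                return tag
    return None


print(best_match("fr;q=0.5, en", ["fr-CA", "en-US"]))  # en-US
print(best_match("de, fr;q=0.8", ["fr-CA", "en-US"]))  # fr-CA
```

Note that nothing in the matching step depends on what the subtags mean; it only relies on "-" marking a prefix boundary.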

>Added to that is the fact that the specification of the tag format is mixed
>with registration procedures. Negotiation of separate characteristics is much
>simpler than that of a combined conflation of characteristics; each
>characteristic can be assigned separate preference values, and irrelevant
>characteristics (e.g. script w.r.t. spoken language) can be easily ignored.
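As a hedged illustration of the separate-characteristics negotiation described in the quoted paragraph, the Python sketch below keeps one preference table per characteristic (here "language" and "script") and multiplies the quality values of only those characteristics relevant to the medium, so script is simply ignored for spoken content. The characteristic names, the product-of-q scoring rule and the function name are assumptions for illustration; they are not taken from RFC 2533 or from the documents under discussion.

```python
def score(content, prefs, relevant):
    """Combine per-characteristic quality values for one piece of content.

    content:  characteristics of the content, e.g. {"language": "az", "script": "Cyrl"}
    prefs:    user preferences per characteristic, e.g. {"script": {"Latn": 1.0}}
    relevant: characteristics that matter for this medium; all others are ignored.
    """
    total = 1.0
    for characteristic in relevant:
        table = prefs.get(characteristic)
        if table is None:
            continue  # the user expressed no preference for this characteristic
        # Assumption: a value missing from the user's table is unacceptable (q=0).
        total *= table.get(content.get(characteristic), 0.0)
    return total


prefs = {
    "language": {"az": 1.0, "ru": 0.5},
    "script": {"Latn": 1.0, "Cyrl": 0.3},
}
written = {"language": "az", "script": "Cyrl"}
print(score(written, prefs, ["language", "script"]))  # 0.3 for written text
print(score(written, prefs, ["language"]))            # 1.0 when script is irrelevant
```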

At this stage many negotiation elements are missing, in particular the
elements related to the referent and to the context. For example, a
traveler will more easily accept a foreign language when it concerns
the place he is visiting (context), and a professional when it
concerns a technical discussion (referent). All the more so since
terminology OPES services or on-the-fly translation assistance can be
provided.

>As negotiation and related issues represent a critical technical issue for
>the design of language tags (viz. keeping separate characteristics out of
>*language* tags), it is essential that such negotiation issues be considered
>carefully before specifying the format of tags. Unfortunately, that has not
>been done, and considering the published WG milestones it appears that that
>issue has not been taken into consideration. It should be pointed out that
>such issues have been raised, both in the discussion during Last Call of the
>individual submission and as a result of subsequent work. However, it
>appears that the WG has not considered the issues, with the effect that the
>WG product lacks the "particular care" expected of BCP documents (RFC 2026).

It should be noted that the ISO 639-4 work discusses guidelines in
that area. This work is under way and was not considered.

>Note that it is not the registration procedural issues that are typical of
>BCP documents that are problematic; rather it is the conflation of separate
>characteristics into a single tag syntax, specified in the same document,
>which raises problems related to content negotiation.
>
>Part of the problem is the scheduling of WG work items as noted above
>(viz. negotiation issues are critical to design of tag syntax, and should not
>have been deferred until after syntax specification). Another large part of
>the problem is WG management; in addition to the issues raised by John
>Klensin the last time that LTRU participation was discussed on the IETF
>discussion list -- and with which I wholeheartedly agree -- it appears that
>management of WG participant conduct has been rather lax; proponents of the
>individual submission effort who are participating in the WG tend to resort
>to ad-hominem attacks when a problem is identified or when an alternative
>approach is raised, with no visible intervention by the WG co-chairs. That
>has also (i.e. in addition to the factors which John identified) had the
>effect of limiting WG participation by individuals.

I will not object to that remark. The advantage was that proposing an
alternative approach led to an improvement of the ABNF intended to
block it. The result is a relatively clean default ABNF, which now
makes it possible to avoid confusion with specific solutions
introduced by a reserved singleton. This makes it possible:
- to support the Draft as a default proposition
- to easily specify other formats and conceptions (such as ones based
upon ISO 639-6, or ISO 11179 conformant ones, etc.) without risking conflicts.

>Specification of "language" tag syntax which conflates other content
>characteristics prior to open and professional discussion of negotiation
>issues and alternative approaches would be a premature lock-in of a design
>choice. As the document under discussion specifies a conflation of such
>characteristics without open discussion -- indeed hampered by unchecked
>unprofessional conduct -- it should not be approved as BCP in its current
>form. Separation of syntax specification to a separate document,

Yes!!!

>to be specified after due consideration of negotiation issues, leaving purely
>procedural issues of registration,

Yes!!! supporting multimodal competences (not only script, but also
signs, voice, icons, moods, style, etc.)

> would be one approach to enable making
>a decision on BCP registration procedures independently of and in advance of
>a concrete specification of negotiation issues and tag syntax. However,
>as it stands, the document cannot be evaluated for soundness of the tag
>syntax design in the absence of a specification that addresses negotiation
>issues (in a backwards-compatible manner with the existing negotiation
>mechanisms (viz. MIME Content- and Accept- fields and feature/filter
>negotiation).
>
>Therefore, at minimum, I recommend that the IESG defer a decision on the
>subject document until such time as the full impact of the early design
>choice to conflate multiple characteristics into a single tag can be fully
>evaluated w.r.t. proposed matching algorithms and impact on existing
>IETF-approved negotiation mechanisms.

By that time we should have running services. The ISO 639-6 authors just
announced that a sample table will be available in November, and the ISO
639-3 author expects it to be published by the end of the year. Then
we can start experimentation. Locking the multilingual Internet core
system into a final ABNF seems premature.

> Revision to move the syntax
>specification to a separate document, as mentioned above, would permit
>evaluation of the registration procedures per se independently of such
>concerns, and would be one way to move forward on those registration
>procedures quickly (i.e. independently of analysis of the syntax design)
>if that is deemed desirable.
>
>Aside from that, the IESG (via the cognizant ADs) should address the issues
>of WG charter work items and milestones as they relate to consideration of
>negotiation issues prior to locking down a tag syntax specification, should
>emphasize the importance of backwards compatibility with established,
>approved, and widely deployed IETF protocols and mechanisms,

and with documented efforts such as OPES, and should document the way
langtags will be used and their applications.
jfc
Bruce Lilly
2005-08-28 14:42:53 UTC
> Date: 2005-08-25 20:55
> From: "JFC (Jefsey) Morfin" <***@jefsey.com>

> On 00:40 26/08/2005, David Hopwood said:

> >This objection seems to be correct: URI tags include characters not
> >allowed by RFC 3066.
>
> Then? The purpose of this work is to address the limitations of RFC
> 3066. URI tags did not exist when RFC 3066 was written.

RFC 1738 certainly existed, not only at the time of RFC 3066, but its
predecessor RFC 1766 as well.

> please document how you do it while respecting the hybrid format of
> the proposed ABNF, where information is identified not only by fixed
> position but also by relative position and size, with "-" as the sole
> separator. And they want to keep labels between "-" at most 8
> characters long. Tell me how you support IDNs.
>
> Suppose that I have "lang-tags.org:" as a scheme,
> or "xn--abcdef.com:". Tell me how you support them.

It's unclear what you're trying to get at here. A URI scheme is a
protocol element (an "assigned number") registered by IANA, not a
piece of text (see RFCs 1958 and 2277). As such, it has no need of
an indication of language, for it has no language; it is a language-
independent protocol element. Confusing protocol element issues with
language will only muddy the water; try to stay focused on real
problems.

For that matter, DNS labels are public names (i.e. protocol elements,
again see RFC 1958 (sect. 4.3, noting that "text" there has a different
meaning than in RFC 2277)) and as such there should not have been any
reason to overload the semantics and baggage of internationalized
text (in the RFC 2277 sense). Now, having made the decision to
nevertheless do so, you might well point out that per RFC 2277, there
ought to be a means of indicating language in IDNs. However, that is
primarily an issue with the IDN specification(s), not with the document
under discussion (except to the extent that the document under
discussion extends the likely length of tags in a way that is likely to
conflict with the DNS label length and domain name length limits, *if*
there were in fact provision in IDN for the use of language tags). You
might also point out that as IDNs use utf-8 exclusively as a charset, and
as script is easily inferred from the Unicode code points corresponding
to utf-8, that the length-increasing provision for conflating script with
language would be unnecessary and redundant *if* IDNs had provision for
language tags. But IDNs have no such provision at this time.
JFC (Jefsey) Morfin
2005-08-28 23:14:09 UTC
At 16:42 28/08/2005, Bruce Lilly wrote:
> > Date: 2005-08-25 20:55
> > From: "JFC (Jefsey) Morfin" <***@jefsey.com>
> > please document how you do it while respecting the hybrid format of
> > the proposed ABNF, where information is identified not only by fixed
> > position but also by relative position and size, with "-" as the sole
> > separator. And they want to keep labels between "-" at most 8
> > characters long. Tell me how you support IDNs.
> >
> > Suppose that I have "lang-tags.org:" as a scheme,
> > or "xn--abcdef.com:". Tell me how you support them.
>
>It's unclear what you're trying to get at here. A URI scheme is a
>protocol element (an "assigned number") registered by IANA, not a
>piece of text (see RFCs 1958 and 2277).

One proposal I received from the WG-ltru was to slice URI tags into
labels of at most 8 alphanumeric characters, etc. Among many problems,
URI tags accept domain names, e-mail addresses and IP addresses as
registered identifiers. The main problem with the ABNF is therefore
the use of "-" as a separator, since "-" is a legitimate character in
domain names. Supporting IRI tags is impossible.
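The conflict with "-" as a separator can be checked mechanically. The Python sketch below encodes the RFC 3066 tag syntax (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT), joined by "-") as a regular expression. It shows that the URI tag example used later in this thread is rejected, and that splitting a domain name on "-" cuts through its labels. The helper name is hypothetical.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag):
    """True if the whole string matches the RFC 3066 Language-Tag syntax."""
    return RFC3066_TAG.fullmatch(tag) is not None


for candidate in ["en-US", "x-tags",
                  "tags:x-tags.org:constable.english.x-tag.org"]:
    print(candidate, is_rfc3066_tag(candidate))

# Splitting on "-" does not respect domain-name labels:
print("lang-tags.org".split("-"))   # ['lang', 'tags.org']
print("xn--abcdef.com".split("-"))  # ['xn', '', 'abcdef.com']
```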

Our work is on CRCs (common reference centers) offering such
references. URI tags are the correct solution (with some restrictions
we will probably document once the RFC is published).

>As such, it has no need of
>an indication of language, for it has no language; it is a language-
>independent protocol element. Confusing protocol element issues with
>language will only muddy the water; try to stay focused on real
>problems.
>
>For that matter, DNS labels are public names (i.e. protocol elements,
>again see RFC 1958 (sect. 4.3, noting that "text" there has a different
>meaning than in RFC 2277)) and as such there should not have been any
>reason to overload the semantics and baggage of internationalized
>text (in the RFC 2277 sense). Now, having made the decision to
>nevertheless do so, you might well point out that per RFC 2277, there
>ought to be a means of indicating language in IDNs. However, that is
>primarily an issue with the IDN specification(s), not with the document
>under discussion (except to the extent that the document under
>discussion extends the likely length of tags in a way that is likely to
>conflict with the DNS label length and domain name length limits, *if*
>there were in fact provision in IDN for the use of language tags).

Fully correct. The answer lies with the approved, pending URI Tags RFC.
Regarding IDN: as a "multilingualiser" I disapprove of IDNA. But whatever
the final solution, the MLDN charsets may use a langtag-like solution;
hence the interest. Another point of interest is that the IANA currently
uses RFC 3066 language tags to identify the IDN tables, which is IMHO an error.

>You
>might also point out that as IDNs use utf-8 exclusively as a charset, and
>as script is easily inferred from the Unicode code points corresponding
>to utf-8, that the length-increasing provision for conflating script with
>language would be unnecessary and redundant *if* IDNs had provision for
>language tags. But IDNs have no such provision at this time.

Correct. The MLDN problem is IMHO a different issue. However I say:

1. a langtag may be associated with a locale (this is in the WG-ltru
Charter [Unicode CLDR project and our own ISO 11179 related solution])
2. we think there should be DNS locales for some important sites and services
3. DNS locales could also be the proper place to distribute MLDN
virtual zone charsets (several concepts to discuss, specify and deploy).

jfc
Peter Constable
2005-08-29 00:50:14 UTC
> From: Bruce Lilly <***@erols.com>


> It's unclear what you're trying to get at here. A URI scheme is a
> protocol element (an "assigned number") registered by IANA, not a
> piece of text (see RFCs 1958 and 2277). As such, it has no need of
> an indication of language, for it has no language; it is a language-
> independent protocol element.

This point was made in response to Mr. Morfin on more than one occasion
within the LTRU WG. He appears to be unwilling to accept it, however.


> ought to be a means of indicating language in IDNs. However, that is
> primarily an issue with the IDN specification(s), not with the document
> under discussion (except to the extent that the document under
> discussion extends the likely length of tags

In comparison to RFC 3066, the draft does not extend the likely length
of tags. The likely length of tags is precisely the same as before; the
main difference is that this draft imposes significant structural
constraints on tags.
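To illustrate what such structural constraints mean in practice, the Python sketch below classifies the subtags of a tag purely by position and length: a language subtag first, then an optional 4-letter script, an optional 2-letter or 3-digit region, variants, and finally an "x" private-use sequence. This is a simplified model written for illustration, loosely following the structure the draft describes; it is not the draft's normative ABNF (extensions and other details are omitted, and this sketch accepts only 2-3 or 5-8 letter language subtags). The function name is hypothetical.

```python
import re

LANGUAGE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")
PRIVATE = re.compile(r"[A-Za-z0-9]{1,8}")


def classify_subtags(tag):
    """Label each subtag of `tag` by position and length (simplified model)."""
    subtags = tag.split("-")
    labels = []
    i = 0
    if subtags[0].lower() != "x":  # a tag may also consist of private use only
        if not LANGUAGE.fullmatch(subtags[0]):
            raise ValueError("invalid language subtag: %r" % subtags[0])
        labels.append(("language", subtags[0]))
        i = 1
        if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
            labels.append(("script", subtags[i]))
            i += 1
        if i < len(subtags) and REGION.fullmatch(subtags[i]):
            labels.append(("region", subtags[i]))
            i += 1
        while i < len(subtags) and VARIANT.fullmatch(subtags[i]):
            labels.append(("variant", subtags[i]))
            i += 1
    if i < len(subtags) and subtags[i].lower() == "x":
        rest = subtags[i + 1:]
        # Private-use subtags are limited to 1-8 alphanumerics, so "." or ":" fail.
        if not rest or not all(PRIVATE.fullmatch(s) for s in rest):
            raise ValueError("invalid private-use sequence in %r" % tag)
        labels.extend(("private-use", s) for s in rest)
        i = len(subtags)
    if i != len(subtags):
        raise ValueError("unexpected subtag %r in %r" % (subtags[i], tag))
    return labels


print(classify_subtags("az-Cyrl-AZ"))
print(classify_subtags("de-1996"))
```

Because the meaning of each subtag is inferred from where it sits and how long it is, anything that does not fit these shapes (such as a domain name) cannot be carried as a subtag.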



Peter Constable
JFC (Jefsey) Morfin
2005-08-29 01:46:39 UTC
At 02:50 29/08/2005, Peter Constable wrote:
> > From: Bruce Lilly <***@erols.com>
> > It's unclear what you're trying to get at here. A URI scheme is a
> > protocol element (an "assigned number") registered by IANA, not a
> > piece of text (see RFCs 1958 and 2277). As such, it has no need of
> > an indication of language, for it has no language; it is a language-
> > independent protocol element.
>
>This point was made in response to Mr. Morfin on more than one occasion
>within the LTRU WG. He appears to be unwilling to accept it, however.

Peter...
we all know you are better than that!

you just show that you do not know what URI Tags are. This is embarrassing
for you, since your whole problem is the conflict between your proposition
and this accepted RFC...

> > ought to be a means of indicating language in IDNs. However, that is
> > primarily an issue with the IDN specification(s), not with the document
> > under discussion (except to the extent that the document under
> > discussion extends the likely length of tags
>
>In comparison to RFC 3066, the draft does not extend the likely length
>of tags.

There is a confusion between two lengths: the length of the tag and
the length of the subtags. The whole issue is that Peter's colleagues
want to treat private and specialised tags as subtags, and impose on
them the size of a subtag instead of the size of a tag. This is where
the exclusion lies. This trick cannot last long, but if they get it
accepted for now, it gives them time to establish their positions.

This means that the legitimate URI tag:
"tags:x-tags.org:constable.english.x-tag.org"
must be accommodated into the format
"x-xxxxxxxx-xxxxxxxx-xxxxxxxx-etc." instead of
"0-x-tags.org:constable.english.x-tag.org"

This can only lead to a confusing deprecation of RFC 3066bis,
which will be replaced by a generalised IRI-tags solution. The
solution I propose consists only in accepting that what the Draft
would call "specialised subtags" have a size limited to the tag
length minus 2, and use the URI-tag charset.

NB. Since "x-" was used in RFC 3066, and it has been pointed out to me
that a specific private use (in a VPN, for example) could be needed, we
selected "0-" for an open format. We currently suggest "1-" for an
encrypted format. No work has yet been carried out on this.
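A minimal Python sketch of the reserved-singleton idea: any tag beginning with "0-" is routed to a separate URI-tag handler, and everything else must satisfy the default syntax (RFC 3066 is used here as the default for simplicity). Because an RFC 3066 primary subtag consists of letters only, no conforming tag can begin with "0-", so the two spaces cannot collide. The overall length limit and the function name are hypothetical placeholders: the thread does not fix a tag length, and the sketch does not validate the URI-tag character set.

```python
import re

# Default syntax: RFC 3066 Language-Tag (letters-only primary subtag).
DEFAULT_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")
OPEN_SINGLETON = "0-"   # proposed reserved singleton for URI tags
MAX_TAG_LENGTH = 64     # hypothetical overall limit, for illustration only


def route_tag(tag):
    """Return ("uri-tag", body) or ("default", tag); raise ValueError otherwise."""
    if tag.startswith(OPEN_SINGLETON):
        body = tag[len(OPEN_SINGLETON):]
        # The body may use up to the tag length minus the 2-character prefix.
        if not body or len(body) > MAX_TAG_LENGTH - len(OPEN_SINGLETON):
            raise ValueError("URI-tag body empty or too long: %r" % tag)
        return ("uri-tag", body)
    if DEFAULT_TAG.fullmatch(tag):
        return ("default", tag)
    raise ValueError("not a valid tag: %r" % tag)


print(route_tag("0-x-tags.org:constable.english.x-tag.org"))
print(route_tag("fr-CA"))
```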

>The likely length of tags is precisely the same as before; the
>main difference is that this draft imposes significant structural
>constraints on tags.

Absolutely. This is the area of contention.

Peter takes a loosely applied, chancy, non-exclusive proposition and
makes it the significantly constrained, exclusive rule of the Internet
instead of correcting it and following the ISO innovation (ISO 639-6
and ISO 11179) as directed by the Charter. This permits him to
exclude competing propositions following or preceding that innovation.

With the trick above, length-wise and character-wise a private tag is a subtag.
...and there is no explanation of how billions of machines will
learn about the daily updated version of his 600 KB file, without
anyone paying for it except me and the like.

jfc
Peter Constable
2005-08-29 02:41:30 UTC
> From: JFC (Jefsey) Morfin [mailto:***@jefsey.com]


> This means that the legitimate URI tag:
> "tags:x-tags.org:constable.english.x-tag.org"
> must be accommodated into the format
> "x-xxxxxxxx-xxxxxxxx-xxxxxxxx-etc." instead of
> "0-x-tags.org:constable.english.x-tag.org"

As I mentioned in another message, Mr. Morfin submitted a request to the
WG that the syntax in the draft be loosened to permit tags of the form
indicated, and that the consensus of everyone else in the WG was to
reject that request on the basis that (i) it would result in backward
incompatibility with existing processes designed to conform to RFC 3066,
and (ii) it was possible to create a scheme for semantically equivalent
tags without breaking compatibility with RFC 3066.


> Peter takes a loosely applied, chancy, non-exclusive proposition and
> makes it the significantly constrained, exclusive rule of the Internet
> instead of correcting it and following the ISO innovation (ISO 639-6
> and ISO 11179) as directed by the Charter. This permits him to
> exclude competing propositions following or preceding that innovation.

The LTRU charter makes no reference whatsoever to ISO 639-6 or to ISO
11179. As I have explained elsewhere, Mr. Morfin's suggestion that the
draft is incompatible with ISO 11179 while his alternative would be
conformant is far from valid. Finally, I have not excluded competing
propositions; I was one voice among many that rejected a request to
permit "." and ":" in the syntax, and to my recollection no other
concrete proposal wrt syntax, let alone an overall system of metadata
elements, was submitted by Mr. Morfin to the WG.



> With the trick above, length-wise and character-wise a private tag is
> a subtag. ...and there is no explanation of how billions of machines
> will learn about the daily updated version of his 600 KB file, without
> anyone paying for it except me and the like.

It is completely unclear on what basis Mr. Morfin is suggesting that
billions of machines will need to update "my" (?? I did not create it!)
600K file on a daily basis. There is no indication or likelihood that
the language subtag registry proposed by this draft will change with a
frequency approaching anything close to daily. Indeed, it is entirely
likely that it will change rather less frequently than the RFC 3066
registry was likely to change.



Peter Constable
JFC (Jefsey) Morfin
2005-08-29 08:19:17 UTC
At 04:41 29/08/2005, Peter Constable wrote:
> > From: JFC (Jefsey) Morfin [mailto:***@jefsey.com]
> > This means that the legitimate URI tag:
> > "tags:x-tags.org:constable.english.x-tag.org"
> > must be accommodated into the format
> > "x-xxxxxxxx-xxxxxxxx-xxxxxxxx-etc." instead of
> > "0-x-tags.org:constable.english.x-tag.org"
>
>As I mentioned in another message, Mr. Morfin submitted a request to the
>WG that the syntax in the draft be loosened to permit tags of the form
>indicated, and that the consensus of everyone else in the WG was to
>reject that request on the basis that (i) it would result in backward
>incompatibility with existing processes designed to conform to RFC 3066,

Dear Peter,
thank you for repeating your argument, so that everyone understands that
the principle of the draft is:
- to keep conformity with what never existed before (by nature, new
applications have never used RFC 3066)

>and (ii) it was possible to create a scheme for semantically equivalent
>tags without breaking compatibility with RFC 3066.

- repeating this ad nauseam while carefully avoiding explaining it does
not help. Just document it for the example above: since it
is "possible to create a scheme for semantically equivalent tags",
spell out the resulting tag.

> > Peter takes a loosely applied, chancy, non-exclusive proposition and
> > makes it the significantly constrained, exclusive rule of the Internet
> > instead of correcting it and following the ISO innovation (ISO 639-6
> > and ISO 11179) as directed by the Charter. This permits him to
> > exclude competing propositions following or preceding that innovation.
>
>The LTRU charter makes no reference whatsoever to ISO 639-6 or to ISO
>11179.

Errors and responses have been repeated so often that by now the entire
IETF must know by heart that the Charter says: "[the WG] is also expected
to provide mechanisms to support the evolution of the underlying ISO
standards". As it happens, you were just kind enough to document the
latest "evolution", on your way back from the yearly ISO meeting in
Warsaw. I quote your mail:

<quote>
I'm on my way home from the TC 37 meeting in Warsaw, and thought I'd
give a quick update on projects in the ISO 639 series.

ISO 639-3 is being advanced to its last stage, FDIS. If things stay
on schedule, the FDIS ballot will be over – and ISO 639-3 will be an
ISO standard -- before the end of the year.

Work is progressing on ISO 639-4 and ISO 639-5. These are not
expected to have much impact on RFC 3066 or its successors, though
(unless we change our minds about wanting to use additional IDs for
collections).

An updated working draft for ISO 639-6 was made available just
before the meeting. The leads for that project have been working
through the impact of adopting the ISO 11179 framework for that
project. The working group gave them the go ahead to reference ISO
12620, which (roughly) is the TC 37 counterpart to ISO 11179. They
may still need to reference directly to ISO 11179 for certain
purposes. The team for that project will be preparing a new working
draft this fall for review by the working group, with the aim of
getting it ready for a CD ballot – so potentially a CD ballot will
be distributed before the end of the year.

</quote>

>As I have explained elsewhere, Mr. Morfin's suggestion that the
>draft is incompatible with ISO 11179 while his alternative would be
>conformant is far from valid.

The whole IETF must also know by heart that I proposed making the Draft
ISO 11179 conformant, and that I was opposed by a unanimous "consensus"
you documented. But I know that when the same proposition is repeated,
you usually end up deciding the opposite. You have already documented
that the Draft was ISO 11179 compatible. I suppose you will soon say
it is ISO 11179 conformant.

No one will be happier than me. And, as I have already proposed to you
privately several times, I am fully ready to cooperate to make it a reality.

> Finally, I have not excluded competing
>propositions; I was one voice among many that rejected a request to
>permit "." and ":" in the syntax, and to my recollection no other
>concrete proposal wrt syntax, let alone an overall system of metadata
>elements, was submitted by Mr. Morfin to the WG.

The whole IETF must also know by heart that I proposed and wanted
support for URI tags, in line with the URI-tags RFC, and was denied it
(unfortunately the RFC Editor has not yet allocated that RFC's number).
And that I support global compatibility by having your ABNF as the
default, with the URI-tag area introduced by the "0-" reserved singleton.

> > With the trick above, length-wise and character-wise a private tag is
> > a subtag. ...and there is no explanation of how billions of machines
> > will learn about the daily updated version of his 600 KB file, without
> > anyone paying for it except me and the like.
>
>It is completely unclear on what basis Mr. Morfin is suggesting that
>billions of machines will need to update "my" (?? I did not create it!)

No offence meant in rendering unto Caesar what is Caesar's. It is a
long piece of work that I fully appreciate, with serious practical pros
and some cons. But it is a blockage over an exclusivity you cannot
obtain, and which is detrimental to every interest you defend. Ruling
the world on something is fun, but it only works if you serve, not if
you want to lead.

>600K file on a daily basis. There is no indication or likelihood that
>the language subtag registry proposed by this draft will change with a
>frequency approaching anything close to daily. Indeed, it is entirely
>likely that it will change rather less frequently than the RFC 3066
>registry was likely to change.

This file includes ISO tables and will have to follow the evolution of
those tables. From the input of Doug Ewell, its _initial_ version
with ISO 639-6 will include 40,000 lines. So a change a day would
amount to only about 1% of the entries changed, updated or added per
year, for a registry established to accept additions.

But, Peter, what you probably do not realise, since the WG has not
worked on the matter, is how such a system has to work. We all have
well-established experience in this area going back 22 years: the DNS
root. The DNS root is changed 60 times a year, yet it is updated twice
daily. Why? Because whenever you access a copy of it, through whatever
means and however many successive copies, you MUST know whether the
content is up to date.

So, even though it may be corrected far less than once a day (which I
think realistic, judging from other tables), its update date needs to
change every day, and one way or another every user must know it every
day (unless you still consider that there are "end-users" of lesser
concern who may be more poorly served). One can find solutions to
that (I proposed monthly updates), one can devise other mechanisms,
one can use the DNS, etc. But one must at least allude to the
solution the IANA is to adopt. One did say "I create the list of the
ccTLDs: up to the IANA to imagine the DNS".
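By analogy with the DNS root, one way for any copy of the registry to tell whether it is current is to stamp each copy with a date that is bumped even when no entry changes. The Python sketch below assumes a record-jar layout in which records are separated by "%%" lines and the first record carries a "File-Date:" field. Treat that layout, the function names and the one-day threshold as assumptions for illustration, not as a statement of what the IANA will adopt.

```python
from datetime import date


def registry_file_date(text):
    """Return the File-Date from the first record of a registry copy, or None."""
    first_record = text.split("\n%%", 1)[0]
    for line in first_record.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "File-Date":
            return date.fromisoformat(value.strip())
    return None


def is_stale(text, today, max_age_days=1):
    """A copy with no File-Date, or one older than allowed, is treated as stale."""
    file_date = registry_file_date(text)
    if file_date is None:
        return True
    return (today - file_date).days > max_age_days


sample = (
    "File-Date: 2005-08-29\n"
    "%%\n"
    "Type: language\n"
    "Subtag: fr\n"
    "Description: French\n"
)
print(registry_file_date(sample))          # 2005-08-29
print(is_stale(sample, date(2005, 9, 1)))  # True: three days old
```

The design point is that staleness is judged from the date stamp alone, so the stamp must be refreshed on schedule even when the content itself has not changed.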

Interesting.
jfc
Doug Ewell
2005-08-29 14:35:13 UTC
JFC (Jefsey) Morfin wrote:

> From the input of Doug Ewell, its _initial_ version with ISO 639-6 will
> include 40,000 lines.

I never said that. I would not make any statement about ISO 639-6 when
the data is not available.

I said that the deltas for the initial version with ISO 639-*3* would
add up to 35,700 lines -- and yes, that would put the grand total at
over 40,000 lines.

It's hard to follow this line of argument that a registry with 7,600
languages would be too large, and simultaneously that it should instead
support 20,000 languages, unless the object is to do away with the
registry altogether and have everyone invent new tags on the fly. What
good would such nonce tags be?

--
Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
Peter Constable
2005-08-29 03:00:38 UTC
> From: "JFC (Jefsey) Morfin" <***@jefsey.com>

> >XML, HTML, etc. are not IETF protocols and should not be the main
> >consideration in IETF work on IETF documents,
>
> They are specifically cited by the Charter. So is CLDR...

These are cited in the charter only as examples in a statement to the
effect that "the RFC 3066 standard for language tags has been widely
adopted in various protocols and text formats..."



> It should be noted that the ISO 639-4 work discusses guidelines in
> that area. This work is under way and was not considered.

Mr. Morfin appears to me to have no more than a very vague sense of the
scope of ISO 639-4.



Peter Constable
JFC (Jefsey) Morfin
2005-08-29 08:09:56 UTC
At 05:00 29/08/2005, Peter Constable wrote:
> > From: "JFC (Jefsey) Morfin" <***@jefsey.com>
>
> > >XML, HTML, etc. are not IETF protocols and should not be the main
> > >consideration in IETF work on IETF documents,
> >
> > They are specifically cited by the Charter. So is CLDR...
>
>These are cited in the charter only as examples in a statement to the
>effect that "the RFC 3066 standard for language tags has been widely
>adopted in various protocols and text formats..."

Yes, and?
BTW, is CLDR an IETF protocol? I tried to get assurance that there
would never be IPR attached to it, and never got it. Since it is a
way to introduce and stabilise a proprietary file in every Linux
system, I am interested in the license and in a guarantee that it will
never permit protected inclusions, all the more so since I would prefer
a community proposition. Maybe a good solution would be a structural alliance?

My concern is also the magnitude of the project, the interest allocated
to it and the volunteers, as well as the legal responsibility in case
of error (I am only raising questions from government officials).

Nevertheless, a lot of time has been spent on XML, and the only
compatibility that has been worked on is with XML libraries. I do
not oppose that (one of the authors publishes one), but I would
appreciate it if other protocols and processes, such as OPES, DNS,
computer languages (I was opposed when discussing Java), projects
(I documented CRC enough...), PPP, IANA, etc., had been considered.

> > It should be noted that the ISO 639-4 work discusses guidelines in
> > that area. This work is under way and was not considered.
>
>Mr. Morfin appears to me to have no more than a very vague sense of the
>scope of ISO 639-4.

This is somewhat amusing, as I am a contributor.
jfc
Doug Ewell
2005-08-29 14:39:36 UTC
JFC (Jefsey) Morfin <jefsey at jefsey dot com> wrote:

> BTW, is CLDR an IETF protocol? I tried to get assurance
> that there would never be IPR attached to it, and never got it.

I can't speak for anything having to do with ICU, but even if it were
IPR-protected, that does not mean all of the protocols and standards it
uses must themselves be IPR-protected.

--
Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
Mark Davis
2005-08-29 17:34:54 UTC
One must not confuse ICU and CLDR:

- CLDR is a project of the Unicode Consortium.
- ICU is an open-source project, sponsored by IBM, that *uses* CLDR data.

In both cases, however, data and code is freely available, with attribution.

Mark

(You know this, Doug, just pointing out for others.)

Doug Ewell wrote:
> JFC (Jefsey) Morfin <jefsey at jefsey dot com> wrote:
>
>
>>BTW, is CLDR an IETF protocol? I tried to get assurance
>>that there would never be IPR attached to it, and never got it.
>
>
> I can't speak for anything having to do with ICU, but even if it were
> IPR-protected, that does not mean all of the protocols and standards it
> uses must themselves be IPR-protected.
Peter Constable
2005-08-29 12:11:37 UTC
> From: Bruce Lilly <***@erols.com>


> > This is all what this proposition is about. This proposition is to give
> > _one_shot_ in a _standardised_ way the language, the script and the
> > country.
>
> This was discussed during Last Call of the previous non-IETF (individual
> submission) attempt. IIRC David Singer brought up several examples of
> other pieces of information (e.g. legal/copyright variations) that could
> also be negotiated and which might affect the presentation of content (or
> choice among alternative content). Lumping all of these separate items
> into one tag is a poor design as it impedes negotiation and tends toward
> lengthy tags which are incompatible with fixed-length mechanisms such as
> MIME encoded-words.

I agree that it would be poor design to incorporate other pieces of
information such as legal/copyright variations into language tags, but
as such pieces of information are not supported by the draft, this
appears to be irrelevant.

We should rather focus on whether it is good design to incorporate
information related to linguistic and written-form attributes, as
supported in the draft, into a single tag. The consensus of the LTRU
working group is that it is. For instance, the use of separate tags for
language and script was considered and rejected on the basis that the
two are not entirely orthogonal. Clear examples of this were considered:
while the intent of

Accept-Language: ar, az-Cyrl, ru

is clear, the intent of

Accept-Language: ar, az, ru
Accept-Script: Cyrl

or of

Accept-Language: ar, az, ru
Accept-Script: Arab, Cyrl

is not clear, nor is it obvious how rules could be specified that would
make the intent clear, or that would permit expressing the preferences
reflected in the first instance.
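Peter's point can be made concrete with a minimal Python sketch. It assumes one hypothetical, naive way a server might combine separate Accept-Language and Accept-Script lists (pair every listed language with every listed script); neither that rule nor the names below come from the draft or from HTTP.

```python
from itertools import product

# Combined form: each preference pairs a language with an optional script,
# as written in "Accept-Language: ar, az-Cyrl, ru".
combined = ["ar", "az-Cyrl", "ru"]

# Separate-header form: two independent lists (Accept-Script is hypothetical).
languages = ["ar", "az", "ru"]
scripts = ["Arab", "Cyrl"]

def naive_cross_product(langs, scripts):
    """Hypothetical naive reading: every listed language with every listed script."""
    return [f"{lang}-{script}" for lang, script in product(langs, scripts)]

expanded = naive_cross_product(languages, scripts)
print(expanded)
# ['ar-Arab', 'ar-Cyrl', 'az-Arab', 'az-Cyrl', 'ru-Arab', 'ru-Cyrl']
```

Read that way, the two separate headers yield six candidate tags, including combinations such as ru-Arab that the user probably never meant, and the fact that Cyrl applied only to az is lost. The single combined list keeps that pairing in three entries.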

It was also the consensus of the WG that the concerns of fixed-length
mechanisms have been adequately addressed. This consensus was taken
after careful consideration of IETF protocols known to involve length
limitations. It should be noted in this regard that the likely length of
language tags under this draft is no different than under RFC 3066; the
only difference is that this draft imposes greater constraints on the
form and meaning of subtag elements.


> While there is some mention of this issue in the document
> under discussion, its treatment and resolving the underlying issue in a
> manner that would minimize the problems are lacking.

It's unclear what is meant by "the underlying issue". Please clarify.



> Tagging identifies characteristics of a particular piece of content. For
> that purpose alone, it makes little difference (other than regarding the
> aforementioned compatibility issues with existing IETF mechanisms) whether
> the characteristics are lumped or separate.

On the contrary, it makes little difference only if the characteristics
in question are completely orthogonal. As pointed out above, the
characteristics of linguistic variety and written form are not
orthogonal, particularly when it comes to expressing user preferences,
and it *does* make a difference whether they are split into separate
metadata attributes or lumped together into a single metadata
attribute.



> While that may be used to infer something about the content
> provider, such inferences may be unreliable...

Quite so. This point was discussed in the WG.


> Negotiation of characteristics is where several issues arise...

> As a result of issues
> with that approach, the LTRU WG was established with a charter to produce
> a BCP (for registration procedures) and a separate Standards Track document
> for topics such as algorithms which are unsuitable for BCP.

The LTRU WG is a little behind its initially-proposed schedule for
milestones, but otherwise is on track to complete the approved
milestones in order. Thus, the latter document is in progress.


> The proposed
> mechanism in the individual submission of late last year (essentially
> unchanged in the LTRU product (see discussion below)) does not address the
> language range issue, and that issue is greatly complicated by conflating
> separate characteristics into a single tag.

It is unclear how the drafts in question can be critiqued for failing to
address "the language range issue" (which issue is not clearly
identified here, btw) given the explicit plan in the charter that
algorithms for matching be addressed in a separate document to be
completed after these drafts.


> Addressing the language range
> issue is not a WG work item and, unfortunately, the algorithm issue is
> scheduled to be a later work item than the registry issue. Added to that
> is the fact that the specification of the tag format is mixed with
> registration procedures.

This is in accordance with the charter.


> Negotiation of separate characteristics is much
> simpler than that of a combined conflation of characteristics; each
> characteristic can be assigned separate preference values, and irrelevant
> characteristics (e.g. script w.r.t. spoken language) can be easily ignored.

Negotiation of separate attributes involving inter-related
characteristics is *not* simpler, as pointed out above. The draft fully
allows for irrelevant characteristics (e.g. script wrt audio content) to
be ignored. Again, what has been provided in the draft is in accordance
with the charter of the WG.


> As negotiation and related issues represent a critical technical issue for
> the design of language tags (viz. keeping separate characteristics out of
> *language* tags), it is essential that such negotiation issues be
> considered carefully before specifying the format of tags. Unfortunately,
> that has not been done, and considering the published WG milestones it
> appears that that issue has not been taken into consideration... However,
> it appears that the WG has not considered the issues, with the effect that
> the WG product lacks the "particular care" expected of BCP documents (RFC
> 2026).

It is unclear on what basis it is asserted that these issues have not
been considered by the WG. I believe most of the WG members would feel
that they have been reasonably taken into consideration. Again, what has
been submitted for last call is in accordance with the charter; just as
it is not reliable to infer something about a content provider from a
language tag, so also it is not reliable to infer from the order of
milestones in the charter that matching issues were not taken into
consideration in preparation of these drafts.



> Note that it is not the registration procedural issues that are typical of
> BCP documents that are problematic; rather it is the conflation of separate
> characteristics into a single tag syntax, specified in the same document,
> which raises problems related to content negotiation.

Bruce asserts (a) that there is conflation of separate characteristics,
and (b) that this creates problems in content negotiation. The WG
determined that the characteristics conflated into a single tag are not
independent, and that it would be *separation* into separate attributes
that would result in problems in content negotiation, not their
combination into a single attribute.



> Another large part of
> the problem is WG management; in addition to the issues raised by John
> Klensin the last time that LTRU participation was discussed on the IETF
> discussion list -- and with which I wholeheartedly agree -- it appears that
> management of WG participant conduct has been rather lax; proponents of the
> individual submission effort who are participating in the WG tend to resort
> to ad-hominem attacks when a problem is identified or when an alternative
> approach is raised, with no visible intervention by the WG co-chairs. That
> has also (i.e. in addition to the factors which John identified) had the
> effect of limiting WG participation by individuals.

It's unclear what bearing this has on what improvements can be made to
the drafts in fulfillment of the WG charter. I believe several WG
participants felt that management of conduct was lax, particularly in
relation to a very small number of participants with a penchant for
certain behaviours that would have challenged the best of moderators.

As for the accusation that proponents of an earlier individual
submission engaged in ad-hominem attacks that went without intervention
by the WG co-chairs, resulting in the limitation of participation in the
WG by other individuals, in the absence of specific evidence, this
appears itself to be no more than an ad-hominem attack on those
individuals and on the WG co-chairs. To my knowledge, there was only one
individual in relation to whom other members of the WG acted in any way
that might discourage or hinder his participation, and such actions
arose only in response to repeated provocation from that individual.



> Specification of "language" tag syntax which conflates other content
> characteristics prior to open and professional discussion of negotiation
> issues and alternative approaches would be a premature lock-in of a design
> choice. As the document under discussion specifies a conflation of such
> characteristics without open discussion

It is asserted that there has been no open discussion of the matter of
conflation. This is untrue. It is asserted that there has been no open
discussion of alternatives; the only concrete alternative presented for
discussion was to have separate language and script tags, which
alternative was considered and rejected due to problems that arise in
content negotiation. The drafts submitted for review are in accordance
with the charter, and I believe I can say that in the opinion of WG
members matters of conflation and of negotiation issues were taken into
consideration, and were discussed in an open and professional manner.



Peter Constable
JFC (Jefsey) Morfin
2005-08-29 16:11:00 UTC
Dear all,
at this stage I think it is clear that the langtags issue represents
a strong opposition between two visions of the Multilingual Internet.
These visions for the worse or the better are embodied by Peter
Constable's friends and me.


There is an affinity group, gathered by circumstance or by talent, that
supports Peter's approach. Its core happens to be formed by native
English speakers employed by large corporations or interests (from its
history, it seems to have formed in the course of international
meetings). A few members are included through personal dedication or as
consultants. There are no academic researchers, no publicly funded
contributing projects, no sponsoring cultural organisations. The
members of this affinity group share a common culture, based upon
different levels of technical involvement of the structures and
individuals involved. There is no R&D involved in the network area
that is not sponsored by commercial interests, in both the negative
and positive senses of RFC 3869. In that sense it can be said to be a
US-industry-led group. This, at least, is how the non-US interests,
organisations and government officials I have discussed it with
identify them, without exception. True or not, this is the perception.
It should be related to the definition of an IETF affinity group in
RFC 3774.

This group proposes tagging all the languages of the world, which it
perceives as a commodity (a well-known trait of native English
speakers, who share their own language with other people round the
world). This approach certainly suits e-commerce, basic
interoperability and the library classification of foreign books. The
idea is that a standard and a central registry will constrain the
world to follow a common, useful rule, if it cannot continue using
ASCII English. This is called "internationalisation". This unilateral
standardisation is seen as the only guarantee of the stability and
uniqueness of the network. Being unique for the entire world, this
tagging must be simple and based upon simple information. This
information is made of three elements that commerce needs for
practical reasons: the written language, the script being used, and
the applicable law.

This vision A addresses specific, urgent needs of the printing and
library industries: to reduce costs in order to face competition from
other media and from the printing capacity of every user (a problem
less documented than, but as important as, the music industry's
problem, with a larger financial turnover). Worldwide concentration
and specialisation can be expected from a unique normative system,
with all the reluctance one can expect and the strategies one may
imagine.


There is a network of relations I have woven among people engaged in
network research, operations management, cultural life, government
administration, international entities, language-oriented interests
and activities, and local industry, from various parts of the world,
in particular through an Internet test-bed named dot-root (responding
to the ICANN ICP-3 call), a long involvement in @large and ccTLDs,
and a national Internet community and government think tank I
started one year ago, which has developed unexpectedly. The strength
of this relational group is that no money is involved, which
guarantees its independence. But this is also its weakness, as it
leaves it no alternative other than to rely on volunteers to represent
it (often only one, when the task is as demanding as this one), or to
call on the personal involvement of concerned people, with the risk of
overwhelming the Internet standards process with scores of irritated
newcomers. The common culture of this group is common-sense support
for a user-centric multilingual architecture and strong, sustainable
innovation.

This group sees no need to tag languages, but a need to document
relations, which use languages among other things, but also many
other parameters. It thinks that every human being, machine and
service is specific and different from the others, and that surety,
security, stability and innovation capacity are based upon the best
seamless support of these differences, for a strong unity of the
network. It has found through experimentation that generalised
computing and pervasive networking support a realistic, commercially
rewarding and humanly exciting set of possibilities. These concern
relations, culture, and the economic, social and political development
that everyone, every economy and every country may share in, on an
equal-opportunity basis. It also sees a global convergence of R&D,
civil society, economic and political spheres in that direction (for
example at WSIS, but also at the IETF), expressed in various ways,
one being conceptual information networking (ISO 11179 R&D) and
another a fluid referencing system (URI tags), which open new
possibilities, especially when added to physical and service
networking.

This vision B calls for an open description system/language for
languages, and for many other relational parameters. Obviously it is
still in its infancy, as everything started in the early 80s was
delayed by the subsequent OSI and then Internet visions, and by
hardware and bandwidth limitations and costs. It is only resuming now.

Vision A has difficulty (and a lack of competence, which Peter helped
document yesterday) in understanding vision B. And, as usual in such
cases, it fights the messenger. No big deal: the messenger is used to it.

Vision B has no problem in accepting vision A as a "default" for
those wanting it. However, vision A is centralised and vision B is
distributed. So vision A thinks it needs to be unique to exist and
fulfil its purpose. This is why Vision B proposed several things:

- to define an exclusive area of application for Vision A. This was
done from the second Last Call onward, by proposing that the authors
add wording stating that the area of application was the areas
already covered by RFC 3066 and documented further on.
- to protect Vision A from confusion. This was done by pushing the
authors toward a very strict ABNF that avoids tag creep.
- there may be other propositions to study. These are, however, not
easy to uncover, as Vision A has difficulty with the architectural
evolutions (network, content, relational elements) that all this
technically implies.

As I explained, there are three scenarios:

1. Vision A is denied by the IESG. Progressively, vision B imposes
itself through new RFCs or through a grassroots (international) process.
The current basic needs are not properly addressed. The credibility of
the IETF is at stake, as with spam, IDNA, etc. This causes delays.

2. Vision B is denied by the IESG. But vision B is already accepted
through the URI-tags RFC. It will develop in opposition to Vision A.
This will cost everyone money and delays; the Multilingual Internet
will move outside of the IETF or balkanise.

3. Vision B is included in Vision A as community private use. This
scheme is simple to understand, and can be included in the RFC 3066 Bis
document in two lines. It does not break any of its principles.

- the document is unchanged and addresses the general need,
whatever it may be.
- "x-" is unchanged. Its role is to support private use schemes,
within private spaces.
- "0-" is added from the reserved singleton pool. Its role is to
support community private use schemes, that is, cases where a user
community wants to document languages its own way. The need is to
support two pieces of information in a non-conflicting way:
- the community scheme identification
- the identification within that scheme.
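The "0-" proposal can be probed against the syntax the draft already defines. The Python sketch below is hypothetical throughout: it assumes that, as in the draft's ABNF, a digit is accepted as a singleton and that every subtag after a singleton must be 2 to 8 ASCII letters or digits. The names community_subtags, mycomm and lang42 are illustrative only. Under those assumptions, a scheme identifier and an identifier within that scheme fit if each is expressed as such a short subtag, whereas a URI tag such as tag:example.com,2005:fr could not be carried verbatim, since ':', '.' and ',' are not allowed in subtags.

```python
import re

# Assumption: each subtag after a singleton is 2 to 8 ASCII letters or digits.
EXT_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")

def community_subtags(tag, singleton="0"):
    """Hypothetical helper: subtags following `singleton`, or None if absent.

    Scanning gives up at a private-use "x"; collection stops at the next
    single-character subtag (another singleton or "x").
    """
    parts = tag.split("-")
    for i, part in enumerate(parts):
        if part.lower() == "x":
            return None  # private-use section reached before the singleton
        if part.lower() == singleton:
            collected = []
            for sub in parts[i + 1:]:
                if len(sub) == 1:
                    break  # next singleton or private use begins
                collected.append(sub)
            return collected
    return None

def is_well_formed_extension(subtags):
    """True if there is at least one subtag and each fits 2*8alphanum."""
    return bool(subtags) and all(EXT_SUBTAG.fullmatch(s) for s in subtags)

print(community_subtags("fr-Latn-0-mycomm-lang42"))
# ['mycomm', 'lang42']
print(is_well_formed_extension(community_subtags("fr-0-tag:example.com,2005:fr")))
# False
```

This only tests syntax; whether such a singleton could be assigned to community use is exactly the policy question under discussion.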

I think this respect all the requirements of Vision A and permits a
full developement of Vision B. There are two possibilities to support
the "0-" space: either to develop a new system or to use an existing system.

I have no particular opinion except that the solution MUST be
decentralised (community centralised). I started out thinking we had to
develop a new one, while waiting for the review of the WG-ltru charter,
both to make sure the proposition would fully respect Vision A and to
learn the Vision B points we might have overlooked (there probably are
many). This created problems for the WG, which only wanted to block
Vision B, which it still does not understand or opposes.

Then we found the not-yet-numbered URI-tag RFC. It seems to address
all the needs, and more, except multilingualisation. My intent is
therefore to document an IRI-tag along the URI-tag lines when this
debate has stabilised and the URI-tag RFC has been published. I have
no problem working on it within the WG-ltru.

What next? Vision A alone is harmful to all. If it were accepted, it
would be appealed: to the IETF Chair, on grounds of common
architectural sense; to the IESG, for lack of compatibility with the
Charter and other RFCs; to the IAB if necessary, to obtain guidance on
the implementation of the Multilingual Internet. Then appeals would
continue in the outside world. The target is not to oppose Vision A.
On the contrary, it is to make sure it is viable. As the only solution
permitted, it will NOT survive, because it cannot withstand everything
one can expect people to do with it outside of any control. We had a
very similar case with IDNA: the only response to homograph phishing
was "we discussed it"....

I will document a few of these points in responding to Peter's last mail.

At 14:11 29/08/2005, Peter Constable wrote:
> > From: Bruce Lilly <***@erols.com>
> > > This
> > > is all what this proposition is about. This proposition is to give
> > > _one_shot_ in a _standardised_ way the language, the script and the
> > > country.
> >
> > This was discussed during Last Call of the previous non-IETF (individual
> > submission) attempt. IIRC David Singer brought up several examples of
> > other pieces of information (e.g. legal/copyright variations) that could
> > also be negotiated and which might affect the presentation of content (or
> > choice among alternative content). Lumping all of these separate items
> > into one tag is a poor design as it impedes negotiation and tends toward
> > lengthy tags which are incompatible with fixed-length mechanisms such as
> > MIME encoded-words.
>
>I agree that it would be poor design to incorporate other pieces of
>information such as legal/copyright variations into language tags, but
>as such pieces of information are not supported by the draft, this
>appears to be irrelevant.

This is inexact. There is no problem in having the Draft-compliant tag:

fr-Latn-fr-gayssot

to indicate a French-language text fully respecting the "Loi
Gayssot", the anti-racist law used against Yahoo. There is no
guarantee that an ISP or French law will not filter out pages from
suspect sites not bearing that tag, transferring the host's legal
responsibilities to the author.
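The claim that fr-Latn-fr-gayssot is syntactically acceptable can be checked with a rough Python sketch. It assumes a simplified reading of the draft's langtag ABNF (the grammar later published as RFC 4646), omits extlang and grandfathered tags, and tests well-formedness only: whether a variant such as "gayssot" would ever be registered is a separate question that a regex cannot answer. The helper name parse_langtag is hypothetical.

```python
import re

# Simplified subset of the draft's langtag ABNF (assumption: extlang and
# grandfathered tags omitted). Matching is done on a lowercased tag,
# since subtags are case-insensitive.
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)"
    r"(?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?$"
)

def parse_langtag(tag):
    """Hypothetical helper: split a tag into subtag fields, or return None."""
    match = LANGTAG.match(tag.lower())
    if match is None:
        return None
    fields = match.groupdict()
    # Variants are captured as one string such as "-gayssot"; split them out.
    fields["variants"] = [v for v in (fields["variants"] or "").split("-") if v]
    return fields

print(parse_langtag("fr-Latn-fr-gayssot"))
# {'language': 'fr', 'script': 'latn', 'region': 'fr', 'variants': ['gayssot'], 'extensions': '', 'privateuse': None}
```

Because the input is lowercased before matching, the script and region come back as latn and fr; restoring the conventional casing (Latn, FR) would be a separate step.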

The problem with believing that one can rule the world is that the
world may not accept being ruled.

>We should rather focus on whether it is good design to incorporate
>information related to linguistic and written-form attributes, as
>supported in the draft, into a single tag. The consensus of the LTRU
>working group is that it is.

Let me phrase it more exactly: the affinity group which formed the
WG was gathered around that idea.

1. basic written-mode attributes should not be specific to the
description of a language ... while, in addition, most languages are oral.
2. in what manner is the country code related to a specific piece of
information? Nowhere in the Draft is this attribute documented: is it
the location where the text was written, the location of the
author's lingual community, or that of the reader's lingual
community??? Where is that location definition documented, so that both
sides of the relation can understand each other when negotiating?

> For instance, the use of separate tags for
>language and script were considered and rejected

This was not considered and rejected. It was a predefined article of
faith, and every question on it was defeated.

The problem is that it is meaningless and conflicts with the charset!!!
Until you associate a "script" with a charset, a script has no meaning ....

I asked the simple question: "does fr-Latn-FR mean that Latn permits
me to write French properly?" To know that, I need to know which
characters are associated with "Latn". No response. Same question on
the Unicode list. Members whose mother tongue is not French said "yes"
(but no one was able to demonstrate it). French mother-tongue experts
said "no", and explained that Unicode lacks a particular space needed
to type typical French sentences properly, and one accented character.
This was then disputed. My concern, as a user and as a network
standardiser, is not to have to deal with these details. I need
certainties and guarantees the Draft does not provide.

>on the basis that the two are not entirely orthogonal. Clear
>examples of this was considered:
>while the intent of
>
>Accept-Language: ar, az-Cyrl, ru
>
>is clear, the intent of
>
>Accept-Language: ar, az, ru
>Accept-Script: Cyrl
>
>or of
>
>Accept-Language: ar, az, ru
>Accept-Script: Arab, Cyrl
>
>is not clear, nor is it obvious how rules could be specified that would
>make the intent clear, or that would permit expressing the preferences
>reflected in the first instance.

This kind of example is absurd. There is no more information, and
more confusion, with the proposed system if a page or part of a
document is also assigned different, conflicting langtags ...

> > Tagging identifies characteristics of a particular piece of content. For
> > that purpose alone, it makes little difference (other than regarding the
> > aforementioned compatibility issues with existing IETF mechanisms) whether
> > the characteristics are lumped or separate.
>
>On the contrary, it makes little difference only if the characteristics
>in question are completely orthogonal. As pointed out above, the
>characteristics of linguistic variety and written form are not
>orthogonal, particularly when it comes to expressing user preferences,
>and that it *does* make a difference if they are split into separate
>metadata attributes or they are lumped together into a single metadata
>attribute.

Explain.

I will go along with you; however, you have not defined what a script
is. The author is a Russian, sitting in NY, writing a page in Ukrainian
and wanting the text to be repeated in Latn and Cyrl scripts so that
everyone there is able to read it. A very common situation.

Please document the langtags precisely, and show what is not
orthogonal in them.

> > While that may be used to infer something about the content
> > provider, such inferences may be unreliable...
>
>Quite so. This point was discussed in the WG.

The question is whether the solution is acceptable. This LC is the
Last Call of the document, not of the WG or of me.

> > Negotiation of separate characteristics is much
> > simpler than that of a combined conflation of characteristics; each
> > characteristic can be assigned separate preference values, and irrelevant
> > characteristics (e.g. script w.r.t. spoken language) can be easily ignored.
>
>Negotiation of separate attributes involving inter-related
>characteristics is *not* simpler, as pointed out above. The draft fully
>allows for irrelevant characteristics (e.g. script wrt audio content) to
>be ignored. Again, what has been provided in the draft is in accordance
>with the charter of the WG.

The Charter speaks of languages. You made clear the Draft was oriented
toward language, not written language. I am glad to learn that the mode
is an irrelevant characteristic.

Most languages are oral. Their rendering in written form is therefore
an important piece of information ...

> > As negotiation and related issues represent a critical technical issue for
> > the design of language tags (viz. keeping separate characteristics out of
> > *language* tags), it is essential that such negotiation issues be
> > considered carefully before specifying the format of tags. Unfortunately,
> > that has not been done, and considering the published WG milestones it
> > appears that that issue has not been taken into consideration... However,
> > it appears that the WG has not considered the issues, with the effect that
> > the WG product lacks the "particular care" expected of BCP documents (RFC
> > 2026).
>
>It is unclear on what basis it is asserted that these issues have not
>been considered by the WG. I believe most of the WG members would feel
>that they have been reasonably taken into consideration.

I agree with that. But the question is where the related decisions
were taken. I would then tend to agree fully with Bruce.

> > Note that it is not the registration procedural issues that are typical of
> > BCP documents that are problematic; rather it is the conflation of separate
> > characteristics into a single tag syntax, specified in the same document,
> > which raises problems related to content negotiation.
>
>Bruce asserts (a) that there is conflation of separate characteristics,
>and that (b) this creates problems in content negotiation. The WG
>determined that the characteristics conflated into a single tag are not
>independent, and that it would be *separation* into separate attributes
>that would result in problems in content negotiation, not their
>combination into a single attribute.

Governmental authority over content is not orthogonal to language in
some parts of the world. The question is whether this is to be
addressed as a general or a specific issue.

> > Another large part of
> > the problem is WG management; in addition to the issues raised by John
> > Klensin the last time that LTRU participation was discussed on the IETF
> > discussion list -- and with which I wholeheartedly agree -- it appears that
> > management of WG participant conduct has been rather lax; proponents of the
> > individual submission effort who are participating in the WG tend to resort
> > to ad-hominem attacks when a problem is identified or when an alternative
> > approach is raised, with no visible intervention by the WG co-chairs. That
> > has also (i.e. in addition to the factors which John identified) had the
> > effect of limiting WG participation by individuals.
>
>It's unclear what bearing this has on what improvements can be made to
>the drafts in fulfillment of the WG charter. I believe several WG
>participants felt that management of conduct was lax, particularly in
>relation to a very small number of participants with a penchant for
>certain behaviours that would have challenged the best of moderators.

I suffered most of that: various innuendos about my age and my need
for English lessons, contempt for my colleagues as "end users" vs.
"IETF members" and "developers", "physical allusions to my possible
broken nose", anonymous phone calls, loss of clients due to abusive
mails they read under partners' corporate names, accusations of
ignorance by ... documented ignoramuses, rumours, etc.

I agree that one of the moderators actively engaged in that process.
But these are the risks of opposing big interests. When it went too
far, I appealed to the AD. The problem was corrected in minutes. The
AD decided to pursue the appeal and ruled in a way that was good for
the stability of the WG. It is true that from then on, insults against
me no longer resulted in banning, warning or insulting me.

We are all big boys. I have been in this kind of business for nearly
30 years. I have seen worse :-) (but usually more competent). I had no
problem inviting all my opponents to have a drink in Paris (but none
came to the IETF meeting, or told me). It would have been nice.

>As for the accusation that proponents of an earlier individual
>submission engaged in ad-hominem attacks that went without intervention
>by the WG co-chairs, resulting in the limitation of participation in the
>WG by other individuals, in the absence of specific evidence,

Please refer to the mailing list. However, this is not a Last Call on
the WG management, but a Last Call on the Document. The reasons why
the document is incomplete should not be discussed so much; just what
is missing or needs correcting.

But it is true that several have been put off by the attitude of the
authors. I would say that this was assessed very early, and that the
debate is better served when people overcome it. One judges a tree by
its fruits. The deliverable is not perfect: this is what matters today.

> this
>appears itself to be no more than an ad-hominem attack on those
>individuals and on the WG co-chairs. To my knowledge, there was only one
>individual in relation to whom other members of the WG acted in any way
>that might discourage or hinder his participation,

Two disclosed, two implied. This is mostly because I agreed to
represent others. But what would have been the use of making the WG a
battlefield? That is what the author wanted, so that the "best" would
"win". This is not my vision of the IETF.

>and such actions
>arose only in response to repeated provocation from that individual

The archives are here.

> > Specification of "language" tag syntax which conflates other content
> > characteristics prior to open and professional discussion of negotiation
> > issues and alternative approaches would be a premature lock-in of a design
> > choice. As the document under discussion specifies a conflation of such
> > characteristics without open discussion
>
>It is asserted that there has been no open discussion of the matter of
>conflation. This is untrue. It is asserted that there has been no open
>discussion of alternatives; the only concrete alternative presented for
>discussion was to have separate language and script tags, which
>alternative was considered and rejected due to problems that arise in
>content negotiation. The drafts submitted for review are in accordance
>with the charter, and I believe I can say that in the opinion of WG
>members matters of conflation and of negotiation issues were taken into
>consideration, and were discussed in an open and professional manner.

Total disagreement on the outcome so far. But I hope we can overcome
that with the help of the IETF/IESG.

A lot of things have already changed in what some say ....
jfc
Brian E Carpenter
2005-08-30 09:30:05 UTC
JFC (Jefsey) Morfin wrote:
> Dear all,
> at this stage I think it is clear that the langtags issue represents a
> strong opposition between two visions of the Multilingual Internet.
> These visions for the worse or the better are embodied by Peter
> Constable's friends and me.

I know nothing of Peter Constable's friends. I do know that the LTRU
WG Chairs have declared WG consensus on this draft, and the best
interpretation I can make of your sentence above is that you are
a single dissenter from that consensus. Thank you for pointing that out
as a Last Call comment, but there is little point in repeating it,
as the information does not gain in content by repetition.

IETF Last Calls are intended "to permit a final review by the
general Internet community" rather than to re-run the WG's discussion.

Brian
Ned Freed
2005-08-29 16:12:33 UTC
> > From: Bruce Lilly <***@erols.com>


> > > This is all what this proposition is about. This proposition is to give
> > > _one_shot_ in a _standardised_ way the language, the script and the
> > > country.
> >
> > This was discussed during Last Call of the previous non-IETF (individual
> > submission) attempt. IIRC David Singer brought up several examples of
> > other pieces of information (e.g. legal/copyright variations) that could
> > also be negotiated and which might affect the presentation of content (or
> > choice among alternative content). Lumping all of these separate items
> > into one tag is a poor design as it impedes negotiation and tends toward
> > lengthy tags which are incompatible with fixed-length mechanisms such as
> > MIME encoded-words.

> I agree that it would be poor design to incorporate other pieces of
> information such as legal/copyright variations into language tags, but
> as such pieces of information are not supported by the draft, this
> appears to be irrelevant.

I agree with both points.

> We should rather focus on whether it is good design to incorporate
> information related to linguistic and written-form attributes, as
> supported in the draft, into a single tag. The consensus of the LTRU
> working group is that it is. For instance, the use of separate tags for
> language and script were considered and rejected on the basis that the
> two are not entirely orthogonal. Clear examples of this were considered:
> while the intent of

> Accept-Language: ar, az-Cyrl, ru

> is clear, the intent of

> Accept-Language: ar, az, ru
> Accept-Script: Cyrl

> or of

> Accept-Language: ar, az, ru
> Accept-Script: Arab, Cyrl

> is not clear, nor is it obvious how rules could be specified that would
> make the intent clear, or that would permit expressing the preferences
> reflected in the first instance.

This is such an important point that it deserves to be called out, lest it
be lost in the flurry of messages on this topic.

Designs that tag, say, script and language separately appear at first glance
to be more flexible and general. But appearances can be deceiving. The
problem is that using separate labels does not provide an easy way of
linking the two, and being able to express these linkages is vital.
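To make the linkage point concrete, here is a minimal Python sketch. It assumes a simplified Accept-Language parser that ignores extlang subtags and the "*" range. It also uses the hypothetical `Accept-Script` header, which exists only in the example quoted above. The function names are illustrative, not from any specification.

```python
def parse_accept_language(value):
    """Parse a simplified Accept-Language value into (language, script, q).

    Only a four-letter alphabetic subtag directly after the primary language
    is treated as a script; extlang subtags and "*" are not handled.
    """
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        subtags = parts[0].split("-")
        script = None
        if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
            script = subtags[1].title()
        prefs.append((subtags[0].lower(), script, q))
    return prefs


def pair_separate_headers(languages, scripts):
    """Naively combine a hypothetical Accept-Script with Accept-Language.

    With no linkage rule, every language is paired with every script.
    """
    langs = [t.strip().lower() for t in languages.split(",")]
    scrs = [s.strip().title() for s in scripts.split(",")]
    return [(lang, scr) for lang in langs for scr in scrs]


print(parse_accept_language("ar, az-Cyrl, ru"))
# [('ar', None, 1.0), ('az', 'Cyrl', 1.0), ('ru', None, 1.0)]
print(pair_separate_headers("ar, az, ru", "Cyrl"))
# [('ar', 'Cyrl'), ('az', 'Cyrl'), ('ru', 'Cyrl')]
```

The combined tag keeps Cyrillic attached to Azerbaijani only. The separate headers, read without a linkage rule, also ask for Arabic and Russian "in Cyrillic" and give no way to say which language the script preference was meant for.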

> > Tagging identifies characteristics of a particular piece of content. For
> > that purpose alone, it makes little difference (other than regarding the
> > aforementioned compatibility issues with existing IETF mechanisms) whether
> > the characteristics are lumped or separate.

> On the contrary, it makes little difference only if the characteristics
> in question are completely orthogonal.

And in the case of language and script tags the information is almost always
inseparable - as far from orthogonal as you can get.

> As pointed out above, the
> characteristics of linguistic variety and written form are not
> orthogonal, particularly when it comes to expressing user preferences,
> and that it *does* make a difference if they are split into separate
> metadata attributes or they are lumped together into a single metadata
> attribute.

To be totally fair, it would be possible to define a linkage between the two.
However, the representation would end up being fairly complicated, not to
mention being totally incompatible with the existing field syntax. As far as I
can see the only time a multiple field plus linkage would be a win is when the
repetition of subordinate information resulted in an overly long field. But the
sizes of the tags here are so small that this is at best a marginal corner
case.

In summary, I believe the approach of using separate fields offers no
advantages and has numerous disadvantages over the approach that was
chosen, and that the WG was correct to reject it.

Ned
Peter Constable
2005-08-29 13:49:21 UTC
> From: "JFC (Jefsey) Morfin" <***@jefsey.com>
> Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP

> >Mr. Morfin appears to me to have no more than a very vague sense of the
> >scope of ISO 639-4.
>
> This is somewhat fun as I am a contributor.

To my knowledge, you have never been a member of TC37/SC2/WG1. I cannot
rule out the possibility that you have submitted suggestions that found
their way to WG1.



Peter Constable
Bruce Lilly
2005-08-31 01:59:59 UTC
> Date: 2005-08-28 16:25
> From: Frank Ellermann <***@xyzzy.claranet.de>

> That's a last call, if you have better ideas than those in the
> draft speak up.  Your Content-Script idea is good, but won't
> help e.g. in encoded words (2047+2231).

Encoded-words have several characteristics, one of which is limited
length (in octets). That has two implications w.r.t. script:
1. specifying script explicitly is unnecessary; it can be determined
from the charset (always specified in an encoded-word) and the
specific octets of the encoded text (ISO-8859-1 is Latin script,
KOI8 is Cyrillic, etc.).
2. an encoded-word has limited space available. Of a maximum of 76
octets in an encoded-word specifying language, there are 8 for
overhead, at least one (currently exactly one) for specification
of encoding method, a charset specification (registered charsets
have names up to 45 octets in length), the language tag, and some
encoded text. The encoded text must be at least one octet for Q
encoding and a simple (unshifted) charset; for B encoding (and an
unshifted charset) it has to be a multiple of 4 octets, and a typical
charset with shift sequences will require on the order of 6 octets
minimum (for Q encoding; 8-12 minimum for B encoding). Specifying
(unnecessarily; see above) script reduces the available space for
actual (encoded) text; possibly to the point of impossibility in
pathological cases.
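The space budget in point 2 can be sketched as simple arithmetic. This sketch takes the figures exactly as stated above: a 76-octet maximum, 8 octets of overhead, and 1 octet for the encoding letter. It does not re-derive them from RFC 2047/2231. The function name is hypothetical.

```python
# Figures as stated in the message (not re-derived from RFC 2047/2231).
MAX_ENCODED_WORD = 76    # maximum octets in an encoded-word
FIXED_OVERHEAD = 8       # overhead octets
ENCODING_LEN = 1         # "Q" or "B"


def text_budget(charset, language_tag, encoding="Q"):
    """Octets left for encoded text after charset, encoding and language tag."""
    room = (MAX_ENCODED_WORD - FIXED_OVERHEAD - ENCODING_LEN
            - len(charset) - len(language_tag))
    if encoding.upper() == "B":
        room -= room % 4  # B-encoded text must be a multiple of 4 octets
    return max(room, 0)


print(text_budget("ISO-8859-5", "sr"))            # 55
print(text_budget("ISO-8859-5", "sr-Cyrl"))       # 50
print(text_budget("ISO-8859-5", "sr-Cyrl", "B"))  # 48
```

With a Cyrillic charset such as ISO-8859-5, the "-Cyrl" subtag repeats what the charset already implies. It also costs 5 octets of text space. With a 45-octet charset name and a long tag, the budget can reach zero.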

Specification of script is only a performance enhancement for long texts
(not the case for encoded-words) where a multi-script charset is in use.

While the Content-Script (or similar feature/filter mechanism) would not
be applicable to encoded-words, specification of script is unnecessary
for encoded-words (and undesirable due to impact on the available text
space).

Specification of script is only possible where a given text uses a single
script, and that limitation applies to any of the methods of indication
mentioned above, including the addition to language tags proposed by the
draft under discussion.

Script is a characteristic of written text; it is not applicable to (e.g.)
audio media types. It really should be a text media type parameter (or
feature).
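As a rough illustration of script carried as a text media type parameter, here is a Python sketch using the standard `email.message.Message` API. The `script` parameter is hypothetical; no such Content-Type parameter is registered. The helper name is illustrative.

```python
from email.message import Message


def declared_script(part):
    """Return a hypothetical 'script' Content-Type parameter, or None.

    The parameter is only consulted for text/* parts, since script is
    meaningless for audio/* and similar media types.
    """
    if part.get_content_maintype() != "text":
        return None
    return part.get_param("script")


text_part = Message()
text_part["Content-Type"] = "text/plain; charset=UTF-8; script=Cyrl"

audio_part = Message()
audio_part["Content-Type"] = "audio/basic"

print(declared_script(text_part))   # Cyrl
print(declared_script(audio_part))  # None
```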

> This is a ready-for-Bruce's-review draft as far as I can judge
> this, but for obvious reasons only you can really judge it. ;-)

As I mentioned in an earlier message, without a concrete specification
for negotiation, it is not possible to fully assess the proposed syntax
changes.

> > Addressing the language range issue is not a WG work item
> > and, unfortunately, the algorithm issue is scheduled to be a
> > later work item than the registry issue.
>
> Only my personal view of course, but the matching draft offers
> a syntactical form for ranges,

There is no such draft in Last Call at this time, as far as I know.

> if ISO 3166-1 pulls another CS 3066bis will handle it
> better than 3066 (no potential worldwide retagging confusion).

I am unaware of any "worldwide retagging confusion" w.r.t. language
tags and "CS".

> > it appears that management of WG participant conduct has been
> > rather lax
>
> IBTD, the WG Chairs and the responsible AD did a very good job.

As an affected party, I disagree.

> > Revision to move the syntax specification to a separate
> > document, as mentioned above, would permit evaluation of the
> > registration procedures per se
>
> You can also read chapter 3 per se, the mentioned 14 pages
> plus 3.1 as introduction (5 pages, format of the registry).

But a single section isn't being Last Called; it is the entire document,
and lacking specification of negotiation mechanisms it is not possible
to fully assess the document as it stands.
Frank Ellermann
2005-08-31 05:17:01 UTC
Bruce Lilly wrote:

> Encoded-words have several characteristics, one of which is
> limited length (in octets). That has two implications w.r.t.
> script:

> 1. specifying script explicitly is unnecessary; it can be
> determined from the charset (always specified in an
> encoded-word) and the specific octets of the encoded text
> (ISO-8859-1 is latin script, KOI8 is Cyrillic, etc.).

It's not that easy for UTF-8. We need the ugly scripts after
Unicode replaced the old charsets (the "implicit script" info
of most legacy charsets). Where that's irrelevant you can of
course use language tags without script; the draft reminds
"taggers" that more is not always better, quite the contrary.
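The contrast between a legacy charset that implies a script and UTF-8, which does not, can be sketched in Python. The charset table is illustrative and incomplete. The name-prefix check is a crude heuristic: Python's `unicodedata` exposes character names but no script property.

```python
import unicodedata

# Illustrative, incomplete table of legacy charsets that (apart from their
# ASCII half) imply a single script.
IMPLIED_SCRIPT = {"iso-8859-1": "Latn", "iso-8859-5": "Cyrl", "koi8-r": "Cyrl"}

# Heuristic: map the first word of a Unicode character name to a script code.
NAME_PREFIX_TO_SCRIPT = {"LATIN": "Latn", "CYRILLIC": "Cyrl",
                         "GREEK": "Grek", "ARABIC": "Arab"}


def guess_script(charset, text):
    implied = IMPLIED_SCRIPT.get(charset.lower())
    if implied is not None:
        return implied
    # A UTF-8 label says nothing about script; look at the letters instead.
    found = set()
    for ch in text:
        if ch.isalpha():
            prefix = unicodedata.name(ch, "").split(" ")[0]
            found.add(NAME_PREFIX_TO_SCRIPT.get(prefix))
    if len(found) == 1:
        return found.pop()  # may be None if the script is not in the table
    return None  # mixed scripts, or no letters at all


print(guess_script("KOI8-R", ""))        # Cyrl
print(guess_script("UTF-8", "Привет"))   # Cyrl
print(guess_script("UTF-8", "Hello Привет"))  # None
```

For UTF-8 the answer depends on the text itself and is undetermined for mixed-script text. That is the gap an explicit script subtag is meant to fill.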

> 2. an encoded-word has limited space available.

[...snipped...] Yes, we calculated "the most perverse tag" in
all dimensions especially for 2231, I knew that you would kill
the draft otherwise... ;-) Compare figure 7 in chapter 4.3.1.

> without a concrete specification for negotiation, it is not
> possible to fully assess the proposed syntax changes.

Maybe you can convince the PTB to delay the "last call" until
the matching draft is ready, but I doubt it. And I disagree
that it's impossible to judge the "data structure" (tags) now,
the syntax is rather simple.

For a general idea what the matching draft probably will be
you could read draft-ietf-ltru-matching-03.
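For a feel of what tag matching involves, here is a Python sketch of truncation-based lookup. This is the approach later standardized as the "Lookup" scheme of RFC 4647. It is not claimed to be what draft-ietf-ltru-matching-03 specifies. The "*" range and other details are not handled.

```python
def lookup(preferences, available, default=None):
    """Return the first available tag reached by truncating each preference.

    Comparison is case-insensitive; a singleton left at the end after
    truncation (such as "x") is dropped too. The "*" range is not handled.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for pref in preferences:
        subtags = pref.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


print(lookup(["az-Cyrl-AZ", "ru"], ["az-Latn", "az-Cyrl", "ru"]))  # az-Cyrl
print(lookup(["az-Cyrl"], ["az-Latn"], default="en"))               # en
```

Because the script stays attached to the language, a Cyrillic preference never silently falls through to Latin-script content.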

>> the WG Chairs and the responsible AD did a very good job.
> As an affected party, I disagree.

Then let's agree to disagree and / or be more specific: 3934
is rather new, and it was used, all parts of it incl. appeal.

IMNSHO it would be disastrous to abuse RFC 3934 as some kind
of killfiling-by-rough-consensus.

Bye, Frank
Bruce Lilly
2005-08-31 02:13:56 UTC
> Date: 2005-08-28 20:33
> From: "JFC (Jefsey) Morfin" <***@jefsey.com>

> The problems are:
[...]
> - the lack of alternative (are we sure there are no other
> architectural way to address the same need without information leak)

I think the answer is "yes". For tagging of content, there is no "leak",
only information -- that is the entire point of the tag. For negotiation,
there necessarily has to be some exchange of information ("I prefer X").

> - the lack of encryption

See above re. the necessity of information exchange. TLS is available to
protect against eavesdropping.

> - the "spam" aspect: I am imposed to receive the langtag.

You can of course ignore it if you wish. I also fail to understand how
you can be opposed to a language tag as "spam" yet at the same time
apparently wish to compound the amount of "spam" by including icons,
dictionaries, timestamps, etc.
Bruce Lilly
2005-08-31 02:29:12 UTC
> Date: 2005-08-28 20:33
> From: "C. M. Heard" <***@pobox.com>

> However, RFC 2026 does not set the rules for
> non-standards track documents, as it explicitly says in Section
> 2.1.

Sorry, I don't see that anywhere in 2.1. 2.1 does say that non-standards
track specifications are not subject to the rules for standardization (as
in full Standard), but it goes on to point to the rules in 4.2 for
Informational and Experimental RFCs.

> There is a precedent, by the way: RFC 2341.  Note that it postdates
> RFC 2026.

Interesting. Are there any others? I have heard that an effort to publish
a particular obsolete specification as Historic received strong pushback,
with the recommendation for publication as Informational.

Aside from the label -- and that's not a clear benefit because Historic is
ambiguous -- I don't see much difference between Historic and an
Informational RFC with a suitable IESG note (a "warning label" if you will).
Frank Ellermann
2005-08-31 03:42:54 UTC
Bruce Lilly wrote:

>> There is a precedent, by the way: RFC 2341.  Note that it
>> postdates RFC 2026.

> Interesting. Are there any others?

Maybe 4156 (wais) & 4157 (prospero). That's a bit special,
because it's a part of the effort to get rid of 1738.

> I have heard that an effort to publish a particular obsolete
> specification as Historic received strong pushback, with the
> recommendation for publication as Informational.

If you have son-of-1036 in mind, "strong pushback" isn't how
I recall Henry's info - it was more like "lacking enthusiasm".
I can't check it at the moment.
Bye, Frank
Brian E Carpenter
2005-08-31 09:31:51 UTC
Bruce Lilly wrote:
...
> Aside from the label -- and that's not a clear benefit because Historic is
> ambiguous -- I don't see much difference between Historic and an
> Informational RFC with a suitable IESG note (a "warning label" if you will).
>

There's a difference. For example, imagine a media type called

splat/illogical

that's been used for some years but is generally considered to be
illogically named, and a new media type has been defined to do the
same thing:

splot/logical

It would then be reasonable to document splat/illogical as Historic
to explain its IANA registration, and to document splot/logical as
Informational.

But you're certainly correct that a health warning in the text of
the RFC is more important than a status marker in the index.

Brian