WikiDiscuss


Morphology: Algorithm

posts: 2388
Use this thread to discuss the Morphology: Algorithm page.

OK, so I haven't been following this as closely
as I ought. To bring me up to speed, let me lay
out what I understand to be the status quo ante
for "pure" Lojban (no names, no fuhivla). There
are then only cmavo, gismu and lujvo.

Using W to stand for any of V, diphthong, V'V, y
cmavo are all
(C)W'W
where the current number of repetitions of the
boxed syllable is 1 and there are no cases of
sibilant + iV nor syllabic consonant + uV and y
has some sort of restrictions.

The restrictions on y can probably be overridden.
The restriction on uV seems odd to reasonably
competent speakers of French and Spanish and
might be overridden.
For some purposes {y bu} counts as a cmavo but
need not for these phonological ones.

Gismu are all either CV*CCV or CCV*CV (V* being a
stressed vowel). This, the oldest rule in the
book, is not going to change. The most that can
happen is that a new class of brivla is created
which are unanalyzable within Lojban, like gismu,
but are not derived in the gismu building way
from other languages.

Lujvo are built up from strings of (more or less)
reduced forms of gismu and so are constructed as
follows. Each begins with

CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)

where initial CC must be a permitted initial CC,
VV must be a diphthong permitted in that place
(see cmavo list),
V is not y,
* is y if the preceding and following Cs form a
permitted initial CC (and the whole is not
CCC?) and V is unstressed. Otherwise * is void.
# is r if the next syllable does not begin with
CC or begins with a permitted initial CC. It is
n in those cases where the first C is r.
Otherwise # is void.
^ is y if the preceding and following Cs form an
impermissible CC. Otherwise ^ is void.
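The rule for # and the two y-rules (* for a permitted-initial pair after an initial CVC, ^ for an impermissible pair) can be sketched in Python as they are stated here, not as CLL states them. This is a minimal sketch under assumptions: the helper names are hypothetical, the 48 permitted initial pairs and the impermissible-pair conditions come from the standard Lojban consonant rules, the unclear "(and the whole is not CCC?)" proviso is ignored, and, as xorxes points out later in the thread, rule * as stated is not quite right.

```python
# Sketch of the hyphen rules *, # and ^ as stated in the post (helper names are hypothetical).
PERMITTED_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")


def impermissible_pair(a: str, b: str) -> bool:
    """True if consonant a may not be directly followed by consonant b."""
    if a == b:  # doubled consonant
        return True
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return True  # voicing mismatch (l, m, n, r are exempt)
    if a in "cjsz" and b in "cjsz":  # two sibilants
        return True
    return a + b in {"cx", "kx", "xc", "xk", "mz"}


def star_hyphen(c1: str, c2: str, vowel_stressed: bool = False) -> str:
    """Rule *: after an initial CVC, y if c1+c2 is a permitted initial and V is unstressed."""
    return "y" if c1 + c2 in PERMITTED_INITIALS and not vowel_stressed else ""


def hash_hyphen(next_onset: str) -> str:
    """Rule #: after an initial CV(')V; next_onset is the consonant(s) opening the next syllable."""
    if len(next_onset) >= 2 and next_onset[:2] not in PERMITTED_INITIALS:
        return ""  # next syllable opens with a non-initial CC: # is void
    return "n" if next_onset.startswith("r") else "r"


def caret_hyphen(c1: str, c2: str) -> str:
    """Rule ^: y between two consonants that form an impermissible pair."""
    return "y" if impermissible_pair(c1, c2) else ""


print("tos" + star_hyphen("s", "m") + "mabru")  # tosymabru
```

Rule * only ever yields y or nothing, and # only r, n or nothing, so each marker can be decided from the two or three consonants around the joint.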


If the V (the second in the case of CV'V) in this
is stressed (V+), the whole can be followed only
by CCV/CVV.
Otherwise the whole can be followed by

CCVCy/CVCCy/CVC^/CCV/CV(')V repeated any number
of times with unstressed V.

Finally comes any of the stressed initial +
final chunks from above, or CV+CCV/CCV+CV/CV+'V.

This is a much later set of rules, but is also
unlikely to change.

Now, given the original specification of brivla,
how can this be liberalized? According to the
original specs, a brivla
1. ends in a vowel
2. has penultimate stress
3. has a CC in its first 5 phones (not counting '
and y).
Anything more than this (and the specifics of
Lojban phonotactics) is irrelevant for
separating a string into words, and even for
sorting them into categories. Introducing names and direct
borrowings does not change this, assuming we keep
some barriers around them (initial {la} and the
like with final consonant and pause, {la'o} and
{zoi}, or explicit markers like {iy} and {uy}).
So, once the beginning of a brivla is set (by
the CC or y), what happens after that until the
next stressed vowel is not relevant to the
separation algorithm (except, as noted, to make
sure everything that occurs is permissible in
Lojban, a question we now appear to be
discussing). Nor need setting the beginning
(finding the first CC or y) be restricted as it
is above: a string of up to three vowels, whether
as a triphthong, a diphthong and another syllable
joined by ', or three syllables joined by ', will
work equally well, provided the first CC is not a
permitted initial (and if the rightmost of the
vowels is stressed, even that is allowed). At the
end, any permitted combination of syllable-final
and syllable-initial clusters can occur,
including an interpolated complete y syllable (I
think more than one would be pushing it a bit too
far), ending with a vowel, diphthong or triphthong.
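Of the three original brivla criteria, the two that can be read off the spelling (1 and 3) can be checked mechanically; penultimate stress (2) needs a stress-marked transcription, so this minimal sketch leaves it out. The function name is hypothetical, and the sketch simply follows the "not counting ' and y" wording above.

```python
# Checks brivla criteria 1 and 3 from the original specs (hypothetical helper name).
VOWELS = set("aeiou")  # y is not counted as a vowel here


def meets_spelling_criteria(word: str) -> bool:
    """1. ends in a vowel; 3. has a consonant pair among its first five phones,
    not counting ' and y. Penultimate stress (criterion 2) is not checked."""
    w = word.lower()
    if not w or w[-1] not in VOWELS:
        return False
    phones = [ch for ch in w if ch not in "'y"][:5]
    consonant = [ch.isalpha() and ch not in VOWELS for ch in phones]
    return any(a and b for a, b in zip(consonant, consonant[1:]))


for w in ["broda", "ba'e", "tosymabru", "lojban"]:
    print(w, meets_spelling_criteria(w))
```

Because y is skipped, {tosymabru} still counts as having its CC (s, m) early enough, which is why the y-hyphen does not disturb brivla recognition.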

If this is essentially correct (and I am pretty
sure I have not taken proper care of at least r
(n) hyphens when syllabic here), then the only
question is what is permissible in Lojban, and
even that does not affect the separation
algorithm as such but rather the question of
whether what has been separated out are Lojban
words. And the only construction problem with
naturalized borrowings is to be sure you don't
accidentally produce a lujvo from some other source,
which, given lujvos' restricted form, is
just a matter of checking (computer-assisted: these
things don't get created on the fly).






posts: 1912


> Using W to stand for any of V, diphthong, V'V, y
> cmavo are all
> (C)W'W
> where the current number of repetitions of the
> boxed syllable is 1 and there are no cases of
> sibilant + iV nor syllabic consonant + uV and y
> has some sort of restrictions.

In fact there are no cases of any consonant plus iV or uV,
not just sibilants or syllabic consonants.
y only appears in Cy, y'y and y.
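The cmavo shape under discussion, with xorxes' correction that no consonant is ever followed by iV or uV, can be sketched as a regular-expression check. Assumptions, all hypothetical rather than taken from CLL: W is a single vowel (y included), a diphthong, or (only without a preceding consonant) iV/uV; at most one further 'W is allowed; and the sketch permissively also accepts iV/uV right after '.

```python
import re

# Sketch of the cmavo shape (C)W with at most one further 'W, taking W to be a
# vowel (y included), a diphthong, or, with no preceding consonant, iV/uV.
C = "[bcdfgjklmnprstvxz]"
NUCLEUS = "(?:ai|ei|oi|au|[aeiouy])"
BARE_W = f"(?:[iu][aeiouy]|{NUCLEUS})"  # iV/uV only where no consonant precedes
CMAVO_FORM = re.compile(f"(?:{C}{NUCLEUS}|{BARE_W})(?:'{BARE_W})?")


def is_cmavo_form(word: str) -> bool:
    """True if word has one of the cmavo shapes described above."""
    return CMAVO_FORM.fullmatch(word) is not None


print([w for w in ["ba'e", "coi", "ui", "kua", "broda"] if is_cmavo_form(w)])
```

Dropping the restriction on C + iV/uV, as suggested in the reply below, would only mean using BARE_W after the consonant as well.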

> Lujvo are built up from strings of (more or less)
> reduced forms of gismu and so are constructed as
> follows. Each begins with

> CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
>
> where initial CC must be permitted initial CC,
> VV must be a diphthong permitted in that place
> (see cmavo list)
> V is not y,
> * is y if the preceding and following Cs form a
> permitted initial CC (and the whole is not
> CCC?) and V is unstressed. Otherwise * is void.
> # is r if the next syllable does not begin with
> CC or begins with a permitted initial CC. It is
> n in these cases where the first C is r.
> Otherwise # is void.
> ^ is y if the preceding and following Cs form an
> impermissible CC. Otherwise ^ is void.

* and ^ look very similar :-)


> If the V (the second in the case of CV'V) in this
> is stressed (V+), the whole can be followed only
> by CCV/CVV.
> Otherwise the whole can be followed by
> CCVCy/CVCCy/CVC^/CCV/CV(')V repeated any number
> of times with unstressed V.
> Finally, there is any of the stressed initial +
> final chunks from above, or CV+CCV/CCV+CV/CV+'V

You also have to take into account the tosmabru test.
CVC+CVCCV appears to be legitimate, but in fact
sometimes it is not because it could break as
CV CCV+CCV. In order to avoid this breakage,
you have to use CVCyCVCCV in that and analogous
cases.
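A simplified version of the tosmabru check described here can be written over a list of rafsi. This is a sketch under assumptions: the helper name is hypothetical, and it handles only words spelled as one or more CVC rafsi followed by a final CVCCV, with no hyphens; the fuller test in CLL covers more cases.

```python
from typing import List

# Simplified tosmabru check (hypothetical helper): only handles lujvo spelled as
# CVC + (CVC ...) + final CVCCV, with no hyphens.
PERMITTED_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def shape(rafsi: str) -> str:
    """Map each letter to C or V, e.g. 'mabru' -> 'CVCCV'."""
    return "".join("C" if ch in CONSONANTS else "V" if ch in VOWELS else ch for ch in rafsi)


def tosmabru_fails(parts: List[str]) -> bool:
    """True if the unhyphenated word could also be cut as CV + CC..., so a y is needed."""
    if len(parts) < 2:
        return False
    *head, last = parts
    if any(shape(p) != "CVC" for p in head) or shape(last) != "CVCCV":
        return False
    joints = [a[-1] + b[0] for a, b in zip(parts, parts[1:])]
    return all(j in PERMITTED_INITIALS for j in joints) and last[2:4] in PERMITTED_INITIALS


print(tosmabru_fails(["tos", "mabru"]))  # True: to smabru is also possible
print(tosmabru_fails(["ter", "mabru"]))  # False: rm is not a permitted initial
```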

In the case of fu'ivla, in addition to the tosmabru
test you also need to apply the slinku'i test:
A fu'ivla can't consist of a consonant plus a string
of rafsi, even if it fulfills the other criteria,
because when a CV cmavo is in front of it, it will
look just like a lujvo.
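A rough, shape-only version of the slinku'i test: put a CV cmavo in front (here {pa}, an arbitrary choice) and ask whether the result can be cut into rafsi shapes ending in a final form. This sketch rests on stated assumptions: the helper names are hypothetical, the pattern knows only rafsi shapes (CVC, CCV, CVV/CV'V, CVCCy, CCVCy, with optional y, r and n hyphens) and not which rafsi actually exist, and it ignores stress and the tosmabru test, so it approximates rather than implements a real lujvo check.

```python
import re

# Shape-only approximation of the slinku'i test (hypothetical helper names).
PERMITTED_INITIALS = sorted(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)
C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"
CC = "(?:" + "|".join(PERMITTED_INITIALS) + ")"
LONG_V = f"(?:ai|ei|oi|au|{V}'{V})"  # the VV or V'V part of a CVV-type rafsi
NON_FINAL = (
    f"(?:{C}{V}{C}y?"        # CVC, optionally y-hyphenated
    f"|{CC}{V}"              # CCV
    f"|{C}{LONG_V}[rn]?"     # CVV / CV'V, optionally r- or n-hyphenated
    f"|{C}{V}{C}{C}y"        # CVCCy
    f"|{CC}{V}{C}y)"         # CCVCy
)
FINAL = f"(?:{C}{V}{C}{C}{V}|{CC}{V}{C}{V}|{CC}{V}|{C}{LONG_V})"  # gismu, CCV, CVV
RAFSI_STRING = re.compile(f"{NON_FINAL}+{FINAL}")


def fails_slinkui(word: str, cv_cmavo: str = "pa") -> bool:
    """True if CV cmavo + word can be read as a string of rafsi shapes."""
    return RAFSI_STRING.fullmatch(cv_cmavo + word) is not None


for w in ["spa'i", "zbroda", "spageti"]:
    print(w, fails_slinkui(w))
```

{spa'i} fails because {paspa'i} reads as CVC + CV'V, and {zbroda} fails because {pazbroda} reads as CVC + CCVCV, matching the cases raised later in the thread.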

But other than that, yes it's basically right.

mu'o mi'e xorxes






posts: 2388


wrote:

>
> --- John E Clifford wrote:
> > Using W to stand for any of V, diphthong, V'V, y
> > cmavo are all
> > (C)W'W
> > where the current number of repetitions of the
> > boxed syllable is 1 and there are no cases of
> > sibilant + iV nor syllabic consonant + uV and y
> > has some sort of restrictions.
>
> In fact there are no cases of any consonant plus iV or uV,
> not just sibilants or syllabic consonants.

Yeah, I know it says that, but I thought that had
changed since CLL. Well, add it in, and a note
that that is way too restrictive and should be
removed before a lot more expansion into 'W
space.

> y only appears in Cy, y'y and y.

Okay, so the restriction is that 'y is the
only syllable to follow y and that only once.

> > Lujvo are built up from strings of (more or less)
> > reduced forms of gismu and so are constructed as
> > follows. Each begins with
> >
> > CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
> >
> > where initial CC must be permitted initial CC,
> > VV must be a diphthong permitted in that place
> > (see cmavo list)
> > V is not y,
> > * is y if the preceding and following Cs form a
> > permitted initial CC (and the whole is not
> > CCC?) and V is unstressed. Otherwise * is void.
> > # is r if the next syllable does not begin with
> > CC or begins with a permitted initial CC. It is
> > n in these cases where the first C is r.
> > Otherwise # is void.
> > ^ is y if the preceding and following Cs form an
> > impermissible CC. Otherwise ^ is void.
>
> * and ^ look very similar :-)
>
> > If the V (the second in the case of CV'V) in this
> > is stressed (V+), the whole can be followed only
> > by CCV/CVV.
> > Otherwise the whole can be followed by
> > CCVCy/CVCCy/CVC^/CCV/CV(')V repeated any number
> > of times with unstressed V.
> > Finally, there is any of the stressed initial +
> > final chunks from above, or CV+CCV/CCV+CV/CV+'V
>
> You also have to take into account the tosmabru test.
> CVC+CVCCV appears to be legitimate, but in fact
> sometimes it is not because it could break as
> CV CCV+CCV.

I thought * took care of that; have I forgotten
some further twist?

> In order to avoid this breakage,
> you have to use CVCyCVCCV in that and analogous
> cases.
>
> In the case of fu'ivla, in addition to the tosmabru
> test you also need to apply the slinku'i test:
> A fu'ivla can't consist of a consonant plus a string
> of rafsi, even if it fulfills the other criteria,
> because when a CV cmavo is in front of it, it will
> look just like a lujvo.

Well, I didn't deal with fuhivla just because it
caused further problems like this, but that at
least is an easy rule to write in (and an excuse
to separate lujvo from other brivla, just as gismu
are).

> But other than that, yes it's basically right.
>
Glad to hear it. Maybe I can follow a bit better now.


posts: 1912


> > y only appears in Cy, y'y and y.
>
> Okay, so the restrictions is that 'y is the
> only syllable to follow y and that only once.

Hmmm, are we talking about permitted forms, or instantiated forms?

Those are the only instantiated forms, but otherwise y is allowed
in cmavo-forms like other single vowels.

> > > follows. Each begins with
> > >
> > > CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
> > >
> > > * is y if the preceding and following Cs form a
> > > permitted initial CC (and the whole is not
> > > CCC?) and V is unstressed. Otherwise * is void.
> > > ^ is y if the preceding and following Cs form an
> > > impermissible CC. Otherwise ^ is void.

> >
> > You also have to take into account the tosmabru
> > test.
> > CVC+CVCCV appears to be legitimate, but in fact
> > sometimes it is not because it could break as
> > CV CCV+CCV.
>
> I thought * took care of that; have I forgotten
> some further twist.

Sorry, I took * and ^ to be saying the same thing, but I see
they aren't.

But * is not quite it. For example CVCCV'V doesn't need a y,
but rule * as stated would require it.

If the thing after CV is a string of rafsi, then you need the 'y'.
I don't think you can escape examining the whole thing till the end.

mu'o mi'e xorxes






posts: 2388


wrote:

>
> --- John E Clifford wrote:
> > > y only appears in Cy, y'y and y.
> >
> > Okay, so the restrictions is that 'y is the
> > only syllable to follow y and that only once.
>
> Hmmm, are we talking about permitted forms, or
> instantiated forms?


Great, scratch the restriction — though I don't
expect a lot of complex cmavo with y in the
middle .
> Those are the only instantiated forms, but
> otherwise y is allowed
> in cmavo-forms like other single vowels.
>
> > > > follows. Each begins with
> > > >
> > > > CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
> > > >
> > > > * is y if the preceding and following Cs form a
> > > > permitted initial CC (and the whole is not
> > > > CCC?) and V is unstressed. Otherwise * is void.
> > > > ^ is y if the preceding and following Cs form an
> > > > impermissible CC. Otherwise ^ is void.

> > >
> > > You also have to take into account the tosmabru test.
> > > CVC+CVCCV appears to be legitimate, but in fact
> > > sometimes it is not because it could break as
> > > CV CCV+CCV.
> >
> > I thought * took care of that; have I forgotten
> > some further twist?
>
> Sorry, I took * and ^ to be saying the same thing, but I see
> they aren't.
>
> But * is not quite it. For example CVCCV'V doesn't need a y,
> but rule * as stated would require it.

What is the status of {spa'i}, which used to be
bruited about? The rule, though wrong when only
lujvo are involved, looks to be right when
further forms are permitted. It shouldn't need
to look beyond determining the first CC and what
goes before it, since from there on there are no
uniqueness requirements until we get to the
stressed vowel and this doesn't play a role
there.
But other rules may have to change if further
brivla forms are allowed too. For example, I
suppose the slinku'i test would require that
dropping the first consonant did not give a
brivla of any form (which shoots down initial CCC
pretty fast), but since then it is only necessary
to check details through the first CC (or CyC)
and the bit after the stressed vowel, this looks
pretty manageable. The rest is just not using
illegal groupings and even those go beyond simply
slicing a correct speech stream correctly to also
checking that every slice is a (potential) Lojban
word, a logically distinct task (unless written
into the original claim, which I am not sure was
the case).
> If the thing after CV is a string of rafsi, then you need the 'y'.
> I don't think you can escape examining the whole thing till the end.
>
> mu'o mi'e xorxes



posts: 1912


> > But * is not quite it. For example CVCCV'V
> > doesn't need a y,
> > but rule * as stated would require it.
>
> What is the status of {spa'i}, which used to be
> bruited about?

It fails the slinku'i test. {le spa'i}
could be pronounced as {lespa'i}, which is a lujvo.

> The rule, though wrong when only
> lujvo are involved, looks to be right when
> further forms are permitted.

lujvo have priority over fu'ivla, so the possibility
of {lespa'i} blocks the possibility of {spa'i} as a word.

> It shouldn't need
> to look beyond determining the first CC and what
> goes before it, since from there on there are no
> uniqueness requirements until we get to the
> stressed vowel and this doesn't play a role
> there.

Maybe, but that's not how the system is set up.

> But other rules may have to change if further
> brivla forms are allowed too. For example, I
> suppose the slinku'i test would require that
> dropping the first consonant did not give a
> brivla of any form (which shoots down initial CCC
> pretty fast),

No, it just requires that it doesn't give a string
of rafsi (possibly with a final gismu), but if it
gives another fu'ivla it's all right, because
fu'ivla can't join with y-less rafsi to form compound
forms.

> but since then it is only necessary
> to check details through the first CC (or CyC)
> and the bit after the stressed vowel, this looks
> pretty manageable.

I'm not sure I follow what your system would be, but the
current system is strongly biased to favour lujvo over
fu'ivla, that's why you need to do a full rafsi check
in general. In some cases you find out soon enough that
it's not a rafsi string: anything starting with CCVCrC- or
CVCCrC- for example can never be a lujvo, hence the
easy to make type III fu'ivla.
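The "can never be a lujvo" starts mentioned here (CCVCrC- and CVCCrC-) are easy to test for with a pattern match. This is a minimal sketch: the helper name is hypothetical, and it checks only the shape of the prefix, not whether the initial CC is permitted or whether the rest of the word is a valid fu'ivla.

```python
import re

# Prefix check for the "never a lujvo" starts CCVCrC- and CVCCrC- (hypothetical helper).
C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"
TYPE3_START = re.compile(f"(?:{C}{C}{V}{C}|{C}{V}{C}{C})r{C}")


def has_type3_start(word: str) -> bool:
    """True if word begins with CCVC + r + C or CVCC + r + C."""
    return TYPE3_START.match(word) is not None


for w in ["spatrkoka", "cidjrspageti", "tosmabru"]:
    print(w, has_type3_start(w))
```

No rafsi shape ends in a consonant followed by r + C, which is why such a start rules out a lujvo reading as soon as it is seen.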

mu'o mi'e xorxes






posts: 2388


wrote:


> > But * is not quite it. For example CVCCV'V
> > doesn't need a y,
> > but rule * as stated would require it.
>
> > What is the status of {spa'i}, which used to be
> > bruited about?

Never mind. I see that this will fail something
like slinku'i because CVCCV+'V clearly has
preferred status. CVCCV+'V just has to go as a
special case. Are there others?


posts: 1912


> --- John E Clifford <clifford-j@sbcglobal.net>
>
> > What is the status of {spa'i}, which used to be
> > bruited about?
>
> Never mind. I see that this will fail something
> like slinku'i because CVCCV+'V clearly has
> preferred status. CVCCV+'V just has to go as a
> special case. Are there others?

We are looking for lujvo that start with CVCCV
and don't need a y-hyphen, right?

CVC-CVV-CVV
CVC-CVV-CVCCV
CVC-CVV-CVC-CVV
....
CVC-CVC-CVV
CVC-CVC-CVC-CVV
....
CVC-CVCCy-CVV
....

and lots, lots more.
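The "lots, lots more" can be made concrete with a small enumeration of rafsi-shape sequences whose spelling begins with CVCCV. This is an illustrative, shape-only sketch with a hypothetical helper name: "CVV" stands for both CVV and CV'V, and hyphen requirements are ignored, so the output also includes CVC-CVCCV, which is exactly the tosmabru case that sometimes needs a y.

```python
from itertools import product

# Illustrative shape-only enumeration (hypothetical helper): rafsi-shape sequences
# whose spelling begins with CVCCV. "CVV" stands for both CVV and CV'V, and
# hyphen requirements (including tosmabru) are deliberately ignored.
NON_FINAL = ["CVC", "CVV", "CCV", "CVCCy"]
FINAL = ["CVV", "CCV", "CVCCV", "CCVCV"]


def cvccv_initial_patterns(max_parts: int) -> list:
    found = []
    for n in range(2, max_parts + 1):
        for head in product(NON_FINAL, repeat=n - 1):
            for last in FINAL:
                parts = (*head, last)
                if "".join(parts).startswith("CVCCV"):
                    found.append("-".join(parts))
    return found


for pattern in cvccv_initial_patterns(3):
    print(pattern)
```

Raising max_parts makes the list grow quickly, which is the point of the reply: CVCCV+'V is far from the only special case.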

mu'o mi'e xorxes






posts: 2388


wrote:

>
> --- John E Clifford wrote:
> > --- John E Clifford
> <clifford-j@sbcglobal.net>
> >
> > > What is the status of {spa'i}, which used
> to be
> > > bruted about?
> >
> > Nevermind. I see that this will fail
> something
> > like slinku'i because CVCCV+'V clearly has
> > preferred status. CVCCV+'V just has to go as
> a
> > special case. Are there others?
>
> We are looking for lujvo that start with CVCCV
> and don't need a y-hyphen, right?
>
> CVC-CVV-CVV
> CVC-CVV-CVCCV
> CVC-CVV-CVC-CVV
> ...
> CVC-CVC-CVV
> CVC-CVC-CVC-CVV
> ...
> CVC-CVCCy-CVV
> ...
>
> and lots, lots more.

Thanks. Using these to guide me through the
complexities of CLL (I won't, for once, say "obscurities"),
I have come to realize what a mass of
apparently needless complexity and restriction
has arisen from the historical and uncritical
development of this morphology. The two needed
things were unique segmentation and distinct form
classes: at least cmene, cmavo and brivla, but
within the latter maybe also gismu, lujvo and
fuhivla. Given the general characterizations of
the form classes, a much simpler system would
have been possible (starting perhaps at the
moment of GMR, had anyone then done the deeper
analysis now going on), with essentially the
same result though allowing some additional
brivla. (I am not sure why we would want more
brivla space, but I would have been willing to
accept it for the simpler algorithm: "can" ain't
"is", after all.) The main thing blocking this
simpler algorithm for the present system is the
unhyphenated initial CVCCVs. There are probably
other problems with it as well, but they have not
yet emerged (partly, I'm sure, because I haven't
formulated the scheme completely). Changing the
rules to require all cases of CVCCV where the
cluster is initial and the first vowel is
unstressed to be hyphenated as CVCyCV would
eliminate the tosmabru problem for which it was
originally designed, eliminate the slinku'i
problem in at least a number of typical cases, and
legitimate CCV'V brivla (for whatever that is
worth) and apparently a large number of others.
(I keep adding to the list as possibilities come
to me; so far we have all the CV(')VCyC... and
V')V((')VCyC..., which loosens up the y rules
as well. Indeed, I think that the difference
between this algorithm and the present complex
system comes from thinking just in terms of
strings of phones rather than in terms of
building up strings out of rafsi blocks. The
latter process is a good one for constructing
systematic expressions (lujvo and type III
fuhivla) but restrictive for generating type
IVs and, indeed, forms wanted that have no
sources in other languages at all.) As matters
stand, the simpler algorithm is not quite
useless, but it needs to be used in conjunction with
another that checks whether the whole (or some
part) is a lujvo (but this is needed to deal with
slinku'i problems anyhow, apparently; it is
just needed more often here).


posts: 1912


> Thanks. Using these to guide me through the
> complexities (I won't for once say "obscurities"
> of CLL), I have come to realize what a mass of
> apparently needless complexity and restriction
> has arisen from the historical and uncritical
> development of this morphology.

"Needless complexity" sounds like an appropriate
description of Lojban morphology.

> Changing the
> rules to require all cases of CVCCV where the
> cluster is initial and the first vowel is
> unstressed to be hyphenated: CVCyCV would
> eliminate the tosmabru problem for which it was
> originally designed, eliminates the slinku'i
> problem in at least a number of typical cases

Not in all cases though. For example {zbroda}
would still fail slinku'i, because you would not
require a y in {lez-broda}. What you would need to
do to eliminate the slinku'i problem completely is
require a y-hyphen after CVC whenever the C forms
a valid initial *cluster* with what follows, not
just a valid initial *pair*.
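The difference between the pair-only rule (CVCCV with an initial CC) and the cluster rule proposed here can be shown side by side. This is a sketch under assumptions: the helper names are hypothetical, and a longer cluster is treated as a valid initial when every adjacent pair in it is a permitted initial pair.

```python
import re

# Compare a pair-only y rule with the cluster rule (hypothetical helper names).
PERMITTED_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)
CV_CLUSTER_V = re.compile(r"[bcdfgjklmnprstvxz][aeiou]([bcdfgjklmnprstvxz]+)[aeiou]")


def cluster_after_first_vowel(word: str) -> str:
    """Consonant cluster between the first and second vowels of a CV... word."""
    m = CV_CLUSTER_V.match(word.replace("-", ""))
    return m.group(1) if m else ""


def is_initial_cluster(cs: str) -> bool:
    """Assumption: a cluster is initial if every adjacent pair is a permitted initial."""
    return len(cs) >= 2 and all(cs[i:i + 2] in PERMITTED_INITIALS for i in range(len(cs) - 1))


def pair_rule_needs_y(word: str) -> bool:
    """Flags only CVCCV-type starts whose CC is a permitted initial pair."""
    cs = cluster_after_first_vowel(word)
    return len(cs) == 2 and cs in PERMITTED_INITIALS


def cluster_rule_needs_y(word: str) -> bool:
    """Flags any CVC whose C starts a valid initial cluster with what follows."""
    return is_initial_cluster(cluster_after_first_vowel(word))


for w in ["tosmabru", "lez-broda"]:
    print(w, pair_rule_needs_y(w), cluster_rule_needs_y(w))
```

Both rules flag {tosmabru}; only the cluster rule catches {lez-broda}, whose zbr the pair-only rule never looks at.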

mu'o mi'e xorxes






posts: 2388


wrote:

>
> --- John E Clifford wrote:
> > Thanks. Using these to guide me through the
> > complexities (I won't for once say "obscurities"
> > of CLL), I have come to realize what a mass of
> > apparently needless complexity and restriction
> > has arisen from the historical and uncritical
> > development of this morphology.
>
> "Needless complexity" sounds like an appropriate
> description of Lojban morphology.
>
> > Changing the
> > rules to require all cases of CVCCV where the
> > cluster is initial and the first vowel is
> > unstressed to be hyphenated: CVCyCV would
> > eliminate the tosmabru problem for which it was
> > originally designed, eliminates the slinku'i
> > problem in at least a number of typical cases
>
> Not in all cases though. For example {zbroda}
> would still fail slinku'i, because you would not
> require a y in {lez-broda}. What you would need to
> do to eliminate the slinku'i problem completely is
> require a y-hyphen after CVC whenever the C forms
> a valid initial *cluster* with what follows, not
> just a valid initial *pair*.
>
Thanks again. From other considerations I was
already making the rule
"in initial CVCC, the CC must be non-initial
(either off the list of initials or hyphenated)
except when the immediately preceding vowel is
stressed or the CC is followed by y". I think
this can extend to the whole cmavoform+CC
pattern.
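The proposed rule for initial CVCC can be checked word by word. This is a minimal sketch under a hypothetical convention: a stressed vowel is written in upper case (as in "tOsmabru"), since ordinary spelling does not mark stress; the helper name is also hypothetical.

```python
# The proposed rule for initial CVCC (hypothetical helper). Assumption: a stressed
# vowel is written in upper case, e.g. "tOsmabru".
PERMITTED_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def initial_cvcc_ok(word: str) -> bool:
    """In initial CVCC, the CC must be non-initial unless the vowel is stressed
    or the CC is followed by y."""
    w = word.lower()
    if len(w) < 4 or not (w[0] in CONSONANTS and w[1] in VOWELS
                          and w[2] in CONSONANTS and w[3] in CONSONANTS):
        return True  # the rule only constrains words that begin with CVCC
    if w[2:4] not in PERMITTED_INITIALS:
        return True  # CC is off the list of initials
    vowel_stressed = word[1].isupper()
    followed_by_y = len(w) > 4 and w[4] == "y"
    return vowel_stressed or followed_by_y


for w in ["tosmabru", "tosymabru", "tOsmabru", "lerkla"]:
    print(w, initial_cvcc_ok(w))
```

A y-hyphenated start such as {tosymabru} never matches CVCC at all, so "hyphenated" needs no separate branch.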


posts: 1912


> I was already from other
> considerations making the rule
> "in initial CVCC, the CC must be non-initial
> (either off the list of initials or hyphenated)
> except when the immediately preceding vowel is
> stressed or the CC is followed by y"

Sounds right.

> I think
> this can extend to the whole cmavoform+CC
> pattern.

For other cmavo forms the rule is already valid,
no need to change anything. (Forms involving 'y'
are special though, because 'y' is always a hyphen
in brivla, so they are always a compound of at least
two things.)

mu'o mi'e xorxes






posts: 2388


wrote:

>
> --- John E Clifford wrote:
> > I was already from other
> > considerations making the rule
> > "in initial CVCC, the CC must be non-initial
> > (either off the list of initials or hyphenated)
> > except when the immediately preceding vowel is
> > stressed or the CC is followed by y"
>
> Sounds right.
>
> > I think
> > this can extend to the whole cmavoform+CC
> > pattern.
>
> For other cmavo forms the rule is already valid,
> no need to change anything. (Forms involving 'y'
> are special though, because 'y' is always a hyphen
> in brivla, so they are always a compound of at least
> two things.)
>
Well, of course this approach would not require
that there be things glued by the y introduced by
this rule (even though I called it a hyphen). I
do think y should be kept out of the part before
the first CC, however. As for the other cmavo
initials, they are covered, so far as I can tell,
only by the rule that says "If you are adding
CV(')V and the immediately following is CV then
add r (or n)", but I am not sure that applies to
V')V((')V. And, of course, in this procedure we
are not working in terms of adding
something but simply in terms of strings, so "is
a compound of" need not apply. (It might be a
good idea to use something other than y for this
connection just to make that point, but then it
screws up the cases where it is a compound, not
that that is a concern for this system.)