/* LOJBAN MACHINE GRAMMAR, BASELINED AS OF 20 JULY 1990 */
/* INCORPORATES JC'S TECH FIXES 1-28 */
/* COPYRIGHT 1989,1990 THE LOGICAL LANGUAGE GROUP, INC. */
/* CONTACT THAT ORGANIZATION AT 2904 BEAU LANE, FAIRFAX VA 22031 USA */
/* 703-385-0273 */
/* PERMISSION TO COPY GRANTED SUBJECT TO YOUR VERIFICATION THAT THIS IS THE */
/* LATEST VERSION OF THE LOJBAN GRAMMAR, THAT YOUR DISTRIBUTION BE FOR */
/* PROMOTION OF LOJBAN, THAT THERE IS NO CHARGE FOR THE PRODUCT, AND THAT */
/* THIS COPYRIGHT NOTICE IS INCLUDED INTACT IN THE COPY. */
/* PERMISSION GRANTED FOR USE IN DERIVED WORKS PROVIDED THAT THE FACTS OF */
/* DERIVATION AND THAT THE GRAMMAR BASELINE IS PRELIMINARY ARE STATED, AND */
/* PROVIDING THE NAME AND ADDRESS OF THE LOGICAL LANGUAGE GROUP, INC. AS A*/
/* SOURCE OF FURTHER INFORMATION ABOUT THE GRAMMAR AND ABOUT LOJBAN. */
/* WE PLAN TO PLACE THE GRAMMAR IN THE PUBLIC DOMAIN UPON FINAL BASELINE */
/* APPROVAL, EXPECTED LATE IN 1990 */

/*grammar.bas */

/* The Lojban machine parsing algorithm is a multi-step process.  The YACC
machine grammar presented here is an amalgam of those steps, concatenated
so as to allow YACC to verify that the grammar is free of syntactic
ambiguity.  YACC is used to generate a parser for a portion of the
grammar, which is LALR1 (the type of grammar that YACC is designed to
identify and process successfully), but most of the rest of the grammar
must be parsed using some language-coded processing.

Step 1 - Lexing

From phonemes, stress, and pause, it is possible to resolve Lojban
unambiguously into a stream of words.  Any machine processing of speech
will have to have some way to deal with 'non-Lojban' failures of fluent
speech, of course.  The resolved words can be expressed as a text file,
using Lojban's phonetic spelling rules.

The following steps assume that there is the possibility of non-Lojban
text within the Lojban text (delimited appropriately).  Such non-Lojban
text may not be reducible from speech phonetically.  However, step 2
allows the filtering of a phonetically transcribed text stream, to
recognize such portions of non-Lojban text where properly delimited,
without interference with the parsing algorithm.

Step 2 - Filtering

From start to end, perform the following filtering and lexing tasks,
using the given order of precedence in case of conflict:

a. If the Lojban word "zoi" (lexeme ZOI) is identified, take the following
   Lojban word (which should be end-delimited with a pause for separation
   from the following non-Lojban text) as an opening delimiter.  Treat all
   text following that delimiter, until that delimiter recurs >after a
   pause<, as grammatically a single token (labelled 'string_699' in this
   grammar).  There is no need for processing within this text except as
   necessary to find the closing delimiter.

b. If the Lojban word "zo" (lexeme ZO) is identified, treat the following
   Lojban word as a token labelled 'word_698', instead of lexing it by its
   normal grammatical function.

c. If the Lojban word "lo'u" (lexeme LOhU) is identified, search for the
   closing delimiter "le'u" (lexeme LEhU), ignoring any such closing
   delimiters absorbed by the previous two steps.  The text between the
   delimiters should be treated as the single token 'words_697'.

d. Categorize all remaining words into their Lojban lexeme category,
   including the various delimiters mentioned in the previous steps.  In
   all steps after step 2, only the lexeme token type is significant for
   each word.

e. If the word "si" (lexeme SI) is identified, erase it and the previous
   word (or token, if the previous text has been condensed into a single
   token by one of the above rules).

f. If the word "sa" (lexeme SA) is identified, erase it and all preceding
   text back to and including the first preceding token word which is in
   one of the lexemes: I, NIhO, LU, TUhE, and TO.

g. If the word "su" (lexeme SU) is identified, erase it and all preceding
   text back to and including the first preceding token word which is in
   one of the lexemes: NIhO, LU, TUhE, and TO.  However, if speaker
   identification is available, a SU shall only erase to the beginning of
   a speaker's discourse, unless it occurs at the beginning of a speaker's
   discourse.  (Thus, if the speaker has said something, two "su"s are
   required to erase the entire conversation.)

Step 3 - Termination

If the text contains a FAhO, treat that as the end-of-text and ignore
everything that follows it.

Step 4 - Absorption of Grammar-Free Tokens

In a new pass, perform the following absorptions (absorption means that
the token is removed from the grammar for processing in following steps,
and optionally reinserted, grouped with the absorbing token after parsing
is completed).

a. Absorb all lexeme BAhE and PEhA tokens into the following token.  If
   they occur at the end of text, relabel them as UI.

b. Absorb all lexeme BU tokens into the previous token.  Relabel the
   previous token as lexeme BY.

c. If lexeme NAI occurs at the beginning of text, relabel it as lexeme UI.

d. If lexeme NAI occurs immediately following any of tokens UI or CAI,
   absorb the NAI into the previous token.

e. Absorb all members of lexemes POhA, DAhO, FUhO, FUhE, UI, Y, and CAI
   into the previous token.  All of these null grammar tokens are
   permitted following any word of the grammar, without interfering with
   that word's grammatical function, or causing any effect on the
   grammatical interpretation of any other token in the text.  Indicators
   at the beginning of text are explicitly handled by the grammar.

Step 5 - Insertion of Lexer Lexemes

Lojban is not in itself LALR1.  There are words whose grammatical function
is determined by following tokens.  As a result, parsing of the YACC
grammar must take place in two steps.  In the first step, certain strings
of tokens with defined grammars are identified, and either

a. are replaced by a single specified 'lexer token' for step 6, or

b. the lexer token is inserted in front of the token string to identify
   it uniquely.

The YACC grammar included herein is written to make YACC generation of a
step 6 parser easy regardless of whether a. or b. is used.  The strings of
tokens to be labelled with lexer lexemes are found in rule terminals
labelled with numbers between 900 and 1099.  These rules are defined with
the lexer lexemes inserted, with the result that it can be verified that
the language is LALR1 under option b. after steps 1 through 4 have been
performed.  Alternatively, if option a. is to be used, these rules are
commented out, and the rule terminals labelled from 800 to 899 refer to
the lexer tokens >without< the strings of defining tokens.  Two sets of
lexer lexeme tokens are defined in the token set so as to be compatible
with either option.

In this step, the strings must be labelled with the appropriate lexer
tokens.  Order of inserting lexer tokens >IS< significant, since some
shorter strings that would be marked with a lexer token may be found
inside longer strings.  If the tokens are inserted before or in place of
the shorter strings, the longer strings cannot be identified.  Insert the
lexer tokens in the following order:

1. All lexer tokens not specifically deferred to 2., 3., 4., and 5. below.
2. lexer_O (tenses/modals)
3. lexer_K
4. lexer_B, lexer_L, lexer_P, lexer_Q, lexer_R, and lexer_S
5. lexer_E and lexer_F

Step 6 - YACC Parsing

YACC should now be able to parse the Lojban text in accordance with the
rule terminals labelled from 1 to 899 under option 5a, or 1 to 1099 under
option 5b.  If option 5a is used, comment out the rules numbered 900 and
above and the 700-series lexer tokens, and restore the lexer tokens
numbered from 900 up.
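
The erasure rules of Step 2 e-g and the termination rule of Step 3 can be
sketched in C as below.  This is a minimal illustration under stated
assumptions, not the preparser used with this grammar: the lexeme codes
(enum lexeme) and the function name erase_and_terminate are hypothetical;
the input is assumed to be already categorized (Step 2 d) with quotations
condensed into single tokens (Step 2 a-c); no speaker identification is
assumed for SU; and an SA or SU that finds no stop lexeme is assumed to
erase back to the start of text, which the rules above do not specify.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical lexeme codes; L_OTHER stands for every lexeme not named here.
enum lexeme { L_OTHER, L_I, L_NIhO, L_LU, L_TUhE, L_TO, L_SI, L_SA, L_SU, L_FAhO };

// Lexemes at which an SA stops (Step 2 f).
static int is_sa_stop(enum lexeme t) {
    return t == L_I || t == L_NIhO || t == L_LU || t == L_TUhE || t == L_TO;
}

// Lexemes at which an SU stops (Step 2 g): the SA list without I.
static int is_su_stop(enum lexeme t) {
    return t == L_NIhO || t == L_LU || t == L_TUhE || t == L_TO;
}

// Applies Step 2 e-g, then Step 3, to toks[0..n) in place.
// Returns the number of tokens left for Step 4.
size_t erase_and_terminate(enum lexeme *toks, size_t n) {
    assert(toks != NULL || n == 0);
    size_t out = 0;  // toks[0..out) holds the text that survives so far

    for (size_t i = 0; i < n; i++) {
        enum lexeme t = toks[i];
        if (t == L_SI) {
            // Step 2 e: drop the SI and the token before it.
            if (out > 0) out--;
        } else if (t == L_SA || t == L_SU) {
            // Step 2 f/g: drop back to and including the nearest stop lexeme.
            while (out > 0) {
                enum lexeme prev = toks[--out];
                if (t == L_SA ? is_sa_stop(prev) : is_su_stop(prev)) break;
            }
        } else {
            toks[out++] = t;
        }
    }

    // Step 3: a FAhO that survived the erasers ends the text.
    for (size_t i = 0; i < out; i++)
        if (toks[i] == L_FAhO) return i;
    return out;
}
```

For example, I OTHER I OTHER OTHER SA reduces to I OTHER, while
OTHER FAhO SI OTHER keeps both OTHER tokens, because the SI erases the
FAhO before Step 3 runs.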
*/

/* In the following token definitions, lexeme names in the comments are the
equivalent lexemes from earlier Institute versions of Loglan, to aid those
familiar with the older grammar. */

%token A_501 /* A = eks; basic afterthought logical operators */
%token BAI_502 /* PA = modal operators */
%token BAhE_503 /* next word intensifier */
%token BE_504 /* JE = sumti link to attach sumti to a bri_unit */
%token BEI_505 /* JUE = sumti link to attach >2nd sumti */
%token BEhO_506 /* terminates BE/BEI specified descriptors */
%token BIhI_507 /* interval component of JOI */
%token BO_508 /* CI = joins two units with shortest scope */
%token BRIVLA_509 /* PREDA = any brivla */
%token BRODA_510 /* PREDA = bridi free variable; assigned with CEI */
%token BU_511 /* = turns an EK into a BY vowel lerfu */
%token BY_513 /* TAI = individual lerfu */
%token CAhA_514 /* specifies actuality/potentiality of tense */
%token CAI_515 /* afterthought intensifier */
%token CEI_516 /* pro-bridi assignment operator */
%token CMENE_517 /* DJAN = names; require consonant end, then pause; no LA or
                    DOI lexemes embedded; pause before if vowel initial and
                    preceded by a vowel */
%token CO_518 /* GO = tanru inversion operator */
%token COI_519 /* HOI = vocative marker; permitted inside names; must always
                  be followed by pause or DOI */
%token CU_520 /* GA = untensed bridi (tail?) marker */
%token CUhE_521 /* tense/modal question */
%token DAhO_524 /* cancel anaphora/cataphora assignments */
%token DOI_525 /* HOI = vocative marker */
%token DOhU_526 /* terminator for DOI-marked vocatives */
%token FA_527 /* modifier head generic case tag */
%token FAhA_528 /* superdirections in space */
%token FAhO_529 /* normally elided 'done pause' to indicate end of utterance
                   string */
%token FEhE_530 /* event interval mod flag */
%token FEhU_531 /* ends bridi to modal conversion */
%token FIhO_532 /* marks bridi to modal conversion */
%token FOI_533 /* end compound lerfu */
%token FUhE_535 /* open long scope for indicator */
%token FUhO_536 /* close long scope for indicator */
%token GA_537 /* KA = keks; forethought logical operators */
%token GEhU_538 /* = marker ending GOI relative clauses */
%token GI_539 /* KI = forethought medial marker */
%token GIhA_541 /* A = EK set for bridi */
%token GOI_542 /* JI = attaches a sumti modifier to a sumti */
%token GOhA_543 /* back-counting pro-bridi */
%token GUhA_544 /* GEK for tanru units, corresponds to JEKs */
%token I_545 /* I = sentence link */
%token JA_546 /* CA = Ceks; logical operators within metaphors */
%token JAI_547 /* modal conversion flag */
%token JOI_548 /* ZE = mixed connective */
%token KEhE_550 /* right terminator for KE groups */
%token KE_551 /* GE = left long scope marker */
%token KEI_552 /* GUE = right terminator, NU abstractions */
%token KI_554 /* multiple utterance scope for tenses */
%token KOhA_555 /* DA,BA = sumti anaphora */
%token KU_556 /* GU = right comma */
%token KUhO_557 /* right terminator, NOI relative clauses */
%token LA_558 /* LA = name descriptors */
%token LAU_559 /* lerfu prefixes */
%token LAhE_561 /* LAE = indirect/symbolic reference */
%token LE_562 /* LE = sumti descriptors */
%token LEhAVLA_564 /* borrowing brivla */
%token LEhU_565 /* LU = possibly ungrammatical text right quote */
%token LI_566 /* LIO = convert number to sumti */
%token LIhU_567 /* LU = grammatical text right quote */
%token LOhO_568 /* elidable terminator for LI */
%token LOhU_569 /* LI = possibly ungrammatical text left quote */
%token LU_571 /* LI = grammatical text left quote */
%token LUhI_572 /* set membership descriptor */
%token LUhU_573 /* set membership close delimiter */
%token ME_574 /* ME = converts a sumti into a bri_unit */
%token MEhU_575 /* terminator for ME */
%token MOhI_577 /* motion interval flag */
%token NA_578 /* NO = bridi negation operator */
%token NAI_581 /* NOI = -noi attached to words to negate them */
%token NAhE_582 /* scalar negation operator */
%token NIhO_583 /* NAU = new paragraph; change of subject */
%token NOI_584 /* JIA = attaches a subordinate clause to a sumti */
%token NU_585 /* PO = abstraction operator */
%token NUhI_586 /* marks the start of a termset */
%token NUhU_587 /* marks the end of a termset */
%token PEhA_588 /* figurative speech opening marker */
%token POhA_589 /* figurative speech closing marker */
%token POhO_591 /* short terminator after partial sentences */
%token PU_592 /* PA = tenses and modal operators */
%token RAhO_593 /* flag for modified interpretation of GOhI */
%token ROI_594 /* converts quantifier to intensional tense */
%token SA_595 /* metalinguistic eraser to the beginning of the current
                 utterance */
%token SE_596 /* NU = conversion operators */
%token SEI_597 /* metalinguistic bridi insert marker */
%token SEhU_598 /* metalinguistic bridi end marker */
%token SI_601 /* metalinguistic single word eraser */
%token SOI_602 /* reciprocal sumti marker */
%token SU_603 /* metalinguistic eraser of the entire text */
%token TAhE_604 /* tense interval property markers */
%token TEI_605 /* start compound lerfu */
%token TO_606 /* KIE = left parens */
%token TOI_607 /* KIU = right parens */
%token TUhE_610 /* multiple utterance scope mark */
%token TUhU_611 /* multiple utterance end scope mark */
%token UI_612 /* UI = attitudinals, indicators, discursives; also PEI and
                 KI'A */
%token VA_613 /* VA distance in space-time */
%token VAU_614 /* end of terms; ..and so forth */
%token VEhA_615 /* space-time interval size */
%token VIhA_616 /* space-time dimensionality marker */
%token XI_617 /* CI = subscripting operator; separated from scope operator
                 since they do two incompatible things */
%token Y_618 /* hesitation */
%token ZAhO_621 /* event properties - inchoative, etc. */
%token ZEhA_622 /* indicates interval size in a tense */
%token ZI_623 /* indicates scalar magnitude of a tense */
%token ZIhA_624 /* A = EK set for sumti modifiers */
%token ZO_625 /* LIU = single word metalinguistic quote marker */
%token ZOI_626 /* LIE = delimited quote marker */
%token ZOhU_627 /* GOI = prenex fronting comma */
%token BOI_651 /* MEX between operand delimiter */
%token FUhA_655 /* reverse Polish flag */
%token GAhO_656 /* open/closed interval suffix for brackets */
%token JOhI_657 /* flags an array operand */
%token KUhE_658 /* MEX right comma delimiter: MEX KU */
%token MAI_661 /* converts various things to utterance ordinals */
%token MAhO_662 /* converts BY to MEX operators */
%token MOI_663 /* -RA = converts number to a bridi */
%token MOhE_664 /* quantifies a sumti, inverse of LI lexeme */
%token NAhU_665 /* converts a bridi into a MEX operator */
%token NIhE_666 /* quantifies a bridi; inverse of MOI lexeme */
%token NUhA_667 /* turns a MEX operator into a bridi; inverse of MEhO */
%token PA_672 /* NI = numbers, also PI, RO, KIhO */
%token PEhO_673 /* Polish flag */
%token TEhU_675 /* closing gap for non-MEX constructs in MEX */
%token VEI_677 /* left MEX bracket */
%token VEhO_678 /* right MEX bracket */
%token VUhU_679 /* MEX operator */
%token words_697 /* a string of lexable Lojban words */
%token word_698 /* any single lexable Lojban word */
%token string_699 /* a possibly unlexable phoneme string */

/* The following tokens are the actual lexer tokens.
The _900 series tokens are duplicates that allow limited testing of lexer
rules in the context of the total grammar.  They are used in the actual
parser, where the 900 series rules are found in the lexer. */

%token lexer_A_701 /* flags a MAI utterance ordinal */
%token lexer_B_702 /* flags an EK unless EK_BO, EK_KE */
%token lexer_C_703 /* flags an EK_BO */
%token lexer_D_704 /* flags an EK_KE */
%token lexer_E_705 /* flags a JEK */
%token lexer_F_706 /* flags a JOIK */
%token lexer_G_707 /* flags a GEK */
%token lexer_H_708 /* flags a GUhEK */
%token lexer_I_709 /* flags a NAhE_BO */
%token lexer_J_710 /* flags a NA_KU */
%token lexer_K_711 /* flags an I_BO (may have JOIK/JEK lexer tags)*/
%token lexer_L_712 /* flags a PA, unless MAI (then lexer A) */
%token lexer_M_713 /* flags a GIhEK_BO */
%token lexer_N_714 /* flags a GIhEK_KE */
%token lexer_O_715 /* flags a modal operator BAI or compound */
%token lexer_P_716 /* flags a GIK */
%token lexer_Q_717 /* flags a BY_string unless MAI (then lexer_A) */
%token lexer_R_718 /* flags a GIhEK, not BO or KE */
%token lexer_S_719 /* flags an I, not BO */
%token lexer_T_720 /* flags a ZIhEK */
/* %token lexer_U_721 /* null */
/* %token lexer_V_722 /* null */
/* %token lexer_W_723 /* null */
/* %token lexer_X_724 /* null */
%token lexer_Y_725 /* flags a PA_MOI */

/*%token lexer_A_905 /* : lexer_A_701 utt_ordinal_root_906 */
/*%token lexer_B_910 /* : lexer_B_702 EKroot_911 */
/*%token lexer_C_915 /* : lexer_C_703 EKroot_911 BO_508 */
/*%token lexer_D_916 /* : lexer_D_704 EKroot_911 KE_551 */
/*%token lexer_E_925 /* : lexer_E_705 JEK_root_926 */
/*%token lexer_F_930 /* : lexer_F_706 JOIK_root_931 */
/*%token lexer_G_935 /* : lexer_G_707 GA_537 */
/*%token lexer_H_940 /* : lexer_H_708 GUhA_544 */
/*%token lexer_I_945 /* : lexer_I_709 NAhE_582 BO_508 */
/*%token lexer_J_950 /* : lexer_J_710 NA_578 KU_556 */
/*%token lexer_K_955 /* : lexer_K_711 I_root_956 BO_508 */
/*%token lexer_L_960 /* : lexer_L_712 PA_root_961 */
/*%token lexer_M_965 /* : lexer_M_713 GIhEK_root_991 BO_508 */
/*%token lexer_N_966 /* : lexer_N_714 GIhEK_root_991 KE_551 */
/*%token lexer_O_970 /* : lexer_O_715 modal_972 */
/*%token lexer_P_980 /* : lexer_P_716 GIK_root_981 */
/*%token lexer_Q_985 /* : lexer_Q_717 BY_string_A_986 */
/*%token lexer_R_990 /* : lexer_R_718 GIhEK_root_991 */
/*%token lexer_S_995 /* : lexer_S_719 I_root_996 */
/*%token lexer_T_1000 /* : lexer_T_720 ZIhEK_root_1001 */
/*%token lexer_U_1005 /* null */
/*%token lexer_V_1010 /* null */
/*%token lexer_W_1015 /* null */
/*%token lexer_X_1020 /* null */
/*%token lexer_Y_1025 /* : lexer_Y_725 PA_root_961 MOI_663 */

%start text_0

%%

text_0
    : text_A_1
    | indicators_411 text_A_1
    | free_modifier_32 text_A_1
    | cmene_A_404 text_A_1
    | indicators_411 free_modifier_32 text_A_1
    ;

text_A_1
    : JOIK_JEK_422 text_B_2
        /* incomplete JOIK_JEK without preceding I */
        /* compare note on utt_string_A_10 */
    | text_B_2
    ;

text_B_2
    : I_819 text_C_3
    | para_mark_410 text_C_3
    | text_C_3
    ;

text_C_3
    : paragraphs_4
        /* Only indicators that follow certain lexemes (cmene, TOI_607,
        LU_571, and the lexer_K and lexer_S I_roots and compounds) or that
        occur at the start of text (text_0) will survive the lexer; all
        other valid indicators will be absorbed.  The only strings for which
        indicators generate a potential ambiguity are those which contain
        NAI.  An indicator cannot be inserted between a token and its
        negating NAI; otherwise you cannot tell whether it is the indicator
        or the original token being negated. */
    | /* empty */
        /* An empty text is legal; formerly this was handled by the explicit
        appearance of FAhO_529, but FAhO is now absorbed by the preparser.
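
        The absorption that removes most indicators before parsing (Step 4
        of the header comment) can be sketched in C as five successive
        passes, one per rule a-e.  This is a hypothetical illustration, not
        the lexer used with this grammar: the names enum lex, struct tok and
        absorb_null_grammar are invented here, every lexeme not named in
        Step 4 is collapsed into X_OTHER, and absorbed tokens are recorded
        only as a count on the absorbing token.  It follows rule e
        literally and does not model the exceptions listed above for
        indicators after cmene, TOI, LU and the I_roots; an indicator at
        the very start of text survives only because there is no previous
        token to absorb it.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical lexeme codes: those named in Step 4, plus a catch-all.
enum lex { X_OTHER, X_BY, X_BU, X_NAI, X_UI, X_CAI, X_Y,
           X_POhA, X_DAhO, X_FUhO, X_FUhE, X_BAhE, X_PEhA };

struct tok {
    enum lex lex;
    int absorbed;  // how many tokens have been folded into this one
};

// The null-grammar lexemes of Step 4 e.
static int in_step_4e(enum lex l) {
    return l == X_POhA || l == X_DAhO || l == X_FUhO || l == X_FUhE ||
           l == X_UI || l == X_Y || l == X_CAI;
}

// Folds src into dst, keeping the count of what dst now carries.
static void fold(struct tok *dst, const struct tok *src) {
    dst->absorbed += 1 + src->absorbed;
}

// Applies Step 4 a-e to t[0..n) in place; returns the surviving length.
size_t absorb_null_grammar(struct tok *t, size_t n) {
    assert(t != NULL || n == 0);
    size_t out = 0, pending = 0;

    // a: BAhE/PEhA attach to the following token; trailing ones become UI.
    for (size_t i = 0; i < n; i++) {
        if (t[i].lex == X_BAhE || t[i].lex == X_PEhA) { pending++; continue; }
        t[out] = t[i];
        t[out++].absorbed += (int)pending;
        pending = 0;
    }
    for (; pending > 0; pending--)
        t[out++] = (struct tok){ X_UI, 0 };
    n = out;

    // b: BU is absorbed into the previous token, which becomes a BY.
    out = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].lex == X_BU && out > 0) {
            t[out - 1].lex = X_BY;
            fold(&t[out - 1], &t[i]);
        } else {
            t[out++] = t[i];
        }
    }
    n = out;

    // c: a NAI at the start of text is relabelled UI.
    if (n > 0 && t[0].lex == X_NAI) t[0].lex = X_UI;

    // d: a NAI right after UI or CAI is absorbed into that token.
    out = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].lex == X_NAI && out > 0 &&
            (t[out - 1].lex == X_UI || t[out - 1].lex == X_CAI))
            fold(&t[out - 1], &t[i]);
        else
            t[out++] = t[i];
    }
    n = out;

    // e: indicators and the other null-grammar tokens join the previous token.
    out = 0;
    for (size_t i = 0; i < n; i++) {
        if (in_step_4e(t[i].lex) && out > 0)
            fold(&t[out - 1], &t[i]);
        else
            t[out++] = t[i];
    }
    return out;
}
```

        For example, OTHER UI NAI OTHER leaves two tokens: rule d folds the
        NAI into the UI, and rule e then folds that UI into the first OTHER.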
        */
    ;

paragraphs_4
    : utt_string_A_10
    | utt_string_A_10 para_mark_410 paragraphs_4
    ;

utt_string_A_10
    : utt_string_B_11
    | utt_string_A_10 I_819 utt_string_B_11
    | utt_string_A_10 I_819 POhO_gap_455
        /* this last fixes an erroneous start to a sentence, and permits
        incomplete JOIK_JEK after I, as well as in answer to questions on
        those connectives */
    ;

utt_string_B_11
    : utt_string_C_12
    | utt_string_C_12 I_BO_811 utt_string_B_11
    | utt_string_C_12 I_BO_811 POhO_gap_455
        /* this last fixes an erroneous start to a sentence, and permits
        incomplete JOIK_JEK after I, as well as in answer to questions on
        those connectives */
    ;

utt_string_C_12
    : utterance_20
    | TUhE_610 paragraphs_4 TUhU_gap_454
    | header_terms_30 TUhE_610 paragraphs_4 TUhU_gap_454
    | PU_mod_491 TUhE_610 paragraphs_4 TUhU_gap_454
    ;

utterance_20
    : EK_802
    | NA_578 POhO_gap_455
    | GIhEK_818
    | ZIhEK_820
    | quantifier_300 POhO_gap_455
    | terms_80 VAU_gap_456
        /* answer to ma */
        /* mod_head_490 requires both gap_450 and VAU_gap_456 but needs no
        extra rule to accomplish this */
    | relative_clause_110
    | links_161
    | linkargs_160
    | sentence_40
    | header_terms_30
    ;

header_terms_30
    : terms_80 ZOhU_627
    | terms_80 ZOhU_627 free_modifier_32
    ;

free_modifier_32
    : free_modifier_A_33
    | free_modifier_32 free_modifier_A_33
    ;

free_modifier_A_33
    : vocative_35
    | parenthetical_36
    | discursive_bridi_34
    | subscript_486
    | utterance_ordinal_801
    ;

discursive_bridi_34
    : SEI_440 bri_string_130 SEhU_gap_459
    | SOI_602 sumti_90 SEhU_gap_459
    | SOI_602 sumti_90 sumti_90 SEhU_gap_459
    | SEI_440 terms_80 front_gap_451 bri_string_130 SEhU_gap_459
    | SEI_440 terms_80 bri_string_130 SEhU_gap_459
    ;

vocative_35
    : DOI_415 bri_string_130 DOhU_gap_457
    | DOI_415 bri_string_130 relative_clause_110 DOhU_gap_457
    | DOI_415 cmene_A_404 DOhU_gap_457
    | DOI_415 cmene_A_404 relative_clause_110 DOhU_gap_457
    | DOI_415 sumti_90 DOhU_gap_457
    | DOI_415 DOhU_gap_457
    ;

parenthetical_36
    : TO_606 text_0 TOI_gap_468
    ;

sentence_40
    : bridi_tail_50 /* bare observative or mo answer */
    | sentenceA_41
    ;
sentenceA_41
    : GEK_807 sentenceA_41 GIK_816 sentenceA_41
    | header_terms_30 sentence_40
    | statement_42
    ;

statement_42
    : terms_80 front_gap_451 bridi_tail_50
    | terms_80 bridi_tail_50
    ;

bridi_tail_50
    : bridi_tail_A_51
    | bridi_tail_50 GIhEK_818 bridi_tail_A_51 tail_terms_71
    ;

bridi_tail_A_51
    : bridi_tail_B_52
    | bridi_tail_B_52 GIhEK_BO_813 bridi_tail_A_51 tail_terms_71
    ;

bridi_tail_B_52
    : bridi_tail_C_53
    | bridi_tail_C_53 GIhEK_KE_814 bridi_tail_50 KEhE_gap_466 tail_terms_71
    ;

bridi_tail_C_53
    : gek_bridi_tail_54
    | bri_string_130 tail_terms_71
    ;

gek_bridi_tail_54
    : GEK_807 bridi_tail_50 GIK_816 bridi_tail_C_53
    | PU_mod_491 KE_551 gek_bridi_tail_54 KEhE_gap_466
    | NA_578 gek_bridi_tail_54
    ;

tail_terms_71
    : terms_80 VAU_gap_456
    | VAU_gap_456
    ;

terms_80
    : term_81
    | terms_80 term_81
    ;

term_81
    : sumti_90
    | modifier_82
    | term_set_83
    | NA_KU_810
    ;

modifier_82
    : mod_head_490 gap_450
    | mod_head_490 sumti_90
    ;

term_set_83
    : NUhI_586 GEK_807 terms_80 NUhU_gap_460 GIK_816 terms_80 NUhU_gap_460
    | NUhI_586 NAhE_582 GEK_807 terms_80 NUhU_gap_460 GIK_816 terms_80 NUhU_gap_460
    | NUhI_586 terms_80 NUhU_gap_460 EK_802 terms_80 NUhU_gap_460
    ;

sumti_90
    : sumti_A_91
    | sumti_90 JOIK_EK_421 sumti_A_91
    ;

sumti_A_91
    : sumti_B_92
    | sumti_B_92 EK_BO_803 sumti_A_91
    ;

sumti_B_92
    : sumti_C_93
    | sumti_B_92 EK_KE_804 sumti_90 KEhE_gap_466
    ;

sumti_C_93
    : sumti_D_95
    | indefinite_sumti_94
    ;

indefinite_sumti_94
    : quantifier_300 sumti_tail_A_114 gap_450
    | quantifier_300 sumti_tail_A_114 gap_450 relative_clause_110
    ;

sumti_D_95
    : sumti_E_96
    | quantifier_300 sumti_E_96
    ;

sumti_E_96
    : sumti_F_97
    | LAhE_561 sumti_C_93
    | NAhE_BO_809 sumti_C_93
    ;

sumti_F_97
    : sumti_G_98
    | GEK_807 sumti_90 GIK_816 sumti_C_93
        /* negation of sumti GEK handled by negation of entire sumti in
        E_96 above */
    ;

sumti_G_98
    : sumti_H_99
    | sumti_H_99 relative_clause_110
    ;

sumti_H_99
    : anaphora_400
    | LA_558 cmene_A_404
    | LI_566 MEX_310 LOhO_gap_472
    | description_112
    | quote_arg_432
    ;

relative_clause_110
    : relative_clause_A_111
    | relative_clause_110 ZIhEK_820 relative_clause_A_111
    ;

relative_clause_A_111
    : GOI_542 term_81 GEhU_gap_464
    | NOI_584 sentence_40 KUhO_gap_469
    ;

description_112
    : LA_558 sumti_tail_113 gap_450
    | LE_562 sumti_tail_113 gap_450
    | LUhI_572 sumti_90 LUhU_gap_463
    ;

sumti_tail_113
    : sumti_tail_A_114
    | sumti_E_96 sumti_tail_A_114
    | quantifier_300 sumti_90
    ;

sumti_tail_A_114
    : bri_string_130
    | quantifier_300 bri_string_130
    ;

bri_string_130
    : PU_mod_491 bri_string_A_131
    | bri_string_A_131
    ;

bri_string_A_131
    : bri_string_B_132
    | bri_string_B_132 CO_518 bri_string_A_131
    | NA_578 bri_string_130
    ;

bri_string_B_132
    : bri_string_C_133
    | bri_string_B_132 bri_string_C_133
    ;

bri_string_C_133
    : bri_string_D_134
    | bri_string_C_133 JOIK_JEK_422 bri_string_D_134
    ;

bri_string_D_134
    : bri_unit_150
    | bri_unit_150 BO_508 bri_string_D_134
    | GUhEK_bri_unit_136
    | NAhE_582 GUhEK_bri_unit_136
    ;

GUhEK_bri_unit_136
    : GUhEK_808 bri_string_130 GIK_816 bri_string_D_134
    ;

bri_unit_150
    : bri_unit_A_151
    | bri_unit_150 CEI_516 bri_unit_A_151
    ;

bri_unit_A_151
    : bri_unit_B_152
    | NU_425 sentence_40 KEI_gap_453
    ;

bri_unit_B_152
    : bri_unit_C_153
    | NAhE_582 bri_unit_A_151
    ;

bri_unit_C_153
    : bri_unit_D_154
    | bri_unit_D_154 linkargs_160
    ;

bri_unit_D_154
    : bridi_valsi_407
    | KE_551 bri_string_B_132 KEhE_gap_466
    | SE_480 bri_unit_D_154
    | JAI_547 PU_mod_491 bri_unit_D_154
    | ME_574 sumti_90 MEhU_gap_465
    | ME_574 sumti_90 MEhU_gap_465 MOI_663
    | NUhA_667 MEX_operator_374
    ;

linkargs_160
    : BE_504 term_81 BEhO_gap_467
    | BE_504 term_81 links_161 BEhO_gap_467
    ;

links_161
    : BEI_505 term_81
    | BEI_505 term_81 links_161
    ;

/* Main entry point for MEX; everything but a number must be in parens. */

quantifier_300
    : PA_812
    | left_bracket_470 MEX_310 right_bracket_gap_471
    ;

/* Entry point for MEX used after LI; no parens needed, but LI now has an
elidable terminator.  (This allows us to express the difference between
"the expression a + b" and "the expression (a + b)".) */

/* This rule supports left-grouping infix expressions and reverse Polish
expressions.  To handle infix monadic, use a null operand; to handle infix
with more than two operands (whatever that means) use an extra operator or
an array operand. */

MEX_310
    : MEX_A_311
    | MEX_310 operator_370 MEX_A_311
    | FUhA_655 rp_expression_330
    ;

/* Support for right-grouping (short scope) infix expressions with BO. */

MEX_A_311
    : MEX_B_312
    | MEX_B_312 BO_508 operator_370 MEX_A_311
    ;

/* Support for forethought (Polish) expressions.  These begin with an
optional forethought flag (PEhO), then the operator, and then the
argument(s). */

MEX_B_312
    : operand_381
    | operator_370 MEX_C_313 MEX_gap_452
    | PEhO_673 operator_370 MEX_C_313 MEX_gap_452
    ;

MEX_C_313
    : MEX_B_312
    | MEX_C_313 MEX_B_312
    ;

/* Reverse Polish expressions always have exactly two operands.  To handle
one operand, use a null operand; to handle more than two operands, use a
null operator. */

rp_expression_330
    : rp_operand_332 rp_operand_332 operator_370
    ;

rp_operand_332
    : operand_381
    | rp_expression_330
    ;

/* Operators may be joined by logical connectives.
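
Aside on rp_expression_330 just above: because every reverse Polish
operator takes exactly two operands, such an expression can be evaluated
with a simple operand stack.  The C sketch below is illustrative only: the
function eval_rp is hypothetical, decimal numbers and the operators +, -
and * stand in for Lojban PA strings and VUhU operators, and the fixed
stack depth of 64 is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Evaluates a reverse Polish token list in which every operator takes
// exactly two operands.  Returns 0 and stores the value in *result on
// success, or -1 if the list is not a well-formed expression.
int eval_rp(const char *const *tok, size_t n, double *result) {
    double stack[64];
    size_t sp = 0;
    assert(tok != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *t = tok[i];
        if ((t[0] == '+' || t[0] == '-' || t[0] == '*') && t[1] == '\0') {
            if (sp < 2) return -1;          // an operator needs two operands
            double rhs = stack[--sp];
            double lhs = stack[--sp];
            stack[sp++] = t[0] == '+' ? lhs + rhs
                        : t[0] == '-' ? lhs - rhs
                        : lhs * rhs;
        } else {
            char *end;
            double v = strtod(t, &end);
            if (end == t || *end != '\0' || sp == 64) return -1;
            stack[sp++] = v;
        }
    }
    if (sp != 1) return -1;                 // leftover operands
    *result = stack[0];
    return 0;
}
```

For example, the list 3 4 + 2 * evaluates to 14, while 1 + is rejected
because its operator lacks a second operand.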
*/

operator_370
    : operator_A_371
    | operator_370 JOIK_JEK_422 operator_A_371
    ;

operator_A_371
    : operator_B_372
    | GUhEK_808 operator_A_371 GIK_816 operator_B_372
    ;

operator_B_372
    : operator_C_373
    | KE_551 operator_370 KEhE_gap_466
    ;

operator_C_373
    : MEX_operator_374 free_modifier_32
    | MEX_operator_374
    ;

MEX_operator_374
    : VUhU_679
    | SE_480 MEX_operator_374 /* changes argument order */
    | NAhE_582 MEX_operator_374 /* scalar negation */
    | MAhO_662 BY_string_817
        /* lerfu string as operator - shifts are independent of other lerfu */
    | NAhU_665 bridi_tail_50 TEhU_gap_473
    ;

operand_381
    : operand_A_382
    | operand_381 JOIK_EK_421 operand_A_382
    ;

operand_A_382
    : operand_B_383
    | operand_B_383 EK_BO_803 operand_A_382
    ;

operand_B_383
    : operand_C_384
    | operand_B_383 EK_KE_804 operand_381 KEhE_gap_466
    ;

operand_C_384
    : operand_D_385
    | LAhE_561 operand_C_384
    ;

operand_D_385
    : quantifier_300
    | BY_string_817
        /* lerfu string as operand - classic math variable */
    | NIhE_666 bri_unit_C_153 TEhU_gap_473
        /* quantifies a bridi - inverse of -MOI */
    | MOhE_664 sumti_90 TEhU_gap_473
        /* quantifies a sumti - inverse of LI */
    | JOhI_657 MEX_C_313 TEhU_gap_473
    | GEK_807 operand_381 GIK_816 operand_C_384
    ;

/* _400 series lexemes are mostly specific strings, some of which may also
be used by the lexer; the lexer should not use any reference to terminals
numbered less than _400, as they have grammars composed of
non-deterministic strings of lexemes.  Some above _400 are also of this
kind, so care should be taken; this is especially true for those that
reference free_modifier_32.
*/

anaphora_400
    : KOhA_555
    | KOhA_555 free_modifier_32
    | BY_string_817
    | BY_string_817 free_modifier_32
    ;

cmene_A_404
    : cmene_A_405
    | cmene_A_405 free_modifier_32
    ;

cmene_A_405
    : CMENE_517 /* pause */
    | cmene_A_405 CMENE_517 /* pause */
        /* multiple CMENE are identified morphologically (by the lexer)
        -- separated by consonant & pause */
    ;

bridi_valsi_407
    : bridi_valsi_408
    | bridi_valsi_408 free_modifier_32
    ;

bridi_valsi_408
    : BRIVLA_509
    | PA_MOI_823
    | BRODA_510
    | GOhA_543
    | GOhA_543 RAhO_593
    | LEhAVLA_564
    ;

para_mark_410
    : NIhO_583
    | NIhO_583 free_modifier_32
    | NIhO_583 para_mark_410
    ;

indicators_411
    : indicators_412
    | FUhE_535 indicators_412
    ;

indicators_412
    : indicator_413
    | indicators_412 indicator_413
    ;

indicator_413
    : UI_612
    | CAI_515
    | UI_612 NAI_581
    | CAI_515 NAI_581
    | Y_618
    | POhA_589
    | DAhO_524
    | FUhO_536
    ;

DOI_415
    : DOI_525
    | COI_416
    | COI_416 DOI_525
    ;

COI_416
    : COI_A_417
    | COI_416 COI_A_417
    ;

COI_A_417
    : COI_519
    | COI_519 NAI_581
    ;

JOIK_EK_421
    : EK_802
    | JOIK_806
    | JOIK_806 free_modifier_32
    ;

JOIK_JEK_422
    : JOIK_806
    | JOIK_806 free_modifier_32
    | JEK_805
    | JEK_805 free_modifier_32
    ;

NU_425
    : NU_A_426
    | NU_425 JOIK_JEK_422 NU_A_426
    ;

NU_A_426
    : NU_585
    | NU_585 NAI_581
    ;

quote_arg_432
    : quote_arg_A_433
    | quote_arg_A_433 free_modifier_32
    ;

quote_arg_A_433
    : ZOI_quote_434
    | ZO_quote_435
    | LOhU_quote_436
    | LU_571 text_0 LIhU_gap_448
    ;

/* The quoted material in the following three terminals must be identified
by the lexer, but no additional lexer processing is needed. */

ZOI_quote_434
    : ZOI_626 word_698 /*pause*/ string_699 /*pause*/ word_698
    ;

/* 'pause' is morphemic, represented by '.'; the lexer assembles
string_699. */

ZO_quote_435
    : ZO_625 word_698
    ;

/* 'word' may not be a compound, but it can be any valid Lojban lexeme
value, including ZO, ZOI, SI, SA, SU.  The preparser will not lex the word
per its normal lexeme.
*/

LOhU_quote_436
    : LOhU_569 words_697 LEhU_565
    ;

/* 'words' may be any Lojban words, with no claim of grammaticality; the
preparser will not lex the individual words per their normal lexemes.  This
is used to quote ungrammatical Lojban, equivalent to the * or ? writing
convention for such text. */

/* The preparser needs one bit of sophistication for this rule.  A quoted
string should be able to contain other quoted strings; this is only a
problem for a LOhU quote itself, since the LEhU closing the inner quote
would otherwise close the outer quote, which is incorrect.  For this
purpose, we will cheat on the use of ZO in such a quote (since this is
ungrammatical text, the parser ignores it anyway).  Use ZO to mark the
delimiters of any nested LOhU quotation.  The preparser then will absorb
them by the ZO rule, before testing for LOhU.  This is obviously not the
standard usage for ZO, which would otherwise cause the result to be a
sumti.  But, since the result will be part of an unparsed string anyway,
it doesn't matter. */

/* It may be seen that any of the ZO/ZOI/LOhU trio of quotation markers may
contain the powerful metalinguistic erasers.  Since these quotations are
not parsed internally, these operators are ignored within the quote.  To
erase a ZO quotation, then, two SIs are needed after the quoted word,
whatever its type.  ZOI takes four SIs, with the ENTIRE BODY OF THE QUOTE
treated as a single 'word' since it is one lexeme: one for the quote body,
two for the single-word delimiters, and one for the ZOI.  In LOhU, the
entire body is treated as a single word, so three SIs can erase it. */

/* All rule terminator names with 'gap' in them are potentially elidable,
where such elision does not cause an ambiguity.  This is implemented
through use of the YACC 'error' lexeme, which effectively recovers from an
elision.
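
The LOhU nesting convention described above for LOhU_quote_436 can be
sketched in C.  This is a hypothetical illustration, not the actual
preparser: words are plain strings, the names condense_quotes, enum qkind
and struct qtok are invented here, and only the ZO (Step 2 b) and LOhU
(Step 2 c) rules are modelled, with ZOI omitted.  Because ZO is applied
inside a LOhU body before LEhU is tested for, a zo-marked le'u does not
close the outer quotation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical token kinds: Q_LEX is a word to be lexed normally later;
// Q_WORD_698 and Q_WORDS_697 mirror the grammar's quotation tokens.
enum qkind { Q_LEX, Q_WORD_698, Q_WORDS_697 };

struct qtok {
    enum qkind kind;
    size_t first, count;  // span of input words covered by this token
};

static struct qtok qt(enum qkind k, size_t first, size_t count) {
    struct qtok q = { k, first, count };
    return q;
}

// Condenses ZO and LOhU quotations in w[0..n).  out needs room for 2*n
// tokens.  Returns the number of tokens written.
size_t condense_quotes(const char *const *w, size_t n, struct qtok *out) {
    size_t m = 0, i = 0;
    assert(w != NULL || n == 0);
    while (i < n) {
        if (strcmp(w[i], "zo") == 0 && i + 1 < n) {
            // Step 2 b: the next word is quoted as a single word_698.
            out[m++] = qt(Q_LEX, i, 1);
            out[m++] = qt(Q_WORD_698, i + 1, 1);
            i += 2;
        } else if (strcmp(w[i], "lo'u") == 0) {
            // Step 2 c: scan for le'u, letting zo hide the word after it.
            size_t j = i + 1;
            while (j < n && strcmp(w[j], "le'u") != 0)
                j += (strcmp(w[j], "zo") == 0 && j + 1 < n) ? 2 : 1;
            out[m++] = qt(Q_LEX, i, 1);
            out[m++] = qt(Q_WORDS_697, i + 1, j - i - 1);
            if (j < n) out[m++] = qt(Q_LEX, j, 1);   // the closing le'u
            i = j + 1;
        } else {
            out[m++] = qt(Q_LEX, i, 1);
            i++;
        }
    }
    return m;
}
```

In the sequence lo'u zo le'u broda le'u cu zo si, the first le'u is hidden
by zo, so the quotation body spans three words, and the final si is quoted
by zo rather than being left for the SI rule.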
*/ SEI_440 : SEI_597 | SEI_597 free_modifier_32 ; LIhU_gap_448 : LIhU_567 | error ; gap_450 : KU_556 | KU_556 free_modifier_32 | error ; front_gap_451 : CU_520 | CU_520 free_modifier_32 ; MEX_gap_452 : KUhE_658 | KUhE_658 free_modifier_32 | error ; KEI_gap_453 : KEI_552 | KEI_552 free_modifier_32 | error ; TUhU_gap_454 : TUhU_611 | TUhU_611 free_modifier_32 | error ; POhO_gap_455 : POhO_591 | POhO_591 free_modifier_32 | error ; VAU_gap_456 : VAU_614 | VAU_614 free_modifier_32 | error ; /* redundant to attach a free modifier on the following */ DOhU_gap_457 : DOhU_526 | error ; FEhU_gap_458 : FEhU_531 | FEhU_531 free_modifier_32 | error ; SEhU_gap_459 : SEhU_598 | error /* a free modifier on a discursive should be somewhere within the discursive. See SEI_440 */ ; NUhU_gap_460 : NUhU_587 | NUhU_587 free_modifier_32 | error ; BOI_gap_461 : BOI_651 | error ; LUhU_gap_463 : LUhU_573 | LUhU_573 free_modifier_32 | error ; GEhU_gap_464 : GEhU_538 | GEhU_538 free_modifier_32 | error ; MEhU_gap_465 : MEhU_575 | MEhU_575 free_modifier_32 | error ; KEhE_gap_466 : KEhE_550 | KEhE_550 free_modifier_32 | error ; BEhO_gap_467 : BEhO_506 | BEhO_506 free_modifier_32 | error ; TOI_gap_468 : TOI_607 | error ; KUhO_gap_469 : KUhO_557 | KUhO_557 free_modifier_32 | error ; left_bracket_470 : VEI_677 ; right_bracket_gap_471 : VEhO_678 | VEhO_678 free_modifier_32 | error ; LOhO_gap_472 : LOhO_568 | error ; TEhU_gap_473 : TEhU_675 | error ; right_br_no_free_474 : VEhO_678 | error ; SE_480 : SE_596 | SE_596 free_modifier_32 ; FA_481 : FA_527 | FA_527 free_modifier_32 ; subscript_486 : XI_617 PA_812 | XI_617 left_bracket_470 MEX_310 right_br_no_free_474 | XI_617 BY_string_817 ; mod_head_490 : PU_mod_491 | FA_481 ; PU_mod_491 : tense_modal_815 | PU_mod_491 JOIK_JEK_422 tense_modal_815 | CUhE_521 ; utterance_ordinal_801 : lexer_A_905 ; EK_802 : lexer_B_910 | lexer_B_910 free_modifier_32 ; EK_BO_803 : lexer_C_915 | lexer_C_915 free_modifier_32 ; EK_KE_804 : lexer_D_916 | lexer_D_916 
free_modifier_32 ; JEK_805 : lexer_E_925 ; JOIK_806 : lexer_F_930 ; GEK_807 : lexer_G_935 | lexer_G_935 free_modifier_32 ; GUhEK_808 : lexer_H_940 | lexer_H_940 free_modifier_32 ; NAhE_BO_809 : lexer_I_945 | lexer_I_945 free_modifier_32 ; NA_KU_810 : lexer_J_950 | lexer_J_950 free_modifier_32 ; I_BO_811 : lexer_K_955 | lexer_K_955 free_modifier_32 ; PA_812 : lexer_L_960 BOI_gap_461 | lexer_L_960 free_modifier_32 BOI_gap_461 /* BOI marks the break between adjacent numbers, as in "50 five-gallon drums"; it also serves to break between operands in MEX */ ; GIhEK_BO_813 : lexer_M_965 | lexer_M_965 free_modifier_32 ; GIhEK_KE_814 : lexer_N_966 | lexer_N_966 free_modifier_32 ; tense_modal_815 : lexer_O_970 | lexer_O_970 free_modifier_32 | FIhO_532 bri_string_130 FEhU_gap_458 ; GIK_816 : lexer_P_980 | lexer_P_980 free_modifier_32 ; BY_string_817 : lexer_Q_985 BOI_gap_461 | lexer_Q_985 free_modifier_32 BOI_gap_461 ; GIhEK_818 : lexer_R_990 | lexer_R_990 free_modifier_32 ; I_819 : lexer_S_995 | lexer_S_995 free_modifier_32 ; ZIhEK_820 : lexer_T_1000 | lexer_T_1000 free_modifier_32 ; PA_MOI_823 : lexer_Y_1025 ; /* The following rules are used only in lexer processing. They have been tested for ambiguity at various levels in the YACC grammar, but are implemented in the recursive-descent lexer of the current parser. The lexer inserts a lexer lexeme (lexer_A_701, lexer_B_702, and so on) before each string it has processed, but leaves the original tokens in place.
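The insertion just described can be sketched in C for one case, EKroot_911 (defined below). This is an illustration under assumptions, not the recursive-descent lexer itself:

- Lexemes arrive as already-classified codes.
- The optional no_FIhO_PU_mod_971 between the root and BO or KE is not handled.
- Lex, match_ekroot() and insert_ek_markers() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical lexeme codes for this sketch; the marker values stand for
// lexer_B_702, lexer_C_703 and lexer_D_704.
typedef enum {
    L_NA, L_SE, L_A, L_NAI, L_BO, L_KE, L_OTHER,
    L_LEXER_B, L_LEXER_C, L_LEXER_D
} Lex;

// Length of an EKroot_911 match starting at t[i], or 0 if none.
// The eight alternatives of EKroot_911 are exactly [NA] [SE] A [NAI].
static size_t match_ekroot(const Lex *t, size_t n, size_t i) {
    assert(i <= n);
    size_t j = i;
    if (j < n && t[j] == L_NA) j++;
    if (j < n && t[j] == L_SE) j++;
    if (j >= n || t[j] != L_A) return 0;
    j++;
    if (j < n && t[j] == L_NAI) j++;
    return j - i;
}

// Copies t[] to out[], inserting a lexer lexeme before each EK compound and
// keeping every original token. out[] needs room for 2*n entries.
static size_t insert_ek_markers(const Lex *t, size_t n, Lex *out) {
    size_t o = 0, i = 0;
    while (i < n) {
        size_t len = match_ekroot(t, n, i);
        if (len == 0) {
            out[o++] = t[i++];
            continue;
        }
        // BO after the root selects lexer_C (EK_BO_803), KE selects lexer_D
        // (EK_KE_804), and anything else gives a plain lexer_B (EK_802).
        Lex next = (i + len < n) ? t[i + len] : L_OTHER;
        out[o++] = (next == L_BO) ? L_LEXER_C
                 : (next == L_KE) ? L_LEXER_D
                 : L_LEXER_B;
        for (size_t k = 0; k < len; k++)
            out[o++] = t[i + k];
        i += len;
    }
    return o;
}
```

For example, the lexemes NA SE A NAI BO come out as lexer_C NA SE A NAI BO. The original tokens stay in place, and only the marker is added.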
*/ lexer_A_905 : lexer_A_701 utt_ordinal_root_906 ; utt_ordinal_root_906 : BY_string_A_986 MAI_661 | PA_root_961 MAI_661 ; lexer_B_910 : lexer_B_702 EKroot_911 ; EKroot_911 : A_501 | SE_596 A_501 | NA_578 A_501 | A_501 NAI_581 | SE_596 A_501 NAI_581 | NA_578 A_501 NAI_581 | NA_578 SE_596 A_501 | NA_578 SE_596 A_501 NAI_581 ; lexer_C_915 : lexer_C_703 EKroot_911 BO_508 | lexer_C_703 EKroot_911 no_FIhO_PU_mod_971 BO_508 ; lexer_D_916 : lexer_D_704 EKroot_911 KE_551 | lexer_D_704 EKroot_911 no_FIhO_PU_mod_971 KE_551 ; lexer_E_925 : lexer_E_705 JEK_root_926 ; JEK_root_926 : JA_546 | JA_546 NAI_581 | NA_578 JA_546 | NA_578 JA_546 NAI_581 | SE_596 JA_546 | SE_596 JA_546 NAI_581 | NA_578 SE_596 JA_546 | NA_578 SE_596 JA_546 NAI_581 ; lexer_F_930 : lexer_F_706 JOIK_root_931 ; JOIK_root_931 : JOI_548 | JOI_548 NAI_581 | SE_596 JOI_548 | SE_596 JOI_548 NAI_581 | BIhI_root_932 | GAhO_656 BIhI_root_932 GAhO_656 ; BIhI_root_932 : BIhI_507 | BIhI_507 NAI_581 | SE_596 BIhI_507 | SE_596 BIhI_507 NAI_581 ; lexer_G_935 : lexer_G_707 GA_537 | lexer_G_707 SE_596 GA_537 | lexer_G_707 GA_537 NAI_581 | lexer_G_707 SE_596 GA_537 NAI_581 | lexer_G_707 no_FIhO_PU_mod_971 GIK_root_981 | lexer_G_707 JOI_548 GIK_root_981 | lexer_G_707 SE_596 JOI_548 GIK_root_981 | lexer_G_707 BIhI_507 GIK_root_981 | lexer_G_707 SE_596 BIhI_507 GIK_root_981 ; lexer_H_940 : lexer_H_708 GUhA_544 | lexer_H_708 SE_596 GUhA_544 | lexer_H_708 GUhA_544 NAI_581 | lexer_H_708 SE_596 GUhA_544 NAI_581 ; lexer_I_945 : lexer_I_709 NAhE_582 BO_508 ; lexer_J_950 : lexer_J_710 NA_578 KU_556 ; lexer_K_955 : lexer_K_711 I_root_956 BO_508 | lexer_K_711 I_root_956 no_FIhO_PU_mod_971 BO_508 ; I_root_956 : I_545 | I_545 JOIK_JEK_957 ; JOIK_JEK_957 : JOIK_806 | JEK_805 ; /* no freemod in this version; cf. JOIK_JEK_422 */ /* this reference to a version of JOIK and JEK which already have the lexer_lexemes attached prevents shift/reduce errors. 
The problem is resolved in a hard-coded parser implementation, which builds lexer_K before lexer_S, and lexer_S before lexer_E and lexer_F. */ lexer_L_960 : lexer_L_712 PA_root_961 ; PA_root_961 : PA_672 | PA_root_961 PA_672 | PA_root_961 BY_987 ; lexer_M_965 : lexer_M_713 GIhEK_root_991 BO_508 | lexer_M_713 GIhEK_root_991 no_FIhO_PU_mod_971 BO_508 ; lexer_N_966 : lexer_N_714 GIhEK_root_991 KE_551 | lexer_N_714 GIhEK_root_991 no_FIhO_PU_mod_971 KE_551 ; lexer_O_970 : lexer_O_715 modal_972 ; /* the following rule is a lexer version of non-terminal tense_modal_815 for compounding PU/modals; it prevents the lexer from picking out FIhO clauses, which would require it to have knowledge of the main parser grammar */ no_FIhO_PU_mod_971 : modal_972 | no_FIhO_PU_mod_971 JOIK_JEK_957 modal_972 | CUhE_521 ; modal_972 : modal_A_973 | NAhE_582 modal_A_973 ; modal_A_973 : modal_B_974 | modal_B_974 KI_554 | KI_554 | tense_A_977 ; modal_B_974 : modal_C_975 | modal_C_975 NAI_581 ; modal_C_975 : BAI_502 | SE_596 BAI_502 ; tense_A_977 : tense_B_978 | CAhA_514 | tense_B_978 CAhA_514 ; /* specifies actuality/potentiality of the bridi */ /* puca'a = actually was */ /* baca'a = actually will be */ /* bapu'i = can and will have */ /* banu'o = can, but won't have yet */ /* canu'ojebapu'i = can, hasn't yet, but will */ tense_B_978 : tense_C_979 | tense_C_979 KI_554 ; tense_C_979 : time_1030 /* time-only */ /* space defaults to time-space reference space */ | space_time_1040 /* space-only unless specified with VIhA */ /* time defaults to the time-space reference time */ | time_1030 space_time_1040 /* time and space - when time_1030 leads, it specifies the time, and space_time_1040 specifies only space.
(If space_time_1040 is marked with VIhA for space-time, the tense may be self-contradictory.) */ /* an interval property before space_time is for time distribution */ ; lexer_P_980 : lexer_P_716 GIK_root_981 ; GIK_root_981 : GI_539 | GI_539 NAI_581 ; lexer_Q_985 : lexer_Q_717 BY_string_A_986 ; BY_string_A_986 : BY_987 | BY_string_A_986 BY_987 | BY_string_A_986 PA_672 ; BY_987 : BY_513 | LAU_559 BY_987 | TEI_605 BY_string_A_986 FOI_533 ; lexer_R_990 : lexer_R_718 GIhEK_root_991 ; GIhEK_root_991 : GIhA_541 | SE_596 GIhA_541 | NA_578 GIhA_541 | GIhA_541 NAI_581 | SE_596 GIhA_541 NAI_581 | NA_578 GIhA_541 NAI_581 | NA_578 SE_596 GIhA_541 | NA_578 SE_596 GIhA_541 NAI_581 ; lexer_S_995 : lexer_S_719 I_root_956 ; lexer_T_1000 : lexer_T_720 ZIhEK_root_1001 ; ZIhEK_root_1001 : ZIhA_624 | SE_596 ZIhA_624 | NA_578 ZIhA_624 | ZIhA_624 NAI_581 | SE_596 ZIhA_624 NAI_581 | NA_578 ZIhA_624 NAI_581 | NA_578 SE_596 ZIhA_624 | NA_578 SE_596 ZIhA_624 NAI_581 ; lexer_Y_1025 : lexer_Y_725 PA_root_961 MOI_663 | lexer_Y_725 BY_string_A_986 MOI_663 ; time_1030 : ZI_623 | ZI_623 time_A_1031 | time_A_1031 ; time_A_1031 : time_B_1032 | time_interval_1034 | time_B_1032 time_interval_1034 ; time_B_1032 : time_offset_1033 | time_B_1032 time_offset_1033 ; time_offset_1033 : time_direction_1035 | time_direction_1035 ZI_623 ; time_interval_1034 : ZEhA_622 | ZEhA_622 time_direction_1035 | interval_mod_1050 | ZEhA_622 interval_mod_1050 | ZEhA_622 time_direction_1035 interval_mod_1050 ; time_direction_1035 : PU_592 | PU_592 NAI_581 ; space_time_1040 : space_time_A_1042 | space_time_motion_1041 | space_time_A_1042 space_time_motion_1041 ; space_time_motion_1041 : MOhI_577 space_time_offset_1045 ; space_time_A_1042 : VA_613 | VA_613 space_time_B_1043 | space_time_B_1043 ; space_time_B_1043 : space_time_C_1044 | space_time_intval_1046 | space_time_C_1044 space_time_intval_1046 ; space_time_C_1044 : space_time_offset_1045 | space_time_C_1044 space_time_offset_1045 ; space_time_offset_1045 : space_direction_1048 |
space_direction_1048 VA_613 ; space_time_intval_1046 : space_time_intval_A_1047 | space_time_intval_A_1047 space_direction_1048 | FEhE_530 interval_mod_1050 | space_time_intval_A_1047 FEhE_530 interval_mod_1050 | space_time_intval_A_1047 space_direction_1048 FEhE_530 interval_mod_1050 ; space_time_intval_A_1047 : VEhA_615 | VIhA_616 | VEhA_615 VIhA_616 ; space_direction_1048 : FAhA_528 | FAhA_528 NAI_581 ; /* space_time_intval_A_1047 gives an interval size in space-time (VEhA), and possibly a dimensionality of the interval (VIhA). The dimensionality may also be used with the interval size left unspecified. When this rule is used for the space-time origin, then, barring any overriding VIhA, a VIhA here defines the dimensionality of the space-time being discussed. */ interval_mod_1050 : interval_prop_1051 | interval_prop_1051 event_mod_1052 | event_mod_1052 ; interval_prop_1051 : PA_root_961 ROI_594 | PA_root_961 ROI_594 NAI_581 | TAhE_604 | TAhE_604 NAI_581 ; /* extensional/intensional interval parameters */ /* These may be appended to any defined interval, or may stand in place of either time or space tenses. If no other tense is present, this rule stands for the time-space interval parameter with an unspecified interval. */ /* roroi = always and everywhere */ /* roroiku'avi = always here (ku'a = intersection) */ /* puroroi = always in the past */ /* paroi = once upon a time (somewhere) */ /* paroiku'avi = once upon a time here */ event_mod_1052 : event_mod_A_1053 | event_mod_1052 event_mod_A_1053 ; event_mod_A_1053 : ZAhO_621 | ZAhO_621 interval_prop_1051 ; /* The following are "Lexer-only rules", covered by steps 1-4 described at the beginning. The grammar of these lexemes is nonexistent, except possibly where they interact with each other; even there, however, the effects are semantic rather than grammatical.
Where it is believed possible that conflicts could exist, the grammar of these constructs has been put in the above grammar, even though the lexer/preparser will actually prevent these from being passed through to the parse routine. (Otherwise we would have to put unacceptably fancy code in the preparser to determine just when these can be passed through, and when they cannot.) Lexemes in this category include quotes and indicators as defined above. (The above grammar does, however, handle the utterance-scope (free_modifier) and clause-scope (gap) applications of the latter.) Indicators should be allowed to be absorbed into almost any word without changing its grammar. SI_601, SA_595, and SU_603 are metalinguistic erasers. token_1100 : word_698 | BAhE_503 word_698 | PEhA_588 word_698 ; null_1101 : word_698 SI_601 | possibly_unlexable_word (PAUSE) SI_601 | utterance_20 SA_595 | possibly unlexable string (PAUSE) SA_595 erases back to the most recent individual I or NIhO token, or to the start of text, ignoring the insides of ZOI, ZO, and LOhU/LEhU quotes. Start of text is defined for SU below. | text_C_3 SU_603 | possibly unparsable text (PAUSE) SU_603 erases back to the start of text, which is the beginning of a speaker's statement, a parenthesis (TO/TOI), a LU/LIhU quote, or a TUhE/TUhU utterance string. ; */ %%
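/* The SA and SU erasers described above run in the preparser rather than in YACC. The following minimal C sketch treats them as stack operations. It rests on assumptions beyond this file:

- ZO, ZOI and LOhU quotes are already single tokens.
- Erasing 'back to' an I or NIhO erases that token as well, as step 2f of the header words it.
- An unclosed LU, TO or TUhE marks the start of text and survives the erasure.
- A group already closed by LIhU, TOI or TUhU is erased as part of the text.

Tok, erase_point() and apply_sa_su() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical lexeme codes for this sketch. ZO, ZOI and LOhU quotes are
// assumed to have been condensed into single tokens already.
typedef enum {
    K_I, K_NIHO, K_LU, K_LIHU, K_TO, K_TOI, K_TUHE, K_TUHU,
    K_SA, K_SU, K_OTHER
} Tok;

static int is_opener(Tok k) { return k == K_LU || k == K_TO || k == K_TUHE; }
static int is_closer(Tok k) { return k == K_LIHU || k == K_TOI || k == K_TUHU; }

// Scans back from the top of the stack s[0..top) and returns the stack
// height the eraser should leave. SU stops at the start of the current
// text: just after an unclosed LU, TO or TUhE, or at the bottom. SA also
// stops at (and removes) the nearest I or NIhO at the same nesting level.
static size_t erase_point(const Tok *s, size_t top, Tok eraser) {
    int depth = 0;  // number of closed LU/TO/TUhE groups being skipped
    for (size_t i = top; i > 0; i--) {
        Tok k = s[i - 1];
        if (is_closer(k)) {
            depth++;
        } else if (is_opener(k)) {
            if (depth == 0) return i;  // keep the opener itself
            depth--;
        } else if (eraser == K_SA && depth == 0 && (k == K_I || k == K_NIHO)) {
            return i - 1;              // erase the I or NIhO too
        }
    }
    return 0;  // beginning of the speaker's statement
}

// Applies SA and SU to t[0..n), writing survivors to out[] (n entries).
static size_t apply_sa_su(const Tok *t, size_t n, Tok *out) {
    assert(out != NULL || n == 0);
    size_t top = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i] == K_SA || t[i] == K_SU)
            top = erase_point(out, top, t[i]);
        else
            out[top++] = t[i];
    }
    return top;
}
```

For example, the lexemes X LU Y SU leave X LU. By contrast, X LU Y LIhU Z SU leave nothing, because the closed quote belongs to the text being erased. */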