Post hevGScdr2Hu

Severin Zahler Feb 16, 2017 (15:15)

Another quick update and some more questions for clarification to +Paul Strack...

I decided to get rid of the intermediary tables which link the reallang words with the attested records/references and conlang words. Instead I will make the connection through the generic relation table. The main reason for that is to be able to easily store the marks on the glosses.

The thought I'm currently playing with is dividing the "words" from eldamo into two (or more) groups, similar to how it is done on the eldamo webpage, so that there's one table only for those entries you'd find in a wordlist, i.e. actual single words, and maybe people and place names; things like grammar or phonology notes, or sentences, may be stored elsewhere. That's just a thought though, and the difference in usability would only be minor.

http://i.imgur.com/dYAVy8a.png

As for questions I have, take this:

- I've noticed that many glosses are separated by a semicolon [ ; ] rather than a comma [ , ]. It seems like these serve to mark words with multiple very different meanings, but I'd like to know how consistent this practice is. If it is fairly consistent, I would consider splitting these into multiple separately stored variants of the Elvish words, so that the matching references and relations could be linked properly.
Here's the list with all glosses that contain a semicolon: http://pastebin.com/S6uNDwX7

- There are only a few Elvish words with semicolons. I'm not sure what the idea is here though: http://pastebin.com/F4tRuxd6

- The second topic I'm wondering about is the use of { ... } in glosses, words and references. Is there any consistency on the use of these specific brackets, and the other brackets (i.e. ( ... ) and [ ... ] ) which could be used to further segment the data?
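
For illustration, the semicolon/comma split from the first question could be sketched as below. This is only a sketch under assumptions: the function names are made up, the example gloss is invented rather than taken from the eldamo data, and it assumes that text inside ( ... ), [ ... ] or { ... } should never be split.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split text on sep, ignoring separators inside (), [] or {}."""
    openers = "([{"
    closers = ")]}"
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in openers:
            depth += 1
        elif ch in closers and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Separator outside any bracket: close the current part.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def split_gloss(gloss: str) -> list[list[str]]:
    """Return sense groups (split on ';'), each a list of glosses (split on ',')."""
    return [split_top_level(group, ",") for group in split_top_level(gloss, ";")]


# Invented example gloss, not from the real data.
print(split_gloss("to cleave, split; to remain (in one place, settled)"))
```

Each inner list would then correspond to one separately stored variant of the Elvish word, with its own references and relations.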

Severin Zahler Feb 16, 2017 (15:53)

current To-Do things with exporting are:
- export the last remaining types of relations and store them in a reasonable way
- treat sources that have the layout of e.g. PE17/100.1010 but where the page number is a Roman numeral (e.g. PM/xii.2808). Currently the program thinks they're of the Ety/ROOT.100 format, as the page part is not technically numerical.
- sources of format Ety/ROOT.100-1; the variants at the end are not yet treated.
- revise export of glosses with ";"
- revise export of elvish words with ";"
- revise bracket treatment
- revise entries with a comma within brackets; currently these get split in the middle of the bracket.
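
A possible way to tell the source formats in this list apart, as a rough sketch only: the field names (`locator`, `position`, `variant`) are hypothetical, and it assumes that Roman page numbers are always lowercase while entry keys like ROOT are not. If a lowercase entry key made up only of the letters i, v, x, l, c, d, m exists, it would be misclassified as a page.

```python
import re
from dataclasses import dataclass
from typing import Optional

# source "/" locator "." position, optionally followed by "-" variant
REF_PATTERN = re.compile(
    r"^(?P<source>[^/]+)/(?P<locator>[^.]+)\.(?P<position>\d+)(?:-(?P<variant>\d+))?$"
)
# Assumption: Roman page numbers are written in lowercase.
ROMAN_PATTERN = re.compile(r"^[ivxlcdm]+$")


@dataclass
class SourceRef:
    source: str
    locator: str
    locator_kind: str  # "page", "roman_page" or "entry"
    position: str      # kept as text so leading zeros survive
    variant: Optional[str]


def parse_ref(raw: str) -> SourceRef:
    match = REF_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised source reference: {raw!r}")
    locator = match["locator"]
    if locator.isdigit():
        kind = "page"
    elif ROMAN_PATTERN.match(locator):
        kind = "roman_page"
    else:
        kind = "entry"
    return SourceRef(match["source"], locator, kind, match["position"], match["variant"])


for ref in ("PE17/100.1010", "PM/xii.2808", "Ety/ROOT.100", "Ety/ROOT.100-1"):
    print(parse_ref(ref))
```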

Furthermore, I am considering slightly changing the marks. The mark ** in particular is problematic: it has two chars, which is nasty to treat programmatically, and it looks like it is related to the mark *, but actually is not. A possible alternative to it may be [ × ].

Paul Strack Feb 19, 2017 (02:57)

Again, sorry for the slow response. For some reason I am not getting notifications for your posts and I am not sure why. Some answers:

1) Your intuition on the use of [ ; ] is correct. I used it to divide conceptually distinct groups of glosses, as opposed to glosses that are mostly synonymous.

2) For [ ; ] in Elvish phrases, I think those are my attempts to separate phrases in Tolkien's original text that were separated simply by a large space.

3) For the use of { ... } in Elvish phrases, those indicate deleted or revised words, while [ ... ] indicate editorial additions and ( ... ) generally appear in the original. In some cases, I changed [ ... ] in the original to ( ... ) to avoid confusion with editorial additions.

In the case of phonetic descriptions, I used [ ... ] to enclose IPA phonemes and { ... } to represent variable groups. Thus: [x{bdg}] = [xb] or [xd] or [xg].
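
The variable-group notation can be expanded mechanically. The sketch below is illustrative only: `expand_groups` is a made-up name, and it assumes every character inside { ... } is one alternative, which would not hold for phonemes written with more than one character (e.g. with diacritics).

```python
import itertools
import re

GROUP = re.compile(r"\{([^{}]*)\}")


def expand_groups(pattern: str) -> list[str]:
    """Expand each {...} group into one string per alternative character."""
    # re.split with a capturing group alternates literal text (even indices)
    # with the contents of each group (odd indices).
    pieces = GROUP.split(pattern)
    options = [[p] if i % 2 == 0 else list(p) for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*options)]


print(expand_groups("[x{bdg}]"))  # ['[xb]', '[xd]', '[xg]']
```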

Regarding changing [ ** ] to [ × ], that sounds fine to me. I am sticking with **, though, because it is standard linguistic notation for an incorrect form.

Severin Zahler Feb 20, 2017 (11:04)

Thanks a lot for the answers, and don't worry about the response time, I'm already very glad I can get these answers at all! I asked about the brackets because I am considering transforming them (especially the {...} for deletions and revisions) into e.g. "change" relations. In general I'm trying to take as much additional information out of the words as possible, e.g. removing the marks from the glosses and storing them separately. That way the word, as it will eventually appear in the word list, is in as basic a form as possible, which makes filtering, sorting and searching as easy and reliable as possible.
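
Separating leading marks from a word might look roughly like this. It is a sketch under assumptions: `KNOWN_MARKS` lists only the two marks named in this thread (the real data uses more), the helper name and the placeholder words are invented, and marks are assumed to appear only at the start of a word.

```python
# Only the marks mentioned in this thread; longest first, so that "**"
# is not read as two separate "*" marks.
KNOWN_MARKS = ("**", "*")


def split_marks(text: str) -> tuple[list[str], str]:
    """Return (leading marks, bare word)."""
    marks = []
    rest = text.lstrip()
    while True:
        mark = next((m for m in KNOWN_MARKS if rest.startswith(m)), None)
        if mark is None:
            break
        marks.append(mark)
        rest = rest[len(mark):]
    return marks, rest.strip()


# Placeholder words, not real entries.
for raw in ("**wrongform", "*oldform", "plainform"):
    print(split_marks(raw))
```

The longest-first ordering is what the two-character mark forces on the code; with a single-character replacement for ** the ordering would no longer matter.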

For the marks: I didn't even know that these marks were some sort of standard^^

Paul Strack Feb 20, 2017 (16:27)

The braces { ... } don't appear in the entries for words, only for phrases. I used this notation because for phrases, it is often unclear what the order of changes is. Where ( ... ) and [ ... ] appear for words, you should probably preserve them.