Post dx2yzJXndVM

Severin Zahler Jan 26, 2017 (11:09)

Conlang Database project

I'm about to be done with exporting the eldamo data. The only big thing left to treat is all the relations between the words, and I already know how I'll export and store that data.

Right now, however, I am taking a step back and double-checking whether there's any data I might have left behind. One thing I am still adding is the marks on the words that indicate their reliability. Initially I only planned a mark for whether a word is reconstructed or not, but given the excellent data from eldamo, I want to make some finer distinctions.

+Paul Strack you have explained some of the marks on your Terminology page, but not all are mentioned there, so I'd quickly like to ask you about them.

Here are all the marks I found as reference and word attributes, along with your explanation if available, or my guess at what each may mean.

[ - ] ?Tolkien deleted this reference
[ * ] Unattested form, but easily deducible from other words
[ # ] Unattested form, but can be derived through well known grammatical rules
[ † ] Poetic / Archaic form
[ ^ ] ???
[ ? ] Unattested form, questionable deduction from references
[ ! ] Neologism with no safe derivation
[ ‽ ] Form marked with "?" by Tolkien
[ | ] ???

[ -† ] ?Deleted archaic word
[ †- ] ?Deleted archaic word
[ ^† ] ???
[ †# ] ?Archaic unattested form, derivable by grammar rules
[ |† ] ???
[ ?† ] ?Questionably derived archaic form
[ *† ] Archaic unattested form, but credible

[ ** ] ???
[ -* ] ?Unattested form that has been proven to be wrong
[ *- ] ?Unattested form that has been proven to be wrong
[ -** ] ???
[ *^ ] ???

[ ^# ] ???
[ #^ ] ???
[ |# ] ???
[ +#- ] ?Grammatical derivation that has been proven to be wrong

[ -? ] ?Questionably derived form that has been proven to be wrong
[ -‽ ] ?Form Tolkien initially marked with "?" and later deleted
[ |? ] ???
[ |‽ ] ???

[ *^ ] ???

If you could clear these up for me, that'd help me a lot!



Severin Zahler Jan 26, 2017 (15:27)

I'm just working through the "element" children with a "ref" as parent, and I'm stumbling over a few peculiarities:

- The reference "Ngoldothrim" at GG/15.1308 has an element child with only a "form" and "variant" attribute; no source or actual word given.

- The reference "Goldothrim" at LT1A/Noldoli.083 has an element "Golda" with no source.

These two are the only elements with refs as parents that have no source attribute.
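The check described above could be sketched roughly like this. This is only an illustration: the tag names `ref` and `element` and the attribute name `source` are taken from the wording in this thread, the sample XML and its values are invented, and the real eldamo file may be structured differently.

```python
import xml.etree.ElementTree as ET

# Invented sample data; tag and attribute names are assumptions based on this thread.
SAMPLE = """
<data>
  <ref source="SRC/01.0101">
    <element form="plural" variant="x"/>
  </ref>
  <ref source="SRC/02.0202">
    <element source="SRC/03.0303"/>
  </ref>
</data>
"""


def elements_without_source(root):
    """Return (parent ref source, child attributes) for element children lacking a source."""
    hits = []
    for ref in root.iter("ref"):
        for child in ref.findall("element"):
            if "source" not in child.attrib:
                hits.append((ref.get("source"), dict(child.attrib)))
    return hits


root = ET.fromstring(SAMPLE)
for parent_source, attrs in elements_without_source(root):
    print(parent_source, attrs)
```

Only direct `element` children of a `ref` are inspected, which matches the "elements with refs as parents" case described here.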

Paul Strack Jan 30, 2017 (01:34)

+Severin Zahler Sorry for the slow response. I somehow missed this post.

You mostly have the marks correct. The others:

** Are for forms that are known to be incorrect. Mostly they are things that Tolkien indicated should not happen.

| is for forms that appear in a deleted section that were not themselves individually deleted. It is a weaker version of -, which is used only for forms that were directly marked out by Tolkien. For example, if Tolkien wrote a word within some discussion of a grammatical function, then changed his mind and deleted that section of text, he might still have considered the word to be valid, having rejected the text for some other reason.

Finally, ^ is for a neologism adapted to a later version of a language based on established phonetic changes. It is mostly used for Noldorin words adapted to the phonology of Sindarin.

This last mark is part of a chain of certainty, ranging from (unmarked) > # > * > ^ > ! > **, going from most to least certain, with the last one being definitely wrong.
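For the database, this chain could be stored as a simple ordered list. The sketch below is just one possible encoding; the names `CERTAINTY_ORDER` and `certainty_rank` are made up for illustration, and the empty string stands for "unmarked".

```python
# Marks from most to least certain, as given in the chain above.
# "" stands for an unmarked form; "**" is a form known to be wrong.
CERTAINTY_ORDER = ["", "#", "*", "^", "!", "**"]


def certainty_rank(mark):
    """Return 0 for the most certain mark, larger numbers for less certain ones."""
    return CERTAINTY_ORDER.index(mark)  # raises ValueError for marks outside the chain


def is_more_certain(mark_a, mark_b):
    """True if mark_a sits earlier in the chain than mark_b."""
    return certainty_rank(mark_a) < certainty_rank(mark_b)


# Sort a few marks from most to least certain.
print(sorted(["!", "", "**", "#"], key=certainty_rank))
```

Marks that are not part of the chain (such as † or -) deliberately raise an error here, since the chain says nothing about where they would rank.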

All the other options are combinations of two other marks.
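A combined mark could therefore be split back into its parts before storing it. The following is a rough sketch under the assumption that `**` should always be read as one mark rather than as two `*`; the function name `split_marks` is invented, and characters outside the marks explained in this thread (for example the `+` in `+#-`) are reported rather than guessed at.

```python
# Atomic marks discussed in this thread. "**" comes first so that it is
# matched before a single "*" (an assumption about how to read combinations).
ATOMIC_MARKS = ["**", "-", "*", "#", "†", "^", "?", "!", "‽", "|"]


def split_marks(combined):
    """Split a combined mark string such as "-**" into its atomic marks."""
    parts = []
    i = 0
    while i < len(combined):
        for atom in ATOMIC_MARKS:
            if combined.startswith(atom, i):
                parts.append(atom)
                i += len(atom)
                break
        else:
            # No known mark starts at this position.
            raise ValueError(f"unknown mark character {combined[i]!r} in {combined!r}")
    return parts


print(split_marks("-**"))
print(split_marks("*†"))
```

Storing the parts as separate rows or flags would keep all the information while letting a query ask for, say, every deleted form regardless of what else it is marked with.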

I will look at the two element refs you mentioned. They are probably errors.

Paul Strack Jan 30, 2017 (01:38)

Sorry, one clarification on the mark [?]. It is usually used for forms that are not very legible in the original text, and therefore involve some guesses on the part of the editor (not me, the editor of the source document).

Severin Zahler Jan 31, 2017 (08:17)

Thank you very much for these explanations :D Will now see how I best add that to the database, on one hand I'd like to keep it simple and having the categories clear (i.e. so that anyone could judge what category a certain word should be in, without having to have a somewhat intuitive sense for it), on the other hand I don't really wanna lose any data!

Severin Zahler Jan 31, 2017 (09:53)

I'm a bit confused by the "*" mark. In your Terminology you wrote "* marks words that are unattested but can be reasonably deduced by comparison to other words", however when searching for "mark="*"" in the eldamo file I find many references with a proper primary source that bear this mark, so they can't be unattested as a whole. Could it be that, e.g., only the gloss is guessed? However, that would conflict with those references that have no gloss at all, assuming they're unknown...

Severin Zahler Jan 31, 2017 (11:22)

+Paul Strack I won't stop annoying you just yet, here's another thing that looks derpy:

The word yĕrĕ(n) [GL/38.4208-2] is used twice in a tag, both times a "mark" attribute is present, but no mark is actually given (spells [mark=""]).
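A scan for this kind of empty attribute might look like the sketch below. It assumes nothing about tag names and simply walks every node; the sample XML is invented for the demonstration.

```python
import xml.etree.ElementTree as ET


def empty_mark_elements(root):
    """Return every element that carries a mark attribute with an empty value."""
    # el.get("mark") is None when the attribute is absent, "" when it is empty.
    return [el for el in root.iter() if el.get("mark") == ""]


# Invented sample: one empty mark, one real mark, one element without a mark.
sample = ET.fromstring('<data><ref mark=""/><ref mark="*"/><ref/></data>')
print(len(empty_mark_elements(sample)))
```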

Paul Strack Feb 01, 2017 (07:13)

+Severin Zahler Regarding *, these are really reconstructed words rather than words that are not attested at all. The most common example of these are words that only appear as an element in a compound. That is why there are references with the * mark, which point to the portion of the compound where the word element appears.

I will take a look at the other issues you are reporting, but they sound like errors.

And don't apologize, I really appreciate the deep look you are taking at the Eldamo data, because you are definitely helping to improve its quality.

Paul Strack Feb 04, 2017 (19:14)

+Severin Zahler I corrected the errors you noted above. As before, I don't want to spend the day it requires to do a full Eldamo build, so I put the updated data file in a temporary location in github:

github.com - eldamo

Severin Zahler Feb 06, 2017 (16:40)

Skimming through the csv files I exported manually now, verifying that I exported the data correctly and did not miss any exceptions.

I found the following suspicious entries while doing so:
- PE17/157.9999 (word position 99?). There are also a lot of words with very high line numbers (99 and close to that). Although I think that may be possible, it just seems suspicious that the highest line number happens to be exactly 99 and not a hundred-something...
I could easily make you a list of all sources, sorted by line number, if you want.
- PE17/113.4296 (similar as above)
- LR/152.3692
- PE19/093.2562
- WJ/140.3350

Besides those, all entries with the layout source/page.line.word have a word position of 23 or lower.

- Ety/MIL-IK.025-22 (the end should most likely read "-2")
- PE17/048.41089 The fifth digit "9" is very exotic; all other entries that have such a fifth digit have it being 6 or lower, none with 7 or 8.

Note that this way I can only look out for extreme values; if there are typos that produced a result within a reasonable range, I have no way of finding them. If you have some criteria that may identify a common type of typo, I can gladly skim the data with my program for that!
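The extreme-value check on word positions could be sketched as follows. It assumes that the four digits after the dot are a two-digit line number followed by a two-digit word position, which is how the references are read in this post; the function name and the threshold parameter are made up, and references in any other layout are skipped rather than judged.

```python
import re

# Assumed layout: SOURCE/PAGE.LLWW (two-digit line, two-digit word position).
REF_PATTERN = re.compile(r"^(?P<source>[^/]+)/(?P<page>\d+)\.(?P<line>\d{2})(?P<word>\d{2})$")


def has_suspicious_word_position(ref, max_word=23):
    """True/False for references in the assumed layout, None for any other layout."""
    match = REF_PATTERN.match(ref)
    if match is None:
        return None
    return int(match["word"]) > max_word


for ref in ["PE17/157.9999", "PE17/113.4296", "GG/15.1308", "PE17/048.41089"]:
    print(ref, has_suspicious_word_position(ref))
```

As noted above, a threshold like this only catches outliers; a typo that lands inside the normal range passes unnoticed.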


If the gloss is preceded by a "*", e.g. gloss="*be", what exactly does that mean? I'm asking because I want to rearrange that and turn it into a separate mark, so that the word is easier to query for. The same goes for "?" in front of glosses.

Furthermore on the topic of glosses: I'm a bit unsure of whether there's any intended consistency in the use of the various brackets, (), [] and {} that are in use in glosses.

The following glosses seem particularly suspicious:
- page-id 722395161: the gloss has a space at the front (gloss=" the stop on a flute")
- GL/48.7501-2: gloss="*"
- VT42/17.2812-1: gloss="XXbefore" (can't write this as it reads it as formatting, the two XX represent two asterisks)
- WJ/319.0309 gloss="a fixed idea, ..., will" (the three dots appeared as a separate word in my export)
- EtyAC/KHIS.018, gloss=""
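The kinds of oddities listed above could be flagged automatically with a few simple rules. This is only a sketch of such rules; the function name and the problem labels are invented, and whether each case is really an error is for the data's author to decide.

```python
def gloss_problems(gloss):
    """Return a list of labels describing what looks odd about a gloss string."""
    problems = []
    if gloss == "":
        problems.append("empty")
    if gloss != gloss.strip():
        problems.append("surrounding whitespace")
    stripped = gloss.strip()
    if stripped and stripped.lstrip("*?") == "":
        problems.append("mark without text")
    if stripped.startswith("**"):
        problems.append("doubled asterisk")
    if "..." in gloss:
        problems.append("ellipsis")
    return problems


for gloss in [" the stop on a flute", "*", "**before", "a fixed idea, ..., will", "", "*be"]:
    print(repr(gloss), gloss_problems(gloss))
```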




Paul Strack Feb 07, 2017 (03:02)

The GL and QL can have up to 99 lines (or more), but the rest of those are suspicious and probably typos.

For marks on glosses, they modify the gloss rather than the word. For example "*be" means that, while the word is attested, its gloss is not but can be reasonably deduced.

For example, Tolkien wrote several Catholic prayers in Quenya, but did not directly translate them. We can still deduce what the glosses are, however, if we assume the Quenya prayer has the same meaning as the original.
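Moving a leading gloss mark into its own field, as asked about earlier in the thread, could be sketched like this. The function name is made up, and the sketch assumes that only `*` and `?` occur as leading gloss marks.

```python
def split_gloss_mark(gloss):
    """Split leading "*" or "?" characters off a gloss and return (mark, text)."""
    text = gloss.lstrip("*?")
    mark = gloss[: len(gloss) - len(text)]
    return mark, text


print(split_gloss_mark("*be"))
print(split_gloss_mark("be"))
```

Keeping the mark in a separate column means the gloss text itself can be searched without having to account for the prefix.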

"?" in the gloss is more speculative, possibly a nearly illegible gloss.