Conlang Database project
I'm almost done exporting the Eldamo data. The only big thing left to handle is the relations between the words, and I already know how I'll export and store those.
Right now, however, I am taking a step back and double-checking whether there's any data I might have left behind. One thing I am still adding is the set of marks on the words that indicate their reliability. Initially I only planned a mark for whether a word is reconstructed or not, but given the excellent data from Eldamo, I want to make some finer distinctions.
+Paul Strack you have explained some of the marks on your Terminology page, but not all of them are mentioned there, so I'd quickly like to ask you about them.
Here are all the marks I found on references and as word attributes, along with your explanation where available, or otherwise my guess at what each may mean.
[ - ] ?Tolkien deleted this reference
[ * ] Unattested form, but easily deducible from other words
[ # ] Unattested form, but can be derived through well known grammatical rules
[ † ] Poetic / Archaic form
[ ^ ] ???
[ ? ] Unattested form, questionable deduction from references
[ ! ] Neologism with no safe derivation
[ ‽ ] Form marked with "?" by Tolkien
[ | ] ???
[ -† ] ?Deleted archaic word
[ †- ] ?Deleted archaic word
[ ^† ] ???
[ †# ] ?Archaic unattested form, derivable by grammar rules
[ |† ] ???
[ ?† ] ?Questionably derived archaic form
[ *† ] Archaic unattested form, but credible
[ ** ] ???
[ -* ] ?Unattested form that has been proven to be wrong
[ *- ] ?Unattested form that has been proven to be wrong
[ -** ] ???
[ *^ ] ???
[ ^# ] ???
[ #^ ] ???
[ |# ] ???
[ +#- ] ?Grammatical derivation that has been proven to be wrong
[ -? ] ?Questionably derived form that has been proven to be wrong
[ -‽ ] ?Form Tolkien initially marked with "?" and later deleted
[ |? ] ???
[ |‽ ] ???
If you could clear these up for me, that'd help me a lot!
Severin Zahler Jan 26, 2017 (15:27)
- The reference "Ngoldothrim" at GG/15.1308 has an element child with only a "form" and "variant" attribute; no source or actual word given.
- The reference "Goldothrim" at LT1A/Noldoli.083 has an element "Golda" with no source.
These two are the only elements with a ref as parent that have no source attribute.
Paul Strack Jan 30, 2017 (01:34)
You mostly have the marks correct. The others:
** is for forms that are known to be incorrect. Mostly they are things that Tolkien indicated should not happen.
| is for forms that appear in a deleted section that were not themselves individually deleted. It is a weaker version of -, which is used only for forms that were directly marked out by Tolkien. For example, if Tolkien wrote a word within some discussion of a grammatical function, then changed his mind and deleted that section of text, he might still have considered the word to be valid, having rejected the text for some other reason.
Finally, ^ is for a neologism adapted to a later version of a language based on established phonetic changes. It is mostly used for Noldorin words adapted to the phonology of Sindarin.
This last mark is part of a chain of certainty, ranging from (unmarked) > # > * > ^ > ! > **, going from most to least certain, with the last one being definitely wrong.
All the other options are combinations of two other marks.
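As a rough sketch of how the chain of certainty and the combined marks could be handled in an export, the Python below splits a combined mark into its components and ranks it by its least certain component. The names `CERTAINTY_CHAIN`, `OTHER_MARKS`, `split_marks` and `certainty_rank` are made up for illustration, and taking the least certain component as the rank of a combination is an assumption, not something stated in this thread. The marks - | † ? ‽ are kept outside the chain because the chain above does not place them.

```python
# Certainty chain as listed above, from most to least certain.
# The empty string stands for an unmarked form.
CERTAINTY_CHAIN = ["", "#", "*", "^", "!", "**"]

# Marks that the chain above does not place (labels paraphrase the thread).
OTHER_MARKS = {
    "-": "deleted by Tolkien",
    "|": "in a deleted section, not individually deleted",
    "†": "poetic / archaic",
    "?": "questionable deduction",
    "‽": 'marked with "?" by Tolkien',
}


def split_marks(mark: str) -> list[str]:
    """Split a combined mark such as '-**' into components, trying '**' before '*'."""
    known = sorted(
        [m for m in CERTAINTY_CHAIN if m] + list(OTHER_MARKS),
        key=len,
        reverse=True,
    )
    parts = []
    i = 0
    while i < len(mark):
        for token in known:
            if mark.startswith(token, i):
                parts.append(token)
                i += len(token)
                break
        else:
            # No known token matches at this position.
            raise ValueError(f"unknown mark character {mark[i]!r} in {mark!r}")
    return parts


def certainty_rank(mark: str) -> int:
    """Chain position of the least certain component (0 = unmarked).

    Assumption: a combination is only as certain as its weakest part.
    """
    ranks = [CERTAINTY_CHAIN.index(p) for p in split_marks(mark) if p in CERTAINTY_CHAIN]
    return max(ranks, default=0)
```

With this, `split_marks("|†")` gives the two components `|` and `†`, while a mark containing a character outside the known set (such as the `+` in `+#-`) raises a `ValueError` so it can be reviewed by hand.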
I will look at the two element refs you mentioned. They are probably errors.
Paul Strack Jan 30, 2017 (01:38)
Severin Zahler Jan 31, 2017 (08:17)
Severin Zahler Jan 31, 2017 (09:53)
Severin Zahler Jan 31, 2017 (11:22)
The word yĕrĕ(n) [GL/38.4208-2] is used twice in a
Paul Strack Feb 01, 2017 (07:13)
I will take a look at the other issues you are reporting, but they sound like errors.
And don't apologize. I really appreciate the deep look you are taking at the Eldamo data, because you are definitely helping to improve its quality.
Paul Strack Feb 04, 2017 (19:14)
github.com - eldamo
Severin Zahler Feb 06, 2017 (16:40)
I found the following suspicious entries while doing so:
- PE17/157.9999 (word position 99?). There are also a lot of words with very high line numbers (99 and close to that). That may well be possible, but it seems suspicious that the highest line number just happens to be 99 and not hundred-something...
I could easily make you a list of all sources, sorted by line number, if you want.
- PE17/113.4296 (similar as above)
- LR/152.3692
- PE19/093.2562
- WJ/140.3350
Besides those, all entries with the layout source/page.line.word have a word position of 23 or lower.
- Ety/MIL-IK.025-22 (the end should most likely read "-2")
- PE17/048.41089 The fifth digit "9" is very exotic: all other entries that have such a fifth digit have a 6 or lower there, none a 7 or 8.
Note that this way I can only look for extreme values. If there are typos that produced a result within a reasonable range, I have no way of finding them. If you have some criteria that may identify a common type of typo, I can gladly skim the data with my program for that!
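A check for extreme values like the ones listed above could look roughly like the following Python sketch. It is not the program used for the export. It assumes, based only on the examples in this thread, that a reference reads source/page.digits with an optional "-N" suffix, and that in a four-digit position the first two digits are the line and the last two the word position. The thresholds (word position above 23, fifth digit above 6, suffix above 9) are taken from or guessed from the observations above, and `REF_PATTERN` and `check_reference` are hypothetical names.

```python
import re

# Assumed layout: source/page.digits, optionally followed by "-N".
REF_PATTERN = re.compile(
    r"(?P<source>[^/]+)/(?P<page>[^.]+)\.(?P<digits>\d+)(?:-(?P<suffix>\d+))?"
)


def check_reference(ref: str, max_word: int = 23) -> list[str]:
    """Return a list of reasons why a reference id looks suspicious (empty if none)."""
    match = REF_PATTERN.fullmatch(ref)
    if match is None:
        return ["does not match the assumed source/page.digits layout"]

    problems = []
    digits = match.group("digits")
    suffix = match.group("suffix")

    # Four digits: assumed to be two-digit line followed by two-digit word position.
    if len(digits) == 4 and int(digits[2:]) > max_word:
        problems.append(f"word position {int(digits[2:])} above {max_word}")

    # Five digits: the thread reports no fifth digit above 6 apart from one outlier.
    if len(digits) == 5 and int(digits[4]) > 6:
        problems.append(f"fifth digit {digits[4]} above 6")

    # Suffix: a two-digit suffix such as "-22" is treated as a likely typo (assumption).
    if suffix is not None and int(suffix) > 9:
        problems.append(f"suffix -{suffix} has more than one digit")

    return problems
```

Run over the ids mentioned above, this flags PE17/157.9999, PE17/048.41089 and Ety/MIL-IK.025-22, and leaves GG/15.1308 alone. As noted, it can only catch values outside the expected range.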
If the gloss is preceded by a "*", e.g. gloss="*be", what exactly does that mean? I'm asking because I want to rearrange that and turn it into a separate mark, so that the word is easier to query for. The same goes for "?" in front of glosses.
Furthermore, on the topic of glosses: I'm a bit unsure whether there's any intended consistency in how the various brackets (), [] and {} are used in glosses.
The following glosses seem particularly suspicious:
- page-id 722395161: the gloss has a space at the front (gloss=" the stop on a flute")
- GL/48.7501-2: gloss="*"
- VT42/17.2812-1: gloss="XXbefore" (can't write this as it reads it as formatting, the two XX represent two asterisks)
- WJ/319.0309 gloss="a fixed idea, ..., will" (the three dots appeared as separate word in my export)
- EtyAC/KHIS.018, gloss=""
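The kinds of oddities in this list could be screened for with a small check like the Python sketch below. The function name `gloss_problems` and the exact rules are assumptions derived from the five examples above (stray whitespace, an empty gloss, a gloss made up only of marks, a doubled leading mark, and an ellipsis), not a description of the actual export program.

```python
def gloss_problems(gloss: str) -> list[str]:
    """Return the reasons a gloss looks suspicious (empty list if it looks fine)."""
    problems = []

    if gloss != gloss.strip():
        problems.append("leading or trailing whitespace")

    text = gloss.strip()
    if not text:
        problems.append("empty gloss")
        return problems

    # Separate leading "*" / "?" marks from the gloss text itself.
    body = text.lstrip("*?")
    marks = text[: len(text) - len(body)]

    if not body:
        problems.append("gloss consists only of marks")
    elif len(marks) > 1:
        problems.append("more than one leading mark")

    if "..." in body:
        problems.append("contains an ellipsis")

    return problems
```

A gloss such as "*be" passes without complaint, since a single leading mark is expected.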
Paul Strack Feb 07, 2017 (03:02)
For marks on glosses, they modify the gloss rather than the word. For example "*be" means that, while the word is attested, its gloss is not but can be reasonably deduced.
For example, Tolkien wrote several Catholic prayers in Quenya, but did not directly translate them. We can still deduce what the glosses are, however, if we assume the Quenya prayer has the same meaning as the original.
"?" in the gloss is more speculative, possibly a nearly illegible gloss.
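Turning the leading mark of a gloss into a separate field, as asked about earlier in the thread, could be sketched in Python like this. `Gloss` and `split_gloss` are hypothetical names, and the sketch assumes a gloss carries at most one leading "*" or "?".

```python
from typing import NamedTuple


class Gloss(NamedTuple):
    mark: str  # "*" = deduced gloss, "?" = speculative gloss, "" = attested gloss
    text: str  # the gloss without its leading mark


def split_gloss(raw: str) -> Gloss:
    """Split a single leading '*' or '?' off a gloss string."""
    if raw[:1] in ("*", "?"):
        return Gloss(mark=raw[0], text=raw[1:])
    return Gloss(mark="", text=raw)
```

Stored this way, the gloss text can be queried directly, and the mark stays available as a property of the gloss rather than of the word.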