Post CTmDGcqJqnc

Paul Strack Jul 07, 2018 (19:36)

This is the last general post on Eldamo 0.6.6. It is more technical in nature, and is therefore primarily of interest to those who work with the raw Eldamo data for sites like elfdict.com.

There seems to be an increasing interest in collaborating with me on Eldamo data, which means I've needed to fix a major flaw in how Eldamo identifies its entries. Previously, Eldamo identified word entries only through the language and word itself in the raw XML data:


Eldamo also has a "page-id", but originally that ID was always derived from the word and was not fixed in value. For example, not long ago I changed ᴹQ. má “hand” to ᴹQ. má¹ “hand” because a distinct word was published in PE21: ᴹQ. má² “land, region”. This meant that the page-id for ᴹQ. má “hand” changed at that time (about a year ago).

This is no longer true. I've modified the Eldamo logic so that page-id is immutable, and won't change if the word value changes. This should make collaboration easier, because you can link to Eldamo data using the fixed and immutable page-id instead of the word value itself (which might change for typo corrections or because new and similar words are published).

This is how the internationalized data I mentioned in my previous post works: it uses the immutable page-id to link the non-English translations to the associated entry in Eldamo. This way, as I edit Eldamo, the non-English translations will remain linked even if the word changes in small ways.

This is not to say a page-id will always exist. Sometimes I discover entries in Eldamo that I now think are bogus, and delete them, and their associated page-id also disappears. But if I modify rather than delete an entry, the page-id will stay fixed.
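
As a rough illustration of why a fixed key helps, here is a minimal Python sketch of translations linked by page-id. The page-id value, the French gloss, and the dictionary layout are all invented for the example and are not taken from the actual Eldamo data.

```python
# All page-ids and the French gloss below are invented for illustration.
entries = {
    "1001": {"lang": "ᴹQ.", "word": "má", "gloss": "hand"},
}
translations_fr = {"1001": "main"}


def lookup(page_id):
    """Return (current word, French gloss), or None if the entry was deleted."""
    entry = entries.get(page_id)
    if entry is None:
        return None
    return entry["word"], translations_fr.get(page_id)


# Renaming the word leaves the key, and therefore the link, untouched.
entries["1001"]["word"] = "má¹"
after_rename = lookup("1001")

# Deleting the entry orphans its translation; consumers need to handle this.
del entries["1001"]
after_delete = lookup("1001")
orphaned = [pid for pid in translations_fr if pid not in entries]
```

After the rename, the lookup still returns the translation; after the deletion, it returns `None` and the translation shows up in `orphaned`, which is the case anyone linking by page-id would need to check for.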

Lokyt L. Jul 11, 2018 (11:54)

+Paul Strack May I take the liberty of using this post to draw your attention to another possible slip in Eldamo?

You list G. gwalir "a rime" (https://eldamo.org/content/words/word-2014687261.html) together with its synonym gwaidhi as a derivation of ᴱ√GʷIÐ.
However, gwalir is IMHO clearly a combination of the collective prefix *ŋwa- (https://eldamo.org/content/words/word-3499830491.html) and ᴱ√LIÐ (https://eldamo.org/content/words/word-2340811897.html), quite the same as golairin (https://eldamo.org/content/words/word-292998387.html). You even got it right with your ᴺS. adaptation (https://eldamo.org/content/words/word-3116099065.html) :-)
So gwalir doesn't belong with gwaidhi (and ᴱ√GʷIÐ) at all.

Paul Strack Jul 11, 2018 (15:55)

+Lokyt L. You are quite right. I have these arranged wrong. I agree that gwalir is a variant of golairin, not gwaidhi. I will rearrange the entries.

Lokyt L. Jul 16, 2018 (10:44)

+Paul Strack I'm afraid I can't leave you alone yet :)

I've noticed that some word variants, probably all from the set of those deleted by Tolkien at some time, still don't show in the Search. I found this out with some G. adverbs & adjectives, namely bodron (https://eldamo.org/content/words/word-3189058085.html, where it is furthermore wrongly placed with an adjective despite being explicitly marked as "av." in GL), egron (https://eldamo.org/content/words/word-3595745785.html) and fidron (https://eldamo.org/content/words/word-255498755.html), but it might very well be more widespread.
Is this intentional?

Paul Strack Jul 17, 2018 (06:36)

+Lokyt L. You are correct that this is intentional: deleted variants often don’t show up in searches. I have to make some compromises between “completeness” and “efficiency” and deleted forms often don’t make the cut.

Lokyt L. Jul 17, 2018 (07:42)

+Paul Strack I see. Thank you for the answer.
It makes some research procedures harder (as one cannot get a reliably complete list e.g. of all occurrences of one particular suffix - which is what I was trying to do), but I gather things like this aren't the primary purpose of Eldamo anyway.
(However, in the particular case of GL, I'd say most of the deleted variants actually deserve a place amongst the indexed ones more than their later replacements. The former mostly belong to the main, most coherent and most systematically composed layer of the text, whereas the later changes were AFAIK only occasional and rather erratic - cf. PE 11/2-3.)
Anyway, I'd still like to point out bodron's not being an adjective.

Paul Strack Jul 17, 2018 (15:21)

+Lokyt L. I don’t disagree with you. The Gnomish material could definitely use some cleanup and reorganization. I still intend to someday make a second pass through it to finish my phonetic analysis and describe each entry in more detail. It’s just a matter of finding the time, which isn’t easy. There are a lot of competing priorities.

Lokyt L. Jul 17, 2018 (16:30)

+Paul Strack Yeah, I know how that feels :)

Damien Bador Jul 20, 2018 (06:45)

+Paul Strack This for me is an issue. I believe the best solution would be to have a specific tag for deleted words, which would allow people to look up these words (or not). One could even see a value in searching only deleted words to investigate whether some recurring patterns appear.

Damien Bador Jul 20, 2018 (06:49)

+Paul Strack Meanwhile, I find your latest update destabilizing: some authentic words such as Tel. jagula no longer show up, while only its "regularized" version *yagula appears.

This feels like the wrong way to proceed, which will end up as a Neo-Elvish dictionary, and not a true dictionary of Tolkien's Elvish.

However, in this case again, a solution could be to have a tag for regularized orthographies, leaving readers to decide whether they want to search all forms, regularized forms only, or authentic forms only. The same could work for neologisms, by the way.

Paul Strack Jul 20, 2018 (07:21)

+Damien Bador these are good suggestions, but there are technical limitations on how advanced I can make the search engine given some of the early design decisions I’ve made. In particular, since I want the search to work offline in a browser without a backing database, there are limitations on how much information I can load in memory before things start running too slowly. I’m basically close to those limits now.

But those are really limitations of the search engine, not the data model itself. In theory you could parse the Eldamo data model into some other kind of engine and build the kind of searches you want. All the information is there in the raw XML model, including all the deleted forms and original spellings.
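
To make the idea concrete, here is a minimal Python sketch of loading word data into a separate index where deleted forms are flagged rather than dropped. The XML snippet is a made-up stand-in: the element and attribute names (`word`, `variant`, `page-id`, `l`, `v`, `deleted`), the language codes, and the page-id values are assumptions for illustration and may not match the real Eldamo schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Eldamo XML may be structured differently.
SAMPLE_XML = """
<words>
  <word page-id="1001" l="g" v="bodra">
    <variant v="bodron" deleted="true"/>
  </word>
  <word page-id="1002" l="tel" v="yagula">
    <variant v="jagula"/>
  </word>
</words>
"""


def build_index(xml_text):
    """Flatten headwords and variants into rows, keeping a 'deleted' flag."""
    root = ET.fromstring(xml_text)
    index = []
    for word in root.iter("word"):
        base = {"page_id": word.get("page-id"), "lang": word.get("l")}
        index.append({**base, "form": word.get("v"), "deleted": False})
        for variant in word.iter("variant"):
            is_deleted = variant.get("deleted") == "true"
            index.append({**base, "form": variant.get("v"), "deleted": is_deleted})
    return index


def find_by_suffix(index, suffix, include_deleted=False):
    """List forms ending in `suffix`, optionally including deleted forms."""
    return [
        row["form"]
        for row in index
        if row["form"].endswith(suffix) and (include_deleted or not row["deleted"])
    ]


index = build_index(SAMPLE_XML)
without_deleted = find_by_suffix(index, "ron")
with_deleted = find_by_suffix(index, "ron", include_deleted=True)
```

With this layout the caller decides whether deleted forms are included, so a suffix search can be run either way without changing the underlying data.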

Most people don’t have the technical expertise to work with the raw XML, but one of the things I’ve been considering is producing spreadsheet exports of Eldamo data for research purposes, since that is pretty accessible for most people. I just don’t have clear requirements right now for it.
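
As one possible shape for such an export, here is a minimal Python sketch that writes rows to CSV, which any spreadsheet program can open. The column names, page-id values, and language code are invented for the example; no actual export format has been defined.

```python
import csv
import io

# Hypothetical rows; column names and page-ids are invented for illustration.
rows = [
    {"page_id": "1001", "language": "g", "form": "bodra", "deleted": "no"},
    {"page_id": "1001", "language": "g", "form": "bodron", "deleted": "yes"},
]


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    fieldnames = ["page_id", "language", "form", "deleted"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
```

Keeping the page-id as a column would let researchers join their own notes back to the entries, and a `deleted` column would allow filtering in the spreadsheet itself.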

Damien Bador Jul 20, 2018 (09:41)

+Paul Strack Thanks for your answer. Indeed, if an alternate search engine could be built, this could alleviate my concerns. I'd have to take some time checking with competent friends whether this would be difficult to implement.

Damien Bador Jul 20, 2018 (10:36)

+Paul Strack As a matter of fact, I've hit a new "ERROR:MULTIGLOSS" in the entry for (h)róna. It's likely the new update has created some similar issues here and there.