Post Neg7i6MPEAr

Severin Zahler Mar 23, 2017 (12:09)

*Conlang Database Project*
Update March 23rd

Over the past few weeks I've been busy working on a few other things, but that turned out to benefit this project as well, as I got to know the ins and outs of what you can do with an HTML form backed by a lot of JS and CSS. Knowing what is possible, I have now written a first concept for the interface with which one will be able to search the database. And I have gone nuts with my ideas: nothing shall be impossible, and all of it shall be easy to use even without any technical knowledge, although I also want to make it possible to inject RegEx and maybe even custom SQL SELECT statements.

Here's the written draft so far; I'll work on a GUI mock-up next, which may make the options a bit easier to understand. If you have any further ideas about what sort of queries one might want to run on this database, please voice them!

Furthermore I documented what, and how, I exported from eldamo so far, so that may be interesting for you +Paul Strack!

While writing the document about the search form I constantly had to consider how to display the information I'd like to gather in my database. I have now combined those thoughts with the big open question: how much of eldamo do I want to export? +Lúthien Merilin kicked off the discussion about the more in-depth aspects of eldamo, and Paul especially elaborated on the phonological rules. Connected to these is a very large field of data with a great many facets, which I have yet to understand in its entirety. For now I have decided to exclude this and other sorts of data, e.g. the data of the grammar pages, from my project, for the following reasons:

- My database is aimed to help people (inexperienced and professional students of elvish alike) with making translations with the mindset of "getting it done", i.e. not for creating new theories about the languages, but to use the languages.
- I don't know how to display this data in a meaningful way, or what options to supply to query it.
- I do not want to replace eldamo; on the contrary, I have now decided that I'd much rather incorporate eldamo into my database by providing links to the detailed word pages along with the results (given you are okay with this idea, Paul!)
- It makes Lúthien's project differ more from mine, which makes either of the two more interesting to follow, and eventually (hopefully) use frequently for working with Tolkien's languages.

More concretely, I will discard all phonology-related elements from the eldamo XML file by filtering out the elements with the following speech attribute values:
- phoneme
- phonetic-rule
- grammar
- phonetic-group
- phonetics

Everything else, including names, phrases, text and roots, I will include.
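A minimal sketch of that filtering step, using Python's standard `xml.etree.ElementTree`. The element name `word`, the attribute `v`, the root element `word-data` and the sample entries are assumptions made for illustration; only the `speech` attribute and the five excluded values come from the list above, and the real eldamo file may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Speech values to drop, taken from the list above.
EXCLUDED_SPEECH = {"phoneme", "phonetic-rule", "grammar", "phonetic-group", "phonetics"}


def strip_excluded(parent):
    """Recursively remove child elements whose speech attribute is excluded."""
    for child in list(parent):  # copy the child list, since we mutate the parent
        if child.get("speech") in EXCLUDED_SPEECH:
            parent.remove(child)
        else:
            strip_excluded(child)


# Tiny made-up sample (hypothetical layout, not copied from eldamo).
SAMPLE = """
<word-data>
  <word v="acharn" speech="n"/>
  <word v="some-rule" speech="phonetic-rule"/>
  <word v="Aelin-uial" speech="place-name">
    <word v="nested-grammar" speech="grammar"/>
  </word>
</word-data>
"""

root = ET.fromstring(SAMPLE)
strip_excluded(root)
kept = [w.get("v") for w in root.iter("word")]
print(kept)
```

Removing an excluded element also removes everything nested inside it, which may or may not be what is wanted for the real data.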

- Change of topic -

I have a question about the data, and it is about elements being child of another element. I am wondering how I should store this relation. In the case of a element being a child of a it is clear, it's an attestation of the said word. But I am not sure what the information in within is...

Use Case Conlang Database: 3. Use Cases 3.9. Conlang Database The conlang (“Constructed Language”) database page offers a flexible form to query the data of the underlying database. On loading the page only a small part of the visible height of the page is taken by the form; most of th...

Paul Strack Mar 24, 2017 (04:54)

This looks quite interesting. I agree with your decision to omit the phonetic and grammatical data which, outside of a couple minor languages, is very incomplete anyway.

One thing your system can do that Eldamo can't is sophisticated search (since Eldamo is not backed by a DB). It might be nice to have Soundex search for cases where you don't know exactly how to spell a word.
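For readers who, like Severin, have not met Soundex before: a minimal sketch of the classic American Soundex code in Python. It is tuned for English surnames, so treating it as a good fit for Elvish spellings is an assumption; letters outside a-z (such as accented vowels) are simply dropped here.

```python
def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    # Keep only ASCII letters; accented characters are dropped in this sketch.
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


print(soundex("acharn"), soundex("akharn"))
```

Two spellings that share a code, such as `acharn` and the misspelling `akharn`, could then be matched by storing the code in an indexed column and comparing codes instead of raw spellings.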

For Sindarin in particular, it would be useful to build indexes of lenited and plural forms of all words. The search results could then include the singular and unlenited forms. For those new to Sindarin, I imagine that would be extremely useful.
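A rough sketch of such a reverse index for lenited forms. The mutation table is a simplified version of the commonly taught Sindarin soft mutation and is an assumption, not something taken from eldamo; it ignores exceptions entirely, and some entries (for example the treatment of lh and rh) are debated, so it should be checked before use.

```python
# (initial, lenited) pairs; longer initials come first so that digraphs such as
# "th" or "ch" are matched before the single letters "t" or "c".
SOFT_MUTATION = [
    ("hw", "chw"), ("lh", "thl"), ("rh", "thr"),
    ("th", "th"), ("ch", "ch"), ("dh", "dh"),  # left unchanged
    ("p", "b"), ("t", "d"), ("c", "g"),
    ("b", "v"), ("d", "dh"), ("g", ""),
    ("m", "v"), ("s", "h"), ("h", "ch"),
]


def lenite(word):
    """Apply the simplified soft mutation to the start of a word."""
    lower = word.lower()
    for initial, mutated in SOFT_MUTATION:
        if lower.startswith(initial):
            return mutated + lower[len(initial):]
    return lower  # vowels and other initials stay as they are


def build_lenition_index(words):
    """Map each lenited form to the set of base words that could produce it."""
    index = {}
    for word in words:
        index.setdefault(lenite(word), set()).add(word)
    return index


index = build_lenition_index(["calad", "mellon", "barad", "perian"])
print(index["galad"], index["vellon"])
```

Because several initials can lenite to the same sound (b and m both give v in this table), one lenited form may point back to more than one base word, which is why the index stores sets.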

Paul Strack Mar 24, 2017 (05:01)

Regarding the word element hierarchy: it usually expresses the relationship between earlier and later forms of a word, and thus traces the conceptual development of that word. For example, a Sindarin word might contain its Noldorin equivalent, which in turn contains its Gnomish equivalent. See, for example: - Eldamo : Sindarin : acharn

The hierarchy may be based on similarities of form or meaning. In many cases this relationship can be subjective, though.

Paul Strack Mar 24, 2017 (05:04)

Actually, the word element hierarchy is least subjective in the case of names, because it is easier to tell when a new form of a name replaces an older form just by its role in the narrative. For example: - Eldamo : Sindarin : Aelin-uial

That, and Christopher Tolkien has done most of this work already in the History of Middle-earth series.

Severin Zahler Mar 24, 2017 (11:23)

Thanks for the clarification, and for the ideas! The thing with the inflected forms is definitely something I want to do for Sindarin and Quenya. The challenge will just be to find algorithms with which the inflections can be generated programmatically. Of course there are many exceptions no program can cover, but the goal should be to create a solid base and then make specific changes only where needed.
The open question is just whether it is better performance-wise to have every single inflection of every single word pre-generated and saved to the database (with a couple of thousand words with a handful of inflections each, that would be a few tens of thousands of data points) or whether those inflections that have no exceptions should be generated when they're needed. Initially I intended to follow this second idea, but as this amount of data can easily be stored without having to worry about disk space, it is probably both easier and, I presume, also faster to have them pre-generated.
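The pre-generated variant could look roughly like the following sketch, which uses Python's built-in `sqlite3`. The table and column names, the `is_manual` flag for hand-entered exceptions, and the placeholder generator are all assumptions for illustration; the generator in particular is a dummy rule, not a real inflection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE word (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
CREATE TABLE inflection (
    word_id   INTEGER NOT NULL REFERENCES word(id),
    kind      TEXT NOT NULL,
    form      TEXT NOT NULL,
    is_manual INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (word_id, kind)
);
CREATE INDEX idx_inflection_form ON inflection(form);
"""


def regenerate(conn, generators):
    """Rebuild all generated inflections, leaving manual exceptions untouched."""
    conn.execute("DELETE FROM inflection WHERE is_manual = 0")
    words = conn.execute("SELECT id, form FROM word").fetchall()
    for word_id, form in words:
        for kind, generate in generators.items():
            # OR IGNORE: an existing manual row for (word_id, kind) wins.
            conn.execute(
                "INSERT OR IGNORE INTO inflection (word_id, kind, form) VALUES (?, ?, ?)",
                (word_id, kind, generate(form)),
            )
    conn.commit()


def base_forms(conn, inflected):
    """Look up which base words have the given inflected form."""
    rows = conn.execute(
        "SELECT w.form FROM inflection i JOIN word w ON w.id = i.word_id "
        "WHERE i.form = ? ORDER BY w.form",
        (inflected,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO word (id, form) VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
# A hand-entered exception for word 2 that the generator must not overwrite.
conn.execute("INSERT INTO inflection VALUES (2, 'demo', 'beta-irregular', 1)")
regenerate(conn, {"demo": lambda form: form + "-x"})  # dummy rule, not real grammar
print(base_forms(conn, "alpha-x"), base_forms(conn, "beta-irregular"))
```

With the index on `inflection.form`, looking up an inflected form is a single indexed query, and regenerating after a rule change only touches the rows that were not entered by hand.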

I have not heard of Soundex before, will check that out definitely :D

Paul Strack Mar 25, 2017 (19:57)

Based on discussion in other threads, you probably want to import the tengwar attribute. It contains important information about the current phonology of the word.

In particular, for Sindarin, it tells you when a word began with a nasalized stop, which is important for lenition.

For Quenya it indicates older ñ and th in spelling (using thorn, which I can't type on my phone).

Severin Zahler Mar 27, 2017 (08:31)

Oh my, how could I forget these bits of information! Yes, of course I have to bring that into the DB in some way. Can you tell me how complete these are in q and s?

Paul Strack Mar 27, 2017 (16:10)

+Severin Zahler They are reasonably complete for both. I did a full pass through both languages last year.

Severin Zahler Mar 29, 2017 (15:22)

I'm currently re-reading all the conversations there have been about my project here on LoME, and here are some thoughts regarding the earlier messages, but especially regarding the other conversation +Lúthien Merilin linked back then.

Although I have not taken direct inspiration from this conversation when making the concept for my instance of an elvish dictionary / database, it does meet astonishingly many of the points mentioned, mainly due to Paul's eldamo, but also because of interfaces I planned on adding:

Mentioned by +Lúthien Merilin:
- Reliability of words: Filterable by eldamo-marks
- Etymological information like silme vs. thule: Thanks to the reminder from Paul, this information will also be available.
- Exclude reconstructions and neologisms: Yes, by marks
- Include names: Yes, maybe sentences as well
- "Easy Regex": No. Instead, a large number of buttons will make it possible to submit almost any query without having to know any Regex.
- Full Regex injections: Yes
- Automatically generate new words by applying certain rules to the existing vocabulary: No, although it may be interesting to make a tool that does this semi-automatically, i.e. the tool suggests such a neologism and the user can decide whether to add it or not.
- Search inflected words (as does to some extent) and deliver base form of word as result: Yes
- Accommodate both academic and non-academic users: Yes. I may not supply the phonological development and rules, but for "academic" users wanting to translate something the DB should offer plenty of uses.
- Provide Tolkien's roots: Yes, as far as eldamo contains them.
- differentiate attestations: Yes
- Mark deduced words: Yes. If the deduced word comes from eldamo, the relation should already be present; if the word comes from a list of neologisms, such relations should be added along with it.
- Users can add synonymous glosses: Yes (cf. below)
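As an illustration of the "Full Regex injections" point, here is a minimal sketch of regex search on top of SQLite. SQLite has no built-in `REGEXP` implementation, but Python's `sqlite3` lets you register one with `create_function`. The `word` table and the choice of SQLite itself are assumptions; the thread does not say which database engine the project uses.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs the SQL expression `value REGEXP pattern` (SQLite passes pattern first)."""
    if value is None:
        return False
    try:
        return re.search(pattern, value) is not None
    except re.error:
        return False  # an invalid user pattern matches nothing instead of crashing


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE word (form TEXT)")
conn.executemany("INSERT INTO word VALUES (?)", [("acharn",), ("calad",), ("mellon",)])


def search(conn, pattern):
    """Return all forms matching a user-supplied regular expression."""
    rows = conn.execute(
        "SELECT form FROM word WHERE form REGEXP ? ORDER BY form", (pattern,)
    )
    return [row[0] for row in rows]


print(search(conn, "n$"))
```

The user's pattern is passed as a bound parameter rather than pasted into the SQL text, so a regex can never alter the statement itself. Very slow patterns remain possible, so a public site would probably want a length or time limit on top of this.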

Mentioned by David Giraudeau:
- Normalizations (k >> c) being visible: This should be covered by the eldamo word --> reference relations.
- Full etymology: Any relation that's known will find a place in the DB and can be displayed accordingly, additionally the words will be linked to their respective eldamo-Page where the etymology is very nicely unfolded (I am rather trying to display the entries in a compact manner)
- Editorial notes and external + internal history: Rather No.
- Links between quotes (i.e. phrases) and entries: Afaik eldamo only has the relations phrase --> reference and word --> reference but not directly phrase --> word. Can easily be added though.
- external dating: Via the linked source: Yes

As mentioned by +Tamas Ferencz, Roman Rausch and others:
The fact that so many projects of this kind have stalled before really made me consider the entire story around opening the database to user inputs. Right now I have the following two things in my concept:

- Suggest changes: Every word displayed when you submit a query shall have a button next to it for anyone to suggest changes to that word, be that fixing mistakes or adding new things like synonyms as discussed above. Suggestions can be made on any part of the entry.
- Translation page (as voiced by +Andre Polykanine): As a more specific tool I want to add a page where one can efficiently translate the words one queried into a real language of one's choice. The words will be listed on one page along with the English glosses, each with a text field to enter a translation.

Either sort of submission will be stored in a suggestions database / table and I will be notified about new suggestions. I then can accept or decline the suggestions and accordingly the data gets added to the main database.
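The suggest-then-review flow described above could be sketched as follows with Python's built-in `sqlite3`. All table names, column names, the status values and the set of editable fields are assumptions for illustration, and notification of the reviewer is left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE word (id INTEGER PRIMARY KEY, form TEXT NOT NULL, gloss TEXT NOT NULL);
CREATE TABLE suggestion (
    id        INTEGER PRIMARY KEY,
    word_id   INTEGER NOT NULL REFERENCES word(id),
    field     TEXT NOT NULL,
    new_value TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending'
);
"""
EDITABLE_FIELDS = {"form", "gloss"}  # whitelist of columns a suggestion may touch


def suggest(conn, word_id, field, new_value):
    """Store a suggestion; the main table is not modified yet."""
    if field not in EDITABLE_FIELDS:
        raise ValueError(f"field not editable: {field}")
    cur = conn.execute(
        "INSERT INTO suggestion (word_id, field, new_value) VALUES (?, ?, ?)",
        (word_id, field, new_value),
    )
    conn.commit()
    return cur.lastrowid


def review(conn, suggestion_id, accept):
    """Accept (apply to the main table) or decline a pending suggestion."""
    row = conn.execute(
        "SELECT word_id, field, new_value FROM suggestion "
        "WHERE id = ? AND status = 'pending'",
        (suggestion_id,),
    ).fetchone()
    if row is None:
        raise LookupError("no pending suggestion with that id")
    word_id, field, new_value = row
    if accept:
        if field not in EDITABLE_FIELDS:  # re-check before building the SQL text
            raise ValueError(f"field not editable: {field}")
        conn.execute(f"UPDATE word SET {field} = ? WHERE id = ?", (new_value, word_id))
    conn.execute(
        "UPDATE suggestion SET status = ? WHERE id = ?",
        ("accepted" if accept else "declined", suggestion_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO word VALUES (1, 'mellon', 'frend')")  # sample typo to fix
first = suggest(conn, 1, "gloss", "friend")
second = suggest(conn, 1, "gloss", "enemy")
review(conn, first, accept=True)
review(conn, second, accept=False)
print(conn.execute("SELECT gloss FROM word WHERE id = 1").fetchone()[0])
```

Keeping suggestions in their own table means that nothing a visitor submits reaches the main data until a reviewer has accepted it, and declined suggestions stay on record.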

I would also be open to having a login area where specifically trusted and interested people can log in and also participate in the review process and thus the DB administration. For that there'd need to be someone willing to help out, but before that happens my project has to prove its usefulness first, I guess. If it does end up being useful, the motivation both of me and probably of others using the database may be big enough to really want to preserve it for the days to come.

Regarding the topic of adding altogether new entries (as mentioned by +Paul Strack): I may add a tool to submit single entries as suggestions, but I doubt it would be worth providing a fancy interface for the presumably very rare bulk imports. Those could be forwarded to me to be loaded directly via SQL from a CSV file or similar.
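Such a one-off bulk import could be as small as the following sketch, using Python's `csv` and `sqlite3` modules. The column names `form`, `language` and `gloss`, the `word` table and the sample rows are assumptions for illustration.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_text):
    """Insert every row of a CSV document into the word table; return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["form"].strip(), r["language"].strip(), r["gloss"].strip())
        for r in reader
    ]
    with conn:  # one transaction: either every row is imported or none
        conn.executemany(
            "INSERT INTO word (form, language, gloss) VALUES (?, ?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word (form TEXT NOT NULL, language TEXT NOT NULL, gloss TEXT NOT NULL)"
)
SAMPLE_CSV = "form,language,gloss\nmellon,s,friend\ncalad,s,light\n"
imported = import_csv(conn, SAMPLE_CSV)
print(imported)
```

Using bound parameters with `executemany` avoids pasting CSV content into SQL text, so odd characters in a gloss cannot break the statement.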

On the other end: bulk exports. I have not thought about this so far. On the one hand the results will surely be presented in a table format, but some of the fields contain things that won't fit into a CSV file (for instance) so easily, e.g. buttons to pop up the entire inflection table of a word. Generally I don't think it should be very difficult to save a set of query results as a file.
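One way to deal with fields that do not fit into a CSV file is to export only a fixed set of plain-text columns and ignore the rest. A minimal sketch with Python's `csv` module; the column names and the `inflection_button` field are hypothetical.

```python
import csv
import io

# Plain-text columns that make sense in a file; UI-only fields are left out.
EXPORT_COLUMNS = ["form", "language", "gloss"]


def export_csv(results):
    """Serialize query results (a list of dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=EXPORT_COLUMNS,
        extrasaction="ignore",  # silently skip keys that are not export columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


results = [
    {"form": "mellon", "language": "s", "gloss": "friend",
     "inflection_button": "<button>show table</button>"},
]
csv_text = export_csv(results)
print(csv_text)
```

The `csv` module also takes care of quoting, so glosses containing commas or quotation marks survive the round trip.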

Regarding which platforms I am aiming for: right now I am completely focused on a browser interface. Personally I am unsure how useful a mobile app would be; someone making a proper translation probably won't do it on the morning train from their phone, but much rather sitting at a desk at home, or at least with a laptop at hand. I will however make sure that the website is responsive and usable on mobile devices.

It also looks pretty good regarding whether the project will see the light of day at all, and with the already fantastic dataset of eldamo it should be useful from day one. Personally I am, even after multiple years of working with elvish, still getting into the matter, rather than feeling like I have been in the "business" for ages, and I have already had fantastic experiences with it.

- Elvish dictionary app(lication) Mellyn, following a session at the Omentiel...