Post Dffnsng79Lb

Severin Zahler Dec 12, 2016 (16:40)

Hey everyone!

As part of my IT apprenticeship I'm planning to build a kind of Elvish resource which I think does not yet exist in this form. The idea is a rather intricate database plus a matching GUI for a Quenya wordlist, which would also be able to supply all kinds of further information and, most importantly, offer broad possibilities to sort, filter, and search the data.

There's absolutely no ETA for this, but I am very confident about this project (which I really only came up with today, so there's nothing to show yet besides the database sketch (ERD)).

Here's the *idea* list so far:

- wordlist Q(u)enya - English.
- wordlist Q(u)enya - German (languages toggleable).
  (more languages can technically be added, but I could not supply the data)
- sort and filter options (inspired by the wordlist of sindarin.de: http://www.sindarin.de/sindarin_dyn.html).
- prefabricated queries (e.g. search for all words ending in a certain character set).
- full MySQL SELECT query support (only usable if you know MySQL, hence the more user-friendly variant above as well).
- links to eldamo.org (+Paul Strack) for more in-depth information.
- declension tables for all nouns. Regular declensions are generated and irregular declensions get their own record. Declensions might be shown e.g. via a pop-up, somewhat as seen on leo.org.
- the same for verb conjugations.
- a data structure that supports multiple sources, so that you can e.g. search for all words with PE22 as source.
- Quenya words from all of Tolkien's life phases in one database, but with clear indications of the phases in which each word was attested, based on the recorded sources, and of course with options to filter by time period or similar.

If you've got any further ideas for what I might include, please voice them!

I'll try to bring out some updates whenever significant progress was made.
Dynamic wordlist
Placeholder, meaning, example:
- ^ : start of word; /^gr/ matches all words that begin with gr-
- ($| ) : end of word; /ui($| )/ matches all words that end in -ui
- .* : any number of unknown symbols; /^e.*ui($| )/ matches all words that begin with e- and end in -ui
- . : exactly one unknown symbol
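The placeholder patterns from the sindarin.de wordlist, and the "prefabricated queries" item in the idea list, could be sketched roughly as follows. This is only a minimal illustration in Python: the sample entries and the function names `filter_words` and `ending_in` are made up for the example and are not part of any existing tool.

```python
import re

# Hypothetical sample entries; a real list would come from the database.
WORDS = ["graug", "annui", "edhellen", "eilian", "erui", "gruin", "ui"]

def filter_words(words, pattern):
    """Return the words in which the regular expression matches somewhere."""
    rx = re.compile(pattern)
    return [w for w in words if rx.search(w)]

def ending_in(words, suffix):
    """Prefabricated query: all words ending in the given character set."""
    # re.escape keeps characters such as '.' or '-' in the suffix literal.
    return filter_words(words, re.escape(suffix) + r"($| )")

print(filter_words(WORDS, r"^gr"))          # words starting with gr-
print(ending_in(WORDS, "ui"))               # words ending in -ui
print(filter_words(WORDS, r"^e.*ui($| )"))  # start with e-, end with -ui
```

Wrapping common patterns in small functions like `ending_in` is one way to offer the user-friendly variant next to raw pattern or SQL input.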

Tamas Ferencz Dec 12, 2016 (16:57)

well, good luck!

Fiona Jallings Dec 12, 2016 (17:13)

Well, that sounds like a ridiculously difficult project to do on your own. You do realize that there are thousands of Quenya words, right? You may want to focus your project down to just verbs or just nouns inflected for case. Otherwise, you'll be working on this for the next 5 years. (May or may not be speaking from experience)

Lúthien Merilin Dec 12, 2016 (17:43)

Hello Severin,
that's interesting! Some of us have been working on and off (more off than on, alas) on a similar project. See this thread: plus.google.com - Elvish dictionary app(lication) Mellyn, following a session at the Omentielva…

I'd be delighted if we could somehow join our efforts.

Ekin Gören Dec 12, 2016 (17:51)

Similar indeed, yet with more twists. +Lúthien Merilin and I have been expanding further on the idea. +Roman Rausch from sindarin.de is supporting us as well. We're going to start working on the alpha build very soon. Likewise, I'd love to have you on board.

Arno Gourdol Dec 12, 2016 (18:08)

That sounds like a really cool project! I have a simple UI for something similar on tecendil.com, although it would be really interesting to add declension as well... And if you need some help in displaying tengwar, I have some code to share.

Severin Zahler Dec 12, 2016 (19:22)

Thanks a lot for these replies, everyone! I should have guessed that I'm not the only one who felt a dire need for something like this.

If I'm understanding the discussion you linked correctly, +Lúthien Merilin (albeit the last reply dates back 43 weeks), not much has been realised so far apart from conceptual work, right? If that's the case it might indeed be a very good idea to join forces; the point about maintenance and the many stories of failed attempts were really intriguing.

Now, my attempt is at the earliest stage it can possibly be; I pretty much voiced the idea concretely for the first time today. However, for me it would be part of my apprenticeship, and I could work 1.5, in the next semester probably even 2, full work days on this project every week. The downside, or problem on my side, would be that I'd more or less have to do most of the technical work myself, as it is part of my education, and I'd use this project to check off some modules I have to complete in order to pass the apprenticeship.

Thus it might be interesting if the collaboration could consist of sharing the work of data accumulation and formatting, which of course is indeed a huge part of all this.

On the technical side, what I'd have used is a MySQL db, and as platform a regular HTML website, probably using the Bootstrap framework; and as link between website and database, PHP. I'd have access to a lot of resources, most importantly a good number of very knowledgeable IT people right in the room where I work.

Conceptually, as of the earliest blueprint, I'd have focussed on a rather pragmatic solution, meaning it should serve as a tool to make translating more efficient. Of course one sometimes needs to consider all the etymological things connected to a word when translating, but most of the time it should be sufficient to have an overview of the sources of the word, including which year/period it was created in. And I'd put more focus on gathering as many neologisms as possible (which would be filterable, of course) than on getting every word's root in the right place.
However, I am absolutely not disinclined to go all the way and include all the etymological shenanigans, but that is where I'd start doubting whether I could sort out and arrange all the required data.

I have already done some prep work during this year (although it was not specifically for this project, it may still be very useful), in which I compiled a pretty complete list of all the Quenya verbs I could possibly find, including the trustworthy neologisms. Yes, that did take a fair amount of time, but it never felt endless or really wore me out, so at least for getting as complete a database of Quenya vocabulary as possible (which is what I'd be most interested in for my personal translation work) I am very optimistic about getting it finished.

So, long story short, of course I'd be interested in sharing the load and of course I would be open to any other needs/wishes! The only problem, as said, would be that the technical part would rather be a solo project of mine, for the reasons mentioned above... And I really don't expect any of you who was hyped about contributing to the technical aspects to give that, and all the work already completed, up because of me, I really don't!

What is the status of your instance of the project anyway?

Lúthien Merilin Dec 12, 2016 (21:05)

Indeed, not much has been realised yet, at least nothing new since that thread I mentioned. The problem seems to be a mix of everyone's limited availability and the fact that while there are enough (very) knowledgeable people around, none of us seems to be cut out to fulfil the role of project manager - or in any case, it's not my strongest point ;)

There have been some previous attempts to create something like this. For instance, I made a java-based desktop app with an sqlite db based on the Hiswelokë Sindarin data, but that is by now quite outdated.


I don't think the technical implementation of your project would in any way overlap with what we're doing, since we've agreed that we should first settle on a data model that can accommodate everything we need, which is (as +Ekin Gören pointed out) considerably more than a regular dictionary or word-list.

As soon as we have that database, the linguistic corpus can be entered in it (that's a short line for a significant task ;) ...).

We have not yet settled on any specific implementation of a GUI or client for it. It could be any number of things: website-based interfaces, desktop clients and mobile apps that all use the same data source.

As for the current status: in the past few weeks I have been talking with (mostly) +Ekin Gören and +Eryn Galen on how to proceed. If I were to consult with +Roman Rausch about some of the database design decisions it should actually not be too much work to get that realised.

I am not sure how much work it would be to get the data in there: there are quite a number of updates on the Sindarin corpus, but I guess that the 'data ingestion' can be an ongoing task - it's not something that we need to complete before we can realise a website to view the contents (or whatever we might want to build).

As far as I am concerned, I am still as enthusiastic as ever to work on it. We just need to get going.

Would it be an idea to have a Skype chat to fill in the blanks and see whether and where we could help one another?

Severin Zahler Dec 12, 2016 (22:43)

Haven't used Skype so far, mainly because I don't have a webcam yet; ever set up Discord or TeamSpeak?

Lúthien Merilin Dec 12, 2016 (23:22)

oh I just mean any way of (text) chat - I think Ekin has some bandwidth issues lately. I happen to use Skype a lot, but anything will do. Not sure what +Ekin Gören and +Eryn Galen are ok with - Gtalk maybe... ?

Andre Polykanine Dec 13, 2016 (01:52)

Great idea! I would like to contribute, both code (I know the stack, HTML+PHP+MySQL) and data (I would like to add Russian and possibly Ukrainian to the list of languages). The only thing I lack is Tolkien materials, and I have explained the reason here. So maybe I'm the person here who wants this most, since I desperately need Quenya and Sindarin words for my personal lexicon :). Thanks!

Leonard W. Dec 13, 2016 (08:24)

Considering your stack, you're very welcome to contribute to elfdict.com. It has a lot of what you're looking to create already... built on top of LAMP.

Severin Zahler Dec 13, 2016 (08:39)

Alright, got Skype set up, feel free to have a chat with me over there: join.skype.com - Join conversation

I'm currently looking at +Paul Strack's XML file of eldamo.org; the work you've done on that is incredible :O Given that it includes almost all attested data on Tolkien's languages in a very structured way, it would be a fantastic starting point for an extensive Tolkien conlang database. It should be no problem to write a program which extracts the data from the XML into a couple of files, e.g. CSV, which can then be bulk loaded into the database.
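A minimal sketch of such an XML-to-CSV extraction step, here in Python with only the standard library. The element and attribute names (`word`, `l`, `v`, `speech`, `gloss`) and the tiny sample document are assumptions for illustration and would need to be checked against the documentation of the real file; the helper names are hypothetical as well.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document. Element and attribute names are assumptions
# for illustration, not the verified layout of the real XML file.
SAMPLE_XML = """
<word-data>
  <word l="q" v="ric-" speech="vb" gloss="try, put forth effort, strive, endeavour"/>
  <word l="q" v="sírë" speech="n" gloss="river"/>
  <word l="s" v="sîr" speech="n" gloss="river"/>
</word-data>
"""

FIELDS = ["language", "form", "speech", "gloss"]

def words_to_rows(xml_text, language=None):
    """Turn <word> elements into dicts, optionally keeping one language only."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter("word"):
        if language is not None and el.get("l") != language:
            continue
        rows.append({
            "language": el.get("l", ""),
            "form": el.get("v", ""),
            "speech": el.get("speech", ""),
            "gloss": el.get("gloss", ""),
        })
    return rows

def rows_to_csv(rows):
    """Serialise the rows as CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

quenya_rows = words_to_rows(SAMPLE_XML, language="q")
print(rows_to_csv(quenya_rows))
```

Writing one CSV file per target table keeps the later bulk load simple, since each file then maps to exactly one table.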

Even though the license you released your work under permits it, and you already said you'd be happy to offer your data for +Lúthien Merilin and Co.'s instance of the project, I still want to ask you whether you'd be okay with me adapting your data.

Will delve into the documentation of the XML now and try to think about what a second-normal-form version of this data (a database with, ideally, no redundant data) might look like...

Arno Gourdol Dec 13, 2016 (08:49)

I don't know if this would be useful to you, but I have a service available that returns dictionary entries extracted from eldamo as a JSON data structure. Try out tengwar.herokuapp.com - tengwar.herokuapp.com/define/river

I'm happy to share the code for it as well.

As an aside, a full SQL database is a bit overkill, IMHO, for this application. There isn't that much data, all considered, and it easily fits all in a simple data structure (array, map, etc...). Then again, maybe building a DB is a requirement for your apprenticeship.
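For comparison, the "simple data structure" approach mentioned above might look roughly like this in Python: a plain in-memory map from gloss to matching words. The entries and the name `build_index` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical entries: (language, word, gloss)
ENTRIES = [
    ("q", "sírë", "river"),
    ("s", "sîr", "river"),
    ("q", "ric-", "try"),
]

def build_index(entries):
    """Map each gloss to the list of (language, word) pairs that carry it."""
    index = defaultdict(list)
    for language, word, gloss in entries:
        index[gloss].append((language, word))
    return index

index = build_index(ENTRIES)
print(index["river"])
```

Lookups by gloss are then a single dictionary access; ad-hoc filtering across several attributes is where a relational database becomes more convenient.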

Severin Zahler Dec 13, 2016 (18:33)

+Andre Polykanine As I'm planning it now, it should be no problem to add further languages (neither additional real languages nor additional fictional languages).

+Leonard W. Thanks for the invitation! Is there some sort of data structure model for your database? From what I see through your (very appealing) GUI I can only guess a bit; but I think what I am aiming at is something different. In particular, I want to try to bring all the data into second normal form, which is a database modelling term that basically describes a database structured so that, ideally, no redundant data is present. For example, in your model it seems that if an Elvish word has multiple glosses (e.g. Q. ric-: "try, put forth effort, strive, endeavour"), all the glosses probably make up one record, and I'd split these up into single data elements.
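Splitting a combined gloss into single data elements could be sketched like this. The example uses Python's built-in sqlite3 module as an in-memory stand-in for MySQL; the table and column names and the helper functions are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (
        id       INTEGER PRIMARY KEY,
        language TEXT NOT NULL,
        form     TEXT NOT NULL
    );
    CREATE TABLE gloss (
        id         INTEGER PRIMARY KEY,
        word_id    INTEGER NOT NULL REFERENCES word(id),
        gloss_text TEXT NOT NULL
    );
""")

def add_word(conn, language, form, combined_gloss):
    """Insert one word and one gloss row per comma-separated gloss."""
    cur = conn.execute(
        "INSERT INTO word (language, form) VALUES (?, ?)", (language, form)
    )
    word_id = cur.lastrowid
    glosses = [g.strip() for g in combined_gloss.split(",") if g.strip()]
    conn.executemany(
        "INSERT INTO gloss (word_id, gloss_text) VALUES (?, ?)",
        [(word_id, g) for g in glosses],
    )
    return word_id

def words_for_gloss(conn, gloss_text):
    """Find all word forms that carry exactly this single gloss."""
    rows = conn.execute(
        "SELECT w.form FROM word w JOIN gloss g ON g.word_id = w.id "
        "WHERE g.gloss_text = ?",
        (gloss_text,),
    ).fetchall()
    return [r[0] for r in rows]

add_word(conn, "q", "ric-", "try, put forth effort, strive, endeavour")
print(words_for_gloss(conn, "strive"))
```

With one row per gloss, an exact-match search for "strive" finds the word without any substring matching on a combined text field.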

+Arno Gourdol Thanks a lot for the link! However, I think it isn't significantly easier to extract the data from the structure your code provides than to parse eldamo's original XML file.


I did work on the database model a bit more today, +Lúthien Merilin gave some great inputs as well, thanks again for that!
As it stands now there will be one table which will house all conlang words (languages distinguished by a value in an additional column) and one table for all 'real' languages. Additionally, one table for all unaltered attested records is planned. The conlang and attested records tables are hooked up to a relations table which allows storing any type of relation, e.g. between two words of different languages or different time periods, or linking a normalized word with its original attested form.
Besides that there will of course be all kinds of other tables, housing sources, word types, word categories (by semantics) and also various language-specific inflection tables, e.g. a Quenya verb conjugation chart, a Quenya noun declension chart or a Sindarin mutation chart.
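One possible shape for the words / attested records / relations core described above, sketched with Python's sqlite3 as an in-memory stand-in for MySQL. All table and column names are assumptions, the attested form and source are placeholders rather than real citations, and the single relations table with two nullable targets is just one of several ways to model this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conlang_word (
        id       INTEGER PRIMARY KEY,
        language TEXT NOT NULL,   -- one table for all conlangs, e.g. 'q', 's'
        form     TEXT NOT NULL
    );
    CREATE TABLE attested_record (
        id     INTEGER PRIMARY KEY,
        form   TEXT NOT NULL,     -- unaltered form as found in the source
        source TEXT NOT NULL
    );
    CREATE TABLE relation (
        id             INTEGER PRIMARY KEY,
        relation_type  TEXT NOT NULL,
        from_word_id   INTEGER NOT NULL REFERENCES conlang_word(id),
        to_word_id     INTEGER REFERENCES conlang_word(id),
        to_attested_id INTEGER REFERENCES attested_record(id),
        -- exactly one of the two targets must be set
        CHECK ((to_word_id IS NULL) <> (to_attested_id IS NULL))
    );
""")

conn.execute("INSERT INTO conlang_word VALUES (1, 'q', 'sírë')")
conn.execute("INSERT INTO conlang_word VALUES (2, 's', 'sîr')")
# Placeholder attested record, not a real citation.
conn.execute("INSERT INTO attested_record VALUES (1, 'SAMPLE-FORM', 'SAMPLE-SOURCE')")
conn.execute("INSERT INTO relation VALUES (1, 'cognate', 1, 2, NULL)")
conn.execute("INSERT INTO relation VALUES (2, 'attested_as', 1, NULL, 1)")

def related(conn, word_id):
    """List (relation type, target form) pairs for one word."""
    return conn.execute("""
        SELECT r.relation_type, COALESCE(w.form, a.form)
        FROM relation r
        LEFT JOIN conlang_word w ON w.id = r.to_word_id
        LEFT JOIN attested_record a ON a.id = r.to_attested_id
        WHERE r.from_word_id = ?
        ORDER BY r.id
    """, (word_id,)).fetchall()

print(related(conn, 1))
```

Keeping the relation type as data means new kinds of links (derivation, time-period variants and so on) need no schema change.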

I hope I can post a picture of this early concept of the structure tomorrow.

Again +Paul Strack, I'd be pleased to get in touch with you as I probably would like to use your XML file as first input of data.

Leonard W. Dec 13, 2016 (22:25)

The source code is available on Github: https://github.com/galadhremmin/Parf-Edhellen. Since I'm importing a lot of glosses from a variety of sources, the normalization is by no means perfect, and I have had to make a number of optimizations to ensure a performant search experience, but a lot of thought has gone into its design, and I would like to think that I've avoided a great deal of redundancy. That said, there's a lot of room for improvement.

As for a UML diagram of the relational database... well, full SQL dumps are available on Github. I'll see if there's an automatic tool I can use to turn an existing database structure into a UML diagram.

Severin Zahler Dec 15, 2016 (09:28)

Alright, quick update again, below is the current database model.
Instead of explaining all the thoughts / features behind it here, I'll be working on a first piece of documentation.

+Leonard W. I'll try generating the model from your SQL files, I've got all the necessary tools at hand :)

https://plus.google.com/photos/...

Leonard W. Dec 15, 2016 (10:03)

I would highly discourage the use of schemas specifically for Sindarin and Quenya, with columns based on our current understanding and extrapolations of the language. Much of what we have today will change, in some way or another. A typical example is the ongoing discussions about the contents of Parma Eldalamberon 22. I would therefore recommend that you define declensions, conjugations etc. in separate schemas, and then use a one-to-many relationship with the schema containing the definitions. The same logic would apply to languages, which might be especially important considering (as an example) Sindarin's journey from Gnomish > Old Noldorin > Noldorin > Old Sindarin > Sindarin.
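The recommendation above, inflection categories stored as rows and linked one-to-many instead of being fixed as column names, might be sketched as follows. This again uses Python's sqlite3 as an in-memory stand-in; the table, column and function names are assumptions, and the few sample forms only serve to fill the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inflection_type (
        id       INTEGER PRIMARY KEY,
        language TEXT NOT NULL,
        name     TEXT NOT NULL,
        UNIQUE (language, name)
    );
    CREATE TABLE word (
        id       INTEGER PRIMARY KEY,
        language TEXT NOT NULL,
        lemma    TEXT NOT NULL
    );
    CREATE TABLE inflected_form (
        word_id            INTEGER NOT NULL REFERENCES word(id),
        inflection_type_id INTEGER NOT NULL REFERENCES inflection_type(id),
        form               TEXT NOT NULL,
        PRIMARY KEY (word_id, inflection_type_id)
    );
""")

def define_inflection(conn, language, name):
    """Register an inflection category as data and return its id."""
    cur = conn.execute(
        "INSERT INTO inflection_type (language, name) VALUES (?, ?)",
        (language, name),
    )
    return cur.lastrowid

def add_form(conn, word_id, inflection_type_id, form):
    conn.execute(
        "INSERT INTO inflected_form VALUES (?, ?, ?)",
        (word_id, inflection_type_id, form),
    )

def paradigm(conn, word_id):
    """Return {category name: form} for one word."""
    rows = conn.execute("""
        SELECT t.name, f.form
        FROM inflected_form f
        JOIN inflection_type t ON t.id = f.inflection_type_id
        WHERE f.word_id = ?
        ORDER BY t.id
    """, (word_id,)).fetchall()
    return dict(rows)

cirya = conn.execute(
    "INSERT INTO word (language, lemma) VALUES ('q', 'cirya')"
).lastrowid
nom = define_inflection(conn, "q", "nominative")
gen = define_inflection(conn, "q", "genitive")
add_form(conn, cirya, nom, "cirya")
add_form(conn, cirya, gen, "ciryo")

# A newly analysed category is one more row, not a schema change.
dat = define_inflection(conn, "q", "dative")
add_form(conn, cirya, dat, "ciryan")

print(paradigm(conn, cirya))
```

If the understanding of a category changes, only rows in `inflection_type` and `inflected_form` need to be updated, and the table definitions stay untouched.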

Severin Zahler Dec 20, 2016 (09:04)

Time for another quick update!

First off, however, thanks a lot for your input, +Leonard W.! I am very well aware of the ever-changing nature of such things. The main reason I initially went for having the declension / conjugation names as fixed column names was that if any of these change, the underlying content probably will as well, so it does not matter all that much. And just as one can use MySQL statements to alter the content, there's also ALTER TABLE to change the column names.

I talked about this with my technical supporter, and the idea he brought up was to use views. Views basically let you pre-define named queries over (parts of) tables. Thus what I can do is prepare a view for each kind of inflexion (e.g. quenya-noun, sindarin-verb), with the specific elements (nominative, accusative...) fixed in its definition, and then have a sort of foreign key for each of these in the generic inflexion table you suggest, which contains all kinds of inflected words.
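A rough sketch of the view idea: one generic inflexion table, plus a view per paradigm that presents the fixed categories as columns. It uses Python's sqlite3 as an in-memory stand-in for MySQL; the table, column and view names are assumptions, the `paradigm` column plays the role of the "sort of foreign key", and the sindarin-verb row is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inflection (
        lemma    TEXT NOT NULL,
        paradigm TEXT NOT NULL,   -- e.g. 'quenya-noun', 'sindarin-verb'
        category TEXT NOT NULL,
        form     TEXT NOT NULL,
        PRIMARY KEY (lemma, paradigm, category)
    );

    -- The view fixes the categories of one paradigm as columns;
    -- the underlying table stays generic.
    CREATE VIEW quenya_noun AS
    SELECT lemma,
           MAX(CASE WHEN category = 'nominative' THEN form END) AS nominative,
           MAX(CASE WHEN category = 'genitive'   THEN form END) AS genitive,
           MAX(CASE WHEN category = 'dative'     THEN form END) AS dative
    FROM inflection
    WHERE paradigm = 'quenya-noun'
    GROUP BY lemma;
""")

conn.executemany(
    "INSERT INTO inflection VALUES (?, ?, ?, ?)",
    [
        ("cirya", "quenya-noun", "nominative", "cirya"),
        ("cirya", "quenya-noun", "genitive", "ciryo"),
        ("cirya", "quenya-noun", "dative", "ciryan"),
        # Placeholder row from another paradigm; the view filters it out.
        ("SAMPLE-VERB", "sindarin-verb", "infinitive", "SAMPLE-FORM"),
    ],
)

noun_rows = conn.execute("SELECT * FROM quenya_noun").fetchall()
print(noun_rows)
```

If a category is renamed or added later, only the view definition changes, while the stored rows keep the same structure.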

Progress-wise I have found my way into using JDOM2, a Java library to read (and write) XML files; using that, I am now extracting the eldamo data to a set of .csv files.

I won't lie: I am pretty focussed on using the eldamo data as a rather central part of my project. While the license +Paul Strack released it under does not conflict with this, I'd still be very interested in getting in touch with you, also regarding whether it might be interesting to have a fixed way to import new eldamo data in the future. Unfortunately I could not find another way to contact you than through here :I So, if I don't manage to reach you soonish, just be assured that I will of course give all credit that is due to the incredible work you've compiled (that frikkin' XML file has 263891 lines :O)