
Severin Zahler Dec 12, 2016 (16:40)

Hey everyone!

As part of my IT apprenticeship I'm planning on realising a kind of Elvish resource which I think does not yet exist in a similar fashion. The idea would be a rather intricate database plus a matching GUI for a Quenya wordlist, which would also be able to supply all kinds of further information and, most importantly, offer broad possibilities to sort, filter, and search the data.

There's absolutely no ETA for this, but I am very confident about this project (which I really only came up with today, so there's nothing to show yet besides the database sketch (ERD)).

Here's the *idea* list so far:

- wordlist Q(u)enya - English.
- wordlist Q(u)enya - German (languages toggleable).
(more languages can technically be added, but I could not supply the data)
- sort and filter options (inspired by the wordlist of
- pre-fabricated queries (e.g. search for all words ending in a certain character set).
- full MySQL SELECT query support (only usable if you know MySQL, hence the more user-friendly variant above as well).
- include links to (+Paul Strack) for more in-depth information.
- declension tables for all nouns. Regular declensions are generated, and irregular declensions get their own records. Declensions might be shown e.g. via a pop-up, somewhat as seen on
- same for verb conjugations.
- a data structure that supports multiple sources, so that you can e.g. search for all words with PE22 as source.
- Quenya words from all of Tolkien's life phases in one database, but with clear explanations of the phases in which a word is attested, based on the recorded sources, and of course with options to filter for time periods or similar.
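For the sort/filter and pre-fabricated query ideas above, here is a minimal sketch of what such queries might look like. It uses Python's built-in sqlite3 purely for illustration; the table and column names are invented placeholders, not the project's actual schema.

```python
import sqlite3

# Hypothetical minimal schema; the real table/column names are not defined yet.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE word (
        form    TEXT NOT NULL,   -- the Quenya form
        gloss   TEXT NOT NULL,   -- English gloss
        source  TEXT NOT NULL    -- attestation source, e.g. 'PE22'
    )
""")
conn.executemany(
    "INSERT INTO word (form, gloss, source) VALUES (?, ?, ?)",
    [("quetë", "speak", "PE22"),
     ("cirya", "ship", "PE21"),
     ("ninquë", "white", "PE22")],
)

# Pre-fabricated query: all words with a given source...
pe22 = [row[0] for row in conn.execute(
    "SELECT form FROM word WHERE source = ? ORDER BY form", ("PE22",))]

# ...or all words ending in a certain character set.
ending = [row[0] for row in conn.execute(
    "SELECT form FROM word WHERE form LIKE '%ë' ORDER BY form")]
```

The "full SELECT support" feature would simply expose queries like these directly, while the user-friendly variant wraps them behind form controls.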

If you've got any further ideas for what I might include, please voice them!

I'll try to bring out some updates whenever significant progress was made.
Dynamic wordlist
Placeholder, meaning, example:
- ^ : word start; /^gr/ matches all words beginning with gr-.
- ($| ) : word end; /ui($| )/ matches all words ending in -ui.
- .* : any number of unknown symbols; /^e.*ui($| )/ matches all words beginning with e- and ending in -ui.
- . : exactly one unknown symbol.
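Translated into a runnable sketch, the placeholders above correspond to standard regular expressions. Python's re module is shown here, and the sample words are made up; the ($| ) word-end marker in the original assumes the wordlist is one space-separated string, whereas on a per-word list a plain $ suffices.

```python
import re

words = ["grond", "galad", "annui", "elenui", "er"]

# Word-start anchor: all words beginning with gr-
starts_gr = [w for w in words if re.search(r"^gr", w)]

# Word-end anchor: all words ending in -ui
ends_ui = [w for w in words if re.search(r"ui$", w)]

# Combined with .*: words beginning with e- and ending in -ui
e_to_ui = [w for w in words if re.search(r"^e.*ui$", w)]
```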

Tamas Ferencz Dec 12, 2016 (16:57)

well, good luck!

Fiona Jallings Dec 12, 2016 (17:13)

Well, that sounds like a ridiculously difficult project to do on your own. You do realize that there are thousands of Quenya words, right? You may want to focus your project down to just verbs or just nouns inflected for case. Otherwise, you'll be working on this for the next 5 years. (May or may not be speaking from experience)

Lúthien Merilin Dec 12, 2016 (17:43)

Hello Severin,
that's interesting! Some of us have been working on and off (more off than on, alas) on a similar project. See this thread: - Elvish dictionary app(lication) Mellyn, following a session at the Omentielva…

I'd be delighted if we could somehow join our efforts.

Ekin Gören Dec 12, 2016 (17:51)

Similar indeed, yet with more twists. +Lúthien Merilin and I have been expanding further on the idea. +Roman Rausch from is supporting us as well. We're going to start working on the alpha build very soon. Likewise, I'd love to have you on board.

Arno Gourdol Dec 12, 2016 (18:08)

That sounds like a really cool project! I have a simple UI for something similar on, although it would be really interesting to add declension as well... And if you need some help in displaying tengwar, I have some code to share.

Severin Zahler Dec 12, 2016 (19:22)

Thanks a lot for these replies, everyone! I should have guessed that I'm not the only one who felt a dire need for something like this.

If I'm understanding the discussion you linked, +Lúthien Merilin, correctly (albeit the last reply was 43 weeks ago), beside conceptual things not much could be realised so far, right? If that's the case it might indeed be a very good idea to join forces; the points about maintenance and the many stories of failed attempts were really intriguing.

Now, the stage of my attempt is as early as it can possibly be; I pretty much voiced the idea concretely for the first time today. However, for me it would be part of my apprenticeship, and I could work 1.5, in the next semester probably even 2, full work days on this project every week. The downside, or problem on my side, would be that I'd kinda have to do most of the technical stuff myself, as it is part of my education process, and with this project I'd check off some modules I have to complete in order to pass the apprenticeship.

Thus it might be interesting if the collaboration could consist of sharing the work of data accumulation and formatting, which of course is indeed a huge part of all this.

On the technical side, what I'd use is a MySQL DB, and as platform a regular HTML website, probably using the Bootstrap framework, with PHP as the link between website and database. I'd have access to a lot of resources, and especially a good number of very knowledgeable IT people right in the room I'm working in.

Conceptually, as of the earliest blueprint, I'd focus on a rather pragmatic solution, meaning it should serve as a tool to make translating more efficient. Of course one sometimes needs to consider all the etymological things connected to a word when translating, but most of the time it should be sufficient to have an overview of the word's sources, including which year/period it was created in. And I'd put more focus on gathering as many neologisms as possible (which would be filterable, of course) rather than on getting every word's root in the right place.
However, I am absolutely not disinclined to go all the way and include all the etymological shenanigans; but that is where I'd start doubting whether I could sort out and arrange all the required data.

I have already done some prep work this year (although not specifically for this project, it may still be very useful): I compiled a (still) pretty complete list of all Quenya verbs I could possibly find, including the trustworthy neologisms. Yes, that did take a fair amount of time, but it never felt endless or really wore me out; so at least for getting an as-complete-as-possible database of Quenya vocabulary (which I'd be most interested in for my personal translation work) I am very optimistic about getting to the end.

So, long story short: of course I'd be interested in sharing the load, and of course I would be open to any other needs/wishes! The only problem, as said, would be that the technical part would rather be a solo project of mine, for the reasons mentioned above... And if any of you were hyped to contribute to the technical aspects, I really don't expect you to give that and all already-completed work up because of me, I really don't!

What is the status of your instance of the project anyway?

Lúthien Merilin Dec 12, 2016 (21:05)

Indeed, not much has been realised yet, at least nothing new since that thread I mentioned. The problem seems to be a mix of everyone's limited availability and the fact that while there are enough (very) knowledgeable people around, none of us seems cut out to fulfil the role of project manager - or in any case, it's not my strongest point ;)

There have been some previous attempts to create something like this. For instance, I made a Java-based desktop app with an SQLite DB based on the Hiswelokë Sindarin data, but that is by now quite outdated.

I don't think the technical implementation of your project would in any way overlap with what we're doing, since we've agreed that we should first settle on a data model that can accommodate everything we need, which is (as +Ekin Gören pointed out) considerably more than a regular dictionary or word-list.

As soon as we have that database, the linguistic corpus can be entered in it (that's a short line for a significant task ;) ...).

We have not yet settled on any specific implementation of a GUI or client for it. It could be any number of things: website-based interfaces, desktop clients, and mobile apps that all use the same data source.

As for the current status: in the past few weeks I have been talking with (mostly) +Ekin Gören and +Eryn Galen on how to proceed. If I were to consult with +Roman Rausch about some of the database design decisions it should actually not be too much work to get that realised.

I am not sure how much work it would be to get the data in there: there are quite a number of updates on the Sindarin corpus, but I guess that the 'data ingestion' can be an ongoing task - it's not something that we need to complete before we can realise a website to view the contents (or whatever we might want to build).

As far as I am concerned, I am still as enthusiastic as ever to work on it. We just need to get going.

How about a Skype chat to fill in the blanks and see if and where we could help one another?

Severin Zahler Dec 12, 2016 (22:43)

Haven't used Skype so far, mainly because I don't have a webcam yet; have you ever set up Discord or TeamSpeak?

Lúthien Merilin Dec 12, 2016 (23:22)

Oh, I just meant any kind of (text) chat - I think Ekin has had some bandwidth issues lately. I happen to use Skype a lot, but anything will do. Not sure what +Ekin Gören and +Eryn Galen are OK with - Gtalk maybe...?

Andre Polykanine Dec 13, 2016 (01:52)

Great idea! I would like to contribute, both code (I know the stack, HTML+PHP+MySQL) and data (I would like to add Russian and possibly Ukrainian to the languages list). The only thing I lack is Tolkien materials, and I have explained the reason here. So maybe I'm the person here who wants this most, since I desperately need Quenya and Sindarin words for my personal lexicon :). Thanks!

Leonard W. Dec 13, 2016 (08:24)

Considering your stack, you're very welcome to contribute to It has a lot of what you're looking to create already... built on top of LAMP.

Severin Zahler Dec 13, 2016 (08:39)

Alright, got myself Skype-ready; feel free to chat with me over there: - Join conversation

I'm currently looking at +Paul Strack's XML file of; the work you've done on that is incredible :O Given it includes almost all attested data on Tolkien's languages in a very structured way, it would be a fantastic starting point for an extensive Tolkien conlang database. It should be no problem to write a program which extracts the data from the XML into a couple of e.g. CSV files, which can then be bulk-loaded into the database.
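A minimal sketch of such an extraction program (shown in Python rather than the Java/JDOM2 setup used later in the thread; the element and attribute names below are stand-ins, not eldamo's actual XML schema):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative input only: 'word', 'v', 'gloss', and 'l' are invented
# placeholders, not the real eldamo element/attribute names.
SAMPLE = """<words>
  <word v="quetë" gloss="speak" l="q"/>
  <word v="mellon" gloss="friend" l="s"/>
</words>"""

def xml_to_csv(xml_text):
    """Flatten the XML word entries into CSV rows ready for bulk loading."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["form", "gloss", "language"])
    for w in root.iter("word"):
        writer.writerow([w.get("v"), w.get("gloss"), w.get("l")])
    return out.getvalue()
```

The resulting CSV could then be imported with MySQL's LOAD DATA INFILE or a similar bulk-load mechanism.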

Even though the license you published your work under permits it, and you already said you'd be up for offering your data to +Lúthien Merilin and Co.'s instance of the project, I still want to ask whether you'd be okay with me adapting your data.

Will delve into the documentation of the XML now and try to think about what a second-stage normalisation (a database with no redundant data) of this data may look like...

Arno Gourdol Dec 13, 2016 (08:49)

I don't know if this would be useful to you, but I have a service available that returns dictionary entries extracted from eldamo as a JSON data structure. Try out -

I'm happy to share the code for it as well.

As an aside, a full SQL database is a bit overkill, IMHO, for this application. There isn't that much data, all considered, and it easily fits all in a simple data structure (array, map, etc...). Then again, maybe building a DB is a requirement for your apprenticeship.
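Arno's in-memory alternative could be as simple as a flat list of records filtered with plain comprehensions; no database server needed. The entry fields here are illustrative, not any project's actual layout.

```python
# A flat in-memory lexicon: a list of dicts instead of a SQL database.
# Field names are invented for illustration.
lexicon = [
    {"form": "ric-",   "gloss": "try",    "language": "q", "source": "PE22"},
    {"form": "mellon", "gloss": "friend", "language": "s", "source": "LotR"},
]

def lookup(gloss):
    """Return all forms carrying the given gloss."""
    return [e["form"] for e in lexicon if e["gloss"] == gloss]
```

The trade-off is that sorting, filtering, and ad-hoc querying all have to be coded by hand, which is exactly what a SQL database provides for free.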

Severin Zahler Dec 13, 2016 (18:33)

+Andre Polykanine As I'm planning it now, it should be no problem to add additional languages (whether real or fictional).

+Leonard W. Thanks for the invitation! Is there some sort of data structure model for your database? From what I see through your (very appealing) GUI I can only guess a bit; but I think what I am aiming at is something different. In particular, I want to try to bring all the data to the second normalization stage, a database modelling term which basically describes a database structured so that, optimally, no redundant data is present. For example, in your model it seems that if an Elvish word has multiple glosses (e.g. Q. ric-: "try, put forth effort, strive, endeavour"), all the glosses make up one record, and I'd split these up into single data elements.
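The gloss-splitting step described above can be sketched in a few lines; the record layout is hypothetical, but the idea is turning one multi-gloss record into one row per gloss.

```python
# One record with a comma-separated gloss field (hypothetical layout)...
record = {"form": "ric-", "glosses": "try, put forth effort, strive, endeavour"}

# ...split into atomic (form, gloss) rows, one gloss per row.
rows = [(record["form"], g.strip()) for g in record["glosses"].split(",")]
```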

+Arno Gourdol Thanks a lot for the link! However, I think it isn't significantly easier to extract the data from the structure your code provides compared to parsing eldamo's original XML file.

I worked on the database model a bit more today; +Lúthien Merilin gave some great input as well, thanks again for that!
As it stands now, there will be one table housing all conlang words (languages distinguished by a value in an additional column) and one table for all 'real' languages. Additionally, one table for all unaltered attested records is planned. The conlang and attested-records tables are hooked up to a relations table which allows storing any type of relation, e.g. between two words of different languages or different time periods, or hooking up a normalised word with its original attested form.
Besides that there will of course be all other kinds of tables, housing sources, word types, word categories (by semantics), and also various language-specific inflection tables, e.g. a Quenya verb conjugation chart, a Quenya noun declension chart, or a Sindarin mutation chart.
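A rough sketch of the words-plus-relations design described above, using Python's sqlite3 for illustration; all table, column, and relation-type names are placeholders, not the actual model.

```python
import sqlite3

# Placeholder schema for the generic relation table idea: any typed link
# between two word records lives in one table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conlang_word (
        id       INTEGER PRIMARY KEY,
        form     TEXT NOT NULL,
        language TEXT NOT NULL          -- e.g. 'q' or 's'
    );
    CREATE TABLE relation (
        from_id  INTEGER NOT NULL REFERENCES conlang_word(id),
        to_id    INTEGER NOT NULL REFERENCES conlang_word(id),
        type     TEXT NOT NULL          -- e.g. 'cognate', 'earlier-form'
    );
""")
conn.execute("INSERT INTO conlang_word VALUES (1, 'mellon', 's')")
conn.execute("INSERT INTO conlang_word VALUES (2, 'meldo', 'q')")
conn.execute("INSERT INTO relation VALUES (2, 1, 'cognate')")

# Follow one relation type outward from a given word.
cognates = conn.execute("""
    SELECT w2.form FROM relation r
    JOIN conlang_word w1 ON w1.id = r.from_id
    JOIN conlang_word w2 ON w2.id = r.to_id
    WHERE w1.form = 'meldo' AND r.type = 'cognate'
""").fetchall()
```

New relation kinds (time periods, normalised vs. attested forms) then only need a new value in the type column, not a schema change.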

I hope I can post a picture of this early concept of the structure tomorrow.

Again +Paul Strack, I'd be pleased to get in touch with you as I probably would like to use your XML file as first input of data.

Leonard W. Dec 13, 2016 (22:25)

The source code is available on GitHub. Since I'm importing a lot of glosses from a variety of sources, the normalization is by no means perfect, and I have been compelled to make a number of optimizations to ensure a performant search experience, but a lot of thought has gone into its design, and I would like to think that I've avoided a great deal of redundancy. That said, there's a lot of room for improvement.

As for a UML diagram of the relational database... well, full SQL dumps are available on GitHub. I'll see if there's an automatic tool I can use to turn an existing database structure into a UML diagram. - galadhremmin/Parf-Edhellen

Severin Zahler Dec 15, 2016 (09:28)

Alright, quick update again, below is the current database model.
Instead of explaining all thoughts / features of it I'll be working on some first sort of documentation.

+Leonard W. I'll try generating the model off your SQL files, got all the necessary tools at hand :)

Leonard W. Dec 15, 2016 (10:03)

I would highly discourage the use of schemas specific to Sindarin and Quenya, with columns based on our current understanding and extrapolations of the languages. Much of what we have today will change in some way or another. A typical example is the ongoing discussion about the contents of Parma Eldalamberon 22. I would therefore recommend that you define declensions, conjugations etc. in separate schemas, and then use a 1-to-many relationship with the schema containing the definitions. The same logic would apply to languages, which might be especially important considering (as an example) Sindarin's journey from gnomish > old noldorin > noldorin > > old sindarin > sindarin.

Severin Zahler Dec 20, 2016 (09:04)

Time for another quick update!

First off, however, thanks a lot for your input, +Leonard W.! I am very well aware of the ever-changing nature of such things; the main reason why I initially went for having the declension/conjugation names as fixed column names was that if any of these change, the underlying content will probably change as well, so it does not matter all that much; and just as one can use MySQL statements to alter the content, there's also ALTER TABLE to change the column names.

I talked about this with my technical supporter, and the idea he brought up was to use views. Views basically let you pre-define aliases for certain (parts of) tables. Thus what I can do is prepare a VIEW statement for each kind of inflexion (e.g. Quenya noun, Sindarin verb), with the specific elements (nominative, accusative...) fixed in that statement, and then have a sort of foreign key for each of these in the generic inflexion table (as you suggest) which contains all kinds of inflected words.
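The view idea can be sketched like this, using SQLite via Python for convenience (MySQL's CREATE VIEW works essentially the same way); all table, column, and category names are placeholders.

```python
import sqlite3

# Sketch: one generic inflexion table, plus a per-language, per-part-of-speech
# view that pins the case names. Renaming a case later only means redefining
# the view, not altering the base table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inflexion (
        word      TEXT NOT NULL,
        category  TEXT NOT NULL,   -- e.g. 'q-noun-nom', 'q-noun-gen'
        form      TEXT NOT NULL
    );
    CREATE VIEW quenya_noun AS
        SELECT word, form AS nominative
        FROM inflexion WHERE category = 'q-noun-nom';
""")
conn.execute("INSERT INTO inflexion VALUES ('cirya', 'q-noun-nom', 'cirya')")
nom = conn.execute(
    "SELECT nominative FROM quenya_noun WHERE word = 'cirya'").fetchone()
```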

Progress-wise, I have found my way into using JDOM2, a Java library to read (and write) XML files; using that I am now extracting the eldamo data to a set of .csv files.

I won't lie, I am pretty focussed on using the eldamo data as a rather central part of my project; while the license +Paul Strack released it under does not conflict with this, I'd still be very interested in getting in touch with you, also regarding whether it might be interesting to have a fixed way to import new eldamo data in the future. Unfortunately I could not find any other way to contact you than through this means here :I So, if I don't manage to reach you soonish, just be assured that I will of course give all due credit to the incredible work you've compiled (that frikkin' XML file has 263891 lines :O)