So, after the Christmas break I am back working on my conlang database project. Currently I am developing a Java program which dissects the data from Eldamo, sorts and arranges it into the tables, and ultimately will load it directly into the database. As an intermediate development / debugging step I am writing the tables to CSV files for easy verification of the code.
+Paul Strack As I am doing this I am stumbling across some of the (so far) very few errors in the Eldamo file. I just extracted the "speech" attribute of all word pages (in order to create a word type table), and there are a few things I found odd or noteworthy:
- There are a few combined types that appear in two different orders; I presume these are not intentionally different from each other:
-- "n adv" vs. "adv n"
-- "n adj" vs. "adj n"
-- "conj adv" vs. "adv conj"
-- "adv prep" vs. "prep adv"
-- "adv interj" vs. "interj adv"
- "card" appears once; in all other occurrences you used "cardinal": http://eldamo.org/content/words/word-4008647877.html
- Similarly for "suf" vs. "suffix": here the latter appears only once: http://eldamo.org/content/words/word-3792660851.html
- Similarly for "pref" vs. "prefix": "prefix" is used 4 times in total.
- There's one occurrence of "v" instead of "vb": http://eldamo.org/content/words/word-529403071.html
- For G. siriol "flowing" the fields have been badly mixed up: "sindi" appears as the word and "siriol" as the word type: http://eldamo.org/content/words/word-2199144701.html
- There are two words that are missing the speech attribute, even though the XSD file states there should always be one:
-- http://eldamo.org/content/words/word-4199786153.html
-- http://eldamo.org/content/words/word-2768164367.html
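A check like the one above could be sketched in Java roughly as follows. This is only a minimal sketch, not the actual program: the class name `SpeechCheck` and the alias table are assumptions, with each alias pointing from the rarer spelling to the more common one as reported in the list. Tokens are sorted alphabetically purely so that "n adv" and "adv n" compare equal; that order is not meant as the preferred spelling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class SpeechCheck {
    // Hypothetical alias table: rarer spelling -> more common spelling.
    static final Map<String, String> ALIASES = Map.of(
            "card", "cardinal",
            "suffix", "suf",
            "prefix", "pref",
            "v", "vb");

    /** Replaces alias tokens and sorts the tokens, for comparison only. */
    static String canonical(String speech) {
        List<String> tokens = new ArrayList<>();
        for (String token : speech.trim().split("\\s+")) {
            tokens.add(ALIASES.getOrDefault(token, token));
        }
        Collections.sort(tokens);
        return String.join(" ", tokens);
    }

    /** Groups raw values by canonical form and keeps groups with more than one spelling. */
    static Map<String, TreeSet<String>> variants(List<String> rawValues) {
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        for (String raw : rawValues) {
            groups.computeIfAbsent(canonical(raw), k -> new TreeSet<>()).add(raw);
        }
        groups.values().removeIf(spellings -> spellings.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("n adv", "adv n", "card", "cardinal", "vb", "n");
        System.out.println(variants(sample));
    }
}
```

Every group that survives is a candidate inconsistency; whether it really is one still needs a human look.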
I hope this helps to fix some things :) For your convenience, here's the entire output I got from my program: http://pastebin.com/5kSJpf6j
I will also try to add a few additional checks to my code which may help to find a few more of these minor flaws :)
Severin Zahler Jan 11, 2017 (14:09)
These are:
- "bel"
- "dor"
- "fal"
- "edan"
- "sol"
- "ln"
- "eon"
- "lon"
- "oss"
(All language tags used in the
Severin Zahler Jan 11, 2017 (16:25)
Paul Strack Jan 12, 2017 (02:06)
I haven't done validations on the parts of speech for a while, so the XSD specifications and the XML data may be out of sync. I will clean it when I have time. I added a github issue so I won't forget:
github.com - Fix various data validation issues · Issue #7 · pfstrack/eldamo
Regarding the ref language codes that do not appear in the languages list, those are references that are marked as one language that I put under words classified as a different language. There are various reasons why I did this.
"dor" and "fal" are dialects of Ilkorin (ilk), namely Doriathrin and Falathrim. "bel" is a late variation on Ilkorin that Tolkien labeled Beleriandric. "oss" and "edan" are variants of Danian (dan), namely Ossriandric and East Danian.
"sol" is Solosimpi which I lumped in with Early Telerin (et). "eon" is Early Old Noldorin which I lumped in with Early Noldorin though I may separate it in the future when I get around to analyzing it. "ln" is Late Noldorin, for words labeled "Noldorin" by Tolkien in the transitional period between Noldorin and Sindarin.
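For a loader that needs to resolve these ref language codes, the explanation above could be captured in a small lookup table. The following Java sketch is an assumption-laden illustration: the class name `RefLanguages` is made up, it assumes each variant is filed under the language it is described as belonging to, and only codes whose parent code is spelled out above ("ilk", "dan", "et") are included. "eon", "ln" and "lon" are deliberately left out because no parent code is given for them here.

```java
import java.util.Map;
import java.util.Optional;

final class RefLanguages {
    // Ref language code -> code of the language it is grouped under,
    // as far as the explanation above states it.
    static final Map<String, String> PARENT = Map.of(
            "dor", "ilk",   // Doriathrin
            "fal", "ilk",   // Falathrim
            "bel", "ilk",   // Beleriandric
            "oss", "dan",   // Ossriandric
            "edan", "dan",  // East Danian
            "sol", "et");   // Solosimpi

    /** Returns the parent language code, or empty if the code is not in the table. */
    static Optional<String> parentOf(String refLanguage) {
        return Optional.ofNullable(PARENT.get(refLanguage));
    }

    public static void main(String[] args) {
        for (String code : new String[] {"dor", "edan", "lon"}) {
            System.out.println(code + " -> " + parentOf(code).orElse("(not mapped)"));
        }
    }
}
```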
Severin Zahler Jan 12, 2017 (08:06)
The code "lon", is that "Late Old Noldorin"?
Paul Strack Jan 12, 2017 (15:43)
Severin Zahler Jan 13, 2017 (17:16)
pastebin.com - SD/421.30061 SD/421.30063 SD/422.08101 SD/421.30065 SD/422.08071 SD/422.080 - Pastebin.com
Asking as I am storing all bits of information, i.e. page, line number and word position, in a separated form.
Also there are various sources with a format like PE13/155.9901-1; what does the "-1" mean?
Paul Strack Jan 14, 2017 (02:54)
Beyond the page number, I think the most I can guarantee is that the identifier should be unique.
Severin Zahler Jan 14, 2017 (11:07)
As I read it now, the first four digits are the line number plus word position as usual, and the last digit is the variant: often just "5" or "1", and possibly higher digits if more variants exist.
A few single ones that still puzzle me though:
PE13/147.30310: This is the only one with digit "0" at the end.
QL/060.70ive: has letters instead of digits.
TI/310.0034091: has a lot of digits.
PE19/093.20 13: has a space in between.
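To make the reading above concrete, here is a minimal Java sketch of a tolerant parser. It encodes only my interpretation (book, page, four digits of line number plus word position, an optional variant digit, an optional "-n" suffix), not a documented format, and the class and field names are made up for illustration. Anything that does not fit returns an empty result so it can be listed for manual review, which is how odd cases like the ones above would surface.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SourceRef {
    // Assumed shape: book "/" page "." 4 position digits [variant digit] ["-" number]
    private static final Pattern FORMAT =
            Pattern.compile("([A-Za-z0-9]+)/(\\d+)\\.(\\d{4})(\\d)?(?:-(\\d+))?");

    final String book;
    final String page;      // kept as text so leading zeros survive
    final String position;  // line number + word position, not split further
    final String variant;   // null if absent
    final String suffix;    // the part after "-", null if absent

    private SourceRef(String book, String page, String position, String variant, String suffix) {
        this.book = book;
        this.page = page;
        this.position = position;
        this.variant = variant;
        this.suffix = suffix;
    }

    /** Parses a reference, or returns empty if it does not match the assumed shape. */
    static Optional<SourceRef> parse(String ref) {
        Matcher m = FORMAT.matcher(ref);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new SourceRef(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)));
    }

    public static void main(String[] args) {
        String[] samples = {"SD/421.30061", "PE13/155.9901-1", "QL/060.70ive", "PE19/093.20 13"};
        for (String s : samples) {
            System.out.println(s + " -> " + (parse(s).isPresent() ? "ok" : "needs review"));
        }
    }
}
```

The position is kept as one four-digit string on purpose, since I don't know where the line number ends and the word position begins.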
But forgive me for labelling so many things in the data as probably erroneous; I absolutely don't want to put you to shame, not at all! I just want to make sure I don't misinterpret this valuable data, and I also hope to help you discover the very few inconsistencies in this massive amount of data. Actually, considering that you seemingly arranged all of this manually, it is incredibly consistent!
Paul Strack Jan 14, 2017 (17:43)
Severin Zahler Jan 16, 2017 (11:29)
Now that I've extended the code to specifically check the length of the line number + word position part, I stumbled over a handful of sources which only have 3 digits. As there are only very few of those, I assume these are not intentional:
- RGEO/63.030 ("A Elbereth Gilthoniel")
- LotR/0429.001 ("Methedras")
- LR/317.002 ("iChúrinien")
- S/154.001 ("Grond")
Also I think you've overlooked my remark on a few source reference prefixes ("books", to avoid confusion) which are not listed; I'm not familiar with all abbreviations, so I can't wholly judge which are just missing from the sources list and which may be typos, here they are again (all appear uniquely):
- VT32/07.1106
- LT/192.3006
- LT/132.0212
- PEE/17.35
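The prefix check itself is simple; a minimal Java sketch of the idea is below. The class name `BookPrefixCheck` is hypothetical, and the set of known books in `main` is only a stand-in for the real sources list, which would have to be read from the data file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BookPrefixCheck {
    /** Returns the part before the first "/", or the whole string if there is none. */
    static String prefixOf(String ref) {
        int slash = ref.indexOf('/');
        return slash < 0 ? ref : ref.substring(0, slash);
    }

    /** Returns the references whose book prefix is not in the known set, in input order. */
    static List<String> unlisted(List<String> refs, Set<String> knownBooks) {
        List<String> result = new ArrayList<>();
        for (String ref : refs) {
            if (!knownBooks.contains(prefixOf(ref))) {
                result.add(ref);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the real sources list.
        Set<String> known = new TreeSet<>(Arrays.asList("PE13", "SD", "LotR"));
        List<String> refs = Arrays.asList("SD/421.30061", "PEE/17.35", "LT/192.3006");
        System.out.println(unlisted(refs, known));
    }
}
```

This only tells me that a prefix is not listed; it cannot tell a typo from a book that is simply missing from the list.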
Paul Strack Jan 16, 2017 (18:17)
It takes a long time for a new full build of Eldamo, but I'm finding this process very fruitful, so I've checked a temporary data file with the modifications into Github here:
github.com - eldamo
[EDIT] When I clicked on the download button in my browser, it tried to load the xml file into my browser window, so you may want to right-click on the button and choose "Save As..."