[accepted, implemented] Context-free Grammar for unit names

skeptical_troll · Post by **skeptical_troll** » April 8th, 2016, 7:05 pm

Just out of curiosity, is the current algorithm based on MCMC on letters, not syllables or blocks? If the length is the problem, isn't it possible to just include it in the likelihood by hand so that the probability of long names goes down?

Spixi · Post by **Spixi** » April 8th, 2016, 7:17 pm

skeptical_troll wrote: Just out of curiosity, is the current algorithm based on MCMC on letters, not syllables or blocks? If the length is the problem, isn't it possible to just include it in the likelihood by hand so that the probability of long names goes down?

You can find the current implementation in /src/race.cpp.

I wonder if there is already an existing library which does exactly the opposite of what GNU flex does.

Dugi · Post by **Dugi** » April 8th, 2016, 7:49 pm

skeptical_troll wrote: Just out of curiosity, is the current algorithm based on MCMC on letters, not syllables or blocks? If the length is the problem, isn't it possible to just include it in the likelihood by hand so that the probability of long names goes down?

In case you are too lazy to look it up: it takes a pair of letters to pick what will follow. It can be configured to use the last triplet or more, but that is not used anywhere as far as I know.
I have tried the algorithm with the next letter determined from the previous single letter, but it sucked.
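
For readers who do not want to dig through the source, here is a minimal Python sketch of a letter-based Markov generator of the kind described above, where the previous two letters decide the next one. It is not the code from /src/race.cpp; the function names, the `^`/`$` padding symbols and the `max_length` cap are assumptions made for illustration, and the training names are borrowed from the German example later in this thread.

```python
import random
from collections import defaultdict

def train(names, order=2):
    """Map every `order`-letter context to the letters seen right after it."""
    table = defaultdict(list)
    for name in names:
        padded = "^" * order + name + "$"   # "^" marks the start, "$" the end
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table

def generate(table, order=2, max_length=12, rng=random):
    """Walk the table letter by letter until "$" is drawn or the cap is hit."""
    context, result = "^" * order, ""
    while len(result) < max_length:         # hypothetical cap against loops
        letter = rng.choice(table[context])
        if letter == "$":
            break
        result += letter
        context = (context + letter)[-order:]
    return result

table = train(["Edwin", "Reinhold", "Friedrich", "Winfried"])
print(generate(table))
```

Calling `train(names, order=1)` and `generate(table, order=1)` gives the single-letter variant mentioned above, which has much less context to work with.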

EDIT:
Here are some improved grammars. The average number of recruits until a pair of namesakes appears is significantly better than before (no experiment was made because it can be estimated mathematically; I am adding rough estimates):
Male elves (around 70)
Female elves (around 70)
Male humans (around 80)
Female humans (around 80)
Orcs (around 90)
Other grammars were already better than the current Markov generator.
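
The post does not say how the estimates were computed. One plausible way, sketched here in Python, is the classic birthday-problem calculation under the assumption that the generator picks uniformly among `n` distinct names. That uniformity is an assumption, and both function names are made up for this sketch; a generator that favours some names will produce namesakes sooner.

```python
import math

def expected_draws_until_repeat(n):
    """Expected number of recruits until the first namesake, uniform over n names."""
    expected, all_distinct = 0.0, 1.0
    for k in range(n + 1):
        expected += all_distinct            # P(the first k names are all distinct)
        all_distinct *= (n - k) / n         # extend to k + 1 distinct names
    return expected

def approximate_draws_until_repeat(n):
    """Standard large-n approximation: sqrt(pi * n / 2) + 2/3."""
    return math.sqrt(math.pi * n / 2) + 2 / 3

for n in (192, 1000, 3000):
    print(n, round(expected_draws_until_repeat(n), 1),
          round(approximate_draws_until_repeat(n), 1))
```

Under this assumption, an average of around 70 recruits corresponds to roughly 3,000 equally likely names.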

Some further improvements were made, and the pull request was accepted and will be a part of Wesnoth 1.13.5 and later.

Post by **GunChleoc** » January 27th, 2017, 9:43 am

I have finally been playing with translating the name generation. I am running into a few difficulties with the town names:

I have prefixed the base names with "XXX", but I get names with "XX", "XXXX" or "XXXXXX" in them. This should not happen - I should see "XXX" only here.
I have prefixed the rule-generated base names with "NOCOM". I don't see any of those at all.
Segmentation of base names is broken. Seems like blank space is used in addition to "," for parsing the names into a list, resulting in nonsense names.
There are town names that consist of base names only. Since my base names need to be in the genitive case for the composition rules, I need to get rid of pure base town names without any prefixes. I haven't found a way to do that.

I am attaching the current state of my translation file for the wesnoth textdomain, gd locale.

Dugi · Post by **Dugi** » January 29th, 2017, 10:50 am

Hello,

All mentions that I have contributed something at all were removed from my forum profile, so this is not my responsibility any more.

However, because it's you, I will give you some advice. The implementation tries to find the name generator. If it fails to find it, it falls back to the old Markov chains. Markov chain generated names work like this: if you add XXX before all base names, the result may have a random number of X letters in it. You have prefixed the context-free grammar generated names of villages with NOCOM, but the code is messed up (the second line is missing a newline), so the parsing fails and falls back to the old method. I do not know how the old name system is implemented, so I have no idea what can be broken in it and cause the segmentation issue.

HTH,
-Dugi

Post by **GunChleoc** » January 29th, 2017, 4:16 pm

Thanks, Dugi. Seems like that's not the only thing that's broken in my code though, so I need to find a way to really debug this thing.

Dugi · Post by **Dugi** » January 29th, 2017, 8:25 pm

You can use my website to debug it (most of those links I have posted link to that). I have expanded its functionality since then, but you probably won't come across any of the new syntactic features.

You may need to add that \n at the end of each line. The code needs the newlines to be there, but I am not sure how the translation files deal with newlines.
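
To illustrate why a single missing newline can break everything, here is a small Python sketch of a line-based rule parser. It is not Wesnoth's parser: the `name=alternative|alternative` layout with `{name}` references, and both function names, are assumptions made for illustration only.

```python
import re

def parse_rules(text):
    """Read one `name=alt|alt|...` rule per line (assumed, simplified format)."""
    rules = {}
    for line in text.split("\n"):
        if not line.strip():
            continue
        name, separator, body = line.partition("=")
        if not separator:
            raise ValueError(f"not a rule: {line!r}")
        rules[name.strip()] = body.split("|")
    return rules

def undefined_symbols(rules):
    """Names referenced as {name} in some rule but never defined."""
    used = {ref for alternatives in rules.values()
            for alternative in alternatives
            for ref in re.findall(r"\{(\w+)\}", alternative)}
    return used - set(rules)

good = "main={prefix}{suffix}\nprefix=Al|Ed\nsuffix=win|bert\n"
bad = "main={prefix}{suffix}prefix=Al|Ed\nsuffix=win|bert\n"  # first \n missing

print(undefined_symbols(parse_rules(good)))  # nothing is missing
print(undefined_symbols(parse_rules(bad)))   # the "prefix" rule was swallowed
```

In the second input, the `prefix` rule is glued onto the end of `main`, so `main` refers to a rule that no longer exists. A generator built on such rules has nothing valid to expand and would have to fall back to something else.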

Post by **GunChleoc** » January 30th, 2017, 12:07 pm

Thanks, that is a very helpful tool! It's working via the website now, but not in Wesnoth.

I compiled Wesnoth on my Linux box and added some debug output. I spent a few hours digging into the code and it seems like calling generate() for "main" always returns an empty string, even for English. This means that the $base variable is then filled by the Markov generator.

So, this is definitely a bug in the context free generator in Wesnoth, but I have no idea what's wrong with it yet.

Post by **GunChleoc** » January 31st, 2017, 11:53 am

I found the bug

https://github.com/wesnoth/wesnoth/pull/921

I am still getting pure base names though.

Wussel · Post by **Wussel** » September 1st, 2018, 11:23 am

Spixi wrote (April 8th, 2016, 6:37 pm): The problem with Markov chains is that there may be loops or dead ends, which can cause very long or very short names.

This small example shows what I mean:

Given are the following names:
LILA
ANNE
ALENA

This produces the following Markov chain:
<start> -> { A, A, L }
A -> { <end>, <end>, L, N }
E -> { <end>, N }
I -> { L }
L -> { A, E, I }
N -> { A, E, N }

The probability of generating the name "A" is 1/3, because 2/3 of all names start with A and A is followed by <end> half of the time.
The likelihood that a name which contains an N contains at least three Ns in a row is at least (1/3)^2 = 1/9, because each N is followed by another N a third of the time. This makes names like "ANNNA" quite common.
If a name contains an I, it will contain at least four characters, because it has to contain the path L -> I -> L -> {A, E, I}.
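
The chain and the probabilities in this example are easy to check by machine. The following Python sketch rebuilds the first-order transition table from the three names and computes the exact probability of a given name; the function names and the `START`/`END` constants are made up for this sketch.

```python
from collections import defaultdict
from fractions import Fraction

START, END = "<start>", "<end>"

def build_chain(names):
    """Collect, for every symbol, the list of symbols observed right after it."""
    chain = defaultdict(list)
    for name in names:
        symbols = [START, *name, END]
        for current, following in zip(symbols, symbols[1:]):
            chain[current].append(following)
    return dict(chain)

def name_probability(chain, name):
    """Exact probability that a random walk through the chain spells `name`."""
    probability = Fraction(1)
    symbols = [START, *name, END]
    for current, following in zip(symbols, symbols[1:]):
        successors = chain.get(current, [])
        if not successors:
            return Fraction(0)
        probability *= Fraction(successors.count(following), len(successors))
    return probability

chain = build_chain(["LILA", "ANNE", "ALENA"])
for symbol in sorted(chain):
    print(symbol, "->", sorted(chain[symbol]))
print('P("A") =', name_probability(chain, "A"))
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions worked out by hand.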

We conclude that names usually do not follow Markov chains. Many names are based on context-free grammars, however. This example shows a simple grammar for old German names:

NAME = {PREFIX} + {SUFFIX}
PREFIX = "A", "Al", "Bal", "Ed", "Eg", "Frie", "Gott", "Hein", "Hin", "Rein", "Sig", "Ul", "Wil", "Win", "Wal", "Wol"
SUFFIX = "bert", "dolf", "drich", "dulin", "dur", "fried", "helm", "hold", "lieb", "ram", "rich", "win"

Example names are: Edwin, Reinhold, Friedrich and Winfried.
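
A generator for this grammar takes only a few lines. The Python sketch below stores each rule as a list of alternatives and expands `{RULE}` references recursively. Writing NAME as the template `"{PREFIX}{SUFFIX}"` and the function name `expand` are choices made for this sketch, not something taken from an existing implementation.

```python
import random
import re

GRAMMAR = {
    "NAME": ["{PREFIX}{SUFFIX}"],
    "PREFIX": ["A", "Al", "Bal", "Ed", "Eg", "Frie", "Gott", "Hein", "Hin",
               "Rein", "Sig", "Ul", "Wil", "Win", "Wal", "Wol"],
    "SUFFIX": ["bert", "dolf", "drich", "dulin", "dur", "fried", "helm",
               "hold", "lieb", "ram", "rich", "win"],
}

def expand(symbol, rng=random):
    """Pick one alternative for `symbol` and expand every {RULE} inside it."""
    template = rng.choice(GRAMMAR[symbol])
    return re.sub(r"\{(\w+)\}", lambda match: expand(match.group(1), rng), template)

for _ in range(5):
    print(expand("NAME"))
```

With 16 prefixes and 12 suffixes, this grammar yields 192 combinations. No rule refers back to itself, so every expansion terminates.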

As you can see, this would generate names of better quality than the current implementation.

That would be exactly how it should be. I remember using this for pen and paper RPGs in the late '80s: making lists in Excel and using them for NPCs. How many more years will it take for Wesnoth to catch up?

Post by **Tad_Carlucci** » September 1st, 2018, 2:41 pm

alalalalalalalalalalalalalalalalalalalalalalalal

Good recognizer, lousy generator. That probably answers why we're not "with it" .. we like junk to work and not cause infinite loops or other silliness.

Wussel · Post by **Wussel** » March 9th, 2020, 11:26 am

Ok, how do I use this? Is it used now as standard for all names? Is it part of mainline?

Post by **GunChleoc** » March 9th, 2020, 4:39 pm

It's in mainline, in wesnoth.po.

You can test your rules at Dugi's site: https://www.physics.muni.cz/~dugi/index.fcgi/cfggen

Wussel · Post by **Wussel** » March 9th, 2020, 4:48 pm

https://wiki.wesnoth.org/Context-free_grammar
Found it. Now trying to make random banter in messages like the examples. Is that possible? Please help with the WML syntax. Name generation I managed on my own by copying mainline.

The Battle for Wesnoth Forums
