Hi, thanks for the great work! I'd like to generate my own layout from my personal corpus (emails, code, LaTeX reports, etc., in both French and English). I tried to adapt the code, but there are too many parts I don't understand well enough, and some hard-coded data seems to be defined in several places, so I never manage to end up with a layout of my own.
Would it be possible to adapt the code so that interested users only have to provide a few arrays at the beginning of the notebook and then run every cell to get candidate layouts at the end? In principle, a table of letter frequencies and a table of bigram frequencies should be enough, right?
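On the corpus side, both tables can be produced with a short script. The sketch below is only an illustration and is not part of Engram: `count_frequencies` and `strip_accents` are hypothetical helpers. It assumes accents should be folded away (French "é" counts as "E", while characters with no Unicode decomposition, such as "œ", are simply dropped) and that a bigram is any two adjacent letters, so spaces and punctuation break a pair. It returns lists of (item, count) pairs sorted by count, the same format as the arrays in the example below.

```python
from collections import Counter
import string
import unicodedata

LETTERS = set(string.ascii_uppercase)

def strip_accents(text):
    """Hypothetical helper: decompose accented characters (e.g. 'é' -> 'e' + accent)
    and drop the combining marks so text maps onto A-Z."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def count_frequencies(texts):
    """Hypothetical helper: count letters and adjacent-letter bigrams over an
    iterable of strings; returns two lists of (item, count), most frequent first."""
    letter_counts = Counter()
    bigram_counts = Counter()
    for text in texts:
        chars = strip_accents(text).upper()
        for ch in chars:
            if ch in LETTERS:
                letter_counts[ch] += 1
        # Only pairs where both characters are letters count as bigrams
        for a, b in zip(chars, chars[1:]):
            if a in LETTERS and b in LETTERS:
                bigram_counts[a + b] += 1
    return letter_counts.most_common(), bigram_counts.most_common()

# Tiny demo corpus; in practice, pass the contents of your mails, code and LaTeX files
letters, bigrams = count_frequencies(["Été", "the tin"])
```

If only 24 letters go on the main keys, as the name `my_24letters` suggests, one option would be to keep `letters[:24]`, though which letters to leave out is a design decision of its own.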
As an example, this is what I did with my own letter and bigram frequencies:
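```python
import numpy as np

# Letter counts from my corpus, sorted most to least frequent (list truncated here)
my_24letters = [('E', 1286273), ('T', 911921), ('I', 785967), ('A', 767995), ... ]
# Bigram counts from the same corpus, also sorted (list truncated here)
my_bigrams = [('IN', 178498), ('ON', 149623), ('TH', 134033), ('TI', 132851), ('RE', 131569), ... ]

# Split the (item, count) pairs into parallel tuples
letters24, instances24 = list(zip(*my_24letters))
max_frequency = instances24[0]  # largest count, since the list is sorted
bigrams_arr, bigram_freqs_arr = list(zip(*my_bigrams))
bigrams = np.array(bigrams_arr)
bigram_frequencies = np.array(bigram_freqs_arr)
```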
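One plausible next step is to turn these arrays into a letter-by-letter bigram matrix, the kind of corpus-dependent input a layout-scoring routine could consume. This sketch does not reproduce Engram's notebook: `bigram_matrix`, `score_layout`, the `key_factors` matrix of key-pair comfort values, and the toy numbers are all hypothetical, chosen only to illustrate how the corpus data could be kept separate from everything keyboard-specific.

```python
import numpy as np

def bigram_matrix(letters, bigrams, bigram_frequencies):
    """Hypothetical helper: M[i, j] = frequency of letters[i] followed by letters[j],
    scaled so the most frequent bigram is 1.0. Bigrams using other letters are skipped."""
    index = {letter: i for i, letter in enumerate(letters)}
    matrix = np.zeros((len(letters), len(letters)))
    for bigram, freq in zip(bigrams, bigram_frequencies):
        first, second = bigram[0], bigram[1]
        if first in index and second in index:
            matrix[index[first], index[second]] = freq
    peak = matrix.max()
    return matrix / peak if peak > 0 else matrix

def score_layout(freq_matrix, key_factors, positions):
    """Hypothetical score: sum of freq_matrix[i, j] * key_factors[positions[i], positions[j]],
    where positions[i] is the key index assigned to letters[i]."""
    positions = np.asarray(positions)
    return float(np.sum(freq_matrix * key_factors[np.ix_(positions, positions)]))

# Toy example with 3 letters: 'TI' and 'IN' counts come from the post, 'TE' is made up
letters = ('E', 'T', 'I')
bigrams = np.array(['TI', 'TE', 'IN'])
bigram_frequencies = np.array([132851, 40000, 178498])
M = bigram_matrix(letters, bigrams, bigram_frequencies)  # 'IN' is skipped: no 'N' here
key_factors = np.ones((3, 3))  # placeholder: every key pair equally comfortable
score = score_layout(M, key_factors, positions=[0, 1, 2])
```

With this split, only `letters24`, `bigrams` and `bigram_frequencies` depend on the corpus, which is roughly the kind of parametric entry point the question asks for.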
I don't know where to go from there, but I think making the code fully parametric would greatly broaden Engram's reach.