Abstract Proclaiming the Development and Dissemination of the Haroldic Universal Alphabet

Harold Horseface and Pig Milligan

 

Purpose

The world today is divided. One of the biggest forces perpetuating this division is normative alphabet. The various alphabet-industrial complexes enforce adherence to their letters and keep us as global citizens from understanding each other. A change is urgently needed. However, to truly break down the arbitrary barriers that divide the world, reform from within existing alphabet continua is not sufficient.

Enter the Haroldic Universal Alphabet.

The purpose of the Haroldic Universal Alphabet (HUA) is to provide an alphabet that any literate person can read, regardless of what language they are literate in. This has been achieved through the construction of new letters that represent the average forms of existing letters. This abstract will introduce you to these new letters and explain how they were constructed.

Raw Alphabetic Material

Each new letter was hand-crafted from an agglomeration of its fellow letters. The scripts from which these letters were taken are as follows:

Alphabets: Adlam, Armenian, Coptic, Cyrillic, Georgian, Glagolitic, Greek, Hangul, Hanifi, Kayah Li, Latin, Mongolian, N'Ko, Ol Chiki, Old Hungarian, Orkhon (including Yeniseian variants), Osmanya, ‘Phags-pa, Runic, Tifinagh, Warang Chiti

Abjads: Arabic, Hebrew, Syriac

Abugidas: Balinese, Bengali-Assamese, Canadian Aboriginal, Devanagari, Dogra, Ge'ez, Gujarati, Javanese, Kannada, Khmer, Lao, Lepcha, Limbu, Lontara, Malayalam, Meitei, Mon-Burmese, Odia, Sinhala, Sundanese, Tamil, Telugu, Thaana, Thai, Tibetan

Syllabaries: Bopomofo, Cherokee, Japanese (Hiragana and Katakana), Kikakui, Vai

To gather phonemic values for the letters, 502 languages were analyzed. These include the official languages of every country in the world, as well as others included for linguistic diversity. The full list of languages can be found in Appendix B.

Some languages have multiple alphabets, most of which use different scripts. All available alphabets for the chosen languages were analyzed. Additionally, miscellaneous letters were gathered for many of the scripts. In total, 724 alphabetic entities were analyzed for the creation of this alphabet.

Letter Weight Calculation

As previously stated, the letters of the HUA are designed to be averages of existing letters. The reckoning of these letters was achieved by calculating weighted averages that represent each phoneme in American English. The weight of each letter in these averages is the product of that letter’s frequency for a phoneme and a specially calculated script weight.

The letter frequencies were determined through analysis of every entity for a given script. The number of entities with a certain phoneme and the letters representing that phoneme were recorded. The frequency with which a certain letter represented that phoneme was then calculated.
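As a minimal sketch of this frequency step, the snippet below counts how often each letter stands for a phoneme among the entities of one script that contain it. The function name and the input format (one letter per entity) are assumptions made for illustration, not the creators' actual tooling; the counts are the Cyrillic /ə/ figures quoted in the worked example of this abstract.

```python
from collections import Counter

def letter_frequencies(letters_used):
    """Return each letter's share among the entities that write the phoneme.

    letters_used holds one letter per entity (hypothetical input format).
    """
    counts = Counter(letters_used)
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}

# Cyrillic entities containing /ə/: 2 use А, 2 use Ә, 3 use Ъ, 9 use Ы, 2 use Э.
cyrillic_schwa = ["А"] * 2 + ["Ә"] * 2 + ["Ъ"] * 3 + ["Ы"] * 9 + ["Э"] * 2
freqs = letter_frequencies(cyrillic_schwa)

for letter, share in freqs.items():
    print(f"{letter}: {share:.1%}")
```

Note that the denominator is the number of entities containing the phoneme (18 here), not the total number of entities in the script (77).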

The script weights for each phoneme consist of three component sub-weights:

Component A: 1/np, where np is the number of scripts containing a specific phoneme.

Component B: xps/∑xp, where xps is the proportion of entities in the given script that contain the phoneme, and ∑xp is the sum of all such proportions across all scripts.

Component C: nps/npt, where nps is the number of entities that use the given script and contain the specific phoneme, and npt is the total number of entities containing the phoneme across all scripts.

Each component favors scripts with different characteristics. Component A weights each script containing the phoneme equally; Component B favors scripts that are more strongly associated with the phoneme, that is, scripts where a larger share of entities contain the phoneme; and Component C favors scripts with larger numbers of entities containing the phoneme. The components were themselves weighted to allow the most prominent scripts for a phoneme to take precedence, while still giving all scripts strong representation. The final script weight formula is thus:

W = 0.5(A) + 0.25(B) + 0.25(C)

The final script weight for each phoneme was multiplied by the letter frequencies in that script for the same phoneme. The resulting values were the final letter weights used for reckoning the letters of the HUA.
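The three components and their combination can be sketched for all scripts at once, as below. This is an illustrative reading of the definitions above, not the creators' own code: the function name, the input format, and the three imaginary scripts with their counts are all made up, and the sketch assumes every entity is counted under exactly one script.

```python
def script_weights(entity_counts):
    """Compute W = 0.5(A) + 0.25(B) + 0.25(C) for each script with the phoneme.

    entity_counts maps script name -> (total entities, entities with the phoneme).
    This input format is an assumption made for the sketch.
    """
    # Only scripts with at least one entity containing the phoneme take part.
    present = {s: (total, hits) for s, (total, hits) in entity_counts.items() if hits > 0}
    n_p = len(present)                                     # scripts containing the phoneme
    shares = {s: hits / total for s, (total, hits) in present.items()}
    sum_x_p = sum(shares.values())                         # sum of proportions over scripts
    n_pt = sum(hits for _, hits in present.values())       # entities with the phoneme overall

    weights = {}
    for script, (_, hits) in present.items():
        a = 1 / n_p                        # Component A: equal share per script
        b = shares[script] / sum_x_p       # Component B: strength of association
        c = hits / n_pt                    # Component C: share of all entities
        weights[script] = 0.5 * a + 0.25 * b + 0.25 * c
    return weights

# Made-up counts for three imaginary scripts (not data from this abstract).
toy = {"ScriptX": (10, 5), "ScriptY": (4, 4), "ScriptZ": (6, 0)}
weights = script_weights(toy)
print(weights)
```

Under the one-script-per-entity assumption, each component sums to 1 across the scripts containing the phoneme, so the script weights for a phoneme also sum to 1. After multiplication by the letter frequencies, which sum to 1 within each script, the final letter weights for a phoneme sum to 1 as well.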

As an example, the weighting process of the phoneme /ə/ in the Cyrillic alphabet is shown below:

Letter Frequencies:

Out of 77 entities using Cyrillic, 18 contain /ə/. 2 entities use the letter А to represent /ə/, 2 use Ә, 3 use Ъ, 9 use Ы, and 2 use Э.

fəCА = 2/18 = 11.1%

fəCӘ = 2/18 = 11.1%

fəCЪ = 3/18 = 16.7%

fəCЫ = 9/18 = 50.0%

fəCЭ = 2/18 = 11.1%

Script weighting of Cyrillic alphabet for phoneme /ə/:

Out of 54 total scripts, 24 have entities containing /ə/. nə = 24

Out of 77 entities using Cyrillic, 18 contain /ə/. nəC = 18. xəC = 18/77 = 0.234

Performing similar calculations for all scripts gives the value ∑xə = 11.835

Across all scripts, 156 entities contain /ə/. nət = 156  

AəC = 1/nə = 1/24 = 0.042

BəC = xəC/∑xə = 0.234/11.835 = 0.020

CəC = nəC/nət = 18/156 = 0.115

WəC = 0.5(AəC) + 0.25(BəC) + 0.25(CəC) = 0.5(0.042) + 0.25(0.020) + 0.25(0.115) = 0.055

Determining Final Letter Weights

WəC × fəCА = 0.055 × 0.111 = 0.6%

WəC × fəCӘ = 0.055 × 0.111 = 0.6%

WəC × fəCЪ = 0.055 × 0.167 = 0.9%

WəC × fəCЫ = 0.055 × 0.500 = 2.7%

WəC × fəCЭ = 0.055 × 0.111 = 0.6%
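The arithmetic of this example can be re-checked with a short script. It uses only the figures quoted above (24 scripts, 18 of 77 Cyrillic entities, ∑xə = 11.835, 156 entities overall, and the five letter counts); the variable names are ours, chosen for illustration.

```python
# Figures quoted in the worked example for /ə/ in Cyrillic.
n_schwa = 24          # scripts with entities containing /ə/
x_cyr = 18 / 77       # proportion of Cyrillic entities containing /ə/
sum_x = 11.835        # sum of such proportions across all scripts
n_cyr = 18            # Cyrillic entities containing /ə/
n_total = 156         # entities containing /ə/ across all scripts

a = 1 / n_schwa
b = x_cyr / sum_x
c = n_cyr / n_total
w_cyr = 0.5 * a + 0.25 * b + 0.25 * c

# Number of Cyrillic entities using each letter for /ə/.
counts = {"А": 2, "Ә": 2, "Ъ": 3, "Ы": 9, "Э": 2}
final = {letter: w_cyr * count / n_cyr for letter, count in counts.items()}

print(f"W = {w_cyr:.3f}")
for letter, weight in final.items():
    print(f"{letter}: {weight:.1%}")
```

Carrying unrounded intermediate values through the calculation gives the same rounded results as the hand calculation: a script weight of 0.055 and final letter weights of 0.6%, 0.6%, 0.9%, 2.7%, and 0.6%.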

Graphic Manipulation of Letters

Over 2,000 individual letters were collected for the weighting process. After the weights had been calculated, the letters were imported into a vector-graphics program. Each letter was given a transparency value equal to its weight, aligned, and grouped together to reveal the underlying common symbol. The common symbols were traced to form the letters of the HUA. Figures 1a, 1b, and 1c demonstrate this process.
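The layering step can be loosely imitated in code. The sketch below is only an approximation under stated assumptions: it replaces the vector-graphics program with tiny binary bitmaps, treats stacked transparency as a simple weighted sum of ink, and replaces hand tracing with a fixed threshold. The glyphs, weights, threshold, and function names are all invented for illustration.

```python
def overlay(glyphs, weights):
    """Sum equally sized binary glyph bitmaps (1 = ink), each scaled by its weight."""
    rows, cols = len(glyphs[0]), len(glyphs[0][0])
    density = [[0.0] * cols for _ in range(rows)]
    for glyph, weight in zip(glyphs, weights):
        for r in range(rows):
            for c in range(cols):
                density[r][c] += weight * glyph[r][c]
    return density

def trace(density, threshold):
    """Keep only cells whose accumulated ink reaches the threshold (assumed rule)."""
    return [[1 if value >= threshold else 0 for value in row] for row in density]

# Three invented 3x3 glyphs: a bar, a bar with a foot, and a horizontal stroke.
bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
foot = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
dash = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]

density = overlay([bar, foot, dash], [0.5, 0.3, 0.2])
common = trace(density, threshold=0.5)
for row in common:
    print("".join("#" if cell else "." for cell in row))
```

In this toy case the strokes shared by the heavily weighted glyphs survive, while strokes contributed only by lightly weighted glyphs drop out, which is the effect the transparency layering is meant to produce.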

Due to the novelty of this alphabet, the letters of the HUA are not currently included as Unicode characters and thus cannot be represented digitally. To mitigate this problem, an interim alphabet called the Haroldic Universal Alphabet – Digital (HUA-D) has been created. This alphabet consists of existing Unicode characters that resemble the HUA letters, as shown in Figure 1d. Alas, this means the root HUA is not currently available for the world to use. However, as the First Law of Alphabet states, alphabet is a continuum. Therefore, usage of the HUA-D will begin the process of acclimatizing readers to the root HUA. Furthermore, the ease with which the HUA-D can be digitally implemented will allow it to be quickly disseminated around the globe.

 

Order

The order of letters in the HUA was determined using a simpler calculation process. For each alphabetic entity analyzed, each letter was given a value between 0 and 1 representing its proportional place in the entity’s alphabetical order. The formula for this value is: 

(p - 1)/(n - 1), where p is the letter’s actual place in the alphabetical order and n is the number of letters in the alphabet. 

This formula ensures that the first letter of an alphabet has a value of 0 and the last letter has a value of 1; every other letter has a value reflecting its position in between them. For every phoneme, the average proportional place within each script was calculated. The averages of all nonzero script averages for each phoneme were then calculated and sorted from least to greatest. The resulting order of phonemes is the order of the HUA.
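The order calculation can be sketched as follows. The function names, the input format, and the toy alphabets are assumptions for illustration. Because the text does not say how a phoneme whose script averages are all zero is handled, the sketch gives it the value 0.0, so that it sorts first; that choice is ours.

```python
def proportional_place(p, n):
    """Map place p (1-based) in an n-letter alphabet onto [0, 1]; needs n > 1."""
    return (p - 1) / (n - 1)

def hua_order(places):
    """Order phonemes by the mean of their nonzero per-script average places.

    places maps phoneme -> {script: [proportional places, one per entity]}
    (hypothetical input format).
    """
    values = {}
    for phoneme, by_script in places.items():
        script_avgs = [sum(v) / len(v) for v in by_script.values() if v]
        nonzero = [avg for avg in script_avgs if avg != 0]
        # Assumption: a phoneme with no nonzero script average gets 0.0.
        values[phoneme] = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return sorted(values, key=values.get)

# Two imaginary scripts: a 5-letter alphabet in S1 and a 3-letter alphabet in S2.
places = {
    "m": {"S1": [proportional_place(4, 5)], "S2": [proportional_place(3, 3)]},
    "a": {"S1": [proportional_place(1, 5)], "S2": [proportional_place(1, 3)]},
    "b": {"S1": [proportional_place(2, 5)], "S2": [proportional_place(2, 3)]},
}
order = hua_order(places)
print(order)
```

Here "b" averages 0.25 and 0.5 across the two scripts, giving 0.375, and "m" averages 0.75 and 1.0, giving 0.875, so the resulting order is a, b, m.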

Official order calculations for the HUA are still ongoing. However, due to the world’s urgent need for this alphabet, the creators have decided to proceed with its release. The order given in this abstract is therefore a preliminary order; an update containing the final order will be issued in the near future.

Future Goals

The creators of the HUA have several further developments they wish to implement, the most immediate of which is the completion of the order calculation. Research into HUA handwriting is ongoing, as is the creation of a typeface to aid the digitization process. Beyond these forthcoming developments, a likely future step is the construction of letters to represent phonemes not found in American English.

Appendix A:

The Haroldic Universal Alphabet and the Haroldic Universal Alphabet – Digital

Appendix B:

List of Languages Represented

Abaza*, Abenaki†, Abkhaz*, Aghul, Adyghe*, Akatek†, Aklanon, Acehnese*, Angika, Aja, Afar, Afrikaans*, Altai*, Anii, Anis†, Avar*, Amharic, Arawak, Aromanian, Armenian, Aymara, Evenki*, Arabic*, Balanta, Balinese, Balti*, Banjarese*, Bambara*, Bariba, Bashkir*, Beja*†, Belarusian*, Bengali, Balochi†, Bavarian, Burmese, Basque, Bikol, Bilen*, Bislama, Boro*, Blackfoot*, Boko, Bhojpuri, Bulgarian*, Buginese*, Bulu†, Bunak†, Bhutia†, Buryat*, Breton, Ewe, Estonian, Erzya, Ewondo, Ga, Galela, Galoli, Garifuna, Gaelic, Gen†, Gagauz*, Galician, Gilbertese, Gonja, Gujarati, Gumuz*, Gourmanchéma, Greek*, Greenlandic, Guarani, Odia†, Akan, Alutiiq, Even*, Dagaare, Dagbani, Dangme†, Dargwa*, Dari†, Danish, Drehu, Dinka, Dhivehi†, Dolgan, Dutch, Zhuang†, Dogri*†, Dzongkha†, Dhuwal†, Albanian*, Aleut*, Alsatian, Antillean Creole, Azerbaijani*, Assamese, Asturian, Aragonese, Kabiye†, Kaqchikel, Kalmyk*, Kannada†, Q'anjob'al, Kazakh*, Kasem, Karaim*, Karachay-Balkar*, Karenni†, Karakalpak*, Kosraean†, Cape Verdean Creole, Carolinian, Q'eqchi', Kerinci*†, Quechua†, Kabardian*, Kabyle*, Kanuri, Khmer, Kpelle†, Kapampangan†, Karelian*, Kurdish*, Korean, Khoekhoe†, Koyukon†, Cantonese*†, Catalan, Kashmiri*, Kashubian, Kikongo, Kikuyu, Kʼicheʼ†, Kichwa, Kinyarwanda, Kimbundu†, Kituba, Kissi, Kirundi, Kyrgyz*, Kokborok*, Coptic, Cora, Cornish, Corsican, Konkani*†, Komi*, Comorian*, Xhosa, Cook Islands Maori†, Kurukh†, Kumyk*, Kurmali*, Cree*†, Krio, Crimean Tatar*, Kwaraʼae†, Gwich'in†, Quiripi, Igbo, Ilocano, Isan†, Ixil†, Chokwe, Chavacano, Cham†, Chatino†, Czech, Chechen*, Cherokee, Chewa, Chamorro, Chhattisgarhi†, Twi†, Cia-Cia*†, Chipaya†, Chipewyan*†, Chittagonian*, Choctaw, Ch'ol, Chontal Maya, Chuukese, Chukchi, Chuj, Chuvash, Diola, Javanese*, Jamaican Patois, German, Japanese, Jingpo, Georgian, Dioula*, Ingush*, English, Indonesian, Inuktitut*†, Iñupiaq†, Italian, Ngäbere, Fang, Fante, Fataluku, Faroese, Fiji Hindi†, Fijian, Finnish, Fon, Foodo, Fuzhounese†, Fula*, Futunan,
Fur, Franco-Provençal†, French, Friulan, Icelandic, Irish, Oʼodham†, Occitan, Ovambo, Ossetian*, Oromo†, Arrernte†, Jèrriais, Hakka*, Jakaltek, Hajong, Harari*, Harayan†, Haitian Creole, Hausa*, Hawaiian, Khakas*, Khanty, Hebrew, Hiri Motu, Hill Mari, Hiligaynon, Hindi, Hokkien*, Hungarian*, Ho*, Hunsrik†, Ladino*, Lak*, Lamaholot†, Lampung, Latvian, Lao, Lezgin*, Lepcha, Latin, Lingala, Lio†, Limbu, Ligurian†, Livonian, Limburgish, Lithuanian, Lombard, Luxembourgish, Low German (Low Saxon)†, Luchazi, Okinawan, Ojibwe*†, Nagpuri†, Noon, Navajo, Nambya, Nateni, Nawat†, Náhuatl†, Narragansett, Nengone†, Nenets, Nepali†, Nauruan, Neapolitan, Neo-Aramaic*, Northern Ndebele†, Northern Emberá†, Northern Thai†, Northern Sotho†, Norfuk, Norwegian, Nogai*, Nzema, Nuer, Nheengatu†, Niuean, Venda, Veps*, Valencian, Venetian, Vietnamese, Vai, Võro, Maba*, Magahi†, Maguindanao*, Makassarese*, Macushi†, Mohegan†, Moldovan, Maltese, Mon†, Mandinka*, Mankanya, Maninka*†, Mansi*, Mam, Mamasa†, Mazahua, Marshallese, Mayo, Māori, Meitei†, Massachusett, Mbelime, Meadow Mari, Mende*, Malay*, Mapuche†, Maranao*, Marathi, Madurese*, Manx, Malagasy*, Malayalam*†, Maliseet†, Mandarin*, Macedonian, Minangkabau*, Mizo, Miskito, Mi'kmaq, Mixtec, Maithili, Mongolian*, Mauritian Creole, Moksha, Hmong†, Mongondow†, Mohawk, Mooré, Moronene, Muong, Muna, Mundari*, Udmurt, Urdu, Zarma*, Zapotec†, Zulu, Paicî, Pali*†, Pangasinan, Palauan†, Persian*†, Papiamento, Piedmontese, Picard†, Pitjantjatjara, Portuguese, Pashto, Polish, Pohnpeian, Punjabi†, Purepecha, Tabasaran, Tajik*, Talaud†, Tat*, Tatar*, Tarahumara†, Telugu†, Tetum, Ternate*, Tagalog, Tigre*, Tahitian, Turkish*, Turkmen*, Tausug*, Tamil*†, Tiwi, Tibetan†, Tigrinya, Thai†, Tok Pisin, Tongan, Tlapanec, Tłįchǫ†, Tlingit†, Toba Qom, Tobelo, Tokelauan, Tonga†, Tojolab'al, Tolaki†, Toraja-Saʼdan†, Tukang Besi†, Tucano†, Tulu†, Tuvaluan, Tuvan*, Tsakhur*, Tzeltal, Tsonga, Tzotzil, Tswana, Tuareg*†, Tshiluba, Uab Meto†, Umbundu†, Uzbek*, Sango,
Sangir, Safen, Sakha*, Saho*, Santali*, Sambal, Sami*, Saraiki†, Saramaccan†, Sardinian, Sarikoli*, Seychellois Creole, Cebuano, Selkup, Sena, Central Alaskan Yup'ik, Serer*, S’gaw*, Samoan, Serbo-Croatian*, Sanskrit, Scots, Sidama*, Sylheti†, Sindhi*, Sinhala†, Sicilian, Siberian Yup'ik, Sorbian, Southern Ndebele†, Southern Tepehuán†, Southern Thai†, Southern Sotho†, Slavey†, Slovak, Slovene, Soninke*, Somali*, Sundanese*, Spanish, Sioux, Sukuma†, Susu, Surigaonon, Sranan Tongo, Swahili*, Swazi†, Swedish, Rade†, Rangpuri†, Rapa Nui, Rakhine†, Russian, Rohingya*, Romani*, Romanian*, Romansh, Rutul*, Rusyn, Yakan, Yaqui†, Yapese, Yiddish, Yoruba*, Yom†, Yukaghir, Yukatek Maya, Ukrainian, Wallisian, Walloon, Waama, Huastec, Waray, Warlpiri, Welsh, West Frisian, Western Apache, Uyghur*, Huichol, Wolof*, Wolio†, Shan†, Shona, Shuar.

*  indicates multiple entities exist in this language.

† indicates language is not reflected in preliminary order calculation.
