MACF: A Flexible Framework for Full Name Generation from Abbreviations Based on a Novel Multi-Attention Mask

With the rapid spread of information, abbreviations are used increasingly often because they are convenient. However, identical abbreviations with different expansions cause confusion in many scenarios, such as information management and information retrieval, and this confusion frustrates users. Inferring a full name from an abbreviation therefore has significant practical value. Most studies in the literature infer full names with pattern-matching rules or similarity-based algorithms, but these methods cannot properly exploit contexts of various granularities. This paper proposes a flexible framework called MACF (multi-attention mask abbreviation context and full name language model) to address this problem. Given an abbreviation and its context, which can be of various granularities, MACF automatically generates the corresponding full name. Moreover, the proposed multi-attention mask lets the model learn the relationships among abbreviations, contexts, and full names, making the most of contexts at different granularities. To evaluate the framework, three corpora in different languages and fields were analyzed and measured with seven metrics covering various aspects. The experimental results show that MACF yields more significant and consistent improvements than the baseline methods.
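As intuition for the mask (its precise construction and the three strategies are given in the paper), the sketch below builds a UniLM-style sequence-to-sequence attention mask in which abbreviation and context tokens attend to one another bidirectionally while full-name tokens are generated causally. The function and layout are illustrative, not the paper's exact formulation.

```python
import numpy as np

def seq2seq_attention_mask(n_input, n_output):
    """Illustrative mask for [abbreviation + context] -> full name.

    mask[i, j] == 1 means position i may attend to position j. Input
    positions (abbreviation and context) see the whole input segment;
    output positions (the full name) see the input plus earlier outputs.
    """
    n = n_input + n_output
    mask = np.zeros((n, n), dtype="int32")
    mask[:, :n_input] = 1                # the input segment is visible to everyone
    mask[n_input:, n_input:] = np.tril(  # causal mask within the output segment
        np.ones((n_output, n_output), dtype="int32"))
    return mask

print(seq2seq_attention_mask(4, 3))  # 4 input tokens, 3 full-name tokens
```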

The main contributions can be summarized as follows:

  • To the best of our knowledge, this is the first study to propose a flexible framework for full name prediction and generation that takes abbreviations and contexts of various granularities as its inputs. Given only an abbreviation and its context, the framework automatically extracts high-quality features to infer the full name, so it can be widely applied in many scenarios.
  • This study proposes a novel multi-attention mask over abbreviations and contexts of various granularities to generate full names. For this purpose, three multi-attention mask strategies, ranging from completely masked to completely available, are devised to make the framework more effective and practicable.
  • The proposed method improves on the baseline methods more significantly and consistently. The experiments were conducted on three corpora in different languages and fields, and the results were measured with seven metrics covering various aspects.
  • Source Code

    We provide the code for our proposed MACF framework.

    Our programs are written in Python 3.7, with Keras 2.3.1 and TensorFlow 2.2.0 as the deep learning framework. Please install Python, Keras, and TensorFlow first to run the programs.
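    The scripts depend on these exact versions, so a quick check (illustrative only, not shipped with the repository) can catch mismatches early:

    ```python
    # Illustrative version check; expected: Python 3.7.x, Keras 2.3.1, TensorFlow 2.2.0.
    import sys
    import keras
    import tensorflow as tf

    print("Python     :", sys.version.split()[0])
    print("Keras      :", keras.__version__)
    print("TensorFlow :", tf.__version__)
    ```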

    The data can be downloaded here: ORGANIZATION, PAPER, and JOURNAL.

    Click here to download all the files, including the above datasets and code.

    The pre-trained models used are chinese_wobert_L-12_H-768_A-12 and wwm_cased_L-24_H-1024_A-16; both can be downloaded directly via the links.
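    The snippet below is a minimal sketch of loading one of these checkpoints under Keras. It assumes the bert4keras library and the standard file layout of a BERT checkpoint; the paths are illustrative, and the actual loading code in /Code may differ.

    ```python
    # Minimal sketch, assuming bert4keras (pip install bert4keras); paths are
    # illustrative and should point at the downloaded checkpoint in /BERT.
    from bert4keras.models import build_transformer_model
    from bert4keras.tokenizers import Tokenizer

    config_path = "BERT/chinese_wobert_L-12_H-768_A-12/bert_config.json"
    checkpoint_path = "BERT/chinese_wobert_L-12_H-768_A-12/bert_model.ckpt"
    dict_path = "BERT/chinese_wobert_L-12_H-768_A-12/vocab.txt"

    tokenizer = Tokenizer(dict_path, do_lower_case=True)

    # application='unilm' attaches a UniLM-style seq2seq attention mask, a
    # common starting point for generation with BERT-style models.
    model = build_transformer_model(
        config_path=config_path,
        checkpoint_path=checkpoint_path,
        application='unilm',
    )
    model.summary()
    ```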


    Description of the directory:

    /Data: the experimental data, including ORGANIZATION, PAPER, and JOURNAL.

    /Code: the code of the proposed MACF.

    /Results: the directory of experimental results.

    /BERT: the pre-trained models. (Please download the corresponding models via the links above.)


    ## Code Explanation

    "MACF_{AM/CM/RM}_{Sentence/Paragraph/Keyword}_{ORGANIZATION/PAPER/JOURNAL}" is the format to name the MACF codes on experiments, where "AM/CM/RM" indicates differents multi-attention mask strategies as demonstrated in the paper, "Sentence/Paragraph/Keyword" indicates different grained context, and "ORGANIZATION/PAPER/JOURNAL" indicates different datasets.

    "EvaluateTools.py" provides some helper functions to evaluate the experimental results with metrics.


    Usage:

    python MACF_{AM/CM/RM}_{Sentence/Paragraph/Keyword}_{ORGANIZATION/PAPER/JOURNAL}.py [parameters]

    Possible parameters include:

    "--mask-per [float]": The mask percentage for the randomly multi-attention mask (RM). The default varies across scripts, as reported in the paper.

    "--keyword [int]": The number of keywords at the keyword level. Default 15.

    "--maxlen [int]": The maximum length of the input data during training. Default 280 on ORGANIZATION and 256 on PAPER and JOURNAL.

    "--batch-size [int]": The batch size during training. Default 4 on ORGANIZATION and 2 on PAPER and JOURNAL.

    "--steps-per-epoch [int]": The number of training steps per epoch. Default 200 on ORGANIZATION and 400 on PAPER and JOURNAL.

    "--epochs [int]": The number of training epochs. Default 60 on ORGANIZATION, 70 on PAPER, and 80 on JOURNAL.

    "--full-name-maxlen [int]": The maximum length of the predicted full name. Default 27 on ORGANIZATION, 16 on PAPER, and 32 on JOURNAL.


    We will update this page gradually; please contact me if you have any questions.

    Email: shuangyinli(at)scnu(dot)edu(dot)cn