A Context-Enhanced Sentence Representation Learning method for Close Domains with Topic Modeling (bi-DCSR)

  • To tackle sentence representation learning in closed domains, we propose a context-enhanced sentence representation learning method (bi-DCSR). Its training and inference processes are efficient, and it can be trained from scratch with a limited amount of training data, which makes it well suited to closed domains.
  • Through a context-enhanced process, the proposed method takes full advantage of bi-Directional contextual information to learn high-quality sentence representations. Experiments demonstrate that, with this bi-Directional contextual information, bi-DCSR achieves state-of-the-art performance on sentence classification tasks on three closed-domain corpora.
  • To support the bi-DCSR method, we present a novel model, HPTM, a unified probabilistic language model over sequences of sentences. HPTM learns the topic distributions of sentences, the topic distributions of all words in the dictionary, and the word probabilities of the hidden topics. It thereby provides a way to embed words and sentences into the same, highly interpretable topic space. To the best of our knowledge, this is the first work to embed sentences and words into the same interpretable space with topic modeling.
  • An online algorithm is also proposed so that HPTM can be applied in open domains. With it, the proposed bi-DCSR method is competitive even with systems tuned for open-domain scenarios, while remaining extremely efficient and easy to use.
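
As a rough illustration of what a shared, interpretable topic space allows, the sketch below compares one sentence with a few words by their topic distributions. All numbers are made-up toy values, not HPTM output, and the names `word_topics`, `sentence_topics`, `nearest_words` and `dominant_topic` are hypothetical. Cosine similarity is used here only as an assumed, convenient comparison; it is not necessarily the measure used in the paper.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Toy topic distributions over 3 hidden topics (assumed values, for illustration only).
word_topics = {
    "court": [0.80, 0.10, 0.10],
    "patient": [0.05, 0.90, 0.05],
    "theorem": [0.10, 0.05, 0.85],
}
sentence_topics = [0.70, 0.20, 0.10]


def nearest_words(sentence_vec, word_vecs, k=2):
    """Return the k words whose topic vectors are closest to the sentence vector."""
    ranked = sorted(
        word_vecs,
        key=lambda w: cosine(sentence_vec, word_vecs[w]),
        reverse=True,
    )
    return ranked[:k]


def dominant_topic(vec):
    """Index of the topic with the highest probability."""
    return max(range(len(vec)), key=vec.__getitem__)


if __name__ == "__main__":
    print(nearest_words(sentence_topics, word_topics))
    print(dominant_topic(sentence_topics))
```

Because words and sentences live in the same space, each coordinate can be read directly as the weight of one topic, so the same comparison works between two sentences, two words, or a sentence and a word.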
  • Data and Source Code

    Our code is written in C++ and Python 3.5, so both g++ and Python are required. Click here to view and download the scripts of bi-DCSR.

    Click here to view and download the code of HPTM (the stochastic variational learning version), and here for the online learning version.

    We also provide all the datasets used in our paper: Wikipedia, arXiv, MedicalTS and LAW.

    Click here to download MedicalTS and LAW on GitHub.

    Click here to download arXiv on Dropbox.

    Click here to download Wikipedia on Dropbox.

    We will update this page gradually. Please contact me if you have any questions.

    Email: shuangyinli(at)scnu(dot)edu(dot)cn