biRATM by Shuangyin Li @ CSE HKUST

Bi-Directional Recurrent Attentional Topic Model (bi-RATM)

bi-RATM is a topic model that treats each document as a sequence of sentences and models that sequence in both directions. Its main contribution is a principled combination of a bi-directional recurrent Bayesian method with dynamic attention signals.

Source code

Our code and scripts are written in C++ and Python 3.5, so both g++ and python3.5 are required. Click here to download. The online-learning version is available here.

We also provide the preprocessed data used in our experiments.

Experimental Code and Results

You can download the full packages of code and data to reproduce each part of the experiments; just follow the README.txt included in each package.

We release our scripts and data so that the experiments reported in our paper can be reproduced. These experiments include Topic Coherence, document classification (our model and the compared deep learning models), Attention Visualization, and Manual Testing. Perplexity under different values of C can also be tested with the source code provided above.
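For readers who want to check a perplexity figure by hand, the following is a minimal sketch of the standard definition: the exponential of the negative total log-likelihood divided by the total number of tokens. It assumes you already have a per-document held-out log-likelihood from a trained model; the function name and the toy numbers are hypothetical and are not taken from the released source code.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Corpus perplexity: exp(-(sum of log-likelihoods) / (total token count))."""
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("corpus has no tokens")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Toy held-out set: two documents with made-up log-likelihoods (assumption).
log_liks = [-60.0, -90.0]
lengths = [10, 15]
print(perplexity(log_liks, lengths))
```

Lower values are better. A model that assigns every token probability 1/V has perplexity exactly V, which is a quick sanity check.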

Topic Coherence

Download the code for Topic Coherence.
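This page does not state which coherence measure the package computes. As an illustration only, the sketch below implements UMass-style coherence, one common choice, which scores a topic's top words by how often they co-occur in documents of a reference corpus. The function name and the toy corpus are hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence: sum over ordered word pairs of
    log((D(w_m, w_l) + 1) / D(w_l)), where D counts documents containing the words.
    top_words is assumed to be ordered from most to least probable."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the reference corpus
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1.0) / d_l)
    return score


# Toy reference corpus of tokenized documents (assumption).
docs = [["topic", "model", "word"], ["topic", "model"], ["topic", "cat"]]
print(umass_coherence(["topic", "model"], docs))
print(umass_coherence(["topic", "cat"], docs))
```

Scores closer to zero indicate that the top words tend to appear together; word pairs that rarely co-occur pull the score down.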

Visualize the topics and Attentions of Sentence

Download the code to visualize how the topics change with different context-sentence information. It is based on a model trained on Wikipedia. Run the script and input different context sentences to visualize how the topics of the current sentence change. Note that in this process the attention weights are the ones optimized by bi-RATM.

Following the readme.txt in the package, you can use a simple interactive agent to test more sentences than the examples listed in the paper.

Manual Attentions of Sentence

Download the code to input the attention weights manually and see how the topic distribution of the current sentence changes. Unlike Visual-Attentions, the Manual-Attentions script lets you adjust the attention weights; however, the weights you enter may not be the optimized attention weights for the current sentence.

Following the readme.txt in the package, you can use a simple interactive agent to test more sentences than the examples listed in the paper.
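To give a feel for why hand-set attention weights shift the topic distribution, here is a deliberately simplified sketch: it treats the expected topic distribution of the current sentence as an attention-weighted average of the context sentences' topic distributions. This is an assumption made for illustration; it does not reproduce the inference performed by the Manual-Attentions script, and all names and numbers are hypothetical.

```python
def mix_topics(context_topics, attention_weights):
    """Attention-weighted average of context topic distributions.
    Weights are normalized to sum to 1 before mixing."""
    if len(context_topics) != len(attention_weights):
        raise ValueError("need one attention weight per context sentence")
    if any(w < 0 for w in attention_weights):
        raise ValueError("attention weights must be non-negative")
    total = sum(attention_weights)
    if total <= 0:
        raise ValueError("attention weights must not all be zero")

    num_topics = len(context_topics[0])
    mixed = [0.0] * num_topics
    for theta, w in zip(context_topics, attention_weights):
        for k in range(num_topics):
            mixed[k] += (w / total) * theta[k]
    return mixed


# Two context sentences over three topics (made-up distributions).
context = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(mix_topics(context, [1.0, 1.0]))  # equal attention
print(mix_topics(context, [4.0, 1.0]))  # mostly attend to the first sentence
```

Raising the weight on one context sentence pulls the mixture toward that sentence's topics, which is the qualitative effect the manual script lets you explore.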

Document Classification

Download the code for the document classification tasks.

There are two types of document classification methods: Bayesian topic models and deep learning methods. The deep learning scripts require CUDA and PyTorch.
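Before running the deep learning scripts, it can help to confirm that PyTorch is installed and can see a CUDA device. The following is a small sketch of such a check; the function name is hypothetical and the check is not part of the released package.

```python
import importlib.util


def check_deep_learning_env():
    """Report whether PyTorch is importable and whether it can use CUDA."""
    status = {"torch_installed": False, "torch_version": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return status  # PyTorch is not installed in this environment

    import torch

    status["torch_installed"] = True
    status["torch_version"] = torch.__version__
    status["cuda_available"] = bool(torch.cuda.is_available())
    return status


print(check_deep_learning_env())
```

If `cuda_available` is reported as false, check the GPU driver and whether the installed PyTorch build includes CUDA support.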

If you have any questions, please feel free to contact me.