# MemoryPath: A Deep Reinforcement Learning Framework for Incorporating Memory Components into Knowledge Graph Reasoning (MemoryPath)

Knowledge Graph (KG) is identified as a major area in artificial intelligence, which is used for many real-world applications. The task of knowledge graph reasoning has been widely used and proven to be effective, which aims to find these reasonable paths for various relations to solve the issue of incompleteness in KGs. However, many previous works on KG reasoning, such as path-based or reinforcement learning-based methods, are too reliant on the pre-training, where the paths from the head entity and the target entity must be given to pre-train the model, which would easily lead the model to overfit on the given paths seen in the pre-training. To address this issue, we propose a novel reasoning model named MemoryPath with a deep reinforcement learning framework, which incorporates Long Short Term Memory(LSTM) and graph attention mechanism to form the memory component. The well-designed memory component can get rid of the pre-training so that the model doesn't depend on the given target entity for training. A tailored mechanism of reinforcement learning is presented in this proposed deep reinforcement framework to optimize the training procedure, where two metrics, Mean Selection Rate (MSR) and Mean Alternative Rate (MAR), are defined to quantitatively measure the complexities of the query relations. Meanwhile, three different training mechanisms, Action Dropout, Reward Shaping and Force Forward, are proposed to optimize the training process of the proposed MemoryPath. The proposed MemoryPath is validated on two datasets from FB15K-237 and NELL-995 on different tasks including fact prediction, link prediction and success rate in finding paths. The experimental results demonstrate that the tailored mechanism of reinforcement learning make the MemoryPath achieves state-of-the-art performance comparing with the other models. Also, the qualitative analysis indicates that the MemoryPath can store the learning process and automatically find the promising paths for a reasoning task during the training, and shows the effectiveness of the memory component.

The contributions can be summarized as follows:

• A novel deep reinforcement learning framework for KG reasoning is proposed, where the reasoning task is driven by an agent with continuous states based on the knowledge graph embeddings and well-designed reward function. The experimental results demonstrate that the proposed framework for KG reasoning can effectively improve the performances of the downstream tasks such as fact prediction and link prediction.
• To address the issue of dependency of model pre-training, a novel memory component based on Long Short Term Memory(LSTM) and graph attention mechanism is presented to get rid of the dependency of the known triples. The memory component formed by LSTM and graph attention mechanism can store the learning process and automatically find the promising paths for a reasoning task during the training.
• In contrast to prior works, two metrics are defined (\textbf{MSR} and \textbf{MAR}), to quantitatively measure the complexities of the relations when the model learns the alternative paths in the proposed framework. Meanwhile, three novel mechanisms of reinforcement learning are proposed to optimize the training process of the MemoryPath, action dropout, reward shaping and force forward.
• The proposed MemoryPath is evaluated on two famous datasets, FB15K-237 and NELL-995, with the downstream tasks. Experiments on fact prediction and link prediction demonstrate that the proposed MemoryPath outperforms state-of-the-art methods based on path reasoning. Also, the experiments on qualitative analysis also show the effectiveness of the memory component and the graph attention mechanism in finding reasonable paths.
• # Source Code

We provide code and scripts for our proposed MemoryPath model.

Our programs are all written in Python 3.6, using PyTorch 0.4.1 as deep learning architecture. So, please install Python and PyTorch first to run our programs.

Description of the directory:

/Experiments: the scripts for the experiments of Success Rate of Finding Paths and Analysis of the Reasoning Paths.

/Software: the code of the proposed MemoryPath.

## Code Structure

all_process_no_pretraining_dynamic_attention_lstm_reward_shaping_select_path_three_regularization_methods.py is our main program. It first trains the model, and then evaluates on two tasks: fact prediction and link prediction.

utils.py provides some help functions, including teacher forcing and removing rings from navigating paths.

networks.py contains all the models used in our experiments.

env.py is the definition of environment (i.e, knowledge graph) where the agent navigates. It also contains code for initialzation, selecting an action and walk forward.

In the BFS folder, KB.py is the definition of some data structures in the knowledge graph.

BFS.py defines the algorithm of breadth first search, which is used to judge whether a reasoning path is valid for a triple.

Usage:

python all_process_no_pretraining_dynamic_attention_lstm_reward_shaping_select_path_three_regularization_methods.py [parameters]

Possible parameters includes:

-a [int]: The dimension of the graph attention part of state vectors. Default 100.

-wr [float]: The reward for an invalid action. Default -0.05.

-ur [float]: The reward shaping coefficient for a list of actions which doesn't lead to ground truth tail entities. Default 0.01.

-tr [float]: Reward of global accuracy for a successful episode. Default 1.0.

-gw [float]: Weight for global accuracy. Default 0.1.

-lw [float]: Weight for path efficiency. Default 0.8.

-dw [float]: Weight for path diversity. Default 0.1.

-np [int]: Number of episodes used for training. Default 500.

-wd [float]: Base rate for L2 regularizaion. Default 0.005.

-d2 [float]: Base rate for Dropout. Default 0.3.

-adr [float]: Base rate for action dropout. Default 0.3.

-eb [float]: The base number used to compute difficulty coefficient. Default e.

-r [str]: Which relation to train. Its name is the same as the folder name of this relation in the data folder.

-t [str]: No need to set it. Keep it "retrain" is OK.

-hue [int]: Whether to update the hidden state every step even the agent selects an invalid action? Default 0 (No).

-mo [str]: Which model is used to compute the embedding part of state vectors? Default TransD.

-remo [str]: Which model is used to compute reward shaping? Default ConvE.