Cross-Language Code Clone Detection Using Abstract Syntax Tree and Graph Neural Network

Document Type

Conference Proceeding

Publication Date

Winter 1-23-2024

Abstract

Code clones are code fragments that have similar functionality but may differ in syntax. Code duplication complicates system maintenance, because the same error may need to be fixed in multiple locations. Existing methods for detecting code clones typically focus on clones within a single programming language. However, as the use of multiple programming languages becomes more prevalent, clones across different languages are becoming increasingly common. Recent studies have explored the detection of cross-language code clones using Recurrent Neural Networks (RNNs), specifically variants such as LSTM and GRU. This paper presents an approach that combines the strengths of Abstract Syntax Trees (ASTs) and Graph Neural Networks (GNNs) to identify cross-language code clones. The AST represents the code as a graph structure, while a GNN learns a state embedding for each graph node that captures information about the node's neighborhood and the overall graph structure. Using GNNs for cross-language clone detection helps capture additional semantic information about the code fragments. Notably, GNNs have not previously been applied to the detection of cross-language code clones. Experimental results demonstrate that the proposed approach outperforms LSTM, GRU, and other state-of-the-art methods in terms of F1 score, precision, and recall.
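
The idea of treating an AST as a graph and letting each node's embedding absorb information from its neighbors can be illustrated with a minimal sketch. The abstract does not specify the parser, node features, or GNN architecture, so everything below is an assumption made for illustration: it uses Python's built-in `ast` module on a single language, one-hot node-type features, and one round of unweighted mean aggregation in place of a trained GNN layer. The helper names `ast_to_graph`, `one_hot_features`, and `message_passing_step` are hypothetical.

```python
import ast


def ast_to_graph(source):
    """Parse Python source and return (labels, edges) describing its AST.

    labels[i] is the node type name of node i; edges holds (parent, child) ids.
    """
    tree = ast.parse(source)
    labels, edges = [], []

    def visit(node, parent_id):
        node_id = len(labels)
        labels.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)
    return labels, edges


def one_hot_features(labels):
    """Assumed initial node features: a one-hot vector over node type names."""
    vocab = sorted(set(labels))
    index = {name: i for i, name in enumerate(vocab)}
    return [[1.0 if index[label] == k else 0.0 for k in range(len(vocab))]
            for label in labels]


def message_passing_step(features, edges):
    """One round of mean aggregation over each node and its neighbors.

    This stands in for a GNN layer; a real layer would add learned weights
    and a nonlinearity.
    """
    n = len(features)
    neighbours = [[] for _ in range(n)]
    for parent, child in edges:
        # Treat AST edges as undirected so information flows both ways.
        neighbours[parent].append(child)
        neighbours[child].append(parent)

    dim = len(features[0])
    updated = []
    for i in range(n):
        group = [features[i]] + [features[j] for j in neighbours[i]]
        updated.append([sum(vec[k] for vec in group) / len(group)
                        for k in range(dim)])
    return updated


labels, edges = ast_to_graph("def add(a, b):\n    return a + b\n")
features = one_hot_features(labels)
embeddings = message_passing_step(features, edges)
```

After one step, each node's vector mixes its own type with the types of its parent and children; stacking further steps would widen that neighborhood. A cross-language setting would additionally need a parser per language and a shared node vocabulary, neither of which this sketch attempts.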
