Abstract
Code clones, fragments of code that share logical similarities, pose significant challenges for software maintenance and debugging. While existing methods focus on clones within a single programming language, the growing prevalence of multi-language programming calls for identifying clones across languages. Recent studies on cross-language clone detection have used Abstract Syntax Trees (ASTs) to analyze code fragments; however, ASTs alone do not capture complete structural and semantic information, which limits the effectiveness of these approaches, especially in identifying negative (non-clone) instances. Consequently, existing approaches exhibit high recall but low precision, which degrades their performance in real-world clone detection. In this paper, we present Enhanced Abstract Syntax Trees (EASTs), which augment ASTs with novel condition-type edges and control-flow edges to capture both the structural and semantic aspects of code. We use Graph Neural Networks (GNNs) to generate feature vectors that capture the relationships and dependencies among EAST nodes. Experimental results demonstrate the effectiveness of our approach in detecting both positive and negative code clones across Java and Python fragments. With a recall of 0.98, a precision of 0.92, and an F1 score of 0.94, our method outperforms current state-of-the-art techniques and commonly used Recurrent Neural Network (RNN) models in cross-language clone detection.
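The abstract describes EASTs only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It parses a Python fragment with the standard `ast` module and builds a graph with three edge kinds: plain AST parent-child edges, control-flow edges linking each statement to the next one in its block, and condition-type edges linking an `if`/`while` test to the first statement of each branch. The exact edge semantics, the names `build_east`, `embed`, and `cosine`, and the edge labels are all hypothetical. `embed` runs an untrained, relation-aware message-passing step in which fixed random matrices stand in for learned GNN weights, then mean-pools node states into one vector. A Java frontend and a trained clone classifier are out of scope.

```python
import ast
import math
import random

# Hypothetical edge-type labels for this sketch (not the paper's exact scheme).
CHILD, NEXT, COND_TRUE, COND_FALSE = "child", "next", "cond_true", "cond_false"
EDGE_TYPES = (CHILD, NEXT, COND_TRUE, COND_FALSE)


def build_east(source):
    """Return (labels, edges) for an EAST-like graph of a Python fragment."""
    tree = ast.parse(source)
    labels, edges = [], []
    node_id = {}  # id(AST object) -> graph node index

    def visit(node):
        idx = len(labels)
        labels.append(type(node).__name__)  # node feature = AST node type
        node_id[id(node)] = idx
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store/Del markers
            edges.append((idx, visit(child), CHILD))  # plain AST edge
        return idx

    visit(tree)
    for node in ast.walk(tree):
        # Control-flow edges: each statement -> the next one in the same block.
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list):
                for a, b in zip(block, block[1:]):
                    edges.append((node_id[id(a)], node_id[id(b)], NEXT))
        # Condition-type edges: test expression -> entry of each branch.
        if isinstance(node, (ast.If, ast.While)):
            test = node_id[id(node.test)]
            if node.body:
                edges.append((test, node_id[id(node.body[0])], COND_TRUE))
            if node.orelse:
                edges.append((test, node_id[id(node.orelse[0])], COND_FALSE))
    return labels, edges


def embed(labels, edges, vocab, rounds=2, seed=0):
    """Untrained relational message passing, then mean pooling to one vector."""
    dim = len(vocab)
    rng = random.Random(seed)
    # One fixed random matrix per edge type stands in for learned GNN weights.
    weights = {t: [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
               for t in EDGE_TYPES}
    h = [[1.0 if vocab[lab] == j else 0.0 for j in range(dim)] for lab in labels]
    for _ in range(rounds):
        agg = [[0.0] * dim for _ in labels]
        deg = [0] * len(labels)
        for src, dst, etype in edges:
            w = weights[etype]
            # Messages flow in both directions along every edge.
            for a, b in ((src, dst), (dst, src)):
                msg = [sum(w[i][k] * h[a][k] for k in range(dim)) for i in range(dim)]
                agg[b] = [x + y for x, y in zip(agg[b], msg)]
                deg[b] += 1
        # Update: own state plus mean incoming message, squashed with tanh.
        h = [[math.tanh(h[v][i] + agg[v][i] / max(deg[v], 1)) for i in range(dim)]
             for v in range(len(labels))]
    return [sum(col) / len(h) for col in zip(*h)]  # graph-level mean pooling


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = "def f(x):\n    if x > 0:\n        return x\n    else:\n        return -x\n"
    b = "def g(n):\n    return n if n >= 0 else -n\n"
    ga, gb = build_east(a), build_east(b)
    # A shared vocabulary keeps both fragments in the same feature space.
    vocab = {lab: i for i, lab in enumerate(sorted(set(ga[0]) | set(gb[0])))}
    print(round(cosine(embed(*ga, vocab), embed(*gb, vocab)), 3))
```

Because the weights are untrained, the printed similarity is only illustrative; in a real system the per-edge-type matrices would be learned from labelled clone and non-clone pairs so that the distinct edge kinds can influence the resulting vectors differently.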