Computer Science

Improving code semantics learning using enhanced Abstract Syntax Tree

Dr. Abeer Hamdy, The British University in Egypt
Zeina Swilam, The British University in Egypt
Andreas Pester, The British University in Egypt

Abstract

Code clones, defined as code fragments that share logical similarities, pose significant challenges for software maintenance and debugging. Whereas existing methods focus on clones within a single programming language, the growing prevalence of multilanguage programming makes it necessary to identify clones across languages. Recent cross-language clone detection studies have used Abstract Syntax Trees (ASTs) to analyze code fragments. However, ASTs alone do not capture comprehensive structural and semantic information, and this limits the effectiveness of these approaches, particularly in identifying negative (non-clone) instances. Consequently, existing approaches exhibit high recall but low precision, which hurts their performance in real-world clone detection. In this paper, we present Enhanced Abstract Syntax Trees (EASTs), which add novel condition-type edges alongside control flow edges to capture both the structural and the semantic aspects of code. Graph Neural Networks (GNNs) are used to generate feature vectors that capture the relationships and dependencies among EAST nodes. Experimental results show that our approach is effective at detecting both positive and negative clones across Java and Python fragments. With recall, precision, and F1 scores of 0.98, 0.92, and 0.94, respectively, our method outperforms current state-of-the-art techniques and commonly used Recurrent Neural Network (RNN) models in cross-language clone detection.
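The abstract does not describe how EASTs are constructed, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes Python's standard `ast` module as the parser and uses hypothetical edge labels (`child`, `next_stmt`, `cond_true`, `cond_false`). Here "control flow" is approximated by linking consecutive statements in a block, and "condition-type" edges connect an `if`/`while` test to the first statement of each branch. Both definitions are assumptions made for illustration.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass, field


@dataclass
class EASTGraph:
    # Node labels are AST node type names, e.g. "If", "Compare", "Assign".
    nodes: list[str] = field(default_factory=list)
    # Typed edges stored as (source index, destination index, edge type).
    edges: list[tuple[int, int, str]] = field(default_factory=list)


def build_east(source: str) -> EASTGraph:
    """Build an AST graph enriched with hypothetical control-flow and condition edges."""
    graph = EASTGraph()
    index_of: dict[int, int] = {}  # id(AST node) -> graph node index

    def add_subtree(node: ast.AST) -> int:
        idx = len(graph.nodes)
        graph.nodes.append(type(node).__name__)
        index_of[id(node)] = idx
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers; they add noise, not structure.
            if isinstance(child, ast.expr_context):
                continue
            graph.edges.append((idx, add_subtree(child), "child"))
        return idx

    tree = ast.parse(source)
    add_subtree(tree)

    for node in ast.walk(tree):
        # Control-flow edges (assumption): link consecutive statements in a block.
        for block_name in ("body", "orelse", "finalbody"):
            block = getattr(node, block_name, None)
            if isinstance(block, list):
                for first, second in zip(block, block[1:]):
                    graph.edges.append(
                        (index_of[id(first)], index_of[id(second)], "next_stmt")
                    )
        # Condition-type edges (assumption): connect a branch condition to the
        # first statement of its true branch and of its else branch.
        if isinstance(node, (ast.If, ast.While)):
            cond = index_of[id(node.test)]
            if node.body:
                graph.edges.append((cond, index_of[id(node.body[0])], "cond_true"))
            if node.orelse:
                graph.edges.append((cond, index_of[id(node.orelse[0])], "cond_false"))
    return graph


if __name__ == "__main__":
    example = "def sign(x):\n    if x > 0:\n        y = 1\n    else:\n        y = -1\n    return y\n"
    g = build_east(example)
    for src, dst, kind in g.edges:
        if kind != "child":
            print(kind, g.nodes[src], "->", g.nodes[dst])
```

In a full pipeline, the node labels would be embedded and the typed edges passed to a GNN to produce a fragment-level feature vector; that step is not shown here. Java fragments would also need a separate Java parser mapped onto a comparable graph schema.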

This paper has been withdrawn.