---
title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Augmenting the Interpretability of GraphCodeBERT
---

# Code Similarity Visualization with GraphCodeBERT

This interactive application visualizes token-level embeddings generated by [GraphCodeBERT](https://huggingface.co/microsoft/graphcodebert-base) for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representation in the model’s embedding space, using PCA for dimensionality reduction.

## ✒️ Reference

Martinez-Gil, J. (2025).
**Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks**.
*International Journal of Software Engineering and Knowledge Engineering*, 35(05), 657–678.

## 🚀 Features

- Select two classical sorting algorithms.
- Automatic tokenization and embedding via GraphCodeBERT.
- PCA-based projection into 2D space for visualization.
- Clear matplotlib plots showing token-level distribution differences.

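The tokenization and embedding step listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the names `embed_tokens` and `drop_special_tokens`, the 512-token truncation, and the decision to drop special tokens before plotting are all assumptions made for the example. Only the model identifier and the use of the last hidden state come from this README.

```python
# Sketch only: embed_tokens, drop_special_tokens, and the 512-token limit are
# illustrative assumptions, not taken from app.py.
MODEL_NAME = "microsoft/graphcodebert-base"

# Special tokens of the RoBERTa-style tokenizer that GraphCodeBERT uses.
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def embed_tokens(code: str):
    """Return (tokens, vectors): one last-hidden-state vector per token."""
    # Imported here so the helper below can be used without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    vectors = outputs.last_hidden_state[0].numpy()  # shape: (num_tokens, hidden_size)
    return tokens, vectors


def drop_special_tokens(tokens, vectors):
    """Remove special tokens and their vectors so that only code tokens remain."""
    keep = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    return [tokens[i] for i in keep], [vectors[i] for i in keep]
```

Calling `embed_tokens` downloads the model weights on first use, so an application would typically load the tokenizer and model once at startup rather than on every call.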

## 🧠 Technical Overview

- **Model**: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- **Embedding Layer**: Last hidden state
- **Reduction**: Principal Component Analysis (PCA)
- **Interface**: Gradio
- **Language**: Python 3.10+

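To make the reduction step concrete, here is a hedged sketch of projecting two sets of token embeddings into 2D. Since `requirements.txt` lists scikit-learn, the app presumably relies on its `PCA` class; the sketch instead spells out the same computation with NumPy so each step is visible. The function name `pca_2d`, the random placeholder inputs, and the choice to fit a single projection on both snippets together are assumptions for illustration.

```python
import numpy as np


def pca_2d(embeddings_a, embeddings_b):
    """Project two sets of token embeddings onto their shared top-2 principal axes."""
    a = np.asarray(embeddings_a, dtype=float)
    b = np.asarray(embeddings_b, dtype=float)

    # Assumption: fit one PCA on both snippets so their points share the same axes.
    stacked = np.vstack([a, b])
    centered = stacked - stacked.mean(axis=0)

    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:2].T

    # Split the projected points back into the two original groups.
    return projected[: len(a)], projected[len(a):]


# Random placeholders standing in for (num_tokens, 768) GraphCodeBERT outputs.
rng = np.random.default_rng(0)
points_a, points_b = pca_2d(rng.normal(size=(12, 768)), rng.normal(size=(9, 768)))
print(points_a.shape, points_b.shape)  # (12, 2) (9, 2)
```

Fitting on the stacked embeddings keeps both algorithms in one coordinate system, so distances between their points in the plot are comparable. The result should match a scikit-learn PCA fitted on the same stacked data up to the sign of each axis, and the two returned arrays can be passed to separate matplotlib `scatter` calls with different colors.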

## 🛠 Dependencies

All required libraries are listed in `requirements.txt`:

```
transformers
torch
scikit-learn
numpy
matplotlib
gradio
Pillow
```

## 🖥️ Intended Use

- Academic teaching and demonstration of code embeddings
- Qualitative evaluation of pretrained models for source code
- Supplementary visualization for software engineering publications

## 📬 Contact

**Jorge Martinez-Gil**
Senior Research Scientist in Computer Science