
Rethinking IDE Customization for Enhanced HAX: A Hyperdimensional Perspective

Roham Koohestani Delft University of Technology
[email protected]
   Maliheh Izadi Delft University of Technology
[email protected]
Abstract

As Integrated Development Environments (IDEs) increasingly integrate Artificial Intelligence, Software Engineering faces both benefits like productivity gains and challenges like mismatched user preferences. We propose Hyper-Dimensional (HD) vector spaces to model Human-Computer Interaction, focusing on user actions, stylistic preferences, and project context. These contributions aim to inspire further research on applying HD computing in IDE design.

Index Terms:
IDE Customization, Artificial Intelligence for Software Engineering, Human-AI eXperiences (HAX), Hyper-Dimensional Computing

I Introduction

Advancements in Artificial Intelligence (AI) have transformed Software Engineering (SE), with tools such as Cursor [1] and GitHub Spark [2] redefining development workflows. Despite gains in productivity and satisfaction [3, 4], mismatches between developer preferences and AI-generated code persist [5, 6], leading to increased code churn. Although fine-tuning can address these issues [7], it remains computationally expensive, leaving the customization of developer experiences largely unsolved. Existing research [8] has explored the use of machine learning for automated code formatting, but these methods tend to incur substantial performance overhead.

At the same time, an entire field of vector-symbolic AI has been gaining traction, with some methods leveraging it to store vast amounts of project context for model consumption [9]. The field of vector-symbolic AI, and more broadly Hyper-Dimensional Computing (HDC), is not new. Its origins can be traced back to 1995, with Plate’s development of holographic reduced representation [10, 11]. The attributes of high-dimensional vector spaces make it easy and efficient to model environments that store a large amount of context in a compact form. Since running state-of-the-art machine learning models demands substantial computational resources, HDC can be seen as a less costly alternative for advanced behavior and preference modeling.

Next, we present the existing computational theory behind Hyper-Dimensional (HD) vector spaces and then explore their potential applications in customizable Human-AI Experience (HAX) designs.

II Hyperdimensional Computing

The term HDC was first coined by Pentti Kanerva [10] and builds on previous work by various others, such as Plate with Holographic Reduced Representation [11] and Gayler with Vector-Symbolic Architecture [12]. As the underlying theory is mostly the same, we will refer to the concept as HDC. We will specifically look at the Multiply Add Permute (MAP) framework by Gayler [13].

II-A Foundations

The MAP framework operates on hyper-dimensional vectors using three key operations: multiplication, addition, and permutation. These operations enable the composition, binding, and manipulation of high-dimensional representations.

II-A1 Variables

In this framework, variables are represented as randomly-sampled, high-dimensional, approximately orthogonal vectors. These vectors have the form $v \in \{-1,+1\}^{D}$. Although there are other variants of the framework with real- and integer-valued domains, we look at the bipolar variant. For instance, coding preferences like naming conventions or indentation styles could be defined as:

\texttt{NameFormat} = V_{1}, \quad \texttt{Indentation} = V_{2}

These variables are combined using the binding operation to represent more complex concepts holistically.
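
To make this concrete, the following minimal NumPy sketch (our illustration; the dimensionality D = 10,000 and the variable names are assumptions, not prescribed by the framework) samples two bipolar hypervectors and checks that they are approximately orthogonal.

import numpy as np

D = 10_000  # illustrative dimensionality; HDC typically uses vectors with thousands of components
rng = np.random.default_rng(42)

def random_hv(d: int = D) -> np.ndarray:
    """Sample a random bipolar hypervector from {-1, +1}^d."""
    return rng.choice([-1, 1], size=d)

NameFormat = random_hv()
Indentation = random_hv()

# Independently sampled hypervectors are approximately orthogonal:
print(NameFormat @ Indentation / D)  # normalized dot product, close to 0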

II-A2 Multiplication (Binding)

Binding involves combining two vectors to create a third vector that is dissimilar (orthogonal) to both. This ensures that information about the two input vectors is encoded in the resulting vector. Here, binding can be implemented as component-wise multiplication:

\texttt{Bind}(A, B) = A \otimes B

where $A$ and $B$ are sampled from $\{-1,+1\}^{D}$. Note that the binding operator is also its own inverse.
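
A minimal sketch of binding under the same illustrative assumptions (NumPy, bipolar vectors, D = 10,000) shows that the bound vector is dissimilar to both inputs and that binding undoes itself:

import numpy as np

D = 10_000
rng = np.random.default_rng(0)
A = rng.choice([-1, 1], size=D)
B = rng.choice([-1, 1], size=D)

def bind(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Bind(A, B): component-wise multiplication of bipolar vectors."""
    return x * y

C = bind(A, B)
print(abs(A @ C) / D, abs(B @ C) / D)  # both close to 0: C is dissimilar to A and B
print(np.array_equal(bind(C, B), A))   # True: binding C with B recovers A exactly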

II-A3 Addition (Bundling)

Bundling aggregates multiple vectors into a single vector that represents their collective information. This is typically implemented as a component-wise sum followed by a normalization to remain in the same domain as the original vector:

\texttt{Bundle}(A, B, C) = \text{Normalize}(A \oplus B \oplus C)
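
A possible NumPy sketch of bundling, again under illustrative assumptions (D = 10,000, random tie-breaking for zero components), shows that the bundle remains similar to each of its inputs:

import numpy as np

D = 10_000
rng = np.random.default_rng(1)
A, B, C = (rng.choice([-1, 1], size=D) for _ in range(3))

def bundle(*vs: np.ndarray) -> np.ndarray:
    """Bundle(...): component-wise sum, sign-normalized back into {-1, +1}^D.
    Zero components (ties) are broken randomly -- one common convention."""
    s = np.sum(vs, axis=0)
    s[s == 0] = rng.choice([-1, 1], size=np.count_nonzero(s == 0))
    return np.sign(s)

S = bundle(A, B, C)
for name, v in (("A", A), ("B", B), ("C", C)):
    print(name, (S @ v) / D)  # well above 0: the bundle stays similar to each input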

II-A4 Permutation (Reordering)

Permutation, denoted by $P$, rearranges the elements of a vector to encode positional or structural information. For example, cyclically shifting vector components with respect to their position and subsequently bundling them can be used to denote a sequence:

\texttt{Permute}(A) = P(A)
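
A small illustrative sketch (using a cyclic shift via np.roll, an assumed but common choice of permutation) shows that a permuted vector is dissimilar to the original and that the permutation can be undone:

import numpy as np

D = 10_000
rng = np.random.default_rng(2)
A = rng.choice([-1, 1], size=D)

def permute(x: np.ndarray, k: int = 1) -> np.ndarray:
    """P^k(A): cyclic shift of the vector's components by k positions."""
    return np.roll(x, k)

print(abs(A @ permute(A)) / D)                        # close to 0: P(A) is dissimilar to A
print(np.array_equal(permute(permute(A, 2), -2), A))  # True: the shift can be undone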

II-A5 Similarity Analysis

To evaluate the similarity of two vectors, their dot product (equivalently, the cosine similarity for normalized vectors) is calculated.
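
For completeness, a minimal sketch of the similarity measure (our own helper; nothing beyond the dot product and vector norms is assumed):

import numpy as np

def similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity; for bipolar vectors this equals the dot product divided by D."""
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(3)
a, b = rng.choice([-1, 1], size=10_000), rng.choice([-1, 1], size=10_000)
print(similarity(a, a), similarity(a, b))  # 1.0 and roughly 0.0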

III HDC in IDEs

We now shift focus to applying the theory of HDC to model two main aspects of any software project: user behavior/preferences and project context.

III-A Action Sequences - Next Action Prediction

To model sequences of user actions, we draw inspiration from the work of Mozannar et al. [14], which focuses on modeling the states of developers. Consider an IDE that logs sequences of actions performed by a developer (e.g., opening files, typing code, running tests, etc.). Using HDC, we can represent these actions as high-dimensional vectors.

Each action (e.g., OpenFile, RunTest) can be represented as a vector sampled from the high-dimensional space. A sequence of $n$ actions is represented by binding and permuting these vectors to encode temporal order (see subsubsection II-A4):

\texttt{Sequence} = P^{n-1}(\texttt{Action1}) \otimes P^{n-2}(\texttt{Action2}) \otimes \ldots \otimes P^{0}(\texttt{ActionN})

For example, if a user performs the actions OpenFile, RunTest, and Commit, the sequence can be encoded as:

\texttt{Sequence} = P^{2}(\texttt{OpenFile}) \otimes P^{1}(\texttt{RunTest}) \otimes P^{0}(\texttt{Commit})

For a sequence of $m$ actions, where $m \geq n$, we can encode a user’s behavior $UB$ as

UB = \bigoplus_{i=0}^{m-n} \texttt{encode}(\texttt{Action}_{i}, \ldots, \texttt{Action}_{i+n})

To predict the next action after observing a sequence of $n-1$ actions, we use the properties of HD vector spaces. As the binding operation is distributive over bundling, we can attempt to extract the next action by applying $UB \otimes P(\texttt{encode}(\texttt{Action}_{1}, \ldots, \texttt{Action}_{n-1}))$. As all other dissimilar vectors result in negligible noise, the remaining vector PredAcc will be highly similar to the vector of the next action of the user. This action ActionX can therefore be found as

\arg\max_{\texttt{ActionX}} \text{Similarity}(\texttt{PredAcc}, \texttt{ActionX})

This allows the IDE to predict the next most likely action, enabling the optimization of the user’s experience.
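
To illustrate the full pipeline, the following sketch, which is our own construction with a made-up action vocabulary, log, and window length n = 3, encodes logged action n-grams into a behavior vector UB and predicts the next action by unbinding the permuted prefix and selecting the most similar known action:

import numpy as np

D = 10_000
rng = np.random.default_rng(7)

# Hypothetical action vocabulary; the names are illustrative, not taken from a real IDE log.
actions = {name: rng.choice([-1, 1], size=D)
           for name in ("OpenFile", "TypeCode", "SaveFile", "RunTest", "Commit")}

def encode(seq):
    """Encode an ordered window of actions as P^{n-1}(a_1) x ... x P^0(a_n)."""
    n = len(seq)
    out = np.ones(D, dtype=int)
    for i, name in enumerate(seq):
        out = out * np.roll(actions[name], n - 1 - i)
    return out

def bundle(vecs):
    """Component-wise sum followed by sign normalization (ties broken randomly)."""
    s = np.sum(vecs, axis=0)
    s[s == 0] = rng.choice([-1, 1], size=np.count_nonzero(s == 0))
    return np.sign(s)

# Behavior vector UB: bundle of all length-n windows over the logged action stream.
log = ["OpenFile", "TypeCode", "SaveFile", "RunTest", "Commit", "OpenFile", "TypeCode"]
n = 3
UB = bundle([encode(log[i:i + n]) for i in range(len(log) - n + 1)])

def predict_next(prefix):
    """Unbind the shifted prefix (n-1 actions) from UB; return the most similar known action."""
    query = np.roll(encode(prefix), 1)   # P(encode(Action_1, ..., Action_{n-1}))
    pred_acc = UB * query                # binding is its own inverse: matching window collapses to the next action plus noise
    return max(actions, key=lambda name: actions[name] @ pred_acc)

print(predict_next(["RunTest", "Commit"]))  # "OpenFile", matching the pattern in the log

Because binding distributes over bundling and is its own inverse, the window that matches the prefix collapses to the vector of the action that followed it, while all other windows contribute near-orthogonal noise; the argmax over the known action vectors therefore recovers the most likely next action.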

III-B Stylistic Preferences - Style-Matched Generation

Developers often have personal stylistic preferences when writing code. HDC can model and enforce these preferences for tasks like code completion [15, 16] or auto-formatting [17].

Using the approach inspired by Kanerva’s “Dollar of Mexico” analogy [18], we encode stylistic preferences for different languages or individual developers. For instance:

\texttt{STYLE} = (\texttt{NameFormat} \otimes \texttt{CamelCase}) \oplus (\texttt{Indentation} \otimes \texttt{Spaces4})
\texttt{MODEL\_STYLE} = (\texttt{NameFormat} \otimes \texttt{SnakeCase}) \oplus (\texttt{Indentation} \otimes \texttt{Tabs})

To adapt the generated code to the developer’s style, a mapping vector is created:

\texttt{MAPPING} = \texttt{MODEL\_STYLE} \otimes \texttt{STYLE}

If an LLM generates code with NameFormat = SnakeCase, the mapping ensures it is translated to CamelCase:

\texttt{CamelCase} \approx \texttt{SnakeCase} \otimes \texttt{MAPPING}

This enables the IDE to dynamically generate style-matched code and maintain consistent project styling.
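
The following sketch mirrors the example above under the same illustrative assumptions (randomly sampled role and filler vectors, D = 10,000). It builds STYLE, MODEL_STYLE, and MAPPING, and checks that translating SnakeCase through the mapping lands closest to CamelCase:

import numpy as np

D = 10_000
rng = np.random.default_rng(11)
hv = lambda: rng.choice([-1, 1], size=D)

# Role and filler hypervectors matching the example above (illustrative, randomly sampled).
NameFormat, Indentation = hv(), hv()
CamelCase, SnakeCase, Spaces4, Tabs = hv(), hv(), hv(), hv()

def bundle(*vs):
    """Component-wise sum followed by sign normalization (ties broken randomly)."""
    s = np.sum(vs, axis=0)
    s[s == 0] = rng.choice([-1, 1], size=np.count_nonzero(s == 0))
    return np.sign(s)

STYLE = bundle(NameFormat * CamelCase, Indentation * Spaces4)     # user's preferences
MODEL_STYLE = bundle(NameFormat * SnakeCase, Indentation * Tabs)  # LLM's default style

MAPPING = MODEL_STYLE * STYLE  # binding the two style records yields the mapping

# Pushing the model's naming convention through the mapping lands closest to CamelCase.
translated = SnakeCase * MAPPING
candidates = {"CamelCase": CamelCase, "SnakeCase": SnakeCase, "Spaces4": Spaces4, "Tabs": Tabs}
print(max(candidates, key=lambda k: candidates[k] @ translated))  # "CamelCase"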

III-C Representing Project Context

HDC also provides a robust framework for modeling the context of a software project, encompassing aspects such as programming languages, Application Programming Interfaces (APIs), design patterns, and usage scenarios.

For example, the project’s context can be encoded as:

\texttt{CONTEXT} = (\texttt{LANG} \otimes \texttt{Python}) \oplus (\texttt{API} \otimes \texttt{TensorFlow}) \oplus (\texttt{Pattern} \otimes \texttt{Observer})

This holistic vector representation allows the IDE to adapt suggestions and auto-completions to the specific context of the project. For instance, when working on a Python project with TensorFlow, the IDE can prioritize TensorFlow-related completions or suggest design patterns suitable for Python.
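
As a final illustration (again our own sketch with an assumed, small vocabulary), the project context can be composed and later queried by unbinding a role vector, which an IDE could use to rank context-appropriate completions:

import numpy as np

D = 10_000
rng = np.random.default_rng(13)
hv = lambda: rng.choice([-1, 1], size=D)

# Roles and fillers for the project-context example (illustrative vocabulary).
LANG, API, Pattern = hv(), hv(), hv()
fillers = {"Python": hv(), "TensorFlow": hv(), "Observer": hv(), "Java": hv(), "NumPy": hv()}

def bundle(*vs):
    """Component-wise sum followed by sign normalization (ties broken randomly)."""
    s = np.sum(vs, axis=0)
    s[s == 0] = rng.choice([-1, 1], size=np.count_nonzero(s == 0))
    return np.sign(s)

CONTEXT = bundle(LANG * fillers["Python"],
                 API * fillers["TensorFlow"],
                 Pattern * fillers["Observer"])

# Unbinding the LANG role from the holistic context vector recovers something close to
# Python, which the IDE could use to prioritize language-specific suggestions.
query = LANG * CONTEXT
print(max(fillers, key=lambda k: fillers[k] @ query))  # "Python"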

Furthermore, transitions between contexts, such as switching from hobby projects to work-related projects, can be modeled using mappings represented as:

\texttt{WORK\_CONTEXT} \otimes \texttt{HOBBY\_CONTEXT}

IV Future Direction

While we have outlined several ways HDC can help model user behaviors to enhance their experience, the challenge lies in applying these methods effectively in real-world scenarios. Future research should explore methods to incorporate these representations more effectively and ensure that they influence the models’ generations. Existing research has examined interventions at the decoding stage [19] to improve the coding-style adherence of LLMs. Additionally, IDE developers could explore adopting a mapping approach similar to the model-to-user mapping discussed in the previous section. This would allow them to align the style of code generated by the LLM with the user’s preferred coding style.

V Conclusion

In this paper, we look at applying HDC theory to modeling user behavior and preferences through hyper-dimensional vectors. We present three use cases in which user actions, stylistic preferences, and project setup can be represented using HDC. We encourage the IDE research and development community to pursue research that incorporates such representations into the IDE, improving the user experience through efficient approaches.

References

  • [1] “Cursor - the ai code editor,” accessed: 18-11-2024. [Online]. Available: https://www.cursor.com/
  • [2] “Github next — github spark,” accessed: 18-11-2024. [Online]. Available: https://githubnext.com/projects/github-spark
  • [3] E. Kalliamvakou, “Research: quantifying github copilot’s impact on developer productivity and happiness - the github blog,” accessed: 18-11-2024. [Online]. Available: https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
  • [4] “Developers save up to 8 hours per week with jetbrains ai assistant — the jetbrains blog,” accessed: 18-11-2024. [Online]. Available: https://blog.jetbrains.com/ai/2024/04/developers-save-up-to-8-hours-per-week-with-jetbrains-ai-assistant/
  • [5] GitClear, “Coding on copilot: 2023 data suggests downward pressure on code quality (incl 2024 projections) - gitclear,” accessed: 18-11-2024. [Online]. Available: https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality
  • [6] A. Sergeyuk, S. Titov, and M. Izadi, “In-ide human-ai experience in the era of large language models; a literature review,” in Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments, 2024, pp. 95–100.
  • [7] V. Franzoni, S. Tagliente, and A. Milani, “Generative models for source code: Fine-tuning techniques for structured pattern learning,” Technologies, vol. 12, no. 11, 2024. [Online]. Available: https://www.mdpi.com/2227-7080/12/11/219
  • [8] T. Parr and J. Vinju, “Towards a universal code formatter through machine learning,” ser. SLE 2016, 2016. [Online]. Available: https://doi.org/10.1145/2997364.2997383
  • [9] T. Munkhdalai, M. Faruqui, and S. Gopal, “Leave no context behind: Efficient infinite context transformers with infini-attention,” 2024. [Online]. Available: https://arxiv.org/abs/2404.07143
  • [10] P. Kanerva, “Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors,” Cognitive computation, vol. 1, pp. 139–159, 2009.
  • [11] T. A. Plate, “Holographic reduced representations,” IEEE Transactions on Neural networks, vol. 6, no. 3, pp. 623–641, 1995.
  • [12] R. W. Gayler, “Vector symbolic architectures answer jackendoff’s challenges for cognitive neuroscience,” arXiv preprint cs/0412059, 2004.
  • [13] ——, “Multiplicative binding, representation operators & analogy (workshop poster),” 1998.
  • [14] H. Mozannar, G. Bansal, A. Fourney, and E. Horvitz, “Reading between the lines: Modeling user behavior and costs in ai-assisted programming,” ser. CHI ’24.   Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1145/3613904.3641936
  • [15] M. Izadi, J. Katzy, T. Van Dam, M. Otten, R. M. Popescu, and A. Van Deursen, “Language models for code completion: A practical evaluation,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.
  • [16] M. Izadi, R. Gismondi, and G. Gousios, “Codefill: Multi-token code completion by jointly learning from structure and naming sequences,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 401–412.
  • [17] R. Prabhu, N. Phutane, S. Dhar, and S. Doiphode, “Dynamic formatting of source code in editors,” in 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), 2017, pp. 1–6.
  • [18] P. Kanerva, “What we mean when we say ‘What’s the dollar of Mexico?’: Prototypes and mapping in concept space,” in 2010 AAAI Fall Symposium Series, 2010.
  • [19] Z. Dai, C. Yao, W. Han, Y. Yuan, Z. Gao, and J. Chen, “Mpcoder: Multi-user personalized code generator with explicit and implicit style representation learning,” 2024. [Online]. Available: https://arxiv.org/abs/2406.17255