
Personalized Visualization Recommendation

Xin Qian (University of Maryland, College Park, MD, USA), Ryan A. Rossi (Adobe Research, San Jose, CA, USA), Fan Du (Adobe Research, San Jose, CA, USA), Sungchul Kim (Adobe Research, San Jose, CA, USA), Eunyee Koh (Adobe Research, San Jose, CA, USA), Sana Malik (Adobe Research, San Jose, CA, USA), Tak Yeon Lee (Adobe Research, San Jose, CA, USA), and Nesreen K. Ahmed (Intel Labs, Santa Clara, CA, USA)
Abstract.

Visualization recommendation work has focused solely on scoring visualizations based on the underlying dataset, and not on the actual user and their past visualization feedback. These systems recommend the same visualizations for every user, even though the underlying user interests, intent, and visualization preferences are likely to be fundamentally different, yet vitally important. In this work, we formally introduce the problem of personalized visualization recommendation and present a generic learning framework for solving it. In particular, we focus on recommending visualizations personalized for each individual user based on their past visualization interactions (e.g., viewed, clicked, manually created) along with the data from those visualizations. More importantly, the framework can learn from visualizations relevant to other users, even if the visualizations are generated from completely different datasets. Experiments demonstrate the effectiveness of the approach as it leads to higher quality visualization recommendations tailored to the specific user intent and preferences. To support research on this new problem, we release our user-centric visualization corpus consisting of 17.4k users exploring 94k datasets with 2.3 million attributes and 32k user-generated visualizations.

Personalized visualization recommendation, user-centric visualization recommendation, deep learning

1. Introduction

With massive datasets becoming ubiquitous, visualization recommendation systems have become increasingly important. These systems have the promise of enabling rapid visual analysis and exploration of such datasets. However, existing end-to-end visualization recommendation systems output a long list of visualizations based solely on simple visual rules (Wongsuphasawat et al., 2015, 2017). These systems lack the ability to recommend visualizations that are personalized to the specific user and the tasks that are important to them. This makes it both time-consuming and difficult for users to effectively explore such datasets and find meaningful visualizations.

Recommending visualizations that are personalized to a specific user is an important unsolved problem. Prior work on visualization recommendation has focused mainly on rule-based or ML-based approaches that are completely agnostic to the user of the system. In particular, these systems recommend the same ranked list of visualizations for every user, even though the underlying user interests, intent, and visualization preferences are fundamentally different, yet vitally important for recommending useful and interesting visualizations for a specific user. The rule-based methods use simple visual rules to score visualizations, whereas the existing ML-based methods have focused solely on classifying design choices (Hu et al., 2019) or ranking such design choices (Moritz et al., 2018) using a corpus of visualizations that are not tied to a user. Neither of these existing classes of visualization recommendation systems focuses on modeling individual user behavior or personalizing for individual users, which is at the heart of our work.

In this work, we introduce a new problem of personalized visualization recommendation and propose an expressive framework for solving it. The problem studied in this work is as follows: given a set of $n$ users where each user has their own specific set of datasets, and each of the user datasets contains a set of relevant visualizations (i.e., visualizations a specific user has interacted with in the past, either implicitly by clicking/viewing or explicitly by liking or adding the visualization to their favorites or a dashboard they are creating), the problem of personalized visualization recommendation is to learn an individual recommendation model for every user such that when a user selects a possibly new dataset of interest, we can apply the model for that specific user to recommend the top most relevant visualizations that are most likely to be of interest to them (despite there being no previous implicit/explicit feedback on any of the visualizations from the new dataset). Visualizations are fundamentally tied to a dataset as they consist of (i) the set of visual design choices (e.g., chart-type, color/size, x/y) and (ii) the subset of data attributes from the full dataset used in the visualization. Therefore, how can we develop a learning framework for solving the personalized visualization recommendation problem that is able to learn from other users and their relevant visualizations, even when those visualizations are from tens of thousands of completely different datasets with no shared attributes?

There are two important and fundamental issues at the heart of the personalized visualization recommendation problem. First, since visualizations are defined based on the attributes within a single specific dataset, there is no way to leverage the visualization preferences of users across different datasets. Second, since each user often has their own dataset of interest (not shared by other users), there is no way to leverage user preferences across different datasets. In this work, we address both problems. Notably, the framework proposed in this paper naturally generalizes to the following problem settings: (a) single dataset with a single set of visualizations shared among all users, and (b) tens of thousands of datasets that are not shared between users where each dataset of interest to a user gives rise to a completely different set of possible visualizations. However, the existing work cannot be used to solve the new problem formulation that relaxes the single dataset assumption to make it more general and widely applicable.

In the problem formulation of personalized visualization recommendation, each user can have their own set of datasets, and since each visualization represents a series of design choices and data (i.e., attributes tied to a specific dataset), this gives rise to a completely disjoint set of visualizations for each user. Hence, there is no way to directly leverage visualization feedback from other users, since the visualizations are from different datasets. Furthermore, visualizations are dataset specific, since they are generated based on the underlying dataset, and therefore any feedback from a user cannot be directly leveraged for making better recommendations for other users and datasets. To understand the difficulty of the proposed problem of personalized visualization recommendation, the equivalent problem with regard to traditional recommender systems would be as if each user on Amazon (or Netflix) had their own separate set of disjoint products (or movies) that no other user could see or provide feedback on. In such a setting, how can we then use feedback from other users? Furthermore, given a single dataset uploaded by some user, there is an exponential number of possible visualizations that can be generated from it. This implies that even if there are some users interested in a single dataset, the amount of preference feedback from those users is likely to be extremely small compared to the exponential number of possible visualizations that can be generated and preferred by such users.

To overcome these issues and make it possible to solve the personalized visualization recommendation problem, we introduce two new models and representations that enable learning from dataset and visualization preferences across different datasets and users, respectively. First, we propose a novel model and representation that encodes users and their interactions with attributes (i.e., attributes in any dataset), and maps every attribute to a shared k-dimensional meta-feature space that enables the model to learn from user-level data preferences across all the different datasets of the users. Most importantly, the shared meta-feature space is independent of the specific datasets, and the meta-features represent general functions of an arbitrary attribute, independent of the user or dataset it arises from. This enables the model to learn from user-level data preferences, even though those preferences are on entirely different datasets. Second, we propose a novel user-level visual preference graph model for visualization recommendation using the proposed notion of a visualization configuration, which enables learning from user-level visual preferences across different datasets and users. This model encodes users and their visual-configurations (sets of design choices). Since each visual-configuration node represents a set of design choices that are by definition not tied to a user-specific dataset, the proposed model can use this user-level visual graph to infer and make connections between other similar visual-configurations that are also likely to be useful to that user. This new graph model is critical since it allows the learning component to learn from user-level visual preferences (i.e., visual-configurations) across the different datasets and users. Without this component, there would be no way to learn from other users' visual preferences (sets of design choices).

1.1. Summary of Contributions

This work makes the following key contributions:

  • Problem Formulation: We introduce and formulate the problem of personalized visualization recommendation that learns a personalized visualization recommendation model for every individual user based on their past visualization feedback, and the feedback of other users and their relevant visualizations from completely different datasets. Our formulation removes the unrealistic assumption of a single dataset shared across all users (and thus that there exists a single set of dataset-specific visualizations shared among all users). To solve this problem, the model must be able to learn from the visualization and data preferences of many users across tens of thousands of different datasets.

  • Framework: We propose a flexible framework that expresses a class of methods for the personalized visualization recommendation problem. To solve this new problem, we introduce new graph representations and models that enable learning from the visualization and data preferences of users despite them being in different datasets entirely. More importantly, the proposed framework is able to exploit the visualization and data preferences of users across tens of thousands of different datasets.

  • Effectiveness: The extensive experiments demonstrate the importance and effectiveness of learning personalized visualization recommendation models for each individual user. Notably, our personalized models perform significantly better than SOTA baselines with a mean improvement of 29.8% and 64.9% for HIT@5 and NDCG@5, respectively. Furthermore, the deep personalized visualization recommendation models are shown to perform even better. Finally, comprehensive ablation studies are performed to understand the effectiveness of the different learning components.

1.2. Organization of Article

First, we introduce a new problem of visualization recommendation in Section 2 that learns a personalized model for each of the $n$ individual users by leveraging a large collection of datasets and relevant visualizations from each of the datasets in the collection. Notably, the learning of the individual user models is able to exploit the preferences of other users (even if the preferences are on a completely different dataset), including the data attributes used in a visualization, the visual design choices, and the actual visualizations generated, even when no other user has used the underlying dataset of interest. In Section 3, we propose a computational framework for solving the new problem of personalized visualization recommendation. Further, we also propose deep personalized visualization recommendation models in Section 4 that are able to learn complex non-linear functions between the embeddings of the users, visualization-configurations, datasets, and the data attributes used in the visualizations. Next, Section 5 describes the user-centric visualization corpus we created and made publicly accessible for studying this problem. Then Section 6 provides a comprehensive and systematic evaluation of the proposed approach and framework for the personalized visualization recommendation problem, while Section 7 discusses related work. Finally, Section 8 concludes with a summary of the key findings and briefly discusses directions for future work on this new problem.

2. Personalized Visualization Recommendation

In this section, we formally introduce the Personalized Visualization Recommendation problem. The problem has two main parts: (1) training a personalized visualization recommendation model for every user $i\in[n]$ (Section 2.2), and (2) leveraging the user-personalized model to recommend personalized visualizations based on the user's past dataset and visualization feedback/preferences (Section 2.3).

  1. Personalized Model Training (Sec. 2.2): Given a user-level training visualization corpus $\mathcal{D}=\{(\mathcal{X}_{i},\mathbb{V}_{i})\}_{i=1}^{n}$ consisting of $n$ users along with their corresponding datasets of interest $\mathcal{X}_{i}=\{\mathbf{X}_{i1},\ldots,\mathbf{X}_{ij},\ldots\}$ and their relevant sets of visualizations $\mathbb{V}_{i}=\{\mathcal{V}_{i1},\ldots,\mathcal{V}_{ij},\ldots\}$ for those datasets, we first learn a user-level personalized model $\mathcal{M}$ from the training corpus $\mathcal{D}$ that scores the effective visualizations for user $i$ highly while assigning low scores to visualizations that are unlikely to be preferred by the user.

  2. Recommending Personalized Visualizations (Sec. 2.3): Given a user $i\in[n]$ and a dataset $\mathbf{X}_{ij}$ of interest to user $i$, we use the trained personalized visualization recommendation model $\mathcal{M}$ for user $i$ to generate, score, and recommend the top visualizations of interest to user $i$ for dataset $\mathbf{X}_{ij}$. Note that we naturally support both the case where the dataset $\mathbf{X}_{ij}\not\in\mathcal{X}_{i}$ is new and the case where the dataset $\mathbf{X}_{ij}\in\mathcal{X}_{i}$ is not new, i.e., we have at least some previous user feedback about the visualizations the user prefers from that dataset.

The fundamental difference between the ML-based visualization recommendation problem introduced in (Qian et al., 2020) and the personalized visualization recommendation problem described above is that the personalized problem focuses on modeling the behavior, data, and visualization preferences of individual users. Since local visualization recommendation models are learned for every user $i\in[n]$ (as opposed to training a global visualization recommendation model), it becomes important to leverage every single piece of feedback from the users. For instance, global visualization recommendation models essentially ignore the notion of a user, and can therefore leverage all available training data to learn the best global visualization recommendation model. In contrast, personalized visualization recommendation models explicitly leverage specific user feedback to learn the best personalized local model for every user $i\in[n]$, and there is of course far less feedback from any individual user.

2.1. Implicit and Explicit User Feedback for Personalized Vis. Rec.

In this work, relevant visualizations $\mathcal{V}_{ij}\in\mathbb{V}_{i}$ for a specific user $i$ and dataset $\mathbf{X}_{ij}\in\mathcal{X}_{i}$ are defined generally, as the term relevant may refer to visualizations that a user clicked, liked, or generated, among many other user actions that demonstrate positive feedback towards a visualization. In terms of personalized visualization recommendation, there are two general types of user feedback: implicit and explicit user feedback. Implicit user visualization feedback corresponds to feedback that is not explicitly stated and includes user actions such as clicking on a visualization or hovering over a visualization for more than a specific time. Conversely, explicit user feedback refers to feedback that is explicitly stated about a visualization, such as when a user explicitly likes a visualization or generates a visualization. Naturally, implicit user feedback is available in larger quantities than explicit user feedback. However, an implicit signal is not as strong as an explicit one, e.g., a user clicking on a visualization is a weaker signal than a user explicitly liking it.

We consider two types of user preferences that are important for learning personalized visualization recommendation models for individual users: the data preferences and the visual preferences of each individual user. For learning both the data and visual preferences of a user, there is implicit as well as explicit user feedback that can be used for developing personalized visualization recommender systems.

2.1.1. Data Preferences of Users: Implicit and Explicit Data Feedback

There is naturally both implicit and explicit user feedback regarding the data preferences of users. Explicit user feedback about the data preferences of a user is a far stronger signal than implicit user feedback; however, there is typically much more implicit than explicit feedback available for learning.

  • Implicit Data Preferences of Users. An example of implicit feedback w.r.t. the data preferences of a user is when the user clicks (or hovers over) a visualization that uses two attributes $\mathbf{x}$ and $\mathbf{y}$ from some arbitrary user-selected dataset. We can then extract the user's data preferences from the visualization by encoding the two attributes that were used in the visualization preferred by that user.

  • Explicit Data Preferences of Users. Similarly, an example of explicit feedback w.r.t. the data preferences of a user is when the user explicitly likes a visualization (or adds a visualization to their dashboard) that uses two attributes $\mathbf{x}$ and $\mathbf{y}$ from some arbitrary user-selected dataset. In this work, we use another form of explicit feedback based on a user-generated visualization and the attributes (data) used in the generated visualization. This is a form of explicit feedback, since the user explicitly selects the attributes and creates a visualization using them (as opposed to clicking on a visualization automatically generated by a system).

Besides the implicit and explicit feedback provided by a user through the click or like of a visualization and the data used in it, we can also leverage even more direct feedback about a user's data preferences. For instance, many visualization recommender systems allow users to select an attribute of interest to use in the recommended visualizations. We can naturally leverage any feedback of this type as well.

2.1.2. Visual Preferences of Users: Implicit and Explicit Visual Feedback

In terms of visual preferences of users, there is both implicit and explicit user feedback that can be used to learn a better personalized visualization recommendation model for individual users.

  • Implicit Visual Preferences of Users. An example of implicit feedback w.r.t. the visual preferences of a user is when the user clicks (or hovers over) a visualization from some arbitrary user-selected dataset. We can then extract the user's visual preferences from the visualization and appropriately encode them for learning the visual preferences of the individual user.

  • Explicit Visual Preferences of Users. Similarly, an example of explicit feedback w.r.t. the visual preferences of a user is when the user explicitly likes a visualization (or adds a visualization to their dashboard). Just as before, we can then extract the visual preferences of the user from the visualization (mark/chart type, x-type, y-type, color, size, x-aggregate, and so on) and leverage the individual visual preferences or a combination of them for learning the user-specific personalized vis. rec. model, as illustrated in the sketch below.
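To make the two feedback channels concrete, the following is a minimal sketch (in Python, with hypothetical field names and feedback weights that are not part of our corpus format) of how a user's data preferences and visual preferences could be extracted from a single visualization interaction:

```python
# Minimal sketch (hypothetical field names and weights): extracting a user's
# data and visual preferences from a single visualization interaction.

# A Vega-Lite-like visualization the user interacted with.
visualization = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative", "aggregate": "sum"},
        "color": {"field": "year", "type": "ordinal"},
    },
}

# Explicit signals (like, add-to-dashboard, created) are stronger than
# implicit ones (hover, click); the weights here are illustrative only.
FEEDBACK_WEIGHT = {"hover": 0.5, "click": 1.0, "like": 2.0, "created": 3.0}

def extract_preferences(user_id, dataset_id, vis, feedback_type):
    """Return (data-preference, visual-preference) records for one interaction."""
    weight = FEEDBACK_WEIGHT[feedback_type]
    # Data preferences: which attributes of the dataset the visualization uses.
    attributes = [enc["field"] for enc in vis["encoding"].values() if "field" in enc]
    data_pref = {"user": user_id, "dataset": dataset_id,
                 "attributes": attributes, "weight": weight}
    # Visual preferences: the design choices, with the data bindings dropped.
    design = {"mark": vis["mark"]}
    for channel, enc in vis["encoding"].items():
        design[channel] = {k: v for k, v in enc.items() if k != "field"}
    visual_pref = {"user": user_id, "design_choices": design, "weight": weight}
    return data_pref, visual_pref

data_pref, visual_pref = extract_preferences("u42", "d7", visualization, "like")
```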

2.2. Training User Personalized Visualization Recommendation Model

Given user log data

(1) $\mathcal{D}=\{(\mathcal{X}_{1},\mathbb{V}_{1}),\ldots,(\mathcal{X}_{i},\mathbb{V}_{i}),\ldots,(\mathcal{X}_{n},\mathbb{V}_{n})\}=\{(\mathcal{X}_{i},\mathbb{V}_{i})\}_{i=1}^{n}$

where for each user $i\in[n]$, we have the set of datasets of interest to that user denoted as $\mathcal{X}_{i}$ along with the sets of relevant visualizations $\mathbb{V}_{i}$ generated by user $i$ for every dataset $\mathbf{X}_{ij}\in\mathcal{X}_{i}$. More specifically,

(2) $\mathbb{V}_{i}=\{\mathcal{V}_{i1},\ldots,\mathcal{V}_{ij},\ldots\}\quad\text{and}\quad\mathcal{V}_{ij}=\{\mathcal{V}_{ij1},\ldots,\mathcal{V}_{ijk},\ldots\}$
(3) $\mathcal{X}_{i}=\{\mathbf{X}_{i1},\ldots,\mathbf{X}_{ij},\ldots\}\quad\text{and}\quad\mathbf{X}_{ij}=[\mathbf{x}_{ij1}\;\mathbf{x}_{ij2}\;\cdots]$

where $\mathbf{x}_{ijk}$ is the $k$th attribute (column vector) of $\mathbf{X}_{ij}$. Hence, the number of attributes in $\mathbf{X}_{ij}$ has no relation to the number of relevant visualizations $|\mathcal{V}_{ij}|$ that user $i$ preferred for that dataset. For a single user $i$, the number of user-preferred visualizations across all datasets of interest to that user is

(4) $v_{i}=\sum_{\mathcal{V}_{ij}\in\mathbb{V}_{i}}|\mathcal{V}_{ij}|$

where $\mathcal{V}_{ij}$ is the set of visualizations preferred by user $i$ from dataset $j$. Thus, the total number of user-generated visualizations across all users and datasets is

(5) $v=\sum_{i=1}^{n}\sum_{\mathcal{V}_{ij}\in\mathbb{V}_{i}}|\mathcal{V}_{ij}|$

For simplicity, let $\mathcal{V}_{ijk}\in\mathcal{V}_{ij}=\{\mathcal{V}_{ij1},\ldots,\mathcal{V}_{ijk},\ldots\}$ denote the visualization generated by user $i$ from dataset $j$, that is, $\mathbf{X}_{ij}\in\mathcal{X}_{i}$, specifically using the subset of attributes $\mathbf{X}_{ij}^{(k)}$ from the dataset $\mathbf{X}_{ij}$. Further, every user $i\in[n]$ is associated with a set of datasets $\mathcal{X}_{i}=\{\mathbf{X}_{i1},\ldots,\mathbf{X}_{ij},\ldots\}$ of interest. Let $\mathbf{X}_{ij}$ be the $j$th dataset of interest to user $i$ and let $|\mathbf{X}_{ij}|$ denote the number of attributes (columns) of the dataset matrix $\mathbf{X}_{ij}$. Then the number of attributes across all datasets of interest to user $i$ is

(6) $m_{i}=\sum_{\mathbf{X}_{ij}\in\mathcal{X}_{i}}|\mathbf{X}_{ij}|$

and the number of attributes across all $n$ users and all their datasets is

(7) $m=\sum_{i=1}^{n}\sum_{\mathbf{X}_{ij}\in\mathcal{X}_{i}}|\mathbf{X}_{ij}|$
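To make the bookkeeping in Eqs. 4 through 7 concrete, the following is a small sketch with a toy, randomly generated corpus (the nested structure is only one possible way to store $\mathcal{D}$) that computes $v_i$, $v$, $m_i$, and $m$:

```python
import numpy as np

# Toy corpus D = {(X_i, V_i)}: for each user, a list of (dataset, relevant
# visualizations) pairs. Datasets are matrices whose columns are attributes;
# visualizations are left as opaque objects (e.g., Vega-Lite specs).
corpus = {
    "user_1": [
        (np.random.rand(100, 4), ["vis_a", "vis_b"]),  # X_11 with 4 attributes
        (np.random.rand(50, 2), ["vis_c"]),            # X_12 with 2 attributes
    ],
    "user_2": [
        (np.random.rand(80, 6), ["vis_d", "vis_e", "vis_f"]),
    ],
}

# v_i: relevant visualizations per user (Eq. 4); m_i: attributes per user (Eq. 6)
v_i = {u: sum(len(V_ij) for _, V_ij in data) for u, data in corpus.items()}
m_i = {u: sum(X_ij.shape[1] for X_ij, _ in data) for u, data in corpus.items()}

v = sum(v_i.values())  # total relevant visualizations across all users (Eq. 5)
m = sum(m_i.values())  # total attributes across all users and datasets (Eq. 7)
print(v_i, m_i, v, m)  # {'user_1': 3, 'user_2': 3} {'user_1': 6, 'user_2': 6} 6 12
```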
Definition 1 (Space of Attribute Combinations).

Given an arbitrary dataset matrix $\mathbf{X}_{ij}$, let $\mathbb{X}_{ij}$ denote the space of attribute combinations of $\mathbf{X}_{ij}$ defined as

(8) $\Sigma:\mathbf{X}_{ij}\to\mathbb{X}_{ij},\quad\text{s.t.}$
(9) $\mathbb{X}_{ij}=\{\mathbf{X}_{ij}^{(1)},\ldots,\mathbf{X}_{ij}^{(k)},\ldots\},$

where $\Sigma$ is an attribute combination generation function and every $\mathbf{X}_{ij}^{(k)}\in\mathbb{X}_{ij}$ is a different subset (combination) of one or more attributes from $\mathbf{X}_{ij}$.

Property 1.

Let $|\mathbf{X}_{ij}|$ and $|\mathbf{X}_{ik}|$ denote the number of attributes (columns) of two arbitrary datasets $\mathbf{X}_{ij}$ and $\mathbf{X}_{ik}$ of user $i$. If $|\mathbf{X}_{ij}|>|\mathbf{X}_{ik}|$, then $|\mathbb{X}_{ij}|>|\mathbb{X}_{ik}|$.

It is straightforward to see that if $|\mathbf{X}_{ij}|>|\mathbf{X}_{ik}|$, then the number of attribute combinations of $\mathbf{X}_{ij}$, denoted as $|\mathbb{X}_{ij}|$, is larger than the number of different attribute subsets that can be generated from $\mathbf{X}_{ik}$, denoted as $|\mathbb{X}_{ik}|$. Property 1 is important as it characterizes the space of attribute combinations/subsets for a given dataset $\mathbf{X}_{ij}$, and can therefore be used to understand the corresponding space of possible visualizations that can be generated from a given dataset, as the two are tied.
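For illustration, the attribute combination generation function $\Sigma$ can be realized by simple enumeration; the sketch below caps the subset size, since the full space grows exponentially with the number of attributes (the column names and the cap are hypothetical):

```python
from itertools import combinations

import pandas as pd

def attribute_combinations(X: pd.DataFrame, max_size: int = 2):
    """Sigma: enumerate subsets of 1..max_size attributes (columns) of X.

    The full space grows exponentially with the number of attributes, so
    practical systems cap the subset size (e.g., by the number of encoding
    channels a chart supports)."""
    columns = list(X.columns)
    for size in range(1, max_size + 1):
        for subset in combinations(columns, size):
            yield X[list(subset)]   # X^{(k)}: the attribute subset as a sub-matrix

# A dataset with 3 attributes yields 3 singletons + 3 pairs = 6 subsets.
X_ij = pd.DataFrame({"sales": [1, 2, 3], "region": ["a", "b", "c"], "year": [2019, 2020, 2021]})
print(sum(1 for _ in attribute_combinations(X_ij, max_size=2)))  # 6
```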

In this work, we assume a visualization is specified using a grammar such as Vega-Lite (Satyanarayan et al., 2016). Therefore, the data mapping and design choices of the visualization are encoded in JSON (or a JSON-like format) and can easily be rendered as a visualization. A visualization configuration $\mathcal{C}$ (design choices) together with the data attributes $\mathbf{X}^{(k)}_{ij}$ selected from a dataset $\mathbf{X}_{ij}$ is everything necessary to generate a visualization $\mathcal{V}=(\mathbf{X}^{(k)}_{ij},\mathcal{C})$. Hence, the tuple $(\mathbf{X}^{(k)}_{ij},\mathcal{C})$ defines a unique visualization $\mathcal{V}$ that leverages the subset of attributes $\mathbf{X}^{(k)}_{ij}$ from dataset $\mathbf{X}_{ij}$ along with the visualization configuration $\mathcal{C}\in\boldsymbol{\mathcal{C}}$.

Definition 2 (Visualization Configuration).

Given a visualization $\mathcal{V}$ generated using a subset of attributes $\mathbf{X}_{ij}^{(k)}$ from dataset $\mathbf{X}_{ij}$, we define a function

(10) $\Gamma:\mathcal{V}\to\mathcal{C}$

where $\Gamma$ maps every data-dependent design choice of the visualization to its corresponding type (i.e., the attribute mapped to the x-axis of the visualization $\mathcal{V}$ is replaced with its general type such as quantitative, nominal, ordinal, temporal, etc.). The resulting visualization configuration $\mathcal{C}$ is an abstraction of the visualization $\mathcal{V}$, in the sense that all the data attribute bindings have been removed and replaced with their general data attribute type.

Definition 3 (Space of Visualization Configurations).

Let $\boldsymbol{\mathcal{C}}$ denote the space of all visualization configurations, where a visualization configuration $\mathcal{C}_{ik}\in\boldsymbol{\mathcal{C}}$ defines an abstraction of a visualization in which each visual design choice (x, y, marker-type, color, size, etc.) that maps to an attribute in some dataset $\mathbf{X}_{ij}$ is replaced with its type, such as quantitative, nominal, ordinal, temporal, or some other general property characterizing the attribute that can be selected. Therefore, visualization configurations are essentially visualizations without any attributes (data), i.e., visualization abstractions that are by definition data-independent.

Property 2.

Every visualization configuration $\mathcal{C}_{ik}\in\boldsymbol{\mathcal{C}}$ is independent of any data matrix $\mathbf{X}$ (by Definition 3).

The above implies that $\mathcal{C}_{ik}\in\boldsymbol{\mathcal{C}}$ can potentially arise from any arbitrary dataset and is therefore not tied to any specific dataset, since visualization configurations are general abstractions where the data bindings have been replaced with their general type, e.g., if x/y in some visualization mapped to an attribute in $\mathbf{X}$, then it is replaced by its type (i.e., ordinal, quantitative, categorical, etc.). A visualization configuration and the attributes selected from a dataset are everything necessary to generate a visualization. The space of visualization configurations is large since visualization configurations arise from all possible combinations of design choices and their values.
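As a concrete illustration of Definition 2 and Property 2, the sketch below shows one possible implementation of the abstraction function $\Gamma$ for a Vega-Lite-like specification: every attribute binding (the "field" key) is dropped, and only the general attribute types and remaining design choices are kept. The specific spec and field names are hypothetical.

```python
def gamma(vis_spec: dict) -> dict:
    """Gamma: map a visualization to its data-independent visualization configuration.

    Every encoding channel keeps its design choices (type, aggregate, scale, ...),
    but the concrete attribute binding ("field") is removed, so the resulting
    configuration can be shared by visualizations of completely different datasets."""
    config = {"mark": vis_spec["mark"], "encoding": {}}
    for channel, enc in vis_spec["encoding"].items():
        config["encoding"][channel] = {k: v for k, v in enc.items() if k != "field"}
    return config

vis = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative", "aggregate": "sum"},
    },
}
print(gamma(vis))
# {'mark': 'bar', 'encoding': {'x': {'type': 'nominal'},
#                              'y': {'type': 'quantitative', 'aggregate': 'sum'}}}
```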

Definition 4 (Space of Visualizations of $\mathbf{X}_{ij}$).

Given an arbitrary dataset matrix $\mathbf{X}_{ij}$, we define $\mathcal{V}^{\star}_{ij}$ as the space of all possible visualizations that can be generated from $\mathbf{X}_{ij}$. More formally, the space of visualizations $\mathcal{V}^{\star}_{ij}$ is defined with respect to a dataset $\mathbf{X}_{ij}$ and the space of visualization configurations $\boldsymbol{\mathcal{C}}$,

(11) $\mathbb{X}_{ij}=\Sigma(\mathbf{X}_{ij})=\{\mathbf{X}_{ij}^{(1)},\ldots,\mathbf{X}_{ij}^{(k)},\ldots\}$
(12) $\xi:\mathbb{X}_{ij}\times\boldsymbol{\mathcal{C}}\to\mathcal{V}_{ij}^{\star}$

where $\mathbb{X}_{ij}=\{\mathbf{X}_{ij}^{(1)},\ldots,\mathbf{X}_{ij}^{(k)},\ldots\}$ is the set of all possible attribute combinations of $\mathbf{X}_{ij}$ (Def. 1). More succinctly, $\xi:\Sigma(\mathbf{X}_{ij})\times\boldsymbol{\mathcal{C}}\to\mathcal{V}_{ij}^{\star}$, and therefore $\xi(\Sigma(\mathbf{X}_{ij}),\boldsymbol{\mathcal{C}})=\mathcal{V}_{ij}^{\star}$. The space of all visualizations $\mathcal{V}_{ij}^{\star}$ is determined entirely by the underlying dataset, and therefore remains the same for all $n$ users. The difference in our personalized visualization recommendation problem is the relevance of each visualization in the space of all possible visualizations generated from an arbitrary dataset. Given a subset of attributes $\mathbf{X}_{ij}^{(k)}\subseteq\mathbf{X}_{ij}$ from dataset $\mathbf{X}_{ij}$ and a visualization configuration $\mathcal{C}\in\boldsymbol{\mathcal{C}}$, then $\xi(\mathbf{X}_{ij}^{(k)},\mathcal{C})\in\mathcal{V}_{ij}^{\star}$ is the corresponding visualization.

Importantly, fix $\boldsymbol{\mathcal{C}}$ and let $\mathbf{X}\not=\mathbf{Y}\implies\forall i,j\;\,\mathbf{x}_{i}\not=\mathbf{y}_{j}$; then $\xi(\Sigma(\mathbf{X}),\boldsymbol{\mathcal{C}})\cap\xi(\Sigma(\mathbf{Y}),\boldsymbol{\mathcal{C}})=\emptyset$. This implies that the space of possible visualizations that can be generated is entirely dependent on the dataset (not the user). Hence, for any two datasets $\mathbf{X}$ and $\mathbf{Y}$ without any shared attributes between them, the sets of visualizations that can be generated from $\mathbf{X}$ and $\mathbf{Y}$ are completely disjoint,

$\xi(\Sigma(\mathbf{X}),\boldsymbol{\mathcal{C}})\cap\xi(\Sigma(\mathbf{Y}),\boldsymbol{\mathcal{C}})=\emptyset$

This has important consequences for the new problem of personalized visualization recommendation. It is unlikely that any two users care about the same underlying dataset, and even if they did, it is far more unlikely that they have any relevant visualizations in common (given the exponential size of the visualization space for a single dataset with a reasonable number of attributes). Therefore, it is neither possible nor practical to leverage the relevant visualizations of a user directly. Instead, we need to decompose a visualization $\mathcal{V}$ into its more meaningful components: (i) the characteristics of the data attributes $\mathbf{X}_{ij}^{(k)}$ used in the visualization, and (ii) the visual design choices (chart-type/mark, color, size, and so on).

Definition 5 (Relevant Visualizations of User $i$ and Dataset $\mathbf{X}_{ij}$).

Let $\mathcal{V}_{ij}\in\mathbb{V}_{i}$ define the set of relevant (positive) visualizations for user $i$ with respect to dataset $\mathbf{X}_{ij}$. Therefore, $\mathbb{V}_{i}=\bigcup_{\mathbf{X}_{ij}\in\mathcal{X}_{i}}\mathcal{V}_{ij}$, where $\mathbb{V}_{i}$ is the set of all relevant visualizations across all datasets $\mathcal{X}_{i}$ of interest to user $i$.

Definition 6 (Non-relevant Visualizations of User $i$ and Dataset $\mathbf{X}_{ij}$).

For a user $i$, let $\mathcal{V}_{ij}^{\star}$ denote the space of all visualizations that arise from the $j$th dataset $\mathbf{X}_{ij}$ such that the relevant (positive) visualizations $\mathcal{V}_{ij}$ satisfy $\mathcal{V}_{ij}\subseteq\mathcal{V}_{ij}^{\star}$; then the space of non-relevant visualizations for user $i$ on dataset $\mathbf{X}_{ij}$ is $\mathcal{V}_{ij}^{-}=\mathcal{V}_{ij}^{\star}\setminus\mathcal{V}_{ij}$, which follows from $\mathcal{V}_{ij}^{-}\cup\mathcal{V}_{ij}=\mathcal{V}_{ij}^{\star}$.

Table 1. Summary of notation. Matrices are bold upright roman letters; vectors are bold lowercase letters.
$\mathcal{D}$: user log data $\mathcal{D}=\{(\mathcal{X}_{i},\mathbb{V}_{i})\}_{i=1}^{n}$ consisting of a set of datasets $\mathcal{X}_{i}$ for every user $i\in[n]$ and the sets $\mathbb{V}_{i}$ of relevant visualizations for each of those datasets
$\mathcal{X}_{i}$: set of datasets (data matrices) of interest to user $i$, where $\mathcal{X}_{i}=\{\mathbf{X}_{i1},\ldots,\mathbf{X}_{ij},\ldots\}$
$\mathbf{X}_{ij}$: the $j$th dataset (data matrix) of interest to user $i$
$\mathbb{V}_{i}$: sets of visualizations relevant to user $i$, where $\mathbb{V}_{i}=\{\mathcal{V}_{i1},\ldots,\mathcal{V}_{ij},\ldots\}$
$\mathcal{V}_{ij}$: set of visualizations relevant to (generated by) user $i$ for dataset $j$ ($\mathbf{X}_{ij}\in\mathcal{X}_{i}$), where $\mathcal{V}_{ij}=\{\ldots,\mathcal{V},\ldots\}$
$\mathcal{V}=(\mathbf{X}^{(k)},\mathcal{C})$: a visualization $\mathcal{V}$ consisting of the subset of attributes $\mathbf{X}^{(k)}$ from some dataset $\mathbf{X}$ and the visual-configuration (design choices) $\mathcal{C}$
$\boldsymbol{\mathcal{C}}$: set of visual-configurations, where $\mathcal{C}\in\boldsymbol{\mathcal{C}}$ represents the visualization design choices of a single visualization $\mathcal{V}$, such as the chart-type, x-axis, y-axis, color, and so on
$\mathbb{X}_{ij}$: space of attribute combinations/subsets $\mathbb{X}_{ij}=\{\mathbf{X}_{ij}^{(1)},\ldots,\mathbf{X}_{ij}^{(k)},\ldots\}$ of dataset $\mathbf{X}_{ij}$
$n$: number of users
$m$: number of attributes (columns, variables) across all users and datasets, $m=\sum_{i}m_{i}$ where $m_{i}$ is the number of attributes across the datasets of user $i$ (Eq. 6)
$v$: number of relevant (user-generated) visualizations across all users and datasets
$h$: number of visualization configurations
$k$: dimensionality of the shared attribute feature space, i.e., number of attribute features
$d$: shared latent embedding dimensionality
$t$: number of types of implicit/explicit user feedback, e.g., attribute and visualization clicks, likes, add-to-dashboard, among others
$\mathbf{x}$: an attribute (column) vector from an arbitrary user-uploaded dataset
$|\mathbf{x}|$: cardinality of $\mathbf{x}$, i.e., number of unique values in $\mathbf{x}$
$\mathsf{nnz}(\mathbf{x})$: number of nonzeros in a vector $\mathbf{x}$
$\mathsf{len}(\mathbf{x})$: length of a vector $\mathbf{x}$
$\mathbf{A}$: user by attribute preference matrix
$\mathbf{C}$: user by visualization-configuration matrix
$\mathbf{D}$: attribute preference by visual-configuration matrix
$\mathbf{M}$: attribute by meta-feature matrix
$\mathbf{U}$: shared user embedding matrix
$\mathbf{V}$: shared attribute embedding matrix
$\mathbf{Z}$: shared visualization-configuration embedding matrix
$\mathbf{Y}$: meta-feature embedding matrix for the attributes across all datasets

We denote $Y_{ijk}$ as the ground-truth label of a visualization $\mathcal{V}_{ijk}\in\mathcal{V}_{ij}^{\star}$, where $Y_{ijk}=1$ if $\mathcal{V}_{ijk}\in\mathcal{V}_{ij}$ and $Y_{ijk}=0$ otherwise. We now formulate the problem of training a user-level personalized visualization recommendation model $\mathcal{M}_{i}$ for user $i$ from a large user-centric visualization training corpus $\mathcal{D}$.

Definition 7 (Training Personalized Vis. Recommendation Model).

Given the set of training datasets and relevant visualizations $\mathcal{D}=\{(\mathcal{X}_{i},\mathbb{V}_{i})\}_{i=1}^{n}$, the goal is to learn a personalized visualization recommendation model $\mathcal{M}_{i}$ for user $i$ by solving the following general objective function,

(13) $\operatorname*{arg\,min}_{\mathcal{M}_{i}}\;\sum_{j=1}^{|\mathcal{X}_{i}|}\sum_{(\mathbf{X}_{ij}^{(k)},\,\mathcal{C}_{ijk})\in\mathcal{V}_{ij}\cup\widehat{\mathcal{V}}_{ij}^{-}}\mathbb{L}\Big(Y_{ijk}\,\big|\,\Psi(\mathbf{X}_{ij}^{(k)}),f(\mathcal{C}_{ijk}),\mathcal{M}_{i}\Big),\quad i=1,\ldots,n$

where $\mathbb{L}$ is the loss function, $Y_{ijk}\in\{0,1\}$ is the ground-truth label of the $k$th visualization $\mathcal{V}_{ijk}=(\mathbf{X}_{ij}^{(k)},\mathcal{C}_{ijk})\in\mathcal{V}_{ij}\cup\widehat{\mathcal{V}}_{ij}^{-}$ for dataset $\mathbf{X}_{ij}\in\mathcal{X}_{i}$ of user $i$, and $\mathbf{X}_{ij}^{(k)}\subseteq\mathbf{X}_{ij}$ is the subset of attributes used in the visualization. In Eq. 13, $\Psi$ and $f$ are general functions over the subset of attributes $\mathbf{X}_{ij}^{(k)}\subseteq\mathbf{X}_{ij}$ and the visualization configuration $\mathcal{C}_{ijk}$ of the visualization $\mathcal{V}_{ijk}=(\mathbf{X}_{ij}^{(k)},\mathcal{C}_{ijk})\in\mathcal{V}_{ij}^{-}\cup\mathcal{V}_{ij}$, respectively.
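To illustrate the shape of the objective in Eq. 13, below is a minimal sketch that instantiates $\mathbb{L}$ as binary cross-entropy and $\mathcal{M}_i$ as a simple logistic scoring model over a concatenated feature vector $[\Psi(\mathbf{X}_{ij}^{(k)}),\,f(\mathcal{C}_{ijk})]$, with non-relevant visualizations sampled as negatives. The featurization and model here are placeholders; the actual models are developed in Sections 3 and 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_user_model(examples, dim, epochs=50, lr=0.1):
    """Fit a linear M_i for one user with binary cross-entropy (one instance of Eq. 13).

    `examples` is a list of (phi, y) pairs, where phi = [Psi(X^{(k)}), f(C)] is the
    concatenated feature vector of a visualization and y in {0, 1} marks it as a
    relevant or a sampled non-relevant visualization."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for phi, y in examples:
            p = sigmoid(w @ phi)
            w -= lr * (p - y) * phi      # gradient of the BCE loss w.r.t. w
    return w

# Hypothetical, already-featurized training data for a single user i.
dim = 16
examples = [(rng.normal(size=dim), y) for y in [1, 0, 1, 0, 0]]
M_i = train_user_model(examples, dim)
score = sigmoid(M_i @ rng.normal(size=dim))   # personalized score of a new visualization
```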

For learning individual models $\mathcal{M}_{i}$ for every user $i\in[n]$, we can also leverage the visualization and data preferences of other users. The simplest and most straightforward situation is when there is another user $i^{\prime}\in[n]$ with a set of relevant visualizations that use attributes from the same exact dataset, hence $|\mathcal{X}_{i}\cap\mathcal{X}_{i^{\prime}}|>0$. While this strict assumption is convenient, as it makes the problem far simpler, it is unrealistic in practice (and not very useful) to assume there exists a single dataset of interest to all users. Therefore, we designed the approach to learn from visualizations preferred by other users on completely different datasets. This is done by leveraging the similarity between the attributes used in the visualizations across completely different datasets (by first embedding the attributes from every dataset into a shared fixed-dimensional space), as well as the similarity between the visual-configurations of the relevant visualizations, despite them using completely different datasets.

More formally, consider any two users $i,i^{\prime}\in[n]$ along with one relevant visualization from each, $\mathcal{V}_{ijk}=(\mathbf{X}_{ij}^{(k)},\mathcal{C}_{ijk})\in\mathcal{V}_{ij}$ and $\mathcal{V}_{i^{\prime}j^{\prime}k^{\prime}}=(\mathbf{X}_{i^{\prime}j^{\prime}}^{(k^{\prime})},\mathcal{C}_{i^{\prime}j^{\prime}k^{\prime}})\in\mathcal{V}_{i^{\prime}j^{\prime}}$. Even when the datasets used in these visualizations are completely different, we can leverage this across-dataset training information if the visualizations use similar attributes, where across-dataset similarity is measured by first mapping each attribute used in a visualization to a shared $K$-dimensional meta-feature space in which the similarity between attributes used in visualizations generated by different users can be computed. Hence, $s\langle\Psi(\mathbf{X}_{ij}^{(k)}),\Psi(\mathbf{X}_{i^{\prime}j^{\prime}}^{(k^{\prime})})\rangle>1-\epsilon$ where $\mathbf{X}_{i^{\prime}j^{\prime}}\not\in\mathcal{X}_{i}$ and $\mathbf{X}_{ij}\not\in\mathcal{X}_{i^{\prime}}$. Intuitively, this implies that even though the visualizations are generated from different data, they visualize data that is similar with respect to its overall characteristics and patterns. By construction, visualizations $\mathcal{V}_{ijk}$ and $\mathcal{V}_{i^{\prime}j^{\prime}k^{\prime}}$ from two different users $i,i^{\prime}\in[n]$ and datasets $\mathbf{X}_{ij}\not=\mathbf{X}_{i^{\prime}j^{\prime}}$ may also use the same visual-configuration (set of design choices), $\mathcal{C}_{ijk}=\mathcal{C}_{i^{\prime}j^{\prime}k^{\prime}}\in\boldsymbol{\mathcal{C}}$, since visual-configurations are by definition data-independent; thus, even though two visualizations may visualize data attributes from completely different datasets, they can still share the same visual-configuration (design choices). Therefore, as we show later, we are able to learn from other users whose visualizations use attributes from completely different datasets.

2.3. Personalized Visualization Scoring and Recommendation

After learning the personalized visualization recommendation model $\mathcal{M}_{i}$ for an individual user $i\in[n]$ (Eq. 13), we can then use $\mathcal{M}_{i}$ to score and recommend the top most relevant visualizations for user $i$ from any arbitrary dataset $\mathbf{X}$. There are three possible cases naturally supported by the learned model $\mathcal{M}_{i}$ for recommending visualizations of interest to user $i$ based on their past interactions (visualizations the user viewed, clicked, or more generally interacted with):

  1. The dataset $\mathbf{X}$ used for recommending personalized visualizations to user $i$ via $\mathcal{M}_{i}$ can be a new, previously unseen dataset of interest, $\mathbf{X}\not\in\{\mathcal{X}_{1},\mathcal{X}_{2},\ldots,\mathcal{X}_{n}\}$.

  2. The dataset $\mathbf{X}$ is not a previous dataset of interest to user $i$, but has been used previously by one or more other users, $\mathbf{X}\in\{\mathcal{X}_{1},\ldots,\mathcal{X}_{n}\}\setminus\mathcal{X}_{i}$.

  3. The dataset $\mathbf{X}\in\mathcal{X}_{i}$ is a previous dataset of interest to user $i$.

A fundamental property of the personalized visualization recommendation problem is that the visualization scores for an arbitrary visualization $\mathcal{V}$ (that visualizes data from an arbitrary dataset $\mathbf{X}$) differ depending on the individual user and their historical preferences and interests. More formally, given users $i,i^{\prime}\in[n]$ and a visualization $\mathcal{V}$ from a new unseen dataset $\mathbf{X}_{\rm test}$, we obtain personalized visualization scores for users $i$ and $i^{\prime}$ as $\mathcal{M}_{i}(\mathcal{V})$ and $\mathcal{M}_{i^{\prime}}(\mathcal{V})$, respectively. While existing rule-based (Wongsuphasawat et al., 2015, 2017; Moritz et al., 2018) or ML-based systems (Qian et al., 2020) score the visualization $\mathcal{V}$ the same no matter the actual user of the system (hence, they are agnostic to the actual user and their interests, past interactions, and intent), our work instead focuses on learning individual personalized visualization recommendation models for every user $i\in[n]$ such that the personalized score $\mathcal{M}_{i}(\mathcal{V})$ of visualization $\mathcal{V}$ for user $i$ is almost surely different from the score $\mathcal{M}_{i^{\prime}}(\mathcal{V})$ given by the personalized model of another user $i^{\prime}$, i.e., $\mathcal{M}_{i}(\mathcal{V})\not=\mathcal{M}_{i^{\prime}}(\mathcal{V})$. We can state this more generally for all pairs of users $i,i^{\prime}\in[n]$ with respect to a single arbitrary visualization $\mathcal{V}$,

(14) $\mathcal{M}_{i}(\mathcal{V})\not=\mathcal{M}_{i^{\prime}}(\mathcal{V}),\quad\forall i,i^{\prime}=1,\ldots,n\quad\text{s.t.}\;\,i<i^{\prime}$

Hence, given an arbitrary visualization $\mathcal{V}$, the personalized scores $\mathcal{M}_{i}(\mathcal{V})$ and $\mathcal{M}_{i^{\prime}}(\mathcal{V})$ for any two distinct users $i$ and $i^{\prime}$ are not equal with high probability. This is due to the fact that the personalized visualization recommendation models $\mathcal{M}_{1},\mathcal{M}_{2},\ldots,\mathcal{M}_{n}$ capture each of the $n$ users' individual data preferences, design/visual preferences, and overall visualization preferences.

Definition 8 (Personalized Visualization Scoring).

Given the personalized visualization recommendation model $\mathcal{M}_{i}$ for user $i$ and a dataset $\mathbf{X}_{\rm test}$ of interest to user $i$, we can obtain the personalized scores for user $i$ of every possible visualization that can be generated as

(15) $\mathcal{M}_{i}:\mathbb{X}_{\rm test}\times\boldsymbol{\mathcal{C}}\to\mathbb{R}$

where $\mathbb{X}_{\rm test}=\{\ldots,\mathbf{X}^{(k)}_{\rm test},\ldots\}$ is the space of attribute subsets of $\mathbf{X}_{\rm test}$ and $\boldsymbol{\mathcal{C}}$ is the space of visualization configurations. Hence, given an arbitrary visualization $\mathcal{V}$, the learned model $\mathcal{M}_{i}$ outputs a personalized score for user $i$ describing the effectiveness or importance of the visualization with respect to that individual user.

Definition 9 (Personalized Visualization Ranking).

Given the set of generated visualizations $\mathbb{V}_{\rm test}=\{\mathcal{V}_{1},\mathcal{V}_{2},\ldots,\mathcal{V}_{Q}\}$ where $Q=|\mathbb{V}_{\rm test}|$, we derive a personalized ranking of the visualizations $\mathbb{V}_{\rm test}$ from $\mathbf{X}_{\rm test}$ for user $i$ as follows:

(16) $\rho_{i}\big(\{\mathcal{V}_{1},\mathcal{V}_{2},\ldots,\mathcal{V}_{Q}\}\big)=\operatorname*{arg\,sort}_{\mathcal{V}_{t}\in\mathbb{V}_{\rm test}}\;\mathcal{M}_{i}(\mathcal{V}_{t})$

where for any two visualizations $\mathcal{V}_{t}$ and $\mathcal{V}_{t^{\prime}}$ in the personalized ranking $\rho_{i}\big(\{\mathcal{V}_{1},\mathcal{V}_{2},\ldots,\mathcal{V}_{Q}\}\big)$ of visualizations for the individual user $i$ (from dataset $\mathbf{X}_{\rm test}$) such that $t<t^{\prime}$, $\mathcal{M}_{i}(\mathcal{V}_{t})\geq\mathcal{M}_{i}(\mathcal{V}_{t^{\prime}})$ holds by definition.
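Putting Definitions 8 and 9 together, recommending for a user reduces to enumerating candidate visualizations from $\mathbf{X}_{\rm test}$, scoring each with $\mathcal{M}_{i}$, and sorting by score. A minimal sketch (assuming a linear model vector and a hypothetical `featurize` function, as in the training sketch above) is:

```python
import numpy as np

def recommend(model_i, attribute_subsets, configurations, featurize, top=5):
    """Score every candidate (attribute subset, configuration) pair with the
    personalized model M_i (Def. 8) and return the top-ranked visualizations
    according to arg sort of the scores (Eq. 16)."""
    candidates, scores = [], []
    for X_k in attribute_subsets:        # attribute subsets of X_test (Def. 1)
        for C in configurations:         # visualization configurations (Def. 3)
            candidates.append((X_k, C))
            scores.append(float(model_i @ featurize(X_k, C)))   # M_i(V)
    order = np.argsort(scores)[::-1]     # highest personalized score first
    return [candidates[t] for t in order[:top]]
```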

3. Personalized Visualization Recommendation Framework

In this section, we present the framework for solving the personalized visualization recommendation problem from Section 2. In Section 3.1, we first describe the meta-feature learning approach for mapping user datasets to a shared universal meta-feature space where relationships between the corpus of tens of thousands of datasets can be automatically inferred and used for learning individual personalized models for each user. Then Section 3.2 introduces a graph model that captures the data preferences of users while Section 3.3 proposes graph models that naturally encode the visual preferences of users. The personalized visualization recommendation models learned from the proposed graph representations are described in Section 3.4, while the visualization scoring and recommendation techniques are presented in Section 3.5.

3.1. Representing Datasets in a Universal Shared Meta-feature Space

To learn from user datasets of different sizes, types, and characteristics, we first embed the attributes (columns) of each dataset $\mathbf{X}\in\mathcal{X}_{1}\cup\mathcal{X}_{2}\cup\cdots\cup\mathcal{X}_{n}$ (from any user) in a shared $K$-dimensional meta-feature space. This also enables the personalized visualization recommendation model to learn from users with similar data preferences. Recall that each user $i\in[n]$ is associated with a set of datasets $\mathcal{X}_{i}=\{\mathbf{X}_{i1},\mathbf{X}_{i2},\ldots\}$.

Claim 3.1.

Let $\mathcal{X}=\bigcup_{i=1}^{n}\mathcal{X}_{i}$ denote the set of all datasets. Then

(17) $\sum_{i=1}^{n}|\mathcal{X}_{i}|\geq|\mathcal{X}|$

Hence, if $\sum_{i=1}^{n}|\mathcal{X}_{i}|=|\mathcal{X}|$, then all users have completely different datasets (there are no two users $i,j\in[n]$ that have a dataset in common). Otherwise, if there exist two users that have at least one dataset in common, then $\sum_{i=1}^{n}|\mathcal{X}_{i}|>|\mathcal{X}|$.

In our personalized visualization recommendation problem (Sec. 2), it is possible (and in many cases likely) that users are interested in completely different datasets. In the worst case, every user has a completely disjoint set of datasets, and thus the implicit and/or explicit user feedback regarding the attributes of interest to the users is also completely disjoint. In such a case, the question becomes: how can we leverage the feedback from such users to better recommend attributes from different datasets that may be of interest to a new and/or previous user? To do this, we need a general method that can derive a fixed-size embedding $\Psi(\mathbf{x})\in\mathbb{R}^{K}$ of an attribute $\mathbf{x}$ from any arbitrary dataset $\mathbf{X}$ such that the $K$-dimensional embedding $\Psi(\mathbf{x})$ captures the important data characteristics and statistical properties of $\mathbf{x}$, independent of the dataset and the size of $\mathbf{x}$. Afterwards, given two attributes $\mathbf{x}$ and $\mathbf{y}$ from different datasets (i.e., $\mathbf{X}$ and $\mathbf{Y}$) and users, we can derive the similarity between $\mathbf{x}$ and $\mathbf{y}$. Suppose there is implicit/explicit user feedback regarding an attribute $\mathbf{x}$, and another arbitrary user $i$ is interested in a new dataset $\mathbf{Y}$ (without any feedback on the attributes in $\mathbf{Y}$). We can then derive the similarity between $\mathbf{x}$ and each attribute $\mathbf{y}$ in $\mathbf{Y}$, and if $\mathbf{y}$ is similar to an attribute $\mathbf{x}$ that was preferred by some user(s), we can assign the attribute $\mathbf{y}$ a higher probability (weight, score), even though it does not yet have any user feedback. This idea of transferring user feedback about attributes across different datasets is therefore fundamentally important for personalized visualization recommendation, especially when only limited, sparse feedback is available. Moreover, it is also important when there is no feedback about an attribute in some dataset, or when a user has a completely new dataset of interest. This enables us to learn better personalized visualization recommendation models for individual users while requiring significantly less feedback.

Property 3.

Two attributes $\mathbf{x}$ and $\mathbf{y}$ are similar iff

(18) $s\langle\Psi(\mathbf{x}),\Psi(\mathbf{y})\rangle>1-\epsilon$

where $s\langle\cdot,\cdot\rangle$ is the similarity function. Notice that since almost surely $|\mathbf{x}|\not=|\mathbf{y}|$ (different sizes), the similarity of $\mathbf{x}$ and $\mathbf{y}$ cannot be computed directly. Therefore, we embed $\mathbf{x}$ and $\mathbf{y}$ into the same $K$-dimensional meta-feature space where their similarity can be computed directly as $s\langle\Psi(\mathbf{x}),\Psi(\mathbf{y})\rangle$.

Table 2. Meta-feature learning framework overview
Framework Components: Examples
1. Data representations $\mathcal{G}$: $\mathbf{x}$, $\mathbf{p}$, $g(\mathbf{x})$, $\ell_{b}(\mathbf{x})$ log-binning, ...
2. Partitioning functions $\Pi$: clustering, binning, quartiles, ...
3. Meta-feature functions $\psi$: statistical, information theoretic, ...
4. Meta-embedding of meta-features: $\operatorname*{arg\,min}_{\mathbf{H},\boldsymbol{\Sigma},\mathbf{Q}}\;\mathbb{D}_{\mathcal{L}}\big(\mathbf{M}\,\|\,\mathbf{H}\boldsymbol{\Sigma}\mathbf{Q}^{\top}\big)$, then $\widehat{\mathbf{q}}=\boldsymbol{\Sigma}^{-1}\mathbf{H}^{\top}\widehat{\mathbf{m}}$

Attributes from different datasets are naturally of different sizes, types, and even from different domains. Therefore, as shown above, there is no way to compute the similarity between them directly. Instead, we propose to map each attribute from any arbitrary dataset into a shared $K$-dimensional space using meta-feature functions. After every attribute is mapped into this $K$-dimensional meta-feature space, we can then compare their similarity directly. In this work, we propose a meta-feature learning framework with four main components, as shown in Table 2. Many of the framework components use the meta-feature functions denoted as $\psi$. A meta-feature function is a function that maps an arbitrary vector to a value that captures a specific characteristic of that vector of values. In this work, we leverage a large class of meta-feature functions formally defined in Table 3. However, the framework is flexible and can leverage any arbitrary collection of meta-feature functions. Notably, mapping every attribute from any dataset into a low-dimensional meta-feature space enables the model to capture and learn from the similarity between user-preferred attributes in completely different datasets.

Let $\mathbf{x}$ denote an attribute (column vector) from any arbitrary user dataset. We may apply the collection of meta-feature functions $\psi$ from Table 3 directly to $\mathbf{x}$ to obtain a low-dimensional representation of $\mathbf{x}$ as $\psi(\mathbf{x})$. In addition, we can also apply the meta-feature functions $\psi$ to various representations and transformations of $\mathbf{x}$. For instance, we can first derive the probability distribution $p(\mathbf{x})$ of $\mathbf{x}$ such that $p(\mathbf{x})^{\top}\mathbf{e}=1$, and then use the meta-feature functions $\psi$ over $p(\mathbf{x})$ to characterize this representation of $\mathbf{x}$. We can also use the meta-feature functions to characterize other important representations and transformations of the attribute vector $\mathbf{x}$, such as scale-invariant and dimensionless representations of the data obtained from different normalization functions $g_{h}(\cdot)$ applied to the attribute (column) vector $\mathbf{x}$; from each of these representations, we can again apply the meta-feature functions, e.g., $g_{h}(\mathbf{x})=\frac{\mathbf{x}-\min(\mathbf{x})}{\max(\mathbf{x})-\min(\mathbf{x})}$, then $\psi(g_{h}(\mathbf{x}))$. More generally, let $\mathcal{G}=\{g_{1},g_{2},\ldots,g_{\ell}\}$ denote a set of data representation and transformation functions that can be applied to an attribute vector $\mathbf{x}$ from any arbitrary user dataset. We first compute the meta-feature functions $\psi$ (e.g., from Table 3) over the $\ell$ different representations of the attribute vector $\mathbf{x}$ given by the functions in $\mathcal{G}=\{g_{1},g_{2},\ldots,g_{\ell}\}$ as follows:

(19) ψ(g1(𝐱)),ψ(g2(𝐱)),,ψ(g(𝐱))\psi(g_{1}(\boldsymbol{\mathrm{x}})),\psi(g_{2}(\boldsymbol{\mathrm{x}})),\ldots,\psi(g_{\ell}(\boldsymbol{\mathrm{x}}))

Note that if g𝒢g\in\mathcal{G} is the identity function, then ψ(g(𝐱))=ψ(𝐱)\psi(g(\boldsymbol{\mathrm{x}}))=\psi(\boldsymbol{\mathrm{x}}). In all cases, the meta-feature function ψ\psi maps a vector of arbitrary size to a fixed size lower-dimensional vector.

For each of the different representation/transformation functions 𝒢={g1,,g}\mathcal{G}=\{g_{1},\ldots,g_{\ell}\} of the attribute vector 𝐱\boldsymbol{\mathrm{x}}, we use a partitioning function Π\Pi to group the different values into kk different subsets (i.e., partitions, clusters, bins). Then we apply the meta-feature functions ψ\psi to each of the kk different groups as follows:

(20) ψ(Π1(g1(𝐱))),,ψ(Πk(g1(𝐱)))g1(𝐱),,ψ(Π1(g(𝐱))),,ψ(Πk(g(𝐱)))g(𝐱)\displaystyle\underbrace{\psi(\Pi_{1}(g_{1}(\boldsymbol{\mathrm{x}}))),\ldots,\psi(\Pi_{k}(g_{1}(\boldsymbol{\mathrm{x}})))}_{g_{1}(\boldsymbol{\mathrm{x}})},\ldots,\underbrace{\psi(\Pi_{1}(g_{\ell}(\boldsymbol{\mathrm{x}}))),\ldots,\psi(\Pi_{k}(g_{\ell}(\boldsymbol{\mathrm{x}})))}_{g_{\ell}(\boldsymbol{\mathrm{x}})}

where $\Pi_{k}$ denotes the $k$th partition of values from the partitioning function $\Pi$. Note that to ensure every attribute is mapped to the same $K$-dimensional meta-feature space, we only need to fix the number of partitions $k$. Eq. 20 shows a single partitioning function $\Pi$; however, multiple partitioning functions are used in this work, and each is applied in the same fashion as Eq. 20. All the meta-features derived from Eq. 19 and Eq. 20 are then concatenated into a single vector describing the characteristics of the attribute $\boldsymbol{\mathrm{x}}$. More formally, the meta-feature function $\Psi:\boldsymbol{\mathrm{x}}\to\mathbb{R}^{K}$ that combines the different components of the framework (Table 2) is defined as

(21) Ψ(𝐱)=[ψ(g1(𝐱))ψ(g(𝐱))\displaystyle\Psi(\boldsymbol{\mathrm{x}})=\big{[}\psi(g_{1}\!(\boldsymbol{\mathrm{x}}))\cdots\psi(g_{\ell}(\boldsymbol{\mathrm{x}})) ψ(Π1(g1(𝐱)))ψ(Πk(g1(𝐱)))\displaystyle\cdots\psi(\Pi_{1}(g_{1}\!(\boldsymbol{\mathrm{x}})))\cdots\psi(\Pi_{k}(g_{1}\!(\boldsymbol{\mathrm{x}})))
ψ(Π1(g(𝐱)))ψ(Πk(g(𝐱)))]\displaystyle\cdots\psi(\Pi_{1}(g_{\ell}(\boldsymbol{\mathrm{x}})))\cdots\psi(\Pi_{k}(g_{\ell}(\boldsymbol{\mathrm{x}})))\big{]}

The resulting Ψ(𝐱)\Psi(\boldsymbol{\mathrm{x}}) is a KK-dimensional meta-feature vector for attribute 𝐱\boldsymbol{\mathrm{x}}. Our approach is agnostic to the precise meta-feature functions used, and is flexible for use with any alternative set of meta-feature functions (Table 3).
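To make the construction of $\Psi$ concrete, the following Python sketch computes a small, illustrative subset of the meta-features in Table 3 over a raw attribute, a min-max normalized representation of it, and equal-width partitions of each representation, and concatenates the results as in Eq. 19-21. The specific functions in `psi`, the two transformations, and the choice of $k=3$ equal-width bins are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def psi(v):
    """A small illustrative set of meta-feature functions (subset of Table 3)."""
    v = np.asarray(v, dtype=float)
    if v.size == 0:                       # an empty partition contributes zero features
        return np.zeros(6)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return np.array([v.size, v.mean(), v.std(), med, q3 - q1, v.max() - v.min()])

def min_max(v):
    v = np.asarray(v, dtype=float)
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

# Representation/transformation functions G = {g_1, ..., g_l} (identity + min-max here).
transforms = [lambda v: np.asarray(v, dtype=float), min_max]

def equal_width_partitions(v, k=3):
    """Partitioning function Pi: split the values into k equal-width bins."""
    v = np.asarray(v, dtype=float)
    edges = np.linspace(v.min(), v.max(), k + 1)
    idx = np.clip(np.digitize(v, edges[1:-1]), 0, k - 1)
    return [v[idx == j] for j in range(k)]

def Psi(x, k=3):
    """Map an attribute vector of any length to a fixed K-dimensional meta-feature vector."""
    feats = []
    for g in transforms:
        gx = g(x)
        feats.append(psi(gx))                       # Eq. (19)
        for part in equal_width_partitions(gx, k):  # Eq. (20)
            feats.append(psi(part))
    return np.concatenate(feats)                    # Eq. (21)

x = np.random.rand(500)          # an attribute (column) from some dataset
print(Psi(x).shape)              # fixed K regardless of len(x); here K = 2*(1+3)*6 = 48
```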

Let $\mathbcal{X}=\cup_{i=1}^{n}\mathcal{X}_{i}$ denote the set of dataset matrices across all $n$ users. Given an arbitrary dataset matrix $\boldsymbol{\mathrm{X}}\in\mathbcal{X}$ (which can be shared among multiple users), let $\Psi(\boldsymbol{\mathrm{X}})\in\mathbb{R}^{K\times|\boldsymbol{\mathrm{X}}|}$ be the resulting meta-feature matrix obtained by applying $\Psi$ independently to each of the $|\boldsymbol{\mathrm{X}}|$ attributes (columns) of $\boldsymbol{\mathrm{X}}$. Then, we can derive the overall meta-feature matrix $\boldsymbol{\mathrm{M}}$ as

(22) 𝐌=𝐗𝒳Ψ(𝐗)\displaystyle\boldsymbol{\mathrm{M}}=\bigoplus_{\boldsymbol{\mathrm{X}}\in\mathbcal{X}}\Psi(\boldsymbol{\mathrm{X}})

where $\bigoplus$ is the concatenation operator, i.e., $\Psi(\boldsymbol{\mathrm{x}})\oplus\Psi(\boldsymbol{\mathrm{y}})=[\Psi(\boldsymbol{\mathrm{x}})\,\Psi(\boldsymbol{\mathrm{y}})]\in\mathbb{R}^{K\times 2}$. Note that Eq. 22 is not equivalent to $\bigoplus_{i=1}^{n}\bigoplus_{\boldsymbol{\mathrm{X}}\in\mathcal{X}_{i}}\Psi(\boldsymbol{\mathrm{X}})$ since any two users $i,j\in[n]$ can share one or more datasets. With slight abuse of notation, let $d=|\mathbcal{X}|$ and $\mathbcal{X}=\{\boldsymbol{\mathrm{X}}_{1},\ldots,\boldsymbol{\mathrm{X}}_{d}\}$; then $\boldsymbol{\mathrm{M}}=\Psi(\{\boldsymbol{\mathrm{X}}_{1},\ldots,\boldsymbol{\mathrm{X}}_{d}\})$ where $\boldsymbol{\mathrm{M}}$ is a $K\times(|\boldsymbol{\mathrm{X}}_{1}|+\cdots+|\boldsymbol{\mathrm{X}}_{d}|)$ matrix.

In Figure 1, we investigate the similarity of attributes across different datasets in the personalized visualization corpus (Section 5), and observe two important findings. First, Figure 1 indicates that attributes across different datasets may be similar to one another and the latent relationships between the attributes can benefit learning personalized visualization recommendation models, especially for users with very few or even no visualization feedback. Second, the meta-features used to characterize attributes from any arbitrary dataset are diverse and fundamentally different from one another as shown in Figure 1. This finding is important and validates the proposed meta-feature learning framework since the meta-features must be able to capture the fundamental patterns and characteristics for a dataset from any arbitrary domain.

Figure 1. Similarity of attributes across different datasets using the attribute embeddings in the universal meta-feature space. (a) uses the first 1000 attributes across different datasets and takes the cosine similarity between each pair of attributes with respect to their fixed $K$-dimensional meta-feature vectors. (b) shows the cosine similarity between the attribute meta-features used to characterize the different attributes. See text for discussion.
Table 3. Summary of attribute meta-feature functions. Let 𝐱\boldsymbol{\mathrm{x}} denote an arbitrary attribute (variable, column) vector and π(𝐱)\pi(\boldsymbol{\mathrm{x}}) is the sorted vector.
Function Name Equation Rationale
Num. instances |𝐱|\left|\boldsymbol{\mathrm{x}}\right| Speed, Scalability
Num. missing values ss Imputation effects
Frac. of missing values |𝐱|s/|𝐱|\nicefrac{{\left|\boldsymbol{\mathrm{x}}\right|-s}}{{\left|\boldsymbol{\mathrm{x}}\right|}} Imputation effects
Num. nonzeros 𝚗𝚗𝚣(𝐱)\mathtt{nnz}(\boldsymbol{\mathrm{x}}) Imputation effects
Num. unique values 𝖼𝖺𝗋𝖽(𝐱)\mathsf{card}(\boldsymbol{\mathrm{x}}) Imputation effects
Density 𝗇𝗇𝗓(𝐱)/|𝐱|\nicefrac{{\mathsf{nnz}(\boldsymbol{\mathrm{x}})}}{{\left|\boldsymbol{\mathrm{x}}\right|}} Imputation effects
Q1Q_{1}, Q3Q_{3} median of the |𝐱|/2\left|\boldsymbol{\mathrm{x}}\right|/2 smallest (largest) values -
IQR Q3Q1Q_{3}-Q_{1} -
Outlier LB α{1.5,3}\alpha\in\{1.5,3\} i𝕀(xi<Q1αIQR)\sum_{i}\mathbb{I}(x_{i}<Q_{1}-\alpha IQR) Data noisiness
Outlier UB α{1.5,3}\alpha\in\{1.5,3\} i𝕀(xi>Q3+αIQR)\sum_{i}\mathbb{I}(x_{i}>Q_{3}+\alpha IQR) Data noisiness
Total outliers α{1.5,3}\alpha\in\{1.5,3\} i𝕀(xi<Q1αIQR)+i𝕀(xi>Q3+αIQR)\sum_{i}\mathbb{I}(x_{i}<Q_{1}-\alpha IQR)+\sum_{i}\mathbb{I}(x_{i}>Q_{3}+\alpha IQR) Data noisiness
(α\alphastd) outliers α{2,3}\alpha\in\{2,3\} μ𝐱±ασ𝐱\mu_{\boldsymbol{\mathrm{x}}}\pm\alpha\sigma_{\boldsymbol{\mathrm{x}}} Data noisiness
Spearman (ρ\rho, p-val) 𝗌𝗉𝖾𝖺𝗋𝗆𝖺𝗇(𝐱,π(𝐱))\mathsf{spearman}(\boldsymbol{\mathrm{x}},\pi(\boldsymbol{\mathrm{x}})) Sequential
Kendall (τ\tau, p-val) 𝗄𝖾𝗇𝖽𝖺𝗅𝗅(𝐱,π(𝐱))\mathsf{kendall}(\boldsymbol{\mathrm{x}},\pi(\boldsymbol{\mathrm{x}})) Sequential
Pearson (rr, p-val) 𝗉𝖾𝖺𝗋𝗌𝗈𝗇(𝐱,π(𝐱))\mathsf{pearson}(\boldsymbol{\mathrm{x}},\pi(\boldsymbol{\mathrm{x}})) Sequential
Min, max min(𝐱)\min(\boldsymbol{\mathrm{x}}), max(𝐱)\max(\boldsymbol{\mathrm{x}}) -
Range max(𝐱)min(𝐱)\max(\boldsymbol{\mathrm{x}})-\min(\boldsymbol{\mathrm{x}}) Attribute normality
Median 𝗆𝖾𝖽(𝐱)\mathsf{med}(\boldsymbol{\mathrm{x}}) Attribute normality
Geometric Mean $\left(\prod_{i}x_{i}\right)^{1/\left|\boldsymbol{\mathrm{x}}\right|}$ Attribute normality
Harmonic Mean |𝐱|/i1xi\left|\boldsymbol{\mathrm{x}}\right|/\sum_{i}\frac{1}{x_{i}} Attribute normality
Mean, Stdev, Variance μ𝐱\mu_{\boldsymbol{\mathrm{x}}}, σ𝐱\sigma_{\boldsymbol{\mathrm{x}}}, σ𝐱2\sigma^{2}_{\boldsymbol{\mathrm{x}}} Attribute normality
Skewness 𝔼(𝐱μ𝐱)3/σ𝐱3\nicefrac{{\mathbb{E}(\boldsymbol{\mathrm{x}}-\mu_{\boldsymbol{\mathrm{x}}})^{3}}}{{\sigma^{3}_{\boldsymbol{\mathrm{x}}}}} Attribute normality
Kurtosis 𝔼(𝐱μ𝐱)4/σ𝐱4\nicefrac{{\mathbb{E}(\boldsymbol{\mathrm{x}}-\mu_{\boldsymbol{\mathrm{x}}})^{4}}}{{\sigma^{4}_{\boldsymbol{\mathrm{x}}}}} Attribute normality
HyperSkewness 𝔼(𝐱μ𝐱)5/σ𝐱5\nicefrac{{\mathbb{E}(\boldsymbol{\mathrm{x}}-\mu_{\boldsymbol{\mathrm{x}}})^{5}}}{{\sigma^{5}_{\boldsymbol{\mathrm{x}}}}} Attribute normality
Moments [6-10] - Attribute normality
k-statistic [3-4] - Attribute normality
Quartile Dispersion Coeff. Q3Q1Q3+Q1\frac{Q_{3}-Q_{1}}{Q_{3}+Q_{1}} Dispersion
Median Absolute Deviation 𝗆𝖾𝖽(|𝐱𝗆𝖾𝖽(𝐱)|)\mathsf{med}(\left|\boldsymbol{\mathrm{x}}-\mathsf{med}(\boldsymbol{\mathrm{x}})\right|) Dispersion
Avg. Absolute Deviation 1|𝐱|𝐞T|𝐱μ𝐱|\frac{1}{\left|\boldsymbol{\mathrm{x}}\right|}\boldsymbol{\mathrm{e}}^{T}\!\left|\boldsymbol{\mathrm{x}}-\mu_{\boldsymbol{\mathrm{x}}}\right| Dispersion
Coeff. of Variation σ𝐱/μ𝐱\nicefrac{{\sigma_{\boldsymbol{\mathrm{x}}}}}{{\mu_{\boldsymbol{\mathrm{x}}}}} Dispersion
Efficiency ratio σ𝐱2/μ𝐱2\nicefrac{{\sigma^{2}_{\boldsymbol{\mathrm{x}}}}}{{\mu^{2}_{\boldsymbol{\mathrm{x}}}}} Dispersion
Variance-to-mean ratio σ𝐱2/μ𝐱\nicefrac{{\sigma^{2}_{\boldsymbol{\mathrm{x}}}}}{{\mu_{\boldsymbol{\mathrm{x}}}}} Dispersion
Signal-to-noise ratio (SNR) μ𝐱2/σ𝐱2\nicefrac{{\mu^{2}_{\boldsymbol{\mathrm{x}}}}}{{\sigma^{2}_{\boldsymbol{\mathrm{x}}}}} Noisiness of data
Entropy H(𝐱)=ixilogxiH(\boldsymbol{\mathrm{x}})=-\sum_{i}\;x_{i}\log x_{i} Attribute Informativeness
Norm. entropy H(𝐱)/log2|𝐱|\nicefrac{{H(\boldsymbol{\mathrm{x}})}}{{\log_{2}\left|\boldsymbol{\mathrm{x}}\right|}} Attribute Informativeness
Gini coefficient - Attribute Informativeness
Quartile max gap max(Qi+1Qi)\max(Q_{i+1}-Q_{i}) Dispersion
Centroid max gap maxij|cicj|\max_{ij}|c_{i}-c_{j}| Dispersion
Histogram prob. dist. 𝐩h=𝐡𝐡T𝐞\boldsymbol{\mathrm{p}}_{h}=\frac{\boldsymbol{\mathrm{h}}}{\boldsymbol{\mathrm{h}}^{T}\boldsymbol{\mathrm{e}}} (with fixed # of bins) -
Landmarker(4-Means) (i) sum of squared dist., (ii) mean silhouette coeff., (iii) num. of iterations

3.1.1. Meta-Embedding of Meta-Features

We can derive an embedding using the current meta-feature matrix $\boldsymbol{\mathrm{M}}$. Note that this meta-feature matrix may contain all meta-features across all previous datasets or simply the meta-features of a single dataset; however, the more datasets it covers, the better the meta-embedding will reveal the important latent structure among the meta-features. We learn the latent structure in the meta-feature matrix $\boldsymbol{\mathrm{M}}$ by solving (assuming w.l.o.g. that the columns of $\boldsymbol{\mathrm{M}}$ and the meta-features of a new attribute $\widehat{\boldsymbol{\mathrm{m}}}$ are normalized to unit length)

(23) argmin𝐇,𝚺,𝐐𝔻(𝐌𝐇𝚺𝐐)\operatornamewithlimits{\arg\min}\limits_{\boldsymbol{\mathrm{H}},\boldsymbol{\Sigma},\boldsymbol{\mathrm{Q}}}\;\mathbb{D}_{\!\!\mathcal{L}}\big{(}\boldsymbol{\mathrm{M}}\|\boldsymbol{\mathrm{H}}\boldsymbol{\Sigma}\boldsymbol{\mathrm{Q}}^{\top}\big{)}

Given meta-features for a new attribute $\widehat{\boldsymbol{\mathrm{m}}}$ in another arbitrary unseen dataset, we use the latent low-rank meta-embedding matrices to map the meta-feature vector $\widehat{\boldsymbol{\mathrm{m}}}\in\mathbb{R}^{K}$ into the low-rank meta-embedding space as

(24) 𝐪^=𝚺1𝐇𝐦^\widehat{\boldsymbol{\mathrm{q}}}=\boldsymbol{\Sigma}^{-1}\boldsymbol{\mathrm{H}}^{\top}\widehat{\boldsymbol{\mathrm{m}}}

Hence, the meta-feature vector 𝐦^\widehat{\boldsymbol{\mathrm{m}}} of a new previously unseen attribute is mapped into the same meta-embedding space 𝐪^\widehat{\boldsymbol{\mathrm{q}}}. The resulting meta-embedding of the meta-features of the new attribute can then be concatenated onto the meta-feature vector. This has several important advantages. First, using the proposed meta-feature learning framework shown in Table 2 results in hundreds or thousands of meta-features for a single attribute. Many of these meta-features may not be important for a specific attribute, while a few meta-features may be crucial in describing the attribute and its data characteristics. Therefore, the meta-embedding of the meta-features can be viewed as a noise reduction step that essentially removes redundant or noisy signals from the data while preserving the most important signals that describe the fundamental direction and characteristics of the data. Second, the meta-embedding of the meta-features reveals the latent structure and relationships in the meta-features. This step can also be viewed as a type of landmark feature since we solve a learning problem to find a low-rank approximation of 𝐌\boldsymbol{\mathrm{M}} such that 𝐌𝐇𝚺𝐐\boldsymbol{\mathrm{M}}\approx\boldsymbol{\mathrm{H}}\boldsymbol{\Sigma}\boldsymbol{\mathrm{Q}}^{\top}.

However, we include it as a separate component of the meta-feature learning framework in Table 2 since, instead of concatenating the meta-embedding of the meta-features onto the meta-feature vector of an attribute, we can also use it directly in place of the meta-feature vector. This is especially important when there is a large number of datasets (e.g., more than 100K datasets with millions of attributes in total) for learning. For instance, if there are 2.3M attributes (see the corpus of roughly 100K datasets in Table 4), and each attribute is encoded with a dense $K=1006$ dimensional meta-feature vector, then $\boldsymbol{\mathrm{M}}$ has $1006\times 2{,}300{,}000$ values that need to be stored, which requires about 18.5GB of space (assuming 8 bytes per value). However, if we use the meta-embedding of the meta-features with $K=10$, then $\boldsymbol{\mathrm{M}}$ takes only about 200MB (0.18GB) of space.
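The divergence $\mathbb{D}_{\mathcal{L}}$ and the solver for Eq. 23 are not fixed here; as a minimal sketch under squared error, a truncated SVD yields a low-rank factorization $\boldsymbol{\mathrm{M}}\approx\boldsymbol{\mathrm{H}}\boldsymbol{\Sigma}\boldsymbol{\mathrm{Q}}^{\top}$, and Eq. 24 then projects the meta-features of a previously unseen attribute into the meta-embedding space. All sizes below are illustrative.

```python
import numpy as np

def fit_meta_embedding(M, rank=10):
    """Minimal sketch: low-rank factorization M ~= H @ diag(sigma) @ Q.T via truncated SVD
    (squared-error case; the paper's divergence D_L may differ)."""
    H, sigma, Qt = np.linalg.svd(M, full_matrices=False)
    return H[:, :rank], sigma[:rank], Qt[:rank].T   # H (K x r), sigma (r,), Q (#attrs x r)

def embed_new_attribute(m_hat, H, sigma):
    """Eq. 24: q_hat = Sigma^{-1} H^T m_hat for a previously unseen attribute."""
    return (H.T @ m_hat) / sigma

K, num_attrs = 200, 1000                   # illustrative sizes
M = np.random.rand(K, num_attrs)           # meta-feature matrix (columns = attributes)
M /= np.linalg.norm(M, axis=0)             # normalize columns to unit length (footnote)
H, sigma, Q = fit_meta_embedding(M, rank=10)

m_hat = np.random.rand(K)                  # meta-features of a new, unseen attribute
m_hat /= np.linalg.norm(m_hat)
q_hat = embed_new_attribute(m_hat, H, sigma)   # 10-dim meta-embedding of the new attribute
print(q_hat.shape)
```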

3.2. Learning from User-level Data Preferences Across Different Datasets

Given users ii and jj that provide feedback on the attributes of interest from two completely different datasets, how can we leverage the user feedback (data preferences) despite it being across different datasets without any shared attributes? To address this important problem, we propose a novel representation and model that naturally enables the transfer of user-level data preferences across different datasets to improve predictive performance, recommendations, and reduce data sparsity. The across dataset transfer learning of user-level data preferences becomes possible due to the proposed representation and model for personalized visualization recommendation.

We now introduce the novel user-level data preference graph model for personalized visualization recommendation that naturally enables across-dataset and across-user transfer learning of preferences. This model encodes users, their interactions with attributes (columns/variables from any arbitrary dataset), and the meta-features of those attributes. This representation enables us to learn from user-level data preferences across different datasets and users, and is therefore very important for personalized visualization recommendation systems. In particular, we first derive the user-by-attribute preference matrix $\boldsymbol{\mathrm{A}}$ as follows:

(25) 𝐀=[𝐀]ij=# of times user i clicked (a visualization with) attribute j\displaystyle\boldsymbol{\mathrm{A}}=\big{[}\boldsymbol{\mathrm{A}}\big{]}_{ij}=\!\text{\# of times user $i$ clicked (a visualization with) attribute $j$ }

The implicit or explicit user “action” encoded by $A_{ij}$ can be an implicit action, such as a user clicking or hovering over a specific attribute $j$, or clicking or hovering over a visualization that uses attribute $j$. Similarly, $A_{ij}$ can encode an explicit user action/feedback, such as a user explicitly liking attribute $j$ (independent of a visualization) or, more generally, liking or adding to their dashboard a visualization that uses attribute $j$. In other words, there are two types of explicit and implicit user feedback about attributes: feedback on a visualization that uses attribute $j$ (whether the action is a click, hover, add-to-dashboard, etc.), or, more directly, a user liking or clicking on attribute $j$ itself via some UI.

Given $\boldsymbol{\mathrm{A}}$ defined in Eq. 25, we are able to learn from two or more users that have at least one attribute preference in common. More precisely, $\boldsymbol{\mathrm{A}}_{i,:}^{\top}\boldsymbol{\mathrm{A}}_{j,:}>0$ for two arbitrary users $i$ and $j$ implies that the two users share a dataset of interest and have preferred at least one of the same attributes in that dataset. Unfortunately, finding two users that satisfy this constraint is often unlikely. Therefore, we need another representation alongside $\boldsymbol{\mathrm{A}}$ that creates meaningful connections between attributes in different datasets based on their similarity. In particular, we do this by leveraging the meta-feature matrix $\boldsymbol{\mathrm{M}}$ from Section 3.1, derived by mapping every attribute in a user-specific dataset to a $K$-dimensional meta-feature vector. This defines a universal meta-feature space shared among the attributes of any arbitrary dataset, thereby allowing the model to learn connections between users and their preferred attributes in completely different datasets. This component is very important: without it, we have no way to learn from other users (and across different datasets), since each user has their own datasets, and thus their own set of visualizations (where each visualization consists of a set of design choices and data choices) that are not shared by any other user.
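As an illustration of Eq. 25, the following sketch assembles a sparse user-by-attribute matrix $\boldsymbol{\mathrm{A}}$ from a hypothetical interaction log; the log format and the global attribute indexing are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical interaction log: (user_id, global_attribute_id) pairs, one entry per
# click/hover on a visualization that used that attribute.
interactions = [(0, 5), (0, 5), (0, 12), (1, 12), (2, 40), (2, 41)]

n_users, m_attrs = 3, 50
rows = [u for u, _ in interactions]
cols = [a for _, a in interactions]
vals = np.ones(len(interactions))

# A[i, j] = number of times user i interacted with attribute j (Eq. 25);
# duplicate (i, j) entries are summed during the COO -> CSR conversion.
A = coo_matrix((vals, (rows, cols)), shape=(n_users, m_attrs)).tocsr()
print(A[0, 5])   # 2.0
```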

Figure 2. Overview of the proposed graph model for personalized visualization recommendation. Links between meta-features and data attributes represent the meta-feature matrix $\boldsymbol{\mathrm{M}}$ from Section 3.1, whereas links between users and their preferred data attributes represent $\boldsymbol{\mathrm{A}}$ (Section 3.2). Both of these capture the user-level data preferences across different datasets. Links between users and their visualization configurations represent $\boldsymbol{\mathrm{C}}$ and capture the user-level visual preferences. Finally, links between visualization configurations and data attributes represent $\boldsymbol{\mathrm{D}}$. Note that all links are weighted, and the data attribute/column nodes of a specific dataset are grouped together.

3.3. Learning from User-level Visual Preferences Across Different Datasets

Visualizations naturally consist of data and visual design choices. Because a visualization depends on its data, the visualizations generated from one dataset are completely different from those generated from any other dataset. This is problematic if we want to develop a personalized visualization recommendation system that can learn from the visual preferences of users even when those users have not preferred any visualizations from the same datasets. This matters because visualizations are fundamentally tied to the dataset, and each user may have their own set of datasets that are not shared by any other user. Moreover, even if two users had a dataset in common, the probability that they prefer the same visualization is very small (almost surely zero) due to the exponential space of visualizations that may arise from a single dataset.

To overcome these issues, we introduce a dataset-independent notion of a visualization called a visualization configuration, along with a graph representation built on it. This abstraction removes the dataset dependencies of visualizations, enabling us to capture the general visual preferences of users independent of the datasets of interest, and thus to learn from visualizations across different datasets and users, even when those users have not preferred any visualizations from the same datasets. A visualization configuration is an abstraction of a visualization where, instead of mapping specific data attributes to specific design choices (e.g., x, y, color), we replace each attribute with its general type (e.g., numerical, categorical, temporal, …) or another general property (or set of properties) that generalizes across datasets. It is precisely this replacement of data-specific design choices with general types or properties that enables us to capture and learn from visualization configurations. More formally,

Definition 10 (Visual Configuration).

A visualization $\mathcal{V}$ consists of a set of design choices and data (attributes) associated with a subset of those design choices. For every design choice, such as chart-type, there is a set of possible options; other design choices such as color can also be associated with a set of options, e.g., static color definitions, or a color map for specific data attributes. Let $\mathcal{T}:\boldsymbol{\mathrm{x}}\to\mathcal{P}$ be a function that maps an attribute $\boldsymbol{\mathrm{x}}$ of some arbitrary dataset $\boldsymbol{\mathrm{X}}\in\mathbcal{X}$ to a property $P$ that generalizes across any arbitrary dataset and is therefore independent of the specific dataset. Hence, given attributes $\boldsymbol{\mathrm{x}}$ and $\boldsymbol{\mathrm{y}}$ from two different datasets, it is possible that $\mathcal{T}(\boldsymbol{\mathrm{x}})=P$ and $\mathcal{T}(\boldsymbol{\mathrm{y}})=P$. A visualization configuration is an abstraction of a visualization where every design choice that is bound to a data attribute is replaced with a general property $\mathcal{T}(\boldsymbol{\mathrm{x}})=P$ of that attribute.

Claim 3.2.

There exists 𝐱\boldsymbol{\mathrm{x}} and 𝐲\boldsymbol{\mathrm{y}} from different datasets such that 𝒯(𝐱)=P\mathcal{T}(\boldsymbol{\mathrm{x}})=P and 𝒯(𝐲)=P\mathcal{T}(\boldsymbol{\mathrm{y}})=P hold.

The space of visualization configurations is large, since configurations arise from all possible combinations of design choices and their values, such as:

  • chart-type: bar, scatter, …

  • x-type: quantitative, nominal, ordinal, temporal, …, none

  • y-type: quantitative, nominal, ordinal, temporal, …, none

  • color: red, green, blue, …

  • size: 1pt, 2pt, …

  • x-aggregate: sum, mean, bin, …, none

  • y-aggregate: sum, mean, bin, …, none

A visualization configuration together with the selected attributes is everything necessary to generate a visualization. In Figure 3, we provide a toy example showing the process of extracting a data-independent visual configuration from a visualization. Using the notion of a visualization configuration (Definition 10), we can now introduce a model that captures the visualization preferences of users while ensuring that these visual preferences are not tied to specific datasets. In particular, we define the visual preference matrix $\boldsymbol{\mathrm{C}}$ as follows:

(26) $\boldsymbol{\mathrm{C}}=\big[\boldsymbol{\mathrm{C}}\big]_{ij}=\text{\# of times user $i$ clicked visualization configuration $j$}$

Note that clicked is simply one example. Other possibilities for defining $\boldsymbol{\mathrm{C}}$ (or other similar visual preference matrices) include $C_{ij}=\#$ of times user $i$ performed $\text{action}\in\{\text{clicked},\text{hovered},\text{liked},\text{added-to-dashboard}\}$ on visualization configuration $j$. From the graph model in Eq. 26, we can directly learn user-level visual preferences across different datasets. This user-level visual preference graph encodes users and their visual-configurations. Since each visual-configuration node represents a set of design choices that are, by definition, not tied to a user-specific dataset, the model can use this graph to infer connections to other similar visual-configurations likely to be of interest to a user. This component is critical since it allows the learning component to learn from user-level visual preferences (i.e., visual-configurations) across different datasets and users. Without it, there would be no way to learn from other users' visual preferences (sets of design choices).

Figure 3. Visualization to Data-Independent Visual Configuration. A visualization consists of data along with a set of design choices, and is thus dataset dependent. In this work, we propose the notion of a visual configuration that removes the data dependency of a visualization while capturing its visual design choices. Notably, visual configurations naturally generalize across datasets, since they are independent of the dataset; thus, unlike visualizations, a visual configuration can be shared among users that use entirely different datasets. The visualization shown in the toy example above uses only two data attributes from the dataset. Note that actual visual configurations have many other design choices, which are omitted from this toy example for simplicity.

The notion of a visualization configuration removes the dataset dependencies of a visualization, enabling us to model and learn from the visual preferences of users across different datasets. For instance, suppose we have a JSON encoding of an actual visualization, i.e., the design choices plus the actual attributes and their data used in the visualization (from this JSON, we could recreate the visualization exactly). Such an encoding is not very useful for personalized visualization recommendation, since the visualization is clearly tied to the specific dataset used by a single user. Hence, if we used visualizations directly, the optimization method used to obtain the embeddings for inference could not exploit other users' preferences, since those preferences would also be for visualizations tied to other datasets. To overcome this issue, given a visualization that includes the design choices and data choices (e.g., the data bound to the x, y, and color channels), we derive a visualization configuration by replacing the data (attributes) and attribute names with general, dataset-independent properties. In this work, we use the type of the attribute (e.g., categorical, real-valued, etc.), but any other general property of the data could be used as well. Most importantly, this abstraction enables us to learn from users and their visual preferences (design choices), even when those preferences are for visualizations generated from completely different datasets, because visualization configurations are designed to generalize across datasets and can therefore be shared among users. Notice that traditional recommender systems for movies or products are comparatively simple, since they assume a single universal set of items (movies, products) that all users share and provide feedback about. None of these assumptions hold for visualization recommendation, which is why we developed these new notions and models for learning.
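The following sketch illustrates this abstraction step under an assumed (non-Plot.ly) visualization spec schema: each design choice bound to a concrete attribute is replaced by the attribute's general type, yielding a configuration key that can be shared across datasets and users. The field names and the `attribute_types` mapping are illustrative.

```python
# Hypothetical visualization spec: design choices, some bound to concrete data attributes.
vis_spec = {
    "chart_type": "scatter",
    "x": {"attribute": "horsepower", "aggregate": "none"},
    "y": {"attribute": "mpg", "aggregate": "mean"},
    "color": "blue",
}

# T(x): map a concrete attribute to a general, dataset-independent property (its type).
attribute_types = {"horsepower": "quantitative", "mpg": "quantitative"}

def to_visual_configuration(spec, attr_types):
    """Replace every attribute-bound design choice with the attribute's general type."""
    config = {}
    for channel, value in spec.items():
        if isinstance(value, dict) and "attribute" in value:
            config[f"{channel}-type"] = attr_types[value["attribute"]]
            config[f"{channel}-aggregate"] = value.get("aggregate", "none")
        else:
            config[channel] = value       # data-independent choices are kept as-is
    return tuple(sorted(config.items()))  # hashable key shared across datasets

print(to_visual_configuration(vis_spec, attribute_types))
# Two users plotting completely different datasets can map to the *same* configuration key.
```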

Notice that the matrix $\boldsymbol{\mathrm{C}}$ encodes the data-independent visual preferences of each user, which is very important. However, this representation does not capture how the visual configurations map to the actual data preferences of the users (the attributes and the general meta-features that characterize them). Therefore, we also introduce another representation to encode these important associations. In particular, we encode the attributes associated with each visual-configuration as

(27) 𝐃=[𝐃]kt=# of times attribute k was used in visual-configuration t clicked by some user\displaystyle\boldsymbol{\mathrm{D}}=\big{[}\boldsymbol{\mathrm{D}}\big{]}_{kt}=\text{\# of times attribute $k$ was used in visual-configuration $t$ clicked by some user}

As an example, given a relevant visualization 𝒱=(𝐗ij(k),Ct)𝒱|\mathcal{V}=(\boldsymbol{\mathrm{X}}_{ij}^{(k)},C_{t})\in\mathbcal{V}_{ij} of user i[n]i\in[n] for dataset 𝐗ij\boldsymbol{\mathrm{X}}_{ij} with attributes 𝐗ij(k)=[𝐱p𝐱q]\boldsymbol{\mathrm{X}}_{ij}^{(k)}=[\boldsymbol{\mathrm{x}}_{p}\,\boldsymbol{\mathrm{x}}_{q}] and visual-configuration Ct𝒞C_{t}\in\mathbcal{C}, we set Dpt=Dpt+1D_{pt}=D_{pt}+1 and Dqt=Dqt+1D_{qt}=D_{qt}+1. We repeat this for all relevant visualizations of each user. In Figure 2, we provide an overview of the proposed personalized visualization recommendation graph model.
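A small sketch of this counting procedure for Eq. 27, with hypothetical attribute and configuration indices:

```python
import numpy as np
from scipy.sparse import lil_matrix

m_attrs, h_configs = 50, 10
D = lil_matrix((m_attrs, h_configs))

# Hypothetical relevant visualizations: (list of global attribute ids, config id).
relevant_visualizations = [([5, 12], 3), ([12, 40], 3), ([41], 7)]

# D[k, t] += 1 for every attribute k used in a relevant visualization with config t (Eq. 27).
for attr_ids, t in relevant_visualizations:
    for k in attr_ids:
        D[k, t] += 1

D = D.tocsr()
print(D[12, 3])  # 2.0
```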

3.4. Models for Personalized Visualization Recommendation

We first introduce the PVisRec model that uses the learned meta-feature matrix 𝐌\boldsymbol{\mathrm{M}} from Section 3.1 and all the graph representations proposed in Section 3.2 for capturing the shared data preferences between users despite using completely different datasets along with the graph representations from Section 3.3 that capture the visual preferences of users across all datasets in the corpus. Then we discuss two variants of PVisRec that are investigated later in Section 6.

3.4.1. PVisRec:

Given the sparse user by attribute adjacency matrix 𝐀n×m\boldsymbol{\mathrm{A}}\in\mathbb{R}^{n\times m}, dense meta-feature by attribute matrix 𝐌k×m\boldsymbol{\mathrm{M}}\in\mathbb{R}^{k\times m}, sparse user by visual-configuration adjacency matrix 𝐂n×h\boldsymbol{\mathrm{C}}\in\mathbb{R}^{n\times h}, and sparse attribute by visual-configuration adjacency matrix 𝐃m×h\boldsymbol{\mathrm{D}}\in\mathbb{R}^{m\times h}, the goal is to find the rank-dd embedding matrices 𝐔\boldsymbol{\mathrm{U}}, 𝐕\boldsymbol{\mathrm{V}}, 𝐙\boldsymbol{\mathrm{Z}}, and 𝐘\boldsymbol{\mathrm{Y}} that minimize the following objective function:

(28) f(𝐔,𝐕,𝐙,𝐘)=𝐀𝐔𝐕2+𝐌𝐘𝐕2+𝐂𝐔𝐙2+𝐃𝐕𝐙2\displaystyle f(\boldsymbol{\mathrm{U}},\boldsymbol{\mathrm{V}},\boldsymbol{\mathrm{Z}},\boldsymbol{\mathrm{Y}})=\|\boldsymbol{\mathrm{A}}-\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{V}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{M}}-\boldsymbol{\mathrm{Y}}\boldsymbol{\mathrm{V}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{C}}-\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{Z}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{D}}-\boldsymbol{\mathrm{V}}\boldsymbol{\mathrm{Z}}^{\top}\|^{2}

where 𝐔n×d\boldsymbol{\mathrm{U}}\in\mathbb{R}^{n\times d}, 𝐕m×d\boldsymbol{\mathrm{V}}\in\mathbb{R}^{m\times d}, 𝐙h×d\boldsymbol{\mathrm{Z}}\in\mathbb{R}^{h\times d}, 𝐘k×d\boldsymbol{\mathrm{Y}}\in\mathbb{R}^{k\times d} are low-rank dd-dimensional embeddings of the users, attributes (across all datasets), visual-configurations, and meta-features. Further, the formulation above uses squared error, though other loss functions can also be used (e.g., Bregman divergences) (Singh and Gordon, 2008). We can solve Eq. 28 by computing the gradient and then using a first-order optimization method (Schenker et al., 2021). Afterwards, we have

(29) 𝐀\displaystyle\boldsymbol{\mathrm{A}} 𝐀=𝐔𝐕=r=1d𝐮r𝐯r\displaystyle\approx\boldsymbol{\mathrm{A}}^{\prime}=\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{V}}^{\top}=\sum_{r=1}^{d}\boldsymbol{\mathrm{u}}_{r}\boldsymbol{\mathrm{v}}_{r}^{\top}
(30) 𝐌\displaystyle\boldsymbol{\mathrm{M}} 𝐌=𝐘𝐕=r=1d𝐲r𝐯r\displaystyle\approx\boldsymbol{\mathrm{M}}^{\prime}=\boldsymbol{\mathrm{Y}}\boldsymbol{\mathrm{V}}^{\top}=\sum_{r=1}^{d}\boldsymbol{\mathrm{y}}_{r}\boldsymbol{\mathrm{v}}_{r}^{\top}
(31) 𝐂\displaystyle\boldsymbol{\mathrm{C}} 𝐂=𝐔𝐙=r=1d𝐮r𝐳r\displaystyle\approx\boldsymbol{\mathrm{C}}^{\prime}=\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{Z}}^{\top}=\sum_{r=1}^{d}\boldsymbol{\mathrm{u}}_{r}\boldsymbol{\mathrm{z}}_{r}^{\top}
(32) 𝐃\displaystyle\boldsymbol{\mathrm{D}} 𝐃=𝐕𝐙=r=1d𝐯r𝐳r\displaystyle\approx\boldsymbol{\mathrm{D}}^{\prime}=\boldsymbol{\mathrm{V}}\boldsymbol{\mathrm{Z}}^{\top}=\sum_{r=1}^{d}\boldsymbol{\mathrm{v}}_{r}\boldsymbol{\mathrm{z}}_{r}^{\top}

Solving Eq. 28 corresponds to the PVisRec model investigated later in Section 6. We also investigate a few different variants of the PVisRec model from Eq. 28 later in Section 6. In particular, the model variants of PVisRec use only a subset of the graph representations {𝐀,𝐂,𝐃}\{\boldsymbol{\mathrm{A}},\boldsymbol{\mathrm{C}},\boldsymbol{\mathrm{D}}\} and/or dense meta-feature matrix 𝐌\boldsymbol{\mathrm{M}} introduced previously in Section 3.1-3.3.

3.4.2. PVisRec (𝐀\boldsymbol{\mathrm{A}},𝐂\boldsymbol{\mathrm{C}},𝐌\boldsymbol{\mathrm{M}} only):

Given the user by attribute matrix 𝐀n×m\boldsymbol{\mathrm{A}}\in\mathbb{R}^{n\times m}, meta-feature by attribute matrix 𝐌k×m\boldsymbol{\mathrm{M}}\in\mathbb{R}^{k\times m}, and user by visual-configuration matrix 𝐂n×h\boldsymbol{\mathrm{C}}\in\mathbb{R}^{n\times h}, the goal is to find the rank-dd embedding matrices 𝐔\boldsymbol{\mathrm{U}}, 𝐕\boldsymbol{\mathrm{V}}, 𝐙\boldsymbol{\mathrm{Z}}, and 𝐘\boldsymbol{\mathrm{Y}} that minimize the following objective function:

(33) f(𝐔,𝐕,𝐙,𝐘)=𝐀𝐔𝐕2+𝐌𝐘𝐕2+𝐂𝐔𝐙2\displaystyle f(\boldsymbol{\mathrm{U}},\boldsymbol{\mathrm{V}},\boldsymbol{\mathrm{Z}},\boldsymbol{\mathrm{Y}})=\|\boldsymbol{\mathrm{A}}-\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{V}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{M}}-\boldsymbol{\mathrm{Y}}\boldsymbol{\mathrm{V}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{C}}-\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{Z}}^{\top}\|^{2}

3.4.3. PVisRec (𝐀\boldsymbol{\mathrm{A}},𝐂\boldsymbol{\mathrm{C}},𝐃\boldsymbol{\mathrm{D}} only)

Besides Eq. 33 that uses only 𝐀\boldsymbol{\mathrm{A}}, 𝐌\boldsymbol{\mathrm{M}}, and 𝐂\boldsymbol{\mathrm{C}}, we also investigate another personalized visualization recommendation model that uses 𝐀\boldsymbol{\mathrm{A}}, 𝐂\boldsymbol{\mathrm{C}}, and 𝐃\boldsymbol{\mathrm{D}} (without meta-features). More formally, given 𝐀\boldsymbol{\mathrm{A}}, 𝐂\boldsymbol{\mathrm{C}}, and 𝐃\boldsymbol{\mathrm{D}}, then the problem is to learn low-dimensional rank-dd embedding matrices 𝐔\boldsymbol{\mathrm{U}}, 𝐕\boldsymbol{\mathrm{V}}, and 𝐙\boldsymbol{\mathrm{Z}} that minimize the following:

(34) f(𝐔,𝐕,𝐙)=𝐀𝐔𝐕2+𝐂𝐔𝐙2+𝐃𝐕𝐙2\displaystyle f(\boldsymbol{\mathrm{U}},\boldsymbol{\mathrm{V}},\boldsymbol{\mathrm{Z}})=\|\boldsymbol{\mathrm{A}}-\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{V}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{C}}-\boldsymbol{\mathrm{U}}\boldsymbol{\mathrm{Z}}^{\top}\|^{2}+\|\boldsymbol{\mathrm{D}}-\boldsymbol{\mathrm{V}}\boldsymbol{\mathrm{Z}}^{\top}\|^{2}

In this work, we used an ALS-based optimizer to solve Eq. 28 and the simpler variants shown in Eq. 33 and Eq. 34. However, we can also leverage a variety of different optimization schemes including cyclic/block coordinate descent (Kim et al., 2014; Rossi and Zhou, 2016), stochastic gradient descent (Yun et al., 2014; Oh et al., 2015), among others (Singh and Gordon, 2008; Bouchard et al., 2013; Choi et al., 2019; Balasubramaniam et al., 2020; Schenker et al., 2021).
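As a minimal sketch of how the four factor matrices in Eq. 28 are coupled, the following uses plain full-batch gradient descent under squared error (the paper's optimizer is ALS-based; the learning rate, initialization, and tiny synthetic matrices are illustrative assumptions).

```python
import numpy as np

def fit_pvisrec(A, M, C, D, d=10, lr=1e-3, iters=500, seed=0):
    """Minimal gradient-descent sketch of Eq. 28 (the paper uses an ALS-based solver)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape; k, _ = M.shape; _, h = C.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    Z = 0.1 * rng.standard_normal((h, d))
    Y = 0.1 * rng.standard_normal((k, d))
    for _ in range(iters):
        # Residuals of the four factorized matrices.
        Ra, Rm, Rc, Rd = A - U @ V.T, M - Y @ V.T, C - U @ Z.T, D - V @ Z.T
        gU = -2 * (Ra @ V + Rc @ Z)
        gV = -2 * (Ra.T @ U + Rm.T @ Y + Rd @ Z)
        gZ = -2 * (Rc.T @ U + Rd.T @ V)
        gY = -2 * (Rm @ V)
        U -= lr * gU; V -= lr * gV; Z -= lr * gZ; Y -= lr * gY
    return U, V, Z, Y

# Tiny synthetic example (the real matrices are large and sparse).
n, m, h, k, d = 20, 40, 8, 15, 5
rng = np.random.default_rng(1)
A = (rng.random((n, m)) < 0.1).astype(float)
C = (rng.random((n, h)) < 0.2).astype(float)
D = (rng.random((m, h)) < 0.2).astype(float)
M = rng.random((k, m))
U, V, Z, Y = fit_pvisrec(A, M, C, D, d=d)
```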

3.5. Inferring Personalized Visualization Recommendations for Individual Users

We first discuss using the personalized visualization recommendation model for recommending attributes to users as well as visual-configurations. Then we discuss the fundamentally more challenging task of personalized visualization recommendation.

3.5.1. Personalized Attribute Recommendation

The ranking of attributes for user ii is induced by 𝐔i,:𝐕\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}^{\top} where 𝐔i,:\boldsymbol{\mathrm{U}}_{i,:} is the embedding of user ii. Let π1(𝐔i,:𝐕)\pi_{1}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}^{\top}) denote the largest attribute weight for user ii. Therefore, the top-kk attribute weights for user ii are denoted as:

π1(𝐔i,:𝐕),π2(𝐔i,:𝐕),,πk(𝐔i,:𝐕)\pi_{1}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}^{\top}),\pi_{2}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}^{\top}),\ldots,\pi_{k}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}^{\top})

3.5.2. Personalized Visual-Configuration Recommendation

The personalized ranking of the visual-configurations for user $i$ is inferred by $\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}^{\top}$ where $\boldsymbol{\mathrm{U}}_{i,:}$ is the embedding of user $i$ and $\boldsymbol{\mathrm{Z}}$ is the matrix of visual-configuration embeddings. Hence, $\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}^{\top}\in\mathbb{R}^{h}$ is an $h$-dimensional vector of weights indicating the likelihood/importance of each visual-configuration for that specific user $i$. Let $\pi_{1}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}^{\top})$ denote the largest visual-configuration weight for user $i$. Therefore, the top-$k$ visual-configuration weights for user $i$ are denoted as:

π1(𝐔i,:𝐙),π2(𝐔i,:𝐙),,πk(𝐔i,:𝐙)\pi_{1}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}^{\top}),\pi_{2}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}^{\top}),\ldots,\pi_{k}(\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}^{\top})
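Both rankings reduce to sorting inner products between a user embedding and the item (attribute or visual-configuration) embeddings; below is a minimal sketch with randomly initialized embeddings standing in for the learned $\boldsymbol{\mathrm{U}}$, $\boldsymbol{\mathrm{V}}$, and $\boldsymbol{\mathrm{Z}}$.

```python
import numpy as np

def top_k_items(user_emb, item_embs, k=5):
    """Rank items by the inner product item_embs @ user_emb and return the top-k indices.
    Works for attributes (item_embs = V, Section 3.5.1) and for
    visual-configurations (item_embs = Z, Section 3.5.2)."""
    scores = item_embs @ user_emb
    top = np.argsort(-scores)[:k]
    return top, scores[top]

d, m, h = 10, 100, 30
rng = np.random.default_rng(0)
U = rng.standard_normal((5, d))      # user embeddings (stand-in for the learned U)
V = rng.standard_normal((m, d))      # attribute embeddings
Z = rng.standard_normal((h, d))      # visual-configuration embeddings

top_attrs, _ = top_k_items(U[0], V, k=5)
top_configs, _ = top_k_items(U[0], Z, k=5)
print(top_attrs, top_configs)
```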

3.5.3. Personalized Visualization Recommendation

We now focus on the most complex and challenging problem of recommending complete visualizations personalized for a specific user i[n]i\in[n]. A recommended visualization for user i[n]i\in[n] consists of both the subset of attributes 𝐗(k)\boldsymbol{\mathrm{X}}^{(k)} from some dataset 𝐗\boldsymbol{\mathrm{X}} and the design choices CtC_{t} (a visual-configuration) for those attributes. Given user ii along with an arbitrary visualization 𝒱=(𝐗(k),Ct)\mathcal{V}=(\boldsymbol{\mathrm{X}}^{(k)},C_{t}) generated from some dataset 𝐗\boldsymbol{\mathrm{X}} of interest to user ii, we derive a personalized user-specific score for visualization 𝒱\mathcal{V} (for user ii) as,

(35) y^(𝒱)=𝐔i,:𝐙t,:𝐱j𝐗(k)𝐔i,:𝐕j,:\displaystyle\widehat{y}(\mathcal{V})=\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}_{t,:}^{\top}\prod_{\boldsymbol{\mathrm{x}}_{j}\in\boldsymbol{\mathrm{X}}^{(k)}}\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}_{j,:}^{\top}

where $\boldsymbol{\mathrm{X}}^{(k)}$ is the subset of attributes from the user's dataset $\boldsymbol{\mathrm{X}}$ (hence, $|\boldsymbol{\mathrm{X}}^{(k)}|\leq|\boldsymbol{\mathrm{X}}|$) used in the visualization $\mathcal{V}$ and $C_{t}\in\mathbcal{C}$ is the visual-configuration of the visualization $\mathcal{V}$ being scored for user $i$. Using Eq. 35, we can predict the personalized visualization score $\widehat{y}(\mathcal{V})$ for any arbitrary visualization $\mathcal{V}$ (for any dataset) and user $i\in[n]$. For the evaluation in Section 6.1, we use Eq. 35 to score relevant and non-relevant visualizations for a specific user and dataset of interest.
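A minimal sketch of Eq. 35, again with random embeddings standing in for the learned factors and illustrative attribute/configuration indices:

```python
import numpy as np

def score_visualization(U, V, Z, user_idx, attr_indices, config_idx):
    """Eq. 35: product of the user-config score and the user-attribute scores."""
    u = U[user_idx]
    score = u @ Z[config_idx]
    for j in attr_indices:
        score *= u @ V[j]
    return score

rng = np.random.default_rng(0)
U, V, Z = rng.random((5, 10)), rng.random((100, 10)), rng.random((30, 10))
print(score_visualization(U, V, Z, user_idx=0, attr_indices=[3, 17], config_idx=4))
```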

4. Deep Personalized Visualization Recommendation Models

We now introduce a deep neural network architecture for personalized visualization recommendation. For this, we combine the previously proposed model with a deep multilayer neural network component to learn non-linear functions that capture complex dependencies and patterns between users and their visualization preferences.

4.0.1. Neural PVisRec

Given an arbitrary user ii and a visualization 𝒱=(𝐗ij(k),Ct)\mathcal{V}=(\boldsymbol{\mathrm{X}}^{(k)}_{ij},C_{t}) to score from some new dataset of interest to that user, we first must decide on the input representation. In this work, we leverage the user personalized embeddings learned in Section 3.4 by concatenating the embedding of user ii, visual configuration tt, along with the embeddings for each attribute used in the visualization. More formally,

(36) ϕ(𝒱=𝐗ij(k),Ct)=[𝐮i𝐳t𝐯r1𝐯rs]\displaystyle\phi(\mathcal{V}\!=\!\langle\boldsymbol{\mathrm{X}}^{(k)}_{ij},C_{t}\rangle)=\begin{bmatrix}\boldsymbol{\mathrm{u}}_{i}\\ \boldsymbol{\mathrm{z}}_{t}\\ \boldsymbol{\mathrm{v}}_{r_{1}}\\ \vdots\\ \boldsymbol{\mathrm{v}}_{r_{s}}\end{bmatrix}

where 𝐮i\boldsymbol{\mathrm{u}}_{i} is the embedding of user ii, 𝐳t\boldsymbol{\mathrm{z}}_{t} is the embedding of the visual-configuration CtC_{t}, and 𝐯r1,,𝐯rs\boldsymbol{\mathrm{v}}_{r_{1}},\ldots,\boldsymbol{\mathrm{v}}_{r_{s}} are the embeddings of the attributes used in the visualization being scored for user ii. This can be written as,

(37) ϕ(𝒱=𝐗ij(k),Ct)=[𝐔𝐞i𝐙𝐞t𝐕𝐞r1𝐕𝐞rs]\displaystyle\phi(\mathcal{V}\!=\!\langle\boldsymbol{\mathrm{X}}^{(k)}_{ij},C_{t}\rangle)=\big{[}\,\boldsymbol{\mathrm{U}}^{\top}\boldsymbol{\mathrm{e}}_{i}\;\,\,\boldsymbol{\mathrm{Z}}^{\top}\boldsymbol{\mathrm{e}}_{t}\;\,\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{1}}\,\cdots\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{s}}\,\big{]}^{\top}

where 𝐞in\boldsymbol{\mathrm{e}}_{i}\in\mathbb{R}^{n} (user ii), 𝐞th\boldsymbol{\mathrm{e}}_{t}\in\mathbb{R}^{h} (visual-configuration CtC_{t}), and 𝐞r1m\boldsymbol{\mathrm{e}}_{r_{1}}\in\mathbb{R}^{m} (attribute r1r_{1}) are the one-hot encodings of the user ii, visual-configuration tt, and attributes r1,,rsr_{1},...,r_{s} used in the visualization. Note that 𝐔n×d\boldsymbol{\mathrm{U}}\in\mathbb{R}^{n\times d}, 𝐕m×d\boldsymbol{\mathrm{V}}\in\mathbb{R}^{m\times d}, 𝐙h×d\boldsymbol{\mathrm{Z}}\in\mathbb{R}^{h\times d}, 𝐘k×d\boldsymbol{\mathrm{Y}}\in\mathbb{R}^{k\times d}.

The first neural personalized visualization recommendation architecture that we introduce called Neural PVisRec leverages the user, visual-configuration, and attribute embeddings from the PVisRec model in Section 3.4 as input into a deep multilayer neural network with LL fully-connected layers,

(38) ϕ(𝒱=𝐗ij(k),Ct)\displaystyle\phi(\mathcal{V}\!=\!\langle\boldsymbol{\mathrm{X}}^{(k)}_{ij},C_{t}\rangle) =[𝐔𝐞i𝐙𝐞t𝐕𝐞r1𝐕𝐞rs]\displaystyle=\big{[}\,\boldsymbol{\mathrm{U}}^{\top}\boldsymbol{\mathrm{e}}_{i}\;\,\,\boldsymbol{\mathrm{Z}}^{\top}\boldsymbol{\mathrm{e}}_{t}\;\,\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{1}}\,\cdots\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{s}}\,\big{]}^{\top}
(39) 𝐪1\displaystyle\boldsymbol{\mathrm{q}}_{1} =σ1(𝐖1ϕ(𝒱)+𝐛1)\displaystyle=\sigma_{1}(\boldsymbol{\mathrm{W}}_{1}\phi(\mathcal{V})+\boldsymbol{\mathrm{b}}_{1})
(40) 𝐪2\displaystyle\boldsymbol{\mathrm{q}}_{2} =σ2(𝐖2𝐪1+𝐛2)\displaystyle=\sigma_{2}(\boldsymbol{\mathrm{W}}_{2}\boldsymbol{\mathrm{q}}_{1}+\boldsymbol{\mathrm{b}}_{2})
\displaystyle\;\,\vdots
(41) 𝐪L\displaystyle\vspace{-2mm}\boldsymbol{\mathrm{q}}_{L} =σL(𝐖L𝐪L1+𝐛L)\displaystyle=\sigma_{L}(\boldsymbol{\mathrm{W}}_{L}\boldsymbol{\mathrm{q}}_{L-1}+\boldsymbol{\mathrm{b}}_{L})
(42) y^\displaystyle\widehat{y} =σ(𝐡𝐪L)\displaystyle=\sigma(\boldsymbol{\mathrm{h}}^{\top}\boldsymbol{\mathrm{q}}_{L})

where 𝐖L\boldsymbol{\mathrm{W}}_{L}, 𝐛L\boldsymbol{\mathrm{b}}_{L}, and σL\sigma_{L} are the weight matrix, bias vector, and activation function for layer LL. Further, y^=σ(𝐡𝐪L)\widehat{y}=\sigma(\boldsymbol{\mathrm{h}}^{\top}\boldsymbol{\mathrm{q}}_{L}) (Eq. 42) is the output layer where σ\sigma is the output activation function and 𝐡\boldsymbol{\mathrm{h}}^{\top} denotes the edge weights of the output function. For the hidden layers, we used ReLU as the activation function. Note that if the visualization does not use all ss attributes, then we can pad the remaining unused attributes with zeros. This enables the multi-layer neural network architecture to be flexible for visualizations with any number of attributes. Eq. 38-42 can be written more succinctly as

(43) $\widehat{y}\,=\,\sigma\big(\boldsymbol{\mathrm{h}}^{\top}\sigma_{L}(\boldsymbol{\mathrm{W}}_{L}(...\sigma_{1}(\boldsymbol{\mathrm{W}}_{1}\big[\boldsymbol{\mathrm{U}}^{\top}\boldsymbol{\mathrm{e}}_{i}\;\,\,\boldsymbol{\mathrm{Z}}^{\top}\boldsymbol{\mathrm{e}}_{t}\;\,\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{1}}\cdots\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{s}}\big]^{\top}+\boldsymbol{\mathrm{b}}_{1})...)+\boldsymbol{\mathrm{b}}_{L})\big)$

where y^\widehat{y} is the predicted visualization score for user ii.
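A minimal PyTorch sketch of the forward pass in Eq. 38-42 for a fixed maximum number of attributes (zero-padded when fewer are used); the layer widths, batch size, and embedding dimension are illustrative and do not reproduce the exact architecture described below.

```python
import torch
import torch.nn as nn

class NeuralPVisRecSketch(nn.Module):
    """Minimal sketch of Eq. 38-42: concatenated user/config/attribute embeddings
    fed through L fully-connected ReLU layers and a sigmoid output."""
    def __init__(self, d=10, max_attrs=2, hidden=(64, 32, 16)):
        super().__init__()
        in_dim = d * (2 + max_attrs)          # user + config + up to max_attrs attributes
        layers, prev = [], in_dim
        for width in hidden:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(prev, 1)         # h^T q_L followed by sigmoid

    def forward(self, u, z, attr_embs):
        # attr_embs: (batch, max_attrs, d), zero-padded if fewer attributes are used.
        phi = torch.cat([u, z, attr_embs.flatten(1)], dim=1)
        return torch.sigmoid(self.out(self.mlp(phi))).squeeze(1)

d, batch = 10, 4
model = NeuralPVisRecSketch(d=d, max_attrs=2)
u = torch.randn(batch, d)                 # user embeddings U^T e_i
z = torch.randn(batch, d)                 # visual-configuration embeddings Z^T e_t
attrs = torch.randn(batch, 2, d)          # attribute embeddings V^T e_r (padded)
print(model(u, z, attrs).shape)           # torch.Size([4]) -- predicted scores y_hat
```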

4.0.2. Neural PVisRec-CMF

We also investigated a second neural approach for the personalized visualization recommendation problem. This approach combines the scores from PVisRec and Eq. 43. More formally, given user $i$ along with an arbitrary visualization $\mathcal{V}=(\boldsymbol{\mathrm{X}}^{(k)}_{ij},C_{t})$ generated from some dataset $\boldsymbol{\mathrm{X}}_{ij}$ of interest to user $i$, we derive a personalized user-specific score for visualization $\mathcal{V}$ (for user $i$) as $\widehat{y}_{\rm PVisRec}=\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}_{t,:}^{\top}\prod_{\boldsymbol{\mathrm{x}}_{j}\in\boldsymbol{\mathrm{X}}^{(k)}}\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}_{j,:}^{\top}$ where $\boldsymbol{\mathrm{X}}^{(k)}_{ij}$ is the subset of attributes used in the visualization $\mathcal{V}$ from the user's dataset $\boldsymbol{\mathrm{X}}_{ij}$ (hence, $|\boldsymbol{\mathrm{X}}^{(k)}|\leq|\boldsymbol{\mathrm{X}}_{ij}|$) and $C_{t}\in\mathbcal{C}$ is the visual-configuration for visualization $\mathcal{V}$. Then, we have

(44) y^=(1α)(𝐔i𝐙t𝐱j𝐗(k)𝐔i𝐕j)+αy^dnn\displaystyle\widehat{y}\,=\,(1-\alpha)\Bigg{(}\boldsymbol{\mathrm{U}}_{i}\boldsymbol{\mathrm{Z}}_{t}^{\top}\prod_{\boldsymbol{\mathrm{x}}_{j}\in\boldsymbol{\mathrm{X}}^{(k)}}\boldsymbol{\mathrm{U}}_{i}\boldsymbol{\mathrm{V}}_{j}^{\top}\Bigg{)}\;+\;\alpha\widehat{y}_{\rm dnn}

where y^dnn=σ(𝐡σL(𝐖L(σ1(𝐖1ϕ(𝒱)+𝐛1))+𝐛L))\widehat{y}_{\rm dnn}=\sigma\big{(}\boldsymbol{\mathrm{h}}^{\top}\sigma_{L}(\boldsymbol{\mathrm{W}}_{L}(...\sigma_{1}(\boldsymbol{\mathrm{W}}_{1}\phi(\mathcal{V})+\boldsymbol{\mathrm{b}}_{1})...)+\boldsymbol{\mathrm{b}}_{L})\big{)} with ϕ(𝒱)=[𝐔𝐞i𝐙𝐞t𝐕𝐞r1𝐕𝐞rs]\phi(\mathcal{V})=\big{[}\boldsymbol{\mathrm{U}}^{\top}\boldsymbol{\mathrm{e}}_{i}\;\,\boldsymbol{\mathrm{Z}}^{\top}\boldsymbol{\mathrm{e}}_{t}\;\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{1}}\cdots\,\boldsymbol{\mathrm{V}}^{\top}\boldsymbol{\mathrm{e}}_{r_{s}}\big{]}^{\top} and α(0,1)\alpha\in(0,1) is a hyperparameter that controls the influence of the models on the final predicted score of the visualization for user ii.

All layers of the various neural architectures for our personalized visualization recommendation problem use ReLU nonlinear activation. Unless otherwise mentioned, we used three hidden layers and optimized model parameters using mini-batch Adam with a learning rate of 0.001. We designed the neural network structure such that the bottom layers are the widest and each successive layer has 1/2 the number of neurons. For fairness, the last hidden layer is set to the embedding size. Hence, if the embedding size is 8, then the architecture of the layers is 3216832\rightarrow 16\rightarrow 8.

4.0.3. Training

The user-centric visualization training corpus 𝒟{𝒳𝒱}\\mathbcal{D}=\{\mathbcal{X}_{i},\mathbb{V}_{i}\}_{i=1}^{n} for personalized visualization recommendation consists of user-level training data for nn users where for each user i[n]i\in[n] we have a set of datasets 𝒳{𝒳𝒳|}\mathbcal{X}_{i}=\{\boldsymbol{\mathrm{X}}_{i1},\ldots,\boldsymbol{\mathrm{X}}_{ij},\ldots\} of interest to that user along with user ii’s “relevant” (generated, liked, clicked-on) visualizations 𝕍i={𝒱𝒱|}\mathbb{V}_{i}=\{\mathbcal{V}_{i1},\ldots,\mathbcal{V}_{ij},\ldots\} for each of those datasets. For each user i[n]i\in[n] and dataset 𝐗ij𝒳\boldsymbol{\mathrm{X}}_{ij}\in\mathbcal{X}_{i} of interest to user ii, there is a set 𝒱|{𝒱𝒳|𝒞|}\mathbcal{V}_{ij}=\{\ldots,\mathcal{V}=(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk}),\ldots\} of relevant (positive) visualizations for that user, and we also leverage a sampled set of non-relevant (negative) visualizations 𝒱|\mathbcal{V}_{ij}^{-} for that user ii and dataset 𝐗ij𝒳\boldsymbol{\mathrm{X}}_{ij}\in\mathbcal{X}_{i}. Therefore, the set of training visualizations for user i[n]i\in[n] and dataset 𝐗ij𝒳\boldsymbol{\mathrm{X}}_{ij}\in\mathbcal{X}_{i} is 𝒱|𝒱|\mathbcal{V}_{ij}\cup\mathbcal{V}_{ij}^{-} and Yijk{0,1}Y_{ijk}\in\{0,1\} denotes the ground-truth label of visualization 𝒱=(𝐗ij(k),𝒞ijk)𝒱|𝒱|\mathcal{V}=(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\mathbcal{V}_{ij}\cup\mathbcal{V}_{ij}^{-}. Hence, Yijk=1Y_{ijk}=1 indicates a user-relevant (positive) visualization for user ii whereas Yijk=0Y_{ijk}=0 indicates a non-relevant visualization for that user, i.e., 𝒱=(𝐗ij(k),𝒞ijk)𝒱|\mathcal{V}=(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\mathbcal{V}_{ij}^{-}. The goal is to have the model score Y^ijk[0,1]\widehat{Y}_{ijk}\in[0,1] each training visualization 𝒱=(𝐗ij(k),𝒞ijk)𝒱|𝒱|\mathcal{V}=(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\mathbcal{V}_{ij}\cup\mathbcal{V}_{ij}^{-} for a user ii as close as possible to the ground-truth label YijkY_{ijk}. The neural personalized visualization recommendation model is learned by optimizing the likelihood of model scores for all visualizations of each user. Given a user i[n]i\in[n] and the model parameters Θ\Theta, the likelihood is

(45) P(𝕍^i,𝕍i|Θ)=j=1|𝒳(𝐗ij(k),𝒞ijk)𝒱|Y^ijk(𝐗ij(k),𝒞ijk)𝒱^ij(1Y^ijk),for i=1,,n\mathrm{P}(\widehat{\mathbb{V}}_{i}^{-},\mathbb{V}_{i}|\Theta)\,=\prod_{j=1}^{|\mathbcal{X}_{i}|}\;\,\prod_{(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\mathbcal{V}_{ij}}\!\widehat{Y}_{ijk}\prod_{(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\widehat{\mathbcal{V}}_{ij}^{-}}\!\!\Big{(}1-\widehat{Y}_{ijk}\Big{)},\;\;\text{for }i=1,\ldots,n

where $\widehat{Y}_{ijk}$ is the predicted score of a visualization $\mathcal{V}=(\boldsymbol{\mathrm{X}}_{ij}^{(k)},\mathcal{C}_{ijk})$ for user $i$ and dataset $j$ ($\boldsymbol{\mathrm{X}}_{ij}\in\mathbcal{X}_{i}$). Naturally, the goal is to obtain $\widehat{Y}_{\mathcal{V}}$ as close as possible to the actual ground-truth $Y_{\mathcal{V}}$. Taking the negative log of the likelihood in Eq. 45 and summing over all $n$ users and their sets of relevant visualizations $\mathbcal{V}_{ij}$ from the $|\mathbcal{X}_{i}|$ different datasets gives the total loss $\mathbb{L}$:

(46) 𝕃\displaystyle\mathbb{L} =i=1nj=1|𝒳((𝐗ij(k),𝒞ijk)𝒱|logY^ijk(𝐗ij(k),𝒞ijk)𝒱^ijlog(1Y^ijk))\displaystyle=\sum_{i=1}^{n}\sum_{j=1}^{|\mathbcal{X}_{i}|}\Bigg{(}-\sum_{(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\mathbcal{V}_{ij}}\log\widehat{Y}_{ijk}-\sum_{(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\widehat{\mathbcal{V}}_{ij}^{-}}\log(1-\widehat{Y}_{ijk})\Bigg{)}
=i=1nj=1|𝒳(𝐗ij(k),𝒞ijk)𝒱|𝒱^|YijklogY^ijk+(1Yijk)log(1Y^ijk)\displaystyle=-\sum_{i=1}^{n}\sum_{j=1}^{|\mathbcal{X}_{i}|}\sum_{(\boldsymbol{\mathrm{X}}_{ij}^{(k)}\!,\mathcal{C}_{ijk})\in\mathbcal{V}_{ij}\cup\widehat{\mathbcal{V}}_{ij}^{-}}Y_{ijk}\log\widehat{Y}_{ijk}+(1-Y_{ijk})\log(1-\widehat{Y}_{ijk})

where the objective function above is minimized via stochastic gradient descent (SGD) to update the model parameters Θ\Theta in \mathcal{M}.
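Note that Eq. 46 is the standard binary cross-entropy over the relevant and sampled non-relevant visualizations; a minimal sketch with illustrative predicted scores:

```python
import torch
import torch.nn as nn

# Predicted scores Y_hat for 3 relevant (label 1) and 3 sampled non-relevant (label 0)
# visualizations of one user; the values are illustrative.
y_hat = torch.tensor([0.9, 0.7, 0.6, 0.3, 0.2, 0.4])
y_true = torch.tensor([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

# Eq. 46 is the (negative log-likelihood) binary cross-entropy, minimized with SGD/Adam.
loss = nn.BCELoss()(y_hat, y_true)
print(loss.item())
```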

5. Benchmark Data for Personalized Visualization Recommendation

Since this is the first work to address the personalized visualization recommendation problem, there were no existing public datasets that could be used directly. Recent works have ignored the user information (Hu et al., 2019; Qian et al., 2020) that details the “author” of a visualization, which is required in this work for user-level personalization. As an aside, VizML (Hu et al., 2019) discarded all user information and kept only the attributes used in an actual visualization (and therefore did not consider full datasets either). Since we focus on the personalized visualization recommendation problem, we derive a user-centered corpus where, for each user, we know their datasets, visualizations, attributes, and visualization-configurations. We started from the raw Plot.ly community feed data (http://vizml-repository.s3.amazonaws.com/plotly_full.tar.gz). For the personalized visualization recommendation problem, we first extract the set of all $n$ users in the visualization corpus. For each user $i\in[n]$, we then extract the set of datasets $\mathbcal{X}_{i}$ of interest to that user, i.e., the datasets for which user $i$ has generated at least one visualization. Depending on the visualization corpus, this could also be other types of user feedback, such as a visualization that a user liked or clicked. Next, we extract the set of user-preferred visualizations $\mathbcal{V}_{ij}$ for each of the datasets $\boldsymbol{\mathrm{X}}_{ij}\in\mathbcal{X}_{i}$ of interest to user $i$. Hence, $\mathbcal{V}_{ij}$ is the set of visualizations generated (or liked, clicked, …) by user $i$ for dataset $j$ ($\boldsymbol{\mathrm{X}}_{ij}$). Every visualization $\mathcal{V}\in\mathbcal{V}_{ij}$ preferred by user $i$ also contains the attributes from dataset $\boldsymbol{\mathrm{X}}_{ij}\in\mathbcal{X}_{i}$ used in that visualization (i.e., the attributes that map to x, y, binning, color, and so on).

In Table 4, we report statistics of the personalized visualization corpus used in our work, including the number of users, attributes, datasets, visualizations, and visualization-configurations extracted from all user-generated visualizations. The corpus $\mathbcal{D}=\{\mathbcal{X}_{i},\mathbb{V}_{i}\}_{i=1}^{n}$ for learning individual personalized visualization recommendation models consists of a total of $n=17{,}469$ users with $|\!\bigcup_{i=1}^{n}\mathbcal{X}_{i}|=94{,}419$ datasets used by those users. Further, there are $m=2{,}303{,}033$ attributes among the $94{,}419$ datasets of interest to the 17.4k users. Our user-centric visualization training corpus $\mathbcal{D}$ has a total of $|\!\bigcup_{i=1}^{n}\mathbb{V}_{i}|=32{,}318$ relevant visualizations generated by the $n=17.4$k users, with an average of 1.85 relevant visualizations per user. Each user in the corpus has an average of 5.41 datasets, and each dataset has an average of 24.39 attributes. From the 32.3k user-relevant visualizations of the 17.4k users, we extracted a total of $|\mathbcal{C}|=686$ unique visual-configurations. To further advance research on personalized visualization recommender systems, we have made the user-level Plot.ly data that we used for studying the personalized visualization recommendation problem (introduced in Section 2.3) publicly accessible at:

We have also made the graph representations used in our personalized visualization recommendation framework publicly accessible at http://networkrepository.com/personalized-vis-rec-graphs

Table 4. Personalized visualization recommendation data corpus. This user-centric dataset is used for learning personalized visualization recommendation models for individual users.
# Users 17,469
# Datasets 94,419
# Attributes 2,303,033
# Visualizations 32,318
# Vis. Configs 686
# Meta-features 1006
mean # attr. per dataset 24.39
mean # attr. per user 51.63
mean # vis. per user 1.85
mean # datasets per user 5.41
Density ($\boldsymbol{\mathrm{A}}$) <0.0001
Density ($\boldsymbol{\mathrm{C}}$) <0.0001
Density ($\boldsymbol{\mathrm{D}}$) <0.0001
Density (𝐌\boldsymbol{\mathrm{M}}) 0.4130

6. Experiments

To investigate the effectiveness of the personalized visualization recommendation approach, we design experiments to answer the following research questions:

  • RQ1: Given a user and a new dataset of interest to that user, can we accurately recommend the top most relevant visualizations for that specific user (Section 6.1)?

  • RQ2: How do our user-level personalized visualization recommendations compare to non-personalized global recommendations (Section 6.2)?

  • RQ3: Can we significantly reduce the space requirements of our approach by trading off a small amount of accuracy for a large improvement in space (Section 6.3)?

  • RQ4: Do the neural personalized visualization recommendation models further improve performance by incorporating a multilayer deep neural network component (Section 6.4)?

Table 5. Personalized Visualization Recommendation Results. Note d=10d=10. See text for discussion.
HR@K NDCG@K
Model @1 @2 @3 @4 @5 @1 @2 @3 @4 @5
VizRec N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
VisPop 0.186 0.235 0.255 0.271 0.289 0.181 0.214 0.224 0.231 0.238
VisConfigKNN 0.026 0.030 0.038 0.055 0.089 0.016 0.021 0.026 0.034 0.048
VisKNN 0.147 0.230 0.297 0.372 0.449 0.143 0.195 0.227 0.257 0.286
eALS 0.304 0.395 0.426 0.441 0.449 0.302 0.360 0.376 0.382 0.385
MLP 0.218 0.452 0.601 0.671 0.715 0.211 0.357 0.435 0.465 0.483
PVisRec 0.630 0.815 0.876 0.906 0.928 0.624 0.743 0.775 0.788 0.796

6.1. Personalized Visualization Recommendation Results

6.1.1. Experimental setup

Now we evaluate the system for recommending personalized visualizations to a user. Given an arbitrary user, we know the visualization(s) they preferred for each of their datasets of interest, so we can quantitatively evaluate the proposed approach for personalized visualization recommendation. For each user, we randomly select one of their datasets for which they have manually created at least two visualizations (treated as positive examples), randomly hold out one of those positive visualizations for testing, and use the remaining positive instances for training and validation. This is similar to the leave-one-out evaluation widely used in traditional user-item recommender systems (He et al., 2016). In our case, however, we have thousands of datasets, and each dataset has its own large and completely disjoint set of candidate visualizations: not only are these candidates completely different from the visualizations generated from any other dataset, but the number of possible visualizations for a given dataset is exponential in the number of attributes and design choices, making this problem unique and fundamentally challenging. Since it is too computationally expensive to rank all visualizations for every user (and every dataset of interest) during evaluation, we randomly sample 19 visualizations that were not created by the user, giving a total of 20 visualizations per user (1 relevant + 19 non-relevant) for evaluating the personalized visualization recommendations from our proposed models. Using this held-out set, we evaluate the ability of the proposed approach to recommend the held-out relevant visualizations (i.e., the visualizations the user actually created) among the exponential number of alternatives that arise from a dataset's attributes and the possible design choices (e.g., chart types). In particular, given a user $i$ and a dataset of interest to that user, we use the proposed approach to recommend the top-$k$ visualizations personalized for that specific user and dataset. To quantitatively evaluate the personalized ranking of visualizations, we use rank-based evaluation metrics, namely Hit Ratio at $K$ (HR@K) and Normalized Discounted Cumulative Gain (NDCG@K) (He et al., 2016). Intuitively, HR@K quantifies whether the held-out relevant (user-generated) visualization appears in the top-$K$ ranked visualizations, while NDCG@K also takes into account the position of the relevant visualization in the top-$K$ ranked list, assigning larger scores to visualizations ranked higher. For both HR@K and NDCG@K, we report $K = 1, \ldots, 5$ unless otherwise mentioned.
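As a concrete illustration of this protocol, the sketch below computes HR@K and NDCG@K for a single user under the 1 relevant + 19 sampled negatives setup; the helper names and placeholder scores are ours, not part of any released code.

```python
import numpy as np

def hr_at_k(rank, k):
    """HR@K: 1 if the held-out relevant visualization is in the top-k, else 0."""
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank, k):
    """NDCG@K with a single relevant item: 1/log2(rank+2) if in the top-k (IDCG = 1)."""
    return 1.0 / np.log2(rank + 2) if rank < k else 0.0

def evaluate_user(scores, relevant_idx, k=5):
    """scores holds model scores for the 20 candidates (1 relevant + 19 sampled
    negatives); relevant_idx marks the held-out user-generated visualization."""
    order = np.argsort(-scores)              # rank candidates by descending score
    rank = int(np.where(order == relevant_idx)[0][0])
    return hr_at_k(rank, k), ndcg_at_k(rank, k)

# toy usage: the relevant visualization is candidate 0 among 20
scores = np.random.rand(20)
hr5, ndcg5 = evaluate_user(scores, relevant_idx=0, k=5)
```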

Therefore, given a user $i$ along with the set of relevant and non-relevant visualizations $\mathcal{V}_{ij} \cup \mathcal{V}_{ij}^{-}$ for that user and their dataset of interest $\boldsymbol{\mathrm{X}}_{ij} \in \mathcal{X}_i$, we derive a score for each visualization $\mathcal{V} \in (\mathcal{V}_{ij} \cup \mathcal{V}_{ij}^{-})$, where $|\mathcal{V}_{ij}| + |\mathcal{V}_{ij}^{-}| = 20$. An effective personalized visualization recommender assigns larger scores to the relevant visualizations and smaller scores to the non-relevant ones, so the relevant visualizations appear first in the ranked list, followed by the non-relevant ones further down. Unless otherwise mentioned, we use $d=10$ as the embedding size and use the full meta-feature matrix $\boldsymbol{\mathrm{M}}$. For the neural variants of our approach, we use $\alpha=0.5$.

Table 6. Ablation study results for different variants of our personalized visualization recommendation approach.
HR@K NDCG@K
Model @1 @2 @3 @4 @5 @1 @2 @3 @4 @5
PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{M}}$ only) 0.307 0.416 0.470 0.488 0.501 0.306 0.374 0.401 0.410 0.415
PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{D}}$ only) 0.414 0.474 0.537 0.610 0.697 0.384 0.435 0.450 0.457 0.460
PVisRec 0.630 0.815 0.876 0.906 0.928 0.624 0.743 0.775 0.788 0.796

6.1.2. Baselines

Since the personalized visualization recommendation problem introduced in Section 2 is new, no existing visualization recommendation methods can be directly applied to solve it. For instance, VizRec (Mutlu et al., 2016) is the closest existing approach, but it cannot be used here since it explicitly assumes a single dataset about which users provide visualization feedback. In our problem formulation and corpus $\mathcal{D} = \{(\mathcal{X}_i, \mathbb{V}_i)\}_{i=1}^{n}$, every user $i \in [n]$ can have their own set of datasets $\mathcal{X}_i$ that are not shared by any other user, in which case VizRec is inapplicable. Nevertheless, we adapted a wide variety of methods to use as baselines for evaluation, summarized below:

  • VisPop: Given a visualization $\mathcal{V}$ with attributes $\boldsymbol{\mathrm{X}}^{(k)}$ and visual-configuration $\mathcal{C}$, the score of $\mathcal{V}$ is $\phi(V)=f(\mathcal{C})\prod_{\boldsymbol{\mathrm{x}}\in\boldsymbol{\mathrm{X}}^{(k)}}f(\boldsymbol{\mathrm{x}})$, where $f(\boldsymbol{\mathrm{x}})$ is the frequency of attribute $\boldsymbol{\mathrm{x}}$ (the column sums of $\boldsymbol{\mathrm{A}}$) and $f(\mathcal{C})$ is the frequency of visual-configuration $\mathcal{C}$. Hence, the VisPop score is a product of the frequencies of the underlying visualization components, i.e., the visual-configuration and the attributes used in the visualization being scored (see the sketch after this list).

  • VisKNN: The standard item-based collaborative filtering method adapted for the visualization recommendation problem. Given a visualization $\mathcal{V}$ with attributes $\boldsymbol{\mathrm{X}}^{(k)}$ and visual-configuration $\mathcal{C}$, we score $\mathcal{V}$ by taking the mean score of the visual-configurations most similar to $\mathcal{C}$, along with the mean score of the top attributes most similar to each attribute used in the visualization.

  • VisConfigKNN: This approach is similar to VisKNN, but uses only the visual-configuration matrix to score the visualizations.

  • eALS: An adapted version of the state-of-the-art matrix factorization method for item recommendation (He et al., 2016). We adapted it to our visualization recommendation problem by minimizing a squared loss while treating all unobserved user interactions with attributes and visual-configurations as negative examples, weighted non-uniformly by the frequency of the attributes and visual-configurations.

  • MLP: We used three hidden layers and optimized model parameters using mini-batch Adam with a learning rate of 0.001. For the activation functions of the MLP layers, we used ReLU. For fairness, the last hidden layer is set to the embedding size.

  • VizRec (Mutlu et al., 2016): For each dataset, this approach constructs a user-by-visualization matrix and uses it to obtain the average overall rating among similar users of a visualization, where a user is considered similar if they have rated a visualization preferred by the active user. VizRec assumes a single dataset and is only applicable when many users have rated visualizations from that same dataset.
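The sketch below illustrates the VisPop scoring rule $\phi(V)=f(\mathcal{C})\prod_{\boldsymbol{\mathrm{x}}\in\boldsymbol{\mathrm{X}}^{(k)}}f(\boldsymbol{\mathrm{x}})$ referenced in the VisPop item above, assuming precomputed frequency tables; the dictionary-based representation and names are illustrative choices, not the authors' implementation.

```python
def vispop_score(vis_attributes, vis_config, attr_freq, config_freq):
    """VisPop: score a candidate visualization by the product of the corpus
    frequency of its visual-configuration and of each attribute it uses."""
    score = config_freq.get(vis_config, 0)          # f(C)
    for attr in vis_attributes:
        score *= attr_freq.get(attr, 0)             # times f(x) for each attribute
    return score

# toy usage with hypothetical frequency tables
attr_freq = {"price": 120, "year": 300}
config_freq = {"scatter:x-y": 45}
s = vispop_score(["price", "year"], "scatter:x-y", attr_freq, config_freq)
```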

Figure 4. Evaluation of top-K personalized visualization recommendations.
Figure 5. Ablation study results for personalized visualization recommendation with varying embedding dimensions $d \in \{2^0, \ldots, 2^{10}\}$ and HR@K for $K = 1, \ldots, 10$. See text for discussion.

6.1.3. Results

We provide the results in Table 5. Overall, the proposed approach, PVisRec, significantly outperforms the baseline methods, consistently achieving the best HR@K and NDCG@K across all $K = 1, 2, \ldots, 5$. From Table 5, PVisRec achieves a mean relative improvement of 107.2% and 106.6% over the best performing baseline (eALS) for HR@1 and NDCG@1, respectively. Comparing HR@5 and NDCG@5, PVisRec achieves a mean improvement of 29.8% and 64.9% over the next best performing method (MLP). As an aside, VizRec is the only prior approach proposed for ranking visualizations; all other methods used in our comparison are new and, to the best of our knowledge, have never before been extended to ranking and recommending visualizations. Recall that VizRec itself solves a different problem, but we highlight it since it is clearly the closest. As discussed in Section 7, the assumptions required by VizRec are unrealistic in practice, and this also holds for our problem and corpus $\mathcal{D} = \{(\mathcal{X}_i, \mathbb{V}_i)\}_{i=1}^{n}$, where every user $i \in [n]$ can have their own set of datasets $\mathcal{X}_i$ not shared by any other user. VizRec assumes a single dataset of interest to all $n$ users, with every user providing many preferences over the relevant visualizations generated for that specific dataset; since these assumptions are violated in our problem, we report its results as "N/A". Figure 4 shows the mean performance of the top-$K$ visualization recommendations for $K = 1, 2, \ldots, 10$. These results demonstrate the effectiveness of our user-personalized visualization recommendation approach, as we successfully recommend to users the held-out visualizations that they previously created.

6.1.4. Ablation Study Results

Previously, we observed that PVisRec significantly outperforms the other methods for the personalized visualization recommendation problem. To understand the importance of the different model components of PVisRec, we investigate two variants of our model. The first variant, PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{M}}$ only), does not use the attribute by visual-configuration graph represented by the sparse adjacency matrix $\boldsymbol{\mathrm{D}}$, whereas the second variant, PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{D}}$ only), does not use the dense meta-feature matrix $\boldsymbol{\mathrm{M}}$ for learning. This is in contrast to the full PVisRec model, which uses $\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{D}}$, and $\boldsymbol{\mathrm{M}}$. In Table 6, both variants perform worse than PVisRec, indicating the importance of using all the graph representations for learning the personalized visualization recommendation model. Further, PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{D}}$ only) outperforms the other variant across both ranking metrics and all $K$, suggesting that $\boldsymbol{\mathrm{D}}$ may be more important for learning than $\boldsymbol{\mathrm{M}}$. Nevertheless, the best performance is obtained when both $\boldsymbol{\mathrm{D}}$ and $\boldsymbol{\mathrm{M}}$ are used along with $\boldsymbol{\mathrm{A}}$ and $\boldsymbol{\mathrm{C}}$. Finally, these two simpler variants still outperform the baselines in Table 5 for HR@1 and NDCG@1.

Figure 6. Ablation study results for personalized visualization recommendation with varying embedding dimensions $d \in \{2^0, \ldots, 2^{10}\}$ and NDCG@K for $K = 1, \ldots, 10$. See text for discussion.

To understand the effect of the embedding size $d$ on the performance of our personalized visualization recommendation approach, we vary the dimensionality of the embeddings from 1 to 1024. In these experiments, we use PVisRec with the full-rank meta-feature matrix $\boldsymbol{\mathrm{M}}$ (not the compressed meta-feature embedding (MFE) matrix). Figure 5 shows HR@K for $K = 1, \ldots, 10$ with varying embedding dimensions $d \in \{2^0, \ldots, 2^{10}\}$, and Figure 6 shows the corresponding NDCG@K results. For both HR@K and NDCG@K, we observe in Figures 5-6 that performance typically increases as a function of the embedding dimension $d$. The best performance is achieved at $d = 512$, with HR@1 of 0.669 and NDCG@1 of 0.667; this holds for all $K$ for both HR@K and NDCG@K. When $d$ becomes too large, however, performance drops noticeably due to overfitting: in Figure 5, HR@1 falls to 0.580 at $d = 1024$ compared to 0.669 at $d = 512$.

Table 7. Results comparing Non-personalized vs. Personalized Visualization Recommendation.
HR@K NDCG@K
Model @1 @2 @3 @4 @5 @1 @2 @3 @4 @5
Non-personalized 0.151 0.248 0.319 0.373 0.404 0.145 0.209 0.244 0.268 0.280
Personalized 0.630 0.815 0.876 0.906 0.928 0.624 0.743 0.775 0.788 0.796

6.2. Comparing Personalized vs. Non-personalized Visualization Recommendation

To answer RQ2, we compare the personalized visualization recommendation model (PVisRec) to a non-personalized global ML-based method that does not learn a user-specific model for each user. For fairness, the personalized model uses the specific user's embedding, while the non-personalized model derives an aggregate global embedding of a "typical" user and uses this global model to rank the visualizations. More formally, the non-personalized approach uses a global user embedding derived as

(47)  $\boldsymbol{\mathrm{u}}_{g}=\frac{1}{n}\sum_{i=1}^{n}\boldsymbol{\mathrm{U}}_{i,:}$

where $\boldsymbol{\mathrm{u}}_{g}$ is the global user embedding and represents the centroid of the user embeddings learned by PVisRec. Everything else remains the same as in the personalized visualization recommendation approach. More formally, given a user $i$ along with an arbitrary visualization $\mathcal{V}=(\boldsymbol{\mathrm{X}}^{(k)}, \mathcal{C}_{t})$ generated from some dataset $\boldsymbol{\mathrm{X}}$, we derive a score for $\mathcal{V}$ using the global user embedding $\boldsymbol{\mathrm{u}}_{g}$ from Eq. 47 as follows:

(48)  $\phi_{g}(V)=\boldsymbol{\mathrm{u}}_{g}\boldsymbol{\mathrm{Z}}_{t,:}^{\top}\prod_{\boldsymbol{\mathrm{x}}_{j}\in\boldsymbol{\mathrm{X}}^{(k)}}\boldsymbol{\mathrm{u}}_{g}\boldsymbol{\mathrm{V}}_{j,:}^{\top}$

where $\boldsymbol{\mathrm{X}}^{(k)}$ is the subset of attributes from the dataset $\boldsymbol{\mathrm{X}}$ used in the visualization $\mathcal{V}$ (hence $|\boldsymbol{\mathrm{X}}^{(k)}| \leq |\boldsymbol{\mathrm{X}}|$) and $\mathcal{C}_{t}$ is the visual-configuration of $\mathcal{V}$. Hence, instead of using user $i$'s personalized model to obtain a user-personalized score for visualization $\mathcal{V}$, i.e., $\phi(V)=\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{Z}}_{t,:}^{\top}\prod_{\boldsymbol{\mathrm{x}}_{j}\in\boldsymbol{\mathrm{X}}^{(k)}}\boldsymbol{\mathrm{U}}_{i,:}\boldsymbol{\mathrm{V}}_{j,:}^{\top}$, we replace $\boldsymbol{\mathrm{U}}_{i,:}$ with the global user embedding $\boldsymbol{\mathrm{u}}_{g}$ representing a "typical" user. Results are provided in Table 7; both models use the same experimental setup from Section 6.1. The PVisRec model is used to learn $\boldsymbol{\mathrm{U}}$, and Eq. 47 is then used to obtain the non-personalized model. Notably, the non-personalized approach that uses the same global user model for all users performs significantly worse (Table 7) than the user-level personalized approach, which leverages the appropriate learned model to personalize the ranking of visualizations for the user at hand. This demonstrates the significance of learning individual models for each user, personalized based on the user's attribute/data preferences along with their visual design-choice preferences (RQ2).
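The following numpy sketch mirrors Eq. 47-48: the user embeddings are averaged into $\boldsymbol{\mathrm{u}}_{g}$, and a candidate visualization is scored by the product of dot products with its configuration and attribute embeddings. The matrix names follow the text ($\boldsymbol{\mathrm{U}}$, $\boldsymbol{\mathrm{V}}$, $\boldsymbol{\mathrm{Z}}$); the shapes and toy values are assumptions for illustration only.

```python
import numpy as np

def global_user_embedding(U):
    """Eq. 47: centroid of the learned user embeddings U (n x d)."""
    return U.mean(axis=0)

def nonpersonalized_score(u_g, Z, V, config_idx, attr_indices):
    """Eq. 48: score a visualization with visual-configuration C_t and
    attributes X^(k), replacing the user-specific row of U with u_g."""
    score = u_g @ Z[config_idx]              # u_g . z_t
    for j in attr_indices:
        score *= u_g @ V[j]                  # times u_g . v_j for each attribute
    return score

# toy usage: n users, m attributes, c visual-configurations, embedding size d = 10
n, m, c, d = 100, 500, 50, 10
rng = np.random.default_rng(0)
U, V, Z = rng.random((n, d)), rng.random((m, d)), rng.random((c, d))
u_g = global_user_embedding(U)
s = nonpersonalized_score(u_g, Z, V, config_idx=3, attr_indices=[10, 42])
```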

Table 8. Space vs. Accuracy Trade-off Results using Meta-Feature Embeddings (MFE). Results for the space-efficient variants of our personalized visualization recommendation methods that use meta-feature embeddings. In particular, we set $d=10$ and vary the dimensionality of the meta-feature embeddings over $\{1, 2, 4, 8, 16\}$. See text for discussion.
HR@K NDCG@K
Model MFE dim. @1 @2 @3 @4 @5 @1 @2 @3 @4 @5
PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{M}}$ only) 1 0.284 0.413 0.480 0.512 0.529 0.282 0.364 0.398 0.412 0.418
2 0.245 0.348 0.395 0.417 0.429 0.244 0.308 0.333 0.342 0.346
4 0.265 0.388 0.444 0.468 0.481 0.263 0.341 0.369 0.380 0.385
8 0.304 0.419 0.462 0.492 0.506 0.302 0.376 0.397 0.410 0.416
16 0.294 0.404 0.452 0.471 0.483 0.292 0.362 0.386 0.395 0.399
PVisRec 1 0.467 0.589 0.641 0.667 0.681 0.464 0.542 0.569 0.580 0.585
2 0.542 0.685 0.744 0.771 0.792 0.539 0.630 0.660 0.672 0.680
4 0.544 0.713 0.779 0.815 0.829 0.541 0.649 0.682 0.698 0.704
8 0.608 0.806 0.874 0.906 0.925 0.604 0.731 0.765 0.779 0.787
16 0.616 0.794 0.865 0.896 0.916 0.613 0.726 0.762 0.776 0.784

6.3. Improving Space-Efficiency via Meta-Feature Embeddings

In this section, we investigate using a low-rank meta-feature embedding matrix to significantly improve the space-efficiency of our proposed approach. In particular, we replace the original meta-feature matrix $\boldsymbol{\mathrm{M}}$ with a low-rank approximation that captures the most important and meaningful meta-feature signals in the data. Besides reducing the space requirements of PVisRec, we investigate the accuracy obtained with the low-rank meta-feature embeddings and the resulting space-accuracy trade-off. We set $d=10$ and vary the dimensionality of the meta-feature embeddings (MFE) over $\{1, 2, 4, 8, 16\}$ across the different proposed approaches. Table 8 reports the results for these space-efficient variants. Overall, in nearly all cases we obtain similar HR@K and NDCG@K compared to the original variants, while the model is significantly more compact, requiring orders of magnitude less space. For instance, with an MFE dimensionality of 16, PVisRec achieves HR@1 of 0.616 compared to 0.630 using the original 1006-dimensional meta-feature matrix, which requires roughly 63x more space than the 16-dimensional MFE variant. As an aside, since PVisRec ($\boldsymbol{\mathrm{A}}$, $\boldsymbol{\mathrm{C}}$, $\boldsymbol{\mathrm{D}}$ only) does not use $\boldsymbol{\mathrm{M}}$, it does not have an MFE variant. These results imply that we can indeed significantly reduce the space requirements of our approaches while trading off only a tiny amount of accuracy (RQ3).
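The paper does not prescribe how the low-rank meta-feature embedding is computed; the sketch below uses a truncated SVD as one plausible choice, with toy shapes standing in for the real 1006-dimensional meta-feature matrix $\boldsymbol{\mathrm{M}}$.

```python
import numpy as np

def compress_meta_features(M, mfe_dim=16):
    """Replace the dense meta-feature matrix M with a low-rank embedding of
    mfe_dim columns, trading a little accuracy for roughly
    (original_dim / mfe_dim)x less space. Truncated SVD is one plausible
    factorization; the paper does not prescribe a specific one."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :mfe_dim] * s[:mfe_dim]      # rows keep only mfe_dim dimensions

# toy stand-in for the real 1006-dimensional meta-feature matrix M
M = np.random.rand(1000, 1006)
M_low = compress_meta_features(M, mfe_dim=16)   # ~63x fewer columns than M
```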

6.4. Neural Personalized Visualization Recommendation

In this section, we study the performance of the proposed neural personalized visualization recommendation models (RQ4). For these experiments, we use $d=10$ and $\alpha=0.5$ for Neural PVisRec-CMF. All models use three layers and ReLU as the activation function; see Section 4 for further details. The results are provided in Table 9. Both neural personalized visualization recommendation models outperform the simpler and faster graph-based PVisRec approach, across both rank-based evaluation metrics and across all top-$K$ personalized visualization recommendations with $K \in \{1, \ldots, 5\}$. This is expected since the neural models both build on the graph-based PVisRec model: Neural PVisRec uses the learned low-dimensional embeddings of the users, visual-configurations, attributes, and attribute meta-features as input to the first layer, whereas Neural PVisRec-CMF additionally combines the predicted visualization scores from the PVisRec model with the predicted scores from the neural component. In Table 9, Neural PVisRec-CMF outperforms the simpler Neural PVisRec network for both HR@K and NDCG@K, across all $K$.
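For intuition, the following is a hedged sketch of this style of neural scorer (not the authors' exact architecture from Section 4): the user, attribute, and visual-configuration embeddings are concatenated and passed through a tower of ReLU layers, and the "-CMF"-style variant blends the neural score with the graph-based PVisRec score using the weight $\alpha$. All shapes and initializations below are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def neural_score(u, v_attrs, z_cfg, hidden, w_out, graph_score=None, alpha=0.5):
    """Sketch of a neural scorer: concatenate the user embedding, the mean-pooled
    attribute embeddings, and the visual-configuration embedding, pass them
    through fully-connected ReLU layers, and read out a scalar score. If a
    graph-based PVisRec score is supplied, blend the two with weight alpha."""
    h = np.concatenate([u, np.mean(v_attrs, axis=0), z_cfg])
    for W, b in hidden:                      # tower of fully-connected ReLU layers
        h = relu(W @ h + b)
    neural = float(w_out @ h)                # final linear readout
    if graph_score is None:
        return neural
    return alpha * neural + (1.0 - alpha) * graph_score

# toy usage: d = 10, three hidden layers of sizes 32, 16, 8
d = 10
rng = np.random.default_rng(0)
sizes, in_dim, hidden = [32, 16, 8], 3 * d, []
for out_dim in sizes:
    hidden.append((rng.standard_normal((out_dim, in_dim)), np.zeros(out_dim)))
    in_dim = out_dim
w_out = rng.standard_normal(sizes[-1])
score = neural_score(rng.standard_normal(d), rng.standard_normal((3, d)),
                     rng.standard_normal(d), hidden, w_out, graph_score=0.7)
```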

Table 9. Results for the Neural Personalized Visualization Recommendation Models.
HR@K NDCG@K
Model @1 @2 @3 @4 @5 @1 @2 @3 @4 @5
Neural PVisRec 0.656 0.825 0.889 0.923 0.946 0.652 0.761 0.793 0.808 0.817
Neural PVisRec-CMF 0.762 0.879 0.922 0.944 0.961 0.729 0.822 0.845 0.855 0.861

6.4.1. Nonlinear activation function.

Neural PVisRec is flexible and can leverage any nonlinear activation function for the fully-connected layers of our multilayer neural network architecture for personalized visualization recommendation. In Table 10, we compare three nonlinear activation functions $\sigma$ for learning a personalized visualization recommendation model: hyperbolic tangent $\sigma(\boldsymbol{\mathrm{x}})=\tanh(\boldsymbol{\mathrm{x}})$, sigmoid $\sigma(\boldsymbol{\mathrm{x}})=1/(1+\exp[-\boldsymbol{\mathrm{x}}])$, and ReLU $\sigma(\boldsymbol{\mathrm{x}})=\max(0,\boldsymbol{\mathrm{x}})$. The results in Table 10 show that ReLU performs best by a large margin, followed by sigmoid and then tanh. ReLU likely performs well because it avoids saturation, handles sparse data well, and is less prone to overfitting.

Table 10. Ablation study results of Neural PVisRec with different nonlinear activation functions. We report HR@1 for brevity. All results use $d=10$ and $\alpha=0.5$.
nonlinear activation $\sigma$
Model tanh sigmoid ReLU
Neural PVisRec 0.615 0.624 0.656
Neural PVisRec-CMF 0.613 0.640 0.762
Table 11. Comparing performance of Neural PVisRec with different number of hidden layers.
HR@K NDCG@K
# Hidden Layers @1 @2 @3 @4 @5 @1 @2 @3 @4 @5
1 0.579 0.773 0.844 0.880 0.896 0.578 0.701 0.737 0.752 0.758
2 0.618 0.801 0.865 0.892 0.907 0.618 0.733 0.765 0.777 0.783
3 0.656 0.825 0.889 0.923 0.946 0.652 0.761 0.793 0.808 0.817
4 0.646 0.754 0.813 0.842 0.869 0.499 0.639 0.680 0.694 0.705

6.4.2. Hidden layers.

To understand the impact of the number of hidden layers on the performance of the neural personalized visualization recommendation models, we vary the number of hidden layers $L \in \{1, 2, 3, 4\}$. In Table 11, performance increases as additional hidden layers are included and begins to decrease at $L=4$; the best performance is achieved with three hidden layers. This result indicates the benefit of deep learning for personalized visualization recommendation.

6.4.3. Layer size.

Recall that our network structure follows a tower pattern in which the size of each successive layer is halved. In this experiment, we instead investigate larger layer sizes, fixing the final output embedding size to 8 and using 4 hidden layers. In Table 12, we observe a significant improvement in visualization ranking when using larger layer sizes.
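One way to read the three configurations in Table 12 is that, starting from the fixed 8-dimensional output layer, each successive (wider) layer multiplies the size by a constant factor (2, 4, or 6). A small helper reproducing these sizes, purely for illustration:

```python
def tower_sizes(out_dim=8, factor=2, depth=4):
    """Layer sizes growing from the fixed 8-dimensional output toward the input,
    e.g. factor=2 -> [8, 16, 32, 64] and factor=6 -> [8, 48, 288, 1728]."""
    return [out_dim * factor**i for i in range(depth)]

print(tower_sizes(factor=2), tower_sizes(factor=4), tower_sizes(factor=6))
# [8, 16, 32, 64] [8, 32, 128, 512] [8, 48, 288, 1728]
```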

Table 12. Varying layer sizes in the deep personalized visualization recommendation model (Neural PVisRec).
HR@K
Layer Sizes @1 @2 @3 @4 @5
8-16-32-64 0.701 0.790 0.832 0.865 0.883
8-32-128-512 0.734 0.797 0.846 0.874 0.886
8-48-288-1728 0.752 0.822 0.869 0.895 0.913

6.4.4. Runtime performance.

Neural PVisRec is also fast, taking on average 10.85 seconds to train using the large personalized visualization corpus from Section 5. The other neural visualization recommender is nearly as fast, as it contains only an additional step that is linear in the output embedding size. For these experiments, we used a 2017 MacBook Pro with 16GB memory and 3.1GHz Intel Core i7 processor.

7. Related Work

7.1. Visualization Recommendation

Rule-based visualization recommendation systems such as Voyager (Vartak et al., 2017; Wongsuphasawat et al., 2017, 2015), VizDeck (Perry et al., 2013), and DIVE (Hu et al., 2018) use a large set of rules defined manually by domain experts to recommend appropriate visualizations that satisfy the rules (Lee, 2020; Mackinlay, 1986; Roth et al., 1994; Casner, 1991; Mackinlay et al., 2007a; Derthick et al., 1997; Stolte et al., 2002; Feiner, 1985; Seo and Shneiderman, 2005). Such rule-based systems do not leverage any training data for learning or user personalization. A few "hybrid" approaches combine some form of learning with manually defined rules for visualization recommendation (Moritz et al., 2018), e.g., Draco learns weights for rules (constraints) (Moritz et al., 2018). Recently, there has been work on the end-to-end ML-based visualization recommendation problem (Qian et al., 2020; Dibia and Demiralp, 2019). However, this work learns a global visualization recommendation model that is agnostic of the user and therefore cannot be used for the personalized visualization recommendation problem studied in our work.

All of the existing rule-based (Vartak et al., 2017; Wongsuphasawat et al., 2017, 2015; Perry et al., 2013; Hu et al., 2018), hybrid (Moritz et al., 2018), and pure ML-based visualization recommendation (Qian et al., 2020) approaches are unable to recommend personalized visualizations for specific users. These approaches do not model users, but focus entirely on learning or manually defining visualization rules that capture the notion of an effective visualization (Mackinlay et al., 2007b; Wills and Wilkinson, 2010; Key et al., 2012; Elzen and Wijk, 2013; Wilkinson and Wills, 2008; Dang and Wilkinson, 2014; Vartak et al., 2015; Demiralp et al., 2017; Cui et al., 2019; Lin et al., 2020; Lee et al., 2019a; Siddiqui et al., 2016). Therefore, regardless of the user, such a model always produces the same recommendations. The closest existing work is VizRec (Mutlu et al., 2016). However, VizRec is only applicable when there is a single dataset shared by all users (and therefore a single small set of visualizations that the users have explicitly liked and tagged), a setting that rests on assumptions rarely met in practice. Nevertheless, the problem solved by that prior work is a simple special case of the personalized visualization recommendation problem introduced in our paper.

7.2. Simpler Design and Data Tasks

Besides visualization recommendation, there are methods that solve simpler sub-tasks such as improving expressiveness, improving perceptual effectiveness, and matching task types. These simpler sub-tasks can generally be divided into two categories (Lee, 2020; Wongsuphasawat et al., 2016): approaches that recommend data (what data to visualize), such as Discovery-driven Data Cubes (Sarawagi et al., 1998), Scagnostics (Wilkinson et al., 2005), AutoVis (Wills and Wilkinson, 2010), and MuVE (Ehsan et al., 2016), and approaches that recommend encodings (how to design and visually encode the data), such as APT (Mackinlay, 1986), Show Me (Mackinlay et al., 2007a), and Draco-Learn (Moritz et al., 2018). While some of these are ML-based, none can recommend entire visualizations (nor are they personalized), which is the focus of this work. For example, VizML (Hu et al., 2019) predicts the type of a chart (e.g., bar, scatter) instead of a complete visualization, Draco (Moritz et al., 2018) infers weights for a set of manually defined rules, and VisPilot (Lee et al., 2019b) recommends drill-down data subsets of a dataset. As an aside, not only do these works not solve the visualization recommendation problem, they are also not personalized for individual users. Instead of solving simple sub-tasks such as predicting the chart type of a visualization, we focus on the end-to-end personalized visualization recommendation problem (Sec. 2): given a dataset of interest to user $i$, the goal is to automatically recommend the top-$k$ most effective visualizations personalized for that individual user. This paper fills the gap by proposing the first personalized visualization recommendation approach that is completely automatic, data-driven, and, most importantly, recommends personalized visualizations based on a user's previous feedback, behavior, and interactions with the system.

7.3. Traditional Recommender Systems

In traditional item-based recommender systems (Adomavicius and Tuzhilin, 2005; Ricci et al., 2011; Zhao et al., 2020; Noel et al., 2012; Zhang et al., 2017), there is a single shared set of items, e.g., movies (Bennett et al., 2007; Covington et al., 2016; Harper and Konstan, 2015), products (Linden et al., 2003), hashtags (Sigurbjörnsson and Van Zwol, 2008; Wang et al., 2020), documents (Xu et al., 2020; Kanakia et al., 2019), news (Ge et al., 2020), books (Liu et al., 2014), and locations (Ye et al., 2011; Zhou et al., 2019; Bennett et al., 2011). In the personalized visualization recommendation problem studied in this work, however, visualizations are dataset dependent, so there is no shared set of visualizations to recommend to users: given $N$ datasets, there are $N$ completely disjoint sets of visualizations that can be recommended, and every dataset has its own separate set of relevant visualizations exclusive to it. Therefore, in contrast to traditional item recommender systems, the goal of personalized visualization recommendation is to learn a personalized visualization recommendation model for each individual user that can score, and ultimately recommend, personalized visualizations to that user from any unseen dataset in the future. Some recent works have adapted deep learning approaches for collaborative filtering (Sedhain et al., 2015; Li et al., 2020; He et al., 2017; Chen et al., 2020; Guan et al., 2019), but none of these focus on personalized visualization recommendation. The problem also has some similarities with cross-domain recommendation (Tang et al., 2012; Gao et al., 2013; Man et al., 2017; Shapira et al., 2013; Hu et al., 2013). However, cross-domain item recommendation typically involves only a few datasets, as opposed to the tens of thousands of different datasets in our problem (Zhao et al., 2020), and, more importantly, the different datasets are assumed to share at least one mode, whereas in personalized visualization recommendation each new dataset gives rise to a completely different set of visualizations to recommend.

8. Conclusion

In this work, we introduced the problem of user-specific personalized visualization recommendation and proposed an approach for solving it. The approach learns an individual personalized visualization recommendation model for each user, taking into account the user's implicit and explicit feedback regarding their visual and data preferences, as well as the feedback of users who have explored similar datasets and visualizations. We overcome data sparsity and limited user feedback by leveraging the data and visualization preferences of similar users, even though the visualizations of those users come from completely different datasets; this allows the approach to learn a better visualization recommendation model for each individual user. In addition, we proposed a deep neural network architecture for neural personalized visualization recommendation that can learn complex non-linear relationships between the users, their attributes of interest, and their visualization preferences. This paper is a first step toward learning personalized visualization recommendation models for individual users based on their data and visualization feedback, together with the preferences of users with similar data and visual preferences. Future work should investigate better machine learning models and learning techniques to further improve the personalized visualization recommendation models and the visualization recommendations for individual users.

References

  • Adomavicius and Tuzhilin (2005) Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. TKDE 17, 6 (2005), 734–749.
  • Balasubramaniam et al. (2020) Thirunavukarasu Balasubramaniam, Richi Nayak, Chau Yuen, and Yu-Chu Tian. 2020. Column-wise element selection for computationally efficient nonnegative coupled matrix tensor factorization. IEEE Transactions on Knowledge and Data Engineering (2020).
  • Bennett et al. (2007) James Bennett, Stan Lanning, et al. 2007. The netflix prize. In KDD Cup. 35.
  • Bennett et al. (2011) Paul N Bennett, Filip Radlinski, Ryen W White, and Emine Yilmaz. 2011. Inferring and using location metadata to personalize web search. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. 135–144.
  • Bouchard et al. (2013) Guillaume Bouchard, Dawei Yin, and Shengbo Guo. 2013. Convex collective matrix factorization. In AISTATS. PMLR, 144–152.
  • Casner (1991) Stephen M Casner. 1991. Task-Analytic Approach to the Automated Design of Graphic Presentations. ACM Transactions on Graphics (ToG) 10, 2 (1991), 111–151.
  • Chen et al. (2020) Chong Chen, Min Zhang, Yongfeng Zhang, Yiqun Liu, and Shaoping Ma. 2020. Efficient neural matrix factorization without sampling for recommendation. ACM Transactions on Information Systems (TOIS) 38, 2 (2020), 1–28.
  • Choi et al. (2019) Dongjin Choi, Jun-Gi Jang, and U Kang. 2019. S3 CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization. PloS one 14, 6 (2019), e0217316.
  • Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In RecSys. 191–198.
  • Cui et al. (2019) Zhe Cui, Sriram Karthik Badam, M Adil Yalçin, and Niklas Elmqvist. 2019. Datasite: Proactive Visual Data Exploration With Computation of Insight-Based Recommendations. Information Visualization 18, 2 (2019), 251–267.
  • Dang and Wilkinson (2014) Tuan Nhon Dang and Leland Wilkinson. 2014. ScagExplorer: Exploring Scatterplots by Their Scagnostics. In 2014 IEEE Pacific visualization symposium. IEEE, 73–80.
  • Demiralp et al. (2017) Çağatay Demiralp, Peter J Haas, Srinivasan Parthasarathy, and Tejaswini Pedapati. 2017. Foresight: Recommending Visual Insights. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 10.
  • Derthick et al. (1997) Mark Derthick, John Kolojejchick, and Steven F Roth. 1997. An interactive visualization environment for data exploration. In KDD. 2–9.
  • Dibia and Demiralp (2019) Victor Dibia and Çağatay Demiralp. 2019. Data2vis: Automatic generation of data visualizations using sequence-to-sequence recurrent neural networks. IEEE computer graphics and applications 39, 5 (2019), 33–46.
  • Ehsan et al. (2016) Humaira Ehsan, Mohamed Sharaf, and Panos Chrysanthis. 2016. Muve: Efficient multi-objective view recommendation for visual data exploration. In ICDE.
  • Elzen and Wijk (2013) Stef van den Elzen and Jarke J. van Wijk. 2013. Small Multiples, Large Singles: A New Approach for Visual Data Exploration. In Computer Graphics Forum, Vol. 32. 191–200.
  • Feiner (1985) Steven Feiner. 1985. APEX: An Experiment in the Automated Creation of Pictorial Explanations. IEEE Computer Graphics and Applications 5, 11 (1985), 29–37.
  • Gao et al. (2013) Sheng Gao, Hao Luo, Da Chen, Shantao Li, Patrick Gallinari, and Jun Guo. 2013. Cross-domain recommendation via cluster-level latent factor model. In Joint European conference on machine learning and knowledge discovery in databases. Springer, 161–176.
  • Ge et al. (2020) Suyu Ge, Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020. Graph Enhanced Representation Learning for News Recommendation. In WWW.
  • Guan et al. (2019) Xinyu Guan, Zhiyong Cheng, Xiangnan He, Yongfeng Zhang, Zhibo Zhu, Qinke Peng, and Tat-Seng Chua. 2019. Attentive aspect modeling for review-aware recommendation. ACM Transactions on Information Systems (TOIS) 37, 3 (2019), 1–27.
  • Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. TIIS 5, 4 (2015), 1–19.
  • He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW. 173–182.
  • He et al. (2016) Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 549–558.
  • Hu et al. (2019) Kevin Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, and César Hidalgo. 2019. VizML: A Machine Learning Approach to Visualization Recommendation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300358
  • Hu et al. (2018) Kevin Hu, Diana Orghian, and César Hidalgo. 2018. Dive: A mixed-initiative system supporting integrated data exploration workflows. In Workshop on Human-In-the-Loop Data Anal. 1–7.
  • Hu et al. (2013) Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, and Can Zhu. 2013. Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd International Conference on World Wide Web. 595–606.
  • Kanakia et al. (2019) Anshul Kanakia, Zhihong Shen, Darrin Eide, and Kuansan Wang. 2019. A scalable hybrid research paper recommender system for microsoft academic. In WWW.
  • Key et al. (2012) Alicia Key, Bill Howe, Daniel Perry, and Cecilia Aragon. 2012. VizDeck: Self-Organizing Dashboards for Visual Analytics. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 681–684.
  • Kim et al. (2014) Jingu Kim, Yunlong He, and Haesun Park. 2014. Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Journal of Global Optimization 58, 2 (2014), 285–319.
  • Lee (2020) Doris Jung-Lin Lee. 2020. Insight Machines: The Past, Present, and Future of Visualization Recommendation. (August 2020).
  • Lee et al. (2019a) Doris Jung-Lin Lee, Himel Dev, Huizi Hu, Hazem Elmeleegy, and Aditya Parameswaran. 2019a. Avoiding Drill-Down Fallacies With VisPilot: Assisted Exploration of Data Subsets. In Proceedings of the 24th International Conference on Intelligent User Interfaces. 186–196.
  • Lee et al. (2019b) Doris Jung-Lin Lee, Himel Dev, Huizi Hu, Hazem Elmeleegy, and Aditya Parameswaran. 2019b. Avoiding drill-down fallacies with VisPilot: assisted exploration of data subsets. In IUI. 186–196.
  • Li et al. (2020) Xiangsheng Li, Maarten de Rijke, Yiqun Liu, Jiaxin Mao, Weizhi Ma, Min Zhang, and Shaoping Ma. 2020. Learning Better Representations for Neural Information Retrieval with Graph Information. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 795–804.
  • Lin et al. (2020) Halden Lin, Dominik Moritz, and Jeffrey Heer. 2020. Dziban: Balancing Agency & Automation in Visualization Design via Anchored Recommendations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.
  • Linden et al. (2003) Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. Internet Computing 7, 1 (2003), 76–80.
  • Liu et al. (2014) Yidan Liu, Min Xie, and Laks VS Lakshmanan. 2014. Recommending user generated item lists. In RecSys. 185–192.
  • Mackinlay (1986) Jock Mackinlay. 1986. Automating the design of graphical presentations of relational information. ACM Trans. Graph. 5, 2 (1986), 110–141.
  • Mackinlay et al. (2007a) Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007a. Show Me: Automatic presentation for visual analysis. TVCG 13, 6 (2007), 1137–1144.
  • Mackinlay et al. (2007b) Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007b. Show Me: Automatic Presentation for Visual Analysis. IEEE transactions on visualization and computer graphics 13, 6 (2007), 1137–1144.
  • Man et al. (2017) Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach.. In IJCAI. 2464–2470.
  • Moritz et al. (2018) Dominik Moritz, Chenglong Wang, Greg L Nelson, Halden Lin, Adam M Smith, Bill Howe, and Jeffrey Heer. 2018. Formalizing visualization design knowledge as constraints: Actionable and extensible models in draco. IEEE transactions on visualization and computer graphics 25, 1 (2018), 438–448.
  • Mutlu et al. (2016) Belgin Mutlu, Eduardo Veas, and Christoph Trattner. 2016. Vizrec: Recommending personalized visualizations. ACM Transactions on Interactive Intelligent Systems (TIIS) 6, 4 (2016), 1–39.
  • Noel et al. (2012) Joseph Noel, Scott Sanner, Khoi-Nguyen Tran, Peter Christen, Lexing Xie, Edwin V Bonilla, Ehsan Abbasnejad, and Nicolás Della Penna. 2012. New objective functions for social collaborative filtering. In Proceedings of the 21st International Conference on World Wide Web. 859–868.
  • Oh et al. (2015) Jinoh Oh, Wook-Shin Han, Hwanjo Yu, and Xiaoqian Jiang. 2015. Fast and robust parallel SGD matrix factorization. In SIGKDD. ACM, 865–874.
  • Perry et al. (2013) Daniel B Perry, Bill Howe, Alicia MF Key, and Cecilia Aragon. 2013. VizDeck: Streamlining exploratory visual analytics of scientific data. (2013).
  • Qian et al. (2020) Xin Qian, Ryan A. Rossi, Fan Du, Sungchul Kim, Eunyee Koh, Sana Malik, Tak Yeon Lee, and Joel Chan. 2020. ML-based Visualization Recommendation: Learning to Recommend Visualizations from Data. arXiv:2009.12316 (2020).
  • Ricci et al. (2011) Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Rec. Sys. handbook. 1–35.
  • Rossi and Zhou (2016) Ryan A. Rossi and Rong Zhou. 2016. Parallel Collective Factorization for Modeling Large Heterogeneous Networks. In Social Network Analysis and Mining. 30.
  • Roth et al. (1994) Steven F Roth, John Kolojejchick, Joe Mattis, and Jade Goldstein. 1994. Interactive graphic design using automatic presentation knowledge. In CHI. 112–117.
  • Sarawagi et al. (1998) Sunita Sarawagi, Rakesh Agrawal, and Nimrod Megiddo. 1998. Discovery-driven exploration of OLAP data cubes. In Extending Database Tech. 168–182.
  • Satyanarayan et al. (2016) Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2016. Vega-lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2016), 341–350.
  • Schenker et al. (2021) Carla Schenker, Jeremy E Cohen, and Evrim Acar. 2021. An optimization framework for regularized linearly coupled matrix-tensor factorization. In 28th European Signal Processing Conference (EUSIPCO). 985–989.
  • Sedhain et al. (2015) Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web. 111–112.
  • Seo and Shneiderman (2005) Jinwook Seo and Ben Shneiderman. 2005. A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Information visualization 4, 2 (2005), 96–113.
  • Shapira et al. (2013) Bracha Shapira, Lior Rokach, and Shirley Freilikhman. 2013. Facebook single and cross domain data for recommendation systems. User Modeling and User-Adapted Interaction 23, 2-3 (2013), 211–247.
  • Siddiqui et al. (2016) Tarique Siddiqui, Albert Kim, John Lee, Karrie Karahalios, and Aditya Parameswaran. 2016. Effortless Data Exploration With zenvisage: An Expressive and Interactive Visual Analytics System. arXiv preprint arXiv:1604.03583 (2016).
  • Sigurbjörnsson and Van Zwol (2008) Börkur Sigurbjörnsson and Roelof Van Zwol. 2008. Flickr tag recommendation based on collective knowledge. In WWW. 327–336.
  • Singh and Gordon (2008) Ajit P Singh and Geoffrey J Gordon. 2008. Relational learning via collective matrix factorization. In KDD. 650–658.
  • Stolte et al. (2002) Chris Stolte, Diane Tang, and Pat Hanrahan. 2002. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. TVCG 8, 1 (2002), 52–65.
  • Tang et al. (2012) Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. 2012. Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 1285–1293.
  • Vartak et al. (2017) Manasi Vartak, Silu Huang, Tarique Siddiqui, Samuel Madden, and Aditya Parameswaran. 2017. Towards visualization recommendation systems. ACM SIGMOD 45, 4 (2017), 34–39.
  • Vartak et al. (2015) Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, and Neoklis Polyzotis. 2015. Seedb: Efficient data-driven visualization recommendations to support visual analytics. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 8. NIH Public Access, 2182.
  • Wang et al. (2020) Xueting Wang, Yiwei Zhang, and Toshihiko Yamasaki. 2020. Earn More Social Attention: User Popularity Based Tag Recommendation System. In WWW.
  • Wilkinson et al. (2005) Leland Wilkinson, Anushka Anand, and Robert Grossman. 2005. Graph-theoretic scagnostics. In IEEE Symposium on Information Visualization. 157–164.
  • Wilkinson and Wills (2008) Leland Wilkinson and Graham Wills. 2008. Scagnostics Distributions. Journal of Computational and Graphical Statistics 17, 2 (2008), 473–491.
  • Wills and Wilkinson (2010) Graham Wills and Leland Wilkinson. 2010. Autovis: automatic visualization. Information Visualization 9, 1 (2010), 47–69.
  • Wongsuphasawat et al. (2015) Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2015. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE transactions on visualization and computer graphics 22, 1 (2015), 649–658.
  • Wongsuphasawat et al. (2016) Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2016. Towards a general-purpose query language for visualization recommendation. In Workshop on Human-In-the-Loop Data Anal.
  • Wongsuphasawat et al. (2017) Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2017. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2648–2659.
  • Xu et al. (2020) Xuhai Xu, Ahmed Hassan Awadallah, Susan T. Dumais, Farheen Omar, Bogdan Popp, Robert Rounthwaite, and Farnaz Jahanbakhsh. 2020. Understanding User Behavior For Document Recommendation. In WWW. 3012–3018.
  • Ye et al. (2011) Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik-Lun Lee. 2011. Exploiting Geo. Influence for Collaborative Point-of-Interest Recommendation. In SIGIR.
  • Yun et al. (2014) Hyokun Yun, Hsiang-Fu Yu, Cho-Jui Hsieh, SVN Vishwanathan, and Inderjit Dhillon. 2014. NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion. VLDB 7, 11 (2014), 975–986.
  • Zhang et al. (2017) Yongfeng Zhang, Qingyao Ai, Xu Chen, and W Bruce Croft. 2017. Joint representation learning for top-n recommendation with heterogeneous information sources. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1449–1458.
  • Zhao et al. (2020) Cheng Zhao, Chenliang Li, Rong Xiao, Hongbo Deng, and Aixin Sun. 2020. CATN: Cross-domain recommendation for cold-start users via aspect transfer network. In SIGIR. 229–238.
  • Zhou et al. (2019) Fan Zhou, Ruiyang Yin, Kunpeng Zhang, Goce Trajcevski, Ting Zhong, and Jin Wu. 2019. Adversarial point-of-interest recommendation. In WWW. 3462–34618.