This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Simultaneous Reduction of Number of Spots and Energy Layers in Intensity Modulated Proton Therapy for Rapid Spot Scanning Delivery

Anqi Fu    Vicki T. Taasti    Masoud Zarepisheh
Abstract

Background: Reducing proton treatment time improves patient comfort and decreases the risk of error from intra-fractional motion, but must be balanced against clinical goals and treatment plan quality.
Purpose: To improve the delivery efficiency of spot scanning proton therapy by simultaneously reducing the number of spots and energy layers using the reweighted l1l_{1} regularization method.
Methods: We formulated the proton treatment planning problem as a convex optimization problem with a cost function consisting of a dosimetric plan quality term plus a weighted l1l_{1} regularization term. We iteratively solved this problem and adaptively updated the regularization weights to promote the sparsity of both the spots and energy layers. The proposed algorithm was tested on four head-and-neck cancer patients, and its performance, in terms of reducing the number of spots and energy layers, was compared with existing standard l1l_{1} and group l2l_{2} regularization methods. We also compared the effectiveness of the three methods (l1l_{1}, group l2l_{2}, and reweighted l1l_{1}) at improving plan delivery efficiency without compromising dosimetric plan quality by constructing each of their Pareto surfaces charting the trade-off between plan delivery and plan quality.
Results: The reweighted l1l_{1} regularization method reduced the number of spots and energy layers by an average over all patients of 40%40\% and 35%35\%, respectively, with an insignificant cost to dosimetric plan quality. From the Pareto surfaces, it is clear that reweighted l1l_{1} provided a better trade-off between plan delivery efficiency and dosimetric plan quality than standard l1l_{1} or group l2l_{2} regularization, requiring the lowest cost to quality to achieve any given level of delivery efficiency.
Conclusions: Reweighted l1l_{1} regularization is a powerful method for simultaneously promoting the sparsity of spots and energy layers at a small cost to dosimetric plan quality. This sparsity reduces the time required for spot scanning and energy layer switching, thereby improving the delivery efficiency of proton plans.

1 Introduction

Intensity modulated proton therapy (IMPT) is typically delivered via the pencil beam scanning technique. The patient is irradiated by a sequence of proton spots, arranged laterally to cover the treatment volume, where the depth of penetration of each spot is determined by its energy layer. During IMPT, protons are transmitted spot-by-spot within every energy layer, and layer-by-layer within every beam over a set of fixed-angle beams. The total plan delivery time is roughly equal to the total switching time between beams (gantry rotation plus beam setup time), switching time between energy layers, travel time between spots, and dose delivery time at each spot [GCM+20, ZSL+22].

In this study, we seek to reduce IMPT delivery time by reducing the number of proton spots and energy layers. A shorter treatment time is desirable because it improves patient comfort, increases patient throughput (hence lowering treatment costs), and decreases the risk of error due to intra-fractional motion [LZZ15, SPS+16, MCN+20]. Simultaneously, we want to ensure clinical goals are met and the quality of the treatment plan remains uncompromised. This trade-off between delivery time and plan quality has been the subject of abundant research.

One thread of research focuses on greedy algorithms for energy layer assignment. These algorithms combine a variety of techniques for control point sampling, energy layer distribution, energy layer filtration, and spot optimization [DLZ+16]. The goal is to reduce energy layer switching time by pruning the number of energy layers and sequencing them so layer switches only occur from low-to-high energy level [LLZ+20, EBW+22].

Another strand of research takes a structured optimization-based approach. For example, Müller and Wilkens (2016) [MW16] directly minimize the sum of spot intensities as part of a prioritized optimization routine. Other authors formulate the proton treatment delivery problem as a mixed-integer program (MIP), where each energy layer [CLL+14] or path between layers [WSGJ+22, WZSG+22] is associated with a binary indicator variable. Their objective is to minimize the dose fidelity (e.g., over/underdose to the target) plus some penalty or constraints on the energy layers, which promote a lower switching time. Although mathematically elegant, these MIPs are computationally difficult to solve, as they scale poorly with the number of energy layers due to the combinatorial nature of the problem.

To avoid this issue, researchers turned their attention to continuous optimization models. A proton treatment planning problem in this category only contains continuous variables, like spot intensities and doses. Typically, the objective includes a regularization function that is selected to encourage sparsity (i.e., more zero elements) in the spots and energy layers. The regularizer applies a penalty to the total spot intensity within each layer group. A variety of options have been proposed for this penalty function: logarithm [vdWKHH15, vdWBA+20], l2,1/2l_{2,1/2} norm [GRL+20], and l2l_{2} norm [JHP+18, LCL+19, GCM+20, WSGJ+22]. The last of these is of particular interest because it is convex and widely used in statistics for promoting group sparsity; the associated regularizer is known as the group lasso [YL06, MvdGB08, LH15, IPR16].

The standard lasso (i.e., l1l_{1} norm penalty) promotes sparsity of the spot intensities, but does not directly penalize the energy layers. The group lasso (i.e., group l2l_{2} norm penalty) promotes sparsity of the energy layers, but actually increases the number of nonzero spots. Since IMPT delivery time depends on both the number of spots and energy layers [GCM+20], neither of these regularizers is ideal. In this paper, we propose a new regularization method that simultaneously reduces the number of nonzero spots and energy layers, while upholding treatment plan quality. Our reweighted l1l_{1} method combines the l1l_{1} penalty from standard lasso with a weighting mechanism that differentiates between the spots of different energy layers, similar to the group lasso. We test the proposed method on four head-and-neck cancer patients and demonstrate its ability to 1) reduce the number of spots and energy layers simultaneously, and 2) provide a better trade-off between dosimetric plan quality and plan delivery efficiency than existing regularization methods (i.e., standard l1l_{1} and group lasso).

2 Methods and materials

2.1 Problem formulation

We discretize the patient’s body into mm voxels and the proton beams into nn spots. For each spot j{1,,n}j\in\{1,\ldots,n\}, we calculate the radiation dose delivered by a unit intensity of that spot to voxel i{1,,m}i\in\{1,\ldots,m\} and call this value AijA_{ij}. The dose influence matrix is then A𝐑+m×nA\in\mathbf{R}_{+}^{m\times n}, where its rows correspond to the voxels and its columns to the spots. Let p𝐑+mp\in\mathbf{R}_{+}^{m} be the prescription vector, i.e., pip_{i} equals the physician-prescribed dose if ii is a target voxel and zero otherwise. We concatenate the spot intensities of all beams and all energy layers within each beam, denoting this collective as the vector x𝐑nx\in\mathbf{R}^{n}. The typical treatment planning problem seeks a vector of spot intensities xx that minimizes the deviation of the delivered dose, d=Axd=Ax, from the prescription pp. This deviation can be decomposed into a penalty on the overdose, d¯=(dp)+=max(dp,0)\overline{d}=(d-p)_{+}=\max(d-p,0), and the underdose, d¯=(dp)=min(dp,0)\underline{d}=(d-p)_{-}=-\min(d-p,0), which we combine to form the cost function

f(d¯,d¯)=i=1mw¯id¯i2+w¯id¯i2,f(\overline{d},\underline{d})=\sum_{i=1}^{m}\overline{w}_{i}\overline{d}_{i}^{2}+\underline{w}_{i}\underline{d}_{i}^{2}, (1)

where w¯,w¯𝐑+n\overline{w},\underline{w}\in\mathbf{R}_{+}^{n} are penalty parameters that determine the relative importance of the over/underdose to the treatment plan. (Note the underdose is ignored for non-target voxels ii because pi=0p_{i}=0).

Dose constraints are defined for each anatomical structure. For a given structure ss, let As𝐑+ms×nA^{s}\in\mathbf{R}_{+}^{m_{s}\times n} be the row slice of AA containing only the rows of the msm_{s} voxels in ss. A maximum dose constraint takes the form of AsxdsmaxA^{s}x\leq d_{s}^{\textrm{\tiny max}}, where dsmaxd_{s}^{\textrm{\tiny max}} is an upper bound. Similarly, a mean dose constraint is of the form 1ms𝟏TAsxdsmean\frac{1}{m_{s}}\mathbf{1}^{T}A^{s}x\leq d_{s}^{\textrm{\tiny mean}}. By stacking the constraint matrices/vectors for all SS structures, we can represent the set of dose constraints as a single linear inequality BxcBx\leq c, where B=[A1,1m1𝟏TA1,,AS,1mS𝟏TAS]B=[A^{1},\frac{1}{m_{1}}\mathbf{1}^{T}A^{1},\ldots,A^{S},\frac{1}{m_{S}}\mathbf{1}^{T}A^{S}] and c=[d1max,d1mean,,dSmax,dSmean]c=[d_{1}^{\textrm{\tiny max}},d_{1}^{\textrm{\tiny mean}},\ldots,d_{S}^{\textrm{\tiny max}},d_{S}^{\textrm{\tiny mean}}]. Then, our treatment planning problem is

minimizef(d¯,d¯)subject tod¯=(Axp)+,d¯=(Axp),Bxcx0,d¯0,d¯0\begin{array}[]{ll}\mbox{minimize}&f(\overline{d},\underline{d})\\ \mbox{subject to}&\overline{d}=(Ax-p)_{+},\quad\underline{d}=(Ax-p)_{-},\quad Bx\leq c\\ &x\geq 0,\quad\overline{d}\geq 0,\quad\underline{d}\geq 0\end{array} (2)

with variables x𝐑n,d¯𝐑mx\in\mathbf{R}^{n},\overline{d}\in\mathbf{R}^{m}, and d¯𝐑m\underline{d}\in\mathbf{R}^{m}. Since the objective function ff is monotonically increasing in d¯\overline{d} and d¯\underline{d} over the nonnegative reals, we can write this problem equivalently as

minimizef(d¯,d¯)subject toAxd¯+d¯=p,Bxcx0,d¯0,d¯0.\begin{array}[]{ll}\mbox{minimize}&f(\overline{d},\underline{d})\\ \mbox{subject to}&Ax-\overline{d}+\underline{d}=p,\quad Bx\leq c\\ &x\geq 0,\quad\overline{d}\geq 0,\quad\underline{d}\geq 0.\end{array} (3)

(The derivation is provided in appendix A). Problem 3 is a convex quadratic program (QP), hence can be solved using standard convex methods, e.g., the alternating direction method of multipliers (ADMM) [BPC+11, FTZ23] or interior-point methods [Wri97, Gor22]. The reader is referred to S. Boyd and L. Vandenberghe (2004) [BV04] and J. Nocedal and S. J. Wright (2006) [NW06] for a thorough discussion of convex optimization.

2.2 Common regularizers

The cost function defined in 1 focuses solely on the difference of the delivered dose from the prescription, i.e., the dosimetric plan quality. However, in our treatment scenario, we are also interested in reducing the dose delivery time, i.e., increasing the plan delivery efficiency. The delivery time is positively correlated with the number of nonzero spots (spot scanning rate) and nonzero energy layers (energy switching time) [GCM+20, PEL+18]. Thus, we want to augment the objective of problem 3 with a regularization function r:𝐑n𝐑r:\mathbf{R}^{n}\rightarrow\mathbf{R}, which penalizes the spot vector xx in a way that reduces the number of nonzero spots/layers, while maintaining high plan quality. The regularized treatment planning problem is

minimizef(d¯,d¯)+λr(x)subject toAxd¯+d¯=p,Bxcx0,d¯0,d¯0\begin{array}[]{ll}\mbox{minimize}&f(\overline{d},\underline{d})+\lambda r(x)\\ \mbox{subject to}&Ax-\overline{d}+\underline{d}=p,\quad Bx\leq c\\ &x\geq 0,\quad\overline{d}\geq 0,\quad\underline{d}\geq 0\end{array} (4)

with respect to x,d¯x,\overline{d}, and d¯\underline{d}. Here we have introduced a regularization weight λ0\lambda\geq 0 to balance the trade-off between dosimetric plan quality, represented by the cost f(d¯,d¯)f(\overline{d},\underline{d}), and plan delivery efficiency, as captured by the regularization term r(x)r(x). A larger value of λ\lambda places more importance on efficiency.

In the following subsections, we review a few regularization functions that have been suggested in the literature. Let 𝒥={1,,n}\mathcal{J}=\{1,\ldots,n\} and 𝒢={𝒥1,,𝒥G}\mathcal{G}=\{\mathcal{J}_{1},\ldots,\mathcal{J}_{G}\} be a set of subsets of 𝒥\mathcal{J}, where each 𝒥g𝒥\mathcal{J}_{g}\subseteq\mathcal{J} has exactly ngnn_{g}\leq n elements. Specifically in our setting, 𝒢\mathcal{G} represents a partition of nn spots into GG energy layers with 𝒥g\mathcal{J}_{g} containing the indices of the ngn_{g} spots in layer gg.

2.2.1 l0l_{0} regularizer

One method of reducing the delivery time is to directly penalize the number of nonzero spots. This can be accomplished via the l0l_{0} regularizer

r0(x)=x0=𝐜𝐚𝐫𝐝({j:xj0}),r_{0}(x)=\|x\|_{0}=\mathbf{card}(\{j:x_{j}\neq 0\}), (5)

which we have defined as the number of nonzero elements in xx. (Here 𝐜𝐚𝐫𝐝(A)\mathbf{card}(A) denotes the cardinality of set AA). Unfortunately, the l0l_{0} regularization function is computationally expensive to implement. To solve problem 4 with r=r0r=r_{0}, we would need to solve a series of large mixed-integer programs in order to determine the optimal subset of nonzero spots out of all possible combinations from 𝒥\mathcal{J} [BKM16]. As the number of spots nn is typically very large (on the order of 10310^{3} to 10410^{4}), this quickly becomes computationally intractable.

Another option is to apply the l0l_{0} regularizer to the energy layers:

r~0(x)=[j𝒥1xj,,j𝒥Gxj]0=𝐜𝐚𝐫𝐝({g:jgxj0}).\tilde{r}_{0}(x)=\left\|\left[\sum_{j\in\mathcal{J}_{1}}x_{j},\ldots,\sum_{j\in\mathcal{J}_{G}}x_{j}\right]\right\|_{0}=\mathbf{card}\left(\left\{g:\sum_{j\in\mathcal{I}_{g}}x_{j}\neq 0\right\}\right). (6)

In this case, r~0\tilde{r}_{0} returns the number of nonzero energy layers, where a layer gg is zero if and only if all its spots are zero, i.e., j𝒥gxj=0\sum_{j\in\mathcal{J}_{g}}x_{j}=0. The associated combinatorial problem or MIP simplifies to finding the optimal subset of nonzero layers, which is more manageable since GG is typically on the order of 10210^{2}. Cao et al. (2014) [CLL+14] developed an iterative method to solve an approximation of this MIP and were able to reduce the number of proton energies in their IMPT plan, while satisfying certain dosimetric criteria. Nevertheless, as combinatorial optimization is still expensive, we turn our attention to a different regularization function.

2.2.2 l1l_{1} regularizer

A common approximation of the l0l_{0} regularizer is the l1l_{1} norm. Define the l1l_{1} regularization function to be

r1(x)=x1=j=1n|xj|.r_{1}(x)=\|x\|_{1}=\sum_{j=1}^{n}|x_{j}|. (7)

This function is closed, convex, and continuous. When used as a regularizer in problem 4, it produces a convex optimization problem – a form of the lasso problem – that promotes sparsity in the solution vector xx [Tib96]. The lasso problem is well-studied in the literature [VBL13, HTW15], and various methods have been developed to solve it quickly and efficiently [PH07, SFR07, WL08, SYOS10].

One downside of the l1l_{1} regularizer is that it does not differentiate between energy layers and thus is insensitive to the number of layers: since the spot vector is nonnegative, the absolute value of its elements |xj|=xj|x_{j}|=x_{j}, and any sum over energy layers decouples into the sum over all spots g=1Gj𝒥g|xj|=j=1nxj\sum_{g=1}^{G}\sum_{j\in\mathcal{J}_{g}}|x_{j}|=\sum_{j=1}^{n}x_{j}. As energy layer switching time is typically longer than spot delivery or travel time, l1l_{1} regularization is not the most effective method for improving plan delivery efficiency.

2.2.3 Group l2l_{2} regularizer

The group l2l_{2} regularizer, also known as the group lasso, provides an alternative method for efficiently implementing group penalties. This regularization function is defined as

r2(x)=g=1G1ng{xj:j𝒥g}2=g=1G1ngj𝒥gxj2.r_{2}(x)=\sum_{g=1}^{G}\frac{1}{\sqrt{n_{g}}}\|\{x_{j}:j\in\mathcal{J}_{g}\}\|_{2}=\sum_{g=1}^{G}\sqrt{\frac{1}{n_{g}}\sum_{j\in\mathcal{J}_{g}}x_{j}^{2}}. (8)

It is the sum of the l2l_{2} norm of the vector corresponding to each group (i.e., energy layer), weighted by the reciprocal of the square root of the total number of group elements [YL06]. (The weights 1ng\frac{1}{\sqrt{n_{g}}} may differ across applications; see N. Simon and R. Tibshirani (2012) [ST12] for alternatives). Group lasso has been widely researched in the context of statistical analysis and regression [MvdGB08, JOV09, LH15, IPR16, MSH07], and many algorithms exist for solving the associated optimization problem effectively [Rak11, QSG13, YZ15]. Jensen et al. (2018) [JHP+18] employed a version of the group lasso to perform adaptive IMPT energy layer optimization.

In our proton treatment delivery scenario, the group lasso is capable of differentiating between energy layers, and thus is a good regularizer for reducing the number of nonzero layers. However, it also tends to increase the number of nonzero spots within the active layers, as the quadratic penalty term in equation 8 mostly ignores small spot intensities (square of a small xjx_{j} is near zero). This failure to generate sparsity in the spot vector due to the characteristics of the l2l_{2} norm make it an inadequate regularizer for our purposes.

2.3 Reweighted l1l_{1} method

As we have discussed, the l1l_{1} regularization function promotes sparsity of the spots, but not the energy layers. The group l2l_{2} regularization function promotes sparsity of the layers, but not the spots – indeed, it tends to produce dense spot vectors due to the l2l_{2} norm. In this section, we introduce the reweighted l1l_{1} regularization method, which promotes sparsity in both the spots and the energy layers.

The reweighted l1l_{1} method assigns a weight to every spot based on its intensity and energy layer. The weights are chosen to counteract the intensity of each layer, so that all spots, regardless of intensity, contribute roughly equally to the total regularization penalty. This is done to imitate the “ideal” group l0l_{0} regularizer (equation 6), which counts every nonzero layer as one unit (due to 𝐜𝐚𝐫𝐝\mathbf{card}) regardless of intensity.

Formally, we define the weighted group l1l_{1} regularizer

r3(x;β)=g=1Gβgj𝒥g|xj|r_{3}(x;\beta)=\sum_{g=1}^{G}\beta_{g}\sum_{j\in\mathcal{J}_{g}}|x_{j}| (9)

with weight parameters βg𝐑+{+}\beta_{g}\in\mathbf{R}_{+}\cup\{+\infty\} for g=1,,Gg=1,\ldots,G. An intuitive way to set βg\beta_{g} is to make it inversely proportional to the optimal total intensity of energy layer gg, i.e.,

βg={1j𝒥gxjj𝒥gxj0+j𝒥gxj=0,\beta_{g}=\begin{cases}\frac{1}{\sum_{j\in\mathcal{J}_{g}}x_{j}^{\star}}&\sum_{j\in\mathcal{J}_{g}}x_{j}^{\star}\neq 0\\ +\infty&\sum_{j\in\mathcal{J}_{g}}x_{j}^{\star}=0\end{cases},

where x𝐑+nx^{\star}\in\mathbf{R}_{+}^{n} is an optimal spot vector. However, we do not know xx^{\star} beforehand. We will approximate this weighting scheme iteratively using the reweighted l1l_{1} method.

The reweighted l1l_{1} method is a type of majorization-minimization (MM) algorithm, which solves an optimization problem by iteratively minimizing a surrogate function that majorizes the actual objective function. MM algorithms have a rich history in the literature [FBDN07, SSW10, Mai15, SBP17], and reweighted l1l_{1} in particular has been used to solve problems in portfolio optimization [LFB07], matrix rank minimization [Faz02, FHB03], and sparse signal recovery [CWB08]. Research has shown that it is fast and robust, outperforming standard l1l_{1} regularization in a variety of settings.

We now describe the reweighted l1l_{1} method in our treatment planning setting. Let the initial weights β(1)=𝟏\beta^{(1)}=\mathbf{1}. At each iteration k=1,2,k=1,2,\ldots,

  1. 1.

    Set x(k)x^{(k)} to a solution of

    minimizef(d¯,d¯)+λr3(x;β(k))subject toAxd¯+d¯=p,Bxcx0,d¯0,d¯0.\begin{array}[]{ll}\mbox{minimize}&f(\overline{d},\underline{d})+\lambda r_{3}(x;\beta^{(k)})\\ \mbox{subject to}&Ax-\overline{d}+\underline{d}=p,\quad Bx\leq c\\ &x\geq 0,\quad\overline{d}\geq 0,\quad\underline{d}\geq 0.\end{array} (10)
  2. 2.

    Compute the total intensity of each energy layer eg(k)=j𝒥gxj(k)e_{g}^{(k)}=\sum_{j\in\mathcal{J}_{g}}x_{j}^{(k)}.

  3. 3.

    Lower threshold the solution

    e~g(k)=max(eg(k),ϵ(k)),g=1,,G,\tilde{e}_{g}^{(k)}=\max(e_{g}^{(k)},\epsilon^{(k)}),\quad g=1,\ldots,G,

    where ϵ(k)=δmaxgeg(k)\epsilon^{(k)}=\delta\max_{g^{\prime}}e_{g^{\prime}}^{(k)} for some small δ(0,1)\delta\in(0,1).

  4. 4.

    Update the weights. First, compute the standardized reciprocals

    αg(k)=(1e~g(k))/(g=1G1e~g(k)),g=1,,G.\alpha_{g}^{(k)}=\left(\frac{1}{\tilde{e}_{g}^{(k)}}\right)\bigg{/}\left(\sum_{g^{\prime}=1}^{G}\frac{1}{\tilde{e}_{g^{\prime}}^{(k)}}\right),\quad g=1,\ldots,G. (11)

    Then, calculate the scaling term

    μ(k)=g=1Ge~g(k)/(g=1Gαg(k)e~g(k)).\mu^{(k)}=\sum_{g=1}^{G}\tilde{e}_{g}^{(k)}\bigg{/}\left(\sum_{g=1}^{G}\alpha_{g}^{(k)}\tilde{e}_{g}^{(k)}\right). (12)

    The new weights are βg(k+1)=μ(k)αg(k)\beta_{g}^{(k+1)}=\mu^{(k)}\alpha_{g}^{(k)}.

  5. 5.

    Terminate on convergence of the objective, or when kk reaches a user-defined maximum number of iterations KK.

Step 3 was introduced to ensure stability of the algorithm, so that a zero energy layer estimate eg(k)=0e_{g}^{(k)}=0 would not preclude the subsequent estimate eg(k+1)e_{g}^{(k+1)} from being nonzero. In our computational experiments, we found that a threshold fraction of δ=0.01\delta=0.01 produced good results. Step 4 was added to ensure the l1l_{1} regularization term (r1(x)r_{1}(x)) and the reweighted l1l_{1} term (r3(x)r_{3}(x)) contribute a similar amount to the objective. To this end, the reweighted l1l_{1} term is scaled by μ(k)\mu^{(k)} so that r1(x(k))=r3(x(k))r_{1}(x^{(k)})=r_{3}(x^{(k)}).

The reweighted l1l_{1} method has a number of advantages over the other regularizers we have reviewed here. It encourages sparsity in energy layers by properly grouping the l1l_{1} penalty. It spreads this penalty evenly across all energy layers using its weighting scheme, rather than prioritizing those spots with large magnitudes. Finally, it is easy to implement: each iteration of the algorithm only requires that we solve a simple convex problem with l1l_{1} regularization, which can be done efficiently using many off-the-shelf solvers. Moreover, the number of reweighting iterations needed in practice is typically very low, with most of the improvement coming from the first two to three iterations, so its computational cost is overall low. As we will see in the next section, reweighted l1l_{1} outperforms regular l1l_{1} and group l2l_{2} penalties in sparsifying spots/layers.

2.4 Patient population and computational framework

We compared the reweighted l1l_{1} method with standard l1l_{1} and group l2l_{2} regularization on four head-and-neck cancer patient cases from The Cancer Imaging Archive (TCIA) [CVS+13, BdOCM]. For each patient, we created the dose influence matrix using the proton pencil beam calculation engine in the open-source package MatRad [WCW+17, WWG+18], assuming a relative biological effectiveness (RBE) of 1.11.1. The proton spots were situated on a rectangular grid with a spot spacing of 55 mm, and the grid covered the entire PTV plus 11 mm out from its perimeter. The voxel resolution and dose grid resolution were both 0.980.98 mm ×\times 0.980.98 mm ×\times 22 mm. Every patient plan was generated using two co-planar beams with beam angles selected using a Bayesian optimization algorithm [THS+20]. Table 1 provides more details.

Patient
1 2 3 4
Beam configuration 40,9040^{\circ},90^{\circ} 74,28574^{\circ},285^{\circ} 75,13075^{\circ},130^{\circ} 220,290220^{\circ},290^{\circ}
PTV volume (cm3) 162.7 169.7 129.9 12.4
Number of voxels (mm) 87012 117907 110869 50728
Number of spots (nn) 6378 7011 5257 713
Number of energy layers (GG) 56 70 62 38
Table 1: Head-and-neck cancer patient information.

Each patient had a single planning target volume (PTV) that was prescribed a dose of p=70p=70 Gy delivered in 3535 fractions of 22 Gy per fraction. The mean/max dose bounds for important structures are given in Table 2. We manually adjusted the weights in cost function 1, without exhaustive search, to obtain reasonable treatment plans. In the majority of cases, w¯i=1,w¯i=10\overline{w}_{i}=1,\underline{w}_{i}=10 for PTV voxels ii and w¯i=103\overline{w}_{i}=10^{-3} for all other voxels ii provided the best results.

Dose Bound (Gy)
dmaxd^{\textrm{\tiny max}} dmeand^{\textrm{\tiny mean}}
PTV 84 N/A
Left Parotid 73.5 26
Right Parotid 73.5 26
Mandible 70 N/A
Spinal Cord 45 N/A
Constrictors N/A 40
Brainstem 54 N/A
Table 2: Dose constraints for each structure. N/A means the max/mean dose was unbounded, i.e., dmax=+d^{\textrm{\tiny max}}=+\infty or dmean=+d^{\textrm{\tiny mean}}=+\infty.

We implemented the standard l1l_{1}, group l2l_{2}, and reweighted l1l_{1} based treatment planning methods in Python using CVXPY [DB16, AVDB18, DAM+22] and solved the associated optimization problems with MOSEK [ApS22]. All computational processes were executed on a 6464-bit PC with an AMD Ryzen 9 3900X CPU @ 3.803.80 GHz/ 1212 cores and 128128 GB RAM. For reweighted l1l_{1}, we ran the algorithm for K=3K=3 iterations due to the diminishing benefits of more iterations.

To facilitate comparisons, we scaled the group l2l_{2} regularizer so it lay in the same range as the standard l1l_{1} regularizer. First, we solved problem 4 with the standard l1l_{1} regularizer (equation 7) and λ=1\lambda=1. Let us call this solution x(1)x^{(1)}. Then, we computed a scaling term η>0\eta>0 such that ηr2(x(1))=r1(x(1))\eta r_{2}(x^{(1)})=r_{1}(x^{(1)}). When we ran the group l2l_{2} method, we used the scaled regularization function r~2(x;η):=ηr2(x)\tilde{r}_{2}(x;\eta):=\eta r_{2}(x) as the regularizer r(x)r(x) in problem 4. This allowed us to obtain better spot/energy layer comparison plots between the standard l1l_{1} and group l2l_{2} methods. As pointed out earlier, the reweighted l1l_{1} method is similarly scaled via step 4 of the algorithm.

After each method finished, we trimmed the optimal spot vector xx^{\star} further to increase sparsity. First, we zeroed out all elements xjx_{j}^{\star} that fell below a fraction γ(0,1)\gamma\in(0,1) of the maximum spot intensity, i.e., we set xj=0x_{j}^{\star}=0 if xj<γmaxjxjx_{j}^{\star}<\gamma\max_{j^{\prime}}x_{j^{\prime}}^{\star}. We then zeroed out all energy layers of the resulting x~\tilde{x}^{\star} that fell below the same fraction of the maximum layer intensity: for each g{1,,G}g\in\{1,\ldots,G\}, we set x~j=0\tilde{x}_{j}^{\star}=0 for all j𝒥gj\in\mathcal{J}_{g} if j𝒥gx~j<γmaxgj𝒥gxj\sum_{j^{\prime}\in\mathcal{J}_{g}}\tilde{x}_{j^{\prime}}^{\star}<\gamma\max_{g^{\prime}}\sum_{j^{\prime}\in\mathcal{J}_{g^{\prime}}}x_{j^{\prime}}^{\star}. A choice of γ=0.01\gamma=0.01 provided a reasonable trade-off between sparsity and dose coverage in our computational experiments.

3 Results

3.1 Simultaneous reduction of spots and energy layers

We first compared the results of the regularization methods on a single patient. Figure 1 depicts optimal spot intensities of the unregularized model and the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularized models for patient 22. For all three regularizers, a regularization weight of λ=5\lambda=5 was used; this choice accentuated the difference between their spot vectors. Without regularization, about one third of the total 70117011 spots are nonzero, with individual spot intensities ranging between 10310^{3} and 10410^{4}. Under l1l_{1} regularization, that fraction is reduced to only 13%13\%, or 918918 nonzero spots, as the l1l_{1} penalty encourages further sparsity. By contrast, with group l2l_{2} regularization, the number of active spots increases significantly to 50995099 or 72%72\%, while the average intensity drops to a little over 10310^{3}. Reweighted l1l_{1} regularization produced a spot vector with the lowest number of nonzero elements: just 541541 or 7.7%7.7\% of the spots are nonzero. For patient 22, these active spots tend to reside in the first beam, and their maximum intensity exceeds that of the other methods.

Figure 2 shows the optimal intensity of the energy layers for patient 22, using the three regularization models, all with λ=5\lambda=5. Over 95%95\% of the total 7070 energy layers are nonzero under the unregularized model. These active layers are divided fairly evenly into two clusters, which coincide with the two beams delineated by the vertical red line. The total intensity of each energy layer averages between 10410^{4} and 10510^{5}. With l1l_{1} regularization, the fraction of nonzero energy layers drops to a modest 80%80\%, where most of that reduction comes from deactivated layers at the edges of the clusters. Group l2l_{2} regularization results in a steeper drop in the fraction of active energy layers, down to 61%61\% with additional sparsity in the middle of both beam clusters. However, the reweighted l1l_{1} method performs better than both of these methods, cutting the number of nonzero energy layers down to only 1818 – a reduction of over 75%75\% – with a commensurate increase in the intensity of the active layers.

A summary of the results from the different regularization methods is given in Figure 3. For a fixed λ\lambda, it is clear that reweighted l1l_{1} achieves the lowest number of nonzero spots and nonzero energy layers out of all the methods.

Refer to caption
Figure 1: Optimal spot intensities resulting from the unregularized model and the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularized models (λ=5\lambda=5) for patient 22. The vertical red line divides the spots associated with the first beam (1–3516) from the second beam (3517–7011).
Refer to caption
Figure 2: Sum of spot intensities in each energy layer (1–70) for the unregularized model and the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularized models (λ=5\lambda=5) for patient 22. The vertical red line divides the layers associated with the first beam (1–33) from the second beam (34–70).
Refer to caption
Figure 3: Percentage of nonzero spots/energy layers (relative to the total number of spots/layers) for the unregularized model and the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularized models (λ=5\lambda=5) for patient 22.

3.2 Trade-off between delivery efficiency and PTV coverage

The regularization weight in the previous section was chosen to highlight the distinctions between the optimal intensity plots. However, λ\lambda must be carefully selected to balance the trade-off between the total delivery time (highly correlated with the sparsity of the spots/energy layers) and the quality of the resulting treatment plan. Figure 4 examines this trade-off for reweighted l1l_{1} regularization on patient 22 using two measures of plan quality: D98%98\% and D2%2\% for the PTV. For different values of λ\lambda, we solved problem 4 using the reweighted l1l_{1} method, counted up the number of nonzero spots/energy layers, and calculated the optimal dose vector and PTV dose percentiles. We then plotted a point corresponding to this result in each of the subfigures of Figure 4 with the sparsity metric on the vertical axis and the plan quality metric on the horizontal axis (e.g., the unregularized point λ=0\lambda=0 is marked by a triangle \triangle). By connecting the points in each subfigure, we obtained a set of Pareto optimal curves, which show the trade-off between plan delivery efficiency and plan quality.

The top left subfigure depicts the number of nonzero spots versus D98%98\% to the PTV for λ\lambda ranging from zero to 6.06.0 (marked by the square). As λ\lambda increases, the number of active spots decreases, but so does D98%98\%. A choice of λ=0.95\lambda=0.95 (marked by the star) achieves the lowest number of nonzero spots 720\approx 720, while still maintaining D98%98\% above 95%95\% of the prescription, indicated by the vertical gray dotted line at 66.566.5 Gy. A similar plot can be seen in the bottom left subfigure, which shows the number of nonzero energy layers versus D98%98\% to the PTV; the same choice of λ\lambda yields 2727 active layers. On the righthand side, the subfigures display the number of nonzero spots (top) and energy layers (bottom) versus D2%2\% to the PTV. As the regularization weight increases, D2%2\% also increases, but never exceeds 108%108\% of the prescription (as indicated by the dotted line at 75.675.6 Gy) for any λ0.95\lambda\leq 0.95. Thus, out of all the weights, λ=0.95\lambda=0.95 yields a good trade-off between sparsity and PTV coverage: it achieves a reduction of 89%89\% and 61%61\% in the number of spots and energy layers, respectively, while still fulfilling all target dose constraints.

Refer to caption
Figure 4: Number of nonzero spots (top) and energy layers (bottom) versus PTV dose percentile (left: D98%98\%; right: D2%2\%) for various values of the regularization weight λ\lambda, computed using the reweighted l1l_{1} method on patient 22. As λ\lambda increases, the curves sweep from the \triangle marker (λ=0.0\lambda=0.0) to the \blacksquare marker (λ=6.0\lambda=6.0). The vertical gray dotted lines indicate clinical dose constraints on the PTV (D98%>0.95p98\%>0.95p and D2%<1.08p2\%<1.08p, where p=70p=70 Gy is the prescription), and the red arrows indicate the directions of desirable change (increasing D98%98\%, decreasing D2%2\%, and decreasing number of nonzero spots/energy layers). A choice of weight λ=0.95\lambda=0.95, marked by the \bigstar, produces a plan with good sparsity that respects the clinical constraints.

3.3 Trade-off between delivery efficiency and overall plan quality

This section studies the Pareto optimal trade-off curves between spot/energy layer sparsity and treatment plan quality using different regularizers to determine which regularization method provides the best trade-off, i.e., the largest increase in sparsity for the least decrease in plan quality. Rather than plotting multiple dose-volume metrics, we focus on a single consolidated quality measure: the plan cost function (equation 1), which includes dose fidelity terms for the PTV and all organs-at-risk (OARs). A lower value of f(d¯,d¯)f(\overline{d},\underline{d}) at the optimum implies a higher quality treatment plan.

To facilitate comparison, we also focus on the relative change in sparsity (number of nonzero spots/energy layers) and plan cost with respect to the unregularized solution. Let xunregx_{unreg} be the optimal spots resulting from the unregularized problem 3, and xregx_{reg} be the optimal spots resulting from a particular regularization method. We evaluate the plan cost function given in equation 1 on xunregx_{unreg} (with d¯unreg=max(Axunregp,0)\overline{d}_{unreg}=\max(Ax_{unreg}-p,0) and d¯unreg=min(Axunregp,0)\underline{d}_{unreg}=-\min(Ax_{unreg}-p,0)) to obtain cunregc_{unreg}, and similarly on xregx_{reg} to obtain cregc_{reg}. Then, the relative percentage change in plan cost for the regularizer is 100(cregcunreg)/cunreg100(c_{reg}-c_{unreg})/c_{unreg}. The relative change in the number of nonzero spots (ss) and number of nonzero energy layers (ll) is defined in a similar fashion as 100(sregsunreg)/sunreg100(s_{reg}-s_{unreg})/s_{unreg} and 100(lreglunreg)/lunreg100(l_{reg}-l_{unreg})/l_{unreg}, respectively. Thus, to construct the trade-off curve, we solve the regularized problem for various values of λ\lambda and plot the relative change in sparsity versus the relative change in plan cost at each solution point.

For every patient, Figure 5 depicts the relative percentage change in the number of nonzero spots versus the relative percentage change in plan cost for the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularization methods. The origin corresponds to the unregularized plan (λ=0\lambda=0). Both the l1l_{1} and reweighted l1l_{1} trade-off curves drop sharply from the origin, attaining on average a 30%30\% to 45%45\% decrease in nonzero spots for a less than 10%10\% increase in plan cost, with reweighted l1l_{1} slightly outperforming l1l_{1} by on average 55 percentage points over all four patients. By contrast, the number of nonzero spots rises with group l2l_{2} regularization, increasing up to 140%140\% within the first 10%10\% to 15%15\% increase in plan cost for all except patient 44. This is consistent with our spot intensity plot for patient 22 (Figure 1), which shows the spot distribution is denser under group l2l_{2} than without regularization.

Figure 6 depicts the relative percentage change in the number of nonzero energy layers versus the relative percentage change in plan cost for the three regularization methods. Both l1l_{1} and group l2l_{2} trade-off curves decrease moderately from the origin, with group l2l_{2} averaging about 9.5%9.5\% lower number of nonzero layers for a given percentage increase in cost. This matches our observations in Figures 2 and 3 that the group l2l_{2} function is more effective at penalizing energy layers than the l1l_{1} norm.

However, the reweighted l1l_{1} method significantly outperforms both these regularizers. For patient 22, it achieves an over 50%50\% decrease in the number of nonzero energy layers for a less than 10%10\% increase in plan cost. For the other patients, it provides a 25%25\% to 35%35\% reduction in active energy layers with a less than 15%15\% increment in plan cost. The average reduction in the number of nonzero layers from reweighted l1l_{1} exceeds the best reduction from group l2l_{2} by 1212 percentage points, and the majority of this reduction is realized with only about 10%10\% cost to treatment plan quality, relative to the unregularized plan.

The vertical dotted lines in Figures 5 and 6 for patient 22 correspond to a 10%10\% increase in the plan cost. The intersection of these lines with the Pareto curves of different regularization methods demonstrates the reduction in the number of nonzero spots and energy layers obtained using different regularizers. Figure 7 (top right) depicts the dose-volume histogram (DVH) curves of the unregularized plan, the l1l_{1} regularized plan, and the reweighted l1l_{1} regularized plan at the same 10%10\% relative change in plan cost. (We chose not to show the DVHs of group l2l_{2} regularized plans because they overlap considerably with the DVHs of corresponding l1l_{1} regularized plans, and as we saw earlier, group l2l_{2} results in far worse delivery efficiency than l1l_{1} or reweighted l1l_{1} regularization). Compared to no regularization, the reweighted l1l_{1} method reduces the number of active spots and energy layers by more than 50%50\%, while providing relatively similar DVH curves with different trade-offs (compromised left parotid and PTV coverage/homogeneity, and improved right parotid and mandible). One can re-adjust the PTV/OAR weights in the plan cost function of the reweighted l1l_{1} problem to achieve more uniform trade-offs. In the same vein, compared to standard l1l_{1} regularization, reweighted l1l_{1} reduces the number of active spots and energy layers by about 10%10\% and 40%40\%, respectively, while producing almost identical DVH curves. The Pareto curves and DVHs of the other patients, also plotted at roughly 10%10\% relative change in plan cost, tell a similar story.

Refer to caption
Figure 5: Relative change in number of nonzero spots versus relative cost to plan quality with respect to the unregularized model. For patient 22, reweighted l1l_{1} regularization achieves a 57%57\% reduction in the number of nonzero spots at only a 10%10\% cost to overall plan quality, relative to the unregularized model, as indicated by the vertical gray dotted line.
Refer to caption
Figure 6: Relative change in number of nonzero energy layers versus relative cost to plan quality with respect to the unregularized model. For patient 22, reweighted l1l_{1} regularization achieves a 50%50\% reduction in the number of nonzero layers at only a 10%10\% cost to overall plan quality, relative to the unregularized model, as indicated by the vertical gray dotted line.
Refer to caption
Figure 7: DVH curves for each patient obtained from the unregularized model (solid), and the standard l1l_{1} (dashed) and reweighted l1l_{1} (dotted) models regularized to approximately 10%10\% relative cost to plan quality. The vertical gray line indicates the prescription p=70p=70 Gy.

3.4 Relative improvement in delivery time

The total plan delivery time is dependent on a multitude of machine-specific factors. In this section, we provide an estimate of how regularization directly impacts the delivery time, assuming a specific set of machine parameters. The total delivery time (TT) can be approximated by

T=Tb×(hb(x)1)++Te×(he(x)1)++g=1GTs×(hs(x,g)1)++j=1nxjTd,T=T_{b}\times(h_{b}(x)-1)_{+}+T_{e}\times(h_{e}(x)-1)_{+}+\sum_{g=1}^{G}T_{s}\times(h_{s}(x,\mathcal{I}_{g})-1)_{+}+\frac{\sum_{j=1}^{n}x_{j}}{T_{d}}, (13)

where hb(x)h_{b}(x) is the number of nonzero beams, he(x)h_{e}(x) is the number of nonzero energy layers, hs(x,g)h_{s}(x,\mathcal{I}_{g}) is the number of nonzero spots in energy layer gg, TbT_{b} is the beam switching time (gantry rotation plus beam setup time), TeT_{e} is the energy layer switching time, TsT_{s} is the spot travel time, and TdT_{d} is the proton dose rate [GCM+20, ZSL+22]. Following van de Water et al. (2015) [vdWKHH15], we let Tb=30T_{b}=30 seconds, Te=2T_{e}=2 seconds, Ts=0.01T_{s}=0.01 seconds, and Td=4×101160T_{d}=\frac{4\times 10^{11}}{60} protons/second. We compute TT using the unregularized model and the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularized models, with regularization weight λ\lambda chosen such that each regularized plan achieves a 10%10\% cost to plan quality. Then, we plot the relative percentage change in delivery time of each regularized model with respect to the unregularized model (i.e., 100(TregTunreg)/Tunreg100(T_{reg}-T_{unreg})/T_{unreg}).

The results for patient 22 are shown in Figure 8. Standard l1l_{1} reduces delivery time by a modest 13%13\%, while group l2l_{2} actually raises delivery time by 2%2\% due to the 149%149\% increase in the number of nonzero spots, which increases the spot delivery and spot travel time. By contrast, reweighted l1l_{1} achieves a 44%44\% reduction in delivery time through its simultaneous reduction of the number of nonzero spots and nonzero energy layers by 57%57\% and 51%51\%, respectively. This reduction comes at only a minor cost to the PTV and mandible – D2%2\% to the PTV goes up by 4%4\% and maximum dose to the mandible goes up by 2%2\%. Results for the other patients reflect similar outcomes, with reweighted l1l_{1} reducing total delivery time by between 20%20\% and 30%30\%; see Table 3 and Figure 9 in appendix B for additional data and plots.

Refer to caption
Figure 8: Relative change in delivery time and other plan metrics with respect to the unregularized model for l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularization for patient 22. For each regularizer, λ\lambda was chosen such that the regularized model resulted in about 10%10\% cost to plan quality. Here the conformity index (CI) is defined as the total number of voxels that received at least 95%95\% of the prescribed dose, divided by the number of voxels in the PTV [PSL17].

4 Discussion

This study proposed a method to improve the delivery efficiency of pencil beam scanning proton plans by simultaneously reducing the number of spots and energy layers using reweighted l1l_{1} regularization. One can exactly model the spot/energy layer reduction problem using the l0l_{0} regularizer, which in principle would improve plan delivery at the smallest possible cost to plan quality, but the l0l_{0}-regularized optimization problem is nonconvex and computationally prohibitive to solve. In imaging science and statistics, researchers often employ the l1l_{1} norm as a convex surrogate for the l0l_{0} norm, and in some cases (e.g., compressed sensing), the l1l_{1} norm has proven to be just as effective as the l0l_{0} norm at promoting sparsity [CRT06]. The reweighted l1l_{1} regularization method was proposed [CWB08] to bridge the gap between the l0l_{0} regularizer and the l1l_{1} regularizer by better approximating the l0l_{0} norm, while retaining the convexity of the l1l_{1} norm. In proton treatment planning, this property translates to improving the plan delivery efficiency at a lower cost to plan quality, which we have demonstrated in this work. Our limited computational experiments on four head-and-neck cancer patients show that, for the same cost to plan quality, the reweighted l1l_{1} method reduced the number of nonzero spots by up to 1010 percentage points more than standard l1l_{1} and the number of nonzero energy layers by 2525 to 3030 percentage points more than group l2l_{2} regularization.

Promoting spot/energy layer sparsity to improve plan delivery in IMPT is analogous to promoting beam profile smoothness to improve plan delivery in IMRT. Prior research has shown that plan delivery efficiency in IMRT can be significantly improved at minimal cost to dosimetric plan quality due to the phenomenon of degeneracy [AMNR02, LASP04]. The structure of the treatment planning problem results in a multitude of feasible plans with near-equal objective value (i.e., quality). This same phenomenon has been observed in IMPT planning problems [vdWBA+20, vdWKHH15, GCM+20, LCL+19], although unlike IMRT, it currently lacks a rigorous mathematical analysis. Our computational experiments demonstrated that with the reweighted l1l_{1} method, one can reduce the number of spots and energy layers by on average 40%40\% and 35%35\%, respectively, without significantly compromising the dosimetric plan quality.

In this study, we have adopted a constrained optimization framework, where the dosimetric plan quality is represented by a quadratic term in the objective, the sparsity promotion is carried out via a regularization penalty term, and the mean/max clinical dose criteria are enforced by hard constraints. However, the proposed reweighted l1l_{1} method is agnostic to the optimization framework and can also be used in conjunction with an automation tool (e.g., hierarchical optimization [ZHZ+22, BSH09, THDZ20], multiple criteria optimization (MCO) [CB08, MKBT08], knowledge-based planning (KBP) [AMT+12, SNZ+20]). DVH constraints and plan robustness may be integrated into the optimization problem using existing techniques in the literature [FUXB19, ZST+13, MHDZ20, FTZ23, THDZ20]. To limit the scope of this paper, we have not incorporated robustness into our formulation of the treatment planning problem. A preliminary robustness analysis of 1313 range and setup uncertainty scenarios shows that the reweighted l1l_{1} method generates plans with a similar level of robustness to the unregularized model’s plans. Moreover, even without accounting for these uncertainties in the optimization, the DVH curves and clinical metrics of the plans lie within an acceptable range across the majority of scenarios. Appendix B.1 provides details on our analysis and the resulting figures.

Finally, we mention that in this proof-of-concept work, we have not enforced the machine-specific minimum-monitor-unit (min-MU) constraint. One can enforce the min-MU constraint using a two-step optimization method as described in Lin et al. (2019) [LCL+19]: the first step identifies the active spots/energy layers (i.e., those with intensity greater than a pre-determined value), and the second step removes the inactive spots/energy layers and enforces the min-MU constraint on the remaining spots. Increasing the min-MU threshold also allows for a higher dose rate, which can accelerate the delivery of each spot. This is especially important because the intensities of the active spots usually increase with the overall sparsity of the spots/energy layers in the treatment plan. Gao et al. (2020) [GCM+20] suggested using different min-MU thresholds for each energy layer to further increase the dose rate and expedite spot delivery.

5 Conclusion

The reweighted l1l_{1} regularization method is capable of simultaneously reducing the number of spots and energy layers in a proton treatment plan, while imposing minimal cost to dosimetric plan quality. Moreover, it achieves a better trade-off between delivery efficiency and plan quality than standard l1l_{1} and group l2l_{2} regularization. Thus, reweighted l1l_{1} regularization is a powerful method for improving the delivery of proton therapy.

Acknowledgments

This work was partially supported by MSK Cancer Center Support Grant/Core Grant from the NIH (P30 CA008748).

References

  • [AMNR02] M. Alber, G. Meedt, F. Nüsslin, and R. Reemtsen. On the degeneracy of the IMRT optimization problem. Med. Phys., 29(11):2584–2589, 2002.
  • [AMT+12] L. M. Appenzoller, J. M. Michalski, W. L. Thorstad, S. Mutic, and K. L. Moore. Predicting dose-volume histograms for organs-at-risk in IMRT planning. Med. Phys., 39(12):7446–7461, 2012.
  • [ApS22] MOSEK ApS. The MOSEK Optimizer API for Python, Version 10.0, 2022.
  • [AVDB18] A. Agrawal, R. Verschueren, S. Diamond, and S. Boyd. A rewriting system for convex optimization problems. J. Control Decis., 5(1):42–60, 2018.
  • [BdOCM] T. Bejarano, M. de Ornelas-Couto, and I. B. Mihaylov. Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT) [data set]. doi:10.7937/K9/TCIA.2018.13upr2xf.
  • [BKM16] D. Bertsimas, A. King, and R. Mazumder. Best subset selection via a modern optimization lens. Ann. Stat., 44(2):813–852, 4 2016. doi:10.1214/15-AOS1388.
  • [BPC+11] S. Boyd, N. Parikh, E. Chu, B. Peleato, , and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, 2011.
  • [BSH09] S. Breedveld, P. Storchi, and B. Heijmen. The equivalence of multi-criteria methods for radiotherapy plan optimization. Phys. Med. Biol., 54(23):7199–7209, 2009.
  • [BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • [CB08] D. Craft and T. Bortfeld. How many plans are needed in an IMRT multi-objective plan database? Phys. Med. Biol., 53(11):2785–2796, 2008.
  • [CLL+14] W. Cao, G. Lim, L. Liao, Y. Li, S. Jiang, X. Li, H. Li, K. Suzuki, X. R. Zhu, D. Gomez, and X. Zhang. Proton energy optimization and reduction for intensity-modulated proton therapy. Phys. Med. Biol., 59(21):6341–6354, 2014. doi:10.1088/0031-9155/59/21/6341.
  • [CRT06] E. J. Candes, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math., 59(8):1207–1223, 2006.
  • [CVS+13] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior. The cancer imaging archive (TCIA): Maintaining and operating a public information repository. J. Digit. Imaging, 26(6):1045–1057, 2013.
  • [CWB08] E. J. Candés, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted l1l_{1} minimization. J. Fourier Anal. Appl., 14(1):877–905, 2008. doi:10.1007/s00041-008-9045-x.
  • [DAM+22] S. Diamond, A. Agrawal, R. Murray, B. Stellato, and S. Boyd. CVXPY: Disciplined Convex Programming in Python, Version 1.3, 2022.
  • [DB16] S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. J. Mach. Learn. Res., 17(83):1–5, 2016.
  • [DLZ+16] X. Ding, X. Li, M. Zhang, P. Kabolizadeh, C. Stevens, and D. Yan. Spot-scanning proton arc (SPArc) therapy: The first robust and delivery-efficient spot-scanning proton arc therapy. Int. J. Radiat. Oncol. Biol. Phys., 96(5):1107–1116, 12 2016. doi:10.1016/j.ijrobp.2016.08.049.
  • [EBW+22] E. Engwall, C. Battinelli, V. Wase, O. Marthin, L. Glimelius, R. Bokrantz, B. Andersson, and A. Fredriksson. Fast robust optimization of proton PBS arc therapy plans using early energy layer selection and spot assignment. Phys. Med. Biol., 67(6):065010, 3 2022. doi:10.1088/1361-6560/ac55a6.
  • [Faz02] M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, 2002. https://faculty.washington.edu/mfazel/thesis-final.pdf.
  • [FBDN07] M. Figueiredo, J. Bioucas-Dias, and R. D. Nowak. Majorization–minimization algorithms for wavelet-based image restoration. IEEE Trans. Image Process., 16(12):2980–2991, 12 2007. doi:10.1109/TIP.2007.909318.
  • [FHB03] M. Fazel, H. Hindi, and S. Boyd. Log-det heuristic for matrix rank minimization with applications to hankel and euclidean distance matrices. In Proceedings of the American Control Conference (ACM), pages 2156–2162, Denver, CO, 6 2003. doi:10.1109/ACC.2003.1243393.
  • [FTZ23] A. Fu, V. T. Taasti, and M. Zarepisheh. Distributed and scalable optimization for robust proton treatment planning. Med. Phys., 50(1):633–642, 1 2023. doi:10.1002/mp.15897.
  • [FUXB19] A. Fu, B. Ungun, L. Xing, and S. Boyd. A convex optimization approach to radiation treatment planning with dose constraints. Optim. Eng., 20:277–300, 2019.
  • [GCM+20] H. Gao, B. Clasie, M. McDonald, K. M. Langen, T. Liu, and Y. Lin. Technical note: Plan-delivery-time constrained inverse optimization method with minimum-MU-per-energy-layer (MMPEL) for efficient pencil beam scanning proton therapy. Med. Phys., 47(9):3892–3897, 9 2020.
  • [Gor22] B. L. Gorissen. Interior point methods can exploit structure of convex piecewise linear functions with application in radiation therapy. SIAM J. Optim., 32(1):256–275, 2022. doi:10.1137/21M1402364.
  • [GRL+20] W. Gu, D. Ruan, Q. Lyu, W. Zou, L. Dong, and K. Sheng. A novel energy layer optimization framework for spot-scanning proton arc therapy. Med. Phys., 47(5):2072–2084, 2 2020. doi:10.1002/mp.14083.
  • [HTW15] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. Monographs on Statistics and Applied Probability. Taylor and Francis, 2015.
  • [IPR16] S. Ivanoff, F. Picard, and V. Rivoirard. Adaptive lasso and group-lasso for functional Poisson regression. J. Mach. Learn. Res., 17(1):1903–1948, 2016.
  • [JHP+18] M. F. Jensen, L. Hoffmann, J. B. B. Petersen, D. S. Møller, and M. Alber. Energy layer optimization strategies for intensity-modulated proton therapy of lung cancer patients. Med. Phys., 45(10):4355–4363, 10 2018. doi:10.1002/mp.13139.
  • [JOV09] L. Jacob, G. Obozinski, and J.-P. Vert. Group lasso with overlap and graph lasso. In Proceedings of the International Conference on Machine Learning (ICML), pages 433–440, Montreal, Canada, 6 2009. doi:10.1145/1553374.1553431.
  • [LASP04] J. Llacer, N. Agazaryan, T. D. Solberg, and C. Promberger. Degeneracy, frequency response and filtering in IMRT optimization. Phys. Med. Biol., 49(13):2853–2880, 2004.
  • [LCL+19] Y. Lin, B. Clasie, T. Liu, M. McDonald, K. M. Langen, and H. Gao. Minimum-MU and sparse-energy-layer (MMSEL) constrained inverse optimization method for efficiently deliverable PBS, plans. Phys. Med. Biol., 64(20):205001, 2019. doi:10.1088/1361-6560/ab4529.
  • [LFB07] M. S. Lobo, M. Fazel, and S. Boyd. Portfolio optimization with linear and fixed transaction costs. Ann. Oper. Res, 152(1):341–365, 2007. doi:10.1007/s10479-006-0145-1.
  • [LH15] M. Lim and T. Hastie. Learning interactions via hierarchical group-lasso regularization. J. Comput. Graph. Stat., 24(3):627–654, 2015. doi:10.1080/10618600.2014.938812.
  • [LLZ+20] G. Liu, X. Li, L. Zhao, W. Zheng, A. Qin, S. Z hang, C. Stevens, D. Yan, P. Kabolizadeh, and X. Ding. A novel energy sequence optimization algorithm for efficient spot-scanning proton arc (SPArc) treatment delivery. Acta Oncol., 59(10):1178–1185, 2020. doi:10.1080/0284186X.2020.1765415.
  • [LZZ15] H. Li, X. R. Zhu, and X. Zhang. Reducing dose uncertainty for spot-scanning proton beam therapy of moving tumors by optimizing the spot delivery sequence. Int. J. Radiat. Oncol. Biol. Phys., 93(3):547–556, 2015. doi:10.1016/j.ijrobp.2015.06.019.
  • [Mai15] J. Mairal. Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim., 25(2):829–855, 2015. doi:10.1137/140957639.
  • [MCN+20] D. Mah, C. C. Chen, A. O. Nawaz, G. Galbreath, R. Shmulenson, N. Lee, and B. Chon. Retrospective analysis of reduced energy switching and room switching times on throughput efficiency of a multi-room proton therapy center. Brit. J. Radiol., 93(1107):20190820, 3 2020. doi:10.1259/bjr.20190820.
  • [MHDZ20] S. Mukherjee, L. Hong, J. O. Deasy, and M. Zarepisheh. Integrating soft and hard dose-volume constraints into hierarchical constrained IMRT optimization. Med. Phys., 47(2):414–421, 2020.
  • [MKBT08] M. Monz, K.-H. Küfer, T. Bortfeld, and C. Thieke. Pareto navigation—algorithmic foundation of interactive multi-criteria IMRT planning. Phys. Med. Biol., 53(4):985–998, 2008.
  • [MSH07] S. Ma, X. Song, and J. Huang. Supervised group lasso with applications to microarray data analysis. BMC Bioinform., 8(60):1–17, 2 2007. doi:10.1186/1471-2105-8-60.
  • [MvdGB08] L. Meier, S. van de Geer, and P. Bühlmann. The group lasso for logistic regression. J. R. Stat. Soc., B: Stat. Methodol., 70(1):53–71, 2 2008. doi:10.1111/j.1467-9868.2007.00627.x.
  • [MW16] B. S. Müller and J. J. Wilkens. Prioritized efficiency optimization for intensity modulated proton therapy. Phys. Med. Biol., 61(23):8249–8265, 2016. doi:10.1088/0031-9155/61/23/8249.
  • [NW06] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, 2006.
  • [Pag12] H. Paganetti. Range uncertainties in proton therapy and the role of Monte Carlo simulations. Phys. Med. Biol., 57(11):R99–R117, 2012.
  • [PEL+18] P. R. Poulsen, J. Eley, U. Langner, II C. B. Simone, and K. Langen. Efficient interplay effect mitigation for proton pencil beam scanning by spot-adapted layered repainting evenly spread out over the full breathing cycle. Int. J. Radiat. Oncol. Biol. Phys., 100(1):226–234, 1 2018.
  • [PH07] M. Y. Park and T. Hastie. l1l_{1}-regularization path algorithm for generalized linear models. J. R. Stat. Soc., B: Stat. Methodol., 69(4):659–677, 9 2007. doi:10.1111/j.1467-9868.2007.00607.x.
  • [PSL17] D. Petrova, S. Smickovska, and E. Lazarevska. Conformity index and homogeneity index of the postoperative whole breast radiotherapy. Open Access Maced. J. Med. Sci., 5(6):736–739, 2017.
  • [QSG13] Z. Qin, K. Scheinberg, and D. Goldfarb. Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput., 5(2):143–169, 6 2013. doi:10.1007/s12532-013-0051-x.
  • [Rak11] A. Rakotomamonjy. Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process., 91(7):1505–1526, 7 2011. doi:10.1016/j.sigpro.2011.01.012.
  • [SBP17] Y. Sun, P. Babu, and D. P. Palomar. Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process., 65(3):794–816, 2 2017.
  • [SFR07] M. Schmidt, G. Fung, and R. Rosales. Fast optimization methods for l1l_{1} regularization: A comparative study and two new approaches. In European Conference on Machine Learning (ECML), pages 286–297, Warsaw, Poland, 9 2007.
  • [SNZ+20] C. Shen, D. Nguyen, Z. Zhou, S. B. Jiang, B. Dong, and X. Jia. An introduction to deep learning in medical physics: Advantages, potential, and challenges. Phys. Med. Biol., 65(5):05TR01, 2020.
  • [SPS+16] K. Suzuki, M. B. Palmer, N. Sahoo, X. Zhang, F. Poenisch, D. S. Mackin, A. Y. Liu, R. Wu, X. R. Zhu, S. J. Frank, M. T. Gillin, and A. K. Lee. Quantitative analysis of treatment process time and throughput capacity for spot scanning proton therapy. Med. Phys., 43(7):3975–3986, 7 2016.
  • [SSW10] E. D. Schifano, R. L. Strawderman, and M. T. Wells. Majorization-minimization algorithms for nonsmoothly penalized objective functions. Electron. J. Stat., 4(1):1258–1299, 2010. doi:10.1214/10-EJS582.
  • [ST12] N. Simon and R. Tibshirani. Standardization and the group lasso. Stat. Sin., 22(3):983–1001, 7 2012. doi:10.5705/ss.2011.075.
  • [SYOS10] J. Shi, W. Yin, S. Osher, and P. Sajda. A fast hybrid algorithm for large-scale l1l_{1}-regularized logistic regression. J. Mach. Learn. Res., 11(23):713–741, 2010.
  • [TBD+18] V. T. Taasti, C. Bäumer, C. V. Dahlgren, A. J. Deisher, M. Ellerbrock, J. Free, J. Gora, A. Kozera, A. J. Lomax, L. De Marzi, S. Molinelli, B.-K. K. Teo, P. Wohlfahrt, J. B. B. Petersen, L. P. Muren, D. C. Hansen, and C. Richter. Inter-centre variability of CT-based stopping-power prediction in particle therapy: Survey-based evaluation. Phys. Imaging Radiat., 6(1):25–30, 2018.
  • [THDZ20] V. T. Taasti, L. Hong, J. O. Deasy, and M. Zarepisheh. Automated proton treatment planning with robust optimization using constrained hierarchical optimization. Med. Phys., 47(7):2779–2790, 2020.
  • [THS+20] V. T. Taasti, L. Hong, J. S. Shim, J. O. Deasy, and M. Zarepisheh. Automating proton treatment planning with beam angle selection using bayesian optimization. Med. Phys., 47(8):3286–3296, 8 2020. doi:10.1002/mp.14215.
  • [Tib96] R. Tibshirani. Regression shrinkage and selection via the lasso. J. R. Stat. Soc., B: Stat. Methodol., 58(1):267–288, 1996.
  • [VBL13] D. Vidaurre, C. Bielza, and P. Larrañaga. A survey of l1l_{1} regression. Int. Stat. Rev., 81(3):361–381, 12 2013.
  • [vdWBA+20] S. van de Water, M. F. Belosi, F. Albertini, C. Winterhalter, D. C. Weber, and A. J. Lomax. Shortening delivery times for intensity-modulated proton therapy by reducing the number of proton spots: An experimental verification. Phys. Med. Biol., 65(9):095008, 5 2020. doi:10.1088/1361-6560/ab7e7c.
  • [vdWKHH15] S. van de Water, H. M. Kooy, B. J. M. Heijmen, and M. S. Hoogeman. Shortening delivery times of intensity modulated proton therapy by reducing proton energy layers during treatment plan optimization. Int. J. Radiat. Oncol. Biol. Phys., 92(2):460–468, 2015. doi:10.1016/j.ijrobp.2015.01.031.
  • [WCW+17] H.-P. Wieser, E. Cisternas, N. Wahl, S. Ulrich, A. Stadler, H. Mescher, L.-R. Müller, T. Klinge, H. Gabrys, L. Burigo, A. Mairani, S. Ecker, B. Ackermann, M. Ellerbrock, K. Parodi, O. Jäkel, and M. Bangert. Development of the open-source dose calculation and optimization toolkit matRad. Med. Phys., 44(6):2556–2568, 2017.
  • [WL08] T. T. Wu and K. Lange. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat., 2(1):224–244, 2008. doi:10.1214/07-AOAS147.
  • [Wri97] S. J. Wright. Primal-Dual Interior Point Methods. Society for Industrial and Applied Mathematics, 1997.
  • [WSGJ+22] S. Wuyckens, M. Saint-Guillain, G. Janssens, L. Zhao, X. Li, X. Ding, E. Sterpin, J. A. Lee, and K. Souris. Treatment planning in arc proton therapy: Comparison of several optimization problem statements and their corresponding solvers. Comput. Biol. Med., 148(1):105609, 9 2022. doi:10.1016/j.compbiomed.2022.105609.
  • [WWG+18] H.P. Wieser, N. Wahl, H.S. Gabryś, L.R. Müller, G. Pezzano, J. Winter, S. Ulrich, L. Burigo, O. Jäkel, and M. Bangert. MatRad – an open-source treatment planning toolkit for educational purposes. Med. Phys. Int., 6(1):119–127, 2018.
  • [WZSG+22] S. Wuyckens, L. Zhao, M. Saint-Guillain, G. Janssens, E. Sterpin, K. Souris, X. Ding, and J. A. Lee. Bi-criteria Pareto optimization to balance irradiation time and dosimetric objectives in proton arc therapy. Phys. Med. Biol., 67(1):245017, 12 2022. doi:10.1088/1361-6560/aca5e9.
  • [YL06] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc., B: Stat. Methodol., 68(1):49–67, 2 2006. doi:10.1111/j.1467-9868.2005.00532.x.
  • [YZ15] Y. Yang and H. Zou. A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput., 25(1):1129–1141, 2015. doi:10.1007/s11222-014-9498-5.
  • [ZHZ+22] M. Zarepisheh, L. Hong, Y. Zhou, Q. Huang, J. Yang, G. Jhanwar, H. D. Pham, P. Dursun, P. Zhang, M. A. Hunt, G. S. Mageras, J. T. Yang, Y. Yamada, and J. O. Deasy. Automated and clinically optimal treatment planning for cancer radiotherapy. INFORMS J. Appl. Anal., 52(1):69–89, 2022.
  • [ZSL+22] G. Zhang, H. Shen, Y. Lin, R. C. Chen, Y. Long, and H. Gao. Energy layer optimization via energy matrix regularization for proton spot-scanning arc therapy. Med. Phys., 49(9):5752–5762, 9 2022. doi:10.1002/mp.15836.
  • [ZST+13] M. Zarepisheh, M. Shakourifar, G. Trigila, P. S. Ghomi, S. Couzens, A. Abebe, L. Noreña, W. Shang, S. B. Jiang, and Y. Zinchenko. A moment-based approach for DVH-guided radiotherapy treatment plan optimization. Phys. Med. Biol., 58(6):1869–1887, 2013.

Appendix A Derivation of problem 3

Problem 2 in the main text is a nonconvex optimization problem because the equality constraints d¯=(Axp)+\overline{d}=(Ax-p)_{+} and d¯=(Axp)\underline{d}=(Ax-p)_{-} are nonlinear. We will show that it is equivalent to the convex optimization problem 3, which replaces the nonlinear equality constraints with the linear equality constraint Axd¯+d¯=pAx-\overline{d}+\underline{d}=p. Since the objective functions are identical, it is sufficient to prove that any solution of problem 2 is a feasible point for problem 3 and vice versa.

Let v:=(x,d¯,d¯)v^{\star}:=(x^{\star},\overline{d}^{\star},\underline{d}^{\star}) be a solution of problem 2. Then,

Axd¯+d¯p\displaystyle Ax^{\star}-\overline{d}^{\star}+\underline{d}^{\star}-p =(Axp)(Axp)++(Axp)\displaystyle=(Ax^{\star}-p)-(Ax^{\star}-p)_{+}+(Ax^{\star}-p)_{-}
=(Axp)(max(Axp,0)+min(Axp,0))=0,\displaystyle=(Ax^{\star}-p)-(\max(Ax^{\star}-p,0)+\min(Ax^{\star}-p,0))=0,

because any real function can be written as the sum of its nonnegative and nonpositive parts. Thus, vv^{\star} is feasible for problem 3, and since the two problems have the same objective function, vv^{\star} is a solution of problem 3.

Now let us show the converse. First, we will relax the equality constraints in problem 2 to get

minimizef(d¯,d¯)subject tod¯(Axp)+,d¯(Axp),Bxcx0,d¯0,d¯0.\begin{array}[]{ll}\mbox{minimize}&f(\overline{d},\underline{d})\\ \mbox{subject to}&\overline{d}\geq(Ax-p)_{+},\quad\underline{d}\geq(Ax-p)_{-},\quad Bx\leq c\\ &x\geq 0,\quad\overline{d}\geq 0,\quad\underline{d}\geq 0.\end{array} (14)

This relaxed problem is equivalent to problem 2 because the relaxed constraints, d¯(Axp)+\overline{d}\geq(Ax-p)_{+} and d¯(Axp)\underline{d}\geq(Ax-p)_{-}, are tight at the optimum, since the objective function 1 is monotonically increasing over the nonnegative reals. Any feasible point (x^,d¯^,d¯^)(\hat{x},\hat{\overline{d}},\hat{\underline{d}}) of problem 14 that does not satisfy d¯^=(Ax^p)+\hat{\overline{d}}=(A\hat{x}-p)_{+} and d¯^=(Ax^p)\hat{\underline{d}}=(A\hat{x}-p)_{-} cannot be optimal, as we can perturb it to obtain another feasible point with a strictly lower objective value.

For example, suppose d¯^i>min(Ax^p,0)i\hat{\underline{d}}_{i}>\min(A\hat{x}-p,0)_{i} for a voxel i{1,,m}i\in\{1,\ldots,m\}, then we can decrease d¯^i\hat{\underline{d}}_{i} by some small quantity δ>0\delta>0 to get d¯~:=d¯^δei0\tilde{\underline{d}}:=\hat{\underline{d}}-\delta e_{i}\geq 0 that still satisfies the inequality constraint. (Here ei{0,1}ne_{i}\in\{0,1\}^{n} is the indicator vector of index ii, which equals 1 at ii and 0 elsewhere). The new point (x^,d¯^,d¯~)(\hat{x},\hat{\overline{d}},\tilde{\underline{d}}) is feasible for problem 14, but attains a lower objective f(d¯^,d¯~)<f(d¯^,d¯^)f(\hat{\overline{d}},\tilde{\underline{d}})<f(\hat{\overline{d}},\hat{\underline{d}}) because the quadratic term w¯id¯~i2<w¯id¯^i2\underline{w}_{i}\tilde{\underline{d}}_{i}^{2}<\underline{w}_{i}\hat{\underline{d}}_{i}^{2} by our choice of δ>0\delta>0, assuming w¯i0\underline{w}_{i}\neq 0. If w¯i=0\underline{w}_{i}=0, the underdose to voxel ii has no impact on the objective and can be removed entirely from the optimization problem.

It suffices then to show that any solution v=(x,d¯,d¯)v^{\star}=(x^{\star},\overline{d}^{\star},\underline{d}^{\star}) of problem 3 is a solution of problem 14. This is true because Axd¯+d¯=pAx^{\star}-\overline{d}^{\star}+\underline{d}^{\star}=p and d¯0,d¯0\overline{d}^{\star}\geq 0,\underline{d}^{\star}\geq 0 imply

d¯\displaystyle\overline{d}^{\star} =Axp+d¯Axp,\displaystyle=Ax^{\star}-p+\underline{d}^{\star}\geq Ax^{\star}-p,
d¯\displaystyle\underline{d}^{\star} =(Axp)+d¯(Axp).\displaystyle=-(Ax^{\star}-p)+\overline{d}^{\star}\geq-(Ax^{\star}-p).

Thus, we can conclude that

d¯\displaystyle\overline{d}^{\star} max(Axp,0)=(Axp)+,\displaystyle\geq\max(Ax^{\star}-p,0)=(Ax^{\star}-p)_{+},
d¯\displaystyle\underline{d}^{\star} max((Axp),0)=min(Axp,0)=(Axp,0).\displaystyle\geq\max(-(Ax^{\star}-p),0)=-\min(Ax^{\star}-p,0)=(Ax^{\star}-p,0)_{-}.

Appendix B Delivery efficiency and plan quality metrics

In this section, we present further results on the delivery time and other clinical metrics. Figure 9 is an extension of Figure 8 in the main text, depicting the relative percentage change in total delivery time, number of nonzero spots/energy layers, and several dosimetric quantities at 10%10\% cost to plan quality for patients 11, 33, and 44. Table 3 provides the absolute values for the metrics plotted in the two figures. Together, they show that reweighted l1l_{1} regularization outperforms l1l_{1} and group l2l_{2} regularization in all patients.

Refer to caption
Figure 9: Relative change in delivery time and other plan metrics with respect to the unregularized model for l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularization. For each regularizer, λ\lambda was chosen such that the regularized model resulted in about 10%10\% cost to plan quality. Here the conformity index (CI) is defined as the total number of voxels that received at least 95%95\% of the prescribed dose, divided by the number of voxels in the PTV [PSL17].
Unregularized l1l_{1} Group l2l_{2} Reweighted l1l_{1}
Patient 1 Num. Nonzero Spots 1796 1102 4264 1061
Num. Nonzero Layers 55 53 48 35
Delivery Time (s) 155.41 144.49 166.16 108.26
PTV: D2% (Gy) 71.90 73.10 73.18 72.70
PTV: D98% (Gy) 68.17 68.46 68.29 68.34
PTV: CI 1.40 1.37 1.38 1.40
Mandible: Dmax (Gy) 67.49 68.50 67.62 68.77
Parotids: Dmean (Gy) 11.50 8.96 11.53 7.95
Patient 2 Num. Nonzero Spots 2295 1182 5719 976
Num. Nonzero Layers 67 61 52 33
Delivery Time (s) 184.28 161.21 188.67 103.43
PTV: D2% (Gy) 70.37 73.49 73.61 73.30
PTV: D98% (Gy) 68.21 67.62 67.73 67.93
PTV: CI 1.49 1.44 1.45 1.48
Mandible: Dmax (Gy) 64.86 63.84 67.36 66.13
Parotids: Dmean (Gy) 2.04 1.29 3.63 1.49
Patient 3 Num. Nonzero Spots 1751 1247 3738 1172
Num. Nonzero Layers 58 55 50 41
Delivery Time (s) 160.93 149.92 164.88 121.31
PTV: D2% (Gy) 70.54 72.44 71.95 72.12
PTV: D98% (Gy) 67.70 66.97 67.06 67.14
PTV: CI 1.84 1.78 1.78 1.83
Mandible: Dmax (Gy) 67.06 67.53 65.91 67.11
Parotids: Dmean (Gy) 13.41 13.54 13.37 13.51
Patient 4 Num. Nonzero Spots 259 195 480 169
Num. Nonzero Layers 37 35 36 27
Delivery Time (s) 104.22 99.60 104.44 83.42
PTV: D2% (Gy) 71.27 72.47 71.77 71.88
PTV: D98% (Gy) 68.28 67.71 67.95 67.88
PTV: CI 1.59 1.48 1.54 1.56
Mandible: Dmax (Gy) 64.03 62.29 63.32 64.55
Parotids: Dmean (Gy) 7.01 5.83 6.79 6.14
Table 3: Values of various metrics under the unregularized plan and the l1l_{1}, group l2l_{2}, and reweighted l1l_{1} regularized plans. For each patient and regularizer, λ\lambda was chosen such that the regularized plan resulted in a 10%10\% cost to plan quality relative to the unregularized plan (e.g., as shown by the dotted line in Figure 5).

B.1 Robustness analysis

To analyze the robustness of the proton plans, we generated 1313 uncertainty scenarios (including the nominal scenario) by rescaling the stopping power ratio (SPR) image ±3.5%\pm 3.5\% to simulate range over/undershoot errors [TBD+18, Pag12] and shifting the isocenter ±3\pm 3 mm in the xx, yy, and zz directions to simulate setup errors. For each scenario r{1,,13}r\in\{1,\ldots,13\}, we computed the dose influence matrix Ar𝐑+m×nA^{r}\in\mathbf{R}_{+}^{m\times n}. We solved the unregularized treatment planning problem 3 using only the dose influence matrix of the nominal scenario to get the optimal nominal spot vector xunregx_{unreg}^{\star} and calculated the dose in each scenario to be dunregr=Arxunregd_{unreg}^{r}=A^{r}x_{unreg}^{\star}. We repeated this process, again using only the nominal dose influence matrix in the optimization, with the reweighted l1l_{1} regularization method to obtain doses drewl1r=Arxrewl1d_{rewl_{1}}^{r}=A^{r}x_{rewl_{1}}. (As before, λ\lambda was chosen such that the reweighted l1l_{1} plan results in a 10%10\% cost to plan quality relative to the unregularized plan). In the end, for each patient, we had two sets of dose vectors representing the potential scenario outcomes if we were to treat using the unregularized plan and the reweighted l1l_{1} regularized plan: (dunreg1,,dunreg13)(d_{unreg}^{1},\ldots,d_{unreg}^{13}) and (drewl11,,drewl113)(d_{rewl_{1}}^{1},\ldots,d_{rewl_{1}}^{13}).

Figures 1013 depict the DVH bands resulting from these sets of doses for each patient. Every band delineates the range of DVH curves for a particular structure, taken across all uncertainty scenarios. The corresponding solid line is the DVH curve in the nominal scenario. Generally, the DVH bands of the reweighted l1l_{1} regularized plan are similar to those of the unregularized plan. For patients 11 and 44, the reweighted l1l_{1} plan results in a slightly narrower PTV band (smaller range of uncertainty), while the opposite is true for patients 22 and 33. In the case of OARs, the most noticeable difference is the reweighted l1l_{1} plan exhibits thinner bands for the mandible in patient 11, but wider bands for the left parotid in patient 33. All other structures have comparable DVH bands between the unregularized and the reweighted l1l_{1} regularized plans. We also provide the median and range of several clinical metrics in Table 4. These values support our conclusion that the reweighted l1l_{1} method produces a treatment plan with similar robustness to the plan generated by the unregularized model, and its resulting dose satisfies clinical constraints in the majority of our uncertainty scenarios.

Refer to caption
Figure 10: DVH bands across all uncertainty scenarios obtained from the unregularized plan and the reweighted l1l_{1} regularized plan for patient 11. The solid lines indicate the DVH curves of the nominal scenario.
Refer to caption
Figure 11: DVH bands across all uncertainty scenarios obtained from the unregularized plan and the reweighted l1l_{1} regularized plan for patient 22. The solid lines indicate the DVH curves of the nominal scenario.
Refer to caption
Figure 12: DVH bands across all uncertainty scenarios obtained from the unregularized plan and the reweighted l1l_{1} regularized plan for patient 33. The solid lines indicate the DVH curves of the nominal scenario.
Refer to caption
Figure 13: DVH bands across all uncertainty scenarios obtained from the unregularized plan and the reweighted l1l_{1} regularized plan for patient 44. The solid lines indicate the DVH curves of the nominal scenario.
Unregularized Reweighted l1l_{1}
Median Range Median Range
Patient 1 PTV: D2% (Gy) 72.73 4.84 73.36 4.42
PTV: D98% (Gy) 64.06 12.69 62.96 13.60
Mandible: Dmax (Gy) 69.58 4.55 70.91 1.80
Left Parotid: Dmean (Gy) 25.70 9.42 17.76 8.14
Patient 2 PTV: D2% (Gy) 71.03 2.48 73.58 3.97
PTV: D98% (Gy) 64.89 9.07 64.18 9.89
Mandible: Dmax (Gy) 65.80 19.66 67.26 18.00
Left Parotid: Dmean (Gy) 1.15 1.16 2.32 2.08
Right Parotid: Dmean (Gy) 2.71 3.22 0.87 1.55
Patient 3 PTV: D2% (Gy) 71.66 4.08 72.73 5.33
PTV: D98% (Gy) 64.30 11.23 62.54 12.01
Mandible: Dmax (Gy) 67.33 5.63 67.12 3.31
Left Parotid: Dmean (Gy) 25.61 6.28 25.80 9.45
Spinal Cord: Dmax (Gy) 26.79 35.76 23.50 34.56
Patient 4 PTV: D2% (Gy) 71.29 7.97 71.85 7.62
PTV: D98% (Gy) 64.21 12.14 63.16 11.83
Mandible: Dmax (Gy) 69.70 4.94 70.22 3.69
Right Parotid: Dmean (Gy) 10.92 8.98 9.65 8.27
Constrictors: Dmean (Gy) 4.58 5.70 4.55 5.72
Table 4: The median and range (maximum - minimum) of various metrics calculated across 1313 uncertainty scenarios, using the unregularized plan and the reweighted l1l_{1} regularized plan optimized with respect to the nominal scenario. For each patient, λ\lambda was chosen such that the regularized plan resulted in a 10%10\% cost to plan quality relative to the unregularized plan.