
Model Predictive Manipulation of Compliant Objects with Multi-Objective Optimizer and Adversarial Network for Occlusion Compensation

Jiaming Qi, Dongyu Li, Yufeng Gao, Peng Zhou, and David Navarro-Alarcon

This work is supported in part by the Research Grants Council (RGC) of Hong Kong under grants 14203917 and 15212721, in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B090928001, and in part by the Jiangsu Industrial Technology Research Institute Collaborative Funding Scheme under grant 43-ZG9V.
J. Qi and Y. Gao are with the Harbin Institute of Technology, Department of Control Science and Engineering, Harbin, Heilongjiang, China (e-mail: [email protected], [email protected]).
D. Li is with Beihang University, School of Cyber Science and Technology, Beijing, China (e-mail: [email protected]).
P. Zhou and D. Navarro-Alarcon are with The Hong Kong Polytechnic University, Department of Mechanical Engineering, Kowloon, Hong Kong (e-mail: [email protected], [email protected]).
Abstract

The robotic manipulation of compliant objects is currently one of the most active problems in robotics due to its potential to automate many important applications. Despite the progress achieved by the robotics community in recent years, the 3D shaping of these types of materials remains an open research problem. In this paper, we propose a new vision-based controller to automatically regulate the shape of compliant objects with robotic arms. Our method uses an efficient online surface/curve fitting algorithm that quantifies the object's geometry with a compact vector of features; This feedback-like vector enables the establishment of an explicit shape servo-loop. To coordinate the motion of the robot with the computed shape features, we propose a receding-time estimator that approximates the system's sensorimotor model while satisfying various performance criteria. A deep adversarial network is developed to robustly compensate for visual occlusions in the camera's field of view, which makes it possible to guide the shaping task even with partial observations of the object. Model predictive control is utilized to compute the robot's shaping motions subject to workspace and saturation constraints. A detailed experimental study is presented to validate the effectiveness of the proposed control framework.

Index Terms:
Robotics; Visual Servoing; Deformable Objects; Occlusion Compensation; Model Predictive Control

I Introduction

The manipulation/shaping of deformable bodies by robots is a fundamental problem that has recently attracted the attention of many researchers [1]; Its complexity has forced researchers to develop new methods in a wide range of fundamental areas that include representation, learning, planning, and control. From an applied research perspective, this challenging problem has shown great potential in various economically-important tasks such as the assembly of compliant/delicate objects [2], surgical/medical robotics [3], cloth/fabric folding [4], etc. The manipulation of compliant materials contrasts with its rigid-body counterpart in that physical interactions invariably change the object's shape, which introduces additional degrees of freedom to the typically unknown objects, and hence, complicates the manipulation task. While great progress has been achieved in recent years, the development of these types of embodied manipulation capabilities is still largely considered an open research problem in robotics and control.

There are various technical issues that hamper the implementation of these tasks in real-world unstructured environments, which largely differ from implementations in ideal simulation environments (a trend followed by many recent works). Here, we argue that to effectively servo-control the shape of deformable materials in the field, a sensor-based controller must possess the following features: 1) Efficient compression of the object's high-dimensional shape; 2) Occlusion-tolerant estimation of the object's geometry; 3) Adaptive prediction of the differential shape-motion model. Our aim in this paper is precisely to develop a new shape controller endowed with all the above-mentioned features. The proposed method is formulated under the model predictive control (MPC) framework, which enables the computation of shaping actions that satisfy multiple performance criteria (a key property for real engineering applications).

I-A Related Work

Many researchers have previously studied this challenging problem (we refer the reader to [5, 6] for comprehensive reviews). To servo-control the object's non-rigid shape, it is essential to design a low-dimensional feature representation that can capture the key geometric properties of the object. Several representation methods have been proposed before, e.g., geometric features based on points, angles, curvatures, etc. [7, 8]; However, due to their hard-coded nature, these methods can only be used to represent a single shaping action. Other geometric features computed from contours and centerlines [9, 10, 11] can represent soft object deformation in a more general way. Various data-driven approaches have also been proposed to represent shapes, e.g., using fast point feature histograms [12], bottleneck layers [13, 14], principal component analysis [15], etc. However, there is no widely accepted approach to compute efficient/compact feature representations for 3D shapes; This is still an open research problem.

Occlusions of a camera's field of view pose many complications to the implementation of visual servoing controllers, as the computation of (standard) feedback features requires complete visual observations of the object at all times. Many methods have been developed to tackle this critical issue, e.g., [16] used the estimated interaction matrix to handle information loss in visual servoing; yet, this method requires a calibrated visual-motor model. Coherent point drift was utilized in [17] to register the topological structure from previous sequences and to predict occlusions; However, this method is sensitive to the initial point set, which further affects the registration result. A structure-preserved registration method was presented in [18] to track occluded objects; This approach has good accuracy and robustness to noise, yet its efficiency decreases with the number of points. To efficiently implement vision-based strategies in the field, it is essential to develop algorithms that can robustly guide the servoing task, even in the presence of occlusions.

To visually guide the manipulation task, control methods must have some form of model that (at least approximately) describes how the input robot motions produce output shape changes [19]; In the visual servoing community, such a differential relation is typically captured by the so-called interaction (Jacobian) matrix [20]. Many methods have been proposed to address this issue, e.g., the Broyden update rule [21] is a classical algorithm to iteratively estimate this transformation matrix [7, 22, 23]. Although these types of algorithms do not require knowledge of the model's structure, their estimation properties are only valid locally. Other approaches with global estimation properties include algorithms based on (deep) artificial neural networks [2], optimization-based algorithms [24], adaptive estimators [25], etc. However, the majority of existing methods estimate this model based on a single performance criterion (typically, a first-order Jacobian-like relation), which limits the types of dynamic responses that the robot can achieve during a task. A multi-objective model estimation is particularly important in the manipulation of compliant materials, as their mechanical properties are rarely known in practice, thus making it hard to meet various performance requirements.

To compute the active shaping motions, most control methods only formulate the problem in terms of the final target shape and do not typically consider the system’s physical constraints. MPC represents a feasible solution to these issues, as it performs the control tasks by optimizing cost functions over a finite time-horizon rather than finding the exact analytical solution [26]; This allows MPC to compute controls that guide the task while satisfying a set of constraints, e.g., control saturation, workspace bounds, etc. Despite its valuable and flexible properties, MPC has not been sufficiently studied in the context of shape control of deformable objects.

I-B Our Contribution

To solve the above-mentioned issues, in this paper we propose a new control framework to manipulate purely-elastic objects into desired configurations. The original features of our new methodology include: 1) A parametric shape descriptor to efficiently characterize 3D deformations based on online curve/surface fitting; 2) A robust shape prediction network based on adversarial neural networks to compensate for visual occlusions; 3) An optimization-based estimator to approximate the deformation Jacobian matrix while satisfying various performance constraints; 4) An MPC-based motion controller to guide the shaping motions while simultaneously satisfying workspace and saturation constraints.

To the best of the authors' knowledge, this is the first time that a shape servoing controller has been developed with all the functions proposed in this paper. To validate the effectiveness of our new methodology, we report a detailed experimental study with a robotic platform manipulating various types of compliant objects.

Figure 1: Schematic diagram of the manipulation of elastic objects, including deformation (centerline, contour, and surface) and positioning tasks of rigid objects. The framework aims to command the robot to manipulate elastic objects into the target configurations.
Figure 2: Various shape configurations, including the centerline, contour, and surface. The first row shows the pixel coordinates for each shape. The second row shows the Cartesian coordinates related to the corresponding pixel ones through OpenCV/RealSense. The generated shapes are ordered, fixed-sampled, and equidistant. The rigid object adopts the surface representation.

II Problem Formulation

Notation: In this paper, we use the following frequently-used notation: Bold small letters, e.g., $\mathbf{v}$, denote column vectors, while bold capital letters, e.g., $\mathbf{M}$, denote matrices. Time-evolving variables are denoted as $\mathbf{x}_k$, for $k$ as the discrete time instant. The $n\times m$ matrix of ones is denoted by $\mathbf{I}_{n\times m}$ and the identity matrix by $\mathbf{E}_n$. $\mathbf{L}_n$ represents the lower-triangular part of $\mathbf{I}_{n\times n}$, and $\otimes$ represents the Kronecker product.

The schematic diagram of the proposed shape servoing framework is conceptually illustrated in Fig. 1. A depth camera with eye-to-hand configuration observes the shapes of elastic objects that are manipulated by the robot, see Fig. 2. We denote the 3D measurement points captured by the vision system as:

$\bar{\mathbf{c}}=\left[\mathbf{c}_1^\intercal,\ldots,\mathbf{c}_N^\intercal\right]^\intercal\in\mathbb{R}^{3N},\qquad \mathbf{c}_i=\left[x_i,y_i,z_i\right]^\intercal\in\mathbb{R}^3$ (1)

for $N$ as the number of points, and $\mathbf{c}_i$ as the 3D coordinates of the $i$th point, expressed in the camera frame.

II-A Feature Sensorimotor Model

Let us denote the position of the robot's end-effector by $\mathbf{r}\in\mathbb{R}^3$. As the dimension $3N$ of the observed points $\bar{\mathbf{c}}$ is typically very large, their direct use as a feedback signal for shape servo-control is impractical. Therefore, an efficient controller typically requires some form of dimension reduction technique. To deal with this issue, we construct a feature vector $\mathbf{s}=\mathbf{f}_s(\bar{\mathbf{c}}):\mathbb{R}^{3N}\mapsto\mathbb{R}^p$, for $p\ll 3N$, to represent the geometric feedback $\bar{\mathbf{c}}$ of the object. This compact feedback-like signal will be used to design our automatic 3D shape controller.

For purely elastic objects undergoing deformations, it is reasonable to assume that the object's configuration depends only on its potential energy $\mathcal{P}$ (thus, all inertial and viscous effects are neglected from our analysis [10]). We model $\mathcal{P}$ as fully determined by the feedback feature vector $\mathbf{s}$ and the robot's position $\mathbf{r}$ [27], i.e., $\mathcal{P}=\mathcal{P}(\mathbf{s},\mathbf{r})$. In steady state, the extremum expression satisfies $(\partial\mathcal{P}/\partial\mathbf{s})^\intercal=\mathbf{a}(\mathbf{s},\mathbf{r})=\mathbf{0}$ [28]. We then define the following matrices:

$\frac{\partial^2\mathcal{P}}{\partial\mathbf{s}^2}=\mathbf{G}(\mathbf{s},\mathbf{r}),\qquad \frac{\partial^2\mathcal{P}}{\partial\mathbf{r}\,\partial\mathbf{s}}=\mathbf{K}(\mathbf{s},\mathbf{r})$ (2)

which are useful to linearize the extremum equation (by using first-order Taylor’s series expansion) as follows:

$\mathbf{a}(\mathbf{s}+\Delta\mathbf{s},\mathbf{r}+\Delta\mathbf{r})\approx\mathbf{a}(\mathbf{s},\mathbf{r})+\mathbf{G}(\mathbf{s},\mathbf{r})\Delta\mathbf{s}+\mathbf{K}(\mathbf{s},\mathbf{r})\Delta\mathbf{r}$ (3)

for $\Delta\mathbf{s}$ and $\Delta\mathbf{r}$ as small changes. Note that as $\mathbf{a}(\mathbf{s}+\Delta\mathbf{s},\mathbf{r}+\Delta\mathbf{r})=\mathbf{a}(\mathbf{s},\mathbf{r})=\mathbf{0}$ is satisfied, we can obtain the following motion model:

$\Delta\mathbf{s}\approx-\mathbf{G}(\mathbf{s},\mathbf{r})^{-1}\mathbf{K}(\mathbf{s},\mathbf{r})\Delta\mathbf{r}=\mathbf{J}(\mathbf{s},\mathbf{r})\mathbf{u}$ (4)

where $\mathbf{J}(\mathbf{s},\mathbf{r})=-\mathbf{G}(\mathbf{s},\mathbf{r})^{-1}\mathbf{K}(\mathbf{s},\mathbf{r})\in\mathbb{R}^{p\times 3}$ represents the deformation Jacobian matrix (which depends on both the feature vector and the robot position), and $\mathbf{u}$ represents the robot's motion control input. This model can be expressed in an intuitive discrete-time form:

$\Delta\mathbf{s}_{k+1}=\mathbf{J}_k(\mathbf{s}_k,\mathbf{r}_k)\mathbf{u}_k$ (5)

The deformation Jacobian matrix (DJM) $\mathbf{J}_k$ indicates how the robot's action $\mathbf{u}_k=\mathbf{r}_k-\mathbf{r}_{k-1}$ produces changes in the feedback features $\Delta\mathbf{s}_{k+1}=\mathbf{s}_{k+1}-\mathbf{s}_k$. Clearly, the analytical computation of $\mathbf{J}_k$ requires knowledge of the physical properties and models of the elastic object and the vision system, which are difficult to obtain in practice. Thus, numerical methods are often used to approximate this matrix in real time, which enables vision-guided manipulation tasks to be performed.

In this paper, we consider a robot manipulator whose control inputs represent velocity commands (here, modelled as the differential changes $\Delta\mathbf{r}$). It is assumed that $\Delta\mathbf{r}$ can be instantaneously executed without delay [11].

Problem statement. Design a vision-based control method to automatically manipulate a compliant object into a target 3D configuration, while simultaneously compensating for visual occlusions of the camera and estimating the deformation Jacobian matrix of the object-robot system.

III Shape Representation

This section presents how online curve/surface fitting (least-squares minimization (LSM) [29] and moving least squares (MLS) [30]) is combined with a parametric shape descriptor to compute a compact vector of feedback shape features.

III-A LSM-Based Features

III-A1 Centerline and Contour Extraction

Both are expressed as a parametric curve dependent on the normalized arc-length $0\leq\rho\leq 1$. A point can then be represented as $\mathbf{c}_i=\mathbf{f}(\rho_i)$, with $\rho_i$ as the normalized arc-length between the start point $\mathbf{c}_1$ and $\mathbf{c}_i$, where $\rho_1=0$ and $\rho_N=1$. The fitting function $\mathbf{f}(\cdot)$ is constructed as follows:

$\mathbf{f}(\rho)=\sum_{j=0}^{n}\mathbf{p}_j B_{j,n}(\rho)$ (6)

where the vector $\mathbf{p}_j\in\mathbb{R}^3$ denotes the shape weights, $n\in\mathbb{N}^*$ specifies the fitting order, and the scalar $B_{j,n}(\rho)$ represents a parametric regression function, which may take various forms, such as [31]:

  • Polynomial parameterization [32]:

    $B_{j,n}(\rho)=\rho^j$ (7)

  • Bernstein parameterization [33]:

    $B_{j,n}(\rho)=C_n^j(1-\rho)^{n-j}\rho^j$ (8)

    where $C_n^j$ represents the binomial coefficient.

  • Cox-deBoor parameterization [34]:

    $B_{j,n}(\rho)=\frac{1}{n!}\sum_{l=0}^{n-j}\left((-1)^l C_{n+1}^l(\rho+n-j-l)^n\right)$ (9)

  • Trigonometric parameterization [35]:

    $B_{j,n}(\rho)=\begin{cases}1,& j=0\\ \cos(\frac{j+1}{2}\rho),& j>0,\ j\ \text{is odd}\\ \sin(\frac{j}{2}\rho),& j>0,\ j\ \text{is even}\end{cases}$ (13)

From (1) and (6), we can compute the following fitting cost function:

$Q=\left(\mathbf{B}\mathbf{s}-\bar{\mathbf{c}}\right)^\intercal\left(\mathbf{B}\mathbf{s}-\bar{\mathbf{c}}\right)$ (14)

for a “tall” regression-like matrix $\mathbf{B}$ constructed as:

$\mathbf{B}=[\mathbf{B}_1^\intercal,\ldots,\mathbf{B}_N^\intercal]^\intercal\in\mathbb{R}^{3N\times 3(n+1)},\qquad \mathbf{B}_i=[B_{0,n}(\rho_i),\ldots,B_{n,n}(\rho_i)]\otimes\mathbf{E}_3\in\mathbb{R}^{3\times 3(n+1)}$ (15)

and $\mathbf{s}=[\mathbf{p}_0^\intercal,\ldots,\mathbf{p}_n^\intercal]^\intercal\in\mathbb{R}^{3(n+1)}$ as a compact feedback feature vector that represents the shape. We seek to minimize (14) to obtain a feature vector $\mathbf{s}$ that closely approximates $\bar{\mathbf{c}}\approx\mathbf{B}\mathbf{s}$. The solution to (14) is:

$\mathbf{s}=\left(\mathbf{B}^\intercal\mathbf{B}\right)^{-1}\mathbf{B}^\intercal\bar{\mathbf{c}}$ (16)

where it is assumed that $N\gg n+1$. (In the following sections, $Q$ is used to generically represent different, albeit related, functions.)
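As a concrete illustration, the following Python/NumPy sketch shows one possible implementation of the LSM feature extraction (6)-(16) with the Bernstein basis (8); the function names and the example fitting order are our own illustrative choices, not part of the reported implementation.

import numpy as np
from scipy.special import comb

def bernstein_basis(rho, n):
    # Row vector [B_{0,n}(rho), ..., B_{n,n}(rho)] of Bernstein polynomials, eq. (8)
    j = np.arange(n + 1)
    return comb(n, j) * (1.0 - rho) ** (n - j) * rho ** j

def lsm_curve_features(points, n=5):
    # points: (N, 3) array of ordered, equidistant samples c_1, ..., c_N
    # Returns the compact feature vector s of dimension 3*(n+1), eq. (16)
    N = points.shape[0]
    rho = np.linspace(0.0, 1.0, N)                     # normalized arc-length
    # Stack B_i = basis(rho_i) (x) E_3 into the tall regression matrix B, eq. (15)
    B = np.vstack([np.kron(bernstein_basis(r, n), np.eye(3)) for r in rho])
    c_bar = points.reshape(-1)                         # c_bar = [c_1^T, ..., c_N^T]^T
    # Least-squares solution of (14), equivalent to s = (B^T B)^{-1} B^T c_bar
    s, *_ = np.linalg.lstsq(B, c_bar, rcond=None)
    return s

The reconstructed shape is then simply $\mathbf{B}\mathbf{s}$, which can be compared against the raw measurement to monitor the fitting residual.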

III-A2 Surface Extraction

The equation of the surface is defined as follows:

$z=f(x,y)=\sum_{j=0}^{n_x}\sum_{l=0}^{n_y}B_{j,n_x}(x)B_{l,n_y}(y)q_{jl}$ (17)

where $n_x,n_y\in\mathbb{N}^*$ are the fitting orders along the $x$ and $y$ directions, and $q_{jl}\in\mathbb{R}$ is the shape weight. As in (14), a fitting cost function is introduced:

$Q=\left(\mathbf{D}\mathbf{s}-\mathbf{z}\right)^\intercal\left(\mathbf{D}\mathbf{s}-\mathbf{z}\right)$ (18)

for an “augmented” regression matrix $\mathbf{D}$ satisfying:

$\mathbf{D}=[\mathbf{D}_1^\intercal,\ldots,\mathbf{D}_N^\intercal]^\intercal\in\mathbb{R}^{N\times(n_x+1)(n_y+1)},\qquad \mathbf{D}_i=[B_{0,n_x}(x_i)B_{0,n_y}(y_i),\ldots,B_{n_x,n_x}(x_i)B_{n_y,n_y}(y_i)]$, with $\mathbf{D}_i^\intercal\in\mathbb{R}^{(n_x+1)(n_y+1)}$, for $i=1,\ldots,N$ (19)

The depth vector is defined as $\mathbf{z}=[z_1,\ldots,z_N]^\intercal\in\mathbb{R}^N$, and the feature vector is $\mathbf{s}=[q_{00},\ldots,q_{n_x n_y}]^\intercal\in\mathbb{R}^{(n_x+1)(n_y+1)}$. The solution to the minimization of (18) that approximates $\mathbf{z}\approx\mathbf{D}\mathbf{s}$ is as follows:

$\mathbf{s}=\left(\mathbf{D}^\intercal\mathbf{D}\right)^{-1}\mathbf{D}^\intercal\mathbf{z}$ (20)

for $N\gg(n_x+1)(n_y+1)$.
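For the surface case, a minimal sketch of (17)-(20) with the polynomial basis (7) could look as follows; again, the function names and the default orders are illustrative assumptions.

import numpy as np

def poly_basis(t, n):
    # Polynomial basis (7): [1, t, ..., t^n]
    return t ** np.arange(n + 1)

def lsm_surface_features(points, nx=2, ny=2):
    # points: (N, 3) array of [x_i, y_i, z_i]; assumes N >> (nx+1)(ny+1)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Each row D_i holds all products B_{j,nx}(x_i) * B_{l,ny}(y_i), eq. (19)
    D = np.vstack([np.kron(poly_basis(xi, nx), poly_basis(yi, ny))
                   for xi, yi in zip(x, y)])
    # Least-squares solution of (18), i.e., s = (D^T D)^{-1} D^T z, eq. (20)
    s, *_ = np.linalg.lstsq(D, z, rcond=None)
    return s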

Figure 3: Schematic diagram of MLS. The weight $\omega(\varepsilon_i)$ scales the squared error between the fitted value and the given value. The fitting smoothness is regulated by adjusting the weight values.

III-B MLS-based Feature Extraction

Although LSM has an efficient one-step calculation, the weights of $B_{j,n}$ are the same over the entire parameter domain (e.g., $\rho$ or $x,y$). Thus, to approximate complex curves/surfaces, the fitting order needs to be increased, which may lead to over-fitting problems. To address this issue, MLS assumes that $\mathbf{s}$ is parameter-dependent, i.e., it changes with respect to $\rho$ (for the centerline and contour), or to $(x,y)$ (for the surface). This enables MLS to represent complex shapes with a lower fitting order. A support field [36] is introduced to ensure that the value of each weight is only influenced by data points within the support field. A conceptual diagram is given in Fig. 3.

III-B1 Centerline and Contour Extraction

Referring to (6), the parametric equation with $\boldsymbol{\sigma}_j(\rho)$ is:

$\mathbf{f}(\rho)=\sum_{j=0}^{n}\boldsymbol{\sigma}_j(\rho)B_{j,n}(\rho)$ (21)

where the parameter-dependent weight is denoted as $\boldsymbol{\sigma}_j(\rho)\in\mathbb{R}^3$. The weighted squared residual function is defined as:

$Q(\rho)=\sum_{i=1}^{N}\omega(\varepsilon_i)\Big\|\sum_{j=0}^{n}\boldsymbol{\sigma}_j(\rho)B_{j,n}(\rho_i)-\mathbf{c}_i\Big\|^2$ (22)

where $\varepsilon_i=|\rho-\rho_i|/d>0$, and $d$ is the constant support field radius. The scalar function $\omega(\varepsilon_i)$ is calculated as follows [37]:

$\omega(\varepsilon_i)=\begin{cases}\frac{2}{3}-4\varepsilon_i^2+4\varepsilon_i^3,&\varepsilon_i\leq 0.5\\ \frac{4}{3}-4\varepsilon_i+4\varepsilon_i^2-\frac{4}{3}\varepsilon_i^3,& 0.5<\varepsilon_i\leq 1\\ 0,&\varepsilon_i>1\end{cases}$ (23)

The function $\omega(\varepsilon_i)$ indicates the weight of $\rho$ relative to $\rho_i$; $\omega(\varepsilon_i)$ decreases as $\varepsilon_i$ increases inside the support field. When $\rho$ is outside the support field, $\omega(\varepsilon_i)=0$. MLS reduces to LSM when $\omega(\varepsilon_i)$ is constant. The cost (22) can be equivalently expressed in matrix form as:

$Q(\rho)=\left(\mathbf{B}\boldsymbol{\vartheta}(\rho)-\bar{\mathbf{c}}\right)^\intercal\mathbf{W}(\varepsilon)\left(\mathbf{B}\boldsymbol{\vartheta}(\rho)-\bar{\mathbf{c}}\right)$ (24)

where $\boldsymbol{\vartheta}(\rho)$ and $\mathbf{W}(\varepsilon)$ are defined as follows:

$\boldsymbol{\vartheta}(\rho)=[\boldsymbol{\sigma}_0^\intercal(\rho),\ldots,\boldsymbol{\sigma}_n^\intercal(\rho)]^\intercal\in\mathbb{R}^{3(n+1)},\qquad \mathbf{W}(\varepsilon)=\operatorname{diag}(\omega(\varepsilon_1),\ldots,\omega(\varepsilon_N))\otimes\mathbf{E}_3\in\mathbb{R}^{3N\times 3N}$ (25)

The value of $\boldsymbol{\vartheta}(\rho)$ that minimizes (24) is computed as follows:

$\boldsymbol{\vartheta}(\rho)=(\mathbf{B}^\intercal\mathbf{W}(\varepsilon)\mathbf{B})^{-1}\mathbf{B}^\intercal\mathbf{W}(\varepsilon)\bar{\mathbf{c}}$ (26)

By completing $N$ iterations along the parameters $\rho_1,\ldots,\rho_N$, we can compute the augmented shape features as:

$\boldsymbol{\Pi}=[\boldsymbol{\vartheta}(\rho_1),\ldots,\boldsymbol{\vartheta}(\rho_N)]\in\mathbb{R}^{3(n+1)\times N}$ (27)

As the dimension of (27) is very large, it is impractical to use its components as a feedback signal for control. Thus, principal component analysis (PCA) [15] is used to reduce the augmented structure (27) into a compact form $\widetilde{\boldsymbol{\Pi}}\in\mathbb{R}^{3(n+1)\times m}$, where $m\ll N$, by selecting the $m$ most significant dimensions. The feature vector $\mathbf{s}\in\mathbb{R}^{3m(n+1)}$ is computed by vectorizing the elements of $\widetilde{\boldsymbol{\Pi}}$.
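The sketch below illustrates one possible reading of the MLS pipeline (21)-(27) for curves, using the weight function (23) and a simple SVD-based truncation in place of the PCA step; since the exact PCA variant is not detailed above, that reduction should be treated as an assumption.

import numpy as np

def mls_weight(eps):
    # Cubic spline weight (23); zero outside the support field
    w = np.zeros_like(eps)
    m1 = eps <= 0.5
    m2 = (eps > 0.5) & (eps <= 1.0)
    w[m1] = 2/3 - 4*eps[m1]**2 + 4*eps[m1]**3
    w[m2] = 4/3 - 4*eps[m2] + 4*eps[m2]**2 - (4/3)*eps[m2]**3
    return w

def mls_curve_features(points, n=4, d=0.2, m=1):
    # points: (N, 3) ordered samples; d: support radius; m: retained components
    N = points.shape[0]
    rho = np.linspace(0.0, 1.0, N)
    basis = lambda t: t ** np.arange(n + 1)            # polynomial basis (7)
    B = np.vstack([np.kron(basis(r), np.eye(3)) for r in rho])
    c_bar = points.reshape(-1)
    cols = []
    for r in rho:                                      # one weighted solve per rho_i, eq. (26)
        w = mls_weight(np.abs(r - rho) / d)
        W = np.kron(np.diag(w), np.eye(3))
        cols.append(np.linalg.lstsq(B.T @ W @ B, B.T @ W @ c_bar, rcond=None)[0])
    Pi = np.column_stack(cols)                         # augmented features, eq. (27)
    # PCA-like truncation: keep the m most significant directions of Pi
    U, S, _ = np.linalg.svd(Pi, full_matrices=False)
    Pi_tilde = U[:, :m] * S[:m]                        # 3(n+1) x m
    return Pi_tilde.reshape(-1)                        # feature vector s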

III-B2 Surface Extraction

The equation of the surface is constructed as:

$f(x,y)=\sum_{j=0}^{n_x}\sum_{l=0}^{n_y}B_{j,n_x}(x)B_{l,n_y}(y)\varpi_{jl}(x,y)$ (28)

where $\varpi_{jl}(x,y)\in\mathbb{R}$ is the parameter-dependent weight related to $(x,y)$. Algorithm 1 gives a pseudocode description of this method.

Algorithm 1 Surface fitting procedure of MLS.
1: Input: $\bar{\mathbf{c}}$, $n_x$, $n_y$, $m$, and $d$;
2: Calculate the node distance:
   $\varepsilon_i=\sqrt{(x-x_i)^2+(y-y_i)^2}/d$
3: Construct the fitting cost function:
   $Q=\left(\mathbf{D}\boldsymbol{\phi}(x,y)-\mathbf{z}\right)^\intercal\mathbf{W}(\varepsilon)\left(\mathbf{D}\boldsymbol{\phi}(x,y)-\mathbf{z}\right)$
   for $\boldsymbol{\phi}(x,y)$ and $\mathbf{W}(\varepsilon)$ satisfying:
   $\boldsymbol{\phi}=[\varpi_{00},\ldots,\varpi_{n_x n_y}]^\intercal,\qquad \mathbf{W}=\operatorname{diag}(\omega(\varepsilon_1),\ldots,\omega(\varepsilon_N))$ (29)
4: Compute the structure $\boldsymbol{\phi}(x,y)$:
   $\boldsymbol{\phi}=(\mathbf{D}^\intercal\mathbf{W}(\varepsilon)\mathbf{D})^{-1}\mathbf{D}^\intercal\mathbf{W}(\varepsilon)\mathbf{z}$
5: Use $x_i,y_i,\ i\in[1,N]$ to compute:
   $\boldsymbol{\Pi}=[\boldsymbol{\phi}(x_1,y_1),\ldots,\boldsymbol{\phi}(x_N,y_N)]$ (30)
6: Use PCA to calculate $\widetilde{\boldsymbol{\Pi}}$;
7: Vectorize $\widetilde{\boldsymbol{\Pi}}$ to obtain $\mathbf{s}$;
8: return $\mathbf{s}$;
Remark 1.

In addition to the proposed basis functions, there are other approaches that can be used to obtain similar results, e.g., Chebyshev polynomials [38] and Legendre basis transformations [39].

IV Shape Prediction Network

During the manipulation process, occlusions caused by obstacles or the robot itself may affect the integrity of the observed shapes, and hence, the vector $\mathbf{s}$ cannot properly describe the object's configuration. As a solution to this critical issue, in this paper we propose an occlusion-compensating shape prediction network (SPN), which is composed of a regulation input (RI), a multi-resolution encoder (MRE), and a discriminator network (DN) [40]. The proposed SPN utilizes the robot and object configurations and the active robot motions (i.e., $\bar{\mathbf{c}}_k$, $\mathbf{r}_k$, $\mathbf{u}_k$) as input to the network to predict the next-instant shape, here denoted by $\hat{\bar{\mathbf{c}}}_{k+1}$. Fig. 4 shows the overall architecture of the SPN.

IV-A Input Data Preprocessing

As the input data $\bar{\mathbf{c}}_k,\mathbf{r}_k,\mathbf{u}_k$ to the network have different sizes, they need to be rearranged into structures with unified dimensions. To this end, $\bar{\mathbf{c}}_k$ is first rearranged into the matrix $\mathbf{T}^{high}_k=[\mathbf{c}_1,\mathbf{c}_2,\ldots,\mathbf{c}_N]^\intercal\in\mathbb{R}^{N\times 3}$. Then, farthest point sampling (FPS) [41] is used to downsample $\mathbf{T}^{high}_k$ to two resolutions, $\mathbf{T}^{mid}_k\in\mathbb{R}^{\frac{N}{\delta}\times 3}$ and $\mathbf{T}^{low}_k\in\mathbb{R}^{\frac{N}{\delta^2}\times 3}$, for $\delta$ as the resolution scale. Finally, the vectors $\mathbf{r}_k$ and $\mathbf{u}_k$ are rearranged into the matrices $\boldsymbol{\Lambda}_k^{high}=\mathbf{I}_{N\times 1}\otimes\mathbf{r}_k^\intercal\in\mathbb{R}^{N\times 3}$ and $\boldsymbol{\Sigma}_k^{high}=\mathbf{I}_{N\times 1}\otimes\mathbf{u}_k^\intercal\in\mathbb{R}^{N\times 3}$, which are similarly downsampled into mid and low resolutions as follows:

$\boldsymbol{\Lambda}_k^{mid}=\mathbf{I}_{\frac{N}{\delta}\times 1}\otimes\mathbf{r}_k^\intercal,\qquad \boldsymbol{\Lambda}_k^{low}=\mathbf{I}_{\frac{N}{\delta^2}\times 1}\otimes\mathbf{r}_k^\intercal$, (31)
$\boldsymbol{\Sigma}_k^{mid}=\mathbf{I}_{\frac{N}{\delta}\times 1}\otimes\mathbf{u}_k^\intercal,\qquad \boldsymbol{\Sigma}_k^{low}=\mathbf{I}_{\frac{N}{\delta^2}\times 1}\otimes\mathbf{u}_k^\intercal$ (32)

Thus, three different resolutions are generated for the network: high $\{\mathbf{T}_k^{high},\boldsymbol{\Lambda}_k^{high},\boldsymbol{\Sigma}_k^{high}\}$, mid $\{\mathbf{T}_k^{mid},\boldsymbol{\Lambda}_k^{mid},\boldsymbol{\Sigma}_k^{mid}\}$, and low $\{\mathbf{T}_k^{low},\boldsymbol{\Lambda}_k^{low},\boldsymbol{\Sigma}_k^{low}\}$, which provide a total input data of dimension $[N\times 9,\frac{N}{\delta}\times 9,\frac{N}{\delta^2}\times 9]$. $\mathbf{T}_k^{high},\mathbf{T}_k^{mid},\mathbf{T}_k^{low}$ are the geometric shapes of the object under different compression sizes specified by $\delta$; thus, these three terms can describe the latent feature structure of the object. The proposed SPN aims to predict the object's shape that results from the robot's actions under the current object-robot configuration. By using this multi-resolution data {high, mid, low}, the encoder can better learn the latent feature information of the shapes.
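A small sketch of this preprocessing step (FPS downsampling plus the Kronecker-style tiling of $\mathbf{r}_k$ and $\mathbf{u}_k$) is given below; it assumes $N$ is divisible by $\delta^2$ and uses illustrative function names.

import numpy as np

def farthest_point_sampling(points, n_samples):
    # Greedy FPS [41]: iteratively pick the point farthest from the chosen set
    chosen = [0]
    dist = np.full(points.shape[0], np.inf)
    for _ in range(n_samples - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))
    return points[chosen]

def build_spn_inputs(c_bar, r_k, u_k, N=64, delta=2):
    # Arrange (c_bar_k, r_k, u_k) into the high/mid/low resolutions fed to the MRE
    T_high = c_bar.reshape(N, 3)
    T_mid = farthest_point_sampling(T_high, N // delta)
    T_low = farthest_point_sampling(T_high, N // delta**2)
    inputs = []
    for T in (T_high, T_mid, T_low):
        rows = T.shape[0]
        Lam = np.tile(r_k.reshape(1, 3), (rows, 1))    # Lambda = 1_{rows x 1} (x) r_k^T
        Sig = np.tile(u_k.reshape(1, 3), (rows, 1))    # Sigma  = 1_{rows x 1} (x) u_k^T
        inputs.append(np.hstack([T, Lam, Sig]))        # rows x 9
    return inputs                                      # [N x 9, N/delta x 9, N/delta^2 x 9]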

Figure 4: SPN predicts the next-moment shape $\bar{\mathbf{c}}_{k+1}$ by using the current-moment data in the case of occlusion, i.e., $(\bar{\mathbf{c}}_k,\mathbf{r}_k,\mathbf{u}_k)\rightarrow\hat{\bar{\mathbf{c}}}_{k+1}$. FPS [41] regulates $\bar{\mathbf{c}}_k$ (shown in yellow) uniformly to different scales (shown in red and blue). MRE stacks the three-resolution data together to form the total feature information, and obtains the predicted next-moment shape through convolution. DN is used to further improve the accuracy of the network.

IV-B Multi-Resolution Encoder

A combined multi-layer perceptron (CMLP) is the feature extractor of the MRE; it uses the output of each layer in an MLP to form a multi-dimensional feature vector. Traditional methods adopt the last layer of the MLP output as the features and do not consider the outputs of the intermediate layers, which may lose important local information [42]. CMLP makes good use of the low-level and mid-level features that include useful intermediate-transition information [40]. CMLP utilizes an MLP to encode the input data into multiple dimensions $[64,128,256,512,1024]$. Then, we max-pool the outputs of the last four layers to construct a multi-dimensional feature vector as follows:

$\mathbf{v}_1\in\mathbb{R}^{128},\ \mathbf{v}_2\in\mathbb{R}^{256},\ \mathbf{v}_3\in\mathbb{R}^{512},\ \mathbf{v}_4\in\mathbb{R}^{1024}$ (33)

The combined feature vector is constructed as $\bar{\mathbf{v}}_i=[\mathbf{v}_1^\intercal,\mathbf{v}_2^\intercal,\mathbf{v}_3^\intercal,\mathbf{v}_4^\intercal]^\intercal\in\mathbb{R}^{1920}$. Three independent CMLPs map the three resolutions into three individual $\bar{\mathbf{v}}_i$, for $i=1,2,3$. Each $\bar{\mathbf{v}}_i$ represents the extracted latent information of each resolution. Then, the augmented feature matrix $\mathbf{V}$ is generated by arranging its columns as $\mathbf{V}=[\bar{\mathbf{v}}_1,\bar{\mathbf{v}}_2,\bar{\mathbf{v}}_3]\in\mathbb{R}^{1920\times 3}$, which is further passed through a 1D convolution to obtain $\mathbf{M}\in\mathbb{R}^{N\times 3}$. Finally, the predicted next-moment shape $\hat{\bar{\mathbf{c}}}_{k+1}\in\mathbb{R}^{3N}$ is obtained by vectorizing $\mathbf{M}$. The prediction loss of the MRE is:

$L_{mre}=\left\|\hat{\bar{\mathbf{c}}}_{k+1}-\bar{\mathbf{c}}_{k+1}\right\|$ (34)

where $\bar{\mathbf{c}}_{k+1}$ represents the ground-truth next-moment shape in the training dataset.
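A hedged PyTorch sketch of one CMLP branch is shown below; the layer widths follow the text, but the other details (1x1 convolutions, batch normalization placement) mirror common point-cloud encoders and should be taken as assumptions rather than the exact reported architecture.

import torch
import torch.nn as nn

class CMLP(nn.Module):
    # Combined MLP: per-point 1x1 convolutions whose last four outputs are
    # max-pooled and concatenated into the 1920-D vector of (33)
    def __init__(self, in_dim=9):
        super().__init__()
        dims = [in_dim, 64, 128, 256, 512, 1024]
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Conv1d(dims[i], dims[i + 1], 1),
                          nn.BatchNorm1d(dims[i + 1]),
                          nn.ReLU())
            for i in range(len(dims) - 1)])

    def forward(self, x):                 # x: (batch, in_dim, n_points)
        pooled = []
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i >= 1:                    # keep the 128/256/512/1024 layers
                pooled.append(torch.max(x, dim=2).values)
        return torch.cat(pooled, dim=1)   # (batch, 128 + 256 + 512 + 1024 = 1920)

Three such branches (one per resolution) would produce the columns of $\mathbf{V}$, which a final 1D convolution maps to the $N\times 3$ prediction $\mathbf{M}$.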

IV-C Discriminator Network

A Generative Adversarial Network (GAN) is chosen as the DN to enhance the prediction accuracy. For simplicity, we define $\Phi=\mathrm{MRE}(\cdot)$ and $\Psi=\mathrm{DN}(\cdot)$. We define $(\bar{\mathbf{c}}_k,\mathbf{r}_k,\mathbf{u}_k)$ as the input set $\mathcal{X}$ fed into $\Phi$, while $\mathcal{Y}$ represents the true shapes $\bar{\mathbf{c}}_{k+1}$. $\Psi$ is a classification network with a similar structure to CMLP, constituted by serial MLP layers $[128,256,512,1024]$, that distinguishes the predicted shape $\Phi(\mathcal{X})$ from the real shape $\mathcal{Y}$. We max-pool the last three layers of $\Psi$ to obtain feature vectors of sizes $[256,512,1024]$. The three feature vectors are concatenated into a latent vector $\mathbf{m}\in\mathbb{R}^{1792}$, and then passed through the fully-connected layers $[512,256,64,1]$ followed by a sigmoid classifier to obtain the evaluation. The adversarial loss is defined as follows:

$L_{adv}=\sum_{1\leq i\leq v}\log\left(1-\Psi(\beta_i)\right)+\sum_{1\leq i\leq v}\log\left(\Psi(\Phi(\alpha_i))\right)$ (35)

where $\alpha_i\in\mathcal{X}$, $\beta_i\in\mathcal{Y}$, $i=1,\ldots,v$, and $v$ is the size of the dataset comprising $\mathcal{X}$ and $\mathcal{Y}$. The total loss of the SPN is:

$L=\zeta_{mre}L_{mre}+\zeta_{adv}L_{adv}$ (36)

where $\zeta_{mre}$ and $\zeta_{adv}$ are the weights of $L_{mre}$ and $L_{adv}$, respectively, which satisfy the condition $\zeta_{mre}+\zeta_{adv}=1$.
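A compact sketch of how the total loss (34)-(36) could be evaluated during training is given below; the weight values and the small epsilon added for numerical stability are illustrative assumptions, and the usual alternating generator/discriminator updates of a GAN are assumed to be handled by the surrounding training loop.

import torch

def spn_loss(pred_shape, true_shape, discriminator,
             zeta_mre=0.95, zeta_adv=0.05, eps=1e-8):
    # Reconstruction term (34): distance between predicted and ground-truth shapes
    l_mre = torch.norm(pred_shape - true_shape, dim=-1).mean()
    # Adversarial term (35): discriminator scores on real and predicted shapes
    d_real = discriminator(true_shape)       # Psi(beta_i), sigmoid output in (0, 1)
    d_fake = discriminator(pred_shape)       # Psi(Phi(alpha_i))
    l_adv = torch.log(1.0 - d_real + eps).mean() + torch.log(d_fake + eps).mean()
    # Weighted total loss (36), with zeta_mre + zeta_adv = 1
    return zeta_mre * l_mre + zeta_adv * l_adv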

V Receding-time Model Estimation

In this paper, the objects are assumed to be manipulated by the robot slowly; thus, $\mathbf{J}_k(\mathbf{s}_k,\mathbf{r}_k)$ is expected to change smoothly. For ease of presentation, we omit the arguments of $\mathbf{J}_k(\mathbf{s}_k,\mathbf{r}_k)$ and denote it as $\mathbf{J}_k$ from now on. To estimate the DJM, three indicators are considered, viz., accuracy, smoothness, and singularity. To this end, an optimization-based receding-time model (RTM) estimator is presented to estimate the change of the Jacobian matrix, denoted by $\Delta\hat{\mathbf{J}}_k=\hat{\mathbf{J}}_k-\hat{\mathbf{J}}_{k-1}$, which enables the estimation procedure to be monitored. $\Delta\hat{\mathbf{J}}_k$ can be obtained by considering the following three constraints:

  • ($Q_1$) Constraint of receding-time error [43]. As $\mathbf{J}_k$ depicts the relationship between $\Delta\mathbf{s}_{k+1}$ and $\mathbf{u}_k$ only in a local range, we consider the accumulated error over the $\eta$ past moments to ensure the estimation accuracy, where $\eta$ is the receding window size. The receding-time error is given by:

    $Q_1=\sum_{j=1}^{\eta}\gamma^j\Big\|\Delta\mathbf{s}_{k+1-j}-(\hat{\mathbf{J}}_{k-1}+\Delta\hat{\mathbf{J}}_k)\mathbf{u}_{k-j}\Big\|^2$ (37)

    The sensitivity to noise can be improved by adjusting $\eta$, which helps to address measurement fluctuations. $0<\gamma\leq 1$ is a constant forgetting factor giving less weight to older observation data.

  • ($Q_2$) Constraint of estimation smoothness [43]. As $\mathbf{J}_k$ is assumed to be smooth, $\Delta\hat{\mathbf{J}}_k$ should be estimated smoothly to avoid sudden large fluctuations, which can be achieved by minimizing the Frobenius norm of $\Delta\hat{\mathbf{J}}_k$:

    $Q_2=\|\Delta\hat{\mathbf{J}}_k\|_F^2$ (38)

  • ($Q_3$) Constraint of shape manipulability [44]. It evaluates the feasibility of changing the object's shape under the current object-robot configuration:

    $Q_3=\Big\|\frac{\lambda_{\max}\big((\hat{\mathbf{J}}_{k-1}+\Delta\hat{\mathbf{J}}_k)^\intercal(\hat{\mathbf{J}}_{k-1}+\Delta\hat{\mathbf{J}}_k)\big)}{\lambda_{\min}\big((\hat{\mathbf{J}}_{k-1}+\Delta\hat{\mathbf{J}}_k)^\intercal(\hat{\mathbf{J}}_{k-1}+\Delta\hat{\mathbf{J}}_k)\big)}\Big\|^2$ (39)

    where $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues, respectively, and $Q_3\geq 1$. When $Q_3=1$, the object can deform isotropically in any direction. A growing $Q_3$ indicates that the object is approaching a singular (non-manipulable) configuration.

Finally, the total weighted optimization index is given by:

$Q(\Delta\hat{\mathbf{J}}_k)=\mu_1 Q_1+\mu_2 Q_2+\mu_3 Q_3$ (40)

where $\mu_i>0$ are the weights that specify the contribution of each constraint, and which satisfy $\mu_1+\mu_2+\mu_3=1$. The index (40) is then minimized by using numerical optimization tools (e.g., Matlab/fmincon or Python/CasADi) to obtain $\Delta\hat{\mathbf{J}}_k$ and thus iteratively update the deformation Jacobian matrix as follows: $\hat{\mathbf{J}}_k=\hat{\mathbf{J}}_{k-1}+\Delta\hat{\mathbf{J}}_k$.
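The sketch below shows one way this minimization could be carried out in Python with SciPy instead of fmincon/CasADi; the history containers and the unconstrained BFGS solve are illustrative choices (the weights follow the values reported later, $\mu=(0.8,0.1,0.1)$).

import numpy as np
from scipy.optimize import minimize

def rtm_update(J_prev, ds_hist, u_hist, eta=10, gamma=0.9, mu=(0.8, 0.1, 0.1)):
    # ds_hist[j], u_hist[j]: past pairs (Delta s, u) over the receding window, most recent first
    p, q = J_prev.shape

    def cost(dj_flat):
        dJ = dj_flat.reshape(p, q)
        J = J_prev + dJ
        # Q1: receding-time error with forgetting factor gamma, eq. (37)
        q1 = sum(gamma ** (j + 1) * np.linalg.norm(ds_hist[j] - J @ u_hist[j]) ** 2
                 for j in range(min(eta, len(u_hist))))
        # Q2: smoothness of the update, eq. (38)
        q2 = np.linalg.norm(dJ, 'fro') ** 2
        # Q3: manipulability (eigenvalue ratio of J^T J), eq. (39)
        eig = np.linalg.eigvalsh(J.T @ J)
        q3 = (eig[-1] / max(eig[0], 1e-9)) ** 2
        return mu[0] * q1 + mu[1] * q2 + mu[2] * q3

    res = minimize(cost, np.zeros(p * q), method='BFGS')
    return J_prev + res.x.reshape(p, q)       # J_hat_k = J_hat_{k-1} + Delta J_hat_k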

VI Model Predictive Controller

It is assumed that the matrix $\mathbf{J}_k$ has been accurately estimated at time instant $k$ by the RTM, such that it satisfies $\Delta\mathbf{s}_{k+1}=\hat{\mathbf{J}}_k\mathbf{u}_k$. Based on this model, we propose an MPC-based controller to derive the velocity inputs $\mathbf{u}_k$ for the robot, while taking saturation and workspace constraints into account. Two vectors are defined as follows:

$\bar{\mathbf{s}}_k=[\mathbf{s}_{k+1|k}^\intercal,\ldots,\mathbf{s}_{k+h|k}^\intercal]^\intercal\in\mathbb{R}^{ph},\qquad \bar{\mathbf{u}}_k=[\mathbf{u}_{k|k}^\intercal,\ldots,\mathbf{u}_{k+h-1|k}^\intercal]^\intercal\in\mathbb{R}^{3h}$ (41)

where $\bar{\mathbf{s}}_k$ and $\bar{\mathbf{u}}_k$ represent the predictions of $\mathbf{s}_k$ and $\mathbf{u}_k$ over the next $h$ periods, respectively. The vectors $\mathbf{s}_{k+i|k}$ and $\mathbf{u}_{k+i|k}$ denote the $i$th predictions of $\mathbf{s}_k$ and $\mathbf{u}_k$ from the time instant $k$, where $\mathbf{s}_{k|k}=\mathbf{s}_k$ and $\mathbf{u}_{k|k}=\mathbf{u}_k$ must hold. The prediction $\bar{\mathbf{s}}_k$ can be calculated from the estimated Jacobian matrix by noting that $\hat{\mathbf{J}}_k\approx\hat{\mathbf{J}}_{k+h}$ is satisfied during the period $[k,k+h]$ (which is reasonable, given the regularity of the object). This way, the predictions are computed as follows:

$\mathbf{s}_{k+j|k}=\mathbf{s}_k+\sum_{i=0}^{j-1}\hat{\mathbf{J}}_k\mathbf{u}_{k+i|k},\qquad j=1,\ldots,h$ (42)

All predictions are then grouped and arranged into a single vector form:

$\bar{\mathbf{s}}_k=\mathbf{A}\mathbf{s}_k+\boldsymbol{\Theta}\bar{\mathbf{u}}_k,\qquad \mathbf{A}=\mathbf{I}_{h\times 1}\otimes\mathbf{E}_p\in\mathbb{R}^{ph\times p},\qquad \boldsymbol{\Theta}=\mathbf{L}_h\otimes\hat{\mathbf{J}}_k\in\mathbb{R}^{ph\times 3h}$ (43)

In addition to $\bar{\mathbf{s}}_k$ and $\bar{\mathbf{u}}_k$, we define the constant sequence vector $\bar{\mathbf{s}}_k^*$ that represents the desired shape feature as:

$\bar{\mathbf{s}}_k^*=[\mathbf{s}_{k+1|k}^{*\intercal},\ldots,\mathbf{s}_{k+h|k}^{*\intercal}]^\intercal\in\mathbb{R}^{ph}$ (44)

The cost function $Q(\bar{\mathbf{u}}_k)$ for the optimization of the control input is formulated as:

$Q(\bar{\mathbf{u}}_k)=(\bar{\mathbf{s}}_k-\bar{\mathbf{s}}_k^*)^\intercal\boldsymbol{\Upsilon}_1(\bar{\mathbf{s}}_k-\bar{\mathbf{s}}_k^*)+\bar{\mathbf{u}}_k^\intercal\boldsymbol{\Upsilon}_2\bar{\mathbf{u}}_k$ (45)

where $\boldsymbol{\Upsilon}_1>0$ and $\boldsymbol{\Upsilon}_2>0$ are the weights for the error convergence rate and the smoothness of $\mathbf{u}_k$, respectively. Two constraints are considered:

  • Saturation limits. In practice, robots have limits on their achievable joint speeds. These constraints are also useful in soft object manipulation tasks to avoid damaging the object. Therefore, $\bar{\mathbf{u}}_k$ needs to be constrained:

    $\bar{\mathbf{u}}_{\min}\leq\bar{\mathbf{u}}_k\leq\bar{\mathbf{u}}_{\max}$ (46)

    where $\bar{\mathbf{u}}_{\min}$ and $\bar{\mathbf{u}}_{\max}$ are the constant lower and upper bounds, respectively.

  • Workspace limits. Robots are also often required to operate in a confined workspace to avoid colliding with the environment. In soft object manipulation, this constraint is needed to avoid over-stretching or over-compressing the manipulated object. To this end, the following constant bounds are introduced:

    $\mathbf{r}_{k+i|k}^{\min}\leq\mathbf{r}_{k+i|k}\leq\mathbf{r}_{k+i|k}^{\max},\qquad 0\leq i\leq h-1$ (47)

    Similarly to (42), the recursive structure of (47) can be obtained as follows:

    $\boldsymbol{\Xi}_{\min}\leq\mathbf{C}\bar{\mathbf{u}}_k\leq\boldsymbol{\Xi}_{\max}$ (48)

    for $\boldsymbol{\Xi}_{\min},\boldsymbol{\Xi}_{\max}\in\mathbb{R}^{3h}$ and $\mathbf{C}\in\mathbb{R}^{3h\times 3h}$ defined as:

    $\boldsymbol{\Xi}_{\min}=[(\mathbf{r}_{k|k}^{\min}-\mathbf{r}_{k-1})^\intercal,\ldots,(\mathbf{r}_{k+h-1|k}^{\min}-\mathbf{r}_{k-1})^\intercal]^\intercal,\quad \boldsymbol{\Xi}_{\max}=[(\mathbf{r}_{k|k}^{\max}-\mathbf{r}_{k-1})^\intercal,\ldots,(\mathbf{r}_{k+h-1|k}^{\max}-\mathbf{r}_{k-1})^\intercal]^\intercal,\quad \mathbf{C}=\mathbf{L}_h\otimes\mathbf{E}_3\in\mathbb{R}^{3h\times 3h}$ (49)

The quadratic optimization problem is formulated as follows:

$\begin{aligned}\min_{\bar{\mathbf{u}}_k}\ & Q(\bar{\mathbf{u}}_k)=\frac{1}{2}\bar{\mathbf{u}}_k^\intercal\mathbf{H}\bar{\mathbf{u}}_k+\mathbf{q}^\intercal\bar{\mathbf{u}}_k\\ \mathrm{s.t.}\ & \bar{\mathbf{u}}_{\min}\leq\bar{\mathbf{u}}_k\leq\bar{\mathbf{u}}_{\max},\quad \boldsymbol{\Xi}_{\min}\leq\mathbf{C}\bar{\mathbf{u}}_k\leq\boldsymbol{\Xi}_{\max}\end{aligned}$ (50)

where $\mathbf{H}=2(\boldsymbol{\Theta}^\intercal\boldsymbol{\Upsilon}_1\boldsymbol{\Theta}+\boldsymbol{\Upsilon}_2)\in\mathbb{R}^{3h\times 3h}$, $\mathbf{q}^\intercal=2\boldsymbol{\Omega}^\intercal\boldsymbol{\Upsilon}_1\boldsymbol{\Theta}\in\mathbb{R}^{1\times 3h}$, and $\boldsymbol{\Omega}=\mathbf{A}\mathbf{s}_k-\bar{\mathbf{s}}_k^*\in\mathbb{R}^{ph}$ are constant matrices. Then, $\bar{\mathbf{u}}_k$ can be obtained by using a standard quadratic solver on (50). Finally, $\mathbf{u}_k$ is calculated by the receding horizon scheme:

$\mathbf{u}_k=[\mathbf{E}_3,\mathbf{0},\ldots,\mathbf{0}]\bar{\mathbf{u}}_k$ (51)
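For reference, a Python sketch of the full receding-horizon step (43)-(51) is given below; it assumes diagonal weights $\boldsymbol{\Upsilon}_1=\gamma_1\mathbf{E}$ and $\boldsymbol{\Upsilon}_2=\gamma_2\mathbf{E}$ and uses SciPy's SLSQP routine in place of a dedicated QP solver, so it should be read as an illustration rather than the exact implementation.

import numpy as np
from scipy.optimize import minimize, Bounds

def mpc_control(J_hat, s_k, s_star, r_prev, r_min, r_max,
                h=5, u_max=0.01, gamma1=1.0, gamma2=0.1):
    p = J_hat.shape[0]
    Lh = np.tril(np.ones((h, h)))                    # lower-triangular L_h
    A = np.kron(np.ones((h, 1)), np.eye(p))          # eq. (43)
    Theta = np.kron(Lh, J_hat)
    Omega = A @ s_k - np.tile(s_star, h)             # Omega = A s_k - s_bar*
    U1 = gamma1 * np.eye(p * h)
    U2 = gamma2 * np.eye(3 * h)
    H = 2 * (Theta.T @ U1 @ Theta + U2)              # Hessian of (50)
    q = 2 * Theta.T @ U1 @ Omega
    C = np.kron(Lh, np.eye(3))                       # workspace map, eq. (49)
    xi_min = np.tile(r_min - r_prev, h)
    xi_max = np.tile(r_max - r_prev, h)
    cost = lambda u: 0.5 * u @ H @ u + q @ u
    grad = lambda u: H @ u + q
    cons = [{'type': 'ineq', 'fun': lambda u: C @ u - xi_min},
            {'type': 'ineq', 'fun': lambda u: xi_max - C @ u}]
    bounds = Bounds(-u_max * np.ones(3 * h), u_max * np.ones(3 * h))
    res = minimize(cost, np.zeros(3 * h), jac=grad, method='SLSQP',
                   bounds=bounds, constraints=cons)
    return res.x[:3]                                 # receding horizon: apply only u_k, eq. (51)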

Fig. 5 presents the conceptual block diagram of the proposed framework.

Remark 2.

The proposed MPC-based technique (50) computes the robot's shaping actions based on a performance objective and subject to the system's constraints; This approach does not require the identification of the full analytical model of the deformable object. Its quadratic optimization form enables the integration of additional metrics into the problem, e.g., rise time and overshoot.

Figure 5: The block diagram of the proposed shape servoing framework, including representation, prediction, approximation, and manipulation within constraints.
(a) Centerline
(b) Contour
(c) Surface
(d) Plane
Figure 6: The experimental setup, including the objects (elastic and rigid), the single-arm robot (UR5), and the D455 depth camera.

VII Results

VII-A Experimental Setup

Vision-based manipulation experiments are conducted to validate our proposed framework. The experimental platform used in our study includes a fixed D455 depth sensor, a UR5 robot manipulator, and the various deformable objects shown in Fig. 6. The depth sensor provides the video stream, from which the 3D shapes are computed using the OpenCV and RealSense libraries. In our experiments, only 3 DOF of the robot manipulator are considered; therefore, the control input $\mathbf{u}=[u_x,u_y,u_z]^\intercal\in\mathbb{R}^3$ represents the linear velocity of the end-effector. A saturation limit of $|u_i|\leq 0.01$ m/s is applied for $i=x,y,z$. The motion control algorithm is implemented in ROS/Python and runs with a servo-control loop of 10 Hz. A video of the conducted experiments can be downloaded from https://github.com/JiamingQi-Tom/experiment_video/raw/master/paper4/video.mp4

The proposed shape extraction algorithm is depicted in Fig. 7. The RGB image from the camera is transformed into HSV and combined with mask processing to obtain a binary image. OpenCV/thinning is utilized with FPS to extract a fixed number of object points. The point on the centerline closest to the gripper's green marker is chosen to sort the centerline along the cable. We then obtain the 3D shapes by using the RealSense sensor and looking up the corresponding 2D pixels. We adopt the method in [11] to compute the object's contour. A surface is obtained in a similar way to the centerline, i.e., by sorting points from top to bottom, and from left to right. As 2D pixels and 3D points have a one-to-one correspondence in a depth camera, our extraction method improves the robustness to measurement noise and is simpler than traditional point cloud processing algorithms.
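A minimal OpenCV sketch of the centerline branch of this pipeline is given below; the HSV bounds are placeholders, the thinning call requires the opencv-contrib package (cv2.ximgproc), and the subsequent sorting from the gripper marker and the 2D-to-3D lookup through the RealSense library are omitted.

import cv2
import numpy as np

def extract_centerline_pixels(bgr_image, hsv_low, hsv_high, n_points=64):
    # Segment the object in HSV space and skeletonize the binary mask
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_low), np.array(hsv_high))
    skeleton = cv2.ximgproc.thinning(mask)
    ys, xs = np.nonzero(skeleton)
    pixels = np.column_stack([xs, ys]).astype(float)
    # Keep a fixed number of skeleton pixels via farthest point sampling
    chosen, dist = [0], np.full(len(pixels), np.inf)
    for _ in range(n_points - 1):
        dist = np.minimum(dist, np.linalg.norm(pixels - pixels[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))
    return pixels[chosen]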

Figure 7: Shape extraction for the centerline, contour, and surface. The first row shows the 2D image pixels, and the second row shows the 3D shapes. All shapes are fixed-sampled, equidistant, and ordered.
Figure 8: Shapes of various objects manipulated by the UR5.
(a) Centerline fitting
(b) Contour fitting
(c) Surface fitting
Figure 9: Comparison of centerline, contour, and surface fitting accuracy and computation time among (7), (8), (9), and (13), between LSM and MLS with $d=0.9$, $d=0.1$, and $d=0.2$, respectively. $n_i$, $i=x,y$ are the surface fitting orders.
Table I: Fitting configurations. “N/A” stands for “Not Applicable”.
                   Centerline   Contour         Surface / Rigid
Method             LSM          MLS             MLS
Support radius     N/A          d = 0.2         d = 0.2
PCA                N/A          m = 1           m = 1
Fitting order      n = 5        n = 4           n_x = n_y = 2
Basis function     Bernstein    Trigonometric   Polynomial
Number of points   N = 64       N = 64          N = 32

VII-B Online Fitting of the Parametric Shape Representation

In this section, ten thousand samples of centerlines, contours, and surfaces with $N=64,64,32$, respectively, are collected by commanding the robot to manipulate the objects, whose configurations are then captured by the depth sensor. These shaping actions are shown in Fig. 8 and visualized in the accompanying multimedia attachment. This data is used to evaluate the performance (viz., the accuracy and computation time) of our representation framework. For that, we calculate the average error between the feedback shape $\bar{\mathbf{c}}$ and the reconstructed shape $\hat{\bar{\mathbf{c}}}$ as $\operatorname{mean}(\sum\|\bar{\mathbf{c}}_i-\hat{\bar{\mathbf{c}}}_i\|)$. The computation time is defined as the average overall processing time of all sample data for each method.

Fig. 9 shows that the larger the scalars $n,n_x,n_y$ are, the better the fitting accuracy of LSM and MLS is. MLS fits better than LSM under the same conditions, as MLS calculates independent weights while LSM assumes that each node has the same weight. MLS works better in fitting the contour because the parametric curve may not be continuous at the end corner; thus, the equal-weight assumption of LSM is not suitable here. As the number of data points for the surface is $N=32$, it does not satisfy the condition $N\gg(n_x+1)(n_y+1)$ for higher-order fitting models (e.g., $n_x=n_y\geq 5$); thus, we only use $n_x=n_y\leq 4$. The results show that MLS performs better than LSM in the surface representation; Interestingly, MLS obtains satisfactory performance even with $n_x=n_y=1$. Fig. 9 also shows that larger $n,n_x,n_y$ will also increase the computation time. MLS has a more noticeable increase, as it calculates the weights at all nodes while LSM calculates them once. The trigonometric approach is the fastest, polynomial and Bernstein follow, while Cox-deBoor is the slowest with the most iterative operations. The above analysis verifies the effectiveness of the proposed extraction framework, which can represent objects with a low-dimensional feature. Details of the curve fitting configurations are given in Table I.

(a) Centerline
(b) Contour
(c) Surface
Figure 10: Validation of the SPN for the three shape configurations while the obstacle is moved manually. The RGB image shows the occluded case. The red dots in the second row give the feedback shape within the occlusion at the current instant, and the blue ones are the predicted shape according to the past data.

VII-C Occlusion-Robust Prediction of Object Shapes

The data collected in Section VII-B is used to evaluate our proposed SPN: 80% of the data is used as the training set, and the remaining 20% is used to test the trained network. We set $\delta=2$ for the centerline, contour, and surface parametrization. The SPN is built using PyTorch and trained with the ADAM optimizer with a batch size of 500 and an initial learning rate of 0.0001. ReLU activations and batch normalization are adopted to improve the network's performance. In this validation test, the robot deforms the objects with small babbling motions while a cardboard sheet covers parts of the objects. Fig. 10 shows that the SPN can predict and provide relatively complete shapes for the three types of manipulated objects. The accompanying video demonstrates the performance of this method.

(a) Centerline DJM Estimation
(b) Contour DJM Estimation
(c) Surface DJM Estimation
Figure 11: Curves of $T_1$, $T_2$, and $Q_3$, showing the accuracy, smoothness, and singularity of the DJM, respectively. $T_1$ represents the feature estimation error, and $T_2$ depicts the estimation differential error. Here, $\mu_1=0.8$, $\mu_2=0.1$, $\mu_3=0.1$.

VII-D Estimation of the Sensorimotor Model

This section aims to evaluate the RTM (40) that approximates the deformation Jacobian matrix; The performance of the RTM estimator is compared with the interaction matrix estimator (IEM) in [15] and the unscented Kalman filter (UKF) in [31]. To this end, we introduce two metrics, $T_1$ and $T_2$, to quantitatively compare the performance of these algorithms:

$T_1=\|\hat{\mathbf{s}}_{k+1}-\mathbf{s}_{k+1}\|,\qquad T_2=\|\Delta\mathbf{s}_{k+1}-\hat{\mathbf{J}}_k\mathbf{u}_k\|$ (53)

where $\hat{\mathbf{s}}_{k+1}=\hat{\mathbf{s}}_k+\hat{\mathbf{J}}_k\mathbf{u}_k$ is the approximated shape feature that is computed based on the control actions. Fig. 11 shows that RTM with $\eta=20$ provides the best performance for $T_1$ among the methods; This means that constraint $Q_1$ (37) enables RTM to learn from past data by adjusting $\eta$. Small values of $T_2$ reflect that RTM accurately predicts the differential changes induced by the DJM; This means that constraint $Q_2$ (38) helps to compute a smooth matrix $\hat{\mathbf{J}}_k$. As the proposed method incorporates the constraint $Q_3$ (39), RTM can prevent singularities in the estimation of the Jacobian matrix, while IEM and UKF are prone to reach ill-conditioned estimations. We use RTM with $\eta=10$ in the following sections.

(a) Exp1 / Centerline
(b) Exp2 / Contour
(c) Exp3 / Surface
(d) Exp4 / Rigid Plane
Figure 12: Shape manipulation experiments. Exp1, Exp2, and Exp3 are deformation tasks, and Exp4 is a positioning task. The first row shows the 2D image manipulation process (from left to right: initial, intermediate, intermediate, and desired shape), and the second row shows the 3D shape manipulation process. The obstacle moves randomly during the process. The target shape is shown in black, red represents the start shapes, and the gradient colors are the intermediate shapes.
(a) Exp1 / Centerline
(b) Exp2 / Contour
(c) Exp3 / Surface
(d) Exp4 / Rigid Plane
Figure 13: Profiles of the manipulation error $\|\mathbf{s}^*-\mathbf{s}_k\|$, the robot pose $\mathbf{r}_k$, the velocity command $\mathbf{u}_k$, and the manipulability index $Q_3$ among the four experiments (Exp1, …, Exp4). Each experiment adopts four methods, i.e., $\mathrm{CM}_1,\ldots,\mathrm{CM}_4$.
Figure 14: Performance comparison in the manipulation tasks (Exp1 to Exp4).

VII-E Automatic Shape Servoing Control

This section presents four automatic manipulation experiments, labeled Exp1, Exp2, Exp3, and Exp4, respectively. The target shape $\bar{\mathbf{c}}^*$ is obtained from previous demonstrations of the manipulation task, which ensures its reachability. A cardboard sheet is manually placed over the object to produce (partial) occlusions and test the robustness of our algorithm. The estimation methods in [15] and [31] are compared with the proposed receding-time model with MPC ($h=5$ and $h=15$). We label these methods as $\mathrm{CM}_1,\ldots,\mathrm{CM}_4$, and each method has been tuned to achieve a balance of stability, convergence, and responsiveness.

In addition to the feedback shape error $\|\mathbf{s}^*-\mathbf{s}_k\|$, we also compare $T_{max}$ (the number of steps from start to finish), $t_d$ (the steps for $\|\mathbf{s}^*-\mathbf{s}_0\|$ to decrease to 10% of its initial value), $t_s$ (the steps from 10% of $\|\mathbf{s}^*-\mathbf{s}_0\|$ to the threshold value), and $d_{eff}$ (the total moving distance of the end-effector) [11, 23]. Fig. 12 shows the shaping motions of the manipulated objects toward the desired configuration (black curve), with the cardboard blocking the view at various instances. The results demonstrate that the SPN can predict the object's shape during occlusions and feed it back to the controller to enforce the shape servo-loop; This results in the objects being gradually manipulated towards the desired configuration. The error norm $\|\mathbf{s}^*-\mathbf{s}_k\|$ plots in Fig. 13 show that $\mathrm{CM}_3$ provides the best control performance for the error minimization, with $\mathrm{CM}_4$ as the second-best, and $\mathrm{CM}_1$ and $\mathrm{CM}_2$ showing similar performance. A higher $h$ is helpful for feature prediction; yet, since we assume that $\hat{\mathbf{J}}_k$ is constant over the window period $h$, it may lead to inaccurate predictions and even wrong manipulation of the objects. Therefore, $h$ should be chosen according to the performance requirements of the system.

The $Q_{3}$ plots in Fig. 13 show that RTM with $\eta=10$ yields the smallest value, which validates that RTM enhances manipulation feasibility and prevents the shape from falling into singular configurations. The black dashed lines represent the workspace constraints on $r_{x},r_{y},r_{z}$, against which the end-effector trajectories in the Cartesian frame can be compared; the results show that $\mathrm{CM}_{3}$ and $\mathrm{CM}_{4}$ remain within the workspace, while $\mathrm{CM}_{1}$ and $\mathrm{CM}_{2}$ may violate this constraint. Owing to the input saturation adopted on our platform, all four methods satisfy the control saturation constraint. Exp4 shows that our method generalizes well: it applies not only to shape control but also to traditional rigid-object positioning.
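A minimal sketch of how the constant-Jacobian horizon prediction interacts with the saturation and workspace constraints is given below; the box-shaped workspace, the unit sampling time in the pose update, and all bounds are illustrative assumptions rather than the exact formulation used in our controller.

import numpy as np

def rollout(s_k, r_k, J_hat, u_seq, u_max, r_min, r_max):
    # Propagate the shape feature and end-effector pose over the horizon h,
    # keeping J_hat constant and checking the saturation/workspace constraints.
    s, r = s_k.copy(), r_k.copy()
    predictions, feasible = [], True
    for u in u_seq:                                 # u_seq contains h velocity commands
        u = np.clip(u, -u_max, u_max)               # input saturation constraint
        s = s + J_hat @ u                           # constant-Jacobian feature prediction
        r = r + u                                   # pose update (unit sampling time assumed)
        feasible &= bool(np.all((r_min <= r) & (r <= r_max)))  # box workspace constraint
        predictions.append(s.copy())
    return np.array(predictions), feasible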

A performance comparison of these experiments is given in Fig. 14. The results show that our proposed shape servoing framework achieves the best performance in terms of the manipulation speed $T_{max}$, response speed $t_{d}$, convergence speed $t_{s}$, and motion distance $d_{eff}$.

VIII Conclusions

In this paper, we present an occlusion-robust shape servoing framework to control the shapes of elastic objects toward target configurations while considering workspace and saturation constraints. A low-dimensional feature extractor is proposed to represent 3D shapes based on LSM and MLS. A deep neural network is introduced to predict the object's configuration under occlusions and feed it to the shape servo-controller. A receding-time model estimator is designed to approximate the deformation Jacobian matrix subject to constraints on accuracy, smoothness, and singularity avoidance. The conducted experiments validate the proposed methodology on multiple unstructured shape servoing tasks under visual occlusion and with unknown deformation models.

However, our framework has some limitations. For example, since the support field radius $d$ is constant (i.e., it does not adapt to dynamic shapes), the computed representation lacks flexibility. Also, the SPN requires substantial offline training data to work properly, which might pose complications in practice. Note that our method may not accurately shape objects with negligible elastic properties (e.g., fabrics, food materials, etc.). Future work includes the incorporation of shape-reachability detection into the framework in order to determine the feasibility of a given shaping task beforehand.

References

  • [1] J. Zhu, A. Cherubini, C. Dune, D. Navarro-Alarcon, F. Alambeigi, D. Berenson, F. Ficuciello, K. Harada, X. Li, J. Pan, et al., “Challenges and outlook in robotic manipulation of deformable objects,” arXiv preprint arXiv:2105.01767, 2021.
  • [2] X. Li, X. Su, and Y.-H. Liu, “Vision-based robotic manipulation of flexible PCBs,” IEEE/ASME Transactions on Mechatronics, vol. 23, no. 6, pp. 2739–2749, 2018.
  • [3] L. Cao, X. Li, P. T. Phan, A. M. H. Tiong, H. L. Kaan, J. Liu, W. Lai, Y. Huang, H. M. Le, M. Miyasaka, et al., “Sewing up the wounds: A robotic suturing system for flexible endoscopy,” IEEE Robotics & Automation Magazine, vol. 27, no. 3, pp. 45–54, 2020.
  • [4] Z. Hu, P. Sun, and J. Pan, “Three-dimensional deformable object manipulation using fast online gaussian process regression,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 979–986, 2018.
  • [5] J. Sanchez, J.-A. Corrales, B.-C. Bouzgarrou, and Y. Mezouar, “Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey,” Int. J. of Rob. Res., 2018.
  • [6] H. Yin, A. Varava, and D. Kragic, “Modeling, learning, perception, and control methods for deformable object manipulation,” Science Robotics, vol. 6, no. 54, 2021.
  • [7] D. Navarro-Alarcon, Y.-H. Liu, J. G. Romero, and P. Li, “Model-free visually servoed deformation control of elastic objects by robot manipulators,” IEEE Transactions on Robotics, vol. 29, no. 6, pp. 1457–1468, 2013.
  • [8] D. Navarro-Alarcon, Y.-h. Liu, J. G. Romero, and P. Li, “On the visual deformation servoing of compliant objects: Uncalibrated control methods and experiments,” The International Journal of Robotics Research, vol. 33, no. 11, pp. 1462–1480, 2014.
  • [9] H. Wang, B. Yang, J. Wang, X. Liang, W. Chen, and Y.-H. Liu, “Adaptive visual servoing of contour features,” IEEE/ASME Transactions on Mechatronics, vol. 23, no. 2, pp. 811–822, 2018.
  • [10] D. Navarro-Alarcon and Y.-H. Liu, “Fourier-based shape servoing: a new feedback method to actively deform soft objects into desired 2-d image contours,” IEEE Transactions on Robotics, vol. 34, no. 1, pp. 272–279, 2017.
  • [11] J. Qi, G. Ma, J. Zhu, P. Zhou, Y. Lyu, H. Zhang, and D. Navarro-Alarcon, “Contour moments based manipulation of composite rigid-deformable objects with finite time model estimation and shape/position control,” IEEE/ASME Transactions on Mechatronics, 2021.
  • [12] Z. Hu, T. Han, P. Sun, J. Pan, and D. Manocha, “3-d deformable object manipulation using deep neural networks,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4255–4261, 2019.
  • [13] J. Qi, G. Ma, P. Zhou, H. Zhang, Y. Lyu, and D. Navarro-Alarcon, “Towards latent space based manipulation of elastic rods using autoencoder models and robust centerline extractions,” Advanced Robotics, vol. 36, no. 3, pp. 101–115, 2022.
  • [14] P. Zhou, J. Zhu, S. Huo, and D. Navarro-Alarcon, “LaSeSOM: A latent and semantic representation framework for soft object manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5381–5388, 2021.
  • [15] J. Zhu, D. Navarro-Alarcon, R. Passama, and A. Cherubini, “Vision-based manipulation of deformable and rigid objects using subspace projections of 2d contours,” Robotics and Autonomous Systems, vol. 142, p. 103798, 2021.
  • [16] N. Cazy, P.-B. Wieber, P. R. Giordano, and F. Chaumette, “Visual servoing when visual information is missing: Experimental comparison of visual feature prediction schemes,” in 2015 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2015, pp. 6031–6036.
  • [17] T. Tang, C. Wang, and M. Tomizuka, “A framework for manipulating deformable linear objects by coherent point drift,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3426–3433, 2018.
  • [18] T. Tang and M. Tomizuka, “Track deformable objects from point clouds with structure preserved registration,” The International Journal of Robotics Research, p. 0278364919841431, 2018.
  • [19] D. Navarro-Alarcon, J. Qi, J. Zhu, and A. Cherubini, “A lyapunov-stable adaptive method to approximate sensorimotor models for sensor-based control,” Frontiers in Neurorobotics, vol. 14, 2020.
  • [20] F. Chaumette and S. Hutchinson, “Visual servo control, part I: Basic approaches,” IEEE Robotics and Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.
  • [21] K. Hosoda and M. Asada, “Versatile visual servoing without knowledge of true jacobian,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’94), vol. 1.   IEEE, 1994, pp. 186–193.
  • [22] F. Alambeigi, Z. Wang, R. Hegeman, Y.-H. Liu, and M. Armand, “Autonomous data-driven manipulation of unknown anisotropic deformable tissues using unmodelled continuum manipulators,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 254–261, 2018.
  • [23] R. Lagneau, A. Krupa, and M. Marchal, “Active deformation through visual servoing of soft objects,” in 2020 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2020, pp. 8978–8984.
  • [24] R. Lagneau, A. Krupa, and M. Marchal, “Automatic shape control of deformable wires based on model-free visual servoing,” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 5252–5259, 2020.
  • [25] L. Han, H. Wang, Z. Liu, W. Chen, and X. Zhang, “Visual tracking control of deformable objects with a FAT-based controller,” IEEE Transactions on Industrial Electronics, 2021.
  • [26] A. Hajiloo, M. Keshmiri, W.-F. Xie, and T.-T. Wang, “Robust online model predictive control for a constrained image-based visual servoing,” IEEE Transactions on Industrial Electronics, vol. 63, no. 4, pp. 2242–2250, 2015.
  • [27] H. Wakamatsu and S. Hirai, “Static modeling of linear object deformation based on differential geometry,” The International Journal of Robotics Research, vol. 23, no. 3, pp. 293–311, 2004.
  • [28] M. Yu, H. Zhong, and X. Li, “Shape control of deformable linear objects with offline and online learning of local linear deformation models,” arXiv preprint arXiv:2109.11091, 2021.
  • [29] H. Abdi et al., “The method of least squares,” Encyclopedia of Measurement and Statistics. CA, USA: Thousand Oaks, 2007.
  • [30] P. Lancaster and K. Salkauskas, “Surfaces generated by moving least squares methods,” Mathematics of computation, vol. 37, no. 155, pp. 141–158, 1981.
  • [31] J. Qi, W. Ma, D. Navarro-Alarcon, H. Gao, and G. Ma, “Adaptive shape servoing of elastic rods using parameterized regression features and auto-tuning motion controls,” arXiv preprint arXiv:2008.06896, 2020.
  • [32] K. Smith, “On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations,” Biometrika, vol. 12, no. 1/2, pp. 1–85, 1918.
  • [33] G. G. Lorentz, Bernstein polynomials.   American Mathematical Soc., 2013.
  • [34] I. J. Schoenberg, Cardinal spline interpolation.   SIAM, 1973.
  • [35] M. J. D. Powell et al., Approximation theory and methods.   Cambridge university press, 1981.
  • [36] W. Y. Tey, N. A. Che Sidik, Y. Asako, M. W. Muhieldeen, and O. Afshar, “Moving least squares method and its improvement: A concise review,” Journal of Applied and Computational Mechanics, vol. 7, no. 2, pp. 883–889, 2021.
  • [37] H. Zhang, C. Guo, X. Su, and C. Zhu, “Measurement data fitting based on moving least squares method,” Mathematical Problems in Engineering, vol. 2015, 2015.
  • [38] J. C. Mason and D. C. Handscomb, Chebyshev polynomials.   CRC press, 2002.
  • [39] R. T. Farouki, “Legendre–bernstein basis transformations,” Journal of Computational and Applied Mathematics, vol. 119, no. 1-2, pp. 145–160, 2000.
  • [40] Z. Huang, Y. Yu, J. Xu, F. Ni, and X. Le, “Pf-net: Point fractal network for 3d point cloud completion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7662–7670.
  • [41] X. Yan, C. Zheng, Z. Li, S. Wang, and S. Cui, “Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5589–5598.
  • [42] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660.
  • [43] H. Mo, B. Ouyang, L. Xing, D. Dong, Y. Liu, and D. Sun, “Automated 3-d deformation of a soft object using a continuum robot,” IEEE Transactions on Automation Science and Engineering, 2020.
  • [44] K. M. Lynch and F. C. Park, Modern robotics.   Cambridge University Press, 2017.
Jiaming Qi received the M.Sc. degree in integrated circuit engineering from Harbin Institute of Technology, Harbin, China, in 2018. In 2019, he was a visiting Ph.D. student at The Hong Kong Polytechnic University. He is currently pursuing the Ph.D. degree in control science and engineering at Harbin Institute of Technology, Harbin, China. His current research interests include data-driven control for soft object manipulation, visual servoing, robotics, and control theory.
Dongyu Li (Member, IEEE) received the B.S. and Ph.D. degrees in control science and engineering from Harbin Institute of Technology, China, in 2016 and 2020, respectively. He was a joint Ph.D. student with the Department of Electrical and Computer Engineering, National University of Singapore, from 2017 to 2019, and a research fellow with the Department of Biomedical Engineering, National University of Singapore, from 2019 to 2021. He is currently an Associate Professor with the School of Cyber Science and Technology, Beihang University, China. His research interests include networked system cooperation, adaptive systems, and robotic control.
Yufeng Gao (Student Member, IEEE) received the Bachelor of Engineering and Bachelor of Economics degrees from Wuhan University of Technology and Wuhan University, China, respectively, both in 2016, and the Ph.D. degree in control science and engineering from Harbin Institute of Technology, China, in 2021. He was a joint Ph.D. student with the Chair of Automatic Control Engineering, Technical University of Munich, Germany, from 2018 to 2019. His current research interests include spacecraft system modeling, control, and optimization.
Peng Zhou (Student Member, IEEE) was born in China. He received the M.Sc. degree in software engineering from the School of Software Engineering, Tongji University, Shanghai, China, in 2017, and is currently pursuing the Ph.D. degree in mechanical engineering at The Hong Kong Polytechnic University, Kowloon, Hong Kong. His research interests include deformable object manipulation, motion planning, and robot learning.
David Navarro-Alarcon (Senior Member, IEEE) received the Ph.D. degree in mechanical and automation engineering from The Chinese University of Hong Kong (CUHK), Shatin, Hong Kong, in 2014. From 2014 to 2017, he was a Postdoctoral Fellow and then a Research Assistant Professor with the CUHK T Stone Robotics Institute. Since 2017, he has been with The Hong Kong Polytechnic University, Hong Kong, where he is currently an Assistant Professor with the Department of Mechanical Engineering and the Principal Investigator of the Robotics and Machine Intelligence Laboratory. His current research interests include perceptual robotics and control theory.