
Enhancing Grasping Performance of Novel Objects through an Improved Fine-Tuning Process

Xiao Hu1, Xiangsheng Chen1,2
1Xiao Hu is with the School of Engineering, Shenzhen University, Shenzhen, China. [email protected] This work was done while he was interning at Xpeng Robotics.
2Xiangsheng Chen, Academician of the Chinese Academy of Engineering, Shenzhen, China. [email protected]
Abstract

Grasping algorithms have evolved from planar depth-based grasping to methods that exploit point cloud information, allowing application in a wider range of scenarios. However, data-driven grasp models trained on basic open-source datasets may not perform well on the novel objects that different scenarios often require, necessitating fine-tuning on new objects. The data driving these algorithms essentially corresponds to the closing region of the hand in a 6D pose, and due to the uniqueness of the 6D pose, synthetic annotation or real-robot annotation methods are typically employed. Acquiring large amounts of real-robot annotated data is challenging, making synthetic annotation common practice. However, obtaining annotated 6D pose data with conventional methods is extremely time-consuming. We therefore propose a method to quickly acquire data for novel objects, enabling more efficient fine-tuning. Our method primarily samples grasp orientations to generate and annotate grasps. Experimental results demonstrate that our fine-tuning process for a new object is 400% faster than other methods. Furthermore, we propose an optimized grasp annotation framework that accounts for the effects of gripper closing, making the annotations more reasonable. Upon acceptance of this paper, we will release our algorithm as open source.

I INTRODUCTION

Grasping has always been a crucial task in robotic manipulation. Due to the diversity of object shapes, robotic grasping faces a severe challenge. In recent years, data-driven methods have made impressive progress on this problem[6]. Representative point cloud-based algorithms, such as PointnetGPD[2] and GraspNet[3], show strong performance in dense clutter scenes. The core of these algorithms lies in the input data[7] (see Fig. 1), called the closing region of the hand, which contains the portion of the object point cloud within the gripper.

Corresponding to the input data, the labels used for training are force closure scores[13] (a method based on the friction cone).

Figure 1: The closing region of the hand generated by different objects varies; it is the necessary input for point cloud-based grasping algorithms.

During the grasp execution phase, heuristic grasp sampling is performed on the scene point cloud (input from the depth camera). The resulting closing region of the hand for each sample is fed into the network for inference, and the grasp with the highest score is executed. Compared with other grasping algorithms[16], point cloud-based methods perform grasping tasks more effectively. This is mainly attributed to their output of 6D grasp pose information[14] (including grasp points and grasp orientations) and their adoption of basic network models for point cloud processing, which endows them with a degree of generalization when executing grasping tasks. Moreover, owing to the richer 6D pose information they output, the grasping success rate of these methods is higher than that of other techniques.
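To make this online pipeline concrete, the following is a minimal sketch of the selection loop. The callables sample_grasps, extract_closing_region, and score_net are hypothetical stand-ins for the heuristic sampler, the closing-region crop, and the trained scoring network; none of them is a specific library API.

```python
def select_best_grasp(scene_points, sample_grasps, extract_closing_region, score_net):
    # Score every heuristically sampled 6D candidate and keep the best one.
    best_grasp, best_score = None, float("-inf")
    for grasp in sample_grasps(scene_points):                 # heuristic 6D candidates
        region = extract_closing_region(scene_points, grasp)  # points between the jaws
        if len(region) == 0:                                  # empty closing region
            continue
        score = float(score_net(region))                      # network inference
        if score > best_score:
            best_grasp, best_score = grasp, score
    return best_grasp, best_score                             # execute best_grasp
```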

Although these grasping algorithms perform well, they may fail to grasp some novel objects. Yet in many application scenarios of grasping tasks, handling novel objects is exactly what is required. Therefore, we need to fine-tune the model on a supplementary grasp dataset to achieve successful grasping of these novel objects.

Existing methods generate grasps based on the contact points of the parallel jaws on the object, performing collision detection and random sampling of gripper poses within a certain angular range.

Figure 2: Framework: our workflow first generates grasp configurations rapidly through geometric analysis, then eliminates infeasible grasps via large-scale simulation, and finally outputs a rescored grasp dataset for fine-tuning.

The core of these methods lies in the closing regions of the hand (see Fig. 1) in the dataset and their corresponding score annotations. Therefore, we need to generate the corresponding closing regions of the hand and score annotations for novel objects. However, using the antipodal method[15] to generate grasps for novel objects is time-consuming on mainstream home computer configurations (Intel Core i7 13th Gen and RTX 3060), requiring 10 hours to generate 6500 annotated grasps for one object. Hence, we propose a faster grasp generation method that takes only 1.2 hours to generate 6500 annotated grasps for an object, with grasp quality comparable to the previous method.

Moreover, the antipodal method relies on a pair of fixed points on the gripper as the contact points with the object when generating grasps. However, during grasp execution, other points on one side of the gripper may touch the object first, displacing the object. This occurs because the closing region of the hand is not symmetric with respect to the central axis of the gripper. Consequently, the scores of the generated grasp configurations become inaccurate. Additionally, some grasp configurations may cause the object to rotate and eventually fall due to torque imbalance when the gripper closes and lifts the object, since the contact force is limited by the gripper's power constraints. To address these two issues, we apply corrections in large-scale simulation, making the grasp scores more accurate and effective.

Our main contributions are:

  • We propose a grasp generation framework that is significantly faster than previous methods.

  • We introduce a filtering approach to reduce duplicated and infeasible grasps.

  • Real-world experiments demonstrate the effectiveness of fine-tuning.

II Related Work

Point cloud-based grasping algorithms primarily rely on offline grasp generation methods to produce closing regions of the hand and their corresponding score annotations.

II-A Grasp Configuration Detection

Given an object or a dense clutter scene, the goal of a grasping algorithm is to find and execute the optimal grasp at that moment. Current data-driven algorithms fall into two main categories: planar depth grasping and point cloud-based 6D pose grasping. In planar depth grasping, leading algorithms in terms of grasping success rate include GGCNN[1] and GQCNN[28], among others, and their core idea is consistent. Taking GGCNN as an example: during the offline stage, a large number of RGB images of objects and their aligned planar depth maps are collected, and these images are annotated with grasps (manually, using rectangular bounding boxes). The data are then used to train the network. In the online stage, the input is the RGB image and its corresponding depth map, and the output is the grasp confidence, width, and angle for each pixel. The main drawback of these methods is the loss of the gripper's degrees of freedom: grasps can only be performed from above, so flat objects such as pizza boxes cannot be grasped. Moreover, these algorithms have a lower success rate in dense clutter scenes, motivating point cloud-based grasping algorithms.

II-B Point Cloud-Based Grasping Algorithms

Liang et al. proposed the PointNetGPD[2] algorithm to address the problem of selecting grasp configurations from point clouds. The method consists of two stages, offline and online. In the offline stage, grasp datasets are generated from object point clouds in open-source datasets; specifically, grasps are sampled for each object point cloud with the antipodal method[15], each grasp corresponding to a closing region of the hand and its score annotation. A network model based on PointNet[29] is then designed, taking the closing region of the hand as input and outputting the corresponding score. In the online stage, the scene point cloud (provided by a depth camera) is input, GPD[14] samples grasps from the scene point cloud to obtain candidate closing regions of the hand, and the grasp with the highest score is executed. This approach has several drawbacks. Firstly, when grasping novel objects that differ from those in the open-source dataset, the success rate is much lower than for objects similar to those in the dataset. Secondly, the offline stage of generating the closing regions of the hand for one object requires ten hours on a mainstream home computer (Intel Core i7 13th Gen), which is too time-consuming. Lastly, the score annotation is based on the force closure method, which does not take into account that the object may move, and the grasp fail, when the gripper closes.

Fang et al. proposed GraspNet[3] as a benchmark for large-scale grasp generation. GraspNet takes the scene point cloud as input. Specifically, grasps are generated for objects in open-source datasets using the PointNetGPD[2] method during the offline stage. GraspNet then arranges and combines these objects in a scene, removing grasps that collide with other objects, thereby generating scene point cloud grasps distinct from the earlier single-object grasps. This method achieves a higher grasping success rate in dense clutter scenarios, as it fully considers the effective closing regions of the hand for objects stacked in the scene. However, its shortcomings mirror those of PointNetGPD: the large time cost of grasp generation and the scoring annotation's neglect of object motion during grasping.

II-C Grasp Quality Metrics

To evaluate grasp quality, many analytical methods physically analyze the geometry of the gripper configuration and the object. Force closure[13] and GWS[17] analysis are the two mainstream grasp quality metrics. The force closure method considers the friction between the gripper and the object, while GWS can handle frictionless situations. These two metrics are very versatile and have driven progress across numerous grasp research directions. However, they do not account for the fact that, during grasp execution, other points on the gripper may contact the object first, moving the object and causing some initially high-quality grasps to fail. We therefore account for this situation and optimize the scoring.

II-D Grasp Annotations

Goldfeder et al. created the Columbia Grasp Database[24] using GraspIt![8] combined with form closure. It takes approximately 10 minutes to generate 15 form-closure grasps, and the total computation time to build the grasp dataset on a server is about one month. This computational cost is evidently too high and unsuitable for fine-tuning.

Lei et al. proposed a fast grasp generation method, "Fast grasping"[25]. It first downsamples the object, then applies PCA to obtain the principal axes. Planar grasp annotations are performed on the object's cross-sections along two orthogonal principal axes and then converted back to 3D grasps. Although this method is significantly faster than others, its drawback is the low diversity of grasps, as the grasp vectors all lie along the object's PCA principal axes, making it likewise unsuitable for fine-tuning.

III Methods

Our proposed method (see Fig. 2) aims to generate supplementary grasp datasets for novel objects more quickly and to optimize the scoring annotations. Specifically, we first generate a CAD model of the novel object; then, based on spatially sampled grasp poses, we traverse the object, generating grasps and scoring annotations after filtering. Subsequently, we rescore the annotations based on the object's movement during grasping, using large-scale simulation. Furthermore, we set a maximum contact force for the gripper and filter out grasps with unbalanced torques.

III-A Generate CAD Models of Novel Objects

Since data-driven grasping algorithms may fail to grasp some novel objects when applied to different scenes, we need to fine-tune them to better grasp these objects.

The first requirement is a CAD model of the novel object. However, creating a detailed model of an object is very cumbersome. We have designed a framework that requires only a robotic arm and an RGB-D camera to output the CAD model of a new object for subsequent grasp generation.

First, we use the robotic arm to capture images, guided by the depth returned by the camera (see Fig. 2). The arm takes 32 effective photos of the object from different angles. These photos are then input to NeRF (which reconstructs 3D scenes from a set of sparse 2D images) to generate the object's CAD model. Finally, the generated model is upsampled to output a more detailed CAD model for subsequent grasp generation.

In this way, we obtain CAD models of novel objects with quality comparable to those in the open-source datasets used for grasp generation.
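A minimal sketch of this capture-and-reconstruct pipeline is shown below. The circular trajectory, and the move_to, capture, reconstruct, and upsample interfaces, are assumptions for illustration; the paper does not specify the trajectory or a particular NeRF implementation.

```python
import math

def circular_camera_poses(n_views=32, radius=0.5, height=0.3):
    # Hypothetical capture trajectory: n_views positions on a circle around
    # the object at the origin; the real trajectory depends on the arm setup.
    poses = []
    for i in range(n_views):
        theta = 2.0 * math.pi * i / n_views
        poses.append((radius * math.cos(theta), radius * math.sin(theta), height))
    return poses

def build_cad_model(move_to, capture, reconstruct, upsample):
    # move_to, capture, reconstruct (NeRF-style), and upsample are assumed
    # interfaces for the arm, the RGB-D camera, the sparse-view 3D
    # reconstruction, and the mesh densification step, respectively.
    images = []
    for pose in circular_camera_poses():
        move_to(pose)                 # position the wrist-mounted camera
        images.append(capture())      # take one RGB-D frame of the object
    mesh = reconstruct(images)        # reconstruct the object from 32 views
    return upsample(mesh)             # densify the mesh for grasp generation
```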

III-B Generate Grasp

The process of generating grasp datasets with antipodal-based methods, such as Columbia Grasp[24] and PointnetGPD[2], is time-consuming, with thousands of grasps taking up to ten hours (on a 12th Gen Intel Core i7-12700H). We therefore propose a new grasp generation method that enables faster fine-tuning.

First, the grasp configuration is defined as

G = [p, r] \in \mathbb{R}^{6} \qquad (1)

where $p = [x, y, z] \in \mathbb{R}^{3}$ represents the grasp point and $r = [r_x, r_y, r_z] \in \mathbb{R}^{3}$ represents the gripper's orientation.

The antipodal-based methods generate grasps from contact points. Specifically, they first find a point on the object, search for an opposing contact within a radius equal to the gripper width, and then sample grasp poses within a certain angular range around the sampled contact pair. These grasp poses are then evaluated for collisions and force closure scores. This procedure obviously produces many repetitive grasp configurations for parallel jaws, such as

G\left[\frac{n_1 + n_2}{2}, r\right] = G\left[\frac{n_2 + n_1}{2}, r\right] \qquad (2)

where $n_1$ and $n_2$ are the contact points.

Since we aim to obtain diverse closing regions of the hand during the fine-tuning process and to accelerate grasp generation, we propose a framework that differs from similar approaches[8]. We sample a set of grasp orientations uniformly distributed in space and generate grasps by iterating these orientations over every point on the object.

We sample grasp orientations in space, obtaining the collection of grasp orientations:

r_i \in \{r_1, r_2, \ldots, r_n\} \qquad (3)
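One concrete way to build such a set is sketched below: approach directions from a Fibonacci sphere lattice (near-uniform over the sphere), each combined with discretized in-plane rotations. The paper only requires the orientation set to be uniform in space; this particular scheme and its sample counts are illustrative assumptions.

```python
import numpy as np

def sample_grasp_orientations(n_approach=64, n_inplane=8):
    # Near-uniform approach directions via a Fibonacci sphere lattice,
    # combined with n_inplane rotations about each approach axis.
    golden = np.pi * (3.0 - np.sqrt(5.0))
    orientations = []
    for i in range(n_approach):
        z = 1.0 - 2.0 * (i + 0.5) / n_approach           # uniform in z
        rho = np.sqrt(max(0.0, 1.0 - z * z))
        approach = np.array([rho * np.cos(golden * i),
                             rho * np.sin(golden * i), z])
        for k in range(n_inplane):
            angle = 2.0 * np.pi * k / n_inplane          # roll about approach axis
            orientations.append((approach, angle))
    return orientations                                  # n = n_approach * n_inplane
```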

The algorithm for our entire grasp generation process is as follows:

Algorithm 1 Generate Grasp
Require: CAD model, configuration of the parallel-jaw gripper
Ensure: $G$ = [grasp point, grasp configuration, grasp quality]
1: Define uniformly sampled grasp approach orientations $R_i$
2: Reorient the point cloud normals $normal_i$
3: for each of the $n$ points on the object do
4:   if the gripper does not collide with the object then
5:     add $g_i$ to $G$
6:   end if
7: end for
8: Remove repeated closing regions of the hand
9: Compute force-closure grasp scores
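A minimal Python sketch of Algorithm 1 follows. The `collides` and `force_closure_score` callables are hypothetical stand-ins for the collision check (line 4) and the scoring step (line 9), and the rounded (point, orientation) key is a simple stand-in for the closing-region deduplication of line 8.

```python
import numpy as np

def generate_grasps(points, normals, orientations, gripper,
                    collides, force_closure_score):
    # For a fixed orientation set, the loop is O(n) in the number of
    # object points, matching the complexity claim in the text.
    grasps, seen = [], set()
    for p, normal in zip(points, normals):              # line 3: object points
        for ri, r in enumerate(orientations):           # line 1: sampled set
            if collides(p, r, gripper):                 # line 4: reject collisions
                continue
            key = (tuple(np.round(p, 3)), ri)           # line 8: dedup stand-in
            if key in seen:
                continue
            seen.add(key)
            score = force_closure_score(p, r, points, normals)  # line 9
            grasps.append((p, r, score))                # line 5: add g_i to G
    return grasps
```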

In this process, we use collision detection and local reorientation of the object's normal vectors to ensure that the generated grasps do not collide with the object. The complexity of our sampling process is $O(n)$, compared with $O(n^2)$ for the antipodal-based algorithm. We then remove duplicate closing regions of the hand. Finally, we employ the same force-closure scoring strategy as PointnetGPD[2] and GraspNet[3]; the results show that our method is 400% faster, while the score distribution and grasp success rate are largely consistent with previous algorithms.

The reasons why our method is faster than the antipodal approach while achieving equivalent results are as follows. First, our method does not generate duplicate grasps; as noted above, the contact-point-based antipodal approach produces repetitive grasps. Second, we filter out unreasonable grasps using the differential normal vectors of the object's surfaces (see Fig. 3), which is more time-efficient than eliminating them with force closure. Finally, we employ the same scoring and labeling method as the antipodal approach, ensuring consistency of the generated grasp data and scoring effectiveness.

Figure 3: Remove bad grasps: grasps whose contact force direction vectors exceed the threshold are quickly filtered using surface vectors obtained from the differential point cloud of the object.
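The following is a sketch of this geometric pre-filter, in the spirit of Fig. 3: reject a grasp when the angle between the gripper closing direction and the estimated surface normal at the contact exceeds a threshold, before running the expensive force-closure computation. The 30-degree default is an illustrative assumption, not a value given in the paper.

```python
import numpy as np

def passes_normal_filter(surface_normal, closing_dir, max_angle_deg=30.0):
    # Cheap friction-cone-style check on the contact geometry.
    n = surface_normal / np.linalg.norm(surface_normal)
    d = closing_dir / np.linalg.norm(closing_dir)
    cos_angle = abs(float(np.dot(n, d)))   # sign-agnostic: normals may be flipped
    return cos_angle >= np.cos(np.deg2rad(max_angle_deg))
```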

III-C Stable Grasp

When executing a grasp, the gripper moves to the grasp point along the grasp approach and then closes. However, an important step is overlooked in this process. As shown in Fig. 4, if one side of the gripper touches the object first while closing, the force balance of the object is disrupted, causing the object to move and resulting in an unstable grasp.

Figure 4: Statically generated grasps: (a) shows the state before the gripper closes, in which the closing region of the hand is not symmetrical with respect to the grasp approach; (b) shows the state after the gripper closes, where the grasp fails due to the displacement caused by the left jaw touching the object first.

We define a stable grasp as:

PC_1 \cdot r = PC_2 \qquad (4)

where $r = I - 2uu^{T} \in \mathbb{R}^{3 \times 3}$, with $u = [0, 0, 1]^{T} \in \mathbb{R}^{3}$, represents the mirror rotation matrix.
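For concreteness, with $u = [0, 0, 1]^{T}$ (assuming the grasp frame's z-axis is the closing direction), the mirror matrix evaluates to:

```latex
r = I - 2uu^{T}
  = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
  - 2 \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
  = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
```

i.e., a reflection across the plane perpendicular to $u$, which maps one side of the closing region onto the other.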

$PC_1, PC_2 \subset \mathbb{R}^{3}$ represent the point clouds on the left and right sides of the grasp approach within the closing region of the hand, respectively.
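In practice the equality in Eq. 4 rarely holds exactly, so a soft test is natural. The sketch below mirrors $PC_1$ and measures a one-sided Chamfer-style distance to $PC_2$; the acceptance tolerance is an assumption, not specified by the paper.

```python
import numpy as np

def symmetry_error(pc_left, pc_right, u=np.array([0.0, 0.0, 1.0])):
    # pc_left, pc_right: (n, 3) and (m, 3) point arrays from the two sides
    # of the closing region. Returns 0 for a perfectly symmetric region.
    R = np.eye(3) - 2.0 * np.outer(u, u)           # mirror matrix from Eq. 4
    mirrored = pc_left @ R.T                       # reflect left-side points
    dists = np.linalg.norm(mirrored[:, None, :] - pc_right[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())         # mean nearest-neighbor distance

# e.g., treat a grasp as stable when symmetry_error(pc1, pc2) < 1e-3
```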

Slight asymmetries may cause the gripper to touch the object on one side first during grasp execution, leading to object movement and grasp failure. Some grasps judged to be good under static force-closure conditions may therefore fail during execution, so we need to filter them out.

Our goal is to minimize the discrepancy between grasps generated under static conditions and their behavior when executed:

\text{Score}\left(\left\| S(p, r) - G(p, r) \right\|\right) \qquad (5)

where $G(p, r)$ denotes the grasps generated under static conditions and $S(p, r)$ denotes the scoring annotations of the grasps after the gripper closes.

For the reasons mentioned above, we adopt large-scale simulation for grasp filtering. First, we execute the statically generated grasps in large-scale simulation. After the gripper closes, if the object lies outside the gripper space, the grasp is considered invalid and removed from the grasp set. In addition, after each grasp execution, the grasp score is reassigned based on the ratio of the distance moved by the object's center of mass to the size of the gripper, making the annotation of the grasp dataset more effective. To this end, we have designed a simple weighting function:

\text{Weight} = \left(1 - \frac{d_1 - d_2}{\text{width}}\right) / 2 \qquad (6)

where $d_1$ and $d_2$ denote the Euclidean distances, along the gripper closing direction, between each side of the gripper and its first contact point on the object, respectively, and $\text{width}$ denotes the width of the gripper.
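A minimal sketch of the rescoring step follows. Note that symmetric contact ($d_1 = d_2$) gives a weight of 1/2; applying the weight multiplicatively to the static score is our reading of the rescoring step, not an explicit formula from the paper.

```python
def rescored(static_score, d1, d2, width):
    # d1, d2: distances each jaw travels along the closing direction before
    # first contact; width: gripper width (same units as d1, d2).
    weight = (1.0 - (d1 - d2) / width) / 2.0   # Eq. 6
    return static_score * weight
```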

Figure 5: Screening: screening grasp configurations using simulation[27].

In some cases, a grasp may fail and the object may slip out of the gripper. This is mainly because the contact force has an upper limit: when a larger torque balance is required for a stable grasp, the grasp may fail (see the bad grasp in Fig. 2):

\sum \vec{M} \neq 0 \qquad (7)

To address power constraints and contact force limits during gripper closure, we first assign a certain mass to the objects to more accurately simulate their dynamics during actual grasping. Next, we set a contact force limit during gripper closure to ensure that gripper operation in the simulation environment conforms to the actual situation.

During the large-scale simulation, we test each grasp configuration and observe whether the object can be successfully moved to the target position after the gripper closes. We filter out grasp configurations that cannot move the object successfully within the contact force limit. In this way, we select grasp configurations that are more likely to succeed in actual operation, thereby generating better strategies.
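The overall simulation filter can be sketched as follows. Here `simulate` is a hypothetical callable that executes one grasp in its own environment (e.g., an Isaac Gym[27] instance) under the contact-force cap and reports the outcome; the movement and rotation thresholds are illustrative assumptions, while the 25 N force cap matches the setting reported in Sec. IV-B2.

```python
def filter_grasps_in_sim(scored_grasps, simulate, move_thresh=0.02,
                         rot_thresh_deg=15.0, max_force=25.0):
    # scored_grasps: iterable of (grasp, static_score) pairs.
    # simulate(grasp, max_force) is assumed to return
    # (lifted, displacement_m, rotation_deg, d1, d2, width).
    kept = []
    for grasp, score in scored_grasps:
        lifted, disp, rot, d1, d2, width = simulate(grasp, max_force)
        if not lifted:                             # object left the gripper: drop
            continue
        if disp > move_thresh or rot > rot_thresh_deg:
            continue                               # object moved too much: drop
        weight = (1.0 - (d1 - d2) / width) / 2.0   # Eq. 6 reweighting
        kept.append((grasp, score * weight))
    return kept
```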

IV Experiment

IV-A Generate Grasp

Our method significantly outperforms other approaches in grasp generation speed. For the same set of novel objects, it generates 6,500 optimal grasp configurations in just 1.2 hours, whereas other methods take an average of 10 hours to achieve comparable results (see Table I).

We employ sampled grasp orientations to generate grasps. To evaluate and annotate these grasps, we adopt the force closure method, consistent with traditional approaches. Throughout this process, we observe that the grasp score distributions produced by our approach remain similar across different novel objects, even though their shapes and characteristics vary. This indicates that our method is consistent and can be applied effectively to various types of objects. By combining orientation sampling with force closure, we ensure a clear grasp generation process while maintaining its versatility for different objects.

TABLE I: Time cost of grasp generation

Methods                     Speed         Quantity   Quality
PointnetGPD[2]              7.2 s/grasp   6500       high
Columbia grasp[24]          12 s/grasp    5000       high
Fast Grasp[25]              0.5 s/grasp   2000       low
Billion ways to grasp[26]   240 s/grasp   15000      very high
Ours                        1.2 s/grasp   10000      high

  • * Speed is the time taken to generate one grasp on the computer configuration described previously.

  • ** "low" indicates that the success rate of executing the generated grasps is below 30%; "very high" signifies a success rate of 95% or higher; "high" represents a success rate greater than 50%.

IV-B Large-Scale Simulation

IV-B1 Stable Grasp

By utilizing large-scale simulated grasping, we can quickly filter out grasp configurations that are likely to fail when the gripper closes. These unsuccessful configurations typically arise when the closing region of the hand is asymmetric relative to the grasp approach, causing one side of the gripper to contact the object first during closing, disrupting the object's equilibrium and moving it. This phenomenon is particularly evident in densely cluttered scenes, where unstable grasps significantly degrade grasping performance.

In order to filter out effective grasp configurations, we employed a large-scale simulation approach. Specifically, we first created n simulation environments, in which the objects were placed in the same position and orientation. Subsequently, we performed grasping operations in each environment and observed the movement of the objects after the gripper closed. If the movement distance or orientation change of the objects exceeded the threshold, we considered the grasp configuration to be invalid and removed it from the candidate set. For those valid grasp configurations within the threshold range, we assigned weights to their annotated scores based on the object’s movement distance and orientation change.

As shown in Fig. 6, the original force-closure-based scoring method did not fully consider the potential relative movement of objects during grasping. To address this issue and make the scoring annotation more reasonable, we introduced a threshold to optimize the score distribution. Through this optimization, we can evaluate grasp feasibility more effectively and improve the grasp success rate (see Table II).

Figure 6: Rescore: four objects were selected; the purple scores represent the original annotations based on force closure, and the cyan scores indicate the re-evaluated values. As shown in the figure, many grasps scored highly in the original assessment would not perform well in practice due to collisions.

IV-B2 Desired Grasp

In this filtering step, we primarily eliminate grasp configurations that, after the gripper closes, may result in torque imbalance due to limited gripper contact force. Specifically, when the gripper closes, if the contact force cannot generate sufficient torque to maintain the object’s stability, the object may slide. At this point, static friction transitions to dynamic friction, reducing the frictional force and further exacerbating the object’s sliding. Ultimately, this unstable grasping may cause the object to fall from the gripper. By filtering out these potentially failing configurations, we can optimize the grasping strategy.

In our large-scale simulation setting with a maximum contact force of 25 N, we tested the antipodal method on a sample of 13 objects from the YCB dataset, each with 6,500 grasp configurations. We found that 42.2% (35,659/84,500) of the grasps were ultimately unsuccessful after the gripper closure. By removing these failed grasps and fine-tuning the model training, we were able to improve the grasp success rate.

IV-B3 Real robot experiments

In our real-world experiments, the gripper employed is not a Franka model; however, it is actuated following the Franka approach and also adopts a two-finger configuration.

TABLE II: Real-world experiments
Objects before finetune after finetune
gamepad 0/3 3/3
glue bottle 3/3 3/3
Tape 1/3 1/3
Box 2/3 3/3
doll_01 1/3 1/3
doll_02 0/3 1/3
doll_03 2/3 2/3
milk box 3/3 3/3
Coke can 3/3 3/3
  • * 1/3 indicates one successful grasp out of three trials. The figure above illustrates the results of the first fine-tuning process for the gamepad controller. The dolls are shown in Fig. 7.

As shown in Table II, prior to fine-tuning, all attempts to grasp the game controller were unsuccessful. After fine-tuning using our proposed framework, the controller could be grasped effectively. Moreover, we conducted grasping tests on other objects, and fine-tuning did not reduce the success rate for these objects.

In addition, we conducted another set of real-world experiments, in which three dolls, a game controller, and a milk carton were fine-tuned using our proposed workflow. The grasp success rate for each of the three dolls increased to 3/3, while the success rates for the other objects remained unchanged (see Table II). These two sets of experiments demonstrate that our proposed framework enables successful grasping of novel objects.

Figure 7: Real test: after fine-tuning, the model was tested in real robot grasping experiments using the objects employed during the fine-tuning process.

V CONCLUSIONS

Our proposed grasp generation method significantly reduces the time required for fine-tuning on novel objects. Furthermore, we introduce an optimized framework for annotating grasp scores, which improves the grasp success rate and provides more reasonable grasp score annotations. Ultimately, through real-world experiments, we demonstrate the effectiveness of our framework for fine-tuning on novel objects.

ACKNOWLEDGMENT

We would like to express our gratitude to BiDan Huang and JunKun Peng for their valuable suggestions on this article.

References

  • [1] D. Morrison, P. Corke, and J. Leitner, “Learning robust, real-time, reactive robotic grasping:,” The International Journal of Robotics Research, pp. 183–201, Mar. 2020, doi: 10.1177/0278364919859066.
  • [2] H. Liang et al., “PointNetGPD: Detecting Grasp Configurations from Point Sets.,” in 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, May 2019. doi: 10.1109/icra.2019.8794435.
  • [3] H.-S. Fang, C. Wang, M. Gou, and C. Lu, “GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, Jun. 2020. doi: 10.1109/cvpr42600.2020.01146.
  • [4] Yun Jiang, S. Moseson, and A. Saxena, “Efficient grasping from RGBD images: Learning using a new rectangle representation,” in 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, May 2011. doi: 10.1109/icra.2011.5980145.
  • [5] L. Pinto and A. Gupta, “Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours,” arXiv preprint, Sep. 2015.
  • [6] J. Redmon and A. Angelova, “Real-Time Grasp Detection Using Convolutional Neural Networks,” arXiv preprint, Dec. 2014.
  • [7] J. Bohg, A. Morales, T. Asfour, and D. Kragic, ”Data-driven grasp synthesis—a survey,” IEEE Transactions on Robotics (T-RO), 2014.
  • [8] A. T. Miller and P. K. Allen, ”Graspit! a versatile simulator for robotic grasping,” IEEE Robotics Automation Magazine, 2004.
  • [9] J. Mahler et al., “Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, May 2016. doi: 10.1109/icra.2016.7487342.
  • [10] I. Lenz, H. Lee, and A. Saxena, ”Deep learning for detecting robotic grasps,” The International Journal of Robotics Research (IJRR), 2015.
  • [11] Robert Platt (2022). Grasp Learning: Models, Methods, and Performance
  • [12] Ni, P., Zhang, W., Zhu, X., Cao, Q. (2020, May). Pointnet++ grasping: Learning an end-to-end spatial grasp generation algorithm from sparse point clouds. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3619-3625). IEEE.
  • [13] V.-D. Nguyen, “Constructing force-closure grasps,” in Proceedings of the 1986 IEEE International Conference on Robotics and Automation, San Francisco, CA, USA, 1986. doi: 10.1109/robot.1986.1087483.
  • [14] A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, “Grasp Pose Detection in Point Clouds,” The International Journal of Robotics Research, pp. 1455–1473, Dec. 2017, doi: 10.1177/0278364917735594.
  • [15] J. Bohg, A. Morales, T. Asfour, and D. Kragic, ”Data-driven grasp synthesis—a survey,” IEEE Transactions on Robotics (T-RO), 2014.
  • [16] Jiang Y, Moseson S, Saxena A. 2011. Efficient grasping from rgbd images: Learning using a new rectangle representation. In 2011 IEEE International conference on robotics and automation, pp. 3304–3311. IEEE
  • [17] Borst, C., Fischer, M., Hirzinger, G. (2004, April). Grasp planning: How to choose a suitable task wrench space. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 (Vol. 1, pp. 319-325). IEEE.
  • [18] J. Bohg, A. Morales, T. Asfour, and D. Kragic, ”Data-driven grasp synthesis—a survey,” IEEE Transactions on Robotics (T-RO), 2014.
  • [19] Depierre, A., Dellandréa, E., Chen, L. (2018, October). Jacquard: A large scale dataset for robotic grasp detection. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 3511-3516). IEEE.
  • [20] Morrison D, Corke P, Leitner J. 2020. Egad! an evolved grasping analysis dataset for diversity and reproducibility in robotic manipulation. IEEE Robotics and Automation Letters 5(3):4368– 4375
  • [21] Savva M, Chang AX, Hanrahan P. 2015. Semantically-enriched 3D models for common-sense knowledge. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 24–31
  • [22] Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas (2016). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation arXiv: Computer Vision and Pattern Recognition.
  • [23] Eppner C, Mousavian A, Fox D. 2021. ACRONYM: A large-scale grasp dataset based on simulation. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 6222–6227. IEEE
  • [24] Goldfeder, C., Ciocarlie, M., Dang, H., Allen, P. K. (2009, May). The columbia grasp database. In 2009 IEEE international conference on robotics and automation (pp. 1710-1716). IEEE.
  • [25] Q. Lei, G. Chen, and M. Wisse, “Fast grasping of unknown objects using principal component analysis,” AIP Advances, Sep. 2017, doi: 10.1063/1.4991996.
  • [26] C. Eppner, A. Mousavian, and D. Fox, “A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set”.
  • [27] Makoviychuk, V., Wawrzyniak, L., Guo, Y., Lu, M., Storey, K., Macklin, M., … , State, G. (2021). Isaac gym: High performance gpu-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470.
  • [28] Mahler, J., Liang, J., Niyaz, S., Laskey, M., Doan, R., Liu, X., … , Goldberg, K. (2017). Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. arXiv preprint arXiv:1703.09312.
  • [29] Qi, C. R., Su, H., Mo, K., Guibas, L. J. (2017). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 652-660).