
Pheno-Robot: An Auto-Digital Modelling System for In-Situ Phenotyping in the Field

Yaoqiang Pan1,∗, Kewei Hu1,∗, Tianhao Liu2, Chao Chen2, and Hanwen Kang2,#

∗ Equal contribution.
1 K. Hu and Y. Pan are with the College of Engineering, South China Agricultural University, Guangzhou, China.
2 T. Liu, C. Chen, and H. Kang are with the Department of Mechanical and Aerospace Engineering, Monash University, Melbourne, Australia.
Abstract

Accurate reconstruction of plant models for phenotyping analysis is critical for optimising sustainable agricultural practices in precision agriculture. Traditional laboratory-based phenotyping, while valuable, falls short of explaining how plants grow under uncontrolled conditions. Robotic technologies offer a promising avenue for large-scale, direct phenotyping in real-world environments. This study explores the deployment of emerging robotic and digital technologies in plant phenotyping to improve performance and efficiency. Three critical functional modules, covering environmental understanding, robotic motion planning, and in-situ phenotyping, are introduced to automate the entire process. Experimental results demonstrate the effectiveness of the system in agricultural environments. The Pheno-Robot system autonomously collects high-quality data by navigating around plants, and the in-situ modelling method reconstructs high-quality plant models from the data collected by the robot. The developed robotic system shows high efficiency and robustness, demonstrating its potential to advance plant science in real-world agricultural environments.

I Introduction

Plant phenotyping plays a fundamental role in precision agriculture. It involves identifying and selecting genetics with advantageous input traits and complementary output traits [1]. While phenotyping in the laboratory plays a key role in identifying promising lines for crossbreeding, it is a surrogate for the primary goal of understanding how a crop will grow in real-world environments [2]. Uncontrolled 'non-laboratory' conditions present significant challenges, particularly in the analysis of the traits responsible for beneficial responses [3]. Therefore, the mass collection of phenotypic data in the field is essential for precision agriculture.

At the heart of plant phenotyping is the digital modelling of plant growth and traits, including both appearance and geometry [4]. This process involves monitoring the growth, development, and changes in plants, taking into account factors such as climate, soil properties, pests, and diseases [5]. Image-based plant modelling is a widely used approach, particularly for analysing observable traits [4]. However, analysing plants from a single viewpoint poses challenges, especially when plants overlap [6]. To overcome this limitation, RGB images acquired from multiple viewpoints are widely used. Recently, 3D reconstruction technology has become a prominent tool for plant analysis [7]. This new paradigm provides essential information about the geometric aspects of plants [8], including height [9], volume [10], and even mass [11]. These advances in modelling technology, coupled with the increasing availability of information-rich data, contribute significantly to the capture of detailed plant characteristics.

Figure 1: Pheno-Robot system operates in the greenhouse.
Figure 2: System Overview of the Pheno-Robot system.

Despite significant advances, current practice heavily relies on manual operation, which is highly labour-intensive and inefficient, making it impractical for large-scale farms. Robotics offers the potential for widespread sampling in the field under authentic agricultural conditions. However, the realisation of robotic automation for high-quality in-situ plant phenotyping in the field faces two main challenges. Firstly, there is a lack of effective methods for perceiving and autonomously navigating the intricacies of farm landscapes to perform large-scale, direct phenotyping. Secondly, there is a crucial gap in the availability of an accurate and efficient digital modelling method capable of generating high-fidelity, multi-modal plant models. Successfully overcoming these challenges holds the key to enabling repeated and detailed assessments of plants, potentially leading to a paradigm shift in the development of agri-genetics.

In this research, we present an innovative robotic system designed for autonomous in-situ phenotyping in the field, synthesising both robotic and digital technologies. The developed Pheno-Robot system comprises a comprehensive framework that includes a deep learning model and a motion planning method for robotic automation. It also incorporates a Neural Radiance Field (NeRF)-based modelling network, which enhances the system’s ability to perform detailed and accurate in-situ phenotyping. Our key contributions are:

  • We develop a novel environmental understanding and robotic navigation method tailored to farm environments.

  • We develop an improved NeRF method that addresses sparse-view inputs to achieve high-quality plant modelling.

  • We present a hierarchical mapping method and demonstrate the Pheno-Robot system in real agricultural settings.

The rest of this paper is organised as follows. Section II surveys related work, followed by the proposed methodologies in Section III. Experimental results are given in Section IV, and conclusions are drawn in Section V.

II Related Works

II-A Robotic Automation in Agriculture

Robots play a key role in precision agriculture, using smart sensing and intervention technologies to improve efficiency through automation. Precision agriculture addresses spatial and temporal variability in the environment and plant growth patterns, scaling from the traditional farm level [9] to sub-field precision [12]. Horticulture farms are more complex than crop farms, requiring robots to navigate semi-structured, challenging terrain in dynamic environments [13]. Firstly, sensor information is essential for detecting objects and potential risks in the field to ensure the safe operation of robotic vehicles. Secondly, machine learning is indispensable for farming robots to maximise locomotion flexibility (e.g., moving sideways or navigating narrow spaces between crops) and optimise input utilisation with greater efficacy. The agility of these robots, coupled with the ability to carry specialised sensors, holds the key to unlocking the full potential of precision agriculture. Despite notable progress, exemplified by robots designed for autonomous operations in orchard environments for selective harvesting [14] or monitoring applications [15], there is still uncharted territory. Robots capable of precise environmental recognition, robust and flexible motion, and accurate plant modelling remain an area for exploration.

II-B In-situ phenotyping

Phenomics encompasses the study of various phenotypic plant traits, including growth, yield, plant height, leaf area index and more. Traditional methods rely on imaging and 3D range sensors to measure various plant traits such as colour, shape, volume, and spatial structure [8]. For example, the growth rate of rosette plants such as Arabidopsis [16] and tobacco [6] is often analysed using images taken from a single viewpoint. On the other hand, 3D range sensors allow accurate measurement of plant geometry and characteristics such as height, width and volume [13]. Kang et al. [17] proposed a sensor fusion system for yield estimation in apple orchards, using deep learning-based panoptic segmentation algorithms. Wu et al. [2] utilised Multi-View Stereo (MVS) for the reconstruction of crop geometry in the field. However, 3D range sensors face limitations in agricultural environments due to strong occlusion, often resulting in a bleeding effect [18] near discontinuous geometry. In addition, their point resolution is sparse compared to image-based data. While MVS-based methods offer better quality with sufficient data, they require hours to process even simple plants [11], limiting their applicability in real-time operations. Furthermore, current technologies do not provide a multi-modal representation of instances, making them inefficient for analysing plant features when multiple data types (image/geometry) are required.

III Methods

The Pheno-Robot system comprises three subsystems: an Environmental Perception Module (EPM), a Motion Planning Module (MPM), and an In-situ Phenotyping Module (IPM), as illustrated in Figure 2.

III-A Environment Perception

III-A1 3D Detection Network

In this study, we utilise a novel neural network, the 3D Object Detection Network (3D-ODN), introduced in our previous study [19]. Specifically designed for processing point cloud maps in Bird's-Eye View (BEV), the 3D-ODN consists of three integral network branches: point cloud subdivision, feature coding, and a detection head. First, the entire point cloud map is uniformly segmented. The points within each segment are then projected onto the BEV perspective, where local semantic features are extracted. These features are fed into the main branch, forming a single-stage network architecture dedicated to processing the point cloud features from the BEV perspective. To improve the learning of multi-scale feature embeddings, a feature pyramid is used to fuse features at multiple scales. Finally, the recognition results are projected into 3D space based on the predicted height values. The identified instances are denoted as $\mathbf{T}=\{t_{1},t_{2},\cdots \mid t_{i}\in\mathbb{R}^{2}\}$, representing the instances detected by the 3D-ODN (see Fig. 3 (a)).
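To make the BEV projection step concrete, the snippet below bins a raw point cloud into a bird's-eye-view height grid. It is a minimal illustrative sketch, not the 3D-ODN itself; the grid extents and cell size are placeholder values.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.2):
    """Bin a point cloud (N, 3) into a BEV height grid (illustrative parameters only)."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((nx, ny), dtype=np.float32)          # max height per cell

    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # keep the maximum observed height in each occupied cell
    np.maximum.at(bev, (ix[valid], iy[valid]), points[valid, 2])
    return bev
```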

Figure 3: Illustration of EPM principle.

III-A2 Graph Mapping

For each instance, eight nodes are designated to represent the four corners in two different directions of motion, as shown in Fig. 3 (b). This orientation consideration aims to mitigate turning movements in narrow passages, which could otherwise lead to a higher failure rate of movement due to becoming trapped by potential obstacles. Consequently, each instance $t_i$ is associated with eight nodes, denoted as $t_{i}=\{N^{i}_{1},N^{i}_{2},\cdots,N^{i}_{8} \mid N^{i}_{j}\in\mathbb{R}^{2}\}$ (see Fig. 3 (b)). To extract the line structure of farms, a line detection algorithm is used to group instances in the same row, denoted $\mathbf{g}$. Each row has two common access nodes at both ends, denoted $N_{l}$ and $N_{r}$ ($N_{l}, N_{r}\in\mathbb{R}^{2}$). These nodes serve to establish connections between all instances within the group and to form links with other groups (see Fig. 3 (a)). A group is thus characterised as $g_{i}=\{t_{1},t_{2},\cdots,N^{i}_{l},N^{i}_{r}\}$. The map is then represented as a set of groups, $\mathbf{G}=\{g_{1},g_{2},\cdots\}$.
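A possible in-memory layout for this graph map is sketched below in Python; the class and field names (Instance, Group, centre, nodes) are our own illustrative choices, not identifiers from the Pheno-Robot code.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass(eq=False)          # identity-based equality keeps list lookups simple
class Instance:
    """A detected plant t_i with its eight surrounding nodes N^i_1..N^i_8."""
    centre: np.ndarray        # (2,) planar position of the instance
    nodes: np.ndarray         # (8, 2): four corners x two approach directions

@dataclass(eq=False)
class Group:
    """A row g_i: its instances plus the two shared access nodes N_l and N_r."""
    instances: list = field(default_factory=list)
    node_left: np.ndarray = None    # N_l
    node_right: np.ndarray = None   # N_r

# The graph map G is simply a list of rows: G = [g_1, g_2, ...]
```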

III-B Trajectory Planning

Figure 4: Illustration of MPM principle.

III-B1 Global Path Planning

A Greedy-search-based path generation algorithm on the graph map is developed, as illustrated in Alg.1. The following are the relevant definitions:

  • $\mathbf{T}$ represents the goal set of instances that require phenotyping.

  • $\mathbf{V}$ represents the subgroups formed from the instances in $\mathbf{T}$.

  • $\mathbf{N_{start}}$ represents the robot position at the start.

  • $\mathbf{\Gamma}$ represents the global path, which is a sequence of nodes.

Algorithm 1 Global Path Generation
Input: $\mathbf{T}$, $\mathbf{G}$, $\mathbf{N_{start}}$
Output: $\mathbf{\Gamma}$
1:  for $t_{i}$ in $\mathbf{T}$ do
2:     $\mathbf{v}_{new}\leftarrow\mathbf{FindParent}(t_{i})$
3:     $\mathbf{V}\leftarrow\mathbf{V}\cup\mathbf{v}_{new}$
4:  end for
5:  while $\mathbf{V}\neq\emptyset$ do
6:     $\mathbf{v}_{j}\leftarrow\mathbf{FindNearestSubgroup}(\mathbf{\Gamma},\mathbf{V})$
7:     $\mathbf{\Gamma}_{new}\leftarrow\mathbf{PlanConnection}(\mathbf{\Gamma},\mathbf{v}_{j})$
8:     $\mathbf{\Gamma}\leftarrow\mathbf{\Gamma}\cup\mathbf{\Gamma}_{new}$
9:     Delete $\mathbf{v}_{j}$ from $\mathbf{V}$
10:  end while
11:  return $\mathbf{\Gamma}$

Two key sub-functions presented in Alg.1 are described as follows:

  • $\mathbf{FindParent}$: finds the group that an instance belongs to.

  • $\mathbf{FindNearestSubgroup}$: finds the subgroup that contains the instance closest to the current robot location.

The key function $\mathbf{PlanConnection}$ is described in Alg. 2.

Algorithm 2 $\mathbf{PlanConnection}$
Input: $\mathbf{\Gamma}$, $\mathbf{v}=\{t_{1},t_{2},\cdots,t_{n} \mid t_{i}\in\mathbf{g}_{i}\}$
Output: $\mathbf{\Gamma}_{new}$
1:  while $\mathbf{v}\neq\emptyset$ do
2:     $\mathbf{N}_{new},\ t_{i}\leftarrow\mathbf{FindNearestAndFeasible}(\mathbf{\Gamma},\mathbf{v})$
3:     $\mathbf{\Gamma}_{new}\leftarrow\mathbf{\Gamma}_{new}\cup\mathbf{N}_{new}$
4:     if $\mathbf{IFFullyCover}(\mathbf{\Gamma}_{new},t_{i})$ then
5:        Delete $t_{i}$ from $\mathbf{v}$
6:     end if
7:  end while
8:  return $\mathbf{\Gamma}_{new}$

The function $\mathbf{FindNearestAndFeasible}$ finds the nearest feasible node (see Fig. 4 (a)) of an instance in the subgroup. It evaluates feasibility from two perspectives, traversability and orientation, which are detailed in Sec. III-B3. The function $\mathbf{IFFullyCover}$ evaluates whether an instance has been fully covered by the generated path, which is achieved by checking the number of connected nodes of the instance, as shown in Fig. 4 (b).
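The following Python sketch mirrors Alg. 1 and Alg. 2 over the Instance/Group containers sketched in Sec. III-A2. It is an illustrative skeleton under our own naming; the feasibility and coverage checks are passed in as callables because their full definitions live in Sec. III-B3.

```python
import numpy as np

def generate_global_path(targets, groups, start, is_feasible, fully_covered):
    """Greedy global path over the graph map (a sketch of Alg. 1)."""
    # FindParent: collect the targets of each row into a subgroup (the set V)
    subgroups = [sg for g in groups
                 if (sg := [t for t in targets if t in g.instances])]

    path = [np.asarray(start, dtype=float)]                  # Gamma
    while subgroups:
        tail = path[-1]
        # FindNearestSubgroup: subgroup whose closest instance is nearest to the path end
        v = min(subgroups,
                key=lambda sg: min(np.linalg.norm(t.centre - tail) for t in sg))
        path.extend(plan_connection(path, v, is_feasible, fully_covered))
        subgroups.remove(v)
    return path

def plan_connection(path, subgroup, is_feasible, fully_covered):
    """Cover every instance of one row by hopping to nearest feasible nodes (Alg. 2)."""
    new_nodes, remaining = [], list(subgroup)
    while remaining:
        tail = new_nodes[-1] if new_nodes else path[-1]
        # FindNearestAndFeasible: nearest node of any remaining instance that passes
        # the orientation/traversability checks of Sec. III-B3
        candidates = [(n, t) for t in remaining for n in t.nodes
                      if is_feasible(tail, n)]
        node, t = min(candidates, key=lambda c: np.linalg.norm(c[0] - tail))
        new_nodes.append(node)
        if fully_covered(new_nodes, t):                      # IFFullyCover
            remaining.remove(t)
    return new_nodes
```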

III-B2 Local Trajectory Generation

This step computes the detailed path from the global path in three stages: initial path generation, trajectory optimisation, and B-spline interpolation.

Initial Path Generation: Given the global path $\Gamma$, the first step is to generate an initial collision-free path between two nodes. Two scenarios are considered, including path generation for phenotyping data acquisition (where two adjacent nodes belong to the same instance) and paths between nodes of different instances. In the first case, a sequence of collision-free preset sample positions is established and the RRT algorithm is used to connect these positions to form the sampling trajectory between nodes. In the second case, the A$^{*}$ algorithm is used to determine a collision-free path between nodes.

Trajectory Optimisation: The initial path $\Phi$ consists of a sequence of points $\{Q_{0},Q_{1},\cdots \mid Q\in\mathbb{R}^{2}\}$, and we conceptualise the trajectory as a continuous function $\Phi:[0,1]\rightarrow\mathbb{R}^{2}$ mapping time to robot states. The objective function incorporates three key aspects of the robot's motion: it penalises velocities to encourage smoothness, proximity to the environment to secure trajectories that maintain a certain distance from obstacles, and the distance between the state and the preset viewpoint. These terms are represented as $f_{s}$, $f_{c}$, and $f_{o}$, respectively. The objective function is:

$\Phi^{*}=\mathop{\mathbf{argmin}}_{\Phi}\ \alpha_{s}f_{s}(\Phi)+\alpha_{c}f_{c}(\Phi)+\alpha_{o}f_{o}(\Phi)$ (1)

The initial state $\Phi(0)=Q_{0}$ and final state $\Phi(1)=Q_{1}$ are fixed, and $\alpha_{s}$, $\alpha_{c}$, and $\alpha_{o}$ are the weights of the penalty terms. The detailed formulations of $f_{s}$, $f_{c}$, and $f_{o}$ are as follows:

$f_{s}(\Phi)=\int_{\Phi}\|\frac{d}{dt}\Phi(t)\|^{2}dt$ (2)
$f_{c}(\Phi)=\int_{\Phi}c(\Phi(t))\|\frac{d}{dt}\Phi(t)\|dt$ (3)
$f_{o}(\Phi)=\int_{\Phi}\|\Phi(t)-\Phi_{d}\|^{2}dt$ (4)

where $c(\cdot):\mathbb{R}^{2}\to\mathbb{R}$ is the function that penalises states near obstacles, and $\Phi_{d}$ is the trajectory that contains the desired view positions for data acquisition.

We update the trajectory by the functional gradient $\bar{\nabla}f(\Phi_{i})$ using $\Phi_{i+1}=\Phi_{i}-l_{r}\cdot\bar{\nabla}f(\Phi_{i})$, following [20], where $l_{r}$ is the learning rate. The functional gradients of the objectives in (2)-(4) are given by

$\bar{\nabla}f_{s}(\Phi)=-\frac{d^{2}}{dt^{2}}\Phi(t)$ (5)
$\bar{\nabla}f_{c}(\Phi)=\|\Phi^{\prime}(t)\|\cdot[(I-\Phi^{\prime}(t)\Phi^{\prime}(t)^{T})\nabla c-c\kappa]$ (6)
$\bar{\nabla}f_{o}(\Phi)=\Phi(t)-\Phi_{d}$ (7)

where $\nabla c$ is the derivative of the obstacle cost with respect to the control points $Q_{j}$, i.e., $\partial c/\partial Q_{j}$, and the curvature term $\kappa$ is defined as

$\kappa=\|\Phi^{\prime}(t)\|^{-2}\left((I-\Phi^{\prime}(t)\Phi^{\prime}(t)^{T})\Phi^{\prime\prime}(t)\right)$ (8)
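A minimal, discrete sketch of this gradient-descent refinement is given below, operating directly on the waypoints; the obstacle gradient is passed in as a simplified stand-in for the full expression in Eqs. (6) and (8), and all parameter values are illustrative.

```python
import numpy as np

def optimise_trajectory(Q, Q_desired, obstacle_grad,
                        a_s=1.0, a_c=1.0, a_o=0.5, lr=0.05, iters=200):
    """Waypoint-level gradient descent on Eq. (1); endpoints are held fixed.

    Q:             initial waypoints from RRT / A*, shape (n, 2)
    Q_desired:     desired view positions Phi_d, shape (n, 2)
    obstacle_grad: callable q -> (2,) gradient of the obstacle cost c at q
    """
    Q = np.asarray(Q, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros_like(Q)

        # smoothness term, Eq. (5): negative second finite difference
        grad[1:-1] += a_s * -(Q[2:] - 2.0 * Q[1:-1] + Q[:-2])

        # obstacle term (simplified Eq. (6)): descend the obstacle cost
        grad += a_c * np.array([obstacle_grad(q) for q in Q])

        # viewpoint term, Eq. (7): pull waypoints toward the preset view positions
        grad += a_o * (Q - Q_desired)

        grad[0] = grad[-1] = 0.0        # start and goal are boundary conditions
        Q -= lr * grad                  # Phi_{i+1} = Phi_i - l_r * grad
    return Q
```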

B-spline Interpolation: The optimised trajectory is then parameterised by a piece-wise B-spline into a uniform curve $\Phi_{B}$. Given the determined degree $m$ and a knot vector $\{k_{0},k_{1},\cdots,k_{M}\}$, where $M=n_{q}+2m$, the parameterised uniform curve $\zeta(t)$ can be formulated as:

$\zeta(t)=\frac{\sum_{i=1}^{n_{q}}R_{i,m}(t)w_{i}Q_{i}}{\sum_{i=1}^{n_{q}}R_{i,m}(t)w_{i}}$ (9)

where $w_{i}$ is the weight of each control point $Q_{i}$, $R_{i,m}$ is the degree-$m$ basis function for $Q_{i}$, and $t$ is the curve parameter. The TEB-planner [21] is then used to find a proper velocity profile to follow the trajectory.
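For reference, the weighted B-spline of Eq. (9) can be evaluated with the classic Cox-de Boor recursion; the sketch below is illustrative and assumes a clamped knot vector supplied by the caller.

```python
import numpy as np

def bspline_basis(i, m, t, knots):
    """Cox-de Boor recursion for the basis function R_{i,m}(t)."""
    if m == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + m] != knots[i]:
        left = (t - knots[i]) / (knots[i + m] - knots[i]) * bspline_basis(i, m - 1, t, knots)
    if knots[i + m + 1] != knots[i + 1]:
        right = (knots[i + m + 1] - t) / (knots[i + m + 1] - knots[i + 1]) \
            * bspline_basis(i + 1, m - 1, t, knots)
    return left + right

def weighted_bspline(t, Q, w, m, knots):
    """Evaluate the weighted (rational) B-spline curve of Eq. (9) at parameter t."""
    num, den = np.zeros(Q.shape[1]), 0.0
    for i in range(len(Q)):
        b = bspline_basis(i, m, t, knots) * w[i]
        num, den = num + b * Q[i], den + b
    return num / den if den > 0 else Q[-1]

# example: a quadratic (m = 2) curve over four control points, clamped knot vector
Q = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
samples = [weighted_bspline(t, Q, np.ones(len(Q)), 2, [0, 0, 0, 0.5, 1, 1, 1])
           for t in np.linspace(0.0, 0.999, 20)]
```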

III-B3 Feasibility checking

The $\mathbf{FindNearestAndFeasible}$ function in Alg. 2 evaluates the feasibility between two nodes from two aspects: orientation and traversability.

I. Orientation Checking: This term evaluates whether two nodes have similar directions, as the narrow tunnels between plants do not allow the robot to turn around. The largest allowed orientation difference between two nodes is $\pi/3$.

Figure 5: Illustration of terrain analysis, a) the points of robot surroundings, b) the risk area is converted into polygons.

II. Traversability Analysis: The point cloud between two nodes is projected onto a 2D grid map. The traversability of each grid cell is assessed from multiple risk factors, including:

(a) Collision risk: A risk factor quantified by the probability of a grid cell belonging to an obstacle, denoted $\theta\in[0,1]$. Terrain analysis [22] is utilised here.

(b) Slope risk: For each point on the terrain $T_{i}$, points on the map are selected by a cubic box with sides of length $l_{s}$, represented as $\Omega_{i}=\{(p^{j}_{i})_{j=1:N_{i}} \mid p_{i}^{j}\in\mathbb{R}^{3}\}$. SVD is used to fit a plane $P_{i}$ to $\Omega_{i}$ and obtain the normal vector $n_{z}\in\mathbb{R}^{3}$. The slope angle between the plane normal $n_{z}$ and the vertical direction $e_{z}$ is obtained by $\mathbf{s}=\mathbf{arccos}\frac{\|e_{z}\cdot n_{z}\|}{\|e_{z}\|\cdot\|n_{z}\|}$.

(c) Step risk: This term evaluates the height gap between adjacent grid cells. Negative obstacles can also be detected by checking for the lack of measurement points in a cell. The maximum height gap is denoted as $\lambda$.

The above three risks are combined by a weighted sum to obtain the weighted traversability value $\Upsilon$, as:

$\Upsilon=\theta+\alpha_{\mathbf{s}}\frac{\mathbf{s}}{\mathbf{s}_{crit}}+\alpha_{\lambda}\frac{\lambda}{\lambda_{crit}}$ (10)

where $\mathbf{s}_{crit}$ and $\lambda_{crit}$ are the maximum allowed slope angle and height gap, respectively.
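The slope estimate and the weighted combination of Eq. (10) can be sketched as follows; the critical values and weights used here are placeholders, not the tuned parameters of the Pheno-Robot.

```python
import numpy as np

def slope_from_patch(points):
    """Fit a plane to a local terrain patch Omega_i by SVD and return its slope angle (rad)."""
    centred = points - points.mean(axis=0)
    # the right-singular vector with the smallest singular value is the plane normal n_z
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    n_z = vt[-1]
    e_z = np.array([0.0, 0.0, 1.0])
    return np.arccos(abs(n_z @ e_z) / (np.linalg.norm(n_z) * np.linalg.norm(e_z)))

def traversability(theta, slope, step, a_s=0.5, a_l=0.5,
                   slope_crit=np.radians(25.0), step_crit=0.15):
    """Weighted traversability value of Eq. (10); theta is the collision risk in [0, 1]."""
    return theta + a_s * slope / slope_crit + a_l * step / step_crit
```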

III. Terrain Analysis for Trajectory Optimisation: During robot operation, the odometry and a KD-tree [23] are utilised to build the local terrain map (Fig. 5 (a)). The detected obstacle points (Fig. 5 (b)) are converted to polygons, denoted as $\{\mathcal{O}\subset\mathbb{R}^{2}\}$. Let $\phi(Q_{i},\mathcal{O})$ denote the minimal Euclidean distance between an obstacle and a control point. The minimum separation $\phi_{\mathrm{min}}$ between all obstacles and $Q_{i}$ is found by

$\phi_{\mathrm{min}}=\mathrm{min}[\phi(Q_{i},\mathcal{O}_{1}),\phi(Q_{i},\mathcal{O}_{2}),\cdots]$ (11)

The obstacle set $\mathcal{O}$ is updated online to handle dynamic environments.
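If the obstacle polygons are available as vertex lists, Eq. (11) reduces to a point-to-polygon distance query; a minimal sketch using the shapely library (our choice, not necessarily the one used on the robot) is:

```python
from shapely.geometry import Point, Polygon

def min_obstacle_distance(q, obstacle_polygons):
    """phi_min of Eq. (11): smallest distance from control point q to any obstacle polygon."""
    p = Point(q)
    return min(p.distance(Polygon(poly)) for poly in obstacle_polygons)
```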

III-C In-situ Phenotyping Model

III-C1 Neural Rendering

NeRF represents the 3D scene as a radiance field that describes the volume density $\sigma$ and view-dependent colour $c$ for every point $x$ and every viewing direction $d$ via an MLP [24]:

$\sigma,c=H_{\Theta}(x,d)$ (12)

Ray-tracing-based volume rendering is used to render the parameters of the 3D scene into the colour $\hat{C}$ of a ray $r$, expressed as:

$\hat{C}(r)=\sum_{i=1}^{N}T_{i}\left(1-\exp\left(-\sigma_{i}\delta_{i}\right)\right)c_{i},\quad T_{i}=\exp\Big(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\Big)$ (13)

where $T$ is the volume transmittance and $\delta$ is the step size of ray marching. For every pixel, the squared photometric error is used to optimise the MLP. Applied across the image, this loss $\mathcal{L}_{\mathrm{Rendering}}$ is represented as:

$\mathcal{L}_{\mathrm{Rendering}}=\sum_{r\in R}\|\hat{C}(r)-C(r)\|_{2}^{2}$ (14)

where $C(r)$ is the ground-truth colour.
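The compositing of Eq. (13) and the loss of Eq. (14) are sketched below for a single ray with NumPy; this is a didactic restatement of the standard formulation, not the training code of our system.

```python
import numpy as np

def render_ray(sigmas, colours, deltas):
    """Composite per-sample densities and colours along one ray (Eq. (13)).

    sigmas:  (N,)   densities sigma_i from the MLP
    colours: (N, 3) view-dependent colours c_i
    deltas:  (N,)   step sizes delta_i between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)               # opacity of each ray segment
    # transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigmas[:-1] * deltas[:-1])]))
    weights = trans * alphas
    return (weights[:, None] * colours).sum(axis=0)       # rendered ray colour C_hat(r)

def rendering_loss(pred, gt):
    """Squared photometric error of Eq. (14) over a batch of rays, shape (R, 3)."""
    return float(np.sum((pred - gt) ** 2))
```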

Figure 6: Few-shot learning from sparse input. a) Ray samples under limited training views. b) Occlusion regularization for sparse-view rendering.

III-C2 Few-shot learning from Sparse Views

The artefact of "white floaters" caused by rendering distortion is a common failure mode in NeRF learning. Autonomous data acquisition by robots often yields imperfect sampling density and sparse views with fewer overlapping regions (Fig. 6 (a)), which distorts the rendering of these regions. This occurs because, between sparse views, NeRF training lacks sufficient information to estimate the correct scene geometry, leading to significant and dense floaters in the region close to the camera. To reduce these artefacts, an optimisation term $\mathcal{L}_{occ}$, designed to regulate the learning behaviour of NeRF [25], is used to penalise dense fields near the camera via "occlusion" regularisation, which can be expressed as:

$\mathcal{L}_{occ}=\frac{\boldsymbol{\sigma}_{K}^{\top}\mathbf{m}_{K}}{K}=\frac{1}{K}\sum_{k=1}^{K}\sigma_{k}\cdot m_{k}$ (15)

where $\boldsymbol{\sigma}_{K}$ represents the density values of the $K$ points sampled along the ray, ordered by proximity to the ray origin, and $\mathbf{m}_{K}$ is the binary mask vector that determines whether a ray sector will be penalised.
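A per-ray sketch of this regulariser is given below; the choice of how many near-camera samples the mask penalises is a hyperparameter we leave symbolic here.

```python
import numpy as np

def occlusion_loss(sigmas, num_penalised):
    """Occlusion regularisation of Eq. (15) for one ray.

    sigmas:         (K,) densities of the K ray samples, nearest to farthest
    num_penalised:  number of near-camera samples the binary mask m_K penalises
    """
    K = len(sigmas)
    mask = np.zeros(K)
    mask[:num_penalised] = 1.0      # penalise only the sector closest to the camera
    return float(np.dot(sigmas, mask) / K)
```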

III-C3 Geometry Extraction from Field

Given a predefined 3D region of interest, a set of spatial points $P=\{p_{1},p_{2},\dots,p_{n}\}$ is generated via dense volumetric sampling. Each point $p_{i}\in P$ is evaluated through the NeRF model to obtain its density value, $\sigma(p_{i})=\mathrm{NeRF}_{\sigma}(p_{i})$, which forms the basis for surface extraction. The Marching Cubes algorithm identifies the iso-surface by thresholding the density values:

$M=\mathbf{MarchingCubes}(P,\sigma_{\text{threshold}})$ (16)

where $M$ is the resultant mesh and $\sigma_{\text{threshold}}$ is an optimal density value demarcating the object's boundary.

For each vertex $v_{j}$ in mesh $M$, a viewing ray $r_{j}$ is constructed and the NeRF is queried for $\mathbf{c}(v_{j})=\mathrm{NeRF}_{\mathbf{c}}(v_{j},r_{j})$. The colour values derived from the field are mapped onto mesh $M$, assigning a colour to each vertex:

$\operatorname{Color}(v_{j})=\mathbf{c}(v_{j})$ (17)
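The extraction pipeline of Eqs. (16)-(17) can be prototyped with scikit-image's marching cubes as below; the density and colour query functions, the region of interest, and the fixed viewing direction are illustrative stand-ins for the trained radiance field.

```python
import numpy as np
from skimage import measure

def extract_coloured_mesh(density_fn, colour_fn, bounds, res=64, sigma_threshold=10.0):
    """Marching-cubes mesh plus per-vertex colours from a radiance field (Eqs. (16)-(17)).

    density_fn: callable (M, 3) -> (M,)   densities, stand-in for NeRF_sigma
    colour_fn:  callable (M, 3), (M, 3) -> (M, 3) colours for points and view directions
    bounds:     ((xmin, ymin, zmin), (xmax, ymax, zmax)) region of interest
    """
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    axes = [np.linspace(lo[i], hi[i], res) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)        # (res, res, res, 3)
    sigma = density_fn(grid.reshape(-1, 3)).reshape(res, res, res)

    # iso-surface of the sampled density volume at sigma_threshold
    verts, faces, _, _ = measure.marching_cubes(sigma, level=sigma_threshold)
    verts = lo + verts / (res - 1) * (hi - lo)                         # voxel -> world

    # one colour per vertex, queried along a fixed (downward) viewing direction
    dirs = np.tile([0.0, 0.0, -1.0], (len(verts), 1))
    return verts, faces, colour_fn(verts, dirs)
```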

For a 2D texture representation, the vertex-coloured mesh undergoes UV unwrapping. To minimise distortion, Least Squares Conformal Mapping (LSCM) [26] is used, which minimises the conformal energy:

$E(u,v)=\int_{\Omega}\left(|\nabla u|^{2}+|\nabla v|^{2}\right)dA$ (18)

where $(u,v)$ are the 2D texture coordinates of each vertex in $M$, $\Omega$ represents the object's surface, and $dA$ is a differential area element, indicating that the energy is computed by integrating over the surface of the mesh.

III-C4 NeRF Model

Instant-NGP [27], which uses a multi-resolution hash encoding to represent learned scene features with tiny MLPs, is utilised. In detail, Instant-NGP operates on the premise that the object to be reconstructed is enclosed within multi-resolution grids. For any point $\mathbf{x}\in\mathbb{R}^{3}$, it obtains at each resolution level a hash encoding $h^{i}(\mathbf{x})\in\mathbb{R}^{d}$ by tri-linear interpolation, where $d$ is the feature dimension and $i$ is the level index. The hash encodings of all levels are concatenated to form the multi-resolution feature $h(\mathbf{x})=\{h^{i}(\mathbf{x})\}_{i=1}^{L}\in\mathbb{R}^{L\times d}$.
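For intuition, a simplified (NumPy, single-point) version of the multi-resolution hash encoding is sketched below; table sizes, level counts, and the random feature tables are placeholders, whereas in Instant-NGP the tables are learned and the lookup runs on the GPU.

```python
import numpy as np

def hash_encode(x, tables, base_res=16, growth=1.5):
    """Concatenate tri-linearly interpolated hash-grid features h^i(x) over all levels.

    x:      point in [0, 1]^3
    tables: array (L, table_size, d) of per-level feature tables (learned in practice)
    """
    n_levels, table_size, d = tables.shape
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    feats = []
    for lvl in range(n_levels):
        res = int(base_res * growth ** lvl)
        pos = np.asarray(x, dtype=float) * res
        cell = np.floor(pos).astype(np.uint64)
        frac = pos - cell
        f = np.zeros(d)
        for corner in range(8):                      # 8 corners of the enclosing voxel
            offs = np.array([(corner >> k) & 1 for k in range(3)], dtype=np.uint64)
            h = int(np.bitwise_xor.reduce((cell + offs) * primes) % np.uint64(table_size))
            w = np.prod(np.where(offs == 1, frac, 1.0 - frac))   # tri-linear weight
            f += w * tables[lvl][h]
        feats.append(f)
    return np.concatenate(feats)                     # h(x), shape (L * d,)

# usage: random tables for illustration only
rng = np.random.default_rng(0)
feature = hash_encode([0.3, 0.5, 0.7], rng.normal(scale=1e-2, size=(8, 2**14, 2)))
```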

Training: The primary goal of our study is to produce high-quality rendering models of instances. We utilise the rendering loss, denoted $\mathcal{L}_{color}$, as it quantifies the discrepancy between the rendered images and the input images. In addition, to improve MLP learning under sparse views, we also apply $\mathcal{L}_{occ}$ from (15). Therefore, given a set of posed images $I_{gt}$ and the predicted renderings from the network $I_{pred}$, the training loss $\mathcal{L}_{MLP}$ is defined as:

$\mathcal{L}_{MLP}=\mathcal{L}_{color}+\mathcal{L}_{occ}$ (19)

where $\mathcal{L}_{color}=(1/N)\cdot\sum_{i=1}^{N}\|I_{gt}^{(i)}-I_{pred}^{(i)}\|_{2}^{2}$ and $N$ is the total number of images in the dataset.

III-D Hierarchy Map Representation

A hierarchical scene representation that combines large-scale mapping with local high-fidelity rendering is introduced, comprising a structure level and an instance level.

I. Structure-level representation: The coarse-level map is designed to extract high-level information, focusing on the structure of the scene and the plants in the environment. In this context, we construct a map of the farm by creating a coloured point cloud map and its corresponding semantic map.

II. Instance-level representation: Each plant $o_{i}$ in the environment is meticulously represented and captured in fine detail through the robot's visual image stream. In the context of phenotyping, each detected plant is associated with a high-fidelity neural rendering model, accompanied by its corresponding 3D model.

IV Experiments

IV-A Pheno-Robot Hardware

The Pheno-Robot is equipped with a 32-line LiDAR, a Realsense D435 camera and a 9-axis IMU (see Fig.7 (a) and (b)) for navigation. Communication between the different modules is via the common message layer on the ROS server. The sensors are connected to the computer via ROS-Noetic on Ubuntu 20.04. A 4K GoPro Hero-11 is mounted on a gimbal stabilizer on the left side, and the image stream from the GoPro is transmitted to the computer through the GoPro-ROS-node. Plant modelling by NeRF training is performed remotely, with data transmission via the 5G wireless network.

Figure 7: (a) and (b): Pheno-Robot systems; (c) and (d): the demonstration of the EPM results in two test scenarios.

IV-B Evaluation on Pheno-Robot system

IV-B1 Evaluation on EPM

This section evaluates the performance of the EPM in typical agricultural environments. Two different settings, shown in Fig. 7 (a) and (b), were selected for the system evaluation, and their corresponding semantic maps are shown in Fig. 7 (c) and (d), respectively. The results show the precise instance-level understanding of the EPM of the point cloud map in agricultural scenarios, with an overall recall and precision for detection of 0.95 and 1.0, respectively. The average position error and bounding box error are measured to be 0.06 m and 0.02 m, respectively. Particularly in agricultural landscapes with row structures, our method demonstrates the ability to detect such features and generate a graph map for robot autonomy. Furthermore, the method is versatile and supports both online and offline modes depending on specific requirements. For this study, we perform the semantic extraction process offline.

IV-B2 Evaluation on MPM

This section evaluates the performance of the MPM for automated phenotyping by robots. We conducted tests in two different environments, shown in Fig. 8 and 9, corresponding to the environments shown in Fig. 7 (a) and (b), respectively.

Figure 8: Planning scene and motion planning results of the MPM in orchard-like environments.
Figure 9: Planning scene and motion planning results of the MPM in greenhouses.

In the experiment, target plants requiring phenotyping were randomly selected. The results show that the presented MPM achieves a high success rate and meets the robot's requirements in both scenarios, as shown in (a) of Figs. 8 and 9. Conventional A$^{*}$ or Dijkstra planners may generate inappropriate global trajectories that lead to turning movements in narrow channels and cause local motion planning failures for car-like robots; in comparison, our graph-based global planner performs better. Beyond the global planner, the developed local trajectory planner also shows strong robustness in field environments, as illustrated in (b) and (c) of Figs. 8 and 9, respectively. A significant factor affecting the performance of the local trajectory planner is the traversability analysis. Given the complex terrain in agricultural environments, traversability estimation may overestimate or underestimate risk areas during planning, leading to motion planning failures. In the experiment, the robot was fitted with an additional solid-state LiDAR at the front to improve its perception under complex conditions. In addition, a replanning mechanism is employed for local motion planning, with an update frequency of 5 Hz and a forward planning distance of 10 m. Overall, our system demonstrates robust performance in the field with appropriate parameter tuning. The maximum speed during sampling is 0.2 m/s, while the maximum speed under other conditions is 1 m/s.

Figure 10: Demonstration of hierarchy maps of two scenarios in orchard-like environments (A) and greenhouse (B). The a) of both A and B are global-level maps of the scenes, and b) are the detailed models of instances in the environments.

IV-B3 Evaluation on in-situ Phenotype

This section assesses the performance of the IPM. We compare the in-situ phenotyping models of both scenarios using handheld data acquisition and robot-automated acquisition. The quality of the results is evaluated by the Peak Signal-to-Noise Ratio (PSNR), as detailed in Table I.

TABLE I: Comparison of modelling quality

                              PSNR (dB)         Time (min)
Dataset       Method          HA      RA        HA      RA
Outdoor       Instant-NGP     24.8    22.5      2.4     3.2
              Ours            25.2    24.2      2.3     2.7
Greenhouse    Instant-NGP     23.4    21.5      3.1     5.2
              Ours            23.3    22.7      2.3     4.1

HA: handheld acquisition; RA: robot-automated acquisition.

In the initial experiments, both vanilla Instant-NGP and our model showed fast convergence, taking less than 2 minutes when using hand-collected samples, resulting in a PSNR above 24 dB and indicating high-quality results. However, when using robot-collected samples, Instant-NGP took over 4 minutes to converge and achieved a PSNR below 23 dB. In contrast, our method achieved a PSNR above 23.5 dB and converged in 3 minutes, demonstrating the effectiveness of occlusion regularisation in NeRF training with sparse-view inputs, which effectively mitigates geometric estimation errors in the renderings. Moving on to the greenhouse experiments, the quality of modelling with both Instant-NGP and our model decreased, with PSNR values of around 23 dB in approximately 3 minutes when using handheld collected samples. This reduction in quality is mainly due to the complex geometry of the canopy leading to occlusion. When modelling with robotically collected samples, both models showed a decrease in model quality along with a longer training convergence time. In comparison, our model performed better under these field conditions. The hierarchy maps for both scenarios are shown in Fig. 10, revealing global structure-aware maps of the farms and detailed models of individual plants. These results highlight the ability of the robotic system to achieve high-quality in-situ phenotyping in complex agricultural environments, demonstrating resilience despite certain limitations.

IV-C Demonstration of Digital-Modelling for Simulation

Our system can streamline the creation of virtual environments for robot development using Isaac Sim, an advanced simulator known for realistic physics, rich sensor simulation, and graphical rendering. The environment model, derived from the terrain and tree models of the hierarchy maps, is seamlessly integrated (see Fig. 11).

Figure 11: Isaac simulation environments of a) robot navigation and b) robot harvesting.

The first scenario simulates a car-like robot equipped with LiDAR operating in orchards (Fig. 11 (a)). The real-world terrain is converted into 2D images and meshes are constructed based on these images. Tree locations are imported directly from EPM predictions and tree models are generated by extracting meshes from the renderings. In the second scenario (Fig. 11 (b) and (c)), fruit models with fragile joints are added to the trees, creating a reinforcement learning environment for robotic harvesting tasks. The harvesting robot, equipped with a UR-5 manipulator, a soft gripper and a camera on the remote robot base, is trained and evaluated in this virtual environment.

V Conclusion

This study investigates the utilisation of robots in plant phenotyping to increase efficiency and reduce labour-intensive tasks. The proposed system consists of three key sub-systems that address environmental information processing, motion planning for data acquisition, and modelling using data collected by the robot. Experimental results demonstrate the effectiveness of the system, particularly in outdoor environments with mild terrains, where the robot can collect high-quality samples leading to superior plant phenotypic models. For undulating terrains, typically challenging for plant phenotyping, our enhancements to the NeRF model also exhibit promise in generating quality plant phenotypic models. Future research will focus on further refining phenotyping quality by exploiting the robotic system, incorporating the robotic arm, and making additional improvements to the NeRF model to minimise artefacts in challenging environments.

References

  • [1] L. Fu, F. Gao, J. Wu, R. Li, M. Karkee, and Q. Zhang, “Application of consumer rgb-d cameras for fruit detection and localization in field: A critical review,” Computers and Electronics in Agriculture, vol. 177, p. 105687, 2020.
  • [2] S. Wu, W. Wen, Y. Wang, J. Fan, C. Wang, W. Gou, and X. Guo, “Mvs-pheno: a portable and low-cost phenotyping platform for maize shoots using multiview stereo 3d reconstruction,” Plant Phenomics, 2020.
  • [3] D. M. Deery and H. G. Jones, “Field phenomics: Will it enable crop improvement?” Plant Phenomics, 2021.
  • [4] Y. Zhang and N. Zhang, “Imaging technologies for plant high-throughput phenotyping: a review,” Frontiers of Agricultural Science and Engineering, vol. 5, no. 4, pp. 406–419, 2018.
  • [5] R. Xu and C. Li, “A modular agricultural robotic system (mars) for precision farming: Concept and implementation,” Journal of Field Robotics, vol. 39, no. 4, pp. 387–409, 2022.
  • [6] B. Dellen, H. Scharr, and C. Torras, “Growth signatures of rosette plants from time-lapse video,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 6, pp. 1470–1478, 2015.
  • [7] L. Feng, S. Chen, C. Zhang, Y. Zhang, and Y. He, “A comprehensive review on recent applications of unmanned aerial vehicle remote sensing with various sensors for high-throughput plant phenotyping,” Computers and electronics in agriculture, vol. 182, p. 106033, 2021.
  • [8] S. Paulus, “Measuring crops in 3d: using geometry for plant phenotyping,” Plant methods, vol. 15, no. 1, pp. 1–13, 2019.
  • [9] N. Virlet, K. Sabermanesh, P. Sadeghi-Tehran, and M. J. Hawkesford, “Field scanalyzer: An automated robotic field phenotyping platform for detailed crop monitoring,” Functional Plant Biology, vol. 44, no. 1, pp. 143–153, 2016.
  • [10] S. Paulus, H. Schumann, H. Kuhlmann, and J. Léon, “High-precision laser scanning system for capturing 3d plant architecture and analysing growth of cereal plants,” Biosystems Engineering, vol. 121, pp. 1–11, 2014.
  • [11] D. Xiong, D. Wang, X. Liu, S. Peng, J. Huang, and Y. Li, “Leaf density explains variation in leaf mass per area in rice between cultivars and nitrogen treatments,” Annals of Botany, vol. 117, no. 6, pp. 963–971, 2016.
  • [12] M. S. A. Mahmud, M. S. Z. Abidin, A. A. Emmanuel, and H. S. Hasan, “Robotics and automation in agriculture: present and future applications,” Applications of Modelling and Simulation, vol. 4, pp. 130–140, 2020.
  • [13] T. Duckett, S. Pearson, S. Blackmore, B. Grieve, W.-H. Chen, G. Cielniak, J. Cleaversmith, J. Dai, S. Davis, C. Fox et al., “Agricultural robotics: the future of robotic agriculture,” arXiv preprint arXiv:1806.06762, 2018.
  • [14] W. Au, H. Zhou, T. Liu, E. Kok, X. Wang, M. Wang, and C. Chen, “The monash apple retrieving system: a review on system intelligence and apple harvesting performance,” Computers and Electronics in Agriculture, vol. 213, p. 108164, 2023.
  • [15] D. Albani, J. IJsselmuiden, R. Haken, and V. Trianni, “Monitoring and mapping with robot swarms for agricultural applications,” in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).   IEEE, 2017, pp. 1–6.
  • [16] M. Jansen, F. Gilmer, B. Biskup, K. A. Nagel, U. Rascher, A. Fischbach, S. Briem, G. Dreissen, S. Tittmann, S. Braun et al., “Simultaneous phenotyping of leaf growth and chlorophyll fluorescence via growscreen fluoro allows detection of stress tolerance in arabidopsis thaliana and other rosette plants,” Functional Plant Biology, vol. 36, no. 11, pp. 902–914, 2009.
  • [17] H. Kang, H. Zhou, X. Wang, and C. Chen, “Real-time fruit recognition and grasping estimation for robotic apple harvesting,” Sensors, vol. 20, no. 19, p. 5670, 2020.
  • [18] H. Kang, X. Wang, and C. Chen, “Accurate fruit localisation using high resolution lidar-camera fusion and instance segmentation,” Computers and Electronics in Agriculture, vol. 203, p. 107450, 2022.
  • [19] Y. Pan, H. Cao, K. Hu, H. Kang, and X. Wang, “A novel perception and semantic mapping method for robot autonomy in orchards,” arXiv e-prints, pp. arXiv–2308, 2023.
  • [20] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “Chomp: Covariant hamiltonian optimization for motion planning,” The International journal of robotics research, vol. 32, no. 9-10, pp. 1164–1193, 2013.
  • [21] C. Rösmann, F. Hoffmann, and T. Bertram, “Kinodynamic trajectory optimization and control for car-like robots,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 5681–5686.
  • [22] W. Zhang, J. Qi, P. Wan, H. Wang, D. Xie, X. Wang, and G. Yan, “An easy-to-use airborne lidar data filtering method based on cloth simulation,” Remote sensing, vol. 8, no. 6, p. 501, 2016.
  • [23] Y. Cai, W. Xu, and F. Zhang, “ikd-tree: An incremental kd tree for robotic applications,” arXiv preprint arXiv:2102.10808, 2021.
  • [24] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.
  • [25] J. Yang, M. Pavone, and Y. Wang, “Freenerf: Improving few-shot neural rendering with free frequency regularization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 8254–8263.
  • [26] B. Lévy, S. Petitjean, N. Ray, and J. Maillot, “Least squares conformal maps for automatic texture atlas generation,” in Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 2023, pp. 193–202.
  • [27] T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Transactions on Graphics (ToG), vol. 41, no. 4, pp. 1–15, 2022.