Kernels for the ICP pipeline.
Classes | |
struct | dist_id |
Struct holding a value and a key. | |
Functions | |
kernel void | getLMs (global float4 *in, global float4 *out) |
Samples a point cloud for landmarks. | |
kernel void | getReps (global float8 *in, global float8 *out) |
Samples a set of landmarks for representatives. | |
kernel void | icpComputeReduceWeights (global dist_id *in, global float *weights, global double *sums, local float *data, uint n) |
Computes a set of weights \( \{w_i = \frac{100}{100+\|x_i-x'_i\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \). | |
kernel void | icpComputeReduceWeights_WG (global dist_id *in, global float *weights, global float *sums, local float *data, uint n) |
Computes a set of weights \( \{w_i = \frac{100}{100+\|x_i-x'_i\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \). | |
kernel void | reduce_sum_fd (global float4 *in, global double *out, local double *data, uint n) |
Performs a reduce operation on the columns of an array. | |
kernel void | icpMean (global float4 *F, global float4 *M, global float4 *mean, local float *data, uint n) |
Performs reduce operations on arrays of 8-D points. | |
kernel void | icpMean_Weighted (global float4 *F, global float4 *M, global float4 *MEAN, global float *W, constant double *sum_w, local float *data, uint n) |
Performs reduce operations on arrays of 8-D points. | |
kernel void | icpGMean (global float4 *in, global float4 *out, local float *data, uint n) |
Performs a reduce operation on an array of 4-D points. | |
kernel void | icpSubtractMean (global float4 *F, global float4 *M, global float4 *DF, global float4 *DM, constant float4 *mean) |
Computes the deviations from the means of the fixed and moving sets of 8-D points. | |
kernel void | icpSijProducts (global float4 *M, global float4 *F, global float *Sij, uint m, float c) |
Produces the products in the Sij elements of the S matrix. | |
kernel void | icpSijProducts_Weighted (global float4 *M, global float4 *F, global float *W, global float *Sij, uint m, float c) |
Produces the weighted products in the Sij elements of the S matrix. | |
kernel void | icpTransform_Quaternion (global float4 *M, global float4 *tM, constant float4 *data) |
Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = (\omega, \mathcal{v}) = q_w + (q_x i + q_y j + q_z k) = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \). | |
kernel void | icpTransform_Quaternion_2 (global float4 *M, global float4 *tM, constant float4 *data) |
Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = q_w + q_x i + q_y j + q_z k = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \). | |
kernel void | icpTransform_Matrix (global float4 *M, global float4 *tM, constant float4 *T) |
Performs a homogeneous transformation on a set of points. | |
void | prod (float4 *N, float4 *x, float4 *x_new) |
Computes a matrix-vector product, \( x_{new}=Nx \). | |
kernel void | icpPowerMethod (global float *Sij, global float4 *means, global float4 *Tk) |
Computes the quantities that represent the incremental development in the transformation estimation in iteration k. | |
kernel void getLMs | ( | global float4 * | in, |
global float4 * | out | ||
) |
Samples a point cloud for landmarks.
Chooses landmarks at specific intervals in the x and y dimension.
The sampling ratio is 1:4 in the x dimension and 1:3 in the y dimension. There is also an offset of 1 in the x dimension and 1 in the y dimension. This creates an array \( (128 \times 128) \) of landmarks.
[in] | in | array (point cloud) of float8 elements. |
[out] | out | array (landmarks) of float8 elements. |
kernel void getReps | ( | global float8 * | in, |
global float8 * | out | ||
) |
Samples a set of landmarks for representatives.
Chooses representatives at specific intervals in the x and y dimension.
[in] | in | array (landmarks) of float8 elements. |
[out] | out | array (representatives) of float8 elements. |
kernel void icpComputeReduceWeights | ( | global dist_id * | in, |
global float * | weights, | ||
global double * | sums, | ||
local float * | data, | ||
uint | n | ||
) |
Computes a set of weights \( \{w_i = \frac{100}{100+\|x_i-x'_i\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \).
Takes distances between pairs of points and produces a set of weights.
The number of elements, n, should be a multiple of 2. The global workspace should be one dimensional, and its x dimension, \( gXdim \), should be greater than or equal to the number of elements in the array, n, divided by 2. That is, \( gXdim \geq n/2 \). Each work-item handles 2 dist_id elements. The local workspace should be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group. The sum is output as a double so that a consistent API is maintained.
[in] | in | array of dist_id elements. |
[out] | weights | array with the computed weights (float elements). |
[out] | sums | (reduced) array with the sum. Its size should be \( sizeof\ (double) \). |
[in] | data | local buffer. Its size should be 2 float elements for each work-item in a work-group. That is \( 2*lXdim*sizeof\ (float) \). |
[in] | n | number of elements in the array. |
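As a host-side cross-check of the weight formula, the following C sketch computes the same weights and their sum sequentially. The function name `ref_weights` and the plain `float` distance array are assumptions made for illustration; the kernel itself reads `dist_id` elements, whose layout is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference (not part of the kernel file):
   w_i = 100 / (100 + d_i), returns the sum of all w_i as a double.
   dist holds the distances between the paired points. */
static double ref_weights(const float *dist, float *weights, size_t n)
{
    assert(n % 2 == 0); /* mirrors the kernel's requirement on n */
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        weights[i] = 100.0f / (100.0f + dist[i]);
        sum += (double)weights[i];
    }
    return sum;
}
```

A distance of 0 gives a weight of 1, and a distance of 100 gives a weight of 0.5, so closer pairs contribute more to the sum.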
kernel void icpComputeReduceWeights_WG | ( | global dist_id * | in, |
global float * | weights, | ||
global float * | sums, | ||
local float * | data, | ||
uint | n | ||
) |
Computes a set of weights \( \{w_i = \frac{100}{100+\|x_i-x'_i\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \).
Takes distances between pairs of points and produces a set of weights.
The number of elements, n, should be a multiple of 2. The global workspace should be one dimensional, and its x dimension, \( gXdim \), should be greater than or equal to the number of elements in the array, n, divided by 2. That is, \( gXdim \geq n/2 \). Each work-item handles 2 dist_id elements. The local workspace should be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group. When a single work-group covers the whole array, use icpComputeReduceWeights instead. With multiple work-groups, each one writes a partial sum, and a further reduction (reduce_sum_fd) is needed on those sums for the final result. The number of work-groups in the x dimension, \( wgXdim \), for the case of multiple work-groups, should be made a multiple of 4. The potential extra work-groups are used for enforcing correctness. They write the necessary identity operands, 0.f, in the output array, since in the next phase the data are going to be handled as float4.
[in] | in | array of dist_id elements. |
[out] | weights | array with the computed weights (float elements). |
[out] | sums | (reduced) array with the sums. Its size should be \( wgXdim*sizeof\ (float) \). |
[in] | data | local buffer. Its size should be 2 float elements for each work-item in a work-group. That is \( 2*lXdim*sizeof\ (float) \). |
[in] | n | number of elements in the array. |
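The sizing rules above can be collected in a small host-side helper. This is only a sketch under the stated constraints: the struct `wg_geometry` and the function `plan_weights_wg` are hypothetical names, and the rounding of the work-group count to a multiple of 4 is applied only when more than one work-group is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the dispatch parameters. */
typedef struct {
    size_t wg_x;        /* number of work-groups, wgXdim */
    size_t global_x;    /* global work size, gXdim */
    size_t local_bytes; /* size of the local buffer `data` */
    size_t sums_bytes;  /* size of the `sums` buffer */
} wg_geometry;

static wg_geometry plan_weights_wg(size_t n, size_t local_x)
{
    assert(n > 0 && n % 2 == 0);
    assert(local_x > 0 && (local_x & (local_x - 1)) == 0); /* power of 2 */
    wg_geometry g;
    size_t items = n / 2;                      /* 2 elements per work-item */
    g.wg_x = (items + local_x - 1) / local_x;  /* round up */
    if (g.wg_x > 1)
        g.wg_x = (g.wg_x + 3) / 4 * 4;         /* pad to a multiple of 4 */
    g.global_x = g.wg_x * local_x;             /* satisfies gXdim >= n/2 */
    g.local_bytes = 2 * local_x * sizeof(float);
    g.sums_bytes = g.wg_x * sizeof(float);
    return g;
}
```

For example, 640 elements with a local size of 64 need 5 work-groups, which the helper pads to 8.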
kernel void icpGMean | ( | global float4 * | in, |
global float4 * | out, | ||
local float * | data, | ||
uint | n | ||
) |
Performs a reduce operation on an array of 4-D points.
Computes the sums \( \bar{x}_j = \sum^{n}_{i}{x_{ij}}, j=\{0,1,2\} \) on the xyz dimensions of the points in the fixed and moving sets.
See also icpMean. The global workspace should be one dimensional, and its x dimension, \( gXdim \), should be greater than or equal to the number of points, n, in the array divided by 2. That is, \( gXdim \geq n/2 \). The local workspace should also be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.
[in] | in | array of float4 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[out] | out | array of mean float4 vectors. When the kernel is dispatched with just one work-group, the array contains one vector with the means on the xyz dimensions, and its size should be \( sizeof\ (float4) \). When the kernel is dispatched with more than one work-group, the array contains the means from each block reduction, and its size should be \( wgXdim*sizeof\ (float4) \). |
[in] | data | local buffer. Its size should be 6 float elements for each work-item in a work-group. That is \( lXdim*(2*(3*sizeof\ (float))) \). |
[in] | n | number of points in the array. |
kernel void icpMean | ( | global float4 * | F, |
global float4 * | M, | ||
global float4 * | mean, | ||
local float * | data, | ||
uint | n | ||
) |
Performs reduce operations on arrays of 8-D points.
Computes the means \( \bar{x}_j = \sum^{n}_{i}{\frac{x_{ij}}{n}}, j=\{0,1,2\} \) on the xyz dimensions of the points in the fixed and moving sets.
The number of points, n, in the arrays should be a multiple of 2 (each work-item loads 2 points). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of points in the arrays divided by 2. That is, \( gXdim \geq n/2 \). The y dimension of the global workspace, \( gYdim \), should be equal to 2. That is, \( gYdim = 2 \). The local workspace should also be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.
[in] | F | fixed set of float8 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[in] | M | moving set of float8 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[out] | mean | array of mean float4 vectors. When the kernel is dispatched with just one work-group per row, the array contains one vector per set with the means on the xyz dimensions. Its size should be \( 2*sizeof\ (float4) \). The first float4 is the mean for the fixed set, and the second float4 is the mean for the moving set. When the kernel is dispatched with more than one work-group, the array contains the means from each block reduction, and its size should be \( 2*(wgXdim*sizeof\ (float4)) \). The first row contains the block means for the fixed set, and the second row contains the block means for the moving set. |
[in] | data | local buffer. Its size should be 6 float elements for each work-item in a work-group. That is \( lXdim*(2*(3*sizeof\ (float))) \). |
[in] | n | number of points in the sets. |
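To validate the kernel output on the host, the mean of one set can be computed sequentially. The sketch below is an illustration only: `ref_mean_xyz` is a hypothetical name, and the points are assumed to be stored as a flat array with 4 floats per point, of which only the first 3 are used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for one set of points.
   pts is a flat array with 4 floats per point (x, y, z, unused). */
static void ref_mean_xyz(const float *pts, size_t n, float mean[4])
{
    assert(n > 0);
    double acc[3] = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; ++i)
        for (int j = 0; j < 3; ++j)
            acc[j] += pts[4 * i + j];
    for (int j = 0; j < 3; ++j)
        mean[j] = (float)(acc[j] / (double)n);
    mean[3] = 0.0f; /* the fourth component is not used in this sketch */
}
```

Calling it once for the fixed set and once for the moving set gives the two vectors that the kernel places in `mean`.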
kernel void icpMean_Weighted | ( | global float4 * | F, |
global float4 * | M, | ||
global float4 * | MEAN, | ||
global float * | W, | ||
constant double * | sum_w, | ||
local float * | data, | ||
uint | n | ||
) |
Performs reduce operations on arrays of 8-D points.
Computes the weighted means \( \bar{x}_j = \frac{\sum^{n}_{i}{w_i*x_{ij}}} {\sum^{n}_{i}{w_i}}, j=\{0,1,2\} \) on the xyz dimensions of the points in the fixed and moving sets.
The number of points, n, in the arrays should be a multiple of 2 (each work-item loads 2 points). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of points in the arrays divided by 2. That is, \( gXdim \geq n/2 \). The y dimension of the global workspace, \( gYdim \), should be equal to 2. That is, \( gYdim = 2 \). The local workspace should also be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.
[in] | F | fixed set of float8 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[in] | M | moving set of float8 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[out] | MEAN | array of mean float4 vectors. When the kernel is dispatched with just one work-group per row, the array contains one vector per set with the means on the xyz dimensions. Its size should be \( 2*sizeof\ (float4) \). The first float4 is the mean for the fixed set, and the second float4 is the mean for the moving set. When the kernel is dispatched with more than one work-group, the array contains the means from each block reduction, and its size should be \( 2*(wgXdim*sizeof\ (float4)) \). The first row contains the block means for the fixed set, and the second row contains the block means for the moving set. |
[in] | W | array with the weights between the pairs of points. |
[in] | sum_w | sum of the weights. |
[in] | data | local buffer. Its size should be 6 float elements for each work-item in a work-group. That is \( lXdim*(2*(3*sizeof\ (float))) \). |
[in] | n | number of points in the sets. |
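The weighted mean formula can be checked on the host with a short sequential loop. In this sketch the name `ref_weighted_mean_xyz` and the flat 4-floats-per-point layout are assumptions; `sum_w` is passed in, as it is for the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference:
   mean_j = (sum_i w_i * x_ij) / sum_w, for j = 0, 1, 2.
   pts is a flat array with 4 floats per point (x, y, z, unused). */
static void ref_weighted_mean_xyz(const float *pts, const float *w,
                                  size_t n, double sum_w, float mean[4])
{
    assert(n > 0 && sum_w > 0.0);
    double acc[3] = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; ++i)
        for (int j = 0; j < 3; ++j)
            acc[j] += (double)w[i] * (double)pts[4 * i + j];
    for (int j = 0; j < 3; ++j)
        mean[j] = (float)(acc[j] / sum_w);
    mean[3] = 0.0f; /* the fourth component is not used in this sketch */
}
```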
kernel void icpPowerMethod | ( | global float * | Sij, |
global float4 * | means, | ||
global float4 * | Tk | ||
) |
Computes the quantities that represent the incremental development in the transformation estimation in iteration k.
Uses the Power Method to estimate the unit quaternion \( q_k \) that represents the rotation, and then computes the scale \( s_k \) and translation \( t_k \).
[in] | Sij | array (sums of products) of size \(11*sizeof\ (float)\). The first 9 elements (in row major order) are the \(S_k\) matrix, and the next 2 are the numerator and denominator of the scale \(s_k\). |
[in] | means | array (fixed and moving set means) of size \(2*sizeof\ (float4)\). |
[out] | Tk | array of size \( 2 * sizeof\ (float4) \). The first float4 is the unit quaternion \( \dot{q_k} = q_w + q_x i + q_y j + q_z k = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), and the second one is the translation vector \( t_k=\left[ \begin{matrix} t_x & t_y & t_z & 1 \end{matrix} \right]^T \). The scale is placed in the last element of the translation vector. That is, \( t_k = \left[ \begin{matrix} t_x & t_y & t_z & s_k \end{matrix} \right]^T \). |
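For readers unfamiliar with the Power Method, here is a generic C sketch of the iteration on a 4x4 matrix. It is not the kernel's code: how the kernel builds its matrix from \(S_k\) is not shown here, the names `matvec4` and `power_method4` are hypothetical, and this variant scales by the largest-magnitude component instead of normalizing to unit length, so a final normalization would still be needed to obtain a unit quaternion.

```c
#include <assert.h>

/* y = N x for a row-major 4x4 matrix (the role a helper like prod plays). */
static void matvec4(const float N[16], const float x[4], float y[4])
{
    for (int r = 0; r < 4; ++r) {
        y[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            y[r] += N[4 * r + c] * x[c];
    }
}

/* Power iteration. On return, x approximates the dominant eigenvector,
   scaled so that its largest-magnitude component is 1. The return value
   approximates the dominant eigenvalue. */
static float power_method4(const float N[16], float x[4], int iters)
{
    assert(iters > 0);
    float lambda = 0.0f;
    for (int c = 0; c < 4; ++c)
        x[c] = 1.0f; /* arbitrary starting vector */
    for (int k = 0; k < iters; ++k) {
        float y[4];
        matvec4(N, x, y);
        /* pick the component with the largest magnitude */
        lambda = y[0];
        for (int c = 1; c < 4; ++c) {
            float a = y[c] < 0.0f ? -y[c] : y[c];
            float b = lambda < 0.0f ? -lambda : lambda;
            if (a > b)
                lambda = y[c];
        }
        if (lambda == 0.0f)
            break; /* N x = 0, the iteration cannot continue */
        for (int c = 0; c < 4; ++c)
            x[c] = y[c] / lambda;
    }
    return lambda;
}
```

The iteration converges to the eigenvalue of largest magnitude, so it only yields the desired eigenvector when that eigenvalue is the one of interest.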
kernel void icpSijProducts | ( | global float4 * | M, |
global float4 * | F, | ||
global float * | Sij, | ||
uint | m, | ||
float | c | ||
) |
Produces the products in the Sij elements of the S matrix.
Multiplies the deviations of corresponding points in the fixed set \( \mathcal{F}_{m \times 3} \) and the moving set \( \mathcal{M}_{m \times 3} \).
The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of points, m, in the sets. That is, \( gXdim \geq m \). There is no requirement for the local workspace.
[in] | M | array (moving set deviations) of float4 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[in] | F | array (fixed set deviations) of float4 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[out] | Sij | array (partial sums of products). Its number of rows is 11. Its number of columns is \( gXdim \). So, its size should be \( 11 * gXdim * sizeof\ (float) \). |
[in] | m | number of points in the sets. |
[in] | c | scaling factor. |
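The nine sums of products that make up the S matrix can be sketched on the host as follows. This is an assumption-laden illustration: the name `ref_sij` is hypothetical, the index order (moving set on the rows, fixed set on the columns) is assumed, and the scaling factor c and the two extra scale terms are left out because their exact definition is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: S[3*a + b] = sum_i dM_i[a] * dF_i[b], row major.
   dM and dF are flat arrays of deviations, 4 floats per point
   (x, y, z, unused). The row/column order is an assumption. */
static void ref_sij(const float *dM, const float *dF, size_t m, double S[9])
{
    assert(m > 0);
    for (int k = 0; k < 9; ++k)
        S[k] = 0.0;
    for (size_t i = 0; i < m; ++i)
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b)
                S[3 * a + b] += (double)dM[4 * i + a] * (double)dF[4 * i + b];
}
```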
kernel void icpSijProducts_Weighted | ( | global float4 * | M, |
global float4 * | F, | ||
global float * | W, | ||
global float * | Sij, | ||
uint | m, | ||
float | c | ||
) |
Produces the weighted products in the Sij elements of the S matrix.
Multiplies the deviations of corresponding points in the fixed set \( \mathcal{F}_{m \times 3} \) and the moving set \( \mathcal{M}_{m \times 3} \).
The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of points, m, in the sets. That is, \( gXdim \geq m \). There is no requirement for the local workspace.
[in] | M | array (moving set deviations) of float4 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[in] | F | array (fixed set deviations) of float4 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[in] | W | array (weights) of float elements. |
[out] | Sij | array (partial sums of products). Its number of rows is 11. Its number of columns is \( gXdim \). So, its size should be \( 11 * gXdim * sizeof\ (float) \). |
[in] | m | number of points in the sets. |
[in] | c | scaling factor. |
kernel void icpSubtractMean | ( | global float4 * | F, |
global float4 * | M, | ||
global float4 * | DF, | ||
global float4 * | DM, | ||
constant float4 * | mean | ||
) |
Computes the deviations from the means of the fixed and moving sets of 8-D points.
Subtracts the means from the 4-D geometric coordinates of the points.
[in] | F | fixed set of float8 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[in] | M | moving set of float8 elements. The first 3 dimensions should contain the xyz coordinates of the points. |
[out] | DF | array of float4 elements (fixed set deviations from the mean). Only the geometric information gets transferred to the output. |
[out] | DM | array of float4 elements (moving set deviations from the mean). Only the geometric information gets transferred to the output. |
[in] | mean | fixed and moving set means. The first float4 is the fixed set mean, and the second float4 is the moving set mean. |
kernel void icpTransform_Matrix | ( | global float4 * | M, |
global float4 * | tM, | ||
constant float4 * | T | ||
) |
Performs a homogeneous transformation on a set of points.
Transforms each point in a set, \( p' = Tp = \left[ \begin{matrix} R' & t \\ 0 & 1 \end{matrix} \right]\left[ \begin{matrix} p \\ 1 \end{matrix} \right] = \left[ \begin{matrix} sR & t \\ 0 & 1 \end{matrix} \right]\left[ \begin{matrix} p \\ 1 \end{matrix} \right] = sRp+t \).
The y dimension of the global workspace, \( gYdim \), should be equal to the number of points, m, in the sets. That is, \( gYdim = m \). There is no requirement for the local workspace.
[in] | M | array of float8 elements. The first 4 dimensions should contain the homogeneous coordinates of the points. |
[out] | tM | array of float8 elements. The first 4 dimensions will contain the transformed homogeneous coordinates of the points. |
[in] | T | the transformation matrix of size \( 16 * sizeof\ (float) \). The elements should be laid out in row major order. |
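The per-point operation \( p' = Tp \) with a row-major matrix can be written on the host as below. The function name `transform_matrix` is hypothetical, and the sketch handles a single homogeneous point rather than a whole set.

```c
#include <assert.h>

/* Hypothetical reference: p' = T p for a row-major 4x4 matrix T
   and a homogeneous point p = [x y z 1]. */
static void transform_matrix(const float T[16], const float p[4], float out[4])
{
    assert(T && p && out);
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            out[r] += T[4 * r + c] * p[c];
    }
}
```

With \( sR \) in the upper-left 3x3 block and \( t \) in the last column, this evaluates \( sRp + t \) for the first three components.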
kernel void icpTransform_Quaternion | ( | global float4 * | M, |
global float4 * | tM, | ||
constant float4 * | data | ||
) |
Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = (\omega, \mathcal{v}) = q_w + (q_x i + q_y j + q_z k) = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \).
Transforms each point in a set, \(\ p' = s\dot{q}\dot{p}\dot{q}^*+t = s(p + 2\mathcal{v} \times (\mathcal{v} \times p + \omega p)) + t \).
The y dimension of the global workspace, \( gYdim \), should be equal to the number of points, m, in the sets. That is, \( gYdim = m \). There is no requirement for the local workspace.
[in] | M | array of float8 elements. The first 4 dimensions should contain the homogeneous coordinates of the points. |
[out] | tM | array of float8 elements. The first 4 dimensions will contain the transformed homogeneous coordinates of the points. |
[in] | data | array of size \( 2 * sizeof\ (float4) \). The first float4 is the quaternion, and the second is the translation vector. If there is a need to apply scaling, the factor should be available in the last element of the translation vector. That is, \( t = \left[ \begin{matrix} t_x & t_y & t_z & s \end{matrix} \right]^T \). |
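The cross-product form \( p' = s(p + 2\mathcal{v} \times (\mathcal{v} \times p + \omega p)) + t \) translates directly into a few lines of C. The sketch below handles one point; the names `cross3` and `transform_quaternion` are hypothetical, and the argument layout follows the description of `data` above (quaternion as \([q_x, q_y, q_z, q_w]\), translation as \([t_x, t_y, t_z, s]\)).

```c
#include <assert.h>

/* c = a x b for 3-D vectors. */
static void cross3(const float a[3], const float b[3], float c[3])
{
    c[0] = a[1] * b[2] - a[2] * b[1];
    c[1] = a[2] * b[0] - a[0] * b[2];
    c[2] = a[0] * b[1] - a[1] * b[0];
}

/* Hypothetical reference for one point.
   q = [qx qy qz qw] is a unit quaternion, t = [tx ty tz s]. */
static void transform_quaternion(const float q[4], const float t[4],
                                 const float p[3], float out[3])
{
    assert(q && t && p && out);
    const float w = q[3];
    float u[3], r[3];
    cross3(q, p, u);                 /* v x p (v is the first 3 of q) */
    for (int i = 0; i < 3; ++i)
        u[i] += w * p[i];            /* v x p + w p */
    cross3(q, u, r);                 /* v x (v x p + w p) */
    for (int i = 0; i < 3; ++i)
        out[i] = t[3] * (p[i] + 2.0f * r[i]) + t[i];
}
```

For instance, a 90 degree rotation about the z axis maps the point (1, 0, 0) to (0, 1, 0) before scaling and translation are applied.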
kernel void icpTransform_Quaternion_2 | ( | global float4 * | M, |
global float4 * | tM, | ||
constant float4 * | data | ||
) |
Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = q_w + q_x i + q_y j + q_z k = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \).
Transforms each point in a set, \(\ p' = s\dot{q}\dot{p}\dot{q}^*+t = s\bar{Q}^TQ\dot{p}+t = s \left[ \begin{matrix} q_w & q_z & -q_y & q_x \\ -q_z & q_w & q_x & q_y \\ q_y & -q_x & q_w & q_z \\ -q_x & -q_y & -q_z & q_w \end{matrix} \right]^T \left[ \begin{matrix} q_w & -q_z & q_y & q_x \\ q_z & q_w & -q_x & q_y \\ -q_y & q_x & q_w & q_z \\ -q_x & -q_y & -q_z & q_w \end{matrix} \right] \left[ \begin{matrix} p_x \\ p_y \\ p_z \\ 0 \end{matrix} \right] + \left[ \begin{matrix} t_x \\ t_y \\ t_z \\ 1 \end{matrix} \right] = s \left[ \begin{matrix} 1-2q_y^2-2q_z^2 & 2(q_xq_y-q_zq_w) & 2(q_xq_z+q_yq_w) & 0 \\ 2(q_xq_y+q_zq_w) & 1-2q_x^2-2q_z^2 & 2(q_yq_z-q_xq_w) & 0 \\ 2(q_xq_z-q_yq_w) & 2(q_yq_z+q_xq_w) & 1-2q_x^2-2q_y^2 & 0 \\ 0 & 0 & 0 & 1 \end{matrix} \right] \left[ \begin{matrix} p_x \\ p_y \\ p_z \\ 0 \end{matrix} \right] + \left[ \begin{matrix} t_x \\ t_y \\ t_z \\ 1 \end{matrix} \right]\).
The y dimension of the global workspace, \( gYdim \), should be equal to the number of points, m, in the sets. That is, \( gYdim = m \). There is no requirement for the local workspace.
[in] | M | array of float8 elements. The first 4 dimensions should contain the homogeneous coordinates of the points. |
[out] | tM | array of float8 elements. The first 4 dimensions will contain the transformed homogeneous coordinates of the points. |
[in] | data | array of size \( 2 * sizeof\ (float4) \). The first float4 is the quaternion, and the second is the translation vector. If there is a need to apply scaling, the factor should be available in the last element of the translation vector. That is, \( t = \left[ \begin{matrix} t_x & t_y & t_z & s \end{matrix} \right]^T \). |
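The 3x3 rotation block of the final matrix in the formula above can be built from the quaternion as in this C sketch. The function name `quat_to_rotation` is hypothetical, and the output is stored in row-major order for illustration.

```c
#include <assert.h>

/* Hypothetical reference: fills R (row-major 3x3) with the rotation
   block of the matrix in the formula, for a unit quaternion
   q = [qx qy qz qw]. */
static void quat_to_rotation(const float q[4], float R[9])
{
    assert(q && R);
    const float x = q[0], y = q[1], z = q[2], w = q[3];
    R[0] = 1.0f - 2.0f * y * y - 2.0f * z * z;
    R[1] = 2.0f * (x * y - z * w);
    R[2] = 2.0f * (x * z + y * w);
    R[3] = 2.0f * (x * y + z * w);
    R[4] = 1.0f - 2.0f * x * x - 2.0f * z * z;
    R[5] = 2.0f * (y * z - x * w);
    R[6] = 2.0f * (x * z - y * w);
    R[7] = 2.0f * (y * z + x * w);
    R[8] = 1.0f - 2.0f * x * x - 2.0f * y * y;
}
```

Multiplying a point by this block, scaling by s, and adding t gives the same result as the cross-product form used by icpTransform_Quaternion.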
inline void prod | ( | float4 * | N, |
float4 * | x, | ||
float4 * | x_new | ||
) |
Computes a matrix-vector product, \( x_{new}=Nx \).
[in] | N | 4x4 matrix (4xfloat4 elements). |
[in] | x | vector (float4 element). |
[out] | x_new | vector (float4 element). |
kernel void reduce_sum_fd | ( | global float4 * | in, |
global double * | out, | ||
local double * | data, | ||
uint | n | ||
) |
Performs a reduce operation on the columns of an array.
Computes the sum of the elements of each row in an array.
The number of elements, N, in a row of the array should be a multiple of 4 (the data are handled as float4). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of elements in a row of the array divided by 8. That is, \( gXdim \geq N/8 \). Each work-item handles 8 float (= 2 float4) elements in a row of the array. The y dimension of the global workspace, \( gYdim \), should be equal to the number of rows, M, in the array. That is, \( gYdim = M \). The local workspace should be 1 in the y dimension, and a power of 2 in the x dimension. It is recommended to use one wavefront/warp per work-group. Extra work-groups, if any, write the identity operand, 0.0, in the output array, since in the next phase the data are going to be handled as double4.
[in] | in | array of float elements. |
[out] | out | (reduced) array of double elements. When the kernel is dispatched with one work-group per row, the array contains the final results, and its size should be \( rows*sizeof\ (double) \). When the kernel is dispatched with more than one work-group per row, the array contains the results from each block reduction, and its size should be \( wgXdim*rows*sizeof\ (double) \). |
[in] | data | local buffer. Its size should be 2 double elements for each work-item in a work-group. That is \( 2*lXdim*sizeof\ (double) \). |
[in] | n | number of elements in a row of the array divided by 4. |
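A sequential host-side version of the row reduction is useful for checking the final result. In this sketch `ref_row_sums` is a hypothetical name, and the input is assumed to be a dense row-major array of floats with `cols` elements per row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference: out[r] = sum of row r, in double.
   in is a dense row-major array with rows * cols float elements. */
static void ref_row_sums(const float *in, size_t rows, size_t cols,
                         double *out)
{
    assert(cols % 4 == 0); /* mirrors the kernel's float4 handling */
    for (size_t r = 0; r < rows; ++r) {
        double sum = 0.0;
        for (size_t c = 0; c < cols; ++c)
            sum += (double)in[r * cols + c];
        out[r] = sum;
    }
}
```

Note that the kernel's own `n` argument is the row length divided by 4, whereas `cols` here counts individual float elements.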