Kernels for the ICP pipeline. More...

Classes
struct	dist_id
	Struct holding a value and a key. More...

Functions
kernel void	getLMs (global float4 in, global float4 out)
	Samples a point cloud for landmarks. More...

kernel void	getReps (global float8 in, global float8 out)
	Samples a set of landmarks for representatives. More...

kernel void	icpComputeReduceWeights (global dist_id in, global float weights, global double sums, local float data, uint n)
	Computes a set of weights \( \{w_i = \frac{100}{100+\\|x_i-x'_i\\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \). More...

kernel void	icpComputeReduceWeights_WG (global dist_id in, global float weights, global float sums, local float data, uint n)
	Computes a set of weights \( \{w_i = \frac{100}{100+\\|x_i-x'_i\\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \). More...

kernel void	reduce_sum_fd (global float4 in, global double out, local double *data, uint n)
	Performs a reduce operation on the columns of an array. More...

kernel void	icpMean (global float4 F, global float4 M, global float4 mean, local float data, uint n)
	Performs reduce operations on arrays of 8-D points. More...

kernel void	icpMean_Weighted (global float4 F, global float4 M, global float4 MEAN, global float W, constant double sum_w, local float data, uint n)
	Performs reduce operations on arrays of 8-D points. More...

kernel void	icpGMean (global float4 in, global float4 out, local float *data, uint n)
	Performs a reduce operation on an array of 4-D points. More...

kernel void	icpSubtractMean (global float4 F, global float4 M, global float4 DF, global float4 DM, constant float4 *mean)
	Computes the deviations from the means of the fixed and moving sets of 8-D points. More...

kernel void	icpSijProducts (global float4 M, global float4 F, global float *Sij, uint m, float c)
	Produces the products in the Sij elements of the S matrix. More...

kernel void	icpSijProducts_Weighted (global float4 M, global float4 F, global float W, global float Sij, uint m, float c)
	Produces the weighted products in the Sij elements of the S matrix. More...

kernel void	icpTransform_Quaternion (global float4 M, global float4 tM, constant float4 *data)
	Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = (\omega, \mathcal{v}) = q_w + (q_x i + q_y j + q_z k) = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \). More...

kernel void	icpTransform_Quaternion_2 (global float4 M, global float4 tM, constant float4 *data)
	Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = q_w + q_x i + q_y j + q_z k = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \). More...

kernel void	icpTransform_Matrix (global float4 M, global float4 tM, constant float4 *T)
	Performs a homogeneous transformation on a set of points. More...

void	prod (float4 N, float4 x, float4 *x_new)
	Computes a matrix-vector product, \( x_{new}=Nx \). More...

kernel void	icpPowerMethod (global float Sij, global float4 means, global float4 *Tk)
	Computes the quantities that represent the incremental development in the transformation estimation in iteration `k`. More...

Detailed Description

Kernels for the ICP pipeline.

Author: Nick Lamprianidis

Version: 1.0

Date: 2015

Copyright: The MIT License (MIT)

: Copyright (c) 2015 Nick Lamprianidis

: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Function Documentation

kernel void getLMs	(	global float4 *	in,
		global float4 *	out
	)

Samples a point cloud for landmarks.

Chooses landmarks at specific intervals in the x and y dimension.

Note: 16384 landmarks are extracted from the point cloud \( (640 \times 480) \). These landmarks come from a center area. 10% of the points around the cloud are ignored.; From the center area \( (512 \times 384) \), points are sampled 1:4 in the x dimension and 1:3 in the y dimension. There is also an offset 1 in the x dimension and 1 in the y dimension. This creates an array \( (128 \times 128) \) of landmarks.; Invalid points (zero coordinates) are going to be picked. Further processing is needed for those points to be discraded, if necessary.; The x dimension of the global workspace, \( gXdim \), should be equal to two times the number of landmarks per row (2 work-items per landmark). That is, \( \ gXdim=256 \). The y dimension of the global workspace, \( gYdim \), should be equal to the number of landmarks per column. That is, \( \ gYdim=128 \). There is no requirement for the local workspace.

Parameters

[in]	in	array (point cloud) of `float8` elements.
[out]	out	array (landmarks) of `float8` elements.

kernel void getReps	(	global float8 *	in,
		global float8 *	out
	)

Samples a set of landmarks for representatives.

Chooses representatives at specific intervals in the x and y dimension.

Note: Representatives are extracted from the set of landmarks \( (128 \times 128) \).; Representatives are sampled \( 1:(128/n_{rx}) \) in the x dimension and \( 1:(128/n_{ry}) \) in the y dimension. There is also an offset \( (128/n_{rx})/2-1 \) in the x dimension and \( (128/n_{ry})/2-1 \) in the y dimension. This creates an array \( (n_{rx} \times n_{ry}) \) of representatives.; The x dimension of the global workspace, \( gXdim \), should be equal to the number of representatives per row. That is, \( \ gXdim=n_{rx }\). The y dimension of the global workspace, \( gYdim \), should be equal to the number of representatives per column. That is, \( \ gYdim=n_{ry} \). There is no requirement for the local workspace.

Parameters

[in]	in	array (landmarks) of `float8` elements.
[out]	out	array (representatives) of `float8` elements.

kernel void icpComputeReduceWeights	(	global dist_id *	in,
		global float *	weights,
		global double *	sums,
		local float *	data,
		uint	n
	)

Computes a set of weights \( \{w_i = \frac{100}{100+\|x_i-x'_i\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \).

Takes distances between pairs of points and produces a set of weights.

Note: The number of elements in the input array, n, should be a multiple of 2. The global workspace should be one dimensional, and its x dimension, \( gXdim \), should be greater than or equal to the number of elements in the array, n, divided by 2. That is, \( \ gXdim \geq n/2 \). Each work-item handles 2 dist_id elements. The local workspace should be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.; The kernel is aimed to be used when the number of elements in the array is small enough to be handled by a single work-group. It promotes the output data type to double so that a consistent API is maintained.

Parameters

[in]	in	array of `dist_id` elements.
[out]	weights	array with the computed weights (`float` elements).
[out]	sums	(reduced) array with the sum. Its size should be \( sizeof\ (double) \).
[in]	data	local buffer. Its size should be `2 float` elements for each work-item in a work-group. That is \( 2lXdimsizeof\ (float) \).
[in]	n	number of elements in the array.

kernel void icpComputeReduceWeights_WG	(	global dist_id *	in,
		global float *	weights,
		global float *	sums,
		local float *	data,
		uint	n
	)

Computes a set of weights \( \{w_i = \frac{100}{100+\|x_i-x'_i\|_p}\} \), and reduces them to get their sum, \( \sum^n_i{w_i} \).

Takes distances between pairs of points and produces a set of weights.

Note: The number of elements in the input array, n, should be a multiple of 2. The global workspace should be one dimensional, and its x dimension, \( gXdim \), should be greater than or equal to the number of elements in the array, n, divided by 2. That is, \( \ gXdim \geq n/2 \). Each work-item handles 2 dist_id elements. The local workspace should be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.; When the number of elements is small enough to be handled by a single work-group, use icpComputeReduceWeights instead.; When the elements are more than what a single work-group can handle, they are partitioned into blocks and reduced independently. In this case, the kernel outputs the sum from each block reduction. A reduction should then be made (by reduce_sum_fd) on those sums for the final result. The number of work-groups in the x dimension, \( wgXdim \), for the case of multiple work-groups, should be made a multiple of 4. The potential extra work-groups are used for enforcing correctness. They write the necessary identity operands, 0.f, in the output array, since in the next phase the data are going to be handled as float4.

Parameters

[in]	in	array of `dist_id` elements.
[out]	weights	array with the computed weights (`float` elements).
[out]	sums	(reduced) array with the sums. Its size should be \( wgXdim*sizeof\ (float) \).
[in]	data	local buffer. Its size should be `2 float` elements for each work-item in a work-group. That is \( 2lXdimsizeof\ (float) \).
[in]	n	number of elements in the array.

kernel void icpGMean	(	global float4 *	in,
		global float4 *	out,
		local float *	data,
		uint	n
	)

Performs a reduce operation on an array of 4-D points.

Computes the sums \( \bar{x}_j = \sum^{n}_{i}{x_{ij}}, j=\{0,1,2\} \) on the xyz dimensions of the points in the fixed and moving sets.

Note: This kernel is supposed to be used for the reduction of the block means from icpMean.; The global workspace should be one dimensional and its x dimension, \( gXdim \), should be greater or equal to the number of points, n, in the array divided by 2. That is, \( \ gXdim \geq n/2 \). The local workspace should also be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.; When the number of points in the array is small enough to be handled by a single work-group, the output array will contain the true means. When the points are more than that, they are partitioned into blocks and reduced independently. In this case, the kernel outputs the means from each block reduction. A reduction should then be made on those means for the final result.

Parameters

[in]	in	array of `float4` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[out]	out	array of mean `float4` vectors. When the kernel is dispatched with just one work-group, the array contains one vector with the means on the xyz dimensions, and its size should be \( sizeof\ (float4) \). When the kernel is dispatched with more than one work-group, the array contains the means from each block reduction, and its size should be \( wgXdim*sizeof\ (float4) \).
[in]	data	local buffer. Its size should be `6 float` elements for each work-item in a work-group. That is \( lXdim(2(3*sizeof\ (float))) \).
[in]	n	number of points in the array.

kernel void icpMean	(	global float4 *	F,
		global float4 *	M,
		global float4 *	mean,
		local float *	data,
		uint	n
	)

Performs reduce operations on arrays of 8-D points.

Computes the means \( \bar{x}_j = \sum^{n}_{i}{\frac{x_{ij}}{n}}, j=\{0,1,2\} \) on the xyz dimensions of the points in the fixed and moving sets.

Note: The kernel normalizes the data, and then performs the reduction.; The number of elements, n, in the arrays should be a multiple of 2 (each work-item loads 2 points). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of points in the arrays divided by 2. That is, \( \ gXdim \geq n/2 \). The y dimension of the global workspace, \( gYdim \), should be equal to 2. That is, \( \ gYdim = 2 \). The local workspace should also be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.; When the number of points in the arrays is small enough to be handled by a single work-group, the output arrays will contain the true means. When the points are more than that, they are partitioned into blocks and reduced independently. In this case, the kernel outputs the means from each block reduction. A reduction should then be made on those means for the final result.

Parameters

[in]	F	fixed set of `float8` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[in]	M	moving set of `float8` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[out]	mean	array of mean `float4` vectors. When the kernel is dispatched with just one work-group per row, the array contains one vector per set with the means on the xyz dimensions. Its size should be \( 2sizeof\ (float4) \). The first `float4` is the mean for the fixed set, and the second `float4` is the mean for the moving set. When the kernel is dispatched with more than one work-group, the array contains the means from each block reduction, and its size should be \( 2(wgXdim*sizeof\ (float4)) \). The first row contains the block means for the fixed set, and the second row contains the block means for the moving set.
[in]	data	local buffer. Its size should be `6 float` elements for each work-item in a work-group. That is \( lXdim(2(3*sizeof\ (float))) \).
[in]	n	number of points in the sets.

kernel void icpMean_Weighted	(	global float4 *	F,
		global float4 *	M,
		global float4 *	MEAN,
		global float *	W,
		constant double *	sum_w,
		local float *	data,
		uint	n
	)

Performs reduce operations on arrays of 8-D points.

Computes the weighted means \( \bar{x}_j = \frac{\sum^{n}_{i}{w_i*x_{ij}}} {\sum^{n}_{i}{w_i}}, j=\{0,1,2\} \) on the xyz dimensions of the points in the fixed and moving sets.

Note: The kernel normalizes the data, and then performs the reduction.; The number of elements, n, in the arrays should be a multiple of 2 (each work-item loads 2 points). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of points in the arrays divided by 2. That is, \( \ gXdim \geq n/2 \). The y dimension of the global workspace, \( gYdim \), should be equal to 2. That is, \( \ gYdim = 2 \). The local workspace should also be one dimensional, and its x dimension should be a power of 2. It is recommended to use one wavefront/warp per work-group.; When the number of points in the arrays is small enough to be handled by a single work-group, the output arrays will contain the true means. When the points are more than that, they are partitioned into blocks and reduced independently. In this case, the kernel outputs the means from each block reduction. A reduction should then be made on those means for the final result.

Parameters

[in]	F	fixed set of `float8` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[in]	M	moving set of `float8` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[out]	MEAN	array of mean `float4` vectors. When the kernel is dispatched with just one work-group per row, the array contains one vector per set with the means on the xyz dimensions. Its size should be \( 2sizeof\ (float4) \). The first `float4` is the mean for the fixed set, and the second `float4` is the mean for the moving set. When the kernel is dispatched with more than one work-group, the array contains the means from each block reduction, and its size should be \( 2(wgXdim*sizeof\ (float4)) \). The first row contains the block means for the fixed set, and the second row contains the block means for the moving set.
[in]	W	array with the weights between the pairs of points.
[in]	sum_w	sum of the weights.
[in]	data	local buffer. Its size should be `6 float` elements for each work-item in a work-group. That is \( lXdim(2(3*sizeof\ (float))) \).
[in]	n	number of points in the sets.

kernel void icpPowerMethod	(	global float *	Sij,
		global float4 *	means,
		global float4 *	Tk
	)

Computes the quantities that represent the incremental development in the transformation estimation in iteration k.

Uses the Power Method to estimate the unit quaternion \( q_k \) that represents the rotation, and then computes the scale \( s_k \) and translation \( t_k \).

Note: The Power Method is performed on \(N\) and computes the eigenvector corresponding to the maximum magnitude eigenvalue \( \mu \). The eigenvector of interest is the one corresponding to the most positive eigenvalue \(\lambda\). If \( \mu<0 \), the algorithm is executed again on \( N'=N + |\lambda| I \). Then, the eigenvalue \( \lambda \) is \( \mu' - \mu \). The corresponding eigenvector doesn't change.; The kernel should be dispatched as a task (1 work-item).

Parameters

[in]	Sij	array (sums of products) of size \(11*sizeof\ (float)\). The first `9` elements (in row major order) are the \(S_k\) matrix, and the next `2` are the numerator and denominator of the scale \(s_k\).
[in]	means	array (fixed and moving set means) of size \(2*sizeof\ (float4)\).
[out]	Tk	array of size \( 2 * sizeof\ (float4) \). The first `float4` is the unit quaternion \( \dot{q_k} = q_w + q_x i + q_y j + q_z k = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), and the second one is the translation vector \( t_k=\left[ \begin{matrix} t_x & t_y & t_z & 1 \end{matrix} \right]^T \). The scale is placed in the last element of the translation vector. That is, \( t_k = \left[ \begin{matrix} t_x & t_y & t_z & s_k \end{matrix} \right]^T \).

kernel void icpSijProducts	(	global float4 *	M,
		global float4 *	F,
		global float *	Sij,
		uint	m,
		float	c
	)

Produces the products in the Sij elements of the S matrix.

Multiplies the deviations of corresponding points in the fixed set \( \mathcal{F}_{m \times 3} \) and the moving set \( \mathcal{M}_{m \times 3} \).

Note: Since large sums are built, there is an option to scale the points in order to deal with floating point arithmetic issues. The resulting S matrix will have accordingly scaled eigenvalues. The eigenvectors stay the same. The constituents of the scaling factor s are scaled as well, but the factor itself is not affected.; Each work-item processes up to 4 pairs of points. So, the output of each work-item is partial sums of products. On a next step, these results will have to be reduced to get the final S matrix.; The global workspace should be one dimensional and its x dimension, \( gXdim \), should be greater or equal to the number of points, m, in the sets. That is, \( \ gXdim \geq m \). There is no requirement for the local workspace.

Parameters

[in]	M	array (moving set deviations) of `float4` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[in]	F	array (fixed set deviations) of `float4` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[out]	Sij	array (partial sums of products). Its number of rows is `11`. Its number of columns is \( gXdim \). So, its size should be \( 11 * gXdim * sizeof\ (float) \).
[in]	m	number of points in the sets.
[in]	c	scaling factor.

kernel void icpSijProducts_Weighted	(	global float4 *	M,
		global float4 *	F,
		global float *	W,
		global float *	Sij,
		uint	m,
		float	c
	)

Produces the weighted products in the Sij elements of the S matrix.

Multiplies the deviations of corresponding points in the fixed set \( \mathcal{F}_{m \times 3} \) and the moving set \( \mathcal{M}_{m \times 3} \).

Note: Since large sums are built, there is an option to scale the points in order to deal with floating point arithmetic issues. The resulting S matrix will have accordingly scaled eigenvalues. The eigenvectors stay the same. The constituents of the scaling factor s are scaled as well, but the factor itself is not affected.; Each work-item processes up to 4 pairs of points. So, the output of each work-item is partial sums of products. On a next step, these results will have to be reduced to get the final S matrix.; The global workspace should be one dimensional and its x dimension, \( gXdim \), should be greater or equal to the number of points, m, in the sets. That is, \( \ gXdim \geq m \). There is no requirement for the local workspace.

Parameters

[in]	M	array (moving set deviations) of `float4` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[in]	F	array (fixed set deviations) of `float4` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[in]	W	array (weights) of `float` elements.
[out]	Sij	array (partial sums of products). Its number of rows is `11`. Its number of columns is \( gXdim \). So, its size should be \( 11 * gXdim * sizeof\ (float) \).
[in]	m	number of points in the sets.
[in]	c	scaling factor.

kernel void icpSubtractMean	(	global float4 *	F,
		global float4 *	M,
		global float4 *	DF,
		global float4 *	DM,
		constant float4 *	mean
	)

Computes the deviations from the means of the fixed and moving sets of 8-D points.

Subtracts the means from the 4-D geometric coordinates of the points.

Note: The x dimension of the global workspace, \( gXdim \), should be equal to the number of points in the sets. That is, \( \ gXdim=n \). The y dimension of the global workspace should be equal to 2. That is, \( \ gYdim=2 \). There is no requirement for the local workspace.

Parameters

[in]	F	fixed set of `float8` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[in]	M	moving set of `float8` elements. The first 3 dimensions should contain the xyz coordinates of the points.
[out]	DF	array of `float4` elements (fixed set deviations from the mean). Only the geometric information gets transfered in the output.
[out]	DM	array of `float4` elements (moving set deviations from the mean). Only the geometric information gets transfered in the output.
[in]	mean	fixed and moving set means. The first `float4` is the fixed set mean, amd the second `float` is the moving set mean.

kernel void icpTransform_Matrix	(	global float4 *	M,
		global float4 *	tM,
		constant float4 *	T
	)

Performs a homogeneous transformation on a set of points.

Transforms each point in a set, \( p' = Tp = \left[ \begin{matrix} R' & t \\ 0 & 1 \end{matrix} \right]\left[ \begin{matrix} p \\ 1 \end{matrix} \right] = \left[ \begin{matrix} sR & t \\ 0 & 1 \end{matrix} \right]\left[ \begin{matrix} p \\ 1 \end{matrix} \right] = sRp+t \).

Note: It there is a need for scaling, the scaling factor should already be incorporated in the rotation matrix.; The x dimension of the global workspace, \( gXdim \), should be equal to 2. That is, \( \ gXdim = 2 \). The y dimension of the global workspace should be equal to the number of points, m, in the sets. That is, \( \ gYdim = m \). There is no requirement for the local workspace.

Parameters

[in]	M	array of `float8` elements. The first 4 dimensions should contain the homogeneous coordinates of the points.
[out]	tM	array of `float8` elements. The first 4 dimensions will contain the transformed homogeneous coordinates of the points.
[in]	T	the transformation matrix of size \( 16 * sizeof\ (float) \). The elements should be laid out in row major order.

kernel void icpTransform_Quaternion	(	global float4 *	M,
		global float4 *	tM,
		constant float4 *	data
	)

Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = (\omega, \mathcal{v}) = q_w + (q_x i + q_y j + q_z k) = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \).

Transforms each point in a set, \(\ p' = s\dot{q}\dot{p}\dot{q}^*+t = s(p + 2\mathcal{v} \times (\mathcal{v} \times p + \omega p)) + t \).

Note: The x dimension of the global workspace, \( gXdim \), should be equal to 2. That is, \( \ gXdim = 2 \). The y dimension of the global workspace should be equal to the number of points, m, in the sets. That is, \( \ gYdim = m \). There is no requirement for the local workspace.

Parameters

[in]	M	array of `float8` elements. The first 4 dimensions should contain the homogeneous coordinates of the points.
[out]	tM	array of `float8` elements. The first 4 dimensions will contain the transformed homogeneous coordinates of the points.
[in]	data	array of size \( 2 * sizeof\ (float4) \). The first `float4` is the quaternion, and the second is the translation vector. If there is a need to apply scaling, the factor should be available in the last element of the translation vector. That is, \( t = \left[ \begin{matrix} t_x & t_y & t_z & s \end{matrix} \right]^T \).

kernel void icpTransform_Quaternion_2	(	global float4 *	M,
		global float4 *	tM,
		constant float4 *	data
	)

Performs a homogeneous transformation on a set of points, \( p = \left[ \begin{matrix} p_x & p_y & p_z & 1 \end{matrix} \right]^T \) (or as a quaternion, \( \dot{p} = \left[ \begin{matrix} p_x & p_y & p_z & 0 \end{matrix} \right]^T \)), using unit quaternions \( \dot{q} = q_w + q_x i + q_y j + q_z k = \left[ \begin{matrix} q_x & q_y & q_z & q_w \end{matrix} \right]^T \), where \( \dot{q}\cdot\dot{q}=1 \).

Transforms each point in a set, \(\ p' = s\dot{q}\dot{p}\dot{q}^*+t = s\bar{Q}^TQ\dot{p}+t = s \left[ \begin{matrix} q_w & q_z & -q_y & q_x \\ -q_z & q_w & q_x & q_y \\ q_y & -q_x & q_w & q_z \\ -q_x & -q_y & -q_z & q_w \end{matrix} \right]^T \left[ \begin{matrix} q_w & -q_z & q_y & q_x \\ q_z & q_w & -q_x & q_y \\ -q_y & q_x & q_w & q_z \\ -q_x & -q_y & -q_z & q_w \end{matrix} \right] \left[ \begin{matrix} p_x \\ p_y \\ p_z \\ 0 \end{matrix} \right] + \left[ \begin{matrix} t_x \\ t_y \\ t_z \\ 1 \end{matrix} \right] = s \left[ \begin{matrix} 1-2q_y^2-2q_z^2 & 2(q_xq_y-q_zq_w) & 2(q_xq_z+q_yq_w) & 0 \\ 2(q_xq_y+q_zq_w) & 1-2q_x^2-2q_z^2 & 2(q_yq_z-q_xq_w) & 0 \\ 2(q_xq_z-q_yq_w) & 2(q_yq_z+q_xq_w) & 1-2q_x^2-2q_y^2 & 0 \\ 0 & 0 & 0 & 1 \end{matrix} \right] \left[ \begin{matrix} p_x \\ p_y \\ p_z \\ 0 \end{matrix} \right] + \left[ \begin{matrix} t_x \\ t_y \\ t_z \\ 1 \end{matrix} \right]\).

Note: The x dimension of the global workspace, \( gXdim \), should be equal to 2. That is, \( \ gXdim = 2 \). The y dimension of the global workspace should be equal to the number of points, m, in the sets. That is, \( \ gYdim = m \). There is no requirement for the local workspace.

Parameters

[in]	M	array of `float8` elements. The first 4 dimensions should contain the homogeneous coordinates of the points.
[out]	tM	array of `float8` elements. The first 4 dimensions will contain the transformed homogeneous coordinates of the points.
[in]	data	array of size \( 2 * sizeof\ (float4) \). The first `float4` is the quaternion, and the second is the translation vector. If there is a need to apply scaling, the factor should be available in the last element of the translation vector. That is, \( t = \left[ \begin{matrix} t_x & t_y & t_z & s \end{matrix} \right]^T \).

void prod	(	float4 *	N,
		float4 *	x,
		float4 *	x_new
	)

inline

Computes a matrix-vector product, \( x_{new}=Nx \).

Parameters

[in]	N	`4x4` matrix (`4xfloat4` elements).
[in]	x	vector (`float4` element).
[out]	x_new	vector (`float4` element).

kernel void reduce_sum_fd	(	global float4 *	in,
		global double *	out,
		local double *	data,
		uint	n
	)

Performs a reduce operation on the columns of an array.

Computes the sum of the elements of each row in an array.

Note: When there are multiple rows in the array, a reduce operation is performed per row, in parallel.; The number of elements, N, in a row of the array should be a multiple of 4 (the data are handled as float4). The x dimension of the global workspace, \( gXdim \), should be greater or equal to the number of elements in a row of the array divided by 8. That is, \( \ gXdim \geq N/8 \). Each work-item handles 8 float (= 2 float4) elements in a row of the array. The y dimension of the global workspace, \( gYdim \), should be equal to the number of rows, M, in the array. That is, \( \ gYdim = M \). The local workspace should be 1 in the y dimension, and a power of 2 in the x dimension. It is recommended to use one wavefront/warp per work-group.; When the number of elements per row of the array is small enough to be handled by a single work-group, the output array will contain the true sums. When the elements are more than that, they are partitioned into blocks and reduced independently. In this case, the kernel outputs the sum from each block reduction. A reduction should then be made on those sums for the final results. The number of work-groups in the x dimension, \( wgXdim \), for the case of multiple work-groups, should be made a multiple of 4. The potential extra work-groups are used for enforcing correctness. They write the necessary identity operands, 0.0, in the output array, since in the next phase the data are going to be handled as double4.

Parameters

[in]	in	array of `float` elements.
[out]	out	(reduced) array of `double` elements. When the kernel is dispatched with one work-group per row, the array contains the final results, and its size should be \( rowssizeof\ (double) \). When the kernel is dispatched with more than one work-groups per row, the array contains the results from each block reduction, and its size should be \( wgXdimrows*sizeof\ (double) \).
[in]	data	local buffer. Its size should be `2 double` elements for each work-item in a work-group. That is \( 2lXdimsizeof\ (double) \).
[in]	n	number of elements in a row of the array divided by 4.

Classes

Functions

Detailed Description

Function Documentation