SIFT

group NVCV_C_ALGORITHM_SIFT

Functions

NVCVStatus cvcudaSIFTCreate(NVCVOperatorHandle *handle, int3 maxShape, int maxOctaveLayers)

Constructs an instance of the SIFT operator.

Parameters:

handle – [out] Where the operator instance handle will be written to.
- Must not be NULL.
maxShape – [in] Maximum shape of the input tensor images as WxHxN, stored in the int3 as x=W, y=H and z=N, where W is the width, H is the height and N is the number of samples (images) in the tensor.
- N (z coordinate) must be >= 1 and <= 65535.
- W and H (x and y) must be >= 2, and they must account for any expansion of the input at execution time (see NVCVSIFTFlagType).
maxOctaveLayers – [in] Maximum number of layers per octave to be used by the operator. An octave is a level in the Gaussian pyramid containing input scale-space layers in the SIFT algorithm.
- It must be >= 1 and <= 16.

Return values:

NVCV_ERROR_INVALID_ARGUMENT – Handle is null.
NVCV_ERROR_INVALID_ARGUMENT – An argument is outside valid range.
NVCV_ERROR_OUT_OF_MEMORY – Not enough memory to create the operator.
NVCV_SUCCESS – Operation executed successfully.
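
The argument ranges above can be checked on the caller side before creating the operator. The following is a minimal sketch, not part of the CV-CUDA API: Shape3 is a hypothetical stand-in for CUDA's int3 so that the code compiles without CUDA headers, and both helper functions are illustrative names. It also assumes that "taking expansion into account" means maxShape must cover twice the input width and height, based on the factor-of-2 expansion described for the flags parameter of cvcudaSIFTSubmit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CUDA's int3 (x=W, y=H, z=N). */
typedef struct { int x, y, z; } Shape3;

/* Builds a maxShape for a batch of n images of size w x h.
   Assumption: expanded input doubles W and H, so maxShape must cover 2W x 2H. */
static Shape3 sift_max_shape(int w, int h, int n, bool expandInput)
{
    Shape3 s;
    s.x = expandInput ? 2 * w : w;
    s.y = expandInput ? 2 * h : h;
    s.z = n;
    return s;
}

/* Returns true when the arguments fall within the ranges documented
   for cvcudaSIFTCreate. */
static bool sift_create_args_valid(Shape3 maxShape, int maxOctaveLayers)
{
    return maxShape.x >= 2 && maxShape.y >= 2
        && maxShape.z >= 1 && maxShape.z <= 65535
        && maxOctaveLayers >= 1 && maxOctaveLayers <= 16;
}
```

For example, a batch of four 640x480 images processed with expansion would give a maxShape of 1280x960x4 under this assumption.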

NVCVStatus cvcudaSIFTSubmit(NVCVOperatorHandle handle, cudaStream_t stream, NVCVTensorHandle in, NVCVTensorHandle featCoords, NVCVTensorHandle featMetadata, NVCVTensorHandle featDescriptors, NVCVTensorHandle numFeatures, int numOctaveLayers, float contrastThreshold, float edgeThreshold, float initSigma, NVCVSIFTFlagType flags)

Executes the SIFT operation on the given CUDA stream. This operation does not wait for completion.

Limitations:

Input:
- Data Layout: [HWC, NHWC]
- Channels: [1]

Data Type	Allowed
8bit Unsigned	Yes
8bit Signed	No
16bit Unsigned	No
16bit Signed	No
32bit Unsigned	No
32bit Signed	No
32bit Float	No
64bit Float	No

Note

The SIFT operation does not guarantee deterministic output. Each output tensor caps the number of features that can be stored, so the total number of features found may exceed this cap, and the order of the returned features may differ between runs. Although the order of features within each image of the input tensor is arbitrary, a given feature occupies the same position in every output tensor.
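
Because the reported number of features can exceed the capacity of the output tensors, a caller reading the results has to clamp the count before iterating. The following is a minimal sketch with a hypothetical helper name; it assumes the per-sample value from numFeatures has already been copied to host memory after the stream has completed.

```c
#include <stdint.h>

/* Number of feature slots that actually hold data for one sample.
   found:    value read from numFeatures for that sample (may exceed capacity).
   capacity: M, the second dimension of the output tensors. */
static int32_t sift_valid_feature_count(int32_t found, int32_t capacity)
{
    if (found < 0) {
        return 0; /* defensive: a negative count is treated as no features */
    }
    return found < capacity ? found : capacity;
}
```

Only the first sift_valid_feature_count(found, M) entries of each sample in featCoords, featMetadata and featDescriptors should then be read.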

Parameters:

handle – [in] Handle to the operator.
- Must not be NULL.
stream – [in] Handle to a valid CUDA stream.
in – [in] Input tensor. The expected layout is [HWC] or [NHWC], where N is the number of samples, i.e. images with height H and width W and channels C, inside the tensor. This operator extracts features and computes descriptors of each input image in the in tensor.
- See the limitations table above for the supported input tensor data layout, number of channels and data type.
featCoords – [out] Output tensor with feature coordinates. The expected layout is [NM] or [NMC], meaning a rank-2 or rank-3 tensor with the number of samples N as first dimension, the maximum number of features to be found M as second dimension, and an optional last dimension C with the number of feature coordinates.
- It must have the same number of samples N as input tensor.
- It must have a number of elements M per sample N equal to the maximum allowed number of features that can be extracted by the SIFT algorithm per sample image. This number M must be the same for all output tensors. The actual number of features extracted is stored in numFeatures (see below).
- It must have F32 data type with C=4, or 4F32 data type with C=1 (or C not present in the tensor), to store four coordinates per extracted feature: (x, y) along the sample image width W and height H, respectively; (z), the octave (i.e. the level of the SIFT Gaussian pyramid); and (w), the layer in the octave (i.e. the scale-space layer in the pyramid level).
featMetadata – [out] Output tensor with feature metadata: orientation angle, score response and size. The expected layout is [NM] or [NMC], meaning a rank-2 or rank-3 tensor with the number of samples N as first dimension, the maximum number of features to be found M as second dimension, and an optional last dimension C with the number of metadata values per feature.
- It must have the same number of samples N as input tensor.
- It must have a number of elements M per sample N equal to the maximum allowed number of features that can be extracted by the SIFT algorithm per sample image. This number M must be the same for all output tensors. The actual number of features extracted is stored in numFeatures (see below).
- It must have F32 data type with C=3 or 3F32 data type with C=1 (or C not present in tensor) to store orientation angle in (x), score response in (y) and feature size in (z) of each extracted feature.
featDescriptors – [out] Output tensor with feature descriptors. The expected layout is [NMD], meaning a rank-3 tensor with the number of samples N as first dimension, the maximum number of features to be found M as second dimension, and the depth D of each feature descriptor as third dimension (a SIFT descriptor has a fixed depth of 128 bytes).
- It must have the same number of samples N as input tensor.
- It must have a number of elements M per sample N equal to the maximum allowed number of features that can be extracted by the SIFT algorithm per sample image. This number M must be the same for all output tensors. The actual number of features extracted is stored in numFeatures (see below).
- It must have U8 data type and D=128 to store each 128-Byte feature descriptor.
numFeatures – [out] Output tensor storing the number of features found in each image of the input tensor. The expected layout is [N] or [NC], meaning a rank-1 or rank-2 tensor with the number of samples N as first dimension, and an optional last dimension C with the number of channels. It reports the total number of features found, regardless of the maximum allowed number of features M in the output tensors (see above). Because features are found in no fixed order on each image of the input tensor, the features discarded when the number found is greater than M are chosen non-deterministically.
- It must have the same number of samples as input tensor.
- It must have S32 data type to store number of features found.
- It must have one element per sample, i.e. number of channels must be 1 in a [NC] tensor.
numOctaveLayers – [in] Number of layers in each octave. One suggestion, given by the original algorithm description, is to use numOctaveLayers = 3. The number of octaves is computed from the input image resolution WxH as \( \log(\min(W, H)) / \log(2) - 2 \).
- It must be positive.
- It must be at most maxOctaveLayers, which is defined in the operator constructor cvcudaSIFTCreate.
contrastThreshold – [in] The contrast threshold used to remove features with low contrast. The larger this threshold, the fewer features are extracted by the operator. One suggestion, given by the original algorithm description, is to use \( 0.03 \).
- It must be positive.
edgeThreshold – [in] The edge threshold used to remove features that are similar to edges. The larger this threshold, the more features are extracted by the operator. One suggestion, given by the original algorithm description, is to use \( 10.0 \).
- It must be positive.
initSigma – [in] The initial sigma to be applied by the first Gaussian filter at the first octave. This sigma is progressively applied for each scale-space layer within each octave (i.e. the level of the SIFT Gaussian pyramid). One suggestion, given by the original algorithm description, is to use a sigma equal to \( 1.6 \).
- It must be positive.
flags – [in] Additional flags for the SIFT operator, see NVCVSIFTFlagType. One flag is supported: it controls whether to expand input images by a factor of 2, using bilinear interpolation, prior to building the SIFT Gaussian scale-space pyramid. This expansion avoids ignoring the highest spatial frequencies. One suggestion, given by the original algorithm description, is to apply this expansion and thus use flags = NVCV_SIFT_USE_EXPANDED_INPUT_SIZE.
- It must be one of NVCVSIFTFlagType.
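
As a worked example of the octave formula given for numOctaveLayers, consider a 640x480 input, where \( \min(W, H) = 480 \):

\( \log(480) / \log(2) - 2 \approx 8.91 - 2 = 6.91 \)

and a 1920x1080 input, where \( \min(W, H) = 1080 \):

\( \log(1080) / \log(2) - 2 \approx 10.08 - 2 = 8.08 \)

How this real value is converted to an integer number of octaves (rounding or truncation) is not stated in this description, and neither is whether W and H refer to the original or the expanded image size when NVCV_SIFT_USE_EXPANDED_INPUT_SIZE is used, so the resulting integer count should be treated as unspecified here.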

Return values:

NVCV_ERROR_INVALID_ARGUMENT – Some parameter is outside valid range.
NVCV_ERROR_INTERNAL – Internal error in the operator, invalid types passed in.
NVCV_SUCCESS – Operation executed successfully.
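
The output tensor requirements listed for cvcudaSIFTSubmit imply the following buffer sizes for N samples and a capacity of M features per sample. This is a minimal sketch with hypothetical type and function names; it assumes densely packed tensors with no row or plane padding, which an actual tensor allocation may not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte sizes of the four output tensors, assuming dense packing. */
typedef struct {
    size_t featCoords;      /* N x M x 4 F32 (x, y, octave, layer) */
    size_t featMetadata;    /* N x M x 3 F32 (angle, response, size) */
    size_t featDescriptors; /* N x M x 128 U8 */
    size_t numFeatures;     /* N x S32 */
} SiftOutputBytes;

/* numSamples is N and maxFeatures is M, the capacity shared by all
   per-feature output tensors. */
static SiftOutputBytes sift_output_bytes(size_t numSamples, size_t maxFeatures)
{
    SiftOutputBytes b;
    b.featCoords      = numSamples * maxFeatures * 4 * sizeof(float);
    b.featMetadata    = numSamples * maxFeatures * 3 * sizeof(float);
    b.featDescriptors = numSamples * maxFeatures * 128 * sizeof(uint8_t);
    b.numFeatures     = numSamples * sizeof(int32_t);
    return b;
}
```

With N = 2 and M = 1000, for instance, the descriptors alone take 256000 bytes, which makes featDescriptors the dominant output when choosing M.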