NihAV — crates overview

There is a separate document with a general architecture overview; here I'd like to talk about what actual functionality the NihAV crates offer. This review does not cover the crates that provide (de)coders and (de)muxers, since they offer just one to four public functions to register the offered codecs and such.

While I tried to document every public function, data structure and interface, this page still may serve as a quick guide to what things to find where.

nihav_core
nihav_codec_support
nihav_registry

nihav_core

As you can guess from the name, this crate provides core definitions and utility functions. The crate contains the following modules:

codecs—definitions for encoder and decoder interfaces;
compr—definitions and common (de)compression formats support. Currently it contains just the definitions for decompression and deflate decompressor;
demuxers—definitions for demuxer interfaces;
formats—contains definitions of audio and video formats including some common ones;
frame—contains various definitions and utility code for frames and packets processing;
io—the module for byte- and bit-oriented input and output. It also contains some additional functionality for e.g. reading variable-length codes or codes from codebooks;
muxers—definitions for muxer interfaces;
options—definitions for options support (setting/querying/parsing) plus some definitions for common options;
refs—implementation for NABufferRef, a custom reference-counted owner for buffers (in frames and packets);
reorder—definitions and several implementations of frame reordering (required for decoders that output frames in non-sequential order);
scale—NIHed library for converting video frames into another format (e.g. with different dimensions or colourspace);
soundcvt—NIHed library for converting audio frames into another format.

formats contains the definitions of NASoniton (audio sample format definition), NAChannelType (should be obvious), and NAPixelFormaton and NAPixelChromaton (those define the pixel format used by an image and a description of a single component of it). NihAV was designed with a dislike for long enumerations that are added to frequently (NAChannelType does not fit that description since it is a list of standard channel names). That is why, instead of defining a list of supported or recognised pixel formats, NihAV operates on NAPixelFormaton, which should be able to represent most of the sane formats; they can then be processed without a central registry of known formats and without knowledge of how to process every individual format. A bit more about it will be discussed in the scale module description. Nevertheless, for convenience this module also contains some commonly used format definitions as constants, e.g. SND_F32P_FORMAT or YUV420_FORMAT.

frame contains a lot of structures used in processing data: NAStream represents a single demuxed or muxed stream, NACodecInfo stores the information about the codec associated with the stream, NAPacket stores the raw data for the stream and NATimeInfo the time information for it (PTS, DTS, duration, stream time base), NASideData passes palette changes in a codec-agnostic way, NAFrame holds the decoded data and NABufferType stores data and metadata for a decoded image or audio. There are also utility functions to allocate those buffers called alloc_video_buffer(), alloc_audio_buffer() and alloc_data_buffer(). It should be noted that the function for image buffer allocation takes two parameters: image information (containing pixel format and image dimensions) and dimension alignment expressed as a power of two (0 means unaligned, 4 means aligned to a multiple of 16, etc.). It is done this way to avoid confusion between the actual image dimensions and the aligned dimensions required by some codec.
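
To make the alignment parameter more concrete, here is a minimal sketch of how such a value could be applied to a dimension. It assumes the parameter is a power-of-two exponent (which is what the "4 means a multiple of 16" example suggests); align_dimension() is a hypothetical helper for illustration and not a function from nihav_core.

```rust
/// Rounds `dim` up to a multiple of `1 << align_log2`.
/// Hypothetical helper: 0 leaves the value untouched, 4 rounds up to a multiple of 16.
fn align_dimension(dim: usize, align_log2: u8) -> usize {
    let mask = (1usize << align_log2) - 1;
    (dim + mask) & !mask
}
```

With this reading a 17-pixel-wide image stays 17 pixels wide for alignment 0 and gets a 32-pixel-wide buffer for alignment 4, while the image information itself keeps reporting the real width.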

io provides various byte- and bit-oriented input and output interfaces and more. For byte operations there are two structures, ByteReader and ByteWriter that take some object implementing ByteIO trait and offer common operations like peeking/reading/writing 8/16/24/32/64-bit integers of different endianness and 32- or 64-bit floats. Some of those functions are available as standalone, e.g. read_u16be() will attempt to read 16-bit big-endian unsigned integer from a byte slice. Of course there are some common reader/writer interfaces implemented as well, namely FileReader for reading from file, MemoryReader for reading from byte array, FileWriter for writing into file, MemoryWriter for writing into byte array and GrowableMemoryWriter for writing into a resizeable byte array.
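
As an illustration of what the standalone helpers do, here is a sketch of reading a 16-bit value from a byte slice in both byte orders. The names carry a _sketch suffix on purpose and the Option return type is an assumption made for brevity; the real functions in nihav_core report errors in their own way.

```rust
/// Attempts to read a 16-bit big-endian unsigned integer from the start of `src`.
/// Returns `None` when fewer than two bytes are available.
fn read_u16be_sketch(src: &[u8]) -> Option<u16> {
    match src {
        [hi, lo, ..] => Some(u16::from_be_bytes([*hi, *lo])),
        _ => None,
    }
}

/// The same for little-endian byte order.
fn read_u16le_sketch(src: &[u8]) -> Option<u16> {
    match src {
        [lo, hi, ..] => Some(u16::from_le_bytes([*lo, *hi])),
        _ => None,
    }
}
```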

There is a single BitReader that reads data from a byte array, but it can treat the input bitstream in several modes depending on the way it was created: LSB first, MSB first on input bytes, MSB first on 16-bit little-endian words or MSB first on 32-bit little-endian words. While this sacrifices efficiency by not having a specialised bit reader for each mode, it simplifies handling cases where you have e.g. an audio codec with data stored in 16-bit words and streams in both kinds of endianness exist (like ATSC A/52).
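
The difference between the modes is easier to see in code. Below is a deliberately slow, bit-at-a-time sketch of a reader with three of the modes; SketchBitReader and BitOrder are hypothetical names, and the real BitReader is certainly implemented differently (this version only demonstrates in which order bits come out).

```rust
#[derive(Clone, Copy)]
enum BitOrder {
    /// Most significant bit of each byte first.
    MsbFirst,
    /// Least significant bit of each byte first.
    LsbFirst,
    /// Most significant bit first on 16-bit little-endian words.
    MsbFirstLe16,
}

struct SketchBitReader<'a> {
    src: &'a [u8],
    pos: usize, // current position in bits
    order: BitOrder,
}

impl<'a> SketchBitReader<'a> {
    fn new(src: &'a [u8], order: BitOrder) -> Self {
        SketchBitReader { src, pos: 0, order }
    }

    /// Reads a single bit, or returns `None` at the end of input.
    fn read_bit(&mut self) -> Option<u32> {
        let byte_idx = self.pos / 8;
        let bit_idx = self.pos % 8;
        let (idx, shift) = match self.order {
            BitOrder::MsbFirst => (byte_idx, 7 - bit_idx),
            BitOrder::LsbFirst => (byte_idx, bit_idx),
            // In a little-endian word the high byte is stored second,
            // so the two bytes of each pair are visited in swapped order.
            BitOrder::MsbFirstLe16 => (byte_idx ^ 1, 7 - bit_idx),
        };
        let byte = *self.src.get(idx)?;
        self.pos += 1;
        Some(u32::from((byte >> shift) & 1))
    }

    /// Reads up to 32 bits as an unsigned value.
    fn read(&mut self, nbits: u8) -> Option<u32> {
        let mut val = 0u32;
        for i in 0..nbits {
            let bit = self.read_bit()?;
            val = match self.order {
                // LSB-first streams fill the value from the bottom up.
                BitOrder::LsbFirst => val | (bit << i),
                _ => (val << 1) | bit,
            };
        }
        Some(val)
    }
}
```

For the bytes 0xA5 0x3C, reading four bits and then eight bits gives 0xA and 0x53 in MSB-first mode, 0x5 and 0xCA in LSB-first mode, and 0x3 and 0xCA when the input is treated as the 16-bit little-endian word 0x3CA5.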

Additionally there are functions to enable reading variable-length integer codes (in nihav_core::io::intcode) and functionality for reading codebook values from the bitstream (in nihav_core::io::codebook). First you need to construct a codebook using an object implementing the CodebookDescReader trait to report the number of codes and their length/bits/value (it can be read from a table or constructed on the fly, it is up to you to decide how to implement it), and then reading a value is as simple as bit_reader.read_cb(&codebook)?.
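
The idea of describing a codebook through a trait can be sketched as follows. Everything here is an assumption made for illustration: CodeDesc is a stand-in trait (not the actual CodebookDescReader definition), TableDesc is one possible table-backed implementation, and read_symbol() does a naive search where a real implementation would rather precompute lookup tables.

```rust
/// Hypothetical stand-in for a codebook description: one entry per code.
trait CodeDesc {
    fn len(&self) -> usize;
    fn bits(&self, idx: usize) -> u8;
    fn code(&self, idx: usize) -> u32;
    fn sym(&self, idx: usize) -> i32;
}

/// Description backed by a table of (code, length in bits, symbol) entries.
struct TableDesc<'a>(&'a [(u32, u8, i32)]);

impl<'a> CodeDesc for TableDesc<'a> {
    fn len(&self) -> usize { self.0.len() }
    fn bits(&self, idx: usize) -> u8 { self.0[idx].1 }
    fn code(&self, idx: usize) -> u32 { self.0[idx].0 }
    fn sym(&self, idx: usize) -> i32 { self.0[idx].2 }
}

/// Reads one symbol by extending the code bit by bit (MSB first)
/// until some entry of the description matches.
fn read_symbol<D: CodeDesc, I: Iterator<Item = bool>>(desc: &D, bits: &mut I) -> Option<i32> {
    let mut code = 0u32;
    let mut len = 0u8;
    while len < 32 {
        code = (code << 1) | u32::from(bits.next()?);
        len += 1;
        for idx in 0..desc.len() {
            if desc.bits(idx) == len && desc.code(idx) == code {
                return Some(desc.sym(idx));
            }
        }
    }
    None
}
```

The same read_symbol() works unchanged whether the description comes from a static table or is generated on the fly, which is the point of hiding the codes behind a trait.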

There is bit writing functionality as well: BitWriter pushes bytes into a destination vector of bytes using an interface very similar to that of BitReader.

scale is the module responsible for converting an image into a different format. As mentioned above, NihAV uses a pixel format descriptor instead of a tag, so format conversion does not need to handle special cases (even if it might do so for performance reasons). For instance, if the input is RGB555 and the output is RGB24, all we have to do is repacking, which means extracting all three components for each pixel, padding them to eight bits and packing them again. And if we convert YUVA410 to YUV420 we simply need to upscale the chroma planes (which is determined by comparing component subsampling in the corresponding NAPixelChromatons) and discard the alpha plane. When doing conversions inside one colourspace all we need to care about is how many components we have and how they are represented. And if we need to convert an image in the generic case we simply unpack it to planar format, convert each pixel and pack the result into the destination format.
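
The RGB555 to RGB24 repacking can be sketched in a few lines. This is a hard-coded illustration and not how the scaler works internally (the whole point of the descriptor approach is to avoid such per-format code). The bit layout (xRRRRRGGGGGBBBBB) and the choice to pad by replicating the top bits into the low ones are assumptions of this sketch; a plain left shift would be the other obvious way to pad.

```rust
/// Expands the low five bits of `c` to eight bits by replicating the top bits,
/// so that 0 maps to 0 and 31 maps to 255.
fn expand5(c: u16) -> u8 {
    let c = (c & 0x1F) as u8;
    (c << 3) | (c >> 2)
}

/// Repacks RGB555 pixels (assumed layout: xRRRRRGGGGGBBBBB) into R, G, B bytes.
fn rgb555_to_rgb24(src: &[u16]) -> Vec<u8> {
    let mut dst = Vec::with_capacity(src.len() * 3);
    for &pix in src {
        dst.push(expand5(pix >> 10)); // red
        dst.push(expand5(pix >> 5));  // green
        dst.push(expand5(pix));       // blue
    }
    dst
}
```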

Obviously this is based on my earlier work on AVScale (one of them) and later NAScale, hence the name of the structure that does the conversion.

soundcvt is the audio counterpart of scale except that currently it cannot perform resampling. It offers the convert_audio_frame() function that converts an input audio frame into an output audio frame. Currently this function can convert between different input and output formats (e.g. packed 16-bit integers to planar 32-bit floats) and perform channel reordering and limited remixing (calculating a remixing matrix for arbitrary input and output channel maps is not easy and not currently needed).
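
To show what the "packed 16-bit integers to planar 32-bit floats" case involves, here is a standalone sketch that does not use any NihAV types. The function name and the scaling by 1/32768 are assumptions of this example.

```rust
/// Converts interleaved (packed) 16-bit samples into one plane of floats per channel.
/// Samples are scaled to the [-1.0, 1.0) range; an incomplete trailing frame is ignored.
fn packed_i16_to_planar_f32(src: &[i16], channels: usize) -> Vec<Vec<f32>> {
    assert!(channels > 0, "at least one channel is required");
    let samples = src.len() / channels;
    let mut planes: Vec<Vec<f32>> = (0..channels)
        .map(|_| Vec::with_capacity(samples))
        .collect();
    for frame in src.chunks_exact(channels) {
        for (plane, &s) in planes.iter_mut().zip(frame) {
            plane.push(f32::from(s) / 32768.0);
        }
    }
    planes
}
```

Channel reordering in such a scheme amounts to choosing which source channel feeds which destination plane.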

If there is a need for resampling as well, one can use NAResample that is initialised with the source/target parameters and then takes input frames to convert them to the desired format (sample type, channels and sample rate).

nihav_codec_support

This is a crate with functionality that is useful just for developing codecs within the NihAV framework and not directly to the theoretical end users. The crate contains the following modules:

codecs—various bits of code useful when implementing decoders;
data—currently it is just GenericCache which stores data for last several lines (you use it e.g. when you predict motion vectors from left, top and top-right neighbours and do not need to store all motion vectors for later) but in the future more data structures may be added here;
dsp—various DSP functions (FFT, (M)DCT, QMF, window generation);
imgwrite—a couple of public functions to write video frames as PPM or PGMYUV (I dump frames as images in tests and in nihav-tool so I decided to factor out common code);
test—functionality for testing decoders and encoders;
vq—vector quantisation for generic data types.

codecs contains various utility code to help with developing decoders, like HAMShuffler/IPShuffler/IPBShuffler for managing references to the reference frames (do not confuse them with the frame reorderer from nihav_core; these are used when a decoder needs to retrieve a reference frame to perform motion compensation) or the MV type for motion vector operations (so you can write e.g. blk.mv = MV::pred(a_mv, b_mv, c_mv) + diff_mv instead of doing it for each component independently). There is also common code for various decoders: currently it has an IMA ADPCM predictor (used in QT and WAV IMA ADPCM and Duck ADPCM) and an H.263 scaffold that calls codec-specific functions for decoding various headers and block coefficients, as well as DSP functions (in case the provided default DSP implementation differs from the desired one).
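
The convenience of a dedicated motion vector type can be shown with a small self-contained sketch. The struct below is a stand-in written for this page and not the actual definition from nihav_codec_support; in particular it assumes that pred() is the component-wise median of three candidates, which is the usual prediction in H.263-like codecs.

```rust
use std::ops::Add;

/// Stand-in motion vector type (the field types are an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Default)]
struct MV {
    x: i16,
    y: i16,
}

/// Median of three values.
fn median3(a: i16, b: i16, c: i16) -> i16 {
    a.max(b).min(a.min(b).max(c))
}

impl MV {
    /// Component-wise median prediction from three neighbouring vectors.
    fn pred(a: MV, b: MV, c: MV) -> MV {
        MV { x: median3(a.x, b.x, c.x), y: median3(a.y, b.y, c.y) }
    }
}

impl Add for MV {
    type Output = MV;
    fn add(self, other: MV) -> MV {
        MV { x: self.x + other.x, y: self.y + other.y }
    }
}
```

With these definitions MV::pred(a_mv, b_mv, c_mv) + diff_mv handles both components in one expression.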

test contains functionality for testing codecs and muxers. For decoders there are several functions: test_file_decoding() for decoding and (optional) dumping decoded output, test_decode_audio() for doing the same for audio streams and test_decoding() for performing proper testing. The last function tests specified decoder in a stream against one of three possible results: that the decoding finishes without errors, that the full decoded stream MD5 hash matches the expected one or that the per-frame MD5 hashes match the list of expected per-frame hashes. This function also has the mode that generates those per-frame hashes instead of checking them.

Muxers can be tested with test_remuxing() and test_remuxing_md5(). The former function creates output file with streams copied from the input, the latter one calculates MD5 hash of it without creating the file.

Encoders have similar test_encoding_to_file() and test_encoding_md5() that decode a stream from input file, feed it to encoder and then either write encoded stream to file or simply calculate MD5 hash of the encoded output.

The vq module contains implementations of two methods of performing vector quantisation of generic data, so you can choose between the faster median cut and the much slower ELBG algorithm that gives better results. In order to use it you need to implement two traits: VQElement that represents the element (plus some operations on it required for vector quantisation) and VQElementSum which essentially calculates a centroid for a cluster of elements.
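
To illustrate why two traits are needed, here is a sketch with hypothetical traits Element and ElementSum (their methods are made up for this example and do not mirror the real VQElement and VQElementSum definitions). The refine() function is neither median cut nor ELBG; it is just the basic "assign to the nearest entry, then recompute centroids" step that iterative quantisers build on.

```rust
/// Hypothetical element trait: anything with a distance to another element.
trait Element: Copy {
    fn dist(self, other: Self) -> u32;
}

/// Hypothetical sum trait: accumulates elements and yields their centroid.
trait ElementSum<T: Element>: Default {
    fn add(&mut self, elem: T);
    fn centroid(&self) -> Option<T>;
}

impl Element for u8 {
    fn dist(self, other: u8) -> u32 {
        let d = i32::from(self) - i32::from(other);
        (d * d) as u32
    }
}

#[derive(Default)]
struct U8Sum {
    sum: u64,
    count: u64,
}

impl ElementSum<u8> for U8Sum {
    fn add(&mut self, elem: u8) {
        self.sum += u64::from(elem);
        self.count += 1;
    }
    /// Rounded average of the accumulated elements, `None` for an empty cluster.
    fn centroid(&self) -> Option<u8> {
        if self.count == 0 {
            return None;
        }
        Some(((self.sum + self.count / 2) / self.count) as u8)
    }
}

/// One refinement step: assign each element to its nearest codebook entry,
/// then replace every entry that got some elements with their centroid.
fn refine<T: Element, S: ElementSum<T>>(data: &[T], codebook: &mut [T]) {
    let mut sums: Vec<S> = codebook.iter().map(|_| S::default()).collect();
    for &elem in data {
        let nearest = codebook
            .iter()
            .enumerate()
            .min_by_key(|&(_, &entry)| elem.dist(entry))
            .map(|(idx, _)| idx);
        if let Some(idx) = nearest {
            sums[idx].add(elem);
        }
    }
    for (entry, sum) in codebook.iter_mut().zip(&sums) {
        if let Some(c) = sum.centroid() {
            *entry = c;
        }
    }
}
```

The quantiser itself only needs distances and centroids, so the same code can serve greyscale values, RGB triplets or whole pixel blocks once the two traits are implemented for them.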

nihav_registry

This is a utility crate that helps to detect various formats and keeps codec lists, and thus might be useful both to demuxers developed inside NihAV and to NihAV end users. It contains only two modules: detect and register.

detect contains two public functions (detect_format() for format detection from file name and contents and detect_format_by_name() for doing just the former) and DetectionScore to tell whether the format was detected by extension or by contents. Internally there is a list with rules to detect formats: a list of known extensions and a list of conditions to try on file contents. Those conditions are inspired by the well-known UNIX magic pattern file: you have an offset at which to try the rule, a data type, and a data condition. For example, Bink files start with a 32-bit big-endian value between BIKb and BIKz or between KB2a and KB2z, wave files have 'RIFF' at offset 0 and 'WAVEfmt ' at offset 8, etc.
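
Such content rules can be modelled roughly like this. The types and names below are made up for the illustration and are not the internal structures of the detect module; only the two example formats come from the description above.

```rust
/// Condition to check at some offset (hypothetical model of a detection rule).
enum Cond<'a> {
    /// Bytes at the offset must be equal to the given string.
    Eq(&'a [u8]),
    /// 32-bit big-endian value at the offset must lie in the inclusive range.
    InRangeU32Be(u32, u32),
}

struct Rule<'a> {
    offset: usize,
    cond: Cond<'a>,
}

fn rule_matches(rule: &Rule, data: &[u8]) -> bool {
    // Data too short to reach the offset never matches.
    let tail = match data.get(rule.offset..) {
        Some(tail) => tail,
        None => return false,
    };
    match rule.cond {
        Cond::Eq(magic) => tail.starts_with(magic),
        Cond::InRangeU32Be(lo, hi) => match tail {
            [a, b, c, d, ..] => {
                let val = u32::from_be_bytes([*a, *b, *c, *d]);
                lo <= val && val <= hi
            }
            _ => false,
        },
    }
}

/// A format is recognised by contents when all of its rules match.
fn matches_all(rules: &[Rule], data: &[u8]) -> bool {
    rules.iter().all(|rule| rule_matches(rule, data))
}
```

The wave example then becomes two Eq rules (b"RIFF" at offset 0 and b"WAVEfmt " at offset 8), and the first Bink alternative becomes a single InRangeU32Be rule at offset 0 with the bounds u32::from_be_bytes(*b"BIKb") and u32::from_be_bytes(*b"BIKz"); the KB2 range would be a second alternative checked the same way.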

The rationale behind this was to have detection of all supported formats in one place that would work regardless of whether the actual demuxer is present (IMO knowing that you lack a demuxer for format X is better than knowing you have an unrecognised file and that's all).

register contains various codec-related lists and public functions to query their contents. Those functions are:

get_codec_description() returns codec description containing its full name, type (audio/video/subtitles/other) and various features (e.g. intra-only, lossless and such);
find_codec_from_avi_fourcc() returns short video codec name from provided FOURCC;
find_codec_from_wav_twocc() returns short audio codec name from provided TWOCC;
find_avi_fourcc() and find_wav_twocc() do the opposite and try to find tag corresponding to the codec;
find_codec_from_mov_video_fourcc() and find_codec_from_mov_audio_fourcc() are MOV counterparts of the first two functions.
