There is a separate document with a general architecture overview, and here I'd like to talk about what actual functionality the
NihAV crates offer. This review does not cover crates that provide (de)coders and (de)muxers, since they offer just one to four public functions to register the offered codecs and such.
While I tried to document every public function, data structure and interface, this page still may serve as a quick guide to what things to find where.
As you can guess from the name, the nihav_core crate provides core definitions and utility functions. It contains the following modules:
codecs—definitions for encoder and decoder interfaces;
compr—definitions and common (de)compression formats support. Currently it contains just the definitions for decompression and
demuxers—definitions for demuxer interfaces;
formats—contains definitions of audio and video formats including some common ones;
frame—contains various definitions and utility code for frames and packets processing;
io—the module for byte- and bit-oriented input and output. It also contains some additional functionality, e.g. for reading variable-length codes or codes from codebooks;
muxers—definitions for muxer interfaces;
options—definitions for options support (setting/querying/parsing) plus some definitions for common options;
refs—NABufferRef, a custom reference-counted owner for buffers (in frames and packets);
reorder—definitions and several implementations of frame reordering (required for decoders that output frames in non-sequential order);
scale—NIHed library for converting video frames into another format (e.g. with different dimensions or colourspace);
soundcvt—NIHed library for converting audio frames into another format.
formats contains the definitions of
NASoniton (audio sample format definition),
NAChannelType (should be obvious), and
NAPixelFormaton and NAPixelChromaton (those define the pixel format used by an image and a description of a single component of it).
NihAV was designed with a dislike for long enumerations that are added to frequently (
NAChannelType does not fit that description since it is a list of standard channel names), that is why instead of defining a list of supported or recognised pixel formats
NihAV operates on
NAPixelFormaton, which should be able to represent most of the sane formats, so they can be processed without the need for a central registry of known formats or the knowledge of how to process every individual format. A bit more about it will be discussed in the
scale module description. Nevertheless, for convenience this module also contains some commonly used format definitions as constants.
frame contains a lot of structures used in processing data: there is
NAStream that represents single demuxed or muxed stream, there is
NACodecInfo to store the information about the codec associated with the stream, there is
NAPacket to store the raw data for the stream and
NATimeInfo with the time information for it (PTS, DTS, duration, stream time base), there is
NASideData as well to pass palette changes in codec-agnostic ways, there is
NAFrame to hold the decoded data and
NABufferType to store data and metadata for a decoded image or audio. There are also utility functions to allocate those buffers, such as
alloc_data_buffer(). It should be noted that the function for image buffer allocation takes two parameters: image information (containing pixel format and image dimensions) and dimension alignment (0 – unaligned, 4 – aligned to a multiple of 16, etc.). It is done this way to avoid confusion between the actual image dimensions and the aligned dimensions required by some codecs.
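To make the alignment parameter less abstract, here is a minimal sketch of how such a value can be turned into padded dimensions. It assumes the parameter is a power-of-two exponent (which is what "4 – aligned to multiple of 16" suggests); the helper name align_dimension is hypothetical and not part of the crate.

```rust
/// Rounds `dim` up to a multiple of `1 << align`.
/// Assumption: `align` is a power-of-two exponent, so 0 leaves the value
/// untouched and 4 rounds up to a multiple of 16.
/// Hypothetical helper, not the crate's own function.
fn align_dimension(dim: usize, align: u8) -> usize {
    let mask = (1usize << align) - 1;
    (dim + mask) & !mask
}
```

With this interpretation a 1920x1080 image allocated with alignment 4 would get planes sized for 1920x1088 while the image information still reports 1080 lines.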
io provides various byte- and bit-oriented input and output interfaces and more. For byte operations there are two structures,
ByteReader and ByteWriter, that take some object implementing the
ByteIO trait and offer common operations like peeking/reading/writing 8/16/24/32/64-bit integers of different endianness and 32- or 64-bit floats. Some of those functions are available as standalone, e.g.
read_u16be() will attempt to read 16-bit big-endian unsigned integer from a byte slice. Of course there are some common reader/writer interfaces implemented as well, namely
FileReader for reading from file,
MemoryReader for reading from byte array,
FileWriter for writing into file,
MemoryWriter for writing into byte array and
GrowableMemoryWriter for writing into a resizeable byte array.
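For the standalone functions mentioned above, the idea is roughly the following. This is only a sketch of what reading an integer of a given endianness from a byte slice looks like; the function names and the Option return type are my assumptions here, the real functions report errors in their own way.

```rust
/// Attempts to read a 16-bit big-endian unsigned integer from the start of `src`.
/// Hypothetical stand-in: returns None when the slice is too short.
fn sketch_read_u16be(src: &[u8]) -> Option<u16> {
    if src.len() < 2 {
        return None;
    }
    Some(u16::from_be_bytes([src[0], src[1]]))
}

/// The same for a 32-bit little-endian unsigned integer.
fn sketch_read_u32le(src: &[u8]) -> Option<u32> {
    if src.len() < 4 {
        return None;
    }
    Some(u32::from_le_bytes([src[0], src[1], src[2], src[3]]))
}
```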
There is a single
BitReader that reads data from a byte array, but it can treat the input bitstream in several modes depending on the way it was created: LSB first, MSB first on input bytes, MSB first on 16-bit little-endian words or MSB first on 32-bit little-endian words. While this sacrifices efficiency by not having a special bit reader for each case, it simplifies dealing with situations where you e.g. have an audio codec with data stored in 16-bit words and streams in both kinds of endianness exist (like ATSC A/52).
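The following sketch shows why one reader with a mode is convenient: the mode only changes which byte and which bit are fetched next, and the calling code stays the same. It is a deliberately slow bit-by-bit toy with hypothetical names (SketchBitReader, BitOrder) covering three of the modes, not the actual implementation.

```rust
/// Order in which bits are taken from the input (hypothetical names).
#[derive(Clone, Copy)]
enum BitOrder {
    /// Least significant bit of each byte first.
    LsbFirst,
    /// Most significant bit of each byte first.
    MsbFirst,
    /// Most significant bit first on 16-bit little-endian words.
    Msb16Le,
}

struct SketchBitReader<'a> {
    src: &'a [u8],
    pos: usize, // current position in bits
    order: BitOrder,
}

impl<'a> SketchBitReader<'a> {
    fn new(src: &'a [u8], order: BitOrder) -> Self {
        SketchBitReader { src, pos: 0, order }
    }

    /// Reads up to 32 bits; returns None (without consuming anything)
    /// if the request is too long or the input runs out.
    fn read(&mut self, nbits: u8) -> Option<u32> {
        if nbits > 32 {
            return None;
        }
        let mut val = 0u32;
        let mut pos = self.pos;
        for i in 0..nbits {
            // In 16-bit little-endian mode the two bytes of each word are swapped.
            let byte_idx = match self.order {
                BitOrder::Msb16Le => (pos >> 3) ^ 1,
                _ => pos >> 3,
            };
            let byte = *self.src.get(byte_idx)?;
            let bit_idx = (pos & 7) as u8;
            match self.order {
                BitOrder::LsbFirst => val |= u32::from((byte >> bit_idx) & 1) << i,
                _ => val = (val << 1) | u32::from((byte >> (7 - bit_idx)) & 1),
            }
            pos += 1;
        }
        self.pos = pos;
        Some(val)
    }
}
```

A decoder for a format that exists in both byte orders can then pick the mode once at construction time and share the rest of the parsing code.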
Additionally there are functions to enable reading variable-length integer codes (in
nihav_core::io::intcode) and functionality for reading codebook values from the bitstream (in
nihav_core::io::codebook). First you need to construct a codebook using an object implementing
CodebookDescReader trait to report the number of codes and their length/bits/value (it can be read from a table or constructed on the fly; it is up to you to decide how to implement it), after which reading a value from the bitstream is a single call.
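To show the idea of a codebook description and reading with it, here is a self-contained toy version. Everything in it is hypothetical (the SketchCodebookDesc trait, TableDesc, read_symbol and the example table); it takes bits from a plain iterator and searches the codes linearly, while a real implementation would build lookup tables once and read from the bit reader.

```rust
/// Hypothetical description interface: the number of codes plus
/// length, bits and value of each code.
trait SketchCodebookDesc {
    fn len(&self) -> usize;
    fn bits(&self, idx: usize) -> u8; // code length in bits
    fn code(&self, idx: usize) -> u32; // code bits, MSB first
    fn sym(&self, idx: usize) -> u8; // decoded value
}

/// Description backed by a static table of (length, code, value) triples.
struct TableDesc(&'static [(u8, u32, u8)]);

impl SketchCodebookDesc for TableDesc {
    fn len(&self) -> usize { self.0.len() }
    fn bits(&self, idx: usize) -> u8 { self.0[idx].0 }
    fn code(&self, idx: usize) -> u32 { self.0[idx].1 }
    fn sym(&self, idx: usize) -> u8 { self.0[idx].2 }
}

/// Example prefix code: 0 -> 10, 10 -> 20, 11 -> 30.
static EXAMPLE_TABLE: [(u8, u32, u8); 3] = [(1, 0b0, 10), (2, 0b10, 20), (2, 0b11, 30)];

/// Reads bits one at a time until the accumulated prefix matches a code.
/// Returns None when the input ends or no code matches.
fn read_symbol<D, I>(desc: &D, bits: &mut I) -> Option<u8>
where
    D: SketchCodebookDesc,
    I: Iterator<Item = bool>,
{
    let max_len = (0..desc.len()).map(|i| desc.bits(i)).max()?;
    let mut acc = 0u32;
    for len in 1..=max_len {
        acc = (acc << 1) | u32::from(bits.next()?);
        for i in 0..desc.len() {
            if desc.bits(i) == len && desc.code(i) == acc {
                return Some(desc.sym(i));
            }
        }
    }
    None
}
```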
There is bit writing functionality as well:
BitWriter pushes bytes into a destination vector of bytes using an interface very similar to that of BitReader.
scale is the module responsible for converting an image into a different format. As mentioned above,
NihAV uses a pixel format descriptor instead of a tag, so format conversion does not need to handle special cases (even if it might do so for performance reasons). For instance, if the input is RGB555 and the output is RGB24, all we have to do is repacking, which means extracting all three components for each pixel, padding them to eight bits and packing them again. And if we convert YUVA410 to YUV420 we simply need to upscale the chroma planes (which is determined by comparing component subsampling in the corresponding
NAPixelChromatons) and discard alpha plane. When doing conversions inside one colourspace all we need to care about is how many components we have and how they are represented. And if we need to convert image in the generic case we simply unpack it to planar format, convert each pixel and pack into the destination format.
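The RGB555 to RGB24 repacking described above can be sketched like this. The bit layout (0RRRRRGGGGGBBBBB in a 16-bit value) and the padding method (replicating the top bits so that 31 becomes 255) are assumptions made for the example, and the function names are hypothetical; a descriptor-driven scaler would take shifts and depths from the format description instead of hardcoding them.

```rust
/// Expands one RGB555 pixel into 8-bit R, G and B.
/// Assumed layout: bit 15 unused, then 5 bits each of red, green and blue.
fn rgb555_to_rgb24(pix: u16) -> [u8; 3] {
    // Pad a 5-bit component to 8 bits by replicating its top bits.
    let expand = |c: u16| -> u8 {
        let c = (c & 0x1F) as u8;
        (c << 3) | (c >> 2)
    };
    [expand(pix >> 10), expand(pix >> 5), expand(pix)]
}

/// Repacks a line of RGB555 pixels into packed RGB24 bytes appended to `dst`.
fn repack_line(src: &[u16], dst: &mut Vec<u8>) {
    for &pix in src {
        dst.extend_from_slice(&rgb555_to_rgb24(pix));
    }
}
```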
Obviously this is based on my earlier work on
AVScale (one of them) and later
NAScale, hence the name of the structure that does the conversion.
soundcvt is the audio counterpart of
scale except that currently it cannot perform resampling. It offers the
convert_audio_frame() function that converts input audio frame into output audio frame. Currently this function can convert between different input and output formats (e.g. packed 16-bit integers to planar 32-bit floats) and perform channel reordering and limited remixing (calculating remixing matrix for arbitrary input and output channel maps is not easy and not currently needed).
If there is a need for resampling as well, one can use
NAResample that is initialised with the source/target parameters and then takes input frames to convert them to the desired format (sample type, channels and sample rate).
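As an illustration of the format conversion part (the example of packed 16-bit integers to planar 32-bit floats mentioned above), here is a minimal standalone sketch. The function name is hypothetical and scaling by 32768 is one common convention that I assume here; it does not touch channel reordering or remixing.

```rust
/// Converts interleaved (packed) 16-bit samples into one f32 plane per channel.
/// Samples are scaled to the -1.0..1.0 range by dividing by 32768 (assumed convention).
/// Trailing samples that do not form a complete frame are ignored.
fn packed_i16_to_planar_f32(src: &[i16], channels: usize) -> Vec<Vec<f32>> {
    if channels == 0 {
        return Vec::new();
    }
    let frames = src.len() / channels;
    let mut planes = vec![Vec::with_capacity(frames); channels];
    for frame in src.chunks_exact(channels) {
        for (plane, &sample) in planes.iter_mut().zip(frame) {
            plane.push(f32::from(sample) / 32768.0);
        }
    }
    planes
}
```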
nihav_codec_support is a crate with functionality that is useful just for developing codecs within the
NihAV framework and not directly to the theoretical end users. The crate contains the following modules:
codecs—various bits of code useful when implementing decoders;
data—currently it is just
GenericCache, which stores data for the last several lines (you use it e.g. when you predict motion vectors from left, top and top-right neighbours and do not need to store all motion vectors for later) but in the future more data structures may be added here;
dsp—various DSP functions (FFT, (M)DCT, QMF, window generation);
imgwrite—a couple of public functions to write video frames as PPM or PGMYUV (I dump frames as images in tests and in
nihav-tool, so I decided to factor out the common code);
test—functionality for testing decoders and encoders;
vq—vector quantisation for generic data types.
codecs contains various utility code that helps with developing decoders, like
IPBShuffler for managing references to the reference frames (do not confuse it with frame reorderer from
nihav_core, this one is used when a decoder needs to retrieve reference frame to perform motion compensation) or
MV type for motion vector operations (so you can write e.g.
blk.mv = MV::pred(a_mv, b_mv, c_mv) + diff_mv instead of doing it for each component independently). Also there is common code for various decoders, currently it has IMA ADPCM predictor (used in QT and WAV IMA ADPCM and Duck ADPCM) and H.263 scaffold that calls codec-specific functions for decoding various headers and block coefficients and DSP functions (if provided default DSP implementation differs from the desired one).
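The convenience of a dedicated motion vector type can be sketched as follows. SketchMV is a hypothetical stand-in, and I assume here that prediction means the component-wise median of three neighbours, which is a common predictor in video codecs.

```rust
use std::ops::Add;

/// Hypothetical motion vector type with two components.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SketchMV {
    x: i16,
    y: i16,
}

/// Median of three values.
fn median3(a: i16, b: i16, c: i16) -> i16 {
    a.max(b).min(a.min(b).max(c))
}

impl SketchMV {
    /// Component-wise median prediction from three neighbouring vectors
    /// (assumed predictor for this sketch).
    fn pred(a: SketchMV, b: SketchMV, c: SketchMV) -> SketchMV {
        SketchMV { x: median3(a.x, b.x, c.x), y: median3(a.y, b.y, c.y) }
    }
}

/// Adding vectors adds both components, so a decoded difference
/// can be applied to the prediction in one expression.
impl Add for SketchMV {
    type Output = SketchMV;
    fn add(self, other: SketchMV) -> SketchMV {
        SketchMV { x: self.x + other.x, y: self.y + other.y }
    }
}
```

With such a type the per-block code stays a one-liner like SketchMV::pred(a, b, c) + diff instead of two nearly identical computations for x and y.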
test contains functionality for testing codecs and muxers. For decoders there are several functions:
test_file_decoding() for decoding and (optional) dumping decoded output,
test_decode_audio() for doing the same for audio streams and
test_decoding() for performing proper testing. The last function tests specified decoder in a stream against one of three possible results: that the decoding finishes without errors, that the full decoded stream MD5 hash matches the expected one or that the per-frame MD5 hashes match the list of expected per-frame hashes. This function also has the mode that generates those per-frame hashes instead of checking them.
Muxers can be tested with test_remuxing() and
test_remuxing_md5(). The former function creates an output file with streams copied from the input, the latter one calculates the MD5 hash of the output without creating the file.
Encoders have the similar test_encoding_to_file() and
test_encoding_md5() that decode a stream from the input file, feed it to the encoder and then either write the encoded stream to a file or simply calculate the MD5 hash of the encoded output.
vq module contains implementation for two methods of performing vector quantisation of generic data so you can choose between faster median cut and much slower but giving better results ELBG algorithm. In order to use it you need to implement two traits:
VQElement that represents the element (plus some operations on it required for vector quantisation) and
VQElementSum which essentially calculates a centroid for a cluster of elements.
nihav_registry is a utility crate that helps to detect various formats and keeps codec lists, and thus might be useful both for demuxers developed inside
NihAV and for
NihAV end users. It contains only two modules:
detect contains two public functions (
detect_format() for format detection from file name and contents and
detect_format_by_name() for doing just the former) and
DetectionScore to tell whether the format was detected by extension or by contents. Internally there is a list with rules to detect formats: a list of known extensions and list with conditions to try on file contents. Those conditions are inspired by the well-known UNIX
magic pattern file: you have an offset at which to try the rule, data type, and data condition. For example Bink files start with 32-bit big-endian value between
BIKz or between
KB2z, wave files have
'RIFF' at offset 0 and
'WAVEfmt ' at offset 8, etc.
The rationale behind this was to have detection of all supported formats in one place that works regardless of the actual demuxer being present (IMO knowing that you lack a demuxer for format X is better than knowing you have an unrecognised file and that's all).
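The content rules can be pictured with a small sketch like the one below. It only handles the simplest kind of condition (an exact byte string at a fixed offset) using the wave example from the text; the type and function names are hypothetical, and real rules also carry a data type and range conditions.

```rust
/// One condition to try on the file contents: expected bytes at a fixed offset.
struct SketchRule {
    offset: usize,
    magic: &'static [u8],
}

/// A format is detected when all of its conditions hold.
struct SketchFormat {
    name: &'static str,
    rules: &'static [SketchRule],
}

/// Hypothetical rule list with a single entry taken from the text above.
const SKETCH_FORMATS: &[SketchFormat] = &[SketchFormat {
    name: "wav",
    rules: &[
        SketchRule { offset: 0, magic: b"RIFF" },
        SketchRule { offset: 8, magic: b"WAVEfmt " },
    ],
}];

/// Returns the name of the first format whose rules all match `data`.
/// A rule that points past the end of the data simply does not match.
fn detect_by_contents(data: &[u8]) -> Option<&'static str> {
    SKETCH_FORMATS
        .iter()
        .find(|fmt| {
            fmt.rules.iter().all(|rule| {
                data.get(rule.offset..rule.offset + rule.magic.len()) == Some(rule.magic)
            })
        })
        .map(|fmt| fmt.name)
}
```

Note that such a check needs only the first bytes of the file and no demuxer at all, which is the point of keeping detection separate.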
register contains various codec-related lists and public functions to query their contents. Those functions are:
get_codec_description() returns codec description containing its full name, type (audio/video/subtitles/other) and various features (e.g. intra-only, lossless and such);
find_codec_from_avi_fourcc() returns short video codec name from provided FOURCC;
find_codec_from_wav_twocc() returns short audio codec name from provided TWOCC;
find_avi_fourcc() and find_wav_twocc() do the opposite and try to find the tag corresponding to the codec;
find_codec_from_mov_video_fourcc() and find_codec_from_mov_audio_fourcc() are MOV counterparts of the first two lookup functions.