Description
The command-line utility ngraminfo prints various information about an n-gram model, obtained from the NGramModel class and the underlying FST class.
Usage
Examples
See here for an example use of the command-line utility. At the C++ level, it corresponds to:
cout << "# of states: " << model->GetFst()->NumStates();
cout << "unigram state: " << model->UnigramState();
cout << "n-gram order: " << model->HiOrder();
cout << "well-formed: " << model->CheckTopology();
cout << "normalized: " << model->CheckNormalization();
and so forth.
Caveats
The number of unigrams will differ by one from that in an ARPA-format version of the model, since the ARPA format includes a unigram for the start symbol <s>, which is not represented as an n-gram in our model but rather as the start state. We include it in our ARPA format output to be consistent with typical conventions. Note that n-grams that end in the final symbol </s> are also not represented as arcs in our representation, but instead by final costs. Hence the total number of n-grams is the sum of the number of n-gram arcs and the number of final states. For the precise details of the n-gram format, see here.
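The counting rule above can be illustrated with a small, self-contained sketch. It does not use the NGramModel or FST classes: ToyArc, ToyState, kBackoffLabel, MakeToyBigram, CountNGrams and CountUnigrams are hypothetical names introduced only for this illustration. It also assumes that backoff arcs are marked with label 0 and are not counted as n-gram arcs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy stand-in for an n-gram automaton. This is NOT the
// OpenGrm NGramModel or OpenFst API; it only mirrors the counting rule.
const int kBackoffLabel = 0;  // assumption: label 0 marks a backoff arc

struct ToyArc {
  int label;      // kBackoffLabel for backoff, otherwise a word id
  int nextstate;  // destination state
};

struct ToyState {
  std::vector<ToyArc> arcs;
  bool is_final = false;  // true if an n-gram ending in </s> is stored here
};

// Total n-grams = number of non-backoff arcs + number of final states.
std::size_t CountNGrams(const std::vector<ToyState>& states) {
  std::size_t ngram_arcs = 0;
  std::size_t final_states = 0;
  for (const ToyState& state : states) {
    if (state.is_final) ++final_states;
    for (const ToyArc& arc : state.arcs) {
      if (arc.label != kBackoffLabel) ++ngram_arcs;
    }
  }
  return ngram_arcs + final_states;
}

// Unigrams as stored in the toy model: word arcs leaving the unigram state,
// plus one if that state is final (the </s> unigram). An ARPA listing would
// contain one more entry, for <s>.
std::size_t CountUnigrams(const std::vector<ToyState>& states,
                          std::size_t unigram_state) {
  const ToyState& state = states[unigram_state];
  std::size_t count = state.is_final ? 1 : 0;
  for (const ToyArc& arc : state.arcs) {
    if (arc.label != kBackoffLabel) ++count;
  }
  return count;
}

// A toy bigram model over two words: label 1 = "a", label 2 = "b".
// State 0 = start state (<s>), 1 = unigram state, 2 = history "a",
// 3 = history "b".
std::vector<ToyState> MakeToyBigram() {
  std::vector<ToyState> states(4);
  states[0].arcs = {ToyArc{1, 2}, ToyArc{kBackoffLabel, 1}};  // "<s> a"
  states[1].arcs = {ToyArc{1, 2}, ToyArc{2, 3}};              // "a", "b"
  states[1].is_final = true;                                  // "</s>"
  states[2].arcs = {ToyArc{2, 3}, ToyArc{kBackoffLabel, 1}};  // "a b"
  states[2].is_final = true;                                  // "a </s>"
  states[3].arcs = {ToyArc{kBackoffLabel, 1}};                // backoff only
  return states;
}
```

For this toy model, CountNGrams(MakeToyBigram()) yields 6 (four n-gram arcs plus two final states), and CountUnigrams(MakeToyBigram(), 1) yields 3, whereas an ARPA listing of the same model would show 4 unigrams because it adds an entry for <s>.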