Installing Kaldi and Using the ASpIRE Model

0x00 References

0x01 Installing Kaldi

Clone the Kaldi project from GitHub:

git clone https://github.com/kaldi-asr/kaldi

Check the dependencies:
```
cd tools
extras/check_dependencies.sh
```
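After the script runs, it lists any missing dependencies. Install them with the commands it suggests.

Once they are installed, rerun the dependency check above until it no longer reports anything missing.

Build the tools in parallel (still in the current directory):

make -j 20 # 20 is the number of parallel jobs; adjust it to your CPU core count. I was working in my own virtual machine, so I just built single-threaded.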
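To avoid hard-coding the job count, it can be derived from the machine. The following is a minimal sketch, assuming a POSIX shell; `pick_jobs` is a name made up for this post, and the block only prints the make command rather than running it.

```shell
#!/bin/sh
# Pick a job count for make from the number of online CPU cores.
# nproc comes from GNU coreutils; getconf is the fallback; 1 if neither works.
pick_jobs() {
  n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  echo "$n"
}

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "make -j $(pick_jobs)"
```

In the tools directory you would then run `make -j "$(pick_jobs)"`.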

If the build succeeds, make finishes without reporting errors.

Change to the ../src directory and run:
```
make depend
```
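If the output asks you to configure the build first, follow the prompts to complete the configuration (in Kaldi's src/INSTALL, ./configure comes before make depend).

Build: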
```
make
```
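Kaldi is now installed. You can test it with the models that ship with it; I won't demonstrate that here.

Download the ASpIRE model into the recipe directory: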

cd kaldi/egs/aspire/s5
wget http://dl.kaldi-asr.org/models/0001_aspire_chain_model.tar.gz
tar xfv 0001_aspire_chain_model.tar.gz

Follow the instructions in README.txt; only the commands we actually need are shown here.

steps/online/nnet3/prepare_online_decoding.sh --mfcc-config conf/mfcc_hires.conf data/lang_chain exp/nnet3/extractor exp/chain/tdnn_7b exp/tdnn_7b_chain_online

The next command takes quite a long time to run.

utils/mkgraph.sh --self-loop-scale 1.0 data/lang_pp_test exp/tdnn_7b_chain_online exp/tdnn_7b_chain_online/graph_pp
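Since graph construction is slow, it can be worth confirming that everything the decoder needs is in place before moving on. The following is a rough sketch, assuming a POSIX shell; `missing_decode_inputs` is a hypothetical helper, and the four file paths are simply the ones the decoding command later in this post refers to.

```shell
#!/bin/sh
# Print every file that the decoding command expects but that does not exist.
# Prints nothing when all four files are present.
missing_decode_inputs() {
  dir=${1:-exp/tdnn_7b_chain_online}
  for f in final.mdl conf/online.conf graph_pp/HCLG.fst graph_pp/words.txt; do
    [ -e "$dir/$f" ] || echo "$dir/$f"
  done
}

# Usage (from kaldi/egs/aspire/s5): missing_decode_inputs
```

An empty result means both preparation steps produced the files that decoding reads.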

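Problems during installation

Solutions

Ubuntu 16.04 sudo apt-get update error: "The following signatures were invalid: KEYEXPIRED 1538166745 KEYEXPIRED 1538166745", solution (代码先锋网)

Ubuntu: apt-get update error: "The following signatures couldn't be verified because the public key is not available" (嘟嘟嘟嘟, CSDN博客)

The cause is an apt signing key that has not been updated.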

0x02 Using the ASpIRE Model

0x00 References

Decoding an audio file using a pre-trained model with Kaldi

0x01 Data preparation

sox test.wav -r 8000 test-8K.wav
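The sox command above resamples the recording to 8000 Hz. To confirm the result without extra tools, the sample rate can be read straight from the WAV header. This is only a sketch: `wav_sample_rate` is a hypothetical helper, and it assumes the canonical PCM layout in which the "fmt " chunk directly follows the 12-byte RIFF header. Files with extra chunks in front would need a real parser.

```shell
#!/bin/sh
# Print the sample rate stored in a WAV header.
# Assumption: canonical layout, so bytes 24..27 hold the rate (little-endian).
wav_sample_rate() {
  # od prints the four bytes as unsigned decimals; set -- splits them into $1..$4.
  set -- $(od -An -t u1 -j 24 -N 4 "$1")
  echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Usage: wav_sample_rate test-8K.wav
```

Reading the bytes one at a time keeps the result independent of the host's byte order.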

Online decoding command

**Usage**:
online2-wav-nnet3-latgen-faster [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to decode utterance by utterance.

**Command**:
online2-wav-nnet3-latgen-faster --online=false --do-endpointing=false --frame-subsampling-factor=3 --config=exp/tdnn_7b_chain_online/conf/online.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=exp/tdnn_7b_chain_online/graph_pp/words.txt exp/tdnn_7b_chain_online/final.mdl exp/tdnn_7b_chain_online/graph_pp/HCLG.fst 'ark:echo utterance-id1 utterance-id1|' 'scp:echo utterance-id1 decoderesult/wav1/0_0_0_0_1_1_1_1.wav|' 'ark:/dev/null'

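# Annotated version of the same command. The arguments are kept in a bash array
# so that each one can carry its own comment.
decode_args=(
  --online=false          # whether to decode in real-time (online) mode
  --do-endpointing=false
  --frame-subsampling-factor=3
  --config=exp/tdnn_7b_chain_online/conf/online.conf
  --max-active=7000
  --beam=15.0             # larger values give higher accuracy but take longer
  --lattice-beam=6.0
  --acoustic-scale=1.0
  --word-symbol-table=exp/tdnn_7b_chain_online/graph_pp/words.txt  # mapping between symbol IDs and words
  exp/tdnn_7b_chain_online/final.mdl          # path to the acoustic model
  exp/tdnn_7b_chain_online/graph_pp/HCLG.fst  # path to the HCLG WFST decoding graph
  'ark:echo utterance-id1 utterance-id1|'     # <spk2utt-rspecifier>: speaker-to-utterance map for the wav file
  'scp:echo utterance-id1 <your-wav-file>|'   # <wav-rspecifier>
  'ark:/dev/null'                             # <lattice-wspecifier>: lattice discarded; the recognized text is printed to the screen
)
online2-wav-nnet3-latgen-faster "${decode_args[@]}"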

Results

Step-by-step decoding
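The script in this section reads its inputs from decoderesult/wav1/wav.scp and decoderesult/wav1/spk2utt. Here is a minimal sketch for writing those two files from a directory of wav files, assuming a POSIX shell and paths without spaces. `make_decode_lists` is a hypothetical helper, not part of Kaldi. It treats every utterance as its own speaker, as the usage text for the online decoder allows.

```shell
#!/bin/sh
# Write wav.scp and spk2utt for every .wav file in a directory.
make_decode_lists() {
  wav_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  # wav.scp: "<utterance-id> <path>", using the file name (minus .wav) as the id.
  # Entries are sorted in the C locale, which Kaldi's data checks expect.
  for wav in "$wav_dir"/*.wav; do
    [ -e "$wav" ] || continue
    echo "$(basename "$wav" .wav) $wav"
  done | LC_ALL=C sort > "$out_dir/wav.scp"
  # spk2utt: "<utterance-id> <utterance-id>", i.e. one speaker per utterance.
  awk '{ print $1, $1 }' "$out_dir/wav.scp" > "$out_dir/spk2utt"
}

# Usage: make_decode_lists decoderesult/wav1 decoderesult/wav1
```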

decode_test.sh

# Compute MFCC features; this is the feature-extraction stage of speech recognition
compute-mfcc-feats --allow_downsample=true --config=conf/mfcc_hires.conf scp:decoderesult/wav1/wav.scp ark,scp,t:decoderesult/Scmdfeats.ark,decoderesult/Scmdfeats.scp;
# More feature extraction: Kaldi also needs the i-vectors
ivector-extract-online2 --config=exp/tdnn_7b_chain_online/conf/ivector_extractor.conf  --ivector-period=10 ark:decoderesult/wav1/spk2utt scp:decoderesult/Scmdfeats.scp ark,scp,t:decoderesult/Scmdivectors.ark,decoderesult/Scmdivectors.scp;
# Acoustic model: the DNN forward pass
nnet3-compute --online-ivectors="scp:decoderesult/Scmdivectors.scp" --online_ivector_period=10 --apply-exp=false exp/tdnn_7b_chain_online/final.mdl  p,scp:decoderesult/Scmdfeats.scp ark,t:decoderesult/Scmdnn.ark; 
# Decoding with the language model
latgen-faster-mapped --min-active=200 --max-active=7000 --max-mem=50000000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/tdnn_7b_chain_online/graph_pp/words.txt exp/tdnn_7b_chain_online/final.mdl  exp/tdnn_7b_chain_online/graph_pp/HCLG.fst 'ark:decoderesult/Scmdnn.ark' 'ark,t:decoderesult/decode_Scmdnn.ark' \\