CTM2XML error on initial test #1

Open
proycon opened this issue May 1, 2019 · 12 comments

@proycon
Member

proycon commented May 1, 2019

The new LaMachine with the webservice is deployed on ponyland now. I recorded and submitted a wave file to test it, and the service runs but produces an error:

Argument /var/www/webservices-lst/live/writable/eng_ASR/scratch//test.wav is a sound file, using it as audio

[                                                  ] Diarization (0/1)

Diarization completed in 0:00:03 (CPU: 0:00:08), Memory used: 64 MB                
Split 1 source file into 1 segment                              
Number of speakers is less than 8, reducing number of jobs to 1
Duration of speech: 0h:0m:4s

[                                                  ] Chain Decoding (0/1)
Chain decoding completed in 0:00:00 (CPU: 0:00:00), Memory used: 7 MB                
Traceback (most recent call last):
  File "./scripts/ctm2xml.py", line 48, in <module>
    sdur=str(float(CTM[-1]['stime'])+float(CTM[-1]['dur'])) #file duration
IndexError: list index out of range
INFO       2019-05-01 21:18:58,026 Processing file '/var/www/webservices-lst/live/writable/eng_ASR/projects/proycon/test/output//test.ctm'...
Traceback (most recent call last):
  File "./scripts/ctm2tg.py", line 100, in <module>
    main(sys.argv)
  File "./scripts/ctm2tg.py", line 39, in main
    word_tier_intervals = create_tier_intervals_from_ctm(os.path.join(ctm_file),textgrid_file)
  File "./scripts/ctm2tg.py", line 51, in create_tier_intervals_from_ctm
    new_interval = interval.Interval(0.0, float(ctm_lines[0].split()[2]) , "")
IndexError: list index out of range
@proycon
Member Author

proycon commented May 2, 2019

The CTM file is actually entirely empty, which is obviously wrong. The ctm2xml converter stumbles over it, but that's secondary.
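
For the secondary problem, here is a minimal sketch of how a converter could guard against an empty CTM instead of raising `IndexError`. The field names `stime` and `dur` come from the traceback above; the function name `ctm_duration` and the list-of-dicts layout are assumptions for illustration, not the actual code in ctm2xml.py.

```python
from typing import Mapping, Optional, Sequence


def ctm_duration(ctm: Sequence[Mapping[str, str]]) -> Optional[float]:
    """Return the end time of the last CTM entry, or None if the CTM is empty.

    Hypothetical helper: assumes each entry is a mapping with string-valued
    'stime' (start time) and 'dur' (duration) fields, as in the traceback.
    """
    if not ctm:
        # An empty CTM means the decoder produced no words; let the caller
        # report that instead of crashing on ctm[-1].
        return None
    last = ctm[-1]
    return float(last["stime"]) + float(last["dur"])
```

A caller could then print a clear "empty CTM, check the decoder logs" message when the result is `None`, which points at the real failure upstream.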

@schemreier
Collaborator

This has nothing to do with the ctm2xml conversion. The ASR engine fails at some point. I need to see the log files.

@proycon
Member Author

proycon commented May 2, 2019

Here's the CLAM full log of my test run: error.log.

And here's the scratch dir: http://lst.science.ru.nl/~proycon/scratchtest.tar.gz

@proycon
Member Author

proycon commented May 2, 2019

Ok, the exact same error now turns up for oral_history as well, even though that service hasn't changed, so the problem isn't specific to eng_ASR. Something may have gone wrong in the underlying kaldi installation. I'll force another update with recompilation of kaldi.

@proycon
Member Author

proycon commented May 2, 2019

I recompiled kaldi but the issue persists.

@proycon
Member Author

proycon commented May 2, 2019

I'm following a lead now, from two of the logs:

compute-mfcc-feats --verbose=2 --config=/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/mfcc_hires.conf ark:- ark:- | copy-feats --compress=true ark:- ark,scp:/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/mfcc/raw_mfcc_ALL.1.ark,/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/mfcc/raw_mfcc_ALL.1.scp # Started at Thu May  2 14:12:15 CEST 2019
#
bash: line 1: extract-segments: command not found
bash: line 1: compute-mfcc-feats: command not found
bash: line 1: copy-feats: command not found
# online2-wav-nnet3-latgen-faster --do-endpointing=false --frames-per-chunk=20 --extra-left-context-initial=0 --online=true --frame-subsampling-factor=3 --config=/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/tmp/conf/online.conf --min-active=200 --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=models/AM/online/graph/words.txt /var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/tmp/final.mdl models/AM/online/graph/HCLG.fst ark:/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/data/ALL/split1/1/spk2utt "ark,s,cs:extract-segments scp,p:/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/data/ALL/split1/1/wav.scp /var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/data/ALL/split1/1/segments ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/tmp/tmp.FNFnIGfsME/lat.1.gz" # Started at Thu May  2 14:12:15 CEST 2019
#
bash: line 1: online2-wav-nnet3-latgen-faster: command not found
# Accounting: time=0 threads=1
# Ended (code 127) at Thu May  2 14:12:15 CEST 2019, elapsed time 0 seconds

The next question is what provides these programs, and why they can't be found.
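
A quick way to confirm this outside the pipeline is to check whether the programs resolve on `$PATH` at all. The sketch below uses Python's standard `shutil.which`; the program names are copied from the "command not found" lines in the log, while the helper name `missing_programs` is made up for illustration.

```python
import shutil
from typing import Iterable, List, Optional

# Program names taken from the "command not found" lines in the log above.
REQUIRED = [
    "extract-segments",
    "compute-mfcc-feats",
    "copy-feats",
    "online2-wav-nnet3-latgen-faster",
]


def missing_programs(names: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the names that cannot be found as executables.

    With path=None, shutil.which searches the current $PATH; passing a
    string searches that PATH-style value instead.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required programs were found.")
```

Running this in the same environment as the webservice (same user, same activation scripts) matters, since an interactive shell may have a different `$PATH`.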

@schemreier
Collaborator

I just sent an email.

@proycon
Member Author

proycon commented May 2, 2019

Ah great, I see, we found the problem at the same time then :) I'll investigate why they're not in $PATH

@schemreier
Collaborator

Awesome! :)

@proycon
Member Author

proycon commented May 2, 2019

Ok, it seems they were never in $PATH in LaMachine, so I now wonder why it ever worked. Also, kaldi is a bit chaotic, as it has a whole bunch of bin/ dirs and no proper installation script:

kaldi/src (weblamachine) $ ls -d *bin/                                                                                                                                          
bin/  chainbin/  featbin/  fgmmbin/  fstbin/  gmmbin/  ivectorbin/  kwsbin/  latbin/  lmbin/  nnet2bin/  nnet3bin/  nnetbin/  online2bin/  onlinebin/  rnnlmbin/  sgmm2bin/  tfrnnlmbin/

Now I could of course simply add all of these to $PATH (I assume there are no name conflicts then). Shall we do that, or do you have a better suggestion? (I know there are some local env.sh and path.sh scripts in the resources that perhaps take on this role?)
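
The "no name conflicts" assumption can be checked before touching `$PATH`. This is a rough sketch, not something LaMachine ships: it collects the `*bin` directories under a kaldi `src/` directory (mirroring the `ls -d *bin/` above), reports executables that appear in more than one of them, and builds the extended `$PATH` value. All three function names are hypothetical.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def kaldi_bin_dirs(src: Path) -> List[Path]:
    """Return the sorted subdirectories of `src` whose name ends in 'bin'."""
    return sorted(p for p in src.iterdir() if p.is_dir() and p.name.endswith("bin"))


def name_conflicts(dirs: List[Path]) -> Dict[str, List[Path]]:
    """Map each executable name found in more than one directory to those directories."""
    seen: Dict[str, List[Path]] = defaultdict(list)
    for d in dirs:
        for f in d.iterdir():
            if f.is_file() and os.access(f, os.X_OK):
                seen[f.name].append(d)
    return {name: ds for name, ds in seen.items() if len(ds) > 1}


def extended_path(dirs: List[Path], current: str) -> str:
    """Prepend the given directories to an existing PATH-style string."""
    parts = [str(d) for d in dirs]
    if current:
        parts.append(current)
    return os.pathsep.join(parts)
```

If `name_conflicts` comes back empty for the real `kaldi/src`, adding all of the directories is safe with respect to shadowing among kaldi's own tools; it says nothing about clashes with other programs already on `$PATH`.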

@proycon
Member Author

proycon commented May 2, 2019

Ah, I found the problem in your path.sh (I assume this gets executed?). It sets KALDI_ROOT to an absolute path, which won't work on other installations; you'll have to let LaMachine set KALDI_ROOT and not overwrite it.

export KALDI_ROOT=/home/eyilmaz/main/kaldi
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sctk/bin:$KALDI_ROOT/src/fstbin/:$KALDI_ROOT/src/gmmbin/:$KALDI_ROOT/src/featbin/:$KALDI_ROOT/src/lm/:$KALDI_ROOT/src/sgmmbin/:$KALDI_ROOT/src/sgmm2bin/:$KALDI_ROOT/src/fgmmbin/:$KALDI_ROOT/src/latbin/:$KALDI_ROOT/src/nnetbin:$KALDI_ROOT/src/nnet2bin/:$KALDI_ROOT/src/kwsbin:$KALDI_ROOT/src/online2bin/:$KALDI_ROOT/src/ivectorbin/:$KALDI_ROOT/src/lmbin/:$KALDI_ROOT/src/nnet3bin/:$PWD:$PATH
export LC_ALL=C

@proycon
Member Author

proycon commented May 2, 2019

The oral history webservice doesn't have that $PATH problem but actually fails on something else:

ERROR (online2-wav-nnet3-latgen-faster[5.5.221~1-19721]:ReadConfigFile():parse-options.cc:469) Cannot open config file: /vol/customopt/kaldi/egs/Kaldi_NL/Models/NL/UTwente/HMI/AM/CGN_all/nnet3_online/tdnn/v1.0/conf/mfcc.conf

The hard-coded path to /vol/customopt/kaldi/ is the problem there, and it also explains why it fails now: I moved that directory away because I was under the impression we all use the LaMachine kaldi now. I'll move it back, which hopefully patches oral_history for the time being.
