SMOOTHENING OF CONCATENATIVE GENERATED SPEECH by Sarpreet Kaur Gill
Speech synthesis is the process of converting text into a speech signal; in essence, it is the artificial production of human speech. Three main methods are used to build Text-To-Speech (TTS) systems: articulatory, formant, and concatenative synthesis. One of the main problems with concatenative synthesis is the occurrence of discontinuities between concatenated samples. When the joins between speech samples are clearly audible, the synthesized speech exhibits audible gaps. Synthesized speech can sound very natural if the discontinuities at the concatenation points are inaudible; when the joins are clearly audible, their presence distracts the listener and reduces the overall quality of the synthesized speech. Databases containing longer speech units offer a restricted variety of output but require fewer concatenation points, whereas systems that build speech by combining a large number of smaller units have more joins and therefore more discontinuities.

Our proposed system therefore combines hybrid filtration with a Hidden Markov Model (HMM) for speech synthesis. The filtration stage uses a beamforming-based approach that filters high-frequency components out of the signal. The HMM is then trained on the system, yielding the transition and emission probabilities. The output speech signal is evaluated using two parameters: Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR). For a smooth signal, the MSE should be low and the PSNR should be high.
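As a minimal sketch of the evaluation step described above, the two quality measures can be computed as follows. This is an illustrative implementation, not the authors' code; the signal values, the `peak` parameter (the maximum possible signal amplitude used in the PSNR definition), and the toy test signal are all assumptions for demonstration.

```python
import numpy as np

def mse(reference, test):
    """Mean Square Error between two equal-length signals (lower is better)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.mean((reference - test) ** 2))

def psnr(reference, test, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB (higher is better).

    `peak` is the maximum possible amplitude of the signal; for audio
    normalized to [-1, 1] this is 1.0 (an assumed convention here).
    """
    error = mse(reference, test)
    if error == 0.0:
        return float("inf")  # identical signals: no distortion
    return float(10.0 * np.log10((peak ** 2) / error))

# Toy example: a clean 440 Hz tone versus the same tone with small added noise,
# standing in for a reference signal and a smoothed synthesized signal.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.01 * np.random.default_rng(0).standard_normal(t.size)

print(mse(clean, noisy))   # small MSE indicates a smoother signal
print(psnr(clean, noisy))  # large PSNR indicates less audible distortion
```

Under this setup, a well-smoothed concatenation would show a low MSE and a correspondingly high PSNR against the reference, matching the criterion stated in the abstract.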